Building a RAG System in Practice: Connecting an LLM to a Private Knowledge Base

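An LLM knows nothing about your internal systems. RAG (retrieval-augmented generation) lets it "read" your runbooks and wiki so that it can answer specific internal questions.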

How RAG Works

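User question
    ↓
Embed the query → search the vector database → retrieve relevant document chunks
    ↓
Insert the chunks into the prompt → LLM generates the answer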

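The core idea: rather than making the model "memorize" knowledge, retrieve the relevant content each time a question is asked and supply it as context.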
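To make the flow concrete, here is a dependency-free toy sketch of the same loop. It is only an illustration under loud assumptions: a bag-of-words count vector stands in for the embedding model, a Python list stands in for the vector database, and the `DOCS` snippets are invented.

```python
import math
import re
from collections import Counter

# Hypothetical knowledge-base snippets, invented for this illustration
DOCS = [
    "To restart nginx run systemctl restart nginx on the gateway host.",
    "MySQL replication lag is checked with SHOW REPLICA STATUS on the replica.",
    "Disk alerts fire when usage on the data volume exceeds 85 percent.",
]

def embed(text: str) -> Counter:
    """Toy stand-in for an embedding model: lowercase word counts."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))

def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

def retrieve(query: str, top_k: int = 1) -> list[str]:
    """Rank every document against the query and keep the best top_k."""
    q = embed(query)
    ranked = sorted(DOCS, key=lambda d: cosine(q, embed(d)), reverse=True)
    return ranked[:top_k]

def build_prompt(query: str) -> str:
    """Place the retrieved text in front of the question."""
    context = "\n".join(retrieve(query))
    return f"Context:\n{context}\n\nQuestion: {query}"

print(build_prompt("How do I check MySQL replication lag?"))
```

The model never has to be retrained: changing an entry in `DOCS` changes what the next prompt contains.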

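Choosing a Tech Stack

  • Vector database: Qdrant (recommended; fast and easy to deploy)
  • Embedding model: bge-m3 (multilingual, with strong results on Chinese text)
  • LLM: Qwen2.5 / DeepSeek (deployed locally)
  • Framework: LangChain, or call the APIs directly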

Deploying the Vector Database

docker run -d \
  --name qdrant \
  -p 6333:6333 \
  -v qdrant_storage:/qdrant/storage \
  qdrant/qdrant
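Before indexing anything, it can help to confirm that the container is actually answering. The sketch below assumes the port mapping from the command above and uses only the standard library; `qdrant_is_up` is a hypothetical helper name. It relies on Qdrant's REST endpoint `GET /collections`, which returns a JSON body with a `status` field.

```python
import json
import urllib.request

def qdrant_is_up(base_url: str = "http://localhost:6333", timeout: float = 2.0) -> bool:
    """Return True if Qdrant's REST API answers GET /collections with status ok."""
    try:
        with urllib.request.urlopen(f"{base_url}/collections", timeout=timeout) as resp:
            body = json.load(resp)
    except (OSError, ValueError):
        # OSError covers refused connections, HTTP errors, and timeouts;
        # ValueError covers a non-JSON reply from some other service
        return False
    return isinstance(body, dict) and body.get("status") == "ok"

if __name__ == "__main__":
    print("Qdrant reachable:", qdrant_is_up())
```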

Document Processing and Vectorization

from pathlib import Path
from qdrant_client import QdrantClient
from qdrant_client.models import Distance, VectorParams, PointStruct
from sentence_transformers import SentenceTransformer
import hashlib

# Initialize the embedding model and the Qdrant client
embedder = SentenceTransformer("BAAI/bge-m3")
client = QdrantClient(host="localhost", port=6333)

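# Create the collection. Note: recreate_collection drops any existing
# collection with the same name, so rerunning this script is a full re-index.
client.recreate_collection(
    collection_name="ops-knowledge",
    # 1024 is the dense vector size produced by bge-m3
    vectors_config=VectorParams(size=1024, distance=Distance.COSINE),
)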

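def chunk_text(text: str, chunk_size: int = 500, overlap: int = 50) -> list[str]:
    """Split text into chunks on sentence boundaries (the Chinese full stop "。")."""
    # Skip empty pieces so a trailing "。" does not produce a stray "。" sentence
    sentences = [s + "。" for s in text.split("。") if s.strip()]
    chunks, current = [], ""
    for sent in sentences:
        if current and len(current) + len(sent) > chunk_size:
            chunks.append(current)
            # Carry the tail of the previous chunk forward as overlap
            # (guard overlap == 0, because current[-0:] is the whole string)
            current = (current[-overlap:] if overlap > 0 else "") + sent
        else:
            current += sent
    if current:
        chunks.append(current)
    return chunks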

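def index_document(file_path: str, source: str):
    """Index a single document."""
    text = Path(file_path).read_text(encoding="utf-8")
    chunks = chunk_text(text)
    if not chunks:
        return  # empty file, nothing to index

    # Encode all chunks in one batch instead of one call per chunk
    vectors = embedder.encode(chunks, normalize_embeddings=True)

    points = []
    for i, (chunk, vector) in enumerate(zip(chunks, vectors)):
        # Derive a stable integer ID from an MD5 digest. The built-in hash()
        # is salted per process for strings, so it would yield different IDs
        # on every run and upsert could not overwrite earlier points.
        digest = hashlib.md5(f"{source}_{i}".encode()).hexdigest()[:16]
        points.append(PointStruct(
            id=int(digest, 16) % (2**63),
            vector=vector.tolist(),
            payload={"text": chunk, "source": source, "chunk_id": i},
        ))

    client.upsert(collection_name="ops-knowledge", points=points)
    print(f"Indexed {len(chunks)} chunks: {source}")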

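# Index every Markdown file under the runbook directory
docs_dir = Path("/opt/runbooks")
for md_file in docs_dir.rglob("*.md"):
    # Use the relative path as the source so that same-named files in
    # different subdirectories do not collide
    index_document(str(md_file), str(md_file.relative_to(docs_dir)))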
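To see what the overlap actually does, here is a small standalone sketch with a deliberately tiny chunk size so the output is easy to read. It mirrors the sentence-packing idea of `chunk_text` but is not the article's function: `chunk_sentences` and the sample text are illustrative, and the splitter also accepts English punctuation, which is an assumption added here.

```python
import re

def chunk_sentences(text: str, chunk_size: int = 40, overlap: int = 10) -> list[str]:
    """Greedy sentence packing with a character-level overlap between chunks."""
    # Split right after sentence-ending punctuation (Chinese or English),
    # keeping the punctuation attached to its sentence
    sentences = [s for s in re.split(r"(?<=[。!?.!?])", text) if s]
    chunks, current = [], ""
    for sent in sentences:
        if current and len(current) + len(sent) > chunk_size:
            chunks.append(current)
            # Start the next chunk with the tail of the previous one
            tail = current[-overlap:] if overlap > 0 else ""
            current = tail + sent
        else:
            current += sent
    if current:
        chunks.append(current)
    return chunks

sample = (
    "Check the replica status. Compare binlog positions. "
    "Look for long transactions. Restart the IO thread."
)
for i, chunk in enumerate(chunk_sentences(sample)):
    print(i, repr(chunk))
```

Each chunk after the first begins with the last `overlap` characters of the one before it, so a sentence that straddles a boundary keeps some of its surrounding context. Because the overlap is counted in characters, it can start mid-word; overlapping by whole sentences is a common alternative.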

Retrieval and Question Answering

from openai import OpenAI

llm = OpenAI(base_url="http://localhost:11434/v1", api_key="ollama")

def search(query: str, top_k: int = 5) -> list[dict]:
    """Semantic search over the indexed chunks."""
    query_vector = embedder.encode(query, normalize_embeddings=True).tolist()
    results = client.search(
        collection_name="ops-knowledge",
        query_vector=query_vector,
        limit=top_k,
        score_threshold=0.6,   # drop low-relevance results
    )
    return [
        {"text": r.payload["text"], "source": r.payload["source"], "score": r.score}
        for r in results
    ]

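def ask(question: str) -> str:
    """Answer a question with retrieved context (RAG)."""
    docs = search(question)
    if not docs:
        return "No relevant documents found. Check whether the knowledge base has been indexed."

    context = "\n\n---\n\n".join(
        f"Source: {d['source']}\n{d['text']}" for d in docs
    )

    prompt = f"""Answer the question based on the documents below. If they do not contain the relevant information, say so explicitly.

Documents:
{context}

Question: {question}"""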

    response = llm.chat.completions.create(
        model="qwen2.5:14b",
        messages=[
            {"role": "system", "content": "You are an operations knowledge assistant. Answer accurately based on the provided documents."},
            {"role": "user", "content": prompt},
        ],
        temperature=0.1,
    )
    return response.choices[0].message.content

# Usage example
answer = ask("How do I troubleshoot MySQL replication lag?")
print(answer)
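`ask()` concatenates every retrieved chunk, which can overflow the model's context window when chunks are long. Below is a minimal sketch of one way to cap the context, assuming a character budget as a rough proxy for tokens. `build_context` and `max_chars` are hypothetical names, not part of the code above, and the sample `docs` are invented.

```python
SEPARATOR = "\n\n---\n\n"

def build_context(docs: list[dict], max_chars: int = 2000) -> str:
    """Join retrieved chunks, highest score first, without exceeding max_chars."""
    parts, used = [], 0
    for d in sorted(docs, key=lambda d: d["score"], reverse=True):
        block = f"Source: {d['source']}\n{d['text']}"
        cost = len(block) + (len(SEPARATOR) if parts else 0)
        if used + cost > max_chars:
            break  # stop at the first chunk that would overflow the budget
        parts.append(block)
        used += cost
    return SEPARATOR.join(parts)

# Invented search results in the same shape that search() returns
docs = [
    {"text": "A" * 50, "source": "a.md", "score": 0.7},
    {"text": "B" * 50, "source": "b.md", "score": 0.9},
    {"text": "C" * 50, "source": "c.md", "score": 0.8},
]
print(build_context(docs, max_chars=140))
```

Dropping the lowest-scoring chunks first keeps the most relevant material in the prompt; a tokenizer-based budget would be more precise than counting characters.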

Building a Web API

from fastapi import FastAPI
from pydantic import BaseModel

app = FastAPI()

class Question(BaseModel):
    query: str

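@app.post("/ask")
def ask_endpoint(q: Question):
    # A plain def lets FastAPI run the handler in a worker thread, so the
    # blocking embedding and LLM calls do not stall the event loop
    docs = search(q.query)   # retrieved here only to report the sources
    answer = ask(q.query)    # ask() runs its own retrieval internally
    return {
        "answer": answer,
        "sources": [{"source": d["source"], "score": d["score"]} for d in docs]
    }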
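Once the service is running, any HTTP client can call it. The sketch below uses only the standard library and assumes the app is served at `http://localhost:8000` (for example via uvicorn); that URL and the helper names `build_ask_request` and `ask_remote` are assumptions for illustration.

```python
import json
import urllib.request

def build_ask_request(query: str, base_url: str = "http://localhost:8000") -> urllib.request.Request:
    """Build a POST /ask request whose JSON body matches the Question model."""
    body = json.dumps({"query": query}).encode("utf-8")
    return urllib.request.Request(
        f"{base_url}/ask",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )

def ask_remote(query: str, base_url: str = "http://localhost:8000") -> dict:
    """Send the question and return the decoded JSON response."""
    request = build_ask_request(query, base_url)
    # LLM generation can be slow, so allow a generous timeout
    with urllib.request.urlopen(request, timeout=120) as resp:
        return json.load(resp)
```

Calling `ask_remote("How do I troubleshoot MySQL replication lag?")` should return a dictionary with the `answer` and `sources` keys produced by the endpoint.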

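Tips for Better Results

  1. Document preprocessing: strip meaningless content and keep structural information
  2. Hybrid retrieval: combine vector search with keyword search (BM25)
  3. Reranking: reorder the retrieved results with a reranker model
  4. Regular updates: trigger re-indexing automatically when documents change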
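For hybrid retrieval, the two result lists have to be merged somehow. One common option, chosen here for illustration and not something the article prescribes, is reciprocal rank fusion (RRF): it needs only the rank of each hit, so vector scores and BM25 scores never have to be put on the same scale. The document IDs below are invented, and `k = 60` is the conventional default rather than a tuned value.

```python
def reciprocal_rank_fusion(rankings: list[list[str]], k: int = 60) -> list[str]:
    """Merge ranked ID lists; each appearance contributes 1 / (k + rank)."""
    scores: dict[str, float] = {}
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    # Highest fused score first
    return sorted(scores, key=scores.get, reverse=True)

vector_hits = ["doc3", "doc1", "doc7"]   # hypothetical dense-retrieval order
bm25_hits = ["doc1", "doc9", "doc3"]     # hypothetical keyword-retrieval order
fused = reciprocal_rank_fusion([vector_hits, bm25_hits])
print(fused)
```

Documents that appear in both lists rise to the top, which is usually the behavior wanted from a hybrid setup. The fused list can then be passed to a reranker (tip 3) if one is available.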

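RAG is currently one of the most practical ways to put LLMs into production, and it is especially well suited to operations knowledge bases, incident handbooks, and operating procedures.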
