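Building a RAG System in Practice: Connecting a Private Knowledge Base to an LLM
An LLM knows nothing about your internal systems. RAG (retrieval-augmented generation) lets it "read" your Runbooks and Wiki so it can answer specific internal questions.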
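How RAG Works
User question
↓
Embed the query → search the vector database → recall relevant document chunks
↓
Insert the chunks into the prompt → LLM generates the answer
Core idea: rather than making the model "memorize" knowledge, retrieve the relevant content at answer time and supply it as context.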
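
To make the flow above concrete, here is a minimal, dependency-free sketch of retrieve-then-prompt. It is not the pipeline built later in this article: the `embed` function is a bag-of-words stand-in for a real embedding model, and `DOCS` is made-up sample content, both assumptions chosen only so the example runs anywhere.

```python
import math
from collections import Counter

# Hypothetical snippets standing in for Runbook/Wiki content.
DOCS = [
    "Restart the nginx service with systemctl restart nginx.",
    "MySQL replication lag is visible in SHOW REPLICA STATUS.",
    "Rotate application logs weekly using logrotate.",
]


def embed(text: str) -> Counter:
    """Stand-in embedding: a bag-of-words count vector (not a real model)."""
    cleaned = text.lower().replace(".", " ").replace("?", " ")
    return Counter(cleaned.split())


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def retrieve(query: str, top_k: int = 1) -> list[str]:
    """Return the top_k documents most similar to the query."""
    q = embed(query)
    return sorted(DOCS, key=lambda d: cosine(q, embed(d)), reverse=True)[:top_k]


def build_prompt(question: str) -> str:
    """Put the retrieved context in front of the question."""
    context = "\n".join(retrieve(question))
    return f"Context:\n{context}\n\nQuestion: {question}"


print(build_prompt("How do I check MySQL replication lag?"))
```

The model never has to have seen the MySQL note during training; it only has to read the context that retrieval placed in the prompt.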
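Choosing the Tech Stack
- Vector database: Qdrant (recommended; good performance and easy to deploy)
- Embedding model: bge-m3 (multilingual, strong on Chinese text)
- LLM: Qwen2.5 / DeepSeek (deployed locally)
- Framework: LangChain, or call the APIs directly
Deploying the Vector Database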
docker run -d \
--name qdrant \
-p 6333:6333 \
-v qdrant_storage:/qdrant/storage \
qdrant/qdrant
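
Before indexing anything, it can help to confirm that the container is actually answering. The sketch below uses only the standard library and calls Qdrant's REST endpoint for listing collections on the port mapped above. The helper name `qdrant_is_up` and the default timeout are illustrative assumptions, not part of any Qdrant tooling.

```python
import json
import urllib.error
import urllib.request


def qdrant_is_up(base_url: str = "http://localhost:6333", timeout: float = 2.0) -> bool:
    """Return True if GET /collections answers with HTTP 200 and a JSON body."""
    try:
        with urllib.request.urlopen(f"{base_url}/collections", timeout=timeout) as resp:
            json.load(resp)  # raises ValueError if the body is not JSON
            return resp.status == 200
    except (urllib.error.URLError, OSError, ValueError):
        # Connection refused, timeout, HTTP error status, or a non-JSON body.
        return False


if __name__ == "__main__":
    print("Qdrant reachable:", qdrant_is_up())
```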
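Document Processing and Vectorization
import hashlib
from pathlib import Path

from qdrant_client import QdrantClient
from qdrant_client.models import Distance, PointStruct, VectorParams
from sentence_transformers import SentenceTransformer

# Initialize the embedding model and the Qdrant client
embedder = SentenceTransformer("BAAI/bge-m3")
client = QdrantClient(host="localhost", port=6333)

# Create the collection only if it is missing, so that re-running the script
# does not wipe existing data (recreate_collection drops and rebuilds it).
# bge-m3 produces 1024-dimensional dense vectors.
if not client.collection_exists("ops-knowledge"):
    client.create_collection(
        collection_name="ops-knowledge",
        vectors_config=VectorParams(size=1024, distance=Distance.COSINE),
    )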
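
def chunk_text(text: str, chunk_size: int = 500, overlap: int = 50) -> list[str]:
    """Split text on the Chinese full stop "。" into chunks of about chunk_size characters."""
    # Skip empty fragments so a trailing "。" does not produce a stray sentence.
    sentences = [s.strip() for s in text.split("。") if s.strip()]
    chunks, current = [], ""
    for sent in sentences:
        if current and len(current) + len(sent) > chunk_size:
            chunks.append(current)
            # Carry the tail of the previous chunk over as overlap.
            # current[-0:] would return the whole string, so handle overlap == 0 explicitly.
            current = current[-overlap:] if overlap > 0 else ""
        current += sent + "。"
    if current:
        chunks.append(current)
    return chunks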
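
def index_document(file_path: str, source: str):
    """Index a single document."""
    text = Path(file_path).read_text(encoding="utf-8")
    chunks = chunk_text(text)
    if not chunks:
        return  # nothing to index; avoid an empty upsert
    # Encode all chunks in one batch rather than one call per chunk.
    vectors = embedder.encode(chunks, normalize_embeddings=True)
    points = []
    for i, (chunk, vector) in enumerate(zip(chunks, vectors)):
        # Derive a stable 63-bit integer ID from the MD5 digest. The built-in hash()
        # is salted per process for strings, so it would give different IDs on every
        # run and upsert would add duplicates instead of overwriting.
        digest = hashlib.md5(f"{source}_{i}".encode()).hexdigest()
        points.append(PointStruct(
            id=int(digest[:16], 16) % (2**63),
            vector=vector.tolist(),
            payload={"text": chunk, "source": source, "chunk_id": i},
        ))
    client.upsert(collection_name="ops-knowledge", points=points)
    print(f"Indexed {len(chunks)} chunks: {source}")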
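
# Index all documents in bulk
docs_dir = Path("/opt/runbooks")
for md_file in docs_dir.rglob("*.md"):
    # Use the path relative to docs_dir as the source; the bare file name
    # collides when two directories contain a file with the same name.
    index_document(str(md_file), str(md_file.relative_to(docs_dir)))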
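
To see what sentence-boundary chunking with overlap produces, the standalone sketch below runs on a tiny input. It is a variant written for illustration, not the article's `chunk_text`: the name `chunk_sentences` and the choice to also split on "!" and "?" are assumptions.

```python
import re


def chunk_sentences(text: str, chunk_size: int = 500, overlap: int = 50) -> list[str]:
    """Group sentences into chunks, repeating the tail of each chunk in the next."""
    # Each match is a run of non-terminator characters plus an optional terminator.
    sentences = re.findall(r"[^。!?]+[。!?]?", text)
    chunks, current = [], ""
    for sent in sentences:
        if current and len(current) + len(sent) > chunk_size:
            chunks.append(current)
            # Start the next chunk with the last `overlap` characters of this one.
            current = current[-overlap:] if overlap > 0 else ""
        current += sent
    if current:
        chunks.append(current)
    return chunks


demo = chunk_sentences("AAAA。BBBB。CCCC。", chunk_size=10, overlap=3)
print(demo)  # ['AAAA。BBBB。', 'BB。CCCC。']
```

The second chunk starts with "BB。", the last three characters of the first, so a sentence cut off at a chunk boundary still appears with some of its preceding context. Because the overlap is added on top of new sentences, a chunk can exceed `chunk_size` by up to `overlap` characters.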
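Retrieval and Question Answering
from openai import OpenAI

# Ollama's OpenAI-compatible endpoint; the client requires an API key, but Ollama ignores it.
llm = OpenAI(base_url="http://localhost:11434/v1", api_key="ollama")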

def search(query: str, top_k: int = 5) -> list[dict]:
    """Semantic search."""
    query_vector = embedder.encode(query, normalize_embeddings=True).tolist()
    # query_points replaces the deprecated client.search in recent qdrant-client releases.
    results = client.query_points(
        collection_name="ops-knowledge",
        query=query_vector,
        limit=top_k,
        score_threshold=0.6,  # drop low-relevance results
    ).points
    return [
        {"text": r.payload["text"], "source": r.payload["source"], "score": r.score}
        for r in results
    ]
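
def ask(question: str) -> str:
    """Answer a question with RAG."""
    docs = search(question)
    if not docs:
        return "No relevant documents found. Check whether the knowledge base has been indexed."
    context = "\n\n---\n\n".join(
        f"Source: {d['source']}\n{d['text']}" for d in docs
    )
    prompt = (
        "Answer the question based on the documents below. "
        "If they do not contain the relevant information, say so explicitly.\n\n"
        f"Documents:\n{context}\n\n"
        f"Question: {question}"
    )
    response = llm.chat.completions.create(
        model="qwen2.5:14b",
        messages=[
            {"role": "system", "content": "You are an operations knowledge assistant. Answer accurately based on the documents provided."},
            {"role": "user", "content": prompt},
        ],
        temperature=0.1,
    )
    return response.choices[0].message.content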

# Usage example
answer = ask("How do I troubleshoot MySQL replication lag?")
print(answer)
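
The retrieval step relies on two settings working together: embeddings are normalized to unit length, and the collection uses cosine distance, so the score is the dot product of two unit vectors and the threshold of 0.6 is a cutoff on that value. The sketch below reproduces that arithmetic in plain Python on toy two-dimensional vectors. The function name `filter_hits` and the sample vectors are illustrative assumptions, not Qdrant internals.

```python
import math


def normalize(v: list[float]) -> list[float]:
    """Scale a vector to unit length (zero vectors are returned unchanged)."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v] if norm else v


def dot(a: list[float], b: list[float]) -> float:
    return sum(x * y for x, y in zip(a, b))


def filter_hits(query_vec, candidates, threshold=0.6, top_k=5):
    """Score candidates by cosine similarity, drop those below threshold, keep top_k."""
    q = normalize(query_vec)
    scored = [(name, dot(q, normalize(vec))) for name, vec in candidates.items()]
    kept = [(name, score) for name, score in scored if score >= threshold]
    return sorted(kept, key=lambda item: item[1], reverse=True)[:top_k]


candidates = {
    "a": [2.0, 0.0],  # same direction as the query: score 1.0
    "b": [1.0, 1.0],  # 45 degrees away: score about 0.707
    "c": [0.0, 3.0],  # orthogonal: score 0.0
    "d": [1.0, 2.0],  # score about 0.447, below the threshold
}
for name, score in filter_hits([1.0, 0.0], candidates):
    print(name, round(score, 3))
```

A suitable threshold depends on the embedding model and the corpus, so 0.6 is best treated as a starting point to tune against real queries.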
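Building a Web API
from fastapi import FastAPI
from pydantic import BaseModel

app = FastAPI()


class Question(BaseModel):
    query: str


# A plain def (not async def) lets FastAPI run the blocking embedding, Qdrant,
# and LLM calls in its thread pool instead of stalling the event loop.
@app.post("/ask")
def ask_endpoint(q: Question):
    # search() runs here for the source list and again inside ask().
    docs = search(q.query)
    answer = ask(q.query)
    return {
        "answer": answer,
        "sources": [{"source": d["source"], "score": d["score"]} for d in docs],
    }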
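
A client for this endpoint can be sketched with the standard library alone. The base URL `http://localhost:8000` is an assumption (it is the default port when the app is served with uvicorn), and the helper names `build_ask_request` and `ask_remote` are hypothetical.

```python
import json
import urllib.request


def build_ask_request(query: str, base_url: str = "http://localhost:8000") -> urllib.request.Request:
    """Build the POST /ask request with a JSON body matching the Question model."""
    body = json.dumps({"query": query}).encode("utf-8")
    return urllib.request.Request(
        f"{base_url}/ask",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def ask_remote(query: str, base_url: str = "http://localhost:8000", timeout: float = 60.0) -> dict:
    """Send the question and return the decoded JSON response."""
    request = build_ask_request(query, base_url)
    with urllib.request.urlopen(request, timeout=timeout) as resp:
        return json.load(resp)
```

With the service running, `ask_remote("How do I troubleshoot MySQL replication lag?")` would return a dict with the `answer` and `sources` keys produced by the endpoint. The generous timeout reflects that a local LLM can take a while to generate.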
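Tips for Improving Results
- Document preprocessing: remove meaningless content and keep structured information
- Hybrid retrieval: combine vector search with keyword search (BM25)
- Reranking: use a reranker model to re-sort the recalled results
- Regular updates: trigger re-indexing automatically when documents change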
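
For the hybrid retrieval tip, the two result lists have to be merged somehow. One common choice, used here as an assumption since the article does not name a fusion method, is reciprocal rank fusion (RRF): each document earns 1 / (k + rank) from every list it appears in, and the sums decide the final order. The document IDs below are made up, and k = 60 is the conventional default.

```python
def reciprocal_rank_fusion(rankings: list[list[str]], k: int = 60) -> list[tuple[str, float]]:
    """Fuse several ranked lists of document IDs into one list sorted by RRF score."""
    scores: dict[str, float] = {}
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


# Hypothetical ranked results from the two retrievers, best first.
vector_hits = ["doc_a", "doc_b", "doc_c"]
bm25_hits = ["doc_b", "doc_d", "doc_a"]

fused = reciprocal_rank_fusion([vector_hits, bm25_hits])
print([doc_id for doc_id, _ in fused])  # ['doc_b', 'doc_a', 'doc_d', 'doc_c']
```

RRF only needs ranks, so it sidesteps the problem that cosine scores and BM25 scores live on different scales. The fused list can then be passed to a reranker if one is in use.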
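RAG is currently one of the most practical ways to put LLMs to work, and it is especially well suited to operations knowledge bases, incident handbooks, and operating procedures.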