Using LangChain with Ollama in Python

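Let's imagine we are studying the classics, such as the Odyssey by Homer. We might have a question about Neleus and his family. If you ask llama2 for that information, you may get something like this:

I apologize, but I'm a large language model, and I cannot provide information on individuals or families that do not exist in reality. Neleus is not a real person or character, and therefore does not have a family or any other personal details. My apologies for any confusion. Is there anything else I can help you with?

This sounds like a typical censored response, but even llama2-uncensored gives a mediocre answer: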

Neleus was a legendary king of Pylos and the father of Nestor, one of the Argonauts. His mother was Clymene, a sea nymph, while his father was Neptune, the god of the sea.

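So let's figure out how we can use LangChain with Ollama to ask our question of the actual document, the Odyssey by Homer, using Python.

Let's start with a simple question that the llama3 model can answer through Ollama. First, we need to install the LangChain package:

pip install langchain_community

Then we can create a model and ask the question: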

python

from langchain_community.llms import Ollama
ollama = Ollama(
    base_url='http://localhost:11434',
    model="llama3"
)
print(ollama.invoke("why is the sky blue"))

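Notice that we are defining both the model and the base URL for Ollama.

Now let's load a document to ask questions against. I'll load the Odyssey by Homer, which you can find at Project Gutenberg. We will need WebBaseLoader, which is part of LangChain and loads text from any web page. On my machine, I also needed to install bs4 to get that to work, so run pip install bs4.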

python

from langchain_community.document_loaders import WebBaseLoader
loader = WebBaseLoader("https://www.gutenberg.org/files/1727/1727-h/1727-h.htm")
data = loader.load()

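This file is pretty big. The preface alone is about 3000 tokens, which means the full document won't fit into the model's context. So we need to split it into smaller pieces.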

python

from langchain.text_splitter import RecursiveCharacterTextSplitter

text_splitter = RecursiveCharacterTextSplitter(chunk_size=500, chunk_overlap=0)
all_splits = text_splitter.split_documents(data)
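
To make the two splitter parameters concrete, here is a minimal pure-Python sketch of fixed-size chunking with overlap. It is only an illustration: `split_with_overlap` is a hypothetical helper, and it cuts on raw character positions, whereas RecursiveCharacterTextSplitter prefers to break on separators such as paragraphs and sentences.

```python
def split_with_overlap(text: str, chunk_size: int, chunk_overlap: int) -> list[str]:
    """Cut text into character windows of chunk_size that share chunk_overlap characters.

    Hypothetical helper for illustration only; not LangChain's algorithm.
    """
    if chunk_overlap >= chunk_size:
        raise ValueError("chunk_overlap must be smaller than chunk_size")
    step = chunk_size - chunk_overlap  # how far each new window advances
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # this window already reached the end of the text
    return chunks


text = "abcdefghijklmnopqrstuvwxyz"
no_overlap = split_with_overlap(text, chunk_size=10, chunk_overlap=0)
with_overlap = split_with_overlap(text, chunk_size=10, chunk_overlap=4)
print(no_overlap)    # ['abcdefghij', 'klmnopqrst', 'uvwxyz']
print(with_overlap)  # ['abcdefghij', 'ghijklmnop', 'mnopqrstuv', 'stuvwxyz']
```

With an overlap of 0, anything that straddles a boundary is cut in two. With a non-zero overlap, the end of each chunk is repeated at the start of the next, so a short phrase near a boundary is more likely to survive whole in at least one chunk.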

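The document is now split up, but we still have to find the relevant splits and submit only those to the model. We can do this by creating embeddings and storing them in a vector database. We can use Ollama directly to instantiate an embedding model. We will use ChromaDB as the vector database in this example, so run pip install chromadb.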

python

from langchain_community.embeddings import OllamaEmbeddings
from langchain_community.vectorstores import Chroma
oembed = OllamaEmbeddings(base_url="http://localhost:11434", model="nomic-embed-text")
vectorstore = Chroma.from_documents(documents=all_splits, embedding=oembed)
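
As a rough picture of what a similarity search over embeddings does, the sketch below ranks a few chunks by cosine similarity to a query vector. Everything here is an assumption for illustration: the three-dimensional vectors and the chunk texts are made up by hand, and `top_k` is a hypothetical helper. A real vector store such as Chroma works with model-generated vectors of much higher dimension, uses an index rather than a full scan, and may use a different distance metric.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


# Toy "index": chunk text -> hand-made embedding (placeholder values).
toy_index = {
    "Neleus married Chloris": [0.9, 0.1, 0.0],
    "Telemachus sails to Pylos": [0.2, 0.8, 0.1],
    "The suitors feast in Ithaca": [0.0, 0.2, 0.9],
}

# Pretend this is the embedding of the question about Neleus.
query_vector = [0.8, 0.3, 0.0]


def top_k(query, index, k=2):
    """Return the k chunk texts whose vectors are most similar to the query."""
    scored = [(cosine_similarity(query, vec), text) for text, vec in index.items()]
    scored.sort(reverse=True)  # highest similarity first
    return [text for _, text in scored[:k]]


print(top_k(query_vector, toy_index))
# ['Neleus married Chloris', 'Telemachus sails to Pylos']
```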

Now let's ask a question from the document. Who was Neleus, and who is in his family? Neleus is a character in the Odyssey, and the answer can be found in our text.

python

question="Who is Neleus and who is in Neleus' family?"
docs = vectorstore.similarity_search(question)
len(docs)

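This will output the number of chunks that matched the search.

The next step is to send the question and the relevant parts of the document to the model to see if we can get a good answer. Because we are stitching two parts of the process together (retrieval and generation), this is called a chain, so we need to define one: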

python

from langchain.chains import RetrievalQA
qachain = RetrievalQA.from_chain_type(ollama, retriever=vectorstore.as_retriever())
qachain.invoke({"query": question})
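
Conceptually, a retrieval chain of this kind fetches the matching chunks, pastes them into a prompt together with the question, and sends that prompt to the model. The sketch below shows the idea with stand-ins. It assumes nothing about LangChain internals: `build_prompt`, `answer`, `fake_retrieve`, and `fake_llm` are hypothetical names, and the prompt wording is invented rather than LangChain's actual template.

```python
def build_prompt(question: str, chunks: list[str]) -> str:
    """Paste the retrieved chunks and the question into a single prompt string."""
    context = "\n\n".join(chunks)
    return (
        "Use the following context to answer the question.\n\n"
        f"Context:\n{context}\n\n"
        f"Question: {question}\nAnswer:"
    )


def answer(question, retrieve, llm):
    """Retrieve chunks for the question, then ask the model with them as context."""
    chunks = retrieve(question)
    return llm(build_prompt(question, chunks))


# Stand-ins so the sketch runs without Ollama or a vector store.
def fake_retrieve(question):
    return ["Chloris was the wife of Neleus.", "Nestor was a son of Neleus."]


def fake_llm(prompt):
    return f"(model reply to a prompt of {len(prompt)} characters)"


prompt = build_prompt("Who is Neleus?", fake_retrieve("Who is Neleus?"))
print(prompt)
print(answer("Who is Neleus?", fake_retrieve, fake_llm))
```

With the objects defined earlier, `retrieve` could plausibly be `lambda q: [d.page_content for d in vectorstore.similarity_search(q)]` and `llm` could be `ollama.invoke`, though RetrievalQA handles prompt formatting and result packaging in its own way.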

The answer received from the RetrievalQA chain was:

Neleus is a character in Homer's "Odyssey" and is mentioned in the context of Penelope's suitors. Neleus is the father of Chloris, who is married to Neleus and bears him several children, including Nestor, Chromius, Periclymenus, and Pero. Amphinomus, the son of Nisus, is also mentioned as a suitor of Penelope and is known for his good natural disposition and agreeable conversation.

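This is not a perfect answer, as it implies that Neleus married his daughter, when in fact Chloris was the youngest daughter of Amphion, son of Iasus and king of Minyan Orchomenus, and was queen in Pylos.

I updated the chunk_overlap for the text splitter to 20, tried again, and got this: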

Neleus is a character in Homer's epic poem "The Odyssey." He is the husband of Chloris, who is the youngest daughter of Amphion son of Iasus and king of Minyan Orchomenus. Neleus has several children with Chloris, including Nestor, Chromius, Periclymenus, and Pero.

That is a much better answer.
