

news 2026/4/24 18:20:19

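In this article, we use LangChain, HuggingFaceEmbeddings, and HuggingFace's Mistral-7B LLM to build a simple Python program that can answer questions about any PDF file.

1. Introduction to LangChain

LangChain is a framework for developing context-aware applications on top of language models. It uses LLMs together with prompts and few-shot examples to provide relevant responses and reasoning. LangChain is well suited to document question answering, chatbots, analysis of structured data, and similar tasks. It provides abstract components (and their implementations) that make LLMs easier to work with, plus Chains, which are components for higher-level tasks.

Install LangChain:

```
pip install langchain
```

The modules in LangChain are Model I/O, Retrieval, Chains, Agents, Memory, and Callbacks.

1.1 Model I/O module

Model I/O is the core element of an application. LangChain lets you use any large language model. This interface needs three components: the language model, the prompt, and the output parser.

LangChain provides many classes and functions for building prompts. It ships ready-made prompt templates for a variety of tasks, and you can also define your own templates.

LangChain works with plain LLMs as well as with chat models, which take a list of chat messages as input and return a chat message. It works with many LLMs, including OpenAI LLMs and open-source LLMs. Output parsers structure the response received from the LLM; PydanticOutputParser is the main type of output parser in LangChain.

1.2 Retrieval module

The retrieval module implements retrieval-augmented generation (RAG), which gives the model access to private user data that was not part of its training data. Retrieval consists of the following steps: load the data, transform the data, create or fetch embeddings, store the embeddings, and retrieve the embeddings. LangChain has roughly 100 document loaders that read the main document formats, such as CSV, HTML, PDF, and source code. It can transform data using a range of algorithms. LangChain integrates with more than 25 embedding models and more than 50 vector databases.

1.3 Chains module

Complex applications often need to combine several LLMs to get the job done. LangChain provides Chains for this: a Chain can combine multiple LLMs, and a Chain can also call other Chains.

1.4 Agents module

An agent is also a kind of Chain; it is responsible for deciding the next action. An agent consists of a language model and a prompt, and it needs the following inputs: the list of available tools, the user input, and the history of previously executed steps (if any). The functions an agent calls are known as "tools". The agent uses the LLM to decide which actions to take and in what order. The possible actions are: use a tool, observe the tool's output, and return a response to the user.

1.5 Memory module

The memory module lets the system remember past information, which is very important for conversational bots.

1.6 Callbacks module

The callback mechanism lets you use the "callbacks" argument of the API to receive information from the different stages of an LLM application, for example for logging, monitoring, and streaming.

2. Mistral-7B

Mistral-7B is a powerful, currently open-source language model with 7.3 billion parameters that outperforms many models with far more parameters. It can be downloaded for offline use, used in the cloud, or downloaded from HuggingFace. With HuggingFaceHub from LangChain, you can load and use Mistral-7B with the following code:

```python
repo_id = "mistralai/Mistral-7B-v0.1"
llm = HuggingFaceHub(
    huggingfacehub_api_token="your huggingface access token here",
    repo_id=repo_id,
    model_kwargs={"temperature": 0.2, "max_new_tokens": 50},
)
```

3. HuggingFace Embedding

When working with text, images, audio, video, documents, and other data, the first step is usually to embed the data, that is, to represent it numerically so that a neural network can process it. An embedding is more than a numeric encoding: it also captures the contextual and semantic information in the data.

HuggingFace provides the Sentence Transformers models for computing embeddings. Install the library as follows:

```
pip install -U sentence-transformers
```

You can then use it to load a pretrained model and encode text sentences.

4. Chroma vector store

Chroma is an open-source embedding database (vector store) for creating, storing, and retrieving embeddings and for running semantic search over them. Install it as follows:

```
pip install chromadb
```

It lets you connect to a Chroma client, create a collection, add documents with metadata and ids to the collection (this step creates the embeddings), and then query the collection (semantic retrieval).

5. pypdf library

The pypdf library can read, split, merge, crop, and transform the pages of PDF files, add custom data, change viewing options, add passwords to PDF files, and retrieve text and metadata from PDF files. Install it as follows:

```
pip install pypdf
```

To use pypdf with AES encryption or decryption, install the extra dependencies:

```
pip install pypdf[crypto]
```

6. Implementation

```python
# Install dependencies (notebook syntax)
!pip install huggingface_hub
!pip install chromadb
!pip install langchain
!pip install pypdf
!pip install sentence-transformers

# Import required libraries
import sys

from langchain.document_loaders import PyPDFLoader
from langchain.text_splitter import CharacterTextSplitter
from langchain.embeddings import HuggingFaceEmbeddings
from langchain.llms import HuggingFaceHub
from langchain.vectorstores import Chroma
from langchain.chains import ConversationalRetrievalChain

# Load the PDF file
loader = PyPDFLoader("report.pdf")
documents = loader.load()

# Split the documents into smaller chunks
text_splitter = CharacterTextSplitter(chunk_size=1000, chunk_overlap=0)
texts = text_splitter.split_documents(documents)

# We will use HuggingFace embeddings
embeddings = HuggingFaceEmbeddings()

# Use the Chroma vector database to store and retrieve embeddings of our text
db = Chroma.from_documents(texts, embeddings)
retriever = db.as_retriever(search_kwargs={"k": 2})

# We are using Mistral-7B for question answering
repo_id = "mistralai/Mistral-7B-v0.1"
llm = HuggingFaceHub(
    huggingfacehub_api_token="your huggingface access token here",
    repo_id=repo_id,
    model_kwargs={"temperature": 0.2, "max_new_tokens": 50},
)

# Create the conversational retrieval chain
qa_chain = ConversationalRetrievalChain.from_llm(
    llm, retriever, return_source_documents=True
)

# Run an infinite loop that sends questions to the LLM and prints the
# answers until the user wants to quit
chat_history = []
while True:
    query = input("Prompt: ")
    # To exit: use 'exit', 'quit', 'q', or Ctrl-D.
    if query.lower() in ["exit", "quit", "q"]:
        print("Exiting")
        sys.exit()
    result = qa_chain({"question": query, "chat_history": chat_history})
    print("Answer: " + result["answer"] + "\n")
    chat_history.append((query, result["answer"]))
```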
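To make the retrieval step more concrete, here is a minimal sketch, using only the Python standard library, of what conceptually happens between splitting the text and handing the top `k` chunks to the LLM. It is not how LangChain, Chroma, or HuggingFaceEmbeddings are implemented: the word-count `embed` function is a toy stand-in for a sentence-embedding model, and the names `split_text`, `embed`, `cosine`, and `retrieve`, along with the sample chunks, are hypothetical and exist only for this illustration.

```python
import math
import re
from collections import Counter


def split_text(text, chunk_size=1000, chunk_overlap=0):
    """Cut text into fixed-size character windows (a simplified text splitter)."""
    if chunk_overlap >= chunk_size:
        raise ValueError("chunk_overlap must be smaller than chunk_size")
    step = chunk_size - chunk_overlap
    return [text[i:i + chunk_size] for i in range(0, len(text), step)]


def embed(text):
    """Toy embedding (assumption): lowercase word counts instead of a neural model."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a, b):
    """Cosine similarity between two sparse word-count vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def retrieve(query, chunks, k=2):
    """Return the k chunks most similar to the query."""
    q = embed(query)
    ranked = sorted(chunks, key=lambda c: cosine(q, embed(c)), reverse=True)
    return ranked[:k]


# Made-up sample chunks standing in for the split pages of a PDF.
chunks = [
    "Revenue grew 12 percent in 2023, driven by cloud services.",
    "The company opened two new offices in Europe.",
    "Cloud services revenue is expected to keep growing next year.",
    "Employee headcount stayed flat.",
]

top = retrieve("How did cloud revenue change?", chunks, k=2)
for chunk in top:
    print(chunk)
```

The `k=2` here mirrors `search_kwargs={"k": 2}` in the implementation: only the two best-matching chunks are passed to the LLM as context, which keeps the prompt short at the cost of possibly missing relevant text elsewhere in the document.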
With that, the PDF-based chatbot is complete, and you can use it to get answers to your questions from a long, difficult PDF. Just do it!

References

[1] https://medium.com/nimritakoul01/chat-with-your-pdf-files-using-mistral-7b-and-langchain-f3be9363301c
[2] https://colab.research.google.com/corgiredirector?sitehttps%3A%2F%2Fmedium.com%2F%40woyera%2Fhow-to-chat-with-your-pdf-using-python-llama-2-41df80c4e674
[3] https://www.shakudo.io/blog/build-pdf-bot-open-source-llms

