Langchain

Langchain Overview

If you are not yet familiar with LangChain, set this section aside and read the following chapters first. Come back here after you have learned the important concepts of LangChain.

Runnable interface

In LangChain, a Chain is built by connecting a series of Runnable implementations in sequence. So let's first look at the basic building block of a Chain: the Runnable interface.

Methods of Runnable

The core method of the Runnable interface is invoke:

  • invoke/ainvoke: Transforms a single input into an output.

The other key methods are:

  • batch/abatch: Efficiently transforms multiple inputs into outputs.
  • stream/astream: Streams output from a single input as it's produced.
  • astream_log: Streams output and selected intermediate results from an input.

These other methods have default implementations built on top of invoke/ainvoke.
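
A minimal sketch of these methods on a toy Runnable (RunnableLambda wraps a plain Python function; the doubling function here is just for illustration):

from langchain_core.runnables import RunnableLambda

# A toy Runnable built from a plain Python function
double = RunnableLambda(lambda x: x * 2)

print(double.invoke(3))         # 6 -- one input, one output
print(double.batch([1, 2, 3]))  # [2, 4, 6] -- many inputs at once
for chunk in double.stream(3):  # yields output chunks as they are produced
    print(chunk)                # 6 (a single chunk for this trivial runnable)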

Runnables schematic information

Runnables expose schema information about their input, output and config via the input_schema property, the output_schema property and the config_schema method.

Every Runnable implementation has a specific input schema and output schema, which can be obtained through its input_schema and output_schema properties.

Note that different Runnable implementations have different input/output schemas; the output schema of the preceding Runnable must match the input schema of the following Runnable for the two to be connected.
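
For instance, the schemas can be inspected directly (a small sketch; on newer pydantic-v2-based versions of langchain_core, .schema() may need to be .model_json_schema() instead):

from langchain_core.prompts import ChatPromptTemplate

prompt = ChatPromptTemplate.from_template("tell me a joke about {topic}")

# input_schema / output_schema are Pydantic model classes describing the I/O
print(prompt.input_schema.schema())   # expects a "topic" field
print(prompt.output_schema.schema())  # describes the produced prompt value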

Runnable sequences

One key advantage of the Runnable interface is that any two runnables can be “chained” together into sequences. The output of the previous runnable’s .invoke() call is passed as input to the next runnable. This can be done using the pipe operator (|), or the more explicit .pipe() method

For example, when building a chain with chain = prompt | llm | output_parser, each of prompt, llm and output_parser is a Runnable implementation.

If you define a Runnable whose output data structure differs from what prompt produces, that Runnable cannot be connected directly to llm. To connect them, you need to make sure the data exchanged between your new Runnable and llm is compatible. This may mean redefining your Runnable, or inserting an adapter between the two that converts the input/output data structures so the parts can work together.
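
A small sketch of such an adapter, using RunnableLambda to reshape the upstream data (the key names here are made up for illustration):

from langchain_core.prompts import ChatPromptTemplate
from langchain_core.runnables import RunnableLambda

prompt = ChatPromptTemplate.from_messages([
    ("system", "You are a world class technical documentation writer."),
    ("user", "{input}"),
])

# The upstream step emits {"question": ...} but the prompt expects {"input": ...};
# a RunnableLambda in between adapts the data shape so the two can be piped.
adapter = RunnableLambda(lambda d: {"input": d["question"]})

chain = adapter | prompt  # the prompt's input schema is now satisfied
print(chain.invoke({"question": "how can langsmith help with testing?"}))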

Core Implementation of Runnable: LLMChain

LLMChain is the most central and most widely used Runnable implementation in LangChain. As mentioned above, an LLMChain has three important components:

  • prompt: BasePromptTemplate. The instructions given to the language model; it shapes the LLM's raw output.

    Most LLM applications do not pass user input to the LLM directly. Instead, they insert it into a larger piece of text called a prompt template, which provides additional context describing the specific task.

    The main related component in LangChain is PromptTemplate.

  • llm: Union[Runnable[LanguageModelInput, str], Runnable[LanguageModelInput, BaseMessage]]. The language model is the core reasoning engine of the chain, its brain.

    LangChain distinguishes two kinds of language models, which differ in their inputs and outputs:

    • LLM: takes a string in, returns a string out
    • ChatModel: takes a sequence of ChatMessages in, returns a single ChatMessage out

    ChatModel abstracts how an LLM is used in a chat setting; it is a further wrapper around an LLM.

    LangChain provides a standard interface for both, with two methods:

    • predict: takes a string and returns a string
    • predict_messages: takes a list of ChatMessages and returns a ChatMessage
    from langchain.llms import OpenAI
    from langchain.chat_models import ChatOpenAI
    
    llm = OpenAI()
    chat_model = ChatOpenAI()
    
    llm.predict("hi!")
    >>> "Hi"
    
    chat_model.predict("hi!")
    >>> "Hi"
    
  • output_parser: BaseLLMOutputParser. Converts the LLM's raw output into the target format.

    OutputParsers convert the raw output of an LLM into a format that can be used downstream. There are several main types of output parsers, including:

    • Convert LLM text into structured information (e.g. JSON)
    • Convert a ChatMessage into a string
    • Convert information returned alongside the message (such as OpenAI function calls) into a string

    You can also write a custom OutputParser to pair with your prompt or to meet other needs, as shown in the sketch below.
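
    A minimal custom parser sketch (this comma-separated example mirrors the one in the LangChain docs and is not tied to any particular prompt):

    from typing import List
    from langchain_core.output_parsers import BaseOutputParser

    class CommaSeparatedListOutputParser(BaseOutputParser[List[str]]):
        """Parse the raw LLM text into a list of strings."""

        def parse(self, text: str) -> List[str]:
            return [item.strip() for item in text.split(",")]

    parser = CommaSeparatedListOutputParser()
    print(parser.parse("red, green, blue"))  # ['red', 'green', 'blue']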

Below is a complete example that combines a prompt template, an LLM and an output parser into an LLMChain:

from langchain_cohere import ChatCohere
from langchain_core.prompts import ChatPromptTemplate
from langchain_core.output_parsers import StrOutputParser
from langchain.chains.llm import LLMChain
def basic_chain_example():
    """
    basic_chain_example
    """

    print("-"*20 + "basic_chain_example start" + "-"*20)
    
    # generate a prompt
    prompt = ChatPromptTemplate.from_messages([
        ("system", "You are world class technical documentation writer."),
        ("user", "{input}")
    ])

    # get an llm (load_config() is a user-defined helper that returns the API keys)
    llm = ChatCohere(cohere_api_key=load_config().cohere_api_key)

    # get an output_parser
    output_parser = StrOutputParser()

    # make a chain
    # chain = prompt | llm | output_parser
    chain = LLMChain(prompt=prompt, llm=llm, output_parser=output_parser)

    # do a query
    resp = chain.invoke({"input": "how can langsmith help with testing?"})
    print(f"basic_chain_example resp is {resp}")
    print("-"*20 + "basic_chain_example end" + "-"*20)

The two forms chain = prompt | llm | output_parser and chain = LLMChain(prompt=prompt, llm=llm, output_parser=output_parser) are functionally equivalent, but they differ in syntax.

chain = prompt | llm | output_parser uses the pipe (|) operator. This style chains several operations together so that the output of one becomes the input of the next; here prompt, llm and output_parser are connected into a single chain object. It is concise and easy to read, but it relies on special methods implemented behind the scenes (such as overriding __or__) to support the pipe operator. In this case __or__ is overridden in RunnableSerializable, which ChatPromptTemplate inherits via BasePromptTemplate; a quick check of this is sketched after this comparison.

chain = LLMChain(prompt=prompt, llm=llm, output_parser=output_parser) states explicitly that we are creating an instance of the LLMChain class, and it makes it easier to pass additional parameters when constructing the chain.

Calling the class constructor explicitly is the more common style and is the one recommended here.
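
A quick way to see what the pipe operator builds (a sketch; only the type check matters here):

from langchain_core.output_parsers import StrOutputParser
from langchain_core.prompts import ChatPromptTemplate
from langchain_core.runnables import RunnableSequence

prompt = ChatPromptTemplate.from_template("{input}")

# The overridden __or__ turns the piped components into a RunnableSequence
piped = prompt | StrOutputParser()
print(isinstance(piped, RunnableSequence))  # True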

Runnable framework overview

            Runnable interface
              /     |     \     \
            impl   impl   impl     Chain interface
            /       |       \       / (LLMChain as an impl)
      ------------------------------------
      |  prompt --> llm --> output_parser |      
      ------------------------------------

Agent

What is an Agent?

We often talk about "AI Agents". So what exactly is an Agent?

Agent is a concrete class whose instances have the required attributes llm_chain: LLMChain, allowed_tools and output_parser, as its from_llm_and_tools method makes clear. In short, an Agent is a further wrapper around an LLMChain that also incorporates Tools (Tool is introduced below).

What does an Agent do?

  • Calls the language model
  • Parses the language model's output to decide the next action

The core goal of an Agent is to use the LLM to reason about which action to take next.

The key logic is:

  • Render a concrete prompt based on self.allowed_tools and, if any, the intermediate results previously produced by self.llm_chain
  • Call predict on llm_chain with the concrete prompt generated above
  • Use output_parser to parse the llm_chain's output to get the next action

This logic is encapsulated in the Agent's core method, Agent.plan:

class Agent(BaseSingleActionAgent):
    llm_chain: LLMChain
    output_parser: AgentOutputParser
    allowed_tools: Optional[List[str]] = None

    ...

    def plan(
        self,
        intermediate_steps: List[Tuple[AgentAction, str]],
        callbacks: Callbacks = None,
        **kwargs: Any,
    ) -> Union[AgentAction, AgentFinish]:
        """Given input, decided what to do.

        Args:
            intermediate_steps: Steps the LLM has taken to date,
                along with observations
            callbacks: Callbacks to run.
            **kwargs: User inputs.

        Returns:
            Action specifying what tool to use.
        """
        # make full_inputs for llm_chain
        full_inputs = self.get_full_inputs(intermediate_steps, **kwargs)
        # call `predict` on `llm_chain` with the concrete prompt
        full_output = self.llm_chain.predict(callbacks=callbacks, **full_inputs)
        # use `output_parser` to parse the `llm_chain`'s output, get -> (AgentAction | AgentFinish)
        return self.output_parser.parse(full_output)

As we can see, an Agent's call to the LLM is essentially no different from an ordinary LLMChain call: generate a concrete prompt -> call the LLMChain -> parse the LLM's output.

So how is an Agent able to decide the next action based on the output of the LLMChain? It is simple.

From the developer's point of view, the LLM itself cannot be modified; it is simply called. All the developer controls is the prompt and the output_parser. The key design of an Agent is therefore twofold: it supplies a specific prompt that blends in the tool descriptions and the required answer format, guiding the LLM to reason about its decision and to produce formatted output; and it configures a specific output_parser that parses this formatted output into a Python runtime AgentAction or AgentFinish object.

For example, ZeroShotAgent's default PromptTemplate is generated as follows:

from langchain_google_community import GoogleSearchAPIWrapper
from langchain_core.tools import Tool
from langchain.agents import ZeroShotAgent

search = GoogleSearchAPIWrapper(
        google_api_key="xxx", google_cse_id="yyy")
google_tool = Tool(name="google search",
                    description="For any questions, you must use this tool to search Google for helpful results", func=search.run)

prompt = ZeroShotAgent.create_prompt(
    tools=[google_tool],
    input_variables=["input", "agent_scratchpad"]
)
print(f"prompt.template is:\n\n{prompt.template}")

output:

prompt.template is:

Answer the following questions as best you can, but speaking as a pirate might speak. You have access to the following tools:

google search: For any questions, you must use this tool to search Google for helpful results

Use the following format:

Question: the input question you must answer
Thought: you should always think about what to do
Action: the action to take, should be one of [google search]
Action Input: the input to the action
Observation: the result of the action
... (this Thought/Action/Action Input/Observation can repeat N times)
Thought: I now know the final answer
Final Answer: the final answer to the original input question



Begin! 

Question: {input}
Thought:{agent_scratchpad}

As you can see, prompt.template includes the descriptions of the tools and asks the LLM to produce output in the Thought/Action/Action Input/Observation format.
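
The output_parser handles the other half of the loop, turning that formatted text back into Python objects. A small sketch using MRKLOutputParser (the parser ZeroShotAgent uses by default); the sample strings are made up:

from langchain.agents.mrkl.output_parser import MRKLOutputParser

parser = MRKLOutputParser()

# Formatted LLM text containing an Action -> an AgentAction object
action = parser.parse(
    "Thought: I need to look this up\n"
    "Action: google search\n"
    "Action Input: canada population 2023"
)
print(action.tool, action.tool_input)  # google search canada population 2023

# Text containing a Final Answer -> an AgentFinish object
finish = parser.parse("Final Answer: about 40.7 million people")
print(finish.return_values)            # {'output': 'about 40.7 million people'}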

Tool

In LangChain, a Tool defines a single tool. Depending on the developer's needs, a tool can do almost anything: a database query tool, a search engine tool, a tokenizer, a sentiment analysis tool, and so on.

How to define a Tool

  • It takes a name parameter, which serves as the Tool's ID
  • It takes a description parameter, which the LLM uses to decide whether to pick this Tool for a given task
  • It takes a func: Callable[..., str] parameter, which implements what the Tool actually does
# google_tool is a Google search tool
search = GoogleSearchAPIWrapper(
        google_api_key="xxx", google_cse_id="yyy")
google_tool = Tool(name="google search",
                   description="For any questions, you must use this tool to search Google for helpful results",
                   func=search.run)  # func: Callable[..., str]

How is a Tool chosen by the LLM? In fact, an Agent's Tools are rendered into plain text and filled into the prompt given to the Agent's LLM, as in the ZeroShotAgent.create_prompt method mentioned above:

class ZeroShotAgent(Agent):
    ...

    @classmethod
    def create_prompt(
        cls,
        tools: Sequence[BaseTool],
        prefix: str = PREFIX,
        suffix: str = SUFFIX,
        format_instructions: str = FORMAT_INSTRUCTIONS,
        input_variables: Optional[List[str]] = None,
    ) -> PromptTemplate:
        """Create prompt in the style of the zero shot agent.

        Args:
            tools: List of tools the agent will have access to, used to format the
                prompt.
            prefix: String to put before the list of tools.
            suffix: String to put after the list of tools.
            input_variables: List of input variables the final prompt will expect.

        Returns:
            A PromptTemplate with the template assembled from the pieces here.
        """
        tool_strings = render_text_description(list(tools))  # render tools to text
        tool_names = ", ".join([tool.name for tool in tools])
        format_instructions = format_instructions.format(tool_names=tool_names)
        template = "\n\n".join([prefix, tool_strings, format_instructions, suffix])  # generate PromptTemplate
        if input_variables:
            return PromptTemplate(template=template, input_variables=input_variables)
        return PromptTemplate.from_template(template)

Calling create_prompt gives us a prompt that incorporates the tool descriptions:

Answer the following questions as best you can, but speaking as a pirate might speak. You have access to the following tools:

# Note: this line is not part of the prompt. Below is the rendered plain text of the tools, which the LLM reads in order to choose one.
google search: For any questions, you must use this tool to search Google for helpful results
langsmith_search: Search for information about LangSmith. For any questions about LangSmith, you must use this tool!

Use the following format:

Question: the input question you must answer
Thought: you should always think about what to do
Action: the action to take, should be one of [google search, langsmith_search]
Action Input: the input to the action
Observation: the result of the action
... (this Thought/Action/Action Input/Observation can repeat N times)
Thought: I now know the final answer
Final Answer: the final answer to the original input question



Begin! 

Question: {input}
Thought:{agent_scratchpad}

AgentExecutor

AgentExecutor is a concrete implementation of the Chain interface. Its core data structure is as follows:

class AgentExecutor(Chain):
    """Agent that is using tools."""

    agent: Union[BaseSingleActionAgent, BaseMultiActionAgent]
    """The agent to run for creating a plan and determining actions
    to take at each step of the execution loop."""
    tools: Sequence[BaseTool]
    """The valid tools the agent can call."""

With an Agent in hand, why do we still need an AgentExecutor? Because the Agent is only responsible for deciding (using the LLM to choose the next action), not for actually executing anything; the AgentExecutor follows the Agent's decisions and runs the chosen tool.

That is why AgentExecutor holds references to agent: Union[BaseSingleActionAgent, BaseMultiActionAgent] and tools: Sequence[BaseTool]: the AgentExecutor takes its "instructions" from the agent and executes the target tool.

Concretely, the agent executor is the runtime for an agent. It is what actually calls the agent, executes the actions the agent chooses, passes the tool outputs back to the agent, and repeats. In pseudocode, this looks roughly like:

# logic of agent executor
next_action = agent.get_action(...)  # the agent only holds the rendered tool text, merged into the prompt so the LLM can pick next_action
while next_action != AgentFinish:
    observation = run(next_action)  # the agent executor holds the actual tools and can run the chosen one
    next_action = agent.get_action(..., next_action, observation)
return next_action

While this may seem simple, there are several complexities this runtime handles for you, including:

  • Handling cases where the agent selects a non-existent tool
  • Handling cases where the tool errors
  • Handling cases where the agent produces output that cannot be parsed into a tool invocation
  • Logging and observability at all levels (agent decisions, tool calls) to stdout and/or to LangSmith
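
Several of these safeguards are exposed as plain AgentExecutor fields. A configuration sketch, assuming the agent and google_tool built in the sections above (the specific values are arbitrary):

from langchain.agents import AgentExecutor

agent_executor = AgentExecutor(
    agent=agent,                 # the decision maker built above (e.g. a ZeroShotAgent)
    tools=[google_tool],         # the tools the executor is allowed to run
    max_iterations=5,            # stop the loop after 5 think/act rounds
    max_execution_time=30.0,     # ... or after 30 seconds of wall-clock time
    handle_parsing_errors=True,  # feed unparsable LLM output back as an observation
    verbose=True,                # log agent decisions and tool calls to stdout
)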

The core method in which AgentExecutor implements the agent runtime:

class AgentExecutor(Chain):
    ...

    def _call(
        self,
        inputs: Dict[str, str],
        run_manager: Optional[CallbackManagerForChainRun] = None,
    ) -> Dict[str, Any]:
        """Run text through and get agent response."""
        # Construct a mapping of tool name to tool for easy lookup
        name_to_tool_map = {tool.name: tool for tool in self.tools}
        # We construct a mapping from each tool to a color, used for logging.
        color_mapping = get_color_mapping(
            [tool.name for tool in self.tools], excluded_colors=["green", "red"]
        )
        intermediate_steps: List[Tuple[AgentAction, str]] = []
        # Let's start tracking the number of iterations and time elapsed
        iterations = 0
        time_elapsed = 0.0
        start_time = time.time()
        # We now enter the agent loop (until it returns something).
        while self._should_continue(iterations, time_elapsed):
            next_step_output = self._take_next_step(
                name_to_tool_map,
                color_mapping,
                inputs,
                intermediate_steps,
                run_manager=run_manager,
            )
            if isinstance(next_step_output, AgentFinish):
                return self._return(
                    next_step_output, intermediate_steps, run_manager=run_manager
                )

            intermediate_steps.extend(next_step_output)
            if len(next_step_output) == 1:
                next_step_action = next_step_output[0]
                # See if tool should return directly
                tool_return = self._get_tool_return(next_step_action)
                if tool_return is not None:
                    return self._return(
                        tool_return, intermediate_steps, run_manager=run_manager
                    )
            iterations += 1
            time_elapsed = time.time() - start_time
        output = self.agent.return_stopped_response(
            self.early_stopping_method, intermediate_steps, **inputs
        )
        return self._return(output, intermediate_steps, run_manager=run_manager)

How AgentExecutor calls a Tool in LangChain

In LangChain, the Agent chooses the next Tool and the AgentExecutor actually calls it.

The Agent is responsible for producing an output that contains an action and the action's input; concretely, Agent.plan() returns an instruction of type Union[AgentAction, AgentFinish], and the AgentExecutor then calls the corresponding Tool based on that instruction:

  • First, the AgentExecutor passes the user's input to the Agent. This is triggered by calling agent_executor.run("your question?").

  • The Agent's plan() method uses its bound LLMChain to produce an output of type Union[AgentAction, AgentFinish] containing an action and its input. Specifically, the LLMChain passes the user's input (input) and the Agent's internal state (agent_scratchpad, which stores the history produced so far by the Agent, i.e. by the LLMChain plus the Tools) to the LLM, for example:

    {
        'input': 'How many people live in Canada as of 2023?',
        'agent_scratchpad': 'Thought: Arrrr, time t\' find out how many landlubbers be livin\' up in Canada, arrr!\nAction: google search\nAction Input: canada population 2023\nObservation:\nObservation: Canada ranks 37th by population among countries of the world, comprising about 0.5% of the world\'s total, with more than 40.7 million Canadians. London is a city in southwestern Ontario, Canada, along the Quebec City–Windsor Corridor. The city had a population of 422,324 according to the 2021\xa0... Quebec City officially Québec is the capital city of the Canadian province of Quebec. As of July 2021, the city had a population of 549,459,\xa0... This is a list of countries and dependencies by population. It includes sovereign states, inhabited dependent territories and, in some cases,\xa0... British Columbia is a Canadian province with a population of about 5.6 million people. The province represents about 13.2% of the population of the Canadian\xa0... ... population is shrinking (US Census Bureau, 2018). This trend has been observed in other White-majority countries including Canada (Statistics Canada, 2017)\xa0... In the 2021 Canadian census conducted by Statistics Canada, Vancouver had a population ... Observer, and ... ^ "Top Public Universities in Canada 2023 [uniRank]". Victoria Day is a federal Canadian public holiday observed on the last Monday preceding May 25 to honour Queen Victoria, who is known as the "Mother of\xa0... DST is observed in parts of this time zone. In Canada, the provinces of New Brunswick, Nova Scotia, and Prince Edward Island are in this zone\xa0... As of 2010, the Association of Southeast Asian Nations (ASEAN) has 10 member states, one candidate member state, and one observer state.\nThought:',
        'stop': ['\nObservation:', '\n\tObservation:']
    }
    

    The model then generates an output based on this information, for example:

    Thought: Arrrr, time t' find out how many landlubbers be livin' up in Canada, arrr!
    Action: google search
    Action Input: canada population 2023
    Observation:
    

    If the model decides it can give a final answer, the output might instead look like:

    Thought: Arrrr, me hearties! I be searchin' fer the number o' scurvy dogs livin' in them Canadian lands, and I be findin' out that thar be more than 40.7 million people walkin' about on them cold, northern soils! A fine number o' potential crew members, arrr!
    Final Answer: Thar be more than 40.7 million mateys livin' in Canada, as o' 2023, ye scurvy dog!
    
  • The Agent parses this output into an AgentAction or an AgentFinish and passes it to the AgentExecutor.

  • If it is an AgentAction, the AgentExecutor calls the corresponding Tool with the parsed action and action input. In the example above, the AgentExecutor calls the google search tool and passes the action input to it as an argument.

  • The Tool performs its operation and returns the result. In this example, the google search tool runs the search and returns results containing information about Canada's population in 2023.

  • The AgentExecutor passes the Tool's result back to the Agent. The Agent updates its internal state (agent_scratchpad) and then generates the next action and action input. This loop repeats until the Agent produces a final answer.

Finally, the AgentExecutor returns the final answer produced by the Agent.
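
Putting the pieces together, here is a sketch of the classic wiring that matches the flow above (reusing google_tool from the Tool section; the Cohere model and API key are placeholders):

from langchain.agents import AgentExecutor, ZeroShotAgent
from langchain.chains.llm import LLMChain
from langchain_cohere import ChatCohere

llm = ChatCohere(cohere_api_key="xxx")
tools = [google_tool]

# 1. Prompt that embeds the tool descriptions and the Thought/Action format
prompt = ZeroShotAgent.create_prompt(
    tools=tools, input_variables=["input", "agent_scratchpad"]
)

# 2. Agent = LLMChain (decision maker) + allowed tool names + default output parser
llm_chain = LLMChain(llm=llm, prompt=prompt)
agent = ZeroShotAgent(llm_chain=llm_chain, allowed_tools=[t.name for t in tools])

# 3. AgentExecutor = the runtime that runs the plan/act loop described above
agent_executor = AgentExecutor.from_agent_and_tools(agent=agent, tools=tools, verbose=True)
result = agent_executor.invoke({"input": "How many people live in Canada as of 2023?"})
print(result["output"])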
