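Preface
I have done a fair amount of work on AI applications recently. I had some insights, but they stayed vague until I read Anthropic's article "Building Effective Agents". The original is very precise, so most of what follows is quoted directly from it.
1. Reflections and summary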
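The article was published in December 2024. Had I read it then, I would have found it too theoretical to really sink in. Reading it now is a bit late, but with hands-on practice behind me I can actually follow it and get something out of it.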
The single most important point is the distinction between agents and workflows, a classification that gets to the heart of AI applications:
- Workflows are systems where LLMs and tools are orchestrated through predefined code paths.
- Agents, on the other hand, are systems where LLMs dynamically direct their own processes and tool usage, maintaining control over how they accomplish tasks.
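Here is how that maps onto what I have seen and built recently:
- Almost every application I have built is a workflow. Code generation plus review is the evaluator-optimizer workflow; code reverse engineering uses prompt chaining; the GraphRAG source code I studied earlier uses parallelization; and even the current deepsearch is, in practice, a workflow.
- True agents are rarer. The only one I have experienced first-hand is Cursor: it understands what the human wants, makes a plan, and calls its toolset to write the code, test it, and review it.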
2. Workflows
The original article describes these workflows clearly and vividly, including when to use each:
2.1 Workflow: Prompt chaining
Prompt chaining decomposes a task into a sequence of steps, where each LLM call processes the output of the previous one. You can add programmatic checks (see "gate" in the diagram below) on any intermediate steps to ensure that the process is still on track.
When to use this workflow: This workflow is ideal for situations where the task can be easily and cleanly decomposed into fixed subtasks. The main goal is to trade off latency for higher accuracy, by making each LLM call an easier task.
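To make the chain and its gate concrete for myself, here is a minimal sketch. It is not from the original article: `fake_llm`, `outline_gate`, and `write_document` are names I made up, and `fake_llm` is a canned stand-in for a real model call so the example runs offline.

```python
from typing import Callable

LLM = Callable[[str], str]  # hypothetical signature: prompt in, completion out


def fake_llm(prompt: str) -> str:
    """Stand-in for a real model call (assumption: canned answers only)."""
    if prompt.startswith("Outline:"):
        return "1. Intro\n2. Body\n3. Conclusion"
    if prompt.startswith("Draft:"):
        return "Draft based on -> " + prompt[len("Draft:"):].strip()
    return ""


def outline_gate(outline: str) -> bool:
    """Programmatic check between steps: require at least three numbered items."""
    numbered = [line for line in outline.splitlines() if line[:1].isdigit()]
    return len(numbered) >= 3


def write_document(topic: str, llm: LLM) -> str:
    """Two chained calls: the second consumes the output of the first."""
    outline = llm(f"Outline: {topic}")
    if not outline_gate(outline):
        # The gate stops the chain before a bad intermediate result propagates.
        raise ValueError("outline failed the gate; stopping the chain early")
    return llm(f"Draft: {outline}")


if __name__ == "__main__":
    print(write_document("agents", fake_llm))
```

The design point is that each step is an easier task than the whole, and the gate is ordinary code rather than another model call.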
2.2 Workflow: Routing
Routing classifies an input and directs it to a specialized followup task. This workflow allows for separation of concerns, and building more specialized prompts. Without this workflow, optimizing for one kind of input can hurt performance on other inputs.
When to use this workflow: Routing works well for complex tasks where there are distinct categories that are better handled separately, and where classification can be handled accurately, either by an LLM or a more traditional classification model/algorithm.
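A rough sketch of routing, under my own assumptions: `classify` here is a keyword rule standing in for the LLM or traditional classifier the article mentions, and the handlers in `HANDLERS` are placeholders for specialized prompts. None of these names come from the original.

```python
from typing import Callable, Dict


def classify(query: str) -> str:
    """Stand-in classifier (keyword rules); a real one could be an LLM or a trained model."""
    text = query.lower()
    if "refund" in text:
        return "refund"
    if "error" in text or "crash" in text:
        return "technical"
    return "general"


# Each category gets its own specialized follow-up; here just a tagged string.
HANDLERS: Dict[str, Callable[[str], str]] = {
    "refund": lambda q: f"[refund prompt] {q}",
    "technical": lambda q: f"[technical prompt] {q}",
    "general": lambda q: f"[general prompt] {q}",
}


def route(query: str) -> str:
    """Classify first, then dispatch to the handler for that category."""
    category = classify(query)
    handler = HANDLERS.get(category, HANDLERS["general"])
    return handler(query)


if __name__ == "__main__":
    print(route("The app shows an error on startup"))
```

Because each handler only sees one kind of input, tuning one prompt does not degrade the others, which is the separation of concerns the quote describes.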
2.4 Workflow: Parallelization
LLMs can sometimes work simultaneously on a task and have their outputs aggregated programmatically. This workflow, parallelization, manifests in two key variations:
- Sectioning: Breaking a task into independent subtasks run in parallel.
- Voting: Running the same task multiple times to get diverse outputs.
When to use this workflow: Parallelization is effective when the divided subtasks can be parallelized for speed, or when multiple perspectives or attempts are needed for higher confidence results. For complex tasks with multiple considerations, LLMs generally perform better when each consideration is handled by a separate LLM call, allowing focused attention on each specific aspect.
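The two variations can be sketched as follows. This is my own illustration, assuming each LLM call is wrapped as a plain Python callable; `run_sections` and `majority_vote` are hypothetical names, and threads stand in for concurrent API calls.

```python
from collections import Counter
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, List


def run_sections(subtasks: List[str], worker: Callable[[str], str]) -> List[str]:
    """Sectioning: independent subtasks run in parallel; results keep input order."""
    with ThreadPoolExecutor(max_workers=4) as pool:
        return list(pool.map(worker, subtasks))


def majority_vote(task: str, voters: List[Callable[[str], str]]) -> str:
    """Voting: several attempts at the same task; the most common answer wins."""
    with ThreadPoolExecutor(max_workers=len(voters)) as pool:
        answers = list(pool.map(lambda voter: voter(task), voters))
    return Counter(answers).most_common(1)[0][0]


if __name__ == "__main__":
    # str.upper stands in for a per-section LLM call.
    print(run_sections(["intro", "body"], str.upper))
    # Three canned "reviewers" stand in for three sampled LLM judgments.
    reviewers = [lambda t: "safe", lambda t: "unsafe", lambda t: "safe"]
    print(majority_vote("review this snippet", reviewers))
```

In both cases the aggregation step is plain code, which is what keeps this a workflow rather than an agent.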
2.5 Workflow: Orchestrator-workers
In the orchestrator-workers workflow, a central LLM dynamically breaks down tasks, delegates them to worker LLMs, and synthesizes their results.
When to use this workflow: This workflow is well-suited for complex tasks where you can’t predict the subtasks needed (in coding, for example, the number of files that need to be changed and the nature of the change in each file likely depend on the task). Whereas it’s topographically similar, the key difference from parallelization is its flexibility—subtasks aren't pre-defined, but determined by the orchestrator based on the specific input.
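The difference from parallelization is easier to see in code. In this sketch (my own, with made-up names), `fake_orchestrator` stands in for the central LLM call that decides the subtasks at run time; in a real system that decision would come from a model, not from string matching.

```python
from typing import Callable, List


def fake_orchestrator(task: str) -> List[str]:
    """Stand-in for the planning call: subtasks are derived from the input itself."""
    files = [word for word in task.split() if word.endswith(".py")]
    return [f"edit {name}" for name in files]


def fake_worker(subtask: str) -> str:
    """Stand-in for a worker LLM call handling one delegated subtask."""
    return f"done: {subtask}"


def fake_synthesizer(results: List[str]) -> str:
    """Stand-in for the final call that merges worker outputs."""
    return "\n".join(results)


def orchestrate(
    task: str,
    plan: Callable[[str], List[str]],
    work: Callable[[str], str],
    synthesize: Callable[[List[str]], str],
) -> str:
    subtasks = plan(task)                    # number and nature of subtasks unknown in advance
    results = [work(s) for s in subtasks]    # delegate each subtask to a worker
    return synthesize(results)               # combine into one answer


if __name__ == "__main__":
    print(orchestrate("rename foo in a.py and b.py",
                      fake_orchestrator, fake_worker, fake_synthesizer))
```

The control flow is still fixed code; only the list of subtasks is dynamic.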
2.6 Workflow: Evaluator-optimizer
In the evaluator-optimizer workflow, one LLM call generates a response while another provides evaluation and feedback in a loop.
When to use this workflow: This workflow is particularly effective when we have clear evaluation criteria, and when iterative refinement provides measurable value. The two signs of good fit are, first, that LLM responses can be demonstrably improved when a human articulates their feedback; and second, that the LLM can provide such feedback. This is analogous to the iterative writing process a human writer might go through when producing a polished document.
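This is the pattern behind my code generation and review work, so here is a stripped-down sketch of the loop. The names `refine`, `fake_generate`, and `fake_evaluate` are mine, the two stand-ins replace real LLM calls, and the `max_rounds` cap is an assumption I added to keep the loop bounded.

```python
from typing import Callable, Optional, Tuple

Generator = Callable[[str, Optional[str]], str]   # (task, feedback) -> draft
Evaluator = Callable[[str], Tuple[bool, str]]     # draft -> (accepted, feedback)


def fake_generate(task: str, feedback: Optional[str]) -> str:
    """Stand-in generator: adds a docstring only after being told to."""
    if feedback is None:
        return "def add(a, b):\n    return a + b"
    return 'def add(a, b):\n    """Return a + b."""\n    return a + b'


def fake_evaluate(draft: str) -> Tuple[bool, str]:
    """Stand-in evaluator with one clear criterion: a docstring must be present."""
    if '"""' in draft:
        return True, "looks good"
    return False, "add a docstring"


def refine(task: str, generate: Generator, evaluate: Evaluator, max_rounds: int = 3) -> str:
    """Generate, evaluate, and feed the critique back until accepted or out of rounds."""
    feedback: Optional[str] = None
    draft = ""
    for _ in range(max_rounds):
        draft = generate(task, feedback)
        accepted, feedback = evaluate(draft)
        if accepted:
            break
    return draft


if __name__ == "__main__":
    print(refine("write an add function", fake_generate, fake_evaluate))
```

The loop only pays off when the evaluator's criterion is clear, as in the docstring check here.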
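3. Agents
The article presents the agent framework as very simple, and after reading it I really only have a conceptual understanding; how to build one technically is something I will have to work out in practice. It does make one point, though: what is simplest to describe is the hardest to do well. Otherwise, why would so many coding-tool companies deliver a user experience that falls short of Cursor's?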
Agents are emerging in production as LLMs mature in key capabilities—understanding complex inputs, engaging in reasoning and planning, using tools reliably, and recovering from errors. Agents begin their work with either a command from, or interactive discussion with, the human user. Once the task is clear, agents plan and operate independently, potentially returning to the human for further information or judgement. During execution, it's crucial for the agents to gain “ground truth” from the environment at each step (such as tool call results or code execution) to assess its progress. Agents can then pause for human feedback at checkpoints or when encountering blockers. The task often terminates upon completion, but it’s also common to include stopping conditions (such as a maximum number of iterations) to maintain control.
Agents can handle sophisticated tasks, but their implementation is often straightforward. They are typically just LLMs using tools based on environmental feedback in a loop. It is therefore crucial to design toolsets and their documentation clearly and thoughtfully. We expand on best practices for tool development in Appendix 2 ("Prompt Engineering your Tools").
When to use agents: Agents can be used for open-ended problems where it’s difficult or impossible to predict the required number of steps, and where you can’t hardcode a fixed path. The LLM will potentially operate for many turns, and you must have some level of trust in its decision-making. Agents' autonomy makes them ideal for scaling tasks in trusted environments.
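I have not built a real agent, so take this only as my reading of "LLMs using tools based on environmental feedback in a loop". Everything here is hypothetical: `fake_policy` stands in for the LLM's decision, `TOOLS` holds one toy tool, and `max_steps` is the kind of stopping condition the article mentions.

```python
from typing import Callable, Dict, List, Tuple

Action = Tuple[str, str]  # (tool name or "finish", argument)

# Toy toolset: a single tool that adds space-separated integers.
TOOLS: Dict[str, Callable[[str], str]] = {
    "add": lambda arg: str(sum(int(x) for x in arg.split())),
}


def fake_policy(goal: str, observations: List[str]) -> Action:
    """Stand-in for the LLM: picks the next action from what it has observed so far."""
    if not observations:
        return ("add", "2 3")
    return ("finish", observations[-1])


def run_agent(
    goal: str,
    policy: Callable[[str, List[str]], Action],
    tools: Dict[str, Callable[[str], str]],
    max_steps: int = 5,
) -> str:
    observations: List[str] = []
    for _ in range(max_steps):
        name, arg = policy(goal, observations)   # the model, not the code, picks the path
        if name == "finish":
            return arg
        tool = tools.get(name)
        # Tool results are the "ground truth" fed back into the next decision.
        observations.append(tool(arg) if tool else f"unknown tool: {name}")
    return "stopped: step limit reached"


if __name__ == "__main__":
    print(run_agent("add 2 and 3", fake_policy, TOOLS))
```

Compared with the workflow sketches, the only fixed code path is the loop itself; which tool runs next, and when to stop, is decided by the policy at each step.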
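4. Workflow or agent?
After reading this, should we abandon workflows and go all in on agents? Here is my view.
Most of our applications serve vertical industries. In a vertical industry the process is essentially fixed and the domain knowledge is specific, so workflows should be used more widely. What this article gives us is a better summary of how to design them.
For general-purpose products such as Cursor, which do not target one specific industry, choosing an agent is the right call. An agent should also rest on an ecosystem of tools; without that toolchain as a foundation, a system cannot graduate into an agent. (This view still needs verification; I have not yet built a real agent myself.)
That is my summary for this stage, before setting off again.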