Agent Orchestration

Collective intelligence

Now that we have some usable machine intelligence - in the form of large language models - one of the next steps is to see whether we can get multiple agents to work together to solve problems.

We can draw inspiration for this task from human societies. Companies and other organizations frequently employ many humans to solve problems. There's also an academic name for the field of combining the efforts of many agents: collective intelligence.

Agents can be combined in series, in parallel, or some combination of the two. Combining agents in series is sometimes called "chaining". We can refer to the whole field as "agent orchestration".

Agent orchestration

The advantage of combining multiple agents is the promise of better-quality results. The main disadvantage is the extra cost and/or time needed to produce a result.

Diagrams

[Diagram: serial operation]
[Diagram: parallel operation]

Mechanisms

  • Problem subdivision - Chaining responses allows prompts to be broken down and processed in pieces. Prompt manipulation can be used to start by asking for a task breakdown and then to work through each task in turn (a sketch of this appears after this list).

  • Iterative processing - Existing LLMs take a "one shot" approach to generating solutions: they can't go back and correct their work. Using agents with runtime-configurable workflows helps to avoid this limitation.

  • Specialization and division of labor - Humans (and social insects) gain from specialization. Producing specialized expert machines seems reasonably easy - subdivide the training data. In this way it is possible to produce experts in different programming languages, for example. The usual rationale for specialization is that it reduces training time; it can then be followed by division of labor - where different problems are routed to different agents.

  • Adversarial analysis - Many generation tools use an adversary to keep their output quality up. The adversary typically attempts to distinguish generated outputs from real ones. This is a bit like a developer iterating with a QA agent on a software engineering project.

  • Wisdom of crowds - It's often easier to recognize a good solution than it is to generate one. If multiple solutions are generated, it may be possible to use a "wisdom of crowds" approach to help decide between them. This is one of the advantages of the "peer review" process. It is often helpful to get another set of eyes on the problem.

  • Prediction markets - If multiple solutions are generated, one possibility is to use a prediction market to decide between them.

  • Hierarchical planning - If it is possible to identify tasks that need to be broken down further, these can in turn be subdivided - resulting in a hierarchical task breakdown.
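
To make the problem-subdivision mechanism concrete, here is a minimal sketch in Python. The complete() helper is hypothetical - it stands in for whatever function sends a prompt to an LLM and returns the text of its reply - and the prompts themselves are only illustrations.

    # Problem subdivision via chaining: first ask for a task breakdown,
    # then process each subtask with its own prompt.
    # complete(prompt) is a hypothetical helper that sends a prompt to
    # an LLM and returns its text reply.
    def solve_by_subdivision(problem, complete):
        breakdown = complete(
            "Break the following problem into a numbered list of "
            "subtasks, one per line:\n" + problem)
        subtasks = [line for line in breakdown.splitlines() if line.strip()]
        partial_results = [
            complete("As part of solving '%s', carry out this subtask:\n%s"
                     % (problem, task))
            for task in subtasks]
        # Final chained call: merge the pieces into one answer.
        return complete(
            "Combine these partial results into a single coherent "
            "solution:\n\n" + "\n\n".join(partial_results))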

Routing

With multiple agents available, message routing is often needed. It may be possible to use market mechanisms to help with the routing - for example, by asking agents to bid on the prompts they want to handle. The results could subsequently be scored by other agents - to check whether the routing is working properly.
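
A minimal sketch of what bid-based routing might look like, under the assumption that each agent exposes hypothetical bid() and run() methods - bid() returning a self-assessed confidence score for a prompt:

    # Market-style routing: every agent bids on the prompt and the
    # highest bidder handles it. bid(prompt) and run(prompt) are
    # hypothetical methods: the first returns a confidence score,
    # the second returns the agent's answer.
    def route(prompt, agents):
        winner = max(agents, key=lambda agent: agent.bid(prompt))
        return winner.run(prompt)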

Aggregation

If multiple results are generated, these will need to be aggregated. There are various options: you could feed all the results to an agent and ask it for a summary, or you could get multiple agents to review the outputs and vote on which one is best.
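
As a sketch of the voting option - assuming each reviewer is a complete()-style callable wrapping a separate agent - the candidates are numbered, each reviewer is asked to pick one, and the candidate with the most votes wins:

    # Aggregation by voting: several reviewer agents each pick the best
    # candidate; the one with the most votes is returned.
    from collections import Counter

    def aggregate_by_vote(candidates, reviewers):
        ballot = "\n\n".join("Candidate %d:\n%s" % (i, text)
                             for i, text in enumerate(candidates))
        votes = Counter()
        for review in reviewers:  # each reviewer wraps a separate agent
            reply = review("Which candidate below is best? "
                           "Answer with its number only.\n\n" + ballot)
            digits = "".join(ch for ch in reply if ch.isdigit())
            if digits and int(digits) < len(candidates):
                votes[int(digits)] += 1
        # Fall back to the first candidate if no valid votes were cast.
        if not votes:
            return candidates[0]
        return candidates[votes.most_common(1)[0][0]]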

Incorporating criticism

I list adversarial analysis as one of the possible tools above. Many workflows can benefit from this technique. This is similar to the way in which many queries benefit from "chain of thought" prompting - which effectively gives an agent more time to think. Here, the output of the first agent is given to one or more critics. An "aggregation" agent is then given the original solution and the critics' proposals, and is asked to combine all the data in the best way possible. The critics can operate in parallel - reducing the time cost of having multiple critics. So: you could have one critic check the grammar, another check for inconsistencies - and so on.
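
A sketch of this critic-and-aggregator workflow, again assuming the hypothetical complete() helper from above; the critics run in parallel threads, since LLM calls are I/O-bound:

    # Critic workflow: several critics review a draft in parallel, then
    # an aggregation step folds their feedback back into the solution.
    from concurrent.futures import ThreadPoolExecutor

    def criticise_and_revise(draft, critic_focuses, complete):
        # Each critic gets the same draft plus its own focus, e.g.
        # "check the grammar" or "check for inconsistencies".
        def run_critic(focus):
            return complete("Review the text below. Focus on: %s\n\n%s"
                            % (focus, draft))
        with ThreadPoolExecutor() as pool:
            reviews = list(pool.map(run_critic, critic_focuses))
        # Aggregation: combine the original draft with all the feedback.
        return complete("Revise the original text, taking the critics' "
                        "feedback into account.\n\nOriginal:\n%s\n\n"
                        "Feedback:\n%s" % (draft, "\n\n".join(reviews)))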

Second generation

When chaining the output of agents together, it helps if the agents understand how to generate prompts. This ability was generally not present in the first generation of models. However, subsequent generations have more detailed information about prompt generation and prompt engineering in their training data. Knowledge of prompting makes agent chaining easier. Even without much knowledge of prompt generation, it is often possible to specify a machine-readable output format. This can then be turned into subsequent prompts using string manipulation and glue code.
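
A sketch of that glue-code approach: the first agent is asked for a machine-readable (JSON) task list, which plain code then templates into follow-up prompts. The output format and the complete() helper are both assumptions made for illustration.

    # Glue code between agents: ask for JSON output, parse it, and
    # template each item into a follow-up prompt.
    import json

    def chain_via_json(problem, complete):
        reply = complete("List the subtasks for the problem below as a "
                         "JSON array of strings, with no other text.\n\n"
                         + problem)
        subtasks = json.loads(reply)  # real code would retry on bad JSON
        return [complete("Carry out this subtask: " + task)
                for task in subtasks]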

Examples

Models which use a mixture-of-experts technique are quite common. GPT-4 is widely reported to use a mixture-of-experts architecture; Mixtral openly does.

An agent-based software engineer called Devin has been announced that uses these kinds of techniques.

Matt Shumer has some related projects: an agent-based journalist, an agent-based investor and an agent-based book author. The book author uses multiple API calls to generate a book outline, chapter titles, and the text of each chapter. It knows whether to end a chapter on a cliffhanger or a happy ending. It even generates cover art and bundles the results up as an epub book.

Benchmarks

Multi-agent workflows can be time-consuming. However, many machine intelligence benchmarks are untimed - or have generous time limits. For many of those, agent orchestration techniques could make good sense.

Significance

Agent orchestration looks as though it will be quite important. Andrew Ng has written a positive post on the topic. He offered an assessment of its significance (as of 2024), writing:

I think AI agentic workflows will drive massive AI progress this year - perhaps even more than the next generation of foundation models. This is an important trend, and I urge everyone who works in AI to pay attention to it.
