Helping AI agents search for the best results in large language models

Whether you’re a scientist putting together research ideas or a CEO hoping to automate human resources or finance, you’ll find that artificial intelligence tools are becoming the assistants you never knew you needed. In particular, many experts are tapping into the talents of autonomous software programs called AI agents, which can call on AI models at specific points to solve problems and complete tasks.

AI agents work best when they use large language models (LLMs), because those systems are powerful, efficient, and flexible. One way to program such an agent is to define in code the steps you want the system to take (its “workflow”), including when it should call an LLM. If you were a software company hoping to port an old codebase to a more modern programming language for better optimization and security, you could build a program that uses an LLM to translate the codebase one file at a time, checking each file as you go.
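
Such a hand-written workflow might look something like the minimal sketch below. Everything here is illustrative: call_llm is a hypothetical stand-in for whatever LLM client you use, and the per-file check is the simplest possible one.

```python
from pathlib import Path

def call_llm(prompt: str) -> str:
    """Hypothetical stand-in for whatever LLM client the agent uses."""
    ...

def translate_repo(src_dir: Path, dst_dir: Path) -> None:
    # Translate the codebase one file at a time, checking each file as we go.
    for src_file in sorted(src_dir.rglob("*.java")):
        translated = call_llm(
            f"Translate this Java file to Python:\n{src_file.read_text()}"
        )
        dst_file = dst_dir / src_file.relative_to(src_dir).with_suffix(".py")
        dst_file.parent.mkdir(parents=True, exist_ok=True)
        dst_file.write_text(translated)
        compile(translated, str(dst_file), "exec")  # minimal per-file syntax check
```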

But what happens when the LLM makes a mistake? You’d want the agent to step back and make another attempt, incorporating lessons learned from the failure. Coding this behavior by hand can take as much effort as writing the agent in the first place: if your translation agent consists of thousands of lines of code, you could end up adding thousands more lines just to support the logic of rolling back when LLM calls go wrong.
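
Even the simplest recovery logic, retrying a single file with feedback from the failure, adds noticeable bookkeeping to the sketch above (reusing the hypothetical call_llm helper); real backtracking across many files multiplies it:

```python
def translate_file_with_retries(source: str, max_attempts: int = 3) -> str:
    feedback = ""  # lessons learned from past mistakes, fed back into the prompt
    for _ in range(max_attempts):
        translated = call_llm(
            f"Translate this Java file to Python:\n{source}\n{feedback}"
        )
        try:
            compile(translated, "<translated>", "exec")  # did we get valid Python?
            return translated
        except SyntaxError as err:
            feedback = f"Your previous attempt failed with: {err}. Please fix this."
    raise RuntimeError("all translation attempts failed")
```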

To save time and effort for programmers, researchers with MIT’s Computer Science and Artificial Intelligence Laboratory (CSAIL) and Asari AI have developed a framework called “EnCompass.”

With EnCompass, you no longer have to write that logic yourself. Instead, when EnCompass runs your program, it automatically backtracks when an LLM call goes wrong. EnCompass can also clone the program’s runtime state to make multiple attempts in parallel and find the best solution. In effect, EnCompass searches over the different possible paths your agent could take, which arise from the different possible results of each LLM call, looking for the path that leads to the best solution.

Then, all you have to do is annotate the places where you might want to roll back or clone the program’s runtime state, and record any information that might be useful to the strategy used to search over your agent’s possible paths (the “search strategy”). The search strategy itself is specified separately: you can use one that EnCompass provides out of the box or, if you prefer, supply your own custom strategy.
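
The article doesn’t show EnCompass’s actual annotation syntax, so the following sketch is purely illustrative: branchpoint and record_score are invented stand-ins for the two kinds of annotations just described (and call_llm for an LLM client), not the framework’s real API.

```python
def call_llm(prompt: str) -> str:
    """Hypothetical LLM client, as in the earlier sketches."""
    ...

def branchpoint(action):
    # Invented stand-in: marks `action` as a point where the framework
    # could branch, roll back, or clone the runtime state.
    return action()

def record_score(score: float) -> None:
    # Invented stand-in: records information the search strategy can use.
    ...

def translate_file(source: str) -> str:
    # Annotate the LLM call as a place where execution can vary.
    translated = branchpoint(
        lambda: call_llm(f"Translate this Java file to Python:\n{source}")
    )
    # Record how well this step performed, for the search strategy to use.
    record_score(1.0 if is_valid_python(translated) else 0.0)
    return translated

def is_valid_python(code: str) -> bool:
    try:
        compile(code, "<translated>", "exec")
        return True
    except SyntaxError:
        return False
```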

“With EnCompass, we separated the search strategy from the basic workflow of an AI agent,” said lead author Zhening Li ’25, MEng ’25, an MIT electrical engineering and computer science (EECS) PhD student, CSAIL researcher, and research coordinator at Asari AI. “Our framework allows programmers to easily try different search strategies to find the one that makes the AI agent perform best.”

The researchers used EnCompass on agents implemented as Python programs that call LLMs, where it demonstrated significant code savings. EnCompass reduced the coding effort for implementing search by up to 80 percent across the agents tested, including an agent for interpreting codebases and an agent for finding transformation rules for grids of digits. Down the road, EnCompass could enable agents that perform large-scale tasks, including managing large code repositories, designing and conducting scientific experiments, and creating blueprints for rockets and other hardware.

Branching out

When you program your agent, you mark certain operations, such as LLM calls, whose results can vary. These annotations are called “branchpoints.” If you think of your agent’s execution as a single plot line of a story, adding branchpoints turns the story into a choose-your-own-adventure game, where the branches are places where the plot can head off toward different endings.

You can then specify the strategy that EnCompass uses to navigate that story game in search of the best ending. This can include launching parallel threads of execution, or rolling back to a previous branchpoint when the agent is stuck at a dead end.

Users can plug and play several common search strategies that EnCompass provides out of the box, or define their own custom strategy. For example, you can choose Monte Carlo tree search, which builds a search tree by balancing exploration and exploitation, or beam search, which keeps only the few best results from each step. EnCompass makes it easy to try different methods and find the strategy that maximizes your chances of successfully completing the task.
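
To make the beam-search idea concrete, here is a generic, framework-agnostic sketch (not EnCompass code) that keeps only the beam_width best candidates at every step; expand might, for instance, sample several LLM continuations of a candidate:

```python
from typing import Callable

def beam_search(
    initial: str,
    expand: Callable[[str], list[str]],  # e.g., sample several LLM outputs per candidate
    score: Callable[[str], float],       # higher means a more promising candidate
    steps: int,
    beam_width: int,
) -> str:
    beam = [initial]
    for _ in range(steps):
        # Expand every surviving candidate, then keep only the best few.
        candidates = [nxt for cand in beam for nxt in expand(cand)]
        beam = sorted(candidates, key=score, reverse=True)[:beam_width]
    return max(beam, key=score)
```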

EnCompass code efficiency

So how much code does EnCompass save when adding search to agent programs? According to the researchers’ findings, the framework significantly reduces how much code programmers need to add to their agents to support search, helping them test different strategies to find the one that works best.

For example, the researchers used EnCompass on an agent that translates a code repository from the Java programming language, which is commonly used for business applications and software, into Python. They found that adding search with EnCompass, which mainly involves adding branchpoint annotations and annotations that record how well each step performed, required 348 fewer lines of code (about 82 percent less) than implementing the search by hand. They also showed how EnCompass let them easily try different search strategies; the best one turned out to be a two-level beam search algorithm, which improved accuracy by 15 to 40 percent across five different repositories under a search budget of 16 times the LLM calls made by the agent without search.

“As LLMs become an integral part of everyday software, it is very important to understand how to design effective software that uses their strengths and works around their limitations,” said co-author Armando Solar-Lezama, an MIT professor of EECS and a principal investigator at CSAIL. “EnCompass is an important step in that direction.”

The researchers note that EnCompass guides agents whose high-level workflow steps are specified by the system designer; the current iteration of their framework is less applicable to fully LLM-controlled agents. “For those agents, instead of having a workflow that defines the steps and then using the LLM to carry out those steps, the LLM itself decides everything,” said Li. “There is no underlying workflow, so there is nothing on which to perform inference-time search over what the LLM generates as it goes. In that case, there is little need for a tool like EnCompass that augments a workflow with search and backtracking.”

Li and his colleagues plan to extend EnCompass into a more general search framework for AI agents. They also plan to test the system on complex tasks to prepare it for real-world use, including in companies. In addition, they are exploring how EnCompass can help agents work with humans on tasks such as designing hardware or translating very large code repositories. For now, EnCompass stands as a powerful framework that helps people easily improve the performance of their AI agents.

“EnCompass is timely, as AI-driven agents and search-based techniques begin to reshape workflows in software engineering,” said Carnegie Mellon University Professor Yiming Yang, who was not involved in the research. “By cleanly separating the agent’s programming logic from its think-time search strategy, the framework provides an objective way to explore how structured search can improve code generation, interpretation, and analysis. This release provides a solid foundation for more structured and reliable search-driven approaches to software development.”

Li and Solar-Lezama co-authored the paper with two Asari AI researchers: Caltech Professor Yisong Yue, a consultant to the company; and senior author Stephan Zheng, co-founder and CEO. Their work is supported by Asari AI.

The team’s work was presented at the Conference on Neural Information Processing Systems (NeurIPS) in December.
