Microsoft Research Introduces CORPGEN to Manage Multi-Horizon Tasks for Autonomous AI Agents Using Hierarchical Scheduling and Memory


Microsoft researchers have introduced CORPGEN, an architecture-agnostic framework designed to manage the complexities of real-world organizational work through autonomous digital workers. While existing benchmarks evaluate AI agents on isolated, single tasks, real-world corporate environments require handling dozens of concurrent, interleaved tasks with complex interdependencies. The research team identifies this distinct problem category as Multi-Horizon Task Environments (MHTEs).

Performance Gap in MHTEs

Empirical testing reveals that computer-using agents (CUAs) suffer significant performance degradation when moved from single-task scenarios to MHTEs. Across three independent CUA implementations, the completion rate dropped from 16.7% at 25% load to 8.7% at 100% load.

The research team identified four fundamental failure modes that cause this decline:

Context Saturation: Context requirements grow O(N) with the number of tasks rather than O(1), quickly exceeding the capacity of the token window.
Memory Interference: Information from one task often contaminates reasoning about another when multiple tasks share a single context window.
Dependency Graph Complexity: Business tasks form Directed Acyclic Graphs (DAGs) rather than linear chains, which requires complex topological reasoning.
Reprioritization Overhead: Decision complexity grows O(N) with each cycle because the agent must re-evaluate the priorities of all active tasks.
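
To make the dependency-graph failure mode concrete, here is a minimal sketch of ordering tasks that form a DAG rather than a chain. The task names are hypothetical and do not come from the paper; the sketch uses Python's standard `graphlib` module and is not CORPGEN's scheduler.

```python
from graphlib import TopologicalSorter

# Hypothetical tasks (not from the paper): each key maps to the set of
# tasks that must finish before it can start.
dependencies = {
    "collect_data": set(),
    "draft_report": {"collect_data"},
    "update_budget": {"collect_data"},
    "send_report": {"draft_report", "update_budget"},
}

def execution_order(deps):
    """Return one valid order in which every task follows its prerequisites."""
    return list(TopologicalSorter(deps).static_order())

order = execution_order(dependencies)
print(order)
```

Because `draft_report` and `update_budget` are independent of each other, more than one valid order exists. An agent has to reason about this kind of branching, which a linear to-do list does not capture.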

CORPGEN Architecture

To address these failure modes, CORPGEN implements Multi-Objective Multi-Horizon Agent (MOMA) capabilities through four primary architectural mechanisms.

(a) Hierarchical Planning

Strategic coherence is maintained through goal decomposition across three temporal scales:

Strategic Objectives (Monthly): High-level goals and milestones based on the agent's identity and role.
Tactical Plans (Daily): Actionable tasks for specific applications, each with a priority level.
Operational Actions (Per Cycle): Individual tool calls selected based on the current state and retrieved memory.
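
The three temporal scales can be pictured as nested records. The sketch below is illustrative only: the class names, fields, and the "lower number means more urgent" priority convention are assumptions, not the paper's data model.

```python
from dataclasses import dataclass, field

@dataclass
class StrategicObjective:
    """Monthly scale: a high-level goal with milestones."""
    goal: str
    milestones: list = field(default_factory=list)

@dataclass
class TacticalPlan:
    """Daily scale: an actionable task tied to an application."""
    objective: StrategicObjective
    task: str
    application: str
    priority: int  # assumption: lower number means more urgent

@dataclass
class OperationalAction:
    """Per-cycle scale: a single tool call serving a daily plan."""
    plan: TacticalPlan
    tool: str
    arguments: dict

def next_plan(plans):
    """Pick the most urgent daily plan for the coming cycle."""
    return min(plans, key=lambda p: p.priority)

# Hypothetical example data.
objective = StrategicObjective("Close Q3 books", ["reconcile accounts"])
plans = [
    TacticalPlan(objective, "email summary to manager", "Outlook", priority=2),
    TacticalPlan(objective, "update ledger sheet", "Excel", priority=1),
]
chosen = next_plan(plans)
action = OperationalAction(chosen, tool="open_workbook", arguments={"name": "ledger"})
print(chosen.task, "->", action.tool)
```

Each action keeps a reference to its plan, and each plan to its objective, so a per-cycle decision can always be traced back to the monthly goal it serves.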

(b) Sub-Agent Isolation

Complex operations, such as GUI automation or research, are delegated to modular sub-agents. These autonomous agents operate within their own context scopes and return only structured results to the host agent, preventing cross-task memory contamination.
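
The isolation idea can be sketched as a function that keeps its working notes private and hands back only a small structured result. All names here (`SubAgentResult`, `run_sub_agent`, the step strings) are hypothetical, and the "execution" is simulated rather than driving a real GUI.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class SubAgentResult:
    """The only data that crosses back into the host agent's context."""
    task_id: str
    success: bool
    summary: str

def run_sub_agent(task_id, steps):
    """Simulate a sub-agent working in its own private context."""
    private_context = []  # intermediate observations stay local
    for step in steps:
        private_context.append(f"observed: {step}")
    # The private context is discarded; only a structured result is returned.
    return SubAgentResult(task_id, True, f"completed {len(steps)} steps")

host_context = []
result = run_sub_agent("gui-42", ["open app", "click save"])
host_context.append(result)
print(host_context)
```

The host's context grows by one compact record per delegated task, regardless of how many intermediate steps the sub-agent took.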

(c) Tiered Memory Architecture

The system uses a three-layer memory structure to manage state:

Working Memory: Intended for immediate reasoning, this layer resets each cycle.
Structured Long-Term Memory (LTM): Stores typed artifacts such as plans, summaries, and reflections.
Semantic Memory: Uses Mem0 to support similarity-based retrieval of unstructured past context via embeddings.
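
A toy version of the three layers is sketched below. It deliberately does not call Mem0; a bag-of-words cosine similarity stands in for a real embedding model, and the class and method names are assumptions made for illustration.

```python
import math
from collections import Counter

def _embed(text):
    """Toy bag-of-words vector standing in for a real embedding model."""
    return Counter(text.lower().split())

def _cosine(a, b):
    dot = sum(a[t] * b[t] for t in a)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(
        sum(v * v for v in b.values())
    )
    return dot / norm if norm else 0.0

class TieredMemory:
    def __init__(self):
        self.working = []     # cleared at the start of every cycle
        self.structured = {}  # typed artifacts keyed by (kind, name)
        self.semantic = []    # unstructured text for similarity search

    def start_cycle(self):
        self.working.clear()

    def store_artifact(self, kind, name, content):
        self.structured[(kind, name)] = content

    def remember(self, text):
        self.semantic.append(text)

    def recall(self, query, k=1):
        """Return the k stored texts most similar to the query."""
        q = _embed(query)
        ranked = sorted(
            self.semantic, key=lambda t: _cosine(q, _embed(t)), reverse=True
        )
        return ranked[:k]

mem = TieredMemory()
mem.working.append("scratch note")
mem.start_cycle()
mem.store_artifact("plan", "monday", ["update ledger sheet"])
mem.remember("invoice sent to finance team")
mem.remember("calendar meeting moved to friday")
print(mem.recall("finance invoice"))
```

Keeping typed artifacts in an exact-lookup store and free text in a similarity store mirrors the split the article describes between structured and semantic memory.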

(d) Adaptive Summarization

To limit context growth, CORPGEN uses rule-based compression. When the context length exceeds 4,000 tokens, 'critical content' (such as tool calls and state changes) is preserved verbatim, while 'routine content' (such as intermediate reasoning) is compressed into structured summaries.
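
A rule of this kind could look roughly like the sketch below. Only the 4,000-token threshold and the verbatim-versus-compressed split come from the article. The entry kinds, the whitespace token count, and the placeholder summary text are assumptions; a real system would use a proper tokenizer and generate real summaries.

```python
TOKEN_LIMIT = 4000  # threshold reported in the article
CRITICAL_KINDS = {"tool_call", "state_change"}  # assumed labels

def count_tokens(text):
    """Crude whitespace count; a real system would use a tokenizer."""
    return len(text.split())

def compress(entries, limit=TOKEN_LIMIT):
    """entries: list of (kind, text) pairs in chronological order.

    Below the limit nothing changes. Above it, critical entries are kept
    verbatim and each run of routine entries becomes one placeholder summary.
    """
    total = sum(count_tokens(text) for _, text in entries)
    if total <= limit:
        return list(entries)
    compressed, routine_run = [], 0
    for kind, text in entries:
        if kind in CRITICAL_KINDS:
            if routine_run:
                compressed.append(
                    ("summary", f"{routine_run} routine entries compressed")
                )
                routine_run = 0
            compressed.append((kind, text))
        else:
            routine_run += 1
    if routine_run:
        compressed.append(("summary", f"{routine_run} routine entries compressed"))
    return compressed

history = [
    ("reasoning", "think " * 3000),
    ("tool_call", "click(save)"),
    ("reasoning", "think " * 2000),
]
print(compress(history))
```

The sketch preserves chronological order so that the surviving tool calls still read as a coherent trace.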

Evaluation Results and Experiential Learning

Across three CUA backends (UFO2, OpenAI CUA, and hierarchical), CORPGEN achieved up to a 3.5x improvement over baselines, reaching a completion rate of 15.2% compared with 4.3% for standalone UFO2 at 100% load.
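
As a quick sanity check on the reported figures, the snippet below recomputes the improvement factor from the two completion rates, along with the relative drop for baseline CUAs between 25% and 100% load. The function names are illustrative; the numbers are those quoted in this article.

```python
def improvement_factor(new_rate, baseline_rate):
    """Ratio of the new completion rate to the baseline completion rate."""
    return new_rate / baseline_rate

def relative_drop(before, after):
    """Fractional decline from `before` to `after`."""
    return (before - after) / before

factor = improvement_factor(15.2, 4.3)  # CORPGEN vs. standalone UFO2 at 100% load
drop = relative_drop(16.7, 8.7)         # baseline CUAs, 25% load vs. 100% load

print(round(factor, 2))      # 3.53
print(round(drop * 100, 1))  # 47.9
```

The ratio of about 3.53 is consistent with the "up to 3.5x" claim, and the baseline decline amounts to roughly 48% of the original completion rate.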

Ablation studies show that experiential learning provides the largest performance gains. This mechanism distills successful executions into canonical trajectories, which are then indexed in a FAISS database. During execution, similar trajectories are retrieved as few-shot examples to bias action selection toward validated patterns.
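
The retrieve-then-prompt loop can be sketched without FAISS. In the minimal example below, a brute-force cosine search over plain Python lists stands in for the vector index, and the two-dimensional embeddings, class name, and prompt format are all made up for illustration.

```python
import math

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0

class TrajectoryStore:
    """Brute-force stand-in for a vector index such as FAISS."""

    def __init__(self):
        self._items = []  # list of (embedding, trajectory) pairs

    def add(self, embedding, trajectory):
        self._items.append((embedding, trajectory))

    def retrieve(self, query, k=2):
        """Return the k trajectories whose embeddings are closest to the query."""
        ranked = sorted(
            self._items, key=lambda item: cosine(query, item[0]), reverse=True
        )
        return [trajectory for _, trajectory in ranked[:k]]

def build_prompt(task, examples):
    """Prepend retrieved trajectories as few-shot examples."""
    shots = "\n".join(f"Example: {' -> '.join(e)}" for e in examples)
    return f"{shots}\nTask: {task}"

store = TrajectoryStore()
store.add([1.0, 0.0], ["open excel", "save"])  # hypothetical successful run
store.add([0.0, 1.0], ["open mail", "send"])   # hypothetical successful run
examples = store.retrieve([0.9, 0.1], k=1)
prompt = build_prompt("update sheet", examples)
print(prompt)
```

A production system would swap the linear scan for an approximate nearest-neighbor index, but the control flow (embed the task, fetch similar successes, show them to the model) stays the same.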

The research team also observed a significant discrepancy between evaluation methods. Artifact-based judging (checking generated files and outputs) achieved a 90% agreement rate with human labels. In contrast, trajectory-based LLM judging (relying on screenshots and execution logs) reached only 40% agreement. This suggests that current benchmarks may underestimate agent performance by relying on limited visual traces rather than the actual artifacts produced.

Key Takeaways

Identification of Multi-Horizon Task Environments (MHTEs): The research team describes a new class of problems called MHTEs, where agents must manage a large number of concurrent, long-horizon tasks (45+ tasks, 500-1500+ steps) within a single continuous context. This differs from traditional benchmarks that test individual tasks in isolation.
Detection of Catastrophic Performance Degradation: Computer-using agents (CUAs) experience a severe drop in performance as workload increases, with the completion rate falling from 16.7% at 25% load to 8.7% at 100% load.
Four Fundamental Failure Modes: The researchers identified why current agents fail under load: context saturation (O(N) growth), memory interference (cross-task contamination), dependency complexity (handling Directed Acyclic Graphs), and reprioritization overhead (O(N) decision complexity).
Architectural Mitigation with CORPGEN: The CORPGEN framework addresses these failures using four key mechanisms: hierarchical planning for goal alignment, sub-agent isolation to prevent memory contamination, tiered memory (working, structured, and semantic), and adaptive summarization to manage token limits.
Performance Gains from Experiential Learning: Testing across multiple backends showed that CORPGEN can improve performance by up to 3.5x over baselines. Ablation studies reveal that experiential learning, which reuses verified successful trajectories, provides the largest performance gain among all components of the architecture.


Michal Sutter is a data science expert with a Master of Science in Data Science from the University of Padova. With a strong foundation in statistical analysis, machine learning, and data engineering, Michal excels at turning complex data sets into actionable insights.