Liquid AI Releases LocalCowork, Powered by LFM2-24B-A2B, to Implement Privacy-First Local Agent Workflows with the Model Context Protocol (MCP)

Liquid AI released LFM2-24B-A2B, a model optimized for local, low-latency deployment on consumer hardware, alongside LocalCowork, an open source desktop agent application available from their Liquid4All GitHub Cookbook. The release provides a usable reference architecture for running business workflows entirely on-device, eliminating API calls and data egress for privacy-sensitive environments.
Architecture and Deployment Configuration
To achieve low-latency performance on consumer hardware, LFM2-24B-A2B uses a sparse Mixture-of-Experts (MoE) architecture. Although the model contains 24 billion parameters in total, it activates only about 2 billion parameters per token during inference.
This design allows the model to maintain a broad knowledge base while significantly reducing the compute required for each generation step. Liquid AI benchmarked the model on the following hardware and software stack:
- Hardware: Apple M4 Max, 36 GB unified memory, 32 GPU cores.
- Serving engine: llama-server with flash attention enabled.
- Quantization: Q4_K_M GGUF format.
- Memory footprint: ~14.5 GB of RAM.
- Hyperparameters: temperature 0.1, top_p 0.1, and max_tokens 512 (optimized for deterministic, reliable results).
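As a sketch, these sampling settings map onto a request to llama-server's OpenAI-compatible chat endpoint roughly as follows. The port is llama-server's default; the model name and prompt are illustrative assumptions, not part of the release.

```python
import json
import urllib.request

# Default llama-server address; adjust host/port to your local setup.
LLAMA_SERVER_URL = "http://localhost:8080/v1/chat/completions"

def build_request(prompt: str) -> dict:
    """Build a chat-completion payload with the near-deterministic
    sampling settings reported for the benchmark."""
    return {
        "model": "LFM2-24B-A2B",  # illustrative; llama-server serves whichever GGUF it loaded
        "messages": [{"role": "user", "content": prompt}],
        "temperature": 0.1,
        "top_p": 0.1,
        "max_tokens": 512,
    }

def ask(prompt: str) -> str:
    """Send a prompt to the local server and return the reply text."""
    req = urllib.request.Request(
        LLAMA_SERVER_URL,
        data=json.dumps(build_request(prompt)).encode(),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)["choices"][0]["message"]["content"]
```

Because the server speaks the OpenAI wire format, any OpenAI-compatible client library pointed at localhost would work equally well here.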
LocalCowork Tool Integration
LocalCowork is a fully offline desktop AI agent that uses the Model Context Protocol (MCP) to call pre-built tools without relying on cloud APIs or compromising data privacy, recording every action in a local audit trail. The system includes 75 tools across 14 MCP servers capable of handling tasks such as file system operations, OCR, and security scanning. The provided demo, however, focuses on a curated, high-reliability subset of 20 tools across 6 servers, each rigorously tested to achieve over 80% single-step accuracy and validated for multi-step sequences.
LocalCowork serves as a practical implementation of this model. It works completely offline and comes preconfigured with enterprise-grade tools:
- File operations: listing, reading, and searching the host's file system.
- Security scanning: identifying leaked API keys and personally identifiable information (PII) within local directories.
- Document processing: running Optical Character Recognition (OCR), extracting text, classifying contracts, and generating PDFs.
- Audit logging: recording all tool calls locally for compliance tracking.
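A minimal sketch of the pattern these tools share, a local registry with an append-only audit trail, might look like the following. The tool names, log format, and file paths are assumptions for illustration, not LocalCowork's actual implementation.

```python
import json
import time
from pathlib import Path
from typing import Callable

# Hypothetical local tool registry; every dispatched call is appended
# to a JSON-lines audit log on disk, so nothing leaves the machine.
TOOLS: dict[str, Callable[..., object]] = {}
AUDIT_LOG = Path("audit_log.jsonl")

def tool(name: str):
    """Decorator registering a function under a tool name."""
    def register(fn):
        TOOLS[name] = fn
        return fn
    return register

@tool("fs.list")
def list_files(directory: str) -> list[str]:
    """Example tool: list entries in a local directory."""
    return sorted(p.name for p in Path(directory).iterdir())

def call_tool(name: str, **kwargs) -> object:
    """Dispatch a tool call and record it in the local audit trail."""
    result = TOOLS[name](**kwargs)
    entry = {"ts": time.time(), "tool": name, "args": kwargs}
    with AUDIT_LOG.open("a") as f:
        f.write(json.dumps(entry) + "\n")
    return result
```

In a real MCP setup the registry would live behind an MCP server process and the agent would discover tools over the protocol; the audit-everything dispatch point is the part that carries over.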
Performance benchmarks
The Liquid AI team tested the model on a workload of 100 single-step tool-selection tasks and 50 multi-step chains (each requiring 3 to 6 tool executions, such as folder search, OCR, data transfer, segmentation, and export).
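A single-step evaluation of this shape can be sketched as follows. The selector below is a stub standing in for a call to the local model; the tool names and test cases are invented for illustration.

```python
import time

def evaluate_single_step(select_tool, cases):
    """Score a tool-selection function on (prompt, expected_tool) pairs,
    returning accuracy and mean latency in milliseconds."""
    correct, latencies = 0, []
    for prompt, expected in cases:
        start = time.perf_counter()
        choice = select_tool(prompt)
        latencies.append((time.perf_counter() - start) * 1000)
        correct += choice == expected
    return correct / len(cases), sum(latencies) / len(latencies)

# Stub selector; a real run would send each prompt to the local model.
def stub_selector(prompt: str) -> str:
    return "ocr.extract" if "scan" in prompt else "fs.search"

accuracy, mean_ms = evaluate_single_step(
    stub_selector,
    [("scan this receipt", "ocr.extract"), ("find my notes", "fs.search")],
)
```

Swapping the stub for a function that queries llama-server reproduces the reported setup: accuracy from matching the expected tool, latency from wall-clock timing of each response.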
Latency
The model averages ~385 ms per response when selecting a tool. This sub-second turnaround is well suited to interactive, human-in-the-loop applications where quick responses are required.
Accuracy
- Single-step execution: 80% accuracy.
- Multi-step chains: 26% end-to-end completion rate.
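The gap between these two numbers is roughly what compounding per-step error predicts: if each step independently succeeds 80% of the time, a six-step chain completes only about 0.8^6 ≈ 26% of the time.

```python
# If per-step tool selection succeeds with probability 0.8 and step
# errors are independent, an n-step chain completes with probability 0.8 ** n.
for n in range(3, 7):
    print(f"{n} steps: {0.8 ** n:.1%}")
# 3 steps: 51.2%, 4 steps: 41.0%, 5 steps: 32.8%, 6 steps: 26.2%
```

The six-step figure of ~26.2% lands almost exactly on the observed 26% end-to-end completion rate, which suggests the chains fail from accumulated single-step mistakes rather than from a separate multi-step failure mode.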
Key Takeaways
- Privacy-First Localization: LocalCowork runs entirely on the device with no cloud API dependencies or data exits, making it ideal for regulated business environments that require strict data privacy.
- Efficient MoE Architecture: LFM2-24B-A2B uses a sparse Mixture-of-Experts (MoE) design that activates only ~2 billion of its 24 billion parameters per token, allowing it to fit within a ~14.5 GB RAM footprint using Q4_K_M GGUF quantization.
- Sub-Second Latency on Consumer Hardware: Benchmarked on an Apple M4 Max laptop, the model achieves an average latency of ~385 ms for tool selection, enabling highly interactive, real-time workflows.
- Standard MCP Tool Integration: The agent uses the Model Context Protocol (MCP) to communicate with local tools, including file system operations, OCR, and security scanning, while automatically logging all actions in the local audit trail.
- Robust Single-Step Accuracy with Multi-Step Limits: The model achieves 80% accuracy on single-step tool use but drops to a 26% end-to-end success rate on multi-step chains, largely due to 'sibling confusion' (selecting a similar but wrong tool), indicating that it currently works best in a guided, human-in-the-loop setting rather than as a fully autonomous agent.
Check out the Repo for technical details.



