Sandboxing LLM-generated code execution
Pigment provides organizations with a central AI platform for real-time business planning. Pigment AI is built on an agentic architecture, described in more detail in this blog post.
One of our agents, the Analyst, already had several tools for simple calculations such as contribution and variance analysis. To add more capabilities, we decided to rely on the code generation abilities of LLMs rather than build a dedicated tool for each one.
LLM-generated code cannot be trusted by default. It is produced from user-controlled input, which means users may intentionally or accidentally steer the model toward unsafe behavior: reading sensitive data, calling internal services, exfiltrating data, pivoting into internal infrastructure, or exhausting compute resources. From a security perspective, the generated code has to be handled as an untrusted workload.
That requirement led us to build a sandboxed execution environment. In this article, we explain how we went from the initial risk analysis to a proof of concept, and eventually to a production-ready sandbox with support for large datasets.
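To make the "untrusted workload" framing concrete, here is a minimal Python sketch of the very first step: never evaluating generated code inside the agent's own process. It is an illustration only, not Pigment's implementation. The helper name `run_untrusted` and the timeout value are assumptions, and a POSIX host is assumed. A child process with an empty environment and a timeout only covers leaked environment secrets and runaway loops; it does nothing to restrict filesystem or network access, which is why a real sandbox is needed.

```python
import subprocess
import sys


def run_untrusted(code: str, timeout_s: float = 2.0) -> subprocess.CompletedProcess:
    """Run generated code in a separate interpreter (hypothetical helper).

    This is NOT a sandbox: the child can still reach the filesystem and the
    network. It only avoids sharing the parent's memory and environment.
    """
    return subprocess.run(
        # -I: isolated mode (ignores PYTHON* variables and the user site directory)
        [sys.executable, "-I", "-c", code],
        env={},               # do not pass the parent's environment variables
        capture_output=True,  # collect stdout/stderr instead of inheriting them
        text=True,
        timeout=timeout_s,    # raises subprocess.TimeoutExpired on runaway code
    )
```

Calling `run_untrusted("print(1 + 1)")` returns a `CompletedProcess` whose `stdout` holds the program output, while an infinite loop is killed and surfaces as `subprocess.TimeoutExpired` in the caller.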







