The Synthetic Data Company

Long-horizon training environments

Verified Environments for Coding Agents and Post-Training Loops

The Synthetic Data Company builds and sells verifiable training environments for AI labs running RL and post-training. Our 1,000+ environments across 30+ domains pair binary rewards with rich feedback signals: detailed test failures, compiler diagnostics, LLM critique, debugging traces, and annotated-image review for stronger long-horizon agent performance.

1,000+

Verifiable environments

30+

Domain categories

Hours → Weeks+

Task horizon

The Scaling Law

More Environments Produce Better Models

Model performance scales with the number of diverse, verifiable long-horizon training environments, so environment count directly controls downstream capability growth.

Y-axis: Average Ranking (lower is better)

Data from Qwen 3.5 Technical Report (Alibaba, 2026). Average ranking computed across BFCL-V4, VITA-Bench, DeepPlanning, Tool-Decathlon, and MCP-Mark.
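As a sketch of how that metric works: an average ranking is just the mean of a model's per-benchmark ranks, so lower is better because it means the model places nearer the top across the board. The rank values below are hypothetical, for illustration only, not figures from the report.

```python
def average_ranking(rank_by_benchmark: dict[str, int]) -> float:
    """Mean of a model's per-benchmark ranks (lower is better)."""
    return sum(rank_by_benchmark.values()) / len(rank_by_benchmark)

# Hypothetical ranks across the five benchmarks named above.
avg = average_ranking({
    "BFCL-V4": 2,
    "VITA-Bench": 1,
    "DeepPlanning": 3,
    "Tool-Decathlon": 2,
    "MCP-Mark": 2,
})
# (2 + 1 + 3 + 2 + 2) / 5 = 2.0
```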

INSIDE AN ENVIRONMENT

One Environment, Three Agents, 3 Days

Each environment is a long-horizon project with structured milestones. Different agents find different paths.

Illustrative. Agents evaluated on the C compiler environment with four milestone tiers. Hover shows percent progress, where 100% means the compiler builds Linux. The time axis is compressed from one minute to three days: early progress is fast, deep milestones take sustained work.

THE ENVOI SDK

Define Environments. Evaluate Agents.

environment.py

illustrative

import envoi

basics = envoi.suite("basics")
torture = envoi.suite("torture")

@envoi.setup
async def build_compiler(submission: envoi.Documents):
    """Build the submitted compiler; fail setup with full stderr on error."""
    result = await envoi.run("chmod +x build.sh && ./build.sh")
    if result.exit_code != 0:
        raise RuntimeError(f"Build failed:\n{result.stderr}")

@basics.test
async def variables():
    """Compile and run variable declaration tests."""
    result = await envoi.run("./cc tests/variables.c -o out && ./out")
    return {"passed": result.exit_code == 0,
            "stdout": result.stdout}

@torture.test("gcc_{test_id}")
async def gcc_torture(test_id: str):
    """Run a single GCC torture test case."""
    result = await envoi.run(f"./cc torture/{test_id}.c -o out && ./out")
    return {"passed": result.exit_code == 0,
            "stderr": result.stderr}

Step 1 — Define

Your task. Your rules.

An environment is a self-contained project with milestones that define what 'correct' means. Write async Python — envoi handles sandboxing, file transfer, and process isolation.

  • Structured milestones: Group tests into suites. Each suite is a checkpoint agents must reach — basics, torture tests, the full build.
  • Real toolchains, real builds: Agents compile code, run binaries, and pass test harnesses. No multiple choice — just build the thing.
  • Rich diagnostics: Return whatever you want — pass/fail, stdout, diffs, images. The training loop gets structured results, not just a score.
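To make the rich-diagnostics point concrete, here is a minimal, envoi-free sketch of how a training loop might split a suite's results into a binary reward while keeping the feedback. The `TestResult` shape and `to_training_signal` helper are illustrative assumptions, not part of the SDK.

```python
from dataclasses import dataclass, field
from typing import Any

@dataclass
class TestResult:
    """Illustrative record: a binary pass bit plus arbitrary feedback."""
    passed: bool
    feedback: dict[str, Any] = field(default_factory=dict)

def to_training_signal(results: list[TestResult]) -> dict[str, Any]:
    """Collapse per-test results into a binary reward, keeping diagnostics
    from failing tests so the loop can learn from more than a scalar."""
    return {
        "reward": 1.0 if all(r.passed for r in results) else 0.0,
        "pass_rate": sum(r.passed for r in results) / len(results),
        "diagnostics": [r.feedback for r in results if not r.passed],
    }

signal = to_training_signal([
    TestResult(True, {"stdout": "ok\n"}),
    TestResult(False, {"stderr": "segfault in ./out"}),
])
# reward is 0.0 (one failure), but the stderr diagnostic is retained
```

The design point: the reward stays strictly binary per milestone, while the failing tests' output travels alongside it for critique or debugging-trace feedback.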

Step 2 — Evaluate

One call. Full signal.

Connect to a running environment, submit your agent's output, and get back structured results. Run from your training loop, your CI, or a notebook.

  • Session-based evaluation: Each evaluation runs in an isolated session. No cross-contamination between runs, no cleanup needed.
  • Granular or holistic: Run all suites at once, or target a specific milestone. Get exactly the signal you need for each training step.
  • Drop-in integration: Three lines to connect, submit, and score. Fits into any RL loop, any CI pipeline, any eval harness.

evaluate.py

illustrative

import envoi

async with await envoi.connect_session(
    "https://path-to-environment.com",
    submission=envoi.Documents("./agent-output/src"),
) as session:
    # Run all tests -- get structured results
    results = await session.test()

    # Or run a specific milestone
    basics = await session.test("basics")
    torture = await session.test("torture")

    # Each result is whatever the environment returns --
    # pass/fail, diffs, images, diagnostics, scores.
envoi is open-source
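One way to use granular, per-suite results inside an RL loop is milestone weighting: early suites contribute a little reward, deep suites contribute most of it. The `suite_reward` helper and the `{"passed": ...}` result shape below are assumptions for illustration; substitute whatever your environments actually return.

```python
def suite_reward(suite_results: dict[str, list[dict]],
                 weights: dict[str, float]) -> float:
    """Weighted milestone reward: each suite contributes its pass rate
    scaled by that suite's weight. Suites without a weight contribute 0."""
    total = 0.0
    for suite, results in suite_results.items():
        if results:
            pass_rate = sum(r["passed"] for r in results) / len(results)
            total += weights.get(suite, 0.0) * pass_rate
    return total

reward = suite_reward(
    {"basics": [{"passed": True}, {"passed": True}],
     "torture": [{"passed": True}, {"passed": False}, {"passed": False}]},
    weights={"basics": 0.3, "torture": 0.7},
)
# 1.0 * 0.3 + (1/3) * 0.7 ≈ 0.533
```

Weighting deep milestones more heavily keeps the gradient signal alive once an agent has saturated the basics, which is the point of long-horizon environments.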

The deepest environments.
The longest horizons.
The strongest models.

Request access to our catalog, or tell us what you need built.