Synthetic data rooms for transactional law

Verified Environments For Law Firms And Post-Training Loops

The Synthetic Data Company builds fully synthetic data rooms for training computer use agents on transactional law workflows. Our 500+ environments span the full transaction lifecycle — from client intake through post-closing — with thousands of grounded documents per data room: contracts, corporate records, filings, and compliance packages.

500+

Verifiable environments

10

Transaction stages

Days → Weeks+

Task horizon

Ontology-Based Generation

Every Document Is Grounded In Facts

We generate fully synthetic data rooms by building and maintaining an ontology of facts, relations, and constraints for each synthetic company across a multi-year timeline. Every document is cross-referenced and internally consistent.
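As a rough illustration of what "grounded" can mean in practice, the sketch below stores dated facts for one synthetic company and flags any document claim that no fact supports on the document's own date. The `Fact` and `Claim` schemas, the company name, and the file names are hypothetical and invented for this example; they are not the product's actual ontology format.

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional


@dataclass(frozen=True)
class Fact:
    """One dated statement about the synthetic company (hypothetical schema)."""
    subject: str
    predicate: str
    value: str
    valid_from: date
    valid_to: Optional[date] = None  # None means the fact is still true

    def holds_on(self, day: date) -> bool:
        # Half-open interval: true from valid_from up to, not including, valid_to.
        return self.valid_from <= day and (self.valid_to is None or day < self.valid_to)


@dataclass(frozen=True)
class Claim:
    """A statement that a generated document makes as of its own date."""
    document: str
    as_of: date
    subject: str
    predicate: str
    value: str


def inconsistent_claims(facts: list[Fact], claims: list[Claim]) -> list[Claim]:
    """Return the claims that no fact supports on the claim's date."""
    bad = []
    for c in claims:
        supported = any(
            f.subject == c.subject
            and f.predicate == c.predicate
            and f.value == c.value
            and f.holds_on(c.as_of)
            for f in facts
        )
        if not supported:
            bad.append(c)
    return bad


# Invented example data: a CEO change in mid-2021.
facts = [
    Fact("Acme Holdings", "ceo", "J. Rivera", date(2015, 1, 1), date(2021, 6, 1)),
    Fact("Acme Holdings", "ceo", "M. Chen", date(2021, 6, 1)),
]
claims = [
    Claim("board_minutes_2019.docx", date(2019, 3, 12), "Acme Holdings", "ceo", "J. Rivera"),
    Claim("officer_certificate_2023.pdf", date(2023, 9, 1), "Acme Holdings", "ceo", "J. Rivera"),
]
flagged = inconsistent_claims(facts, claims)
print([c.document for c in flagged])  # ['officer_certificate_2023.pdf']
```

The 2023 certificate is flagged because it names a CEO whose term ended in 2021. A real pipeline would presumably generate documents from the facts rather than check them afterwards, but the same lookup applies in either direction.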

Documents per data room
10,000+

Contracts, filings, corporate records, and more

Document types
60+

PDFs, Excel, Word, PowerPoint, and email formats

Timeline coverage
10+ years

Full company history with consistent facts

Computer Use Agents

Agents Navigate Real Applications

Every environment is a full desktop — Windows or macOS — with real applications installed. Agents interact with the same tools that associates and partners use every day. No simplified APIs. No toy interfaces.

Regulatory

Filing Systems

Court filings, regulatory submissions, compliance

Email

Microsoft Outlook

Client correspondence, deal communication, scheduling

Drafting

Microsoft Word

Contract drafting, redlining, track changes

Spreadsheet

Microsoft Excel

Disclosure schedules, financial analysis, closing checklists

DMS

iManage

Document management, version control, matter filing

Matter Mgmt

Clio / Legal Tracker

Matter management, time tracking, client CRM

Redlining

Litera Compare

Document comparison, change tracking, version diff

Due Diligence

Virtual Data Room

Document review, Q&A workflows, access management

E-Signature

DocuSign

Contract execution, signature workflows, closing docs

Intake

Lawmatics

Client intake, conflict checking, engagement letters

Sample Agent Trace

Agent Runtime — External

Your infrastructure
Steps: 01 · 02 · 03 · 04 · 05 · 06 · 07 · 08
Authenticated CUA Client · REST / gRPC

Sandbox Environment

Isolated · Snapshotted

Step 01 · Outlook

Read client engagement letter, extract deal parameters

Application Layer

Microsoft Outlook
Microsoft Word
iManage
Virtual Data Room

Operating System

Windows 11 Pro · macOS Sonoma

Virtual Machine

Reproducible · Reset on demand

Deploy At Scale

Thousands of sandboxed environments. On demand.

Each environment is a fully provisioned Windows or macOS VM with up to 10 applications installed, a complete synthetic data room, and verifiable ground truth. Spin up hundreds in parallel for training runs, or deploy continuously for post-training evaluation loops.

  • Identical environments — every VM is provisioned from the same snapshot for reproducible evaluation
  • Parallel execution — run hundreds of agents simultaneously across independent sandboxes
  • Continuous evaluation — integrate into post-training loops for ongoing model improvement
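The parallel-execution pattern described above can be sketched with a bounded worker pool. This is a minimal illustration, not the platform's client: `run_episode` is a hypothetical stand-in that does no real work, and the environment IDs and the concurrency limit are made up.

```python
import asyncio
from dataclasses import dataclass


@dataclass
class EpisodeResult:
    env_id: str
    passed: bool


async def run_episode(env_id: str) -> EpisodeResult:
    """Hypothetical stand-in for: restore snapshot, run agent, verify result."""
    await asyncio.sleep(0)  # placeholder for real provisioning and agent I/O
    return EpisodeResult(env_id=env_id, passed=True)


async def run_fleet(env_ids: list[str], max_parallel: int) -> list[EpisodeResult]:
    """Run one episode per environment, at most max_parallel at a time."""
    gate = asyncio.Semaphore(max_parallel)

    async def guarded(env_id: str) -> EpisodeResult:
        async with gate:
            return await run_episode(env_id)

    # gather returns results in the same order as env_ids.
    return await asyncio.gather(*(guarded(e) for e in env_ids))


env_ids = [f"ENV-{i:04d}" for i in range(200)]
results = asyncio.run(run_fleet(env_ids, max_parallel=50))
pass_rate = sum(r.passed for r in results) / len(results)
print(f"{len(results)} episodes, pass rate {pass_rate:.0%}")
```

Because every sandbox starts from the same snapshot, episodes are independent of one another, which is what makes a simple fan-out like this safe.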

Environment Fleet

Envs Deployed

2,847

Docs Processed

1,243,600

Compute Hours

4,218

Active Now

1,849

ENV-0847 · running · Apps 8 · Docs 12,400 · OS Win 11 · Uptime 14m
ENV-1203 · running · Apps 10 · Docs 8,720 · OS macOS · Uptime 33m
ENV-0412 · ready · Apps 8 · Docs 15,100 · OS Win 11 · Uptime 0m
ENV-2156 · running · Apps 10 · Docs 9,340 · OS macOS · Uptime 8m
ENV-1891 · provisioning · Apps 8 · Docs 11,800 · OS Win 11 · Uptime 0m
ENV-0098 · ready · Apps 10 · Docs 7,960 · OS macOS · Uptime 0m
ENV-1547 · running · Apps 8 · Docs 13,200 · OS Win 11 · Uptime 21m
ENV-2634 · running · Apps 10 · Docs 10,500 · OS macOS · Uptime 35m
ENV-0331 · ready · Apps 8 · Docs 6,414 · OS Win 11 · Uptime 0m
ENV-1762 · running · Apps 10 · Docs 14,512 · OS Win 11 · Uptime 11m
ENV-2401 · provisioning · Apps 8 · Docs 5,330 · OS macOS · Uptime 0m
ENV-0673 · running · Apps 10 · Docs 7,155 · OS macOS · Uptime 18m

+ 1,000s more — identical snapshots, deployed on demand

Numbers shown are illustrative

The Scaling Law

More Environments Produce Better Models

In the results cited below, model performance improves as the number of diverse, verifiable, long-horizon training environments grows, which suggests that environment count is a major lever on downstream capability.

Y-axis: Average Ranking (lower is better)

Data from Qwen 3.5 Technical Report (Alibaba, 2026). Average ranking computed across BFCL-V4, VITA-Bench, DeepPlanning, Tool-Decathlon, and MCP-Mark.
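For readers unfamiliar with the metric, one common way to compute an average ranking is to rank the models on each benchmark (1 is best) and then take the mean rank per model. The sketch below assumes that definition, assumes higher scores are better, and assumes every model has a score on every benchmark; the cited report may handle ties or missing scores differently. The model names, benchmark names, and scores are placeholders, not reported results.

```python
def average_ranking(scores: dict[str, dict[str, float]]) -> dict[str, float]:
    """scores[benchmark][model] is a score, higher is better.

    Returns each model's mean rank across benchmarks (1 is best).
    Ties are broken by insertion order, which is a simplification.
    """
    totals: dict[str, float] = {}
    for per_model in scores.values():
        ordered = sorted(per_model, key=per_model.get, reverse=True)
        for rank, model in enumerate(ordered, start=1):
            totals[model] = totals.get(model, 0.0) + rank
    n_benchmarks = len(scores)
    return {model: total / n_benchmarks for model, total in totals.items()}


# Placeholder scores for three hypothetical models on two hypothetical benchmarks.
scores = {
    "bench_1": {"model_a": 61.0, "model_b": 58.0, "model_c": 70.0},
    "bench_2": {"model_a": 40.0, "model_b": 45.0, "model_c": 50.0},
}
avg = average_ranking(scores)
print(avg)  # {'model_c': 1.0, 'model_a': 2.5, 'model_b': 2.5}
```

A lower average rank means the model placed nearer the top more consistently, which is why the y-axis reads "lower is better".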

The deepest environments.
The longest horizons.
The strongest models.

Request access to our catalog, or tell us what you need built.