Synthetic data rooms for finance

Verified Environments For PE & Banking And Post-Training Loops

The Synthetic Data Company builds fully synthetic data rooms for training computer use agents on finance workflows. Our 500+ environments span the full deal lifecycle — from origination through post-close — with thousands of grounded documents per data room: financial models, memos, presentations, and compliance packages.

500+

Verifiable environments

7

Deal stages

Days → Weeks+

Task horizon

Ontology-Based Generation

Every Document Is Grounded In Facts

We generate fully synthetic data rooms by building and maintaining an ontology of facts, relations, and constraints for each synthetic company across a multi-year timeline. Every document is cross-referenced and internally consistent.
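As a rough illustration of what grounding documents in an ontology can mean in practice, the sketch below stores facts keyed by entity, attribute, and period, rejects contradictory facts, and checks a document's claims against the store. All names here (`Fact`, `Ontology`, `check_document`, `ExampleCo`) are hypothetical and chosen for illustration; this is not a description of the actual generation pipeline.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Tuple


@dataclass(frozen=True)
class Fact:
    entity: str      # hypothetical synthetic company name
    attribute: str   # e.g. "revenue"
    period: str      # e.g. "FY2023"
    value: float


class Ontology:
    """Toy fact store: one value per (entity, attribute, period)."""

    def __init__(self) -> None:
        self._facts: Dict[Tuple[str, str, str], Fact] = {}

    def add(self, fact: Fact) -> None:
        key = (fact.entity, fact.attribute, fact.period)
        existing = self._facts.get(key)
        # Constraint: a fact may be restated, but never contradicted.
        if existing is not None and existing.value != fact.value:
            raise ValueError(f"conflicting fact for {key}")
        self._facts[key] = fact

    def get(self, entity: str, attribute: str, period: str) -> Optional[float]:
        fact = self._facts.get((entity, attribute, period))
        return None if fact is None else fact.value


def check_document(ontology: Ontology, claims: List[Fact]) -> List[Fact]:
    """Return the claims that are ungrounded or disagree with the ontology."""
    mismatches = []
    for claim in claims:
        truth = ontology.get(claim.entity, claim.attribute, claim.period)
        if truth is None or abs(truth - claim.value) > 1e-9:
            mismatches.append(claim)
    return mismatches
```

In this sketch, a memo and a financial model that both cite FY2023 revenue for the same company would be checked against the same stored fact, so a mismatch in either shows up as a returned claim.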

Documents per data room
10,000+

Financial statements, models, memos, and more

Document types
50+

PDFs, Excel, Word, PowerPoint, and email formats

Timeline coverage
10+ years

Full company history with consistent facts

Computer Use Agents

Agents Navigate Real Applications

Every environment is a full desktop — Windows or macOS — with real applications installed. Agents interact with the same tools that analysts and associates use every day. No simplified APIs. No toy interfaces.

Regulatory

Filing Systems

SEC filings, compliance submissions, regulatory forms

Spreadsheet

Microsoft Excel

Financial models, sensitivity analysis, cap tables

Email

Microsoft Outlook

Deal correspondence, internal memos, client updates

Presentations

Microsoft PowerPoint

Management decks, pitchbooks, IC presentations

Due Diligence

Virtual Data Room

Document review, Q&A workflows, access management

Market Data

Bloomberg Terminal

Comparable analysis, market data, precedent transactions

CRM

Salesforce / DealCloud

Pipeline tracking, deal management, relationship mapping

DMS

Document Management

Version control, filing, document organization

Sample Agent Trace

Agent Runtime — External

Your infrastructure
Authenticated CUA Client · REST / gRPC

Sandbox Environment

Isolated · Snapshotted

Step 01 · Data Room

Navigate to financial folder, locate Q3 income statement

Application Layer

Microsoft Excel
Microsoft Outlook
Microsoft PowerPoint
Virtual Data Room

Operating System

Windows 11 Pro · macOS Sonoma

Virtual Machine

Reproducible · Reset on demand

Deploy At Scale

Thousands of sandboxed environments. On demand.

Each environment is a fully provisioned Windows or macOS VM with 8 applications installed, a complete synthetic data room, and verifiable ground truth. Spin up hundreds in parallel for training runs, or deploy continuously for post-training evaluation loops.

  • Identical environments — every VM is provisioned from the same snapshot for reproducible evaluation
  • Parallel execution — run hundreds of agents simultaneously across independent sandboxes
  • Continuous evaluation — integrate into post-training loops for ongoing model improvement
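The pattern in the list above can be sketched locally with a thread pool. This is a simulation only: `provision`, `run_episode`, and `run_fleet` are hypothetical stand-ins rather than a real client API, and the snapshot ID is a placeholder.

```python
from concurrent.futures import ThreadPoolExecutor
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class Sandbox:
    env_id: str
    snapshot_id: str
    state: Dict[str, int] = field(default_factory=dict)


def provision(env_id: str, snapshot_id: str) -> Sandbox:
    # Hypothetical stand-in for a provisioning call: every sandbox starts
    # from the same snapshot, so the initial state is identical.
    return Sandbox(env_id, snapshot_id, state={"files_opened": 0})


def run_episode(sandbox: Sandbox) -> Dict[str, object]:
    # Placeholder for an agent rollout; it only touches its own sandbox.
    sandbox.state["files_opened"] += 1
    return {
        "env_id": sandbox.env_id,
        "snapshot_id": sandbox.snapshot_id,
        "files_opened": sandbox.state["files_opened"],
    }


def run_fleet(snapshot_id: str, n_envs: int, max_workers: int = 8) -> List[Dict[str, object]]:
    sandboxes = [provision(f"ENV-{i:04d}", snapshot_id) for i in range(n_envs)]
    # Each episode runs against an independent sandbox, so no state is shared.
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return list(pool.map(run_episode, sandboxes))
```

Because each sandbox holds its own state, results from one episode cannot leak into another, which is the property that makes parallel runs comparable.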

Environment Fleet

Envs Deployed

2,847

Docs Processed

1,243,600

Compute Hours

4,218

Active Now

1,849

Environment  Status        Apps  Docs    OS      Uptime
ENV-0847     running       8     12,400  Win 11  14m
ENV-1203     running       10    8,720   macOS   33m
ENV-0412     ready         8     15,100  Win 11  0m
ENV-2156     running       10    9,340   macOS   8m
ENV-1891     provisioning  8     11,800  Win 11  0m
ENV-0098     ready         10    7,960   macOS   0m
ENV-1547     running       8     13,200  Win 11  21m
ENV-2634     running       10    10,500  macOS   35m
ENV-0331     ready         8     6,414   Win 11  0m
ENV-1762     running       10    14,512  Win 11  11m
ENV-2401     provisioning  8     5,330   macOS   0m
ENV-0673     running       10    7,155   macOS   18m

+ 1,000s more — identical snapshots, deployed on demand

Numbers shown are illustrative

The Scaling Law

More Environments Produce Better Models

Model performance improves as the number of diverse, verifiable long-horizon training environments grows, which makes environment count a primary lever for downstream capability gains.

Y-axis: Average Ranking (lower is better)

Data from Qwen 3.5 Technical Report (Alibaba, 2026). Average ranking computed across BFCL-V4, VITA-Bench, DeepPlanning, Tool-Decathlon, and MCP-Mark.
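For readers unfamiliar with the metric, one common way to compute an average ranking is to rank the models on each benchmark and then average each model's rank across benchmarks. The sketch below assumes higher scores are better and breaks ties by input order; the report's exact procedure is not described here, so treat this as an assumption. The function name and any inputs passed to it are hypothetical.

```python
from typing import Dict


def average_ranking(scores: Dict[str, Dict[str, float]]) -> Dict[str, float]:
    """scores maps benchmark -> {model: score}; higher score is better.

    Returns model -> mean rank across benchmarks (1 is best, lower is better).
    Assumes every benchmark reports a score for every model.
    """
    models = sorted(next(iter(scores.values())))
    totals = {model: 0.0 for model in models}
    for per_model in scores.values():
        # Rank 1 goes to the highest score; ties keep input order (an assumption).
        ordered = sorted(per_model, key=per_model.get, reverse=True)
        for rank, model in enumerate(ordered, start=1):
            totals[model] += rank
    n_benchmarks = len(scores)
    return {model: totals[model] / n_benchmarks for model in models}
```

A model that ranks first on one benchmark and third on another gets an average ranking of 2.0 under this scheme.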

The deepest environments.
The longest horizons.
The strongest models.

Request access to our catalog, or tell us what you need built.