Private codebases for frontier AI

Production Code. Decision Context.

We acquire private production codebases and the operational artifacts around them — tickets, PRDs, architecture docs, support threads, postmortems. Not just what was built, but why.

Shutting down or sunsetting a product?

Your code is more valuable than you think.

The systems your team spent years building — the COBOL mainframe, the internal platform, the monolith that ran the business — didn't stop being valuable just because the company moved on. AI labs are paying for exactly this kind of code: real, battle-tested, full of the engineering judgment that only comes from shipping under real constraints.

Defunct companies

Your startup shut down but the code still exists. We turn years of engineering into a payout.

Legacy systems

Migrating off a legacy stack? The old system has training value that the new one never will.

What We Acquire

The Entire Software Engineering Process

A codebase alone is a snapshot. We acquire the full development lifecycle — the plans that shaped it, the collaboration that refined it, and the operational context that kept it running. This is how AI learns to build software the way enterprises actually do.

Planning & design

PRDs, RFCs, architecture decision records, design documents. The reasoning before the first line of code.

Collaboration & review

Code reviews, PR discussions, technical debates, team threads. How engineers negotiate trade-offs in practice.

Production codebases

Complete repositories with full commit history, branching patterns, and engineering decisions baked into every refactor.

Build & deploy systems

CI/CD pipelines, build scripts, deployment configurations, monitoring setups. How code gets to production.

Test suites & QA

Unit, integration, and end-to-end tests. The verification layer that defines what "correct" means.

Operations & incidents

Postmortems, runbooks, incident timelines, on-call rotations. What happens after deployment when things go wrong.

Backlog

  • Evaluate message queue options (PLAT-112 · RK)
  • Draft API versioning RFC (PLAT-108 · JL)
  • Rotate production secrets (SEC-155 · AT)
  • Audit third-party SDK licenses (PLAT-120 · JL)
  • Runbook for payment failover (OPS-44 · RK)
  • Dependency vulnerability scan (SEC-160 · AT)

To Do

  • PCI scope review (SEC-142 · AT)
  • Ledger service RFC (PLAT-89 · JL)
  • Webhook retry backoff (PAY-312 · MH)
  • Idempotency key migration (PAY-315 · RK)

In Progress

  • Adyen integration (PAY-301 · MH)
  • Reconciliation service (PAY-305 · RK)
  • Circuit breaker middleware (PLAT-95 · JL)

Done

  • Stripe load test (PAY-298 · MH)
  • Cost analysis memo (PAY-295 · RK)
  • Gateway interface spec (PAY-290 · JL)
  • Vendor evaluation matrix (PAY-287 · MH)
  • Throughput benchmark (PAY-283 · RK)
Project timeline (Jul–Nov): Discovery & audit, Vendor evaluation, Architecture RFC, RFC approved, Adyen integration, Stripe consumer path, Reconciliation svc, Ledger service, PCI compliance, Data migration, Integration testing, Load testing, Security review, Go-live, Rollout & monitoring, Runbook & handoff.

feat: migrate payment gateway to Adyen (+342 −89) · Approved

src/payments/gateway.ts — Lines 47-58

- const result = await stripe.charges.create(params);
- return normalizeStripeResponse(result);
+ const gateway = amount >= ENTERPRISE_THRESHOLD
+   ? adyenClient : stripeClient;
+ const result = await gateway.process(params);
+ return normalizeResponse(result, gateway.name);
SR

reviewer on gateway.ts:52

This retry logic needs a circuit breaker — we hit cascading failures last quarter without one.

Suggestion:

+ const breaker = new CircuitBreaker(gateway, { threshold: 5 });

MH

author replied

Good call. Added exponential backoff with jitter + breaker at 5 consecutive failures. Also pulled the threshold into config so ops can tune it without a deploy.
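The pattern the thread converges on (exponential backoff with jitter behind a circuit breaker) can be sketched as follows; the class, option names, and thresholds are hypothetical illustrations, not the project's actual code:

```javascript
// Sketch of the retry pattern discussed in the review above: a simple
// circuit breaker that opens after N consecutive failures, plus
// exponential backoff with full jitter for the retry delays.
class CircuitBreaker {
  constructor(fn, { threshold = 5, cooldownMs = 30000 } = {}) {
    this.fn = fn;
    this.threshold = threshold;   // consecutive failures before opening
    this.cooldownMs = cooldownMs; // how long the breaker stays open
    this.failures = 0;
    this.openedAt = null;
  }

  async call(...args) {
    if (this.openedAt !== null) {
      if (Date.now() - this.openedAt < this.cooldownMs) {
        throw new Error('circuit open');
      }
      this.openedAt = null; // half-open: allow one trial call
    }
    try {
      const result = await this.fn(...args);
      this.failures = 0; // success resets the counter
      return result;
    } catch (err) {
      this.failures += 1;
      if (this.failures >= this.threshold) this.openedAt = Date.now();
      throw err;
    }
  }
}

// Exponential backoff with full jitter: the ceiling doubles each attempt
// up to a cap, and the actual delay is a random fraction of that ceiling.
function backoffDelay(attempt, baseMs = 100, capMs = 10000) {
  const ceiling = Math.min(capMs, baseMs * 2 ** attempt);
  return Math.random() * ceiling;
}
```

Pulling `threshold` and `cooldownMs` into options mirrors the reply above: ops can tune the behavior without a deploy.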

#payments-migration · 4 members

SC

Sarah C.

10:02 AM

PCI scope review came back cleaner than expected. Adyen processing can live in an isolated VPC with no shared state.

MR

Marcus R.

10:15 AM

That simplifies things. Here's what the reconciliation flow looks like:

async function reconcile(stripeSettlement, adyenSettlement) {
  const combined = mergeByReference(stripeSettlement, adyenSettlement);
  const variance = computeVariance(combined, generalLedger);
  if (variance.amount > THRESHOLD) {
    await escalate(variance, { channel: '#payments-ops' });
  }
  return { settled: combined.length, variance };
}
JL

Jun L.

10:23 AM

Should we handle the case where Adyen settlement is delayed? Last time that happened we had phantom variance alerts for 6 hours.


SC

Sarah C.

10:31 AM

Yes — adding a settlement window buffer. If Adyen hasn't settled within 4 hours of expected time, we mark it pending instead of flagging variance.
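The buffer described in that last message can be sketched as a small classification helper; the field names and the four-hour constant are assumptions for illustration:

```javascript
// Sketch of the settlement-window buffer from the thread above: if a
// settlement hasn't arrived within the buffer after its expected time,
// classify it as pending rather than a variance, suppressing the kind of
// phantom alerts a delayed Adyen settlement would otherwise trigger.
const SETTLEMENT_BUFFER_MS = 4 * 60 * 60 * 1000; // assumed 4-hour window

function classifySettlement(entry, now = Date.now()) {
  if (entry.settledAt !== null && entry.settledAt !== undefined) {
    return 'settled';
  }
  // Not yet settled: only flag a variance once the buffer has elapsed.
  return now - entry.expectedAt > SETTLEMENT_BUFFER_MS
    ? 'variance'
    : 'pending';
}
```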

ACCT-PROC.cbl

COBOL

IDENTIFICATION DIVISION.
       PROGRAM-ID. ACCT-PROC.

       DATA DIVISION.
       WORKING-STORAGE SECTION.
       01 WS-ACCOUNT-REC.
          05 WS-ACCT-NUM        PIC X(10).
          05 WS-ACCT-NAME       PIC X(30).
          05 WS-ACCT-BAL        PIC S9(9)V99.
          05 WS-ACCT-TYPE       PIC X(02).
             88 SAVINGS         VALUE 'SA'.
             88 CHECKING        VALUE 'CH'.
             88 MONEY-MARKET    VALUE 'MM'.
       01 WS-TRANS-REC.
          05 WS-TRANS-TYPE      PIC X(02).
          05 WS-TRANS-AMT       PIC S9(9)V99.
       01 WS-NEW-BAL            PIC S9(9)V99.
       01 WS-EOF                PIC X(01) VALUE 'N'.

       PROCEDURE DIVISION.
       MAIN-PARA.
           PERFORM INIT-ACCT
           PERFORM PROCESS-TRANSACTIONS
              UNTIL WS-EOF = 'Y'
           PERFORM CLOSE-FILES
           STOP RUN.

       PROCESS-TRANSACTIONS.
           READ TRANS-FILE INTO WS-TRANS-REC
              AT END MOVE 'Y' TO WS-EOF
              NOT AT END
                 EVALUATE WS-TRANS-TYPE
                    WHEN 'CR' PERFORM CREDIT-ACCT
                    WHEN 'DB' PERFORM DEBIT-ACCT
                    WHEN 'XF' PERFORM TRANSFER-ACCT
                 END-EVALUATE
           END-READ.

Codebases

Beyond GitHub

The codebases AI labs need most aren't on the public internet. Legacy enterprise systems, niche domain-specific code, proprietary tooling — with real commit history, real engineering decisions, and real complexity. Not another JavaScript todo app.

  • Decades of accumulated logic: code spanning the 1970s through today. Multiple generations of architecture layered together — the kind of complexity AI needs to learn to navigate.
  • Underrepresented domains: healthcare systems, industrial control, financial infrastructure, telecom — verticals where AI training data barely exists.
  • Full operational context: not just source files — commit history, build systems, test suites, deployment scripts, and the engineering decisions that shaped them.
COBOL · Fortran · .NET Framework · Embedded C/C++ · Mainframe JCL · VHDL/Verilog · Ada · PL/SQL · ABAP · RPG · Delphi · PowerBuilder

Infrastructure Code

The Code That Runs The Infrastructure

Behind every production system is infrastructure code that defines how it's deployed, scaled, and maintained. Terraform modules, Kubernetes manifests, CI/CD pipelines, Helm charts — the operational DNA of real infrastructure that AI has barely seen.

  • Production-grade patterns: not toy examples — real multi-environment setups with networking, security groups, IAM policies, and monitoring. The patterns that take years to develop.
  • Evolution over time: infrastructure code with commit history showing how systems evolved — migrations, scaling decisions, incident-driven changes, cost optimizations.
  • Cross-tool complexity: real deployments span Terraform, Kubernetes, CI/CD, and monitoring. We source the full stack, not isolated snippets.

Infrastructure as Code

Terraform · CloudFormation · Pulumi · CDK · Bicep · Crossplane

Configuration Management

Ansible · Chef · Puppet · SaltStack

Container & Orchestration

Kubernetes manifests · Helm charts · Docker Compose · Kustomize

CI/CD Pipelines

Jenkins pipelines · GitHub Actions · GitLab CI · ArgoCD configs

infra/production/main.tf

Terraform

resource "aws_vpc" "production" {
  cidr_block           = var.vpc_cidr
  enable_dns_hostnames = true
  enable_dns_support   = true

  tags = merge(local.common_tags, {
    Name        = "${var.env}-vpc"
    Environment = var.env
    ManagedBy   = "terraform"
  })
}

resource "aws_subnet" "private" {
  count             = length(var.availability_zones)
  vpc_id            = aws_vpc.production.id
  cidr_block        = cidrsubnet(var.vpc_cidr, 4, count.index)
  availability_zone = var.availability_zones[count.index]

  tags = merge(local.common_tags, {
    Name = "${var.env}-private-${count.index}"
    Tier = "private"
  })
}

module "eks_cluster" {
  source = "./modules/eks"

  cluster_name    = "${var.env}-platform"
  vpc_id          = aws_vpc.production.id
  subnet_ids      = aws_subnet.private[*].id
  instance_types  = var.node_instance_types
  desired_size    = var.node_desired_count
  max_size        = var.node_max_count

  enable_irsa     = true
  cluster_version = "1.28"
}

Anonymization

The How, Not The Who

The point of enterprise artifacts for AI training isn't who made the decisions — it's how they made them. Our anonymization pipeline strips personally identifiable information while preserving the operational knowledge, decision points, and process patterns that make the data valuable.

Raw Artifact

Prepared by: Sarah Chen, VP Finance
Date: March 15, 2024
Approved by: james.wong@acmecorp.com
Entity: Meridian Holdings, Delaware
Revenue Q3: $4.2M (+12% YoY)
Decision: Proceed with Series B at $155M

Anonymization Pipeline

Anonymized Output

Prepared by: [PERSON_A], [TITLE_A]
Date: March 15, 2024
Approved by: [EMAIL_A]
Entity: [COMPANY_A], [STATE_A]
Revenue Q3: $4.2M (+12% YoY)
Decision: Proceed with Series B at $155M

What stays, what goes

Names, emails, company identifiers, and proprietary details are replaced with consistent anonymous tokens. The business logic, financial structures, decision rationale, and process flows remain intact — exactly what AI needs to learn how enterprises work.
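A minimal sketch of the consistent-token replacement step, assuming PII spans have already been detected upstream; the function and category names are hypothetical, not the actual pipeline:

```javascript
// Sketch of consistent pseudonymization: every occurrence of the same raw
// value maps to the same token, so cross-references inside a document
// survive anonymization. Detection of PII spans is assumed to happen
// upstream; this step only performs the replacement.
function makeAnonymizer() {
  const tokens = new Map();   // raw value -> stable token
  const counters = new Map(); // category -> next suffix index (A, B, ...)

  function tokenFor(value, category) {
    if (!tokens.has(value)) {
      const n = counters.get(category) || 0;
      counters.set(category, n + 1);
      const suffix = String.fromCharCode(65 + n); // 'A', 'B', 'C', ...
      tokens.set(value, `[${category}_${suffix}]`);
    }
    return tokens.get(value);
  }

  // entities: [{ value: 'Sarah Chen', category: 'PERSON' }, ...]
  function anonymize(text, entities) {
    let out = text;
    for (const { value, category } of entities) {
      out = out.split(value).join(tokenFor(value, category));
    }
    return out;
  }

  return { anonymize };
}
```

Because the token map persists across calls, the same raw value maps to the same token everywhere in a batch, which is what keeps decision trails readable after names are stripped.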

Private data. Verified environments.
Production-ready agents.

Tell us what you need — we will scope availability, anonymization, and pricing.