Compliance-aligned underwriting assistant for a working-capital lender — Custom tier
Client: Mid-market fintech (SMB working-capital lender, $50k–$1.5M lines)
Custom-tier build for a regulated lender that needed extraction reliable enough to feed a credit model and auditable enough to explain to examiners. An embedded pod owned model selection, retrieval, the eval suite, and the SOC2-aligned audit ledger. The system now pre-decisions 92% of applications with a fully cited reasoning trail for every field.
- 99.4% extraction accuracy on credit-model fields
- 11 hrs median time-to-decision (was 6.1 days)
- 0.2 pts default-rate delta vs. control (flat)
- 100% of decisions with field-level citations to the source PDF
Challenge
A working-capital lender funding $50k–$1.5M lines. Underwriting required human review of bank statements, tax returns, and AR aging reports, typically 40–60 pages per application. The average decision took 6.1 business days against competitors quoting 48 hours, losing roughly a third of qualified applicants. A previous document-AI vendor delivered 78% extraction accuracy, unusable as a credit-model input where misreading current-period revenue means a six-figure write-off.
The Pro tier wouldn't fit. They needed: (a) 99%+ extraction accuracy on the credit-model fields, (b) field-level citation back to source pages for the audit trail, (c) a kill-switch the chief credit officer could pull on any specific borrower segment, and (d) SOC2-aligned logging that bank examiners would accept. That meant Custom tier: an embedded pod, a 14-week build, and an 8-week parallel-run validation against the existing underwriters.
Approach
A document-classification layer (statement vs. tax return vs. AR aging) feeds field-specific extractors. Each extractor has its own eval set built from 3,200 historical applications hand-labeled by the senior underwriter. Every extracted field carries a confidence score and a citation to the source page. The credit model refuses to decision an application when any input field falls below threshold; those applications route to a human with the low-confidence fields flagged.
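The confidence gate above can be sketched in a few lines. This is an illustrative sketch, not the production code: the `ExtractedField` shape, the field names, and the 0.98 threshold are assumptions.

```python
from dataclasses import dataclass

@dataclass
class ExtractedField:
    name: str
    value: str
    confidence: float  # 0.0-1.0, reported by the extractor
    citation: str      # e.g. "bank_statement.pdf p.4" (hypothetical format)

CONFIDENCE_THRESHOLD = 0.98  # assumed per-field gate

def route_application(fields: list[ExtractedField]) -> dict:
    """Auto-decision only when every credit-model input clears the gate;
    otherwise route to a human with the weak fields flagged."""
    low = [f for f in fields if f.confidence < CONFIDENCE_THRESHOLD]
    if low:
        return {"route": "human_review",
                "flagged_fields": [f.name for f in low]}
    return {"route": "auto_decision"}
```

The key design point is that the gate is per field, not per document: one shaky number is enough to pull the whole application out of the automated path.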
Audit ledger sits on Postgres with append-only writes; every decision logs the model version, prompt hash, retrieved context, and field-level citations. Examiners can replay any decision deterministically.
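A ledger row of the kind described above might be assembled like this. A minimal sketch under assumptions: the column names and the choice of SHA-256 are ours, not the client schema.

```python
import hashlib
import json

def ledger_record(model_version: str, prompt: str,
                  retrieved_context: list[str],
                  citations: dict[str, str], decision: str) -> dict:
    """Build one append-only row. Hashing the prompt and the retrieved
    context lets an examiner verify a replay used byte-identical inputs."""
    return {
        "model_version": model_version,
        "prompt_hash": hashlib.sha256(prompt.encode()).hexdigest(),
        "context_hash": hashlib.sha256(
            json.dumps(retrieved_context, sort_keys=True).encode()
        ).hexdigest(),
        "citations": citations,  # field name -> source page
        "decision": decision,
    }
```

On the Postgres side, "append-only" is typically enforced by granting the application role INSERT but revoking UPDATE and DELETE on the ledger table, so a row can never be silently altered after the fact.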
We ran the system in parallel with the human underwriting team for eight weeks. The system's decision was logged but not binding; weekly reconciliation reviewed every disagreement. Those reviews produced 31 material prompt and schema adjustments before flipping to binding mode.
Outcome
92% of applications pre-decisioned without human review. Median time-to-decision: 11 working hours (was 6.1 business days). Default rate within 0.2 pts of historical control — statistically indistinguishable. Application-to-funded conversion up 34%. Eleven of fourteen underwriters redeployed to portfolio monitoring. The audit ledger was accepted by their primary regulator on first review; the field-level citation trail meant no examiner ever asked us to explain a decision twice. Custom-tier engagement transitioned to a $12k/mo on-call retainer at month 6.
Stack
- Claude Opus (extraction + reasoning)
- Custom document classifier (in-house)
- Append-only decision ledger on Postgres
- Airflow + Kafka orchestration
- Field-level eval harness (3,200 labeled apps)
Working on something similar?
A partner will respond personally within one business day. If there isn't a fit, we'll tell you so and point you somewhere better.