How to Extract Data From ACORD Forms (125, 126, 130, 140) Without Re-Typing Them

ACORD forms arrive half-filled and get re-typed by hand. Here's how agencies extract structured data from ACORDs, dec pages, and loss runs instead.

June 10, 2026

Every commercial submission starts the same way: an ACORD lands in the inbox, and someone opens it next to another screen and starts typing.

One agency operator described the workflow to us plainly: "We go through, strip the ACORD, figure out what data exists, what doesn't exist, and then you're trying to map it to other records." Strip is the right word. It's manual extraction, line by line, and it happens thousands of times a year in a busy shop.

This guide covers what's actually on the common ACORD forms, why they're harder to extract from than they look, and the options for getting structured data out of them without the typing.

The forms that eat the most time

| Form | What it is | Why it's painful |
| --- | --- | --- |
| ACORD 125 | Commercial insurance application | The anchor form. Insured info, locations, prior carriers. Arrives half-filled more often than not. |
| ACORD 126 | General liability section | Classifications, exposures, subcontractor questions. The answers drive the quote, and they're often missing. |
| ACORD 130 | Workers' comp application | Payroll by class code, owner/officer info, rating worksheets. Numbers that must be exact. |
| ACORD 140 | Property section | Building details, construction, protection class, valuation. Pairs with dec pages that disagree with it. |

Add dec pages and loss runs from prior carriers, each in its own layout, and a single commercial submission can be 15+ pages of source documents for one risk.

Why ACORD extraction is harder than it looks

ACORDs are standardized forms. That should make them easy to read automatically. Three things break that assumption:

They arrive incomplete. The form you receive was filled out by a producer in a hurry, a client guessing, or a wholesaler forwarding what they got. Empty fields are information too ("nobody knows the payroll split yet"), and a naive extraction treats blank as zero.
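To make the blank-versus-zero point concrete, here is a minimal Python sketch. The `ExtractedField` type, the `parse_amount` helper, and the `annual_payroll` field name are hypothetical, invented for illustration; they are not part of any ACORD standard or vendor API. The sketch assumes the input is either blank or a plain dollar amount.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class ExtractedField:
    name: str
    raw_text: str           # exactly what was read off the form
    value: Optional[float]  # None means "blank on the source", never 0


def parse_amount(name: str, raw_text: str) -> ExtractedField:
    """Parse a dollar amount, keeping 'blank' distinct from 'zero'."""
    cleaned = raw_text.strip().replace("$", "").replace(",", "")
    if cleaned == "":
        # Nothing was written: record the absence instead of inventing 0.
        return ExtractedField(name, raw_text, None)
    # Assumption: anything non-blank is a plain number; float() raises otherwise.
    return ExtractedField(name, raw_text, float(cleaned))


blank = parse_amount("annual_payroll", "   ")
zero = parse_amount("annual_payroll", "$0")
filled = parse_amount("annual_payroll", "$125,000")

for f in (blank, zero, filled):
    label = "BLANK (ask the producer)" if f.value is None else f"{f.value:,.0f}"
    print(f"{f.name}: {label}")
```

A blank payroll then surfaces as a question for the producer, while a written "$0" stays a stated fact.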

They arrive as scans and photos. A crisp digital PDF is the lucky case. The normal case is a scan of a printout, sometimes with handwriting in the margins. Template-based extraction tools that expect fields in exact pixel positions fall over here.

The same fact appears in three places and disagrees. Year built is 1987 on the ACORD 140, 1989 on the dec page, and 1987 in the county record. A human catches that. An extraction tool that just grabs the first match passes the conflict downstream, where it becomes a carrier question or a mis-rated quote.

That third one is the real trap. The cost of bad extraction isn't a blank field. It's a wrong value that looks right.
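One way to avoid the wrong-value-that-looks-right trap is to compare every reading of a fact before accepting any of them. The Python sketch below uses the year-built example; the `Reading` type, the `reconcile` function, and the status strings are hypothetical names chosen for illustration, not a description of how any particular product works.

```python
from typing import NamedTuple


class Reading(NamedTuple):
    source: str  # which document the value came from
    value: str   # the value as read


def reconcile(field_name: str, readings: list) -> dict:
    """Accept a value only when every source agrees; otherwise flag it."""
    distinct = {r.value for r in readings}
    if len(distinct) == 1:
        return {"field": field_name, "value": distinct.pop(), "status": "ok"}
    # Sources disagree (or there are none): keep all readings for the
    # reviewer and do not pick a winner, not even the majority value.
    return {
        "field": field_name,
        "value": None,
        "status": "needs_review",
        "readings": [(r.source, r.value) for r in readings],
    }


result = reconcile("year_built", [
    Reading("ACORD 140", "1987"),
    Reading("dec page", "1989"),
    Reading("county record", "1987"),
])
print(result["status"], result["readings"])
```

The sketch deliberately refuses to vote: two sources saying 1987 does not make 1989 wrong, so the conflict goes to a person with all three readings attached.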

The extraction options

Re-typing by hand (the default)

Free, accurate when people are fresh, and brutally slow. A 6-page dec page plus an ACORD 125/126 set is 30 to 60 minutes of transcription per risk. Error rates climb in the afternoon. This is the baseline everything else gets measured against.

Sending it offshore

A virtual assistant (VA) team types it for $6 an hour. That's cheaper per hour, but the elapsed time is the same, and quality control becomes your job. One agency told us their VA-filled ACORDs come back with made-up values where the source was blank, because "blank" wasn't an option in the instructions. You cannot QC what you didn't read yourself.

Generic OCR and document AI

General-purpose tools (the document AI features inside the big cloud platforms, or generic PDF-to-text tools) read the words fine. They don't know insurance. They'll extract "1987" without knowing it's a roof year vs. a year built, and they have no opinion about whether a payroll figure is plausible for a class code. You get text out, then you build the insurance logic yourself.

Insurance-native extraction

Purpose-built tools know what an ACORD 126 contains and what a dec page looks like across carriers. This is what Relay's Document Parsing does: drop in ACORDs, dec pages, driver's licenses, or loss runs, and structured fields come out, mapped to what carriers actually ask.

The design decision that matters most: flag, don't guess. Every extracted field carries a confidence check. A shaky read (handwriting, a low-quality scan, two documents that disagree) gets marked "Needs review" and routed to a human instead of silently passed along as fact. Bad inputs slow down. They don't slip through.
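As a rough illustration of "flag, don't guess," the sketch below routes each field by a confidence score. Everything in it is an assumption for the sake of the example: the `FieldRead` type, the 0.0 to 1.0 score, and the 0.90 threshold are hypothetical and do not describe Relay's actual implementation or API.

```python
from dataclasses import dataclass

# Assumption: an illustrative cutoff; a real threshold would be tuned per field.
REVIEW_THRESHOLD = 0.90


@dataclass
class FieldRead:
    name: str
    value: str
    confidence: float  # hypothetical 0.0-1.0 score from the extractor


def route(reads):
    """Split reads into accepted values and ones a human must check."""
    accepted, needs_review = [], []
    for read in reads:
        if read.confidence >= REVIEW_THRESHOLD:
            accepted.append(read)
        else:
            # Shaky read: keep the value as a suggestion, but never as fact.
            needs_review.append(read)
    return accepted, needs_review


reads = [
    FieldRead("insured_name", "Example Roofing LLC", 0.99),
    FieldRead("year_built", "1987", 0.62),   # e.g. handwriting in a margin
    FieldRead("protection_class", "4", 0.91),
]
accepted, needs_review = route(reads)
print("accepted:", [r.name for r in accepted])
print("needs review:", [r.name for r in needs_review])
```

The important property is that a low-confidence value is never dropped and never promoted; it stays visible to the reviewer alongside its flag.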

What the workflow looks like after

With extraction handled, the commercial intake flow inverts. Instead of "read everything, type everything, then think," it becomes:

  1. Documents come in (email, intake form upload, wherever)
  2. Parsing turns them into structured fields in seconds
  3. Your team reviews the flagged fields only: the conflicts and the gaps
  4. The verified record feeds quoting directly
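Steps 3 and 4 above amount to a gate: the record feeds quoting only once every flagged field has been resolved by a person. A minimal Python sketch of that gate follows, with hypothetical names (`IntakeField`, `review_queue`, `resolve`, `ready_for_quote`) that are illustrative and not taken from any real product.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class IntakeField:
    value: Optional[str]
    status: str  # "ok" or "needs_review"


def review_queue(record: Dict[str, IntakeField]) -> List[str]:
    """Names of the fields a human still has to look at."""
    return sorted(n for n, f in record.items() if f.status == "needs_review")


def resolve(record: Dict[str, IntakeField], name: str, verified: str) -> None:
    """A reviewer confirms or corrects one flagged field."""
    record[name] = IntakeField(verified, "ok")


def ready_for_quote(record: Dict[str, IntakeField]) -> bool:
    """The record feeds quoting only when nothing is left to review."""
    return not review_queue(record)


record = {
    "insured_name": IntakeField("Example Roofing LLC", "ok"),
    "year_built": IntakeField(None, "needs_review"),  # sources disagreed
}
print(review_queue(record), ready_for_quote(record))
resolve(record, "year_built", "1987")
print(review_queue(record), ready_for_quote(record))
```

The reviewer touches one field here instead of re-reading the whole submission, which is the inversion the workflow describes.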

The human time moves from transcription to judgment, which is the part that needed a licensed person in the first place. As one agency principal put it: "We don't want them doing data entry because it's a waste of their time."

Key Takeaways

  • ACORD extraction is hard because the forms arrive incomplete, scanned, and self-contradictory, not because the text is hard to read.
  • The dangerous failure mode is a wrong value that looks right. Extraction without confidence flags moves errors downstream instead of catching them.
  • Generic OCR reads words; insurance-native parsing knows what the words mean and which carrier fields they map to.
  • Look for "flag, don't guess" behavior: uncertain reads should be routed to a human, never silently filled.
  • Relay's Document Parsing turns ACORDs, dec pages, and loss runs into quote-ready fields, with shaky reads flagged for review.

Buried in ACORDs? Send us a sample submission and we'll show you what parsing returns on your actual documents.
