Coding Agents in Finance

Most finance teams have basic questions they can't answer quickly, and where they're making and losing money is usually one of them. The math is trivial. It's hard because the answer lives in systems that don't agree, and lining them up by hand takes days and produces a number that is stale by Monday. The agent that could reason its way across all of those systems on its own, and the connector ecosystem that would feed it, aren't quite here yet. What is here is more boring, and for now more useful: a coding agent that writes the code to line them up for you.

By coding agent I mean the kind you describe a task to in plain language and that then writes and runs the code to do it. That sounds narrow, because writing code sounds like an engineering job. But in finance, most of the work was never the reasoning you were hired for. It's the work in front of the reasoning: pulling data from a few places, cleaning it, reconciling it, and getting it into a shape where the question becomes easy to answer. A coding agent is good at all of that, and that is where the time goes.

Why the simple questions are hard

Take the margin question. Where do you make and lose money across your product lines? The arithmetic is nothing. The problem is that the inputs are scattered, and they don't agree with each other. Billing knows what customers used. The general ledger knows what was recognized. The CRM knows the contract and the discount. A cost system knows what it took to serve. Payroll knows who's loaded against which team. Each system holds one piece of the answer, and they don't share a common key. The customer in the billing system is not the account in the CRM is not the cost center in the ledger. To get one clean margin number, someone has to line all of those up, and that lining-up is the real work.

In most teams that work lives in a spreadsheet one person maintains by hand, with a tab of lookups nobody else understands. It takes a few days, it goes stale almost immediately, and it breaks whenever something upstream changes. So the honest answer gets produced rarely, and most of the time people decide off whatever they could assemble in an hour.
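
Written as code, that tab of lookups is a crosswalk table: one row per customer, linking the identifier each system uses. Here is a minimal sketch of the idea in Python. The identifiers, field names, and the `resolve_billing_ids` helper are all invented for illustration; a real crosswalk would be built from your own systems.

```python
from dataclasses import dataclass


# Hypothetical crosswalk row: the same customer as each system names it.
@dataclass(frozen=True)
class CrosswalkRow:
    billing_customer_id: str
    crm_account_id: str
    ledger_cost_center: str


# Invented sample data, standing in for the hand-maintained lookup tab.
CROSSWALK = [
    CrosswalkRow("CUST-001", "ACC-9001", "CC-110"),
    CrosswalkRow("CUST-002", "ACC-9002", "CC-120"),
]


def resolve_billing_ids(billing_ids, crosswalk):
    """Split billing IDs into those with a crosswalk row and those without."""
    by_billing_id = {row.billing_customer_id: row for row in crosswalk}
    matched = {}
    unmatched = []
    for billing_id in billing_ids:
        row = by_billing_id.get(billing_id)
        if row is None:
            unmatched.append(billing_id)  # no known CRM account or cost center
        else:
            matched[billing_id] = row
    return matched, unmatched


matched, unmatched = resolve_billing_ids(["CUST-001", "CUST-003"], CROSSWALK)
print(sorted(matched))  # billing IDs that can be joined to CRM and ledger
print(unmatched)        # billing IDs someone needs to look at
```

The useful part is the second list. A spreadsheet lookup tends to drop or blank out the rows that don't match; code can hand them back to you by name.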

Build the pipeline, don't ask for the answer

You can hand the whole question to a reasoning agent and get an answer back in a minute. The problem is that it worked across systems it doesn't really understand, and you have no way to check the number it gives you. For a quick gut-check that's fine. For anything that goes in a board deck or a budget decision, it isn't, because you can't show your work and you can't reproduce it next quarter.

The better use is to have the agent build the pipeline instead of answer the question. It pulls each source, reconciles the keys, and stages one clean table you can rerun and read. The agent writes the code. You still decide what the numbers mean, which is the part that was always your job. The difference is bigger than it sounds. One gives you an answer that disappears the moment you close the chat. The other gives you a small piece of software that runs again next quarter the same way, that someone else can open and read, and that you can fix when the business changes.
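
To make "pull, reconcile, stage" concrete, here is a rough sketch of the shape such a pipeline might take. Everything in it is an assumption for illustration: the two `pull_*` functions stand in for real extracts, the rows are made up, and `ACCOUNT_TO_CUSTOMER` is a hypothetical key mapping. A real pipeline would have more sources and messier keys.

```python
from collections import defaultdict


def pull_billing():
    # Stand-in for a real billing extract; rows are invented.
    return [
        {"customer": "CUST-001", "product_line": "Core", "revenue": 1200.0},
        {"customer": "CUST-002", "product_line": "Core", "revenue": 800.0},
        {"customer": "CUST-002", "product_line": "Add-ons", "revenue": 300.0},
    ]


def pull_costs():
    # Stand-in for a cost system that uses its own account IDs.
    return [
        {"account": "ACC-9001", "product_line": "Core", "cost": 700.0},
        {"account": "ACC-9002", "product_line": "Core", "cost": 650.0},
        {"account": "ACC-9002", "product_line": "Add-ons", "cost": 100.0},
    ]


# Hypothetical mapping from cost-system accounts to billing customers.
ACCOUNT_TO_CUSTOMER = {"ACC-9001": "CUST-001", "ACC-9002": "CUST-002"}


def stage_margin_table(billing_rows, cost_rows, account_to_customer):
    """Return one row per (customer, product line) with revenue, cost, margin."""
    totals = defaultdict(lambda: {"revenue": 0.0, "cost": 0.0})
    for row in billing_rows:
        totals[(row["customer"], row["product_line"])]["revenue"] += row["revenue"]
    for row in cost_rows:
        # Reconcile the key; an unmapped account raises KeyError on purpose.
        customer = account_to_customer[row["account"]]
        totals[(customer, row["product_line"])]["cost"] += row["cost"]
    return [
        {
            "customer": customer,
            "product_line": product_line,
            "revenue": values["revenue"],
            "cost": values["cost"],
            "margin": values["revenue"] - values["cost"],
        }
        for (customer, product_line), values in sorted(totals.items())
    ]


table = stage_margin_table(pull_billing(), pull_costs(), ACCOUNT_TO_CUSTOMER)
for row in table:
    print(row)
```

Nothing here is clever. The value is that the steps are written down, so the same inputs give the same table next quarter, and someone else can read how the number was made.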

This is just data engineering, and that is the point, not a caveat. Pulling, cleaning, reconciling, and staging data is exactly what coding agents are good at, and it used to require a data team, a ticket, and a wait.

Who can build it now

What changed is who can do the work. The old bottleneck wasn't really the queue. It was translation. The analyst who knows the reconciliation logic thinks in margins, product lines, and which product rolls into which segment. The data engineer who can write the pipeline thinks in schemas, grain, and joins. Neither is fully fluent in the other's language, so the analyst writes a spec, the engineer builds something subtly wrong, the analyst can tell the number is off but can't say why, and three weeks later you're on the next round. A coding agent removes the queue and the translation at once, because it speaks both languages. The person who understands the problem can build against output she recognizes, notice that one region looks doubled, and fix it on the spot.

Getting it connected

There's an unglamorous part of this I skipped over, which is actually reaching the systems. Every source has its own API and its own way of letting you in, whether that's an API key, an OAuth flow, a service account, or single sign-on. Wiring all of that up is real work.

A few companies have started putting their internal systems behind a single connector layer for agents, often built on MCP. For an agent doing interactive work, that helps a lot. For a pipeline, it isn't quite the shape you want. You don't really want every run of a production number routed through an agent and a gateway, with a reasoning step sitting in the path. You want the pipeline to authenticate and pull from each source directly, and you want to trigger the whole thing with one command, on a schedule or whenever you need it.

That's the cleaner way to think about it. The agent helps you build the pipeline. The pipeline runs on its own. Once it's built, you get the number by running a command, and the agent doesn't need to be in the loop at all.
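
A sketch of what "running a command" could look like, assuming a Python script that reads each source's credential from the environment and takes the period as an argument. The environment variable names, the `--period` flag, and `run_pipeline` are hypothetical placeholders, not a prescribed interface.

```python
import argparse

# Hypothetical variable names; use whatever your secrets setup provides.
REQUIRED_ENV = ("BILLING_API_KEY", "LEDGER_API_KEY")


def load_credentials(env):
    """Read each source's credential from the environment, failing loudly."""
    missing = [name for name in REQUIRED_ENV if not env.get(name)]
    if missing:
        raise RuntimeError("missing credentials: " + ", ".join(missing))
    return {name: env[name] for name in REQUIRED_ENV}


def run_pipeline(period, credentials):
    # Placeholder for the real pull, reconcile, and stage steps.
    return {"period": period, "sources": sorted(credentials)}


def main(argv, env):
    parser = argparse.ArgumentParser(description="Rebuild the margin table.")
    parser.add_argument("--period", required=True, help="for example 2024-Q1")
    args = parser.parse_args(argv)
    result = run_pipeline(args.period, load_credentials(env))
    print(f"staged {result['period']} from {len(result['sources'])} sources")
    return 0


# Demo call with fake values. A real script would pass sys.argv[1:] and
# os.environ, and a scheduler would invoke it with no agent involved.
exit_code = main(
    ["--period", "2024-Q1"],
    {"BILLING_API_KEY": "fake-key", "LEDGER_API_KEY": "fake-key"},
)
```

Passing the arguments and environment into `main` instead of reading them globally is a small design choice that makes the entry point easy to test and easy to schedule.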

None of this is free, though. Building a pipeline this way still takes some systems thinking and a real feel for how your data fits together, from the auth down to the joins, even when the agent writes the actual code. You have to know what a clean answer should look like, where the keys are likely to mismatch, and which mismatch matters. Not everyone in finance clears that bar today, and pretending otherwise is how you end up with confident, wrong numbers. The skill is learnable, but it is a skill.

Making it trustworthy

The obvious worry is trust. Should someone who can't write a SQL join be shipping a pipeline an agent wrote? The honest answer is that the manual spreadsheet it replaces was never trustworthy either. It's undocumented, its logic lives in one person's memory, and it leaves the company when they do. Code you can read, review, version, and test is easier to trust, not harder. The work shifts from doing the reconciliation by hand to verifying that it's right, which is one of the oldest skills in finance. You don't re-derive every number. You design the controls and tie-outs that would catch an error, and you check those.

The way you make an agent-built pipeline trustworthy is to have the agent build the checks too, as part of the same job. The output has to tie to the general ledger within a tolerance, or the run fails. The row count can't move more than you'd expect from one period to the next without raising a flag. Every customer has to map to exactly one account, and the ones that don't get listed for you to look at. You can add a pass whose only job is to assume the answer is wrong and go looking for the broken join or the double count. Each run reports the state of its own checks, so you know what you're trusting before you forward the number. This kind of verification used to be too slow to do by hand, so it got skipped. An agent makes it cheap enough to be the default, which is part of why a well-checked pipeline can end up more reliable than the manual process it replaced. Anthropic's data team wrote about running their own analytics on Claude and landed in the same place: the reliability came from structure and checking, not a smarter prompt.
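
The three checks named above (tie-out to the ledger, row-count drift, one account per customer) are each a few lines of code. This is a minimal sketch, not a controls framework: the function names, the 0.01 tolerance, the 10% drift threshold, and the sample figures are all assumptions chosen for illustration.

```python
def check_ties_to_ledger(pipeline_total, ledger_total, tolerance=0.01):
    """Pass only if the staged total is within `tolerance` of the ledger."""
    return abs(pipeline_total - ledger_total) <= tolerance


def check_row_count(current_rows, prior_rows, max_change=0.10):
    """Pass if the row count moved by no more than `max_change` (a fraction)."""
    if prior_rows == 0:
        return current_rows == 0
    return abs(current_rows - prior_rows) / prior_rows <= max_change


def find_bad_mappings(customer_account_pairs):
    """Return customers that map to zero accounts or to more than one."""
    accounts = {}
    for customer, account in customer_account_pairs:
        accounts.setdefault(customer, set())
        if account is not None:
            accounts[customer].add(account)
    return sorted(c for c, accts in accounts.items() if len(accts) != 1)


def run_checks(pipeline_total, ledger_total, current_rows, prior_rows, pairs):
    """Run every check and return (report, customers needing review)."""
    bad = find_bad_mappings(pairs)
    report = {
        "ties_to_ledger": check_ties_to_ledger(pipeline_total, ledger_total),
        "row_count_stable": check_row_count(current_rows, prior_rows),
        "mapping_one_to_one": not bad,
    }
    return report, bad


# Invented figures: totals tie, row count moved 5%, two customers map badly.
report, bad = run_checks(
    pipeline_total=2300.00,
    ledger_total=2300.00,
    current_rows=105,
    prior_rows=100,
    pairs=[
        ("CUST-001", "ACC-9001"),
        ("CUST-002", "ACC-9002"),
        ("CUST-002", "ACC-9003"),  # maps to two accounts
        ("CUST-003", None),        # maps to none
    ],
)
failed = [name for name, ok in report.items() if not ok]
print(report)
print("needs review:", bad)
```

In a real run, a non-empty `failed` list would stop the pipeline with a non-zero exit code, so a number that doesn't tie never gets forwarded.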

What you don't need, and where this goes

None of this needs a special finance agent that reasons about your business for you. It needs a general coding agent and a person who knows what the right answer should look like. That already covers a large share of what finance teams spend their days on, which is not high-level reasoning about the company. It's assembling trustworthy numbers so the reasoning can happen at all.

This is also why I treat coding agents as what works now, not the end state. The day an agent can reason reliably across a dozen disagreeing systems, with a connector ecosystem mature enough to feed it clean context, some of this building will fold into the model itself. We're not there yet. Until we are, a coding agent plus a person with systems sense is the combination that holds up, and the systems sense you build along the way won't go to waste when the tools get better.

Where to start

The test is small. Pick one question you answer on a recurring basis and care about getting right. Have one person build the whole thing with a coding agent, both the pipeline and the checks, end to end. Then look at how long it took, whether you trust the output, and where that person's time goes once the manual version is gone. The tooling is improving quickly, so some of what you build to work around today's limits won't be needed in a year. The basic approach, building and verifying small pieces of software yourself instead of waiting for someone else to do it, holds up.