Evolution of Agentic AI: What It Tells You About the Next 6-12 Months (Part 1)
Part 1 of a 3-part series. Part 2 covers the payments standards war; Part 3 covers agent security.
The same thing, three names, one quarter
In January 2025, OpenAI shipped Operator, an agent that drives a computer the way a person does, moving a cursor, clicking buttons, filling forms. Three days later Alibaba’s Qwen team open-sourced Qwen2.5-VL, which does the same pixel-to-action trick. Both arrived ninety-odd days after Anthropic shipped “computer use” the previous October. Same capability.
I went back through every agentic capability the major labs have shipped since roughly 2023 (function calling, plugins, memory, computer use, deep research, multi-agent orchestration) and pulled the first-party announcement date for each one. The shape repeats every time. One lab ships a capability first. Within months, sometimes within days, the rivals ship their own version under a different brand. Then one of two things happens: the capability gets donated to a neutral standards body and stops being anyone’s advantage, or it quietly dies.
Copying is not the interesting part. Everyone knows copying happens. The interesting part is the clock. The gap between “one lab ships it” and “a rival ships the copy” has gone from about a year to about a week, and it is still shrinking. In 2023 it took Anthropic 352 days to match a capability OpenAI shipped. By 2025 the same kind of gap was down to 53 days. For agent payments in 2025, two card networks shipped competing standards one day apart.
So here is the question I want to answer across this article, because it is the one a technology budget actually turns on.
If every capability you can name today will be commoditized (copied, then donated to a standards body, then table stakes) inside a quarter or two, what exactly are you paying a premium for?
And where does the next twelve months leave you?
Let me walk the evolution first, because the answer falls out of the pattern.
ELI5: What is an “agentic” capability?
A normal chatbot answers a question and stops.
An agent takes a goal and does several steps on its own to reach it: searching the web, calling a tool, clicking through a website, checking its own work. Each of those abilities (call a tool, browse a page, remember across sessions) is a “capability.”
This article tracks which lab built each one first and how fast everyone else copied it.
How the capabilities evolved, 2023 to 2026
Start in 2023, because the pre-agent era is short and it is mostly setup.
The first capability that made agents possible was function calling: teaching a model to stop talking and instead emit a structured request that calls an external tool.
The academic groundwork came out of papers, ReAct and Toolformer, not a lab.
OpenAI productized it on June 13, 2023 with a function-calling update to GPT-4. That is the champion date.
Google matched it in Vertex about six months later.
Anthropic’s tool use went generally available on May 30, 2024, which is 352 days after OpenAI.
Meta’s Llama 3.1 built it in 406 days.
A year-plus of lag was normal in 2023.
ELI5: What is “function calling” or “tool use”?
Left alone, a language model can only produce text. Function calling is the wiring that lets it say “I need to call the weather API with city = Dallas” and have the software run that call and hand the result back. It is the difference between a model that can describe what to do and one that can make something happen. Every agent capability since is built on top of it.
Alibaba’s Qwen-Agent shipped in September 2023, and Zhipu’s ChatGLM3 had function calling that October, both well before Llama matched it.
The capability was co-championed in the open from the start.
This is an important distinction to remember.
Then the cadence speeds up. OpenAI shipped persistent cross-session memory in ChatGPT on February 13, 2024; Anthropic’s Projects workspace followed about 133 days later.
Anthropic shipped computer use, an agent that operates a desktop by looking at the screen, on October 22, 2024. OpenAI’s Operator followed in 93 days; Qwen’s open-weights version in 96. Google’s own computer-use model did not land until October 2025, a full 350 days behind.
That gap is its own tell.
Nobody leads on everything, and “behind” is not the same as “out.”
The champion even rotates.
Deep Research (an agent that runs a multi-step research job and writes you a report) is the capability everyone associates with OpenAI, and it was Google’s first.
Google shipped Gemini Deep Research on December 11, 2024. OpenAI’s version came 53 days later, on February 2, 2025.
If you remember it as OpenAI’s idea, that is branding winning over the record. OpenAI followed, and then OpenAI won the category.
Now the part that organizes everything. Every one of these capabilities lands in one of four outcomes, and only four:
Champion. One lab ships it first and gets a quarter or two of being the only one who has it.
Follower. Rivals ship a relabeled version. Operator is Anthropic’s computer use with an OpenAI badge. The capability is the same; the brand is different.
Commoditized to a standard. The capability stops being any lab’s property and becomes shared infrastructure, frequently by being donated to a neutral foundation. This is the endpoint that changed the game, and I will come back to it.
Abandoned. The capability, or the company that pioneered it, dies. OpenAI’s original ChatGPT Plugins (the 2023 attempt at connecting models to outside services) were deprecated in April 2024. Adept’s ACT-1, the GUI agent that predated computer use by two years, never shipped as a product and the founders left for Amazon. OpenAI’s Swarm shipped labeled “not production-ready” and went nowhere.
We see survivorship bias is prevalent.
Not every pioneering bet pays off; plenty of “firsts” are now footnotes.
After Plugins flopped, Anthropic shipped the Model Context Protocol (MCP) on November 25, 2024, an open standard for connecting any model to any tool or data source.
OpenAI adopted it 121 days later, in March 2025; Google about two weeks after that; Qwen built it in natively that April. Then, on December 9, 2025, MCP was donated to a new Agentic AI Foundation under the Linux Foundation, co-founded by Anthropic, OpenAI, and Block.
By that point it was running roughly 97 million monthly software development kit (SDK) downloads across something like 10,000 active servers.
Champion, then three followers, then donated standard: the full arc in about a year.
That is the template the rest of the industry is now running.
ELI5: What is the Model Context Protocol (MCP)?
Before MCP, every time you wanted an AI model to talk to a tool (your calendar, a database, a payment system) someone had to hand-build a custom connector for that exact model. MCP is a universal plug. Build one MCP connector for your tool and every model that speaks MCP can use it. Think USB versus a drawer full of proprietary cables. The point to hold onto: it makes tools portable between models, which is not the same as making it cheap to copy a capability. That distinction is what the next section turns on.
Here is the whole evolution in one view: every capability, who championed it, who followed, how many days behind, and which of the four outcomes it landed in.
Read down the lag column and the story tells itself.
Function calling: 352 days to the slowest major follower.
Memory: 133.
MCP adoption: 121.
Computer use: 93.
Deep research: 53.
Agent payments in 2025: one day between Mastercard and Visa, thirteen days between Google’s payments protocol and OpenAI’s.
The numbers fall, and they fall fastest where there is money on the table.
In the same quarter OpenAI cloned computer use in 93 days, Google took 350 to ship its own, with no obvious penalty for showing up a year late.
So the clean descending line is really “the fastest follower keeps getting faster,” not “everyone copies everything in a week now.”
That distinction matters more than it sounds, because it tells you who the clock is for. The 90-day clock is a vendor’s clock*.
It governs how long the company selling you a feature gets to charge a premium for it. It is not your clock.
If Google can be a year behind on computer use and lose nothing, a buyer can almost always afford to be late too. The pressure to move at the speed of the announcement cycle is a pressure the vendors need their customers to feel.
Mostly we shouldn’t.
Why the copy-lag collapsed (and why it is not the protocol)
The popular explanation, the one I half-believed when I started, is that MCP made copying cheap: shared connector standard, everyone wires up the same tools, capabilities spread fast.
The timeline doesn’t seem to add up though.
OpenAI copied Anthropic’s computer use in 93 days. That was January 2025. No rival adopted MCP until March 2025, roughly four months after Operator already shipped.
The fastest, cleanest copy in the whole dataset happened before the protocol that supposedly enabled it existed in anyone else’s stack. If MCP were the engine of cheap copying, the copying would not have run ahead of the engine.
Correlation got mistaken for cause.
So what is the actual engine? Three things, and none of them is a protocol.
1. Open-weights distillation
The first is open-weights distillation. When a lab releases a model’s weights, anyone can study it, fine-tune it, and bake its behavior into a smaller, cheaper model. The exhibit here is DeepSeek-R1, released January 20, 2025 under a permissive license.
DeepSeek did not just publish a reasoning model. It published the recipe, the reinforcement-learning method it used to get there, and then distilled that capability into Llama and Qwen variants anyone could download.
There was zero MCP involved.
A frontier reasoning capability went from one lab’s edge to a free download in a matter of weeks, because the weights and the method were both in the open.
ELI5: What is “distillation”?
Imagine a brilliant, expensive expert. Distillation is having that expert train a cheaper junior until the junior can do most of the same work at a fraction of the cost. In AI, a large model generates examples that teach a smaller model to mimic its behavior. When the big model’s weights are public, anyone can run this and end up with a cheap copy of the expensive capability, no permission and no license fee.
3. Published Recipes
The second is published recipes. The labs and the research community publish how they did it: in papers, in model cards, in open-source repositories.
ReAct and Toolformer told everyone how tool use works before any lab shipped it commercially. DeepSeek published its reasoning method. When the method is public, the rival is not reverse-engineering a black box; they are following a printed set of instructions.
3. Talent Flow
The third is talent flow. The people who built a capability at one lab leave and rebuild it at the next. Adept’s founders went to Amazon. Researchers rotate across OpenAI, Anthropic, Google DeepMind, and a dozen startups on a cadence measured in months.
The knowledge does not stay put because the people do not stay put.
Put those three together and copying is cheap because the ingredients are open: the weights, the methods, and the people.
The villain, if you want one, is the open ecosystem doing what an open ecosystem does.
MCP is the rail it runs on.
What MCP does (and this matters, because it changes where the next twelve months point) is make the endpoint faster.
It does not make the copy cheaper; it makes the commoditization smoother.
Once tools are portable across every model, the natural resting state for a capability is a shared standard that everyone implements, because no one can hold an integration advantage for long. MCP greased the slide from “copied” to “donated standard.”
The copying engine is the open ecosystem; the protocol is the off-ramp that engine drives onto.
The champion rotates. There is no permanent leader.
Anthropic led on computer use and connectors and is leading on sandboxing.
Google led on deep research and the agent-payments protocol.
OpenAI led on memory and function calling.
Chinese open-weights labs led or tied on tool use and phone-use autonomy.
Any strategy built on “Lab X always wins, so standardize on Lab X” rests on something the three-year record says is false.
The lead is capability-specific and temporary, and it moves.
If one lab always won, you could standardize on it and stop thinking.
The durable bet however is the interoperable layer underneath the labs, not the lab.
What this means for the next 6 to 12 months
If the pattern holds, the next year is mostly predictable, and parts of it have already happened.
The capabilities that defined 2024 and 2025 (connectors, tool use, computer control, deep research) are finishing their run not as features anyone owns but as donated standards.
MCP went to the Linux Foundation in December 2025.
Google’s Agent2Agent protocol (A2A), a year into its life, now sits with a neutral foundation and well over a hundred organizations.
The forecast is less a guess than a clock we can already predict.
The strongest evidence here is unusual: 2 of the things I would have forecast in this section came true while I was researching it.
1. Payments already hit the donated-standard endpoint.
I will cover this in depth in Part 2, so one paragraph here.
In April 2025 Mastercard and Visa shipped competing agent-payment standards one day apart.
Google’s Agent Payments Protocol (AP2) followed that September; OpenAI and Stripe’s version came thirteen days after that.
Then, on April 28, 2026, Google and Mastercard donated AP2 and a companion “Verifiable Intent” standard to the FIDO Alliance, which stood up two working groups to govern it.
That is the MCP movie, second showing, same ending: champions ship, rivals match in days, the standard lands at a neutral body inside a year.
Payments isn’t a frontier anymore; it’s the worked example.
Commoditization now turns into a donated standard, faster than anyone could build a clone.
ELI5: Why is “donated to a standards body” a bigger deal than “copied”?
A copy still leaves two rival products you have to choose between. A donated standard means the capability stops being a product at all. It becomes shared plumbing that every vendor implements the same way, governed by a neutral group. For a buyer, that is the moment the feature stops being a reason to pick one vendor over another. It just becomes assumed, like Wi-Fi.
2. Security looked like the one differentiated lane left, and it lasted about half a year.
Anthropic shipped an open-source sandbox runtime for Claude Code in late 2025: operating-system-level containment that cut permission prompts by 84%.
That looked like a durable edge, the rare capability nobody had copied.
Then OpenAI bolted sandboxing onto its Agents SDK in April 2026, and Microsoft put an OS-kernel agent sandbox into Windows at its Build conference in June 2026, with OpenAI and NVIDIA already onboard.
So the capability commoditized on roughly the same timeframe as everything else.
The part that matters for the budget:
OpenAI did not build its own runtime. It rents isolation from seven third-party providers behind an abstraction layer.
Anthropic open-sourced a runtime it owns; Microsoft owns the kernel and the management plane.
The capability got copied. What did not get copied is who owns the trust boundary.
The moat did not disappear. It moved one layer down.
When a capability commoditizes, the moat does not vanish. It relocates.
And right now it is relocating in 3 directions at once.
**1. The next rung on the escalator is already visible, and it is identity
Proving an agent is acting for a specific human, within set limits, with an audit trail when it does something.
Every payments thread and every security thread above collapses into that one unsolved question, and the whole industry pivoted to it in the first half of 2026.
NIST stood up an AI Agent Standards Initiative in February 2026, with agent security and identity as an explicit pillar, and published a concept paper arguing the field should adapt existing identity standards (OAuth, OpenID Connect, SPIFFE) rather than invent new ones.
FIDO opened an Agentic Authentication working group in April 2026 to verify that an agent is acting on behalf of an authenticated user within defined parameters. OpenAI joined FIDO to work on it, the rival co-signing the governance body exactly as it did with MCP.
What’s striking: identity is starting at the standards-body endpoint.
There is no champion-then-clone phase to wait through.
The standards bodies are drafting before any single lab can claim to own it. Identity is skipping the clone phase and going straight to a shared standard, the pattern accelerating on itself.
My sharpest forecast for the next six to twelve months is a draft interoperable agent-identity spec out of FIDO and an interoperability profile out of NIST. NIST has signaled a profile for late 2026; I would not bet the exact quarter, but the direction is clear on my mind.
One hedge on my own optimism, because identity is exactly where this could break.
Identity standards have a long history of not converging cleanly; SAML, OAuth, and OpenID Connect took most of a decade to sort out, and SPIFFE adoption is still thin.
And the incumbents who would have to cede control of agent identity (Okta, Microsoft Entra, the card networks) are the ones with the deepest pockets to keep it proprietary.
So the fourth outcome stays live here too.
Identity might converge fast, or it might be the capability where the standards-body dream stalls because the people who own identity today have the most to lose from neutrality.
I am forecasting convergence because the bodies are already drafting, but I would not write off fragmentation.
Where the moat moves, and what to do about it
The forecast that matters to a budget is not “which lab wins.”
It is “stop paying a premium for the feature that commoditizes in a quarter.”
3 relocations, each with concrete moves attached.
**1. From features to the trust boundary
Owning the runtime or the kernel is defensible; reselling someone else’s isolation is not.
If you run technology, buy for sandbox and identity posture, not feature parity. The capability is table stakes within a quarter; the containment-and-audit story is what you are actually paying for. When a vendor pitches a capability, assume every competitor ships it by next quarter, and ask who owns the trust boundary underneath.
2. From product to governance and distribution
The endpoint of every capability is now a donated standard at a neutral body: MCP to the Agentic AI Foundation, AP2 to FIDO, A2A to the Linux Foundation, Agent Skills made an open standard in 63 days. Donation is a move, not a law of physics. A lab gives away the layer it cannot win proprietary, partly to deny a rival the chance to own it, which is exactly why the layer a lab won’t donate tells you where it plans to charge. The durable position is being inside the governance body and inside the distribution channel (the operating system, the enterprise resource planning suite, the card network) rather than owning the spec. If you run data or platform strategy, track standards-body membership and watch the metric that predicts this: time-to-shared-standard, not time-to-clone. Align on the protocols that already landed at a foundation rather than betting on a proprietary one that will be obsolete the moment it is donated.
3. From “first” to “trusted and integrated”
Being first is worth about one quarter. It is the buyer’s discount window, not a vendor’s moat. If you hold the budget, stop paying a premium for “first,” and budget instead for the new integration tax. MCP plus A2A plus the payment protocols plus identity is four-plus overlapping standards you will have to govern at once. “Interoperability” is producing a new tax, not removing the old one, and that line item is real.
One reality check, because a forecast that only points up is not honest. The adoption curve is genuine but the production gap is wide. Analysts project that something like 40% of enterprise applications will embed task-specific agents by the end of 2026, up from under 5% a year earlier; the same analysts project that more than 40% of agentic-AI projects will be cancelled by 2027 on governance gaps, unclear return, and runaway cost. (Those figures are analyst estimates, not first-party numbers, so treat them as direction, not gospel.)
The read for a leader is not “go faster.” It is that the capability race is settling into shared standards, so your scarce attention belongs on governance, identity, and the integration tax, because that is where the projects that survive will differ from the ones that get cancelled.
The takeaway, and the question I keep coming back to
Three years of agentic AI, reduced to one sentence: capabilities are temporary, standards are the endpoint, and the moat keeps moving down a layer, from features to the trust boundary to identity and probably to something I cannot name yet. The lab that ships first gets a quarter of glory and then watches the open ecosystem distill, publish, and donate its edge into shared plumbing. Every time. The clock on that is now measured in weeks for the soft capabilities and in days for the ones with real money attached.
One precision before I overclaim, because I can hear the objection. A capability going table stakes (computer use, tool calling, memory) does not mean the model underneath stops mattering. Once every model can drive a browser, the only thing left to compete on is how reliably it does it on your actual work, at what cost, with how many wrong answers. Model choice does not disappear when features commoditize; it arguably matters more, because it is the last axis left.
What I am claiming is narrower than “the model is irrelevant.” It is: stop paying a premium for which model got a feature first, because that lead is gone in a quarter. Keep caring, a lot, about which model runs your workload best. That gap is real, and it does not commoditize on anyone’s 90-day clock.
This connects to a thread I have pulled before. When I argued that the real choice in agent architecture is task decomposability, not model capability, the underlying claim was that which model got a feature first is the least consequential decision you make. The evolution data is the receipt for that claim. The same logic ran through the failure-gradient piece: agent projects fail on coordination and boundaries, not on which frontier model sits underneath. And it is why I keep coming back to the governance-philosophy split between Microsoft and Anthropic, because governance, not capability, is turning out to be the thing that differs.
So here is the question I would put on the table at your next architecture review, the one I have not stopped chewing on. You can list, right now, the agent capabilities your vendors are selling you on. Now assume every one of them is free, standardized, and table stakes by the end of next quarter, because the record says they will be. What are you actually paying for? If you cannot answer that without naming the capability itself, you are paying a premium for “first,” and “first” has a half-life of about ninety days.
The thing worth buying is the part that does not commoditize: who owns the trust boundary your agents run inside, who can prove an agent acted for a specific person within limits, and whose seat at the standards table means you are implementing a spec instead of fighting one. That is where the next year’s advantage lives, and almost none of it is a feature.
Parts 2 and 3 go where the action still is. Part 2 takes the payments standards war apart: Mastercard versus Visa, Google’s protocol versus OpenAI and Stripe’s, and what the FIDO donation means for anyone whose agents will be spending money. Part 3 goes after security, the one lane that looked differentiated and is now quietly commoditizing, and the “owns the runtime versus rents it” split that is the real story underneath, picking up the thread from the vibe-coding attack taxonomy. If you only have budget attention for one of the three, make it the one closest to where your agents touch money or production. That is where the clock runs fastest.










