Friday, June 26, 2026

What’s the Interactions API? What’s it for? What can it do? (ChatGPT, 2026-6-25) (菊次郎十六式)

https://youtu.be/8yZKAFctM20?si=hbpKJhJjzuC2WjnU

If you’re referring to the Interactions API that has been getting attention recently, it’s Google DeepMind’s new API for the Gemini ecosystem, not an OpenAI API. It is designed as the successor to the older generateContent interface for building AI agents.

The key idea is simple:

Instead of treating every prompt as an isolated request, it treats an AI interaction as a persistent, structured process.

That difference sounds subtle, but it changes what developers can build.

What problem does it solve?

Suppose you ask an AI:

“Research superconducting computing for AI, compare it with optical computing, search recent papers, write a report, generate figures, and come back in an hour.”

With a traditional chat API, the application has to juggle:

  • conversation history
  • reasoning
  • web searches
  • tool calls
  • intermediate results
  • final answer

Most of that orchestration has to be written by the developer.

The Interactions API moves much of that complexity into the platform itself. It records the whole interaction as a structured timeline of user inputs, model reasoning, tool invocations, tool results, and outputs.  
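To make the "structured timeline" idea concrete, here is a minimal sketch in Python. It is not Google's SDK or its actual schema: the `Step` and `Interaction` classes and the step-kind names are hypothetical, chosen only to mirror the categories listed above (user inputs, reasoning, tool invocations, tool results, outputs).

```python
from dataclasses import dataclass, field
from typing import List, Literal

# Hypothetical step kinds, mirroring the categories named in the text.
StepKind = Literal["user_input", "reasoning", "tool_call", "tool_result", "output"]


@dataclass
class Step:
    kind: StepKind
    content: str


@dataclass
class Interaction:
    """In-memory stand-in for a platform-side interaction record."""
    steps: List[Step] = field(default_factory=list)

    def record(self, kind: StepKind, content: str) -> None:
        # Append one step to the end of the timeline.
        self.steps.append(Step(kind, content))

    def of_kind(self, kind: StepKind) -> List[Step]:
        # Return only the steps of one kind, in timeline order.
        return [s for s in self.steps if s.kind == kind]


interaction = Interaction()
interaction.record("user_input", "Compare superconducting and optical computing for AI.")
interaction.record("reasoning", "Need recent papers on both approaches.")
interaction.record("tool_call", "web_search('superconducting computing AI')")
interaction.record("tool_result", "search results (placeholder)")
interaction.record("output", "draft report (placeholder)")

for step in interaction.steps:
    print(f"{step.kind:12} {step.content}")
```

The point of the sketch is only that the timeline is one ordered, typed list, so a caller can filter it (for example, all tool calls) instead of parsing a flat chat transcript.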


What can it do?

It supports many capabilities through one interaction model:

  • multi-turn conversations
  • persistent memory for an interaction
  • function calling
  • web search
  • document understanding
  • image understanding
  • audio understanding
  • video understanding
  • structured outputs
  • streaming responses
  • long-running research tasks
  • background execution

All of these become different “steps” inside a single Interaction.  


Think of it as a workflow engine

Instead of

User → Model → Answer

it looks more like

User → Model thinks → Searches the web → Reads PDFs → Calls external tools → Runs code → Searches again → Produces report

Every step is recorded.
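A rough sketch of such a recorded workflow, written as plain Python with no real API calls. The tool functions (`search_web`, `read_pdf`) and the fixed plan are hypothetical stand-ins; in a real agent the model would choose each next step itself rather than follow a list supplied up front.

```python
from typing import Callable, Dict, List, Tuple


# Hypothetical local tools; real ones would call a search engine, a PDF parser, etc.
def search_web(query: str) -> str:
    return f"results for {query!r}"


def read_pdf(path: str) -> str:
    return f"text of {path}"


TOOLS: Dict[str, Callable[[str], str]] = {"search_web": search_web, "read_pdf": read_pdf}


def run_workflow(goal: str, plan: List[Tuple[str, str]]) -> List[Dict[str, str]]:
    """Run each planned (tool, argument) pair in order and record every step."""
    trace = [{"step": "user", "detail": goal}]
    for tool_name, argument in plan:
        result = TOOLS[tool_name](argument)
        trace.append({"step": tool_name, "detail": result})
    trace.append({"step": "report", "detail": f"report built from {len(plan)} tool results"})
    return trace


trace = run_workflow(
    "Survey optical computing",
    [
        ("search_web", "optical computing"),
        ("read_pdf", "paper.pdf"),
        ("search_web", "photonic accelerators"),
    ],
)
for entry in trace:
    print(entry["step"], "->", entry["detail"])
```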


Why is this important?

Because AI is moving from predicting text to completing tasks.

This is the essence of agentic AI.

For example, suppose you ask:

“Find every paper by Thomas Südhof published since 2024, summarize them, compare them with AlphaFold, make a PowerPoint, and email it to my collaborators.”

Instead of writing hundreds of lines of orchestration code, the developer simply starts an Interaction.

The API manages the evolving state while the agent works.  
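From the client's side, "the agent works while the API holds the state" usually reduces to a start-then-poll pattern. The sketch below simulates that locally. `FakeBackgroundJob`, its status strings, and `wait_for` are assumptions made for illustration and are not taken from Google's documentation.

```python
from typing import Optional


class FakeBackgroundJob:
    """Simulated long-running interaction that finishes after a fixed number of polls."""

    def __init__(self, polls_needed: int) -> None:
        self._remaining = polls_needed
        self.result: Optional[str] = None

    def poll(self) -> str:
        # Each poll stands in for one status request to the service.
        if self._remaining > 0:
            self._remaining -= 1
            return "in_progress"
        self.result = "final report (placeholder)"
        return "completed"


def wait_for(job: FakeBackgroundJob, max_polls: int = 10) -> Optional[str]:
    """Poll until the job completes; give up after max_polls attempts."""
    for attempt in range(1, max_polls + 1):
        status = job.poll()
        print(f"poll {attempt}: {status}")
        if status == "completed":
            return job.result
        # A real client would sleep here before asking again.
    raise TimeoutError("job did not finish in time")


report = wait_for(FakeBackgroundJob(polls_needed=3))
print(report)
```

Because the state lives with the job rather than the caller, the client could drop its connection between polls and pick up again later with nothing but the job handle.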


Built-in Deep Research

Google also exposes its Deep Research agent through the same API.

That means an application can invoke either:

  • a Gemini model
  • or a specialized research agent

using essentially the same interface.  
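One way to picture "the same interface for a model or an agent" is a single request type that accepts either target. This is a hypothetical sketch: `InteractionRequest`, `create_interaction`, and the target names `example-model` and `example-research-agent` are placeholders, not identifiers from the real service.

```python
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass
class InteractionRequest:
    """Hypothetical request shape: exactly one of `model` or `agent` must be set."""
    input: str
    model: Optional[str] = None
    agent: Optional[str] = None

    def __post_init__(self) -> None:
        # Reject requests that name both targets or neither.
        if (self.model is None) == (self.agent is None):
            raise ValueError("specify exactly one of model or agent")

    @property
    def target(self) -> str:
        if self.model is not None:
            return f"model:{self.model}"
        return f"agent:{self.agent}"


def create_interaction(request: InteractionRequest) -> Dict[str, str]:
    # Stand-in for a network call; it only echoes how the request would be routed.
    return {"routed_to": request.target, "input": request.input}


chat = create_interaction(InteractionRequest("Explain photonic chips.", model="example-model"))
research = create_interaction(
    InteractionRequest("Survey photonic chips.", agent="example-research-agent")
)
print(chat["routed_to"])
print(research["routed_to"])
```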


Why are developers excited?

It introduces several conveniences:

  • Server-side state — the service can remember interaction history so clients don’t have to resend everything each turn.
  • Background execution — long-running jobs can continue after the client disconnects.
  • Structured execution traces — developers can inspect reasoning steps, tool calls, and outputs rather than managing opaque chat histories.
  • One unified endpoint for models and managed agents.  
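The server-side state point can be illustrated with a toy in-memory "server". Everything here is an assumption for illustration: `FakeInteractionServer`, the `previous_id` parameter name, and the `int_N` id format are hypothetical and only show the shape of the idea, in which the client sends the new turn plus a reference to the earlier interaction instead of the full history.

```python
from typing import Dict, List, Optional


class FakeInteractionServer:
    """In-memory stand-in for a service that stores interaction history."""

    def __init__(self) -> None:
        self._history: Dict[str, List[str]] = {}

    def create(self, user_input: str, previous_id: Optional[str] = None) -> str:
        # Look up earlier turns on the "server"; an unknown id raises KeyError.
        turns = list(self._history[previous_id]) if previous_id is not None else []
        turns.append(user_input)
        new_id = f"int_{len(self._history) + 1}"
        self._history[new_id] = turns
        return new_id

    def history(self, interaction_id: str) -> List[str]:
        # Return a copy so callers cannot mutate the stored record.
        return list(self._history[interaction_id])


server = FakeInteractionServer()
first = server.create("Summarize recent work on optical computing.")
# The second request carries only the new turn and a pointer to the first.
second = server.create("Now compare it with superconducting approaches.", previous_id=first)
print(server.history(second))
```

Each call produces a new id and leaves the earlier record untouched, so a client could also branch from an older interaction by passing that older id.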


How does it compare with OpenAI?

Interestingly, Google and OpenAI are converging toward similar ideas, although with different APIs.

Google | OpenAI
Interactions API | Responses API + Conversations API
Interaction | Conversation + Response
Server-side interaction state | Stateful conversation objects
Built-in agents | Hosted tools and agent orchestration
Long-running workflows | Responses API with tool loops and conversation state


OpenAI’s current direction is to build agents around the Responses API together with conversation state and hosted tools, rather than introducing a separate “Interactions API.”


My broader view

I think APIs like the Interactions API represent a shift in what we mean by an AI system.

The first generation of LLM APIs (2020–2023) answered questions:

prompt → response

The second generation (2024–2026) executes workflows:

goal → plan → reasoning → tools → memory → execution → revision → result

In other words, the basic computational unit is no longer a prompt, but an interaction.

Given the themes we’ve often discussed—agentic AI, Deep Research, algorithmic selves, and AI as a long-term cognitive partner—I suspect this is the more significant transition. The future may be less about “asking a chatbot a question” and more about maintaining an ongoing collaboration with persistent AI agents that can remember context, use tools, and carry out work over hours or days rather than a single exchange.