Moving Compute to the Data: Microsoft Open-Sources pg_durable
The era of stitching together disparate infrastructure to manage state is ending. Microsoft has open-sourced pg_durable, a tool designed to bring durable execution directly inside PostgreSQL.
For teams managing complex workflows, the current standard is a fragile architecture of cron jobs, workers, queues, and status tables. This fragmentation forces engineers to reconstruct state by hand when systems fail. Because execution happens inside the database, pg_durable eliminates the need to run that scheduling, queuing, and state-tracking infrastructure as separate services.
The architecture shifts the burden of reliability from the application tier to the database itself. A pg_durable function operates as a graph of SQL steps that PostgreSQL executes and checkpoints. If a crash, restart, or step failure occurs, the system resumes from the last durable checkpoint. This removes the need for the manual cleanup and uncertain replays that typically follow a failed API call or a single failed row.
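To make checkpoint-and-resume concrete, here is a minimal Python sketch of the general technique. It is not pg_durable's API. The workflow is a linear chain of named steps (the simplest possible graph), each step's output is recorded in a checkpoint store only after the step succeeds, and a rerun skips every step that already has a checkpoint. The `run_workflow` function, the dict used as the store, and the step names are all hypothetical; in pg_durable the steps are SQL and the checkpoints live in PostgreSQL.

```python
from typing import Callable, Dict, List, Tuple

# A step is a (name, function) pair; the function receives the previous
# step's output and returns its own. Names and shapes here are hypothetical.
Step = Tuple[str, Callable[[object], object]]


def run_workflow(steps: List[Step], checkpoints: Dict[str, object], initial: object) -> object:
    """Run steps in order, resuming after the last recorded checkpoint.

    `checkpoints` stands in for durable storage. pg_durable keeps its state
    in PostgreSQL; this sketch uses a dict that survives a simulated crash.
    """
    value = initial
    for name, fn in steps:
        if name in checkpoints:
            # Completed in an earlier run: reuse the recorded output instead of re-executing.
            value = checkpoints[name]
            continue
        value = fn(value)
        # Checkpoint only after the step has succeeded.
        checkpoints[name] = value
    return value


# --- Demo: the second step fails once, then the workflow is rerun. ---
calls: List[str] = []
failed_once = False


def add_one(x):
    calls.append("add_one")
    return x + 1


def double(x):
    global failed_once
    calls.append("double")
    if not failed_once:
        failed_once = True
        raise RuntimeError("simulated crash")
    return x * 2


def square(x):
    calls.append("square")
    return x * x


steps: List[Step] = [("add_one", add_one), ("double", double), ("square", square)]
store: Dict[str, object] = {}

try:
    run_workflow(steps, store, 3)
except RuntimeError:
    pass  # The first run dies in "double"; only "add_one" was checkpointed.

result = run_workflow(steps, store, 3)  # Resumes at "double"; "add_one" is not re-run.
print(result, calls)
```

Because a checkpoint is written only after a step finishes, a step that fails partway is re-executed from its start on resume. That is why steps in durable-execution systems are usually written to be idempotent.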
This development targets three specific groups:
- Backend and data engineers who require workflows to live alongside the data they touch.
- DBAs and SREs automating runbooks that must remain auditable in SQL and survive restarts.
- Teams building data or AI pipelines that require durable execution per row, document, or batch.
The utility of this approach is most evident in high-stakes pipelines. For vector embedding pipelines, the tool can handle chunking, calling an embedding API, and upserting into pgvector. For ingestion, it can stage, deduplicate, transform, and publish large batches. It can also drive external API workflows, such as enrichment and classification, directly from SQL.
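An embedding step is only safe to resume if replaying it does no harm. The sketch below shows one per-document step under stated assumptions: Python's built-in `sqlite3` stands in for PostgreSQL, a JSON-encoded list stands in for a pgvector column, and `chunk_text`, `fake_embed` (a hypothetical placeholder for a real embedding API), `embed_document`, and the `chunk_embeddings` table are illustrative names, not part of pg_durable. The upsert is keyed on document and chunk number, so a replay overwrites rows rather than duplicating them; PostgreSQL's `INSERT ... ON CONFLICT ... DO UPDATE` has the same shape.

```python
import hashlib
import json
import sqlite3
from typing import List


def chunk_text(text: str, words_per_chunk: int = 4) -> List[str]:
    """Split text into fixed-size word chunks (a deliberately simple strategy)."""
    words = text.split()
    return [" ".join(words[i:i + words_per_chunk]) for i in range(0, len(words), words_per_chunk)]


def fake_embed(chunk: str, dims: int = 4) -> List[float]:
    """Hypothetical stand-in for an embedding API: a deterministic hash-based vector."""
    digest = hashlib.sha256(chunk.encode("utf-8")).digest()
    return [b / 255.0 for b in digest[:dims]]


def embed_document(conn: sqlite3.Connection, doc_id: int, text: str) -> None:
    """Chunk, embed, and upsert one document inside a single short transaction.

    The upsert is keyed on (doc_id, chunk_no), so replaying this step after
    an uncertain failure overwrites existing rows instead of duplicating them.
    """
    with conn:  # Commits on success, rolls back if any statement raises.
        for chunk_no, chunk in enumerate(chunk_text(text)):
            conn.execute(
                """
                INSERT INTO chunk_embeddings (doc_id, chunk_no, body, embedding)
                VALUES (?, ?, ?, ?)
                ON CONFLICT (doc_id, chunk_no)
                DO UPDATE SET body = excluded.body, embedding = excluded.embedding
                """,
                (doc_id, chunk_no, chunk, json.dumps(fake_embed(chunk))),
            )


conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE chunk_embeddings ("
    " doc_id INTEGER, chunk_no INTEGER, body TEXT, embedding TEXT,"
    " PRIMARY KEY (doc_id, chunk_no))"
)
doc = "durable execution keeps workflow state next to the data it touches"
embed_document(conn, 1, doc)
embed_document(conn, 1, doc)  # A replay of the same step after an uncertain failure.
row_count = conn.execute("SELECT COUNT(*) FROM chunk_embeddings").fetchone()[0]
print(row_count)
```

The upsert makes a replay of the same input harmless. If a document can shrink between runs, a real pipeline would also need to delete chunks beyond the new count, which this sketch omits.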
The move toward in-database execution addresses the structural risks of long-running transactions and distributed logic. Traditional approaches, such as pg_cron with a jobs table or external orchestrators like Airflow, Temporal, Step Functions, or Argo, often leave workflow logic spread across SQL, workers, and dashboards. Long transactions in these setups can also hold locks and grow WAL, which makes batch jobs fragile at scale.
By moving retry state, progress tracking, and checkpointing into Postgres, pg_durable reduces the surface area for partial-failure bugs and drift. The workflow definition moves into SQL, starting with df.start(...).
Engineers should evaluate whether their current orchestration overhead is a necessity or a symptom of fragmented infrastructure.
Subscribe to The Mansa Report
Strategic intelligence on AI, business building, and the future of technology. Delivered Monday through Friday.