GPT-Rosalind Enters the Lab

May 6, 2026 · By Mansa Muhammad

A new reasoning model has been introduced to accelerate work in biology and drug discovery, where a drug's path from initial discovery to regulatory approval in the United States spans 10 to 15 years. The model, named GPT-Rosalind after Rosalind Franklin, the scientist whose work was foundational to understanding DNA, is not a general-purpose system but one designed specifically for life sciences research. (Source)

GPT-Rosalind is positioned as a frontier model for research across biology, drug discovery, and translational medicine. Its performance has been measured against both GPT-5.4 and human experts. On the LABBench2 benchmark, GPT-Rosalind outperforms GPT-5.4 on 6 of 11 tasks. In separate evaluations, model submissions ranked above the 95th percentile of human experts on a prediction task and above the 84th percentile on a generation task. The model is available as a research preview in ChatGPT, Codex, and the API for qualified customers. To ground it in existing workflows, a new Life Sciences research plugin for Codex connects it to over 50 scientific tools and data sources. Early customers applying GPT-Rosalind to their own research and discovery work include Amgen, Moderna, and the Allen Institute.

The deployment of a specialized model like GPT-Rosalind signals a deliberate strategy of specialization. That it outperforms the general-purpose GPT-5.4 on 6 of 11 LABBench2 tasks indicates that progress is not being measured by general capability alone. Instead, the focus is shifting toward domain-specific reasoning, where the metrics for success are tied to concrete scientific workflows. This is not about building a bigger model, but a sharper one. The performance against human experts, at the 95th and 84th percentiles on specific tasks, further reframes the model's utility away from abstract intelligence and toward high-level, task-specific assistance.

The immediate application of GPT-Rosalind by Amgen, Moderna, and the Allen Institute moves the evaluation from the benchmark to the laboratory. The model's value proposition will now be tested against the complex realities of their internal research and discovery pipelines. Its success will not be defined by its score on LABBench2, but by its ability to generate efficiencies or novel insights within these organizations. The integration with over 50 scientific tools via the Codex plugin is critical here. It suggests the model is intended to function not as a siloed knowledge base, but as an active participant in the research process, capable of manipulating the same tools and data sources as its human counterparts.

The open question is what "accelerate" will mean in practice. The 10-to-15-year timeline for drug approval in the United States is a function of many factors beyond initial discovery. While GPT-Rosalind is aimed at the earliest stages, its ultimate impact will be measured by whether efficiencies gained by Amgen or Moderna translate into a meaningful compression of that decade-plus cycle. The model's benchmark performance is on record; its effect on the economics and timeline of drug discovery is now under observation.

