The Agentic Science Map
A Cambrian explosion of Scientific AI is unfolding, as once-separate components of scientific discovery become interoperable, API-accessible, and agent-operable. To navigate this fast-moving frontier, we can organize the landscape into five key layers. This map provides an overview of the tools and platforms that make up the modern autonomous science stack, from foundational data sources to scientist-facing applications.
Understanding this ecosystem is key to understanding the role of BioAgents, which are designed to operate across these layers, bridging gaps and accelerating discovery.
1. Knowledge & Data Sources
At the base are the knowledge and data repositories that AI systems draw on. Biology and chemistry have seen an explosion of publicly available data, from the tens of millions of citations indexed in PubMed to the millions of chemical compounds in databases like ChEMBL and PubChem. These databases are the raw ingredients for AI-driven science.
Key Examples:
PubMed – The central biomedical literature database.
Protein Data Bank (PDB) – Repository of 3D structures for proteins and nucleic acids.
UniProt – A comprehensive protein sequence and functional annotation database.
ChEMBL – A curated database of bioactive, drug-like molecules and their targets.
bioRxiv – Preprint server for biology.
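Several of these repositories expose programmatic interfaces, which is what makes them usable by agents. As a minimal sketch, the snippet below builds a search URL for the NCBI E-utilities esearch endpoint, which serves PubMed queries. The helper name pubmed_search_url and the example query are illustrative assumptions, and the code only constructs the URL rather than sending a request.

```python
from urllib.parse import urlencode

# Base URL of the NCBI E-utilities search endpoint.
ESEARCH_BASE = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi"


def pubmed_search_url(term: str, max_results: int = 20) -> str:
    """Build an esearch URL that would return PubMed IDs matching `term`.

    `pubmed_search_url` is a hypothetical helper written for this example.
    """
    params = {
        "db": "pubmed",         # which Entrez database to search
        "term": term,           # free-text query
        "retmode": "json",      # ask for a JSON response
        "retmax": max_results,  # cap on the number of IDs returned
    }
    return f"{ESEARCH_BASE}?{urlencode(params)}"


if __name__ == "__main__":
    # Example query chosen for illustration only.
    print(pubmed_search_url("CRISPR base editing", max_results=5))
```

An agent would typically fetch this URL, read the returned ID list, and then request abstracts or full records in a second call.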
2. Foundation & Domain Models
On top of the data layer sit the foundation models and domain-specific AI that can interpret it. These include general-purpose LLMs (like GPT-4) and specialized models for biology and chemistry, such as AlphaFold 2 for protein structure prediction. These models serve as the workhorses of the ecosystem, transforming raw data into actionable insights.
Key Examples:
AlphaFold 2 (DeepMind) – AI for protein 3D structure prediction.
RFdiffusion (Baker Lab) – A generative diffusion model that creates novel protein designs.
ESM3 (EvolutionaryScale) – A large protein language model, successor to Meta's ESM-2, that models protein sequence, structure, and function.
General LLM backbones – GPT-4, Claude 3, etc., which provide broad reasoning and coding skills.
3. Agentic Orchestration Layer
This layer contains the AI agents and frameworks that coordinate tools, reason about problems, and drive workflows. These "AI scientists" glue together models, data sources, and execution platforms. This layer includes frameworks for running multi-agent systems and protocols for decentralized compute.
Key Examples:
ChemCrow – A GPT-4 chemistry agent that uses ~18 expert tools for synthesis and analysis.
Model Context Protocol (MCP) – An open protocol for connecting agents to external tools and data sources in a standardized way; in this ecosystem, MCP servers can expose biomedical databases (genes, pathways, literature) to agents.
elizaOS – A Web3-native agent operating system.
Prime Intellect – A decentralized protocol for pooling compute to train models.
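To make the idea of standardized tool access concrete: MCP clients and servers exchange JSON-RPC 2.0 messages, and a tool is invoked with the tools/call method. The sketch below only builds such a request as a string. The helper make_tool_call and the tool name search_literature are hypothetical, and no particular server is assumed to offer that tool.

```python
import json
from itertools import count

# JSON-RPC requests carry an id so that responses can be matched to them.
_request_ids = count(1)


def make_tool_call(tool_name: str, arguments: dict) -> str:
    """Serialize a JSON-RPC 2.0 request asking a server to run one tool.

    `make_tool_call` is a hypothetical helper written for this example.
    """
    request = {
        "jsonrpc": "2.0",
        "id": next(_request_ids),
        "method": "tools/call",
        "params": {"name": tool_name, "arguments": arguments},
    }
    return json.dumps(request)


if __name__ == "__main__":
    # "search_literature" is a made-up tool name, not a real server's API.
    print(make_tool_call("search_literature", {"query": "TP53 and apoptosis"}))
```

Because every tool is called through the same message shape, an agent can switch between data sources without custom integration code for each one.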
4. Experiment & Simulation Execution
This layer executes experiments and simulations, closing the loop of scientific discovery. It includes both physical lab automation and virtual simulation engines. By automating lab protocols and simulations, this layer makes the feedback loop between AI planning and experimental results fast and scalable.
Key Examples:
Opentrons – Open-source liquid-handling robots programmable via a Python API.
Cloud Labs (Emerald Cloud Lab, Strateos) – Platforms that allow researchers to run bio and chem assays on demand from a laptop.
INDRA – A framework that assembles causal and dynamical models from literature and database statements.
Molecular Simulation (OpenMM) – An open-source toolkit for molecular dynamics simulations of proteins and other biomolecules, which can run on local or cloud hardware.
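The feedback loop between planning and experimental results can be illustrated with a toy example. The sketch below is not tied to any of the platforms listed above: simulated_assay is a made-up stand-in for a real experiment or simulation, and closed_loop is a deliberately simple greedy planner that proposes neighbouring conditions, runs them, and keeps the best result.

```python
from typing import Tuple


def simulated_assay(concentration: float) -> float:
    """Stand-in for a real experiment: the signal peaks at concentration 6.0."""
    return 100.0 - (concentration - 6.0) ** 2


def closed_loop(start: float, step: float, rounds: int) -> Tuple[float, float]:
    """Greedy plan-execute-observe loop that keeps the best condition seen."""
    best_x, best_y = start, simulated_assay(start)
    for _ in range(rounds):
        improved = False
        # Plan: propose the two conditions next to the current best.
        for candidate in (best_x - step, best_x + step):
            # Execute and observe: run the (simulated) experiment.
            result = simulated_assay(candidate)
            if result > best_y:
                best_x, best_y = candidate, result
                improved = True
        if not improved:
            break  # neither neighbour beats the current best, so stop early
    return best_x, best_y


if __name__ == "__main__":
    print(closed_loop(start=2.0, step=1.0, rounds=10))
```

In a real stack the call to simulated_assay would be replaced by a robot run, a cloud lab submission, or a simulation job, and the planner would usually be far more sophisticated than a greedy search.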
5. Scientist-Facing Platforms & Assistants
The top layer consists of the interfaces and AI assistants that scientists use directly. These products often package many of the underlying tools into user-friendly applications, chatbots, or other intuitive interfaces.
Key Examples:
Consensus & Elicit – AI-powered research assistants that synthesize answers from the literature.
BioAgents – On-chain scientific agents equipped with Web3 features like decentralized storage and verifiable knowledge graphs.
Iris.ai & Scite – AI platforms for semantic paper search and evidence mapping.
The Future: A Connected Ecosystem
Despite rapid progress, most of these tools remain fragmented, and the true power of scientific AI will be unlocked only through network effects. By introducing decentralized identity, on-chain data provenance, and smart contract-based incentives, we can bridge these islands and enable transparent, tamper-resistant coordination. This creates a global biology engine where humans and machines can co-create knowledge in an open and scalable manner. That is the core vision behind the Bio Protocol.
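As a rough illustration of what tamper-resistant data provenance means in practice, the sketch below links records together by hash so that any later modification becomes detectable. It is a generic hash-linked log, not a description of how the Bio Protocol or any blockchain implements provenance, and all function names and record fields are assumptions made for this example.

```python
import hashlib
import json

GENESIS = "0" * 64  # placeholder "previous hash" for the first record


def record_hash(payload: dict, prev_hash: str) -> str:
    """Hash a record's payload together with the previous record's hash."""
    body = json.dumps({"payload": payload, "prev": prev_hash}, sort_keys=True)
    return hashlib.sha256(body.encode("utf-8")).hexdigest()


def append_record(chain: list, payload: dict) -> None:
    """Append a record linked to the hash of the current last record."""
    prev_hash = chain[-1]["hash"] if chain else GENESIS
    chain.append({
        "payload": payload,
        "prev": prev_hash,
        "hash": record_hash(payload, prev_hash),
    })


def verify_chain(chain: list) -> bool:
    """Return True only if every link and every stored hash still matches."""
    prev_hash = GENESIS
    for record in chain:
        if record["prev"] != prev_hash:
            return False
        if record["hash"] != record_hash(record["payload"], prev_hash):
            return False
        prev_hash = record["hash"]
    return True


if __name__ == "__main__":
    log = []
    append_record(log, {"step": "hypothesis", "author": "agent-1"})
    append_record(log, {"step": "assay", "result": 0.42})
    print(verify_chain(log))
```

Editing any earlier payload changes its hash, which breaks verification for that record. Publishing the hashes to a shared ledger is what would let third parties run the same check independently.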