# Technical Deep Dive

BioAgents is an agentic framework for biological research and analysis, built on top of the Eliza v2 platform. It automates scientific paper processing, hypothesis generation, and knowledge graph integration.

## 1. Core Architecture

### Framework Foundation

* **Base Platform**: Built on [Eliza v2](https://github.com/elizaOS/eliza) agentic framework
* **Language**: TypeScript with modern ES modules
* **Architecture**: Plugin-based system with modular services
* **Runtime**: Node.js with pnpm package management

### High-Level System Design

```
┌─────────────────┐    ┌─────────────────┐    ┌─────────────────┐
│   Data Sources  │    │  Core Services  │    │  Storage Layer  │
│                 │    │                 │    │                 │
│ • Google Drive  │───▶│ • Hypothesis    │───▶│ • PostgreSQL    │
│ • PDF Papers    │    │   Generation    │    │ • Oxigraph      │
│ • SPARQL KG     │    │ • Paper Process │    │ • OriginTrail   │
└─────────────────┘    │ • DKG Interface │    │   DKG           │
                       └─────────────────┘    └─────────────────┘
```

### Key Components

* **Agent Core**: Central orchestration through Eliza runtime (`src/index.ts`)
* **Plugin Manager**: DKG plugin registration and lifecycle management
* **Services**: Shared functionality for hypothesis generation, document processing, and knowledge graph integration
* **Data Flow**: Scientific papers → Processing → Knowledge extraction → Hypothesis generation → Knowledge graph storage

## 2. Project Structure

```
BioAgents/
├── src/
│   ├── bioagentPlugin/          # Core plugin containing main logic
│   │   ├── actions/             # Available actions (DKG insert)
│   │   ├── routes/              # HTTP endpoints and webhooks
│   │   ├── services/            # Core business logic services
│   │   │   ├── anthropic/       # AI-powered hypothesis generation
│   │   │   ├── gdrive/          # Google Drive integration
│   │   │   └── kaService/       # Knowledge Assembly processing
│   │   ├── constants.ts         # DKG explorer links and config
│   │   ├── templates.ts         # Memory templates for DKG
│   │   └── types.ts             # TypeScript definitions
│   ├── db/
│   │   └── schemas/             # Database schemas with Drizzle ORM
│   ├── config.ts               # Environment configuration management
│   ├── index.ts                # Main entry point
│   └── scholar.ts              # Agent definition
├── drizzle/                    # Database migration files
├── scripts/                    # Utility scripts (JSON-LD processing)
├── sampleJsonLds/              # Sample scientific paper data
├── Dockerfile                  # Production container config
└── docker-compose.yml          # Local development services
```

## 3. The Plugin System

### Plugin Architecture

BioAgents implements a single comprehensive plugin (`dkgPlugin`) that provides:

**Plugin Lifecycle** (`src/bioagentPlugin/index.ts`):

1. **Initialization**: Environment validation and service setup
2. **Service Registration**: Hypothesis generation, Google Drive sync
3. **Route Mounting**: API endpoints for webhooks and manual operations
4. **Action Registration**: DKG insert functionality
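
Assuming the plugin follows the standard Eliza v2 plugin shape, the lifecycle above can be sketched as follows. The stand-in types and the `dkgPlugin` fields here are illustrative only, not the actual definitions from `@elizaos/core` or `src/bioagentPlugin/index.ts`:

```typescript
// Local stand-in types approximating the Eliza v2 plugin shape (assumption).
interface Action { name: string; handler: (msg: string) => Promise<void>; }
interface Service { start: () => Promise<void>; }
interface Route { path: string; type: "GET" | "POST"; handler: () => Promise<unknown>; }

interface Plugin {
  name: string;
  description: string;
  init?: (config: Record<string, string>) => Promise<void>;
  actions?: Action[];
  services?: Service[];
  routes?: Route[];
}

const dkgPlugin: Plugin = {
  name: "dkg",
  description: "Scientific paper processing and DKG publication",
  // 1. Initialization: validate the environment before anything starts
  init: async (config) => {
    if (!config.DKG_HOSTNAME) throw new Error("DKG_HOSTNAME is required");
  },
  // 2. Service registration: hypothesis generation, Google Drive sync
  services: [],
  // 3. Route mounting: webhooks and manual operations
  routes: [],
  // 4. Action registration: DKG insert
  actions: [],
};
```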

### Plugin Components

#### Actions (`src/bioagentPlugin/actions/`)

* **DKG Insert** (`dkgInsert.ts`): Processes scientific papers into Knowledge Assets and publishes to OriginTrail DKG

#### Services (`src/bioagentPlugin/services/`)

* **HypothesisService**: Orchestrates paper processing and hypothesis generation
* **Anthropic Service**: AI-powered research operations using Claude 3.7 Sonnet
* **Google Drive Service**: Document synchronization and monitoring
* **KA Service**: Knowledge Assembly for structured paper processing

#### Routes (`src/bioagentPlugin/routes/`)

* **Health Monitoring**: System status endpoints
* **Google Drive Webhooks**: Real-time file change notifications
* **Manual Sync**: Trigger synchronization operations
* **Agent Metrics**: Performance and status reporting
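
A hypothetical route table mirroring these endpoints; the paths and handler signatures are assumptions for illustration, not the project's actual API:

```typescript
// Illustrative route declarations; real ones live in src/bioagentPlugin/routes/.
type RouteHandler = () => Promise<{ status: number; body: unknown }>;

interface PluginRoute {
  path: string;
  type: "GET" | "POST";
  handler: RouteHandler;
}

const routes: PluginRoute[] = [
  // Health monitoring: liveness check for the agent process
  { path: "/health", type: "GET", handler: async () => ({ status: 200, body: { ok: true } }) },
  // Google Drive webhook: receives real-time file-change notifications
  { path: "/gdrive/webhook", type: "POST", handler: async () => ({ status: 200, body: "ack" }) },
  // Manual sync: trigger a synchronization run on demand
  { path: "/gdrive/sync", type: "POST", handler: async () => ({ status: 202, body: "queued" }) },
];
```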

## 4. Database & Schemas

### Database Setup

* **Database**: PostgreSQL with the pgvector extension for embeddings
* **ORM**: Drizzle ORM for type-safe database operations
* **Schema**: `biograph` namespace for all biological research data

### Core Schemas (`src/db/schemas/`)

#### Hypotheses Table (`hypotheses.ts`)

```typescript
{
  id: uuid (primary key)
  hypothesis: text (generated hypothesis content)
  filesUsed: text[] (source scientific papers)
  status: enum (pending, evaluated, published)
  judgellmScore: numeric (AI evaluation score 0-100)
  humanScore: numeric (human validation score)
  research: text (supporting research context)
  evaluation: text (detailed assessment)
  citations: text[] (referenced paper DOIs)
  createdAt: timestamp
  updatedAt: timestamp
}
```
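
The same row can be mirrored as a plain TypeScript type (the real schema uses Drizzle ORM); field names follow the sketch above, and the `canPublish` guard is purely illustrative:

```typescript
// Plain-TypeScript mirror of the biograph.hypotheses row (illustrative).
type HypothesisStatus = "pending" | "evaluated" | "published";

interface HypothesisRow {
  id: string;
  hypothesis: string;
  filesUsed: string[];
  status: HypothesisStatus;
  judgellmScore: number | null; // AI evaluation score, 0-100
  humanScore: number | null;
  research: string | null;
  evaluation: string | null;
  citations: string[];
  createdAt: Date;
  updatedAt: Date;
}

// Hypothetical guard: only an evaluated hypothesis with a valid score
// may move to "published".
function canPublish(row: HypothesisRow): boolean {
  return (
    row.status === "evaluated" &&
    row.judgellmScore !== null &&
    row.judgellmScore >= 0 &&
    row.judgellmScore <= 100
  );
}
```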

#### File Metadata Table (`fileMetadata.ts`)

* Tracks Google Drive file processing status
* Manages PDF processing pipeline state
* Links files to generated knowledge assets

#### Drive Sync Table (`driveSync.ts`)

* Google Drive API synchronization state
* File change detection and processing queues
* Webhook notification management

## 5. Configuration & Deployment

### Environment Variables (`src/config.ts`)

#### Database Configuration

```
POSTGRES_URL: Required PostgreSQL connection string
PROD_POSTGRES_URL: Production database endpoint
```

#### AI Services

```
OPENAI_API_KEY: Required for OpenAI integration
ANTHROPIC_API_KEY: Required for Claude integration
```

#### Knowledge Graph Integration

```
DKG_ENVIRONMENT: development|testnet|mainnet
DKG_HOSTNAME: OriginTrail node endpoint
DKG_PUBLIC_KEY: DKG wallet public key
DKG_PRIVATE_KEY: DKG wallet private key
DKG_BLOCKCHAIN_NAME: Target blockchain network
```

#### External Services

```
UNSTRUCTURED_API_KEY: Document processing service
BIONTOLOGY_KEY: Biological terminology API
GCP_JSON_CREDENTIALS: Google Cloud service account JSON
GROBID_URL: PDF parsing service (default: localhost:8070)
```
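
A minimal sketch of how such variables might be validated at startup, assuming the `DKG_ENVIRONMENT` enum check and the GROBID default noted above; the real logic lives in `src/config.ts` and may differ:

```typescript
// Illustrative config loader; variable names match the lists above.
const DKG_ENVIRONMENTS = ["development", "testnet", "mainnet"] as const;
type DkgEnvironment = (typeof DKG_ENVIRONMENTS)[number];

interface AppConfig {
  dkgEnvironment: DkgEnvironment;
  grobidUrl: string;
}

function loadConfig(env: Record<string, string | undefined>): AppConfig {
  const dkgEnvironment = env.DKG_ENVIRONMENT;
  if (!DKG_ENVIRONMENTS.includes(dkgEnvironment as DkgEnvironment)) {
    throw new Error(`DKG_ENVIRONMENT must be one of ${DKG_ENVIRONMENTS.join("|")}`);
  }
  return {
    dkgEnvironment: dkgEnvironment as DkgEnvironment,
    // GROBID_URL falls back to the local default mentioned above
    grobidUrl: env.GROBID_URL ?? "http://localhost:8070",
  };
}
```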

### Local Development Setup

1. **Start Supporting Services**:

```bash
# Oxigraph SPARQL database
docker run --rm -v $PWD/oxigraph:/data -p 7878:7878 \
  ghcr.io/oxigraph/oxigraph serve --location /data --bind 0.0.0.0:7878

# Grobid PDF processing
docker run --rm --init --ulimit core=0 -p 8070:8070 \
  lfoppiano/grobid:0.8.2

# PostgreSQL with vector support
docker run --name BioAgents-db -e POSTGRES_PASSWORD=123 \
  -p 5432:5432 -d ankane/pgvector
```

2. **Database Setup**:

```bash
pnpm db:migrate    # Apply database migrations
pnpm db:studio     # Open database management UI
```

3. **Data Loading**:

```bash
# Load sample scientific papers into knowledge graph
pnpm run script scripts/jsonldToTriple.ts
```

4. **Development Server**:

```bash
pnpm run dev       # Start with hot reload
```
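
The data-loading step (3) can be illustrated with a toy converter. The real script uses `jsonld-streaming-parser` and loads the resulting triples into Oxigraph; this simplified sketch handles only flat objects whose keys are already absolute IRIs:

```typescript
// Toy JSON-LD → N-Triples conversion for flat documents (illustration only).
type FlatJsonLd = { "@id": string } & Record<string, string>;

function toNTriples(doc: FlatJsonLd): string[] {
  const subject = `<${doc["@id"]}>`;
  return Object.entries(doc)
    .filter(([key]) => key !== "@id")
    .map(([predicate, value]) =>
      // Treat http(s) values as IRIs, everything else as string literals
      value.startsWith("http")
        ? `${subject} <${predicate}> <${value}> .`
        : `${subject} <${predicate}> "${value}" .`
    );
}

const paper: FlatJsonLd = {
  "@id": "https://doi.org/10.1234/example",
  "http://purl.org/dc/terms/title": "Sample paper",
};
```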

### Production Deployment

* **Container**: Multi-stage Dockerfile with TypeScript compilation
* **Database**: PostgreSQL with pgvector extension
* **Services**: Grobid and Oxigraph as separate containers
* **Environment**: All configuration via environment variables

## 6. Key Dependencies

### Core Framework

* **@elizaos/core**: Agent runtime and plugin system
* **@elizaos/cli**: Command-line interface and development tools
* **@elizaos/plugin-anthropic**: Claude AI integration
* **@elizaos/plugin-discord**: Social platform integration

### Database & Storage

* **drizzle-orm**: Type-safe SQL query builder
* **pg**: PostgreSQL client
* **dkg.js**: OriginTrail DKG integration library

### Document Processing

* **pdf2pic**: PDF to image conversion
* **libxmljs**: XML parsing for TEI documents
* **cheerio**: HTML/XML manipulation
* **jsonld-streaming-parser**: JSON-LD processing

### AI & Language Processing

* **@anthropic-ai/sdk**: Claude API client
* **openai**: OpenAI API integration
* **@instructor-ai/instructor**: Structured output from LLMs

### External Integrations

* **googleapis**: Google Drive API client
* **axios**: HTTP client for external APIs
* **n3**: RDF/SPARQL triple processing
* **zod**: Runtime type validation

## 7. Data Processing Pipelines

### Scientific Paper Processing Pipeline

```
PDF Upload → Google Drive Webhook → File Processing Task →
Grobid Parsing → TEI XML → Structured Extraction →
Knowledge Assembly → JSON-LD Generation → Triple Store →
DKG Publication
```
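
The stages above can be sketched as sequential async transforms; the stage bodies here are stubs standing in for the real Grobid, extraction, and DKG calls:

```typescript
// Pipeline-as-composition sketch; each stage is a stub for a real service call.
type Stage = (input: string) => Promise<string>;

const stages: Stage[] = [
  async (pdf) => `tei:${pdf}`,        // Grobid parsing → TEI XML
  async (tei) => `extracted:${tei}`,  // structured extraction
  async (doc) => `jsonld:${doc}`,     // knowledge assembly → JSON-LD
  async (jsonld) => `dkg:${jsonld}`,  // triple store + DKG publication
];

async function runPipeline(input: string): Promise<string> {
  let current = input;
  for (const stage of stages) current = await stage(current);
  return current;
}
```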

### Hypothesis Generation Pipeline

```
1. SPARQL Query → Extract Research Findings
2. LLM Selection → Choose 2 Relevant Findings  
3. Context Building → Gather Related Papers & Keywords
4. Claude 3.7 → Generate Structured Hypothesis
5. Evaluation → Score for Novelty & Plausibility
6. Storage → Save to PostgreSQL
7. Publication → Discord/Social Media Integration
```
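
Step 1 might look like the following sketch; the SPARQL predicates and the Oxigraph endpoint path are assumptions for illustration, not the project's actual query:

```typescript
// Hypothetical SPARQL query for pulling research findings from local Oxigraph.
const FINDINGS_QUERY = `
  PREFIX dcterms: <http://purl.org/dc/terms/>
  SELECT ?paper ?title ?abstract
  WHERE {
    ?paper dcterms:title ?title ;
           dcterms:abstract ?abstract .
  }
  LIMIT 50
`;

// Oxigraph serves SPARQL over HTTP; POST the query to its query endpoint.
async function queryFindings(endpoint = "http://localhost:7878/query") {
  const res = await fetch(endpoint, {
    method: "POST",
    headers: {
      "Content-Type": "application/sparql-query",
      Accept: "application/sparql-results+json",
    },
    body: FINDINGS_QUERY,
  });
  return res.json();
}
```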

### Knowledge Graph Integration

```
Scientific Papers → JSON-LD → Local Oxigraph → SPARQL Queries →
Research Context → Hypothesis Generation → DKG Assets →
Persistent Knowledge Graph
```

This architecture enables BioAgents to efficiently process scientific literature, generate novel research hypotheses, and maintain a comprehensive, decentralized knowledge graph of biological research findings.
