Technical Deep Dive
BioAgents is an agentic framework for biological research and analysis, built on top of the Eliza v2 platform. This system provides intelligent automation for scientific paper processing, hypothesis generation, and knowledge graph integration.
1. Core Architecture
Framework Foundation
Base Platform: Built on Eliza v2 agentic framework
Language: TypeScript with modern ES modules
Architecture: Plugin-based system with modular services
Runtime: Node.js with pnpm package management
High-Level System Design
┌─────────────────┐    ┌─────────────────┐    ┌─────────────────┐
│  Data Sources   │    │  Core Services  │    │  Storage Layer  │
│                 │    │                 │    │                 │
│ • Google Drive  │───▶│ • Hypothesis    │───▶│ • PostgreSQL    │
│ • PDF Papers    │    │   Generation    │    │ • Oxigraph      │
│ • SPARQL KG     │    │ • Paper Process │    │ • OriginTrail   │
└─────────────────┘    │ • DKG Interface │    │   DKG           │
                       └─────────────────┘    └─────────────────┘
Key Components
Agent Core: Central orchestration through Eliza runtime (src/index.ts)
Plugin Manager: DKG plugin registration and lifecycle management
Services: Shared functionalities for hypothesis generation, document processing, and knowledge graph integration
Data Flow: Scientific papers → Processing → Knowledge extraction → Hypothesis generation → Knowledge graph storage
2. Project Structure
BioAgents/
├── src/
│ ├── bioagentPlugin/ # Core plugin containing main logic
│ │ ├── actions/ # Available actions (DKG insert)
│ │ ├── routes/ # HTTP endpoints and webhooks
│ │ ├── services/ # Core business logic services
│ │ │ ├── anthropic/ # AI-powered hypothesis generation
│ │ │ ├── gdrive/ # Google Drive integration
│ │ │ └── kaService/ # Knowledge Assembly processing
│ │ ├── constants.ts # DKG explorer links and config
│ │ ├── templates.ts # Memory templates for DKG
│ │ └── types.ts # TypeScript definitions
│ ├── db/
│ │ └── schemas/ # Database schemas with Drizzle ORM
│ ├── config.ts # Environment configuration management
│ ├── index.ts # Main entry point
│ └── scholar.ts # Agent definition
├── drizzle/ # Database migration files
├── scripts/ # Utility scripts (JSON-LD processing)
├── sampleJsonLds/ # Sample scientific paper data
├── Dockerfile # Production container config
└── docker-compose.yml # Local development services
3. The Plugin System
Plugin Architecture
BioAgents implements a single comprehensive plugin (dkgPlugin) that provides:
Plugin Lifecycle (src/bioagentPlugin/index.ts):
Initialization: Environment validation and service setup
Service Registration: Hypothesis generation, Google Drive sync
Route Mounting: API endpoints for webhooks and manual operations
Action Registration: DKG insert functionality
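The four lifecycle phases above could be sketched roughly as follows. This is a minimal illustration, not the project's actual code: the SketchPlugin interface, the route paths, and the action name are hypothetical stand-ins defined locally, and the real Plugin type exported by @elizaos/core may have a different shape.

```typescript
// Hypothetical, simplified plugin shape for illustration only.
// The real Plugin type in @elizaos/core may differ.
interface SketchPlugin {
  name: string;
  init(env: Record<string, string | undefined>): void;
  services: string[];
  routes: { method: "GET" | "POST"; path: string }[];
  actions: string[];
}

const sketchPlugin: SketchPlugin = {
  name: "dkgPlugin",
  // 1. Initialization: validate the environment before anything else starts.
  init(env) {
    for (const key of ["POSTGRES_URL", "OPENAI_API_KEY", "ANTHROPIC_API_KEY"]) {
      if (!env[key]) throw new Error(`Missing required variable: ${key}`);
    }
  },
  // 2. Service registration.
  services: ["HypothesisService", "GoogleDriveService"],
  // 3. Route mounting (these paths are assumptions, not the real endpoints).
  routes: [
    { method: "GET", path: "/health" },
    { method: "POST", path: "/gdrive/webhook" },
  ],
  // 4. Action registration (the action name is an assumption).
  actions: ["DKG_INSERT"],
};

// Runs initialization, then lists what would be registered, in lifecycle order.
function startPlugin(
  plugin: SketchPlugin,
  env: Record<string, string | undefined>,
): string[] {
  plugin.init(env);
  return [
    ...plugin.services.map((s) => `service:${s}`),
    ...plugin.routes.map((r) => `route:${r.method} ${r.path}`),
    ...plugin.actions.map((a) => `action:${a}`),
  ];
}
```

Validating the environment first means a misconfigured deployment fails before any service, route, or action is registered.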
Plugin Components
Actions (src/bioagentPlugin/actions/)
DKG Insert (dkgInsert.ts): Processes scientific papers into Knowledge Assets and publishes to OriginTrail DKG
Services (src/bioagentPlugin/services/)
HypothesisService: Orchestrates paper processing and hypothesis generation
Anthropic Service: AI-powered research operations using Claude 3.7 Sonnet
Google Drive Service: Document synchronization and monitoring
KA Service: Knowledge Assembly for structured paper processing
Routes (src/bioagentPlugin/routes/)
Health Monitoring: System status endpoints
Google Drive Webhooks: Real-time file change notifications
Manual Sync: Trigger synchronization operations
Agent Metrics: Performance and status reporting
4. Database & Schemas
Database Setup
Client: PostgreSQL with pgvector extension for embeddings
ORM: Drizzle ORM for type-safe database operations
Schema: biograph namespace for all biological research data
Core Schemas (src/db/schemas/)
Hypotheses Table (hypotheses.ts)
{
  id: uuid (primary key)
  hypothesis: text (generated hypothesis content)
  filesUsed: text[] (source scientific papers)
  status: enum (pending, evaluated, published)
  judgellmScore: numeric (AI evaluation score 0-100)
  humanScore: numeric (human validation score)
  research: text (supporting research context)
  evaluation: text (detailed assessment)
  citations: text[] (referenced paper DOIs)
  createdAt: timestamp
  updatedAt: timestamp
}
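As a rough illustration, the table above maps onto a TypeScript record like the one below. This is a sketch under assumptions rather than the project's Drizzle schema: the nullability of the score and evaluation fields, the recordEvaluation helper, and the rule that only pending hypotheses can be evaluated are all hypothetical.

```typescript
type HypothesisStatus = "pending" | "evaluated" | "published";

// Mirrors the columns listed above. Which fields are nullable is an assumption.
interface HypothesisRecord {
  id: string; // uuid
  hypothesis: string;
  filesUsed: string[];
  status: HypothesisStatus;
  judgellmScore: number | null; // AI evaluation score, 0-100
  humanScore: number | null;
  research: string;
  evaluation: string | null;
  citations: string[]; // DOIs
  createdAt: Date;
  updatedAt: Date;
}

// Hypothetical helper: attaches an AI evaluation to a pending hypothesis.
// Returns a new record instead of mutating the input.
function recordEvaluation(
  record: HypothesisRecord,
  score: number,
  evaluation: string,
  now: Date = new Date(),
): HypothesisRecord {
  if (record.status !== "pending") {
    throw new Error(`Cannot evaluate a hypothesis in status "${record.status}"`);
  }
  if (!Number.isFinite(score) || score < 0 || score > 100) {
    throw new RangeError("judgellmScore must be between 0 and 100");
  }
  return {
    ...record,
    judgellmScore: score,
    evaluation,
    status: "evaluated",
    updatedAt: now,
  };
}
```

Checking the 0-100 range in application code catches out-of-range scores before they reach the numeric column.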
File Metadata Table (fileMetadata.ts)
Tracks Google Drive file processing status
Manages PDF processing pipeline state
Links files to generated knowledge assets
Drive Sync Table (driveSync.ts)
Google Drive API synchronization state
File change detection and processing queues
Webhook notification management
5. Configuration & Deployment
Environment Variables (src/config.ts)
Database Configuration
POSTGRES_URL: Required PostgreSQL connection string
PROD_POSTGRES_URL: Production database endpoint
AI Services
OPENAI_API_KEY: Required for OpenAI integration
ANTHROPIC_API_KEY: Required for Claude integration
Knowledge Graph Integration
DKG_ENVIRONMENT: development|testnet|mainnet
DKG_HOSTNAME: OriginTrail node endpoint
DKG_PUBLIC_KEY: DKG wallet public key
DKG_PRIVATE_KEY: DKG wallet private key
DKG_BLOCKCHAIN_NAME: Target blockchain network
External Services
UNSTRUCTURED_API_KEY: Document processing service
BIONTOLOGY_KEY: Biological terminology API
GCP_JSON_CREDENTIALS: Google Cloud service account JSON
GROBID_URL: PDF parsing service (default: localhost:8070)
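A configuration loader for a subset of these variables might look like the sketch below. It is an illustration only and does not reproduce src/config.ts: the SketchConfig shape and function names are hypothetical, treating DKG_ENVIRONMENT as optional is an assumption, and the http:// scheme on the Grobid default is added here because the source lists only localhost:8070.

```typescript
type Env = Record<string, string | undefined>;

const DKG_ENVIRONMENTS = ["development", "testnet", "mainnet"] as const;
type DkgEnvironment = (typeof DKG_ENVIRONMENTS)[number];

// Hypothetical config shape covering only a few of the variables above.
interface SketchConfig {
  postgresUrl: string;
  openaiApiKey: string;
  anthropicApiKey: string;
  dkgEnvironment: DkgEnvironment | undefined;
  grobidUrl: string;
}

function isDkgEnvironment(value: string): value is DkgEnvironment {
  return (DKG_ENVIRONMENTS as readonly string[]).includes(value);
}

// Returns the variable's value, or throws if it is unset or empty.
function requireVar(env: Env, key: string): string {
  const value = env[key];
  if (!value) throw new Error(`Missing required environment variable: ${key}`);
  return value;
}

function loadConfig(env: Env): SketchConfig {
  const dkgEnvironment = env["DKG_ENVIRONMENT"];
  // Assumption: the variable may be unset, but if set it must be a known value.
  if (dkgEnvironment !== undefined && !isDkgEnvironment(dkgEnvironment)) {
    throw new Error(
      `DKG_ENVIRONMENT must be one of: ${DKG_ENVIRONMENTS.join(", ")}`,
    );
  }
  return {
    postgresUrl: requireVar(env, "POSTGRES_URL"),
    openaiApiKey: requireVar(env, "OPENAI_API_KEY"),
    anthropicApiKey: requireVar(env, "ANTHROPIC_API_KEY"),
    dkgEnvironment,
    // Assumption: scheme added to the documented default of localhost:8070.
    grobidUrl: env["GROBID_URL"] ?? "http://localhost:8070",
  };
}
```

In a Node.js process this would be called once at startup as loadConfig(process.env), so that a missing required key stops the agent before any service is created.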
Local Development Setup
Start Supporting Services:
# Oxigraph SPARQL database
docker run --rm -v "$PWD/oxigraph:/data" -p 7878:7878 \
  ghcr.io/oxigraph/oxigraph serve --location /data --bind 0.0.0.0:7878
# Grobid PDF processing
docker run --rm --init --ulimit core=0 -p 8070:8070 \
  lfoppiano/grobid:0.8.2
# PostgreSQL with vector support
docker run --name BioAgents-db -e POSTGRES_PASSWORD=123 \
  -p 5432:5432 -d ankane/pgvector
Database Setup:
pnpm db:migrate # Apply database migrations
pnpm db:studio # Open database management UI
Data Loading:
# Load sample scientific papers into knowledge graph
pnpm run script scripts/jsonldToTriple.ts
Development Server:
pnpm run dev # Start with hot reload
Production Deployment
Container: Multi-stage Dockerfile with TypeScript compilation
Database: PostgreSQL with pgvector extension
Services: Grobid and Oxigraph as separate containers
Environment: All configuration via environment variables
6. Key Dependencies
Core Framework
@elizaos/core: Agent runtime and plugin system
@elizaos/cli: Command-line interface and development tools
@elizaos/plugin-anthropic: Claude AI integration
@elizaos/plugin-discord: Social platform integration
Database & Storage
drizzle-orm: Type-safe SQL query builder
pg: PostgreSQL client
dkg.js: OriginTrail DKG integration library
Document Processing
pdf2pic: PDF to image conversion
libxmljs: XML parsing for TEI documents
cheerio: HTML/XML manipulation
jsonld-streaming-parser: JSON-LD processing
AI & Language Processing
@anthropic-ai/sdk: Claude API client
openai: OpenAI API integration
@instructor-ai/instructor: Structured output from LLMs
External Integrations
googleapis: Google Drive API client
axios: HTTP client for external APIs
n3: RDF triple parsing, storage, and serialization
zod: Runtime type validation
7. Data Processing Pipelines
Scientific Paper Processing Pipeline
PDF Upload → Google Drive Webhook → File Processing Task →
Grobid Parsing → TEI XML → Structured Extraction →
Knowledge Assembly → JSON-LD Generation → Triple Store →
DKG Publication
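One way to track a paper's position in this pipeline is an ordered list of stages with a helper that returns the next one. The sketch below is illustrative only: the stage identifiers and the nextStage function are hypothetical names derived from the steps above, not values taken from the file metadata schema.

```typescript
// Hypothetical stage identifiers, in the order the pipeline above lists them.
const PAPER_STAGES = [
  "pdf_uploaded",
  "webhook_received",
  "task_created",
  "grobid_parsed",
  "tei_xml_ready",
  "structured_extracted",
  "knowledge_assembled",
  "jsonld_generated",
  "triples_stored",
  "dkg_published",
] as const;

type PaperStage = (typeof PAPER_STAGES)[number];

// Returns the stage that follows `current`, or null once the paper is published.
function nextStage(current: PaperStage): PaperStage | null {
  const index = PAPER_STAGES.indexOf(current);
  return PAPER_STAGES[index + 1] ?? null;
}
```

Keeping the order in a single array lets a processing task resume from the last recorded stage after a failure.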
Hypothesis Generation Pipeline
1. SPARQL Query → Extract Research Findings
2. LLM Selection → Choose 2 Relevant Findings
3. Context Building → Gather Related Papers & Keywords
4. Claude 3.7 Sonnet → Generate Structured Hypothesis
5. Evaluation → Score for Novelty & Plausibility
6. Storage → Save to PostgreSQL
7. Publication → Discord/Social Media Integration
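The seven steps can be sketched as a single orchestration function with each step injected as a dependency. This is a minimal sketch, not the HypothesisService implementation: the Finding, Hypothesis, and PipelineDeps types and every function name are hypothetical, and the real service may combine or order the calls differently.

```typescript
interface Finding {
  paper: string;
  text: string;
}

interface Hypothesis {
  text: string;
  score: number;
}

// Each pipeline step is injected, so SPARQL, LLM, and database calls can be stubbed.
interface PipelineDeps {
  queryFindings(): Promise<Finding[]>; // 1. SPARQL query
  selectFindings(all: Finding[]): Promise<Finding[]>; // 2. LLM selection
  buildContext(selected: Finding[]): Promise<string>; // 3. context building
  generate(selected: Finding[], context: string): Promise<string>; // 4. LLM generation
  evaluate(hypothesis: string): Promise<number>; // 5. evaluation
  save(hypothesis: Hypothesis): Promise<void>; // 6. storage
  publish(hypothesis: Hypothesis): Promise<void>; // 7. publication
}

async function runHypothesisPipeline(deps: PipelineDeps): Promise<Hypothesis> {
  const findings = await deps.queryFindings();
  const selected = await deps.selectFindings(findings);
  // The pipeline above selects exactly two findings.
  if (selected.length !== 2) {
    throw new Error(`Expected 2 selected findings, got ${selected.length}`);
  }
  const context = await deps.buildContext(selected);
  const text = await deps.generate(selected, context);
  const score = await deps.evaluate(text);
  const hypothesis: Hypothesis = { text, score };
  await deps.save(hypothesis);
  await deps.publish(hypothesis);
  return hypothesis;
}
```

Injecting the steps keeps the orchestration testable without a running Oxigraph instance, an LLM API key, or a database.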
Knowledge Graph Integration
Scientific Papers → JSON-LD → Local Oxigraph → SPARQL Queries →
Research Context → Hypothesis Generation → DKG Assets →
Persistent Knowledge Graph
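Querying the local Oxigraph store follows the SPARQL 1.1 Protocol, which the Oxigraph server exposes at its /query endpoint. The sketch below only builds the HTTP request; the buildSparqlRequest helper is a hypothetical name, and the example query uses a placeholder predicate rather than anything from the BioAgents ontology.

```typescript
interface SparqlRequest {
  url: string;
  method: "POST";
  headers: Record<string, string>;
  body: string;
}

// Builds a SPARQL 1.1 Protocol "query via POST directly" request.
function buildSparqlRequest(baseUrl: string, query: string): SparqlRequest {
  return {
    // Strip trailing slashes so the path is always exactly "/query".
    url: `${baseUrl.replace(/\/+$/, "")}/query`,
    method: "POST",
    headers: {
      "Content-Type": "application/sparql-query",
      Accept: "application/sparql-results+json",
    },
    body: query,
  };
}

// Placeholder query: the predicate IRI is illustrative, not from the real graph.
const findingsQuery = `
SELECT ?paper ?finding WHERE {
  ?paper <http://example.org/hasFinding> ?finding .
} LIMIT 10`;

const sparqlRequest = buildSparqlRequest("http://localhost:7878", findingsQuery);
```

The resulting method, headers, and body can be passed to fetch along with sparqlRequest.url. The port matches the Oxigraph container started in the local development setup.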
This architecture enables BioAgents to efficiently process scientific literature, generate novel research hypotheses, and maintain a comprehensive, decentralized knowledge graph of biological research findings.