Technical Deep Dive

BioAgents is an agentic framework for biological research and analysis, built on top of the Eliza v2 platform. It automates scientific paper processing, hypothesis generation, and knowledge graph integration.

1. Core Architecture

Framework Foundation

  • Base Platform: Built on the Eliza v2 agentic framework

  • Language: TypeScript with modern ES modules

  • Architecture: Plugin-based system with modular services

  • Runtime: Node.js with pnpm package management

High-Level System Design

┌─────────────────┐    ┌─────────────────┐    ┌─────────────────┐
│   Data Sources  │    │  Core Services  │    │  Storage Layer  │
│                 │    │                 │    │                 │
│ • Google Drive  │───▶│ • Hypothesis    │───▶│ • PostgreSQL    │
│ • PDF Papers    │    │   Generation    │    │ • Oxigraph      │
│ • SPARQL KG     │    │ • Paper Process │    │ • OriginTrail   │
└─────────────────┘    │ • DKG Interface │    │   DKG           │
                       └─────────────────┘    └─────────────────┘

Key Components

  • Agent Core: Central orchestration through Eliza runtime (src/index.ts)

  • Plugin Manager: DKG plugin registration and lifecycle management

  • Services: Shared functionalities for hypothesis generation, document processing, and knowledge graph integration

  • Data Flow: Scientific papers → Processing → Knowledge extraction → Hypothesis generation → Knowledge graph storage
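The data flow above can be read as a chain of stages, each consuming the previous stage's output. The following is a minimal sketch of that chain, not the BioAgents implementation: every type and function name is hypothetical, the extraction heuristic is a toy, and the stages are synchronous here even though real stages (PDF parsing, LLM calls, graph writes) would be asynchronous.

```typescript
// Illustrative types only; none of these names come from the BioAgents codebase.
interface Paper { id: string; text: string }
interface Extraction { paperId: string; terms: string[] }
interface Hypothesis { paperId: string; statement: string }

// Processing: collapse whitespace (stand-in for PDF parsing and cleanup).
function processPaper(paper: Paper): Paper {
  return { id: paper.id, text: paper.text.replace(/\s+/g, " ").trim() };
}

// Knowledge extraction: treat capitalized tokens as candidate terms (toy heuristic).
function extractKnowledge(paper: Paper): Extraction {
  const matches: string[] = paper.text.match(/\b[A-Z][A-Za-z0-9]+\b/g) || [];
  const terms = matches.filter((term, i) => matches.indexOf(term) === i);
  return { paperId: paper.id, terms };
}

// Hypothesis generation: a template where a real system would call an LLM.
function generateHypothesis(extraction: Extraction): Hypothesis {
  const statement = `Possible relationship between: ${extraction.terms.join(", ")}`;
  return { paperId: extraction.paperId, statement };
}

// Storage: append to an in-memory array (stand-in for the knowledge graph).
function runPipeline(paper: Paper, store: Hypothesis[]): Hypothesis {
  const hypothesis = generateHypothesis(extractKnowledge(processPaper(paper)));
  store.push(hypothesis);
  return hypothesis;
}
```

Keeping each stage a separate function with a typed input and output means a stage can be swapped (for example, a different extraction method) without touching the others.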

2. Project Structure

3. The Plugin System

Plugin Architecture

BioAgents implements a single comprehensive plugin (dkgPlugin) that provides actions, services, and routes.

Plugin Lifecycle (src/bioagentPlugin/index.ts):

  1. Initialization: Environment validation and service setup

  2. Service Registration: Hypothesis generation, Google Drive sync

  3. Route Mounting: API endpoints for webhooks and manual operations

  4. Action Registration: DKG insert functionality
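The four lifecycle steps can be sketched as a loader that walks a plugin object in order. This is an illustration under stated assumptions: the `SketchPlugin` shape, the `loadPlugin` helper, the route paths, and the `EXAMPLE_REQUIRED_SETTING` variable are all hypothetical, and the actual plugin types come from @elizaos/core and may look different.

```typescript
// Hypothetical shapes for illustration; the real types live in @elizaos/core.
type Env = Record<string, string | undefined>;

interface SketchPlugin {
  name: string;
  init: (env: Env) => void;
  services: string[];
  routes: { method: "GET" | "POST"; path: string }[];
  actions: string[];
}

const sketchPlugin: SketchPlugin = {
  name: "dkgPlugin",
  // Step 1: validate the environment before anything else is registered.
  init(env) {
    if (!env["EXAMPLE_REQUIRED_SETTING"]) {
      throw new Error("Missing EXAMPLE_REQUIRED_SETTING");
    }
  },
  services: ["hypothesis", "googleDriveSync"],
  // Route paths are placeholders, not the project's actual endpoints.
  routes: [
    { method: "GET", path: "/health" },
    { method: "POST", path: "/gdrive/webhook" },
  ],
  actions: ["dkgInsert"],
};

// Walks the lifecycle in order and returns a log of what was registered.
function loadPlugin(plugin: SketchPlugin, env: Env): string[] {
  const log: string[] = [];
  plugin.init(env);
  log.push(`init:${plugin.name}`);
  plugin.services.forEach((service) => log.push(`service:${service}`));
  plugin.routes.forEach((route) => log.push(`route:${route.method} ${route.path}`));
  plugin.actions.forEach((action) => log.push(`action:${action}`));
  return log;
}
```

Running initialization first means a misconfigured environment fails before any service, route, or action is exposed.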

Plugin Components

Actions (src/bioagentPlugin/actions/)

  • DKG Insert (dkgInsert.ts): Processes scientific papers into Knowledge Assets and publishes to OriginTrail DKG

Services (src/bioagentPlugin/services/)

  • HypothesisService: Orchestrates paper processing and hypothesis generation

  • Anthropic Service: AI-powered research operations using Claude 3.7 Sonnet

  • Google Drive Service: Document synchronization and monitoring

  • KA Service: Knowledge Assembly for structured paper processing

Routes (src/bioagentPlugin/routes/)

  • Health Monitoring: System status endpoints

  • Google Drive Webhooks: Real-time file change notifications

  • Manual Sync: Trigger synchronization operations

  • Agent Metrics: Performance and status reporting
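For the Google Drive webhook route, one recurring detail is that Drive push notifications carry their event type in the `X-Goog-Resource-State` header, and the first message on a new channel has the state `sync`, which only confirms the channel was created. The sketch below shows one way a handler might decide what to do with a notification; the function name and the decision values are hypothetical, and how BioAgents actually handles these requests is not shown in this document.

```typescript
// Hypothetical decision type; not taken from the BioAgents codebase.
type WebhookDecision = "ignore" | "enqueue-sync";

// Header names are lowercase here because Node.js lowercases incoming header names.
function decideOnNotification(
  headers: Record<string, string | undefined>
): WebhookDecision {
  const state = headers["x-goog-resource-state"];
  // No state header: not a Drive notification. "sync": channel setup acknowledgement.
  if (state === undefined || state === "sync") {
    return "ignore";
  }
  // Any other state (for example "change" or "update") signals that something changed.
  return "enqueue-sync";
}
```

Returning a decision instead of performing the sync inside the handler keeps the webhook response fast; the actual work can then run from a queue.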

4. Database & Schemas

Database Setup

  • Client: PostgreSQL with pgvector extension for embeddings

  • ORM: Drizzle ORM for type-safe database operations

  • Schema: biograph namespace for all biological research data

Core Schemas (src/db/schemas/)

Hypotheses Table (hypotheses.ts)

File Metadata Table (fileMetadata.ts)

  • Tracks Google Drive file processing status

  • Manages PDF processing pipeline state

  • Links files to generated knowledge assets
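Tracking pipeline state per file usually amounts to a small state machine. The sketch below assumes four status values and a `knowledgeAssetId` link field; both are hypothetical and chosen only for illustration, since the actual columns of the file metadata table are not listed here.

```typescript
// Hypothetical statuses; the real table may use different values.
type FileStatus = "pending" | "processing" | "processed" | "failed";

interface FileRecord {
  fileId: string;
  status: FileStatus;
  knowledgeAssetId?: string; // set once a knowledge asset has been generated
}

// Which statuses each status may move to. "failed" can be retried.
const allowedTransitions: Record<FileStatus, FileStatus[]> = {
  pending: ["processing"],
  processing: ["processed", "failed"],
  processed: [],
  failed: ["pending"],
};

// Returns an updated copy of the record, or throws on an illegal transition.
function transition(
  record: FileRecord,
  next: FileStatus,
  knowledgeAssetId?: string
): FileRecord {
  if (allowedTransitions[record.status].indexOf(next) === -1) {
    throw new Error(`Illegal transition: ${record.status} -> ${next}`);
  }
  return knowledgeAssetId === undefined
    ? { ...record, status: next }
    : { ...record, status: next, knowledgeAssetId };
}
```

A typical successful run would move a record from `pending` to `processing` and then to `processed`, attaching the asset identifier on the last step.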

Drive Sync Table (driveSync.ts)

  • Google Drive API synchronization state

  • File change detection and processing queues

  • Webhook notification management
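File change detection can be illustrated by comparing the stored sync state against a fresh file listing. The sketch below assumes the stored state is a map from file ID to the `md5Checksum` value that the Drive API reports for binary files; the `detectChanges` function and `ChangeSet` type are hypothetical, and the actual sync logic may rely on Drive's change feed instead.

```typescript
// Minimal view of a Drive file: only the fields this sketch needs.
interface DriveFile { id: string; md5Checksum: string }

// Hypothetical result type, grouping file IDs by what happened to them.
interface ChangeSet { added: string[]; modified: string[]; removed: string[] }

// Compares the last known checksums against the current listing.
function detectChanges(
  known: Map<string, string>,
  listing: DriveFile[]
): ChangeSet {
  const changes: ChangeSet = { added: [], modified: [], removed: [] };
  const seen = new Set<string>();

  for (const file of listing) {
    seen.add(file.id);
    const previous = known.get(file.id);
    if (previous === undefined) {
      changes.added.push(file.id); // never seen before
    } else if (previous !== file.md5Checksum) {
      changes.modified.push(file.id); // content changed since last sync
    }
  }

  // Anything known but absent from the listing was removed.
  known.forEach((_checksum, id) => {
    if (!seen.has(id)) changes.removed.push(id);
  });

  return changes;
}
```

The `added` and `modified` lists are the natural input for a processing queue, while `removed` entries would only need their metadata updated.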

5. Configuration & Deployment

Environment Variables (src/config.ts)

Database Configuration

AI Services

Knowledge Graph Integration

External Services

Local Development Setup

  1. Start Supporting Services

  2. Database Setup

  3. Data Loading

  4. Development Server

Production Deployment

  • Container: Multi-stage Dockerfile with TypeScript compilation

  • Database: PostgreSQL with pgvector extension

  • Services: Grobid and Oxigraph as separate containers

  • Environment: All configuration via environment variables
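When all configuration comes from environment variables, it helps to validate them once at startup and fail early. The following is a minimal sketch of that idea, not the contents of src/config.ts: the variable names `EXAMPLE_DATABASE_URL` and `EXAMPLE_PORT`, the `SketchConfig` shape, and the default port are placeholders invented for illustration.

```typescript
// Hypothetical config shape; the real variables are defined in src/config.ts.
interface SketchConfig {
  databaseUrl: string;
  port: number;
}

// Reads and validates settings from an environment-like object.
function loadConfig(env: Record<string, string | undefined>): SketchConfig {
  const databaseUrl = env["EXAMPLE_DATABASE_URL"];
  if (!databaseUrl) {
    throw new Error("Missing required variable: EXAMPLE_DATABASE_URL");
  }

  // Optional variable with a default; must parse to a valid TCP port.
  const rawPort = env["EXAMPLE_PORT"] || "3000";
  const port = Number(rawPort);
  if (!Number.isInteger(port) || port < 1 || port > 65535) {
    throw new Error(`Invalid EXAMPLE_PORT: ${rawPort}`);
  }

  return { databaseUrl, port };
}
```

In a Node.js process this would be called once as `loadConfig(process.env)`; taking the environment as a parameter keeps the function easy to test.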

6. Key Dependencies

Core Framework

  • @elizaos/core: Agent runtime and plugin system

  • @elizaos/cli: Command-line interface and development tools

  • @elizaos/plugin-anthropic: Claude AI integration

  • @elizaos/plugin-discord: Social platform integration

Database & Storage

  • drizzle-orm: Type-safe SQL query builder

  • pg: PostgreSQL client

  • dkg.js: OriginTrail DKG integration library

Document Processing

  • pdf2pic: PDF to image conversion

  • libxmljs: XML parsing for TEI documents

  • cheerio: HTML/XML manipulation

  • jsonld-streaming-parser: JSON-LD processing

AI & Language Processing

  • @anthropic-ai/sdk: Claude API client

  • openai: OpenAI API integration

  • @instructor-ai/instructor: Structured output from LLMs

External Integrations

  • googleapis: Google Drive API client

  • axios: HTTP client for external APIs

  • n3: RDF parsing, serialization, and in-memory triple storage

  • zod: Runtime type validation

7. Data Processing Pipelines

Scientific Paper Processing Pipeline

Hypothesis Generation Pipeline

Knowledge Graph Integration

This architecture enables BioAgents to efficiently process scientific literature, generate novel research hypotheses, and maintain a comprehensive, decentralized knowledge graph of biological research findings.
