Technical Deep Dive

BioAgents is an agentic framework for biological research and analysis, built on top of the Eliza v2 platform. This system provides intelligent automation for scientific paper processing, hypothesis generation, and knowledge graph integration.

1. Core Architecture

Framework Foundation

  • Base Platform: Built on Eliza v2 agentic framework

  • Language: TypeScript with modern ES modules

  • Architecture: Plugin-based system with modular services

  • Runtime: Node.js with pnpm package management

High-Level System Design

┌─────────────────┐    ┌─────────────────┐    ┌─────────────────┐
│   Data Sources  │    │  Core Services  │    │  Storage Layer  │
│                 │    │                 │    │                 │
│ • Google Drive  │───▶│ • Hypothesis    │───▶│ • PostgreSQL    │
│ • PDF Papers    │    │   Generation    │    │ • Oxigraph      │
│ • SPARQL KG     │    │ • Paper Process │    │ • OriginTrail   │
└─────────────────┘    │ • DKG Interface │    │   DKG           │
                       └─────────────────┘    └─────────────────┘

Key Components

  • Agent Core: Central orchestration through Eliza runtime (src/index.ts)

  • Plugin Manager: DKG plugin registration and lifecycle management

  • Services: Shared functionalities for hypothesis generation, document processing, and knowledge graph integration

  • Data Flow: Scientific papers → Processing → Knowledge extraction → Hypothesis generation → Knowledge graph storage

2. Project Structure

BioAgents/
├── src/
│   ├── bioagentPlugin/          # Core plugin containing main logic
│   │   ├── actions/             # Available actions (DKG insert)
│   │   ├── routes/              # HTTP endpoints and webhooks
│   │   ├── services/            # Core business logic services
│   │   │   ├── anthropic/       # AI-powered hypothesis generation
│   │   │   ├── gdrive/          # Google Drive integration
│   │   │   └── kaService/       # Knowledge Assembly processing
│   │   ├── constants.ts         # DKG explorer links and config
│   │   ├── templates.ts         # Memory templates for DKG
│   │   └── types.ts             # TypeScript definitions
│   ├── db/
│   │   └── schemas/            # Database schemas with Drizzle ORM
│   ├── config.ts               # Environment configuration management
│   ├── index.ts                # Main entry point
│   └── scholar.ts              # Agent definition
├── drizzle/                    # Database migration files
├── scripts/                    # Utility scripts (JSON-LD processing)
├── sampleJsonLds/              # Sample scientific paper data
├── Dockerfile                  # Production container config
└── docker-compose.yml          # Local development services

3. The Plugin System

Plugin Architecture

BioAgents implements a single comprehensive plugin (dkgPlugin) that bundles all of the system's actions, services, and routes.

Plugin Lifecycle (src/bioagentPlugin/index.ts):

  1. Initialization: Environment validation and service setup

  2. Service Registration: Hypothesis generation, Google Drive sync

  3. Route Mounting: API endpoints for webhooks and manual operations

  4. Action Registration: DKG insert functionality
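
The four lifecycle phases above can be sketched as a plain plugin object. The interfaces below are simplified stand-ins for the @elizaos/core types, and the service, route, and action names are illustrative, not the actual dkgPlugin source:

```typescript
// Minimal sketch of a plugin following the four lifecycle phases above.
// Interfaces are simplified stand-ins for @elizaos/core types.

interface Route {
  path: string;
  type: "GET" | "POST";
  handler: (req: unknown) => Promise<unknown>;
}

interface Plugin {
  name: string;
  init?: (config: Record<string, string>) => Promise<void>;
  services?: Array<{ start: () => Promise<void> }>;
  routes?: Route[];
  actions?: Array<{ name: string; handler: () => Promise<unknown> }>;
}

export const dkgPlugin: Plugin = {
  name: "dkg",
  // 1. Initialization: validate required environment variables up front
  init: async (config) => {
    for (const key of ["DKG_HOSTNAME", "DKG_PUBLIC_KEY", "DKG_PRIVATE_KEY"]) {
      if (!config[key]) throw new Error(`Missing required config: ${key}`);
    }
  },
  // 2. Service registration: long-running background work
  services: [{ start: async () => { /* hypothesis generation, Drive sync */ } }],
  // 3. Route mounting: webhooks and manual triggers
  routes: [{ path: "/health", type: "GET", handler: async () => ({ ok: true }) }],
  // 4. Action registration: DKG insert (name illustrative)
  actions: [{ name: "DKG_INSERT", handler: async () => ({ published: true }) }],
};
```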

Plugin Components

Actions (src/bioagentPlugin/actions/)

  • DKG Insert (dkgInsert.ts): Processes scientific papers into Knowledge Assets and publishes to OriginTrail DKG
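
A hedged sketch of what the insert step prepares before publishing: extracted paper metadata is shaped into a JSON-LD document. A schema.org ScholarlyArticle is used here for illustration; the actual kaService output is a richer graph:

```typescript
// Sketch: shape extracted paper metadata into a JSON-LD Knowledge Asset.
// Field names follow schema.org; the real kaService emits a richer graph.

interface PaperMeta {
  doi: string;
  title: string;
  abstract: string;
  authors: string[];
}

export function toKnowledgeAsset(paper: PaperMeta) {
  return {
    "@context": "https://schema.org",
    "@type": "ScholarlyArticle",
    "@id": `https://doi.org/${paper.doi}`,
    name: paper.title,
    abstract: paper.abstract,
    author: paper.authors.map((name) => ({ "@type": "Person", name })),
  };
}

// The resulting object is roughly what dkg.js would publish, e.g.
// (illustrative, not executed here):
// await dkg.asset.create({ public: asset }, { epochsNum: 2 });
```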

Services (src/bioagentPlugin/services/)

  • HypothesisService: Orchestrates paper processing and hypothesis generation

  • Anthropic Service: AI-powered research operations using Claude 3.7 Sonnet

  • Google Drive Service: Document synchronization and monitoring

  • KA Service: Knowledge Assembly for structured paper processing

Routes (src/bioagentPlugin/routes/)

  • Health Monitoring: System status endpoints

  • Google Drive Webhooks: Real-time file change notifications

  • Manual Sync: Trigger synchronization operations

  • Agent Metrics: Performance and status reporting
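
Google Drive push notifications arrive as empty-body POSTs whose meaning is carried in headers, so the webhook route reduces to a header check. The header name and its values below are the real Drive push-notification conventions; the routing decision itself is an illustrative sketch:

```typescript
// Sketch: interpret a Google Drive push notification from its headers.
// Google sets X-Goog-Resource-State to values like "sync", "change",
// "add", "update", "remove"; "sync" is the initial handshake message.

type SyncDecision = "ignore" | "enqueue-sync";

export function handleDriveWebhook(
  headers: Record<string, string | undefined>,
): SyncDecision {
  const state = headers["x-goog-resource-state"];
  if (!state || state === "sync") return "ignore"; // handshake, nothing changed
  // Any real change triggers a sync pass, which then diffs against the
  // driveSync table to decide which files need processing.
  return "enqueue-sync";
}
```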

4. Database & Schemas

Database Setup

  • Client: PostgreSQL with pgvector extension for embeddings

  • ORM: Drizzle ORM for type-safe database operations

  • Schema: biograph namespace for all biological research data

Core Schemas (src/db/schemas/)

Hypotheses Table (hypotheses.ts)

{
  id: uuid (primary key)
  hypothesis: text (generated hypothesis content)
  filesUsed: text[] (source scientific papers)
  status: enum (pending, evaluated, published)
  judgellmScore: numeric (AI evaluation score 0-100)
  humanScore: numeric (human validation score)
  research: text (supporting research context)
  evaluation: text (detailed assessment)
  citations: text[] (referenced paper DOIs)
  createdAt: timestamp
  updatedAt: timestamp
}
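
The shape above maps naturally onto a row type plus a small status machine (pending → evaluated → published). A pure-TypeScript sketch follows; the real table is defined with Drizzle's pgTable, omitted here to stay dependency-free:

```typescript
// Sketch: hypothesis row type and the status transitions implied by the
// pipeline. Pure TypeScript, no Drizzle import; shape mirrors the table above.

type HypothesisStatus = "pending" | "evaluated" | "published";

interface HypothesisRow {
  id: string;
  hypothesis: string;
  filesUsed: string[];
  status: HypothesisStatus;
  judgellmScore: number | null; // AI evaluation score, 0-100
  humanScore: number | null;
  citations: string[];
  createdAt: Date;
  updatedAt: Date;
}

const NEXT: Record<HypothesisStatus, HypothesisStatus | null> = {
  pending: "evaluated",
  evaluated: "published",
  published: null, // terminal state
};

export function advanceStatus(row: HypothesisRow): HypothesisRow {
  const next = NEXT[row.status];
  if (next === null) throw new Error(`Cannot advance from ${row.status}`);
  return { ...row, status: next, updatedAt: new Date() };
}
```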

File Metadata Table (fileMetadata.ts)

  • Tracks Google Drive file processing status

  • Manages PDF processing pipeline state

  • Links files to generated knowledge assets

Drive Sync Table (driveSync.ts)

  • Google Drive API synchronization state

  • File change detection and processing queues

  • Webhook notification management

5. Configuration & Deployment

Environment Variables (src/config.ts)

Database Configuration

POSTGRES_URL: Required PostgreSQL connection string
PROD_POSTGRES_URL: Production database endpoint

AI Services

OPENAI_API_KEY: Required for OpenAI integration
ANTHROPIC_API_KEY: Required for Claude integration

Knowledge Graph Integration

DKG_ENVIRONMENT: development|testnet|mainnet
DKG_HOSTNAME: OriginTrail node endpoint
DKG_PUBLIC_KEY: DKG wallet public key
DKG_PRIVATE_KEY: DKG wallet private key
DKG_BLOCKCHAIN_NAME: Target blockchain network

External Services

UNSTRUCTURED_API_KEY: Document processing service
BIONTOLOGY_KEY: Biological terminology API
GCP_JSON_CREDENTIALS: Google Cloud service account JSON
GROBID_URL: PDF parsing service (default: localhost:8070)
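
Since zod is already a dependency, src/config.ts most likely validates these variables at startup. A dependency-free sketch of the same fail-fast idea, using the variable names from the tables above:

```typescript
// Sketch: fail-fast environment validation, mirroring what src/config.ts
// does (the real file presumably uses zod; this is hand-rolled to stay
// self-contained).

const REQUIRED = [
  "POSTGRES_URL",
  "OPENAI_API_KEY",
  "ANTHROPIC_API_KEY",
  "DKG_HOSTNAME",
  "DKG_PUBLIC_KEY",
  "DKG_PRIVATE_KEY",
] as const;

export interface AppConfig {
  required: Record<(typeof REQUIRED)[number], string>;
  dkgEnvironment: "development" | "testnet" | "mainnet";
  grobidUrl: string;
}

export function loadConfig(env: Record<string, string | undefined>): AppConfig {
  const missing = REQUIRED.filter((k) => !env[k]);
  if (missing.length > 0) {
    throw new Error(`Missing required env vars: ${missing.join(", ")}`);
  }
  const dkgEnv = env.DKG_ENVIRONMENT ?? "development";
  if (!["development", "testnet", "mainnet"].includes(dkgEnv)) {
    throw new Error(`Invalid DKG_ENVIRONMENT: ${dkgEnv}`);
  }
  return {
    required: Object.fromEntries(
      REQUIRED.map((k) => [k, env[k]!]),
    ) as AppConfig["required"],
    dkgEnvironment: dkgEnv as AppConfig["dkgEnvironment"],
    grobidUrl: env.GROBID_URL ?? "http://localhost:8070", // documented default
  };
}
```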

Local Development Setup

  1. Start Supporting Services:

# Oxigraph SPARQL database
docker run --rm -v $PWD/oxigraph:/data -p 7878:7878 \
  ghcr.io/oxigraph/oxigraph serve --location /data --bind 0.0.0.0:7878

# Grobid PDF processing
docker run --rm --init --ulimit core=0 -p 8070:8070 \
  lfoppiano/grobid:0.8.2

# PostgreSQL with vector support
docker run --name BioAgents-db -e POSTGRES_PASSWORD=123 \
  -p 5432:5432 -d ankane/pgvector

  2. Database Setup:

pnpm db:migrate    # Apply database migrations
pnpm db:studio     # Open database management UI

  3. Data Loading:

# Load sample scientific papers into knowledge graph
pnpm run script scripts/jsonldToTriple.ts

  4. Development Server:

pnpm run dev       # Start with hot reload

Production Deployment

  • Container: Multi-stage Dockerfile with TypeScript compilation

  • Database: PostgreSQL with pgvector extension

  • Services: Grobid and Oxigraph as separate containers

  • Environment: All configuration via environment variables

6. Key Dependencies

Core Framework

  • @elizaos/core: Agent runtime and plugin system

  • @elizaos/cli: Command-line interface and development tools

  • @elizaos/plugin-anthropic: Claude AI integration

  • @elizaos/plugin-discord: Social platform integration

Database & Storage

  • drizzle-orm: Type-safe SQL query builder

  • pg: PostgreSQL client

  • dkg.js: OriginTrail DKG integration library

Document Processing

  • pdf2pic: PDF to image conversion

  • libxmljs: XML parsing for TEI documents

  • cheerio: HTML/XML manipulation

  • jsonld-streaming-parser: JSON-LD processing

AI & Language Processing

  • @anthropic-ai/sdk: Claude API client

  • openai: OpenAI API integration

  • @instructor-ai/instructor: Structured output from LLMs

External Integrations

  • googleapis: Google Drive API client

  • axios: HTTP client for external APIs

  • n3: RDF/SPARQL triple processing

  • zod: Runtime type validation

7. Data Processing Pipelines

Scientific Paper Processing Pipeline

PDF Upload → Google Drive Webhook → File Processing Task →
Grobid Parsing → TEI XML → Structured Extraction →
Knowledge Assembly → JSON-LD Generation → Triple Store →
DKG Publication
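
The pipeline reads naturally as a chain of async stages. A stubbed sketch follows; stage names are illustrative, not the actual service APIs, and each stub stands in for a real call to Grobid, kaService, Oxigraph, or dkg.js:

```typescript
// Sketch: the paper-processing pipeline as composed async stages.
// Every stage body is a stub standing in for the real service call.

type Stage<A, B> = (input: A) => Promise<B>;

const parseWithGrobid: Stage<Buffer, string> = async (pdf) =>
  `<TEI>${pdf.length} bytes parsed</TEI>`; // stub for POST to GROBID_URL

const extractStructure: Stage<string, { title: string }> = async (_tei) =>
  ({ title: "Example Paper" }); // stub for TEI XML -> structured fields

const toJsonLd: Stage<{ title: string }, object> = async (doc) => ({
  "@context": "https://schema.org",
  "@type": "ScholarlyArticle",
  name: doc.title,
});

export async function processPaper(pdf: Buffer): Promise<object> {
  const tei = await parseWithGrobid(pdf);
  const structured = await extractStructure(tei);
  const jsonld = await toJsonLd(structured);
  // Final steps (not stubbed here): load triples into Oxigraph,
  // then publish the Knowledge Asset to the OriginTrail DKG.
  return jsonld;
}
```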

Hypothesis Generation Pipeline

1. SPARQL Query → Extract Research Findings
2. LLM Selection → Choose 2 Relevant Findings  
3. Context Building → Gather Related Papers & Keywords
4. Claude 3.7 → Generate Structured Hypothesis
5. Evaluation → Score for Novelty & Plausibility
6. Storage → Save to PostgreSQL
7. Publication → Discord/Social Media Integration
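
Assuming each numbered step corresponds to one call, the orchestration can be sketched with the data source, LLM, and storage injected as stubs (function names and prompts are illustrative; in BioAgents the LLM calls go through the Anthropic service):

```typescript
// Sketch: hypothesis-generation orchestration mirroring steps 1-6 above.
// All collaborators are injected so the flow is testable without services.

interface Finding { id: string; text: string }
type Llm = (prompt: string) => Promise<string>;

export async function generateHypothesis(
  queryFindings: () => Promise<Finding[]>, // 1. SPARQL query against Oxigraph
  llm: Llm,                                // 4-5. Claude via Anthropic service
  save: (h: { hypothesis: string; score: number }) => Promise<void>, // 6. PostgreSQL
): Promise<string> {
  const findings = await queryFindings();
  // 2. Select two relevant findings (here: simply the first two)
  const [a, b] = findings.slice(0, 2);
  // 3-4. Build context and ask the model for a structured hypothesis
  const hypothesis = await llm(`Combine: ${a.text} AND ${b.text}`);
  // 5. Evaluate novelty and plausibility with a judge prompt (0-100)
  const score = Number(await llm(`Score 0-100: ${hypothesis}`));
  await save({ hypothesis, score });
  return hypothesis;
}
```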

Knowledge Graph Integration

Scientific Papers → JSON-LD → Local Oxigraph → SPARQL Queries →
Research Context → Hypothesis Generation → DKG Assets →
Persistent Knowledge Graph
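
Querying the local Oxigraph store follows the standard SPARQL protocol over HTTP; Oxigraph serves queries at the /query path. A small request-builder sketch (the example query is illustrative, not one of the project's actual queries):

```typescript
// Sketch: build a SPARQL protocol request for the local Oxigraph store.
// With the Accept header below, results come back as SPARQL JSON.

export function buildSparqlRequest(endpoint: string, query: string) {
  return {
    url: `${endpoint.replace(/\/$/, "")}/query`,
    init: {
      method: "POST" as const,
      headers: {
        "Content-Type": "application/sparql-query",
        Accept: "application/sparql-results+json",
      },
      body: query,
    },
  };
}

// Illustrative query over papers loaded by scripts/jsonldToTriple.ts:
export const FINDINGS_QUERY = `
  PREFIX schema: <https://schema.org/>
  SELECT ?paper ?title WHERE { ?paper schema:name ?title } LIMIT 10
`;

// Usage (not executed here):
// const { url, init } = buildSparqlRequest("http://localhost:7878", FINDINGS_QUERY);
// const results = await fetch(url, init).then((r) => r.json());
```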

This architecture enables BioAgents to efficiently process scientific literature, generate novel research hypotheses, and maintain a comprehensive, decentralized knowledge graph of biological research findings.
