# Technical Deep Dive

BioAgents is an agentic framework for biological research and analysis, built on top of the Eliza v2 platform. It automates scientific paper processing, hypothesis generation, and knowledge graph integration.

## 1. Core Architecture

### Framework Foundation

* **Base Platform**: Built on [Eliza v2](https://github.com/elizaOS/eliza) agentic framework
* **Language**: TypeScript with modern ES modules
* **Architecture**: Plugin-based system with modular services
* **Runtime**: Node.js with pnpm package management

### High-Level System Design

```
┌─────────────────┐    ┌─────────────────┐    ┌─────────────────┐
│   Data Sources  │    │  Core Services  │    │  Storage Layer  │
│                 │    │                 │    │                 │
│ • Google Drive  │───▶│ • Hypothesis    │───▶│ • PostgreSQL    │
│ • PDF Papers    │    │   Generation    │    │ • Oxigraph      │
│ • SPARQL KG     │    │ • Paper Process │    │ • OriginTrail   │
└─────────────────┘    │ • DKG Interface │    │   DKG           │
                       └─────────────────┘    └─────────────────┘
```

### Key Components

* **Agent Core**: Central orchestration through Eliza runtime (`src/index.ts`)
* **Plugin Manager**: DKG plugin registration and lifecycle management
* **Services**: Shared functionality for hypothesis generation, document processing, and knowledge graph integration
* **Data Flow**: Scientific papers → Processing → Knowledge extraction → Hypothesis generation → Knowledge graph storage

## 2. Project Structure

```
BioAgents/
├── src/
│   ├── bioagentPlugin/          # Core plugin containing main logic
│   │   ├── actions/             # Available actions (DKG insert)
│   │   ├── routes/              # HTTP endpoints and webhooks
│   │   ├── services/            # Core business logic services
│   │   │   ├── anthropic/       # AI-powered hypothesis generation
│   │   │   ├── gdrive/          # Google Drive integration
│   │   │   └── kaService/       # Knowledge Assembly processing
│   │   ├── constants.ts         # DKG explorer links and config
│   │   ├── templates.ts         # Memory templates for DKG
│   │   └── types.ts             # TypeScript definitions
│   ├── db/
│   │   └── schemas/             # Database schemas with Drizzle ORM
│   ├── config.ts               # Environment configuration management
│   ├── index.ts                # Main entry point
│   └── scholar.ts              # Agent definition
├── drizzle/                    # Database migration files
├── scripts/                    # Utility scripts (JSON-LD processing)
├── sampleJsonLds/              # Sample scientific paper data
├── Dockerfile                  # Production container config
└── docker-compose.yml          # Local development services
```

## 3. The Plugin System

### Plugin Architecture

BioAgents implements a single comprehensive plugin (`dkgPlugin`) that provides:

**Plugin Lifecycle** (`src/bioagentPlugin/index.ts`):

1. **Initialization**: Environment validation and service setup
2. **Service Registration**: Hypothesis generation, Google Drive sync
3. **Route Mounting**: API endpoints for webhooks and manual operations
4. **Action Registration**: DKG insert functionality
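
Assuming the plugin follows the standard Eliza v2 plugin shape, the lifecycle above can be sketched as follows. The stand-in types and the `dkgPlugin` fields here are illustrative only, not the actual definitions from `@elizaos/core` or `src/bioagentPlugin/index.ts`:

```typescript
// Local stand-in types approximating the Eliza v2 plugin shape (assumption).
interface Action { name: string; handler: (msg: string) => Promise<void>; }
interface Service { start: () => Promise<void>; }
interface Route { path: string; type: "GET" | "POST"; handler: () => Promise<unknown>; }

interface Plugin {
  name: string;
  description: string;
  init?: (config: Record<string, string>) => Promise<void>;
  actions?: Action[];
  services?: Service[];
  routes?: Route[];
}

const dkgPlugin: Plugin = {
  name: "dkg",
  description: "Scientific paper processing and DKG publication",
  // 1. Initialization: validate the environment before anything starts
  init: async (config) => {
    if (!config.DKG_HOSTNAME) throw new Error("DKG_HOSTNAME is required");
  },
  // 2. Service registration: hypothesis generation, Google Drive sync
  services: [],
  // 3. Route mounting: webhooks and manual operations
  routes: [],
  // 4. Action registration: DKG insert
  actions: [],
};
```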

### Plugin Components

#### Actions (`src/bioagentPlugin/actions/`)

* **DKG Insert** (`dkgInsert.ts`): Processes scientific papers into Knowledge Assets and publishes to OriginTrail DKG

#### Services (`src/bioagentPlugin/services/`)

* **HypothesisService**: Orchestrates paper processing and hypothesis generation
* **Anthropic Service**: AI-powered research operations using Claude 3.7 Sonnet
* **Google Drive Service**: Document synchronization and monitoring
* **KA Service**: Knowledge Assembly for structured paper processing

#### Routes (`src/bioagentPlugin/routes/`)

* **Health Monitoring**: System status endpoints
* **Google Drive Webhooks**: Real-time file change notifications
* **Manual Sync**: Trigger synchronization operations
* **Agent Metrics**: Performance and status reporting
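
A hypothetical route table mirroring these endpoints; the paths and handler signatures are assumptions for illustration, not the project's actual API:

```typescript
// Illustrative route declarations; real ones live in src/bioagentPlugin/routes/.
type RouteHandler = () => Promise<{ status: number; body: unknown }>;

interface PluginRoute {
  path: string;
  type: "GET" | "POST";
  handler: RouteHandler;
}

const routes: PluginRoute[] = [
  // Health monitoring: liveness check for the agent process
  { path: "/health", type: "GET", handler: async () => ({ status: 200, body: { ok: true } }) },
  // Google Drive webhook: receives real-time file-change notifications
  { path: "/gdrive/webhook", type: "POST", handler: async () => ({ status: 200, body: "ack" }) },
  // Manual sync: trigger a synchronization run on demand
  { path: "/gdrive/sync", type: "POST", handler: async () => ({ status: 202, body: "queued" }) },
];
```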

## 4. Database & Schemas

### Database Setup

* **Database**: PostgreSQL with the pgvector extension for embeddings
* **ORM**: Drizzle ORM for type-safe database operations
* **Schema**: `biograph` namespace for all biological research data

### Core Schemas (`src/db/schemas/`)

#### Hypotheses Table (`hypotheses.ts`)

```typescript
{
  id: uuid (primary key)
  hypothesis: text (generated hypothesis content)
  filesUsed: text[] (source scientific papers)
  status: enum (pending, evaluated, published)
  judgellmScore: numeric (AI evaluation score 0-100)
  humanScore: numeric (human validation score)
  research: text (supporting research context)
  evaluation: text (detailed assessment)
  citations: text[] (referenced paper DOIs)
  createdAt: timestamp
  updatedAt: timestamp
}
```
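
The same row can be mirrored as a plain TypeScript type (the real schema uses Drizzle ORM); field names follow the sketch above, and the `canPublish` guard is purely illustrative:

```typescript
// Plain-TypeScript mirror of the biograph.hypotheses row (illustrative).
type HypothesisStatus = "pending" | "evaluated" | "published";

interface HypothesisRow {
  id: string;
  hypothesis: string;
  filesUsed: string[];
  status: HypothesisStatus;
  judgellmScore: number | null; // AI evaluation score, 0-100
  humanScore: number | null;
  research: string | null;
  evaluation: string | null;
  citations: string[];
  createdAt: Date;
  updatedAt: Date;
}

// Hypothetical guard: only an evaluated hypothesis with a valid score
// may move to "published".
function canPublish(row: HypothesisRow): boolean {
  return (
    row.status === "evaluated" &&
    row.judgellmScore !== null &&
    row.judgellmScore >= 0 &&
    row.judgellmScore <= 100
  );
}
```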

#### File Metadata Table (`fileMetadata.ts`)

* Tracks Google Drive file processing status
* Manages PDF processing pipeline state
* Links files to generated knowledge assets

#### Drive Sync Table (`driveSync.ts`)

* Google Drive API synchronization state
* File change detection and processing queues
* Webhook notification management

## 5. Configuration & Deployment

### Environment Variables (`src/config.ts`)

#### Database Configuration

```
POSTGRES_URL: Required PostgreSQL connection string
PROD_POSTGRES_URL: Production database endpoint
```

#### AI Services

```
OPENAI_API_KEY: Required for OpenAI integration
ANTHROPIC_API_KEY: Required for Claude integration
```

#### Knowledge Graph Integration

```
DKG_ENVIRONMENT: development|testnet|mainnet
DKG_HOSTNAME: OriginTrail node endpoint
DKG_PUBLIC_KEY: DKG wallet public key
DKG_PRIVATE_KEY: DKG wallet private key
DKG_BLOCKCHAIN_NAME: Target blockchain network
```

#### External Services

```
UNSTRUCTURED_API_KEY: Document processing service
BIONTOLOGY_KEY: Biological terminology API
GCP_JSON_CREDENTIALS: Google Cloud service account JSON
GROBID_URL: PDF parsing service (default: localhost:8070)
```
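
A minimal sketch of how such variables might be validated at startup, assuming the `DKG_ENVIRONMENT` enum check and the GROBID default noted above; the real logic lives in `src/config.ts` and may differ:

```typescript
// Illustrative config loader; variable names match the lists above.
const DKG_ENVIRONMENTS = ["development", "testnet", "mainnet"] as const;
type DkgEnvironment = (typeof DKG_ENVIRONMENTS)[number];

interface AppConfig {
  dkgEnvironment: DkgEnvironment;
  grobidUrl: string;
}

function loadConfig(env: Record<string, string | undefined>): AppConfig {
  const dkgEnvironment = env.DKG_ENVIRONMENT;
  if (!DKG_ENVIRONMENTS.includes(dkgEnvironment as DkgEnvironment)) {
    throw new Error(`DKG_ENVIRONMENT must be one of ${DKG_ENVIRONMENTS.join("|")}`);
  }
  return {
    dkgEnvironment: dkgEnvironment as DkgEnvironment,
    // GROBID_URL falls back to the local default mentioned above
    grobidUrl: env.GROBID_URL ?? "http://localhost:8070",
  };
}
```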

### Local Development Setup

1. **Start Supporting Services**:

```bash
# Oxigraph SPARQL database
docker run --rm -v $PWD/oxigraph:/data -p 7878:7878 \
  ghcr.io/oxigraph/oxigraph serve --location /data --bind 0.0.0.0:7878

# Grobid PDF processing
docker run --rm --init --ulimit core=0 -p 8070:8070 \
  lfoppiano/grobid:0.8.2

# PostgreSQL with vector support
docker run --name BioAgents-db -e POSTGRES_PASSWORD=123 \
  -p 5432:5432 -d ankane/pgvector
```

2. **Database Setup**:

```bash
pnpm db:migrate    # Apply database migrations
pnpm db:studio     # Open database management UI
```

3. **Data Loading**:

```bash
# Load sample scientific papers into knowledge graph
pnpm run script scripts/jsonldToTriple.ts
```

4. **Development Server**:

```bash
pnpm run dev       # Start with hot reload
```
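
The data-loading step (3) can be illustrated with a toy converter. The real script uses `jsonld-streaming-parser` and loads the resulting triples into Oxigraph; this simplified sketch handles only flat objects whose keys are already absolute IRIs:

```typescript
// Toy JSON-LD → N-Triples conversion for flat documents (illustration only).
type FlatJsonLd = { "@id": string } & Record<string, string>;

function toNTriples(doc: FlatJsonLd): string[] {
  const subject = `<${doc["@id"]}>`;
  return Object.entries(doc)
    .filter(([key]) => key !== "@id")
    .map(([predicate, value]) =>
      // Treat http(s) values as IRIs, everything else as string literals
      value.startsWith("http")
        ? `${subject} <${predicate}> <${value}> .`
        : `${subject} <${predicate}> "${value}" .`
    );
}

const paper: FlatJsonLd = {
  "@id": "https://doi.org/10.1234/example",
  "http://purl.org/dc/terms/title": "Sample paper",
};
```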

### Production Deployment

* **Container**: Multi-stage Dockerfile with TypeScript compilation
* **Database**: PostgreSQL with pgvector extension
* **Services**: Grobid and Oxigraph as separate containers
* **Environment**: All configuration via environment variables

## 6. Key Dependencies

### Core Framework

* **@elizaos/core**: Agent runtime and plugin system
* **@elizaos/cli**: Command-line interface and development tools
* **@elizaos/plugin-anthropic**: Claude AI integration
* **@elizaos/plugin-discord**: Social platform integration

### Database & Storage

* **drizzle-orm**: Type-safe SQL query builder
* **pg**: PostgreSQL client
* **dkg.js**: OriginTrail DKG integration library

### Document Processing

* **pdf2pic**: PDF to image conversion
* **libxmljs**: XML parsing for TEI documents
* **cheerio**: HTML/XML manipulation
* **jsonld-streaming-parser**: JSON-LD processing

### AI & Language Processing

* **@anthropic-ai/sdk**: Claude API client
* **openai**: OpenAI API integration
* **@instructor-ai/instructor**: Structured output from LLMs

### External Integrations

* **googleapis**: Google Drive API client
* **axios**: HTTP client for external APIs
* **n3**: RDF/SPARQL triple processing
* **zod**: Runtime type validation

## 7. Data Processing Pipelines

### Scientific Paper Processing Pipeline

```
PDF Upload → Google Drive Webhook → File Processing Task →
Grobid Parsing → TEI XML → Structured Extraction →
Knowledge Assembly → JSON-LD Generation → Triple Store →
DKG Publication
```
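
The stages above can be sketched as sequential async transforms; the stage bodies here are stubs standing in for the real Grobid, extraction, and DKG calls:

```typescript
// Pipeline-as-composition sketch; each stage is a stub for a real service call.
type Stage = (input: string) => Promise<string>;

const stages: Stage[] = [
  async (pdf) => `tei:${pdf}`,        // Grobid parsing → TEI XML
  async (tei) => `extracted:${tei}`,  // structured extraction
  async (doc) => `jsonld:${doc}`,     // knowledge assembly → JSON-LD
  async (jsonld) => `dkg:${jsonld}`,  // triple store + DKG publication
];

async function runPipeline(input: string): Promise<string> {
  let current = input;
  for (const stage of stages) current = await stage(current);
  return current;
}
```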

### Hypothesis Generation Pipeline

```
1. SPARQL Query → Extract Research Findings
2. LLM Selection → Choose 2 Relevant Findings  
3. Context Building → Gather Related Papers & Keywords
4. Claude 3.7 → Generate Structured Hypothesis
5. Evaluation → Score for Novelty & Plausibility
6. Storage → Save to PostgreSQL
7. Publication → Discord/Social Media Integration
```
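
Step 1 might look like the following sketch; the SPARQL predicates and the Oxigraph endpoint path are assumptions for illustration, not the project's actual query:

```typescript
// Hypothetical SPARQL query for pulling research findings from local Oxigraph.
const FINDINGS_QUERY = `
  PREFIX dcterms: <http://purl.org/dc/terms/>
  SELECT ?paper ?title ?abstract
  WHERE {
    ?paper dcterms:title ?title ;
           dcterms:abstract ?abstract .
  }
  LIMIT 50
`;

// Oxigraph serves SPARQL over HTTP; POST the query to its query endpoint.
async function queryFindings(endpoint = "http://localhost:7878/query") {
  const res = await fetch(endpoint, {
    method: "POST",
    headers: {
      "Content-Type": "application/sparql-query",
      Accept: "application/sparql-results+json",
    },
    body: FINDINGS_QUERY,
  });
  return res.json();
}
```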

### Knowledge Graph Integration

```
Scientific Papers → JSON-LD → Local Oxigraph → SPARQL Queries →
Research Context → Hypothesis Generation → DKG Assets →
Persistent Knowledge Graph
```

This architecture enables BioAgents to efficiently process scientific literature, generate novel research hypotheses, and maintain a comprehensive, decentralized knowledge graph of biological research findings.
