High-Level System Components
📊
Data Ingestion Layer
Bybit Exchange Integration (live trading)
Deribit DVOL (volatility data)
3+ years historical OHLCV (2022-2025)
4-hour bar resolution for institutional-grade analysis
🤖
ML/RL Intelligence
Transfer Learning Models (BTC, ETH, SOL)
4-Tier RL Position Sizing
Regime Detection (4-state HMM)
Weekly automated updates with validation gates
🛡️
Risk Management
HRAA v2 Algorithm (hierarchical allocation)
Circuit Breaker (3-state FSM)
Position Limits (per-instrument)
Kelly Criterion Baseline (regime-adaptive)
⚡
Event-Driven Core
NautilusTrader Framework
MessageBus Architecture (sub-ms routing)
Real-time Order Management
Portfolio tracking with tick-level precision
💾
Data Storage
PostgreSQL 15 + TimescaleDB 2.14
Redis 7.2 (feature cache, pub/sub)
MinIO (ML artifacts, models)
MLflow (model registry, experiments)
📈
Monitoring Stack
Prometheus 2.48 (413+ metrics)
Grafana 10.2 (real-time dashboards)
Loki 2.9 (log aggregation)
30-day retention for forensic analysis
```mermaid
graph TB
    subgraph "Data Sources"
        BYBIT[Bybit Exchange<br/>Live Trading]
        DERIBIT[Deribit<br/>DVOL Volatility]
        HISTORICAL[Historical Data<br/>2022-2025 OHLCV]
    end
    subgraph "Trade-Matrix Core Platform"
        subgraph "Intelligence Layer"
            ML[Transfer Learning Models<br/>BTC/ETH/SOL]
            RL[RL Position Sizing<br/>4-Tier Fallback]
            REGIME[Regime Detection<br/>4-State HMM]
        end
        subgraph "Trading Engine"
            MSGBUS[MessageBus<br/>Event Router]
            RISK[Risk Engine<br/>HRAA v2 + Circuit Breaker]
            EXEC[Execution Engine<br/>Order Management]
            PORTFOLIO[Portfolio Engine<br/>Position Tracking]
        end
        subgraph "Data Layer"
            REDIS[(Redis 7.2<br/>Cache & Pub/Sub)]
            POSTGRES[(PostgreSQL + TimescaleDB<br/>Time Series Data)]
            MINIO[(MinIO<br/>ML Artifacts)]
            MLFLOW[(MLflow<br/>Model Registry)]
        end
    end
    subgraph "Infrastructure"
        K3S[K3S Cluster<br/>Production Orchestration]
        GITHUB[GitHub Actions<br/>CI/CD Pipeline]
        PROMETHEUS[Prometheus + Grafana<br/>Monitoring Stack]
    end
    BYBIT -->|WebSocket| MSGBUS
    DERIBIT -->|API| ML
    HISTORICAL -->|Batch| ML
    MSGBUS --> ML
    ML --> RL
    RL --> RISK
    RISK --> EXEC
    EXEC --> PORTFOLIO
    MSGBUS <--> REDIS
    ML <--> MLFLOW
    PORTFOLIO --> POSTGRES
    MLFLOW <--> MINIO
    EXEC -->|Orders| BYBIT
    GITHUB -->|Deploy| K3S
    K3S -->|Runs| MSGBUS
    PROMETHEUS -->|Monitor| K3S
    style ML fill:#00d4ff,stroke:#000,stroke-width:2px,color:#000
    style RL fill:#00ff88,stroke:#000,stroke-width:2px,color:#000
    style RISK fill:#ffd93d,stroke:#000,stroke-width:2px,color:#000
    style MSGBUS fill:#ff6b6b,stroke:#000,stroke-width:3px,color:#fff
```
Complete System Architecture
```mermaid
graph TB
    subgraph "External Data Sources"
        BYBIT_EX[Bybit Exchange<br/>WebSocket + REST]
        DERIBIT_EX[Deribit Exchange<br/>DVOL API]
        HISTORICAL_S3[Historical Data Store<br/>MinIO S3-Compatible]
    end
    subgraph "NautilusTrader Core :8080"
        MSGBUS[MessageBus<br/>Central Event Hub]
        subgraph "Core Engines"
            DATAENGINE[DataEngine<br/>Market Data Router]
            RISKENGINE[RiskEngine<br/>HRAA v2 + Circuit Breaker]
            EXECENGINE[ExecutionEngine<br/>Smart Order Router]
            PORTFOLIO[PortfolioEngine<br/>Position & PnL Tracker]
        end
        subgraph "Data Layer"
            CACHE[Cache<br/>In-Memory State]
            CATALOG[DataCatalog<br/>Historical Access]
        end
        subgraph "Trading Logic"
            STRATEGIES[ML-Driven Strategies<br/>Signal + Position Sizing]
        end
    end
    subgraph "ML/RL Services"
        subgraph "Real-time Inference"
            ML_INFERENCE[Unified Signal Generator<br/>Sub-5ms Latency]
            RL_AGENT[RL Position Sizer<br/>4-Tier Fallback]
            REGIME_DETECT[Regime Detector<br/>4-State HMM]
        end
        subgraph "Training Pipeline"
            TL_TRAINER[TL Model Trainer<br/>Weekly Automation]
            RL_TRAINER[RL Agent Trainer<br/>Curriculum Learning]
            FEATURE_ENG[Feature Engineer<br/>Boruta Selection]
        end
    end
    subgraph "Storage Layer"
        REDIS[(Redis 7.2<br/>Cache + Streams)]
        POSTGRES[(PostgreSQL 15<br/>+ TimescaleDB 2.14)]
        MINIO[(MinIO<br/>ML Models + Artifacts)]
        MLFLOW_DB[(MLflow<br/>Experiment Tracking)]
    end
    subgraph "Monitoring & Observability"
        PROMETHEUS[Prometheus 2.48<br/>413+ Metrics]
        GRAFANA[Grafana 10.2<br/>Dashboards]
        LOKI[Loki 2.9<br/>Log Aggregation]
    end
    subgraph "Deployment Infrastructure"
        K3S[K3S Cluster<br/>Production Orchestration]
        GHCR[GitHub Container Registry<br/>Model Artifacts 319MB]
        DROPLET[Droplet Private Registry<br/>Base Image 6.24GB]
        GITHUB_ACTIONS[GitHub Actions<br/>CI/CD $0/month]
    end
    %% Real-time Data Flow (Green)
    BYBIT_EX ==>|WebSocket| DATAENGINE
    DATAENGINE ==>|Events| MSGBUS
    MSGBUS ==>|Route| CACHE
    CACHE ==>|Features| ML_INFERENCE
    ML_INFERENCE ==>|Signals| RL_AGENT
    RL_AGENT ==>|Position Size| STRATEGIES
    STRATEGIES ==>|Orders| MSGBUS
    MSGBUS ==>|Validate| RISKENGINE
    RISKENGINE ==>|Approved| EXECENGINE
    EXECENGINE ==>|Execute| BYBIT_EX
    BYBIT_EX ==>|Fills| PORTFOLIO
    %% Batch Training Flow (Red)
    HISTORICAL_S3 -.->|OHLCV| FEATURE_ENG
    DERIBIT_EX -.->|DVOL| FEATURE_ENG
    FEATURE_ENG -.->|Dataset| TL_TRAINER
    TL_TRAINER -.->|Models| MLFLOW_DB
    MLFLOW_DB -.->|Deploy| ML_INFERENCE
    FEATURE_ENG -.->|Env State| RL_TRAINER
    RL_TRAINER -.->|Policy| MLFLOW_DB
    MLFLOW_DB -.->|Load| RL_AGENT
    %% Storage Connections
    CACHE <-.->|Snapshot| REDIS
    PORTFOLIO -.->|Persist| POSTGRES
    CATALOG <-.->|Historical| MINIO
    TL_TRAINER -.->|Artifacts| MINIO
    %% Monitoring Connections (Purple)
    MSGBUS -.->|Metrics| PROMETHEUS
    RISKENGINE -.->|Alerts| PROMETHEUS
    ML_INFERENCE -.->|Latency| PROMETHEUS
    PROMETHEUS -.->|Query| GRAFANA
    K3S -.->|Logs| LOKI
    %% Deployment Flow (Yellow)
    GITHUB_ACTIONS -->|Build Models| GHCR
    GITHUB_ACTIONS -->|Build Base| DROPLET
    GHCR -->|Pull 319MB| K3S
    DROPLET -->|Pull 6.24GB| K3S
    K3S -->|Orchestrate| MSGBUS
    style MSGBUS fill:#ff6b6b,stroke:#000,stroke-width:3px,color:#fff
    style ML_INFERENCE fill:#00d4ff,stroke:#000,stroke-width:2px,color:#000
    style RL_AGENT fill:#00ff88,stroke:#000,stroke-width:2px,color:#000
    style RISKENGINE fill:#ffd93d,stroke:#000,stroke-width:2px,color:#000
```
NautilusTrader Core Components
- MessageBus: Event-driven architecture enabling sub-millisecond routing between components. Handles 10,000+ events/second with zero message loss.
- DataEngine: Normalizes market data from multiple sources into unified format. Supports tick-by-tick precision for high-frequency strategies.
- RiskEngine: Implements HRAA v2 with per-instrument position limits, portfolio-level VaR constraints, and circuit breaker integration. Rejects orders in <100μs.
- ExecutionEngine: Smart order router with TWAP/VWAP algorithms, iceberg orders, and post-only execution. Tracks order lifecycle from submission to fill.
- PortfolioEngine: Real-time position tracking with mark-to-market PnL updates. Calculates Sharpe ratio, maximum drawdown, and other performance metrics on-the-fly.
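The circuit breaker referenced by the RiskEngine is a 3-state FSM. A minimal sketch of how such a breaker could gate order flow follows; the state names, drawdown threshold, and `on_*` method interface are illustrative assumptions, not the actual Trade-Matrix API.

```python
from enum import Enum

class BreakerState(Enum):
    CLOSED = "closed"        # normal trading
    OPEN = "open"            # trading halted (EMERGENCY_FLAT)
    HALF_OPEN = "half_open"  # probing with reduced size after cooldown

class CircuitBreaker:
    """Illustrative 3-state circuit breaker; thresholds are assumptions."""

    def __init__(self, max_drawdown_pct: float = 5.0, probe_trades: int = 3):
        self.state = BreakerState.CLOSED
        self.max_drawdown_pct = max_drawdown_pct
        self.probe_trades = probe_trades
        self._probes_ok = 0

    def on_drawdown(self, drawdown_pct: float) -> None:
        # Trip to OPEN on excessive drawdown, from any state.
        if drawdown_pct >= self.max_drawdown_pct:
            self.state = BreakerState.OPEN
            self._probes_ok = 0

    def on_cooldown_elapsed(self) -> None:
        # After a cooldown, allow small probe trades.
        if self.state is BreakerState.OPEN:
            self.state = BreakerState.HALF_OPEN

    def on_trade_result(self, profitable: bool) -> None:
        # In HALF_OPEN, N consecutive good probes re-close the breaker;
        # any losing probe re-opens it.
        if self.state is not BreakerState.HALF_OPEN:
            return
        if profitable:
            self._probes_ok += 1
            if self._probes_ok >= self.probe_trades:
                self.state = BreakerState.CLOSED
        else:
            self.state = BreakerState.OPEN
            self._probes_ok = 0

    def allows_orders(self) -> bool:
        return self.state is not BreakerState.OPEN
```

The half-open probe phase is what distinguishes a 3-state FSM from a simple kill switch: it lets the system re-enter the market gradually instead of flipping straight back to full size.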
ML/RL Services
- Unified Signal Generator: Ensemble of 3 TL models (BTC, ETH, SOL) with 4-tier resilient loading. Sub-5ms inference via feature caching and optimized sklearn pipelines.
- RL Position Sizer: Reinforcement learning agent trained via curriculum learning. 4-tier fallback: FULL_RL → BLENDED (50/50 with Kelly) → PURE_KELLY → EMERGENCY_FLAT (0% on circuit breaker OPEN).
- Regime Detector: 4-state Hidden Markov Model with Markov-Switching GARCH. Classifies market as Bear/Neutral/Bull/Crisis. Kelly fractions: 25%/50%/67%/17% respectively.
- TL Model Trainer: Automated weekly training pipeline with Walk-Forward Validation (40 windows, 200-bar purge gap). Boruta feature selection locks 9-11 features per instrument to prevent overfitting.
- RL Agent Trainer: Proximal Policy Optimization (PPO) with curriculum learning. Trains in 45 minutes (vs 120 minutes without curriculum). Environment: Bybit 4H bars, transaction cost model, slippage simulation.
Storage Systems
- Redis 7.2: Feature cache (TTL 1 hour), pub/sub for ML signals, session persistence. Supports 100K+ ops/sec with <1ms latency.
- PostgreSQL 15 + TimescaleDB 2.14: Time-series storage for OHLCV bars, ML predictions, portfolio snapshots. Hypertable compression achieves 10:1 ratio after 7 days.
- MinIO: S3-compatible object store for ML models (200-500MB per model), training datasets (2-5GB), and backtest results. Organized by instrument and version.
- MLflow: Model registry with lifecycle management (Staging → Production), experiment tracking (1,000+ runs), and artifact versioning. Tag-based promotion workflow.
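The Redis feature-cache pattern (keys with a 1-hour TTL) can be sketched with redis-py. The key layout below is an illustrative assumption, and the client is passed in as a parameter so the sketch works against any object with `set`/`get`, not only a live server.

```python
import json

FEATURE_TTL_S = 3600  # 1-hour feature TTL, per the description above

def cache_features(r, instrument: str, bar_ts: int, features: dict) -> None:
    """Store engineered features under a TTL; `r` is a redis-py-style client.

    The key layout "features:{instrument}:{bar_ts}" is an illustrative choice.
    """
    r.set(f"features:{instrument}:{bar_ts}", json.dumps(features), ex=FEATURE_TTL_S)

def get_features(r, instrument: str, bar_ts: int):
    """Return cached features, or None on a cache miss / expired key."""
    raw = r.get(f"features:{instrument}:{bar_ts}")
    return json.loads(raw) if raw is not None else None

def publish_signal(r, instrument: str, signal: dict) -> None:
    """Fan a fresh ML signal out to subscribers on a per-instrument channel."""
    r.publish(f"signals:{instrument}", json.dumps(signal))
```

With redis-py this would be `r = redis.Redis(decode_responses=True)`; the `ex=` argument is what gives each feature snapshot its 1-hour lifetime so inference never reads stale features.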
Monitoring Stack
- Prometheus 2.48: Collects 413+ time series metrics (71 base families × instrument/strategy/status labels). Retention: 30 days. Scrape interval: 15 seconds.
- Grafana 10.2: 8 dashboards (system health, trading performance, ML metrics, RL diagnostics, risk overview, deployment status, cost tracking, error analysis). Auto-refresh: 5 seconds.
- Loki 2.9: Log aggregation with 30-day retention. Indexes: service, level, instrument, strategy. Query performance: <1s for 10M log lines via LogQL.
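The "413+ series from 71 families" figure follows from label fan-out: each metric family contributes one time series per combination of its label values. A small sketch of that arithmetic, with a deliberately tiny, made-up metric inventory (the real family and label lists are not reproduced here):

```python
def series_count(families: dict[str, dict[str, list[str]]]) -> int:
    """Total time series = sum over families of the product of label cardinalities."""
    total = 0
    for labels in families.values():
        n = 1
        for values in labels.values():
            n *= len(values)
        total += n
    return total

# Illustrative inventory: two families with instrument/status labels.
families = {
    "orders_submitted_total": {
        "instrument": ["BTCUSDT", "ETHUSDT", "SOLUSDT"],
        "status": ["accepted", "rejected"],
    },
    "inference_latency_seconds": {
        "instrument": ["BTCUSDT", "ETHUSDT", "SOLUSDT"],
    },
}
```

Here `series_count(families)` is 3×2 + 3 = 9; scaling the same multiplication across 71 families with a handful of label values each is how the deployment reaches 413+ series.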
Hybrid Deployment Architecture - $0/Month Cost Optimization
Cost Innovation: Trade-Matrix achieves $0/month infrastructure cost by leveraging GitHub PRO's generous free tier limits combined with intelligent container splitting. Comparable institutional trading platforms spend $500-5,000/month on AWS/GCP.
📦
Droplet Private Registry
Base Image: 6.24GB (one-time build)
Contents: Python 3.12, dependencies, vendored NautilusTrader
Update Frequency: Only on dependency changes (~monthly)
Bandwidth: Minimal (cached on K3S nodes)
🎯
GitHub Container Registry
Model Artifacts: 319MB (weekly updates)
Contents: TL models, RL policies, feature configs
Update Frequency: Every Sunday (automated)
Bandwidth: 1.3GB/month (within PRO 50GB limit)
⚙️
GitHub Actions CI/CD
Weekly Pipeline: 73 minutes (training + deployment)
Compute Minutes: ~300/month (within PRO 3,000 limit)
Automation: 15-step validation pipeline
Zero Human Intervention
☸️
K3S Production Cluster
Orchestration: Lightweight Kubernetes (K3S 1.28)
Auto-scaling: Horizontal pod autoscaling
Health Checks: Liveness + readiness probes
Zero-Downtime: Rolling updates (max surge 1)
```mermaid
sequenceDiagram
    participant DEV as Developer/PM
    participant GITHUB as GitHub Actions
    participant GHCR as GitHub Container Registry
    participant DROPLET as Droplet Private Registry
    participant K3S as K3S Production Cluster
    participant TRADE as Trading System
    Note over DEV,TRADE: Weekly Model Update Workflow (Every Sunday)
    DEV->>GITHUB: git push (trigger weekly pipeline)
    rect rgb(0, 50, 100)
        Note over GITHUB: Phase 1: Training (65 min)
        GITHUB->>GITHUB: Fetch data from Bybit
        GITHUB->>GITHUB: Feature engineering (Boruta)
        GITHUB->>GITHUB: Train TL models (3 instruments)
        GITHUB->>GITHUB: Train RL agents (curriculum)
        GITHUB->>GITHUB: Validate (IC ≥ 0.05, Sharpe > 0.5)
    end
    rect rgb(0, 100, 50)
        Note over GITHUB: Phase 2: Package Models (3 min)
        GITHUB->>GITHUB: Export MLflow artifacts
        GITHUB->>GITHUB: Build model container (319MB)
        GITHUB->>GHCR: Push to GHCR (within free tier)
    end
    rect rgb(100, 50, 0)
        Note over K3S: Phase 3: Deployment (5 min)
        K3S->>GHCR: Pull new model image (319MB)
        K3S->>DROPLET: Reuse cached base (6.24GB, no pull)
        K3S->>K3S: Rolling update (zero downtime)
        K3S->>TRADE: Deploy new trading pods
        TRADE->>TRADE: Health checks pass
        K3S->>TRADE: Route traffic to new pods
        K3S->>K3S: Terminate old pods
    end
    TRADE-->>DEV: Deployment complete notification
    DEV->>K3S: Verify metrics (Grafana)
    Note over DEV,TRADE: Total Time: 73 minutes | Cost: $0
```
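The Phase 1 validation gate (IC ≥ 0.05, Sharpe > 0.5) can be sketched as a plain promote/reject check. The metric formulas below are the standard ones (IC as Pearson correlation of predicted vs. realized returns, annualized Sharpe); the function names and the 4-hour annualization factor are illustrative assumptions.

```python
import math

def information_coefficient(preds: list[float], returns: list[float]) -> float:
    """Pearson correlation between predicted and realized returns."""
    n = len(preds)
    mp, mr = sum(preds) / n, sum(returns) / n
    cov = sum((p - mp) * (r - mr) for p, r in zip(preds, returns))
    sp = math.sqrt(sum((p - mp) ** 2 for p in preds))
    sr = math.sqrt(sum((r - mr) ** 2 for r in returns))
    return cov / (sp * sr)

def sharpe(returns: list[float], periods_per_year: int = 2190) -> float:
    """Annualized Sharpe ratio; 2190 = 4-hour bars per year (365 * 6)."""
    n = len(returns)
    mu = sum(returns) / n
    sd = math.sqrt(sum((r - mu) ** 2 for r in returns) / (n - 1))
    return (mu / sd) * math.sqrt(periods_per_year)

def passes_gate(preds, returns, ic_min=0.05, sharpe_min=0.5) -> bool:
    # Validation gate from the pipeline: both thresholds must clear to deploy.
    return (information_coefficient(preds, returns) >= ic_min
            and sharpe(returns) > sharpe_min)
```

Because the gate sits between training and packaging, a model that fails either threshold never reaches GHCR, and the previously deployed image keeps serving.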
Trade-Matrix (GitHub PRO Optimization)
- Compute: $0/month (300 mins/month ÷ 3,000 free mins = 10% utilization)
- Container Storage: $0/month (1.5GB ÷ 50GB free = 3% utilization)
- Bandwidth: $0/month (1.3GB ÷ 50GB free = 2.6% utilization)
- Base Registry: Self-hosted Droplet (one-time setup)
- Total: $0/month
Traditional AWS Deployment (Comparable Setup)
- EC2 Compute: t3.large (2 vCPU, 8GB RAM) × 2 = $120/month
- EKS Cluster: Control plane = $73/month
- ECR Storage: 10GB containers = $1/month
- S3 + RDS: Storage + backups = $80/month
- Data Transfer: 100GB/month = $9/month
- CloudWatch: Monitoring + logs = $30/month
- Total: $313/month ($3,756/year)
Traditional GCP Deployment (Comparable Setup)
- GCE Compute: n1-standard-2 × 2 = $100/month
- GKE Cluster: Control plane = $73/month
- Container Registry: 10GB = $2/month
- Cloud Storage + SQL: = $90/month
- Network Egress: 100GB/month = $12/month
- Stackdriver: Monitoring + logs = $40/month
- Total: $317/month ($3,804/year)
Annual Savings
$3,500-4,000/year saved
Cost savings are equivalent to 1-2 months of a junior developer's salary, reinvested into strategy research.
Scalability Note: While current deployment achieves $0/month cost, the architecture is designed to seamlessly scale to paid cloud infrastructure (AWS/GCP/Azure) if trading volume requires additional compute. The hybrid container strategy (large base + small models) remains optimal for bandwidth efficiency at any scale.