Production Architecture
Vidra's production architecture is a standard multi-tier layout: a CDN/edge layer absorbs static traffic, a load balancer distributes API requests across horizontally scaled application instances, and separate tiers handle data, storage, and background transcoding. Each tier scales independently.
Architecture Diagram
The diagram shows how traffic enters at the CDN edge and fans out across the application, data, storage, and worker tiers. Federated peers bypass the CDN and connect directly through the load balancer.
Deployment Options
Single Server (Development / Small Scale)
Everything runs on one machine via Docker Compose. Suitable for up to approximately 100 concurrent viewers. There is no high availability or failover — use this for staging environments or low-traffic instances.
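A minimal sketch of what such a Compose layout might look like — service names, images, and ports here are illustrative assumptions, not Vidra's actual `docker-compose.yml`:

```yaml
# Hypothetical single-server layout: API, database, cache, and one
# transcoding worker on the same host. Adjust names/images to match
# the repository's real compose file.
services:
  api:
    image: vidra/api:latest        # assumed image name
    ports:
      - "8080:8080"
    depends_on: [db, redis]
  worker:
    image: vidra/worker:latest     # FFmpeg transcoding worker (assumed)
    depends_on: [redis]
  db:
    image: postgres:16
    volumes:
      - pgdata:/var/lib/postgresql/data
  redis:
    image: redis:7
volumes:
  pgdata:
```

Because the worker shares the machine with the API, a long transcode can starve API requests — one reason this layout is development-only.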
Kubernetes (Production)
Vidra includes Kubernetes manifests in k8s/ and Terraform configurations in terraform/ for cloud deployment:
```
k8s/
  deployment.yaml   # API server pods
  service.yaml      # Service definitions
  ingress.yaml      # Ingress rules
  configmap.yaml    # Configuration
  secrets.yaml      # Sensitive config
  hpa.yaml          # Horizontal Pod Autoscaler
terraform/
  main.tf           # Infrastructure definition
  variables.tf      # Configurable parameters
  outputs.tf        # Deployment outputs
```
The hpa.yaml manifest configures a Horizontal Pod Autoscaler. Set CPU/memory thresholds based on your typical upload and transcoding load — transcoding is CPU-intensive and will drive scale-out faster than API request volume alone.
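As a reference point, a CPU-targeted HPA looks like the sketch below. The deployment name, replica bounds, and 70% utilization target are illustrative assumptions, not the values shipped in `hpa.yaml`:

```yaml
# Illustrative HorizontalPodAutoscaler (autoscaling/v2 API).
# Names and thresholds are assumptions — tune to your measured load.
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: vidra-api          # assumed Deployment name
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: vidra-api
  minReplicas: 3
  maxReplicas: 10
  metrics:
    - type: Resource
      resource:
        name: cpu
        target:
          type: Utilization
          averageUtilization: 70   # scale out above 70% average CPU
```

Because transcoding bursts are spiky, a lower CPU target (scaling out earlier) is usually safer than a higher one that reacts only after workers are already saturated.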
Recommended Production Setup
| Component | Recommended | Minimum |
|---|---|---|
| API Servers | 3+ instances behind load balancer | 1 instance |
| PostgreSQL | Managed (RDS, Cloud SQL) with read replica | Single instance |
| Redis | Managed (ElastiCache, Memorystore) | Single instance |
| Storage | S3 / MinIO cluster + CDN | Local filesystem |
| FFmpeg Workers | Dedicated instances, GPU-accelerated where possible | Same machine as API server |
| IPFS | Dedicated node(s) | Optional — omit if not using P2P distribution |
Transcoding is CPU/GPU-bound and can saturate a machine for minutes per video. Running workers on dedicated instances prevents transcoding jobs from affecting API response latency for other users.
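One way to enforce this isolation on Kubernetes is to pin worker pods to a dedicated node pool via a `nodeSelector` and give them explicit resource requests. The sketch below is a hypothetical worker Deployment — the node label, image, and resource figures are assumptions, not part of Vidra's shipped manifests:

```yaml
# Illustrative worker Deployment pinned to a dedicated node pool so
# transcoding load never lands on API nodes. Labels and resource
# figures are assumptions — size them to your transcoding profile.
apiVersion: apps/v1
kind: Deployment
metadata:
  name: vidra-worker
spec:
  replicas: 2
  selector:
    matchLabels:
      app: vidra-worker
  template:
    metadata:
      labels:
        app: vidra-worker
    spec:
      nodeSelector:
        workload: transcoding    # assumed label on the dedicated node pool
      containers:
        - name: worker
          image: vidra/worker:latest   # assumed image name
          resources:
            requests:
              cpu: "4"           # transcoding is CPU-bound; request generously
              memory: 8Gi
            limits:
              cpu: "8"
              memory: 16Gi
```

For GPU-accelerated encoding, the same pattern applies with a GPU resource request (e.g. `nvidia.com/gpu: 1`) on nodes that expose one via a device plugin.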