Production Architecture

Vidra's production architecture is a standard multi-tier layout: a CDN/edge layer absorbs static traffic, a load balancer distributes API requests across horizontally scaled application instances, and separate tiers handle data, storage, and background transcoding. Each tier scales independently.

Architecture Diagram

The diagram shows how traffic enters at the CDN edge and fans out across the application, data, storage, and worker tiers. Federated peers bypass the CDN and connect directly through the load balancer.

Deployment Options

Single Server (Development / Small Scale)

Everything runs on one machine via Docker Compose. Suitable for up to approximately 100 concurrent viewers. No high availability or failover; use this for staging environments or low-traffic instances.
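A single-server setup can be sketched as a Compose file like the following. This is an illustrative example only: the service names, image tags, and environment variables are assumptions, not Vidra's actual docker-compose.yml.

```yaml
# Hypothetical docker-compose.yml sketch for a single-server deployment.
# All names and images are illustrative.
services:
  api:
    image: vidra/api:latest        # API server (assumed image name)
    ports:
      - "8080:8080"
    depends_on:
      - postgres
      - redis
  worker:
    image: vidra/worker:latest     # FFmpeg transcoding worker (assumed)
    depends_on:
      - redis
  postgres:
    image: postgres:16
    environment:
      POSTGRES_DB: vidra
    volumes:
      - pgdata:/var/lib/postgresql/data
  redis:
    image: redis:7
volumes:
  pgdata:
```

Note that the worker shares the host's CPU with the API server here, which is exactly the contention the dedicated-worker recommendation below avoids at larger scale.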

Kubernetes (Production)

Vidra includes Kubernetes manifests in k8s/ and Terraform configurations in terraform/ for cloud deployment:

k8s/
  deployment.yaml   # API server pods
  service.yaml      # Service definitions
  ingress.yaml      # Ingress rules
  configmap.yaml    # Configuration
  secrets.yaml      # Sensitive config
  hpa.yaml          # Horizontal pod autoscaler

terraform/
  main.tf           # Infrastructure definition
  variables.tf      # Configurable parameters
  outputs.tf        # Deployment outputs

Scaling the application tier

The hpa.yaml manifest configures a Horizontal Pod Autoscaler. Set CPU/memory thresholds based on your typical upload and transcoding load — transcoding is CPU-intensive and will drive scale-out faster than API request volume alone.
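As a rough illustration of the kind of policy hpa.yaml configures, an autoscaling/v2 manifest might look like this. The Deployment name, replica bounds, and thresholds are assumptions for the sketch, not Vidra's shipped values.

```yaml
# Illustrative HorizontalPodAutoscaler sketch; names and thresholds
# are assumptions, not the contents of Vidra's hpa.yaml.
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: vidra-api
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: vidra-api          # assumed Deployment name
  minReplicas: 3             # matches the 3+ instance recommendation below
  maxReplicas: 10
  metrics:
    - type: Resource
      resource:
        name: cpu
        target:
          type: Utilization
          averageUtilization: 70   # tune to your upload/transcoding load
```

Lower the CPU target if transcoding regularly spikes utilization, since scale-out lags behind sudden load.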

| Component      | Recommended                                         | Minimum                                      |
|----------------|-----------------------------------------------------|----------------------------------------------|
| API Servers    | 3+ instances behind load balancer                   | 1 instance                                   |
| PostgreSQL     | Managed (RDS, Cloud SQL) with read replica          | Single instance                              |
| Redis          | Managed (ElastiCache, Memorystore)                  | Single instance                              |
| Storage        | S3 / MinIO cluster + CDN                            | Local filesystem                             |
| FFmpeg Workers | Dedicated instances, GPU-accelerated where possible | Same machine as API server                   |
| IPFS           | Dedicated node(s)                                   | Optional; omit if not using P2P distribution |

Why separate FFmpeg workers?

Transcoding is CPU/GPU-bound and can saturate a machine for minutes per video. Running workers on dedicated instances prevents transcoding jobs from affecting API response latency for other users.
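On Kubernetes, one common way to achieve this isolation is to schedule worker pods onto a dedicated, tainted node pool. The sketch below is an assumption about how this could be wired up; the labels, taint keys, and image name are illustrative, not part of Vidra's manifests.

```yaml
# Illustrative sketch: pin transcoding workers to a dedicated node pool
# so FFmpeg jobs never compete with API pods for CPU.
# Label and taint names are assumptions.
apiVersion: apps/v1
kind: Deployment
metadata:
  name: vidra-ffmpeg-worker
spec:
  replicas: 2
  selector:
    matchLabels:
      app: vidra-ffmpeg-worker
  template:
    metadata:
      labels:
        app: vidra-ffmpeg-worker
    spec:
      nodeSelector:
        workload: transcoding        # nodes labeled for transcoding only
      tolerations:
        - key: "transcoding"         # matching taint keeps other pods off
          operator: "Equal"
          value: "true"
          effect: "NoSchedule"
      containers:
        - name: worker
          image: vidra/worker:latest
          resources:
            requests:
              cpu: "4"               # transcoding is CPU-hungry; reserve ahead
```

The taint/toleration pair works in both directions: API pods cannot land on transcoding nodes, and the worker Deployment is steered onto them by the node selector.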