Production Architecture

Vidra's production architecture is a standard multi-tier layout: a CDN/edge layer absorbs static traffic, a load balancer distributes API requests across horizontally scaled application instances, and separate tiers handle data, storage, and background transcoding. Each tier scales independently.

Architecture Diagram

The diagram shows how traffic enters at the CDN edge and fans out across the application, data, storage, and worker tiers. Federated peers bypass the CDN and connect directly through the load balancer.

Deployment Options

Single Server (Development / Small Scale)

Everything runs on one machine via Docker Compose. Suitable for up to approximately 100 concurrent viewers. No high availability or failover; use this for staging environments or low-traffic instances.
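A single-server setup can be sketched as a Compose file like the following. This is an illustrative example only: the service names, image tags, and environment variables are assumptions, not Vidra's actual docker-compose.yml.

```yaml
# Hypothetical docker-compose.yml sketch for a single-server deployment.
# All names and images are illustrative.
services:
  api:
    image: vidra/api:latest        # API server (assumed image name)
    ports:
      - "8080:8080"
    depends_on:
      - postgres
      - redis
  worker:
    image: vidra/worker:latest     # FFmpeg transcoding worker (assumed)
    depends_on:
      - redis
  postgres:
    image: postgres:16
    environment:
      POSTGRES_DB: vidra
    volumes:
      - pgdata:/var/lib/postgresql/data
  redis:
    image: redis:7
volumes:
  pgdata:
```

Note that the worker shares the host's CPU with the API server here, which is exactly the contention the dedicated-worker recommendation below avoids at larger scale.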

Kubernetes (Production)

Vidra includes Kubernetes manifests in k8s/ and Terraform configurations in terraform/ for cloud deployment:

k8s/
  deployment.yaml   # API server pods
  service.yaml      # Service definitions
  ingress.yaml      # Ingress rules
  configmap.yaml    # Configuration
  secrets.yaml      # Sensitive config
  hpa.yaml          # Horizontal pod autoscaler

terraform/
  main.tf           # Infrastructure definition
  variables.tf      # Configurable parameters
  outputs.tf        # Deployment outputs

Scaling the application tier

The hpa.yaml manifest configures a Horizontal Pod Autoscaler. Set CPU/memory thresholds based on your typical upload and transcoding load — transcoding is CPU-intensive and will drive scale-out faster than API request volume alone.
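As a rough illustration of the kind of policy hpa.yaml configures, an autoscaling/v2 manifest might look like this. The Deployment name, replica bounds, and thresholds are assumptions for the sketch, not Vidra's shipped values.

```yaml
# Illustrative HorizontalPodAutoscaler sketch; names and thresholds
# are assumptions, not the contents of Vidra's hpa.yaml.
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: vidra-api
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: vidra-api          # assumed Deployment name
  minReplicas: 3             # matches the 3+ instance recommendation below
  maxReplicas: 10
  metrics:
    - type: Resource
      resource:
        name: cpu
        target:
          type: Utilization
          averageUtilization: 70   # tune to your upload/transcoding load
```

Lower the CPU target if transcoding regularly spikes utilization, since scale-out lags behind sudden load.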

| Component      | Recommended                                         | Minimum                                      |
|----------------|-----------------------------------------------------|----------------------------------------------|
| API Servers    | 3+ instances behind load balancer                   | 1 instance                                   |
| PostgreSQL     | Managed (RDS, Cloud SQL) with read replica          | Single instance                              |
| Redis          | Managed (ElastiCache, Memorystore)                  | Single instance                              |
| Storage        | S3 / MinIO cluster + CDN                            | Local filesystem                             |
| FFmpeg Workers | Dedicated instances, GPU-accelerated where possible | Same machine as API server                   |
| IPFS           | Dedicated node(s)                                   | Optional; omit if not using P2P distribution |

Why separate FFmpeg workers?

Transcoding is CPU/GPU-bound and can saturate a machine for minutes per video. Running workers on dedicated instances prevents transcoding jobs from affecting API response latency for other users.
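On Kubernetes, one common way to achieve this isolation is to schedule worker pods onto a dedicated, tainted node pool. The sketch below is an assumption about how this could be wired up; the labels, taint keys, and image name are illustrative, not part of Vidra's manifests.

```yaml
# Illustrative sketch: pin transcoding workers to a dedicated node pool
# so FFmpeg jobs never compete with API pods for CPU.
# Label and taint names are assumptions.
apiVersion: apps/v1
kind: Deployment
metadata:
  name: vidra-ffmpeg-worker
spec:
  replicas: 2
  selector:
    matchLabels:
      app: vidra-ffmpeg-worker
  template:
    metadata:
      labels:
        app: vidra-ffmpeg-worker
    spec:
      nodeSelector:
        workload: transcoding        # nodes labeled for transcoding only
      tolerations:
        - key: "transcoding"         # matching taint keeps other pods off
          operator: "Equal"
          value: "true"
          effect: "NoSchedule"
      containers:
        - name: worker
          image: vidra/worker:latest
          resources:
            requests:
              cpu: "4"               # transcoding is CPU-hungry; reserve ahead
```

The taint/toleration pair works in both directions: API pods cannot land on transcoding nodes, and the worker Deployment is steered onto them by the node selector.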