
Video Streaming Service

Status: Completed
Tags: AWS · Video Analytics · Social Media · Backend · TypeScript · Node.js

Live Video Streaming Service for a Consumer Social App

A six-month project to build a live-feed video streaming and video-on-demand (VOD) platform for a popular social app with 30+ million registered users.

Project Overview

A couple of years ago I was involved in an ambitious project that aimed to add video streaming capabilities to a popular consumer social app, which already enjoyed 100k+ DAUs. The feature was to be an amalgamation of TikTok Live video feeds, Instagram Reels, and YouTube Shorts.

The system would enable creators and influencers to stream live video to tens of thousands of concurrent viewers, while automatically producing a searchable, shareable clip library (similar to Instagram Reels) from those live sessions. Alongside the end-user features, we built a data and analytics layer to measure creator performance, content quality, and engagement—informing programming, growth, and monetization decisions.

I ended up being the technical lead responsible for the overall architecture design and delivery of this video streaming platform and its short-form “clips” experience, coordinating a remote team in Brazil (frontend and backend engineers) and stakeholders in California.

We combined AWS media primitives for low-latency ingest, playback, and VOD with a cloud-native, event-driven backend spanning AWS and GCP to achieve scale, resilience, and fast iteration.


Business Use Case

The project’s goal was to add live video streaming and video-on-demand (VOD) capabilities to a popular social networking mobile app. A venture-backed consumer social startup wanted to add creator-generated content with the goal of driving additional traffic to the app and monetizing the content via in-app purchases, donations, and paid premium features.

Business Need

The app needed a differentiated, high-engagement media surface:

  • Live streaming to drive real-time community engagement and creator–fan interaction.
  • Short-form clips to extend the shelf life of live content, fuel discovery, and power cross-platform growth.
  • Actionable analytics so the content team could double down on what works (creators, topics, formats) and improve programming cadence.

Goals

  • Engagement: Increase daily active users (DAU) and session length via live shows and snackable clips.
  • Reliability: Deliver stable streams for tens of thousands of concurrent viewers with sub-second to low-seconds startup latency.
  • Speed of iteration: Ship features frequently without compromising reliability.
  • Efficiency: Keep infrastructure and encoding/transcode costs under control while scaling.
  • Safety: Chat moderation using AI-assisted tools and custom workflows.

Solution

Approach

We took a cloud-native, event-driven approach:

  • Video streaming service: Leverage managed ingest and low-latency playback to avoid building and operating our own media pipelines end-to-end. The core video streaming service we picked was Amazon Interactive Video Service (IVS). There were two major drivers for this decision: 1) a scalable, battle-tested solution; 2) the ability to embed custom data into video streams via timed metadata.

  • Clips pipeline: Automatically segment and render highlights from live streams using serverless/batch workers and ffmpeg, then index clips for in-app discovery. Here again, AWS Media Services provided ready-made solutions for clip cutting, format/resolution conversions, and storage.

  • Event-driven architecture: Use durable events to fan out work (clipping, moderation, notifications, analytics) while isolating failure domains.

  • Observability & SLOs: Define clear service SLOs (availability, playback start time, end-to-end clip time), set up tracing/metrics/logging, and run blameless incident reviews.

  • Analytics platform: Live video and clip viewership analytics, collected initially via an in-house event-collection API and later via Bitmovin client-side plugins and analytics platform, with exports to a Snowflake data warehouse. Custom Grafana dashboards provided near-real-time data visualization.

  • Distributed delivery: Bridge time zones with Brazil (core development) + California (product/infra), using clear ownership, RFCs, and a fast review cadence.
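
The “embed custom data” driver above maps to IVS timed metadata, which is capped at 1 KB per payload. A minimal sketch of a payload builder in TypeScript (the event shape and field names are illustrative, not our production schema):

```typescript
// Build a timed-metadata payload for an IVS channel.
// IVS rejects payloads larger than 1 KB, so validate before calling the API.
interface StreamEvent {
  type: string;                      // e.g. "clip_marker", "poll_open" (illustrative)
  payload: Record<string, unknown>;
}

const IVS_METADATA_LIMIT_BYTES = 1024;

function buildTimedMetadata(event: StreamEvent): string {
  const body = JSON.stringify(event);
  if (Buffer.byteLength(body, "utf8") > IVS_METADATA_LIMIT_BYTES) {
    throw new Error(`timed metadata exceeds ${IVS_METADATA_LIMIT_BYTES} bytes`);
  }
  return body;
}

// With the AWS SDK v3 (@aws-sdk/client-ivs) the string would be sent as:
//   new PutMetadataCommand({ channelArn, metadata: buildTimedMetadata(event) })
```

Players receive the payload in sync with the video, which is how stream-embedded data reaches viewers.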

How We Arrived at the Solution

We evaluated “build vs. buy” across the media stack. Building our own ingest/packaging/transcoding pipeline (WebRTC SFU + custom HLS) would maximize control, but would slow our time-to-market and expand our on-call surface. Managed live streaming kept us focused on product velocity (features and UX) while still giving us the hooks we needed for clips, moderation, and analytics. On the backend, we chose GCP-native serverless and streaming components that our team already knew well, letting us iterate quickly with clear cost and scale characteristics.

Tech Stack (Representative)

  • Media & Processing:

    • AWS IVS for low-latency live ingest & playback (managed, auto-scaling).

    • AWS MediaConvert for VOD and multi-rendition outputs.

    • ffmpeg for server-side clip extraction, normalization, and thumbnail sprites.

    • Web broadcasting studio (proprietary) used by creators to start/stop streams, manage scenes/overlays, and coordinate programming.

  • Backend & Data (cloud-native, event-driven):

    • GCP Pub/Sub for event fan-out (stream start/stop, segment finalized, clip created, moderation verdict, etc.).

    • Cloud Tasks for idempotent background jobs with retries/backoff (clip rendering, asset promotion).

    • Cloud Spanner (with Change Streams) as the system of record for creators, streams, clips, and program schedules.

    • Dataflow (Streaming) for near-real-time analytics pipelines (engagement metrics, creator performance, trend signals).

    • Cloud Functions / Cloud Run / Kubernetes for stateless services and workers.

    • Redis for ephemeral counters, rate limiting, and hot caches.

    • Postgres/SQL for auxiliary services where relational semantics and ease of use were a fit.

  • Application Services:

    • TypeScript/Node.js microservices (APIs, clip workers, orchestration).

    • Java services for heavier throughput tasks and long-running workers.

    • AI integrations for chat filtering, ranking, and generating suggested Q&A prompts for hosts.

    • Mobile app clients consuming IVS playback endpoints; in-app clip feed with infinite scroll.
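
As a concrete illustration of the ffmpeg-based clip extraction listed above, a clip worker can assemble its ffmpeg invocation from a pure argument builder. The flags shown (fast input seek, stream copy) are common ffmpeg practice; our production presets were more involved:

```typescript
// Build ffmpeg arguments to cut a clip without re-encoding.
function buildClipArgs(src: string, startSec: number, durationSec: number, out: string): string[] {
  return [
    "-ss", String(startSec),      // seek before -i: fast, keyframe-aligned input seek
    "-i", src,
    "-t", String(durationSec),    // clip duration in seconds
    "-c", "copy",                 // stream copy: no transcode cost
    "-movflags", "+faststart",    // relocate moov atom for instant web playback
    out,
  ];
}

// A worker would spawn it, e.g.:
//   import { spawn } from "node:child_process";
//   spawn("ffmpeg", buildClipArgs("show.mp4", 120, 30, "clip-001.mp4"));
```

Keeping the argument construction pure makes the worker easy to unit test without invoking ffmpeg itself.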

Architecture Overview

High-Level Flow (Live → Clips → Analytics)

  1. Creator broadcasts via our web studio → AWS IVS handles ingest and low-latency playback.

  2. Playback is distributed via IVS endpoints to tens of thousands of viewers in the mobile app.

  3. Segment events (e.g., HLS segment finalized) and stream lifecycle events are published as domain events.

  4. Clip pipeline listens on Pub/Sub, triggers ffmpeg jobs to extract highlights (rules + manual picks), stores assets, and updates Spanner.

  5. MediaConvert creates VOD renditions for full shows; assets are promoted to the in-app library.

  6. Analytics pipeline (Dataflow) aggregates engagement signals (concurrency, watch time, retention, CTR, shares) and writes to the analytics store for dashboards and creator reporting.

  7. AI services score chat for safety and generate suggested prompts/questions for the host to keep engagement high.
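
Steps 3–4 above hinge on routing domain events to the right consumers. A stripped-down dispatcher, with hypothetical event names, might look like:

```typescript
// Minimal fan-out router: maps an event type to its registered handlers.
// In production, dispatch() would be the Pub/Sub subscription callback.
type Handler = (data: Record<string, unknown>) => void;

class EventRouter {
  private handlers = new Map<string, Handler[]>();

  on(type: string, h: Handler): void {
    const list = this.handlers.get(type) ?? [];
    list.push(h);
    this.handlers.set(type, list);
  }

  dispatch(type: string, data: Record<string, unknown>): number {
    const list = this.handlers.get(type) ?? [];
    for (const h of list) h(data);
    return list.length; // how many consumers saw the event
  }
}

// Hypothetical wiring: the clip pipeline and analytics both subscribe
// to the same "segment.finalized" event, isolating their failure domains.
```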

Considered Alternatives

  • Agora / Mux for live + VOD: strong developer experience; we stayed with IVS for built-in Twitch-grade live reliability and predictable latency.

  • Custom WebRTC SFU (Janus/mediasoup/LiveKit): maximum control and sub-second latency, but significantly higher ops complexity and on-call burden.

  • Single-cloud architecture: simpler billing/ops, but we already had deep GCP expertise and tooling for data and events; cross-cloud used the best tool for each layer.

Challenges (and What We Did)

  • Latency & startup time: Tuned player settings and renditions, minimized playlist depth, and pre-warmed critical paths. Measured “time-to-first-frame” as an SLO.

  • Clip quality at scale: Standardized on sane defaults (keyframe alignment, normalized audio, aspect-ratio presets) and added human-in-the-loop approvals for marquee clips.

  • Traffic spikes: Used Pub/Sub backpressure, idempotent Cloud Tasks, and circuit breakers to degrade gracefully without losing data.

  • Cross-cloud coordination: Clear contracts between media (AWS) and data/events (GCP); health checks and alarms on handoff boundaries; synthetic canary streams.

  • AI moderation & prompt quality: Combined fast heuristic filters with model-based scoring; enforced rate limits and added A/B tests for AI-suggested prompts.

  • Team/time-zone execution: RFCs for design; “follow-the-sun” handoffs; code-owners for hot paths; golden paths and starter templates to keep velocity high.
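
The circuit breakers mentioned under “Traffic spikes” can be sketched as a small consecutive-failure counter with a cooldown. The threshold and cooldown values here are illustrative, and the clock is injectable for testing:

```typescript
// After `threshold` consecutive failures, the circuit opens and callers
// should skip the downstream call until `cooldownMs` has elapsed.
class CircuitBreaker {
  private failures = 0;
  private openedAt = 0;

  constructor(
    private threshold = 5,
    private cooldownMs = 30_000,
    private now: () => number = () => Date.now(),
  ) {}

  isOpen(): boolean {
    if (this.failures < this.threshold) return false;
    if (this.now() - this.openedAt >= this.cooldownMs) {
      this.failures = 0; // half-open: allow one retry through
      return false;
    }
    return true;
  }

  recordSuccess(): void { this.failures = 0; }

  recordFailure(): void {
    this.failures += 1;
    if (this.failures === this.threshold) this.openedAt = this.now();
  }
}
```

Wrapping outbound calls (e.g., clip rendering or notification fan-out) in such a breaker is what lets the system degrade gracefully instead of cascading during a spike.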

Scaling

  • Concurrency: IVS auto-scales for ingest/playback; our APIs scaled horizontally behind a gateway with stateless services on Cloud Run/Kubernetes.

  • Data: Spanner provided transactional consistency for stream/clip metadata with Change Streams to fan out updates without polling.

  • Workers: Clip processing horizontally scaled as queue depth grew; all jobs were idempotent and checkpointed.

  • Analytics: Streaming pipelines aggregated per-minute and per-segment metrics to keep dashboards near real-time without overwhelming the warehouse.

  • Cost controls: Right-sized renditions, consolidated ffmpeg passes, TTLs on intermediate assets, reserved/preemptible capacity for non-urgent batch, and event-driven execution to avoid idle compute.
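
The per-minute aggregation described above amounts to bucketing raw events into tumbling one-minute windows before they reach the warehouse. A simplified in-memory version (field names are illustrative; Dataflow did this over unbounded streams):

```typescript
// Sum watch time per (stream, minute) bucket.
interface WatchEvent { streamId: string; tsMs: number; watchedMs: number; }

function aggregatePerMinute(events: WatchEvent[]): Map<string, number> {
  const buckets = new Map<string, number>();
  for (const e of events) {
    const minute = Math.floor(e.tsMs / 60_000) * 60_000; // tumbling 1-minute window
    const key = `${e.streamId}:${minute}`;
    buckets.set(key, (buckets.get(key) ?? 0) + e.watchedMs);
  }
  return buckets;
}
```

Pre-aggregating this way is what keeps dashboard queries cheap: the warehouse sees one row per stream-minute instead of one row per viewer heartbeat.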


Conclusion

How the Goals Were Achieved

  • Engagement: Live shows + a high-quality clips feed increased session length and repeat visits; creators received actionable insights to tune content.

  • Reliability: Managed live ingest/playback plus strong SLOs, pre-production canaries, and observability reduced incidents and accelerated recovery.

  • Iteration speed: The event-driven, serverless-first backend let us ship weekly without large coordination costs.

  • Safety: AI-assisted moderation made chat manageable at scale while keeping latency acceptable.

Cost Impact

  • Efficient media pipeline: Leveraged managed services where it mattered (live) and optimized our own processing (clips) where we could control costs.

  • Pay-for-use compute: Serverless/event-driven services kept baseline spend low and scaled with demand.

  • Storage hygiene: Lifecycle policies for cold assets and aggressive cleanup of intermediates reduced ongoing costs.

Efficiency Gains

  • Operational load: Clear boundaries, idempotent workers, and robust retries decreased on-call toil.

  • Developer velocity: Templates, code owners, and automated checks cut PR cycle time; RFCs improved architectural consistency across Brazil/California teams.

  • Data-driven decisions: Near real-time analytics enabled rapid programming and growth experiments.

Overall Business Impact

  • Launched a sticky media surface that deepened community engagement and expanded top-of-funnel reach via shareable clips.

  • Empowered creators with insights, improving content quality and retention.

  • Delivered a scalable foundation for future monetization (sponsorships, tipping, shoppable video), all while maintaining a lean infra footprint and a fast product cadence.