[Figure: Hybrid search architecture]

Large-Scale Search Platform

Status: Completed
Tags: ElasticSearch · Semantic Search · Embeddings · Social Media · Backend · Node.js · TypeScript

ElasticSearch at Scale for Social Discovery

Hybrid semantic (vector + keyword + geo) search on tens of millions of users

TL;DR: We rebuilt user search for a social networking app from the ground up—combining vector embeddings with traditional text, attribute, and geo search (“hybrid search”), running on a horizontally scalable ElasticSearch cluster in Kubernetes on GCP. The result: 50 ms vector lookups, 6–7 parallel queries merged in the app layer, and peak throughput up to 50k QPS.


Context

User search is mission-critical in social products: if people can’t reliably find each other, engagement stalls. We inherited an early-stage implementation that struggled with relevance, latency spikes, and limited context (e.g., partial/full name queries often surfaced irrelevant results). We needed a platform that could blend keyword, attribute, vector, and geo signals; scale to large traffic bursts; and evolve quickly as new data sources and models arrived.


Objectives

We set the following goals:

  • Relevance: unify signals across names, demographics, education/work, location, relationships, interaction history, and semantic interests (embeddings).
  • Performance: maintain low latency during spikes with tens of thousands of concurrent users.
  • Extensibility: support new data sources and AI models for embeddings without risky rewrites.
  • Scale: serve a ~30 M user base and hundreds of millions of relationships.

Why ElasticSearch

ElasticSearch gave us, in one platform:

  1. Rich text + keyword search with precise tokenization and fuzzy matching (typos, partials); see the query sketch after this list.
  2. Geo search by lat/long, essential for “near me” discovery across neighboring cities.
  3. Vector storage + ANN search for semantic matching, with controls to balance recall/latency (e.g., k, num_candidates).
  4. Horizontal scale via shards/replicas and straightforward operational patterns on Kubernetes.
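
To make the first two capabilities concrete, here is a minimal sketch of a name query with typo tolerance combined with a geo-distance filter, using the official @elastic/elasticsearch client. The index and field names (users, full_name, location) are illustrative, not our production schema.

```ts
import { Client } from '@elastic/elasticsearch';

const es = new Client({ node: process.env.ES_NODE ?? 'http://localhost:9200' });

// Illustrative sketch: index and field names are placeholders, not the real schema.
async function searchNearbyByName(name: string, lat: number, lon: number) {
  return es.search({
    index: 'users',
    size: 20,
    query: {
      bool: {
        must: [
          // Fuzzy matching tolerates typos; AUTO scales edit distance with term length.
          { match: { full_name: { query: name, fuzziness: 'AUTO' } } },
        ],
        filter: [
          // Only consider candidates within 100 miles of the searcher.
          { geo_distance: { distance: '100mi', location: { lat, lon } } },
        ],
      },
    },
  });
}
```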

Architecture Overview

Platform: GCP
Runtime: TypeScript/Node.js microservices packaged as containers, deployed to Kubernetes
Key services: ElasticSearch, Pub/Sub, Cloud Functions, Cloud Storage, CloudSQL (Postgres), Dataflow (Apache Beam runner)

At a high level:

  • App layer executes 6–7 parallel ES queries across specialized indices, joins results, and computes a unified ranking (sketched below).
  • Data pipelines handle high-volume backfills and transactional upserts from source systems to ES.
  • Vector/semantic services generate embeddings and insert them into ES dense vectors for ANN search.
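
A condensed sketch of the fan-out-and-fuse pattern from the first bullet, assuming an @elastic/elasticsearch client; the index names, queries, and weights are illustrative, and production ran 6–7 such queries covering geo, relationship, and interaction concerns as well.

```ts
import { Client, estypes } from '@elastic/elasticsearch';

const es = new Client({ node: process.env.ES_NODE ?? 'http://localhost:9200' });

interface Scored { userId: string; score: number }

// Run one query against one specialized index and weight its scores.
async function runQuery(
  index: string,
  request: Omit<estypes.SearchRequest, 'index' | 'size'>,
  weight: number,
): Promise<Scored[]> {
  const res = await es.search<{ user_id: string }>({ index, size: 100, ...request });
  return res.hits.hits.flatMap((hit) =>
    hit._source ? [{ userId: hit._source.user_id, score: (hit._score ?? 0) * weight }] : [],
  );
}

// Fan out per-concern queries in parallel, then fuse scores in the app layer.
export async function searchUsers(term: string, queryVector: number[]): Promise<Scored[]> {
  const resultSets = await Promise.all([
    runQuery('users', { query: { match: { full_name: { query: term, fuzziness: 'AUTO' } } } }, 1.0),
    runQuery(
      'user-vectors',
      { knn: { field: 'embedding', query_vector: queryVector, k: 50, num_candidates: 500 } },
      0.8,
    ),
  ]);

  // Sum weighted scores per user, then sort for the unified ranking.
  const fused = new Map<string, number>();
  for (const hits of resultSets) {
    for (const { userId, score } of hits) {
      fused.set(userId, (fused.get(userId) ?? 0) + score);
    }
  }
  return [...fused.entries()]
    .map(([userId, score]) => ({ userId, score }))
    .sort((a, b) => b.score - a.score);
}
```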

Indexing Strategy: Multi-Index Over Single “Mega-Index”

We compared two designs:

  • Single mega-index: simpler application logic and a single ES score, but unwieldy queries and poor scalability under heavy load.
  • Multiple specialized indices: simpler schemas/queries per concern (users, relationships, vectors), independent versioning, and better horizontal scaling—at the cost of doing joins/ranking in the app layer.

We chose the multi-index approach. It let us scale each concern independently and avoid slow aggregations. Running parallel queries per concern and merging in the Node.js layer was consistently faster and more resilient than pushing everything into one query.
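
As a rough illustration of the layout (index names, mappings, and shard counts are placeholders, not the production values), each concern gets its own index with its own schema, version suffix, and scaling parameters:

```ts
import { Client } from '@elastic/elasticsearch';

const es = new Client({ node: process.env.ES_NODE ?? 'http://localhost:9200' });

// Placeholder names, mappings, and shard counts; the point is per-concern
// schemas, versioned index names, and independently tuned scaling.
async function createIndices() {
  await es.indices.create({
    index: 'users-v1',
    settings: { number_of_shards: 12, number_of_replicas: 1 },
    mappings: {
      properties: {
        user_id: { type: 'keyword' },
        full_name: { type: 'text' },
        age: { type: 'integer' },
        location: { type: 'geo_point' },
      },
    },
  });

  await es.indices.create({
    index: 'relationships-v1',
    settings: { number_of_shards: 24, number_of_replicas: 1 }, // scaled independently of users-v1
    mappings: {
      properties: {
        from_user: { type: 'keyword' },
        to_user: { type: 'keyword' },
        relation: { type: 'keyword' },
      },
    },
  });
}
```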


Relevance & Ranking

We composed relevance from four signal families:

  1. Keywords & attributes: name (full/partial), age, gender, etc.
  2. Geo proximity: Elastic’s gauss decay boosted candidates closer to a target point, with a cutoff to avoid overscoring distant users.
  3. Semantic vectors: embeddings from profiles, interests, and (where appropriate) interactions.
  4. Social graph & interactions: profile views, common connections, and other relationship signals.

The gauss function gave nearby users higher scores while tapering influence beyond a few hundred miles—crucial for real-world discovery across adjacent cities.
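
A sketch of that decay, assuming a geo_point field named location and an @elastic/elasticsearch client; the offset/scale/decay values here are illustrative, and the production values were tuned against real traffic.

```ts
import { Client } from '@elastic/elasticsearch';

const es = new Client({ node: process.env.ES_NODE ?? 'http://localhost:9200' });

// Illustrative values; field names and tuning are not the production settings.
async function searchWithGeoDecay(term: string, lat: number, lon: number) {
  return es.search({
    index: 'users',
    query: {
      function_score: {
        query: { match: { interests: term } },
        functions: [
          {
            gauss: {
              location: {
                origin: { lat, lon }, // the searcher's position
                offset: '25mi',       // full score within 25 miles
                scale: '150mi',       // score falls to `decay` at offset + scale
                decay: 0.5,
              },
            },
          },
        ],
        boost_mode: 'multiply', // geo decay scales the text relevance score
      },
    },
  });
}
```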


Data Ingestion Workflows

Backfill at Scale

For the initial load (~30 M users and hundreds of millions of relationship edges), we exported data from our warehouse to GCS as TSV and used the Cloud Storage → ElasticSearch Dataflow template. With a small UDF to map file columns to ES fields, we achieved near-turnkey ingestion and completed even multi-billion-row backfills in hours thanks to Dataflow’s elastic scaling. Peak sustained ingest reached ~100k docs/sec.
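
For reference, a UDF of that kind is a small plain-JavaScript function uploaded to GCS and referenced by the template's UDF parameters; the column order and ES field names below are illustrative, and the exact contract should be checked against the template version in use.

```js
// Deployed as a plain .js file; each TSV line arrives as a string and the
// function returns the JSON document to index. Columns and field names here
// are illustrative, not the real export schema.
function transformTsvLine(line) {
  const [userId, fullName, age, lat, lon] = line.split('\t');
  return JSON.stringify({
    user_id: userId,
    full_name: fullName,
    age: Number(age),
    location: { lat: Number(lat), lon: Number(lon) },
  });
}
```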

Transactional Inserts/Updates

For ongoing changes, producers published events to Pub/Sub. Dataflow and Cloud Functions consumers performed upserts into the relevant ES indices. This event-driven pattern absorbed 100k+ concurrent user bursts (signups, profile edits, searches) cost-effectively.
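
A minimal sketch of one such consumer, assuming a CloudEvents-style Cloud Function (via @google-cloud/functions-framework) and illustrative event and field names:

```ts
import * as functions from '@google-cloud/functions-framework';
import { Client } from '@elastic/elasticsearch';

const es = new Client({ node: process.env.ES_NODE ?? 'http://localhost:9200' });

// Event name and payload shape are illustrative; real producers were project-specific.
interface ProfileEvent { userId: string; fields: Record<string, unknown> }

functions.cloudEvent<{ message: { data: string } }>('profileChanged', async (event) => {
  // Pub/Sub delivers the payload base64-encoded inside message.data.
  const raw = Buffer.from(event.data!.message.data, 'base64').toString('utf8');
  const { userId, fields } = JSON.parse(raw) as ProfileEvent;

  // Upsert: update the document if it exists, create it otherwise.
  await es.update({
    index: 'users',
    id: userId,
    doc: fields,
    doc_as_upsert: true,
  });
});
```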


Semantic Search & Embeddings

We generated embeddings in the app layer and wrote them into ES dense_vector fields. After evaluating multiple open-source and proprietary models, we chose an open-source model running inside our Node.js workers to balance accuracy, latency, cost, and rate-limit independence.
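
A sketch of the mapping and write path, assuming the @elastic/elasticsearch client; embedText is a hypothetical stand-in for the in-process model, and the 2048 dims follow the figures below (indexing vectors of that size requires a sufficiently recent ES version).

```ts
import { Client } from '@elastic/elasticsearch';

const es = new Client({ node: process.env.ES_NODE ?? 'http://localhost:9200' });

// Hypothetical: stands in for whatever open-source model runs in the worker.
declare function embedText(text: string): Promise<number[]>;

async function createVectorIndex() {
  await es.indices.create({
    index: 'user-vectors',
    mappings: {
      properties: {
        user_id: { type: 'keyword' },
        embedding: {
          type: 'dense_vector',
          dims: 2048,          // per the production figures in this write-up
          index: true,         // enable ANN (HNSW) search
          similarity: 'cosine',
        },
      },
    },
  });
}

async function indexUserEmbedding(userId: string, profileText: string) {
  await es.index({
    index: 'user-vectors',
    id: userId,
    document: {
      user_id: userId,
      embedding: await embedText(profileText),
    },
  });
}
```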

At query time, we used ES ANN search (approximate KNN), tuning k (nearest neighbors) and num_candidates to trade off relevance vs. latency per use case. Production vectors were 2048-dimensional, with ~500 M vectors across ~30 M user documents in the primary index.
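
At query time the ANN request looks roughly like this; the k and num_candidates values are illustrative starting points, not the production settings.

```ts
import { Client } from '@elastic/elasticsearch';

const es = new Client({ node: process.env.ES_NODE ?? 'http://localhost:9200' });

async function semanticSearch(queryVector: number[]) {
  return es.search({
    index: 'user-vectors',
    knn: {
      field: 'embedding',
      query_vector: queryVector,
      k: 50,               // nearest neighbors to return
      num_candidates: 500, // candidates examined per shard; higher = better recall, more latency
    },
    _source: ['user_id'],
    size: 50,
  });
}
```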


Scaling & Operations

We deployed ES on Kubernetes with autoscaling policies that matched diurnal traffic. ES shards/replicas provided parallelism and resilience, and the cluster rebalanced automatically as we added nodes.

At peak scale we ran:

  • 40 data nodes, each with 32 vCPU, 64 GB RAM, 2.5 TB SSD
  • Largest index: ~150 B documents
  • Vectors: ~500 M (dim 2048) across ~30 M users
  • Backfill ingest: up to 100k docs/sec
  • Search volume: up to 50k QPS peak, ~20k QPS sustained

Outcomes

  • Relevance improved materially via hybrid ranking (keyword + geo + semantic + graph).
  • Latency remained low during marketing-driven spikes because parallel, per-concern queries scaled independently.
  • Extensibility: we could add/iterate embedding models and relationship features without destabilizing core search.

Lessons We’d Apply Again

  • Keep indices focused. Multi-index plus app-side fusion beats one mega-query for both scale and iteration speed.
  • Tune vectors pragmatically. k and num_candidates are reliable levers; measure per-segment SLAs and adjust.
  • Invest early in pipelines. Backfill + upsert paths (Dataflow + Pub/Sub) de-risk launch and simplify growth.
  • Use geo decay thoughtfully. gauss with a sensible offset/scale avoids accidentally crowding out non-local but relevant users.

Conclusion

We turned a fragile, slow, and context-blind search into a hybrid semantic discovery platform that scales with the product. By combining specialized indices, parallel querying, ANN vectors, and geo decay, and by leaning on GCP’s managed data services, we delivered a system that is faster under load, richer in relevance, and easier to evolve—without exposing ourselves to vendor rate limits or locking ourselves into inflexible schemas.


Appendix: Technology Snapshot

  • ElasticSearch for keyword, geo, and vector (ANN) search
  • Kubernetes for orchestration and autoscaling
  • Node.js (TypeScript) services for query orchestration and result fusion
  • GCP: Pub/Sub (events), Dataflow (backfill & streaming upserts), Cloud Functions (lightweight consumers), Cloud Storage (staging), CloudSQL/Postgres (system of record)

Keywords for discoverability: ElasticSearch, hybrid search, semantic search, vector search, ANN, KNN, embeddings, dense_vector, Kubernetes, GCP, Pub/Sub, Dataflow, social network search, recommendation groundwork, geo search, gauss decay, multi-index strategy.