When PostgreSQL isn't enough: Moving to specialized datastores

Writing

PostgreSQL is one of the most capable databases ever built. It handles relational data, JSON documents, full-text search, time-series data, and geospatial queries — often well enough that you never need anything else. "Use Postgres for everything" is genuinely good advice for most projects.

But there are workloads where "well enough" isn't enough. When your analytics queries are competing with your OLTP workload for I/O. When your search latency is creeping up as your data grows. When you need sub-millisecond key lookups that don't justify a disk round-trip. This is when you start looking at specialized datastores — and when you need to be careful.

The case for Redis

Redis is the easiest addition to justify. It's in-memory, so reads are fast (sub-millisecond for simple key lookups). It has a well-understood operational model and is supported by every cloud provider. The use cases are clear: session storage, rate limiting, real-time leaderboards, pub/sub messaging, and caching.

The trap is using Redis as a primary datastore. Its persistence model (RDB snapshots or AOF logs) is not equivalent to PostgreSQL's durability guarantees. If you need the data to survive a hard restart with zero loss, Redis requires careful configuration — and even then, it's optimized for performance, not durability.

Think of Redis as an acceleration layer on top of Postgres, not a replacement for it. The canonical pattern: write to Postgres first, then update Redis. Reads go to Redis when possible, fall back to Postgres on a miss. The source of truth never changes.

ClickHouse for analytics

When your analytics queries — "show me all orders by region this quarter, grouped by SKU" — start taking seconds on a table with 50 million rows, that's not a query optimization problem. That's an architecture problem. OLTP databases like PostgreSQL store data row-by-row; analytics queries that scan many rows and aggregate columns are fundamentally inefficient on that layout.

ClickHouse is a columnar database designed for exactly this workload. It stores data column-by-column, which means an aggregation over a single column reads only that column from disk — not every field of every row. A query that takes 8 seconds in PostgreSQL might take 200ms in ClickHouse on the same data.

The cost is operational complexity and a very different data model. ClickHouse doesn't do transactions or row-level updates well. It's designed for append-heavy, read-heavy analytical workloads. The pattern is to replicate data from PostgreSQL into ClickHouse (via CDC tools like Debezium) and keep them in sync.

Elasticsearch for search

PostgreSQL's full-text search is usable for simple cases — keyword matching with `tsvector` and `tsquery` works fine when your search index fits comfortably and your ranking needs are basic. When users expect Google-quality relevance ranking, faceted filtering, and sub-100ms search across millions of documents, you need a dedicated search engine.

Elasticsearch (or its open-source fork OpenSearch) is the standard answer. It uses an inverted index structure optimized for text retrieval, supports complex scoring functions, and handles aggregations for facets and filters efficiently. The operational complexity is real — Elasticsearch clusters require tuning and monitoring — but managed offerings from Elastic Cloud or AWS reduce the burden significantly.

When not to add another datastore

The hidden cost of a polyglot persistence architecture is coordination. Every new datastore is another system that can go down, another data source that can get out of sync, another thing your team needs to understand and operate. Before adding one, ask: have I genuinely exhausted what Postgres can do?

Materialized views, partial indexes, table partitioning, and connection pooling solve a surprising number of performance problems without introducing new infrastructure. An index on the right column often beats adding Redis. A partitioned table often beats migrating to ClickHouse.

Introduce a new datastore when the data model or access pattern is fundamentally different from what a relational database handles well — not because it's faster in a benchmark. The operational cost of an extra system compounds over years; make sure the benefit does too.

NextSaaSBuilding Multi-tenant SaaS: Security, Isolation, and the Data Trap