Spanner: The right database choice for applications with billions of users

4 min read · May 24, 2024

How do things change when you need to support billions of users?

The moment you design your application to serve billions of users, many key decisions have to be made about how to store data.

Replicate data to avoid losing it when a disk fails.
Shard data to avoid hitting the physical storage limit of a single machine.
Geo-shard data to comply with local laws that require a user's data to stay within a regional boundary.
Support transactions to simplify data writes.
Back up your data to guard against unforeseen events that may delete it.
Low-latency reads and writes.
High throughput.
Easy to manage, and many other reasons.
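
To make the sharding, geo-sharding and replication items above concrete, here is a minimal sketch of how a request could be routed. The region names, shard names and replication factor are hypothetical and only for illustration; this is not how Spanner does it internally.

```python
import hashlib

# Hypothetical layout: each region owns its own set of shards,
# so a user's data never leaves the home region.
SHARDS_BY_REGION = {
    "eu": ["eu-shard-0", "eu-shard-1"],
    "us": ["us-shard-0", "us-shard-1", "us-shard-2"],
}
REPLICAS_PER_SHARD = 3  # assumed replication factor


def shard_for(user_id, region):
    """Pick a shard inside the user's home region using a stable hash."""
    shards = SHARDS_BY_REGION[region]
    digest = hashlib.sha256(user_id.encode("utf-8")).digest()
    index = int.from_bytes(digest[:8], "big") % len(shards)
    return shards[index]


def replicas_for(shard):
    """Name the replicas that each hold a full copy of one shard."""
    return [f"{shard}/replica-{i}" for i in range(REPLICAS_PER_SHARD)]


shard = shard_for("user-12345", "eu")
copies = replicas_for(shard)
```

A plain modulo hash like this reshuffles most keys whenever the number of shards changes, which is one reason hand-rolled sharding gets hard to manage at scale.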

How did NoSQL pollute the industry the way plastic polluted the earth?

Fifteen years ago, most relational databases struggled to scale once data grew into the terabytes. Replication and sharding were further bottlenecks for relational databases.

This led to the rise of NoSQL databases, which gave up transactions but supported replication and sharding. NoSQL saw massive adoption because it was easy to use and well suited to the aspiration of serving billions of users.

But NoSQL, and especially its lack of strong transactions, led to the eventual consistency paradigm. That compromised many user experiences and put the burden on developers to design algorithms that cope with eventual consistency. Note that users have never liked an eventually consistent experience: they want to see updated data reflected as quickly as possible.
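
A toy illustration of why eventual consistency hurts: in the sketch below, a write lands on a primary copy and reaches the replica only later, so a read served by the replica can miss an update the user just made. The class and key names are made up for this example and do not model any particular NoSQL product.

```python
class EventuallyConsistentStore:
    """Toy store: writes hit the primary and reach the replica only on sync()."""

    def __init__(self):
        self.primary = {}
        self.replica = {}
        self.pending = []  # writes not yet applied to the replica

    def write(self, key, value):
        self.primary[key] = value
        self.pending.append((key, value))

    def read_from_replica(self, key):
        return self.replica.get(key)

    def sync(self):
        """Apply the queued writes to the replica (replication catching up)."""
        for key, value in self.pending:
            self.replica[key] = value
        self.pending.clear()


store = EventuallyConsistentStore()
store.write("profile:42:name", "Asha")
stale = store.read_from_replica("profile:42:name")  # None: replica is behind
store.sync()
fresh = store.read_from_replica("profile:42:name")  # "Asha" after replication
```

Everything between the write and the sync is the window in which the developer has to hide or work around the stale read.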

Because plastic was easy to adopt, manufacturing industries replaced metal, wood, glass and other materials with it. That led to a massive trash problem for the earth and broke many ecosystems.

Similarly, NoSQL polluted the software industry under the umbrella of eventual consistency.

Why is Spanner the best?

Supports strong transactions across shards.
Spanner uses a table design to store data, which makes it natural for others to learn quickly.
Spanner keeps each replicated chunk of data to at most about 4 GB. If the data grows, Spanner splits it into smaller chunks and moves them to different servers to keep each chunk small.
Allows data to be grouped, and groups can be configured to respect geographic boundaries.
Takes care of sharding based on the primary key and the groups you define.
Supports more than petabytes of data, with millions of groups.
Multi-version storage makes backups easy and lets the user choose between strong and snapshot reads.
Commit timestamps are based on TrueTime and have microsecond granularity, which allows an application to use them as a primary key.
Because of strong transactions across shards, the application no longer needs to worry about eventual consistency.
It supports queues. Thanks to transactions that span tables and queues, a single write request can guarantee that data is written both to a table and to a queue for asynchronous processing. With other databases this was a drawback, because you had to run the database and a separate queue system side by side.
Hosted on GCP, with practically unlimited horizontal scale.
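
The multi-version point in the list above is easier to see with a small model. The sketch below keeps every committed version of a key under its commit timestamp, so a strong read returns the latest value while a snapshot read returns the value as of an older timestamp. It is a single-process toy with a simple counter standing in for commit timestamps, not Spanner's actual storage engine; all names are illustrative.

```python
import bisect
import itertools


class MultiVersionStore:
    def __init__(self):
        self._versions = {}  # key -> list of (commit_ts, value), ascending by ts
        self._clock = itertools.count(1)  # stand-in for real commit timestamps

    def commit(self, writes):
        """Apply all writes in the dict under one shared commit timestamp."""
        ts = next(self._clock)
        for key, value in writes.items():
            self._versions.setdefault(key, []).append((ts, value))
        return ts

    def strong_read(self, key):
        """Return the most recently committed value, or None."""
        versions = self._versions.get(key, [])
        return versions[-1][1] if versions else None

    def snapshot_read(self, key, read_ts):
        """Return the value as it was at read_ts, or None if not yet written."""
        versions = self._versions.get(key, [])
        timestamps = [ts for ts, _ in versions]
        i = bisect.bisect_right(timestamps, read_ts)
        return versions[i - 1][1] if i else None


store = MultiVersionStore()
t1 = store.commit({"balance:alice": 100, "balance:bob": 0})
t2 = store.commit({"balance:alice": 70, "balance:bob": 30})

latest = store.strong_read("balance:alice")        # 70
as_of_t1 = store.snapshot_read("balance:alice", t1)  # 100
```

Because old versions stay readable by timestamp, a backup can be taken as a consistent snapshot at one timestamp while new writes keep arriving.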

High-level Spanner architecture

Data is defined as tables.

Foreign keys are supported.

Supports the directory table concept: related data is interleaved so that it stays on the same machine, which leads to faster queries.

Rows are divided into splits to stay within a storage limit of about 4 GB per split (the default value).

Sets of splits are grouped so that they replicate together.

Data is replicated in units called tablets.

Uses TrueTime to solve the problem of globally consistent timestamps.
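
The TrueTime idea from the paper can be sketched in a few lines: the clock returns an interval instead of a single instant, and a commit waits until its chosen timestamp is guaranteed to be in the past. The uncertainty bound and function names below are assumptions for illustration; the real TrueTime relies on GPS and atomic clocks and is not exposed as a Python API.

```python
import time
from dataclasses import dataclass

EPSILON_SECONDS = 0.004  # assumed clock uncertainty bound, illustration only


@dataclass(frozen=True)
class TTInterval:
    earliest: float
    latest: float


def tt_now(clock=time.time):
    """Return an interval assumed to contain the true absolute time."""
    now = clock()
    return TTInterval(now - EPSILON_SECONDS, now + EPSILON_SECONDS)


def tt_after(t, clock=time.time):
    """True once timestamp t has definitely passed."""
    return tt_now(clock).earliest > t


def commit_with_wait(clock=time.time, sleep=time.sleep):
    """Pick a commit timestamp, then wait out the clock uncertainty."""
    commit_ts = tt_now(clock).latest
    while not tt_after(commit_ts, clock):
        sleep(EPSILON_SECONDS / 4)
    return commit_ts
```

With this wait in place, a transaction that starts after an earlier one has returned always receives a larger timestamp, which is what lets timestamps order transactions globally.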

For more details, please refer to the paper: https://www.usenix.org/conference/osdi12/technical-sessions/presentation/corbett

Conclusion

Spanner is a distributed transactional database available on Google Cloud. It is a battle-tested product, as almost all products within Google use Spanner. If you are looking for a database for an application with billions of users, Spanner would be the right choice.