Scalability Pattern: Database Partitioning
- Conflicting definitions
- When you divide a big database into several smaller ones
- Partitioning: Horizontal and Vertical
- Sharding; Horizontal Partitioning
- What’s the problem?
- When the database is the bottleneck
- Add a second database server
- What to do with the data?
- Some options
- Replication: Put a complete copy of the data on the second db server
- Pay attention to read vs. write
- What to do about data consistency?
- Parition
Example
- User Database
- Happens to often be a monster
- Lots of records
- Each record with lots of information
- Accessed a lot
- Schema
- User: (id, name, email, biography, hobbies, college, last_login, encrypted_pw, profile_photo_jpg, …)
Vertical partition
- Often associated with an SOA
- Divide the User table into three different database servers:
User: (id, name, biography, email)
Autentication: (id, last_login, encrypted_pw)
Photos: (id, profile_photo_jpg)
- How it changes your application
- Pretty basic rearchitecture into separate services
Sharding (Horizontal partition)
- “buckets” of users (== shards)
- How? Create multiple database servers with
- the same schema
- different subset or clump of records
- Need a way to direct requests to the right “shard”
- inspect something about the record
- determine what shard to look in
- Three common flavors to decide what bucket gets a record
- Range Based: Range of some scalar value (record id, first letter of name, etc.)
- List Based: Take some other property (e.g. zipcode, department)
- Hash Based: Compute a hash on some value
- How it changes your application
- Whenever you either read, write, or search
- Require to include enough information to pick the right shard
Pros and Cons
- Joins become a problem
- What was once one db is now spread over more than one db
- Can lead to denormalization
- Data Integrity
- Foreign keys might now point to another database
- Databases can get out of sync
NB Both kinds of sharding are advanced techniques and you should only use them when you have quantitative reasons to believe they will improve a measured performance issue.
Teams Work out a plan for sharding your databases. What would you shard, why and how?