Chasing scaling problems

Tuning for Scalability

Base configuration

YAGNI Basic Law of Performance and Scale: Never do extra work for scaling until measurement reveals there's a problem and where the problem is.
  • We start with the simplest possible setup. If you have standalone servers (which we don’t), you would run the database and web server on the same box.
  • In our case we are deploying to a cloud platform such as Heroku, where you aren’t getting a full box but a virtual slice. There our base configuration is a single “web worker” to run Sinatra and a single database server to run our database, which would be Postgres (a sketch of that base setup follows this list).
  • With that base, measure performance. If you have a real load, you can measure it directly. In our case we have to create an artificial load by sending traffic to the server ourselves (see the load script after this list). We want to see how many simultaneous sessions we can support.
  • As you do scalability testing, keep an eye out for errors that are happening. A server will appear to respond super fast to a request if all it is doing is returning an error code!
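
A minimal sketch of that base setup, assuming the Sequel gem is used to talk to Postgres and that the platform supplies the connection string in DATABASE_URL (the local fallback URL and the users table are made up for the example):

    # app.rb -- the whole base configuration: one Sinatra web worker, one Postgres database
    require 'sinatra'
    require 'sequel'

    # Heroku-style Postgres connection string; the local fallback name is a placeholder
    DB = Sequel.connect(ENV.fetch('DATABASE_URL', 'postgres://localhost/myapp_dev'))

    get '/' do
      # one simple query, so a request exercises both the web worker and the database
      "users so far: #{DB[:users].count}"
    end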
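
One way to generate that artificial load with nothing but Ruby is a small script that runs many sessions in parallel and reports errors as well as timings (the target URL and the session counts below are placeholders):

    # load_test.rb -- run SESSIONS concurrent request loops; count errors as well as timing responses
    require 'net/http'
    require 'uri'

    TARGET   = URI(ENV.fetch('TARGET_URL', 'http://localhost:4567/'))  # placeholder URL
    SESSIONS = 50   # simultaneous sessions to simulate
    REQUESTS = 20   # requests per session

    errors, times, lock = 0, [], Mutex.new

    threads = SESSIONS.times.map do
      Thread.new do
        REQUESTS.times do
          started  = Process.clock_gettime(Process::CLOCK_MONOTONIC)
          response = Net::HTTP.get_response(TARGET)
          elapsed  = Process.clock_gettime(Process::CLOCK_MONOTONIC) - started
          lock.synchronize do
            times << elapsed
            errors += 1 unless response.is_a?(Net::HTTPSuccess)  # a fast error response is still an error
          end
        end
      end
    end
    threads.each(&:join)

    puts "requests: #{times.size}, errors: #{errors}"
    puts "average: #{(times.sum / times.size).round(3)}s, worst: #{times.max.round(3)}s"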

Stage One Scalability Tuning

  • Analyze and think about whether the servers are powerful enough. Is it your old laptop running a database server, or is it a fairly new computer with no other load? If you are using a cloud service, like BlueMix, Heroku or Digital Ocean, what kind of virtual capacity or limits does it have? You might need to simply up the capacity.
  • Examine your database access and queries. Can you determine that pages that don’t have a database call are really fast and the ones that do are slower? Can you see which pages slow down the most? (A request-timing sketch follows this list.)
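
One crude way to see which pages slow down, assuming a classic-style Sinatra app, is a pair of before/after filters that time every request (logging to stdout here is just for the sketch):

    # timing filters for a classic-style Sinatra app: log how long every request takes
    require 'sinatra'

    before do
      @started_at = Process.clock_gettime(Process::CLOCK_MONOTONIC)
    end

    after do
      elapsed = Process.clock_gettime(Process::CLOCK_MONOTONIC) - @started_at
      # compare routes that never touch the database against routes that query it
      puts format('%s %s -> %d in %.3fs', request.request_method, request.path, response.status, elapsed)
    end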

If queries are the problem

  • Make sure you clearly understand the separation between the web and the database server.
  • Consider whether you should put the database on its own server
  • Consider whether any tables need indexes (see the sketch after this list)
  • Consider whether you are going back to the database more than a few times for each page displayed.
    • Consider whether you are hitting the database once for each record displayed (the so-called N+1 problem).
    • If you are, look at making your queries ‘greedy’ (eager), meaning they bring back more rows in a single call to the server, as shown in the sketch after this list
  • Consider whether you are issuing the exact same query over and over again.
  • Consider whether an intermediate query result that is expensive is being requested again. If so, caching those results is a strategy.
  • Look at the metrics on your database servers; that is simple to do
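
A sketch of the index and the ‘greedy’ (eager) query ideas, using Sequel models over a made-up posts/comments schema; the table and column names are hypothetical:

    require 'sequel'
    DB = Sequel.connect(ENV.fetch('DATABASE_URL', 'postgres://localhost/myapp_dev'))

    # one-time schema change (normally done in a migration): index the foreign key the slow query uses
    DB.add_index :comments, :post_id

    class Post < Sequel::Model
      one_to_many :comments
    end
    class Comment < Sequel::Model
    end

    # N+1: one query for the posts, then one more query for every single post
    Post.all.each { |post| puts "#{post.title}: #{post.comments.size} comments" }

    # 'greedy' (eager) version: two queries total, no matter how many posts there are
    Post.eager(:comments).all.each { |post| puts "#{post.title}: #{post.comments.size} comments" }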

If the app/web server seems to be the problem

  • Consider what app server you are using
  • Consider whether you are fully utilizing or over-utilizing resources
    • Make sure that the resources (memory, CPU) are being used but not pinned at maximum.
    • You don’t want to hit your caps on resources; otherwise the app will start thrashing.
  • Consider adding more “web workers” or concurrent threads (a config sketch follows this list).
    • You have to be careful that your architecture allows that. Is your design ready to run concurrently?
    • Is there any way that one request can corrupt another one running at the same time?
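
If the app server happens to be Puma (an assumption, not something the setup above requires), workers and threads are a small configuration change, but only a safe one if the request-handling code is thread-safe; the environment variable names and numbers below are just examples:

    # config/puma.rb -- example settings if the app server is Puma; the numbers are starting points only
    workers Integer(ENV.fetch('WEB_CONCURRENCY', 2))   # extra processes: more CPU used, more memory used
    threads 1, Integer(ENV.fetch('MAX_THREADS', 5))    # threads within a process: cheaper, but code must be thread-safe
    preload_app!
    port ENV.fetch('PORT', 4567)

    # the concurrency question from the list above: shared mutable state (globals, class
    # variables, one shared connection object) is how one request can corrupt another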

If you still have scaling issues

  • Consider adding another discrete server
    • Once you see that one box no longer cuts it, add another server; you will need a load balancer in front of them.
    • Adding another server increases your redundancy. Depending on how valuable this app is and how badly it needs to stay up, two app servers are a good idea, as well as two database servers: one primary and one follower on standby. AWS RDS makes that last part easy.
  • Consider caching services
    • Are there intermediate results that can be cached to avoid hitting the database at all? (See the caching sketch after this list.)
  • Consider partitioning the databases
    • One primary for update transactions and one or more read replicas for read transactions (see the replica sketch after this list)
    • Partitioning vertically, by moving certain columns of tables to separate database servers.
    • Partitioning horizontally (sharding), by moving certain sets of rows to separate database servers. (For example, profile records for half the users on one server and the other half on another server; see the sharding sketch after this list.)
  • Consider breaking into services
    • Are there separable major bits of functionality that you can carve off into fully independent services, with their own app and database servers?
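
A sketch of caching an expensive intermediate result, assuming the redis gem and a REDIS_URL pointing at a cache service; the posts table and its likes column are made up for the example:

    # cache an expensive intermediate result for five minutes instead of recomputing it on every request
    require 'redis'
    require 'sequel'

    DB    = Sequel.connect(ENV.fetch('DATABASE_URL', 'postgres://localhost/myapp_dev'))
    CACHE = Redis.new(url: ENV.fetch('REDIS_URL', 'redis://localhost:6379'))

    def popular_post_count
      cached = CACHE.get('popular_post_count')
      return cached.to_i if cached

      count = DB[:posts].where { likes > 100 }.count   # the expensive query runs only on a cache miss
      CACHE.setex('popular_post_count', 300, count)    # expire after 300 seconds
      count
    end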
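
A minimal sketch of the primary/replica split described above, assuming two Postgres connection strings; the REPLICA_DATABASE_URL name and the host names are placeholders:

    # send update transactions to the primary and read transactions to the follower
    require 'sequel'

    PRIMARY = Sequel.connect(ENV.fetch('DATABASE_URL', 'postgres://primary-host/myapp'))
    REPLICA = Sequel.connect(ENV.fetch('REPLICA_DATABASE_URL', 'postgres://replica-host/myapp'))

    PRIMARY[:posts].insert(title: 'hello')                 # write goes to the primary
    recent = REPLICA[:posts].reverse(:id).limit(10).all    # read goes to the replica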
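
And a sketch of the horizontal split: pick the database server from the user id so half the profile records live on each server (the shard URLs are placeholders):

    # horizontal split (sharding): even user ids on one database server, odd ids on the other
    require 'sequel'

    SHARDS = [
      Sequel.connect(ENV.fetch('SHARD_0_DATABASE_URL', 'postgres://shard0-host/profiles')),
      Sequel.connect(ENV.fetch('SHARD_1_DATABASE_URL', 'postgres://shard1-host/profiles'))
    ]

    def profile_for(user_id)
      SHARDS[user_id % SHARDS.length][:profiles].first(user_id: user_id)
    end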