Are You Using Caching as a Bandaid?

When your web application responds slowly, the problem could be occurring at any of several levels. Until you instrument the application and begin the diagnostic journey, you really don’t know where the problem is.

It is often difficult to convince business stakeholders to allocate budget to uncover and fix an undefined problem rather than invest in customer-facing features and enhancements. So the front-line attack on the problem is often simply to apply caching. This is a particularly easy fix, because most modern application frameworks either provide integrated caching or make it easy to add through third-party plugins or modules.

So let’s assume that the first line of defense is going to be a caching bandaid. Fortunately, many caching frameworks provide multiple layers of caching. If you apply caching at the lowest level first and gradually step up through the application, you get a relatively easy way to learn at least which level the issue is occurring at. That information can be useful in convincing stakeholders of the need to dig deeper.

i. DB Objects
One of the most common causes of slow applications is slow database queries. Complex queries or a poor database architecture can keep each connection open for the duration of a slow query. If connections aren’t returned to the pool quickly enough to handle the current traffic load, the database connection pool can be exhausted. If caching query results relieves the slowdown, that doesn’t necessarily mean you need more hardware. But it does mean you need to look more closely at the database queries, the structure, and the indexing of the database.
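Here is a minimal sketch of caching at this level. It assumes an in-process cache keyed on the SQL text and its parameters. `QueryCache` is a hypothetical helper, and Python's built-in `sqlite3` stands in for your real database. In production this role is usually played by your framework's query cache or a shared store such as Redis or memcached. If the hit rate is high and response times drop, the database is a strong suspect.

```python
import sqlite3
import time
from typing import Callable


class QueryCache:
    """Hypothetical TTL cache for query results, keyed by (SQL, parameters)."""

    def __init__(self, ttl_seconds: float = 60.0,
                 clock: Callable[[], float] = time.monotonic):
        self.ttl = ttl_seconds
        self.clock = clock  # injectable so expiry can be tested
        self._store: dict = {}  # (sql, params) -> (stored_at, rows)
        self.hits = 0
        self.misses = 0

    def fetch(self, conn: sqlite3.Connection, sql: str, params: tuple = ()) -> list:
        key = (sql, params)
        now = self.clock()
        entry = self._store.get(key)
        if entry is not None and now - entry[0] < self.ttl:
            self.hits += 1
            return list(entry[1])  # copy so callers can't mutate the cached list
        self.misses += 1
        rows = conn.execute(sql, params).fetchall()  # only reaches the DB on a miss
        self._store[key] = (now, rows)
        return list(rows)


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE products (id INTEGER PRIMARY KEY, name TEXT)")
conn.executemany("INSERT INTO products (name) VALUES (?)", [("widget",), ("gadget",)])

cache = QueryCache(ttl_seconds=30)
rows = cache.fetch(conn, "SELECT name FROM products ORDER BY id")  # miss: queries the DB
rows = cache.fetch(conn, "SELECT name FROM products ORDER BY id")  # hit: served from memory
print(rows, cache.hits, cache.misses)  # [('widget',), ('gadget',)] 1 1
```

The TTL limits how stale cached data can get. Writes that must be visible immediately would also need to invalidate the affected keys, which this sketch does not do.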

ii. Code Objects
The next level is to cache programmatic objects. This can help if requests are held open while objects wait to be populated or structured. Structuring can sometimes be the issue, for example when dealing with large, complex XML objects. More often, though, the delay comes from fetching data from external service buses or web services, with the routine waiting on that data before it can populate the object. By caching these objects, you avoid having to instantiate, structure, and populate them on every request.
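For object-level caching, Python's standard `functools.lru_cache` is enough to show the idea. The slow part, which is waiting on an external service and then building the object, runs once per key, and later calls get the finished object back. `call_profile_service`, `get_customer_profile`, and `CustomerProfile` are hypothetical, and `time.sleep` simulates the service latency. Note that `lru_cache` never expires entries on its own, so a real deployment would usually want time-based expiry or explicit invalidation.

```python
import functools
import time
from dataclasses import dataclass


@dataclass(frozen=True)  # immutable, so a shared cached instance is safe to hand out
class CustomerProfile:
    customer_id: int
    name: str
    loyalty_tier: str


service_calls = 0  # counts trips to the (hypothetical) external service


def call_profile_service(customer_id: int) -> dict:
    """Hypothetical stand-in for a slow web service or service-bus request."""
    global service_calls
    service_calls += 1
    time.sleep(0.05)  # simulated network latency
    return {"id": customer_id, "name": f"Customer {customer_id}", "tier": "gold"}


@functools.lru_cache(maxsize=1024)
def get_customer_profile(customer_id: int) -> CustomerProfile:
    # The expensive path: wait for external data, then structure and populate the object.
    raw = call_profile_service(customer_id)
    return CustomerProfile(raw["id"], raw["name"], raw["tier"])


first = get_customer_profile(42)   # slow: calls the service and builds the object
second = get_customer_profile(42)  # fast: returns the already-built object
print(service_calls, first is second)  # 1 True
```

`get_customer_profile.cache_info()` reports hits and misses, which is a cheap way to confirm the cache is actually being used. `get_customer_profile.cache_clear()` empties it.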

iii. Page Caching
Moving another level up, you can cache the entire page requested by the user. In this case, a copy of the output (the rendered HTML) is kept either on the local file system or, for more aggressive caching, in RAM. Each inbound page request is intercepted, and the cache is checked for a stored copy of that page. If one is available, the user is served that completed copy. If not, the application builds the response as usual, saves a copy to the cache, and returns it to the user. This way, you skip all of the underlying processing for that page request.

If page caching resolves the latency where DB and object caching did not, this likely indicates inefficient code somewhere in the processing logic for that page. For example, I once worked on a project in which a junior developer had written a database query that returned over 100 results and then ran a further database query for each of those results. So rather than running a single, more complex query, the code ran 101 queries or more for a single page, a pattern commonly known as the “N+1 queries” problem. Clearly this was a problem.
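The pattern in that story is easy to reproduce. This hypothetical sketch uses Python's built-in `sqlite3` with an illustrative `authors`/`posts` schema and a helper that counts queries. The first function issues one query for the list plus one per row. The second gets the same rows with a single JOIN. A page cache hides the cost of the first version, but only the JOIN removes it.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE authors (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE posts (id INTEGER PRIMARY KEY, author_id INTEGER REFERENCES authors(id), title TEXT);
""")
conn.executemany("INSERT INTO authors (id, name) VALUES (?, ?)",
                 [(i, f"author {i}") for i in range(1, 11)])
conn.executemany("INSERT INTO posts (id, author_id, title) VALUES (?, ?, ?)",
                 [(i, (i % 10) + 1, f"post {i}") for i in range(1, 101)])

query_count = 0


def run(sql: str, params: tuple = ()) -> list:
    """Execute one query and count it, so the two approaches can be compared."""
    global query_count
    query_count += 1
    return conn.execute(sql, params).fetchall()


def titles_n_plus_one() -> list:
    # 1 query for the list, then 1 more query per row: N + 1 round trips.
    rows = []
    for _post_id, author_id, title in run("SELECT id, author_id, title FROM posts ORDER BY id"):
        (name,) = run("SELECT name FROM authors WHERE id = ?", (author_id,))[0]
        rows.append((title, name))
    return rows


def titles_joined() -> list:
    # The same data in a single query.
    return run("SELECT p.title, a.name FROM posts AS p "
               "JOIN authors AS a ON a.id = p.author_id ORDER BY p.id")


slow = titles_n_plus_one()
n_plus_one_queries, query_count = query_count, 0
fast = titles_joined()
print(n_plus_one_queries, query_count, slow == fast)  # 101 1 True
```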

iv. Content Delivery Network
If you’re working on a larger application and have access to a CDN to offload serving your assets, that’s a huge advantage. If your site is media-intensive, you could be serving a lot of heavy images and/or video. The perceived slowness of your application may have nothing to do with the application itself. Instead, it may come from network latency in returning these assets when your server is far away from the user. CDNs solve this problem by mirroring these heavier assets on a network of servers throughout your target geography. This can significantly speed up delivery of those assets. It also frees up HTTP server threads that might otherwise be held open too long, creating a bottleneck and queuing user responses.
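On the application side, using a CDN usually comes down to two things. First, asset URLs must point at the CDN hostname. Second, the responses must carry cache headers that let the CDN, and browsers, keep copies. Below is a minimal sketch that assumes a CDN at the hypothetical hostname `https://cdn.example.com/` and an illustrative list of asset file types. The year-long `max-age` with `immutable` is only appropriate if asset file names change whenever their contents do, for example fingerprinted file names.

```python
from pathlib import PurePosixPath
from urllib.parse import urljoin

CDN_BASE = "https://cdn.example.com/"  # hypothetical CDN hostname
CDN_ASSET_TYPES = {".jpg", ".jpeg", ".png", ".webp", ".gif", ".mp4", ".webm", ".css", ".js"}


def is_cdn_asset(path: str) -> bool:
    return PurePosixPath(path).suffix.lower() in CDN_ASSET_TYPES


def asset_url(path: str) -> str:
    """Rewrite static asset paths to the CDN; leave dynamic pages on the app server."""
    if is_cdn_asset(path):
        return urljoin(CDN_BASE, path.lstrip("/"))
    return path


def cache_headers(path: str) -> list:
    """Long-lived, shareable caching for assets; revalidate everything else."""
    if is_cdn_asset(path):
        return [("Cache-Control", "public, max-age=31536000, immutable")]
    return [("Cache-Control", "no-cache")]


print(asset_url("/images/hero.jpg"))  # https://cdn.example.com/images/hero.jpg
print(asset_url("/checkout"))         # /checkout
```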

If you’ve walked through each of these steps sequentially, you should now have a pretty good sense of whether you’re dealing with issues related to the database, inefficient database queries, latency in web services or other external system integrations, inefficient code in your business logic, or network latency related to delivery of your media assets.  Since you likely know the application fairly well, this should hopefully be enough to trigger a few thoughts and theories of what might be the problem and what to look at next.