Podiobooks Gondor-Hosted Performance Analysis with NewRelic

Podiobooks is now part of Scribl.com! This post was written in 2012 when we had first converted podiobooks over to Django.

So Podiobooks.com is finally stabilizing after our initial push to get the critical features up and working again.

While we are still formulating the best plan to add the features that require some sort of user authentication, the ‘anonymous’ features have stabilized. One of my major concerns from our emergency launch was performance. While we’d been cooking up the Django version of the Podiobooks codebase for three years, performance tuning was hardly our biggest concern.

So when we set up our Django hosting at Gondor, I opted for a pretty big setup – two dedicated instances with 1GB of RAM each, with Django/gUnicorn app servers running on one, and the Redis cache/Postgres database instances running on the other.  While I think that Gondor’s prices for such instances are very good, Podiobooks is a site that primarily subsists on donations, so the lower we can get costs, the more money we can give to the authors and the folks who keep the site running.

To try and get a feel for how the site is performing, I installed the NewRelic application performance monitoring suite on the Podiobooks production instance. With NewRelic set up as a filter on top of the Podiobooks wsgi.py, it has amazing powers to analyze pretty much every aspect of your application’s performance, from the time it takes the browser to load the page, process the DOM and load assets, to the time it takes database queries to run. For queries it sees as running slowly, it automatically runs an Explain Plan on them, so you can quickly determine how to optimize them.
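As a rough sketch of that wiring, the NewRelic Python agent wraps the WSGI application in wsgi.py so every request passes through it (the config file path and settings module name here are assumptions, not the actual Podiobooks values):

```python
# wsgi.py -- sketch of layering the NewRelic agent onto a Django WSGI app.
# The newrelic.ini path and DJANGO_SETTINGS_MODULE value are illustrative.
import newrelic.agent
newrelic.agent.initialize('/path/to/newrelic.ini')

import os
os.environ.setdefault('DJANGO_SETTINGS_MODULE', 'podiobooks.settings')

from django.core.wsgi import get_wsgi_application

# Wrapping the app lets the agent time and trace every request it handles.
application = newrelic.agent.WSGIApplicationWrapper(get_wsgi_application())
```

Because the agent sits at the WSGI layer rather than inside any particular view, it sees the whole request lifecycle without code changes elsewhere in the app.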

Here’s the chart that I find the most interesting. Along the Y axis is the response time of the application – how long it took to process the request and return data to the browser. This is the purest measure of your app’s performance, since it only includes your code, not the impacts of the network, their browser, loading images, etc.  We’ll look at that impact in a minute. For now, take a look at the X axis. This shows the number of requests handled per minute.

(Charts have expired, sorry!)

So, why is this important? In short, it shows clearly that the more requests per minute Podiobooks gets, the better the response time is. So we’re not getting swamped with requests and slowing down as more people hit the site.  This is super good news.

You might wonder how it’s possible that performance gets better with more simultaneous hits, and the answer is caching.  The Redis cache is set to last 5 minutes for most pages right now, so if a page gets a lot of hits within a 5 minute period, few of them have to wait for the page to get cooked up by the database and app server; they just get a cached version streamed out of Redis back to their browser.  As requests slow down, the chances that any given user gets a ‘stale’ page that has to be regenerated, rather than just served out of the cache, increase.

You can also look at the dot color to see that right around 8PM mountain time is when we get the highest simultaneous traffic to the site.

One thing that we’ve noticed looking at the Google Analytics traffic to the site is that in terms of pure hits to the site, the iTunes Music Store is by far our biggest ‘user’. Since most of the titles on Podiobooks.com are also listed in the Music Store (as podcasts), the Music Store crawler is regularly checking on all the feeds to see if anything has changed. So making those RSS feed views as low-impact to folks browsing the site as possible was important.

Unfortunately, when I first looked at the ‘Slow SQL’ display in NewRelic, the queries underneath the RSS feeds were some of the most expensive. I had spent zero time optimizing those views and queries, and yet the vast majority of hits to the site were going through them! Luckily, a quick application of Django’s select_related() smoothed out that issue.  Long-term, we should probably be caching those views longer than 5 minutes.
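What select_related() does is join the related rows into the original query instead of issuing one extra query per row – the classic N+1 problem. A rough illustration with stdlib sqlite3, using hypothetical title/author tables rather than the real Podiobooks schema:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE author (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE title (id INTEGER PRIMARY KEY, name TEXT,
                        author_id INTEGER REFERENCES author(id));
    INSERT INTO author VALUES (1, 'Scott'), (2, 'Tee');
    INSERT INTO title VALUES (1, 'Book A', 1), (2, 'Book B', 2);
""")

def feed_items_naive():
    """Without select_related: 1 query for titles, plus 1 more per title."""
    items = []
    for title_id, name, author_id in conn.execute(
            "SELECT id, name, author_id FROM title"):
        author = conn.execute(
            "SELECT name FROM author WHERE id = ?", (author_id,)).fetchone()[0]
        items.append((name, author))
    return items

def feed_items_joined():
    """With select_related: a single JOIN pulls everything in one query."""
    return list(conn.execute(
        "SELECT t.name, a.name FROM title t "
        "JOIN author a ON a.id = t.author_id"))
```

Both functions return the same rows, but the joined version does constant query work per request regardless of how long the feed is – which matters a lot when a crawler is hammering every feed on the site.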

Here’s the database-only equivalent of the application report above:

(Charts have expired, sorry!)

Query time is pretty flat with load still…again likely due to caching, both at the Django level, and natively within Postgres.

And here’s one for just the CPU time being consumed:

(Charts have expired, sorry!)

Good news all around. While I’m of course hopeful that we can get the number of users on the site to increase to the point where we’d need to add more capacity…right now I think we have too much capacity, and can likely save some money by going down to a single dedicated instance.

Finally, if you are interested in the total time it takes to load pages, this graph covers that:
(Charts have expired, sorry!)

The tan color is how long it takes from when the network is done loading the HTML to when the browser declares the page to be loaded (DOM Ready), and the teal is the time from that point until the end of the ‘load’ time in the browser, so after all the images are loaded and such. Pages that have fancier CSS calculations, more images, and more JavaScript take longer in that teal zone.  Since that describes most of our pages, it’s the biggest contributor to load time. It’s also the least noticeable to most users, since the page is ‘doing something’ during that time.

Take note that even though about 1/3 of Podiobooks.com traffic is from mobile devices (often on 3G or slower networks), the network time is rarely a factor compared to the page rendering time. That’s on purpose – the pages have minimal HTML (thanks to @brantsteen), so they load over the network quickly, but then the complex CSS and JavaScript for the responsive layout kicks in, and it can take a second or two for everything to look perfect.

Let me know via Twitter if you have any questions about the site or performance tuning!
