Oct 14

Podiobooks Gondor-Hosted Performance Analysis with NewRelic

Podiobooks is now part of Scribl.com! This post was written in 2012 when we had first converted podiobooks over to Django.

So Podiobooks.com is finally stabilizing after our initial push to get the critical features up and working again.

While we are still formulating the best plan to add the features that require some sort of user authentication, the ‘anonymous’ features have stabilized. One of my major concerns from our emergency launch was performance. While we’d been cooking up the Django version of the Podiobooks codebase for three years, performance tuning was hardly our biggest concern.

So when we set up our Django hosting at Gondor, I opted for a pretty big setup – two dedicated instances with 1GB of RAM each, with Django/gUnicorn app servers running on one, and the Redis cache/Postgres database instances running on the other. While I think that Gondor’s prices for such instances are very good, Podiobooks is a site that primarily subsists on donations, so the lower we can get costs, the more money we can give to the authors and the folks who keep the site running.

To get a feel for how the site is performing, I installed the NewRelic application performance monitoring suite on the Podiobooks production instance. Set up as a filter on top of the Podiobooks wsgi.py, NewRelic has amazing powers to analyze pretty much every aspect of your application’s performance, from the time it takes the browser to load the page, process the DOM, and load assets, to the time it takes database queries to run. For queries it sees running slowly, it automatically runs an Explain Plan on them, so you can quickly determine how to optimize them.
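For reference, the ‘filter on top of wsgi.py’ amounts to only a few lines. A sketch, assuming the agent’s config file is named newrelic.ini (the project module name here is hypothetical):

```python
# Initialize the NewRelic agent before the Django app is imported,
# using the config file generated by `newrelic-admin generate-config`
import newrelic.agent
newrelic.agent.initialize('newrelic.ini')

# Import the normal Django WSGI application (hypothetical module name)
from podiobooks.wsgi import application

# Wrap the WSGI app so every request is timed and traced by the agent
application = newrelic.agent.WSGIApplicationWrapper(application)
```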

Here’s the chart that I find the most interesting. Along the Y axis is the response time of the application – how long it took to process the request and return data to the browser. This is the purest measure of your app’s performance, since it only includes your code, not the impacts of the network, their browser, loading images, etc.  We’ll look at that impact in a minute. For now, take a look at the X axis. This shows the number of requests handled per minute.

(Charts have expired, sorry!)

So, why is this important? In short – it shows clearly that the more requests per minute that Podiobooks is getting, the better the response time is. So, we’re not getting swamped with requests and getting slower the more people that hit the site.  This is super good news.

You might wonder how it’s possible that the performance is better with more simultaneous hits, and the answer is caching.  The Redis cache is set to last 5 minutes for most pages right now, so if you get a lot of hits within a 5 minute period, few of them have to wait for the page to get cooked up by the database and app server; they just get a cached version streamed out of Redis back to their browser.  As requests slow down, the chances that any given user is going to get a ‘stale’ page that has to get refreshed, rather than just served out of the cache, increase.
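The effect of that 5-minute TTL can be sketched with a toy cache in plain Python standing in for Redis (names here are illustrative, not the actual Podiobooks code):

```python
import time

CACHE = {}
TTL_SECONDS = 300  # 5 minutes, matching the Redis page cache


def render_page(path):
    """Stand-in for the expensive database + template work."""
    return "<html>rendered " + path + "</html>"


def get_page(path, now=None):
    """Serve from cache if the entry is younger than the TTL, else re-render."""
    now = time.time() if now is None else now
    entry = CACHE.get(path)
    if entry is not None and now - entry[0] < TTL_SECONDS:
        return entry[1], True    # cache hit: no database or app server work
    page = render_page(path)
    CACHE[path] = (now, page)
    return page, False           # cache miss: this request pays the full cost
```

During a traffic burst, only the first request in each 5-minute window pays the rendering cost; everyone else gets the cached copy, which is why average response time drops as requests per minute climb.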

You can also look at the dot color to see that right around 8PM mountain time is when we get the highest simultaneous traffic to the site.

One thing that we’ve noticed looking at the Google Analytics traffic to the site is that in terms of pure hits to the site, the iTunes Music Store is by far our biggest ‘user’. Since most of the titles on Podiobooks.com are also listed in the Music Store (as podcasts), the Music Store crawler is regularly checking on all the feeds to see if anything has changed. So making those RSS feed views as low-impact as possible for folks browsing the site was important.

Unfortunately, when I first looked at the ‘Slow SQL’ display in NewRelic, the queries underneath the RSS feeds were some of the most expensive. I had spent zero time optimizing those views and queries, and yet the vast majority of hits to the site were going through them! Luckily, a quick application of Django’s select_related() smoothed out that issue.  Long-term, we should probably be caching those views for longer than 5 minutes.
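The change amounts to telling Django’s ORM to join in the related tables up front instead of issuing one extra query per row. A sketch of the pattern (the model and field names are hypothetical, not the real Podiobooks schema):

```python
# Before: building the feed runs one query for the episode list, then
# one more query per episode to fetch its related title row -- the
# classic N+1 pattern that showed up in NewRelic's Slow SQL view.
episodes = Episode.objects.filter(title__slug=slug)

# After: select_related() pulls the related rows in with a SQL join,
# so the whole feed is built from a single query.
episodes = (
    Episode.objects
    .filter(title__slug=slug)
    .select_related('title')
    .order_by('-date_created')
)
```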

Here’s the database-only equivalent of the application report above:

(Charts have expired, sorry!)

Query time is pretty flat with load still…again likely due to caching, both at the Django level, and natively within Postgres.

And here’s one for just the CPU time being consumed:

(Charts have expired, sorry!)

Good news all around. While I’m of course hopeful that we can get the number of users on the site to increase to the point where we’d need to add more capacity…right now I think we have too much capacity, and can likely save some money by going down to a single dedicated instance.

Finally, if you are interested in the total time it takes to load pages, this graph covers that:
(Charts have expired, sorry!)

The tan color is how long it takes from when the network is done loading the HTML to when the browser declares the page to be loaded (DOM Ready), and the teal is the time from that point until the end of the ‘load’ time in the browser, so after all the images are loaded and such. Pages that have fancier CSS calculations, more images, and more JavaScript take longer in that teal zone.  Since that describes most of our pages, it’s the biggest contributor to load time. It’s also the least noticeable to most users, since the page is ‘doing something’ during that time.

Take note that even though about 1/3 of Podiobooks.com traffic is from mobile devices (often on 3G or slower networks), the network time is rarely a factor compared to the page rendering time. That’s on purpose – the pages have minimal HTML (thanks to @brantsteen), so they load over the network quickly, but then the complex CSS and JavaScript for the responsive layout kicks in, and it can take a second or two for everything to look perfect.

Let me know via Twitter if you have any questions about the site or performance tuning!



Jul 21

Django Many to Many Model Saving with Intermediary (Through) Model

I spent more time than I wanted to pulling together the solution for saving a child and its relation to its parent at the same time, while setting a value on the many-to-many model in between.

It feels like a pretty typical pattern, but the documentation and the info I found all over never quite got me there.

There seems to be a feeling that using inlines for this is the right idea…and I mostly agree. However, the inline stuff was really aimed at having lots of sub-forms, and submitting two forms separately. In this case I need to do them all together. So, I pulled the two crucial fields off of the many-to-many model, and added them ‘manually’ to a ModelForm for the child model.

A little magic in the ‘save()’ method for the form, and voila!

The gists of the related bits are below:
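(The original gists have since expired, so here is a minimal sketch of the pattern described above, with hypothetical model and field names: the through-model fields live directly on the child’s ModelForm, and save() creates the intermediary row by hand, since a plain .add() couldn’t set extra fields on a many-to-many with a through model.)

```python
from django import forms


class ChildForm(forms.ModelForm):
    # The two crucial fields pulled off the many-to-many (through)
    # model and added 'manually' to the child's form
    relation_type = forms.CharField(max_length=20)
    sort_order = forms.IntegerField()

    class Meta:
        model = Child          # hypothetical child model
        fields = ('name',)

    def __init__(self, parent, *args, **kwargs):
        self.parent = parent   # the already-existing parent instance
        super(ChildForm, self).__init__(*args, **kwargs)

    def save(self, commit=True):
        child = super(ChildForm, self).save(commit=commit)
        if commit:
            # Create the intermediary (through) row directly, setting
            # its extra fields from the form's cleaned data
            ParentChildRelation.objects.create(
                parent=self.parent,
                child=child,
                relation_type=self.cleaned_data['relation_type'],
                sort_order=self.cleaned_data['sort_order'],
            )
        return child
```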

Apr 15

Software Generalists vs. Specialists and the Instant Reference

I’ve been working on a number of software projects, the most notable of which is podiobooks 2.0.

Podiobooks.com is a very popular site for downloading free audiobooks.

Many of the books offered first for free on podiobooks.com have gone on to become NYT bestsellers.

We’re building this project using the Python web development framework “Django”.

But like any modern web project, the “main” development is only part of the story.

There’s also Cascading Style Sheets version 3 (CSS3), Hypertext Markup Language version 5 (HTML5), JavaScript, the jQuery JavaScript framework, SQL, the Apache and NGINX webservers, unix scripts to start and stop things, and the Hudson continuous integration server to help us test and release code with some degree of quality.  Not to mention doing most of that across multiple browsers targeting multiple end-user devices (Desktops, iPads, iPhones, etc.)

Building a modern web application is a lot like building a house – it’s not just a bunch of wood nailed together.  There’s drywall, paint, windows, a foundation, a roof, plumbing, electrical, and a whole lot more.

In construction, it’s often broken out into speciality trades, who work on a given house only long enough to get the electrical in, and then move on to the next project.

Web development doesn’t break out that cleanly, because applications need constant tweaking, and because the standards for building them are evolving at an incredible rate (the 2×4 for home construction hasn’t materially changed in 30 years – I haven’t built a web app with the same components for more than a year in a row).

So it means that if you are an experienced web developer, your brain has to be split across a whole lot of different “trades”, each with their own nuances.  And, just like a “general contractor” isn’t going to be as good an electrician as a guy who is a Master Electrician, your “general web developer” isn’t going to be as good at any one of the parts of web development.

So when someone asks me to write out a piece of code for them, it’s unlikely I’ll be able to do so without the handy reference of Google to look up the fine details.

On the other hand, I have more than enough experience to know what I need to look up.

It’s interesting to me that the more you do work in the real world, the broader your world becomes, and the harder it is to do just one thing.

It’s also interesting that the power of the instant reference makes it possible to learn new things and actually use them, because you don’t have to rely on being able to hold 100 different bits and pieces in your mind, or to thumb through a set of tomes for each new tool.

That’s not to say there’s no room for specialists – quite the opposite.  At some point, there’s a whole different personality needed to do certain tasks.  Database work that has no UI tends to appeal to different people than art-intensive layout and design.

So on big teams, I definitely rely on the guidance and work of specialists, but someone (usually me) has to be able to hold the big picture in their head, and make sense of it all…otherwise you end up with a bunch of excellent pieces that don’t work together at all.

I think we are starting to get to the point where companies hire generalists, and then contract out specialties, which is much the same way that the house construction industry works.

The difference in my mind though is not that this has happened through the emergence of lasting standards (although that has helped a bit), but through the power of the instant reference…

Oct 28

Why Python, Why Now?

So 9 years ago this month, I completed a rewrite of an app that a coworker had written in Python.

I rewrote it in a combination of simple Unix shell scripts and more complex .sql files.

Why? Because it was the only Python program around, and there was NO GOOGLE. At least not in the way there is today, where hundreds of examples, tutorials, and guides are a keyword search away.

Also, because the parts that were written in Python were really simple (fire off a cron, send an email), but lots of code had been devoted to them as he explored the language.

But the parts that *needed* to be concentrated on were the database queries: it was an ETL program, one that did a massive 30GB data transformation every month. When I started on the project, the process from start to finish took three people an entire month. When I left, it was running automatically in 10 hours.

So, it wasn’t that the Python itself was bad; rather, it was concentrating on the wrong problem.

Now I find myself contemplating taking on a huge Python-based project. And I ask myself…why Python, why now?

Because for this project, the hard part IS something well-served by lots of tight snippets of app code, and now there IS Google.

But mainly, it’s because of the Python community – much like the much-touted iPhone slogan, in Python “There’s an egg for that”. An egg is a little package of reusable code that someone else made ready for you to use…for free.

The library of eggs is now so vast, and so HIGH QUALITY (overall), that it’s a massively compelling toolbox.

And, even if an egg is poorly documented, Python is so easy to read that it’s often a quick matter to figure out how to use it just from reading the source…which is NOT true of something like Java.

The world has changed a lot since 2001 – and IMO, in 2010, the winners in the tool wars will be the ones who get the most information on their tools available on the internet – for free…

Aug 27

Creating Custom Google Maps Overlays with GWT Widgets

So this is one of those that I went around and around on before finding out the solution, which of course turned out to be pretty darn simple.

What I was trying to do was add a box with some text under a marker on a Google Map.

I’m using Google Web Toolkit 1.5.1 and the official gwt-google-apis maps-api.jar to bind the maps stuff into GWT.

It turns out that the secret is coming to an understanding of how Panes work in Google Maps.

The pane is the whole map.  Even the stuff that you can’t see in the little container box on your page.

That pane has Pixel coordinates.

Those coordinates do not change, even when you scroll the map.

That’s right.

The pixel coordinates of the map pane do not change when you scroll the map around inside the box on your page.

So if you add a widget to the pane at a certain pixel location, that widget doesn’t have to change its location when you scroll the map.

It also means that the widget can slide under the edge of your viewing port just like Markers do.

So all you need to do is decide where you want your GWT Widget to be in terms of Lat/Long and call the handy myMapWidget.convertLatLngToDivPixel(LatLong) method, and you’ll get back a Point on the map that is an absolute location on the map pane to place your Widget on.

As soon as you realize that the map pane coordinates don’t change when you scroll, and that you need to add your widgets to the pane rather than some other panel, life is super-easy.

Here is the class I wrote to implement the label.  I use it to place under Markers to label them on the map.

/* Custom Map Overlay Code - Copyright 2008 Cyface Design, Released Under the Apache 2.0 License */

import com.google.gwt.maps.client.MapPane;
import com.google.gwt.maps.client.MapPaneType;
import com.google.gwt.maps.client.MapWidget;
import com.google.gwt.maps.client.geom.LatLng;
import com.google.gwt.maps.client.geom.Point;
import com.google.gwt.maps.client.overlay.Overlay;
import com.google.gwt.user.client.ui.HTML;
import com.google.gwt.user.client.ui.SimplePanel;

public class MapMarkerTextOverlay extends Overlay {

    private final LatLng latLng;
    private final SimplePanel textPanel;
    private MapWidget parentMap;
    private MapPane pane;
    private String text;
    private Point offset;

    /**
     * Main constructor.
     *
     * @param latLng the map location the text is anchored to
     * @param text   the label text to display
     * @param offset the pixel offset from the anchor point
     */
    public MapMarkerTextOverlay(LatLng latLng, String text, Point offset) {
        /* Save our inputs to the object */
        this.latLng = latLng;
        this.text = text;
        this.offset = offset;

        /* Create a widget for the text, and a panel to hold it */
        HTML textWidget = new HTML(text);
        textPanel = new SimplePanel();
        textPanel.setStyleName("textOverlayPanel");
        textPanel.setWidget(textWidget);

        /* The panel gets added to the map and placed in the initialize method */
    }

    @Override
    protected final void initialize(MapWidget map) {
        /* Save a handle to the parent map widget; if we need to do redraws we'll need this */
        parentMap = map;

        /* Add our textPanel to the marker pane of the map */
        pane = map.getPane(MapPaneType.MARKER_PANE);
        pane.add(textPanel);

        /* Place the textPanel on the pane in the correct spot */
        Point locationPoint = parentMap.convertLatLngToDivPixel(getLatLng());
        Point offsetPoint = new Point(locationPoint.getX() - getOffset().getX(),
                locationPoint.getY() - getOffset().getY());
        pane.setWidgetPosition(textPanel, offsetPoint.getX(), offsetPoint.getY());
    }

    @Override
    protected final Overlay copy() {
        return new MapMarkerTextOverlay(getLatLng(), getText(), getOffset());
    }

    @Override
    protected final void redraw(boolean force) {
        /* Shouldn't need to do anything here since we're on the Marker pane. */
    }

    @Override
    protected final void remove() {
        textPanel.removeFromParent();
    }

    public LatLng getLatLng() { return latLng; }

    public String getText() { return text; }

    public void setText(String text) { this.text = text; }

    public Point getOffset() { return offset; }

    public void setOffset(Point offset) { this.offset = offset; }
}