App of the Day promo: what happened?

On Tuesday, MyTrails was featured as the App of the Day by AppTurbo, in a promotion that offered MyTrails Pro for free for a year.

Summary

The promotion was a great success, with over 120,000 new users trying out MyTrails, but it was marred by a 30-minute technical glitch that prevented the promo from activating automatically for thousands of users. Some of them reacted politely and patiently; others, not so much…

In the spirit of full disclosure, here is my analysis of what went wrong and how I attempted to rectify the situation. I hope this will make it clear that the promo was not a bait-and-switch attempt to trick users, and that AppTurbo bore no responsibility for the issue (I don’t know what AppTurbo’s business model is, but I didn’t pay anything for the promo). The mistake was all mine.

Preparation

Promos like this are not spur-of-the-moment things: I had about a month to prepare. AppTurbo had warned me to expect high traffic, so I had tuned and tested the authorization server used by MyTrails to make sure it could handle a sustained load of 120 requests per second (rps), with peaks of 1000 rps.

Technical details:
  • The server normally records a request log so I can replay transactions that fail; I disabled it. I also normally load all the logs into an Elasticsearch instance with Logstash (also disabled). I use Logentries for some lighter logging, and I left that enabled.
  • I switched the database connection pooling implementation from Tomcat JDBC to BoneCP because of its reputation for high reliability and low overhead. This turned out to be a big mistake (not BoneCP’s fault)…
  • I also switched the request that activates the promotion from HTTPS to HTTP to decrease the load on the server (which I only did because that request carries no personal information, just a randomly generated, MyTrails-specific UUID). Other requests continue to use HTTPS.
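The request log in the first point is what makes failed transactions replayable. A minimal Python sketch of the idea, assuming a JSON Lines file (one request per line) and a hypothetical `handler` callback standing in for the server's real request processing; the post does not describe the actual log format:

```python
import json
from pathlib import Path
from typing import Callable


def log_request(log_path: Path, request: dict) -> None:
    """Append one request to a JSON Lines log (one JSON object per line)."""
    with log_path.open("a", encoding="utf-8") as f:
        f.write(json.dumps(request, sort_keys=True) + "\n")


def replay_requests(log_path: Path, handler: Callable[[dict], None]) -> int:
    """Re-submit every logged request, in order, to `handler`.

    Returns the number of requests replayed.
    """
    count = 0
    with log_path.open(encoding="utf-8") as f:
        for line in f:
            line = line.strip()
            if not line:
                continue  # tolerate blank lines
            handler(json.loads(line))
            count += 1
    return count
```

Because every request costs an extra disk write, this is exactly the kind of logging one might switch off ahead of a traffic spike, at the price of losing the ability to replay.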

I had also changed MyTrails to reduce the number of requests it makes, to minimize the strain on the server (among other things, I disabled the version update check), and added an error message, with an automatic retry mechanism, for the case where the server was overwhelmed.
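An automatic retry is usually paired with exponential backoff, so that an overwhelmed server is not hit again by every client at the same moment. A minimal Python sketch of such a schedule; the `base` and `cap` values are hypothetical, and the post does not say which schedule MyTrails actually uses:

```python
import random
from typing import List, Optional


def backoff_delays(attempts: int, base: float = 2.0, cap: float = 300.0,
                   rng: Optional[random.Random] = None) -> List[float]:
    """Delay in seconds before each retry: exponential growth, capped, with full jitter."""
    rng = rng or random.Random()
    delays = []
    for attempt in range(attempts):
        ceiling = min(cap, base * (2 ** attempt))
        # "Full jitter": pick uniformly in [0, ceiling] so that clients which
        # failed together do not all retry in lockstep.
        delays.append(rng.uniform(0, ceiling))
    return delays
```

The jitter matters as much as the exponential growth: without it, thousands of clients rejected at the same second would come back at the same second.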

I also added a tongue-in-cheek message for users who didn’t activate the offer during the validity period, letting them know they could contact me for assistance.

On the day

Everything started great, with a short spike just after 8am, presumably when AppTurbo rolled over from their previous promo to MyTrails, then a ramp-up toward 11am.

An approximation of the number of requests per second throughout the day

The server load remained very low (it’s a nice server) throughout the day despite processing close to 100 requests per second for sustained periods. Interestingly, the server that hosts this blog (a smaller, shared server) registered a higher CPU load than the auth server.

Server loads for the same period

And then I got an email from a user who was getting an error trying to create an account (not a required step, but one I advised users to take, so they could use MyTrails on multiple devices).

A quick investigation uncovered the error below: my email provider was no longer accepting the new account confirmation emails the server was attempting to send.

The first connection refused from my email provider (Google)

Technical details:
Normally, the server handled email sending errors gracefully, because in most cases the email confirmation is not necessary, but two things happened:
  • The javax.mail package broke its contract by throwing an exception that is not declared by the method I was calling, so I hadn’t taken that exception into account, and the server sent back the wrong type of error message, which confused MyTrails.
  • For users connecting MyTrails to their Google account (the majority) rather than creating a FrogSparks account, a confirmation email is not necessary. In retrospect it should not have been sent, and not sending it would probably have avoided triggering the problem in the first place.
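Both lessons can be sketched together: skip the email when it is not needed, and treat any failure from the mail layer as non-fatal. This is a Python sketch, not the server’s Java code; the account-type strings and the `send` callable are hypothetical stand-ins for the real account model and mail API. In Java, the equivalent is a catch clause that also covers unchecked exceptions (`RuntimeException`), which methods are not required to declare.

```python
import logging
from typing import Callable

log = logging.getLogger("accounts")


def needs_confirmation_email(account_type: str) -> bool:
    """Only locally created accounts need confirming; per the post,
    accounts linked to Google do not."""
    return account_type == "frogsparks"


def send_confirmation(account_type: str, address: str,
                      send: Callable[[str], None]) -> bool:
    """Try to send a confirmation email without letting a mail failure
    abort account creation.

    Returns True if an email was sent, False if it was skipped or failed.
    """
    if not needs_confirmation_email(account_type):
        return False
    try:
        send(address)
    except Exception:  # deliberately broad: mail code may raise undeclared errors
        log.exception("confirmation email to %s failed; account kept", address)
        return False
    return True
```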

This error was not preventing the promo activation, but it was making it harder for users to follow the instructions and associate MyTrails with their Google accounts (or create a separate FrogSparks account, for users leery of using their Google account everywhere), so I rushed to fix the issue.

The timeline of the failure

It was a simple fix that took only about 7 minutes to implement (approximately 11:04 to 11:11). During that time I stopped the server, and MyTrails displayed the “try again later” error message to users trying to activate the promo.

Unfortunately, when I deployed the fix, the tools I use introduced a glitch that prevented the server from connecting to the database. The server then sent an unforeseen error back to MyTrails, which responded by displaying the “promo expired” message and stopping the automatic retry mechanism for the promo activation. My biggest mistake was not making MyTrails react more specifically to the error messages it received.
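That lesson can be made concrete: the client should stop retrying only on an error it positively recognizes, and treat everything else as transient. A minimal Python sketch, in which the status codes and the `promo_expired` error code are hypothetical, not MyTrails’ actual protocol:

```python
from enum import Enum, auto
from typing import Optional


class Action(Enum):
    ACTIVATED = auto()  # promo active: stop retrying, show success
    EXPIRED = auto()    # server explicitly said the offer is over: stop, show "expired"
    RETRY = auto()      # anything else: keep the automatic retry going


def classify(status: int, error_code: Optional[str]) -> Action:
    """Map an activation response to a client action.

    Only an explicit, recognized "promo_expired" code ends the retries;
    unknown errors (such as a 500 caused by a database outage) are treated
    as transient instead of being mistaken for expiry.
    """
    if status == 200:
        return Action.ACTIVATED
    if status == 410 and error_code == "promo_expired":
        return Action.EXPIRED
    return Action.RETRY
```

With this default, the database outage would have produced a “try again later” message and continued retries rather than a misleading “promo expired” alert.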

Technical details:
The issue was caused by a conflict between BoneCP, which depends on a recent version of the Google Guava library, and the Google OAuth package, which bundles an antiquated version of the same library (then called Google Collections). For some reason, my IDE reintroduced google-collections-1.0.jar into the build, where it overrode the more recent Guava and caused BoneCP to fail.
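A cheap safeguard against this kind of regression is a pre-deployment check that fails when both libraries end up among the packaged jars. A minimal Python sketch, assuming jar files named `<artifact>-<version>.jar` and a hand-maintained conflict list (only the Google Collections/Guava pair comes from this incident):

```python
import re
from typing import Dict, Iterable, List, Tuple

# Guava superseded Google Collections; both ship classes in the
# com.google.common.* packages, so with both present the class loader
# uses whichever copy it finds first.
CONFLICTS = {("google-collections", "guava")}

JAR_NAME = re.compile(r"^(?P<artifact>[A-Za-z][\w.-]*?)-(?P<version>\d[\w.-]*)\.jar$")


def artifacts(jar_names: Iterable[str]) -> Dict[str, str]:
    """Map artifact name -> version for every jar name that fits the pattern."""
    found = {}
    for name in jar_names:
        m = JAR_NAME.match(name)
        if m:
            found[m["artifact"]] = m["version"]
    return found


def find_conflicts(jar_names: Iterable[str]) -> List[Tuple[str, str]]:
    """Return every known-conflicting pair of artifacts that are both present."""
    present = artifacts(jar_names)
    return sorted(pair for pair in CONFLICTS
                  if pair[0] in present and pair[1] in present)
```

With a dependency manager such as Maven, the usual fix is to exclude the transitive dependency; a check like this one complements that by inspecting what is actually deployed rather than the build configuration.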

This more critical error took longer to diagnose and fix. By the time it was repaired at 11:33, thousands of people who had tried to activate the promo had been shown an incorrect error alert and left with a choice between emailing my support mailbox and leaving angry 1-star reviews on the Play Store.

Aftermath

I tried to quickly provide a workaround for activating the promo despite the problem (luckily, such a workaround existed and was already documented, albeit initially aimed at people who were not eligible for the promo), while answering support emails and Play Store reviews.

I even made a short (and fairly low-quality) screen-capture video showing users where to type the correct code, posted updated instructions at 12:30, and started directing affected (and very disaffected) users to the page.

Needless to say, a subset of people who are used to getting free apps on a daily basis have little patience for anything that stands between them and instant gratification. So while most users who got the solution were able to activate the promo correctly, many didn’t try again and left angry 1-star reviews and/or uninstalled the app.

This user was able to fix the problem (using a different method) after an initially scathing review

This one probably won't even try

That night I also deployed an updated version to restore the auto-retry mechanism for users who hadn’t used the workaround. This was probably too late for many, who had already uninstalled the app or will not launch it again after the update.

A successful promo?

Despite the issue and my disappointment at having let people down (engineers don’t like to be the source of failure), initial results point to a very successful promo:

  • the Play Store reports slightly fewer than 120k installs (but it is updated with a delay and the cut-off times are a bit opaque, so I’ll have to wait a few days to find out exactly)
  • about 15k users uninstalled MyTrails on the same day, whether because they disliked the app, because of the activation problem, or because it’s their standard routine (grab the promo, uninstall the app so it doesn’t crowd the device, reinstall it later if needed)
  • a bit over 90k users were able to activate the promo, which leaves about 15k unactivated users, assuming the 15k who uninstalled had not activated it (I have extended the validity period to give these users more time)
  • the flood of 1-star reviews hurt MyTrails’ average rating in the countries where the promo was offered (AppTurbo does not operate in all countries, so the promo was country-specific; in particular, I am not able to use App of the Day in France)
Italy and Spain, where MyTrails previously had higher reviews than in other countries (free official topo maps), dipped sharply

To large developers (or very successful indies), 100,000 downloads is small potatoes. To MyTrails, it’s a huge boost, and I hope the promo will trigger some spillover in the countries where I sell topo maps (my main source of revenue). It’s too early to tell, and since the free version of MyTrails carries no advertising, any financial effect will only be felt in the long term anyway.

I’m very glad to have had the opportunity to work with AppTurbo on this, and I’ve learned a valuable lesson. Now I need a similar promo for France!