Summary: We migrated SortMyBox to a new application container at 1:31 AM PDT on Sunday, July 28th. Our service became non-operational as of 8:23 AM PDT Sunday morning due to a bug on our hosting platform that led to our usage quota being exhausted. We reached out to our hosting provider and, with their assistance, were able to restore service at 12:10 AM PDT on Monday, July 29th. Your data was backed up and safe at all times during the incident.
The Migration
We host SortMyBox on Google App Engine, which offers two alternative data storage modes: Master/Slave and High-Replication. While both modes have built-in redundancy and fail-safe mechanisms, the High-Replication mode offers better protection against data loss and unplanned downtime. We were initially set up to use the Master/Slave datastore and wanted to take advantage of the reliability benefits of High-Replication. So, on Saturday night around midnight PDT, we started the migration process. By 1:31 AM PDT the process was complete and our application was operating smoothly in the new environment. We had some beers and went to sleep.
The Downtime
At 8:23 AM the next morning we awoke to the sound of error notifications flooding our inbox. We quickly discovered that the application was unable to perform any datastore operations because we had apparently exhausted our usage quota. This was quite odd because we budget for 20x our normal usage to accommodate spikes. We then noticed that the billing status of the new application container was stuck in “Activating Billing”, which made it impossible for us to make any changes to restore operations.
We reached out to Google App Engine for help via multiple channels, and on Monday, July 29th, at 12:09 AM PDT the billing status issue was resolved. We enabled user logins at 12:10 AM and, after verifying data integrity, enabled all remaining background processes at 12:35 AM. We want to highlight that your data was backed up and safe at all times during the incident.
Thank you for bearing with us while we worked to restore service, and sorry for any inconvenience this may have caused.