We deployed the fix at around 6pm ET and have been monitoring the situation. All signs look good, and we will be putting measures into place to better prevent issues like this going forward.
For all those listening in, the issue was caused by a backup of requests to backfill data for a legacy system. When the legacy database took a long time to respond, the requests piled up until the system eventually crashed. To resolve the issue, we're only backfilling the legacy database when data has actually changed in the primary. We're also buffering the backfill and have added more capacity to our servers.
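As a rough illustration of the two changes described above (this is a sketch, not the actual production code), the Python below queues a backfill only when a record's value has changed, and holds pending backfills in a bounded buffer so a slow legacy database cannot cause an unbounded pile-up. The names BackfillBuffer, record_write, drain, and write_to_legacy are hypothetical, and dropping writes when the buffer is full is just one possible policy assumed for the example.

```python
import queue


class BackfillBuffer:
    """Hypothetical sketch: buffer legacy backfills and skip unchanged records."""

    def __init__(self, max_pending=1000):
        # record id -> last value that was queued for the legacy system
        self._last_queued = {}
        # Bounded queue: pending work can never grow past max_pending.
        self._pending = queue.Queue(maxsize=max_pending)

    def record_write(self, record_id, value):
        """Queue a backfill only if the value actually changed.

        Returns True if the backfill was queued, False if it was skipped
        (unchanged value) or dropped (buffer full).
        """
        if self._last_queued.get(record_id) == value:
            return False  # nothing changed, so no backfill is needed
        try:
            self._pending.put_nowait((record_id, value))
        except queue.Full:
            # Assumed policy: shed load instead of letting requests pile up.
            return False
        self._last_queued[record_id] = value
        return True

    def drain(self, write_to_legacy, batch_size=100):
        """Send up to batch_size queued backfills; return how many were sent."""
        sent = 0
        while sent < batch_size:
            try:
                record_id, value = self._pending.get_nowait()
            except queue.Empty:
                break
            write_to_legacy(record_id, value)
            sent += 1
        return sent
```

In this sketch, a background worker would call drain periodically with whatever function writes to the legacy database, so the rate of legacy writes is decoupled from the rate of incoming requests.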
Posted over 2 years ago. Jul 12, 2016 - 22:35 EDT
We've been able to replicate the issue and have identified the symptoms, which seem to be related to our caching.
We are in the process of deploying a change that should temporarily fix the issue while we address the root cause.
Posted over 2 years ago. Jul 12, 2016 - 16:54 EDT
We are looking into reports that recently published flows are not appearing when they should. We also have rare reports that recently unpublished flows are continuing to show.