Intermittent 403 errors when accessing the app / status pages.

Uptime Impact: 12 minutes
Resolved
Updated

Following up after last night's incident involving an increased number of 403 errors, we now have a little more information.

It appears we first saw some elevated error levels early on July 29th. Some customers reported these, but they were not substantial enough to trigger our monitoring alerts.

Despite some initial changes and improvements, these intermittent errors began to escalate again. Still, despite impacting customers, they didn't hit the threshold for our monitoring until 01:30 UTC on July 30th, at which point our out-of-hours team was alerted. This delay in alerting left several customers frustrated without a response. I'm sorry for this.

In response to this alert, we made several configuration changes to our firewall. We also discussed with our service providers why genuine traffic was flagged as dangerous and blocked.

These configuration changes improved the situation.

This morning we have made further configuration changes to the firewall. We have also adjusted the thresholds on our monitoring to give us better early warning should this issue come back at any point.
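
As a rough illustration of why a lower alerting threshold gives earlier warning, the sketch below tracks the share of 403 responses over a sliding window of recent requests. The class name, window size, and threshold values are all hypothetical and chosen for illustration only; they are not the real monitoring configuration.

```python
from collections import deque


class ForbiddenRateMonitor:
    """Tracks the share of HTTP 403 responses among the most recent requests."""

    def __init__(self, window_size=200, alert_threshold=0.02):
        # Both defaults are hypothetical; real values depend on traffic volume.
        self.statuses = deque(maxlen=window_size)  # oldest entries drop off automatically
        self.alert_threshold = alert_threshold

    def record(self, status_code):
        """Store the status code of one completed request."""
        self.statuses.append(status_code)

    def forbidden_rate(self):
        """Return the fraction of requests in the window that returned 403."""
        if not self.statuses:
            return 0.0
        forbidden = sum(1 for status in self.statuses if status == 403)
        return forbidden / len(self.statuses)

    def should_alert(self):
        """Return True once the 403 rate reaches the alert threshold."""
        return self.forbidden_rate() >= self.alert_threshold
```

With a 100-request window containing three 403 responses, a monitor set to a 5% threshold stays quiet while one set to 2% alerts, which is the kind of difference that decides whether a low-level intermittent problem pages the out-of-hours team.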

We are also continuing to work with our Edge and WAF providers to better understand why things went wrong and what improvements we can make.

Robert Rawlins
Resolved

Our monitoring has been good for a while now, so I believe the configuration changes we made to our firewall have fixed the issue with some genuine requests being blocked.

We are sorry to those of you who were affected.

We'll now conduct a proper review of what happened and look to make changes to our platform to minimize the risks of this happening again.

Thanks again to everyone for your understanding.

Robert Rawlins
Recovering

Monitoring is looking much happier after the recent configuration changes; we've not seen any blocked requests in the last 25 minutes.

I'll continue to watch things, but they're certainly looking more reliable right now.

Robert Rawlins
Updated

We've made some configuration changes to our firewall, which should help with the over-sensitivity and prevent any genuine requests from being blocked.

We're watching the data to see if this has a positive effect.

Robert Rawlins
Updated

We are still seeing a few 403 errors; however, there seem to be far fewer than before.

We're still working to understand the root of the issue, and I hope to have more positive progress and more information for you shortly.

Robert Rawlins
Identified

It appears that this issue may be caused by our firewall incorrectly blocking some genuine customer requests.

Our monitoring suggests there haven't been any failures of this kind for 10+ minutes now, but we're continuing to work with our infrastructure provider to understand the problem.

Thanks to everyone for their understanding. I hope to have more information for you soon.

Robert Rawlins
Investigating

Some customers in specific locations are currently seeing several requests fail with an HTTP 403 status code. We're now looking closely at what might be causing this.

I'll update you as soon as we know more.

Robert Rawlins
Began at:

Affected components
  • Status Pages
  • Management UI
  • REST API