A Few Web App Instances Have Response Problems in EMEA
Incident Report for Optimizely Service
Postmortem

SUMMARY

Episerver Customer-Centric Digital Experience Platform (DXP; formerly Digital Experience Cloud™ Service - DXC Service) is the cloud-based offering from Episerver, built on Microsoft cloud technology. It delivers high availability and performance, easy connectivity with other cloud services and existing systems, the ability to manage spikes in customer demand, and a platform that is ready to seamlessly adopt the latest technology updates.

Starting on 25 February 2020, a limited subset of customers using App Service instances hosted in North Europe may have experienced HTTP 503 responses on their websites. Microsoft has provided a root cause analysis, and the following report describes additional details around the event.

DETAILS

Between 19:14 UTC on 2020-02-25 and 09:46 UTC on 2020-02-26, a subset of customers using App Service hosted in North Europe may have experienced HTTP 503 response codes when accessing App Service.
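For reference, the length of the impact window can be derived from the two timestamps above. The following is a minimal sketch; the function name `incident_duration` and the constants are illustrative and not part of any Episerver or Microsoft tooling.

```python
from datetime import datetime, timezone

# Start and end of the impact window, as stated in this report (UTC).
START = datetime(2020, 2, 25, 19, 14, tzinfo=timezone.utc)
END = datetime(2020, 2, 26, 9, 46, tzinfo=timezone.utc)


def incident_duration(start, end):
    """Return the elapsed time between two datetimes as (hours, minutes)."""
    total_minutes = int((end - start).total_seconds() // 60)
    return divmod(total_minutes, 60)


hours, minutes = incident_duration(START, END)
print(f"Impact window: {hours} h {minutes} min")  # prints "Impact window: 14 h 32 min"
```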

TIMELINE

2020-02-25 19:14 UTC - First alerts for client websites are received and an investigation is initiated by Episerver.

2020-02-25 19:46 UTC - Support ticket raised with Microsoft.

2020-02-25 21:05 UTC - Issue mitigated for the initial customers who were impacted.

2020-02-25 23:12 UTC - StatusPage updated.

2020-02-26 06:16 UTC - Mitigation efforts ongoing by Episerver and Microsoft for the remaining impacted clients.

2020-02-26 09:46 UTC - Mitigation was completed successfully. Issue was monitored.

2020-02-27 08:41 UTC - Incident closed. Microsoft continues the investigation to establish the full root cause.

2020-03-12 07:51 UTC - Microsoft officially provided root cause analysis.

ANALYSIS

The issue was caused by a platform change. Microsoft engineers determined that it was related to a configuration update that occurred as part of a deployment on Microsoft Azure. The update caused the data roles to switch, which in turn caused a null token exception. Once the deployment was completed, the error disappeared.

IMPACT

A subset of customers may have seen HTTP 503 response codes while accessing App Service.

CORRECTIVE MEASURES

Since the root cause was discovered, the necessary fixes have been implemented to prevent the issue from recurring.

Microsoft is continuously taking steps to improve the Microsoft Azure Platform and their processes to help ensure such incidents do not occur in the future. In this case, this includes (but is not limited to):

• Improving resiliency against these types of errors to allow for graceful error reporting and potential recovery paths.
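The measure above concerns the Azure platform itself, and its implementation has not been published. As a separate, client-side illustration of a recovery path, the sketch below retries a request when it receives HTTP 503, with exponential backoff. The function name `fetch_with_retry`, its parameters, and the default values are assumptions made for this example; they do not describe Microsoft's or Episerver's actual fix.

```python
import time
import urllib.error
import urllib.request


def fetch_with_retry(url, attempts=4, base_delay=1.0,
                     opener=urllib.request.urlopen, sleep=time.sleep):
    """Fetch a URL, retrying on HTTP 503 with exponential backoff.

    Hypothetical helper: `opener` and `sleep` are injectable so the
    retry logic can be exercised without real network calls or delays.
    """
    for attempt in range(attempts):
        try:
            with opener(url, timeout=10) as response:
                return response.read()
        except urllib.error.HTTPError as err:
            # Re-raise anything that is not a 503, and give up after
            # the final attempt.
            if err.code != 503 or attempt == attempts - 1:
                raise
            # Wait 1x, 2x, 4x, ... the base delay before retrying.
            sleep(base_delay * 2 ** attempt)
```

A caller would use it as `body = fetch_with_retry("https://www.example.com/")`, where the URL is a placeholder. Only 503 responses are retried; other errors are raised immediately so that genuine failures are still reported.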

FINAL WORDS

We apologize for the impact to affected customers. We have a strong commitment to delivering high availability for our services and we will do everything we can to learn from the event and to avoid a recurrence in the future.

Posted Mar 26, 2020 - 08:31 UTC

Resolved
This incident has been resolved.

We will continue to investigate the root cause. An incident report will be published as soon as the cause has been established.
Posted Feb 27, 2020 - 08:41 UTC
Monitoring
Mitigation steps have been performed successfully and the services are reported as fully operational.
We're monitoring the results and continuing to work with Microsoft to establish the full root cause.
Posted Feb 26, 2020 - 09:46 UTC
Update
We are continuing to investigate with Microsoft and are taking several steps to mitigate the issue.
We will keep you updated with information.
Posted Feb 26, 2020 - 06:16 UTC
Investigating
We are currently working with Microsoft on an issue impacting a limited number of Web App instances in one EMEA Azure data center. We will keep you continuously updated with information.

Send any questions to support@episerver.com or call:

Worldwide: +46 8 555 827 50
United States: +1 888 726 81 27 (Toll Free)
United Kingdom: +44 800 066 4784 (Toll Free)
Australia: +61 280 363 161
Posted Feb 25, 2020 - 23:12 UTC