Mobile Care Worker Service Issue - 21/03/2019 - 22/03/2019
Update - 22/03/2019 | 04:09pm:
Microsoft engineers have confirmed that the issue has been resolved, and our engineering team will continue to monitor the performance of the application over the weekend.
A post-mortem will be added to the bottom of this article within the next 48 hours.
Update - 22/03/2019 | 09:00am:
Yesterday evening, Microsoft migrated our service to an alternative server as part of their platform investigations. This action degraded the performance of the Mobile Care Worker service. Our technology team responded to several system alerts and put corrective actions in place to resolve the performance problems. This took place between 09:40 pm and 10:10 pm yesterday evening.
This morning our monitoring continues to show that the service is performing as expected, but we are waiting on confirmation from Microsoft that the issue has been fully resolved at their end. Once we receive that confirmation, we will work with Microsoft to prepare a post-mortem of this incident and publish the details below.
Update - 21/03/2019 | 02:00pm:
Microsoft has confirmed an issue within an Azure datacentre, and their team is working on a resolution. The affected part of the Microsoft Azure service hosts the Mobile Care Worker application.
Our technology team has configured our service to work around the Azure issue, and our monitoring shows that system performance is returning to normal. Application performance is currently marginally slower than normal, and we expect it to recover fully once the issue within Microsoft Azure has been resolved.
We will continue to get updates from Microsoft until the issue is fully resolved.
Update - 21/03/2019 | 10:40am:
Following the action that was taken at 9:30 am today, our monitoring detected some further service issues.
Our technology team has since re-provisioned the service onto alternative servers in Azure.
From our monitoring, we can see that some requests are now being handled correctly; however, one server is continuing to return bad responses.
We are working with Microsoft to resolve the issue and will share further updates via this article until a resolution is achieved.
Update - 21/03/2019 | 09:30am:
Our technology team has investigated and resolved an issue with one of the services that hosts our Mobile Care Worker application.
The issue started around 7:00 am and was resolved around 9:00 am. During that period, no mobile messages would have been processed.
We will publish a post-mortem covering the cause of this issue and its solution below within the next 48 hours.
Post-mortem:
At 07:00 am on 21 March, we detected a problem with the Mobile Care Worker application, which is hosted on Microsoft Azure. The Mobile Care Worker application utilises multiple servers within Azure to deliver the service.
From 07:00 am on 21 March, Microsoft experienced a problem within their datacentre. Where the Mobile Care Worker application was hosted on one of the Microsoft servers affected by the issue, the server and application became unresponsive and failed to service customer requests.
While the Microsoft engineering team were working on a resolution to the issue within their datacentres, the iCareHealth technology team worked to migrate the Mobile Care Worker application to servers that were not impacted by the Microsoft issue.
Between 09:00 pm and 10:00 pm on 21 March, Microsoft applied a hotfix to their hosting platform to resolve the issue. This required the iCareHealth team to restart the application, which was completed by 10:10 pm on 21 March.
Microsoft have now confirmed that the issue has been resolved, and our engineering team will continue to monitor the performance of the application over the weekend.