Latest Updates
[Resolved] CloudMail Issues on Cluster A
Posted by Rahul :: WIPL on 24 March 2020 11:16 AM
Resolved - This incident has been resolved.
Mar 27, 02:06 UTC
Update - We have now unblocked all users on Cluster A so that they can access and use mail services.

Some accounts still remain to be migrated to the new mailstore. We will post updates on this incident to keep you informed as we work to complete the migration.

To stay up to date, please subscribe to either the System Status page or this specific case.
Mar 27, 01:31 UTC
Update - The operations team has re-enabled roughly 60% of the subset of users (approx. 2k) that was placed under maintenance and is working to enable the remaining users.
Mar 26, 23:18 UTC
Update - The affected mailstore has been re-enabled. With the exception of a small subset of users (approx. 2k) placed in maintenance for additional investigation, all users on Cluster A should be able to access their mail without issue.
Mar 26, 20:13 UTC
Update - The previously disabled mailstores have been re-enabled. Our Operations department has narrowed down the potential issue in the last affected mailstore to a subset of users. We are currently bringing the last affected mailstore online with those users disabled so that we can monitor performance.

Updates will be provided when possible.
Mar 26, 19:44 UTC
Update - While re-enabling the affected mailstore, we discovered an issue with it. To mitigate this, our Operations department will disable several other mailstores and bring them back up one at a time while monitoring performance. Approx. 70k users will be impacted.

Updates will be provided as soon as possible.
Mar 26, 19:01 UTC
Update - Since our last update, we have re-enabled all mailstores in our system except one. Some users on the affected mailstore will remain offline until we complete the migration. We will provide an update as soon as the migration is done.
Mar 26, 18:33 UTC
Update - Most of the affected mailstores have been re-enabled, while the operations team continues to investigate one remaining mailstore to identify and rectify its issue.

Updates will be provided as soon as possible.
Mar 26, 12:05 UTC
Update - After re-enabling our mailstores to resolve this ongoing incident, we encountered a different issue and had to bring the mailstores offline again to resolve it.

We expect to bring them back up shortly, once testing is complete. Affected users will be unable to log in via Webmail, IMAP, or POP until the mailstores have been re-established.
Mar 26, 11:10 UTC

Update - The mailstores have been enabled with 12,521 users migrated. The CR and migration have been stopped for today. At this time, the service is fully online and we will be monitoring for any impact; performance is currently expected to be better than on previous days.
Mar 26, 10:00 UTC
Update - The emergency maintenance on Cluster A will target specific users. Those users will be unable to log in to their accounts via POP, IMAP, Webmail, or an email client during the maintenance window. This will affect approx. 6.6% of users on Cluster A.

Start time: Mar 25, 2020 at 10:00 PM UTC

End time: Mar 26, 2020 at 10:00 AM UTC
Mar 25, 20:58 UTC
Update - We are seeing high load again. Users may notice slowness or intermittent login issues.

Updates will be provided as soon as possible.
Mar 25, 14:03 UTC
Update - The maintenance is nearing completion. Currently, 4 of the 5 mailstores have been re-enabled, and we are waiting for the storage load to drop before enabling the 5th mailstore.

At this time, only users on the mailstore that is still disabled are expected to be completely unable to use hosted email. Users on the other 4 re-enabled mailstores may still experience degraded performance.

We will continue to post updates as they become available. Thank you for your patience.
Mar 25, 07:06 UTC

Update - Our operations team is working on an emergency migration process to help restore access.
A handful of users will be unable to log in to their accounts via webmail or their email clients.
Mar 24, 23:56 UTC
Update - The operations team is still observing high load on the NAS and is monitoring it to ensure it stabilizes before resuming email deliveries.
Updates will be provided as soon as possible.
Mar 24, 20:26 UTC
Update - Our operations team has turned on all 5 mailstores and is currently observing the performance. Once the NAS stabilizes, mail deliveries will be resumed one mailstore at a time.

Updates will be provided as soon as possible.
Mar 24, 18:55 UTC
Update - Our operations team has turned on 4 of the 5 mailstores and is currently observing the performance of the NAS. Approx. 56k users are online. Once the load on the NAS settles down, the remaining mailstore will be turned on as well.
Mar 24, 17:36 UTC

Update - The problem has not been fully resolved after last night's maintenance. The operations team is taking a subset of users offline to stabilize the hardware. The plan is to take down around 15k of the 70k users at a time until we see improvement. Users on the mailstore being taken offline, including Tucows users, may experience problems accessing their mailbox. For now, email is still being delivered to users' inboxes, even if they cannot log in at this time.

Update at 3:00 PM March 24, 2020 - The migration appears to be progressing well. An estimated completion time is not yet available. We will update again once we have this information.

Update at 12:08 PM March 24, 2020 - The migration continues. So far, 5,100 users have been migrated as part of the maintenance, and the mailstores will be re-enabled in approx. 1 hour. We will continue to provide updates as they become available.

Update - We are currently performing emergency maintenance on the cluster to expedite the migration of the affected users.
During the maintenance, users may be unable to access our email services. We will post updates as we receive them.

Investigating - We are currently experiencing an issue on our CloudMail cluster where some of our IMAP nodes have maxed out, which can cause some users to experience connection issues with CloudMail.

Client Impact: Some users are currently unable to log in through webmail and Outlook. Webmail is now completely down, and many users are still unable to log in. We have escalated this issue, and our operations team is still working to mitigate the impact.

Identified - Our operations team has identified a disk issue as the main cause of this problem and is currently working to mitigate the impact.
