Heartbleed OpenSSL Bug Updates

Tuesday 8th April,

Krystal patched all affected servers.

Thursday 10th April,

Due to the recent discovery of the Heartbleed SSL bug, we have now replaced the private keys and SSL certificates across our fleet.

Further Information: Krystal’s retail (shared, reseller and premium) servers were all either patched within a few hours of our learning about the Heartbleed issue on 8th April, or were not vulnerable in the first place. While patching prevented any further information leakage, the bug could already have exposed private keys, which would give anyone able to capture customer traffic over local networks the opportunity to decrypt secure data. We therefore decided to cycle the private keys and SSL certificates protecting the cPanel, Webmail, Apache, Exim, POP3/IMAP and FTP services across the fleet. This was completed by early this morning on all but the reseller servers, where we experienced a short delay in obtaining the newly signed certificates (perhaps the whole world is trying to do the same?). We apologise for the brief inconvenience this has caused our resellers.
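
For those interested, here is a minimal, illustrative sketch of the key-cycling step using the third-party Python cryptography library: generate a brand-new private key (the old one must be assumed compromised) and a certificate signing request for the CA to re-sign. The hostname and file names are placeholders, not our actual configuration.

```python
# Illustrative sketch only - hostname and file names below are placeholders.
from cryptography import x509
from cryptography.hazmat.primitives import hashes, serialization
from cryptography.hazmat.primitives.asymmetric import rsa
from cryptography.x509.oid import NameOID

# Generate a brand-new private key; never reuse the potentially exposed one.
key = rsa.generate_private_key(public_exponent=65537, key_size=2048)

# Build a certificate signing request (CSR) for the service hostname,
# ready to send to the certificate authority for a newly signed certificate.
csr = (
    x509.CertificateSigningRequestBuilder()
    .subject_name(x509.Name([
        x509.NameAttribute(NameOID.COMMON_NAME, u"server.example.invalid"),
    ]))
    .sign(key, hashes.SHA256())
)

# Write out the replacement key (keep permissions tight) and the CSR.
with open("server.key", "wb") as f:
    f.write(key.private_bytes(
        serialization.Encoding.PEM,
        serialization.PrivateFormat.TraditionalOpenSSL,
        serialization.NoEncryption(),
    ))
with open("server.csr", "wb") as f:
    f.write(csr.public_bytes(serialization.Encoding.PEM))
```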

Connectivity Problems

Friday 4th April

03:50 - We are investigating a total loss of connectivity across our entire network.
04:15 - We are investigating a hardware fault on one of our core routers; traffic has been diverted for the time being.
05:10 - We’ve been unable to confirm a hardware error, but the device was in a state of kernel panic. We are upgrading the software on the device to the latest version, after which we will add it back into our network and monitor the situation.
05:20 - The router is fully upgraded and back in line, we’ve reverted traffic to flow through it and will be monitoring the device closely.
05:30 - One last reboot to bring everything into line, and to bring our full BGP tables online.

Connectivity Problems

19:58 - We have noticed a few dropped connections from one of our bandwidth providers and are investigating.
20:15 - We’ve dropped Cogent from our BGP mix until the service stabilises – all traffic should route over our alternative networks.
20:45 – Some users connecting via Virgin Media are still experiencing connectivity issues – we are investigating.
20:56 – Virgin Media connectivity should be restored.

Apollo unplanned outage (Resolved)

11.45 Apollo stopped responding to server monitoring requests.

11.51 Technicians unable to access server via SSH console – Server reboot initiated.

11.59 Apollo has powered on but is now forcing a data consistency check.

12.33 The data consistency check is still ongoing.

13.00 The data consistency check is 50% complete.

13.20 The data consistency check is now 70% complete – nearly there!

13.55 Last few inodes being checked – now 90% complete.

14.20 Apollo has been restored to service.


Possible Datacentre Outage

21:10 - All of our hosted websites and platforms have gone offline. We’re attempting to contact The Bunker to confirm the problem is their end.

21:17 – The Bunker have confirmed a network problem their end and have their network engineers investigating.

22:07 – Unofficial tweet from another of The Bunker’s clients:

22:11 – “No Information” is the current message coming from The Bunker. We will continue to attempt to find out anything we can.

22:22 – Unofficial tweet from another of The Bunker’s clients:

22:27 – It looks like we’re back online

Tuesday 18th March 01:11 – All connectivity has been lost again – awaiting a status update from The Bunker

01:21 – We’re seeing intermittent connectivity at the moment, hopefully it will stabilise shortly.

01:23 – Things are looking stable – it looks like we’re currently routing traffic via a backup route.

01:25 – Unofficial tweet from another of The Bunker’s clients:

01:47 – Unofficial tweet from another of The Bunker’s clients:

Olympus unplanned outage

12:10 Olympus began to exhibit high CPU load, which rapidly became unrecoverable. The server failed shortly afterwards.

12:16 Server reboot initiated.

12:19 All services available again. We are investigating the cause and will report back if anything significant can be reported.


Datacentre Outage (Resolved)

10:30 All of our hosted websites and platforms have gone offline. We made contact with the datacentre (The Bunker) immediately who confirmed that the issue is their end and that they are working to fix it.

10:45 The outage is still being investigated.

11:00 Still no confirmation from the datacentre as to the nature of the issue, but we have been informed it is not environmental (e.g. flooding), so it is likely to be network-related.

11:30 The Bunker have confirmed that the issue is network-related and that they are working to resolve it.

11:35 The Bunker have tweeted “We are experiencing connectivity issues engineers on site investigating whether internal will update ASAP.”

12:00 The Bunker have tweeted “Catastrophic failure with our suppliers in Telehouse North that’s affected internet connectivity”. Will push for an ETA on a fix.

12:14 It looks like we’re back! Access may be slower than normal for a while as traffic is re-routed and things settle down.

Thank you for your patience throughout this incident.

Link to Report -> Preliminary Incident Report Update – IM51073 and IM51075 Ash Site Network Incident

Athena Planned Migration

Transfer is in progress. Current estimated completion time is Sunday afternoon/evening.

As previously notified, Athena is being migrated over the weekend. This means there will be extended periods of downtime.

We will update this post periodically as progress is made. The plan is to start the migration of data at 10pm GMT Saturday night.

[14th December - 21:51] – Migration work is starting now, expect websites and e-mail to go offline shortly.
[15th December - 01:25] – Database and account transfers are nearing 50% complete
[15th December - 06:26] – File transfer is around 50% complete
[15th December - 09:23] – Initial File transfer is complete, syncing of the user data is commencing
[15th December - 12:26] – Final File Sync from remote server has started
[15th December - 14:43] – Files and accounts are synced – we are now working on bringing up services
[15th December - 15:25] – We’re currently investigating a problem with the MySQL service
[15th December - 23:28] – The migration is now fully complete

Zeus unplanned outage

12:25 Due to stability problems, zeus.krystal.co.uk is being rebooted. The reboot was triggered automatically after an unrecoverable issue arose a few minutes ago. The server is in the process of recovering and should be back up in a couple of minutes. We will update here again shortly once we have more information.

12:35 Zeus is back up again – we are continuing to look for the cause.

ModSecurity Updates – 403 Forbidden Errors

This issue may affect users on Amethyst, Topaz, Ruby, Sapphire, Diamond and Trinity Reseller plans. It will NOT affect Dedicated or VPS clients.

We are introducing some revisions to the way we handle updates to our core ModSecurity rules. ModSecurity is a component of our web servers that filters the requests made to your website, and the responses it returns, for signs of malicious activity. Think of it as something similar to the WebShield-type functionality built into many firewall products for your desktop PC – but for servers.
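
As a rough illustration (and not our actual rule set), the sketch below shows the kind of check a ModSecurity-style filter performs: each incoming request is tested against a set of patterns, and a match produces a 403 Forbidden response. The patterns and example requests are made up purely for demonstration.

```python
# Rough illustration only - not our actual ModSecurity rules.
import re

# Hypothetical example patterns; real rule sets are far larger and more precise.
RULES = [
    re.compile(r"(?i)<script\b"),           # crude cross-site-scripting indicator
    re.compile(r"(?i)\bunion\s+select\b"),  # crude SQL-injection indicator
]

def filter_request(path: str, body: str) -> int:
    """Return the HTTP status the filter would give: 403 on a rule match, else 200."""
    payload = f"{path} {body}"
    for rule in RULES:
        if rule.search(payload):
            return 403
    return 200

# A genuine form submission that happens to mention "the union select committee"
# trips the second rule - exactly the kind of false positive described above.
print(filter_request("/contact.php", "minutes of the union select committee"))  # 403
print(filter_request("/contact.php", "hello, just a normal enquiry"))           # 200
```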

The change means the rules will be updated more regularly, and we hope this will reduce the number of false positives we have to deal with (i.e. occasions when ModSecurity blocks a request that was in fact genuine). However, if you get an unexpected 403 error on your website in the next few days (it will most likely occur when you are trying to upload something to your website or submit a form), then please check out this article for help in determining the cause:

https://support.krystal.co.uk/entries/23155552-I-get-a-403-Forbidden-error-when-I-try-to-view-my-site

If you are sure it’s not due to causes 2 or 3 in that article, then please let us know by raising a support request and we will investigate and find a solution as quickly as possible. However, we envisage very few problems will arise.
