Emails being bounced – Resellers (Resolved)

We have uncovered a client account that has been sending out large quantities of spam from our Ceres reseller server. Unfortunately, this has resulted in our IP address being added to a number of blacklists. We have taken measures to prevent further issues, and have been contacting the list maintainers to expedite the restoration of the reputation of this server’s main IP address.

We apologise for the inconvenience this has caused, and would like to assure you that we are dealing with the matter as swiftly as we can.

Update 11:44: Normal delivery has been restored – http://www.spamcop.net/w3m?action=blcheck&ip=77.72.4.98
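
If you would like to check a listing yourself, DNS blacklists can generally be queried over DNS: the octets of the IP address are reversed and prepended to the list's zone, and a successful lookup means the address is listed. The sketch below is illustrative only. The zone name bl.spamcop.net and the helper names dnsbl_query_name and is_listed are assumptions made for this example, not part of our own tooling.

```python
import ipaddress
import socket


def dnsbl_query_name(ip: str, zone: str = "bl.spamcop.net") -> str:
    """Build the DNS name used to look up an IPv4 address in a DNSBL zone.

    The zone default is an assumption for illustration.
    """
    addr = ipaddress.IPv4Address(ip)  # raises ValueError on a malformed address
    reversed_octets = ".".join(reversed(str(addr).split(".")))
    return f"{reversed_octets}.{zone}"


def is_listed(ip: str, zone: str = "bl.spamcop.net") -> bool:
    """Return True if the lookup resolves (listed), False if it does not."""
    try:
        socket.gethostbyname(dnsbl_query_name(ip, zone))
        return True
    except socket.gaierror:
        # No record for this name: the address is not on the list
        # (or the resolver could not reach the zone).
        return False


if __name__ == "__main__":
    print(dnsbl_query_name("77.72.4.98"))
```

Calling is_listed("77.72.4.98") performs a live DNS query, so the result depends on the current state of the list and on your resolver.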


Emergency Maintenance (Completed)

UPDATE 01:34 Wednesday 21st December 

The network maintenance has now been completed. We have re-arranged the core network, upgraded software across the board and installed the first stage of new equipment to replace the failed equipment from last month’s outage.

—————————————————————————

Our systems have notified us that our core switches are operating at too high a temperature.

As a result we will be moving the switches tonight, after midnight, and expect that there could be up to 10 minutes of downtime, though we will seek to minimise disruption as much as possible.

Hopefully this planned action will not cause significant inconvenience, and is certainly more desirable than being plunged into an unplanned outage through hardware failure.

All email will be kept and held by StriKe and delivered when the network comes back up.

Thank you for your patience and understanding.

Dublin & Homer Migration (Complete)

Important Update – Users may experience mail problems if still using the .purplecloud.com mail server. Please use mail.<YOURDOMAIN> instead. Webmail is now at <YOURDOMAIN>/webmail

If you have MySQL errors, it’s likely because cPanel doesn’t permit the main account user to be a database user too (we didn’t realise this would happen and have been fixing as many as possible).

If we’ve not got to yours yet, you can fix it by creating a new database user via your cPanel (under “MySQL Databases”), pairing it with your database and updating the config file, or contact us to have us do it for you.

10:00 20th December – Dublin & Homer migrations have begun; we will keep you updated here.

18:00 20th December – Dublin is over 50% transferred, and Homer is around 35%. We expect the transfers to continue throughout the night with a small pause for our network maintenance at 00:00 on the 21st.

20:45 20th December – Dublin is now 95% transferred, and a few accounts that failed the initial transfer are being manually audited and transferred. Homer is around 50% transferred, and will be continuing long into tomorrow!

16:29 21st December – All accounts have now moved over.

If you’re having any issues please contact support at http://www.krystal.info

Auckland Migration (Complete)

The migration from Auckland started at 11:00 today and is still ongoing; clients will be seamlessly migrated throughout the night.

Homer and Dublin accounts will be migrated on Tuesday 20th.

Please see http://krystal.co.uk/welcome/ to determine which server you are on and for more information.

Update: Auckland has now been migrated.

MAIL
Users should update their mail servers to either mail.<YOURDOMAIN> or dionysus.krystal.co.uk
Webmail is now at <YOURDOMAIN>/webmail

CPANEL
A cPanel welcome email with your username and password will be sent shortly.
Your cPanel is at <YOURDOMAIN>/cpanel

DNS
Clients with external DNS should set their nameservers to ns1.krystal.co.uk and ns2.krystal.co.uk

If you have any problems please visit www.krystal.info

Welcome to Krystal

Ongoing Network Issue (Resolved)

We are currently experiencing an ongoing network issue that is causing full network outages. We are looking into this and will keep you updated.

We apologise for any inconvenience caused. We are unable to provide an ETA for this to be resolved, but be assured we are working as fast as we can to regain connectivity.

UPDATE – 17:36 – There seems to be a problem with our core routing system. We are currently investigating, and are talking with the manufacturer. We expect there to be intermittent service for some time.

UPDATE – 18:05 – We now unfortunately have a total network blackout due to a failed piece of equipment. We are attempting to migrate to a contingency setup shortly.

UPDATE – 19:41 – We are working on the configuration for a new piece of hardware, with the aim of going live as soon as possible.

UPDATE – 20:05 – We are now running on the new equipment, and connectivity should be restored. We will be bringing up more connectivity over the evening, so there may be the occasional blip as routes get updated. We will need some time to work out the full cause of the original problem, but we will update here when we know more.

Ceres experiencing problems (Resolved)

We became aware at around 18:20hrs of an issue with Ceres – specifically Apache. Despite normal levels of HTTP activity, the server was experiencing very high loads. At this time we are still trying to determine if this may be a hardware fault that is subtle enough to not have triggered alerts on the racks.

We have a specialist working hard to determine the exact nature of the issue and will update this article as soon as we have more information.

Wednesday 21:30 – This isn’t good news – Ceres is very poorly. After spending considerable time trying to discover where the errors were on the disk subsystems, we determined that the local primary RAID controller was developing subtle errors, which caused corruption on partitions that house the main OS and server software. Ironically, if it had failed in more spectacular fashion, the situation might have been recovered more quickly. Replacing the controller is now pointless due to the state of the array data. We are now into a hardware migration and bare metal restore which will probably take a number of hours to complete.

Rest assured we will be working on this throughout the night. We will update this post again as we get closer to a more solid fix time.

Wednesday 22:30 – A completely new server has now been configured and installed in situ as a replacement for Ceres. 24 CPU cores (up from 8) with faster core and bus speed, plus 32GB of RAM (double the old Ceres) and 600GB of RAID 1 storage (again twice as much) is going to make Ceres a very powerful platform. Tom is now overseeing the restoration of our standard OS and WHM provision, and over 200GB of customer data from our CDP backup system.

Thursday 06:20 – 95% of all user data has now been copied to the new hardware. The most recent data is being rsynced from the old Ceres /home partition (which was safe) to ensure as little data loss as possible.

Thursday 08:30 – All user data restored, and server opened to production. We are still carrying out final tweaks, but we believe the services are now stable.

Thursday 09:00 – We are aware of an issue with subdomains serving their parent domain’s content; we hope to have this fixed shortly.

Thursday 11:40 – Subdomains are now functioning correctly. Mail is being converted to Dovecot – this may disrupt your ability to collect mail for the next hour or so while the mailbox conversion completes.

Network Wide DDoS (Resolved)

We are currently experiencing a network-wide Distributed Denial of Service (DDoS) attack: http://en.wikipedia.org/wiki/Denial-of-service_attack

We’re seeing about 90,000 connections a second coming in,  which forms a lot of sessions…

Update: The target IP involved has been null-routed and full network functionality has been restored. We’ll learn from this experience and improve on both our prevention and response. Thank you for your patience.

Ares Migration (Completed)

As per the e-mail you should have received as a customer on Ares, we will be taking Ares offline around midnight to migrate to new hardware.

We will keep you updated on the process here as we go.

E-mail will be kept on StriKe and delivered to your account when the new server is available, so no email will be lost.

Wednesday – 00:04 Ares is now offline, and we are moving data across.

Wednesday – 01:39 Data is flowing from the old server to the new server. We are talking close to half a terabyte, so it’s going to take some time to get it all across.

Wednesday – 03:35 Data still moving between the servers. All of the MySQL databases, config files etc. have been moved and we are just awaiting the rest of the /home directory, which houses everyone’s web files and mail directories, to finish.

Wednesday 06:00 Data is almost completely moved now. We will be attempting to restore the new machine to a working state once the last few gigabytes have come across.

Wednesday 06:46 Data has finished transferring. We are now attempting to get all services working and configured.

Wednesday 08:30 We’re rebuilding Apache and hope to be back very soon!

Wednesday 09:15 Ares is back!

Wednesday 10:15 Apache is being rebuilt as some sites were generating errors. We expect to have websites working shortly. Email is working.

Wednesday 10:40 Apache is back up. If you are experiencing any issues please submit a ticket at http://support.krystal.co.uk/ and we’ll look into it straight away. Thank you for your patience.

Ares Down (Resolved)

Early this morning Ares suffered data corruption in key system folders. We have taken the server offline while we work to ascertain the cause and work to resolve the issue. Please bear with us, we’ll have her back up and running as soon as possible.

Update 09:31 We’ve managed to restore the integrity of the operating system and are now running further checks on the machine to ensure that everything is OK. We will update shortly.

Update 11:19 The server booted a number of times successfully but then encountered errors. We have a backup of the server and so are going to attempt a restore of the affected operating system directory. If that fails then we will clone the data backup to a spare server we have.

Update 12:24 A new machine has been provisioned, with 32GB of RAM and faster CPUs than Ares had (so an upgrade!). We’re now installing the operating system and following that will re-install the system software. Finally we’ll re-introduce the data from the most recent backup.

Update 13:45 The OS has been installed on the new Ares and we’re in the process of copying user data over now from the backup.

Update 14:30 Progress has been made on isolating and fixing the error on Ares I, so if possible we’ll restore that server to use (which will be faster than transferring the data) and move the migration to the new hardware to a quiet out-of-hours window.

Update 16:00 Ares I is now back up! Thank you all for your patience. We’ll email all affected parties shortly with a scheduled migration window. No email will have been lost, as it’ll be redelivered by StriKe, where it’s been queued.

Thank you for your patience. We’ll resolve this as soon as possible.

Artemis Down (Resolved)

Artemis has suffered a technical malfunction. An engineer is on their way to the site to carry out diagnostics and to replace the faulty component.

We expect a resolution by 20:00.

Update 15:40 Artemis has booted successfully after the removal of a failed CPU. Another engineer will bring a replacement CPU and add it during the night.

Update 12/01 – 01:00 The engineer is currently on site.

Update 12/01 – 02:22 The 2nd CPU has been installed, diagnostics run (all clear) and Artemis has been successfully restored to full working order.
