Down for the Next Hour
Our ISP is down for scheduled maintenance for the next hour or so, which means we're down for the next hour or so.
Update: The maintenance didn't take as long as we thought. As you were.
We're having some unplanned downtime this morning. Investigating with the ISP now.
Never mind! We're back.
We have another network outage at our host. Sorry about this.
Update: We're back *clicks stopwatch* about 40 minutes later.
MetaFilter sites have been down for about 30 minutes thanks to an outage at our host's datacenter. Here's the message we got from them: "This is a known issue. We are working with our NOC in that datacenter to bring back full connectivity as soon as possible. We do not currently have an ETA as we are still investigating the cause. We appreciate your patience."
Update: (a few minutes later) We're back.
Looks like the same problem as before is happening now -- some unknown network issue at the host is cutting off our outside access, so the sites are down for an unknown period of time, out of our control. Last time it was 15 minutes of downtime; hopefully it's less than that this time around.
Update: back about ten minutes later.
Just as I was about to post the new podcast, both servers went down, and we can't seem to bring them back. I'm talking to the hosting company right now, trying to get to the bottom of it, but at the moment it looks like a network/infrastructure issue that will hopefully be resolved soon. In the meantime, MeFi is totally offline.
Update: and we're back, after about 15 minutes of downtime. Checking the logs to see what's up, but it appears to be a temporary network issue.
The colocation facility that houses the MeFi servers is undergoing maintenance right now, so the site is unreachable until they are done (should be 10-15 minutes).
Update: ooof, we're at 45 minutes of downtime and the colo just sent me the following message:
"We apologize for the inconvenience, but the server is currently being monitored as down due to network maintenance being performed on equipment connected upstream from your server. We will inform you once the maintenance has been completed. We appreciate your patience, and hope to resolve the issue as quickly as possible."
We spent most of the day working on uptime issues, trying out new settings and different tweaks to the server, but in the end we're about where we were yesterday, with server restarts every hour or so (kicked off automatically when there are memory leaks or crashes). For a few hours the front page of MeFi wasn't showing all posts, but that has been fixed.
We'll likely keep working on server issues next week as well, but hopefully things improve as traffic goes down over the weekend.
For the past 24 hours or so, we've been testing out some monitoring software to get a better handle on what is causing some intermittent crashes, but it seems the added load of the monitoring got the better of the server, causing several crashes per hour for the last 12 hours. We've rolled things back to how they were before the night began, so things should get a bit better today.
We're pretty much done integrating the basics on the new servers, and everything is running a lot better. We have a couple of runaway processes we're keeping an eye on, but for the first time in a couple of weeks, the server has been stable for 12+ hours in a row.
The last things left to do are the following, and we hope to bring them all back next week: