Administrative Announcement


Were there some issues or DNS problems today? I was completely unable to get to the site from work - kept saying "site is offline". But since there were posts here all day, apparently most people did not get that message.

Yes. I'm looking into it! Judging from the numbers, the large majority of people were problem-free today (except for their profile pictures which I'm also looking into).

But a couple people couldn't get onto the site, and I haven't yet established a pattern. Please keep reporting your specific problems here!

DNS changes can take a couple days to spread through the Internet, so this is normal behavior.


Everything is great for me besides the profile pic problem!

(FWIW, I just tried getting in from my iPhone, and it worked fine for me, but I'd already had a page up on Safari from a couple days ago.)

Having the page up from before actually would make it more likely that the DNS shift wouldn't propagate -- it would default to the old site. That happened to me earlier: it was directing me to yesterday's version of the site (i.e. showing the last post as 9-something last night). However, I think chaofun's note is very helpful; probably those of us experiencing issues are just having DNS lag.

Do you have an IP address for the new server? That would take care of the DNS issue.

It doesn't; the webserver appears to be configured to examine the request URL as well (ETA: yeah, we're sharing the server daemon this way with some other sites). If your DNS cache hasn't expired by now, you could try to force it manually:

For current versions of MacOS, "dscacheutil -flushcache" (in a Terminal window).
For recent Microsoft Windows, "ipconfig /flushdns" (in a CMD window).
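
To illustrate why a bare IP address wouldn't help: when several sites share one server daemon, the server decides which site to show from the host name in the request, not from the IP it was reached on. Here is a rough Python sketch of that idea. The site table, the placeholder address 203.0.113.10, and the function name are all made up for illustration; none of it reflects the real server configuration.

```python
# Hypothetical table of sites that share one server daemon and one IP address.
# These values are illustrative only, not the real server configuration.
SHARED_IP = "203.0.113.10"  # placeholder address from the documentation range
VIRTUAL_HOSTS = {
    "donrockwell.com": "forum",
    "example.org": "some other site on the same server",
}
DEFAULT_SITE = "hosting provider placeholder page"


def pick_site(host_header: str) -> str:
    """Return the site a name-based virtual host setup would serve."""
    # Drop an optional port, ignore case, and treat "www." as the same site.
    name = host_header.split(":")[0].lower()
    if name.startswith("www."):
        name = name[4:]
    # An unknown name (such as a bare IP address) falls through to the default.
    return VIRTUAL_HOSTS.get(name, DEFAULT_SITE)


if __name__ == "__main__":
    print(pick_site("www.donrockwell.com"))  # request made by host name
    print(pick_site(SHARED_IP))              # request made by bare IP address
```

So typing the IP into the browser would most likely land on whatever the server treats as its default page, which is why flushing the DNS cache is the better route.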

For those who really want to get in and still can't, I just checked Google's public DNS servers and they have the correct IP addresses. Just plug:

In your DNS servers (I would change this back to your local DNS servers tho eventually).

Why, when I copy the link to a specific post, does clicking that link now take me only to the top of the page, not the specific post? It's annoying when trying to link from other places, like Twitter or my blog.

Heather -- that's not happening for me. And, for example, when I just clicked a link in your Twitter that was to a specific post, it took me there. What browser are you using? I've noticed this happening in older browser versions but I thought it was a delay, not a permanent link redirect problem.

Chrome, Windows. Might be a Chrome weirdness?

I bet it is. It just worked for me in Chrome on a Mac but it wasn't a click-to-open, it was a cut-and-paste (since Safari is my default browser). But I've had Chrome be weird about links opening tabs before.

(Also, have you ever noticed that sometimes Chrome bugs out on Google products the most? I mean, seriously. But that rant belongs somewhere else. :) )

Help. We've always had crawlers, but we haven't had any more now than when we ported to DreamHost. Here are the two issues (see the general announcement by DreamHost at top (note, times are PST, not EST), and at bottom the "white flag" I raised as a Support Ticket).


Speartooth Mysql Server – Raid Issues * RESOLVED *

Posted (March 9th, 2011 at 7:41 am PST) by Justin

The mysql server named speartooth went unresponsive a short time ago. Our admins believe its RAID card has gone bad, as it will not boot with the raid array. We are investigating now and will likely need to replace the raid card. When we have more info or an ETA, we will update this post. Our apologies for the downtime related to this hardware issue.

– Update 8:04AM PST –

The issue has been resolved, the raid card behaved unusually and after removing a failed disk from the raid array, stopped recognizing the raid volume. The bad disk has been removed, and speartooth is running now. It has to run a large number of “repair table”‘s to fix the open tables when it went down, so it will be slow for possibly up to 15 minutes while this completes. When it is done repairing and back to normal, I’ll update this post one last time.

– Update 8:30AM PST –

The repair of such a huge number of databases at one time was too much for speartooth to take without possibly going down. We are slowly starting each of the 50 services on it, letting the tables have time to repair. Full service should be restored and back to normal within 15 minutes we are expecting.

– Update 9:07AM PST –

Despite signs of improvement, speartooth has again gone down due to raid issues. We are evaluating the situation and the fastest solution: whether it should get new hardware or be moved to other mysql servers in the same rack. We will update this post when we have more information and an ETA.

– Update 9:30AM PST –

An admin is on their way to replace the raid card in the server. If this fixes the severe issue it is having, service will be restored. If it appears the raid volume is not recoverable, we will be replacing the hardware. More info to come. Sorry again for the issues with this server, raid issues are very rare on our mysql servers and this is not a typical outage.

– Update 10:58AM PST –

The raid card replacement got the server working again, but the raid array has failed. We will be reinstalling the debian OS on it, and will have to restore the databases from backups performed between 1 and 3AM this morning, which was a couple hours before the hardware issues started happening. I will update this post again when the server is live again, and when restores of backups begins and an ETA on its completion.

– Update 11:23AM PST –

Operating system is installed, services being setup on it now, and restores will begin in likely 15 minutes.

– Update 12:24PM PST –

Databases are half restored, we have multiple channels of restores going to get these all back in service as soon as possible. As soon as one of the channels finish, we’ll turn that service on and it will be live. When the restore is completely done I’ll do a final post.

– Update 1:43PM PST –

We have around 90% of the dbs restored, the last few services are being restored still. We should have full service restored for all customers within the next hour.

– Update 3:00PM PST –

The services zehner, zombo and zukowski are the last to restore and are taking a bit longer than expected due to larger databases, the other services are restored and live. When these last 3 services are finished, we will do a final check and resolve this status post.

– Update 3:43PM PST –

The last restores have finished, regrants are run, and all should be live again. If you are still having any issues, please contact support.


From: DreamHost Customer Support Team <support@dreamhost.com>

Date: March 9, 2011 6:28:35 PM EST

To: donrockwell@dcdining.com

Subject: Re: [leimal 43508159] donrockwell.com has been down all day long


- After reading this response, please consider visiting

- the URL below to comment on its quality. Thanks!


- http://www.dreamhost.com/survey.cgi?n=43508159&m=4777043



I've checked into why you've been unable to access the domain name, and it appears your donrockwell user's processes are being killed by our procwatch daemon, and this is the source of the errors you are receiving.

Procwatch is a daemon that runs constantly on shared servers to monitor the usage of RAM/CPU and execution time so that no single user can use an inappropriately high percentage of the shared resources and impact the overall health of the server or the server's ability to serve all users'

If you haven't done anything recently that should have increased the load on the server, then there's a chance that your site is getting hit by web crawlers. Please review the following articles and see if there is anything you can do to bring your usage down:

If you can manage to get donrockwell's memory usage down, your sites should come back up on their own.

Another possibility is that you could upgrade to a private server. On a private server, it's up to you to determine how much memory you need, and there is no procwatch that will kill your scripts.

Please let me know if you have any other questions!

Art A

On Wed, 09 Mar 2011, you wrote:

This is about as critical as it gets. What gives?


That's a load of crap. There was no problem before their server crashed, but now, suddenly, the site is too busy? Yeah, it's possible, but it's a hell of a coincidence. More likely, something they did while restoring the server went wrong - some configuration setting somewhere.

I think that's correct, but I also think his response was not malevolent in any way, i.e., he just didn't put two and two together.

That server "speartooth," btw, is dr.com's MySQL server, so it was surely involved somehow. Also, I filed the problem ticket after they said 90% of the databases had been restored, and the rest would be up within an hour (I guess we were one of the last 10%).

This is a very large website, considering it's mostly text-based, and it's also very busy, so it would make sense that dr.com might be one of the last to emerge.

Regarding web crawlers: they've pretty much doubled in the last year - much of the crawler traffic is from Bing (whereas it used to be almost exclusively Google). I'm not sure whether to ban certain crawlers and run the risk of not showing up in searches, or just let them crawl away.
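
On the ban-or-allow question, there may be a middle road: a robots.txt file can ask a specific crawler to slow down instead of banning it outright. Below is a minimal sketch that uses Python's standard `urllib.robotparser` to check what a sample robots.txt would tell different crawlers. The rules, the 10-second delay, and the "ExampleScraper" name are invented for illustration. Crawl-delay is a non-standard directive that not every crawler honors, so treat this as an assumption to verify rather than a guarantee.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt: slow one crawler down, block one scraper entirely,
# and keep every other crawler out of an (assumed) /admin/ area.
ROBOTS_TXT = """\
User-agent: bingbot
Crawl-delay: 10

User-agent: ExampleScraper
Disallow: /

User-agent: *
Disallow: /admin/
"""


def load_rules(text: str) -> RobotFileParser:
    """Parse robots.txt text without fetching anything over the network."""
    parser = RobotFileParser()
    parser.parse(text.splitlines())
    return parser


rules = load_rules(ROBOTS_TXT)

if __name__ == "__main__":
    for agent in ("bingbot", "ExampleScraper", "SomeOtherBot"):
        allowed = rules.can_fetch(agent, "/index.php")
        delay = rules.crawl_delay(agent)  # None when no delay is requested
        print(f"{agent}: may fetch /index.php = {allowed}, crawl delay = {delay}")
```

The appeal of this approach is that the site keeps showing up in searches while well-behaved crawlers are asked to spread their requests out. Crawlers that ignore robots.txt would have to be handled some other way.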

"Never attribute to malice that which is adequately explained by stupidity." (Or "Never ascribe... by incompetence," depending on which version you're quoting.)

I won't say stupidity either; I'm thinking a stressful day capped by a late-afternoon problem ticket. I believe him about Procwatch; I merely think it wasn't "reset" or some such thing.

Understand: the reason we ported to DreamHost from Invision is that our website was too highly trafficked for their shared servers to handle anymore. (The cost of an individual server is prohibitive.)

Hope everyone can see I'm trying to be as transparent as possible here.



From the research I've done, too, we're a huge huge HUGE database;

No, not really, maybe large for your service provider, but really quite small otherwise (until 2 weeks ago I spent most of my days running queries against a system that had over 120 million unique records, and that is only an average database when compared to large banks and cellular providers).

Good point; for an Invision board, though, we're pretty massive. I've also had some db folks look at us and they say we're rather large and have a lot of dross we could eliminate (with some ideas how to do so), but I know pretty much zilch about it so I just pass the word along.

The DB (database) which houses the forum is quite large. The company that makes this forum software is called Invision. Invision and vBulletin are the two major online forum packages used by websites, like DonRockwell.com, that have a similar number of users and threads. Both are also used by other websites and online communities that are much larger.

Apparently, the internet hosting company that Don uses to host the forum had a server crash from too many users (real people and web crawlers) trying to access the website at the same time or within the same time interval. This may be a result of a cap that the hosting company places on the kind of account Don has for the forum, since it's a shared server. It also could be a result of the doubling of web crawlers coming to DR.com, which are a kind of automated 'user' that Google, Bing, and other search websites use to search the web.

Hope that helps explain some of the basics in a simpler way.

On what I assume is a related issue, a post I made yesterday morning about this time (8-9 AM)--and before the server crash--was gone once the board came back. I don't know how many other posts this may have happened to, but FYI.

Not sure where to put this, but:

1) I am being told there is no new content when I haven't logged in for a while.

2) I inadvertently tried to sign in with a wrong password. I tried again, I believe with the correct password. When that didn't work, I tried clearing all cookies and cache and then tried again. This time, I was told that I had to wait 14 minutes for my account to become unblocked.

Any thoughts on why this is happening when browsing the site on an iPad? When I get the page of new posts after clicking "View new Content", clicking a link only highlights it. I have to click the link a second time before I'm taken to it. This doesn't seem to happen with links anywhere else on the site, as far as I can tell.

I don't have any thoughts on why, other than maybe the iPad doesn't think it's a link, or maybe the webpage lags in loading before the new content link fully appears. One workaround I thought of: bookmark the "View new Content" link as a favorite or on your bookmarks toolbar, which might let you avoid pressing that link altogether.

Sorry gang! There was an internal problem with the database, and I had to roll back the website to about 2:30 PM this afternoon.

Before I did that, I manually cut and pasted the posts, and I will be sending them out to people so they don't lose their work.

Gotta love this message from Dreamhost when I rolled the website back:

Your database "donrockwell" will have its backups restored within the next 5-10 minutes!

Phew! Crisis AVERTED, eh?

If you posted between 2 PM and now, I'll be sending you a text copy of your post, and I hope you'll repost (I can't go in and repost for you, unfortunately).

As of 3:01 AM, all is well. B)

There were two separate rollbacks: the database (which contained all the posts, and was successful), and the domain (which took longer, and didn't finish successfully). There was one additional step which needed to be performed to complete the domain rollback, and that has been finished.

The only data lost was about eight posts written in the late afternoon, and I manually saved and PM'd that information to members so they could repost it. So, everything is as it should be (I think!) with my apologies for the evening of downtime.


A Tired Rocks.

Can you please put an ad banner somewhere on this site so that you actually get paid something for dealing with these headaches?

The biggest headache (or should I say "hangover") right now is that the avatars / personal photos (including yours) are showing up as little blue question marks on my MacBook. Re-uploading them solves the problem, but I hate to ask everyone to do that. This is clearly an Invision bug because the same thing happened about a year ago, but the important thing is that the content is preserved (and it's something of a relief to know that DreamHost's rollback system works nicely).

It's good to know you miss seeing me. I reuploaded my ridiculous picture and am posting to see if it solved the problem.

ETA: It didn't.

I removed the one I had up there and then uploaded another one, and that worked. (I'd been planning to change the avatar for a while, so took the opportunity.)

Thanks for getting our avatars back!

While it looks as if some members have been able to download new images, all my attempts have failed. I continue to get the same message about my files being too big. Usually automatic reformatting shrinks jpeg files down to size, but this property may not be kicking in.

Can you give an example so I can look into it?

