DCUM Forums Outage

by Jeff Steele — last modified Feb 25, 2023 01:00 AM

A database crash has caused an outage.

Fixed again and I'm going to bed.

Broken Again. I tried to fix a problem with the Nanny Forums and something went wrong. Working on it now.

This evening I was doing some maintenance on the database cluster when suddenly an error message appeared and the forums stopped working. The cluster consists of four nodes; two of them had crashed completely and the other two were in a non-working state. I tried several times to bring the cluster back up, but what appears to be a corrupt file prevented that. I am now restoring yesterday's backup. Unfortunately, since backups are done at night, any post or thread created after about 4 a.m. Friday will be lost.

UPDATE: The DCUM Forums are back up but are working very slowly. Posting is very slow. The page times out but the posts actually go through. Please don't resubmit because that will cause duplicate posts. Recent Topics also doesn't seem to be working. 

The Nanny Forums are not working.

I will work more on this later after I get some sleep.

UPDATE SATURDAY MORNING: There was a problem with the data restoration, so I am going to repeat it, which means that the forums will be down for a while again this morning.

UPDATE SATURDAY AFTERNOON: I can't resolve the index issue. Reading the site is okay as long as you stay away from Recent Topics. Posting times out, but the post goes through. If you try to post, don't repost after a timeout.

I'll let it go like this for a while because I need a break.

UPDATE SATURDAY AFTERNOON AFTER A BREAK: Thanks to Bob in the comments below, I found a way to resolve the problem I was having. It looks like things are back to normal minus a day's worth of missing posts.

Dcud says:
Feb 25, 2023 09:02 AM
Thanks!
Anonymous says:
Feb 25, 2023 09:35 AM
I’m in DCUM withdrawal - please hurry!! :)
Jeff Steele says:
Feb 25, 2023 09:36 AM
I'm doing my best. I've run into a series of problems that I am having trouble solving.
Anonymous says:
Feb 25, 2023 11:07 AM
Anything we can try to help with?
Jeff Steele says:
Feb 25, 2023 11:10 AM
If anyone knows anything about MySQL Cluster, yes. I have restored the data and rebuilt the indexes but am getting an error message saying, "Got error 4721 'Mysqld: index stats thread not open for requests' from NDB" and none of the indexes work. Without indexes, the forum will work but loads very slowly and eventually just overloads the server. I am doing everything I can think of to solve this issue. I've never seen this problem before and Google doesn't turn up anything useful.
Anonymous says:
Feb 25, 2023 11:51 AM
If Google isn’t turning up anything useful, I would try dropping and rebuilding the index. I am not a DBA though.
Jeff Steele says:
Feb 25, 2023 11:56 AM
That's basically what I am doing. But there are a lot of indexes, so I can't do them manually. I am running an automated process right now to do that.
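
A minimal sketch of how an automated drop-and-rebuild of many indexes could be scripted, assuming a plain list of index definitions is available. The table, index, and column names are hypothetical, and this is not necessarily the process that was actually run here; the code only generates SQL statements and never connects to a database.

```python
from typing import Iterable, List, Tuple

# (table, index name, indexed columns); all names used here are hypothetical.
IndexDef = Tuple[str, str, List[str]]


def quote(identifier: str) -> str:
    """Quote a MySQL identifier with backticks, escaping embedded backticks."""
    return "`" + identifier.replace("`", "``") + "`"


def rebuild_statements(indexes: Iterable[IndexDef]) -> List[str]:
    """Return one ALTER TABLE statement per index that drops and re-adds it."""
    statements = []
    for table, index, columns in indexes:
        cols = ", ".join(quote(c) for c in columns)
        statements.append(
            f"ALTER TABLE {quote(table)} "
            f"DROP INDEX {quote(index)}, "
            f"ADD INDEX {quote(index)} ({cols});"
        )
    return statements


if __name__ == "__main__":
    # Hypothetical example input; a real list would come from the schema.
    example = [("posts", "idx_topic", ["topic_id", "post_time"])]
    for stmt in rebuild_statements(example):
        print(stmt)
```

Generating the statements as text first makes it possible to review them, or run them in batches, before touching the cluster.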
Bob says:
Feb 25, 2023 12:55 PM
It looks like something internal to NDB. See here:

    https://dev.mysql.com/[…]/ndb-error-codes-single.html

So frustrating how useless Google is these days for searches like this.

If there's a way to bring up the cluster but keep web traffic blocked, I would do that and let it stay up for a while. It may be that the NDB stuff is thrashing away until it can get all the indexes up to date and synced between instances. If you can monitor CPU/disk activity on the cluster without any web traffic, that may give some indication if it is indeed busy trying to recover internal state even after the indexes (on disk) have been rebuilt. That's just a guess though.
Jeff Steele says:
Feb 25, 2023 12:57 PM
I had the same thought but I don't see any activity between the nodes. I've been watching the logs and CPU performance. I'm just going to let things go for a while because I need some rest.
Bob says:
Feb 25, 2023 01:02 PM
Sleep solves a surprisingly large number of technology problems; the NDB stuff may just come out of some deadlock state(s) on its own while you are resting. :-)
Jeff Steele says:
Feb 25, 2023 01:03 PM
Hopefully.
Bob says:
Feb 25, 2023 01:27 PM
Somewhat grasping at straws, but a couple more things to try if it is not already resolved by the time you are back at it:

Try the ndb_index_stat command with --verbose. That may at least tease out more info about whether or not there is a problem with the indexes. More info here:

    https://dev.mysql.com/[…]/mysql-cluster-programs-ndb-index-stat.html

Maybe try a "SHOW STATUS LIKE 'ndb_api%';" from a mysql client. More info on that here:

    https://docs.oracle.com/[…]/mysql-cluster-ndb-api-statistics.html

Hopefully someone with actual NDB knowledge will get back to you via your MySQL Forums post.
Jeff Steele says:
Feb 25, 2023 02:59 PM
That did it! Thank you so much. I went through every parameter of that command and found one to check indexes. It found one that was missing and then another parameter created it. Now things are back to working. That command is really misnamed because it does a lot more than provide stats. Thank you, thank you, thank you.
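
For anyone who hits the same error: the comment above does not name the parameters that were used. As an assumption, the `ndb_index_stat` options that most closely match the description are `--sys-check` (verify that the NDB index statistics system tables and events exist) and `--sys-create-if-not-exist` (create any that are missing). The sketch below only builds the command lines and does not run them; the helper name is hypothetical, and connection options are left out.

```python
import shlex
from typing import List


def ndb_index_stat_cmd(*options: str, verbose: bool = True) -> List[str]:
    """Build an argv list for ndb_index_stat without executing anything."""
    argv = ["ndb_index_stat"]
    if verbose:
        # --verbose is the flag suggested earlier in this thread.
        argv.append("--verbose")
    argv.extend(options)
    return argv


# Assumed pairing: a check step first, then a step that creates what is missing.
check_cmd = ndb_index_stat_cmd("--sys-check")
create_cmd = ndb_index_stat_cmd("--sys-create-if-not-exist")

if __name__ == "__main__":
    for cmd in (check_cmd, create_cmd):
        print(shlex.join(cmd))
```

Running the check step first and reading its output before creating anything mirrors the order described in the comment.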
Bob says:
Feb 26, 2023 02:19 AM
All’s well that ends well. Glad that helped, it was pretty much a shot in the dark :-)
Anonymous says:
Feb 26, 2023 07:51 AM
Thanks Jeff and Bob!!
Larla says:
Feb 25, 2023 09:57 AM
Reports are now coming in about the most productive Saturday morning ever! People all over the area have found themselves with available time and energy. Says one citizen in Northwest DC, “It’s not even 10am, and I’ve repaired my roof, had an engaging conversation with my spouse about how we’re going to tackle our household chores together and reorganized my kids’ winter clothes. I’m about to go for a brisk walk before this snow. I’ve never been this focused on a Saturday morning! I wonder what’s different?”
Rinseandrepeat says:
Feb 25, 2023 10:18 AM
LOL! Love that this person chose the name Larla too.
Anonymous says:
Feb 25, 2023 10:38 AM
Thanks, Jeff. Happy weekend.
Anonymous says:
Feb 25, 2023 11:22 AM
So glad we can have a peaceful night without the constant classist, racist remarks on the MCPS forum. Thanks for making these anonymous grifters get a life!
Anonymous says:
Feb 25, 2023 11:31 AM
Thank God. You are enabling internet thugs to have a life. You are making them powerful by bullying people anonymously. You, the admin of this trash page, are the reason low-life, uneducated POS have a place in our society.
Anonymous says:
Feb 25, 2023 11:39 AM
Somebody needs a hug!
J’accuse! says:
Feb 25, 2023 02:02 PM
I suspect we may have just found the perp behind this attack!
Anonymous says:
Feb 25, 2023 11:32 AM
Jeff, we use a service called Percona that offers a flat $5k for year-long access to their help desk service for SQL servers. Promise I am not selling, my company is a happy customer!
Anonymous says:
Feb 25, 2023 01:49 PM
Hang in there, Jeff!!
Larlo says:
Feb 25, 2023 02:05 PM
Have you tried asking ChatGPT for a solution? I am not trying to be funny, just helpful!!
Larla jr says:
Feb 25, 2023 02:19 PM
Thanks for all your work, Jeff! Just an FYI: I still get the "forums unavailable" message instead of limited access.
Jeff Steele says:
Feb 25, 2023 02:20 PM
Yeah, things slowed down so much that it triggered that notice.
Larla says:
Feb 25, 2023 04:50 PM
Jeff, genuine thanks for all that you do!