FutureQuest, Inc.
04-25-2022, 07:19 AM   Postid: 188700
 Terra
CTO FutureQuest, Inc.
Join Date: Jun 1998
Location: Z'ha'dum
Posts: 8,108
Re: Network Outage

Current status with email systems.

The storage cluster suffered from metadata corruption which prevented this SAN from properly starting. We have been working on fixing this non-stop while trying to recover as close to 100% of emails as possible.

Out of a current store of 38,435,157 emails, we were unable to recover 2,452. Those 2,452 emails were held in the memory cache before being finalized to the storage cluster.
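For scale, the unrecovered share can be worked out directly from the two figures in this post. A minimal Python sketch; the function name `loss_rate` is illustrative only:

```python
def loss_rate(total: int, lost: int) -> float:
    """Return the fraction of stored messages that could not be recovered."""
    if total <= 0:
        raise ValueError("total must be positive")
    return lost / total

TOTAL_STORED = 38_435_157  # current store size quoted in the post
UNRECOVERED = 2_452        # emails that were still in the memory cache

rate = loss_rate(TOTAL_STORED, UNRECOVERED)
print(f"lost:      {rate:.5%}")      # share that could not be recovered
print(f"recovered: {1 - rate:.5%}")  # share restored from the storage cluster
```

With the figures above this prints 0.00638% lost and 99.99362% recovered.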

That part is now done. What remains is fixing some control files that are causing SMTP to hang and not accept inbound email. There are 8 of them that need repair, and we are clearing them and rebuilding.

I hope email will be back in full production within the next couple of hours. As best I can see, SMTP processing is the last thing that needs to be fixed.
__________________
The FutureQuest Team
04-25-2022, 07:23 AM   Postid: 188701
esllou
Site Owner
Join Date: Sep 2002
Location: Buenos Aires
Posts: 377
Re: Network Outage

Quote:
Originally Posted by Terra

I am hoping that email should be back in full production within the next couple hours. As best I can see, SMTP processing is the last thing that needs to be fixed.
My site is still down, is this something that you are aware of? I'm on Managed QuestServer
__________________
Neil
www.esl-lounge.com
04-25-2022, 07:27 AM   Postid: 188702
bcoffey
Registered User
Join Date: May 2007
Location: Maryland
Posts: 6
Re: Network Outage

Good news! I just had some email trickle in from yesterday. (Fingers crossed that the rest is fixed soon.)
04-25-2022, 07:29 AM   Postid: 188703
 Kevin
Systems Administrator
Join Date: Aug 2001
Location: Orlando, FL
Posts: 2,986
Re: Network Outage

On the topic of MySQL...

Four MySQL servers were destroyed, three of them probably beyond recovery. They were MYSQL07, MYSQL08, MYSQL10, and MYSQL16. Fortunately, we were able to move MYSQL16's disks into a spare machine and get it going fairly quickly, though we were prioritizing email hardware at the time.

One lucky thing did happen here... The other three were the oldest MySQL servers we were still using. The next time we needed a MySQL server, I was planning to start it with the content from all three anyway.

So, if your MySQL data was on any of those three, it was restored from the ~3:30am backup and is now on MYSQL16. The new MYSQL16 is faster than the old one, and hopefully the new load won't bother it much. If it does, we can redistribute users later.

If you are still seeing MySQL errors that are not network-related, the scan/repair is running and will finish soon. I will post when it is done. If you are getting MySQL networking errors, it probably means you manage your own DNS and serve an A record pointing to the raw IP, rather than using the mysql.domain.tld.fqdns.net name in your web app config. If you change your third-party DNS to the new IP, let us know, as we can flush the entry out of our cache so the change takes effect as soon as we do.
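To illustrate the difference, here is a minimal Python sketch of a config check that flags a database host given as a raw IP literal instead of a DNS name. The setting `DB_HOST` and the functions `is_raw_ip` and `check_db_host` are hypothetical; only the mysql.domain.tld.fqdns.net naming pattern comes from the post above. An app pointed at the name picks up a new address once DNS is updated, while a hardcoded IP keeps pointing at the old server.

```python
import ipaddress

def is_raw_ip(host: str) -> bool:
    """Return True if host is an IPv4/IPv6 literal rather than a DNS name."""
    try:
        ipaddress.ip_address(host.strip("[]"))  # accept bracketed IPv6 like [::1]
    except ValueError:
        return False
    return True

def check_db_host(host: str) -> str:
    """Return a short advisory message for a configured database host."""
    if is_raw_ip(host):
        return f"warning: {host} is a raw IP and breaks if the server moves"
    return f"ok: {host} is looked up through DNS"

# Hypothetical web-app setting: use the name, not the address.
DB_HOST = "mysql.domain.tld.fqdns.net"
print(check_db_host(DB_HOST))
print(check_db_host("192.0.2.10"))  # documentation-range example address
```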
__________________
Kevin
04-25-2022, 07:30 AM   Postid: 188704
 Ryan
Service Assistant
Join Date: Feb 2010
Posts: 235
Re: Network Outage

Quote:
Originally Posted by esllou
My site is still down, is this something that you are aware of? I'm on Managed QuestServer
I just wanted to note that we do also have your email/ticket and I have forwarded it to the techs to check your MQS.
04-25-2022, 07:31 AM   Postid: 188705
esllou
Site Owner
Join Date: Sep 2002
Location: Buenos Aires
Posts: 377
Re: Network Outage

Thanks a lot, Ryan.
__________________
Neil
www.esl-lounge.com
04-25-2022, 07:45 AM   Postid: 188706
wilkoff
Site Owner
Join Date: Aug 2020
Posts: 21
Re: Network Outage

When can we expect email to return? It has now been almost 24 hours.
04-25-2022, 08:30 AM   Postid: 188707
Erica C.
Site Owner
Join Date: Mar 2005
Posts: 418
Re: Network Outage

I have email again and all my sites are back! Hope this is true for everyone else.
Thanks FQ for all your hard work.
04-25-2022, 08:41 AM   Postid: 188708
 Terra
CTO FutureQuest, Inc.
Join Date: Jun 1998
Location: Z'ha'dum
Posts: 8,108
Re: Network Outage

SMTP has been fixed and all mail services are now operational.

We have run various tests and have been watching the log files closely.

If you run into any problems, please let us know so we can take a closer look. There may be some gremlins due to the missing metadata entries from the storage cluster. However, the bulk of them should clear out over time; the ones that don't, I'll have to remove manually. I mention this because I've had to handle them before (rarely): when a process tries to access the file, the metadata is there but the file itself isn't, so the process goes into a D state (uninterruptible sleep), waiting for SAN I/O that will never happen.
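On Linux, a process stuck like this reports state `D` in `/proc/<pid>/stat`. A minimal Python sketch for listing such processes, assuming a Linux host with procfs mounted at /proc; the function names are illustrative and this is not FutureQuest's own tooling:

```python
import os

def parse_state(stat_line: str) -> tuple[int, str, str]:
    """Parse pid, command name and state from a /proc/<pid>/stat line.

    The command name sits in parentheses and may itself contain spaces
    or ')', so split on the LAST ')' rather than on whitespace.
    """
    lparen = stat_line.index("(")
    rparen = stat_line.rindex(")")
    pid = int(stat_line[:lparen].strip())
    comm = stat_line[lparen + 1:rparen]
    state = stat_line[rparen + 1:].split()[0]
    return pid, comm, state

def list_uninterruptible(proc_root: str = "/proc") -> list[tuple[int, str]]:
    """Return (pid, comm) for every process currently in D state."""
    stuck = []
    for entry in os.listdir(proc_root):
        if not entry.isdigit():
            continue  # skip non-process entries such as /proc/meminfo
        try:
            with open(os.path.join(proc_root, entry, "stat")) as f:
                pid, comm, state = parse_state(f.read())
        except OSError:
            continue  # process exited or is unreadable; ignore it
        if state == "D":
            stuck.append((pid, comm))
    return stuck

if __name__ == "__main__":
    for pid, comm in list_uninterruptible():
        print(pid, comm)
```

A process showing up here once is not necessarily stuck, since brief D states are normal during disk I/O; one that stays in D across repeated checks matches the case described above.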

Once again, our sincerest apologies for the headaches this has caused. The email systems are incredibly sophisticated, with hundreds of moving parts that have to work in unison. Most of the time it doesn't take long to sort things out, but in a cold bootstrap things can get complicated very fast, because our first and foremost priority is to not lose any email. Validating 34M+ emails takes an enormous amount of I/O and time to do right, and the system must be offline while it is done.
__________________
The FutureQuest Team
04-25-2022, 08:47 AM   Postid: 188709
MarkW
Site Owner
Join Date: Apr 2001
Location: UK
Posts: 164
Re: Network Outage

"...we were unable to recover 2,452."

Are these emails lost?


All times are GMT -4.


Hosted & Administrated by FutureQuest, Inc.
Images & content copyright © 1998-2019 FutureQuest, Inc.