12-05-2020, 07:31 AM
Postid: 188170
CTO FutureQuest, Inc.
Join Date: Jun 1998
Location: Z'ha'dum
Posts: 8,108
[FQuest Notice] SAN work progress updates
I am currently making a major push to unload one of the emergency storage nodes, shifting its datasets over to a new migration storage minion. This will let me break down the emptied node, pave over its storage areas with a fresh filesystem, reinsert it into the cluster, and let the cluster's self-healing take over to repopulate it.
The initial data migration task is pretty much transparent; however, I will post an update when I plan to twitch the SAN in order to do a hot=>cold=>hot cutover. It is technologically impossible to do a hot=>hot cutover without the potential of data loss; I tried to implement one on a destructible testing SAN, and in testing that technique data loss ranged from 2 to 70 chunks, which is completely unacceptable. The only way to guarantee zero data loss is to twitch the SAN.
The implication of this cutover is that any access to an unavailable chunk will temporarily block the mounting daemons until the migrated chunks are scanned and recorded in the master metadata server's memory cache. The custom technique I'm using (hot cache traversal) ensures they are scanned at the highest rate possible, minimizing the overall twitch time.
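The blocking behavior described above can be sketched with a tiny model: a mounting daemon that touches a not-yet-registered chunk waits on a condition until the scan pass records it. All names here (`MetadataCache`, `register`, `wait_for`) are hypothetical illustrations, not FutureQuest's actual internals.

```python
import threading

class MetadataCache:
    """Toy model of the master metadata server's in-memory chunk index.
    Hypothetical sketch; the real SAN internals are not public."""

    def __init__(self):
        self._known = set()          # chunk ids already scanned and registered
        self._cond = threading.Condition()

    def register(self, chunk_id):
        # Called by the scan pass as each migrated chunk is recorded.
        with self._cond:
            self._known.add(chunk_id)
            self._cond.notify_all()  # unblock any daemon waiting on this chunk

    def wait_for(self, chunk_id, timeout=None):
        # A mounting daemon touching an unregistered chunk blocks here
        # until the scanner records it; returns False on timeout.
        with self._cond:
            return self._cond.wait_for(lambda: chunk_id in self._known,
                                       timeout=timeout)

cache = MetadataCache()
cache.register("chunk-0001")
print(cache.wait_for("chunk-0001"))   # already registered: returns True at once
```

The faster the scan pass calls `register`, the shorter the window in which daemons sit blocked, which is exactly why the scan rate matters for the twitch time.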
__________________
The FutureQuest Team
12-06-2020, 12:56 PM
Postid: 188174
CTO FutureQuest, Inc.
Join Date: Jun 1998
Location: Z'ha'dum
Posts: 8,108
Re: [FQuest Notice] SAN work progress updates
I will be twitching the SAN in the next few minutes.
This will clear out the first emergency storage node so I can pave over its dual data silos with good intentions.
I was going to do this earlier today, but I saw too many duplicate chunks (which is normally fine and is a side effect of the stacking). However, there were enough of them to cause extra maintenance-loop processing that could have run on for a few days before reaching convergence. I'd rather that processing time be spent retrieving your emails than doing busy work.
Overall, I spent all of yesterday writing, testing, and debugging the new offline de-duplication system that will make storage processing more efficient. The first migration set (one of many) was dedup'd from 12.8M down to 9.5M.
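As a rough illustration of what offline de-duplication like this does (the actual FutureQuest implementation is not public, so this sketch is an assumption about the general technique), byte-identical chunks can be collapsed to a single stored copy plus references:

```python
import hashlib

def dedup_chunks(chunks):
    """Offline de-duplication sketch: collapse byte-identical chunks
    into one stored copy, with every duplicate becoming a reference.
    Illustrative only, not FutureQuest's implementation."""
    store = {}   # content hash -> the single stored copy
    refs = []    # per-input reference into the store
    for chunk in chunks:
        digest = hashlib.sha256(chunk).hexdigest()
        if digest not in store:
            store[digest] = chunk    # first copy wins
        refs.append(digest)          # later duplicates are just references
    return store, refs

chunks = [b"mail-a", b"mail-b", b"mail-a", b"mail-a"]
store, refs = dedup_chunks(chunks)
print(len(chunks), "->", len(store))   # 4 -> 2: only unique chunks are stored
```

Because the pass runs offline against existing data, live I/O is untouched; the payoff shows up later as fewer chunks for the maintenance loops to shuffle.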
__________________
The FutureQuest Team
12-06-2020, 02:10 PM
Postid: 188175
CTO FutureQuest, Inc.
Join Date: Jun 1998
Location: Z'ha'dum
Posts: 8,108
Re: [FQuest Notice] SAN work progress updates
This round of reintegration work has completed. Four storage arenas have been scanned into two minion data silos and registered with the master metadata servers. All data chunks are accounted for.
Our apologies for ripping off the emergency bandaids during midday, but we are trying to do as much as we can before the Monday rush is upon us. We still have a ways to go, but this was a major leap forward, giving us the ability to repave the two degraded silos.
__________________
The FutureQuest Team
12-06-2020, 02:14 PM
Postid: 188176
CTO FutureQuest, Inc.
Join Date: Jun 1998
Location: Z'ha'dum
Posts: 8,108
Re: [FQuest Notice] SAN work progress updates
I am now starting work on another silo that just completed its 57.5 hour data transfer journey. I should have this next set ready for reintegration in ~6 to ~8 hours.
__________________
The FutureQuest Team
12-07-2020, 01:47 AM
Postid: 188178
CTO FutureQuest, Inc.
Join Date: Jun 1998
Location: Z'ha'dum
Posts: 8,108
Re: [FQuest Notice] SAN work progress updates
Quick Notice: While working on the startup scripts for the new silo type, I accidentally stopped the companion minion daemon that was handling four storage arenas (via a script whose errant field splitting, lacking $3, saw "pt-2" as "pt"). As I wasn't able to perform a hot cache traversal on those arenas, it is taking up to 20 minutes to scan them all back in and register them with the master metadata system. Currently one of those arenas has finished, and I'm waiting for the other three (at ~65%).
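For anyone curious how "pt-2" can be mistaken for "pt": here is a hypothetical reconstruction of that class of field-splitting bug. The real script isn't shown, so the daemon names and matching logic below are assumptions, sketched in Python rather than shell.

```python
def matches_target_buggy(daemon_name, target):
    """Buggy match (reconstruction of the class of bug described above,
    not the actual script): split on '-' but compare only the second
    field ($2), so 'pt-2' and 'pt' look identical."""
    fields = daemon_name.split("-")
    return fields[1] == target          # ignores fields[2] ($3) entirely

def matches_target_fixed(daemon_name, target):
    """Fixed match: compare the entire suffix after the prefix, so the
    trailing '-2' is no longer silently dropped."""
    prefix, _, suffix = daemon_name.partition("-")
    return suffix == target

# 'minion-pt' was the intended target, but the buggy match also hits 'minion-pt-2'.
print(matches_target_buggy("minion-pt-2", "pt"))   # True  -> wrong daemon stopped
print(matches_target_fixed("minion-pt-2", "pt"))   # False -> only exact matches stop
```

The fix is simply to stop discarding fields: match on the whole identifier, not a prefix of its split fields.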
Service should return to normal soon; however, I will (tentatively) be twitching the SAN at least two more times tonight to finish the current migrations.
Our apologies to anyone who was affected by this technical snafu.
__________________
The FutureQuest Team
12-08-2020, 05:15 AM
Postid: 188179
CTO FutureQuest, Inc.
Join Date: Jun 1998
Location: Z'ha'dum
Posts: 8,108
Re: [FQuest Notice] SAN work progress updates
I am turning up the replication routines to clear out a backlog caused by the prior days' work. If I do not do this now, it will get exponentially worse and could turn a few days of catch-up into a week or more.
Performance will be degraded until 8am ET.
__________________
The FutureQuest Team
12-08-2020, 08:08 AM
Postid: 188180
CTO FutureQuest, Inc.
Join Date: Jun 1998
Location: Z'ha'dum
Posts: 8,108
Re: [FQuest Notice] SAN work progress updates
I have dialed back replication from 80% to 40% priority for the next hour. At 9am, I will further reduce it to 10%. The backlog has been reduced from 1,213,874 down to 238,227 pending chunks.
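A priority dial like this can be sketched as a per-tick work budget: the percentage scales how much of each scheduling tick the replicator may spend on backlog chunks, leaving the rest for live I/O. All names and the per-tick ceiling below are illustrative assumptions, not the SAN's real scheduler.

```python
def replication_budget(pending, priority_pct, max_per_tick=100_000):
    """Sketch of a replication priority dial (hypothetical numbers).
    Returns how many backlog chunks may be replicated this tick."""
    budget = int(max_per_tick * priority_pct / 100)
    return min(pending, budget)       # never schedule more than is pending

backlog = 1_213_874
for pct in (80, 40, 10):
    # Lower priority -> smaller per-tick budget -> slower drain, lighter load.
    print(pct, "% ->", replication_budget(backlog, pct), "chunks this tick")
```

The trade-off is exactly the one described in the posts above: a high percentage drains the backlog fast at the cost of degraded foreground performance, while a low percentage keeps service snappy but lets the backlog linger.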
__________________
The FutureQuest Team
12-09-2020, 08:34 AM
Postid: 188184
Site Owner
Join Date: Oct 2013
Posts: 43
Re: [FQuest Notice] SAN work progress updates
8:30am Eastern on Wednesday 12/9, anything going on we should know about? I seem to be having issues sending mail through Outlook or logging in to QuestMail.
12-09-2020, 08:41 AM
Postid: 188185
Site Owner
Join Date: Nov 2000
Location: Grove City, PA, USA
Posts: 28
Re: [FQuest Notice] SAN work progress updates
E-mail appears to be down yet again.
12-09-2020, 08:44 AM
Postid: 188186
CTO FutureQuest, Inc.
Join Date: Jun 1998
Location: Z'ha'dum
Posts: 8,108
Re: [FQuest Notice] SAN work progress updates
Mohawk, you beat me to it. I was a little delayed, hair on fire, when one of the storage nodes stepped out for breakfast (kernel driver OOPS) and forced a re-convergence. Unfortunately, this opens a 10 to 15 minute window of high I/O load that slows everything down until the master can update its metadata.
I/O performance should return to normal shortly.
__________________
The FutureQuest Team