Server Degradation - [solved]

Posted: Fri Dec 11, 2020 8:09 am
Red Squirrel
AoV Owner
Joined: 13 Dec 2006
Posts: 8277
Location: Ontario, Canada
Everything is good and we are now back to full disk redundancy.
----
Just a heads up: I have 2 drives that have failed in one of the RAID arrays on the NAS, and it just so happens that both the shard's VM and the database server are on that array.
I have 4 drives on order from 2 retailers to maximize the chance of getting them sooner, and to have 2 spares afterward, since a couple of the other drives are throwing errors too. The drives in this array have around 60 thousand hours on them (nearly 7 years of power-on time), so I think it's just a matter of time until they all need to be swapped out.
In order to minimize load on this degraded array and reduce the chance of another failure, I have decided to turn off the database server. The shard will continue to run, but if anything happens, such as a crash, it will result in a revert to around Dec 11 2:45am ET. (no longer the case)
You can continue to play as normal, and if all goes well there will be no revert.
I'm on night shifts right now so I don't want to do anything too drastic at this point as I can't dedicate my focus 100% to it, but once I'm off again, I want to look at migrating the database server to another array. In theory I should be able to do that while the shard continues to run, and when I bring it back up, it will start to save again.
The shard itself does not really produce much disk IO, so it will stay on that array.
For your viewing pleasure, this is a shot of the carnage:
If I'm understanding this right, any of the B drives can fail and we will be safe, but if any of the A ones fail, then the entire array is lost. I do have backups, but I hope I don't need to use them as it's still a pain to rebuild everything.
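For anyone curious about the A/B logic above, here is a minimal Python sketch. It assumes the array is a stripe across mirrored pairs (RAID 10 style), where the "A" drives are the ones whose mirror partner has already died and the "B" drives still have a healthy partner. The post doesn't confirm the actual layout, and the drive names and pair list below are made up for illustration.

```python
def array_survives(mirror_pairs, failed):
    """A stripe of mirrors survives while every pair has a working member."""
    return all(any(d not in failed for d in pair) for pair in mirror_pairs)


def critical_drives(mirror_pairs, failed):
    """Working drives whose loss alone would take the whole array down."""
    return sorted(
        d
        for pair in mirror_pairs
        for d in pair
        if d not in failed and not array_survives(mirror_pairs, failed | {d})
    )


# Hypothetical 8-drive layout (assumption): four mirrored pairs striped together.
pairs = [("d0", "d1"), ("d2", "d3"), ("d4", "d5"), ("d6", "d7")]
failed = {"d1", "d3"}  # two failures that landed in different pairs

print(array_survives(pairs, failed))   # array is degraded but still up
print(critical_drives(pairs, failed))  # the "A" drives: lone survivors of a pair
```

Under these assumptions, the drives returned by `critical_drives` play the role of the "A" drives, and every other working drive is a "B" drive that can still fail without losing the array.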
my blog
Honk if you love Jesus, text if you want to meet Him! |
Last edited by Red Squirrel on Tue Dec 29, 2020 3:34 am; edited 4 times in total

Posted: Sat Dec 12, 2020 7:51 am
Red Squirrel
AoV Owner
It's quiet tonight at work and I got everything done that I needed to.
Currently migrating the database VM to another LUN. This is kinda critical, as that very act is putting a lot of strain on the array, but it's mostly reads and not writes, so it should be fine...
Once it's on the new LUN I will fire the VM back up, turn the database server back on, and start to sync the shard back up.

Posted: Sat Dec 12, 2020 8:25 am
Red Squirrel
AoV Owner
The database server is now on the new array and running. The server is now synced with the database and there is no longer a risk of revert.
However, the shard VM itself remains on the degraded array, so there is still a risk of downtime, but not of data loss.
I still have no ETA for the new hard drives, and with the weekend they won't really move until Monday. According to the tracking number from one retailer, the drives are in Richmond Hill, which is here in Ontario, so once they do ship it should only be a few days.

Posted: Sat Dec 12, 2020 9:00 pm
ggkthx
Joined: 13 Jan 2009
Posts: 953
Location: MN
This is quite the adventure.
_________________
I didn't choose the Fel life, the Fel life chose me.

Posted: Sun Dec 13, 2020 12:23 am
Red Squirrel
AoV Owner
Lol yeah, quite the adventure. I can't wait for those drives to come in... it's one of my higher performance RAID arrays, so I have a lot on there.
I'm actually due for an overall upgrade to increase capacity, since most of my arrays are running low on space and are on fairly old drives, but the cost of living keeps going up, so I don't really have the money to buy server stuff anymore these days.

Posted: Sat Dec 19, 2020 1:49 am
Red Squirrel
AoV Owner
So the two replacement drives came in. I pulled out the 2 dead drives and put the replacements in. I'm running some tests on them to make sure they're good, then I will add them to the array and let it rebuild.
At this point the shard's data is NOT at risk, as per my last post about migrating the database server to another LUN, but downtime is still possible should the array get more drive failures.
I am not too worried though, and I think everything will go smoothly. This should be over within 1-2 days.

Posted: Sat Dec 19, 2020 4:51 am
ggkthx
Cool cool cool. Hope all goes smoothly!

Posted: Sat Dec 19, 2020 5:21 am
Red Squirrel
AoV Owner
First round of testing (long SMART test) completed without error on both drives.
Doing a full write test now, then I will do a full read back test. This makes sure there are no bad sectors.
It's so odd looking at the stats and seeing a drive with only a few power-on hours compared to like 60 thousand lol. The old drives had a pretty good run.
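The write-then-read-back idea can be sketched in Python as below. The post doesn't say which tool was actually used, so this is only an illustration: the function names, block size, and pattern scheme are assumptions, and it works on an ordinary scratch file rather than a raw device.

```python
import hashlib
import os

BLOCK_SIZE = 1024 * 1024  # 1 MiB per block; arbitrary choice for this sketch


def pattern_block(index, size=BLOCK_SIZE):
    """Build a deterministic block of bytes derived from the block index."""
    seed = hashlib.sha256(str(index).encode()).digest()  # 32 bytes
    reps = size // len(seed) + 1
    return (seed * reps)[:size]


def write_pass(path, blocks):
    """Write `blocks` pattern blocks to `path` and flush them to disk."""
    with open(path, "wb") as f:
        for i in range(blocks):
            f.write(pattern_block(i))
        f.flush()
        os.fsync(f.fileno())


def read_pass(path, blocks):
    """Read everything back; return the indexes of blocks that don't match."""
    bad = []
    with open(path, "rb") as f:
        for i in range(blocks):
            if f.read(BLOCK_SIZE) != pattern_block(i):
                bad.append(i)
    return bad
```

Usage would be `write_pass(path, n)` followed by `read_pass(path, n)`, with an empty list meaning every block read back as written. A real burn-in would cover the whole drive and bypass the OS page cache, otherwise the read pass may be served from memory instead of the disk.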

Posted: Sat Dec 19, 2020 8:53 am
Red Squirrel
AoV Owner
All tests were good. Rebuild in progress!

Posted: Sun Dec 20, 2020 12:59 am
Red Squirrel
AoV Owner
Everything is good now. The RAID array is nominal.
I have 2 other drives on the way, which I'll keep as spares as I do have more drives showing errors.
All times are GMT