Age of Valor




Server Degradation - [solved]
Posted: Fri Dec 11, 2020 8:09 am
Red Squirrel
AoV Owner
Joined: 13 Dec 2006
Posts: 8283
Location: Ontario, Canada




Everything is good and we are now back to full disk redundancy.

----
Just a heads up: 2 drives have failed in one of the RAID arrays on the NAS, and it just so happens that both the shard's VM and the database server are on that array.

I have 4 drives on order from 2 retailers to maximize the chance of getting them sooner, and to have 2 spares afterwards, since a couple of the other drives are throwing errors too. The drives in this array have around 60 thousand power-on hours on them, so I think it's just a matter of time until they all need to be swapped out.
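For a sense of scale, 60 thousand power-on hours is close to seven years of continuous running. A quick sketch of that arithmetic (the 365.25-day average year is an assumption for the conversion, not something taken from the drive stats):

```python
# Convert a drive's power-on hours into years of continuous running.
HOURS_PER_YEAR = 24 * 365.25  # assumed average year length, leap days included


def power_on_years(hours):
    """Return how many years of nonstop operation `hours` represents."""
    return hours / HOURS_PER_YEAR


print(round(power_on_years(60_000), 1))  # prints 6.8
```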

In order to minimize load on this degraded array and reduce the chance of another failure, I have decided to turn off the database server. The shard will continue to run, but if anything happens, such as a crash, it will result in a revert to around Dec 11 2:45am ET. (no longer the case)

You can continue to play as normal, and if all goes well there will be no revert.

I'm on night shifts right now so I don't want to do anything too drastic at this point as I can't dedicate my focus 100% to it, but once I'm off again, I want to look at migrating the database server to another array. In theory I should be able to do that while the shard continues to run, and when I bring it back up, it will start to save again.

The shard itself does not really produce much disk IO, so it will stay on the degraded array for now.

For your viewing pleasure, this is a shot of the carnage:



If I'm understanding this right, any of the B drives can fail and we will be safe, but if any of the A ones fail, then the entire array is lost. I do have backups, but I hope I don't need to use them, as it's still a pain to rebuild everything.
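To illustrate the A/B split, here is a minimal sketch that assumes the array is a RAID 10 (striped mirrored pairs). That is a guess, not something confirmed anywhere in this thread, and the drive names and three-pair layout are made up. In that kind of layout, the surviving partner of a failed drive is an "A" drive: lose it and the whole array goes.

```python
# Hypothetical layout: three mirrored pairs striped together (RAID 10).
# Drive names are invented for illustration only.
MIRROR_PAIRS = [("sda", "sdb"), ("sdc", "sdd"), ("sde", "sdf")]


def array_survives(failed, pairs=MIRROR_PAIRS):
    """The array survives while every mirror pair has at least one working drive."""
    failed = set(failed)
    return all(not (a in failed and b in failed) for a, b in pairs)


def critical_drives(failed, pairs=MIRROR_PAIRS):
    """Working drives whose failure would now take down the whole array."""
    failed = set(failed)
    all_drives = [drive for pair in pairs for drive in pair]
    return sorted(
        drive
        for drive in all_drives
        if drive not in failed and not array_survives(failed | {drive}, pairs)
    )


# Two drives from different pairs have failed, so the array is degraded but up.
print(array_survives({"sda", "sdc"}))   # prints True
print(critical_drives({"sda", "sdc"}))  # prints ['sdb', 'sdd']
```

In this made-up example, sdb and sdd play the role of the A drives, while either one of sde or sdf could still fail without losing the array.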
_________________

my blog

Honk if you love Jesus, text if you want to meet Him!


Last edited by Red Squirrel on Tue Dec 29, 2020 3:34 am; edited 4 times in total
Posted: Sat Dec 12, 2020 7:51 am
Red Squirrel
AoV Owner
Joined: 13 Dec 2006
Posts: 8283
Location: Ontario, Canada




It's quiet tonight at work and I got everything done that I needed to.

Currently migrating the database VM to another LUN. This is kinda critical, as that very act is putting a lot of strain on the array, but it's mostly reads and not writes, so it should be fine...

Once it's on the new LUN I will fire the VM back up and turn the server back on and start to sync the shard back up.
Posted: Sat Dec 12, 2020 8:25 am
Red Squirrel
AoV Owner
Joined: 13 Dec 2006
Posts: 8283
Location: Ontario, Canada




Database server now on new array and running. Server is now synced with database and there is no longer a risk of revert.

However, the shard VM itself remains on the degraded array, so there is still a risk of downtime, but not of data loss.

I still have no ETA for the new hard drives, and with the weekend they won't really move until Monday. According to the tracking number from one retailer, the drives are in Richmond Hill, which is here in Ontario, so once they do ship it should only be a few days.
Posted: Sat Dec 12, 2020 9:00 pm
ggkthx
Joined: 13 Jan 2009
Posts: 953
Location: MN




This is quite the adventure.
_________________

I didn't choose the Fel life, the Fel life chose me.
Posted: Sun Dec 13, 2020 12:23 am
Red Squirrel
AoV Owner
Joined: 13 Dec 2006
Posts: 8283
Location: Ontario, Canada




Lol yeah, quite the adventure. I can't wait for those drives to come in... it's one of my higher-performance RAID arrays, so I have a lot on there.

I'm actually due for an overall upgrade to increase capacity, since most of my arrays are running low on space and are on fairly old drives, but the cost of living keeps going up, so I don't really have money to buy server stuff these days.
Posted: Sat Dec 19, 2020 1:49 am
Red Squirrel
AoV Owner
Joined: 13 Dec 2006
Posts: 8283
Location: Ontario, Canada




So the two replacement drives came in. I pulled out the 2 dead drives and put the replacements in. I'm running some tests on them to make sure they're good, then I will add them to the array and let it rebuild.

At this point the shard's data is NOT at risk, since the database server was migrated to another LUN (see my earlier post), but downtime is still possible should the array get more drive failures.

I am not too worried though, and I think everything will go smoothly. This should be over within 1-2 days.
Posted: Sat Dec 19, 2020 4:51 am
ggkthx
Joined: 13 Jan 2009
Posts: 953
Location: MN




Cool cool cool. Hope all goes smoothly!
Posted: Sat Dec 19, 2020 5:21 am
Red Squirrel
AoV Owner
Joined: 13 Dec 2006
Posts: 8283
Location: Ontario, Canada




First round of testing (long SMART test) completed without error on both drives.

Doing a full write test now, then I will do a full read-back test. This makes sure there are no bad sectors.
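The idea behind a write test followed by a read-back test can be sketched like this. It is only an illustration, not the tool actually used here: the function name, block size, and fill pattern are all made up, and it works on a scratch file rather than a raw drive.

```python
import os
import tempfile


def write_then_verify(path, total_bytes, block_size=1024 * 1024, pattern=b"\xaa"):
    """Fill `path` with a known pattern, read it back, and return the
    offsets of any blocks that do not match what was written."""
    block = pattern * block_size

    # Write pass: cover the whole target with the pattern.
    with open(path, "wb") as f:
        written = 0
        while written < total_bytes:
            chunk = block[: min(block_size, total_bytes - written)]
            f.write(chunk)
            written += len(chunk)
        f.flush()
        os.fsync(f.fileno())  # push the data out of Python and OS buffers

    # Read pass: compare every block against the pattern.
    bad_offsets = []
    with open(path, "rb") as f:
        offset = 0
        while offset < total_bytes:
            expected = block[: min(block_size, total_bytes - offset)]
            if f.read(len(expected)) != expected:
                bad_offsets.append(offset)
            offset += len(expected)
    return bad_offsets


if __name__ == "__main__":
    # Demo against a small temporary file, never a real disk.
    with tempfile.NamedTemporaryFile(delete=False) as tmp:
        scratch = tmp.name
    try:
        print(write_then_verify(scratch, 4 * 1024 * 1024))  # prints []
    finally:
        os.remove(scratch)
```

Note that this sketch reads back through the OS page cache, so on a real drive it would not prove the data actually came off the disk. A proper burn-in tool works on the raw device and accounts for that.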

It's so odd looking at the stats and seeing a drive with only a few power-on hours compared to like 60 thousand lol. The old drives had a pretty good run.
Posted: Sat Dec 19, 2020 8:53 am
Red Squirrel
AoV Owner
Joined: 13 Dec 2006
Posts: 8283
Location: Ontario, Canada




All tests were good. Rebuild in progress!


Posted: Sun Dec 20, 2020 12:59 am
Red Squirrel
AoV Owner
Joined: 13 Dec 2006
Posts: 8283
Location: Ontario, Canada




Everything is good now. The RAID array is nominal.

I have 2 other drives on the way which I'll keep as spares as I do have more drives showing errors.
All times are GMT  