Age of Valor
Ongoing Database Issue [most likely solved]
Posted: Sat Feb 17, 2018 9:29 pm
Red Squirrel
AoV Owner
Joined: 13 Dec 2006
Posts: 8194
Location: Ontario, Canada




Looks like the issue we've been facing for the past few days may be related to a failing hard drive in one of the RAID arrays. I've ordered a new drive. That particular RAID array is not actually the one that the server VM is on, but I think when it gets stuck it causes the whole file server to block, which then affects other stuff too. This is just a guess though, as the issue actually looks more like a network issue, so it's kinda weird. But there is a drive failing, so may as well tackle that and see if it helps. Has to be done anyway.

As a side note I do have a code issue with the SQL system as it should not be corrupting the way it's doing even with DB write issues. The system was designed in a way for situations like this not to actually cause database corruption and for the pending data to simply wait until the DB is available again. So I will have to look at that.
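
A minimal sketch of that intended behaviour, written in Python purely for illustration. This is not the shard's actual code: `PendingWriteQueue` and `write_fn` are made-up names, and the assumption is that a failed DB write raises `ConnectionError`.

```python
from collections import deque


class PendingWriteQueue:
    """Holds writes in memory until the database accepts them (hypothetical helper)."""

    def __init__(self, write_fn):
        # write_fn is an assumed callable that raises ConnectionError
        # when the database cannot be reached.
        self._write_fn = write_fn
        self._pending = deque()

    def submit(self, record):
        """Queue a record, then try to flush everything that is waiting."""
        self._pending.append(record)
        return self.flush()

    def flush(self):
        """Write queued records in order and stop at the first failure.

        Returns the number of records written. A record leaves the queue
        only after its write succeeded, so pending data just waits while
        the database is down.
        """
        written = 0
        while self._pending:
            try:
                self._write_fn(self._pending[0])
            except ConnectionError:
                break  # database unavailable: keep the rest for later
            self._pending.popleft()
            written += 1
        return written

    def __len__(self):
        return len(self._pending)
```

Calling `flush()` again on a timer would drain whatever piled up once the DB is reachable.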

I am reluctant to increase the frequency of the backups for now as it will simply put more strain on the server, so I will just keep the backups at the same rate while I wait for the replacement drive to arrive. Shard DB backups run several times per day already.

If by chance you play and do something such as get an artifact and are worried about the crash happening, just shoot me a PM and I can run a backup manually.
_________________

my blog

Honk if you love Jesus, text if you want to meet Him!


Last edited by Red Squirrel on Thu Sep 27, 2018 4:37 pm; edited 1 time in total
Posted: Sat Feb 24, 2018 4:20 am
Red Squirrel




The new drive came in, so I will be replacing the failing one in the next few days.
Posted: Tue Feb 27, 2018 5:31 am
Red Squirrel




Drive replaced and RAID rebuilt.

Will give it a few days to see whether or not that fixes it. I have a feeling it's not the cause, but we'll see. Had to be done anyway.
Posted: Mon Mar 05, 2018 2:24 am
Red Squirrel




This may possibly be solved, but I'm not too sure yet. It's related to an overall issue with my storage system where if there is too much load, things start to crash. I never was able to figure out that issue and it kind of went away on its own, but then it resurfaced and now it's hitting the DB server instead of other VMs. I figured maybe the failing drive was not helping though.

Will continue to monitor.

It's still safe to play; the worst thing that can happen is losing a couple hours of progress if it does happen. I do need to look at redesigning part of the DB system, as it should not corrupt like this regardless of whether the DB becomes unavailable in the middle of saving (which is what seems to happen during high loads), so I will look at fixing that at some point.
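
One common way to get that "no corruption even if the save dies halfway" property is to wrap the whole save in a single transaction. The sketch below uses Python's built-in `sqlite3` only as a stand-in so it runs anywhere; the shard's real database, schema, and save code are not shown in this thread, and the `items` table and `save_snapshot` name are assumptions.

```python
import sqlite3


def save_snapshot(conn, items):
    """Replace the saved items in one transaction (all or nothing)."""
    # "with conn" commits if the block finishes and rolls back if an
    # exception is raised, so a failure mid-save leaves the old rows
    # exactly as they were.
    with conn:
        conn.execute("DELETE FROM items")
        conn.executemany(
            "INSERT INTO items (id, name) VALUES (?, ?)", items
        )
```

If the connection drops or an insert fails partway through, the exception still propagates to the caller, but the previously saved rows are what remain in the table.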
Posted: Sat Mar 31, 2018 4:18 pm
Red Squirrel




Unfortunately the drive I replaced is not the cause of this. I really don't know what it is at this point. This issue just started randomly with no explanation.

Basically it looks like the network randomly drops for no reason and then it causes the shard to crash hard instead of just trying again later to write whatever it is it's trying to write. The crash logs don't give line numbers because I think it's the core that's crashing and not the main part, so this makes it extremely hard to troubleshoot.
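
For reference, "try again later instead of crashing" can be sketched roughly like this in Python. None of this is the shard's code: `write_with_retry` and `write_fn` are hypothetical names, and it assumes a dropped network shows up as `ConnectionError` or `TimeoutError`.

```python
import time


def write_with_retry(write_fn, record, attempts=5, base_delay=0.5, sleep=time.sleep):
    """Call write_fn(record), retrying on connection errors.

    Returns True once the write succeeds, or False after the last
    attempt so the caller can hold on to the record and try later.
    """
    for attempt in range(attempts):
        try:
            write_fn(record)
            return True
        except (ConnectionError, TimeoutError):
            if attempt < attempts - 1:
                # Wait longer after each failure: 0.5s, 1s, 2s, ...
                sleep(base_delay * 2 ** attempt)
    return False
```

The `sleep` parameter is only there so the delay can be swapped out in tests; in normal use the default `time.sleep` applies.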

Given the shard is pretty much dead, I'm just going to keep restoring backups for the time being every time this happens. If you log in and everything is missing, assume that the issue happened and that it will get restored. I do get alerts on my phone when it crashes, so chances are I already know about it when it happens; I might just be at work or sleeping or something.
Posted: Sun Jul 29, 2018 7:59 pm
Red Squirrel




Happened again.

Restored backup from Fri Jul 27 01:00:27 EDT 2018.

I have summer projects I've been working on, but I do seriously want to get back into coding a bit for the shard at some point, mostly back end fixes though, and this is one of them.
Posted: Sun Sep 02, 2018 4:30 pm
Red Squirrel




Had another crash.

Restored DB from: Sun Sep 2 06:00:08 EDT 2018

I still have to figure out why these crashes keep happening, but I also have an idea to redesign the DB system to be more efficient, so I might just do that and it might by chance fix the crash issue too. Not that the shard is all that active nowadays to start putting this kind of work into it, but the whole idea is I want it to be set and forget... and right now it's not.

As always let me know if you see any major issues but everything should be normal as of the backup date.
Posted: Wed Sep 05, 2018 8:55 am
Red Squirrel




I made some changes to the file system/program files of the shard and restructured a few things. Long story short, I made it so the shard's data files (executables) are local to the VM, instead of on an SMB share. One hunch I have is that when disk I/O on my network grinds to a halt during backup jobs (I still don't know why it does that), it would actually cause some SMB faults, which in turn would crash the whole server.

The database issue is not fixed, but if I can at least fix the crashes then the database issue will stop happening.

So I will leave it at that for the time being and hopefully the crashes stop. Either way, my next step is to redesign the snapshot portion of the database system. I actually implemented it quite poorly and it could be done better, so I'll want to do that. It will also generate less strain on the SQL server, so that will be a bonus.

Shard should be running as normal now in its new environment setup; let me know if there are any weird issues.
All times are GMT  