HDD-delay-safe fwrite in C++ (need help for RNPC)
#1

Getting technical now

There's a pretty tough and stealthy bug in RNPC that randomly crashes NPCs. The crashes seemed to be connected to HDD speed, and thanks to a helpful user here this is pretty much confirmed now. NPCs crash much more often on slow HDDs, and don't seem to crash at all on an SSD.

RNPC creates the rec file in the plugin and writes it to the disk, then tells the npcmode script to play it back. There's a short delay between write and playback due to the NPC's ping, but it seems like this isn't always enough. My theory is that even though fwrite should write instantly, it takes a moment until the whole file is readable (especially on a VPS, as those don't have direct hardware access). So the NPC sometimes tries to play files that don't exist yet or are incomplete, and crashes.

I've got several ideas for how to fix this problem, but none of them is perfect, mostly because they slow RNPC down too much.

1. Add a delay of a few ms before playback, to give the HDD more time for writing. (This slows down NPC reaction times, and still isn't 100% safe.)
2. Add a quick checksum test in the npcmode to ensure the file is complete. (Difficult, as npcmodes can't access the rec files directly.)
3. Create and mount a ramdisk for the rec files and link to it. (This would be a great solution, as it would greatly boost performance, but I have no idea yet if this is even possible from the plugin.)
4. Add a check to the rec file write code to ensure the file is complete before continuing. (This could work, but as it runs in the main thread, it could block the server for too long on slow HDDs. Maybe memory-mapped files would work, but I don't know if SA-MP can handle them.)

Does anyone have better ideas? Or is there someone with enough knowledge of server virtualisation to tell me if my theory is plausible?
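
A minimal sketch of idea 4, assuming the plugin writes the recording with plain stdio. The helper name `write_rec_checked` is hypothetical and not part of RNPC. It writes the data, closes the file, then reopens it and compares the size with the number of bytes that were supposed to be written. Note that the re-read goes through the same operating system cache as the write, so whether this catches the failure described above is untested.

```cpp
#include <cstddef>
#include <cstdio>
#include <string>
#include <vector>

// Hypothetical helper (not part of RNPC): writes `data` to `path` and
// reports whether the file can be reopened with the expected size.
bool write_rec_checked(const std::string& path,
                       const std::vector<unsigned char>& data)
{
    std::FILE* fp = std::fopen(path.c_str(), "wb");
    if (fp == nullptr)
        return false;

    const std::size_t written = std::fwrite(data.data(), 1, data.size(), fp);

    // fclose flushes the stdio buffer to the OS. It does not guarantee
    // that the data has reached the physical disk.
    const bool closed = (std::fclose(fp) == 0);
    if (written != data.size() || !closed)
        return false;

    // Reopen the file and measure its size the way a reader would.
    std::FILE* check = std::fopen(path.c_str(), "rb");
    if (check == nullptr)
        return false;
    const bool seek_ok = (std::fseek(check, 0, SEEK_END) == 0);
    const long size = std::ftell(check);
    std::fclose(check);

    return seek_ok && size >= 0 &&
           static_cast<std::size_t>(size) == data.size();
}
```

The plugin would only tell the npcmode to start playback when this returns true. The check runs in the calling thread, so the blocking concern from idea 4 still applies.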
Reply
#2

I get NPCs disconnecting that don't even play back any recordings.
Reply
#3

I'm assuming this is only happening on larger .rec files. Maybe perform a ifexists before running the file? Can't really think of anything outside of imperfect delay or direct reassembly of the server.exe
Reply
#4

Is running a new thread in the plugin to check if the file is complete an option? You'd eliminate hassle with blocking the main thread and you can still tell the NPC to open the file once it's done.
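
A rough sketch of the threaded approach, assuming C++14 and the standard library only. The names `write_rec_async` and `is_ready` are made up for illustration. The write happens on a worker thread, and the main thread polls the result without blocking, for example once per server tick.

```cpp
#include <chrono>
#include <cstddef>
#include <cstdio>
#include <future>
#include <string>
#include <utility>
#include <vector>

// Hypothetical helper: writes the recording on a worker thread and returns
// a future that becomes true once the file was written and closed cleanly.
std::future<bool> write_rec_async(std::string path,
                                  std::vector<unsigned char> data)
{
    return std::async(std::launch::async,
        [path = std::move(path), data = std::move(data)]() {
            std::FILE* fp = std::fopen(path.c_str(), "wb");
            if (fp == nullptr)
                return false;
            const std::size_t written =
                std::fwrite(data.data(), 1, data.size(), fp);
            const bool closed = (std::fclose(fp) == 0);
            return written == data.size() && closed;
        });
}

// Non-blocking poll: true once the worker has finished (successfully or not).
bool is_ready(std::future<bool>& f)
{
    return f.wait_for(std::chrono::seconds(0)) == std::future_status::ready;
}
```

Once `is_ready` returns true and `get()` reports success, the plugin can send the playback command to the NPC. How that signal reaches the npcmode is left out here.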
Reply
#5

You can force-write the file cache (stuff you write to the file with "fwrite" is first written into a cache) into the file on the physical disk with "fsync" on Linux/OSX. A call to that function is blocking.
It's not that easy on Windows though. It seems that fsync does not exist on Windows platforms. You'd have to use the Windows API to create a file with a special flag (FILE_FLAG_WRITE_THROUGH).
Here's a quick example on how to do that:
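Code:
#include <windows.h>
#include <cstring>

// Writes a test string to "something.rec" with write-through enabled.
// Returns true only if every byte was written.
bool write_through_example()
{
	// CreateFileA takes a narrow string regardless of the UNICODE setting.
	// Share mode 0: no other process can open the file until CloseHandle.
	// CREATE_NEW fails if the file already exists (CREATE_ALWAYS overwrites).
	HANDLE file = CreateFileA("something.rec", GENERIC_WRITE, 0, NULL, CREATE_NEW,
		FILE_ATTRIBUTE_NORMAL | FILE_FLAG_WRITE_THROUGH, NULL);
	if (file == INVALID_HANDLE_VALUE)
	{
		//error: file could not be created (see GetLastError)
		return false;
	}

	char data[] = "data to write to the file";
	DWORD
		bytes_to_write = (DWORD)strlen(data),
		bytes_written = 0;
	// WriteFile returns a BOOL (an int), not a C++ bool
	BOOL res = WriteFile(file, data, bytes_to_write, &bytes_written, NULL);

	// success only if the call succeeded and nothing was cut short
	bool ok = (res != FALSE) && (bytes_to_write == bytes_written);

	CloseHandle(file);
	return ok;
}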
That solution (WriteFile in particular) will also block until everything is actually written to the file on the disk. This will of course also halt the SA-MP server. You'd have to thread all file operations to avoid that, but I guess that'd be a little bit overkill.
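
For the stdio route, a small sketch of what the flush could look like. The helper name `flush_to_disk` is an assumption for illustration. `fsync` works on a file descriptor, not on a `FILE*`, so the stdio buffer has to be flushed with `fflush` first. On Windows the C runtime offers `_commit`, which may be an alternative to opening the file with FILE_FLAG_WRITE_THROUGH. I have not tested this with SA-MP.

```cpp
#include <cstdio>
#ifdef _WIN32
#include <io.h>      // _commit
#else
#include <unistd.h>  // fsync
#endif

// Hypothetical helper: pushes everything written to `fp` so far down to
// the storage device. Blocks until the OS reports completion.
bool flush_to_disk(std::FILE* fp)
{
    // Step 1: move the stdio (user-space) buffer into the OS cache.
    if (std::fflush(fp) != 0)
        return false;

    // Step 2: ask the OS to write its cache for this file to the device.
#ifdef _WIN32
    return _commit(_fileno(fp)) == 0;
#else
    return fsync(fileno(fp)) == 0;
#endif
}
```

Call it after the last `fwrite` and before `fclose`. Like WriteFile with write-through, it blocks the calling thread.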
Reply
#6

Quote:
Originally Posted by Pottus
I get NPCs disconnecting that don't even play back any recordings.
IIRC you're using a custom version, right? Is it based on 0.4 or 0.4.1? 0.4.1 fixed a bug that crashed NPCs when receiving zero-length client messages, which might cause the idle crashes, in case you didn't already fix that yourself.

Quote:
Originally Posted by Joe Staff
I'm assuming this is only happening on larger .rec files. Maybe perform a ifexists before running the file? Can't really think of anything outside of imperfect delay or direct reassembly of the server.exe
That's available as an optional setting in the npc script. But it doesn't seem to prevent the crashes (or not all of them, it's hard to tell). I guess files might already exist but not yet return their full content. I also hoped fexist would fix it, and I really can't explain why things behave so strangely. It must be about the HDD buffer of virtual machines. On direct hardware the problem shouldn't occur at all, but most people use virtual servers, so this finally needs to be fixed.

Quote:
Originally Posted by Hiddos
Is running a new thread in the plugin to check if the file is complete an option? You'd eliminate hassle with blocking the main thread and you can still tell the NPC to open the file once it's done.
That would be a nice solution, but it requires a lot of plugin<->gamemode<->npcmode communication to set ready and playback flags. I also don't know much about plugin threading, but that's no real problem. I just have to put some time into learning it. Threading would also greatly increase performance, so sooner or later I'd have to go that way anyway.

Quote:
Originally Posted by maddinat0r
You can force-write the file cache (stuff you write to the file with "fwrite" is first written into a cache) into the file on the physical disk with "fsync" on Linux/OSX. A call to that function is blocking.
It's not that easy on Windows though. It seems that fsync does not exist on Windows platforms. You'd have to use the Windows API to create a file with a special flag (FILE_FLAG_WRITE_THROUGH).
Here's a quick example on how to do that:


That solution (WriteFile in particular) will also block until everything is actually written to the file on the disk. This will of course also halt the SA-MP server. You'd have to thread all file operations to avoid that, but I guess that'd be a little bit overkill.
That's exactly what I was originally looking for. But you're right, the blocking time really isn't nice and can't be avoided, and it would probably mess up server performance on slow HDDs.
I guess I won't get around threading. fsync could then be a good alternative to iterative file checks on Linux systems.

Thanks for the input! It gave me some good ideas for ways to fix the problem.
Reply
#7

http://stackoverflow.com/questions/8...te-to-the-file

Check the answer by Ben Voigt (Accepted one)
Reply
#8

Quote:
Originally Posted by DRIFT_HUNTER
http://stackoverflow.com/questions/8...te-to-the-file

Check the answer by Ben Voigt (Accepted one)
Locking the files is a nice idea, but it needs to be checked how samp-npc treats locks. I guess it won't wait for the unlock, but will just be denied access (and so probably crash like it does when the file doesn't exist). But if it works correctly, this would save a lot of inter-script communication and would be the perfect addition to the "check files in a thread" solution. I'll test that.
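
For testing on Linux, a sketch of the locking idea using `flock`. It does not reproduce the linked answer, and both helper names are hypothetical. `flock` locks are advisory: they only stop readers that also call `flock`, so this helps only if the reading side cooperates. Whether samp-npc does is exactly the open question above.

```cpp
#include <fcntl.h>     // open
#include <sys/file.h>  // flock
#include <unistd.h>    // close, ftruncate

// Hypothetical helper: opens `path` for writing and takes an exclusive
// advisory lock. Returns the descriptor, or -1 on failure.
int open_locked_for_write(const char* path)
{
    const int fd = open(path, O_WRONLY | O_CREAT, 0644);
    if (fd < 0)
        return -1;
    // LOCK_NB: fail immediately instead of waiting for another lock holder.
    if (flock(fd, LOCK_EX | LOCK_NB) != 0 || ftruncate(fd, 0) != 0) {
        close(fd);
        return -1;
    }
    return fd;
}

// Hypothetical helper: true if a cooperating reader could take a shared
// lock right now, i.e. no writer currently holds the exclusive lock.
bool reader_may_open(const char* path)
{
    const int fd = open(path, O_RDONLY);
    if (fd < 0)
        return false;
    const bool ok = (flock(fd, LOCK_SH | LOCK_NB) == 0);
    if (ok)
        flock(fd, LOCK_UN);
    close(fd);
    return ok;
}
```

The writer keeps the descriptor open while writing and releases the lock with `flock(fd, LOCK_UN)` or `close(fd)` when done. On Windows, opening the file with share mode 0 (as in the CreateFile example in post #5) denies other processes access for as long as the handle is open.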
Reply

