Posts by Ananas

1) Message boards : : Number crunching : 277 Invalid results
Posted 1480 days ago by Ananas
Tested and working code :


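#include <io.h>
#include <sys/types.h>
#include <sys/stat.h>
#include <fcntl.h>
#include <process.h>

int main (int ac, char *av[])
{
int std_in, std_out, std_err, pid;

/* open the redirect files and map them onto handles 0, 1 and 2 */
/* (O_CREAT needs the permission argument as third parameter) */
std_in = open ("input", O_RDWR | O_TEXT); dup2 (std_in, 0);
std_out = open ("output", O_CREAT | O_RDWR | O_TEXT, S_IREAD | S_IWRITE); dup2 (std_out, 1);
std_err = open ("errout", O_CREAT | O_RDWR | O_TEXT, S_IREAD | S_IWRITE); dup2 (std_err, 2);

/* the child inherits the redirected handles; P_NOWAIT returns immediately */
pid = spawnl (P_NOWAIT, "engine_r2.exe", "engine_r2.exe", (char *)0);
return (pid);
}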


redirect.exe compiled with Watcom C 10.6

Note that I have not done any error handling; this is just to show how it can be done.
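
The error handling that is left out above could look roughly like the following. This is only a minimal sketch, assuming a POSIX-style environment (<unistd.h>); with Watcom C the same calls are declared in <io.h> and the open flags would also carry O_TEXT. The helper name redirect_handle is made up for this example and is not part of the original code.

```c
#include <fcntl.h>
#include <stdio.h>
#include <sys/stat.h>
#include <unistd.h>   /* assumption: POSIX; Watcom C declares these calls in <io.h> */

/* Hypothetical helper: open `path` and map it onto handle `target`.
   Returns 0 on success, -1 if the file cannot be opened or dup2 fails. */
int redirect_handle(const char *path, int flags, int target)
{
    int fd = open(path, flags, S_IRUSR | S_IWUSR);

    if (fd == -1) {
        perror(path);            /* e.g. the input file is missing */
        return -1;
    }
    if (fd != target) {
        if (dup2(fd, target) == -1) {
            perror("dup2");
            close(fd);
            return -1;
        }
        close(fd);               /* the copy on `target` stays open */
    }
    return 0;
}
```

Before spawning the engine, the wrapper would then call it once per handle, for example redirect_handle("input", O_RDONLY, 0), and give up if any call returns -1.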
2) Message boards : : Number crunching : 277 Invalid results
Posted 1480 days ago by Ananas
What about popen()?

If popen() isn't the right thing, you can use something like:


fhTmp = open ("stdout.txt", O_CREAT|O_TEXT|O_RDWR, S_IREAD|S_IWRITE); // O_CREAT needs the permission argument
close (1); // close current stdout handle
dup (fhTmp); // duplicate fhTmp from above to the handle 1, which is stdout

There must not be any file operations between close and dup, or one of those will grab handle 1!

If you need to connect stderr to the same file, you can append these two lines, which do something like "2>&1" in a shell:

close (2);
dup (1);
___________________

The same thing can be done with stdin of course :

fhTmp1 = open ("input.txt", O_TEXT|O_RDONLY);
close (0);
dup (fhTmp1);
___________________

It is important to know that dup() always duplicates the given handle to the lowest unused handle; that is why this works.

If you need to restore your original stdin/stdout/stderr handles later, you can save them before you do any of the above:
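
The lowest-unused-handle rule can be checked with a few lines. This is a minimal sketch, assuming a POSIX-style environment (<unistd.h>, <io.h> with Watcom C); the function name and the scratch file are made up for the illustration.

```c
#include <fcntl.h>
#include <stdio.h>
#include <sys/stat.h>
#include <unistd.h>   /* assumption: POSIX; Watcom C declares these calls in <io.h> */

/* Hypothetical check: returns 1 if dup() hands back the handle number
   that was closed just before, 0 otherwise (or if anything fails). */
int dup_reuses_closed_handle(const char *scratch_path)
{
    int a, b, c, result;

    a = open(scratch_path, O_CREAT | O_RDWR, S_IRUSR | S_IWUSR);
    if (a == -1)
        return 0;            /* open() itself also returns the lowest free handle */

    b = dup(a);              /* a is taken, so b gets a higher number */
    if (b == -1) {
        close(a);
        return 0;
    }

    close(a);                /* a is now the lowest unused handle again */
    c = dup(b);              /* ... so dup() should return exactly a */
    result = (c == a);

    if (c != -1)
        close(c);
    close(b);
    remove(scratch_path);
    return result;
}
```

This only holds as long as nothing else opens a file between the close and the dup, which is the same restriction as for the stdout redirect.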

stdinSav = dup(0);
stdoutSav = dup(1);
stderrSav = dup(2);

and later restore them :

close(0); dup(stdinSav);
close(1); dup(stdoutSav);
close(2); dup(stderrSav);
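
Put together, saving, redirecting and restoring stdout might look like the sketch below. It assumes a POSIX-style environment and uses dup2() instead of close() followed by dup(), so that no other file operation can slip in between. The helper name print_to_file is hypothetical. The fflush() calls are there because stdio buffers its output, and the saved copy is closed after the restore so that it does not leak.

```c
#include <fcntl.h>
#include <stdio.h>
#include <sys/stat.h>
#include <unistd.h>   /* assumption: POSIX; Watcom C declares these calls in <io.h> */

/* Hypothetical helper: write `msg` to `path` through handle 1, then put
   the original stdout back. Returns 0 on success, -1 on error. */
int print_to_file(const char *path, const char *msg)
{
    int saved, fd;

    fflush(stdout);                  /* keep earlier buffered output out of the file */
    saved = dup(1);                  /* remember the original stdout */
    if (saved == -1)
        return -1;

    fd = open(path, O_CREAT | O_TRUNC | O_WRONLY, S_IRUSR | S_IWUSR);
    if (fd == -1 || dup2(fd, 1) == -1) {   /* handle 1 now points to the file */
        if (fd != -1)
            close(fd);
        close(saved);
        return -1;
    }
    close(fd);

    fputs(msg, stdout);
    fflush(stdout);                  /* push the text out while the redirect is active */

    dup2(saved, 1);                  /* restore the original stdout */
    close(saved);                    /* the saved copy is no longer needed */
    return 0;
}
```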
3) Message boards : : Number crunching : 277 Invalid results
Posted 1482 days ago by Ananas
Dang, you're right, your error messages look different from those I had :-/ For now I'm out of ideas :-(
4) Message boards : : Number crunching : 277 Invalid results
Posted 1482 days ago by Ananas
Kerwin, when you are out of Chess results, check if there is a slots directory containing a leftover glaurung file.

If you find one of those, remove all its contents.

If it doesn't let you remove the files, there must still be a "CMD.exe" with an "engine_r?.exe" visible in the task manager. Kill those, then empty the slots directory that has the "glaurung" file - and hopefully you will be able to crunch Chess results again :-)
5) Message boards : : Number crunching : 277 Invalid results
Posted 1482 days ago by Ananas
Yep, that assumption was right, the bug starts when a really short WU comes in :

This is the "bad" short WU, the one with PID 848, it ran only 27.19 seconds!

But it never really ended: all damaged results found the engine_r1.exe with PID=848 as well.
6) Message boards : : Number crunching : 277 Invalid results
Posted 1482 days ago by Ananas
Clearing the "slots" directory while the host has no results to work with seems to help.

The wrapper does get the PID of a running engine_r#, but it isn't the right one; it is some random stuck process from a previous run.

What about not generating a batch file but spawning the process with normal C or windows commands instead? That would provide more control over the child application.

I have this error 1000 too now btw., this is one of many WUs that had this error, with quite a lot of stdout : resultid=1460924

The initial problem might be a timing problem btw.: the wrapper seems to sleep for a while before it tries to get the process handle. With a very short WU this can fail if the wrapper sleeps too long.
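
To illustrate the idea of spawning the child directly instead of going through a batch file: the parent then receives the child's identity from the spawn call itself and never has to search for a process by name after a sleep. The sketch below is not the wrapper's code. It is a POSIX analogue using fork(), execlp() and waitpid(), and the helper name run_child is made up. On Windows the corresponding approach would be spawnl (P_NOWAIT, ...) as in the redirect example, keeping the returned value and waiting on it (Watcom C offers cwait() for that).

```c
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>   /* assumption: POSIX; this is an analogue, not Windows code */

/* Hypothetical helper: start `program` directly and wait for exactly that
   child. Returns its exit code, 127 if the program could not be executed,
   or -1 if the child could not be created or did not exit normally. */
int run_child(const char *program)
{
    int status;
    pid_t pid = fork();

    if (pid == -1)
        return -1;
    if (pid == 0) {
        execlp(program, program, (char *)0);
        _exit(127);                  /* only reached if exec failed */
    }

    /* pid belongs to the child started here, so a stuck process left over
       from an earlier run cannot be picked up by mistake */
    if (waitpid(pid, &status, 0) != pid)
        return -1;
    return WIFEXITED(status) ? WEXITSTATUS(status) : -1;
}
```

Because the wait is tied to the PID returned at creation time, even a work unit that finishes within a fraction of a second is still collected correctly.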
7) Message boards : : Number crunching : 277 Invalid results
Posted 1482 days ago by Ananas
Can you check if there's a hanging CMD.exe with one engine_r1(2/3...) that doesn't belong to any WU anymore?

It is best to shut down BOINC and then kill any leftover CMD.exe and engine_r# that survived that (the number after "engine_r" is the slot number in which it was supposed to work).

Make sure to exclude BOINC with all its subdirectories from virus scanning too.

Another possible reason (from the error code) would be that the user you are using to run BOINC doesn't have sufficient permissions on those directories, code 1000 might be something with group permissions.
8) Message boards : : Chess960@home Discussions : Announcements
Posted 1483 days ago by Ananas
With 1.20 I just had 2 "hanging" CMD.exe with one engine_r1.exe process each. Those 4 processes survived a BOINC exit and seem to have affected other results (those have been invalid).

The hanging ones probably came from this exception 0xc000001e

I killed the hanging tasks and now it seems to work smoothly again.

Even though this report is about 1.20, it might have an effect on 1.22: a hanging 1.20 task might compromise the new 1.22 tasks.


p.s.: the reason why this produced some more invalid results after the hanging failure is quite clear:
"Found process: engine_r1.exe (ID: 1000)" in more than one consecutive result is nearly impossible, because it usually takes some time before a PID is handed out a second time. So the wrapper found the wrong engine process ID.



Copyright © 2010 Chess960@home