| Author | Message | |
STE\/E Joined: Mar 21, 2006 Posts: 7 ID: 159 Credit: 103,753 RAC: 3,106
|
It's best not to run this project on Linux at all, or at least not on my Linux box. I could very seldom get 4 WUs to run at the same time like they should. 1 or 2 WUs would always hang, so I just quit running them on my Linux box.
____________
|
|
|
Richard Kimber Joined: Oct 22, 2006 Posts: 22 ID: 4026 Credit: 47,120 RAC: 2
|
9 hrs is too long, please abort it or try a host restart. Maybe that helps.
I will forward this to the admin.
This is still a big problem. I may need to detach from the project; it's requiring too much monitoring.
____________
|
|
|
rroonnaalldd Joined: Oct 21, 2006 Posts: 155 ID: 3957 Credit: 28,702 RAC: 6
|
9 hrs is too long, please abort it or try a host restart. Maybe that helps.
I will forward this to the admin.
____________
 |
|
|
Richard Kimber Joined: Oct 22, 2006 Posts: 22 ID: 4026 Credit: 47,120 RAC: 2
|
The other issue is that once a work unit has been going for a very long time, boinc runs it as a high priority job because all the chess jobs have very short deadlines. This means that it is never swapped out according to the preferences for allocating resources between projects.
I currently have chess960_52758_117_3 running as high priority. It has been running for 9:17:30 and boincmanager shows 18:39:17 to completion, but it also shows 0.000% progress. The deadline is Wed 09 June 2010 04:15:46 BST.
How long should I leave it? Normally I would abort it.
____________
|
|
|
Richard Kimber Joined: Oct 22, 2006 Posts: 22 ID: 4026 Credit: 47,120 RAC: 2
|
The position seems right to me, but there are many moves, and the calculation time of a unit always depends on the number of moves.
The problem is that on many jobs boincmanager shows an elapsed time much larger than the 3 minutes or so given as the original completion time (maybe 12 hours, if I haven't kept an eye on it) and at the same time it shows *zero* progress and an increasing 'To completion' time. If it showed *some* progress, I wouldn't abort it, but if after many hours no progress is shown, that suggests there's something wrong, surely. All the other projects I work on show progress being made and at some point a decreasing 'Time to completion'.
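The check described above (elapsed time far beyond the original estimate while progress stays at zero) can be sketched in a few lines. This is only an illustration: the `Task` record, the `looks_stalled` helper, and the factor of 10 are assumptions made up for the example and are not part of BOINC. The numbers would have to be read off boincmanager by hand or by some other means.

```python
from dataclasses import dataclass


@dataclass
class Task:
    """Hypothetical record of what boincmanager shows for one work unit."""
    name: str
    elapsed_s: float      # elapsed time in seconds
    fraction_done: float  # 0.0 to 1.0, i.e. the progress column
    estimate_s: float     # original completion estimate in seconds


def looks_stalled(task: Task, factor: float = 10.0) -> bool:
    """Flag a unit that shows zero progress after running far past its estimate.

    The factor is an arbitrary assumption; pick whatever margin feels safe.
    """
    return task.fraction_done == 0.0 and task.elapsed_s > factor * task.estimate_s


tasks = [
    # 9:17:30 elapsed, 0.000% progress, roughly 3 minutes originally estimated
    Task("chess960_52758_117_3", 9 * 3600 + 17 * 60 + 30, 0.0, 180),
    # Made-up unit that is progressing normally
    Task("hypothetical_normal_unit", 120, 0.4, 180),
]

stalled = [t.name for t in tasks if looks_stalled(t)]
print(stalled)
```

A unit that shows any progress at all is never flagged, which matches the rule of not aborting anything that is still moving.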
____________
|
|
|
rroonnaalldd Joined: Oct 21, 2006 Posts: 155 ID: 3957 Credit: 28,702 RAC: 6
|
Your resultid=28653636 had the following command-line -nodes 150000000 -engineid 0 -ag 130 -mmg 200 -ps 100 -co 95 -debug 2 -startup "fen rkqrnnbb/pppppppp/8/8/8/8/PPPPPPPP/RKQRNNBB w ADad - 0 1 - 0 1 moves d2d3 d7d6 f2f4 f7f5 e2e4 g7g6 g2g3 e8f6 h1g2 h8g7 f1d2 f8d7 e1f3 f5e4 d2e4 d7b6 a2a4 f6e4 d3e4 c7c5 e4e5 a7a5 a1a3 c8c6 f3h4 c6e8 e5d6 e7d6 f4f5 g6f5 h4f5 g7e5 g1e3 g8e6 g3g4 b6c4 a3b3 a8a7 e3f2 b8a8 c1h6 e8g6 h6g6 h7g6 f2c5 g6f5 c5a7 a8a7 b3b7 a7a6 b7e7 c4e3 g2b7 a6b6 d1d3 e3g4 e7e6 b6b7 h2h3 g4f2 e6e5 f2d3 e5b5 b7a6 c2d3 d8f8 b5d5 f5f4 d5d6 a6b7 d6g6 f4f3 g6g1 f8h8 b1c2 b7c6 c2c3 h8h3 g1g5 f3f2 g5f5 h3h2 d3d4 c6d6 b2b4 d6e6 f5f4 a5b4 c3b4 h2h4 f4f2 h4d4 b4b5 d4d5 b5c6 d5d6 c6c5 d6d5 c5b6 d5d6 b6b7 d6d7 b7c8 d7d4 f2a2 e6d5 c8b7 d5c4 a2a1 c4b4".
The position seems right to me, but there are many moves, and the calculation time of a unit always depends on the number of moves.
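To see how many moves a unit carries, the `-startup` string can be split on whitespace and everything after the `moves` keyword counted. A minimal sketch, assuming the string always has the layout shown in the command line above; `count_moves` is a made-up helper name, not part of the chess application, and the sample string is shortened to four moves.

```python
def count_moves(startup: str) -> int:
    """Count the half-moves listed after the 'moves' keyword."""
    tokens = startup.split()
    if "moves" not in tokens:
        return 0
    return len(tokens) - tokens.index("moves") - 1


# Shortened sample in the same layout as the -startup argument
sample = (
    "fen rkqrnnbb/pppppppp/8/8/8/8/PPPPPPPP/RKQRNNBB w ADad - 0 1 - 0 1 "
    "moves d2d3 d7d6 f2f4 f7f5"
)
print(count_moves(sample))  # 4
```

Pasting the full `-startup` string from a result page into `sample` gives the move count for that unit.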
____________
 |
|
|
Richard Kimber Joined: Oct 22, 2006 Posts: 22 ID: 4026 Credit: 47,120 RAC: 2
|
The problem should be solved now. The error was an incorrect position in some units.
It was solved for quite a while, but the problem has returned again - for me at least since mid-May.
____________
|
|
|
rroonnaalldd Joined: Oct 21, 2006 Posts: 155 ID: 3957 Credit: 28,702 RAC: 6
|
The problem should be solved now. The error was an incorrect position in some units.
____________
 |
|
|
rroonnaalldd Joined: Oct 21, 2006 Posts: 155 ID: 3957 Credit: 28,702 RAC: 6
|
Thanks, I forwarded this problem to the admin too.
____________
 |
|
|
|
|
The problem is not new, but no solution has been found. I believe the app sometimes doesn't recognize that there is no further move. Do you have a link to units with longer runtimes?
It would be interesting to know what happens to the crunching time if you restart the BOINC client...
Looks like this is one of the units:
5128135
____________
|
|
|
rroonnaalldd Joined: Oct 21, 2006 Posts: 155 ID: 3957 Credit: 28,702 RAC: 6
|
The problem is not new, but no solution has been found. I believe the app sometimes doesn't recognize that there is no further move. Do you have a link to units with longer runtimes?
It would be interesting to know what happens to the crunching time if you restart the BOINC client...
____________
 |
|
|
JKuehl2 Joined: Feb 2, 2007 Posts: 2 ID: 6814 Credit: 59,502 RAC: 212
|
Since yesterday I have seen about 10-20 WUs on my Q6600 behaving this way. Some ran for 2-3 hours until I aborted them. I'm gonna manually abort all WUs that take longer than the usual 2-5 minutes on this machine.
____________
|
|
|
Richard Kimber Joined: Oct 22, 2006 Posts: 22 ID: 4026 Credit: 47,120 RAC: 2
|
I just use boincmgr to abort the one that's not doing anything and everything goes back to normal - until the next one that sticks. It does mean one has to remember to keep an eye on the progress that's being made.
____________
|
|
|
m.mitch   Joined: Jul 3, 2006 Posts: 8 ID: 235 Credit: 26,465 RAC: 3
|
I tried suspending the work but they kept running, leaving me with four processes running on the dual core box. Unfortunately for me, I didn't notice that until some hours later.
Naturally, the stalled Chess work units were taking more than their fair share, leaving only between 10% and 12% for the other projects.
I manually killed them off with a kill -11 (Linux) command. When I restarted them, they reset the counter to zero, increased the estimated time to completion to about 8 hours and immediately fell back into the original problematic behaviour.
I've terminated all those work units and will wait a bit and see if the next batch are okay.
Thanks for the advice.
____________

Click here to join the #1 Aussie Alliance in Chess960 |
|
|
Kalessin   Joined: May 30, 2007 Posts: 16 ID: 9731 Credit: 1,175,458 RAC: 13
|
No, I think not.
And these are not the only faulty ones.
I've got some that count elapsed time up to 3 to 5 seconds and then start over again, for hours on end.
I've got some counting elapsed time very slowly.
I've got some that run normally but take about 40 to 60 minutes.
____________
Dragons can fly because they do not fit into pirate ships! |
|
|
m.mitch   Joined: Jul 3, 2006 Posts: 8 ID: 235 Credit: 26,465 RAC: 3
|
Has anyone heard anything about work units running for 5 hours without incrementing "Percent Done" while "Time To Completion" keeps increasing?
Do these work units ever end?
____________

|
|
|