SiDock@home September Sailing


TankbusterGames
Joined: 22 May 21
Posts: 11
Credit: 3,283,899
RAC: 0
Message 1189 - Posted: 17 Sep 2021, 6:43:00 UTC

It seems my hosts don't get any new workunits since the Eprot workunits were replaced by 3CLpro.
Probably the low runtime puts too much stress on the server again :(

3CLpro takes about 20-30 minutes to complete,
Eprot about 5-6 hours,
so 3CLpro generates roughly 12 times the downloads and uploads, and 12 times the scheduler queries; since no tasks are queued on the client side, probably even more.
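A quick sanity check of that multiplier, using the midpoints of the runtimes quoted above:

```python
# Back-of-the-envelope: request rate scales inversely with task runtime.
eprot_minutes = 5.5 * 60   # Eprot: ~5-6 h per task, midpoint in minutes
clpro_minutes = 25         # 3CLpro: ~20-30 min per task, midpoint

multiplier = eprot_minutes / clpro_minutes
print(round(multiplier, 1))  # → 13.2, i.e. roughly 12-13x the traffic
```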
ID: 1189
hoarfrost
Volunteer moderator
Project administrator
Project developer

Joined: 11 Oct 20
Posts: 333
Credit: 25,518,278
RAC: 6,534
Message 1191 - Posted: 17 Sep 2021, 7:39:09 UTC - in response to Message 1189.  
Last modified: 17 Sep 2021, 7:42:10 UTC

I have switched to two feeders and am monitoring it. The load is not high, which is interesting. I have an idea, but now is not a good time to verify it. :) The current batch of 3CLpro is nearly finished; help is on the way. :)

(We are generating 2 sets of "compound" workunits from 3CLpro*(v4+v5+v6) + Eprot_v1_run-2, 160 00 tasks each.)
ID: 1191
JagDoc

Joined: 24 Oct 20
Posts: 19
Credit: 10,133,833
RAC: 11,173
Message 1194 - Posted: 17 Sep 2021, 11:40:12 UTC - in response to Message 1191.  

Something is wrong: all WUs after 8:00 are sent out only once, no wingman (status unsent)
ID: 1194
hoarfrost
Volunteer moderator
Project administrator
Project developer

Joined: 11 Oct 20
Posts: 333
Credit: 25,518,278
RAC: 6,534
Message 1195 - Posted: 17 Sep 2021, 13:07:07 UTC - in response to Message 1194.  
Last modified: 17 Sep 2021, 13:12:40 UTC

Something is wrong: all WUs after 8:00 are sent out only once, no wingman (status unsent)

Yes! And that may be one of the keys to the problem! It also explains why, after switching from 2 feeders to 1, the problem was solved for some time. I have switched back to a single feeder. Thank you!
ID: 1195
Kiska

Joined: 20 Mar 21
Posts: 4
Credit: 203,256
RAC: 0
Message 1196 - Posted: 17 Sep 2021, 14:57:47 UTC - in response to Message 1195.  

Something is wrong: all WUs after 8:00 are sent out only once, no wingman (status unsent)

Yes! And that may be one of the keys to the problem! It also explains why, after switching from 2 feeders to 1, the problem was solved for some time. I have switched back to a single feeder. Thank you!


How big is the shared memory segment? The feeder puts tasks there and the scheduler takes tasks from it.

It might be better to increase the size of the shared memory instead of running more feeders.
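For reference, both of those knobs are standard BOINC project options in config.xml: `<shmem_work_items>` sets the number of shared-memory slots (BOINC's default is 100) and `<feeder_query_size>` caps the rows fetched per feeder query. The values below are examples only, not a recommendation:

```xml
<boinc>
  <config>
    <!-- tasks held in the shared-memory cache (BOINC default: 100) -->
    <shmem_work_items>400</shmem_work_items>
    <!-- results fetched from the DB per feeder query (BOINC default: 200) -->
    <feeder_query_size>800</feeder_query_size>
  </config>
</boinc>
```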
ID: 1196
xii5ku

Joined: 3 Jan 21
Posts: 24
Credit: 30,966,595
RAC: 88
Message 1197 - Posted: 17 Sep 2021, 17:18:47 UTC

Some users know how to edit cc_config, but don't yet know how to edit it responsibly.
example host
Hopefully those who taught them step one find the time to teach step two as well.
ID: 1197
hoarfrost
Volunteer moderator
Project administrator
Project developer

Joined: 11 Oct 20
Posts: 333
Credit: 25,518,278
RAC: 6,534
Message 1198 - Posted: 17 Sep 2021, 20:20:08 UTC
Last modified: 17 Sep 2021, 20:21:10 UTC

Maybe we have solved the problem and found a good optimization. Here is what happens:
The project server keeps a cache of 100 slots for tasks, which the feeder fills from the database.
The project has 4 applications, but only one is really active: "CurieMarieDock on BOINC + zipped input, checkpoints and progress bar".
The feeder reads tasks into the cache per application. Initially the feeder ran with the "--allapps" option, which instructs it to read tasks from the database application by application, sequentially executing queries like

select ...
  from result r1 force index(ind_res_st), workunit, app
 where ... r1.appid=1 limit 50;

for all appids: 1, 2, 3, 4. (Of course, a predicate like r1.appid IN (...), or omitting the predicate entirely, would be a better choice for "--allapps"; maybe that is a good point for server code optimization.)
Each query execution takes some time, and after a full cycle over the applications the feeder also pauses for 5 seconds.
But we do not need to search for tasks of applications 1, 2, 3, so we changed the feeder start settings to the preferred application: "--appids 4" instead of "--allapps".
After this:
1. The feeder spends time on only one query;
2. The query itself also changes, to:
select ...
  from result r1 force index(ind_res_st), workunit, app
 where ... workunit.appid in (4) limit 200

With "--allapps" the feeder's query was limited to 50 rows, but with "--appids" it gets 200 tasks per request! That changed the situation: now we have one feeder, a full task cache, and average disk utilization of ~20%.
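The change amounts to swapping one feeder flag for another. A command-line sketch (flag spellings vary between BOINC server versions, so treat this as illustrative and check `feeder --help` on your installation):

```shell
# Before: cycle over all four apps, one query per app (~50 rows each)
bin/feeder --allapps --sleep_interval 5

# After: feed only application id 4, up to 200 rows in a single query
bin/feeder --appids 4 --sleep_interval 5
```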

But what happened when we used two feeders?
We used a parameter that instructs feeder #1 to read only even results (id % 2 = 0) and feeder #2 to read only odd results (id % 2 = 1). As described above, each feeder performed a cycle over applications 1, 2, 3 ... but only the "odd" feeder read tasks for application id 4! (I don't yet know why; maybe it is a mistake in my settings, maybe it is a bug.) After some time, all odd results had been sent to computers and sending stopped! When we switched back to one feeder, it started reading the "even" tasks present in the database and putting them into the cache. Maybe this problem with two (or more) feeders can also be solved with the "--appids" parameter, but testing that configuration needs some time.
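For completeness, the even/odd split described above corresponds to the feeder's modulus option. A hypothetical sketch of the two-feeder layout (again, verify the exact flags against your server version):

```shell
# Feeder #1 handles even result ids, feeder #2 odd ones;
# pinning both to the active app may avoid the stuck "odd" queue.
bin/feeder --mod 2 0 --appids 4 &
bin/feeder --mod 2 1 --appids 4 &
```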

So, in total, we have a new recommendation: "Use a separate feeder for each active application! And, if needed, adjust the pause between task requests!"
Does anyone have any problems getting tasks right now?
And I think we will try another interesting option on the server side as well... :)
ID: 1198
yoyo_rkn
Joined: 24 Oct 20
Posts: 7
Credit: 533,463
RAC: 238
Message 1199 - Posted: 18 Sep 2021, 10:56:44 UTC - in response to Message 1198.  
Last modified: 18 Sep 2021, 10:57:28 UTC

You can also set the weight of each application in the ops admin interface on the server. So you could set the weight of the app which has workunits to e.g. 100 and the weight of apps which currently don't have workunits to 1. This is also used by the feeder to fill the shared memory.
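As a rough illustration of that weighting (a hypothetical helper, not the actual C++ feeder code, which also handles rounding leftovers and apps with no available work):

```python
# Hypothetical sketch: split the shared-memory slots among apps
# in proportion to their configured weights.
def allocate_slots(weights, total_slots=100):
    """Return slots per app, proportional to weight."""
    total_weight = sum(weights.values())
    return {app: round(total_slots * w / total_weight)
            for app, w in weights.items()}

# Three idle apps at weight 1, the active app at weight 100:
print(allocate_slots({"app1": 1, "app2": 1, "app3": 1, "cmdock": 100}))
# → {'app1': 1, 'app2': 1, 'app3': 1, 'cmdock': 97}
```

With the weights set as yoyo_rkn suggests, almost the whole cache goes to the app that actually has work.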
ID: 1199
hoarfrost
Volunteer moderator
Project administrator
Project developer

Joined: 11 Oct 20
Posts: 333
Credit: 25,518,278
RAC: 6,534
Message 1206 - Posted: 18 Sep 2021, 23:26:40 UTC

The set of workunits being created now uses new settings for the estimated and maximum number of FLOPS. This should solve the problem for Raspberry Pi hosts. Please report if you run into problems. Thank you!
ID: 1206
hoarfrost
Volunteer moderator
Project administrator
Project developer

Joined: 11 Oct 20
Posts: 333
Credit: 25,518,278
RAC: 6,534
Message 1208 - Posted: 19 Sep 2021, 7:02:21 UTC - in response to Message 1199.  

You can also set the weight of each application in the ops admin interface on the server. So you could set the weight of the app which has workunits to e.g. 100 and the weight of apps which currently don't have workunits to 1. This is also used by the feeder to fill the shared memory.

Yes, another good option. :)
ID: 1208
Greg_BE

Joined: 8 Sep 21
Posts: 13
Credit: 3,005,074
RAC: 3,037
Message 1211 - Posted: 19 Sep 2021, 13:20:58 UTC - in response to Message 1189.  

It seems my hosts don't get any new workunits since the Eprot workunits were replaced by 3CLpro.
Probably the low runtime puts too much stress on the server again :(

3CLpro takes about 20-30 minutes to complete,
Eprot about 5-6 hours,
so 3CLpro generates roughly 12 times the downloads and uploads, and 12 times the scheduler queries; since no tasks are queued on the client side, probably even more.



I'm seeing my client estimate finish times of 2-3 days, with 5 hours of run time so far, for Eprot.
Your times on 3CLpro were exactly what I got.
ID: 1211
xii5ku

Joined: 3 Jan 21
Posts: 24
Credit: 30,966,595
RAC: 88
Message 1215 - Posted: 20 Sep 2021, 5:36:37 UTC - in response to Message 1211.  

@Greg_BE,
previously, the "estimated computation size" of both 3CLpro and Eprot was configured as 50,000 GFLOPS.¹ (This caused the client to assume the same 'estimated time remaining' for new tasks of either kind.)

Now the estimated computation size of 3CLpro is 40,000 GFLOPS.¹ I don't know about Eprot.

If you had very good time estimates in your client before, that was only because it had completed a good number of tasks of just one of the two types, and had therefore adjusted its time estimate for that type of workunit.
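The client's initial estimate is essentially the workunit's estimated computation size divided by the host's benchmarked speed; a sketch with a hypothetical ~4 GFLOPS core (the real client further refines this per project as tasks complete, via the duration correction factor):

```python
# Initial runtime estimate = estimated computation size / host speed.
# 50,000 and 40,000 GFLOPs are the figures quoted above;
# the 4 GFLOPS host speed is a made-up example.
def estimated_runtime_hours(fpops_est_gflops, host_gflops):
    return fpops_est_gflops / host_gflops / 3600

print(round(estimated_runtime_hours(50_000, 4.0), 1))  # → 3.5 (old estimate)
print(round(estimated_runtime_hours(40_000, 4.0), 1))  # → 2.8 (new 3CLpro)
```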


________
¹) Both figures were observed from a very small sample, hence may not be generally applicable.
ID: 1215
xii5ku

Joined: 3 Jan 21
Posts: 24
Credit: 30,966,595
RAC: 88
Message 1217 - Posted: 20 Sep 2021, 15:33:45 UTC
Last modified: 20 Sep 2021, 15:39:13 UTC

Some fellow DC'ers have an awkward approach to this contest. The owner of computer 21557 for example.

  • Has got a 15W 2c/5t core i3. (Nothing wrong with that.)
  • On September 14, sets ncpus=1024, downloads 312 tasks, goes offline.
  • On September 20, when the 6-days reporting deadline comes up, goes online again:
    - Has completed 168 tasks by now, about half of the buffer.
    - Aborted 130 tasks. Didn't bother to report them earlier.
    - Has still 14 tasks in progress. (Edit: These are cancelled by the server now, due to the missed deadline.)
  • Still has ncpus left at 1024 while being online again.
  • Downloads another 256 tasks on September 20 between 14:50 UTC and 14:55 UTC. Is perhaps offline again now.

I have no solid idea of what he plans to do with 270 tasks during the next six days, if he managed to complete just 168 tasks in the past 6 days.
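For context, the ncpus override used by that host is a standard BOINC client option in cc_config.xml (-1, the default, means report the actual core count):

```xml
<cc_config>
  <options>
    <!-- number of CPUs reported to the scheduler; -1 = actual core count -->
    <ncpus>-1</ncpus>
  </options>
</cc_config>
```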

ID: 1217
pschoefer
Joined: 1 Jan 21
Posts: 9
Credit: 2,789,020
RAC: 5,101
Message 1218 - Posted: 20 Sep 2021, 18:51:24 UTC - in response to Message 1217.  

According to the application details for that host, it has already completed 1125 tasks in the last 8 days (the host was created on 12 Sep), but a lot of those tasks were already purged from the database. To me, it looks like it just downloaded way too much work before the challenge, aborted the tasks that it could not finish before the deadline, and should be able to complete most of the remaining tasks before the end of the challenge.
ID: 1218
xii5ku

Joined: 3 Jan 21
Posts: 24
Credit: 30,966,595
RAC: 88
Message 1219 - Posted: 21 Sep 2021, 4:58:09 UTC - in response to Message 1218.  
Last modified: 21 Sep 2021, 5:33:27 UTC

Thanks, indeed. The host must have started downloading the buffer it reported yesterday much earlier than I assumed, so the tasks were old enough that result deletion removed many of them even in the short time between when the results were reported and when I looked. (Nevertheless, the user over-bunkered, and aborted and reported the excess tasks late and incompletely.) The host currently holds only 3CLpro work, so it could work out if it runs mostly uninterrupted.

Edit: The good news is that between my post yesterday and now, the workunits whose tasks the host aborted (or had cancelled by the server) were almost all completed. (Replica tasks were promptly sent out and completed by other hosts, thanks to those hosts' very shallow buffers.) Just 3 of them are left in progress now; their replicas were soaked up into other deep bunkers.
ID: 1219
adrianxw
Joined: 4 Nov 20
Posts: 23
Credit: 3,233,019
RAC: 3,360
Message 1220 - Posted: 21 Sep 2021, 8:33:14 UTC

I saw this was going to run, so I set the project's quota up and updated my machines. I just looked to see how we were doing and found we were not there at all. Apparently it was necessary to register the team. I, of course, did not know that. Big thank you.
ID: 1220
Buro87 [Lombardia]

Joined: 23 Nov 20
Posts: 28
Credit: 771,948
RAC: 0
Message 1224 - Posted: 22 Sep 2021, 7:49:13 UTC - in response to Message 1220.  

Congratulations to all teams and crunchers :)
Congratulations to Planet3DNow for the victory.

The challenge produced over 74 million credits and around +15% progress on the targets.
The project also broke the 50,000 GFLOPS mark.
ID: 1224
Crtomir
Volunteer moderator
Project developer
Project scientist

Joined: 11 Nov 20
Posts: 47
Credit: 83,493
RAC: 0
Message 1225 - Posted: 22 Sep 2021, 8:54:35 UTC

Congratulations to all teams, and especially to Planet3DNow for the victory.

;-) Crtomir
ID: 1225
xii5ku

Joined: 3 Jan 21
Posts: 24
Credit: 30,966,595
RAC: 88
Message 1226 - Posted: 22 Sep 2021, 9:54:32 UTC - in response to Message 1219.  

xii5ku wrote:
Nevertheless, the user over-bunkered but aborted+reported excess tasks late and incompletely.
Dear friends, if you bunker at a project with variable task run times, and especially at a project with a quorum of 2, please monitor the progress of your computer and abort + report tasks which the computer won't finish, as early as you feasibly can.

If you know how to bunker many tasks, you certainly also know how to report aborted tasks early while leaving completed tasks for later reporting. Or you know somebody who can tell you how; it's trivial.

Thank you. Don't be like the owner of host 21573 who aborted 732 tasks 4 days after download but just 4 hours before conclusion of the contest.
ID: 1226
xii5ku

Joined: 3 Jan 21
Posts: 24
Credit: 30,966,595
RAC: 88
Message 1228 - Posted: 23 Sep 2021, 8:12:24 UTC

Thanks to the team for organizing this event. :-)
And special thanks to hoarfrost for all the work put into this.
ID: 1228

©2024 SiDock@home Team