SiDock@home September Sailing
Author | Message |
---|---|
Send message Joined: 22 May 21 Posts: 11 Credit: 3,283,899 RAC: 0 |
It seems my hosts have not received any new workunits since the Eprot workunits were replaced by 3CLpro. Probably the short runtime is putting too much stress on the server again :( 3CLpro takes about 20-30 minutes to complete, while Eprot takes about 5-6 hours, so it creates roughly 12 times the downloads and uploads. That also means roughly 12 times the queries, and probably even more since no tasks are queued on the client side. |
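A quick check of the "12 times" figure from the post above, as a minimal sketch. It assumes one download, one upload and one scheduler contact per task, and that a host runs tasks back to back; the function name is made up for illustration.

```python
def request_ratio(short_minutes, long_minutes):
    """How many short tasks (and hence server round trips) fit into one long task."""
    return long_minutes / short_minutes

# Runtimes quoted in the post: 3CLpro 20-30 minutes, Eprot 5-6 hours.
low = request_ratio(30, 5 * 60)    # slowest 3CLpro vs. fastest Eprot
high = request_ratio(20, 6 * 60)   # fastest 3CLpro vs. slowest Eprot
mid = request_ratio(25, 5.5 * 60)  # midpoints of both ranges

print(f"server load multiplier: {low:.1f}x to {high:.1f}x, about {mid:.1f}x at the midpoints")
```

So "12 times" sits inside the range these runtimes imply, a little below the midpoint estimate.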
Send message Joined: 11 Oct 20 Posts: 333 Credit: 25,518,278 RAC: 6,534 |
I switched to two feeders and am monitoring it. The load is not high, which is interesting. I have an idea, but now is not a good time to verify it. :) The current batch of 3CLpro is nearly finished, so help is close. :) (We are generating 2 "compound" sets from 3CLpro*(v4+v5+v6) + Eprot_v1_run-2, of 160 00 tasks each.) |
Send message Joined: 24 Oct 20 Posts: 19 Credit: 10,133,833 RAC: 11,173 |
Something is wrong: all WUs after 8:00 have been sent out only once, with no wingman (status unsent). |
Send message Joined: 11 Oct 20 Posts: 333 Credit: 25,518,278 RAC: 6,534 |
"Something is wrong, all wu after 8:00 are send out only 1 x, no wingman (status unsent)" Yes! And this may be one of the keys to the problem! It also explains why, after switching from 2 feeders to 1, the problem was solved for some time! Switched to a single feeder. Thank you! |
Send message Joined: 20 Mar 21 Posts: 4 Credit: 203,256 RAC: 0 |
"Something is wrong, all wu after 8:00 are send out only 1 x, no wingman (status unsent)" How big is the shared memory? The feeder puts tasks there and the scheduler takes tasks from there, so it might be better to increase the size of the shared memory instead of running more feeders. |
Send message Joined: 3 Jan 21 Posts: 24 Credit: 30,966,595 RAC: 88 |
Some users know how to edit cc_config, but don't know yet how to edit it responsibly (example host). Hopefully those who taught step one find the time to teach step two too. |
Send message Joined: 11 Oct 20 Posts: 333 Credit: 25,518,278 RAC: 6,534 |
Maybe we have solved the problem and found a good option for optimization. The following happens: the project server has a cache of 100 slots for tasks, which the feeder fills from the database. The project has 4 applications, but only one is really active: "CurieMarieDock on BOINC + zipped input, checkpoints and progress bar". The feeder reads tasks into the cache per application. Initially the project feeder ran with the option "--allapps", which instructs it to read tasks from the database application by application, sequentially executing queries like select ... from result r1 force index(ind_res_st), workunit, app where ... r1.appid=1 limit 50; for all appids: 1, 2, 3, 4. (Of course, using a predicate like r.appid IN(...), or omitting this predicate for the "--allapps" option, would be a better choice; maybe this is a good point for server code optimization.) Each execution of the query takes some time, and after a full cycle over the applications the feeder also pauses for 5 seconds. But we do not need to search for tasks for applications 1, 2 and 3, so we changed the feeder start settings to the preferred application: "--appids 4" instead of "--allapps". After this: 1. The feeder spends time on only one query; 2. The query also changes, to: select ... from result r1 force index(ind_res_st), workunit, app where ... workunit.appid in (4) limit 200 With "--allapps" the feeder query result is limited to 50 rows, but with "--appids" it gets 200 tasks per request! And that changed the situation. Now we have one feeder, a full cache of tasks, and disk utilization of ~20% on average. But what happens when we use two feeders? We use a parameter that instructs feeder #1 to read only even results (id % 2 = 0) and feeder #2 to read only odd results (id % 2 = 1). As previously described, each feeder performed a cycle over applications 1, 2, 3 ... but only the "odd feeder" read tasks for application id 4! (Currently I don't know why; maybe it's my mistake with the settings, maybe it's a bug.) After some time all odd results had been sent to computers, and sending stopped! 
When we switched back to one feeder, it started to read the "even tasks" that were present in the database and put them into the cache. Maybe this problem with two (or more) feeders can also be solved by using the "--appids" parameter, but we need some time to test this configuration. In total, we have a new recommendation: "Use a separate feeder for each active application! And, if needed, change the pause between task requests!" Is anyone having problems getting tasks right now? And I think we will try another interesting option on the server side as well... :) |
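The stall described in the post above can be reproduced with a toy model. This is only a sketch under stated assumptions, not the real feeder: `Result`, `feeder_pass` and `drain` are hypothetical names, there is no shared memory or database, and the "even" feeder is simply configured without application 4 to mimic the observed (unexplained) behaviour.

```python
from dataclasses import dataclass

@dataclass
class Result:
    id: int
    appid: int
    sent: bool = False

def feeder_pass(results, appids, mod=None, limit=200):
    """One feeder query: up to `limit` unsent results for `appids`.

    `mod=(n, i)` restricts the feeder to results with id % n == i.
    """
    picked = []
    for r in results:
        if r.sent or r.appid not in appids:
            continue
        if mod is not None and r.id % mod[0] != mod[1]:
            continue
        picked.append(r)
        if len(picked) == limit:
            break
    return picked

def drain(results, feeders):
    """Cycle over the feeders until none finds work; return how many results were sent."""
    sent = 0
    progress = True
    while progress:
        progress = False
        for feeder in feeders:
            for r in feeder_pass(results, **feeder):
                r.sent = True
                sent += 1
                progress = True
    return sent

# 1000 unsent results, all for application 4.
results = [Result(i, 4) for i in range(1, 1001)]

# Two feeders split by parity; the even one never picks up app 4 (assumed misbehaviour).
two_feeders = [
    dict(appids={1, 2, 3}, mod=(2, 0)),
    dict(appids={1, 2, 3, 4}, mod=(2, 1)),
]
sent_with_two = drain(results, two_feeders)
stuck = [r.id for r in results if not r.sent]

# Back to one feeder restricted to app 4: the remaining even results go out.
sent_with_one = drain(results, [dict(appids={4})])

print(sent_with_two, len(stuck), sent_with_one)
```

In this model only the odd half is ever sent while two feeders run, which matches workunits being sent once with the wingman task left unsent; the single feeder then clears the even backlog.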
Send message Joined: 24 Oct 20 Posts: 7 Credit: 533,463 RAC: 238 |
You can also set the weight of each application in the ops admin interface on the server. So you could set the weight of the app which has workunits to e.g. 100 and the weight of apps which currently don't have workunits to 1. This is also used by the feeder to fill the shared memory. |
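To get a feel for the weights suggested above, here is a minimal sketch that splits the 100 cache slots in proportion to application weight. The proportional split and the function name are assumptions for illustration only; how the feeder actually assigns slots from weights may differ.

```python
def slots_by_weight(weights, total_slots=100):
    """Split cache slots proportionally to app weight (illustrative assumption)."""
    total_weight = sum(weights.values())
    shares = {app: total_slots * w // total_weight for app, w in weights.items()}
    # Give slots lost to integer rounding to the heaviest application.
    heaviest = max(weights, key=weights.get)
    shares[heaviest] += total_slots - sum(shares.values())
    return shares

# Weight 100 for the app with work (id 4), weight 1 for the idle apps.
shares = slots_by_weight({1: 1, 2: 1, 3: 1, 4: 100})
print(shares)
```

Under this assumption the idle applications end up with no slots at all, so practically the whole cache would serve the one active application.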
Send message Joined: 11 Oct 20 Posts: 333 Credit: 25,518,278 RAC: 6,534 |
The set of workunits being created now uses new settings for the estimated and maximum number of FLOPs. This should solve the problem for Raspberry Pi. Please report if you run into problems. Thank you! |
Send message Joined: 11 Oct 20 Posts: 333 Credit: 25,518,278 RAC: 6,534 |
"You can also set the weight of each application in the ops admin interface on the server. So you could set the weight of the app which has workunits to e.g. 100 and the weight of apps which currently don't have workunits to 1. This is also used by the feeder to fill the shared memory." Yes, another good option. :) |
Send message Joined: 8 Sep 21 Posts: 13 Credit: 3,005,074 RAC: 3,037 |
"Seems like my hosts don't get any new workunits as soon as Eprot workunits were replaced by 3CLpro." My client is estimating finish times of 2-3 days for Eprot, with 5 hours of run time so far. Your times on 3CLpro were exactly what I got. |
Send message Joined: 3 Jan 21 Posts: 24 Credit: 30,966,595 RAC: 88 |
@Greg_BE, previously, the "estimated computation size" of both 3CLpro and Eprot was configured as 50,000 GFLOPS.¹ (This caused the client to assume the same 'estimated time remaining' for new tasks of either kind.) Now the estimated computation size of 3CLpro is 40,000 GFLOPS.¹ I don't know about Eprot. If you had very good time estimates in your client before, then only because it had completed a good number of tasks of only one of the two types before, and therefore adjusted its time estimate for this type of workunits. ________ ¹) Both figures were observed from a very small sample, hence may not be generally applicable. |
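Why the same "estimated computation size" gives the same initial time estimate can be sketched as size divided by speed. This is a simplification: the host speed below is a hypothetical value, and the real client also applies the corrections it has learned from completed tasks, as described above.

```python
def estimated_seconds(size_gflop, host_gflops):
    """Naive runtime estimate: work to do divided by sustained host speed."""
    return size_gflop / host_gflops

host_gflops = 5.0  # hypothetical sustained speed of one CPU core

old_estimate = estimated_seconds(50_000, host_gflops)  # old setting for both task types
new_estimate = estimated_seconds(40_000, host_gflops)  # new setting observed for 3CLpro

print(f"old: {old_estimate / 3600:.2f} h, new 3CLpro: {new_estimate / 3600:.2f} h")
```

With the old shared figure a fresh client could not tell a 25-minute 3CLpro task from a 5-hour Eprot task; only the per-type adjustment after completed tasks separates them.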
Send message Joined: 3 Jan 21 Posts: 24 Credit: 30,966,595 RAC: 88 |
Some fellow DC'ers have an awkward approach to this contest. The owner of computer 21557 for example.
I have no solid idea of what he plans to do with 270 tasks during the next six days, if he managed to complete just 168 tasks in the past 6 days. |
Send message Joined: 1 Jan 21 Posts: 9 Credit: 2,789,020 RAC: 5,101 |
According to the application details for that host, it has already completed 1125 tasks in the last 8 days (the host was created on 12 Sep), but a lot of those tasks were already purged from the database. To me, it looks like it just downloaded way too much work before the challenge, aborted the tasks that it could not finish before the deadline, and should be able to complete most of the remaining tasks before the end of the challenge. |
Send message Joined: 3 Jan 21 Posts: 24 Credit: 30,966,595 RAC: 88 |
Thanks, indeed. The host must have started downloading the buffer which it reported yesterday much earlier than it occurred to me, such that the tasks were old enough that result deletion removed a lot even within the short time between when results were reported and when I looked. (Nevertheless, the user over-bunkered but aborted+reported excess tasks late and incompletely.) The host retains only 3CLpro work currently, so it could work out if it runs mostly uninterrupted. Edit: The good news is that between my post yesterday and now, the workunits of which the host cancelled the tasks or had them cancelled by the server were almost all completed already. (Replica tasks were promptly sent out, and completed by other hosts, thanks to very shallow buffers of these hosts.) Just 3 of these are left in progress now; their replicas were soaked up into other deep bunkers. |
Send message Joined: 4 Nov 20 Posts: 23 Credit: 3,233,019 RAC: 3,360 |
I saw this was going to run, so I set the project's quota up and updated my machines. I just looked to see how we were doing, and found we were not there at all. Apparently, it was necessary to register the team. I, of course, did not know that. Big thank you. |
Send message Joined: 23 Nov 20 Posts: 28 Credit: 771,948 RAC: 0 |
Congratulations to all teams and crunchers :) Congratulations to Planet3DNow for the victory. The challenge produced over 74 million credits and around +15% progress on targets. The project also broke the 50,000 GFLOPS mark. |
Send message Joined: 11 Nov 20 Posts: 47 Credit: 83,493 RAC: 0 |
Congratulations to all teams and especially Planet3DNow for the victory. ;-) Crtomir |
Send message Joined: 3 Jan 21 Posts: 24 Credit: 30,966,595 RAC: 88 |
xii5ku wrote: "Nevertheless, the user over-bunkered but aborted+reported excess tasks late and incompletely." Dear friends, if you bunker at a project with variable task run times, and especially at a project with a quorum of 2, please monitor the progress of your computer and abort + report tasks which the computer won't finish, as early as you feasibly can. If you know how to bunker many tasks, you certainly also know how to report aborted tasks early while leaving completed tasks for later reporting. Or you know somebody who can tell you how to do it; it's trivial. Thank you. Don't be like the owner of host 21573, who aborted 732 tasks 4 days after download but just 4 hours before the conclusion of the contest. |
Send message Joined: 3 Jan 21 Posts: 24 Credit: 30,966,595 RAC: 88 |
Thanks to the team for organizing this event. :-) And special thanks to hoarfrost for all the work put into this. |
©2024 SiDock@home Team