SiDock@home September Sailing

Author	Message
TankbusterGames Joined: 22 May 21 Posts: 11 Credit: 3,283,899 RAC: 0	Message 1189 - Posted: 17 Sep 2021, 6:43:00 UTC
It seems my hosts stopped getting new workunits as soon as the Eprot workunits were replaced by 3CLpro. Probably the short runtime is putting too much stress on the server again :( 3CLpro takes about 20-30 minutes to complete, while Eprot takes about 5-6 hours, so it creates roughly 12 times the downloads and uploads. It also means 12 times the scheduler queries, and probably even more, since no tasks are queued on the client side.
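As a rough check of the "12 times" figure, the ratio of server round trips can be estimated from the two runtime ranges quoted in the post. This is only a sketch: it assumes one download, one upload and one scheduler contact per task, and the function name is made up for illustration.

```python
def tasks_per_core_day(runtime_minutes: float) -> float:
    """Tasks one CPU core completes per day at a given runtime per task."""
    return 24 * 60 / runtime_minutes

# Runtime ranges quoted in the post, in minutes.
clpro_min, clpro_max = 20, 30          # 3CLpro: 20-30 minutes
eprot_min, eprot_max = 5 * 60, 6 * 60  # Eprot: 5-6 hours

# Midpoint estimate: 25 minutes versus 5.5 hours (330 minutes).
mid_ratio = tasks_per_core_day(25) / tasks_per_core_day(330)

# Extremes of the two ranges.
low_ratio = eprot_min / clpro_max    # shortest Eprot vs. longest 3CLpro
high_ratio = eprot_max / clpro_min   # longest Eprot vs. shortest 3CLpro

print(f"midpoint: {mid_ratio:.1f}x, range: {low_ratio:.0f}x to {high_ratio:.0f}x")
```

Under these assumptions the quoted factor of 12 sits inside the range the runtimes allow, so it is a reasonable order-of-magnitude figure for the extra server traffic.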

hoarfrost Volunteer moderator Project administrator Project developer Joined: 11 Oct 20 Posts: 345 Credit: 25,992,396 RAC: 977	Message 1191 - Posted: 17 Sep 2021, 7:39:09 UTC - in response to Message 1189. Last modified: 17 Sep 2021, 7:42:10 UTC
I switched to two feeders and am monitoring the server. The load is not high, which is interesting. I have an idea, but now is not a good time to verify it. :) The current batch of 3CLpro is nearly finished, so help is close. :) (We are generating 2 "compound" sets from 3CLpro*(v4+v5+v6) + Eprot_v1_run-2, by 160 00 tasks.)

JagDoc Joined: 24 Oct 20 Posts: 19 Credit: 13,792,485 RAC: 11,338	Message 1194 - Posted: 17 Sep 2021, 11:40:12 UTC - in response to Message 1191.
Something is wrong: all WUs after 8:00 have been sent out only once, with no wingman (status: unsent).

hoarfrost Volunteer moderator Project administrator Project developer Joined: 11 Oct 20 Posts: 345 Credit: 25,992,396 RAC: 977	Message 1195 - Posted: 17 Sep 2021, 13:07:07 UTC - in response to Message 1194. Last modified: 17 Sep 2021, 13:12:40 UTC
JagDoc wrote: Something is wrong: all WUs after 8:00 have been sent out only once, with no wingman (status: unsent).
Yes! And it may be one of the keys to the problem! It also explains why the problem went away for some time after switching from 2 feeders to 1! Switched to a single feeder. Thank you!

Kiska Joined: 20 Mar 21 Posts: 4 Credit: 203,256 RAC: 0	Message 1196 - Posted: 17 Sep 2021, 14:57:47 UTC - in response to Message 1195.
hoarfrost wrote: Yes! And it may be one of the keys to the problem! It also explains why the problem went away for some time after switching from 2 feeders to 1! Switched to a single feeder. Thank you!
How big is the shared memory? The feeder puts tasks there and the scheduler takes tasks from there. It might be better to increase the size of the shared memory instead of running more feeders.

xii5ku Joined: 3 Jan 21 Posts: 24 Credit: 30,971,353 RAC: 0	Message 1197 - Posted: 17 Sep 2021, 17:18:47 UTC
Some users know how to edit cc_config, but don't yet know how to edit it responsibly (example host). Hopefully those who taught step one will find the time to teach step two, too.

hoarfrost Volunteer moderator Project administrator Project developer Joined: 11 Oct 20 Posts: 345 Credit: 25,992,396 RAC: 977	Message 1198 - Posted: 17 Sep 2021, 20:20:08 UTC Last modified: 17 Sep 2021, 20:21:10 UTC
We may have solved the problem and found a good option for optimization. Here is what happens:
The project server has a cache of 100 slots for tasks, which the feeder fills from the database. The project has 4 applications, but only one is really active: "CurieMarieDock on BOINC + zipped input, checkpoints and progress bar". The feeder reads tasks into the cache application by application.
Initially the project feeder ran with the "--allapps" option, which instructs it to read tasks from the database per application by sequentially executing queries like
select ... from result r1 force index(ind_res_st), workunit, app where ... r1.appid=1 limit 50;
for all appids: 1, 2, 3, 4. (Of course, using a predicate like r.appid IN(...), or omitting this predicate for the "--allapps" option, would be a better choice; maybe this is a good point for server code optimization.) Each query takes some time to execute, and after a full cycle over the applications the feeder also pauses for 5 seconds.
But we do not need to search for tasks of applications 1, 2 and 3, so we changed the feeder start settings to the preferred application: "--appids 4" instead of "--allapps". After this:
1. The feeder spends time on only one request;
2. The query also changes, to:
select ... from result r1 force index(ind_res_st), workunit, app where ... workunit.appid in (4) limit 200
With "--allapps" the feeder query result is limited to 50 rows, but with "--appids" it gets 200 tasks per request! And that changed the situation. Now we have one feeder, a full cache of tasks and disk utilization of ~20% on average.
But what happens when we use two feeders? We use a parameter that instructs feeder #1 to read only even results (id % 2 = 0) and feeder #2 to read only odd results (id % 2 = 1). As described above, each feeder cycled through applications 1, 2, 3 ... but only the "odd feeder" read tasks for application id 4! (I currently don't know why; maybe it's my mistake in the settings, maybe it's a bug.) After some time all odd results had been sent to computers and sending stopped! And when we switched back to one feeder, it started to read the "even tasks" present in the database and put them into the cache.
Maybe this problem with two (or more) feeders can also be solved by using the "--appids" parameter, but we need some time to test this configuration. In total, we have a new recommendation: "Use a separate feeder for each active application! And, if needed, change the delay between task requests!"
Does anyone have any problems getting tasks right now?
And, I think, we will try another interesting option on the server side as well... :)
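The stall with two feeders can be illustrated with a toy simulation. It is a sketch of the behaviour described in the post, not BOINC server code: the function name, the batch size and the id range are assumptions, and a parity of None stands for a single feeder with no even/odd split.

```python
def run_feeders(result_ids, feeders, cycles=10, batch=50):
    """Simulate feeders moving unsent results from the database to the cache.

    feeders lists the parity each feeder serves for the active application:
    0 = even ids only, 1 = odd ids only, None = no parity filter.
    Returns (ids handed out, ids still unsent).
    """
    unsent = sorted(result_ids)
    sent = []
    for _ in range(cycles):
        for parity in feeders:
            # Each pass picks at most `batch` matching results, like "limit N".
            picked = [r for r in unsent
                      if parity is None or r % 2 == parity][:batch]
            sent.extend(picked)
            unsent = [r for r in unsent if r not in picked]
    return sent, unsent

ids = range(1, 201)

# Two feeders were started, but only the odd one actually reads the active app.
stuck_sent, stuck_unsent = run_feeders(ids, feeders=[1])

# Back to a single feeder without the even/odd split.
ok_sent, ok_unsent = run_feeders(ids, feeders=[None])

print(len(stuck_sent), "sent,", len(stuck_unsent), "stuck with only the odd feeder")
print(len(ok_sent), "sent,", len(ok_unsent), "stuck with a single feeder")
```

If the two replicas of a workunit happen to have consecutive result ids (an assumption, not something the thread states), this would also match the report that workunits were sent out only once while the wingman task stayed unsent.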

yoyo_rkn Joined: 24 Oct 20 Posts: 7 Credit: 634,819 RAC: 0	Message 1199 - Posted: 18 Sep 2021, 10:56:44 UTC - in response to Message 1198. Last modified: 18 Sep 2021, 10:57:28 UTC
You can also set the weight of each application in the ops admin interface on the server. So you could set the weight of the app which has workunits to e.g. 100 and the weight of apps which currently don't have workunits to 1. This is also used by the feeder to fill the shared memory.
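To get a feeling for what such weights would mean for a 100-slot cache, here is a minimal sketch of a proportional split. It is not the feeder's actual algorithm; the function name and the largest-remainder rounding are assumptions chosen for illustration.

```python
def allocate_slots(weights, total_slots=100):
    """Split cache slots between applications in proportion to their weights.

    Uses largest-remainder rounding so the slot counts sum to total_slots.
    """
    total_weight = sum(weights.values())
    exact = {app: total_slots * w / total_weight for app, w in weights.items()}
    slots = {app: int(share) for app, share in exact.items()}
    leftover = total_slots - sum(slots.values())
    # Hand the remaining slots to the apps with the largest fractional part.
    by_remainder = sorted(exact, key=lambda app: exact[app] - slots[app],
                          reverse=True)
    for app in by_remainder[:leftover]:
        slots[app] += 1
    return slots

# Weights as suggested in the post: 100 for the active app, 1 for the others.
weights = {1: 1, 2: 1, 3: 1, 4: 100}
slots = allocate_slots(weights)
print(slots)
```

With these example weights nearly the whole cache goes to application 4, while the idle applications keep a single slot each.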

hoarfrost Volunteer moderator Project administrator Project developer Joined: 11 Oct 20 Posts: 345 Credit: 25,992,396 RAC: 977	Message 1206 - Posted: 18 Sep 2021, 23:26:40 UTC
The set of workunits being created now uses new settings for the estimated and maximum number of FLOPs. This should solve the problem on Raspberry Pi. Please report if you run into problems. Thank you!

hoarfrost Volunteer moderator Project administrator Project developer Joined: 11 Oct 20 Posts: 345 Credit: 25,992,396 RAC: 977	Message 1208 - Posted: 19 Sep 2021, 7:02:21 UTC - in response to Message 1199.
yoyo_rkn wrote: You can also set the weight of each application in the ops admin interface on the server. So you could set the weight of the app which has workunits to e.g. 100 and the weight of apps which currently don't have workunits to 1. This is also used by the feeder to fill the shared memory.
Yes, another good option. :)

Greg_BE Joined: 8 Sep 21 Posts: 23 Credit: 3,468,802 RAC: 4,384	Message 1211 - Posted: 19 Sep 2021, 13:20:58 UTC - in response to Message 1189.
TankbusterGames wrote: Seems like my hosts don't get any new workunits as soon as Eprot workunits were replaced by 3CLpro. Probably the low runtime creates too much stress on the server again :( 3CLpro takes like 20-30 minutes to complete Eprot takes like 5-6 hours so it creates 12 times the downloads and uploads. 12 times the queries and since no tasks are in queue on the client side probably even more.
My client is estimating finish times of 2-3 days for Eprot, with 5 hours of run time so far. Your times on 3CLpro were exactly what I got.

xii5ku Joined: 3 Jan 21 Posts: 24 Credit: 30,971,353 RAC: 0	Message 1215 - Posted: 20 Sep 2021, 5:36:37 UTC - in response to Message 1211.
@Greg_BE, previously, the "estimated computation size" of both 3CLpro and Eprot was configured as 50,000 GFLOPS.¹ (This caused the client to assume the same 'estimated time remaining' for new tasks of either kind.) Now the estimated computation size of 3CLpro is 40,000 GFLOPS.¹ I don't know about Eprot.
If you had very good time estimates in your client before, it was only because the client had previously completed a good number of tasks of only one of the two types, and had therefore adjusted its time estimate to that type of workunit.
________
¹) Both figures were observed from a very small sample, hence may not be generally applicable.
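Why identical size settings lead to identical initial estimates can be shown with a naive model: work size divided by host speed. This is a simplified sketch, not the client's real estimator (which applies further corrections from completed tasks); the function name and the 4 GFLOPS host speed are assumptions for illustration.

```python
def estimated_runtime_hours(est_size_gflop: float, host_gflops: float) -> float:
    """Naive runtime estimate: configured work size divided by host speed."""
    seconds = est_size_gflop / host_gflops
    return seconds / 3600

HOST_SPEED = 4.0  # hypothetical per-core speed in GFLOPS

# Old settings: both task types configured with the same size (50,000).
old_3clpro = estimated_runtime_hours(50_000, HOST_SPEED)
old_eprot = estimated_runtime_hours(50_000, HOST_SPEED)

# New setting observed for 3CLpro (40,000).
new_3clpro = estimated_runtime_hours(40_000, HOST_SPEED)

print(f"old 3CLpro: {old_3clpro:.2f} h, old Eprot: {old_eprot:.2f} h")
print(f"new 3CLpro: {new_3clpro:.2f} h")
```

In this model the old settings give the same estimate for a 20-30 minute 3CLpro task and a 5-6 hour Eprot task, so the estimate can only match whichever type the client has recently been calibrated on.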

xii5ku Joined: 3 Jan 21 Posts: 24 Credit: 30,971,353 RAC: 0	Message 1217 - Posted: 20 Sep 2021, 15:33:45 UTC Last modified: 20 Sep 2021, 15:39:13 UTC
Some fellow DC'ers have an awkward approach to this contest. The owner of computer 21557, for example:
Has got a 15 W 2c/5t Core i3. (Nothing wrong with that.)
On September 14, sets ncpus=1024, downloads 312 tasks, goes offline.
On September 20, when the 6-day reporting deadline comes up, goes online again:
- Has completed 168 tasks by now, about half of the buffer.
- Aborted 130 tasks. Didn't bother to report them earlier.
- Still has 14 tasks in progress. (Edit: These are cancelled by the server now, due to the missed deadline.)
Still has ncpus left at 1024 while being online again. Downloads another 256 tasks on September 20 between 14:50 UTC and 14:55 UTC. Is perhaps offline again now.
I have no solid idea of what he plans to do with 270 tasks during the next six days, if he managed to complete just 168 tasks in the past 6 days.

pschoefer Joined: 1 Jan 21 Posts: 9 Credit: 2,894,574 RAC: 0	Message 1218 - Posted: 20 Sep 2021, 18:51:24 UTC - in response to Message 1217.
According to the application details for that host, it has already completed 1125 tasks in the last 8 days (the host was created on 12 Sep), but a lot of those tasks were already purged from the database. To me, it looks like it just downloaded way too much work before the challenge, aborted the tasks that it could not finish before the deadline, and should be able to complete most of the remaining tasks before the end of the challenge.

xii5ku Joined: 3 Jan 21 Posts: 24 Credit: 30,971,353 RAC: 0	Message 1219 - Posted: 21 Sep 2021, 4:58:09 UTC - in response to Message 1218. Last modified: 21 Sep 2021, 5:33:27 UTC
Thanks, indeed. The host must have started downloading the buffer which it reported yesterday much earlier than I had realized, so the tasks were old enough that result deletion removed a lot of them even within the short time between when the results were reported and when I looked. (Nevertheless, the user over-bunkered but aborted+reported excess tasks late and incompletely.) The host retains only 3CLpro work currently, so it could work out if it runs mostly uninterrupted.
Edit: The good news is that between my post yesterday and now, the workunits of which the host cancelled the tasks or had them cancelled by the server were almost all completed already. (Replica tasks were promptly sent out, and completed by other hosts, thanks to very shallow buffers of these hosts.) Just 3 of these are left in progress now; their replicas were soaked up into other deep bunkers.

adrianxw Joined: 4 Nov 20 Posts: 28 Credit: 3,943,347 RAC: 3,887	Message 1220 - Posted: 21 Sep 2021, 8:33:14 UTC
I saw this was going to run, so I set the project's quota up and updated my machines. I just looked to see how we were doing, and found we were not there at all. Apparently, it was necessary to register the team. I, of course, did not know that. Big thank you.

Buro87 [Lombardia] Joined: 23 Nov 20 Posts: 28 Credit: 771,948 RAC: 0	Message 1224 - Posted: 22 Sep 2021, 7:49:13 UTC - in response to Message 1220.
Congratulations to all teams and crunchers :)
Congratulations to Planet3DNow for the victory.
The challenge produced over 74 million credits and around +15% progress on targets.
The project also broke the 50,000 GFLOPS mark.

Crtomir Volunteer moderator Project developer Project scientist Joined: 11 Nov 20 Posts: 47 Credit: 83,493 RAC: 0	Message 1225 - Posted: 22 Sep 2021, 8:54:35 UTC
Congratulations to all teams, and especially to Planet3DNow for the victory. ;-)
Crtomir

xii5ku Joined: 3 Jan 21 Posts: 24 Credit: 30,971,353 RAC: 0	Message 1226 - Posted: 22 Sep 2021, 9:54:32 UTC - in response to Message 1219.
xii5ku wrote: Nevertheless, the user over-bunkered but aborted+reported excess tasks late and incompletely.
Dear friends, if you bunker at a project with variable task run times, and especially at a project with a quorum of 2, please monitor the progress of your computer and abort + report tasks which the computer won't finish, as early as you feasibly can. If you know how to bunker many tasks, you certainly also know how to report aborted tasks early while leaving completed tasks for later reporting. Or you know somebody who can tell you how to do it; it's trivial. Thank you.
Don't be like the owner of host 21573, who aborted 732 tasks 4 days after download but just 4 hours before the conclusion of the contest.

xii5ku Joined: 3 Jan 21 Posts: 24 Credit: 30,971,353 RAC: 0	Message 1228 - Posted: 23 Sep 2021, 8:12:24 UTC
Thanks to the team for organizing this event. :-) And special thanks to hoarfrost for all the work put into this.