SiDock@home September Sailing

Author	Message
hoarfrost Volunteer moderator Project administrator Project developer Joined: 11 Oct 20 Posts: 346 Credit: 26,018,950 RAC: 1,148	Message 1164 - Posted: 15 Sep 2021, 14:27:55 UTC About 15 or 30 minutes ago, one participant (or maybe several participants) flushed ~25,000 results or even more. But this happened 14 hours after the challenge started. The project server has now successfully processed these results, and after a short break (about 15 minutes?) new tasks are already available in the queue. "Have you considered raising the limit a bit as long as the workunits are this small?" We can try to do this if it is really necessary, but not on the first day. When we change the setting from 2 to 3 (for example), the number of tasks handed out will increase several times over for the first few hours.

yoyo_rkn Joined: 24 Oct 20 Posts: 7 Credit: 634,819 RAC: 0	Message 1165 - Posted: 15 Sep 2021, 15:25:34 UTC - in response to Message 1159. Michael H.W. Weber wrote: Please take a look at these guidelines which my team colleague Yoyo has written down. This guide is about keeping the server responsive, not so much about keeping the hosts utilized. (One central point of the guide is to reduce the number of tasks in progress. But high utilization of contributor hosts ultimately requires a high number of tasks in progress.) This is right. But without a server, hosts get nothing!

yoyo_rkn Joined: 24 Oct 20 Posts: 7 Credit: 634,819 RAC: 0	Message 1166 - Posted: 15 Sep 2021, 15:27:33 UTC - in response to Message 1161. xii5ku wrote: Michael H.W. Weber wrote: Please take a look at these guidelines which my team colleague Yoyo has written down. This guide is about keeping the server responsive, not so much about keeping the hosts utilized. Database optimisation can help under the right circumstances, but usually, when many hosts request work at the same time, the bottleneck is the scheduler queue. I fully agree that the other points in that guide are just about mitigating the impact on the server (with sometimes debatable success), not about solving the underlying problem. The guide addresses this as well: how to reduce the client request frequency.

pschoefer Joined: 1 Jan 21 Posts: 9 Credit: 2,894,574 RAC: 0	Message 1167 - Posted: 15 Sep 2021, 16:10:58 UTC - in response to Message 1166. xii5ku wrote: Michael H.W. Weber wrote: Please take a look at these guidelines which my team colleague Yoyo has written down. This guide is about keeping the server responsive, not so much about keeping the hosts utilized. Database optimisation can help under the right circumstances, but usually, when many hosts request work at the same time, the bottleneck is the scheduler queue. I fully agree that the other points in that guide are just about mitigating the impact on the server (with sometimes debatable success), not about solving the underlying problem. The guide addresses this as well: how to reduce the client request frequency. The underlying problem is that the tasks are not sent out efficiently enough. If a client did not have to ask for tasks several times before it receives any, there would be far fewer requests. Of course, as soon as the work supply is shaky, the more enthusiastic participants will take measures to avoid running dry (i.e. forcing work requests as frequently as possible, setting higher buffers, etc.), thereby creating (most of) the server problems your guide is trying to mitigate.

Coleslaw Joined: 23 Oct 20 Posts: 5 Credit: 10,814,286 RAC: 1,753	Message 1168 - Posted: 15 Sep 2021, 18:08:04 UTC - in response to Message 1167. This includes running multiple clients to sidestep most of the server-side workarounds that limit work going out per host.

yoyo_rkn Joined: 24 Oct 20 Posts: 7 Credit: 634,819 RAC: 0	Message 1169 - Posted: 15 Sep 2021, 18:09:09 UTC - in response to Message 1167. I'm sure that this is the reason, and that the underlying problem is the disk I/O on the server; this underlying problem is the one being addressed.

TankbusterGames Joined: 22 May 21 Posts: 11 Credit: 3,283,899 RAC: 0	Message 1170 - Posted: 15 Sep 2021, 18:51:21 UTC Last modified: 15 Sep 2021, 18:52:38 UTC There are some more issues. This forum and the whole website are very unresponsive... it takes like 13,21 seconds to load the server status page. Another issue is BOINCstats not updating the challenge as it should. We all have 0 credits. Is that error related to the poor server performance? Or is there another problem with the project or BOINCstats? EDIT: wow, over 30 seconds to submit this post ^__^

Pascal Joined: 5 Sep 21 Posts: 4 Credit: 113,431 RAC: 0	Message 1173 - Posted: 15 Sep 2021, 20:37:19 UTC Last modified: 15 Sep 2021, 20:43:55 UTC I get no tasks anymore :(

yoyo_rkn Joined: 24 Oct 20 Posts: 7 Credit: 634,819 RAC: 0	Message 1174 - Posted: 15 Sep 2021, 20:49:23 UTC - in response to Message 1170. Yes, this is all related to the poor server performance. The forum most probably runs on the same server as the BOINC services and the DB, and the stats fetched by BOINCstats also come from this server. Therefore, again, the most important thing is to stabilize the server. The second point is to make it pleasant for users regarding WUs in progress, connect interval and so on. So the question is: why is the server so slow? This has to be evaluated and improved. My analysis and mitigations are here https://www.rechenkraft.net/wiki/Benutzer_Diskussion:Yoyo/Boincserver_Tuning I run yoyo@home, which by now is mostly stable and fast, even in big races. The server has only 2 cores, 8 GB RAM and hard disks. yoyo

hoarfrost Volunteer moderator Project administrator Project developer Joined: 11 Oct 20 Posts: 346 Credit: 26,018,950 RAC: 1,148	Message 1175 - Posted: 15 Sep 2021, 21:10:14 UTC Last modified: 15 Sep 2021, 21:13:57 UTC More interesting than "simple I/O". :) For example, right now:
Average:      DEV     tps   rkB/s    wkB/s  areq-sz  aqu-sz  await  svctm  %util
Average:   dev8-0  192.08   80.64  4349.12    23.06    0.35   1.84   1.34  25.71
Initially we use 1 feeder. After some time (usually after a big flush of tasks) the internal task cache stays empty for a significant time (because computers that report many tasks also request many tasks). This is a good time to switch to 2 feeders, for example. Two feeders run excellently and provide tasks for computers. But after several hours (about 3 to 5), sending of tasks stops. The feeders run without any errors, but tasks from the queue are not sent to computers. A simple restart does not change the situation. But if at this moment we switch to a single feeder (and restart the project server processes), the problem is resolved. Tasks are successfully put into the internal task cache and sent to computers. Not as fast as with two feeders, but the "Tasks in progress" metric grows. Maybe we have an interesting interaction between latency and the logic of the BOINC server processes. In any case, next bunches of will be mixed with bunches of Eprot_v1_run_2 tasks.
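As a rough check on those iostat figures, here is a small Python sketch that recomputes the derived columns from the raw ones. It assumes the usual sysstat definitions: areq-sz is the average request size in kB, i.e. (rkB/s + wkB/s) / tps, and svctm is derived from %util and tps, so tps × svctm should come back to roughly %util. It is only arithmetic on the numbers posted above, not a diagnosis of the feeder behaviour.

```python
# Values copied from the "Average:" line for dev8-0 posted above.
tps = 192.08        # I/O requests (transfers) per second
rkb_s = 80.64       # kB read per second
wkb_s = 4349.12     # kB written per second
svctm_ms = 1.34     # average service time per request, in milliseconds

# Average request size (areq-sz): kB moved per second divided by requests per second.
areq_sz_kb = (rkb_s + wkb_s) / tps

# Approximate utilisation: milliseconds of device busy time per second, as a percentage.
# svctm is itself derived from %util and tps, so this only re-checks consistency.
util_pct = tps * svctm_ms / 1000 * 100

# Share of the transferred kB that are writes rather than reads.
write_share_pct = wkb_s / (rkb_s + wkb_s) * 100

print(f"areq-sz ~ {areq_sz_kb:.2f} kB")
print(f"%util   ~ {util_pct:.1f} %")
print(f"writes  ~ {write_share_pct:.1f} % of kB transferred")
```

It prints about 23.06 kB, 25.7 % and 98.2 %: under these assumptions the disk was busy roughly a quarter of the time, and almost all of the traffic was small (~23 kB) writes. That is at least consistent with the remark that the stall is more interesting than a plain I/O bottleneck.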

crashtech Joined: 5 Jan 21 Posts: 8 Credit: 33,614,541 RAC: 10,230	Message 1177 - Posted: 16 Sep 2021, 1:21:49 UTC I can't speak to the intricacies of server configuration, but I can report that the main problem I am having is stuck downloads. With short WUs these require user intervention, because further downloads are held up behind the timed-out ones. I propose shortening the timeout interval somewhat.

xii5ku Joined: 3 Jan 21 Posts: 24 Credit: 30,971,353 RAC: 0	Message 1178 - Posted: 16 Sep 2021, 5:27:19 UTC Last modified: 16 Sep 2021, 5:33:13 UTC hoarfrost wrote: In any case, next bunches of will be mixed with bunches of Eprot_v1_run_2 tasks. Thanks! As far as I can tell, everything works smoothly now. The client's estimation of task durations is thrown off now, of course, so it's good that you have the 2-tasks-in-progress limit, preventing the clients from putting more on their plate than they can chew. :-) A bit off topic: yoyo_rkn wrote: I run yoyo@home, which by now is mostly stable and fast, even in big races. The server has only 2 cores, 8 GB RAM and hard disks. It's nice that you can get by with a severely (and these days, unnecessarily) under-powered server. But the price is drastically reduced functionality. No message board. (There is the external RKN message board, but it doesn't share user credentials with the BOINC project.) During contests, you delete results from the database extremely early. This makes it very difficult (basically impossible) for users to monitor (and hopefully improve) their hosts' performance. At some point you changed yoyo@home to collapse multiple client instances into the same host record. It's obvious why you did it and I can understand it. But it's just another flawed compromise, akin to other compromises (such as very small work buffer quotas per host): an owner of eleven i7-2600s is allowed to run eleven client instances, whereas an owner of one dual E5-2699 v4 is allowed to run only one client instance. This makes only limited sense. Furthermore, the BOINC server version at yoyo@home seems curiously outdated, but I have no idea whether this too is for performance reasons. Given that e.g. results tables in the web interface cannot be filtered, which makes them practically useless, I guess performance considerations are in play there too.
My opinion is that there shouldn't be DC contests held at yoyo@home, for similar reasons as there shouldn't be contests at projects like ODLK or TN-Grid. (Contests shorter than about a month, that is.)

yoyo_rkn Joined: 24 Oct 20 Posts: 7 Credit: 634,819 RAC: 0	Message 1179 - Posted: 16 Sep 2021, 5:46:30 UTC - in response to Message 1178. Last modified: 16 Sep 2021, 5:51:52 UTC Most of your conclusions and assumptions are wrong. But I will not run a discussion battle here, so I leave this discussion.

hoarfrost Volunteer moderator Project administrator Project developer Joined: 11 Oct 20 Posts: 346 Credit: 26,018,950 RAC: 1,148	Message 1180 - Posted: 16 Sep 2021, 6:35:17 UTC corona_Eprot_v1_run_2* workunits (about 15 times longer in processor time, and with even less data to write) are now in the queue. We can safely add more and more power!
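Why the longer workunits relieve the server can be sketched with back-of-envelope arithmetic. The core count and the small-task runtime below are hypothetical placeholders; only the factor of about 15 comes from the post. The sketch assumes every core stays busy, so each finished task means one result report and, roughly, one new assignment.

```python
# Hypothetical numbers for illustration only; not measured SiDock@home values.
busy_cores = 10_000          # cores crunching in parallel (assumed)
short_task_minutes = 2.0     # runtime of one small workunit (assumed)
runtime_factor = 15          # Eprot_v1 is ~15x longer in CPU time (from the post)


def tasks_per_hour(cores: int, task_minutes: float) -> float:
    """Steady state: each busy core finishes 60 / task_minutes tasks per hour."""
    return cores * 60.0 / task_minutes


short_rate = tasks_per_hour(busy_cores, short_task_minutes)
long_rate = tasks_per_hour(busy_cores, short_task_minutes * runtime_factor)

print(f"small WUs: {short_rate:,.0f} results/hour")
print(f"Eprot_v1:  {long_rate:,.0f} results/hour")
print(f"reduction: {short_rate / long_rate:.0f}x fewer results to process")
```

Whatever values are plugged in, the ratio is just the runtime factor: for the same computing power, the scheduler and database handle about 15 times fewer results per hour, and the smaller output per task reduces disk writes on top of that.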

Michael H.W. Weber Joined: 7 Nov 20 Posts: 8 Credit: 11,329,757 RAC: 3	Message 1181 - Posted: 16 Sep 2021, 7:57:18 UTC Last modified: 16 Sep 2021, 8:01:11 UTC Client request frequency allowance is key. My machines still do not get tasks, so the issue is not fixed - although at least the forum works again (so something has been tweaked, or people are already leaving the competition - it is sometimes hard to keep team colleagues contributing when virtually no work is delivered and even the project website no longer responds properly). Practical suggestions for corrections have been posted. If you listen to (external?) people who like to divert the discussion into rants about outdated servers of other projects, instead of just trying to implement what has been kindly suggested by people who have worked hands-on with BOINC on multiple projects for about 15 years, you will most likely stay stuck with your problems where you are at present. It's simply your choice. But remember that this project operates way beyond its capabilities with the current hardware and settings. Michael. President of Rechenkraft.net - This world's first and largest distributed computing organization. We make those things possible that supercomputers don't.

TankbusterGames Joined: 22 May 21 Posts: 11 Credit: 3,283,899 RAC: 0	Message 1182 - Posted: 16 Sep 2021, 8:06:57 UTC Wow, finally it doesn't take 20 seconds to load the forum. Seems like you fixed something ^_^

hoarfrost Volunteer moderator Project administrator Project developer Joined: 11 Oct 20 Posts: 346 Credit: 26,018,950 RAC: 1,148	Message 1183 - Posted: 16 Sep 2021, 8:32:46 UTC - in response to Message 1181. Last modified: 16 Sep 2021, 8:38:28 UTC Michael H.W. Weber wrote: My machines still do not get tasks, so the issue is not fixed - although at least the forum works again (so something has been tweaked, or people are already leaving the competition - it is sometimes hard to keep team colleagues contributing when virtually no work is delivered and even the project website no longer responds properly). This looks very strange. Tasks are present in the queue and the cache, and are freely sent to many computers. Can you post messages from the event log of your BOINC client? Maybe the hosts try to report too many results in one request? We have lived with recommendations like the ones below since the RakeSearch days (these are good recommendations, but only part of the whole tuning). We started the challenge with small workunits, but 10 hours ago switched to Eprot_v1, which are 15 times larger. From my computer this thread opens in about 1 second...

hoarfrost Volunteer moderator Project administrator Project developer Joined: 11 Oct 20 Posts: 346 Credit: 26,018,950 RAC: 1,148	Message 1184 - Posted: 16 Sep 2021, 10:23:50 UTC Last modified: 16 Sep 2021, 10:24:48 UTC TeAm AnandTech and [H]ard|OCP passed SETI.Germany and SETI.USA at the same time. Hardwarers beat the SETIens? :) The leading six right now:
1. Planet 3DNow! 2227253
2. TeAm AnandTech 1172867
3. SETI.Germany 1147797
4. Rechenkraft.net 873543
5. [H]ard|OCP 702525
6. SETI.USA 700473
...

hoarfrost Volunteer moderator Project administrator Project developer Joined: 11 Oct 20 Posts: 346 Credit: 26,018,950 RAC: 1,148	Message 1185 - Posted: 16 Sep 2021, 15:35:55 UTC 1 day and 15 hours of the challenge. Planet 3DNow! is crunching workunits on a separate planet. Will anyone be able to reach these green crunchers in the remaining days of crunch-week? :) TeAm AnandTech holds second place, but who knows - maybe the techies are building their own Starship to conquer the greens? The SETIens from Germany and the USA have no time right now to search for the Great Green Crunchers, due to hard pressure from [H]ard|OCP and crafty crunching by the rake-masters from .net. In the basement of the top 10, Metals and Crystals are finding out who will shine brighter. TOP 6 right now:
1. Planet 3DNow! 3168535
2. TeAm AnandTech 1811902
3. SETI.Germany 1453868
4. Rechenkraft.net 1345601
5. [H]ard|OCP 1076506
6. SETI.USA 1067469
Some technical news: after a solid block of ~60,000 Eprot_v1 workunits, we are trying a mixed compound: 20000 of 3CLpro_v4 + 6000 of Eprot_v1 + 20000 of 3CLpro_v5 + 6000 of Eprot_v1 + 20000 of 3CLpro_v6 + 8000 of Eprot_v1.
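The mixed compound can be tallied in a few lines. This sketch assumes that the 3CLpro batches are the small workunits the challenge started with and that an Eprot_v1 workunit costs about 15 times as much CPU time, as stated in the earlier posts; the relative cost table is an illustrative assumption, not a project figure.

```python
# Batch sequence as listed in the post (name, number of workunits).
batches = [
    ("3CLpro_v4", 20_000),
    ("Eprot_v1", 6_000),
    ("3CLpro_v5", 20_000),
    ("Eprot_v1", 6_000),
    ("3CLpro_v6", 20_000),
    ("Eprot_v1", 8_000),
]

# Assumption: a 3CLpro workunit costs 1 unit of CPU time, an Eprot_v1 workunit ~15 units.
RELATIVE_COST = {"Eprot_v1": 15.0}

total_wus = sum(n for _, n in batches)
eprot_wus = sum(n for name, n in batches if name == "Eprot_v1")
cpu_units = sum(n * RELATIVE_COST.get(name, 1.0) for name, n in batches)
eprot_cpu = eprot_wus * RELATIVE_COST["Eprot_v1"]

print(f"total workunits: {total_wus}")
print(f"Eprot_v1 share of workunits: {eprot_wus / total_wus:.0%}")
print(f"Eprot_v1 share of CPU time:  {eprot_cpu / cpu_units:.0%}")
```

Under those assumptions the compound is 80,000 workunits, of which Eprot_v1 is 25 % by count but about 83 % of the CPU time, so the short 3CLpro tasks still account for most of the scheduler transactions.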

TankbusterGames Joined: 22 May 21 Posts: 11 Credit: 3,283,899 RAC: 0	Message 1186 - Posted: 16 Sep 2021, 21:44:20 UTC Since the challenge started, project performance has increased by roughly 30% ^__^ That's really awesome :) And thank you for dealing with all the issues so fast :)