SiDock@home September Sailing

Message boards : News : SiDock@home September Sailing

hoarfrost
Volunteer moderator
Project administrator
Project developer

Joined: 11 Oct 20
Posts: 333
Credit: 25,518,826
RAC: 6,534
Message 1164 - Posted: 15 Sep 2021, 14:27:55 UTC

About 15 or 30 minutes ago, one participant (or maybe several participants) flushed ~25,000 results or even more. But that happened 14 hours after the challenge started. The project server has now successfully processed these results, and after a short break (about 15 minutes?) new tasks are already available in the queue.

Have you considered raising the limit a bit as long as the workunits are this small?

We can try to do this if it is really necessary, but not on the first day. When we change the setting from 2 to 3 (for example), the number of tasks that hosts receive will increase several times over for the first few hours.
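To put rough numbers on that: if the limit is counted per CPU core (an assumption; the post does not say how it is counted), raising it from 2 to 3 lifts every host's ceiling by 50% at once, and every busy host tries to fill the new room on its next request, on top of its normal replacement traffic. The host and core counts below are hypothetical.

```python
def in_progress_ceiling(limit_per_core: int, cores: int) -> int:
    """Maximum tasks a host may hold, assuming the limit is applied per core."""
    return limit_per_core * cores


def one_time_extra_tasks(hosts: int, cores_per_host: int,
                         old_limit: int, new_limit: int) -> int:
    """Extra tasks the server must hand out right after the limit is raised,
    if every host immediately fills its new ceiling."""
    per_host = (in_progress_ceiling(new_limit, cores_per_host)
                - in_progress_ceiling(old_limit, cores_per_host))
    return hosts * per_host


def ceiling_increase(old_limit: int, new_limit: int) -> float:
    """Relative growth of the in-progress ceiling (0.5 means +50%)."""
    return (new_limit - old_limit) / old_limit


# Hypothetical fleet: 1,000 active hosts with 16 cores each.
print(one_time_extra_tasks(1000, 16, 2, 3))  # 16000
print(ceiling_increase(2, 3))                # 0.5
```

That one-time surge arrives while the normal stream of returned results and replacement requests continues, which fits the caution about not doing it on the first day.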

yoyo_rkn
Joined: 24 Oct 20
Posts: 7
Credit: 533,463
RAC: 238
Message 1165 - Posted: 15 Sep 2021, 15:25:34 UTC - in response to Message 1159.  

Michael H.W. Weber wrote:
Please take a look at these guidelines which my team colleague Yoyo has written down
This guide is about keeping the server responsive, not so much about keeping the hosts utilized.

(One central point of the guide is to reduce the number of tasks in progress. But high utilization of contributor hosts ultimately requires a high number of tasks in progress.)


That's right. But without the server, hosts get nothing!

yoyo_rkn
Joined: 24 Oct 20
Posts: 7
Credit: 533,463
RAC: 238
Message 1166 - Posted: 15 Sep 2021, 15:27:33 UTC - in response to Message 1161.  

xii5ku wrote:
Michael H.W. Weber wrote:
Please take a look at these guidelines which my team colleague Yoyo has written down
This guide is about keeping the server responsive, not so much about keeping the hosts utilized.

Database optimisation can help under the right circumstances, but usually, when many hosts request work at the same time, the bottleneck is the scheduler queue. I fully agree that the other points in that guide are just about mitigating the impact on the server (with sometimes debatable success), not about solving the underlying problem.

The guide addresses this as well: how to reduce client request frequency.
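The effect of lower request frequency on the server can be estimated with simple arithmetic: if each host contacts the scheduler about once per interval, the average request rate is the number of hosts divided by that interval, and a server-side minimum delay only helps if clients honor it. A minimal sketch with hypothetical numbers; the guide's actual settings are not reproduced here.

```python
def scheduler_rate(hosts: int, client_interval_s: float,
                   server_min_delay_s: float = 0.0) -> float:
    """Average scheduler requests per second.

    Assumes every host sends one request per max(client interval,
    server-imposed minimum delay), i.e. clients honor the delay.
    """
    interval = max(client_interval_s, server_min_delay_s)
    if interval <= 0:
        raise ValueError("interval must be positive")
    return hosts / interval


# Hypothetical: 6,000 hosts forcing an update every 60 s ...
print(scheduler_rate(6000, 60))        # 100.0
# ... versus the same hosts held to one request per 300 s.
print(scheduler_rate(6000, 60, 300))   # 20.0
```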

pschoefer
Joined: 1 Jan 21
Posts: 9
Credit: 2,789,148
RAC: 5,071
Message 1167 - Posted: 15 Sep 2021, 16:10:58 UTC - in response to Message 1166.  

xii5ku wrote:
Michael H.W. Weber wrote:
Please take a look at these guidelines which my team colleague Yoyo has written down
This guide is about keeping the server responsive, not so much about keeping the hosts utilized.

Database optimisation can help under the right circumstances, but usually, when many hosts request work at the same time, the bottleneck is the scheduler queue. I fully agree that the other points in that guide are just about mitigating the impact on the server (with sometimes debatable success), not about solving the underlying problem.

The guide addresses this as well: how to reduce client request frequency.

The underlying problem is that the tasks are not sent out efficiently enough. If a client did not have to ask for tasks several times before it receives any, there would be far fewer requests. Of course, as soon as the work supply is shaky, the more enthusiastic participants will take measures to avoid running dry (e.g. forcing work requests as frequently as possible, setting higher buffers, etc.), thereby creating (most of) the server problems your guide is trying to mitigate.
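This point can be made quantitative. If each scheduler request independently returns work with probability p, the number of requests a client needs before it gets anything follows a geometric distribution with mean 1/p, so halving the success rate doubles the request load for the same amount of work delivered. Independence is a simplification (real failures come in streaks); the sketch checks the formula with a small seeded simulation.

```python
import random


def expected_requests(p_success: float) -> float:
    """Mean number of requests until the first one returns work (geometric)."""
    if not 0 < p_success <= 1:
        raise ValueError("p_success must be in (0, 1]")
    return 1 / p_success


def simulated_requests(p_success: float, trials: int, seed: int = 0) -> float:
    """Monte Carlo estimate of the same mean."""
    rng = random.Random(seed)
    total = 0
    for _ in range(trials):
        requests = 1
        while rng.random() >= p_success:  # this request came back empty
            requests += 1
        total += requests
    return total / trials


print(expected_requests(0.25))  # 4.0
print(round(simulated_requests(0.25, 50_000), 1))
```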

Coleslaw
Joined: 23 Oct 20
Posts: 5
Credit: 10,738,228
RAC: 55
Message 1168 - Posted: 15 Sep 2021, 18:08:04 UTC - in response to Message 1167.  

This includes running multiple clients to sidestep most of the server-side workarounds that limit the work sent out per host.

yoyo_rkn
Joined: 24 Oct 20
Posts: 7
Credit: 533,463
RAC: 238
Message 1169 - Posted: 15 Sep 2021, 18:09:09 UTC - in response to Message 1167.  

I'm sure that this is the reason, that the base problem is the disk I/O on the server, and that this base problem is what gets addressed.

TankbusterGames
Joined: 22 May 21
Posts: 11
Credit: 3,283,899
RAC: 0
Message 1170 - Posted: 15 Sep 2021, 18:51:21 UTC
Last modified: 15 Sep 2021, 18:52:38 UTC

There are some more issues. This forum and the whole website are very unresponsive... it takes 13.21 seconds to load the server status page.
Another issue is that BOINCstats is not updating the challenge as it should. We all have 0 credits.
Is that error related to the poor server performance? Or is there another problem with the project or BOINCstats?


EDIT: wow, over 30 seconds to submit this post ^__^

Pascal
Joined: 5 Sep 21
Posts: 4
Credit: 113,431
RAC: 0
Message 1173 - Posted: 15 Sep 2021, 20:37:19 UTC
Last modified: 15 Sep 2021, 20:43:55 UTC

I get no tasks anymore :(

yoyo_rkn
Joined: 24 Oct 20
Posts: 7
Credit: 533,463
RAC: 238
Message 1174 - Posted: 15 Sep 2021, 20:49:23 UTC - in response to Message 1170.  

Yes, this is all related to the poor server performance.
The forum most probably runs on the same server as the BOINC services and the DB, and the stats fetched by BOINCstats also come from this server.

Therefore, again, the most important thing is to stabilize the server. The second point is to make it pleasant for users regarding WUs in progress, connect interval, and so on.

So the question is: why is the server so slow? This has to be evaluated and improved.

My analysis and mitigations are here: https://www.rechenkraft.net/wiki/Benutzer_Diskussion:Yoyo/Boincserver_Tuning

I run yoyo@home, which by now is mostly stable and fast, even in big races. The server has only 2 cores, 8 GB RAM, and hard disks.

yoyo

hoarfrost
Volunteer moderator
Project administrator
Project developer
Joined: 11 Oct 20
Posts: 333
Credit: 25,518,826
RAC: 6,534
Message 1175 - Posted: 15 Sep 2021, 21:10:14 UTC
Last modified: 15 Sep 2021, 21:13:57 UTC

More interesting than "simple I/O". :)

For example, right now:
Average: DEV tps rkB/s wkB/s areq-sz aqu-sz await svctm %util
Average: dev8-0 192.08 80.64 4349.12 23.06 0.35 1.84 1.34 25.71
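These columns look like per-device averages from the sysstat tools (`sar -d` or `iostat -x`); that is an assumption based on the column names. Under the usual sysstat definitions (areq-sz in kB per request, await and svctm in milliseconds; svctm is deprecated in newer releases), the columns should roughly agree with each other, which allows a quick sanity check:

```python
COLUMNS = "DEV tps rkB/s wkB/s areq-sz aqu-sz await svctm %util".split()
SAMPLE = "dev8-0 192.08 80.64 4349.12 23.06 0.35 1.84 1.34 25.71"


def parse(line: str) -> tuple[str, dict[str, float]]:
    """Split one data line into the device name and its numeric columns."""
    fields = line.split()
    stats = {name: float(value) for name, value in zip(COLUMNS[1:], fields[1:])}
    return fields[0], stats


def cross_check(s: dict[str, float]) -> dict[str, float]:
    """Recompute columns from the others (units: kB, ms, percent)."""
    tps = s["tps"]
    total_kb = s["rkB/s"] + s["wkB/s"]
    return {
        "areq-sz": total_kb / tps,             # average request size, kB
        "aqu-sz": tps * s["await"] / 1000,     # Little's law: rate x wait
        "%util": tps * s["svctm"] / 10,        # busy ms per s, as a percent
        "write share": s["wkB/s"] / total_kb,  # fraction of bytes written
    }


device, stats = parse(SAMPLE)
for name, value in cross_check(stats).items():
    print(f"{device} {name}: recomputed {value:.3f}, reported {stats.get(name)}")
```

With these numbers the recomputed values land within rounding of the reported ones, about 98% of the bytes are writes, and a %util of about 26% means the disk was busy roughly a quarter of the time during this sample.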

Initially we use 1 feeder. After some time (usually after a big flush of tasks), the internal task cache becomes empty for a significant time (because computers that report many tasks also request many tasks). This is a good time to switch to 2 feeders, for example. Two feeders run excellently and supply tasks to computers. But after several hours (about 3 to 5), sending of tasks stops. The feeders keep running with no errors at all, but tasks from the queue are not sent to computers. A simple restart does not change the situation. But if at this moment we switch to a single feeder (and restart the project server processes), the problem is resolved: tasks are successfully put into the internal task cache and sent to computers. Not as fast as with two feeders, but the "Tasks in progress" metric grows. Maybe we have an interesting interaction between latency and the logic of the BOINC server processes.
In any case, the next batches will be mixed with batches of Eprot_v1_run_2 tasks.
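For readers unfamiliar with the feeder: it keeps a fixed-size cache of jobs in shared memory, and the scheduler hands jobs out from that cache. When more than one feeder runs, each has to take a disjoint share of the jobs; as far as I know BOINC's feeder can be restricted to jobs whose ID falls in one residue class modulo the number of feeders, but the post does not say how the two feeders here were split. The toy model below (hypothetical function names, not BOINC code) only shows such a modulo split and a round-robin fill of the cache; it does not reproduce the stall described above.

```python
from collections import deque


def partition_queue(wu_ids, n_feeders):
    """Give feeder i every job whose ID satisfies id % n_feeders == i."""
    queues = {i: deque() for i in range(n_feeders)}
    for wu_id in wu_ids:
        queues[wu_id % n_feeders].append(wu_id)
    return queues


def fill_cache(cache, queues):
    """Fill empty cache slots (None), letting feeders take turns.

    If the feeder whose turn it is has nothing left, the next one is tried.
    Returns the number of slots filled in this pass.
    """
    feeders = sorted(queues)
    turn = 0
    filled = 0
    for slot, job in enumerate(cache):
        if job is not None:
            continue
        for _ in range(len(feeders)):
            queue = queues[feeders[turn % len(feeders)]]
            turn += 1
            if queue:
                cache[slot] = queue.popleft()
                filled += 1
                break
    return filled


queues = partition_queue(range(1, 11), 2)
cache = [None] * 4
print(fill_cache(cache, queues), cache)  # 4 [2, 1, 4, 3]
```

In this toy model, when one partition runs dry the other feeder still fills the free slots, so an uneven split alone would not leave the cache empty.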

crashtech
Joined: 5 Jan 21
Posts: 7
Credit: 21,280,743
RAC: 55,110
Message 1177 - Posted: 16 Sep 2021, 1:21:49 UTC

I can't speak to the intricacies of server configuration, but I can report that the main problem I am having is stuck downloads. With short WUs these require user intervention, because further downloads are held up until the stuck downloads time out. I propose shortening the timeout interval somewhat.
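One way to decide that a download is stuck, rather than merely slow, is to abort it only when its transfer rate stays below a floor for a whole time window. As far as I know, the BOINC client exposes such a pair of settings in cc_config.xml (http_transfer_timeout and http_transfer_timeout_bps); the sketch below only models that kind of rule with hypothetical sample data and its own units (bytes per second), and is not the client's code.

```python
def stalled_at(samples, min_bps, window_s):
    """Return the time a transfer should be declared stuck, or None.

    samples: (seconds, bytes_done) pairs in increasing time order.
    The transfer counts as stuck once its rate has stayed below min_bps
    (bytes per second here) for at least window_s seconds in a row.
    """
    slow_since = None
    for (t0, b0), (t1, b1) in zip(samples, samples[1:]):
        rate = (b1 - b0) / (t1 - t0)
        if rate < min_bps:
            if slow_since is None:
                slow_since = t0
            if t1 - slow_since >= window_s:
                return t1
        else:
            slow_since = None  # progress resumed; restart the window
    return None


# Hypothetical download: steady progress, then no bytes after t = 60 s.
samples = [(0, 0), (30, 300_000), (60, 600_000),
           (90, 600_000), (120, 600_000), (150, 600_000)]
print(stalled_at(samples, min_bps=10, window_s=60))   # 120
print(stalled_at(samples, min_bps=10, window_s=300))  # None
```

A shorter window frees the transfer slot sooner, as suggested above, at the cost of aborting some transfers that would have recovered on their own.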

xii5ku
Joined: 3 Jan 21
Posts: 24
Credit: 30,966,595
RAC: 88
Message 1178 - Posted: 16 Sep 2021, 5:27:19 UTC
Last modified: 16 Sep 2021, 5:33:13 UTC

hoarfrost wrote:
In any case, next bunches of will be mixed with bunches of Eprot_v1_run_2 tasks.
Thanks! As far as I can tell, everything works smoothly now.

The client's estimation of task durations is thrown off now, of course, so it's good that you have the 2-tasks-in-progress limit, which prevents the clients from putting more on their plate than they can chew. :-)
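Why the cap matters when the duration estimate is stale: a client sizes its request roughly as buffer length × cores ÷ estimated task time. If the estimate still reflects the small tasks while the queue now serves tasks about 15 times longer (the ratio given for Eprot_v1 in this thread), the client asks for about 15 times more work than it meant to. The buffer, core count and task time below are hypothetical, and the cap is modeled as 2 per core, which the thread does not confirm.

```python
import math


def tasks_requested(buffer_s: float, est_task_s: float, cores: int) -> int:
    """Tasks needed to keep `cores` busy for `buffer_s`, by the client's estimate."""
    return math.ceil(buffer_s * cores / est_task_s)


def tasks_granted(requested: int, cap: int) -> int:
    """Apply a server-side in-progress cap."""
    return min(requested, cap)


def queued_hours_per_core(n_tasks: int, actual_task_s: float, cores: int) -> float:
    """Real work on hand, per core, once the true task length is known."""
    return n_tasks * actual_task_s / cores / 3600


cores = 16
est_s, actual_s = 600, 600 * 15                  # stale estimate vs. 15x longer tasks
wanted = tasks_requested(86_400, est_s, cores)   # one-day buffer
granted = tasks_granted(wanted, cap=2 * cores)
print(wanted, queued_hours_per_core(wanted, actual_s, cores))    # 2304 360.0
print(granted, queued_hours_per_core(granted, actual_s, cores))  # 32 5.0
```

Without the cap, the one-day request would turn into about 15 days of real work per core; with it, the overshoot stays at a few hours.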


A bit off topic:
yoyo_rkn wrote:
I run yoyo@home, which is in the meantime mostly stable and fast also in big races. The server has only 2 cores and 8 GB ram and hard disks.
It's nice that you can get by with a severely (and these days, unnecessarily) underpowered server. But the price is drastically reduced functionality.

  • No message board. (There is the external RKN message board, but it doesn't share user credentials with the boinc project.)
  • During contests, you are deleting results from the database extremely early. This makes it very difficult (basically impossible) for users to monitor (and hopefully improve) their hosts' performances.
  • At some point you changed yoyo@home to collapse multiple client instances into the same host record. It's obvious why you did it, and I can understand it. But it's just another flawed compromise akin to other compromises (such as very small work buffer quotas per host): an owner of eleven i7-2600 machines is allowed to run eleven client instances, whereas an owner of one dual E5-2699 v4 machine is allowed to run only one client instance. This makes only limited sense.
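The example in the last bullet is sharper than it looks, because both setups have the same number of hardware threads: an i7-2600 has 4 cores / 8 threads, so eleven of them give 88 threads, and a Xeon E5-2699 v4 has 22 cores / 44 threads, so a dual-socket board also gives 88. With any fixed per-instance work quota (the value 10 below is hypothetical), the owner of the eleven machines gets eleven times the buffer for the same thread count:

```python
I7_2600_THREADS = 8        # 4 cores with Hyper-Threading
E5_2699_V4_THREADS = 44    # 22 cores with Hyper-Threading


def fleet(instances: int, threads_per_instance: int, quota_per_instance: int):
    """Total threads, total buffered tasks, and buffered tasks per thread."""
    threads = instances * threads_per_instance
    buffered = instances * quota_per_instance
    return threads, buffered, buffered / threads


QUOTA = 10  # hypothetical per-instance limit
print(fleet(11, I7_2600_THREADS, QUOTA))  # (88, 110, 1.25)
print(fleet(1, 2 * E5_2699_V4_THREADS, QUOTA))
```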

Furthermore, the BOINC server version at yoyo@home seems curiously outdated, but I have no idea whether this, too, is for performance reasons. Given that, e.g., results tables in the web interface cannot be filtered, which makes them practically useless, I guess performance considerations are in play there too.

My opinion is that there shouldn't be DC contests held at yoyo@home for similar reasons as there shouldn't be contests at projects like ODLK or TN-Grid. (Contests which are shorter than about a month, that is.)


yoyo_rkn
Joined: 24 Oct 20
Posts: 7
Credit: 533,463
RAC: 238
Message 1179 - Posted: 16 Sep 2021, 5:46:30 UTC - in response to Message 1178.  
Last modified: 16 Sep 2021, 5:51:52 UTC

Most of your conclusions and assumptions are wrong. But I won't get into a discussion battle here, so I'm leaving this discussion.

hoarfrost
Volunteer moderator
Project administrator
Project developer
Joined: 11 Oct 20
Posts: 333
Credit: 25,518,826
RAC: 6,534
Message 1180 - Posted: 16 Sep 2021, 6:35:17 UTC

corona_Eprot_v1_run_2* workunits (which take about 15 times more processor time and write even less data) are now in the queue. We can safely add more and more power!
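The relief from longer tasks is easy to estimate: for a fixed pool of cores running tasks back to back, results come back at a rate of cores ÷ CPU time per task, and each returned result costs the server a report, a replacement request and some database writes. Making tasks about 15 times longer therefore cuts that per-result traffic by about the same factor. The core count and task length below are hypothetical.

```python
def results_per_hour(cores: int, task_cpu_hours: float) -> float:
    """Results returned per hour when every core runs tasks back to back."""
    return cores / task_cpu_hours


cores = 10_000          # hypothetical active cores
short_task_h = 0.25     # hypothetical small-task CPU time
small = results_per_hour(cores, short_task_h)
eprot = results_per_hour(cores, short_task_h * 15)
print(small, round(eprot), small / eprot)
```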

Michael H.W. Weber
Joined: 7 Nov 20
Posts: 8
Credit: 11,148,833
RAC: 0
Message 1181 - Posted: 16 Sep 2021, 7:57:18 UTC
Last modified: 16 Sep 2021, 8:01:11 UTC

Client request frequency allowance is key.
My machines still do not get tasks, so the issue is not fixed, although at least the forum works again (so something has been tweaked, or people are already leaving the competition; it is sometimes hard to keep team colleagues contributing when virtually no work is delivered and even the project website no longer responds properly).
Practical suggestions for corrections have been posted.
If you listen to (external?) people who like to divert the discussion into rants about outdated servers of other projects, instead of just trying to implement what has been kindly suggested by people who have practically worked with BOINC on multiple projects for about 15 years, you will most likely remain stuck with your problems where you are at present.
It's simply your choice.
But remember that this project operates way beyond its capabilities with the current hardware and settings.

Michael.
President of Rechenkraft.net - This world's first and largest distributed computing organization. We make those things possible that supercomputers don't.

TankbusterGames
Joined: 22 May 21
Posts: 11
Credit: 3,283,899
RAC: 0
Message 1182 - Posted: 16 Sep 2021, 8:06:57 UTC

wow, finally it doesn't take 20 seconds to load the forum. Seems like you fixed something ^_^

hoarfrost
Volunteer moderator
Project administrator
Project developer
Joined: 11 Oct 20
Posts: 333
Credit: 25,518,826
RAC: 6,534
Message 1183 - Posted: 16 Sep 2021, 8:32:46 UTC - in response to Message 1181.  
Last modified: 16 Sep 2021, 8:38:28 UTC

My machines still do not get tasks, so the issue is not fixed - although at least the forum works again (so something has been tweaked or people are already leaving the competition - it is sometimes hard to keep team colleagues to contribute when virtually no work is delievered and even the project website does not respond properly anymore).

Looks very strange. Tasks are present in the queue and in the cache, and are freely being sent to many computers. Can you post the messages from the BOINC client's event log? Maybe the hosts try to report too many results in one request?
We have lived with recommendations like these since the RakeSearch days (they are good recommendations, but only part of the whole tuning). We started the challenge with small workunits, but 10 hours ago switched to Eprot_v1, which is 15 times larger.
From my computer this thread opens in about 1 second...
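The question about hosts reporting too many results in one request can be pictured as splitting a backlog into capped batches. As far as I know, the BOINC client's cc_config.xml has a max_tasks_reported option for such a cap; the sketch below only models the split, and the counts are hypothetical.

```python
def report_batches(n_completed: int, max_per_request: int) -> list[int]:
    """Sizes of successive scheduler requests needed to report a backlog."""
    if max_per_request <= 0:
        raise ValueError("max_per_request must be positive")
    batches = []
    remaining = n_completed
    while remaining > 0:
        take = min(remaining, max_per_request)
        batches.append(take)
        remaining -= take
    return batches


print(report_batches(2500, 1000))  # [1000, 1000, 500]
```

Smaller batches mean more requests, but each one is cheaper for the scheduler and less likely to time out while the server is busy.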

hoarfrost
Volunteer moderator
Project administrator
Project developer
Joined: 11 Oct 20
Posts: 333
Credit: 25,518,826
RAC: 6,534
Message 1184 - Posted: 16 Sep 2021, 10:23:50 UTC
Last modified: 16 Sep 2021, 10:24:48 UTC

TeAm AnandTech and [H]ard|OCP passed SETI.Germany and SETI.USA at the same time. Hardwarers beat the SETIens? :)
The leading six right now:

1. Planet 3DNow! 2227253
2. TeAm AnandTech 1172867
3. SETI.Germany 1147797
4. Rechenkraft.net 873543
5. [H]ard|OCP 702525
6. SETI.USA 700473
...

hoarfrost
Volunteer moderator
Project administrator
Project developer
Joined: 11 Oct 20
Posts: 333
Credit: 25,518,826
RAC: 6,534
Message 1185 - Posted: 16 Sep 2021, 15:35:55 UTC

1 day and 15 hours into the challenge.
Planet 3DNow! is crunching workunits on a planet of its own. Will anyone be able to reach these green crunchers in the remaining days of crunch week? :)
TeAm AnandTech holds second place, but who knows, maybe the techies are building their own Starship to conquer the greens?
The SETIens from Germany and the USA have no time right now to search for the Great Green Crunchers, due to hard pressure from [H]ard|OCP and crafty crunching by the rake masters from .net.
In the basement of the top 10, Metals and Crystals are finding out who will shine brighter.

Top 6 right now:
1. Planet 3DNow! 3168535
2. TeAm AnandTech 1811902
3. SETI.Germany 1453868
4. Rechenkraft.net 1345601
5. [H]ard|OCP 1076506
6. SETI.USA 1067469
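Comparing this table with the one from message 1184 (posted at 10:23:50 UTC, about five hours earlier) shows how much each team gained in between. The figures are copied from the two posts; the interval uses the post timestamps, so the hourly rates are only approximate.

```python
from datetime import datetime

earlier = {  # message 1184, 16 Sep 2021, 10:23:50 UTC
    "Planet 3DNow!": 2227253, "TeAm AnandTech": 1172867,
    "SETI.Germany": 1147797, "Rechenkraft.net": 873543,
    "[H]ard|OCP": 702525, "SETI.USA": 700473,
}
later = {  # message 1185, 16 Sep 2021, 15:35:55 UTC
    "Planet 3DNow!": 3168535, "TeAm AnandTech": 1811902,
    "SETI.Germany": 1453868, "Rechenkraft.net": 1345601,
    "[H]ard|OCP": 1076506, "SETI.USA": 1067469,
}

hours = (datetime(2021, 9, 16, 15, 35, 55)
         - datetime(2021, 9, 16, 10, 23, 50)).total_seconds() / 3600
gains = {team: later[team] - earlier[team] for team in earlier}

for team, gain in sorted(gains.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{team:16} +{gain:>7,}  (~{gain / hours:,.0f} per hour)")
```

Over this window Rechenkraft.net gained more than SETI.Germany, which matches the "crafty crunching" remark above.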

Some technical news:
After a solid block of ~60,000 Eprot_v1 workunits, we are trying a mixed compound: 20000 of 3CLpro_v4 + 6000 of Eprot_v1 + 20000 of 3CLpro_v5 + 6000 of Eprot_v1 + 20000 of 3CLpro_v6 + 8000 of Eprot_v1.
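The planned mix can be summarized quickly. By count, Eprot_v1 is a quarter of these 80,000 workunits; by CPU time it dominates if one Eprot_v1 unit costs about 15 times a 3CLpro unit. That ratio is an assumption carried over from the earlier "15 times longer" remark, which compared Eprot_v1 with the small units the challenge started with; the thread does not state the 3CLpro run times directly.

```python
PLAN = [
    ("3CLpro_v4", 20000), ("Eprot_v1", 6000),
    ("3CLpro_v5", 20000), ("Eprot_v1", 6000),
    ("3CLpro_v6", 20000), ("Eprot_v1", 8000),
]
EPROT_COST = 15.0  # assumed CPU time of one Eprot_v1 unit, in 3CLpro units


def summarize(plan, eprot_cost=EPROT_COST):
    """Total workunits, Eprot_v1 share by count, and Eprot_v1 share by CPU time."""
    total = sum(n for _, n in plan)
    eprot = sum(n for name, n in plan if name.startswith("Eprot"))
    cpu_eprot = eprot * eprot_cost
    cpu_other = float(total - eprot)  # each 3CLpro unit costs 1 by assumption
    return total, eprot / total, cpu_eprot / (cpu_eprot + cpu_other)


total, share_by_count, share_by_cpu = summarize(PLAN)
print(total, share_by_count, round(share_by_cpu, 3))  # 80000 0.25 0.833
```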

TankbusterGames
Joined: 22 May 21
Posts: 11
Credit: 3,283,899
RAC: 0
Message 1186 - Posted: 16 Sep 2021, 21:44:20 UTC

Since the challenge started, the project performance has increased by roughly 30% ^__^

That's really awesome :)

And thank you for dealing with all the issues so fast :)

©2024 SiDock@home Team