Thread Status: Active
Total posts in this thread: 8
This topic has been viewed 6090 times and has 7 replies
DennyInDurham
Cruncher
USA
Joined: Aug 4, 2020
Post Count: 23
The client doesn't account for the GPU thread...

If I allocate 100% of CPU resources to the client, on this particular Hyperthreaded 6-core Xeon, it would run 12 work units at 100% CPU utilization (which is fine, it maximizes throughput). With the GPU work unit, there are now 13 running. This was OK when the GPU thread used minimal CPU, but the current ones are CPU-bound for a considerable time at the beginning, about 7.5 minutes with the CPU maxed out. Reducing the work units to 12 (i.e., CPUs to 95%) reduces the CPU-bound time to about 6.5 minutes, and maximizes the GPU utilization.

The client should probably do this automagically if GPU work units are going to be significantly CPU-bound.
[Apr 28, 2021 4:54:11 PM]
Ian-n-Steve C.
Senior Cruncher
United States
Joined: May 15, 2020
Post Count: 180
Re: The client doesn't account for the GPU thread...

1. don't ever run the CPU at the 100% setting in BOINC. that's just asking to overcommit the CPU and bog things down. remember, the OS still needs some CPU cycles to run things in the background. always try to leave at least 1 thread free.

2. are you running an NVIDIA GPU? using a full core per GPU task is normal for most NVIDIA OpenCL applications. this doesn't necessarily mean the task is "CPU bound". it's probably just something set up wrong or not well optimized causing the increased GPU idle time. the CPU stays engaged no matter what the GPU app is doing.

3. BOINC doesn't properly count CPU use when considering values less than 1. I think the default CPU use estimate from WCG ships them as 0.9 CPU + 1 GPU. well, to BOINC, that 0.9 really means 0. it adds up the CPU portions and chops the sum down to an integer, truncating the decimal remainder. so:
0.9 CPU = BOINC reserves 0 CPUs for the GPU task
0.9+0.9 = BOINC reserves 1 CPU for the GPU tasks
0.9+0.9+0.9 = BOINC reserves 2 CPUs for the GPU tasks
and so on.
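
The arithmetic in the list above can be sketched in a few lines of Python. This is only a model of the truncation as described in this post, not BOINC's actual code, and the function name `reserved_cpus` is made up for illustration.

```python
import math

def reserved_cpus(cpu_per_gpu_task: float, gpu_tasks: int) -> int:
    """Whole CPUs set aside for GPU tasks, assuming the summed
    CPU fractions are truncated to an integer (as described above)."""
    return math.floor(cpu_per_gpu_task * gpu_tasks)

# Reproduce the 0.9 CPU cases from the list
for n in (1, 2, 3):
    print(f"{n} GPU task(s) at 0.9 CPU -> {reserved_cpus(0.9, n)} CPU(s) reserved")
```

With a value of 1.0 per task, the same function reserves one CPU per GPU task, which is the point of the fix suggested next.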

you can rectify this by using an app_config.xml file to force 1.0 CPU + 1.0 GPU. then BOINC will properly account for the CPU usage of GPU tasks.
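
For anyone who has not written one before, here is a small Python sketch that generates such a file. The element names (`app_config`, `app`, `name`, `gpu_versions`, `gpu_usage`, `cpu_usage`) follow BOINC's documented app_config.xml layout; the app name `opng` is an assumption based on how the GPU tasks are referred to in this thread, so check the real application name in your client before using it. The helper `build_app_config` is hypothetical.

```python
import xml.etree.ElementTree as ET

def build_app_config(app_name: str, cpu_usage: float = 1.0,
                     gpu_usage: float = 1.0) -> str:
    """Return app_config.xml text that budgets cpu_usage CPUs and
    gpu_usage GPUs for each GPU task of the given application."""
    root = ET.Element("app_config")
    app = ET.SubElement(root, "app")
    ET.SubElement(app, "name").text = app_name  # assumed app name, verify locally
    gpu = ET.SubElement(app, "gpu_versions")
    ET.SubElement(gpu, "gpu_usage").text = str(gpu_usage)
    ET.SubElement(gpu, "cpu_usage").text = str(cpu_usage)
    if hasattr(ET, "indent"):  # pretty-print where available (Python 3.9+)
        ET.indent(root)
    return ET.tostring(root, encoding="unicode")

print(build_app_config("opng"))
```

The output would be saved as app_config.xml in the project's folder inside the BOINC data directory, and the client then needs to re-read its config files (or be restarted) before the new values take effect.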
----------------------------------------

EPYC 7V12 / [5] RTX A4000
EPYC 7B12 / [5] RTX 3080Ti + [2] RTX 2080Ti
EPYC 7B12 / [6] RTX 3070Ti + [2] RTX 3060
[2] EPYC 7642 / [2] RTX 2080Ti
[Apr 28, 2021 5:21:01 PM]
DennyInDurham
Cruncher
USA
Joined: Aug 4, 2020
Post Count: 23
Re: The client doesn't account for the GPU thread...

1. macOS on this old Mac Pro has been perfectly happy running at 100% for several months now... the box doesn't run anything but WCG, plus the occasional Remote Desktop session to monitor WCG.

2. It's an AMD. The task is CPU-bound for several minutes (apparently setting things up). This started with the 1xxxx WUs.

3. I haven't felt the desire or need to get into BOINC plumbing. I'm only observing that things that worked optimally before GPU WUs now work suboptimally without a little tweaking.
[Apr 28, 2021 5:56:00 PM]
Ian-n-Steve C.
Senior Cruncher
United States
Joined: May 15, 2020
Post Count: 180
Re: The client doesn't account for the GPU thread...

Tweaking is always necessary to get things working more optimally. You’re not going to get what you want without tweaking just because of the nature of what you’re doing and how BOINC operates.

Just because it “worked fine” at 100% doesn’t mean it’s optimal. As you’ve found out. I’d still recommend getting out of the habit of running 100% and instead always leave some breathing room for the system. At least 1 thread.
[Apr 28, 2021 6:58:46 PM]
spRocket
Senior Cruncher
Joined: Mar 25, 2020
Post Count: 234
Re: The client doesn't account for the GPU thread...

My own experience with the Ryzen 7 1700 (8 cores/2 threads per core) is that setting things up for 15 CPU tasks leaves breathing room for 1 GPU task plus system overhead.

I'm currently running 14 CPU tasks/2 GPU tasks, configured for 0.5 CPU per GPU task. This seems to be humming along nicely.
[Apr 28, 2021 7:14:26 PM]
Mike.Gibson
Ace Cruncher
England
Joined: Aug 23, 2007
Post Count: 11816
Re: The client doesn't account for the GPU thread...

I stopped running opng because all were erroring. However, when I was, the status was showing 0.17 CPU + 1 GPU on my Intel machine. That means that the units were using a whole GPU and only used the CPU for 17% of the time.

In other words, most of the time it was not using the CPU. I presume that meant that it was only using the CPU for uploading and downloading which ties in with the observation about higher CPU at the start.

I have for some time now been using all my 8 threads (4 cores hyperthreaded) for WCG and also my sole GPU for Einstein GRP4. That states in the Status 0.5 CPU + 1 GPU but I find that it hardly slows down the WCG units. The other Einstein projects say 1 CPU + 1 GPU and only allow 7 WCG units to run.

Mike
[Apr 28, 2021 7:27:14 PM]
Ian-n-Steve C.
Senior Cruncher
United States
Joined: May 15, 2020
Post Count: 180
Re: The client doesn't account for the GPU thread...

"However, when I was, the status was showing 0.17 CPU + 1 GPU on my Intel machine. That means that the units were using a whole GPU and only used the CPU for 17% of the time."

That's not really what that means. the CPU use % is only an estimate provided by the project and has no bearing on what ACTUALLY gets used by the application. In other words, you cannot change what the science application does by changing these values. changing it to 0.5 won't make it use 50% of a core. I could set my NVIDIA cards to 0.5 and each task would still use a full CPU core.

what this value DOES do is act as bookkeeping for BOINC when it calculates how many resources are available to run other tasks, with the caveat that BOINC truncates the values down to integers. so when BOINC wants to know how many free CPU cores it can use, that 0.17 really ends up being 0. if you run 2 tasks, it will be 0.17+0.17 = 0.34, which is still 0 to BOINC. you'd need to run 6 tasks concurrently to get up over 1 so that BOINC actually reserves a free core for support of the tasks. if you run an 8-thread CPU with 0.17 CPU + 1 GPU and tell it to use 100% of the CPU, it will spin up all 8 tasks on the CPU and then throw the GPU task on top of that, forcing everyone to fight for CPU resources that have been overprovisioned.
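
To make the bookkeeping concrete, here is a rough Python model of the 8-thread example. It is a sketch under assumptions, not BOINC's scheduler: it assumes the "use at most N% of the CPUs" setting is also truncated to a whole number of threads, and the function name `running_tasks` is made up for illustration.

```python
import math

def running_tasks(threads: int, cpu_pct: int, gpu_tasks: int,
                  cpu_per_gpu_task: float) -> tuple:
    """Return (cpu_tasks, total_tasks) under the truncation described above."""
    usable = threads * cpu_pct // 100                    # assumed: percentage truncated to whole threads
    reserved = math.floor(gpu_tasks * cpu_per_gpu_task)  # fractional CPU budget chopped to an integer
    cpu_tasks = max(usable - reserved, 0)
    return cpu_tasks, cpu_tasks + gpu_tasks

# 8 threads at 100% with one GPU task budgeted at 0.17 CPU:
print(running_tasks(8, 100, 1, 0.17))  # (8, 9): nine tasks competing for eight threads

# Same machine with the GPU task budgeted at 1.0 CPU:
print(running_tasks(8, 100, 1, 1.0))   # (7, 8): one thread is held back for the GPU task
```

The same model gives 12 CPU tasks plus 1 GPU task (13 total) for the 12-thread Xeon at 100% with 0.9 CPU per GPU task, which matches what the first post in this thread reports.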
[Apr 28, 2021 9:05:34 PM]
DennyInDurham
Cruncher
USA
Joined: Aug 4, 2020
Post Count: 23
Re: The client doesn't account for the GPU thread...

(This is on macOS) I've watched Activity Monitor carefully while the OPNG tasks were running... multiple minutes of CPU-bound work, followed by a surprising amount of CPU activity while the GPU is in use. For the five-digit WUs, allocating a whole CPU is absolutely appropriate.

The OS handles 100% CPU perfectly well, but reducing the number of allocated CPUs by 2 does cause the OPNG WU to run faster, and gives the OS a bit of headroom to operate.
[Apr 28, 2021 10:57:56 PM]