CyberLink Community Forum
Where's my bottleneck during Producing?
pmikep
Senior Member Joined: Nov 26, 2016 22:51 Messages: 285 Offline
So I have a fresh (re)install of PD running on a clean Win8.1 partition.

Since Win8.1 has a more informative Task Manager than Win7, I noticed something interesting.

Here's the background:

I recorded an OTA TV movie. I made a cut at the beginning and at the end. That's all that I did.

I produced it using SVRT.

I can't complain. It takes only 1.5 minutes to produce a 1.5 hour movie.

But I was wondering what the bottleneck might be. Especially since I noticed that my CPU wasn't maxed out during the process.

So I had Task Manager and GPU-Z running while doing the render.

See the screenshot here.

My CPU is running at 68%. So that's not the bottleneck.

My Disk rate is at 41%. So that's not the bottleneck.

My GPU (a GTX-960) is hardly being used. So that's not the bottleneck.

I'm only using 26% of RAM. So capacity isn't the bottleneck.

I suppose that leaves memory speed? Either at L2, L3 or RAM? (Probably RAM, since, at 26% usage, I'm clearly writing past the capacities of L2, L3.)

I suppose I could try lowering the DRAM clock in my BIOS to see if that causes a commensurate decrease in performance.

But it's easier to ask the experts here.

So what'da'y think?

-------------

BTW, in the interest of full disclosure, I'm running PD15 in this test. I'm posting here anyway since 1) this post will get more eyes than it would being buried in "Older Versions." And 2), I have PD17 Essentials on another partition. And although I can't run SVRT in PD17 Essentials, I get the same speeds using HA in PD17 as with HA in PD15. So I think it's reasonable to post the test here.

This message was edited 3 times. Last update was at Dec 26. 2018 07:06

PepsiMan
Senior Contributor Location: Clarksville, TN Joined: Dec 29, 2010 01:20 Messages: 1054 Offline
belated merry Christmas.
Quote ... My CPU is running at 68%. So that's not the bottleneck. ...

it proves that CL did their home work and it's faster than previous versions of PD, >PD14.

Quote ... My Disk rate is at 41%. So that's not the bottleneck. ...

going from SATA 7200RPM to SATAII 7200RPM to SATAIII 7200RPM to 10000RPM HDD and cache from 64MB to 128MB will improve about 5% then jumps to about 10% using SSD.

Quote ... My GPU (a GTX-960) is hardly being used. So that's not the bottleneck. ...

which driver version r u using?

Quote ... I'm only using 26% of RAM. So capacity isn't the bottleneck.

I suppose that leaves memory speed? Either at L2, L3 or RAM? (Probably RAM, since, at 26% usage, I'm clearly writing past the capacities of L2, L3.)

I suppose I could try lowering the DRAM clock in my BIOS to see if that causes a commensurate decrease in performance. ...

yes, again depends on RAM speed you'll gain or lose about 5-10% of total rendering time. got money get DDR4 3000 or 3200MHZ... i9 supports 2666MHZ...

Quote ... So what'da'y think? ...

what about ur motherboard? it's a potential huge bottleneck. so is the OS, namely wX.

what about you?
me? i am the bottleneck! ^^

after you get the fastest computer in the world then the PD itself will become a bottleneck... how's that for an answer?

happy happy joy joy

PepsiMan
'garbage in garbage out'

This message was edited 1 time. Last update was at Dec 26. 2018 01:16

'no bridge too far'

Yashica Electro 8 LD-6 Super 8mm
Asrock TaiChi X470, AMD R7 2700X, W7P 64, MSI GTX1060 6GB, Corsair 16GB/RAM
Dell XPS L702X i7-2860QM, W7P / W10P 64, Intel HD3000/nVidia GT 550M 1GB, Micron 16GB/RAM
Samsung Galaxy Note3/NX1
pmikep
Senior Member Joined: Nov 26, 2016 22:51 Messages: 285 Offline
Funny answer that "I am the bottleneck."

It's true that we have these million horsepower computers that sit idle while we fiddle. We only need wide open throttle for a short time.

My RAM is old DDR3, running as fast as it can on an old AMD X4 mobo. So I can't speed up there, unless Ryzen.

My drive is SATA III RAID 1+0. For sequential writes - which I presume is what's happening when Producing - it benchmarks at SSD speeds.

I don't see a new Ryzen, mobo and DDR4 in my future. But it would be interesting to run my test on such a system. I suppose if speed of code execution were the bottleneck, then the elapsed time would be proportional to clock speed change.

(Assumes newer instruction sets don't make a difference. Which is probably a bad assumption.)
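A minimal sketch of that guess, reading "proportional" as elapsed time scaling inversely with clock speed and assuming the render is purely clock-bound (so it ignores instruction sets, core count and memory speed). The function name and the clock values are hypothetical, chosen only for illustration.

```python
def scaled_render_time(old_seconds: float, old_clock_ghz: float, new_clock_ghz: float) -> float:
    """Estimate render time after a clock change, assuming time scales as 1/clock."""
    if old_clock_ghz <= 0 or new_clock_ghz <= 0:
        raise ValueError("clock speeds must be positive")
    return old_seconds * old_clock_ghz / new_clock_ghz


# Hypothetical example: a 90 s render moved from a 3.0 GHz CPU to a 4.0 GHz CPU.
estimate = scaled_render_time(90.0, 3.0, 4.0)
print(f"estimated render time: {estimate:.1f} s")
```

If a measured time on the faster CPU came out well away from this estimate, that would suggest something other than raw clock speed is limiting the render.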

Oh - I'm running a June 2018 nVidia driver, a .39x something or other. The GPU maxes out when I do a render in Blender. And the GPU is used by PD when I do HA renders. (As opposed to SVRT.) So I'm pretty sure that it's working okay.

I plan to stay away from the .400 series nVidia drivers.

Like I said in my O.P., I can't complain about 1.5 minutes. Mine is more of an engineer's curiosity.
PepsiMan
Senior Contributor Location: Clarksville, TN Joined: Dec 29, 2010 01:20 Messages: 1054 Offline
Quote ... I don't see a new Ryzen, mobo and DDR4 in my future. But it would be interesting to run my test on such a system. I suppose if speed of code execution were the bottleneck, then the elapsed time would be proportional to clock speed change.
(Assumes newer instruction sets don't make a difference. Which is probably a bad assumption.)
Oh - I'm running a June 2018 nVidia driver, a .39x something or other. The GPU maxes out when I do a render in Blender. And the GPU is used by PD when I do HA renders. (As opposed to SVRT.) ...

since you have GTX 960, i'd like to point to an old one if you've missed it GTX960 Performance Comparisons .
follow the instructions and test yours for poops and giggles.
for me just going from nVidia driver 347.25 to 358.5 gave me a little boost and 2700x is in my crosshairs.
i'll be dual OS booting w7 n wX.

Quote ... (Assumes newer instruction sets don't make a difference. Which is probably a bad assumption.) ...

it does. same video card, GTX960, from H.264 to H.265 shrinks 1/2 rendering time! read Eugen157 remarks.
Is it worth buying a new GPU to reduce render times in PowerDirector?

happy happy joy joy

PepsiMan
'garbage in garbage out'

This message was edited 1 time. Last update was at Dec 26. 2018 09:23

pmikep
Senior Member Joined: Nov 26, 2016 22:51 Messages: 285 Offline
Thanks.

I meant the (newer) instruction sets in the newer CPUs.

My old AMD only goes up to SSE4A. I don't know what the newer ones are. But there's lots of room in CPU-Z for more and I presume that there are many new instruction sets that help optimize video editing. Hopefully CL is keeping up and is using them to speed things up.

Will be interesting to hear what you have to say about speed in PD after you upgrade to a 2700x.
optodata
Senior Contributor Location: California, USA Joined: Sep 16, 2011 16:04 Messages: 8630 Offline
Similar to PepsiMan's answers, IMHO the short answer to your question is: There is no bottleneck. No software is perfectly efficient, and in my experience, not all video projects push PD to use 100% resources to produce.

So while you may want PD to "push as hard as possible," the specifics of this particular project and how PD uses your system's resources simply won't do it. Different projects tax PD in different ways. This one moderately stressed the CPU before maxing it out after I took the screenshot:



This one pushed the GPU very hard in some places:



And this one maxed out my SSD during the entire producing run!:



My point is that every project is different, and there are many complex interacting elements in producing a video. Honestly, trying to speed things up on the user side is likely to be futile - unless you can clearly see one or more components maxing out (as in the above examples), and with your system that's not the case.
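To make the "clearly maxing out" test concrete, here is a minimal sketch that flags any component at or above a cutoff. The 90% threshold and the function name are assumptions for illustration, not anything PD or Task Manager defines; the readings are the ones reported in the first post.

```python
from typing import Dict, List


def find_saturated(utilization: Dict[str, float], threshold: float = 90.0) -> List[str]:
    """Return the names of resources at or above the threshold (percent), sorted."""
    return sorted(name for name, pct in utilization.items() if pct >= threshold)


# Readings reported in the first post; the GPU is left out because no figure was given.
readings = {"cpu": 68.0, "disk": 41.0, "ram": 26.0}
saturated = find_saturated(readings)
print(saturated if saturated else "no component is maxed out")
```

With those readings nothing crosses the cutoff, which matches the view that this particular project has no single obvious bottleneck.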

Another way to look at this is: How much time and effort might it take to track down a suspected bottleneck, or project tweak, or maybe find that there's a producing algorithm inefficiency?

Assuming that you found the answer and could actually do something about it, maybe you could get a 20% improvement. That's a pretty big deal, and you're then looking at finishing your 1.5 hour SVRT production in a minute and 15 seconds, instead of a minute and a half.
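As a rough check of that arithmetic, assuming "a 20% improvement" means 20% more throughput (a 20% cut in elapsed time would instead give 72 seconds), a minimal sketch with a hypothetical helper name:

```python
def time_after_speedup(seconds: float, speedup_pct: float) -> float:
    """Elapsed time after a throughput gain of speedup_pct percent."""
    if speedup_pct <= -100.0:
        raise ValueError("speedup must be greater than -100 percent")
    return seconds / (1.0 + speedup_pct / 100.0)


baseline = 90.0  # a minute and a half, in seconds
faster = time_after_speedup(baseline, 20.0)
minutes, secs = divmod(round(faster), 60)
print(f"{minutes} min {secs} s, saving about {baseline - faster:.0f} s")
```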

Kind of seems like tilting at windmills here

This message was edited 1 time. Last update was at Jul 12. 2019 12:38



YouTube/optodata


DS365 | Win11 Pro | Ryzen 9 3950X | RTX 4070 Ti | 32GB RAM | 10TB SSDs | 5K+4K HDR monitors

Canon Vixia GX10 (4K 60p) | HF G30 (HD 60p) | Yi Action+ 4K | 360Fly 4K 360°
pmikep
Senior Member Joined: Nov 26, 2016 22:51 Messages: 285 Offline
I agree that this is more of a curiosity than anything else. Especially for what I do with PD. This is why I think people stopped updating their computers a few years ago when we hit 4 cores and 3 GHz. Word processing and web surfing were fast enough and didn't need faster computers.

It's only for really big projects where someone might need a (much) faster computer.

Thanks for your screenshots.

Very interesting that a project of yours was able to flood your SSD. I might expect that for importing video. (Indeed, the task manager shows that this is a read operation.)

I don't see how PD could output video that fast.

This message was edited 1 time. Last update was at Dec 26. 2018 11:30

JL_JL
Senior Contributor Location: Arizona, USA Joined: Oct 01, 2006 20:01 Messages: 6091 Offline
Quote Very interesting that a project of yours was able to flood your SSD. I might expect that for importing video. (Indeed, the task manager shows that this is a read operation.)

Probably a somewhat artificial scenario for most PD users. More than likely he has demonstrated this with multiple intermediate codec MagicYUV clips in the timeline. Since they are ~10x the size because of no compression, I/O becomes a big demand for a given CPU load. I'm sure I'll be corrected if wrong, but PD with PD features alone can't do that; it takes a special scenario, and one can easily create a scenario to demonstrate about anything. I'd guess probably no more than a few percent of typical PD users are going the intermediate codec route, although it can aid editing significantly.

Jeff
pmikep
Senior Member Joined: Nov 26, 2016 22:51 Messages: 285 Offline
And, just for completeness, when I encode a video that's been resized or slightly rotated (that is, that needs to be encoded with CPU), my CPU generally does peg at 90% or above. So clearly in those cases, my CPU is a bottleneck.
optodata
Senior Contributor Location: California, USA Joined: Sep 16, 2011 16:04 Messages: 8630 Offline
Quote More than likely he has demonstrated this with multiple intermediate codec MagicYUV clips in the timeline. Since they are ~10x the size because of no compression, I/O becomes a big demand for a given CPU load

That's correct, and just to be clear, I wasn't trying to convey typical PD behavior.

I wanted to show 3 clear instances of actual bottlenecks, and also how the CPU and GPU might get maxed out in some sections of a producing run while not being taxed very hard in others.

This message was edited 1 time. Last update was at Jul 12. 2019 12:36

PepsiMan
Senior Contributor Location: Clarksville, TN Joined: Dec 29, 2010 01:20 Messages: 1054 Offline
i know nothing.
looking at optodata's keeeler machine's gpu usage, it reminds me of speed limit was in pennsylvania long ago.
crossing over from w. va to pa. a sign said speed limit in pa is still 55 mph! ^^

three years later PD still uses only <800MB vRam. optodata will be very happy if PD used up to 6GB vRam instead of 0.6GB whenever rendering...

oh happy happy joy joy

PepsiMan
'garbage in garbage out'

This message was edited 1 time. Last update was at Dec 26. 2018 20:19

JL_JL
Senior Contributor Location: Arizona, USA Joined: Oct 01, 2006 20:01 Messages: 6091 Offline
Quote three years later PD still uses only <800MB vRam. optodata will be very happy if PD used up to 6GB vRam whenever rendering...

Please explain why you think vRAM utilized should be large or nearing 6GB?

I see it quite the opposite: the small vRAM footprint is actually a good thing. There is absolutely no need for any significant amount of vRAM usage. Encoding is currently a serial process (read > decode > encode > write), and nothing in that requires high sustained vRAM. One is typically dealing with a video stream of less than 10 MB/sec, so even encoding at 10x real-time requires no significant continuous vRAM usage; nothing significant needs to be retained in vRAM for later.
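To put rough numbers on that, a minimal sketch that converts a stream bitrate into the data rate moving through the pipeline at a given encode speed. The function name is hypothetical; 80 Mbps is used only because it equals the 10 MB/sec upper bound quoted above, and 16 Mbps is a typical 1080 FHD bitrate taken as an assumption.

```python
def stream_rate_mb_per_s(bitrate_mbps: float, realtime_multiple: float = 1.0) -> float:
    """Data rate in megabytes/second for a stream of bitrate_mbps megabits/second
    processed at realtime_multiple times playback speed."""
    if bitrate_mbps < 0 or realtime_multiple < 0:
        raise ValueError("bitrate and realtime multiple must be non-negative")
    return bitrate_mbps / 8.0 * realtime_multiple


# 80 Mbps equals the 10 MB/sec upper bound; 10x is the quoted encode speed.
print(f"{stream_rate_mb_per_s(80.0, 10.0):.0f} MB/s at 10x real-time")
# Assumed 16 Mbps 1080 FHD stream at playback speed.
print(f"{stream_rate_mb_per_s(16.0):.0f} MB/s at 1x real-time")
```

Even the upper-bound case moves only about a hundred megabytes of compressed data per second, which is small next to several gigabytes of vRAM. This counts compressed stream data only, not decoded frames or effect buffers.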

The only things that will require more vRAM in this process are effects done to the individual frames, like AI Style or Fx, or the like, when these are done by the GPU. The attached pic shows 2.8GB vRAM usage vs optodata’s 0.8GB, 3.5x as much. As I indicated, one can create a test to show about anything. optodata's example was probably a pure transcode, mine a PD timeline built for vRAM load. One should actually be happy it's not bloatware with massive, leak-like vRAM usage for typical user timeline content.

Basically, one needs as much vRAM, or for that matter RAM, as required for the editor's timeline content, having an abundance more serves no purpose.

Jeff
[Attachment - PD17_GPU_Memory.PNG]
pmikep
Senior Member Joined: Nov 26, 2016 22:51 Messages: 285 Offline
I just now tried increasing the Priority of PowerDirector (15 in my case) to High.

I had tried that before in Win7. But it didn't make any difference. But when I tried it in Win8.1, my CPU usage jumped from about 70% to 90%.

I usually don't have anything running in the background when I'm rendering. So I don't understand why changing Priority would make a difference.

But it did.

(The only other thing I changed since my last post was that I updated Comodo Firewall.)
pmikep
Senior Member Joined: Nov 26, 2016 22:51 Messages: 285 Offline
Last night I upgraded my CPU from a 4 core AMD Phenom II X4 to an AMD FX-8350 8 core. (No hyperthreading, so 8T.) Am still using my old AM3+ mobo.

(As an aside, I was thinking about doing a total upgrade to a Ryzen 3500, now that the third generation is out. But that would have cost a lot more money (new mobo, new DDR4, and Win10 for the scheduler) and a lot more time (total reinstall of a new OS and all my old programs). And, from what I've been able to find on the net, it would have been only 2x faster than the gain I just got by going to the FX-8350. For $100, I think I got a good bang for the buck. The rest of the money is probably better spent getting a better GPU next.)

So naturally one of the first things I did was to produce a video in PD.

I took a short clip and purposely rotated it, so that it would be CPU intensive for rendering.

At first I was going to report here that PD (15) wasn't using all the cores to their fullest. For example, I was seeing all 8 cores used, but only about 50% each.

But then I remembered that I had Hardware Video Encoder enabled. So I turned that off. And then PD used all 8 cores, all running 100%.

Am glad to see that.

So now as I come back to the bottleneck question, I wonder if I am constrained by BW between my GPU and CPU? Maybe I need to get a 470 mobo before investing in a faster GPU?
pmikep
Senior Member Joined: Nov 26, 2016 22:51 Messages: 285 Offline
Credit where credit due: I see that @SoNic67 has already discussed this, and has already suggested that RAM BW and/or PCI BW are new bottlenecks. See his post dated Dec 2, 2018. https://forum.cyberlink.com/forum/posts/list/77870.page
PepsiMan
Senior Contributor Location: Clarksville, TN Joined: Dec 29, 2010 01:20 Messages: 1054 Offline
Quote ...
Will be interesting to hear what you have to say about speed in PD after you upgrade to a 2700x.

howzit?
yes, i did total upgrade to ryzen 7 2700x and it's fast.

old setup: Asrock 970 Extreme4, FX-8370E successfully overclocked to 4.3GHZ(all 8 cores) @ 1.2875v, DDR3 1866 16GB, DeePCooL GAMMAXX 400 CPU Cooler, MSI GTX1060 6GB.

new setup: Asrock Taichi X470, Ryzen 7 2700X successfully overclocked to 4.0GHZ(all 8cores 16T) @ 1.2875v, DDR4 3200 16GB, DeePCooL GAMMAXX 400 CPU Cooler, MSI GTX1060 6GB.

yup, same voltage! HA is good with nVidia driver 411.70

oh, happy happy joy joy. really. ^^

PepsiMan
'garbage in garbage out'
[Attachment - 20190812_094443.jpg: default rendering 1080 FHD 18m 20s .MTS to .MP4 in 16m 35s]

This message was edited 1 time. Last update was at Aug 12. 2019 17:34

JL_JL
Senior Contributor Location: Arizona, USA Joined: Oct 01, 2006 20:01 Messages: 6091 Offline
Quote default rendering 1080 FHD 18m 20s .MTS to .MP4 in 16m 35s

As long as you're a happy camper that's what counts.

But:
a) 1.10X realtime encoding is very poor for basic 1080 FHD at your 16Mbps bitrate and very poor performance for a GTX1060 in general. 2-3X realtime probably more the norm.
b) 20% GPU VE load is pretty poor
c) 7 parked CMT's with only 20% GPU VE load, something appears amiss. CPU power should not be needed, maybe that explains the park. Does a CPU encode unpark and fully utilize CMT's?
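The realtime multiple in (a) can be checked from the figures in the attachment description. A minimal sketch, with hypothetical helper names; the second example uses the 1.5 hour movie produced in 1.5 minutes with SVRT from the first post.

```python
def to_seconds(minutes: int, seconds: int = 0) -> int:
    """Convert a minutes/seconds pair to total seconds."""
    return minutes * 60 + seconds


def realtime_multiple(clip_seconds: float, render_seconds: float) -> float:
    """How many seconds of video are produced per second of render time."""
    if render_seconds <= 0:
        raise ValueError("render time must be positive")
    return clip_seconds / render_seconds


# 18m 20s clip rendered in 16m 35s (from the attachment description).
hardware_encode = realtime_multiple(to_seconds(18, 20), to_seconds(16, 35))
# 1.5 hour movie produced in 1.5 minutes with SVRT (from the first post).
svrt = realtime_multiple(to_seconds(90), to_seconds(1, 30))
print(f"render above: about {hardware_encode:.1f}x realtime; SVRT case: {svrt:.0f}x realtime")
```

The gap between roughly 1.1x and 60x is expected, since SVRT mostly copies the stream rather than re-encoding it. The 2-3X figure is the more relevant yardstick for a full encode.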

However, a lot of these observations depend heavily on what your source really is, beyond being 1080 FHD, and on whether this was a simple transcode to MP4 for a test or a more complex timeline.

But, as I said, if you're happy, all is good; everything is in the eye of the beholder.

Jeff

This message was edited 1 time. Last update was at Aug 12. 2019 19:56

tomasc
Senior Contributor Joined: Aug 25, 2011 12:33 Messages: 6464 Offline
Thank you for sharing the information on your new setup. Assuming that it takes 16 min. to render an 18 min. 1080p60 file, you are using CPU rather than GPU encoding. In those 16 min. I believe that your CPU temps are reaching 80 degrees Celsius. That can be confirmed with HWMonitor from CPUID.

Setup is ready for 4k 60p editing... Nice setup with windows 7.

Would like to know what is your produce time using the GTX 1060 for this same project.
PepsiMan
Senior Contributor Location: Clarksville, TN Joined: Dec 29, 2010 01:20 Messages: 1054 Offline
i've updated GTX960 Performance Comparisons we did while back. Click here to access the data sheet.

and attached PD14 in action.

happy happy joy joy

PepsiMan
'garbage in garbage out'

p.s.
<60° C
[Attachment - 20190812_211422.mp4: kite.wmv to 4K MKV]
[Attachment - 20190812_094443.jpg: never >80° C, <60° C]

This message was edited 2 times. Last update was at Aug 12. 2019 22:44

pmikep
Senior Member Joined: Nov 26, 2016 22:51 Messages: 285 Offline
I'm confused by Pepsiman's results. Is this the 10x Kite benchmark that we're talking about?

If so, I ran it on my newly upgraded 8 core AMD-8350, which is similar to what he was running before his Ryzen.

I get a time of about a minute and a half, on par with others with the GTX-960.

But I'm using a 398 driver (pre-415) on PD 15.

I see that he is using a pre-415 driver too. But I'm wondering if there is an issue between his 1060 and PD-14? Because, in addition to the long time to render, I notice his GPU load is very small. As if his GPU isn't being used to render.

Am I missing something?