CyberLink Community Forum
where the experts meet
PD12 efficiency with MultiCore-CPU & Nvidia CUDA...
HPmanic
Newbie Location: U.S. Joined: Nov 02, 2013 19:30 Messages: 7

Howdy!

I've been with PD10 for some time, and have now upgraded to PD12 (so far, it seems pretty good). This is my first post, so (please) be gentle...

Currently running an HP Z800 workstation with twin Xeon X5660s (2 NUMA nodes), 24GB of ECC/Buffered RAM and an Nvidia FX3800, with the freshest drivers I could find during the last 48hrs.

Some preliminary tests:

1. CPU-based encoding of [DV-AVI (25Mbps)] => [H.264 (AVCHD) pcm audio] takes 4.5-6.0 mins. of real time per 60 mins. of footage.
2. GPU-based encoding of [DV-AVI (25Mbps)] => [H.264 (AVCHD) pcm audio] takes 6.0-7.5 mins. of real time per 60 mins. of footage.
3. CPU-based encoding of [1080i QAM capture] => [H.264 (AVCHD) Dolby5.1] takes ~30 mins. of real time per 60 mins. of capture.
4. GPU-based encoding of [1080i QAM capture] => [H.264 (AVCHD) Dolby5.1] takes ~35 mins. of real time per 60 mins. of capture.
(still working on 1080i testing, so take #3 and #4 with a grain of salt).
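
To put these runs on one scale, the times can be converted to a speed relative to real time (minutes of footage divided by minutes of encoding). This is only a rough sketch: the function name `realtime_factor` is hypothetical, and the figures are simply the ones listed above.

```python
def realtime_factor(footage_min: float, encode_min: float) -> float:
    """Minutes of footage encoded per minute of wall-clock time."""
    if encode_min <= 0:
        raise ValueError("encode time must be positive")
    return footage_min / encode_min


# (label, footage minutes, fastest encode minutes, slowest encode minutes)
# Values are taken from the four preliminary tests above.
runs = [
    ("DV-AVI, CPU", 60, 4.5, 6.0),
    ("DV-AVI, GPU", 60, 6.0, 7.5),
    ("1080i, CPU", 60, 30.0, 30.0),
    ("1080i, GPU", 60, 35.0, 35.0),
]

for label, footage, fast, slow in runs:
    best = realtime_factor(footage, fast)
    worst = realtime_factor(footage, slow)
    print(f"{label}: {worst:.1f}x to {best:.1f}x real time")
```

A factor above 1.0 means the encode finishes faster than the footage would play back.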

Now, some questions:

1. PD12's CPU-based encoding above DOES NOT use all 24 logical cores. Only 12 seem active. Why? Is this a purposeful multi-core optimization? Is the FX3800 still being used for something, even when HW-encoding is unchecked in the Production mode / tab?

2. Will upgrading to a Quadro Kepler-4000 yield better H.264 performance (*CUDA*-based)? In other words, would increasing from 192 to 768 CUDA cores result in improved H.264 ENCODING performance? Or is no gain expected for this task?

3. When will Kepler-4000's NVENC HW-based tuned-encoder be supported in PD12?

MANY thanks, in advance, for your responses!

RobAC
Contributor Joined: Mar 09, 2013 18:20 Messages: 406
Hello and welcome.

[Edit to add:]

The Xeon X5660 is listed as a 6 Core / 12 thread processor. So your dual CPU setup = 12 Core / 24 Thread. (Unless I am missing something.) So do you mean Power Director 12 is using the 6+6= 12 Cores and not the 12+12 = 24 Threads ?
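
The core/thread arithmetic can be spelled out as a small sketch. The `CpuTopology` class is purely illustrative (not something from PowerDirector or Intel), and it assumes Hyper-Threading exposes two threads per core:

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CpuTopology:
    sockets: int
    cores_per_socket: int
    threads_per_core: int  # assumed 2 with Hyper-Threading enabled, 1 without

    @property
    def physical_cores(self) -> int:
        return self.sockets * self.cores_per_socket

    @property
    def logical_threads(self) -> int:
        return self.physical_cores * self.threads_per_core


# Dual Xeon X5660: 2 sockets, 6 cores each, Hyper-Threading on.
z800 = CpuTopology(sockets=2, cores_per_socket=6, threads_per_core=2)
print(f"{z800.physical_cores} cores / {z800.logical_threads} threads")
```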
-----------------------

I was thinking about building a workstation the other day to do heavy rendering work for my other software (in addition to using it for Power Director), so I am very interested in your and others' observations regarding dedicated server / workstation performance. I would definitely like to see whatever additional test times you have.

Thanks for posting.

Rob

This message was edited 1 time. Last update was at Nov 06. 2013 03:40

PD 14 Ultimate Suite / Win10 Pro x64
1. Gigabyte Brix PRO / i7-4770R Intel Iris Pro 5200 / 16 GB / 1 TB SSD
2. Lenovo X230T / 8GB / Intel HD4000 + ViDock 4 Plus & ASUS Nvidia 660 Ti / Link: https://www.youtube.com/watch?v=ZIZw3GPwKMo&feature=youtu.be
HPmanic
Newbie Location: U.S. Joined: Nov 02, 2013 19:30 Messages: 7

A quick update here:

I simply cannot find the info I need, so I ordered a Quadro K4000, which is expected to arrive pretty soon.

Will be installing, tuning and testing, and will come back with some results.

I CANNOT, however, yet determine why PD12 is only using 12 threads instead of the 24 available. It may be a purposeful multi-core optimization driven by the law of diminishing returns and other testing, for instance, but I have to say I do not see all cylinders firing.

Stay tuned!
JL_JL
Senior Contributor Location: Arizona, USA Joined: Oct 01, 2006 20:01 Messages: 6091
As Rob indicated, the Xeon X5660 is a 6-core processor. If you are seeing 24 CPU graphs in something like the Windows Task Manager with your dual-CPU Z800, you most likely have Hyper-Threading technology enabled in the BIOS. PD does take advantage of Hyper-Threading; you can also see it referenced in the specs under CPU: http://www.cyberlink.com/products/powerdirector-ultimate-suite/spec_en_US.html .

My experience is that Hyper-Threading can be beneficial for PD and I have seen no significant downside for PD. With Hyper-Threading enabled, memory requirements do increase some.

Below are a few extracts from a significant amount of testing to highlight a few characteristics.

Input, 1920x1080 24Mbps H.264 footage.

Hyper-Threading enabled: Output, 720x480 8Mbps H.264 footage, CPU encoded, CPU 45% avg on 8 CPUs, 80seconds
Hyper-Threading disabled: Output, 720x480 8Mbps H.264 footage, CPU encoded, CPU 75% avg on 4 CPUs, 79seconds

Hyper-Threading enabled: Output, 1920x1080 28Mbps H.264 footage, CPU encoded, CPU 100% avg on 8 CPUs, 230seconds
Hyper-Threading disabled: Output, 1920x1080 28Mbps H.264 footage, CPU encoded, CPU 100% avg on 4 CPUs, 285seconds

So getting full CPU usage is format dependent. In this testing, Hyper-Threading reduced encode time by anywhere from 0% to about 20%.
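
As a quick check on that range, the percentage can be worked out from the two pairs of timings listed above. A minimal sketch; `reduction_pct` is just an illustrative helper name:

```python
def reduction_pct(baseline_s: float, new_s: float) -> float:
    """Percent reduction in encode time going from baseline_s to new_s."""
    if baseline_s <= 0:
        raise ValueError("baseline must be positive")
    return (baseline_s - new_s) / baseline_s * 100.0


# (output format, seconds with Hyper-Threading disabled, seconds with it enabled)
tests = [
    ("720x480 8Mbps", 79, 80),
    ("1920x1080 28Mbps", 285, 230),
]

for name, ht_off, ht_on in tests:
    change = reduction_pct(ht_off, ht_on)
    print(f"{name}: {change:+.1f}% encode-time reduction with Hyper-Threading")
```

A negative value means the Hyper-Threading run was slightly slower, which is the case for the 720x480 output (80 vs. 79 seconds, effectively a wash).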

i7-3770
Win7 64, 16GB RAM

Jeff
HPmanic
Newbie Location: U.S. Joined: Nov 02, 2013 19:30 Messages: 7
Thanks!

In my case, only TWELVE (12) THREADS fire up, instead of the TWENTY-FOUR (24) available. This equates to 28%-30% CPU usage, at best, because the twelve threads that do fire are not saturated.
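
To make that concrete: if the other 12 threads are assumed to be idle (an assumption, since Task Manager only shows the aggregate), the average load on each active thread follows from the total. A rough sketch with a hypothetical helper name:

```python
def per_thread_load(total_cpu_pct: float, active_threads: int, total_threads: int) -> float:
    """Average load on each active thread, assuming all other threads sit idle."""
    if not 0 < active_threads <= total_threads:
        raise ValueError("active_threads must be between 1 and total_threads")
    return total_cpu_pct * total_threads / active_threads


# 28%-30% total CPU usage spread over 12 of 24 logical threads.
for total in (28, 30):
    load = per_thread_load(total, active_threads=12, total_threads=24)
    print(f"{total}% total -> about {load:.0f}% per active thread")
```

Under that assumption each active thread averages only about 56%-60%, so even the threads in use have headroom.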

In contrast, CineBench fires up all 24 available "cylinders", and it is quite a scene to watch those twin Xeons chewing through the benchmark, with both NUMA nodes reporting FULL steam (95% to 100% use).

Therefore, in my setup (with twin X5660s), PD12 is not using all available firepower. Now, this does not necessarily mean that it has to (some types of tasks require multi-core optimization, and sometimes using fewer cores or threads means better performance, as I have seen in other similar apps).

This is a question for PD12's development team to answer (maybe my processors are not actually supported?)
RobAC
Contributor Joined: Mar 09, 2013 18:20 Messages: 406
Yep, I would definitely ask CyberLink Tech Support about this one.
Let us know what they say, thanks.

Rob
HPmanic
Newbie Location: U.S. Joined: Nov 02, 2013 19:30 Messages: 7
UPDATE (and a bit of demystifying, as well):

Finally got my Nvidia (Kepler) 4000 locked-and-loaded. It installed, booted and configured in a breeze. I fired it up, tuned it a bit in Nvidia's Control Panel, and this is what I got:


1. CPU-based encoding of [DV-AVI (25Mbps)] => [H.264 (AVCHD) pcm audio]: 2-min. clip encoded in ~10 secs. (same)
2. GPU-based encoding of [DV-AVI (25Mbps)] => [H.264 (AVCHD) pcm audio]: 2-min. clip encoded in ~9 secs. (down from 13!)
3. CPU-based encoding of [1080i QAM capture] => [H.264 (AVCHD) Dolby5.1]: 2-min. clip encoded in ~40 secs. (down from 60 secs!)
4. GPU-based encoding of [1080i QAM capture] => [H.264 (AVCHD) Dolby5.1]: 2-min. clip encoded in ~39 secs. (down from 75 secs!)
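
The before/after figures can be expressed as speedup ratios (old time divided by new time). This is a sketch only: `speedup` is a hypothetical helper, and the ~10 secs. baseline for test 1 is inferred from the "(same)" note rather than stated outright.

```python
def speedup(old_s: float, new_s: float) -> float:
    """How many times faster the new encode is compared with the old one."""
    if old_s <= 0 or new_s <= 0:
        raise ValueError("times must be positive")
    return old_s / new_s


# (test, seconds with the FX3800, seconds with the K4000) for a 2-min. clip
results = [
    ("1. DV-AVI, CPU", 10, 10),  # baseline assumed equal, per "(same)"
    ("2. DV-AVI, GPU", 13, 9),
    ("3. 1080i, CPU", 60, 40),
    ("4. 1080i, GPU", 75, 39),
]

for test, before, after in results:
    print(f"{test}: {speedup(before, after):.2f}x faster")
```

By this measure the largest gain is on GPU-based 1080i encoding, which is roughly twice as fast with the new card.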

NOTES:

1. GPU-based encoding fires-up TWELVE (12) threads, with mid-low total CPU usage. (~35%)
2. CPU-based encoding fires-up TWENTY-FOUR (24) threads, with mid-high total CPU usage (~65%)
3. Cinebench GPU (OpenGL) scores virtually DOUBLED with the K4000.

BOTTOM-LINE:

1. The Nvidia Kepler-4000 gives quite a boost in performance.
2. The K4000 substantially *reduced* CUDA encoding times, for both SD and HD clips.
3. The K4000 also helped improve CPU-based encoding times (?). This was not expected, and it suggests that some work may still be done by the Nvidia card even when "HW-accel" is unchecked (e.g. the CPU keeps waiting along the line).
4. Based on thread usage and CPU load during CPU-based encoding, it seems that there is STILL room for improvement if the card were faster (the CPU is not yet fully used).
5. PowerDirector DOES NOT seem (yet) to be using the K4000's NVENC. I do wonder how much extra improvement could be attained.

Just sharing some interesting findings, and still room for improvement!

This message was edited 1 time. Last update was at Nov 13. 2013 22:11

NicolasNY
Senior Contributor Location: Caracas Joined: Sep 28, 2008 17:49 Messages: 805
Hello,

I have been a PD user for many years. I'm "trying" to set up my new workstation, but it's not easy to select the correct CPU. My options, in my price range, are:

i7-4770K (4 cores / 8 threads)
i7-4930K (6 cores / 12 threads)
or a dual 6-core Xeon.

I have found this topic very useful. HPmanic gives some information that the "moderators" should take a look at and comment on. I would also appreciate it if JEFF could give more numbers and rendering times for his i7-3770 setup, so they can be compared with the Xeon performance (like HPmanic's post). The i7-4930K sits in the middle of the options for my final CPU selection.

Nicolas