PowerDirector performance problems with AMD CPU + nVidia video card
Julien Pierre (Contributor)
Some people here might be interested in this, especially if they have an AMD CPU and nVidia video card.

http://blog.madbrain.com/2012/03/powerdirector-9-performance-problem-on.html
JL_JL (Senior Contributor)
Somewhat interesting, but you don't give enough detail to really understand what might be going on. Here is my take on your results.

To me, PD offers several decoding/encoding options, and what works best is very platform dependent from both a speed and a quality perspective. CL provides very few details in the docs on what the different settings do, so one needs to use GPU monitors, CPU monitors, and the like to reverse engineer what the settings are doing. The commentary below is for your single SVRT-compatible clip in the timeline with NO editing and your results table in the link (the column-to-setting mapping is also summarized in the sketch after the list).

column 1, SVRT is used so performance is really driven by platform I/O capability, columns 4, 7, 10 same settings
column 2, GPU is used to decode video stream and GPU is used to encode video stream, so GPU/GPU. column 5 same settings
column 3, GPU is used to decode video stream and CPU is used to encode video stream, so GPU/CPU. column 6 same settings
column 8, CPU is used to decode video stream and GPU is used to encode video stream, so CPU/GPU. column 11 same settings
column 9, CPU is used to decode video stream and CPU is used to encode video stream, so CPU/CPU. column 12 same settings
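If it helps keep that mapping straight, here it is as a small lookup table (just a sketch in Python; the column numbers are the ones from the results table on the linked blog page):

# Decode/encode settings behind each column of the blog's results table (sketch only)
COLUMN_SETTINGS = {
    1: ("SVRT", "SVRT"),   4: ("SVRT", "SVRT"),   7: ("SVRT", "SVRT"),  10: ("SVRT", "SVRT"),
    2: ("GPU decode", "GPU encode"),   5: ("GPU decode", "GPU encode"),
    3: ("GPU decode", "CPU encode"),   6: ("GPU decode", "CPU encode"),
    8: ("CPU decode", "GPU encode"),  11: ("CPU decode", "GPU encode"),
    9: ("CPU decode", "CPU encode"),  12: ("CPU decode", "CPU encode"),
}

for col in sorted(COLUMN_SETTINGS):
    decode, encode = COLUMN_SETTINGS[col]
    print("column %2d: %s / %s" % (col, decode, encode))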

You should get the same results for columns with the same settings, provided no other Windows process played a role. What you don't provide is the setting of the preview window and whether you indeed had it the same between the non-SVRT tests.

On my system, the worst performance is the CPU/CPU configuration with the preview window on; one is asking the CPU to essentially do it all. For a GPU/XXX configuration, the effect of the preview window is minor, as the GPU is already doing the decoding anyway. Quality is also an issue; currently, for many users, CPU encoding provides better quality than GPU encoding. SVRT is fast, but depending on what transitions/editing one has done, quality (jumps/jerks) can be an issue. Obviously, if/when they get the SVRT route working correctly, it could offer excellent results for quality and speed.

Jeff
Julien Pierre (Contributor)
Jeff,

Quote:
To me, PD offers several decoding/encoding options, and what works best is very platform dependent from both a speed and a quality perspective. CL provides very few details in the docs on what the different settings do, so one needs to use GPU monitors, CPU monitors, and the like to reverse engineer what the settings are doing.


Right. I really wish that they would provide more info.

Ideally, I think PD should run some sort of benchmark on each machine when it's installed, sort of like what Windows does for its "scoring" system. It could prompt users to rerun the benchmark when there is a relevant hardware change.
Then users might not have to worry so much about the settings.
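Something along these lines is all I have in mind - a little harness that times a render under each setting and reports the fastest. This is purely an illustrative Python sketch; PD has no scripting interface that I'm aware of, so the render callables are stand-ins for manually starting an export with a given setting:

import time

def time_render(render, label, runs=1):
    # Time a render callable and keep the best wall-clock time, in seconds.
    best = None
    for _ in range(runs):
        start = time.perf_counter()
        render()  # stand-in for manually starting one export with a given setting
        elapsed = time.perf_counter() - start
        best = elapsed if best is None else min(best, elapsed)
    print("%-30s %.1f s" % (label, best))
    return best

def pick_fastest(candidates):
    # candidates: list of (label, render_callable); returns the label of the fastest one.
    results = [(time_render(render, label), label) for label, render in candidates]
    return min(results)[1]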


The commentary below is for your single SVRT-compatible clip in the timeline with NO editing and your results table in the link.


To clarify, there is no editing done whatsoever in any of the tests. It is always the same project. I just change the target rendering options and program preferences for the individual tests.


column 1, SVRT is used so performance is really driven by platform I/O capability, columns 4, 7, 10 same settings


Right, that's my takeaway too. CUDA/hardware decoding/hardware encoding become irrelevant for those cases, which is great.


column 2, GPU is used to decode video stream and GPU is used to encode video stream, so GPU/GPU. column 5 same settings


Well, that's the theory... But as I observed, I don't think the hardware encoder is getting used at all on the AMD box.
Compare the results from columns 2 and 3 on the AMD - they are identical, both 145 seconds.

Same for columns 5 and 6, 145 seconds again. It would have to be an enormous coincidence for the hardware encoder to take the exact same amount of time as the software encoder.

So, my guess is that the hardware encoder is actually off for columns 2, 3, 5, 6 on the AMD - but it is on for the Intel for columns 2 and 5.
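One way to check that guess, along the lines of Jeff's "use GPU monitors" suggestion, would be to watch GPU load while an export runs (a rough Python sketch; it assumes nvidia-smi is on the PATH, and cards/drivers as old as mine may not report these counters at all):

import subprocess
import time

def poll_gpu(seconds=60, interval=1.0):
    # Print GPU core and memory utilization once per interval while an export runs.
    query = ["nvidia-smi",
             "--query-gpu=utilization.gpu,utilization.memory",
             "--format=csv,noheader"]
    end = time.time() + seconds
    while time.time() < end:
        print(subprocess.check_output(query).decode().strip())
        time.sleep(interval)

# Start the export in PD, then run poll_gpu() from a console: if GPU utilization
# stays near 0% for the whole render, the hardware encoder most likely never engaged.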


column 3, GPU is used to decode video stream and CPU is used to encode video stream, so GPU/CPU. column 6 same settings


Yes, the only difference between columns 3 and 6 is CUDA, but that has no effect in any of my tests. Maybe because I don't use effects or live preview.


column 8, CPU is used to decode video stream and GPU is used to encode video stream, so CPU/GPU. column 11 same settings


Right, once again the only difference between columns 8 and 11 is CUDA / no CUDA, and it made no difference.


column 9, CPU is used to decode video stream and CPU is used to encode video stream, so CPU/CPU. column 12 same settings


Yes, again, 9 and 12 are identical parameters except for CUDA.

The times are slightly different, but since those tests are long, I didn't rerun them multiple times to figure out whether the difference was statistically significant - my guess is that it was not.
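If I ever do rerun them, my plan would be something simple like this (a sketch; the times fed in would come from repeated renders of the same project, and the example values are hypothetical):

from statistics import mean, stdev

def looks_significant(times_a, times_b):
    # Crude test: the gap between two settings should exceed twice the run-to-run spread.
    gap = abs(mean(times_a) - mean(times_b))
    noise = max(stdev(times_a), stdev(times_b))
    return gap > 2 * noise

# e.g. looks_significant([612, 618, 615], [604, 609, 607]) with hypothetical times in seconds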


You should get the same results for columns with the same settings, provided no other Windows process played a role. What you don't provide is the setting of the preview window and whether you indeed had it the same between the non-SVRT tests.


I had OpenOffice running on the Intel box just to record the data. No other apps were running on either box.
Power settings were set to "high performance" on both systems.
I was not using the preview in PowerDirector.


SVRT is fast, but depending on what transitions/editing one has done, quality (jumps/jerks) can be an issue. Obviously, if/when they get the SVRT route working correctly, it could offer excellent results for quality and speed.


I haven't seen problems with SVRT, but I have only done very basic things with it.
I have seen cases where just changing the audio triggered a new encode when it should not have.
But I just did a simple test with one audio track, muting the audio from the video footage, and that problem did not occur: SVRT worked at full speed without re-encoding the video, as I expected.
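For what it's worth, one way to convince yourself that SVRT really passed the video through untouched would be to compare the video stream of the source and the output (a sketch that assumes ffprobe from FFmpeg is installed; nothing PD-specific about it):

import json
import subprocess

def video_stream_info(path):
    # Return codec, resolution and bitrate of the first video stream, via ffprobe.
    out = subprocess.check_output([
        "ffprobe", "-v", "error", "-select_streams", "v:0",
        "-show_entries", "stream=codec_name,width,height,bit_rate",
        "-of", "json", path])
    return json.loads(out.decode())["streams"][0]

# If video_stream_info("source.m2ts") and video_stream_info("output.m2ts") match,
# and the render finished far faster than realtime, the video was not re-encoded.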
JL_JL (Senior Contributor)
Quote: Well, that's the theory... But as I observed, I don't think the hardware encoder is getting used at all on the AMD box.
Compare the results from columns 2 and 3 on the AMD - they are identical, both 145 seconds.

Same for columns 5 and 6, 145 seconds again. It would have to be an enormous coincidence for the hardware encoder to take the exact same amount of time as the software encoder.

So, my guess is that the hardware encoder is actually off for columns 2, 3, 5, 6 on the AMD - but it is on for the Intel for columns 2 and 5.

I don't think I'll convince you, so test and use what works for you and your platform. For your columns 2 and 3, here are my results with an AMD 1090T and a GTX580 GPU: column 2, 5.22 min; column 3, 13.58 min; the footage was 12 min of 24Mbps 1920x1080. The GTX580 is a very capable GPU, nearly the best available today; the 1090T has seen its day and is now a mid-level CPU.
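To put those numbers in throughput terms, here is the arithmetic (a tiny Python sketch, nothing more than the figures above):

footage_min = 12.0    # length of the test clip (24Mbps 1920x1080)
gpu_gpu_min = 5.22    # column 2: GPU decode / GPU encode
gpu_cpu_min = 13.58   # column 3: GPU decode / CPU encode

print("GPU encode: %.1fx realtime" % (footage_min / gpu_gpu_min))        # about 2.3x
print("CPU encode: %.1fx realtime" % (footage_min / gpu_cpu_min))        # about 0.9x
print("hardware-encode speedup: %.1fx" % (gpu_cpu_min / gpu_gpu_min))    # about 2.6x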

Jeff
Julien Pierre (Contributor)
Jeff,

Quote:
I don't think I'll convince you, so test and use what works for you and your platform. For your columns 2 and 3, here are my results with an AMD 1090T and a GTX580 GPU: column 2, 5.22 min; column 3, 13.58 min; the footage was 12 min of 24Mbps 1920x1080. The GTX580 is a very capable GPU, nearly the best available today; the 1090T has seen its day and is now a mid-level CPU.


Are you using Win 7 64-bit as well? And are your nVidia drivers also 296.10?

It certainly sounds like the accelerator is being used on your system for column 2. Just not on mine, unfortunately.
Would you mind testing columns 8 and 9 too? These are the same as 2 & 3, but with hardware decoding turned off.

I just want to see whether you get a decrease in performance, as one would expect, or a massive increase, like I actually experienced going from 2 -> 8, and to a lesser extent from 3 -> 9.


I'm certainly considering some hardware upgrades, but I have to figure out which are worthwhile.

A new GPU is a consideration. The GTX 580 is one option. The GTX 680 is also supposed to be available by Wednesday at my local Micro Center in Santa Clara for $499 (EVGA card, 2GB). I'm not sure it's worth taking a chance on the brand new, untested chip; it probably won't work right out of the box, and I don't particularly like being the guinea pig. On the other hand, maybe the prices for the GTX 580 will drop and I will be able to buy one for a decent price.

A new CPU is another consideration: I could go to the Phenom II X6 1100T. Fry's last one is an open box for $189. My 1055T won't overclock at all in my system. Unfortunately, my GA-890GPA-UD3H rev 2.1 motherboard is only AM3, not AM3+ like the rev 3.0/3.1, and thus it cannot take the newer FX-8120 or FX-8150 chips. If the chip were $100 or less, I would jump on it, but at $189 for 18% more clock speed, I hesitate.

There is a deal at Micro Center that runs only until tomorrow where I can get an FX-8120 + GA-990FXA-UD3 motherboard for $250; the deal only applies to that combo. Except I don't care for the time it takes to swap motherboards myself; it's not my idea of how to spend my Sunday evening. That, and all the probable OS/software reinstalls. As far as I know the FX socket is dead; there won't be any other chip beyond the FX-8150. There are more PCI-E slots on the UD3 motherboard, but they are still PCI-E 2.0 only. Even at this price I'm not sure it's worth the pain.

JL_JL (Senior Contributor)
Column 8, 5.82 min
Column 9, 14.20 min
All of my test times are in fractional minutes, not min:sec.

At the time of my PD9 testing I was using 285.62. I am currently on 296.10, but have not completed the results for PD9 with the newer driver. I have pretty much moved on to PD10 and rarely ever use PD9 anymore, although from a decoding/encoding perspective, in the timed tests I typically run, I have not seen a significant difference on my platform between PD9 and PD10.

I won't get into a raw how-to-configure-a-PC question here; this is a PD forum and I tend to limit the discussion to PD characteristics with hardware, a fine line, and various posts have been locked over the years for straying. For me, I only use CPU encoding with PD for any H.264 final piece of work; for drafts I always use SVRT or GPU because on my current platform it's fast. As mentioned earlier, I use CPU because of a few subtle quality problems that tend to appear now and then when the clip has edit-type features. The quality issues are very subjective; they bother some users while other users are just happy, so it's really what works for you and your distribution media (DVD, BD, YT...) and playback media (PC, media player, large screen TV...).

Jeff
Julien Pierre (Contributor)
Jeff,

Quote: Column 8, 5.82 min
Column 9, 14.20 min
All of my test times are in fractional minutes, not min:sec.


Thanks! You are definitely getting the expected results, lucky you.


I won't get into a raw how-to-configure-a-PC question here; this is a PD forum and I tend to limit the discussion to PD characteristics with hardware, a fine line, and various posts have been locked over the years for straying. For me, I only use CPU encoding with PD for any H.264 final piece of work; for drafts I always use SVRT or GPU because on my current platform it's fast. As mentioned earlier, I use CPU because of a few subtle quality problems that tend to appear now and then when the clip has edit-type features. The quality issues are very subjective; they bother some users while other users are just happy, so it's really what works for you and your distribution media (DVD, BD, YT...) and playback media (PC, media player, large screen TV...).


Good to know. I am not a pro, and I have not seen the quality issues with GPU encoding yet.
I have seen crasher bugs in the PD10 trial which prevent me from upgrading at this time.

From the hardware side, it looks like my options are very limited. My current case won't accept a card as long as the GTX 580; the 9800GT I have is about 8.5" long, and that's the max that will fit. A GTX 460 is probably the only GPU upgrade option if I only switch the GPU. My GA-890GPA-UD3H motherboard won't do SLI, so it would have to be just one nVidia GPU. Crossfire would work, but I would have to investigate the ATI cards, and I know nothing about them.
If I'm going to need a new case to fit a long GPU, I may as well upgrade the motherboard/CPU as well. This gets expensive. Sigh.
JL_JL (Senior Contributor)
I have no knowledge that PD supports Crossfire or SLI GPUs, so I'd proceed with caution before you drop $$$ into a system like that. By support I mean a significant GPU render advantage; I'm sure PD will function. This may be of interest to you: http://forum.cyberlink.com/forum/posts/list/14356.page

Jeff
Julien Pierre (Contributor)
Jeff,

Quote: I have no knowledge that PD supports Crossfire or SLI GPUs, so I'd proceed with caution before you drop $$$ into a system like that. By support I mean a significant GPU render advantage; I'm sure PD will function. This may be of interest to you: http://forum.cyberlink.com/forum/posts/list/14356.page

Jeff


Yeah, I am definitely not doing it without some hard data. I am going to try moving both of my 9800GT cards into the one system and see whether I get any encoding speedup. The motherboard doesn't support SLI, but that may not be required to use both GPUs at once. The two video cards would drop down to PCI-E 2.0 x8. I am not sure that bandwidth is the limiting factor with those GPUs, though; I think the processors/cores are.
I may also try the crappy ATI chip built into the GA-890GX-UD3H mobo just to see what kind of encoding performance it gets. That chip won't drive my 2560x1600 monitor, though, but I can still use the other monitor at 1920x1200.
Julien Pierre (Contributor)
Just finished those tests.

With the motherboard's built-in ATI HD 4290, the hardware encoder was not available in PD9, even with the latest Catalyst 12.3 and the "acceleration" setting enabled in the Catalyst manager. I was pleasantly surprised that it could drive the 30" at 2560x1600 after all, with HDCP even.

I also tried the two XFX nVidia 9800GT GPUs together. The times did not change at all for columns 8 and 11 - the best time remains 40s. That's conclusive evidence that PD9 does not use both nVidia cards.

However, for columns 2 and 5, time went down from 145s to 52s, inexplicably.

When I went back to the one GPU, in the same PCI-E slot, time for column 2 was still 52s.

Something changed for the better as a result of uninstalling the nVidia drivers, enabling the ATI in the BIOS, installing the ATI drivers and then reinstalling the nVidia drivers.

There is no way to uninstall the ATI drivers now to revert to the previous configuration where the 145s time was showing.

The new 52s for column 2 is now much faster than column 3 at 145s, so that's evidence that the GPU encoder is finally being used in column 2.

Still, 52s for column 2 is longer than 40s for column 8, so disabling hardware decoding still provides a benefit on my system. Perhaps the speedup from 2 -> 8 is explained by the GPU being spread too thin doing both the encoding and decoding tasks. Doing software decode and hardware encode is still the fastest on my system.

Another thing: while my two XFX 9800GT 512MB cards are the same model, they are not the same revision; one takes a PCI-E power input and the other doesn't. They had exactly the same performance, though, when I tried them separately in the PCI-E x16 slot.

I also tried only one GPU in the PCI-E x8 slot, and nothing in the x16 slot.

There was only a minimal slowdown for the best time in column 8, from 40s to 41s even after many iterations.

For column 2, there was a slightly more noticeable difference, 54s in the x8 slot vs 52s in the x16 slot.

This would match well with the theory that the GPU is now really doing both decodes and encodes, and thus needs more bandwidth.
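Spelled out as percentages (a quick Python sketch doing nothing but the arithmetic on the times above):

times_x16 = {"column 2 (GPU decode + encode)": 52.0, "column 8 (GPU encode only)": 40.0}
times_x8  = {"column 2 (GPU decode + encode)": 54.0, "column 8 (GPU encode only)": 41.0}

for col in times_x16:
    slowdown = (times_x8[col] - times_x16[col]) / times_x16[col] * 100
    print("%s: %.1f%% slower in the x8 slot" % (col, slowdown))
# column 2: about 3.8% slower; column 8: about 2.5% slower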

I may update my table in the blog to make more sense of all this...

It still doesn't tell me what GPU to get, but at least I know not to get two nVidia cards.

I did not test PD10 at all.


Julien Pierre (Contributor)
I updated the blog page at
http://blog.madbrain.com/2012/03/powerdirector-9-performance-problem-on.html

Looks like the main root of the performance problem was the order of installing the drivers - nVidia and ATI.

ATI Catalyst drivers have to be installed for the motherboard chipset.

When the ATI Catalyst drivers are installed last, the problem of hardware encoding not being used when hardware decoding is also enabled goes away. Very strange.

I bought a 560 Ti tonight at Micro Center. I'm somewhat disappointed.

The best rendering time for this test was 40s with the 9800GT, and now it is 31s. Only 22.5% shorter.

The PD10 trial is still much slower than PD9, though; I didn't post any numbers. I wonder if it's due to the extra cost of encoding the watermark onto the video.