CyberLink Community Forum
where the experts meet
Multi GPU support question
Reply to this topic
Julien Pierre [Avatar]
Contributor Private Message Joined: Apr 14, 2011 01:34 Messages: 475 Offline
[Post New]
I just downloaded the trial for PD11 yesterday.
I would like to know more about the multi GPU support for encoding.

1) Is it limited to 2 GPUs, or will it help to add a 3rd or 4th if the motherboard is capable?
2) Which encoding formats are accelerated by multiple GPUs? Does it help with any other tasks?
3) Do the 2 GPUs have to be the same?
The combinations I have are one box with 2 x 9800 GT and another box with 1 x 560 Ti and 1 x 8600 GT.


MSI X99A Raider
Intel i7-5820k @ 4.4 GHz
32GB DDR4 RAM
Gigabyte nVidia GTX 960 4GB
480 GB Patriot Ignite SSD (boot)
2 x 480 GB Sandisk Ultra II SSD (striped)
6 x 1 TB Samsung 860 SSD (striped)

2 x LG 32UD59-B 32" 4K
Asus PB238 23" HD (portrait)
Reply
JL_JL [Avatar]
Senior Contributor Private Message Location: Arizona, USA Joined: Oct 01, 2006 20:01 Messages: 4691 Offline
[Post New]
Some discussion here that may help you. http://forum.cyberlink.com/forum/posts/list/24947.page

For the two combinations you suggested, my experience indicates you won't get any Multi-GPGPU benefit unless your MB supports an Intel Sandy/Ivy Bridge CPU; then you can combine its integrated GPU with one of your other VGA devices, like the 560 Ti.

From the little encoding work I have done, SLI with Nvidia is seen as one device by PD and is really no faster than a single GPU. One GPU (GTX 580) plus my Ivy Bridge CPU (i7-3770) can speed things up marginally for H.264 (all I've played with). I did not notice any significant help with other editing tasks, though I only use a single monitor.

I would like to evaluate further, but first I need to think of a better quality measurement (video, audio, effect transitions, periodic black frames, audio ticks in transitions...), as quality is not constant across encoding approaches, and that's what I care about more than speed. In the interim, plain old CPU encoding often yields the best quality for me, so the sales hype dies down some.

Jeff
Reply
Julien Pierre [Avatar]
Contributor Private Message Joined: Apr 14, 2011 01:34 Messages: 475 Offline
[Post New]
Hi,

Quote: Some discussion here that may help you. http://forum.cyberlink.com/forum/posts/list/24947.page

For the two combinations you suggested, my experience indicates you won't get any Multi-GPGPU benefit unless your MB supports an Intel Sandy/Ivy Bridge CPU; then you can combine its integrated GPU with one of your other VGA devices, like the 560 Ti.


Both PCs are AMD-based, not Intel, so of course they don't have anything Sandy Bridge whatsoever, or any kind of Intel graphics. I didn't see anything in the CyberLink literature about multi-GPU support being Intel-only.

One PC is running a Gigabyte 990FX-UD3 motherboard with an AMD FX-8120 Bulldozer 8-core CPU. That motherboard has no integrated graphics. It's the one with the two 9800 GTs. The motherboard supports SLI and I have the bridge connected.
However, SLI is actually counterproductive in my triple-monitor setup, so I have it disabled. I don't remember the exact issue, but I think it had to do with the maximum number of displays in SLI mode.

The other PC is running a Gigabyte 890GPA-UD3H motherboard with a Phenom II X6 1055T 6-core CPU.
That one has the 560 Ti and the 8600 GT.
The motherboard has integrated AMD graphics, but I have them disabled because the AMD and nVidia video drivers conflict in Windows, even if I don't attach anything to the motherboard video port.

I am running triple monitors: two 30" 2560x1600 (dual-link DVI) and one 24" 1920x1200 (single-link DVI). Both machines share the 3 monitors. The first two monitors have built-in switches, and the 3rd monitor is on a separate switch. So I need at least 3 ports per machine; this is the main reason for having multiple video cards. Sadly, on the second machine I was not able to use one nVidia GPU plus the motherboard GPU due to the driver conflict.


From the little encoding work I have done, SLI with Nvidia is seen as one device by PD and is really no faster than a single GPU. One GPU (GTX 580) plus my Ivy Bridge CPU (i7-3770) can speed things up marginally for H.264 (all I've played with). I did not notice any significant help with other editing tasks, though I only use a single monitor.


I didn't see anything in the CyberLink literature about SLI in relation to the TrueVelocity 3 multi-GPU acceleration, so I don't think one requires the other, just as I can run my nVidia cards fine as separate devices without SLI. I would think it's up to the CUDA and/or OpenCL layers to expose the hardware capabilities, and I don't think SLI or CrossFire is required.
Certainly, SLI or CrossFire could never work with an AMD or nVidia discrete GPU plus Intel motherboard graphics. So let's not confuse different issues, and let's forget about SLI/CrossFire.
I would like an official statement from Cyberlink about what multi GPU configurations are or aren't supposed to help.

I did some quick benchmarking yesterday and PD11 did seem quite a bit faster than PD10 using the hardware encoding for some cases. But I did not try to pull out one of the cards in either system to see if that was due to having the extra GPU.


I would like to evaluate further, but first I need to think of a better quality measurement (video, audio, effect transitions, periodic black frames, audio ticks in transitions...), as quality is not constant across encoding approaches, and that's what I care about more than speed. In the interim, plain old CPU encoding often yields the best quality for me, so the sales hype dies down some.


In some of my projects, both PD10 and PD11 require the GPU hardware encoder to be used, and CPU encoding is not even an option: if I uncheck the box for "fast video processing", I can't start the encoding. PD complains and forces it back on. I don't know why; it may be due to having files from 3 different cameras. With other projects, CPU encoding works OK.

Regardless, I would like to know which multiple GPU configurations are supposed to help which tasks.
Reply
JL_JL [Avatar]
Senior Contributor Private Message Location: Arizona, USA Joined: Oct 01, 2006 20:01 Messages: 4691 Offline
[Post New]
If you want an official statement from CL, I'd suggest you contact them: http://www.cyberlink.com/prog/support/cs/contact-support.jsp In lieu of that, the quoted text from the CyberLink Product Manager in the last comment of the link I attached does provide some insight.

If I find the time, maybe I'll write up the detailed results of the small test case I used to try to figure out PD11's Multi-GPGPU support, as it is not well documented, as you indicated. That post will surely create another uncivilized flame war. Anyhow, from my study, I think one needs at least the following:

1) A CPU that has an integrated GPU, like Sandy Bridge/Ivy Bridge
2) A MB/chipset configuration that supports the Intel Flexible Display Interface (FDI). Simplistically, FDI is an interconnect that lets the CPU's integrated HD Graphics GPU communicate with your display connectors.
3) A standalone GPU that supports OpenCL
4) A properly configured BIOS that sets up the interaction between the CPU's integrated GPU and the installed display GPU. If it is not configured correctly, you simply get either the integrated GPU or the installed GPU as your graphics device, not the Multi-GPGPU effects CL discusses for PD11.

With the above, I appear to get the Multi-GPGPU features CL discusses: my installed GTX 580 can see significant load even though in some test configurations I have NO display attached to it (my single display was attached to the integrated GPU). When the integrated GPU cannot handle the encoding load of video effects that carry the AMD/nVidia/Intel icon, the GTX 580 kicks in and assists. This is achieved through the FDI communication layer. I believe this is how, in the CL chart on the product page ( http://www.cyberlink.com/products/powerdirector-ultimate-suite/features_en_US.html ), the Intel HD 4000 + GTX 680 combination assists during an MPEG-2 encode, since the GTX 680 does not natively support MPEG-2 encoding through hardware acceleration.

Just my thoughts after playing with it on a dedicated box and doing some timed testing and monitoring, in an attempt to figure out what Multi-GPGPU in PD11 is doing and whether it might be right for my projects. Depending on your hardware, timelines, video formats, and outputs, your results could be very different.

Jeff

This message was edited 1 time. Last update was at Nov 20. 2012 18:15

Reply
Robert2 S
Senior Contributor Private Message Location: Australia Joined: Apr 22, 2009 05:57 Messages: 1461 Offline
[Post New]
On my system with dual Nvidia GTX 460s in SLI mode, PowerDirector 11 does not use both cards.

MSI Afterburner shows that only one of my video cards is being used when I produce my videos.

One last tip: I have found that when I upgrade my Nvidia video card drivers, I have to go into the Nvidia control panel and re-tick "Enable SLI".

Bottom line: even using one video card with CUDA doesn't save that much time, and with some videos it produces artefacts in the final output, so I don't use it at all. My youtube channel====> http://www.youtube.com/user/relate2?feature=mhsn
Reply
Michael8511
Contributor Private Message Location: U.S.A. Indiana Joined: Jan 14, 2012 16:12 Messages: 374 Offline
[Post New]
The way I take it is it will use Intel Quick Sync and the NVIDIA or ATI card's GPU at the same time. Intel i7-5960X overclocked to 4 GHz, 16 GB of RAM.
GoPro 4
Canon VIXIA HF G10
Canon EOS Rebel T3
Canon EOS 70D
My Vimeo Channel http://vimeo.com/user3339631/videos
Reply
JL_JL [Avatar]
Senior Contributor Private Message Location: Arizona, USA Joined: Oct 01, 2006 20:01 Messages: 4691 Offline
[Post New]
Quote: The way I take it is it will use Intel Quick Sync and the NVIDIA or ATI card's GPU at the same time.

Correct, it's a new feature supported in PD11, so I was just curious what it might offer.

Many threads have been posted about Quick Sync

http://forum.cyberlink.com/forum/posts/list/22732.page
http://forum.cyberlink.com/forum/posts/list/17981.page
http://forum.cyberlink.com/forum/posts/list/23135.page

as well as many others. However, until PD11, PD never really took advantage of both Quick Sync and the GPU for Multi-GPGPU encoding; it was one or the other for desktops.

Jeff
Reply
Julien Pierre [Avatar]
Contributor Private Message Joined: Apr 14, 2011 01:34 Messages: 475 Offline
[Post New]
Well, I just spent about 5 hours benchmarking PD 8, 9, 10, and 11 with 4 different clips on 2 different machines with 4 nVidia video cards.

The whole data set is at

https://docs.google.com/spreadsheet/ccc?key=0AleSSO_7gwqedExrY01oNXVGYWJLdVBwY2ZnZmZkdnc

Make sure to look at the 4 different sheets at the bottom (benchmark1, benchmark2, benchmark3, benchmark4).

The conclusions are very sad.

Having multiple nVidia GPUs actually significantly *reduces* rendering speed with all versions of PowerDirector.

If I physically pull one of the cards, or just disable the second card in Device Manager, the speed is much improved.

Some example data, with benchmark3, the clip that takes the longest to encode.

From the Phenom II X6 3.8 GHz box:

PD10 / single GPU, 560 Ti: 98 s
PD10 / dual GPU, 560 Ti + 8600 GT: 134 s

PD11 / single GPU, 560 Ti: 145 s
PD11 / dual GPU, 560 Ti + 8600 GT: 182 s

From the FX-8120 4.0 GHz box:

PD10 / single GPU, 9800 GT: 254 s
PD10 / dual GPU, 2 x 9800 GT: 543 s

PD11 / single GPU, 9800 GT: 260 s
PD11 / dual GPU, 2 x 9800 GT: 597 s

There is something really, really wrong there.
While I didn't really expect that mixing GPUs of different types would help much, I was extremely surprised to find that having two identical 9800 GTs more than doubles the encoding time!

Definitely a case of less is more.
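Restated as ratios, the benchmark3 numbers quantify the overhead (a small sketch; the figures are simply copied from this post):

```python
# benchmark3 encode times in seconds, taken from the post above.
times = {
    "PD10 single 560 Ti": 98,   "PD10 dual 560 Ti + 8600 GT": 134,
    "PD11 single 560 Ti": 145,  "PD11 dual 560 Ti + 8600 GT": 182,
    "PD10 single 9800 GT": 254, "PD10 dual 2 x 9800 GT": 543,
    "PD11 single 9800 GT": 260, "PD11 dual 2 x 9800 GT": 597,
}

def slowdown(single, dual):
    """Ratio of dual-GPU time to single-GPU time (>1 means dual is slower)."""
    return dual / single

# Adding the second 9800 GT more than doubles the PD10 encode time:
ratio = slowdown(times["PD10 single 9800 GT"], times["PD10 dual 2 x 9800 GT"])
print(f"{ratio:.2f}x")  # → 2.14x
```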

In most of my tests, PD11 came in significantly slower than PD10, but there are a few cases where it wins with benchmark1 and benchmark2 on the FX-8120 with the hardware encodes.

PD11 didn't let me do any software encodes, but I think that's because I'm using the trial version. For all the other versions of PD (8, 9, 10) I have registered licenses.

PD8 and PD9 can't coexist, but it's possible to have either PD8/PD10/PD11 or PD9/PD10/PD11.
Of the four versions overall, PD9 seems to be the fastest the majority of the time ...
Reply
JL_JL [Avatar]
Senior Contributor Private Message Location: Arizona, USA Joined: Oct 01, 2006 20:01 Messages: 4691 Offline
[Post New]
Quote: There is something really, really wrong there.
While I didn't really expect that mixing GPUs of different types would help much, I was extremely surprised to find that having two identical 9800 GTs more than doubles the encoding time!

Thanks for sharing your detailed data. I'm not sure I find your results surprising, nor necessarily anything PD is doing. What occurs can be slightly more complicated, and your testing may have brought this out while testing PD. To my knowledge, the PD versions being tested do not support, nor claim to support, multiple GPUs in the configurations you are trying to test. So one would really need to know your MB and chipset to better rationalize PD's performance in the tested results. On MBs these days you will most typically find 2 physical x16-sized slots for graphics cards, so one has 16 lanes supported for data. Since you are on AMD: the 990FX, 890FX and 790FX chipsets support a total of 42 PCI Express lanes, while the 880G, 870, 790GX and 785G chipsets support a total of 22. These lanes also serve other controllers, and what the MB manufacturers do with them is up to them and is simply documented in the manuals.

Often, if only one graphics card is used, in the primary slot, all 16 lanes from the processor socket connect to that slot. However, if a graphics card is inserted into the second x16 slot, the MB reroutes 8 of the 16 lanes from the primary slot to the secondary slot. The end result is that 8 physical lanes connect to each of the 2 slots; essentially, each slot has half the capability. The performance hit won't be a full half, as the card never really utilized all 16 lanes in the first place, but it will see degraded performance. This appears to mimic your data and your comment:
Quote: If I physically pull one of the cards, or just disable the second card in Device Manager, the speed is much improved.

MB manufacturers also play games, so it's important to note how this is worded in the manual: if slot 2 is populated, both slots operate with 8 lanes. This means that if there is no card in the first slot but only a card in the second slot, that card will still operate with 8 lanes! This is why MB manuals point out that you should always use the primary slot for best performance: with only 1 card inserted in the primary slot, it will operate on a 16-lane connection. Unfortunately, the wording also means that if you have just one card and insert it into the second slot, it will operate on 8 lanes.
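To put rough numbers on the lane sharing described above (a back-of-the-envelope sketch: it assumes PCIe 2.0, where each lane carries roughly 500 MB/s per direction after 8b/10b encoding, and ignores per-board protocol overhead):

```python
# Approximate usable bandwidth per PCIe 2.0 lane, in MB/s per direction.
PCIE2_MB_PER_LANE = 500

def slot_bandwidth(lanes):
    """Approximate one-directional bandwidth of a slot with the given lane count."""
    return lanes * PCIE2_MB_PER_LANE

# One card in the primary slot: the full 16 lanes.
print(slot_bandwidth(16))  # 8000 MB/s
# Two cards: the board reroutes 8 lanes to the second slot, so each slot gets:
print(slot_bandwidth(8))   # 4000 MB/s
```

As Jeff notes, halved slot bandwidth does not mean halved encode speed, since a card rarely saturates all 16 lanes; it only bounds how badly the sharing can hurt.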

Often, much of this detail is over the top for a typical consumer, so they just buy what appears to be a good deal and accept the performance. That is also a common reason why everyone gets a "unique" result with a given product like PD.

This is my last post, as the thread is drifting away from the PD issues and PD11 Multi-GPGPU features suitable for this forum and toward computer requirements, which other forums are much better suited for. I hate it when posts get locked.

Jeff

Reply
Julien Pierre [Avatar]
Contributor Private Message Joined: Apr 14, 2011 01:34 Messages: 475 Offline
[Post New]
It turns out most of the issues were caused by buggy nVidia drivers, version 306.97.
Once I reverted to 301.42, most of the problems disappeared.

Most notably, after reverting to those drivers, there is no longer any slowdown from adding a second card, even a very slow one like the nVidia 8600 GT. I can now mix a 560 Ti and an 8600 GT and get the same performance as a standalone 560 Ti. No added performance, though.

I set my 2 x 9800 GT in both SLI and non-SLI modes and verified that there is no performance improvement from having two cards either way. The numbers are all the same as with a single 9800 GT card, and they are much better than what I posted earlier. I will refresh the data set with a new link.

Another finding is that PD11 is significantly slower than PD10 on both my systems for many test cases, even with the proper nVidia drivers. I suspect this is another case where AMD CPUs were neglected and the code was optimized for Intel. Hopefully an update will fix it some day; I think the same thing happened with previous versions of PD. I guess the moral is: never upgrade to a brand-new version of an app. But then again, what else is new?

Last useful thing to know: over Thanksgiving I bought a GTX 680 and did some benchmarking with it. H.264 hardware encoding was exactly the same speed as on the GTX 560 Ti. It was quite disappointing to see no improvement in the top-of-the-line chip, and I thought others might want to know so they don't waste their money. I returned the GTX 680 to Fry's the next day.

Next, I may give a 7970 card a try. Perhaps AMD has a better implementation for video encoding. Hopefully their drivers don't have as many regressions as nVidia's.

This message was edited 1 time. Last update was at Nov 26. 2012 22:15

Reply