CyberLink Community Forum
where the experts meet
What difference does hardware encoding make?
Reply to this topic
JeffK.Boston [Avatar]
Newbie Private Message Joined: Sep 14, 2010 05:40 Messages: 24 Offline
[Post New]
I was able to run tests using what may be the perfect environment to answer this question...

Dell Precision T3420: Intel Xeon E3-1245 V5 3.5 GHz (4 cores / 8 logical processors) - 64GB - dual PCIe M.2 drives - integrated Intel graphics

Dell Precision T7910: Xeon E5-2667 V3 3.2 GHz (8 cores / 16 logical processors) - 64GB - dual PCIe M.2 drives - GeForce RTX 2060 graphics

On each of the above, PD365 wasn't going to be limited by CPU speed or core count (nowhere near 100% CPU usage), nor by disk speed (PCIe M.2 drives are the fastest you can get for a workstation).

That means no other limit would mask performance when I tested with and without hardware encoding.

Using a complex project I had (four 1920x1080 40 Mbps input streams, with key frames on output), here are the results...

Production time on a T3420 without hardware encoding 46:23 and with it 33:43

Production time on a T7910 without hardware encoding 45:10 and with it 31:32

FYI - the output file was 36:52 long, which in the best case means it produced about 1 minute of video for every 51 seconds of production time.

In both cases the source files were NOT on the system disk, because that slows you down by about half a percent.

So when hardware encoding is the only performance factor, as in these tests, it takes roughly 1.4 times as long to produce a video without using hardware encoding.

Now don't assume that's a universal rule, i.e. that you can pop in a graphics card that can do hardware encoding and your production time will drop to 2/3 of what it was. If CPU speed, core count, or disk speed are bottlenecks, they will likely wash out some of that performance improvement...
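The arithmetic behind those timings can be checked with a quick sketch (all numbers are taken from this post; nothing here is PowerDirector-specific):

```python
def to_seconds(mmss: str) -> int:
    """Convert an MM:SS string to total seconds."""
    minutes, seconds = mmss.split(":")
    return int(minutes) * 60 + int(seconds)

# Production times posted above: (software encode, hardware encode)
runs = {
    "T3420": ("46:23", "33:43"),
    "T7910": ("45:10", "31:32"),
}

for machine, (software, hardware) in runs.items():
    ratio = to_seconds(software) / to_seconds(hardware)
    print(f"{machine}: software/hardware = {ratio:.2f}x")

# Real-time ratio for the fastest run: 36:52 of output in 31:32
per_minute = to_seconds("31:32") / (to_seconds("36:52") / 60)
print(f"Best case: about {per_minute:.0f} s of production per minute of output")
```

The two ratios land around 1.38x and 1.43x, which is where the "roughly 1.4 times as long without hardware encoding" figure comes from.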

This message was edited 1 time. Last update was at Mar 14. 2020 07:58

Reply
PepsiMan
Senior Contributor Private Message Location: Clarksville, TN Joined: Aug 17, 2007 02:26 Messages: 982 Offline
[Post New]
thanks for sharing your little test.
i'm assuming FHD AVC .MP4 container. can you do another test? change AVC to HEVC, same resolution (FHD) and same container. also info on the driver versions (os, mobo, cpu, gpu, mem speed, etc).

we're in an 8K era; however, some people are still stuck in the HD wormhole. i'm at 4K blackhole gravity. ^^

happy happy joy joy

PepsiMan
'garbage in garbage out'

This message was edited 1 time. Last update was at Mar 13. 2020 15:19

'no bridge too far'

Yashica Electro 8 LD-6 Super 8mm
Asrock 970 Extreme4, W7Pro 64, AMD FX 8370E, MSI GTX960 2GB, Corsair 16GB/RAM
Dell XPS L702X i7-2860QM, W7P / W10P 64, Intel HD3000/nVidia GT 550M 1GB, Micron 16GB/RAM
Samsung Galaxy Note3/NX1
Reply
pmikep [Avatar]
Senior Member Private Message Joined: Nov 26, 2016 22:51 Messages: 231 Offline
[Post New]
Thanks for the testing.

I've done similar tests.

In my testing, I have found that slightly different flavors of video sizes can make quite a difference.

For example, you worked on 1920 x 1040. (Is that a European standard? You didn't state the frame rate.) I have found that little changes like 1920 to 1080 can make a difference. (I suspect that the algorithms in the hardware encoders are designed for/optimized for "standard" formats. Even for small changes, like from 29.97 to 30.00 fps.) Sometimes I find that pointing PD at different encoders (using the Windows Graphics Settings) can make a difference, DEPENDING on the file size. (Sometimes 4K does better on Intel's UHD, sometimes something else is better on Nvidia's Turing.)

As for the close to 1:1 encode ratio, I am also finding that, when encoding videos which have been touched (such that SVRT can't just pass them to the output), 1:1 seems to be a reasonable expectation. At least for me, who's still in the stone age doing HD video resolution. (I'm not convinced that, once past a certain point, bigger is better.)
Reply
tomasc [Avatar]
Senior Contributor Private Message Joined: Aug 25, 2011 12:33 Messages: 4924 Offline
[Post New]
The Intel Xeon E3-1245 V5 integrated graphics has both hardware decode and hardware encode for 4K HEVC and AVC. It does seem to match the Nvidia card for 2K encoding performance in your tests. An HEVC 4K produce may show a 2-to-1 better performance with the Nvidia card if you test it. I used 1-minute tests of AVC 1080p60 and HEVC 2160p60 on mine. PepsiMan has the right idea for quick tests and quick results.

This message was edited 1 time. Last update was at Mar 13. 2020 18:46

Reply
[Post New]
Quote
we're in a 8K time; however, some people are still stuck in HD wormhole. i'm at a 4K blackhole gravity. ^^

I agree that 4K TVs are becoming more prevalent; I have a 4K monitor and a 4K HDR/Dolby Vision TV. My guess is that the next step will be 3D HDR 4K, like it was for 3D HD (yes, they will sell it as a reason to make us upgrade our TVs again).
As for 8K? I don't see the reason to have one.
I have a big enough TV now (65") and I can't see, from my couch, all the details that I can see up close. I am at the limits of my eyesight's resolving power.
A bigger TV? It won't fit easily on my wall, and it would be like sitting in the first row at the theater, or gaming too close to a big screen - I would basically see only a crop of the image at a time and would need to move my eyes around too much to see all the action.
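That eyesight argument can be roughly sanity-checked with simple geometry. The sketch below estimates the distance beyond which adjacent pixels blur together; the 65-inch size is from this post, while the ~1 arcminute acuity figure and the 16:9 assumption are illustrative, not anything measured here:

```python
import math

def max_useful_distance_m(diagonal_in: float, horiz_px: int,
                          acuity_arcmin: float = 1.0) -> float:
    """Distance beyond which one pixel subtends less than the eye's acuity."""
    # Width of a 16:9 panel: diagonal * 16 / sqrt(16^2 + 9^2), inches -> meters
    width_m = diagonal_in * 0.0254 * 16 / math.hypot(16, 9)
    pixel_m = width_m / horiz_px                  # size of one pixel
    angle_rad = math.radians(acuity_arcmin / 60)  # acuity as an angle
    return pixel_m / math.tan(angle_rad)

for name, px in [("FHD", 1920), ("4K", 3840), ("8K", 7680)]:
    print(f"{name} on 65\": pixels merge beyond ~{max_useful_distance_m(65, px):.1f} m")
```

Under these assumptions, 4K detail on a 65" set is already invisible past roughly 1.3 m, and 8K past roughly 0.6 m - consistent with the point that 8K buys nothing at couch distance.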



PS: It's ridiculous that, in the year 2020, when all TVs are basically big computers/tablets, some people still think they need to capture their videos at 50 fps. That's because that's how it was done in an analog color standard established in 1967 (PAL), based on a format first introduced in 1948 in Russia (B&W, 625 lines, 50 Hz), which in turn was based on the 50 Hz power-line frequency standard (used for sync) established around 1900.

Those analog formats are dead now, and everyone has flat panels that can do 60 fps as a minimum (some do 120 fps or more).

This message was edited 1 time. Last update was at Mar 14. 2020 08:13

Reply
[Post New]
Quote
On each of the above, PD365 wasn't going to be limited by CPU speed or core count (nowhere near 100% CPU usage), nor by disk speed (PCIe M.2 drives are the fastest you can get for a workstation).

At some point other bottlenecks come into play. In my tests it looks more like latency in the software, and probably on the buses.

The use of SSDs for video editing is a complete waste of money; the encoding speed is way below what a modern HDD can do, even at 4K. Check with Windows "Resource Monitor" and you'll see the HDD usage.
Maybe one HDD for sources and one for results is a good start, to minimize seek times and fragmentation... Personally I have 3 HDDs in a hardware RAID 5, for redundancy.
GPU encoding versus CPU encoding, tested while doing just that one task (encoding from one format to another), is not a very relevant comparison. The moment you add any effects or color corrections, the CPU will be used more, and software encoding performance will start to drop. That's when you want the GPU to perform the encoding tasks.
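Outside the editor, that CPU-vs-GPU encode split is easy to reproduce for a quick comparison with ffmpeg. PowerDirector doesn't expose encoder selection this way, so this is only an illustrative sketch: `libx264` (software) and `h264_nvenc` (NVENC hardware) are standard ffmpeg encoder names, but NVENC availability depends on your ffmpeg build and GPU, and the file names and 40M bitrate are placeholders:

```python
def encode_cmd(src: str, dst: str, use_gpu: bool) -> list:
    """Build an ffmpeg H.264 encode command using a CPU or GPU encoder."""
    encoder = "h264_nvenc" if use_gpu else "libx264"
    return ["ffmpeg", "-i", src, "-c:v", encoder, "-b:v", "40M", dst]

# Time each of these with the same source to compare software vs. hardware:
print(" ".join(encode_cmd("in.mp4", "out_cpu.mp4", use_gpu=False)))
print(" ".join(encode_cmd("in.mp4", "out_gpu.mp4", use_gpu=True)))
```

Running both on the same clip (with and without a filter chain) shows the effect described above: pure transcodes barely touch the CPU with NVENC, while adding effects shifts load back onto it.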

Now, it's true that some Intel CPUs have integrated GPUs with encoders too. But their higher-performance CPUs usually don't have the GPU integrated, to stay within the thermal limits of the die (TDP). Eliminating a source of heat from the die leaves more headroom for CPU heat, which usually means higher frequency (or more cores).

My Dell Precision T7610 has two E5-2667 V2 @ 3.30 GHz (8 cores / 16 threads each). Each is rated for a TDP of 130 W, at the top of capability for that package. Other models with lower core count/frequency are rated 110, 95, 80 W... in those, Intel could theoretically add the GPU. But... is that really needed?

This message was edited 2 times. Last update was at Mar 14. 2020 08:37

Reply
JeffK.Boston [Avatar]
Newbie Private Message Joined: Sep 14, 2010 05:40 Messages: 24 Offline
[Post New]
Quote

> The use of SSDs for video editing is a complete waste of money; the encoding speed is way below what a modern HDD can do, even at 4K.


On that we will disagree - I have seen up to a 5% difference between sourcing from an HDD vs. an SSD. However, between SATA SSD and PCIe SSD there was no difference. While you are using internal RAID, I am not. Given that Samsung 970 Pro PCIe NVMe drives now cost south of $100, the cost isn't high anymore. The only thing I use the SSDs for is the current project I am working on. I archive 2 copies of any finished project on HDDs using a RocketStor 3122B.

I did a lot of testing of various hardware combinations.

This message was edited 1 time. Last update was at Mar 16. 2020 03:35

Reply
JeffK.Boston [Avatar]
Newbie Private Message Joined: Sep 14, 2010 05:40 Messages: 24 Offline
[Post New]
Quote Thanks for the testing.

> 1920 x 1040. (Is that a European standard?)


Nope - it is a typo ;)

This message was edited 2 times. Last update was at Mar 16. 2020 03:34

Reply
[Post New]
Quote
On that we will disagree - I have seen up to a 5% difference between sourcing from an HDD vs. an SSD.


Only if you are putting the file back on the same HDD. That makes the HDD heads seek between the two locations.
If you read from one HDD and write to another, that seek time is minimized a lot.
RAID usually doubles the HDD speed (if it's done in actual hardware, not in software by the system CPU).

Write cycles on SSDs are limited. That's how they are actually rated - the number of TB that can be written to them before failure. That's why they are not so great for video editing.

In my experience, the software has latencies that are still to be tackled. In my edits, the CPU is not saturated (usage at 100%), the GPU is not saturated, memory has plenty of free space, and HDD speed is at less than 30% of maximum... So something else is slowing down the editing process.
Probably the transfer speed and latency of the CPU-to-RAM controller and the CPU-to-GPU (PCIe) interface.
A similar issue happens in gaming, and DirectX 12 was supposed to help reduce that (I am not sure what PD uses as the video interface between CPU and GPU).
https://appuals.com/how-to-enable-ultra-low-latency-mode-for-nvidia-graphics/

Caching in fast local memory, which is usually the way of dealing with latencies, is not effective when dealing with large files (like video files). So all that "smart" caching in the CPU's and GPU's fast internal memory is not helping.

This message was edited 2 times. Last update was at Mar 16. 2020 05:34

Reply
[Post New]
I found a cool tool that measures system/kernel latencies:

https://resplendence.com/latencymon
Reply
pmikep [Avatar]
Senior Member Private Message Joined: Nov 26, 2016 22:51 Messages: 231 Offline
[Post New]
The guy who developed the FLIP Fluids simulation for Blender explains in a FAQ that the reason people don't see 100% CPU usage during simulations is that simulations are memory intensive. So apparently moving data in and out of RAM is one bottleneck, as SoNic67 suggests. (I wonder if Optane would make a difference in this instance?)

This message was edited 1 time. Last update was at Mar 17. 2020 00:10

Reply
[Post New]
Well, those are my results with the latency tester that I linked above.

[Attachment: latency.PNG]

This message was edited 1 time. Last update was at Mar 16. 2020 18:40

Reply
JeffK.Boston [Avatar]
Newbie Private Message Joined: Sep 14, 2010 05:40 Messages: 24 Offline
[Post New]
Quote

Using a complex project I had (four 1920x1080 40 Mbps input streams, with key frames on output), here are the results...

Production time on a T3420 without hardware encoding 46:23 and with it 33:43

Production time on a T7910 without hardware encoding 45:10 and with it 31:32



I added a second CPU to the T7910 workstation. With hardware encoding, the encoding time was 27:24. I ran the test several times to make sure; it was the same within a second each time. Not sure why.

Differences in hardware...

2 vs 1 Xeon E5-2667 V3 3.2 GHz

16 Cores (32 Logical Processors) vs 8 Cores (16 Logical Processors)

1 MB vs .5 MB L1 Cache

4 MB vs 2 MB L2 Cache

40 MB vs 20 MB L3 Cache

6 vs 2 Memory Channels (added four 16GB RDIMMs to the existing two 32GB RDIMMs)
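Putting the single-CPU and dual-CPU hardware-encoding runs side by side (times from this thread), the gain from the second CPU works out to about a 1.15x speedup:

```python
def to_seconds(mmss: str) -> int:
    """Convert an MM:SS string to total seconds."""
    minutes, seconds = mmss.split(":")
    return int(minutes) * 60 + int(seconds)

one_cpu = to_seconds("31:32")  # single E5-2667 V3, hardware encoding
two_cpu = to_seconds("27:24")  # dual CPUs, hardware encoding

print(f"Dual-CPU speedup: {one_cpu / two_cpu:.2f}x")
```

That ~15% is far from the 2x that doubling cores might suggest, which fits the later discussion that memory bandwidth and channel count, not core count, are the limiting factors here.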

This message was edited 3 times. Last update was at Mar 22. 2020 12:17

Reply
[Post New]
Quote

I added a second CPU to the T7910 workstation. With hardware encoding, the encoding time was 27:24. I ran the test several times to make sure; it was the same within a second each time. Not sure why.


Open Task Manager and hit the "Performance" tab. You can see the CPU and GPU usage during editing.
If you are only encoding, the CPU will not be used that much. If you apply any effects (a LUT, for example), it might change things.
Reply
JeffK.Boston [Avatar]
Newbie Private Message Joined: Sep 14, 2010 05:40 Messages: 24 Offline
[Post New]
Quote


Open Task Manager and hit the "Performance" tab. You can see the CPU and GPU usage during editing.
If you are only encoding, the CPU will not be used that much. If you apply any effects (a LUT, for example), it might change things.


33% CPU usage on the two XEONs - 66% CPU usage on the single XEON

37% GPU usage on the two XEONs - 41% GPU usage on the single XEON (peak in both cases)

somehow I knew you would ask for the above

and...

6% memory usage on the two XEONs (128GB) - 15% memory usage on the single XEON
Reply
JL_JL [Avatar]
Senior Contributor Private Message Location: Arizona, USA Joined: Oct 01, 2006 20:01 Messages: 4058 Offline
[Post New]
Quote

33% CPU usage on the two XEONs - 66% CPU usage on the single XEON

37% GPU usage on the two XEONs - 41% GPU usage on the single XEON (peak in both cases)

somehow I knew you would ask for the above

and...

6% memory usage on the two XEONs (128GB) - 15% memory usage on the single XEON

Check preferences; you are probably doing CPU decoding alongside hardware GPU encoding. The stats would have indicated it if that were the case.

Jeff
Reply
JeffK.Boston [Avatar]
Newbie Private Message Joined: Sep 14, 2010 05:40 Messages: 24 Offline
[Post New]
Quote

Check preferences; you are probably doing CPU decoding alongside hardware GPU encoding. The stats would have indicated it if that were the case.

Jeff


2 screenshots attached
[Attachment: GPU.jpg]
[Attachment: CPU.jpg]
Reply
[Post New]
Quote
2 screenshots attached

^This is a nicer way to link the pics.
Like I said above, PD hits other bottlenecks; to me it looks like latency.
I also have two (2) 8-core CPUs, plus a GTX 1080 and 64 GB RAM. I can see that none of them are maxed out during editing.

This message was edited 4 times. Last update was at Mar 22. 2020 17:34

Reply
JeffK.Boston [Avatar]
Newbie Private Message Joined: Sep 14, 2010 05:40 Messages: 24 Offline
[Post New]
Quote
Like I said above, PD hits other bottlenecks; to me it looks like latency.


I agree, which is why I was a little surprised there was a 15% improvement from adding the second CPU, given that with one CPU it wasn't using a third of the CPU. The only thing I could think of is that the second CPU has its own access to a separate pool of memory, which means memory bandwidth went up...
Reply
[Post New]
Yeah, memory bandwidth helps, but more importantly the system's memory access latency goes down with the number of channels used to access it.

I have 64 GB in my system now, not because I needed that much memory, but because I wanted 4+4 channels (each CPU has a quad-channel controller) to access my DDR3 memory (T7610 here, with two E5-2667 V2). So I have installed 8 sticks of 8 GB in my machine (of 16 slots total), 4 sticks per CPU (to use their quad channels).

Sure, it's not a linear dependency, due to other bottlenecks (like the CPU's L3 cache, the two QPI links, or the PCI Express 3.0 latency when moving data between CPU and GPU).

PS: You really need to check the Dell-recommended population order for your memory slots, especially in your case with unequal sticks. The best way to go is to fill all slots equally.
One of your CPUs has memory in dual channel, the other in quad. There will be times when the quad-channel CPU waits for a result that is in the dual-channel CPU's memory.

This message was edited 7 times. Last update was at Mar 23. 2020 09:12

Reply