VP9 is a bit of a paradox: it offers compression well above today’s industry standard for internet video streaming (H.264, usually created using the open-source encoder x264), and playback is widely supported by today’s generation of mobile devices (Android) and browsers (Chrome, Edge, Opera, Firefox). Yet many companies and people are wary of using VP9. I’ve blogged about the benefits of VP9 (using Google’s encoder, libvpx) before, and I keep hearing some common responses: libvpx is slow, libvpx is blurry, libvpx is optimized for PSNR, libvpx doesn’t look visually better compared to x264 encodes (or more extreme: x264 looks much better!), libvpx doesn’t adhere to target rates. Really, most of what I hear is not so much about VP9 as about libvpx. But this is a significant issue, because libvpx is the only VP9 software encoder available.
To fix this, we wrote an entirely new VP9 encoder, called Eve (“Efficient Video Encoder”). For those too lazy to read the whole post: this VP9 encoder offers 5-10% better compression rates (for broadcast-quality source files) compared to libvpx, while being 10-20% faster at the same time. Compared to x264, it offers 15-20% better compression rates, but is ~5x slower. Its target rate adherence is far superior to libvpx and comparable to x264. Most importantly, these improvements aren’t just in metrics: the resulting files look visually much better than those generated (at the same bitrate) by libvpx and x264. Don’t believe it? Read on!
As software, I used a recent version of Eve, libvpx 1.5.0 and x264 git hash 7599210. For downsampling to 720p/360p and measuring PSNR/SSIM, I used ffmpeg git hash 69e80d6.
As source material for these tests, I used the “4k” test clips from Xiph. These are broadcast-quality source files at 4k resolution (YUV 4:2:0, 4096×2160, 10 bit/component, 60fps). For these tests, since I have limited resources, I downsampled them to 360p (640×360, 8 bit/component, 30 fps) or 720p (1280×720, 8 bit/component, 30 fps) before encoding them.
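The post does not list the exact ffmpeg invocations, so the following is only a sketch of how the downsampling and PSNR measurement steps might look. The file names are hypothetical, and the use of the `scale` and `psnr` filters with default settings is an assumption, not the author's confirmed command line.

```python
def downscale_cmd(src, dst, width, height, fps=30):
    """Build an ffmpeg command that rescales a clip and converts it to 8-bit 4:2:0.

    Assumption: default scaler settings; the post does not say which were used.
    """
    return [
        "ffmpeg", "-i", src,
        "-vf", f"scale={width}:{height}",  # spatial downsampling
        "-r", str(fps),                    # output frame rate (60 fps -> 30 fps)
        "-pix_fmt", "yuv420p",             # 8 bit/component, YUV 4:2:0
        dst,
    ]


def psnr_cmd(encoded, reference):
    """Build an ffmpeg command that reports PSNR of `encoded` against `reference`."""
    return [
        "ffmpeg", "-i", encoded, "-i", reference,
        "-lavfi", "psnr",   # compares the first input against the second
        "-f", "null", "-",  # discard the output, only the log matters
    ]


# Hypothetical file names, for illustration only.
cmd_360p = downscale_cmd("source_4k.y4m", "source_360p.y4m", 640, 360)
cmd_720p = downscale_cmd("source_4k.y4m", "source_720p.y4m", 1280, 720)
```

Each list can be passed to `subprocess.run` as-is. Replacing `psnr` with `ssim` in the second command would give the SSIM measurement instead.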
I did two types of tests: 1-pass CRF (where you set a quality target) and 2-pass VBR (where you set an average bitrate target). For both tests, I measured objective quality (PSNR), effective bitrate and encoding time. For 2-pass VBR, I also measured target bitrate adherence (i.e. difference between actual and target file size). Lastly, I looked at visual quality.
I encoded the 360p test set using recommended 1-pass CRF settings for each encoder. First, let’s look at the PSNR metrics. The table shows bitrate improvement between Eve and libvpx/x264, i.e. “how many percent fewer (or more) bits does Eve need to accomplish the same PSNR value”. For example, a bitrate improvement of 10% for one clip means that Eve needs, averaged over the bitrate spectrum in the graph for that clip (BD-rate), 10% fewer bits (e.g. 9 bits for Eve instead of 10 bits for the other encoder) to accomplish the same quality (PSNR). The average across all clips in this test set is -12.6% versus libvpx, which means that Eve needs, on average, 12.6% fewer bits than libvpx to accomplish the same quality (PSNR). Compared to x264, Eve needs 14.1% fewer bits to accomplish the same quality.
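For readers unfamiliar with BD-rate, here is a minimal sketch of the usual calculation, assuming four rate/PSNR points per encoder. It fits a cubic through log(bitrate) as a function of PSNR, averages the gap between the two curves over the overlapping PSNR range, and converts that back to a percentage. The data points are invented for illustration and are not the measurements from this post; the actual tooling used for the tables is not specified.

```python
import math


def lagrange_eval(xs, ys, x):
    """Evaluate the unique polynomial through the points (xs, ys) at x."""
    total = 0.0
    for i, (xi, yi) in enumerate(zip(xs, ys)):
        term = yi
        for j, xj in enumerate(xs):
            if j != i:
                term *= (x - xj) / (xi - xj)
        total += term
    return total


def bd_rate(ref, test):
    """Average bitrate difference (%) of `test` versus `ref` at equal PSNR.

    Each argument is a list of four (bitrate_kbps, psnr_db) points with
    distinct PSNR values. A negative result means `test` needs fewer bits.
    """
    lo = max(min(q for _, q in ref), min(q for _, q in test))
    hi = min(max(q for _, q in ref), max(q for _, q in test))

    def avg_log_rate(points):
        psnr = [q for _, q in points]
        log_rate = [math.log(r) for r, _ in points]
        f = lambda q: lagrange_eval(psnr, log_rate, q)
        # Simpson's rule integrates a cubic exactly, so one interval suffices.
        integral = (hi - lo) / 6.0 * (f(lo) + 4.0 * f((lo + hi) / 2.0) + f(hi))
        return integral / (hi - lo)

    return (math.exp(avg_log_rate(test) - avg_log_rate(ref)) - 1.0) * 100.0


# Hypothetical curves: the second encoder needs 10% fewer bits at every PSNR.
other = [(200.0, 30.0), (400.0, 33.0), (800.0, 36.0), (1600.0, 39.0)]
better = [(0.9 * rate, psnr) for rate, psnr in other]
print(f"BD-rate: {bd_rate(other, better):.1f}%")
```

With these made-up curves the result is -10%, matching the “9 bits instead of 10 bits” example in the text.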
Some people object to using PSNR as a quality metric, so I measured the same files using SSIM as a metric. The results are not fundamentally different: Eve is 8.9% better than libvpx, and 22.5% better than x264. x264 looks a little worse in these tests than in the PSNR tests, and that’s primarily because x264 does significant metric-specific optimizations, which don’t (yet) exist in libvpx or Eve. However, more importantly, this shows that Eve’s quality improvement is independent of the specific metric used.
Lastly, I looked at encoding time. Average encoding time for each encoder depends somewhat on the target quality point. For most bitrate targets, Eve is quite a bit faster than libvpx. Overall, for an average (across all CRF values and test sequences) encoding time of about 1.28 sec/frame, Eve is 0.30 sec/frame faster than libvpx (1.58 sec/frame). At 0.25 sec/frame, x264 is ~5x faster, which is not surprising, since H.264 is a far simpler codec, and x264 a much more mature encoder.
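|CRF; 360p|PSNR, Eve vs. libvpx|PSNR, Eve vs. x264|SSIM, Eve vs. libvpx|SSIM, Eve vs. x264|Encoding time (sec/frame), Eve|Encoding time (sec/frame), libvpx|Encoding time (sec/frame), x264|
|Average|-12.6%|-14.1%|-8.9%|-22.5%|1.28|1.58|0.25|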
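The relative speed figures follow directly from the average encoding times quoted above for the CRF test. A small sketch of the arithmetic (the helper names are my own, not part of any encoder tooling):

```python
def percent_faster(fast_sec_per_frame, slow_sec_per_frame):
    """How much less time the faster encoder needs, as a percentage of the slower one."""
    return (slow_sec_per_frame - fast_sec_per_frame) / slow_sec_per_frame * 100.0


def times_slower(slow_sec_per_frame, fast_sec_per_frame):
    """How many times longer the slower encoder takes per frame."""
    return slow_sec_per_frame / fast_sec_per_frame


# Average sec/frame from the 360p CRF test.
eve, libvpx, x264 = 1.28, 1.58, 0.25

print(f"Eve vs. libvpx: {percent_faster(eve, libvpx):.0f}% faster")
print(f"Eve vs. x264:   {times_slower(eve, x264):.1f}x slower")
```

This gives roughly 19% faster than libvpx and about 5.1x slower than x264 for this test.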
I encoded the same 360p sequences again, but instead of specifying a target CRF value, I specified a target bitrate using otherwise recommended settings for each encoder (Eve, vpxenc, x264), and used target bitrate adherence as an additional metric. Again, let’s first look at the objective quality metrics: the table shows results that are not fundamentally different from the CRF results: Eve requires 7.7% less bitrate than libvpx to accomplish the same quality in PSNR. Results for SSIM are not much different: Eve requires 6.6% less bitrate than libvpx to accomplish the same quality. Compared to x264, Eve requires 15.9% (PSNR) or 24.5% (SSIM) less bits to accomplish the same quality.
For an average encoding time of around 1.26 sec/frame, Eve is approximately 0.31 sec/frame faster than libvpx (1.57 sec/frame), which is similar to the CRF results. At 0.20 sec/frame, x264 is again several times faster than either Eve or libvpx, for the same reasons as explained in the CRF section.
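|VBR; 360p|PSNR, Eve vs. libvpx|PSNR, Eve vs. x264|SSIM, Eve vs. libvpx|SSIM, Eve vs. x264|Encoding time (sec/frame), Eve|Encoding time (sec/frame), libvpx|Encoding time (sec/frame), x264|
|Average|-7.7%|-15.9%|-6.6%|-24.5%|1.26|1.57|0.20|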
In terms of target bitrate adherence, Eve and x264 adhere to the target rate much more closely than libvpx does. Expressed as average absolute rate drift, where rate drift is target / actual – 1.0, Eve misses the target rate on average by 2.66%. x264 is almost as good, missing the target rate by 3.83% at default settings. Libvpx is several times farther off, with an average absolute rate drift of 9.48%, which confirms the libvpx rate adherence concerns I’ve heard from others. Each encoder has options to curtail the rate drift, but enabling these options costs quality. If I curtail libvpx’ rate drift to the same range as x264/Eve (command-line options: --undershoot-pct=2 --overshoot-pct=2; table below: RRD), it loses another 3.6% in quality, at which point Eve requires 11.3% less bitrate to accomplish the same quality, with a rate drift of 3.33% for libvpx.
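|VBR; 360p|PSNR, Eve vs. libvpx|PSNR, Eve vs. libvpx (RRD)|PSNR, Eve vs. x264|Absolute rate drift, Eve|Absolute rate drift, libvpx|Absolute rate drift, libvpx (RRD)|Absolute rate drift, x264|
|Average|-7.7%|-11.3%|-15.9%|2.66%|9.48%|3.33%|3.83%|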
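The rate drift metric defined above (target / actual – 1.0, averaged as an absolute value) is simple enough to sketch in a few lines. The target/actual pairs below are hypothetical and only illustrate the calculation; they are not taken from the test runs.

```python
def rate_drift(target_kbps, actual_kbps):
    """Rate drift as defined in the text: target / actual - 1.0."""
    return target_kbps / actual_kbps - 1.0


def mean_abs_rate_drift(pairs):
    """Average absolute rate drift over (target_kbps, actual_kbps) pairs."""
    return sum(abs(rate_drift(t, a)) for t, a in pairs) / len(pairs)


# Hypothetical encodes: one overshoots its target, one undershoots.
encodes = [(200.0, 250.0), (400.0, 320.0)]
print(f"average absolute rate drift: {mean_abs_rate_drift(encodes):.1%}")
```

Taking the absolute value before averaging matters: an encoder that overshoots one clip by 20% and undershoots another by 25% would otherwise look nearly perfect on average.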
Most people in the US watch video at resolutions much higher than 360p nowadays, so I repeated the VBR tests at 720p to ensure consistency of the results at higher resolutions. Compared to libvpx, Eve needs 5.5% fewer bits to accomplish the same quality. Compared to x264, Eve needs 20.4% fewer bits. At 5.09 sec/frame versus 5.52 sec/frame, Eve is 0.43 sec/frame faster than libvpx, with the strongest gains at the low-to-middle bitrate spectrum. At 0.76 sec/frame, x264 is several times faster than either. In terms of bitrate adherence, Eve misses the target rate by 1.82% on average, and x264 by 1.65%. libvpx, at 8.88%, is several times worse. To curtail libvpx’ rate drift to the same range as Eve/x264 (using --undershoot-pct=2 --overshoot-pct=2), libvpx loses another 2.9% in quality, becoming 8.4% worse than Eve at an average absolute rate drift of 1.50%. Overall, these results are mostly consistent with the 360p results.
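|VBR; 720p|PSNR, Eve vs. libvpx|PSNR, Eve vs. libvpx (RRD)|PSNR, Eve vs. x264|Encoding time (sec/frame), Eve|Encoding time (sec/frame), libvpx|Encoding time (sec/frame), x264|Absolute rate drift (%), Eve|Absolute rate drift (%), libvpx|Absolute rate drift (%), libvpx (RRD)|Absolute rate drift (%), x264|
|Average|-5.5%|-8.4%|-20.4%|5.09|5.52|0.76|1.82|8.88|1.50|1.65|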
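To make the “RRD” (reduced rate drift) configuration concrete, here is a sketch of how a 2-pass VBR vpxenc command with the two drift options might be assembled. Only --undershoot-pct=2 and --overshoot-pct=2 come from the text; the remaining flags are standard vpxenc options that I am assuming for illustration, since the full “recommended settings” are not listed. File names and the target bitrate are hypothetical.

```python
def vpxenc_vbr_cmd(src, dst, target_kbps, drift_pct=None):
    """Build a 2-pass VBR vpxenc command line for VP9.

    drift_pct, when given, sets both --undershoot-pct and --overshoot-pct,
    which is the "RRD" variant discussed in the text.
    """
    cmd = [
        "vpxenc",
        "--codec=vp9",
        "--passes=2",
        "--end-usage=vbr",
        f"--target-bitrate={target_kbps}",  # in kbps
    ]
    if drift_pct is not None:
        cmd += [f"--undershoot-pct={drift_pct}", f"--overshoot-pct={drift_pct}"]
    cmd += ["-o", dst, src]
    return cmd


# Hypothetical inputs: default rate control versus the RRD variant.
default_cmd = vpxenc_vbr_cmd("clip_720p.y4m", "clip_default.webm", 2000)
rrd_cmd = vpxenc_vbr_cmd("clip_720p.y4m", "clip_rrd.webm", 2000, drift_pct=2)
print(" ".join(rrd_cmd))
```

Running both variants on the same clip and comparing file sizes against the target is enough to reproduce the kind of drift comparison shown in the table.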
The most-frequent concern I’ve heard about libvpx concerns visual quality. It usually goes like this: “the metrics for libvpx are better, but x264 _looks_ better!” (Or, at the very least, “libvpx does not look better!”) So, let’s try to look at some of these (equal bitrate/filesize) videos and decide whether we can see actual visual differences. When doing visual comparisons, it should be obvious why effective rate targeting is important, because visually comparing two files of significantly different size is quite meaningless.
For this comparison, I picked three files: one where Eve is far ahead of libvpx (BarScene), one where the two perform relatively equally (BoxingPractice), and one which represents roughly the median across the files in this test set (SquareAndTimelapse). In each case, the difference between Eve and x264 is close to the median. For target rate, I picked values around 200-1000kbps, with visual optimizations (i.e. no --tune=psnr). Overall, this gives reasonable visual quality and is typical for internet video streaming at this resolution, but at the same time allows easy distinction of visual artifacts between encoders. For higher resolution, you’d use higher bitrates, but the types of visual artifacts would not change substantially.
First, BarScene: I encoded the file at 200kbps and picked frame 217 of each encoded file. The coded frame size is 889 bytes (Eve), 1020 bytes (libvpx) and 862 bytes (x264), with total file size of 505kB (Eve), 500kB (libvpx) and 507kB (x264). Full-sized images are clickable. In the close-ups, we see various artifacts:
- Bartender’s face: x264 makes the man’s nose and forehead look like a zombie’s, because of high-frequency noise at sharp edges. Libvpx has the opposite artifact: it is blurry, which is the most-often heard complaint about this encoder.
- Bartender’s shirt and girl’s sweater: libvpx blurs out most texture in the clothing. x264, on the other hand, has high-frequency noise around the buttons on the bartender’s shirt. Both x264 and libvpx manage to make the lemon in the glass disappear.
- Patrons’ faces: libvpx is again blurry. x264 is also more blurry than it typically is.
- Bar area: x264 hides the finger of the left-hand (top/right, holding the menu), and adds a dark scar (instead of a faint shadow) to the thumb on the right hand. libvpx changes the color of the drink from orange to yellow, makes straws disappear, and is – surprise! – blurry.
Second, let’s look at SquareAndTimelapse. I encoded the file at 1 Mbps and selected frame 101 of each encoded file. The coded frame sizes are 2651 bytes (Eve), 2401 bytes (libvpx) and 3721 bytes (x264), with total file size of 1.27 MB (Eve), 1.30 MB (libvpx) and 1.24 MB (x264). Full-sized images are clickable. In the close-ups, we can again compare visual artifacts:
- Man in black coat and woman in pink sweater: x264 turned the woman’s face green’ish. On the other hand, it maintains most texture in the black coat. Eve maintains almost as much detail in the coat, but libvpx blurs it quite significantly. Libvpx also bleeds the red color from the man’s t-shirt into the hair of the woman in front of him (mid/bottom).
- Man in blue t-shirt and woman in white shirt: libvpx blurs the bottom of the man’s t-shirt, particularly the red portion, which is barely visible anymore. x264, on the other hand, blurs away the woman’s face quite significantly (e.g. her mouth disappears). x264 also again suffers from coloring artifacts in the top/left girl’s neck (which turns gray) and the woman in the bottom/right (whose face turns blue). Also with x264, we again see significant high-frequency artifacts in what used to be a shoulderbag in the person to the top/right.
- Red backpack: libvpx combines two recurring artifacts here – blur and color bleed – at the bottom/right edge of the backpack, where the red backpack bleeds into neighbouring objects. x264 does the opposite, and replaces the red color in the bottom/right corner of the backpack with a green patch that seems to come out of nowhere.
Lastly, let’s look at BoxingPractice. I encoded the file at 1 Mbps and selected frame 86 of each encoded file. The coded frame sizes are 3060 bytes (Eve), 2785 bytes (libvpx) and 2171 bytes (x264), with total file size of 509 kB (Eve), 481 kB (libvpx) and 513 kB (x264). Full-sized images are clickable. In the close-ups, we can again compare visual artifacts:
- Man with red gloves: in x264, we see the boxing glove color bleeding through into the man’s face. The high frequency noise is also abundantly present, particularly around his left hand’s boxing glove. And although all three encoders suffer significantly from blurring artifacts, libvpx is still by far the worst.
- Man with blue gloves: the x264 file shows more high-frequency noise artifacts on the right shoulder area, and a bright red patch coming out of nothing on the left. And libvpx is this time much more blurry than either of the other two encodes, and also loses the red spot on the base of the glove. The man’s facial color is not well maintained by any of the encoders, unfortunately.
- Foreground boxer: x264 has more high-frequency noise artifact just under the man’s nose. Libvpx, on the other hand, is once again blurry, and loses significantly more color in the man’s face.
Overall, we start seeing a pattern in these artifacts: at comparable file sizes and frame sizes, compared to Eve, libvpx is blurry, and x264 suffers from high-frequency noise artifacts at sharp edges and has issues with skin textures. Both x264/libvpx also have significantly more color-artifacts compared to Eve: x264 tends to lose color and libvpx often bleeds colors. Eve – although obviously not perfect – looks visually much more pleasing, at the same frame size and file size.
Eve is a world-class VP9 encoder that fixes some of the key issues people have complained about with libvpx. Here, I tested the encoder at 360p and 720p using broadcast-style settings, where one encoded file is streamed many, many times, and therefore slow encoding times (1 sec/frame) are acceptable. At these tested CRF/VBR settings, Eve:
- provides better quality metrics than libvpx (5-10% bitrate reduction) and x264 (~15-20% bitrate reduction)
- provides better visual results than libvpx/x264
- is faster than libvpx (10-20%), but slower than x264 (~5x)
- has better target rate adherence than libvpx, and has comparable target rate adherence with x264. To get libvpx at the same target rate adherence, it loses another ~2-3% in quality metrics compared to Eve.
At Two Orioles, we are working to further improve Eve’s quality and speed every day, and lots of work can still be done (e.g.: faster encoding modes, multi-threading). At the same time, we would love to help you use VP9 for internet video streaming. Do you stream lots of video, and are you interested in trying out VP9 or improving your VP9 pipeline using Eve? Contact us, or see our website for more information.
Shouldn’t Eve be spelled ‘EVE’? You have links to libvpx, x264 and ffmpeg sources but not to Eve/EVE. Would you mind adding a link where to download it. That’s assuming that Eve/EVE is Open Source. Thanks.
Does it support multithreaded encoding better than libvpx? libvpx 1.5 here can barely use more than one thread.
Can we test it? Will this be FOSS?
@Braulio: multi-threading is planned, but not yet finished.
@krs: it is currently not available as opensource. We are hoping to test it with customers who are willing to pay for it, so we can build a business around it that can sustain its development and pay the rent etc.
AM1 will be the superior codec & open source.
It is AV1, not AM1 – and it is just a format derived from VP10 (which is polished VP9). Eve could in the future be evolved to encode it too – hopefully, because the reference encoder for AV1 is a development of libvpx (ouch). In any case, AV1 is at least a year from bitstream specification, and experience shows that after that you need 2-3 more years to produce a usable encoder. That’s a lot of time so my good advice is, don’t worry too much about AV1 at this point, it is a waste of time to “wait and look forward”. Fun anecdote: I have been looking forward to Daala since 2010 and where is it? 🙂
Of course, it is a pity the encoder won’t be public and open source, but it is probably a financial necessity. As you can see even Google that has incredibly deep pockets has neglected libvpx – it has worthless rate control and almost worthless threading (it only uses tiles and the number of threads is limited to one per 512 pixels of width – meaning that for 1920×1080, you can at most use three threads…. one and half a Xeon core).
Apparently being open source (like libvpx is) can’t currently sustain the needed development and some commercial style funding through old-school sales is needed at least to start it. I would also like to play with this Eve, but such is life. VP9 as a format should benefit from having a better encoder, because the availability of something better than the sucky reference was sorely needed.
Is it suitable for real time video on mobile?
Volkan: mobile devices typically use hardware encoders/decoders. Can this encoder create video that can be decoded in hardware by mobile devices? Yes. Will this encoder encode video in realtime on mobile devices? No, not right now.
No offence, but you cannot really persuade readers without giving us some samples to inspect. There have been tons of similar articles claiming some encoders work better than x264, both benchmark-wise and quality-wise.
TONS OF THEM.
Can you say something about what speed preset you used for x264? I saw some indications from others that it could be the “veryslow” preset, and 5x the encoding time of veryslow (with no support for multithreading, to boot) would be very, very slow indeed. For most users, comparing against “medium” would be a more realistic target for bitrate/CPU tradeoff.
Who cares? Seriously, it’s a nice academic exercise, but of zero professional and/or commercial value in the world moving to H.265.
H.264 will never be dethroned in its space. That war is over and so is the H.265 war.
How does this compare to the output quality of HEVC encoders?
So…where can I get EVE at, yo?
Daala > AV1
It is unclear what tune settings were used in the PSNR/SSIM tests (in general, no settings are specified at all – “recommended settings” does not tell much).
I assume all the tests are 8-bit only. It could be interesting to test higher bit depth profiles too (I guess Eve doesn’t support them yet).
Also, just saying the obvious here, but some sort of “community” release (under a restrictive license that forbids commercial use, even in binary-only form) would be appreciated. There’s little point on telling us how awesome it is if we can’t use it!
You assume I didn’t know, but anyway, that is kind of the point… where is AV1? Few years away.
Maybe they will have more luck with meeting their roadmap milestones than Xiph did, but even if they do freeze the bitstream in twelve months from now, it will need 2-3 more years to reach maturity on the side of encoders. By that time, I might not even be alive, much less still be in search of compression format and encoder. In practice the AV1 codec is currently irrelevant.
One way to provide a trial version for general quality testing would be to release a build without assembly code compiled in. That way everybody could test it except for speed/performance. You could probably give some rough benchmark guidance ratios so that the user could estimate what speed could he expect with the full software.
My ultimate Target at work is actually to find a solution that can encode our UHD and HD video source camera files from like 100Gb to like 100MB (i.e quality ratio 1000:1), without any quality loss.
Please, do you see this as viable with either VP9 or any other lossless technology? If it is not a viable target, what quality ratio do you consider viable for me?
Anticipate your kind reply.
Where is the download link to an ffmpeg build with EVE linked into?
Is the development mainly in these two:
Which of them will become AV1?
Great Development! Wish you be successful!
What about contacting Google and kindly asking them to fund the project so that it can be open sourced.
We would like to test your product with our church video archive on a variety of inputs.
Pastor, programmer in free time, video technologies enthusiast