Uncompressed (i.e. decoded) video frames are almost universally stored as planar YUV arrays containing lines of pixels. These YUV pixels are then drawn on-screen, e.g. using a GL shader to do the YUV-to-RGB conversion. It sounds simple, right? All you’d need is a 3×3 matrix of YUV-to-RGB coefficients. Those coefficients depend on the colorspace of the video, e.g. Bt-601/709/2020 or SMPTE-170M/240M. Modern video codecs (such as H.264, HEVC and VP9) signal the colorspace in their bitstream header; VP9-in-ISO also allows signaling it in the container (the vpcc atom). So this should be simple.
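That 3×3 matrix can be derived from just two constants each standard defines: the luma coefficients Kr and Kb. A minimal Python sketch (assuming full-range pixel values with chroma already centered at zero; limited-range video adds an extra scale and offset):

```python
import numpy as np

def yuv_to_rgb_matrix(kr, kb):
    """Build a 3x3 YUV-to-RGB matrix from a standard's luma coefficients.

    Assumes full-range Y in [0, 1] and U/V centered at 0; limited-range
    ("TV range") video needs an additional scale and offset first.
    """
    kg = 1.0 - kr - kb
    return np.array([
        [1.0, 0.0,                         2.0 * (1.0 - kr)],
        [1.0, -2.0 * kb * (1.0 - kb) / kg, -2.0 * kr * (1.0 - kr) / kg],
        [1.0, 2.0 * (1.0 - kb),            0.0],
    ])

# Bt-709 defines Kr = 0.2126 and Kb = 0.0722.
m709 = yuv_to_rgb_matrix(0.2126, 0.0722)

# A neutral gray (U = V = 0) maps to R = G = B = Y, as expected.
print(m709 @ np.array([0.5, 0.0, 0.0]))  # -> [0.5 0.5 0.5]
```

Plugging in Bt-601’s Kr = 0.299 and Kb = 0.114 instead yields a visibly different matrix, which is exactly why the signaling matters.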
Unfortunately, it’s not that simple. A fundamental problem is that RGB, like YUV, is device-dependent: it has a color matrix associated with it. The RGB color matrix (and transfer characteristics) define how particular pixel values in RGB (or YUV) are converted into the photons emitted by your display device. The problem isn’t so much the signaling, which works just like colorspace signaling; the problem is what to do with that information.
First, what are color matrix coefficients and transfer characteristics? Let’s take a step back. Is there a universal (device-independent) way of specifying how many photons a pixel value should correspond to? It turns out there is! This is the XYZ colorspace; the Y component is analogous to luminance (roughly similar to the Y component in YUV), and the X/Z components carry the chroma information, where Z is “quasi-equal” to the S-cone (blue) response of the human eye. Conversion between XYZ and RGB is similar to conversion between YUV and RGB, in that it uses a 3×3 matrix of color matrix coefficients. However, this isn’t regular RGB but linearized RGB, meaning that an increase in pixel component values correlates linearly with an increase in photon count. RGB pixel values in images or videos (whether coded directly as RGB or converted from YUV) are gamma-corrected. Gamma correction essentially increases the resolution of pixel values near zero, which is useful because the human eye is more sensitive to dark values than to light ones. In the relevant standards, gamma correction is defined by the transfer characteristics. To go from linear RGB to YUV, you first apply the gamma correction and then the usual color matrix coefficients.
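To make both steps concrete, here is a sketch for an sRGB/Bt-709 display: undoing the gamma correction (the sRGB transfer characteristic) and then mapping linear RGB to XYZ. The matrix values are the commonly published sRGB/D65 ones, rounded to four decimals:

```python
import numpy as np

def srgb_to_linear(c):
    """Invert the sRGB transfer characteristic (gamma correction).

    Below ~0.04 the curve is a linear segment; above that, a power
    function with exponent 2.4. Bt-709 defines a slightly different
    (encoding-side) curve, but displays are commonly modeled with sRGB.
    """
    c = np.asarray(c, dtype=float)
    return np.where(c <= 0.04045, c / 12.92, ((c + 0.055) / 1.055) ** 2.4)

# Linear sRGB/Bt-709 RGB -> XYZ (D65 white point). The middle row is the
# luminance (Y) contribution of each primary -- note it matches the
# Bt-709 luma coefficients 0.2126/0.7152/0.0722.
RGB709_TO_XYZ = np.array([
    [0.4124, 0.3576, 0.1805],
    [0.2126, 0.7152, 0.0722],
    [0.0193, 0.1192, 0.9505],
])

linear = srgb_to_linear([1.0, 1.0, 1.0])   # reference white
print(RGB709_TO_XYZ @ linear)              # X ~ 0.95, Y = 1.0, Z ~ 1.09 (D65)
```

Notice that gamma correction makes the math non-linear: averaging, scaling or matrix-multiplying gamma-corrected values is not the same as doing it on linear light, which is why conversions between colorspaces have to pass through the linear domain.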
Why is this relevant? Imagine you have a video file using the Bt-2020 (“UHDTV”) colorspace. Let’s say I load the decoded YUV pixels into memory on my home computer and decide to display them, but my display only supports Bt-709 (“HDTV”). Will the colors display correctly? (Note that computer screens typically use the sRGB colorspace, which shares its color matrix coefficients with Bt-709.) Let’s look at the color diagram:
Imagine a pixel at the corner of this color spectrum, e.g. a pixel with RGB values of R=1.0, G=0.0 and B=0.0. Will that pixel display identically on HDTV and UHDTV devices? Of course not! Therefore, if the content was intended to be displayed on an HDTV device, but is instead displayed on a UHDTV device without additional conversion, the colors will be off. All of this without even starting to look at the YUV/RGB conversion coefficients, which are also different for each colorspace. In the worst case, you get this:
The left image is the Bt-2020 source image displayed as if it were Bt-709 data (or: “on a Bt-709/HDTV/sRGB device”). The right image is the inverse. The middle image is correct. This shows the importance of signaling the correct colorspace in video files, and of correctly converting colorspaces when the target display device doesn’t support the source data’s colorspace.
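The corner-red example above can be made concrete. Using the published RGB-to-XYZ matrices for the Bt-709 and Bt-2020 primaries (rounded, D65 white point, linear light), we can ask what a pure Bt-2020 red looks like when expressed in Bt-709 primaries; the answer lands outside the representable range:

```python
import numpy as np

# Linear RGB -> XYZ matrices built from each standard's primaries (D65).
RGB709_TO_XYZ = np.array([
    [0.4124, 0.3576, 0.1805],
    [0.2126, 0.7152, 0.0722],
    [0.0193, 0.1192, 0.9505],
])
RGB2020_TO_XYZ = np.array([
    [0.6370, 0.1446, 0.1689],
    [0.2627, 0.6780, 0.0593],
    [0.0000, 0.0281, 1.0610],
])

# Re-express a pure Bt-2020 red in Bt-709 primaries via XYZ.
red_2020 = np.array([1.0, 0.0, 0.0])
red_in_709 = np.linalg.inv(RGB709_TO_XYZ) @ RGB2020_TO_XYZ @ red_2020
print(red_in_709)  # R > 1 with negative G and B: outside the Bt-709 gamut
```

No matrix multiplication can fix that: the Bt-709 device physically cannot produce the Bt-2020 corner red, so a converter has to clip or otherwise map such colors into the smaller gamut.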
Your UHDTV at home may actually do the right thing, but that’s largely because the ecosystem is almost entirely closed. This is completely different from … the web! And video colorspace support on the web is, unfortunately, a mess, if not outright ignored. Not to mention that lots of video files don’t identify the colorspace of the YUV data inside them. But if they did…
It’s perhaps not realistic to expect all browsers to support all colorspaces. It’s easier to just stream them the data they support, converting it while you’re processing the file anyway, e.g. while encoding it for a streaming service. For this purpose, we wrote the colorspace filter in FFmpeg. The idea is simple: it converts YUV data in any colorspace to YUV data in any other colorspace. The simplest way to use this filter is to convert from whatever the input is to Bt-709 before encoding to the streamable format eventually sent to browsers (or mobile devices), since Bt-709 appears to be the only format universally (and correctly) supported by mainstream browsers. But it could also be used for other purposes. For example, Clément Bœsch suggested using the colorspace filter as a generator for the lut3d filter, which would greatly improve the performance of the conversion. I’m hoping he’ll write a tutorial on how to do that!
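As a sketch of what the simplest case of such a conversion reduces to: when only the matrix coefficients differ (the real filter also handles differing primaries and transfer characteristics, which requires a detour through linear RGB), re-encoding e.g. Bt-601 YUV as Bt-709 YUV is a single matrix built from the two standards’ luma coefficients. A full-range sketch:

```python
import numpy as np

def yuv_to_rgb(kr, kb):
    """Full-range YUV-to-RGB matrix from a standard's luma coefficients."""
    kg = 1.0 - kr - kb
    return np.array([
        [1.0, 0.0,                         2.0 * (1.0 - kr)],
        [1.0, -2.0 * kb * (1.0 - kb) / kg, -2.0 * kr * (1.0 - kr) / kg],
        [1.0, 2.0 * (1.0 - kb),            0.0],
    ])

# One matrix that re-encodes Bt-601 YUV directly as Bt-709 YUV:
# decode with the 601 coefficients, re-encode with the 709 ones.
m601_to_709 = np.linalg.inv(yuv_to_rgb(0.2126, 0.0722)) @ yuv_to_rgb(0.299, 0.114)

# Pure luma passes through unchanged -- gray stays gray.
print(m601_to_709 @ np.array([1.0, 0.0, 0.0]))
```

When primaries or transfer characteristics differ too, no single YUV-to-YUV matrix exists, which is why collapsing the whole chain into a precomputed lut3d lookup is such an attractive optimization.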
You may remember 20 years ago, we’d have to download Quicktime for one website or RealMedia player for another website, to be served small stamp-sized videos in players that displayed ads bigger than the video itself. We’ve come a long way, overcoming Flash, with VP9 or H.264 as universal web formats integrated in our browsers and mobile devices. Now that the problem of video playback on the web is being solved, let’s step up from displaying anything at all to displaying colors as they were intended to be.