Sometimes the image captured from a canvas is not equal to the displayed video frame
Categories: Core :: Graphics: CanvasWebGL, defect, P3
People: Reporter: alwu, Unassigned
Attachments: 1 file (video/mp4, deleted)
Reporter
Comment 1 • 3 years ago
There are a couple of things worth mentioning. My testing was done on the latest Nightly (610170:3762abb6ee06) on Ubuntu 20.04. Also worth noting: I'm using software WebRender. Here is my about:support.
(1) The reference PNG file is not a good reference
If you take a closer look at those videos, you can see that the black lines between the color regions are actually thicker than the lines in the PNG file. That is why we always need to fuzz those results: there is already a certain amount of difference between the test files and the reference PNG. In addition, with the current method (comparing two different videos), since each video codec uses a different compression algorithm, we also can't expect all of their first frames to be exactly equal.
Encoding introduces some loss, so that's expected. A better approach would be to create a separate image for each video, rather than using the original PNG file for all videos.
(2) Color difference on canvas
Because of (1), I was trying to create a PNG file for each video. I used this file to capture the first frame of the video, which should ensure the captured image is exactly equal to the video frame.
However, I found that if I capture the image via a canvas using the following code (I use this snippet in place of printing the base64 string), the image captured from the canvas differs from the video itself. The captured image doesn't show the full range of color.
// Assumes the current video frame has already been drawn into `canvas`.
var img = canvas.toDataURL("image/png");
document.write('<img src="' + img + '"/>');
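For reference, the full capture path looks roughly like the sketch below (the video element id "v" and the canvas setup are assumptions for illustration, not the exact test code):

// Grab a playing <video> element and snapshot its current frame.
const video = document.getElementById("v");
const canvas = document.createElement("canvas");
canvas.width = video.videoWidth;
canvas.height = video.videoHeight;
// drawImage converts the decoded frame into the canvas's color space;
// this is where any range/color-space mismatch would be introduced.
canvas.getContext("2d").drawImage(video, 0, 0);
document.write('<img src="' + canvas.toDataURL("image/png") + '"/>');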
Does the canvas not support the full range of color? I also suspect this is the same issue we saw in this, where the reference file wasn't showing the full range of color. I don't know how reftest captures the image from a video, but if it uses the same mechanism as canvas, then it's possible that the captured image is actually different from the displayed video frame.
Markus, do you have any idea about (2)? Thank you.
Reporter
Comment 2 • 3 years ago
Hmm, I'm not sure why we can't play that video, so I uploaded it again.
Reporter
Comment 3 • 3 years ago
In addition, when I ran these tests on my local build based on the latest mozilla-central, some of those tests already failed.
REFTEST INFO | Result summary:
REFTEST INFO | Successful: 23 (23 pass, 0 load only)
REFTEST INFO | Unexpected: 5 (5 unexpected fail, 0 unexpected pass, 0 unexpected asserts, 0 failed load, 0 exception)
REFTEST INFO | Known problems: 1 (0 known fail, 0 known asserts, 0 random, 1 skipped, 0 slow)
REFTEST SUITE-END | Shutdown
Comment 4 • 3 years ago
(In reply to Alastor Wu [:alwu] from comment #1)
> However, I found that if I capture the image via a canvas using the following code (I use this snippet in place of printing the base64 string), the image captured from the canvas differs from the video itself. The captured image doesn't show the full range of color.
Hmm, I'm not completely sure how full range videos are handled. Canvas is definitely constrained to the sRGB color space for now. I think on macOS, canvas and video currently have the same color range available, because we present everything as sRGB. But on other platforms, color management is handled differently - we transform everything to the device color space, and as a result, videos can address a larger range of colors.
I'm not sure how to export PNG reference files from videos in such a way that the PNG file has the same color range as the video.
And even if you could, the reftest harness still compares everything by drawing it to a canvas, so the comparison is tied to the canvas color space.
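Roughly (this is just a sketch of the idea, not the actual harness code): both sides of a comparison get rasterized through a 2D canvas before the pixels are compared, so the canvas color space bounds both snapshots. The countDifferingPixels helper below is hypothetical; elementA/elementB stand in for any canvas image source (video, img, canvas).

function countDifferingPixels(elementA, elementB, width, height) {
  // Rasterize an element into a canvas and read its pixels back.
  const snapshot = (el) => {
    const c = document.createElement("canvas");
    c.width = width;
    c.height = height;
    const ctx = c.getContext("2d");
    ctx.drawImage(el, 0, 0, width, height);
    return ctx.getImageData(0, 0, width, height).data;
  };
  const a = snapshot(elementA);
  const b = snapshot(elementB);
  let differing = 0;
  // RGBA data: compare the color channels of each pixel.
  for (let i = 0; i < a.length; i += 4) {
    if (a[i] !== b[i] || a[i + 1] !== b[i + 1] || a[i + 2] !== b[i + 2]) {
      differing++;
    }
  }
  return differing;
}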
Jeff, have you thought about this before?
Comment 5 • 3 years ago
From what I recall, all of our test infrastructure runs in sRGB. I think for now we can continue to just assume everything is sRGB. Does that solve your concern?
Reporter
Comment 6 • 3 years ago
Does sRGB also support showing full-range color? My question is that the image shown on the canvas is sometimes different from what the video frame actually looks like: the video shows the full range of color, but the image captured from the canvas only shows a limited range of color.
If this is how the current reftest framework works, does that mean we can't use reftest to check the full range of color? In some cases it wouldn't reflect the actual video frame, if the video is shown in the device color space, which is not sRGB.
If that is something we can't change, how should we compare the full range of color in those cases? I'm currently hitting a case on the try server (only on Windows R-sw) which I think shouldn't be a blocker for landing my ffmpeg change, if this problem (which I suspect is the same issue) existed prior to my patches.
Thank you.
Comment 7 • 3 years ago
From what I understand, full range just refers to whether the video uses 16-235 vs. 0-255 for pixel values; the actual color space of the video is a different thing. We're pretty sloppy in how we decide what color space the video is in, but for the most part it should be contained in sRGB.
Reporter
Comment 8 • 3 years ago
In this case, search for 720p.png.bt709.bt709.pc.yuv420p.vp9.webm; you can see that the vp9 video shows a difference between 235 & 255 and between 0 & 16, but 720p.png.bt709.bt709.pc.yuv420p.av1.webm doesn't. I know these two are both videos, but if the way a reftest compares two videos is by capturing them into a canvas, is it possible that the result shown for av1 is not actually what the AV1 video looks like?
I don't know why the image for vp9 can show the correct colors while AV1 can't. I suspect there might be some bug in the reftest? The reftest results above were from testing my ffmpeg patches, which don't affect AV1 (the dav1d decoder), so I don't think my patch would affect AV1's results.
> for the most part it should be contained in sRGB.

So you mean sRGB should be able to display (235,235,235) and (255,255,255) as different colors? If so, then back to my initial question: why couldn't the image I captured from the canvas show that difference? It can't display the difference between 235 and 255; those pixels all look the same.
Thank you!
Comment 9 • 3 years ago
(In reply to Alastor Wu [:alwu] from comment #0)
> Per this, currently we compare AV1 with the PNG first, then compare the other codecs with AV1.
> It would be more stable if we compared AV1 with the PNG once, and then compared the other codecs to the PNG file as well, rather than comparing two different videos directly. That would ensure the reference file is always correct and isn't affected by any unexpected decoding error.
This is what I did originally, but it's way, way more work to mark all the tests that way, rather than marking fuzziness for one video vs. the reference and then comparing that video (with its decoder-specific artifacts) with the other videos from the same decoder.
Since each (decoder, webrender-backend) tuple has its own artifacts, it's not really viable to make a PNG gold standard for what the frames should look like, or at least doing that work doesn't scale well to the number of videos we need.
Because each tuple tends to have its own artifacts, it's not surprising to me that you ran into failures when running the tests locally!
We've expected to handle full-range video correctly ever since I fixed/added support for it in bug 1459526, though bug 1716093 tracks a remaining narrow case where (IIRC) swgl+dcomp is not handled correctly.
As others mentioned, narrow-range just means that y'=0.0 at y=16, but y' still goes from 0.0 to 1.0, same as full-range. (Also worth noting that most content is encoded as narrow-range, since that's what media/broadcast uses.)
What that means is that narrow-vs-full is sort of orthogonal to colorspace colors, and is just an encoding detail. Colorspace standards like bt709 (the normal one for video, with the same gamut as sRGB) and bt2020/bt2100 (a much, much wider gamut than sRGB) both have narrow- and full-range encodings.
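A minimal sketch of that encoding detail (hypothetical helper names; 8-bit luma only): both ranges map onto the same normalized 0.0-1.0 axis.

const narrowToNormalized = y => (y - 16) / (235 - 16); // y'=0.0 at y=16, 1.0 at y=235
const fullToNormalized = y => y / 255;                 // y'=0.0 at y=0, 1.0 at y=255

console.log(narrowToNormalized(16), narrowToNormalized(235)); // 0 1
console.log(fullToNormalized(0), fullToNormalized(255));      // 0 1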
For your question about canvas, it's very possible that canvas doesn't handle both narrow and full range video inputs properly, and that that would be another bug for us to fix. (and add more reftests for)
Comment 10 • 3 years ago
Put another way, bt709_yuv_narrow(235,240,240) is exactly the same color as bt709_yuv_narrow(255,255,255).
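To see why, here's a rough sketch (hypothetical helper; assumes out-of-range code values are clamped to the nominal narrow range, which is the usual behavior in YUV-to-RGB conversion). Both triples clamp to the same normalized values, so they decode to exactly the same color:

function normalizeNarrowYuv(y, cb, cr) {
  const clamp = (v, lo, hi) => Math.min(Math.max(v, lo), hi);
  return [
    clamp((y - 16) / (235 - 16), 0, 1),        // luma: 16 -> 0.0, 235 -> 1.0
    clamp((cb - 128) / (240 - 16), -0.5, 0.5), // chroma: 16 -> -0.5, 240 -> +0.5
    clamp((cr - 128) / (240 - 16), -0.5, 0.5),
  ];
}

console.log(normalizeNarrowYuv(235, 240, 240)); // [1, 0.5, 0.5]
console.log(normalizeNarrowYuv(255, 255, 255)); // [1, 0.5, 0.5] (same color)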
Reporter
Comment 11 • 3 years ago
Thank you for your detailed explanation!
I think the failure I encountered on the try server was the same as the narrow case you mentioned: it was using software WebRender (swgl?) on Windows (D3D11). It might be related to canvas failing to capture the correct image to compare, so the captured av1 image couldn't display the detail below 16 and above 235.
> Since each (decoder, webrender-backend) tuple has its own artifacts, it's not really viable to make a PNG gold standard for what the frames should look like, or at least doing that work doesn't scale well to the number of videos we need. Because each tuple tends to have its own artifacts, it's not surprising to me that you ran into failures when running the tests locally!
If the canvas issue gets fixed, then we could use approach (2) from comment 1 to generate a different PNG file for each video, instead of using one PNG for all videos. That should reduce the artifacts between different decoders and make the tests more robust, right?
Also, could you help review this patch, or provide any suggestions on it? From the discussion so far, it seems to me that the incorrect AV1 reference image was caused by canvas. My patches seem to unexpectedly fix the results for h264 and vp9, which causes the difference between their results and AV1's.
Thank you so much.
Comment 12 • 3 years ago
It definitely sounds like canvas didn't receive the right colors for the video, if you saw color clipping at all. We should file a bug about this, but it sounds mostly unrelated to video reftest playback (except in relation to your proposal to make per-video reference images).
I chose av1 as our reference (in part) because it's software-decoded, and thus most reliable (or at least that's what I found in practice).
I'll help review the reftest marking changes you're looking to make.
I sort of don't want to have to track PNGs for all combinations of (decoder, wr-backend, codec, params). That's why I deliberately chose to use a single PNG source and then, per configuration/backend/params, first check one codec and compare the other decoding paths on that configuration to that example decoded video.
The point of failure here would be if AV1 didn't work right, but it's probably our most reliable one.
In a case where AV1 didn't work, we would probably compare e.g. VP9 to the gold-standard PNG instead of to AV1.
What I want to avoid is having to regenerate reference images. Ideally, we would have reference images only for our success result, plus any error results that are different enough that they're hard to match to the success result and also too difficult to fix immediately.
Part of why I want to avoid multiple reference images is that I don't think we can expect to create a cross-platform, cross-configuration reference image. There are too many platform-/config-specific differences in artifacts, which you can sort of see from the wild variety of fuzzy parameters on those reftests.
My guiding goal here is that eventually all our test results should be within a few thousand pixels of the reference image. At the same time, I'm expecting to roughly quadruple the number of tests we have as we add bt2020/bt2100 narrow/wide/hlg (and ideally bt601 tests too, since we shouldn't backslide on that either). We're going to need to do something more complicated for (HDR) PQ playback checks, too.
With a single reference, this is just a matter of either marking an upper bound for how fuzzy a (correct) result can be, or alternatively marking a lower bound for known-incorrect results (e.g. "we expect at least this many pixels to fail"), so that we get UNEXPECTED-PASS when those change and we can ratchet our test expectations towards correctness.
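Put as a sketch (hypothetical, not the real harness logic), the ratcheting idea looks like this:

function evaluate(diffPixels, expectation) {
  if (expectation.kind === "correct") {
    // Correct results must stay within their fuzz budget.
    return diffPixels <= expectation.maxFuzzyPixels ? "PASS" : "FAIL";
  }
  // Known-incorrect results are expected to differ by at least this much;
  // if one improves past its floor, surface it as UNEXPECTED-PASS so the
  // expectation can be tightened towards correctness.
  return diffPixels >= expectation.minFailingPixels
    ? "KNOWN-FAIL"
    : "UNEXPECTED-PASS";
}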
One thing I want to understand better is what issue you are running into with av1-decoded video. That's usually our most reliable one, which is why I chose it. Can you link me to those Try results?
Updated • 3 years ago
Reporter
Comment 13 • 3 years ago
> I chose av1 as our reference (in part) because it's software-decoded, and thus most reliable (or at least that's what I found in practice).
FYI, we're working on using ffvpx to decode AV1 (bug 1745285), and ffvpx can support hardware decoding (bug 1652958) on some Linux machines (currently only on Wayland). That means that in the future, if the try-server machine supports VAAPI, AV1 decoding could be performed by a hardware decoder.
> Part of why I want to avoid multiple reference images is that I don't think we can expect to create a cross-platform, cross-configuration reference image.
That makes sense.
> The point of failure here would be if AV1 didn't work right, but it's probably our most reliable one.
> In a case where AV1 didn't work, we would probably compare e.g. VP9 to the gold-standard PNG instead of to AV1.
I am not sure whether the failure I saw is "the AV1 decoder didn't output a full-range video frame" or "the AV1 decoder outputs a correct full-range video frame, but during the comparison, when reftest draws the images to a canvas, the captured image is not the same as the decoded video frame". The second one is what I experienced on my Linux machine; see (2) in comment 1 and the video I recorded in comment 2.
> One thing I want to understand better is what issue you are running into with av1-decoded video. That's usually our most reliable one, which is why I chose it. Can you link me to those Try results?
Sure. In this try run, you can see the failures on Windows R-swr. Click any of them and search for 720p.png.bt709.bt709.pc.yuv420p.vp9.webm; you'll see that vp9 shows the full range of color, but its reference, which is AV1, only shows a limited range of color. The same can also be seen for 720p.png.bt709.bt709.pc.yuv420p.av1.webm and 720p.png.bt709.bt709.pc.yuv420p10.av1.webm.
My analysis is here. In the beginning, I suspected for a while that the result for the AV1 video was wrong. But then I filed this bug and discovered this canvas issue, which makes me think that the AV1 decoder actually outputs the correct video frame, and the error happens during the comparison phase, when reftest draws the video into a canvas.
Reporter
Comment 14 • 3 years ago
Per the discussion above, we will keep the current way of comparing AV1 with the other codecs. I am going to rename this bug so that it only discusses why capturing video from a canvas can fail (considering we have some valuable comments here).
Reporter
Updated • 3 years ago
Comment 15 • 3 years ago
The severity field is not set for this bug.
:jgilbert, could you have a look please?
For more information, please visit auto_nag documentation.
Reporter
Comment 16 • 3 years ago
Worth noting: in this comment I assume that the comparison between AV1/WebM/MP4 is already incorrect, which can be seen in this try run where I disable shmem-decoding and compare WebM and MP4 with the PNG directly. So it seems to me that capturing all of those 10-bit videos fails on software WebRender.
Updated • 3 years ago