Open Bug 1725070 Opened 3 years ago Updated 3 years ago

Assertion failure: result, at src/layout/printing/PrintTranslator.h:61

Categories

(Core :: Printing: Output, defect)

defect

Tracking

()

Tracking Status
firefox-esr91 --- wontfix
firefox92 --- wontfix
firefox93 --- wontfix
firefox95 --- wontfix
firefox96 --- wontfix
firefox97 --- wontfix

People

(Reporter: tsmith, Unassigned)

References

(Blocks 1 open bug, Regression)

Details

(4 keywords, Whiteboard: [bugmon:bisected,confirmed])

Attachments

(1 file, 1 obsolete file)

Attached file testcase.html (deleted) —

Found while fuzzing m-c 20210617-1cf323597558 (--enable-debug --enable-fuzzing)

Assertion failure: result, at src/layout/printing/PrintTranslator.h:61

#0 0x7fc87e4778d6 in mozilla::layout::PrintTranslator::LookupSourceSurface(mozilla::gfx::ReferencePtr) src/layout/printing/PrintTranslator.h:61:5
#1 0x7fc87a9a6aac in mozilla::gfx::RecordedFilterNodeSetInput::PlayEvent(mozilla::gfx::Translator*) const src/gfx/2d/RecordedEventImpl.h:3855:41
#2 0x7fc87aa2d4e4 in bool mozilla::gfx::RecordedEvent::DoWithEvent<mozilla::gfx::EventStream>(mozilla::gfx::EventStream&, mozilla::gfx::RecordedEvent::EventType, std::function<bool (mozilla::gfx::RecordedEvent*)> const&) src/gfx/2d/RecordedEventImpl.h
#3 0x7fc87e467281 in mozilla::layout::PrintTranslator::TranslateRecording(mozilla::layout::PRFileDescStream&) src/layout/printing/PrintTranslator.cpp:50:20
#4 0x7fc87e46972b in mozilla::layout::RemotePrintJobParent::PrintPage(mozilla::layout::PRFileDescStream&, nsRefCountedHashtable<nsUint64HashKey, RefPtr<mozilla::gfx::RecordedDependentSurface> >*) src/layout/printing/ipc/RemotePrintJobParent.cpp:167:26
#5 0x7fc87e4695f5 in mozilla::layout::RemotePrintJobParent::FinishProcessingPage(nsRefCountedHashtable<nsUint64HashKey, RefPtr<mozilla::gfx::RecordedDependentSurface> >*) src/layout/printing/ipc/RemotePrintJobParent.cpp:146:17
#6 0x7fc87e46942f in mozilla::layout::RemotePrintJobParent::RecvProcessPage(nsTArray<unsigned long>&&) src/layout/printing/ipc/RemotePrintJobParent.cpp:121:5
#7 0x7fc87a3eabe7 in mozilla::layout::PRemotePrintJobParent::OnMessageReceived(IPC::Message const&) /builds/worker/workspace/obj-build/ipc/ipdl/PRemotePrintJobParent.cpp:301:28
#8 0x7fc87a10b4bb in mozilla::dom::PContentParent::OnMessageReceived(IPC::Message const&) /builds/worker/workspace/obj-build/ipc/ipdl/PContentParent.cpp:6678:32
#9 0x7fc879f3ce81 in mozilla::ipc::MessageChannel::DispatchAsyncMessage(mozilla::ipc::ActorLifecycleProxy*, IPC::Message const&) src/ipc/glue/MessageChannel.cpp:2148:25
#10 0x7fc879f39231 in mozilla::ipc::MessageChannel::DispatchMessage(IPC::Message&&) src/ipc/glue/MessageChannel.cpp:2075:9
#11 0x7fc879f3a76d in mozilla::ipc::MessageChannel::RunMessage(mozilla::ipc::MessageChannel::MessageTask&) src/ipc/glue/MessageChannel.cpp:1923:3
#12 0x7fc879f3b4eb in mozilla::ipc::MessageChannel::MessageTask::Run() src/ipc/glue/MessageChannel.cpp:1954:13
#13 0x7fc87964363e in mozilla::RunnableTask::Run() src/xpcom/threads/TaskController.cpp:502:16
#14 0x7fc879621359 in mozilla::TaskController::DoExecuteNextTaskOnlyMainThreadInternal(mozilla::detail::BaseAutoLock<mozilla::Mutex&> const&) src/xpcom/threads/TaskController.cpp:805:26
#15 0x7fc8796201c8 in mozilla::TaskController::ExecuteNextTaskOnlyMainThreadInternal(mozilla::detail::BaseAutoLock<mozilla::Mutex&> const&) src/xpcom/threads/TaskController.cpp:641:15
#16 0x7fc879620443 in mozilla::TaskController::ProcessPendingMTTask(bool) src/xpcom/threads/TaskController.cpp:425:36
#17 0x7fc879646e36 in operator() src/xpcom/threads/TaskController.cpp:135:37
#18 0x7fc879646e36 in mozilla::detail::RunnableFunction<mozilla::TaskController::InitializeInternal()::$_0>::Run() /builds/worker/workspace/obj-build/dist/include/nsThreadUtils.h:534:5
#19 0x7fc879632daf in nsThread::ProcessNextEvent(bool, bool*) src/xpcom/threads/nsThread.cpp:1152:16
#20 0x7fc8796399ea in NS_ProcessNextEvent(nsIThread*, bool) src/xpcom/threads/nsThreadUtils.cpp:548:10
#21 0x7fc879f42796 in mozilla::ipc::MessagePump::Run(base::MessagePump::Delegate*) src/ipc/glue/MessagePump.cpp:85:21
#22 0x7fc879eaa4e7 in MessageLoop::RunInternal() src/ipc/chromium/src/base/message_loop.cc:331:10
#23 0x7fc879eaa402 in RunHandler src/ipc/chromium/src/base/message_loop.cc:324:3
#24 0x7fc879eaa402 in MessageLoop::Run() src/ipc/chromium/src/base/message_loop.cc:306:3
#25 0x7fc87dcf1c48 in nsBaseAppShell::Run() src/widget/nsBaseAppShell.cpp:137:27
#26 0x7fc87f582b86 in nsAppStartup::Run() src/toolkit/components/startup/nsAppStartup.cpp:273:30
#27 0x7fc87f690ddb in XREMain::XRE_mainRun() src/toolkit/xre/nsAppRunner.cpp:5239:22
#28 0x7fc87f692565 in XREMain::XRE_main(int, char**, mozilla::BootstrapConfig const&) src/toolkit/xre/nsAppRunner.cpp:5437:8
#29 0x7fc87f692df9 in XRE_main(int, char**, mozilla::BootstrapConfig const&) src/toolkit/xre/nsAppRunner.cpp:5496:21
#30 0x55ee04fa1b80 in do_main src/browser/app/nsBrowserApp.cpp:225:22
#31 0x55ee04fa1b80 in main src/browser/app/nsBrowserApp.cpp:384:16
#32 0x7fc88e61a0b2 in __libc_start_main /build/glibc-eX1tMB/glibc-2.31/csu/../csu/libc-start.c:308:16
#33 0x55ee04f7ea5c in _start (/home/worker/builds/m-c-20210618092056-fuzzing-debug/firefox-bin+0x15a5c)
Flags: in-testsuite?

A Pernosco session is available here: https://pernos.co/debug/KJqVnSZYmJBZpO5fjYVAaQ/index.html

Bugmon Analysis
Verified bug as reproducible on mozilla-central 20210811213739-df9503f5956b.
The bug appears to have been introduced in the following build range:

Start: 42a33fb8ccdb2ea3f84a585c98220c080a4025b4 (20201208144115)
End: 659b97e784bf71f7bbaab8cb262663f0f6c07552 (20201208180244)
Pushlog: https://hg.mozilla.org/integration/autoland/pushloghtml?fromchange=42a33fb8ccdb2ea3f84a585c98220c080a4025b4&tochange=659b97e784bf71f7bbaab8cb262663f0f6c07552

Whiteboard: [bugmon:bisected,confirmed]

Bob, maybe you could take a look when you've got time? It looks like you've got hg-blame on the assertion in question, and on various code related to the mSourceSurfaces structure where the lookup is failing here.

Flags: needinfo?(bobowencode)

In the regression range, bug 1681052 would be my prime suspect (it just enabled layout.display-list.improve-fragmentation by default). But I can't test that theory because I can't reproduce the assertion-failure locally.

Tyson, two requests:
(1) could you test whether this goes away when you set about:config pref layout.display-list.improve-fragmentation to false?
(2) assuming it does: can you (or bugmon) narrow down a further-back regression range, using a Firefox profile/configuration with layout.display-list.improve-fragmentation manually set to true?

Flags: needinfo?(twsmith)

Even though I don't see the fatal assertion: when I scroll the testcase's print-preview rendering, I do see the following:

###!!! ASSERTION: Too many pages to handle OOFs: 'pageNum <= 255', file layout/generic/nsPageContentFrame.cpp:327

(which makes sense, since it's a 300+ page document; so yes, the page numbers do get higher than 255).

(In reply to Daniel Holbert [:dholbert] from comment #4)

(1) could you test whether this goes away when you set about:config pref layout.display-list.improve-fragmentation to false?

I was still able to reproduce the issue with layout.display-list.improve-fragmentation=false

Any other ideas?

Flags: needinfo?(twsmith)

(In reply to Tyson Smith [:tsmith] from comment #6)

I was still able to reproduce the issue with layout.display-list.improve-fragmentation=false

Any other ideas?

That surprises me... That pref-flip is really the only item in comment 2's regression range that would make sense as having introduced this, unless I'm overlooking something. (The only other print-related thing is https://hg.mozilla.org/integration/autoland/rev/f947a5c5794d5939a8b0503c18a795b3874ca777 which reduced a timeout if Cu.isInAutomation==true, but I would imagine that isInAutomation is false for your fuzzers so that wouldn't have changed behavior for you.)

Moreover: on my end, layout.display-list.improve-fragmentation=false makes this testcase render as a 1-page document in print preview, rather than a 300+ page document (because we don't bother trying to print out all of the paint-overflow). And I get no assertion-failures at all (fatal or otherwise). I suspect that this issue would depend on the huge number of pages, to some extent...

Would you mind double-checking that you really can reproduce with that pref turned off? And maybe for good measure, see if bugmon still finds the same range with the pref turned on (I would expect it to find a further-back range, if the pref is indeed involved).

Flags: needinfo?(twsmith)

Aha, actually, I answered my own question -- turns out I can indeed reproduce! (And yeah, I do get an assertion-failure even with the pref turned off). I just needed to proceed past the print-preview dialog and actually attempt to print. :) (using Save to PDF)

Here's the debug output that leads up to the assertion failure, FWIW:

[GFX3-]: Surface size too large (exceeds extent limit)!
[GFX2-]: Allowing surface with invalid size (Cairo) Size(48960,63360)
[GFX3-]: Surface size too large (exceeds extent limit)!
[GFX2-]: Allowing surface with invalid size (Cairo) Size(48960,63360)
[GFX3-]: Ending FillGlyphs with a bad surface 1
[GFX2-]: DrawTargetCairo context in error state: error occurred in libfreetype(40)
Crash Annotation GraphicsCriticalError: |[0][GFX1-]: DrawTargetCairo::Snapshot with bad surface 0x7f965cf1bb60, context 0x7f9656eaf000, status 0 (t=20.8718) [GFX1-]: DrawTargetCairo::Snapshot with bad surface 0x7f965cf1bb60, context 0x7f9656eaf000, status 0
Assertion failure: result, at layout/printing/PrintTranslator.h:61
Flags: needinfo?(twsmith)

I also get a crash when doing Save-to-PDF in our official Nightly builds as well: bp-d74dd03c-4569-4df0-bc61-e98360210813
This Nighlty crash happens regardless of the layout.display-list.improve-fragmentation value, and it has a more-recent regression range -- it regressed on April 29th 2021:
https://hg.mozilla.org/integration/autoland/pushloghtml?fromchange=d735e197d708d85d9b7e7cb1ea3609bdc4f27bd6&tochange=7387be4b195f3f266410d8faafbaf80f8176e293

I actually think there might actually be two independent assertion-failures here! Tyson, would you mind testing this theory on your end and seeing if it matches what you got from Bugmon/etc?

(1) The proximal issue is the PrintTranslator.h fatal assertion failure (the assertion-failure tracked in this bug), which is approximately the same as the above-linked crash report. It's independent of the layout.display-list.improve-fragmentation setting, and I think this issue only goes back to April 29th, and it's a regression from the Cairo upgrade (shown in my regression pushlog there).

(2) Before that regression range, there's a second unrelated fatal assertion that we hit if and only if you have layout.display-list.improve-fragmentation turned on, and this one only crashes the content process (so it's a little less bad):

Assertion failure: nsLayoutUtils::IsAncestorFrameCrossDoc(mAdditionalOffsetFrame, aFrame), at layout/painting/nsDisplayList.cpp:1516

At least, that's the assertion that I get if I run mozregression --launch 2021-04-01 --build-type debug -a https://bugzilla.mozilla.org/attachment.cgi?id=9235720 -- and if I add --pref "layout.display-list.improve-fragmentation:false" to that command, then I don't get any issues at all.

Flags: needinfo?(twsmith)

So based on my analysis, I think the assertion described here in comment 0 is a regression from bug 739096, the recent cairo upgrade.

The PrintTranslator assertion-failure is a downstream effect of some cairo failure. Leaving ni=bobowen open to decide if that's an issue in that code or not (e.g. maybe the new cairo is just being a bit stricter and refusing to create certain bogus/huge surfaces, so we end up with a nullprt or a surface creation failure of some sort, and we need to handle that case?)

Regressions: 739096
Component: Layout → Printing: Output
Keywords: crash

Checking for calls to _cairo_error in the pernosco trace, it appears the failure arises from a FreeType error (it returns an Invalid_PPem status), which results from an extremely stretched matrix in the gstate when _cairo_gstate_ensure_scaled_font is called; checking the CTM, I see

p gstate->ctm 
$8 = {xx = 0.5, yx = 0, xy = 0, yy = 16384, x0 = 0, y0 = -237518}

This arises from the three nested transform: scaley(32) properties that are in effect (on the <html>, <body>, and <dt> elements), resulting in an attempt to use a font with a y-scale of 32768 times (the ratio between xx and yy in the matrix).

FreeType balks at this, and the resulting error code bubbles up to cairo surface creation, leaving it in an error state.

Maybe we should try to catch and handle the failure somehow, but it's a pretty extreme edge-case that seems unlikely to affect many real users.

(In reply to Daniel Holbert [:dholbert] from comment #8)

FWIW, these warnings:

[GFX3-]: Surface size too large (exceeds extent limit)!
[GFX2-]: Allowing surface with invalid size (Cairo) Size(48960,63360)

seem to be quite common when printing, and things continue to work OK.

However, these:

[GFX3-]: Ending FillGlyphs with a bad surface 1
[GFX2-]: DrawTargetCairo context in error state: error occurred in libfreetype(40)

come from the point where things go wrong as described above.

(In reply to Daniel Holbert [:dholbert] from comment #9)

I actually think there might actually be two independent assertion-failures here! Tyson, would you mind testing this theory on your end and seeing if it matches what you got from Bugmon/etc?

Yes I see the same thing. This is a known issue with bugmon.

Flags: needinfo?(twsmith)

(In reply to Daniel Holbert [:dholbert] from comment #3)

Bob, maybe you could take a look when you've got time? It looks like you've got hg-blame on the assertion in question, and on various code related to the mSourceSurfaces structure where the lookup is failing here.

We have found cases where these lookups can fail and we've had to remove the check, for example GradientStops.

I think in general the original playback code assumed that the lookups never failed, but that has been changed to add null checks to make it more robust.
Looks like the investigation of the root cause of the surface being missing has moved on, if that's something that can "genuinely" happen and is not a bug elsewhere then we should probably remove this check.

Flags: needinfo?(bobowencode)

(In reply to Jonathan Kew (:jfkthame) from comment #11)

Maybe we should try to catch and handle the failure somehow, but it's a pretty extreme edge-case that seems unlikely to affect many real users.

Calling this S4 for the time being then.

(In reply to Bob Owen (:bobowen) from comment #14)

(In reply to Daniel Holbert [:dholbert] from comment #3)

Bob, maybe you could take a look when you've got time? It looks like you've got hg-blame on the assertion in question [...]
[...] if that's something that can "genuinely" happen and is not a bug elsewhere then we should probably remove this check.

From jfkthame's investigation, it sounds like it is technically a bug elsewhere, but not a high-priority one. So it might be worth softening the assertion to be non-fatal at least, to avoid tripping up fuzzers and other debug-build investigations that happen to run afoul of this check.

I'll add a patch to do that.

Severity: -- → S4
Assignee: nobody → dholbert
Status: NEW → ASSIGNED
Attachment #9239956 - Attachment description: Bug 1725070: Soften a fatal assertion in PrintTranslator::LookupFilterNode since we can make it fail and we don't actually rely on it. r?bobowen → Bug 1725070: Soften a fatal assertion in PrintTranslator::LookupFilterNode, since fuzzer testcases can make it fail, and we don't rely on its condition holding. r?bobowen
Attachment #9239956 - Attachment is obsolete: true

(In reply to Daniel Holbert [:dholbert] from comment #15)

From jfkthame's investigation, it sounds like it is technically a bug elsewhere, but not a high-priority one. So it might be worth softening the assertion to be non-fatal at least, to avoid tripping up fuzzers and other debug-build investigations that happen to run afoul of this check.

I'll add a patch to do that.

Er, scratch that; turns out the attached testcase crashes opt builds as well, not just debug builds. So we do indeed have some downstream code that relies on the assertion holding. In particular, if I soften/remove the mSourceSurfaces assertion, then I instead abort here with:

Assertion failure: filter (missing input), at gfx/2d/FilterNodeSoftware.cpp:872

And in an opt build (i.e. disregarding assertions), I crash on the line that immediately follows that^ FilterNodeSoftware assertion (which dereferences the filter pointer that we we're expecting to be non-null).

So: there's no easy soften-the-assertion-to-make-fuzzers-happy fix here, it seems; we need to handle the issue that jfkthame referenced in comment 11 in order to avoid this crash, to the extent that we think this affects users or impedes fuzzers.

Assignee: dholbert → nobody
Status: ASSIGNED → NEW

Bugmon Analysis
Testcase crashes using the initial build (mozilla-central 20210617153900-1cf323597558) but not with tip (mozilla-central 20211203213802-92df9c655be5.)
The bug appears to have been fixed in the following build range:

Start: 75c615b53e7b96334ee7e75f0224be36daf04595 (20211115215316)
End: a6342e8bac712a9923c04e3fd1b34cfcff90fd25 (20211117034108)
Pushlog: https://hg.mozilla.org/mozilla-central/pushloghtml?fromchange=75c615b53e7b96334ee7e75f0224be36daf04595&tochange=a6342e8bac712a9923c04e3fd1b34cfcff90fd25
Removing bugmon keyword as no further action possible. Please review the bug and re-add the keyword for further analysis.

Keywords: bugmon

The link was backwards :)

Regressed by: 739096
No longer regressions: 739096
Has Regression Range: --- → yes
Keywords: regression

Set release status flags based on info from the regressing bug 739096

You need to log in before you can comment on or make changes to this bug.

Attachment

General

Created:
Updated:
Size: