Closed Bug 671064 Opened 13 years ago Closed 13 years ago

[Mac] Update to cairo 1.10 causes Print Preview to always crash on some pages on OS X

Categories

(Core :: Graphics, defect)

x86
macOS
defect
Not set
critical

Tracking

()

VERIFIED FIXED
mozilla8
Tracking Status
firefox7 + fixed

People

(Reporter: marcia, Assigned: smichaud)

References

Details

(4 keywords, Whiteboard: [inbound][qa!])

Crash Data

Attachments

(3 files, 1 obsolete file)

Mac only low volume crash seen in Aurora and trunk but right now ranks as #1 top crash on Aurora. Crashes started showing up using the 2011062100 build. Comments mention Print Preview and Google books. https://crash-stats.mozilla.com/report/list?signature=_cairo_surface_release_source_image https://crash-stats.mozilla.com/report/index/e6b2257b-34ed-4b25-934d-682a02110708 Frame Module Signature [Expand] Source 0 XUL _cairo_surface_release_source_image gfx/cairo/cairo/src/cairo-surface.c:1476 1 XUL DataProviderReleaseCallback gfx/cairo/cairo/src/cairo-quartz-surface.c:1108 2 CoreGraphics CoreGraphics@0x7409a 3 CoreGraphics CoreGraphics@0x74042 4 CoreFoundation _CFRelease 5 CoreGraphics CoreGraphics@0x73f80 6 CoreFoundation _CFRelease 7 CoreGraphics CoreGraphics@0x156476 8 libPDFRIP.A.dylib libPDFRIP.A.dylib@0x1e9e 9 CoreFoundation __CFSetCallback 10 CoreFoundation __CFBasicHashDrain 11 CoreFoundation _CFRelease 12 libPDFRIP.A.dylib libPDFRIP.A.dylib@0x20ed 13 libPDFRIP.A.dylib libPDFRIP.A.dylib@0x5c36 14 CoreGraphics CoreGraphics@0x741ea 15 CoreFoundation _CFRelease 16 CoreGraphics CoreGraphics@0x741d4 17 PrintCore pdfSpoolingReleaseDrawingContext 18 PrintCore PJCReleaseSpoolingSession 19 PrintCore PJCEndDocument 20 PrintCore PMSessionEndDocumentNoDialog 21 XUL nsDeviceContextSpecX::EndDocument widget/src/cocoa/nsDeviceContextSpecX.mm:129 22 XUL nsDeviceContext::EndDocument gfx/src/nsDeviceContext.cpp:609 23 XUL nsPrintData::~nsPrintData layout/printing/nsPrintData.cpp:117 24 XUL nsPrintEngine::Destroy layout/printing/nsPrintEngine.cpp:284 25 XUL DocumentViewerImpl::OnDonePrinting layout/base/nsDocumentViewer.cpp:4281 26 XUL nsPrintCompletionEvent::Run layout/printing/nsPrintEngine.cpp:3377 27 XUL nsThread::ProcessNextEvent xpcom/threads/nsThread.cpp:617 28 XUL NS_ProcessPendingEvents_P obj-firefox/x86_64/xpcom/build/nsThreadUtils.cpp:195 29 XUL nsBaseAppShell::NativeEventCallback widget/src/xpwidgets/nsBaseAppShell.cpp:130 30 XUL nsAppShell::ProcessGeckoEvents widget/src/cocoa/nsAppShell.mm:422 31 CoreFoundation __CFRunLoopDoSources0 32 CoreFoundation __CFRunLoopRun 33 CoreFoundation CFRunLoopRunSpecific 34 HIToolbox HIToolbox@0x2e7ed 35 HIToolbox HIToolbox@0x2e550 36 HIToolbox HIToolbox@0x2e4ab 37 AppKit _DPSNextEvent 38 AppKit -[NSApplication nextEventMatchingMask:untilDate:inMode:dequeue:] 39 AppKit -[NSApplication run] 40 XUL nsAppShell::Run widget/src/cocoa/nsAppShell.mm:769 41 XUL nsAppStartup::Run toolkit/components/startup/nsAppStartup.cpp:222 42 XUL XRE_main toolkit/xre/nsAppRunner.cpp:3570 43 firefox-bin main browser/app/nsBrowserApp.cpp:198 44 firefox-bin firefox-bin@0xb73
Just able to reproduce this on 10.7. Here are my STR using the latest trunk build: 1. Load my main mail page on GMail - https://mail.google.com/mail/. I have the "Tree" theme installed in GMail but I am not sure if that is a factor or not. 2. File -> Print -> Open PDF in Preview 3. Crash 100% https://crash-stats.mozilla.com/report/index/bp-15a25f2f-9b91-4b58-8fb9-677412110712
Keywords: reproducible
100% reproducible for me on 10.6 too; File -> Print -> PDF -> Save as PDF. Doesn't seem to matter what theme on Gmail. Stephen, can you take a look at this?
Assignee: nobody → smichaud
err - Steven. Sorry for the misspelling!
I've just reproduced this on OS X 10.5.8: bp-b30499bd-bdf1-48ee-8cd0-b2c4f2110713 I'll look into it, and see what I can find out.
This is bad. Every attempt to use Print Preview crashes! Here's the regression range: firefox-2011-05-27-03-mozilla-central firefox-2011-05-28-03-mozilla-central I'm pretty sure this incriminates the patch for bug 562746, which updated cairo to version 1.10.
Blocks: 562746
Summary: [Mac] Firefox Crash [@ _cairo_surface_release_source_image ] → [Mac] Update to cairo 1.10 causes Print Preview to always crash on OS X
Breakpad reports several different crash signatures. I'll get some gdb crash stacks.
There appear to be some pages on which Print Preview *doesn't* crash (or at least doesn't always crash). I'll try to find some examples.
Attached file Gdb crash stacks (deleted) —
Here are two gdb crash stacks -- the first taken in 64-bit mode, the other in 32-bit mode. There was one tab (in one window) open to the default about:home page when I chose Preview from the Print dialog ... and crashed. I tested with a build with debug symbols, with optimization turned off (not the same as a debug build). The build was made from current trunk code. I tested on OS X 10.6.8.
After some quick tests with today's mozilla-central nightly on OS X 10.6.8: Pages that (almost) always crash doing Print Preview: about:home http://www.mozilla.org/projects/firefox/prerelease.html http://www.apple.com/ Pages that (usually) don't crash doing Print Preview: https://bugzilla.mozilla.org/show_bug.cgi?id=0 https://bugzilla.mozilla.org/show_bug.cgi?id=671064 http://www.mozilla.org/ http://www.mozilla.com/en-US/firefox/new/
> <missing bug number> This was a link to the bug whose id is 0.
Summary: [Mac] Update to cairo 1.10 causes Print Preview to always crash on OS X → [Mac] Update to cairo 1.10 causes Print Preview to always crash on some pages on OS X
It's definitely the patch for bug 562746 (the update to cairo 1.10) that triggered these crashes, if it didn't actually cause them. A build with http://hg.mozilla.org/mozilla-central/rev/503990b2c5c6 as tip doesn't crash. A build with http://hg.mozilla.org/mozilla-central/rev/04e8d0b481bc as tip does crash.
Attached patch Fix (obsolete) (deleted) — Splinter Review
This bug involves accessing a cairo_surface_t object after it's been deleted. Here's a very simple patch that seems to fix it. I'm doing a tryserver build, which should be available tomorrow morning.
Attachment #546049 - Flags: review?(jmuizelaar)
> This might be related: > http://lists.freedesktop.org/archives/cairo/2009-February/016677.html I don't know the cairo code well enough to say. But my patch does (in effect) make a quartz_source_image_t object hold a reference to the cairo_surface_t object that it points to, to ensure that the lifetime of the cairo_surface_t object is at least as long as the lifetime of the quartz_source_image_t object that points to it.
(Following up comment #16) In debugging this, though, I noticed that quartz_source_image_t.surface and quartz_source_image_t.img_out are always the same ... which puzzles me.
(In reply to comment #17) > (Following up comment #16) > > In debugging this, though, I noticed that > quartz_source_image_t.surface and quartz_source_image_t.img_out are > always the same ... which puzzles me. Any idea what changed to cause this crash in the first place?
> Any idea what changed to cause this crash in the first place? No. Though (of course) it's clear that a cairo_surface_t object is being accessed after deletion -- in _cairo_surface_release_source_image() and (probably) also elsewhere. Possibly it was just a timing change.
I can confirm that I do not crash on 10.7 using the tryserver build and trying to print using the same STR in Comment 1. Trying 10.6 next. (In reply to comment #15) > Here's a tryserver build made with my patch from comment #13 > http://ftp.mozilla.org/pub/mozilla.org/firefox/try-builds/smichaud@pobox.com- > ea8910103ac1/try-macosx64/firefox-8.0a1.en-US.mac.dmg > > Please try it out!
It occurs to me (as I'm learning more about the cairo code) that it might be better to fix this bug in cairo_quartz_image_surface.c. New patch coming up shortly.
> It occurs to me (as I'm learning more about the cairo code) that it > might be better to fix this bug in cairo_quartz_image_surface.c. But this doesn't work, for reasons I can't yet fathom. So I'll need to dig further.
Andrea Canciani writes: regarding the cairo-quartz patch, ref'ing/unref'ing should not be needed as the life of cgimage objects (and of the surface acquisition) does not extend beyond the backend functions. if you have some code which breaks without it, it is probably indicating another bug in cairo[-quartz]. (that ref'ing/unref'ing would be needed if the cgimage was associated to the surface snapshot-like)
>> It occurs to me (as I'm learning more about the cairo code) that it >> might be better to fix this bug in cairo_quartz_image_surface.c. > > But this doesn't work, for reasons I can't yet fathom. It does work if I try to fix the bug in cairo-image-surface.c instead. But I now think my original patch is probably the best we can do. Fixing the problem in a "backend"'s calls to acquire_source_image() and release_source_image() is overkill -- we only (as far as I can tell) need the additional calls to cairo_surface_reference() and cairo_surface_destroy() in this one case -- that of using a CGDataProvider to create a CGImage. The problem is that the releaseCallback (DataProviderReleaseCallback()) is called by the OS at unpredictable times, which will sometimes be after quartz_source_image_t.surface has been destroyed. I've also figured out why these crashes started happening with the upgrade to Cairo 1.10. Before the upgrade, a separate surface snapshot was created in _cairo_surface_to_cgimage() and passed to _cairo_quartz_create_cgimage() as its last parameter (releaseInfo), so we didn't have to worry that something else would destroy it. After the upgrade, only a pointer to the "original" surface gets passed (as ((quartz_source_image_t*)releaseInfo)->surface), which *can* get destroyed by something else, unless we create another reference to it.
As best I can tell, what Andrea Canciani says is some combination of wrong and irrelevant.
Jeff, are you going to r- Steven's patch? If not, what would a better solution to this bug be? It's gotten to the point where, if we don't fix this, we should probably look at backing out the Cairo update.
Attachment #546049 - Flags: review?(jmuizelaar) → review+
(In reply to comment #25) > As best I can tell, what Andrea Canciani says is some combination of wrong > and irrelevant. Yes, unfortunately I didn't understand the issue correctly since I was thinking about immediate drawing. Quartz printing surfaces seem to be retained, which means that the life of the sources used to draw on them is much longer than cairo-quartz currently expects. This is a known problem, as pointed out in this comment http://cgit.freedesktop.org/cairo/tree/src/cairo-quartz-surface.c#n933 The patch attached to this bugreport handles the simple case of a source surface which does not change for the whole lifetime of the destination printing surface. Unfortunately this assumption is not valid in general and to guarantee the correctness of the output, a snapshot or some other form of deep-cloning should be used instead. Nevertheless, the patch alleviates the issue, because it guarantees that the source surface will be alive until it is needed (the content might be different from the desired one, but at least it should avoid crashes). Snapshotting might have a performance impact, but might be acceptable for printing surfaces. A patch which both avoids the crash and provides the correct result would be much preferred upstream.
Attached patch Fix (for others to land) (deleted) — Splinter Review
I've carried forward Jeff's r+.
Attachment #546049 - Attachment is obsolete: true
Attachment #549835 - Flags: review+
(In reply to comment #27) > The patch attached to this bugreport handles the simple case of a > source surface which does not change for the whole lifetime of the > destination printing surface. Unfortunately this assumption is not > valid in general and to guarantee the correctness of the output, a > snapshot or some other form of deep-cloning should be used instead. Would it be acceptable to call _cairo_surface_snapshot() on 'source' and set source_img->surface to its result? (This is more-or-less what our cairo code did before the upgrade to version 1.10.) Would this snapshot get updated if the original source surface changed?
(In reply to comment #29) > (In reply to comment #27) > > > The patch attached to this bugreport handles the simple case of a > > source surface which does not change for the whole lifetime of the > > destination printing surface. Unfortunately this assumption is not > > valid in general and to guarantee the correctness of the output, a > > snapshot or some other form of deep-cloning should be used instead. > > Would it be acceptable to call _cairo_surface_snapshot() on 'source' > and set source_img->surface to its result? (This is more-or-less what Yes, this should avoid the crash and provide the desired behavior. > our cairo code did before the upgrade to version 1.10.) Would this > snapshot get updated if the original source surface changed? Cairo snapshot are copy-on-write. When the surface they target is about to bemodified, snapshots are detached from it and attached to a non-modificable clone (the clonation takes place before the surface is changed, so that the content which the snapshot provides never changes for its whole lifetime). This is the expected behavior in your case (if I paint A over B, then I change A, I still expect to have the old content of A in B... in fact this is how client code usually copies surface contents). This solution should be simple and correct, so if the performance does not degrade too much, I believe you should definitely apply it.
Keywords: checkin-needed
Whiteboard: [inbound]
Attachment #549835 - Flags: approval-mozilla-aurora?
Comment on attachment 549835 [details] [diff] [review] Fix (for others to land) This patch is (basically) zero risk, and fixes a topcrasher.
Comment on attachment 549835 [details] [diff] [review] Fix (for others to land) Approved for releases/mozilla-aurora
Attachment #549835 - Flags: approval-mozilla-aurora? → approval-mozilla-aurora+
Status: NEW → RESOLVED
Closed: 13 years ago
Resolution: --- → FIXED
Target Milestone: --- → mozilla8
qa+ for verification on Firefox 7
Whiteboard: [inbound] → [inbound][qa+]
Setting resolution to Verified Fixed on MacOS X 10.6 and 10.7: Mozilla/5.0 (Macintosh; Intel Mac OS X 10.6; rv:7.0) Gecko/20100101 Firefox/7.0 Mozilla/5.0 (Macintosh; Intel Mac OS X 10.6; rv:8.0a2) Gecko/20110921 Firefox/8.0a2 Mozilla/5.0 (Macintosh; Intel Mac OS X 10.6; rv:9.0a1) Gecko/20110922 Firefox/9.0a1 Mozilla/5.0 (Macintosh; Intel Mac OS X 10.7; rv:7.0) Gecko/20100101 Firefox/7.0 Mozilla/5.0 (Macintosh; Intel Mac OS X 10.7; rv:8.0a2) Gecko/20110921 Firefox/8.0a2 Mozilla/5.0 (Macintosh; Intel Mac OS X 10.7; rv:9.0a1) Gecko/20110919 Firefox/9.0a1 In order to verify this, I've used the steps from comment1 and comment2 and I got no crash.
Status: RESOLVED → VERIFIED
Whiteboard: [inbound][qa+] → [inbound][qa!]
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Created:
Updated:
Size: