Closed Bug 1291190 Opened 8 years ago Closed 8 years ago

Crash in RefCounted<T>::Release from DrawTargetD2D1::PopClip when scrolling PDF

Categories

(Core :: Graphics: Layers, defect)

50 Branch
x86_64
Windows 7
defect
Not set
critical

Tracking

()

RESOLVED FIXED
mozilla51
Tracking Status
e10s - ---
firefox48 --- wontfix
firefox49 --- fixed
firefox-esr45 --- wontfix
firefox50 + fixed
firefox51 + fixed

People

(Reporter: mathieu.marquer, Assigned: nical)

References

Details

(Keywords: sec-other, Whiteboard: [gfx-noted][fixed by disabling patch for Windows in bug 1285271][adv-main49-])

Crash Data

Attachments

(1 file)

This bug was filed from the Socorro interface and is 
report bp-b9bd34ab-5834-4aa0-bc99-9c1c92160802.
=============================================================

Crashing when opening a PDF file and scrolling into it. Example: http://www.stif.org/IMG/pdf/STIF_-_CA_02072014_CP_Renfort_d_offre_bus.pdf
I'm not able to reproduce on Win7 x64 with FF47.

Does it crash for you:
1) in safe mode:
https://support.mozilla.org/fr/kb/resoudre-problemes-firefox-mode-sans-echec

2) with a fresh profile:
https://support.mozilla.org/fr/kb/utiliser-gestionnaire-profils-creer-supprimer-profils
Flags: needinfo?(mathieu.marquer)
Surprising results to me: 
- Doesn't crash in safe mode
- Crashes in "normal mode" after manually disabling all my add-ons and restarting firefox

Another crash signature: https://crash-stats.mozilla.com/report/index/93d5c12c-34ae-47e2-bb7a-c3fea2160802
Flags: needinfo?(mathieu.marquer)
It would be useful to test with a fresh profile, sometimes changes in prefs (in about:config e.g.) can make Firefox crash.

In addition, try in "normal" mode with HWA disabled (you need to restart FF to apply).
https://support.mozilla.org/fr/kb/desactiver-acceleration-materielle
Does *not* crash with HWA disabled.
(In reply to Mathieu Marquer from comment #5)
> Does *not* crash with HWA disabled.

That's why it doesn't crash in safe mode, HWA is off in safe mode.

Could you type about:support and copy here the "graphics" section.
Oh, you report the crash for FF51, not 48! I'm able to reproduce the crash too. :)
No problem :)

The graphics section is the following:
Graphics
Features
Compositing	Direct3D 11
Asynchronous Pan/Zoom	wheel input enabled; touch input enabled
WebGL Renderer	Google Inc. -- ANGLE (Intel(R) HD Graphics Family Direct3D11 vs_5_0 ps_5_0)
WebGL2 Renderer	WebGL creation failed: * Refused to create native OpenGL context because of blacklist entry: WEBGL_NATIVE_GL_OLD_INTEL * Exhausted GL driver options.
Hardware H264 Decoding	Yes; Using D3D9 API
Audio Backend	wasapi
Direct2D	true
DirectWrite	true (6.2.9200.17568)
GPU #1
Active	Yes
Description	Intel(R) HD Graphics Family
Vendor ID	0x8086
Device ID	0x0a16
Driver Version	10.18.14.4029
Driver Date	11-18-2014
Drivers	igdumdim64 igd10iumd64 igd10iumd64 igdumdim32 igd10iumd32 igd10iumd32
Subsys ID	503c17aa
RAM	Unknown
Diagnostics
AzureCanvasAccelerated	0
AzureCanvasBackend	direct2d 1
[Tracking Requested - why for this release]:

[Tracking Requested - why for this release]:

I tested and I'm able to reproduce with Nightly only with e10s+HWA both enabled.

Regression range:
https://hg.mozilla.org/integration/mozilla-inbound/pushloghtml?fromchange=d0cb076a4e736e89f8393ba5658148ae5be4171d&tochange=e5db12322fd393fe7970e726cd1f4b64845f6d23

Clearly related to bug 1289816 probably.
Blocks: 1289816
Status: UNCONFIRMED → NEW
Has Regression Range: --- → yes
Has STR: --- → yes
Component: DOM → Graphics: Layers
Ever confirmed: true
Flags: needinfo?(nical.bugzilla)
Hardware: Unspecified → x86_64
Version: unspecified → 50 Branch
Blocks: e10s
tracking-e10s: --- → ?
Summary: Crash in nsPIDOMWindow<T>::GetDoc → [e10s] Crash in nsPIDOMWindow<T>::GetDoc
This bug has various crash signatures (from CR of the reporter and from my own CRs):
[@ nsPIDOMWindow<T>::GetDoc ]
[@ nsPIDOMWindowInner::HasActiveDocument ]
[@ xpc::NativeGlobal ]
[@ IPCError-browser | ShutDownKill ]
[@ mozilla::detail::RefCounted<T>::Release ]

The last one is probably more significant about crash in Graphics Layers:
https://crash-stats.mozilla.com/report/index/f43b3b28-7526-4aaa-b8f1-80bfc2160803
Crash Signature: [@ nsPIDOMWindow<T>::GetDoc] → [@ nsPIDOMWindow<T>::GetDoc] [@ mozilla::detail::RefCounted<T>::Release ]
Summary: [e10s] Crash in nsPIDOMWindow<T>::GetDoc → [e10s] Crash in nsPIDOMWindow<T>::GetDoc when scrolling PDF
Whiteboard: [gfx-noted]
The nsPIDOMWindow signatures are very different from the DrawTarget::PopClip crashes like bug 1291531 and other canvas signatures that spiked recently. It should be investigated separately.
Flags: needinfo?(nical.bugzilla)
Group: gfx-core-security
This sounds like it could be memory corruption, so I'm hiding it.

Somebody on Linux should try the STR from comment 0 with a Nightly Linux ASan build. Kamil, could you do that? Thanks.
Flags: needinfo?(kjozwiak)
I'm going to conservatively mark this sec-critical, as there's a big spike in Nightly crashes that look like they hitting garbage memory.
:nical, I'm assuming this is a partial fallout from the canvas stuff (e.g., https://crash-stats.mozilla.com/report/index/f4e8c512-6171-4d7c-9a50-80b5a2160805 is noisy about persistent buffer provider related issues.)
Assignee: nobody → nical.bugzilla
Flags: needinfo?(nical.bugzilla)
Lots of these are high uptime runs, in the actor destroy - are these computers going to sleep and we crash?
(In reply to Milan Sreckovic [:milan] from comment #15)
> :nical, I'm assuming this is a partial fallout from the canvas stuff

I can't see the original crash stack from comment 0 https://crash-stats.mozilla.com/report/index/b9bd34ab-5834-4aa0-bc99-9c1c92160802 being related to the canvas stuff. There are canvas crashes, one of which has the same STR as comment 0, but we have to be clear about which stack we are investigating here, I don't think they are the same thing.
I am currently focusing on whatever as a trace of gfx in the stack, so only the PopClip stuff. Someone from the DOM team should have a look at the nsPIDOMWindow crashes.

Changing the title since I am assigned on this bug.
Crash Signature: [@ nsPIDOMWindow<T>::GetDoc] [@ mozilla::detail::RefCounted<T>::Release ] → [@ mozilla::detail::RefCounted<T>::Release ]
Flags: needinfo?(nical.bugzilla)
Summary: [e10s] Crash in nsPIDOMWindow<T>::GetDoc when scrolling PDF → Crash in RefCounted<T>::Release from DrawTargetD2D1::PopClip when scrolling PDF
(In reply to Andrew McCreight [:mccr8] from comment #12)
> This sounds like it could be memory corruption, so I'm hiding it.
> 
> Somebody on Linux should try the STR from comment 0 with a Nightly Linux
> ASan build. Kamil, could you do that? Thanks.

I tried but cannot reproduce :/
Andrew, comment 17.
Flags: needinfo?(overholt)
Flags: needinfo?(kjozwiak)
Just to close the loop from my end, I've been trying to reproduce the crash using an asan build under an Ubuntu VM but I couldn't reproduce the crash. I'm assuming it's due to fx blocking the VM gfx adapters?

Only machine that I managed to reproduce the crash was my Win 10 desktop machine at home.
We were looking at the related bug 1291536. Michael, wanna confirm the nsPIDOMWIndow crashes noted here in comment 17 are the same as bug 1291536?
Flags: needinfo?(overholt) → needinfo?(michael)
Those failures appear very similar. They both appear to occur when trying to get a document off of a nsPIDOMWindow, and they both have the same strange address (-1). I'm inclined to believe that they are related.
Flags: needinfo?(michael)
Can't reproduce on my Linux machine either.
PersistentBufferProviderShared was disabled again on windows (landed in central last saturday 2016-08-06). We'll uplift that to aurora if the crashes stop. If this crash persists, the good news is there aren't a lot of changes left that could have caused it. In any case, please leave this bug open so that I have a place to investigate until I can re-enable PersistentBufferProviderShared.
Confirm: not crashing anymore.
Yup, it works fine on my end too now.
Chances are this is not a sec-critical anymore, and probably not a security bug at all...
(In reply to Milan Sreckovic [:milan] from comment #28)
> Chances are this is not a sec-critical anymore, and probably not a security
> bug at all...

I believe the bad code is still in aurora, as the backout hasn't been uplifted, but I'm not confident.
It looks like the disabling was in bug 1285271. That has Aurora approval, but has not been landed yet, so I'm going to leave 50 as "affected". Once that has landed and somebody has confirmed the crash has gone away we can unhide this bug and remove the sec-critical flag.
I guess you are tracking these in the other direction.
Blocks: 1285271
No longer depends on: 1285271
Whiteboard: [gfx-noted] → [gfx-noted][fixed by disabling patch for Windows in bug 1285271]
Tracking 50/51+ for this sec-critical bug.
Backout just landed on Aurora.
I think that for at least a subset of the crashes, the DrawTarget is not dangling but it's actually the clip stack that isn't properly maintained, leading us to pop more clips that we push which explodes when we call std::vector::back on the empty clip stack.
There's been some improvement on the clip management code since the backout so we may be in a better spot, but i'd like to replace the scary crash with a gfxCriticalError so that we can more easily verify the problem and identify it again in the future.
At a glance, it looks like skia does not crash release builds when saves and restores are unbalanced, which would explain this happening only on windwows+D2D.
Attachment #8782460 - Flags: review?(bas)
I confirmed that this crash seems to have been fixed by the backout on Nightly and Aurora, so I'm changing this to sec-other.
Keywords: sec-criticalsec-other
Comment on attachment 8782460 [details] [diff] [review]
Don't crash in DrawTargetD2D if PopClip is called once too many.

Review of attachment 8782460 [details] [diff] [review]:
-----------------------------------------------------------------

Can we make this a gfxDevCrash? I'd like to confirm this for our aurora and beta builds..
Attachment #8782460 - Flags: review?(bas) → review+
https://hg.mozilla.org/mozilla-central/rev/eb6b3562334e
Status: NEW → RESOLVED
Closed: 8 years ago
Flags: in-testsuite?
Resolution: --- → FIXED
Target Milestone: --- → mozilla51
Crash volume for signature 'mozilla::detail::RefCounted<T>::Release':
 - nightly (version 51): 755 crashes from 2016-08-01.
 - aurora  (version 50): 7564 crashes from 2016-08-01.
 - beta    (version 49): 48 crashes from 2016-08-02.
 - release (version 48): 120 crashes from 2016-07-25.
 - esr     (version 45): 9 crashes from 2016-05-02.

Crash volume on the last weeks (Week N is from 08-22 to 08-28):
            W. N-1  W. N-2  W. N-3
 - nightly      35     108     604
 - aurora     2729    3649     846
 - beta          5      16      11
 - release      33      37      29
 - esr           1       0       2

Affected platform: Windows

Crash rank on the last 7 days:
           Browser   Content     Plugin
 - nightly           #36
 - aurora  #1539     #2
 - beta    #856      #2872
 - release #645
 - esr     #2459
This is not the same crash? https://crash-stats.mozilla.com/report/index/01cb18ff-54cb-40f8-92ae-e0a472160822  We fail in BasicLayerManager::PushGroupForLayer, then things get worse.
Flags: needinfo?(nical.bugzilla)
(In reply to Milan Sreckovic [:milan] from comment #41)
> This is not the same crash?
> https://crash-stats.mozilla.com/report/index/01cb18ff-54cb-40f8-92ae-
> e0a472160822  We fail in BasicLayerManager::PushGroupForLayer, then things
> get worse.

I think that this is the same type of bug: we improperly track the number of clips pushed and end up popping more than we pushed. But it's a different part of the code.
Status: RESOLVED → REOPENED
Flags: needinfo?(nical.bugzilla)
Resolution: FIXED → ---
I have landed a whole lot of things on various bugs which should improve the situation, could you download this build https://archive.mozilla.org/pub/firefox/try-builds/nsilva@mozilla.com-2f7ed76b0208490b2438260b1f5f396e0d72fcf8/try-win32/firefox-51.0a1.en-US.win32.zip and tell me if you can still reproduce the issue?
It's the latest nightly build with the copy-on-write canvas pref enabled. You can also just update your installation of firefox nightly (I don't know if all of the patches are already in today's build and if some will make it in tomorrow's nightly though) and enable "layers.shared-buffer-provider.enabled" in about:config.
Flags: needinfo?(thesweetlilycake)
Flags: needinfo?(mathieu.marquer)
Not crashing for me with the build and the STR I provided in comment 0.
Flags: needinfo?(mathieu.marquer)
(In reply to Nicolas Silva [:nical] from comment #43)
> I have landed a whole lot of things on various bugs which should improve the
> situation, could you download this build
> https://archive.mozilla.org/pub/firefox/try-builds/nsilva@mozilla.com-
> 2f7ed76b0208490b2438260b1f5f396e0d72fcf8/try-win32/firefox-51.0a1.en-US.
> win32.zip and tell me if you can still reproduce the issue?
> It's the latest nightly build with the copy-on-write canvas pref enabled.
> You can also just update your installation of firefox nightly (I don't know
> if all of the patches are already in today's build and if some will make it
> in tomorrow's nightly though) and enable
> "layers.shared-buffer-provider.enabled" in about:config.

No crash for me with this build when scrolling the PDF, but I got some invalidation areas (black background for 1 or 2 pages of the PDF) when switching from another tab to the PDF tab. Is there a bug open about this issue?

NB: e10s+HWA are turned on.
Excellent, thanks! I'll switch the pref back on soon, then.

> No crash for me with this build when scrolling the PDF, but I got some
> invalidation areas (black background for 1 or 2 pages of the PDF) when
> switching from another tab to the PDF tab. Is there a bug open about this
> issue?

Not that I know of, I filed bug 1300121. If you can reproduce this issue, could you attach a screenshot to the bug?
(In reply to Nicolas Silva [:nical] from comment #43)
It doesn't crash for me either.
Flags: needinfo?(thesweetlilycake)
(In reply to Nicolas Silva [:nical] from comment #46)
> Not that I know of, I filed bug 1300121. If you can reproduce this issue,
> could you attach a screenshot to the bug?

I added 2 screenshots.
49 and 50 were fixed by backing out the copy-on-write canvas pref, and the issue this bug is focused on seems to have been fixed since, so closing.
Status: REOPENED → RESOLVED
Closed: 8 years ago8 years ago
Resolution: --- → FIXED
Whiteboard: [gfx-noted][fixed by disabling patch for Windows in bug 1285271] → [gfx-noted][fixed by disabling patch for Windows in bug 1285271][adv-main49-]
Group: gfx-core-security → core-security-release
So, we do hit the assert that was added in this bug; not that often, but non-zero.
https://crash-stats.mozilla.com/report/index/85eea596-e4c9-4883-99cf-59b672170130
They appear to come from PopClip called by gfxContext::~gfxContext.
Nical, does this help any?
Flags: needinfo?(nical.bugzilla)
(In reply to Milan Sreckovic [:milan] from comment #50)
> So, we do hit the assert that was added in this bug; not that often, but
> non-zero.
> https://crash-stats.mozilla.com/report/index/85eea596-e4c9-4883-99cf-
> 59b672170130
> They appear to come from PopClip called by gfxContext::~gfxContext.
> Nical, does this help any?

It mostly indicates that we have a bug somewhere where we pop more clips than we push on a canvas (probably related to the logic that tries to track the pushed clips between frames). Considering the volume it's not something I would spend time on in the near future.
Flags: needinfo?(nical.bugzilla)
Group: core-security-release
This could also be the cause behind Cairo specific bug 1339762 - the unbalance in cairo_save/cairo_restore could happen because of unbalanced clip push/pop.  And we have a reproducible case for that.  Worth following up on that one.
Flags: needinfo?(nical.bugzilla)
Flags: needinfo?(nical.bugzilla)
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Created:
Updated:
Size: