Closed
Bug 1120485
Opened 10 years ago
Closed 6 years ago
crash in mozalloc_abort(char const* const) | NS_DebugBreak | mozilla::ipc::MessageChannel::DebugAbort(char const*, int, char const*, char const*, bool) | mozilla::ipc::MessageChannel::~MessageChannel() | mozilla::layers::CompositorParent::Release()
Categories
(Core :: Graphics: Layers, defect, P3)
Tracking
()
RESOLVED
INCOMPLETE
mozilla48
People
(Reporter: mihaelav, Unassigned)
References
()
Details
(Keywords: crash, Whiteboard: [mozmill][gfx-noted])
Crash Data
Attachments
(3 files, 1 obsolete file)
(deleted),
patch
|
milan
:
review+
|
Details | Diff | Splinter Review |
(deleted),
patch
|
milan
:
review+
|
Details | Diff | Splinter Review |
(deleted),
patch
|
milan
:
review+
|
Details | Diff | Splinter Review |
This bug was filed from the Socorro interface and is
report bp-62b61d2f-ab57-4c6b-93a7-ab40c2150112.
=============================================================
This crashed while running our mozmill functional suite.
Report: http://mozmill-daily.blargon7.com/#/functional/report/c0d2f578c4e131660e84b4e0e211031b
Comment 1•10 years ago
|
||
If this happens again please let us know. It would be good to have an idea of how frequently this is encountered.
Also, do you know if the mozmill suite loads pages with plugins? Stuff in the crash stack makes me think it might be related to plugin crashing.
Flags: needinfo?(mihaela.velimiroviciu)
Whiteboard: gfx-noted
Comment 2•10 years ago
|
||
Weird stuff happening here. Looks like we are asking a LayerManager to a widget that used to have one, then lost it somehow, and since it doesn't have a LayerManager anymore, it thinks it doesn't have a compositor protocol, except it does and things go bad when it reassigns the previous compositor without properly shutting it down.
Reporter | ||
Updated•10 years ago
|
Flags: needinfo?(mihaela.velimiroviciu)
Reporter | ||
Comment 3•10 years ago
|
||
The crash appeared in a test which doesn't use plugins, but Flash plugin was used in one of the previous tests
Flags: needinfo?(mihaela.velimiroviciu)
Reporter | ||
Updated•10 years ago
|
Flags: needinfo?(mihaela.velimiroviciu)
Comment 4•10 years ago
|
||
All crashes so far on crashstats were happening under Windows 7. There are 45 crashes in the last 7 days, but none before. So this might be a new regression. The first buildid for this crash is 20150109030224.
Crash Reason EXCEPTION_BREAKPOINT
Crash Address 0x74011e28
Here the list of the first 10 frames:
0 mozalloc.dll mozalloc_abort(char const* const) memory/mozalloc/mozalloc_abort.cpp
1 xul.dll NS_DebugBreak xpcom/base/nsDebugImpl.cpp
2 xul.dll mozilla::ipc::MessageChannel::DebugAbort(char const*, int, char const*, char const*, bool) ipc/glue/MessageChannel.cpp
3 xul.dll mozilla::ipc::MessageChannel::~MessageChannel() ipc/glue/MessageChannel.cpp
4 xul.dll mozilla::layers::CompositorParent::Release() gfx/layers/ipc/CompositorParent.h
5 mozalloc.dll mozalloc.dll@0xfff
6 xul.dll nsBaseWidget::CreateCompositor(int, int) widget/nsBaseWidget.cpp
7 xul.dll nsBaseWidget::CreateCompositor() widget/nsBaseWidget.cpp
8 xul.dll nsWindow::GetLayerManager(mozilla::layers::PLayerTransactionChild*, mozilla::layers::LayersBackend, nsIWidget::LayerManagerPersistence, bool*) widget/windows/nsWindow.cpp
9 xul.dll nsIWidget::GetLayerManager(bool*) widget/nsIWidget.h
10 xul.dll PresShell::Paint(nsView*, nsRegion const&, unsigned int) layout/base/nsPresShell.cpp
OS: Windows NT → Windows 7
Comment 5•10 years ago
|
||
Maybe related to one of the changes landed between Jan 8th and 9th?
https://hg.mozilla.org/mozilla-central/pushloghtml?fromchange=70de2960aa87&tochange=b3f84cf78dc2
There is at least bug 1109873, which seems to have changed code for layers.
Updated•10 years ago
|
Whiteboard: gfx-noted → [mozmill}[gfx-noted]
Updated•10 years ago
|
Whiteboard: [mozmill}[gfx-noted] → [mozmill][gfx-noted]
Reporter | ||
Comment 6•10 years ago
|
||
Crashed again in /testSecurity/testSSLStatusAfterRestart.js|testSSLStatusAfterRestart.js
(first time was in /testAwesomeBar/testEscapeAutocomplete.js|testEscapeAutocomplete.js), win 7 x86.
Report: http://mozmill-daily.blargon7.com/#/remote/report/569158a0e5d1513e2dbac88f3360f4bc
Reporter | ||
Updated•10 years ago
|
status-firefox38:
--- → affected
Comment 7•10 years ago
|
||
(In reply to Henrik Skupin (:whimboo) from comment #4)
> All crashes so far on crashstats were happening under Windows 7. There are
> 45 crashes in the last 7 days, but none before. So this might be a new
> regression. The first buildid for this crash is 20150109030224.
This sounds very similar to bug 1120252 and probably has the same root cause. It will be interesting to see if the patch that I landed on that bug fixes this or if it's just a band-aid that prevents some of the symptoms.
(In reply to Henrik Skupin (:whimboo) from comment #5)
> Maybe related to one of the changes landed between Jan 8th and 9th?
>
> https://hg.mozilla.org/mozilla-central/
> pushloghtml?fromchange=70de2960aa87&tochange=b3f84cf78dc2
>
> There is at least bug 1109873, which seems to have changed code for layers.
I would consider it unlikely that that bug causes this though.
Comment 8•10 years ago
|
||
(In reply to Kartikaya Gupta (email:kats@mozilla.com) from comment #7)
> (In reply to Henrik Skupin (:whimboo) from comment #5)
> > Maybe related to one of the changes landed between Jan 8th and 9th?
> >
> > https://hg.mozilla.org/mozilla-central/
> > pushloghtml?fromchange=70de2960aa87&tochange=b3f84cf78dc2
In fact given this range and considering it's windows only, I'm going to pin it on bug 1107718.
Blocks: 1107718
Flags: needinfo?(bas)
Updated•10 years ago
|
Comment 9•10 years ago
|
||
(In reply to Kartikaya Gupta (email:kats@mozilla.com) from comment #7)
> It will be interesting to see if the patch that I landed on that bug
> fixes this or if it's just a band-aid that prevents some of the symptoms.
There are more crashes on the latest nightly which includes my fix. See e.g. https://crash-stats.mozilla.com/report/index/fa07958a-3091-49e6-8f58-1fb2c2150113
Comment 10•10 years ago
|
||
Forgot Bas is on PTO. CC'ing milan for FYI since this seems like a potentially bad regression.
Comment 11•10 years ago
|
||
We did expect bug 1107718 to consolidate some crashes, in that a long tail of random signature crashes may all end up together now. This could be it. Did the overall number of crashes increase, do we have ways of tracking that?
Or, it could be a regression from bug 1107718 or something else, I just wanted to raise the chance that we're just seeing things better now.
Comment 12•10 years ago
|
||
That might be the case. CC'ing kairo in case he can help us analyze the crash data more in this respect.
I'm still thinking that the landing did regress something, because the crash observed in bug 1120252 (which I believe to also have the same root cause) was a very high-volume topcrash on windows. I would be surprised if that was the result of long-tail crashes all ending up in the same bucket. The stack there also seemed to point to some sort of corruption in the CompositorParent data structures.
nical, does your analysis in comment 2 sound like it might conceivably be expected behaviour as a result of bug 1107718?
Comment 13•10 years ago
|
||
https://crash-stats.mozilla.com/report/list?product=Firefox&signature=mozalloc_abort%28char+const%2A+const%29+%7C+NS_DebugBreak+%7C+mozilla%3A%3Aipc%3A%3AMessageChannel%3A%3ADebugAbort%28char+const%2A%2C+int%2C+char+const%2A%2C+char+const%2A%2C+bool%29+%7C+mozilla%3A%3Aipc%3A%3AMessageChannel%3A%3A%7EMessageChannel%28%29+%7C+mozilla%3A%3Alayers%3A%3ACompositorParent%3A%3ARelease%28%29 should give insight on stats and links to more reports.
Comment 14•10 years ago
|
||
(In reply to Kartikaya Gupta (email:kats@mozilla.com) from comment #12)
> nical, does your analysis in comment 2 sound like it might conceivably be
> expected behaviour as a result of bug 1107718?
It is definitely possible: in the patch in that bug, there is a code path where we null out nsWindow's LayerManager, which could cause what I said in comment 2.
Comment 15•10 years ago
|
||
(In reply to Nicolas Silva [:nical] from comment #14)
> (In reply to Kartikaya Gupta (email:kats@mozilla.com) from comment #12)
> > nical, does your analysis in comment 2 sound like it might conceivably be
> > expected behaviour as a result of bug 1107718?
>
> It is definitely possible: in the patch in that bug, there is a code path
> where we null out nsWindow's LayerManager, which could cause what I said in
> comment 2.
This all sounds like a very likely scenario, and similarly I do believe this crash will have come in that place of another amount of (equal or larger) other crashes.
Flags: needinfo?(bas)
Comment 16•10 years ago
|
||
Just spotted bp-3400fc0f-57f5-4e24-bc95-b0b972150130 among the crash galore due to bug 1120331 I had earlier...
Report ID Date Submitted
bp-34263714-7508-4f34-bab9-e7ff52150202
2/2/2015 11:58 AM
bp-2fc68278-72ce-4533-84cf-7dc7d2150202
2/2/2015 11:57 AM
bp-3dc956fb-111b-428d-9626-16e352150202
2/2/2015 11:53 AM
bp-01144341-eda7-43c7-90c5-fb6d12150202
2/2/2015 11:53 AM
bp-3670c0f3-a365-4bec-8973-223fe2150202
2/2/2015 11:53 AM
bp-e1477fe0-2775-4979-bc17-d2eb62150202
2/2/2015 8:22 AM
bp-26e4454d-d9bd-4ce3-88b0-a296a2150202
2/2/2015 8:22 AM
bp-3400fc0f-57f5-4e24-bc95-b0b972150130
1/30/2015 3:48 PM
bp-5758085c-7936-473b-bc93-ed7aa2150130
1/30/2015 3:02 PM
bp-b6e80d31-2a7f-4e8b-a394-6c4eb2150130
1/30/2015 1:30 PM
Comment 17•10 years ago
|
||
Another from Mozilla/5.0 (Windows NT 6.1; WOW64; rv:38.0) Gecko/20100101 Firefox/38.0 ID:20150205030205 CSet: 34a66aaaca81
Report ID Date Submitted
bp-00d2345c-57a1-4751-97fc-746972150205
2/5/2015 11:56 AM
Updated•9 years ago
|
Crash Signature: [@ mozalloc_abort(char const* const) | NS_DebugBreak | mozilla::ipc::MessageChannel::DebugAbort(char const*, int, char const*, char const*, bool) | mozilla::ipc::MessageChannel::~MessageChannel() | mozilla::layers::CompositorParent::Release()] → [@ mozalloc_abort(char const* const) | NS_DebugBreak | mozilla::ipc::MessageChannel::DebugAbort(char const*, int, char const*, char const*, bool) | mozilla::ipc::MessageChannel::~MessageChannel() | mozilla::layers::CompositorParent::Release()]
[@ mozallo…
Comment 18•9 years ago
|
||
This signature is fairly high on the Firefox 47.0a2 Top Crash list, e.g.
bp-91de2290-03db-46a2-a97d-d03892160318
It's a MOZ_RELEASE_ASSERT now:
http://hg.mozilla.org/releases/mozilla-aurora/annotate/bf6d2a4b1031/ipc/glue/MessageChannel.cpp#l522
Crash Signature: mozilla::layers::CompositorParent::Release] → mozilla::layers::CompositorParent::Release]
[@ mozilla::ipc::MessageChannel::~MessageChannel | mozilla::layers::PCompositorParent::~PCompositorParent ]
Mason, can you add GetLastError or whatever is appropriate here, so that we know what the error was in play when we hit this assert? (Even if the error code is device specific, it may shine some light on this.)
Flags: needinfo?(mchang)
Comment 20•9 years ago
|
||
Flags: needinfo?(mchang)
Attachment #8738016 -
Flags: review?(milan)
Updated•9 years ago
|
Attachment #8738016 -
Flags: review?(milan) → review+
Comment 21•9 years ago
|
||
Comment 22•9 years ago
|
||
Comment 23•9 years ago
|
||
bugherder |
Status: NEW → RESOLVED
Closed: 9 years ago
status-firefox48:
--- → fixed
Resolution: --- → FIXED
Target Milestone: --- → mozilla48
Comment 24•9 years ago
|
||
Whoops, leave open since the patch was just logging to see what's going on. The patch didn't actually fix the crash.
Hello Mason, this is a top-crash on Fx47 (Beta). It jumped up 18 spots and is #10 right now. Have we been able to use the additional logging to root cause this? The recent up spike is concerning and I am wondering if there a patch in the works that can we uplift to Beta47. Thanks!
Flags: needinfo?(mchang)
I misspoke, this jumped up 57(!!) spots and is at #12 atm.
Comment 27•9 years ago
|
||
(In reply to Ritu Kothari (:ritu) from comment #25)
> Hello Mason, this is a top-crash on Fx47 (Beta). It jumped up 18 spots and
> is #10 right now. Have we been able to use the additional logging to root
> cause this? The recent up spike is concerning and I am wondering if there a
> patch in the works that can we uplift to Beta47. Thanks!
I haven't seen this crash on 48 as this was the only place where we added logging. All the reports are only on 47. I wonder if this was fixed in 48?
Flags: needinfo?(mchang)
(In reply to Mason Chang [:mchang] from comment #27)
>
> I haven't seen this crash on 48 as this was the only place where we added
> logging. All the reports are only on 47. I wonder if this was fixed in 48?
If that is the case, would you guys have a clue what fix from 48 can be cherry-picked and uplifted to 47?
Flags: needinfo?(milan)
Flags: needinfo?(mchang)
Nothing fixed it, I'd say, here's a 48 crash :)
https://crash-stats.mozilla.com/report/index/cdfc19a0-d9eb-4614-b4e9-9d1842160429
Error 6 is ERROR_INVALID_HANDLE. We could reset the mHandle to nullptr, just in case we're trying to close the channel multiple times? Should be safe enough, and we can see if it does anything to the crash rate.
Flags: needinfo?(milan)
Updated•9 years ago
|
Assignee: nobody → mchang
Updated•9 years ago
|
Crash Signature: mozilla::layers::CompositorParent::Release]
[@ mozilla::ipc::MessageChannel::~MessageChannel | mozilla::layers::PCompositorParent::~PCompositorParent ] → mozilla::layers::CompositorParent::Release]
[@ mozilla::ipc::MessageChannel::~MessageChannel | mozilla::layers::PCompositorParent::~PCompositorParent ]
[@ mozilla::ipc::MessageChannel::~MessageChannel | mozilla::layers::PCompositorBridgeParent::~PCompo…
Flags: needinfo?(mchang)
Comment 30•9 years ago
|
||
Just a check to see if we still have an mEvent. If we don't, don't try to close again. It's odd, I'm not sure if we actually share the handle with anyone else though since I don't see it referenced anywhere. Might be closed on the other side first?
Attachment #8747861 -
Flags: review?(milan)
Comment 31•9 years ago
|
||
Attachment #8747861 -
Attachment is obsolete: true
Attachment #8747861 -
Flags: review?(milan)
Attachment #8747883 -
Flags: review?(milan)
Updated•9 years ago
|
Attachment #8747883 -
Flags: review?(milan) → review+
Comment 32•9 years ago
|
||
Comment 33•9 years ago
|
||
bugherder |
Comment 34•8 years ago
|
||
This still seems to be happening[1], and I'm not really sure why we would get an invalid handle. I wonder if this is happening because another function that's using mEvent is already failing[2]
[1] https://crash-stats.mozilla.com/report/index/1eb8a3b6-dcae-4a16-83f1-27d0f2160509
[2] "It is usually not necessary to call CloseHandle if a function that uses a handle fails with ERROR_INVALID_HANDLE, because this error usually indicates that the handle is already invalidated." - https://msdn.microsoft.com/en-us/library/windows/desktop/ms724211(v=vs.85).aspx
Should we just ignore an error if we get this error then?
Updated•8 years ago
|
Comment 36•8 years ago
|
||
This originally landed in 47, bug 1237458, which changed the asserts from debug asserts to release asserts. Do you know if it's ok to ignore this error Bill? The assert in question is at [1], which is failing with an invalid handle.
[1] https://dxr.mozilla.org/mozilla-central/source/ipc/glue/MessageChannel.cpp#517
Flags: needinfo?(wmccloskey)
Comment 37•8 years ago
|
||
Log some uses of mEvent in MessageChannel and ensure that they succeed. Then we can see if some other function is setting the handle to an invalid state.
Attachment #8750476 -
Flags: review?(milan)
Comment on attachment 8750476 [details] [diff] [review]
Log uses of mEvent
Review of attachment 8750476 [details] [diff] [review]:
-----------------------------------------------------------------
It's useful to have different messages in each of the gfxDevCrash, so that we can tell them apart once this is in beta/release where there is no crash, and we can't tell from the stack which one of them triggered.
Attachment #8750476 -
Flags: review?(milan) → review+
Comment 39•8 years ago
|
||
(In reply to Milan Sreckovic [:milan] from comment #38)
> Comment on attachment 8750476 [details] [diff] [review]
> Log uses of mEvent
>
> Review of attachment 8750476 [details] [diff] [review]:
> -----------------------------------------------------------------
>
> It's useful to have different messages in each of the gfxDevCrash, so that
> we can tell them apart once this is in beta/release where there is no crash,
> and we can't tell from the stack which one of them triggered.
Updated the patch to do that. Try looks ok I think - https://treeherder.mozilla.org/#/jobs?repo=try&revision=8b4ca9b63e8b&selectedJob=20653975 . Lots of orange on inbound as well, so I think this is ok.
Comment 40•8 years ago
|
||
Comment 41•8 years ago
|
||
bugherder |
Flags: needinfo?(wmccloskey)
No crashes with the extra information added in comment 41.
Comment 43•8 years ago
|
||
Interestingly, 96.08% of the reports with signature 'mozilla::ipc::MessageChannel::~MessageChannel | mozilla::layers::PCompositorBridgeParent::~PCompositorBridgeParent' have a Spanish locale
(either es-MX or es-ES).
Comment 44•8 years ago
|
||
(In reply to Marco Castelluccio [:marco] from comment #43)
> Interestingly, 96.08% of the reports with signature
> 'mozilla::ipc::MessageChannel::~MessageChannel |
> mozilla::layers::PCompositorBridgeParent::~PCompositorBridgeParent' have a
> Spanish locale
> (either es-MX or es-ES).
New report about this issue on SUMO: https://support.mozilla.org/en-US/questions/1144682
And I guess the reporter is probably Spanish.
Comment 45•8 years ago
|
||
(In reply to Loic from comment #44)
> (In reply to Marco Castelluccio [:marco] from comment #43)
> > Interestingly, 96.08% of the reports with signature
> > 'mozilla::ipc::MessageChannel::~MessageChannel |
> > mozilla::layers::PCompositorBridgeParent::~PCompositorBridgeParent' have a
> > Spanish locale
> > (either es-MX or es-ES).
>
> New report about this issue on SUMO:
> https://support.mozilla.org/en-US/questions/1144682
>
> And I guess the reporter is probably Spanish.
The locale of the Firefox build he's using is 'en-US', but his name looks Spanish. Perhaps there's some Spanish website that is causing this to happen.
Looking at the URLs, the 'gob.mx' sites are pretty frequent.
Updated•8 years ago
|
status-firefox49:
--- → affected
status-firefox50:
--- → affected
status-firefox51:
--- → affected
status-firefox52:
--- → ?
Comment 46•8 years ago
|
||
(In reply to Marco Castelluccio [:marco] from comment #45)
> The locale of the Firefox build he's using is 'en-US', but his name looks
> Spanish. Perhaps there's some Spanish website that is causing this to happen.
He replied he is using es-MX locale.
Comment 47•8 years ago
|
||
(In reply to Loic from comment #46)
> (In reply to Marco Castelluccio [:marco] from comment #45)
> > The locale of the Firefox build he's using is 'en-US', but his name looks
> > Spanish. Perhaps there's some Spanish website that is causing this to happen.
>
> He replied he is using es-MX locale.
Interesting, his crash reports are with an en-US Firefox build (see 'useragent_locale' in the Metadata tab).
Updated•7 years ago
|
Assignee: mchang → nobody
Updated•7 years ago
|
Priority: -- → P3
Comment 48•6 years ago
|
||
The leave-open keyword is there and there is no activity for 6 months.
:davidb, maybe it's time to close this bug?
Flags: needinfo?(dbolter)
Comment 49•6 years ago
|
||
Yeah I think so.
Status: REOPENED → RESOLVED
Closed: 9 years ago → 6 years ago
Flags: needinfo?(dbolter)
Resolution: --- → INCOMPLETE
Updated•6 years ago
|
Keywords: leave-open
Updated•5 years ago
|
You need to log in
before you can comment on or make changes to this bug.
Description
•