Closed
Bug 805353
Opened 12 years ago
Closed 12 years ago
crash in Java_org_mozilla_gecko_mozglue_DirectBufferAllocator_nativeFreeDirectBuffer
Categories
(Firefox for Android Graveyard :: General, defect)
Tracking
(firefox17 unaffected, firefox18- affected, firefox19+ fixed, firefox20 fixed)
RESOLVED
FIXED
Firefox 20
People
(Reporter: scoobidiver, Assigned: kats)
References
Details
(4 keywords, Whiteboard: [native-crash])
Crash Data
Attachments
(1 file)
(deleted), patch | snorp: review+
It has been hit by 6 users in the latest build and two users in Aurora (18.0a2/20121020 and 18.0a2/20121013). Let's assume it's a regression in 19.0. The regression range is:
http://hg.mozilla.org/mozilla-central/pushloghtml?fromchange=48502b61a63e&tochange=93cc1ee94291
Signature huge_dalloc | __wrap_free | Java_org_mozilla_gecko_mozglue_DirectBufferAllocator_nativeFreeDirectBuffer
UUID 2c8dc915-0b9c-43cf-8fe9-530ed2121025
Date Processed 2012-10-25 02:05:44
Uptime 2
Last Crash 1.0 weeks before submission
Install Age 5.6 hours since version was first installed.
Install Time 2012-10-24 20:32:09
Product FennecAndroid
Version 19.0a1
Build ID 20121024030643
Release Channel nightly
OS Android
OS Version 0.0.0 Linux 3.0.15-833154-user #1 SMP PREEMPT Wed Jul 4 15:47:23 KST 2012 armv7l samsung/m0ub/m0:4.0.4/IMM76D/I9300UBBLG2:user/release-keys
Build Architecture arm
Build Architecture Info
Crash Reason SIGSEGV
Crash Address 0x10
App Notes
AdapterDescription: 'ARM -- Mali-400 MP -- OpenGL ES 2.0 -- Model: GT-I9300, Product: m0ub, Manufacturer: samsung, Hardware: smdk4x12'
EGL? EGL+ GL Context? GL Context+ GL Layers? GL Layers+
samsung GT-I9300
samsung/m0ub/m0:4.0.4/IMM76D/I9300UBBLG2:user/release-keys
EMCheckCompatibility True
Adapter Vendor ID ARM
Adapter Device ID Mali-400 MP
Device samsung GT-I9300
Android API Version 15 (REL)
Android CPU ABI armeabi-v7a
Frame Module Signature Source
0 libmozglue.so huge_dalloc jemalloc.c:2293
1 libmozglue.so __wrap_free jemalloc.c:6577
2 libmozglue.so Java_org_mozilla_gecko_mozglue_DirectBufferAllocator_nativeFreeDirectBuffer nsGeckoUtils.cpp:52
3 libdvm.so libdvm.so@0x1ed72
4 dalvik-heap (deleted) dalvik-heap @0xf1ff16
5 libdvm.so libdvm.so@0x5902d
6 data@app@org.mozilla.fennec-2.apk@classes.dex data@app@org.mozilla.fennec-2.apk@classes.dex@0x14f1cc
7 libmozglue.so libmozglue.so@0x108fb
8 libdvm.so libdvm.so@0xb3f9a
9 libdvm.so libdvm.so@0x343b2
10 dalvik-heap (deleted) dalvik-heap @0x100b8f6
11 dalvik-heap (deleted) dalvik-heap @0x100b8f6
12 dalvik-heap (deleted) dalvik-heap @0x100bb4e
13 libdvm.so libdvm.so@0x9521e
...
More reports at:
https://crash-stats.mozilla.com/report/list?signature=huge_dalloc+|+__wrap_free+|+Java_org_mozilla_gecko_mozglue_DirectBufferAllocator_nativeFreeDirectBuffer
Comment 1•12 years ago
I got this too. Pretty sure it's a recent regression. Happened on what seemed like startup, IIRC.
Reporter
Comment 2•12 years ago
It stopped in 19.0a1/20121025. The working range is:
http://hg.mozilla.org/mozilla-central/pushloghtml?fromchange=93cc1ee94291&tochange=5c82f5a5e90d
It might have been caused by the bustage of bug 803013 or bug 795259.
Comment 3•12 years ago
If it's hit in Aurora, sounds like an existing bug that may be more readily exposed due to progressive rendering being turned on (bug 795259).
I wonder if it's related to this crash I got on try? https://tbpl.mozilla.org/php/getParsedLog.php?id=16466934&tree=Try&full=1#error0
This was an intermittent orange that I saw both with and without tiles, so I ignored it. Perhaps it's more prevalent now...
Comment 4•12 years ago
Cc'ing some people in case they have any ideas - I'd very much like bug 795259 to stick, so will try to get this fixed quickly.
Comment 5•12 years ago
There's a chance this could be a race condition - freeBuffer in BufferedCairoImage isn't synchronised. I'd like to have a better stack rather than just making that assumption, but it might be worth checking in a patch for this (freeBuffer shouldn't be called regularly anyway, so I don't think it's a performance issue to synchronise freeing) and seeing if the crash goes away.
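A minimal sketch of what synchronising the free could look like, assuming the class keeps its direct buffer in a field. `GuardedImage`, `nativeFrees` and the counter standing in for the native free are hypothetical names for illustration only, not the Gecko source:

```java
import java.nio.ByteBuffer;

// Hypothetical stand-in for the image class discussed above; not the Gecko source.
class GuardedImage {
    private ByteBuffer buffer; // stands in for the natively allocated direct buffer
    private int nativeFrees;   // counts how often the simulated native free ran

    GuardedImage(int size) {
        buffer = ByteBuffer.allocateDirect(size);
    }

    // synchronized: only one thread at a time can test and clear the field,
    // so the simulated native free cannot run twice for the same buffer.
    synchronized void freeBuffer() {
        if (buffer == null) {
            return; // already freed; a second call is a no-op
        }
        nativeFrees++; // real code would hand the buffer to the native allocator here
        buffer = null;
    }

    synchronized int nativeFrees() {
        return nativeFrees;
    }
}
```

Clearing the field inside the same lock that guards the free is what makes a repeated call harmless; the cost is one uncontended lock per free, which should be negligible if freeBuffer is rarely called.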
Comment 6•12 years ago
In fact, I wonder if this is caused by disabling the screenshot layer rather than anything to do with progressive tiles. Perhaps this is what happens if the pref return happens while a screenshot message is in flight and the free is called in the notifyScreenshot method in ScreenshotHandler?
There's definitely suspicious-looking code here.
Assignee
Comment 7•12 years ago
I don't think this is related to tiling stuff at all. If anything I think it might be related to the thumbnail changes that have been happening recently (i.e. possibly related to bug 787765 and bug 803687).
Also, the crash appears to be happening inside jemalloc, so I would like somebody familiar with that code to look at it and see why it might be happening. AFAIK a crash inside jemalloc indicates a bug inside jemalloc regardless of whatever else might be happening. CC'ing :jlebar and :glandium.
Comment 8•12 years ago
Blocks: 438871
Whiteboard: [native-crash][startupcrash] → [native-crash][startupcrash][orange]
Comment 9•12 years ago
(In reply to Kartikaya Gupta (:kats) from comment #7)
> AFAIK a crash inside jemalloc indicates a bug inside jemalloc
> regardless of whatever else might be happening. CC'ing :jlebar and :glandium.
A crash inside jemalloc is usually an allocator mismatch (trying to free a pointer that was malloc'ed by the system malloc).
Assignee
Comment 10•12 years ago
That seems unlikely in this case because the pointer to this memory is stashed inside a java ByteBuffer object and can't just be passed around randomly. Could the crash happen because of a double free? That seems like a more likely possibility from the way the code is used.
Comment 11•12 years ago
Double-free is a possibility too.
Comment 12•12 years ago
Looks likely to be freeing a not-alloc'ed pointer (double-free, alloc-mismatch, freeing an internal pointer, etc).
You could test this by taking the following code in memory/mozjemalloc/jemalloc.c
node = extent_tree_ad_search(&huge, &key);
assert(node != NULL);
assert(node->addr == ptr);
extent_tree_ad_remove(&huge, node);
and s/assert/RELEASE_ASSERT/. (In fact, I'd probably take that as a patch to check in.) If one of the asserts gets hit (probably the first one), then jemalloc doesn't know about this pointer you're trying to free. If OTOH neither assertion fires, then it's possible the heap is corrupted.
Comment 13•12 years ago
I hit this on startup. I have not been able to reproduce.
* Current nightly
* Had an article in reader mode
* Used some other apps; Firefox was in the background, likely task-killed by the OS
* Started up Firefox and pressed the new tab button
Comment 14•12 years ago
This may be related to bug 805355. The preference to disable the screenshot exposes some nasty race conditions.
Shouldn't we get the Java stacktrace from these crashes?
Comment 15•12 years ago
(In reply to Benoit Girard (:BenWa) from comment #14)
> Shouldn't we get the Java stacktrace from these crashes?
Only if it throws a Java exception, as the exception handler is what annotates the stack trace, AFAIK.
So this could mean that we might not have a Java exception thrown in those cases - or that the mechanism for sending the stack traces is broken.
Comment 16•12 years ago
This may be related to bug 722166, which boasts a huge collection of various Android test-crashes-on-startup logs, typically SIGSEGVs with stack traces showing little more than libc, libskia, and libdvm.
Updated•12 years ago
tracking-fennec: ? → 18+
Updated•12 years ago
Assignee: nobody → snorp
Comment 17•12 years ago
Let's track for FF18 once it's clear that version is affected (not clear at this point).
Updated•12 years ago
status-firefox17:
--- → unaffected
status-firefox18:
--- → affected
status-firefox19:
--- → affected
Updated•12 years ago
Assignee: snorp → bugmail.mozilla
Updated•12 years ago
Keywords: intermittent-failure
Updated•12 years ago
Whiteboard: [native-crash][startupcrash][orange] → [native-crash][startupcrash]
Reporter
Comment 18•12 years ago
It's now a low volume crash across builds. It stopped spiking after 19.0a1/20121118. The working range is:
http://hg.mozilla.org/mozilla-central/pushloghtml?fromchange=b959971b8219&tochange=4fddb9923ef0
tracking-fennec: 18+ → ?
status-firefox20:
--- → affected
Keywords: topcrash
Version: Firefox 19 → Firefox 18
Assignee
Comment 19•12 years ago
If I'm reading the stats right, this hasn't happened on nightly since bug 817067 landed, which is promising. Will leave open a few more days to confirm.
Comment 20•12 years ago
untracking, given comment 18
tracking-fennec: ? → ---
tracking-firefox19:
? → ---
Assignee
Comment 21•12 years ago
Hm. There's still one crash after bug 817067 landed (bp-847e820b-6c56-45c5-90e5-8c7af2121206). I'll give it some more time...
Assignee
Comment 22•12 years ago
The crash volume has definitely dropped. I suspect that the one crash in comment 21 is coming from BufferedCairoImage rather than the thumbnail code: there appears to be a race condition there as well, since freeBuffer() could be called on two different threads concurrently (on the compositor thread via PluginLayer.performUpdates and on the UI thread via destroy). Patch attached.
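The two-thread scenario described here can be exercised with a small harness. This is only a sketch under stated assumptions: `RacingFreeDemo`, `Image`, `raceOnce` and the counter that stands in for the native free are hypothetical names, the thread names merely echo the compositor and UI threads mentioned above, and the attached patch may well take a different approach.

```java
import java.nio.ByteBuffer;
import java.util.concurrent.CountDownLatch;

// Hypothetical harness; it does not use or reproduce any Gecko classes.
class RacingFreeDemo {
    static class Image {
        private ByteBuffer buffer = ByteBuffer.allocateDirect(64);
        private int nativeFrees; // number of simulated native frees

        // The lock makes the null check and the clearing of the field atomic.
        synchronized void freeBuffer() {
            if (buffer != null) {
                nativeFrees++; // real code would free the native memory here
                buffer = null;
            }
        }

        synchronized int nativeFrees() {
            return nativeFrees;
        }
    }

    // Releases two threads at the same moment, lets both call freeBuffer(),
    // and returns how many simulated native frees actually happened.
    static int raceOnce() {
        Image image = new Image();
        CountDownLatch start = new CountDownLatch(1);
        Runnable task = () -> {
            try {
                start.await(); // wait until both threads are ready to go
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                return;
            }
            image.freeBuffer();
        };
        Thread compositor = new Thread(task, "compositor");
        Thread ui = new Thread(task, "ui");
        compositor.start();
        ui.start();
        start.countDown();
        try {
            compositor.join();
            ui.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while waiting", e);
        }
        return image.nativeFrees();
    }
}
```

Without the `synchronized` keyword both threads could pass the null check before either clears the field, which is the double-free pattern suspected in this bug.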
Attachment #690369 - Flags: review?(snorp)
Updated•12 years ago
Attachment #690369 - Flags: review?(snorp) → review+
Assignee
Comment 23•12 years ago
Comment 24•12 years ago
Status: NEW → RESOLVED
Closed: 12 years ago
Resolution: --- → FIXED
Target Milestone: --- → Firefox 20
Reporter
Updated•12 years ago
status-firefox20:
affected → ---
Reporter
Comment 25•12 years ago
It's not fixed: bp-be57b0a8-0f42-4ec4-8edc-8885c2121212.
Status: RESOLVED → REOPENED
Resolution: FIXED → ---
Reporter
Comment 26•12 years ago
There are no crashes after 20.0a1/20121214.
status-firefox20:
--- → unaffected
Assignee
Comment 27•12 years ago
Almost certainly fixed by bug 817134 then.
Status: REOPENED → RESOLVED
Closed: 12 years ago → 12 years ago
Depends on: 817134
Resolution: --- → FIXED
Reporter
Comment 28•12 years ago
Bug 817134 was uplifted to Branch 19.0.
Reporter
Comment 29•12 years ago
It's #10 top crasher in 19.0b1.
tracking-firefox18:
--- → ?
Keywords: topcrash
Whiteboard: [native-crash][startupcrash] → [native-crash]
Comment 30•12 years ago
(In reply to Scoobidiver from comment #29)
> It's #10 top crasher in 19.0b1.
I guess you meant to nominate it for 19? Going ahead and tracking for that.
Updated•12 years ago
Reporter
Comment 31•12 years ago
(In reply to bhavana bajaj [:bajaj] from comment #30)
> I guess you meant to nom it for 19?
Yes.
(In reply to Kartikaya Gupta (email:kats@mozilla.com) from comment #27)
> Almost certainly fixed by bug 817134 then.
No, because its uplift to Beta hasn't fixed it in 19.0b1.
Comment 32•12 years ago
(In reply to Kartikaya Gupta (email:kats@mozilla.com) from comment #27)
> Almost certainly fixed by bug 817134 then.
Kats - what next steps do you recommend here? Could another fix around 20.0a1/20121214 have resolved this issue and require uplift?
Assignee
Comment 33•12 years ago
If I'm reading the crash-stats right, this crash is no longer happening in 19.0b2, although it was happening in 19.0b1. This makes sense since looking at the graph of mozilla-beta [1] the uplift of bug 817134 didn't go into 19.0b1 but did go into 19.0b2.
So I believe this is fixed in 19+ and no further action is required.
[1] https://hg.mozilla.org/releases/mozilla-beta/graph/301d54b3c444
Reporter
Comment 34•12 years ago
(In reply to Kartikaya Gupta (email:kats@mozilla.com) from comment #33)
> If I'm reading the crash-stats right, this crash is no longer happening in
> 19.0b2, although it was happening in 19.0b1. This makes sense since looking
> at the graph of mozilla-beta [1] the uplift of bug 817134 didn't go into
> 19.0b1 but did go into 19.0b2.
You're right. I hadn't looked at the right view in Mercurial.
> So I believe this is fixed in 19+ and no further action is required.
Definitively fixed in 19.0 Beta 2.
Updated•4 years ago
Product: Firefox for Android → Firefox for Android Graveyard