Closed Bug 805353 Opened 12 years ago Closed 12 years ago

crash in Java_org_mozilla_gecko_mozglue_DirectBufferAllocator_nativeFreeDirectBuffer

Categories

(Firefox for Android Graveyard :: General, defect)

18 Branch
ARM
Android
defect
Not set
critical

Tracking

(firefox17 unaffected, firefox18- affected, firefox19+ fixed, firefox20 fixed)

RESOLVED FIXED
Firefox 20

People

(Reporter: scoobidiver, Assigned: kats)

References

Details

(4 keywords, Whiteboard: [native-crash])

Crash Data

Attachments

(1 file)

It has been hit by 6 users in the latest build and two users in Aurora (18.0a2/20121020 and 18.0a2/20121013). Let's assume it's a regression in 19.0. The regression range is:
http://hg.mozilla.org/mozilla-central/pushloghtml?fromchange=48502b61a63e&tochange=93cc1ee94291
Signature: huge_dalloc | __wrap_free | Java_org_mozilla_gecko_mozglue_DirectBufferAllocator_nativeFreeDirectBuffer
UUID: 2c8dc915-0b9c-43cf-8fe9-530ed2121025
Date Processed: 2012-10-25 02:05:44
Uptime: 2
Last Crash: 1.0 weeks before submission
Install Age: 5.6 hours since version was first installed.
Install Time: 2012-10-24 20:32:09
Product: FennecAndroid
Version: 19.0a1
Build ID: 20121024030643
Release Channel: nightly
OS: Android
OS Version: 0.0.0 Linux 3.0.15-833154-user #1 SMP PREEMPT Wed Jul 4 15:47:23 KST 2012 armv7l samsung/m0ub/m0:4.0.4/IMM76D/I9300UBBLG2:user/release-keys
Build Architecture: arm
Build Architecture Info:
Crash Reason: SIGSEGV
Crash Address: 0x10
App Notes: AdapterDescription: 'ARM -- Mali-400 MP -- OpenGL ES 2.0 -- Model: GT-I9300, Product: m0ub, Manufacturer: samsung, Hardware: smdk4x12' EGL? EGL+ GL Context? GL Context+ GL Layers? GL Layers+ samsung GT-I9300 samsung/m0ub/m0:4.0.4/IMM76D/I9300UBBLG2:user/release-keys
EMCheckCompatibility: True
Adapter Vendor ID: ARM
Adapter Device ID: Mali-400 MP
Device: samsung GT-I9300
Android API Version: 15 (REL)
Android CPU ABI: armeabi-v7a
Frame  Module  Signature  Source
0   libmozglue.so   huge_dalloc   jemalloc.c:2293
1   libmozglue.so   __wrap_free   jemalloc.c:6577
2   libmozglue.so   Java_org_mozilla_gecko_mozglue_DirectBufferAllocator_nativeFreeDirectBuffer   nsGeckoUtils.cpp:52
3   libdvm.so   libdvm.so@0x1ed72
4   dalvik-heap (deleted)   dalvik-heap @0xf1ff16
5   libdvm.so   libdvm.so@0x5902d
6   data@app@org.mozilla.fennec-2.apk@classes.dex   data@app@org.mozilla.fennec-2.apk@classes.dex@0x14f1cc
7   libmozglue.so   libmozglue.so@0x108fb
8   libdvm.so   libdvm.so@0xb3f9a
9   libdvm.so   libdvm.so@0x343b2
10  dalvik-heap (deleted)   dalvik-heap @0x100b8f6
11  dalvik-heap (deleted)   dalvik-heap @0x100b8f6
12  dalvik-heap (deleted)   dalvik-heap @0x100bb4e
13  libdvm.so   libdvm.so@0x9521e
...
More reports at: https://crash-stats.mozilla.com/report/list?signature=huge_dalloc+|+__wrap_free+|+Java_org_mozilla_gecko_mozglue_DirectBufferAllocator_nativeFreeDirectBuffer
I got this too. Pretty sure it's a recent regression. Happened on what seemed like startup, IIRC.
It stopped in 19.0a1/20121025. The working range is: http://hg.mozilla.org/mozilla-central/pushloghtml?fromchange=93cc1ee94291&tochange=5c82f5a5e90d It might have been caused by the bustage of bug 803013 or bug 795259.
If it's hit in Aurora, sounds like an existing bug that may be more readily exposed due to progressive rendering being turned on (bug 795259). I wonder if it's related to this crash I got on try? https://tbpl.mozilla.org/php/getParsedLog.php?id=16466934&tree=Try&full=1#error0 This was an intermittent orange that I saw both with and without tiles, so I ignored it. Perhaps it's more prevalent now...
Cc'ing some people in case they have any ideas - I'd very much like bug 795259 to stick, so will try to get this fixed quickly.
There's a chance this could be a race condition - freeBuffer in BufferedCairoImage isn't synchronised. I'd like to have a better stack rather than just making that assumption, but it might be worth checking in a patch for this (freeBuffer shouldn't be called regularly anyway, so I don't think it's a performance issue to synchronise freeing) and seeing if the crash goes away.
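For illustration, here is a minimal sketch of what synchronising the free could look like. This is not the Fennec source: `GuardedBuffer`, `nativeFrees` and `isFreed` are hypothetical names, and the native free is only simulated by a counter. The point is just that the null check and the clearing of the field happen as one atomic step, so a second caller cannot free the same buffer again.

```java
import java.nio.ByteBuffer;

// Hypothetical stand-in for a class like BufferedCairoImage; not the Fennec source.
class GuardedBuffer {
    private ByteBuffer buffer;   // guarded by "this"
    private int nativeFrees;     // counts how often the (simulated) native free ran

    GuardedBuffer(int size) {
        buffer = ByteBuffer.allocateDirect(size);
    }

    // synchronized makes the null check and the clear one atomic step,
    // so a second caller always sees null and skips the free.
    synchronized void freeBuffer() {
        if (buffer == null) {
            return;
        }
        nativeFrees++;           // real code would call the native free here
        buffer = null;
    }

    synchronized int nativeFrees() { return nativeFrees; }
    synchronized boolean isFreed() { return buffer == null; }
}
```

Calling `freeBuffer()` twice on the same object runs the simulated free once; the second call is a no-op.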
In fact, I wonder if this is caused by disabling the screenshot layer rather than anything to do with progressive tiles. Perhaps this is what happens if the pref return happens while a screenshot message is in flight and the free is called in the notifyScreenshot method in ScreenshotHandler? There's definitely suspicious-looking code here.
I don't think this is related to tiling stuff at all. If anything I think it might be related to the thumbnail changes that have been happening recently (i.e. possibly related to bug 787765 and bug 803687). Also, the crash appears to be happening inside jemalloc, so I would like somebody familiar with that code to look at it and see why it might be happening. AFAIK a crash inside jemalloc indicates a bug inside jemalloc regardless of whatever else might be happening. CC'ing :jlebar and :glandium.
Blocks: 438871
Whiteboard: [native-crash][startupcrash] → [native-crash][startupcrash][orange]
(In reply to Kartikaya Gupta (:kats) from comment #7)
> AFAIK a crash inside jemalloc indicates a bug inside jemalloc
> regardless of whatever else might be happening. CC'ing :jlebar and :glandium.
A crash inside jemalloc is usually an allocator mismatch (trying to free a pointer that was malloc'ed by the system malloc).
That seems unlikely in this case because the pointer to this memory is stashed inside a java ByteBuffer object and can't just be passed around randomly. Could the crash happen because of a double free? That seems like a more likely possibility from the way the code is used.
Double-free is a possibility too.
Looks likely to be freeing a not-alloc'ed pointer (double-free, alloc-mismatch, freeing an internal pointer, etc). You could test this by taking the following code in memory/mozjemalloc/jemalloc.c
    node = extent_tree_ad_search(&huge, &key);
    assert(node != NULL);
    assert(node->addr == ptr);
    extent_tree_ad_remove(&huge, node);
and s/assert/RELEASE_ASSERT/. (In fact, I'd probably take that as a patch to check in.) If one of the asserts gets hit (probably the first one), then jemalloc doesn't know about this pointer you're trying to free. If OTOH neither assertion fires, then it's possible the heap is corrupted.
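To make the idea of the check concrete, here is a toy model written in Java rather than C. It is not jemalloc code: `HugeTable`, `recordAlloc` and `liveCount` are invented for this sketch, and a `TreeMap` merely stands in for the allocator's table of live huge allocations. It shows the same shape of logic: look the address up, and fail loudly if the allocator has never heard of it, which is what a double free or an allocator mismatch would look like.

```java
import java.util.TreeMap;

// Toy model of an allocator's table of live "huge" allocations, keyed by address.
// It only mirrors the lookup-then-remove shape of the check; it is not jemalloc.
class HugeTable {
    private final TreeMap<Long, Integer> live = new TreeMap<>(); // address -> size

    void recordAlloc(long addr, int size) {
        live.put(addr, size);
    }

    // Analogue of a release-mode assert: fail loudly on an unknown pointer
    // instead of carrying on with a missing table entry.
    void free(long addr) {
        Integer size = live.remove(addr);
        if (size == null) {
            throw new IllegalStateException(
                "free of unknown pointer 0x" + Long.toHexString(addr));
        }
    }

    int liveCount() { return live.size(); }
}
```

In this model, freeing an address that was recorded succeeds once; freeing it a second time throws, because the entry is already gone.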
I hit this on startup. I have not been able to reproduce.
* Current nightly
* Had an article in reader mode
* Used some other apps; Firefox was in the background, likely task-killed by the OS
* Started up Firefox and pressed the new tab button
This may be related to bug 805355. The preference to disable the screenshot exposes some nasty race conditions. Shouldn't we get the Java stacktrace from these crashes?
(In reply to Benoit Girard (:BenWa) from comment #14)
> Shouldn't we get the Java stacktrace from these crashes?
Only if it throws a Java exception, as the exception handler is what annotates the stack trace, AFAIK. So this could mean that we might not have a Java exception thrown in those cases - or that the mechanism for sending the stack traces is broken.
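As a rough illustration of why a native crash would carry no Java stack, here is a sketch of an uncaught-exception handler that records a trace. `StackAnnotator` and `lastTrace` are hypothetical names, not the Fennec crash reporter; only `Thread.UncaughtExceptionHandler` and `Throwable.printStackTrace` are standard Java. The handler runs only when a Java exception escapes a thread, so a SIGSEGV in native code never reaches it.

```java
import java.io.PrintWriter;
import java.io.StringWriter;

// Hypothetical annotator: records the Java stack of an uncaught exception.
class StackAnnotator implements Thread.UncaughtExceptionHandler {
    private volatile String lastTrace;   // stays null until a Java exception reaches the handler

    @Override
    public void uncaughtException(Thread t, Throwable e) {
        StringWriter sw = new StringWriter();
        e.printStackTrace(new PrintWriter(sw, true));
        lastTrace = sw.toString();
    }

    String lastTrace() { return lastTrace; }
}
```

It would be installed with `thread.setUncaughtExceptionHandler(new StackAnnotator())`. If no Java exception is thrown, `lastTrace()` remains null, which matches the "no Java stack in the report" case discussed here.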
This may be related to bug 722166, which boasts a huge collection of various Android test-crashes-on-startup logs, typically SIGSEGVs with stack traces showing little more than libc, libskia, and libdvm.
tracking-fennec: ? → 18+
Assignee: nobody → snorp
Let's track for FF18 once it's clear that version is affected (not clear at this point).
Assignee: snorp → bugmail.mozilla
Whiteboard: [native-crash][startupcrash][orange] → [native-crash][startupcrash]
It's now a low volume crash across builds. It stopped spiking after 19.0a1/20121118. The working range is: http://hg.mozilla.org/mozilla-central/pushloghtml?fromchange=b959971b8219&tochange=4fddb9923ef0
tracking-fennec: 18+ → ?
Keywords: topcrash
Version: Firefox 19 → Firefox 18
If I'm reading the stats right, this hasn't happened on nightly since bug 817067 landed, which is promising. Will leave open a few more days to confirm.
untracking, given comment 18
tracking-fennec: ? → ---
Hm. There's still one crash after bug 817067 landed (bp-847e820b-6c56-45c5-90e5-8c7af2121206). I'll give it some more time...
Attached patch: Synchronize freeBuffer (deleted)
The crash volume has definitely dropped; I suspect that the one crash in comment 21 is coming from BufferedCairoImage rather than the thumbnail code. It appears that there is a race condition there as well, as the freeBuffer() function could be called on two different threads concurrently (on the compositor thread via PluginLayer.performUpdates and on the UI thread via destroy). Patch attached.
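The two-thread scenario can be exercised in a small harness. The sketch below is only for comparison and is not what the attached patch does as far as this bug shows: instead of a lock it uses `AtomicReference.getAndSet` so that exactly one caller receives the buffer. `OnceFreedImage` and `raceTwoThreads` are hypothetical names, the thread names are labels only, and the native free is simulated by a counter.

```java
import java.nio.ByteBuffer;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.concurrent.atomic.AtomicReference;

// Hypothetical image holder; the thread names below are labels only.
class OnceFreedImage {
    private final AtomicReference<ByteBuffer> buffer;
    private final AtomicInteger nativeFrees = new AtomicInteger();

    OnceFreedImage(int size) {
        buffer = new AtomicReference<>(ByteBuffer.allocateDirect(size));
    }

    // getAndSet hands the buffer to exactly one caller; everyone else gets null.
    void freeBuffer() {
        ByteBuffer b = buffer.getAndSet(null);
        if (b != null) {
            nativeFrees.incrementAndGet();   // real code would free natively here
        }
    }

    int nativeFrees() { return nativeFrees.get(); }

    // Releases two threads at once so both call freeBuffer() concurrently,
    // then reports how many simulated native frees actually ran.
    static int raceTwoThreads() throws InterruptedException {
        OnceFreedImage image = new OnceFreedImage(16);
        CountDownLatch start = new CountDownLatch(1);
        Runnable task = () -> {
            try {
                start.await();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                return;
            }
            image.freeBuffer();
        };
        Thread compositor = new Thread(task, "compositor");
        Thread ui = new Thread(task, "ui");
        compositor.start();
        ui.start();
        start.countDown();
        compositor.join();
        ui.join();
        return image.nativeFrees();
    }
}
```

Whichever thread wins the race, the simulated free runs once. A synchronized method gives the same guarantee and is simpler to read, which matters more here since freeBuffer is not expected to be called often.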
Attachment #690369 - Flags: review?(snorp)
Attachment #690369 - Flags: review?(snorp) → review+
Status: NEW → RESOLVED
Closed: 12 years ago
Resolution: --- → FIXED
Target Milestone: --- → Firefox 20
Status: RESOLVED → REOPENED
Resolution: FIXED → ---
There are no crashes after 20.0a1/20121214.
Almost certainly fixed by bug 817134 then.
Status: REOPENED → RESOLVED
Closed: 12 years ago → 12 years ago
Depends on: 817134
Resolution: --- → FIXED
Bug 817134 was uplifted to Branch 19.0.
It's #10 top crasher in 19.0b1.
Keywords: topcrash
Whiteboard: [native-crash][startupcrash] → [native-crash]
(In reply to Scoobidiver from comment #29)
> It's #10 top crasher in 19.0b1.
I guess you meant to nominate it for 19? Going ahead and tracking for that.
(In reply to bhavana bajaj [:bajaj] from comment #30)
> I guess you meant to nom it for 19?
Yes.
(In reply to Kartikaya Gupta (email:kats@mozilla.com) from comment #27)
> Almost certainly fixed by bug 817134 then.
No, because its uplift to Beta hasn't fixed it in 19.0b1.
(In reply to Kartikaya Gupta (email:kats@mozilla.com) from comment #27)
> Almost certainly fixed by bug 817134 then.
Kats - what next steps do you recommend here? Could another fix around 20.0a1/20121214 have resolved this issue and require uplift?
If I'm reading the crash-stats right, this crash is no longer happening in 19.0b2, although it was happening in 19.0b1. This makes sense since looking at the graph of mozilla-beta [1] the uplift of bug 817134 didn't go into 19.0b1 but did go into 19.0b2. So I believe this is fixed in 19+ and no further action is required. [1] https://hg.mozilla.org/releases/mozilla-beta/graph/301d54b3c444
(In reply to Kartikaya Gupta (email:kats@mozilla.com) from comment #33)
> If I'm reading the crash-stats right, this crash is no longer happening in
> 19.0b2, although it was happening in 19.0b1. This makes sense since looking
> at the graph of mozilla-beta [1] the uplift of bug 817134 didn't go into
> 19.0b1 but did go into 19.0b2.
You're right. I hadn't looked at the right view in Mercurial.
> So I believe this is fixed in 19+ and no further action is required.
Definitively fixed in 19.0 Beta 2.
Product: Firefox for Android → Firefox for Android Graveyard