Closed Bug 572011 Opened 15 years ago Closed 12 years ago

Crash @ nsDiskCacheStreamIO::FlushBufferToFile

Categories

(Core :: Networking: Cache, defect)

x86
Windows XP
defect
Not set
critical

Tracking

()

VERIFIED FIXED
mozilla21
Tracking Status
firefox17 --- affected
firefox18 --- affected
firefox19 - ---
firefox20 + verified
firefox21 --- verified

People

(Reporter: Ricmacas, Assigned: michal)

References

Details

(Keywords: crash, topcrash, Whiteboard: [tbird crash])

Crash Data

Attachments

(1 file, 1 obsolete file)

Hello, one user intially reported this crash on SUMO Twitter 3.6.3 http://crash-stats.mozilla.com/report/index/219cdbde-e406-4f0f-8c60-00ccc2100612 But we asked the user to upgrade to 3.6.4 beta 7, and the crashes have now this signature (presented in summary): 3.6.4 http://crash-stats.mozilla.com/report/index/c21105ba-e160-4a63-92a1-b19732100613 I'll have more information when possible. All crash reports are in clean profiles.
0 ntdll.dll strstr 1 xul.dll nsDiskCacheStreamIO::FlushBufferToFile netwerk/cache/src/nsDiskCacheStreams.cpp:731 2 xul.dll nsDiskCacheStreamIO::Write netwerk/cache/src/nsDiskCacheStreams.cpp:587 3 xul.dll nsDiskCacheOutputStream::Write netwerk/cache/src/nsDiskCacheStreams.cpp:257 4 xul.dll nsCacheEntryDescriptor::nsOutputStreamWrapper::Write netwerk/cache/src/nsCacheEntryDescriptor.cpp:647 5 xul.dll nsInputStreamTee::TeeSegment xpcom/io/nsInputStreamTee.cpp:80 6 xul.dll nsInputStreamTee::WriteSegmentFun xpcom/io/nsInputStreamTee.cpp:109 7 xul.dll nsPipeInputStream::ReadSegments xpcom/io/nsPipe3.cpp:799 8 xul.dll nsInputStreamTee::ReadSegments xpcom/io/nsInputStreamTee.cpp:156 9 xul.dll nsStreamLoader::OnDataAvailable netwerk/base/src/nsStreamLoader.cpp:180 10 xul.dll nsStreamListenerTee::OnDataAvailable netwerk/base/src/nsStreamListenerTee.cpp:108 11 xul.dll nsHttpChannel::OnDataAvailable netwerk/protocol/http/src/nsHttpChannel.cpp:5388 12 xul.dll nsInputStreamPump::OnStateTransfer netwerk/base/src/nsInputStreamPump.cpp:508 13 xul.dll nsInputStreamPump::OnInputStreamReady netwerk/base/src/nsInputStreamPump.cpp:398 14 xul.dll nsInputStreamReadyEvent::Run xpcom/io/nsStreamUtils.cpp:191 15 xul.dll nsThread::ProcessNextEvent xpcom/threads/nsThread.cpp:527 16 xul.dll mozilla::ipc::MessagePump::Run ipc/glue/MessagePump.cpp:142
Keywords: crash
Summary: Crash [@ strstr | nsDiskCacheStreamIO::FlushBufferToFile() ] → Crash [@ strstr][@ nsDiskCacheStreamIO::FlushBufferToFile() ]
Not a result of locked or corrupted files, because clean install didn't fix this issue.
(In reply to comment #2) > Not a result of locked or corrupted files, because clean install didn't fix > this issue. What about a new Firefox profile and manually copying important data over?
Tobbi, > All crash reports are in clean profiles. Besides, if you checked the crash reports, you could clearly see they are clean. This user has clean profile, in a clean install.
Component: General → Networking: Cache
Product: Firefox → Core
QA Contact: general → networking.cache
Version: 3.6 Branch → 1.9.2 Branch
the 3.6.4 and 3.6.4 crash don't look like the share much common code on the stacks. they could be two entirely different problems, but tickled by some underlying problem(s). The nsDiskCacheStreamIO::FlushBufferToFile might have 6 different associated signatures nsDiskCacheStreamIO::FlushBufferToFile signature breakdown signature distribution 6 signature list 475 strstr | nsDiskCacheStreamIO::FlushBufferToFile() 16 strstr | @0x1318d523 | nsDiskCacheStreamIO::FlushBufferToFile() 9 nsDiskCacheStreamIO::FlushBufferToFile() 1 strstr | @0x1310cfd3 | nsDiskCacheStreamIO::FlushBufferToFile() 1 strlen | @0x7d50d53 | nsDiskCacheStreamIO::FlushBufferToFile() 1 strlen | @0x4a40d53 | nsDiskCacheStreamIO::FlushBufferToFile() Correlation to startup or time of session 503 total crashes for nsDiskCacheStreamIO::FlushBufferToFile on 20100614-crashdata.csv 138 startup crashes inside 30 sec. 228 startup crashes inside 3 min. 141 repeated crashes inside 3 min. of last crash but they all seem to be happening on winXP os breakdown nsDiskCacheStreamIO::FlushBufferToFileTotal 9 Win5.1 1.00 Win6.0 0.00 Win6.1 0.00 and the problem stretches back across many releases. checking --- nsDiskCacheStreamIO::FlushBufferToFile 20100614-crashdata.csv found in: 3.6.3 3.5.9 3.6.4 3.0.19 3.6 3.6.2 3.0.4 3.5.1 3.0 3.5.7 3.5.5 3.5.3 3.5.2 3.5 3.0.7 3.0.11 3.0.10 3.0.1 release total-crashes nsDiskCacheStreamIO::FlushBufferToFile crashes pct. all 408812 503 0.00123039 3.6.3 270109 369 0.00136612 3.5.9 33612 52 0.00154707 3.6.4 50079 29 0.000579085 3.0.19 12129 16 0.00131915 3.6 11172 10 0.000895095 3.6.2 3238 5 0.00154416 3.0.4 541 5 0.00924214 3.5.1 439 3 0.00683371 3.0 1120 3 0.00267857 ... If the profiles are clean, one thing I'd suggest checking is for disk corruption on the systems where people are hitting this, and we should try to isolate to possible file system problems.
Well, both crash [@ strstr], the rest is unrelated...
User reports crashes in plugincontainer.exe in new profile and clean install, more info in a sec.
User claims he did a disk check and everything's fine.
He has scanned his system using Avira, Malware Bytes and Super Anti-Spyware
seen also in thunderbird - example bp-f1e0c3ae-f635-42db-9372-483252100625. some thunderbird also has stack variations: strlen | @0x1910ae8 | nsDiskCacheStreamIO::FlushBufferToFile() strlen | @0x2210ae8 | nsDiskCacheStreamIO::FlushBufferToFile() strlen | @0x2ae0ae8 | nsDiskCacheStreamIO::FlushBufferToFile() Odd, no thunderbird crashes with strlstr... sig prior to calendar date Jun 07, 2010 13:18 for any version, even though there is a 3.0 crash bp-b0d17e78-09c7-480d-9880-db8c12100607 (and it's the only crash for that user - no other versions appear for that OS) mod summary - socorro prefers strstr | nsDiskCacheStreamIO::FlushBufferToFile()
Summary: Crash [@ strstr][@ nsDiskCacheStreamIO::FlushBufferToFile() ] → Crash [@ strstr][@ nsDiskCacheStreamIO::FlushBufferToFile() ] [@ strstr | nsDiskCacheStreamIO::FlushBufferToFile()]
Crash Signature: [@ strstr] [@ nsDiskCacheStreamIO::FlushBufferToFile() ] [@ strstr | nsDiskCacheStreamIO::FlushBufferToFile()]
Crash Signature: [@ strstr] [@ nsDiskCacheStreamIO::FlushBufferToFile() ] [@ strstr | nsDiskCacheStreamIO::FlushBufferToFile()] → [@ strstr] [@ nsDiskCacheStreamIO::FlushBufferToFile() ] [@ strstr | nsDiskCacheStreamIO::FlushBufferToFile()]
It's #26 top browser crasher in 10.0b6 while only #137 in 9.0.1.
It's #61 top browser crasher in 10.0.1, #28 in 11.0b2, and #39 in 12.0a2. It might be related to bug 727097. More reports at: https://crash-stats.mozilla.com/report/list?signature=nsDiskCacheStreamIO%3A%3AFlushBufferToFile%28%29
Keywords: topcrash
Summary: Crash [@ strstr][@ nsDiskCacheStreamIO::FlushBufferToFile() ] [@ strstr | nsDiskCacheStreamIO::FlushBufferToFile()] → Crash @ nsDiskCacheStreamIO::FlushBufferToFile
nsDiskCacheStreamIO::FlushBufferToFile() is rare for thunderbird - roughly a dozen in a week
Whiteboard: [tbird crash]
It's now #4 top browser crasher in 10.0.2 and #6 in 11.0. It is correlated to Yandex Toolbar and Спутник @Mail.Ru: 38% (345/904) vs. 7% (4472/65399) {37964A3C-4EE8-47b1-8321-34DE2C39BA4D} (Спутник @Mail.Ru) 38% (348/904) vs. 8% (5036/65399) yasearch@yandex.ru (Yandex.Bar, https://addons.mozilla.org/addon/3495) It is also correlated to Windows Based Wireless Networks: 34% (309/904) vs. 6% (3695/65399) wlanapi.dll
This stays constantly at high ranks. #7 in yesterday's data for 12.* Any chance someone can find out what's going on there?
Now it's at #5 on 12.
And this is #3 on 13 Beta in yesterday's data. Any chance we can do something about it?
Version: 1.9.2 Branch → Trunk
Michal - Can you take a look at this?
Assignee: nobody → michal.novotny
Ted suggested that it might be a problem with LSP's based on the stack trace.
It's possible that these are not all the same crash, but I grabbed the minidump from https://crash-stats.mozilla.com/report/index/90b4f9bf-b59b-4671-af96-d61122120613 . WinDBG gives me the exact same stack as Breakpad. Running minidump_stackwalk locally shows me: Thread 25 (crashed) 0 0x16c26f eip = 0x0016c26f esp = 0x084ffde0 ebp = 0x084ffe10 ebx = 0xffe92500 esi = 0x00176b01 edi = 0x00176b00 eax = 0x00000000 ecx = 0x07a6e000 edx = 0xffffffff efl = 0x00010246 Found by: given as instruction pointer in context 1 0x15d30c eip = 0x0015d30d esp = 0x084ffe18 ebp = 0x084ffe30 Found by: previous frame's frame pointer 2 xul.dll!nsDiskCacheStreamIO::FlushBufferToFile() [nsDiskCacheStreams.cpp:2b6 43ea8edf9 : 791 + 0xe] eip = 0x012a1be6 esp = 0x084ffe38 ebp = 0x084ffe5c Found by: previous frame's frame pointer So it's not as though Breakpad is resorting to stack scanning here, it's using frame pointers. I also ran another tool I have that scans the stack looking for anything that looks like a return address (http://hg.mozilla.org/users/tmielczarek_mozilla.com/dump-lookup/), and it doesn't show anything more useful on the stack above FlushBufferToFile. I also looked at the list of modules that we ever loaded in the process, and there didn't seem to be anything at the address where we're crashing.
This is #13 top crash on the list for 13.0.1.
This seems to be spiking again. I'm pretty sure it's connected to some third-party software as it's rising on multiple versions at the same time, but I wonder if there's something we can do here.
I have possibly some options to reduce or remove this crasher...
(In reply to Alfred Kayser from comment #25) > I have possibly some options to reduce or remove this crasher... Cool !!!
1. Remove 'offset' from nsDiskCacheStream::GetOutputStream (it is always zero). This simplifies the code, particular around the buffer handling. 2. Use buffer if and only aslong as the data fits in cacheblocks, otherwise directly write to the file (skipping the buffer) (the write to the file is buffered itself). This reduces the need for the buffer and chances of crashes. 3. Merge nsDiskCacheOutputStream with nsDiskCacheStreamIO: less malloc's and less code overall: reduces risk of crashing.
Depends on: 813935
Depends on: 814010
No longer depends on: 813935
Blocks: 405407
Here are some correlations from Firefox 17 that implicate Yandex in the crash on the Firefox side: nsDiskCacheStreamIO::FlushBufferToFile()|EXCEPTION_ACCESS_VIOLATION_READ (228 crashes) 43% (98/228) vs. 4% (1843/43394) yasearch@yandex.ru (Yandex.Bar, https://addons.mozilla.org/addon/3495) 42% (96/228) vs. 4% (1728/43394) vb@yandex.ru 37% (84/228) vs. 2% (947/43394) {37964A3C-4EE8-47b1-8321-34DE2C39BA4D} 15% (35/228) vs. 0% (106/43394) {6236BA26-C117-4007-928C-DE0716C7FA78} (Chameleon Bob: layouts for Facebook, https://addons.mozilla.org/addon/11584) 15% (35/228) vs. 0% (108/43394) {6236BA26-C117-4007-928C-DE0716C7FA48} 15% (35/228) vs. 0% (109/43394) {6236BA26-C117-4007-928C-DE0716C7FA58} 15% (35/228) vs. 0% (112/43394) {6236BA26-C117-4007-928C-DE0716C7FA68} (Express Tab, https://addons.mozilla.org/addon/14584) 15% (35/228) vs. 0% (120/43394) {6236BA26-C117-4007-928C-DE0716C7FA38} 15% (35/228) vs. 0% (153/43394) {4933189D-C7F7-4C6E-834B-A29F087BFD23} 12% (28/228) vs. 2% (889/43394) jqs@sun.com (Java Quick Starter, http://java.sun.com/javase/downloads/) 10% (23/228) vs. 0% (30/43394) plusmo@plusmo.ru 9% (21/228) vs. 0% (25/43394) player@zaycev.fm 6% (14/228) vs. 0% (18/43394) 501891365d7e4@501891365d81d.info 6% (14/228) vs. 1% (399/43394) {dfefbe51-ca52-484b-adf0-6b158b05262d} 98% (223/228) vs. 93% (40195/43394) {972ce4c6-7e08-4474-a285-3208198ce6fd} (Default, https://addons.mozilla.org/addon/8150) 6% (14/228) vs. 1% (437/43394) {5911488E-9D1E-40ec-8CBB-06B231CC153F} 6% (14/228) vs. 1% (476/43394) gtffxtbr@GamingWonderland.com
The nsDiskCacheStreamIO::FlushBufferToFile() signature is #21 on 18.0b3 and up to #11 on 17.0.1 release. As the Yandex Bar add-on is completely JS-based (no binary code), any crash it triggers is very probably a problem in our code and we should investigate that, esp. with how much-used this add-on is in Russia. sfink, you have done work with debugging crashes with this add-on in bug 770238, can you help here as well?
The fix in bug 814010 will likely fix this bug. It seems risky to uplift it in branches.
Longstanding issue, we'll look into uplifting bug 814010 for FF19.
Attached patch Don't crash on write of empty buffer (obsolete) (deleted) — Splinter Review
Still causing crashes (although less than before). The crash is at PR_Write with access to 0x0. As mFD cannot be zero at that point (see code before), the other suspect is mBuffer: if (PR_Write(mFD, mBuffer, mStreamEnd) != (int32_t)mStreamEnd) { mBuffer could be zero when there is no buffer allocated (either failed to or first chunk is directly greater than 16k). In both cases, mStreamEnd will be zero, as there is nothing in the buffer. So to prevent PR_Write crashing, a check around it with mBuffer != null and mStreamEnd > 0 will help preventing to crash here.
Attachment #700937 - Flags: review?(michal.novotny)
Comment on attachment 700937 [details] [diff] [review] Don't crash on write of empty buffer First, I think that this crash isn't caused by mBuffer being null. See comment #6 in bug #752266. Also see discussion in bug #801394. BTW, this bug is also Windows only problem. Anyway, I agree that we should make sure that we don't pass null to PR_Write(). But I don't think that we should try to survive malloc() failure, because if we are out of memory the application will crash sooner or later. So I propose to change malloc()/realloc() with infallible moz_xmalloc()/moz_xrealloc() and to place a check before PR_Write() that aborts Firefox (even in release build) in case mBuffer is null, something like this: if (!mBuffer) { NS_RUNTIMEABORT("Fix me!"); }
Attachment #700937 - Flags: review?(michal.novotny) → review-
There can be multiple issues that can cause PR_Write to crash: 1. Failed to allocate mBuffer, because of lack of memory... 2. No mBuffer allocated because first part is > 16K, so the buffer allocate stage is skipped. 3. No mBuffr allocated because count = 0. Number 1 is crisis-mode, but should just return OUT_OF_MEMORY from the write call, so that stream writer can handle it. Number 2 and 3 should be protected within the code as in patch https://bugzilla.mozilla.org/attachment.cgi?id=700937. Number 2 will happen when chunks larger than 16K are written in the first call to Write. So, while number 1 is memory or virus or such related, number 2 and 3 are to be fixed by this patch.
Whiteboard: [tbird crash] → [tbird crash][startupcrash]
Depending on the crash signatures, 43% or 32% of crashes occur within one minute. It doesn't qualify this bug for the startupcrash whiteboard that requires more than 50% of crashes.
Whiteboard: [tbird crash][startupcrash] → [tbird crash]
Hmm, for Firefox Aurora, I did see the startup "rocket" symbol on it, that's what I told Alex in the channel meeting and he marked it that way. I now see that it barely qualifies there: https://crash-stats.mozilla.com/report/list?range_value=7&range_unit=days&date=2013-01-25&signature=nsDiskCacheStreamIO%3A%3AFlushBufferToFile%28%29&version=Firefox%3A20.0a2
(In reply to Alfred Kayser from comment #35) > Number 1 is crisis-mode, but should just return OUT_OF_MEMORY from the write > call, so that stream writer can handle it. As I wrote, use infallible memory allocators and don't try to handle OOM. The rest of the patch is OK.
Attached patch V2: Use moz_xmalloc (deleted) — Splinter Review
Attachment #700937 - Attachment is obsolete: true
Attachment #709359 - Flags: review?
Attachment #709359 - Flags: review? → review?(michal.novotny)
Attachment #709359 - Flags: review?(michal.novotny) → review+
Status: NEW → RESOLVED
Closed: 12 years ago
Resolution: --- → FIXED
Target Milestone: --- → mozilla21
This needs to be applied to Aurora/Mozilla20 as it fixes a potential crash issue caused by the patch of bug 572011 which is applied to Mozilla20.
Target Milestone: mozilla21 → mozilla20
If tracking gets denied we may have to ask for approval of the patch. But the target milestone is still 21.
Target Milestone: mozilla20 → mozilla21
(In reply to Alfred Kayser from comment #43) > This needs to be applied to Aurora/Mozilla20 as it fixes a potential crash > issue caused by the patch of bug 572011 which is applied to Mozilla20. Here it's already bug 572011, a long standing issue. Are you sure you posted in the right bug?
This is the right bug with the patch that needs to go into Mozilla20. I meant to refer to bug 814010. See comment 19 in there.
BTW, "Target Milestone" means when it is supposed to be or was fixed on mozilla-central. If you want to indicate what other versions are affected, that's what the status-* flags are about (setting that for 20), tracking-* means that we should track this for a given train, if it's just about getting a patch uplifted, all that's needed is the relevant approval-* flag(s) on the patch (and switching the fitting status flag to fixed once the uplift lands).
Tracking - go ahead with nominating the patch for aurora uplift.
We'll be merging Aurora to the Beta channel next Tuesday so an uplift nomination and landing would be great to have before then so this crash doesn't hit our larger audience.
Flags: needinfo?(michal.novotny)
Comment on attachment 709359 [details] [diff] [review] V2: Use moz_xmalloc [Approval Request Comment] Bug caused by (feature/regressing bug #): bug 814010 User impact if declined: crash Testing completed (on m-c, etc.): on m-c for a week now. Risk to taking this patch (and alternatives if risky): very low risk IMO. Just a couple if (null) checks and switch to use moz_xmalloc from malloc. Trivial. String or UUID changes made by this patch: none.
Attachment #709359 - Flags: approval-mozilla-aurora?
Flags: needinfo?(michal.novotny)
Comment on attachment 709359 [details] [diff] [review] V2: Use moz_xmalloc Thank you - please go ahead with this low risk uplift.
Attachment #709359 - Flags: approval-mozilla-aurora? → approval-mozilla-aurora+
Status: RESOLVED → VERIFIED
I took a quick look at the six post-fix crashes, and I'm guessing that the problem is that mFD is corrupted somehow - at http://hg.mozilla.org/releases/mozilla-beta/annotate/64a66423dbd3/netwerk/cache/nsDiskCacheStreams.cpp#l706, FlushBufferToFile() calls PR_Write, which calls through a function pointer mFD->methods->write. If that's pointing somewhere odd, that would explain why strstr (or other seemingly unrelated symbols) shows up at the top of the stack. The most likely candidate for corrupt mFD would be use-after-free (most likely in the form of use-after-close), but based on a quick review I'm not seeing an obvious place where that would happen.
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Creator:
Created:
Updated:
Size: