Closed
Bug 790649
Opened 12 years ago
Closed 12 years ago
Worker thread crash @ffi_call
Categories
(Toolkit Graveyard :: OS.File, defect)
Tracking
(firefox16 verified, firefox17 verified, firefox18 verified, firefox-esr10 unaffected)
VERIFIED
FIXED
mozilla18
Tracking | Status | |
---|---|---|
firefox16 | --- | verified |
firefox17 | --- | verified |
firefox18 | --- | verified |
firefox-esr10 | --- | unaffected |
People
(Reporter: MatsPalmgren_bugz, Assigned: Yoric)
References
(Blocks 1 open bug, )
Details
(Keywords: crash, regression, sec-other, Whiteboard: [advisory-tracking-])
Attachments
(4 files, 4 obsolete files)
(deleted),
text/plain
|
Details | |
(deleted),
patch
|
Yoric
:
review+
akeybl
:
approval-mozilla-aurora+
akeybl
:
approval-mozilla-beta+
|
Details | Diff | Splinter Review |
(deleted),
patch
|
Yoric
:
review+
|
Details | Diff | Splinter Review |
(deleted),
application/zip
|
Details |
STR
1. load URL
(mozilla-inbound, rev 2bbc276ffd2a, debug build)
ACTUAL RESULTS
First-chance exception at 0x7736f9d2 in firefox.exe: 0xC0000008: An invalid handle was specified.
ffi_call
js::ctypes::CDataFinalizer::CallFinalizer
js::ctypes::CDataFinalizer::Methods::Dispose
js::CallJSNative
js::InvokeKernel
js::Interpret
...
see attached stack for details
Comment 1•12 years ago
|
||
Dupe of bug 790633?
Reporter | ||
Comment 2•12 years ago
|
||
Maybe, marking dependent for now. I'll re-test when that bug is fixed.
Depends on: 790633
Assignee | ||
Comment 4•12 years ago
|
||
Attaching an experimental fix. Testing as soon as Windows build is complete.
Assignee: nobody → dteller
Attachment #661791 -
Flags: feedback?
Assignee | ||
Comment 5•12 years ago
|
||
I have not been able to reproduce the problem yet, and I suspect that it depends on how many images are present in the thumbnails cache. I will keep trying, but in the meantime, I have attached an experimental fix, if anybody wants to test it.
Comment 6•12 years ago
|
||
(In reply to David Rajchenbach Teller [:Yoric] from comment #5)
> I have not been able to reproduce the problem yet, and I suspect that it
> depends on how many images are present in the thumbnails cache. I will keep
> trying, but in the meantime, I have attached an experimental fix, if anybody
> wants to test it.
I'll test to see if I can still reproduce on the latest mc. Will post back today. If I can reproduce it easily I'll apply your patch and see if it fixes it.
One note, I was poking through crash stats looking for crashes that fit the signature I was seeing but didn't find any. So unless I'm looking for the wrong stack token this doesn't seem wide spread.
Reporter | ||
Comment 7•12 years ago
|
||
Fwiw, I can still reproduce with the attached patch.
Re: crash stats - I'm pretty sure I haven't seen this crash stack before
for this particular test. There's about 200 or so reports matching
"ffi_call" in the past 4 weeks:
https://crash-stats.mozilla.com/query/query?product=Firefox&version=ALL%3AALL&range_value=4&range_unit=weeks&query_search=signature&query_type=contains&query=ffi_call&reason=&build_id=&process_type=any&hang_type=any&do_query=1
"_PR_SocketSetSocketOption | ffi_call_win32" seems similar to this
in the sense that it has "dom::workers" on the stack. (the top stack
frame is bogus)
bp-ae38b16c-6324-457e-bb95-7ba3d2120905
"@0x0 | ffi_call_win32 " have reports for recent Nightlies, eg.
bp-65e93cba-9308-47ca-bb51-c7a632120914
these are on the main thread, with "js::CrossCompartmentWrapper::call"
on the stack.
Assignee | ||
Comment 8•12 years ago
|
||
I tend to believe that the PR_SocketSetSocketOption crash is unrelated.
Assignee | ||
Comment 9•12 years ago
|
||
Found a second error in the code. Mats, could you try it?
Attachment #661791 -
Attachment is obsolete: true
Attachment #661791 -
Flags: feedback?
Reporter | ||
Comment 10•12 years ago
|
||
v2 crashed with the same stack
Assignee | ||
Comment 11•12 years ago
|
||
Attaching fix v3.
Attachment #662353 -
Attachment is obsolete: true
Attachment #662464 -
Flags: feedback?(matspal)
Reporter | ||
Comment 12•12 years ago
|
||
v3 looks good so far; the crash used to occur within a couple of minutes,
but now it's still alive after an hour. Note that I have commented out
the "compartment mismatch" assertion though, since that's easily triggered
by this test (I have a JS stack for that assertion if it helps).
I'll leave the test running overnight...
This effectively stops me from debugging on Windows, since anytime I attach a debugger I just get Invalid Handle errors over and over again.
Severity: critical → blocker
Reporter | ||
Comment 14•12 years ago
|
||
Comment on attachment 662464 [details] [diff] [review]
Experimental fix, v3
fwiw, still no crash with this. khuey, you want to review? ;-)
Attachment #662464 -
Flags: review?(khuey)
Attachment #662464 -
Flags: feedback?(matspal)
Attachment #662464 -
Flags: feedback+
Comment on attachment 662464 [details] [diff] [review]
Experimental fix, v3
Review of attachment 662464 [details] [diff] [review]:
-----------------------------------------------------------------
Not really. I'm not familiar with ctypes or the Windows APIs here.
Attachment #662464 -
Flags: review?(khuey)
Assignee | ||
Updated•12 years ago
|
Assignee: dteller → nobody
Component: DOM: Workers → OS.File
Product: Core → Toolkit
QA Contact: dteller
Assignee | ||
Updated•12 years ago
|
Attachment #662464 -
Flags: review?(nfroyd)
Comment 16•12 years ago
|
||
Comment on attachment 662464 [details] [diff] [review]
Experimental fix, v3
Review of attachment 662464 [details] [diff] [review]:
-----------------------------------------------------------------
::: toolkit/components/osfile/osfile_win_back.jsm
@@ +89,5 @@
> // == would always return |false|.
> return INVALID_HANDLE;
> }
> return ctypes.CDataFinalizer(maybe, _CloseHandle);
> + };
Nit^2: I think the 'return' line above this is actually the one that's indented wrong.
@@ +106,5 @@
> + // == would always return |false|.
> + return INVALID_HANDLE;
> + }
> + return ctypes.CDataFinalizer(maybe, _FindClose);
> + };
Please change this into something like:
function possibly_invalid_HANDLE_import_from_C(finalizer) {
function import_from_C(maybe)
if (Types.int.cast(maybe).value == INVALID_HANDLE) {
// Explanatory comment
}
return ctypes.CDataFinalizer(maybe, finalizer);
}
return import_from_C;
}
and use that function in the appropriate places. I don't have a lot of experience with the Win32 APIs, but hopefully the refactoring will avoid issues in the future.
Attachment #662464 -
Flags: review?(nfroyd) → review+
Assignee | ||
Comment 17•12 years ago
|
||
(In reply to Nathan Froyd (:froydnj) from comment #16)
> @@ +106,5 @@
> > + // == would always return |false|.
> > + return INVALID_HANDLE;
> > + }
> > + return ctypes.CDataFinalizer(maybe, _FindClose);
> > + };
>
> Please change this into something like:
>
> function possibly_invalid_HANDLE_import_from_C(finalizer) {
> function import_from_C(maybe)
> if (Types.int.cast(maybe).value == INVALID_HANDLE) {
> // Explanatory comment
> }
> return ctypes.CDataFinalizer(maybe, finalizer);
> }
> return import_from_C;
> }
>
> and use that function in the appropriate places. I don't have a lot of
> experience with the Win32 APIs, but hopefully the refactoring will avoid
> issues in the future.
This is actually slightly more complicated, due to initialization order of variables, so cleanup and factorization of code will probably involve something a little more OO. Due to the blocker status of this bug, do you mind if I land the patch as is and cleanup in a followup?
Comment 18•12 years ago
|
||
(In reply to David Rajchenbach Teller [:Yoric] from comment #17)
> This is actually slightly more complicated, due to initialization order of
> variables, so cleanup and factorization of code will probably involve
> something a little more OO. Due to the blocker status of this bug, do you
> mind if I land the patch as is and cleanup in a followup?
I will take your word for it. :) Landing this as-is is fine. Please do file a followup after this bug has been made public.
Assignee | ||
Comment 19•12 years ago
|
||
Attaching the tested fix. Filed a followup refactoring as bug 792668.
Assignee: nobody → dteller
Attachment #662464 -
Attachment is obsolete: true
Attachment #662810 -
Flags: review+
Assignee | ||
Updated•12 years ago
|
Keywords: checkin-needed
Assignee | ||
Comment 20•12 years ago
|
||
I finally have a scenario that could explain why the testsuite passed but FF itself crashes for something that should not be a fatal error: according to http://msdn.microsoft.com/en-us/library/windows/desktop/ms724211%28v=vs.85%29.aspx
« If the application is running under a debugger, the function will throw an exception if it receives either a handle value that is not valid or a pseudo-handle value. This can happen if you close a handle twice, or if you call CloseHandle on a handle returned by the FindFirstFile function instead of calling the FindClose function. »
In other words, it is possible that the crashes happen only in presence of a debugger.
Assignee | ||
Comment 21•12 years ago
|
||
If confirmed, this is not a security bug after all.
Assignee | ||
Comment 22•12 years ago
|
||
Comment on attachment 662810 [details] [diff] [review]
Temporary fix
[Approval Request Comment]
Bug caused by (feature/regressing bug #): 707681 (original error), 754671 (code is correct but triggers the dormant error).
User impact if declined: Attempting to debug Firefox under Windows fails because of crashes.
Testing completed (on m-c, etc.): With patch applied, we cannot reproduce the bug. Companion patch that makes the error visible to the testsuite will land soon on m-c.
Risk to taking this patch (and alternatives if risky): Very low. No API change. It fixes an error that was otherwise silent.
String or UUID changes made by this patch: None.
Attachment #662810 -
Flags: approval-mozilla-beta?
Attachment #662810 -
Flags: approval-mozilla-aurora?
Comment 23•12 years ago
|
||
Status: NEW → ASSIGNED
Keywords: checkin-needed
Comment 24•12 years ago
|
||
(In reply to Nathan Froyd (:froydnj) from comment #16)
> Nit^2: I think the 'return' line above this is actually the one that's
> indented wrong.
Looks like this nit was not addressed?
Assignee | ||
Comment 25•12 years ago
|
||
Here is the same fix, with the whitespace nit^2 addressed. Anyway, the followup bug contains a patch that generally looks nicer.
Attachment #662810 -
Attachment is obsolete: true
Attachment #662810 -
Flags: approval-mozilla-beta?
Attachment #662810 -
Flags: approval-mozilla-aurora?
Attachment #663039 -
Flags: review+
Assignee | ||
Comment 26•12 years ago
|
||
Comment on attachment 663039 [details] [diff] [review]
Temporary fix, nit addressed
(Updated nitpick as per my understanding of Gavin's request)
[Approval Request Comment]
Bug caused by (feature/regressing bug #): 707681 (original error), 754671 (code is correct but triggers the dormant error).
User impact if declined: Attempting to debug Firefox under Windows fails because of crashes.
Testing completed (on m-c, etc.): With patch applied, we cannot reproduce the bug. Companion patch that makes the error visible to the testsuite will land soon on m-c.
Risk to taking this patch (and alternatives if risky): Very low. No API change. It fixes an error that was otherwise silent.
String or UUID changes made by this patch: None.
Attachment #663039 -
Flags: approval-mozilla-beta?
Attachment #663039 -
Flags: approval-mozilla-aurora?
Comment 27•12 years ago
|
||
Status: ASSIGNED → RESOLVED
Closed: 12 years ago
status-firefox18:
--- → fixed
Flags: in-testsuite?
Resolution: --- → FIXED
Target Milestone: --- → mozilla18
Comment 28•12 years ago
|
||
Comment on attachment 663039 [details] [diff] [review]
Temporary fix, nit addressed
[Triage Comment]
Approving for Aurora/Beta based upon the risk evaluation and the benefit to debugging.
Attachment #663039 -
Flags: approval-mozilla-beta?
Attachment #663039 -
Flags: approval-mozilla-beta+
Attachment #663039 -
Flags: approval-mozilla-aurora?
Attachment #663039 -
Flags: approval-mozilla-aurora+
Updated•12 years ago
|
Keywords: checkin-needed
Comment 29•12 years ago
|
||
https://hg.mozilla.org/releases/mozilla-aurora/rev/c7280b298af5
This doesn't apply cleanly to beta. Please post a new patch for it.
Assignee | ||
Comment 30•12 years ago
|
||
Same fix, this time for mozilla-beta.
Try run: https://tbpl.mozilla.org/?tree=Try&rev=baea0e889d23
Attachment #663882 -
Flags: review+
Assignee | ||
Updated•12 years ago
|
Keywords: checkin-needed
Comment 31•12 years ago
|
||
Keywords: checkin-needed
Assignee | ||
Comment 32•12 years ago
|
||
Post-mortem for this bug, for future reference:
1. The design Microsoft APIs, as often, is "interesting": a |HANDLE| must be closed with |CloseHandle|. However, there are exceptions, and a |HANDLE| obtained through |FindFirstFile| must be closed with |FindClose|.
2. In the Windows back-end of OS.File, I made the double error of closing a find |HANDLE| with |CloseHandle| and not checking the result for possible errors. Consequently, the test suite did not notice the error.
3. The corresponding code was triggered by Tim's work on thumbnails, and also passed Tim's test suite.
4. At some point, people started running that code in the Microsoft Visual Debugger, to catch unrelated bugs. "Interestingly", when the Visual Debugger is launched, the otherwise silent error of attempting to close a find |HANDLE| with |CloseHandle| becomes a C++-style "First-chance exception", i.e. a crash.
5. Hilarity ensues.
Comment 33•12 years ago
|
||
looks like the original code here is from FF12, marking esr10 unaffected.
status-firefox-esr10:
--- → unaffected
Comment 34•12 years ago
|
||
Based on this being exposed by bug 754671, I assume that Firefox 15 is unaffected. Can someone confirm?
Updated•12 years ago
|
Blocks: 754671
Keywords: regression,
sec-other
Comment 35•12 years ago
|
||
Marking for not-tracking based on talking to Dveditz.
Whiteboard: [advisory-tracking-]
Comment 36•12 years ago
|
||
In trying to regress this bug, it continues to crash. I'm seeing this on Windows 7, with the following builds:
2012-11-13, 17.0b6
2012-11-15, Aurora
2012-11-15, Nightly
The test URL in this bug seems to cycle through many tests. I don't know if we're guaranteed to be looking at the same tests when the crash occurs.
Can someone look at the the logs to determine if this is the same issue? Or if we should be filing new bug(s) instead? Thank you.
In the meantime, this fix can't be verified.
Assignee | ||
Comment 37•12 years ago
|
||
Are you sure it's the same bug? I don't see any sign in your crash logs that point to ffi_call.
Assignee | ||
Comment 38•12 years ago
|
||
(In reply to Matt Wobensmith from comment #36)
> Can someone look at the the logs to determine if this is the same issue? Or
> if we should be filing new bug(s) instead? Thank you.
I am nearly certain that this is a different issue.
Comment 39•12 years ago
|
||
Today I used the exact same builds on Win7 as in comment 36, and no crashes.
I think the problem is that our baseline test (above) is a URL that contains a randomized suite of tests. Every time it's run, it is different.
Based on this, I will mark this bug verified and chalk the other crashes up to something I cannot reproduce for now. If they resurface in the future, I'll file a new bug.
Verified original crash on 2012-9-12, nightly
Confirmed fixed:
2012-10-24, 16.02
2012-11-13, 17.0b6
2012-11-15, Aurora
2012-11-15, Nightly
Status: RESOLVED → VERIFIED
Updated•12 years ago
|
Group: core-security
Updated•1 year ago
|
Product: Toolkit → Toolkit Graveyard
You need to log in
before you can comment on or make changes to this bug.
Description
•