Closed Bug 419668 Opened 17 years ago Closed 17 years ago

filepicker throws an exception in NSOpenPanel openPanel runModalForDirectory when uploading files

Categories

(Core :: Widget: Cocoa, defect, P1)

All
macOS
defect

Tracking

()

VERIFIED FIXED
mozilla1.9beta5

People

(Reporter: marcia, Assigned: smichaud)

References

()

Details

(Keywords: crash)

Attachments

(4 files, 1 obsolete file)

I crashed in this stack on Saturday (2008022304) while trying to upload an image to a picasa album. Currently it ranks #5 in the list of beta4pre crashes on mac. STR (although it may not work reliably). 1. Try to upload a file to picasa by using the upload photo link. 2. Select browse and find the file you want to upload. In my case, I had trouble with the system recognizing the file in my pictures folder that I wanted, and I kept clicking to try to get it to show. Eventually I crashed. I am not able to consistently reproduce this. I was using a wireless internet connection when this occurred. I see that users are still hitting this crash today, so that is why I am filing. Meanwhile, will work on getting a consistent set of STR.
See more reports: http://crash-stats.mozilla.com/report/list?range_unit=weeks&version=Firefox%3A3.0b4pre&range_value=2&signature=nsObjCExceptionLogAbort%28NSException%2A%29 Mac Only (10.4 and 10.5). It looks its on trunk only. Beta 3 RC doesnt have the problem. Not much comments found in the crash reports.
Flags: blocking-firefox3?
Sample stack frame: 0 nsObjCExceptionLogAbort(NSException*) mozilla/widget/src/cocoa/nsCocoaWindow.mm:63 1 -[NSWindow(MethodSwizzling) nsCocoaWindow_NSWindow_sendEvent:] mozilla/widget/src/cocoa/nsCocoaWindow.mm:1973 2 AppKit@0xdba2b 3 AppKit@0x27cd57 4 AppKit@0x277483 5 AppKit@0x26855c 6 nsFilePicker::GetLocalFiles(nsString const&, int, nsCOMArray<nsILocalFile>&) mozilla/widget/src/cocoa/nsFilePicker.mm:277 7 nsFilePicker::Show(short*) mozilla/widget/src/cocoa/nsFilePicker.mm:212 8 nsFileControlFrame::MouseClick(nsIDOMEvent*) mozilla/layout/forms/nsFileControlFrame.cpp:345 --> Core::Widget:Mac
Assignee: nobody → joshmoz
Component: General → Widget: Mac
Flags: blocking-firefox3?
Product: Firefox → Core
QA Contact: general → mac
Flags: blocking1.9?
Component: Widget: Mac → Widget: Cocoa
QA Contact: mac → cocoa
Once bug 419392 is fixed we'll have a more useful stack trace, but in the mean time can somebody reproduce this and then look in Console.app for any output that comes after the crash? It should print out the name of the exception being thrown and the reason for the throw. nsObjCExceptionLogAbort is the function that gets called to handle any objective-c exceptions. It is always going to be high on the top-crashers list because it is a catch-all. Unfortunately it can be difficult to figure out which crashes with nsObjCExceptionLogAbort on the stack are related, which means it is hard to say how common this particular crash is.
Summary: Crash in nsObjCExceptionLogAbort(NSException*) while trying to upload an image file → xpcom file io throws an exception when uploading files
276 int result = [thePanel runModalForDirectory:theDir file:nil types:filters];
Summary: xpcom file io throws an exception when uploading files → filepicker throws an exception in NSOpenPanel openPanel runModalForDirectory when uploading files
Severity: normal → critical
(In reply to comment #3) > Unfortunately it can be difficult to figure out > which crashes with nsObjCExceptionLogAbort on the stack are related, which > means it is hard to say how common this particular crash is. Once bug 418386 is fixed that should be easier.
Flags: tracking1.9? → blocking1.9?
Flags: tracking1.9+
Flags: blocking1.9?
Flags: blocking1.9+
Priority: -- → P2
Severity: critical → major
Flags: tracking1.9+
Apparently people are having a difficult time reproducing. blocking1.9- until we can repro and get console output.
Flags: tracking1.9+
Flags: blocking1.9-
Flags: blocking1.9+
Priority: P2 → --
Flags: tracking1.9+
Ok, I went back to the sat build and was able to crash again. Here is what comes out in the console: 2/29/08 1:21:51 PM com.apple.launchd[86] (0x10cfc0.Locum[2448]) Exited: Terminated 2/29/08 1:27:00 PM com.apple.launchd[86] (0x10cfc0.Locum[2481]) Exited: Terminated 2/29/08 1:27:09 PM com.apple.launchd[86] (0x10d030.Locum[2483]) Exited: Terminated 2/29/08 1:27:13 PM com.apple.launchd[86] (0x10d070.Locum[2484]) Exited: Terminated 2/29/08 2:08:22 PM com.apple.launchd[86] (0x10d880.Locum[2638]) Exited: Terminated 2/29/08 2:09:25 PM [0x0-0x75075].org.mozilla.firefox ### MRJPlugin: getPluginBundle() here. ### 2/29/08 2:09:25 PM [0x0-0x75075].org.mozilla.firefox ### MRJPlugin: CFBundleGetBundleWithIdentifier() succeeded. ### 2/29/08 2:09:25 PM [0x0-0x75075].org.mozilla.firefox ### MRJPlugin: CFURLGetFSRef() succeeded. ### 2/29/08 2:09:55 PM [0x0-0x75075].org.mozilla.firefox Fri Feb 29 14:09:55 h-164.office.mozilla.org firefox-bin[2648] <Error>: CGBitmapContextCreateImage: invalid context 2/29/08 2:11:18 PM firefox-bin[2648] NSInvalidArgumentException: *** -[NSCFArray insertObject:atIndex:]: attempt to insert nil 2/29/08 2:11:18 PM [0x0-0x75075].org.mozilla.firefox 2008-02-29 14:11:18.062 firefox-bin[2648:10b] NSInvalidArgumentException: *** -[NSCFArray insertObject:atIndex:]: attempt to insert nil My STR: 1. Go to picasweb and click and browse icon to upload a file. 2. Go to Pictures directory and then rapidly switch back and forth between different file names. Eventually the cancel and OK button both won't work, and if I keep clicking rapidly I crash.
Note that this was much easier for me to reproduce on Mozilla/5.0 (Macintosh; U; Intel Mac OS X 10.5; en-US; rv:1.9b4pre) Gecko/2008022304 Minefield/3.0b4pre than using today's nightly.
There's an undocumented function named NSInvalidArgumentException defined in the Foundation framework. I'll bet that's what displays the NSInvalidArgumentException message in the console. If I'm right, it'd be really useful for someone to do the following: Run Minefield in gdb, break on NSInvalidArgumentException, and keep running until Minefield crashes. Then do a stack trace inside gdb. No, I don't expect you'll be able to do that, Marcia. But someone else might be able to. (For the purposes of this test, it doesn't matter whether or not your build contains debug symbols. So any downloadable distro (e.g. any nightly) will do.)
> and keep running until Minefield crashes Actually, gdb will break on NSInvalidArgumentException just _before_ the crash.
> (For the purposes of this test, it doesn't matter whether or not > your build contains debug symbols. So any downloadable distro > (e.g. any nightly) will do.) Actually, this isn't _quite_ true. The trace you get (by breaking on NSInvalidArgumentException) will be accurate wrt OS code and symbols. But if you use a Minefield distro (as opposed to a debug build you've made yourself), many of the browser symbols will be inaccurate.
Is this related to bug 400724? Marking wanted-next+ ...
Flags: wanted-next+
> Is this related to bug 400724? It could be. Marcia, can you still reproduce bug 400724? And if so, do you get stuff in the console similar to what you reported in comment #7?
Steven: Using Mozilla/5.0 (Macintosh; U; Intel Mac OS X 10.5; en-US; rv:1.9b4pre) Gecko/2008030304 Minefield/3.0b4pre, I am having frequent crashes when trying to upload files on Leopard, so yes, I can repro bug 400724 using today's build. Here is what I see in my console: 3/3/08 11:56:44 AM firefox-bin[469] NSInternalInconsistencyException: -[NSNavMatrix(0x1ce66db0) lockFocus] failed with window=0x0, windowNumber=0, [self isHiddenOrHasHiddenAncestor]=0 3/3/08 11:56:44 AM [0x0-0x1d01d].org.mozilla.firefox 2008-03-03 11:56:44.078 firefox-bin[469:10b] *** Assertion failure in -[NSNavMatrix lockFocus], /SourceCache/AppKit/AppKit-949.27/AppKit.subproj/NSView.m:4751 3/3/08 11:56:44 AM [0x0-0x1d01d].org.mozilla.firefox 2008-03-03 11:56:44.079 firefox-bin[469:10b] NSInternalInconsistencyException: -[NSNavMatrix(0x1ce66db0) lockFocus] failed with window=0x0, windowNumber=0, [self isHiddenOrHasHiddenAncestor]=0 Here are breakpads for my two crashes: http://crash-stats.mozilla.com/report/index/ffecb67a-e95b-11dc-b49c-001a4bd46e84 and http://crash-stats.mozilla.com/report/index/95d12740-e95a-11dc-af56-001a4bd43e5c. I am using Shockwave Flash 9.0 r115.
Interesting. And thanks, Marcia. The console errors aren't the same, so this bug (bug 419668) and bug 400724 probably aren't related. But this info could really help resolve bug 400724. Could you also paste it into a comment there?
We should fix this. +'ing w/ P2 along with bug 400724.
Flags: wanted-next+
Flags: blocking1.9-
Flags: blocking1.9+
This is probably related to bug 420967.
Priority: -- → P2
Assignee: joshmoz → smichaud
Marcia, I'm going to try to track this down ... but I need your help. 1) Firstly, I need very precise instructions how to reproduce this problem. There appear to be at least two different ways to upload photos to picasa.google.com (one of which uses software that you download from Google). I can't tell from your STRs which you used. 2) Secondly, since I've never taken a digital photo, I need either a) A place I can find sample photos to download and then upload. b) Simple instructions on how to use (say) the camera in a MacBook Pro to produce examples I can use in my upload tests. I'm assuming that you still _can_ reproduce this problem with current nightlies. I hope the answer is "yes" :-)
Marcia, never mind about #2 above -- Josh filled me in.
Steven: Regarding whether this is still happening - it was using Saturday's build that I have ~7 crash reports. In this case I was using my account on sitewelder.com and uploading files that were stored in my user directory under the "Pictures" subdirectory. I am trying to get a test account for you to use for sitewelder since I don't want to mess up my existing account. I think it is a good acid test since it seems to happen with that site quite a bit. The consistent thing I see is strange behavior of both the "Select" and "Cancel" button after I select a file(s) to upload. Often both buttons become nonresponsive after I make my file selection. I then am unable to escape or Cmd+W out of the dialog. I do not see this issue when I use Safari.
I've now managed to reproduce this (though not yet in a build with debug symbols). Following Marcia's suggestion, I went to the test page at http://www.element-it.com/Examples/MultiPowUpload/SimpleUpload.html, after making sure that I had a fairly large list (around 40 files all in one directory). I used command-click and shift-click _a lot_ (to multiselect files). This has the advantage that you can click like crazy (while either of these keys is down) without terminating the dialog. The "clicking like crazy" seems to be what does the trick. Here's my breakpad report: http://crash-stats.mozilla.com/report/index/2ebf645e-eef3-11dc-a57b-001a4bd43ed6
I should mention that I've now got the comment #21 crash on a couple of different machines -- but both were running OS X 10.5.2. I get a different (unrelated) crash on OS X 10.4.11. I may post info about that here, or perhaps open a new bug.
> I get a different (unrelated) crash on OS X 10.4.11. I may post > info about that here, or perhaps open a new bug. Turns out that's a dup of bug 409615 (and of bug 400724). I've commented about it (and posted a gdb stack trace with debug symbols) at bug 409615 comment #22.
Here's a gdb stack trace of the "assertion failure" exception from comment #22. I think it shows pretty clearly the problem underlying this bug (bug 419668) and bug 420967 -- a Cocoa modal event loop (for an app-modal dialog) has been "interrupted" (by the processing of a Gecko event) and then "restarted". In this case, the particular problem (the one that leads to an assertion failure) appears to be as follows: [NSMatrix mouseLoop] is called and then (as part of its initialization of the "mouseLoop") calls [NSEvent startPeriodicEventsAfterDelay:withPeriod:]. But then the mouseLoop is interrupted (by the processing of a Gecko event via NativeEventCallback()) and restarted (via ProcessNextNativeEvent()). When [NSMatrix mouseLoop] is called again, it redoes its initialization and once again calls [NSEvent startPeriodicEventsAfterDelay:withPeriod:]. But since periodic events are already being generated, this causes an assertion failure. (In my next comment I'm going to post a patch that fixes this problem.)
Attached patch Fix (obsolete) (deleted) — Splinter Review
This patch suspends the processing of Gecko events while a Cocoa app-modal event loop is running (while a Cocoa app-modal dialog is being displayed). I've tested it (on OS X 10.5.2) with the STRs from this bug and bug 420967 -- it appears to fix both bugs. And it doesn't seem to cause any regressions, on either 10.4.11 or 10.5.2. In particular it doesn't seem to make the menus behave any differently than before. Here's a tryserver build made with this patch: https://build.mozilla.org/tryserver-builds/2008-03-13_09:44-smichaud@mozilla.com-bugzilla419668/smichaud@mozilla.com-bugzilla419668-firefox-try-mac.dmg
Attachment #309145 - Flags: review?(joshmoz)
I tested the tryserver build in Comment 26 (Mozilla/5.0 (Macintosh; U; Intel Mac OS X 10.5; en-US; rv:1.9b5pre) Gecko/2008031309 Minefield/3.0b5pre), and it looks good to me on Leopard. I tested a few different sites and uploaded photos, including http://www.element-it.com/Examples/MultiPowUpload/SimpleUpload.html, moo.com, and uploadsmart demo page. I did not experience any crashes or the button freezing that I was experiencing previously when using these sites. Good work steven.
Comment on attachment 309145 [details] [diff] [review] Fix The approach seems reasonable, I'm just concerned about performance. This will always call NS_IsMainThread, and on the main thread, will always call at least [NSApp _isRunningModal] and possibly more. I don't know if these can become expensive in aggregate when hooked into the event system like this. Have you done performance measurements?
Comment on attachment 309145 [details] [diff] [review] Fix r+, but please address Mark's perf concerns
Attachment #309145 - Flags: review?(joshmoz) → review+
> Have you done performance measurements? No, I haven't. What do you think would be the best way to do this?
Personally, I'd say land it and watch Tp/Ts :) I was more requesting commentary from you.
Attachment #309145 - Flags: superreview?(roc)
Here's the commentary :-) > This will always call NS_IsMainThread If MOZILLA_INTERNAL_API is defined (which appears to be the default), this translates into a call to nsThreadManager::GetIsMainThread(), which just checks a flag (and therefore shouldn't cause any performance problems). The other checks are done by Apple, so (of course) it's difficult to know what's involved. But they might also all just check a flag. In any case, I just discovered an [NSApplication _isRunningAppModal] method, which might be better, since it's simpler than what I'm doing now and (I hope) gives me exactly the information I want. Both it and _isRunningModal are undocumented. But both are present in Tiger and Leopard, and have been present (in the same form) since at least Jaguar (OS X 10.2). Of course I need to test _isRunningAppModal. But if the tests work out, I'll post a new patch (and tryserver build) that uses _isRunningAppModal.
I think we should move ahead with the current patch while you investigate that. We can always take a perf improvement later if things turn out that way.
> If MOZILLA_INTERNAL_API is defined (which appears to be the > default), this translates into a call to > nsThreadManager::GetIsMainThread(), which just checks a flag (and > therefore shouldn't cause any performance problems). Actually this is oversimplified, though still basically accurate. The current thread manager is stored in a static variable (nsThreadManager::sInstance), whose GetIsMainThread() method is called, which compares an instance variable (nsThreadManager::mMainPRThread) with what's returned by PR_GetCurrentThread(), which in turn returns what it finds in (OS-provided) thread-specific storage. (Of course all these variables need to be initialized, but that happens very early on.)
> I think we should move ahead with the current patch while you > investigate that. We can always take a perf improvement later if > things turn out that way. OK ... but my tests shouldn't take more than another couple of hours.
Actually, Josh, I've now discovered that OnDispatchedEvent() isn't called as often as I thought (regardless of the optimization issues) -- which could (in principal) allow some Gecko events to be processed when they shouldn't be. So I'll need to do another patch, regardless. I still expect to have it by the end of today.
Why do we have to stop processing Gecko events while an app-modal dialog is showing? That seems bad. It means that pages will stop loading, plugins will stop running, JS animations will stop, etc etc. You're going to partially regres bug 389931.
> Why do we have to stop processing Gecko events while an app-modal > dialog is showing? At this point I'm not sure there's any other way to fix this problem. > It means that pages will stop loading, plugins will stop running, JS > animations will stop, etc etc. Groan, you're right. I'd forgotten about this. I need to keep digging to see if I can square this particular circle.
Attachment #309145 - Flags: superreview?(roc)
Roc, how about if I can figure out a way to only allow Gecko timer events to be processed?
Only allowing Gecko timer events won't actually help. We need a wide range of events for things to work properly. Mats, see comment #25. I think the patch in bug 420148 will help a lot here, possibly even solve the bug, because instead of spawning a nested event loop we will prefer to return to the outer event loop if at all possible. Sometimes, though, returning to the outer event loop might not be possible, for example if a Gecko event triggers another modal dialog. I'm not sure if that would cause this kind of crash to reappear. I'm not sure what we should do in that case.
Depends on: 420148
Actually, I've now thought of another possible way to get at this problem (without making any changes to appshell) -- tomorrow I'll try discarding mouse events that happen in ChildView objects if the browser's running app-modal (this will stop those events ever getting to Gecko).
Here's a revision of my patch that (as far as I can tell) fixes all the problems it's aimed at and answers all objections that have been made to date. I chose not to go with the strategy from comment #41, because overnight I thought of a better idea: Instead of suspending the processing of Gecko events while an app-modal dialog is displayed, why not just stop native events being processed via Gecko (i.e. via ProcessNextNativeEvent())? This was very simple to do, and seems to work perfectly (in my fairly light testing). Here's a tryserver build made using this patch. Please pile in and bang away at it as hard as you can :-) https://build.mozilla.org/tryserver-builds/2008-03-14_08:55-smichaud@mozilla.com-bugzilla419668-rev1/smichaud@mozilla.com-bugzilla419668-rev1-firefox-try-mac.dmg
Attachment #309145 - Attachment is obsolete: true
Attachment #309447 - Flags: review?(joshmoz)
Attachment #309447 - Flags: review?(joshmoz) → review+
Attachment #309447 - Flags: superreview?(roc)
(In reply to comment #40) > I think the patch in bug 420148 will help [...] possibly even solve the bug I think so too. I tested uploading images to myspace, picasaweb and sitewelder and it worked for me (with a build with bug 420148). As roc said, not handling events when _isRunningAppModal seems like it could cause native events to not be handled at all if we're stuck in a modal window event loop. If the problem can't be reproduced anymore I would recommend leaving the code as is.
Comment on attachment 309447 [details] [diff] [review] Fix rev 1 (simplify, answer objections) Clearing review request in the hope this patch is no longer needed
Attachment #309447 - Flags: superreview?(roc)
Robert - why do you think this patch is not necessary? Steven, did I miss something that makes this patch not necessary?
I agree that (now that the patch for bug 420148 has landed) this patch _might_ no longer be needed. But the cross-platform nsBaseAppShell.cpp code necessarily takes a broader perspective (and operates with less information) than the platform-specific nsAppShell.mm code. In short, these two patches don't (and can't) address exactly the same problem. And I need to test to be sure that the patch for bug 420148 actually does fix the symptoms of both this bug (bug 419668) and bug 420967. I'll dig into this on Monday. In the meantime I'm not going to let this destroy my weekend :-)
(In reply to comment #46) > Robert - why do you think this patch is not necessary? The stack in comment #25 shouldn't happen anymore. Instead of creating a nested Cocoa event loop, Mats' fix makes us return to the outer event loop being driven by the file dialog.
Using Mozilla/5.0 (Macintosh; U; Intel Mac OS X 10.5; en-US; rv:1.9b5pre) Gecko/2008031704 Minefield/3.0b5pre, and testing uploading files at http://www.element-it.com/Examples/MultiPowUpload/SimpleUpload.html, I have not been able to crash today.
The patch for bug 420148 _does_ fix the symptoms of this bug ... but I think that's only an accident. It clearly doesn't fix bug 420967. And it doesn't fix the underlying problem that causes both this bug (bug 419668) and bug 420967. (For my evidence see bug 420967 comment #6 through bug 420967 comment #8.) The only way I can think that my current patch (attachment 309447 [details] [diff] [review]) would cause trouble is if a Gecko modal dialog is spawned from a Cocoa app-modal dialog. But (by the definition of a Cocoa app-modal dialog) that should never happen -- and if it did that would be a Cocoa bug. So I think that my current patch (attachment 309447 [details] [diff] [review]) should be landed as is, and that this bug (properly understood) doesn't depend on bug 420148.
No longer depends on: 420148
Attachment #309447 - Flags: superreview?(roc)
> But (by the definition of a Cocoa app-modal dialog) that should > never happen -- and if it did that would be a Cocoa bug. That would be a _Gecko_ bug.
I think bug 420967 comment #7 is showing that the fix in bug 420148 wasn't complete enough. We simply should not be spawning a new native event loop there, we should be returning to the outer one. Mats?
Priority: P2 → P1
Roc may be correct, but until a patch is in the tree and we know for sure I think we should take Steven's patch here. We can always take it out later if we want, right now we need to resolve these blocking crashes and I think Steven's patch is safe enough.
Comment on attachment 309447 [details] [diff] [review] Fix rev 1 (simplify, answer objections) roc, sr'ing this based on josh's request... if you disagree, please comment and we can get it fixed. One nit: please just 'return PR_FALSE;' directly instead of return moreEvents (line above, too) -- there's no reason to obfuscate the code by returning a variable that has never been modified since initialization.
Attachment #309447 - Flags: superreview?(roc) → superreview+
Checking in widget/src/cocoa/nsAppShell.mm; /cvsroot/mozilla/widget/src/cocoa/nsAppShell.mm,v <-- nsAppShell.mm new revision: 1.29; previous revision: 1.28 done Landed on trunk.
Status: NEW → RESOLVED
Closed: 17 years ago
Resolution: --- → FIXED
Verified by using given URL and Mozilla/5.0 (Macintosh; U; Intel Mac OS X 10.4; en-US; rv:1.9b5pre) Gecko Minefield/3.0b5pre ID:2008031923. No crash anymore.
Status: RESOLVED → VERIFIED
Hardware: PC → All
Target Milestone: --- → mozilla1.9beta5
This is OK but I still think we should fix the underlying bug and back this out.
As far as I'm concerned my patch _does_ fix the underlying bug (the bug that underlies this bug and bug 420967).
No, it doesn't. The underlying bug here is that we are spawning a nested native event loop when we shouldn't.
> The underlying bug here is that we are spawning a nested native > event loop when we shouldn't. This is an extremely vague description. How do you know "we are spawning a nested native event loop when we shouldn't"? I'm sure there are many different, unrelated cases. THe comments in the patch for bug 420148 are a bit more specific: It seems to stop native events being processed from Gecko events _while a native event loop is running that has itself been spawned from a Gecko event (via ProcessNextNativeEvent())_. But that's only tangentially related to the problem that my patch resolves (it stops native events being processed from Gecko events while a native app-modal loop is running). I'd bet that the problem my patch fixes is Mac-specific, and needs a Mac-specific solution. If other OSs have similar problems, they may also need OS-specific solutions.
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Created:
Updated:
Size: