Bug 768651 (Closed)
Opened 12 years ago
Closed 11 years ago
Firefox started with "cfx run" hangs for some URLs: Windows only
Categories: Add-on SDK Graveyard :: General, defect, P1
Tracking: Not tracked
Status: RESOLVED FIXED, mozilla25
People: Reporter: wbamberg, Assigned: gkrizsanits
Attachments: 1 file
Reported in the forum by Konrad Gorski: https://forums.mozilla.org/addons/viewtopic.php?f=27&t=10904&p=22901#p22901
1) On Windows 7, create a minimal add-on, for example:
const widgets = require("widget");
const tabs = require("tabs");
var widget = widgets.Widget({
  id: "mozilla-link",
  label: "Mozilla website",
  contentURL: "http://www.mozilla.org/favicon.ico",
  onClick: function() {
    tabs.open("http://www.mozilla.org/");
  }
});
console.log("The add-on is running.");
2) Run it using "cfx run", then navigate to certain URLs, for example:
http://www.ebay.com/ctg/Nikon-D5100-162-MP-Digital-SLR-Camera-Black-Kit-w-AFS-1855mm-VR-Lens-/101827356?_pcatid=782&_pcategid=31388&_dmpt=Digital_Cameras&_dashexp=1
http://www.ebay.com/itm/COBRA-ESD-9275-Digital-6-Band-Laser-Radar-Detector-w-Safety-Alert-LaserEye-/350560167708?pt=LH_DefaultDomain_0&hash=item519f03a71c
3) Firefox will hang.
If instead you:
1) build the add-on with "cfx xpi"
2) run Firefox
3) install the add-on
4) visit those pages
...then everything's fine.
As a guess, one of the SDK's default preferences that change Firefox's defaults for cfx run/test?
(In reply to Wes Kocher (:KWierso) from comment #1)
> As a guess, one of the SDK's default preferences that change Firefox's
> defaults for cfx run/test?
I've tested with all of the preferences that the SDK sets commented out, and this still happens.
I've tested with all of the environment variables that the SDK sets commented out, and this still happens.
That last post in the forum thread about a Windows update possibly causing this seems possible.
Comment 3•12 years ago
I think if you set hangmonitor.timeout in about:config to a number of seconds then after that a crash report will be submitted. That could at least give us a stack trace so we know what is hanging here.
Comment 5•12 years ago
Looks like it's hanging on plugin IPC stuff. I imagine disabling plugins would make the problem go away. Might need some of the plugin guys to look at this.
Comment 6•12 years ago
Maybe Eddy might have some quick thoughts here
Comment 7•12 years ago
(In reply to Dave Townsend (:Mossop) from comment #6)
> Maybe Eddy might have some quick thoughts here
nsNPAPIPlugin::CreatePlugin is returning with error code NS_ERROR_OUT_OF_MEMORY (see thread 0, stackframe 14). That looks very suspicious.
Comment 8•12 years ago
Wouldn't it be related to bug 771847?
Is firefox crashing in bug 771847? If yes, we may have to tweak our test runner in order to retrieve the crash report somehow!!
Assignee: nobody → ejpbruel
Whiteboard: [triage:followup]
Priority: -- → P2
Whiteboard: [triage:followup]
Priority: P2 → P1
Comment 9•12 years ago
I have the same problem with FF 14.0.1 on Win7.
Comment 10•12 years ago
Turning off all plugins fixes it for me; I can now develop without problems.
Comment 11•12 years ago
I can reproduce this, the problem is specifically with the flash plugin (so far I've seen crashes on last.fm and amazon.com/cloudplayer). Disabling flash makes the problem go away.
Comment 12•12 years ago
I'm seeing this now with Firefox 19.0.2 under "cfx" on Windows 7 Pro. Firefox hangs on pages which have Flash. Pages without Flash work fine. One or two copies of "plugin-container.exe" are running. Firefox does not respond to input, the cursor does not change on mouse-over, and the Firefox window cannot be closed.
Killing the "plugin-container.exe" processes will sometimes get Firefox going again.
Comment 13•12 years ago
Tried opening the error console, then loading a page with Flash with the browser running under "cfx":
Errors:
Timestamp: 3/22/2013 4:46:55 PM
Warning: ReferenceError: reference to undefined property aEvent.button
Source File: chrome://browser/content/browser.js
Line: 8893
Timestamp: 3/22/2013 4:46:55 PM
Warning: ReferenceError: reference to undefined property e.button
Source File: chrome://browser/content/utilityOverlay.js
Line: 148
Running the same version of Firefox, not under "cfx", works fine for the same pages.
Comment 14•12 years ago
The above is with Windows 7 Pro, Firefox 19.0.2, Flash 11.6.602.180, SDK 1.13.2.
So it's broken with the latest and greatest versions of everything.
Comment 15•12 years ago
cfx run seems flaky, I wonder if cfx.js will help us here eventually. My workaround is just to use Wladimir's add-on:
https://addons.mozilla.org/en-US/firefox/addon/autoinstaller/
In my mind, the class of problems you might run into because you aren't using a clean Firefox profile is exceedingly rare.
Updated•11 years ago (Assignee)
Assignee: ejpbruel → gkrizsanits
Comment 16•11 years ago (Assignee)
Here is what happens:
- nsNPAPIPlugin::CreatePlugin sends out an rpc init request for the flash plugin
- flash returns an error code from the NP_Initialize (error code value: 1)
(- regardless of the error code, on Windows we send out an async SendSetAudioSessionData right after; I turned this off and it did not change a thing)
- PPluginModuleParent::CallNP_Shutdown gets called because of the error code, an rpc NP_Shutdown request is sent out
- PluginModuleChild::AnswerNP_Shutdown (this is the plugin-container process) calls mShutdownFunc(), which is a function in the flash plugin itself, and it never returns, and our main thread in the firefox process is being blocked waiting for the response for the rpc command...
Problems:
1. I don't think it's a good idea for the shutdown part to be rpc... Can we do this part async? Or can we use some kind of timeout and shut it down forcefully after a while (I know this sounds terrible too...)? Letting the plugin-container process deadlock our main thread this way is quite bad; is there a chance we can fix this somehow? I know how difficult it is to give a good answer to this... I'm just desperately trying to figure out a workaround, since this bug is hurting add-on developers a lot. Any smart hack would be great... (The real solution would be never to block the main thread of the Firefox process, I guess, but that's on the long-term wish list for all of us and very hard in practice, right?)
2. If we could figure out why the Flash plugin fails to init in this setup, that would be great. I've been playing with procmon to figure out the root of the failure, and changing things like the cwd of the Firefox process we start up with cfx, but no luck. Should we try to assemble a minimal example and send it to the Flash plugin developers?
Flags: needinfo?(benjamin)
Comment 17•11 years ago
(In reply to Gabor Krizsanits [:krizsa :gabor] from comment #16)
> Here is what happens:
>
> - nsNPAPIPlugin::CreatePlugin sends out an rpc init request for the flash
> plugin
> - flash returns an error code from the NP_Initialize (error code value: 1)
> (- regardless of the error code on windows we send out an async
> SendSetAudioSessionData right after, but I turned this off and did not
> change a thing)
> - PPluginModuleParent::CallNP_Shutdown gets called because of the error
> code, an rpc NP_Shutdown request is sent out
This is a bug. If NP_Initialize fails, NP_Shutdown should not be called. Please file a separate bug for that.
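The rule stated here can be illustrated with a toy model. This is only a sketch in Python, not Gecko code: FakePluginModule, load_plugin and unload_plugin are hypothetical names, and the real logic lives in the C++ plugin IPC layer. The point is just that a failed initialization is remembered, so shutdown is never requested for a module that did not initialize.

```python
class FakePluginModule:
    """Toy stand-in for a plugin module (hypothetical, not a Gecko class)."""

    def __init__(self, init_error=0):
        self.init_error = init_error  # 0 means success, anything else is an error
        self.calls = []               # records which entry points were invoked

    def np_initialize(self):
        self.calls.append("NP_Initialize")
        return self.init_error

    def np_shutdown(self):
        self.calls.append("NP_Shutdown")


def load_plugin(module):
    """Return True only if the plugin initialized successfully."""
    error = module.np_initialize()
    # On failure we return immediately and do NOT call np_shutdown().
    return error == 0


def unload_plugin(module, initialized):
    """Shut the plugin down only if it was successfully initialized."""
    if initialized:
        module.np_shutdown()
```

With init_error=1 (the value reported for Flash above), the recorded calls contain NP_Initialize only.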
> 1., I don't think it's a good idea that the shutdown part is rpc... can we
> do this part async?
Not really, no.
> Or can we use some kind of timeout and shut it down
> forcefully after a while (I know this sounds terrible too...)?
We already have a timeout for RPC calls; 45 seconds in release builds and infinite in debug builds. It should also show the plugin hang UI on Windows.
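The general idea of "wait with a timeout, then shut down forcefully" can be sketched in Python with the standard subprocess module. This is an illustration only: it does not show how the RPC timeout is implemented in Firefox, and run_with_timeout is a hypothetical helper name.

```python
import subprocess
import sys


def run_with_timeout(cmd, timeout_seconds):
    """Wait for cmd to exit; kill it if it is still running after the timeout.

    Returns a tuple (exit_code, timed_out).
    """
    proc = subprocess.Popen(cmd)
    try:
        return proc.wait(timeout=timeout_seconds), False
    except subprocess.TimeoutExpired:
        proc.kill()               # forceful shutdown of the unresponsive child
        return proc.wait(), True  # reap the child so it does not linger
```

For example, calling run_with_timeout([sys.executable, "-c", "import time; time.sleep(60)"], 1.0) gives up after about one second and reports timed_out as True.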
>
> 2., If we could figure out why the flash plugin fails to init in this setup
> that would be great. I've been trying to playing with procmon to figure out
> the root of the failure, and changing stuff like the cwd of the firefox
> process we start up with cfx, but no luck. Should we try and assemble a
> minimal example, and it to the flash plugin developers?
It is unlikely that this will get attention from Adobe. It sounds like your test runner is setting some environment variable or other setup which causes Flash to fail. You can of course just debug the Flash player at the point we call NP_Initialize to see if you can figure out what's going on.
Flags: needinfo?(benjamin)
Comment 18•11 years ago (Assignee)
Thanks for all the useful info.
(In reply to Benjamin Smedberg [:bsmedberg] from comment #17)
> This is a bug. If NP_Initialize fails, NP_Shutdown should not be called.
> Please file a separate bug for that.
Bug 889480.
> We already have a timeout for RPC calls; 45 seconds in release builds and
> infinite in debug builds. It should also show the plugin hang UI on Windows.
Alright, I don't think I have anything better than that...
> It is unlikely that this will get attention from Adobe.
I was afraid you were going to say this :)
> You can of course just debug the Flash player at the point we
> call NP_Initialize to see if you can figure out what's going on.
I wish I had a debug version of the Flash player, or the source code... I feel like I'm poking a black box with a stick and hoping that it starts to work by accident...
Comment 19•11 years ago (Assignee)
How much of a win would it be if, instead of hanging on sites for a long time, the Flash plugin simply failed to load in this setup? Hopefully we can figure out why Flash fails to init (and I think we should); I'm just trying to estimate how much we would gain if Bug 889480 were fixed.
Flags: needinfo?(dtownsend+bugmail)
Comment 20•11 years ago
cfx is a python tool, right? The slightly tedious way to do this is to launch Firefox from Python using subprocess (or whatever cfx is doing) and progressively add special environment setup until Flash fails. Some possibilities:
* security descriptors on process launch
* Custom profile, files or permissions
* Unusual prefs
* Environment variables
* Process groups
In fact, if cfx is using process groups at all, it's possible that is the cause. Flash probably uses groups in its own way to enable its sandbox, and if cfx is setting up its own process groups that could interfere.
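The "progressively add setup until it fails" approach could be sketched as follows. Everything here is a stand-in: the probe is a tiny Python child rather than Firefox, BREAK_ME and HARMLESS_VAR are made-up variable names, and first_breaking_step is a hypothetical helper. With the real browser, the probe would be "launch Firefox, load a Flash page, check whether it hangs".

```python
import os
import subprocess
import sys
import tempfile

# Stand-in probe: exits non-zero when the (hypothetical) BREAK_ME variable is set.
PROBE = [sys.executable, "-c",
         "import os, sys; sys.exit(1 if os.environ.get('BREAK_ME') else 0)"]


def first_breaking_step(steps, probe=PROBE):
    """Apply setup steps cumulatively, running the probe after each one.

    Returns the name of the first step after which the probe fails,
    or None if the probe keeps succeeding.
    """
    env = dict(os.environ)
    popen_kwargs = {}
    for name, apply_step in steps:
        apply_step(env, popen_kwargs)
        if subprocess.call(probe, env=env, **popen_kwargs) != 0:
            return name
    return None


def set_harmless_var(env, popen_kwargs):
    env["HARMLESS_VAR"] = "1"


def use_custom_cwd(env, popen_kwargs):
    popen_kwargs["cwd"] = tempfile.gettempdir()


def set_breaking_var(env, popen_kwargs):
    env["BREAK_ME"] = "1"


STEPS = [
    ("harmless env var", set_harmless_var),
    ("custom cwd", use_custom_cwd),
    ("breaking env var", set_breaking_var),
]
```

Because the steps accumulate, the first failing step is the one to look at; process creation flags or job assignment could be added as further steps in the same way.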
Comment 21•11 years ago
When 889480 is fixed Flash will no longer hang, it just won't work.
Comment 22•11 years ago (Assignee)
(In reply to Benjamin Smedberg [:bsmedberg] from comment #20)
> cfx is a python tool, right? The slightly tedious way to do this is to
> launch Firefox from Python using subprocess (or whatever cfx is doing) and
> progressively add special environment setup until Flash fails. Some
> possibilities:
>
> * security descriptors on process launch
> * Custom profile, files or permissions
> * Unusual prefs
> * Environment variables
> * Process groups
>
> In fact, if cfx is using process groups at all, it's possible that is the
> cause. Flash probably uses groups in its own way to enable its sandbox, and
> if cfx is setting up its own process groups that could interfere.
I don't see much there yet:
env vars:
https://github.com/mozilla/addon-sdk/blob/master/python-lib/cuddlefish/runner.py#L502
process start:
https://github.com/mozilla/addon-sdk/blob/master/python-lib/mozrunner/__init__.py#L56-L59
The no-remote flag can be interesting...
I'm sure you know a lot more about the killableprocess.py than I do :) Does it do anything special in the way it starts up the process?
Comment 23•11 years ago
Holy crap, are you guys still using killableprocess?
https://github.com/mozilla/addon-sdk/blob/master/python-lib/mozrunner/killableprocess.py#L119 could well be the cause of this. Try removing that flag and see what happens.
Comment 24•11 years ago
(In reply to Gabor Krizsanits [:krizsa :gabor] from comment #19)
> How much of a win would be if instead of hanging on sites for a long long
> time, flash plugin would simply just fail to load instead in this setup?
> Hopefully we can figure out why flash fails to init (and I think we should),
> just trying to estimate how much we would gain if Bug 889480 were fixed.
I'll certainly take that over hanging any day of the week. I do think having Flash working is useful for certain use cases (add-ons that affect YouTube, for example).
Flags: needinfo?(dtownsend+bugmail)
Comment 25•11 years ago
(In reply to Benjamin Smedberg [:bsmedberg] from comment #23)
> Holy crap, are you guys still using killableprocess?
>
> https://github.com/mozilla/addon-sdk/blob/master/python-lib/mozrunner/
> killableprocess.py#L119 could well be the cause of this. Try removing that
> flag and see what happens.
we made the mistake of forking mozrunner some time ago and we haven't taken the time to get ourselves onto a more recent version :(
Comment 26•11 years ago (Assignee)
(In reply to Benjamin Smedberg [:bsmedberg] from comment #23)
> Holy crap, are you guys still using killableprocess?
>
> https://github.com/mozilla/addon-sdk/blob/master/python-lib/mozrunner/
> killableprocess.py#L119 could well be the cause of this. Try removing that
> flag and see what happens.
It didn't help, but at least I know where to focus now. I know a bit about Windows processes; I'm just not familiar with all this Python code in the SDK, and I've been trying to find my way around it so far... I'll play with it some more tomorrow.
Comment 27•11 years ago (Assignee)
CREATE_BREAKAWAY_FROM_JOB is the main problem; the Flash plugin starts a child process as well at some point. If I comment it out, I get all sorts of errors, because we cannot access the process and some of its handles:
  File "c:\development\addonsdk\trunk\addon-sdk\python-lib\cuddlefish\runner.py", line 708, in run_app
    runner.start()
  File "c:\development\addonsdk\trunk\addon-sdk\python-lib\mozrunner\__init__.py", line 532, in start
    self.process_handler = run_command(self.command+self.cmdargs, self.env, **self.kp_kwargs)
  File "c:\development\addonsdk\trunk\addon-sdk\python-lib\mozrunner\__init__.py", line 60, in run_command
    return killableprocess.Popen(cmd, cwd="c:/Development/mozilla/mozilla-central3/obj/dist/bin", env=env, **killable_kwargs)
  File "c:\mozilla-build\python\lib\subprocess.py", line 679, in __init__
    errread, errwrite)
  File "c:\development\addonsdk\trunk\addon-sdk\python-lib\mozrunner\killableprocess.py", line 165, in _execute_child
    winprocess.AssignProcessToJobObject(self._job, int(hp))
  File "c:\development\addonsdk\trunk\addon-sdk\python-lib\mozrunner\winprocess.py", line 51, in ErrCheckBool
    raise WinError()
WindowsError: [Error 5] Access is denied.
Error in atexit._run_exitfuncs:
Traceback (most recent call last):
  File "c:\mozilla-build\python\lib\atexit.py", line 24, in _run_exitfuncs
    func(*targs, **kargs)
  File "c:\development\addonsdk\trunk\addon-sdk\python-lib\cuddlefish\runner.py", line 536, in maybe_remove_outfile
    os.remove(outfile)
WindowsError: [Error 32] The process cannot access the file because it is being used by another process: 'c:\\users\\garbo\\appdata\\local\\temp\\harness-stdout-m7uwfe'
Error in sys.exitfunc:
Traceback (most recent call last):
  File "c:\mozilla-build\python\lib\atexit.py", line 24, in _run_exitfuncs
    func(*targs, **kargs)
  File "c:\development\addonsdk\trunk\addon-sdk\python-lib\cuddlefish\runner.py", line 536, in maybe_remove_outfile
    os.remove(outfile)
WindowsError: [Error 32] The process cannot access the file because it is being used by another process: 'c:\\users\\garbo\\appdata\\local\\temp\\harness-stdout-m7uwfe'
But if I comment out CREATE_SUSPENDED, the process starts regardless, and Flash works in it. I also have to comment out the in/out/err channels to get the console working as before, though, and the closing of the process is less than ideal...
There should be a way to set a flag on the job object (JOB_OBJECT_LIMIT_BREAKAWAY_OK) so that no subsequent child process will be part of this job; that might be enough to make Flash work and would be a minimal change... but I don't know how to do that from Python... Frankly, I don't know much about Windows jobs; I just found this flag on MSDN.
Also, I'm not really sure why we need all this killableprocess thing. It would be great to do something simpler probably...
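For reference, setting a limit flag on a job object from Python can be sketched with ctypes and the documented Win32 call SetInformationJobObject. This is a minimal sketch under assumptions, not the SDK's code: make_limits and allow_breakaway are hypothetical names, job_handle is assumed to be a handle obtained elsewhere (for example from CreateJobObject), and the structure layouts follow the Windows SDK headers. Note that, per the Windows documentation, JOB_OBJECT_LIMIT_BREAKAWAY_OK only applies to children created with CREATE_BREAKAWAY_FROM_JOB; JOB_OBJECT_LIMIT_SILENT_BREAKAWAY_OK is the flag that lets every child start outside the job.

```python
import ctypes
import sys

JOB_OBJECT_LIMIT_BREAKAWAY_OK = 0x00000800
JOB_OBJECT_LIMIT_SILENT_BREAKAWAY_OK = 0x00001000
JobObjectExtendedLimitInformation = 9  # JOBOBJECTINFOCLASS value


class JOBOBJECT_BASIC_LIMIT_INFORMATION(ctypes.Structure):
    _fields_ = [
        ("PerProcessUserTimeLimit", ctypes.c_int64),
        ("PerJobUserTimeLimit", ctypes.c_int64),
        ("LimitFlags", ctypes.c_uint32),
        ("MinimumWorkingSetSize", ctypes.c_size_t),
        ("MaximumWorkingSetSize", ctypes.c_size_t),
        ("ActiveProcessLimit", ctypes.c_uint32),
        ("Affinity", ctypes.c_size_t),  # ULONG_PTR
        ("PriorityClass", ctypes.c_uint32),
        ("SchedulingClass", ctypes.c_uint32),
    ]


class IO_COUNTERS(ctypes.Structure):
    _fields_ = [(name, ctypes.c_uint64) for name in (
        "ReadOperationCount", "WriteOperationCount", "OtherOperationCount",
        "ReadTransferCount", "WriteTransferCount", "OtherTransferCount")]


class JOBOBJECT_EXTENDED_LIMIT_INFORMATION(ctypes.Structure):
    _fields_ = [
        ("BasicLimitInformation", JOBOBJECT_BASIC_LIMIT_INFORMATION),
        ("IoInfo", IO_COUNTERS),
        ("ProcessMemoryLimit", ctypes.c_size_t),
        ("JobMemoryLimit", ctypes.c_size_t),
        ("PeakProcessMemoryUsed", ctypes.c_size_t),
        ("PeakJobMemoryUsed", ctypes.c_size_t),
    ]


def make_limits(flags):
    """Build a zeroed limit structure carrying only the given limit flags."""
    info = JOBOBJECT_EXTENDED_LIMIT_INFORMATION()
    info.BasicLimitInformation.LimitFlags = flags
    return info


def allow_breakaway(job_handle, flags=JOB_OBJECT_LIMIT_BREAKAWAY_OK):
    """Windows only: set breakaway limit flags on an existing job object.

    This overwrites any limits already set on the job; a real caller
    would query the current limits first and merge the flags.
    """
    if sys.platform != "win32":
        raise OSError("job objects exist only on Windows")
    kernel32 = ctypes.WinDLL("kernel32", use_last_error=True)
    kernel32.SetInformationJobObject.argtypes = [
        ctypes.c_void_p, ctypes.c_int, ctypes.c_void_p, ctypes.c_uint32]
    kernel32.SetInformationJobObject.restype = ctypes.c_int
    info = make_limits(flags)
    ok = kernel32.SetInformationJobObject(
        job_handle, JobObjectExtendedLimitInformation,
        ctypes.byref(info), ctypes.sizeof(info))
    if not ok:
        raise ctypes.WinError(ctypes.get_last_error())
```

Whether either flag is enough for Flash is a separate question that only an experiment on Windows can answer.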
Comment 28•11 years ago
I suspect most of it stems from the days when Firefox would restart itself out from under you, to ensure that you could kill the actual process.
Comment 29•11 years ago (Assignee)
(In reply to Ted Mielczarek [:ted.mielczarek] from comment #28)
> I suspect most of it stems from the days when Firefox would restart itself
> out from under you, to ensure that you could kill the actual process.
So it seems like if I don't set the CREATE_BREAKAWAY_FROM_JOB flag and don't create a windows job object (and assign the process to it), flash just works. What might be the downside of this approach (if there is any) in the current world?
Comment 30•11 years ago
The point of killableprocess is that it would not only kill the process you launch, but also any *other* child process that gets launched. Removing the job/CREATE_BREAKAWAY_FROM_JOB code will basically mean that killableprocess at best only kills the one process it launches, and not any subprocesses such as plugin processes.
Comment 31•11 years ago (Assignee)
I think that's still better than the current version we have. I wish I could come up with something better, but I've tried pretty much all the flags Windows has to offer for the job object, and either Flash does not like them or we get an access denied error when trying to access the process handle. I can keep playing with it for a while, but I'm getting a bit pessimistic about finding a better solution here. The good thing is that hitting the close button of the browser kills everything nicely, at least; but if add-ons get their own separate process in the future, this is going to be a problem. What do you think, Dave?
Comment 32•11 years ago
It's possible you can experiment with giving the Firefox process the JOB_OBJECT_LIMIT_BREAKAWAY_OK permission, but that assumes that Flash creates its subprocesses with CREATE_BREAKAWAY_FROM_JOB, which is... unlikely.
Comment 33•11 years ago (Assignee)
(In reply to Benjamin Smedberg [:bsmedberg] from comment #32)
> Its possible you can experiment with giving the Firefox process the
> JOB_OBJECT_LIMIT_BREAKAWAY_OK permission, but that assumes that Flash
> creates its subprocesses with CREATE_BREAKAWAY_FROM_JOB, which is...
> unlikely.
Exactly. I was desperate enough to try that, but it is not the case, and it's unlikely that anyone will make that change... Furthermore, any other plugin can create custom processes.
Comment 34•11 years ago (Assignee)
I talked to Dave about this one, and he agreed that this approach is still a lot better than what we have right now. But we were wondering how the current mozrunner solves this whole shutdown problem without killableprocess, and whether we could integrate it or migrate to it easily.
Comment 35•11 years ago (Assignee)
So the current version of mozrunner does the same thing (creating job and all), so in this respect it's very similar. We might want to migrate to it at some point, but it won't fix our problem.
On try my fix seems to be green...
https://tbpl.mozilla.org/?tree=Try&rev=a61bff0e0abb
Comment 36•11 years ago (Assignee)
Attachment #775577 - Flags: review?(dtownsend+bugmail)
Updated•11 years ago
Attachment #775577 - Flags: review?(dtownsend+bugmail) → review+
Comment 37•11 years ago
Commits pushed to master at https://github.com/mozilla/addon-sdk
https://github.com/mozilla/addon-sdk/commit/465b49e7c0d74899e1fef7ad2f8425b4c2e85fa0
Bug 768651 - Cfx run hangs on windows on sites using flash
https://github.com/mozilla/addon-sdk/commit/980641c74debec6ed76a021fc5be13124d59164a
Merge pull request #1104 from krizsa/master
Fixing Bug 768651 - cfx run hangs on windows on sites with flash. r=Mossop
Updated•11 years ago
Status: NEW → RESOLVED
Closed: 11 years ago
Resolution: --- → FIXED
Target Milestone: --- → mozilla25
Depends on: 897683
Comment 39•11 years ago
It's still hanging for me on youtube.com with Firefox Setup Stub 25.0b7
Comment 40•11 years ago
(In reply to cprcrack from comment #39)
> It's still hanging for me on youtube.com with Firefox Setup Stub 25.0b7
Which version of the SDK?
Comment 41•11 years ago
It's not supposed to be "fixed" until Firefox 25. The release channel is still at 24.
Comment 42•11 years ago
(In reply to John Nagle from comment #41)
> It's not supposed to be "fixed" until Firefox 25. The release channel is
> still at 24.
I downloaded from the beta channel at <http://www.mozilla.org/firefox/beta/>, version 25.0 beta 7. It's supposed to be fixed there right?
(In reply to Dave Townsend (:Mossop) from comment #40)
> Which version of the SDK?
The latest: addon-sdk-1.14
Comment 43•11 years ago
(In reply to cprcrack from comment #42)
> (In reply to John Nagle from comment #41)
> > It's not supposed to be "fixed" until Firefox 25. The release channel is
> > still at 24.
>
> I downloaded from the beta channel at
> <http://www.mozilla.org/firefox/beta/>, version 25.0 beta 7. It's supposed
> to be fixed there right?
>
> (In reply to Dave Townsend (:Mossop) from comment #40)
> > Which version of the SDK?
>
> The latest: addon-sdk-1.14
Unfortunately the current release version is too old to contain this fix. We're working on releasing 1.15 soon to address this
Comment 44•11 years ago
I'm running 1.15 and FF 26 on Win 7 pro and still appear to be getting the error. Any recommendations to investigate?