Closed
Bug 392294
Opened 17 years ago
Closed 14 years ago
Long delay on shutdown if a global JS variable is accessed from an outside context
Categories
(Core :: XPConnect, defect)
Tracking
()
RESOLVED
WORKSFORME
People
(Reporter: ynvich, Unassigned)
Details
(Keywords: perf)
Attachments
(4 files, 1 obsolete file)
User-Agent: Mozilla/5.0 (X11; U; Linux i686; en-US; rv:1.8.1.6) Gecko/20070723 Iceweasel/2.0.0.6 (Debian-2.0.0.6-1)
Build Identifier: Mozilla-1.9a8pre/20070807
This is happening in a custom xulrunner application. Xulrunner version is 1.9a8pre, last checkout was on 2007-08-07. OS is GNU/Linux (debian/unstable).
I have a JavaScript testing module which implements nsIWebProgressListener. The module checks the window state after async commands (mostly chrome URL loads and history loads). The window has global variables, some of which are assigned instances of C++ components.
Let's assume
testWindow - is an XPC wrapped instance of the window,
gNative - is an instance of any native XPCOM component,
gScripted - is an instance of a javascript object in the window context
If testWindow.wrappedJSObject.gNative is accessed, xulrunner takes at least 20 seconds to shut down on a 1.1 GHz Pentium-M with 1 GB of RAM (the system is fast enough: a Mozilla debug build takes around 45 minutes).
If gScripted has a member function getNative() {return gNative;}, and the object is acquired through testWindow.wrappedJSObject.gScripted.getNative() instead, xulrunner exits normally, that is, instantly.
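The two access paths contrasted above can be sketched in plain JavaScript. This is only a mock: testWindow, wrappedJSObject, gNative and gScripted are ordinary objects standing in for the XPConnect wrapper and the window globals, so it shows the shape of the two calls, not the wrapper behavior that triggers the delay.

```javascript
// Mock of the window's globals. In the real app gNative is a native XPCOM
// component and gScripted is a JS object defined in the window's context.
const gNative = { kind: "native-stand-in" };
const gScripted = {
  // Accessor that hands out the native object from inside the window context.
  getNative() {
    return gNative;
  },
};

// Stand-in for the XPConnect-wrapped window seen by the test module.
const testWindow = {
  wrappedJSObject: { gNative, gScripted },
};

// Path 1: direct access from the outside context
// (the path reported to cause the slow shutdown).
const direct = testWindow.wrappedJSObject.gNative;

// Path 2: access through a scripted getter (the reporter's workaround).
const viaGetter = testWindow.wrappedJSObject.gScripted.getNative();

// Both paths yield the same object; only the route differs.
console.log(direct === viaGetter);
```

In this mock both expressions are equivalent; in the real application the difference would lie in which context performs the property lookup on the wrapped window.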
Reproducible: Always
Actual Results:
20 seconds of 100%CPU usage at shutdown
Expected Results:
instant shutdown
I tried to find a good example in Mozilla code quickly, but in vain. I have a workaround, so this is not a critical bug. Still, the observed behavior is definitely not normal, which is why I am filing this bug.
Feel free to ask for more details or testing if needed.
Comment 1•17 years ago
20 seconds is plenty of time to get stacks using gdb or pstack, or whatever is similar on your OS. Without that poor man's profiling data, it's impossible to assign this bug to a component. So please do provide that data.
If possible, please provide your XULRunner app or a reduced version so others can reproduce what you are seeing.
In the meantime, punting to XPConnect.
/be
Assignee: general → nobody
Component: JavaScript Engine → XPConnect
QA Contact: general → xpconnect
Reporter
Comment 2•17 years ago
This backtrace is taken in the middle of the shutdown. Maybe 9-10 sec after my tests completed. After 'continue' command the program ran for 10 sec more.
Reporter
Comment 3•17 years ago
(In reply to comment #1)
> If possible, please provide your XULRunner app or a reduced version so others
> can reproduce what you are seeing.
The app is available via git at git://repo.or.cz/abstract.git, via svn at svn://abstract.svn.sf.net/svnroot/abstract
I will attach a tarball here.
Reporter
Comment 4•17 years ago
It requires installed xulrunner of at least 1.9a7, automake-1.10, autoconf-2.61.
And it uses nsDeque so 'xpcom/ds/libxpcomds_s.a' is needed in $LDPATH (sdk/lib should be fine).
./check.mk is a bootstrapping makefile, default target will run tests which expose the issue.
The issue is caused by line 161 in 'app/aaTestVC.js'. If the line is removed, everything is fine.
Reporter
Comment 5•17 years ago
(In reply to comment #3)
> The app is available via git at git://repo.or.cz/abstract.git, via svn at
> svn://abstract.svn.sf.net/svnroot/abstract
But the bug is not exposed there: the offending code was never committed, sorry for the noise. The tarball does contain the offending line.
Reporter
Comment 6•17 years ago
I can cause similar cyclical processing on shutdown if I create a reference cycle and assign one of the cycle's vertices to a JavaScript variable.
But that is not the case in the bug description, since all objects are already assigned to JavaScript variables.
Maybe cross-context calls create a reference cycle of some kind?
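To illustrate what a reference cycle means here, a toy reference counter is sketched below. It is a simplified model, not XPCOM's actual reference counting: the RefCounted class and its method names are hypothetical. It only shows why the members of a cycle never reach a zero count on their own; it does not reproduce the CPU usage seen at shutdown.

```javascript
// Toy reference-counted object (hypothetical, for illustration only).
class RefCounted {
  constructor(name) {
    this.name = name;
    this.refCount = 0;
    this.member = null;
    this.destroyed = false;
  }

  addRef() {
    this.refCount += 1;
  }

  release() {
    this.refCount -= 1;
    if (this.refCount === 0) {
      this.destroyed = true;
      // A destroyed object drops the reference it was holding.
      if (this.member) {
        const held = this.member;
        this.member = null;
        held.release();
      }
    }
  }

  // Store a strong reference to another object.
  setMember(other) {
    other.addRef();
    this.member = other;
  }
}

// Build a two-vertex cycle: a -> b -> a.
const a = new RefCounted("a");
const b = new RefCounted("b");
a.setMember(b);
b.setMember(a);

// A variable takes, then drops, a reference to one vertex of the cycle.
a.addRef();
a.release();

// Neither count reaches zero, so neither object is ever destroyed.
console.log(a.refCount, b.refCount, a.destroyed, b.destroyed);
```

Breaking the cycle by hand (for example, clearing one member before the last external release) lets both counts drop to zero, which matches the observation later in the thread that fixing a refcount cycle removed one source of the delay.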
Comment 7•17 years ago
I'm experiencing some very similar symptoms on Firefox 3 nightly 20071129 (along with previous versions including 3b1 but not 2.x).
In case it was something to do with the javascript on my homepage (iGoogle) I set my default startup page to blank in Firefox's prefs. It still happens with the blank page. I start firefox, then click the close box on the window and my CPU usage for the firefox process goes to 100% for about 10-20 seconds.
I got some gdb stacks for the threads during this time which are attached.
Additionally because the stacks did not give much info, I did an strace on this to see what kinds of system calls are going on. See strace attachment.
System info:
Ubuntu 7.10,
libgtk: 2.12.0-1ubuntu3
gnome: 2.20.1-0ubuntu1
x.org: 7.2-5ubuntu13
metacity: 2.20.0-0ubuntu3
firefox: Build identifier: Mozilla/5.0 (X11; U; Linux i686; en-US; rv:1.9b2pre) Gecko/2007113004 Minefield/3.0b2pre
Comment 8•17 years ago
Comment 9•17 years ago
Updated•17 years ago
Attachment #290975 -
Attachment mime type: application/octet-stream → text/plain
Updated•17 years ago
Attachment #290976 -
Attachment mime type: application/octet-stream → text/plain
Comment 10•17 years ago
I managed to get a better stack trace by using my own build from the trunk. It looks like this is a regression from the new thread manager (bug 326273).
Attachment #290975 -
Attachment is obsolete: true
Reporter
Comment 11•17 years ago
(In reply to comment #6)
> Maybe cross-context calls create a reference cycle of some kind?
I am practically sure about this statement now.
I've just built my app on Ubuntu 7.10 and got the delay, which I don't experience on Debian/unstable.
I have bisected my code on Ubuntu to locate the cause of the delay. The results are intriguing. The first cause was a refcount cycle (my bad :) between C++ XPCOM components. The delay went away after the leak was fixed. But it was not the only cause. The second case is a bit more difficult to explain:
1. Create an instance of a native XPCOM (N1) in a chrome JS (Context1).
2. Invoke a JS XPCOM.
3. Create an instance of a native XPCOM (N2) in a JS XPCOM (Context2).
4. Acquire N1 from Context2 using wrappedJSObject
5. Assign N2 into N1 in Context2
6. Access N1.N2 from Context1.
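The six steps above can be sketched with plain JavaScript objects. Everything here is a stand-in: createNative, context1, wrappedContext1 and jsComponent are hypothetical names, and no real XPCOM instantiation or XPConnect wrapping takes place, so the sketch only fixes the order of operations.

```javascript
// Plain objects stand in for native XPCOM instances and JS contexts;
// none of these names come from the real application.
function createNative(name) {
  return { name };
}

// Step 1: chrome JS (Context1) creates N1 as a global.
const context1 = { n1: createNative("N1") };

// Stand-in for the wrapper through which another context sees Context1.
const wrappedContext1 = { wrappedJSObject: context1 };

// Steps 3-5 run inside a JS XPCOM component (Context2).
function jsComponent(wrapped) {
  // Step 3: create N2 in Context2.
  const n2 = createNative("N2");
  // Step 4: acquire N1 from Context2 using wrappedJSObject.
  const n1 = wrapped.wrappedJSObject.n1;
  // Step 5: assign N2 into N1 while still in Context2.
  n1.n2 = n2;
}

// Step 2: invoke the JS component.
jsComponent(wrappedContext1);

// Step 6: access N1.N2 from Context1.
console.log(context1.n1.n2.name);
```

In the real application N1 and N2 would be native components, so step 5 would go through an interface attribute rather than an ad hoc property.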
Comment 12•17 years ago
Can you capture that in a minimal test? Also, running a profile of shutdown (could use oprofile or kcachegrind) would make it a lot easier to figure out what's going on.
(Does this happen with the current trunk? I'm not sure if you're still using that old alpha in your testing.)
Reporter
Comment 13•17 years ago
(In reply to comment #12)
> Can you capture that in a minimal test? Also, running a profile of shutdown
> (could use oprofile or kcachegrind) would make it a lot easier to figure out
> what's going on.
I doubt it can be done as a minimal test, since it needs at least 4 different files plus boilerplate. But my app uses the Mozilla build system, so it should be relatively easy to reproduce:
(In mozilla tree)
hg clone http://hg.aasii.org/abstract abstract
cd abstract
vi app/test/unit/mozconfig (edit libxulsdk location)
make -s -f check.mk
The colliding JS contexts are in app/chrome/content/transcript.js line 211, and app/test/aaTestAccountViews.js line 474.
Of course, I can run profiling software on the app, but I need more detailed instructions for that.
> (Does this happen with the current trunk? I'm not sure if you're still using
> that old alpha in your testing.)
It happens with xulrunner-1.9b2, built from firefox-3.0b2-source.tar.bz2; my fault for not mentioning this.
For quick reference, ubuntu version can be downloaded from sf.net
http://downloads.sourceforge.net/abstract/xulrunner-upstream_1.8.99%2B1.9b2-ubuntu1_i386.deb
http://downloads.sourceforge.net/abstract/xulrunner-upstream-dev_1.8.99%2B1.9b2-ubuntu1_all.deb
Reporter
Comment 14•17 years ago
(In reply to comment #11)
> I've just build my app on Ubuntu 7.10, and got the delay, which I don't
> experience on Debian/unstable.
And it is not happening on Windows XP either.
Reporter
Comment 15•17 years ago
(In reply to comment #11)
> I've just build my app on Ubuntu 7.10, and got the delay, which I don't
> experience on Debian/unstable.
Further experiments have made the case even more interesting. The default g++ on Ubuntu 7.10 is 4.1.3 prerelease. If my app is built with g++-4.2, everything is fine.
So it looks like a bug in g++-4.1.3 is causing CPU-intensive processing on XPCOM shutdown if any XPCOM refcycle participant is ever stored into a JS variable.
Since the behavior is compiler-dependent, this bug might be a WORKSFORME candidate. However, that compiler bug has the positive side effect of exposing XPCOM leaks. If the nature of the bug is discovered, it may become the basis for a leak-detecting tool.
Reporter
Updated•14 years ago
Status: UNCONFIRMED → RESOLVED
Closed: 14 years ago
Resolution: --- → WORKSFORME