Closed Bug 237977 Opened 21 years ago Closed 20 years ago

Infinite javascript loop could not stopped.

Categories

(Core :: DOM: Core & HTML, defect)

x86
All
defect
Not set
critical

Tracking

()

RESOLVED FIXED

People

(Reporter: mark, Assigned: jst)

References

()

Details

(Keywords: hang)

Attachments

(5 files, 3 obsolete files)

User-Agent: Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.6) Gecko/20040113 Build Identifier: Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.6) Gecko/20040113 Entering http://www.netbricks.net/ into the URL field hangs all Mozilla windows (browser and mail). The status bar shows "Transferring data from..." The WindowsXP Task Manager shows CPU at 100% and the memory allocated to Mozilla slowly increasing. I have left it for about 30 minutes and nothing changed (it didn't run out of memory, but probably would after many hours). The URL opens and displays OK in Opera (7.23) and MSIE (6.0.2800). Reproducible: Always Steps to Reproduce: 1. Enter http://www.netbricks.net/ in the URL windows and hit return 2. That's it! 3. Actual Results: Mozilla hangs. Expected Results: Displayed web page (or at least tried to).
Got the same result with Mozilla 1.7b
I see this also with Mozilla/5.0 (Windows; U; Windows NT 5.0; en-US; rv:1.7b) Gecko/20040316 I only have Process Explorer running. The stack trace of the thread shows gklayout.dll+0xbae2 gklayout.dll+0xbb82 gklayout.dll+0x13597 gklayout.dll+0x1eebb gklayout.dll+0x1d806 gklayout.dll+0x325d6 gklayout.dll+0x32fb7 gklayout.dll+0xddb6 gklayout.dll+0x9c52f gklayout.dll+0xdda1d gklayout.dll+0xe6e4c gklayout.dll+0xe4564 gklayout.dll+0xe74bd gklayout.dll+0xddcee gklayout.dll+0xe06ba gklayout.dll+0x13e577 gklayout.dll+0x13e733 xpc3250.dll+0x15ee4 js3250.dll+0x27c37 js3250.dll+0x281a5 js3250.dll+0x202bb js3250.dll+0x1bdd3 js3250.dll+0x48b4 gklayout.dll+0x14cfa2 gklayout.dll+0xc3258 gklayout.dll+0xc3051 gklayout.dll+0xc2de5 gklayout.dll+0x1b55d8 gklayout.dll+0x1b5177 gklayout.dll+0x8f7ce gklayout.dll+0xe7207 gklayout.dll+0xe5ba8 gkparser.dll+0xb0fa gkparser.dll+0x8c1f gkparser.dll+0x95d5 gkparser.dll+0x8747 gkparser.dll+0x809f gkparser.dll+0x13440 gkparser.dll+0x13243 gkparser.dll+0x12b88 gkparser.dll+0x29364 gkparser.dll+0x5eb8 gkparser.dll+0x11dee gkparser.dll+0x120ff gkparser.dll+0x14158 gkparser.dll+0x12b00 ntoskrnl.exe+0x516104d
Status: UNCONFIRMED → NEW
Ever confirmed: true
With a late nightly I got an endless list of errors: CSS Error (http://www.netbricks.net/ :1.53): Unknown property 'filter'. Declaration dropped. 'filter' is only used as a CSS property in dmenu2.js: d.write("<div style=\"position:absolute;visibility:hidden;z-index:1;filter:progid:DXImageTransform.Microsoft.Fade(duration=0.25)\""); The gklayout.dll references in my previous message are reflow functions in nsPresShell like AlreadyInQueue. The queue is growing which could be a result of the endless number of written div's and table's in the javascript. I think the reason is an endless recursion in buildMenu: if(a[i][0]!=void(0)){ ... buildMenu(a[i]); ... } I haven't too much time to analyse the exact reason. The bug es reproducable by renaming the dmenu2.js to .html, adding <html> and <body> tags and the the buildMenu() javascript commands. Maybe some one could create a minimal testcase with these information. I think mozilla should not 'hang', even if this is a javascript implementation error.
Attached file Not minimized testcase (deleted) —
Assignee: general → general
Component: Browser-General → DOM: Level 0
QA Contact: general → ian
Depends on: 240934
Keywords: perf
Confirmed. Hangs on both URL and testcase. - Mozilla/5.0 (Windows; U; Windows NT 5.0; en-US; rv:1.8a) Gecko/20040514 Firefox/0.8.0+ - Microsoft Windows 2000 Pro 5.00.2195 SP4
Wow, quite the bug you've found here! Firefox 0.9.1+ (20040726) Mozilla 1.8.0 (2007071408) and Netscape 7.1 all froze with URL and testcase.
Attached file Same testcase, a bit less HTML output (deleted) —
I removed some of the "d.write" statements and the result is nearly the same. CPU performance goes up to 100% but after a few seconds Mozilla comes back with the popup dialog "A script on this page causing mozilla to run slowly. .... Do you want to abort the script?". This popup dialog comes also back after restoring some of the "d.write" statements, but it just takes a bit longer. I guess, that the dialog also popup for the URL if you will wait long enough. I see two bugs here: 1) The endless recursion in buildMenu. (Which could be a tech evangelism.) 2) The time before the script warning dialog popups.
(In reply to comment #7) > 1) The endless recursion in buildMenu. (Which could be a tech evangelism.) In http://www.netbricks.com/dmenu2.js is the variable "holdi" global defined, but it's only used (local) in "buildMenu" to remember the last menu item when building a submenu. Unfortunately is this variable overwritten if it builds a sub submenu. This will result in a wrong menu item that should be drawn in the main menu. The recursion stops if "ihold" is defined local. An other problem is that the test if an element of "a[i]" is an array is done by comparing "a[i][0]!=void(0)". If this is true, then it is assumed that "a[i]" is an array. This test is also true for strings in Mozilla. (I think Mozilla handles this correct.) I think a correct type test should be based on "typeof".
Blocks: 113492
> 1) The endless recursion in buildMenu. (Which could be a tech evangelism.) I filled bug 254540 for the www.netbricks.com tech evangelism. This bug should follow the issue, that the javascript execution could not stopped.
Assignee: general → general
Component: DOM: Level 0 → JavaScript Engine
Keywords: perfhang
OS: Windows XP → All
QA Contact: ian → pschwartau
Summary: www.netbricks.net - Mozilla hangs. CPU at 100%, memory use slowly increases. → Infinite javascript loop could not stopped.
Some information what I found so far. I tested this with in DevStudio. nsJSContext::DOMBranchCallback is called well and so I expected the script warning to abort the script. The marked line 479 in http://bonsai.mozilla.org/cvsblame.cgi?file=mozilla/dom/src/base/nsJSEnvironment.cpp&rev=1.236&root=/cvsroot&mark=479,486,502,2071#460 returns false every 1 minute. Line 486 every 8 minutes. And line 502 after 15 minutes. The timeout for a script defined in line 2071 is 5 seconds. So I think the break logic in DOMBranchCallback needs some review. Maybe an additional time based algorithm could help. Idea: Calculate the runtime for let say n cycles. Interpolate from this the expected cycles for a runtime of 5 seconds. http://www.netbricks.net/ shows the script warning dialog if I run it in the debugger. I don't know why it does not show it if I call the URL directly in mozilla. I'm not be able to create a real minimized testcase. I guess that mozilla builds (in a separate task?) the HTML-output of a very (endles) nested table. Maybe the real problem is a timing problem or a conflict between different tasks. Firefox 1.0PRrc also hang. I think that such javascript is rare but it is frustrating that javscript could hang a browser. Maybe it's too late, but I think firefox should not be shipped with something like this.
Flags: blocking-aviary1.0?
Use the right component, please. /be
Assignee: general → general
Component: JavaScript Engine → DOM: Level 0
jst, can you have a look at this?
Assignee: general → jst
Flags: blocking-aviary1.0? → blocking-aviary1.0+
Attached file Simplified testcase (obsolete) (deleted) —
Attachment #161510 - Attachment is obsolete: true
Attached file Even simpler (deleted) —
There's more to this than some potentially slow DOM code on the site, the JS on the page is an infinite loop in Firefox, and not so in IE. Check out the attached testcase. If you add a "var" before the "holdi" in the for loop in the buildMenu() function the problem goes away. With the attached testcase we show the dialog that lets you terminate the script, with the real site we don't tho...
So the page does a bunch of document.writes etc, and the time between branch callbacks goes way up, so it'll take us a *long* time to get to show the dialog... I doubt there's much we can do about this for 1.0. Brendan, any clues as to why the simplified testcase is an infinite loop in Firefox but not in IE?
SpiderMonkey extends ECMA-262 Edition 3 by reflecting string characters as indexed properties whose values are unit-length strings, each containing the indexed char: for s = "hello", s[0] === 'h', s[1] === 'e', etc. This script is testing whether a[i] is an array by testing whether a[i][0] != (void)0. That's a bad idea in light of the freedom given in ECMA-262 Edition 3 Section 16 to extend objects with new properties. Someone please contact the author of this dmenu3.js file and (a) implore them to use local variables well; (b) test using (a[i].constructor == Array) or similar to determine whether a[i] is an array. /be
Resetting blocker flag, and over to evangelism.
Assignee: jst → french
Component: DOM: Level 0 → French
Flags: blocking-aviary1.0+
Product: Browser → Tech Evangelism
QA Contact: pschwartau → french
Version: Trunk → unspecified
The information in comments In addition to my comment 7: In all simplified testcases appears the warning dialog, but I never saw it for the "not minimized testcase". (In one case I waited over 12 hours.) The javascript warning will shown if I run it within a debugger. (Sorry that I didn't mentioned this before.) As per comment 9, I already filed a bug for the evangelism (bug 254540). I leaved this bug for the problem why JS doesnt't show a warning for this infinite loop. If it is possible to popup the warning dialog with the "Not minimized testcase", then this bug should closed as INVALID or it should follow the issue that the warning dialog should appear earlier (in an acceptable response time) and not after 12 hours or more.
Ok, fair enough. Back to the browser.
Component: French → DOM
Product: Tech Evangelism → Browser
Version: unspecified → 1.0 Branch
I added some debugging code to see what's going on here and everything's working as expected, sortof. The problem is that what the page is doing ends up being an O(n^2) operation (or something of that order) where every iteration through the infinite loop takes longer and longer, and we'd need 64k branch callbacks to offer to stop the script. By the time I got to 20k things were going really slow already, and I just gave up (after maybe 10k). Had I waited long enough, I bet the dialog would have shown up (unless we'd run out of memory before that point)... We *could*, and probably should, drop the number of required branch callbacks before we display the dialog since I was running Firefox for some time today and never ever saw us get past the initial filter in the branch callback function (4096 callbacks), so we wouldn't hurt performance in general cases, and we'd get the benefits of a browser that's harder to lock up.
Assignee: french → jst
In comment 10 I suggested to implement a time depending algorithm. I think about something like this: 1) Perform every 4096 callbacks a check. 2) Calculate the runtime for the first check. 3) Calculate the number of checks which are expected for a runtime of 5 seconds. I guess that in cases like O(n^2) these 5 seconds are exceeded by the first check. (Or the number of checks is low enough.) 4) Show the dialog after the number of calculted checks. The values are only suggestions based on the existing code. Maybe other improvements like a recalculation of the number of checks every 10 checks are also usefull.
Only an idea idea: Check the javascript performance of the computer (in the startup sequence) and depending on this is the callback check threshold calculated.
(In reply to comment #23) > Only an idea idea: Check the javascript performance of the computer (in the > startup sequence) and depending on this is the callback check threshold calculated. That'll give us inconsistent behaviour since there's no way to know that what we measured wasn't under heavy load. It won't be reliable.
This patch is only a suggestion for more (and acurate) flexible and time based outer check could be implemented. The URL self runs for approx. 10 seconds. before the warning appears. Even if the machine is slow during the first check, is the correct number of checks recalculated before the warning appears. The worst case is if the computer s performance increases slightly, so each time is a new number of checks calculated. I think that the algorithm self doesn't need too much time, because the most of the calls are filtered by the first "if" statement before. Possible problems 1) is the initial PR_Now() correct placed? 2) duration must be non zero but this is expected but not checked. (We have 4096 calls to the function before we calculate the duration) 3) I don't know if the usage of the long macros and data types is correct. 4) At least 4096 calls are made before a time check is done. 5) A linear "time/number of callbacks" is expected. (We will calculate a too large number for O(n^2) problems, but I think the result is also acceptable in those cases.)
Attached patch Simpler approach. (obsolete) (deleted) — Splinter Review
This is a simpler approach, that I think will give us more consistent results and should IMO be good enough. What it does is to lower the number of callbacks needed to initialize the start time of the script from 32k to 256, and from then on check the time every 4k callbacks, that way this works fairly well since even in O(n^2) JS code we'll get to 4096 in a fairly reasonable ammount of time, compared to the 64k callbacks it used to take to hit the time check. And even if these numbers seem fairly low, we don't hit even the stat time initialization code (i.e. we don't get 256 callbacks) while starting Firefox and doing some normal browsing on random sites. And sure, the number of time comparison's we'll do in (pathological) tight loops goes up, but it's not like they're *that* expensive after all.
Can we get TIBET and other intensive JS frameworks and apps tested on a build with this patch? jst can probably furnish a Windows Firefox build. /be
Sure, if jst can shoot me a build (Windows or Linux, doesn't matter), I'd be more than happy to. I don't build Mozilla regularly here. Cheers, - Bill
Comment on attachment 162753 [details] [diff] [review] Simpler approach. + if (++ctx->mBranchCallbackCount == INITIALIZE_TIME_BRANCH_COUNT) mBranchCallbackCount can overflow so this can lead to an infinite loop.
Blocks: 261974
QA Contact: french → ian
Version: 1.0 Branch → Trunk
Attachment #162753 - Attachment is obsolete: true
Comment on attachment 167784 [details] [diff] [review] Updated diff of the simpler approach with no wrap-around problems. Brendan, if you r+sr I can land this and we'll see how it performs in the real world. I've been running with this for a while now w/o ever seeing this dialog pop up when it shouldn't.
Attachment #167784 - Flags: superreview?(brendan)
Attachment #167784 - Flags: review?(brendan)
I've been buried here so I haven't worked through the problems I had with the private build that jst provided. I would still be to put TIBET through its paces with this patch, if I could get a functional build (I'm probably being stupid somewhere :-)). I can't think of a library that exercises the JS engine more than we do. I will have some time within the next couple of days to do this. Otherwise, if Brendan wants to r+sr, I can just download a nightly and, if TIBET runs into real problems, I'll either file against this bug or open a new one. Whichever you guys want to do, I'm game for. Cheers, - Bill
Comment on attachment 167784 [details] [diff] [review] Updated diff of the simpler approach with no wrap-around problems. I'd rather see one test that causes an early return, so can you use the smallest mask (INITIALIZE_TIME_BRANCH_MASK 0xff) and return if the incremented counter is not zero masked with that? If it is 0, then you can do the equality test with that mask + 1 (256), and do further stuff such as the MAYBE_GC mask test. /be
Attachment #167784 - Attachment is obsolete: true
Attachment #167784 - Flags: superreview?(brendan)
Attachment #167784 - Flags: review?(brendan)
Attachment #168174 - Flags: superreview?(brendan)
Attachment #168174 - Flags: review?(brendan)
Comment on attachment 168174 [details] [diff] [review] Faster than the above, per Brendan's suggestion. Nit: INIT_TIME_BRANCH_COUNT_MASK is more consistent with the other *_COUNT_MASK names (past, and single present example), and still readable. r+sr=me, thanks! /be
Attachment #168174 - Flags: superreview?(brendan)
Attachment #168174 - Flags: superreview+
Attachment #168174 - Flags: review?(brendan)
Attachment #168174 - Flags: review+
Fix checked in. Looks like the site in this bug has changed since this was reported, so hard to use that to verify this now...
Status: NEW → ASSIGNED
Johnny, is the testcase in bug 261974 useful for verifying?
Maybe, in a build with SVG enabled, that is.
Oh, and marking FIXED.
Status: ASSIGNED → RESOLVED
Closed: 20 years ago
Resolution: --- → FIXED
Initial testing with the TIBET framework on: Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.8a6) Gecko/20041213 shows that we're ok. Still reserving the right to yell at some point :-). Thanks Johnny. Cheers, - Bill
The first textcase, not minimised (attachment 144328 [details]), still hangs SeaMonkey 1.0a, without any warning dialog... (stack : http://rafb.net/paste/results/7hbZgb99.nln.html )
(In reply to comment #41) attachment 144328 [details] causes the slow dialog warning in FF 1.6a1/WinXP from today with a fresh profile and the default dom.max_script_run_time of 5 seconds.
Component: DOM → DOM: Core & HTML
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Created:
Updated:
Size: