Closed
Bug 51954
Opened 24 years ago
Closed 24 years ago
Random crashes in id lookup
Categories
(Core :: JavaScript Engine, defect, P2)
Core
JavaScript Engine
Tracking
()
VERIFIED
FIXED
M18
People
(Reporter: pschwartau, Assigned: brendan)
References
Details
(Keywords: crash, Whiteboard: [nsbeta3+][rtm++] FIX IN TRUNK, READY FOR RTM)
Attachments
(3 files)
(deleted),
text/plain
|
Details | |
(deleted),
text/plain
|
Details | |
(deleted),
patch
|
Details | Diff | Splinter Review |
Using JS shell built 2000-09-08 on WinNT, Linux, and MacOS9.
The above testcase now crashes on Linux and Mac.
On WinNT: doesn't crash, but takes 28 seconds to complete.
Here is the part of the testcase that seems to cause the crash,
with a debugging "print" added:
function addTestCase( t ) {
for ( d = 0; d < TimeInMonth(MonthFromTime(t)); d+= msPerDay ) {
t += d;
testcases[tc++] = new TestCase( SECTION,
"(new Date("+t+")).getDate()",
DateFromTime(LocalTime(t)),
(new Date(t)).getDate() );
}
print ("t is " + t);
exit (3);
}
Here is what appeared in the console:
t is -62132140800000
and then: CRASH! Will attach Linux stack trace below...
Reporter | ||
Comment 1•24 years ago
|
||
Comment 2•24 years ago
|
||
I saw that the stack trace included global_resolve, which is defined in js.c.
On a hunch, I did blame, and found code there bracketed in #ifdef
LAZY_STANDARD_CLASSES that uses the new lazy class initialization APIs. (bug
46703)
If I comment out #define LAZY_STANDARD_CLASSES earlier in the file, the test
runs to completion. It still takes a long time, so I don't think that's part of
the problem.
The trace also includes 'StdNameToAtom (cx=0x80d7198, name=0x80cd3fc "escape")'
... so it seems that we're trying to resolve 'escape' when we run into trouble.
'escape' may have gone away recently as a fix to 42221, so it seems likely that
this problem arises from an interaction between the fixes to 42221 and 46703.
CCing Brendan for a look.
Assignee | ||
Comment 3•24 years ago
|
||
I've been running the testsuite successfully for a while, during and long after
the LAZY_STANDARD_CLASSES stuff was committed.
mccabe, which pointer is bad, and can you find out why? I'll look too, in a
bit.
/be
Comment 4•24 years ago
|
||
Brendan -
Are you running the testsuite from the standalone JS shell? It's the missing
(or possibly not entirely gone) 'escape' function that I suspect.
Assignee | ||
Comment 5•24 years ago
|
||
Yes, I'm running js/tests/jsDriver.pl -e smdebug -L lc2 lc3 -t and getting no
crashes. At least I did two or three days ago. How do you make it crash? And
did you debug around a bit to see what the bad address was, and whence it came?
/be
Comment 6•24 years ago
|
||
I'm getting an assert failure now...
[rginda@swamp tests] perl jsDriver.pl -e smdebug -l ecma/Date/15.9.5.10-2.js -f
/dev/null -k
-#- Executing 1 test(s).
*-* Testcase ecma/Date/15.9.5.10-2.js failed:
Expected exit code 0, got 134
Testcase terminated with signal 0
Complete testcase output was:
Assertion failure: first, at jscntxt.c:85
-#- Wrote results to '/dev/null'.
-#- 1 test(s) failed
Comment 7•24 years ago
|
||
ok, I guess I needed to make clean. I don't see any failure in the date
testcase anymore, but instead, five other segfaults show up under the ecma
suite. I'll go discuss this with phil and try to sort it out.
Comment 8•24 years ago
|
||
I'm testing on Linux with the standalone js engine. Executing
'load("../jsref.js");load("15.9.5.10-2.js")' in the tests/ecma/Date directory
gets me a crash. I think it has to do with overflowing the GC heap before a GC
can occur, but I'm still looking.
Brendan, sorry for the poorly founded suppositions earlier.
Comment 9•24 years ago
|
||
I'm still seeing this. It looks like it was introduced with the
LAZY_STANDARD_CLASSES checkin of 8/19/2000. (Verified via before-and-after
checkouts.)
I'm still using the same recipe to reproduce. It looks like something elusive,
as making the testcase generate fewer objects (it uses many) or using the proper
"shell.js" test init file rather than "jsref.js" hides the problem.
In the JS shell, load("../jsref.js"); load("15.9.5.10-2.js"); foo; (where foo is
undefined) crashes. The crash occurs when the engine tries to look up "foo" via
js.c's global_resolve; this runs into the JS_ResolveStandardClass code. This
bombs when trying to atomize some of the standard class names to compare to
"foo", as the hashtable entries for some standard names ("escape") point to
garbage. Just why, I don't know.
Updated•24 years ago
|
Summary: Testcase mozilla/js/tests/ecma/Date/15.9.5.10-2.js failing → Random crashes in id lookup
Assignee | ||
Comment 10•24 years ago
|
||
Try purify.
Any GC'ing going on? A breakpoint will tell.
/be
Comment 11•24 years ago
|
||
mccabe and I spent some time looking at this. Purify did not gather any extra
info - it crashed in the same place but no error or warnings preceeded that.
What we *did* do was find the hashtable entry that contains a garbage string. We
set a data breakpoint for that data changing and let it rip. We then got a stack
trace for the location where that JSString - that *should* be rooted in the atom
table - is free'd. This is before the point when it will later crash. This free
happens in a finalization in gc. The *really* interesting thing is that this is
a gc invoked from the "last ditch" gc in js_AllocGCThing!
This links us to the problem in the marking phase of the same sort of "last
ditch" gc we see in bug 53123.
I'll attach the stack of the gc location.
Comment 12•24 years ago
|
||
Assignee | ||
Comment 13•24 years ago
|
||
Oh My Gawd.
This is an ancient bug in atom mark/sweep GC. Patch coming up. I can't believe
this wasn't found sooner. The gist of it is that only pinned atoms have their
keys marked, and therefore not swept. Worse, we rely on marking pinned atoms
(not their keys, mind you) to keep the pinned atoms (again, not their keys) from
being swept. Argh!
/be
Assignee | ||
Comment 14•24 years ago
|
||
Wait, it's not as bad as I said -- I was on goof gas. The problem is that the
atom mark function wants to mark only pinned atoms, and it was doing that, but
it was failing to mark all atom's keys. This too seems right: some atoms are
garbage, and so are their keys. If a garbage atom has a non-garbage key, the
key will be marked otherwise (via a live object's strong ref to that string
jsvsal, say).
Now, in the "last ditch" case where js_AllocGCThing calls js_GC, we won't sweep
any atoms (the GC_KEEP_ATOMS gcflag is passed to js_GC) because atoms are never
rooted (they aren't jsvals, too bad; old old design decision). But note well!
Atoms are not swept, but their keys may be, because we have failed to mark those
keys (such as the string "escape"), and no other ref may exist to the key.
Patch still coming up, but this isn't as dire as I thought in my goof-gas state:
it afflicts only the last-ditch GC attempt in js_AllocGCThing.
/be
Assignee | ||
Comment 15•24 years ago
|
||
Assignee | ||
Comment 16•24 years ago
|
||
mccabe, jband: you guys deserve the credit for leading this horse to water and
making him drink. Gimme some testing, r= and a= love, ok?
beard: please get me nsbeta3++ approval once the r's and a's are lined up.
/be
Assignee | ||
Updated•24 years ago
|
Status: NEW → ASSIGNED
Comment 17•24 years ago
|
||
r=mccabe, looks good and makes sense. The testcase now works OK.
Beard is out of town, so another ++er is needed.
Assignee | ||
Comment 18•24 years ago
|
||
Looking for a=jband, need trudelle to play beard here. Peter, this is not a
topcrash bug that we know of, but it should be rtm++ at least. I'm concerned
that it and 53123-reopen (which is topcrash) are accounting for other topcrash
bugs that haven't been diagnosed. I'd like to get the fix in for nsbeta3 too,
therefore -- less noise for the remaining bug signal to rise above.
/be
Comment 19•24 years ago
|
||
a=jband. This looks right to me. The crasher test now runs ok. I can imagine
this bug causing some havoc.
Comment 20•24 years ago
|
||
Only PDT can double-plus, but I'll give it an nsbeta3+/rtm+ to get on their
radar, and some persuasive whiteboard verbiage. To get it into nsbeta3, someone
will have to plead the merits to PDT, and given the lateness, it will have to be
on the basis of known horrible effect on the user (i.e. P1), preferably combined
with a simple/safe/well-tested fix. I don't see the evidence here yet. Noise
argument can also be used as a reason *not* to change nsbeta3 bits unnecessarily.
Whiteboard: [nsbeta3+][rtm+] FIX READY TO LAND
Assignee | ||
Comment 21•24 years ago
|
||
"Noise argument can also be used as a reason *not* to change nsbeta3 bits
unnecessarily."
Certainly, and that's the PDT's call. But fix-in-hand should mean something.
The fix here is not going to make things worse.
/be
Assignee | ||
Comment 22•24 years ago
|
||
Fix checked into trunk.
/be
Status: ASSIGNED → RESOLVED
Closed: 24 years ago
Resolution: --- → FIXED
Assignee | ||
Comment 23•24 years ago
|
||
Oops, I should keep this open so the trunk changes get onto the Netscape 6
branch after PR3, for RTM.
/be
Status: RESOLVED → REOPENED
Resolution: FIXED → ---
Whiteboard: [nsbeta3+][rtm+] FIX READY TO LAND → [nsbeta3+][rtm+] FIX IN TRUNK, READY FOR RTM
Assignee | ||
Updated•24 years ago
|
Status: REOPENED → ASSIGNED
Comment 24•24 years ago
|
||
PDT marking [rtm++]
Whiteboard: [nsbeta3+][rtm+] FIX IN TRUNK, READY FOR RTM → [nsbeta3+][rtm++] FIX IN TRUNK, READY FOR RTM
Assignee | ||
Comment 25•24 years ago
|
||
Fix is in the branch too, for RTM.
/be
Assignee | ||
Comment 26•24 years ago
|
||
Forgot to close this!
/be
Status: ASSIGNED → RESOLVED
Closed: 24 years ago → 24 years ago
Resolution: --- → FIXED
Comment 28•22 years ago
|
||
*** Bug 174871 has been marked as a duplicate of this bug. ***
You need to log in
before you can comment on or make changes to this bug.
Description
•