Closed Bug 565312 Opened 15 years ago Closed 11 years ago

[SoftFloat] NULL pointer dereferenced in js/src/nanojit/Assembler.cpp

Categories

(Core Graveyard :: Nanojit, defect, P3)

ARM
Linux
defect

Tracking

(Not tracked)

RESOLVED WONTFIX
flash10.2

People

(Reporter: junkmailnotread, Unassigned)

References

Details

Attachments

(3 files)

User-Agent: Mozilla/5.0 (X11; U; Linux i686 (x86_64); en-US; rv:1.9.2.3) Gecko/20100401 Firefox/3.6.3 Build Identifier: fennec-2.0a1pre fennec-2.0a1pre built by myself and running on an HP iPAQ hx4700 (armv5te). fennec crashes immediately after loading an internet page, for example Google. I traced the crash to js/src/nanojit/Assembler.cpp, whose function Assembler::findVictim() returns a NULL pointer to Assembler::registerAlloc(). The NULL pointer is immediately dereferenced in Assembler::registerAlloc(). Adding my own debug to Assembler::findVictim(): LIns* Assembler::findVictim(RegisterMask allow) { printf("allow=0x%x\n", allow); NanoAssert(allow); LIns *ins, *vic = 0; int allow_pri = 0x7fffffff; printf("vic=%p, allow_pri=%d, FirstReg=%d, LastReg=%d\n", vic, allow_pri, FirstReg, LastReg); for (Register r = FirstReg; r <= LastReg; r = nextreg(r)) { printf("r=%d\n", r); if ((allow & rmask(r)) && (ins = _allocator.getActive(r)) != 0) { int pri = canRemat(ins) ? 0 : _allocator.getPriority(r); printf("ins=%p, pri=%d\n", ins, pri); if (!vic || pri < allow_pri) { vic = ins; allow_pri = pri; printf("vic=%p, allow_pri=%d\n", vic, allow_pri); } } } NanoAssert(vic != 0); printf("return vic=%p\n", vic); return vic; } showed that Assembler::findVictim() was called 861 times, and returned NULL on the last call. This is the debug from the last 2 calls: allow=0x1 vic=(nil), allow_pri=2147483647, FirstReg=0, LastReg=22 r=0 ins=0xfd7e7c, pri=151 vic=0xfd7e7c, allow_pri=151 r=1 r=2 r=3 r=4 r=5 r=6 r=7 r=8 r=9 r=10 r=11 r=12 r=13 r=14 r=15 r=16 r=17 r=18 r=19 r=20 r=21 r=22 return vic=0xfd7e7c allow=0x7f0000 vic=(nil), allow_pri=2147483647, FirstReg=0, LastReg=22 r=0 r=1 r=2 r=3 r=4 r=5 r=6 r=7 r=8 r=9 r=10 r=11 r=12 r=13 r=14 r=15 r=16 r=17 r=18 r=19 r=20 r=21 r=22 return vic=(nil) Segmentation fault That final "allow" value of 0x7f0000 was the only one not to have any of the l.s. 16 bits set: % grep '^allow' db2.out | sort -u allow=0x1 allow=0x170 allow=0x190 allow=0x1c0 allow=0x1f0 allow=0x2 allow=0x270 allow=0x2b0 allow=0x320 allow=0x370 allow=0x380 allow=0x3b0 allow=0x3d0 allow=0x3f0 allow=0x4f0 allow=0x570 allow=0x5f0 allow=0x610 allow=0x670 allow=0x6f0 allow=0x750 allow=0x760 allow=0x770 allow=0x7a0 allow=0x7b0 allow=0x7d0 allow=0x7e0 allow=0x7f0 allow=0x7f0000 allow=0x7ff allow=0xbfff allow=0xfbff allow=0xfdff allow=0xfeff allow=0xff7f allow=0xffbf allow=0xffdf allow=0xffef allow=0xfffb allow=0xfffd allow=0xfffe allow=0xffff I can add more debug if required. Reproducible: Always
Output from --enable-debug build. It shows the final assertion failure from Assembler.cpp: Assertion failed: vic != 0 (/home/software/ipaq/user/mozilla-central/js/src/nanojit/Assembler.cpp:2392)
Thanks for the debug output, that helps a lot. This is on ARM. allow=0x7f0000 is equivalent to allow=FpRegs. Somehow we're trying to allocate an FpReg, and we decide that none of them are free, but then when we try to find one to evict we fail because all of them are actually free. Ie. findVictim() should not have been called. We need to work out why it was called. Sounds like the 'free' and 'active' lists have gotten out of sync. junkmailnotread, can you apply the attached patch (on top of your first debugging patch) and rerun, and attach the output from the final two occurrences again? It'll need to be a debug build.
Attachment #445014 - Attachment description: patch → patch with additional diagnostic printf calls
Attached file full debug output (deleted) —
OK here it is; my debug + your debug + --enable-debug is their entirety.
Thanks. Well that's weird. We have this: setA___=0x7f0000, set__F_=0x407f, setA_F_=0x0 So any FpReg is allowed, no FpReg is free, and yet we have this: r=0 r=1 r=2 r=3 r=4 r=5 r=6 r=7 r=8 r=9 r=10 r=11 r=12 r=13 r=14 r=15 r=16 r=17 r=18 r=19 r=20 r=21 r=22 which indicates that none of the registers have an entry in the 'active' list. Which would indicate a monumental free/active out-of-sync condition, ie. every FpReg is out-of-sync. The most interesting thing is that this is the problematic instruction: ii2d1 = ii2d js_UnboxDouble1, hcalli2 So I'm guessing it is a SoftFloat problem. junkmailnotread, is the armv5te an ARM chip that lack floating-point hardware? That would explain why others haven't hit this, because most ARM chips that Nanojit is used on do have floating-point hardware, AFAIK. Hmm, should we be trying to allocate FpRegs when using SoftFloat? I think Ed Smith knows more about SoftFloat than I do, but he's away this week. Bug 543636 is also relevant here.
BTW, this bug is marked critical but I suspect it could be downgraded, depending on how important the "HP iPAQ hx4700 (armv5te)" platform is to our goals. Input about this from people who know more about it than me would be welcome!
Summary: NULL pointer dereferenced in js/src/nanojit/Assembler.cpp → [SoftFloat] NULL pointer dereferenced in js/src/nanojit/Assembler.cpp
(In reply to comment #4) > junkmailnotread, is the armv5te an ARM chip that lack floating-point hardware? Another possibility is that it has FP hardware but the OS has disabled it. See bug 543636 comment 3.
As for FP hardware, I really had no idea. I just pass the -march=armv5te flag to my gcc cross-compiler. So I compiled a small test program with floating point operations and examined the assembler output. Sure enough it was making library calls. As for "critical", that just matches the descriptions presented when one opens a new bug; i.e. that the application crashed. I don't mind what severity is chosen.
There is no accessible FP hardware; passing -mfloat-abi=softfp caused the test program to generate an illegal instruction. I could see nothing obvious in the kernel configuration regarding FP hardware.
Severity: critical → normal
(In reply to comment #8) > There is no accessible FP hardware; passing -mfloat-abi=softfp caused the test > program to generate an illegal instruction. I could see nothing obvious in the > kernel configuration regarding FP hardware. I see. AFAIK the OS-disabling-FP-hardware thing is Windows-only. So the question becomes: do we (Mozilla, Adobe) care about this HP iPAQ hx4700 device?
(In reply to comment #9) > So the question becomes: do we (Mozilla, Adobe) care about this HP iPAQ hx4700 > device? Or more generally, do you care about devices without FP hardware.
Adobe still cares about devices without FP hardware. In 2009 we were still dealing with Arm-11 devices where the device manufacturer would disable the VFP to conserve power. It is quite possible that we're still dealing with low-end devices (e.g, settop boxes, feature phones) where an FPU is not available. I'll look at getting a more definitive answer from product mgmt.
I have solved my immediate problem by building fennec without nanojit support; it no longer crashes. I cannot tell what the performance impact has been.
(In reply to comment #11) > Adobe still cares about devices without FP hardware. It'll fall on someone from Adobe to fix this bug, then. Thanks for the info.
(In reply to comment #4) > > Hmm, should we be trying to allocate FpRegs when using SoftFloat? I think > Ed Smith knows more about SoftFloat than I do, but he's away this week. > > Bug 543636 is also relevant here. We do care about SoftFloat, yes, when used from TR. I would expect in soft-float mode, that no FP-reg requiring instructions are emitted by the LIR frontend, and no FP registers would be marked available or free by the ARM backend. A few more asserts in Assembler::gen(), or in ValidateWriter, might catch the problem earlier if this is truly a softfloat problem. Since the bug came up with TM, there isn't clearly something fixable here for TR.
(In reply to comment #14) > > Since the bug came up with TM, there isn't clearly something fixable here for > TR. AFAICT all the SoftFloat code is in NJ itself. In particular, SoftFloatFilter, which introduced the SoftFloat opcodes. So if you do care about SoftFloat I think you shouldn't ignore this problem, because it's likely to be a problem for TR as well. Bug 543636 said something about Adobe getting buildbot testing of SoftFloat up. Has that happened?
Status: UNCONFIRMED → NEW
Ever confirmed: true
OS: Linux → Windows CE
Priority: -- → P3
Hardware: Other → ARM
Target Milestone: --- → flash10.2
OS: Windows CE → Linux
Product: Core → Core Graveyard
Nanojit has been dead for several years. Its Bugzilla component has been moved to the graveyard (bug 984276). I checked all the open bugs. They're all uninteresting, so I'm WONTFIXing them all. Apologies for the bugspam.
Status: NEW → RESOLVED
Closed: 11 years ago
Resolution: --- → WONTFIX
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Created:
Updated:
Size: