Closed Bug 124335 Opened 23 years ago Closed 23 years ago

Lazy-state bitpacking in nsXULElement.h assumes 8-byte aligned pointers

Categories

(Core :: XUL, defect)

defect
Not set
blocker

Tracking

()

RESOLVED FIXED
mozilla1.0

People

(Reporter: sfraser_bugs, Assigned: shaver)

References

Details

(Keywords: crash)

Attachments

(2 files, 1 obsolete file)

We have some bad memory accesses in XUL/XBL code that cause Mozilla to die on startup. I can repro this using both CFM and Mach-O builds. To repro: Fire up MallocDebug on OS X (which comes with the developer tools). Choose File->New Window, and pick the Mozilla app (either CFM or Mach-O), and Launch. Mozilla will crash: Macho: Command: mozilla-bin PID: 403 Exception: EXC_BAD_ACCESS (0x0001) Codes: KERN_INVALID_ADDRESS (0x0001) at 0xfffffffc Thread 0 Crashed: #0 0x003a1854 in nsCOMPtr_base::~nsCOMPtr_base(void) #1 0x01351580 in nsClassList::~nsClassList(void) #2 0x0134ff60 in nsClassList::ParseClasses(nsClassList **, nsAString const &) #3 0x013578b4 in nsXULElement::SetAttr(nsINodeInfo *, nsAString const &, int) #4 0x013499bc in nsXMLContentSink::AddAttributes(unsigned short const **, nsIContent *, int) #5 0x013475a0 in nsXMLContentSink::HandleStartElement(unsigned short const *, unsigned short const **, unsigned int, unsigned int, unsigned int) #6 0x013b25f0 in nsXBLContentSink::HandleStartElement(unsigned short const *, unsigned short const **, unsigned int, unsigned int, unsigned int) #7 0x005cede0 in nsExpatDriver::HandleStartElement(unsigned short const *, unsigned short const **) #8 0x005ce614 in Driver_HandleStartElement(void *, char const *, char const **) #9 0x005f2210 in doContent #10 0x005f18bc in contentProcessor #11 0x005f3148 in cdataSectionProcessor #12 0x005f13a0 in XML_Parse #13 0x005cff98 in nsExpatDriver::ParseBuffer(char const *, unsigned int, int) #14 0x005d0250 in nsExpatDriver::ConsumeToken(nsScanner &, int &) #15 0x005e7cec in nsParser::Tokenize(int) #16 0x005e5488 in nsParser::ResumeParse(int, int, int) #17 0x005e7970 in nsParser::OnDataAvailable(nsIRequest *, nsISupports *, nsIInputStream *, unsigned int, unsigned int) #18 0x013c14c0 in nsXBLStreamListener::OnDataAvailable(nsIRequest *, nsISupports *, nsIInputStream *, unsigned int, unsigned int) #19 0x00a64940 in nsJARChannel::OnDataAvailable(nsIRequest *, nsISupports *, nsIInputStream *, unsigned int, unsigned int) #20 0x00a21b0c in nsOnDataAvailableEvent::HandleEvent(void) #21 0x00a125c4 in nsARequestObserverEvent::HandlePLEvent(PLEvent *) #22 0x00360fe4 in PL_HandleEvent #23 0x00360ee0 in PL_ProcessPendingEvents #24 0x00361e74 in nsEventQueueImpl::ProcessPendingEvents(void) #25 0x00743394 in nsMacNSPREventQueueHandler::ProcessPLEventQueue(void) #26 0x00749928 in Repeater::DoRepeaters(EventRecord const &) #27 0x00731920 in nsMacMessagePump::DispatchEvent(int, EventRecord *) #28 0x00726894 in nsAppShell::DispatchNativeEvent(int, void *) #29 0x00684100 in nsXULWindow::ShowModal(void) #30 0x0068202c in nsContentTreeOwner::ShowAsModal(void) #31 0x0047c088 in nsWindowWatcher::OpenWindowJS(nsIDOMWindow *, char const *, char const *, char const *, int, unsigned int, long *, nsIDOMWindow **) #32 0x0047b36c in nsWindowWatcher::OpenWindow(nsIDOMWindow *, char const *, char const *, char const *, nsISupports *, nsIDOMWindow **) #33 0x008fd898 in nsProfile::LoadDefaultProfileDir(nsCString &, int) #34 0x008fcc94 in nsProfile::StartupWithArgs(nsICmdLineService *, int) #35 0x0068a364 in nsAppShellService::DoProfileStartup(nsICmdLineService *, int) #36 0x00005228 in InitializeProfileService(nsICmdLineService *) #37 0x000061b0 in main1(int, char **, nsISupports *) #38 0x0000694c in main #39 0x00002910 in _start #40 0x00002740 in start CFM: Exception: EXC_BAD_INSTRUCTION (0x0002) Code[0]: 0x00000002 Code[1]: 0x035699d4 Thread 0 Crashed: #0 0x035699d4 in 0x35699d4 #1 0xbfffec58 in 0xbfffec58 #2 0x033bf7cc in nsXULElement::_dt(void) #3 0x033bfc04 in nsXULElement::Release(void) #4 0x006b42b4 in nsCOMPtr_base::_dt(void) #5 0x03398428 in nsXULDocument::OverlayForwardReference::_dt( (void)) #6 0x0338860c in nsXULDocument::ResolveForwardReferences(void) #7 0x0339541c in nsXULDocument::ResumeWalk(void) #8 0x03385498 in nsXULDocument::OnPrototypeLoadDone(void) #9 0x0339faa8 in nsXULPrototypeDocument::NotifyLoadDone(void) #10 0x033852e4 in nsXULDocument::EndLoad(void) #11 0x03378ce8 in XULContentSinkImpl::DidBuildModel(int) #12 0x02dd6af4 in nsExpatDriver::DidBuildModel(unsigned int, int, nsIParser *, nsIContentSink *) #13 0x02dbbea0 in nsParser::DidBuildModel(unsigned int) #14 0x02dbd294 in nsParser::ResumeParse(int, int, int) #15 0x02dbfae8 in nsParser::OnStopRequest(nsIRequest *, nsISupports *, unsigned int) #16 0x02bec854 in nsJARChannel::OnStopRequest(nsIRequest *, nsISupports *, unsigned int) #17 0x02c0646c in nsOnStopRequestEvent::HandleEvent(void) #18 0x02c05880 in nsARequestObserverEvent::HandlePLEvent(PLEvent *) #19 0x0068afe0 in PL_HandleEvent #20 0x0068ae4c in PL_ProcessPendingEvents #21 0x006303fc in nsEventQueueImpl::ProcessPendingEvents(void) #22 0x02e5599c in nsMacNSPREventQueueHandler::ProcessPLEventQueue(void) #23 0x02e55760 in nsMacNSPREventQueueHandler::RepeatAction(EventRecord const &) #24 0x02ea7b14 in Repeater::DoRepeaters(EventRecord const &) #25 0x02e69498 in nsMacMessagePump::DispatchEvent(int, EventRecord *) #26 0x02e69070 in nsMacMessagePump::DoMessagePump(void) #27 0x02e689ac in nsAppShell::Run(void) #28 0x02e1bc3c in nsAppShellService::Run(void) #29 0x00558c0c in main1(int, char **, nsISupports *) #30 0x005596dc in main Macho build is from today, CFM build from yesterday.
Severity: normal → major
Keywords: crash
Another stack (Mach-O debug build) Cannot access memory at address 0x55555554 #1 0x012d9090 in nsVoidArray::InsertElementAt (this=0x6e2da0, aElement=0x117327f0, aIndex=0) at nsVoidArray.cpp:408 #2 0x085cc4ec in nsXULAttributes::AppendElement (this=0x6e2d88, aElement=0x6e2da0) at nsXULAttributes.h:179 #3 0x085d6a78 in nsXULElement::SetAttr (this=0x11e5a84c, aNodeInfo=0x11e55890, aValue=@0xbfffd778, aNotify=0) at nsXULElement.cpp:2625 #4 0x085d70b8 in nsXULElement::SetAttr (this=0x11e5a84c, aNameSpaceID=0, aName=0xa0b755c, aValue=@0xbfffd778, aNotify=0) at nsXULElement.cpp:2695 #5 0x101131e0 in nsDocElementBoxFrame::CreateAnonymousContent (this=0x11e5e4b4, aPresContext=0x114e55e0, aAnonymousItems=@0x11e5aa04) at nsDocElementBoxFrame.cpp:148 #6 0x10082230 in nsCSSFrameConstructor::CreateAnonymousFrames (this=0x116b9174, aPresShell=0xcc02004, aPresContext=0x114e55e0, aState=@0xbfffdd4c, aParent=0x11dcbef8, aDocument=0xe94f004, aParentFrame=0x11e5e4b4, aChildItems=@0xbfffdb60) at nsCSSFrameConstructor.cpp:4886 #7 0x10082100 in nsCSSFrameConstructor::CreateAnonymousFrames (this=0x116b9174, aPresShell=0xcc02004, aPresContext=0x114e55e0, aTag=0x0, aState=@0xbfffdd4c, aParent=0x11dcbef8, aNewFrame=0x11e5e4b4, aChildItems=@0xbfffdb60, aIsRoot=1) at nsCSSFrameConstructor.cpp:4865 #8 0x1007d268 in nsCSSFrameConstructor::ConstructDocElementFrame (this=0x116b9174, aPresShell=0xcc02004, aPresContext=0x114e55e0, aState=@0xbfffdd4c, aDocElement=0x11dcbef8, aParentFrame=0x11e5e16c, aParentStyleContext=0x11e5e23c, aNewFrame=@0xbfffde14) at nsCSSFrameConstructor.cpp:3237 #9 0x1008d208 in nsCSSFrameConstructor::ContentInserted (this=0x116b9174, aPresContext=0x114e55e0, aContainer=0x0, aChild=0x11dcbef8, aIndexInContainer=0, aFrameState=0x0, aInContentReplaced=0) at nsCSSFrameConstructor.cpp:8549 #10 0x087652e8 in StyleSetImpl::ContentInserted (this=0x11fca4c, aPresContext=0x114e55e0, aContainer=0x0, aChild=0x11dcbef8, aIndexInContainer=0) at nsStyleSet.cpp:1445 #11 0x0ffe0798 in PresShell::InitialReflow (this=0xcc02004, aWidth=17, aHeight=17) at nsPresShell.cpp:2630 #12 0x08605b08 in nsXULDocument::StartLayout (this=0xe94f004) at nsXULDocument.cpp:4406 #13 0x0860d200 in nsXULDocument::ResumeWalk (this=0xe94f004) at nsXULDocument.cpp:5941 #14 0x085faafc in nsXULDocument::EndLoad (this=0xe94f004) at nsXULDocument.cpp:1679 #15 0x085ec8c8 in XULContentSinkImpl::DidBuildModel (this=0x11d6481c, aQualityLevel=0) at nsXULContentSink.cpp:536 #16 0x0223346c in nsExpatDriver::DidBuildModel (this=0x11e1e358, anErrorCode=0, aNotifySink=1, aParser=0x11dbcaf8, aSink=0x11d6481c) at nsExpatDriver.cpp:841 #17 0x02254140 in nsParser::DidBuildModel (this=0x11dbcaf8, anErrorCode=0) at nsParser.cpp:1385 #18 0x022553d0 in nsParser::ResumeParse (this=0x11dbcaf8, allowIteration=1, aIsFinalChunk=1, aCanInterrupt=1) at nsParser.cpp:1892 #19 0x0225717c in nsParser::OnStopRequest (this=0x11dbcaf8, request=0x11d82894, aContext=0x0, status=0) at nsParser.cpp:2516 #20 0x04b31a84 in nsJARChannel::OnStopRequest (this=0x11d82894, jarExtractionTransport=0x11d7a098, context=0x0, aStatus=0) at nsJARChannel.cpp:611 #21 0x04a9eb00 in nsOnStopRequestEvent::HandleEvent (this=0x11e3b450) at nsRequestObserverProxy.cpp:212 #22 0x04a9d524 in nsARequestObserverEvent::HandlePLEvent (plev=0x11e3b450) at nsRequestObserverProxy.cpp:115 #23 0x0133d958 in PL_HandleEvent (self=0x11e3b450) at plevent.c:590 #24 0x0133d6d8 in PL_ProcessPendingEvents (self=0xa44a548) at plevent.c:520 #25 0x0133fd08 in nsEventQueueImpl::ProcessPendingEvents (this=0xa44aaa4) at nsEventQueue.cpp:388 #26 0x03310370 in nsMacNSPREventQueueHandler::ProcessPLEventQueue (this=0x57b1c0) at nsToolkit.cpp:149 #27 0x03310130 in nsMacNSPREventQueueHandler::RepeatAction (this=0x57b1c0, inMacEvent=@0x336de94) at nsToolkit.cpp:114 #28 0x0331ad54 in Repeater::DoRepeaters (aMacEvent=@0x336de94) at nsRepeater.cpp:136 #29 0x032f218c in nsMacMessagePump::DispatchEvent (this=0x11ff43c, aRealEvent=0, anEvent=0x336de94) at nsMacMessagePump.cpp:489 #30 0x032dd680 in nsAppShell::DispatchNativeEvent (this=0x4fef3c, aRealEvent=0, aEvent=0x336de94) at nsAppShell.cpp:248 #31 0x02833374 in nsXULWindow::ShowModal (this=0xa3892dc) at nsXULWindow.cpp:281 #32 0x02844990 in nsWebShellWindow::ShowModal (this=0xa3892dc) at nsWebShellWindow.cpp:1090 #33 0x0282f740 in nsContentTreeOwner::ShowAsModal (this=0x6d6848) at nsContentTreeOwner.cpp:433 #34 0x00e494f4 in nsWindowWatcher::OpenWindowJS (this=0x57bb34, aParent=0x0, aUrl=0x120d424 "chrome://communicator/content/profile/profileSelection.xul", aName=0x40d1274 "_blank", aFeatures=0x40d0fc0 "centerscreen,chrome,modal,titlebar", aDialog=1, argc=1, argv=0x120b9e4, _retval=0xbffff05c) at nsWindowWatcher.cpp:704 #35 0x00e47bcc in nsWindowWatcher::OpenWindow (this=0x57bb34, aParent=0x0, aUrl=0x120d424 "chrome://communicator/content/profile/profileSelection.xul", aName=0x40d1274 "_blank", aFeatures=0x40d0fc0 "centerscreen,chrome,modal,titlebar", aArguments=0x121be08, _retval=0xbffff05c) at nsWindowWatcher.cpp:451 #36 0x040b9108 in nsProfile::LoadDefaultProfileDir (this=0x58b254, profileURLStr=@0xbffff170, canInteract=1) at nsProfile.cpp:574 #37 0x040b8240 in nsProfile::StartupWithArgs (this=0x58b254, cmdLineArgs=0x4e2254, canInteract=1) at nsProfile.cpp:418 #38 0x0283cffc in nsAppShellService::DoProfileStartup (this=0x4f5108, aCmdLineService=0x4e2254, canInteract=1) at nsAppShellService.cpp:243 #39 0x00005c90 in InitializeProfileService (cmdLineArgs=0x4e2254) at nsAppRunner.cpp:936 #40 0x00006f04 in main1 (argc=1, argv=0xbffff5b8, nativeApp=0x0) at nsAppRunner.cpp:1238 #41 0x00007ad8 in main (argc=1, argv=0xbffff5b8) at nsAppRunner.cpp:1625 #42 0x00002a1c in _start () #43 0x0000284c in start ()
Hmm, some code in nsXULElement.h is doing some evil bit-stuffing: #define LAZYSTATE_MASK ((PRWord(1) << LAZYSTATE_BITS) - 1) #define ATTRIBUTES_MASK (~LAZYSTATE_MASK) nsXULAttributes * GetAttributes() const { return NS_REINTERPRET_CAST(nsXULAttributes *, mBits & ATTRIBUTES_MASK); } void SetAttributes(nsXULAttributes *aAttributes) { NS_ASSERTION((NS_REINTERPRET_CAST(PRWord, aAttributes) & ~ATTRIBUTES_MASK) == 0, "nsXULAttributes pointer is unaligned"); mBits &= ~ATTRIBUTES_MASK; mBits |= NS_REINTERPRET_CAST(PRWord, aAttributes); } and indeed before we crash, i see: ###!!! ASSERTION: nsXULAttributes pointer is unaligned: '(NS_REINTERPRET_CAST(PRWord, aAttributes) & ~ATTRIBUTES_MASK) == 0', file nsXULElement.h, line 541 ###!!! Break: at file nsXULElement.h, line 541 Maybe malloc_debug stuff gives back unaligned pointers. Yuck.
So the problem here is that nsresult nsXULAttributes::Create(nsIContent* aContent, nsXULAttributes** aResult) { ... *aResult = new nsXULAttributes(aContent))) is giving us back a pointer that is 4- but not 8-byte aligned, so the bit- stuffing in Slots::Get/SetAttributes does not work. Assuming 8-byte alignment sounds dangerous. Waterson did this for bug 105068.
Summary: Crash on startup with memory checking tool → Lazy-state bitpacking in nsXULElement.h assumes 8-byte aligned pointers
Shaver wants this.
Assignee: hyatt → shaver
I mentioned to Simon in his cube that nsFixedSizeAllocator might help here, for both footprint reasons (no malloc overhead, recycling wins) and because it can guarantee alignment sufficient for the tagging scheme waterson used. Shaver, you planning to nsFSA this? /be
1.0: it'll make us faster and less crashy. (Yeah, nsFSA or something like it.)
Status: NEW → ASSIGNED
Keywords: mozilla1.0, perf
Target Milestone: --- → mozilla1.0
Shouldn't this be a 0.9.9 blocker? Win-debug doesn't even startup without smfrs patch
It what? I'm a bit (OK, a lot) surprised, because I would expect that to stop our developers in their tracks pretty cold. I'll get right on it, though.
sicking, are you using some "byte-align malloc'd memory" option? /be
Nope, i don't have anything extrodinary set up in my environment. Just turn off mailnews and a few more and turn on svg. Opt build runs fine
I am suffering from this I have only svg and mathml enabled otherwise a normal build (win98).
Attached patch Use nsFixedSizeAllocator for nsXULAttributes. (obsolete) (deleted) — Splinter Review
This builds in content/xul/content/src, but the rest of my build is still running, so I haven't tested it beyond that yet. It passed a desk-check, though!
*** Bug 134362 has been marked as a duplicate of this bug. ***
Marking blocker, this is affecting developers and testers. Will pass the patch along to those affected to see if it helps.
Severity: major → blocker
The patch is not compiling for us, it appears to be missing operator delete: b\contentshared_s.lib +++ make: libs in c:\build\tree\imicqtree\mozilla\content\build +++ make: Creating DLL: .\WIN32_D.OBJ\gkcontent.dll Creating library .\WIN32_D.OBJ\gkcontent.lib and object .\WIN32_D.OBJ\gkconte nt.exp contentxulcontent_s.lib(nsXULAttributes.obj) : error LNK2001: unresolved externa l symbol "private: static void __cdecl nsXULAttributes::operator delete(void *,u nsigned int)" (??3nsXULAttributes@@CAXPAXI@Z) .\WIN32_D.OBJ\gkcontent.dll : fatal error LNK1120: 1 unresolved externals NMAKE : fatal error U1077: 'link' : return code '0x460' Stop. Notice the declaration that was added to the header: +private: + // Hide so that all construction and destruction use Create and Destroy. + static void *operator new(size_t); + static void operator delete(void *, size_t); We tried clobbering just to make sure, and run into this problem regardless. Thanks, syd
Yeah, smfr found that too, and I fixed it in my tree. I thought I updated the patch here, but apparently didn't, and I'm not at the machine with that tree right now. Just give them empty bodies in the declaration, and you should be fine: static void *operator new(size_t) { } static void operator delete(void *, size_t) { }
It looks like it works for me (after fixing the patch) I was able to start on win98 I did get an assertion Write me!: '0' file y:\mozilla\layout\html\base\src\nsScrollFrame.cpp 286 but it appears unrelated.
Re: comment 3... Color me puzzled, but I thought malloc was guaranteed to return something that was aligned to the largest primitive type (i.e., a double-word); e.g., so that you'd always have aligned access for floating point: double *p = (double *) malloc(sizeof(double)); *p = 3.14159; /* aligned access */ If malloc's behavior is not defined this way, I may have made this mistake elsewhere, too. :-(
but a double-word is 4 bytes, and the current code assumes 8-byte alignment. 4 byte alignment i've been told is safe, and while debugging this crash i always got 4-byte aligned mem-blocks.
ooh fun reading. well, it looks like the ghostscript people ran into something like this in January http://www.ghostscript.com/pipermail/gs-code-review/2002-January/001729.html Here's a quote from somewhere in the thread... The problem fundamentally is that there are two different kinds of pointer alignment mod N: (A) A pointer references an address that is 0 mod N. (B) A pointer references an address where the hardware can successfully make an N-byte reference. malloc, on all platforms, return blocks that obey (B) for the strictest alignment required by the hardware. On the x86, type (B) alignment is never required, so malloc could return totally unaligned blocks and still obey (Microsoft's interpretation of) the ANSI C standard. Also of interest: http://www.byte.com/art/9603/sec2/art5.htm
wouldn't it be neater to handle the gRefCnt increasing/decreasing in CreateAttrAllocator/DestroyAttrAllocator and call them AddrefAttrAllocator/ReleaseAttrAllocator? That way you could also get rid of the goto...
marking all/all because there's a w98 dupe...
OS: Mac System 9.x → All
Hardware: Macintosh → All
The patch + fixes from commment #17 fix this for me in Win98. I'd vote for checking it in.
Attachment #73536 - Attachment is obsolete: true
Patch works for me too, fixes crasher.
Comment on attachment 80458 [details] [diff] [review] shaver's patch with additions from comment 17 Doesn't delete null its lvalue operand? If not, how unsafe of C++! Someone inform ignorant old me. Who includes <new.h>? I don't see it, maybe there is a nested include -- but I say make a direct dependency from the .cpp that uses placement new. Prevailing ! style has a space after the unary operator, no? r=brendan@mozilla.org with any real nits picked. /be
Attachment #80458 - Flags: review+
Attachment #80458 - Flags: superreview+
Comment on attachment 80458 [details] [diff] [review] shaver's patch with additions from comment 17 sr=waterson
Depends on: 105068
shaver, you gonna check this in or what?
Damn, I missed your sr in my bugmail. My bad, I'll hit the trunk first thing tomorrow.
Landed on the trunk (finally).
Status: ASSIGNED → RESOLVED
Closed: 23 years ago
Resolution: --- → FIXED
My checkin busted Forte or some such Sun compiler, because it doesn't like |operator new| not returning anything. The fix will cause a pile of warnings with gcc and other platforms, which don't like returning NULL (they want me to throw an exception, but at least it's only a warning). Now you all know.
Comment on attachment 80458 [details] [diff] [review] shaver's patch with additions from comment 17 a=rjesup@wgate.com for branch checkin. Please leave this bug open or open a followon and resolve the warnings on the trunk (possibly on the branch too if we decide to care).
Attachment #80458 - Flags: approval+
Checked into the 1.0 branch.
Keywords: perffixed1.0.0
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Created:
Updated:
Size: