Closed Bug 8150 Opened 25 years ago Closed 25 years ago

top talkback m6: was raptorhtml.dll crash; now NS_NewConverterStream sometimes fails on Win95

Categories

(Core :: Internationalization, defect, P3)

x86
Windows 95
defect

Tracking

()

VERIFIED FIXED

People

(Reporter: chofmann, Assigned: dp)

References

Details

Attachments

(1 file)

This is a spin-off from bug 7802, which describes several crash-on-startup problems seen by many users running M6 outside Netscape. A few people inside Netscape have also seen this crash. Refer to 7802 for a complete listing. This is the number one problem reported by the 700 unique users reporting crashes on M6, and it is hindering our ability to see accurate MTBF numbers. I'm concerned that if we ship M7 with this crash, we risk several testers giving up...

------- Additional Comments From namachi@netscape.com 06/11/99 12:17 -------

Call Stack: (Signature = nsHTMLReflowState::ComputeContainingBlockRectangle ae968a8c)
  nsHTMLReflowState::ComputeContainingBlockRectangle [d:\builds\seamonkey\mozilla\layout\html\base\src\nsHTMLReflowState.cpp, line 693]
  nsHTMLReflowState::InitConstraints [d:\builds\seamonkey\mozilla\layout\html\base\src\nsHTMLReflowState.cpp, line 769]
  nsHTMLReflowState::Init [d:\builds\seamonkey\mozilla\layout\html\base\src\nsHTMLReflowState.cpp, line 146]
  nsHTMLReflowState::nsHTMLReflowState [d:\builds\seamonkey\mozilla\layout\html\base\src\nsHTMLReflowState.cpp, line 129]
  ViewportFrame::Reflow [d:\builds\seamonkey\mozilla\layout\html\base\src\nsViewportFrame.cpp, line 433]
  PresShell::InitialReflow [d:\builds\seamonkey\mozilla\layout\html\base\src\nsPresShell.cpp, line 889]
  XULDocumentImpl::StartLayout [d:\builds\seamonkey\mozilla\rdf\content\src\nsXULDocument.cpp, line 3931]
  XULDocumentImpl::EndLoad [d:\builds\seamonkey\mozilla\rdf\content\src\nsXULDocument.cpp, line 1831]
  CWellFormedDTD::DidBuildModel [d:\builds\seamonkey\mozilla\htmlparser\src\nsWellFormedDTD.cpp, line 309]
  nsParser::DidBuildModel [d:\builds\seamonkey\mozilla\htmlparser\src\nsParser.cpp, line 512]
  nsParser::ResumeParse [d:\builds\seamonkey\mozilla\htmlparser\src\nsParser.cpp, line 867]
  nsParser::EnableParser [d:\builds\seamonkey\mozilla\htmlparser\src\nsParser.cpp, line 587]
  CSSLoaderImpl::Cleanup [d:\builds\seamonkey\mozilla\layout\html\style\src\nsCSSLoader.cpp, line 595]
  CSSLoaderImpl::SheetComplete [d:\builds\seamonkey\mozilla\layout\html\style\src\nsCSSLoader.cpp, line 665]
  CSSLoaderImpl::ParseSheet [d:\builds\seamonkey\mozilla\layout\html\style\src\nsCSSLoader.cpp, line 697]
  CSSLoaderImpl::DidLoadStyle [d:\builds\seamonkey\mozilla\layout\html\style\src\nsCSSLoader.cpp, line 727]
  DoneLoadingStyle [d:\builds\seamonkey\mozilla\layout\html\style\src\nsCSSLoader.cpp, line 537]
  nsUnicharStreamLoader::OnStopBinding [d:\builds\seamonkey\mozilla\network\module\nsNetStreamLoader.cpp, line 158]
  nsDocumentBindInfo::OnStopBinding [d:\builds\seamonkey\mozilla\webshell\src\nsDocLoader.cpp, line 1531]
  OnStopBindingProxyEvent::HandleEvent [d:\builds\seamonkey\mozilla\network\module\nsNetThread.cpp, line 594]
  StreamListenerProxyEvent::HandlePLEvent [d:\builds\seamonkey\mozilla\network\module\nsNetThread.cpp, line 474]
  PL_HandleEvent [plevent.c, line 492]
  PL_ProcessPendingEvents [plevent.c, line 453]
  _md_EventReceiverProc [plevent.c, line 872]
  KERNEL32.DLL + 0x3663 (0xbff73663)
  KERNEL32.DLL + 0x228e0 (0xbff928e0)
  0x00768c14

------- Additional Comments From chofmann@netscape.com 06/11/99 13:12 -------

So it looks like we head into this code and crash under some kind of condition... The question is, had the train already left the tracks?

  683 troy 1.46 // Called by InitConstraints() to compute the containing block rectangle for
  684 // the element. Handles the special logic for absolutely positioned elements
  685 void
  686 nsHTMLReflowState::ComputeContainingBlockRectangle(const nsHTMLReflowState* aContainingBlockRS,
  687 nscoord& aContainingBlockWidth,
  688 nscoord& aContainingBlockHeight)
  689 {
  690 // Unless the element is absolutely positioned, the containing block is
  691 // formed by the content edge of the nearest block-level ancestor
  692 aContainingBlockWidth = aContainingBlockRS->computedWidth;
  693 aContainingBlockHeight = aContainingBlockRS->computedHeight;
  694
  695 if (NS_FRAME_GET_TYPE(frameType) == NS_CSS_FRAME_TYPE_ABSOLUTE) {
  696 // See if the ancestor is block-level or inline-level
  697 if (NS_FRAME_GET_TYPE(aContainingBlockRS->frameType) == NS_CSS_FRAME_TYPE_INLINE) {
  698 // The CSS2 spec says that if the ancestor is inline-level, the containing
  699 // block depends on the 'direction' property of the ancestor. For direction
  700 // 'ltr', it's the top and left of the content edges of the first box and
  701 // the bottom and right content edges of the last box
  702 //
  703 // XXX This is a pain because it isn't top-down and it requires that we've
  704 troy 1.46 // completely reflowed the ancestor. It also isn't clear what happens when
  705 // a relatively positioned ancestor is split across pages. So instead use
  706 // the computed width and height of the nearest block-level ancestor
  707 const nsHTMLReflowState* cbrs = aContainingBlockRS;
  708 while (cbrs) {
  709 nsCSSFrameType type = NS_FRAME_GET_TYPE(cbrs->frameType);
  710 if ((NS_CSS_FRAME_TYPE_BLOCK == type) ||
  711 (NS_CSS_FRAME_TYPE_FLOATING == type) ||
  712 (NS_CSS_FRAME_TYPE_ABSOLUTE == type)) {
  713
Target Milestone: M7
If this bug is as important as the description implies, why is it Priority P3 and Severity Normal?
Severity: normal → blocker
Blocks: 7919
I've been running this on NT tonight, and can't seem to get it to crash. The bug cites win98 as the target OS, but I don't have a 98 machine. I've traced the code in question, and my current guess is that one of the container frames has a non-zero (garbage) value for its reflow state. If that's true and we're messaging it, it could easily explode. I'm wondering if the XUL guys can confirm that the frames that get constructed have their reflow state initialized properly.
adding hyatt to cc list for more eyes
I've got a couple of Win98 boxes. I'll see what's up.
It's worth noting that I've built on Win98 for months, and I've never seen this crash before. It might only happen with optimized builds?
I have been seeing an assertion thrown quite regularly on viewer startup that happens in InitConstraints. This has been going on for a while now on Win98 only.
Yay! The assertion I've been seeing in VIEWER (note that I'm saying VIEWER and not APPRUNNER) leads to the same crash if I keep going. Here is the stack trace in viewer.

  nsHTMLReflowState::ComputeContainingBlockRectangle(const nsHTMLReflowState * 0x00000000, int & 1, int & 10483000) line 692 + 6 bytes
  nsHTMLReflowState::InitConstraints(nsIPresContext & {...}) line 769
  nsHTMLReflowState::Init(nsIPresContext & {...}) line 146
  nsHTMLReflowState::nsHTMLReflowState(nsIPresContext & {...}, const nsHTMLReflowState & {...}, nsIFrame * 0x01b076e0, const nsSize & {...}) line 129
  ViewportFrame::Reflow(ViewportFrame * const 0x01b065e4, nsIPresContext & {...}, nsHTMLReflowMetrics & {...}, const nsHTMLReflowState & {...}, unsigned int & 0) line 433
  PresShell::InitialReflow(PresShell * const 0x01ae6130, int 9120, int 4410) line 894
  HTMLContentSink::StartLayout() line 2019
  HTMLContentSink::OpenBody(HTMLContentSink * const 0x00da7570, const nsIParserNode & {...}) line 1772
  CNavDTD::OpenBody(const nsIParserNode & {...}) line 2381 + 40 bytes
  CNavDTD::OpenContainer(const nsIParserNode & {...}, int 1) line 2547 + 12 bytes
  CNavDTD::HandleDefaultStartToken(CToken * 0x01aeb6c0, nsHTMLTag eHTMLTag_body, nsIParserNode & {...}) line 1094 + 14 bytes
  CNavDTD::HandleStartToken(CToken * 0x01aeb6c0) line 1411 + 31 bytes
  NavDispatchTokenHandler(CToken * 0x01aeb6c0, nsIDTD * 0x01ae8a60) line 249 + 12 bytes
  CTokenHandler::operator()(CToken * 0x01aeb6c0, nsIDTD * 0x01ae8a60) line 80 + 14 bytes
  CNavDTD::HandleToken(CNavDTD * const 0x01ae8a60, CToken * 0x01aeb6c0, nsIParser * 0x00da76d0) line 691 + 18 bytes
  CNavDTD::BuildModel(CNavDTD * const 0x01ae8a60, nsIParser * 0x00da76d0, nsITokenizer * 0x01ae80a0, nsITokenObserver * 0x00000000, nsIContentSink * 0x00da7570) line 522 + 20 bytes
  nsParser::BuildModel() line 902 + 34 bytes
  nsParser::ResumeParse(nsIDTD * 0x00000000) line 849 + 11 bytes
  nsParser::OnDataAvailable(nsParser * const 0x00da76d4, nsIURL * 0x00dbcc50, nsIInputStream * 0x00dbc260, unsigned int 5978) line 1071 + 17 bytes
  nsDocumentBindInfo::OnDataAvailable(nsDocumentBindInfo * const 0x00dbce40, nsIURL * 0x00dbcc50, nsIInputStream * 0x00dbc260, unsigned int 5978) line 1504 + 24 bytes
  OnDataAvailableProxyEvent::HandleEvent(OnDataAvailableProxyEvent * const 0x00dbdd00) line 634
  StreamListenerProxyEvent::HandlePLEvent(PLEvent * 0x00dbdd04) line 473 + 12 bytes
  PL_HandleEvent(PLEvent * 0x00dbdd04) line 491 + 10 bytes
  PL_ProcessPendingEvents(PLEventQueue * 0x00d43e90) line 452 + 9 bytes
  _md_EventReceiverProc(HWND__ * 0x00000f20, unsigned int 55404, unsigned int 0, long 13909648) line 877 + 9 bytes
  KERNEL32! bff7363b()
  KERNEL32! bff942e7()

I will follow this up with the assertion that's getting hit in InitConstraints, since that might help rickg et al. diagnose what's going wrong. Note that my viewer just crashes randomly on Win98 with this bug. It happens 1 out of every 5 times or so.
I hit an assertion on line 761 of nsHTMLReflowState.cpp. The containing block is null. NS_ASSERTION(nsnull != cbrs, "no containing block"); It's only after I keep going past this assertion that I crash. The containing block gets dereferenced even though it's null, and then I crash.
Ok, so here's what's going down. In the InitialReflow of a document, we have this ViewportFrame. We start looking at child frames. The first child frame is a ScrollFrame. When we first initialize this child's reflow state object, that object calls its Init method. That Init method then tries to compute the nearest enclosing containing block (basically it's looking for a parent frame with a display type of block). It does this search by crawling up the reflow state stack and looking at the frame stored in each reflow state object. It ends up looking at the ViewportFrame. Now normally when the mDisplay variable of the mStyleContext for that frame is examined, it has a display type of BLOCK (represented with a numeric value of 1). However, about 1 out of every 5 times I run viewer, the outermost ViewportFrame instead has a style context whose mDisplay has a value of 2, indicating an INLINE rather than a BLOCK display type. The containing block search then basically fails, since it crawls all the way up the reflow state stack without finding a containing block. Then you hit the assertion warning you about the fact that no containing block was found, and if you keep going, you crash, since the following code assumes a containing block was found and tries to dereference it. I don't know yet why the display type is sometimes 2 instead of 1, but that's what's happening, folks.
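The search described above can be sketched in a few lines of C++. This is only an illustration under stated assumptions: `ReflowState`, `DisplayType`, `FindContainingBlock`, and `ComputeContainingBlockSize` are hypothetical, simplified stand-ins, not the real layout classes, and the null check shown is one possible guard, not necessarily the workaround that was checked in.

```cpp
#include <cassert>

// Hypothetical, simplified stand-ins for the real layout types.
// The numeric values mirror the ones quoted in the comment above.
enum class DisplayType { Block = 1, Inline = 2 };

struct ReflowState {
  const ReflowState* parent;  // next entry up the reflow state stack
  DisplayType display;        // display type of the frame for this entry
  int computedWidth;
  int computedHeight;
};

// Crawl up the reflow state stack and return the nearest block-level
// ancestor, or nullptr if the walk runs off the top without finding one.
const ReflowState* FindContainingBlock(const ReflowState* rs) {
  for (const ReflowState* p = rs ? rs->parent : nullptr; p; p = p->parent) {
    if (p->display == DisplayType::Block) {
      return p;
    }
  }
  return nullptr;
}

// Report failure instead of dereferencing a null containing block.
bool ComputeContainingBlockSize(const ReflowState* rs, int& width, int& height) {
  const ReflowState* cb = FindContainingBlock(rs);
  if (!cb) {
    width = 0;
    height = 0;
    return false;
  }
  width = cb->computedWidth;
  height = cb->computedHeight;
  return true;
}
```

With a block-level root the lookup succeeds; if the root is reported as inline, as in the failing runs, the function returns false rather than crashing.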
cc'ing Peter Linss.
Assignee: rickg → peterl
Given my trace and hyatt's comments, it may be something you know how to kill. Can you please take a look?
Of course the nasty part about this bug is that it does seem to occur only on Win98.
Whiteboard: top m6 talkback crasher
Status: NEW → ASSIGNED
This is caused by UA.css failing to load occasionally on Win95/98. The reason for that is that NS_NewConverterStream fails. I haven't looked too deeply into that, but it seems that maybe the component manager fails to load the converters. It could be related to threading/race issues in the component manager that we've seen before. I have a workaround ready to go that prevents the layout code from crashing when UA.css is absent.
Assignee: peterl → ftang
Status: ASSIGNED → NEW
Component: Layout → Internationalization
Summary: Crash on Startup in raptorhtml.dll -> nsHTMLReflowState -> ComputeContainingBlockRetangle → NS_NewConverterStream sometimes fails on Win95
Whiteboard: top m6 talkback crasher
Work-around to layout dependency on UA.css checked in. Now someone needs to fix the converter stream problem. Starting with intl folks.
Status: NEW → ASSIGNED
Is this still an M7 blocker after peterl checked in his workaround? (Does that mean it won't crash anymore?) Does anyone have a machine that can reproduce the problem? Do we know which part of NS_NewConverterStream failed?
I saw the problem in our IQA lab on Japanese Win98. It says it cannot load UA.css, error code 80040154.
Assignee: ftang → dp
Status: ASSIGNED → NEW
80040154 is

  #define NS_ERROR_FACTORY_NOT_REGISTERED ((nsresult) 0x80040154L)

I have checked inside the GetUnicodeConverter() code and there is no place where we could possibly return that error code. I am sure, by code review, that the failure is not inside GetUnicodeDecoder(). The only other place it could return this particular error code is in xpcom/io/nsUnicharInputStream.cpp:

  145 res = nsServiceManager::GetService(kCharsetConverterManagerCID,
  146 kICharsetConverterManagerIID, (nsISupports**)&ccm);

inside the NS_NewB2UConverter() implementation (line 131).

[ Note: the converter seems to load OK LATER when I load some Japanese HTML pages. That means we must be registering the CID of the converter manager correctly; otherwise, it wouldn't do the Japanese conversion later either. ]

Reassigning to dp since he owns nsServiceManager::GetService.
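As a small illustration of how the error code quoted above is classified: XPCOM result codes mark failure with the high (severity) bit, which is what the NS_FAILED macro tests. The sketch below is self-contained and uses hypothetical names (`Result`, `Failed`, `Classify`, `Kind`) rather than the real XPCOM headers; only the numeric value 0x80040154 comes from this bug.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical stand-ins; the real definitions live in the XPCOM headers.
using Result = std::uint32_t;
constexpr Result kOk = 0;
constexpr Result kErrorFactoryNotRegistered = 0x80040154u;  // value quoted in this bug

enum class Kind { Success, FactoryNotRegistered, OtherFailure };

// A result is a failure when its high (severity) bit is set.
constexpr bool Failed(Result r) { return (r & 0x80000000u) != 0; }

// Sort a result code into the cases that matter for this bug.
constexpr Kind Classify(Result r) {
  return !Failed(r) ? Kind::Success
       : r == kErrorFactoryNotRegistered ? Kind::FactoryNotRegistered
                                         : Kind::OtherFailure;
}
```

A caller that checks the result this way can fall back gracefully when the factory lookup fails, instead of continuing with a null stream.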
OK, I read the report. Is it apprunner too, or only viewer? I am assuming that someone saw this crash in the apprunner release build, and hyatt has gotten only viewer DEBUG to crash with this symptom. Peter, viewer shouldn't crash. I presume your workaround is error checking on the return value. I think your workaround should go in no matter what. Am I on the right track so far?

Next, someone with a debug build of apprunner/viewer on win98: could you do this:

  set NSPR_LOG_MODULES nsComponentManager:5
  set NSPR_LOG_FILE xpcom.log
  ./viewer (or) apprunner

Reproduce the bug and add the xpcom.log file as an attachment to the bug.
Status: NEW → ASSIGNED
The reason I am concerned is that I want to know if we are dealing with the same bug. viewer DEBUG crashing could be different from apprunner release crashing.
Summary: NS_NewConverterStream sometimes fails on Win95 → top talkback m6: was raptorhtml.dll crash; now NS_NewConverterStream sometimes fails on Win95
About 400 people saw this crash in M6 apprunner. It was the top win32 talkback crash reported.
This crash has been happening since M6 with release mode apprunner (via talkback). I've reproduced it in viewer under both Win95 and Win98 with current debug code, but not under NT. (It happens randomly, 1 in 5 times or so.) The crash no longer happens with my fix (the crash was layout code not handling the absence of UA.css), but when the failure happens, we still have no UA.css, which leads to exceptionally bad layout (for instance, everything is INLINE). I'll attach a log file shortly.
Attached file: Log file requested (deleted)
Depends on: 7308
I see the log. Thanks peterl for the super fast response. This is the registry/xpcom multithreading thing, as you suspected, peter: bug# 7308. From peter's log:

  0[10229e0]: nsComponentManager: CreateInstance({1e3f79f1-6b6b-11d2-8a86-00600811a836})
  0[10229e0]: nsComponentManager: FindFactory({1e3f79f1-6b6b-11d2-8a86-00600811a836})
  0[10229e0]: not found in factory cache. Looking in registry
  -429249[1052a90]: nsComponentManager: ProgIDToCLSID(application/x-unknown-content-type)->[FAILED]
  0[10229e0]: FindFactory() FAILED
  0[10229e0]: CreateInstance() FAILED.

Let me see what I can do about it in
I have provided a patch to peterl. Here is what the patch does:

Index: nsComponentManager.cpp
===================================================================
RCS file: /cvsroot/mozilla/xpcom/components/nsComponentManager.cpp,v
retrieving revision 1.35
diff -c -r1.35 nsComponentManager.cpp
*** nsComponentManager.cpp 1999/06/14 02:06:44 1.35
--- nsComponentManager.cpp 1999/06/18 22:31:08
***************
*** 875,881 ****
--- 875,892 ----
  {
  PR_LOG(nsComponentManagerLog, PR_LOG_ALWAYS,
  ("\t\tnot found in factory cache. Looking in registry"));
+
+ // bug# 7308 , bug# 8150
+ // Findfactory randomly fails if a ProgIDToCLSID() happenes
+ // at the same time from another thread.
+ // The registry seems to be locking properly. Until I figureout
+ // what the right problem is, I am putting this major locks on
+ // these two routines
+ // PlatformFind() and PlatformProgIDToCLSID()
+ //to achieve mutual exclusion at a course level.
+ PR_EnterMonitor(mMon);
  nsresult rv = PlatformFind(aClass, &entry);
+ PR_ExitMonitor(mMon);

  // If we got one, cache it in our hashtable
  if (NS_SUCCEEDED(rv))
***************
*** 957,963 ****
--- 968,985 ----
  else {
  // This is the first time someone has asked for this
  // ProgID. Go to the registry to find the CID.
+
+ // bug# 7308 , bug# 8150
+ // Findfactory randomly fails if a ProgIDToCLSID() happenes
+ // at the same time from another thread.
+ // The registry seems to be locking properly. Until I figureout
+ // what the right problem is, I am putting this major locks on
+ // these two routines
+ // PlatformFind() and PlatformProgIDToCLSID()
+ //to achieve mutual exclusion at a course level.
+ PR_EnterMonitor(mMon);
  res = PlatformProgIDToCLSID(aProgID, aClass);
+ PR_ExitMonitor(mMon);

  if (NS_SUCCEEDED(res)) {
  // Found it. So put it into the cache.

This is a coarser locking fix than it needs to be, but a much safer one. No outside module functions are ever called within the locked code, hence no chance of deadlock. No returns are missed, hence no forgetting to unlock. Since I cannot reproduce the bug, I have to rely on the few people who can. Sorry to be bothering you, peter. Thanks for the help.
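The coarse-grained locking idea in the patch can be sketched with standard C++ instead of NSPR monitors. This is an illustration only: `ComponentTable` and its methods are hypothetical and do not model the real nsComponentManager or registry, and `std::mutex` with `std::lock_guard` is swapped in for `PR_EnterMonitor`/`PR_ExitMonitor` (unlike a monitor, `std::mutex` is not reentrant).

```cpp
#include <cassert>
#include <map>
#include <mutex>
#include <string>

// Hypothetical registry-backed table; not the real nsComponentManager.
class ComponentTable {
 public:
  void Register(const std::string& progId, const std::string& classId) {
    std::lock_guard<std::mutex> lock(mMutex);
    mRegistry[progId] = classId;
  }

  // Counterpart of the ProgID-to-CLSID lookup: holds the shared lock for
  // the whole registry access.
  bool ProgIdToClassId(const std::string& progId, std::string& classIdOut) {
    std::lock_guard<std::mutex> lock(mMutex);
    auto it = mRegistry.find(progId);
    if (it == mRegistry.end()) {
      return false;  // lock_guard releases the mutex on this path too
    }
    classIdOut = it->second;
    return true;
  }

  // Counterpart of the factory lookup: takes the same lock, so the two
  // lookups can never interleave across threads.
  bool HasClass(const std::string& classId) {
    std::lock_guard<std::mutex> lock(mMutex);
    for (const auto& entry : mRegistry) {
      if (entry.second == classId) {
        return true;
      }
    }
    return false;
  }

 private:
  std::mutex mMutex;  // one coarse lock shared by both lookups
  std::map<std::string, std::string> mRegistry;
};
```

Because the guard releases the mutex on every return path, the "no forgetting to unlock" property comes from scoping rather than from auditing each return. No code outside the class runs while the lock is held, which matches the deadlock argument made above.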
Severity: blocker → critical
Target Milestone: M7 → M8
Coarse-grained locks checked in to achieve mutual exclusion. This fixes the problem but isn't the right fix. Keeping the bug open until I check in the right fix.
Status: ASSIGNED → RESOLVED
Closed: 25 years ago
Resolution: --- → FIXED
Full fix checked in. Rolled back the coarse-grained fix.
Status: RESOLVED → VERIFIED
Marking as verified fixed.
Something is still very very wrong on Windows 98, and it happens on my machine with viewer and with apprunner. Every so often (still about 1 out of 5 times), there is a very long hang before viewer starts up. When it finally does start up, everything does seem to look and run ok... This happens on any new window creation and not necessarily just on the first window creation. Should I file a separate bug on this issue, or do we assume that it's related to this problem? Regardless, things are still very horked on Windows 98, and we need to fix it.
hyatt, see my comments dated 6/08 in bug 4901 dealing with the console slowing down launch on win95. Is this the same problem?