Closed Bug 538722 Opened 15 years ago Closed 14 years ago

early Firefox 3.6 Crash [@ nsHtml5ElementName::initializeStatics() ] - Prevent the old html5 parser from being used on the 3.6 branch

Categories

(Core :: DOM: HTML Parser, defect)

x86
Windows NT
defect
Not set
critical

Tracking

()

RESOLVED FIXED
Tracking Status
status1.9.2 --- .9-fixed

People

(Reporter: chofmann, Assigned: hsivonen)

References

Details

(Keywords: crash, Whiteboard: [3.6.x])

Crash Data

Attachments

(3 files, 1 obsolete file)

very early in 3.6 RC1 data. stacks look like http://crash-stats.mozilla.com/report/index/469ec5ed-696d-45dd-8b7e-838652100108 Frame Module Signature [Expand] Source 0 xul.dll nsHtml5ElementName::initializeStatics parser/html/nsHtml5ElementName.cpp:806 1 xul.dll nsHtml5Module::InitializeStatics parser/html/nsHtml5Module.cpp:60 2 xul.dll nsLayoutStatics::Initialize layout/build/nsLayoutStatics.cpp:274 3 xul.dll Initialize layout/build/nsLayoutModule.cpp:351 4 xul.dll nsGenericModule::Initialize obj-firefox/xpcom/build/nsGenericFactory.cpp:273 5 xul.dll nsGenericModule::GetClassObject obj-firefox/xpcom/build/nsGenericFactory.cpp:361 6 xul.dll nsFactoryEntry::GetFactory xpcom/components/nsComponentManager.cpp:3706 7 nspr4.dll PR_Unlock nsprpub/pr/src/threads/combined/prulock.c:347 8 xul.dll nsComponentManagerImpl::CreateInstanceByContractID xpcom/components/nsComponentManager.cpp:1680 9 xul.dll nsComponentManagerImpl::GetServiceByContractID xpcom/components/nsComponentManager.cpp:2252 10 xul.dll CallGetService obj-firefox/xpcom/build/nsComponentManagerUtils.cpp:94 11 xul.dll NS_CheckContentLoadPolicy obj-firefox/dist/include/nsContentPolicyUtils.h:223 12 xul.dll nsDocShell::InternalLoad docshell/base/nsDocShell.cpp:7575 13 xul.dll nsDocShell::LoadURI docshell/base/nsDocShell.cpp:1354 all the reports are Windows NT 5.1.2600 Service Pack 2 and this might be the same person with about 20 crashes submitted within a few minutues http://crash-stats.mozilla.com/report/list?range_value=2&range_unit=weeks&signature=nsHtml5ElementName%3A%3AinitializeStatics%28%29&version=Firefox%3A3.6
That sure is a weird place to crash in. Since this code runs regardless of the html5.enable pref, turning the pref off. Since no one should be turning the pref *on* on the 1.9.2 branch (the pref enables abandoned code there), I suggest removing the HTML5 parser from the 1.9.2 branch.
(In reply to comment #1) > That sure is a weird place to crash in. Since this code runs regardless of the > html5.enable pref, turning the pref off. s/turning the pref off/turning the pref off doesn't help/ > Since no one should be turning the pref *on* on the 1.9.2 branch (the pref > enables abandoned code there), I suggest removing the HTML5 parser from the > 1.9.2 branch. Attaching a patch that removes the HTML5 parser from the 1.9.2 branch. Even if this particular crash were figured out and fixed, the parser on the branch would still be obsolete compared to code on the trunk, so actually fixing just the crash doesn't seem worthwhile. It's unfortunate to do this so late in the 1.9.2 cycle, but until now, I had assumed the crashes only occurred when the parser was turned on. But now that the HTML5 parser code is a problem even when html5.enable=false, I guess it's necessary to pursue this even if very late. :-/ The real patch is too large to attach, so I've edited the patch to omit the content of several files that are deleted.
Assignee: nobody → hsivonen
Status: NEW → ASSIGNED
one other lower risk idea here would be to change the name of the pref. its possible some people flipped the pref long ago to check out the new features, then forgot about it, and now hit some crashes. by just flipping the pref they would have to make a conscious decision to flip the pref, and we wouldn't be advertising or promoting the new pref name and we wouldn't be encouraging its use.
(In reply to comment #3) > one other lower risk idea here would be to change the name of the pref. That would address the problem of trunk testers accidentally running 3.6 with the obsolete HTML5 parser. It wouldn't address this crash, though, because this crash is in code that runs at app startup regardless of the pref value.
(In reply to comment #3) > one other lower risk idea here would be to change the name of the pref. Here is a lower risk alternative patch (that leaves a lot of dead code around). This patch does the following: 1) Makes the HTML5 parser run no code--not even initialization code, since this crash is in the initialization code (regardless of pref). 2) Removes the HTML5-parser-sensitive test case parser/htmlparser/tests/mochitest/test_bug502091.html 3) Reverts parser/htmlparser/tests/mochitest/test_compatmode.html to the state it was in before it was modified to toggle the html5.enable pref. 4) Removes the html5.enable pref from modules/libpref/src/init/all.js This patch should have the same run-time effects as the previous patch. However, I believe this patch is lower risk because it doesn't change linkage and doesn't change .cpp files except a single file inside the HTML5 parser directory. - - As for the crash itself, I have to wonder if it happens on the trunk, too. The code that crashes is almost the same on the trunk, and the crash location makes no sense as a potential crash location.
Attachment #421214 - Flags: review?(jst)
Hmm. One occurrence of the crash on the trunk with a different version of Windows: http://crash-stats.mozilla.com/report/index/d9011a0c-5bb2-4c59-b9f7-87e812100110 Also, the branch crashes have some variability in the source line of the crash. If the app dies due to OOM on Windows, what should the stack look like? Should there be jemalloc on the top of the stack? (I have a hard time explaining why this crash would happen except for OOM at app startup.)
nominating to get all the right people in on progress and discussion of this one.
Flags: blocking1.9.2?
If I'm reading the crash data right, we saw a total of 24 crashes, all on Windows? Based on that I don't think this blocks, seems more likely that it's a single user, doesn't it?
more than just 24 crashes on a single day, but volume is generally low. the spike days are worrisome. 12/26 = 48 crashes 1/08 = 25 crashes
all 48 of the 12/26 crashes were inside the same hour and on the same OS as well hourly frequency of reports 48 2009122604 os breakdown 48 1 Windows NT6.0.6002 Service Pack 2 44 operator new(unsigned int) | nsHtml5ElementName::initializeStatics() 4 nsHtml5ElementName::initializeStatics()
NT6.0SP2 = Vista SP2 The distribution of these crashes makes me suspect a single user crashing repeatedly. I don't think that it blocks, based on the overall percentage of crashes, but we should keep an eye on it and perhaps consider taking the code-path excision for 3.6.x.
Flags: wanted1.9.2+
Flags: blocking1.9.2?
Flags: blocking1.9.2-
Whiteboard: [3.6.x]
I think there is probably some low volume crash here that takes more than 300-700k users to make it observable. being out of memory at start up of the browser would follow that pattern. Then I think there is probably one or two users that still, knowingly or unknowingly, have the pref flipped and are hitting html5 parser crashes unnecessarily. I'd probably recommend not taking the higher risk path of backing out all the code, and just keeping a close eye on this after release. If we do find that this explodes when we get a million+ users the patch is ready to go for a respin. It might be worth getting some test builds with the patches as one more preparation step just in case they are needed.
Agreed, I don't think this should block, and I'd really like to know more about this before taking the steps to back anything out, given the nature of this crash. I'd almost expect this to be caused by something other than the code that's crashing here, be that out of memory, or some third party code doing something it's not supposed to do. Hard to say of course before learning more about this...
Comment on attachment 421037 [details] [diff] [review] Remove the HTML5 parser from the 1.9.2 branch Clearing review requests until we decide we want to take either of these patches (which I still think we don't).
Attachment #421037 - Flags: review?(jst)
Attachment #421214 - Flags: review?(jst)
(In reply to comment #14) > Clearing review requests until we decide we want to take either of these > patches (which I still think we don't). I don't understand why we wouldn't want to disable the HTML5 parser on the 1.9.2 branch altogether (as a stability fix in a point release using the smaller one of the two patches). I can't think of a situation where it would be advisable for anyone to run 1.9.2 with the HTML5 parser enabled. Bugs like bug 539736 and bug 539237 indicate that people end up running the obsolete code when sharing profiles across release and trunk.
One of the main reasons to land the HTML5 parser on trunk when we did was to start getting input on if the HTML5 parsing algorithm is compatible with the web. We'll have a much better opportunity to get help with that if we keep the code in the 3.6 release.
(In reply to comment #16) > One of the main reasons to land the HTML5 parser on trunk when we did was to > start getting input on if the HTML5 parsing algorithm is compatible with the > web. We'll have a much better opportunity to get help with that if we keep the > code in the 3.6 release. In June/July, it was useful to get feedback on the algorithm in the state it is in on the 1.9.2 branch. The algorithm changed in response the feedback. Now it is useful to get feedback on the current state of the algorithm, but it's not useful to expose 1.9.2 users to the aspects of the algorithm that weren't good and have already been fixed in the spec and on the trunk. Also, if we ask users to turn the HTML5 parser on in Firefox 3.6, the users will be exposed to known crashes that have been fixed on trunk (but the patches don't apply to 1.9.2) in addition to being exposed to parsing algorithm flaws that have already been fixed on trunk. I think having the same pref enable the code that needs testing on trunk and enable obsolete code in 3.6 actually hinders getting testing for the trunk, because the problem of enabling obsolete code on 3.6 is a reason not to enable the HTML5 parser on the trunk if you run the same profile with both. OTOH, when people have the old snapshot of the HTML5 parser enabled on 3.6, alarms about problems on 3.6 sound (see bug 539736 and bug 539237) and then it turns out that the problem neither applies to the default config of 3.6 nor to the trunk with the HTML5 parser enabled.
> In June/July, it was useful to get feedback on the algorithm in the state it is > in on the 1.9.2 branch. The algorithm changed in response the feedback. Now it > is useful to get feedback on the current state of the algorithm, but it's not > useful to expose 1.9.2 users to the aspects of the algorithm that weren't good > and have already been fixed in the spec and on the trunk. The algorithm hasn't changed that dramatically, has it? > Also, if we ask users to turn the HTML5 parser on in Firefox 3.6, the users > will be exposed to known crashes that have been fixed on trunk (but the > patches don't apply to 1.9.2) How common are these crashes? I agree that if they are bad enough that turning on the pref is essentially useless then we might as well not have the pref. But I didn't think we were ever in that bad of a state. > I think having the same pref enable the code that needs testing on trunk and > enable obsolete code in 3.6 actually hinders getting testing for the trunk, > because the problem of enabling obsolete code on 3.6 is a reason not to enable > the HTML5 parser on the trunk if you run the same profile with both. That I agree with. But that seems like a different fix?
(In reply to comment #18) > The algorithm hasn't changed that dramatically, has it? </script> parsing has changed very dramatically, and </script> parsing is the most interesting thing to test for Web compat, in my opinion. > > Also, if we ask users to turn the HTML5 parser on in Firefox 3.6, the users > > will be exposed to known crashes that have been fixed on trunk (but the > > patches don't apply to 1.9.2) > > How common are these crashes? I agree that if they are bad enough that turning > on the pref is essentially useless then we might as well not have the pref. But > I didn't think we were ever in that bad of a state. Not particularly common, it seems: http://crash-stats.mozilla.com/query/query?product=Firefox&version=ALL%3AALL&platform=windows&platform=mac&platform=linux&platform=solaris&branch=1.9.2&date=&range_value=2&range_unit=weeks&query_search=signature&query_type=contains&query=nsHtml5&build_id=&do_query=1 (Aside: It's relevant to this bug that there are also Windows crashes in the initializeStatics methods of other classes.) It still annoying to hit a known crasher--especially if you are the kind of user who chooses releases over nightlies. There's also bug 509851 which isn't a crasher, but that I wouldn't want to expose to users. (Backporting that one would be doable but probably not a good use of time.) And, of course, there's bug 503632. Bug 533381 may be relevant, too (I haven't checked). At least so far, the utility of the bug catch from the 1.9.2 branch has been negative. That is, apart from this initializeStatics problem, there have been no trunk-relevant non-duplicate bugs filed based on 1.9.2 testing. However, there have been duplicates and bugs that don't apply to the trunk. > > I think having the same pref enable the code that needs testing on trunk and > > enable obsolete code in 3.6 actually hinders getting testing for the trunk, > > because the problem of enabling obsolete code on 3.6 is a reason not to enable > > the HTML5 parser on the trunk if you run the same profile with both. > > That I agree with. But that seems like a different fix? Changing the pref name on 3.6 would be another way address the profile sharing problem. But is there any situation where it would be advisable to turn on the HTML5 parser in 3.6? That is, what would be a situation with 3.6 where the expectation of finding new bugs would outweigh the downside of hitting old more frequently occurring bugs? It seems to me that the kind of users who run releases (as opposed to running trunk nightlies) shouldn't be running an old snapshot of the parser.
nsHtml5NamedCharacters::initializeStatics() ,ranked at #210, is the only html5 signature I see showing up in the top 300 more reports at http://crash-stats.mozilla.com/report/list?range_value=2&range_unit=weeks&signature=nsHtml5NamedCharacters%3A%3AinitializeStatics%28%29&version=Firefox%3A3.6
Comment on attachment 421214 [details] [diff] [review] Alternative patch: Make the HTML5 parser not participate at run time without removing all the code can we just do this?
Attachment #421214 - Flags: review?(bzbarsky)
I'm not sure I follow that patch (maybe needs more context?). How does it make the pref not work?
nsHtml5Module::InitializeStatics() { +#if 0 nsContentUtils::AddBoolPrefVarCache("html5.enable", &sEnabled); http://mxr.mozilla.org/mozilla1.9.2/ident?i=content::sEnabled * content/base/src/nsHTMLContentSerializer.cpp o line 129 -- loopForward = nsHtml5Module::sEnabled; * content/base/src/nsContentUtils.cpp o line 3635 -- if (isHTML && nsHtml5Module::sEnabled) { * content/html/document/src/nsHTMLDocument.cpp o line 658 -- PRBool loadAsHtml5 = nsHtml5Module::sEnabled; o line 1808 -- PRBool loadAsHtml5 = nsHtml5Module::sEnabled; Since sEnabled is a static and starts out false, and the pref isn't checked by InitializeStatics, it can't transition out of false. Thus content will never use the module.
looks good to me, nowhere else sets sEnabled.
Blocks: 577364
Comment on attachment 421214 [details] [diff] [review] Alternative patch: Make the HTML5 parser not participate at run time without removing all the code We need to turn off this experimental code on the old branch. If people want to test the new HTML 5 parser they should use the newer supported version not an old snapshot.
Attachment #421214 - Flags: approval1.9.2.8?
Attachment #421214 - Attachment is obsolete: true
Attachment #456608 - Flags: review?(bzbarsky)
Attachment #456608 - Flags: approval1.9.2.8?
Attachment #421214 - Flags: review?(bzbarsky)
Attachment #421214 - Flags: approval1.9.2.8?
Comment on attachment 456608 [details] [diff] [review] Alternative patch: Make the HTML5 parser not participate at run time without removing all the code, with more context r=bzbarsky
Attachment #456608 - Flags: review?(bzbarsky) → review+
Blocks: 561652
Blocks: 581761
Severity: normal → critical
Keywords: crash
Comment on attachment 456608 [details] [diff] [review] Alternative patch: Make the HTML5 parser not participate at run time without removing all the code, with more context a=LegNeato for 1.9.2.9
Attachment #456608 - Flags: approval1.9.2.9? → approval1.9.2.9+
Status: ASSIGNED → RESOLVED
Closed: 14 years ago
Resolution: --- → FIXED
Summary: early Firefox 3.6 Crash [@ nsHtml5ElementName::initializeStatics() ] → early Firefox 3.6 Crash [@ nsHtml5ElementName::initializeStatics() ] - Prevent the old html5 parser from being used on the 3.6 branch
Crash Signature: [@ nsHtml5ElementName::initializeStatics() ]
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Created:
Updated:
Size: