Closed Bug 41983 Opened 24 years ago Closed 23 years ago

HTML content inside XHTML document should allow namespaced attributes

Categories

(Core :: XML, defect, P2)

defect

Tracking

()

VERIFIED FIXED
mozilla1.0

People

(Reporter: nisheeth_mozilla, Assigned: sicking)

References

Details

(Keywords: dataloss, xhtml, Whiteboard: [Hixie-P2] [ADT2])

Attachments

(13 files, 21 obsolete files)

(deleted), text/xml
Details
(deleted), text/xml
Details
(deleted), text/xml
Details
(deleted), text/xml
Details
(deleted), text/html
Details
(deleted), text/html
Details
(deleted), text/xml
Details
(deleted), text/html
Details
(deleted), text/html
Details
(deleted), text/html
Details
(deleted), text/html
Details
(deleted), text/html
Details
(deleted), patch
peterv
: review+
Details | Diff | Splinter Review
HTML content doesn't store attributes that have a namespace other than none or the HTML namespace. In the XHTML case, it should deal with all attributes and preserve the namespace.
Adding vidur and jst to the cc list.
Setting milestone to M18 and marking nsbeta3... We need this to fully support XHTML. I'll add more to this bug once I've investigated exactly what needs to be done to fix this.
Status: NEW → ASSIGNED
Keywords: nsbeta3
Target Milestone: --- → M18
Adding Pierre to Cc: too since the HTML attribute code is mixed in with the style code and they're quite tightly tied into the style system. IMO we need to move over to a model where all attributes are stored as separate objects in some kind of array in the content object, currently the HTML attributes are stored in two different arrays, one for the names and one for the values (if I understand the code correctly, which I'm not sure I do yet/any more). There's probably an easier way out of this than changing the way the attributes are stored though and that's to basically replace the nsIAtom that is currently used for storing the attribute name with an nsINodeInfo (which can hold the name, prefix and namespace ID).
Keywords: xhtml
This bug has been marked "future" because the original netscape engineer working on this is over-burdened. If you feel this is an error, that you or another known resource will be working on this bug,or if it blocks your work in some way -- please attach your concern to the bug for reconsideration, but do not clear the nsbeta3- nomination.
Whiteboard: [nsbeta3-]
Target Milestone: M18 → Future
/me thinks that this bug should be taken again under consideration because the tests attached to Candidate Recommendation "W3C Selectors" make it very visible, and it will reduce our CSS selectors conformance ratio (96% expected by time of PR). peterv confirms it will also be a problem for XSLT. We think we have found the source of the problem : in nsXMLContentSink::AddAttributes in line 448, the |nameSpaceID = kNameSpaceID_None;| should in fact copy the namespace ID of the content element. We tried if (aContent) aContent->GetNameSpaceID(nameSpaceID); and ended up with assertion and core dump in XBL on startup, even if our code seems perfectly reasonable. Cc:ing hyatt if he can help.
I'm not sure whether this is the right place but glazou pointed me to this bug. Some Mozilla builds ago (possibly around M18) it was possible in a XUL element to do <button mynamespace:myattribute="whatever" id="1" value="foo"/> and in JS, I could do var button = document.getElementById("1"); alert(button.getAttribute("mynamespace:myattribute"); With current builds (0.8.1), this no longer works, the attribute evaluates to an empty string. When I request the attribute "myattribute" without the namespace, I get it. Is this the same problem. Shouldn't this work like in my first example?
I am tentatively pulling this back from Future and into 0.9.1, and nominating for nsbeta1. The right way to fix this is to make HTML use the nsNodeInfo objects (like XML) - they store the namespace etc. information correctly.
Keywords: nsbeta1
Target Milestone: Future → mozilla0.9.1
achimha, please file the xul problem as a separate bug (against me) and attach a testcase if you can, this is for HTML only. Thanks.
This is likly to affect xslt pretty much. Many XSLT stylesheets contain xhtml elements and we will use the namespaceURI and name to copy attributes from xslt docuemnt to source document...
*** Bug 75675 has been marked as a duplicate of this bug. ***
moving to TM of 0.9.2 per PDT triage (you can check it into 0.9.1 until Friday, 18/May/01 or into 0.9.2 after the tree opens)
Target Milestone: mozilla0.9.1 → mozilla0.9.2
Priority: P3 → P2
Moving P2 and P3 bugs over to 0.9.3...
Target Milestone: mozilla0.9.2 → mozilla0.9.3
Whiteboard: [nsbeta3-] → [Hixie-P2]
I am pretty sure this is also breaking our namespace support for SVG elements. Initially when SVG appeared in Mozilla they were based on HTML element implementation, and I suspect the situation has not changed. I suspect this is breaking the DOM Level 2 tests on http://www.zvon.org/xxl/DOM2reference/Output/index.html I feel this bug is really important, and actually causes problems that I feel should fall into severity: major. Updating severity as well.
Severity: normal → major
heikki - the 'new' SVG patches have the content bits inheriting from nsXMLElement (via nsSVGElement)
Missed 0.9.3.
Target Milestone: mozilla0.9.3 → mozilla0.9.4
Bulk re-assign of my 0.9.4 bugs to Heikki. I will not have the cycles to work on these bugs while Clayton is on sabbatical for the next six weeks.
Assignee: nisheeth → heikki
Status: ASSIGNED → NEW
Target Milestone: mozilla0.9.4 → mozilla0.9.5
*** Bug 101594 has been marked as a duplicate of this bug. ***
Whaa, I don't see a need for the additions to nsIContent here, nsIContent should have all it needs to properly deal with attributes that carry full namespace info. Attributes are unambigously identified by local name and namespace ID so there's no need to add methods for getting or unsetting attributes based on a nsINodeInfo. The GetAttrAt() method in the patch doesn't make sense either, if we need something like that then we should change the current GetAttrNameAt() to return a nsINodeInfo in stead of returning the prefix, localname and namespace id separately, but that shouldn't be needed for fixing this bug either.
Hmm... do we create new namespace IDs for each combination of namespace URI and prefix? I assumed (yeah, bad me for not checking) that we only map the URI to ID. Consider following: <foo x:a="bar" y:a="rab" xmlns:x="foobar" xmlns:y="foobar" /> The above would probably not pass through a validating schema "parser", because there is effectively two attributes with the same name (in the same namespace). But we don't have a schema parser. If only the URI is mapped to ID, then name and ID would not be enough to distinguish between x:a and y:a.
Status: NEW → ASSIGNED
A namespace-aware xmlparser should not consider that as valid xml for precely that reason (according to the namespace spec). I don't know if mozilla will accept that, but i sure hope it doesn't. And yes you're right, we map an ID to a namespace URI, not a uri+prefix combination.
Both Mozilla and IE6 load the testcase quite happily without any errors. The Namespaces spec says (in http://www.w3.org/TR/1999/REC-xml-names-19990114/#uniqAttrs ) that attributes must be unique for a document to be conformant. But the spec does not say what should happen if a processor detects such a nonconformant thing in a document.
Of course moz and IE swallows that second testcase just as nicely. (the second testcase uses a prefix that isn't declared with a xmlns attribute) IMHO we should treat both these problems the same way we do with non-valid xml, I would consider it just as serious. It's a damn shame that the namespace-rec doesn't require it the same way that the xml-rec does :(
Target Milestone: mozilla0.9.5 → mozilla0.9.6
*sob* I just realized we have a pretty much no-win situation here... Consider the following Namespace-conforming XML document: <xhtml:p id="id1" style="color:red;" xhtml:style="color:green;" xmlns:xhtml="http://www.w3.org/1999/xhtml" > 1: Hello, should I be red or green? </xhtml:p> Since 'style' and 'xhtml'style' are in different namespaces, this is correct. According to most (I disagree, but...), an attribute without a namespace should function as if it was in the same namespace as the element. Technically attributes only have a namespace if they have the explicit namespace prefix in front of them (they do not inherit default namespaces, for example). Obviously attribute in the correct namespace should work as well. So, Mozilla seems to accept both attributes, but it acts on the last one (as can be seen in the attached testcase: the last attribute wins). The DOM methods only affect the active attribute, which can be seen because color is removed by clicking on the buttons in the testcase. Opinions on what to do...?
Yes, it's perfectly correct, but it's the 'style' attribute that should work. The 'xhtml:style' should have no effect, the reason it does is probobly becuase of this bug (ie attributes on html-elements don't get a namespace) so mozilla probobly can't distiguish them. Even though most people think that there should be no difference there should, specs say so and it's the solution which shold hold better in the long run.
IMO the above testcase should be green, since xhtml:style="..." doesn't mean anything AFAIK, the XHTML spec doesn't define attributes in the XHTML namespace does it (i.e. all attributes are w/o namespace, if this is not the case then all attributes would need prefixes in XHTML)? If I'm correct, then xhtml:style="..." above is invalid, and meaningless. And yes, I agree that namespace errors should be treated as wellformedness errors, if we have two attributes with the same namespace and same localname we should flag an error (people even talk about namespace wellformedness) and stop parsing.
Er, oops, make that "red", not "green".
Ok, I also spoke with rayw. 'xhtml:style' does not mean anything, only 'style' has XHTML semantics if it is part of an XHTML element. That should be easy to fix.
I'm hearing some misstatements about attribute namespaces here. The HTML code is wrong in that it accepts attributes that should have no namespace when they are set using the HTML namespace, e.g., our system will accept this: (Assume xhtml namespace is defined using the xhtml prefix in all examples.) <table xhtml:border="2"> ... </table> Technically the border attribute above is meaningless and should be ignored, and yet we honor it, because we allow attributes to be set in the HTML namespace. The following, however, is correct. <table border="2"> ... </table> Any non-global attributes (i.e., *all* HTML attributes) should be using kNameSpaceID_None in our system. There isn't a single global attribute in HTML AFAIK. Note that tons of our code butchers this and does it wrong. The code is littered with Get/SetAttrs that use the HTML namespace instead of the correct namespace of None. So basically what jst said is completely correct. xhtml:style is completely meaningless as a means of doing inline style as are any other HTML attributes set using the HTML namespace.
I have gone through my tree and changed all suspicuous kNameSpaceID_HTML to kNameSpaceID_None. With some other minor changes I now get partly styled content as well. I expect to fix several bugs at once when I land my fix (this is still gonna take some time, hopefully next week, tho): 1. XHTML elements can store attributes that have been bound to any namespace. 2. XHTML elements will only recognize HTML attributes if the attributes do not belong into a namespace. 3. XHTML elements report the correct namespace ID. 4. localName returns the element name with correct case. 5. Fix serializers so that we do not generate unneeded namespace declarations.
I opened bug 103255 on the namespace-conforming issue
I am now at a stage where the brief tests I have tried seem to indicate that styling etc. works correctly. I still crash on Bugzilla query page, and it seems like I am not holding a ref to a nodeInfo when I should be... One set back currently is: NS_IMETHODIMP nsGenericHTMLElement::GetNameSpaceID(PRInt32& aID) const { // This seems right, but it currently breaks almost everything //mNodeInfo->GetNamespaceID(aID); aID = kNameSpaceID_HTML; return NS_OK; } The commented out way seems like the correct way to go, but I don't think our stylesheets are prepared in a way that allows this (the browser comes up but most content is not disployed right, and the URLbar becomes a list box, just to name a few problems). I am now checking the callers of nsGenericHTMLElement::GetNameSpaceID() to see who makes bad assumptions. I'll attach a patch as soon as it finishes going through the whole tree (it has been interesting finding small glitches here and there).
actually it seem like our html.css is wrong. It starts with @namespace url(http://www.w3.org/1999/xhtml); which means we'll only style elements in the xhtml namespace, which seems wrong since html elements is in the null namespace. Could this be the cause?
Huh? What is wrong with setting the default namespace to HTML in html.css? That is perfectly correct AFAICT.
That might be the biggest problem. I'll try that out, as well as checking the other stuff I mentioned. I am just fearful that basically doubling all our HTML specific selectors is going to severly impact performance :(
HTML elements have no namespace. XHTML elements are in XHTML namespace. We need a way to distinguish between these, see bug 103225.
Why do HTML elements have no namespace? Why would there even need to be a distinction?
Hyatt, HTML elements do *not* have a namespace, they predate XML namespaces, and no namespace is specified anywhere, they should not have a namespace. IMO our html.css should *not* define a default namespace and when html.css is loaded by a non-HTML document the code that loads it should set the default namespace (if default namespaces were inherited in CSS we could load a xhtml.css file that defines the default namespace and @include's html.css, but I don't think that's how CSS works). HTML elements *not* having a namespace is how we tell HTML and XHTML elements appart. Code that wants to do something special for "HTML" elements but doesn't care if the element is HTML or XHTML should use nsIContent::IsContentOfType(nsIContent::eHTML). To check if an element is an XHTML element one needs to check only if the namespace is kNameSpaceID_XHTML (should replace kNameSpaceID_HTML), and to check if an element is an good ol' HTML element on needs to check nsIContent::IsContentOfType(eHTML) *and* that the element has no namespace. Code that needs to distinguish between HTML and XHTML elements should be rare.
html.css sets a default namespace for performance reasons. You don't want to waste time trying to compare XUL elements in the chrome to any of the style rules in html.css. I don't care how the namespace is set, but we should ensure that html.css style rules are only matched by HTML elements (whatever their namespace).
When html.css is loaded by XUL (or any other non-HTML documents) we should absolutely set the default namespace in the stylesheet to the XHTML namespace, for both performance and correctness reasons.
OS: Windows NT → All
Hardware: PC → All
> if (tagPrefix.Length() > 0) { IsEmpty() is preferred. no? >if (childArea.get() == mCurrentFocus) { > nsAutoString tabIndexStr; > childArea->GetAttr(kNameSpaceID_None, >nsHTMLAtoms::tabindex, tabIndexStr); Will mCurrentFocus ever be null? If childArea is also null then it could cause a crash. > if (mNodeInfo->NamespaceEquals(kNameSpaceID_None)) { Need null check. >PRInt32 nsid; >GetNameSpaceID(nsid); >nsAutoString name,prefix; >if (nsid == kNameSpaceID_None) { I don't get this. Since GetNameSpaceID() returns kNameSpaceID_HTML the if(condn) will never be true. right? Note: haven't finished reviewing completely.
>if (0 == count) { if (count == 0) would be better. > name->ToString(attrName); >.... >if (NS_CONTENT_ATTR_HAS_VALUE == >mAttributes->GetAttribute(nsHTMLAtoms::style,kNameSpaceID_None, value)) { .... > aAttributes->GetAttribute(nsHTMLAtoms::dir,kNameSpaceID_None, value); Need null checks. right? Note: Still reviewing :-)
Well, I don't think any of those places needs a null check, as the old code did not have null checks either. And it seems nsGenericHTMLElement is assumed to always have mNodeInfo (there is old code assuming that so I live by the same assumptions). The style things are also old code - I can change them but they were not introduced by my patch.
The below code is there to anticipate the fix to GetNameSpaceID() in bug 103225, but right now it is not needed, I will comment it out. >PRInt32 nsid; >GetNameSpaceID(nsid); >nsAutoString name,prefix; >if (nsid == kNameSpaceID_None) { I run jrgm's page load tests (2 times without fix, 2 times with fix), and we are 0.2% to 2.4% slower (the margin of error in that test is estimated to be about 2% so the results fall into that). Next I am going to see if we leak.
>+ nsCOMPtr<nsIBindingManager> bindingManager; >+ mDocument->GetBindingManager(getter_AddRefs(bindingManager)); >+ nsCOMPtr<nsIXBLBinding> binding; >+ bindingManager->GetBinding(this, getter_AddRefs(binding)); null check bindingManager. >static PRInt32 CompareNodeInfo(nsINodeInfo *aNodeInfo1, nsINodeInfo >*aNodeInfo2) >+{ >+ nsAutoString qname1,qname2; >+ aNodeInfo1->GetQualifiedName(qname1); >+ aNodeInfo2->GetQualifiedName(qname2); null check aNodeInfo1 & aNodeInfo2
heikki, can we deCOMtaminate GetNameSpaceID and its descendents, and also wean them off of nsHashtable and onto PLDHashTable instead? That should save a fair amount of time. http://lxr.mozilla.org/mozilla/source/content/base/src/nsNameSpaceManager.cpp#476 http://lxr.mozilla.org/mozilla/source/content/base/src/nsNameSpaceManager.cpp#139 I don't see any reason for infallible methods to return NS_OK directly, and to return their real result via a reference out param. DeCOMtaminate! I don't see any reason to subroutine FindNameSpaceID if you can look directly into a global PLDHashTable with PL_DHashTableOperate(&gURIToIDTable, aURI.get(), PL_DHASH_LOOKUP), cast the returned entry pointer down to an nsURIToIDEntry*, and return that entry's mID (or else return kNameSpaceID_Unknown if PL_DHASH_ENTRY_IS_FREE(entry)). You'll save on malloc overhead elsewhere, too (when adding to the table). If I can help with pldhash.h usage, please let me know. /be
I run jrgm's page load test with XPCOM_MEM_LEAK_LOG set. I got the same assertions during the test as far as I can tell. There are some differencies in the leaks, but I am pretty sure most of the differences must be some random interference (and some component failed to initialize in the run without my changes). Some things that might be relevant: with my changes we leak one more node info and one more node info manager (this could still be noise). Overall the run with my changes leaked 33% more (went from 35332 bytes to 47084 bytes). I'll look at the hash stuff Brendan suggested and some other minor issues I realized next.
NameSpaceManagerImpl::GetNameSpaceID() is false alarm for this bug. It is called three times during startup, and was called 7 times by the time I started jrgm's page load test - it did not get called during the test. When I went to Bugzilla query page I saw about 10 calls when I seleced a product and moved the mouse around. Quantify showed that NameSpaceManagerImpl::GetNameSpaceID() takes about 1ms of time to run, which is basically nothing. For DHTML optimization someone might want to take a look at it in the future. I noticed that I have an unneeded routine in the attachec patch, CompareNodeInfo(). When I removed it and changed the callers to fast pointer comparisons (which was all that was needed), I run jrgm's test once more. It said new code is 1.7% slower. I know the new code must be slower because it is doing more stuff, but all the values still fall under/around the 2% treshold. Would it be ok to check that stuff in, or do I need to figure out some optimizations that bring us at least to where we were?
- In nsGenericHTMLElement.cpp: @@ -1909,7 +1924,7 @@ sheet = dont_AddRef(GetAttrStyleSheet(mDocument)); if (sheet) { // set attr via style sheet - result = sheet->UnsetAttributeFor(aAttribute, this, mAttributes); + result = sheet->UnsetAttributeFor(aAttribute,aNameSpaceID, this, mAttributes); Be consistent with space-after-comma, i.e. add a space. Same thing further down in that file: + nsresult result = mAttributes ? mAttributes->GetAttribute(aAttribute,aNameSpaceID, &value) : and yet more in nsGenericHTMLElement::GetHTMLAttribute(), and even more futher down... Maybe run over these changes looking for |,[^\S]|? There's a ton of those... Other than that, sr=jst As Brendan suggests, we should convert nsNameSpaceManager (and nsNodeInfoManager) over to using pldhash (or create a C++ wrapper class for pldhash and use that), but I think we can do that as a separate bug. Heikki, would you file a bug on that?
Attachment #54063 - Flags: superreview+
bug 72722 is about making nsHashtable use PLDHashTable, but it'll require API changes to nsHashtable.h. Don't count on it, and don't waste time and space (and development time) on wrapping pldhash.h if you're upgrading from plhash.h usage -- or so I say. /be
if you really wanna find all whitespace fixes then have a look at http://www.johnkeiser.com/cgi-bin/jst-review-cgi.pl It'll find some other more "real" problems as well
LOL: JST Review Simulacrum!?
Attachment #54063 - Flags: review+
Tracy Walker went through the smoketests and they pass with these changes. I'll post a message to some mozilla newsgroups to let people know about this and unless I hear loud objections I will check this in in a couple of days.
(Sorry if this gets attached twice -- I thought I'd posted this already.) In your post on <news://news.mozilla.org/3BD4960E.20901@netscape.com>, you mention that this causes a slight slowdown on jrgm's page load tests. While I heartily _thank_ you for at least checking, I'd like to propose that we fix the performance problem before considering this change. At this point, we're kicking and scraping for 1 and 2 percent wins on page load performance, and fixing a somewhat obscure (in the real world, anyway) problem with namespaces on HTML elements doesn't seem worth it to me. So, consider a strong objection raised. :-)
I second waterson's comment too. Heikki, can you see if you can come up with a patch without hurting performance?
Yup, let's get rid of the performance degradation before checking this in.
*sigh* I was afraid I would get objections about the perf, which was why I posted the message. The patch itself can't be made faster in my opinion, which means that we need to come up with an optimization somewhere else (or in handling of attributes in general) that gives us ~2% improvement.
Beware if you optimize something else: make sure it is deeply entangled in your patch otherwise we'll take the optimization and still leave the patch aside. I saw this happen before.
I have cleaned some code (getting rid of attribute management functions in nsIHTMLStyleSheet etc.) and put some duplicate code that makes us slightly faster at the expense of, well, dulicate code. Probably not even close to what is required perf wise, but it is a start... I also realized this was a dataloss bug, since if you try adding namespaced attributes they get lost silently. To counter waterson's comments about obscure cases, this patch fixes XHTML content in general, making serialization work without XHTML hacks and fixes some case sensitivity bugs there. This also helps XSLT, since generally people want to transform arbitrary XML into XHTML where these problems can arise.
I don't feel comfortable landing this so close to the end of the milestone, moving forward. I tested perf once after the removal of attribute handling code in stylesheet, and got result that my code is 1.8% FASTER than code on trunk. I am now working on making node info manager use double hashing.
Target Milestone: mozilla0.9.6 → mozilla0.9.7
The latest patch as is gave me exactly the same perf as without it. But I realized the computation of the hash key is really slow, so I checked what it would be with the old fast, simple hash key and got perf increase of 1.7%. I also checked once more without double hashing, and the perf was 2.1% faster. Now, in summary: fix with node info double hash and slow hash key = 0% difference with trunk fix with node info double hash and fast hash key = 1.7% faster fix with old node info hash = 2.1% faster It could be that perhaps I do not know how to use double hashing effectively or it does not buy us anything here. If you can spot bad things let me know. I'll attach the latest and fastest patch (double hash can be controlled by a def in nsNodeInfo.h). In any case I think this is enough to show that this patch will result in perf boost. Adding perf keyword ;)
Keywords: perf
I don't know for sure why this gives a perf increase, but I would guess that removal of the methods from nsIHTMLStyleSheet simplify the code so that the optimizer can do a better job; it might also help with memory caches (unless I really botched something and we are not doing what we should be doing). So, reviews, comments, permission to check in since no perf hit anymore?
Comment on attachment 56346 [details] [diff] [review] Best perf patch double hashing is a space win over chaining, not necessarily a time win (it could be a time win, too, depending on malloc costs). Have you measured footprint differences among the patches? /be
Comment on attachment 56346 [details] [diff] [review] Best perf patch Have you checked for MLKs? If you did then r=harishd.
Attachment #56346 - Flags: review+
Node info manager is a document member, and one node info manager typically has about 100 entries in it. If double hash saves a few pointers compared to chaining the advantage is minimal. I'll leave the double hashing code there in case we some day want to do it, but ifdeffed out for now. I did not run mlk tests with this new code, because it did not change significantly compared to the code which I did test. Running through jrgm's page load tests showed a little difference but I cannot be sure if that was just noise. In any case I will be watching Tinderbox leak numbers when I check in.
heikki: double hashing doesn't save "a few pointers", it saves 3 words per entry, or more, and it allocates all entries in one contiguous allocation from the malloc heap. I'm fine with leaving the PLHashTable usage as-is, but I think we should make a measurement: how many entries in a given hash table, on average? Can you get that figure, or at least a typical, if not mean/variance (see HASHMETER #ifdefs in http://lxr.mozilla.org/mozilla/source/nsprpub/lib/ds/plhash.c)? It may all be small potatoes -- if you know that there are only a few dozen entries, or fewer, and that entries are rarely added or removed, but frequently looked up, tell me and I'll shut up. Thanks, /be
Brendan, Heikki: the nodeinfo hash typically has between 10 and 50 entries in it (and there are about 30 of these hashes in memory when starting mozilla), and once a page is loaded entries are rarely added. The numbers of lookups that are done in those hashes varies from very few to *thousands*, depending on the number of elements and attributes in the document that's loaded (the number of entries in the hashes is independent of the direct number of elements/attributes in the document).
Actually, those numbers should grow a little (but not much) with this patch since now HTML attribute names also end up in the hash.
Target Milestone: mozilla0.9.7 → mozilla0.9.8
Blocks: 71668
Removing perf keyword since this no longer gives a perf boost.
No longer blocks: 71668
Keywords: perf
Target Milestone: mozilla0.9.8 → mozilla0.9.9
Target Milestone: mozilla0.9.9 → mozilla1.0
The patch does the following: * Remove the nsIHTMLAttributes interface and have all clients use the nsHTMLAttributes class directly. This saves the overhead of virual functions and saves us 8 bytes (vtable+refcount) on all attribute maps. http://lxr.mozilla.org/mozilla/ident?i=nsIHTMLAttributes * Remove some dead-stupid attribute-handling functions in nsIHTMLStyleSheet. The functions simply forwarded the call to nsIHTMLAttributes so there's no reason not to call that directly. * Enable namespaced attributes by storeing a nsINodeInfo as name when the namespace != null. When the namespace is null it simply stores the localName-atom as key in the attributemap so there should be no extra localName+namespace -> nsINodeInfo calls. The patch does _not_ fix all callers that use the html-namespace when they should use the null namespace. I think it might be a good idea to do that separatly to simplify reviews and keep down patchsize. I can't say that i'm overly happy with the patch. There is a certain amount of code-duplication in there since i didn't want to add extra logic to the commonly used functions out of fear for slowdowns. And the "namespace-ization" is pretty spotty. For example it does not support namespaces on stylesheet- mapped attributes (since at the moment there are no need for that). It works suffitenly well though. I plan to make a complete rewrite of the html- attributes code (it's currently way too complex for my liking), but that's for after 1.0 since the risk for regressions would be fairly high.
Attached patch patch v1 Beta (obsolete) (deleted) — Splinter Review
Just noticed that the patch seriously regresses html-forms. Not sure if it's due to a bug in my patch or some attribute-namespace abuse in the forms-code. I'll invertigate tomorrow
Attached patch Patch v1 RC1 (obsolete) (deleted) — Splinter Review
This one doesn't break forms
Attachment #72917 - Attachment is obsolete: true
Attached patch Patch v1 RC1b (obsolete) (deleted) — Splinter Review
sorry, forgot to remove a XUL change i have in my tree
Attachment #73057 - Attachment is obsolete: true
Giving this another round of thought i wonder if I should try to fix all errorous callers and remove the "change unknown and html namespaces to null" code. Since we'll probably gonna have to do a whole lot of testing after that change it might be good to get that testing on this first patch too? What do you guys think?
just noted that i have some problems building debug. Shouldn't be a problem for perfomance tests though. I have it fixed in my tree here, let me know if you need it, i can create a separate patch with just those changes if needed.
Given the size of this current patch I'd rather see the namespace changes go in as a separate fix.
Attached patch make debug build too (obsolete) (deleted) — Splinter Review
just the debug stuff
Patch RC1b + the debug patch + a change around line 2000 in nsGenericHTMLelement.cpp to move impact var outside of the if (document) in order to compile: I see 1.1% improvement in page load time using jrgm's page load test! Good work. Time for reviews and testing. I wonder if FedEx ships Champagne... ;)
Attached patch latest combined patch that builds (obsolete) (deleted) — Splinter Review
This patch gives 1.1% perf increase in page load while at the same time fixing this bug.
Attachment #51365 - Attachment is obsolete: true
Attachment #51625 - Attachment is obsolete: true
Attachment #52814 - Attachment is obsolete: true
Attachment #53031 - Attachment is obsolete: true
Attachment #53684 - Attachment is obsolete: true
Attachment #54063 - Attachment is obsolete: true
Attachment #56337 - Attachment is obsolete: true
Attachment #56346 - Attachment is obsolete: true
Attachment #73061 - Attachment is obsolete: true
Attachment #73321 - Attachment is obsolete: true
Comment on attachment 73602 [details] [diff] [review] latest combined patch that builds >Index: content/base/public/nsINodeInfo.h >=================================================================== >+ PRBool ExpandedNameEquals(nsINodeInfo *aNodeInfo) const This is a weird function... I would have chosen maybe NameAndNamespaceEquals(). It worries me a little that this is even needed, but... >Index: content/html/content/src/nsGenericHTMLElement.cpp >=================================================================== > nsGenericHTMLElement::GetAttr(PRInt32 aNameSpaceID, nsIAtom *aAttribute, >- nsAWritableString& aResult) const >+ nsIAtom*& aPrefix, nsAWritableString& aResult) const > { >- aResult.SetLength(0); Truncate aResult here in case we have an error return. >Index: content/html/content/src/nsHTMLUnknownElement.cpp >=================================================================== > nsHTMLUnknownElement::SetAttribute(PRInt32 aNameSpaceID, >@@ -276,24 +255,13 @@ > GetMappedAttributeImpact(aAttribute, nsIDOMMutationEvent::MODIFICATION, impact); > > nsCOMPtr<nsIHTMLStyleSheet> sheet(dont_AddRef(GetAttrStyleSheet(mDocument))); >- if (sheet) { // set attr via style sheet >- result = sheet->SetAttributeFor(aAttribute, aValue, >- (NS_STYLE_HINT_CONTENT < impact), >- this, mAttributes); >- } >- else { // manage this ourselves and re-sync when we connect to doc >- result = EnsureWritableAttributes(this, mAttributes, PR_TRUE); >- >- if (mAttributes) { >- PRInt32 count; >- result = mAttributes->SetAttributeFor(aAttribute, aValue, >- (NS_STYLE_HINT_CONTENT < impact), >- this, nsnull, count); >- if (0 == count) { >- ReleaseAttributes(mAttributes); >- } >- } >- } >+ if (!mAttributes) { >+ result = NS_NewHTMLAttributes(&mAttributes); >+ NS_ENSURE_SUCCESS(result, result); >+ } >+ result = mAttributes->SetAttributeFor(aAttribute, aValue, >+ (NS_STYLE_HINT_CONTENT < impact), >+ this, sheet); Is this safe this way, what if !sheet? I believe I replaced all instances where sheet was needed with ones where sheet was not used, because it seemed to work just fine. There are other instances like this as well elsewhere. Will continue with >Index: content/html/style/src/nsHTMLAttributes.cpp
Comment on attachment 73602 [details] [diff] [review] latest combined patch that builds >Index: content/html/style/src/nsHTMLAttributes.cpp >=================================================================== >@@ -76,16 +76,16 @@ > mNext(nsnull) > { > MOZ_COUNT_CTOR(HTMLAttribute); >- NS_IF_ADDREF(mAttribute); >+ NS_IF_ADDREF(aAttribute); For consistency do this: + mAttribute.Addref(); > PRBool operator==(const HTMLAttribute& aOther) const > { >- return PRBool((mAttribute == aOther.mAttribute) && >+ return PRBool((mAttribute.mBits == aOther.mAttribute.mBits) && > (mValue == aOther.mValue)); Now we are really getting to this union construct that I am not sure is safe to do; I need assurances from C++ knowledgeable folk ;) Assuming the union is safe, for now: The mBits check is there to see if you are comparing same type objects, right? If so, shouldn't you be comparing bit 1 instead of the whole value? Something like: + return PRBool(((mAttribute.mBits & 1) == (aOther.mAttribute.mBits & 1)) && (mValue == aOther.mValue)); >+ mAttribute.mAtomName->ToString(temp); >+ mAttribute.GetNodeInfo()->GetQualifiedName(temp); You assume mAtomName and GetNodeInfo() will always give you a valid pointer. Isn't this a dangerous assumption? I don't see anything preventing someone from assigning nulls to these... Hmm... but were there these kinds of safety checks in the old code either? >- const HTMLAttribute* attr = HTMLAttribute::FindHTMLAttribute(aAttrName, mFirstUnmapped); >- return attr != nsnull; >+ return !!HTMLAttribute::FindHTMLAttribute(aAttrName, aNamespaceID, mFirstUnmapped); What is up with the double bang? Remove both. >+ HTMLAttribute* attr = HTMLAttribute::FindHTMLAttribute(localName, >+ namespaceID, >+ mFirstUnmapped); >+ NS_ASSERTION(attr, "failed to find attribute"); >+ attr->mValue.SetStringValue(aValue); If you didn't find the attribute, we crash here. Wouldn't it be better to return? >Index: content/html/style/src/nsIHTMLAttributes.h >=================================================================== This file should be renamed to nsHTMLAttributes (remove + add) since there is no longer interface here. >+/* >+ * union that represents an atom or a nodeinfo. The union handles NO >+ * refcounting, that is up to the client >+ */ I feel this comment should be more prominent somehow. Explain why we need union. State that it is the client's responsibility to call Addref and Release methods. >+union nsHTMLAttrName { >+ nsHTMLAttrName() >+ { >+ } Should this explicitly initialize member, or does it happen automatically? >+ nsHTMLAttrName(nsIAtom* aAtom) : mAtomName(aAtom) >+ { >+ } >+ >+ nsHTMLAttrName(nsINodeInfo* aNodeInfo) : mNodeInfoName(aNodeInfo) >+ { >+ mBits |= 1; >+ } There is no null checking here. Debug assertions wouldn't hurt, I think. Also is this mBits use really safe? >+ nsINodeInfo* GetNodeInfo() const >+ { >+ NS_ASSERTION(!IsAtom(), "getting NodeInfo-value of atom-name"); >+ return (nsINodeInfo*)(mBits & ~1); >+ } >+ void SetNodeInfo(nsINodeInfo* aNodeInfo) >+ { >+ mNodeInfoName = aNodeInfo; >+ mBits |= 1; >+ } SetNodeInfo() could use an assertion to see if we are trying to set null nodeinfo, right? Again that mBits usage that I feel quesy about... >+ nsIAtom* mAtomName; // Used if mBits & 1 == 0 >+ nsINodeInfo* mNodeInfoName; // Used if mBits & 1 == 1 >+ PRWord mBits; So is this really safe? And if it is, better replace the magical '1' with a define. Also the naming could be clearer in my opinion: mName mNodeInfo (dunno yet what would be better than mBits, mBits is too cryptic though) What kind of testing have you done? Have you tried document.load(), XMLHttpRequest, DOMParser and XMLSerializer?
Attachment #73602 - Flags: needs-work+
nsHTMLAttrName::Addref and Release actually have the NS_IF_ADDREF forms, so those guys play it safe... (null atom or node info).
I am pretty sure you need to change nsXMLContentSerializer.cpp as well. Please see what I did in http://bugzilla.mozilla.org/attachment.cgi?id=54063&action=view I think you deserve the glory of fixing this bug. Reassigning.
Assignee: heikki → sicking
Status: ASSIGNED → NEW
ADT2 per ADT triage.
Whiteboard: [Hixie-P2] → [Hixie-P2] [ADT2]
> >Index: content/base/public/nsINodeInfo.h > >=================================================================== > >+ PRBool ExpandedNameEquals(nsINodeInfo *aNodeInfo) const > > This is a weird function... I would have chosen maybe > NameAndNamespaceEquals(). It worries me a little that this is even > needed, but... Strictly speaking it's not needed, but it does save some refcounting and it makes the code a bit smoother (we could use this function elsewhere too). Changed name to NameAndNamespaceEquals, expanded-names is an XSLT/XPath name for namespace+localName > Truncate aResult here in case we have an error return. Done > Is this safe this way, what if !sheet? I believe I replaced all instances > where sheet was needed with ones where sheet was not used, because it seemed > to work just fine. There are other instances like this as well elsewhere. When the stylesheet existed we used to do (from stylesheet) aAttributes->SetAttributeFor(aAttribute, aValue, aMappedToStyle, aContent, this); and when it didn't exist (from element) mAttributes->SetAttributeFor(aAttribute, aValue, (NS_STYLE_HINT_CONTENT < impact), this, nsnull); so a missing stylesheet simply amounted to a null-value as stylesheet. I guess the stylesheet version used to do a lot of other things, but it doesn't any more. > > PRBool operator==(const HTMLAttribute& aOther) const > > { > >- return PRBool((mAttribute == aOther.mAttribute) && > >+ return PRBool((mAttribute.mBits == aOther.mAttribute.mBits) && > > (mValue == aOther.mValue)); > > The mBits check is there to see if you are comparing same type > objects, right? If so, shouldn't you be comparing bit 1 instead of > the whole value? Something like: It's ment to check that both the name-type and the name-value is the same, i.e. that the two names are exactly alike. > >+ mAttribute.mAtomName->ToString(temp); > >+ mAttribute.GetNodeInfo()->GetQualifiedName(temp); > > You assume mAtomName and GetNodeInfo() will always give you a valid > pointer. Isn't this a dangerous assumption? I don't see anything > preventing someone from assigning nulls to these... Hmm... but were there > these kinds of safety checks in the old code either? The old code sometimes assumed that the atom was != null and sometimes was nullsafe. I've tried to follow the old code and only assume non-null where the old code did (I actually am a bit more nullsafe then the old code). I added an NS_ENSURE_ARG_POINTER(aAttrName) at the top of the nsINodeInfo-version of nsHTMLAttributes::SetAttributeFor so nodeinfos should now never be able to be null. > >+ return !!HTMLAttribute::FindHTMLAttribute(aAttrName, aNamespaceID, > What is up with the double bang? Remove both. I need to convert the pointer to a PRBool, so it's basically a smooth way of doing |return foo() ? PR_TRUE : PR_FALSE;| > If you didn't find the attribute, we crash here. Wouldn't it be better > to return? It *should* never be null, but as we discussed on IRC i added an |if| just-in- case-and-since-we're-close-to-1.0-and-this-code-is-dying-soon-anyway > >Index: content/html/style/src/nsIHTMLAttributes.h > >=================================================================== > This file should be renamed to nsHTMLAttributes (remove + add) since > there is no longer interface here. Ok, will do. I havn't renamed it yet to make the patch readable. Let me know when you want a patch that does the renaming as well. > I feel this comment should be more prominent somehow. Explain why we > need union. State that it is the client's responsibility to call Addref and > Release methods. More detailed comment added. > >+union nsHTMLAttrName { > >+ nsHTMLAttrName() > >+ { > >+ } > Should this explicitly initialize member, or does it happen automatically? That would probably be a perf-hit. There are a lot |attrs = new nsHTMLAttrName[x]; memcpy(attrs, ...| which would do a lot of unneccesary initializing if the empty ctor cleared the members. To keep the union slim I wanted to make it behave like an |nsIAtom*| (which is what the old code used) rather then an |nsCOMPtr<nsIAtom>|. > >+ nsHTMLAttrName(nsIAtom* aAtom) : mAtomName(aAtom) > >+ { > >+ } > >+ > >+ nsHTMLAttrName(nsINodeInfo* aNodeInfo) : mNodeInfoName(aNodeInfo) > >+ { > >+ mBits |= 1; > >+ } > There is no null checking here. Debug assertions wouldn't hurt, I think. Done > Also is this mBits use really safe? It should be. Depending on how clever compilers are it might not get compiled as optimally as it could, but i'd need to look at the resulting assembler to make sure. > SetNodeInfo() could use an assertion to see if we are trying to set > null nodeinfo, right? Done > So is this really safe? And if it is, better replace the magical > '1' with a define. Done > Also the naming could be clearer in my opinion: > mName > mNodeInfo > (dunno yet what would be better than mBits, mBits is too cryptic though) Called it |mAtom| and |mNodeInfo| since they are both really names. I think mBits is what we use elsewhere where we do similar bit-packing. > What kind of testing have you done? Have you tried document.load(), > XMLHttpRequest, DOMParser and XMLSerializer? Tested DOMParser and XMLSerializer and a whole lot of Get/Set/Remove/Has- Attribute. Will do document.load() and XMLHttpRequest but i might need some server-help with that. > I am pretty sure you need to change nsXMLContentSerializer.cpp as well Done, I don't 100% understand what bugs we were fixing using those hacks (but removing them certainly looked like the RightThing to do). Some of them didn't seem to be related to namespaceness of attributes
Attached patch addresses heikkis comments (obsolete) (deleted) — Splinter Review
Attachment #73602 - Attachment is obsolete: true
Comment on attachment 75129 [details] [diff] [review] addresses heikkis comments >Index: content/base/src/nsHTMLContentSerializer.cpp >=================================================================== You shouldn't touch HTML serializer. Your changes would make us output invalid HTML if we had a pure XHTML document with nondefault namespace prefix, and we somehow managed to serialize that using HTML serializer. At the least, your changes would slow us down needlessly. >Index: content/base/src/nsXMLContentSerializer.cpp >=================================================================== I will do some XML serializing tests to see if all this works ok, it seems good, though. >Index: content/html/style/src/nsHTMLAttributes.cpp >=================================================================== > PRBool operator==(const HTMLAttribute& aOther) const > { >- return PRBool((mAttribute == aOther.mAttribute) && >+ return PRBool((mAttribute.mBits == aOther.mAttribute.mBits) && > (mValue == aOther.mValue)); > } I believe this is wrong then, based on your explanation of what it should do. You are not checking that the types are the same, you are testing that the pointers are the same, above. > void Set(nsIAtom* aAttribute, const nsHTMLValue& aValue) > { >- NS_IF_RELEASE(mAttribute); >- mAttribute = aAttribute; >- NS_IF_ADDREF(mAttribute); >+ mAttribute.Release(); >+ mAttribute.mAtom = aAttribute; >+ NS_IF_ADDREF(aAttribute); Replace last line with >+ mAttribute.Addref(); because you should always addref/release the actual pointer you are storing/ releasing, and it would also be more consistent with the Release(). Do it for all other Set() functions as well. >Index: content/html/style/src/nsIHTMLAttributes.h >=================================================================== Rename this for next diff (remember to cvs add and do cvs diff -uN). >+#define ATTRNAME_MASK 0x01 >+#define ATTRNAME_ATOM 0x00 >+#define ATTRNAME_NODEINFO 0x01 Um, this is getting needlessly complicated. What I meant just replace the magical '1' with something like NS_ATTRNAME_NODEINFO_BIT (which would be defined to have value of 1). >+ void SetNodeInfo(nsINodeInfo* aNodeInfo) >+ { >+ NS_ASSERTION(aNodeInfo, "null nodeinfo-name in nsHTMLAttrName"); It is probably a client error if we already have mNodeInfo, so we should probably assert for that, too. Add: + NS_ASSERTION(!GetNodeInfo(), "forgot to call Release()?"); Which kind of gets me to another point: it would probably be good to replace the NS_ASSERIONs with NS_ABORT|WARN_IF_FALSE. Not a strong requirement, though.
Attachment #75129 - Flags: needs-work+
> It is probably a client error if we already have mNodeInfo, so we > should probably assert for that, too. Add: > > + NS_ASSERTION(!GetNodeInfo(), "forgot to call Release()?"); In some cases the previous value can be != null. When allocating a new array with nsHTMLAttrNames and setting the values manually the old value can be anything.
Comment on attachment 75488 [details] [diff] [review] addresses last comments and renames nsIHTMLAttributes.h to nsHTMLAttributes.h Just a couple of nits: >Index: content/html/style/src/nsHTMLAttributes.h >=================================================================== >+ void Release() >+ mBits = 1; Use NS_ATTRNAME_NODEINFO_BIT instead of the magic number here too. >+ nsIAtom* mAtom; // Used if mBits & NS_ATTRNAME_NODEINFO_BIT != 0 >+ nsINodeInfo* mNodeInfo; // Used if mBits & NS_ATTRNAME_NODEINFO_BIT == 0 The comments are incorrect, it is just the opposite. And even in comments I would prefer not to compare things to zero. I would suggest something like: + nsIAtom* mAtom; // Used if: !(mBits & NS_ATTRNAME_NODEINFO_BIT) + nsINodeInfo* mNodeInfo; // Used if: mBits & NS_ATTRNAME_NODEINFO_BIT With those r=heikki. If you can, put up a binary somewhere and URL into this bug so that others can easily load a test build. I will also continue testing. I can also provide a test binary unless you beat me to it.
Attachment #75488 - Flags: review+
Attached patch state-of-the-art patch (obsolete) (deleted) — Splinter Review
Attachment #75488 - Attachment is obsolete: true
the above four testcases are the once i've used for testing (though they've all mutated a bit over time so this is just the last versions of them). I also tried attachment 73128 [details] [diff] [review] from bug 119317 which uses namespaced attributes on html-elements, that patch works with this patch, but asserts and fails without the patch
Status: NEW → ASSIGNED
Comment on attachment 76092 [details] testcase: correct case for tags? yep, it works just fine. (tags *shouldn't* be affected by this patch at all, but it's good to test of course)
first two work just nicly (i.e. the serialized result looks like the parsed string, even when the cases are "wrong"). However since we still do the "treat xhtml-namespace as null namespace since we still have a lot of broken callers" thing the namespace is changed and therefor the prefixes are dropped on the last two testcases
Attached patch merge with tip (obsolete) (deleted) — Splinter Review
Attachment #75723 - Attachment is obsolete: true
The latest patch had at least nsAReadableStrings that prevent compilation, in nsHTMLAttribute*. I fixed those and it built on opt Linux. I run perf tests on that and there was no difference in average, although min & max were much better with your patch. The perf numbers could be noise, but at least it is not slowing us down.
I should mention that bloat goes down too, we save 8 bytes (vtable+refcounter) for each html-element with attributes. I havn't done any measurements how much this adds up to on normal surfing, but at least it's going in the right direction.
..this adds up to in day-to-day surfing, but it is a step in the right direction
- In HashValue(), no need for the NS_PTR_TO_INT32() for mBits, mBits is alrady a PRWord (long of some sort), and not a pointer. - In nsHTMLAttributes.cpp: - list->mNext = new nsClassList(NS_NewAtom(start)); + list->mNext = new nsHTMLClassList(NS_NewAtom(start)); This is either a leak, or a bad ownership model. But then again, you didn't really introduce this, so I guess we can live with this for a while longer... - Also, while the below change does the right thing, it's conceptually wrong: mAttrNames[mAttrCount++] = aAttrName; - NS_ADDREF(aAttrName); + aAttrName.Addref(); you want to addref mAttrNames[...] and not aAttrName, but either way... - In nsHTMLAttrName::Addref(), no big deal, but there's no need to check for if(IsAtom()), you could just cast mBits & ~NS_ATTRNAME_NODEINFO_BIT into an nsISupports (which is shared by nsIAtom and nsINodeInfo) and call AddRef() on that. Your call. Same thing in Release(). - You might want to make nsHTMLAttrName a class and not a union so that you can protect the data in it from users of it. I.e. make nsHTMLAttrName a class, and add a union as a member of the class where you put mAtom, mNodeInfo and mBits. With that, sr=jst
Attachment #76219 - Flags: superreview+
Attached patch address jsts comments (obsolete) (deleted) — Splinter Review
Attachment #76219 - Attachment is obsolete: true
Comment on attachment 76282 [details] [diff] [review] address jsts comments bringing over reviews
Attachment #76282 - Flags: superreview+
Attachment #76282 - Flags: review+
Attached patch with the new nsHTMLAttributes.h file (obsolete) (deleted) — Splinter Review
sorry, i lost the |cvs add/remove| when i recreated my tree :(
Attachment #75674 - Attachment is obsolete: true
Attachment #76282 - Attachment is obsolete: true
Comment on attachment 76293 [details] [diff] [review] with the new nsHTMLAttributes.h file a=roc+moz
Attachment #76293 - Flags: superreview+
Attachment #76293 - Flags: review+
Attachment #76293 - Flags: approval+
attachment 76293 [details] [diff] [review] is checked in. Thanks for everybody that helped with landing this. Special thanks to heikki for doing boring performance testing! bug 134278 and bug 134279 filed for remaining issues
Status: ASSIGNED → RESOLVED
Closed: 23 years ago
Resolution: --- → FIXED
arg! Seems like i left a bogus assertion in there :( patch coming up
Status: RESOLVED → REOPENED
Resolution: FIXED → ---
Attached patch remove assertion (deleted) — Splinter Review
Attachment #76293 - Attachment is obsolete: true
Comment on attachment 77444 [details] [diff] [review] remove assertion r=peterv
Attachment #77444 - Flags: review+
Comment on attachment 77444 [details] [diff] [review] remove assertion sr=jst
Attachment #77444 - Flags: superreview+
Comment on attachment 77444 [details] [diff] [review] remove assertion a=asa (on behalf of drivers) for cleanup checkin to the 1.0 trunk
Attachment #77444 - Flags: approval+
checked in
Status: REOPENED → RESOLVED
Closed: 23 years ago23 years ago
Resolution: --- → FIXED
Marking verified in the April 11th 0S X(2002-04-11-08) and April 10th Windows ME (2002-04-10-03) builds.
Status: RESOLVED → VERIFIED
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Created:
Updated:
Size: