Closed
Bug 801402
Opened 12 years ago
Closed 12 years ago
Use EncodingUtils::FindEncodingForLabel instead of nsCharsetAlias::GetPreferred from HTML5 parser and DOM APIs
Categories
(Core :: DOM: Core & HTML, defect)
Core
DOM: Core & HTML
Tracking
()
RESOLVED
FIXED
mozilla19
People
(Reporter: emk, Assigned: emk)
References
(Blocks 2 open bugs)
Details
(Keywords: dev-doc-needed)
Attachments
(8 files, 11 obsolete files)
(deleted),
patch
|
sicking
:
review+
RyanVM
:
checkin+
|
Details | Diff | Splinter Review |
(deleted),
patch
|
sicking
:
review+
RyanVM
:
checkin+
|
Details | Diff | Splinter Review |
(deleted),
patch
|
hsivonen
:
review+
RyanVM
:
checkin+
|
Details | Diff | Splinter Review |
(deleted),
patch
|
khuey
:
review+
hsivonen
:
review+
RyanVM
:
checkin+
|
Details | Diff | Splinter Review |
(deleted),
patch
|
hsivonen
:
review+
RyanVM
:
checkin+
|
Details | Diff | Splinter Review |
(deleted),
patch
|
emk
:
review+
RyanVM
:
checkin+
|
Details | Diff | Splinter Review |
(deleted),
patch
|
hsivonen
:
review+
RyanVM
:
checkin+
|
Details | Diff | Splinter Review |
(deleted),
patch
|
hsivonen
:
review+
RyanVM
:
checkin+
|
Details | Diff | Splinter Review |
It will take a long time to align intl/uconv to Encoding Standard because of the backward compatibility.
Assignee | ||
Comment 1•12 years ago
|
||
Assignee | ||
Comment 2•12 years ago
|
||
Attachment #672554 -
Flags: review?(jonas)
Assignee | ||
Comment 3•12 years ago
|
||
Attachment #672555 -
Flags: review?(bugs)
Assignee | ||
Comment 4•12 years ago
|
||
Attachment #672557 -
Flags: review?(bent.mozilla)
Assignee | ||
Comment 5•12 years ago
|
||
Attachment #672558 -
Flags: review?(bugs)
Assignee | ||
Comment 6•12 years ago
|
||
Attachment #672559 -
Flags: review?(hsivonen)
Assignee | ||
Comment 7•12 years ago
|
||
Attachment #672560 -
Flags: review?(bugs)
Comment 8•12 years ago
|
||
Comment on attachment 672555 [details] [diff] [review]
Use FindEncodingForLabel from XSLT
Jonas or peterv should review changes to XSLT.
Attachment #672555 -
Flags: review?(bugs) → review?(jonas)
Updated•12 years ago
|
Attachment #672560 -
Flags: review?(bugs) → review?(hsivonen)
Assignee | ||
Updated•12 years ago
|
Attachment #672555 -
Attachment is patch: true
Comment 9•12 years ago
|
||
Comment on attachment 672558 [details] [diff] [review]
Use FindEncodingForLabel from OS.File
Sorry Jonas, I think you should review this too.
Attachment #672558 -
Flags: review?(bugs) → review?(jonas)
Assignee | ||
Comment 10•12 years ago
|
||
Comment on attachment 672553 [details] [diff] [review]
Make EncodingUtil methods case sensitive
This patch caused many test failures.
Attachment #672553 -
Attachment is obsolete: true
Attachment #672553 -
Flags: review?(jonas)
Comment on attachment 672560 [details] [diff] [review]
Use FindEncodingForLabel from XML parser
The new method returns a label in the lower case and the returned label is used in a context that expect a Gecko-canonical label. Unfortunately, those are not all lower-case, yet.
Attachment #672560 -
Flags: review?(hsivonen) → review-
Comment on attachment 672559 [details] [diff] [review]
Use FindEncodingForLabel from HTML parser
Same problem as with the SAX patch.
Attachment #672559 -
Flags: review?(hsivonen) → review-
Oh. I didn’t look at the patch that changed what EncodingUtils return. But looks like that’s obsolete now.
Assignee | ||
Comment 14•12 years ago
|
||
(In reply to Henri Sivonen (:hsivonen) from comment #13)
> Oh. I didn’t look at the patch that changed what EncodingUtils return. But
> looks like that’s obsolete now.
Yup, I'll attach an updated patch once test failures are resolved.
Assignee | ||
Comment 15•12 years ago
|
||
- Fixed some bonehead mistakes.
- Made EncodingUtils thread-safe.
Attachment #673022 -
Flags: review?(jonas)
Assignee | ||
Comment 16•12 years ago
|
||
Ah, one hank lacked.
Attachment #673022 -
Attachment is obsolete: true
Attachment #673022 -
Flags: review?(jonas)
Attachment #673025 -
Flags: review?(jonas)
Assignee | ||
Comment 17•12 years ago
|
||
Fixed some tests.
Attachment #672554 -
Attachment is obsolete: true
Attachment #672554 -
Flags: review?(jonas)
Attachment #673027 -
Flags: review?(jonas)
Assignee | ||
Comment 18•12 years ago
|
||
Fixed a test.
Attachment #672557 -
Attachment is obsolete: true
Attachment #672557 -
Flags: review?(bent.mozilla)
Attachment #673028 -
Flags: review?(bent.mozilla)
Assignee | ||
Comment 19•12 years ago
|
||
Henri, could you rethink on the premise of the change of EncodingUtils?
Attachment #672559 -
Attachment is obsolete: true
Attachment #673032 -
Flags: review?(hsivonen)
Assignee | ||
Updated•12 years ago
|
Attachment #672560 -
Flags: review- → review?(hsivonen)
Comment on attachment 673025 [details] [diff] [review]
Make EncodingUtil methods case sensitive
Drive-by comments:
Adding a mutex seems unfortunate. I’d prefer a design that only reads the shared data and, therefore, doesn’t need a mutex. The old alias code does not need a mutex, because it works with plain old C data in a read-only fashion: http://mxr.mozilla.org/mozilla-central/source/intl/locale/src/nsUConvPropertySearch.cpp#10
> CopyASCIItoUTF16(mEncoding, aEncoding);
> nsContentUtils::ASCIIToLower(aEncoding);
Why ASCIIToUTF16 instead of UTF8ToUTF16? What does the ASCIIToUTF16 version do to non-ASCII bytes?
Attachment #672560 -
Flags: review?(hsivonen) → review+
Comment on attachment 673032 [details] [diff] [review]
Use FindEncodingForLabel from HTML parser
>@@ -1186,21 +1188,18 @@ nsHtml5StreamParser::PreferredForInterna
>- rv = nsCharsetAlias::GetPreferred(newEncoding, preferred);
>- if (NS_FAILED(rv)) {
>+ if (!EncodingUtils::FindEncodingForLabel(newEncoding, preferred)) {
> // This charset has been blacklisted for permitting XSS smuggling.
> // EncMetaNonRoughSuperset is a reasonable approximation to the
> // right error message.
For this code to continue to make sense, you need to change the earlier nsCharsetAlias::Equals call to use the same data as EncodingUtils::FindEncodingForLabel. Once you change that, if your new Equals method has the same rv semantics as the old one (the old one returns a failure rv for bogus labels rather than returning success rv and setting the boolean to false), this call should never fail, because unsupported encodings would have been caught by the Equals check returning a failure rv for a bogus label. If you introduce an equals check with more intuitive semantics (merely returns false for bogus labels), then you need to make sure the error messages still work the right way with bogus labels but catching the bogus labels by other means (or with FindEncodingForLabel here and rearranging the error messages).
Attachment #673032 -
Flags: review?(hsivonen) → review-
In general, nsCharsetAlias::Equals calls need to be replaced with something that uses the same data EncodingUtils::FindEncodingForLabel.
Assignee | ||
Comment 23•12 years ago
|
||
(In reply to Henri Sivonen (:hsivonen) from comment #20)
> Adding a mutex seems unfortunate. I’d prefer a design that only reads the
> shared data and, therefore, doesn’t need a mutex. The old alias code does
> not need a mutex, because it works with plain old C data in a read-only
> fashion:
> http://mxr.mozilla.org/mozilla-central/source/intl/locale/src/
> nsUConvPropertySearch.cpp#10
I borrowed nsUConvPropertySearch. Mutex is not longer present. Now EncodingUtils is completely static class.
> > CopyASCIItoUTF16(mEncoding, aEncoding);
> > nsContentUtils::ASCIIToLower(aEncoding);
>
> Why ASCIIToUTF16 instead of UTF8ToUTF16? What does the ASCIIToUTF16 version
> do to non-ASCII bytes?
mEncoding is alwas ASCII.
Attachment #673025 -
Attachment is obsolete: true
Attachment #673025 -
Flags: review?(jonas)
Attachment #673583 -
Flags: review?(jonas)
Assignee | ||
Comment 24•12 years ago
|
||
Fixed one more test.
Attachment #673027 -
Attachment is obsolete: true
Attachment #673027 -
Flags: review?(jonas)
Attachment #673584 -
Flags: review?(jonas)
Assignee | ||
Comment 25•12 years ago
|
||
(In reply to Henri Sivonen (:hsivonen) from comment #21)
> >@@ -1186,21 +1188,18 @@ nsHtml5StreamParser::PreferredForInterna
>
> >- rv = nsCharsetAlias::GetPreferred(newEncoding, preferred);
> >- if (NS_FAILED(rv)) {
> >+ if (!EncodingUtils::FindEncodingForLabel(newEncoding, preferred)) {
> > // This charset has been blacklisted for permitting XSS smuggling.
> > // EncMetaNonRoughSuperset is a reasonable approximation to the
> > // right error message.
>
> For this code to continue to make sense, you need to change the earlier
> nsCharsetAlias::Equals call to use the same data as
> EncodingUtils::FindEncodingForLabel. Once you change that, if your new
> Equals method has the same rv semantics as the old one (the old one returns
> a failure rv for bogus labels rather than returning success rv and setting
> the boolean to false), this call should never fail, because unsupported
> encodings would have been caught by the Equals check returning a failure rv
> for a bogus label. If you introduce an equals check with more intuitive
> semantics (merely returns false for bogus labels), then you need to make
> sure the error messages still work the right way with bogus labels but
> catching the bogus labels by other means (or with FindEncodingForLabel here
> and rearranging the error messages).
I've rewritten PreferredForInternalEncodingDecl so that nsCharsetAlias::Equals is no longer needed.
(In reply to Henri Sivonen (:hsivonen) from comment #22)
> In general, nsCharsetAlias::Equals calls need to be replaced with something
> that uses the same data EncodingUtils::FindEncodingForLabel.
HTML5 parser and old HTML parser was the only callers of nsCharsetAlias::Equals.
Attachment #673032 -
Attachment is obsolete: true
Attachment #673586 -
Flags: review?(hsivonen)
Assignee | ||
Comment 26•12 years ago
|
||
Green on try.
https://tbpl.mozilla.org/?tree=Try&rev=78f3ec0385c6
Attachment #673586 -
Flags: review?(hsivonen) → review+
Comment on attachment 673583 [details] [diff] [review]
Make EncodingUtil methods case sensitive
Did this patch bitrot or does it need another patch to be applied first?
Blocks: 796882
Assignee | ||
Comment 28•12 years ago
|
||
(In reply to Henri Sivonen (:hsivonen) from comment #27)
> Comment on attachment 673583 [details] [diff] [review]
> Make EncodingUtil methods case sensitive
>
> Did this patch bitrot or does it need another patch to be applied first?
It needs attachment 671648 [details] [diff] [review] in bug 801487 patch first.
Assignee | ||
Comment 29•12 years ago
|
||
The attachment 671648 [details] [diff] [review] has been landed.
No longer depends on: 801487
Attachment #672555 -
Flags: review?(jonas) → review+
Attachment #672558 -
Flags: review?(jonas) → review+
Comment on attachment 673583 [details] [diff] [review]
Make EncodingUtil methods case sensitive
Review of attachment 673583 [details] [diff] [review]:
-----------------------------------------------------------------
I think Henri would be a better reviewer.
Attachment #673583 -
Flags: review?(jonas) → review?(hsivonen)
Comment on attachment 673584 [details] [diff] [review]
Use EncodingUtils::FindEncodingForLabel from DOM
Review of attachment 673584 [details] [diff] [review]:
-----------------------------------------------------------------
::: content/base/src/nsXMLHttpRequest.cpp
@@ +755,5 @@
> nsAutoCString charsetVal;
> + bool ok = channel ? NS_SUCCEEDED(channel->GetContentCharset(charsetVal)) :
> + false;
> + if (ok) {
> + ok = EncodingUtils::FindEncodingForLabel(charsetVal, mResponseCharset);
bool ok = channel &&
NS_SUCCEEDED(...) &&
EncodingUtils::Find...;
Attachment #673584 -
Flags: review?(jonas) → review+
Assignee | ||
Comment 33•12 years ago
|
||
Fixed a bitrot by bug 801487 update & bug 799917 backout.
Attachment #673583 -
Attachment is obsolete: true
Attachment #673583 -
Flags: review?(hsivonen)
Attachment #678708 -
Flags: review?(hsivonen)
Assignee | ||
Comment 34•12 years ago
|
||
Unbitrotted & resolved a review comment.
Attachment #673584 -
Attachment is obsolete: true
Attachment #678713 -
Flags: review+
Comment on attachment 678708 [details] [diff] [review]
Make EncodingUtil methods case sensitive
Looks good to me.
Attachment #678708 -
Flags: review?(hsivonen) → review+
Comment on attachment 673586 [details] [diff] [review]
Use FindEncodingForLabel from HTML parser
Bug 716579 bitrotted the nsParser and nsScanner parts of this patch. :-(
Assignee | ||
Updated•12 years ago
|
Attachment #673586 -
Attachment is obsolete: true
Assignee | ||
Comment 38•12 years ago
|
||
Comment on attachment 673028 [details] [diff] [review]
Use FindEncodingForLabel from FileReaderSync
Bent seems to be very busy.
Attachment #673028 -
Flags: review?(bent.mozilla) → review?(khuey)
Assignee | ||
Comment 39•12 years ago
|
||
Comment on attachment 678878 [details] [diff] [review]
Use FindEncodingForLabel from HTML parser
These changes themselves are OK, but I think the code in nsScanner after this patch can get confused and assert fatally if you don’t also change the alias resolution in nsDocument::TryChannelCharset to use the new alias code.
http://mxr.mozilla.org/mozilla-central/source/content/base/src/nsDocument.cpp#3047
Attachment #678878 -
Flags: review?(hsivonen) → review+
Assignee | ||
Comment 41•12 years ago
|
||
(In reply to Henri Sivonen (:hsivonen) from comment #40)
> Comment on attachment 678878 [details] [diff] [review]
> Use FindEncodingForLabel from HTML parser
>
> These changes themselves are OK, but I think the code in nsScanner after
> this patch can get confused and assert fatally if you don’t also change the
> alias resolution in nsDocument::TryChannelCharset to use the new alias code.
> http://mxr.mozilla.org/mozilla-central/source/content/base/src/nsDocument.
> cpp#3047
It's changed in attachment #678713 [details] [diff] [review].
Assignee | ||
Comment 42•12 years ago
|
||
The worker part can be landed separately.
Keywords: checkin-needed
Whiteboard: [leave open][land except 673028]
Comment 43•12 years ago
|
||
https://hg.mozilla.org/integration/mozilla-inbound/rev/931d42b01133
https://hg.mozilla.org/integration/mozilla-inbound/rev/e85222a1e777
https://hg.mozilla.org/integration/mozilla-inbound/rev/b4b9eb815efa
https://hg.mozilla.org/integration/mozilla-inbound/rev/b2a8c4a79453
https://hg.mozilla.org/integration/mozilla-inbound/rev/c0af6d983c0e
https://hg.mozilla.org/integration/mozilla-inbound/rev/eb7d1fd8a868
Flags: in-testsuite+
Keywords: checkin-needed
Whiteboard: [leave open][land except 673028] → [leave open]
Updated•12 years ago
|
Attachment #672555 -
Flags: checkin+
Updated•12 years ago
|
Attachment #672558 -
Flags: checkin+
Updated•12 years ago
|
Attachment #672560 -
Flags: checkin+
Updated•12 years ago
|
Attachment #678708 -
Flags: checkin+
Updated•12 years ago
|
Attachment #678713 -
Flags: checkin+
Updated•12 years ago
|
Attachment #678878 -
Flags: checkin+
Comment 44•12 years ago
|
||
https://hg.mozilla.org/mozilla-central/rev/931d42b01133
https://hg.mozilla.org/mozilla-central/rev/e85222a1e777
https://hg.mozilla.org/mozilla-central/rev/b4b9eb815efa
https://hg.mozilla.org/mozilla-central/rev/b2a8c4a79453
https://hg.mozilla.org/mozilla-central/rev/c0af6d983c0e
https://hg.mozilla.org/mozilla-central/rev/eb7d1fd8a868
Assignee | ||
Comment 45•12 years ago
|
||
Now form submission uses encoders so we need this workaround again until bug 562091 is fixed.
Attachment #679731 -
Flags: review?(hsivonen)
Assignee | ||
Comment 46•12 years ago
|
||
Comment on attachment 673028 [details] [diff] [review]
Use FindEncodingForLabel from FileReaderSync
Henri, could you review this? This patch shouldn't require any knowledge specific to workers. I really want to land this part before aurora uplifting.
Attachment #673028 -
Flags: review?(hsivonen)
Assignee | ||
Comment 47•12 years ago
|
||
Comment on attachment 679731 [details] [diff] [review]
Reintroduce x-windows-949 hack
r+, but this is so sad. We really need to align our internal encoding names with the Encoding Standard.
Attachment #679731 -
Flags: review?(hsivonen) → review+
Comment on attachment 673028 [details] [diff] [review]
Use FindEncodingForLabel from FileReaderSync
I’m rather surprised that our old code assumed big endian with the "UTF-16" label and no BOM.
Attachment #673028 -
Flags: review?(hsivonen) → review+
Assignee | ||
Updated•12 years ago
|
Keywords: checkin-needed
Whiteboard: [leave open]
Attachment #673028 -
Flags: review?(khuey) → review+
Comment 50•12 years ago
|
||
https://hg.mozilla.org/integration/mozilla-inbound/rev/1889ff1d55da
https://hg.mozilla.org/integration/mozilla-inbound/rev/3e88fade2d0c
Keywords: checkin-needed
Updated•12 years ago
|
Attachment #679731 -
Flags: checkin+
Updated•12 years ago
|
Attachment #673028 -
Flags: checkin+
Comment 51•12 years ago
|
||
https://hg.mozilla.org/mozilla-central/rev/1889ff1d55da
https://hg.mozilla.org/mozilla-central/rev/3e88fade2d0c
Status: ASSIGNED → RESOLVED
Closed: 12 years ago
Resolution: --- → FIXED
Target Milestone: --- → mozilla19
Depends on: 814900
Assignee | ||
Comment 52•12 years ago
|
||
Since Firefox 19, Web pages can use only encodings defined in <http://encoding.spec.whatwg.org/>.
Keywords: dev-doc-needed
Comment 53•12 years ago
|
||
This needs to be backed out of FF19 due to bug 782412, which caused "users will not be able to open Web pages encoded in UTF-16 big endian".
tracking-firefox19:
--- → +
Comment 54•12 years ago
|
||
(In reply to Alex Keybl [:akeybl] from comment #53)
> This needs to be backed out of FF19 due to bug 782412, which caused "users
> will not be able to open Web pages encoded in UTF-16 big endian".
Nevermind we'll take bug 814900 instead.
tracking-firefox19:
+ → ---
Depends on: 844461
Depends on: 845743
Depends on: 847880
Depends on: 847886
Updated•6 years ago
|
Component: DOM → DOM: Core & HTML
You need to log in
before you can comment on or make changes to this bug.
Description
•