Closed
Bug 935453
Opened 11 years ago
Closed 11 years ago
Gather telemetry about HZ and other encodings we might try to remove
Categories
(Core :: Internationalization, defect)
Core
Internationalization
Tracking
()
RESOLVED
FIXED
mozilla28
People
(Reporter: hsivonen, Assigned: hsivonen)
References
Details
(Whiteboard: [qa-])
Attachments
(1 file)
(deleted),
patch
|
emk
:
review+
geekboy
:
review+
|
Details | Diff | Splinter Review |
Of the encodings that we do support, I think HZ is by far the scariest
one, because it's escape delimiters are printable characters in ASCII
and because it's fairly easy to construct attack byte sequences that
round-trip from HZ to Unicode and back.
I think we should investigate if we could get away with removing
support for HZ and mapping its labels to the replacement encoding
without breaking the Web too much.
Assignee | ||
Comment 1•11 years ago
|
||
Morphing bug scope to save trouble creating a large number of bugs and patches.
Summary: Gather telemetry about HZ usage → Gather telemetry about HZ and other encodings we might try to remove
Assignee | ||
Comment 2•11 years ago
|
||
Attachment #829214 -
Flags: review?(VYV03354)
Assignee | ||
Comment 3•11 years ago
|
||
Comment on attachment 829214 [details] [diff] [review]
Add telemetry probes
Requesting policy compliance review from sstamm. This patch adds telemetry for recording if certain character encoding decoders have been instantiated during the session. I believe this falls under measuring "feature usage".
Attachment #829214 -
Flags: review?(sstamm)
Comment 4•11 years ago
|
||
Comment on attachment 829214 [details] [diff] [review]
Add telemetry probes
Note that some encodings are instantiated solely for internal usage.
For example, Thebes uses Mac encodings and x-johab.
https://mxr.mozilla.org/mozilla-central/source/gfx/thebes/gfxFontUtils.cpp?rev=c51e46899a05#1149
https://mxr.mozilla.org/mozilla-central/source/gfx/thebes/gfxFontUtils.cpp?rev=c51e46899a05#1187
x-johab is also used by the OS/2 port.
https://mxr.mozilla.org/mozilla-central/source/widget/os2/nsOS2Uni.cpp?rev=fd7a0ace6b0e#24
>+ Telemetry::Accumulate(Telemetry::DECODER_INSTANTIATED_ISO2022JP, true);
I regularly encounter Web pages encoded by ISO-2022-JP. Mailnews will encounter it much more frequently. This will only degrade performance.
Also, ISO-2022-JP is defined a non-replacement encoding in the Encoding spec. I don't understand why do you try to remove ISO-2022-JP so eagerly and it is totally unacceptable for Japanese users (at least for me).
Attachment #829214 -
Flags: review?(VYV03354) → review-
Assignee | ||
Comment 5•11 years ago
|
||
Comment on attachment 829214 [details] [diff] [review]
Add telemetry probes
(In reply to Masatoshi Kimura [:emk] from comment #4)
> Comment on attachment 829214 [details] [diff] [review]
> Add telemetry probes
>
> Note that some encodings are instantiated solely for internal usage.
> For example, Thebes uses Mac encodings and x-johab.
> https://mxr.mozilla.org/mozilla-central/source/gfx/thebes/gfxFontUtils.
> cpp?rev=c51e46899a05#1149
> https://mxr.mozilla.org/mozilla-central/source/gfx/thebes/gfxFontUtils.
> cpp?rev=c51e46899a05#1187
I'm trying to learn if that code actually ends up being used. No one replied when I asked in dev-platform. Probably because no one knows withou telemetry.
> x-johab is also used by the OS/2 port.
> https://mxr.mozilla.org/mozilla-central/source/widget/os2/nsOS2Uni.
> cpp?rev=fd7a0ace6b0e#24
OK. Doesn't hurt to see if we can not compile it for other platforms.
> >+ Telemetry::Accumulate(Telemetry::DECODER_INSTANTIATED_ISO2022JP, true);
>
> I regularly encounter Web pages encoded by ISO-2022-JP. Mailnews will
> encounter it much more frequently. This will only degrade performance.
> Also, ISO-2022-JP is defined a non-replacement encoding in the Encoding
> spec. I don't understand why do you try to remove ISO-2022-JP so eagerly and
> it is totally unacceptable for Japanese users (at least for me).
I'm not trying to remove that one. I'm trying to understand it's level of usage to contrast with HZ. (Marking it as .notForBrowser in the other patch was an accident.)
Attachment #829214 -
Flags: review- → review?(VYV03354)
Comment 6•11 years ago
|
||
(In reply to Henri Sivonen (:hsivonen) from comment #5)
> I'm not trying to remove that one. I'm trying to understand it's level of
> usage to contrast with HZ. (Marking it as .notForBrowser in the other patch
> was an accident.)
I thought because the subject says "other encodings we might try to remove" :)
I'm not yet convinced of the usefulness of counting ISO-2022-JP usage here. ISO-2022-JP is the default encoding of Japanese Thunderbird. I'm fine with other probably-rarely-used-encodings.
Comment 7•11 years ago
|
||
Comment on attachment 829214 [details] [diff] [review]
Add telemetry probes
Review of attachment 829214 [details] [diff] [review]:
-----------------------------------------------------------------
(In reply to Henri Sivonen (:hsivonen) from comment #3)
> Requesting policy compliance review from sstamm. This patch adds telemetry
> for recording if certain character encoding decoders have been instantiated
> during the session. I believe this falls under measuring "feature usage".
I don't so much do policy compliance review anymore, but I agree this is more or less "feature usage." If I understand correctly, it's feature usage triggered by the sites a user visits, so it's different than changing the default font or something. This means we learn a little bit about what sites a user browses, though it doesn't seem to be a whole lot. If the aim is to identify rarely used decoders and remove them, I think it's reasonable to collect this data.
Attachment #829214 -
Flags: review?(sstamm) → review+
Comment 8•11 years ago
|
||
CC'ing Curtis and Alina as an FYI.
Assignee | ||
Comment 9•11 years ago
|
||
(In reply to Masatoshi Kimura [:emk] from comment #6)
> I'm not yet convinced of the usefulness of counting ISO-2022-JP usage here.
It's one of the XSS-dangerous encodings. It seems weird to avoid having data about its usage. Are you sure that its usage on the Web (not email) is so high that it's useless to compare the relative usage of ISO-2022-JP in the Japanese Firefox localization with the relative usage of HZ in the Simplified Chinese localization as a way of understanding if HZ is used too much for HZ to be suitable for removal?
Anyway, I want to land. Do I get r=emk if I remove ISO-2022-JP counting?
Comment 10•11 years ago
|
||
I think we should measure it. It seems good to have statistics on it and not rely on assumptions.
Comment 11•11 years ago
|
||
Comment on attachment 829214 [details] [diff] [review]
Add telemetry probes
OK, go for it.
Attachment #829214 -
Flags: review?(VYV03354) → review+
Assignee | ||
Comment 12•11 years ago
|
||
Comment 13•11 years ago
|
||
Status: ASSIGNED → RESOLVED
Closed: 11 years ago
Resolution: --- → FIXED
Target Milestone: --- → mozilla28
Updated•11 years ago
|
Whiteboard: [qa-]
You need to log in
before you can comment on or make changes to this bug.
Description
•