Closed Bug 935453 Opened 11 years ago Closed 11 years ago

Gather telemetry about HZ and other encodings we might try to remove

Categories

(Core :: Internationalization, defect)

defect
Not set
normal

Tracking

()

RESOLVED FIXED
mozilla28

People

(Reporter: hsivonen, Assigned: hsivonen)

References

Details

(Whiteboard: [qa-])

Attachments

(1 file)

Of the encodings that we do support, I think HZ is by far the scariest one, because it's escape delimiters are printable characters in ASCII and because it's fairly easy to construct attack byte sequences that round-trip from HZ to Unicode and back. I think we should investigate if we could get away with removing support for HZ and mapping its labels to the replacement encoding without breaking the Web too much.
Morphing bug scope to save trouble creating a large number of bugs and patches.
Summary: Gather telemetry about HZ usage → Gather telemetry about HZ and other encodings we might try to remove
Attached patch Add telemetry probes (deleted) — Splinter Review
Attachment #829214 - Flags: review?(VYV03354)
Comment on attachment 829214 [details] [diff] [review] Add telemetry probes Requesting policy compliance review from sstamm. This patch adds telemetry for recording if certain character encoding decoders have been instantiated during the session. I believe this falls under measuring "feature usage".
Attachment #829214 - Flags: review?(sstamm)
Comment on attachment 829214 [details] [diff] [review] Add telemetry probes Note that some encodings are instantiated solely for internal usage. For example, Thebes uses Mac encodings and x-johab. https://mxr.mozilla.org/mozilla-central/source/gfx/thebes/gfxFontUtils.cpp?rev=c51e46899a05#1149 https://mxr.mozilla.org/mozilla-central/source/gfx/thebes/gfxFontUtils.cpp?rev=c51e46899a05#1187 x-johab is also used by the OS/2 port. https://mxr.mozilla.org/mozilla-central/source/widget/os2/nsOS2Uni.cpp?rev=fd7a0ace6b0e#24 >+ Telemetry::Accumulate(Telemetry::DECODER_INSTANTIATED_ISO2022JP, true); I regularly encounter Web pages encoded by ISO-2022-JP. Mailnews will encounter it much more frequently. This will only degrade performance. Also, ISO-2022-JP is defined a non-replacement encoding in the Encoding spec. I don't understand why do you try to remove ISO-2022-JP so eagerly and it is totally unacceptable for Japanese users (at least for me).
Attachment #829214 - Flags: review?(VYV03354) → review-
Comment on attachment 829214 [details] [diff] [review] Add telemetry probes (In reply to Masatoshi Kimura [:emk] from comment #4) > Comment on attachment 829214 [details] [diff] [review] > Add telemetry probes > > Note that some encodings are instantiated solely for internal usage. > For example, Thebes uses Mac encodings and x-johab. > https://mxr.mozilla.org/mozilla-central/source/gfx/thebes/gfxFontUtils. > cpp?rev=c51e46899a05#1149 > https://mxr.mozilla.org/mozilla-central/source/gfx/thebes/gfxFontUtils. > cpp?rev=c51e46899a05#1187 I'm trying to learn if that code actually ends up being used. No one replied when I asked in dev-platform. Probably because no one knows withou telemetry. > x-johab is also used by the OS/2 port. > https://mxr.mozilla.org/mozilla-central/source/widget/os2/nsOS2Uni. > cpp?rev=fd7a0ace6b0e#24 OK. Doesn't hurt to see if we can not compile it for other platforms. > >+ Telemetry::Accumulate(Telemetry::DECODER_INSTANTIATED_ISO2022JP, true); > > I regularly encounter Web pages encoded by ISO-2022-JP. Mailnews will > encounter it much more frequently. This will only degrade performance. > Also, ISO-2022-JP is defined a non-replacement encoding in the Encoding > spec. I don't understand why do you try to remove ISO-2022-JP so eagerly and > it is totally unacceptable for Japanese users (at least for me). I'm not trying to remove that one. I'm trying to understand it's level of usage to contrast with HZ. (Marking it as .notForBrowser in the other patch was an accident.)
Attachment #829214 - Flags: review- → review?(VYV03354)
(In reply to Henri Sivonen (:hsivonen) from comment #5) > I'm not trying to remove that one. I'm trying to understand it's level of > usage to contrast with HZ. (Marking it as .notForBrowser in the other patch > was an accident.) I thought because the subject says "other encodings we might try to remove" :) I'm not yet convinced of the usefulness of counting ISO-2022-JP usage here. ISO-2022-JP is the default encoding of Japanese Thunderbird. I'm fine with other probably-rarely-used-encodings.
Comment on attachment 829214 [details] [diff] [review] Add telemetry probes Review of attachment 829214 [details] [diff] [review]: ----------------------------------------------------------------- (In reply to Henri Sivonen (:hsivonen) from comment #3) > Requesting policy compliance review from sstamm. This patch adds telemetry > for recording if certain character encoding decoders have been instantiated > during the session. I believe this falls under measuring "feature usage". I don't so much do policy compliance review anymore, but I agree this is more or less "feature usage." If I understand correctly, it's feature usage triggered by the sites a user visits, so it's different than changing the default font or something. This means we learn a little bit about what sites a user browses, though it doesn't seem to be a whole lot. If the aim is to identify rarely used decoders and remove them, I think it's reasonable to collect this data.
Attachment #829214 - Flags: review?(sstamm) → review+
CC'ing Curtis and Alina as an FYI.
(In reply to Masatoshi Kimura [:emk] from comment #6) > I'm not yet convinced of the usefulness of counting ISO-2022-JP usage here. It's one of the XSS-dangerous encodings. It seems weird to avoid having data about its usage. Are you sure that its usage on the Web (not email) is so high that it's useless to compare the relative usage of ISO-2022-JP in the Japanese Firefox localization with the relative usage of HZ in the Simplified Chinese localization as a way of understanding if HZ is used too much for HZ to be suitable for removal? Anyway, I want to land. Do I get r=emk if I remove ISO-2022-JP counting?
I think we should measure it. It seems good to have statistics on it and not rely on assumptions.
Comment on attachment 829214 [details] [diff] [review] Add telemetry probes OK, go for it.
Attachment #829214 - Flags: review?(VYV03354) → review+
Status: ASSIGNED → RESOLVED
Closed: 11 years ago
Resolution: --- → FIXED
Target Milestone: --- → mozilla28
Whiteboard: [qa-]
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Created:
Updated:
Size: