Closed Bug 945213 Opened 11 years ago Closed 4 years ago

Align our heuristic detection of Japanese encodings with Trident or WebKit

Categories

(Core :: Internationalization, enhancement)

enhancement
Not set
normal

RESOLVED WONTFIX

People

(Reporter: hsivonen, Unassigned)

Details

(Keywords: perf)

The way we do heuristic detection has the drawback that it may cause a reload mid-parse. Superficially, other browsers don't seem to suffer from this problem. We should investigate what Trident and WebKit do about Japanese encoding detection and align our detection method with whichever of the two makes more sense.
The WebKit code is at http://trac.webkit.org/browser/trunk/Source/WebCore/loader/TextResourceDecoder.cpp#L157
We should also find out why Blink devs threw away the WebKit code and started using ICU's detector instead: https://mxr.mozilla.org/chromium/source/src/third_party/WebKit/Source/platform/text/TextEncodingDetector.cpp#40
The WebKit method would probably mean undoing bug 620106 and returning to a state similar to the one the HTML5 parser was in before bug 620106 was fixed. As is usual with bugs related to encoding detection, http://planet.debian.or.jp/ has fixed itself since.
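For readers unfamiliar with the problem space, here is a minimal sketch of what deciding on a Japanese legacy encoding from a buffered prefix, before any parsing starts, could look like. It is not the Gecko, WebKit, Trident, or ICU algorithm; the function names, the candidate order, and the trial-decoding approach are all assumptions made for illustration. Real detectors score byte patterns, because the same bytes can be valid in both EUC-JP and Shift_JIS.

```python
import codecs

# Hypothetical candidate order (an assumption): ISO-2022-JP is 7-bit and
# escape-driven, so it is tried first; EUC-JP is tried before Shift_JIS.
CANDIDATES = ("iso2022_jp", "euc_jp", "shift_jis")


def decodes_cleanly(data: bytes, encoding: str) -> bool:
    """Return True if `data` is a valid prefix of a stream in `encoding`."""
    decoder = codecs.getincrementaldecoder(encoding)(errors="strict")
    try:
        # final=False tolerates a multi-byte character cut off at the end
        # of the buffered prefix.
        decoder.decode(data, final=False)
    except UnicodeDecodeError:
        return False
    return True


def guess_japanese_encoding(prefix: bytes):
    """Guess an encoding from a buffered prefix, or None if undecided."""
    # Pure ASCII without escape sequences carries no evidence either way.
    if prefix.isascii() and b"\x1b" not in prefix:
        return None
    for encoding in CANDIDATES:
        if decodes_cleanly(prefix, encoding):
            return encoding
    return None
```

Because the guess is made on a prefix before the parser sees any bytes, a sketch like this never needs a reload, but it can guess wrong when the distinguishing bytes only appear later in the stream.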
Severity: normal → enhancement
Keywords: perf
FWIW, the site reported at https://webcompat.com/issues/15193 should render legibly when using the Japanese localization of Firefox. (But, yeah, having it be localization-dependent isn't ideal.)

See https://hsivonen.fi/utf-8-detection/ for an explanation of the issues in this space.

Status: NEW → RESOLVED
Closed: 4 years ago
Resolution: --- → WONTFIX