Closed Bug 1631983 Opened 5 years ago Closed 4 years ago

windows-1252 English apostrophe contraction I’v detected as windows-1254

Categories

(Core :: Internationalization, defect, P2)

73 Branch
Desktop
All
defect

Tracking

()

RESOLVED FIXED
mozilla78
Tracking Status
firefox-esr68 --- unaffected
firefox75 --- wontfix
firefox76 --- wontfix
firefox77 --- wontfix
firefox78 --- fixed

People

(Reporter: valera, Assigned: hsivonen)

References

(Regression)

Details

(Keywords: regression)

Attachments

(1 file)

User Agent: Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_4) AppleWebKit/605.1.15 (KHTML, like Gecko) Version/13.1 Safari/605.1.15

Steps to reproduce:

Go to:
http://heroescommunity.com/viewthread.php3?TID=22792&pagenumber=15
Or any other view thread page on heroescommunity.com
This forum worked perfectly with firefox for many years until a recent firefox release.

Actual results:

Browser reloads the page the moment it finishes loading, triggering the site’s DDOS defences. This only seems to happen on thread view pages. They worked fine for the last several years until the recent firefox update that added this auto-reload behavior.

Expected results:

Page should not reload itself when it finishes loading.

Status: UNCONFIRMED → NEW
Has Regression Range: --- → yes
Component: Untriaged → Internationalization
Ever confirmed: true
Keywords: regression
OS: Unspecified → All
Product: Firefox → Core
Regressed by: 1551276
Hardware: Unspecified → Desktop
Version: 75 Branch → 73 Branch

So the site has two relevant bugs, and the Firefox mechanism for dealing with those bugs also has a bug and arguably a second one.

The two site bugs are:

  1. The site doesn't declare its encoding. To solve this problem, please put <meta charset=windows-1252> in the site template between the <head> tag and the <title> tag.
  2. In the page footer, the tags <a href="post.php3?action=newpost&FID=8" rel="nofollow"> and <a href="post.php3?action=reply&TID=22792&pn=15" rel="nofollow"> are not indented with ASCII spaces only: there are no-break space characters in what is supposed to be the indent.
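For site maintainers who want to check their own templates for the second problem, here is a minimal sketch that lists lines whose indent contains the byte 0xA0. The function name and the sample markup are made up for illustration, and the sketch assumes the template is a single-byte encoding such as windows-1252, where 0xA0 is the no-break space.

```python
def lines_with_nbsp_indent(raw: bytes) -> list:
    """Return 1-based numbers of lines whose leading whitespace contains 0xA0."""
    hits = []
    for number, line in enumerate(raw.split(b"\n"), start=1):
        # Length of the leading run of space, tab, or 0xA0 bytes.
        indent_len = len(line) - len(line.lstrip(b" \t\xa0"))
        if b"\xa0" in line[:indent_len]:
            hits.append(number)
    return hits


# Hypothetical footer fragment: line 2 is indented with no-break spaces.
sample = (
    b"<td>\n"
    b" \xa0\xa0<a href=\"post.php3?action=newpost&FID=8\" rel=\"nofollow\">\n"
    b"  <a href=\"x\">\n"
)
print(lines_with_nbsp_indent(sample))
```

Replacing the reported bytes with ASCII spaces removes the non-ASCII input that the detector would otherwise have to interpret.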

The Firefox bug is that the string

I’v

encoded as windows-1252 is detected as windows-1254 rather than windows-1252. I will need to investigate more to understand why that happens. It doesn't happen when I run the detector in isolation of the rest of Firefox.
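To illustrate why this case is ambiguous at the byte level, the sketch below uses Python's built-in codecs (not the Firefox detector) to show that the three bytes decode to the same text under both encodings. The detector therefore cannot rule out either candidate and has to choose based on how plausible the text looks in each language.

```python
# "I’v" as it would appear in a windows-1252 page: 0x92 is U+2019.
raw = "I\u2019v".encode("cp1252")
print(raw)

as_1252 = raw.decode("cp1252")  # windows-1252
as_1254 = raw.decode("cp1254")  # windows-1254 (Turkish)

# Both encodings map 0x92 to the right single quotation mark,
# so the decoded strings are identical.
print(as_1252 == as_1254)
```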

Arguably the second Firefox bug is that the IBM866 probe can be triggered solely by byte 0xA0 (no-break space in windows-1252, common Cyrillic letter in IBM866). The IBM866 probe should have a special case to require it to see more bytes that can be interpreted as Cyrillic letters before it can trigger.
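The double meaning of 0xA0 can be checked with Python's built-in codecs. The second half of the sketch below is only an illustration of the proposed special case, not the detector's actual code: the function names and the threshold of 2 are assumptions.

```python
NBSP = b"\xa0"
print(repr(NBSP.decode("cp1252")))  # no-break space in windows-1252
print(NBSP.decode("cp866"))         # Cyrillic small letter a in IBM866


def cyrillic_evidence(raw: bytes) -> int:
    """Count bytes, other than 0xA0, that IBM866 decodes to Cyrillic letters."""
    count = 0
    for byte in raw:
        if byte < 0x80 or byte == 0xA0:
            continue
        ch = bytes([byte]).decode("cp866")
        if "\u0400" <= ch <= "\u04ff":
            count += 1
    return count


def ibm866_probe_may_trigger(raw: bytes, threshold: int = 2) -> bool:
    # Hypothetical rule: 0xA0 alone is never enough evidence.
    return cyrillic_evidence(raw) >= threshold


print(ibm866_probe_may_trigger(b"\xa0\xa0<a href>"))
print(ibm866_probe_may_trigger("Привет".encode("cp866")))
```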

I'm not going to treat the reloading itself as a bug, since a site having such an eager DDoS defense has not been reported more often.

Assignee: nobody → hsivonen
Status: NEW → ASSIGNED
Summary: Latest version of firefox reloads pages once they finish loading → windows-1252 English apostrophe contraction I’v detected as windows-1254

I will need to investigate more to understand why that happens. It doesn't happen when I run the detector in isolation of the rest of Firefox.

This doesn't happen with

n’t

which suggests that the cause is Turkish case folding: I lowercases to dotless ı, which is a common character at word-final position.
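A small sketch of this hypothesis, using a hand-written Turkish lowercasing helper (the name turkish_lower is made up; Python's own str.lower() is not locale-aware, and this is not the detector's code). It shows that I’v folds to a string containing ı while n’t is unchanged, and where ı sits in the two encodings.

```python
def turkish_lower(text: str) -> str:
    """Lowercase with Turkish rules: dotted İ -> i, dotless I -> ı."""
    return text.replace("\u0130", "i").replace("I", "\u0131").lower()


print(turkish_lower("I\u2019v"))  # contains dotless ı
print(turkish_lower("n\u2019t"))  # unchanged

# Dotless ı is byte 0xFD in windows-1254; the same byte is ý in windows-1252.
dotless_i = "\u0131".encode("cp1254")
print(dotless_i)
print(dotless_i.decode("cp1252"))
```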

Priority: -- → P2

(In reply to Henri Sivonen (:hsivonen) from comment #2)

  1. The site doesn't declare its encoding. To solve this problem, please put <meta charset=windows-1252> in the site template between the <head> tag and the <title> tag.

This fix has been implemented on the site. Thanks!

Thank you for the prompt responses, much appreciated. I’ve implemented the charset tag in the site’s header, which seems to have stopped the insta-reloads in Firefox. Having the reload-generating bug fixed in one of the upcoming releases of Firefox would be great too, as it wasn’t there a few versions ago.

  • Avoid misdetecting windows-1252 English as windows-1254.
  • Avoid misdetecting windows-1252 English as IBM866.
  • Avoid misdetecting windows-1252 English as GBK or EUC-KR.
  • Improve Chinese and Japanese detection by not giving single-byte encodings score for letter next to digit.
  • Improve Italian, Portuguese, Castilian, Catalan, and Galician detection by taking into account ordinal indicator use.
  • Reduce lookup table size.

Binary size

The code size grew more than the data size shrank. :-(

I've verified that the patch detects the page as it was reported (before adding <meta charset=windows-1252>) as windows-1252.

Created web-platform-tests PR https://github.com/web-platform-tests/wpt/pull/23537 for changes under testing/web-platform/tests
Upstream web-platform-tests status checks passed, PR will merge once commit reaches central.
Upstream PR merged by moz-wptsync-bot
Status: ASSIGNED → RESOLVED
Closed: 4 years ago
Resolution: --- → FIXED
Target Milestone: --- → mozilla78

The patch landed in nightly and beta is affected.
:hsivonen, is this bug important enough to require an uplift?
If not please set status_beta to wontfix.

For more information, please visit auto_nag documentation.

Flags: needinfo?(hsivonen)

Too late to uplift now. :-(

Flags: needinfo?(hsivonen)