Closed Bug 246194 Opened 21 years ago Closed 21 years ago

Crash setting character encoding to Unicode (UTF-16) on any random website [@ nsBlockFrame::GetFrameForPointUsing ]

Categories

(Core :: Internationalization, defect)

x86
All
defect
Not set
critical

Tracking

()

RESOLVED FIXED

People

(Reporter: martijn.martijn, Assigned: jshin1987)

References

()

Details

(Keywords: crash, verified1.7, Whiteboard: fixed-aviary1.0)

Crash Data

Attachments

(3 files)

User-Agent: Mozilla/5.0 (Windows; U; Windows NT 5.0; en-US; rv:1.7) Gecko/20040608 Firefox/0.8.0+ Build Identifier: Mozilla/5.0 (Windows; U; Windows NT 5.0; en-US; rv:1.7) Gecko/20040608 Firefox/0.8.0+ Talkback ID: TB81836Z http://talkback-public.mozilla.org/talkback/fastfind.jsp?search=2&type=iid&id=TB81836Z It happens when I change the Character Encoding to UTF-16. Do this: View -> Character Encoding -> More -> Unicode (UTF-16) It does not happen with Unicode(UTF-16 Little Endian) or Unicode (UTF-16 Little Endian) or Unicode (UTF-8). Originally discussed here (second part of it): http://forums.mozillazine.org/viewtopic.php?p=570536#570536 Also happens with: Mozilla/5.0 (Windows; U; Windows NT 5.0; en-US; rv:1.8a) Gecko/2004-05-17 Reproducible: Always Steps to Reproduce: 1. View -> Character Encoding -> More -> Unicode (UTF-16) 2. 3. Actual Results: Crash Expected Results: No crash
Confirming with Firefox0.9rc under Windows 2003
Severity: normal → critical
Summary: Crash setting character encoding to Unicode (UTF-16) on any random website → Crash setting character encoding to Unicode (UTF-16) on any random website [@nsBlockFrame::GetFrameForPointUsing]
also crashes Windows Mozilla 1.7rc3 : TB84664H
Flags: blocking1.7?
Attached file Mozilla debug stack 20040604 Linux (deleted) —
I get this assertion before crashing Charset Overlay menu item pressed: UTF-16 ###!!! ASSERTION: Too few bytes in input: '*aSrcLength >= 2', file nsUCS2BEToUnicode.cpp, line 221 Break: at file nsUCS2BEToUnicode.cpp, line 221
Keywords: crash
OS: Windows 2000 → All
Summary: Crash setting character encoding to Unicode (UTF-16) on any random website [@nsBlockFrame::GetFrameForPointUsing] → Crash setting character encoding to Unicode (UTF-16) on any random website [@ nsBlockFrame::GetFrameForPointUsing ]
The assertion was added in bug 235090 on 2004-04-23. I'm not sure if this is a regression caused by that particular patch but can someone perhaps test that by backing out that patch?
Well, I don't see a Unicode (UTF-16) menu option in: Mozilla/5.0 (Windows; U; Windows NT 5.0; en-US; rv:1.7b) Gecko/2004-04-21 But I see it in: Mozilla/5.0 (Windows; U; Windows NT 5.0; en-US; rv:1.8a) Gecko/2004-04-26 And in that build it crashes too.
Stack is similar to bug 242620 and bug 235405. It seems that changing to UTF-16 is making all the content disappear from under us and triggering that bug(s). jshin, do you think that at http://lxr.mozilla.org/seamonkey/source/intl/uconv/ucvlatin/nsUCS2BEToUnicode.cpp#250 we should be setting mState to 0?
Wrong again: the source of the problem is in zeroing *aSrcLength and *aDstLength at the same place. The caller at http://lxr.mozilla.org/seamonkey/source/parser/htmlparser/src/nsScanner.cpp#354 then consumes one byte from source and inserts a replacement character in destination, decrements source length and destination length by one each and tries to proceed with the conversion. First problem: mState is still 2, so the converter tries again to find a BOM where no BOM should be (and my idea in the last comment of zeroing mState in the converter doesn't help, because the caller calls Reset()). The converter returns failure again and the loop continues to the end of the source buffer. Second problem: since the caller expects source length to be in octets and destination length to be in PRUnichars, the destination buffer is overrun halfway through the loop.
Do we even want to be returning NS_ERROR_ILLEGAL_INPUT when we don't suceed in detecting endianness? Maybe better to default to big-endian and carry on. Obviously in the situation here we're going to end up displaying garbage whatever we do, but if the input is really some flavour of UTF-16, we have a 50% chance of getting it right.
Assignee: general → smontagu
Component: Browser-General → Internationalization
QA Contact: general → amyy
Attached patch patch (deleted) — Splinter Review
That's exactly what I've been thinking :-).
Comment on attachment 150462 [details] [diff] [review] patch while you're here, please correct the typo "edinanness". With that, r=smontagu
Attachment #150462 - Flags: review+
Comment on attachment 150462 [details] [diff] [review] patch thanks for the prompt r and the analysis that led to the patch. I'll fix the typos when landing. i asked dbaron for sr on irc. i'm asking a1.7 at the same time because it's a crash and the patch is simple enough that makes mozilla work the way it works for non-UTF-16 pages when UTF-16LE/UTF-16BE is explicitly selected.
Attachment #150462 - Flags: superreview?(dbaron)
Attachment #150462 - Flags: approval1.7?
Comment on attachment 150462 [details] [diff] [review] patch Since this is for the Web, it seems more like you should pick an endianness rather than using platform endianness. How about little-endian (since x86 is)?
you mean x86 is predominant so that LE is a lot more likely than BE, right? I agree. I'll do that.
David, by your own logic I would say pick big-endian, following clause D42 in chapter 3 of the Unicode standard http://www.unicode.org/versions/Unicode4.0.0/ch03.pdf: "The UTF-16 encoding scheme may or may not begin with a BOM. However, when there is no BOM, and in the absence of a higher-level protocol, the byte order of the UTF-16 encoding scheme is big-endian."
Assignee: smontagu → jshin
OK, big-endian then. and sr=dbaron with that change
Comment on attachment 150462 [details] [diff] [review] patch a=asa (on behalf of drivers) for checkin to 1.7 branch. Time is short so please try to get this in in the next couple of hours if at all possible. Thanks.
Attachment #150462 - Flags: approval1.7? → approval1.7+
thanks for sr/a. fix landed in both trunk and 1.7branch
Status: NEW → RESOLVED
Closed: 21 years ago
Resolution: --- → FIXED
There's still the "edinanness" typo.
Attachment #150462 - Flags: superreview?(dbaron) → superreview+
thanks for the note. it turned out that there were two typos (it was 'edinaness') and I fixed only the first 'ed->end' in the prev. checkin. i've just fixed the second.
Just thought that for aviary-1.0 and ff 0.9 branch landing, I'd better put this up.
Keywords: fixed1.7
Whiteboard: fixed-aviary1.0
Flags: blocking1.7?
Verified on Mozilla 1.7 branch. Mozilla/5.0 (Windows; U; Windows NT 5.0; en-US; rv:1.7) Gecko/20040617.
Keywords: fixed1.7verified1.7
Crash Signature: [@ nsBlockFrame::GetFrameForPointUsing ]
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Created:
Updated:
Size: