Closed Bug 5723 Opened 26 years ago Closed 26 years ago

Parser strips out Unicode U+xx00 characters from HTML attributes

Categories

(Core :: DOM: HTML Parser, defect, P3)

Tracking

VERIFIED FIXED

People

(Reporter: ftang, Assigned: rickg)

Details

(Whiteboard: DEPEND - Intl)

I found this problem while trying to fix form submission for non-ISO-8859-1 character sets.

1. Set "View:Default Character Set" to "Shift_JIS".
2. Set a breakpoint in the static SetAttribute function in nsHTMLContentSink.cpp.
3. Go to the above URL.
4. You will find that every ALTTEXT value that should have 4 characters has only 2.

All characters of the form U+xx00 (for example U+6700) are stripped out by the tokenizer.

Note: you don't need a Japanese system, or even a Japanese font installed, to debug this. Just look at the values in your debugger. I traced this back to the parser code once, and I am sure the problem is in the parser, maybe in the tokenizer. All U+xx00 characters are affected; I am not sure about other characters.
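To make the symptom concrete, here is a minimal C++ sketch of one way a 16-bit character such as U+6700 can be dropped. It is only an assumption about the class of bug, not the actual tokenizer code: the function names StripNulsBuggy and StripNulsFixed are hypothetical and do not exist in the Mozilla source. The idea is that a check which narrows each 16-bit unit to 8 bits before testing for NUL sees 0x00 for every U+xx00 character.

```cpp
#include <cassert>
#include <cstdint>
#include <string>

// Hypothetical buggy filter (illustration only, not Mozilla code):
// narrows each 16-bit unit to 8 bits before testing for NUL, so every
// unit whose low byte is 0x00 (U+6700, U+5E00, ...) is dropped.
std::u16string StripNulsBuggy(const std::u16string& in) {
  std::u16string out;
  for (char16_t ch : in) {
    if (static_cast<std::uint8_t>(ch) != 0) {  // wrong: tests the low byte only
      out.push_back(ch);
    }
  }
  return out;
}

// Corrected filter: compares the full 16-bit unit against U+0000.
std::u16string StripNulsFixed(const std::u16string& in) {
  std::u16string out;
  for (char16_t ch : in) {
    if (ch != u'\0') {
      out.push_back(ch);
    }
  }
  return out;
}

// Small self-check: a 4-character string containing two U+xx00 characters.
void SelfCheck() {
  const std::u16string text = u"\u6700\u65B0\u5E00\u60C5";
  assert(StripNulsBuggy(text).size() == 2);  // U+6700 and U+5E00 are lost
  assert(StripNulsFixed(text).size() == 4);  // all four characters survive
}
```

With a 4-character input that contains two U+xx00 characters, the buggy variant returns 2 characters, which matches the 4-versus-2 count seen in the debugger. Whether the real tokenizer fails this way is not established by this report.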
Priority: P3 → P2
Changing priority to P2.
Status: NEW → ASSIGNED
Priority: P2 → P3
This is a legitimate bug, and it is fixed by nsString2. As soon as that becomes the de facto string class, this will go away.
QA Contact: 3847 → 4141
Target Milestone: M6
Assignee: rickg → ftang
Status: ASSIGNED → NEW
Handing this back to you to keep track of. See my earlier comments.
Target Milestone: M6 → M7
Rick said he will land nsString2 shortly after M6, so I am moving this to M7. Rick, if you provide QA with an nsString2-enabled binary, maybe they can check whether this really fixes the problem.
Status: NEW → ASSIGNED
Whiteboard: DEPEND - Intl
Blocks: 7228
Assignee: ftang → rickg
Status: ASSIGNED → NEW
It is fixed in this case. Reassigning it back to rickg, but marking it fixed.
Status: NEW → RESOLVED
Closed: 26 years ago
Resolution: --- → FIXED
Status: RESOLVED → VERIFIED
verified