Closed Bug 72299 Opened 24 years ago Closed 24 years ago

User could not override charset from http server

Categories

(Core :: Internationalization, defect)

defect
Not set
normal

Tracking

()

VERIFIED FIXED

People

(Reporter: amyy, Assigned: shanjian)

References

()

Details

(Keywords: intl)

Attachments

(8 files)

On 03-16 Mtrunk build. No matter how you turn off or on(all, Japanese, Chinese...etc) the auto-detect, and set encoding anything other than EUC-JP (the correct encoding of this page), e.g. western8859-1, Shift-jis, Big5...etc, the page still display Japanese fine. There is a similar bug 72074, but I don't know if they are same problem though.
QA Contact: andreasb → ylong
Reassign to shanjian.
Assignee: nhotta → shanjian
Keywords: intl
www.yahoo.co.jp now provide charset information through http header in content-type field. That will explain part of the problem. The observed behavior does reveal one problem, that is the decument charset could not be override.
Status: NEW → ASSIGNED
Attached patch patch part1, for html document (deleted) — Splinter Review
Attached patch patch part 2, to xml document (deleted) — Splinter Review
There is several changes need to be explained: 1, The layout of "{" and "}" in original file was not right. If you search the matched pair of {} in "(kCharsetFromUserDefault > charsetSource)" section, you will see what I mean. That cause this problem. 2, "kCharsetFromPreviousLoading" is ambiguous, "kCharsetFromUserForced" is more clear. However, "kCharsetFromUserForced" is not used and "kCharsetFromPreviousLoading" is only referenced here. So I replaced "kCharsetFromPreviousLoading" with "kCharsetFromUserForced". 3, File nsXMLDocument.cpp (Patch part 2) might looks very confusing. The original function is almost unreadable. I relayout the whole function while keep the semantic. This file also contains mismatched "{" and "}".
change summary to better state the problem.
Summary: Auto-Detect doesn't work well with this site. → User could not override charset from http server
*** Bug 53140 has been marked as a duplicate of this bug. ***
Please fix for M0.9.1 and assigned priority P3.
Keywords: nsbeta1
I am looking at the first patch. So, he current code the following conditions are applied to more lines of code. if(kCharsetFromUserDefault > charsetSource) My question is that, why only overriding the server charset was not working?
Current code author's original intention is to considering each charset source one by one. But because a mismatched "{" and "}", the override charset source is skipped in certain condition.
Yes, I understand that, but my question is how that is related to this bug and what other cases might be affected by the change.
I do not quite understand your question. How is this related with this bug? Because the override charset is skipped, so when http server provides charset, user could not override it. What other case might be affected? Hard to say. If it also changed other behavior, the bahavior that got changed might be very likely to be a bug.
Let me ask a more specific question. Why pnly http header had a problem and meta was okay? Both kCharsetFromHTTPHeader and kCharsetFromMetaTag are larger than kCharsetFromUserDefault. 85 typedef enum { 86 kCharsetUninitialized = 0, 87 kCharsetFromWeakDocTypeDefault, 88 kCharsetFromUserDefault , 89 kCharsetFromDocTypeDefault, 90 kCharsetFromCache, 91 kCharsetFromParentFrame, 92 kCharsetFromBookmarks, 93 kCharsetFromAutoDetection, 94 kCharsetFromMetaTag, 95 kCharsetFromByteOrderMark, 96 kCharsetFromHTTPHeader, 97 kCharsetFromUserForced, 98 kCharsetFromOtherComponent, 99 kCharsetFromPreviousLoading 100 } nsCharsetSource;
>Why pnly http Why only http
If you take a look at the orginal nsHTMLDocument.cpp befor my patch, in MSVC editor search for the ending "}" for line 664's "{", you will see that it is below all charset comparisons. Before this line, the only possible charset assignment is from http server. That's why if charset is available in http, we will see the problem.
r=nhotta
Chris, can you sr this one? I will adjust the brace "{" to end of last line before check in.
kewl . . . let's fix this one.
Attach readable diffs (i.e. cvs diff -u) and I will sr this bug.
Actually, there are 2 very popular sites have same problem: http://www.mainichi.co.jp http://www.singtao.co.jp
Attached patch more readable patch, p1 (deleted) — Splinter Review
Attached patch more readable patch, p2 (deleted) — Splinter Review
Johny, I update the patch using -u format. Even with that one, the second patch might still looks too complicated.
Huh, those new patches are also cvs diff -c patches, and they're hard to read, but I read the first one and I think I got it. However, I don't approve of this change as is, the code changes seem ok, but the readability of the code where those changes are made in nsHTMLDocument.cpp I just can't accept, clean it up! Get rid of tabs, be consistent with the indentation, use the same brace style as the rest of the file, i.e.: if (...) { ... } and not: if (...) { ... } or: if (...) { ... } and so on. Emacs could do that cleanup for you almost automatically... Do the cleanup, and make sure you really do get diff -u's next time (i.e. the lines in the diff should be marked with + and -, no changed lines should be marked with !). And once you're at it, clean up the same code in nsXMLDocument.cpp.
Attached patch updated part1 (deleted) — Splinter Review
Attached patch updated part 2 (deleted) — Splinter Review
Attached patch repost part2, -u version (deleted) — Splinter Review
Attached patch repost part2, -ub version (deleted) — Splinter Review
Johny: I mistyped -c for -u when I generating the second batch of patch. Sorry. For XML document, my original reformated file is missing. So I did one more time of reformatting. This time, more extensively that last time. The last 2 part is the -u and -ub version for part2.
sr=jst
Still tabs and random indentation: + if(NS_SUCCEEDED(rv) && (nsnull != calias) ) { + nsAutoString preferred; + rv = calias->GetPreferred(theCharset, preferred); + if(NS_SUCCEEDED(rv)) { + charset = preferred; + charsetSource = kCharsetFromHTTPHeader; + } + nsServiceManager::ReleaseService(kCharsetAliasCID, calias); What's the deal? BTW, use -w over -b if you want to ignore all whitespace differences. /be
This is not noticable when I set my tab width to 2. Before I checked in, I set tab to 8 and correct all the inconsistence. Now it should be fine. There should be no tab in that function. fix checked in.
Status: ASSIGNED → RESOLVED
Closed: 24 years ago
Resolution: --- → FIXED
Checked it on 05-15 trunk build: http://www.yahoo.co.jp - you can get charset overrided by menu changing. However, as I montionted on 05-11, http://www.mainichi.co.jp and http://www.singtao.com still see this problem. I'll re-open it, please change the status if it was cause by something else.
Status: RESOLVED → REOPENED
Resolution: FIXED → ---
The 2 websites you listed have meta tag. So the charset information if provided by meta tag instead of http server. Please submit new bugs against that. Please also note that even thought the display is correct, charset mark in charset menu is incorrect.
Status: REOPENED → RESOLVED
Closed: 24 years ago24 years ago
Resolution: --- → FIXED
OK, I'll mark it as verified. Open bug 81244 for the other issue.
Status: RESOLVED → VERIFIED
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Creator:
Created:
Updated:
Size: