Closed
Bug 253807
Opened 20 years ago
Closed 20 years ago
RSS preview and subjects are always UTF-8
Categories
(MailNews Core :: Feed Reader, defect)
Tracking
(Not tracked)
RESOLVED
FIXED
People
(Reporter: rowla, Assigned: mscott)
References
Details
(Keywords: fixed-aviary1.0)
Attachments
(7 files, all deleted: image/png, patch, image/png, patch, patch, image/png, image/png)
User-Agent: Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.7) Gecko/20040731 Firefox/0.9.1+
Build Identifier: Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.7) Gecko/20040731 Firefox/0.9.1+
Characters in the preview window and in subjects are all "?" because of the
"Content-Type: text/html; charset=UTF-8" header, even when the site uses a
different charset such as EUC-JP.
Reproducible: Always
Steps to Reproduce:
1. Subscribe to RSS http://slashdot.jp/slashdotjp.rss
2. Update feed.
Actual Results:
Characters are all "?".
Expected Results:
Thunderbird should auto-detect the charset used in the feed.
Comment 1 • 20 years ago (Assignee)
I don't know enough about what's going on here to be sure. But if I look at the
XML generated for:
http://slashdot.jp/slashdotjp.rss
I see the following right at the top:
<?xml version="1.0" encoding="utf-8"?>
specifying that the feed is utf-8. We honor that and treat the parsed out feeds
as being utf-8...
Status: UNCONFIRMED → ASSIGNED
Ever confirmed: true
Target Milestone: --- → Thunderbird0.8
Comment 2 • 20 years ago (Assignee)
Another question for you. Not having a Japanese system, I need to ask:
Does the name of the folder in the folder pane for this feed look correct?
i.e. is the problem just the subject and the feed contents or does it also
include the name of the feed folder in the folder pane?
Comment 3 • 20 years ago (Reporter)
(In reply to comment #2)
> Does the name of the folder in the folder pane for this feed look correct?
>
> i.e. is the problem just the subject and the feed contents or does it also
> include the name of the feed folder in the folder pane?
The names in the folder pane are correct. The problem is just the subject and the feed
contents.
The example RDF is in UTF-8, as you pointed out, so that was a bad example. The
RDF is in UTF-8 but the site is in EUC-JP. I should have mentioned this.
Comment 4 • 20 years ago (Reporter)
Found a good example: http://rebecca.ac/milano/mt/index.rdf
This RDF file declares:
<?xml version="1.0" encoding="EUC-JP"?>
and this site is written in:
<meta http-equiv="Content-Type" content="text/html; charset=EUC-JP" />
but the header of the feed is "Content-Type: text/html; charset=UTF-8".
Comment 5 • 20 years ago (Assignee)
I don't have the ability to read Japanese fonts. Ayumi, here's a screenshot of
the message list pane with some changes I made to try to MIME-encode the
subject as EUC-JP. Can you tell me if this screenshot looks correct (with the
Subject), or is it impossible to tell since I don't have a Japanese font
installed?
Comment 6 • 20 years ago (Assignee)
This patch does several things:
1) Extracts the document charset from the XML document for the RSS feed.
2) MIME-encodes the subject in the aforementioned character set before writing
the subject to the mail folder.
3) Sets the charset parameter on the Content-Type header to match the
aforementioned character set instead of hard-coding a value of UTF-8.
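As a rough illustration of these three steps (this is not the actual patch, which lives in Thunderbird's feed code), the following Python sketch reads the encoding from the XML declaration, MIME-encodes a subject in that charset as an RFC 2047 encoded-word, and builds a matching Content-Type value. The names `declared_encoding` and `build_headers`, and the UTF-8 fallback, are assumptions made up for the example.

```python
import base64
import re
from email.header import decode_header

# Matches the encoding pseudo-attribute of a leading XML declaration.
_XML_DECL = re.compile(
    rb'<\?xml[^>]*?encoding\s*=\s*["\']([A-Za-z][A-Za-z0-9._-]*)["\']'
)


def declared_encoding(xml_bytes, fallback="utf-8"):
    """Return the encoding named in the XML declaration, or `fallback` (assumed)."""
    match = _XML_DECL.match(xml_bytes)
    return match.group(1).decode("ascii") if match else fallback


def build_headers(title, charset):
    """Hypothetical helper: put the subject and Content-Type in `charset`."""
    # RFC 2047 encoded-word using base64 ("B"). Long titles would have to be
    # split into several encoded-words; that is left out of this sketch.
    payload = base64.b64encode(title.encode(charset)).decode("ascii")
    return {
        "Subject": f"=?{charset}?B?{payload}?=",
        "Content-Type": f"text/html; charset={charset}",
    }


feed = '<?xml version="1.0" encoding="EUC-JP"?><rdf/>'.encode("ascii")
charset = declared_encoding(feed)
headers = build_headers("日本語", charset)

# Round trip: decoding the encoded-word gives back the original title.
raw, used_charset = decode_header(headers["Subject"])[0]
decoded_title = raw.decode(used_charset)
```

The point of the round trip at the end is that a reader which honors the charset named in the encoded-word recovers the title, which is what the thread pane needs.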
What works with this patch:
1) Subject values for sites that use 8-bit charsets (ASCII plus accented characters), such as:
http://www.heise.de/newsticker/heise.rdf
http://photodb.kicker.de/library/rss091/kicker.xml
These sites now have the correct subject values show up in the thread pane and
the message pane.
2) I need a Japanese user to confirm from the screenshot I posted that this
patch also fixes the subject values.
Remaining issues:
When the RSS article is just an iframe link to the website, the contents of the
iframe are not being decoded properly. For both the ASCII-with-accented-characters
case and the JA case, I don't think the body is rendering correctly.
I don't know why yet. Maybe we need to explicitly set a charset on the iframe?
Comment 7 • 20 years ago (Reporter)
(In reply to comment #5)
> Created an attachment (id=155038)
> screen shot of my attempted fix
>
> I don't have the ability to read Japanese fonts. Ayumi, here's a screen shot of
> the message list pane with some changes I made to try to mime encode the
> subject as EUC_JP. Can you tell me if this screen shot looks correct (with the
> Subject) or is it impossible to tell since I don't have a japanese font
> installed?
>
It is impossible to tell from the screenshot, maybe because you don't have a
Japanese font.
I'll do a test build of Thunderbird with your patch.
Comment 8 • 20 years ago (Assignee)
Hmm, fixing the iframe problem is going to be really hard. In fact, right now I
don't know how to fix it.
The message pane always uses UTF-8 for the Gecko instance running inside
of it. When we display a message, we check the charset attribute and convert the
entire message body from the specified charset to UTF-8 before rendering the
text. This works great except for RSS.
For many feed articles the message body is just an iframe link:
<iframe src="some feed article url"></iframe>
We convert that from the appropriate charset to UTF-8 which gives us back:
<iframe src="some feed article url"></iframe>
and layout renders the contents of the iframe which is treated as UTF-8 because
we've told layout our message has been converted to utf-8.
I don't know how we can convert data we don't actually have to utf-8 before
layout renders it.
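To make the no-op concrete, here is a small Python sketch. The function `to_utf8` and the example.org URL are made up for illustration. Transcoding changes the bytes of inline Japanese text, but an iframe stub is pure ASCII, so it comes out byte-for-byte identical and the framed page itself is never touched.

```python
def to_utf8(body_bytes, source_charset):
    """Transcode a message body from its declared charset to UTF-8."""
    return body_bytes.decode(source_charset).encode("utf-8")


# A body that is only an iframe stub (hypothetical URL).
stub = b'<iframe src="http://example.org/article.html"></iframe>'
converted_stub = to_utf8(stub, "euc_jp")

# A body that carries the Japanese text inline, stored as EUC-JP.
inline = "<p>日本語</p>".encode("euc_jp")
converted_inline = to_utf8(inline, "euc_jp")

stub_unchanged = converted_stub == stub          # conversion had nothing to do
inline_changed = converted_inline != inline      # real text was re-encoded
```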
Comment 9 • 20 years ago (Assignee)
About the original feed that started this discussion:
http://slashdot.jp/slashdotjp.rss
I'm not sure what to do here for the subjects. The RSS document says it's a
UTF-8 document, so we try to convert the values for the title field to UTF-8.
However, that fails because the titles are really in EUC-JP. Seems like a problem
with the feed not really being encoded in the charset it advertises...
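A minimal Python sketch of that failure mode, assuming the titles really are EUC-JP bytes inside a document labeled UTF-8 (which is only a guess about this particular feed): a strict UTF-8 decode rejects the bytes, a lenient one yields replacement characters, and only the real charset recovers the text.

```python
title = "日本語"
wire_bytes = title.encode("euc_jp")  # what the feed would actually contain

# Strict decoding with the advertised charset fails outright.
try:
    wire_bytes.decode("utf-8")
    strict_ok = True
except UnicodeDecodeError:
    strict_ok = False

# Lenient decoding substitutes U+FFFD for the bytes it cannot interpret.
lossy = wire_bytes.decode("utf-8", errors="replace")

# Decoding with the charset the bytes were really written in works.
recovered = wire_bytes.decode("euc_jp")
```

Nothing in the byte stream says which charset is the right one, so a reader that trusts the declaration has no clean way to recover from a mislabeled feed.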
Comment 10 • 20 years ago (Reporter)
Comment 11 • 20 years ago (Reporter)
> 2) I need a japanese user to confirm from the screen shot I posted that this
> patch also fixes the subject values.
My test build is finished. However, I'm still seeing the subject and preview as "?".
While comparing the patched and unpatched versions, I noticed one
part where Japanese letters are rendered correctly in both cases (the place I
marked on the screenshot).
Comment 12 • 20 years ago (Assignee)
I've gone ahead and checked my patch into the aviary 1.0 branch. This addresses
the subject issues for 8-bit characters (ASCII plus accents). Now we need to figure out what's
going on with more complicated charsets like EUC-JP for the subject. I suspect
I'm never getting the correct characters from the JavaScript when I try to MIME-encode
the header. I see lots of weird JavaScript string assertions about data
being lost before the MIME encoding code ever gets called.
Comment 13 • 20 years ago (Assignee)
RSS articles that are iframes whose src attribute points back to the website
article aren't getting the correct document charset conversion happening on the
iframe contents. This was occurring because mailnews forces a charset value of
UTF-8 on the message pane because libmime converts message bodies from their
native charsets to UTF-8 before giving the data to gecko.
This fix attempts to get around that problem by setting a default character set
on the message pane docshell instead of a forced charset. By using the default
charset method, the nested iframe is no longer forced to use the UTF-8 charset
of the outer frame (the message pane).
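The difference between a forced and a default charset can be pictured with a toy model in Python. This is not Gecko's real charset resolution, and `effective_charset` is an invented name; it only captures the precedence described above: a forced charset overrides whatever a document declares, while a default is used only when nothing is declared.

```python
def effective_charset(declared, forced=None, default="utf-8"):
    """Toy model of charset selection for one document; not Gecko's real logic.

    declared: charset the document states for itself, or None
    forced:   override that wins over everything, or None
    default:  fallback used only when nothing is declared
    """
    if forced is not None:
        return forced
    if declared is not None:
        return declared
    return default


# Message body: already converted to UTF-8 by libmime, nothing declared.
body_forced = effective_charset(declared=None, forced="utf-8")
body_default = effective_charset(declared=None, default="utf-8")

# Nested iframe: the linked article declares EUC-JP for itself.
iframe_forced = effective_charset(declared="euc-jp", forced="utf-8")
iframe_default = effective_charset(declared="euc-jp", default="utf-8")
```

In this model the message body resolves to UTF-8 either way, while the iframe is only decoded as EUC-JP once the outer charset is a default rather than a forced value.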
This patch fixes the RSS problem. However, I'm quite concerned it may break
display of I18n message bodies for regular mail and news articles. My limited
tests showed that it didn't break rendering of mail messages but I need more
testing help. Ayumi, up for another patch to test?
Comment 14 • 20 years ago (Assignee)
Oh, in case I wasn't clear, the 2004-08-03 patch addresses the RSS body ONLY. It
doesn't address Japanese characters in the subject fields still looking incorrect...
Comment 15 • 20 years ago
Note that three out of the four feeds mentioned above have problems with their
character encoding - you can check it with http://feedvalidator.org/.
The reason that they may be displayed as intended anyway is probably due to bug
247024.
Comment 16 • 20 years ago (Reporter)
(In reply to comment #14)
> Oh in case I wasn't clear, the 2004-08-03 patch addresses the RSS body ONLY. It
> doesn't address japanese characters in the subject fields still looking
> incorrect...
I've just finished building Thunderbird with your patch, and it is working.
Comment 17 • 20 years ago (Assignee)
(In reply to comment #16)
> incorrect...
>
> I've just finished building Thunderbird with your patch, and it is working.
Did I read this right? Do you mean that with both the RSS body patch and the patch to
MIME-encode the subject, you are now seeing both the subjects and the body look correct?
As in this bug is fixed with those two patches? :) Woo Hoo!
Comment 18 • 20 years ago (Reporter)
> As in this bug is fixed with those two patches? :) Woo Hoo!
No, no. As you mentioned, the subject line is still "?". I meant the preview is
working.
Comment 19 • 20 years ago (Assignee)
*** Bug 254424 has been marked as a duplicate of this bug. ***
Comment 20 • 20 years ago (Assignee)
I think this patch gets us a lot closer, and it fixes a nasty regression that
attachment #155147 introduced.
Ayumi, can you try this patch and see if it works for the subject?
Note: You must re-download the headers with a build that has this latest patch
to test if it really fixed things. If the feed article was already downloaded
using a bad build then the header will still look wrong. So make sure you
delete the RSS feed then add it again...
Comment 21 • 20 years ago (Assignee)
I just checked the 08/05 patch into the branch to get more testing. The
FeedItem.js change that converts the JS unicode string back to a char * in the
original charset is something we definitely want even if the problem isn't fully
fixed. This gets rid of some really nasty xpconnect errors I was seeing when
xpconnect tried to pass the unicode string into nsIMsgLocalFolder::AddMessage as
a char * without converting the string properly.
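To illustrate why the explicit conversion matters, here is a small Python analogy (not the FeedItem.js code; both function names are invented). Handing a Unicode string to a byte-oriented API without encoding it in a real charset loses information, whereas encoding it in the feed's charset round-trips cleanly. How xpconnect actually narrowed the string is not stated here, so the naive version below is only an assumption about what a lossy conversion looks like.

```python
def narrow_naively(text):
    """Keep only the low byte of each code point: a lossy wide-to-narrow copy."""
    return bytes(ord(ch) & 0xFF for ch in text)


def encode_in_charset(text, charset):
    """Convert the Unicode string to bytes in the feed's original charset."""
    return text.encode(charset)


title = "日本語"
lossy = narrow_naively(title)                 # one byte per character, high bits gone
proper = encode_in_charset(title, "euc_jp")   # two bytes per character in EUC-JP
round_trip = proper.decode("euc_jp")
```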
Comment 22 • 20 years ago (Reporter)
The patch is working in 'some' cases.
o http://slashdot.jp/slashdotjp.rss -- working
o http://rebecca.ac/milano/mt/index.rdf -- not working
Comment 23 • 20 years ago
Mozilla Thunderbird 0.7+ (Windows/20040807)
Most pages are working for me, but some are not.
For the following sites, both the message pane and the preview pane are broken:
http://blog.bulknews.net/mt/index.rdf
http://naoya.dyndns.org/~naoya/mt/index.rdf
Comment 24 • 20 years ago
Another RSS that doesn't work as expected:
http://www.pc-magazin.de/rss/all
IMHO Thunderbird should use the default charset for mails if no charset is
specified in the rss feed.
Comment 25 • 20 years ago (Assignee)
Moving the remaining work for this bug to 0.9. The initial patches for this bug
have fixed a lot of issues. Most remaining issues are with sites that list the
wrong encoding but there are still some sites that just look wrong even though
they do use the right encoding. I don't know why that is yet.
Target Milestone: Thunderbird0.8 → Thunderbird0.9
Comment 26 • 20 years ago
> Does the name of the folder in the folder pane for this feed look correct?
>
> i.e. is the problem just the subject and the feed contents or does it also
> include the name of the feed folder in the folder pane?
No, the filename is bad. Look at http://www.vaclavak.net/weblog/weblog.xml, which
is in ISO-8859-2 encoding. The problem is not only with subjects, but also with the
name of the feed (and the name of the folder and the filename).
It creates the strange filenames "Weblog Våclavåk" and "Weblog Våclavåk.msf", but
"Weblog Våclavåk" is empty and the mailbox is stored in the files "f85ec1e9"
and "f85ec1e9.msf".
As a result, a second "Weblog Våclavåk" folder is created in TB, and
there are problems when you drag and drop this folder (the dropped folder loses its
contents).
It seems to me that TB creates the bad filename "Weblog Våclavåk" and sometimes
writes data to "f85ec1e9" and sometimes tries to read from "Weblog
Våclavåk" (which is empty).
Screenshots are coming...
Comment 27 • 20 years ago
Added screenshot
Comment 28 • 20 years ago
Added screenshot
Comment 29 • 20 years ago
xref: bug #264071
I now see that some things from my comment #26
>and sometimes write data to the "f85ec1e9"
were reported in bug #264071.
Comment 30 • 20 years ago (Assignee)
I don't currently have any pending issues here for 0.9. But these fixes do need
to be migrated to the trunk. Leaving open for that.
Target Milestone: Thunderbird0.9 → Thunderbird1.1
Comment 31 • 20 years ago
adding fixed-aviary1.0 per comment 30 (also to help in our queries).
Keywords: fixed-aviary1.0
Comment 32 • 20 years ago (Assignee)
Checked into the trunk too.
Status: ASSIGNED → RESOLVED
Closed: 20 years ago
Resolution: --- → FIXED
Component: RSS → Feed Reader
Product: Thunderbird → MailNews Core
Target Milestone: Thunderbird1.1 → ---