Closed
Bug 162765
Opened 22 years ago
Closed 21 years ago
RFC2231 filename support for nsExternalAppHandler
Categories
(Core Graveyard :: File Handling, defect, P3)
Core Graveyard
File Handling
Tracking
(Not tracked)
RESOLVED
FIXED
mozilla1.5alpha
People
(Reporter: nhottanscp, Assigned: jshin1987)
References
()
Details
(Keywords: intl)
Attachments
(4 files, 11 obsolete files)
(deleted),
patch
|
Details | Diff | Splinter Review | |
(deleted),
patch
|
Biesinger
:
review+
alecf
:
superreview+
|
Details | Diff | Splinter Review |
(deleted),
patch
|
Biesinger
:
review+
bzbarsky
:
superreview+
|
Details | Diff | Splinter Review |
(deleted),
patch
|
Bienvenu
:
superreview+
|
Details | Diff | Splinter Review |
http://www.ietf.org/rfc/rfc2231.txt
separated from bug 161174, a filename in HTTP header may be encoded in RFC2231
form, currently Mozilla do not support that for file download.
Assignee | ||
Comment 1•22 years ago
|
||
I filed bug 155949 and added pretty extensive information
on the issue. IMHO, it's better to make this one as a duplicate
of bug 155949. The other way around is all right with me,
but in that case what I wrote there would have to be copied here,
which seems to me to be sort of waste of time and resource.
Comment 2•22 years ago
|
||
*** Bug 155949 has been marked as a duplicate of this bug. ***
Updated•22 years ago
|
OS: Windows 2000 → All
Priority: -- → P3
Hardware: PC → All
Target Milestone: --- → mozilla1.3alpha
Assignee | ||
Comment 3•22 years ago
|
||
With this patch, the last 5 cases at http://jshin.net/moztest/download.html
work as expected/intended. If RFC 2231 compliant C-D http header
is present, charset is read off and the parameter value is
decoded accordingly. When raw 8bit characters are used
without charset info. in C-D header, UTF-8 is tried first and
if that does not work, the document charset (originCharset) is
used instead.
A couple of issues:
EnsureUTF8Filename() was copied almost verbatim (aside
from URL encoding/decoding) from
nhotta's EnsureUTF8Spec() in nsSmtpService.cpp.
It might be a good idea to have EnsureUTF8String()
somewhere in intl/uconv to avoid cut'n'paste of
the code.
We may want to have 'IsUTF8()' that works simliarly
to 'IsASCII()'. (I saw in one of bugs related to this
that 'IsUTF8()' was briefly discussed, but forgot which)
To me converting to UCS2 and then back
to UTF-8 and comparing the result with the original
string does not seem to be a very good idea.
And, there may be other problems as well. It'd be
nice to hear feedback, positive or negative.
Assignee | ||
Comment 4•22 years ago
|
||
One more issue which I think has to be dealt with
separately from this one is what to do when
the local file system encoding cannot represent
characters in a suggested filename. Users can for sure
override, but my test result of the page mentioned
previously can be confusing to users.
After verifying the patch works fine under ko_KR.UTF-8
locale, I ran Mozilla under C locale(Linux) and obviously
Korean characters in the URL aforementioned
cannot be represented in C locale. Without
any other option, Mozilla assumes that
the local file system encoding is the encoding
of the locale. However, the filepicker dialog
shows a Korean filename as if Mozilla can
use it. When I press 'OK' button,
Mozilla falls back to 'something' and
the download manager uses that fallback.
A more gracious and user-friendly approach
is necessary here.
Comment 5•22 years ago
|
||
One thing that sorta leaps out at me there is that this makes uriloader depend
on a mailnews module... won't this break when we build with --disable-mailnews?
Assignee | ||
Comment 6•22 years ago
|
||
> this makes uriloader depend
> on a mailnews module... won't this break when we build with --disable-mailnews?
I never thought about it. I guess that's likely.(I wish we had
a generic mime module not tied to mailnews.) Then, I looked
at Makefile.in in mailnews/mime/build and it does not depend on mailnews.
The dependency is the other way around (mailnews/base/build).
Now the question is what exactly --disable-mailnews disables....
If indeed that dependency
breaks a build with mailnews diabled, what would you suggest? Perhaps,
we just have to go back to nhotta's original code and build upon
it to support RFC 2231. Or just copy'n'past two functions in mimehdrs.cpp
into nsExternalHelperAppService.cpp although that may be pretty ugly(?).
Before pondering further, let's get the definitive answer on the dependency
issue. well, I can clobber and build
with --disable-mailnews. I'm gonna ask the owner of mime module..
Comment 7•22 years ago
|
||
The simplest solution would likely be to move this stuff into uconv or necko and
have both mailnews and uriloader is it from there....
Assignee | ||
Comment 8•22 years ago
|
||
A problem with moving this stuff out of mime module into uconv or necko
is that 'MimeHeaders_get_parameter()' is invoked 50+ times in mime module.
That, by itself, may not be a big problem (I believe none of them
is used in a context critical to the performance) except that it's tedious
to change them all. Another function, mime_decode_filename() refered
to in GetMimeHeaderParameter() is invoked only 7times, but dividing
it into a part not depending on mime module and the other dependent on
it is a bit ugly although doable. If we wanna go down that path,
I guess we definitely need to talk to the owner of mime module,
ducarroz@netscape.com. I'm now adding ducarroz to cc.
Updated•22 years ago
|
QA Contact: sairuh → petersen
Assignee | ||
Comment 9•22 years ago
|
||
I took another look at this bug and at first it looked like
I could just move three functions out of mimehdrs.cpp
and put them in uconv or necko. It turned out that
the extent to which things depend on each ohter
makes it hard to separate cleanly what I need from
the rest of mailnews/mime.
Now I'm considering moving the whole mailnews/mime
out of mailnews and putting it under necko or somewhere else.
That may have been what Boris meant in comment #7.
Anyway, ducarroz and Boris, how does it sound?
This is a big change (and the cvs admin. wouldn't
happy for havig to do extensive operation to keep 'history' intact
after moving so many files....).
Comment 10•22 years ago
|
||
That's sort of what I meant, yes. I'm not sure what the mailnews team's take on
that is.... ccing darin also, since he should certainly know about changes like
this. :)
If we can't reuse the mailnews code, I would opt for a helper function somewhere
(sort of like the mimetype-parsing function we have) that would take a string
and parse out the disposition and filename.... that would allow us to save some
already-duplicated code in uriloader and exthandler.
Assignee | ||
Comment 11•22 years ago
|
||
I tried compiling in mailnews/mime even when mailnews is disabled
(by tweaking a couple of Makefile.in and allmakefiles.sh. this way,
I thought we'd be able to avoid 'cvs surgery'), but
found that 'mime' has a dependency on the rest of mailnews
(there's only one dependency, but it's rather hard to remove it).
So, that doesn't work.
Instead, I'm going for the second option while trying to resue mailnews/mime
code whenever possible by making a new directory next to mailnews/mime and
putting there what we need _outside_ mailnews. What's in
there will be compiled regardless of whether mailnews is enabled or not.
Assignee | ||
Comment 12•22 years ago
|
||
I moved what's necessary for this out of mailnews/mime
and put them in a new mailnews/mimehdrparm (name and
location are tentative). Because some of them are utility functions
used by the rest of mailnews, I took some care to make them
available to mailnews 'directly' (not via xpcom). Perhaps, this
is not a good idea and I should expose them only thru xpcom.
I also have to make a couple of methods scriptable so that
Javascript can call them. I found that 'save target as'
takes a separate code path from 'Save' and is dealt with with JS.
Besides, it might be better to make 'service' do more work
than callers.
I haven't yet implemented getting the MIME charset/encoding
corresp. to the current locale and use that as the last resort
when MIME charset info. is not otherwise available/inferred.
There's no XP-way available at the moment...
Anyway, please, take a look and make some comments.
There are a lot of 'quirkiness' in a sense ...
Attachment #96925 -
Attachment is obsolete: true
Assignee | ||
Comment 13•22 years ago
|
||
A collection of new files in mimehdrparam module.
I'm not sure of the best way to build a lib for this
module. Currently, both dll/shared lib as a component
and static lib. are built. When I only built
static lib., do_GetService() somehow failed
(perhaps I should have rebuilt from the top of the
tree in that case). On the other hand, I couldn't
resolve symbols for mailnews (when it's enabled)
when mimehdrparam was built as dll only. I found
some old news postings about this in xpcom newsgroup.
Any advice/comment on this and other aspects would be appreciated a lot.
Assignee | ||
Comment 14•22 years ago
|
||
http://jshin.net/moztest/download.html (I don't have privilige to put
this in the URL field. Boris, can I take this bug?)
has some testing cases (the last section).
I'm going to revise the page so that it may not work sometimes.
BTW, i18n of 'save target as' is bug 158006. Once I make a couple
of methods exposed to JS, it should be easy to fix it.
Comment 15•22 years ago
|
||
oh. I didn't realize you did not have the privs to take the bug. Sorry about
that. Reassigning. Feel free to mark any of those as needing review from me
once you're somewhat happy with them. ;)
Assignee: bzbarsky → jshin
Priority: P3 → --
Target Milestone: mozilla1.3alpha → ---
Assignee | ||
Comment 16•22 years ago
|
||
Thank you, Boris for assigning this to me.
Anyway, before making some 'feature enhancements'
and structural changes I talked about, I need to ask a
few questions about the way things are laid out and
built.
What my patch as of now does are :
A. make a new nsIMimeHdrParam service with a single method ( for now)
to retrieve rfc2231/rfc2047-decoded values of mime header parameters
(in my first patch, the method was added to nsIMimeConverter service,
but it introduced mailnews dependency which needs to be avoided).
This module may have to be expanded later if it turns out that
Mozilla needs to generate (as opposed to parse) multipart/form-data
(with RFC 2231 style C-D header) for form submission
B. moves a dozen of functions out of libmime (comi18n.cpp and mimehdrs.cpp)
and put them into a new 'module' mimehdrparam. Tentatively, I put
it under mailnews, but it can go anywhere in the source tree.
An upside of putting it there is that its 'relation' with mailnews
is not entirely severed. On the other hand, it may be considered
'hackish' to compile in 'mailnews/mimehdrparam' without turning
on 'mailnews' when 'mailnews' is disabled. (mailnews/mimehdparam is
added as a 'tire-9' module so that it's always built).
C. most functions separated are not used in the rest of mailnews but
there are three or four of them required by mailnews. To make them
available to mailnews across 'module boundaries', I think there are
two options. This is an area where I need your opinion as well as
that of ducarroz and darin.
One is to add them as 'methods' to new nsIMimeHdrParam
and the other is to just expose them as 'extern' and link
mailnews (when enabled) with mimehdrparam lib(static). I took the
latter path. The latter, in turn, seems to be approachable
in two ways(or more ways ...). The first is to separate
a part that implements method(s) for nsIMimeHdrParam
from the part that need to be used by both MimeHdrParam
and mailnews and build dll/so(component) for the former and
static lib. for the latter. The path my latest patch took
is to leave them together, but to build both dll/so(component)
and a static lib. for the whole thing. I set both
IS_COMPONENT and BUILD_STATIC_LIBS in Makefile.in of
src directory of mimehdrparam) and it works fine
under Win2k(I'm gonna try it under Linux soon)
Before that, I also
tried FORCE_STATIC_LIBS without IS_COMPONENT but
it didn't work. (compilation went thru, but
|do_GetService| failed for nsIMimeHdrParam) I read quite a lot of news postings
and comments in bug 46775 about 'static components
for xpcom', but am not yet sure
which is the right/optimal way which wouldn't break build on all
platforms Mozilla is ported to. To make things more complicated,
libmime is a part of metamodule 'mail'.
As for item A, I guess I have a strong case. Now for item B
and C, I like to know what the module owner (ducarroz) as well as
Boris and darin thinks because they're about taking away something from mailnews.
I'm also taking a liberty to add waterson (hope he won't mind :-)) to
CC for 'xpcom build' issue.
Thanks again for reading and help in advance..
Priority: -- → P3
Target Milestone: --- → mozilla1.3alpha
Comment 17•22 years ago
|
||
ccing alecf too; since we want to try to avoid creating a whole new module...
Not sure whether mailnews would be ok with moving this into necko or something
like that....
Assignee | ||
Comment 18•22 years ago
|
||
Like Boris wrote, I'd like to hear from mailnews/xpcom experts as to how to
divide
things and where to put them.
In this patch, consumers of GetMimeHeaderParameter are relieved of
making sure the result is indeed in UTF-8. The chore is now taken care
by the method. I made it(actually there are two of them because
I'm not sure which string types are allowed in scriptable
methods. I found that nsAString works well instead of
wstring. Can I use nsACString as well in place of 'string') scriptable
and it works with Javascript. Javascript part fixes I18N issue
of 'save target as'(not completely but RFC 2231 compliant
cases are well taken care of) and C++ part fixes 'save'. Raw 8bit
chars are interpreted as UTF-8 at first and then as 'originCharset'
of the source URL if available. For textual types, I may also
try 'contentCharset' of the current channel(as opposed to
originCharset). . It appears easy
to do this in C++, but in JS, I need to dig up more.
However, checking contentCharset may not be such a good idea
because so many servers suck and don't emit the correct
MIME charset in HTTP header for textual types(text/*).
Featurewise ('save target as' was filed as a separate bug. it's bug 158006),
it does what I want it to do. However, I'm not sure how to put things
together or take them apart (?).
Attachment #106968 -
Attachment is obsolete: true
Attachment #106972 -
Attachment is obsolete: true
Comment 19•22 years ago
|
||
Jungshik, I am busy right now but I'll take a close look soon.
Comment 20•22 years ago
|
||
I haven/t reviewed the whole patch (I am not the right person for things else
than mailnews by the way) but mimehdrparam should not be part of mailnews. we
don't want the browser to have a dependency on mailnews! The goal of extracting
this code from mime was to not have a dependency on mime (which is part of
mailnews). This code should live in netwerk: maybe in netwerk/mime or in
netwerk/mimehdrparam
ALso, looks like you have extracted too much stuff especially stuff used only
for emails.
cc'ing nhotta for his advice about international issues
Comment 21•22 years ago
|
||
use netwerk/mime/ instead of creating a new subdirectory under netwerk/. thx!
Assignee | ||
Comment 22•22 years ago
|
||
Ducarroz and Darin,
Thank you for your comments/answers and I'm sorry for the late response.
>looks like you have extracted too much stuff especially stuff used only for emails.
Actually, I tried to extract the only part of mailnews that I need.
The only thing I think I extracted unnecessarily is the
content of 'nsMailHeader.h' most of which are only
for emails as you wrote. I'll put it back.
A dozen or so functions I took away from mimehdrs.cpp and comi18n.cpp
are not used by the rest of mailnews at least as of now. There are, however,
3 or 4 functions that are needed by both 'mimehdrparam'
and mailnews. They're the cause of my headache because
I have to make them available across module boundaries
(see comment #16, item C).
>use netwerk/mime/ instead of creating a new subdirectory under netwerk/.
I'll try, but at the moment, it seems easier to make a new subdirectory under
netwerk if that's not the only way to take care of issues I wrote about
in comment #16 (item A and B were resolved by your and Ducarroz's responses,
but to have a satisfactory resolution of C(in comment #16), I need some help from
xpcom expert. Alecf, can you help?
Assignee | ||
Updated•22 years ago
|
Comment 23•22 years ago
|
||
sure. A few general comments:
- stuff should not only not depend on mailnews, but it shouldn't even be in the
mailnews/ subdirectory if you expect it to build when mailnews is disabled.
- who is the primary consumer of this interface, or what is the primary function
of this code that's being moved out?
Is it network related? then it belongs in necko.dll or necko2.dll.
Is it international/charset processing stuff? Then it belongs in i18n.dll.
Is it for dealing with helper apps? then it belongs in the uriloader/ directory.
Some guidelines:
No matter where it goes, the only way to make methods available across modules
in xpcom is via an interface. If your utilities do not maintain state, or you
only need to maintain some singular state across the process, then consumers
will need to create it as a service.
I know that creating an object just to make a stateless call seems like
overkill, but its the only way for us to do it.. and it is done across the rest
of the product in various places. Linking against a new DLL is really a last
resort, and trying to manually load a DLL, find a symbol, etc, has about as much
static footprint overhead as going through XPCOM and getting a service.
Furthermore, I would like to see as absolutely minimum code as possible moved
out of mailnews and made available to the general public! And if it isn't mail
related, it shouldn't even have mail in the names of the files, classes, or
contractIDs.
Let me know if I can be of more help - or catch me on IRC and we can discuss it.
Status: NEW → ASSIGNED
Comment 24•22 years ago
|
||
> Is it network related?
Yes.
> Is it international/charset processing stuff?
Yes.
The code in question is parsing code for general network headers that allows
non-ascii header values (handles parsing, conversion to Unicode, etc). It'd be
fine in either of those modules....
We could either make this a service (which would take the header CString every
time) or an object (takes string once, parses once, returns many results).
Which one we choose should just depend on the usage patterns.
Assignee | ||
Comment 25•22 years ago
|
||
Finally, I managed to do what's long overdue. I'm
sorry for the delay and thank you all for your help.
I just want to put this somewhere safer than my disk :-).
This patch does everything I planned for this bug
(bug 158006 needs a small patch to JS code relying
on this patch) but a lot of clean-up and documentation
have yet to be done. I'll do it soon(perhaps early next week).
In the meantime, here's what this patch does:
0. add a new interface nsIEnsureUTF8 to
intl/uconv.
This is to avoid the code duplication. It's based
on Naoki's implementation in mailnews/compose.
At a couple of places, I need to use this
and this may be useful at other places as well.
1. add a new interface nsIMIMEHeaderParam and
its implementation to netwerk/mime.
It offers 4 methods, but only one of them
|getHeaderParam2| is for general consumption.
Other three are implemented because they're
not only needed internally but also used
heavily in mailnews/mime. They're implemented
by moving corresponding functions in
mailnews/mime/src (see below) and applying
a little cosmetic(??) touch and impedance
matching.
2. I redefined three functions in mailnews/mime/src
(in comi18n.cpp and mimehdrs.cpp) as wrappers
for three methods mentioned above.
3. Several static functions and arrays in comi18n.cpp
and mimehdrs.cpp were moved to nsMIMEHeaerParam
implementation. They're used by three methods
above, but are NOT called by any other functions
in mailnews so that it's safe to move them away
from mailnews.
4. At a couple of places in uriloader, |GetHeaderParam2|
is called to deal with RFC 2231 and RFC 2047 encoded
C-D filename/name parameter.
Please, take a look and give me some feedback.
There would be a lot of comments. I don't think
it's pretty as it is now. In many places,
I was caught in the middle between string/wstring
and ns*String especially because three methods
moved from mailnews/mime and other static functions
are almost entirely based on string and I didn't
want to modify them extensively (at least in the
first pass)
Attachment #107217 -
Attachment is obsolete: true
Comment 26•22 years ago
|
||
From a totally cursory look:
1) nsIEnsureUTF8.idl should not contain C++ goop.
2) It should not have tabs.
3) Why the [const] on the in params? What does that do?
4) What is the encoding of the return value of ensureUTF8Spec?
5) Could the nsEnsureUTF8 code use that proposed function to check for
UTF-8ness? Converting back and forth to UTF-16 seems silly....
6) nsUConvModule.cpp seems to have some leakage from a different patch
7) nsIMIMEHeaderParam.idl should have comments in JavaDoc format (starting with
/**) and all methods should be commented, as well as all params.
Indentation should be fixed.
I didn't read the code guts at all, pretty much; just the interfaces.
Comment 27•22 years ago
|
||
Comment on attachment 117077 [details] [diff] [review]
a new patch
NS_IENSUREUTF8_CONTRACTID
that is a bad name. the contract specifies an implementation, not an interface.
{249f52a3-2599-4b00-ba40-0481364831a2}
I don't think it's a good idea to give the interface the same id as the class.
+ AUTF8String convertToUTF8([const] in ACString aString,
"in nsACString" already makes it const (C++ header has const nsACString&)
+static NS_DEFINE_CID(kIEnsureUTF8IID, NS_IENSUREUTF8_IID);
+static NS_DEFINE_CID(kCharsetConverterManagerCID,
NS_ICHARSETCONVERTERMANAGER_CID);
please don't use CIDs, but contract ids instead.
+NS_IMETHODIMP nsEnsureUTF8::ConvertToUTF8
+(const nsACString &aString, const char *aCharset, PRBool aCheck,
+ nsACString &aUTF8String)
is there a special reason for putting the argument list onto a new line?
+ aUTF8String.Truncate(0);
remove the 0
+ aString.Equals(NS_ConvertUCS2toUTF8(NS_ConvertUTF8toUCS2(aString)))) {
do we _still_ not have a better way to do this?
+NS_IMETHODIMP nsEnsureUTF8::EnsureUTF8Spec
+(const nsACString &aSpec, const char *aCharset, nsACString &aUTF8Spec)
+ aUTF8Spec.Truncate(0);
again, no 0, and same question about argument list
+ AString getHeaderParameter2([const] in AUTF8String aHeaderVal,
why the 2? why the const?
+ /*
+ * char * mime_decode_filename(char *name, const char *charset,
+ * MimeDisplayOptions *opt)
looks like you should remove it.
Comment 28•22 years ago
|
||
Comment on attachment 117077 [details] [diff] [review]
a new patch
a few of my own comments:
- you need fresh license files from http://www.mozilla.org/MPL/boilerplate-1.1/
- don't use inout string - the memory management is really ugly. Instead use
inout AString - you'll find your code cleans up signifigantly. In fact, from
looking at the code, I can't tell if that really should be inout, or just out.
- when constructors and destructors have no body, you should inline them to let
the compiler optimize them out
- I'm really not a fan of all this code just copied and pasted in from mail..it
had previously been copied in from Netscape 4.x, and I'd really like to see it
streamlined and cleaned up.. there seems to be a lot of legacy stuff in there,
but I'm not sure.
Ugh, and all these "static char foo[256]" stuff - do we really need base64
decoding?
I guess before we just slap in all this code from mail, I want you (jshin) to
do a thorough code scrubbing to make sure you're only bringing in the
functionality you need, and that the code is factored properly (for instance,
do we already have a base64 decoder?)
Comment 29•22 years ago
|
||
>do we already have a base64 decoder
sure. NSPR has PL_Base64Decode.
http://lxr.mozilla.org/nspr/source/nsprpub/lib/libc/include/plbase64.h#68
Assignee | ||
Comment 30•22 years ago
|
||
Thank you all for comments.
bz> 2) It should not have tabs.
bz> 7) nsIMIMEHeaderParam.idl should have comments in JavaDoc format
Yup. these are all parts of clean-up/documentation I wrote about.
bz > 3) Why the [const] on the in params? What does that do?
cb> in nsACString" already makes it const
is there a special reason for putting the argument list onto a new line?
Thanks. I must have done too many copy'n'paste without thinking much.
cb> is there a special reason for putting the argument list onto a new line?
I thought I was just following the 'local' convention in the directory, but
I may have been mistaken because a second look a moment ago didn't show me
that pattern.
cb> + AString getHeaderParameter2([const] in AUTF8String aHeaderVal,
cb> why the 2?
Because there's another one with almost identical name that's only
used internally and by mailnews as a wrapper. Anyway, I'm not
so fond of various names used in my patch so that I'd listen
to you for better names.
bz> 4) What is the encoding of the return value of ensureUTF8Spec?
It's UTF-8. I'll make it clear.
bz> 5) Could the nsEnsureUTF8 code use that proposed function to check for
bz> UTF-8ness? Converting back and forth to UTF-16 seems silly....
cb> do we _still_ not have a better way to do this?
I fully agree. That's bug 191541 as you know :-) It's not there yet.
I love to use that. The reason I didn't rush to add that is
I could find only a few plaes in the entire tree that checks
UTF-8ness. With that implemented, probably more people will
begin to use it.
bz> 6) nsUConvModule.cpp seems to have some leakage from a different patch
I also noticed it and removed that in my tree.
al> do we really need base64 decoding?
Yes, we do for RFC 2047 decoding. Pls, note that the code is shared
by mailnews and necko. Even necko alone needs it because quite a lot of
broken http servers send out C-D filename param in RFC 2047 style
instead of RFC 2231 style.
cb> sure. NSPR has PL_Base64Decode.
Thanks. With this information,
I can remove the static array you hate along with |intl_mime_decode_b| :-)
al> I'm really not a fan of all this code just copied and pasted in from mail..
al> it had previously been copied in from Netscape 4.x, and I'd really like
al> to see it streamlined and cleaned up..
I agree with you. I'm not a fan of stuff I copied
and pasted with a minimal change for impedance matching and
filling up a couple of holes. I tried not to touch them
in this bug because I didn't want to introduce a regression
in mailnews.
al> there seems to be a lot of legacy stuff in there,
al> but I'm not sure.
Perhaps the coding style and things like that are from an
old C school. Feature-wise, I think most of them are still
necessary. However, it could well be safe
to drop 'HZ' check in |intl_is_utf8| (better, we can just use
IsUTF8 I wrote about above).
> I want you (jshin) to
> do a thorough code scrubbing to make sure you're only bringing in the
> functionality you need, and that the code is factored properly
I'll. I believe I haven't brought in anything unnecessary, but I'll
take another look. As for factoring, I'll remove
an static array for Q decoding(256byte saving) in my local copy. This code
doesn't belong where speed is critical so that I can replace array look-up
with a couple of bound checks.
Comment 31•22 years ago
|
||
> It's UTF-8. I'll make it clear.
Make it return a AUTF8String. That should make it plenty clear... ;)
Assignee | ||
Comment 32•22 years ago
|
||
>> It's UTF-8. I'll make it clear.
>Make it return a AUTF8String.
Ooops. It's not UTF-8 but it's US-ASCII. Because input urlSpec
is url-unescaped, converted to UTF-8 if necessary, and url-escaped
back before being returned. So I have to keep ACString.
> nsIEnsureUTF8.idl should not contain C++ goop
By this, I guess you meant removing CONTRACTID and
CID definitions for class implementing interface
and putting them in the header file
for a collection of CONTRACTIDs and CIDs for the
module the implementation belongs to (nsUConvModule in this case).
Do you want me to go ahead and make this header file and
move all #defines's for CONTRACTIDs and CIDs in intl/uconv/idl/*.idl
there? Hmm.. I'd rather not for this bug. Because then I have
to hunt for every single file refering to nsIxxxxx corresponding to
intl/uconv/idl/*idl and get them to include nsUConvCID.h(yet to
be created) as well.
In case of nsIMIMEHeaderParam.idl, I can just remove
C++ defines and put them in preexisting nsNetCID.h.
BTW, why do so many idl files have these defines? I'm aware
that no matter how many files may have this glitch, it doesn't
make the practice right, but am just curious.
Or, did you mean a totally different thing?
exist
Comment 33•22 years ago
|
||
Yeah, I meant the CID/Contractid. There's no particular need for a #define for
the contractid anyway; people should just use the string, imo. But yeah, you
could create such a file for uconv; no need to do major cleanup, just start with
putting these in there. ;)
As for why they're in IDL, someone who didn't have a clear understanding of the
difference between an interface and a contract decided it'd save time to do it
and the rest was history.
Assignee | ||
Comment 34•22 years ago
|
||
Thank you. I'll do it. BTW, I've just found what I'd been
looking for for a while (methods like ReplaceChar,
StripWhiteSpace..) that would simplify things a bit in my patch.
Why haven't they been advertised? Is it because new implementations
for them haven't yet been delivered?
Another question.. Is the regular expression engine used by Javascript
interpreter available in a form or another to other parts of mozilla?
Comment 35•22 years ago
|
||
Those functions tend to have somewhat slow implementations, so they're
discouraged except when they're clearly the best option... But if you're in code
where perf is not critical, go for it.
As for regexps... bug 106590
Comment 36•22 years ago
|
||
> Because there's another one with almost identical name that's only
>used internally and by mailnews as a wrapper.
To me, that sounds like a bad reason...
If it's only used internally, I don't think it should be advertised in an idl
file... and can mailnews directly use the other function?
Assignee | ||
Comment 37•22 years ago
|
||
> I don't think it should be advertised in an idl
> file... and can mailnews directly use the other function?
No. That's what I tried to do in my patch last year. Actually, I think
it's possible with some 'liking gymanstics', but Alec replied in comment #23
to my query about that possibility (comment #22 and comment #16 C)
that using an interface is the only (reasonable) way to make methods available
across modules although it sounds like an overkill.
Anyway, I'll try to come up with better names for both.
Assignee | ||
Comment 38•22 years ago
|
||
Many of issues raised about the prev. patch were addressed.
Among them are
- interface documentation
- interface definitions tightened up/cleaned up
- got rid of b-decoder and a related static array
(also gone are a static array for q-decoder
and other dispensable static functions using
utility functions I found in string library)
- C++ 'inflitration' in idl files
- functions/methods were renamed to make them
more in line with the local convention
- empty ctor/dtors inlined
- now uses |IsUTF8| (patch for bug 193142 is required)
.....
I didn't touch most of C-style string manipulation in code
brought in from mailnews (they've been changed and a couple
of loose ends were tightened). Alec, do you want me to
recast them in iterator and things like that? It might be nice,
but do we need to? You know, RFC 2047/2231 handling is
kinda 'black magic' (well, to exaggerate a bit) and I'm
reluctant to touch them in _this_ bug.
Anyway, please take a look and hunt for glitches, holes and ... :-)
Attachment #117077 -
Attachment is obsolete: true
Assignee | ||
Comment 39•22 years ago
|
||
Ooops. It's bug 191541 that deals with |IsUTF8|
Comment 40•22 years ago
|
||
Comment on attachment 117822 [details] [diff] [review]
a new patch
+#if 0
please don't add #if 0 without a comment saying why the block is not just
removed...
+ // assume UTF-8 if the spec contains unescaped non ASCII
+ if (!IsASCII(aSpec)) {
should you use IsUTF8 here?
In +nsMIMEHeaderParamImpl::GetParameter(const nsACString& aHeaderVal, :
+ char *med;
...
+ rv = GetParameterInt(PromiseFlatCString(aHeaderVal).get(), aParamName,
+ &charset, aLang, &med);
...
+ rv = DecodeParameter(nsCAutoString(med), charset, "", PR_FALSE, str1);
Can you use nsXPIDLCString instead of char* for |med|? (also, even if not, you
should use nsDependentCString instead of nsCAutoString in the DecodeParameter
line)
this would probably also be a good idea for charset
+nsMIMEHeaderParamImpl::GetParameterInt(const char *aHeaderValue,
is there a reason why you tend to declare variables some lines before you use
them?
+ NS_ASSERTION(!nsCRT::IsAsciiSpace(*str), "1.1 <rhp@netscape.com> 19 Mar
1999 12:00"); /* should be after whitespace already */
er, that assertion message looks very unhelpful. it seems the comment should be
the assertion message.
Comment 41•22 years ago
|
||
heh, i recognize that mailnews. the NS_ASSERTION ... rhp is because the code
used to PR_Assert and i got sick of my browser being unceremoneously killed by
silly asserts that i didn't understand. someone who understands the code is
welcome to replace the messages with a useful explanation, the messages are
there as they are because the change to NS_ASSERTION would make it relatively
hard for lazy people to find out who to blame and i didn't want people
complaining to me about them :).
Assignee | ||
Comment 42•22 years ago
|
||
Thank you for reviewing.
+ // assume UTF-8 if the spec contains unescaped non ASCII
+ if (!IsASCII(aSpec)) {
> should you use IsUTF8 here?
No. The question to ask is whether or not to keep that if-block.
By removing that line, we can guard ourselves against mistakes of
calling EnsureUTF8Spec with unescaped (bare/raw) non-UTF-8 spec.
As it is now, if a bare/raw Non-Ascii spec is passed, it just returns
the input without any change. I moved that part of the code from
what was originially written by Naoki.
Naoki, what would you say? The more I think about it, the more I feel
like either removing it (so that the input gets subjected to
checking/ensuring by the rest of the code) or at least calling
NS_EscapeURL so that it returns escaped URL.
As for #if 0, it was just left there because I wasn't sure whether this
bug would go before/after bug 191541. Needless to say, I'll either
remove #if 0 block or activate it(and remove the |IsUTF8| depending on
which patch gets committed first. Hopefully, bug 191541 goes before this one.
In a patch I'm going to upload in a moment,
I cleaned up the code a bit more : made NS_ASSERTION more informative,
used XPIDLString(along with replacing PR_Malloc with nsMemory::Alloc/Clone
and removing PR_FREEIF a few places) , added more comments to RFC 2231 decoding
routine,
moved some variable definitions closer to where they're used, changed
some variable names to better reflect their usage, etc.
Keywords: intl
Target Milestone: mozilla1.3alpha → mozilla1.4alpha
Assignee | ||
Comment 43•22 years ago
|
||
Attachment #117822 -
Attachment is obsolete: true
Assignee | ||
Comment 44•22 years ago
|
||
+#define REPLACEMENT_CHAR "\347\277\275" // UTF-8 encoding of U+FFFD
This is a mistaken I copied from mailnews without a thought. The first
byte should be '\357' (or \xEF).
Any thought, anyone? Fixing this would be a good news to non-English speakers.
Assignee | ||
Comment 45•22 years ago
|
||
Same patch, but I just removed license blocks and other non-essential parts
hoping that that may help reviewers. Well, it's still very long and
this may be harder to review than the original. Anyway, Alec, can you take a
look once more and tell me which part you want me to 'scrub' more?
Assignee | ||
Comment 46•22 years ago
|
||
I haven't made a transition to iterator, but everybody seems to agree that
we can deal with it in another bug after getting the required
feature right here.
Below is what I sent to Alec (off-line) to help him better understand my patch.
I'm adding it here to help others as well. BTW, xpfe dependency is to
fix bug 158006 (and perhaps one or two more in xpfe)
The dependency relation is shown below. I may have missed a couple,
but should be roughly right. (at least, I am not making
up non-existent dependency :-))
Mozilla 'modules':
uriload : getParameter
xpfe : getParameter
mailnews: getParameterInt, decodeRFC2047Header, decodeParameter
exposed methods:
getParameter : getParameterInt, decodeParameter
getParameterInt: [1]
decodeRFC2047Header: Is7bitCharset, DecodeRFC2047Str
decodeParameter : decodeRFC2047Header,
static functions:
DecodeRFC2047Str: DecodeQ, CopyRawHeader, ConvertHTab
CopyRawHeader:
ConvertHTab:
Is7bitCharset:
DecodeQ:
Assignee | ||
Updated•22 years ago
|
Attachment #117999 -
Attachment is obsolete: true
Assignee | ||
Comment 47•22 years ago
|
||
basically the same patch against a new tree.
corrected my mistake in converting Unicode replacement char. in
UTF-8 to octal representation.
There are a few superreviewers here on CC so that I don't know whom to ask for
r/sr
Moreover, this patch is across modules. alecf, bz, darin, please take this :-)
It's been sitting here long enough, hasn't it?
nhotta, can you see if my change is all right with nsSMTPService?
ducarroz, I carefully preserved mailnews behavior and played with
Mozilla-mail with this patch to make sure that I'm not breaking anything
(spam mail in various encodings/languages with various degree of
standard-compliance helped me a lot with this testing!!). Could you
take a look as well?
Thank you tons in advance.
P.S. my patch for bug 150008(blocked by this bug) just got sr and I'm hoping
to land this together with that patch.
Attachment #120825 -
Attachment is obsolete: true
Assignee | ||
Comment 48•22 years ago
|
||
Sorry for spamming. The bug blocked by this one for which I have a patch with sr is
bug 158006.
Comment 49•22 years ago
|
||
jshin, I'm not going to be able to review this in the near future... (like in
the next week at least).
Assignee | ||
Comment 50•22 years ago
|
||
In comment #47, I explained how various methods/static functions are used in
necko/xpfe.
Here's the picture on the mailnews side (see nsIMIMEHeaderParam.idl in my patch.).
getParameterInt() : does the job of |MimeHeaders_get_parameter| in
mailnews/mime/src/mimehdrs.cpp.
Now |MimeHeaders_get_parameter| is a wrapper for
nsMIMEHeaderParam::getParameterInt. Therefore,
mailnews code is not affected.
decodeRFC2047Header() : |Mime_DecodeMimeHeader| in mailnews/mime/src/mimehdrs.cpp
is now a wrapper for this.
decodeParameter() : |mime_decode_filename| in mailnews/mime/comi18n.cpp is now
a wrapper for this.
As I wrote in my earlier comments, static functions moved from mailnews are NOT
called
by other parts of mailnews.
Assignee | ||
Comment 51•22 years ago
|
||
*** Bug 205232 has been marked as a duplicate of this bug. ***
Assignee | ||
Comment 52•22 years ago
|
||
Comment on attachment 121646 [details] [diff] [review]
a new patch
Now that 1.4f branch was cut, let's try to make this bug not celebrate its
anniversary :-)
Attachment #121646 -
Flags: superreview?(alecf)
Attachment #121646 -
Flags: review?(cbiesinger)
Comment 53•22 years ago
|
||
Comment on attachment 121646 [details] [diff] [review]
a new patch
wow, this is a large patch... and I'm no module owner of any of this code...
but ok, I'll take a look at this patch.
In nsIEnsureUTF8.idl:
typo: "covnersion"
"|aSpec|(after" - missing space before the (
is there a reason this interface is not scriptable?
"The presence of non-ASCII characters is *blindly*
+ * regarded as an indication that your input spec is in unescaped UTF-8
"
don't we have an IsUTF8 funtion that you should use instead?
+ * @return empty if |aSpec| is in UTF-8.
+ * the URL-escaped converted string in ASCII if not.
not sure I like that. Why not always return a valid, UTF-8, escaped spec?
nsEnsureUTF8.cpp:
+ nsCAutoString inStr(aString);
um no. use this:
const nsAFlatCString& inStr = PromiseFlatCString(aString);
+ PRUnichar *ustr = (PRUnichar *) nsMemory::Alloc(dstLen * sizeof(PRUnichar));
might want to consider using an nsAutoPtr here
+NS_IMETHODIMP nsEnsureUTF8::ConvertToUTF8
+(const nsACString &aString, const char *aCharset, PRBool aCheck,
+ nsACString &aUTF8String)
hm, earlier you used a different way (return type on own line, parameters on
same line as function name), please be consistent
+ if (!IsASCII(aSpec)) {
+ aUTF8Spec = aSpec;
why not use IsUTF8 here?
+ return rv;
this will always be NS_OK at this point, I would therefore directly write
return NS_OK
Is there a special reason for mixing string/ACString types in the interfaces?
Why not stick to the string classes?
more later... note to self: netwerk/mime/src/nsMIMEHeaderParamImpl.cpp
Comment 54•22 years ago
|
||
another question - why not remove the #if 0 code in mailnews/mime/src/comi18n.cpp?
Assignee | ||
Comment 55•22 years ago
|
||
Thanks a lot for taking a look at this big patch (well, actually, it's not so
big as it looks because a lot of codes are moved to necko out of mailnews)
> why not remove the #if 0 code in mailnews/mime/src/comi18n.cpp?
I guess that's just kept by mistake while removing all other cases like that.
I'll get rid of it.
> "The presence of non-ASCII characters is *blindly*
> + * regarded as an indication that your input spec is in unescaped UTF-8"
> don't we have an IsUTF8 funtion that you should use instead?
See comment #42. I'm still waiting for Naoki's answer. Naoki, what would say to
my question in comment #42?
> * @return empty if |aSpec| is in UTF-8.
> * the URL-escaped converted string in ASCII if not.
> not sure I like that. Why not always return a valid, UTF-8, escaped spec?
I considered that, but I just left the way its original (by Naoki) does because
I wanted to minimize the change in other parts. Thinking again about it, it
doesn't seem to be a very good justification. There might have been other
reasons. I'll get back to this later.
> Is there a special reason for mixing string/ACString types in the interfaces?
> Why not stick to the string classes?
I'd love to, but I was caught in the middle between my desire to be consistent
and my desire to simplify 'calling sequences' (and avoid unnecessary conversion
between string and string class). I think some of them can be made more
consistent later when I do 'iterator-love', which I plan to in a separate bug.
As for other comments, I'll take care of them in next iteration once you're
through reviewing the current patch.
Comment 56•21 years ago
|
||
Comment on attachment 121646 [details] [diff] [review]
a new patch
nsMIMEHeaderParamImpl.cpp:
+ // retrun charset.)
typo
+ const PRBool aTryLocaleCharset,
why const?
+ * @param aDefaultCharset MIME charset to use in place of MIME charset
+ * specified in RFC 2047 style encoding
+ * when <code>aOverrideCharset</code> is set.
+ * @param aOverrideCharset When set, overrides MIME charset specified
+ * in RFC 2047 style encoding with
<code>aDefaultCharset</code>
Couldn't you assume that the presence of aDefaultCharset implies
aOverrideCharset? Or did I misunderstand the meaning of aOverrideCharset?
+ if (NS_SUCCEEDED(rv) && ensureUTF8 &&
no need to test for both here.
+ if (aEatContinuations) {
+ nsCAutoString temp(aResult);
+ temp.StripChars("\r\n");
+ aResult = temp;
+ }
this is not exactly the fastest way to do this...
|rv| seems to be unused in DecodeRFC2047Header.
+ nsCAutoString unQuoted(aParamValue);
+ unQuoted.ReplaceSubstring("\\\\", "\\");
+ unQuoted.StripChar('\\');
+ aResult = unQuoted;
also quite slow I guess... and, why bother replacing \\ with \ if you're
stripping all \ characters anyway?
+ if (NS_SUCCEEDED(rv) && !decoded.IsEmpty() && !decoded.Equals(aResult))
+ aResult = decoded;
I would think that comparing aResult and decoded is as costly as copying
decoded into aResult, so I wouldn't bother with the comparison.
+ out = dest = (char *)PR_Malloc(length + 1);
[...]
+ memset(out, 0, length + 1);
Use PR_Calloc:
http://www.mozilla.org/projects/nspr/reference/html/prmem2.html#21449
+PRBool Is7bitCharset(const char *input, PRUint32 len)
how does this compare to IsASCII? Also, what is HZ?
also, how about adding comments to the static functions about what they're
doing?
+ // Assume no more than 3X expansion due to UTF-8 conversion
is that a correct assumption?
uriloader/base/nsURILoader.cpp:
+ rv = mimehdrpar->GetParameter(disposition, "",
+ NS_LITERAL_CSTRING(""), PR_FALSE, nsnull, dispToken);
don't use tabs
Also, can't you remove more of nsURILoader? line 303ff seem to do the same job
as nsIMIMEHeaderParam::GetParameter.
nsExternalHelperAppService.cpp:
+ aChannel->GetURI(getter_AddRefs(mSourceUrl));
uh. you are touching mSourceURL here, the previous code is not. Is that safe?
In comi18n.cpp / MIME_DecodeMimeHeader:
Why bother with returnVal instead of directly returning nsnull?
+ if (NS_FAILED(rv) || !mimehdrpar)
no need to check both...
+ // I know this looks silly, but we have to do this to avoid
+ // replacing |PR_FREEIF(s)| at over a dozen places [...]
Uh. If I were to decide, I would replace the PR_FREEIF. in other words:
if you want to check this in with a review from me, replace the PR_FREEIF.
+ if (NS_FAILED(rv) || !mimehdrpar)
again only check one... I won't mention this again, please fix all occurences.
+ rv = mimehdrpar->DecodeParameter(nsCAutoString(name), charset,
+ opt->default_charset,
+ opt->override_charset, result);
indentation looks wrong here. Also, please use nsDependentCString instead of
nsCAutoString. Finally, no need for returnVal in this method, as that is always
nsnull.
done with the patch. marking review- due to the above issues.
Attachment #121646 -
Flags: review?(cbiesinger) → review-
Assignee | ||
Comment 57•21 years ago
|
||
Thanks a lot for your thorough review. This is a new patch addressing
your concerns. In what follows, I'm gonna reply to your comments only
if further explanation is necessary.
Two changes in nsIEnsureUTF8: Now ConvertToUTF8 has 'aSkipCheck'
(instead of 'aCheck') indicating whether ASCIIness/UTF8ness check has to
be skipped. Along with this, non-ASCII 7bit charsets are better handled
now. EnsureUTF8Spec now returns AUTF8String (unescaped UTF8 spec.)
Other changes were made to address Christian's concerns. Still other
changes are noted below.
> * @return empty if |aSpec| is in UTF-8.
> * the URL-escaped converted string in ASCII if not.
> not sure I like that. Why not always return a valid, UTF-8, escaped spec?
It turned out that I forgot to update the documentation after changing
the behavior of the method. Actually, your comment made me look
more closely at what it does and made a rather 'drastic' change in
|EnsureUTF8Spec|. Now its return type is AUTF8String and it always
returns _un_escaped UTF-8 spec. Checking out the way |spec| is used
and defined in nsIURI, I convinced myself that there's no need to
url-escape before returning.
> "The presence of non-ASCII characters is *blindly*
> + * regarded as an indication that your input spec is in unescaped UTF-8"
> don't we have an IsUTF8 funtion that you should use instead?
At least for now, I think I'd rather leave this alone. The assumption
behind that is that in Mozilla |spec| (of nsIURI) is always either in
UTF-8 (unescaped) or in other charset (*url-escaped*). By checking
out ASCIIness at the beginning, we're saving some cycles it'd take
us to check UTF8ness and possibly redundant un-escaping/conversion.
+ * @param aDefaultCharset MIME charset to use in place of MIME charset
+ * specified in RFC 2047 style encoding
+ * when <code>aOverrideCharset</code> is set.
+ * @param aOverrideCharset When set, overrides MIME charset specified
+ * in RFC 2047 style encoding with aDefaultCharset
> Couldn't you assume that the presence of aDefaultCharset implies
> aOverrideCharset? Or did I misunderstand the meaning of aOverrideCharset?
No, I can't. Sorry the documentation was not clear
enough. aDefaultCharset is used in two cases: 1. charset is not
available because raw/bare string without charset specified per rfc
2231/2047 is passed over. (our best guess is aDefaultCharset) 2. when
aOverrideCharset is set even with charset available. (charset is
overriden by aDefaultCharset.)
+ nsCAutoString temp(aResult);
+ temp.StripChars("\r\n");
+ aResult = temp;
> this is not exactly the fastest way to do this...
I'm aware of that :-) (see comment #35)
+ nsCAutoString unQuoted(aParamValue);
+ unQuoted.ReplaceSubstring("\\\\", "\\");
+ unQuoted.StripChar('\\');
+ aResult = unQuoted;
> also quite slow I guess... and, why bother replacing \\ with \ if you're
> stripping all \ characters anyway?
The logic was flawed and I replaced the above with iterator code.
+PRBool Is7bitCharset(const char *input, PRUint32 len)
> how does this compare to IsASCII? Also, what is HZ?
There are 7bit charsets that are NOT US-ASCII. Examples include
ISO-2022-JP(-2), ISO-2022-CN, HZ-GB2312 and ISO-2022-KR. ISO-2022-JP
is still widely used for Japanese emails/news postings and HZ-GB* was
popular in early 1990's for Chinese emails and news postings but still
is used occasionally. We should be careful not to treat ISO-2022-JP*
and HZ* as US-ASCII when dealing with mail/news. Your question insipired
me to do some more clean-up, better documentation and some changes in
nsEnsureUTF8::ConvertToUTF8.
+ // Assume no more than 3X expansion due to UTF-8 conversion
> is that a correct assumption?
Yes, 3x is sufficient. A couple of weeks ago, I estimated this
mulitplication factor again for another program.
1. no known ISO-8859-x/Windows-x/KOI8*/other single byte charsets can
encode non-BMP characters.
2. all BMP characters can be expressed in three or fewer bytes in UTF-8
3. ISO-2022-JP can encode half-width kana with a single byte (excluding
escape sequence) and a half-width kana takes 3bytes in UTF-8. The multiplier
is at most three.
4. GB18030 and EUC-JP(JISX 0213) can encode non-BMP characters (4bytes
in UTF-8), but their representations in GB18030 and EUC-JP(JIS X 0213)
take at least two bytes (actually three in both cases) so that the
multiplication factor is certainly smaller than 3.
5. In addition, we have extra room here because the input includes RFC 2047
overhead ('=?charset_name?......?=')
> Also, can't you remove more of nsURILoader? line 303ff seem to do the same
job
> as nsIMIMEHeaderParam::GetParameter.
Thanks for the catch. I removed the redundant code.
nsExternalHelperAppService.cpp:
+ aChannel->GetURI(getter_AddRefs(mSourceUrl));
> uh. you are touching mSourceURL here, the previous code is not. Is that safe?
I guess it's safe, but it's redudant. I changed it to use GetSource()
method that retrieves mSourceUrl.
+ // I know this looks silly, but we have to do this to avoid
+ // replacing |PR_FREEIF(s)| at over a dozen places [...]
> Uh. If I were to decide, I would replace the PR_FREEIF. in other words:
> if you want to check this in with a review from me, replace the PR_FREEIF.
I changed two out parameters (charset and language)
to be nsMemory::Free'd, which puts this function on par with other helper
functions that are wrappers over methods of nsIMIMEHeaderParam. Changing
return value as well is too risky because it's not always freed
immediately (sometimes it's hard to find where it's freed.). I'd end
up modifying more than 60 lines in over a dozen additional files in
mailnews/mime/src.
Attachment #121646 -
Attachment is obsolete: true
Assignee | ||
Comment 58•21 years ago
|
||
I don't know how much this diff. will help because it doesn't look like it's
easy to read..
Assignee | ||
Updated•21 years ago
|
Attachment #121646 -
Flags: superreview?(alecf)
Assignee | ||
Comment 59•21 years ago
|
||
Comment on attachment 124819 [details] [diff] [review]
a new patch addressing Chrsitian's concerns
Christian, can you take a look again? Hopefully, it won't take as long this
time.
Alec, will you take a look sometime later (perhaps after bug 206379 :-))?
Attachment #124819 -
Flags: superreview?(alecf)
Attachment #124819 -
Flags: review?(cbiesinger)
Comment 60•21 years ago
|
||
Comment on attachment 124819 [details] [diff] [review]
a new patch addressing Chrsitian's concerns
now that EnsureUTF8Spec() is unconditionally outputting to the destination
string (Which I think is the right thing, btw) I think it needs a new name. How
about convertToUTF8WithCharset(...);?
as for getParameterInt() that sounds like its getting an integer version. Don't
shorten "Int" to "Internal" - just make it getParameterInternal()
But can't we just describe it better? What's the difference between the two?
Also nsIEnsureUTF8 is a bad name for a class - "ensure" is a verb, classes
should be nouns, like "nsIUTF8Converter" or "nsIUTF8ConverterService" or
something.
Also, we now have a CopyUCS2toUTF8() that you should use - it goes through the
old code which does what you've written, but at least this way we can correctly
implement CopyUCS2toUTF8 later and get a perf. improvement
+ nsCAutoString tempStr(valueStart, valueEnd - valueStart);
+ tempStr.StripChars("\r\n");
+ *aResult = (char*)nsMemory::Clone(tempStr.get(), tempStr.Length() + 1);
use ToNewCString() here.
+ *aLang = (char *) nsMemory::Clone(sQuote1 + 1, sQuote2 - (sQuote1 +
1) + 1);
isn't there an nsCRT::strndup() or something? (if not, don't worry about it)
+ else if (ns != *aResult)
+ *aResult = ns;
don't bother checking, just assign.
+ // Assume no more than 3X expansion due to UTF-8 conversion
+ retbuff = (char *)PR_Malloc(3 * strlen(header) + 1);
it would be really nice to make this a nsCAutoString, or to just assign into
"aResult" - there is serious possibility here for buffer overflow, I think...
in fact if there is ANY way that the resulting string could be longer than 3x
the incoming string, we MUST switch to nsCAutoString.
the rest of the code looks ok, as best as I can tell. I'm going to sr- this for
now, but I think the cleanups should be pretty straight forward.
Attachment #124819 -
Flags: superreview?(alecf)
Attachment #124819 -
Flags: superreview-
Attachment #124819 -
Flags: review?(cbiesinger)
Assignee | ||
Comment 61•21 years ago
|
||
alecf, thank you for the review. This patch addresses all your concerns.
The name changes:
EnsureUTF8 -> UTF8ConverterService [1]
EnsureUTF8Spec -> ConvertURISpecToUTF8
ConvertToUTF8 -> ConvertStringToUTF8
getParameterInt -> GetParameterInternal
+ *aLang = (char *) nsMemory::Clone(sQuote1 + 1, sQuote2 -..
> isn't there an nsCRT::strndup() or something? (if not, don't worry > about
it)
There is, but I can't use it because nsCRT::strndup(char *) returns malloced
buffer (nsCRT::strndup(PRUnichar *) returns nsMemory::Alloc'd buffer) and I
need nsMemory::Alloc'd buffer.
BTW, GetParameterInternal does only RFC 2231 parsing while GetParameter does a
lot more(RFC 2047 fallback, fallback charset, raw 8bit string, etc). I'd not
have made the former a method if it had not been used extensively in mailnews.
I added a comment to this effect.
+ // Assume no more than 3X expansion due to UTF-8 conversion
+ retbuff = (char *)PR_Malloc(3 * strlen(header) + 1);
> it would be really nice to make this a nsCAutoString, or to just
> assign into
> "aResult" -
Although there's no possibility of the buffer overflow, this is a nice thing
to do anyway, which helped me cut down the code size more.
> there is serious possibility here for buffer overflow,
> I think...
> in fact if there is ANY way that the resulting string could be
> longer than 3x
> the incoming string, we MUST switch to nsCAutoString.
I can assure :-) you that (3 * inlen + 1) is the worst possible case. Note
that input string cannot be expanded in an arbitrary manner but can only be
expanded under our control. The worst case is when somebody gives us a string
made soley of octets illegal in both UTF-8 and defaultCharset. Another case is
when SCSU/BCSU is used and the sliding window of SCSU/BCSU is set into a
non-BMP block. However, currently Mozilla does not support SCSU/BCSU(see
intl/uconv. it may never.) so that even if Mozilla's given such a string, it
won't be interpreted that way (it'll be just treated as ASCII.)
[1] As for nsIUTF8ConverterService, I'm not sure if that's the best choice. At
the moment, the name fits what it does. However, I found that there are a few
static functions through out the tree that may benefit from a potential
expansion of this interface. They're ConvertToUnicode (in nsAppShell),
ConvertStringToUnicode (nsProfile) and ConvertStringToUTF8. The last one needs
only 'impedance' matching . If the former two's are also folded under this
interface, 'UTF8' in the name has to be changed. Unless it's an absolute evil
to lose the file history by renaming, I'd rather just go with
nsIUTF8ConverterService for this patch and will change the file(interface)
name later if we do add ConvertToUnicode.
Attachment #124819 -
Attachment is obsolete: true
Attachment #124823 -
Attachment is obsolete: true
Assignee | ||
Comment 62•21 years ago
|
||
Comment on attachment 124986 [details] [diff] [review]
a new patch per Alec's comment
Asking r/sr.
Other changes:
1. static ConvertHTab is now gone. It was used in two places. One of them was
replaced with ReplaceChar (it's slow, I know) and the other is now a simple
loop over a char. pointer.
2. Brand-new CopyUTF8toUTF16 is used in place of NS_ConvertUTF8toUCS2 in three
places. I also replaced NS_ConvertUCS2toUTF8 with CopyUTF16toUTF8 as suggested
by alecf.
Attachment #124986 -
Flags: superreview?(alecf)
Attachment #124986 -
Flags: review?(cbiesinger)
Comment 63•21 years ago
|
||
That all sounds great. If we're talking about general conversions (like
replacing the ConvertToUnicode()) another option might be to fix the
nsICharsetConverterManager methods to use nsAString/nsACString - the service was
written way before these strings were available to us.. that is why their API is
so obtuse.
I'll finish this review today...
Comment 64•21 years ago
|
||
hm, if UTF8ConverterService is a bad name because you might add non-utf8 unicode
methods, why not name it UnicodeConverterService?
I'll also try to finish the review later today.
Comment 65•21 years ago
|
||
because at that point it sure sounds a lot like nsICharsetConverterService
Assignee | ||
Comment 66•21 years ago
|
||
I shouldn't have opened a 'can of worms' :-). For this bug, let's just go with
nsUTF8ConverterService because that name represents what it does now well
although we have to risk losing the file change history later when we decide to
expand this interface and change the file/interface name. See also bug 54857.
Something similar to Alec's alternative was briefly mentioned there.
TIA for reviews ^-^
Comment 67•21 years ago
|
||
Comment on attachment 124986 [details] [diff] [review]
a new patch per Alec's comment
just noticed some nits in nsIUTF8ConverterService.idl:
+ * Portions created by the Initial Developer are Copyright (C) 2002
+ * the Initial Developer. All Rights Reserved.
we have 2003 now :)
+ */
+
+ AUTF8String convertStringToUTF8(in ACString aString,
The AUTF8String line is a bit too far indented, should be two spaces less (also
the arguments for this function are indentet too much)
same for the other function on this interface
ok now to more serious stuff:
+ nsCOMPtr<nsIAtom> charsetAtom;
+ rv = ccm->GetCharsetAtom2(aCharset, getter_AddRefs(charsetAtom));
isn't alecf rewriting that stuff in another bug?
would it make sense to wait with landing this patch until that other patch is
ready, and use the new code?
+#define REPLACEMENT_CHAR "\357\277\275" // EF BF BD (UTF-8 encoding of U+FFFD)
would it be useful to mention what character U+FFFD is ("unknown character")?
mailnews/mime/src/mimehdrs.cpp
+ return NS_SUCCEEDED(rv) ? PL_strdup(result.get()) : nsnull;
in comi18n.cpp, you used an explicit if instead of ?:... is there a special
reason for that? it would be good if you could be consistent, unless of course
you did that to be consistent with the current style of the file.
other than that, you can have r=biesi; assuming that my review is acceptable to
alecf, given that I'm no module owner.
Attachment #124986 -
Flags: review?(cbiesinger) → review+
Assignee | ||
Comment 68•21 years ago
|
||
Thanks a lot for the review.
> isn't alecf rewriting that stuff in another bug?
> would it make sense to wait with landing this patch until that other
> patch is ready, and use the new code?
Absolutely. That's my plan. I find it a bit hard to shuttle between two
versions(one with his patch and the other without) so that I just uploaded one
with the old interface. The effect of alecf's patch on this patch would be very
small (a few changes in Makefile.in, include files and just one change in
charset. cvt. manager invocation you spotted.
I'll fix the indentation and "if vs '? :". As for U+FFFD, its name (Unicode
name) is 'REPLACEMENT CHARACTER' :-) See
http://www.unicode.org/charts/PDF/UFFF0.pdf In Mozilla and elsewhere, U+FFFD is
used to mark unknown characters, invalid byte sequences, and so forth.
Assignee | ||
Comment 69•21 years ago
|
||
While testing on Win2k, I found a problem I overlooked on Linux(I should have
paid attention to NS_ASSERT passing by in my terminal). 'NS_ASSERT' on Win32 is
so annoying that I can never miss it :-) After fixing this, it works as well on
Win2k as on Linux.
In the following two places, I had to replace nsDependent(C)String with
Substring() as suggested in nsDependentString.h.
In nsUTF8ConverterService.cpp
> + rv = unicodeDecoder->Convert(inStr.get(), &srcLen, ustr, &dstLen);
> + if (NS_SUCCEEDED(rv))
> + CopyUTF16toUTF8(nsDependentString(ustr, dstLen), aResult);
+ CopyUTF16toUTF8(Substring(ustr, ustr + dstLen - 1), aResult);
In nsMIMEHeaderParamImpl.cpp
> + cvtUTF8->ConvertStringToUTF8(nsDependentCString(aInput, aLen),
> + aDefaultCharset, skipCheck, utf8Text))) {
+ cvtUTF8->ConvertStringToUTF8(Substring(aInput, aInput + aLen - 1),
+ aDefaultCharset, skipCheck, utf8Text))) {
Assignee | ||
Comment 70•21 years ago
|
||
> cvtUTF8->ConvertStringToUTF8(Substring(aInput, aInput + aLen - 1)
ooops. in both cases, '-1' should not be there. I didn't check out the def. of
Substring().
Another loose end found by running on Windows : In nsMIMEHeaderParamImpl.cpp :
+ if (str == start)
+ return NS_ERROR_UNEXPECTED;
+ *aResult = (char *) nsMemory::Clone(start, (str - start) + 1);
+ NS_ENSURE_TRUE(*aResult, NS_ERROR_OUT_OF_MEMORY);
I forgot to null-terminate.
(*aResult)[str - start] = '\0'; // null-terminate
Certainly, it looks like another case for nsACString. I'll switch to it when
doing 'iterator-love'
Comment 71•21 years ago
|
||
Comment on attachment 124986 [details] [diff] [review]
a new patch per Alec's comment
+// XXX Do we have to check UTF-7? To be perfectly safe, we have to, but
+// who uses UTF-7 in the mail header?
should probably check with bienvenu@netscape.com to see - IMAP uses UTF7 I
believe.
+ // B. title*=us-ascii'en-us'This%20is%20weired.
perhaps you meant "wierd"? :)
+// XXX : aTryLocaleCharset is not yet effective.
should you NS_ASSERTION() if someone tries to use it then?
other than that, this all looks good!
sr=alecf with the above changes/updates
Attachment #124986 -
Flags: superreview?(alecf) → superreview+
Assignee | ||
Comment 72•21 years ago
|
||
Thank you for sr.
I'll add NS_ASS..() and fix spelling errors. I'll land it with a necessary
change after you check in charsetAtom removal patch.
As for UTF-7, a modified form of UTF-7 is used for IMAP4rev? folder names, but
my code is a world apart from the code handling IMAP folder names (it's about
decoding the message headers : RFC 2047/2231). Maybe I'd better remove the
comment because 'perfect' there means not failing even once in 10^30(?) trials :-).
Assignee | ||
Comment 73•21 years ago
|
||
alec>+// XXX Do we have to check UTF-7? To be perfectly safe, we have to, but
alec>+// who uses UTF-7 in the mail header?
alec>should probably check with bienvenu@netscape.com to see
js> comment because 'perfect' there means not failing even once in
js> 10^30(?) trials :-)
Oops. Sorry I misread the 'context'. I thought the code in question is about
being generous in what you accept (that is, even though it's untagged
_violating_ the standard so that we have no oblication, we try our best to
figure out what's meant by a sender) but it turns out that the code is used when
we handle 'standard-compliant' cases (properly tagged). Still, the chance of
getting something like the following
Subject: =?UTF-7?Q?+rACsAQ?= for 가각
Subject : =?UTF-7?Q?caf+AOk?= for café (set encoding to UTF-8)
Or
Content-Disposition: attachment; filename*=utf-7''caf+AOk.txt
is not much higher than 1 in 10^30 :-). However, if somebody points out that
it's a bug for Mozilla not to decode the above. I can't help admitting that she
is correct in principle although I'm tempted to counter that nobody in their
right mind would use it. How much are we willing to pay for this? Well, it's
another strcasecmp() and I'll add that to be correct for even very remotely
possible cases.
bienvenu, nhotta and duccaroz, what would you say?
Comment 74•21 years ago
|
||
Hmm... So I just looked at the nsURILoader and nsExternalHelperAppService
changes (the only ones I care about, really) and it looks like both have the
same problem -- passing a length to EqualsIgnoreCase will return true if that
many chars match.... so "inlines" and "inline" are equal if you pass 6 as the
second arg to EqualsIgnoreCase, no?
One other question I have is why we pass the origin charset in when decoding the
filename in nsExternalHelperAppService but not when getting the disposition
type... is there a reason?
Other than that, the code refactoring looks awesome.
Assignee | ||
Comment 75•21 years ago
|
||
Fix was checked a while ago.
Thanks for the catch.
Instead of |dispToken.EqualsIgnoreCase("inline", 6)|, I should have used
|dispToken.EqualsIgnoreCase("inline", dispToken.Length())|, shouldn't I?
I'll make a quick patch. Somebody might say that Mozilla should be generous to
accept 'inlines' as well, but I guess in this case, we'd rather play by the
rules :-)
As for not passing originCharset, we don't have to when fetching the
disposition type which is always in US-ASCII (unless somebody makes an
'over-i18n' proposal and IETF accepts it ;-)). In case of filename parameter,
there are a lot of broken CGI programs (perl, jsp, asp) in the wild that emit
raw 8bit characters expecting clients to interpret them as encoded in
originCharset. So, we have to 'bend' the rules and be a bit accomodating....
Target Milestone: mozilla1.4alpha → mozilla1.5alpha
Comment 76•21 years ago
|
||
jshin: well if you're at that, why not just don't pass a length?
Assignee | ||
Comment 77•21 years ago
|
||
let me get quick r/sr
Comment 78•21 years ago
|
||
Comment on attachment 125584 [details] [diff] [review]
uriloader : 4-line fix
sr=me. This is exactly what I wanted; passing the other length would be just
as bad when dispToken is too short...
Attachment #125584 -
Flags: superreview+
Comment 79•21 years ago
|
||
Did the checkin for this cause bug 209329 (mailto links don't work at all)?
Comment 80•21 years ago
|
||
Comment on attachment 125584 [details] [diff] [review]
uriloader : 4-line fix
looks right
Attachment #125584 -
Flags: review+
Assignee | ||
Comment 81•21 years ago
|
||
Fix checked in. Thank you all! I'm done with this one.
Status: ASSIGNED → RESOLVED
Closed: 21 years ago
Resolution: --- → FIXED
Comment 82•21 years ago
|
||
This fix broke mailto urls in mailnews. nsSmtpService.cpp ends up setting an
empty spec on new mailto urls. Did you test this before checking in your change
to the smtp service? =(
=)
Comment 83•21 years ago
|
||
Comment 84•21 years ago
|
||
Comment on attachment 125849 [details] [diff] [review]
fix to make mailto urls work in mail again
sr=bienvenu
Attachment #125849 -
Flags: superreview+
Comment 85•21 years ago
|
||
mscott: please see bug 209328 which covers mailto uris not working
Assignee | ||
Comment 86•21 years ago
|
||
This patch is not complete although it is simple and works on the surface.
Please, see bug 209328.
Comment 87•21 years ago
|
||
well, that patch was checked in anyway:
06/17/2003 22:34 scott%scott-macgregor.org mozilla/ mailnews/ compose/ src/
nsSmtpService.cpp 1.125 1/1 fix the regression caused by Bug #162765 which
broke mailto urls.
without any review or super-review
Comment 88•21 years ago
|
||
It had sr, was written by the module peer, and fixed a bug that is still open 5
days after filing when it should have been a smoketest blocker, imo....
Assignee | ||
Comment 89•21 years ago
|
||
Sorry for the regession. BTW, please note that the patch was uploaded within 7
hrs of the report. Perhaps, I should have asked people on mailnews for r/sr. (my
patch uploaded to bug 209328 is across modules unlike the one liner just
checked in). Anyway, I'll take care of the rest in bug 209328 as soon as I get r.
Comment 90•21 years ago
|
||
To answer Christian's comments, I checked this in right away because it fixed a
nasty regression in mail which I think should have been a blocker and caught
during the testing of the patch before it went in.
Now that it's in, feel free to change how this was fixed via Bug #209328 as you
pointed out.
Assignee | ||
Comment 91•21 years ago
|
||
*** Bug 206788 has been marked as a duplicate of this bug. ***
Comment 92•18 years ago
|
||
It looks like I'm facing a similar bug to bug 206788 that is duplicate of this one.
Please take a look at bug 353113. Thanks.
Updated•8 years ago
|
Product: Core → Core Graveyard
You need to log in
before you can comment on or make changes to this bug.
Description
•