Closed Bug 162765 Opened 22 years ago Closed 21 years ago

RFC2231 filename support for nsExternalAppHandler

Categories

(Core Graveyard :: File Handling, defect, P3)

defect

Tracking

(Not tracked)

RESOLVED FIXED
mozilla1.5alpha

People

(Reporter: nhottanscp, Assigned: jshin1987)

References

()

Details

(Keywords: intl)

Attachments

(4 files, 11 obsolete files)

(deleted), patch
Details | Diff | Splinter Review
(deleted), patch
Biesinger
: review+
Details | Diff | Splinter Review
(deleted), patch
Biesinger
: review+
bzbarsky
: superreview+
Details | Diff | Splinter Review
(deleted), patch
Bienvenu
: superreview+
Details | Diff | Splinter Review
http://www.ietf.org/rfc/rfc2231.txt separated from bug 161174, a filename in HTTP header may be encoded in RFC2231 form, currently Mozilla do not support that for file download.
I filed bug 155949 and added pretty extensive information on the issue. IMHO, it's better to make this one as a duplicate of bug 155949. The other way around is all right with me, but in that case what I wrote there would have to be copied here, which seems to me to be sort of waste of time and resource.
*** Bug 155949 has been marked as a duplicate of this bug. ***
OS: Windows 2000 → All
Priority: -- → P3
Hardware: PC → All
Target Milestone: --- → mozilla1.3alpha
Attached patch a patch (obsolete) (deleted) — Splinter Review
With this patch, the last 5 cases at http://jshin.net/moztest/download.html work as expected/intended. If RFC 2231 compliant C-D http header is present, charset is read off and the parameter value is decoded accordingly. When raw 8bit characters are used without charset info. in C-D header, UTF-8 is tried first and if that does not work, the document charset (originCharset) is used instead. A couple of issues: EnsureUTF8Filename() was copied almost verbatim (aside from URL encoding/decoding) from nhotta's EnsureUTF8Spec() in nsSmtpService.cpp. It might be a good idea to have EnsureUTF8String() somewhere in intl/uconv to avoid cut'n'paste of the code. We may want to have 'IsUTF8()' that works simliarly to 'IsASCII()'. (I saw in one of bugs related to this that 'IsUTF8()' was briefly discussed, but forgot which) To me converting to UCS2 and then back to UTF-8 and comparing the result with the original string does not seem to be a very good idea. And, there may be other problems as well. It'd be nice to hear feedback, positive or negative.
One more issue which I think has to be dealt with separately from this one is what to do when the local file system encoding cannot represent characters in a suggested filename. Users can for sure override, but my test result of the page mentioned previously can be confusing to users. After verifying the patch works fine under ko_KR.UTF-8 locale, I ran Mozilla under C locale(Linux) and obviously Korean characters in the URL aforementioned cannot be represented in C locale. Without any other option, Mozilla assumes that the local file system encoding is the encoding of the locale. However, the filepicker dialog shows a Korean filename as if Mozilla can use it. When I press 'OK' button, Mozilla falls back to 'something' and the download manager uses that fallback. A more gracious and user-friendly approach is necessary here.
One thing that sorta leaps out at me there is that this makes uriloader depend on a mailnews module... won't this break when we build with --disable-mailnews?
> this makes uriloader depend > on a mailnews module... won't this break when we build with --disable-mailnews? I never thought about it. I guess that's likely.(I wish we had a generic mime module not tied to mailnews.) Then, I looked at Makefile.in in mailnews/mime/build and it does not depend on mailnews. The dependency is the other way around (mailnews/base/build). Now the question is what exactly --disable-mailnews disables.... If indeed that dependency breaks a build with mailnews diabled, what would you suggest? Perhaps, we just have to go back to nhotta's original code and build upon it to support RFC 2231. Or just copy'n'past two functions in mimehdrs.cpp into nsExternalHelperAppService.cpp although that may be pretty ugly(?). Before pondering further, let's get the definitive answer on the dependency issue. well, I can clobber and build with --disable-mailnews. I'm gonna ask the owner of mime module..
The simplest solution would likely be to move this stuff into uconv or necko and have both mailnews and uriloader is it from there....
A problem with moving this stuff out of mime module into uconv or necko is that 'MimeHeaders_get_parameter()' is invoked 50+ times in mime module. That, by itself, may not be a big problem (I believe none of them is used in a context critical to the performance) except that it's tedious to change them all. Another function, mime_decode_filename() refered to in GetMimeHeaderParameter() is invoked only 7times, but dividing it into a part not depending on mime module and the other dependent on it is a bit ugly although doable. If we wanna go down that path, I guess we definitely need to talk to the owner of mime module, ducarroz@netscape.com. I'm now adding ducarroz to cc.
QA Contact: sairuh → petersen
Blocks: 177840
I took another look at this bug and at first it looked like I could just move three functions out of mimehdrs.cpp and put them in uconv or necko. It turned out that the extent to which things depend on each ohter makes it hard to separate cleanly what I need from the rest of mailnews/mime. Now I'm considering moving the whole mailnews/mime out of mailnews and putting it under necko or somewhere else. That may have been what Boris meant in comment #7. Anyway, ducarroz and Boris, how does it sound? This is a big change (and the cvs admin. wouldn't happy for havig to do extensive operation to keep 'history' intact after moving so many files....).
That's sort of what I meant, yes. I'm not sure what the mailnews team's take on that is.... ccing darin also, since he should certainly know about changes like this. :) If we can't reuse the mailnews code, I would opt for a helper function somewhere (sort of like the mimetype-parsing function we have) that would take a string and parse out the disposition and filename.... that would allow us to save some already-duplicated code in uriloader and exthandler.
I tried compiling in mailnews/mime even when mailnews is disabled (by tweaking a couple of Makefile.in and allmakefiles.sh. this way, I thought we'd be able to avoid 'cvs surgery'), but found that 'mime' has a dependency on the rest of mailnews (there's only one dependency, but it's rather hard to remove it). So, that doesn't work. Instead, I'm going for the second option while trying to resue mailnews/mime code whenever possible by making a new directory next to mailnews/mime and putting there what we need _outside_ mailnews. What's in there will be compiled regardless of whether mailnews is enabled or not.
Attached patch a path (with mailnews dependency removed) (obsolete) (deleted) — Splinter Review
I moved what's necessary for this out of mailnews/mime and put them in a new mailnews/mimehdrparm (name and location are tentative). Because some of them are utility functions used by the rest of mailnews, I took some care to make them available to mailnews 'directly' (not via xpcom). Perhaps, this is not a good idea and I should expose them only thru xpcom. I also have to make a couple of methods scriptable so that Javascript can call them. I found that 'save target as' takes a separate code path from 'Save' and is dealt with with JS. Besides, it might be better to make 'service' do more work than callers. I haven't yet implemented getting the MIME charset/encoding corresp. to the current locale and use that as the last resort when MIME charset info. is not otherwise available/inferred. There's no XP-way available at the moment... Anyway, please, take a look and make some comments. There are a lot of 'quirkiness' in a sense ...
Attachment #96925 - Attachment is obsolete: true
A collection of new files in mimehdrparam module. I'm not sure of the best way to build a lib for this module. Currently, both dll/shared lib as a component and static lib. are built. When I only built static lib., do_GetService() somehow failed (perhaps I should have rebuilt from the top of the tree in that case). On the other hand, I couldn't resolve symbols for mailnews (when it's enabled) when mimehdrparam was built as dll only. I found some old news postings about this in xpcom newsgroup. Any advice/comment on this and other aspects would be appreciated a lot.
http://jshin.net/moztest/download.html (I don't have privilige to put this in the URL field. Boris, can I take this bug?) has some testing cases (the last section). I'm going to revise the page so that it may not work sometimes. BTW, i18n of 'save target as' is bug 158006. Once I make a couple of methods exposed to JS, it should be easy to fix it.
oh. I didn't realize you did not have the privs to take the bug. Sorry about that. Reassigning. Feel free to mark any of those as needing review from me once you're somewhat happy with them. ;)
Assignee: bzbarsky → jshin
Priority: P3 → --
Target Milestone: mozilla1.3alpha → ---
Thank you, Boris for assigning this to me. Anyway, before making some 'feature enhancements' and structural changes I talked about, I need to ask a few questions about the way things are laid out and built. What my patch as of now does are : A. make a new nsIMimeHdrParam service with a single method ( for now) to retrieve rfc2231/rfc2047-decoded values of mime header parameters (in my first patch, the method was added to nsIMimeConverter service, but it introduced mailnews dependency which needs to be avoided). This module may have to be expanded later if it turns out that Mozilla needs to generate (as opposed to parse) multipart/form-data (with RFC 2231 style C-D header) for form submission B. moves a dozen of functions out of libmime (comi18n.cpp and mimehdrs.cpp) and put them into a new 'module' mimehdrparam. Tentatively, I put it under mailnews, but it can go anywhere in the source tree. An upside of putting it there is that its 'relation' with mailnews is not entirely severed. On the other hand, it may be considered 'hackish' to compile in 'mailnews/mimehdrparam' without turning on 'mailnews' when 'mailnews' is disabled. (mailnews/mimehdparam is added as a 'tire-9' module so that it's always built). C. most functions separated are not used in the rest of mailnews but there are three or four of them required by mailnews. To make them available to mailnews across 'module boundaries', I think there are two options. This is an area where I need your opinion as well as that of ducarroz and darin. One is to add them as 'methods' to new nsIMimeHdrParam and the other is to just expose them as 'extern' and link mailnews (when enabled) with mimehdrparam lib(static). I took the latter path. The latter, in turn, seems to be approachable in two ways(or more ways ...). The first is to separate a part that implements method(s) for nsIMimeHdrParam from the part that need to be used by both MimeHdrParam and mailnews and build dll/so(component) for the former and static lib. for the latter. The path my latest patch took is to leave them together, but to build both dll/so(component) and a static lib. for the whole thing. I set both IS_COMPONENT and BUILD_STATIC_LIBS in Makefile.in of src directory of mimehdrparam) and it works fine under Win2k(I'm gonna try it under Linux soon) Before that, I also tried FORCE_STATIC_LIBS without IS_COMPONENT but it didn't work. (compilation went thru, but |do_GetService| failed for nsIMimeHdrParam) I read quite a lot of news postings and comments in bug 46775 about 'static components for xpcom', but am not yet sure which is the right/optimal way which wouldn't break build on all platforms Mozilla is ported to. To make things more complicated, libmime is a part of metamodule 'mail'. As for item A, I guess I have a strong case. Now for item B and C, I like to know what the module owner (ducarroz) as well as Boris and darin thinks because they're about taking away something from mailnews. I'm also taking a liberty to add waterson (hope he won't mind :-)) to CC for 'xpcom build' issue. Thanks again for reading and help in advance..
Priority: -- → P3
Target Milestone: --- → mozilla1.3alpha
ccing alecf too; since we want to try to avoid creating a whole new module... Not sure whether mailnews would be ok with moving this into necko or something like that....
Like Boris wrote, I'd like to hear from mailnews/xpcom experts as to how to divide things and where to put them. In this patch, consumers of GetMimeHeaderParameter are relieved of making sure the result is indeed in UTF-8. The chore is now taken care by the method. I made it(actually there are two of them because I'm not sure which string types are allowed in scriptable methods. I found that nsAString works well instead of wstring. Can I use nsACString as well in place of 'string') scriptable and it works with Javascript. Javascript part fixes I18N issue of 'save target as'(not completely but RFC 2231 compliant cases are well taken care of) and C++ part fixes 'save'. Raw 8bit chars are interpreted as UTF-8 at first and then as 'originCharset' of the source URL if available. For textual types, I may also try 'contentCharset' of the current channel(as opposed to originCharset). . It appears easy to do this in C++, but in JS, I need to dig up more. However, checking contentCharset may not be such a good idea because so many servers suck and don't emit the correct MIME charset in HTTP header for textual types(text/*). Featurewise ('save target as' was filed as a separate bug. it's bug 158006), it does what I want it to do. However, I'm not sure how to put things together or take them apart (?).
Attachment #106968 - Attachment is obsolete: true
Attachment #106972 - Attachment is obsolete: true
Jungshik, I am busy right now but I'll take a close look soon.
I haven/t reviewed the whole patch (I am not the right person for things else than mailnews by the way) but mimehdrparam should not be part of mailnews. we don't want the browser to have a dependency on mailnews! The goal of extracting this code from mime was to not have a dependency on mime (which is part of mailnews). This code should live in netwerk: maybe in netwerk/mime or in netwerk/mimehdrparam ALso, looks like you have extracted too much stuff especially stuff used only for emails. cc'ing nhotta for his advice about international issues
use netwerk/mime/ instead of creating a new subdirectory under netwerk/. thx!
Ducarroz and Darin, Thank you for your comments/answers and I'm sorry for the late response. >looks like you have extracted too much stuff especially stuff used only for emails. Actually, I tried to extract the only part of mailnews that I need. The only thing I think I extracted unnecessarily is the content of 'nsMailHeader.h' most of which are only for emails as you wrote. I'll put it back. A dozen or so functions I took away from mimehdrs.cpp and comi18n.cpp are not used by the rest of mailnews at least as of now. There are, however, 3 or 4 functions that are needed by both 'mimehdrparam' and mailnews. They're the cause of my headache because I have to make them available across module boundaries (see comment #16, item C). >use netwerk/mime/ instead of creating a new subdirectory under netwerk/. I'll try, but at the moment, it seems easier to make a new subdirectory under netwerk if that's not the only way to take care of issues I wrote about in comment #16 (item A and B were resolved by your and Ducarroz's responses, but to have a satisfactory resolution of C(in comment #16), I need some help from xpcom expert. Alecf, can you help?
Depends on: 158006
Blocks: 158006
No longer depends on: 158006
sure. A few general comments: - stuff should not only not depend on mailnews, but it shouldn't even be in the mailnews/ subdirectory if you expect it to build when mailnews is disabled. - who is the primary consumer of this interface, or what is the primary function of this code that's being moved out? Is it network related? then it belongs in necko.dll or necko2.dll. Is it international/charset processing stuff? Then it belongs in i18n.dll. Is it for dealing with helper apps? then it belongs in the uriloader/ directory. Some guidelines: No matter where it goes, the only way to make methods available across modules in xpcom is via an interface. If your utilities do not maintain state, or you only need to maintain some singular state across the process, then consumers will need to create it as a service. I know that creating an object just to make a stateless call seems like overkill, but its the only way for us to do it.. and it is done across the rest of the product in various places. Linking against a new DLL is really a last resort, and trying to manually load a DLL, find a symbol, etc, has about as much static footprint overhead as going through XPCOM and getting a service. Furthermore, I would like to see as absolutely minimum code as possible moved out of mailnews and made available to the general public! And if it isn't mail related, it shouldn't even have mail in the names of the files, classes, or contractIDs. Let me know if I can be of more help - or catch me on IRC and we can discuss it.
Status: NEW → ASSIGNED
> Is it network related? Yes. > Is it international/charset processing stuff? Yes. The code in question is parsing code for general network headers that allows non-ascii header values (handles parsing, conversion to Unicode, etc). It'd be fine in either of those modules.... We could either make this a service (which would take the header CString every time) or an object (takes string once, parses once, returns many results). Which one we choose should just depend on the usage patterns.
Attached patch a new patch (obsolete) (deleted) — Splinter Review
Finally, I managed to do what's long overdue. I'm sorry for the delay and thank you all for your help. I just want to put this somewhere safer than my disk :-). This patch does everything I planned for this bug (bug 158006 needs a small patch to JS code relying on this patch) but a lot of clean-up and documentation have yet to be done. I'll do it soon(perhaps early next week). In the meantime, here's what this patch does: 0. add a new interface nsIEnsureUTF8 to intl/uconv. This is to avoid the code duplication. It's based on Naoki's implementation in mailnews/compose. At a couple of places, I need to use this and this may be useful at other places as well. 1. add a new interface nsIMIMEHeaderParam and its implementation to netwerk/mime. It offers 4 methods, but only one of them |getHeaderParam2| is for general consumption. Other three are implemented because they're not only needed internally but also used heavily in mailnews/mime. They're implemented by moving corresponding functions in mailnews/mime/src (see below) and applying a little cosmetic(??) touch and impedance matching. 2. I redefined three functions in mailnews/mime/src (in comi18n.cpp and mimehdrs.cpp) as wrappers for three methods mentioned above. 3. Several static functions and arrays in comi18n.cpp and mimehdrs.cpp were moved to nsMIMEHeaerParam implementation. They're used by three methods above, but are NOT called by any other functions in mailnews so that it's safe to move them away from mailnews. 4. At a couple of places in uriloader, |GetHeaderParam2| is called to deal with RFC 2231 and RFC 2047 encoded C-D filename/name parameter. Please, take a look and give me some feedback. There would be a lot of comments. I don't think it's pretty as it is now. In many places, I was caught in the middle between string/wstring and ns*String especially because three methods moved from mailnews/mime and other static functions are almost entirely based on string and I didn't want to modify them extensively (at least in the first pass)
Attachment #107217 - Attachment is obsolete: true
From a totally cursory look: 1) nsIEnsureUTF8.idl should not contain C++ goop. 2) It should not have tabs. 3) Why the [const] on the in params? What does that do? 4) What is the encoding of the return value of ensureUTF8Spec? 5) Could the nsEnsureUTF8 code use that proposed function to check for UTF-8ness? Converting back and forth to UTF-16 seems silly.... 6) nsUConvModule.cpp seems to have some leakage from a different patch 7) nsIMIMEHeaderParam.idl should have comments in JavaDoc format (starting with /**) and all methods should be commented, as well as all params. Indentation should be fixed. I didn't read the code guts at all, pretty much; just the interfaces.
Comment on attachment 117077 [details] [diff] [review] a new patch NS_IENSUREUTF8_CONTRACTID that is a bad name. the contract specifies an implementation, not an interface. {249f52a3-2599-4b00-ba40-0481364831a2} I don't think it's a good idea to give the interface the same id as the class. + AUTF8String convertToUTF8([const] in ACString aString, "in nsACString" already makes it const (C++ header has const nsACString&) +static NS_DEFINE_CID(kIEnsureUTF8IID, NS_IENSUREUTF8_IID); +static NS_DEFINE_CID(kCharsetConverterManagerCID, NS_ICHARSETCONVERTERMANAGER_CID); please don't use CIDs, but contract ids instead. +NS_IMETHODIMP nsEnsureUTF8::ConvertToUTF8 +(const nsACString &aString, const char *aCharset, PRBool aCheck, + nsACString &aUTF8String) is there a special reason for putting the argument list onto a new line? + aUTF8String.Truncate(0); remove the 0 + aString.Equals(NS_ConvertUCS2toUTF8(NS_ConvertUTF8toUCS2(aString)))) { do we _still_ not have a better way to do this? +NS_IMETHODIMP nsEnsureUTF8::EnsureUTF8Spec +(const nsACString &aSpec, const char *aCharset, nsACString &aUTF8Spec) + aUTF8Spec.Truncate(0); again, no 0, and same question about argument list + AString getHeaderParameter2([const] in AUTF8String aHeaderVal, why the 2? why the const? + /* + * char * mime_decode_filename(char *name, const char *charset, + * MimeDisplayOptions *opt) looks like you should remove it.
Comment on attachment 117077 [details] [diff] [review] a new patch a few of my own comments: - you need fresh license files from http://www.mozilla.org/MPL/boilerplate-1.1/ - don't use inout string - the memory management is really ugly. Instead use inout AString - you'll find your code cleans up signifigantly. In fact, from looking at the code, I can't tell if that really should be inout, or just out. - when constructors and destructors have no body, you should inline them to let the compiler optimize them out - I'm really not a fan of all this code just copied and pasted in from mail..it had previously been copied in from Netscape 4.x, and I'd really like to see it streamlined and cleaned up.. there seems to be a lot of legacy stuff in there, but I'm not sure. Ugh, and all these "static char foo[256]" stuff - do we really need base64 decoding? I guess before we just slap in all this code from mail, I want you (jshin) to do a thorough code scrubbing to make sure you're only bringing in the functionality you need, and that the code is factored properly (for instance, do we already have a base64 decoder?)
Thank you all for comments. bz> 2) It should not have tabs. bz> 7) nsIMIMEHeaderParam.idl should have comments in JavaDoc format Yup. these are all parts of clean-up/documentation I wrote about. bz > 3) Why the [const] on the in params? What does that do? cb> in nsACString" already makes it const is there a special reason for putting the argument list onto a new line? Thanks. I must have done too many copy'n'paste without thinking much. cb> is there a special reason for putting the argument list onto a new line? I thought I was just following the 'local' convention in the directory, but I may have been mistaken because a second look a moment ago didn't show me that pattern. cb> + AString getHeaderParameter2([const] in AUTF8String aHeaderVal, cb> why the 2? Because there's another one with almost identical name that's only used internally and by mailnews as a wrapper. Anyway, I'm not so fond of various names used in my patch so that I'd listen to you for better names. bz> 4) What is the encoding of the return value of ensureUTF8Spec? It's UTF-8. I'll make it clear. bz> 5) Could the nsEnsureUTF8 code use that proposed function to check for bz> UTF-8ness? Converting back and forth to UTF-16 seems silly.... cb> do we _still_ not have a better way to do this? I fully agree. That's bug 191541 as you know :-) It's not there yet. I love to use that. The reason I didn't rush to add that is I could find only a few plaes in the entire tree that checks UTF-8ness. With that implemented, probably more people will begin to use it. bz> 6) nsUConvModule.cpp seems to have some leakage from a different patch I also noticed it and removed that in my tree. al> do we really need base64 decoding? Yes, we do for RFC 2047 decoding. Pls, note that the code is shared by mailnews and necko. Even necko alone needs it because quite a lot of broken http servers send out C-D filename param in RFC 2047 style instead of RFC 2231 style. cb> sure. NSPR has PL_Base64Decode. Thanks. With this information, I can remove the static array you hate along with |intl_mime_decode_b| :-) al> I'm really not a fan of all this code just copied and pasted in from mail.. al> it had previously been copied in from Netscape 4.x, and I'd really like al> to see it streamlined and cleaned up.. I agree with you. I'm not a fan of stuff I copied and pasted with a minimal change for impedance matching and filling up a couple of holes. I tried not to touch them in this bug because I didn't want to introduce a regression in mailnews. al> there seems to be a lot of legacy stuff in there, al> but I'm not sure. Perhaps the coding style and things like that are from an old C school. Feature-wise, I think most of them are still necessary. However, it could well be safe to drop 'HZ' check in |intl_is_utf8| (better, we can just use IsUTF8 I wrote about above). > I want you (jshin) to > do a thorough code scrubbing to make sure you're only bringing in the > functionality you need, and that the code is factored properly I'll. I believe I haven't brought in anything unnecessary, but I'll take another look. As for factoring, I'll remove an static array for Q decoding(256byte saving) in my local copy. This code doesn't belong where speed is critical so that I can replace array look-up with a couple of bound checks.
> It's UTF-8. I'll make it clear. Make it return a AUTF8String. That should make it plenty clear... ;)
>> It's UTF-8. I'll make it clear. >Make it return a AUTF8String. Ooops. It's not UTF-8 but it's US-ASCII. Because input urlSpec is url-unescaped, converted to UTF-8 if necessary, and url-escaped back before being returned. So I have to keep ACString. > nsIEnsureUTF8.idl should not contain C++ goop By this, I guess you meant removing CONTRACTID and CID definitions for class implementing interface and putting them in the header file for a collection of CONTRACTIDs and CIDs for the module the implementation belongs to (nsUConvModule in this case). Do you want me to go ahead and make this header file and move all #defines's for CONTRACTIDs and CIDs in intl/uconv/idl/*.idl there? Hmm.. I'd rather not for this bug. Because then I have to hunt for every single file refering to nsIxxxxx corresponding to intl/uconv/idl/*idl and get them to include nsUConvCID.h(yet to be created) as well. In case of nsIMIMEHeaderParam.idl, I can just remove C++ defines and put them in preexisting nsNetCID.h. BTW, why do so many idl files have these defines? I'm aware that no matter how many files may have this glitch, it doesn't make the practice right, but am just curious. Or, did you mean a totally different thing? exist
Yeah, I meant the CID/Contractid. There's no particular need for a #define for the contractid anyway; people should just use the string, imo. But yeah, you could create such a file for uconv; no need to do major cleanup, just start with putting these in there. ;) As for why they're in IDL, someone who didn't have a clear understanding of the difference between an interface and a contract decided it'd save time to do it and the rest was history.
Thank you. I'll do it. BTW, I've just found what I'd been looking for for a while (methods like ReplaceChar, StripWhiteSpace..) that would simplify things a bit in my patch. Why haven't they been advertised? Is it because new implementations for them haven't yet been delivered? Another question.. Is the regular expression engine used by Javascript interpreter available in a form or another to other parts of mozilla?
Those functions tend to have somewhat slow implementations, so they're discouraged except when they're clearly the best option... But if you're in code where perf is not critical, go for it. As for regexps... bug 106590
> Because there's another one with almost identical name that's only >used internally and by mailnews as a wrapper. To me, that sounds like a bad reason... If it's only used internally, I don't think it should be advertised in an idl file... and can mailnews directly use the other function?
> I don't think it should be advertised in an idl > file... and can mailnews directly use the other function? No. That's what I tried to do in my patch last year. Actually, I think it's possible with some 'liking gymanstics', but Alec replied in comment #23 to my query about that possibility (comment #22 and comment #16 C) that using an interface is the only (reasonable) way to make methods available across modules although it sounds like an overkill. Anyway, I'll try to come up with better names for both.
Attached patch a new patch (obsolete) (deleted) — Splinter Review
Many of issues raised about the prev. patch were addressed. Among them are - interface documentation - interface definitions tightened up/cleaned up - got rid of b-decoder and a related static array (also gone are a static array for q-decoder and other dispensable static functions using utility functions I found in string library) - C++ 'inflitration' in idl files - functions/methods were renamed to make them more in line with the local convention - empty ctor/dtors inlined - now uses |IsUTF8| (patch for bug 193142 is required) ..... I didn't touch most of C-style string manipulation in code brought in from mailnews (they've been changed and a couple of loose ends were tightened). Alec, do you want me to recast them in iterator and things like that? It might be nice, but do we need to? You know, RFC 2047/2231 handling is kinda 'black magic' (well, to exaggerate a bit) and I'm reluctant to touch them in _this_ bug. Anyway, please take a look and hunt for glitches, holes and ... :-)
Attachment #117077 - Attachment is obsolete: true
Ooops. It's bug 191541 that deals with |IsUTF8|
Comment on attachment 117822 [details] [diff] [review] a new patch +#if 0 please don't add #if 0 without a comment saying why the block is not just removed... + // assume UTF-8 if the spec contains unescaped non ASCII + if (!IsASCII(aSpec)) { should you use IsUTF8 here? In +nsMIMEHeaderParamImpl::GetParameter(const nsACString& aHeaderVal, : + char *med; ... + rv = GetParameterInt(PromiseFlatCString(aHeaderVal).get(), aParamName, + &charset, aLang, &med); ... + rv = DecodeParameter(nsCAutoString(med), charset, "", PR_FALSE, str1); Can you use nsXPIDLCString instead of char* for |med|? (also, even if not, you should use nsDependentCString instead of nsCAutoString in the DecodeParameter line) this would probably also be a good idea for charset +nsMIMEHeaderParamImpl::GetParameterInt(const char *aHeaderValue, is there a reason why you tend to declare variables some lines before you use them? + NS_ASSERTION(!nsCRT::IsAsciiSpace(*str), "1.1 <rhp@netscape.com> 19 Mar 1999 12:00"); /* should be after whitespace already */ er, that assertion message looks very unhelpful. it seems the comment should be the assertion message.
heh, i recognize that mailnews. the NS_ASSERTION ... rhp is because the code used to PR_Assert and i got sick of my browser being unceremoneously killed by silly asserts that i didn't understand. someone who understands the code is welcome to replace the messages with a useful explanation, the messages are there as they are because the change to NS_ASSERTION would make it relatively hard for lazy people to find out who to blame and i didn't want people complaining to me about them :).
Thank you for reviewing. + // assume UTF-8 if the spec contains unescaped non ASCII + if (!IsASCII(aSpec)) { > should you use IsUTF8 here? No. The question to ask is whether or not to keep that if-block. By removing that line, we can guard ourselves against mistakes of calling EnsureUTF8Spec with unescaped (bare/raw) non-UTF-8 spec. As it is now, if a bare/raw Non-Ascii spec is passed, it just returns the input without any change. I moved that part of the code from what was originially written by Naoki. Naoki, what would you say? The more I think about it, the more I feel like either removing it (so that the input gets subjected to checking/ensuring by the rest of the code) or at least calling NS_EscapeURL so that it returns escaped URL. As for #if 0, it was just left there because I wasn't sure whether this bug would go before/after bug 191541. Needless to say, I'll either remove #if 0 block or activate it(and remove the |IsUTF8| depending on which patch gets committed first. Hopefully, bug 191541 goes before this one. In a patch I'm going to upload in a moment, I cleaned up the code a bit more : made NS_ASSERTION more informative, used XPIDLString(along with replacing PR_Malloc with nsMemory::Alloc/Clone and removing PR_FREEIF a few places) , added more comments to RFC 2231 decoding routine, moved some variable definitions closer to where they're used, changed some variable names to better reflect their usage, etc.
Keywords: intl
Target Milestone: mozilla1.3alpha → mozilla1.4alpha
Attached patch yet another patch (obsolete) (deleted) — Splinter Review
Attachment #117822 - Attachment is obsolete: true
Depends on: 191541
+#define REPLACEMENT_CHAR "\347\277\275" // UTF-8 encoding of U+FFFD This is a mistaken I copied from mailnews without a thought. The first byte should be '\357' (or \xEF). Any thought, anyone? Fixing this would be a good news to non-English speakers.
Same patch, but I just removed license blocks and other non-essential parts hoping that that may help reviewers. Well, it's still very long and this may be harder to review than the original. Anyway, Alec, can you take a look once more and tell me which part you want me to 'scrub' more?
Attached patch a cleaned-up patch (obsolete) (deleted) — Splinter Review
I haven't made a transition to iterator, but everybody seems to agree that we can deal with it in another bug after getting the required feature right here. Below is what I sent to Alec (off-line) to help him better understand my patch. I'm adding it here to help others as well. BTW, xpfe dependency is to fix bug 158006 (and perhaps one or two more in xpfe) The dependency relation is shown below. I may have missed a couple, but should be roughly right. (at least, I am not making up non-existent dependency :-)) Mozilla 'modules': uriload : getParameter xpfe : getParameter mailnews: getParameterInt, decodeRFC2047Header, decodeParameter exposed methods: getParameter : getParameterInt, decodeParameter getParameterInt: [1] decodeRFC2047Header: Is7bitCharset, DecodeRFC2047Str decodeParameter : decodeRFC2047Header, static functions: DecodeRFC2047Str: DecodeQ, CopyRawHeader, ConvertHTab CopyRawHeader: ConvertHTab: Is7bitCharset: DecodeQ:
Attachment #117999 - Attachment is obsolete: true
Attached patch a new patch (obsolete) (deleted) — Splinter Review
basically the same patch against a new tree. corrected my mistake in converting Unicode replacement char. in UTF-8 to octal representation. There are a few superreviewers here on CC so that I don't know whom to ask for r/sr Moreover, this patch is across modules. alecf, bz, darin, please take this :-) It's been sitting here long enough, hasn't it? nhotta, can you see if my change is all right with nsSMTPService? ducarroz, I carefully preserved mailnews behavior and played with Mozilla-mail with this patch to make sure that I'm not breaking anything (spam mail in various encodings/languages with various degree of standard-compliance helped me a lot with this testing!!). Could you take a look as well? Thank you tons in advance. P.S. my patch for bug 150008(blocked by this bug) just got sr and I'm hoping to land this together with that patch.
Attachment #120825 - Attachment is obsolete: true
Sorry for spamming. The bug blocked by this one for which I have a patch with sr is bug 158006.
jshin, I'm not going to be able to review this in the near future... (like in the next week at least).
In comment #47, I explained how various methods/static functions are used in necko/xpfe. Here's the picture on the mailnews side (see nsIMIMEHeaderParam.idl in my patch.). getParameterInt() : does the job of |MimeHeaders_get_parameter| in mailnews/mime/src/mimehdrs.cpp. Now |MimeHeaders_get_parameter| is a wrapper for nsMIMEHeaderParam::getParameterInt. Therefore, mailnews code is not affected. decodeRFC2047Header() : |Mime_DecodeMimeHeader| in mailnews/mime/src/mimehdrs.cpp is now a wrapper for this. decodeParameter() : |mime_decode_filename| in mailnews/mime/comi18n.cpp is now a wrapper for this. As I wrote in my earlier comments, static functions moved from mailnews are NOT called by other parts of mailnews.
*** Bug 205232 has been marked as a duplicate of this bug. ***
Comment on attachment 121646 [details] [diff] [review] a new patch Now that 1.4f branch was cut, let's try to make this bug not celebrate its anniversary :-)
Attachment #121646 - Flags: superreview?(alecf)
Attachment #121646 - Flags: review?(cbiesinger)
Comment on attachment 121646 [details] [diff] [review] a new patch wow, this is a large patch... and I'm no module owner of any of this code... but ok, I'll take a look at this patch. In nsIEnsureUTF8.idl: typo: "covnersion" "|aSpec|(after" - missing space before the ( is there a reason this interface is not scriptable? "The presence of non-ASCII characters is *blindly* + * regarded as an indication that your input spec is in unescaped UTF-8 " don't we have an IsUTF8 funtion that you should use instead? + * @return empty if |aSpec| is in UTF-8. + * the URL-escaped converted string in ASCII if not. not sure I like that. Why not always return a valid, UTF-8, escaped spec? nsEnsureUTF8.cpp: + nsCAutoString inStr(aString); um no. use this: const nsAFlatCString& inStr = PromiseFlatCString(aString); + PRUnichar *ustr = (PRUnichar *) nsMemory::Alloc(dstLen * sizeof(PRUnichar)); might want to consider using an nsAutoPtr here +NS_IMETHODIMP nsEnsureUTF8::ConvertToUTF8 +(const nsACString &aString, const char *aCharset, PRBool aCheck, + nsACString &aUTF8String) hm, earlier you used a different way (return type on own line, parameters on same line as function name), please be consistent + if (!IsASCII(aSpec)) { + aUTF8Spec = aSpec; why not use IsUTF8 here? + return rv; this will always be NS_OK at this point, I would therefore directly write return NS_OK Is there a special reason for mixing string/ACString types in the interfaces? Why not stick to the string classes? more later... note to self: netwerk/mime/src/nsMIMEHeaderParamImpl.cpp
another question - why not remove the #if 0 code in mailnews/mime/src/comi18n.cpp?
Thanks a lot for taking a look at this big patch (well, actually, it's not so big as it looks because a lot of codes are moved to necko out of mailnews) > why not remove the #if 0 code in mailnews/mime/src/comi18n.cpp? I guess that's just kept by mistake while removing all other cases like that. I'll get rid of it. > "The presence of non-ASCII characters is *blindly* > + * regarded as an indication that your input spec is in unescaped UTF-8" > don't we have an IsUTF8 funtion that you should use instead? See comment #42. I'm still waiting for Naoki's answer. Naoki, what would say to my question in comment #42? > * @return empty if |aSpec| is in UTF-8. > * the URL-escaped converted string in ASCII if not. > not sure I like that. Why not always return a valid, UTF-8, escaped spec? I considered that, but I just left the way its original (by Naoki) does because I wanted to minimize the change in other parts. Thinking again about it, it doesn't seem to be a very good justification. There might have been other reasons. I'll get back to this later. > Is there a special reason for mixing string/ACString types in the interfaces? > Why not stick to the string classes? I'd love to, but I was caught in the middle between my desire to be consistent and my desire to simplify 'calling sequences' (and avoid unnecessary conversion between string and string class). I think some of them can be made more consistent later when I do 'iterator-love', which I plan to in a separate bug. As for other comments, I'll take care of them in next iteration once you're through reviewing the current patch.
Comment on attachment 121646 [details] [diff] [review] a new patch nsMIMEHeaderParamImpl.cpp: + // retrun charset.) typo + const PRBool aTryLocaleCharset, why const? + * @param aDefaultCharset MIME charset to use in place of MIME charset + * specified in RFC 2047 style encoding + * when <code>aOverrideCharset</code> is set. + * @param aOverrideCharset When set, overrides MIME charset specified + * in RFC 2047 style encoding with <code>aDefaultCharset</code> Couldn't you assume that the presence of aDefaultCharset implies aOverrideCharset? Or did I misunderstand the meaning of aOverrideCharset? + if (NS_SUCCEEDED(rv) && ensureUTF8 && no need to test for both here. + if (aEatContinuations) { + nsCAutoString temp(aResult); + temp.StripChars("\r\n"); + aResult = temp; + } this is not exactly the fastest way to do this... |rv| seems to be unused in DecodeRFC2047Header. + nsCAutoString unQuoted(aParamValue); + unQuoted.ReplaceSubstring("\\\\", "\\"); + unQuoted.StripChar('\\'); + aResult = unQuoted; also quite slow I guess... and, why bother replacing \\ with \ if you're stripping all \ characters anyway? + if (NS_SUCCEEDED(rv) && !decoded.IsEmpty() && !decoded.Equals(aResult)) + aResult = decoded; I would think that comparing aResult and decoded is as costly as copying decoded into aResult, so I wouldn't bother with the comparison. + out = dest = (char *)PR_Malloc(length + 1); [...] + memset(out, 0, length + 1); Use PR_Calloc: http://www.mozilla.org/projects/nspr/reference/html/prmem2.html#21449 +PRBool Is7bitCharset(const char *input, PRUint32 len) how does this compare to IsASCII? Also, what is HZ? also, how about adding comments to the static functions about what they're doing? + // Assume no more than 3X expansion due to UTF-8 conversion is that a correct assumption? uriloader/base/nsURILoader.cpp: + rv = mimehdrpar->GetParameter(disposition, "", + NS_LITERAL_CSTRING(""), PR_FALSE, nsnull, dispToken); don't use tabs Also, can't you remove more of nsURILoader? line 303ff seem to do the same job as nsIMIMEHeaderParam::GetParameter. nsExternalHelperAppService.cpp: + aChannel->GetURI(getter_AddRefs(mSourceUrl)); uh. you are touching mSourceURL here, the previous code is not. Is that safe? In comi18n.cpp / MIME_DecodeMimeHeader: Why bother with returnVal instead of directly returning nsnull? + if (NS_FAILED(rv) || !mimehdrpar) no need to check both... + // I know this looks silly, but we have to do this to avoid + // replacing |PR_FREEIF(s)| at over a dozen places [...] Uh. If I were to decide, I would replace the PR_FREEIF. in other words: if you want to check this in with a review from me, replace the PR_FREEIF. + if (NS_FAILED(rv) || !mimehdrpar) again only check one... I won't mention this again, please fix all occurences. + rv = mimehdrpar->DecodeParameter(nsCAutoString(name), charset, + opt->default_charset, + opt->override_charset, result); indentation looks wrong here. Also, please use nsDependentCString instead of nsCAutoString. Finally, no need for returnVal in this method, as that is always nsnull. done with the patch. marking review- due to the above issues.
Attachment #121646 - Flags: review?(cbiesinger) → review-
Attached patch a new patch addressing Chrsitian's concerns (obsolete) (deleted) — Splinter Review
Thanks a lot for your thorough review. This is a new patch addressing your concerns. In what follows, I'm gonna reply to your comments only if further explanation is necessary. Two changes in nsIEnsureUTF8: Now ConvertToUTF8 has 'aSkipCheck' (instead of 'aCheck') indicating whether ASCIIness/UTF8ness check has to be skipped. Along with this, non-ASCII 7bit charsets are better handled now. EnsureUTF8Spec now returns AUTF8String (unescaped UTF8 spec.) Other changes were made to address Christian's concerns. Still other changes are noted below. > * @return empty if |aSpec| is in UTF-8. > * the URL-escaped converted string in ASCII if not. > not sure I like that. Why not always return a valid, UTF-8, escaped spec? It turned out that I forgot to update the documentation after changing the behavior of the method. Actually, your comment made me look more closely at what it does and made a rather 'drastic' change in |EnsureUTF8Spec|. Now its return type is AUTF8String and it always returns _un_escaped UTF-8 spec. Checking out the way |spec| is used and defined in nsIURI, I convinced myself that there's no need to url-escape before returning. > "The presence of non-ASCII characters is *blindly* > + * regarded as an indication that your input spec is in unescaped UTF-8" > don't we have an IsUTF8 funtion that you should use instead? At least for now, I think I'd rather leave this alone. The assumption behind that is that in Mozilla |spec| (of nsIURI) is always either in UTF-8 (unescaped) or in other charset (*url-escaped*). By checking out ASCIIness at the beginning, we're saving some cycles it'd take us to check UTF8ness and possibly redundant un-escaping/conversion. + * @param aDefaultCharset MIME charset to use in place of MIME charset + * specified in RFC 2047 style encoding + * when <code>aOverrideCharset</code> is set. + * @param aOverrideCharset When set, overrides MIME charset specified + * in RFC 2047 style encoding with aDefaultCharset > Couldn't you assume that the presence of aDefaultCharset implies > aOverrideCharset? Or did I misunderstand the meaning of aOverrideCharset? No, I can't. Sorry the documentation was not clear enough. aDefaultCharset is used in two cases: 1. charset is not available because raw/bare string without charset specified per rfc 2231/2047 is passed over. (our best guess is aDefaultCharset) 2. when aOverrideCharset is set even with charset available. (charset is overriden by aDefaultCharset.) + nsCAutoString temp(aResult); + temp.StripChars("\r\n"); + aResult = temp; > this is not exactly the fastest way to do this... I'm aware of that :-) (see comment #35) + nsCAutoString unQuoted(aParamValue); + unQuoted.ReplaceSubstring("\\\\", "\\"); + unQuoted.StripChar('\\'); + aResult = unQuoted; > also quite slow I guess... and, why bother replacing \\ with \ if you're > stripping all \ characters anyway? The logic was flawed and I replaced the above with iterator code. +PRBool Is7bitCharset(const char *input, PRUint32 len) > how does this compare to IsASCII? Also, what is HZ? There are 7bit charsets that are NOT US-ASCII. Examples include ISO-2022-JP(-2), ISO-2022-CN, HZ-GB2312 and ISO-2022-KR. ISO-2022-JP is still widely used for Japanese emails/news postings and HZ-GB* was popular in early 1990's for Chinese emails and news postings but still is used occasionally. We should be careful not to treat ISO-2022-JP* and HZ* as US-ASCII when dealing with mail/news. Your question insipired me to do some more clean-up, better documentation and some changes in nsEnsureUTF8::ConvertToUTF8. + // Assume no more than 3X expansion due to UTF-8 conversion > is that a correct assumption? Yes, 3x is sufficient. A couple of weeks ago, I estimated this mulitplication factor again for another program. 1. no known ISO-8859-x/Windows-x/KOI8*/other single byte charsets can encode non-BMP characters. 2. all BMP characters can be expressed in three or fewer bytes in UTF-8 3. ISO-2022-JP can encode half-width kana with a single byte (excluding escape sequence) and a half-width kana takes 3bytes in UTF-8. The multiplier is at most three. 4. GB18030 and EUC-JP(JISX 0213) can encode non-BMP characters (4bytes in UTF-8), but their representations in GB18030 and EUC-JP(JIS X 0213) take at least two bytes (actually three in both cases) so that the multiplication factor is certainly smaller than 3. 5. In addition, we have extra room here because the input includes RFC 2047 overhead ('=?charset_name?......?=') > Also, can't you remove more of nsURILoader? line 303ff seem to do the same job > as nsIMIMEHeaderParam::GetParameter. Thanks for the catch. I removed the redundant code. nsExternalHelperAppService.cpp: + aChannel->GetURI(getter_AddRefs(mSourceUrl)); > uh. you are touching mSourceURL here, the previous code is not. Is that safe? I guess it's safe, but it's redudant. I changed it to use GetSource() method that retrieves mSourceUrl. + // I know this looks silly, but we have to do this to avoid + // replacing |PR_FREEIF(s)| at over a dozen places [...] > Uh. If I were to decide, I would replace the PR_FREEIF. in other words: > if you want to check this in with a review from me, replace the PR_FREEIF. I changed two out parameters (charset and language) to be nsMemory::Free'd, which puts this function on par with other helper functions that are wrappers over methods of nsIMIMEHeaderParam. Changing return value as well is too risky because it's not always freed immediately (sometimes it's hard to find where it's freed.). I'd end up modifying more than 60 lines in over a dozen additional files in mailnews/mime/src.
Attachment #121646 - Attachment is obsolete: true
I don't know how much this diff. will help because it doesn't look like it's easy to read..
Attachment #121646 - Flags: superreview?(alecf)
Comment on attachment 124819 [details] [diff] [review] a new patch addressing Chrsitian's concerns Christian, can you take a look again? Hopefully, it won't take as long this time. Alec, will you take a look sometime later (perhaps after bug 206379 :-))?
Attachment #124819 - Flags: superreview?(alecf)
Attachment #124819 - Flags: review?(cbiesinger)
Comment on attachment 124819 [details] [diff] [review] a new patch addressing Chrsitian's concerns now that EnsureUTF8Spec() is unconditionally outputting to the destination string (Which I think is the right thing, btw) I think it needs a new name. How about convertToUTF8WithCharset(...);? as for getParameterInt() that sounds like its getting an integer version. Don't shorten "Int" to "Internal" - just make it getParameterInternal() But can't we just describe it better? What's the difference between the two? Also nsIEnsureUTF8 is a bad name for a class - "ensure" is a verb, classes should be nouns, like "nsIUTF8Converter" or "nsIUTF8ConverterService" or something. Also, we now have a CopyUCS2toUTF8() that you should use - it goes through the old code which does what you've written, but at least this way we can correctly implement CopyUCS2toUTF8 later and get a perf. improvement + nsCAutoString tempStr(valueStart, valueEnd - valueStart); + tempStr.StripChars("\r\n"); + *aResult = (char*)nsMemory::Clone(tempStr.get(), tempStr.Length() + 1); use ToNewCString() here. + *aLang = (char *) nsMemory::Clone(sQuote1 + 1, sQuote2 - (sQuote1 + 1) + 1); isn't there an nsCRT::strndup() or something? (if not, don't worry about it) + else if (ns != *aResult) + *aResult = ns; don't bother checking, just assign. + // Assume no more than 3X expansion due to UTF-8 conversion + retbuff = (char *)PR_Malloc(3 * strlen(header) + 1); it would be really nice to make this a nsCAutoString, or to just assign into "aResult" - there is serious possibility here for buffer overflow, I think... in fact if there is ANY way that the resulting string could be longer than 3x the incoming string, we MUST switch to nsCAutoString. the rest of the code looks ok, as best as I can tell. I'm going to sr- this for now, but I think the cleanups should be pretty straight forward.
Attachment #124819 - Flags: superreview?(alecf)
Attachment #124819 - Flags: superreview-
Attachment #124819 - Flags: review?(cbiesinger)
Attached patch a new patch per Alec's comment (deleted) — Splinter Review
alecf, thank you for the review. This patch addresses all your concerns. The name changes: EnsureUTF8 -> UTF8ConverterService [1] EnsureUTF8Spec -> ConvertURISpecToUTF8 ConvertToUTF8 -> ConvertStringToUTF8 getParameterInt -> GetParameterInternal + *aLang = (char *) nsMemory::Clone(sQuote1 + 1, sQuote2 -.. > isn't there an nsCRT::strndup() or something? (if not, don't worry > about it) There is, but I can't use it because nsCRT::strndup(char *) returns malloced buffer (nsCRT::strndup(PRUnichar *) returns nsMemory::Alloc'd buffer) and I need nsMemory::Alloc'd buffer. BTW, GetParameterInternal does only RFC 2231 parsing while GetParameter does a lot more(RFC 2047 fallback, fallback charset, raw 8bit string, etc). I'd not have made the former a method if it had not been used extensively in mailnews. I added a comment to this effect. + // Assume no more than 3X expansion due to UTF-8 conversion + retbuff = (char *)PR_Malloc(3 * strlen(header) + 1); > it would be really nice to make this a nsCAutoString, or to just > assign into > "aResult" - Although there's no possibility of the buffer overflow, this is a nice thing to do anyway, which helped me cut down the code size more. > there is serious possibility here for buffer overflow, > I think... > in fact if there is ANY way that the resulting string could be > longer than 3x > the incoming string, we MUST switch to nsCAutoString. I can assure :-) you that (3 * inlen + 1) is the worst possible case. Note that input string cannot be expanded in an arbitrary manner but can only be expanded under our control. The worst case is when somebody gives us a string made soley of octets illegal in both UTF-8 and defaultCharset. Another case is when SCSU/BCSU is used and the sliding window of SCSU/BCSU is set into a non-BMP block. However, currently Mozilla does not support SCSU/BCSU(see intl/uconv. it may never.) so that even if Mozilla's given such a string, it won't be interpreted that way (it'll be just treated as ASCII.) [1] As for nsIUTF8ConverterService, I'm not sure if that's the best choice. At the moment, the name fits what it does. However, I found that there are a few static functions through out the tree that may benefit from a potential expansion of this interface. They're ConvertToUnicode (in nsAppShell), ConvertStringToUnicode (nsProfile) and ConvertStringToUTF8. The last one needs only 'impedance' matching . If the former two's are also folded under this interface, 'UTF8' in the name has to be changed. Unless it's an absolute evil to lose the file history by renaming, I'd rather just go with nsIUTF8ConverterService for this patch and will change the file(interface) name later if we do add ConvertToUnicode.
Attachment #124819 - Attachment is obsolete: true
Attachment #124823 - Attachment is obsolete: true
Comment on attachment 124986 [details] [diff] [review] a new patch per Alec's comment Asking r/sr. Other changes: 1. static ConvertHTab is now gone. It was used in two places. One of them was replaced with ReplaceChar (it's slow, I know) and the other is now a simple loop over a char. pointer. 2. Brand-new CopyUTF8toUTF16 is used in place of NS_ConvertUTF8toUCS2 in three places. I also replaced NS_ConvertUCS2toUTF8 with CopyUTF16toUTF8 as suggested by alecf.
Attachment #124986 - Flags: superreview?(alecf)
Attachment #124986 - Flags: review?(cbiesinger)
That all sounds great. If we're talking about general conversions (like replacing the ConvertToUnicode()) another option might be to fix the nsICharsetConverterManager methods to use nsAString/nsACString - the service was written way before these strings were available to us.. that is why their API is so obtuse. I'll finish this review today...
hm, if UTF8ConverterService is a bad name because you might add non-utf8 unicode methods, why not name it UnicodeConverterService? I'll also try to finish the review later today.
because at that point it sure sounds a lot like nsICharsetConverterService
I shouldn't have opened a 'can of worms' :-). For this bug, let's just go with nsUTF8ConverterService because that name represents what it does now well although we have to risk losing the file change history later when we decide to expand this interface and change the file/interface name. See also bug 54857. Something similar to Alec's alternative was briefly mentioned there. TIA for reviews ^-^
Comment on attachment 124986 [details] [diff] [review] a new patch per Alec's comment just noticed some nits in nsIUTF8ConverterService.idl: + * Portions created by the Initial Developer are Copyright (C) 2002 + * the Initial Developer. All Rights Reserved. we have 2003 now :) + */ + + AUTF8String convertStringToUTF8(in ACString aString, The AUTF8String line is a bit too far indented, should be two spaces less (also the arguments for this function are indentet too much) same for the other function on this interface ok now to more serious stuff: + nsCOMPtr<nsIAtom> charsetAtom; + rv = ccm->GetCharsetAtom2(aCharset, getter_AddRefs(charsetAtom)); isn't alecf rewriting that stuff in another bug? would it make sense to wait with landing this patch until that other patch is ready, and use the new code? +#define REPLACEMENT_CHAR "\357\277\275" // EF BF BD (UTF-8 encoding of U+FFFD) would it be useful to mention what character U+FFFD is ("unknown character")? mailnews/mime/src/mimehdrs.cpp + return NS_SUCCEEDED(rv) ? PL_strdup(result.get()) : nsnull; in comi18n.cpp, you used an explicit if instead of ?:... is there a special reason for that? it would be good if you could be consistent, unless of course you did that to be consistent with the current style of the file. other than that, you can have r=biesi; assuming that my review is acceptable to alecf, given that I'm no module owner.
Attachment #124986 - Flags: review?(cbiesinger) → review+
Thanks a lot for the review. > isn't alecf rewriting that stuff in another bug? > would it make sense to wait with landing this patch until that other > patch is ready, and use the new code? Absolutely. That's my plan. I find it a bit hard to shuttle between two versions(one with his patch and the other without) so that I just uploaded one with the old interface. The effect of alecf's patch on this patch would be very small (a few changes in Makefile.in, include files and just one change in charset. cvt. manager invocation you spotted. I'll fix the indentation and "if vs '? :". As for U+FFFD, its name (Unicode name) is 'REPLACEMENT CHARACTER' :-) See http://www.unicode.org/charts/PDF/UFFF0.pdf In Mozilla and elsewhere, U+FFFD is used to mark unknown characters, invalid byte sequences, and so forth.
While testing on Win2k, I found a problem I overlooked on Linux(I should have paid attention to NS_ASSERT passing by in my terminal). 'NS_ASSERT' on Win32 is so annoying that I can never miss it :-) After fixing this, it works as well on Win2k as on Linux. In the following two places, I had to replace nsDependent(C)String with Substring() as suggested in nsDependentString.h. In nsUTF8ConverterService.cpp > + rv = unicodeDecoder->Convert(inStr.get(), &srcLen, ustr, &dstLen); > + if (NS_SUCCEEDED(rv)) > + CopyUTF16toUTF8(nsDependentString(ustr, dstLen), aResult); + CopyUTF16toUTF8(Substring(ustr, ustr + dstLen - 1), aResult); In nsMIMEHeaderParamImpl.cpp > + cvtUTF8->ConvertStringToUTF8(nsDependentCString(aInput, aLen), > + aDefaultCharset, skipCheck, utf8Text))) { + cvtUTF8->ConvertStringToUTF8(Substring(aInput, aInput + aLen - 1), + aDefaultCharset, skipCheck, utf8Text))) {
> cvtUTF8->ConvertStringToUTF8(Substring(aInput, aInput + aLen - 1) ooops. in both cases, '-1' should not be there. I didn't check out the def. of Substring(). Another loose end found by running on Windows : In nsMIMEHeaderParamImpl.cpp : + if (str == start) + return NS_ERROR_UNEXPECTED; + *aResult = (char *) nsMemory::Clone(start, (str - start) + 1); + NS_ENSURE_TRUE(*aResult, NS_ERROR_OUT_OF_MEMORY); I forgot to null-terminate. (*aResult)[str - start] = '\0'; // null-terminate Certainly, it looks like another case for nsACString. I'll switch to it when doing 'iterator-love'
Comment on attachment 124986 [details] [diff] [review] a new patch per Alec's comment +// XXX Do we have to check UTF-7? To be perfectly safe, we have to, but +// who uses UTF-7 in the mail header? should probably check with bienvenu@netscape.com to see - IMAP uses UTF7 I believe. + // B. title*=us-ascii'en-us'This%20is%20weired. perhaps you meant "wierd"? :) +// XXX : aTryLocaleCharset is not yet effective. should you NS_ASSERTION() if someone tries to use it then? other than that, this all looks good! sr=alecf with the above changes/updates
Attachment #124986 - Flags: superreview?(alecf) → superreview+
Thank you for sr. I'll add NS_ASS..() and fix spelling errors. I'll land it with a necessary change after you check in charsetAtom removal patch. As for UTF-7, a modified form of UTF-7 is used for IMAP4rev? folder names, but my code is a world apart from the code handling IMAP folder names (it's about decoding the message headers : RFC 2047/2231). Maybe I'd better remove the comment because 'perfect' there means not failing even once in 10^30(?) trials :-).
alec>+// XXX Do we have to check UTF-7? To be perfectly safe, we have to, but alec>+// who uses UTF-7 in the mail header? alec>should probably check with bienvenu@netscape.com to see js> comment because 'perfect' there means not failing even once in js> 10^30(?) trials :-) Oops. Sorry I misread the 'context'. I thought the code in question is about being generous in what you accept (that is, even though it's untagged _violating_ the standard so that we have no oblication, we try our best to figure out what's meant by a sender) but it turns out that the code is used when we handle 'standard-compliant' cases (properly tagged). Still, the chance of getting something like the following Subject: =?UTF-7?Q?+rACsAQ?= for &#44032;&#44033; Subject : =?UTF-7?Q?caf+AOk?= for café (set encoding to UTF-8) Or Content-Disposition: attachment; filename*=utf-7''caf+AOk.txt is not much higher than 1 in 10^30 :-). However, if somebody points out that it's a bug for Mozilla not to decode the above. I can't help admitting that she is correct in principle although I'm tempted to counter that nobody in their right mind would use it. How much are we willing to pay for this? Well, it's another strcasecmp() and I'll add that to be correct for even very remotely possible cases. bienvenu, nhotta and duccaroz, what would you say?
Hmm... So I just looked at the nsURILoader and nsExternalHelperAppService changes (the only ones I care about, really) and it looks like both have the same problem -- passing a length to EqualsIgnoreCase will return true if that many chars match.... so "inlines" and "inline" are equal if you pass 6 as the second arg to EqualsIgnoreCase, no? One other question I have is why we pass the origin charset in when decoding the filename in nsExternalHelperAppService but not when getting the disposition type... is there a reason? Other than that, the code refactoring looks awesome.
Fix was checked a while ago. Thanks for the catch. Instead of |dispToken.EqualsIgnoreCase("inline", 6)|, I should have used |dispToken.EqualsIgnoreCase("inline", dispToken.Length())|, shouldn't I? I'll make a quick patch. Somebody might say that Mozilla should be generous to accept 'inlines' as well, but I guess in this case, we'd rather play by the rules :-) As for not passing originCharset, we don't have to when fetching the disposition type which is always in US-ASCII (unless somebody makes an 'over-i18n' proposal and IETF accepts it ;-)). In case of filename parameter, there are a lot of broken CGI programs (perl, jsp, asp) in the wild that emit raw 8bit characters expecting clients to interpret them as encoded in originCharset. So, we have to 'bend' the rules and be a bit accomodating....
Target Milestone: mozilla1.4alpha → mozilla1.5alpha
jshin: well if you're at that, why not just don't pass a length?
Attached patch uriloader : 4-line fix (deleted) — Splinter Review
let me get quick r/sr
Comment on attachment 125584 [details] [diff] [review] uriloader : 4-line fix sr=me. This is exactly what I wanted; passing the other length would be just as bad when dispToken is too short...
Attachment #125584 - Flags: superreview+
Did the checkin for this cause bug 209329 (mailto links don't work at all)?
Comment on attachment 125584 [details] [diff] [review] uriloader : 4-line fix looks right
Attachment #125584 - Flags: review+
Fix checked in. Thank you all! I'm done with this one.
Status: ASSIGNED → RESOLVED
Closed: 21 years ago
Resolution: --- → FIXED
This fix broke mailto urls in mailnews. nsSmtpService.cpp ends up setting an empty spec on new mailto urls. Did you test this before checking in your change to the smtp service? =( =)
Comment on attachment 125849 [details] [diff] [review] fix to make mailto urls work in mail again sr=bienvenu
Attachment #125849 - Flags: superreview+
mscott: please see bug 209328 which covers mailto uris not working
This patch is not complete although it is simple and works on the surface. Please, see bug 209328.
well, that patch was checked in anyway: 06/17/2003 22:34 scott%scott-macgregor.org mozilla/ mailnews/ compose/ src/ nsSmtpService.cpp 1.125 1/1 fix the regression caused by Bug #162765 which broke mailto urls. without any review or super-review
It had sr, was written by the module peer, and fixed a bug that is still open 5 days after filing when it should have been a smoketest blocker, imo....
Sorry for the regession. BTW, please note that the patch was uploaded within 7 hrs of the report. Perhaps, I should have asked people on mailnews for r/sr. (my patch uploaded to bug 209328 is across modules unlike the one liner just checked in). Anyway, I'll take care of the rest in bug 209328 as soon as I get r.
To answer Christian's comments, I checked this in right away because it fixed a nasty regression in mail which I think should have been a blocker and caught during the testing of the patch before it went in. Now that it's in, feel free to change how this was fixed via Bug #209328 as you pointed out.
*** Bug 206788 has been marked as a duplicate of this bug. ***
It looks like I'm facing a similar bug to bug 206788 that is duplicate of this one. Please take a look at bug 353113. Thanks.
Product: Core → Core Graveyard
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Creator:
Created:
Updated:
Size: