Closed Bug 5933 Opened 27 years ago Closed 25 years ago

International support for IMAP4 search

Categories

(MailNews Core :: Internationalization, defect, P1)

All
Windows NT
defect

Tracking

(Not tracked)

VERIFIED FIXED

People

(Reporter: mozilla, Assigned: nhottanscp)

Details

(Whiteboard: nsbeta2+]Exception Feature)

(This bug imported from BugSplat, Netscape's internal bugsystem. It was known there as bug #88257 http://scopus.netscape.com/bugsplat/show_bug.cgi?id=88257 Imported into Bugzilla on 05/04/99 17:49)

Messenger client should have a fallback mechanism in case the IMAP4 server doesn't support the charset used with the SEARCH command. For example, when it's working in a Japanese char encoding, it should work like below:

    result = SEARCH(UTF8 string with charset "UTF-8");
    if (result == NO) {
        // UTF-8 may not be supported.
        result = SEARCH(ISO-2022-JP string with charset "ISO-2022-JP");
        if (result == NO) {
            // ISO-2022-JP may not be supported.
            result = SEARCH(AS IS without charset);
            if (result == NO)
                printf("Couldn't match any");
        }
    }

Notice: By checking whether the search string contains only ASCII or not, you can skip the first two SEARCH() calls. It's up to the implementation.

Messenger client in Communicator 4.0 doesn't work like above. It sends the SEARCH command with the "Shift_JIS" charset, and gives up without retrying if the server response is "NO".
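The fallback order described above could be sketched roughly as follows. This is only an illustration, not Messenger code: SearchStatus, SearchAttempt, SendSearch and searchWithFallback are hypothetical names, and the transport that actually sends the SEARCH command is assumed to be supplied by the caller.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <string>
#include <vector>

// Hypothetical outcome of one SEARCH command (tagged OK or NO response).
enum class SearchStatus { Ok, No };

// One attempt: the charset label to send (empty means "omit CHARSET") and
// the search key already encoded in that charset.
struct SearchAttempt {
    std::string charset;
    std::string encodedKey;
};

// Hypothetical transport hook: sends one SEARCH and reports the response.
using SendSearch = std::function<SearchStatus(const SearchAttempt&)>;

// Tries each attempt in order. Returns the index of the first attempt the
// server accepted, or -1 if every attempt was answered with NO.
int searchWithFallback(const std::vector<SearchAttempt>& attempts,
                       const SendSearch& send) {
    assert(send && "a transport hook must be provided");
    for (std::size_t i = 0; i < attempts.size(); ++i) {
        if (send(attempts[i]) == SearchStatus::Ok) {
            return static_cast<int>(i);
        }
    }
    return -1;
}
```

A caller would list the attempts in the order UTF-8, ISO-2022-JP, then the raw string with an empty charset, and report "couldn't match any" when the result is -1.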
John, Isn't this on the Gromit "In" list?
Actually, there is one change to the algorithm specified here. As the very first step, if the search string contains only US-ASCII (regardless of the encoding of the search UI), then SEARCH with charset=US-ASCII; otherwise continue as listed here.
Seems more search related than IMAP related. If you disagree, assign back to me.
Yes; it's search related, so it goes to Scott :-) We'll need to invent some way to allow multiple passes at a single search scope, which we don't have right now. To clarify jfriend's point, if the search string contains only US-ASCII, we only try US-ASCII, and not any i18n charset stuff. I'd also like to clarify whether the IMAP server must send a NO response if it doesn't know the charset, or whether it can just search and not find any matches.
Setting TFV to 4.5.
Mass moving bugs from product version 5.0 to 4.5 since that's where the bugs are now (no change to TFV).
Setting qa assigned to field.
Not a PR1 stopper.
Bulk change: Bug assigned to mail/news engineer but no component specified. Changed to mail/news component.
<sorry for the bug notification intrusion. Product version on this bug shows 1.0 (due to a bugsplat bug). Correcting all mail/news bugs numbered < 90000 to product version 4.0. Bulk changing this.>
FYI. tintin.mcom.com is now running MS4.0 Beta which supports various charset options for IMAP SEARCH. If you don't have a test environment now, ask sheneman@netscape.com for an account. However, I would recommend using WSU's IMAP4 server as a reference as well. It also has a good SEARCH implementation.
Phil, wasn't this the I18N bug we were talking about with Naoki and Bob? Were you going to end up doing this one? Changing QA field to gbush.
Bouncing over to Phil.
M15, I hope. I won't get to this for 4.5b2
Later. Too many more serious bugs for 4.5.
How can this remain "latered"? Negotiation of search charsets with our own MS4.0 is something most major mail clients can perform now, e.g. Outlook, WinBiff, etc. We should be competitive and support the search charset negotiation. Without this, our IMAP search for Japanese and other non-ASCII languages would not work. How can we promote our clients to enterprise customers without this feature working? MS4.0 is nearing completion. This bug should be a perfect candidate for 4.51. Re-opening for consideration in 4.51.
In case we need to review how this functionality should work, I consulted taka and came up with the following summary of the spec.

** Proposed steps for negotiating down the IMAP search charset. **

0. Check the 'capability' of the IMAP server for UTF-8. The IMAP4 capability command should return something like the following in response to an "a capability" command:

    a capability
    * CAPABILITY IMAP4 IMAP4rev1 ACL QUOTA LITERAL NAMESPACE UIDPLUS LANGUAGE XSENDER X-NETSCAPE XSERVERINFO AUTH=PLAIN AUTH=LOGIN
    a OK Completed

   If the return string contains "X-NETSCAPE", we can be assured of UTF-8 search capability with this server. (Note: If you see X-NETSCAPE in the response of the CAPABILITY command, there's a 100% guarantee that the server will recognize the UTF-8 charset. Do NOT rely on the banner message because it's configurable; the user may change it to something else. You can always try UTF-8 as the charset whether or not it's IMAP4 server (it will fail if the server doesn't know UTF-8).)

1. Determine if the search string contains any 8-bit characters.
   ---> If not (= only 7-bit data), send the search string in ASCII.

2. If the answer in 1 is yes, then assume that the search string is in the System Charset (or the global default -- e.g. in 4.5 we use the global default for LDAP servers so that more than one charset can be used for search). Convert it to UTF-8 and send it to the server. If the server accepts it, then it should return matches if there are any.

3. If the request in 2 is rejected by the server, then send the string in the standard mail charset matching the System (or the global default) charset. (For example, iso-2022-jp for the Japanese Win/Mac system charset, Shift_JIS.)

4. If the request in 3 is rejected, then send the raw search string (as is) without any charset specification. This completes the client's responsibility.

Open issue: Should we use the global default or the system charset as the basis for the source charset? The global default is more flexible in that we can input in different charsets if proper keyboards or input methods are available as we change the global default.
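As a rough illustration of step 0 above, the capability check could look something like the sketch below. The function names hasCapability and toUpperAscii are hypothetical and not taken from the Messenger source. The sketch assumes the caller passes the untagged "* CAPABILITY ..." line, and it matches whole tokens only, so the greeting banner is never consulted.

```cpp
#include <algorithm>
#include <cassert>
#include <cctype>
#include <sstream>
#include <string>

// Upper-cases ASCII letters so capability names compare case-insensitively.
static std::string toUpperAscii(std::string s) {
    std::transform(s.begin(), s.end(), s.begin(), [](unsigned char c) {
        return static_cast<char>(std::toupper(c));
    });
    return s;
}

// Returns true if `name` appears as a whole token in an untagged
// "* CAPABILITY ..." response line. Any other kind of line (for example
// the greeting banner) is rejected.
bool hasCapability(const std::string& capabilityLine, const std::string& name) {
    assert(!name.empty());
    std::istringstream tokens(capabilityLine);
    std::string star, keyword;
    if (!(tokens >> star >> keyword)) return false;
    if (star != "*" || toUpperAscii(keyword) != "CAPABILITY") return false;

    const std::string wanted = toUpperAscii(name);
    std::string token;
    while (tokens >> token) {
        if (toUpperAscii(token) == wanted) return true;
    }
    return false;
}
```

With the example response quoted above, hasCapability(line, "X-NETSCAPE") would be true, and the client could send the UTF-8 attempt first with confidence. Without the token, UTF-8 can still be tried and the later steps act as the fallback.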
qa assigned shouldn't be gbush. Should be someone in msanz's group.
There are two issues. First, the pref mailnews.force_ascii_search is set to true. Second, we need to convert the search string to the mail charset, which is JIS in the case of Japanese. We are currently using the folder csid, which is ShiftJIS or EUC. Here is a change I applied to my local tree.

Index: search.cpp
===================================================================
RCS file: /m/src/ns/lib/libmsg/search.cpp,v
retrieving revision 1.112.4.2.2.42
diff -c -r1.112.4.2.2.42 search.cpp
*** search.cpp 1998/10/01 04:24:55 1.112.4.2.2.42
--- search.cpp 1998/11/10 18:53:45
***************
*** 2182,2188 ****
--- 2182,2192 ----
  // Ask the newsgroup/folder for its csid.
  if (m_scope->m_folder)
  {
      dst_csid = m_scope->m_folder->GetFolderCSID() & ~CS_AUTO;
      dst_csid = INTL_DefaultMailCharSetID(dst_csid);
  }
  }
  // default means that our best guess is to get the default window char set ID
This sounds like a lot of work, so I think we shouldn't commit to doing this for 4.51, unless a customer escalation comes in which forces us to do it. Clearing TFV. Please see me before setting the TFV. BTW, I think Naoki's proposed change above is partial, at best, and defeats the per-folder CSID that we allow the user to set.
Why can it sound like a lot of work? Naoki shows everything to fix. What is wrong with partial solution? Any serious side effect? Although I don't mind what TFV it's got, I do care if customers in Japan find all other IMAP clients work with Messaging Server 4.0, but only Netscape client (except Messenger Express 4.1) doesn't with Netscape's own IMAP server. I've waited almost 10 month. And, seems like I have to keep waiting more. Am I expecting too much?
> and defeats the per-folder CSID that we allow the user to set.
That has been true anyway as we restrict to ASCII only. The other issue is that we only support a single charset inside the search dialog. A more complicated issue is a folder hierarchy which may have mixed charsets. So, those issues need to be solved in the future. But I am not sure if we should support only ASCII until we solve those issues.
> Why can it sound like a lot of work?
Because none of the other searching code takes more than one attempt at a search based on the results of previous attempts.
> Naoki shows everything to fix.
That is absolutely not true. Naoki shows how to convert to the mail server's charset only. That does not implement the algorithm Kat showed in his 10/29/98 comments.
> I've waited almost 10 month. And, seems like I have to keep waiting more.
> Am I expecting too much?
As I said above, the question of when we add this feature is determined by customer escalations. There are lots of other features that people have wanted for longer than 10 months that we're not doing in 4.51.
After discussing various pros and cons, we have decided to open a new bug for fulfilling a minimum IMAP search requirement for the Japanese market. A new bug does not ask for server-client negotiation, and should be handled by the escalation team. The new bug is: 334536.
TFV 5.0
I (or someone else) will be moving enhancements, etc, bugs targeted for 5.0 to bugzilla in the near future.

------- Additional Comments From paulmac May-04-1999 17:44 -------
Okay, time to close out old bugsplat bugs - Please move to bugzilla if this one is still relevant or mark won't fix, please.

------- Additional Comments From momoi May-04-1999 17:49 -------
Well, this is still a valid bug. Let's move to 5.0 and send it to the Mail/News team.
Target Milestone: M9
Blocks: 7228
Target Milestone: M9 → M13
search is moving out.
Search won't be implemented until after Beta 1, so this bug does not need to be fixed until after Beta 1
Assignee: phil → mscott
Status: REOPENED → NEW
Target Milestone: M13 → M14
mscott owns the search backend, so reassigning to him for M14. Searching is not a B1 feature.
Target Milestone: M14 → M16
Triaging... this is not a beta2 bug.
Target Milestone: M16 → M18
Based on Beta2 Criteria http://client/seamonkey/prd/beta2criteria.html, this is a beta2 P1 bug. Should we add the keyword beta2 to this bug?
Karen, the beta2 doc says we need to implement a search back end, which is a separate bug. We need the search backend before we can start fixing bugs like this which have been around since 4.5. =( I don't see any mention of this bug in the beta2 docs, so I'm not sure what you were looking at. Or maybe you were thinking about the comment to implement search for beta2?
I suck. I was only looking under mail, not under mail i18n, in the beta2 docs. Moving back to a beta2 milestone. Thanks for catching my mistake Karen! I18N, are you guys sure this is a beta2 stopper?
Target Milestone: M18 → M17
4.x didn't do this - I can't believe it would be a beta stopper for 6.0, and we could ship with it as well - we always have before.
From Beta2 Criteria http://client/seamonkey/prd/beta2criteria.html.
1) Scroll down to see the Features
2) Select I18N Features.
3) Select Mail I18N
4) Search for Mail/News Tasks - IMAP I18N - IMAP search 5933 - P1
P.S. I don't know what I18N means. Does anybody know?
I18N = Internationalization. I believe that the i18n group says it's a beta stopper. I just don't think we're going to have time to do it.
OK. I am just checking & trying to clarify that. Then the document should be modified!!
This bug was transferred from the 4.x bug system. What we need for beta2 is for i18n IMAP search to work. It is working in 4.x. In 4.x, if the ASCII search fails then it falls back to another query using the folder charset. But for mozilla, it is easier and better to do a UTF-8 query since we have the query string in unicode.
This is an IMAP spec. We made some very hard choices to ship 4.5 and this was one of the features that was cut at the very end. The mail server guys have been very adamant that the client needs to support this and were very disappointed that it fell off the 4.5 list at the end of that development cycle. taka and jgmyers can provide more data on what will break for whom without this long-awaited feature...
I'd be surprised if we get 80% of the search functionality that was in 4.5 into 6.0 - getting > 100% would be a miracle. If you hadn't noticed, we haven't even started search yet!
Putting beta2 for i18n beta2 criteria items. Contact bobj for question.
Keywords: beta2
> This is an IMAP spec. I don't see this mentioned in RFC 2060 or 2683. Please give the spec reference which supports your claim.
Blocks: 35851
Keywords: nsbeta2
Putting on [nsbeta2-] radar.
Keywords: beta2
Whiteboard: [nsbeta2-]
As the bug is old and the original comment is not consistent with what we need for beta2, I am rewriting the i18n requirement for beta2 (which is the same level of support as the current 4.x client). I also changed the summary. For beta2, we need US-ASCII search and charset specified search (i18n search).

Here is how we can do it:
* Apply a 7 bit check against the search string. Assuming the search string is unicode (PRUnichar* or UTF-8), we can check < 128 against the search string.
* If the search string is 7bit then do a US-ASCII search (search with no charset specified).
* If the search string is 8bit then get the folder charset, convert the unicode string to the folder charset and specify the charset in the search command.
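The 7 bit check and the charset decision above might be sketched as follows, assuming the search string is held as UTF-8 in a std::string. The names isSevenBit and chooseSearchCharset are hypothetical, and the actual conversion of the key into the folder charset is left out because it depends on the converter API available in the tree.

```cpp
#include <cassert>
#include <string>

// True when every byte of the UTF-8 string is below 128, i.e. pure US-ASCII.
bool isSevenBit(const std::string& utf8) {
    for (unsigned char byte : utf8) {
        if (byte >= 0x80) return false;
    }
    return true;
}

// Returns the charset label to send with SEARCH, or an empty string when the
// key is 7bit and the CHARSET argument should be omitted.
std::string chooseSearchCharset(const std::string& utf8Key,
                                const std::string& folderCharset) {
    if (isSevenBit(utf8Key)) return std::string();
    // An 8bit key can only be sent with a known folder charset.
    assert(!folderCharset.empty() && "8bit keys need a folder charset");
    return folderCharset;
}
```

Checking bytes works for UTF-8 because every non-ASCII character is encoded only with bytes that have the high bit set, so a byte-wise scan gives the same answer as checking each character against 128.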
Summary: IMAP4 search doesn't retry if first attempt fails → International support for IMAP4 search
clear nsbeta2-
Whiteboard: [nsbeta2-]
ftang, why did you clear nsbeta2-..can you state your case?
Whiteboard: [NEED INFO]
Since search has been an approved feature exception, this goes hand in hand with that. It basically says make our imap search I18N friendly when we implement it =).
On exception list for PR2, removing 5/16...giving [nsbeta2+]Exception Feature status.
Whiteboard: [NEED INFO] → nsbeta2+]Exception Feature
It's my understanding that the mail team cut search today.
so, like the last bug, I did a bunch of i18n work yesterday. And a reality check from everyone: This bug is over 2 years old now, a carryover from 4.5.. the general i18n-ness of search is already covered in bug 11659.. kinda seems like this should just be a dupe. if however this bug is referring to the algorithm described at the top of this file, I believe it may never have been implemented in 4.x.. and if that's the case I'm not sure why this would be nsbeta2+ in any case, I think this should either go to bienvenu or myself to lighten scott's load.
So after your i18n fixes, are we now close to parity with 4.5 and later? The spec there was described in nhotta@netscape.com 2000-05-01 16:00 comment above. That should be the minimum -- it has been implemented before and current users of Communicator will expect as much.
I _think_ so... we won't know for certain until we have a UI. I haven't seen the equivalent of the algorithm described at the top of this bug...it might be there though
The algorithm which retries with a different character set if no hits are found was not implemented in 4.x. Since that's what this bug was about originally, I'm guessing that we should separate that issue (which we're not addressing for seamonkey) from the issue of 4.x parity WRT i18n searching (which we should address for seamonkey).
> I _think_ so... we won't know for certain until we have a UI.
Do we have a bug for that? As soon as that is resolved iqa can test i18n search.
ok, does anyone object to me marking this a dupe of 11659 (which has been marked fixed) then? bienvenu has apparently got IMAP search working, and I have supposedly made the whole search backend i18n friendly...
*** This bug has been marked as a duplicate of 11659 ***
Status: NEW → RESOLVED
Closed: 25 years ago
Resolution: --- → DUPLICATE
nhotta, see bug 33101 for the search UI frontend bug. I just added you to the CC
I object actually. Alecf, this bug refers to a specific algorithm for imap4 searching that escalation engineering implemented in 4.6. This bug is to track this when we implement search for imap. It's separate from the random i18n filter and search bug you marked it a dup of.
Status: RESOLVED → REOPENED
Resolution: DUPLICATE → ---
I don't really see how i18n search can be done, despite what Alec has done. My understanding was that the i18n group had to provide us with API's that existed in 4.5 but no longer exist in 6.0 in order for i18n search to work. Of course, I've been out of it for a long time, but that was my understanding.
The last time, we talked about the NNTP search API and we dropped that from beta2. For IMAP, I think the things I mentioned in 2000-05-01 16:00 are available in 6.0 (e.g. getting a folder charset, conversion from unicode to a folder charset, etc.).
is the latter also true for local? Convert the headers to unicode using the charset and do a unicode comparison with the utf8->unicode converted search string? What about message bodies? We can't really convert the whole message body to unicode in memory, can we?
I believe that some of the search code converts the unicode search term to the folder's charset, then performs the search with this converted string.
alecf's description is how local searching is supposed to work (and did in 4.x).
For local search, header search requires MIME decoding. Since the MIME decoder returns unicode, that can be compared with the search term. For body local search, I believe we converted the body (not the search term). Here is the 4.x code; I believe DO_I18N was defined in 4.x (otherwise Japanese search wouldn't work).
http://lxr.mozilla.org/mozilla/source/mailnews/base/search/src/nsMsgSearchTerm.cpp#687

    #ifdef DO_I18N
        // In here we do I18N conversion if we get the converter
        char *newBody = nsnull;
        newBody = (char *)INTL_CallCharCodeConverter(conv, (unsigned char *) buf, (int32) PL_strlen(buf));
        if (newBody && (newBody != buf))
        {
            // CharCodeConverter return the char* to the orginal string
            // we don't want to free body in that case
            compare = newBody;
        }
    #endif
DO_I18N is not in the 4.5 code; it was added to 6.0 so the code would compile because things like INTL_CreateCharCodeConverter don't exist in 6.0 - I think this was one area where we need a 6.0 equivalent way of doing this.
you know, it's actually going to be EASIER for me to convert the user-entered value to the folder's charset and do the body search that way. Anyone object if I do it that way? It'll be faster too.
I take that back, it's not as simple as I had hoped.. converting the body is the easy way right now.
Reposting my comment in 2000-05-01 16:00 which contains the I18N requirement for nsbeta2.
> As the bug is old and the original comment is not consistent with what we need
> for beta2, I am rewriting the i18n requirement for beta2 (which is the same
> level of support as the current 4.x client). I also changed the summary.
> For beta2, we need US-ASCII search and charset specified search (i18n search).
>
> Here is how we can do,
> * Apply 7 bit check against search string. Assuming the search string is unicode
> (PRUnichar* or UTF-8), we can check < 128 against the search string.
> * If the search string is 7bit then the do US-ASCII search (search with no
> charset specified).
> * If the search string is 8bit then get the folder charset, convert the unicode
> string to the folder charset and specify the charset in the search command.
Added myself to Cc.
Taka and I started to look at the code. The search criteria string is UTF-8 and there is also a function to get a folder charset. The 7 bit check can be done easily against a UTF-8 string. Also, we can convert the string from UTF-8 to a folder charset. Taka pointed out that we can use a literal string instead of a quoted string (which needs escaping for some charsets, e.g. ISO-2022-JP).
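To illustrate the literal-string idea, here is a minimal sketch of how a charset-specified SUBJECT search could be framed with an IMAP literal. LiteralCommand and buildSubjectSearch are hypothetical names, not the code that was checked in. The sketch assumes a synchronizing literal, where the client sends the first part, waits for the server's "+" continuation response, and only then sends the key bytes.

```cpp
#include <cassert>
#include <string>

// The two pieces of a command that carries its search key as an IMAP literal.
struct LiteralCommand {
    std::string head;  // command text ending in "{n}\r\n"
    std::string tail;  // the n key octets followed by the closing "\r\n"
};

// Builds "tag SEARCH CHARSET <charset> SUBJECT {n}" plus the literal data.
// The key must already be converted to `charset`. Its bytes are sent as-is,
// so characters such as backslash or double quote need no escaping.
LiteralCommand buildSubjectSearch(const std::string& tag,
                                  const std::string& charset,
                                  const std::string& encodedKey) {
    assert(!tag.empty() && !charset.empty());
    LiteralCommand cmd;
    cmd.head = tag + " SEARCH CHARSET " + charset + " SUBJECT {" +
               std::to_string(encodedKey.size()) + "}\r\n";
    cmd.tail = encodedKey + "\r\n";
    return cmd;
}
```

The count in braces is the number of octets in the encoded key, not the number of characters, which matters for multi-byte charsets such as ISO-2022-JP where escape sequences are part of the byte stream.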
The patch (hooked up charset conversion) was reviewed. I will probably check in tomorrow.
Checked in, testable once the UI is functional again.
Status: REOPENED → RESOLVED
Closed: 25 years ago
Resolution: --- → FIXED
** Checked with 7/10/2000 Win32 build **

OK, we are finally able to check on this because I can now see attribute names.

Here's what works:

1. With the default view charset set to ISO-2022-JP, a single condition, or "OR" with more than one attribute, works OK to find relevant messages when we input Japanese search keys.

What does not work:

1. Any search after the first one using a Japanese word produces no change, even if you change an attribute value to another Japanese word. Even if you close the Search window and re-open it, it does not seem possible to do any search. If you use ASCII values, you can do more than 1 search at a time successfully. This problem seems to be due to the use of non-ASCII data as search keys.
2. Any change in attribute category, e.g. from Subject to Sender, or from the "OR" conjunction to the "AND" conjunction. This type of change makes the server send an error message saying "Required argument was missing." This problem happens regardless of the charset of the attribute values used.

There are other problems but I have not sorted them out yet. For item 2, I'll look for an existing bug. But for item 1, I need to re-open this bug.
Status: RESOLVED → REOPENED
Resolution: FIXED → ---
Reassigning to me. Can we file a separate bug for the first problem since the international search itself has been enabled?
Assignee: mscott → nhotta
Status: REOPENED → NEW
Can we get a bit more analysis before deciding to file another bug? If we know for sure that it has nothing to do with the way non-ASCII was implemented, then let's file a new bug. Problem #1 makes Search in Japanese very difficult since users often try one key and then another in case the first didn't work. Unless I am mistaken, the user will have to reboot Mozilla to try the next search. That is really bad.
I've looked at Problem #1 a bit further and it seems that the problem is a bit more complex than I had described above. It seems that if you pick certain Japanese words, you can do more than 1 search at a time. When you use some other word, it does not work until you use some other data that do not have this problem. One example of a problem word is "Ni-hon" (Japan in Kanji). I have not been able to do any search with it.
I used win32 build ID 2000071008 on WinNT 4 Japanese and I can search Japanese strings more than once. First, I searched "mail" in Japanese then got some results. Then I searched "homepage" in Japanese then got additional results and they were appended to the search result. And I searched "welcome" in Japanese then got additional results and they were appended to the search result. So I cannot reproduce the problem. There may be a condition to reproduce this. Anyway, I prefer the problem to be filed separately.
Would you try "nihon" in Kanji and see if that works?
There seems to be another problem in search string formation to send to the server. See the SCOPUS bug we dealt with for Communicator, Bug ID 343598. The example string described there, Hiragana "a", causes a server error in Mozilla.
Please file a separate bug instead of reopening this feature bug. Individual bugs will help us track different cases.
OK. I'll verify that the basic Intl IMAP functionality is working. There are some misses and they will be filed as separate bugs.
Status: NEW → RESOLVED
Closed: 25 years ago
Resolution: --- → FIXED
** Checked on 7/10/2000 Win32, Mac, and Linux builds **
On the above builds, the basic non-ASCII search function is now working as long as the search keys match the default view charset set in the Preferences dialog. Marking it verified as fixed. We will file new bugs for this new feature separately.
Status: RESOLVED → VERIFIED
Product: MailNews → Core
Product: Core → MailNews Core