Open Bug 585094 Opened 14 years ago Updated 9 years ago

Global search suppresses duplicates: "Search all Messages" does not show Inbox message in facet results or Open as List, only message in Sent is found (POP3)

Categories

(MailNews Core :: Database, defect)

x86
Windows XP
defect
Not set
major

Tracking

(Not tracked)

People

(Reporter: thomas8, Unassigned)

References

(Depends on 1 open bug, Blocks 1 open bug)

Details

(Whiteboard: [datalossy])

Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.9.2.7) Gecko/20100713 Lightning/1.0b2 Thunderbird/3.1.1

Mozilla/5.0 (Windows; Windows NT 5.1; rv:2.0b3pre) Gecko/20100804 Shredder/3.2a1pre

STR

1 POP3, compose test mail to yourself (bcc in my case) that has "test123" in body
2 Get messages until you receive the test message in the POP3 inbox; there are now 2 copies of the msg: one in Sent, one in Inbox
3 "Search all messages" (global search) for "test123"
4 "Open as list"

Actual result after step 4

- Step 3 finds only one msg, the one from SENT folder (as you can find out in step 4, after enabling Location column)
- Step 4 finds only one msg, the one from SENT folder

Expected result after step 4

Step 3 should find both messages, from Sent and Inbox
Step 4 should find both messages, from Sent and Inbox

If I understand this correctly, it means that I cannot use global search on words from the body for any messages that I have (b)cc'ed into my own inbox, as long as I have a second copy in the Sent folder (i.e. I cannot search for any of my own messages). That makes global search pretty much broken -> major

Gloda seems to have trouble disentangling the two instances of the same msg, which you can easily see with the following steps:

5 Delete msg from Sent to Trash
6 repeat 3 and 4

Actual result (as expected)
Now there is only the msg in Inbox left, and - surprise - now it's found in step 3 and 4

This may or may not be the superset/cause of bug 570589.
And altogether, there's way too many bugs in TB Search!!!
Wayne, can you reproduce/confirm this?
Summary: Global search: "Search all Messages" fails to find messages with search word in body (POP, with 2 instances of msg in sent and inbox) → Global search: "Search all Messages" fails to find messages with search word in body (POP3, with 2 instances of msg in Sent and Inbox folder)
Thomas, how does your database look, and what does logging show when you follow your STR (see https://developer.mozilla.org/en/Thunderbird/Gloda_debugging)?
Component: Search → Database
Product: Thunderbird → MailNews Core
QA Contact: search → database
Version: 3.1 → Trunk
(In reply to comment #2)
> Thomas, how does your database look,

I have no idea, and I'm not sure I want to...

> what does logging show when you follow your
> STR (see https://developer.mozilla.org/en/Thunderbird/Gloda_debugging)?

Nothing on the error console.
Are you unable to reproduce?
We intentionally hide duplicate messages in the faceted search UI page.  Once you click on them to show the conversation, though, we show all instances.

We're working on fixing the limitation of gloda that treats duplicate messages as separate messages and requires the UI to then do a simple filtering pass.  (Without it, messages found in gmail or the like, where lots of duplicates happen, end up looking dumb.)
(In reply to comment #4)
> We intentionally hide duplicate messages in the faceted search UI page. 

Oh really! Well, I consider myself an advanced user and I did NOT get the idea in spite of KNOWING that there are duplicates and explicitly trying to find them... OK, I'll let that sink in and think about it.

First thoughts (especially with physical copies in different folders in mind!):
- Any in-program documentation/indication of hidden duplicates?
- Any external documentation about hidden duplicates?
- Any option for the user to customize/turn off this behaviour?
- No option of showing duplicates means a big limitation to the usefulness of global search for organizing mails with multiple instances (which at least in case of POP3 are physical duplicates!)
- What counts as a duplicate?
  - my msg in pop3 inbox and sent are not the same (looking at the source), are they dupes?
  - what if I delete attachment from the msg in sent, but not from the msg in inbox, still dupes?
  - what if two (physical) instances of same mail have different tags, are they dupes?

> Once you click on them to show the conversation, we show all instances.

The problem is that in both result views (fancy facets and open as list), there is no indication whatsoever of the duplicates (physical dupes at least for POP3, maybe more copies in backup folders, archives etc.). That's suppression of existing results. I'd definitely want global search to at least optionally return all instances. After all, it's "Search all messages" and not "Hide some"... I do see the visual benefit of not showing dupes in the "fancy" first results view, but there should be some indication, like the number of duplicates in brackets, in black: my subject (3).

> We're working on fixing the limitation of gloda that treats duplicate messages
> as separate messages and requires the UI to then do a simple filtering pass.

After that fix, will gloda still know all the instances of that mail, and their potential differences as described above (different source code, folders, tags, deleted attachments etc.)?
Messages with the same message-id header are duplicates from the perspective of the duplicate elimination logic.
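
A minimal sketch of that rule, for illustration only: it is not gloda's actual code (gloda is JavaScript inside Thunderbird). The `Message` record, its fields, and `suppress_duplicates` are hypothetical names made up for this example, and keeping the first hit per message-id is an assumption.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Message:
    # Hypothetical record; gloda's real message objects differ.
    message_id: str  # value of the Message-ID header
    folder: str
    subject: str


def suppress_duplicates(results):
    """Keep only the first result seen for each Message-ID header value."""
    seen = set()
    kept = []
    for msg in results:
        if msg.message_id in seen:
            continue  # same Message-ID as an earlier hit: treated as a duplicate
        seen.add(msg.message_id)
        kept.append(msg)
    return kept


results = [
    Message("<abc@example.invalid>", "Sent", "test mail"),
    Message("<abc@example.invalid>", "Inbox", "test mail"),
]
visible = suppress_duplicates(results)
print([m.folder for m in visible])
```

In this sketch the surviving copy is simply whichever one comes first in the result list; which copy the real filtering pass keeps is not stated in this bug. Note that body differences, tags, or stripped attachments play no role: only the header value is compared.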

I am not aware of any documentation on this where someone would expect it; I tend to mention it on bugs and on the mailing lists when relevant.  It would probably want to live at http://support.mozillamessaging.com/en-US/kb/Global+Search if you want to help rectify that problem.

The only UI impact of the duplicate suppression is the number of matching messages being even more misleading.  (We limit the search to 400 messages or something like that, but then we whittle that number down when we get rid of the duplicates.  I'm reasonably confident there's a bug on that.)
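
To make the count problem concrete, here is a small sketch under stated assumptions: the cap is taken as exactly 400 (the comment above only says "400 messages or something like that"), duplicates are assumed to be dropped after the cap is applied, and `visible_count` is a hypothetical helper, not a gloda function.

```python
def visible_count(message_ids, limit=400):
    """Apply the result cap first, then drop repeated Message-IDs."""
    capped = message_ids[:limit]  # assumed cap, applied before deduplication
    return len(set(capped))       # duplicates removed after the cap


# 300 distinct messages, each present twice (e.g. Inbox and Sent): 600 hits.
hits = [f"<msg{i}@example.invalid>" for i in range(300) for _ in range(2)]
print(visible_count(hits))
```

Here 600 raw hits are capped to 400, which collapse to 200 distinct message-ids, so the displayed count would be 200 even though 300 distinct messages match. The number shown tells the user neither that a cap was hit nor that duplicates were removed.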

After the refactoring gloda will have 'union' semantics over duplicates.  Which is to say a message found in the "Inbox" folder with tag Foo and the "Sent" folder with tag Bar will be presented as a single message known to reside in "Inbox" and "Sent" with tags Foo and Bar.  Application of a new tag Baz will add the tag to both underlying messages.  If someone wanted to perform manipulations affecting only one of the underlying messages they could write it as a very specialized extension (too complex for core.)
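
The 'union' semantics described above can be sketched roughly as follows. This is an illustration under assumptions, not the refactored gloda design: `Copy`, `UnifiedMessage`, and `unify` are hypothetical names, and grouping by message-id follows the duplicate definition given earlier in this comment.

```python
from dataclasses import dataclass, field


@dataclass
class Copy:
    # Hypothetical per-folder instance of a message.
    message_id: str
    folder: str
    tags: set = field(default_factory=set)


class UnifiedMessage:
    """One logical message backed by every copy sharing a Message-ID."""

    def __init__(self, copies):
        self.copies = list(copies)

    @property
    def folders(self):
        # Union of the folders holding a copy.
        return {c.folder for c in self.copies}

    @property
    def tags(self):
        # Union of the tags found on any copy.
        return set().union(*(c.tags for c in self.copies))

    def add_tag(self, tag):
        # A new tag is applied to every underlying copy.
        for c in self.copies:
            c.tags.add(tag)


def unify(copies):
    """Group copies by Message-ID into unified messages."""
    grouped = {}
    for c in copies:
        grouped.setdefault(c.message_id, []).append(c)
    return [UnifiedMessage(group) for group in grouped.values()]


inbox = Copy("<abc@example.invalid>", "Inbox", {"Foo"})
sent = Copy("<abc@example.invalid>", "Sent", {"Bar"})
(msg,) = unify([inbox, sent])
msg.add_tag("Baz")
print(sorted(msg.folders), sorted(msg.tags))
```

The unified view reports folders Inbox and Sent with tags Bar, Baz, and Foo, while each underlying copy keeps its own original tag plus Baz. The sketch deliberately offers no way to act on a single copy, which mirrors the statement that such manipulation would be left to an extension.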
Depends on: 585300
(In reply to comment #6)
> Messages with the same message-id header are duplicates from the perspective 
> of the duplicate elimination logic.
> 
> The only UI impact of the duplicate suppression is the number of matching
> messages being even more misleading. 

No, that's not "the only" UI impact, as described before, and below.

> (We limit the search to 400 messages or
> something like that, but then we whittle that number down when we get rid of
> the duplicates.  I'm reasonably confident there's a bug on that.)

I couldn't find that bug, posted Bug 585300 - [faceted search] Need UI to point out incomplete results count for global search with > 400 results or suppressed duplicate messages (wrong, misleading, incorrect)

> After the refactoring gloda will have 'union' semantics over duplicates. Which
> is to say a message found in the "Inbox" folder with tag Foo and the "Sent"
> folder with tag Bar will be presented as a single message known to reside in
> "Inbox" and "Sent" with tags Foo and Bar.  Application of a new tag Baz will
> add the tag to both underlying messages.

You'll obviously be aware of the manifold UI and behaviour problems that will go along with this if "presenting of/acting on multiple messages as one" is not perfectly transparent, or rather visibly communicated to the user.
(In reply to comment #7)
> (In reply to comment #6)
> > The only UI impact of the duplicate suppression is the number of matching
> > messages being even more misleading. 
> 
> No, that's not "the only" UI impact, as described before, and below.

Yeah, sorry, poorly phrased that.  What I meant is that the closest thing we have to intentional feedback about the suppression is that the number of messages goes down.  But it's ambiguous even if you know about the suppression.
 
> > After the refactoring gloda will have 'union' semantics over duplicates. Which
> > is to say a message found in the "Inbox" folder with tag Foo and the "Sent"
> > folder with tag Bar will be presented as a single message known to reside in
> > "Inbox" and "Sent" with tags Foo and Bar.  Application of a new tag Baz will
> > add the tag to both underlying messages.
> 
> You'll obviously be aware of the manifold UI and behaviour problems that will
> go along with this if "presenting of/acting on multiple messages as one" is not
> perfectly transparent, or rather visibly communicated to the user.

It's a hard problem in that we can't satisfy all use-cases for/reasons why people might have duplicate messages at once and still have a friendly UI.  In general I don't expect the UI built on top of the refactored gloda to provide much opportunity for manually pushing messages around, which should shortcut the danger.  Folder-organized message stores would be accomplished by assertions about where messages should be located.  Otherwise, things would likely end up in a 'storage soup' like gmail.  (In our case, for non-gmail IMAP I think that means inbox or the archives folders, and maybe mailing lists automatically asserted out into their own folders.)
In response to comment 1 - I can reproduce comment 0 using Mozilla/5.0 (Windows NT 6.0; rv:2.0b12pre) Gecko/20110212 Thunderbird/3.3a3pre. I don't, however, see this for IMAP.

Updating summary.
Summary: Global search: "Search all Messages" fails to find messages with search word in body (POP3, with 2 instances of msg in Sent and Inbox folder) → Global search suppresses duplicates: "Search all Messages" does not show Inbox message in facet results or Open as List, only message in Sent is found (POP3)
Whiteboard: [datalossy]
Removing myself from all the bugs I'm cc'ed on. Please NI me if you need something on MailNews Core bugs from me.