Body filter not working with Greek characters
Categories
(Thunderbird :: Filters, defect)
Tracking
(Not tracked)
People
(Reporter: callmejames, Unassigned)
Details
Attachments
(2 files)
User Agent: Mozilla/5.0 (Windows NT 6.1; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/81.0.4044.138 Safari/537.36 OPR/68.0.3618.165
Steps to reproduce:
I set the following conditions:
-Match all the following
-Body | Contains | μύκητες
-Perform these actions:
-Delete message
Then I clicked the button "Run now"
Actual results:
Absolutely nothing, so Thunderbird cannot filter spam messages!
Expected results:
Thunderbird should have found the word "μύκητες" on the body and deleted the annoying message.
Comment 1•4 years ago
Please attach the offending message as .eml
Comment 2•4 years ago
Sent myself messages in both plain text and HTML, with the term in just the subject and in just the body, UTF-8.
Body search fails with version 78 but works with 68. Subject search works.
Also fails with search on server.
Also fails using quick filter bar.
Comment 3•4 years ago
Comment 4•4 years ago
Wayne, I tried the message you provided in a local folder using the QFB and "Search Messages" in both TB 68 and TB 78 and μύκητες is found in both cases. It's impossible that this is a bug in local folders, or, I believe, also synchronised IMAP folders, since we have totally extensive tests for this (which I wrote):
https://searchfox.org/comm-central/source/mailnews/base/test/unit/test_searchBody.js#51
Greek text is in one of the examples.
So what are the STR exactly?
Comment 5•4 years ago
Comment 6•4 years ago
I set up a filter for adding a star to messages with μύκητες in the body, and that worked, too. On a local folder.
Comment 7•4 years ago
Yesterday's initial test mostly involved creating messages on Mac and copying them to other accounts. Also, I stopped using search and filters and simplified testing to the quick filter only.
I've done a lot more testing - I'm mostly only going to report the last bits:
- On Windows, with yesterday's messages, I have success with both 68 AND 78 - but these were messages that had been copied to accounts or received in those accounts
- On Mac, quick filter fails on the account the message was sent from, vseerror.
- On Mac, I copied the test messages to another account, then back to vseerror using Mac. Now quick filter on Mac succeeds in the vseerror account. Both accounts have "sync" enabled.
Next, today on Mac, I created a new plain text test message with μύκητες, sending from vseerror to vseerror. Quick filter fails. I compared the message source of Thursday's message to Friday's - they look the same.
Next, I tested global search; it finds both the message I sent today and Thursday's message that I copied between accounts. Conclusion - I can't believe I am saying this - are the sent and received messages stored differently on disk? Or is one coming from cache and the other from the synced folder?
For good measure, some final tests on Mac
- message sent from vseerror to two other accounts - QF only fails in vseerror from where it was sent (works in other two receiving accounts)
- message sent from luwsm to vseerror - QF only fails filtering luwsm sent folder (not vseerror inbox)
- message sent from luwsm to luwsm - QF fails on both sent folder and Inbox
All of the above are gmail enterprise accounts.
And now the nails in the coffin: (to eliminate possible gmail strangeness)
- fastmail account, sent from wsmwk to wsmwk (both messages are in the Inbox because I have sent messages going to the Inbox): QF finds ONLY the RECEIVED message - it should have found both the copy in the "sent folder" and the received one
- newsgroup posting to mozilla.test - QF finds the copy in the sent folder (local folder) - conclusion: are sent messages handled differently for IMAP vs local?
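One way two copies of the same message can "look the same" when viewed but differ on disk is the Content-Transfer-Encoding of the body. The sketch below is only an illustration of that general effect, not a diagnosis of this bug: the two raw messages are hand-built, hypothetical examples, and nothing here is Thunderbird code. It shows that a byte-level search finds the term in an 8bit body but not in a base64 body, while a search over the decoded body finds it in both.

```python
import base64
from email import message_from_bytes, policy

TERM = "μύκητες"

# Headers shared by both hypothetical copies of the message.
HEADERS = (
    "Subject: test\r\n"
    "MIME-Version: 1.0\r\n"
    "Content-Type: text/plain; charset=utf-8\r\n"
)

# Hypothetical copy A: body stored as raw UTF-8 (8bit).
raw_8bit = (
    HEADERS + "Content-Transfer-Encoding: 8bit\r\n\r\n" + TERM + "\r\n"
).encode("utf-8")

# Hypothetical copy B: the same body, stored base64-encoded.
b64_body = base64.b64encode((TERM + "\r\n").encode("utf-8")).decode("ascii")
raw_b64 = (
    HEADERS + "Content-Transfer-Encoding: base64\r\n\r\n" + b64_body + "\r\n"
).encode("utf-8")


def raw_search(raw: bytes, term: str) -> bool:
    """Search the stored bytes without decoding the body."""
    return term.encode("utf-8") in raw


def decoded_search(raw: bytes, term: str) -> bool:
    """Parse the message, decode the body, then search the text."""
    msg = message_from_bytes(raw, policy=policy.default)
    return term in msg.get_content()


for name, raw in (("8bit", raw_8bit), ("base64", raw_b64)):
    print(name, "raw:", raw_search(raw, TERM), "decoded:", decoded_search(raw, TERM))
```

Comparing the Content-Transfer-Encoding and charset headers of a failing and a succeeding copy (saved as .eml) would show whether anything like this applies here.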
Comment 8•4 years ago
To narrow the issue down, can you please do this more systematically:
- On Windows and Mac, do body search in a local folder
- On Windows and Mac, do body search in a locally sync'ed IMAP folder
- Repeat for non-sync folder.
We need to know whether to look for the bug in local body search, which is subject to the test I mentioned, or whether we have an IMAP issue here.
There is bug 1245532 with 7 duplicates and also bug 404255. I doubt that you're seeing a new issue here.
If you see differences in point 1 between Windows and Mac, I will comment further, just as a teaser:
https://stackoverflow.com/questions/7931204/what-is-normalized-utf-8-all-about
Quote: Unicode includes multiple ways to encode some characters, most notably accented characters.
... and it is known that Mac uses a different normalisation, so if the normalisation of text entered in the search box doesn't coincide with the normalisation of the text in the body, you have a problem.
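To make the normalisation point concrete, here is a minimal Python sketch using the standard `unicodedata` module. It is an illustration only and says nothing about how Thunderbird's search is implemented; `contains_normalized` is a hypothetical helper. The accented letter ύ can be one code point (composed, NFC) or a base letter plus a combining accent (decomposed, NFD), so a plain substring test fails when the search term and the body use different forms.

```python
import unicodedata

term = "μύκητες"

# Composed form: ύ is a single code point.
nfc = unicodedata.normalize("NFC", term)
# Decomposed form: ύ becomes υ followed by a combining accent.
nfd = unicodedata.normalize("NFD", term)

body = "spam about " + nfc  # body text stored in composed form


def contains_normalized(body: str, needle: str, form: str = "NFC") -> bool:
    """Hypothetical helper: normalise both sides before the substring test."""
    return unicodedata.normalize(form, needle) in unicodedata.normalize(form, body)


print(len(nfc), len(nfd))              # the two forms differ in length
print(nfd in body)                     # naive search with a decomposed term
print(contains_normalized(body, nfd))  # search after normalising both sides
```

If the search box and the stored body were in different forms, normalising both to the same form before comparing would be the usual remedy.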
Reporter
Comment 9•4 years ago
<<There is bug 1245532 with 7 duplicates and also bug 404255. I doubt that you're seeing a new issue here.>>
Out of curiosity, why aren't those bugs fixed?
<<Unicode includes multiple ways to encode some characters, most notably accented characters.
... and it is known that Mac uses a different normalisation, so if the normalisation of text entered in the search box doesn't coincide with the normalisation of the text in the body, you have a problem.>>
This is characteristic of today's era of incredible techno-idiocy, techno-sloppiness and greed, and that's definitely not an excuse in 21st century, I'd be ashamed to even use this as an excuse.
Meanwhile, lately I'm getting a storm of spam on my official email (shared only with companies where I'm a customer) ... I'm creating 20 F*** filters per day! I got the first spam batch after registering domains with godaddy, and I transferred all of my domains to another company once I realized that (they didn't even answer my complaint about the leak).
How about a filter that moves to trash ALL email, except those addresses on a white list?
That would solve the spam epidemic for good.
Comment 10•4 years ago
Looks like an IMAP issue.
"How about a filter that moves to trash ALL email, except those addresses on a white list?"
You can do that: From doesn't contain and From doesn't contain, etc. Not much fun to manage.
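As a rough sketch of the logic such a filter expresses (every "From doesn't contain" condition must hold for the message to be trashed), the Python below checks a sender against an allow list. The addresses and the `should_trash` name are hypothetical, chosen only for illustration; this is not Thunderbird's filter engine.

```python
from email.utils import parseaddr

# Hypothetical allow list; replace with the addresses you actually trust.
ALLOWED = {"billing@example.com", "support@example.org"}


def should_trash(from_header: str, allowed: set = ALLOWED) -> bool:
    """Return True when the sender's address is not on the allow list."""
    _, address = parseaddr(from_header)
    return address.lower() not in allowed


print(should_trash("Billing <Billing@Example.com>"))   # sender on the list
print(should_trash("Offers <promo@spam.example>"))     # sender not on the list
```

Keeping the list in one place is what makes this easier to manage than one filter condition per address.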