Closed
Bug 1489949
Opened 6 years ago
Closed 6 years ago
Port bug 1488659: Replace XPCOM use of nsICharsetDetector
Categories
(MailNews Core :: General, task)
MailNews Core
General
Tracking
(Not tracked)
RESOLVED
FIXED
Thunderbird 64.0
People
(Reporter: jorgk-bmo, Assigned: jorgk-bmo)
References
Details
Attachments
(3 files, 5 obsolete files)
(deleted),
patch
|
ehsan.akhgari
:
review+
|
Details | Diff | Splinter Review |
(deleted),
patch
|
Details | Diff | Splinter Review | |
(deleted),
patch
|
hsivonen
:
review+
|
Details | Diff | Splinter Review |
As per bug 1488659 comment #6. Sample code in:
https://phabricator.services.mozilla.com/D5392#C122482NL203
but let's see the final version since there was a review nit.
Assignee | ||
Comment 1•6 years ago
|
||
Hi Henri and Ehsan, can you please take a look. Sorry about the double review request, one will surely do, but bug 1489949 is already on inbound, so we're a bit in a hurry.
This is copied from
https://hg.mozilla.org/integration/mozilla-inbound/rev/d94ad30a3001#l11.48
In that changeset I can't see any evidence of
NS_CHARSET_DETECTOR_CONTRACTID_BASE "universal_charset_detector"
ever existing, so hence the:
- // First try the universal charset detector
- nsCOMPtr<nsICharsetDetector> detector
- = do_CreateInstance(NS_CHARSET_DETECTOR_CONTRACTID_BASE
- "universal_charset_detector");
- if (!detector) {
That never worked, or am I missing something?
Strangely the string is still here:
https://dxr.mozilla.org/mozilla-central/search?q=universal_charset_detector&redirect=false
Assignee: nobody → jorgk
Status: NEW → ASSIGNED
Attachment #9007760 -
Flags: review?(hsivonen)
Attachment #9007760 -
Flags: review?(ehsan)
Comment 2•6 years ago
|
||
Comment on attachment 9007760 [details] [diff] [review]
1489949-replace-xpcom-nsICharsetDetector.patch
Review of attachment 9007760 [details] [diff] [review]:
-----------------------------------------------------------------
I never realized there is a c-c consumer for this, sorry about that!
Do you think it's worth adding a constructor function to m-c that reads this pref and returns a nsICharsetDetector* so that you can call that directly instead of having this code duplicated in two places? If yes, I'd be happy to add that for you...
r=me for now so that c-c doesn't get busted.
::: mailnews/base/util/nsMsgUtils.cpp
@@ -1893,5 @@
> {
> - // First try the universal charset detector
> - nsCOMPtr<nsICharsetDetector> detector
> - = do_CreateInstance(NS_CHARSET_DETECTOR_CONTRACTID_BASE
> - "universal_charset_detector");
So this will always fail currently (even before bug 1488659) since this component doesn't exist.
@@ +1905,5 @@
> + detector = new nsRUProbDetector();
> + } else if (detectorName.EqualsLiteral("ukprob")) {
> + detector = new nsUKProbDetector();
> + } else if (detectorName.EqualsLiteral("ja_parallel_state_machine")) {
> + detector = new nsJAPSMDetector();
Then yes, this is exactly like the code I landed on m-c.
Attachment #9007760 -
Flags: review?(hsivonen)
Attachment #9007760 -
Flags: review?(ehsan)
Attachment #9007760 -
Flags: review+
Assignee | ||
Comment 3•6 years ago
|
||
(In reply to :Ehsan Akhgari from comment #2)
> Do you think it's worth adding a constructor function to m-c that reads this
> pref and returns a nsICharsetDetector* so that you can call that directly
> instead of having this code duplicated in two places? If yes, I'd be happy
> to add that for you...
Yes please :-)
> r=me for now so that c-c doesn't get busted.
Thanks. I'll make it r=Ehsan, right? - although previously IIRC we used to use lower case.
Comment 4•6 years ago
|
||
sure
Comment 5•6 years ago
|
||
(I don't care about the case. :-) )
(In reply to Jorg K (GMT+2) from comment #3)
> (In reply to :Ehsan Akhgari from comment #2)
> > Do you think it's worth adding a constructor function to m-c that reads this
> > pref and returns a nsICharsetDetector* so that you can call that directly
> > instead of having this code duplicated in two places? If yes, I'd be happy
> > to add that for you...
> Yes please :-)
That would probably lead to *more* c-c churn later this year, when I intend to add more conditions to the detector instantiation in m-c.
(In reply to Jorg K (GMT+2) from comment #1)
> NS_CHARSET_DETECTOR_CONTRACTID_BASE "universal_charset_detector"
> ever existing
Removed in bug 844115.
Assignee | ||
Updated•6 years ago
|
Keywords: leave-open
Version: 24 → Trunk
Assignee | ||
Comment 8•6 years ago
|
||
Ehsan and Henri: There is another call site which is now busted:
https://searchfox.org/comm-central/rev/27b3d9c5fd2543f47c3943590c72fa560a5b6b9a/mailnews/mime/src/comi18n.cpp#49
That one uses the "string versions" which you also removed. I tried a similar approach, but since you also removed nsJAStringPSMDetector, nsRUStringProbDetector, nsUKStringProbDetector, this code doesn't compile.
Flags: needinfo?(hsivonen)
Flags: needinfo?(ehsan)
Assignee | ||
Comment 9•6 years ago
|
||
So for the time being, I have to remove the busted code.
Comment 10•6 years ago
|
||
(In reply to Jorg K (GMT+2) from comment #8)
> Created attachment 9007837 [details] [diff] [review]
> 1489949-replace-xpcom-nsICharsetDetector-part2.patch - does NOT compile
>
> Ehsan and Henri: There is another call site which is now busted:
> https://searchfox.org/comm-central/rev/
> 27b3d9c5fd2543f47c3943590c72fa560a5b6b9a/mailnews/mime/src/comi18n.cpp#49
>
> That one uses the "string versions" which you also removed. I tried a
> similar approach, but since you also removed nsJAStringPSMDetector,
> nsRUStringProbDetector, nsUKStringProbDetector, this code doesn't compile.
You should probably add the removed classes back to c-c and instantiate them from here then. This stuff wasn't used in m-c and was therefore removed from there.
Flags: needinfo?(ehsan)
Comment 11•6 years ago
|
||
Pushed by mozilla@jorgk.com:
https://hg.mozilla.org/comm-central/rev/cacb2b5c2373
Port bug 1488659: Replace XPCOM use of nsICharsetDetector, part 1. r=Ehsan
https://hg.mozilla.org/comm-central/rev/2275b3b53ada
Port bug 1488659: Temporarily remove code using StringProbDetector. rs=bustage-fix
Assignee | ||
Comment 12•6 years ago
|
||
(In reply to :Ehsan Akhgari from comment #10)
> You should probably add the removed classes back to c-c and instantiate them
> from here then.
Unfortunately that didn't occur to me, ah, well, here goes. I'll back out the bustage fix separately.
Attachment #9007837 -
Attachment is obsolete: true
Flags: needinfo?(hsivonen)
Attachment #9007852 -
Flags: review?(ehsan)
Assignee | ||
Updated•6 years ago
|
Keywords: leave-open
Target Milestone: --- → Thunderbird 64.0
Comment 13•6 years ago
|
||
Pushed by mozilla@jorgk.com:
https://hg.mozilla.org/comm-central/rev/9588a123fb8d
backed out changeset 2275b3b53ada to remove temporary measure. a=backout
https://hg.mozilla.org/comm-central/rev/995fe17c5348
Port bug 1488659: Replace XPCOM use of nsICharsetDetector, part 2. rs=bustage-fix
Status: ASSIGNED → RESOLVED
Closed: 6 years ago
Resolution: --- → FIXED
I think it's not worthwhile to put the nsIStringCharsetDetector.h interface in c-c. You should be able to get the same result by calling DoIt once followed by Done on nsICharsetDetector.
Assignee | ||
Comment 15•6 years ago
|
||
Well, the DoIt() has a different interface
https://hg.mozilla.org/comm-central/rev/995fe17c5348#l1.98
something with confidence.
Would you like to give me a snippet here?
Comment 16•6 years ago
|
||
Comment on attachment 9007852 [details] [diff] [review]
1489949-replace-xpcom-nsICharsetDetector-part2.patch
I think Henri is a better reviewer... :-)
Attachment #9007852 -
Flags: review?(ehsan)
Assignee | ||
Comment 17•6 years ago
|
||
Comment on attachment 9007852 [details] [diff] [review]
1489949-replace-xpcom-nsICharsetDetector-part2.patch
I sometimes have to land stuff to fix bustage and a follow-up will be necessary. Unless it's grossly wrong and needs to be backed out, an r+ with the condition for a follow-up would be appreciated (see preceding comments).
Attachment #9007852 -
Flags: review?(hsivonen)
Comment on attachment 9007852 [details] [diff] [review]
1489949-replace-xpcom-nsICharsetDetector-part2.patch
Review of attachment 9007852 [details] [diff] [review]:
-----------------------------------------------------------------
::: mailnews/mime/src/comi18n.cpp
@@ +38,5 @@
> default_charset, override_charset,
> eatContinuations, result);
> }
>
> +class nsJAStringPSMDetector : public nsXPCOMStringDetector
It looks like the classes being subclassed here are dead code in m-c, so instead of subclassing them in c-c, we should be removing them from m-c.
@@ +69,2 @@
> *aCharset = nullptr;
> + nsCOMPtr<nsIStringCharsetDetector> detector;
Please use nsICharsetDetector here.
@@ +73,5 @@
>
> + if (!detectorName.IsEmpty()) {
> + // We recognize one of the three magic strings for the following languages.
> + if (detectorName.EqualsLiteral("ruprob")) {
> + detector = new nsRUStringProbDetector();
And then please instantiate the nsICharsetDetector implementation classes in these branches.
@@ +81,5 @@
> + detector = new nsJAStringPSMDetector();
> + }
> + }
> +
> + if (detector) {
The here you need to initialize the nsICharsetDetector with an implementation of nsICharsetDetectionObserver that stores copies the arguments of Notify to its fields.
Then you can call DoIt and Done on the nsICharsetDetector and then retrieve the values from the fields of your nsICharsetDetectionObserver implemenatation.
Attachment #9007852 -
Flags: review?(hsivonen) → review-
(In reply to Henri Sivonen (:hsivonen) from comment #18)
> It looks like the classes being subclassed here are dead code in m-c, so
> instead of subclassing them in c-c, we should be removing them from m-c.
(Bug 1490212)
(In reply to Henri Sivonen (:hsivonen) from comment #18)
> The here you need to initialize the nsICharsetDetector with an
> implementation of nsICharsetDetectionObserver that stores copies the
> arguments of Notify to its fields.
Correction: In what's a very questionable use of XPCOM, the const char* argument of Notify() always points to a literal string, so if the c-c callers assume they don't have to free the pointer, I guess you shouldn't copy that string but just pass that pointer along.
Comment 21•6 years ago
|
||
So why exactly do we want to preserve the sniffing in comm-central?
If m-c is able to remove it's usage, if anything I'd imagine mail content with unlabeled charsets would be even more uncommon.
Assignee | ||
Comment 22•6 years ago
|
||
Thanks for the review, Henri. Here's a version as you described it. I'll backout the incorrect version separately.
I've done the tweaks in nsMsgUtils.cpp according to your comment #20.
I'm not sure about passing back the charset from MIME_detect_charset(). Now I'm just passing the pointer back. MsgDetectCharsetFromFile() passes back a copy in a nsACString.
Attachment #9007852 -
Attachment is obsolete: true
Attachment #9007990 -
Flags: review?(hsivonen)
Comment on attachment 9007990 [details] [diff] [review]
1489949-replace-xpcom-nsICharsetDetector-part2.patch (v2)
Review of attachment 9007990 [details] [diff] [review]:
-----------------------------------------------------------------
Thanks and sorry about the churn.
Attachment #9007990 -
Flags: review?(hsivonen) → review+
Assignee | ||
Comment 24•6 years ago
|
||
(In reply to Henri Sivonen (:hsivonen) from comment #23)
> Thanks and sorry about the churn.
Thanks, the churn was mostly my fault landing incorrect solutions as the trigger-happy sheriff/developer ;-)
M-C sees a whole lot of backouts, so I don't think we should worry about some backouts on C-C.
So the
+ if (oConfident == eBestAnswer || oConfident == eSureAnswer) {
+ *aCharset = observer->GetDetectedCharset();
is OK?
(In reply to Jorg K (GMT+2) from comment #24)
> (In reply to Henri Sivonen (:hsivonen) from comment #23)
> > Thanks and sorry about the churn.
> Thanks, the churn was mostly my fault landing incorrect solutions as the
> trigger-happy sheriff/developer ;-)
>
> M-C sees a whole lot of backouts, so I don't think we should worry about
> some backouts on C-C.
>
> So the
> + if (oConfident == eBestAnswer || oConfident == eSureAnswer) {
> + *aCharset = observer->GetDetectedCharset();
> is OK?
I believe so. I believe both before and after what gets assigned to *aCharset is a pointed to a POD literal, so if it worked previously without anyone trying to free the literal, it should work now.
Assignee | ||
Comment 26•6 years ago
|
||
Hmm, well, the const char* pointer we return gets passed into Notify(). All we know is that Notify() shouldn't/can't modify it. As far as I'm aware, the is no guarantee as to the longevity of the data pointed to. The CharsetDetectionObserver will be destroyed once the function exits.
There is no "prior art" here, at least not strictly. Before that code used the string detector versions and the DoIt() function returned the charset, and I don't know where that came from.
All I can observe is that in the equivalent code in MsgDetectCharsetFromFile(), a copy is passed back a copy in a nsACString. That would be 100% water-proof. Shall I change it to that so we can sleep better?
BTW, I'm happy to be wrong on all the above, just asking ;-)
Note that the Japanese version is in fact used:
https://dxr.mozilla.org/l10n-central/rev/fe8de2043f654a30ec39111799bba92c53cc67e0/ja/toolkit/chrome/global/intl.properties#39
Flags: needinfo?(hsivonen)
(In reply to Jorg K (GMT+2) from comment #26)
> All I can observe is that in the equivalent code in
> MsgDetectCharsetFromFile(), a copy is passed back a copy in a nsACString.
> That would be 100% water-proof. Shall I change it to that so we can sleep
> better?
Changing the aCharset outparam of MIME_detect_charset from const char** aCharset to nsACString& aCharset would be better.
Flags: needinfo?(hsivonen)
MIME_detect_charset has C linkage but is only called from C++, so changing its linkage to C++ linkage should be fine.
Comment 30•6 years ago
|
||
Before proceeding, is there reason to think we need this? comment #21
(Because it's in use isn't a reason. Of course it's used since the localization note says to set it.)
Assignee | ||
Comment 31•6 years ago
|
||
Good point, but perhaps we should fix the bustage and move the removal, if so decided, two another bug. Note that there are various call sites doing this.
Assignee | ||
Comment 32•6 years ago
|
||
MsgDetectCharsetFromFile() fixed in part 1 has two call sites. Does your comment include those or only part 2?
Comment on attachment 9008035 [details] [diff] [review]
1489949-replace-xpcom-nsICharsetDetector-part2.patch (v3)
Review of attachment 9008035 [details] [diff] [review]:
-----------------------------------------------------------------
::: mailnews/base/util/nsMsgUtils.cpp
@@ +1883,5 @@
> {
> mCharset = aCharset;
> return NS_OK;
> };
> + const char *GetDetectedCharset() { return mCharset; }
This should have nsACString& as outparam instead of returning const char*.
@@ +1888,4 @@
>
> private:
> virtual ~CharsetDetectionObserver() {}
> + const char* mCharset;
This should stay as nsCString.
::: mailnews/mime/src/comi18n.cpp
@@ +47,5 @@
> + NS_DECL_ISUPPORTS
> + CharsetDetectionObserver() {};
> + NS_IMETHOD Notify(const char* aCharset, nsDetectionConfident aConf) override
> + {
> + mCharset = aCharset;
AssignASCII here.
Assignee | ||
Comment 34•6 years ago
|
||
Tweaked string processing in both source files as per Henri's comments. Applied all three comments everywhere.
Magnus will be disappointed that his question wasn't answered. I guess MsgDetectCharsetFromFile() from part 1 has its justification, but MIME_detect_charset() appears a bit doubtful. It's only ever configured for Japanese, so as far as I can see, no Russian/Ukrainian/Cyrillic detection. And for Japanese, will it really distinguish - say - UTF-8, Shift-JIS from ISO-2022-JP?
Best discussed in a new bug, I think.
Attachment #9007990 -
Attachment is obsolete: true
Attachment #9008035 -
Attachment is obsolete: true
Attachment #9008035 -
Flags: review?(hsivonen)
Attachment #9008124 -
Flags: review?(hsivonen)
Assignee | ||
Comment 35•6 years ago
|
||
Oops, forgot on AssignASCII(). Sorry about the noise. Interdiff with v3.
Attachment #9008124 -
Attachment is obsolete: true
Attachment #9008124 -
Flags: review?(hsivonen)
Attachment #9008125 -
Flags: review?(hsivonen)
Comment on attachment 9008125 [details] [diff] [review]
1489949-replace-xpcom-nsICharsetDetector-part2.patch (v3c)
Review of attachment 9008125 [details] [diff] [review]:
-----------------------------------------------------------------
Thanks.
::: mailnews/mime/src/comi18n.h
@@ +31,3 @@
>
> #ifdef __cplusplus
> } /* extern "C" */
I'm surprised it compiles without removing the C linkage. If it turns out you need to remove the C linkage, the r+ holds either way.
Attachment #9008125 -
Flags: review?(hsivonen) → review+
Comment 37•6 years ago
|
||
Pushed by mozilla@jorgk.com:
https://hg.mozilla.org/comm-central/rev/9dc7f65220a4
Backed out changeset 995fe17c5348 for being incorrect. a=backout
https://hg.mozilla.org/comm-central/rev/4d4e022e6697
Port bug 1488659: Replace XPCOM use of nsICharsetDetector, part 2, take 2. r=hsivonen
Assignee | ||
Comment 38•6 years ago
|
||
Landed with two changes from the reviewed version:
- if (NS_SUCCEEDED(res) && detectedCharset && *detectedCharset) {
+ if (NS_SUCCEEDED(res) && !detectedCharset.IsEmpty()) {
PR_FREEIF(text->charset);
- text->charset = strdup(detectedCharset);
+ text->charset = ToNewCString(detectedCharset); // [1]
//update MsgWindow charset if we are instructed to do so
- if (text->needUpdateMsgWinCharset && *text->charset)
+ if (text->needUpdateMsgWinCharset && text->charset) // [2]
[1] Used to ToNewCString(detectedCharset) instead of strdup(detectedCharset.get())
[2] ToNewCString() can return nullptr, so we need to check.
Checking *text->charset was wrong since
a) it would have dereferenced null
b) we checked before that the string wasn't empty, even in the original code.
Thanks. That's indeed better that strdup.
Updated•5 years ago
|
Type: enhancement → task
You need to log in
before you can comment on or make changes to this bug.
Description
•