Closed Bug 1489949 Opened 6 years ago Closed 6 years ago

Port bug 1488659: Replace XPCOM use of nsICharsetDetector

Categories

(MailNews Core :: General, task)

task
Not set
normal

Tracking

(Not tracked)

RESOLVED FIXED
Thunderbird 64.0

People

(Reporter: jorgk-bmo, Assigned: jorgk-bmo)

References

Details

Attachments

(3 files, 5 obsolete files)

As per bug 1488659 comment #6. Sample code in: https://phabricator.services.mozilla.com/D5392#C122482NL203 but let's see the final version since there was a review nit.
Blocks: 1488659
Hi Henri and Ehsan, can you please take a look. Sorry about the double review request, one will surely do, but bug 1489949 is already on inbound, so we're a bit in a hurry. This is copied from https://hg.mozilla.org/integration/mozilla-inbound/rev/d94ad30a3001#l11.48 In that changeset I can't see any evidence of NS_CHARSET_DETECTOR_CONTRACTID_BASE "universal_charset_detector" ever existing, so hence the: - // First try the universal charset detector - nsCOMPtr<nsICharsetDetector> detector - = do_CreateInstance(NS_CHARSET_DETECTOR_CONTRACTID_BASE - "universal_charset_detector"); - if (!detector) { That never worked, or am I missing something? Strangely the string is still here: https://dxr.mozilla.org/mozilla-central/search?q=universal_charset_detector&redirect=false
Assignee: nobody → jorgk
Status: NEW → ASSIGNED
Attachment #9007760 - Flags: review?(hsivonen)
Attachment #9007760 - Flags: review?(ehsan)
Comment on attachment 9007760 [details] [diff] [review] 1489949-replace-xpcom-nsICharsetDetector.patch Review of attachment 9007760 [details] [diff] [review]: ----------------------------------------------------------------- I never realized there is a c-c consumer for this, sorry about that! Do you think it's worth adding a constructor function to m-c that reads this pref and returns a nsICharsetDetector* so that you can call that directly instead of having this code duplicated in two places? If yes, I'd be happy to add that for you... r=me for now so that c-c doesn't get busted. ::: mailnews/base/util/nsMsgUtils.cpp @@ -1893,5 @@ > { > - // First try the universal charset detector > - nsCOMPtr<nsICharsetDetector> detector > - = do_CreateInstance(NS_CHARSET_DETECTOR_CONTRACTID_BASE > - "universal_charset_detector"); So this will always fail currently (even before bug 1488659) since this component doesn't exist. @@ +1905,5 @@ > + detector = new nsRUProbDetector(); > + } else if (detectorName.EqualsLiteral("ukprob")) { > + detector = new nsUKProbDetector(); > + } else if (detectorName.EqualsLiteral("ja_parallel_state_machine")) { > + detector = new nsJAPSMDetector(); Then yes, this is exactly like the code I landed on m-c.
Attachment #9007760 - Flags: review?(hsivonen)
Attachment #9007760 - Flags: review?(ehsan)
Attachment #9007760 - Flags: review+
(In reply to :Ehsan Akhgari from comment #2) > Do you think it's worth adding a constructor function to m-c that reads this > pref and returns a nsICharsetDetector* so that you can call that directly > instead of having this code duplicated in two places? If yes, I'd be happy > to add that for you... Yes please :-) > r=me for now so that c-c doesn't get busted. Thanks. I'll make it r=Ehsan, right? - although previously IIRC we used to use lower case.
sure
(I don't care about the case. :-) )
(In reply to Jorg K (GMT+2) from comment #3) > (In reply to :Ehsan Akhgari from comment #2) > > Do you think it's worth adding a constructor function to m-c that reads this > > pref and returns a nsICharsetDetector* so that you can call that directly > > instead of having this code duplicated in two places? If yes, I'd be happy > > to add that for you... > Yes please :-) That would probably lead to *more* c-c churn later this year, when I intend to add more conditions to the detector instantiation in m-c.
(In reply to Jorg K (GMT+2) from comment #1) > NS_CHARSET_DETECTOR_CONTRACTID_BASE "universal_charset_detector" > ever existing Removed in bug 844115.
Keywords: leave-open
Version: 24 → Trunk
Ehsan and Henri: There is another call site which is now busted: https://searchfox.org/comm-central/rev/27b3d9c5fd2543f47c3943590c72fa560a5b6b9a/mailnews/mime/src/comi18n.cpp#49 That one uses the "string versions" which you also removed. I tried a similar approach, but since you also removed nsJAStringPSMDetector, nsRUStringProbDetector, nsUKStringProbDetector, this code doesn't compile.
Flags: needinfo?(hsivonen)
Flags: needinfo?(ehsan)
So for the time being, I have to remove the busted code.
(In reply to Jorg K (GMT+2) from comment #8) > Created attachment 9007837 [details] [diff] [review] > 1489949-replace-xpcom-nsICharsetDetector-part2.patch - does NOT compile > > Ehsan and Henri: There is another call site which is now busted: > https://searchfox.org/comm-central/rev/ > 27b3d9c5fd2543f47c3943590c72fa560a5b6b9a/mailnews/mime/src/comi18n.cpp#49 > > That one uses the "string versions" which you also removed. I tried a > similar approach, but since you also removed nsJAStringPSMDetector, > nsRUStringProbDetector, nsUKStringProbDetector, this code doesn't compile. You should probably add the removed classes back to c-c and instantiate them from here then. This stuff wasn't used in m-c and was therefore removed from there.
Flags: needinfo?(ehsan)
Pushed by mozilla@jorgk.com: https://hg.mozilla.org/comm-central/rev/cacb2b5c2373 Port bug 1488659: Replace XPCOM use of nsICharsetDetector, part 1. r=Ehsan https://hg.mozilla.org/comm-central/rev/2275b3b53ada Port bug 1488659: Temporarily remove code using StringProbDetector. rs=bustage-fix
(In reply to :Ehsan Akhgari from comment #10) > You should probably add the removed classes back to c-c and instantiate them > from here then. Unfortunately that didn't occur to me, ah, well, here goes. I'll back out the bustage fix separately.
Attachment #9007837 - Attachment is obsolete: true
Flags: needinfo?(hsivonen)
Attachment #9007852 - Flags: review?(ehsan)
Keywords: leave-open
Target Milestone: --- → Thunderbird 64.0
Pushed by mozilla@jorgk.com: https://hg.mozilla.org/comm-central/rev/9588a123fb8d backed out changeset 2275b3b53ada to remove temporary measure. a=backout https://hg.mozilla.org/comm-central/rev/995fe17c5348 Port bug 1488659: Replace XPCOM use of nsICharsetDetector, part 2. rs=bustage-fix
Status: ASSIGNED → RESOLVED
Closed: 6 years ago
Resolution: --- → FIXED
I think it's not worthwhile to put the nsIStringCharsetDetector.h interface in c-c. You should be able to get the same result by calling DoIt once followed by Done on nsICharsetDetector.
Well, the DoIt() has a different interface https://hg.mozilla.org/comm-central/rev/995fe17c5348#l1.98 something with confidence. Would you like to give me a snippet here?
Comment on attachment 9007852 [details] [diff] [review] 1489949-replace-xpcom-nsICharsetDetector-part2.patch I think Henri is a better reviewer... :-)
Attachment #9007852 - Flags: review?(ehsan)
Comment on attachment 9007852 [details] [diff] [review] 1489949-replace-xpcom-nsICharsetDetector-part2.patch I sometimes have to land stuff to fix bustage and a follow-up will be necessary. Unless it's grossly wrong and needs to be backed out, an r+ with the condition for a follow-up would be appreciated (see preceding comments).
Attachment #9007852 - Flags: review?(hsivonen)
Comment on attachment 9007852 [details] [diff] [review] 1489949-replace-xpcom-nsICharsetDetector-part2.patch Review of attachment 9007852 [details] [diff] [review]: ----------------------------------------------------------------- ::: mailnews/mime/src/comi18n.cpp @@ +38,5 @@ > default_charset, override_charset, > eatContinuations, result); > } > > +class nsJAStringPSMDetector : public nsXPCOMStringDetector It looks like the classes being subclassed here are dead code in m-c, so instead of subclassing them in c-c, we should be removing them from m-c. @@ +69,2 @@ > *aCharset = nullptr; > + nsCOMPtr<nsIStringCharsetDetector> detector; Please use nsICharsetDetector here. @@ +73,5 @@ > > + if (!detectorName.IsEmpty()) { > + // We recognize one of the three magic strings for the following languages. > + if (detectorName.EqualsLiteral("ruprob")) { > + detector = new nsRUStringProbDetector(); And then please instantiate the nsICharsetDetector implementation classes in these branches. @@ +81,5 @@ > + detector = new nsJAStringPSMDetector(); > + } > + } > + > + if (detector) { The here you need to initialize the nsICharsetDetector with an implementation of nsICharsetDetectionObserver that stores copies the arguments of Notify to its fields. Then you can call DoIt and Done on the nsICharsetDetector and then retrieve the values from the fields of your nsICharsetDetectionObserver implemenatation.
Attachment #9007852 - Flags: review?(hsivonen) → review-
(In reply to Henri Sivonen (:hsivonen) from comment #18) > It looks like the classes being subclassed here are dead code in m-c, so > instead of subclassing them in c-c, we should be removing them from m-c. (Bug 1490212)
(In reply to Henri Sivonen (:hsivonen) from comment #18) > The here you need to initialize the nsICharsetDetector with an > implementation of nsICharsetDetectionObserver that stores copies the > arguments of Notify to its fields. Correction: In what's a very questionable use of XPCOM, the const char* argument of Notify() always points to a literal string, so if the c-c callers assume they don't have to free the pointer, I guess you shouldn't copy that string but just pass that pointer along.
So why exactly do we want to preserve the sniffing in comm-central? If m-c is able to remove it's usage, if anything I'd imagine mail content with unlabeled charsets would be even more uncommon.
Thanks for the review, Henri. Here's a version as you described it. I'll backout the incorrect version separately. I've done the tweaks in nsMsgUtils.cpp according to your comment #20. I'm not sure about passing back the charset from MIME_detect_charset(). Now I'm just passing the pointer back. MsgDetectCharsetFromFile() passes back a copy in a nsACString.
Attachment #9007852 - Attachment is obsolete: true
Attachment #9007990 - Flags: review?(hsivonen)
Comment on attachment 9007990 [details] [diff] [review] 1489949-replace-xpcom-nsICharsetDetector-part2.patch (v2) Review of attachment 9007990 [details] [diff] [review]: ----------------------------------------------------------------- Thanks and sorry about the churn.
Attachment #9007990 - Flags: review?(hsivonen) → review+
(In reply to Henri Sivonen (:hsivonen) from comment #23) > Thanks and sorry about the churn. Thanks, the churn was mostly my fault landing incorrect solutions as the trigger-happy sheriff/developer ;-) M-C sees a whole lot of backouts, so I don't think we should worry about some backouts on C-C. So the + if (oConfident == eBestAnswer || oConfident == eSureAnswer) { + *aCharset = observer->GetDetectedCharset(); is OK?
(In reply to Jorg K (GMT+2) from comment #24) > (In reply to Henri Sivonen (:hsivonen) from comment #23) > > Thanks and sorry about the churn. > Thanks, the churn was mostly my fault landing incorrect solutions as the > trigger-happy sheriff/developer ;-) > > M-C sees a whole lot of backouts, so I don't think we should worry about > some backouts on C-C. > > So the > + if (oConfident == eBestAnswer || oConfident == eSureAnswer) { > + *aCharset = observer->GetDetectedCharset(); > is OK? I believe so. I believe both before and after what gets assigned to *aCharset is a pointed to a POD literal, so if it worked previously without anyone trying to free the literal, it should work now.
Hmm, well, the const char* pointer we return gets passed into Notify(). All we know is that Notify() shouldn't/can't modify it. As far as I'm aware, the is no guarantee as to the longevity of the data pointed to. The CharsetDetectionObserver will be destroyed once the function exits. There is no "prior art" here, at least not strictly. Before that code used the string detector versions and the DoIt() function returned the charset, and I don't know where that came from. All I can observe is that in the equivalent code in MsgDetectCharsetFromFile(), a copy is passed back a copy in a nsACString. That would be 100% water-proof. Shall I change it to that so we can sleep better? BTW, I'm happy to be wrong on all the above, just asking ;-) Note that the Japanese version is in fact used: https://dxr.mozilla.org/l10n-central/rev/fe8de2043f654a30ec39111799bba92c53cc67e0/ja/toolkit/chrome/global/intl.properties#39
Flags: needinfo?(hsivonen)
(In reply to Jorg K (GMT+2) from comment #26) > All I can observe is that in the equivalent code in > MsgDetectCharsetFromFile(), a copy is passed back a copy in a nsACString. > That would be 100% water-proof. Shall I change it to that so we can sleep > better? Changing the aCharset outparam of MIME_detect_charset from const char** aCharset to nsACString& aCharset would be better.
Flags: needinfo?(hsivonen)
MIME_detect_charset has C linkage but is only called from C++, so changing its linkage to C++ linkage should be fine.
So like this?
Attachment #9008035 - Flags: review?(hsivonen)
Before proceeding, is there reason to think we need this? comment #21 (Because it's in use isn't a reason. Of course it's used since the localization note says to set it.)
Good point, but perhaps we should fix the bustage and move the removal, if so decided, two another bug. Note that there are various call sites doing this.
MsgDetectCharsetFromFile() fixed in part 1 has two call sites. Does your comment include those or only part 2?
Comment on attachment 9008035 [details] [diff] [review] 1489949-replace-xpcom-nsICharsetDetector-part2.patch (v3) Review of attachment 9008035 [details] [diff] [review]: ----------------------------------------------------------------- ::: mailnews/base/util/nsMsgUtils.cpp @@ +1883,5 @@ > { > mCharset = aCharset; > return NS_OK; > }; > + const char *GetDetectedCharset() { return mCharset; } This should have nsACString& as outparam instead of returning const char*. @@ +1888,4 @@ > > private: > virtual ~CharsetDetectionObserver() {} > + const char* mCharset; This should stay as nsCString. ::: mailnews/mime/src/comi18n.cpp @@ +47,5 @@ > + NS_DECL_ISUPPORTS > + CharsetDetectionObserver() {}; > + NS_IMETHOD Notify(const char* aCharset, nsDetectionConfident aConf) override > + { > + mCharset = aCharset; AssignASCII here.
Tweaked string processing in both source files as per Henri's comments. Applied all three comments everywhere. Magnus will be disappointed that his question wasn't answered. I guess MsgDetectCharsetFromFile() from part 1 has its justification, but MIME_detect_charset() appears a bit doubtful. It's only ever configured for Japanese, so as far as I can see, no Russian/Ukrainian/Cyrillic detection. And for Japanese, will it really distinguish - say - UTF-8, Shift-JIS from ISO-2022-JP? Best discussed in a new bug, I think.
Attachment #9007990 - Attachment is obsolete: true
Attachment #9008035 - Attachment is obsolete: true
Attachment #9008035 - Flags: review?(hsivonen)
Attachment #9008124 - Flags: review?(hsivonen)
Oops, forgot on AssignASCII(). Sorry about the noise. Interdiff with v3.
Attachment #9008124 - Attachment is obsolete: true
Attachment #9008124 - Flags: review?(hsivonen)
Attachment #9008125 - Flags: review?(hsivonen)
Comment on attachment 9008125 [details] [diff] [review] 1489949-replace-xpcom-nsICharsetDetector-part2.patch (v3c) Review of attachment 9008125 [details] [diff] [review]: ----------------------------------------------------------------- Thanks. ::: mailnews/mime/src/comi18n.h @@ +31,3 @@ > > #ifdef __cplusplus > } /* extern "C" */ I'm surprised it compiles without removing the C linkage. If it turns out you need to remove the C linkage, the r+ holds either way.
Attachment #9008125 - Flags: review?(hsivonen) → review+
Pushed by mozilla@jorgk.com: https://hg.mozilla.org/comm-central/rev/9dc7f65220a4 Backed out changeset 995fe17c5348 for being incorrect. a=backout https://hg.mozilla.org/comm-central/rev/4d4e022e6697 Port bug 1488659: Replace XPCOM use of nsICharsetDetector, part 2, take 2. r=hsivonen
Landed with two changes from the reviewed version: - if (NS_SUCCEEDED(res) && detectedCharset && *detectedCharset) { + if (NS_SUCCEEDED(res) && !detectedCharset.IsEmpty()) { PR_FREEIF(text->charset); - text->charset = strdup(detectedCharset); + text->charset = ToNewCString(detectedCharset); // [1] //update MsgWindow charset if we are instructed to do so - if (text->needUpdateMsgWinCharset && *text->charset) + if (text->needUpdateMsgWinCharset && text->charset) // [2] [1] Used to ToNewCString(detectedCharset) instead of strdup(detectedCharset.get()) [2] ToNewCString() can return nullptr, so we need to check. Checking *text->charset was wrong since a) it would have dereferenced null b) we checked before that the string wasn't empty, even in the original code.
Thanks. That's indeed better that strdup.
Type: enhancement → task
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Created:
Updated:
Size: