Closed
Bug 992118
Opened 11 years ago
Closed 11 years ago
spell checker has no default dictionary
Categories
(Core :: Spelling checker, defect)
Tracking
()
RESOLVED
FIXED
mozilla32
People
(Reporter: porcelain_mouse, Assigned: ehsan.akhgari)
References
Details
Attachments
(2 files)
(deleted), patch | smaug: review+
(deleted), patch | ehsan.akhgari: feedback-
User Agent: Mozilla/5.0 (X11; Linux x86_64; rv:28.0) Gecko/20100101 Firefox/28.0 (Beta/Release)
Build ID: 20140319073243
Steps to reproduce:
* create new profile
* go to web-page with free text boxes
* type misspelled word
Actual results:
* misspelled word is not detected/highlighted.
* Right-click menu shows "Check Spelling" is already selected.
* Right-Click menu > Languages > none of the listed dictionaries is selected (or, the wrong dictionary is selected, i.e. not one matching Language preference).
* When I select the dictionary I want, it works, but it is remembered only for the current URL, which is probably the correct behavior. (This makes sense, but doesn't help me.)
Expected results:
I would have expected the spell checker to choose the installed dictionary that matches my language preference or locale or environment (LANG variable). (Although, I notice that the installed dictionary is 'en_US' and the FF Language setting is 'en-US'.)
I tried to get support: https://support.mozilla.org/en-US/questions/992002
I also see people complaining about the opposite behavior, see bug 682564. But, then again, maybe that is the same bug. FF default is wrong, but their default is en-US, which is what I want, and my "default" is null. I would be happy if it reset to en-US on every new page; that's exactly what I want.
Updated•11 years ago
Component: Untriaged → Spelling checker
Product: Firefox → Core
Assignee
Comment 1•11 years ago
So what is the expected language in your environment (what is your environment by the way?) and how do you deliver that information to Firefox? I'm trying to understand what the "my language preference or locale or environment (LANG variable)" exactly means...
Assignee
Updated•11 years ago
Reporter
Comment 2•11 years ago
Hi Ehsan,
I'm really confused by your question. I see you are quite active, so I assume the confusion is mine because that phrase is very specific and clear in this context. I'll try to answer, but if my explanation sounds pedantic, you will have to excuse me.
By "my language preference" I'm referring to the FF language preference, i.e. Preference > Content > Languages > Choose... >. Here, I haven't changed anything; the default is correct: en-us (aka English/US).
By "environment" I mean the program environment (aka ENV) in which FF is running. You know, all the variables that FF inherits when it starts running.
One of the environment variables is the LANG variable, which tells all programs what language I want to use. This is actually part of my POSIX locale, which is what I mean by "locale". As far as I know, this is the way all internationalized programs work. FF inherits my locale, too, and I assumed it was using that to pick my language.
So, in short, the expected language is specified by my locale.
Perhaps all this is beside the point, since all of these things are set correctly, yet FF doesn't pick the right spell checking dictionary. It doesn't pick any dictionary, actually. But, I assume it is related because Languages and dictionaries are associated with each other by the locale identifier. I assume this is not a coincidence.
The bug you reference (bug 992944) is quite interesting. Perhaps that is my problem, but the symptom doesn't match, exactly. My spellchecker.dictionary pref is set and set correctly, yet it still doesn't work.
I hope this helps. I'm willing to troubleshoot, so please ask if you have more questions.
Reporter
Comment 3•11 years ago
Any thoughts?
Assignee
Comment 4•11 years ago
Sorry for the late response. :( Please use the needinfo? flag below the comment field and set its value to ":ehsan" without the quotes to get my attention.
(In reply to Porcelain Mouse from comment #2)
> Hi Ehsan,
>
> I'm really confused by your question. I see you are quite active, so I
> assume the confusion is mine because that phrase is very specific and clear
> in this context. I'll try to answer, but if my explanation sounds
> pedantic, you will have to excuse me.
No worries! I'm basically trying to get a picture of what your system looks like to see if I can figure out what's going wrong. The code which decides which language to use is quite complicated: <http://mxr.mozilla.org/mozilla-central/source/editor/composer/src/nsEditorSpellCheck.cpp#713> which is why I have to ask these questions. Sorry if they sound stupid! :-)
> By "my language preference" I'm referring to the FF language preference,
> i.e. Preference > Content > Languages > Choose... >. Here, I haven't
> changed anything; the default is correct: en-us (aka English/US).
That preference determines what language we send to the server as part of the content negotiation (see the Accept-Language HTTP header). It doesn't affect spell checking in any way.
> By "environment" I mean the program environment (aka ENV) in which FF is
> running. You know, all the variables that FF inherits when it starts
> running.
We do look at the LANG environment variable, but only if we have no preferred spell checking dictionary, and its value needs to match the language of your dictionary exactly. IOW, en-US and en_US won't be considered to be the same thing here.
Speaking of this, what are the spellchecker.dictionary pref and the LANG environment variable set to respectively? Note that you originally mentioned that you reproduce this on an empty profile, which would make me believe that spellchecker.dictionary must be empty, but your comment here contradicts that...
> One of the environment variables is the LANG variable, which tells all
> programs what language I want to use. This is actually part of my POSIX
> locale, which is what I mean by "locale". As far as I know, this is the way
> all internationalized programs work. FF inherits my locale, too, and I
> assumed it was using that to pick my language.
Right, yeah we do have code to look at this environment variable.
> So, in short, the expected language is specified by my locale.
>
> Perhaps all this is beside the point, since all of these things are set
> correctly, yet FF doesn't pick the right spell checking dictionary. It
> doesn't pick any dictionary, actually. But, I assume it is related because
> Languages and dictionaries are associated with each other by the locale
> identifier. I assume this is not coincidence.
Which website are you using for your testing BTW? Websites can also specify a language for each text field.
> The bug you reference (bug 992944) is quite interesting. Perhaps that is my
> problem, but the symptom doesn't match, exactly. My spellchecker.dictionary
> pref is set and set correctly, yet it still doesn't work.
Are you on Fedora? (Note that I don't actually know if you are indeed hitting bug 992944...)
> I hope this helps. I'm willing to troubleshoot, so please ask if you have
> more questions.
Thanks, I may give you test builds and whatnot but let's first see if I can just guess where the bug is. :-)
Reporter
Comment 5•11 years ago
(In reply to :Ehsan Akhgari (lagging on bugmail, needinfo? me!) from comment #4)
I don't mean to bug you about it. I'm needinfo?-ing you now, but there's no rush.
> No worries! I'm basically trying to get a picture of what your system looks
> like to see if I can figure out what's going wrong. The code which decides
> which language to use is quite complicated:
> <http://mxr.mozilla.org/mozilla-central/source/editor/composer/src/
> nsEditorSpellCheck.cpp#713> which is why I have to ask these questions.
> Sorry if they sound stupid! :-)
On the contrary, the confusion was in fact mine; that's clear now. I just didn't pick up on what you were driving at.
> That preference determines what language we send to the server as part of
> the content negotiation (see the Accept-Language HTTP header). It doesn't
> affect spell checking in any way.
Ah ha, okay; I didn't realize that.
> We do look at the LANG environment variable, but only if we have no
> preferred spell checking dictionary, but it needs to match exactly with the
> language of your dictionary. IOW, en-US and en_US won't be considered to be
> the same thing here.
I read that in another bug. I just don't understand where the incorrect form could be coming from.
> Speaking of this, what are the spellchecker.dictionary pref and the LANG
> environment variable set to respectively? Note that you originally
> mentioned that you reproduce this on an empty profile, which would make me
> believe that spellchecker.dictionary must be empty, but your comment here
> contradicts that...
LANG=en_US.UTF-8
and
spellchecker.dictionary;en_US
I see that spellchecker.dictionary is set to a non-default value, but I don't remember setting it. I am confident I didn't set it manually in previous profiles as I just found it recently. Could it have been set the first time I used the contextual menu Languages option? I did that for this bug page, only; just to report the bug. I've been careful not to set it for any other pages so that I don't run out of convenient test pages.
Of course, neither 'en_US' nor the default, null, work. So, picking the dictionary from my LANG env is also not working for some reason.
> Which website are you using for your testing BTW? Websites can also specify
> a language for each text field.
I've mainly been using the RedHat bugzilla:
https://bugzilla.redhat.com/
which specifies <html lang="en">
and another internal page that you wouldn't be able to get, which specifies <html xmlns="http://www.w3.org/1999/xhtml" xml:lang="en-us" lang="en-us">
Could I just be unfortunate to hit a lot of pages which specify the wrong language? Is there a test page I could try that should work for me? ....Hmm, I just copied the RedHat bugzilla page and removed the 'lang=en' and now that local copy works fine. Geez. So, now I'm really interested in this dash instead of an underscore. From where does that come!?
> > The bug you reference (bug 992944) is quite interesting. Perhaps that is my
> > problem, but the symptom doesn't match, exactly. My spellchecker.dictionary
> > pref is set and set correctly, yet it still doesn't work.
>
> Are you on Fedora? (Note that I don't actually know if you are indeed
> hitting bug 992944...)
Yes. Well, now I see the connection to that bug. So, to what is spellchecker.dictionary supposed to be set on Linux if that is to override these wrong page settings? I guess that is all I need to know. Obviously, the LANG patch would be nice, but in the meantime, sounds like what I want is the secret value for this setting. It can't be en_US, though.
> Thanks, I may give you test builds and whatnot but let's first see if I can
> just guess where the bug is. :-)
Sounds good.
Flags: needinfo?(ehsan)
Assignee
Comment 6•11 years ago
(In reply to Porcelain Mouse from comment #5)
> > Speaking of this, what are the spellchecker.dictionary pref and the LANG
> > environment variable set to respectively? Note that you originally
> > mentioned that you reproduce this on an empty profile, which would make me
> > believe that spellchecker.dictionary must be empty, but your comment here
> > contradicts that...
>
> LANG=en_US.UTF-8
> and
> spellchecker.dictionary;en_US
>
> I see that spellchecker.dictionary is set to a non-default value, but I
> don't remember setting it. I am confident I didn't set it manually in
> previous profiles as I just found it recently. Could it have been set the
> first time I used the contextual menu Languages option? I did that for this
> bug page, only; just to report the bug. I've been careful not to set it for
> any other pages so that I don't run out of convenient test pages.
>
> Of course, neither 'en_US' nor the default, null, work. So, picking the
> dictionary from my LANG env is also not working for some reason.
Hmm, yeah this _could_ be a problem. When parsing the LANG environment variable we skip the encoding part (the part past the dot) but we don't attempt to normalize the previous part, so we'll read "en_US" out of that and don't end up matching that with "en-US". Not sure how your pref is set to "en_US". Can you try changing it to "en-US" and see if that fixes things?
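For illustration, the LANG handling described in the paragraph above can be sketched roughly as follows. This is a standalone sketch using std::string rather than Gecko's string classes; StripCharset and NormalizeSeparators are hypothetical names, and the normalization step is the part described as missing, not something the code is known to do.

```cpp
#include <algorithm>
#include <cassert>
#include <string>

// Hypothetical helper: drop the charset suffix, "en_US.UTF-8" -> "en_US".
std::string StripCharset(const std::string& lang) {
  const std::string::size_type dot = lang.find('.');
  return dot == std::string::npos ? lang : lang.substr(0, dot);
}

// Hypothetical helper: use '-' as the only separator, "en_US" -> "en-US".
std::string NormalizeSeparators(std::string lang) {
  std::replace(lang.begin(), lang.end(), '_', '-');
  return lang;
}

// Walks through the LANG value reported in this bug.
void ExampleFromThisBug() {
  const std::string stripped = StripCharset("en_US.UTF-8");
  assert(stripped == "en_US");  // the part before the dot
  assert(stripped != "en-US");  // so an exact comparison with "en-US" fails
  assert(NormalizeSeparators(stripped) == "en-US");  // with normalization
}
```

Under these assumptions, the value taken from LANG only compares equal to "en-US" once the separator is normalized.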
> > Which website are you using for your testing BTW? Websites can also specify
> > a language for each text field.
>
> I've mainly been using the RedHat bugzilla:
>
> https://bugzilla.redhat.com/
>
> which specifies <html lang="en">
>
> and another internal page that you wouldn't be able to get, which specifies
> <html xmlns="http://www.w3.org/1999/xhtml" xml:lang="en-us" lang="en-us">
Try <http://people.mozilla.org/~eakhgari/992118.html>? That is a test page without any lang specifications which I just created.
> Could I just be unfortunate to hit a lot of pages which specify the wrong
> language? Is there a test page I could try that should work for me?
> ....Hmm, I just copied the RedHat bugzilla page and removed the 'lang=en'
> and now that local copy works fine. Geez. So, now I'm really interested in
> this dash instead of an underscore. From where does that come!?
I can't tell you where those underscores come from. :-)
What dictionaries do you have installed? They should appear under Tools -> Add-ons. Can you also ls /usr/share/myspell and /usr/lib64/firefox/dictionaries?
> > > The bug you reference (bug 992944) is quite interesting. Perhaps that is my
> > > problem, but the symptom doesn't match, exactly. My spellchecker.dictionary
> > > pref is set and set correctly, yet it still doesn't work.
> >
> > Are you on Fedora? (Note that I don't actually know if you are indeed
> > hitting bug 992944...)
>
> Yes. Well, now I see the connection to that bug. So, to what is
> spellchecker.dictionary supposed to be set on Linux if that is to override
> these wrong page settings?
The spellchecker.dictionary pref is the fallback, not an override, so we'll prefer what websites tell us over that.
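As a rough sketch of that ordering, and only of that ordering: the page's language wins, the pref is consulted when the page says nothing, and LANG is consulted when there is no pref either. The struct and function names below are hypothetical, and the sketch deliberately ignores everything else the real code in nsEditorSpellCheck.cpp does, such as choices remembered per URL.

```cpp
#include <cassert>
#include <string>

// Hypothetical bundle of the three inputs discussed in this bug.
struct SpellCheckInputs {
  std::string pageLang;   // lang attribute of the page or text field, may be empty
  std::string prefValue;  // value of spellchecker.dictionary, may be empty
  std::string envLang;    // language read from LANG, may be empty
};

// Returns the language that would be requested, in order of preference.
std::string PickRequestedLanguage(const SpellCheckInputs& in) {
  if (!in.pageLang.empty()) return in.pageLang;    // the website's choice wins
  if (!in.prefValue.empty()) return in.prefValue;  // the pref is the fallback
  return in.envLang;                               // LANG only without a pref
}

void ExampleOrdering() {
  // A page with lang="en" is not overridden by the pref.
  assert(PickRequestedLanguage({"en", "en_US", "en_US"}) == "en");
  // A page without a lang attribute falls back to the pref.
  assert(PickRequestedLanguage({"", "en_US", "de_DE"}) == "en_US");
  // With neither, the value from LANG is used.
  assert(PickRequestedLanguage({"", "", "en_US"}) == "en_US");
}
```

This is why changing the pref has no effect on pages that declare a language: the pref is never reached for them.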
> I guess that is all I need to know. Obviously,
> the LANG patch would be nice, but in the meantime, sounds like what I want
> is the secret value for this setting. It can't be en_US, though.
What I would ultimately like you to test is to find out what one of your dictionaries is called, set that pref to that name, and try that people.mozilla.org link again and see if it fixes your issue. If it does, then there is a good chance that this is a dupe of bug 992944.
Flags: needinfo?(ehsan)
Assignee
Comment 7•11 years ago
Also, can you please take a look at https://bugzilla.redhat.com/show_bug.cgi?id=439598 and see if your experience matches others' in that bug?
Reporter
Comment 8•11 years ago
> Hmm, yeah this _could_ be a problem. When parsing the LANG environment
> variable we skip the encoding part (the part past the dot) but we don't
> attempt to normalize the previous part, so we'll read "en_US" out of that
> and don't end up matching that with "en-US". Not sure how your pref is set
> to "en_US". Can you try changing it to "en-US" and see if that fixes things?
So, no, that doesn't help when a page specifies "en-US". When a page doesn't specify, everything works fine, regardless of how I have it configured.
> Try <http://people.mozilla.org/~eakhgari/992118.html>? That is a test page
> without any lang specifications which I just created.
Yeah, that seems to work for me.
> I can't tell you where those underscores come from. :-)
Ah ha! Now we see the violence inherent in the system! ;-) As far as I can tell, the underscore is the traditional POSIX separator. It's been underscore on my system as long as I can remember, which is 15+ years.
> What dictionaries do you have installed? They should appear under Tools ->
> Add-ons. Can you also ls /usr/share/myspell and
> /usr/lib64/firefox/dictionaries?
Yeah, I looked that up for the support forum when I reported this problem there. Here are my files:
/usr/share/myspell/en_US.aff
/usr/share/myspell/en_US.dic
and
/usr/lib64/firefox/dictionaries/en_US.aff
/usr/lib64/firefox/dictionaries/en_US.dic
> The spellchecker.dictionary pref is the fallback, not an override, so we'll
> prefer what websites tell us over that.
Oh, I see..
> What I would ultimately like you to test is to find out what one of your
> dictionaries is called, set that pref to that name, and try that
> people.mozilla.org link again and see if it fixes your issue. If it does,
> then there is a good chance that this is a dupe of bug 992944.
Well, that page you made works in all cases for me. If I have spellchecker.dictionary set to "en_US", which is my installed dictionary, it works. If I have no spellchecker.dictionary setting at all, that page also works. It works if I have spellchecker.dictionary set to "en-US", too, which isn't a dictionary I have installed. The problem is when the page specifies "en" or "en-US"; that doesn't work regardless of how I have FF configured.
From what I see on the first couple of links I found searching, it sounds like the POSIX LANG variable is a composite field. That is, 'en-US' doesn't mean 'en-US', it means language 'en' and country code 'US'. So, why doesn't the library find a matching, installed dictionary, which I have, when 'en-US' or 'en' is specified? Isn't there a library call that will canonicalize the name for you? I'm guessing this separator business is handled internal to some library.
And, it should work on 'en' in any case since that doesn't include any separator business. Hmm, yeah, I think the delimiter is a red herring; shouldn't 'en' have matched since I have one English dictionary installed?
Assignee
Comment 9•11 years ago
(In reply to Porcelain Mouse from comment #8)
> > Hmm, yeah this _could_ be a problem. When parsing the LANG environment
> > variable we skip the encoding part (the part past the dot) but we don't
> > attempt to normalize the previous part, so we'll read "en_US" out of that
> > and don't end up matching that with "en-US". Not sure how your pref is set
> > to "en_US". Can you try changing it to "en-US" and see if that fixes things?
>
> So, no, that doesn't help when a page specifies "en-US". When a page
> doesn't specify, everything works fine, regardless of how I have it
> configured.
That's good to know.
> > I can't tell you where those underscores come from. :-)
>
> Ah ha! Now we see the violence inherent in the system! ;-) As far as I can
> tell, the underscore is the traditional POSIX separator. It's been
> underscore on my system as long as I can remember, which is 15+ years.
I'm not trying to pick sides here. ;-) I'm just saying that Gecko has never respected these underscores, and it has always used a dash here. Actually, I think that's a bug; we should accept underscores the same way as dashes.
Actually looking at the code I do know where the underscore in your pref comes from. What happens is that we look at LANG, and then try to load a dictionary with that name, which will succeed the first time, and then we stick the name of the loaded dictionary in the pref. What that breaks is partial language name matching if a website later specifies "en" or "en-US" as the language of their textfield, which seems to match exactly what you're seeing.
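To make the failure concrete, here is a minimal sketch of partial language matching on a '-' separator. DashMatch is a hypothetical stand-in written for this illustration, not the Gecko function, and unlike real language-tag matching it is case-sensitive.

```cpp
#include <cassert>
#include <string>

// True when `value` equals `wanted`, or starts with `wanted` followed by '-'.
bool DashMatch(const std::string& value, const std::string& wanted) {
  if (value.size() < wanted.size()) return false;
  if (value.compare(0, wanted.size(), wanted) != 0) return false;
  return value.size() == wanted.size() || value[wanted.size()] == '-';
}

void ExamplePartialMatch() {
  // A page that asks for "en" matches a pref that was stored with a dash...
  assert(DashMatch("en-US", "en"));
  // ...but not the "en_US" that ended up in the pref here.
  assert(!DashMatch("en_US", "en"));
  // An unrelated name that merely starts with the same letters never matches.
  assert(!DashMatch("english", "en"));
}
```

With "en_US" stored in the pref, neither "en" nor "en-US" from a website can match it under this kind of comparison, which is consistent with the symptoms reported above.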
So I guess that concludes our investigation! I'll prepare a patch and a test build for you shortly.
Assignee
Updated•11 years ago
Assignee: nobody → ehsan
Assignee
Comment 10•11 years ago
Assignee
Updated•11 years ago
Attachment #8414096 - Flags: review?(bugs)
Assignee
Comment 11•11 years ago
Can you please download the build here <http://ftp.mozilla.org/pub/mozilla.org/firefox/try-builds/eakhgari@mozilla.com-9a2d82f97e88/try-linux/> and see if it fixes the bug for you? I'd really appreciate it if you could try both with a new profile and with one of the old profiles that is affected by this issue. I added code to handle both new LANG environment variables and prefs with the underscore in them, to take care of both situations.
Thanks!
Assignee
Comment 12•11 years ago
Jan, please see this bug too, this is reported by a Fedora user, and it seems like the issue here is that we fail to properly handle the underscores in the dictionary names.
Reporter
Comment 13•11 years ago
(In reply to :Ehsan Akhgari (lagging on bugmail, needinfo? me!) from comment #11)
> Can you please download the build here
> <http://ftp.mozilla.org/pub/mozilla.org/firefox/try-builds/eakhgari@mozilla.
> com-9a2d82f97e88/try-linux/> and see if it fixes the bug for you?
Yes, I will gladly test it! Thanks so much for banging out this patch so fast. Wow! Please give me a day or two to test all cases you requested; then I'll get right back to you.
Again, many thanks. More soon.
Updated•11 years ago
Attachment #8414096 - Flags: review?(bugs) → review+
Assignee
Updated•11 years ago
Status: UNCONFIRMED → ASSIGNED
Ever confirmed: true
Flags: needinfo?(porcelain_mouse)
Reporter
Comment 14•11 years ago
I'm not sure what I'm supposed to download. I see source for 32.0a1, is that it? I don't see any builds there for me; I have Fedora x86_64.
Flags: needinfo?(porcelain_mouse)
Reporter
Comment 15•11 years ago
Oh, sorry, it has a compiled version. let me try...
Assignee
Comment 16•11 years ago
Try: <http://ftp.mozilla.org/pub/mozilla.org/firefox/try-builds/eakhgari@mozilla.com-9a2d82f97e88/try-linux64/firefox-32.0a1.en-US.linux-x86_64.tar.bz2>
It's not an rpm package, just a tarball which you can extract anywhere on your homedir. There should be an executable in there called firefox.
Reporter
Comment 17•11 years ago
Okay, I tried three things:
New profile, with your test page and a page that specifies lang=en and a page that specifies lang=en-US: all work fine
Old Profile, with your test page and a page that specifies lang=en: all work fine
Old Profile, with spellchecker.dictionary=en_US and your test page and a page that specifies lang=en: all work fine
So, that looks perfect from my perspective.
Assignee
Comment 18•11 years ago
Thanks a lot for testing, Porcelain Mouse!
http://hg.mozilla.org/integration/mozilla-inbound/rev/d411b8472391
Comment 19•11 years ago
Status: ASSIGNED → RESOLVED
Closed: 11 years ago
Resolution: --- → FIXED
Target Milestone: --- → mozilla32
Comment 20•10 years ago
Well, this fix doesn't work for me because your test package uses internal dictionaries, not system ones. Fedora links the 'dictionaries' directory to '/usr/share/myspell', which contains files like en_GB.dic, en_CA.dic, en_US.dic and the matching .aff files. And here comes the problem with the underscore. It fails in nsStyleUtil::DashMatchCompare (here: http://mxr.mozilla.org/mozilla-central/source/editor/composer/src/nsEditorSpellCheck.cpp#829 ) because '-' is expected, and with '_' it never succeeds.
So if you have a German-localized browser, like 'de', the 'de_DE' dictionary is ignored.
Comment 21•10 years ago
Attaching a patch which tries to find a dictionary with '_' and uses only '-' as the separator in the dictionary list.
This actually fixes the issues in Fedora. Please take a look, I just started a try run with the patch: https://tbpl.mozilla.org/?tree=Try&rev=235031248590
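The idea, as far as it can be read from the description above, can be sketched like this. It is a standalone illustration with std::map and std::string, not the patch itself: DictionaryMap, FindDictionary, and ListDictionaries are hypothetical names, and the file paths are examples in the Fedora layout mentioned in this bug.

```cpp
#include <algorithm>
#include <cassert>
#include <map>
#include <string>
#include <vector>

// Hypothetical registry: dictionary name (as found on disk) -> .aff path.
using DictionaryMap = std::map<std::string, std::string>;

// Look the name up as given; if that fails, retry with '-' replaced by '_'.
const std::string* FindDictionary(const DictionaryMap& dicts, std::string name) {
  auto it = dicts.find(name);
  if (it == dicts.end()) {
    std::replace(name.begin(), name.end(), '-', '_');
    it = dicts.find(name);
  }
  return it == dicts.end() ? nullptr : &it->second;
}

// Report dictionary names with '-' as the only separator.
std::vector<std::string> ListDictionaries(const DictionaryMap& dicts) {
  std::vector<std::string> names;
  for (const auto& entry : dicts) {
    std::string name = entry.first;
    std::replace(name.begin(), name.end(), '_', '-');
    names.push_back(name);
  }
  return names;
}

void ExampleWithFedoraNames() {
  const DictionaryMap dicts = {
      {"en_GB", "/usr/share/myspell/en_GB.aff"},
      {"en_US", "/usr/share/myspell/en_US.aff"},
  };
  const std::string* path = FindDictionary(dicts, "en-US");
  assert(path != nullptr && *path == "/usr/share/myspell/en_US.aff");
  assert(FindDictionary(dicts, "fr-FR") == nullptr);
  const std::vector<std::string> expected = {"en-GB", "en-US"};
  assert(ListDictionaries(dicts) == expected);
}
```

Listing names with dashes lets the existing dash-based matching work unchanged, while the lookup fallback maps the dashed name back to the file that is actually on disk.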
Attachment #8435685 - Flags: feedback?(ehsan)
Assignee
Comment 22•10 years ago
Comment on attachment 8435685 [details] [diff] [review]
Support dictionaries with underscore
Review of attachment 8435685 [details] [diff] [review]:
-----------------------------------------------------------------
Looks mostly good, minusing because of the nsEditorSpellCheck change.
::: editor/composer/src/nsEditorSpellCheck.cpp
@@ +848,5 @@
> nsString lang = NS_ConvertUTF8toUTF16(env_lang);
> // Strip trailing charset if there is any
> int32_t dot_pos = lang.FindChar('.');
> if (dot_pos != -1) {
> + lang = Substring(lang, 0, dot_pos);
This change seems unrelated to the problem you're trying to fix, and is wrong. This breaks dictionaries with names such as "en-US.utf8" right?
::: extensions/spellcheck/hunspell/src/mozHunspell.cpp
@@ +171,5 @@
> }
>
> nsIFile* affFile = mDictionaries.GetWeak(nsDependentString(aDictionary));
> + if (!affFile) {
> + nsString replacedStr(aDictionary);
Please add a comment saying something like this:
"Support loading Fedora system dictionaries which use names such as en_US.aff as opposed to en-US.aff"
@@ +309,5 @@
> static PLDHashOperator
> AppendNewString(const nsAString& aString, nsIFile* aFile, void* aClosure)
> {
> AppendNewStruct *ans = (AppendNewStruct*) aClosure;
> + nsString replacedStr(aString);
Please add a comment saying "Restore the dictionary name on Fedora (see SetDictionary)"
Attachment #8435685 - Flags: feedback?(ehsan) → feedback-
Reporter
Comment 23•10 years ago
Hi Ehsan,
Hey, I can't remember why I thought your patch would make it into FF31, but 31 just came to Fedora and I downloaded it with great anticipation. Did it get incorporated for this release?
Well, FF31 works on my test page with lang=en and lang=en_US. That's an improvement and I think it's working on pages that it didn't before, like this one, e.g.
But, it doesn't work on pages with lang=en-US. From what I see in your patch, it looks like it is trying to make that work, though I could be wrong. I'm afraid we might not have tested this case. :-(
Thank you so much for your help on this. Just wanted to make sure what got implemented was what you intended. Let me know if you have a moment to check up on this with me.
Flags: needinfo?(ehsan)
Assignee
Comment 24•10 years ago
No, sorry, this landed for Firefox 32, which will come out in less than 6 weeks now. So I wouldn't expect any changes to Firefox 31 based on my patch here.
Flags: needinfo?(ehsan)
Reporter
Comment 25•10 years ago
Oh dear...I have FF32 now and my test page (which is really the test page you created, I just copied it) with lang=en-US still doesn't work. Let me know if you can look into this with me, again.
Flags: needinfo?(ehsan.akhgari)
Assignee
Comment 26•10 years ago
(In reply to Porcelain Mouse from comment #25)
> Oh dear...I have FF32 now and my test page (which is really the test page
> you created, I just copied it) with lang=en-US still doesn't work. Let me
> know if you can look into this with me, again.
Can you please file another bug and include as much information as you can? I'm pretty sure we fixed this bug, but there might be others to fix.
Flags: needinfo?(ehsan.akhgari)
Assignee
Updated•10 years ago