Spellcheck abilities (at least for the Hungarian language) have decreased between Firefox 95 and 96 beta
Categories
(Core :: Spelling checker, defect, P2)
Tracking
()
| | Tracking | Status |
|---|---|---|
| firefox-esr91 | --- | unaffected |
| firefox95 | --- | unaffected |
| firefox96 | + | verified |
| firefox97 | + | verified |
People
(Reporter: szalai.kalman, Assigned: emilio)
References
(Regression)
Details
(Keywords: regression)
Attachments
(3 files)
(deleted), image/png
(deleted), image/png
(deleted), text/x-phabricator-request | diannaS: approval-mozilla-release+
User Agent: Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:100.0) Gecko/20100101 Firefox/100.0
Steps to reproduce:
I copied a block of Hungarian text (from here: https://hu.wikipedia.org/wiki/Sz%C3%B6veg) into a text box.
"A szöveg megfelelője gyakorlatilag az összes európai nyelvben "Text" (különböző írásképekkel a nemzeti helyesírás miatt), ami a latin "textum" szóból ered, amely szó eredeti jelentése: szövet, szöveg. A magyarban a nyelvújítás idején a jelentést magyar szóval jelöltük. A szöveg egy összefüggő és a környezetétől jól elhatárolt vagy elhatárolható megnyilvánulás, kijelentés írott vagy tágabb értelemben nem írott de (le)írható nyelven. A nem feltétlenül írott, de leírható szövegre példa a dalszöveg, egy film szövege vagy improvizált színházi szöveg."
Actual results:
Copying works, but the spellchecker does not. In Firefox 95 the spellchecker flags only two words, "Text" and "textum", which is correct. Firefox 96 b10 flags more than 20 words as misspelled, which is incorrect.
The problem seems to be related to words that contain affixes (suffixes or prefixes).
Are other agglutinative languages affected by this?
Expected results:
The spellchecker's word recognition got worse between Firefox 95 and 96 beta (the same is true for Nightly builds).
I expect the same correct spellchecking that Firefox 95 provided.
Reporter
Comment 1•3 years ago
Reporter
Comment 2•3 years ago
:emilio, could you check this bug?
Comment 3•3 years ago
The Bugbug bot thinks this bug should belong to the 'Core::Spelling checker' component, and is moving the bug to that component. Please revert this change in case you think the bot is wrong.
Comment 4•3 years ago
I can repro this:
- Install Hungarian dictionary: https://addons.mozilla.org/en-US/firefox/addon/hungarian-dictionary/
- Copy-paste this to URL bar:
data:text/html;charset=utf-8,<div contenteditable spellcheck="true" lang=hu>A sz%C3%B6veg megfelel%C5%91je gyakorlatilag az %C3%B6sszes eur%C3%B3pai nyelvben "Text" (k%C3%BCl%C3%B6nb%C3%B6z%C5%91 %C3%ADr%C3%A1sk%C3%A9pekkel a nemzeti helyes%C3%ADr%C3%A1s miatt), ami a latin "textum" sz%C3%B3b%C3%B3l ered, amely sz%C3%B3 eredeti jelent%C3%A9se: sz%C3%B6vet, sz%C3%B6veg. A magyarban a nyelv%C3%BAj%C3%ADt%C3%A1s idej%C3%A9n a jelent%C3%A9st magyar sz%C3%B3val jel%C3%B6lt%C3%BCk. A sz%C3%B6veg egy %C3%B6sszef%C3%BCgg%C5%91 %C3%A9s a k%C3%B6rnyezet%C3%A9t%C5%91l j%C3%B3l elhat%C3%A1rolt vagy elhat%C3%A1rolhat%C3%B3 megnyilv%C3%A1nul%C3%A1s, kijelent%C3%A9s %C3%ADrott vagy t%C3%A1gabb %C3%A9rtelemben nem %C3%ADrott de (le)%C3%ADrhat%C3%B3 nyelven. A nem felt%C3%A9tlen%C3%BCl %C3%ADrott, de le%C3%ADrhat%C3%B3 sz%C3%B6vegre p%C3%A9lda a dalsz%C3%B6veg, egy film sz%C3%B6vege vagy improviz%C3%A1lt sz%C3%ADnh%C3%A1zi sz%C3%B6veg.
- Click somewhere within the text
Firefox 95: Only shows two errors "Text" and "textum" which are correct, as the reporter said.
Firefox 97: Errors nearly everywhere.
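For anyone who wants to build a similar test page for another language, the data: URL above can be produced mechanically. The following is only a sketch: `PercentEncode` and `BuildSpellcheckTestUrl` are hypothetical helper names, and the choice to escape only non-ASCII bytes plus `%` and `#` is an assumption made to match the style of the URL above.

```cpp
#include <string>

// Percent-encodes every non-ASCII byte of a UTF-8 string, plus '%' and '#',
// which would otherwise be misread inside a URL. Other ASCII characters
// (spaces, quotes, parentheses) are left as-is, as in the URL above.
std::string PercentEncode(const std::string& utf8) {
  static const char kHex[] = "0123456789ABCDEF";
  std::string out;
  for (unsigned char c : utf8) {
    if (c >= 0x80 || c == '%' || c == '#') {
      out.push_back('%');
      out.push_back(kHex[c >> 4]);
      out.push_back(kHex[c & 0x0F]);
    } else {
      out.push_back(static_cast<char>(c));
    }
  }
  return out;
}

// Hypothetical helper: wraps the text in the same contenteditable <div>
// used in the steps above, with the given language tag.
std::string BuildSpellcheckTestUrl(const std::string& lang,
                                   const std::string& utf8_text) {
  return "data:text/html;charset=utf-8,"
         "<div contenteditable spellcheck=\"true\" lang=" +
         lang + ">" + PercentEncode(utf8_text);
}
```

Calling `BuildSpellcheckTestUrl("hu", text)` with the Hungarian paragraph as UTF-8 should give a URL of the same shape as the one above; the matching dictionary still has to be installed first.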
Jari, I think you are working on spell checker, could you take a look?
:emilio , could you check this bug?
Also adding NI to Emilio as the reporter requested.
Assignee
Comment 5•3 years ago
[Tracking Requested - why for this release]: Spell-checking in some languages regressed to the point of being useless.
I don't know why I was ni?d here but I can repro and mozregression says:
Bobby can you take a look? I can try too in case you're swamped.
Assignee
Comment 6•3 years ago
Yeah, we're hitting https://searchfox.org/mozilla-central/rev/996a2cafe472e9934b8cb91db63050f96d8a59cb/extensions/spellcheck/hunspell/src/hashmgr.cxx#1403 on a debug build.
Assignee
Comment 7•3 years ago
Some dictionaries might use more memory for some words than what we were
allowing.
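To illustrate the failure mode described here, the following is a minimal sketch of a chunked bump ("arena") allocator. It is not the Hunspell or Firefox code: `ChunkArena`, its members, and the sizes used are hypothetical, and alignment is ignored for brevity. With a fixed chunk size, a request larger than one chunk cannot be served, so the corresponding word entry would be lost; allowing a chunk to grow to the request size avoids that.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <memory>
#include <vector>

// Hypothetical bump allocator; names and sizes are illustrative assumptions,
// not taken from the Hunspell or Firefox sources. Alignment is ignored.
class ChunkArena {
 public:
  // allow_big_chunks == false models a fixed chunk size (oversized requests
  // fail); true models the "allow bigger chunks" idea.
  ChunkArena(std::size_t min_chunk_size, bool allow_big_chunks)
      : min_chunk_size_(min_chunk_size), allow_big_chunks_(allow_big_chunks) {}

  // Returns num_bytes of storage, or nullptr if the request cannot be served.
  void* Alloc(std::size_t num_bytes) {
    if (!allow_big_chunks_ && num_bytes > min_chunk_size_) {
      return nullptr;  // the caller would have to drop this entry
    }
    // Start a new chunk when there is none yet or the current one is too full.
    if (chunks_.empty() || chunk_size_ - offset_ < num_bytes) {
      chunk_size_ = std::max(min_chunk_size_, num_bytes);
      chunks_.push_back(std::make_unique<std::uint8_t[]>(chunk_size_));
      offset_ = 0;
    }
    assert(offset_ + num_bytes <= chunk_size_);
    void* ptr = chunks_.back().get() + offset_;
    offset_ += num_bytes;
    return ptr;
  }

  std::size_t chunk_count() const { return chunks_.size(); }

 private:
  std::size_t min_chunk_size_;
  bool allow_big_chunks_;
  std::size_t chunk_size_ = 0;  // size of the most recent chunk
  std::size_t offset_ = 0;      // bytes already used in that chunk
  std::vector<std::unique_ptr<std::uint8_t[]>> chunks_;
};
```

In this sketch, `ChunkArena(16, false).Alloc(64)` returns `nullptr`, while `ChunkArena(16, true).Alloc(64)` succeeds by creating a single 64-byte chunk. Small allocations still share chunks, so the fragmentation benefit of the arena is kept.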
Comment 8•3 years ago
I suspect this might be connected to bug 1737396
Unfortunately I haven't worked on the spell checker. Maybe Olli would know more?
Let's see if Emilio's patch works there too. Thank you Emilio!
Comment 10•3 years ago
Assignee
Comment 11•3 years ago
Comment on attachment 9257518 [details]
Bug 1748408 - Allow bigger chunks in hunspell. r=bholley
Beta/Release Uplift Approval Request
- User impact if declined: Some spell checkers would malfunction
- Is this code covered by automated tests?: Yes
- Has the fix been verified in Nightly?: No
- Needs manual test from QE?: Yes
- If yes, steps to reproduce: comment 4
- List of other uplifts needed: none
- Risk to taking this patch: Low
- Why is the change risky/not risky? (and alternatives if risky): Simple tweak to regressing bug to avoid going over the chunk size.
- String changes made/needed: none
Comment 12•3 years ago
Just a question regarding the high memory fragmentation of Hunspell dictionaries. Is it possible to fix the memory fragmentation and higher memory usage in upstream Hunspell, or did this cause problems only with sandboxing?
Comment 14•3 years ago
(In reply to Kami from comment #12)
Just a question regarding the high memory fragmentation of Hunspell dictionaries. Is it possible to fix the memory fragmentation and higher memory usage in upstream Hunspell, or did this cause problems only with sandboxing?
The issue is not specific to sandboxing, though the net amount of fragmentation will be allocator-dependent (wasi-libc uses dlmalloc, so that's where the measurements came from). This would be a reasonable change to take upstream if the maintainers are interested.
Comment 15•3 years ago
bugherder
Comment 16•3 years ago
Could we get this fix into Firefox 96 Branch too?
Comment 17•3 years ago
Reproduced the issue with Firefox 97.0a1 (20220104214425) on Windows 10 x64 using the STR from comment 4.
The issue is no longer reproducible with 97.0a1 (20220106034727) on Windows 10 x64, macOS 10.15 and Ubuntu 20.04. Only the words "Text" and "textum" are flagged as errors.
Comment 18•3 years ago
Comment on attachment 9257518 [details]
Bug 1748408 - Allow bigger chunks in hunspell. r=bholley
Approved for 96.0rc2
Comment 19•3 years ago
bugherder uplift
Comment 20•3 years ago
Verified fixed with Firefox 96.0 RC2 (20220106144528) on Windows 10 x64, macOS 10.15 and Ubuntu 20.04.
Reporter
Comment 21•3 years ago
Works well for me in Nightly.