Closed Bug 860550 Opened 12 years ago Closed 7 years ago

[keyboard] [meta] autocorrect - bad correction examples

Categories

(Firefox OS Graveyard :: Gaia::Keyboard, defect)

ARM
Gonk (Firefox OS)
defect
Not set
normal

Tracking

(Not tracked)

RESOLVED WONTFIX

People

(Reporter: djf, Unassigned)

References

Details

(Whiteboard: c=auto-correct)

Now that we're experimenting with enabling auto correction, this bug is for listing any examples anyone finds of incorrect or sub-optimal corrections, in any language.

Please list any bad corrections you've seen and (if it isn't obvious) say what you think the right correction should have been.
Blocks: 797170
David,

Gregor found a good example and I just verified my/our worst case assumptions with the XML file.
user input: alway
suggestions: Alwyn (73), always (59.2), away (47)

One might think that 'always' should be ranked highest. The problem is the averaging of word frequencies in compressed suffix nodes.

Let's have a look at the XML:
<w f="150" flags="">away</w>
<w f="145" flags="">always</w>
<w f="55" flags="">Alwyn</w>

The problem is that the dictionary contains so many low-ranked words with the suffix 'ways', e.g.: wireways (10), layaways (10), fadeaways (10), and many more. These lower the frequency/rank of 'alway' down to 37, whereas 'Alwy' is upgraded to 73.
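To illustrate the effect, here is a minimal sketch in Python. It assumes, hypothetically, that a compressed suffix node keeps a single frequency equal to the plain mean of every word routed through it. The real dictionary builder may weight things differently, so this does not reproduce the exact value 37. The helper name `shared_node_frequency` is invented for this example; the word frequencies are the ones quoted above.

```python
from statistics import mean

# Frequencies quoted in this comment. Grouping these four words into
# one shared suffix node is an assumption made for illustration only.
words_sharing_suffix = {
    "always": 145,
    "wireways": 10,
    "layaways": 10,
    "fadeaways": 10,
}


def shared_node_frequency(frequencies):
    """Hypothetical: one averaged frequency for the whole suffix node."""
    return mean(frequencies.values())


averaged = shared_node_frequency(words_sharing_suffix)
print(averaged)  # 43.75, far below the 145 that 'always' has on its own
```

Even three rare words are enough to pull a common word well below 'Alwyn' or 'away' under this assumption.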
(In reply to Christoph Kerschbaumer from comment #1)
> David,
> 
> Gregor found a good example and I just verified my/our worst case
> assumptions with the XML file.
> user input: alway
> suggestions: Alwyn (73), always (59.2), away (47)
> 
> one might think that 'always' should be ranked highest. The problem is
> averaging the frequency of nodes in compressed suffix nodes.
> 
> Let's have a look at the XML:
> <w f="150" flags="">away</w>
> <w f="145" flags="">always</w>
> <w f="55" flags="">Alwyn</w>
> 
> The problem is that so many low ranked words with the postfix 'ways' exist
> in the dictionary, e.g.: wireways (10), layaways (10), fadeaways (10), and
> many more, which lower the frequency/rank of 'alway' down to 37, whereas
> 'Alwy' is upgraded to 73.

Can we have buckets of rankings? Like different buckets for low, middle and highly ranked words?
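One way to read the bucket idea, sketched in Python under stated assumptions: words are only averaged together with other words in the same frequency bucket, so rare words cannot drag down a common one. The thresholds (100 and 40), the bucket names, and the function names are all hypothetical and not taken from the actual dictionary code.

```python
from collections import defaultdict
from statistics import mean


def bucket_of(frequency):
    """Hypothetical thresholds; the real cut-offs would need tuning."""
    if frequency >= 100:
        return "high"
    if frequency >= 40:
        return "middle"
    return "low"


def averaged_by_bucket(frequencies):
    """Average frequencies separately within each bucket."""
    groups = defaultdict(list)
    for frequency in frequencies.values():
        groups[bucket_of(frequency)].append(frequency)
    return {bucket: mean(values) for bucket, values in groups.items()}


result = averaged_by_bucket(
    {"always": 145, "wireways": 10, "layaways": 10, "fadeaways": 10}
)
print(result)  # {'high': 145, 'low': 10}
```

With this split, 'always' keeps its frequency of 145 instead of being averaged with the rare 'ways' words.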
I was hoping to use this bug as a simple dumping ground for examples of bad suggestions and auto-corrections. Since we know the underlying cause for the bad 'alway' suggestions, let's continue the discussion of that case in bug 860624.
Typed 'vome' when I wanted to type 'come'. Suggestions are home (blue), homes, hometown, vome.
I guess after 'home' we should suggest 'come', not 'homes' or 'hometown'.
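A possible way to favor 'come' here, as a sketch only: rank same-length candidates by a substitution cost that is cheaper when the typed key is a physical neighbor of the intended key ('v' sits next to 'c' on QWERTY). The neighbor map is deliberately partial, the costs 0.5 and 1.0 are arbitrary, and word frequency is ignored; none of this is taken from the actual prediction engine.

```python
# Partial QWERTY neighbor map; only the key needed for this example.
ADJACENT = {
    "v": {"c", "b", "f", "g"},
}


def substitution_cost(typed, candidate):
    """Per-letter substitution cost; None when the lengths differ."""
    if len(typed) != len(candidate):
        return None
    cost = 0.0
    for t, c in zip(typed, candidate):
        if t == c:
            continue
        # A slip onto a neighboring key is treated as a cheaper error.
        cost += 0.5 if c in ADJACENT.get(t, set()) else 1.0
    return cost


def rank(typed, candidates):
    """Same-length candidates first, cheapest substitution first."""
    def key(word):
        cost = substitution_cost(typed, word)
        if cost is None:
            return (True, 0.0, word)
        return (False, cost, word)
    return sorted(candidates, key=key)


print(rank("vome", ["home", "homes", "hometown", "come"]))
# ['come', 'home', 'homes', 'hometown']
```

A real ranking would combine this cost with dictionary frequency, which is presumably why 'home' currently wins.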
In pt-BR, "um" is not in the dictionary but the female form ("uma") is, so typing "um " autocorrects to "uma". This should be fixed, and "um" should have a higher priority than "uma".
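The priority rule asked for here could look like the following Python sketch: if the typed word is itself in the dictionary, keep it, and only otherwise fall back to the top suggestion. The function name and the way suggestions are passed in are assumptions for illustration, not the keyboard's actual API.

```python
def choose_autocorrection(typed, dictionary, suggestions):
    """Keep an exact dictionary match; otherwise take the top suggestion."""
    if typed in dictionary:
        return typed
    return suggestions[0] if suggestions else typed


# With "um" missing from the dictionary, the longer word wins.
print(choose_autocorrection("um", {"uma"}, ["uma"]))        # uma
# Once "um" is added, the exact match is left alone.
print(choose_autocorrection("um", {"um", "uma"}, ["uma"]))  # um
```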
Paper cuts!

Also in pt-BR, typing "nada," and then pressing space removes the comma (and other punctuation marks). When I enter the comma, the first suggestion ("nada") is selected, and the space triggers it.

"con" should probably autocorrect to "com".

"mel" autocorrects to "melhor", which kind of makes sense because the latter is more common, but it should probably autocorrect to "mel" after you type the 'l', as that's a valid word.

There's no way to put a period or comma after a word whose first suggestion is not the one you want to type, like "mel.":

1) Type "mel"
2) Choose "mel" in the suggestions list. A space is added.
3) Backspace and type '.'
4) The first suggestion, "melhor", is chosen instead.

"voce" autocorrects to "vice", should *definitely* be "você".

After typing "gati", "gato" is not in the suggestions list. It should be: it's a common one-letter typo when trying to type "cat" in Portuguese.
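For the "voce" case above, one common technique is to compare words with their diacritics stripped, so that unaccented input matches accented dictionary entries before any edit-distance candidates are considered. The Python sketch below uses the standard `unicodedata` module; the function names and the tiny word list are hypothetical, and this is not necessarily how the keyboard's engine handles accents.

```python
import unicodedata


def strip_diacritics(word):
    """Decompose to NFD and drop combining marks (accents, cedillas)."""
    decomposed = unicodedata.normalize("NFD", word)
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


def diacritic_matches(typed, dictionary):
    """Dictionary words equal to the input once accents are ignored."""
    key = strip_diacritics(typed).lower()
    return [w for w in dictionary if strip_diacritics(w).lower() == key]


words = ["vice", "voc\u00ea", "voz"]  # "voc\u00ea" is "você"
print(diacritic_matches("voce", words))
```

Candidates found this way could be ranked ahead of substitutions such as "vice".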

Scrolling in the suggestions list is impossible for me on the Unagi.
QA Contact: whsu
Adding whiteboard tags for tracking via scrumbu.gs.
Whiteboard: c=keyboard
Did you guys measure the memory usage here?  What's it look like?
jlebar: I did not measure. (I don't really know how other than adb shell b2gps). It shouldn't be too bad. The dictionary file for English is < 2 MB, and the algorithm doesn't hold on to many objects. It is breadth-first, so no memory is used up on a huge recursive stack.
> (I don't really know how other than adb shell b2gps).

From your b2g root directory, run tools/get_about_memory.py.

> It shouldn't be too bad.

Famous last words!
Whiteboard: c=keyboard → c=auto-correct
Blocks: 873934
Summary: [keyboard] Damn You Autocorrect → [keyboard] [meta] autocorrect - bad correction examples
OS: Mac OS X → Gonk (Firefox OS)
Hardware: x86 → ARM
For input "ch" I sometimes get "cha" and sometimes "VH". I don't know if this is a bug or just two suggestions with exactly the same weight being sorted in different orders at different times. 

If I start a new message in the message app and type "ch", it suggests cha. Then, if I lock the screen, unlock it, and tap in the field again, the suggestion changes to VH. If I discard the message and start a new one, I get cha again.

I'm not sure what VH is an acronym for, but in the dictionary it is almost as frequent as the word cha, and since it matches the length of the input it is a pretty good auto-correction. So I'm guessing that these are just two corrections with exactly the same weight and some non-determinism in the sort order or something.

I'm not really sure this needs fixing. But I suppose I could break ties somehow in the sort comparator.
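Breaking ties in the comparator could be as simple as adding the word itself as a secondary sort key, so that equal weights always come out in the same order. A Python sketch, with made-up weights and a hypothetical `(word, weight)` tuple shape:

```python
def sort_suggestions(suggestions):
    """Sort by weight (highest first), then alphabetically to break ties.

    The case-insensitive key comes before the raw word so that the
    order never depends on the order the candidates arrived in.
    """
    return sorted(suggestions, key=lambda s: (-s[1], s[0].lower(), s[0]))


# The weights here are invented; only their equality matters.
first = sort_suggestions([("cha", 10.0), ("VH", 10.0)])
second = sort_suggestions([("VH", 10.0), ("cha", 10.0)])
print(first == second)  # True
```

Which of the two words wins is arbitrary under this rule; the point is only that the result is stable across runs.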
If a word can be either capitalized or not (words like west that are also valid last names) and you type it at the beginning of a sentence, then both forms are offered as suggestions, but because we capitalize them to match the input, we end up with two suggestions that are identical.

We should not do that.  I don't think it is worth requesting an additional suggestion from the prediction engine to handle this case, but we shouldn't display duplicates, I think.
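Dropping the duplicates after capitalization, without asking the engine for a replacement suggestion, could be sketched like this in Python. The function name and the list-of-strings input are assumptions for illustration.

```python
def capitalize_and_dedupe(suggestions):
    """Capitalize each suggestion, keeping only the first of any duplicates."""
    seen = set()
    result = []
    for word in suggestions:
        shown = word[:1].upper() + word[1:]
        if shown not in seen:
            seen.add(shown)
            result.append(shown)
    return result


print(capitalize_and_dedupe(["west", "West", "western"]))
# ['West', 'Western']
```

The list simply ends up one entry shorter, which matches the trade-off described above.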
No longer blocks: 797170
The bugs described in comment 11 and comment 12 have been fixed.

The issue in comment 4 does not reproduce anymore.

I have not investigated the Portuguese corrections in comments 5 & 6.

I worry that this (from comment 6) is a bug, however: 

  There's no way to put a period or comma after a word whose first suggestion is not the one you want to type, like "mel.":

We should probably follow Android here and not insert a space with the suggestion, but wait and add it if the next key is a letter.
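The deferred-space behavior described here can be sketched as a small state machine: accepting a suggestion sets a "pending space" flag instead of inserting the space, and the next key decides whether the space is actually added. This Python sketch is an illustration of that description only; the class and method names are hypothetical and it does not claim to mirror Android's implementation.

```python
class InputBuffer:
    """Toy text field that defers the space after an accepted suggestion."""

    def __init__(self):
        self.text = ""
        self.pending_space = False

    def accept_suggestion(self, word):
        # Replace the word being typed (everything after the last space).
        head, sep, _ = self.text.rpartition(" ")
        self.text = head + sep + word
        self.pending_space = True

    def press_key(self, key):
        # Only a following letter turns the pending space into a real one.
        if self.pending_space and key.isalpha():
            self.text += " "
        self.pending_space = False
        self.text += key


buf = InputBuffer()
for k in "mel":
    buf.press_key(k)
buf.accept_suggestion("mel")
buf.press_key(".")
print(buf.text)  # mel.
```

With this approach, the "mel." steps from comment 6 no longer need the backspace, so the unwanted "melhor" correction is never triggered.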
Assignee: nobody → dflanagan
No longer blocks: 873934
In bug 881239, the reporter complains that she can't type http: in a text message because it autocorrects to HTTP.  As contemplated in bug 880117, I think maybe a local wordlist with http in it is the way to fix this.
Adding a related case. bug 888812
Please confirm if the auto-correction can correct upper-case letters.
Adding a related case.
Bug 890207 - [B2G][Gaia][Keyboard][Auto-correction] If a wording include special characters then moving the cursor to the end of a line, the cursor will disappear.
This bug seems like it may still have useful comments for things we should fix. But I'm not actively working on autocorrection now, so unassigning myself.
Assignee: dflanagan → nobody
Firefox OS is not being worked on
Status: NEW → RESOLVED
Closed: 7 years ago
Resolution: --- → WONTFIX