Closed
Bug 857850
Opened 12 years ago
Closed 12 years ago
[keyboard] predictions aren't ranked correctly
Categories
(Firefox OS Graveyard :: Gaia::Keyboard, defect)
Tracking
(Not tracked)
RESOLVED
INVALID
People
(Reporter: djf, Unassigned)
References
Details
No description provided.
Reporter
Comment 1•12 years ago
I added the following patch to predictions.js to help me understand what the prediction engine was doing:

diff --git a/apps/keyboard/js/imes/latin/predictions.js b/apps/keyboard/js/imes/
index c4adf30..515eeb4 100644
--- a/apps/keyboard/js/imes/latin/predictions.js
+++ b/apps/keyboard/js/imes/latin/predictions.js
@@ -277,6 +277,10 @@ var Predictions = function() {
         }
         // Record the suggestion and move to the next best candidate
         if (!(prefix in _suggestions_index)) {
+          log("candidate: " + cand.prefix +
+              " suggestion: " + prefix +
+              " frequency: " + node.freq +
+              " multiplier: " + cand.multiplier);
           _suggestions.push(prefix);
           _suggestions_index[prefix] = true;
         }

When I typed 'r', I got this output:

E/GeckoConsole( 7019): Content JS LOG at app://keyboard.gaiamobile.org/js/imes/latin/latin.js:175 in anonymous: candidate: r suggestion: released frequency: 47 multiplier: 4
E/GeckoConsole( 7019): Content JS LOG at app://keyboard.gaiamobile.org/js/imes/latin/latin.js:175 in anonymous: candidate: re suggestion: received frequency: 47 multiplier: 4
E/GeckoConsole( 7019): Content JS LOG at app://keyboard.gaiamobile.org/js/imes/latin/latin.js:175 in anonymous: candidate: rec suggestion: record frequency: 154 multiplier: 4

Notice that the third candidate has a much higher frequency than the first two. Also, after considering 'r' itself and picking 'released' as the best match, it then uses 're' as the candidate and picks 'received', and then uses 'rec' as the candidate and suggests 'record'. It doesn't seem to consider words beginning with 'ra', 'ri', etc.

As another example, if I type 'te', I get this output:

E/GeckoConsole( 7019): Content JS LOG at app://keyboard.gaiamobile.org/js/imes/latin/latin.js:175 in anonymous: candidate: te suggestion: team frequency: 164 multiplier: 2.5
E/GeckoConsole( 7019): Content JS LOG at app://keyboard.gaiamobile.org/js/imes/latin/latin.js:175 in anonymous: candidate: te suggestion: television frequency: 88 multiplier: 2.5
E/GeckoConsole( 7019): Content JS LOG at app://keyboard.gaiamobile.org/js/imes/latin/latin.js:175 in anonymous: candidate: te suggestion: term frequency: 153 multiplier: 2.5

The second candidate has a much lower frequency than the third candidate.
Comment 2•12 years ago
You just printed the wrong freq: the one stored in the candidate is the right one, not the one in the node. Try applying this diff.

diff --git a/apps/keyboard/js/imes/latin/predictions.js b/apps/keyboard/js/imes/latin/predictions.js
index c4adf30..cc66e84 100644
--- a/apps/keyboard/js/imes/latin/predictions.js
+++ b/apps/keyboard/js/imes/latin/predictions.js
@@ -277,6 +277,7 @@ var Predictions = function() {
         }
         // Record the suggestion and move to the next best candidate
         if (!(prefix in _suggestions_index)) {
+          dump("cand: " + cand.prefix + ", sugg: " + prefix + ", cand.freq: " + cand.freq + ", mult: " + cand.multiplier + "\n");
           _suggestions.push(prefix);
           _suggestions_index[prefix] = true;
         }

Tapping 'r' returns this:

cand: r, sugg: released, node.freq: 648, mult: 4
cand: re, sugg: received, node.freq: 628, mult: 4
cand: rec, sugg: record, node.freq: 616, mult: 4

which makes sense because, e.g., for 'released' the frequency is 162 * 4 = 648.

Nevertheless, I agree we should use multipliers in the range of 1.1 to 1.4, which on the one hand favors matched prefixes but on the other hand leaves room for alternative suggestions to be ranked higher, while still staying in the range below 255.
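The arithmetic in this comment can be sketched as follows. This is only an illustration, assuming the ranking weight is the word frequency times the multiplier; `rankCandidates` and the input records are hypothetical names, not code from predictions.js, and the frequencies are the ones implied by the output above (648 / 4, 628 / 4, 616 / 4).

```javascript
// Hypothetical sketch: order suggestions by frequency * multiplier.
// Not the actual predictions.js logic; names and data are illustrative.
function rankCandidates(words, multiplier) {
  return words
    .map(function (w) {
      // Weighted frequency corresponds to what the comment calls cand.freq.
      return { word: w.word, weighted: w.freq * multiplier };
    })
    .sort(function (a, b) {
      // Highest weighted frequency first.
      return b.weighted - a.weighted;
    });
}

var ranked = rankCandidates([
  { word: 'record', freq: 154 },
  { word: 'released', freq: 162 },
  { word: 'received', freq: 157 }
], 4);

ranked.forEach(function (r) {
  console.log(r.word + ': ' + r.weighted);
});
```

With a multiplier of 4 the weighted values are well above 255; this sketch does not attempt to model how the real engine keeps values in range.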
Reporter
Comment 3•12 years ago
It's hard to believe that "released", "received" and "record" are the three most common words that start with 'r' in English, but that is what the dictionary says. I wonder what sort of corpus Google was using when compiling those? It sounds like technical or business language. So I guess that for any given node in the tree, the frequency is the frequency of the most common word underneath that node? I need to pass this frequency back to latin.js, so I'll change my code to use cand.freq instead of node.freq.
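If the guess in this comment is right, the structure would look roughly like the sketch below: a trie in which every node caches the highest frequency of any word beneath it. This is an assumption for illustration only; `makeNode`, `insert`, `maxFreqFor` and the `maxFreq` field are hypothetical and do not come from predictions.js, and the word frequencies are the ones derived in comment 2.

```javascript
// Hypothetical trie: each node caches the highest word frequency in its subtree.
function makeNode() {
  return { children: {}, maxFreq: 0, wordFreq: 0 };
}

function insert(root, word, freq) {
  var node = root;
  node.maxFreq = Math.max(node.maxFreq, freq);
  for (var i = 0; i < word.length; i++) {
    var ch = word[i];
    if (!node.children[ch]) {
      node.children[ch] = makeNode();
    }
    node = node.children[ch];
    // Propagate the best frequency seen so far along the whole path.
    node.maxFreq = Math.max(node.maxFreq, freq);
  }
  node.wordFreq = freq; // frequency of the word ending at this node
}

// Returns the cached best frequency below a prefix, or 0 if the prefix is absent.
function maxFreqFor(root, prefix) {
  var node = root;
  for (var i = 0; i < prefix.length; i++) {
    node = node.children[prefix[i]];
    if (!node) {
      return 0;
    }
  }
  return node.maxFreq;
}

var root = makeNode();
insert(root, 'released', 162);
insert(root, 'received', 157);
insert(root, 'record', 154);

console.log(maxFreqFor(root, 'r'));    // best word under 'r'
console.log(maxFreqFor(root, 'rec'));  // best word under 'rec'
```

Caching the maximum per node lets a search pick the most promising branch first without walking the whole subtree, which would be consistent with the engine following 'r', then 're', then 'rec'.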
Reporter
Updated•12 years ago
Status: NEW → RESOLVED
Closed: 12 years ago
Resolution: --- → INVALID