Bug 1126076 (Closed): Add Hausa (ha) Wordlist/Dictionary
Opened 10 years ago; closed 9 years ago
Categories: Firefox OS Graveyard :: Gaia::Keyboard (defect)
Tracking: not tracked
Status: RESOLVED WONTFIX
People: Reporter: delphine; Unassigned; NeedInfo
Attachments: 3 files
Please add Hausa Wordlist and Dictionary to Firefox OS
Reporter
Comment 1•10 years ago
Adding the localizer to see if he can help with feedback here. Thanks.
Flags: needinfo?(mcsteann)
Reporter
Comment 2•10 years ago
No update from the localizer, so asking:
* Peiying: can you get Rubric plugged in to help with this?
* Kevin: could you help out also?
Thanks, all!
Flags: needinfo?(pmo)
Flags: needinfo?(kscanne)
Comment 4•10 years ago
(In reply to Peiying Mo [:CocoMo] from comment #3)
> + Devon and Ian, we need Rubric's advice on this.
Let's try the same approach as my comment 12 on bug 1121730.
Comment 5•10 years ago
There's a good amount of Hausa online, but like Lingala, only a small percentage (about 4% by my best estimate) uses the correct "special" characters (in this case ɓ, ƙ, ɗ).
The other issue is that there is no clean word list that uses the correct characters. The Firefox addon here:
https://addons.mozilla.org/en-us/firefox/addon/hausa-spelling-dictionary/
is virtually all ASCII.
From earlier work of mine on a spellchecker, I have what I think is a fairly comprehensive list of ~500 pairs of words that are correct either as ASCII or with special characters (ƙasa/kasa, saƙo/sako, etc.).
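(For illustration: the pairs above differ only by replacing the hooked letters with their ASCII look-alikes. A minimal sketch of that folding, assuming only ɓ, ɗ, ƙ and their capitals need handling:)

```python
# Hypothetical helper: fold Hausa "hooked" letters to their ASCII
# look-alikes, so that ƙasa -> kasa and saƙo -> sako. The character
# set covered here (ɓ/ɗ/ƙ plus capitals) is an assumption, not an
# exhaustive inventory of Hausa orthography.
FOLD = str.maketrans("ɓɗƙƁƊƘ", "bdkBDK")

def ascii_fold(word: str) -> str:
    """Return the ASCII spelling of a Hausa word."""
    return word.translate(FOLD)
```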
With that in mind, here's what I'd propose:
(1) I use the full web corpus to produce an ASCII-only frequency list, maybe validating against the Firefox spellchecker (I haven't checked its coverage, so I don't know if that's worthwhile)
(2) Use the 4% of properly-encoded web texts to produce a word list of words containing special characters, everything appearing more than, say, 2 or 3 times.
(3) Use the frequency of the ASCII version from (1) as a proxy for the frequency of the presumed-correct words from step (2)...
(4) *except* if the word is in my list of special cases (ƙasa/kasa, saƙo/sako, etc.). Here it's not clear what to do; I suppose I could split the frequency of the ASCII version from (1) according to the relative proportions I see in the good (4%) corpus.
I'd be grateful for some feedback from the Hausa team on this. If they don't care about preserving the special characters then I suppose I don't need to bother with any of this!
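The four steps above could be sketched roughly like this (a sketch only: the function names, the folding table, and the `min_count` threshold are all illustrative assumptions, not an actual implementation):

```python
from collections import Counter

# Fold the hooked letters to ASCII (assumed mapping, as discussed above).
FOLD = str.maketrans("ɓɗƙƁƊƘ", "bdkBDK")

def ascii_fold(word: str) -> str:
    return word.translate(FOLD)

def build_wordlist(ascii_freq: Counter, good_words, special_ascii: set,
                   min_count: int = 3) -> dict:
    """ascii_freq: folded-word frequencies from the full web corpus (step 1).
    good_words: tokens from the correctly encoded ~4% corpus (step 2).
    special_ascii: ASCII spellings that are themselves valid words
    (the ~500 pairs like kasa/ƙasa)."""
    good = Counter(good_words)
    out = {}
    for word, n in good.items():
        folded = ascii_fold(word)
        if word == folded or n < min_count:
            continue  # keep only attested special-character words (step 2)
        if folded in special_ascii:
            # Step 4: both spellings are real words; split the web-corpus
            # frequency by the proportions observed in the good corpus.
            share = good[word] / (good[word] + good[folded])
            out[word] = round(ascii_freq[folded] * share)
            out[folded] = ascii_freq[folded] - out[word]
        else:
            # Step 3: the ASCII form is presumed a misspelling of `word`;
            # use its web-corpus frequency as a proxy.
            out[word] = ascii_freq.get(folded, n)
    return out
```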
Flags: needinfo?(kscanne)
Comment 6•10 years ago
Kevin, you can also have a look at the PO files for Gaia itself:
https://github.com/translate/mozilla-gaia/commits/2.0/ha
Here are also a few from old GNOME translations:
https://l10n.gnome.org/POT/gnome-panel.master/gnome-panel.master.ha.po
https://l10n.gnome.org/POT/metacity.master/metacity.master.ha.po
https://l10n.gnome.org/POT/nautilus.master/nautilus.master.ha.po
(Be sure to include obsolete messages if you can; there seems to be quite a lot of text there.)
Maybe this is already in your web corpus, but if not, it hopefully gives you a bit more text (horribly biased, unfortunately). I saw at least some non-ASCII characters in these files, though I don't know how frequent they are supposed to be.
About your plan: If your list of 500 is fairly complete, it mostly sounds good. I guess you can also augment the 500 with what you see in the 4% corpus. Is the 4% big enough, though? Any issues of balance in the 4%?
Comment 7•10 years ago
(In reply to Friedel Wolff from comment #6)
> Kevin, you can also have a look at the PO files for Gaia itself [...]
> Here are also a few from old GNOME translations [...]
Thanks.
> About your plan: If your list of 500 is fairly complete, it mostly sounds
> good. I guess you can also augment the 500 with what you see in the 4%
> corpus. Is the 4% big enough, though? Any issues of balance in the 4%?
It's about 250k words total, ~19k unique words. It's heavily biased towards religious texts, so I'd rather not use frequencies from it if possible.
Comment 8•10 years ago
Here is the FFOS
Comment 9•10 years ago
Comment 10•10 years ago
Comment 11•10 years ago
Hello Kevin,
These files are done:
http://mozilla.locamotion.org/ha/mozilla_lang/main.lang.po
http://mozilla.locamotion.org/ha/mozilla_lang/mozorg/home/index.lang.po
http://mozilla.locamotion.org/ha/mozilla_lang/firefox/os/index.lang.po
http://mozilla.locamotion.org/ha/mozilla_lang/firefox/os/devices.lang.po
http://mozilla.locamotion.org/ha/mozilla_lang/firefox/os/faq.lang.po
http://mozilla.locamotion.org/ha/mozilla_lang/firefox/partners/index.lang.po
http://mozilla.locamotion.org/ha/mozilla_lang/firefoxos/firefoxos.lang.po
http://mozilla.locamotion.org/ha/mozilla_lang/legal/index.lang.po
http://mozilla.locamotion.org/ha/mozilla_lang/tabzilla/tabzilla.lang.po
Comment 12•10 years ago
Hello Kevin,
Another bit of corpus:
https://localize.mozilla.org/ha/masterfirefoxos/
Updated•9 years ago
Status: NEW → RESOLVED
Closed: 9 years ago
Resolution: --- → WONTFIX