Closed Bug 1180113 Opened 9 years ago Closed 9 years ago

Allow for out of dictionary words to be recognized in Pocketsphinx

Categories

(Core :: Web Speech, defect)

defect
Not set
normal

Tracking

()

RESOLVED FIXED
mozilla42
feature-b2g 2.5+
Tracking Status
firefox42 --- fixed

People

(Reporter: kdavis, Assigned: anatal)

References

Details

(Whiteboard: [webspeechapi][vaani])

Attachments

(2 files, 9 obsolete files)

(deleted), patch
anatal
: review+
Details | Diff | Splinter Review
(deleted), patch
anatal
: review+
Details | Diff | Splinter Review
Currently Pocketsphinx only allows for words in its dictionary to be recognized. This is insufficient for our needs. For example, for applications such as calling we need proper names such as "Sandip" to be recognized even if they are not in Pocketsphinx's dictionary. The goal of this bug is to allow Pocketsphinx to recognize words not in its dictionary.
Assignee: nobody → anatal
Whiteboard: [webspeechapi][vaani][systemsfe]
Whiteboard: [webspeechapi][vaani][systemsfe] → [webspeechapi][vaani]
Blocks: Meta-Vaani
Blocks: 1178264
Part 1 of 2 : Adds the new Arpa model trained from CMU LM to the repo next to cmudict.
Part 1 of 2 : Adds the new Arpa model trained from CMU LM to the repo next to cmudict.
Attachment #8635099 - Attachment is obsolete: true
Part 2 of 2: Introduces the g2p algorithm integrated into pocketsphinx codebase
Who should review this?
(In reply to kdavis from comment #5) > Who should review this? I am waiting the successful end of the run on try and the CMU review to then ask it inside Mozilla, but if you want to do it before, it's up to you.
Ok. Once I'm done some work on Bug 1178326 I'll take a look too.
Attachment #8635100 - Flags: review?(kdavis)
Attachment #8635100 - Flags: review?(bugs)
Attachment #8635101 - Flags: review?(kdavis)
Attachment #8635101 - Flags: review?(bugs)
I put Olli on too as he's mentioned in the commit comments.
Attachment #8635100 - Attachment is obsolete: true
Attachment #8635100 - Flags: review?(kdavis)
Attachment #8635100 - Flags: review?(bugs)
Attachment #8635101 - Attachment is obsolete: true
Attachment #8635101 - Flags: review?(kdavis)
Attachment #8635101 - Flags: review?(bugs)
Remarks: 1 - This patch was reviewed by pocketsphinx group and will eventually land in their repo and then maybe cmudict will be no more required. 2 - This algorithm works to every language that have a proper arpa model generated from a statistical language model 3 - Try is still being executed and when finish the review flags will be requested
Attachment #8635217 - Flags: review?(kdavis)
Attachment #8635217 - Flags: review?(bugs)
Attachment #8635218 - Flags: review?(kdavis)
Attachment #8635218 - Flags: review?(bugs)
Attachment #8635217 - Attachment is obsolete: true
Attachment #8635217 - Attachment is patch: false
Attachment #8635217 - Flags: review?(kdavis)
Attachment #8635217 - Flags: review?(bugs)
Attachment #8635218 - Attachment is obsolete: true
Attachment #8635218 - Flags: review?(kdavis)
Attachment #8635218 - Flags: review?(bugs)
Attachment #8635217 - Attachment is obsolete: false
Attachment #8635217 - Attachment is patch: true
Attachment #8635217 - Flags: review?(kdavis)
Attachment #8635217 - Flags: review?(bugs)
Attachment #8635608 - Attachment description: This parch introduces new g2p algorithm into pocketsphinx to allow out of dictionary words to be added to grammars → This patch introduces new g2p algorithm into pocketsphinx to allow out of dictionary words to be added to grammars
Attachment #8635217 - Attachment description: Bug 1180113 - Part 1 : Add arpa model into gecko for g2p generation → This patch adds arpa model into gecko for g2p generation
Comment on attachment 8635608 [details] [diff] [review] This patch introduces new g2p algorithm into pocketsphinx to allow out of dictionary words to be added to grammars This version have improvements searching the best path.
Comment on attachment 8635217 [details] [diff] [review] This patch adds arpa model into gecko for g2p generation I assume the license model for en-US.dic.dmp is something we can use, and not anything special. Could you please check.
Attachment #8635217 - Flags: review?(bugs) → review+
Comment on attachment 8635608 [details] [diff] [review] This patch introduces new g2p algorithm into pocketsphinx to allow out of dictionary words to be added to grammars >- char const *dictfile = NULL, *fillerfile = NULL; >+ char const *dictfile = NULL, *fillerfile = NULL, *arpafile = NULL; > > if (config) { > dictfile = cmd_ln_str_r(config, "-dict"); > fillerfile = cmd_ln_str_r(config, "-fdict"); >+ arpafile = string_join(dictfile, ".dmp", NULL); > } > > /* >@@ -303,6 +304,17 @@ dict_init(cmd_ln_t *config, bin_mdef_t * mdef) > * Also check for type size restrictions. > */ > d = (dict_t *) ckd_calloc(1, sizeof(dict_t)); /* freed in dict_free() */ >+ >+ if (arpafile) { >+ ngram_model_t *ngram_g2p_model = ngram_model_read(NULL,arpafile,NGRAM_AUTO,logmath); >+ if (!ngram_g2p_model) { >+ E_ERROR("No arpa model found \n"); >+ return NULL; You leak arpafile here. And in fact you leak if if the method returns anytime before the current ckd_free(arpafile); call. So, please do if (config) { arpafile = string_join(dictfile, ".dmp", NULL); } right before if (arpafile) { and then free the allocated string buffer after ngram_model_read(NULL,arpafile,NGRAM_AUTO,logmath); call. I can't review the rest without having some kind of spec for the file type. Say, I have no idea what dict_split_unigram should do. I can only read what its current implementation does. Are you kdavis more familiar with this stuff?
Attachment #8635608 - Flags: review?(bugs)
(In reply to Olli Pettay [:smaug] (about to have some vacation, expect slower than usual reviews) from comment #16) > Comment on attachment 8635217 [details] [diff] [review] > This patch adds arpa model into gecko for g2p generation > > I assume the license model for en-US.dic.dmp is something we can use, and > not anything special. Could you please check. I compiled the .dmp using sphinx_lm_convert from its text version [1] that was generated from standard pocketsphinx language model using phonetisaurus. The license is same as everything else from pocketsphinx [2] [1] https://www.dropbox.com/s/bj3mgli5iv5qdcx/cmudict-g2p.lm?dl=0 [2] https://github.com/cmusphinx/pocketsphinx/blob/master/LICENSE
Attachment #8635608 - Attachment is obsolete: true
Attachment #8635608 - Flags: review?(kdavis)
Attachment #8635877 - Flags: review?(kdavis)
(In reply to Olli Pettay [:smaug] (about to have some vacation, expect slower than usual reviews) from comment #17) > Comment on attachment 8635608 [details] [diff] [review] > This patch introduces new g2p algorithm into pocketsphinx to allow out of > dictionary words to be added to grammars > > >- char const *dictfile = NULL, *fillerfile = NULL; > >+ char const *dictfile = NULL, *fillerfile = NULL, *arpafile = NULL; > > > > if (config) { > > dictfile = cmd_ln_str_r(config, "-dict"); > > fillerfile = cmd_ln_str_r(config, "-fdict"); > >+ arpafile = string_join(dictfile, ".dmp", NULL); > > } > > > > /* > >@@ -303,6 +304,17 @@ dict_init(cmd_ln_t *config, bin_mdef_t * mdef) > > * Also check for type size restrictions. > > */ > > d = (dict_t *) ckd_calloc(1, sizeof(dict_t)); /* freed in dict_free() */ > >+ > >+ if (arpafile) { > >+ ngram_model_t *ngram_g2p_model = ngram_model_read(NULL,arpafile,NGRAM_AUTO,logmath); > >+ if (!ngram_g2p_model) { > >+ E_ERROR("No arpa model found \n"); > >+ return NULL; > You leak arpafile here. > And in fact you leak if if the method returns anytime before the current > ckd_free(arpafile); call. > So, please do > if (config) { > arpafile = string_join(dictfile, ".dmp", NULL); > } > right before if (arpafile) { and then free the allocated string buffer after > ngram_model_read(NULL,arpafile,NGRAM_AUTO,logmath); call. > Ok, fixed on new patch. > I can't review the rest without having some kind of spec for the file type. > Say, I have no idea what dict_split_unigram should do. I can only read what > its current implementation does. I sent the text arpa lm file on comment #18
(In reply to Andre Natal from comment #20) > (In reply to Olli Pettay [:smaug] (about to have some vacation, expect > slower than usual reviews) from comment #17) > > Comment on attachment 8635608 [details] [diff] [review] > > This patch introduces new g2p algorithm into pocketsphinx to allow out of > > dictionary words to be added to grammars > > > > >- char const *dictfile = NULL, *fillerfile = NULL; > > >+ char const *dictfile = NULL, *fillerfile = NULL, *arpafile = NULL; > > > > > > if (config) { > > > dictfile = cmd_ln_str_r(config, "-dict"); > > > fillerfile = cmd_ln_str_r(config, "-fdict"); > > >+ arpafile = string_join(dictfile, ".dmp", NULL); > > > } > > > > > > /* > > >@@ -303,6 +304,17 @@ dict_init(cmd_ln_t *config, bin_mdef_t * mdef) > > > * Also check for type size restrictions. > > > */ > > > d = (dict_t *) ckd_calloc(1, sizeof(dict_t)); /* freed in dict_free() */ > > >+ > > >+ if (arpafile) { > > >+ ngram_model_t *ngram_g2p_model = ngram_model_read(NULL,arpafile,NGRAM_AUTO,logmath); > > >+ if (!ngram_g2p_model) { > > >+ E_ERROR("No arpa model found \n"); > > >+ return NULL; > > You leak arpafile here. > > And in fact you leak if if the method returns anytime before the current > > ckd_free(arpafile); call. > > So, please do > > if (config) { > > arpafile = string_join(dictfile, ".dmp", NULL); > > } > > right before if (arpafile) { and then free the allocated string buffer after > > ngram_model_read(NULL,arpafile,NGRAM_AUTO,logmath); call. > > > > Ok, fixed on new patch. > > > I can't review the rest without having some kind of spec for the file type. > > Say, I have no idea what dict_split_unigram should do. I can only read what > > its current implementation does. > Andre, I think you forgot to add Olli as a reviewer on the new patch?
(In reply to Andre Natal from comment #18) > I compiled the .dmp using sphinx_lm_convert from its text version [1] that > was generated from standard pocketsphinx language model using phonetisaurus. Would it be possible to have also the text version in our repository? Or perhaps only that and run sphinx_lm_convert when compiling Gecko?
Comment on attachment 8635217 [details] [diff] [review] This patch adds arpa model into gecko for g2p generation Review of attachment 8635217 [details] [diff] [review]: ----------------------------------------------------------------- I am wondering about new languages. It seems to me that there are several steps that are hidden here, and it’d be great if we had the tool chain used to do these hidden steps so we can use it for new languages. As far as I can surmise this process starts with two things: * Phonetic dictionary * Text corpus The process then is: 1. One converts the text corpus to phonemes using the phonetic dictionary to obtain a phonetic text corpus 2. One coverts this phonetic text corpus to a "phonetic language model" in lm format 3. One converts the lm format "phonetic language model" to a dmp file using sphinx_lm_convert It’d be great if you could create a github repo that contains the scripts that go from a phonetic dictionary and text corpus to the final dmp so we can apply these scripts to new languages. Can you open a new bug for this task and put me on CC? For FxOS 2.5 we are committed to the languages English, Spanish, and French. For each of these languages CMU provides a phonetic dictionary. Also, I’ve created text corpora for each of these languages on S3 from Wikipedia dumps. So, all we need now is your github repo with the associated scripts! From a legal standpoint I guess my question is: "What is the licensing status of cmudict-g2p.lm?” You mentioned its the same as Pocketsphinx, but I don’t see where one can download cmudict-g2p.lm from CMU with an associated license. Can you send out the link?
Attachment #8635217 - Flags: review?(kdavis) → review+
Attachment #8635877 - Flags: review?(bugs)
(In reply to Olli Pettay [:smaug] (away-ish until July 26, expect slower than usual reviews) from comment #17) > Comment on attachment 8635608 [details] [diff] [review] > > Say, I have no idea what dict_split_unigram should do. I can only read what > its current implementation does. As Andre hasn't chimed in to explain his code, I'll give my take on what the function dict_split_unigram does. The function dict_split_unigram is tasked with parsing a "word", not really a word but more on that later, into a unigram_t, a mapping of one, or more, graphemes (the smallest unit used in the writing system of a language) to one or more phonemes (basic unit of a language's phonology). More specifically the dmp file, which is only a reformatted lm file, contains n-grams, not for words, but for phonemes and their associated graphemes. So, for example the lm version of the dmp file would have lines of the form... ... -5.2688 a|c}AA -0.5826 -5.0135 a|c}AE -1.7016 -5.3657 a|g}AE -0.6989 -3.4066 a|i}EH -1.6086 ... These lines are for 1-grams, and, for example, the line -5.2688 a|c}AA -0.5826 indicates that the graphemes 'a' or 'c' are pronounced as 'AA' (This sounds like the beginning of the word "odd") with probability P where the log base 10 of P is -5.2688 The final number -0.5826 is a backoff weight which I will purposely ignore for now. The "word" I talked about earlier is not really a word, but is one of the string entries, including the pipe '|' and '}', from the lm file. So from our example all the "words" are as follows ... a|c}AA a|c}AE a|g}AE a|i}EH ... what the function dict_split_unigram does it to, given one of these words, "a|c}AA" for example, split it into 'letter' part, in this case "ac", and a 'phone' part, in this case "AA" then creates a unigram_t from the two parts as follows unigram_t unigram = { letter , phone}; and returns this to the caller.
(In reply to Olli Pettay [:smaug] (away-ish until July 26, expect slower than usual reviews) from comment #17) > Comment on attachment 8635608 [details] [diff] [review] > > I can't review the rest without having some kind of spec for the file type. As Andre hasn't chimed in to explain the file types, I'll try to do so. The lm file format, that Andre linked to on dropbox, is used to describe n-grams. There is no "official" description of the file format. But, for example, a good to adequate description of the lm file format is given here: http://www.speech.sri.com/projects/srilm/manpages/ngram-format.5.html The dmp file format serves the same purpose as the lm file format, describing of n-grams. However, the dmp file format is binary. The tool Andre mentions above, sphinx_lm_convert, is used to convert from an lm file to a dmp file. As to the format of the dmp file, the only "documentation" I know of is the source of the file that does the conversion from a ngram_model_t to a dmp file: https://github.com/skerit/cmusphinx/blob/01f79d8bf76f94318b4b9f37ab28fbbe9e78d4bf/multisphinx/sphinxbase/ngram_model_dmp.c#L872 Unfortunately, there doesn't seem to be any real human readable documentation of the file format.
(In reply to kdavis from comment #23) > Comment on attachment 8635217 [details] [diff] [review] > This patch adds arpa model into gecko for g2p generation > > From a legal standpoint I guess my question is: > > "What is the licensing status of cmudict-g2p.lm?” > > You mentioned its the same as Pocketsphinx, but I don’t see where > one can download cmudict-g2p.lm from CMU with an associated > license. Can you send out the link?
Flags: needinfo?(anatal)
(In reply to kdavis from comment #24) > (In reply to Olli Pettay [:smaug] (away-ish until July 26, expect slower > than usual reviews) from comment #17) > > Comment on attachment 8635608 [details] [diff] [review] > > > > Say, I have no idea what dict_split_unigram should do. I can only read what > > its current implementation does. > > As Andre hasn't chimed in to explain his code, I'll give my take on what the > function dict_split_unigram does. I was about to do it, but I talked with Olli by email and he said was in vacations and only next week would do it. But congratulations for your explanation. It is not completely correct but give an idea.
Hi Andrew, can you please help with this?
Flags: needinfo?(overholt)
Flags: needinfo?(anatal)
(In reply to Andre Natal from comment #27) > (In reply to kdavis from comment #24) > > (In reply to Olli Pettay [:smaug] (away-ish until July 26, expect slower > > than usual reviews) from comment #17) > > > Comment on attachment 8635608 [details] [diff] [review] > > > > > > Say, I have no idea what dict_split_unigram should do. I can only read what > > > its current implementation does. > > > > As Andre hasn't chimed in to explain his code, I'll give my take on what the > > function dict_split_unigram does. > > I was about to do it, but I talked with Olli by email and he said was in > vacations and only next week would do it. But congratulations for your > explanation. It is not completely correct but give an idea. Could you explain what's wrong with my explanation? I think knowing will help us now and in future with the code.
Comment on attachment 8635877 [details] [diff] [review] This patch introduces new g2p algorithm into pocketsphinx to allow out of dictionary words to be added to grammars Review of attachment 8635877 [details] [diff] [review]: ----------------------------------------------------------------- Generally this seems fine There are a few little changes that should be made * Removal of some trailing white space * Light documentation of the new methods (See Comment 17) * Change signature of dict_get_winner_wid to not take total_unigrams * Fix the leaks mentioned in my comments In addition * Olli should review this code before it is committed * The toolchain for dmp files for other languages should be made available ::: media/pocketsphinx/src/dict.c @@ +503,5 @@ > E_INFO_NOFN("\n"); > } > + > +int > +dict_starts_with(const char *pre, const char *str) Please add a comment describing this function. Remove the trailing white space. @@ +510,5 @@ > + return lenstr < lenpre ? 0 : strncmp(pre, str, lenpre) == 0; > +} > + > +unigram_t > +dict_split_unigram (const char * word) Please add a comment describing this function. @@ +530,5 @@ > + token_pos = w; > + continue; > + } > + if (!token_pos) > + total_letters++; In a multi-byte encoding this is the total number of bytes, not letters. @@ +532,5 @@ > + } > + if (!token_pos) > + total_letters++; > + else > + total_phone++; In a multi-byte encoding this is the total number of bytes, not letters. @@ +562,5 @@ > + return unigram; > +}; > + > +struct winner_t > +dict_get_winner_wid(ngram_model_t *model, const char * word_grapheme, glist_t history_list, const int32 total_unigrams, Please add a comment describing this function Why does this function take the argument 'total_unigrams'? This value can be derived from 'model' with a single call to ngram_model_get_counts(). @@ +586,5 @@ > + } > + > + for (i = 0; i < total_unigrams; i++) { > + vocab = ngram_word(model, i); > + unigram = dict_split_unigram(vocab); What is the logic of allocating and deallocating unigram.word and unigram.phone in this loop again and again? As these are simply the unigrams from the dmp file, wouldn't it make more sense to allocate then once and be done with it? @@ +611,5 @@ > + return winner; > +} > + > +char * > +dict_g2p(char const *word_grapheme, ngram_model_t *ngram_g2p_model) Please add a comment describing this function Remove the trailing white space. @@ +636,5 @@ > + winner = dict_get_winner_wid(ngram_g2p_model, word_grapheme, history_list, *total_unigrams, word_offset); > + increment = winner.length_match; > + if (increment == 0) { > + E_ERROR("Error trying to find matching phoneme (%s) Exiting.. \n" , word_grapheme); > + return NULL; Leaking a history_list here @@ +653,5 @@ > + continue; > + } > + word = ngram_word(ngram_g2p_model, gnode_int32(gn)); > + > + if (!word) Why would 'word' be null here? Didn't we construct history list only from elements in ngram_gp2_model? @@ +681,5 @@ > + return final_phone; > +} > + > +int > +dict_add_g2p_word(dict_t * dict, char const *word) Please add a comment describing this function ::: media/pocketsphinx/src/dict.h @@ +115,5 @@ > * Return ptr to dict_t if successful, NULL otherwise. > */ > dict_t *dict_init(cmd_ln_t *config, /**< Configuration (-dict, -fdict, -dictcase) or NULL */ > + bin_mdef_t *mdef, /**< For looking up CI phone IDs (or NULL) */ > + logmath_t *logmath // To load ngram_model for g2p load Can you document that logmath must be retained with logmath_retain() if it is to be used elsewhere. @@ +218,5 @@ > void dict_report(dict_t *d /**< A dictionary structure */ > ); > > +// g2p functions > +int dict_add_g2p_word(dict_t * dict, char const *word); For consistency use int dict_add_g2p_word(dict_t *dict, char const *word); and not int dict_add_g2p_word(dict_t * dict, char const *word);
Attachment #8635877 - Flags: review?(kdavis) → review+
Flags: needinfo?(overholt)
(In reply to kdavis from comment #26) > (In reply to kdavis from comment #23) > > Comment on attachment 8635217 [details] [diff] [review] > > This patch adds arpa model into gecko for g2p generation > > > > From a legal standpoint I guess my question is: > > > > "What is the licensing status of cmudict-g2p.lm?” > > > > You mentioned its the same as Pocketsphinx, but I don’t see where > > one can download cmudict-g2p.lm from CMU with an associated > > license. Can you send out the link? Just to make sure there are no legal problems, I think we should double check the licensing status of cmudict-g2p.lm and the associated dmp. So, I'm going to needinfo Andrew on this again, because, as far as I can tell, there was no concrete resolution.
Flags: needinfo?(overholt)
(In reply to kdavis from comment #31) > (In reply to kdavis from comment #26) > > (In reply to kdavis from comment #23) > > > Comment on attachment 8635217 [details] [diff] [review] > > > This patch adds arpa model into gecko for g2p generation > > > > > > From a legal standpoint I guess my question is: > > > > > > "What is the licensing status of cmudict-g2p.lm?” > > > > > > You mentioned its the same as Pocketsphinx, but I don’t see where > > > one can download cmudict-g2p.lm from CMU with an associated > > > license. Can you send out the link? > > Just to make sure there are no legal problems, I think we should double > check the licensing status of cmudict-g2p.lm and the associated dmp. > > So, I'm going to needinfo Andrew on this again, because, as far as I can > tell, there was no concrete resolution. If you are thinking about the legal situation, I suggest we ask Gerv.
Flags: needinfo?(overholt) → needinfo?(gerv)
Blocks: 1180668
(In reply to kdavis from comment #30) > Comment on attachment 8635877 [details] [diff] [review] > This patch introduces new g2p algorithm into pocketsphinx to allow out of > dictionary words to be added to grammars > > Review of attachment 8635877 [details] [diff] [review]: > ----------------------------------------------------------------- > > Generally this seems fine > > There are a few little changes that should be made > > * Removal of some trailing white space Fixed > * Light documentation of the new methods (See Comment 17) Done > * Change signature of dict_get_winner_wid to not take total_unigrams Fixed > * Fix the leaks mentioned in my comments Fixed > In addition Olli should review this code before it is committed Ok, sure. > * The toolchain for dmp files for other languages should be made available > Ok, created. Please refer bug: https://bugzilla.mozilla.org/show_bug.cgi?id=1184849 and repo: https://github.com/mozilla/g2p/ > ::: media/pocketsphinx/src/dict.c > @@ +503,5 @@ > > E_INFO_NOFN("\n"); > > } > > + > > +int > > +dict_starts_with(const char *pre, const char *str) > > Please add a comment describing this function. > > Remove the trailing white space. Done > @@ +510,5 @@ > > + return lenstr < lenpre ? 0 : strncmp(pre, str, lenpre) == 0; > > +} > > + > > +unigram_t > > +dict_split_unigram (const char * word) > > Please add a comment describing this function. > Done > @@ +530,5 @@ > > + token_pos = w; > > + continue; > > + } > > + if (!token_pos) > > + total_letters++; > > In a multi-byte encoding this is the total number of bytes, not letters. > > @@ +532,5 @@ > > + } > > + if (!token_pos) > > + total_letters++; > > + else > > + total_phone++; > > In a multi-byte encoding this is the total number of bytes, not letters. > Correct, Thanks. I changed it to total_graphemes, that makes more sense than bytes and letters > @@ +562,5 @@ > > + return unigram; > > +}; > > + > > +struct winner_t > > +dict_get_winner_wid(ngram_model_t *model, const char * word_grapheme, glist_t history_list, const int32 total_unigrams, > > Please add a comment describing this function > Done. > Why does this function take the argument 'total_unigrams'? This value can > be derived from 'model' with a single call to ngram_model_get_counts(). > > @@ +586,5 @@ > > + } > > + > > + for (i = 0; i < total_unigrams; i++) { Correct, thanks, just fixed. > > + vocab = ngram_word(model, i); > > + unigram = dict_split_unigram(vocab); > > What is the logic of allocating and deallocating unigram.word and > unigram.phone > in this loop again and again? > > As these are simply the unigrams from the dmp file, wouldn't it make more > sense > to allocate then once and be done with it? > The unigrams are already allocated inside ngram_model_t *model object. The reason of the loop is to search for the next winner unigram at that position in the word. This is the reason the strucuture unigram is deallocated on the end of the loop and allocated again when next unigram from the model ( vocab = ngram_word(model, i);) comes to the challenge with others.. and this continues until the last unigram from the model compete. > @@ +611,5 @@ > > + return winner; > > +} > > + > > +char * > > +dict_g2p(char const *word_grapheme, ngram_model_t *ngram_g2p_model) > > Please add a comment describing this function > > Remove the trailing white space. Ok, done. > > @@ +636,5 @@ > > + winner = dict_get_winner_wid(ngram_g2p_model, word_grapheme, history_list, *total_unigrams, word_offset); > > + increment = winner.length_match; > > + if (increment == 0) { > > + E_ERROR("Error trying to find matching phoneme (%s) Exiting.. \n" , word_grapheme); > > + return NULL; > > Leaking a history_list here Fixed, thank you. > > @@ +653,5 @@ > > + continue; > > + } > > + word = ngram_word(ngram_g2p_model, gnode_int32(gn)); > > + > > + if (!word) > > Why would 'word' be null here? Didn't we construct history list only > from elements in ngram_gp2_model? > We can have winners with no phoneme in the case of models generated with errors, for example. In this particular case, if I have a wid with problem that does not match a unigram from ngram_word , I remove it from the composition of the final phoneme word. > @@ +681,5 @@ > > + return final_phone; > > +} > > + > > +int > > +dict_add_g2p_word(dict_t * dict, char const *word) > > Please add a comment describing this function > Ok, done. > ::: media/pocketsphinx/src/dict.h > @@ +115,5 @@ > > * Return ptr to dict_t if successful, NULL otherwise. > > */ > > dict_t *dict_init(cmd_ln_t *config, /**< Configuration (-dict, -fdict, -dictcase) or NULL */ > > + bin_mdef_t *mdef, /**< For looking up CI phone IDs (or NULL) */ > > + logmath_t *logmath // To load ngram_model for g2p load > > Can you document that logmath must be retained with logmath_retain() > if it is to be used elsewhere. > Done. > @@ +218,5 @@ > > void dict_report(dict_t *d /**< A dictionary structure */ > > ); > > > > +// g2p functions > > +int dict_add_g2p_word(dict_t * dict, char const *word); > > For consistency use > > int dict_add_g2p_word(dict_t *dict, char const *word); > > and not > > int dict_add_g2p_word(dict_t * dict, char const *word); Thanks for noticing this, fixed.
Attachment #8635877 - Attachment is obsolete: true
Attachment #8635877 - Flags: review?(bugs)
Attachment #8639696 - Flags: review?(bugs)
(In reply to Olli Pettay [:smaug] from comment #22) > (In reply to Andre Natal from comment #18) > > I compiled the .dmp using sphinx_lm_convert from its text version [1] that > > was generated from standard pocketsphinx language model using phonetisaurus. > Would it be possible to have also the text version in our repository? > Or perhaps only that and run sphinx_lm_convert when compiling Gecko? Olli, we have a repo with the models and the toolchain to generate them: https://github.com/mozilla/g2p/ Ships the binaries is better. Generate them gecko's compilation is impracticable, since depends from a lot of other stuff.
Attachment #8639696 - Attachment is obsolete: true
Attachment #8639696 - Flags: review?(bugs)
Attachment #8639704 - Flags: review?(bugs)
Attachment #8639704 - Attachment description: 0001-Bug-1180113-Part-2-Introducing-g2p-algorithm-inside-.patch → This patch introduces new g2p algorithm into pocketsphinx to allow out of dictionary words to be added to grammars
(In reply to Andrew Overholt [:overholt] from comment #32) > If you are thinking about the legal situation, I suggest we ask Gerv. Would someone be kind enough to clearly define the question, along with links to relevant source code / license text? Thanks :-) Gerv
Flags: needinfo?(gerv)
(In reply to Gervase Markham [:gerv] from comment #37) > (In reply to Andrew Overholt [:overholt] from comment #32) > > If you are thinking about the legal situation, I suggest we ask Gerv. > > Would someone be kind enough to clearly define the question, along with > links to relevant source code / license text? Thanks :-) > > Gerv This question is really the same question you just answered here[1]. So, you answered the question here without even knowing it. [1] https://bugzilla.mozilla.org/show_bug.cgi?id=1184849#c7
No longer blocks: 1180668
I guess in light of Bug 1184849 Comment7 from Gerv we should regenerate the dmp file before this is committed.
Comment on attachment 8639704 [details] [diff] [review] This patch introduces new g2p algorithm into pocketsphinx to allow out of dictionary words to be added to grammars Sorry about delay with this stuff. vacation and tons of other stuff to review. >+++ b/media/pocketsphinx/src/dict.c >@@ -8,27 +8,27 @@ > * are met: > * > * 1. Redistributions of source code must retain the above copyright >- * notice, this list of conditions and the following disclaimer. >+ * notice, this list of conditions and the following disclaimer. > * > * 2. Redistributions in binary form must reproduce the above copyright > * notice, this list of conditions and the following disclaimer in > * the documentation and/or other materials provided with the > * distribution. > * >- * This work was supported in part by funding from the Defense Advanced >- * Research Projects Agency and the National Science Foundation of the >+ * This work was supported in part by funding from the Defense Advanced >+ * Research Projects Agency and the National Science Foundation of the > * United States of America, and the CMU Sphinx Speech Consortium. > * >- * THIS SOFTWARE IS PROVIDED BY CARNEGIE MELLON UNIVERSITY ``AS IS'' AND >- * ANY EXPRESSED OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, >+ * THIS SOFTWARE IS PROVIDED BY CARNEGIE MELLON UNIVERSITY ``AS IS'' AND >+ * ANY EXPRESSED OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, > * THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR > * PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL CARNEGIE MELLON UNIVERSITY > * NOR ITS EMPLOYEES BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, >- * SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT >- * LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, >- * DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY >- * THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT >- * (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE >+ * SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT >+ * LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, >+ * DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY >+ * THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT >+ * (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE > * OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE. > * Random whitespace change here. Please drop them. pocketsphinx is external code, so better to try to keep our modifications to it minimal. > * ==================================================================== >@@ -51,7 +51,7 @@ > > #if WIN32 > #define snprintf sprintf_s >-#endif >+#endif same also here. >+// This function splits an unigram received (in format e|w}UW) and return a structure >+// containing two fields: the grapheme (before }) in unigram.word and the phoneme (after }) unigram.phone >+unigram_t >+dict_split_unigram (const char * word) nit, for consistency with the existing code dict_split_unigram(const char *word) >+ letter = ckd_calloc(1,total_graphemes+1); based on the existing code, the coding style should be letter = ckd_calloc(1, total_graphemes + 1); >+ letter[w-add] = word[w]; and here letter[w - add] = word[w]; consistency in coding style greatly helps with code readability. >+// This function calculates the most likely unigram to appear in the current position at the word >+// based on the three latest chosen/winners unigrams (history) and return a structure containing >+// the word id (wid), and lengths of the phoneme and the word >+struct winner_t >+dict_get_winner_wid(ngram_model_t *model, const char * word_grapheme, glist_t history_list, int word_offset) const char *word_grapheme >+{ >+ int32 current_prob = -2147483647; Why not initialize to LONG_MIN >+ int32 ngram_order = ngram_model_get_size(model); >+ int32 *history = ckd_calloc((size_t)ngram_order+1, sizeof(int32)); space before and after + but it isn't clear to me why + 1 (and why we start assigning value in the for-loop below from index ngram_order, and not ngram_order - 1 which would be the normal last element of an array without this + 1 here.) Please explain why this way, or remove + 1 and start indexing from ngram_order - j - 1 (and hopefully ngram_order isn't anything too huge) >+ for (gn = history_list; gn; gn = gnode_next(gn)) { >+ history[ngram_order-j] = gnode_int32(gn); Please add a comment why we build history from last to first. (and if gn is null before history is full, history starts with some 0 entries. I assume ngram_ng_prob can deal with that.) >+ if (unigram.word) >+ ckd_free(unigram.word); >+ if (unigram.phone) >+ ckd_free(unigram.phone); I would have added a helper function to free unigram_t members and the use the method in dict_get_winner_wid and dict_g2p. That would simplify the code a bit. Something like void free_unigram_t(unigram_t *unigram) { ckd_free(unigram->word); ckd_free(unigram->phone); } (free is null safe) >+ if (dict_starts_with(unigram.word, sub)){ existing coding style seems to hint there should be space between ) and { >+ struct winner_t winner; >+ int32 i = 0, j = 0; >+ int nused; >+ int32 ngram_order = ngram_model_get_size(model); >+ int32 *history = ckd_calloc((size_t)ngram_order+1, sizeof(int32)); again some missing spaces before and after + >+// This functions manages the winner unigrams and build the history of winners to properly generate the final phoneme. s/build/builds/ >+ In the first part, >+// I gets the most likely unigrams which graphemes compose the word and build a history of wids that is used in this search. In second part, the we s/I/It/ >+// This functions just receives the dict lacking word from fsg_search, call the main function dict_g2p, and then add the word to the memory dict. s/functions/function/ s/add/adds/ >+// The second part of this function is the same as pocketsphinx.c: https://github.com/cmusphinx/pocketsphinx/blob/ba6bd21b3601339646d2db6d2297d02a8a6b7029/src/libpocketsphinx/pocketsphinx.c#L816 Could we not share the implementation then? But I guess this is fine too. > * are met: > * > * 1. Redistributions of source code must retain the above copyright >- * notice, this list of conditions and the following disclaimer. >+ * notice, this list of conditions and the following disclaimer. > * > * 2. Redistributions in binary form must reproduce the above copyright > * notice, this list of conditions and the following disclaimer in > * the documentation and/or other materials provided with the > * distribution. > * >- * This work was supported in part by funding from the Defense Advanced >- * Research Projects Agency and the National Science Foundation of the >+ * This work was supported in part by funding from the Defense Advanced >+ * Research Projects Agency and the National Science Foundation of the > * United States of America, and the CMU Sphinx Speech Consortium. > * >- * THIS SOFTWARE IS PROVIDED BY CARNEGIE MELLON UNIVERSITY ``AS IS'' AND >- * ANY EXPRESSED OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, >+ * THIS SOFTWARE IS PROVIDED BY CARNEGIE MELLON UNIVERSITY ``AS IS'' AND >+ * ANY EXPRESSED OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, > * THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR > * PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL CARNEGIE MELLON UNIVERSITY > * NOR ITS EMPLOYEES BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, >- * SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT >- * LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, >- * DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY >- * THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT >- * (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE >+ * SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT >+ * LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, >+ * DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY >+ * THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT >+ * (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE > * OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE. > * > * ==================================================================== >@@ -39,11 +39,12 @@ > #define _S3_DICT_H_ > > /** \file dict.h >- * \brief Operations on dictionary. >+ * \brief Operations on dictionary. No need for all these whitespace changes > >-/** >+/** > \struct dictword_t >- \brief a structure for one dictionary word. >+ \brief a structure for one dictionary word. > */ > typedef struct { > char *word; /**< Ascii word string */ >@@ -68,9 +69,9 @@ typedef struct { > s3wid_t basewid; /**< Base pronunciation id */ > } dictword_t; > >-/** >+/** > \struct dict_t >- \brief a structure for a dictionary. >+ \brief a structure for a dictionary. > */ > not all these whitespace changes r+ with all those fixed/explained
Attachment #8639704 - Flags: review?(bugs) → review+
(In reply to Olli Pettay [:smaug] from comment #40) > Comment on attachment 8639704 [details] [diff] [review] > This patch introduces new g2p algorithm into pocketsphinx to allow out of > dictionary words to be added to grammars > > Sorry about delay with this stuff. vacation and tons of other stuff to > review. > > >+++ b/media/pocketsphinx/src/dict.c > >@@ -8,27 +8,27 @@ > > * are met: > > * > > * 1. Redistributions of source code must retain the above copyright > >- * notice, this list of conditions and the following disclaimer. > >+ * notice, this list of conditions and the following disclaimer. > > * > > * 2. Redistributions in binary form must reproduce the above copyright > > * notice, this list of conditions and the following disclaimer in > > * the documentation and/or other materials provided with the > > * distribution. > > * > >- * This work was supported in part by funding from the Defense Advanced > >- * Research Projects Agency and the National Science Foundation of the > >+ * This work was supported in part by funding from the Defense Advanced > >+ * Research Projects Agency and the National Science Foundation of the > > * United States of America, and the CMU Sphinx Speech Consortium. > > * > >- * THIS SOFTWARE IS PROVIDED BY CARNEGIE MELLON UNIVERSITY ``AS IS'' AND > >- * ANY EXPRESSED OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, > >+ * THIS SOFTWARE IS PROVIDED BY CARNEGIE MELLON UNIVERSITY ``AS IS'' AND > >+ * ANY EXPRESSED OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, > > * THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR > > * PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL CARNEGIE MELLON UNIVERSITY > > * NOR ITS EMPLOYEES BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, > >- * SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT > >- * LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, > >- * DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY > >- * THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT > >- * (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE > >+ * SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT > >+ * LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, > >+ * DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY > >+ * THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT > >+ * (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE > > * OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE. > > * > Random whitespace change here. Please drop them. pocketsphinx is external > code, so > better to try to keep our modifications to it minimal. > > Ok. I changed my editor to remove every trailing space after the requests on this comments (https://bugzilla.mozilla.org/show_bug.cgi?id=1180113#c30) and did mess with everything. I wasn't expecting their code to have trailing spaces. > > > * ==================================================================== > >@@ -51,7 +51,7 @@ > > > > #if WIN32 > > #define snprintf sprintf_s > >-#endif > >+#endif > same also here. > Fixed. > >+// This function splits an unigram received (in format e|w}UW) and return a structure > >+// containing two fields: the grapheme (before }) in unigram.word and the phoneme (after }) unigram.phone > >+unigram_t > >+dict_split_unigram (const char * word) > nit, for consistency with the existing code > dict_split_unigram(const char *word) > fixed > >+ letter = ckd_calloc(1,total_graphemes+1); > based on the existing code, the coding style should be > letter = ckd_calloc(1, total_graphemes + 1); > fixed, thanks. > >+ letter[w-add] = word[w]; > and here > letter[w - add] = word[w]; > > consistency in coding style greatly helps with code readability. > > > >+// This function calculates the most likely unigram to appear in the current position at the word > >+// based on the three latest chosen/winners unigrams (history) and return a structure containing > >+// the word id (wid), and lengths of the phoneme and the word > >+struct winner_t > >+dict_get_winner_wid(ngram_model_t *model, const char * word_grapheme, glist_t history_list, int word_offset) > const char *word_grapheme > > >+{ > >+ int32 current_prob = -2147483647; > Why not initialize to LONG_MIN > Ok, but I needed to add #include <limits.h>. > >+ int32 ngram_order = ngram_model_get_size(model); > >+ int32 *history = ckd_calloc((size_t)ngram_order+1, sizeof(int32)); > space before and after + > but it isn't clear to me why + 1 (and why we start assigning value in the > for-loop below from index ngram_order, and not > ngram_order - 1 which would be the normal last element of an array without > this + 1 here.) > Please explain why this way, or remove + 1 and start indexing from > ngram_order - j - 1 > > (and hopefully ngram_order isn't anything too huge) > ok, I changed to this way. ngram_order in this case is 3, but in the future can go expand but not too much. > >+ for (gn = history_list; gn; gn = gnode_next(gn)) { > >+ history[ngram_order-j] = gnode_int32(gn); > Please add a comment why we build history from last to first. > (and if gn is null before history is full, history starts with some 0 > entries. I assume ngram_ng_prob can deal with that.) > > Yes, ngram_ng_prob can deal with that. I added the comment (we need to build history from last to first because glist returns itens from last to first) > >+ if (unigram.word) > >+ ckd_free(unigram.word); > >+ if (unigram.phone) > >+ ckd_free(unigram.phone); > I would have added a helper function to free unigram_t members and the use > the method in > dict_get_winner_wid and dict_g2p. That would simplify the code a bit. > > Something like > void > free_unigram_t(unigram_t *unigram) > { > ckd_free(unigram->word); > ckd_free(unigram->phone); > } > > (free is null safe) > Good idea, added it. Thanks > > >+ if (dict_starts_with(unigram.word, sub)){ > existing coding style seems to hint there should be space between ) and { > thanks, fixed. > >+ struct winner_t winner; > >+ int32 i = 0, j = 0; > >+ int nused; > >+ int32 ngram_order = ngram_model_get_size(model); > >+ int32 *history = ckd_calloc((size_t)ngram_order+1, sizeof(int32)); > again some missing spaces before and after + > Ok, but I removed the +1 following your suggestion above. > >+// This functions manages the winner unigrams and build the history of winners to properly generate the final phoneme. > s/build/builds/ > fixed > >+ In the first part, > >+// I gets the most likely unigrams which graphemes compose the word and build a history of wids that is used in this search. In second part, the we > s/I/It/ fixed > >+// This functions just receives the dict lacking word from fsg_search, call the main function dict_g2p, and then add the word to the memory dict. > s/functions/function/ > s/add/adds/ > fixed > >+// The second part of this function is the same as pocketsphinx.c: https://github.com/cmusphinx/pocketsphinx/blob/ba6bd21b3601339646d2db6d2297d02a8a6b7029/src/libpocketsphinx/pocketsphinx.c#L816 > Could we not share the implementation then? > But I guess this is fine too. We can't since only some part is shared. The main function have more code. > > * are met: > > * > > * 1. Redistributions of source code must retain the above copyright > >- * notice, this list of conditions and the following disclaimer. > >+ * notice, this list of conditions and the following disclaimer. > > * > > * 2. Redistributions in binary form must reproduce the above copyright > > * notice, this list of conditions and the following disclaimer in > > * the documentation and/or other materials provided with the > > * distribution. > > * > >- * This work was supported in part by funding from the Defense Advanced > >- * Research Projects Agency and the National Science Foundation of the > >+ * This work was supported in part by funding from the Defense Advanced > >+ * Research Projects Agency and the National Science Foundation of the > > * United States of America, and the CMU Sphinx Speech Consortium. > > * > >- * THIS SOFTWARE IS PROVIDED BY CARNEGIE MELLON UNIVERSITY ``AS IS'' AND > >- * ANY EXPRESSED OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, > >+ * THIS SOFTWARE IS PROVIDED BY CARNEGIE MELLON UNIVERSITY ``AS IS'' AND > >+ * ANY EXPRESSED OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, > > * THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR > > * PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL CARNEGIE MELLON UNIVERSITY > > * NOR ITS EMPLOYEES BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, > >- * SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT > >- * LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, > >- * DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY > >- * THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT > >- * (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE > >+ * SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT > >+ * LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, > >+ * DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY > >+ * THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT > >+ * (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE > > * OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE. > > * > > * ==================================================================== > >@@ -39,11 +39,12 @@ > > #define _S3_DICT_H_ > > > > /** \file dict.h > >- * \brief Operations on dictionary. > >+ * \brief Operations on dictionary. > No need for all these whitespace changes > Removed > > > >-/** > >+/** > > \struct dictword_t > >- \brief a structure for one dictionary word. > >+ \brief a structure for one dictionary word. > > */ > > typedef struct { > > char *word; /**< Ascii word string */ > >@@ -68,9 +69,9 @@ typedef struct { > > s3wid_t basewid; /**< Base pronunciation id */ > > } dictword_t; > > > >-/** > >+/** > > \struct dict_t > >- \brief a structure for a dictionary. > >+ \brief a structure for a dictionary. > > */ > > > not all these whitespace changes > > Removed > > r+ with all those fixed/explained
We did too many deep changes, so I created a new try before commit https://treeherder.mozilla.org/#/jobs?repo=try&revision=f548fe8fdb40
This is the arpa model generated by mitlm
Attachment #8635217 - Attachment is obsolete: true
Attachment #8643527 - Flags: review+
Attachment #8643528 - Flags: review+
Did you see the B2G Desktop OS X debug build failure?
(In reply to kdavis from comment #45) > Did you see the B2G Desktop OS X debug build failure? That was already fixed. See https://bugzilla.mozilla.org/show_bug.cgi?id=1180113#c46
Cool. I only saw the try from comment 42
Keywords: checkin-needed
feature-b2g: --- → 2.5+
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Creator:
Created:
Updated:
Size: