Closed
Bug 1158976
Opened 10 years ago
Closed 9 years ago
Turn on compare-locales.js in Elmo
Categories
(Webtools Graveyard :: Elmo, defect)
Webtools Graveyard
Elmo
Tracking
(firefox40 affected)
RESOLVED
WONTFIX
Tracking | Status | |
---|---|---|
firefox40 | --- | affected |
People
(Reporter: zbraniecki, Unassigned)
Details
Attachments
(2 files)
In order to support l20n file format, we want to start using new compare-locales.js (bug 1037052) in Elmo.
First step is to enable it in slave-ball.
Reporter | ||
Comment 1•10 years ago
|
||
The instructions to enable c-l.js in Elmo:
1) clone https://github.com/l20n/compare-locales.js to slave-ball/vendor-locale/compare-locales.js
2) switch slave-ball/vendor-local/locale-inspector to my fork, branch l20n: https://github.com/zbraniecki/locale-inspector/tree/l20n
restart slave-ball
Pike, what's the next step? Can you try it out and let me know if it works?
Flags: needinfo?(l10n)
Comment 2•10 years ago
|
||
There's no such thing as "trying", as we don't have a staging environment. This is deployment, and rolling back on bustage if it doesn't work.
I've got nodejs on the VM, IIRC, a 0.12 one, too.
Next steps are reviews for the patches on the individual repos.
Flags: needinfo?(l10n)
Reporter | ||
Comment 3•10 years ago
|
||
Attachment #8599633 -
Flags: review?(l10n)
Reporter | ||
Comment 4•10 years ago
|
||
Attachment #8599634 -
Flags: review?(l10n)
Reporter | ||
Comment 5•10 years ago
|
||
I like this agile programming style. Those two pull requests should make it work(tm). :)
Comment 6•10 years ago
|
||
Comment on attachment 8599634 [details]
locale-inspector pull request
The locale inspector patch has a bunch of hard-coded paths and executable names in it that won't work in prod, so I need to do an r- on that one. Details in the PR.
Attachment #8599634 -
Flags: review?(l10n) → review-
Reporter | ||
Updated•10 years ago
|
Attachment #8599634 -
Flags: review- → review?(l10n)
Comment 7•9 years ago
|
||
Comment on attachment 8599633 [details]
slave-ball pull request
Tried this, and failed, had to roll this back.
This fails because there's nodejs dependencies, which we're obviously not picking up by just including a single submodule.
We need a new approach to deploy this.
Attachment #8599633 -
Flags: review?(l10n) → review-
Reporter | ||
Comment 8•9 years ago
|
||
So, there are two dependencies at the moment:
- commander
- deep-equal
None of them is absolutely necessary but they are convenient for what they do.
I see three possible approaches:
1) We could bundle those two modules with compare-locales.js
2) We could replace them with our own code pieces
3) We could figure out how to install npm modules for compare-locales.
Pike, Stas, what are your preferences?
Flags: needinfo?(stas)
Flags: needinfo?(l10n)
Comment 9•9 years ago
|
||
My preference would be 2), if we can easily replace the code, that'd be best.
Then it'd be 1), and 3) last.
Flags: needinfo?(l10n)
Reporter | ||
Comment 10•9 years ago
|
||
ok. I removed deep-equal dependency - https://github.com/l20n/compare-locales.js/commit/3a792b36db334e3e7435526524fbd0369667c21d
Removing commander may be a bit more tricky if we want to have a clean program code. Stas, any recommendations?
Comment 11•9 years ago
|
||
I think we should roll our own (extremely simple) solution for arg parsing, but only for the purposes of locale-inspector.
Looking at the code, we're always calling the binary with --data=json and two args:
process = subprocess.Popen(
["node",
cljsPath + "/bin/compare-dirs.js",
"--data=json",
path1,
path2], stdout=subprocess.PIPE)
I suggest we create a new entry point in bin/elmo/compare-dirs.js which doesn't require commander and uses its own 'program' or 'config' object with the set of options expected by locale-inspector (type, data and tests args)
As far as positional args go, instead of:
var dirs = program.args;
we can simply do:
var dirs = process.argv.slice(2);
See https://nodejs.org/docs/latest/api/process.html#process_process_argv
Flags: needinfo?(stas)
Comment 12•9 years ago
|
||
We should keep commandline parity between the python and the js version, so that it's useful in environments where we don't want to bother with the python version, like aisle.
Comment 13•9 years ago
|
||
We can still keep bin/compare-dirs.js around for this purpose.
Reporter | ||
Comment 14•9 years ago
|
||
Comment on attachment 8599633 [details]
slave-ball pull request
I updated the compare-locales.js to not require any dependencies.
I also added tests to test compatibility mode and separated out the compat mode script to make sure that we keep it dependency-free.
Attachment #8599633 -
Flags: review- → review?(l10n)
Comment 15•9 years ago
|
||
I've tested this locally, and there are a few things that don't work like I expected.
For one, this compat thing is confusing. I've literally spent an hour to find out what's going on, and the only way that was enlightening was to actually run the comparisons. Doesn't help that --help just dies with an js error message.
Also, I ran all three versions of compare-dirs for http://hg.mozilla.org/gaia-l10n/en-US/file/40354033abe1 against http://hg.mozilla.org/gaia-l10n/de/file/f3462b78990b, and got different data back.
For one, the tree structure is different, common base dirs don't get folded into a tree.
Also, the ordering of the output is different, the python version sorts by ID, didn't check what the js version does.
I've also got a bad warning in the js warning for
setup-manual-activesync-username.placeholder=Domäne\\Benutzername
Well, \\ is a totally fine known escape :-)
The data piece here is not production ready, sorry.
Updated•9 years ago
|
Attachment #8599633 -
Flags: review?(l10n) → review-
Updated•9 years ago
|
Attachment #8599634 -
Flags: review?(l10n) → review-
Reporter | ||
Comment 16•9 years ago
|
||
(In reply to Axel Hecht [:Pike] from comment #15)
> For one, this compat thing is confusing. I've literally spent an hour to
> find out what's going on, and the only way that was enlightening was to
> actually run the comparisons. Doesn't help that --help just dies with an js
> error message.
Yeah, I didn't want to put a lot of work into the mockup of the commander. But seeing that it's just one file I decided to just copy it into c-l.js tree. So now --help should work etc.
> Also, I ran all three versions of compare-dirs for
> http://hg.mozilla.org/gaia-l10n/en-US/file/40354033abe1 against
> http://hg.mozilla.org/gaia-l10n/de/file/f3462b78990b, and got different data
> back.
./bin/compare-dirs.js will return different data and different data format because it uses different data model. That's why we need the compat version that returns stuff that locale-insp expects.
./bin/compare-dirs-compat.js returns the same data as compare-dirs, but with minor differences.
> For one, the tree structure is different, common base dirs don't get folded
> into a tree.
That's true. I don't really know how to write good code that will fold the structure. That also doesn't matter for JSON, which is what elmo uses.
> Also, the ordering of the output is different, the python version sorts by
> ID, didn't check what the js version does.
fixed
> I've also got a bad warning in the js warning for
>
> setup-manual-activesync-username.placeholder=Domäne\\Benutzername
>
> Well, \\ is a totally fine known escape :-)
fixed
> The data piece here is not production ready, sorry.
So, the only remaining bit of difference is the folding in text version. Is that a blocker?
Flags: needinfo?(l10n)
Comment 17•9 years ago
|
||
compat and non-compat return wildly different data for the revisions I tried. Sure, it's semantically only .ariaLabel and [plural], but the differences are 50% of the data.
The compat and refs pieces I ran earlier today are pasted below, and they're different. The change in structure doesn't only affect the text output, but also the json.
ref:
{
"details": {
"children": [
[
"apps",
{
"children": [
[
"calendar/calendar.properties",
{
"value": {
"missingEntity": [
"advanced-settings-short-icon-button.ariaLabel",
"drawer-sync-icon-button.ariaLabel",
"future-other-month-description.ariaLabel",
"past-other-month-description.ariaLabel"
],
"obsoleteEntity": [
"advanced-settings-short",
"drawer-sync-button",
"future-other-month",
"past-other-month"
]
}
}
compat:
{
"details": {
"children": [
[
"apps/calendar/calendar.properties",
{
"value": {
"missingEntity": [
"advanced-settings-short-icon-button.ariaLabel",
"drawer-sync-icon-button.ariaLabel",
"past-other-month-description.ariaLabel",
"future-other-month-description.ariaLabel"
],
"obsoleteEntity": [
"advanced-settings-short",
"drawer-sync-button",
"past-other-month",
"future-other-month"
]
}
}
],
[
"apps/callscreen/manifest.properties",
{
"value": {
"missingEntity": [
"name",
"description"
]
}
}
],
Yes, I think that's a blocker.
Glancing at https://github.com/l20n/compare-locales.js/commit/bf59422d722417587619a08e4b4c3ee3f288b9bb, something like "foo \\ bar \\ toast" is gonna break still.
Flags: needinfo?(l10n)
Reporter | ||
Comment 18•9 years ago
|
||
Comment on attachment 8599633 [details]
slave-ball pull request
I updated text and json to collapse structures and updated \\ skipping to be global.
I tested gaia-l10n en-US vs. de with text and json and both outputs are identical (except of keys)
Attachment #8599633 -
Flags: review- → review?(l10n)
Reporter | ||
Updated•9 years ago
|
Attachment #8599634 -
Flags: review- → review?(l10n)
Reporter | ||
Comment 19•9 years ago
|
||
(In reply to Axel Hecht [:Pike] from comment #17)
> compat and non-compat return wildly different data for the revisions I
> tried. Sure, it's semantically only .ariaLabel and [plural], but the
> differences are 50% of the data.
I ran compat and non-compat against en-US vs. de gaia:
compat:
changed: 3996
missing: 485
missingInFiles: 279
obsolete: 138
unchanged: 541
75% of entries changed
default:
changed: 3538
missing: 417
missingInFiles: 272
obsolete: 103
unchanged: 412
warnings: 13
errors: 21
81% of entries changed
which is correct difference for this scenario. The warnings and errors are usually about entity with obsolete value (warning) and missing ariaLabel (error).
I'll work on updating default text serialization to collapse view to match what I did for compat, but that doesn't impact elmo landing.
Comment 20•9 years ago
|
||
I've started to write a fuzzer for this, wip is at https://gist.github.com/Pike/00f82db491abf009dfc9
Sadly, this fails way to early way too hard for me to actually start fuzzing.
Bugs so far in the json output:
The empty details json looks different, it comes back as
{"details":{"children":[]}
instead of
{"details":{}
And for a comparison that only has details in 'apps', but not in 'shared', the 'apps' segment goes missing, in both text and json output.
The intent for the fuzzer at some point is to run against 'tip', and then randomly check out revisions in en-US and all locales, and just go on and on. Fuzzing, basically.
Updated•9 years ago
|
Attachment #8599633 -
Flags: review?(l10n) → review-
Updated•9 years ago
|
Attachment #8599634 -
Flags: review?(l10n) → review-
Reporter | ||
Comment 21•9 years ago
|
||
I updated c-l.js repo removing compat mode and updating code based on fuzzer.
This is an updated fuzzer: https://gist.github.com/zbraniecki/605042399d296ee537b0/eaca342b06a303f42485892bb990a3b8c15227c9
This is a list of locales I tested against it: https://pastebin.mozilla.org/8837275 getting no regressions
Pike, do you want to fuzz more or should I resubmit PRs?
Flags: needinfo?(l10n)
Comment 22•9 years ago
|
||
We should fuzz a lot more, looking at https://l10n.mozilla.org/shipping/dashboard?tree=gaia, we currently don't have errors or warnings. Here are a few known examples with errors:
1 0
Danish (da)
44cb47fb99a0 en-US
247b6580f05b da
2 0
Czech (cs)
2248a695be36 en-US
5abdcb97c52f cs
1 0
Japanese (ja)
70d286715c7d en-US
a42323b6f763 ja
1 0
Frisian (fy-NL)
70d286715c7d en-US
7e2f324301a2 fy-NL
1 0
Albanian (sq)
70d286715c7d en-US
8cc5787945a9 sq
and warnings
0 121
Breton (br)
5b3534c03709 en-US
aee1dbcdea4f br
0 1
Galician (gl)
5d618d0937ea gl
77ec1443ef6b en-US
0 1
Galician (gl)
5d618d0937ea gl
95e2495798a0 en-US
(all on the gaia tree)
Flags: needinfo?(l10n)
Reporter | ||
Comment 23•9 years ago
|
||
Cool, thanks!
I updated gist: https://gist.github.com/zbraniecki/605042399d296ee537b0
In the update I extended the way we're testing details because a) it made it easier to find problems b) I had to find a way to accept some minor differences. At the moment there are thee types:
1) Python errors when utf8 string is malformed informing which byte is malformed and if it is a start or continuation byte.
In node there is no warning. I manually[0] test if utf8 is not broken and report the broken byte, but I don't full Unicode table to determine if it's a start or continuation. Example:
c-l: 'utf8' codec can't decode byte 0xe6 in position 126: invalid start byte
c-l.js: 'utf8' codec can't decode byte 0xe6 in position 126: invalid byte
2) Compare-locales reports unknown escape sequences at the top of the file report with column and line. Compare-locales.js reports it as a warning on the entity with position in the string. Example:
c-l: unknown escape sequence, \2 at line 10, column 37 for thread-header-text[few]
c-l.js: thread-header-text[few]: unknown escape sequence, \2 at pos 9
3) Compare-locales returns missingFile with strings key for files that it can parse, and without it for files that it cannot. Compare-locales.js returns the strings key for all files including files it doesn't parse (in that case the value is 0). That affects json only. Example:
c-l: {"missingFile": "error"}
c-l.js: {"missingFile": "error", "strings": 0}
I tested all the error and warning revisions and the fuzzer is happy
Does it sound good for you Axel? Should I resubmit PRs?
Flags: needinfo?(l10n)
Reporter | ||
Comment 24•9 years ago
|
||
The next stage of c-l.js testing with the fuzzer is complete.
Updated fuzzer: https://gist.github.com/zbraniecki/605042399d296ee537b0
Stas modified the fuzzer to randomly pick two revisions and compare them against each other.
I run it like this: "python ./fuzz.py ./en-US ./de 10" to get 10 comparisons of random revisions of those two locales.
At this point I was able to run 100 runs for en-US against ar, cs, da, de, fr, fy-NL, ja, pl, sq, zh-CN, zh-TW with no errors.
In order to get there I had to add a few minor tweaks to c-l.js for compatibility reasons and a few exceptions to fuzzer.
All the changes are covering error scenarios - c-l.js responds to errors slightly differently than c-l. Most of the time I'd argue that c-l.js does a better job and modifying c-l.js to act like c-l would be a regression.
Below is the list of changes and reasons:
==== c-l.js ====
- I modified the properties parser to support "key:value" entities because of errors in en-US where the line "deviceType:music=Music" was parsed by c-l as key: "deviceType", value: "music=Music" and by c-l.js as key: "deviceType:music" and value "Music".
I'd argue that we should revert this change after we're done with testing because the original c-l.js behavior is matching the author's intention and the way we parse properties in Gaia.
- we were using ignore-case in unescaping checks so "\N" was not reported. fixed.
- c-l is adding junk entries to strings key in missingFile scenario. We were reporting them as errors.
I'd argue that we should revert this change after we're done with testing because marking junk entries as strings in "missingInFiles" is wrong.
- we weren't allowing for spaces in keys in properties parser, c-l is. In result this entry in .propreties file:
"
entity1 = value
>>>>>>> [Bug 923420] Add 'More Apps' to l10n [r=crdlc]
entity2 = value
"
was reported by c-l as three entities, while c-l.js reported two entities and an error.
Once again, I believe that we should revert this change once we're done with testing.
=== fuzzer ===
- I made fuzzer skip comparisons when c-l result for en-US contains entries with key "_junk_*" which happens when en-US files contain parser errors. I don't think it makes sense to try to replicate this behavior.
- en-US contains a revision like that: http://hg.mozilla.org/gaia-l10n/en-US/file/5547baf1df44/shared/download/download.properties#l23
and c-l is adding the junk entry to strings. c-l.js is reporting an error
- there's a revision in cs/ which contains a bunch of keys without values:
http://hg.mozilla.org/gaia-l10n/cs/file/02341dd96958/apps/email/email.properties#l196
c-l.js is reporting that as an error and it includes a preceding multiline comment in the error chunk. c-l.js it reporting the same error but without the preceding comment. If I'm correct, comments in .properties are not semantically bound to their following entities so c-l.js behavior is correct. I added an exception to the fuzzer
- sq contains a revision like that: http://hg.mozilla.org/gaia-l10n/sq/file/aba033cf6ea5/apps/email/email.properties#l315
c-l is reporting this as a proper entity with key '='. c-l.js is reporting an error. I did not add an exception to the fuzzer because it would be hard to catch it, so when you're fuzzing you may encounter this when comparing en-US to sq.
====================================================================
As you can see, all differences between c-l and c-l.js discovered using the fuzzer are related to error handling. Out of all those scenarios there is one (the case-insensitive unescaping check) which was a proper bug in c-l.js. Everything else are false-positives or implementation details related to error displaying.
What's the next step Axel?
Comment 25•9 years ago
|
||
I'd revise the exception in the fuzzer for parsing unknown content to something like:
If key == 'error': fork to compare error list.
In errorItemCheck, if string starts with "Unparsed Content" in both, check that they end in the same number.
The discard of escapes seems to be a rather fundamental one, in that cl.js reports file offsets, and cl.py reports line/column, for in-entity checker findings at least.
I find line-number/column pairs to be more useful to humans, and depending on the tool, it might be better for tools, too.
I think we're closing in on the aspects of .properties reports, next up will be to close in on error reports for .l20n, see my email response to that.
Flags: needinfo?(l10n)
Reporter | ||
Comment 26•9 years ago
|
||
(In reply to Axel Hecht [:Pike] from comment #25)
> I'd revise the exception in the fuzzer for parsing unknown content to
> something like:
>
> If key == 'error': fork to compare error list.
> In errorItemCheck, if string starts with "Unparsed Content" in both, check
> that they end in the same number.
I don't know if I understand this proposal.
I'm currently checking if the result string contains "_junk_" and I just skip the comparison if it does.
This is the example JSON string for cl.py:
{"details": {"children": [["manifest.properties", {"value": {"missingFile": "error", "strings": 2}}], ["sms.properties", {"value": {"missingEntity": ["_junk_1_0-14", "_junk_2_51-68", "_junk_3_697-716", "_junk_4_773-801", "_junk_5_839-860", "_junk_6_1446-1469", "_junk_7_1732-1765", "conversationsDeleted", "conversationsDeleted[few]", "conversationsDeleted[many]", "conversationsDeleted[one]", "conversationsDeleted[other]", "conversationsDeleted[two]", "dayOfWeek-0", "dayOfWeek-1", "dayOfWeek-2", "dayOfWeek-3", "dayOfWeek-4", "dayOfWeek-5", "dayOfWeek-6", "delete-all", "delete-selected", "deleteAll-confirmation", "hoursAgo", "hoursAgo[few]", "hoursAgo[many]", "hoursAgo[one]", "hoursAgo[other]", "hoursAgo[two]", "justNow", "minutesAgo", "minutesAgo[few]", "minutesAgo[many]", "minutesAgo[one]", "minutesAgo[other]", "minutesAgo[two]"], "obsoleteEntity": ["carrier-unknown", "composeMessage.placeholder", "delete", "deleteMessages-confirmation", "deleteThreads-confirmation", "deselect-all", "incorrectDate", "select-all", "sendFlightModeBody", "sendFlightModeBtnOk", "sendFlightModeTitle", "sendGeneralErrorBody", "sendGeneralErrorBtnOk", "sendGeneralErrorTitle"]}}]]}, "summary": {"null": {"unchanged": 1, "changed": 32, "missingInFiles": 2, "obsolete": 14, "missing": 36}}}
And this is for cl.js:
{"details":{"children":[["manifest.properties",{"value":{"missingFile":"error","strings":2}}],["sms.properties",{"value":{"missingEntity":["conversationsDeleted","conversationsDeleted[few]","conversationsDeleted[many]","conversationsDeleted[one]","conversationsDeleted[other]","conversationsDeleted[two]","dayOfWeek-0","dayOfWeek-1","dayOfWeek-2","dayOfWeek-3","dayOfWeek-4","dayOfWeek-5","dayOfWeek-6","delete-all","delete-selected","deleteAll-confirmation","hoursAgo","hoursAgo[few]","hoursAgo[many]","hoursAgo[one]","hoursAgo[other]","hoursAgo[two]","justNow","minutesAgo","minutesAgo[few]","minutesAgo[many]","minutesAgo[one]","minutesAgo[other]","minutesAgo[two]"],"obsoleteEntity":["carrier-unknown","composeMessage.placeholder","delete","deleteMessages-confirmation","deleteThreads-confirmation","deselect-all","incorrectDate","select-all","sendFlightModeBody","sendFlightModeBtnOk","sendFlightModeTitle","sendGeneralErrorBody","sendGeneralErrorBtnOk","sendGeneralErrorTitle"],"error":["Unparsed content \"/* Headers */\n\" at 0-14","Unparsed content \"/* Edit Mode */\n\" at 52-68","Unparsed content \"/* New Message */\n\" at 698-716","Unparsed content \"/* Search functionality */\n\" at 774-801","Unparsed content \"/* Date and time */\n\" at 840-860","Unparsed content \"/* Modal Dialogues */\n\" at 1447-1469","Unparsed content \"/* Labels for contact fields */\n\" at 1733-1765"]}}]]},"summary":{"null":{"changed":32,"missing":29,"missingInFiles":2,"obsolete":14,"unchanged":1,"errors":7}}}
cl.py does not report any errors. At most, I can compare if the number of _junk_ entries matched the number of summary.errors but my guess is that it will be a very risky operation and also costly for the fuzzer complexity because I'd need to guess where did errors go since cl.py shows errors differently depending on if source file exists, target file exists etc.
> The discard of escapes seems to be a rather fundamental one, in that cl.js
> reports file offsets, and cl.py reports line/column, for in-entity checker
> findings at least.
>
> I find line-number/column pairs to be more useful to humans, and depending
> on the tool, it might be better for tools, too.
I agree. But it's a fairly complex challenge for us because L20n parser does not "think" in lines. Lines may be part of string value, expression, hash value, index, comment etc. Currently we just consume them.
If we'd want l20n parser to keep track of lines, that would increase the complexity of the parser quite significantly unless we retroactively just calculate the number of "\n" in the source slice between 0 and the position of the error.
Do you have a better idea?
Reporter | ||
Comment 27•9 years ago
|
||
(In reply to Zibi Braniecki [:gandalf][:zibi] from comment #26)
> I agree. But it's a fairly complex challenge for us because L20n parser does
> not "think" in lines. Lines may be part of string value, expression, hash
> value, index, comment etc. Currently we just consume them.
> If we'd want l20n parser to keep track of lines, that would increase the
> complexity of the parser quite significantly unless we retroactively just
> calculate the number of "\n" in the source slice between 0 and the position
> of the error.
>
> Do you have a better idea?
Just to follow up, I have a patch that turns it on for .properties so that we can remove the exception from the fuzzer because the error message looks identical.
But I'm not sure if we need it for L20n. Wouldn't entity ID and position in source be enough?
Comment 28•9 years ago
|
||
Not really, in particular for multiline errors.
Also, this isn't remotely as complicated as you think it is.
Just parse the string, and if errors show up, map indexes to line numbers by searching for newlines in the original strings. That doesn't affect the parsing phase at all.
Regarding the preceding comment on error entities, I'd love to get a recap that gets cl.js vs cl.py right. I'm somewhat thinking that there's at least one mishap in each comment.
Reporter | ||
Comment 29•9 years ago
|
||
(In reply to Axel Hecht [:Pike] from comment #28)
> Not really, in particular for multiline errors.
Ok. Sounds good.
> Also, this isn't remotely as complicated as you think it is.
>
> Just parse the string, and if errors show up, map indexes to line numbers by
> searching for newlines in the original strings. That doesn't affect the
> parsing phase at all.
Yeah, that's what my backup plan was in case you'll insist.
I'll get the patch to work for l20n and land it.
> Regarding the preceding comment on error entities, I'd love to get a recap
> that gets cl.js vs cl.py right. I'm somewhat thinking that there's at least
> one mishap in each comment.
So, from what I understand:
- cl.py attaches a comment to the following entity
- cl.js places the comment in the AST as a separate entry
The result is that when there's an error in the entity, cl.py reports the error together with the preceding comment. Since semantically comments are not bound to entities in .properties (who am I kidding, there's no spec for this abomination) I'd prefer not modify cl.js to follow cl.py.
Does it make sense?
Comment 30•9 years ago
|
||
I think we should make the fuzzer exception more aligned to the error case of a parsing error with leading comment. I don't think we need to change cl.js for it.
The point that the fuzzer ignores errors in a particular entity isn't the right answer, so let's figure out how to actually diagnose the error at hand, nail it down so that it'll not slip a bunch of other errors, and ignore that known context.
Reporter | ||
Comment 31•9 years ago
|
||
(In reply to Axel Hecht [:Pike] from comment #30)
> I think we should make the fuzzer exception more aligned to the error case
> of a parsing error with leading comment. I don't think we need to change
> cl.js for it.
I can remove the comment from the error message and compare? Would that work for you?
Comment 32•9 years ago
|
||
Yeah, that was my intent, really.
Removing the comment translated to me to:
- start position is wrong
- text ends in the same string
- end position is equal.
Reporter | ||
Comment 33•9 years ago
|
||
Sweet! Thanks. I'll work on that.
Reporter | ||
Comment 34•9 years ago
|
||
I'm almost done with the update to L20n support in cl.js to report line/column in escapes.
Right now, I'm doing this in the code specific to cl.js, but I'm considering moving it to parser.
One question I have is -
In cl.py and cl.js for properties, we report illegal escapes as warnings, because parser doesn't care.
But in L20n AST Parser, we catch illegal escapes in the parser itself. And parser errors are... errors.
https://github.com/l20n/l20n.js/blob/c6db4dcea760a6186d9d1f878385daf6f7fb7e03/src/lib/format/l20n/ast/parser.js#L195
The result for compare-locales is that for .properties we can say "Hey, warning, you do have entityX in both en-US and de, but in de the entity has wrong escape.".
In L20n on the other hand, we'll be like "Oooh, there's a parser error in your source. We'll capture that as JunkEntry and report parser error to you. Also, it seems that you don't have entityX in de".
For this particular scenario, I'm torn. I like that in .properties we do know that we have entityX even if it contains an error, but for .l20n I do like that we parse strings and catch escaping errors as parser errors.
What do you think?
Flags: needinfo?(l10n)
Reporter | ||
Comment 35•9 years ago
|
||
also, the issue with "there's an error with your entityX *and* entityX is missing" in report is how both cl.js and cl.py report parsing errors in .properties, so it's consistent. It's just that in .properties illegal escapes are not parsing errors.
Reporter | ||
Comment 36•9 years ago
|
||
(In reply to Axel Hecht [:Pike] from comment #32)
> Yeah, that was my intent, really.
>
> Removing the comment translated to me to:
> - start position is wrong
> - text ends in the same string
> - end position is equal.
Updated the fuzzer: https://gist.github.com/zbraniecki/605042399d296ee537b0
Now I'm checking if the content of the error after removing comments is the same, if the start position plus the length of the removed comment is the same and if the end position is the same.
I think that the escaping Error vs. Warning and pos vs. line/col is the last item for .properties.
Comment 37•9 years ago
|
||
At a fundamental level, the escape question boils down to a family of questions around:
If we detect an error, should we use heuristics that recover the localized entity, and if so, which?
Algorithms could be
* if I just drop these characters, I get something
* if I just add this character, I get something
The \ is a good first example, where just dropping one character gives something localized.
Which leads back to the original definition of error vs warning in compare-locales. Error is something I can't recover from, warning is something that I don't try to recover from, and it may or may not bust the runtime.
In l20n, we can create something in the middle, and I think we should.
Flags: needinfo?(l10n)
Reporter | ||
Comment 38•9 years ago
|
||
Decision has to be made
Flags: needinfo?(stas)
Flags: needinfo?(l10n)
Flags: needinfo?(gandalf)
Comment 39•9 years ago
|
||
How about we treat this as a process over time? We can start with a stricter implementation which results in translations containing wrong escapes being broken; the fallback language will be used. Over time, we can improve the implementation and make it more lax: those broken translations will become warnings, but will work. We'll need to take care to not lose any other entities in the process due to how JunkEntities are glued.
Flags: needinfo?(stas)
Comment 40•9 years ago
|
||
Cleaning this up. We went with adding l20n support to compare-locales.py.
Status: NEW → RESOLVED
Closed: 9 years ago
Flags: needinfo?(l10n)
Flags: needinfo?(gandalf)
Resolution: --- → WONTFIX
Updated•4 years ago
|
Product: Webtools → Webtools Graveyard
You need to log in
before you can comment on or make changes to this bug.
Description
•