Closed
Bug 925579
Opened 11 years ago
Closed 11 years ago
SecReview: L20n 1.0
Categories
(mozilla.org :: Security Assurance: Review Request, task)
mozilla.org
Security Assurance: Review Request
Tracking
(Not tracked)
VERIFIED
FIXED
People
(Reporter: stas, Assigned: freddy)
References
Details
(Whiteboard: [completed secreview][Web])
Initial Questions:
Project/Feature Name: L20n 1.0
Tracking ID:
Description:
L20n is a new localization file format and framework developed by the l10n drivers team for the client-side. It allows localizers to put small bits of logic in the localization resources to codify the grammar of the language. This prevents the logic to be hard-coded into the main source code where it would unnecessarily affect all other locales. You can find some examples of this on my blog: http://www.informationisart.com/21/.
Localizers can add logic to the translations using simple data structures (individual translations can be either strings or dictionaries of values) and custom macros (localizers can write their own plural macro for instance, using a C-like expression syntax).
For more information on the L20n syntax, you can refer to http://l20n.org/learn/ and http://l20n.github.io/tinker/.
As we're nearing the 1.0 release (we're currently feature-frozen), I'd like to ask for some help evaluating the security of the current JavaScript implementation. The code can be found on GitHub: https://github.com/l20n/l20n.js. If you clone this repo, take a look at examples/multilingual-dev.html which should just work when opened in a browser. The translations for this example are defined in examples/locales/.
The core concept is that we cannot trust the localization resources. So we need to parse them in a way which doesn't allow malicious actions. To that end, we currently parse the localization resource into a JSON AST stored in memory which is then "compiled" into JavaScript functions and these functions disallow unknown actions. So if we find "1 + 2" in a macro, we'll tokenize it, parse and compile it into:
function addOperator(left, right) {
if ((typeof left !== 'number' || typeof right !== 'number') &&
(typeof left !== 'string' || typeof right !== 'string')) {
throw new RuntimeError('The + operator takes two numbers or two strings');
}
return left + right;
};
https://github.com/l20n/l20n.js/blob/6196438ab3fc59bcf8ac7676c31df9bd5667f8d2/lib/l20n/compiler.js#L753-L760
I see three major areas which could be abused for an attack by a malevolent localizer. I'll try to summarize my concerns below but there might be something I've missed. My goal is to give you a rough idea of what L20n does.
1. Localization logic
The localizers can define custom macros (see http://l20n.org/learn/macros/) which should not be able to run arbitrary code from inside of the localization resource, nor render the browser unresponsive.
We support string interpolation and it should be impossible to try to programmatically produce huge strings which would make the system go out of memory. See bug 803931 for some initial work done to prevent this and http://l20n.org/learn/string-values/ to see how to build interpolated strings.
You can interpolate other translation strings but also variables provided by developers to the localization context (e.g. number of unread email messages). See http://l20n.org/learn/context-data/. It should be impossible for localizers to tinker with these variables from inside of the localization resource.
Lastly, since all of this is implemented in JavaScript, it should be impossible to access built-in properties and methods of objects and their prototypes (length, watch etc). See bug 815962 and bug 883664.
2. Resource imports
We allow one localization resource to import another one with an import() statement. All resources are downloaded using XHR so they're limited to the current domain. We disallow recursive or too deeply nested imports. Are there any other concerns that we should be aware of? Both security and privacy related.
3. HTML localization
We allow some HTML markup in translations. See bug 921169 and bug 922576. There's a whitelist of text-level tags like <em> and <strong>, and a whitelist of attributes based on http://www.w3.org/html/wg/drafts/html/master/dom.html#attr-translate.
We use the <template> element to create an inert DOM out of the translation. We then sanitize it using the above whitelists and insert into the main DOM. This hopefully prevents things like:
<img src="missing.png" onerror="alert('pwnd!');">
Additional Information:
I will be in San Francisco the whole week of October 14th and I'll be happy to meet in person to provide more information. Gandalf is another person who can be reached and he's permanently based out of SF.
Key Initiative: Firefox OS
Release Date: 2013-10-31
Project Status: ready
Mozilla Data: No
Mozilla Related: Integration with Firefox OS is planned
Separate Party: No
Reporter | ||
Updated•11 years ago
|
Component: Project Review → Security Assurance: Review Request
Summary: L20n 1.0 → SecReview: L20n 1.0
Whiteboard: [pending secreview]
Reporter | ||
Comment 1•11 years ago
|
||
(In reply to Staś Małolepszy :stas from comment #0)
> 3. HTML localization
>
> We allow some HTML markup in translations. See bug 921169 and bug 922576.
> There's a whitelist of text-level tags like <em> and <strong>, and a
> whitelist of attributes based on
> http://www.w3.org/html/wg/drafts/html/master/dom.html#attr-translate.
See https://groups.google.com/forum/#!topic/mozilla.tools.l10n/9JvxgTUwIrk for the discussion on the whitelists.
Updated•11 years ago
|
Group: mozilla-corporation-confidential
Updated•11 years ago
|
Whiteboard: [pending secreview] → [triage needed]
Updated•11 years ago
|
Assignee: nobody → fbraun
Whiteboard: [triage needed]
Assignee | ||
Comment 2•11 years ago
|
||
The part that whitelists HTML tags and attributes looks good. I checked the is*Allowed() functions, their use and the whitelist object defined on top of the gaia.js file.
Assignee | ||
Comment 3•11 years ago
|
||
If I'm not mistaken, the HTML sanitization happens at the very end of the localization process, i.e. right before the content is inserted into the DOM. This is great as it ensures little to no cause for parsing mismatches (i.e. l20n thinks something is harmless because it lacks understanding of a certain feature).
I have tried things like this (first greater-than sign for indenting, not syntax):
> <foo "<img src='x'><pre">
> <bar ">check 123 </pre>">
> <kthxbye "Freddy prints a test message here: {{ foo }} {{ bar }}">
So it looks pretty good :)
Assignee | ||
Comment 4•11 years ago
|
||
Please mark as fixed/resolved once all blockers are fixed/resolved :)
Whiteboard: [completed secreview][Web]
Reporter | ||
Comment 5•11 years ago
|
||
Blockers fixed, thank you so much for the help! We should be able to get the 1.0 RC out this week which is great.
Status: NEW → RESOLVED
Closed: 11 years ago
Resolution: --- → FIXED
Assignee | ||
Updated•11 years ago
|
Status: RESOLVED → VERIFIED
You need to log in
before you can comment on or make changes to this bug.
Description
•