Closed Bug 921169 Opened 11 years ago Closed 11 years ago

Implement secure DOM localization

Categories

(L20n :: HTML Bindings, defect, P1)

defect

Tracking

(Not tracked)

RESOLVED FIXED

People

(Reporter: stas, Assigned: stas)

References

Details

Attachments

(2 files, 2 obsolete files)

We should make our DOM localization logic secure and sound for the 1.0 release. The initial discussion took place in the mailing list: https://groups.google.com/d/msg/mozilla.tools.l10n/9JvxgTUwIrk/70ja-BE55kQJ Goals ----- 1. Make it possible to localize content in HTML elements and attributes without forcing developers to split strings into pre- and post- parts (definitely a bad practice). For instance, it should be possible for an <a> element to be a part of the translation. 2. Make it possible for localizers to apply text-level semantics to the translations and make use of HTML entities. For instance, it should be possible for a localizer to use an <sup> element in "M<sup>me</me>" (an abbreviation of French "Madame"). Constraints ----------- 1. Make the whole system secure and don't trust translations by default allowing a safe set of attributes on HTML elements. For instance, the localizer should not be able to add an onclick handler or overwrite the target of href or src without the developer or the localization engineer knowingly allowing it. 2. Don't break the Web; in particular, until we get more feedback and gather more data, we should not break any two-way bindings the third-party libraries might have set up on existing DOM nodes.
Attached patch WIP 1 (obsolete) (deleted) — Splinter Review
This is an early work-in-progress implementation. It: - allows safe elements in translations, - keeps original elements instead of replacing them, - uses a whitelist for elements, - doesn't check attributes against whitelist yet, - doesn't copy/overlay attributes yet, - doesn't support ITS translateRules for extending the whitelists, - currently breaks language switching, - probably doesn't handle errors well yet. Gandalf, can you take a look and see if you like the general direction? Michał, would you mind applying this locally and verifying that it works well with Angular? It should keep the nodes for which Angular creates two-way bindings.
Attachment #810757 - Flags: feedback?(m.goleb+mozilla)
Attachment #810757 - Flags: feedback?(gandalf)
Attachment #810757 - Attachment is obsolete: true
Attachment #810757 - Flags: feedback?(m.goleb+mozilla)
Attachment #810757 - Flags: feedback?(gandalf)
I'm quite happy with what I ended up today: - allows safe elements in translations, - keeps original elements instead of replacing them, - uses a whitelist for elements, - uses a whitelist for attributes, - copies and overlays attributes, - works with lang switching and retranslation, - should handle corner cases well. - doesn't support ITS translateRules for extending the whitelists (I'd like to fix this in a follow-up bug to not increase the complexity of this patch). One important change is that I removed the whole l20nSourceNode approach, since it's not trivial to make it work with the paradigm of keeping original nodes in the DOM. However, I don't think this is bad. The patch doesn't support reordering anyways, so retranslation based on the actual node instead of L20nSourceNode can't break too much. The only case I can think of is when language 1 adds a title attribute, and language 2 doesn't. Then the title attr will stay on the node and will be present in the language 2 localization. I'd prefer to fix this in a follow-up if it turns out to be a problem. I do think it will be quite rare an edge-case.
Attachment #811184 - Flags: review?(gandalf)
Attachment #811184 - Flags: feedback?(m.goleb+mozilla)
Attachment #811184 - Attachment is obsolete: true
Attachment #811184 - Flags: review?(gandalf)
Attachment #811184 - Flags: feedback?(m.goleb+mozilla)
I realized one thing: creating a dettached div or body and assigning the unsanitized translation to innerHTML may trigger HTTP requests or even scripts to run. Sanitizing only after the HTML was parsed is too late. For instance, the translation might look like this: <img src="http://evil.com/track.png"> Scripts in <script> tags are not executed, but you can easily run arbitrary code otherwise. For instance: <img src="missing.png" onerror="alert('pwnd!');"> The overlayElement will correclty sanitize these translations and the <img> nor the 'onerror' callback won't be present in the final result, but they will actually be parsed and run the moment when the unsanitized translation is assigned to innerHTML. To combat this, we can 1) use DOMParser and create a new document our of every translation (overkill; also 10x slower) or 2) take advantage of the <template> element, which uses DocumentFragments internally. It parses the HTML but keeps it inert, so no requests are made and no scripts are run when it's created. Because it uses DocumentFragments, I can use XPath any more to match elements in translation to elements in the source HTML (see bug 402129). I tried using CSS selectors instead, but unfortunately I'm only interested in immediate children (deeper descendants are taken care of recursively), and CSS doesn't make it easy (you'll recall you can't just do querySelector('> span'); you need the LHS of the child operator). There is the Selector level 4 spec in the works with the :scope pseudo-class [1], but the support is scarce (Chrome + Firefox but only on the testign channels) and from my testing :scope also doesn't work well with DocumentFragments. We could go the Sizzle route, which is to assign a temporary id to the context node and use querySelector on it [2]. But again, DocumentFragments can't have attributes. So in the end I went for a simple manual approach where I calculate the index among siblings of the same type myself and then use it in reverse to match an element in the translation. This works well. I'd love to get some feedback on the whole approach. I still need to make necessary changes to the documentation. I wish we had a client-side test harness, but I don't want to block on this. [1] http://www.w3.org/TR/2012/ED-selectors-api2-20121115/ [2] https://github.com/jquery/sizzle/blob/c50c710b6411db2cc6d77d3b982ac6625605ea63/src/sizzle.js#L248-L268
Attachment #811518 - Flags: review?(gandalf)
Attachment #811518 - Flags: feedback?(m.goleb+mozilla)
(In reply to Staś Małolepszy :stas from comment #4) > Created attachment 811518 [details] [diff] [review] > Use template elements to force inertness of innerHTML Great, great work! I've just tested it (sorry for the delay, busy Friday at work) and it works with Angular perfectly! :)
Attachment #811518 - Flags: feedback?(m.goleb+mozilla) → feedback+
I'm not too optimistic about it being fixed, though, docco issues seems to linger without attention. We might have to fix it ourselves. (and docco is written in Literate CoffeScript, what a pain)
Sorry for the comment, wrong ticket. :)
This is the same patch as before, but applied to both bindings/l20n/html.js and bindings/l20n/gaia.js.
I ran b2gperf tests on Gandalf's l20n-master with and without the patch (settings app) without patch: median: 1213, mean: 1221, std: 35 with patch: median: 1220, mean: 1229, std: 29 The localization files don't have much HTML in them, so there isn't much difference. I added <em>Foo</em> to all 500 entities to see how the performance changes. without patch: no change, but broken; <em> is visible in textContent with patch: median: 1283, mean: 1286, std: 33
Comment on attachment 811518 [details] [diff] [review] Use template elements to force inertness of innerHTML Review of attachment 811518 [details] [diff] [review]: ----------------------------------------------------------------- It looks good. It's pretty complex, I'd love to make it very clear for the reader of html.js which part does he need to understand without dom overlays, but it works and performance is ok. median of 30 iterations on Settings app: without patch: 1068 with patch: 1076
Attachment #811518 - Flags: review?(gandalf) → review+
Blocks: 922573
Blocks: 922576
(In reply to Staś Małolepszy :stas from comment #4) > To combat this, we can 1) use DOMParser and create a new document our of > every > translation (overkill; also 10x slower) or 2) take advantage of the > <template> > element, which uses DocumentFragments internally. It parses the HTML but > keeps > it inert, so no requests are made and no scripts are run when it's created. The template element has poor browser support for now so I don't believe this is a viable solution. How about a sort-of polyfill for that, i.e. script element with custom type?
Blocks: 922577
Landed in https://github.com/l20n/l20n.js/commit/892b1507a446080e80efebd24534744174561671. I added docs to https://github.com/l20n/l20n.js/blob/892b1507a446080e80efebd24534744174561671/docs/html.md#making-html-elements-localizable I also filed three follow-ups: bug 922573, bug 922576 and bug 922577. (In reply to Michał Gołębiowski from comment #11) > The template element has poor browser support for now so I don't believe > this is a viable solution. How about a sort-of polyfill for that, i.e. > script element with custom type? Would you mind filing follow-up bug about this?
Status: ASSIGNED → RESOLVED
Closed: 11 years ago
Resolution: --- → FIXED
(In reply to Staś Małolepszy :stas from comment #12) > (In reply to Michał Gołębiowski from comment #11) > > The template element has poor browser support for now so I don't believe > > this is a viable solution. How about a sort-of polyfill for that, i.e. > > script element with custom type? > > Would you mind filing follow-up bug about this? Sorry, I totally forgot about that until I tested in IE recently... :/ I reported bug 942594 about that.
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Created:
Updated:
Size: