Closed Bug 287502 Opened 20 years ago Closed 18 years ago

Right-to-left text reordering directives confuses text selection

Categories

(Core :: DOM: Selection, defect)

x86
Linux
defect
Not set
trivial

RESOLVED DUPLICATE of bug 246482

People

(Reporter: david_costanzo, Unassigned)

Details

Attachments

(2 files)

User-Agent: Mozilla/5.0 (X11; U; Linux i686; en-US; rv:1.8b2) Gecko/20050323
Build Identifier: Mozilla/5.0 (X11; U; Linux i686; en-US; rv:1.8b2) Gecko/20050323

Normally, when you double-click on a word, the browser highlights and selects everything between the surrounding whitespace. Normally, when you select a word and right-click, the "Search Web For" option includes the highlighted text.

However, if you're looking at HTML that includes non-printing character entities, such as "&#8238;", this does not happen. Instead, double-clicking on a word may highlight some adjacent words (and may highlight only part of the desired word). Furthermore, right-clicking will show "Search Web For" followed by some text that may not even be highlighted.

Reproducible: Always

Steps to Reproduce:
1. Open "text-selection-bug.html"
2. Select (by double-clicking) the word "You" in "You can't select it"
3. Right-click

Actual Results:
When you double-click on "You", the text "text. You can" is highlighted. When you right-click, "Search Web For" shows "s t r a n g e". Pressing CTRL+C, selecting a text area, and pressing CTRL+V inserts the word "strange".

Expected Results:
When you double-click on "You", the text "You" should be highlighted. When you right-click, "Search Web For" should show the word "You".

I do not know the meaning of "&#8238;". I received it in a spam message sent to my Yahoo! account. I noticed how strangely the text selection was behaving and decided to report it. I have given this a "trivial" severity because it is a minor cosmetic problem that only affects the rendering of bizarre or malicious HTML.
Attached file text-selection-bug.html (deleted)
An HTML file that contains non-printing character entities (on my system, anyway). This is the HTML that is mentioned in the repro scenario. The entity "&#8238;" appears between each letter of the text "This is some strange text. You can't select it".
I can reproduce this behaviour with Mozilla 1.8b1. To the user it looks as if Mozilla selects only the visible text, but it also selects the invisible characters. E.g. you seem to select "This is" (= 7 real characters), but in fact Mozilla selects "T&#8238;h&#8238;i&#8238;s" (= 3 invisible + 4 visible characters). Replacing &#8238; with &#8237; or &#8236; gives the same effect.

FWIW, according to http://www.fileformat.info/info/unicode/char/202e/index.htm, "&#8238;" is the Unicode character 'Right-To-Left Override' (U+202E).
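The count in the comment above (3 invisible + 4 visible characters) can be checked with a short Python sketch. This is only an illustration: the function name and the choice to treat U+202A through U+202E as "invisible" are assumptions made here, not anything taken from Mozilla's selection code.

```python
import html

# Assumption for this sketch: the "invisible" characters are the Unicode
# bidi embedding/override controls U+202A..U+202E (the thread names
# U+202C, U+202D and U+202E).
BIDI_CONTROLS = {"\u202a", "\u202b", "\u202c", "\u202d", "\u202e"}


def count_visible_and_invisible(markup: str) -> tuple[int, int]:
    """Decode character entities, then count non-control and control characters."""
    text = html.unescape(markup)
    invisible = sum(1 for ch in text if ch in BIDI_CONTROLS)
    return len(text) - invisible, invisible


visible, invisible = count_visible_and_invisible("T&#8238;h&#8238;i&#8238;s")
print(visible, invisible)
```

A selection that looks like four letters on screen therefore spans seven characters in the underlying text, which is consistent with the odd word boundaries described in the report.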
Thanks for the URL. That's the exact resource I was looking for last night.

It's interesting that "&#8238;" is a right-to-left override. In fact, the root of this bug may be that Mozilla doesn't honor the right-to-left override, which would be more serious.

As I said, I received this repro scenario in a phishing scam e-mail. The e-mail had many apparent typos (transposed letters). I assumed it was either to bypass a spam filter or because the spammer didn't know English. But maybe he was using a trick that transposes the letters in the HTML (to fool the spam filter), but uses Unicode control characters to re-assemble the text correctly on the screen (to fool the human). Anyway, this trick worked on Yahoo!'s spam filter.

The original spam used a different non-printing escape sequence that I didn't include in my repro. It was probably a right-to-left override (&#8237;).

If anyone has access to Internet Explorer, could you view the HTML attachment and see if the second line reads from right-to-left?
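The trick speculated about above can be sketched in Python. Everything here is hypothetical: the sample sentence, the keyword, and the helper `naive_visual_order`, which only reverses each run between a right-to-left override (U+202E) and a pop directional formatting character (U+202C). It is a rough stand-in for what a renderer does, not the full Unicode Bidirectional Algorithm, and it does not handle nested overrides.

```python
RLO = "\u202e"  # RIGHT-TO-LEFT OVERRIDE
PDF = "\u202c"  # POP DIRECTIONAL FORMATTING


def naive_visual_order(stored: str) -> str:
    """Approximate on-screen order by reversing each RLO...PDF run.

    Simplification for illustration only; real rendering follows the
    Unicode Bidirectional Algorithm.
    """
    out = []
    i = 0
    while i < len(stored):
        if stored[i] == RLO:
            end = stored.find(PDF, i + 1)
            if end == -1:
                end = len(stored)  # unterminated override runs to the end
            out.append(stored[i + 1:end][::-1])
            i = end + 1  # skip past the PDF character
        else:
            out.append(stored[i])
            i += 1
    return "".join(out)


# Hypothetical message: the keyword is stored reversed inside an override run.
stored = "Please " + RLO + "yfirev" + PDF + " your account"

print("verify" in stored)          # a plain substring check on the stored text
print(naive_visual_order(stored))  # what a reader would roughly see
```

Under these assumptions, a filter that matches keywords against the stored character sequence misses the word, while the reader sees it spelled normally.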
(In reply to comment #3)
> If anyone has access to Internet Explorer, could you view HTML attachment and
> see if the second line reads from right-to-left?

Yes, with IE6 the second line does read from right-to-left (.ti tceles t'nac uoY .txet egnarts emos si sihT), but both Mozilla 1.8b1 and IE6 seem to display the examples from http://www.robinlionheart.com/stds/html4/dir correctly.
Reply to comment #4:
> both Mozilla 1.8b1 and IE6 seem to display the
> examples from http://www.robinlionheart.com/stds/html4/dir correctly.

Great page! The text selection bug exists even when Mozilla correctly swaps the reading order, so ignoring the override character entities is independent of this bug.

By the way, the page links to a W3C recommendation that asks browsers to *ignore* the directional override characters. But that doesn't make this bug irrelevant, because the bug still occurs when you override the direction with HTML tags (which is the preferred way of doing it).

I've renamed the summary of this bug to reflect that the selection problem is tied to the right-left direction of the text, not to non-printing HTML characters.

By the way, I noticed a few other problems that may be related. I'll file them as separate bugs if you don't think they're the same as this bug.

First, if you open attachment #178444 and then save it to disk, the file changes: the character entities are replaced with something else (probably the UTF-8 encoding of the characters).

Second, if you select the bad text in attachment #178444, right-click, and choose "View Selection Source", you don't see the character entities. This is inconsistent with what happens when you "View Page Source".
Summary: Non-printing HTML character entities confuses text selection → Right-to-left text reordering directives confuses text selection
This is an HTML fragment that uses the preferred method of reversing the text direction. It uses the "dir" attribute of the "p" element and "bdo" tags. The text selection still behaves strangely.
==> selection
Assignee: general → selection
Component: General → Selection
Product: Mozilla Application Suite → Core
QA Contact: general
Version: unspecified → Trunk
About the other problems:

- "View Selection Source" showing the data, not the entities: "View Selection Source" shows content interpreted from the DOM, not the originally downloaded content. It's normal and per the spec that you don't get the entities. BTW, while testing under 1.7.3, I was seeing some strange spurious characters within the selection; that doesn't happen with the trunk. It looks like an occurrence of the recently fixed security issue about leaking random data from the stack into the selection.
- Save to disk saving the data, not the entities: This only happens when you save as "web page, complete"; the behaviour will be as you expect with "html only". Related to bug 115328. OTOH, even if 115328 is solved, "save web page complete" works from the DOM representation of the page, not the original content, so it can be seen as normal behavior, and a request to keep the entities in the saved version would therefore be INVALID/WONTFIX.

So these are features, not bugs.

About the bug itself, the last attachment confirms it, but I wonder whether a more bidi-related component would be more appropriate. Is the selection owner willing to handle bidi troubles?
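The difference between the entity in the original source and the character held in the parsed document can be illustrated with a small Python sketch. It uses `html.unescape` as a stand-in for entity decoding during parsing; the function name `decode_entities` and the sample string are assumptions for illustration, and nothing here reflects how Mozilla's serializer is actually implemented.

```python
import html


def decode_entities(source: str) -> str:
    """Stand-in for what a parser does with numeric character references."""
    return html.unescape(source)


source = "T&#8238;h"               # what the original download contains
decoded = decode_entities(source)  # roughly what a DOM-based view holds

print("&#8238;" in decoded)        # False: the entity spelling is gone
print(decoded[1] == "\u202e")      # True: it is now the character U+202E
print(decoded.encode("utf-8"))     # b'T\xe2\x80\xaeh'
```

Once decoded, the text no longer records that the character was written as an entity, so anything serialized from the decoded form (such as a UTF-8 save) contains the character itself, which matches the earlier guess about the saved file.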
Status: UNCONFIRMED → NEW
Ever confirmed: true
The main issue described here is a duplicate of bug 246482 (fixed on trunk). Marking as a duplicate of that bug. Please report any remaining issues as separate bugs. *** This bug has been marked as a duplicate of 246482 ***
Status: NEW → RESOLVED
Closed: 18 years ago
Resolution: --- → DUPLICATE