Closed Bug 466599 Opened 16 years ago Closed 16 years ago

Pasting non-ascii characters from browser into a .rtf Textedit document fails.

Categories

(Core :: Widget: Cocoa, defect, P1)

All
macOS
defect

Tracking

()

RESOLVED FIXED
mozilla1.9.1b3

People

(Reporter: phiw2, Assigned: jfkthame)

References

Details

(Keywords: fixed1.9.1, regression)

Attachments

(2 files, 7 obsolete files)

Attached file test case (deleted) —
1. Open test case containing a sentence in French and a sentence in Japanese 2. select some text; copy 3. paste into a Textedit.app .rft document 4. weep AR: The non-ascii characters are pasted as garbage. ER: reproduce the text as is. This works fine when pasting into a plain text document. Strangely, pasting into Mail.app, a message in Rich Text Format works fine. Pasting also fails into Unicode checker.app. Works: 11/06 /f3993df71c29 Fails 11/07 /18fec592b35b http://hg.mozilla.org/mozilla-central/pushloghtml?fromchange=f3993df71c29&tochange=18fec592b35b --> 738aabad5f4b Atul Varma — Bug 428096. Make clipboard and drag-and-drop import and export HTML on Mac. r=josh,sr=roc
Flags: wanted1.9.1?
Flags: blocking1.9.1?
Summary: Pasting non-ascii characters into a .rtf Textedit document fails. → Pasting non-ascii characters from browser into a .rtf Textedit document fails.
Hardware: PC → All
What's specifically happening here is that pasted text (which, correct me if I'm wrong, goes to the clipboard as UTF-8 regardless of the current page's encoding) is being interpreted as though it were encoded as Windows Latin 1 [AKA Western (Windows-1252)]. You can get the same gibberish strings by using the View-> Character Encoding menu on a native UTF-8 page or opening a UTF-8 file with a text editor that lets you force a specific encoding. The upshot is that any character whose representation is the same in both UTF-8 and Windows-1252 comes through fine, everything else gets borked. With the exception of Mail (and I have no idea why IT'S working fine), this is happening in every NSTextView editor control that accepts formatting. (Adium, Bean, VoodooPad, iWeb, etc.) As Philippe pointed out, NSTextViews that only accept plaintext are grokking the Unicode just fine. MS Word 2004 is also getting the Unicode fine, AND still getting the formatting right. (Good for you, Microsoft, you get a cookie.) BBEdit is getting the Unicode fine, but it's plaintext-only in the first place, so I don't know that its engine teaches us anything useful in the first place. Tex-Edit Plus is a weird old bird that's running under Rosetta anyhow, and I can't tell WHAT the hell it's reading that encoding as. That's about all I have on hand; can someone else check Apple Pages (does that use its own text engine?), MS Word 2008, any Adobe apps, Quark, or whatever else happens to be sitting around?
I really think this should block beta 3.
Priority: -- → P1
Target Milestone: --- → mozilla1.9.1b3
Word, it's a pretty noisy regression. (Especially if you don't speak English.)
Pastes fine into InDesign CS4 (Adobe gets a cookie too). The problem seems to be limited to NSTextView-based apps using rich-text mode. I suspect Apple Mail's edit view is based on WebKit rather than NSTextView, so that could explain why it works there.
(In reply to comment #5) Oooooooh, looks like you're right about WebKit: I invoked a rich-text editor in Safari (via Google Maps), and it totally accepted formatted Japanese text without barfing, same as Mail. Good guessing!
(Is enough known about this that we can change the bug title?)
The problem arises because the data is put onto the system pasteboard as HTML using UTF-8, but there is no accompanying metadata to inform receivers of the encoding. Apparently NSTextView interprets it as Latin-1, while WebKit interprets it as UTF-8. The attached patch works around the problem by encoding all non-ASCII characters as numeric entities, so that they will be interpreted correctly regardless of the receiver's encoding assumption. An alternative might be to wrap the HTML fragment in something that allows us to specify the charset used, but I have not yet experimented with this approach. Note that the pasteboard data is not a complete HTML document with <head> and <body> but only a fragment of the body, such as <p>.....</p>, etc.
Attached patch corrected typo in the patch (obsolete) (deleted) — Splinter Review
Previous version had a typo (setData instead of setString) - sorry for the noise.
Attachment #352519 - Attachment is obsolete: true
Attachment #352520 - Flags: review?(joshmoz)
Attached patch updated patch to deal with drag&drop as well (obsolete) (deleted) — Splinter Review
Unsurprisingly, the same issue exists for dragging HTML text to an NSTextView-based application. Updated the patch to factor out the encoding function and apply to the Drag pasteboard as well as during the Copy operation.
Attachment #352520 - Attachment is obsolete: true
Attachment #352523 - Flags: review?(joshmoz)
Attachment #352520 - Flags: review?(joshmoz)
Attached patch alternative patch, wrapping the html fragment (obsolete) (deleted) — Splinter Review
As suggested above, I have tried a second approach, wrapping the HTML fragment from Gecko's internal pasteboard and providing an explicit charset declaration in the META tag. This will be much more efficient than iterating over the text to change all non-ASCII characters to entities; however, it represents a more substantial change to what other applications will receive and therefore needs to be tested as widely as possible. In my experiments with TextEdit, Mail, Word, and InDesign, all the recipient applications seemed to work fine with the "wrapped" HTML, and the encoding was correctly handled. However, it would be good for others to test with a wider variety of Mac applications, before we decide which patch to use.
Attachment #352537 - Flags: review?(joshmoz)
Jonathan, are you going to make try-server build(s) for your patch(es), or at least for the second one? Doing so makes it easier for a wider variety of people to test against a wider variety of other applications (as opposed to just people who can build).
I haven't got access to the try-server; still waiting to hear about a mozilla account that would let me do that. If anyone else wants to do it in the meantime, using the patch attached here, that'd be great.
I've just pushed the patch to the try servers. http://tinderbox.mozilla.org/showbuilds.cgi?tree=MozillaTry Builds should show up in a an hour or so.
I just resubmitted the v2/alternative patch since roc's build failed earlier today (looks like the try-server was having a very bad day, bug 469148).
Thanks for doing this. Philippe, Nicholas, anyone else interested.... could you grab the build from https://build.mozilla.org/tryserver-builds/2008-12-11_22:21-alqahira@ardisson.org-bug466599-v2/ and test copy/paste and drag/drop to as wide a variety of apps as possible?
(In reply to comment #16) > Thanks for doing this. Philippe, Nicholas, anyone else interested.... could you > grab the build from > https://build.mozilla.org/tryserver-builds/2008-12-11_22:21-alqahira@ardisson.org-bug466599-v2/ > and test copy/paste and drag/drop to as wide a variety of apps as possible? A very quick test, it is late here... Copy-pasting from Minefield to Textedit, .rtf document worked perfectly fine for Japanese, French, Spanish, German (copying snippets of text from lemonde.fr, yahoo.co.jp news, Elpais.com and Spiegel.de. Pasting in UnicodeChecker also works fine. I'll check with more apps tomorrow.
Pasting fine into all the usual suspects, including Word 04. (Not Tex-Edit Plus, but it's best to give that up as a bad job anyhow.)
Blocking1.9.1+, this is an unacceptable regression. Thanks for the debug work Nick, and for the patch Jonathan!
Flags: wanted1.9.1?
Flags: blocking1.9.1?
Flags: blocking1.9.1+
Josh: It'd be nice to land this soon, if you could review the patch(es) and approve or suggest improvements. I would recommend we go with the "alternative" patch, given the positive reports, as that is a far more efficient solution.
Wrapping the HTML scares me, I'm going to review the first patch. If somebody wants to advocate wrapping, I'm listening, but performance doesn't seem like a good enough argument so far.
Attachment #352537 - Flags: review?(joshmoz)
Wrapping might actually be nice if we know that the selection never includes anything outside the <body>. I'm not sure it does, though. The other patch gives much worse results when pasting into HTML editors, no? That is, instead of getting the text as Unicode you'll get it as escapes? Or does some unescaping magic happen somewhere? That is, doesn't this affect HTML drag recipients that keep the HTML as HTML, not just NSTextView (which seems to convert it to Unicode plaintext)?
Yeah, I favour the wrapping approach myself. I'm afraid that anything which tries to embed the HTML, or worse, send it somewhere else, is going to propagate these ugly escapes.
Attachment #352523 - Flags: review?(joshmoz)
Comment on attachment 352537 [details] [diff] [review] alternative patch, wrapping the html fragment > if (currentKey == NSStringPboardType || >- currentKey == NSHTMLPboardType || > currentKey == kCorePboardType_url || > currentKey == kCorePboardType_urld || > currentKey == kCorePboardType_urln) > [generalPBoard setString:currentValue forType:currentKey]; >+ else if (currentKey == NSHTMLPboardType) >+ [generalPBoard setString:(nsClipboard::WrapHtmlForSystemPasteboard(currentValue)) >+ forType:currentKey]; Please put braces around all the blocks in the if statement, like: if () { } else if () { } else { } > } >+ >+const NSString* nsClipboard::WrapHtmlForSystemPasteboard(const NSString* aString) Please put 2 blank lines between methods, someone forgot above the method you're adding too. Also, you don't need "const" on the argument or the return value. Other than that this looks fine. Let's go with the wrapping.
Josh, can we take your comment #24 as giving r+ on the wrapping patch? The issues you mentioned are fixed in the new version attached.
Attachment #352523 - Attachment is obsolete: true
Attachment #352537 - Attachment is obsolete: true
Attachment #353829 - Flags: superreview?
Attachment #353829 - Flags: superreview? → superreview?(roc)
Attachment #353829 - Flags: superreview?(roc) → superreview+
Can we be sure to get tests this time around for this patch and the one in bug 428096?
Flags: in-testsuite?
I'm not sure what's the best approach to (automated) testing here, given that the main issue relates to how other applications interpret the data we put on the pasteboard. Atul, did you come up with any kind of tests for the earlier bug that might provide a starting point? I don't see anything in bugzilla since https://bugzilla.mozilla.org/show_bug.cgi?id=428096#c52.
He didn't. We can't test all those other apps directly, but we could build our own helper app (when --enable-tests is set) which performs pastes and verifies the results (or sends them back to the running Firefox test build for verification).
I have a JavaScript test that puts an HTML fragment on the system clipboard and then reads back the result; this allows us to verify that the expected "wrapping" is being done. However, running this requires the UniversalXPConnect privilege. When the script calls netscape.security.PrivilegeManager.enablePrivilege("UniversalXPConnect"); this triggers an alert sheet, so it won't run as a test without user interaction. Is there a way I can allow this privilege automatically (only for the test environment, of course)?
If I'm not mistaken, that privilege is allowed (when asked for) by default in MochiTest (see build/pgo/automation.py.in).
Yep. If your test is written as a mochitest, it'll Just Work.
We can't directly test the results of dragging or pasting into the full range of 3rd-party apps, but this test at least verifies that the data on the system pasteboard was wrapped with the expected charset metadata, so that we know the patch is doing its intended job.
Attachment #354795 - Flags: review?(roc)
Comment on attachment 354795 [details] [diff] [review] testcase for the html-wrapping patch (chrome mochitest) + result += {'<':'&lt;', '>':'&gt;', '&':'&amp;', '"':'&quot;'}[c] || c; Wow
Attachment #354795 - Flags: review?(roc) → review+
Attached patch combined patch including test case (obsolete) (deleted) — Splinter Review
Josh, if you're happy to give this r+ (see comment #24), I think it's about ready to land. I have combined the actual code and the test into a single patch, and it's currently building on the tryserver to check that nothing breaks there.
Attachment #354807 - Flags: review?(joshmoz)
Attachment #354807 - Flags: review?(joshmoz) → review+
Attachment #354795 - Attachment is obsolete: true
Attachment #353829 - Attachment is obsolete: true
Keywords: checkin-needed
Comment on attachment 354807 [details] [diff] [review] combined patch including test case >diff --git a/widget/tests/Makefile.in b/widget/tests/Makefile.in >--- a/widget/tests/Makefile.in >+++ b/widget/tests/Makefile.in >@@ -52,7 +52,8 @@ > ifeq ($(MOZ_WIDGET_TOOLKIT),cocoa) > _TEST_FILES += native_menus_window.xul \ > test_native_menus.xul \ >- test_bug428405.xul >+ test_bug428405.xul \ >+ test_bug466599.xul > endif Any reason the Cocoa-specific _TEST_FILES block isn't terminated with $(NULL) like the main _TEST_FILES block? It would eliminate false-blame-noise that otherwise comes with adding the \ to the previous test each time a new test is added....
Re comment #35: no reason except ignorance. :) I have updated the patch to use $(NULL) to terminate the list; won't help this time around, as it wasn't there before, but at least we'll be ready for the next addition.
Attachment #354807 - Attachment is obsolete: true
Attachment #354933 - Attachment description: adjusted format of test-case list, updated commit message ready for checkin → adjusted format of test-case list, updated commit message ready for checkin [Checkin: Comment 37]
Comment on attachment 354933 [details] [diff] [review] adjusted format of test-case list, updated commit message ready for checkin [Checkin: Comment 37] http://hg.mozilla.org/mozilla-central/rev/19ca4c2894e7
Status: NEW → RESOLVED
Closed: 16 years ago
Flags: in-testsuite? → in-testsuite+
Resolution: --- → FIXED
Whiteboard: [c-n: baking for 1.9.1]
Depends on: 493189
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Created:
Updated:
Size: