Closed Bug 1815391 Opened 2 years ago Closed 2 years ago

Broken text selection and highlighting in pdf.js

Categories

(Firefox :: PDF Viewer, defect)

Firefox 109
Desktop
Windows 10
defect

Tracking


VERIFIED FIXED
111 Branch
Tracking Status
firefox-esr102 --- unaffected
firefox109 --- wontfix
firefox110 --- wontfix
firefox111 + verified
firefox112 --- verified

People

(Reporter: alice0775, Assigned: jfkthame)

References

(Regression)

Details

(Keywords: nightly-community, regression)

Attachments

(7 files)

Attached image screenshot (deleted)

[Tracking Requested - why for this release]: Broken text selection and highlighting.

Link to PDF file: https://www.africau.edu/images/default/sample.pdf

Steps to reproduce the problem:

  1. Open https://www.africau.edu/images/default/sample.pdf
  2. Select text by dragging the mouse, e.g. from the "C" to the "d" of "Continued" in the last sentence.
  3. Copy.
  4. Paste into another application.

OR

  2. Double-click a word, e.g. "Continued".
  3. Observe the highlight.
  4. Copy.
  5. Paste into another application.

What is the expected behavior? (add screenshot)
Selection highlighting should match the letters under the mouse position.

What went wrong? (add screenshot)

  • The selected text is wrong.
  • The selection highlighting is shifted.
  • The wrong word is highlighted.
    See the attached screenshot.

Regression window:
https://hg.mozilla.org/integration/autoland/pushloghtml?fromchange=120838b58449ab57a04eee2823ead4054efd2e05&tochange=a88ac51c349da1444db493f972ddb2133fc45776

I am unable to reproduce on Windows 11.
Are you able to reproduce only on Windows 10? Did you try other Windows versions or other OSes?

Flags: needinfo?(alice0775)

I can reproduce this on Nightly 111.0a1 on Windows 10 x64 Home (22H2).

I can also reproduce it on Windows 10 x64 Home and Windows 11 64-bit VMware clients.

I cannot reproduce it on an Ubuntu 22.04 64-bit VMware client.

So it seems to be a Windows-specific problem.

Flags: needinfo?(alice0775)
Attached file about:support (deleted)

:Alice, could you please enable the pref pdfjs.pdfBugEnabled, open https://www.africau.edu/images/default/sample.pdf#textLayer=visible, take a screenshot and attach it here?

I just tested on Windows 10 Pro and I can't reproduce the issue.
:Snuffleupagus, iirc you're on Windows 10; are you able to reproduce the issue? (tbh, I'm a bit puzzled, because the regressing patch is clearly touching some stuff in the text layer...)

Flags: needinfo?(jonas.jenwald)
Attached image screenshot (deleted)

(In reply to Calixte Denizet (:calixte) from comment #5)

> :Alice, could you please enable the pref pdfjs.pdfBugEnabled, open https://www.africau.edu/images/default/sample.pdf#textLayer=visible, take a screenshot and attach it here?

Yes, attached here.

Attached image Screenshot on the GOOD build. (deleted)

:Alice, what font do you use as the default for sans-serif?

I think I understand what the problem is.
The regression is very likely due to this patch:
https://github.com/mozilla/pdf.js/pull/15722

and especially to its use of an OffscreenCanvas, which makes the computations faster because there is almost no style resolution when the font changes.
It seems that sans-serif in an OffscreenCanvas does not resolve to the same font as in a "normal" canvas, and consequently the scale factors are just wrong.
So an easy fix is simply to not use an OffscreenCanvas.

Flags: needinfo?(jonas.jenwald)

This seems to depend on the default language of the operating system.

My OS default language is Japanese.
And in Firefox (incl. Nightly) the default font is Meiryo (sans-serif).

I tested locally, and using a canvas instead of an OffscreenCanvas fixes the issue.
:jfkthame, do you know what sans-serif is supposed to be in an OffscreenCanvas? Is it supposed to be affected by the value of the pref font.name.sans-serif.x-western?

Flags: needinfo?(jfkthame)

I tested in Chrome with a local pdf.js and the bug doesn't occur there, so it seems that in Chrome an OffscreenCanvas uses the same sans-serif font as a regular canvas.

I think the behavior in OffscreenCanvas is different from "normal" (in-document) canvas because font preference resolution for in-doc canvas will be determined by the language of the document; but an OffscreenCanvas doesn't have a document to inherit language from, and so probably falls back to a system default.

(In general, shouldn't pdf.js be explicitly asking for the specific fonts it wants to use, not relying on generic sans-serif? That could map to quite different things for different users.)
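A toy Python model of the behavior difference described above. It assumes, as later comments in this bug suggest, that untagged in-document content resolves sans-serif through the OS locale, while an OffscreenCanvas, which has no element or document, gets no language at all. The preference table, the simplified language-group mapping and the Western fallback are illustrative assumptions, not Gecko's actual logic.

```python
from typing import Optional

# Illustrative stand-in for per-language-group prefs such as
# font.name.sans-serif.x-western; the font values are examples only.
SANS_SERIF_BY_GROUP = {"ja": "Meiryo", "x-western": "Arial"}


def lang_group(lang: Optional[str]) -> str:
    """Simplified mapping from a language tag to a language group."""
    if lang and lang.split("-")[0].lower() == "ja":
        return "ja"
    return "x-western"


def resolve_sans_serif(lang: Optional[str]) -> str:
    """Resolve the generic sans-serif family for a given language."""
    return SANS_SERIF_BY_GROUP[lang_group(lang)]


def in_document_font(element_lang: Optional[str], os_locale: str) -> str:
    # Assumption: content without a lang tag resolves via the OS locale.
    return resolve_sans_serif(element_lang or os_locale)


def offscreen_font_before_fix() -> str:
    # No <canvas> element or document, hence no lang: a hypothetical
    # built-in Western default is used instead.
    return resolve_sans_serif(None)


os_locale = "ja-JP"
print(in_document_font(None, os_locale))  # Meiryo  (text layer)
print(offscreen_font_before_fix())        # Arial   (measuring canvas)
```

Under these assumptions, the text layer and the measuring canvas disagree only when the OS locale maps to a language group whose sans-serif differs from the Western default, which matches the report that the bug depends on the OS language.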

Flags: needinfo?(jfkthame)

It looks like the font used for the text layer (which is independent of the OffscreenCanvas, right?) depends on the OS language setting -- I can reproduce the issue on macOS, for example, after setting my default language in System Preferences to Japanese, and I can see that the text layer now uses a different font -- while the OffscreenCanvas rendering continues to use the (en-US-based, presumably) default font.

I think the issue is that the code here doesn't know what to do for an OffscreenCanvas, as there's no <canvas> element to get a lang from. Note the TODO comment!

(In reply to Jonathan Kew [:jfkthame] from comment #14)

> I think the behavior in OffscreenCanvas is different from "normal" (in-document) canvas because font preference resolution for in-doc canvas will be determined by the language of the document; but an OffscreenCanvas doesn't have a document to inherit language from, and so probably falls back to a system default.

That's my understanding too, but I suppose that a user would expect to have the same font for sans-serif whatever the canvas type is, don't you think?

> (In general, shouldn't pdf.js be explicitly asking for the specific fonts it wants to use, not relying on generic sans-serif? That could map to quite different things for different users.)

Oh yes, I fully agree with you, but that's another bug (it's a long-standing workaround, and rewriting that code is somewhere on our todo list).

> It looks like the font used for the text layer (which is independent of the OffscreenCanvas, right?) depends on the OS language setting

The font in the text layer is sans-serif, and we use an OffscreenCanvas with a sans-serif font to measure the string contained in each span, then apply a scaleX transform so that the span at least fits the string's bounding box (the one guessed from the PDF itself).

I'll remove the use of OffscreenCanvas but it's a pity: it was slightly faster with it.
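A toy Python model of the measure-then-scale approach described above, showing why a font mismatch between the measuring canvas and the text layer skews the result. The per-font average advance widths are hypothetical, and measure() is only a stand-in for a canvas measureText(text).width call. Real glyph metrics vary per character.

```python
# Hypothetical average advance widths, in em, for two fonts. Real glyph metrics
# vary per character; a single average is enough to show the effect.
AVG_ADVANCE_EM = {"Arial": 0.50, "Meiryo": 0.58}


def measure(text: str, font: str, font_size_px: float) -> float:
    """Toy stand-in for a canvas measureText(text).width call."""
    return len(text) * AVG_ADVANCE_EM[font] * font_size_px


def scale_x(target_width: float, measured_width: float) -> float:
    """Horizontal scale that stretches the measured string to the box width."""
    return target_width / measured_width if measured_width > 0 else 1.0


def rendered_width(text: str, box_width: float, font_size_px: float,
                   measure_font: str, render_font: str) -> float:
    """Width of the span after scaleX is applied.

    scaleX is computed from measure_font (the measuring canvas), but the span
    itself is laid out in render_font (the text layer's sans-serif).
    """
    sx = scale_x(box_width, measure(text, measure_font, font_size_px))
    return measure(text, render_font, font_size_px) * sx


# Hypothetical box width guessed from the PDF for "Continued" at 10px.
box = 45.0
print(round(rendered_width("Continued", box, 10.0, "Meiryo", "Meiryo"), 1))  # 45.0
print(round(rendered_width("Continued", box, 10.0, "Arial", "Meiryo"), 1))   # 52.2
```

With matching fonts the span lands exactly on the PDF's box. With mismatched fonts it overshoots by the ratio of the two fonts' widths (about 16% with these made-up numbers). The invisible text then drifts away from the glyphs drawn on the canvas, which is consistent with the shifted highlighting and very different scaleX values reported in this bug.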

Attached image BAD vs GOOD (deleted)

Font is "Meiryo" in the inspector. However, the scaleX values for the inline CSS transform are quite different.

(In reply to Calixte Denizet (:calixte) from comment #16)

> I'll remove the use of OffscreenCanvas but it's a pity: it was slightly faster with it.

Before you rush to do that, we may be able to help by making the offscreen-canvas code path pass the system locale through to the font system. I'm pushing a patch to the try server to see whether this breaks existing tests...

(In reply to Calixte Denizet (:calixte) from comment #16)

> That's my understanding too, but I suppose that a user would expect to have the same font for sans-serif whatever the canvas type is, don't you think?

Well... would the user expect to have the same font for sans-serif in a canvas element that's within a Japanese document, vs one that's in an English document? Note that in HTML content, sans-serif may resolve to different fonts depending on the language of the content, so it'd be a bit surprising if canvas didn't do the same thing.

But then if you use an OffscreenCanvas, it doesn't have an associated document context or a <canvas> element to inherit language from, so what should it do...?

Anyhow, let's see how this goes: https://treeherder.mozilla.org/jobs?repo=try&revision=22269bedb1c8d16aa818f0176cc8c00242b712c0

So the problem would be the same if my locale were fr_FR, I set a specific Japanese font for sans-serif, and I opened a PDF with some Japanese text.

I'm not entirely sure..... just because the pdf has some Japanese text in it, will that mean the pdf.js document (or text layer that it creates) gets tagged as lang=ja? I wouldn't have thought so, given that the pdf may have any arbitrary mixture of languages in it, and in general I don't think you can know what languages they are. So the pdf.js text layer will probably not be lang-tagged, I'm guessing. (But you may know better.) And if it's not, then it'll presumably inherit the same default locale-based font resolution as my proposed patch gives for the offscreen.

But maybe I'm not fully understanding all the pieces here. Worth some experimentation, I guess!

This helps the offscreen-canvas measurements done by pdf.js to more closely match the
invisible text layer (used for search/selection/etc) in cases where the host system
locale is non-English and has different generic font prefs (e.g. Japanese).

(Not readily testable in CI because it'll only make a visible difference to behavior
when running with a system locale that has different font prefs.)
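The decision this commit message describes can be sketched as a small Python function. The function and parameter names are hypothetical. The actual change lives in Gecko's C++ canvas code and is not reproduced here, and a canvas with an element is simplified to always carry a lang.

```python
from typing import Optional


def font_lang_for_canvas(element_lang: Optional[str], system_locale: str,
                         pass_system_locale: bool = True) -> Optional[str]:
    """Choose the language used to resolve generic families like sans-serif.

    element_lang: lang inherited from a <canvas> element's document, or None
    for an OffscreenCanvas, which has no element to inherit from.
    pass_system_locale: True models the fix (fall back to the system locale);
    False models the old behavior (no language, so a built-in default applies).
    """
    if element_lang is not None:
        return element_lang
    return system_locale if pass_system_locale else None


print(font_lang_for_canvas(None, "ja-JP"))                            # ja-JP
print(font_lang_for_canvas(None, "ja-JP", pass_system_locale=False))  # None
print(font_lang_for_canvas("fr", "ja-JP"))                            # fr
```

Falling back to the system locale rather than to a fixed default makes an OffscreenCanvas resolve sans-serif the same way as untagged page content, such as the pdf.js text layer, on the same machine.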

Assignee: nobody → jfkthame
Status: NEW → ASSIGNED

Calixte, if you have a chance to try this patch and verify whether it helps, that'd be great - thanks. I've tested locally (on macOS) that it seems to behave as expected, but have not checked across all platforms/configurations.

I confirmed that the try build from comment #19 solves the issue on Windows 10.

Pushed by jkew@mozilla.com: https://hg.mozilla.org/integration/autoland/rev/01dd4deb52d9 Pass system locale language when setting up fonts for an offscreen canvas. r=gfx-reviewers,lsalzman

I just tested the CI build on Windows 11: after changing the default Latin font from Arial to Garamond, it works as expected, whereas the same setup doesn't work in Nightly.
I think this patch does the job, thank you.

Status: ASSIGNED → RESOLVED
Closed: 2 years ago
Resolution: --- → FIXED
Target Milestone: --- → 111 Branch
Flags: qe-verify+

Managed to reproduce the issue on Windows 10 x64 with Firefox Nightly from February 7, 2023, after adding the Meiryo UI font to the system fonts and setting Japanese as the machine's default language, although the issue did not seem as severe as in the screenshots.

Issue could not be reproduced with Firefox Nightly from the 6th of March 2023 on the same configuration.

For safety, can you please check if the issue is verified on your side?

Flags: needinfo?(alice0775)

Verified fix on Nightly112.0a1(20230306211718) and Firefox110.0RC(20230306162820).

Status: RESOLVED → VERIFIED
Flags: needinfo?(alice0775)