Broken text selection and highlighting in pdf.js
Categories
(Firefox :: PDF Viewer, defect)
People
(Reporter: alice0775, Assigned: jfkthame)
References
(Regression)
Details
(Keywords: nightly-community, regression)
Attachments
(7 files)
[Tracking Requested - why for this release]: Broken text selection and highlighting.
Link to PDF file: https://www.africau.edu/images/default/sample.pdf
Steps to reproduce the problem:
- Open https://www.africau.edu/images/default/sample.pdf
- Select text with a mouse drag, e.g. from C to d of Continued in the last sentence.
- Copy
- Paste into another application, etc.
OR
2. Double-click on a word e.g. Continued
3. Observe highlight
4. Copy
5. Paste into another application, etc.
What is the expected behavior? (add screenshot)
Selection highlighting should cover the letters at the mouse position.
What went wrong? (add screenshot)
- Selected text is wrong
- Selection highlighting is shifted.
- The wrong word is highlighted.
See attached screenshot
Regression window:
https://hg.mozilla.org/integration/autoland/pushloghtml?fromchange=120838b58449ab57a04eee2823ead4054efd2e05&tochange=a88ac51c349da1444db493f972ddb2133fc45776
Comment 1•2 years ago
I am unable to reproduce on Windows 11.
Are you able to reproduce only on Windows 10? Did you try other Windows versions or other OSes?
Reporter
Comment 2•2 years ago
I can reproduce this on Nightly 111.0a1 on Windows 10 x64 Home (22H2).
I can also reproduce it in a Windows 10 x64 Home VMware client and in a Windows 11 64-bit VMware client.
On an Ubuntu 22.04 64-bit VMware client:
- Nightly 111.0a1 crashes (bp-2bb20e0a-56a0-44d8-8d62-8a8ae0230207) when opening the PDF (Bug 1807942).
- Firefox 110.0 RC1 crashes (bp-00724602-10e5-4d02-9be9-70abf0230207) when opening the PDF (Bug 1807942).
- I cannot reproduce the issue on Firefox 109.0.
So it seems to be a Windows-specific problem.
Comment 5•2 years ago
:Alice, could you please enable the pref pdfjs.pdfBugEnabled, open https://www.africau.edu/images/default/sample.pdf#textLayer=visible, make a screenshot and attach it here?
Comment 6•2 years ago
I just tested on Windows 10 Pro and I can't reproduce the issue.
:Snuffleupagus, iirc you're on Windows 10, are you able to reproduce the issue? (tbh, I'm a bit puzzled, because the regressing patch is clearly touching some stuff in the text layer...)
Reporter
Comment 7•2 years ago
(In reply to Calixte Denizet (:calixte) from comment #5)
> :Alice, could you please enable the pref pdfjs.pdfBugEnabled, open https://www.africau.edu/images/default/sample.pdf#textLayer=visible, make a screenshot and attach it here?
Yes, attached here.
Comment 9•2 years ago
:Alice, which font do you use as the default for sans-serif?
Comment 10•2 years ago
I think I understand what the problem is.
The regression is very likely due to this patch:
https://github.com/mozilla/pdf.js/pull/15722
and especially its use of an OffscreenCanvas, which makes the computations faster because there is almost no style resolution when a font changes.
It seems that sans-serif in an OffscreenCanvas is not the same font as in a "normal" canvas, and consequently the scale factors are just wrong.
So an easy fix is simply not to use an OffscreenCanvas.
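To make the effect of such a font mismatch concrete, here is a minimal sketch. It assumes the scale factor is the width declared by the PDF divided by the width measured on the canvas. The function names and all widths are invented for illustration; they are not taken from pdf.js or from real font metrics.

```typescript
// Hypothetical illustration only: names and numbers are assumptions.
// The scale factor is chosen so that (measured width * scale) equals the
// width the PDF declares for the run of text.
function scaleFor(pdfWidth: number, measuredWidth: number): number {
  return pdfWidth / measuredWidth;
}

// Width the span ends up with once the scale factor is applied to the
// width the text actually has in the font used for layout.
function renderedWidth(layoutWidth: number, scale: number): number {
  return layoutWidth * scale;
}

const pdfWidth = 150;       // width of the text run according to the PDF
const offscreenWidth = 100; // width measured with the OffscreenCanvas font
const textLayerWidth = 120; // width of the same string in the text-layer font

const mismatchScale = scaleFor(pdfWidth, offscreenWidth);

// Same font for measuring and layout: the span matches the PDF width.
const widthWhenFontsAgree = renderedWidth(offscreenWidth, mismatchScale);
// Different (wider) layout font: the span overshoots the PDF width.
const widthWhenFontsDiffer = renderedWidth(textLayerWidth, mismatchScale);

console.log(widthWhenFontsAgree, widthWhenFontsDiffer);
```

When both widths come from the same font the span lands exactly on the PDF width; when the text layer uses a wider font, the same scale factor stretches the span past it, which would be consistent with the shifted highlight in the report.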
Reporter
Comment 11•2 years ago
This seems to depend on the default language of the operating system.
My OS default language is Japanese.
And in Firefox (incl. Nightly) the default sans-serif font is Meiryo.
Comment 12•2 years ago
I tested locally, and using a canvas instead of an OffscreenCanvas fixes the issue.
:jfkthame, do you know what sans-serif is supposed to be in an OffscreenCanvas? Is it supposed to be affected by the value of the pref font.name.sans-serif.x-western?
Comment 13•2 years ago
I tested in Chrome with a local pdf.js and the bug doesn't exist there, so it seems that in Chrome the OffscreenCanvas has the same sans-serif font as a Canvas.
Assignee
Comment 14•2 years ago
I think the behavior in OffscreenCanvas is different from "normal" (in-document) canvas because font preference resolution for in-doc canvas will be determined by the language of the document; but an OffscreenCanvas doesn't have a document to inherit language from, and so probably falls back to a system default.
(In general, shouldn't pdf.js be explicitly asking for the specific fonts it wants to use, not relying on generic sans-serif? That could map to quite different things for different users.)
Assignee
Comment 15•2 years ago
It looks like the font used for the text layer (which is independent of the OffscreenCanvas, right?) depends on the OS language setting -- I can reproduce the issue on macOS, for example, after setting my default language in System Preferences to Japanese, and I can see that the text layer now uses a different font -- while the OffscreenCanvas rendering continues to use the (en-US-based, presumably) default font.
I think the issue is that the code here doesn't know what to do for an OffscreenCanvas, as there's no <canvas> element to get a lang from. Note the TODO comment!
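The difference between the two code paths can be pictured with a toy model. This is only a sketch under assumptions: the table, the function names, and the `en-US` default are hypothetical and do not reflect Gecko's actual implementation. Arial and Meiryo are used only because they are fonts mentioned in this bug.

```typescript
// Toy model only: the table and names below are illustrative assumptions,
// not Gecko's real data structures or defaults.
type LangGroup = "x-western" | "ja";

const sansSerifByLangGroup: Record<LangGroup, string> = {
  "x-western": "Arial",
  ja: "Meiryo",
};

// Map a language tag to one of the two language groups modelled here.
function langGroupFor(lang: string): LangGroup {
  return lang.toLowerCase().indexOf("ja") === 0 ? "ja" : "x-western";
}

// contextLang models the language an in-document canvas inherits from its
// element or document; an OffscreenCanvas has no such context, so the
// resolver falls back to defaultLang.
function resolveSansSerif(contextLang: string | undefined, defaultLang: string): string {
  const lang = contextLang !== undefined ? contextLang : defaultLang;
  return sansSerifByLangGroup[langGroupFor(lang)];
}

// In-document canvas with a Japanese language context.
const inDocumentFont = resolveSansSerif("ja", "en-US");
// OffscreenCanvas: no language context, assumed en-US default.
const offscreenFont = resolveSansSerif(undefined, "en-US");

console.log(inDocumentFont, offscreenFont);
```

In this model the two canvases resolve the same generic name to different fonts purely because one of them has no language to inherit, which is the situation described above.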
Comment 16•2 years ago
(In reply to Jonathan Kew [:jfkthame] from comment #14)
> I think the behavior in OffscreenCanvas is different from "normal" (in-document) canvas because font preference resolution for in-doc canvas will be determined by the language of the document; but an OffscreenCanvas doesn't have a document to inherit language from, and so probably falls back to a system default.
That's my understanding too, but I suppose a user would expect to get the same font for sans-serif whatever the canvas type is, don't you think?
> (In general, shouldn't pdf.js be explicitly asking for the specific fonts it wants to use, not relying on generic sans-serif? That could map to quite different things for different users.)
Oh yes, I fully agree with you, but that's another bug (it's a long-standing workaround, and rewriting that stuff is somewhere on our todo list).
> It looks like the font used for the text layer (which is independent of the OffscreenCanvas, right?) depends on the OS language setting
The font in the text layer is sans-serif, and we use an OffscreenCanvas with a sans-serif font to measure the string contained in the span and apply a scaleX transform so that it at least fits the string's bounding box (the one guessed from the PDF itself).
I'll remove the use of OffscreenCanvas, but it's a pity: it was slightly faster with it.
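The measurement step described above could look roughly like the following sketch. `TextMeasurer`, `computeScaleX`, `toTransform`, and `FixedWidthMeasurer` are hypothetical names, not pdf.js code. The interface relies only on the standard `font` property and `measureText()` method that 2D canvas contexts provide, and the stand-in measurer exists so the example runs without a browser.

```typescript
// Minimal structural type; both CanvasRenderingContext2D and
// OffscreenCanvasRenderingContext2D expose these two standard members.
interface TextMeasurer {
  font: string;
  measureText(text: string): { width: number };
}

// Hypothetical helper: measure the string in generic sans-serif and return
// the horizontal scale that makes it fit the width taken from the PDF.
function computeScaleX(
  ctx: TextMeasurer,
  text: string,
  fontSizePx: number,
  targetWidth: number
): number {
  ctx.font = `${fontSizePx}px sans-serif`;
  const measured = ctx.measureText(text).width;
  // Fall back to no scaling when nothing could be measured.
  return measured > 0 ? targetWidth / measured : 1;
}

// Build the CSS transform value applied to the text-layer span.
function toTransform(scaleX: number): string {
  return `scaleX(${scaleX})`;
}

// Stand-in for a canvas context (an assumption for this demo): every
// character is treated as half the font size wide.
class FixedWidthMeasurer implements TextMeasurer {
  font = "";
  measureText(text: string): { width: number } {
    const sizePx = parseFloat(this.font) || 0;
    return { width: text.length * sizePx * 0.5 };
  }
}

const demoScale = computeScaleX(new FixedWidthMeasurer(), "Continued", 10, 90);
console.log(toTransform(demoScale));
```

In a browser, the context obtained from `getContext("2d")` on either a `<canvas>` element or an `OffscreenCanvas` could be passed as `ctx`. The result is only meaningful if that context resolves sans-serif to the same font the span is rendered with.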
Reporter
Comment 17•2 years ago
Font is "Meiryo" in the inspector. However, the scaleX values for the inline CSS transform are quite different.
Assignee
Comment 18•2 years ago
(In reply to Calixte Denizet (:calixte) from comment #16)
> I'll remove the use of OffscreenCanvas but it's a pity: it was slightly faster with it.
Before you rush to do that, we may be able to help by making the offscreen-canvas code path pass the system locale through to the font system. I'm pushing a patch to tryserver to see if this breaks existing tests....
Assignee
Comment 19•2 years ago
(In reply to Calixte Denizet (:calixte) from comment #16)
> That's my understanding too but I suppose that a user would expect to have the same font for sans-serif whatever the canvas type is, don't you think ?
Well.... would the user expect to have the same font for sans-serif in a canvas element that's within a Japanese document, vs one that's in an English document? Note that in HTML content, sans-serif may resolve to different fonts depending on the language of the content; so it'd be a bit surprising if canvas didn't do the same thing.
But then if you use an OffscreenCanvas, it doesn't have an associated document context or a <canvas> element to inherit language from, so what should it do...?
Anyhow, let's see how this goes: https://treeherder.mozilla.org/jobs?repo=try&revision=22269bedb1c8d16aa818f0176cc8c00242b712c0
Comment 20•2 years ago
So the problem will be the same if my locale is fr_FR, I set a specific Japanese font for sans-serif, and I open a PDF with some Japanese text.
Assignee
Comment 21•2 years ago
I'm not entirely sure..... just because the pdf has some Japanese text in it, will that mean the pdf.js document (or text layer that it creates) gets tagged as lang=ja? I wouldn't have thought so, given that the pdf may have any arbitrary mixture of languages in it, and in general I don't think you can know what languages they are. So the pdf.js text layer will probably not be lang-tagged, I'm guessing. (But you may know better.) And if it's not, then it'll presumably inherit the same default locale-based font resolution as my proposed patch gives for the offscreen.
But maybe I'm not fully understanding all the pieces here. Worth some experimentation, I guess!
Assignee
Comment 22•2 years ago
This helps the offscreen-canvas measurements done by pdf.js to more closely match the
invisible text layer (used for search/selection/etc) in cases where the host system
locale is non-English and has different generic font prefs (e.g. Japanese).
(Not readily testable in CI because it'll only make a visible difference to behavior
when running with a system locale that has different font prefs.)
Assignee
Comment 23•2 years ago
Calixte, if you have a chance to try this patch and verify whether it helps, that'd be great - thanks. I've tested locally (on macOS) that it seems to behave as expected, but have not checked across all platforms/configurations.
Reporter
Comment 24•2 years ago
I confirmed that the try build from comment #19 solves the issue on Windows 10.
Comment 26•2 years ago
I just tested with the build from CI on Windows 11, and it works as expected after I changed the default font for Latin from Arial to Garamond, whereas it doesn't work in Nightly.
I think this patch does the job, thank you.
Comment 27•2 years ago
bugherder
Comment 28•2 years ago
I managed to reproduce the issue on Windows 10 x64 with Firefox Nightly from the 7th of February 2023, after adding the Meiryo UI font to the system fonts and setting Japanese as the default language on the machine, although the issue did not seem as severe as in the screenshots.
The issue could not be reproduced with Firefox Nightly from the 6th of March 2023 on the same configuration.
For safety, can you please check whether the issue is fixed on your side?
Reporter
Comment 29•2 years ago
Verified fix on Nightly112.0a1(20230306211718) and Firefox110.0RC(20230306162820).