Closed
Bug 1149859
Opened 10 years ago
Closed 10 years ago
isProbablyReaderable too aggressive, many valid sites fail to qualify
Categories
(Toolkit :: Reader Mode, defect)
Toolkit
Reader Mode
Tracking
()
RESOLVED
FIXED
mozilla40
People
(Reporter: markh, Assigned: markh)
References
(Blocks 1 open bug)
Details
Attachments
(1 file)
(deleted),
patch
|
Margaret
:
review+
Gijs
:
feedback+
|
Details | Diff | Splinter Review |
eg:
http://www.abc.net.au/news/2015-04-01/local-surfboard-makers-are-being-wiped-out-foreign-drop-ins/6362240
http://www.theage.com.au/digital-life/games/surprise-pacman-is-munching-away-on-google-maps-this-april-fools-day-20150331-1mcfz6.html
Both fail to offer reader view. Instrumenting isProbablyReaderable to dump when the node is too short shows:
node too short: 50
node too short: 123
node too short: 160
node too short: 84
node too short: 115
node too short: 131
node too short: 72
node too short: 0
node too short: 32
node too short: 180
node too short: 169
node too short: 96
node too short: 79
node too short: 88
node too short: 94
node too short: 176
node too short: 90
node too short: 0
node too short: 133
node too short: 61
node too short: 186
node too short: 135
node too short: 137
node too short: 183
node too short: 175
node too short: 59
node too short: 55
node too short: 29
node too short: 0
node too short: 69
node too short: 15
node too short: 84
node too short: 119
node too short: 146
node too short: 39
node too short: 94
node too short: 161
node too short: 100
So there are plenty of paragraphs - just not enough over 200 chars. Tweaking the code to always allow the page to enter reader mode also shows the article looks fine.
So I propose we drop the limit from 200 to 100.
Attachment #8586505 -
Flags: review?(margaret.leibovic)
Assignee | ||
Comment 1•10 years ago
|
||
alternatively, instead of wanting 5 paras over a particular length, we just look for a total length and ignore the number of nodes - eg, 20 <p> elements with 90 chars each is probably still readable.
Assignee | ||
Updated•10 years ago
|
Comment 2•10 years ago
|
||
Comment on attachment 8586505 [details] [diff] [review]
0009-Bug-XXXXXXX-readermode-now-considers-paragraphs-with.patch
Review of attachment 8586505 [details] [diff] [review]:
-----------------------------------------------------------------
Looking at Readability.js, it seems like we only throw away paragraphs that have fewer than 25 characters:
https://github.com/mozilla/readability/blob/master/Readability.js#L581
So the approach here sounds reasonable to me.
I feel like Gijs has spent more time than me in this candidate logic, so it would be good to get his opinion as well.
We just need to remember that the goal here is to show the button only in cases where Readability.js will actually succeed. So are there cases where we do have the requisite number of these short paragraphs but Readability won't return an article? Maybe if there's a lot of links in them? I would also be in support of continuing to test this with real testcases to converge on something that works reasonably well in the real world.
Attachment #8586505 -
Flags: review?(margaret.leibovic)
Attachment #8586505 -
Flags: review+
Attachment #8586505 -
Flags: feedback?(gijskruitbosch+bugs)
Comment 3•10 years ago
|
||
Comment on attachment 8586505 [details] [diff] [review]
0009-Bug-XXXXXXX-readermode-now-considers-paragraphs-with.patch
Review of attachment 8586505 [details] [diff] [review]:
-----------------------------------------------------------------
LGTM, we should probably build a corpus of things that we use to decide reader-ability... e.g. it'd be sad if we now went and offered reader mode on youtube or $SEARCH_PROVIDER search results again, or something.
Attachment #8586505 -
Flags: feedback?(gijskruitbosch+bugs) → feedback+
Assignee | ||
Comment 4•10 years ago
|
||
(In reply to :Gijs Kruitbosch from comment #3)
> e.g. it'd be sad if we now went and offered reader mode on
> youtube or $SEARCH_PROVIDER search results again, or something.
FTR, this doesn't cause youtube, nor google or yahoo search results pages from offering reader mode, so I think I'll go ahead and push this later today.
Assignee | ||
Comment 5•10 years ago
|
||
Status: NEW → RESOLVED
Closed: 10 years ago
Resolution: --- → FIXED
Target Milestone: --- → mozilla40
Updated•10 years ago
|
Assignee: nobody → mhammond
You need to log in
before you can comment on or make changes to this bug.
Description
•