Closed Bug 1149859 Opened 10 years ago Closed 10 years ago

isProbablyReaderable too aggressive, many valid sites fail to qualify

Categories

(Toolkit :: Reader Mode, defect)

defect
Not set
normal

Tracking

()

RESOLVED FIXED
mozilla40
Tracking Status
firefox38 --- fixed
firefox39 --- fixed
firefox40 --- fixed

People

(Reporter: markh, Assigned: markh)

References

(Blocks 1 open bug)

Details

Attachments

(1 file)

eg: http://www.abc.net.au/news/2015-04-01/local-surfboard-makers-are-being-wiped-out-foreign-drop-ins/6362240 http://www.theage.com.au/digital-life/games/surprise-pacman-is-munching-away-on-google-maps-this-april-fools-day-20150331-1mcfz6.html Both fail to offer reader view. Instrumenting isProbablyReaderable to dump when the node is too short shows: node too short: 50 node too short: 123 node too short: 160 node too short: 84 node too short: 115 node too short: 131 node too short: 72 node too short: 0 node too short: 32 node too short: 180 node too short: 169 node too short: 96 node too short: 79 node too short: 88 node too short: 94 node too short: 176 node too short: 90 node too short: 0 node too short: 133 node too short: 61 node too short: 186 node too short: 135 node too short: 137 node too short: 183 node too short: 175 node too short: 59 node too short: 55 node too short: 29 node too short: 0 node too short: 69 node too short: 15 node too short: 84 node too short: 119 node too short: 146 node too short: 39 node too short: 94 node too short: 161 node too short: 100 So there are plenty of paragraphs - just not enough over 200 chars. Tweaking the code to always allow the page to enter reader mode also shows the article looks fine. So I propose we drop the limit from 200 to 100.
Attachment #8586505 - Flags: review?(margaret.leibovic)
alternatively, instead of wanting 5 paras over a particular length, we just look for a total length and ignore the number of nodes - eg, 20 <p> elements with 90 chars each is probably still readable.
Blocks: 1147626
Comment on attachment 8586505 [details] [diff] [review] 0009-Bug-XXXXXXX-readermode-now-considers-paragraphs-with.patch Review of attachment 8586505 [details] [diff] [review]: ----------------------------------------------------------------- Looking at Readability.js, it seems like we only throw away paragraphs that have fewer than 25 characters: https://github.com/mozilla/readability/blob/master/Readability.js#L581 So the approach here sounds reasonable to me. I feel like Gijs has spent more time than me in this candidate logic, so it would be good to get his opinion as well. We just need to remember that the goal here is to show the button only in cases where Readability.js will actually succeed. So are there cases where we do have the requisite number of these short paragraphs but Readability won't return an article? Maybe if there's a lot of links in them? I would also be in support of continuing to test this with real testcases to converge on something that works reasonably well in the real world.
Attachment #8586505 - Flags: review?(margaret.leibovic)
Attachment #8586505 - Flags: review+
Attachment #8586505 - Flags: feedback?(gijskruitbosch+bugs)
Comment on attachment 8586505 [details] [diff] [review] 0009-Bug-XXXXXXX-readermode-now-considers-paragraphs-with.patch Review of attachment 8586505 [details] [diff] [review]: ----------------------------------------------------------------- LGTM, we should probably build a corpus of things that we use to decide reader-ability... e.g. it'd be sad if we now went and offered reader mode on youtube or $SEARCH_PROVIDER search results again, or something.
Attachment #8586505 - Flags: feedback?(gijskruitbosch+bugs) → feedback+
(In reply to :Gijs Kruitbosch from comment #3) > e.g. it'd be sad if we now went and offered reader mode on > youtube or $SEARCH_PROVIDER search results again, or something. FTR, this doesn't cause youtube, nor google or yahoo search results pages from offering reader mode, so I think I'll go ahead and push this later today.
Status: NEW → RESOLVED
Closed: 10 years ago
Resolution: --- → FIXED
Target Milestone: --- → mozilla40
Depends on: 1151087
Assignee: nobody → mhammond
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Created:
Updated:
Size: