Closed
Bug 1177619
Opened 9 years ago
Closed 8 years ago
Reader mode isn't offered on Blogger/Blogspot based blogs
Categories
(Toolkit :: Reader Mode, defect, P3)
Toolkit
Reader Mode
Tracking
()
RESOLVED
FIXED
mozilla52
Tracking | Status | |
---|---|---|
firefox52 | --- | fixed |
People
(Reporter: braiamp, Assigned: evanxd)
References
(Blocks 1 open bug, )
Details
(Whiteboard: [reader-mode-readability-algorithm][reader-mode-isreadable-detection])
Attachments
(1 file)
User Agent: Mozilla/5.0 (X11; Linux x86_64; rv:38.0) Gecko/20100101 Firefox/38.0 Iceweasel/38.0.1
Build ID: 20150526223604
Steps to reproduce:
Open any article/post on a blogger/blogspot blog with adecuate lenght.
Expected results:
Reader icon should appear as it does on WordPress based blogs (and any other content with some considerable lenght)
I'm not sure if this should block bug 1102450 (platform says Android?) or bug 558882.
Comment 2•9 years ago
|
||
Mozilla/5.0 (Windows NT 6.1; Win64; x64; rv:41.0) Gecko/20100101 Firefox/41.0
20150626030206
Status: UNCONFIRMED → NEW
Component: Untriaged → Reader Mode
Ever confirmed: true
Product: Firefox → Toolkit
Hardware: Unspecified → All
Comment 3•9 years ago
|
||
Also on desktop.
Updated•8 years ago
|
Priority: -- → P3
Updated•8 years ago
|
Whiteboard: [reader-mode-readability-algorithm][reader-mode-isreadable-detection]
Assignee | ||
Comment 4•8 years ago
|
||
Hi Gijs,
I've sent a pull request[1] to the GitHub repo to fix the issue.
The solution here is score "<br>" tags because blogger use "<br>" tags to typeset their articles not "<p>" tags.
Their DOM structure looks like
```
<div>
Dare to predict and suggest and improve your prediction based on time. Always try to add some more value to the user.<br />
Its a continuous process, reiterate the above point time to time.<br />
Let me know your thoughts how to get a wow ....<br />
<br />
Dare to predict and suggest and improve your prediction based on time. Always try to add some more value to the user.<br />
Its a continuous process, reiterate the above point time to time.<br />
Let me know your thoughts how to get a wow ....<br />
</div>
```
[1]: https://github.com/mozilla/readability/pull/310
Assignee | ||
Comment 5•8 years ago
|
||
Could you review it on the PR[1]?
Thanks.
[1]: https://github.com/mozilla/readability/pull/310
Flags: needinfo?(gijskruitbosch+bugs)
Assignee | ||
Updated•8 years ago
|
Assignee: nobody → evan
Assignee | ||
Comment 7•8 years ago
|
||
Try to figure out an algorithm[1] to detect blogger's articles.
[1]: https://github.com/mozilla/readability/pull/310#discussion_r82941697
Assignee | ||
Comment 8•8 years ago
|
||
Updated the patch[1] for review comments.
[1]: https://github.com/mozilla/readability/pull/310/commits/c9f287ac8ff4bdd782085231151455e34cdbd4d3
Assignee | ||
Comment 9•8 years ago
|
||
Updated patch[1] for review comments. And need to check the benchmarks of the patch. It might be OK if it's 5-10% slower.
[1]: https://github.com/mozilla/readability/pull/310/commits/9583038ec0822fb4ef04b897d3e4a7e306f92808
Assignee | ||
Comment 10•8 years ago
|
||
Did benchmark experiment and created a report[1] for it. The result is 7.46% slower after patched the patch[2]. Looks like the benchmarks of the patch could be acceptable, just like we discussed on GitHub[3].
I Run `npm run perf-reference` for 30 times for original Readability.js and patched[2] Readability.js and counted the average values to get the slower rate(7.46%).
[1]: https://docs.google.com/spreadsheets/d/1dxFYsg7bk8bkZqVd5BZuozcnjjXNWxlmhmkocX0TiCk
[2]: https://github.com/mozilla/readability/pull/310/commits/001295afa1254fec562c26414dce03206ef9fc2f
[3]: https://github.com/mozilla/readability/pull/310#discussion_r83250236
Assignee | ||
Comment 11•8 years ago
|
||
But there are still some errors with the patch[1]. Let me continue to fix it...
```
1) Test pages ars-1 jsdom "before all" hook:
TypeError: Cannot read property 'trim' of undefined
at Object.<anonymous> (Readability.js:1762:47)
at Array.some (native)
at Object.Readability._someNode (Readability.js:195:33)
at Object.Readability.isProbablyReaderable (Readability.js:1748:17)
at Context.<anonymous> (test/test-readability.js:62:35)
2) Test pages remove-extra-brs jsdom "before all" hook:
TypeError: Cannot read property 'trim' of undefined
at Object.<anonymous> (Readability.js:1762:47)
at Array.some (native)
at Object.Readability._someNode (Readability.js:195:33)
at Object.Readability.isProbablyReaderable (Readability.js:1748:17)
at Context.<anonymous> (test/test-readability.js:62:35)
3) Test pages replace-brs jsdom should probably be readerable:
AssertionError: expected false to deeply equal true
+ expected - actual
-false
+true
at Assertion.assertEql (node_modules/chai/lib/chai/core/assertions.js:473:10)
at Assertion.ctx.(anonymous function) [as eql] (node_modules/chai/lib/chai/utils/addMethod.js:40:25)
at Context.<anonymous> (test/test-readability.js:140:43)
4) Test pages salon-1 jsdom "before all" hook:
TypeError: Cannot read property 'trim' of undefined
at Object.<anonymous> (Readability.js:1762:47)
at Array.some (native)
at Object.Readability._someNode (Readability.js:195:33)
at Object.Readability.isProbablyReaderable (Readability.js:1748:17)
at Context.<anonymous> (test/test-readability.js:62:35)
5) Test pages tmz-1 jsdom "before all" hook:
TypeError: Cannot read property 'trim' of undefined
at Object.<anonymous> (Readability.js:1762:47)
at Array.some (native)
at Object.Readability._someNode (Readability.js:195:33)
at Object.Readability.isProbablyReaderable (Readability.js:1748:17)
at Context.<anonymous> (test/test-readability.js:62:35)
6) Test pages wapo-1 jsdom "before all" hook:
TypeError: Cannot read property 'trim' of undefined
at Object.<anonymous> (Readability.js:1762:47)
at Array.some (native)
at Object.Readability._someNode (Readability.js:195:33)
at Object.Readability.isProbablyReaderable (Readability.js:1748:17)
at Context.<anonymous> (test/test-readability.js:62:35)
7) Test pages wapo-2 jsdom "before all" hook:
TypeError: Cannot read property 'trim' of undefined
at Object.<anonymous> (Readability.js:1762:47)
at Array.some (native)
at Object.Readability._someNode (Readability.js:195:33)
at Object.Readability.isProbablyReaderable (Readability.js:1748:17)
at Context.<anonymous> (test/test-readability.js:62:35)
8) Test pages webmd-1 jsdom "before all" hook:
TypeError: Cannot read property 'trim' of undefined
at Object.<anonymous> (Readability.js:1762:47)
at Array.some (native)
at Object.Readability._someNode (Readability.js:195:33)
at Object.Readability.isProbablyReaderable (Readability.js:1748:17)
at Context.<anonymous> (test/test-readability.js:62:35)
9) Test pages webmd-2 jsdom "before all" hook:
TypeError: Cannot read property 'trim' of undefined
at Object.<anonymous> (Readability.js:1762:47)
at Array.some (native)
at Object.Readability._someNode (Readability.js:195:33)
at Object.Readability.isProbablyReaderable (Readability.js:1748:17)
at Context.<anonymous> (test/test-readability.js:62:35)
```
[1]: https://github.com/mozilla/readability/pull/310/commits/001295afa1254fec562c26414dce03206ef9fc2f
Assignee | ||
Comment 12•8 years ago
|
||
Fixed the failures on Comment 11[1].
[1]: https://github.com/mozilla/readability/pull/310/commits/b5602a4b715efb9211200af3b7660d0929005ed6
Assignee | ||
Comment 13•8 years ago
|
||
The benchmark report with the fix on Comment 12.
The result(with jsdom)[2] is 38.46% slower after patched the patch[1]. Additionally, the result with JSDOMParser[3] is 24.38% slower. Can check the detail on the reports[2][3].
Discussing the benchmark report on the GitHub's pull request...
P.S. The slower rate = (Patched Result - Original Result)/Original Result
[1]: https://github.com/mozilla/readability/pull/310/commits/b5602a4b715efb9211200af3b7660d0929005ed6
[2]: https://docs.google.com/spreadsheets/d/1s8CXR8QCdSZuAyrVvuYM8Zh-4aq6H9ZRP5g6wpKHIi0
[3]: https://docs.google.com/spreadsheets/d/1I4S0yQZ4-I9p3YI3NIP4jiuSasL3qDtp4QGEcKuqt7Y
Assignee | ||
Comment 14•8 years ago
|
||
Updated patch[1] for review comments. Continue to do benchmark experiment.
[1]: https://github.com/mozilla/readability/pull/310/commits/61af7898401869d49a0ffbb1e0c769b6b8893bd6
Assignee | ||
Comment 15•8 years ago
|
||
The benchmark report with the fix on Comment 14.
The result(with jsdom)[1] is 38.87% slower.
[1]: https://docs.google.com/spreadsheets/d/140SfoJ3YTkExhrsBgFSuL8SWTinPNY_kVl7TS7EfQ6U
Assignee | ||
Comment 16•8 years ago
|
||
Did a benchmark experiment with gecko for review comments[2].
Added two performance checkpoints[1] to evaluate the performance of patched `isProbablyReaderable` method. We could check how much time took by `isProbablyReaderable` method in the performance experiment. For sure, lower is better.
```
var t0 = content.performance.now();
var isProbablyReaderable = ReaderMode.isProbablyReaderable(content.document);
var t1 = content.performance.now();
dump("Call to isProbablyReaderable took " + (t1 - t0) + " milliseconds. \n");
```
Then randomly chose 4 articles (listed in the below) from yahoo, wikipedia, and blogger to test. Two articles have 0 `<br>` node, and two articles some `<br>` node (selected by `div > br`).
* Article 1: https://www.yahoo.com/tech/virtual-reality-systems-compared-144526498.html (with 0 `<br>` node)
* Article 2: https://en.wikipedia.org/wiki/Mozilla (with 0 `<br>` node)
* Article 3: http://www.stevenhoneyman.co.uk/2016/02/bga-on-budget.html (with 11 `<br>` nodes)
* Article 4: http://siliconexposed.blogspot.tw/2016/05/open-verilog-flow-for-silego-greenpak4.html (with 75 `<br>` nodes)
Did performance test for 10 times for each article and averaged them. The experiment result is
| | Article 1 | Article 2 | Article 3 | Article 4 |
|--------------------|-----------|-----------|-----------|-----------|
| Original | 4.121 ms | 4.009 ms | 4.139 ms | 3.924 ms |
| Patched | 4.201 ms | 4.199 ms | 4.397 ms | 5.077 ms |
| Time Increase Rate | 1.9% | 4.7% | 6.2% | 29.38 |
P.S. We could say `1.9%` slower here if Time Increase Rate is `1.9%`. You can check the experiment details here[2].
Then we could find out
* If there is no `<br>` node (can be selected by `div > br`), the performance could be almost same after patched.
* If there are too many(`> 75`) `<br>` node (can be selected by `div > br`), the performance is not good.
Performance issues might be occurred if there are too many `<br>` node (can be selected by `div > br`). But maybe we should accept it because
1. Seems there might be no other way to fix this issue.
2. Only webpages with many `<br>` nodes have the performance issue. The performance of others looks good.
[1]: https://gist.github.com/evanxd/fc5eb3e33732567957aa0c02baaffc00#file-tab-content-js-L9-L12
[2]: https://docs.google.com/spreadsheets/d/1vk7bU_Z0KG4F7OEGlVWX_SA8IrH94lwrNVcw6zYBR6E
[3]: https://github.com/mozilla/readability/pull/310#issuecomment-255344452
Assignee | ||
Comment 17•8 years ago
|
||
Assignee | ||
Comment 18•8 years ago
|
||
Weird, the failures seems not be caused by the patch. Investigate...
```
/home/travis/build/mozilla/readability/Readability.js
1846:14 error Expected indentation of 6 spaces but found 13 indent
1847:14 error Expected indentation of 6 spaces but found 13 indent
1848:14 error Expected indentation of 6 spaces but found 13 indent
1849:14 error Expected indentation of 6 spaces but found 13 indent
1850:14 error Expected indentation of 6 spaces but found 13 indent
1851:14 error Expected indentation of 6 spaces but found 13 indent
1852:14 error Expected indentation of 6 spaces but found 13 indent
```
Assignee | ||
Comment 19•8 years ago
|
||
Assignee | ||
Comment 20•8 years ago
|
||
Fix the eslint failures. The root cause is mentioned here[1].
[1]: https://github.com/mozilla/readability/pull/310#issuecomment-257528601
Assignee | ||
Comment 21•8 years ago
|
||
Comment 22•8 years ago
|
||
We should probably do an update of Readability in m-c (this is not automatic). It should basically just mean copying over Readability.js and JSDOMParser.js and fixing the disclaimer at the top of the file so that doesn't change, and then pushing that to try to make sure we haven't accidentally broken any m-c tests. Evan, can you do this?
Flags: needinfo?(evan)
Comment hidden (mozreview-request) |
Assignee | ||
Comment 25•8 years ago
|
||
I've sent a review request and triggered a try.
Comment 26•8 years ago
|
||
mozreview-review |
Comment on attachment 8806287 [details]
No bug - update readability from github repo, includes fix for Bug 1177619,
https://reviewboard.mozilla.org/r/89790/#review89218
Were these the only changes? I thought there was some other stuff in the pipeline that hadn't made it to m-c yet.
Attachment #8806287 -
Flags: review?(gijskruitbosch+bugs) → review+
Assignee | ||
Comment 27•8 years ago
|
||
Yes, other changes on https://github.com/mozilla/readability/pull/310/commits/be37f569d32ce4d7e59bcb30d52a0e8a2b130824 is about tests(run on Travis CI).
Comment 28•8 years ago
|
||
(In reply to Evan Tseng [:evanxd][:愛聞插低] from comment #27)
> Yes, other changes on
> https://github.com/mozilla/readability/pull/310/commits/
> be37f569d32ce4d7e59bcb30d52a0e8a2b130824 is about tests(run on Travis CI).
Yes, but the last change to readability.js was in April, and there were commits in github in July and August.
Comment hidden (mozreview-request) |
Comment hidden (mozreview-request) |
Comment hidden (mozreview-request) |
Assignee | ||
Comment 32•8 years ago
|
||
Got it. Misunderstood before.
I've added all changes into mozilla-central. Could you check the patch again? And this is the try[2].
If anything is OK, could you help land the patch?(I don't have level 3) Thanks.
One question, why don't we apply the commits in July and August before?
Thanks for reviewing.
[1]: https://reviewboard.mozilla.org/r/89790/diff/3#index_header
[2]: https://treeherder.mozilla.org/#/jobs?repo=try&revision=259cbb7ff1d7
Flags: needinfo?(gijskruitbosch+bugs)
Comment 33•8 years ago
|
||
(In reply to Evan Tseng [:evanxd][:愛聞插低] from comment #32)
> One question, why don't we apply the commits in July and August before?
Because I forgot. :-)
Also because there isn't a whole lot in there that makes a big difference to Firefox - some of the bugfixes are specifically for cases where people use the library with a "real" DOM implementation, and so the combination of using querySelectorAll or getElementsByTagName and removing some of the items while iterating plays out differently than it does with jsdom or JSDOMParser.
In any case, now that there's something we want for us, we should update all the files.
We should probably also add the .eslintrc. I just added a github commit to move it to .eslintrc.js like we do in mozilla-central. Can you add it to the commit and then ping me again so I can land? :-)
Flags: needinfo?(gijskruitbosch+bugs) → needinfo?(evan)
Comment hidden (mozreview-request) |
Comment 35•8 years ago
|
||
Pushed by gijskruitbosch@gmail.com:
https://hg.mozilla.org/integration/autoland/rev/ef3d294a1cf6
No bug - update readability from github repo, includes fix for Bug 1177619, r=Gijs
Assignee | ||
Comment 36•8 years ago
|
||
Hi Gijs,
We got the eslint errors on the try[1] after added `.eslintrc` file.
```
TEST-UNEXPECTED-ERROR | /home/worker/checkouts/gecko/toolkit/components/reader/AboutReader.jsm:121:2 | Missing semicolon. (semi)
TEST-UNEXPECTED-ERROR | /home/worker/checkouts/gecko/toolkit/components/reader/AboutReader.jsm:699:6 | Missing semicolon. (semi)
TEST-UNEXPECTED-ERROR | /home/worker/checkouts/gecko/toolkit/components/reader/AboutReader.jsm:710:10 | Missing semicolon. (semi)
TEST-UNEXPECTED-ERROR | /home/worker/checkouts/gecko/toolkit/components/reader/ReaderMode.jsm:292:8 | Missing semicolon. (semi)
```
We should write a patch for that.
And one more thing, looks like we also need to remove the below two lines in .eslintignore file[2] to let eslint check the files, right?
```
toolkit/components/reader/Readability.js
toolkit/components/reader/JSDOMParser.js
```
[1]: https://treeherder.mozilla.org/#/jobs?repo=try&revision=803d231136ab&selectedJob=30300566
[2]: http://searchfox.org/mozilla-central/source/.eslintignore#240-241
Flags: needinfo?(evan) → needinfo?(gijskruitbosch+bugs)
Comment 37•8 years ago
|
||
Getting sheriffs to land a followup to fix the bustage. Not taking the eslintignore off Readability.js and JSDOMParser.js because eslint still enables rules enabled in files higher up, meaning that those files generate errors for block-spacing and spaced-comment:
gijs-mbp:firefox-unified gkruitbosch$ ./mach eslint toolkit/components/reader/
/Users/gkruitbosch/dev/firefox-unified/toolkit/components/reader/JSDOMParser.js
1:1 error Expected space or tab after '/*' in comment. spaced-comment (eslint)
1024:69 error Expected space or tab after '/*' in comment. spaced-comment (eslint)
/Users/gkruitbosch/dev/firefox-unified/toolkit/components/reader/Readability.js
1:1 error Expected space or tab after '/*' in comment. spaced-comment (eslint)
301:17 error Requires a space after '{'. block-spacing (eslint)
301:60 error Requires a space before '}'. block-spacing (eslint)
637:3 error Expected space or tab after '/**' in comment. spaced-comment (eslint)
(this is without re-adding the disclaimers at the top so YMMV).
I'll file a followup to enable even more rules and take the exceptions out.
Flags: needinfo?(gijskruitbosch+bugs)
Comment 38•8 years ago
|
||
bugherder |
Status: NEW → RESOLVED
Closed: 8 years ago
status-firefox52:
--- → fixed
Resolution: --- → FIXED
Target Milestone: --- → mozilla52
Updated•8 years ago
|
QA Whiteboard: [good first verify]
You need to log in
before you can comment on or make changes to this bug.
Description
•