Bug 12274 (prefetch): RFE: Browser should prefetch LINK tag documents
Status: VERIFIED FIXED (Closed)
Opened 25 years ago, closed 22 years ago
Categories: Core :: Networking: Cache, enhancement, P5
Target Milestone: mozilla1.2alpha
People: Reporter: sirilyan, Assigned: darin.moz
Keywords: embed, topembed
Whiteboard: [parity-webtv]
Attachments (3 files, 4 obsolete files)
1. (deleted) patch | darin.moz: review+, rpotts: superreview+
2. (deleted) patch
3. (deleted) patch | gagan: review+, rpotts: superreview+, jesup: approval+
From a discussion in netscape.public.mozilla.wishlist, under the subject "browse
in a non perishable cache and silent browse ahead":
Matt Fletcher wrote:
>
> > Richard Shaw wrote:
> [snip non-perishable cache idea]
> > Also how about as an alternative to silent download, silent browse
> > ahead. When the user is doing nothing, this would use up the
> > bandwidth by browsing ahead through the links on a page putting them
> > in a temporary cache only adding them to a normal cache if they are
> > actually visited. This temporary cache should be emptied on exit.
>
> A lot of ISPs and websites would blacklist Mozilla if it did this. For
> instance, imdb.com had a section on how web 'accelerators' really slow
> down the web (i.e. they waste bandwidth on pages at which one never
> looks).
>
> A cvs-like system that checks frequently viewed pages for changes and
> updates those that did might work well with an expanded cache (whether
> your idea or another form). I believe this to be a more net friendly
> and practical option.
>
> Fletch
(The article Matt mentions can be found at http://www.imdb.com/irony.)
This has actually come up a few times in npm.wishlist and I couldn't find a
Bugzilla entry on it just now, so here it is. My admittedly imperfect memory
gives this list of the things mentioned for "look ahead" caching:
1. Any look ahead solution should respect the robots.txt exclusion standard, at
a bare minimum.
2. Ideally any look ahead will be history based, not link based. Preloading
pages you have already visited and will probably visit again is good; preloading
pages that you may or may not visit because they are linked from whatever's in
the browser window is bad.
3. An implementation of the LINK tag relationship values "prev" and "next"
might also cache the appropriate resources, regardless of whether they were seen
or not, in the hopes that someone who is going through a series of documents
will probably read them in series.
There are some good reasons to close this bug early, though:
1. There are already many third-party products that provide these capabilities
by acting as proxy servers, so this may be just another creeping feature for
Mozilla.
2. Implementing this *badly* for Mozilla would be worse than doing nothing at
all.
Somewhat related is bug #11644 asking for more control over what types of
resources get cached.
Reporter | ||
Updated•25 years ago
|
Assignee: gagan → nobody
Summary: [RFE] Browser should look ahead and cache [frequently visited] sites automatically → [HELP WANTED] Browser should look ahead and cache [frequently visited] sites automatically
Reporter | ||
Comment 1•25 years ago
|
||
Assigned to nobody@mozilla.org to flag as unclaimed feature request.
Comment 2•25 years ago
|
||
Note: WebTV precaches the document pointed to by <link rel="next"> elements, so
if we do anything it should probably be that.
Bulk move of all Cache (to be deleted component) bugs to new Networking: Cache
component.
Updated•25 years ago
|
Assignee: nobody → fur
Comment 4•25 years ago
|
||
->fur
Updated•25 years ago
|
Target Milestone: M20
Comment 5•25 years ago
|
||
We probably won't get to this for this release, but I'm going to leave it in
the list in case someone wants to volunteer.
Comment 6•25 years ago
|
||
Assigning fur's cache bugs to Gordon. He can split them up with davidm.
Updated•25 years ago
|
Keywords: helpwanted
Updated•25 years ago
|
Summary: [HELP WANTED] Browser should look ahead and cache [frequently visited] sites automatically → Browser should look ahead and cache [frequently visited] sites automatically
Comment 7•25 years ago
|
||
spam, changing qa contact from paulmac to tever@netscape.com on networking/RDF
bugs
QA Contact: paulmac → tever
marking rfe.
Summary: Browser should look ahead and cache [frequently visited] sites automatically → RFE: Browser should look ahead and cache [frequently visited] sites automatically
Moving to target milestone FUTURE. We'll take a look at it again after we ship
N6.
Comment 10•23 years ago
|
||
Changing component and summary according to hixie's suggestion.
Component: Networking: Cache → Parser
Summary: RFE: Browser should look ahead and cache [frequently visited] sites automatically → RFE: Browser should prefetch LINK tag documents
Comment 11•23 years ago
|
||
Reassigning to component & QA owners.
Assignee: gordon → harishd
QA Contact: tever → bsharma
Comment 12•23 years ago
|
||
This isn't a parser bug. Giving to nobody.
Assignee: harishd → nobody
Component: Parser → Networking: Cache
Priority: P3 → P5
Comment 13•23 years ago
|
||
Who puts in next, index and prev links if they don't want them
to be used? I don't think this is bad in any way, but very good
for the user. The browser should precache next, index and prev
in that order. index and prev will be in the cache 90% of the
time, so they are no big deal for the web load. But precaching
them will be good 90% of the remaining 10% of the time.
IMHO, this RFE is much more important than the link navigation
GUI. Any web page must define navigation in HTML anyway (most
browsers don't implement link navigation) :-(
However important, and I would *love* it for 1.0, it *is* an
RFE and it is *not* critical ;-)
Comment 14•23 years ago
|
||
I also want this feature, although it's a minor problem. For example,
http://www.cessna.com
has this problem. On other browsers including NS4.78, prefetch works; i.e.,
hovering the mouse cursor over the middle menu "Our Aircraft", "Owners.." etc. you
will see a different picture on the right. On Mozilla, it's very slow (and
it actually loads the picture each time via the Internet), but on NS4.78 it's very
quick and it doesn't use the net at all. Apparently NS4.78 prefetches pictures.
Updated•23 years ago
|
Whiteboard: parity-webtv
Comment 15•23 years ago
|
||
I don't think it would make sense on Google: most of the time, I just look at
the first page of results. (Google doesn't use link rel=next now, but they
might add it if browsers get good keyboard shortcuts to access link rel=next.)
On the other hand, this would make sense on most sites that use link rel=next,
and rel=index certainly makes sense. Overall I think this rfe would be good to
implement.
Comment 16•23 years ago
|
||
Prefetching is generally considered evil. To take an example close to heart, it
would approximately double the load on Bugzilla.
Whiteboard: parity-webtv → WONTFIX? parity-webtv
Comment 17•23 years ago
|
||
I don't think I've ever used Bugzilla's first/last/prev/next links, and I'm not
sure Bugzilla is even using those links correctly.
Comment 18•23 years ago
|
||
I have used them, and they are used correctly.
Assignee | ||
Comment 20•22 years ago
|
||
we recently discussed prefetching with the caveat that the additional downloads
would be low-priority, limited possibly to a single network connection, and
preempted by new page loads. with our existing support for partial cache
entries and byte range requests, we should be able to aggressively drop partial
prefetches without sacrificing all of the work done to prefetch.
*** This bug has been marked as a duplicate of 159044 ***
Status: NEW → RESOLVED
Closed: 22 years ago
Resolution: --- → DUPLICATE
Assignee | ||
Comment 21•22 years ago
|
||
oops, i closed the wrong bug :(
Status: RESOLVED → REOPENED
Resolution: DUPLICATE → ---
Assignee | ||
Comment 22•22 years ago
|
||
*** Bug 159044 has been marked as a duplicate of this bug. ***
Assignee | ||
Updated•22 years ago
|
Status: REOPENED → ASSIGNED
Target Milestone: Future → mozilla1.2alpha
Assignee | ||
Comment 23•22 years ago
|
||
I am pretty worried about comment #16... in our recent discussions, we really
haven't addressed server load as a potential concern. sure, any prefetching
done might improve browser performance, but the overall benefits really depend
on the likelihood that a prefetched document will actually be viewed. perhaps
this is reason enough to trigger off a different tag? that way, servers could
opt in. maybe a HTTP header would even be more appropriate, so as to ensure
that only server admins get to choose the appropriate load level due to
prefetching. or, maybe that shouldn't be any of mozilla's concern?! hmm...
Comment 24•22 years ago
|
||
IMO, Bugzilla should hide the list links unless you click "Go through this list
one bug at a time". The current list links are wrong more often than they're
right, take up a lot of space (especially if you haven't disabled the site
navigation bar), and break caching. Bugzilla's strangeness should not stop us
from making Mozilla faster on sites where Next actually means something.
Assignee | ||
Comment 25•22 years ago
|
||
updated status whiteboard and keywords
Whiteboard: WONTFIX? parity-webtv → [parity-webtv]
Assignee | ||
Comment 26•22 years ago
|
||
this patch implements a very simple version of prefetching. during page load
it collects all of the link rel=next hrefs. when the page finishes loading,
the prefetch code loads the first collected URL in the background. when that
completes, it loads the next URL. it does this until all URLs are loaded or
until the user clicks on a link or otherwise starts loading a new page.
i've added some code to kill off prefetched loads of content that either is
already from the cache or could not be taken from the cache without server
revalidation. this should hopefully avoid taxing servers that misuse link
rel=next (e.g., bugzilla).
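Illustrative sketch (not part of the patch): the queueing behaviour described in this comment can be modelled in a few lines of Python. The class and method names below are made up for illustration and the real implementation is C++ inside Mozilla. The sketch only shows the ordering and cancellation rules: hrefs are collected during page load, prefetched one at a time in document order, and everything still pending is dropped when a new page load starts.

```python
from collections import deque


class PrefetchQueue:
    """Toy model of the v0.1 behaviour; all names here are hypothetical."""

    def __init__(self):
        self._pending = deque()

    def collect(self, href):
        # Called during page load for each <link rel=next> href encountered.
        self._pending.append(href)

    def next_to_prefetch(self):
        # Called once the page has finished loading, and again after each
        # prefetch completes. Returns None when nothing is left to do.
        return self._pending.popleft() if self._pending else None

    def cancel_all(self):
        # Called when the user clicks a link or any new page load starts.
        self._pending.clear()


queue = PrefetchQueue()
for href in ("chapter2.html", "chapter3.html"):
    queue.collect(href)
first = queue.next_to_prefetch()   # "chapter2.html" is fetched first
queue.cancel_all()                 # user navigated: pending work is dropped
```

A single queue drained one URL at a time keeps background traffic to one request, which matches the low-priority intent discussed earlier in the bug.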
Assignee | ||
Comment 27•22 years ago
|
||
for those inside the netscape firewall, there's a link rel=next testcase at
http://unagi.mcom.com/~darinf/test0.html
Assignee | ||
Comment 28•22 years ago
|
||
this patch just tidies things up a bit. i've moved the prefetch code into the
uriloader module (i think that makes sense). this patch works for the 1.0
branch and trunk (makefile.win changes only apply to the branch, of course). i
haven't done the mac project changes yet.
Attachment #96109 -
Attachment is obsolete: true
Updated•22 years ago
|
Attachment #96364 -
Flags: review+
Comment 29•22 years ago
|
||
Comment on attachment 96364 [details] [diff] [review]
v0.1 patch
r=dougt
I spoke with darin about this patch and it is a great start. We/I would like
to add a few things like per window precache queues, an attribute on
nsICacheableChannel to determine if a channel can or should be precached, and
maybe some kind of UI that shows that there is precaching going on. Worse is
better. Let's get this in.
Assignee | ||
Comment 30•22 years ago
|
||
one change i'd like to make is to not prefetch links containing a query string.
those most likely correspond to dynamic content that will in most cases come
back without cache headers forcing us to hit the server when loading the content
for real. so, the value of prefetching such content is minimal. this nicely
addresses the bugzilla problem BTW :-)
Assignee | ||
Comment 31•22 years ago
|
||
ok, minor change... just blocked URLs w/ query strings, and touched up some
comments.
Attachment #96364 -
Attachment is obsolete: true
Assignee | ||
Comment 32•22 years ago
|
||
Comment on attachment 96936 [details] [diff] [review]
v0.2 patch
carrying forward r=dougt
Attachment #96936 -
Flags: review+
Assignee | ||
Comment 33•22 years ago
|
||
ok, this patch changes things a bit. instead of modifying the HTML content
sink to invoke nsIPrefetchService::PrefetchURI, the prefetch service now hooks
itself up as a tag observer (implementing nsIElementObserver). this way it'll
be notified whenever the HTML parser encounters a <link> tag. i've also added
code to make the prefetch service observe HTTP response headers. this would
allow, for example, a proxy cache to dynamically introduce prefetch requests
for content that is statistically very popular.
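Illustrative sketch (not part of the patch): the idea of a tag observer that is notified for every link element can be imitated with Python's standard `html.parser` module. This is only an analogy for the nsIElementObserver hook described here. The class name is hypothetical, and the rel=next filter follows the behaviour described elsewhere in this bug.

```python
from html.parser import HTMLParser


class LinkNextObserver(HTMLParser):
    """Collects href values of <link rel="next"> elements as they are parsed."""

    def __init__(self):
        super().__init__()
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        # The parser calls this for every start tag; only <link> is of interest.
        if tag != "link":
            return
        attr = dict(attrs)
        rels = (attr.get("rel") or "").lower().split()
        if "next" in rels and attr.get("href"):
            self.hrefs.append(attr["href"])


observer = LinkNextObserver()
observer.feed('<link rel="next" href="page2.html"><a href="other.html">x</a>')
# observer.hrefs now holds only the <link rel="next"> target, not the <a> href
```

Because the observer sits on the parser rather than in the content sink, it sees link elements as soon as they are tokenized.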
Assignee | ||
Updated•22 years ago
|
Attachment #96936 -
Attachment is obsolete: true
Comment 34•22 years ago
|
||
Comment on attachment 97232 [details] [diff] [review]
v1 patch
r=dougt
Attachment #97232 -
Flags: review+
Assignee | ||
Comment 35•22 years ago
|
||
replaced the switch statement with an if-else to simplify code. thx dougt!
Attachment #97232 -
Attachment is obsolete: true
Assignee | ||
Updated•22 years ago
|
Attachment #97242 -
Flags: review+
Comment 36•22 years ago
|
||
hey darin,
this looks really good.
the ownership model of the nsIURI within the nsPrefetchNode is kinda scary :-)
You might want to add a comment or two explaining it ;-)
also, could you leverage the loadgroup (associated with each document) to hold
the prefetch requests... that way, you won't need to worry about cancelling them
when a new document load is initiated...
i don't think this is a big deal... just a thought.
-- rick
Comment 37•22 years ago
|
||
Comment on attachment 97242 [details] [diff] [review]
v1.1 patch - revised per comments from dougt over AIM
sr=rpotts@netscape.com
Attachment #97242 -
Flags: superreview+
Assignee | ||
Comment 38•22 years ago
|
||
thx for the comments rick...
1- yeah, i'll add some comments on the URI ownership
2- i don't think using the loadgroup of the document would work. consider the
case of two documents. one does prefetching, and in the other a user clicks on
a link. now, loading the new page must contend with the prefetch traffic.
whereas what i'd really like is to kill off all prefetch traffic when any other
part of mozilla requests a page load.
Assignee | ||
Comment 39•22 years ago
|
||
patch landed on trunk... minus mac project changes. working on that now.
Comment 40•22 years ago
|
||
For the sake of people that have download limits on their accounts, I hope that
an option has been put into place to turn this on and off as per a user's
requirements!!
Comment 41•22 years ago
|
||
I sure hope so too! I don't like the idea of URL prefetching. I have scarce bandwidth. Let me turn this off and I'll shut up. :-)
Comment 42•22 years ago
|
||
Chris, aaronl: Only links in the form of <link rel="next" href="..."> are
prefetched, not <a href="...">.
And there is a preference; disable this with:
user_pref("network.prefetch-next", false);
Assignee | ||
Comment 43•22 years ago
|
||
the preference is only configurable from all.js for now. i probably should have
made it dynamic (i.e., settable via prefs.js). the other thing to note is that
prefetching will only occur when the browser has nothing else to do w/ the
network connection, and furthermore any other browser requests to load anything
will kill off any and all prefetch requests. we are also very selective in what
we'll prefetch. that is, we only prefetch documents that can be reused.
Comment 44•22 years ago
|
||
what is now prefetched? Static files linked with <link rel="next"/> only, right?
Assignee | ||
Comment 45•22 years ago
|
||
any http:// URL that does not contain a ?query string will be prefetched. if
the http headers indicate that the document would have to be fetched fresh
each-and-everytime then we'll cancel the prefetch. in other words, yes, only
static content will be prefetched.
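Illustrative sketch (not Mozilla code): the URL-level part of this rule can be written as a small predicate. The function name is hypothetical. The second stage described above, cancelling once the response headers show the document cannot be reused, is not modelled here.

```python
from urllib.parse import urlsplit


def may_prefetch(url):
    """Return True for http:// URLs that carry no query string."""
    parts = urlsplit(url)
    # Assumption: an empty query after a bare "?" is treated as "no query".
    return parts.scheme == "http" and not parts.query


may_prefetch("http://example.org/spec/chapter2.html")     # eligible
may_prefetch("http://example.org/show_bug.cgi?id=12274")  # rejected: query string
```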
Comment 46•22 years ago
|
||
Assuming it works as described above, I am impressed by the thought that has
gone into this. Let's hope it makes pages appear to render nice and fast!
Do we also support HTTP "Link" headers? They are supported for linking to CSS
style sheets, so presumably it should work for this too, but they are not
supported for site icons, so presumably we still don't have a single Link
service yet, and this means it probably won't work for this either...
Assignee | ||
Comment 47•22 years ago
|
||
the prefetch service is a HTTP header observer, which means that it will pick up
HTTP Link headers, but one downside is that HTTP only reports headers that are
new. IOW, loading a page from the cache will not trigger HTTP header observers.
that's one way in which Link: header differs from LINK tag. perhaps a unified
Link service would be the best way to resolve these differences.
Comment 48•22 years ago
|
||
We have several consumers of link elements and headers, each done slightly
differently (e.g. the stylesheet thing probes, rather than listening, and the
site icon code only does <link> elements). One day someone will snap and unify
all this, hopefully. :-)
Great to hear that Link: headers were taken into account though! How about <meta
http-equiv="link"> ?
Assignee | ||
Comment 49•22 years ago
|
||
nope... looks like the <meta HTTP-EQUIV="link" ...> would be missed :-(
oh well... will work on a follow up patch!
1) make prefetching a user preference
2) add support for <meta HTTP-EQUIV="link" CONTENT="...">
Assignee | ||
Comment 50•22 years ago
|
||
checked in mac project changes.. so this should be in all builds of mozilla 1.2
alpha :-)
Assignee | ||
Comment 51•22 years ago
|
||
fixed-on-trunk
see these bugs for the remaining issues:
http://bugzilla.mozilla.org/show_bug.cgi?id=166647
http://bugzilla.mozilla.org/show_bug.cgi?id=166648
Status: ASSIGNED → RESOLVED
Closed: 22 years ago → 22 years ago
Resolution: --- → FIXED
Comment 52•22 years ago
|
||
Would it be worth a special case to not stop the prefetching when the user
clicks on a link to the page that is currently being prefetched? It seems that
the probability of this happening is quite high on the rel=next case.
I understand that the partial cache entries and byte range requests help with
this somewhat, but the worst case is still tearing a tcp connection down,
re-establishing it and sending a new request, right?
Assignee | ||
Comment 53•22 years ago
|
||
marko, if the next page references a large document that is being prefetched,
and the user clicks a link to advance to the next page, then if we don't cancel
the prefetched load, loading of the next page will appear to stop on the
prefetched document. only when the prefetched document is entirely downloaded
will the document appear to snap into place/view. this happens because the
prefetch load and the new load are not at all tied together. instead, the
second one gets blocked waiting for access to the cache. so, while it might be
true that not canceling would give better page load times, the result is
something that most likely won't appeal to many users. i think it's better to
cancel the prefetch partway thru, so we can go ahead and display what we've
already got ASAP.
Comment 54•22 years ago
|
||
Wouldn't it be cool if instead of blocking, it noticed that a fetch was already
in progress for that resource, and simply hooked into it?
Assignee | ||
Comment 55•22 years ago
|
||
it would certainly be cool, but not at all easy to implement.
Comment 56•22 years ago
|
||
Concerning comment #38 (2):
Why kill this, rather than just suspending it until any other activity ends?
(example situation: webpage with search results. I open some of them in new
windows/tabs to see if they contain what I want, but if not - I want to check
the next page of results. Would be nice if it was there already.)
Concerning comment #47:
I wouldn't worry. If the document is already in cache, then there's a good
chance that either the "next" document will be there as well, or it will not be
needed, since if the user didn't follow it the first time, they won't follow it now either.
Worth consideration: Mozilla vs Law. "By clicking on this link you agree to
terms of conditions...". The server logs will say you followed this link even
though you didn't. Luckily webmasters rather don't implement such points as <link...
Also consider the possibility of maliciously formed pages using this feature to
exploit remote server vulnerabilities. More on topic:
http://www.phrack.org/show.php?p=57&a=10
Limiting the number of links to follow to a fairly low number (8?) would prevent
abusing this.
Comment 57•22 years ago
|
||
Bartosz wrote:
>Worth consideration: Mozilla vs Law. "By clicking on this link you agree to
>terms of conditions...". The server logs will say you followed this link even
>though you didn't. Luckily webmasters rather don't implement such points as
><link...
As I was reading through this bug I was thinking exactly the same thing. TBH I
was considering whether 'next' might be considered a 'submit' link (e.g. in a
wizard/druid of some kind), in which case preloading it would submit without the
user's express permission - as Bartosz said. But I'm sure there are worse ways
this could be exploited, much as I think it's a good idea ;|
At some level this might be considered a bug in the web-app, but if a user
switches to a gecko browser and suddenly finds that their web-email inbox is
empty due to mozilla precaching the 'remove this email' link, there could be
problems...
Comment 58•22 years ago
|
||
Why would anyone ever say
<link rel="next" href="./delete-mail">
...and would they ever do it without a query string? I doubt it.
Assignee | ||
Comment 59•22 years ago
|
||
here's a site that uses <link rel=next> without any cache control headers. the
pages appear to be served up using PHP without the use of query strings. as a
result we prefetch each next page, but then kill off the load once we see that
it doesn't have any cache headers. needless to say this is bad for a number of
reasons. perhaps the best recourse is to evangelize such sites. hmm...
http://www.gnome.org/start/2.0/releasenotes.html
Comment 60•22 years ago
|
||
I take it "Last-Modified" is one of those cache headers?
VERIFIED FIXED. This bug makes reading the HTML4 spec so much nicer. Thanks
Darin. You kick ass.
Status: RESOLVED → VERIFIED
Assignee | ||
Comment 61•22 years ago
|
||
yes, Last-Modified is good enough, because it lets us take a guess at how long
the document can be treated as fresh ((date - lastmodified)/10), and when the document
does expire, all we have to do is send a conditional If-Modified-Since request
to the server (allowing the server to say 304 not modified).
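Illustrative worked example (not Necko code): the (date - lastmodified)/10 estimate above can be tried with concrete values. The function name is made up, and `date` and `last_modified` are assumed to be the parsed Date and Last-Modified response headers.

```python
from datetime import datetime, timedelta


def heuristic_lifetime(date, last_modified):
    """One tenth of the document's age at the time the response was generated."""
    return (date - last_modified) / 10


# A page last modified 10 days before it was served gets a 1 day lifetime.
served = datetime(2002, 9, 15, 12, 0, 0)
modified = datetime(2002, 9, 5, 12, 0, 0)
lifetime = heuristic_lifetime(served, modified)   # timedelta(days=1)
expires_at = served + lifetime                    # 2002-09-16 12:00:00
```

After `expires_at`, the cached copy would be revalidated with a conditional If-Modified-Since request instead of being reused directly.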
Comment 62•22 years ago
|
||
Uhm... why do I see two downloads of the same document with ethereal? I'm
browsing the HTML specs and I see two full GETs for each page...! I'm using
build 2002091505.
Comment 63•22 years ago
|
||
With a current trunk build on win2k, I can't reproduce this (although instead of
using ethereal I was just checking breakpoints in the code and also checking the
server logs for a prefetch testcase modelled on the w3 TR documents). I see a GET
for the first document, then a prefetch of the second. On moving to the second
document, I see a prefetch of the third document, etc.
Note: if you click on the 'Next' link in those pages before the prefetch of that
next document was complete then this would cancel that pending prefetch and a
new (partial) GET request would be issued for the next document. But this is by
design.
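Illustrative sketch (not taken from the patch): a partial GET that resumes a cancelled prefetch uses the standard HTTP Range header, optionally guarded by If-Range so the server sends the whole document if it changed in the meantime. The helper name and its parameters are hypothetical, and the exact headers Mozilla sends are not shown in this bug.

```python
def resume_headers(bytes_cached, validator=None):
    """Build request headers asking only for the bytes not yet in the cache."""
    headers = {"Range": f"bytes={bytes_cached}-"}
    if validator:
        # validator is an ETag or Last-Modified value saved from the first response
        headers["If-Range"] = validator
    return headers


# 4096 bytes were prefetched before the user clicked 'Next':
resume_headers(4096, "Sat, 14 Sep 2002 10:00:00 GMT")
```

This is why dropping a prefetch partway through does not throw away the work already done.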
Assignee | ||
Comment 64•22 years ago
|
||
nick: you should also verify that ethereal isn't "lying to you" ... sometimes
(especially under windows) it'll report a packet twice. check the sequence
numbers to be certain you aren't seeing an ethereal bug. barring that there is
always the possibility that you are loading pages that do not allow caching.
after loading one of the pages, you could look at the page info for the page to
see if the server specified an expiration time. once that expiration time is
past, the prefetched document will have to be validated the next time it is loaded.
Assignee | ||
Comment 65•22 years ago
|
||
Assignee | ||
Comment 66•22 years ago
|
||
there's quite a bit of whitespace-noise in the real branch patch. review this
one instead.
Assignee | ||
Updated•22 years ago
|
Keywords: adt1.0.2, mozilla1.0.2
Assignee | ||
Updated•22 years ago
|
Alias: prefetch
Comment 67•22 years ago
|
||
While I agree that PRTimeToSeconds should really live in NSPR, we should
minimize having multiple definitions of the same function (I count 3 more within
Necko alone-- FTP has 2 and HTTP has another one) Can we at least consolidate
that into Necko's common util.h file (I forget the exact name-- it's been that
long!)
Other than this nitpick it looks great! r=gagan
Comment 68•22 years ago
|
||
Comment on attachment 102309 [details] [diff] [review]
1.0 branch patch w/ no whitespace changes
r=gagan
Attachment #102309 -
Flags: review+
Assignee | ||
Comment 69•22 years ago
|
||
gagan: thanks for the review. i talked to wtc about PRTimeToSeconds (can't
remember the bug no), and he decided that he didn't want to put that function in
NSPR because (1) no way to represent dates before 1970 and (2) no way to
represent dates after ~2130. i see his point, and i'm hoping to eventually
clean things up so that we can have one instance of this function.
Comment 70•22 years ago
|
||
Comment on attachment 102309 [details] [diff] [review]
1.0 branch patch w/ no whitespace changes
sr=rpotts@netscape.com
Attachment #102309 -
Flags: superreview+
Comment 71•22 years ago
|
||
Comment on attachment 102309 [details] [diff] [review]
1.0 branch patch w/ no whitespace changes
a=rjesup@wgate.com for 1.0 branch with the proviso that the default be that the
feature is disabled unless the pref is used to enable it, as per driver
discussions and Darin's agreement.
Attachment #102309 -
Flags: approval+
Updated•22 years ago
|
Keywords: mozilla1.0.2 → mozilla1.0.2+
Assignee | ||
Comment 72•22 years ago
|
||
default disabled sounds perfectly reasonable to me.
Comment 73•22 years ago
|
||
Discussed in bBird team meeting. We will give adt approval for this as soon as
test cases are attached to this bug. Please make sure Bindu is cc'ed on bug.
We need the checkin to happen by 10/16 COB. I will watch this bug and
immediately plus it when the test cases are attached.
We are going to reserve the right to ship with this turned off if problems are
discovered.
Assignee | ||
Comment 74•22 years ago
|
||
this URL contains links to some examples (sorry for the internal site):
http://unagi/~darinf/prefetch/testcases.html
Comment 75•22 years ago
|
||
Plussing per email from darin indicating lyecies' approval to ship pref'ed on, and
attached test cases.
Assignee | ||
Comment 77•22 years ago
|
||
marking fixed1.0.2 belatedly. patch landed 10/16.
Keywords: mozilla1.0.2+ → fixed1.0.2
Comment 79•22 years ago
|
||
Verified on 2002-10-25-branch build on Win 2K.
The test case works as expected.
Keywords: fixed1.0.2 → verified1.0.2