Closed Bug 457809 Opened 16 years ago Closed 15 years ago

Speculatively load images from preloading

Categories

(Core :: DOM: HTML Parser, defect)

x86
Linux
defect
Not set
normal

Tracking

()

RESOLVED FIXED

People

(Reporter: mrbkap, Assigned: bjarne)

References

Details

Attachments

(2 files, 14 obsolete files)

(deleted), patch
dbaron
: review+
Details | Diff | Splinter Review
(deleted), patch
bzbarsky
: review+
bzbarsky
: superreview+
Details | Diff | Splinter Review
Currently, the preload code only preloads scripts. We should preload images too. jst suggested creating new nsHTMLImageElements since they already contain the code needed to do this (and this is basically what web pages do currently anyway, via new Image()). Alternative suggestions or patches welcome!
Hmm. You could hit the imgILoader yourself, via nsContentUtils, with possibly lower overhead. You'd still need someplace to stick the imgIRequest, though (same issue as where to stick the nsHTMLImageElement).
Attached patch V 1.0 preloading both images and stylesheets (obsolete) (deleted) — Splinter Review
Starting-point. With patch applied, FF fails the normal number of mochitests (40-100), but this is always the case on my build machine... The testpage on http://stevesouders.com/efws/stylesheet-before-iframe.php loads significantly faster with the patch applied. In order to have automatic (mochi)tests for this patch, httpd.js must (at least) be improved to accept requests in parallel. A few issues - avoid preloading an alternate stylesheet (not 100% sure how to detect this) - add support for "legacy html"-cases, if any (current code follows spec) - check that the dummy-observer passed to CSSLoader->LoadSheet is free'd
Assignee: nobody → bjarne
Status: NEW → ASSIGNED
Attachment #368541 - Flags: review?(bzbarsky)
Not sure who should review this... trying bz first. Feel free to change reviewer.
Blocks: 457810
Comment on attachment 368541 [details] [diff] [review] V 1.0 preloading both images and stylesheets Blake should review the parser parts; I can do the rest
Attachment #368541 - Flags: superreview?(bzbarsky)
Attachment #368541 - Flags: review?(mrbkap)
Attachment #368541 - Flags: review?(bzbarsky)
(In reply to comment #2) > - check that the dummy-observer passed to CSSLoader->LoadSheet is free'd From it's doc, CSSLoaderImpl::SheetComplete should do this but I have not found the code which does it.
> - avoid preloading an alternate stylesheet (not 100% sure how to detect this) Your rel check is already doing that, no? At least it'll avoid preloading sheets with rel "alternate stylesheet". If someone has a rel="stylesheet" but a title and it's not the preferred title, you'd still preload, but I think that's fine. You could use GetPreferredSheet() on the CSSLoader to filter out that case if you cared. > - add support for "legacy html"-cases, if any (current code follows spec) I wouldn't worry about it, honestly. > - check that the dummy-observer passed to CSSLoader->LoadSheet is free'd A bigger problem is that you pass in a XPCOM object with a refcount of 0. Please assign the new() result into an nsRefPtr in the caller instead of assigning to a raw pointer? Your StyleSheetLoaded args are mis-indented. Your nsDummyCSSLoaderObserver needs to use NS_IMPL_ISUPPORTS1(nsDummyCSSLoaderObserver, nsICSSLoaderObserver), no? I doesn't need a threadsafe isupports impl; nix that comment. We might want to file a followup on an API for the LoadSheet() thing that lets you pass in the charset too.
Attached patch V1.1 honoring comments from reviewer (obsolete) (deleted) — Splinter Review
(In reply to comment #6) > > - avoid preloading an alternate stylesheet (not 100% sure how to detect this) > > Your rel check is already doing that, no? At least it'll avoid preloading > sheets with rel "alternate stylesheet". If someone has a rel="stylesheet" but > a title and it's not the preferred title, you'd still preload, but I think > that's fine. You could use GetPreferredSheet() on the CSSLoader to filter out > that case if you cared. Ok. The issue was really uncertainty about what makes a stylesheet relative. I'll leave it as it is. It can be tuned later if necessary. > > - add support for "legacy html"-cases, if any (current code follows spec) > > I wouldn't worry about it, honestly. I'm not particularly worried. :) Will leave it as it is. > > - check that the dummy-observer passed to CSSLoader->LoadSheet is free'd > > A bigger problem is that you pass in a XPCOM object with a refcount of 0. > Please assign the new() result into an nsRefPtr in the caller instead of > assigning to a raw pointer? Should be better now. Any hints about how/where to check if this object is free'd is appreciated. (If it's necessary, that is.)
Attachment #368541 - Attachment is obsolete: true
Attachment #368579 - Flags: superreview?(bzbarsky)
Attachment #368579 - Flags: review?(mrbkap)
Attachment #368541 - Flags: superreview?(bzbarsky)
Attachment #368541 - Flags: review?(mrbkap)
It's freed when its reference count goes to zero, which happens when the SheetLoadData referencing it has its destructor called. So no need to worry about it on your end.
Comment on attachment 368579 [details] [diff] [review] V1.1 honoring comments from reviewer >diff -r 82b723917b6b content/base/src/nsDocument.cpp >+void >+nsIDocument::PreLoadImage(nsIURI* uri) { >+ Nitpicks: Move the { into the first column on the following line. Also, nuke the empty line. >+ nsCOMPtr<imgIRequest> aRequest; Nit: aFoo names should be parameters, so this should just be named "request". >+ nsContentUtils::LoadImage(uri, >+ this, >+ NodePrincipal(), >+ GetDocumentURI(), // should be ok for referrer >+ nsnull, // no observer >+ nsIRequest::LOAD_NORMAL, >+ getter_AddRefs(aRequest)); >+ >+ >diff -r 82b723917b6b parser/htmlparser/src/nsParser.cpp >+ case nsSpeculativeScriptThread::STYLESHEET: >+ nsCOMPtr<nsDummyCSSLoaderObserver> obs = new nsDummyCSSLoaderObserver(); nsCOMPtr<> wants an interface, nsRefPtr<> the concrete type. So this should either be: nsCOMPtr<nsICSSLoaderObserver> obs or nsRefPtr<nsDummyCSSLoaderObserver> obs >+ doc->CSSLoader()->LoadSheet(uri, doc->NodePrincipal(), obs); >+ break; >+ if (ptype != NONE) { >+ Nit: nuke the trailing line. While I appreciate wanting to use one loop here, I think it's unclear to use the same loop for <link> and the others, so I'd break this up into two separate branches "if (link) ...do rel, href... else ...do src, type...".
Attachment #368579 - Flags: review?(mrbkap) → review-
(In reply to comment #9) > While I appreciate wanting to use one loop here, I think it's unclear to use > the same loop for <link> and the others, so I'd break this up into two separate > branches "if (link) ...do rel, href... else ...do src, type...". Assuming that mTokenizer->PopToken() in fact pops a token and moves to the next, using two loops with fall-through from first to second may lead to a situation where the first loop skips a token which should have been handled by the second loop. A variant is to let the "link-loop" handle *all* attributes for link (rel, href, type and charset), but then we duplicate code for parsing 'charset' and 'type'. My suggestion is to keep the approach in the patch, unless you really feel strong about this. The other comments will be taken care of in an upcoming patch. As you can see, my xpcom-fu is weak... :)
(In reply to comment #10) > 'charset' and 'type'. My suggestion is to keep the approach in the patch, > unless you really feel strong about this. That's fair enough. I forgot how much overlap there was between the two cases.
Attached patch V1.2 (obsolete) (deleted) — Splinter Review
Update honoring comment #9 and comment #11.
Attachment #368579 - Attachment is obsolete: true
Attachment #368879 - Flags: superreview?(bzbarsky)
Attachment #368879 - Flags: review?(mrbkap)
Attachment #368579 - Flags: superreview?(bzbarsky)
Comment on attachment 368879 [details] [diff] [review] V1.2 >+ nsIRequest::LOAD_NORMAL, >+ getter_AddRefs(request)); >+ >+ Still a couple of extra newlines here. r=mrbkap with them nuked. Thanks!
Attachment #368879 - Flags: review?(mrbkap) → review+
Attached patch V1.3 - fewer newlines (obsolete) (deleted) — Splinter Review
*boom*
Attachment #368879 - Attachment is obsolete: true
Attachment #368966 - Flags: superreview?(bzbarsky)
Attachment #368879 - Flags: superreview?(bzbarsky)
Awesome progress here! I think we should make a case to take this for 1.9.1 as well, once the patch has sr and is in on the trunk.
Flags: wanted1.9.1+
Attachment #368966 - Flags: superreview?(bzbarsky) → superreview+
Comment on attachment 368966 [details] [diff] [review] V1.3 - fewer newlines Looks good. Please file a followup bug on being able to specify a charset for LoadSheet()?
And cc me on that bug, of course.
(In reply to comment #16) > (From update of attachment 368966 [details] [diff] [review]) > Please file a followup bug on being able to specify a charset for > LoadSheet()? Bug #485012 has been filed.
Keywords: checkin-needed
Pushed http://hg.mozilla.org/mozilla-central/rev/b47f0de93a82 Once bug 396226 is fixed we should be able to write a test for this.
Status: ASSIGNED → RESOLVED
Closed: 16 years ago
Depends on: 396226
Flags: in-testsuite?
Keywords: checkin-needed
Resolution: --- → FIXED
Comment on attachment 368966 [details] [diff] [review] V1.3 - fewer newlines This is landed on the trunk already. Nominating this for 1.9.1 as this is pretty high return per buck as far as perceived (and actual) performance goes, especially on slow networks. We'll probably want this to bake for a few days, but I'm nominating it now to make sure it doesn't get lost...
Attachment #368966 - Flags: approval1.9.1?
Comment on attachment 368966 [details] [diff] [review] V1.3 - fewer newlines >diff -r 1841618a90ab parser/htmlparser/src/nsParser.cpp >--- a/parser/htmlparser/src/nsParser.cpp Mon Mar 23 23:06:14 2009 +0100 >+++ b/parser/htmlparser/src/nsParser.cpp Mon Mar 23 23:08:32 2009 +0100 >+/** >+ * Used if we need to pass an nsICSSLoaderObserver as parameter, >+ * but don't really need it's services Nit: its >+ nsCAutoString aReferrer; >+ switch(pe.type) { aReferrer is unused. Style nit: add a space between switch and the open parenthesis. >+ // We currently handle the following element/attribute combos : >+ // <link rel="stylesheet" href= charset= type> >+ // <img src= > >+ // <script src= charset= type=> >+ if (ptype != NONE) { >+ // Extract attributes > for (; i < attrs; ++i) { > CAttributeToken *attr = static_cast<CAttributeToken *>(mTokenizer->PopToken()); > NS_ASSERTION(attr->GetTokenType() == eToken_attribute, "Weird token"); >+ >+ if (ptype == STYLESHEET && attr->GetKey().EqualsLiteral("rel")) { >+ // early break from loop if this is not a stylesheet > if (!attr->GetValue().EqualsLiteral("stylesheet")) { >+ ptype = NONE; > IF_FREE(attr, &mTokenAllocator); > break; Don't you end up calling PopToken() one too many times in the clean-up loop below due to not incrementing |i|? Also, if an href attribute was seen before you break here, the src variable will have been set, and will incorrectly get handled below. > } >+ } else if (ptype == STYLESHEET && attr->GetKey().EqualsLiteral("href")) { > src.Assign(attr->GetValue()); >+ } else if (ptype != STYLESHEET && attr->GetKey().EqualsLiteral("src")) { > src.Assign(attr->GetValue()); > } else if (attr->GetKey().EqualsLiteral("charset")) { > charset.Assign(attr->GetValue()); > } else if (attr->GetKey().EqualsLiteral("type")) { > elementType.Assign(attr->GetValue()); > } > IF_FREE(attr, &mTokenAllocator); > } > >+ // Add to list if we found the src > if (!src.IsEmpty()) { > AddToPrefetchList(src, charset, elementType, ptype); > } > } > > for (; i < attrs; ++i) { > CToken *attr = mTokenizer->PopToken(); > if (!attr) { > break; > } > NS_ASSERTION(attr->GetTokenType() == eToken_attribute, "Weird token"); > IF_FREE(attr, &mTokenAllocator); > }
I backed this out; it caused a consistent mochitest failure on Mac and a sometimes one on Windows (on the same test): 42868 ERROR TEST-UNEXPECTED-FAIL | /tests/layout/base/tests/test_bug416896.html | Didn't find the linked rule? - got false, expected true
Status: RESOLVED → REOPENED
Resolution: FIXED → ---
> Don't you end up calling PopToken() one too many times in the clean-up > loop below due to not incrementing |i|? Which I guess isn't a big deal thanks to the |if (!attr) break;|
(In reply to comment #23) > > Don't you end up calling PopToken() one too many times in the clean-up > > loop below due to not incrementing |i|? > > Which I guess isn't a big deal thanks to the |if (!attr) break;| You got ahead of me there... :) > Also, if an href attribute was seen before you break here, the src variable > will have been set, and will incorrectly get handled below. This one is worse... A quick and dirty fix is to check for ptype != NONE && !src.isEmpty() below. Makes the whole thing more conservative and we might miss some opportunities, but at least nothing wrong should happen. Could also clear src before breaking, I guess... Any thoughts?
OK. So the problem happens whenever we actually prefetch that data: sheet. And it happens because the test is testing the inIDOMUtils getCSSStyleRules method, which has a known issue with cloned sheets: it doesn't force rules to have the right parents. See bug 253354 for a description of the inspector problem with CSS rules. I'll see what I can do in terms of fixing the test to work right.
With our current httpd.js, any preloading might cause problems if a test does not run in an onload-handler, or..? I mean, the preload-request is queued up by httpd.js, and the test-code might run before the "preload" receives the answer. This may explain random behaviour. I modified the test-part of test_bug416896.html to run in the body.onload handler, and it succeeds consistently if you clear the cache between each run. I guess the need to clear the cache may arise from the issue in the previous comment?
The ptype != NONE thing works too, since src won't be assigned into if ptype starts out as NONE, but I think the code is easier to understand if you clear src on failure. You'll also need to deal with the case where you have a <link> with href but no rel (unless we never get here in that case?) Do we really need to keep track of |i < attrs| or can we simplify the code a bit (in both loops) by just depending on PopToken() returning nsnull when the list is empty?
Attached patch Proposed text fix (deleted) — Splinter Review
Bjarne, want to update your patch to jag's comments?
Attachment #369197 - Flags: review?(dbaron)
> and the test-code might run before the "preload" receives the answer Only if the test is buggy, since in that case it might have run before the normal load happened too. > I guess the need to clear the cache may arise from the issue in the previous > comment? Clearing the cache just changes timing in your case so that the preload doesn't happen for the data: sheet, I suspect. Over here, I could reproduce the issue about 1 out of every 5 loads, but every single time when the data: sheet got preloaded (I added some logging to check this).
Attached patch V1.4 new logic in parser (obsolete) (deleted) — Splinter Review
Jag pointed out a couple of things which triggered some changes to the logic in ProcessToken() - it should be simpler now. Requesting a new round of reviews...
Attachment #368966 - Attachment is obsolete: true
Attachment #369282 - Flags: superreview?(bzbarsky)
Attachment #369282 - Flags: review?(mrbkap)
Attachment #368966 - Flags: approval1.9.1?
Comment on attachment 369282 [details] [diff] [review] V1.4 new logic in parser Nice! Looks much better. This won't free the tokens when ptype == NONE. And I still think you should just drop the |i < attrs| test in favour of testing the return value of |PopToken()|.
> This won't free the tokens when ptype == NONE. Hmm... do I need to free tokens which I have not popped? I believe everything popped in this code are also freed, and the last line of the function frees the in-parameter. Isn't this sufficient, or am I missing something? > And I still think you should just drop the |i < attrs| test in favour of > testing the return value of |PopToken()|. I have tried this, but a loop like while (attr = PopToken()) seems to continue popping tokens beyond the attributes, eventually crashing FF. It might be possible to add a check in the loop to break if token-type == etoken_end or something, but I don't think this is any nicer. :) I'm open to suggestions on this one...
Not sure. Maybe Boris can shed some light on whether this code needs to manually free the tokens. I was going by the old code and your previous patch. And I guess |i < attr| it is.
The tokenizer will free tokens that are left in it, I believe. But Blake's the expert on that stuff. And the big question is whether it's OK to leave tokens in it at all.
(In reply to comment #32) > Hmm... do I need to free tokens which I have not popped? I believe everything > popped in this code are also freed, and the last line of the function frees the > in-parameter. Isn't this sufficient, or am I missing something? Unfortunately, in the parser, the start tag token doesn't know about its attributes (except for knowing the attribute count). The code in the patch won't leak because the extra attributes will be consumed by the loop in nsSpeculativeScriptThread::Run() and subsequently ignored by ProcessToken (which switches on the token type). It seems like good hygiene to pop the rest of the attributes when we have a start (or end) tag, though. > It might be possible to add a check in the loop to break if token-type == > etoken_end or something, but I don't think this is any nicer. :) I'm open to > suggestions on this one... I don't know why this crashes without seeing the code. I didn't think it was possible to have both a start and end tag in the tokenizer at the same time (given the loop structure in nsSpeculativeScriptThread::Run), but I guess you could run into trouble if we stop tokenizing and then start again with some tokens left over in the tokenizer.
> It seems like good hygiene to pop the rest > of the attributes when we have a start (or end) tag, though. Ok, but I don't understand which attributes that would be. The current loop iterates <attribute-count> times and pops/frees one token in each iteration. Wouldn't this consume all attributes attached to the given start tag? > I don't know why this crashes without seeing the code. I didn't think it was > possible to have both a start and end tag in the tokenizer at the same time > (given the loop structure in nsSpeculativeScriptThread::Run), but I guess you > could run into trouble if we stop tokenizing and then start again with some > tokens left over in the tokenizer. I'd prefer to keep the |i < attrs| loop if that's ok with you, in order to not worry about the issues above. (I can provide the while-loop I tried out if necessary, but I don't think it will lead to much improvement.)
(In reply to comment #36) > > It seems like good hygiene to pop the rest > > of the attributes when we have a start (or end) tag, though. > > Ok, but I don't understand which attributes that would be. The current loop > iterates <attribute-count> times and pops/frees one token in each iteration. > Wouldn't this consume all attributes attached to the given start tag? Remember that we skip this loop when ptype == NONE. Should we put the Pop+free loop in the else case?
Attached patch V1.5 with pop/free-loop for all tags (obsolete) (deleted) — Splinter Review
> Remember that we skip this loop when ptype == NONE. Should we put the Pop+free > loop in the else case? Ahh... now I see what you mean. Sorry for being thick-headed... :) My thinking was to either process the tag and pop all attributes, or drop the tag and not worry about attributes at all. However, adding a loop to pop all attributes for any tag is fine with me.
Attachment #369282 - Attachment is obsolete: true
Attachment #369488 - Flags: superreview?(bzbarsky)
Attachment #369488 - Flags: review?(mrbkap)
Attachment #369282 - Flags: superreview?(bzbarsky)
Attachment #369282 - Flags: review?(mrbkap)
Attached patch V1.6, even more polished (obsolete) (deleted) — Splinter Review
Changed name of variable and avoided if-test in cleanup-loop
Attachment #369488 - Attachment is obsolete: true
Attachment #369493 - Flags: superreview?(bzbarsky)
Attachment #369493 - Flags: review?(mrbkap)
Attachment #369488 - Flags: superreview?(bzbarsky)
Attachment #369488 - Flags: review?(mrbkap)
Attachment #369493 - Flags: superreview?(bzbarsky) → superreview+
Attachment #369493 - Flags: review?(mrbkap) → review+
Keywords: checkin-needed
Note that this can't land without the test fix that's waiting on dbaron's review.
Attachment #369197 - Flags: review?(dbaron) → review+
Comment on attachment 369197 [details] [diff] [review] Proposed text fix Could you add a comment here to point to the bugs that this is working around? (e.g., bug 253354?)
Pushed http://hg.mozilla.org/mozilla-central/rev/67ddbe030ab6 to fix the test. Fixed the patch to actually apply to tip, and changed the nsIDocument IID (should have noticed this before). Pushed that as http://hg.mozilla.org/mozilla-central/rev/c428086801d0 Please make sure to rev the iid on the branch if/when landing this on branch.
Status: REOPENED → RESOLVED
Closed: 16 years ago16 years ago
Keywords: checkin-needed
Resolution: --- → FIXED
Backed out again on suspicion of causing the Tp3 regression.
Status: RESOLVED → REOPENED
Resolution: FIXED → ---
We need to fix bug 486044 before we reland the CSS part of this. And we should have some tests that would have caught that problem (ones where the stylesheet can't be HTTP cached).
Depends on: 486044
And indeed, this caused the Tp3 hit. That might be due to bug 486044 but might also be a weird interaction with the image cache... It might make the most sense to land the image and stylesheet parts separately (with the stylesheet part blocked on bug 486044).
Attached patch V1.7 only preloading images (obsolete) (deleted) — Splinter Review
Part of patch needed to preload images (see comment #45). Requesting another round of reviews just in case...
Attachment #369493 - Attachment is obsolete: true
Attachment #370194 - Flags: superreview?(bzbarsky)
Attachment #370194 - Flags: review?(mrbkap)
Attachment #370194 - Flags: superreview?(bzbarsky) → superreview+
No longer depends on: 486044
Attachment #370194 - Flags: review?(mrbkap) → review+
Should we request check-in for this (ref dependency) to let it bake in trunk?
Keywords: checkin-needed
Status: REOPENED → RESOLVED
Closed: 16 years ago16 years ago
Keywords: checkin-needed
Resolution: --- → FIXED
And backed out again, because it caused the same Tp regression. See http://graphs-new.mozilla.org/#tests=[{%22test%22:%222%22,%22branch%22:%221%22,%22machine%22:%2265%22},{%22test%22:%222%22,%22branch%22:%221%22,%22machine%22:%2252%22}]&sel=1238390870,1238716140 for a graph showing both regressions. Joe, any idea what might be going on here? Bjarne, do you want to try getting in just the stylesheet change instead?
Status: RESOLVED → REOPENED
Resolution: FIXED → ---
Calling nsContentUtils::LoadImage will probably also start a decode, which may just slow things down. You could try to start an HTTP load just so it's in the disk or memory cache, because that's the slow part anyways.
Well... if we're making this call, that means we're about (as soon as the script we're blocked on finishes loading) to call LoadImage() anyway. We really do want the image data (decoded, etc) in this case. Or we will very soon. That said, does LoadImage start a decode even if there are no imgRequestProxies live when the data starts coming in? > You could try to start an HTTP load just so it's in the disk or memory cache The problem there is that if the "real" load (the one the nsImageLoadingContent will start) starts before this one finishes, we won't be able to coalesce the loads and will load the image twice. I suspect it's strictly worse than what we have right now, or than what this patch implemented.
I meant to ask about this earlier but it slipped my mind: Do you think using nsIRequest::LOAD_BACKGROUND when calling nsContentUtils::LoadImage() would make a difference here? I guess using it wouldn't hurt in any case..? Also, where can I find the pages/scripts involved in Tp3?
> I guess using it wouldn't hurt in any case..? It might; you'd have to check imagelib for sure, but it might make onload fire before the image loads (even though the non-prefetch load won't pass that flag), depending on what flags imagelib passes to necko. > Also, where can I find the pages/scripts involved in Tp3? You'll have to talk to alice about that; they're not publicly available... :(
No longer blocks: 457810
Hmm, it's been a while since there's been an update here. Any time to work on this or are we not going to make 3.5?
Attached patch V1.7 unbitrotted (obsolete) (deleted) — Splinter Review
Patched up the patch to apply cleanly on trunk. (In reply to comment #54) > It might; you'd have to check imagelib for sure, but it might make onload fire > before the image loads (even though the non-prefetch load won't pass that > flag), depending on what flags imagelib passes to necko. imgLoader::LoadImage() merges the load-flag param with some default before passing it to necko. Thus, LOAD_BACKGROUND from the preload thread should be passed to necko. Not sure how this will influence the onload though... any hints? An interesting point about imgLoader is how the image-cache works with multiple threads... I've been reading and re-reading the code, but it's still not clear to me how a second request (on another thread) for an image in the process of being preloaded, will (or should) behave. I'll figure it out eventually, but any pointers to bugs/tests or a knowledgeable person would probably be helpful. imgLoader does *not* have its own thread, right? The imgLoader::LoadImage() call would run on the preload-thread, no? > > Also, where can I find the pages/scripts involved in Tp3? > > You'll have to talk to alice about that; they're not publicly available... :( Talked to Alice and was referred to the standalone talos-test. The issue here is that in standalone talos, all pages seems to be loaded via file://, and I'm not sure how preloading images can influence this kind of loading. In other words : I'm not convinced about the relevance. The big question becomes : does Tp3 load pages also via file:// ? Anyone with insight in this?
Attachment #370194 - Attachment is obsolete: true
Attachment #376313 - Flags: review?(bzbarsky)
> Not sure how this will influence the onload though. LOAD_BACKGROUND loads do not block onload. > how the image-cache works with multiple threads It doesn't; all image loading happens on the main thread. > The imgLoader::LoadImage() call would run on the preload-thread, no? I'd hope not! I thought all that stuff was proxied to the main thread. The script loader and cssloader are not threadsafe either. Tp3 runs over http:// (and standalone talos can run that way too, if desired; needs a local apache install). It does have about 0 latency, though.
(In reply to comment #56) > imgLoader does *not* have its own thread, right? The imgLoader::LoadImage() > call would run on the preload-thread, no? It shouldn't -- we proxy all preload requests to the main thread and kick of the load from there, don't we (that's the point of nsPreloadURIs)?
(In reply to comment #56) > it's still not clear > to me how a second request (on another thread) for an image in the process of > being preloaded, will (or should) behave. Well, first of all, imagelib's interaction with threads might be sort of iffy. We explicitly aren't threadsafe at the moment, so puts into the cache are probably going to make the hash table sad. Anyways, that should be possible to be fixed if it's what we want to do, though it's news to me. Most important is imgLoader.cpp:985 and imgLoader.cpp:689, which more or less mean that if there's currently an image loading on a different thread, we'll load it *again* in a cache entry that is orphaned. If it's done being loaded, we're happy using it on a different thread. > I'll figure it out eventually, but > any pointers to bugs/tests or a knowledgeable person would probably be helpful. (I'm the imglib maintainer - feel free to ask me! joe on irc and :JOEDREW on bugzilla!) > imgLoader does *not* have its own thread, right? The imgLoader::LoadImage() > call would run on the preload-thread, no? imgLoader does not have its own thread, correct.
> LOAD_BACKGROUND loads do not block onload. Ok, but a more interesting insight would be how preloading w/LOAD_BACKGROUND influences or interacts with the "real" image-load (with respect to onload). Any idea...?
(In reply to comment #58) > It shouldn't -- we proxy all preload requests to the main thread and kick of > the load from there, don't we (that's the point of nsPreloadURIs)? Thanks! In fact, nsSpeculativeScriptThread::AddToPrefetchList() makes this point rather clear... :] So I guess threading is not the issue here, but consider the scenario where a request to preload an image has started and the "real" image load starts before it finishes. Here's my understanding of the execution (linenumbers in imgLoader.cpp) : imgRequest::IsReusable() will always return true for these requests since they happen on main-thread. Line 1208: expression is true since preload-request added cache entry ==> entry==preload-cache-entry Line 1209: ValidateEntry returns true since Line 917: hasExpired is true only before imgRequest->OnStartRequest() is called ==> hasExpired==false since we assume OnStartRequest() is called Line 959: ShouldRevalidateEntry() returns false since no special flags are used and hasExpired==false ==> validateRequest==false Line 981: expression is false since IsReusable() returns true Line 1006: skip since app-caches are probably not in use Line 1014: expression is false since validateRequest==false Line 1023: return true since validateRequest==false ==> aProxyRequest == null ==> _retval == null Line 1211: request becomes preload-request ==> entry==preload-cache-entry, _retval==null, request==preload-request Line 1214: don't think this is pertinent - skip Line 1235: expression is false since request!=null ==> load id for preload-request is replaced (line 1297) Line 1301: expression is true since _retval is not set by ValidateEntry ==> proxy for "real" load is created and attached to request for preload ==> Result : New proxy attached to the cached preload-request. Assuming I got this right, are two requests for the same image supposed to interact like this? If not, how should they interact?
> Assuming I got this right, are two requests for the same image supposed to > interact like this? If not, how should they interact? Put in a different way : I guess this is how it's supposed to work?
> Here's my understanding of the execution Looks right. > are two requests for the same image supposed to interact like this? Yes. Which is why I said that you don't want to use LOAD_BACKGROUND.
Ok. A more interesting scenario appears when imgRequest->OnStartRequest() has *not* been called in line 917 : Line 1209: ValidateEntry returns false (but _retval is set) because Line 917: hasExpired is true before imgRequest->OnStartRequest() is called assuming it is NOT called ==> hasExpired==true Line 959: ShouldRevalidateEntry() returns true since hasExpired==true ==> validateRequest==true Line 1014: expression is true since validateRequest==true ==> ValidateRequestWithNewChannel() is called Line 841: expression is false since mValidator is not set yet (only set in line 886) Line 853: a new image channel is created Line 874: a new proxy is created and attached to the preload-request Line 1023: return false since validateRequest==true ==> aProxyRequest==new-proxy Line 1230: entry is set to null ==> _retval==new-proxy, entry==null, request==null Line 1235: expression is true since request==null ==> new image-request and cache entry is created ==> _retval==new-proxy, entry==new-entry, request==new-request Line 1288: PutIntoCache is called for the new entry Line 680: expression is true since preload-request added cache entry ==> tmpRequest becomes the preload-request Line 688: expression is false since IsReusable() is always true Line 696: cached entry is removed from cache Line 702: new entry is inserted in cache Line 1301: expression is false since _retval was set by ValidateEntry ==> Result : Uhh... looks messy. A new image-request is created (line 1235), cache-entry from preload-request is removed from cache (line 696), and two new channels for the image seems to be created and opened (lines 841 and 1235). I haven't been able to identify who calls imgRequest->OnStartRequest(), but I hope it happens on the main-thread before LoadImage() returns, making the above scenario impossible. (Or maybe I missed something in the analysis...?)
> I haven't been able to identify who calls imgRequest->OnStartRequest() Necko. This is the nsIStreamListener callback for the stream listener passed to the channel's AsyncOpen. > I hope it happens on the main-thread before LoadImage() returns It happens on the main thread, but never before LoadImage() returns. It's called when we get the HTTP headers from the server (or read them from our disk cache, which happens asynchronously). I can't speak to the details of the validation/cache analysis; that's joe's bailiwick. But if we indeed don't coalesce image loads for the same url that all start before we get the response to the first one we started, that sounds unfortunate.
Bjarne, I suspect you're right in the general case, but in doing a brief test it turns out to not matter because the load flags on the load group are set to VALIDATE_NEVER. If you want, you can fix this by setting the expiry time to -1 on the imgCacheEntry in its constructor, and then testing against that in ValidateEntry. Alternately, just file a new bug and assign it to me, and I'll do it!
If there are no practical consequences, I see no reason for changing anything. :) But out of curiosity - where is this set and why? I'm really just trying to figure out why this patch hits performance on Tp3 (ref comment #45 and comment #50), but Tp3 is secret and I haven't had any luck reproducing this with standalone talos tests yet. Another approach : maybe someone with sufficient powers could push this fix to a server running Tp3? If it's possible to get data from a Tp3 run before pushing and after pushing the patch, I could compare and maybe approach a solution from that angle (yes, I know there are almost 400 pages, but it's an approach, right...?)
I can't guarantee there are no practical consequences, since that's the way it loads _now_, but it might not be the case with your specloading patch and it might not be that way in the future. As for Tp3, I've sent you an email. Also, Tp3 is run on the try server.
(In reply to comment #67) > But out of curiosity - where is this set and why? Hmm... I wasn't particularly clear on this one. I meant to ask: Where are the load flags on the load group are set to VALIDATE_NEVER, and why is this done? I got your email about Tp3 - thanks! :)
(In reply to comment #69) > Hmm... I wasn't particularly clear on this one. I meant to ask: Where are the > load flags on the load group are set to VALIDATE_NEVER, and why is this done? That I don't know. Someone who knows more about Gecko's page loading might be able to comment, though...
After figuring out how to use the try server and how to retrieve results from it, I was eventually able to identify a couple of pages in Tp3 with the following characteristics : - increased mean and median values (appr. 30% increase) - reasonable stable dataset (< 7% standard-deviation over samples) (i.e. load-time is clearly degraded, and it's not too hard to reproduce) I put the identified pages on an apache-server hidden behind a 10MBit HUB in my LAN and ran some tests. The observation is that sometimes, a prefetch-load happens *after* the first "real" load. Not sure why this happens, nor why it impacts overall performance, but it's fairly simple to avoid so I made a new patch, smoke-tested it and pushed it to the try server. Waiting for results...
> a prefetch-load happens *after* the first "real" load. Hmm. Could it be that we end up finishing up the script load before the prefetch thread has a chance to run or something?
(In reply to comment #72) > Hmm. Could it be that we end up finishing up the script load before the > prefetch thread has a chance to run or something? This isn't possible. What *could* happen, though, is: Main thread asks speculative parsing thread to start. Speculative parsing thread starts and queues up 4 URLs. Network thread posts "script ready" event to the main thread. Speculative parsing thread finds a 5th URL and posts an event to prefetch the URLs. Main thread runs the script and starts the parser (kicking of image loads). Main thread runs the nsPrefetchURLs that was posted earlier, prefetching parts o the document that we've already parsed for real.
> but it's fairly simple to avoid so I made a new patch, smoke-tested it and > pushed it to the try server. Waiting for results... Nope... Tp3 is still degraded.
(In reply to comment #73) > Main thread asks speculative parsing thread to start. > Speculative parsing thread starts and queues up 4 URLs. > Network thread posts "script ready" event to the main thread. > Speculative parsing thread finds a 5th URL and posts an event to prefetch the > URLs. > Main thread runs the script and starts the parser (kicking of image loads). > Main thread runs the nsPrefetchURLs that was posted earlier, prefetching parts > o the document that we've already parsed for real. The patch mentioned in comment #74 makes an image prefetch-load return immediately if the image is found (and validated) in the image-cache. Do you see other or better ways to handle this?
Comment on attachment 376313 [details] [diff] [review] V1.7 unbitrotted This looks ok, but removing review pending the Tp issue getting sorted out.
Attachment #376313 - Flags: review?(bzbarsky)
Something just struck me : Since a prefetch happening before the "real" load creates a proxy-listener, would animated GIFs cause problems here? Can we avoid creating a proxy for the preload, or maybe create some proforma-proxy?
For some days I had a sneaking suspicion about Tp3 and it was just confirmed talking to Alice. Tp3 uses a local installation of apache and runs all pageloads in one big loop. This means (among other things) that 1) the network is *not* a bottleneck (access to localhost is most likely heavily optimized) 2) images will be held in the network-cache and maybe also in the img-cache between each pageload Thus, the relevance of a Tp3 run for this particular patch must be considered rather low, since the essential idea is to invest CPU-time to "hide" image-loading time. I'm not saying that the solution/patch cannot be improved, just suggesting that we should consider what Tp3 actually does before we draw conclusions based on Tp3-results. As mentioned in comment #71, I have a local setup simulating a reasonably slow network, and I'm currently trying to get the standalone talos working reliably so I can run talos-tests against it with caching turned off.
Bjarne, I've been doing a pile of page load testing over slow networks and comparing cache and non-cache behavior. If you need help contact me in mail - I have a big setup for that and you can have it if you want to do some testing + measurement.
Yes, I wouldn't expect Tp3 to show an _improvement_ with this change (though in some cases I suspect it still might). Butthe CPU overhead shouldn't be so high that it shows a definite regression... We're doing the parsing no matter what, so the only extra cost is the LoadImage calls, right? Are those that expensive?
The LoadImage calls would _also_ have to happen anyways. But if you're getting two references to images via LoadImage (one in the preloader, one by the document), you are doing extra work - possibly even another network load if it's expired or Cache-Control: no-cache.
Hmm. Ideally the preload load would prime the cache unconditionally, ignoring cache-control and such... maybe we need more imagelib work for that.
I think it'd be enough if the callers of LoadImage were changed: the only way imagelib knows about Cache-Control: no-cache is because the load group passed in says to bypass the cache. If you clear that bit in the load group, you should be OK, because we unconditionally put images into the cache (maybe incorrectly?).
Where is this loadgroup check?
(In reply to comment #80) > Butthe CPU overhead shouldn't be so high that it shows a definite regression... > We're doing the parsing no matter what, so the only extra cost is the > LoadImage calls, right? Are those that expensive? In this context - what are the units of numbers reported by Tp3? In one run for example, it says "tp: 303.19". Seconds? Milliseconds? Sum of mean/median over all pages? Things may be clearer if we quantify the loss.
Last I checked, tp was the average of the medians of the pages; all times measured in ms. What matters is the percentage change, though, not the absolute change.
(In reply to comment #79) > Bjarne, I've been doing a pile of page load testing over slow networks and > comparing cache and non-cache behavior. If you need help contact me in mail - > I have a big setup for that and you can have it if you want to do some testing > + measurement. Thanks! I've sent you an email.
(In reply to comment #86) > Last I checked, tp was the average of the medians of the pages; all times > measured in ms. > > What matters is the percentage change, though, not the absolute change. I suspect we could have a nice, long discussion on this topic. :) However, let me phrase my understanding of facts so far : For a particular talos-run we have a regression from 224ms to 233ms (9ms / 4%). The hit is on the average of the medians over almost 400 unique pages. Certain pages take larger hits (50-60% for median, 30-35% for mean), but we also see improvement for some pages (up to 23% for median, 15% for mean). The median per page comes from timing 10 loads. Loading is done from an apache-server on localhost with default cache-settings enabled. Now, in my opinion. loading from localhost means that DOM building, layout and rendering represents a larger portion of total page-load time than for normal browsing. I also think caching should be disabled, but since all pages are loaded before starting the next cycle, we may assume that the cache do not impact samples 2-10 for a particular page too heavily. What to conclude from this is not entirely clear to me yet... :] But in general, I agree with comments above that this patch should not hit performance, and certainly not 50%. So more work is needed apparently... Any insights about animated GIFs (comment #77) ?
> but we also see improvement for some pages (up to 23% for median, 15% for mean) I don't think anyone other than yourself was aware of this fact until now. ;) > loading from localhost means that DOM building, layout and rendering represents > a larger portion of total page-load time than for normal browsing. Yes of course. > we may assume that the cache do not impact samples 2-10 for a particular page > too heavily. It actualyl does: changing our memory cache size has a noticeable impact on this time. Same for our image cache size. > this patch should not hit performance, and certainly not 50% Right. If it has 20% wins on some pages and we keep the downside low, it should be a net Tp3 win (and of course an even bigger win on more realistic setups). > Any insights about animated GIFs (comment #77) ? Not offhand. If we had a way to have proxies that don't trigger the animation, we could use them elsewhere too, though!
The animated gif theory could be tested by running talos with and without this patch, in both cases with image animations disabled in the profile.
Or heck, disabled in all.js.
(In reply to comment #89) > > this patch should not hit performance, and certainly not 50% > > Right. If it has 20% wins on some pages and we keep the downside low, it > should be a net Tp3 win (and of course an even bigger win on more realistic > setups). A 20% win on an already fast page will influence the final result much less than a 5% loss on a slow page, though. And I think it would make sense that this would lead to fast pages getting faster, since they won't be doing silly things that would require us to reload images. Slow pages will be, and so will suffer more.
Oh. That's not handling no-cache. That's handling cases when the document channel has some of the cache or validation flags set on it (by inheriting them to all other loads in the document). I don't believe any of them are set by default unless one is doing a history load of some sort (back/forward, reload, shift-reload). That could be checked by adding some printfs there, of course.
I'm experimenting with an extra flag to LoadImage to indicate a preload. What would be the suggested actions when this flag is set with respect to "prime the cache unconditionally"? (I'm already using it with, ehh, very limited success to exit early if the image is in the cache and valid already.)
I'm pretty sure that it's used for no-cache in addition to that, because in debugging bug 490949 those bits were set. (Pretty sure, not 100%.) Your best bet is going to be your existing early exit combined with what we do now in the case of a cache miss. The biggest problem with priming the cache unconditionally is going to be the validation that we want to do when we get the cache element the second time, but I think you can work around all those mechanisms by passing in nsIRequest::VALIDATE_NEVER.
Hrm... I decided to time the calls to imgLoader::Loadimage() today, and something occurred to me : With preloading enabled, imgLoader::Loadimage() is called at least twice for all images (one for the preload and one for the real load, in some order). Or put in a different way : preloading adds one extra call for each unique image, either prior to the real load or after the real load. The observation is that the real load frequently takes 1ms and sometimes 2-3ms, even if it finds the preloaded image in img-cache. Thus, with a large number of images on the page, the overhead of the extra calls introduced by preloading may actually be significant. Thoughts and/or insights? I'm planning to investigate this issue further tomorrow.
> 1ms and sometimes 2-3ms, even if it finds the preloaded image in img-cache That's a huge amount of time. If you can figure out where it's being spent, that would be really nice!
(In reply to comment #98) > > 1ms and sometimes 2-3ms, even if it finds the preloaded image in img-cache > > That's a huge amount of time. If you can figure out where it's being spent, > that would be really nice! Yeah... I'm currently using the macro GIVE_ME_MS_NOW() - is there something which gives me a better resolution?
PR_Now() ?
Hehe... PR_now() seems to do the trick, yes... :} Below is a typical call to imgLoader::LoadImage() for a "real" load of an image which has been preloaded (extra details logged for this exercise). This particular image is used 15 times on the page, resulting in 16 calls to LoadImage() (1 for the preload and 15 per usage). Preloading it (http from localhost) takes 2380us. -1361524992[7fc6a6910040]: -1994105050 [this=7fc69e3f70c0] imgLoader::LoadImage (aURI="http://localhost/tp3/www.flurl.com/www.flurl.com/images/misc/arrow_flurl.gif") -1361524992[7fc6a6910040]: -1994105050 [this=7fc69e3f70c0] imgLoader::LoadImage (prefetch="0") -1361524992[7fc6a6910040]: -1994105050 [this=7fc69e3f70c0] imgLoader::ValidateEntry {ENTER} -1361524992[7fc6a6910040]: -1994105050 [this=7fc69e3f70c0] imgLoader::ValidateEntry {EXIT} -1361524992[7fc6a6910040]: -1994105050 [this=7fc694dff7c0] imgCacheEntry::Touch {ENTER} -1361524992[7fc6a6910040]: -1994105050 [this=7fc694dff7c0] imgCacheEntry::Touch {EXIT} -1361524992[7fc6a6910040]: -1994105050 [this=7fc69e3f70c0] imgLoader::LoadImage |cache hit| (request="7fc694eca460") -1361524992[7fc6a6910040]: -1994105049 [this=7fc69e3f70c0] imgLoader::LoadImage -- creating proxy request. -1361524992[7fc6a6910040]: -1994105049 [this=7fc69e3f70c0] imgLoader::CreateNewProxyForRequest (imgRequest=7fc694eca460) {ENTER} -1361524992[7fc6a6910040]: -1994105049 [this=7fc694ffbf10] imgRequestProxy::Init (request=7fc694eca460) {ENTER} -1361524992[7fc6a6910040]: -1994105049 [this=7fc694eca460] imgRequest::AddProxy (proxy=7fc694ffbf10) {ENTER} -1361524992[7fc6a6910040]: -1994105049 [this=7fc694eca460] imgRequest::AddProxy {EXIT} -1361524992[7fc6a6910040]: -1994105049 [this=7fc694ffbf10] imgRequestProxy::Init {EXIT} -1361524992[7fc6a6910040]: -1994105049 [this=7fc69e3f70c0] imgLoader::CreateNewProxyForRequest {EXIT} -1361524992[7fc6a6910040]: -1994105049 [this=7fc694eca460] imgRequest::GetURI -1361524992[7fc6a6910040]: -1994105049 [this=7fc694eca460] imgRequest::GetURI -1361524992[7fc6a6910040]: -1994105049 [this=7fc694eca460] imgRequest::GetURI -1361524992[7fc6a6910040]: -1994105049 [this=7fc694eca460] imgRequest::GetURI -1361524992[7fc6a6910040]: -1994105049 [this=7fc694eca460] imgRequest::GetSecurityInfo -1361524992[7fc6a6910040]: -1994105049 [this=7fc694eca460] imgRequest::GetURI -1361524992[7fc6a6910040]: -1994105049 [this=7fc694eca460] imgRequest::GetURI -1361524992[7fc6a6910040]: -1994105049 [this=7fc694eca460] imgRequest::GetURI -1361524992[7fc6a6910040]: -1994105049 [this=7fc694eca460] imgRequest::GetURI -1361524992[7fc6a6910040]: -1994105049 [this=7fc694ffbf10] imgRequestProxy::OnStartRequest (name="http://localhost/tp3/www.flurl.com/www.flurl.com/images/misc/arrow_flurl.gif") -1361524992[7fc6a6910040]: -1994105049 [this=7fc694ffbf10] imgRequestProxy::OnStopDecode -1361524992[7fc6a6910040]: -1994105049 [this=7fc694eca460] imgRequest::GetURI -1361524992[7fc6a6910040]: -1994105049 [this=7fc694ffbf10] imgRequestProxy::OnStopRequest (name="http://localhost/tp3/www.flurl.com/www.flurl.com/images/misc/arrow_flurl.gif") -1361524992[7fc6a6910040]: -1994105049 [this=7fc694eca460] imgRequest::GetURI -1361524992[7fc6a6910040]: -1994105049 [this=7fc694eca460] imgRequest::GetURI -1361524992[7fc6a6910040]: -1994105049 [this=7fc694eca460] imgRequest::GetURI -1361524992[7fc6a6910040]: -1994105049 [this=7fc694eca460] imgRequest::GetSecurityInfo -1361524992[7fc6a6910040]: -1994105049 [this=7fc694eca460] imgRequest::GetURI -1361524992[7fc6a6910040]: -1994105049 [this=7fc694eca460] imgRequest::GetURI -1361524992[7fc6a6910040]: -1994105049 [this=7fc694eca460] imgRequest::GetURI -1361524992[7fc6a6910040]: imgLoader::LoadImage() proxy created. EXIT [uri=http://localhost/tp3/www.flurl.com/www.flurl.com/images/misc/arrow_flurl.gif time=582 us] The early exit when preload comes after a real load takes between 3 and 7us. Subsequent calls to LoadImage() after a preload takes between 400 and 700us (582us in the log above). (My statement about 2-3ms seems to be wrong - I must have looked at wrong log-lines.) On this particular page we have 39 unique images. 4 are loaded for "real" before the preload kicks in, 35 are preloaded. Preloading thus adds 35 extra LoadImage calls, which with an average of 600us per call gives an extra 21ms on the total page-load time. The timings I have from the try server indicates about 90ms extra time with preloading enabled, but numbers are from different computers so I better get my standalone talos up running to get comparable numbers... (I suspect that 5-700us extra time per image may not explain the whole problem, but on the other and we should keep in mind that preloading images from a real network is bound to have a bigger impact on total user-experience than preloading from localhost. I.e. Tp3 may not be a suitable way to assess the value of preloading.)
That's really a long time for a cache hit to take, even so. Are you compiling optimized?
Yeah, that seems pretty long. If this is an opt build, can you figure out where the time is being spent inside LoadImage?
Cleared my ccache, removed obj-dir and changed mozconfig to include ac_add_options --disable-debug ac_add_options --disable-logging ac_add_options --enable-optimize Then, added printfs to LoadImage(). Result for the image discussed above [url=http://localhost/tp3/www.flurl.com/www.flurl.com/images/misc/arrow_flurl.gif time=5] CacheSize verified [url=http://localhost/tp3/www.flurl.com/www.flurl.com/images/misc/arrow_flurl.gif time=11] searching for cache-entry [url=http://localhost/tp3/www.flurl.com/www.flurl.com/images/misc/arrow_flurl.gif time=18] cache search done [url=http://localhost/tp3/www.flurl.com/www.flurl.com/images/misc/arrow_flurl.gif time=31] cache-entry validated [url=http://localhost/tp3/www.flurl.com/www.flurl.com/images/misc/arrow_flurl.gif time=39] no proxy found - creating one [url=http://localhost/tp3/www.flurl.com/www.flurl.com/images/misc/arrow_flurl.gif time=51] proxy created [url=http://localhost/tp3/www.flurl.com/www.flurl.com/images/misc/arrow_flurl.gif time=95] proxy added to loadgroup [url=http://localhost/tp3/www.flurl.com/www.flurl.com/images/misc/arrow_flurl.gif time=123] listeners notified - exit Thus, reduced by a factor of almost 5. Trying to get my standalone talos working now so that I can get comparable numbers...
Hm. So 44us of that time is spent basically adding to the loadgroup (I would bet mostly firing the STATE_START notification in docloader, plus the various other loadgroup and docloader stuff). Another 28us seems to be NotifyProxyListener. That part should be really cheap for the initial preload, though, since there's nothing to notify about...
After a number of mysterious stops in the middle of the run (typically in some cycle between 3 and 6), I eventually got full browser-logs (used to gather statistics) for a clean build of mozilla-central (baseline) plus a build with patch applied. Results are rather confusing though... Average over all medians for baseline (on my local box, nothing else is running there) is 626.38ms, whereas this number with patch applied is 277.48, which doesn't make sense at all. Corresponding values derived from try server are 227.64 and 235.39 respectively. I'm currently re-running local tests, and any insight into why standalone talos suddenly may decide to stop would be very useful... Also, if anyone knows if there are any non-standard settings in the apache-configuration on try server, please let me know (suddenly realized that this also can make a difference).
Finally there were two more completed Talos-runs - one from a clean build and one with the patch applied. Some analysis follows... This is the average-over-all-medians, aka the number we see on the graph-server for Tp3. It looks like the largest median is excluded from this average in the report from Talos, whereas the average reported in previous comment was calculated by me and includes the largest median. It would be nice to get an explanation of the statistics reported by Talos (or maybe see the code which calculates it). Average over medians : Clean run 1: 605.86ms Clean run 2: 273.44ms Patched run 1: 261.82ms Patched run 2: 277.66ms Tryserver "clean" run: 226.05 (not really clean, but no preloading) Tryserver patched run: 233.72 On my local box, there are two very different clean runs. For some reason, clean run 1 at this level sticks out and must be viewed as an error. There are also two patched runs where one is slower and one is faster than the best clean run. The trend on try server/talos boxes, however, is clearly that the patched run performs worse than clean runs. Conclusion : My local server does not seem to produce stable results, even when we calculate average and medians over large datasets. One possible difference is that my local box is running the Tp3 pageset with the standalone Talos configuration, where the talos-boxes may have some other configuration. Below are details for the two pages identified in comment #71 Medians for page1 page2 Clean run 1: 150ms 258ms Clean run 2: 156ms 255ms Quite similar times, but note that page1 loads slower in run2 whereas page2 loads faster. Here are medians for the two pages, as well as %increase as compared to the first clean build : Medians for page1 page2 Patched run 1: 150ms (00.0%) 258ms (00.0%) Patched run 2: 191ms (27.3%) 413ms (60.1%) and for reference Tryserver clean: 270ms 467ms Tryserver patched:359ms (33.0%) 721ms (54.4%) Patched run 2 has increased load times on the scale of the try server, but patched run 1 loads as fast as the clean run. And no, patched run 1 is not confused with the clean run - here are the time-samples for page1 including mean (as reported by Talos) and (biased) standard deviation for the sample Clean run 1: 147;150;151;147;154;153;145;169;151;150 (149.78 +- 4.18%) Clean run 2: 156;158;159;149;156;160;158;148;151;158 (154.78 +- 2.66%) Patched run 1: 178;151;149;150;155;150;155;148;149;155 (151.33 +- 5.46%) Patched run 2: 220;186;185;187;191;196;194;184;195;196 (190.44 +- 5.13%) and for page2 Clean run 1: 287;262;259;253;255;254;263;262;257;258 (258.11 +- 3.55%) Clean run 2: 254;259;282;260;255;255;254;267;255;312 (260.11 +- 6.64%) Patched run 1: 490;250;258;263;314;253;256;259;254;263 (263.33 +- 24.55%) Patched run 2: 390;397;446;413;448;411;414;413;414;417 (412.78 +- 4.17) It should be noted that due to the way Talos seems to compute mean values (excluding the largest value in the set), my calculation of standard deviation is slightly inaccurate since it includes deviation of the largest value. It is not a crucial point however, except for patched run 1 where this effect kicks in heavily. But when taking that into account, IMO all datasets are quite stable. What can be concluded from all this? Not sure really. Between two clean runs, page1 loads slower and page2 loads faster. In patched run 1, both pages use exactly the same time as in clean run 1 (and thus page1 in patched run 1 loads faster than in clean run 2). In patched run 2, both pages load significantly slower than both clean runs, looking more like what we see on try server/talos box for these pages. I guess what I'm trying to say is that I see no trend for these two individual pages on my local box.
Jeez... now I see what happens... it is necessary to run a large pageset to see this effect, but it's pretty obvious once you do that. The method imgLoader::CheckCacheLimits() loops until the total cache-size is less than a given threshold. It calls imgLoader::RemoveFromCache() which uses timestamps and datasize to determine which cache-entries to remove first. Small and old images seems to be killed off first. Obviously, this only happens when the image-cache is full. From logs you can see that a call (maybe as part of cleaning up images from the previous document) to imgLoader::CheckCacheLimits() frequently occurs after a number of preloads have started. The cache-cleaning logic then tends to remove those images which just were preloaded since their datasize very often is 0 (possibly related to bug #487883). The result is that, frequently, an image is set up to be preloaded, then removed from cache, and finally when the normal load happens the image must be set up and loaded all over again. An obvious solution is to adjust logic in imgLoader::CheckCacheLimits() to avoid removing images with size=0. This improves the situation, but I still observe images being removed that have just been preloaded. A better criterion might be to avoid removing images which are in progress of being loaded/decoded - is there a safe way to check for this?
(In reply to comment #108) > Small and old images seems to be killed off first. Whoa! This is not what I intended! It should be killing off _large_ and old images. Can you file a bug on this? > A better criterion > might be to avoid removing images which are in progress of being loaded/decoded > - is there a safe way to check for this? Aha! Images that are being preloaded aren't used by documents, so the caching logic will consider removing them. If the preloader kept a reference to the imgIRequest (imgRequestProxy) that imgLoader::LoadImage returns, we would _never_ remove that entry from the cache.
We considered doing that, but it's actually a bit of a pain: there is no obvious place to put the preload imgIRequest. I suppose we could just create a hashtable on the document or something.... Then we have the issue of when to drop from that hashtable (which is that same issue we'd have if we tried to not evict still-loading images). Since some image loads never terminate, and since ideally we need to keep the image alive until the page requests it "for real" (whevener that might be) this would be a somewhat tough call. Does just changing the small/large thing help sufficiently that we don't have to worry about pinning?
(In reply to comment #109) > (In reply to comment #108) > > Small and old images seems to be killed off first. > > Whoa! This is not what I intended! It should be killing off _large_ and old > images. Can you file a bug on this? It is my interpretation of the logs - I haven't studied the code properly. I'll see what I can do.
(In reply to comment #110) > I suppose we could just create a > hashtable on the document or something.... Then we have the issue of when to > drop from that hashtable Would dropping entries from the hastable on 1) the first "real" load and 2) document unload be unsafe?
You mean unpinning? Yes, I think so. Think multipart jpegs. There's no reason to keep them around forever if the page is not showing them; and relying on the page unload is not great, imo.
Hmm. So CompareCacheEntries says that one < two if one is older and bigger, right? make_heap puts the "biggest" (in your comparator's sense) entry as the first element. imgCacheQueue::Pop pops of the first element of mQueue. CheckCacheLimits uses Pop() to evict entries. Therefore CheckCacheLimits evicts the smallest and newest entries, as far as I can see.
Depends on: 496593
> Therefore CheckCacheLimits evicts the smallest and newest entries, as far as I > can see. Yup - this fits with what I see in the logs ("small and old" in comment #108 was inaccurate - "small and new" fits better). And, needless to say, this effectively eliminates positive effects from preloading once the cache-limit has been reached (as well as giving a rather sluggish user-experience with respect to image-loading I would guess) :\ So - once the fix for bug #496593.has landed I guess we should try to land this patch again?
(In reply to comment #116) > So - once the fix for bug #496593.has landed I guess we should try to land this > patch again? The fix for bug 496593 will only make it less likely that the preloaded image will be evicted from the cache, not impossible. (Think of a very large image.) I'd like to see us come up with some sort of solution for pinning those elements in the cache, even if it's an extra flag passed in to LoadImage().
It'd need to be a very large image that we actually get a lot of data for before the "real" load starts. That should be pretty rare, I would think. That said, we could try pinning them until DOMContentLoaded. That might work...
Simple approach to pin images while they are preloaded. Since we really just need to pin all preloading images until document itself has a reference to them, I use a nsCOMArray and unpin them all in nsDocument::DispatchContentLoadedEvents() method. (Xpcom-fu is still weak though, so any suggested improvements to this approach are expected and welcome...)
Attachment #376313 - Attachment is obsolete: true
Attachment #382702 - Flags: review?(bzbarsky)
Results from Talos on try server looks promising. Awaiting review...
Have you tested cases when DOMContentLoaded is not fired? For example, when the user closes the window, or hits the stop button?
No. Would the document-object survive in such cases? Is there another method in nsDocument more suitable for unpinning (onSomethingWeirdHappened() or similar)? Is there some way to set up a test doing such things, btw, or need it be done manually?
The document object certainly survives when you hit stop. Maybe put the code in nsDocument::EndLoad? As for testing... Testing the user hitting stop should not be that hard; you can call window.stop() partway through a test. The question is how you tell whether the image preloads get dropped then.
(In reply to comment #123) > The document object certainly survives when you hit stop. Ok. But are consequences of pinned references to images in this situation severe enough to worry about? I guess it is if the image-cache is global for the instance of FF (all windows/tabs)... Is the document otherwise sufficiently intact to be used? Moreover, on the next load, wouldn't all pinned preloaded images be cleared anyway? Or should we perhaps unpin in nsDocument destructor, assuming that the document-object is replaced on the next load? > Maybe put the code in nsDocument::EndLoad? No problem. :) I assume this will cover more unexpected situations than the current approach? (What about destructor, mentioned above?) > As for testing... Testing the user hitting stop should not be that hard; you > can call window.stop() partway through a test. The question is how you tell > whether the image preloads get dropped then. Hmmm... maybe the preloading images could unpin themselves on error-conditions? Would require a method in the document. Or maybe the document could hold an observer for each preloaded image which unpins the image on error-conditions?
> are consequences of pinned references to images in this situation > severe enough to worry about? Yes, since it can keep alive network streams that should die. It can also cause us to keep in memory images the user doesn't want (hit stop before we go there). The image cache is in fact global. You seem to want to assume documents are shortlived. They are not. That document might just go into bfcache when you navigate away from it, and stick around for a good long while. > assume this will cover more unexpected situations than the current approach? I _think_ so. Double check, please.
(In reply to comment #125) > > are consequences of pinned references to images in this situation > > severe enough to worry about? > > Yes, since it can keep alive network streams that should die. It can also > cause us to keep in memory images the user doesn't want (hit stop before we go > there). The image cache is in fact global. Very well... The open question, of-course, is whether unpinning these references actually causes resources used by the images to be released. But I assume we can just make sure to clean up what we've introduced and leave the rest of cleanup to other parts of the system, yes? > You seem to want to assume documents are shortlived. They are not. That > document might just go into bfcache when you navigate away from it, and stick > around for a good long while. What we _want_ is ultimately not relevant when coding like this, I guess (only when designing new systems, which is not the case here). But I've found that challenging "established" facts and patterns can be fruitful from time to time in complex software, leaning on reviews and test-coverage for reality-checks. :) (Some description of the life-cycle of documents would probably be generally useful, btw.) Manual testing shows that pushing escape and/or stop button in the middle of a page-load causes DispatchContentLoadedEvents() to be called. So does closing the tab or window before the page is loaded. Double-checking : nsDocument::EndLoad() calls DispatchContentLoadedEvents() without exceptions, afaics. nsXMLDocument and nsHTMLDocument call nsDocument::EndLoad() from their overridden EndLoad()-methods. The last (which I'm able to find) subclass of nsDocument, nsXULDocument, does _not_ call nsDocument::EndLoad(), but calls DispatchContentLoadedEvents() in a number of situations via ResumeWalk -> DoneWalking (I have not analyzed these situations closer, though). Thus, I've currently left unpinning where it is since DispatchContentLoadedEvents() is a) called from EndLoad() in any case, and b) called directly from a number of places in nsXULDocument. Manual testing indicates that this is safe. Let me know if an automated test for this bug is required - otherwise I'm awaiting review and hoping to get some baking-time on trunk for this patch.
> nsDocument::EndLoad() calls DispatchContentLoadedEvents() without exceptions, > afaics. Um... yes. I'd forgotten that we had different semantics for DOMContentLoaded and onload (the latter does not fire on stop; the former, as you noted, does). The patch looks pretty good then, with two issues: one minor, one major. Minor: the indent on + // Unpin references to preloaded images + mPreloadingImages.Clear(); is off. Major: This patch is taking up the last available nsIRequest load flag for something that's completely irrelevant to most requests. I'd rather see an API change to loadImage here that allows us to say that this is a prefetch instead, if joe is ok with that.
(In reply to comment #127) > Major: This patch is taking up the last available nsIRequest load flag for > something that's completely irrelevant to most requests. I'd rather see an API > change to loadImage here that allows us to say that this is a prefetch instead, > if joe is ok with that. I'm totally fine with that. Double points if you can figure out some way to not have to change every user of LoadImage() to add that extra parameter. (Maybe create a new method in imgILoader, PreloadImage(), and then change LoadImage and PreloadImage to each call an internal helper function?)
Comment 128 sounds great to me.
It would be simple to refactor the code like this, yes. However, would it be possible to use a bit btw LOAD_BACKGROUND and INHIBIT_CACHING instead (I put it in the end by old habit)?
Those flags already mean something. Don't overload their meaning.
This refactoring feels a little "clunky" and I'd feel better using a flag. However, I do see the point of not spending the last load flag. Note that knowledge about preloading is only used for early exit if the image is already being loaded (or more precisely : if it's in the cache and valid). Thus, maybe we could do better using imgLoader::FindEntryProperties() or similar to check this before triggering a preload (would avoid the load flag as well as the refactoring)? Joe...? Another thing worth considering is the 9th parameter to imgLoader::LoadImage(), apparently representing an existing request to an image. The old approach in nsContentUtils::LoadImage() passes nsnull here, but it might be better to use the imgIRequest-object being preloaded instead, no? It's not entirely clear to me what the purpose of this parameter is, but I assume it's not there by accident. Any insights to this?
Attachment #382702 - Attachment is obsolete: true
Attachment #383458 - Flags: review?(bzbarsky)
Attachment #382702 - Flags: review?(bzbarsky)
> but it might be better to use the imgIRequest-object being preloaded instead, > no? Except we don't have one, do we? That argument is for cases when you already have a non-null imgIRequest around. I'd like joe to get the imagelib api he wants here sorted out (in particular the DoLoadImage approach vs the PreloadImage approach) before deciding how the nsContentUtils code should look...
(In reply to comment #133) > > but it might be better to use the imgIRequest-object being preloaded instead, > > no? > > Except we don't have one, do we? That argument is for cases when you already > have a non-null imgIRequest around. I really need to learn to express myself clearer... :\ What I mean is to let nsContentUtils::LoadImage() pass the pinned (if it exists, obviously) imgIRequest for *real* loads. What I don't know/understand is what effect this will have, i.e. what does passing an existing imgIRequest to imgLoader::LoadImage() mean. For preloads, we can use imgLoader::FindEntryProperties() or similar to check if the image already has started to load, and avoid calling nsContentUtils::LoadImage() altogether if so. The bonus of this is that we don't need the troublesome preload-flag, nor do we need my clunky refactoring. We would, however, need a method in nsContentUtils to wrap the call to imgLoader::FindEntryProperties (like the original nsContentUtils::LoadImage does).
> what does passing an existing imgIRequest to imgLoader::LoadImage() mean. "good question". I'll let joe field this. ;) > avoid calling nsContentUtils::LoadImage() altogether if so Sounds good. > need a method in nsContentUtils to wrap the call to > imgLoader::FindEntryProperties That's fine, if it's really needed. Is it, though? Why not call it directly from the preloading code?
(In reply to comment #134) > What I mean is to let nsContentUtils::LoadImage() pass the pinned (if it > exists, obviously) imgIRequest for *real* loads. What I don't know/understand > is what effect this will have, i.e. what does passing an existing imgIRequest > to imgLoader::LoadImage() mean. You can't re-use an imgIRequest (e.g., an imgProxyRequest) in this way. This is mostly a useless parameter that Boris has just proposed (and I agree) be deleted. You can choose to create an imgIRequest and pass that in to LoadImage as aExistingRequest, or one will be created for you, but either way, it has to be brand new.
With this approach, nsDocument::PreLoadImage() checks whether the image is in the img-cache by requesting its properties, and returns if this is the case. This way we can avoid both the clunky refactoring in previous patch as well as the load-flag to indicate preload. I'm a bit uncertain whether the property-object retrieved by nsDocument::PreLoadImage() will leak (still weak xpcom-fu...). Tp3 looks ok on the try server. For some reason, the test-page on stevesouders.com from comment #2 loads *really* slow both with and without this patch (maybe my network is weird today). I can try it again some other day, or someone else may test it...
Attachment #383458 - Attachment is obsolete: true
Attachment #384001 - Flags: review?(bzbarsky)
Attachment #383458 - Flags: review?(bzbarsky)
Attached patch Like previous, but stupid typo corrected... (obsolete) (deleted) — Splinter Review
Funny how changing "==" to '!=' makes a lot of difference... :} The previous patch had a real stupid typo when checking for existence of the properties-object. Correcting it makes Tp3 on try server look better, and I've seen the testpage from comment #2 load as expected a few times. However, it looks like speculative parsing for some reason is inactive most of the time when loading the testpage, so the expected positive effect is rarely seen. When (and how) to trigger speculative parsing is out of scope for this fix and I have not studied it yet. I can dig into that if necessary, but IMO it should be done as a separate defect.
Attachment #384001 - Attachment is obsolete: true
Attachment #384390 - Flags: review?(bzbarsky)
Attachment #384001 - Flags: review?(bzbarsky)
See last part of comment 133, please.
Ok - waiting for Joe. Feel free to change reviewer. (Latest patch avoids changes to imgLib, btw.)
I.. am not sure what part I am to play in this? Bjarne's patch looks good at first glance, and if there's no need for changes to imagelib, all the better. Bjarne, am I correct in assuming that right now you only handle preloading img tags? If so, we should file follow-up bugs for background images from stylesheets and such.
> Bjarne, am I correct in assuming that right now you only handle preloading img > tags? If so, we should file follow-up bugs for background images from > stylesheets and such. Absolutely correct (see changes to nsParser).
We don't want to preload background images from stylesheets. I didn't realize that the latest patch has no imagelib changes. I'll try to get this reviewed ASAP.
There's no reason to add the imgIRequest dependency to nsIDocument. That array member can (and should) live in nsDocument instead. You should probably just cache the imgICache pointer instead of QIing every single time. You should probably clear mPreloadingImages in the document's unlink, no? I can't think of an obvious reason someone would actually want the image properties. Just have the nsContentUtils API return a boolean (and call it ShouldPreloadImage or some such)? Please fix the indentation around your LoadImage call. Other than that, looks good.
Comment on attachment 384390 [details] [diff] [review] Like previous, but stupid typo corrected... Per comment 144.
Attachment #384390 - Flags: review?(bzbarsky) → review-
I'll unbitrot and wrap up this asap, but first a few clarifications : > There's no reason to add the imgIRequest dependency to nsIDocument. That array > member can (and should) live in nsDocument instead. PreLoadImage() is a method belonging to nsIDocument... wouldn't scoping be difficult? Or is there some recommended way to deal with that? (I just tried the obvious solution of moving the member to nsDocument.h and the compiler yells loudly... not sure what nasty tricks one can do to expose a member like this...) > I can't think of an obvious reason someone would actually want the image > properties. Just have the nsContentUtils API return a boolean (and call it > ShouldPreloadImage or some such)? I basically agree, but was hoping that maybe the props include information which can be used to e.g. lazily clean up the img-cache on a separate thread... (If http-response info is available, for example, we might remove images which have expired in the http-sense.) However, if that is completely out-of-the-question I'd be happy to simplify the API. :)
> PreLoadImage() is a method belonging to nsIDocument... But there's no good reason for that. It could be pure virtual on nsIDocument and actually implemented on nsDocument. > which can be used to e.g. lazily clean up the img-cache on a separate thread... That shouldn't be something content does, imo. Please do simplify the API.
Attached patch Including latest comments from reviewer (obsolete) (deleted) — Splinter Review
Renamed PreLoadImage() to MaybePreLadImage() since it may not do the preload. Also unbitrottet, in particular changed nsDocument::MaybePreLoadImage() to use field mDocumentURI instead of GetDocumentURI(). > But there's no good reason for that. It could be pure virtual on nsIDocument > and actually implemented on nsDocument. Ahh... of-course! :) Fixed. > That shouldn't be something content does, imo. Please do simplify the API. Ok. Simplified. > You should probably just cache the imgICache pointer instead of QIing every > single time. Ok. > You should probably clear mPreloadingImages in the document's unlink, no? Uhhh... you mean destructor? Added code there... > Please fix the indentation around your LoadImage call. Ok.
Attachment #384390 - Attachment is obsolete: true
Attachment #386709 - Flags: review?(bzbarsky)
There's an actual unlink method. Search nsDocument.cpp for "unlink". It's run when we detect that the document is part of a reference cycle in order to break the cycle and let the objects involved reach a refcount of 0 and be destroyed. No need to do anything in the dtor. All members will be destroyed anyway and so all references will be released.
> Uhhh... you mean destructor? Added code there... No, the destructor was already clearing it, by calling the nsCOMArray destructor. I meant an NS_IMPL_CYCLE_COLLECTION_UNLINK_NSCOMARRAY on the member between NS_IMPL_CYCLE_COLLECTION_UNLINK_BEGIN and NS_IMPL_CYCLE_COLLECTION_UNLINK_END. Other comments: The contentutils.h comment should say: Returns whether the given URI is in the image cache. Caller doesn't care how you determine that, and it might change without affecting this API. Don't use a static nsCOMPtr. Your code leaks the image loader through shutdown, and will fail leak tests. Just use exactly the same setup as sImgLoader, with manual release on shutdown just like we do for that one. This will incidentally let you just forward-declare imgICache in nsContentUtils.h instead of including the header. Please do so. In nsContentUtils::Init, the new line you added is mis-indented. Looks like 4-space instead of 2-space like all the other code. Fix that. No need for the "= nsnull" for the stack nsCOMPtr. nsCOMPtr defaults to null. In MaybePreLoadImage, please put the early return on a separate line and put curlies around it. And fix that comment to refer to unlink instead of destructor. In nsDocument.h, put |virtual| before the |void| for the new function. It'll end up virtual no matter what, but this will at least make that clear.
Hopefully getting there...
Attachment #386709 - Attachment is obsolete: true
Attachment #386971 - Flags: review?(bzbarsky)
Attachment #386709 - Flags: review?(bzbarsky)
Attachment #386971 - Flags: superreview+
Attachment #386971 - Flags: review?(bzbarsky)
Attachment #386971 - Flags: review+
Status: REOPENED → RESOLVED
Closed: 16 years ago15 years ago
Resolution: --- → FIXED
Some Tp data from that checkin: Talos Improvement: Tp4 decrease 3.61% on Linux Firefox Talos Improvement: Tp4 decrease 4.49% on Tiger Firefox Talos Improvement: Tp3 decrease 2.49% on Linux Firefox Talos Improvement: Tp3 decrease 2.01% on Tiger Firefox Talos Improvement: Tp4 decrease 4.65% on Leopard Firefox Talos Improvement: Tp3 decrease 3.06% on XP Firefox At least I'll bet money it was this change and not the other thing I pushed at the same time. ;)
And I'll bet even more money that the effect is better when loading from an actual network as opposed to from localhost (like advocated above and in a number of emails) ;)
this seems broken with the html5 parser enabled (see Bug 503292)
Yes, the HTML5 parser doesn't support any speculative loading yet. That's one of the reasons it's not turned on by default yet.
Depends on: 533247
Depends on: CVE-2010-0168
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Created:
Updated:
Size: