Bug 77269 (Closed): ibench test performance regression
Opened 24 years ago; closed 24 years ago
Categories: Core Graveyard :: Tracking, defect
Tracking: (Not tracked)
Status: VERIFIED FIXED
Target Milestone: mozilla0.9
People: Reporter cathleennscp, Assignee darin.moz
Keywords: perf
Whiteboard: investigating
Attachments (2 files):
patch (deleted): Details | Diff | Splinter Review
patch (deleted): Details | Diff | Splinter Review
Tom narrowed the regression down to an 8-hour window: between the 041620 build
(in the 04-16-22 win32 ftp dir) and the 041704 build (in the 04-17-06 win32 ftp
dir).
I'm going to pull builds by time stamps, and Tom will run tests to find a more
precise time frame.
Run this query to see a list of fishy checkins (provided by waterson):
http://bonsai.mozilla.org/cvsquery.cgi?treeid=default&module=all&branch=HEAD&branchtype=match&dir=&file=&filetype=match&who=&whotype=match&sortby=Date&hours=2&date=explicit&mindate=04%2F16%2F2001+20%3A00%3A00&maxdate=04%2F17%2F2001+04%3A00%3A00&cvsroot=%2Fcvsroot
Can this also be tested with debug builds? If yes, how can I do it? I suspect
my checkin to BasicTableLayoutStrategy.cpp.
Comment 3 • 24 years ago
gordon is on the hook for that day, and he believes in the
grand unification theory.
Assignee: cathleen → gordon
Updated • 24 years ago
Whiteboard: third on gordon's list to get to.
Target Milestone: --- → mozilla0.9
Comment 4 • 24 years ago
Chris Hofmann wrote:
Here is a big hairy grand unification theory.....
IIRC the double read on the pages that jrgm sees comes out
of the need to look at the header to figure out the char set conversion.
The old cache had some clever voodoo of looking at just the first X
bytes to figure this out, then going back to load the entire document.
In conversation with gordon just before the disabling of the old cache
this was still on the issues list of how to handle it.
Gordon, is it possible that the mechanism to peek into the doc
to figure out char set is not working correctly and is slowing us down?
I also see that our big leak bug
http://bugzilla.mozilla.org/show_bug.cgi?id=75641
has now been associated with char set conversions.
------- Additional Comments From alexeyc@bigfoot.com 2001-04-24 00:51 -------
Oh, and if it's of any help, I have Auto-Detect set on Russian.
It's my wild guess, but probably with Auto-Detect on, after processing the pages
are left in memory and accumulate as you browse/reload.
Are we hanging on to the first top level document under some conditions?
Am I letting my mind run wild with this big harmonic convergence theory,
or will fixing up the peeking into the doc to determine the char set
conversion fix up all our problems?... if it will, is there a bug?
chris h.
Gordon Sheridan wrote:
The cache checkins were to remove the old cache from the build because it was no
longer used. The cache checkins for nsDiskCacheBlockFile are not part of the
current build.
-Gordon
Chris Waterson wrote:
So, if I understand this correctly, something went sour between the
build that we pushed at 8pm on 4/16 and the build we pushed at 4am on
4/17 (judging by the number of checkins, this must be in GMT?) Anyway,
here's the bonsai link for those checkins. Somebody double-check to make
sure I've actually queried the right set of changes.
http://bonsai.mozilla.org/cvsquery.cgi?treeid=default&module=all&branch=HEAD&branchtype=match&dir=&file=&filetype=match&who=&whotype=match&sortby=Date&hours=2&date=explicit&mindate=04%2F16%2F2001+20%3A00%3A00&maxdate=04%2F17%2F2001+04%3A00%3A00&cvsroot=%2Fcvsroot
AFAICT, there are plenty of big cache and layout checkins here, with a
bit of event queue hackery just to spice things up.
chris
jrgm wrote:
So, the zdnet pages work like this.
1) All the pages hardwire an onload handler into the <body> element:
<body onload='document.location="nextpageinsequence.html"'>
2) The last page in a cycle pings a JS enabled page, which reads a
timestamp cookie, and updates it with that cycle times. At the
end of all the cycles, it does a form submission to a server.
I grabbed a copy of all the zdnet content, and put it on a local server
so I could see the server logs. I snipped out step (2) and just made
the document.onloads into a continuous loop.
I grabbed 2001041612 and 2001041706 for win32(win2k), created new, *default
profiles* for each and started the cycle running. After three cycles, I'd
hit the throbber to break out of the loop.
What I see in the server logs:
Apr 16 build
- cycle 1 - pages and images are fetched for each URL (200 OK)
- however, each top level, html document is fetched TWICE!!!
(That's a bug, but it turns out that's not this bug).
- cycle 2/3 - silence. No 200's, no 304's.
Apr 17 build
- cycle 1 - pages and images are fetched for each URL (200 OK)
- each top level, html document is still fetched TWICE!!!
- cycle 2/3 - a 304 for every image on every page
- a '200 OK' for every top level html document, repeated TWICE!!!
Okay, so which build runs faster :-]
(Note: the reason that the in-house page loader test didn't pick this up is
that each top level document has a unique URL (due to the fact that I use it to
pass information back to the server in the query string. A bug/feature!).
So every top level document should be 200 OK in my test harness, so it
didn't see this bug happen.)
Do we have a bug on file for this slowdown? (I will file a bug for the double
fetch, which was already happening in Apr 16 builds (at least for the way that
zdnet triggers page loads).)
John
Chris Waterson wrote:
>
> So, if I understand this correctly, something went sour between the
> build that we pushed at 8pm on 4/16 and the build we pushed at 4am on
> 4/17 (judging by the number of checkins, this must be in GMT?) Anyway,
> here's the bonsai link for those checkins. Somebody double-check to make
> sure I've actually queried the right set of changes.
>
>
> http://bonsai.mozilla.org/cvsquery.cgi?treeid=default&module=all&branch=HEAD&branchtype=match&dir=&file=&filetype=match&who=&whotype=match&sortby=Date&hours=2&date=explicit&mindate=04%2F16%2F2001+20%3A00%3A00&maxdate=04%2F17%2F2001+04%3A00%3A00&cvsroot=%2Fcvsroot
>
> AFAICT, there are plenty of big cache and layout checkins here, with a
> bit of event queue hackery just to spice things up.
>
> chris
>
> Tom Everingham wrote:
>
> > It appears as though the largest regression for ibench takes place
> > between the 4-16 and 4-17 builds. Cache seems to be working to improve
> > performance by 22.9 and 25.4 seconds for the two runs. - Tom
> >
> > 2001041620
> > All iterations - 382.11
> > First iteration - 67.80
> > Subsequent iteration - 44.90
> >
> > 2001041704 (4-17-06)
> > All iterations - 639.96
> > First iteration - 102.22
> > Subsequent iteration - 76.82
> >
Comment 6 • 24 years ago (Assignee)
ok, i've been playing around with the slashdot.org test page from
jrgm.mcom.com/page-loader/loader.pl, and i noticed that some <noscript>
images are being downloaded every time!! these images should not even
be downloaded. otherwise, i don't see any problems on this particular
page with multiple downloads or excessive validations.
Comment 7 • 24 years ago (Assignee)
jrgm told me that the double-load problem was happening with builds before 4-16,
and i've somewhat tracked it down to nsParser::OnDataAvailable returning
NS_ERROR_HTMLPARSER_STOPPARSING on the first load. this causes HTTP to cancel
the page load and doom the cache entry. the second load successfully fills
out the cache entry, and subsequent loads of the page are taken from the cache
as expected.
does anyone know why nsParser could be returning this STOPPARSING nsresult??
Comment 8 • 24 years ago (Assignee)
btw: i'm testing a debug pull from today 4/24.
Comment 9 • 24 years ago (Assignee)
ok, the image requests resulting in conditional HTTP requests with 304 "Not
Modified" responses seem to be resulting directly from the VALIDATE_ALWAYS
loadFlag being set on the HTTP channel for those images. I don't understand
how or why this flag is being set, but from what I can tell, HTTP is simply
doing what it has been told to do.
Comment 10 • 24 years ago (Assignee)
Ok, it seems like VALIDATE_ALWAYS is always being set on the toplevel document
load by nsDocShell::DoChannelLoad. I'm not sure why this is the case since
we're just doing normal loads. At any rate, this causes VALIDATE_ALWAYS to be
set as the default load flags for the load group, which then causes all image
loads (which cannot be satisfied from the imagelib cache) to be validated.
i'm not sure how changes on the 16th might have caused this to happen.
Updated • 24 years ago
Whiteboard: third on gordon's list to get to. → investigating
Comment 11 • 24 years ago
Perhaps we need a separate pair of bugs, one to deal with the duplicate GET's
(which was happening prior to apr 17), and one to deal with the unwanted
'if-modified-since' GET's that began happening on apr 17. Since one existed
independent of the other, it would seem they have different root causes.
But, in fear of breaking up chofmann's harmonic convergence groove, I will
leave the strong force and the weak force as part of the GUT for now ....
A comment on the duplicate GET's: I have only seen this happen with the i-bench
pages. One thing to note about these pages is that each document load is
initiated immediately in the onload handler of the previous page. My test,
which (default) delays the load of a subsequent page by 1 second, does not
show the double GET. (Is it possible that the first attempt to load the next
page fails because the window.onload event happens before some other per-page
cleanup code has run?)
Comment 12 • 24 years ago (Assignee)
the double reload seems to result from a page with a meta tag like:
<meta content="text/html; charset=windows-1252" http-equiv=Content-Type>
the charset "change" seems to force a reload, which causes VALIDATE_ALWAYS
to be set. with VALIDATE_ALWAYS set on the toplevel document, all sub-documents
(in particular images) get the VALIDATE_ALWAYS load flag as well.
this would seem to explain both seemingly unrelated problems. i'm not sure what
exactly changed to cause this regression. i suspect some code in
nsHTMLDocument.cpp that was storing charset values in the cache, but i'm not
sure exactly how doing so would prevent the double load.
Comment 13 • 24 years ago (Assignee)
cc'ing ftang for help! i don't understand what our "original/current" solution
is to this problem... how is the browser supposed to handle this problem??
Comment 14 • 24 years ago
See related http://bugscape.mcom.com/show_bug.cgi?id=4540 (originally
filed as bugzilla bug 64612) "Auto-Detect (All)" causes a duplicate HTTP GET
(or POST) to be sent". [I'll note that I filed that over four months ago,
and the only action has been to ask me to file it in bugscape instead].
Comment 15 • 24 years ago (Assignee)
here's a simple testcase which demonstrates the problem
http://status.mcom.com/~darin/test-meta-charset.html
Comment 16 • 24 years ago
It's not clear how easily we can change content-type, or really charset, in the
middle of parsing the document's head. The old codebase, IIRC, let the cache
finish filling and (possibly without blocking) also re-read the cache file from
the beginning, redecoding characters using an old netlib stream converter, and
reparsing via the old ebina HTML layout engine.
Why don't we try to do the same sort of thing in the new codebase? Cc'ing
harishd.
/be
Comment 17 • 24 years ago
In response to Darin's 2001-04-24 17:17 comment:
Does anyone know why nsParser could be returning this STOPPARSING nsresult?
STOPPARSING is meant to shut down the parsing engine for the current document
(usually when we encounter a META tag with a charset). But, I wonder, how this
message will be useful for netlib. In other words, can we not propagate this
message beyond the parser? Darin, how does netlib react to canelling a document
load?
Comment 18 • 24 years ago
Typo!
canelling => cancelling
Comment 19 • 24 years ago (Assignee)
turns out i was mistaken about the <noscript> images loading... in the <script>
case there are some image URLs that contain "the current time" as a string.
somehow i matched these URLs up to the <noscript> image URLs... so false alarm!
Comment 20 • 24 years ago (Assignee)
Comment 21 • 24 years ago (Assignee)
i've attached a patch that makes the "store the charset in the cache" logic work
with the new cache. this doesn't fix the initial double-fetch problem, but it
does make it so that later fetches are taken from the cache, which in turn
prevents the proliferation of the VALIDATE_ALWAYS flag.
Status: NEW → ASSIGNED
Keywords: patch
Comment 22 • 24 years ago
I don't understand the charsetSource comparisons (in the new or original), but
the cache code looks good. Darin, what was the difference between this patch and
the original we tried to land last week?
r=gordon.
Comment 23 • 24 years ago (Assignee)
i believe it is the rv=NS_OK line that was left out of the previous attempt.
Comment 24 • 24 years ago
Nits only:
No point in declaring or initializing cachingChan outside the if (httpChannel)
then-block:
+ nsCOMPtr<nsICachingChannel> cachingChan = do_QueryInterface(httpChannel);
+ if (httpChannel) {
+ nsCOMPtr<nsISupports> cacheToken;
+ cachingChan->GetCacheToken(getter_AddRefs(cacheToken));
Someone violated When-in-Rome with 'if(' style, three-space indentation, and
rv=NS_OK (no spaces on either side of =), but I see no need to preserve those
style violations in your reworking of the charset-in-cache code.
Fix if you can, sr=brendan@mozilla.org with or without for 0.9.
/be
Comment 25 • 24 years ago (Assignee)
Comment 26 • 24 years ago (Assignee)
turns out the "if (httpChannel)" should have been "if (cachingChan)" ... this
didn't manifest itself as a bug since nsHTTPChannel implements both nsIHTTPChannel
and nsICachingChannel, but it could have caused trouble once other protocols
started implementing nsICachingChannel.
Comment 27 • 24 years ago (Assignee)
fix checked in on the trunk and the 0.9 branch.
Status: ASSIGNED → RESOLVED
Closed: 24 years ago
Resolution: --- → FIXED
Comment 28 • 24 years ago (Reporter)
tever, can you generate some new ibench numbers for us, now that darin's fix
was checked in last night? :-)
Comment 29 • 24 years ago
We can generate numbers with the current build. However, independent of darin's
change, these numbers should|will be lower, due to hyatt's change.
But tever can confirm the fix by going to http://jrgm/zdnet2 for a testcase,
and reviewing the server logs. Tom, I can go through this with you if you want.
Comment 30 • 24 years ago
Cathleen, I posted the ibench numbers earlier and they show an improvement,
probably because of a combination of darin's and hyatt's changes. I believe jrgm
is still looking into Darin's fix ... from what I saw it looked good.
Comment 31 • 24 years ago
Yeah, tever and I looked at this, and darin's fix knocks out the problem of
the "proliferation of the VALIDATE_ALWAYS flag" which shows up in the server
logs in the second and third cycle through (as HTTP 304 for all the images on
the pages and HTTP 200 (twice) for the top level HTML document.) So that
problem is fixed. Yay, darin!
However, given that hyatt's change landed with darin's, we can't use the test
results to say with certainty whether the charset issue was the only reason
we regressed between 4/16 and 4/17. (I think there may have been something
else involved, since we regressed on both initial and subsequent page load
times; darin's fix would only address performance for subsequent visits, no?).
Comments?
Two other lingering issues:
1) I do see a small number of 304's and 200's for images on the second and
third cycle through the pages. In general there should be none (modulo any
special (e.g., no-cache) HTTP headers, or loading enough content that we would
be forced to evict, but neither of those cases apply to the test at
jrgm/zdnet2). I'm looking some more at this and will file a bug.
2) as darin notes, the problem of "double-fetching" the document when a
charset is encountered in the document still occurs. There is a bug
http://bugscape.mcom.com/show_bug.cgi?id=4540 (originally filed as bugzilla
bug 64612) for this happening in another scenario (using auto-detect).
But I think I will file a bug to cover the case (seen in mozilla and commercial
builds) where a charset meta tag in a document forces this behaviour.
So, two bugs to file, and an open question whether there was something else
behind the slowdown 4/17. Marking this bug verified for the 'VALIDATE_ALWAYS'
issue.
Status: RESOLVED → VERIFIED
Comment 32 • 24 years ago
Updated • 8 years ago
Product: Core → Core Graveyard