Closed Bug 82244 Opened 23 years ago Closed 23 years ago

PDT+ ibench regression between 5/18-5/21

Categories

(Core :: Networking: Cache, defect, P3)

defect

Tracking

()

RESOLVED FIXED
mozilla0.9.3

People

(Reporter: cathleennscp, Assigned: tetsuroy)

References

()

Details

(Keywords: perf, Whiteboard: checked in.)

Attachments

(5 files)

paw did additional test on ibech and we got some interesting results... machine used: win98 266mHz 128Mb RAM 5/18 5/18 (with 5/22 nkcache.dll) ----------------- ------------------------ all iteration 607.80 937.25 first (no cache) 85.63 118.31 subsequent 74.60 116.99 paw re-ran 5/18 build just to verify that it is getting the same results we got before. then he removed nkcache.dll and replaced with new nkcache.dll from 5/22, deleted and regenerated component.reg, removed cache (created new profile) ran ibench and generated the results.
getting this on the 0.9.1 target milestone radar...
Summary: ibench regression → ibench regression between 5/18-5/21
Target Milestone: --- → mozilla0.9.1
don't think it can make it to 0.9.1 since we don't know as yet the details of whats causing it let alone the fix.
Target Milestone: mozilla0.9.1 → mozilla0.9.2
I'll be looking into this this evening.
I tried the above "removed [5/18] nkcache.dll and replaced with new [5/21] nkcache.dll, deleted and regenerated component.reg, removed cache", and ran with the page-loader. But ... after deleting the component.reg and running, I ran with '-console', and it said: nsNativeComponentLoader:SelfRegisterDll(C:\JRGM\051813\components\nkcache.dll) Load FAILED with error: error 31 So (for whatever reason), this means I done got no cache. We already know that with no cache, on a fast network, then that makes us faster, particularly as compared to the pre-L2 cache, where we were doing even more overhead work. When I ran in this state, compared to running a 5/18 'as-installed', I got the following times with the page-loader #1 #2 #3 5/18 'as-installed' : 2271 1620 1631 5/18 + 5/21 nkcache.dll (unregistered) : 1650 1669 1644 Okay, so ... 1) I'm faster in my test, after "breaking" the cache. 2) My cached and uncached times are ~equal 3) point (1) is inconsistent with paw's numbers above 4) point (2) is consistent with paw's numbers, in that there is no effective difference between his cached and uncached times (i.e., he had no cache) One thing to consider: I reviewed the server logs, and noticed that in the 'has no cache' state, every image is fetched *including* images mentioned more than once on a page (e.g., 'pixel.gif' is fetched 59 times on every visit to netscape.com). Perhaps a 266 machine is more affected by this than a 500MHz machine, leading to the overall slowdown. [I don't really believe this]. I guess I'm going to start poking at the tests some more. (There's always a reason, there's always a reason, there's no place like home ...).
> 2) My cached and uncached times are ~equal 2) My supposedly "cached" and "uncached" times are ~equal
So, I ran the ibench test with the 500MHz win98 PC, and I got: 05/21 05/18 Total 301.1 Total 291.98 First iter. 56.63 First iter. 46.52 Subsq. iter. 34.92 Subsq. iter. 35.07 These are similar to paw's results on the other machine (modulo machine power). I grabbed the server logs from the test server, and after a little slicing and dicing, it turns out that (with post 5/18 builds) we are not caching images properly when they appear in more than one page. e.g., if '/path/to/some.gif' is used in two different pages, it is fetched once for each page, and not from the cache. (I have more detailed data that shows this happening). Now, the zdnet pages serve up a lot of pages that are identical in structure, and which reuse the same graphics page to page. So, effectively, very few of the zdnet pages are in a purely uncached state (some content that they need has been previously pulled in for earlier pages). But, in my test, each page has a unique set of images, so the uncached round is a purely uncached round, not some mixed state. That is why my results shows a lowered number (we *are* better at the initial download of content), but the ibench times go "up" (we aren't caching 'cross-page' effectively).
Keywords: perf
btw, dialy ibench results are posted internally here: http://slip/projects/mojo/perf/i-bench.html we found out droping in the new cache dll to old build didn't work since some base string api got changed over the weekend, so new cache dll couldn't work with friday or older builds. the number paw got earlier reflects no cache runs.
cathleen's ibench results: 5/24 commercial bld my optimzed commercial bld (with pav's changes backed out) --------------------- ----------------------- all 136.01 126.25 first 26.25 19.13 subsequent 15.68 15.30 and also, gordon's iBench results: optimized Mac build on 500MHz G4 Titanium PowerBook: L1 Disk Cache (120 k) Run #1 Run #2 Average 403.02 405.85 404.435 76.98 75.86 76.42 46.58 47.14 46.86 L2 Disk Cache (112 k) Run #1 Run #2 Average % speedup 384.32 380.25 382.285 5.79% 71.49 70.15 70.82 7.91% 44.69 44.30 44.495 5.32%
gagan will help investigate what's causing pav's changes to regress ibench results. attatched the diff to backout pavlov's 2 line changes to this bug, in case anyone else is interested in helping out.
Attached patch newer patch, diff with context (deleted) — Splinter Review
We talked about this yesterday, but I just wanted to make sure I got it on the record. I _think_ Pavlov's changes are doing the correct thing. It sounds like the real bug is that somebody is telling imagelib to force-invalidate. Even though it makes us slower, I think that pav's changes should stay.
Seems like the only places where we set VALIDATE_ALWAYS is either thru Reload or the browser.cache.check_doc_frequency preference of validate always. And otherwise someone is not initializing these flags correctly somewhere... and we are getting garbage. I agree Pav's changes are correct and should stay. Can someone verify that in the lab we are not setting validate always in the cache prefs?
Perhaps this is an interaction of Gordon and Pav's checkins? Maybe the L2 cache doesn't do the right thing here, but Pav's bug masked the problem. Maybe when Pav fixed his problem, he exposed a bug in the new cache? Just randomly guessing here. Maybe it's some other checkin that caused the force validate.
pav is doing the right thing. the problem here is that we reload each top level document because we detect a charset change in a meta html tag. the docshell is instructed to do a reload just as it would do if the user pressed reload, which results in the VALIDATE_ALWAYS flag being set. In the old world, pav didn't honor this flag and would just happily reuse his cached images. Now that he's fixed that bug, we're once again seeing the ramifications of the meta charset issue.
This will not be fixed by adding overlapped i/o to the disk cache. I don't see any reason why the images shouldn't be resused just because the text is being interpreted as a different charset. Isn't there something else docShell can do besides a full reload?
cc'ing international folks. We need help fix charset meta reload problems. ftang, can you help? Looks like it is surfaced up again with pav' fix, which is the correct thing to do. (bug 81253) pavlov/darin, is there anyway we can fix the docshell, so that we will use cached images? also, anything we can do to use images from cache between different pages that request the same exact images?
Depends on: 81253
assigning to pavlov.
Assignee: gordon → pavlov
Refer my comment in 81253, why it is the right thing to do to reload images for charset reload? That's certainly not true. If pov's fix cover a large scope other than charset reload, should we use a flag to indicate it is a charset reload and skipping image reloading?
yes exactly.. the solution is to figure out where to put/define that flag.
Look at bug 83721. It is even worse when you do a shift-reload.
I post a patch, which added a flag for charset reload, and pass it to docshell. Inside docshell, this charset reload is treated as normal reload except that nsIRequest::VALIDATE_ALWAYS is not set. This should provide a base for a final patch that might eventually fix this problem. Let me know if I can be any further help. Since we all agreed that the page did need to be reloaded for charset change, I will close 81253.
r=darin on the docshell patch
r=gagan as well. darin could you do the sr?
Keywords: patch
Priority: -- → P3
i defer to rpotts for the sr= on this one. i'm not so familar with the docshell code.
Radha, Can you look at the attached patch - especially the checks around the calls to SetCacheKey(...). Does this look right? If so, then sr=rpotts :-)
I think the calls around SetCacheKey() are harmless. By the time Reload() is called upon charset detection, the right SH entries are set and the cachekey is obtained from the right entry.
looks like patch for this bug is reviewed. who should be the real owner now to check this in??
After thinking about this patch a bit more... I have one question: Why is the new 'LOAD_RELOAD_CHARSET_CHANGE' load flag different from the existing 'LOAD_HISTORY' flag? It seems like in the charset reload case, you want to pull as much of the content from the cache as possible - just like the history case... If this is true, then you don't need to add all the code to deal with the new load flag... Instead, in nsWebShell::ReloadDocument(...) you can simply do: return Reload(LOAD_HISTORY); What am I missing? -- rick
rick: i think you're right on about that! shanjian?
reassign to shanjian, since he owns the patch for fixing this bug... :-)
Assignee: pavlov → shanjian
shanjian is a new father for 2nd baby now. He won't be available till 5/26
cathleen- sorry to reassign this bug back to you. Shanjian just have a new baby. He will be on two weeks vacation to take care his baby for his wife. He will be off line for two weeks. Is that possible that you can find someone to work on the remaining issue. I think you can call him and ask question (see phonebook for telephone number) if you need. But I don't think he can work on patch with handful of diaper. I am not familar with the code in that patch and are also overloaded right now. Otherwise, I will jump in...
Assignee: shanjian → cathleen
base on previous bug comments, we can either 1) take shanjian's already reviewed patch to get an approval for check in or 2) make a new pacth to replace LOAD_RELOAD_CHARSET_CHANGE flag with LOAD_HISTORY flag and undo other necessary stuff in docshell ftang, since you're the expert in charset code, you should own this bug, review and decide what we need to do.
Assignee: cathleen → ftang
if changing nsWebShell::ReloadDocument(...) to use LOAD_HISTORY instead of LOAD_NORMAL will fix the problem, then I would definately go with it rather than the current patch... Changing the load flag is an isolated '1 line change' which only effects the codepath when a meta charset reload is occurring... The current docshell patch is much bigger and effects the code paths of *all* URL loads. "simple is better"
vidur is also working on a patch for parser to sniff charset tag. see bug 81253, and that is another way of fixing reload twice problem (should we got with that one instead??) pavlov need to fix in imglib to not throw away the images from image cache, bug 85978
Depends on: 85978
vidur's fix should help the common case of preventing the Reload (assuming the meta charset tag is in the first buffer of data...) However, I think that we still need to fix the Reload case too. I never got a clear answer to my question of whether changing LOAD_NORMAL to LOAD_HISTORY actually fixes the problem... If so, I think that we should make this change too..
> I never got a clear answer to my question of whether changing LOAD_NORMAL to > LOAD_HISTORY actually fixes the problem... ftang, can you help??
rpotts- I am not quite sure what do you exactly mean about "changing nsWebShell::ReloadDocument(...) to use LOAD_HISTORY instead of LOAD_NORMAL" In shanjian's patch, it seems he change from LOAD_FLAGS_NONE to LOAD_FLAGS_CHARSET_CHANGE (which he added in the early in the patch) It seems LOAD_FLAGS_NONE is defined in http://lxr.mozilla.org/seamonkey/source/docshell/base/nsIWebNavigation.idl#83 I cannot find a LOAD_HISTORY in that files.
Status: NEW → ASSIGNED
rpotts- I reassing to you. reassing back to me after you put the patch in. Thanks
Assignee: ftang → rpotts
Status: ASSIGNED → NEW
rpotts- any progress? need my testing?
Rick the second form looks preferable to me as long as long as equating history to cache is safe i.e. works even when global/session history aren't there (e.g. in embedding) or set to 0 entries.
Frank, i'm reassigning this back to you for testing and checkin... After talking with Adam and Radha, we decided that both patches have the same issues if session history is disabled. In either patch, if a session history entry is not available (ie. history is disabled), the document will be reloaded from the network - even if it has been cached. This is because the cache key is obtained from the session history entry :-( If this behavior is unacceptable, we can come up with a further patch to fix this issue...
Assignee: rpotts → ftang
It seems loading the character correctly with my local build. I cannot verify the patch load from cache or network. I will ask i18ngrp to do more testing on different platform this afternoon.
You may want to set the Session History length to 0 entries in Prefs and load a character-encoded page again to see if you get the desired result.
I applied the last patch on my Macintosh and tried two URLs. http://home.netscape.com/ja and http://www.lycos.co.jp both have META charset. The pages loaded fine. I tried again with history length to zero, that also loaded fine.
adamlock and radah, please r= vidur, please sr=
Whiteboard: rpott's patch need r= and sr=
sr=vidur.
Whiteboard: rpott's patch need r= and sr=
I test it on linux and it work fine.
Comparing differences between the last patch (from rpotts) and the patch from 6/4/01 14:06 (that has LOAD_RELOAD_CHARSET_CHANGE), I notice that this segment of code is missing in the last patch @@ -4629,6 +4645,9 @@ case LOAD_RELOAD_NORMAL: loadFlags |= nsIRequest::VALIDATE_ALWAYS; + break; + + case LOAD_RELOAD_CHARSET_CHANGE: break; ie., in latest patch, since LOAD_HISTORY flag is used for charset changes, it will be loaded with nsIRequest::LOAD_FROM_CACHE loadFlags. Is that alright? In the previous patch, no flags were passed to necko for a RELOAD_CHARSET_CHANGE. If this is alright, r=radha
I think we want to load it from cache in case of this usage. However, I prefer rpotts to answer that explicitly since it's his patch. Once that got answer, we can ask a= and wait for check in. Testing on Mac/Linux/Win show it display CJK correctly with meta tag.
cathleen- from the previous comment, I assume we don't want to check in the first two patches which change imglib, right?
Whiteboard: r=radha, sr=vidur, wait for rpotts to answer one Q before ask for a=
r=adamlock as well
I think loading from cache is appropriate and the latest patch from rpotts does that. I do not understand, why the previous patch did not pass any load flag to necko.
assigned
Whiteboard: r=radha, sr=vidur, wait for rpotts to answer one Q before ask for a= → r=radha, sr=vidur, ask a= since 13:25 6/21
a=chofmann in email
Status: NEW → ASSIGNED
Whiteboard: r=radha, sr=vidur, ask a= since 13:25 6/21 → r=radha, sr=vidur, a=chofmann. Wait for tree green since 5:51 6/21
fix check in.
Status: ASSIGNED → RESOLVED
Closed: 23 years ago
Resolution: --- → FIXED
Whiteboard: r=radha, sr=vidur, a=chofmann. Wait for tree green since 5:51 6/21 → r=radha, sr=vidur, a=chofmann.
Whiteboard: r=radha, sr=vidur, a=chofmann. → r=radha, sr=vidur, a=chofmann. critical for 0.9.2
The check in caused a regression bug 87700 (HTML anchor does not work). Reopen this, please consider different change for this.
Status: RESOLVED → REOPENED
Resolution: FIXED → ---
Blocks: 87700
Whiteboard: r=radha, sr=vidur, a=chofmann. critical for 0.9.2 → new patch need review. old patch cause regression on 87700
The new patch seems work very fine
rpotts- can you code review it?
Status: REOPENED → ASSIGNED
Whiteboard: new patch need review. old patch cause regression on 87700 → need r= from rpotts
r= radha for the latest patch
sr=rpotts
Severity: normal → critical
Summary: ibench regression between 5/18-5/21 → PDT+ ibench regression between 5/18-5/21
Whiteboard: need r= from rpotts → r=radha sr=rpotts, wait for a= since Wed, 27 Jun 2001 13:30:21 -0700
ask yokoyama to land into trunk . yokoyama, please do not close this bug after you land into trunk. I need this open to land into moz0.9.2.
pushing out. 0.9.2 is done. (querying for this string will get you the list of the 0.9.2 bugs I moved to 0.9.3)
Target Milestone: mozilla0.9.2 → mozilla0.9.3
land the new change into moz0.9.2 branch reassign to yokoyama for trunk landing. yokoyama- once you land into trunk, mark this and 87700 fixed. thanks.
Assignee: ftang → yokoyama
Status: ASSIGNED → NEW
Status: NEW → ASSIGNED
Whiteboard: r=radha sr=rpotts, wait for a= since Wed, 27 Jun 2001 13:30:21 -0700 → waiting for the trunk to open.
checked in.
Status: ASSIGNED → RESOLVED
Closed: 23 years ago23 years ago
Resolution: --- → FIXED
Whiteboard: waiting for the trunk to open. → checked in.
*** Bug 17889 has been marked as a duplicate of this bug. ***
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Creator:
Created:
Updated:
Size: