Closed
Bug 483902
Opened 16 years ago
Closed 15 years ago
leopard/tiger talos boxes require flash upgrade to run new pageset
Categories
(Release Engineering :: General, defect)
Tracking
(Not tracked)
RESOLVED
FIXED
People
(Reporter: anodelman, Assigned: anodelman)
References
Details
I have yet to see moz-central browsers complete a tp run using the new page set on leopard or tiger.
The observed behavior is that the browser successfully completes 5-7 cycles and then stops loading new pages. No crash stacks are collected. Some errors found in the js console include:
Error: [Exception... "Component returned failure code: 0x80004005 (NS_ERROR_FAILURE) [nsIFileOutputStream.init]" nsresult: "0x80004005 (NS_ERROR_FAILURE)" location: "JS frame :: file:///Users/mozqa/talos-slave/mac-trunk/Minefield.app/Contents/MacOS/components/nsSessionStore.js :: sss_writeFile :: line 2519" data: no]
Source File: file:///Users/mozqa/talos-slave/mac-trunk/Minefield.app/Contents/MacOS/components/nsSessionStore.js
Line: 2519
Error: uncaught exception: [Exception... "Illegal document.domain value" code: "1009" nsresult: "0x805303f1 (NS_ERROR_DOM_BAD_DOCUMENT_DOMAIN)" location: "http://localhost/page_load_test/pages/www.worldofwarcraft.com/www.worldofwarcraft.com/new-hp/js/functions.js Line: 5"]
Error: [Exception... "Component returned failure code: 0x80004005 (NS_ERROR_FAILURE) [nsIFileOutputStream.init]" nsresult: "0x80004005 (NS_ERROR_FAILURE)" location: "JS frame :: file:///Users/mozqa/talos-slave/mac-trunk/Minefield.app/Contents/MacOS/components/nsSessionStore.js :: sss_writeFile :: line 2519" data: no]
Source File: file:///Users/mozqa/talos-slave/mac-trunk/Minefield.app/Contents/MacOS/components/nsSessionStore.js
Line: 2519
Error: uncaught exception: [Exception... "Component returned failure code: 0x80570019 (NS_ERROR_XPC_CANT_CREATE_WN) [nsIJSCID.getService]" nsresult: "0x80570019 (NS_ERROR_XPC_CANT_CREATE_WN)" location: "JS frame :: chrome://global/content/macWindowMenu.js :: checkFocusedWindow :: line 7" data: no]
The sessionstore write failure has been observed during several tests. The sessionstore file does exist and is writable, so it is unknown why this error is being thrown.
I have also seen failure to correctly load pages - appearing as if style sheets have not been correctly loaded and thus being displayed as a simple column of html without images/animation/layout/etc. This could indicate a caching error.
It is also possible that the new page set is in some way causing a failure in the pageloader extension - thus halting the pageloader and stopping the test from advancing. This would have to do with interfering with the onload handlers or some other component of the pageloader.
Comment 1 (Assignee) • 16 years ago
Just to clarify, moz-central browsers are able to complete the new tp test on winxp, vista and ubuntu. You can see these machines cycling on the MozillaTest waterfall.
Updated•16 years ago
Assignee: nobody → anodelman
Comment 2 (Assignee) • 16 years ago
I had a theory that what I was seeing was caching errors, so I added:
browser.cache.disk.capacity : 0
to the talos prefs configuration. The tp test then ran to completion. I'm going to install this pref to both tiger/leopard and see if I can get more than a single successful run.
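For illustration, a minimal sketch of applying the same pref to a test profile. Talos keeps prefs in its own configuration; this sketch instead appends a `user_pref` line to the profile's `user.js`, which is the standard Firefox mechanism, as a stand-in. The helper name `set_pref` and the profile path are hypothetical.

```python
import json
import os


def set_pref(profile_dir, name, value):
    """Append a user_pref line to user.js in profile_dir and return that line.

    Hypothetical helper, not part of talos. json.dumps renders ints,
    booleans and strings as valid JavaScript literals.
    """
    line = "user_pref(%s, %s);\n" % (json.dumps(name), json.dumps(value))
    with open(os.path.join(profile_dir, "user.js"), "a") as prefs:
        prefs.write(line)
    return line
```

A call such as `set_pref("/path/to/profile", "browser.cache.disk.capacity", 0)` would disable the disk cache for that profile, matching the workaround described in this comment.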
Comment 3 (Assignee) • 16 years ago
Lots of successful runs overnight. Tiger still occasionally freezes up, but it looks more like the failures that we see occasionally on the production talos moz-central testers.
I would be pretty confident in saying that the issue here lies with a corrupted or damaged cache creation on mac.
Updated (Assignee) • 16 years ago
Assignee: anodelman → nobody
Component: Release Engineering → Networking: Cache
Product: mozilla.org → Core
QA Contact: release → networking.cache
Version: other → Trunk
Comment 4 (Assignee) • 16 years ago
Moved to Core/Networking-Cache as I seem to be generating a corrupted cache when I run the new pageset on tiger/leopard.
Comment 5 (Assignee) • 16 years ago
Here's a profile used by a frozen leopard test box:
http://people.mozilla.org/~anodelman/profile.zip
Comment 6•16 years ago
Are there any specific pages that are hanging more often?
Comment 7 (Assignee) • 16 years ago
I haven't seen any pattern in the pages that it gets stuck on.
Comment 8•16 years ago
I put an http log of a failed run up at http://campd.org/stuff/cache.log.gz
It's giant, but if you search for 'cacheMap', you'll see that at some point the cache is getting confused:
-1605746784[50a960]: Destroying nsHttpTransaction @7889700
-1605746784[50a960]: nsHttpChannel::FinalizeCacheEntry [this=d6c5130]
-1605746784[50a960]: calling OnStopRequest
-1605746784[50a960]: CACHE: Flush [90df8358 doomed=0]
-1605746784[50a960]: CACHE: DeleteStorage [90df8358 0]
-1605746784[50a960]: WARNING: cacheMap->DeleteStorage() failed.: file ../../../../mozilla/netwerk/cache/src/nsDiskCacheStreams.cpp, line 517
WARNING: cacheMap->DeleteStorage() failed.: file ../../../../mozilla/netwerk/cache/src/nsDiskCacheStreams.cpp, line 517
-1605746784[50a960]: CACHE: DeleteRecord [90df8358]
-1605746784[50a960]: ###!!! ASSERTION: Flush() failed: 'NS_SUCCEEDED(rv)', file ../../../../mozilla/netwerk/cache/src/nsDiskCacheStreams.cpp, line 461
###!!! ASSERTION: Flush() failed: 'NS_SUCCEEDED(rv)', file ../../../../mozilla/netwerk/cache/src/nsDiskCacheStreams.cpp, line 461
-1605746784[50a960]: nsHttpChannel::CloseCacheEntry [this=d6c5130]
-1605746784[50a960]: Deactivating entry 98166f0
-1605746784[50a960]: Removed deactivated entry 98166f0 from mActiveEntries
-1605746784[50a960]: CACHE: disk DeactivateEntry [98166f0 90df8358]
-1605746784[50a960]: CACHE: WriteDiskCacheEntry [90df8358]
-1605746784[50a960]: CACHE: UpdateRecord [90df8358]
-1605746784[50a960]: ###!!! ASSERTION: record not found: 'Not Reached', file ../../../../mozilla/netwerk/cache/src/nsDiskCacheMap.cpp, line 462
###!!! ASSERTION: record not found: 'Not Reached', file ../../../../mozilla/netwerk/cache/src/nsDiskCacheMap.cpp, line 462
-1605746784[50a960]: WARNING: NS_ENSURE_SUCCESS(rv, rv) failed with result 0x8000FFFF: file ../../../../mozilla/netwerk/cache/src/nsDiskCacheMap.cpp, line 867
WARNING: NS_ENSURE_SUCCESS(rv, rv) failed with result 0x8000FFFF: file ../../../../mozilla/netwerk/cache/src/nsDiskCacheMap.cpp, line 867
-1605746784[50a960]: CACHE: DeleteStorage [90df8358 0]
-1605746784[50a960]: CACHE: DeleteStorage [90df8358 1]
-1605746784[50a960]: CACHE: DeleteRecord [90df8358]
-1605746784[50a960]: ###!!! ASSERTION: deleting dirty buffer: 'mBufDirty == PR_FALSE', file ../../../../mozilla/netwerk/cache/src/nsDiskCacheStreams.cpp, line 750
###!!! ASSERTION: deleting dirty buffer: 'mBufDirty == PR_FALSE', file ../../../../mozilla/netwerk/cache/src/nsDiskCacheStreams.cpp, line 750
-1605746784[50a960]: Destroying nsHttpChannel @d6c5130
This pattern repeats itself a few more times before talos gets stuck.
This definitely seems to be related. Runs that failed always seemed to generate this failure, and runs that succeeded never did.
The talos machine I was looking at stopped reproducing it, but it should be possible to reproduce again, using the new talos pageset and the pageloader extension - the rest of the talos setup didn't seem to be necessary.
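Since failed runs always contained this pattern and successful runs never did, a log can be classified mechanically. A minimal sketch, assuming the log is a plain or gzip-compressed text file and that the marker strings are exactly those quoted in this comment; the function name `find_cache_failures` is hypothetical.

```python
import gzip
import re

# Marker strings taken from the log excerpt quoted in this comment.
MARKERS = re.compile(
    r"cacheMap->DeleteStorage\(\) failed"
    r"|ASSERTION: (?:Flush\(\) failed|record not found|deleting dirty buffer)"
)


def find_cache_failures(path):
    """Return (line_number, text) for every log line that matches a marker."""
    opener = gzip.open if path.endswith(".gz") else open
    hits = []
    with opener(path, "rt", errors="replace") as log:
        for lineno, line in enumerate(log, 1):
            if MARKERS.search(line):
                hits.append((lineno, line.rstrip("\n")))
    return hits
```

An empty result would suggest the run did not hit the cache confusion; any hits give the line numbers to inspect.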
Comment 9 (Assignee) • 15 years ago
I'm no longer seeing this behavior when running talos with the new page set.
Status: NEW → RESOLVED
Closed: 15 years ago
Resolution: --- → FIXED
Comment 10 (Assignee) • 15 years ago
I've changed my mind here, and I think that this is still occurring. I'd like some confirmation from dcamp before I attempt to close it again.
Status: RESOLVED → REOPENED
Resolution: FIXED → ---
Comment 11•15 years ago
Is there any up-to-date documentation about how to set up this tp test?
https://wiki.mozilla.org/Performance:Tinderbox_Tests seems to be obsolete.
Comment 12•15 years ago
You'll want https://wiki.mozilla.org/StandaloneTalos
Comment 13•15 years ago
Is it possible to download the new page set somewhere? StandaloneV1_5.zip seems to contain some other set, since it has no www.worldofwarcraft.com page.
Comment 14 (Assignee) • 15 years ago
jst - any update here? Is there anything else that I can provide that would help the hunt? (Have already responded to comment #13 on irc and provided the pageset in question).
Comment 15•15 years ago
I've downloaded the pageset and I'm able to reproduce the bug. Now I'm trying to find out what's wrong with the cache.
Comment 16•15 years ago
(In reply to comment #15)
> I've downloaded the pageset and I'm able to reproduce the bug. Now I'm trying
> to find out what's wrong with the cache.
Not trying to add any pressure, but any update? Obviously, we're anxious to use the new pageset in production, but I'm also concerned if this might be a FF3.5 blocker?
Comment 17•15 years ago
It took me so long because it takes hours to reproduce the bug. But in the end the problem is quite simple. There is no problem with the cache. Cache gets confused because OpenNSPRFileDesc() in nsDiskCacheStreamIO::OpenCacheFile() fails to create/open a cache file. For some reason firefox process sometimes doesn't close file "/Library/Internet Plug-Ins/Flash Player.plugin/Contents/Resources/Flash Player.rsrc". So when viewing sites with flash content after some time firefox reaches a limit of opened files which is 256 on Mac OS X by default.
Is there anybody who knows plugin code and can look at the reason why this happens?
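The failure mode described here (descriptors leak until the per-process limit is reached, after which unrelated opens fail) can be reproduced in isolation. A minimal sketch for POSIX systems, assuming nothing about the plugin code itself: it temporarily lowers the soft `RLIMIT_NOFILE` to the 256 mentioned above, opens one file repeatedly without closing it, and reports how many opens succeeded before `EMFILE`. The function name is hypothetical.

```python
import errno
import os
import resource


def open_until_exhausted(path, soft_limit=256):
    """Leak descriptors for path until open() fails with EMFILE.

    Returns the number of successful opens. The soft limit is restored
    and all leaked descriptors are closed before returning.
    """
    old_soft, hard = resource.getrlimit(resource.RLIMIT_NOFILE)
    if hard != resource.RLIM_INFINITY:
        soft_limit = min(soft_limit, hard)
    resource.setrlimit(resource.RLIMIT_NOFILE, (soft_limit, hard))
    leaked = []
    try:
        while True:
            # Never closed inside the loop: this is the simulated leak.
            leaked.append(os.open(path, os.O_RDONLY))
    except OSError as exc:
        if exc.errno != errno.EMFILE:
            raise
        return len(leaked)
    finally:
        for fd in leaked:
            os.close(fd)
        resource.setrlimit(resource.RLIMIT_NOFILE, (old_soft, hard))
```

The count returned is somewhat below the limit because stdin, stdout, stderr and any other already-open descriptors also count against it, which is why a browser with many files open would hit the ceiling sooner than the raw number suggests.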
Comment 18•15 years ago
(In reply to comment #17)
> It took me so long because it takes hours to reproduce the bug. But in the end
> the problem is quite simple. There is no problem with the cache. Cache gets
> confused because OpenNSPRFileDesc() in nsDiskCacheStreamIO::OpenCacheFile()
> fails to create/open a cache file. For some reason firefox process sometimes
> doesn't close file "/Library/Internet Plug-Ins/Flash
> Player.plugin/Contents/Resources/Flash Player.rsrc". So when viewing sites with
> flash content after some time firefox reaches a limit of opened files which is
> 256 on Mac OS X by default.
Thanks Michal, Alice.
Sounds like this new pageset did trip over something that could be a blocker, hence nom'd.
> Is there anybody who knows plugin code and can look at the reason why this
> happens?
Flags: blocking1.9.1?
Comment 19•15 years ago
Do we know if this is a problem in our code or Flash? If it's flash keeping file handles open, I'd say minus.
Comment 20•15 years ago
Michal, thanks for digging in here!
Josh, is "/Library/Internet Plug-Ins/Flash Player.plugin/Contents/Resources/Flash Player.rsrc" by any chance a file that we open repeatedly in the plugin code and forget to ever close, or is this a file descriptor leak in the flash player?
QA Contact: networking.cache → joshmoz
Comment 21•15 years ago
Not blocking final release, looking into it for 3.5.x, based on the fact that so far the only people to run into the problem are releng getting the page set to run, not users.
Josh: please see comment 20 in a hurry, and renominate this if you think my judgement is wrong, thanks.
Component: Networking: Cache → Plug-ins
Flags: wanted1.9.1.x?
Flags: blocking1.9.1?
Flags: blocking1.9.1-
QA Contact: joshmoz → plugins
Comment 22•15 years ago
Is the box running Flash 9? See bug 397053, you might need to upgrade to Flash 10 to fix this. Mac OS X 10.5.7 might come with Flash 10, iirc Apple updated it, but I could be misremembering. If that is true though then all you need to do is update to 10.5.7.
Comment 23•15 years ago
(In reply to comment #21)
> Not blocking final release, looking into it for 3.5.x, based on the fact that
> so far the only people to run into the problem are releng getting the page set
> to run, not users.
Beltzner: we're hitting this in our new Top100 website pageset. Not clear which specific page(s) are causing this, but as all of them are in the top 100 websites, it feels like it might quickly become an urgent requirement to fix. Hence the blocker request. The nom- is fine for now, but depending on tests below, I may re-nom.
> Josh: please see comment 20 in a hurry, and renominate this if you think my
> judgement is wrong, thanks.
(In reply to comment #22)
> Is the box running Flash 9? See bug 397053, you might need to upgrade to Flash
> 10 to fix this. Mac OS X 10.5.7 might come with Flash 10, iirc Apple updated
> it, but I could be misremembering. If that is true though then all you need to
> do is update to 10.5.7.
I've just looked at 5 talos leopard machines, and they had:
OSX 10.5.2
Flash: 9.0.115
Josh: We intentionally do *not* upgrade s/w on Talos machines unless we *need* to, because changes like this typically cause changes in perf data results. That means recalibrating results, discussions about what to do with regenerating results for historical milestone releases, and serious downtime! However, if that's what it takes, so be it.
Alice: on *one* staging talos machine could you see if:
- the Flash upgrade by itself fixes the problem?
- an O.S. upgrade *and* Flash upgrade fixes the problem?
If either of these work, then we should all regroup and figure out next step.
Comment 24•15 years ago
(In reply to comment #23)
> Beltzner: we're hitting this in our new Top100 website pageset. Not clear which
> specific page(s) are causing this, but as all of them are in the top 100
Pages containing flash are:
http://localhost/page_load_test/pages/www.youtube.com/www.youtube.com/index.html
http://localhost/page_load_test/pages/www.imdb.com/www.imdb.com/index.html
http://localhost/page_load_test/pages/www.bbc.co.uk/www.bbc.co.uk/index.html
http://localhost/page_load_test/pages/www.nicovideo.jp/www.nicovideo.jp/index.html
http://localhost/page_load_test/pages/www.gamespot.com/www.gamespot.com/index.html
http://localhost/page_load_test/pages/www.blogfa.com/www.blogfa.com/index.html
http://localhost/page_load_test/pages/www.maktoob.com/www.maktoob.com/index.html
http://localhost/page_load_test/pages/www.spiegel.de/www.spiegel.de/index.html
http://localhost/page_load_test/pages/www.jugem.jp/jugem.jp/index.html
http://localhost/page_load_test/pages/www.marca.com/www.marca.com/index.html
http://localhost/page_load_test/pages/www.ku6.com/www.ku6.com/index.html
http://localhost/page_load_test/pages/www.it168.com/www.it168.com/index.html
http://localhost/page_load_test/pages/www.corriere.it/www.corriere.it/index.html
http://localhost/page_load_test/pages/www.people.com.cn/www.people.com.cn/index.html
http://localhost/page_load_test/pages/www.minijuegos.com/www.minijuegos.com/index.html
http://localhost/page_load_test/pages/www.yam.com/www.yam.com/index.html
http://localhost/page_load_test/pages/www.nnm.ru/www.nnm.ru/index.html
Running tp test only with these pages speeds up the failure.
> I've just looked at 5 talos leopard machines, and they had:
> OSX 10.5.2
> Flash: 9.0.115
I have OSX 10.4.10 with flash 9.0.22. Upgrading just flash to 10.0.22.87 seems to help.
bz and msintov seemed to imply there was an underlying Gecko bug causing Flash to keep eating up the file descriptors; see bug 397053 comment 30 and bug 397053 comment 33.
If you didn't want to upgrade Flash and didn't want to fix the constant "reopening of the resource map" or whatever, you could also up the file descriptor limit like we did as a workaround in Camino (which would make the bug harder, but not impossible, to trigger, both for Talos and for actual users who aren't using Flash 10); see bug 401138.
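As an illustration of the "up the file descriptor limit" workaround, here is a minimal POSIX sketch. It is not the Camino patch from bug 401138; the function name and the target of 1024 are assumptions chosen for the example. It raises the soft `RLIMIT_NOFILE` toward a target without exceeding the hard limit.

```python
import resource


def raise_fd_limit(target=1024):
    """Raise the soft open-file limit to target (capped at the hard limit).

    Returns the soft limit in effect afterwards. Never lowers the limit.
    """
    soft, hard = resource.getrlimit(resource.RLIMIT_NOFILE)
    if hard != resource.RLIM_INFINITY:
        target = min(target, hard)
    if soft == resource.RLIM_INFINITY or soft >= target:
        return soft  # already at least as permissive as requested
    resource.setrlimit(resource.RLIMIT_NOFILE, (target, hard))
    return resource.getrlimit(resource.RLIMIT_NOFILE)[0]
```

As the comment notes, this only delays the failure: a process that leaks descriptors will still exhaust the larger limit eventually.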
Comment 26•15 years ago
Doh. I filed bug 496344 to track hunting that down. It's nice to have a testcase to test fixes against!
Comment 27 (Assignee) • 15 years ago
Upgrading flash has allowed the pageset to run to completion on leopard. Should we:
- upgrade flash on all mac talos boxes
- upgrade flash on all talos boxes (to try and get parity)
- something else?
Component: Plug-ins → Networking: Cache
QA Contact: plugins → networking.cache
Updated•15 years ago
Component: Networking: Cache → Plug-ins
QA Contact: networking.cache → plugins
Comment 28 (Assignee) • 15 years ago
Possible fix for upgrading flash throughout talos slave pool with bug 475383.
Have it working on stage using moz-central builds. But the fix for loading plugins from profiles isn't on 1.9.1 or Firefox3.0.
Updated (Assignee) • 15 years ago
Assignee: nobody → anodelman
Component: Plug-ins → Release Engineering
Product: Core → mozilla.org
QA Contact: plugins → release
Summary: leopard/tiger talos freezing on new page set → leopard/tiger talos boxes require flash upgrade to run new pageset
Version: Trunk → other
Comment 30•15 years ago
(In reply to comment #28)
> Possible fix for upgrading flash throughout talos slave pool with bug 475383.
>
> Have it working on stage using moz-central builds. But the fix for loading
> plugins from profiles isn't on 1.9.1 or Firefox3.0.
Beltzner: Gentle ping - can we get approval to land this, so we can enable tp4 talos on the mozilla-1.9.1 branch?
Comment 31 (Assignee) • 15 years ago
Fixed by downloading plugins per talos run - thus they can be updated at will and installed through the profile used by talos during testing.
Status: REOPENED → RESOLVED
Closed: 15 years ago → 15 years ago
Resolution: --- → FIXED
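The per-run plugin install described in comment 31 might look roughly like the following. This is a sketch under stated assumptions, not the talos implementation: it assumes the plugin has already been downloaded to a local path, and that the browser loads plugins from a `plugins` directory inside the profile (the behavior referred to via bug 475383). The function name `install_plugin` is hypothetical.

```python
import os
import shutil


def install_plugin(plugin_path, profile_dir):
    """Copy a plugin file or bundle into <profile_dir>/plugins.

    Any previous copy of a bundle is replaced, so each run uses whatever
    plugin version was fetched for it. Returns the destination path.
    """
    plugins_dir = os.path.join(profile_dir, "plugins")
    os.makedirs(plugins_dir, exist_ok=True)
    name = os.path.basename(os.path.normpath(plugin_path))
    dest = os.path.join(plugins_dir, name)
    if os.path.isdir(plugin_path):
        # Mac plugins are bundles (directories), so copy the whole tree.
        if os.path.exists(dest):
            shutil.rmtree(dest)
        shutil.copytree(plugin_path, dest)
    else:
        shutil.copy2(plugin_path, dest)
    return dest
```

Installing into the profile rather than the system plugin directory means the test machines' system software stays untouched, which fits the concern raised in comment 23 about not upgrading software on Talos machines.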
Updated•11 years ago
Product: mozilla.org → Release Engineering