Closed
Bug 478603
Opened 16 years ago
Closed 16 years ago
intermittent orange on Windows mozilla-central talos Ts and Tdhtml tests ("failed to initialize browser")
Categories
(Release Engineering :: General, defect, P3)
Tracking
(Not tracked)
RESOLVED
FIXED
People
(Reporter: philor, Assigned: anodelman)
References
Details
(Keywords: intermittent-failure)
Attachments
(2 files)
(deleted), patch | bhearsum: review+ | anodelman: checked-in+
(deleted), patch | joduinn: review+ | anodelman: checked-in+
The tree is currently closed "due to Windows talos orange."
My best guess at what that means: starting with qm-pvista-trunk03 at 2009/02/08 18:12:12, then becoming nearly continuous on 03 on the 9th, with 02 joining in increasingly often from the 9th through the rest of the week (ignored by everyone, all week long), we've been having "FAIL: Busted: tdhtml
FAIL: failed to initialize browser".
During that entire time, qm-pvista-trunk01 has been continuously green except for one non-tdhtml failure, which would certainly seem to indicate that it's box trouble rather than code trouble.
I think we're going to reopen, since whoever anonymously closed the tree didn't leave any indication of having noticed that it was only two of the three, much less that it had been ignored all week long, but it's still pretty critical since it leaves us with just one.
Comment 1•16 years ago
I was just about to close the tree again, in particular because it's been a while since qm-pvista-trunk01 reported a successful run.
Anyone on this?
Comment 2•16 years ago
The logs for qm-pvista-trunk02/03 look fine in the sense that they clean up old builds at the start, and run Ts and Tp happily. There isn't very much more info for the failing Tdhtml in the log, just
Running test tdhtml:
Started Mon, 16 Feb 2009 09:48:44
Failed tdhtml:
Stopped Mon, 16 Feb 2009 09:49:17
FAIL: Busted: tdhtml
FAIL: failed to initialize browser
Trying to catch the problem on qm-pvista-trunk03 with a nightly build, in the hope of getting some sort of crash report.
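For reference, the two timestamps in the log excerpt above can be diffed to see how long the test ran before it was declared busted. A minimal sketch, assuming the timestamp format shown in the log and an English locale for the day and month names (the variable names are illustrative, not from talos):

```python
from datetime import datetime

# Format of the "Started"/"Stopped" lines in the talos log excerpt.
LOG_TIME_FORMAT = "%a, %d %b %Y %H:%M:%S"

started = datetime.strptime("Mon, 16 Feb 2009 09:48:44", LOG_TIME_FORMAT)
stopped = datetime.strptime("Mon, 16 Feb 2009 09:49:17", LOG_TIME_FORMAT)

# Wall-clock time between the test starting and being reported as failed.
elapsed_seconds = (stopped - started).total_seconds()
print(f"tdhtml ran for {elapsed_seconds:.0f} seconds before failing")
```

The gap is about half a minute, so the failure happens almost immediately after the test starts rather than partway through a run.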
Comment 3•16 years ago
I haven't managed to reproduce the error in comment #2 when adjusting the config.yml (test manifest) to test just tdhtml, and I get a counter error if I leave ts and tp in before tdhtml.
AFAICT, the "failed to initialize browser" message can only come from initializeProfile()
http://mxr.mozilla.org/mozilla/source/testing/performance/talos/ttest.py#117
which is calling InitializeNewProfile
http://mxr.mozilla.org/mozilla/source/testing/performance/talos/ffsetup.py#143
to fill out the profile dir (with all the other files that talos doesn't want to explicitly specify). If we're getting an error here, it's because it's taking more than 30 seconds to do that launch and shutdown. Talos has already checked for existing firefox/minefield processes at this point.
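The launch-and-shutdown step described above can be sketched roughly as follows. This is not the talos code: `initialize_profile` and `run_init` are hypothetical helpers, and a short-lived child Python process stands in for the browser. It only illustrates how a fixed wait (30 seconds here, per the description above) turns a slow open/close into a "failed to initialize browser" result.

```python
import subprocess
import sys


def initialize_profile(command, timeout_seconds=30):
    """Launch `command` once and wait for it to exit on its own.

    Returns True if the process exited within the timeout, False otherwise.
    Hypothetical helper, not the real InitializeNewProfile.
    """
    process = subprocess.Popen(command)
    try:
        process.wait(timeout=timeout_seconds)
    except subprocess.TimeoutExpired:
        # The stand-in "browser" is still running: clean up and report failure.
        process.kill()
        process.wait()
        return False
    return True


def run_init(command, timeout_seconds=30):
    """Map the boolean result onto the message seen in the logs."""
    if not initialize_profile(command, timeout_seconds):
        return "FAIL: failed to initialize browser"
    return "ok"


if __name__ == "__main__":
    # Stand-in for a browser that opens and closes promptly.
    print(run_init([sys.executable, "-c", "pass"]))
```

Under this model a machine that is slow to launch or close the process (for example because it is swapping) fails even though nothing crashed, which matches the lack of any other detail in the log.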
It could be that the Tp chews up a bunch of memory, and it takes a little while for things to clear up to a useful state for Tdhtml (say if we're well into the swap file). I've observed Working Sets and Private Bytes which are larger than 400MB after a few cycles of the pageset for Tp, and Commit was as high as 550MB, which doesn't really tally with the ~70MB which is reported on the graph server. Probably that's measuring different things and I need some schooling, but I mention it because the minis have 1GB of RAM, of which about 150MB was still being used for caching and only a small amount was truly free. I don't have data for the state when it gets to the very end of Tp, which would be useful to confirm or shoot down this guess. Alice will of course know how it really works.
At any rate, I've left qm-pvista-trunk03 out of circulation for the moment.
Comment 4•16 years ago
I talked to alice about this when I first noticed it. (Although forgot to file a bug, tsk tsk.) I was suspicious that 2/3 were orange, and one solid green, but at the same time, it seems odd that both of those machines would be having trouble at the same time. I didn't wind up feeling confident about it either way.
Comment 5•16 years ago
We have a theory that this is heat related. qm-pvista-trunk01/02/03 are the first three machines in the rack (on their side, in a row of 8), and idling 03 seemed to cause 02 to be green rather than often orange. To rule out code changes in the meantime, qm-pvista-trunk03 is back on for a spell.
The next 4 machines in the row are qm-pxp-trunk05/01/02/03, so I'll look at their recent history too. Unthrottling may also be related (bug 468680, Jan 30).
Comment 6•16 years ago
qm-pxp-trunk05/01/02/03 were perma-green over the last week. Perhaps Vista is working the mini much harder than XP does.
Comment 7•16 years ago
Since restarting qm-pvista-trunk03 it's been green except for one weird failure
http://tinderbox.mozilla.org/showlog.cgi?log=Firefox/1234969776.1234975262.13021.gz&fulltext=1
Similarly qm-pvista-trunk02 has been green except for one
FAIL: Busted: ts
FAIL: previous cycle still running
http://tinderbox.mozilla.org/showlog.cgi?log=Firefox/1234944070.1234950010.13051.gz&fulltext=1
after doing a few of the startups in the test.
Going to be very tempting to resolve this WFM if it continues like this.
Comment 8•16 years ago
They've continued to be green apart from the cross-tree talos bustage for revisions f4800de50e03, d17cb4c725bd, d17cb4c725bd. It's unsatisfying to resolve without knowing what the problem was, please reopen if this starts up again.
Status: NEW → RESOLVED
Closed: 16 years ago
Resolution: --- → WORKSFORME
Comment 9•16 years ago
qm-pvista-trunk03 and 02 are exhibiting similar symptoms again. repeated failure:
FAIL: Busted: tdhtml
FAIL: failed to initialize browser
01 had the Ts failure again, but seems more stable than 02 and 03.
should i re-open this?
Assignee
Comment 10•16 years ago
Since the boxes are still going orange, let's re-open.
Status: RESOLVED → REOPENED
Resolution: WORKSFORME → ---
Comment 11•16 years ago
Alice:
1) So, is this a fair summary: qm-pvista-trunk01 remains green, but qm-pvista-trunk02,03 start going orange at the same time?
2) Last time these machines went green, what exactly did you do? Reboot 02,03? Reimage 02,03? Nothing?
3) It's unclear to me if this is a talos/releng infrastructure problem or a genuine code bug. The fact that it hits two machines at the same time makes me suspect a code/testware problem, but how can we tell where to start looking?
Reporter
Comment 12•16 years ago
Looks like the continuous green lasted until (using "days" starting at 10:30, since that's what Tinderbox gave me) 2/22. Since then, the daily fails were
02 03
1 0
0 0
0 0
1 0
2 0
0 3
0 0
0 2
1 0
2 0
2 1
4 3
and then 0 and 2 for the half-day today. Not exactly the pattern of the previous episode, but here again, if you want to blame it on code, you'll first need to explain away the fact that during those 24 failures over 12.5 days, 01 failed this way zero times.
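The tally above, and the argument about 01, can be checked with a few lines. This is only a sketch: the counts are copied from the table, while the probability assumes that each failure would land independently and with equal chance on any of the three machines if the cause were code rather than hardware. That assumption is a simplification introduced here, not a measurement.

```python
# Daily "failed to initialize browser" counts from the table above,
# one (02, 03) pair per Tinderbox day, plus the final half-day.
daily_fails = [
    (1, 0), (0, 0), (0, 0), (1, 0), (2, 0), (0, 3),
    (0, 0), (0, 2), (1, 0), (2, 0), (2, 1), (4, 3),
    (0, 2),  # half-day
]

fails_02 = sum(on_02 for on_02, _ in daily_fails)
fails_03 = sum(on_03 for _, on_03 in daily_fails)
total = fails_02 + fails_03

# Assumption: a code-caused failure hits each of the three machines with
# probability 1/3, independently. Chance that none of them lands on 01:
p_01_spared = (2 / 3) ** total

print(f"02: {fails_02}, 03: {fails_03}, total: {total}")
print(f"chance 01 sees none of {total} failures: {p_01_spared:.1e}")
```

Under that simplified model the chance of 01 staying clean is well below one in ten thousand, which is the quantitative version of "you'll first need to explain away" 01's record.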
Assignee
Comment 13•16 years ago
We can attempt here to give Talos a little more time to open/close the browser and see if we can stop this orange. I've been wary about changing test parameters, but I have an idea for a browser shutdown test in the works that should start collecting data.
Assignee
Comment 14•16 years ago
Increase allowable time for browser to open/close.
Assignee: nobody → anodelman
Attachment #366017 -
Flags: review?(bhearsum)
Comment 15•16 years ago
Comment on attachment 366017 [details] [diff] [review]
[Checked in]increase timeout for browser opening/closing
wfm
Attachment #366017 -
Flags: review?(bhearsum) → review+
Assignee
Comment 16•16 years ago
Comment on attachment 366017 [details] [diff] [review]
[Checked in]increase timeout for browser opening/closing
Checking in sample.config;
/cvsroot/mozilla/testing/performance/talos/sample.config,v <-- sample.config
new revision: 1.25; previous revision: 1.24
done
Checking in ffsetup.py;
/cvsroot/mozilla/testing/performance/talos/ffsetup.py,v <-- ffsetup.py
new revision: 1.8; previous revision: 1.7
done
Attachment #366017 -
Attachment description: increase timeout for browser opening/closing → [Checked in]increase timeout for browser opening/closing
Attachment #366017 -
Flags: checked-in+
Assignee
Comment 17•16 years ago
These machines were not re-imaged. Machines were re-imaged in bug 480048 - and it seems to have resulted in green machines. My timeout increase may not have been necessary, we may just need to re-image the orange cycling machines.
Comment 18•16 years ago
After this landed, Ts regressed about 11-14% on Mac 10.5, both mozilla-central and mozilla-1.9.1 with no changes landed on the latter. Possibly also bug 480577 but the timing here matches better.
Comment 19•16 years ago
(In reply to comment #17)
> These machines were not re-imaged. Machines were re-imaged in bug 480048 - and
> it seems to have resulted in green machines. My timeout increase may not have
> been necessary, we may just need to re-image the orange cycling machines.
ok, good to know, thanks Alice.
Agreed, sounds like next time we start seeing orange cycling machines, we should start by reimaging quickly, before too many other code changes land and complicate the picture. If a reimage turns the machine back green again, then we at least have narrowed down the problem a bit!
Assignee
Comment 20•16 years ago
I can't see how changing the browser timeout would have affected Ts - and if it does affect Ts in some way that I'm not seeing I don't understand why it would only affect leopard.
I'd be willing to do a backout to see if numbers normalize, but that should happen during a downtime.
Assignee
Comment 21•16 years ago
Attachment #370539 -
Flags: review?(joduinn)
Assignee
Comment 22•16 years ago
Going back to the old timer settings in an attempt to resolve scattered Ts wonkiness since the timer settings were increased.
Updated•16 years ago
Attachment #370539 -
Flags: review?(joduinn) → review+
Comment 23•16 years ago
Comment on attachment 370539 [details] [diff] [review]
[Checked in]back to the old timer settings
From irc, aki already confirmed he manually overrides these settings for Talos-for-mobile.
As this doesn't cause mobile any problems, and as it simply reverts to previous values, I'll r+.
Assignee
Comment 24•16 years ago
Comment on attachment 370539 [details] [diff] [review]
[Checked in]back to the old timer settings
Checking in ffsetup.py;
/cvsroot/mozilla/testing/performance/talos/ffsetup.py,v <-- ffsetup.py
new revision: 1.9; previous revision: 1.8
done
Checking in sample.config;
/cvsroot/mozilla/testing/performance/talos/sample.config,v <-- sample.config
new revision: 1.26; previous revision: 1.25
done
Attachment #370539 -
Attachment description: back to the old timer settings → [Checked in]back to the old timer settings
Attachment #370539 -
Flags: checked-in+
Comment 25•16 years ago
Has this gone live?
Assignee
Comment 26•16 years ago
Went live yesterday at around 5pm.
Comment 27•16 years ago
Judging from this graph of branch Darwin 9.2.2 boxes, the backout resolved part of the regression:
http://graphs-new.mozilla.org/#tests=[{%22test%22:%2216%22,%22branch%22:%223%22,%22machine%22:%2240%22},{%22test%22:%2216%22,%22branch%22:%223%22,%22machine%22:%2241%22},{%22test%22:%2216%22,%22branch%22:%223%22,%22machine%22:%2242%22},{%22test%22:%2216%22,%22branch%22:%223%22,%22machine%22:%2243%22}]&sel=1238192959,1238687040
It's early yet, but it looks like we were averaging ~1530 before the regression, 1800 during the regression, and post-backout we're down to 1650.
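To put "resolved part of the regression" into numbers, here is a quick calculation using only the three rough averages quoted above. The variable names are illustrative, and the inputs are eyeballed from the graph, so the results are approximate.

```python
baseline = 1530      # approximate Ts average before the regression
regressed = 1800     # approximate Ts average during the regression
post_backout = 1650  # approximate Ts average after the backout

# Size of the regression relative to the old baseline.
regression_pct = 100 * (regressed - baseline) / baseline
# How far above the old baseline Ts still is after the backout.
remaining_pct = 100 * (post_backout - baseline) / baseline
# Share of the regression that the backout recovered.
recovered_fraction = (regressed - post_backout) / (regressed - baseline)

print(f"regression: {regression_pct:.1f}%")
print(f"still above baseline: {remaining_pct:.1f}%")
print(f"recovered by backout: {recovered_fraction:.0%}")
```

On these figures the backout recovered a little over half of the regression, leaving Ts roughly 8% above where it started.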
Updated•16 years ago
Summary: Investigate orange on mozilla-central's qm-pvista-trunk02 and qm-pvista-trunk03 → Tdhtml orange ("failed to initialize browser") on mozilla-central's qm-pvista-trunk02 and qm-pvista-trunk03
Whiteboard: [orange]
Assignee
Updated•16 years ago
Summary: Tdhtml orange ("failed to initialize browser") on mozilla-central's qm-pvista-trunk02 and qm-pvista-trunk03 → intermittent orange on mozilla-central's qm-pvista-trunk02/03/04
Comment 29•16 years ago
qm-pvista-trunk03 has been orange with the problem mentioned in comment 9 for a few days now. Is there anything that can be done about it, or can that box be removed from the Firefox tinderbox so we don't have to keep starring it?
Comment 30•16 years ago
Buildbot is stopped on qm-pvista-trunk03, it'll fall off the waterfall from lack of work.
Assignee
Comment 31•16 years ago
Having talos monitor browser shutdown time should eliminate these oranges, as they are caused by the browser taking a long time to close and thus confusing talos - that is why the initial fix of increasing timeouts made the orange go away.
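The idea of monitoring shutdown time, rather than assuming the browser is gone after a fixed delay, could look something like the sketch below. It is not the Tshutdown patch: `wait_for_shutdown` is a hypothetical helper, the 30-second cap and polling interval are arbitrary, and a child Python process stands in for the browser.

```python
import subprocess
import sys
import time


def wait_for_shutdown(process, max_wait=30.0, poll_interval=0.05):
    """Poll until `process` exits.

    Returns the number of seconds waited, or None if the process was
    still running after `max_wait` seconds.
    """
    start = time.monotonic()
    while process.poll() is None:
        if time.monotonic() - start >= max_wait:
            return None
        time.sleep(poll_interval)
    return time.monotonic() - start


if __name__ == "__main__":
    # Stand-in for a browser that takes a moment to close.
    browser = subprocess.Popen(
        [sys.executable, "-c", "import time; time.sleep(0.3)"]
    )
    shutdown_time = wait_for_shutdown(browser)
    if shutdown_time is None:
        print("browser still running; do not start the next test yet")
    else:
        print(f"browser closed after {shutdown_time:.2f}s")
```

Recording the measured time gives data on slow shutdowns, and holding the next test until the process has actually exited avoids the "previous cycle still running" style of confusion.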
Depends on: Tshutdown
Updated•16 years ago
Priority: -- → P3
Comment 32•16 years ago
Alice, would the shutdown monitoring address the oranges we're seeing today on
WINNT 6.0 talos mozilla-central nochrome qm-pvista-trunk04
WINNT 6.0 talos mozilla-central qm-pvista-trunk02
WINNT 5.1 talos mozilla-central qm-pxp-trunk02
All three have experienced either
FAIL: Busted: ts
FAIL: failed to initialize browser
or
FAIL: Busted: tdhtml
FAIL: failed to initialize browser
Or could these be something else?
Updated•16 years ago
Summary: intermittent orange on mozilla-central's qm-pvista-trunk02/03/04 → intermittent orange on mozilla-central's qm-pvista-trunk02/03/04 ("failed to initialize browser")
Comment 33•16 years ago
(In reply to comment #32)
> FAIL: Busted: ts
> FAIL: failed to initialize browser
Seeing this today too. Is this the same bug, or something else?
Comment 34•16 years ago
To get an idea of the frequency of this...
http://tinderbox.mozilla.org/showlog.cgi?log=Firefox/1239810979.1239821303.9933.gz&fulltext=1
http://tinderbox.mozilla.org/showlog.cgi?log=Firefox/1239809627.1239820434.8427.gz&fulltext=1
http://tinderbox.mozilla.org/showlog.cgi?log=Firefox/1239809627.1239817231.2791.gz&fulltext=1
http://tinderbox.mozilla.org/showlog.cgi?log=Firefox/1239808615.1239819813.7473.gz&fulltext=1
http://tinderbox.mozilla.org/showlog.cgi?log=Firefox/1239808615.1239815506.32330.gz&fulltext=1
http://tinderbox.mozilla.org/showlog.cgi?log=Firefox3.5/1239807268.1239816550.1499.gz&fulltext=1
That's just today...
Comment 35•16 years ago
I'll take the blame for the annoyance, since I got Alice to back it out while we were hunting the Ts regression on Mac. As I understand it though, the Tshutdown patch that will remedy this just needs review and then scheduled downtime to push.
Even with the semi-regular failures, I suspect we don't want that downtime to happen before freeze?
Comment 36•16 years ago
(In reply to comment #35)
> Even with the semi-regular failures, I suspect we don't want that downtime to
> happen before freeze?
Probably not. Good to know this is on course to getting fixed.
Comment 37•16 years ago
A lot of my previous links may have been bug 482575 as well (didn't check machine until now). Maybe that's a dupe of this though...
Comment 38•16 years ago
(In reply to comment #35)
> Even with the semi-regular failures, I suspect we don't want that downtime to
> happen before freeze?
We've been assuming (hopefully correctly) that we should avoid any downtime before FF3.5b4 work is done. However, happy to change plans if that's preferred.
Comment 40•16 years ago
http://tinderbox.mozilla.org/showlog.cgi?log=Firefox/1239976792.1239982719.11487.gz&fulltext=1
WINNT 5.1 talos mozilla-central nochrome qm-pxp-trunk07 on 2009/04/17 06:59:52
http://tinderbox.mozilla.org/showlog.cgi?log=Firefox/1239988477.1239994511.2731.gz&fulltext=1
WINNT 5.1 talos mozilla-central nochrome qm-pxp-trunk07 on 2009/04/17 10:14:37
http://tinderbox.mozilla.org/showlog.cgi?log=Firefox/1239984953.1239991004.29196.gz&fulltext=1
WINNT 5.1 talos mozilla-central nochrome qm-pxp-trunk07 on 2009/04/17 09:15:53
http://tinderbox.mozilla.org/showlog.cgi?log=Firefox/1239984953.1239995249.3869.gz&fulltext=1
WINNT 6.0 talos mozilla-central qm-pvista-trunk02 on 2009/04/17 09:15:53
Comment 41•16 years ago
I thought this was supposed to be fixed by this weekend's talos maintenance, but it's not:
http://tinderbox.mozilla.org/showlog.cgi?log=Firefox/1240231529.1240242539.14651.gz
WINNT 6.0 talos mozilla-central qm-pvista-trunk02 on 2009/04/20 05:45:29
Tdhtml
http://tinderbox.mozilla.org/showlog.cgi?log=Firefox/1240231529.1240238249.7784.gz
WINNT 5.1 talos mozilla-central nochrome qm-pxp-trunk07 on 2009/04/20 05:45:29
Ts
http://tinderbox.mozilla.org/showlog.cgi?log=Firefox/1240231529.1240239353.9482.gz
WINNT 6.0 talos mozilla-central nochrome qm-pvista-trunk04 on 2009/04/20 05:45:29
Ts
http://tinderbox.mozilla.org/showlog.cgi?log=Firefox/1240236984.1240242926.15242.gz
WINNT 5.1 talos mozilla-central nochrome qm-pxp-trunk07 on 2009/04/20 07:16:24
Ts
http://tinderbox.mozilla.org/showlog.cgi?log=Firefox/1240228045.1240233098.26771.gz
WINNT 5.1 talos mozilla-central nochrome qm-pxp-trunk07 on 2009/04/20 04:47:25
Ts
http://tinderbox.mozilla.org/showlog.cgi?log=Firefox/1240218634.1240225467.13349.gz
WINNT 6.0 talos mozilla-central qm-pvista-trunk02 on 2009/04/20 02:10:34
Ts
http://tinderbox.mozilla.org/showlog.cgi?log=Firefox/1240224794.1240230173.22414.gz
WINNT 5.1 talos mozilla-central qm-pxp-trunk02 on 2009/04/20 03:53:14
Ts
http://tinderbox.mozilla.org/showlog.cgi?log=Firefox/1240224794.1240230172.22410.gz
WINNT 5.1 talos mozilla-central nochrome qm-pxp-trunk07 on 2009/04/20 03:53:14
Ts
http://tinderbox.mozilla.org/showlog.cgi?log=Firefox/1240226189.1240235637.30602.gz
WINNT 6.0 talos mozilla-central qm-pvista-trunk02 on 2009/04/20 04:16:29
Tdhtml
http://tinderbox.mozilla.org/showlog.cgi?log=Firefox/1240226189.1240231365.24217.gz
WINNT 5.1 talos mozilla-central nochrome qm-pxp-trunk07 on 2009/04/20 04:16:29
Ts
Those are the "failed to initialize browser" oranges in the past 12 hours. They include XP and Vista machines, Ts and Tdhtml tests, and include the nochrome boxes.
Which of those are covered by this bug, and for which should new bugs be filed?
Comment 42•16 years ago
This weekend's fix was backed out due to reds on some mac boxes. See bug 480413 comment 20
Updated•16 years ago
Summary: intermittent orange on mozilla-central's qm-pvista-trunk02/03/04 ("failed to initialize browser") → intermittent orange on Windows mozilla-central talos Ts and Tdhtml tests ("failed to initialize browser")
Comment 43•16 years ago
WINNT 5.1 talos mozilla-central nochrome qm-pxp-trunk07 [testfailed]
http://tinderbox.mozilla.org/showlog.cgi?log=Firefox/1240277387.1240283560.15351.gz&fulltext=1
TinderboxPrint:FAIL: Busted: ts
TinderboxPrint:FAIL: failed to initialize browser
Comment 44•16 years ago
http://tinderbox.mozilla.org/showlog.cgi?log=Firefox/1240408820.1240415112.13072.gz
WINNT 6.0 talos mozilla-central qm-pvista-trunk02 on 2009/04/22 07:00:20
Ts
http://tinderbox.mozilla.org/showlog.cgi?log=Firefox/1240400478.1240407892.28809.gz
WINNT 6.0 talos mozilla-central qm-pvista-trunk02 on 2009/04/22 04:41:18
Ts
Comment 45•16 years ago
http://tinderbox.mozilla.org/showlog.cgi?log=Firefox/1240463483.1240474113.10075.gz&fulltext=1
WINNT 6.0 talos mozilla-central qm-pvista-trunk02 on 2009/04/22 22:11:23
Tdhtml
Comment 46•16 years ago
http://tinderbox.mozilla.org/showlog.cgi?log=Firefox/1240576888.1240586182.25390.gz
WINNT 6.0 talos mozilla-central qm-pvista-trunk02 on 2009/04/24 05:41:28
Tdhtml
Comment 47•16 years ago
http://tinderbox.mozilla.org/showlog.cgi?log=Firefox/1240589285.1240599408.15920.gz
WINNT 6.0 talos mozilla-central qm-pvista-trunk02 on 2009/04/24 09:08:05
Tdhtml
Assignee
Comment 48•16 years ago
I'm seeing a lot of green vista boxes now. Will leave this open for another couple of days to ensure that we are done with this error for good.
Assignee
Comment 49•16 years ago
Still seeing lots of green, and on moz-central instead of intermittent orange there now appears to be an intermittent crash with stack - so I'm going to call this success.
Status: REOPENED → RESOLVED
Closed: 16 years ago → 16 years ago
Resolution: --- → FIXED
Updated•12 years ago
Keywords: intermittent-failure
Updated•12 years ago
Whiteboard: [orange]
Updated•11 years ago
Product: mozilla.org → Release Engineering