Closed Bug 416911 Opened 17 years ago Closed 16 years ago

per-test timeout in talos

Categories

(Release Engineering :: General, defect, P2)

defect

Tracking

(Not tracked)

RESOLVED FIXED

People

(Reporter: anodelman, Assigned: anodelman)

References

Details

Attachments

(7 files, 2 obsolete files)

Currently talos uses the same 8 hour time limit for any test.  This can cause a pretty major delay in the browser freezing up and us correctly reporting the failure.  We actually pretty much have this code in place, we just need to start using it.

Before this change could be made all the currently used buildbot talos config files would have to be updated to have an appropriate timeout for each test.
Attached patch per test timeout for talos (obsolete) (deleted) — Splinter Review
Assignee: nobody → anodelman
Status: NEW → ASSIGNED
Attachment #302705 - Flags: review?(rcampbell)
Here are my proposed timeouts:

tjss : 30 minutes (1800 s)
twinopen : 5 minutes (300 s)
tsspider : 5 minutes (300 s)
tp : 2 1/2 hours (9000 s)
tp (historic page set) : 5 minutes (300 s)
tsvg : 15 minutes (900 s)
tdhtml : 15 minutes (900 s)
tgfx : 15 minutes (900 s)

The main worry would be if they are set too low so as to stop a successful test run part way through.  We'll also have to maintain these as more test pages get added to the various suites.
Attachment #302707 - Flags: review?(rcampbell)
Summary: per test timeout in talos → per-test timeout in talos
Attachment #302705 - Flags: review?(rcampbell) → review+
Attachment #302707 - Flags: review?(rcampbell) → review+
Attached patch reworking per-test timeout (deleted) — Splinter Review
This is to replace the previous version of this patch which required the specification of timeouts for each test.  Now I'm making it optional since that will make it easier to check in without having to alter all the currently used config files.
Attachment #302705 - Attachment is obsolete: true
Attachment #302707 - Attachment is obsolete: true
Attachment #305078 - Flags: review?(ccooper)
Attachment #305078 - Flags: review?(ccooper) → review?(rhelmer)
Attachment #305078 - Flags: review?(rhelmer) → review+
Checking in ttest.py;
/cvsroot/mozilla/testing/performance/talos/ttest.py,v  <--  ttest.py
new revision: 1.12; previous revision: 1.11
done
Checking in sample.config;
/cvsroot/mozilla/testing/performance/talos/sample.config,v  <--  sample.config
new revision: 1.13; previous revision: 1.12
done
Attachment #307339 - Flags: review?(rhelmer)
Attachment #307339 - Flags: review?(rhelmer) → review+
Pushed to stage.

Checking in sample.config;
/cvsroot/mozilla/tools/buildbot-configs/testing/talos/perfmaster/configs/sample.config,v  <--  sample.config
new revision: 1.11; previous revision: 1.10
done
Checking in sample.config.nochrome;
/cvsroot/mozilla/tools/buildbot-configs/testing/talos/perfmaster/configs/sample.config.nochrome,v  <--  sample.config.nochrome
new revision: 1.5; previous revision: 1.4
done
Checking in sample.config.nogfx;
/cvsroot/mozilla/tools/buildbot-configs/testing/talos/perfmaster/configs/sample.config.nogfx,v  <--  sample.config.nogfx
new revision: 1.11; previous revision: 1.10
done
Checking in fast.sample.config;
/cvsroot/mozilla/tools/buildbot-configs/testing/talos/perfmaster/configs/fast.sample.config,v  <--  fast.sample.config
new revision: 1.3; previous revision: 1.2
done
Assignee: anodelman → nobody
Status: ASSIGNED → NEW
Component: Testing → Release Engineering
OS: Mac OS X → All
Priority: -- → P2
Product: Core → mozilla.org
QA Contact: testing → release
Hardware: PC → All
Version: unspecified → other
Assignee: nobody → anodelman
Adding timeouts showed up some flaws in how we kill lingering browser processes in windows.  While the browser is terminated it takes more time than on linux/mac because of the windows OS always giving a process the chance to close cleanly before killing.  This time difference between the request to kill and the actual killing leads to states where we are attempting to clean up profiles/temporary data that is still in use - which then throw exceptions.

This patch adds some smarts to terminating processes by doing checks to make sure that they are really dead before moving on in the code.  I've tested on win/linux and it works fine.  We may want to do some mac tests before we fully push the code in case it breaks things there.
Attachment #310659 - Flags: review?(rcampbell)
Comment on attachment 310659 [details] [diff] [review]
better process killing on timeout (or other bad situations)

one more round with this, eh?

Looks reasonable.
Attachment #310659 - Flags: review?(rcampbell) → review+
Better process killing on timeout:

Checking in ttest.py;
/cvsroot/mozilla/testing/performance/talos/ttest.py,v  <--  ttest.py
new revision: 1.13; previous revision: 1.12
done
Checking in ffprocess_linux.py;
/cvsroot/mozilla/testing/performance/talos/ffprocess_linux.py,v  <--  ffprocess_linux.py
new revision: 1.7; previous revision: 1.6
done
Checking in ffprocess_mac.py;
/cvsroot/mozilla/testing/performance/talos/ffprocess_mac.py,v  <--  ffprocess_mac.py
new revision: 1.8; previous revision: 1.7
done
Checking in ffprocess_win32.py;
/cvsroot/mozilla/testing/performance/talos/ffprocess_win32.py,v  <--  ffprocess_win32.py
new revision: 1.5; previous revision: 1.4
done
Giving this another try.  The issue I ran into with the other patches was that a timeout per test could too easily go wrong - it would kill tests that were running but running slowly.  This new approach has the browser send a ping back to talos after loading each page.  If talos doesn't hear from the browser in a certain amount of time it can presume that the browser is frozen and kill it.  This will also give us extra data as to what page the browser is stuck on.

I have this up and running on talos stage and you can see output files on the MozillaTest waterfall.
Attachment #327178 - Flags: review?(rcampbell)
The pageloader's noisy output currently only includes the page that it just loaded and not the name of the page that it is about to load.  If we get frozen on a page we are more interested in what the browser is trying to do as opposed to what it just did.
Attachment #327180 - Flags: review?(rcampbell)
Comment on attachment 327178 [details] [diff] [review]
timeout based upon browser activity [Checked in]

ah, this old chestnut.
Attachment #327178 - Flags: review?(rcampbell) → review+
Attachment #327179 - Flags: review?(rcampbell) → review+
Comment on attachment 327180 [details] [diff] [review]
more information from the pageloader [Checked in]

>+    dumpLine("Cycle " + (cycle+1) + ": loaded " + pageName + ' (next: ' + nextName + ')');

mixing quote styles is somewhat confusing here, but acceptable.
Attachment #327180 - Flags: review?(rcampbell) → review+
Comment on attachment 327178 [details] [diff] [review]
timeout based upon browser activity [Checked in]

Checking in ffprocess_win32.py;
/cvsroot/mozilla/testing/performance/talos/ffprocess_win32.py,v  <--  ffprocess_win32.py
new revision: 1.7; previous revision: 1.6
done
Checking in sample.config;
/cvsroot/mozilla/testing/performance/talos/sample.config,v  <--  sample.config
new revision: 1.16; previous revision: 1.15
done
Checking in ttest.py;
/cvsroot/mozilla/testing/performance/talos/ttest.py,v  <--  ttest.py
new revision: 1.15; previous revision: 1.14
done
Checking in utils.py;
/cvsroot/mozilla/testing/performance/talos/utils.py,v  <--  utils.py
new revision: 1.6; previous revision: 1.5
done
Attachment #327178 - Attachment description: timeout based upon browser activity → timeout based upon browser activity [Checked in]
Comment on attachment 327179 [details] [diff] [review]
new configs for talos production/stage/try for new timeout structure [Checked in]

Checking in perf-staging/configs/sample.config;
/cvsroot/mozilla/tools/buildbot-configs/testing/talos/perf-staging/configs/sample.config,v  <--  sample.config
new revision: 1.4; previous revision: 1.3
done
Checking in perfmaster2/configs/fast.production.sample.config;
/cvsroot/mozilla/tools/buildbot-configs/testing/talos/perfmaster/configs/fast.production.sample.config,v  <--  fast.production.sample.config
new revision: 1.5; previous revision: 1.4
done
Checking in perfmaster2/configs/jss.production.sample.config;
/cvsroot/mozilla/tools/buildbot-configs/testing/talos/perfmaster/configs/jss.production.sample.config,v  <--  jss.production.sample.config
new revision: 1.4; previous revision: 1.3
done
Checking in perfmaster2/configs/production.sample.config;
/cvsroot/mozilla/tools/buildbot-configs/testing/talos/perfmaster/configs/production.sample.config,v  <--  production.sample.config
new revision: 1.7; previous revision: 1.6
done
Checking in perfmaster2/configs/production.sample.config.nochrome;
/cvsroot/mozilla/tools/buildbot-configs/testing/talos/perfmaster/configs/production.sample.config.nochrome,v  <--  production.sample.config.nochrome
new revision: 1.4; previous revision: 1.3
done
Checking in perfmaster2/configs/production.sample.config.nogfx;
/cvsroot/mozilla/tools/buildbot-configs/testing/talos/perfmaster/configs/production.sample.config.nogfx,v  <--  production.sample.config.nogfx
new revision: 1.7; previous revision: 1.6
done
Checking in tryperfmaster/configs/sample.config;
/cvsroot/mozilla/tools/buildbot-configs/testing/talos/tryperfmaster/configs/sample.config,v  <--  sample.config
new revision: 1.4; previous revision: 1.3
done
Attachment #327179 - Attachment description: new configs for talos production/stage/try for new timeout structure → new configs for talos production/stage/try for new timeout structure [Checked in]
Comment on attachment 327180 [details] [diff] [review]
more information from the pageloader [Checked in]

Checking in pageloader.js;
/cvsroot/mozilla/layout/tools/pageloader/pageloader.js,v  <--  pageloader.js
new revision: 1.14; previous revision: 1.13
done
Attachment #327180 - Attachment description: more information from the pageloader → more information from the pageloader [Checked in]
I've checked in bustage fixes for all the config files - I had the pageloader options malformed

"-tpformat -tpnoisy tinderbox" should be "-tpnoisy -tpformat tinderbox"
Attachment #328080 - Flags: review?(rcampbell)
Comment on attachment 328080 [details] [diff] [review]
minor cleanup of debug/noisy statements [Checked in]

are you getting '\r' because of Windows or is something putting them in there explicitly?
Attachment #328080 - Flags: review?(rcampbell) → review+
Comment on attachment 328080 [details] [diff] [review]
minor cleanup of debug/noisy statements [Checked in]

Checking in utils.py;
/cvsroot/mozilla/testing/performance/talos/utils.py,v  <--  utils.py
new revision: 1.7; previous revision: 1.6
done
Attachment #328080 - Attachment description: minor cleanup of debug/noisy statements → minor cleanup of debug/noisy statements [Checked in]
To Comment #20 - there aren't any explicit \r's anyway, but they are appearing in the windows output.  
Status: NEW → RESOLVED
Closed: 16 years ago
Resolution: --- → FIXED
Component: Release Engineering: Talos → Release Engineering
Product: mozilla.org → Release Engineering
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Created:
Updated:
Size: