Closed Bug 704654 Opened 13 years ago Closed 13 years ago

Talos should feature unified configuration

Categories

(Testing :: Talos, defect)

defect
Not set
normal

Tracking

(Not tracked)

RESOLVED FIXED

People

(Reporter: k0scist, Assigned: k0scist)

References

Details

(Whiteboard: [mozbase])

Attachments

(2 files, 15 obsolete files)

(deleted), patch
jmaher
: review+
Details | Diff | Splinter Review
(deleted), patch
jmaher
: review+
Details | Diff | Splinter Review
Currently the workflow of talos is as follows: 1. generate a yml file with `python PerfConfigurator.py [options]` 2. run the tests with `python run_tests.py [configuration.yml] [...]` Once the configuration is generated, there is no way outside of editing the configuration.yml file to override the configuration at run time. In addition, there is no real reason for the two-step process. If you PerfConfigurator command is, say, python PerfConfigurator.py --executablePath ... --enableTests ts for a one-instance run, there is no reason that you should not be able to do this directly as e.g. python run_test.py --executablePath ... --enableTests ts etc. Overriding would also be nice. Let's say you have a talos configuration file but want to run it with a different browser path. Currently you copy the configuration and change one line. What about overrides? python run_test.py configuration.yml --executablePath /path/to/other/firefox The default values of configurations live in two places, currently -- three if you count the generated configuration.yml. 1. in PerfConfigurator.py, both in the CLI and in the configuration object: http://hg.mozilla.org/build/talos/file/214df9ee2ea7/talos/PerfConfigurator.py#l26 , http://hg.mozilla.org/build/talos/file/214df9ee2ea7/talos/PerfConfigurator.py#l239 2. in run_tests.py: http://hg.mozilla.org/build/talos/file/214df9ee2ea7/talos/run_tests.py#l405 This not only leads to it being difficult to find what the code is doing, because the configuration is baked in to application logic, but you also have to remember, in general, to change both when you add new configuration. Proposal: Have a unified configuration object that serves as the blueprint for configuration. This object only holds configuration and methods dealing with configuration. You should be able to look at this object and see what configuration talos requires and is optional. The current form of talos configuration looks like: {'bcontroller_config': 'bcontroller.yml', 'branch': None, 'browser_log': 'browser_output.txt', 'browser_path': '/home/jhammel/firefox/firefox', 'browser_wait': 5, 'buildid': '2011103103', 'bundles': {'talos_pageloader': 'page_load_test/pageloader.xpi'}, 'dirs': {}, 'env': {'NO_EM_RESTART': 1}, 'extensions': {}, 'extra_args': '', 'init_url': 'getInfo.html', 'preferences': {'browser.EULA.override': True, 'browser.bookmarks.max_backups': 0, 'browser.cache.disk.smart_size.enabled': False, 'browser.cache.disk.smart_size.first_run': False, 'browser.dom.window.dump.enabled': True, 'browser.link.open_newwindow': 2, 'browser.shell.checkDefaultBrowser': False, 'browser.warnOnQuit': False, 'dom.allow_scripts_to_close_windows': True, 'dom.disable_open_during_load': False, 'dom.disable_window_flip': True, 'dom.disable_window_move_resize': True, 'dom.max_chrome_script_run_time': 0, 'dom.max_script_run_time': 0, 'extensions.checkCompatibility': False, 'extensions.update.notifyUser': False, 'network.proxy.http': 'localhost', 'network.proxy.http_port': 80, 'network.proxy.type': 1, 'security.enable_java': False, 'security.fileuri.strict_origin_policy': False}, 'process': 'firefox', 'test_timeout': 1200, 'tests': [{'cycles': 20, 'linux_counters': [], 'mac_counters': [], 'name': 'ts', 'profile_path': 'base_profile', 'resolution': 1, 'responsiveness': False, 'shutdown': True, 'timeout': 150, 'url': 'startup_test/startup_test.html?begin=', 'url_mod': 'str(int(time.time()*1000))', 'w7_counters': [], 'win_counters': []}], 'title': 'qm-pxp01'} There are essentially three kinds of data here: - immutable types: strings, integers, bools - lists - dicts The Configuration class should store what is expected where and what is optional. Bad configuration should result in early error. Once you have a unified configuration object with unified defaults, you can take configuration from sources supporting these formats: - YAML: since talos already uses it anyway - JSON: since its essentially free - CLI options: since you'll want to override stuff at run time; these are also useful for one-off runs as displayed above CLI options will have to be handled separately, but you are helped by the fact that since you have a universal source for configuration, you can transform this (selectively, with translation if desired/required) into a set of CLI options for OptionParser. The basic scheme would be that you feed in configurations to run_tests.py and override them with CLI options. Currently, run_tests.py treats each argument as a .yml file to run tests on. We could preserve that and apply command line arguments to each file you run with (interspersed, if you want to make it even more complicated). If we don't care about this behaviour, we could apply each configuration file in series as an override. E.g. python runtests.py base.yml specific.yml --exePath /home/jhammel/firefox/firefox In this case, overriding is done like: 1. The defaults from the Configuration class are applied 2. These are overridden with the configuration from base.yml 3. These are overridden with the configuration from specific.yml 4. The exePath (and conceivably more variables) is overridden from the command line option 5. run tests By overriding here I mean the following for each type of data: - immutable types are replaced - lists are appended to - dictionaries are updated with dict.update() If we insist on running tests per .yml file, as is done now, this becomes: For base.yml: 1a. The defaults from the Configuration class are applied 2a. These are overridden with the configuration from base.yml 3a. The exePath (and conceivably more variables) is overridden from the command line option 4a. run tests For specific.yml: 1a. The defaults from the Configuration class are applied 2a. These are overridden with the configuration from specific.yml 3a. The exePath (and conceivably more variables) is overridden from the command line option 4a. run tests This also fits in with domain-specific problems, such as mobile, as you can create a subclass of Configuration pointing to your needs. The type of configuration object used should be dictated by the rest of control flow but in both cases the control flow of the configuration munging remains the same. One of the upshots of Talos's existing configuration process is that you get baked in configuration files. In order to preserve this advantage, a CLI flag, e.g. `--dump` should be added that will dump the unified configuration to a file.
Whiteboard: [mozbase]
Depends on: 716921
Depends on: 717395
Depends on: 719861
So essentially the talos options actually have the following parameters: - whether the option is required or not; this is sketched out in http://hg.mozilla.org/build/talos/file/9c1b3addb9ee/talos/run_tests.py#l440 although not necessary in good form - for optional options, we should have a default value; we could have this for required options....or not. probably not. Although see e.g http://hg.mozilla.org/build/talos/file/9c1b3addb9ee/talos/sample.config#l16 - each option should have a type. For optional options, this could be taken from the type of the default, although there are things like filesystem paths that may require more thought to. For required options, you might want to specify a type verbosely - each option should have a help description - for command line switches, you can take the name of the option as the switch; since options are unique, --some_option will be unique. You could additionally specify short switches, though I'd prefer to do this not at the option level but at the parser level. Essentially, you'd be dynamically creating an optionparser from the options data
Depends on: 725097
Depends on: 725140
Depends on: 718062
We should probably also get rid of using application.ini before we can consider this done, as this will change the way we gather data about the application
Depends on: 721773
I've done a proof of concept for how I think Talos configuration should work: http://k0s.org/mozilla/hg/configuration/file/tip/configuration/configuration.py This gives you a class which you can - enumerate configuration on (explicitly, which we are missing from Talos) - serialize to a file (bug 719861, which we are missing from Talos) - deserialize from YAML or JSON though with a pluggable option (we have YAML for Talos and that is it) - allow overrides to deserialized files from the command line (we sorta have this from Talos, but because we do a lot of logic/checking in PerfConfigurator[*] this is imperfect at best, broken at worst) [*] (We actually have checking logic split between PerfConfigurator and run_tests.py, which has several negative consequences. PerfConfigurator can't verify all the things that are needed to actually run, which it really should be able to, and run_tests.py doesn't perform several of the setup steps that PerfConfigurator does, like e.g. copying files to remote devices, which is also a negative, and it breaks (some) command line overrides) Note that this is a POC and therefore a few things would have to be added to be "talos-ready", but nothing I feel is particularly hard. Note also that this code isn't talos-specific. http://k0s.org/mozilla/hg/configuration/ also has tests and an example, albeit the documentation is yet largely non-existent. It would be pretty straightforward to transition sample.config to a Configuration class. Note that in any case the hard work will be getting logic out of PerfConfigurator. I've listed a lot of the work as dependencies here, though likely more will need to be added.
Depends on: 747187
Assignee: nobody → jhammel
I think I'm going to take a basic stab at getting configuration to be programmatic and fixing most of the blocking bugs here.
I'll do as much as bug 747187 as is convenient, but there may be some additional items there. Other than that, I hope to knock out the other blocking bugs soon.
Attached patch WIP (obsolete) (deleted) — Splinter Review
Not done and certainly not thoroughly test, but this works with ts locally. I've made a PerfConfigurator2.py just for convenience, though in reality this will be moved to PerfConfigurator.py. I also require the configuration package, so you'll need to run setup.py develop|install (its one file, so easy enough to add to talos.zip). Outside of testing and a few TODO items, the major things left to do: - I haven't touched remotePerfConfigurator at all. This will need to be done - wire this up to run_tests.py and possibly replace a bunch of configuration-related stuff there. But this should give a basic idea of where I'm going with this. I believe it makes things much more cleaner and prescriptive. If anyone has free time to read, feedback is welcome, though I plan on putting up a more thorough version soon.
The current WIP just dumps raw YAML to a file (in alphabetical order of keys, if it matters). Our current .config -> YAML files result in a more structured file with comments explaining usage, etc. It would be be hard to make a template to replicate this behaviour. The question is, do we care? Or is it YAGNI?
Comment on attachment 619472 [details] [diff] [review] WIP Review of attachment 619472 [details] [diff] [review]: ----------------------------------------------------------------- I like the direction of this. This is a lot of work though, hope we can make it stick. Would we remove the prefs and other .config file related bits? I assume since you are changing stuff to the test_overrides, that you are changing everything. ::: talos/utils.py @@ +289,5 @@ > + nettools = devicemanager.NetworkTools() > + return nettools.findOpenPort(ip, 15707) > + > +def getLanIp(): > + from mozdevice import devicemanager I believe we were trying to get rid of this inline imports.
I don't care about the order of things. It is nice to have the tests at the bottom, but the rest can be random in my opinion.
Yes, almost all of all of the config files will be deleted. Only overrides will remain. > > +def getLanIp(): > > + from mozdevice import devicemanager > > I believe we were trying to get rid of this inline imports. This is just a straight copy from PerfConfigurator.py. I can move this to the top-level.
(In reply to Joel Maher (:jmaher) from comment #10) > I don't care about the order of things. It is nice to have the tests at the > bottom, but the rest can be random in my opinion. Okay, we can reassess as I get closer to landing. If we don't take here it can be done as a follow-up (again, not hard to do)
Attached patch WIP 2 (obsolete) (deleted) — Splinter Review
I've wired up the PerfConfigurator class to run_tests.py and things seem to work well, though I haven't done much thorough testing. Still not ready to go, but getting there. I still have to - fix remotePerfConfigurator - get rid of 90+% of the config files - take more methods out of run_tests.py and put it in PerfConfigurator (e.g. browser_config, test fixing-up, other configuration-type details) - test test test I hope to have something reviewable in a day or so.
Attachment #619472 - Attachment is obsolete: true
Attached patch WIP 3 (obsolete) (deleted) — Splinter Review
This isn't quite good to go but I think the basic form is there. There is some issue with run_tests.py where I can't get to the getInfo.html to load from a remotePerfConfigurator test (which is interesting, since the same config file works in old talos run_tests.py, so it is a run_tests.py change; will investigate). In addition, there's a lot of cleanup I decided was beyond scope here. Itd be nice to move the browser config generation to PerfConfigurator, for instance, and it would be nice to be (much) more declarative about how tests work, but I think those and other issues are better left as follow-up issues
Attachment #619771 - Attachment is obsolete: true
Attachment #620425 - Flags: feedback?(jmaher)
Comment on attachment 620425 [details] [diff] [review] WIP 3 Review of attachment 620425 [details] [diff] [review]: ----------------------------------------------------------------- overall I really don't like 'config' things in code. This deviates from the way our other test harnesses do options. If we want to go this route for mochitest, reftest, xpcshell, I am fine being a line leader with talos. The goal is to have everything use similar tools. So this is a huge step forward, but I can imagine a lot of smaller patches to tweak things after something like this lands. ::: talos/PerfConfigurator.py @@ +3,2 @@ > """ > +performance configuration for Talos can we get a MPL 2.0 license here? ::: talos/remotePerfConfigurator.py @@ +66,5 @@ > + {'tpcycles': 3}, > + 'tsspider': > + {'tpcycles': 3} > + } > + } all of these just feel like config file options ::: talos/run_tests.py @@ +384,4 @@ > raise talosError('pageloader delay must be int 1 to 10,000') > + if 'tpmanifest' not in test: > + raise talosError("tpmanifest not found in test: %s" % test) > + # TODO: should probably check if the tpmanifest exists tpmanifest won't be in every test, only tp based tests.
Attachment #620425 - Flags: feedback?(jmaher) → feedback+
(In reply to Jeff Hammel [:jhammel] from comment #14) > Created attachment 620425 [details] [diff] [review] > WIP 3 > > This isn't quite good to go but I think the basic form is there. There is > some issue with run_tests.py where I can't get to the getInfo.html to load > from a remotePerfConfigurator test (which is interesting, since the same > config file works in old talos run_tests.py, so it is a run_tests.py change; > will investigate). > > In addition, there's a lot of cleanup I decided was beyond scope here. Itd > be nice to move the browser config generation to PerfConfigurator, for > instance, and it would be nice to be (much) more declarative about how tests > work, but I think those and other issues are better left as follow-up issues Ah, the issue was just a few preferences. Why this worked at all with the old run_tests.py is beyond me. This is now working
(In reply to Joel Maher (:jmaher) from comment #15) > Comment on attachment 620425 [details] [diff] [review] > WIP 3 > > Review of attachment 620425 [details] [diff] [review]: > ----------------------------------------------------------------- > > overall I really don't like 'config' things in code. This deviates from the > way our other test harnesses do options. If we want to go this route for > mochitest, reftest, xpcshell, I am fine being a line leader with talos. The > goal is to have everything use similar tools. I'll write a follow-up comment addressing this, as it is a higher level issue than nuts+bolts implementation (plus, its good to keep a record of such reasons around for posterity). But my short response is that we already have a bunch of code to deal with configuration which is much less flexible than what I added and what PerfConfigurator does, as I would put it, isn't holding configuration, but holding a basis for configuration. Anyway, I'll put a fuller response up soon. > So this is a huge step forward, but I can imagine a lot of smaller patches > to tweak things after something like this lands. Indeed. Writing this patch, while I think it is a huge improvement on our current situation, has revealed to me a bunch of things to clean up. I'll also detail these as a follow-up, but not necessarily as soon. > ::: talos/PerfConfigurator.py > @@ +3,2 @@ > > """ > > +performance configuration for Talos > > can we get a MPL 2.0 license here? > > ::: talos/remotePerfConfigurator.py > @@ +66,5 @@ > > + {'tpcycles': 3}, > > + 'tsspider': > > + {'tpcycles': 3} > > + } > > + } > > all of these just feel like config file options I could put this in remote.config and have remotePerfConfigurator read it by default if that's preferable. > ::: talos/run_tests.py > @@ +384,4 @@ > > raise talosError('pageloader delay must be int 1 to 10,000') > > + if 'tpmanifest' not in test: > > + raise talosError("tpmanifest not found in test: %s" % test) > > + # TODO: should probably check if the tpmanifest exists > > tpmanifest won't be in every test, only tp based tests. We actually do this already, with a few caveats: http://hg.mozilla.org/build/talos/file/c702ff8892be/talos/run_tests.py#l388 We only do this if the test doesn't have a url (in both cases), i.e. a tp based test: http://hg.mozilla.org/build/talos/file/c702ff8892be/talos/run_tests.py#l437 The difference is that in the former case, basetest has tpmanifest, etc, even though they are empty strings: http://hg.mozilla.org/build/talos/file/c702ff8892be/talos/sample.config#l101 In short, I can't think of any case where you'd not have a url and have a tpmanifest that is legitimate.
> overall I really don't like 'config' things in code. Overall, I agree. I would make the distinction that PerfConfigurator, and in particular PerfConfigurator.options is the basis for configuration and that PerfConfigurator + configuration files (YAML or JSON) plus command line arguments generates the actual configuration. The existing state of PerfConfigurator actually already does this and, in general, already does a lot of configuration declaration, checking, and serialization work. Take, for example, the processing of command line options to generate a resulting .yml file. So while I empathize with the desire to "keep configuration simple and out of code", it already is not simple and we have a ton of code dedicated to it. Goals: - remove the need to edit several different configuration to change a configuration basis. Currently most .config edits need to happen in 5 places (formerly 6). This is not only prone to human error (which I and others have been guilty of many times), it is a discouragement to change default configuration - consistent and declarative serialization/deserialization. Serialization in PerfConfigurator is mostly awful, scanning through line by line and looking for particular strings in (basically) an if-else tree, often depending on particular whitespace or other subtle (and undocumented) formatting issues. While the .config files conform to YAML, we don't make use of this for de/serialization. In addition, while in run_tests.py we allow command line overrides for the YAML items, we do not post-process them as we would in PerfConfigurator. - consistent error checking. Currently some of our config-checking is in PerfConfigurator and some is done in run_tests. This opens the possibility that either case may miss cases where the other one would find it. If you call run_tests.py with a .yml file, you will not get the checking done for the combination of command line items and the .yml configuration that is done in PerfConfigurator. Since we process a lot of command line items into resulting configuration, this can lead to interesting results (e.g. while --activeTests is a command line item for run_tests.py, it is not used, anywhere). In general, configuration should be checked in one place before any program logic takes place. While this patch doesn't completely address this issue, it a big step forward and should pave the way for future improvement. - configuration should be declarative. You should get what you expect from configuration, not inconsistent results. If you edit a (e.g.) .yml file with the existing Talos, you have no real way to know if the keys you add or edit are going to be used by run_tests.py (and what format they should be in, etc.) Having a basis for configuration gives a single place to denote what is expected (and thereby what isn't allowed) and the form that it is supposed to be in. It is also nice to have all configuration in a single place instead of having to look at a bunch of config files for the basis as well as all over the code to see what is expected and how it is processed. - allow running directly from run_test.py . For particular (e.g. production) systems, it may be advisable to use tuned (.yml) configuration files to have highly customized runs (note that we *don't* do this and use (remote)PerfConfigurator in all cases for reasons that may be infered from the above). However, for a typical developer, there is little reason to run ``PerfConfigurator -e `which firefox` -a ts --develop -o ts.yml && talos -n -d ts.yml`` for a particular run. Instead, the entirety of this may be invoked with this patch as ``talos -n -d -e `which firefox` -a ts --develop -o ts.yml`` in a one-step process. (Note that we're still dumping to ``ts.yml`` though one wouldn't have to if the result is intended as ephemeral). The overall principle guiding this design is that intention should be declarative and definitive. > This deviates from the > way our other test harnesses do options. If we want to go this route for > mochitest, reftest, xpcshell, I am fine being a line leader with talos. The > goal is to have everything use similar tools. While I would like to see mochitest, etc, move to the model I use here, that is a larger conversation. As far as I know, except for test manifests (which are mostly a mess last I looked), none of these suites have a combination of base_configuration + command_line_options as Talos has, which makes them a bit of a different beast. If we want to figure out what we want to do here first, that is fine, but I'm overall happy with this architecture. TL;DR; I've become increasingly convinced working on this patch and the upstream configuration package that what optparse calls 'dest' is the important factor (as already evidenced in Talos' move towards having dest align with the keys in the YAML file) and, additionally having 'dest' be a key to a dictionary whereby individual items may be looked up. If you want to modify ``OptionParser().options`` outside of defaults after declaration and before parsing, it is, at best, difficult. I'm happy to go over how the configuration package works and address issues there. (I am also happy to move to mozbase or another mozilla repo if we want.)
Attached patch proposed implementation (obsolete) (deleted) — Splinter Review
Putting this up for review. I have tested this quite a bit locally and on android and haven't found any problems with the final form. That said, I will push to try and would not be at all surprised if there are any number of errors that my twenty-or-so hand-runs didn't find.
Attachment #620425 - Attachment is obsolete: true
Attachment #620521 - Flags: review?(jmaher)
Try run for 26dea0f72f5b is complete. Detailed breakdown of the results available here: https://tbpl.mozilla.org/?tree=Try&rev=26dea0f72f5b Results (out of 77 total builds): exception: 4 success: 5 failure: 68 Builds (or logs if builds failed) available at: http://ftp.mozilla.org/pub/mozilla.org/firefox/try-builds/jhammel@mozilla.com-26dea0f72f5b
(In reply to Mozilla RelEng Bot from comment #21) > Try run for 26dea0f72f5b is complete. > Detailed breakdown of the results available here: > https://tbpl.mozilla.org/?tree=Try&rev=26dea0f72f5b > Results (out of 77 total builds): > exception: 4 > success: 5 > failure: 68 > Builds (or logs if builds failed) available at: > http://ftp.mozilla.org/pub/mozilla.org/firefox/try-builds/jhammel@mozilla. > com-26dea0f72f5b My fault, I attached a bad version of configuration.py. Will fix
Attached patch proposed implementation with fixes (obsolete) (deleted) — Splinter Review
The new configuration which I have added to the tree plus a few fixes i noticed on try. Note that if we keep configuration in the tree I will have to upport fixes. So I hope everyone will play nice and inform me of any changes proposed or otherwise there.
Attachment #620521 - Attachment is obsolete: true
Attachment #620521 - Flags: review?(jmaher)
Attachment #620754 - Flags: review?(jmaher)
Try run for 778386202654 is complete. Detailed breakdown of the results available here: https://tbpl.mozilla.org/?tree=Try&rev=778386202654 Results (out of 17 total builds): exception: 8 success: 2 failure: 7 Builds (or logs if builds failed) available at: http://ftp.mozilla.org/pub/mozilla.org/firefox/try-builds/jhammel@mozilla.com-778386202654
Attached patch include output name for buildbot (obsolete) (deleted) — Splinter Review
The last ran failed because buildbot was looking for an outputName. This should add it back in
Attachment #620754 - Attachment is obsolete: true
Attachment #620754 - Flags: review?(jmaher)
Attachment #620819 - Flags: review?(jmaher)
Attached patch more fixes (obsolete) (deleted) — Splinter Review
deal with prefs better, i somehow deleted the --rss switch so adding that back in
Attachment #620819 - Attachment is obsolete: true
Attachment #620819 - Flags: review?(jmaher)
Attachment #620886 - Flags: review?(jmaher)
Try run for f7ee9f1919f0 is complete. Detailed breakdown of the results available here: https://tbpl.mozilla.org/?tree=Try&rev=f7ee9f1919f0 Results (out of 83 total builds): exception: 8 success: 29 failure: 46 Builds (or logs if builds failed) available at: http://ftp.mozilla.org/pub/mozilla.org/firefox/try-builds/jhammel@mozilla.com-f7ee9f1919f0
Try run for f6357429b59f is complete. Detailed breakdown of the results available here: https://tbpl.mozilla.org/?tree=Try&rev=f6357429b59f Results (out of 83 total builds): success: 58 failure: 25 Builds (or logs if builds failed) available at: http://ftp.mozilla.org/pub/mozilla.org/firefox/try-builds/jhammel@mozilla.com-f6357429b59f
Try run for f6357429b59f is complete. Detailed breakdown of the results available here: https://tbpl.mozilla.org/?tree=Try&rev=f6357429b59f Results (out of 83 total builds): success: 58 failure: 25 Builds (or logs if builds failed) available at: http://ftp.mozilla.org/pub/mozilla.org/firefox/try-builds/jhammel@mozilla.com-f6357429b59f Timed out after 12 hours without completing.
The xperf failure seems to be (ABICT) a double-quoting issue for paths (but I could be wrong). I fail to understand the android failures or the 'c' failures :/
I'll try to debug this over the weekend as I want to focus on jetperf today. If anyone does have time to look at the patch and these failures with a fresh set of eyes it would be a huge help
Talking to :jmaher and :ted on IRC, this is an interim fix while we resolve our longer term strategy for the unification of testing harnesses. I'll attempt to distill what this is below, but please forgive me if I misrepresent anything. For talos in particular, we will give up YAML entirely and there will be no configuration loaded to run. This matches the rest of the test harnesses. Instead, we will use the talos console_script (aka run_test.py) with a series of arguments that will define the run desired. For the harnesses as a whole, we will converge on a set of command line switches that are common, though of course each test harness may have switches that are particular to that harness. This will probably be part of the mozbase effort, and, if needed, we will introduce a package/module that will help to this effect.
the mobile tests are failing because we have extra prefs in the profile: browser.bookmarks.max_backups: 0 dom.max_chrome_script_run_time: 0 I believe it is just the dom.max_chrome_script_run_time one, but these are not in the remote.config and shouldn't be in the profile.
the 'c' tests are failing on ts_paint, and the culprit is: dom.send_after_paint_to_content: 'true' change the 'true' from a string to a boolean and we will be testing ts_paint again.
I suspect the string->boolean will fix the xperf tests as well!
Attached patch fixes from comment 34, comment 35 (obsolete) (deleted) — Splinter Review
fixed the preferences issues I hope
Attachment #620886 - Attachment is obsolete: true
Attachment #620886 - Flags: review?(jmaher)
Attachment #621130 - Flags: review?(jmaher)
Forgot to unmask flags for --fennecIDs and remotePerfConfigurator; fixed locally and will upload a patch with other fixes from the try run (https://tbpl.mozilla.org/?tree=Try&rev=26ef300c2a39)
There are still failures that I am looking at. I made a script to compare the old and new config files: http://k0s.org/mozilla/talos/compare-config.py I wish I had done this before. The datadiff, which, and PyYAML python packages need to be installed to use this script. I'll use this to try to narrow down the differences and fix them
I've been able to reproduce and hopefully fix the android failures locally due to a change to the patch regarding how preferences were dealt with. In short, I was using preferences vs. extraPrefs wrongly: if 'preferences' are encountered, we should not extend this or use the default. extraPrefs, on the other hand, should be extended (dict.update() in python terms) and be used for .config and command line overrides. I'll push this to try. Hopefully this will fix the chrome failures on win opt as well. xperf is likely a separate issue, but we will see.
Attached patch fix the way preferences are dealt with (obsolete) (deleted) — Splinter Review
Attachment #621437 - Flags: review?(jmaher)
Attachment #621130 - Attachment is obsolete: true
Attachment #621130 - Flags: review?(jmaher)
Attachment #621437 - Attachment is obsolete: true
Attachment #621437 - Flags: review?(jmaher)
Attachment #621441 - Flags: review?(jmaher)
As best I can tell, the xperf differences are from the doubly quoted xperf path on buildbot: 'python' 'PerfConfigurator.py' '-v' '-e' '../firefox/firefox' '-t' 'talos-r3-w7-028' '--branchName' 'Try' '--activeTests' 'ts_paint:tpaint' '--sampleConfig' 'xperf.config' '--setPref' 'dom.send_after_paint_to_content=true' '--xperf_path' '"c:/Program Files/Microsoft Windows Performance Toolkit/xperf.exe"' '--symbolsPath' '../symbols' (See https://tbpl.mozilla.org/php/getParsedLog.php?id=11514590&tree=Try&full=1) The difference, where (-) is the new parser and (+) is the old: -'xperf_path': '"c:/Program Files/Microsoft Windows Performance Toolkit/xperf.exe"', +'xperf_path': 'c:/Program Files/Microsoft Windows Performance Toolkit/xperf.exe', I will check for the double quotes and unquote for now, but this is working around a workaround
Attached patch handle deviceroot case for foopys (obsolete) (deleted) — Splinter Review
Attachment #621441 - Attachment is obsolete: true
Attachment #621441 - Flags: review?(jmaher)
Attachment #621488 - Flags: review?(jmaher)
This is a horrible hack but I can't think of a way of getting around this. Longer term, the buildbot config should be changed and this BBB code deleted.
Attachment #621488 - Attachment is obsolete: true
Attachment #621488 - Flags: review?(jmaher)
Attachment #621754 - Flags: review?(jmaher)
Not seeing any substantive difference in the tsspider configs. Incidentally, our preference for first_run (vs. firstrun) is messed up in remote.config: https://developer.mozilla.org/en/Mozilla_Networking_Preferences http://hg.mozilla.org/build/talos/file/8ee9b373ea24/talos/remote.config#l55 Compare vs http://hg.mozilla.org/build/talos/file/8ee9b373ea24/talos/sample.config#l52 (This is the sort of mistake i hope to eliminate with this patch)
Not sure why it didn't post to bug, but https://tbpl.mozilla.org/?tree=Try&rev=ee8132d40320 The good news: there seems to be only three outstanding failures: rck2, sp, and x. The bad news: one of them is x when i had hoped to clear up with this last push. I'll look in more detail.
So it is now my off-hand guess that xperf is failing somewhere in the subshell bcontroller madness (run_tests shells out to bcontroller which shells out to xtalos, never fun). Going to try something but I'm running out of guesses here.
Attached patch stabbing in the dark here (obsolete) (deleted) — Splinter Review
I've reverted my change to bcontroller.yml; tbh, it shouldn't make a difference at all, but....its bcontroller, subshells, and magic
Try run for ee8132d40320 is complete. Detailed breakdown of the results available here: https://tbpl.mozilla.org/?tree=Try&rev=ee8132d40320 Results (out of 86 total builds): success: 81 failure: 5 Builds (or logs if builds failed) available at: http://ftp.mozilla.org/pub/mozilla.org/firefox/try-builds/jhammel@mozilla.com-ee8132d40320
Try run for 37ac87ba5bd8 is complete. Detailed breakdown of the results available here: https://tbpl.mozilla.org/?tree=Try&rev=37ac87ba5bd8 Results (out of 84 total builds): success: 80 failure: 3 other: 1 Builds (or logs if builds failed) available at: http://ftp.mozilla.org/pub/mozilla.org/firefox/try-builds/jhammel@mozilla.com-37ac87ba5bd8 Timed out after 12 hours without completing.
Failure case: talos-r3-w7-061: Started Tue, 08 May 2012 11:42:32 Running test ts_paint: Started Tue, 08 May 2012 11:42:32 No xperf providers options given __browserInfo browser_name:Firefox browser_version:15.0a1 buildID:20120508094655 __browserInfo__metrics Screen width/height:1024/768 colorDepth:24 Browser inner width/height: 1010/601 __metricsxperf: error: NT Kernel Logger: The instance name passed was not recognized as valid by a WMI data provider. (0x1069). in readfile: test.etl.csv.part, firefox xperf: error: test.etl: The specified path is invalid. (0x800700a1). xperf: warning: applying restriction of access for trace processing __xperf_data_begin__ filename, readcount, readbytes, writecount, writebytes __xperf_data_end__ NOISE: __startBeforeLaunchTimestamp1336502560584__endBeforeLaunchTimestamp NOISE: __startAfterTerminationTimestamp1336502578223__endAfterTerminationTimestamp Failed ts_paint: Stopped Tue, 08 May 2012 11:42:59 FAIL: Busted: ts_paint FAIL: failed to initialize browser Traceback (most recent call last): File "run_tests.py", line 559, in run_tests browser_dump, counter_dump, print_format = mytest.runTest(browser_config, test) File "c:\talos-slave\talos-data\talos\ttest.py", line 259, in runTest self.initializeProfile(profile_dir, browser_config) File "c:\talos-slave\talos-data\talos\ttest.py", line 133, in initializeProfile raise talosError("failed to initialize browser") talosError: 'failed to initialize browser' Traceback (most recent call last): File "run_tests.py", line 619, in ? main() File "run_tests.py", line 616, in main run_tests(parser.config) File "run_tests.py", line 570, in run_tests raise e utils.talosError: 'failed to initialize browser' program finished with exit code 1 elapsedTime=30.363000 TinderboxPrint:<a href = "http://hg.mozilla.org/try/rev/37ac87ba5bd8">rev:37ac87ba5bd8</a> TinderboxPrint:FAIL: Busted: ts_paint TinderboxPrint:FAIL: failed to initialize browser
For reference, a partial success case: talos-r3-w7-035: Started Tue, 08 May 2012 21:31:52 Running test ts_paint: Started Tue, 08 May 2012 21:31:52 No xperf providers options given __browserInfo browser_name:Firefox browser_version:15.0a1 buildID:20120508180000 __browserInfo__metrics Screen width/height:1024/768 colorDepth:24 Browser inner width/height: 1010/601 __metricsxperf: error: NT Kernel Logger: The instance name passed was not recognized as valid by a WMI data provider. (0x1069). in readfile: test.etl.csv.part, firefox xperf: error: test.etl: The specified path is invalid. (0x800700a1). xperf: warning: applying restriction of access for trace processing __xperf_data_begin__ filename, readcount, readbytes, writecount, writebytes __xperf_data_end__ __start_report1108__end_report Could not read chrome manifest 'file:///c:/talos-slave/talos-data/firefox/extensions/%7B972ce4c6-7e08-4474-a285-3208198ce6fd%7D/chrome.manifest'. [JavaScript Warning: "XUL box for _moz_generated_content_after element contained an inline #text child, forcing all its children to be wrapped in a block." {file: "chrome://browser/content/browser.xul" line: 0}] [JavaScript Warning: "Use of enablePrivilege is deprecated. Please use code that runs with the system principal (e.g. an extension) instead." {file: "file:///c:/talos-slave/talos-data/talos/scripts/MozillaFileLogger.js" line: 7}] __startTimestamp1336537946254__endTimestamp The trace you have just captured "c:\talos-slave\talos-data\talos\test.etl" may contain personally identifiable information, including but not necessarily limited to paths to files accessed, paths to registry accessed and process names. Exact information depends on the events that were logged. Please be aware of this when sharing out this trace with other people. xperf: warning: applying restriction of access for trace processing Screen width/height:1024/768 colorDepth:24 Browser inner width/height: 1010/601 browser_name:Firefox browser_version:15.0a1 buildID:20120508180000 extending with xperf! NOISE: __start_report1108__end_report <snip/>
Try run for de17352965f0 is complete. Detailed breakdown of the results available here: https://tbpl.mozilla.org/?tree=Try&rev=de17352965f0 Results (out of 2 total builds): success: 1 failure: 1 Builds (or logs if builds failed) available at: http://ftp.mozilla.org/pub/mozilla.org/firefox/try-builds/jhammel@mozilla.com-de17352965f0
(In reply to Mozilla RelEng Bot from comment #54) > Try run for de17352965f0 is complete. > Detailed breakdown of the results available here: > https://tbpl.mozilla.org/?tree=Try&rev=de17352965f0 > Results (out of 2 total builds): > success: 1 > failure: 1 > Builds (or logs if builds failed) available at: > http://ftp.mozilla.org/pub/mozilla.org/firefox/try-builds/jhammel@mozilla. > com-de17352965f0 Despite the abysmal failure, I do believe it worked: https://tbpl.mozilla.org/php/getParsedLog.php?id=11614477&tree=Try&full=1 This uses the full path to the talos.log preference to be the full path to the log. I'm still not sure why this would make any difference, but maybe there's some subtlty here Note that the failure mode (a url timeout) is different than the previous. I've seen this even running win32+xperf with basically no talos changes on try (only a few extra print/utils.noisy statements). I'm debugging seeing win32+xperf running at all with the old version. For whatever reason, it seems very tempermental on try.
> I'm debugging seeing win32+xperf running at all with the old version. For whatever reason, it seems very tempermental on try. e.g. see https://tbpl.mozilla.org/?tree=Try&rev=52b96d0e4a98 The only code changes are a few print and utils.noisy statements . I'm repushing with an even more bare-bones patch just for the sake of sanity
Really, this is handled somewhat badly, as browser_log and talos.logfile (ABICT) should always point to the same place. That said, this isn't any better in the existing code. I'm guessing we *should* always set the preference to point to browser_log regardless of anything else, or you could get in a state where they are not the same
Attachment #621754 - Attachment is obsolete: true
Attachment #622013 - Attachment is obsolete: true
Attachment #621754 - Flags: review?(jmaher)
Attachment #622515 - Flags: review?(jmaher)
(In reply to Jeff Hammel [:jhammel] from comment #56) > > I'm debugging seeing win32+xperf running at all with the old version. For whatever reason, it seems very tempermental on try. > > e.g. see https://tbpl.mozilla.org/?tree=Try&rev=52b96d0e4a98 > > The only code changes are a few print and utils.noisy statements . I'm > repushing with an even more bare-bones patch just for the sake of sanity https://tbpl.mozilla.org/php/getParsedLog.php?id=11618244&tree=Try&full=1 This patch is literally a single blank line added to run_tests.py after the shebang. Still getting a timeout when trying to post to autolog.
Attached patch ffsetup mods (deleted) — Splinter Review
In the process of diagnosing the xperf failure(s), I changed ffsetup.py a bit: - there was a useless level of indentation, now gone - now the failure case prints the regex that didn't match and the content that didn't match for debugging I don't know if we want to take this here, file a follow-up bug, or what, but thought I'd put it up for review anyway.
Attachment #622544 - Flags: review?(jmaher)
(In reply to Jeff Hammel [:jhammel] from comment #58) > (In reply to Jeff Hammel [:jhammel] from comment #56) > > > I'm debugging seeing win32+xperf running at all with the old version. For whatever reason, it seems very tempermental on try. > > > > e.g. see https://tbpl.mozilla.org/?tree=Try&rev=52b96d0e4a98 > > > > The only code changes are a few print and utils.noisy statements . I'm > > repushing with an even more bare-bones patch just for the sake of sanity > > https://tbpl.mozilla.org/php/getParsedLog.php?id=11618244&tree=Try&full=1 > > This patch is literally a single blank line added to run_tests.py after the > shebang. Still getting a timeout when trying to post to autolog. ....which I filed a bug about here: https://bugzilla.mozilla.org/show_bug.cgi?id=753577 But, it turns out that that is not the reason the run goes red, since bug 753577 is occuring in production xperf on every run for who knows how long and the run goes green there. Instead, the actual problem is https://bugzilla.mozilla.org/show_bug.cgi?id=752951#c6 This explains why xperf always fails on try runs even with no changes. I will make the change in https://bugzilla.mozilla.org/show_bug.cgi?id=752951#c8 as part of this patch for both expediency and to get some green.
Attached patch include fix for bug 752951 (obsolete) (deleted) — Splinter Review
No need to review both patches, i'm putting this fix here so my try runs can go green (I hope) and to ensure I didn't mess up anything. If we want to take the bug 752951 fix separately, I am fine with that too.
Attachment #622578 - Flags: review?(jmaher)
Need to take out the default of shutdown: False or this will override values from test.py
Attachment #622515 - Attachment is obsolete: true
Attachment #622578 - Attachment is obsolete: true
Attachment #622515 - Flags: review?(jmaher)
Attachment #622578 - Flags: review?(jmaher)
Attachment #622617 - Flags: review?(jmaher)
Try run for 73054caed4e0 is complete. Detailed breakdown of the results available here: https://tbpl.mozilla.org/?tree=Try&rev=73054caed4e0 Results (out of 83 total builds): success: 81 failure: 2 Builds (or logs if builds failed) available at: http://ftp.mozilla.org/pub/mozilla.org/firefox/try-builds/jhammel@mozilla.com-73054caed4e0
Comment on attachment 622544 [details] [diff] [review] ffsetup mods Review of attachment 622544 [details] [diff] [review]: ----------------------------------------------------------------- this looks good.
Attachment #622544 - Flags: review?(jmaher) → review+
Comment on attachment 622617 [details] [diff] [review] don't have a default shutdown value Review of attachment 622617 [details] [diff] [review]: ----------------------------------------------------------------- a lot of code, a good chance I missed something, but overall this looks good and it passes on try server! ::: talos/configuration.py @@ +76,5 @@ > + newfile = False > + try: > + self._write(f, config) > + finally: > + # XXX try: finally: works in python >= 2.5 does this work in 2.4 ? ::: talos/remote.config @@ +1,1 @@ > +# placeholder \ No newline at end of file can we get rid of this file? ::: talos/sample.config @@ +1,1 @@ > +# placeholder \ No newline at end of file do we even need this file anymore? ::: talos/test.py @@ +259,5 @@ > tpan, tzoom, trobopan, tcheckerboard, tprovider, tcheck2, > dromaeo_css, dromaeo_dom, dromaeo_jslib, dromaeo_sunspider, dromaeo_v8, dromaeo_basics, > a11y, > tdhtml_2, tsvg_2, tsvg_opacity_2, v8_2, tsspider_2, tscroll_2, a11y_2 > ] we might want to fix this in the future so remote only tests are not valid without specifying remote options (like ip address and webserver) ::: talos/ttest.py @@ +347,5 @@ > + process.kill() > + except OSError, e: > + if e.errno != 3: > + # 3 == No such process in Linux and Mac (errno.h) > + raise thanks. We could also use ffprocess.terminateProcess if we want to. I prefer this also.
Attachment #622617 - Flags: review?(jmaher) → review+
(In reply to Joel Maher (:jmaher) from comment #65) > Comment on attachment 622617 [details] [diff] [review] > don't have a default shutdown value > > Review of attachment 622617 [details] [diff] [review]: > ----------------------------------------------------------------- > > a lot of code, a good chance I missed something, but overall this looks good > and it passes on try server! > > ::: talos/configuration.py > @@ +76,5 @@ > > + newfile = False > > + try: > > + self._write(f, config) > > + finally: > > + # XXX try: finally: works in python >= 2.5 > > does this work in 2.4 ? Good catch, I had forgotten about this. Will fix. > ::: talos/remote.config > @@ +1,1 @@ > > +# placeholder > \ No newline at end of file > > can we get rid of this file? > > ::: talos/sample.config > @@ +1,1 @@ > > +# placeholder > \ No newline at end of file > > do we even need this file anymore? AFAIK, we can get rid of sample.config as nothing in automation uses it. However, remote.config is actively used in automation so we need to keep it around :( I will file a tracking bug for all the things that should change regarding buildbot automation and add this to the list. > ::: talos/test.py > @@ +259,5 @@ > > tpan, tzoom, trobopan, tcheckerboard, tprovider, tcheck2, > > dromaeo_css, dromaeo_dom, dromaeo_jslib, dromaeo_sunspider, dromaeo_v8, dromaeo_basics, > > a11y, > > tdhtml_2, tsvg_2, tsvg_opacity_2, v8_2, tsspider_2, tscroll_2, a11y_2 > > ] > > we might want to fix this in the future so remote only tests are not valid > without specifying remote options (like ip address and webserver) Yeah, sounds like a plan. I'll file a follow-up bug. > ::: talos/ttest.py > @@ +347,5 @@ > > + process.kill() > > + except OSError, e: > > + if e.errno != 3: > > + # 3 == No such process in Linux and Mac (errno.h) > > + raise > > thanks. We could also use ffprocess.terminateProcess if we want to. I > prefer this also. Since we use subprocess directly and not the ffprocess front end, I'm not sure if this is cleanly doable. That said, we should probably start marking stuff that we want to get rid of for python > 2.4. The whole process thing should be much cleaner and this is yet another case where mozprocess would help us. I noticed a few "stnanks" when looking through the code that I'll file, e.g. ffprocess_mac.TerminateProcess
pushed: http://hg.mozilla.org/build/talos/rev/1680be7af3bf will file a bunch of follow-ups in a bit and close the other bugs i've fixed with this
Status: NEW → RESOLVED
Closed: 13 years ago
Resolution: --- → FIXED
No longer depends on: 747187
(In reply to Joel Maher (:jmaher) from comment #64) > Comment on attachment 622544 [details] [diff] [review] > ffsetup mods > > Review of attachment 622544 [details] [diff] [review]: > ----------------------------------------------------------------- > > this looks good. pushed: http://hg.mozilla.org/build/talos/rev/488bc187a3ef
Follow-ups filed: bug 754075 bug 754073 bug 754063 (amonst other less-related bugs). There will be more coming, but this is the direct fallout
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Creator:
Created:
Updated:
Size: