Closed
Bug 1121571
Opened 9 years ago
Closed 9 years ago
Improve Talos profiling
Categories
(Testing :: Talos, defect)
Testing
Talos
Tracking
(Not tracked)
RESOLVED
FIXED
People
(Reporter: mstange, Assigned: mstange)
References
Details
Attachments
(19 files, 19 obsolete files)
No description provided.
Assignee | ||
Comment 1•9 years ago
|
||
The problems with the current talos profiling setup are: - We dump exactly one profile per browser run, which contains all tested pages. It would be much better to have a separate profile per tested page. - Profiles don't have high accuracy. Since each profile spans such a large time frame (the whole browser run), the sampling interval is often set to 10ms so that the resulting profile file still has a manageable size. A lower sampling interval would result in much more useful profiles. - Many resulting profile files are truncated because we dump the profile during shutdown and kill the browser if shutdown takes more than 5 seconds. - Profiles aren't symbolicated, they contain raw addresses. - Many tests don't pause/resume the profiler properly, so the resulting profile contains stuff that Talos didn't measure. - The Compositor thread isn't profiled.
Assignee | ||
Comment 2•9 years ago
|
||
Attachment #8549221 -
Flags: review?(jmaher)
Assignee | ||
Comment 3•9 years ago
|
||
Attachment #8549222 -
Flags: review?(jmaher)
Assignee | ||
Comment 4•9 years ago
|
||
Attachment #8549223 -
Flags: review?(jmaher)
Assignee | ||
Comment 5•9 years ago
|
||
Attachment #8549224 -
Flags: review?(jmaher)
Assignee | ||
Comment 6•9 years ago
|
||
Attachment #8549225 -
Flags: review?(jmaher)
Assignee | ||
Comment 7•9 years ago
|
||
Attachment #8549226 -
Flags: review?(jmaher)
Assignee | ||
Comment 8•9 years ago
|
||
Attachment #8549227 -
Flags: review?(jmaher)
Assignee | ||
Comment 9•9 years ago
|
||
Attachment #8549229 -
Flags: review?(jmaher)
Assignee | ||
Comment 10•9 years ago
|
||
Attachment #8549230 -
Flags: review?(jmaher)
Assignee | ||
Comment 11•9 years ago
|
||
Attachment #8549231 -
Flags: review?(jmaher)
Assignee | ||
Comment 12•9 years ago
|
||
Attachment #8549232 -
Flags: review?(jmaher)
Assignee | ||
Comment 13•9 years ago
|
||
Attachment #8549233 -
Flags: review?(jmaher)
Assignee | ||
Comment 14•9 years ago
|
||
Attachment #8549234 -
Flags: review?(jmaher)
Assignee | ||
Comment 15•9 years ago
|
||
Attachment #8549235 -
Flags: review?(jmaher)
Assignee | ||
Comment 16•9 years ago
|
||
Attachment #8549236 -
Flags: review?(jmaher)
Assignee | ||
Comment 17•9 years ago
|
||
I've fired off a try push here: https://treeherder.mozilla.org/#/jobs?repo=try&revision=8e8db0eec687 Previous pushes with older patch sets: https://treeherder.mozilla.org/ui/#/jobs?repo=try&revision=b54a23073798 https://treeherder.mozilla.org/ui/#/jobs?repo=try&revision=020ae04cbd73 (Windows 8 red is because unwinding on Windows 64 bit is broken at the moment)
Assignee | ||
Comment 18•9 years ago
|
||
Let's try that again: https://treeherder.mozilla.org/#/jobs?repo=try&revision=af8dbeccc933
Assignee | ||
Comment 19•9 years ago
|
||
(In reply to Markus Stange [:mstange] from comment #17) > (Windows 8 red is because unwinding on Windows 64 bit is broken at the > moment) ... and now it has been fixed by bug 1120126! W00t!
Depends on: 1120126
Comment 20•9 years ago
|
||
Comment on attachment 8549221 [details] [diff] [review] 01-Bug_1121571___Explicitly_call_out_shutdown_timeout_and_print_a_message_for_shutdown_termination_.diff Review of attachment 8549221 [details] [diff] [review]: ----------------------------------------------------------------- thanks!
Attachment #8549221 -
Flags: review?(jmaher) → review+
Comment 21•9 years ago
|
||
Comment on attachment 8549222 [details] [diff] [review] 02-Bug_1121571___Create_a_directory_for_profiles__don_t_use_startup_profiling_and_change_the_profile_zip_layout_.diff Review of attachment 8549222 [details] [diff] [review]: ----------------------------------------------------------------- it appears we are not turning on the profiler, or we don't have a need for these variables. ::: talos/ttest.py @@ +329,5 @@ > + sps_profile_dir = tempfile.mkdtemp(); > + > + sps_profile_interval = test_config.get('sps_profile_interval', 1) > + sps_profile_entries = test_config.get('sps_profile_entries', 1000000) > + sps_profile_threads = 'GeckoMain,Compositor' do we ever use these variables? Before we applied them to environment variables which are removed with this patch and now not used.
Attachment #8549222 -
Flags: review?(jmaher) → review-
Comment 22•9 years ago
|
||
Comment on attachment 8549223 [details] [diff] [review] 03-Bug_1121571___Add_Profiler_js_capabilities_.diff Review of attachment 8549223 [details] [diff] [review]: ----------------------------------------------------------------- I would like to ensure we safeguard these values. ::: talos/page_load_test/tart/addon/content/Profiler.js @@ +74,5 @@ > + return; > + } > + profiler_interval = obj.sps_profile_interval; > + profiler_entries = obj.sps_profile_entries; > + profiler_threadsArray = obj.sps_profile_threads.split(","); what happens if these values don't exist? also we are doing a split on sps_profile_threads, so that assumes it exists and it is a string.
Attachment #8549223 -
Flags: review?(jmaher) → review-
Comment 23•9 years ago
|
||
Comment on attachment 8549224 [details] [diff] [review] 04-Bug_1121571___Pass_profiling_information_to_pageloader_and_make_pageloader_dump_profiles_.diff Review of attachment 8549224 [details] [diff] [review]: ----------------------------------------------------------------- keep in mind pageloader.js was just cleaned up yesterday (2015-01-14) to run through e10s code paths all the time and remove regular code paths. I am not sure if that will affect this patch for bitrot or functionality.
Attachment #8549224 -
Flags: review?(jmaher) → review+
Comment 24•9 years ago
|
||
Comment on attachment 8549222 [details] [diff] [review] 02-Bug_1121571___Create_a_directory_for_profiles__don_t_use_startup_profiling_and_change_the_profile_zip_layout_.diff Review of attachment 8549222 [details] [diff] [review]: ----------------------------------------------------------------- I see how you use these in a future patch. I would have preferred that you put it all in the same patch, but as it stands, this is just fine.
Attachment #8549222 -
Flags: review- → review+
Comment 25•9 years ago
|
||
Comment on attachment 8549225 [details] [diff] [review] 05-Bug_1121571___Pass_profiling_info_to_non_pageloader_tests_via_URL_params_.diff Review of attachment 8549225 [details] [diff] [review]: ----------------------------------------------------------------- looks good!
Attachment #8549225 -
Flags: review?(jmaher) → review+
Comment 26•9 years ago
|
||
Comment on attachment 8549226 [details] [diff] [review] 06-Bug_1121571___Fix_tp5o_scroll_profiling_.diff Review of attachment 8549226 [details] [diff] [review]: ----------------------------------------------------------------- thanks. ::: talos/pageloader/chrome/pageloader.js @@ +209,5 @@ > " }" + > "} " + > "addEventListener('load', _contentLoadHandler, true); "; > content.selectedBrowser.messageManager.loadFrameScript(contentScript, false, true); > + content.selectedBrowser.messageManager.loadFrameScript("chrome://pageloader/content/Profiler.js", true, true); our other scripts are loaded as "false, true", but this is loaded with "true, true". Is there a reason for this?
Attachment #8549226 -
Flags: review?(jmaher) → review+
Comment 27•9 years ago
|
||
Comment on attachment 8549227 [details] [diff] [review] 07-Bug_1121571___Fix_tresize_profiling_.diff Review of attachment 8549227 [details] [diff] [review]: ----------------------------------------------------------------- a few nits below, r+ as they are minor. ::: talos/generate-tresize-xpi.html @@ +15,5 @@ > var files = [ > "chrome.manifest", > "install.rdf", > "content/framescript.js", > + "content/Profiler.js" you are missing a trailing , here ::: talos/startup_test/tresize/addon/content/tresize.overlay.xul @@ +5,1 @@ > <script type="application/x-javascript" src="tresize.js" /> this specifically calls out language="Javascript" and add a </script> tag, this is differethen than the other <script...> tags in this file, why?
Attachment #8549227 -
Flags: review?(jmaher) → review+
Comment 28•9 years ago
|
||
Comment on attachment 8549229 [details] [diff] [review] 08-Bug_1121571___Profile_startup_using_profiler_env_vars_.diff Review of attachment 8549229 [details] [diff] [review]: ----------------------------------------------------------------- r- as I would like to learn more about the url parameters vs environment variables for sps profiling knobs. ::: talos/startup_test/sessionrestore/main.js @@ +15,2 @@ > > + var duration = startup_info.sessionRestored - startup_info.sessionRestoreInit; is there any concern of altering these numbers in general while profiling is not enabled? ::: talos/ttest.py @@ +357,5 @@ > + 'MOZ_PROFILER_STARTUP': '1', > + 'MOZ_PROFILER_INTERVAL': str(sps_profile_interval), > + 'MOZ_PROFILER_ENTRIES': str(sps_profile_entries), > + "MOZ_PROFILER_THREADS": str(sps_profile_threads) > + } is there an advantage of doing this vs sending params into pageloader? This is starting to get confusing.
Attachment #8549229 -
Flags: review?(jmaher) → review-
Updated•9 years ago
|
Attachment #8549230 -
Flags: review?(jmaher) → review+
Comment 29•9 years ago
|
||
Comment on attachment 8549231 [details] [diff] [review] 10-Bug_1121571___Print_some_debug_messages_.diff Review of attachment 8549231 [details] [diff] [review]: ----------------------------------------------------------------- ::: talos/ttest.py @@ +417,5 @@ > profile_dir, > test_config['url'], > profiling_info=profiling_info) > > + utils.info("command_args: {0}".format(command_args)) I would like to see this be a bit more specific, maybe: utils.info("browser command line args: ...")
Attachment #8549231 -
Flags: review?(jmaher) → review+
Comment 30•9 years ago
|
||
Comment on attachment 8549232 [details] [diff] [review] 11-Bug_1121571___Increase_the_timeout_when_profiling_.diff Review of attachment 8549232 [details] [diff] [review]: ----------------------------------------------------------------- ::: talos/ttest.py @@ +411,5 @@ > # Run the test > timeout = test_config.get('timeout', 7200) # 2 hours default > + if sps_profile: > + # When profiling, give the browser plenty of extra time to dump the profile. > + timeout += 3600 an extra hour? Why not an extra 5 minutes?
Attachment #8549232 -
Flags: review?(jmaher) → review-
Comment 31•9 years ago
|
||
Comment on attachment 8549233 [details] [diff] [review] 12-Bug_1121571___Add_more_granular_profiler_pausing_for_many_tests_.diff Review of attachment 8549233 [details] [diff] [review]: ----------------------------------------------------------------- ::: talos/page_load_test/v8_7/base.js @@ +212,5 @@ > // average time it takes to run a single iteration. > BenchmarkSuite.prototype.RunSingleBenchmark = function(benchmark, data) { > function Measure(data) { > var elapsed = 0; > + Profiler.resume(benchmark.name); are you sure this 'benchmark.name' will name sense?
Attachment #8549233 -
Flags: review?(jmaher) → review+
Comment 32•9 years ago
|
||
Comment on attachment 8549234 [details] [diff] [review] 13-Bug_1121571___Do_symbolication_right_on_the_talos_machine_.diff Review of attachment 8549234 [details] [diff] [review]: ----------------------------------------------------------------- is there anyway we can have sym*.py as a published python module, then we don't have to add the code. At the very least this looks as if it has existed elsewhere for a while- we should reference where it comes from. also for sps.py, please make it 4 space indentation. Quite possibly sps.py comes from the same source as the other sym*.py files, maybe it can be rolled into a package as well? ::: talos/sps.py @@ +1,4 @@ > +try: > + import simplejson as json > +except ImportError: > + import json now that we run on python 2.7, this isn't needed anymore @@ +15,5 @@ > + if startMeasurementMarker in marker["name"]: > + in_measurement = True > + if stopMeasurementMarker in marker["name"]: > + in_measurement = False > + if True or in_measurement: this if statement seems odd, why not just "if in_measurement" ? ::: talos/symbolicationRequest.py @@ +101,5 @@ > + return > + self.forwardCount = rawRequests["forwarded"] > + > + # Only used for compatibility with older clients. > + # TODO: Remove after June 2013. I assume this code is copied from somewhere, this could have been removed 1.5 years ago! ::: talos/ttest.py @@ +541,5 @@ > + symbolicator = symbolication.ProfileSymbolicator({ > + # Trace-level logging (verbose) > + "enableTracing": 0, > + # Fallback server if symbol is not found locally > + "remoteSymbolServer": "http://symbolapi.mozilla.org:80/", I don't know if we will have access to this from the talos machine
Attachment #8549234 -
Flags: review?(jmaher) → review-
Comment 33•9 years ago
|
||
Comment on attachment 8549235 [details] [diff] [review] 14-Bug_1121571___Read_symbols_from_system_libraries_on_OS_X_.diff Review of attachment 8549235 [details] [diff] [review]: ----------------------------------------------------------------- the ttest.py changes are file, I see what you are doing in symbolication.py, although my previous review to have it in a separate package would be preferred.
Attachment #8549235 -
Flags: review?(jmaher) → review+
Comment 34•9 years ago
|
||
Comment on attachment 8549236 [details] [diff] [review] 15-Bug_1121571___Don_t_resume_profiling_at_the_end_of_TART_.diff Review of attachment 8549236 [details] [diff] [review]: ----------------------------------------------------------------- thanks!
Attachment #8549236 -
Flags: review?(jmaher) → review+
Assignee | ||
Comment 35•9 years ago
|
||
(In reply to Joel Maher (:jmaher) from comment #22) > Comment on attachment 8549223 [details] [diff] [review] > 03-Bug_1121571___Add_Profiler_js_capabilities_.diff > > Review of attachment 8549223 [details] [diff] [review]: > ----------------------------------------------------------------- > > I would like to ensure we safeguard these values. > > ::: talos/page_load_test/tart/addon/content/Profiler.js > @@ +74,5 @@ > > + return; > > + } > > + profiler_interval = obj.sps_profile_interval; > > + profiler_entries = obj.sps_profile_entries; > > + profiler_threadsArray = obj.sps_profile_threads.split(","); > > what happens if these values don't exist? also we are doing a split on > sps_profile_threads, so that assumes it exists and it is a string. Good point, I'll add some validation. (In reply to Joel Maher (:jmaher) from comment #23) > Comment on attachment 8549224 [details] [diff] [review] > 04- > Bug_1121571___Pass_profiling_information_to_pageloader_and_make_pageloader_du > mp_profiles_.diff > > Review of attachment 8549224 [details] [diff] [review]: > ----------------------------------------------------------------- > > keep in mind pageloader.js was just cleaned up yesterday (2015-01-14) to run > through e10s code paths all the time and remove regular code paths. I am > not sure if that will affect this patch for bitrot or functionality. The patch is already based on that updated version, and everything's working correctly. :-) (In reply to Joel Maher (:jmaher) from comment #24) > Comment on attachment 8549222 [details] [diff] [review] > 02- > Bug_1121571___Create_a_directory_for_profiles__don_t_use_startup_profiling_an > d_change_the_profile_zip_layout_.diff > > Review of attachment 8549222 [details] [diff] [review]: > ----------------------------------------------------------------- > > I see how you use these in a future patch. I would have preferred that you > put it all in the same patch, but as it stands, this is just fine. Sorry, yeah, should have reshuffled a little. (In reply to Joel Maher (:jmaher) from comment #26) > Comment on attachment 8549226 [details] [diff] [review] > 06-Bug_1121571___Fix_tp5o_scroll_profiling_.diff > > Review of attachment 8549226 [details] [diff] [review]: > ----------------------------------------------------------------- > > thanks. > > ::: talos/pageloader/chrome/pageloader.js > @@ +209,5 @@ > > " }" + > > "} " + > > "addEventListener('load', _contentLoadHandler, true); "; > > content.selectedBrowser.messageManager.loadFrameScript(contentScript, false, true); > > + content.selectedBrowser.messageManager.loadFrameScript("chrome://pageloader/content/Profiler.js", true, true); > > our other scripts are loaded as "false, true", but this is loaded with > "true, true". Is there a reason for this? Hmm, I must have misunderstood the meaning of that argument when I wrote this part. It should work just fine with false. I'll change it. (The docs say: "If true, this flag means that the frame script will be loaded into any new frames opened after the loadFrameScript() call, until removeDelayedFrameScript() is called for that script.") (In reply to Joel Maher (:jmaher) from comment #27) > Comment on attachment 8549227 [details] [diff] [review] > 07-Bug_1121571___Fix_tresize_profiling_.diff > > Review of attachment 8549227 [details] [diff] [review]: > ----------------------------------------------------------------- > > a few nits below, r+ as they are minor. > > ::: talos/generate-tresize-xpi.html > @@ +15,5 @@ > > var files = [ > > "chrome.manifest", > > "install.rdf", > > "content/framescript.js", > > + "content/Profiler.js" > > you are missing a trailing , here will fix > > ::: talos/startup_test/tresize/addon/content/tresize.overlay.xul > @@ +5,1 @@ > > <script type="application/x-javascript" src="tresize.js" /> > > this specifically calls out language="Javascript" and add a </script> tag, > this is differethen than the other <script...> tags in this file, why? I must have copied it from somewhere else. But you're right, since this is a xul file, we can use a self-closing tag (<script/>) here. I'll update it to match the other one. (In reply to Joel Maher (:jmaher) from comment #28) > Comment on attachment 8549229 [details] [diff] [review] > 08-Bug_1121571___Profile_startup_using_profiler_env_vars_.diff > > Review of attachment 8549229 [details] [diff] [review]: > ----------------------------------------------------------------- > > r- as I would like to learn more about the url parameters vs environment > variables for sps profiling knobs. > > ::: talos/startup_test/sessionrestore/main.js > @@ +15,2 @@ > > > > + var duration = startup_info.sessionRestored - startup_info.sessionRestoreInit; > > is there any concern of altering these numbers in general while profiling is > not enabled? Actually, let me split up this patch into one that only shuffles stuff around and one that adds the profiling infrastructure. I think my changes won't change any numbers but I'd like Yoric to confirm. > > ::: talos/ttest.py > @@ +357,5 @@ > > + 'MOZ_PROFILER_STARTUP': '1', > > + 'MOZ_PROFILER_INTERVAL': str(sps_profile_interval), > > + 'MOZ_PROFILER_ENTRIES': str(sps_profile_entries), > > + "MOZ_PROFILER_THREADS": str(sps_profile_threads) > > + } > > is there an advantage of doing this vs sending params into pageloader? This > is starting to get confusing. Yes, it really is confusing. I'll add a comment. These environment variables are *not* read by talos JS code. Talos JS code doesn't get a chance to run until we're almost finished starting up the browser. Instead, they're read by Gecko code that runs extremely early during startup, and they will cause Gecko to start profiling as soon as it can, so that we'll get a profile that contains as much of startup as possible. (In reply to Joel Maher (:jmaher) from comment #30) > Comment on attachment 8549232 [details] [diff] [review] > 11-Bug_1121571___Increase_the_timeout_when_profiling_.diff > > Review of attachment 8549232 [details] [diff] [review]: > ----------------------------------------------------------------- > > ::: talos/ttest.py > @@ +411,5 @@ > > # Run the test > > timeout = test_config.get('timeout', 7200) # 2 hours default > > + if sps_profile: > > + # When profiling, give the browser plenty of extra time to dump the profile. > > + timeout += 3600 > > an extra hour? Why not an extra 5 minutes? I'll have a look through the logs, but 5 minutes are almost certainly not enough. I'll come back to this patch when I have more evidence. (In reply to Joel Maher (:jmaher) from comment #31) > Comment on attachment 8549233 [details] [diff] [review] > 12-Bug_1121571___Add_more_granular_profiler_pausing_for_many_tests_.diff > > Review of attachment 8549233 [details] [diff] [review]: > ----------------------------------------------------------------- > > ::: talos/page_load_test/v8_7/base.js > @@ +212,5 @@ > > // average time it takes to run a single iteration. > > BenchmarkSuite.prototype.RunSingleBenchmark = function(benchmark, data) { > > function Measure(data) { > > var elapsed = 0; > > + Profiler.resume(benchmark.name); > > are you sure this 'benchmark.name' will name sense? I'll add a sanity check, but this worked without throwing any errors, as far as I can tell. (Looks like Linux is the only platform that ran V8 in my try push, and the profile isn't big enough to contain the complete run...) (In reply to Joel Maher (:jmaher) from comment #32) > Comment on attachment 8549234 [details] [diff] [review] > 13-Bug_1121571___Do_symbolication_right_on_the_talos_machine_.diff > > Review of attachment 8549234 [details] [diff] [review]: > ----------------------------------------------------------------- > > is there anyway we can have sym*.py as a published python module, then we > don't have to add the code. At the very least this looks as if it has > existed elsewhere for a while- we should reference where it comes from. > > also for sps.py, please make it 4 space indentation. Quite possibly sps.py > comes from the same source as the other sym*.py files, maybe it can be > rolled into a package as well? > > ::: talos/sps.py > @@ +1,4 @@ > > +try: > > + import simplejson as json > > +except ImportError: > > + import json > > now that we run on python 2.7, this isn't needed anymore What should I replace it with? Just "import json"? Because "import simplejson" didn't work. > @@ +15,5 @@ > > + if startMeasurementMarker in marker["name"]: > > + in_measurement = True > > + if stopMeasurementMarker in marker["name"]: > > + in_measurement = False > > + if True or in_measurement: > > this if statement seems odd, why not just "if in_measurement" ? Oops, this is a leftover from an earlier experiment apparently. > ::: talos/symbolicationRequest.py > @@ +101,5 @@ > > + return > > + self.forwardCount = rawRequests["forwarded"] > > + > > + # Only used for compatibility with older clients. > > + # TODO: Remove after June 2013. > > I assume this code is copied from somewhere, this could have been removed > 1.5 years ago! Indeed! I'll update the original code and import the updated version. > ::: talos/ttest.py > @@ +541,5 @@ > > + symbolicator = symbolication.ProfileSymbolicator({ > > + # Trace-level logging (verbose) > > + "enableTracing": 0, > > + # Fallback server if symbol is not found locally > > + "remoteSymbolServer": "http://symbolapi.mozilla.org:80/", > > I don't know if we will have access to this from the talos machine It worked in my test runs. I was surprised, too! Should I do anything to make sure that it continues to accessable? (In reply to Joel Maher (:jmaher) from comment #33) > Comment on attachment 8549235 [details] [diff] [review] > 14-Bug_1121571___Read_symbols_from_system_libraries_on_OS_X_.diff > > Review of attachment 8549235 [details] [diff] [review]: > ----------------------------------------------------------------- > > the ttest.py changes are file, I see what you are doing in symbolication.py, > although my previous review to have it in a separate package would be > preferred. I'll move the symbolication files into a profiler/ subdirectory, as discussed on IRC.
Comment 36•9 years ago
|
||
lets just use 'import json', thanks for the rest of the comments, they all look good!
Assignee | ||
Comment 37•9 years ago
|
||
(In reply to Joel Maher (:jmaher) from comment #29) > Comment on attachment 8549231 [details] [diff] [review] > 10-Bug_1121571___Print_some_debug_messages_.diff > > Review of attachment 8549231 [details] [diff] [review]: > ----------------------------------------------------------------- > > ::: talos/ttest.py > @@ +417,5 @@ > > profile_dir, > > test_config['url'], > > profiling_info=profiling_info) > > > > + utils.info("command_args: {0}".format(command_args)) > > I would like to see this be a bit more specific, maybe: > utils.info("browser command line args: ...") Actually, utils.GenerateBrowserCommandLine already prints this command line, but at the "debug" log level, which I hadn't enabled, so I didn't see it. I'll just remove this change.
Assignee | ||
Comment 38•9 years ago
|
||
> (In reply to Joel Maher (:jmaher) from comment #31)
> > Comment on attachment 8549233 [details] [diff] [review]
> > 12-Bug_1121571___Add_more_granular_profiler_pausing_for_many_tests_.diff
> >
> > Review of attachment 8549233 [details] [diff] [review]:
> > -----------------------------------------------------------------
> >
> > ::: talos/page_load_test/v8_7/base.js
> > @@ +212,5 @@
> > > // average time it takes to run a single iteration.
> > > BenchmarkSuite.prototype.RunSingleBenchmark = function(benchmark, data) {
> > > function Measure(data) {
> > > var elapsed = 0;
> > > + Profiler.resume(benchmark.name);
> >
> > are you sure this 'benchmark.name' will name sense?
>
> I'll add a sanity check, but this worked without throwing any errors, as far
> as I can tell.
Actually, I'll leave it as is. There's a Benchmark "class" at the top of that file, and that explicitly contains a name field, and I'm pretty sure all variables with the name "benchmark" in that file are an instance of Benchmark.
Assignee | ||
Comment 39•9 years ago
|
||
I've shuffled some variables around and made the profile zip end in ".sps.zip".
Attachment #8549222 -
Attachment is obsolete: true
Assignee | ||
Comment 40•9 years ago
|
||
Now with added type checks.
Attachment #8549223 -
Attachment is obsolete: true
Attachment #8550515 -
Flags: review?(jmaher)
Assignee | ||
Comment 41•9 years ago
|
||
unchanged, afaik
Assignee | ||
Comment 42•9 years ago
|
||
also unchanged
Assignee | ||
Comment 43•9 years ago
|
||
with ", false, true);"
Assignee | ||
Updated•9 years ago
|
Attachment #8549224 -
Attachment is obsolete: true
Assignee | ||
Updated•9 years ago
|
Attachment #8549225 -
Attachment is obsolete: true
Assignee | ||
Updated•9 years ago
|
Attachment #8549226 -
Attachment is obsolete: true
Assignee | ||
Comment 44•9 years ago
|
||
added the missing comma and changed the <script/> declaration
Attachment #8549227 -
Attachment is obsolete: true
Comment 45•9 years ago
|
||
Comment on attachment 8550515 [details] [diff] [review] 03-Bug_1121571___Add_Profiler_js_capabilities_.diff Review of attachment 8550515 [details] [diff] [review]: ----------------------------------------------------------------- ::: talos/page_load_test/tart/addon/content/Profiler.js @@ +52,5 @@ > + > + for (var i in pairs) { > + if (pairs[i] !== "") { > + var pair = pairs[i].split("="); > + result[decodeURIComponent(pair[0])] = decodeURIComponent(pair[1]); here you assume that pair has a length of 2, most likely it will, either fix it or add a comment
Attachment #8550515 -
Flags: review?(jmaher) → review+
Assignee | ||
Comment 46•9 years ago
|
||
This patch does two things: 1. It makes us not wait for the "load" event. This was just an unneeded delay, as far as I can tell. 2. It moves the setTimeout(..., 0) into the finish() function so that we have a place in the finish() function that is hit as soon after "sessionstore-windows-restored" as possible. It shouldn't change any numbers. David, do you agree?
Attachment #8550521 -
Flags: review?(dteller)
Assignee | ||
Comment 47•9 years ago
|
||
Just the profiler changes from the old part 8, this time.
Attachment #8550522 -
Flags: review?(jmaher)
Assignee | ||
Comment 48•9 years ago
|
||
I had to increase the profile size for v8 and I also increased the resolution for tresize a little.
Attachment #8549229 -
Attachment is obsolete: true
Attachment #8549230 -
Attachment is obsolete: true
Assignee | ||
Comment 49•9 years ago
|
||
Print just the "activating profiling" message.
Assignee | ||
Comment 50•9 years ago
|
||
Just five minutes this time. The longer timeouts I was seeing are likely caused by deadlocks in the profiler that hang the browser completely. I still need to find out how to fix those.
Attachment #8549231 -
Attachment is obsolete: true
Attachment #8549232 -
Attachment is obsolete: true
Attachment #8550527 -
Flags: review?(jmaher)
Assignee | ||
Comment 52•9 years ago
|
||
I've moved the imported files to a "profiler" subdirectory. I've also simplified the imported files a little, and removed functions we don't use. I've kept the indentation as-is. I was seeing out-of-memory errors during profile symbolication (specifically in the "profile = json.load(profile_file)" line), so I'm now catching the MemoryError exception.
Attachment #8549234 -
Attachment is obsolete: true
Attachment #8550529 -
Flags: review?(jmaher)
Assignee | ||
Comment 53•9 years ago
|
||
Attachment #8549235 -
Attachment is obsolete: true
Assignee | ||
Comment 54•9 years ago
|
||
Attachment #8550531 -
Attachment is obsolete: true
Assignee | ||
Comment 55•9 years ago
|
||
Attachment #8549236 -
Attachment is obsolete: true
Assignee | ||
Comment 56•9 years ago
|
||
This error occurred in some of the Windows runs: "The process cannot access the file because it is being used by another process: 'c:\\docume~1\\cltbld~1.t-x\\locals~1\\temp\\tmpv0wi28\\page_0.sps'" It happened in the "d" and "s" tests of this try push: https://treeherder.mozilla.org/#/jobs?repo=try&revision=a819e4547e25 I don't know what's happening here. The browser is the only process that knows of that file, and it should absolutely be closed by now. Maybe you know what's going on?
Attachment #8550535 -
Flags: review?(jmaher)
Assignee | ||
Comment 57•9 years ago
|
||
There were a few tp timeouts, and I still haven't figured out their cause. In any case, dumping profiles on timeout will make this easier to track down. Here's an example log (search for "__FAILTimeout in tp5n__FAIL"): http://ftp.mozilla.org/pub/mozilla.org/firefox/try-builds/mstange@themasta.com-44c6445d8121/try-macosx64/try_mountainlion_test-tp5o-bm108-tests1-macosx-build0.txt.gz
Attachment #8550536 -
Flags: review?(jmaher)
Assignee | ||
Comment 58•9 years ago
|
||
This might have unintended side effects. Any ideas?
Attachment #8550537 -
Flags: review?(jmaher)
Assignee | ||
Comment 59•9 years ago
|
||
New try push: https://treeherder.mozilla.org/#/jobs?repo=try&revision=e7b17651136c
Assignee | ||
Comment 60•9 years ago
|
||
Comment on attachment 8550535 [details] [diff] [review] 17-Bug_1121571___Retry_profile_rmtree_for_five_minutes_.diff This didn't fix the issue: https://treeherder.mozilla.org/#/jobs?repo=try&revision=e7b17651136c
Attachment #8550535 -
Flags: review?(jmaher)
Comment 61•9 years ago
|
||
Comment on attachment 8550537 [details] [diff] [review] 19-Bug_1121571___Continue_even_if_test_results_add_throws_an_error_.diff Review of attachment 8550537 [details] [diff] [review]: ----------------------------------------------------------------- a few random chances of failure in the future, but if we get a failed case to start with the job will be red/orange anyway, so forcing us to collect more information is a win.
Attachment #8550537 -
Flags: review?(jmaher) → review+
Comment 62•9 years ago
|
||
Comment on attachment 8550536 [details] [diff] [review] 18-Bug_1121571___Also_dump_the_profile_from_pageloader_if_it_times_out_.diff Review of attachment 8550536 [details] [diff] [review]: ----------------------------------------------------------------- good find.
Attachment #8550536 -
Flags: review?(jmaher) → review+
Comment 63•9 years ago
|
||
Comment on attachment 8550535 [details] [diff] [review] 17-Bug_1121571___Retry_profile_rmtree_for_five_minutes_.diff Review of attachment 8550535 [details] [diff] [review]: ----------------------------------------------------------------- I would like to see this done in general and address my question. ::: talos/ttest.py @@ +622,5 @@ > + # For some reason, on Windows, big profiles are sometimes locked > + # by another process even after all Firefox processes have been > + # terminated. Allow up to 5 minutes for the file lock to be > + # released. > + utils.rmtree_until_timeout(sps_profile_dir, 5 * 60) fwiw, we see this on windows in general, could you apply this to profile cleanup overall? ::: talos/utils.py @@ +279,5 @@ > + return > + except Exception as e: > + if elapsed >= timeout: > + raise e > + time.sleep(1) this assumes that shutl.rmtree will hit an exception in a few milliseconds, isn't it possible that we could hit the exception after a few seconds due to a locked file or some type of timeout?
Attachment #8550535 -
Flags: review-
Comment 64•9 years ago
|
||
Comment on attachment 8550529 [details] [diff] [review] 14-Bug_1121571___Do_symbolication_right_on_the_talos_machine_.diff Review of attachment 8550529 [details] [diff] [review]: ----------------------------------------------------------------- I would like some indication of the root of these files and a mpl at the top. Ideally they would live in a profiling/, sps/, or the existing scripts/ subdir instead of the root dir. If they were in their own subdir, it would be easy to create a readme.txt which could point to the github root. I am fine with the files, not with the locations- a few nits below. ::: talos/profiler/symFileManager.py @@ +201,5 @@ > + #print "Can't fit " + pdbName + "/" + pdbId > + break > + fetchedSymbols[(pdbName, pdbId)] = symbolInfo > + fetchedCount += symbolInfo.GetEntryCount() > + #print "fetchedCount = " + str(fetchedCount) + " after " + pdbName + "/" + pdbId we have a lot of print statements commented out, can we remove these? ::: talos/ttest.py @@ +590,5 @@ > + except MemoryError as e: > + utils.info("Ran out of memory while trying to symbolicate profile") > + except Exception as e: > + utils.info(e) > + utils.info("Encountered an exception during profile symbolication") could we be more explicit about which test we were not able to get sps profiles for in the error message?
Attachment #8550529 -
Flags: review?(jmaher) → review-
Comment 65•9 years ago
|
||
Comment on attachment 8550527 [details] [diff] [review] 12-Bug_1121571___Increase_the_timeout_when_profiling_.diff Review of attachment 8550527 [details] [diff] [review]: ----------------------------------------------------------------- thanks!
Attachment #8550527 -
Flags: review?(jmaher) → review+
Comment 66•9 years ago
|
||
Comment on attachment 8550522 [details] [diff] [review] 09-Bug_1121571___Profile_startup_using_profiler_env_vars_.diff Review of attachment 8550522 [details] [diff] [review]: ----------------------------------------------------------------- thanks!
Attachment #8550522 -
Flags: review?(jmaher) → review+
Assignee | ||
Comment 67•9 years ago
|
||
(In reply to Joel Maher (:jmaher) from comment #63) > Comment on attachment 8550535 [details] [diff] [review] > 17-Bug_1121571___Retry_profile_rmtree_for_five_minutes_.diff > ::: talos/ttest.py > @@ +622,5 @@ > > + # For some reason, on Windows, big profiles are sometimes locked > > + # by another process even after all Firefox processes have been > > + # terminated. Allow up to 5 minutes for the file lock to be > > + # released. > > + utils.rmtree_until_timeout(sps_profile_dir, 5 * 60) > > fwiw, we see this on windows in general, could you apply this to profile > cleanup overall? Sure. Not sure it's going to work, though. > ::: talos/utils.py > @@ +279,5 @@ > > + return > > + except Exception as e: > > + if elapsed >= timeout: > > + raise e > > + time.sleep(1) > > this assumes that shutl.rmtree will hit an exception in a few milliseconds, time.sleep accepts seconds, not milliseconds. I'll add a comment about that. Does that clear up the confusion, or is there something else that's unclear or that I'm missing? The loop doesn't really care how long rmtree takes until it throws an exception. It waits for it to either succeed or throw, no matter how long it takes to do that. And if it throws, we discard the exception, until we've seen |timeout| many exceptions. And we wait for one second between tries. > isn't it possible that we could hit the exception after a few seconds due to > a locked file or some type of timeout? Apparently. But in my latest push, a timeout of 5 minutes didn't help. I'll try increasing the timeout, maybe I just haven't waited long enough.
Assignee | ||
Comment 68•9 years ago
|
||
(In reply to Joel Maher (:jmaher) from comment #64) > Comment on attachment 8550529 [details] [diff] [review] > 14-Bug_1121571___Do_symbolication_right_on_the_talos_machine_.diff > > Review of attachment 8550529 [details] [diff] [review]: > ----------------------------------------------------------------- > > I would like some indication of the root of these files and a mpl at the > top. Good point. > Ideally they would live in a profiling/, sps/, or the existing > scripts/ subdir instead of the root dir. They're in the "profiler" dir in this patch, not in the root dir. > If they were in their own subdir, > it would be easy to create a readme.txt which could point to the github root. I'll add one. > ::: talos/profiler/symFileManager.py > @@ +201,5 @@ > > + #print "Can't fit " + pdbName + "/" + pdbId > > + break > > + fetchedSymbols[(pdbName, pdbId)] = symbolInfo > > + fetchedCount += symbolInfo.GetEntryCount() > > + #print "fetchedCount = " + str(fetchedCount) + " after " + pdbName + "/" + pdbId > > we have a lot of print statements commented out, can we remove these? Sure. > ::: talos/ttest.py > @@ +590,5 @@ > > + except MemoryError as e: > > + utils.info("Ran out of memory while trying to symbolicate profile") > > + except Exception as e: > > + utils.info(e) > > + utils.info("Encountered an exception during profile symbolication") > > could we be more explicit about which test we were not able to get sps > profiles for in the error message? Good idea.
Comment on attachment 8550521 [details] [diff] [review] 08-Bug_1121571___Call_finish___immediately_when_sessionstore_windows_restored_fires_.diff Review of attachment 8550521 [details] [diff] [review]: ----------------------------------------------------------------- I don't see any reason why this wouldn't work. So r=me.
Attachment #8550521 -
Flags: review?(dteller) → review+
Assignee | ||
Comment 70•9 years ago
|
||
Attachment #8550515 -
Attachment is obsolete: true
Assignee | ||
Comment 71•9 years ago
|
||
Attachment #8550529 -
Attachment is obsolete: true
Attachment #8551437 -
Flags: review?(jmaher)
Assignee | ||
Comment 72•9 years ago
|
||
Attachment #8550532 -
Attachment is obsolete: true
Assignee | ||
Comment 73•9 years ago
|
||
Attachment #8550535 -
Attachment is obsolete: true
Attachment #8551439 -
Flags: review?(jmaher)
Assignee | ||
Comment 74•9 years ago
|
||
New push: https://treeherder.mozilla.org/#/jobs?repo=try&revision=a40998427912 This one has an updated mozilla-central base. I hope it won't hit bug 1122947 during the Talos runs.
Updated•9 years ago
|
Attachment #8551437 -
Flags: review?(jmaher) → review+
Comment 75•9 years ago
|
||
Comment on attachment 8551439 [details] [diff] [review] 17-Bug_1121571___Retry_rmtree_for_twenty_minutes__in_case_the_file_is_still_locked_by_another_process_.diff Review of attachment 8551439 [details] [diff] [review]: ----------------------------------------------------------------- this looks good. My reservations about the times were misdirected.
Attachment #8551439 -
Flags: review?(jmaher) → review+
Assignee | ||
Comment 76•9 years ago
|
||
I'm going to land all of these patches now. The remaining issues can be fixed in follow-up bugs. It's already strictly an improvement over the old state.
Assignee | ||
Comment 77•9 years ago
|
||
Pushed: https://hg.mozilla.org/build/talos/pushloghtml?changeset=e7e1a81cba8d
Assignee | ||
Updated•9 years ago
|
Status: ASSIGNED → RESOLVED
Closed: 9 years ago
Resolution: --- → FIXED
You need to log in
before you can comment on or make changes to this bug.
Description
•