Closed Bug 1121571 Opened 9 years ago Closed 9 years ago

Improve Talos profiling

Tracking

(Not tracked)

Status:

RESOLVED FIXED

People

(Reporter: mstange, Assigned: mstange)

References

Details

Attachments

(19 files, 19 obsolete files)

01-Bug_1121571___Explicitly_call_out_shutdown_timeout_and_print_a_message_for_shutdown_termination_.diff 9 years ago Markus Stange [:mstange] (deleted), patch	jmaher : review+	Details \| Diff \| Splinter Review
02-Bug_1121571___Create_a_directory_for_profiles__don_t_use_startup_profiling_and_change_the_profile_zip_layout_.diff 9 years ago Markus Stange [:mstange] (deleted), patch	jmaher : review+	Details \| Diff \| Splinter Review
03-Bug_1121571___Add_Profiler_js_capabilities_.diff 9 years ago Markus Stange [:mstange] (deleted), patch	jmaher : review-	Details \| Diff \| Splinter Review
04-Bug_1121571___Pass_profiling_information_to_pageloader_and_make_pageloader_dump_profiles_.diff 9 years ago Markus Stange [:mstange] (deleted), patch	jmaher : review+	Details \| Diff \| Splinter Review
05-Bug_1121571___Pass_profiling_info_to_non_pageloader_tests_via_URL_params_.diff 9 years ago Markus Stange [:mstange] (deleted), patch	jmaher : review+	Details \| Diff \| Splinter Review
06-Bug_1121571___Fix_tp5o_scroll_profiling_.diff 9 years ago Markus Stange [:mstange] (deleted), patch	jmaher : review+	Details \| Diff \| Splinter Review
07-Bug_1121571___Fix_tresize_profiling_.diff 9 years ago Markus Stange [:mstange] (deleted), patch	jmaher : review+	Details \| Diff \| Splinter Review
08-Bug_1121571___Profile_startup_using_profiler_env_vars_.diff 9 years ago Markus Stange [:mstange] (deleted), patch	jmaher : review-	Details \| Diff \| Splinter Review
09-Bug_1121571___Tweak_profiler_intervals___buffer_sizes_.diff 9 years ago Markus Stange [:mstange] (deleted), patch	jmaher : review+	Details \| Diff \| Splinter Review
10-Bug_1121571___Print_some_debug_messages_.diff 9 years ago Markus Stange [:mstange] (deleted), patch	jmaher : review+	Details \| Diff \| Splinter Review
11-Bug_1121571___Increase_the_timeout_when_profiling_.diff 9 years ago Markus Stange [:mstange] (deleted), patch	jmaher : review-	Details \| Diff \| Splinter Review
12-Bug_1121571___Add_more_granular_profiler_pausing_for_many_tests_.diff 9 years ago Markus Stange [:mstange] (deleted), patch	jmaher : review+	Details \| Diff \| Splinter Review
13-Bug_1121571___Do_symbolication_right_on_the_talos_machine_.diff 9 years ago Markus Stange [:mstange] (deleted), patch	jmaher : review-	Details \| Diff \| Splinter Review
14-Bug_1121571___Read_symbols_from_system_libraries_on_OS_X_.diff 9 years ago Markus Stange [:mstange] (deleted), patch	jmaher : review+	Details \| Diff \| Splinter Review
15-Bug_1121571___Don_t_resume_profiling_at_the_end_of_TART_.diff 9 years ago Markus Stange [:mstange] (deleted), patch	jmaher : review+	Details \| Diff \| Splinter Review
02-Bug_1121571___Create_a_directory_for_profiles__don_t_use_startup_profiling_and_change_the_profile_zip_layout__r_jmaher.diff 9 years ago Markus Stange [:mstange] (deleted), patch		Details \| Diff \| Splinter Review
03-Bug_1121571___Add_Profiler_js_capabilities_.diff 9 years ago Markus Stange [:mstange] (deleted), patch	jmaher : review+	Details \| Diff \| Splinter Review
04-Bug_1121571___Pass_profiling_information_to_pageloader_and_make_pageloader_dump_profiles__r_jmaher.diff 9 years ago Markus Stange [:mstange] (deleted), patch		Details \| Diff \| Splinter Review
05-Bug_1121571___Pass_profiling_info_to_non_pageloader_tests_via_URL_params__r_jmaher.diff 9 years ago Markus Stange [:mstange] (deleted), patch		Details \| Diff \| Splinter Review
06-Bug_1121571___Fix_tp5o_scroll_profiling__r_jmaher.diff 9 years ago Markus Stange [:mstange] (deleted), patch		Details \| Diff \| Splinter Review
07-Bug_1121571___Fix_tresize_profiling__r_jmaher.diff 9 years ago Markus Stange [:mstange] (deleted), patch		Details \| Diff \| Splinter Review
08-Bug_1121571___Call_finish___immediately_when_sessionstore_windows_restored_fires_.diff 9 years ago Markus Stange [:mstange] (deleted), patch	Yoric : review+	Details \| Diff \| Splinter Review
09-Bug_1121571___Profile_startup_using_profiler_env_vars_.diff 9 years ago Markus Stange [:mstange] (deleted), patch	jmaher : review+	Details \| Diff \| Splinter Review
10-Bug_1121571___Tweak_profiler_intervals___buffer_sizes__r_jmaher.diff 9 years ago Markus Stange [:mstange] (deleted), patch		Details \| Diff \| Splinter Review
11-Bug_1121571___Print_a_debug_message__r_jmaher.diff 9 years ago Markus Stange [:mstange] (deleted), patch		Details \| Diff \| Splinter Review
12-Bug_1121571___Increase_the_timeout_when_profiling_.diff 9 years ago Markus Stange [:mstange] (deleted), patch	jmaher : review+	Details \| Diff \| Splinter Review
13-Bug_1121571___Add_more_granular_profiler_pausing_for_many_tests__r_jmaher.diff 9 years ago Markus Stange [:mstange] (deleted), patch		Details \| Diff \| Splinter Review
14-Bug_1121571___Do_symbolication_right_on_the_talos_machine_.diff 9 years ago Markus Stange [:mstange] (deleted), patch	jmaher : review-	Details \| Diff \| Splinter Review
15-Bug_1121571___Read_symbols_from_system_libraries_on_OS_X_.diff 9 years ago Markus Stange [:mstange] (deleted), patch		Details \| Diff \| Splinter Review
15-Bug_1121571___Read_symbols_from_system_libraries_on_OS_X__r_jmaher.diff 9 years ago Markus Stange [:mstange] (deleted), patch		Details \| Diff \| Splinter Review
16-Bug_1121571___Don_t_resume_profiling_at_the_end_of_TART__r_jmaher.diff 9 years ago Markus Stange [:mstange] (deleted), patch		Details \| Diff \| Splinter Review
17-Bug_1121571___Retry_profile_rmtree_for_five_minutes_.diff 9 years ago Markus Stange [:mstange] (deleted), patch	jmaher : review-	Details \| Diff \| Splinter Review
18-Bug_1121571___Also_dump_the_profile_from_pageloader_if_it_times_out_.diff 9 years ago Markus Stange [:mstange] (deleted), patch	jmaher : review+	Details \| Diff \| Splinter Review
19-Bug_1121571___Continue_even_if_test_results_add_throws_an_error_.diff 9 years ago Markus Stange [:mstange] (deleted), patch	jmaher : review+	Details \| Diff \| Splinter Review
03-Bug_1121571___Add_Profiler_js_capabilities_.diff 9 years ago Markus Stange [:mstange] (deleted), patch		Details \| Diff \| Splinter Review
14-Bug_1121571___Do_symbolication_right_on_the_talos_machine_.diff 9 years ago Markus Stange [:mstange] (deleted), patch	jmaher : review+	Details \| Diff \| Splinter Review
15-Bug_1121571___Read_symbols_from_system_libraries_on_OS_X__r_jmaher.diff 9 years ago Markus Stange [:mstange] (deleted), patch		Details \| Diff \| Splinter Review
17-Bug_1121571___Retry_rmtree_for_twenty_minutes__in_case_the_file_is_still_locked_by_another_process_.diff 9 years ago Markus Stange [:mstange] (deleted), patch	jmaher : review+	Details \| Diff \| Splinter Review

Markus Stange [:mstange]

Assignee

Description

•

9 years ago

      No description provided.

Markus Stange [:mstange]

Assignee

Updated

•

9 years ago

Depends on: 1121576

Markus Stange [:mstange]

Assignee

Updated

•

9 years ago

Depends on: 1121591

Markus Stange [:mstange]

Assignee

Comment 1

•

9 years ago

The problems with the current talos profiling setup are:
 - We dump exactly one profile per browser run, which contains all tested pages. It would be much better to have a separate profile per tested page.
 - Profiles don't have high accuracy. Since each profile spans such a large time frame (the whole browser run), the sampling interval is often set to 10ms so that the resulting profile file still has a manageable size. A lower sampling interval would result in much more useful profiles.
 - Many resulting profile files are truncated because we dump the profile during shutdown and kill the browser if shutdown takes more than 5 seconds.
 - Profiles aren't symbolicated, they contain raw addresses.
 - Many tests don't pause/resume the profiler properly, so the resulting profile contains stuff that Talos didn't measure.
 - The Compositor thread isn't profiled.

Markus Stange [:mstange]

Assignee

Comment 2

•

9 years ago

Attached patch 01-Bug_1121571___Explicitly_call_out_shutdown_timeout_and_print_a_message_for_shutdown_termination_.diff (deleted) — Details — Splinter Review

Attachment #8549221 - Flags: review?(jmaher)

Markus Stange [:mstange]

Assignee

Comment 3

•

9 years ago

Attached patch 02-Bug_1121571___Create_a_directory_for_profiles__don_t_use_startup_profiling_and_change_the_profile_zip_layout_.diff (obsolete) (deleted) — Details — Splinter Review

Attachment #8549222 - Flags: review?(jmaher)

Markus Stange [:mstange]

Assignee

Comment 4

•

9 years ago

Attached patch 03-Bug_1121571___Add_Profiler_js_capabilities_.diff (obsolete) (deleted) — Details — Splinter Review

Attachment #8549223 - Flags: review?(jmaher)

Markus Stange [:mstange]

Assignee

Comment 5

•

9 years ago

Attached patch 04-Bug_1121571___Pass_profiling_information_to_pageloader_and_make_pageloader_dump_profiles_.diff (obsolete) (deleted) — Details — Splinter Review

Attachment #8549224 - Flags: review?(jmaher)

Markus Stange [:mstange]

Assignee

Comment 6

•

9 years ago

Attached patch 05-Bug_1121571___Pass_profiling_info_to_non_pageloader_tests_via_URL_params_.diff (obsolete) (deleted) — Details — Splinter Review

Attachment #8549225 - Flags: review?(jmaher)

Markus Stange [:mstange]

Assignee

Comment 7

•

9 years ago

Attached patch 06-Bug_1121571___Fix_tp5o_scroll_profiling_.diff (obsolete) (deleted) — Details — Splinter Review

Attachment #8549226 - Flags: review?(jmaher)

Markus Stange [:mstange]

Assignee

Comment 8

•

9 years ago

Attached patch 07-Bug_1121571___Fix_tresize_profiling_.diff (obsolete) (deleted) — Details — Splinter Review

Attachment #8549227 - Flags: review?(jmaher)

Markus Stange [:mstange]

Assignee

Comment 9

•

9 years ago

Attached patch 08-Bug_1121571___Profile_startup_using_profiler_env_vars_.diff (obsolete) (deleted) — Details — Splinter Review

Attachment #8549229 - Flags: review?(jmaher)

Markus Stange [:mstange]

Assignee

Comment 10

•

9 years ago

Attached patch 09-Bug_1121571___Tweak_profiler_intervals___buffer_sizes_.diff (obsolete) (deleted) — Details — Splinter Review

Attachment #8549230 - Flags: review?(jmaher)

Markus Stange [:mstange]

Assignee

Comment 11

•

9 years ago

Attached patch 10-Bug_1121571___Print_some_debug_messages_.diff (obsolete) (deleted) — Details — Splinter Review

Attachment #8549231 - Flags: review?(jmaher)

Markus Stange [:mstange]

Assignee

Comment 12

•

9 years ago

Attached patch 11-Bug_1121571___Increase_the_timeout_when_profiling_.diff (obsolete) (deleted) — Details — Splinter Review

Attachment #8549232 - Flags: review?(jmaher)

Markus Stange [:mstange]

Assignee

Comment 13

•

9 years ago

Attached patch 12-Bug_1121571___Add_more_granular_profiler_pausing_for_many_tests_.diff (obsolete) (deleted) — Details — Splinter Review

Attachment #8549233 - Flags: review?(jmaher)

Markus Stange [:mstange]

Assignee

Comment 14

•

9 years ago

Attached patch 13-Bug_1121571___Do_symbolication_right_on_the_talos_machine_.diff (obsolete) (deleted) — Details — Splinter Review

Attachment #8549234 - Flags: review?(jmaher)

Markus Stange [:mstange]

Assignee

Comment 15

•

9 years ago

Attached patch 14-Bug_1121571___Read_symbols_from_system_libraries_on_OS_X_.diff (obsolete) (deleted) — Details — Splinter Review

Attachment #8549235 - Flags: review?(jmaher)

Markus Stange [:mstange]

Assignee

Comment 16

•

9 years ago

Attached patch 15-Bug_1121571___Don_t_resume_profiling_at_the_end_of_TART_.diff (obsolete) (deleted) — Details — Splinter Review

Attachment #8549236 - Flags: review?(jmaher)

Markus Stange [:mstange]

Assignee

Comment 17

•

9 years ago

I've fired off a try push here: https://treeherder.mozilla.org/#/jobs?repo=try&revision=8e8db0eec687

Previous pushes with older patch sets:
https://treeherder.mozilla.org/ui/#/jobs?repo=try&revision=b54a23073798
https://treeherder.mozilla.org/ui/#/jobs?repo=try&revision=020ae04cbd73
(Windows 8 red is because unwinding on Windows 64 bit is broken at the moment)

Markus Stange [:mstange]

Assignee

Comment 18

•

9 years ago

Let's try that again: https://treeherder.mozilla.org/#/jobs?repo=try&revision=af8dbeccc933

Markus Stange [:mstange]

Assignee

Updated

•

9 years ago

Depends on: 1109862

Markus Stange [:mstange]

Assignee

Comment 19

•

9 years ago

(In reply to Markus Stange [:mstange] from comment #17)
> (Windows 8 red is because unwinding on Windows 64 bit is broken at the
> moment)

... and now it has been fixed by bug 1120126! W00t!

Depends on: 1120126

Joel Maher ( :jmaher ) (UTC -8)

Comment 20

•

9 years ago

Comment on attachment 8549221 [details] [diff] [review]
01-Bug_1121571___Explicitly_call_out_shutdown_timeout_and_print_a_message_for_shutdown_termination_.diff

Review of attachment 8549221 [details] [diff] [review]:
-----------------------------------------------------------------

thanks!

Attachment #8549221 - Flags: review?(jmaher) → review+

Joel Maher ( :jmaher ) (UTC -8)

Comment 21

•

9 years ago

Comment on attachment 8549222 [details] [diff] [review]
02-Bug_1121571___Create_a_directory_for_profiles__don_t_use_startup_profiling_and_change_the_profile_zip_layout_.diff

Review of attachment 8549222 [details] [diff] [review]:
-----------------------------------------------------------------

it appears we are not turning on the profiler, or we don't have a need for these variables.

::: talos/ttest.py
@@ +329,5 @@
> +                sps_profile_dir = tempfile.mkdtemp();
> +
> +                sps_profile_interval = test_config.get('sps_profile_interval', 1)
> +                sps_profile_entries = test_config.get('sps_profile_entries', 1000000)
> +                sps_profile_threads = 'GeckoMain,Compositor'

do we ever use these variables?  Before we applied them to environment variables which are removed with this patch and now not used.

Attachment #8549222 - Flags: review?(jmaher) → review-

Joel Maher ( :jmaher ) (UTC -8)

Comment 22

•

9 years ago

Comment on attachment 8549223 [details] [diff] [review]
03-Bug_1121571___Add_Profiler_js_capabilities_.diff

Review of attachment 8549223 [details] [diff] [review]:
-----------------------------------------------------------------

I would like to ensure we safeguard these values.

::: talos/page_load_test/tart/addon/content/Profiler.js
@@ +74,5 @@
> +        return;
> +      }
> +      profiler_interval = obj.sps_profile_interval;
> +      profiler_entries = obj.sps_profile_entries;
> +      profiler_threadsArray = obj.sps_profile_threads.split(",");

what happens if these values don't exist?  also we are doing a split on sps_profile_threads, so that assumes it exists and it is a string.

Attachment #8549223 - Flags: review?(jmaher) → review-

Joel Maher ( :jmaher ) (UTC -8)

Comment 23

•

9 years ago

Comment on attachment 8549224 [details] [diff] [review]
04-Bug_1121571___Pass_profiling_information_to_pageloader_and_make_pageloader_dump_profiles_.diff

Review of attachment 8549224 [details] [diff] [review]:
-----------------------------------------------------------------

keep in mind pageloader.js was just cleaned up yesterday (2015-01-14) to run through e10s code paths all the time and remove regular code paths.  I am not sure if that will affect this patch for bitrot or functionality.

Attachment #8549224 - Flags: review?(jmaher) → review+

Joel Maher ( :jmaher ) (UTC -8)

Comment 24

•

9 years ago

Comment on attachment 8549222 [details] [diff] [review]
02-Bug_1121571___Create_a_directory_for_profiles__don_t_use_startup_profiling_and_change_the_profile_zip_layout_.diff

Review of attachment 8549222 [details] [diff] [review]:
-----------------------------------------------------------------

I see how you use these in a future patch.  I would have preferred that you put it all in the same patch, but as it stands, this is just fine.

Attachment #8549222 - Flags: review- → review+

Joel Maher ( :jmaher ) (UTC -8)

Comment 25

•

9 years ago

Comment on attachment 8549225 [details] [diff] [review]
05-Bug_1121571___Pass_profiling_info_to_non_pageloader_tests_via_URL_params_.diff

Review of attachment 8549225 [details] [diff] [review]:
-----------------------------------------------------------------

looks good!

Attachment #8549225 - Flags: review?(jmaher) → review+

Joel Maher ( :jmaher ) (UTC -8)

Comment 26

•

9 years ago

Comment on attachment 8549226 [details] [diff] [review]
06-Bug_1121571___Fix_tp5o_scroll_profiling_.diff

Review of attachment 8549226 [details] [diff] [review]:
-----------------------------------------------------------------

thanks.

::: talos/pageloader/chrome/pageloader.js
@@ +209,5 @@
>                         "  }" +
>                         "} " +
>                         "addEventListener('load', _contentLoadHandler, true); ";
>                       content.selectedBrowser.messageManager.loadFrameScript(contentScript, false, true);
> +                     content.selectedBrowser.messageManager.loadFrameScript("chrome://pageloader/content/Profiler.js", true, true);

our other scripts are loaded as "false, true", but this is loaded with "true, true".  Is there a reason for this?

Attachment #8549226 - Flags: review?(jmaher) → review+

Joel Maher ( :jmaher ) (UTC -8)

Comment 27

•

9 years ago

Comment on attachment 8549227 [details] [diff] [review]
07-Bug_1121571___Fix_tresize_profiling_.diff

Review of attachment 8549227 [details] [diff] [review]:
-----------------------------------------------------------------

a few nits below, r+ as they are minor.

::: talos/generate-tresize-xpi.html
@@ +15,5 @@
>      var files = [
>        "chrome.manifest",
>        "install.rdf",
>        "content/framescript.js",
> +      "content/Profiler.js"

you are missing a trailing , here

::: talos/startup_test/tresize/addon/content/tresize.overlay.xul
@@ +5,1 @@
>  <script type="application/x-javascript" src="tresize.js" />

this specifically calls out language="Javascript" and add a </script> tag, this is differethen than the other <script...> tags in this file, why?

Attachment #8549227 - Flags: review?(jmaher) → review+

Joel Maher ( :jmaher ) (UTC -8)

Comment 28

•

9 years ago

Comment on attachment 8549229 [details] [diff] [review]
08-Bug_1121571___Profile_startup_using_profiler_env_vars_.diff

Review of attachment 8549229 [details] [diff] [review]:
-----------------------------------------------------------------

r- as I would like to learn more about the url parameters vs environment variables for sps profiling knobs.

::: talos/startup_test/sessionrestore/main.js
@@ +15,2 @@
>  
> +    var duration = startup_info.sessionRestored - startup_info.sessionRestoreInit;

is there any concern of altering these numbers in general while profiling is not enabled?

::: talos/ttest.py
@@ +357,5 @@
> +                        'MOZ_PROFILER_STARTUP': '1',
> +                        'MOZ_PROFILER_INTERVAL': str(sps_profile_interval),
> +                        'MOZ_PROFILER_ENTRIES': str(sps_profile_entries),
> +                        "MOZ_PROFILER_THREADS": str(sps_profile_threads)
> +                    }

is there an advantage of doing this vs sending params into pageloader?  This is starting to get confusing.

Attachment #8549229 - Flags: review?(jmaher) → review-

Joel Maher ( :jmaher ) (UTC -8)

Updated

•

9 years ago

Attachment #8549230 - Flags: review?(jmaher) → review+

Joel Maher ( :jmaher ) (UTC -8)

Comment 29

•

9 years ago

Comment on attachment 8549231 [details] [diff] [review]
10-Bug_1121571___Print_some_debug_messages_.diff

Review of attachment 8549231 [details] [diff] [review]:
-----------------------------------------------------------------

::: talos/ttest.py
@@ +417,5 @@
>                                                                  profile_dir,
>                                                                  test_config['url'],
>                                                                  profiling_info=profiling_info)
>  
> +                utils.info("command_args: {0}".format(command_args))

I would like to see this be a bit more specific, maybe:
utils.info("browser command line args: ...")

Attachment #8549231 - Flags: review?(jmaher) → review+

Joel Maher ( :jmaher ) (UTC -8)

Comment 30

•

9 years ago

Comment on attachment 8549232 [details] [diff] [review]
11-Bug_1121571___Increase_the_timeout_when_profiling_.diff

Review of attachment 8549232 [details] [diff] [review]:
-----------------------------------------------------------------

::: talos/ttest.py
@@ +411,5 @@
>                  # Run the test
>                  timeout = test_config.get('timeout', 7200) # 2 hours default
> +                if sps_profile:
> +                    # When profiling, give the browser plenty of extra time to dump the profile.
> +                    timeout += 3600

an extra hour?  Why not an extra 5 minutes?

Attachment #8549232 - Flags: review?(jmaher) → review-

Joel Maher ( :jmaher ) (UTC -8)

Comment 31

•

9 years ago

Comment on attachment 8549233 [details] [diff] [review]
12-Bug_1121571___Add_more_granular_profiler_pausing_for_many_tests_.diff

Review of attachment 8549233 [details] [diff] [review]:
-----------------------------------------------------------------

::: talos/page_load_test/v8_7/base.js
@@ +212,5 @@
>  // average time it takes to run a single iteration.
>  BenchmarkSuite.prototype.RunSingleBenchmark = function(benchmark, data) {
>    function Measure(data) {
>      var elapsed = 0;
> +    Profiler.resume(benchmark.name);

are you sure this 'benchmark.name' will name sense?

Attachment #8549233 - Flags: review?(jmaher) → review+

Joel Maher ( :jmaher ) (UTC -8)

Comment 32

•

9 years ago

Comment on attachment 8549234 [details] [diff] [review]
13-Bug_1121571___Do_symbolication_right_on_the_talos_machine_.diff

Review of attachment 8549234 [details] [diff] [review]:
-----------------------------------------------------------------

is there anyway we can have sym*.py as a published python module, then we don't have to add the code.  At the very least this looks as if it has existed elsewhere for a while- we should reference where it comes from.

also for sps.py, please make it 4 space indentation.  Quite possibly sps.py comes from the same source as the other sym*.py files, maybe it can be rolled into a package as well?

::: talos/sps.py
@@ +1,4 @@
> +try:
> +    import simplejson as json
> +except ImportError:
> +    import json

now that we run on python 2.7, this isn't needed anymore

@@ +15,5 @@
> +        if startMeasurementMarker in marker["name"]:
> +          in_measurement = True
> +        if stopMeasurementMarker in marker["name"]:
> +          in_measurement = False
> +    if True or in_measurement:

this if statement seems odd, why not just "if in_measurement" ?

::: talos/symbolicationRequest.py
@@ +101,5 @@
> +          return
> +        self.forwardCount = rawRequests["forwarded"]
> +
> +      # Only used for compatibility with older clients.
> +      # TODO: Remove after June 2013.

I assume this code is copied from somewhere, this could have been removed 1.5 years ago!

::: talos/ttest.py
@@ +541,5 @@
> +                    symbolicator = symbolication.ProfileSymbolicator({
> +                        # Trace-level logging (verbose)
> +                        "enableTracing": 0,
> +                        # Fallback server if symbol is not found locally
> +                        "remoteSymbolServer": "http://symbolapi.mozilla.org:80/",

I don't know if we will have access to this from the talos machine

Attachment #8549234 - Flags: review?(jmaher) → review-

Joel Maher ( :jmaher ) (UTC -8)

Comment 33

•

9 years ago

Comment on attachment 8549235 [details] [diff] [review]
14-Bug_1121571___Read_symbols_from_system_libraries_on_OS_X_.diff

Review of attachment 8549235 [details] [diff] [review]:
-----------------------------------------------------------------

the ttest.py changes are file, I see what you are doing in symbolication.py, although my previous review to have it in a separate package would be preferred.

Attachment #8549235 - Flags: review?(jmaher) → review+

Joel Maher ( :jmaher ) (UTC -8)

Comment 34

•

9 years ago

Comment on attachment 8549236 [details] [diff] [review]
15-Bug_1121571___Don_t_resume_profiling_at_the_end_of_TART_.diff

Review of attachment 8549236 [details] [diff] [review]:
-----------------------------------------------------------------

thanks!

Attachment #8549236 - Flags: review?(jmaher) → review+

Markus Stange [:mstange]

Assignee

Comment 35

•

9 years ago

(In reply to Joel Maher (:jmaher) from comment #22)
> Comment on attachment 8549223 [details] [diff] [review]
> 03-Bug_1121571___Add_Profiler_js_capabilities_.diff
> 
> Review of attachment 8549223 [details] [diff] [review]:
> -----------------------------------------------------------------
> 
> I would like to ensure we safeguard these values.
> 
> ::: talos/page_load_test/tart/addon/content/Profiler.js
> @@ +74,5 @@
> > +        return;
> > +      }
> > +      profiler_interval = obj.sps_profile_interval;
> > +      profiler_entries = obj.sps_profile_entries;
> > +      profiler_threadsArray = obj.sps_profile_threads.split(",");
> 
> what happens if these values don't exist?  also we are doing a split on
> sps_profile_threads, so that assumes it exists and it is a string.

Good point, I'll add some validation.

(In reply to Joel Maher (:jmaher) from comment #23)
> Comment on attachment 8549224 [details] [diff] [review]
> 04-
> Bug_1121571___Pass_profiling_information_to_pageloader_and_make_pageloader_du
> mp_profiles_.diff
> 
> Review of attachment 8549224 [details] [diff] [review]:
> -----------------------------------------------------------------
> 
> keep in mind pageloader.js was just cleaned up yesterday (2015-01-14) to run
> through e10s code paths all the time and remove regular code paths.  I am
> not sure if that will affect this patch for bitrot or functionality.

The patch is already based on that updated version, and everything's working correctly. :-)

(In reply to Joel Maher (:jmaher) from comment #24)
> Comment on attachment 8549222 [details] [diff] [review]
> 02-
> Bug_1121571___Create_a_directory_for_profiles__don_t_use_startup_profiling_an
> d_change_the_profile_zip_layout_.diff
> 
> Review of attachment 8549222 [details] [diff] [review]:
> -----------------------------------------------------------------
> 
> I see how you use these in a future patch.  I would have preferred that you
> put it all in the same patch, but as it stands, this is just fine.

Sorry, yeah, should have reshuffled a little.

(In reply to Joel Maher (:jmaher) from comment #26)
> Comment on attachment 8549226 [details] [diff] [review]
> 06-Bug_1121571___Fix_tp5o_scroll_profiling_.diff
> 
> Review of attachment 8549226 [details] [diff] [review]:
> -----------------------------------------------------------------
> 
> thanks.
> 
> ::: talos/pageloader/chrome/pageloader.js
> @@ +209,5 @@
> >                         "  }" +
> >                         "} " +
> >                         "addEventListener('load', _contentLoadHandler, true); ";
> >                       content.selectedBrowser.messageManager.loadFrameScript(contentScript, false, true);
> > +                     content.selectedBrowser.messageManager.loadFrameScript("chrome://pageloader/content/Profiler.js", true, true);
> 
> our other scripts are loaded as "false, true", but this is loaded with
> "true, true".  Is there a reason for this?

Hmm, I must have misunderstood the meaning of that argument when I wrote this part. It should work just fine with false. I'll change it. (The docs say: "If true, this flag means that the frame script will be loaded into any new frames opened after the loadFrameScript() call, until removeDelayedFrameScript() is called for that script.")

(In reply to Joel Maher (:jmaher) from comment #27)
> Comment on attachment 8549227 [details] [diff] [review]
> 07-Bug_1121571___Fix_tresize_profiling_.diff
> 
> Review of attachment 8549227 [details] [diff] [review]:
> -----------------------------------------------------------------
> 
> a few nits below, r+ as they are minor.
> 
> ::: talos/generate-tresize-xpi.html
> @@ +15,5 @@
> >      var files = [
> >        "chrome.manifest",
> >        "install.rdf",
> >        "content/framescript.js",
> > +      "content/Profiler.js"
> 
> you are missing a trailing , here

will fix

> 
> ::: talos/startup_test/tresize/addon/content/tresize.overlay.xul
> @@ +5,1 @@
> >  <script type="application/x-javascript" src="tresize.js" />
> 
> this specifically calls out language="Javascript" and add a </script> tag,
> this is differethen than the other <script...> tags in this file, why?

I must have copied it from somewhere else. But you're right, since this is a xul file, we can use a self-closing tag (<script/>) here. I'll update it to match the other one.

(In reply to Joel Maher (:jmaher) from comment #28)
> Comment on attachment 8549229 [details] [diff] [review]
> 08-Bug_1121571___Profile_startup_using_profiler_env_vars_.diff
> 
> Review of attachment 8549229 [details] [diff] [review]:
> -----------------------------------------------------------------
> 
> r- as I would like to learn more about the url parameters vs environment
> variables for sps profiling knobs.
> 
> ::: talos/startup_test/sessionrestore/main.js
> @@ +15,2 @@
> >  
> > +    var duration = startup_info.sessionRestored - startup_info.sessionRestoreInit;
> 
> is there any concern of altering these numbers in general while profiling is
> not enabled?

Actually, let me split up this patch into one that only shuffles stuff around and one that adds the profiling infrastructure. I think my changes won't change any numbers but I'd like Yoric to confirm.

> 
> ::: talos/ttest.py
> @@ +357,5 @@
> > +                        'MOZ_PROFILER_STARTUP': '1',
> > +                        'MOZ_PROFILER_INTERVAL': str(sps_profile_interval),
> > +                        'MOZ_PROFILER_ENTRIES': str(sps_profile_entries),
> > +                        "MOZ_PROFILER_THREADS": str(sps_profile_threads)
> > +                    }
> 
> is there an advantage of doing this vs sending params into pageloader?  This
> is starting to get confusing.

Yes, it really is confusing. I'll add a comment.
These environment variables are *not* read by talos JS code. Talos JS code doesn't get a chance to run until we're almost finished starting up the browser. Instead, they're read by Gecko code that runs extremely early during startup, and they will cause Gecko to start profiling as soon as it can, so that we'll get a profile that contains as much of startup as possible.

(In reply to Joel Maher (:jmaher) from comment #30)
> Comment on attachment 8549232 [details] [diff] [review]
> 11-Bug_1121571___Increase_the_timeout_when_profiling_.diff
> 
> Review of attachment 8549232 [details] [diff] [review]:
> -----------------------------------------------------------------
> 
> ::: talos/ttest.py
> @@ +411,5 @@
> >                  # Run the test
> >                  timeout = test_config.get('timeout', 7200) # 2 hours default
> > +                if sps_profile:
> > +                    # When profiling, give the browser plenty of extra time to dump the profile.
> > +                    timeout += 3600
> 
> an extra hour?  Why not an extra 5 minutes?

I'll have a look through the logs, but 5 minutes are almost certainly not enough. I'll come back to this patch when I have more evidence.

(In reply to Joel Maher (:jmaher) from comment #31)
> Comment on attachment 8549233 [details] [diff] [review]
> 12-Bug_1121571___Add_more_granular_profiler_pausing_for_many_tests_.diff
> 
> Review of attachment 8549233 [details] [diff] [review]:
> -----------------------------------------------------------------
> 
> ::: talos/page_load_test/v8_7/base.js
> @@ +212,5 @@
> >  // average time it takes to run a single iteration.
> >  BenchmarkSuite.prototype.RunSingleBenchmark = function(benchmark, data) {
> >    function Measure(data) {
> >      var elapsed = 0;
> > +    Profiler.resume(benchmark.name);
> 
> are you sure this 'benchmark.name' will name sense?

I'll add a sanity check, but this worked without throwing any errors, as far as I can tell. (Looks like Linux is the only platform that ran V8 in my try push, and the profile isn't big enough to contain the complete run...)

(In reply to Joel Maher (:jmaher) from comment #32)
> Comment on attachment 8549234 [details] [diff] [review]
> 13-Bug_1121571___Do_symbolication_right_on_the_talos_machine_.diff
> 
> Review of attachment 8549234 [details] [diff] [review]:
> -----------------------------------------------------------------
> 
> is there anyway we can have sym*.py as a published python module, then we
> don't have to add the code.  At the very least this looks as if it has
> existed elsewhere for a while- we should reference where it comes from.
> 
> also for sps.py, please make it 4 space indentation.  Quite possibly sps.py
> comes from the same source as the other sym*.py files, maybe it can be
> rolled into a package as well?
> 
> ::: talos/sps.py
> @@ +1,4 @@
> > +try:
> > +    import simplejson as json
> > +except ImportError:
> > +    import json
> 
> now that we run on python 2.7, this isn't needed anymore

What should I replace it with? Just "import json"? Because "import simplejson" didn't work.

> @@ +15,5 @@
> > +        if startMeasurementMarker in marker["name"]:
> > +          in_measurement = True
> > +        if stopMeasurementMarker in marker["name"]:
> > +          in_measurement = False
> > +    if True or in_measurement:
> 
> this if statement seems odd, why not just "if in_measurement" ?

Oops, this is a leftover from an earlier experiment apparently.

> ::: talos/symbolicationRequest.py
> @@ +101,5 @@
> > +          return
> > +        self.forwardCount = rawRequests["forwarded"]
> > +
> > +      # Only used for compatibility with older clients.
> > +      # TODO: Remove after June 2013.
> 
> I assume this code is copied from somewhere, this could have been removed
> 1.5 years ago!

Indeed! I'll update the original code and import the updated version.

> ::: talos/ttest.py
> @@ +541,5 @@
> > +                    symbolicator = symbolication.ProfileSymbolicator({
> > +                        # Trace-level logging (verbose)
> > +                        "enableTracing": 0,
> > +                        # Fallback server if symbol is not found locally
> > +                        "remoteSymbolServer": "http://symbolapi.mozilla.org:80/",
> 
> I don't know if we will have access to this from the talos machine

It worked in my test runs. I was surprised, too! Should I do anything to make sure that it continues to accessable?

(In reply to Joel Maher (:jmaher) from comment #33)
> Comment on attachment 8549235 [details] [diff] [review]
> 14-Bug_1121571___Read_symbols_from_system_libraries_on_OS_X_.diff
> 
> Review of attachment 8549235 [details] [diff] [review]:
> -----------------------------------------------------------------
> 
> the ttest.py changes are file, I see what you are doing in symbolication.py,
> although my previous review to have it in a separate package would be
> preferred.

I'll move the symbolication files into a profiler/ subdirectory, as discussed on IRC.

Joel Maher ( :jmaher ) (UTC -8)

Comment 36

•

9 years ago

lets just use 'import json', thanks for the rest of the comments, they all look good!

Markus Stange [:mstange]

Assignee

Comment 37

•

9 years ago

(In reply to Joel Maher (:jmaher) from comment #29)
> Comment on attachment 8549231 [details] [diff] [review]
> 10-Bug_1121571___Print_some_debug_messages_.diff
> 
> Review of attachment 8549231 [details] [diff] [review]:
> -----------------------------------------------------------------
> 
> ::: talos/ttest.py
> @@ +417,5 @@
> >                                                                  profile_dir,
> >                                                                  test_config['url'],
> >                                                                  profiling_info=profiling_info)
> >  
> > +                utils.info("command_args: {0}".format(command_args))
> 
> I would like to see this be a bit more specific, maybe:
> utils.info("browser command line args: ...")

Actually, utils.GenerateBrowserCommandLine already prints this command line, but at the "debug" log level, which I hadn't enabled, so I didn't see it. I'll just remove this change.

Markus Stange [:mstange]

Assignee

Comment 38

•

9 years ago

> (In reply to Joel Maher (:jmaher) from comment #31)
> > Comment on attachment 8549233 [details] [diff] [review]
> > 12-Bug_1121571___Add_more_granular_profiler_pausing_for_many_tests_.diff
> > 
> > Review of attachment 8549233 [details] [diff] [review]:
> > -----------------------------------------------------------------
> > 
> > ::: talos/page_load_test/v8_7/base.js
> > @@ +212,5 @@
> > >  // average time it takes to run a single iteration.
> > >  BenchmarkSuite.prototype.RunSingleBenchmark = function(benchmark, data) {
> > >    function Measure(data) {
> > >      var elapsed = 0;
> > > +    Profiler.resume(benchmark.name);
> > 
> > are you sure this 'benchmark.name' will name sense?
> 
> I'll add a sanity check, but this worked without throwing any errors, as far
> as I can tell.

Actually, I'll leave it as is. There's a Benchmark "class" at the top of that file, and that explicitly contains a name field, and I'm pretty sure all variables with the name "benchmark" in that file are an instance of Benchmark.

Markus Stange [:mstange]

Assignee

Comment 39

•

9 years ago

Attached patch 02-Bug_1121571___Create_a_directory_for_profiles__don_t_use_startup_profiling_and_change_the_profile_zip_layout__r_jmaher.diff (deleted) — Details — Splinter Review

I've shuffled some variables around and made the profile zip end in ".sps.zip".

Attachment #8549222 - Attachment is obsolete: true

Markus Stange [:mstange]

Assignee

Comment 40

•

9 years ago

Attached patch 03-Bug_1121571___Add_Profiler_js_capabilities_.diff (obsolete) (deleted) — Details — Splinter Review

Now with added type checks.

Attachment #8549223 - Attachment is obsolete: true

Attachment #8550515 - Flags: review?(jmaher)

Markus Stange [:mstange]

Assignee

Comment 41

•

9 years ago

Attached patch 04-Bug_1121571___Pass_profiling_information_to_pageloader_and_make_pageloader_dump_profiles__r_jmaher.diff (deleted) — Details — Splinter Review

unchanged, afaik

Markus Stange [:mstange]

Assignee

Comment 42

•

9 years ago

Attached patch 05-Bug_1121571___Pass_profiling_info_to_non_pageloader_tests_via_URL_params__r_jmaher.diff (deleted) — Details — Splinter Review

also unchanged

Markus Stange [:mstange]

Assignee

Comment 43

•

9 years ago

Attached patch 06-Bug_1121571___Fix_tp5o_scroll_profiling__r_jmaher.diff (deleted) — Details — Splinter Review

with ", false, true);"

Markus Stange [:mstange]

Assignee

Updated

•

9 years ago

Attachment #8549224 - Attachment is obsolete: true

Markus Stange [:mstange]

Assignee

Updated

•

9 years ago

Attachment #8549225 - Attachment is obsolete: true

Markus Stange [:mstange]

Assignee

Updated

•

9 years ago

Attachment #8549226 - Attachment is obsolete: true

Markus Stange [:mstange]

Assignee

Comment 44

•

9 years ago

Attached patch 07-Bug_1121571___Fix_tresize_profiling__r_jmaher.diff (deleted) — Details — Splinter Review

added the missing comma and changed the <script/> declaration

Attachment #8549227 - Attachment is obsolete: true

Joel Maher ( :jmaher ) (UTC -8)

Comment 45

•

9 years ago

Comment on attachment 8550515 [details] [diff] [review]
03-Bug_1121571___Add_Profiler_js_capabilities_.diff

Review of attachment 8550515 [details] [diff] [review]:
-----------------------------------------------------------------

::: talos/page_load_test/tart/addon/content/Profiler.js
@@ +52,5 @@
> +
> +    for (var i in pairs) {
> +      if (pairs[i] !== "") {
> +        var pair = pairs[i].split("=");
> +        result[decodeURIComponent(pair[0])] = decodeURIComponent(pair[1]);

here you assume that pair has a length of 2, most likely it will, either fix it or add a comment

Attachment #8550515 - Flags: review?(jmaher) → review+

Markus Stange [:mstange]

Assignee

Comment 46

•

9 years ago

Attached patch 08-Bug_1121571___Call_finish___immediately_when_sessionstore_windows_restored_fires_.diff (deleted) — Details — Splinter Review

This patch does two things:
 1. It makes us not wait for the "load" event. This was just an unneeded delay, as far as I can tell.
 2. It moves the setTimeout(..., 0) into the finish() function so that we have a place in the finish() function that is hit as soon after "sessionstore-windows-restored" as possible.

It shouldn't change any numbers. David, do you agree?

Attachment #8550521 - Flags: review?(dteller)

Markus Stange [:mstange]

Assignee

Comment 47

•

9 years ago

Attached patch 09-Bug_1121571___Profile_startup_using_profiler_env_vars_.diff (deleted) — Details — Splinter Review

Just the profiler changes from the old part 8, this time.

Attachment #8550522 - Flags: review?(jmaher)

Markus Stange [:mstange]

Assignee

Comment 48

•

9 years ago

Attached patch 10-Bug_1121571___Tweak_profiler_intervals___buffer_sizes__r_jmaher.diff (deleted) — Details — Splinter Review

I had to increase the profile size for v8 and I also increased the resolution for tresize a little.

Attachment #8549229 - Attachment is obsolete: true

Attachment #8549230 - Attachment is obsolete: true

Markus Stange [:mstange]

Assignee

Comment 49

•

9 years ago

Attached patch 11-Bug_1121571___Print_a_debug_message__r_jmaher.diff (deleted) — Details — Splinter Review

Print just the "activating profiling" message.

Markus Stange [:mstange]

Assignee

Comment 50

•

9 years ago

Attached patch 12-Bug_1121571___Increase_the_timeout_when_profiling_.diff (deleted) — Details — Splinter Review

Just five minutes this time. The longer timeouts I was seeing are likely caused by deadlocks in the profiler that hang the browser completely. I still need to find out how to fix those.

Attachment #8549231 - Attachment is obsolete: true

Attachment #8549232 - Attachment is obsolete: true

Attachment #8550527 - Flags: review?(jmaher)

Markus Stange [:mstange]

Assignee

Comment 51

•

9 years ago

Attached patch 13-Bug_1121571___Add_more_granular_profiler_pausing_for_many_tests__r_jmaher.diff (deleted) — Details — Splinter Review

unchanged

Attachment #8549233 - Attachment is obsolete: true

Markus Stange [:mstange]

Assignee

Comment 52

•

9 years ago

Attached patch 14-Bug_1121571___Do_symbolication_right_on_the_talos_machine_.diff (obsolete) (deleted) — Details — Splinter Review

I've moved the imported files to a "profiler" subdirectory. I've also simplified the imported files a little, and removed functions we don't use. I've kept the indentation as-is.
I was seeing out-of-memory errors during profile symbolication (specifically in the "profile = json.load(profile_file)" line), so I'm now catching the MemoryError exception.

Attachment #8549234 - Attachment is obsolete: true

Attachment #8550529 - Flags: review?(jmaher)

Markus Stange [:mstange]

Assignee

Comment 53

•

9 years ago

Attached patch 15-Bug_1121571___Read_symbols_from_system_libraries_on_OS_X_.diff (obsolete) (deleted) — Details — Splinter Review

Attachment #8549235 - Attachment is obsolete: true

Markus Stange [:mstange]

Assignee

Comment 54

•

9 years ago

Attached patch 15-Bug_1121571___Read_symbols_from_system_libraries_on_OS_X__r_jmaher.diff (obsolete) (deleted) — Details — Splinter Review

Attachment #8550531 - Attachment is obsolete: true

Markus Stange [:mstange]

Assignee

Comment 55

•

9 years ago

Attached patch 16-Bug_1121571___Don_t_resume_profiling_at_the_end_of_TART__r_jmaher.diff (deleted) — Details — Splinter Review

Attachment #8549236 - Attachment is obsolete: true

Markus Stange [:mstange]

Assignee

Comment 56

•

9 years ago

Attached patch 17-Bug_1121571___Retry_profile_rmtree_for_five_minutes_.diff (obsolete) (deleted) — Details — Splinter Review

This error occurred in some of the Windows runs: "The process cannot access the file because it is being used by another process: 'c:\\docume~1\\cltbld~1.t-x\\locals~1\\temp\\tmpv0wi28\\page_0.sps'"
It happened in the "d" and "s" tests of this try push:
https://treeherder.mozilla.org/#/jobs?repo=try&revision=a819e4547e25

I don't know what's happening here. The browser is the only process that knows of that file, and it should absolutely be closed by now. Maybe you know what's going on?

Attachment #8550535 - Flags: review?(jmaher)

Markus Stange [:mstange]

Assignee

Comment 57

•

9 years ago

Attached patch 18-Bug_1121571___Also_dump_the_profile_from_pageloader_if_it_times_out_.diff (deleted) — Details — Splinter Review

There were a few tp timeouts, and I still haven't figured out their cause. In any case, dumping profiles on timeout will make this easier to track down.

Here's an example log (search for "__FAILTimeout in tp5n__FAIL"):
http://ftp.mozilla.org/pub/mozilla.org/firefox/try-builds/mstange@themasta.com-44c6445d8121/try-macosx64/try_mountainlion_test-tp5o-bm108-tests1-macosx-build0.txt.gz

Attachment #8550536 - Flags: review?(jmaher)

Markus Stange [:mstange]

Assignee

Comment 58

•

9 years ago

Attached patch 19-Bug_1121571___Continue_even_if_test_results_add_throws_an_error_.diff (deleted) — Details — Splinter Review

This might have unintended side effects. Any ideas?

Attachment #8550537 - Flags: review?(jmaher)

Markus Stange [:mstange]

Assignee

Comment 59

•

9 years ago

New try push: https://treeherder.mozilla.org/#/jobs?repo=try&revision=e7b17651136c

Markus Stange [:mstange]

Assignee

Comment 60

•

9 years ago

Comment on attachment 8550535 [details] [diff] [review]
17-Bug_1121571___Retry_profile_rmtree_for_five_minutes_.diff

This didn't fix the issue:
https://treeherder.mozilla.org/#/jobs?repo=try&revision=e7b17651136c

Attachment #8550535 - Flags: review?(jmaher)

Joel Maher ( :jmaher ) (UTC -8)

Comment 61

•

9 years ago

Comment on attachment 8550537 [details] [diff] [review]
19-Bug_1121571___Continue_even_if_test_results_add_throws_an_error_.diff

Review of attachment 8550537 [details] [diff] [review]:
-----------------------------------------------------------------

a few random chances of failure in the future, but if we get a failed case to start with the job will be red/orange anyway, so forcing us to collect more information is a win.

Attachment #8550537 - Flags: review?(jmaher) → review+

Joel Maher ( :jmaher ) (UTC -8)

Comment 62

•

9 years ago

Comment on attachment 8550536 [details] [diff] [review]
18-Bug_1121571___Also_dump_the_profile_from_pageloader_if_it_times_out_.diff

Review of attachment 8550536 [details] [diff] [review]:
-----------------------------------------------------------------

good find.

Attachment #8550536 - Flags: review?(jmaher) → review+

Joel Maher ( :jmaher ) (UTC -8)

Comment 63

•

9 years ago

Comment on attachment 8550535 [details] [diff] [review]
17-Bug_1121571___Retry_profile_rmtree_for_five_minutes_.diff

Review of attachment 8550535 [details] [diff] [review]:
-----------------------------------------------------------------

I would like to see this done in general and address my question.

::: talos/ttest.py
@@ +622,5 @@
> +                # For some reason, on Windows, big profiles are sometimes locked
> +                # by another process even after all Firefox processes have been
> +                # terminated. Allow up to 5 minutes for the file lock to be
> +                # released.
> +                utils.rmtree_until_timeout(sps_profile_dir, 5 * 60)

fwiw, we see this on windows in general, could you apply this to profile cleanup overall?

::: talos/utils.py
@@ +279,5 @@
> +            return
> +        except Exception as e:
> +            if elapsed >= timeout:
> +                raise e
> +            time.sleep(1)

this assumes that shutl.rmtree will hit an exception in a few milliseconds, isn't it possible that we could hit the exception after a few seconds due to a locked file or some type of timeout?

Attachment #8550535 - Flags: review-

Joel Maher ( :jmaher ) (UTC -8)

Comment 64

•

9 years ago

Comment on attachment 8550529 [details] [diff] [review]
14-Bug_1121571___Do_symbolication_right_on_the_talos_machine_.diff

Review of attachment 8550529 [details] [diff] [review]:
-----------------------------------------------------------------

I would like some indication of the root of these files and a mpl at the top.  Ideally they would live in a profiling/, sps/, or the existing scripts/ subdir instead of the root dir.  If they were in their own subdir, it would be easy to create a readme.txt which could point to the github root.

I am fine with the files, not with the locations- a few nits below.

::: talos/profiler/symFileManager.py
@@ +201,5 @@
> +            #print "Can't fit " + pdbName + "/" + pdbId
> +            break
> +          fetchedSymbols[(pdbName, pdbId)] = symbolInfo
> +          fetchedCount += symbolInfo.GetEntryCount()
> +          #print "fetchedCount = " + str(fetchedCount) + "  after " + pdbName + "/" + pdbId

we have a lot of print statements commented out, can we remove these?

::: talos/ttest.py
@@ +590,5 @@
> +                            except MemoryError as e:
> +                                utils.info("Ran out of memory while trying to symbolicate profile")
> +                            except Exception as e:
> +                                utils.info(e)
> +                                utils.info("Encountered an exception during profile symbolication")

could we be more explicit about which test we were not able to get sps profiles for in the error message?

Attachment #8550529 - Flags: review?(jmaher) → review-

Joel Maher ( :jmaher ) (UTC -8)

Comment 65

•

9 years ago

Comment on attachment 8550527 [details] [diff] [review]
12-Bug_1121571___Increase_the_timeout_when_profiling_.diff

Review of attachment 8550527 [details] [diff] [review]:
-----------------------------------------------------------------

thanks!

Attachment #8550527 - Flags: review?(jmaher) → review+

Joel Maher ( :jmaher ) (UTC -8)

Comment 66

•

9 years ago

Comment on attachment 8550522 [details] [diff] [review]
09-Bug_1121571___Profile_startup_using_profiler_env_vars_.diff

Review of attachment 8550522 [details] [diff] [review]:
-----------------------------------------------------------------

thanks!

Attachment #8550522 - Flags: review?(jmaher) → review+

Markus Stange [:mstange]

Assignee

Comment 67

•

9 years ago

(In reply to Joel Maher (:jmaher) from comment #63)
> Comment on attachment 8550535 [details] [diff] [review]
> 17-Bug_1121571___Retry_profile_rmtree_for_five_minutes_.diff

> ::: talos/ttest.py
> @@ +622,5 @@
> > +                # For some reason, on Windows, big profiles are sometimes locked
> > +                # by another process even after all Firefox processes have been
> > +                # terminated. Allow up to 5 minutes for the file lock to be
> > +                # released.
> > +                utils.rmtree_until_timeout(sps_profile_dir, 5 * 60)
> 
> fwiw, we see this on windows in general, could you apply this to profile
> cleanup overall?

Sure. Not sure it's going to work, though.

> ::: talos/utils.py
> @@ +279,5 @@
> > +            return
> > +        except Exception as e:
> > +            if elapsed >= timeout:
> > +                raise e
> > +            time.sleep(1)
> 
> this assumes that shutl.rmtree will hit an exception in a few milliseconds,

time.sleep accepts seconds, not milliseconds. I'll add a comment about that.
Does that clear up the confusion, or is there something else that's unclear or that I'm missing? The loop doesn't really care how long rmtree takes until it throws an exception. It waits for it to either succeed or throw, no matter how long it takes to do that. And if it throws, we discard the exception, until we've seen |timeout| many exceptions. And we wait for one second between tries.

> isn't it possible that we could hit the exception after a few seconds due to
> a locked file or some type of timeout?

Apparently. But in my latest push, a timeout of 5 minutes didn't help. I'll try increasing the timeout, maybe I just haven't waited long enough.

Markus Stange [:mstange]

Assignee

Comment 68

•

9 years ago

(In reply to Joel Maher (:jmaher) from comment #64)
> Comment on attachment 8550529 [details] [diff] [review]
> 14-Bug_1121571___Do_symbolication_right_on_the_talos_machine_.diff
> 
> Review of attachment 8550529 [details] [diff] [review]:
> -----------------------------------------------------------------
> 
> I would like some indication of the root of these files and a mpl at the
> top.

Good point.

> Ideally they would live in a profiling/, sps/, or the existing
> scripts/ subdir instead of the root dir.

They're in the "profiler" dir in this patch, not in the root dir.

> If they were in their own subdir,
> it would be easy to create a readme.txt which could point to the github root.

I'll add one.

> ::: talos/profiler/symFileManager.py
> @@ +201,5 @@
> > +            #print "Can't fit " + pdbName + "/" + pdbId
> > +            break
> > +          fetchedSymbols[(pdbName, pdbId)] = symbolInfo
> > +          fetchedCount += symbolInfo.GetEntryCount()
> > +          #print "fetchedCount = " + str(fetchedCount) + "  after " + pdbName + "/" + pdbId
> 
> we have a lot of print statements commented out, can we remove these?

Sure.

> ::: talos/ttest.py
> @@ +590,5 @@
> > +                            except MemoryError as e:
> > +                                utils.info("Ran out of memory while trying to symbolicate profile")
> > +                            except Exception as e:
> > +                                utils.info(e)
> > +                                utils.info("Encountered an exception during profile symbolication")
> 
> could we be more explicit about which test we were not able to get sps
> profiles for in the error message?

Good idea.

David Teller [:Yoric] - still alive but not very active

Comment 69

•

9 years ago

Comment on attachment 8550521 [details] [diff] [review]
08-Bug_1121571___Call_finish___immediately_when_sessionstore_windows_restored_fires_.diff

Review of attachment 8550521 [details] [diff] [review]:
-----------------------------------------------------------------

I don't see any reason why this wouldn't work. So r=me.

Attachment #8550521 - Flags: review?(dteller) → review+

Markus Stange [:mstange]

Assignee

Comment 70

•

9 years ago

Attached patch 03-Bug_1121571___Add_Profiler_js_capabilities_.diff (deleted) — Details — Splinter Review

Attachment #8550515 - Attachment is obsolete: true

Markus Stange [:mstange]

Assignee

Comment 71

•

9 years ago

Attached patch 14-Bug_1121571___Do_symbolication_right_on_the_talos_machine_.diff (deleted) — Details — Splinter Review

Attachment #8550529 - Attachment is obsolete: true

Attachment #8551437 - Flags: review?(jmaher)

Markus Stange [:mstange]

Assignee

Comment 72

•

9 years ago

Attached patch 15-Bug_1121571___Read_symbols_from_system_libraries_on_OS_X__r_jmaher.diff (deleted) — Details — Splinter Review

Attachment #8550532 - Attachment is obsolete: true

Markus Stange [:mstange]

Assignee

Comment 73

•

9 years ago

Attached patch 17-Bug_1121571___Retry_rmtree_for_twenty_minutes__in_case_the_file_is_still_locked_by_another_process_.diff (deleted) — Details — Splinter Review

Attachment #8550535 - Attachment is obsolete: true

Attachment #8551439 - Flags: review?(jmaher)

Markus Stange [:mstange]

Assignee

Comment 74

•

9 years ago

New push: https://treeherder.mozilla.org/#/jobs?repo=try&revision=a40998427912
This one has an updated mozilla-central base. I hope it won't hit bug 1122947 during the Talos runs.

Markus Stange [:mstange]

Assignee

Updated

•

9 years ago

Blocks: 1123446

Joel Maher ( :jmaher ) (UTC -8)

Updated

•

9 years ago

Attachment #8551437 - Flags: review?(jmaher) → review+

Joel Maher ( :jmaher ) (UTC -8)

Comment 75

•

9 years ago

Comment on attachment 8551439 [details] [diff] [review]
17-Bug_1121571___Retry_rmtree_for_twenty_minutes__in_case_the_file_is_still_locked_by_another_process_.diff

Review of attachment 8551439 [details] [diff] [review]:
-----------------------------------------------------------------

this looks good.  My reservations about the times were misdirected.

Attachment #8551439 - Flags: review?(jmaher) → review+

Markus Stange [:mstange]

Assignee

Comment 76

•

9 years ago

I'm going to land all of these patches now. The remaining issues can be fixed in follow-up bugs. It's already strictly an improvement over the old state.

Markus Stange [:mstange]

Assignee

Comment 77

•

9 years ago

Pushed: https://hg.mozilla.org/build/talos/pushloghtml?changeset=e7e1a81cba8d

Markus Stange [:mstange]

Assignee

Updated

•

9 years ago

Depends on: 1123852

Markus Stange [:mstange]

Assignee

Updated

•

9 years ago

Status: ASSIGNED → RESOLVED

Closed: 9 years ago

Resolution: --- → FIXED

Markus Stange [:mstange]

Assignee

Updated

•

9 years ago

Depends on: 1149364

You need to log in before you can comment on or make changes to this bug.