Closed Bug 1590116 Opened 5 years ago Closed 5 years ago

Investigate performance improvement of cold load tests after switching to real profile

Categories

(Testing :: Raptor, task, P1)

Version 3
task

Tracking

(Not tracked)

RESOLVED FIXED

People

(Reporter: Bebe, Assigned: Bebe)

Attachments

(2 files)

After Bug 1589070 (Raptor desktop cold load tests are reusing the initial profile and so are not cold loads) landed,
we are seeing performance improvements on cold page load tests.

Investigate these improvements.

Name                                 Sub test  Platform
raptor-tp6-bing-firefox-cold         fcp       linux64-shippable-qr opt
raptor-tp6-bing-firefox-cold         fcp       macosx1014-64-shippable opt
raptor-tp6-bing-firefox-cold         fcp       windows10-64-shippable opt
raptor-tp6-bing-firefox-cold         fcp       windows7-32-shippable opt
raptor-tp6-google-mail-firefox-cold  fcp       linux64-shippable opt
raptor-tp6-google-mail-firefox-cold            linux64-shippable opt
raptor-tp6-instagram-firefox-cold    fcp       windows10-64-shippable opt
raptor-tp6-instagram-firefox-cold    fcp       windows7-32-shippable opt
raptor-tp6-linkedin-firefox-cold     fcp       macosx1014-64-shippable opt
raptor-tp6-linkedin-firefox-cold               macosx1014-64-shippable opt
raptor-tp6-linkedin-firefox-cold     fcp       windows10-64-shippable opt
raptor-tp6-linkedin-firefox-cold               windows10-64-shippable opt
raptor-tp6-linkedin-firefox-cold     fcp       windows10-64-shippable-qr opt
raptor-tp6-linkedin-firefox-cold               windows10-64-shippable-qr opt
raptor-tp6-linkedin-firefox-cold     fcp       windows7-32-shippable opt
raptor-tp6-linkedin-firefox-cold               windows7-32-shippable opt
raptor-tp6-outlook-firefox-cold      fcp       linux64-shippable opt
raptor-tp6-outlook-firefox-cold      fcp       linux64-shippable-qr opt
raptor-tp6-outlook-firefox-cold      fcp       windows10-64-shippable-qr opt
raptor-tp6-outlook-firefox-cold                windows10-64-shippable-qr opt
raptor-tp6-outlook-firefox-cold      fcp       windows7-32-shippable opt
raptor-tp6-reddit-firefox-cold       fcp       macosx1014-64-shippable opt
raptor-tp6-yahoo-mail-firefox-cold   fcp       windows10-64-shippable opt
raptor-tp6-yahoo-mail-firefox-cold   fcp       windows7-32-shippable opt
raptor-tp6-youtube-firefox-cold      fcp       windows7-32-shippable opt

Note:

Most of the improvements are on FCP tests.

Blocks: 1589325
Priority: -- → P1

:acreskey maybe you have an idea why this happens.

Maybe FCP (first contentful paint) differs for these sites depending on whether the page is loaded from the profile cache or fetched from the web...

Flags: needinfo?(acreskey)

I was wondering about that myself.
I have seen cases where the fcp will come earlier when fewer resources are available, e.g. in a cold load. (the actual content in that fcp may be minimal).
But I'm not sure if that's the case here.
I did some local testing (OSX, reddit) and it looks like that may be the case here.

Florin - I would try to capture profiles with each change and see if that's the case: in the cold load cases the early FCPs just have much less content.

Flags: needinfo?(acreskey)
Flags: needinfo?(fstrugariu)
Priority: P1 → P2
Flags: needinfo?(fstrugariu)
Priority: P2 → P1
Flags: needinfo?(fstrugariu)

This was blocked by bug 1594330.

I will generate the profiles today.

Depends on: 1594330

I tried to trigger the new profiles, but it looks like we need new try builds.

Assignee: nobody → fstrugariu
Status: NEW → ASSIGNED
Flags: needinfo?(fstrugariu)

As we had issues with the profiles, I decided to do things differently.
The "Before" push contains the current build with gecko profiles.
The "After" push contains the same commits with Bug 1589070 reverted.

Before:
https://treeherder.mozilla.org/#/jobs?repo=try&revision=a321fdac5b44143cee1467fddce86b6379aff5d9

After:
https://treeherder.mozilla.org/#/jobs?repo=try&revision=d2557513d05c0d67c29e3d4ad970e71a2a14c85b

It looks like the gecko profiles are empty for the "After" push, i.e. the code without the Bug 1589070 fix.
This needs more investigation.

So the issue is that initially we were creating new profiles and doing the setup, but we were using the first created profile to run each cycle.
When doing that, no gecko profiles were generated, as the first cycle is removed from the list.
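
The profile reuse described above can be illustrated with a minimal Python sketch. This is not Raptor's code: the dictionary-based profile, `load_page`, `run_cycles`, and the `reuse_first_profile` flag are hypothetical names invented for illustration, assuming only that a page load counts as cold when the profile's cache has not seen the URL yet.

```python
def load_page(profile, url):
    """Return 'cold' if url is not yet cached in this profile, else 'warm'."""
    kind = "warm" if url in profile["cache"] else "cold"
    profile["cache"].add(url)  # loading the page populates the profile cache
    return kind


def run_cycles(url, cycles, reuse_first_profile):
    """Simulate page-load cycles; the profile model here is hypothetical."""
    profiles = []
    results = []
    for _ in range(cycles):
        # A fresh profile is created and set up for every cycle...
        profiles.append({"cache": set()})
        # ...but the buggy variant always runs with the first one created.
        profile = profiles[0] if reuse_first_profile else profiles[-1]
        results.append(load_page(profile, url))
    return results


buggy = run_cycles("https://www.instagram.com/", 3, reuse_first_profile=True)
fixed = run_cycles("https://www.instagram.com/", 3, reuse_first_profile=False)
print(buggy)  # ['cold', 'warm', 'warm']
print(fixed)  # ['cold', 'cold', 'cold']
```

In this simplified model only the first cycle of the buggy variant is a true cold load, which matches the observation that the "cold" tests were effectively measuring warm loads before the fix.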

I tried to avoid this but got into further failures and issues.

I don't think we can generate profiles for this bug and investigate the performance difference.

I'm not sure if it's that important to find the root cause here.
I mentioned earlier that I've seen earlier fcp in cases where there were no cached resources -- earlier fcp but less content in that fcp.
That may be the case here.

Given that this looks to be time consuming to get to the bottom of, I suggest closing or de-prioritizing.

Andrew, can you take a look at the profiles?

Thanks,
Bebe

Flags: needinfo?(acreskey)

Yes, I can do that Bebe. I'm going to make new pushes w/ profiles because I also need the geckoprofiler screenshots in them.

Bebe, I can't figure this out -- I'm trying to push jobs before and after this bug landed (but with geckoprofiler screenshots enabled via raptor)

Before the change:
https://treeherder.mozilla.org/#/jobs?repo=try&revision=f27807bc51dcac38af688d6ca0872578155e0bbd

After the change:
https://treeherder.mozilla.org/#/jobs?repo=try&revision=95a4bda186508a3ee232d4e61aafc129e416d5d2

For whatever reason the jobs selected with try choose (tp6-11-cold, tp6-19-cold, tp6-23-cold for Windows10-64 shippable) are all missing.
And I can't add new jobs via treeherder.
If you have any ideas, please let me know.

Flags: needinfo?(acreskey) → needinfo?(fstrugariu)

I had a lot of trouble generating those profiles.

I will generate them today.

Hmm, I'm getting 404s on those profiles Florin. Sorry, I delayed on that -- I would have expected that the artifacts last longer than 21 days?
I'm still not convinced that this bug is worth pursuing however.

Flags: needinfo?(acreskey)

(In reply to Andrew Creskey [:acreskey] [he/him] from comment #18)

Hmm, I'm getting 404s on those profiles Florin. Sorry, I delayed on that -- I would have expected that the artifacts last longer than 21 days?
I'm still not convinced that this bug is worth pursuing however.

@davehunt should we still push on this?

Flags: needinfo?(dave.hunt)

(In reply to Florin Strugariu [:Bebe] (needinfo me) from comment #19)

(In reply to Andrew Creskey [:acreskey] [he/him] from comment #18)

Hmm, I'm getting 404s on those profiles Florin. Sorry, I delayed on that -- I would have expected that the artifacts last longer than 21 days?
I'm still not convinced that this bug is worth pursuing however.

@davehunt should we still push on this?

What was the issue with generating the profiles? Also, how long do we expect profiles to be available for? At this point I'm more interested in improving our ability to generate profiles to assist with investigation of future unexpected results like this. It might make sense to attach profiles to bugs for investigation instead of relying on the retention of those profiles in Treeherder.

Flags: needinfo?(dave.hunt)
Flags: needinfo?(fstrugariu)

@DaveHunt there were multiple issues here that made this go badly:

  1. CI migration
  2. Bug 1589070 - Raptor desktop cold load tests are reusing the initial profile and so are not cold loads
  3. Bug 1594330 - Task-cluster jobs fail when generating raptor gecko-profile tasks
  4. Artifact got deleted.

Because of these bugs we had to manually generate the commits and make sure everything is OK.

Flags: needinfo?(fstrugariu)

These profiles are again unavailable.
Dave should we continue this investigation?

Flags: needinfo?(dave.hunt)

(In reply to Florin Strugariu [:Bebe] (needinfo me) from comment #23)

These profiles are again unavailable.
Dave should we continue this investigation?

I think this demonstrates that we should not link to profiles in Treeherder, but instead should upload them somewhere that they will not expire. Could you generate them again and upload them as attachments to this bug?

Flags: needinfo?(dave.hunt)

Now we can just re-trigger the builds.
I will upload the profiles after they run.

Added profiles

These profiles capture the unexpected behaviour:

After bug 1589070 reverted (so warm loads): the fcp contains the Instagram logo
FirstContentfulPaint — Contentful paint after 254ms for URL https://www.instagram.com/, foreground tab

Before bug 1589070 reverted (so cold loads)
FirstContentfulPaint — Contentful paint after 213ms for URL https://www.instagram.com/, foreground tab

This one doesn't have a screenshots track, so I can't tell exactly what is in that fcp, but I'm betting that it doesn't contain the image because I don't see any pngs coming in until after:
https://perfht.ml/2S3EOeN
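
A rough way to check this reasoning is to compare the two FCP markers and list which image responses had finished by the time of the cold-load FCP. The sketch below is Python under stated assumptions: the `Resource` record and both helper functions are hypothetical and do not reflect the Gecko profiler's data format, and the resource URLs and end times are made-up placeholders. Only the 254ms and 213ms FCP values come from the markers quoted above.

```python
from dataclasses import dataclass

IMAGE_SUFFIXES = (".png", ".jpg", ".jpeg", ".gif", ".webp", ".svg")


@dataclass
class Resource:
    url: str       # response URL (placeholder values below)
    end_ms: float  # time the response finished, in ms since navigation start


def relative_change(before_ms, after_ms):
    """Percentage change from before_ms to after_ms (negative means faster)."""
    return (after_ms - before_ms) / before_ms * 100.0


def images_ready_at(paint_ms, resources):
    """URLs of image responses that had finished by paint_ms."""
    return [
        r.url
        for r in resources
        if r.url.lower().endswith(IMAGE_SUFFIXES) and r.end_ms <= paint_ms
    ]


WARM_FCP_MS = 254.0  # FCP marker with bug 1589070 reverted (warm loads)
COLD_FCP_MS = 213.0  # FCP marker with the fix in place (cold loads)

# Made-up timings for illustration only; real values would come from
# the network markers in the captured profile.
cold_resources = [
    Resource("https://example.invalid/index.html", 120.0),
    Resource("https://example.invalid/logo.png", 260.0),
]

change = relative_change(WARM_FCP_MS, COLD_FCP_MS)
print(f"FCP change, warm to cold: {change:.1f}%")
print(images_ready_at(COLD_FCP_MS, cold_resources))
```

An empty list for the cold load would be consistent with an FCP that painted only non-image content, though it would not prove what was actually drawn.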

Flags: needinfo?(acreskey)

do we need to do anything on this?

Flags: needinfo?(dave.hunt)

(In reply to Florin Strugariu [:Bebe] (needinfo me) from comment #30)

do we need to do anything on this?

From Andrew's investigation it sounds like we're hitting contentful paint before any images are displayed. Have I interpreted that correctly Andrew? Why would the screenshots track be missing in the cold load profile?

Flags: needinfo?(dave.hunt)

(In reply to Dave Hunt [:davehunt] [he/him] ⌚BST from comment #31)

(In reply to Florin Strugariu [:Bebe] (needinfo me) from comment #30)

do we need to do anything on this?

From Andrew's investigation it sounds like we're hitting contentful paint before any images are displayed. Have I interpreted that correctly Andrew? Why would the screenshots track be missing in the cold load profile?

Yes, FCP can be anything from the DOM rendered and because the images haven't come in yet at the time of the marker, I'm guessing that it just rendered a background or similar.

That makes sense. Let's close this then, and look ahead to visual metrics.

Status: ASSIGNED → RESOLVED
Closed: 5 years ago
Resolution: --- → FIXED