Closed Bug 1084493 Opened 10 years ago Closed 7 years ago

[Meta] Reduce/mitigate time taken for pushes to show in Treeherder (compared to TBPL client-side fetching)

Categories

(Tree Management :: Treeherder: Data Ingestion, defect, P3)

x86_64
Linux
defect

Tracking

(Not tracked)

RESOLVED FIXED

People

(Reporter: rillian, Unassigned)

References

Details

(Keywords: meta, regression)

I've noticed recently that treeherder generally gives me an error when I click on the link the hg server returns on push. If I click the tbpl link instead, it shows a page with the commit information, even if nothing is available from buildbot yet.

I would like to have at least as good behaviour from treeherder:

Links for new pushes should immediately return a valid page.
That page should load commit information, and load job information as soon as it's available.
Summary: Treeherder slow to show new jobs → Treeherder slower to show new pushes than TBPL
The current treeherder behaviour for the try link workflow is:
1) User sees a slightly confusing error message.
2) The page automatically refreshes (but the user may not have spotted this mentioned in the error message), and so the push will appear within a few minutes.

The reason TBPL displays the push sooner than treeherder is that TBPL merely requests the pushlog in the client, and then polls the TBPL backend for the job results, which are overlaid on top of it in a non-persistent manner. This is extremely limiting, since the backend has no concept of a "push", which greatly restricts the features that can be implemented. Treeherder, on the other hand, ingests the pushlog server-side and combines it with the job results.

This means that to some extent, it's unavoidable that there is a slightly longer delay (eg: couple of mins) than with TBPL.

Bug 1080408 is improving the message shown when visiting a new/unknown push - with that fixed, is there a particular reason for needing the pushlog to be loaded immediately? I'm just wondering if your worry is that the page isn't going to refresh, or that you want to open an hgweb URL straight away? If the latter, for example, we could add the hgweb URL to the hg.mozilla.org response in addition to the treeherder link.

As for reducing the lag between pushing and it appearing in treeherder's UI, things we can try:
1) Reduce the time between "push complete" and "treeherder starts to ingest that push". Currently treeherder polls once a minute for each repo - we could perhaps try lowering this to every 30 seconds. To improve on that any more, we'd need bug 1022701 to switch to a push-based approach instead of polling.
2) Reduce the time taken for the push to be processed in the service. I don't know how long this takes currently.
3) Increase the frequency with which the UI polls the service for new result sets (pushes). Or alternatively in the future, if socket.io can be made reliable, switch back to that again and use push-based notifications.
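
As a rough sketch of how the three delays above add up for a page that is already open: the 60s ingestion poll is the figure mentioned in #1, the 60s UI poll is the value mentioned later in this bug, and the 30s processing time is purely a placeholder, since the real figure isn't known. The function name `push_visibility_lag` is made up for illustration.

```python
def push_visibility_lag(ingest_poll_s, processing_s, ui_poll_s):
    """Return (average, worst_case) seconds from "push complete" to "visible in UI".

    Assumes a push lands at a uniformly random point within each polling
    interval, so the average wait per poller is half its interval and the
    worst case is the full interval.
    """
    average = ingest_poll_s / 2 + processing_s + ui_poll_s / 2
    worst_case = ingest_poll_s + processing_s + ui_poll_s
    return average, worst_case


# Current setup: poll each repo every 60s, placeholder 30s processing, UI polls every 60s.
current = push_visibility_lag(60, 30, 60)

# Option 1 from the list above: lower the ingestion poll to 30s.
faster_ingest = push_visibility_lag(30, 30, 60)

print(current)        # (90.0, 150)
print(faster_ingest)  # (75.0, 120)
```

Under these assumptions, halving the ingestion poll only removes 15s from the average, which is consistent with the point that polling tweaks alone won't make a freshly pushed revision visible immediately.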

However, I think even with #1-3 fixed - in the "click a try link immediately after pushing" case, we still won't have the push quickly enough that the user can immediately see it. As such, I think most of this bug really hinges on the messaging change above & understanding why users may not be happy with the page refreshing after a couple of mins - and solving those use-cases in another way.
Blocks: 1076750
Depends on: 1080408, 1022701
Priority: -- → P2
Summary: Treeherder slower to show new pushes than TBPL → Reduce time taken for pushes to show in Treeherder, so it's closer to TBPL behaviour
With the understanding that bug 1080408 will:
* Make the wording of the message clearer
* Provide a link to the pushlog for the specified revision immediately (eg https://hg.mozilla.org/integration/mozilla-inbound/pushloghtml?changeset=8d578891aa89)

...and that already:
* The page refreshes automatically
* Pushes should appear within 3-4 mins (let me know if it takes longer than this)

What are the use-cases for needing the push to be visible sooner than 3-4 mins, so that we might be able to work around them?
Flags: needinfo?(giles)
Thanks for the detailed summary. I think showing a better message and being able to see the pushed commits immediately would bring parity with tbpl, and would be superior if it actually shows what was pushed, not just what's new in the tree.
Flags: needinfo?(giles)
(In reply to Ralph Giles (:rillian) from comment #3)
> being
> able to see the pushed commits immediately would bring parity with tbpl,
> and would be superior if it actually shows what was pushed, not just what's
> new in the tree.

This unfortunately isn't what bug 1080408 is adding. It's adding a single link to the pushlog view already available on hg.mozilla.org, and not adding a list of files changed.

What is your use case for viewing a try push in the <5 mins after pushing?
When I push to try, the server responds with a link. If I'm interested in quick feedback, I click the link there, so it's open in a tab and I can watch results as they come in.

The most important thing is that this works. I don't want to have to dig through my scrollback, or check my email, to retrieve the link some time later when I think results might be available. I certainly failed to understand from the error that the page would later update with correct information. Knowing that is most of the issue for me personally.

Having information about the revisions pushed is important when I have multiple pushes active, so I can tell which tab is which. I also sometimes use the hg links in tbpl to confirm details of what I pushed.

There's also just cognitive dissonance. It doesn't make sense as a naive user that the hg server could give me a link immediately which the treeherder server couldn't serve for several minutes. You mention polling. If the backend needs to know the same data, can it not pull when it sees the id in the user request? Can the client not pre-fill with its own fetch, and replace it with data from the backend later?
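
To make the last suggestion concrete, here is a minimal sketch of the "pre-fill from the client's own fetch, replace with backend data later" idea. It is not how treeherder's UI is implemented; `push_view`, the dictionary keys, and the three "source" states are all hypothetical names chosen for illustration, and no network requests are made.

```python
def push_view(revision, backend_push=None, pushlog_entry=None):
    """Pick what to render for a revision, preferring backend data.

    backend_push:  data from the service once the push has been ingested
                   (hypothetical shape: {"commits": [...], "jobs": [...]}).
    pushlog_entry: data the client fetched itself from the pushlog
                   (hypothetical shape: {"changesets": [...]}).
    """
    if backend_push is not None:
        # Ingested: show commits together with whatever jobs exist so far.
        return {
            "revision": revision,
            "source": "backend",
            "commits": backend_push["commits"],
            "jobs": backend_push.get("jobs", []),
        }
    if pushlog_entry is not None:
        # Not ingested yet: show the commits immediately, with no jobs.
        return {
            "revision": revision,
            "source": "pushlog",
            "commits": pushlog_entry["changesets"],
            "jobs": [],
        }
    # Neither source knows the revision yet: show a "waiting" placeholder.
    return {"revision": revision, "source": "pending", "commits": [], "jobs": []}


# Immediately after pushing, only the client-side pushlog fetch has data.
early = push_view("8d578891aa89", pushlog_entry={"changesets": ["8d578891aa89"]})

# A later refresh finds the push in the backend and replaces the pre-filled view.
later = push_view(
    "8d578891aa89",
    backend_push={"commits": ["8d578891aa89"], "jobs": ["build"]},
    pushlog_entry={"changesets": ["8d578891aa89"]},
)
```

The point of the sketch is only that the link can return a valid page with commit information straight away, with job data filled in once the backend catches up.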
Blocks: 1096863
Depends on: 1096919
I think there are a few different aspects to this bug:
1) The page refresh actually seems broken at the moment -> bug 1090531.
2) A UX/reassurance issue (unclear that lag is expected & page will refresh) -> bug 1080408.
3) Potentially some workflows can circumvent Treeherder altogether -> the tweak in bug 1080408 comment 18, plus bug 1096917.
4) Sheriffs watching non-try repos could still benefit from pushes appearing sooner, since it makes it easier to spot that, say, someone has already pushed a backout/followup fix, when about to close the tree/investigate. -> Longer term: bug 1022701, short term: bug 1096919.
Depends on: 1096917, 1090531
Keywords: meta
Summary: Reduce time taken for pushes to show in Treeherder, so it's closer to TBPL behaviour → [Meta] Reduce/mitigate time taken for pushes to show in Treeherder (compared to TBPL client-side fetching)
(In reply to Ed Morley [:edmorley] from comment #6)
> I think there are a few different aspects to this bug:
> 1) The page refresh actually seems broken at the moment -> bug 1090531.
> 2) A UX/reassurance issue (unclear that lag is expected & page will refresh)
> -> bug 1080408.

These two already block bug 1059400 directly, and the rest IMO aren't severe enough to cause anyone not to switch, so I'm making this bug no longer block bug 1059400, given it's pretty open-ended otherwise.
No longer blocks: treeherder-dev-transition
Keywords: regression
Component: Treeherder → Treeherder: Data Ingestion
Depends on: 1124269
No longer depends on: 1124269
Priority: P2 → P3
Depends on: 1065567
No longer depends on: 1022701
Treeherder now ingests both Hg and Git pushes via pulse (the former as of bug 1065567), so the ingestion time is much quicker. The UI still only polls the backend every 60s, but for the "opening a try link from an email/hg push console message" use-case, this doesn't matter.
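
For illustration only, the difference from polling is that ingestion is now driven by incoming messages rather than by a timer. The sketch below uses an in-process `queue.Queue` as a stand-in for the message broker; it does not use the real pulse client or message format, and `ingest_from_queue` and the message fields are hypothetical.

```python
import queue


def ingest_from_queue(messages, store):
    """Drain pending push notifications and record each push by revision.

    messages: a queue.Queue of dicts with a "revision" key (hypothetical format).
    store:    a dict standing in for the service's database.
    Returns the revisions ingested by this call, in arrival order.
    """
    ingested = []
    while True:
        try:
            message = messages.get_nowait()
        except queue.Empty:
            break  # nothing left to process right now
        store[message["revision"]] = message
        ingested.append(message["revision"])
    return ingested


# The publisher side announces a push as soon as it completes...
notifications = queue.Queue()
notifications.put({"repo": "try", "revision": "8d578891aa89"})

# ...and the consumer ingests it without waiting for the next poll.
pushes = {}
new_revisions = ingest_from_queue(notifications, pushes)
```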
Status: NEW → RESOLVED
Closed: 7 years ago
Resolution: --- → FIXED