Closed Bug 1547084 Opened 6 years ago Closed 5 years ago

Support many replaying children

Tracking

(firefox68 fixed)

Status:

RESOLVED FIXED

Milestone:

mozilla68

Tracking Flags:

firefox68: Tracking ---, Status fixed

People

(Reporter: bhackett1024, Assigned: bhackett1024)

References

(Blocks 1 open bug)

Details

Attachments

(9 files)

patch 5 years ago Brian Hackett [Laid off!] (deleted), patch
Bug 1547084 Part 1 - Use correct element type when indicating size of freed HashTable buffers, r=luke. 5 years ago Brian Hackett [Laid off!] (deleted), text/x-phabricator-request
Bug 1547084 Part 2 - Remove recordReplayDirective interface and uses, r=mccr8. 5 years ago Brian Hackett [Laid off!] (deleted), text/x-phabricator-request
Bug 1547084 Part 3 - Remove reverseStepIn and reverseStepOut logic, r=loganfsmyth. 5 years ago Brian Hackett [Laid off!] (deleted), text/x-phabricator-request
Bug 1547084 Part 4 - C++ changes and removal for new control logic, r=loganfsmyth. 5 years ago Brian Hackett [Laid off!] (deleted), text/x-phabricator-request
Bug 1547084 Part 5 - Switch to a search focused control logic architecture, r=loganfsmyth. 5 years ago Brian Hackett [Laid off!] (deleted), text/x-phabricator-request
Bug 1547084 Part 6 - Debugger changes for new control logic, r=loganfsmyth. 5 years ago Brian Hackett [Laid off!] (deleted), text/x-phabricator-request
Bug 1547084 Part 7 - Console changes for new control logic, r=nchevobbe. 5 years ago Brian Hackett [Laid off!] (deleted), text/x-phabricator-request
Bug 1547084 Part 8 - Test changes for new control logic, r=loganfsmyth. 5 years ago Brian Hackett [Laid off!] (deleted), text/x-phabricator-request

Brian Hackett [Laid off!]

Assignee

Description

6 years ago

Right now a middleman process supports at most four child processes at once: an optional recording process, two replaying processes for restoring old program states, and an optional replaying process which searches for logpoint hits. When integrated with the cloud (or on a powerful local machine) we could support many more replaying processes, which would make it much faster to restore program states at particular points or search the recording for logpoint hits.

As part of this, the middleman should also be able to survive replaying children that crash or become unresponsive. Right now the middleman crashes immediately if any of its children crash. We used to start up a new replaying process to recover after a crash, but because of the limited number of children the UI could stay unresponsive for long periods after a crash, so this feature was removed. Having more replaying children makes it more likely that one will crash, but it also means that after a crash the UI should recover faster, using a child that is already at a nearby point.
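
The recovery idea above can be illustrated with a small Python sketch. It is a hypothetical policy, not something this bug implemented (crash recovery was deferred to another bug): when a child dies, pick a surviving child whose saved point is the latest one not past the point the user was viewing, so the least code has to be replayed to get back there. `ReplayingChild`, `saved_points`, and `pick_recovery_child` are illustrative names, and execution points are modeled, as an assumption, as `(checkpoint, progress)` pairs.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Tuple

# Hypothetical model: an execution point is (checkpoint, progress).
# Python compares tuples lexicographically, so earlier points sort first.
ExecutionPoint = Tuple[int, int]


@dataclass
class ReplayingChild:
    child_id: int
    # Points this child has saved and can restore cheaply (hypothetical).
    saved_points: List[ExecutionPoint] = field(default_factory=list)
    alive: bool = True


def pick_recovery_child(
    children: List[ReplayingChild], target: ExecutionPoint
) -> Optional[Tuple[ReplayingChild, ExecutionPoint]]:
    """Return the live child and saved point closest to `target` without passing it."""
    best: Optional[Tuple[ReplayingChild, ExecutionPoint]] = None
    for child in children:
        if not child.alive:
            continue  # skip crashed or unresponsive children
        for point in child.saved_points:
            if point <= target and (best is None or point > best[1]):
                best = (child, point)
    return best


if __name__ == "__main__":
    children = [
        ReplayingChild(1, [(0, 0), (3, 0)], alive=False),  # this child crashed
        ReplayingChild(2, [(0, 0), (2, 0)]),
        ReplayingChild(3, [(0, 0), (1, 0), (4, 0)]),
    ]
    chosen = pick_recovery_child(children, (3, 50))
    print(chosen[0].child_id, chosen[1])  # 2 (2, 0)
```

With more children there are more saved points to choose from, so the chosen point tends to sit closer to the target and less replaying is needed after a crash.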

Calixte Denizet (:calixte)

Updated

6 years ago

Type: defect → enhancement

Jason Laster [:jlast]

Updated

6 years ago

Priority: -- → P5

Brian Hackett [Laid off!]

Assignee

Comment 1

5 years ago

Attached patch patch (deleted)

This is a big patch that makes the architectural changes necessary to allow efficient use of many replaying children. The main problem with the existing approach is that it is focused on running back and forth through the recording in search of breakpoint hits. With many children, this process is inefficient and hard to coordinate.

The new strategy introduced by this patch reorients things around scans of the recording. As the recording is being made, we have different children scanning different parts of the recording. The scans determine the set of execution points where each script location is hit. Scanning is slower than simple replaying, but by spreading the load between any number of replaying processes we'll be able to keep the scan data up to date even while the recording continues to grow. The scan data is used as the basis for most debugger features. When the user wants to run forward or backward through the interior of the recording, we don't re-execute that code (as we have to do now) but just look at the scan data to figure out the execution point where the next breakpoint hit will occur. Once that point is determined, we warp a replaying process there and pause. Similarly, logpoints are handled by looking at the scan data to find all the execution points where the logpoint should hit, then sending different background replaying processes to those points. The entire recording does not need to be re-executed, as it does now.
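
To make the scan-based lookup concrete, here is a minimal Python sketch, not the patch's actual JS/C++ code. It assumes a hypothetical model in which an execution point is a `(checkpoint, progress)` pair and a script location is a `"script:line"` string; `ScanData`, `next_hit`, `previous_hit`, and `assign_logpoint_hits` are illustrative names. It shows how per-region scan results from several children could be merged, how the next or previous breakpoint hit can be found by binary search over the scan data instead of re-executing code, and one possible (round-robin) way to spread logpoint hits over background children. Warping a child to the chosen point is not shown.

```python
from bisect import bisect_left, bisect_right
from typing import Dict, Iterable, List, Optional, Tuple

# Hypothetical model: an execution point is (checkpoint, progress).
# Python compares tuples lexicographically, so earlier points sort first.
ExecutionPoint = Tuple[int, int]
Location = str  # hypothetical "script:line" key for a script location


class ScanData:
    """Maps each script location to the sorted execution points where it is hit."""

    def __init__(self) -> None:
        self._hits: Dict[Location, List[ExecutionPoint]] = {}

    def add_region_scan(self, region_hits: Dict[Location, List[ExecutionPoint]]) -> None:
        """Merge the hits reported by one child that scanned one region."""
        for location, points in region_hits.items():
            self._hits[location] = sorted(self._hits.get(location, []) + list(points))

    def hits(self, location: Location) -> List[ExecutionPoint]:
        return list(self._hits.get(location, []))

    def next_hit(self, locations: Iterable[Location],
                 current: ExecutionPoint) -> Optional[ExecutionPoint]:
        """Earliest hit on any breakpoint location strictly after `current`."""
        best: Optional[ExecutionPoint] = None
        for location in locations:
            points = self._hits.get(location, [])
            i = bisect_right(points, current)
            if i < len(points) and (best is None or points[i] < best):
                best = points[i]
        return best

    def previous_hit(self, locations: Iterable[Location],
                     current: ExecutionPoint) -> Optional[ExecutionPoint]:
        """Latest hit on any breakpoint location strictly before `current`."""
        best: Optional[ExecutionPoint] = None
        for location in locations:
            points = self._hits.get(location, [])
            i = bisect_left(points, current)
            if i > 0 and (best is None or points[i - 1] > best):
                best = points[i - 1]
        return best


def assign_logpoint_hits(points: List[ExecutionPoint],
                         num_children: int) -> Dict[int, List[ExecutionPoint]]:
    """Spread logpoint hits round-robin over background replaying children."""
    assignment: Dict[int, List[ExecutionPoint]] = {c: [] for c in range(num_children)}
    for index, point in enumerate(points):
        assignment[index % num_children].append(point)
    return assignment


if __name__ == "__main__":
    scan = ScanData()
    scan.add_region_scan({"app.js:10": [(2, 5), (2, 40)]})  # child A scanned checkpoint 2
    scan.add_region_scan({"app.js:10": [(1, 7)], "app.js:20": [(1, 9)]})  # child B, checkpoint 1
    print(scan.next_hit(["app.js:10", "app.js:20"], (1, 8)))  # (1, 9)
    print(scan.previous_hit(["app.js:10"], (2, 40)))  # (2, 5)
    print(assign_logpoint_hits(scan.hits("app.js:10"), 2))  # {0: [(1, 7), (2, 40)], 1: [(2, 5)]}
```

Because each lookup is a binary search over already-sorted hits, stepping cost no longer depends on how much code lies between the current point and the next hit; the cost moves into keeping the scans current, which is the work the patch spreads across children.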

The end result of this is a system that can efficiently operate with many children in parallel, while still keeping things simple and elegant. This patch is a net loss of more than 2000 lines of C++ (most of it the navigation logic used when searching for breakpoint hits) and a net gain of about 350 lines of JS.

Performance right now will not be great, especially when stepping. This patch works well enough to pass tests but hasn't been optimized at all. There will be a fair number of additional changes to get performance where it should be, per the dependencies of bug 1547081. Also, this patch doesn't take care of crash recovery, which will need to be done in another bug.

Assignee: nobody → bhackett1024

Brian Hackett [Laid off!]

Assignee

Comment 2

5 years ago

Attached file Bug 1547084 Part 1 - Use correct element type when indicating size of freed HashTable buffers, r=luke. (deleted)

Brian Hackett [Laid off!]

Assignee

Comment 3

5 years ago

Attached file Bug 1547084 Part 2 - Remove recordReplayDirective interface and uses, r=mccr8. (deleted)

Depends on D30841

Brian Hackett [Laid off!]

Assignee

Comment 4

5 years ago

Attached file Bug 1547084 Part 3 - Remove reverseStepIn and reverseStepOut logic, r=loganfsmyth. (deleted)

Depends on D30842

Brian Hackett [Laid off!]

Assignee

Comment 5

5 years ago

Attached file Bug 1547084 Part 4 - C++ changes and removal for new control logic, r=loganfsmyth. (deleted)

Depends on D30843

Brian Hackett [Laid off!]

Assignee

Comment 6

5 years ago

Attached file Bug 1547084 Part 5 - Switch to a search focused control logic architecture, r=loganfsmyth. (deleted)

Depends on D30844

Brian Hackett [Laid off!]

Assignee

Comment 7

5 years ago

Attached file Bug 1547084 Part 6 - Debugger changes for new control logic, r=loganfsmyth. (deleted)

Depends on D30845

Brian Hackett [Laid off!]

Assignee

Comment 8

5 years ago

Attached file Bug 1547084 Part 7 - Console changes for new control logic, r=nchevobbe. (deleted)

Depends on D30846

Brian Hackett [Laid off!]

Assignee

Comment 9

5 years ago

Attached file Bug 1547084 Part 8 - Test changes for new control logic, r=loganfsmyth. (deleted)

Depends on D30847

Jason Laster [:jlast]

Comment 10

5 years ago

The goal of the bug is to make Web Replay's architecture more performant.

One happy side effect is that the architecture is also significantly less intrusive on the platform (number of messages sent, child processes, hash tables). This helps us achieve our goal of reducing the platform footprint.

Pulsebot

Comment 11

5 years ago

Pushed by bhackett@mozilla.com:
https://hg.mozilla.org/integration/mozilla-inbound/rev/6cd72bb81d89
Part 1 - Remove recordReplayDirective interface and uses, r=mccr8.
https://hg.mozilla.org/integration/mozilla-inbound/rev/ae4aa5c6c430
Part 2 - Remove reverseStepIn and reverseStepOut logic, r=loganfsmyth.
https://hg.mozilla.org/integration/mozilla-inbound/rev/4d7ef85fc81f
Part 3 - C++ changes and removal for new control logic, r=loganfsmyth.
https://hg.mozilla.org/integration/mozilla-inbound/rev/ffc633295190
Part 4 - Switch to a search focused control logic architecture, r=loganfsmyth.
https://hg.mozilla.org/integration/mozilla-inbound/rev/fdd4b06edab9
Part 5 - Debugger changes for new control logic, r=loganfsmyth.
https://hg.mozilla.org/integration/mozilla-inbound/rev/85b94102fa34
Part 6 - Console changes for new control logic, r=nchevobbe.
https://hg.mozilla.org/integration/mozilla-inbound/rev/d394c173d6a4
Part 7 - Test changes for new control logic, r=loganfsmyth.

Natalia Csoregi [:nataliaCs]

Comment 12

5 years ago

bugherder

https://hg.mozilla.org/mozilla-central/rev/6cd72bb81d89
https://hg.mozilla.org/mozilla-central/rev/ae4aa5c6c430
https://hg.mozilla.org/mozilla-central/rev/4d7ef85fc81f
https://hg.mozilla.org/mozilla-central/rev/ffc633295190
https://hg.mozilla.org/mozilla-central/rev/fdd4b06edab9
https://hg.mozilla.org/mozilla-central/rev/85b94102fa34
https://hg.mozilla.org/mozilla-central/rev/d394c173d6a4

Status: NEW → RESOLVED

Closed: 5 years ago

status-firefox68: --- → fixed

Resolution: --- → FIXED

Target Milestone: --- → mozilla68

Nicolas Chevobbe [:nchevobbe]

Updated

5 years ago

Regressions: 1552420

BMO Automation

Updated

5 years ago

Product: Core → Core Graveyard


Categories

(Core Graveyard :: Web Replay, enhancement, P5)


Security

(public)
