Support many replaying children
Categories
(Core Graveyard :: Web Replay, enhancement, P5)
Tracking
(firefox68 fixed)
| | Tracking | Status |
|---|---|---|
| firefox68 | --- | fixed |
People
(Reporter: bhackett1024, Assigned: bhackett1024)
References
(Blocks 1 open bug)
Attachments
(9 files, all deleted: 1 patch and 8 text/x-phabricator-request)
Right now a middleman process supports at most four child processes at once: an optional recording process, two replaying processes for restoring old program states, and an optional replaying process which searches for logpoint hits. When integrated with the cloud (or on a powerful local machine) we could support many more replaying processes, which would make it much faster to restore program states at particular points or search the recording for logpoint hits.
As part of this, the middleman should also be able to survive replaying children that crash or go unresponsive. Right now the middleman crashes immediately if any of its children crashes. We used to start up a new replaying process to recover after a crash, but because of the limited number of children the UI could go unresponsive for long periods afterwards, and this feature was removed. Having more replaying children makes it more likely that one will crash, but also means that it should be faster to recover the UI with a child at a nearby point after a crash occurs.
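To illustrate the recovery idea, here is a minimal sketch of choosing which surviving child takes over after a crash. It is not code from the patch. The child shape (`id`, `point`, `responsive`), the function name `pickRecoveryChild`, and the use of a plain number as an execution point are all assumptions made for illustration, as is the choice of absolute distance as the measure of "nearby".

```javascript
// Hypothetical model: each replaying child is described as
// { id, point, responsive }, where `point` is a numeric stand-in
// for the execution point the child is currently paused at.
function pickRecoveryChild(children, target) {
  let best = null;
  for (const child of children) {
    // Skip children that have crashed or stopped responding.
    if (!child.responsive) {
      continue;
    }
    const distance = Math.abs(child.point - target);
    if (best === null || distance < Math.abs(best.point - target)) {
      best = child;
    }
  }
  // null means no child survived and a new one would have to be started.
  return best;
}
```

With more children spread through the recording, the distance between the chosen child and the target shrinks, which is why recovery should get faster as the child count grows. A real implementation might prefer children positioned before the target rather than simply the closest one.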
Comment 1 (Assignee) • 5 years ago
This is a big patch that makes the architectural changes necessary to allow efficient use of many replaying children. The main problem with the existing approach is that it is focused on running back and forth through the recording in search of breakpoint hits. With many children, this process is inefficient and hard to coordinate.
The new strategy introduced by this patch reorients things around scans of the recording. As the recording is being made, different children scan different parts of it. The scans determine the set of execution points where each script location is hit. Scanning is slower than simple replaying, but by spreading the load across any number of replaying processes we'll be able to keep the scan data up to date even while the recording continues to grow.
The scan data is used as the basis for most debugger features. When the user wants to run forward or backward through the interior of the recording, we don't reexecute that code (as we have to do now), but just look at the scan data to find the execution point where the next breakpoint hit will occur. Once that point is determined, we warp a replaying process there and pause. Similarly, logpoints are handled by looking at the scan data to find all the execution points where the logpoint should hit, then sending different background replaying processes to those points. The entire recording no longer needs to be reexecuted, as it is now.
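The lookup described above can be sketched roughly as follows. This is an illustration only, not the patch's code: the `ScanData` class, its method names, the `"file:line"` location keys, and the use of plain numbers as execution points are all assumptions. Distributing logpoint hits round-robin is likewise just one simple way to spread them across background children.

```javascript
// Hypothetical model: an execution point is a plain number that grows
// as the recording advances; a script location is a "file:line" string.
class ScanData {
  constructor() {
    // Map from script location to a sorted array of execution points.
    this.hits = new Map();
  }

  // Merge the hits reported by one child after scanning one region.
  addHits(location, points) {
    const merged = (this.hits.get(location) || []).concat(points);
    merged.sort((a, b) => a - b);
    this.hits.set(location, merged);
  }

  // Earliest hit strictly after `point` at any breakpoint location, or null.
  nextHit(breakpoints, point) {
    let best = null;
    for (const location of breakpoints) {
      const hit = (this.hits.get(location) || []).find((p) => p > point);
      if (hit !== undefined && (best === null || hit < best)) {
        best = hit;
      }
    }
    return best;
  }

  // Latest hit strictly before `point` at any breakpoint location, or null.
  previousHit(breakpoints, point) {
    let best = null;
    for (const location of breakpoints) {
      const earlier = (this.hits.get(location) || []).filter((p) => p < point);
      if (earlier.length > 0) {
        const hit = earlier[earlier.length - 1];
        if (best === null || hit > best) {
          best = hit;
        }
      }
    }
    return best;
  }
}

// Spread logpoint hit points across `childCount` background children.
function assignLogpointHits(points, childCount) {
  const assignments = Array.from({ length: childCount }, () => []);
  points.forEach((point, i) => assignments[i % childCount].push(point));
  return assignments;
}
```

Resuming forward from point 22 with breakpoints at two locations then reduces to `scan.nextHit(["a.js:10", "b.js:3"], 22)`, after which a replaying process would be warped to the returned point. No code between the two points is reexecuted to find the hit.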
The end result of this is a system that can efficiently operate with many children in parallel, while still keeping things simple and elegant. This patch is a net loss of more than 2000 lines of C++ (most of it the navigation logic used when searching for breakpoint hits) and a net gain of about 350 lines of JS.
Performance right now will not be great, especially when stepping. This patch works well enough to pass tests but hasn't been optimized at all. There will be a fair number of additional changes to get performance where it should be, per the dependencies of bug 1547081. Also, this patch doesn't take care of crash recovery, which will need to be done in another bug.
Comment 2 (Assignee) • 5 years ago
Comment 3 (Assignee) • 5 years ago
Depends on D30841
Comment 4 (Assignee) • 5 years ago
Depends on D30842
Comment 5 (Assignee) • 5 years ago
Depends on D30843
Comment 6 (Assignee) • 5 years ago
Depends on D30844
Comment 7 (Assignee) • 5 years ago
Depends on D30845
Comment 8 (Assignee) • 5 years ago
Depends on D30846
Comment 9 (Assignee) • 5 years ago
Depends on D30847
Comment 10 • 5 years ago
The goal of this bug is to make Web Replay's architecture more performant.
One happy side effect is that the architecture is also significantly less intrusive on the platform (number of messages sent, child processes, hash tables). This helps us achieve our goal of reducing the platform footprint.
Comment 11 • 5 years ago
Comment 12 • 5 years ago
bugherder
https://hg.mozilla.org/mozilla-central/rev/6cd72bb81d89
https://hg.mozilla.org/mozilla-central/rev/ae4aa5c6c430
https://hg.mozilla.org/mozilla-central/rev/4d7ef85fc81f
https://hg.mozilla.org/mozilla-central/rev/ffc633295190
https://hg.mozilla.org/mozilla-central/rev/fdd4b06edab9
https://hg.mozilla.org/mozilla-central/rev/85b94102fa34
https://hg.mozilla.org/mozilla-central/rev/d394c173d6a4