Closed
Bug 1362953
Opened 7 years ago
Closed 6 years ago
Refactor lock-protected TelemetryHistogram code
Categories
(Toolkit :: Telemetry, enhancement, P4)
Toolkit
Telemetry
Tracking
()
RESOLVED
FIXED
mozilla62
Tracking | Status | |
---|---|---|
firefox62 | --- | fixed |
People
(Reporter: gfritzsche, Assigned: mcalabrese85, Mentored)
References
(Blocks 2 open bugs)
Details
(Whiteboard: [lang=c++])
Attachments
(1 file, 4 obsolete files)
In TelemetryHistogram.cpp [1], all functions that are named `internal_*()` are meant to be called with a global TelemetryHistogram lock to be held (on `gTelemetryHistogramMutex`).
A safer pattern is to require passing a lock instance as seen in [2].
We should refactor TelemetryHistogram to that pattern, so e.g.:
> void
> internal_SetHistogramRecordingEnabled(...)
... becomes
> void
> SetHistogramRecordingEnabled(const StaticMutexAutoLock& lock, ...)
... and callers become:
> internal_SetHistogramRecordingEnabled(locker, ...);
1: https://dxr.mozilla.org/mozilla-central/source/toolkit/components/telemetry/TelemetryHistogram.cpp
2: https://dxr.mozilla.org/mozilla-central/rev/17d8a1e278a9c54a6fdda9d390abce4077e55b20/toolkit/components/telemetry/TelemetryEvent.cpp#280
Reporter | ||
Comment 1•7 years ago
|
||
This will require:
- Changing TelemetryHistogram.cpp as described.
- We will also need to update the comments to match the code changes.
- Making sure tests pass:
- first: mach test toolkit/components/telemetry/tests/unit/test_TelemetryHistograms.js
- before submitting a patch, also make sure the other tests pass too: mach test toolkit/components/telemetry/tests/unit/
Reporter | ||
Comment 2•7 years ago
|
||
Note that the following functions can't be adopted yet:
- internal_JSHistogram_*()
- internal_WrapAndReturnHistogram()
- internal_KeyedHistogram_SnapshotImpl()
- internal_JSKeyedHistogram_*()
- internal_WrapAndReturnKeyedHistogram()
This is due to bug 1278821.
Reporter | ||
Updated•7 years ago
|
Mentor: chutten
Comment 3•7 years ago
|
||
I was able to build successfully. The patch passed all tests.
Some notes:
The methods |KeyedHistogram::Add()|, |KeyedHistogram::GetHistogram()| and |KeyedHistogram::GetEnumId()| also now require a lock instance to be passed as a parameter, as they are called from |internal_*()| methods and also call certain |internal_*()| methods (specifically, KeyedHistogram::GetHistogram() calls internal_HistogramGet() and KeyedHistogram::GetEnumId() calls internal_GetHistogramEnumId()).
The method |KeyedHistogram::ReflectKeyedHistogram()| does not require passing a lock instance even though it calls |internal_ReflectHistogramSnapshot()|. It, along with a few of the listed methods that cannot be adopted yet, but call other |internal_*()| methods, acquires |StaticMutexAutoLock locker(gTelemetryHistogramMutex);| with a limited scope.
Attachment #8866551 -
Flags: review?(chutten)
Comment 4•7 years ago
|
||
Comment on attachment 8866551 [details] [diff] [review]
Proposed patch for bug 1362953.
Review of attachment 8866551 [details] [diff] [review]:
-----------------------------------------------------------------
Overall you should not be creating any lockers. If you find yourself in a situation where you have to, then it gets interesting.
If any call descending from where you think you need a new locker might call into the JS engine, then we can't do it. Deadlocks will hit us. In that case we need to remove the locker parameter from those internal_* methods and accept that we may race.
If it doesn't call into the JS engine, then a tightly-scoped new locker may be possible. Call these cases out specifically when you put it up for review so that I don't accidentally miss one :)
::: toolkit/components/telemetry/TelemetryHistogram.cpp
@@ +858,5 @@
> +// it is used, and (2) because it is never de-initialised, and
> +// a normal Mutex would show up as a leak in BloatView. StaticMutex
> +// also has the "OffTheBooks" property, so it won't show as a leak
> +// in BloatView.
> +static StaticMutex gTelemetryHistogramMutex;
It doesn't appear as though you need to move these lines
@@ +1081,5 @@
> return false;
> }
>
> + {
> + StaticMutexAutoLock locker(gTelemetryHistogramMutex);
This is likely to cause problems as reflecting the histogram snapshot could cause the JS engine to try to allocate memory which could cause it to start a GC which would cause it to accumulate telemetry which would deadlock on trying to reacquire this mutex.
Your patch should not actually contain any StaticMutexAutoLock creations. It should only pass the ones around that already have been created.
@@ +2058,5 @@
> TelemetryHistogram::Accumulate(mozilla::Telemetry::HistogramID aID,
> const nsCString& aKey, uint32_t aSample)
> {
> + {
> + StaticMutexAutoLock locker(gTelemetryHistogramMutex);
In this case and a few others in your patch you can just raise the original `locker` to the top of the function.
Attachment #8866551 -
Flags: review?(chutten) → review-
Comment 5•7 years ago
|
||
(In reply to Chris H-C :chutten from comment #4)
> Comment on attachment 8866551 [details] [diff] [review]
> Proposed patch for bug 1362953.
>
> Review of attachment 8866551 [details] [diff] [review]:
> -----------------------------------------------------------------
> Your patch should not actually contain any StaticMutexAutoLock creations. It
> should only pass the ones around that already have been created.
Oops. I thought it needs to be enforced to all |internal_*()| methods, which necessitated adding more. I've fixed this now.
> @@ +2058,5 @@
> > TelemetryHistogram::Accumulate(mozilla::Telemetry::HistogramID aID,
> > const nsCString& aKey, uint32_t aSample)
> > {
> > + {
> > + StaticMutexAutoLock locker(gTelemetryHistogramMutex);
>
> In this case and a few others in your patch you can just raise the original
> `locker` to the top of the function.
That would've worked too. I thought there was a specific reason behind keeping it lower, and hence, chose fine-grained locking over coarse-grained locking. I've fixed this now.
Comment 6•7 years ago
|
||
I was able to build successfully. The patch passed all tests.
I've added a tightly scoped locker to internal_IdentifyCorruptHistograms(), even though it is called by TelemetryHistogram::CreateHistogramSnapshots(), as it would allow enforcing the pattern to internal_Accumulate(), and subsequently, to internal_RemoteAccumulate(), internal_HistogramAdd() and internal_CanRecordExtended().
For the above-mentioned implementation, the declaration of |gTelemetryHistogramMutex| has been moved up.
The following functions have been left untouched :
** internal_CanRecordBase() -> called by internal_JSHistogram_Add()
** internal_GetHistogramEnumId() -> called by internal_JSHistogram_Add()
** internal_HistogramClear() -> called by internal_JSHistogram_Clear()
** internal_ReflectHistogramSnapshot() -> called by internal_JSHistogram_Snapshot(), internal_KeyedHistogram_SnapshotImpl(), and TelemetryHistogram::CreateHistogramSnapshots()
** internal_IsEmpty() -> called by TelemetryHistogram::CreateHistogramSnapshots()
** internal_IsExpired() -> called by TelemetryHistogram::CreateHistogramSnapshots()
** internal_ShouldReflectHistogram() -> called only by TelemetryHistogram::CreateHistogramSnapshots()
** internal_IdentifyCorruptHistograms() -> called only by TelemetryHistogram::CreateHistogramSnapshots()
** internal_GetHistogramByEnumId() -> called by TelemetryHistogram::CreateHistogramSnapshots()
** internal_GetSubsessionHistogram() -> called by TelemetryHistogram::CreateHistogramSnapshots()
** internal_GetHistogramMapEntry() -> called by internal_GetHistogramEnumId(), which is further called by internal_JSHistogram_Add().
Since internal_GetHistogramEnumId() can also be called by functions possessing the lock, it cannot try to possess lock again.
** internal_CloneHistogram() -> called by internal_GetSubsessionHistogram(), which is further called by TelemetryHistogram::CreateHistogramSnapshots().
Since internal_GetSubsessionHistogram() can also be called by functions possessing the lock, it cannot try to possess the lock again.
** internal_HistogramGet() -> called by internal_GetHistogramByEnumId(), which is further called by TelemetryHistogram::CreateHistogramSnapshots().
Since internal_GetHistogramByEnumId() can also be called by functions possessing the lock, it cannot try to possess the lock again.
** internal_ReflectHistogramAndSamples() -> called only by internal_ReflectHistogramSnapshot(). Therefore, left untouched.
** internal_FillRanges() -> called only by internal_ReflectHistogramAndSamples(). Therefore, left untouched.
** internal_CheckHistogramArguments() -> called only by internal_HistogramGet(). Therefore, left untouched.
Attachment #8866551 -
Attachment is obsolete: true
Attachment #8867307 -
Flags: review?(chutten)
Comment 7•7 years ago
|
||
Comment on attachment 8867307 [details] [diff] [review]
Rectified patch for bug 1362953.
Review of attachment 8867307 [details] [diff] [review]:
-----------------------------------------------------------------
::: toolkit/components/telemetry/TelemetryHistogram.cpp
@@ +742,5 @@
> } else if (check & Histogram::COUNT_LOW_ERROR) {
> corruptID = mozilla::Telemetry::TOTAL_COUNT_LOW_ERRORS;
> }
> + {
> + StaticMutexAutoLock locker(gTelemetryHistogramMutex);
If this is okay, then the entirety of internal_IdentifyCorruptHistograms is okay. If the entirety's okay, then you can push the tightly-scoped locker into TelemetryHistogram::CreateHistogramSnapshots around the call site. Then this would be one fewer internal_* function missing a locker.
@@ +1174,4 @@
> }
>
> bool
> +internal_RemoteAccumulate(const StaticMutexAutoLock& lock,
end-of-line whitespace should be omitted.
@@ +1209,4 @@
> return true;
> }
>
> +void internal_Accumulate(const StaticMutexAutoLock& lock,
more EOL whitespace
Attachment #8867307 -
Flags: review?(chutten) → review-
Updated•7 years ago
|
Assignee: nobody → vedant.sareen
Status: NEW → ASSIGNED
Comment 8•7 years ago
|
||
Attachment #8867307 -
Attachment is obsolete: true
Attachment #8867849 -
Flags: review?(chutten)
Comment 9•7 years ago
|
||
Comment on attachment 8867849 [details] [diff] [review]
Cleaned up patch for bug 1362953.
Review of attachment 8867849 [details] [diff] [review]:
-----------------------------------------------------------------
Looks good, I'll send this to try tomorrow morning.
In the meantime, if you're looking at getting the rest of these in Part 2, may I recommend tracking your work on bug 1278821? Apparently we have a bug for it we've been neglecting this past little while :)
Attachment #8867849 -
Flags: review?(chutten) → review+
Comment 10•7 years ago
|
||
Comment 11•7 years ago
|
||
It seems as though some of the media tests are, once again, showing us when Telemetry locks up. Somewhere in this patch we must have introduced a deadlock.
:jseward, there are tools available to help us out with this, right? I seem to recall the name "TSan" being bandied about, is that available on all of our Tier 1 platforms? (and is there documentation about running it?)
Flags: needinfo?(jseward)
Comment 12•7 years ago
|
||
TSan is one of those tools that "probably works for Firefox" most of the time,
but AFAIK nobody actually ensures that Fx always works on it. It's only
available on x86_64-linux. I haven't run it for about six months.
If you can reproduce the hangs in a debugger, obviously the thing to do is look
at the stacks of the 2 (or more) threads involved.
If not, you have a difficult problem. One thing you could look for is
inconsistencies in lock order acquisition. That is, if there are two locks, A
and B, that must be acquired in order to complete some activity, then all
threads must acquire them in the same order. It doesn't matter what the order
is (it can be A first and then B, or B first and then A). What can happen if
the order is inconsistent, is that one thread can acquire A and then wait for
B, whilst another acquires B and then waits for A. So they deadlock.
I believe that debug builds of Firefox have some amount lock cycle detection
built in, although I suspect that doesn't apply to all flavours of locks.
Nathan might know more.
Flags: needinfo?(jseward) → needinfo?(nfroyd)
Comment 13•7 years ago
|
||
(In reply to Julian Seward [:jseward] from comment #12)
> I believe that debug builds of Firefox have some amount lock cycle detection
> built in, although I suspect that doesn't apply to all flavours of locks.
> Nathan might know more.
Yes, we'll do deadlock detection for mozilla::Mutex. We won't do it for the lock flavor used by the IPC code, which appears to be implicated in some of the crash stacks from that try run. Perhaps converting those locks to mozilla::Mutex, which would be an excellent cleanup on its own, might assist you in finding the bug?
Flags: needinfo?(nfroyd)
Comment 14•7 years ago
|
||
According to the comments[1] we are using StaticMutex to ensure thread-safe initialization and to skip BloatView. Are those concerns no longer founded? (can we convert safely?)
[1]: http://searchfox.org/mozilla-central/source/toolkit/components/telemetry/TelemetryHistogram.cpp#1135
Flags: needinfo?(nfroyd)
Comment 15•7 years ago
|
||
(In reply to Chris H-C :chutten from comment #14)
> According to the comments[1] we are using StaticMutex to ensure thread-safe
> initialization and to skip BloatView. Are those concerns no longer founded?
> (can we convert safely?)
Oh, sorry. When I said "lock flavor used by the IPC code", I meant the locks used primarily (only?) by the code under $SRCDIR/ipc:
http://dxr.mozilla.org/mozilla-central/source/ipc/chromium/src/base/lock.h#13
It looked like you might be calling into that code from Telemetry and dead/livelocking in some way, which would be difficult to detect with the IPC-specific locks not participating in our deadlock detection. (TSan runs ought to be able to catch that sort of thing, but our deadlock detection would have the virtue of working well on try, so...)
The comments that you refer to are still valid.
Flags: needinfo?(nfroyd)
Comment 16•7 years ago
|
||
(To be clear, the errors in communication were entirely on my end!)
Reporter | ||
Comment 17•7 years ago
|
||
Hi Vedant, are you still working on this?
Flags: needinfo?(gfritzsche)
Reporter | ||
Updated•7 years ago
|
Flags: needinfo?(gfritzsche) → needinfo?(vedantsareen)
Comment 18•7 years ago
|
||
Hey!
Yes, I am still working on it. However, my pace of work has been extremely slow, as I've been caught up in a lot of other work too :/
Since the bug activity has been idle for 4 months, I guess it would make sense to mark this bug as unassigned for now, just in case someone else is also interested in working on it.
If the bug remains free and if I'm able to make some progress in this, I'll make sure to upload a WIP patch for it :)
I humbly apologize for the inconvenience caused :/
Flags: needinfo?(vedantsareen)
Reporter | ||
Comment 19•7 years ago
|
||
That sounds good to me :)
And no worries, thanks for your work on it so far!
Assignee: vedantsareen → nobody
Status: ASSIGNED → NEW
Priority: P3 → P4
Comment 20•7 years ago
|
||
Hi, I'd like to try fixing this bug. Where should I start?
Flags: needinfo?(gfritzsche)
Reporter | ||
Comment 21•7 years ago
|
||
Chris spent some time on refactoring the TelemetryHistogram code recently.
Chris, can you help get Leo started?
Flags: needinfo?(gfritzsche) → needinfo?(chutten)
Comment 22•7 years ago
|
||
The first step would be to get the Firefox source code and build it. This is not an easy bug, so I will trust that you are able to find out how to do this on your own. If you can't, no worries! We have lots of easy bugs to start with[1] and you can always come back to this one later.
The second step would be to carefully read the bug comments above. There is a lot of information we learned through the first attempt at this bug, and any successful attempt would do well to keep those lessons in mind.
The third will be to rework the TelemetryHistogram code so that TelemetryHistogram::* methods continue to generate locks, passing them as proof to internal_* methods. (see early comments on this bug)
The fourth will be to test this locally. It should build without errors or warnings, and ./mach test toolkit/components/telemetry/tests should succeed without errors or deadlock or timeout.
The fifth will be to submit the code here as a patch that I can review. You can use MozReview or attach the patch as a file to this bug.
Then we review and retest for a bit. Then we push the patch to try to make sure it builds and tests on all of the Operating Systems we need to support.
Then we push it to mozilla-central and it gets included in a build!
Do you wish to continue with this bug?
[1]: https://www.joshmatthews.net/bugsahoy/?internals=1
Flags: needinfo?(chutten) → needinfo?(lkhodel)
Comment 23•7 years ago
|
||
(In reply to Chris H-C :chutten from comment #22)
> The first step would be to get the Firefox source code and build it. This is
> not an easy bug, so I will trust that you are able to find out how to do
> this on your own. If you can't, no worries! We have lots of easy bugs to
> start with[1] and you can always come back to this one later.
No problem, I've landed a very simple bug fix, so I know the process.
>
> The second step would be to carefully read the bug comments above. There is
> a lot of information we learned through the first attempt at this bug, and
> any successful attempt would do well to keep those lessons in mind.
Will do!
>
> The third will be to rework the TelemetryHistogram code so that
> TelemetryHistogram::* methods continue to generate locks, passing them as
> proof to internal_* methods. (see early comments on this bug)
>
> The fourth will be to test this locally. It should build without errors or
> warnings, and ./mach test toolkit/components/telemetry/tests should succeed
> without errors or deadlock or timeout.
>
> The fifth will be to submit the code here as a patch that I can review. You
> can use MozReview or attach the patch as a file to this bug.
>
> Then we review and retest for a bit. Then we push the patch to try to make
> sure it builds and tests on all of the Operating Systems we need to support.
>
> Then we push it to mozilla-central and it gets included in a build!
Got it.
>
> Do you wish to continue with this bug?
>
> [1]: https://www.joshmatthews.net/bugsahoy/?internals=1
Yes, I do. I'll get started and will let you know as soon as I have any questions.
Thanks!
Flags: needinfo?(lkhodel)
Updated•7 years ago
|
Assignee: nobody → lkhodel
Status: NEW → ASSIGNED
Comment 24•7 years ago
|
||
Hello :lvk, how are things progressing here? Anything I can help with?
Flags: needinfo?(lkhodel)
Comment 25•7 years ago
|
||
Hi :chutten, I'm getting familiar with the code that I'm going to be changing but I got ahead myself trying to build a callgraph for the internal methods using rtags. I've got Eclipse set up with the index and I'm going to try to walk through the call hierarchy through the UI, but it'd be really nice to have a tool that can help me do it more efficiently. I found 'Callgraph' here:
https://developer.mozilla.org/en-US/docs/Mozilla/Developer_guide/Callgraph
but it depends on Treehydra, which was abandoned in 2010, so I'm not sure it's the best option. Is there something else available that I'm missing? I've Googled around but couldn't find anything that worked out of the box.
Flags: needinfo?(lkhodel) → needinfo?(chutten)
Comment 26•7 years ago
|
||
Hrm, that's a tough one. My tooling amounts to just a couple of vim instances, so I might not be the best person to ask. The #developer channel on irc.mozilla.org[1] might be able to help you out.
In general, calls come in on TelemetryHistogram::* methods where they acquire the lock before calling any number of internal_* methods in any order. There are also the JS entrypoints we register (the JSHistogram and JSKeyedHistogram functions) which have a problematic lack of locks (which are being addressed in bug 1278821), and are alternate places calls may enter the module.
[1]: https://wiki.mozilla.org/Irc
Flags: needinfo?(chutten)
Comment 27•7 years ago
|
||
(In reply to Chris H-C :chutten from comment #26)
> Hrm, that's a tough one. My tooling amounts to just a couple of vim
> instances, so I might not be the best person to ask. The #developer channel
> on irc.mozilla.org[1] might be able to help you out.
Got it, thanks. I found Sourcetrail[1], which is free for non-commercial use, but haven't had a chance to get it running.
>
> In general, calls come in on TelemetryHistogram::* methods where they
> acquire the lock before calling any number of internal_* methods in any
> order. There are also the JS entrypoints we register (the JSHistogram and
> JSKeyedHistogram functions) which have a problematic lack of locks (which
> are being addressed in bug 1278821), and are alternate places calls may
> enter the module.
>
> [1]: https://wiki.mozilla.org/Irc
Good to know, thanks! I'm going to get started refactoring and will let you know if I have any questions.
[1]: https://www.sourcetrail.com/
Comment 28•7 years ago
|
||
Hello, Leo! How are things going with this bug?
If it hasn't been going well, no worries. I'm here to help. And if that help means helping find you a different bug to work on, that's okay, too!
Flags: needinfo?(lkhodel)
Updated•7 years ago
|
Assignee: lkhodel → nobody
Status: ASSIGNED → NEW
Flags: needinfo?(lkhodel)
Comment 29•7 years ago
|
||
Hello there,
Mind if I give this one a try ?
It seems like a heavy one !
Comment 30•7 years ago
|
||
Sure! Let me know if you have any questions.
Assignee: nobody → jeremy.lempereur
Status: NEW → ASSIGNED
Comment 31•7 years ago
|
||
Hey Jeremy, I know you have a few other things you're looking into across the project. Are you still planning on looking into this sometime soon?
Flags: needinfo?(jeremy.lempereur)
Comment 32•7 years ago
|
||
Hey Chris :),
Considering the Autotimer / Autocounter + benchmarks I'd rather pass on that one ^^
Flags: needinfo?(jeremy.lempereur)
Updated•7 years ago
|
Assignee: jeremy.lempereur → nobody
Status: ASSIGNED → NEW
Comment 33•7 years ago
|
||
Michael, would you be interested in taking this to the finish line?
Flags: needinfo?(mcalabrese85)
Assignee | ||
Comment 34•7 years ago
|
||
(In reply to Alessio Placitelli [:Dexter] from comment #33)
> Michael, would you be interested in taking this to the finish line?
Hey Alessio, I'm looking into this now. I may be a little slow on it for about two weeks as I'll be traveling, though. While I'm looking it over, I want to make sure I have the constraints right:
1. No new locks should be made, only the ones that are already there should be passed around.
2. Each lock will be passed into each internal_* function that follows the creation of the lock, and will continue to be passed on if the callee calls another internal_* function. Ex: internal_JSHistogram_Add creates a lock before calling internal_Accumulate. internal_Accumulate should take the lock as an argument and use it when it calls internal_RemoteAccumulate, which has also be modified to accept the lock as a parameter.
3. If necessary, fine-grained locks can become more coarse-grained ones if needed to cover more internal functions. This may be done in a function like TelemetryHistogram::SetHistogramRecordingEnabled, where internal_IsHistogramEnumId is called without a lock although one is created before internal_SetHistogramRecordingEnabled.
4. Locks should not occur before calling interal_JS* functions because they may try to create their own lock, causing a deadlock.
Does that seem right? Or am I missing something?
Also, is markdown still available for comments? I tried a few things, but nothing shows up in the preview and I don't see the expected checkbox that is mentioned somewhere on the Bugzilla site.
Flags: needinfo?(mcalabrese85)
Comment 35•7 years ago
|
||
(In reply to Michael Calabrese from comment #34)
> (In reply to Alessio Placitelli [:Dexter] from comment #33)
> > Michael, would you be interested in taking this to the finish line?
>
> Hey Alessio, I'm looking into this now. I may be a little slow on it for
> about two weeks as I'll be traveling, though. While I'm looking it over, I
> want to make sure I have the constraints right:
>
> 1. No new locks should be made, only the ones that are already there should
> be passed around.
>
> 2. Each lock will be passed into each internal_* function that follows the
> creation of the lock, and will continue to be passed on if the callee calls
> another internal_* function. Ex: internal_JSHistogram_Add creates a lock
> before calling internal_Accumulate. internal_Accumulate should take the lock
> as an argument and use it when it calls internal_RemoteAccumulate, which has
> also be modified to accept the lock as a parameter.
>
> 3. If necessary, fine-grained locks can become more coarse-grained ones if
> needed to cover more internal functions. This may be done in a function like
> TelemetryHistogram::SetHistogramRecordingEnabled, where
> internal_IsHistogramEnumId is called without a lock although one is created
> before internal_SetHistogramRecordingEnabled.
>
> 4. Locks should not occur before calling interal_JS* functions because they
> may try to create their own lock, causing a deadlock.
>
> Does that seem right? Or am I missing something?
Yes, this sounds about right! Let's evaluate (3) on a case by case basis.
> Also, is markdown still available for comments? I tried a few things, but
> nothing shows up in the preview and I don't see the expected checkbox that
> is mentioned somewhere on the Bugzilla site.
Markdown is just available for comments in MozReview I'm afraid.
Assignee | ||
Comment 36•6 years ago
|
||
Awesome, thank you. I don't intend to mess with case 3 much, if at all, just wanted it out there as a possibility. I plan on working on this for at least a few hours each day over the next few days and hopefully more over the weekend if necessary.
Comment 37•6 years ago
|
||
(In reply to Michael Calabrese from comment #36)
> Awesome, thank you. I don't intend to mess with case 3 much, if at all, just
> wanted it out there as a possibility. I plan on working on this for at least
> a few hours each day over the next few days and hopefully more over the
> weekend if necessary.
Definitely, thank you for that! Take your time and let us know if you're blocked or want to discuss about it :)
Assignee: nobody → mcalabrese85
Comment hidden (mozreview-request) |
Comment 39•6 years ago
|
||
mozreview-review |
Comment on attachment 8976371 [details]
Bug 1362953 - Refactored lock-protected TelemetryHistogram code to be more similar to other sections of code.
https://reviewboard.mozilla.org/r/244534/#review250652
This looks good as a start, thank you! I'm mostly pointing out that the argument needs to be rename, hence the long list of changes requested. Moreover, I would suggest you rebase your change on a more recent version of our mozilla-central: a few bugs landed lately that changed TelemetryHistogram.cpp. Even better, you might want to wait a couple of days for bug 1457127 to land before rebasing :)
::: commit-message-3c9d6:1
(Diff revision 1)
> +Bug 1362953 - Refactored lock-protected TelemetryHistogram code to be more similar to other sections of code. r?Dexter
Let's be specific about what we did here :) Something like "Bug 1362953 - Require lock proof in TelemetryHistogram internal functions. r?dexter"
::: toolkit/components/telemetry/TelemetryHistogram.cpp:266
(Diff revision 1)
> ProcessID aProcessId)
> {
> return aHistogramId * size_t(ProcessID::Count) + size_t(aProcessId);
> }
>
> -size_t internal_HistogramStorageIndex(HistogramID aHistogramId,
> +size_t internal_HistogramStorageIndex(const StaticMutexAutoLock& lock,
Let's follow our [code style guide](https://developer.mozilla.org/en-US/docs/Mozilla/Developer_guide/Coding_Style) and call the parameter `aLock` instead of `lock`. Let's do this here and in other places where you added it.
::: toolkit/components/telemetry/TelemetryHistogram.cpp:278
(Diff revision 1)
> "Too many histograms and processes to store in a 1D array.");
>
> return aHistogramId * size_t(ProcessID::Count) + size_t(aProcessId);
> }
>
> -Histogram* internal_GetHistogramFromStorage(HistogramID aHistogramId,
> +Histogram* internal_GetHistogramFromStorage(const StaticMutexAutoLock& lock,
This should be `aLock`.
::: toolkit/components/telemetry/TelemetryHistogram.cpp:286
(Diff revision 1)
> {
> - size_t index = internal_HistogramStorageIndex(aHistogramId, aProcessId);
> + size_t index = internal_HistogramStorageIndex(lock, aHistogramId, aProcessId);
> return gHistogramStorage[index];
> }
>
> -void internal_SetHistogramInStorage(HistogramID aHistogramId,
> +void internal_SetHistogramInStorage(const StaticMutexAutoLock& lock,
This should be `aLock`.
::: toolkit/components/telemetry/TelemetryHistogram.cpp:333
(Diff revision 1)
> return aID < HistogramCount;
> }
>
> // Look up a plain histogram by id.
> Histogram*
> -internal_GetHistogramById(HistogramID histogramId, ProcessID processId, bool instantiate = true)
> +internal_GetHistogramById(const StaticMutexAutoLock& lock, HistogramID histogramId, ProcessID processId, bool instantiate = true)
This should be `aLock`.
::: toolkit/components/telemetry/TelemetryHistogram.cpp:377
(Diff revision 1)
> return kh;
> }
>
> // Look up a histogram id from a histogram name.
> nsresult
> -internal_GetHistogramIdByName(const nsACString& name, HistogramID* id)
> +internal_GetHistogramIdByName(const StaticMutexAutoLock& lock, const nsACString& name, HistogramID* id)
This should be `aLock`.
::: toolkit/components/telemetry/TelemetryHistogram.cpp:389
(Diff revision 1)
> return NS_OK;
> }
>
> // Clear a histogram from storage.
> void
> -internal_ClearHistogramById(HistogramID histogramId, ProcessID processId)
> +internal_ClearHistogramById(const StaticMutexAutoLock& lock, HistogramID histogramId, ProcessID processId)
This should be `aLock`.
::: toolkit/components/telemetry/TelemetryHistogram.cpp:421
(Diff revision 1)
> return gCanRecordExtended;
> }
>
> // Note: this is completely unrelated to mozilla::IsEmpty.
> bool
> -internal_IsEmpty(const Histogram *h)
> +internal_IsEmpty(const StaticMutexAutoLock& lock, const Histogram *h)
This should be `aLock`.
::: toolkit/components/telemetry/TelemetryHistogram.cpp:427
(Diff revision 1)
> {
> return h->is_empty();
> }
>
> bool
> -internal_IsExpired(Histogram* h)
> +internal_IsExpired(const StaticMutexAutoLock& lock, Histogram* h)
This should be `aLock`.
::: toolkit/components/telemetry/TelemetryHistogram.cpp:433
(Diff revision 1)
> {
> return h == gExpiredHistogram;
> }
>
> void
> -internal_SetHistogramRecordingEnabled(HistogramID id, bool aEnabled)
> +internal_SetHistogramRecordingEnabled(const StaticMutexAutoLock& lock, HistogramID id, bool aEnabled)
This should be `aLock`.
::: toolkit/components/telemetry/TelemetryHistogram.cpp:440
(Diff revision 1)
> MOZ_ASSERT(internal_IsHistogramEnumId(id));
> gHistogramRecordingDisabled[id] = !aEnabled;
> }
>
> bool
> internal_IsRecordingEnabled(HistogramID id)
Any reason why you didn't add the lock here as well?
::: toolkit/components/telemetry/TelemetryHistogram.cpp:602
(Diff revision 1)
>
> return h;
> }
>
> nsresult
> -internal_HistogramAdd(Histogram& histogram,
> +internal_HistogramAdd(const StaticMutexAutoLock& lock,
This should be `aLock`.
::: toolkit/components/telemetry/TelemetryHistogram.cpp:743
(Diff revision 1)
>
> return NS_OK;
> }
>
> bool
> -internal_ShouldReflectHistogram(Histogram* h, HistogramID id)
> +internal_ShouldReflectHistogram(const StaticMutexAutoLock& lock, Histogram* h, HistogramID id)
This should be `aLock`.
::: toolkit/components/telemetry/TelemetryHistogram.cpp:990
(Diff revision 1)
> // PRIVATE: thread-unsafe helpers for the external interface
>
> namespace {
>
> bool
> -internal_RemoteAccumulate(HistogramID aId, uint32_t aSample)
> +internal_RemoteAccumulate(const StaticMutexAutoLock& lock, HistogramID aId, uint32_t aSample)
This should be `aLock`.
::: toolkit/components/telemetry/TelemetryHistogram.cpp:1005
(Diff revision 1)
> TelemetryIPCAccumulator::AccumulateChildHistogram(aId, aSample);
> return true;
> }
>
> bool
> -internal_RemoteAccumulate(HistogramID aId,
> +internal_RemoteAccumulate(const StaticMutexAutoLock& lock,
This should be `aLock`.
::: toolkit/components/telemetry/TelemetryHistogram.cpp:1021
(Diff revision 1)
>
> TelemetryIPCAccumulator::AccumulateChildKeyedHistogram(aId, aKey, aSample);
> return true;
> }
>
> -void internal_Accumulate(HistogramID aId, uint32_t aSample)
> +void internal_Accumulate(const StaticMutexAutoLock& lock, HistogramID aId, uint32_t aSample)
This should be `aLock`.
::: toolkit/components/telemetry/TelemetryHistogram.cpp:1034
(Diff revision 1)
> MOZ_ASSERT(h);
> - internal_HistogramAdd(*h, aId, aSample, ProcessID::Parent);
> + internal_HistogramAdd(lock, *h, aId, aSample, ProcessID::Parent);
> }
>
> void
> -internal_Accumulate(HistogramID aId,
> +internal_Accumulate(const StaticMutexAutoLock& lock, HistogramID aId,
This should be `aLock`.
::: toolkit/components/telemetry/TelemetryHistogram.cpp:1048
(Diff revision 1)
> MOZ_ASSERT(keyed);
> keyed->Add(aKey, aSample, ProcessID::Parent);
> }
>
> void
> -internal_AccumulateChild(ProcessID aProcessType, HistogramID aId, uint32_t aSample)
> +internal_AccumulateChild(const StaticMutexAutoLock& lock, ProcessID aProcessType, HistogramID aId, uint32_t aSample)
This should be `aLock`.
::: toolkit/components/telemetry/TelemetryHistogram.cpp:1062
(Diff revision 1)
> NS_WARNING("Failed GetHistogramById for CHILD");
> }
> }
>
> void
> -internal_AccumulateChildKeyed(ProcessID aProcessType, HistogramID aId,
> +internal_AccumulateChildKeyed(const StaticMutexAutoLock& lock, ProcessID aProcessType, HistogramID aId,
This should be `aLock`.
::: toolkit/components/telemetry/TelemetryHistogram.cpp:1075
(Diff revision 1)
> MOZ_ASSERT(keyed);
> keyed->Add(aKey, aSample, aProcessType);
> }
>
> void
> -internal_ClearHistogram(HistogramID id)
> +internal_ClearHistogram(const StaticMutexAutoLock& lock, HistogramID id)
This should be `aLock`.
Attachment #8976371 -
Flags: review?(alessio.placitelli)
Updated•6 years ago
|
Attachment #8867849 -
Attachment is obsolete: true
Assignee | ||
Comment 40•6 years ago
|
||
Oops, sorry about the variable name. That can be corrected easily, thankfully.
internal_IsRecordingEnabled cannot be locked because KeyedHistogram::Add does not have a lock to pass to the function. I'm not sure how KeyedHistogram::Add is called, so I'm not sure if a lock can be created in that function.
internal_IsHistogramEnumId cannot be locked because internal_IsRecordingEnabled, internal_KeyedHistogram_SnapshotImpl and internal_JSHistogram_Add do not contain locks. internal_JSKeyedHistogram_Clear and TelemetryHistogram::GetHistogramName have locks, but they are created after internal_IsHistogramEnumId is called.
In internal_JSKeyedHistogram_Clear the lock is created right after internal_IsHistogramEnumId is called, so that could be moved. In TelemetryHistogram::GetHistogramName, the same could probable happen as there is only an NS_WARN_IF and MOZ_ASSERT_UNREACHABLE call before the lock is created. internal_IsRecordingEnabled has the problem listed above, which is the same for caller functions regarding internal_KeyedHistogram_SnapshotImpl and internal_JSHistogram_Add.
I see that Bug 1457127 may change the code further, so I will keep an eye on it. I had merged changes that had been made already, but can understand that it may make it difficult for others if multiple people are working on the same file.
Comment 41•6 years ago
|
||
Hi Michael, bug 1457127 landed and this bug is now unblocked :) Feel free to rebase and work on it again!
Flags: needinfo?(mcalabrese85)
Comment hidden (mozreview-request) |
Assignee | ||
Updated•6 years ago
|
Attachment #8976371 -
Attachment is obsolete: true
Assignee | ||
Comment 43•6 years ago
|
||
Hi Alessio,
I updated the patch to work with the new code. As far as I can tell the new code did not require additional functions to use the lock as a parameter (or at least none that you did not already implement in the other patches for Bug 1457127), but please let me if this is incorrect.
Flags: needinfo?(mcalabrese85)
Comment 44•6 years ago
|
||
mozreview-review |
Comment on attachment 8979820 [details]
Bug 1362953 - Require lock proof in TelemetryHistogram internal functions where possible.
https://reviewboard.mozilla.org/r/245986/#review252048
Hi Michael, this looks like it's almost there. A few whitespaces were introduced mistakenly, so let's get rid of these and we should be done!
::: toolkit/components/telemetry/TelemetryHistogram.cpp:398
(Diff revision 1)
> return kh;
> }
>
> // Look up a histogram id from a histogram name.
> nsresult
> -internal_GetHistogramIdByName(const nsACString& name, HistogramID* id)
> +internal_GetHistogramIdByName(const StaticMutexAutoLock& aLock,
nit: there's a trailing space at the end of this
::: toolkit/components/telemetry/TelemetryHistogram.cpp:413
(Diff revision 1)
> }
>
> // Clear a histogram from storage.
> void
> -internal_ClearHistogramById(HistogramID histogramId, ProcessID processId)
> +internal_ClearHistogramById(const StaticMutexAutoLock& aLock,
> + HistogramID histogramId,
nit: there's a trailing space at the end of this
::: toolkit/components/telemetry/TelemetryHistogram.cpp:468
(Diff revision 1)
> {
> return h == gExpiredHistogram;
> }
>
> void
> -internal_SetHistogramRecordingEnabled(HistogramID id, bool aEnabled)
> +internal_SetHistogramRecordingEnabled(const StaticMutexAutoLock& aLock,
nit: there's a trailing space at the end of this
::: toolkit/components/telemetry/TelemetryHistogram.cpp:639
(Diff revision 1)
>
> return h;
> }
>
> nsresult
> -internal_HistogramAdd(Histogram& histogram,
> +internal_HistogramAdd(const StaticMutexAutoLock& aLock,
nit: there's a trailing space at the end of this
::: toolkit/components/telemetry/TelemetryHistogram.cpp:1225
(Diff revision 1)
> MOZ_ASSERT(keyed);
> keyed->Add(aKey, aSample, ProcessID::Parent);
> }
>
> void
> -internal_AccumulateChild(ProcessID aProcessType, HistogramID aId, uint32_t aSample)
> +internal_AccumulateChild(const StaticMutexAutoLock& aLock,
nit: there's a trailing space at the end of this
Attachment #8979820 -
Flags: review?(alessio.placitelli)
Assignee | ||
Comment 45•6 years ago
|
||
Hello Alessio,
Thank you for your review. Is the trailing space what was being shown in the diff as a solid red block? I wasn't sure what it meant, thought it just meant an additional line break was added. And when I push the commit for review, do I just mention that trailing whitespace was removed or should I treat it as a new patch that is used in place of the one previously submitted?
Comment 46•6 years ago
|
||
(In reply to Michael Calabrese from comment #45)
> Hello Alessio,
>
> Thank you for your review. Is the trailing space what was being shown in the
> diff as a solid red block? I wasn't sure what it meant, thought it just
> meant an additional line break was added.
Yes, the solid red blocks :)
> And when I push the commit for
> review, do I just mention that trailing whitespace was removed or should I
> treat it as a new patch that is used in place of the one previously
> submitted?
You can amend the commit and push again the updated version. With the current commit as the "tip" of your repo, you can do this:
> hg log -r .
To check that your commit is the most recent one. Then you take off the whitespaces. After that, you do
> hg diff
And verify that you only fixed the whitespaces.
Finally
> hg commit --amend
> hg push review
Comment hidden (mozreview-request) |
Comment 48•6 years ago
|
||
mozreview-review |
Comment on attachment 8979820 [details]
Bug 1362953 - Require lock proof in TelemetryHistogram internal functions where possible.
https://reviewboard.mozilla.org/r/245986/#review252800
This looks good to me now, thank you for your efforts and patience!
Attachment #8979820 -
Flags: review?(alessio.placitelli) → review+
Comment 49•6 years ago
|
||
Hey Michael, looks like this is failing to build: https://treeherder.mozilla.org/#/jobs?repo=try&revision=b74f2f8576bef4a44cebdd53f5f1cba147293bbe&selectedJob=180188595
Would you kindly take a look at what's going on?
Flags: needinfo?(mcalabrese85)
Assignee | ||
Comment 50•6 years ago
|
||
Hey Alessio,
Looks like I didn't look over some of the functions that were added closely enough after all. I didn't catch them because I have only been testing on Windows so far, but I will be changing that habit in the future. The patch is now updated and builds and runs correctly on my Android phone and tablet.
Unfortunately, I cannot run the automated tests for some reason. I'm not sure if it is because I'm running in a virtual machine or because I need to have the devices rooted at the moment. I'll have to look into it more over the weekend and try to figure it out before I push the revised patch.
Flags: needinfo?(mcalabrese85)
Comment 51•6 years ago
|
||
(In reply to Michael Calabrese from comment #50)
> Hey Alessio,
>
> Looks like I didn't look over some of the functions that were added closely
> enough after all. I didn't catch them because I have only been testing on
> Windows so far, but I will be changing that habit in the future. The patch
> is now updated and builds and runs correctly on my Android phone and tablet.
Ah, no problem for that. I think you can even trigger a try push for these platforms yourself from MozReview! From "Automation" pick "Trigger a try build", then on the left select the platform (both debug and optimized) and finally the test suite. I would go for xpcshell,gtest,mochitest-e10s-bc !
> Unfortunately, I cannot run the automated tests for some reason. I'm not
> sure if it is because I'm running in a virtual machine or because I need to
> have the devices rooted at the moment. I'll have to look into it more over
> the weekend and try to figure it out before I push the revised patch.
It could be because of both things :) Don't worry, I can push to try if you cannot!
Flags: needinfo?(mcalabrese85)
Comment hidden (mozreview-request) |
Assignee | ||
Comment 53•6 years ago
|
||
Hey Alessio,
No luck with getting the tests to work on my computer. I'm sure it's something small I'm not doing right, but I'll have to wait to jump onto the IRC channel to see if someone can give me a hand.
In the meantime, I pushed the revised patch. I cannot trigger a try build on my own, but I will keep an eye out on the results if you can do it for me.
Comment 54•6 years ago
|
||
(In reply to Michael Calabrese from comment #53)
> Hey Alessio,
>
> No luck with getting the tests to work on my computer. I'm sure it's
> something small I'm not doing right, but I'll have to wait to jump onto the
> IRC channel to see if someone can give me a hand.
No worries, looks like you dealt with the problem the right way :) Unfortunately working on Android is still a bit rough on the edges :)
> In the meantime, I pushed the revised patch. I cannot trigger a try build on
> my own, but I will keep an eye out on the results if you can do it for me.
I've done that (https://treeherder.mozilla.org/#/jobs?repo=try&revision=eeda541f14bb70c3bcd453afb3efcfbdf8d0b657&selectedJob=180668966), looks like this is good to land now!
Flags: needinfo?(mcalabrese85)
Comment 55•6 years ago
|
||
Pushed by alessio.placitelli@gmail.com:
https://hg.mozilla.org/integration/autoland/rev/9f9b17412a65
Require lock proof in TelemetryHistogram internal functions where possible. r=Dexter
Comment 56•6 years ago
|
||
bugherder |
Status: NEW → RESOLVED
Closed: 6 years ago
status-firefox62:
--- → fixed
Resolution: --- → FIXED
Target Milestone: --- → mozilla62
You need to log in
before you can comment on or make changes to this bug.
Description
•