Closed Bug 664758 Opened 13 years ago Closed 13 years ago

Add page fault reporting to telemetry

Categories

(Core :: General, defect)

x86_64
Linux
defect
Not set
normal

Tracking

()

RESOLVED FIXED

People

(Reporter: justin.lebar+bug, Assigned: justin.lebar+bug)

References

Details

Attachments

(1 file, 5 obsolete files)

Having telemetry on page faults would be useful in general, but also for understanding whether the changes we make in bug 664291 are helpful.
(In reply to comment #0)
> Having telemetry on page faults would be useful in general, but also for
> understanding whether the changes we make in bug 664291 are helpful.

How do you intend to gather page fault stats?
Bug 664486.  I already have something working on Linux and Mac.
URL: 664291
No longer depends on: 664291
Blocks: 664291
URL: 664291
Attached patch Patch v1 (obsolete) (deleted) β€” β€” Splinter Review
Attachment #541399 - Flags: review?(tglek)
Comment on attachment 541399 [details] [diff] [review]
Patch v1

I don't think that's a useful way to measure paging traffic. As I understand it, you are recording the total amount of pagefaults that occured up till probe point..on every probe. This is wasteful.

so a) Decrease the max to 1000ish.
b) Record the current pagefault count in this._prev_pagefaults, then in the idle handler do

faults = current("hard-page-faults") - this._prev_pagefaults
histogram.Add(faults)
Attachment #541399 - Flags: review?(tglek) → review-
That's *much* better way of doing it.

How often are the probes fired?  What's the variance in their spacing?
(In reply to comment #5)
> That's *much* better way of doing it.
> 
> How often are the probes fired?  What's the variance in their spacing?

At most once a minute, but this only fired on idle. http://mxr.mozilla.org/mozilla-central/source/toolkit/components/telemetry/TelemetryPing.js#52

May be worth tying that to some ui event in the future so we don't run down batteries for no reason.
Attached patch Patch v2 (obsolete) (deleted) β€” β€” Splinter Review
Updating to be compatible with v2 of the page fault patch; one line change to use the new units field on memory reporters.
Attachment #541399 - Attachment is obsolete: true
(In reply to comment #6)
> (In reply to comment #5)
> > That's *much* better way of doing it.
> > 
> > How often are the probes fired?  What's the variance in their spacing?
> 
> At most once a minute, but this only fired on idle.
> http://mxr.mozilla.org/mozilla-central/source/toolkit/components/telemetry/
> TelemetryPing.js#52

Well, we could try to do better than this, but it's probably good as a first pass.

> May be worth tying that to some ui event in the future so we don't run down
> batteries for no reason.

Indeed!
> Decrease the max to 1000ish.

I've seen 40k page faults in a short period of time, so I'm going to set it to 64k, if that's OK with you.  If we have to page in more than (64 * 1024) pages * 4kb/page = 256mb, I'm happy to call that "massive paging".
(In reply to comment #9)
> > Decrease the max to 1000ish.
> 
> I've seen 40k page faults in a short period of time, so I'm going to set it
> to 64k, if that's OK with you.  If we have to page in more than (64 * 1024)
> pages * 4kb/page = 256mb, I'm happy to call that "massive paging".

ok
Attached patch Patch v3 (obsolete) (deleted) β€” β€” Splinter Review
Attachment #541803 - Flags: review?(tglek)
Attachment #541783 - Attachment is obsolete: true
Comment on attachment 541803 [details] [diff] [review]
Patch v3

I have no idea if this actually works, but it passes the Telemetry test.  I'd be happy to write another test, if that's possible!
Attached patch Patch v3 (rebased) (obsolete) (deleted) β€” β€” Splinter Review
Attachment #541812 - Flags: review?(tglek)
Attachment #541803 - Attachment is obsolete: true
Attachment #541803 - Flags: review?(tglek)
(In reply to comment #12)
> Comment on attachment 541803 [details] [diff] [review] [review]
> Patch v3
> 
> I have no idea if this actually works, but it passes the Telemetry test. 
> I'd be happy to write another test, if that's possible!

Modify the existing test to add a double in addition to an int
Comment on attachment 541812 [details] [diff] [review]
Patch v3 (rebased)

>+
>+        // Read mr.amount just once so our arithmetic is consistent.
>+        let curVal = mr.amount;
>+        let prevVal = this._prevValues[mr.path];
>+        if (!prevVal)
>+          prevVal = 0;
This should be { prevVal = curVal; continue}

cold startup causes a bunch of pagefaults, you do not want to count those.

It's also kinda weird to have a check based on Ci.nsIMemoryReporter.UNITS_COUNT. 
Add a comment that 
// this if block is measuring counting pagefaults per period
Attachment #541812 - Flags: review?(tglek) → review+
> It's also kinda weird to have a check based on Ci.nsIMemoryReporter.UNITS_COUNT.
> Add a comment that 
> // this if block is measuring counting pagefaults per period

Were you commenting that it's confusing that I check UNITS_COUNT and need to add a comment, or that I should resolve the weirdness in some other way?
Assignee: nobody → justin.lebar+bug
(In reply to comment #14)
> Modify the existing test to add a double in addition to an int

The reason I added that double conversion code in the first place was because the page fault count was coming in as an int64 (thus a js double) and was causing the test to fail.  I guess the other memory reporters didn't cause a failure because the code did some math on them before adding them to the histogram, so they got downcast to int32s.

Now that we're also doing some math on the UNITS_COUNT value (subtraction), I suspect the jsdouble branch isn't exercised.  But maybe we want to leave it in, since the old behavior was kind of weird.

In that case, it's not clear to me how to test that branch.  test_TelemetryPing.js doesn't appear to add any custom reporters.  Can I even do that without modifying TelemetryHistograms.h?
(In reply to comment #17)

> In that case, it's not clear to me how to test that branch. 
> test_TelemetryPing.js doesn't appear to add any custom reporters.  Can I
> even do that without modifying TelemetryHistograms.h?

Yes, test_nsITelemetry.js does it. nITelemetry::newHistogram lets you create adhoc histograms. If the code isn't exercising it, land this without conversion stuff I'll just add double conversion + testcase to bug 666309.
Attached patch Patch v4 (removes unneded double-conversion code) (obsolete) (deleted) β€” β€” Splinter Review
Attachment #541812 - Attachment is obsolete: true
> Yes, test_nsITelemetry.js does it.

Ah, I see!

> If the code isn't exercising it, land this without conversion stuff I'll just 
> add double conversion + testcase to bug 666309.

I just checked, and it's indeed not exercising that path.  So your plan sounds good to me!
Attached patch Patch v4.1 (fixing stupid bug) (deleted) β€” β€” Splinter Review
Attachment #542180 - Attachment is obsolete: true
(In reply to comment #21)
> Created attachment 542185 [details] [diff] [review] [review]
> Patch v4.1 (fixing stupid bug)

did you mean to r? this or carry over r+?
I was going to carry over the r+, but you're welcome to take another look if you'd like!  I just removed the double conversion code and fixed an accidental |curVal| instead of |val|.
Comment on attachment 542185 [details] [diff] [review]
Patch v4.1 (fixing stupid bug)

>+        name = "Memory:" + mr.path;
>+
>+        // Read mr.amount just once so our arithmetic is consistent.
>+        let curVal = mr.amount;
>+        let prevVal = this._prevValues[mr.path];
>+        if (!prevVal) {
>+          // If this is the first time we're reading this reporter, store its
>+          // current value but don't report it in the telemetry ping, so we
>+          // ignore the effect startup had on the reporter.
>+          this._prevValues[mr.path] = curVal;
>+          continue;
>+        }
>+        val = curVal - prevVal;
>+        this._prevValues[mr.path] = val;
 isn't it supposed to be
this._prevValues[mr.path] = curVal?
Attachment #542185 - Flags: review-
>  isn't it supposed to be
> this._prevValues[mr.path] = curVal?

Ack, yes.  Thanks for noticing that.
Pushed to m-i: http://hg.mozilla.org/integration/mozilla-inbound/rev/3b0160b38c3d
Whiteboard: [inbound]
http://hg.mozilla.org/mozilla-central/rev/3b0160b38c3d
Status: NEW → RESOLVED
Closed: 13 years ago
Resolution: --- → FIXED
Whiteboard: [inbound]
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Created:
Updated:
Size: