Closed Bug 970686 Opened 11 years ago Closed 11 years ago

Add media packet loss and RTT statistics to the stats API

Categories

(Core :: WebRTC: Audio/Video, defect)

defect
Not set
normal

Tracking

()

RESOLVED FIXED
mozilla30

People

(Reporter: bwc, Assigned: jib)

References

Details

Attachments

(5 files, 7 obsolete files)

(deleted), patch
jesup
: review+
Details | Diff | Splinter Review
(deleted), patch
jib
: review+
Details | Diff | Splinter Review
(deleted), patch
jesup
: review+
Details | Diff | Splinter Review
(deleted), patch
jib
: review+
Details | Diff | Splinter Review
(deleted), patch
philor
: review+
Details | Diff | Splinter Review
We should expose these statistics via the stats API. Where the API is incomplete, we should push the standards process along.
Assignee: nobody → jib
Blocks: 970682
Summary: Add packet loss and RTT statistics to the stats API → Add media packet loss and RTT statistics to the stats API
Attached patch Add stat.packetsLost (obsolete) (deleted) — Splinter Review
Adds stat.packetsLost to RTCInboundRTPStreamStats per https://groups.google.com/forum/#!topic/mozilla.dev.media/ZLAVAeKnJ-U
Attachment #8376060 - Flags: review?(rjesup)
Attached patch Add stat.packetsLost (2) (deleted) — Splinter Review
Last patch had more changes than necessary.
Attachment #8376060 - Attachment is obsolete: true
Attachment #8376060 - Flags: review?(rjesup)
Attachment #8376066 - Flags: review?(rjesup)
Attachment #8376066 - Flags: review?(rjesup) → review+
Attached patch Add stat.packetsLost (3) (obsolete) (deleted) — Splinter Review
Fixes android bustage. Carrying forward r+ from jesup. Try - https://tbpl.mozilla.org/?tree=Try&rev=c9faa100481d
Attachment #8376066 - Attachment is obsolete: true
Attachment #8376475 - Flags: review+
Attachment #8376475 - Attachment is obsolete: true
Attachment #8376475 - Flags: review+
Whoops! Wrong bug # for last patch. please ignore.
Attachment #8376066 - Attachment is obsolete: false
Good point, I'll add some.
Whiteboard: [leave-open]
Attached patch RTP/RTCP stats tests (obsolete) (deleted) — Splinter Review
Attachment #8380886 - Flags: review?(drno)
Looks like the test found at least one real bug, from https://tbpl.mozilla.org/php/getParsedLog.php?id=35194673&tree=Try&full=1#error0 : > 22:52:09 INFO - 1715 ERROR TEST-UNEXPECTED-FAIL | /tests/dom/media/tests/mochitest/test_peerConnection_basicAudio.html | Valid timestamp. res.timestamp = 3373866330899.1606 <= 1393311129237 > 22:52:09 INFO - 1716 ERROR TEST-UNEXPECTED-FAIL | /tests/dom/media/tests/mochitest/test_peerConnection_basicAudio.html | Valid timestamp. res.timestamp = 1393311129252.0486 <= 1393311129237 The first line (Windows only) shows the RTCP times out of whack by 63 years. Looking into it (maybe an endian problem). The second line (intermittent and on Mac too on the first try) shows timestamps 15 ms into the future. I gather I need to allow for some slop here, but anyone know why this would happen on the builders?
Comment on attachment 8380886 [details] [diff] [review] RTP/RTCP stats tests Review of attachment 8380886 [details] [diff] [review]: ----------------------------------------------------------------- In general looks good to me. Mostly nits. Only one open question if the test could verify even more. ::: dom/media/tests/mochitest/pc.js @@ +1575,5 @@ > var res = stats[key]; > + // validate stats > + ok(res.id == key, "Coherent stats id"); > + ok(res.timestamp >= this.whenCreated && > + res.timestamp <= now, "Valid timestamp"); Not sure if it is just a personal preference, but for testing code I prefer to have separate checks for condition (so two three ok() calls instead of two). That should prevent having to add more instrumentation in case this check starts to fail - and runtime optimization should not be topmost concern for test code. @@ +1582,5 @@ > + > + switch (res.type) { > + case "inboundrtp": > + case "outboundrtp": { > + ok(res.ssrc.length > 0 && res.ssrc.length < 11, "Valid ssrc"); Break into two ok() checks (see above). @@ +1584,5 @@ > + case "inboundrtp": > + case "outboundrtp": { > + ok(res.ssrc.length > 0 && res.ssrc.length < 11, "Valid ssrc"); > + if (res.remoteId) { > + var rem = stats[res.remoteId]; Is this guaranteed to exist in all cases? @@ +1589,5 @@ > + ok(rem.isRemote && > + rem.remoteId == res.id && > + ((res.type == "inboundrtp" && rem.type == "outboundrtp") || > + (res.type == "outboundrtp" && rem.type == "inboundrtp")), > + "Remote links Rtp and Rtcp"); Break into multiple ok() checks (see above). @@ +1590,5 @@ > + rem.remoteId == res.id && > + ((res.type == "inboundrtp" && rem.type == "outboundrtp") || > + (res.type == "outboundrtp" && rem.type == "inboundrtp")), > + "Remote links Rtp and Rtcp"); > + ok(rem.ssrc == res.ssrc, "Rtp/Rtcp ssrc's match"); So here we have matching RTP and RTCP. Can we maybe compare additionally the amount of send and received packets and/or bytes? Probably would need to take into account some percent of possible packet loss. But for sure not more received packets then send. And zero received packets should also fail. @@ +1596,5 @@ > + if (rem.type == "inboundrtp") { > + ok(rem.bytesReceived !== undefined && > + rem.packetsReceived !== undefined && > + rem.packetsLost !== undefined && > + rem.jitter !== undefined, "Valid outboundrtcp"); Break into multiple ok() checks (see above). @@ +1599,5 @@ > + rem.packetsLost !== undefined && > + rem.jitter !== undefined, "Valid outboundrtcp"); > + } else { > + ok(rem.bytesSent !== undefined && > + rem.packetsSent !== undefined, "Valid inboundrtcp"); Break into multiple ok() checks (see above). @@ +1601,5 @@ > + } else { > + ok(rem.bytesSent !== undefined && > + rem.packetsSent !== undefined, "Valid inboundrtcp"); > + } > + } If I understand correct this means we haven't received any RTCP from the far end yet. Is there anything we could about it? Maybe at least log an info() line that remote RTCP is missing? @@ +1604,5 @@ > + } > + } > + if (res.type == "inboundrtp") { > + ok(res.bytesReceived !== undefined && > + res.packetsReceived !== undefined, "Valid inboundrtp"); Break into two ok() checks (see above). @@ +1607,5 @@ > + ok(res.bytesReceived !== undefined && > + res.packetsReceived !== undefined, "Valid inboundrtp"); > + } else { > + ok(res.bytesSent !== undefined && > + res.packetsSent !== undefined, "Valid outboundrtp"); Break into two ok() checks (see above).
Attachment #8380886 - Flags: review?(drno) → review+
(In reply to Nils Ohlmeier [:drno] from comment #12) > > + case "inboundrtp": > > + case "outboundrtp": { > > + ok(res.ssrc.length > 0 && res.ssrc.length < 11, "Valid ssrc"); > > + if (res.remoteId) { > > + var rem = stats[res.remoteId]; > > Is this guaranteed to exist in all cases? If res.remoteId is non-zero, then yes it better point to a record. The test should fail on the very next line dereferencing rem if rem is undefined. > So here we have matching RTP and RTCP. Can we maybe compare additionally the > amount of send and received packets and/or bytes? > Probably would need to take into account some percent of possible packet > loss. But for sure not more received packets then send. And zero received > packets should also fail. Good idea, though RTCP packets are by definition outdated and they're received less frequently, which makes comparing progress numbers hard for anything but basic invariants. I suppose we can never receive more packets and bytes than were sent. But is zero received packets abnormal? Stats are a snapshot, and we don't control how much time has passed, packets may drop, and it may just be starting up. Higher-level tests that control the context may be able to make time-based assertions, but this checkStats test happens on every invocation, so I think certain things are beyond what it could and should do. > If I understand correct this means we haven't received any RTCP from the far > end yet. Is there anything we could about it? Maybe at least log an info() > line that remote RTCP is missing? Repeating answer directly above, these are low-level tests, and currently only test the RTCP info for structure if found. There's currently no guarantee that the suite of tests run long enough for all info to appear before the getStats call (RTCP packets takes a couple of seconds to come in). A quick perusing of the logs shows that they are being tested though, at least intermittently, and that coverage should increase as we add purposeful tests. Sure I can add an info(), but missing RTCP is not an error at this level.
(In reply to Nils Ohlmeier [:drno] from comment #12) > Review of attachment 8380886 [details] [diff] [review]: Additionally, to address comment 11, I'll make this modification. Let me know if you're ok with this so I can carry forward your r+: - var now = Date.now(); + var nowish = Date.now() + 1000; // +- 15 ms observed The other appears to be an actual bug (yay for testing!), so I'll get someone else to review that.
Flags: needinfo?(drno)
(In reply to Jan-Ivar Bruaroey [:jib] from comment #13) > (In reply to Nils Ohlmeier [:drno] from comment #12) > > If I understand correct this means we haven't received any RTCP from the far > > end yet. Is there anything we could about it? Maybe at least log an info() > > line that remote RTCP is missing? > > Repeating answer directly above, these are low-level tests, and currently > only test the RTCP info for structure if found. There's currently no > guarantee that the suite of tests run long enough for all info to appear > before the getStats call (RTCP packets takes a couple of seconds to come > in). A quick perusing of the logs shows that they are being tested though, > at least intermittently, and that coverage should increase as we add > purposeful tests. Sure I can add an info(), but missing RTCP is not an error > at this level. I understand it is not an error. That's why I suggested an info() line, rather then a fail. But as you pointed out the RTCP results are intermittent right now. So if you compare logs I think it would be nice to get a hint on why one test execution looks different then the other. In any case not a show stopper. Feel free to carry forward my r+.
Flags: needinfo?(drno)
Attached patch RTP/RTCP stats tests (2) (obsolete) (deleted) — Splinter Review
Incorporated feedback. Enough changes that you should probably take another look. Doesn't address the Windows bug yet. Try - https://tbpl.mozilla.org/?tree=Try&rev=ae468dded15a
Attachment #8380886 - Attachment is obsolete: true
Attachment #8383312 - Flags: review?(drno)
Comment on attachment 8383312 [details] [diff] [review] RTP/RTCP stats tests (2) Looks good to me.
Attachment #8383312 - Flags: review?(drno) → review+
Depends on: 978239
Attached patch RTP/RTCP stats tests (3) (obsolete) (deleted) — Splinter Review
Increased test tolerance for rtcp clock drift from 1 to 2 seconds because of 1.5 second clock drift observed on Windows builders https://hg.mozilla.org/try/rev/c1a4ae21f52c New try - https://tbpl.mozilla.org/?tree=Try&rev=a079f73230a3 Carrying forward r+ from drno.
Attachment #8385133 - Flags: review+
(In reply to Jan-Ivar Bruaroey [:jib] from comment #18) > 1.5 second clock drift observed on Windows builders Correct link to old try log https://tbpl.mozilla.org/php/getParsedLog.php?id=35446148&tree=Try&full=1
And it may not be clock drift as much as it may be slow js execution, since the now timestamp is gotten outside the for-loop. However, when I tried refreshing the timestamp inside the for-loop, I saw a giant 4.5 second drift of change of some kind in the opposite direction (note the >= going the other way in these asserts): https://tbpl.mozilla.org/php/getParsedLog.php?id=35562261&tree=Try&full=1 - not quite sure what is going on here. So I moved the getting of the timestamp back outside the for-loop and doubled the slop.
Blocks: 979649
Added +/- 10 second cushion and filed Bug 979649 for XP clock drift. Carrying forward r+ from drno. Try - https://tbpl.mozilla.org/?tree=Try&rev=19fc596673fa
Attachment #8383312 - Attachment is obsolete: true
Attachment #8385133 - Attachment is obsolete: true
Attachment #8385754 - Flags: review+
What exactly do we mean with round-trip time in the context of a webrtc call?
Flags: needinfo?(rjesup)
Whiteboard: [leave-open]
Whiteboard: [leave-open]
Or is this just rtt according to RTCP? If so, I don't that we collect this in the webrtc.org code, so I'm looking for tips on how to go about calculating it correctly.
This is according to the RTCP reports. RTT can be calculated from RTCP reports; it may/probably is somewhere already.
Flags: needinfo?(rjesup)
Depends on: 980497
Fixes problems with the RTCP getStats patch landed above. Changes: The following applies to information returned by getStats: - RTCP (remote) stats timestamp should now be correct on outbound connections, reflecting the time the RTCP receiver-report (RR) arrives back at the sender (the most accurate time available, as RR's don't contain timestamps). In the previous patch, the times returned were not from the RR data they were associated with but erroneously from the most recent outgoing RTCP sender-report (SR) instead. - RTCP (remote) stats packets/bytesReceived should now be correct on outbound connections, better reflecting the information returned to the sender in RR. These quantities are derived from knowing what was sent (packets/bytesSent in matching SR) minus packet-loss returned in RR, making bytesReceived approximate. - Re-enabled and re-tightened RTCP tests (fixing Bug 980497 and Bug 981256). Try opt - https://tbpl.mozilla.org/?tree=Try&rev=05b8937a0bec Try dbg - https://tbpl.mozilla.org/?tree=Try&rev=2bdfce09e977 To extract this new information from the webrtc.org code, I added new methods and code to it that disambiguate SR and RR. Why this information wasn't already there: The getStats spec's symmetrical use the RTC[In|Out]boundRTPStreamStats structures to pair RTP and RTCP information, demands information that the asymmetrical RTP/RTCP API, with its SR and RR, does not expose naturally. Specifically, RR do not have a timestamp, nor is its arrival recorded, and it contains packet-loss, but not actual packets/bytes received. Adding to this, the way RTCP info is abstracted in the higher level webrtc.org methods (some confusing names and a general conflation of SR and RR info) made it hard for me to get the right information and easy to get the wrong one. Previous patch fallout: In my previous patch I extended some wrong webrtc.org methods. I'll remove that damage in a separate patch.
Attachment #8390898 - Flags: review?(rjesup)
Depends on: 981256
(In reply to Jan-Ivar Bruaroey [:jib] from comment #29) > Fixes problems with the RTCP getStats patch landed above. Wherever I refer to "patch above", I mean the RTCP patch in Bug 947665, a predecessor to this bug. The reason I'm adding the patch here is because this is where the rtcp tests are which are driving my current work on this. Sorry for any confusion.
Reduces overall modification of webrtc.org code, by undoing extensions we ended up not needing from rtcp patch in Bug 947665. At some point we need to merge with upstream and rebase a combined patch of the three and file a bug for upstream.
Attachment #8390968 - Flags: review?(rjesup)
Comment on attachment 8390898 [details] [diff] [review] Outbound getStats: Fixed RTCP timestamps and remote packets/bytes received. Review of attachment 8390898 [details] [diff] [review]: ----------------------------------------------------------------- ::: dom/webidl/RTCStatsReport.webidl @@ +39,5 @@ > double jitter; > unsigned long packetsLost; > long mozAvSyncDelay; > long mozJitterBufferDelay; > + long mozRtt; One wonders how RTT can be negative (call the press!), but possibly due to how the calculation is made or clock changes, etc - I'm not worried about overflowing a long, so long is fine. ::: media/webrtc/trunk/webrtc/modules/rtp_rtcp/source/rtcp_receiver.cc @@ +237,5 @@ > +int32_t RTCPReceiver::GetReportBlockInfo(uint32_t remoteSSRC, > + uint32_t* NTPHigh, > + uint32_t* NTPLow, > + uint32_t* PacketsReceived, > + uint64_t* OctetsReceived) const { { on new line @@ +521,5 @@ > rb.CumulativeNumOfPacketsLost; > + if (sentPackets > rb.CumulativeNumOfPacketsLost) { > + uint32_t packetsReceived = sentPackets - rb.CumulativeNumOfPacketsLost; > + reportBlock->remotePacketsReceived = packetsReceived; > + reportBlock->remoteOctetsReceived = sentOctets / sentPackets * packetsReceived; use ()'s when mixing / and * please ::: media/webrtc/trunk/webrtc/video_engine/vie_channel.cc @@ +1196,5 @@ > + // for (std::list<RtpRtcp*>::const_iterator it = simulcast_rtp_rtcp_.begin(); > + // it != simulcast_rtp_rtcp_.end(); > + // it++) { > + // RtpRtcp* rtp_rtcp = *it; > + // } Modify TODO - still need to consider if simulcast info is needed here. ::: media/webrtc/trunk/webrtc/voice_engine/channel.cc @@ +3803,5 @@ > + uint64_t& receivedOctetCount, > + uint32_t* jitter, > + uint16_t* fractionLost, > + uint32_t* cumulativeLost, > + int32_t* rttMs) Why the mix of & and *? We're not null-checking the pointers ::: media/webrtc/trunk/webrtc/voice_engine/include/voe_rtp_rtcp.h @@ -87,5 @@ > int bytesReceived; > int packetsReceived; > }; > > -// See section 6.4.1 in http://www.ietf.org/rfc/rfc3550.txt for details. was this removed for a reason? ::: media/webrtc/trunk/webrtc/voice_engine/voe_rtp_rtcp_impl.cc @@ +382,3 @@ > { > WEBRTC_TRACE(kTraceApiCall, kTraceVoice, VoEId(_shared->instance_id(), -1), > "GetRemoteRTCPData(channel=%d,...)", channel); GetRemoteRTCPReceiverInfo
Attachment #8390898 - Flags: review?(rjesup) → review+
Attachment #8390968 - Flags: review?(rjesup) → review+
Please file an Issue for this at webrtc.org and upload a patch (against trunk) to their codereview. This can be done next week, or perhaps can be moved to pkerr
Updated nits. Carrying forward r+ from jesup. Added back one second slop to mochitest to account for clock drift on WinXP (Bug 979649) https://tbpl.mozilla.org/php/getParsedLog.php?id=36065826&tree=Try
Attachment #8390898 - Attachment is obsolete: true
Attachment #8391389 - Flags: review+
Keywords: checkin-needed
Whiteboard: [leave-open]
Attachment #8391389 - Attachment description: Outbound getStats: Fixed RTCP timestamps and remote packets/bytes received (2) r=jesup → Part 3: Outbound getStats: Fixed RTCP timestamps and remote packets/bytes received (2) r=jesup
Attachment #8390968 - Attachment description: Undo extensions to wrong rtcp methods in webrtc.org → Part 4: Undo extensions to wrong rtcp methods in webrtc.org
Attachment #8391664 - Flags: feedback?(avihpit)
Comment on attachment 8391664 [details] [diff] [review] Part 5: Skip rtcp timestamp tests on XP because of clock drift Not that I have any right to review anything here, even tests, but it seems reasonable to me.
Attachment #8391664 - Flags: review?(philringnalda) → review+
I guess I should have put the patch on Bug 981256 but oh well.
I'm not sure exactly what's the ni? for, and I don't have a clear grasp of the exact issue you're facing (you're welcome to elaborate). I do know, however, that on _some_ XP machines our timers resolution is roughly 15ms. That's true for performance.now() and IIRC also Date.now(). Since there's no direct method which I know of to detect that we're currently in "low resolution timers" mode, at bug 749894 we deduced it when it's an XP machine (using 'navigator' string parsing) and all the performance.now() calls so far returned integers. If these conditions were both true, the test adds more cushion. See the comments at http://dxr.mozilla.org/mozilla-central/source/dom/tests/mochitest/general/test_performance_now.html
Comment on attachment 8391664 [details] [diff] [review] Part 5: Skip rtcp timestamp tests on XP because of clock drift Review of attachment 8391664 [details] [diff] [review]: ----------------------------------------------------------------- Removing the flag without properly reviewing. See comment 40. Please request feedback again if you need.
Attachment #8391664 - Flags: feedback?(avihpit)
(In reply to Avi Halachmi (:avih) from comment #40) Thanks for clarifying the XP clock situation. I'm notsure how this can entirely explain the 5+ second (!) clock drift we're seeing on some of the XP build boxes on the tests of the RTCP timestamps we generate (Bug 979649), but I can certainly see it being related (the webrtc clock uses the timeGettime() calls on Windows) With that bug filed and TODO'ed in the test, I think I've heard enough to think it wise to disable the timestamp tests on XP for now, i.e. land Part 5.
Keywords: checkin-needed
Blocks: 980497
No longer depends on: 980497
Status: NEW → RESOLVED
Closed: 11 years ago
Resolution: --- → FIXED
No longer blocks: webrtc_upstream_bugs
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Created:
Updated:
Size: