Closed Bug 1296453 Opened 8 years ago Closed 8 years ago

~10% Failure rate on Amazon Prime (EME) Video probably due to OOM

Categories

(Core :: Audio/Video: Playback, defect)

x86
Windows
defect
Not set
normal

Tracking

()

RESOLVED FIXED
mozilla51
Tracking Status
platform-rel --- +
firefox49 + fixed
firefox50 + fixed
firefox51 --- fixed

People

(Reporter: cpearce, Assigned: jya)

References

(Blocks 1 open bug)

Details

(Whiteboard: [platform-rel-Amazon][platform-rel-AmazonVideo])

Attachments

(2 files)

Amazon have reported a 10% playback failure rate spike in 32-bit Firefox >= 49. They're seeing a MEDIA_DECODE_ERROR being fired at the media element. Shift72 have reported the same thing, and can repro in house.
I've made builds with extra logging to try to pinpoint the problem, and we appear to be failing when calling AllocUnsafeShmem in BufferTextureData::CreateInternal(). The logging I added for CreateFileMapping() failures in SharedMemory::Create() didn't get hit, and I'm not sure why.
This is a recent regression, which just spiked in 49. So I'd say it coincides with the e10s rollout.
[Tracking Requested - why for this release]: Having a 10% failure rate on Amazon Prime Video in Firefox 49 is not something we want to ship to our release users.
Is this bad enough to warrant clamping down on the e10s rollout on release?
(Note that blassey is currently on PTO - not sure who decisions of that nature fall to while he's away)
We have observed this in the beta 49 population only. Both these populations had/have almost the same users with e10s enabled; both have about 20% of the total population enabled, but 49 has 2% more as there are a few more users with addons that are in the e10s whitelist. So given that this occurs in 48 and 49 and there wasn't a significant change in the e10s-ness between these populations, I think we can concluded that this is not an issue with e10s.
We can reproduce this issue if we have the WebConsole open. about:memory reports that there are strings leaking in the web console. We don't think this can be the cause of the 10% Amazon failure; we don't believe that 10% of Amazon Prime Video sessions have the WebConsole open!
So we think that we're failing to allocate frames as a consequence of some other leak. Possibly, the heap size is growing so much that allocation of shmems is failing. This could be either the shmems used to shuffle video frames between the GMP and content process, or between the content and chrome process.
No longer blocks: e10s
Summary: [e10s] ~10% Failure rate on Amazon Prime (EME) Video with e10s enabled → ~10% Failure rate on Amazon Prime (EME) Video probably due to OOM
(In reply to Chris Pearce (:cpearce) from comment #4)
> So given that this occurs in 48 and 49 and there wasn't a significant change
> in the e10s-ness between these populations, I think we can concluded that
> this is not an issue with e10s.
I meant to say: given that this doesn't occur in 48 but does in 49 and there wasn't a significant change in the e10s settings between these populations, I think we can conclude that this is not an issue with e10s.
Assignee: cpearce → jyavenard
Comment on attachment 8783820 [details]
Bug 1296453: [MSE] P2. Clean up SourceBufferResource.
https://reviewboard.mozilla.org/r/73492/#review71310
r+
Just wondering why the offset is signed (and then you need to make it unsigned in a few places). But it's everywhere, so probably too much work touching it right now.
Attachment #8783820 - Flags: review?(gsquelart) → review+
Pushed by jyavenard@mozilla.com:
https://hg.mozilla.org/integration/autoland/rev/6f87d2ddadaf [MSE] P1. Fix eviction. r=gerald
https://hg.mozilla.org/integration/autoland/rev/13294e0b2e81 [MSE] P2. Clean up SourceBufferResource. r=gerald
Summary of debugging: we were able to observe two leaks using about:memory, by watching heap-unclassified rise over time while watching Amazon Prime. One leak occurs if the WebConsole is open; Firefox is leaking strings. The other (fixed by jya here) was a regression from https://hg.mozilla.org/integration/mozilla-inbound/rev/57a920361b81 .
Our theory is that the leaks cause the heap to expand. Once the heap expands, it doesn't contract. As it grows, it reserves virtual address space. If it grows enough, it'll overlap the address space used to allocate shmems, and we'll fail to allocate the shmems used to shuffle frames between the GMP <-> content <-> main processes. This results in a failure to allocate a video frame, which percolates up into an error event fired at the video element.
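The theory in the debugging summary can be illustrated with a toy model. This is only a sketch under stated assumptions, not Firefox code: the flat first-fit `AddressSpace` class, the 2 GiB address space size, the 1 MiB heap growth step, and the 4 MiB per-frame shmem size are all hypothetical values chosen for illustration. It shows how a heap that only ever grows can eventually leave no contiguous range for a frame-sized mapping, even though each frame's shmem is released right after use.

```python
# Toy model only; none of these names or sizes come from Firefox.
ADDRESS_SPACE = 2 * 1024**3  # assumed user address space of a 32-bit process
HEAP_CHUNK = 1 * 1024**2     # hypothetical heap growth step (leaked, never freed)
FRAME_SHMEM = 4 * 1024**2    # hypothetical size of one video-frame shmem


class AddressSpace:
    """Flat address space with first-fit reservation of contiguous ranges."""

    def __init__(self, size):
        self.size = size
        self.reserved = []  # sorted list of half-open (start, end) ranges

    def reserve(self, length):
        """Return the start of a free range of `length` bytes, or None."""
        cursor = 0
        for index, (start, end) in enumerate(self.reserved):
            if start - cursor >= length:
                self.reserved.insert(index, (cursor, cursor + length))
                return cursor
            cursor = end
        if self.size - cursor >= length:
            self.reserved.append((cursor, cursor + length))
            return cursor
        return None

    def release(self, start):
        """Free the range that begins at `start`."""
        self.reserved = [r for r in self.reserved if r[0] != start]


def frames_until_failure(leak_chunks_per_frame, max_frames=10_000,
                         space_size=ADDRESS_SPACE):
    """Count frames delivered before a frame shmem can no longer be mapped.

    Returns None if no failure occurs within `max_frames`.
    """
    space = AddressSpace(space_size)
    for frame in range(max_frames):
        for _ in range(leak_chunks_per_frame):
            space.reserve(HEAP_CHUNK)  # leaked: the heap never contracts
        shmem = space.reserve(FRAME_SHMEM)
        if shmem is None:
            return frame  # this is where playback would surface an error
        space.release(shmem)  # the frame buffer itself is returned promptly
    return None


if __name__ == "__main__":
    print("no leak:", frames_until_failure(0, max_frames=1000))
    print("1 MiB leaked per frame:", frames_until_failure(1))
```

In this model a run without a leak never fails, while any steady leak eventually makes the shmem reservation fail. A real allocator and a real process layout are far more complicated, so the frame counts the model produces carry no meaning beyond showing the trend.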
Status: NEW → RESOLVED
Closed: 8 years ago
Resolution: --- → FIXED
Target Milestone: --- → mozilla51
I've filed Bug 1297525 for the Web Console leak.
Comment on attachment 8783819 [details]
Bug 1296453: [MSE] P1. Fix eviction.
Approval Request Comment
[Feature/regressing bug #]: 1261900
[User impact if declined]: Memory leak, frequent OOM, particularly on 32-bit platforms.
[Describe test coverage new/current, TreeHerder]: In central, manually checked, confirmed fixed with 3rd party.
[Risks and why]: None, we used to evict everything anyway for the past year.
[String/UUID change made/needed]: None
Attachment #8783819 - Flags: approval-mozilla-beta?
Attachment #8783819 - Flags: approval-mozilla-aurora?
Comment on attachment 8783819 [details]
Bug 1296453: [MSE] P1. Fix eviction.
Fix for memory leak causing crashes with media player. Looks low-risk, let's uplift to beta. This should make it into tomorrow's beta 7 build.
Attachment #8783819 - Flags: approval-mozilla-beta? → approval-mozilla-beta+
Attachment #8783819 - Flags: approval-mozilla-aurora? → approval-mozilla-aurora+
jya: I assume both patches should be uplifted here?
Flags: needinfo?(jyavenard)
The first one is enough, but both are okay too :)
Flags: needinfo?(jyavenard)
platform-rel: --- → +
Whiteboard: [platform-rel-Amazon][platform-rel-AmazonVideo]