1219369 - Leak checking does not work in Windows content processes due to being unable to create the bloat log

Reporter

Description

•

9 years ago

philor did a try run of the current m-c when it is switched over to Aurora, and there are many leaks, in Windows 7 and Windows 8 debug browser chrome mochitests, of a single SyncObject. I'm going to land a suppression for this so the tree doesn't turn orange, but it would be good if we could fix this. https://treeherder.mozilla.org/#/jobs?repo=try&revision=5f45b9cad594&filter-resultStatus=testfailed&filter-resultStatus=busted&filter-resultStatus=exception&filter-classifiedState=unclassified

Andrew McCreight [:mccr8]

Reporter

Comment 1

•

9 years ago

This is a leak in the content process. I filed bug 1219371 for the suppression.

Sotaro Ikeda [:sotaro]

Updated

•

9 years ago

See Also: → https://bugzilla.mozilla.org/show_bug.cgi?id=1232549

Timothy Nikkel (:tnikkel)

Comment 2

•

9 years ago

Is this still an issue? Not clear to me

Flags: needinfo?(continuation)

Andrew McCreight [:mccr8]

Reporter

Comment 3

•

9 years ago

(In reply to Timothy Nikkel (:tnikkel) from comment #2)
> Is this still an issue? Not clear to me

It is still an issue. If you go to the mozilla-aurora tree, then look at a log for, say, Windows 7 Mochitest browser-chrome-1 (note that this is non-e10s!), you can see that the leakcheck entries include the following:

TEST-INFO | leakcheck | tab process: leaked 1 SyncObject

I landed a suppression in bug 1219371 so the tree isn't orange.

Though, looking into this now, I'm not sure the underlying issue is actually a graphics leak. In Aurora bc1, the two leaking processes are created during this test:
browser/base/content/test/plugins/browser_pluginCrashReportNonDeterminism.js

On mozilla-central, this test seems to run as part of bc-6 (6? Why so high a number? What is going on with chunking?). And, fun fact, it doesn't leak at all. In fact, leakcheck doesn't seem to be aware of the processes that the test creates at all.

I'll move this out of graphics and try to figure out what is happening.
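For anyone scanning logs for this by hand, here is a minimal sketch of pulling the process type, count, and class name out of a leakcheck line. The pattern is modelled only on the single line quoted in this comment; the function name `parse_leak_line` is made up for illustration, and the real harness output may well have other variants that this does not cover.

```python
import re

# Pattern modelled on the one leakcheck line quoted in this bug;
# other line shapes in real logs are not handled (assumption).
LEAK_LINE = re.compile(
    r"^TEST-INFO \| leakcheck \| (?P<process>.+?) process: "
    r"leaked (?P<count>\d+) (?P<cls>\S+)$"
)


def parse_leak_line(line):
    """Return (process, count, class name), or None if the line does not match."""
    match = LEAK_LINE.match(line.strip())
    if match is None:
        return None
    return match.group("process"), int(match.group("count")), match.group("cls")
```

Calling it on the line above should give the tuple `("tab", 1, "SyncObject")`; anything that does not look like a leakcheck leak line comes back as `None`.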

Component: Graphics: Layers → General

Flags: needinfo?(continuation)

Andrew McCreight [:mccr8]

Reporter

Comment 4

•

9 years ago

mconley, you fixed another issue that seemed to be related to browser_pluginCrashReportNonDeterminism.js running on Aurora but not Nightly, in bug 1153659. Did you ever figure out what was going on there?

Flags: needinfo?(mconley)

Andrew McCreight [:mccr8]

Reporter

Comment 5

•

9 years ago

Running this test locally with Nightly on OSX, I see the two tab process leaks, as on Aurora Windows. I suspect there are two issues here. First, why is this test maybe not really running on Windows Nightly? Second, when we do run this test, why do we leak a SyncObject? I think we may just leak a SyncObject on Windows in a child process, whereas we don't on any other platform. But maybe that's just some object that CompositorChild or ImageBridge happen to entrain on Windows, in which case there isn't a new leak.

Andrew McCreight [:mccr8]

Reporter

Updated

•

9 years ago

Assignee: nobody → continuation

Andrew McCreight [:mccr8]

Reporter

Updated

•

9 years ago

Flags: needinfo?(mconley)

Andrew McCreight [:mccr8]

Reporter

Comment 6

•

9 years ago

Ok, I figured out what is happening. On Windows m-c, nsTraceRefcnt (in InitLog) is failing to create the bloat log in content processes. Entries like this appear in the mochitest log:

### XPCOM_MEM_BLOAT_LOG defined -- unable to log bloat/leaks to c:\users\cltbld\appdata\local\temp\tmpefotq8.mozrunner\runtests_leaks_tab_pid3520.log

In contrast, the bloat log is being successfully created on Aurora Windows, and Linux m-c. This file is being created directly with a fopen call.

I'd guess that Windows m-c has a stricter sandbox than either Aurora Windows or Linux, which is causing the file to fail to be created. Bob, does that sound plausible?

There are two problems that need to be addressed:
1) Something needs to be adjusted so that the bloat log is actually created on Windows Nightly.
2) Failure to create the bloat log needs to turn the tree orange somehow. I'll file a separate bug for this, blocked by this bug.
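As a rough illustration of problem 2, a harness-side check could scan the mochitest log for the failure message and report every bloat log that could not be created. This is only a sketch: the message prefix is copied from the log entry quoted in this comment, while `find_bloat_log_failures` is a hypothetical helper and not something that exists in the harness.

```python
# Prefix copied from the log entry quoted in this bug.
BLOAT_LOG_FAILURE = "### XPCOM_MEM_BLOAT_LOG defined -- unable to log bloat/leaks to"


def find_bloat_log_failures(log_lines):
    """Return the paths of bloat logs that a process reported it could not create."""
    failed = []
    for line in log_lines:
        idx = line.find(BLOAT_LOG_FAILURE)
        if idx != -1:
            # Everything after the prefix is the path that could not be opened.
            failed.append(line[idx + len(BLOAT_LOG_FAILURE):].strip())
    return failed
```

A caller could treat a non-empty result as a test failure, which would be one way to make a missing bloat log turn the tree orange.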

tracking-e10s: --- → ?

Component: General → XPCOM

Flags: needinfo?(bobowen.code)

Summary: SyncObject leaks on Windows when we merge to Aurora → Leak checking does not work in Windows content processes due to being unable to create the bloat log

Whiteboard: [MemShrink]

Andrew McCreight [:mccr8]

Reporter

Updated

•

9 years ago

Blocks: 1051230

Andrew McCreight [:mccr8]

Reporter

Updated

•

9 years ago

Blocks: 1243949

Andrew McCreight [:mccr8]

Reporter

Updated

•

9 years ago

See Also: → https://bugzilla.mozilla.org/show_bug.cgi?id=1243950

Bob Owen (:bobowen)

Assignee

Comment 7

•

9 years ago

(In reply to Andrew McCreight [:mccr8] from comment #6)
> Ok, I figured out what is happening. On Windows m-c, nsTraceRefcnt (in
> InitLog) is failing to create the bloat log in content processes. Entries
> like this appear in the mochitest log:
>
> ### XPCOM_MEM_BLOAT_LOG defined -- unable to log bloat/leaks to
> c:\users\cltbld\appdata\local\temp\tmpefotq8.
> mozrunner\runtests_leaks_tab_pid3520.log
>
> In contrast, the bloat log is being successfully created on Aurora Windows,
> and Linux m-c. This file is being created directly with a fopen call.
>
> I'd guess that Windows m-c has a stricter sandbox than either Aurora Windows
> or Linux, which is causing the file to fail to be created. Bob, does that
> sound plausible?

Yes, we only have the Windows content sandbox turned on, on Nightly at the moment. Actually I'm hoping to get the low integrity part of the sandbox up to Aurora soon.

> There are two problems that need to be addressed:
> 1) Something needs to be adjusted so that the bloat log is actually created
> on Windows Nightly.

Does the bloat log need to be created in a particular place or could it be created in the low integrity temp directory?

If it uses the directory service then TmpD in the child process gets changed to point to something that looks like:
C:\Users\<user name>\AppData\LocalLow\Mozilla\Temp-<some UUID>

The UUID is set per profile in the pref: security.sandbox.content.tempDirSuffix

The directory gets deleted by the parent process on shutdown however, so that could well make this difficult. But the content process can write to anywhere in LocalLow.

It looks like the env var XPCOM_MEM_BLOAT_LOG is used, so if that could point to LocalLow, instead of Local, that would work at least for Windows.

The other alternative would be to have some test specific code that knows the path of the log and adds it as a policy rule for write access, but I'd prefer writing to LocalLow if possible.

Flags: needinfo?(bobowen.code)

Bob Owen (:bobowen)

Assignee

Comment 8

•

9 years ago

(In reply to Bob Owen (:bobowen) from comment #7)
> The other alternative would be to have some test specific code that knows
> the path of the log and adds it as a policy rule for write access, but I'd
> prefer writing to LocalLow if possible.

Actually, while I was thinking about bug 1193861, I wonder if we shouldn't add some way for the tests to add file policy rules for this sort of thing. LocalLow will work for the moment, but we might find even writing there will break as we try to tighten the sandbox.

Another alternative, which would be more cross-platform, is some mechanism for the child to request the file handles from the parent, for certain things when running tests.

Jed, what do you think? ... How does the RemoteOpenFile stuff work, and what protections are there to stop a compromised child from just asking for any file it likes?

Flags: needinfo?(jld)

Jed Davis [:jld] ⟨⏰|UTC-7⟩ ⟦he/him⟧

Comment 9

•

9 years ago

As it exists now, PRemoteOpenFile is only for reading files, to implement remoteopenfile:/// URIs, and it's part of Necko (it's a subprotocol managed by PNecko). In principle it could be extended to allow writing, I think, but it might not be the right place to do that.

The authorization logic lives in NeckoParent::AllocPRemoteOpenFileParent; currently https://dxr.mozilla.org/mozilla-central/source/netwerk/ipc/NeckoParent.cpp#569. It's pretty specific to B2G app:// URIs, but that could be changed.

Another potential problem here is for files that need to be opened before the usual IPC things are set up. In the Linux world that's not as much of a problem, because (most of) the sandbox won't be enabled that early, so the file can be opened directly, and the fd acts as a capability to allow future reads/writes; but that might not apply as well to other OSes.

Having a common interface for dynamically adding FS policy rules could be done on Linux by redoing my broker to make the policy dynamic (which I was hoping to avoid, but in hindsight that might not have been the right idea), but OS X has a problem there because it's not using a broker that can be dynamically adjusted after sandbox startup: the paths are part of the policy program that's given to Sandbox.kext when the sandbox is enabled.
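To make the "fd as a capability" point concrete, here is a small Python stand-in for the pattern: resolve the path once, early, and afterwards write only through the descriptor, so no further path lookup is needed. It does not involve a sandbox at all, and `open_log_early` and `write_via_descriptor` are hypothetical names used purely for illustration; the real code is C++ inside nsTraceRefcnt.

```python
import os


def open_log_early(path):
    """Open the log while the path is still reachable and return the descriptor."""
    return os.open(path, os.O_WRONLY | os.O_CREAT | os.O_TRUNC, 0o600)


def write_via_descriptor(fd, text):
    """Write using only the descriptor; no path lookup happens here."""
    data = text.encode("utf-8")
    while data:
        # os.write may write fewer bytes than requested, so loop until done.
        written = os.write(fd, data)
        data = data[written:]
```

Whether the later write would still be allowed once a sandbox is in place depends on the platform's sandbox, which this sketch does not model.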

Flags: needinfo?(jld)

Bob Owen (:bobowen)

Assignee

Comment 10

•

9 years ago

Thanks Jed.

From looking at ContentProcess::Init I think that the IPC stuff is set up before we initialise XPCOM, if that's early enough. If it isn't, it will be a problem on Windows because we have to start at low integrity even before we've lowered the sandbox.

As I've just mentioned in bug 1193861, on Windows I could just add the TEMP directory to the policy, if I can tell that we're running tests. I think the IPC stuff in the Windows sandbox is definitely early enough, because I can add rules for the loading of the DLLs when Firefox and the child process are started from a network drive.

Andrew McCreight [:mccr8]

Reporter

Comment 11

•

9 years ago

If it helps, even though the bloat log file is opened very early, I think we don't actually write anything to it until XPCOM shutdown. I could refactor things to not do the fopen() until the file is actually used, if that makes something easier.
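The refactoring I have in mind would look roughly like this, written in Python only to keep it short. `LazyLog` is a made-up name, and the real change would be in the C++ logging code; the sketch shows only the deferral of the open, not how a sandboxed process would be granted access to the file.

```python
class LazyLog:
    """Remember the log path at startup, but open the file only on first write."""

    def __init__(self, path):
        self.path = path
        self._file = None  # nothing is created on disk yet

    @property
    def opened(self):
        return self._file is not None

    def write(self, text):
        if self._file is None:
            # First use: this is the point where the file is actually created.
            self._file = open(self.path, "w", encoding="utf-8")
        self._file.write(text)

    def close(self):
        if self._file is not None:
            self._file.close()
            self._file = None
```

With this shape, a process that never writes anything never touches the file system for the log.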

Bob Owen (:bobowen)

Assignee

Comment 12

•

9 years ago

(In reply to Andrew McCreight [:mccr8] from comment #11)
> If it helps, even though the bloat log file is opened very early, I think we
> don't actually write anything to it until XPCOM shutdown. I could refactor
> things to not do the fopen() it until it is used if that makes something
> easier.

It probably would, if we go down that route.

A quick fix on Windows would be to add write access to TEMP, I think.

Do you know if there is an env var that gets set for test runs?
If not, I can add a new one specific to this.

Flags: needinfo?(continuation)

Andrew McCreight [:mccr8]

Reporter

Comment 13

•

9 years ago

(In reply to Bob Owen (:bobowen) from comment #12)
> It probably would, if we go down that route.

Alright. Well, let me know what I can do to help to refactor stuff.

> A quick fix on Windows would be to add write access to TEMP, I think.

Alright. Another thing that might work would be to change XPCOM_MEM_BLOAT_LOG to point to some kind of low integrity directory. It is set here:
http://mxr.mozilla.org/mozilla-central/source/testing/mochitest/runtests.py#1214
A brief bit of searching on the internet didn't turn up anything simple about getting the Windows low integrity directory from Python, though.

> Do you know if there is an env var that gets set for test runs?
> If not, I can add a new one specific to this.

XPCOM_MEM_BLOAT_LOG is being set, for one. Another thing that could be useful is that we only do leak checking in DEBUG builds, so you could #ifdef DEBUG anything you add.
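For what it's worth, a crude version of the harness side could just build the path by hand. This is a sketch under a loud assumption: it takes the default layout mentioned in comment 7, with LocalLow sitting at AppData\LocalLow under the user's profile directory, rather than asking Windows for the folder (the known-folder ID is FOLDERID_LocalAppDataLow, which is not used here). The helper names and the file name passed in are illustrative, not what runtests.py does.

```python
import ntpath


def low_integrity_log_path(user_profile, file_name):
    """Build a log path under <user_profile>/AppData/LocalLow (assumed default layout)."""
    # ntpath gives Windows-style joining regardless of the host platform.
    return ntpath.join(user_profile, "AppData", "LocalLow", file_name)


def bloat_log_env(env, user_profile, file_name):
    """Return a copy of env with XPCOM_MEM_BLOAT_LOG pointing into LocalLow."""
    new_env = dict(env)
    new_env["XPCOM_MEM_BLOAT_LOG"] = low_integrity_log_path(user_profile, file_name)
    return new_env
```

If the profile has been redirected or the layout differs, the hand-built path would be wrong, which is the main reason to prefer a sandbox policy rule or a proper folder lookup over this.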

Flags: needinfo?(continuation)

Bob Owen (:bobowen)

Assignee

Comment 14

•

9 years ago

OK, I completely misunderstood the process log one (bug 1193861); it's not being blocked by the sandbox. The log is written from the parent anyway, it's just that the sandbox launch code doesn't write it.

Given that, I think adding TEMP write access in DEBUG builds is fine.

Debug Try push with those two patches (quite a lot of leaking):
https://treeherder.mozilla.org/#/jobs?repo=try&revision=2408383f640b

Here's some more with e10s forced on:
https://treeherder.mozilla.org/#/jobs?repo=try&revision=9c20425cc4b9

Opt push, just to show the process log is always written:
https://treeherder.mozilla.org/#/jobs?repo=try&revision=274550410540

Andrew McCreight [:mccr8]

Reporter

Comment 15

•

9 years ago

Thanks for looking at this, Bob. I'll try to think of some reasonable way to suppress those leaks. For now, you can try the patch in bug 1219919 and see if that covers it. That's what we've been landing on Aurora to cover the Win7 leaks that show up there. If that works, you can just land it and I can work on something more sophisticated separately.

Assignee: continuation → bobowen.code

Bob Owen (:bobowen)

Assignee

Comment 16

•

9 years ago

No problem, here's a push with the latest patch from that bug: https://treeherder.mozilla.org/#/jobs?repo=try&revision=255a9f1f497e By the way, I noticed that the bug number in the comment in the patch says 1219916 not 1219919.

Status: NEW → ASSIGNED

Andrew McCreight [:mccr8]

Reporter

Updated

•

9 years ago

Depends on: 1219919

Andrew McCreight [:mccr8]

Reporter

Updated

•

9 years ago

Blocks: 1091917

Brad Lassey [:blassey] (use needinfo?)

Updated

•

9 years ago

tracking-e10s: ? → +

Priority: -- → P3

Andrew McCreight [:mccr8]

Reporter

Comment 17

•

9 years ago

I did a try push with your patches and with an updated version of the leak suppressions in bug 1219919 and it looks green: https://treeherder.mozilla.org/#/jobs?repo=try&revision=bdc3933abda0 (I fixed the one orange in there by bumping the number of SyncObject leaks in the suppression list.)

Bob Owen (:bobowen)

Assignee

Comment 18

•

9 years ago

Attached patch In Windows debug builds allow write access to TEMP for logging purposes (deleted) — Details — Splinter Review

Attachment #8715210 - Flags: review?(tabraldes)

Tim Abraldes [:TimAbraldes] [:tabraldes]

Comment 19

•

9 years ago

Comment on attachment 8715210 [details] [diff] [review]
In Windows debug builds allow write access to TEMP for logging purposes

Review of attachment 8715210 [details] [diff] [review]:
-----------------------------------------------------------------

::: security/sandbox/win/src/sandboxbroker/sandboxBroker.cpp
@@ +55,5 @@
>
> +#if defined(DEBUG)
> +  // Allow write access to TEMP directory in debug builds for logging purposes.
> +  wchar_t tempPath[MAX_PATH + 2];
> +  uint32_t pathLen = ::GetTempPathW(MAX_PATH + 1, tempPath);

nit: Please add a comment explaining the "+ 1" and "+ 2"

Attachment #8715210 - Flags: review?(tabraldes) → review+

Bob Owen (:bobowen)

Assignee

Comment 20

•

9 years ago

Thanks Tim.

(In reply to Tim Abraldes [:TimAbraldes] [:tabraldes] from comment #19)
> > +  // Allow write access to TEMP directory in debug builds for logging purposes.
> > +  wchar_t tempPath[MAX_PATH + 2];
> > +  uint32_t pathLen = ::GetTempPathW(MAX_PATH + 1, tempPath);
>
> nit: Please add a comment explaining the "+ 1" and "+ 2"

Added locally:
  // The path from GetTempPathW can have a length up to MAX_PATH + 1, including
  // the null, so we need MAX_PATH + 2, so we can add an * to the end.

Pulsebot

Comment 21

•

9 years ago

https://hg.mozilla.org/integration/mozilla-inbound/rev/92029305820c

Andrew McCreight [:mccr8]

Reporter

Updated

•

9 years ago

Blocks: 1247025

Carsten Book [:Tomcat]

Comment 22

•

9 years ago

bugherder

https://hg.mozilla.org/mozilla-central/rev/92029305820c

Status: ASSIGNED → RESOLVED

Closed: 9 years ago

status-firefox47: --- → fixed

Resolution: --- → FIXED

Target Milestone: --- → mozilla47

Bob Owen (:bobowen)

Assignee

Updated

•

8 years ago

See Also: → https://bugzilla.mozilla.org/show_bug.cgi?id=1345046