668594 - while running reftest style tests, we seem to have a memory leak and fennec hangs

Reporter

Description

13 years ago

In debugging some failures on tinderbox for jsreftest, crashtest and reftest, I find that we fail tests partway through a run (usually in the same spot, but not always) and fennec keeps running until the harness times out after 2400 seconds.

While looking at logcat (http://people.mozilla.org/~jmaher/android_dumps/jsreftest-2.log), I don't see anything useful other than the browser starting; there is no other gecko information. What I do see in a process list when this happens is that only a few processes are running, not the usual 25-30. [theory]I suspect this happens because we leak memory and Android frees up space by shutting down non-essential processes[/theory]

To reproduce this, grab a tinderbox-ready tegra and a tests.zip (or run 'make package-tests' in your objdir), then run:

    cd reftests
    python remotereftest.py --deviceIP=192.168.1.101 --app=org.mozilla.fennec --xre-path=../../bin --extra-profile-file=jsreftest/tests/user.js --enable-privilege --total-chunks=2 --this-chunk=1 jsreftest/tests/jstests.list

where 192.168.1.101 is the IP address of your tinderbox tegra. This reproduces the problem consistently on my tegra.
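The symptom above (only a few processes left instead of the usual 25-30) could be detected automatically rather than waiting for the 2400-second timeout. A minimal sketch, assuming the text output of `ps` has already been captured from the device (how it is fetched, via SUTAgent or ADB, is not shown) and that its first line is a column header. `count_processes`, `processes_collapsed` and the threshold of 15 are hypothetical, not part of the harness.

```python
# Hypothetical helpers; the threshold is an assumption, not a harness value.


def count_processes(ps_output: str) -> int:
    """Count process rows in captured `ps` text, excluding the header line."""
    rows = [line for line in ps_output.splitlines() if line.strip()]
    # The first non-empty line is assumed to be the column header.
    return max(len(rows) - 1, 0)


def processes_collapsed(ps_output: str, minimum: int = 15) -> bool:
    """True when far fewer processes are running than the usual 25-30."""
    return count_processes(ps_output) < minimum


if __name__ == "__main__":
    # Illustrative sample only, not captured from a device.
    sample = "USER PID PPID NAME\nroot 1 0 /init\napp_1 100 1 org.mozilla.fennec\n"
    print(count_processes(sample), processes_collapsed(sample))  # 2 True
```

Polling this periodically during a run would flag the collapse as soon as it happens, which also gives a timestamp to line up against logcat.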

Joel Maher ( :jmaher ) (UTC -8)

Reporter

Updated

13 years ago

Blocks: 662468, 663657

David Mandelin [:dmandelin]

Comment 1

13 years ago

(In reply to comment #0)
> In debugging some failures on tinderbox for jsreftest, crashtest and
> reftest, I find that we are failing tests part way through a run (usually in
> the same spot, but not always) and fennec is still running until the harness
> times out after 2400 seconds.
>
> While looking at logcat
> (http://people.mozilla.org/~jmaher/android_dumps/jsreftest-2.log), I don't
> see anything useful other than the browser starting and no other gecko
> information. What I do see while looking at a process list when this
> happens is we only have a few processes running, not the usual 25-30
> processes. [theory]I suspect this happens because we leak memory and
> android is freeing up space by closing down non essential processes [/theory]

Any idea how to test that hypothesis?

> To reproduce this, grab a tinderbox ready tegra,

What's a tinderbox-ready tegra, and how would I get one? We have a tegra running text-console Ubuntu. Is that good enough?

> and a tests.zip (or a 'make
> package-tests' in your objdir) file and run:
> cd reftests
> python remotereftest.py --deviceIP=192.168.1.101 --app=org.mozilla.fennec
> --xre-path=../../bin --extra-profile-file=jsreftest/tests/user.js
> --enable-privilege --total-chunks=2 --this-chunk=1
> jsreftest/tests/jstests.list
>
> where 192.168.1.101 is the ip address of your tinderbox tegra.
>
> this reproduces the problem consistently on my tegra.

Joel Maher ( :jmaher ) (UTC -8)

Reporter

Comment 2

13 years ago

I need to monitor the total memory consumption of the tegra during the test. It would also be nice to figure out how to query (or dump to the log file) the total memory that Fennec thinks it is using. Beyond that, we need to figure out why all the processes go away, because as soon as they do we stop running tests. Any tips for dumping the total memory that Fennec/Firefox thinks it is consuming from inside JavaScript/XUL?

A tinderbox-ready tegra is a tegra with the SUTAgent installed and reachable on the same network as your machine. There is code to run these tests through ADB (USB cable), but we have had trouble getting that to work on a few different devices (including the tegra).

Does the Ubuntu installation on the tegra support Python? There is a Python version of SUTAgent which I have used to develop and fix bugs in remote testing. It can be found here: http://people.mozilla.org/~jmaher/remotetesting/ It lives in a people account rather than in version control because it contains some newer code to support installation and tegra management, which has only been tested on Linux (Ubuntu), not on OS X or Win32.
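One simple way to track total system memory during a run is to sample /proc/meminfo at a fixed interval; MemTotal and MemFree are /proc/meminfo field names. A minimal sketch, assuming the caller supplies a hypothetical `read_meminfo` function that returns the file's text from the device (fetching it is not shown). `parse_meminfo`, `used_fraction` and `monitor` are illustrative names, and `used_fraction` counts buffers and page cache as "used", so it overstates real pressure.

```python
import time
from typing import Callable, Dict


def parse_meminfo(text: str) -> Dict[str, int]:
    """Parse /proc/meminfo-style text ("MemFree:  1234 kB") into {field: kB}."""
    fields: Dict[str, int] = {}
    for line in text.splitlines():
        name, sep, rest = line.partition(":")
        parts = rest.split()
        # Keep only lines of the form "Name: <number> [kB]".
        if sep and parts and parts[0].isdigit():
            fields[name.strip()] = int(parts[0])
    return fields


def used_fraction(fields: Dict[str, int]) -> float:
    """Fraction of total RAM not reported as free (MemTotal vs MemFree only)."""
    total = fields["MemTotal"]
    return (total - fields["MemFree"]) / total


def monitor(read_meminfo: Callable[[], str], samples: int,
            interval: float = 10.0) -> None:
    """Print MemTotal/MemFree every `interval` seconds, `samples` times."""
    for _ in range(samples):
        fields = parse_meminfo(read_meminfo())
        print(f"MemTotal={fields['MemTotal']} kB MemFree={fields['MemFree']} kB "
              f"used={used_fraction(fields):.0%}")
        time.sleep(interval)


if __name__ == "__main__":
    # Made-up values, for illustration only.
    fake = "MemTotal:   1000 kB\nMemFree:     250 kB\n"
    monitor(lambda: fake, samples=1, interval=0)
```

Taking the reader as a parameter keeps the parsing testable on a desktop and leaves the transport (SUTAgent, ADB, or something else) as a separate choice.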

Joel Maher ( :jmaher ) (UTC -8)

Reporter

Comment 3

13 years ago

So I dumped some system-level stats every 10 seconds to see how much memory was available and what was using it. Lo and behold, plugin-container spikes and consumes all the available memory. Here is a log: http://people.mozilla.org/~jmaher/android_dumps/jsreftest4.log

Search for MemTotal or MemFree in the log to see the total system memory stats. Scroll down past that to see a procrank (top for memory) listing:

    PID    Vss      Rss      Pss      Uss      cmdline
    1622   82028K   75844K   47865K   41276K   org.mozilla.fennec
    1027   49320K   43308K   20339K   16740K   system_server
    1657   15492K   15492K   10918K    7420K   /data/data/org.mozilla.fennec/plugin-container

Near the end, when the process hangs, we see:

    PID    Vss       Rss       Pss       Uss       cmdline
    1657   720284K   720284K   715179K   711176K   /data/data/org.mozilla.fennec/plugin-container
    1622    81208K    75160K    46649K    39552K   org.mozilla.fennec
    1027    49584K    43548K    20576K    16980K   system_server

This reproduces every time for me, so the repro steps are still valid and very accurate.
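The two procrank listings above can be compared mechanically to find the process whose footprint is growing. A minimal sketch, assuming the six-column format shown in the log (PID Vss Rss Pss Uss cmdline, sizes suffixed with K); other procrank versions may print extra columns this does not handle. `ProcRankRow`, `parse_procrank` and `biggest_by_pss` are hypothetical names. The sample input is the "near the end" listing from this comment.

```python
from typing import List, NamedTuple


class ProcRankRow(NamedTuple):
    pid: int
    vss_kb: int
    rss_kb: int
    pss_kb: int
    uss_kb: int
    cmdline: str


def parse_procrank(text: str) -> List[ProcRankRow]:
    """Parse rows of `PID Vss Rss Pss Uss cmdline`; sizes end in 'K'."""
    rows: List[ProcRankRow] = []
    for line in text.splitlines():
        parts = line.split(None, 5)
        # Skip the header, blank lines and any summary lines.
        if len(parts) < 6 or not parts[0].isdigit():
            continue
        sizes = [int(p.rstrip("K")) for p in parts[1:5]]
        rows.append(ProcRankRow(int(parts[0]), *sizes, parts[5].strip()))
    return rows


def biggest_by_pss(rows: List[ProcRankRow]) -> ProcRankRow:
    """Return the process with the largest proportional set size."""
    return max(rows, key=lambda row: row.pss_kb)


if __name__ == "__main__":
    near_hang = """\
PID Vss Rss Pss Uss cmdline
1657 720284K 720284K 715179K 711176K /data/data/org.mozilla.fennec/plugin-container
1622 81208K 75160K 46649K 39552K org.mozilla.fennec
1027 49584K 43548K 20576K 16980K system_server
"""
    top = biggest_by_pss(parse_procrank(near_hang))
    print(top.cmdline, top.pss_kb)
    # /data/data/org.mozilla.fennec/plugin-container 715179
```

Pss is used for ranking because it splits shared pages among the processes that map them, so it is a fairer per-process figure than Vss or Rss.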

Joel Maher ( :jmaher ) (UTC -8)

Reporter

Comment 4

13 years ago

I have found 2 files so far which seem to be problematic:
http://mxr.mozilla.org/mozilla-central/source/js/src/tests/e4x/GC/regress-280844-1.js
http://mxr.mozilla.org/mozilla-central/source/js/src/tests/e4x/GC/regress-280844-2.js
With these commented out, plugin-container doesn't show memory problems until e4x/Regress.

cmtalbert

Comment 5

•

13 years ago

(In reply to comment #1)
> > To reproduce this, grab a tinderbox ready tegra,
>
> What's a tinderbox-ready tegra, and how would I get one? We have a tegra
> running text-console Ubuntu. Is that good enough?

Dave, Bmoss and I have one down here (2nd floor) that's not being used at the moment if you want to borrow it. (It's got Android on it, which may be required to repro. Not sure if this issue will repro on your Ubuntu setup.)

Joel Maher ( :jmaher ) (UTC -8)

Reporter

Comment 6

•

13 years ago

Running the e4x/GC tests with code similar to about:memory (http://mxr.mozilla.org/mozilla-central/source/toolkit/components/aboutmemory/content/aboutMemory.js) after each test completes, I get this output: http://people.mozilla.org/~jmaher/android_dumps/e4x_gc_memory.log

explicit/js/gc-heap is by far the fastest-growing entry, but you can see everything else too. My logic for separating main vs. content process is a bit lacking in accuracy, but I can keep working on it as time goes on.
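To single out which tests drive explicit/js/gc-heap up, one could record the reading after each test and rank the per-test change. A minimal sketch, assuming the readings have already been extracted from a log like the one above into (test name, bytes) pairs in run order; `heap_growth`, the test names and the byte values below are hypothetical.

```python
from typing import List, Tuple


def heap_growth(samples: List[Tuple[str, int]]) -> List[Tuple[str, int]]:
    """Given (test, explicit/js/gc-heap bytes measured after that test) in run
    order, return (test, change since the previous test), largest growth first.
    The first test has no predecessor, so it gets no entry."""
    deltas = [
        (name, current - previous)
        for (_, previous), (name, current) in zip(samples, samples[1:])
    ]
    return sorted(deltas, key=lambda item: item[1], reverse=True)


if __name__ == "__main__":
    # Made-up readings, for illustration only.
    readings = [("test-a", 4_000_000), ("test-b", 4_100_000),
                ("test-c", 90_000_000), ("test-d", 60_000_000)]
    for name, delta in heap_growth(readings):
        print(f"{name}: {delta:+,} bytes")
```

A large jump attributed to one test is only a lead: a GC can run at an arbitrary point, so deltas can be negative or show up on the test that follows the real culprit.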

Alon Zakai (:azakai)

Comment 7

13 years ago

Several JS reftests are expected to use a lot of memory. These aren't leaks (well, in theory there could also be a leak); they are tests for problems that involve large amounts of memory. We recently disabled a few that were OOMing on desktop (after the script stack quota removal landed, they became more of a problem). If the tegras have less memory, we might need to disable some more for them. I ran |make jstestbrowser| on desktop fennec just now, and memory usage for the plugin-container process reaches the 512MB-1GB range several times during the run. It's probably very similar on Android.

Alon Zakai (:azakai)

Comment 8

13 years ago

Note that js browser tests change prefs like gczeal, which can affect memory usage, but this is broken in the multiprocess case (you can't set prefs in the child process). I filed bug 669949 for this. There is some chance fixing that would have an effect here.

Joel Maher ( :jmaher ) (UTC -8)

Reporter

Comment 9

13 years ago

Attached patch skip slow jsreftests on android (1.0) (deleted)

This patch skips what we believe to be the slowest and most resource-intensive jsreftests. It passes on the try server and in local testing on my tegra. Keep in mind this covers jsreftest only; we should take a closer look at crashtest and reftest, since we see similar behavior there.

Attachment #544795 - Flags: review?(bclary)

Bob Clary [:bc] (inactive)

Comment 10

13 years ago

Comment on attachment 544795 skip slow jsreftests on android (1.0)

thanks!

Attachment #544795 - Flags: review?(bclary) → review+

Joel Maher ( :jmaher ) (UTC -8)

Reporter

Updated

13 years ago

Whiteboard: [android][tegra][mobile_unittests][mobile_dev_needed] → [android][tegra][mobile_unittests][inbound]

Kyle Huey (Exited; not receiving bugmail, old account, do not use)

Comment 11

13 years ago

http://hg.mozilla.org/mozilla-central/rev/5a3a13adf235

Status: NEW → RESOLVED

Closed: 13 years ago

Resolution: --- → FIXED

Target Milestone: --- → Firefox 8

Jesse Ruderman

Comment 12

13 years ago

Marking tests as skip-if(Android) without comments isn't very future-proof. It makes it unlikely that our tests or code will be fixed. It also causes us to duplicate effort: if we port to iOS, change desktop to use content processes, or try to run our tests under Valgrind, we'll have to evaluate all these tests again.

The tests in this bug should be annotated with things like:

slow-if(MaxRAM<500)   <-- changes the timeout and skips it in some test runs
skip-if(MaxVM<500)    <-- relevant for devices with swapping disabled
expect-OOM            <-- meaning this is a test of OOM behavior

And elsewhere:

skip-if(OOPContent)
fails-if(screenWidth<600)
skip-if(ARM)
skip-if(AndroidWidget)

In many cases, there should also be bugs filed on fixing the tests or fixing the code, and those bugs should be referenced in comments in the manifest.
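The "skip-if(Android) without comments" problem can at least be found mechanically. A minimal sketch, assuming manifest comments start with `#` (as in reftest manifests) and that only the literal `skip-if(Android)` form matters; compound conditions such as `skip-if(Android||...)` are not detected. `uncommented_android_skips` is a hypothetical helper, not an existing tool.

```python
import re
from typing import List, Tuple

# Matches only the literal annotation; compound conditions are not handled.
SKIP_ANDROID = re.compile(r"skip-if\(Android\)")


def uncommented_android_skips(manifest_text: str) -> List[Tuple[int, str]]:
    """Return (line number, line) for entries that use skip-if(Android)
    without a trailing '#' comment explaining why."""
    hits: List[Tuple[int, str]] = []
    for lineno, line in enumerate(manifest_text.splitlines(), start=1):
        # Everything after the first '#' is a comment; ignore commented-out lines.
        code, _, comment = line.partition("#")
        if SKIP_ANDROID.search(code) and not comment.strip():
            hits.append((lineno, line.strip()))
    return hits


if __name__ == "__main__":
    # Illustrative manifest text, not taken from the tree.
    sample = (
        "skip-if(Android) script a.js\n"
        "skip-if(Android) script b.js # bug NNN: OOMs on tegra\n"
        "script c.js\n"
    )
    for lineno, line in uncommented_android_skips(sample):
        print(f"{lineno}: {line}")
```

Run over a jstests.list tree, a report like this would give a concrete list of entries still needing a reason and a bug reference.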

Joel Maher ( :jmaher ) (UTC -8)

Reporter

Comment 13

13 years ago

Jesse, I agree with you here. There are hundreds of these cases, and in some scenarios entire directories are commented out. It would take one person at least a month (probably closer to two) to go through all of them, properly categorize them by type of failure, and file appropriate bugs.

Camelia Urian

Updated

13 years ago

Whiteboard: [android][tegra][mobile_unittests][inbound] → [android][tegra][mobile_unittests][inbound][qa?]

Categories: Firefox for Android Graveyard :: General, defect
Tracking: Not tracked
People: Reporter: jmaher, Unassigned
Whiteboard: [android][tegra][mobile_unittests][inbound][qa?]
Security: public
Attachments: 1 file
