Closed Bug 979173 Opened 11 years ago Closed 11 years ago

[MTBF] Phone get really slow when running MTBF on v1.3 branch

Categories

(Remote Protocol :: Marionette, defect)

ARM
Gonk (Firefox OS)
defect
Not set
normal

Tracking

(Not tracked)

RESOLVED FIXED

People

(Reporter: wachen, Assigned: wachen)

References

Details

(Whiteboard: [perf-reviewed])

Attachments

(2 files)

I am not sure about if it is a gaia-ui-test issue, marionette issue, or gaia issue. Like test_keyboard_basic function, it start with only less than 30 seconds of running. After 3 hours, it took 146.885s to run a single test.

Whoever want to run it, I can provide github links here:
1. clone https://github.com/Mozilla-TWQA/gaia
2. checkout https://github.com/Mozilla-TWQA/gaia/tree/mtbf-v1.3-20140218 branch
3. install gaia-ui-tests as usual(Or, you can look it up in MDN)
4. clone https://github.com/Mozilla-TWQA/MTBF-Driver
5. run it like python MTBF.py --testvars=(testvars file) (the mtbf keyboard test cases folder)
6. keep it running for 3~4 hours, and you will see the change
I will try to get logcat and running result later after it failed.
Attached file logcat.txt (deleted) β€”
The logcat.txt before it crashes.
Possibly related to bug 976584?
yeah, looks like it. Maybe we can pend this one on another bug so we can see if it reproduce after bug 976584 resolved?
Depends on: 976584
We think this is likely a bug in the harness and not directly a perf issue.
Component: General → JSMarionette
Product: Firefox OS → Testing
Whiteboard: [perf-reviewed]
Component: JSMarionette → Marionette
Can you attach a get_about_memory dump after the phone starts running slowly?  See instructions here:  https://developer.mozilla.org/en-US/Firefox_OS/Debugging/Debugging_OOMs#Step_2.3A_Collect_memory_reports
I know how to do that. However, my concern is that after doing that, it usually failed for other reasons
Do you have the flag to work with the NUWA process?

  MOZ_IGNORE_NUWA_PROCESS=1 ./tools/get_about_memory.py

Over in bug 981871 it appears we are leaking marionette sandboxes somehow.  If you have the same problem and can get a report with gc/cc logs we may be able to track it down.
Getting there. Will try to set up machine to do so this weekend.
Hi, Ben & Joanthan, 
I got something like:
14:49:12 + MOZ_IGNORE_NUWA_PROCESS=true ./get_about_memory.py
14:49:13 
Got 1/3 files.
Got 1/3 files.
Got 1/3 files.
Got 2/3 files.
Got 2/3 files.
Got 3/3 files.
14:49:19 Pulled files into about-memory-3.
14:49:19 
14:49:19 I just tried to open the memory report in Firefox.  If that didn't
14:49:19 work for some reason, or if you want to open this report at a later
14:49:19 time, open the following URL in a Firefox nightly build:
14:49:19 
14:49:19   about:memory?file=/var/lib/jenkins/workspace/mtbf.memory.multi/label/master/resource/B2G-flash-tool/B2G/tools/about-memory-3/memory-reports
14:49:19 
14:49:19 Pulling GC/CC logs...

14:49:19 Got 4/6 files.
Got 4/6 files.We've waited 120s but the only relevant files we see are
14:53:23   /data/local/tmp/memory-reports/gc-edges.18196.1394779750.log
14:53:23   /data/local/tmp/memory-reports/cc-edges.138.1394779750.log
14:53:23   /data/local/tmp/memory-reports/cc-edges.18196.1394779750.log
14:53:23   /data/local/tmp/memory-reports/gc-edges.138.1394779750.log
14:53:23 We expected 6 but see only 4 files.  Giving up...
14:53:23 Traceback (most recent call last):
14:53:23   File "./get_about_memory.py", line 289, in <module>
14:53:23     get_and_show_info(args)
14:53:23   File "./get_about_memory.py", line 229, in get_and_show_info
14:53:23     get_gc_cc_log.get_logs(args, out_dir=out_dir, get_procrank_etc=False)
14:53:23   File "/var/lib/jenkins/workspace/mtbf.memory.multi/label/master/resource/B2G-flash-tool/B2G/tools/get_gc_cc_log.py", line 90, in get_logs
14:53:23     utils.run_and_delete_dir_on_exception(do_work, out_dir)
14:53:23   File "/var/lib/jenkins/workspace/mtbf.memory.multi/label/master/resource/B2G-flash-tool/B2G/tools/include/device_utils.py", line 147, in run_and_delete_dir_on_exception
14:53:23     return fun()
14:53:23   File "/var/lib/jenkins/workspace/mtbf.memory.multi/label/master/resource/B2G-flash-tool/B2G/tools/get_gc_cc_log.py", line 83, in do_work
14:53:23     out_dir=out_dir)
14:53:23   File "/var/lib/jenkins/workspace/mtbf.memory.multi/label/master/resource/B2G-flash-tool/B2G/tools/include/device_utils.py", line 222, in notify_and_pull_files
14:53:23     _wait_for_remote_files(outfiles_prefixes, num_expected_files, old_files)
14:53:23   File "/var/lib/jenkins/workspace/mtbf.memory.multi/label/master/resource/B2G-flash-tool/B2G/tools/include/device_utils.py", line 304, in _wait_for_remote_files
14:53:23     raise Exception("Unable to pull some files.")
14:53:23 Exception: Unable to pull some files.
14:53:23 Build step 'Execute shell' marked build as failure
14:53:23 Finished: FAILURE

Any ideas?
I'm not sure; can you attach the files that were successfully pulled, and then we can ask someone to look at it?
Can you indicate who can I ask?
Thanks,
Walter
If you attach the files, I can needinfo some people, depending on what they look like.
Hi, Jonathan, do you mean the files pulled out?
It's here and make sure that you are in Mozilla Co network or in vpn for connection to taipei
http://mtbf-1.corp.tpe1.mozilla.com:8080/view/v1.3/job/mtbf.memory.multi/ws/label/master/resource/B2G-flash-tool/B2G/tools/about-memory-6/

Anything else you need please let me know.
Unfortunately, I can't access that url with the Mozilla VPN, and the old office VPN that used to work has been decommissioned.  Can you zip up the files in that report and attach it to the bug?  Thanks!
Attached file about-memory-6.zip (deleted) β€”
Didn't expect that VPN doesn't work. Is there a ticket for IT to work on that?
37,669 (100.0%) -- message-manager
└──37,669 (100.0%) -- referent
   β”œβ”€β”€37,258 (98.91%) -- child-process-manager
   β”‚  β”œβ”€β”€32,221 (85.54%) ── strong
   β”‚  └───5,037 (13.37%) -- weak
   β”‚      β”œβ”€β”€4,473 (11.87%) ── dead
   β”‚      └────564 (01.50%) ── alive
   └─────411 (01.09%) ++ (2 tiny)

There are lots of message-manager objects here, which looks much like what we're seeing in bug 981871.
(In reply to Walter Chen[:ypwalter][:wachen] from comment #16)
> Created attachment 8393289 [details]
> about-memory-6.zip
> 
> Didn't expect that VPN doesn't work. Is there a ticket for IT to work on
> that?

I just filed one: bug 985613
(In reply to Jonathan Griffin (:jgriffin) from comment #17)
> There are lots of message-manager objects here, which looks much like what
> we're seeing in bug 981871.

Yes.  Tons of SettingsManager objects:

  bkelly@lenir:/srv/tmp/bug979173/about-memory-6$ grep SettingsManager gc-edges.138.log cc-edges.138.log | grep ' G ' | sort | uniq | wc -l
  8973

Kyle, I believe we have the source to these tests if you want to investigate here.  :-)
Flags: needinfo?(khuey)
(In reply to Ben Kelly [:bkelly] from comment #19)
> (In reply to Jonathan Griffin (:jgriffin) from comment #17)
> > There are lots of message-manager objects here, which looks much like what
> > we're seeing in bug 981871.
> 
> Yes.  Tons of SettingsManager objects:
> 
>   bkelly@lenir:/srv/tmp/bug979173/about-memory-6$ grep SettingsManager
> gc-edges.138.log cc-edges.138.log | grep ' G ' | sort | uniq | wc -l
>   8973
> 

Bug 985827 is likely the cause of this.
> Kyle, I believe we have the source to these tests if you want to investigate
> here.  :-)
Flags: needinfo?(khuey)
Blocks: MTBF-meta
Fixing bugs 985042 and 985827 should have helped here.  Are we seeing improvements?
Flags: needinfo?(wachen)
Thanks for the information, and I will move forward to verify it today, and it might take few days before the result comes out.
Depends on: 985042
Flags: needinfo?(wachen)
Assignee: nobody → wachen
Walter, did those fixes mentioned in comment 21 help out?
Flags: needinfo?(wachen)
Sorry about the late response. I need several runs to confirm that the bug no longer reproduce. Also, I was trying to get ways to fetch some memory report. That took me this long to get here.I actually see no reproduction of this bug anymore!!!

However, bug 990837 (which I was intended to reproduce, and it was reproduced in v1.2, v1.3, and v1.4 came out again! Devs doesn't want to fix that in v1.2 and ask me to reproduce it in v1.3. Now, I can get the same result with memory reports and other related information.

If there should be any reproduction later, I will open a new bug for that. Kyle and Clint, thank you for helping.
Status: NEW → RESOLVED
Closed: 11 years ago
Flags: needinfo?(wachen)
Resolution: --- → FIXED
No longer blocks: MTBF-meta
Blocks: MTBF-meta
Blocks: MTBF-B2G
No longer blocks: MTBF-meta
Product: Testing → Remote Protocol
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Created:
Updated:
Size: