Closed Bug 915083 Opened 11 years ago Closed 11 years ago

[1.2] Camera App launch time has regressed (compared to [1.1])

Categories

(Firefox OS Graveyard :: Gaia::Camera, defect, P1)

All
Gonk (Firefox OS)
defect

Tracking

(blocking-b2g:koi+)

RESOLVED WORKSFORME
1.2 C3(Oct25)
blocking-b2g koi+

People

(Reporter: mvikram, Assigned: mikeh)

References

Details

(Keywords: perf, regression, Whiteboard: [c=progress p= s= u=1.2] )

Attachments

(5 files)

User Agent: Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/29.0.1547.66 Safari/537.36

Steps to reproduce:

Launch the Camera App (QRD 7x27 device). Measurements are made using a high speed camera measuring the time from when the touch event was recognized to camera preview being visible.



Actual results:

Measured time on (1.2) is:
3271ms
Measured time on (1.1):
2075 ms
Depends on: 914916
OS: All → Gonk (Firefox OS)
Priority: -- → P1
Measured time on master/unagi*: ~1.8s. Build is:
- gecko: m-c:729c5f06584a
- gaia: master:f9484a9d70edf7b68ec070ec44f99c48d0333cf2

*And unagi has a sloooooow camera.
blocking-b2g: --- → koi+
Measured time on master/hamachi: ~3.4s. Build is:
- gecko: m-c:23708fab4a39
- gaia: master:70800e30b4e8f1162d32bcff4228216b524ffee2

It's worth pointing out that 2s of this time is spent in the AOSP call Camera::connect(), which we don't really have visibility into. So in the _best_ case for hamachi, eliminating all other overhead, the camera start-up time would still be 2s.
Summary: [1.2] Camera App launch time has regressed (compared to [1.1] → [1.2] Camera App launch time has regressed (compared to [1.1])
Further to comment 2, almost all of the time spent in Camera::connect() is in the call to initialize() in CameraService.cpp:

    hardware = new CameraHardwareInterface(camera_device_name);
    if (hardware->initialize(&mModule->common) != OK) {
        hardware.clear();
        return NULL;
    }
Status: UNCONFIRMED → NEW
Ever confirmed: true
Depends on: 917892
Keywords: perf
Whiteboard: [c=progress p= s= u=1.2]
So I finally have some numbers to compare the b2g18 branch (v1.1) to m-c (v1.2, at least as of yesterday).

The most interesting result is that camera launch latency is much more variable in b2g18 than it is in m-c. In m-c, on hamachi, I measured start-up times all between 3.2 and 3.4s.

On b2g18, I see start-up times ranging from 3.0 to 5.2s!

In the case of b2g18 builds, the extra time either appears in the touch-to-launch interval, or in the launch-to-getCamera interval.
Assigning to Mike since I'm marking bug 871821 as a duplicate of this and Mike is assigned to that one.
Assignee: nobody → mhabicher
Status: NEW → ASSIGNED
Keywords: regression
Blocks: 915068
Depends on: 920713
Some more results, comparing v1.1 (b2g18) to v1.2 (now aurora) running on Inari:
1. b2g18 camera launch latency: 1798ms; aurora: 2793ms, or 995ms longer

The biggest difference in the profiles of these two builds is the time between getting the touch events in RecvRealTouchEvent and actually launching the Camera (launchApp in b2g18, launch in aurora):
2. b2g18: 131ms; aurora: 1281ms, or 1150ms longer

Once the preallocated app "becomes" the Camera app, it takes longer on aurora builds to actually start up:
3. preallocated app becoming the Camera app to calling getCamera, b2g18: 124ms; aurora: 359ms, or 235ms longer

However, this is more than offset by the reduced process-wake-to-become-Camera latency:
4. b2g18: 564ms; aurora: 216ms, or 348ms faster

Once the Camera app has started, both b2g18 and aurora builds take ~400ms to initialize the hardware and return it to JS. Similarly, configuring the camera hardware and starting the preview takes ~214ms; and once the preview is started, enabling the UI takes a further ~146ms.

Note: during the extra touch-to-launch time (2) on the aurora build, the Homescreen app starts a GC/CC cycle that lasts between 440 and 1100ms, overlapping with the Camera start-up. The b2g18 build also shows a GC cycle (~300ms later on) in its timeline, though by that time the Camera app has already started.

So, why does it take 1150ms long to launch an app in aurora vs b2g18? It's not clear, but the profiles show that during this interval, the Main Threads in both the Homescreen app and the b2g parent process and mostly idle. Also, launching is not significantly slower on aurora either:
5. launchApp() on b2g18: 105ms; aurora: 230ms (though seen to be as high as 680ms)
Further to comment 8, nor is the b2g parent process spinnings its wheels for 1 second:
  http://people.mozilla.org/~bgirard/cleopatra/#report=decbabdaa0d73beb38eb22af00a8ebb0b91b4e73
Bizarre! After instrumenting the app-launching process (to figure out how it works :) the Camera app launch time is now as fast (or faster) than it was in b2g18, and the ~1.1s dead time between touching and calling launch() is gone.

I wish I had recorded the build info in comment 7, but I don't believe it is any different from what I'm building today:
- gecko: aurora:cd4ec57115ab
- gaia: v1.2:a13c76f8d3c617ee57c302c103da04ed1a6298d1

Will unroll the instrumentation changes and try again. (Since I tend to plow through the FTU wizard and often associate with "Mozilla Guest" by default, I wondered if not having a network connection while WFH sped up the launch time. It turns out it does not. Today, anyway.)

(Also, for what it's worth, profiling-enabled builds table ~200-250ms longer to launch the Camera app compared to normal builds.)
Also, with the build in comment 10, I see a Camera app launch time of <1.28s.
Just did a new pull/build with:
- gecko: git/aurora:7ce074a54a95940a63d7c697adc72d34fbc4466c
- gaia: v1.2:5e0d0df6a762cf1e1812eeb735fba72e2539dc0c

With this build, it takes a profiling build 2.4s to launch the camera, and a profiling report shows that the touch-to-launch delay (here about 900ms) is still present.
This patch modifies the 'top' utility in the gonk toolbox to make the '-d' option to take its argument in milliseconds; it also adds a '-u' option to preface each report with the current millisecond count reported by clock_gettime(CLOCK_MONOTONIC).

These are used to get subsecond reports on CPU usage:

  adb shell "top -m 10 -d 100 -t -u"
This script processes the output of the command in comment 13 to produce a pivot-table with process CPU usage at each reported interval. It ignores processes named 'top' and 'sh', and any process reporting 0% CPU usage.

Example:
  adb shell "top -m 10 -d 100 -t -u" > log.txt
  pspostproc2.py log.txt > log.csv

The .csv file can be loaded into your favourite spreadsheet to get a scatter plot (or a cumulative area plot) of CPU usage by process over time.
Mike, looking at this profile:

  http://people.mozilla.org/~bgirard/cleopatra/#report=6deb68a49898a0ca7062b6c516f06cf6587797b0

Can you explain what the pre-allocated process (621) is doing?  Is that the pre-allocated process for the next app to launch or is it the process that just became camera?

I ask because I noticed it is in the middle of something called "PreloadSlowThings" around that delay time.  Not sure if its related.

Also, if it is the pre-allocated process for the next app launch, why is it starting so soon?  It should wait 5 seconds at a minimum I think.
(In reply to Ben Kelly [:bkelly] from comment #15)
> 
> Can you explain what the pre-allocated process (621) is doing?  Is that the
> pre-allocated process for the next app to launch or is it the process that
> just became camera?
> 
> I ask because I noticed it is in the middle of something called
> "PreloadSlowThings" around that delay time.  Not sure if its related.
> 
> Also, if it is the pre-allocated process for the next app launch, why is it
> starting so soon?  It should wait 5 seconds at a minimum I think.

621 is (should be?) the Preallocated app for the next process after the camera. Now that you mention it, I thought it was supposed to wait 5 seconds too. I wonder if "PreloadSlowThings" could be delaying the b2g process's app.launch.

This doesn't mean much to me:
  http://dxr.mozilla.org/mozilla-central/source/dom/ipc/TabChild.h#l174
Do we have iostat or anything?  Maybe we're I/O bound somehow.

The weird thing to me about that 621 process is it appears to exist prior to your tap event.  If I'm reading things right...

Is this a case of camera being launched in the middle of the pre-allocated app spinning up?  As in we're missing the pre-allocation optimization?  That might explain some of the variability in reproducing it.
Depends on: 916029
No longer depends on: 916029
Mandyam,

I've used a 240fps camera to capture/compare the Camera app launch times between v1.1 and v1.2. The results in this attached spreadsheet were obtained by stepping through the videos frame-by-frame in Kdenlive.

- b2g18/v1.1: http://people.mozilla.org/~mhabicher/PA111086.AVI
- aurora/v1.2: http://people.mozilla.org/~mhabicher/PA111087.AVI

According to these videos, from touch-to-UI-active takes 2.2s under b2g18, and 2.1s under aurora--basically the same. This reflects what I am seeing programmatically, where touch-to-preview-started time is also slightly faster under v1.2.

These numbers are based on inari; under hamachi, the camera driver takes much longer to initialize so the times are longer overall, but v1.2 is still slightly faster.

Are you still seeing this on your end?
Flags: needinfo?(mvikram)
Can you please explain how you loaded the 1.2 build on the ZTE device? When we flash 1.2 Gaia/Gecko on the device, it doesn't work.
Flags: needinfo?(mvikram)
Using a 1.2 build (from l0/9) on a TCL device, using our high speed camera, we show:
1808 ms.
(In reply to Mandyam Vikram from comment #20)
>
> Using a 1.2 build (from l0/9) on a TCL device, using our high speed camera,
> we show: 1808 ms.

How does this compare to the time you're seeing on v1.1?

(I loaded 1.2 onto the ZTE device using |BRANCH=v1.2 ./config.sh inari && ./build.sh && ./flash.sh|. The exact versions of gecko/gaia that I'm using are noted in the spreadsheet in comment 18.)
Flags: needinfo?(mvikram)
Target Milestone: --- → 1.2 C3(Oct25)
Attachment #816053 - Attachment description: Visual launch-time comparison between v1.1 and v1.2, 240fps → Visual launch-time comparison between v1.1 and v1.2 on Inari, 240fps
Videos captured for the analysis:
- b2g18: http://people.mozilla.org/~mhabicher/PA151088.AVI
- aurora: http://people.mozilla.org/~mhabicher/PA151090.AVI

Frame-by-frame examination of these videos shows a last-touch-to-UI-active time of:
- b2g18: 4062ms
- aurora: 4229ms

Once again, v1.2 is slightly faster (and the slower times overall are consistent with this device).

Mandyam, can you please share your videos with us so we can look at what you're seeing? Without the comparison, I'm going to have to close this as WORKFORME.
Mike, do you still see the delay from comment 16?  Or has that been resolved?
(In reply to Ben Kelly [:bkelly] from comment #23)
>
> Mike, do you still see the delay from comment 16?  Or has that been resolved?

I'm not sure, but I don't see any visual impact on the camera launch time, at least compared to v1.1. I'll try to get some profile logs along with video captures to see.

In all of the above cases (and those in bug 920713), the app launch was carried out after a device reboot, and after |adb shell b2g-ps| had confirmed that the Preallocated app was finished loading (idle, with static memory footprints).
Please don't close it yet. We'll try to get the numbers for v1.1 on the TCL device.
Flags: needinfo?(mvikram)
Attached file Visual launch-time of v1.2 on Hamachi (deleted) —
In addition to this spreadsheet, see also, 240fps video:
  http://people.mozilla.org/~mhabicher/PA161095.AVI

Cleopatra profile:
  http://people.mozilla.org/~bgirard/cleopatra/#report=96ce8de174d35f930a4aa89970318a58ec9423f0

The profile shows the time from the last RecvRealTouchEvent to the OnPreviewStateChange.after event as 4929ms, ~600ms longer than what seems to be taking place in the video. Even with that slack, it looks like the Webapps.jsm::launch() call in the profile happens between the "black app screen becomes full size" and "Camera app icon disappears" events in the spreadsheet. If that's correct, then between 787 and 1025ms of the Camera app launch seem to be due to various delays and animations.
(In reply to Mike Habicher [:mikeh] from comment #22)
> 
> Frame-by-frame examination of these videos shows a last-touch-to-UI-active
> time of:
> - b2g18: 4062ms
> - aurora: 4229ms

These are backwards, and should be (from the spreadsheet):
- b2g18: 4229ms
- aurora: 4062ms
(In reply to Mike Habicher [:mikeh] from comment #22)
> ...
> Mandyam, can you please share your videos with us so we can look at what
> you're seeing? Without the comparison, I'm going to have to close this as
> WORKFORME.

Vikram & Michael Vines,

Please attach videos of the tests showing these regressions. Mike Habicher has exhausted all avenues trying to reproduce this and these videos are now essential to confirm that we're measuring the same thing.

Thanks,
Mike
Flags: needinfo?(mvines)
Flags: needinfo?(mvikram)
Flags: needinfo?(mvines)
We'll try it on a 1.1 build and confirm.
Flags: needinfo?(mvikram)
On the TCL device we are seeing a launch latency of 3665 on a commercial TA launch build.
The discrepancy may be on the QRD device. 
Please feel free to close this issue out.
Thanks, Vikram. Please file a new issue if this happens again.
Status: ASSIGNED → RESOLVED
Closed: 11 years ago
Resolution: --- → WORKSFORME
No longer depends on: 920713
Target Milestone: 1.2 C3(Oct25) → 1.3 Sprint 3 - 10/25
Setting Target Milestone for all FxOS Perf 1.2 issues fixed for 10.25.
Target Milestone: 1.3 Sprint 3 - 10/25 → 1.2 C3(Oct25)
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Created:
Updated:
Size: