Closed
Bug 1024928
Opened 10 years ago
Closed 7 years ago
Video Cold Launch Time 200ms Regression
Categories
(Core :: DOM: Content Processes, defect, P3)
Tracking
()
RESOLVED
WONTFIX
People
(Reporter: jhylands, Assigned: cyu)
References
Details
(Keywords: perf, regression, Whiteboard: [c=regression p= s= u=])
On or about Feb 5, 2014, the Video app's cold load launch time regressed by 200ms due to this patch:
5a5fa6d369ab2e3c430cb9e1a5e909e9922c4a3f is the first bad commit
commit 5a5fa6d369ab2e3c430cb9e1a5e909e9922c4a3f
Author: Fabrice Desré <fabrice@mozilla.com>
Date: Tue Jan 14 16:00:25 2014 -0800
Bug 950266 - Re-enable the Nuwa process on B2G by default r=me
(Please see bug 1002842 for the bisection leading to this)
Clearly, this can't be backed out at this time, but we need to figure out specifically what caused the issue, and see if it can be fixed.
Reporter | ||
Comment 1•10 years ago
|
||
Fabrice, can you look at this and comment?
Updated•10 years ago
|
Severity: normal → critical
OS: Windows 7 → Gonk (Firefox OS)
Priority: -- → P1
Hardware: x86_64 → ARM
Updated•10 years ago
|
Blocks: 950266
Component: Gaia::Video → IPC
Keywords: regression
Product: Firefox OS → Core
Version: unspecified → 28 Branch
Updated•10 years ago
|
Component: IPC → DOM: Content Processes
Comment 2•10 years ago
|
||
ha ha... Nuwa changes a zillion things. Is the video app the only one impacted?
Flags: needinfo?(fabrice)
Reporter | ||
Comment 3•10 years ago
|
||
Honestly, I don't know at this point. I can certainly pull the data for a few other apps, and check.
Comment 4•10 years ago
|
||
What's the user impact of the delay? Did this regress since 1.3?
Flags: needinfo?(jhylands)
Reporter | ||
Comment 6•10 years ago
|
||
It looks like all the apps regressed on the same date. There appears to be a corresponding drop around Feb 7 or 8 - was the Nuwa patch backed out then, or was it something else?
Comment 7•10 years ago
|
||
QA,
Can we please verify if there is a drop significant to 1.3?
Keywords: qawanted
Comment 8•10 years ago
|
||
(In reply to Preeti Raghunath(:Preeti) from comment #7)
> QA,
>
> Can we please verify if there is a drop significant to 1.3?
This isn't a request QA does - we don't do perf studies of this kind.
Keywords: qawanted
Comment 9•10 years ago
|
||
Mike
Is this regression on par for your acceptance criteria? If not we'll need this fixed so please help make a call here
Flags: needinfo?(mlee)
Comment 10•10 years ago
|
||
(In reply to Preeti Raghunath(:Preeti) from comment #9)
> ...
> Is this regression on par for your acceptance criteria? If not we'll need
> this fixed so please help make a call here
Launch speed regressions of 200ms meet our FxOS Performance Criteria for blocking [1]. That said the apparent cause of this regression, Nuwa [2] is a core feature we're unlikely to backout especially so late in the 1.4 release cycle.
The effort needed to correct or improve this regression is probably best targeted for 2.0 or later so noming as a 2.0 blocker.
[1] https://wiki.mozilla.org/FirefoxOS/Performance/Triage#Blockers
[2] https://bugzilla.mozilla.org/show_bug.cgi?id=771765#c0
(In reply to Jon Hylands [:jhylands] from comment #6)
> It looks like all the apps regressed on the same date. There appears to be a
> corresponding drop around Feb 7 or 8 - was the Nuwa patch backed out then,
> or was it something else?
Fabrice, can you respond to Jon's question re: Nuwa being backed out?
blocking-b2g: 1.4? → 2.0?
Flags: needinfo?(mlee) → needinfo?(fabrice)
Comment 11•10 years ago
|
||
(In reply to Mike Lee [:mlee] from comment #10)
> Launch speed regressions of 200ms meet our FxOS Performance Criteria for
> blocking [1]. That said the apparent cause of this regression, Nuwa [2] is a
> core feature we're unlikely to backout especially so late in the 1.4 release
> cycle.
Backing out nuwa (or even disabling it) is just not an option at all at this point, on any branch.
> (In reply to Jon Hylands [:jhylands] from comment #6)
> > It looks like all the apps regressed on the same date. There appears to be a
> > corresponding drop around Feb 7 or 8 - was the Nuwa patch backed out then,
> > or was it something else?
>
> Fabrice, can you respond to Jon's question re: Nuwa being backed out?
All the fun happened in bug 950266, in which:
- it initially landed on 01/31 (https://bugzilla.mozilla.org/show_bug.cgi?id=950266#c78)
- was backed out on 02/01 (https://bugzilla.mozilla.org/show_bug.cgi?id=950266#c85)
- relanded on m-c on 02/04 (https://bugzilla.mozilla.org/show_bug.cgi?id=950266#c89)
- merged on 1.3t on 02/07 (https://bugzilla.mozilla.org/show_bug.cgi?id=950266#101)
Flags: needinfo?(fabrice)
Reporter | ||
Comment 12•10 years ago
|
||
Okay, interesting, so the drop on 2/7 in m-c must have been something else.
Reporter | ||
Comment 13•10 years ago
|
||
And while I also don't think disabling Nuwa is an option, it might be helpful if we can figure out why it causes a ~200ms cold load time regression, and see if we can fix that in 2.0.
Comment 14•10 years ago
|
||
Given we shipped 1.3 and 1.4 with this, I am not sure why we would specifically block a 2.0 release for this issue. Can you help here?
Flags: needinfo?(mlee)
Comment 15•10 years ago
|
||
Nuwa was a critical feature committed for 1.3 [1], that's why we shipped with these results. 1.4, as far as I know, hasn't yet shipped and is still being locked down, blocking that release now is obviously not an option.
A 200ms regression in launch time is worth blocking. It represents more than 15% of our app launch goal [2] and is 4 times longer than the maximum amount of time it takes a person to perceive successive visual change [3]. We should make an effort to fix or reduce this regression.
With performance these degradations in speed accumulate over time. It's best to address them early when there are fewer contributing causes.
[1] https://bugzilla.mozilla.org/show_bug.cgi?id=771765#c217
[2] https://wiki.mozilla.org/FirefoxOS/Performance/Release_Acceptance#Criteria
[3] https://mozilla.app.box.com/s/aww17rx74k7fjds5vada
Flags: needinfo?(mlee) → needinfo?(bbajaj)
Updated•10 years ago
|
blocking-b2g: 2.0? → 2.0+
Reporter | ||
Comment 16•10 years ago
|
||
Fabrice, given that we're looking at this as a 2.0 blocker now, can you point me at the right person to look at this code, to see if we can figure out why re-enabling Nuwa caused this 200ms launch regression?
Flags: needinfo?(fabrice)
Updated•10 years ago
|
Severity: critical → blocker
Whiteboard: [c=regression p= s= u=] → [c=regression p= s= u=2.0]
We lived with this on 1.3 and 1.4, this really shouldn't block.
Comment 18•10 years ago
|
||
(In reply to Kyle Huey [:khuey] (khuey@mozilla.com) from comment #17)
> We lived with this on 1.3 and 1.4, this really shouldn't block.
I agree.
Cervantes can help there. Did you try to build with Nuwa disabled on 2.0/2.1 to see where we are?
Flags: needinfo?(fabrice) → needinfo?(cyu)
Assignee | ||
Comment 19•10 years ago
|
||
I don't usually build with Nuwa disabled. 200 ms of code launch time doesn't look normal: the app should be launched from the same starting point: an app process is preallocated with preload.js loaded. The difference between Nuwa and non-Nuwa preallocated process should only be COW-pages and wrapper functions, but the difference should be minimal in theory. I can take a look to see if something is wrong.
Assignee: nobody → cyu
Flags: needinfo?(cyu)
Comment 20•10 years ago
|
||
(In reply to Mike Lee [:mlee] from comment #15)
> Nuwa was a critical feature committed for 1.3 [1], that's why we shipped
> with these results. 1.4, as far as I know, hasn't yet shipped and is still
> being locked down, blocking that release now is obviously not an option.
>
> A 200ms regression in launch time is worth blocking. It represents more than
> 15% of our app launch goal [2] and is 4 times longer than the maximum amount
> of time it takes a person to perceive successive visual change [3]. We
> should make an effort to fix or reduce this regression.
>
> With performance these degradations in speed accumulate over time. It's best
> to address them early when there are fewer contributing causes.
>
> [1] https://bugzilla.mozilla.org/show_bug.cgi?id=771765#c217
> [2]
> https://wiki.mozilla.org/FirefoxOS/Performance/Release_Acceptance#Criteria
> [3] https://mozilla.app.box.com/s/aww17rx74k7fjds5vada
thanks, that make sense..and jsmith also pointed me to the blocking criteria established here: https://wiki.mozilla.org/FirefoxOS/Performance/Triage#Blockers, so I'll leave it here to understand why this regressed.
Flags: needinfo?(bbajaj)
If the performance team thinks this should block 2.0 after living with it for two releases the performance team really should put up the manpower to investigate it.
Assignee | ||
Comment 22•10 years ago
|
||
I ran the test locally against unagi using m-c- revision 83c09fe3a658. I tested the cold launch time of the the Video and the Template app. The following tests are done using b2gperf --itertion=10 --delay=10 <App Name>:
Without Nuwa:
Results for Video, cold_load_time: median:1367, mean:1375, std: 40, max:1486, min:1336
Results for Video, cold_load_time: median:1412, mean:1418, std: 58, max:1572, min:1331
Results for Video, cold_load_time: median:1377, mean:1397, std: 67, max:1553, min:1327
Results for Template, cold_load_time: median:574, mean:557, std: 59, max:664, min:470
Results for Template, cold_load_time: median:580, mean:559, std: 64, max:666, min:466
Results for Template, cold_load_time: median:575, mean:571, std: 72, max:674, min:468
With Nuwa:
Results for Video, cold_load_time: median:1461, mean:1470, std: 38, max:1549, min:1425
Results for Video, cold_load_time: median:1444, mean:1456, std: 32, max:1552, min:1433
Results for Video, cold_load_time: median:1445, mean:1481, std: 66, max:1629, min:1420
Results for Template, cold_load_time: median:562, mean:557, std: 61, max:654, min:458
Results for Template, cold_load_time: median:543, mean:537, std: 51, max:611, min:451
Results for Template, cold_load_time: median:590, mean:575, std: 58, max:694, min:479
It can be seen that the launch time of an empty app is the same, w/wo Nuwa, but the Video app shows some difference (several dozens to 100 ms). Because ranges of launch time overlaps w/wo Nuwa, taking a closer look at the constitution of the app launch time might produce a misleading result. I think we might need to run the tests with SPS enabled.
Mike, is there any way we can run b2gperf with SPS enabled?
Flags: needinfo?(mlee)
Comment 23•10 years ago
|
||
Jonathan, can someone on the ATeam answer Cervantes question:
> Is there any way we can run b2gperf with SPS enabled?
Status: NEW → ASSIGNED
Flags: needinfo?(mlee) → needinfo?(jgriffin)
Comment 24•10 years ago
|
||
You could certainly run b2gperf with a local SPS-enabled build; we don't have the capacity to run b2gperf with custom builds in our infrastructure at this time.
Other than a custom build, would b2gperf have to work differently with SPS?
Flags: needinfo?(jgriffin)
Assignee | ||
Comment 25•10 years ago
|
||
SPS is enabled by starting b2g with MOZ_PROFILER_STARTUP=1 environment variable. But I just realized that running b2gperf with SPS doesn't help in diagnosing app launch time problems because the app process is started and then killed. The launch time is so short that SPS can't get statistically meaningful data.
Event ordering might be relevant to the increased app launch time. With Nuwa, in app launch we always get 'loadend' DOM event before sending the 'loadend' browserelementapi:call message to the chrome process to notify the system app. In contrast, without Nuwa, we seldom see the DOM event received before notifying the system ap. Why the 2 events appear in this unfavourable order with Nuwa enabled is still unknown. I need to take a deeper look.
Assignee | ||
Comment 27•10 years ago
|
||
I am tracing the events that happen during app launch to see what makes the events happen in this unfavourable order when Nuwa is enabled.
Flags: needinfo?(cyu)
We have far more important things to focus on at this point. This should be 2.0-.
Flags: needinfo?(jsmith)
Updated•10 years ago
|
Flags: needinfo?(jsmith)
Comment 29•10 years ago
|
||
(In reply to Kyle Huey [:khuey] (khuey@mozilla.com) from comment #28)
> We have far more important things to focus on at this point. This should be
> 2.0-.
Mike - What do you think?
Flags: needinfo?(mlee)
Comment 30•10 years ago
|
||
(In reply to Kyle Huey [:khuey] (khuey@mozilla.com) from comment #28)
> We have far more important things to focus on at this point. This should be
> 2.0-.
Actually, when I discussed this bug with all engineering managers and EPMs in Taipei yesterday, Thinker and I agreed with what Kyle said here. But, seems like Mike Lee provided a firmed feedback in comment #10.
Comment 31•10 years ago
|
||
Okay. Agree to minus this from 2.0.
Severity: blocker → normal
blocking-b2g: 2.0+ → ---
Flags: needinfo?(mlee)
Priority: P1 → P3
Whiteboard: [c=regression p= s= u=2.0] → [c=regression p= s= u=]
Assignee | ||
Comment 32•10 years ago
|
||
In my test using unagi, there is ~100ms difference in app launch times with or without Nuwa. It appears that the timing of the XHR sent in gaia/shared/js/l10n.js to load locales-obj/en-US.json contributes most of the added startup time of the Video app.
Assignee | ||
Updated•7 years ago
|
Status: ASSIGNED → RESOLVED
Closed: 7 years ago
Resolution: --- → WONTFIX
You need to log in
before you can comment on or make changes to this bug.
Description
•