Closed Bug 1178451 Opened 9 years ago Closed 9 years ago

[Dialer] Spamming keypad inputs during a call can cause the device to become unresponsive.

Categories

(Firefox OS Graveyard :: Gaia::Dialer, defect, P3)

ARM
Gonk (Firefox OS)
defect

Tracking

(blocking-b2g:2.5+, b2g-v2.0 unaffected, b2g-v2.1 affected, b2g-v2.2 affected, b2g-master affected)

RESOLVED FIXED
blocking-b2g 2.5+
Tracking Status
b2g-v2.0 --- unaffected
b2g-v2.1 --- affected
b2g-v2.2 --- affected
b2g-master --- affected

People

(Reporter: NicholasN, Assigned: gsvelto)

References

()

Details

(Keywords: regression, Whiteboard: [3.0-Daily-Testing][Spark])

Attachments

(4 files)

Attached file logcat_dialer.txt (deleted) —
Description: Two devices are in a call, and device A brings up the keypad. For 1.5 to 3 minutes, device A spams keypad inputs and then device B hangs up. Device A stays on the call screen and is temporarily unresponsive. On Aries the device eventually recovers, depending on how long inputs were spammed. On flame the device eventually rebooted after hanging. Although this user scenario is uncommon, the result is significantly bad. Repro Steps: 1) Update a Aries to 20150629130633 2) Open dialer and call another device. 3) Bring up the key pad and rapidly spam inputs. 4) After 1.5-3 minutes, hang up on the second device. 5) Observe device A. Actual: Device hangs after spamming inputs during call. Expected: Device continues to function appropriately after spamming inputs. Notes: Environmental Variables: Device: Aries 2.5 Build ID: 20150629130633 Gaia: 0b166043ef2a1f235a4d7d4f40a51b625784195a Gecko: e137fc38c431 Gonk: 2916e2368074b5383c80bf5a0fba3fc83ba310bd Version: 41.0a1 (2.5) Firmware Version: D5803_23.1.A.1.28_NCB.ftf User Agent: Mozilla/5.0 (Mobile; rv:41.0) Gecko/41.0 Firefox/41.0 Repro frequency: 4/5 See attached: video clip, logcat
This issue also reproduced on Flame 2.5 and Flame 2.2, but not Flame 2.1 or Flame 2.0 Flame 2.0 (did not occur) Actual Result: After spamming numbers the device did not slow down or hang. Environmental Variables: Device: Flame 2.0 BuildID: 20150629000206 Gaia: 5552bf529d3d6775a968942e9afa6c1d4037362c Gecko: b11263ce4493 Gonk: bd9cb3af2a0354577a6903917bc826489050b40d Version: 32.0 (2.0) Firmware Version: v18D-1 User Agent: Mozilla/5.0 (Mobile; rv:32.0) Gecko/32.0 Firefox/32.0 Flame 2.1 (did not occur) Actual Result: After spamming numbers the device did not slow down or hang. Environmental Variables: Device: Flame 2.1 BuildID: 20150628001204 Gaia: f8b848c82d1ed589f7a1eb5cc099830c867ff1d4 Gecko: d3e432c4546a Gonk: bd9cb3af2a0354577a6903917bc826489050b40d Version: 34.0 (2.1) Firmware Version: v18D-1 User Agent: Mozilla/5.0 (Mobile; rv:34.0) Gecko/34.0 Firefox/34.0 Flame 2.2 (Occurs) Actual Result: Spamming numbers during a call caused the phone to hang and eventually reboot automatically. Environmental Variables: Device: Flame 2.2 BuildID: 20150626162505 Gaia: 0179935627012dfde3ca036c9a71035be463b7ad Gecko: 330f52ef6a2d Gonk: bd9cb3af2a0354577a6903917bc826489050b40d Version: 37.0 (2.2) Firmware Version: v18D-1 User Agent: Mozilla/5.0 (Mobile; rv:37.0) Gecko/37.0 Firefox/37.0 Flame 2.5 (Occurs) Actual Result: Spamming numbers during a call caused the phone to hang and eventually reboot automatically. Environmental Variables: Device: Flame 2.5 BuildID: 20150629010206 Gaia: 8a1e4ae522c121c5cacd39b20a5386ec9055db82 Gecko: eaf4f9b45117 Gonk: a4f6f31d1fe213ac935ca8ede7d05e47324101a4 Version: 41.0a1 (2.5) Firmware Version: v18D-1 User Agent: Mozilla/5.0 (Mobile; rv:41.0) Gecko/41.0 Firefox/41.0
QA Whiteboard: [QAnalyst-Triage?]
Flags: needinfo?(pbylenga)
Keywords: regression
Whiteboard: [3.0-Daily-Testing][Spark]
[Blocking Requested - why for this release]: Regression that happened after 2.1.
blocking-b2g: --- → 2.5?
Comms triage: Regression
blocking-b2g: 2.5? → 2.5+
Requesting a window.
Flags: needinfo?(pbylenga)
QA Contact: jmercado
This issue actually is affected on 2.1, but I was unable to reproduce this on 2.0. Environmental Variables: Device: Flame 2.1 BuildID: 20150629100738 Gaia: 7080a7c28b0242f81d689d2339dfa1177e23f48f Gecko: c201e76c63d1 Version: 34.0 (2.1) Firmware Version: v18D-1 User Agent: Mozilla/5.0 (Mobile; rv:34.0) Gecko/34.0 Firefox/34.0 Environmental Variables: Device: Flame 2.0 BuildID: 20150629192043 Gaia: 5552bf529d3d6775a968942e9afa6c1d4037362c Gecko: 88eedcc6fd9b Version: 32.0 (2.0) Firmware Version: v18D-1 User Agent: Mozilla/5.0 (Mobile; rv:32.0) Gecko/32.0 Firefox/32.0
I was able to reproduce this issue in Central all the way back to the earliest Flame build we have available. This is a 2.0 build which means that it was likely fixed at some point in the 2.0 Branch but we don't have access to builds from JB Flame 2.0 branch anymore so I can't continue this window. Environmental Variables: Device: Flame 2.0 BuildID: 20140506163010 Gaia: 98ca8c55dbe2f21a8661d0eaa87f34d316c3bc98 Gecko: 4e4e0f502969 Version: 32.0a1 (2.0) Firmware Version: v123 User Agent: Mozilla/5.0 (Mobile; rv:32.0) Gecko/32.0 Firefox/32.0
Flags: needinfo?(ktucker)
QA Whiteboard: [QAnalyst-Triage?] → [QAnalyst-Triage+]
Flags: needinfo?(ktucker)
Assignee: nobody → drs
Identified as paper cut. P3
Priority: -- → P3
Status: NEW → ASSIGNED
Hi Doug, any progress here?
Flags: needinfo?(drs)
I've reproduced this on the emulator and grabbed a profile while this happens, you can find it here: http://people.mozilla.org/~bgirard/cleopatra/#report=3cffd416ad1238cd1beedc6e465230f1dcd953e6&filter=[{%22type%22%3A%22RangeSampleFilter%22,%22start%22%3A391743,%22end%22%3A396986}]&selection=0,1,24,25,26,27,28,29,30,31,36,37 The problem is completely different than what I originally thought: every time the user taps a key we try to adapt the font of the text so that it would fit within the dialer display without ellipsis. Past a certain point this is obviously impossible as we've got a minimal font size and then we start showing an ellipsis. The issue is that as the user keeps typing we keep trying to measure the text to see if we can find a "good" font size to display it entirely and this becomes more and more expensive as the string length becomes longer. In the end we spend 100s of ms within CanvasRenderingContext2D.measureText() and the phone becomes unresponsive because of this; this is made worse by the fact that the callscreen now lives in the system app (facepalm) and thus starves the main app's main thread which is busy computing font sizes all the time (double-facepalm). The offending line is here: https://github.com/mozilla-b2g/gaia/blob/9d4b4b23eb73ac0bd4ad65ce14390c9e94087235/shared/js/font_size_utils.js#L167 Solving this won't be easy; one way to do this is to cache the last string we've measured and if we've already hit the smallest font size and a longer string is submitted which starts with the cached one then we can return the smallest font size directly without measuring it. This is going to be a little tricky though as we need to ensure that everything else stayed the same between calls, besides this is shared code so changing it might break stuff elsewhere so we need to be extra-cautious. Alternatively we can implement this fix in the dialer code: past a certain point stop requesting new font sizes as you already know it's not going to yield a different one.
Slight change in my analysis. The FontSizeUtils is calling CanvasRenderingContext2D.measureText() in a loopm within the getOverflowCount() function always starting from 0 characters which means that function has an O(n^2) cost, ouch. The fix is trivial though, we can cache the last overflow count and start from there, discarding the result and starting over if we find it's the wrong starting value.
Stealing the bug.
Assignee: drs → gsvelto
Flags: needinfo?(drs)
Comment on attachment 8676303 [details] [gaia] gabrielesvelto:bug-1178451-cache-last-overflow-count-computation > mozilla-b2g:master This PR tries to reuse the last overflow count computation instead of starting over each time. This works well for repeated calls with the same string prefix, e.g.: "a", "ab", "abc", "abcd", etc... In other scenarios if the first computation doesn't yield a correct result we fall back to the original behavior. Michael can you review this? It's shared code and you wrote most of it so I think you're the best candidate to review my change.
Attachment #8676303 - Flags: review?(mhenretty)
Thanks for the patch Gabriele, I'll take a look at this tomorrow. I'm wondering if the gaia font-fit web component [1] has the same problem, or if not if we should just switch over to using that. I'll have more time to deep dive tomorrow. https://github.com/gaia-components/font-fit/blob/08114e2/font-fit.js#L57
In font-fit we're caching the canvas context, but not the result itself (I tried it in the past and it didn't bring improvements). Used in gaia-header it doesn't have the same behavior either as we're throttling the font-fit call.
Comment on attachment 8676303 [details] [gaia] gabrielesvelto:bug-1178451-cache-last-overflow-count-computation > mozilla-b2g:master Nice fix! I left some comments on github. Also, let's add a unit test or two for this optimization [1]. 1.) https://github.com/mozilla-b2g/gaia/blob/47dd6cf/apps/sharedtest/test/unit/font_size_utils_test.js
Attachment #8676303 - Flags: review?(mhenretty)
I'd like to use the gaia font-fit component in the dialer but such a big change is too risky so close to the 2.5 branching date. I'll address your comments and post an updated PR soon, thanks!
Comment on attachment 8676303 [details] [gaia] gabrielesvelto:bug-1178451-cache-last-overflow-count-computation > mozilla-b2g:master Pushed a new patch with a unit-test added and a comment. I didn't factorize those two lines into a function because we'd need two outputs: the result width and the substring so they're not really a good candidate for putting them in a single function.
Attachment #8676303 - Flags: review?(mhenretty)
Comment on attachment 8676303 [details] [gaia] gabrielesvelto:bug-1178451-cache-last-overflow-count-computation > mozilla-b2g:master Thanks!
Attachment #8676303 - Flags: review?(mhenretty) → review+
Thanks for reviewing this, merged to gaia/master 41b23d23f0ad911f3e7763b9fff495c6a8bcfedc https://github.com/mozilla-b2g/gaia/commit/41b23d23f0ad911f3e7763b9fff495c6a8bcfedc
Status: ASSIGNED → RESOLVED
Closed: 9 years ago
Resolution: --- → FIXED
Hi Gabriele, Could you please take a look at the results below again? Please note that user need input more spamming in the call, then device hangs. ------------------------------------------------------------------------------------------------ This bug has been verified as "Fail" on the latest build of Flame KK v2.5 and Aries KK v2.5 by the STR in comment 0. Actual results: Spam inputting many numbers during a call can cause the device to become unresponsive. See attachments: Verified_logcat_1551.txt Verified_Aries_v2.5.3gp Reproduce rate: 10/10 Device: Aries KK 2.5 (Fail) Build ID 20151026223015 Gaia Revision b564b21e08ffc3e4962f08850843c7482932ee7b Gaia Date 2015-10-26 14:22:21 Gecko Revision https://hg.mozilla.org/mozilla-central/rev/6c7c983bce46a460c2766fbdd73283f6d2b03a69 Gecko Version 44.0a1 Device Name aries Firmware(Release) 4.4.2 Firmware(Incremental) eng.worker.20151026.214940 Firmware Date Mon Oct 26 21:49:48 UTC 2015 Bootloader s1 Device: Flame KK 2.5 (Fail) Build ID 20151026030217 Gaia Revision a677ddd3aa3a81058775938bd56008d96dbc78b0 Gaia Date 2015-10-26 04:48:31 Gecko Revision https://hg.mozilla.org/mozilla-central/rev/5ca03a00d26823ce91ee0eaa2937bed605bd53c1 Gecko Version 44.0a1 Device Name flame Firmware(Release) 4.4.2 Firmware(Incremental) eng.cltbld.20151026.070911 Firmware Date Mon Oct 26 07:09:23 EDT 2015 Firmware Version v18D v4 Bootloader L1TC000118D0
Attached video Verified_Aries_v2.5.3gp (deleted) —
Attached file Verified_logcat_1551.txt (deleted) —
Flags: needinfo?(gsvelto)
QA Whiteboard: [QAnalyst-Triage+] → [QAnalyst-Triage+][MGSEI-Triage+]
Thanks for checking again Sharon, I've tried spamming the keypad on my device for 90s but I can't seem to hit the issue anymore. I've noticed another issue though it seems that we're "piling up" the audible tones for each key and we keep playing them long after the user has finished typing I'll file a separate bug for that. I'm now looking at the RIL log, I suspect the issue might not lie in the dialer.
I finally found out what the problem is, though it's different than what I initially thought. The issue isn't caused by simply spamming the keyboard, it just doesn't manifest if you do it with only one finger, it's caused by multi-tap. When typing really fast we can often hit two keys at the same time and that triggers an area of the code that's known to cause problems. I'll open a separate bug for that and fix it there.
Flags: needinfo?(gsvelto)
Blocks: 1219249
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Created:
Updated:
Size: