907087 - [Messages] Application hangs attempting to open message

Reporter

Description

11 years ago

Phone type (Keon, Peak, other): Keon
OS version (Settings>Device information>More information>): 1.2.0.0-prerelease
Build Identifier (...>More information>): 20130820022245
Update channel (...>More information>): default

Steps to Reproduce:
1) open SMS ("Messages") app
2) select message

Expected Result: SMS message is displayed
Actual Result: App hangs until terminated

Is this issue sometimes or always reproducible: Always

Rick Waldron [:rwaldron]

Comment 1

11 years ago

Unfortunately, we don't have one of these devices at Bocoup, so I can't confirm by reproduction.

Rick Waldron [:rwaldron]

Updated

11 years ago

Summary: SMS app hangs attempting to open message → [Messages] Application hangs attempting to open message

Hubert Figuiere [:hub]

Comment 2

11 years ago

Same thing is happening here, on a Keon using the official nightly.

blocking-b2g: --- → koi?

Keywords: regression

Robert Helmer [:rhelmer]

Reporter

Comment 3

11 years ago

I tried sending a message, which worked, and now I am able to load messages again (!)

Robert Helmer [:rhelmer]

Reporter

Comment 4

11 years ago

(In reply to Robert Helmer [:rhelmer] from comment #3)
> I tried sending a message, which worked, and now I am able to load messages
> again (!)

Killed the app with the task manager and reopened it; it's hanging again as before.

Hubert Figuiere [:hub]

Comment 5

11 years ago

BTW my Keon build is 20130818024607 (Nightly, OTA updates)

Julien Wajsberg [:julienw]

Comment 6

11 years ago

Are there any meaningful logcat lines?

Hubert Figuiere [:hub]

Comment 7

11 years ago

Not sure if it is relevant, but I get these:

I/Gecko ( 585): ###################################### forms.js loaded
I/Gecko ( 585): ############################### browserElementPanning.js loaded
I/Gecko ( 585): ######################## BrowserElementChildPreload.js loaded
I/Gecko ( 109): [Parent 109] WARNING: waitpid failed pid:521 errno:10: file /home/geeksphone/FOS/keon/gecko/ipc/chromium/src/base/process_util_posix.cc, line 260
I/Gecko ( 109): [Parent 109] WARNING: waitpid failed pid:521 errno:10: file /home/geeksphone/FOS/keon/gecko/ipc/chromium/src/base/process_util_posix.cc, line 260
I/Gecko ( 109): [Parent 109] WARNING: Failed to deliver SIGKILL to 521!(3).: file /home/geeksphone/FOS/keon/gecko/ipc/chromium/src/chrome/common/process_watcher_posix_sigchld.cc, line 118

The WARNING lines appear when I tap on a thread to read the messages (and nothing is displayed).
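For anyone triaging similar logs, here is a minimal Python sketch (a hypothetical helper, not part of Gecko or adb) that pulls the PID and errno out of the two WARNING shapes above and names the errno with the standard errno module. It assumes the "(3)" after "Failed to deliver SIGKILL to 521!" is also an errno value. On Linux, errno 10 is ECHILD and 3 is ESRCH, so read literally, both warnings say the parent could not find process 521 when it tried to reap or signal it.

```python
import errno
import re

# Patterns for the two Gecko warning shapes seen in the logcat output above.
# Assumption: the number in "(3)" after "SIGKILL to 521!" is an errno value.
WAITPID_RE = re.compile(r"waitpid failed pid:(\d+) errno:(\d+)")
SIGKILL_RE = re.compile(r"Failed to deliver SIGKILL to (\d+)!\((\d+)\)")


def parse_warnings(log_text):
    """Return (kind, pid, errno_name) for every matching warning line."""
    results = []
    for line in log_text.splitlines():
        for kind, pattern in (("waitpid", WAITPID_RE), ("sigkill", SIGKILL_RE)):
            match = pattern.search(line)
            if match:
                pid, err = int(match.group(1)), int(match.group(2))
                # errno.errorcode maps numbers to symbolic names for this host.
                results.append((kind, pid, errno.errorcode.get(err, str(err))))
    return results


# Two of the warning lines quoted in comment 7.
SAMPLE_LOG = (
    "I/Gecko ( 109): [Parent 109] WARNING: waitpid failed pid:521 errno:10: "
    "file /home/geeksphone/FOS/keon/gecko/ipc/chromium/src/base/"
    "process_util_posix.cc, line 260\n"
    "I/Gecko ( 109): [Parent 109] WARNING: Failed to deliver SIGKILL to "
    "521!(3).: file /home/geeksphone/FOS/keon/gecko/ipc/chromium/src/chrome/"
    "common/process_watcher_posix_sigchld.cc, line 118\n"
)

if __name__ == "__main__":
    for kind, pid, name in parse_warnings(SAMPLE_LOG):
        print(kind, pid, name)
```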

Julien Wajsberg [:julienw]

Comment 8

11 years ago

And was 521 the PID of the Messages app? Was the Messages app process still running?

Hubert Figuiere [:hub]

Comment 9

11 years ago

OK, I see what is happening. The message appears the second time I try, after killing the Messages app. In that case, 521 is the PID of the Messages app that I killed with the task manager, and 109 is the PID of the main b2g process.

Hubert Figuiere [:hub]

Comment 10

11 years ago

I updated to nightly-images-keon-2013-08-26.Gecko-f216c74.Gaia-f94d4a9.zip (without erasing user data). It is still happening.

Robert Helmer [:rhelmer]

Reporter

Comment 11

11 years ago

(In reply to Hubert Figuiere [:hub] from comment #10)
> I updated to nightly-images-keon-2013-08-26.Gecko-f216c74.Gaia-f94d4a9.zip
> (without erasing user data)
>
> It is still happening.

I have been keeping up with OTA updates, and same here (same user data). In fact it's a little worse: I was able to send messages (and that would fix it until the app was restarted, see comment 3), but now it also hangs when sending messages :/

Hubert Figuiere [:hub]

Comment 12

11 years ago

Well, bug 908757 doesn't allow me to get the OTA updates :-/

Robert Helmer [:rhelmer]

Reporter

Comment 13

11 years ago

(In reply to Hubert Figuiere [:hub] from comment #12)
> Well, bug 908757 doesn't allow me to get the OTA updates :-/

Oh! Yeah, that's happening to me too, so I guess I have not been keeping up to date :( Thanks!

Julien Wajsberg [:julienw]

Comment 14

11 years ago

It's easily reproducible with the reference workload. I don't know what happens, though; there is nothing interesting in the logs.

Julien Wajsberg [:julienw]

Updated

11 years ago

Blocks: b2g-central-dogfood

Jason Smith [:jsmith]

Comment 15

11 years ago

Can this be reproduced on a device other than Keon for 1.2?

Keywords: qawanted

Hubert Figuiere [:hub]

Comment 16

11 years ago

Can't reproduce on Inari using my eng build done this AM with a light reference workload.

dkumar

Updated

11 years ago

QA Contact: dkumar

dkumar

Comment 17

11 years ago

In response to comment 15: tested on both Buri and Unagi and was not able to reproduce this issue.

Environmental Variables
Build ID: 20130827040201
Gecko: http://hg.mozilla.org/mozilla-central/rev/e42dce3209da
Gaia: 599214a0f41eece076dc83cd85f5b27f8cfe67f2
Platform Version: 26.0a1

I was able to see and open up the SMS message successfully.

Robert Helmer [:rhelmer]

Reporter

Comment 18

11 years ago

Thanks for testing this. Could it have to do with our user data maybe?

Julien Wajsberg [:julienw]

Comment 19

11 years ago

I've tried the same medium workload on an Unagi and a One Touch Fire, and the bug hasn't occurred. So I'm thinking of a driver problem; maybe we can try turning off hardware graphics acceleration on the Keon.

dkumar

Updated

11 years ago

Keywords: qawanted

Jason Smith [:jsmith]

Updated

11 years ago

No longer blocks: b2g-central-dogfood

Joe Cheng [:jcheng] (please needinfo)

Comment 20

11 years ago

Clearing the blocking flag since no one can reproduce this. Please renominate if it is reproduced again. Thanks.

blocking-b2g: koi? → ---

Julien Wajsberg [:julienw]

Comment 21

11 years ago

Joe, I definitely reproduce on Keon.

blocking-b2g: --- → koi?

Jason Smith [:jsmith]

Comment 22

11 years ago

(In reply to Julien Wajsberg [:julienw] from comment #21)
> Joe, I definitely reproduce on Keon.

This needs to be reproducible on something other than Keon in order to block on this.

Hubert Figuiere [:hub]

Comment 23

11 years ago

(In reply to Jason Smith [:jsmith] from comment #22)
> This needs to be reproducible on something other than Keon in order to block
> on this.

Isn't Keon an officially supported platform?

Julien Wajsberg [:julienw]

Comment 24

11 years ago

So, I tried again on a Hamachi, and it doesn't reproduce there.

blocking-b2g: koi? → ---

Hubert Figuiere [:hub]

Comment 25

11 years ago

To put this in perspective: this is the bug that forced me to stop dogfooding Firefox OS. It makes dogfooding very hard.

Julien Wajsberg [:julienw]

Comment 26

11 years ago

Hubert: yes, this is important to me too; it is just not blocking the release, if I understood correctly.

Hubert Figuiere [:hub]

Comment 27

11 years ago

Let's put it this way: we are ready to release a Firefox OS that won't work on one of the few unlocked devices in existence? This is ridiculous.

Julien Wajsberg [:julienw]

Updated

11 years ago

blocking-b2g: --- → koi?

Chris Lord [:cwiiis]

Comment 28

11 years ago

Just to note, this also breaks the tiled layers backend on the Keon. As soon as dup() is called on an FD with mapped shared memory in any process beyond the first one, the process halts during the dup.

If it helps, this is very easy to reproduce using the tiled layers backend. Apply this patch: https://bugzilla.mozilla.org/attachment.cgi?id=792112
Then debug child processes, attach to the child process, break in ShareToProcess, and continue. If you step from this point, it gets stuck in some assembly inside dup().

This appears to be a regression; this worked fine for me a couple of weeks ago. Lost 3 days thinking it was a problem with my code :(

Chris Lord [:cwiiis]

Comment 29

11 years ago

Just to clarify my last comment (comment #28), I think the real bug here is that dup'ing a file descriptor hangs. I'm not sure how general this is; in my case, if more than one process tries to dup FDs mapped to shared memory, every process after the first hangs indefinitely inside the dup, after the 'swi' op.
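The operation described in comments 28 and 29 is small: create a shared-memory descriptor, map it, then dup() the descriptor. A minimal single-process Python sketch of that call sequence follows, assuming a temporary file can stand in for the shared-memory region (on Android, Gecko's region would typically be ashmem, which this does not use). The helper name map_and_dup is hypothetical. On a healthy kernel with no seccomp filter, the dup returns immediately and both descriptors see the same bytes; it does not reproduce the Keon hang.

```python
import mmap
import os
import tempfile


def map_and_dup(size=4096):
    """Map a file-backed region, write to it, then dup() its descriptor."""
    fd, path = tempfile.mkstemp()
    os.unlink(path)  # keep only the descriptor, like an anonymous region
    try:
        os.ftruncate(fd, size)
        # MAP_SHARED is the default on Unix, so writes reach the backing file.
        with mmap.mmap(fd, size) as region:
            region[:5] = b"hello"
            dup_fd = os.dup(fd)  # the call reported to hang in later processes
            try:
                a, b = os.fstat(fd), os.fstat(dup_fd)
                same_file = (a.st_dev, a.st_ino) == (b.st_dev, b.st_ino)
                # Reading through the duplicate sees what the mapping wrote.
                data = os.pread(dup_fd, 5, 0)
            finally:
                os.close(dup_fd)
    finally:
        os.close(fd)
    return same_file, data


if __name__ == "__main__":
    print(map_and_dup())
```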

Julien Wajsberg [:julienw]

Comment 30

11 years ago

Michael, this sounds like something for you (see comments 28 and 29).

Component: Gaia::SMS → General

Flags: needinfo?(mwu)

Jed Davis [:jld] ⟨⏰|UTC-7⟩ ⟦he/him⟧

Comment 31

11 years ago

This sounds like a problem I was having with seccomp, for which I filed bug 907006: the syscall filter specifies that the process should be killed[*], and it does enter Z state, but the usual notifications don't happen, which manifests as a hang. If the other devices mentioned don't have seccomp-bpf support in their kernels yet, then it makes sense that this wouldn't be reproducible there. We suspect a kernel bug; note that the kernel support had to be backported to the old kernel versions we're currently using.

This can be verified by running `adb shell cat /proc/N/status`, where N is the PID. If the hung process has "State: Z" and "Seccomp: 2", then it's the same bug. And if the child processes (live or otherwise) *don't* have "Seccomp: 2", then seccomp-bpf isn't being used.

[*] There are a few known issues with Gecko's behavior and/or the seccomp whitelist; see bug 906996 (unlink) and bug 908907 (dup).
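The check described above can be scripted. A minimal Python sketch follows; the function names and result labels are hypothetical. It relies on the kernel's /proc/PID/status format, where "State:" starts with a one-letter code (Z for zombie) and "Seccomp:" is 0 (disabled), 1 (strict) or 2 (filter mode, i.e. seccomp-bpf). Older kernels may omit the Seccomp line entirely; the sketch treats that as "not in use", which is an assumption.

```python
import subprocess


def parse_status(text):
    """Parse /proc/PID/status text into a {field: value} dict."""
    fields = {}
    for line in text.splitlines():
        key, sep, value = line.partition(":")
        if sep:
            fields[key.strip()] = value.strip()
    return fields


def classify(status_text):
    """Apply the comment 31 test to one process's status text."""
    fields = parse_status(status_text)
    state = fields.get("State", "")
    seccomp = fields.get("Seccomp", "")  # missing on kernels without the field
    if seccomp != "2":
        return "no-seccomp"      # seccomp-bpf not in use for this process
    if state.startswith("Z"):
        return "same-bug"        # zombie under a seccomp-bpf filter
    return "seccomp-alive"       # filtered, but not a zombie


def fetch_status(pid):
    """Read /proc/PID/status from a connected device via adb."""
    result = subprocess.run(
        ["adb", "shell", "cat", f"/proc/{pid}/status"],
        capture_output=True, text=True, check=True,
    )
    return result.stdout


if __name__ == "__main__":
    # Made-up sample text, not captured from a real device.
    print(classify("Name:\tplugin-container\nState:\tZ (zombie)\nSeccomp:\t2\n"))
```

Usage against a device would be classify(fetch_status(pid)) for the hung process and for each child process.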

Michael Wu [:mwu]

Comment 32

11 years ago

(In reply to Julien Wajsberg [:julienw] from comment #30)
> Michael, this sounds like to be something for you (see comment 28 and 29).

Sounds like a seccomp issue, which I'm not familiar with.

Flags: needinfo?(mwu)

Julien Wajsberg [:julienw]

Comment 33

11 years ago

Jed, I can confirm that this is the case. Is there anything we can do to help you here? This is really painful for me because the Keon is the only phone running moz-central that I have that otherwise works correctly on my network (not to mention dogfooders on the Keon).

Julien Wajsberg [:julienw]

Comment 34

11 years ago

Also confirmed that Buri doesn't use seccomp.

Jed Davis [:jld] ⟨⏰|UTC-7⟩ ⟦he/him⟧

Comment 35

11 years ago

Assuming that the current situation, where b2g is confusingly broken on devices that are commonly used for development or dogfooding but aren't part of normal testing, is considered untenable, the options are:

(1) Disable seccomp entirely until it's fixed. I think we don't want to do that, either.
(2) Whitelist any syscall that plugin-container is observed to use, even if it's not something we want to allow long-term (e.g., unlink), and file a bug to remove it. Also, turn on --enable-content-sandbox-reporter by default on b2g to work around bug 907006, so that any failures will be obvious.
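Option (2) combines two policy changes, which a toy model makes concrete. The Python sketch below is illustrative only: the whitelist contents and every name in it are hypothetical, not Gecko's real seccomp policy. Mapping "kill" to seccomp-bpf's SECCOMP_RET_KILL and "report" to SECCOMP_RET_TRAP (a SIGSYS handler that logs the call) is an assumption about how the reporter works, consistent with comment 37's mention of a signal handler. The example syscalls dup and unlink come from the bugs cited in comment 31.

```python
from enum import Enum


class Action(Enum):
    ALLOW = "allow"
    KILL = "kill"      # assumed ~SECCOMP_RET_KILL: the process dies at once
    REPORT = "report"  # assumed ~SECCOMP_RET_TRAP: a SIGSYS handler logs it


# Hypothetical whitelist subset for illustration; not Gecko's actual policy.
BASE_WHITELIST = frozenset({"read", "write", "mmap2", "futex", "close"})


def decide(syscall, whitelist, reporter_enabled):
    """What this toy filter would do with one syscall."""
    if syscall in whitelist:
        return Action.ALLOW
    # Without the reporter, a non-whitelisted call kills the process; on the
    # affected kernels that death went unnoticed (bug 907006) and looked like a hang.
    return Action.REPORT if reporter_enabled else Action.KILL


def syscalls_to_whitelist(observed, whitelist):
    """Option (2): every observed syscall missing from the whitelist is added
    now, each with a follow-up bug to remove it if it is unwanted long-term."""
    return sorted(set(observed) - set(whitelist))


if __name__ == "__main__":
    observed = ["read", "dup", "unlink", "write"]  # example, per bugs 908907/906996
    for name in observed:
        print(name, decide(name, BASE_WHITELIST, reporter_enabled=False).value)
    print("temporarily whitelist:", syscalls_to_whitelist(observed, BASE_WHITELIST))
```

The design point the model captures: the reporter turns a silent kill into a visible, logged failure during development, while the temporary whitelist entries keep plugin-container running until each follow-up bug is resolved.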

Julien Wajsberg [:julienw]

Updated

11 years ago

Depends on: 906996

Julien Wajsberg [:julienw]

Comment 36

11 years ago

* Who could be a good assignee for (2)?
* What are the implications of the sandbox reporter? Is it a good idea to always enable it? Why is it not always enabled?

Thanks Jed!

Jed Davis [:jld] ⟨⏰|UTC-7⟩ ⟦he/him⟧

Comment 37

11 years ago

I have done these things, and I'm filing/adjusting bugs for it, with bug 912791 as the meta-bug. As for why we don't want the reporter on all the time: I expect it's to reduce the attack surface. If the bad system call is in fact due to a compromise of the child process, it's safer (or, at least, no less safe) to kill the process immediately than to let it continue running in a signal handler. (The underlying assumption would be that we'll have done sufficient testing on this by the time it's released that there will be no false positives in production.)

Status: NEW → RESOLVED

Closed: 11 years ago

Resolution: --- → DUPLICATE

Jason Smith [:jsmith]

Updated

11 years ago

blocking-b2g: koi? → ---