Closed
Bug 907087
Opened 11 years ago
Closed 11 years ago
[Messages] Application hangs attempting to open message
Categories
(Firefox OS Graveyard :: General, defect)
Tracking
(Not tracked)
RESOLVED
DUPLICATE
of bug 908907
People
(Reporter: rhelmer, Unassigned)
References
Details
(Keywords: regression, Whiteboard: [from-geeksphone])
Phone type (Keon, Peak, other): Keon
OS version (Settings>Device information>More information>): 1.2.0.0-prerelease
Build Identifier (...>More information>): 20130820022245
Update channel (...>More information>): default
Steps to Reproduce:
1) open SMS ("Messages") app
2) select message
Expected Result:
SMS message is displayed
Actual Result:
App hangs until terminated
Is this issue sometimes or always reproducible:
Always
Comment 1•11 years ago
Unfortunately, we don't have one of these devices at Bocoup, so I can't confirm by reproduction.
Updated•11 years ago
Summary: SMS app hangs attempting to open message → [Messages] Application hangs attempting to open message
Comment 2•11 years ago
Same thing is happening here. On Keon. Using the official nightly.
blocking-b2g: --- → koi?
Keywords: regression
Reporter
Comment 3•11 years ago
I tried sending a message, which worked, and now I am able to load messages again (!)
Reporter
Comment 4•11 years ago
(In reply to Robert Helmer [:rhelmer] from comment #3)
> I tried sending a message, which worked, and now I am able to load messages
> again (!)
Killed the app w/ task manager and re-opened, it's hanging again as before.
Comment 5•11 years ago
BTW my Keon build is 20130818024607 (Nightly, OTA updates)
Comment 6•11 years ago
Is there any meaningful logcat line?
Comment 7•11 years ago
Not sure if it is relevant, but I get these:
I/Gecko ( 585): ###################################### forms.js loaded
I/Gecko ( 585): ############################### browserElementPanning.js loaded
I/Gecko ( 585): ######################## BrowserElementChildPreload.js loaded
I/Gecko ( 109): [Parent 109] WARNING: waitpid failed pid:521 errno:10: file /home/geeksphone/FOS/keon/gecko/ipc/chromium/src/base/process_util_posix.cc, line 260
I/Gecko ( 109): [Parent 109] WARNING: waitpid failed pid:521 errno:10: file /home/geeksphone/FOS/keon/gecko/ipc/chromium/src/base/process_util_posix.cc, line 260
I/Gecko ( 109): [Parent 109] WARNING: Failed to deliver SIGKILL to 521!(3).: file /home/geeksphone/FOS/keon/gecko/ipc/chromium/src/chrome/common/process_watcher_posix_sigchld.cc, line 118
The WARNINGs appear when I tap on a thread to read the messages (and nothing is displayed).
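For reference, the errno value in the waitpid warnings above can be decoded on a desktop machine. This is a minimal sketch, assuming the device uses the usual Linux errno numbering (where 10 is ECHILD, "No child processes"); `decode_waitpid_warning` is a hypothetical helper written for this illustration, not part of any Gecko or adb tooling.

```python
import errno
import os
import re

# Matches the "waitpid failed" warnings quoted in this comment.
WAITPID_RE = re.compile(r"waitpid failed pid:(\d+) errno:(\d+)")


def decode_waitpid_warning(line):
    """Return (pid, errno_name, message) for a waitpid warning, else None."""
    match = WAITPID_RE.search(line)
    if match is None:
        return None
    pid, code = int(match.group(1)), int(match.group(2))
    # errno.errorcode maps numbers to symbolic names on the host platform.
    # Assumption: the host and the device number errno values the same way.
    return pid, errno.errorcode.get(code, "UNKNOWN"), os.strerror(code)


line = (
    "I/Gecko ( 109): [Parent 109] WARNING: waitpid failed pid:521 errno:10: "
    "file /home/geeksphone/FOS/keon/gecko/ipc/chromium/src/base/"
    "process_util_posix.cc, line 260"
)
print(decode_waitpid_warning(line))
```

If the numbering assumption holds, the parent process (109) is being told that pid 521 is not a child it can wait for, which would be consistent with that process having already been reaped or killed.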
Comment 8•11 years ago
And was 521 the PID of the Messages app? Was the Messages app process still running?
Comment 9•11 years ago
Ok, I see what is happening.
The message appears the second time I try, after killing the Messages app. In that case, 521 is the PID of the Messages app instance that I killed with the task manager.
109 is the PID of the main b2g process.
Comment 10•11 years ago
I updated to nightly-images-keon-2013-08-26.Gecko-f216c74.Gaia-f94d4a9.zip (without erasing user data)
It is still happening.
Reporter
Comment 11•11 years ago
(In reply to Hubert Figuiere [:hub] from comment #10)
> I updated to nightly-images-keon-2013-08-26.Gecko-f216c74.Gaia-f94d4a9.zip
> (without erasing user data)
>
> It is still happening.
I have been keeping up with OTA updates, and same here (same user data).
In fact it's a little worse: previously I was able to send messages (and that would fix it until the app was restarted, see comment 3), but now it also hangs when sending messages :/
Comment 12•11 years ago
Well, bug 908757 doesn't allow me to get the OTA updates :-/
Reporter
Comment 13•11 years ago
(In reply to Hubert Figuiere [:hub] from comment #12)
> Well, bug 908757 doesn't allow me to get the OTA updates :-/
Oh! Yeah that's happening to me too, I guess I have not been keeping up to date :( Thanks!
Comment 14•11 years ago
It's easily reproducible with the reference workload.
I don't know what is happening though; there is nothing interesting in the logs.
Updated•11 years ago
Blocks: b2g-central-dogfood
Comment 15•11 years ago
Can this be reproduced on a device other than Keon for 1.2?
Keywords: qawanted
Comment 16•11 years ago
Can't reproduce on Inari using my eng build done this AM with a light reference workload.
Comment 17•11 years ago
In response to comment 15
Tested on both Buri and Unagi and was not able to reproduce this issue.
Environmental Variables
Build ID: 20130827040201
Gecko: http://hg.mozilla.org/mozilla-central/rev/e42dce3209da
Gaia: 599214a0f41eece076dc83cd85f5b27f8cfe67f2
Platform Version: 26.0a1
I was able to see and open up the SMS message successfully.
Reporter
Comment 18•11 years ago
Thanks for testing this. Could it have to do with our user data maybe?
Comment 19•11 years ago
I've tried with the same medium workload on an Unagi and a One Touch Fire, and the bug hasn't occurred.
So I suspect a driver problem; maybe we can try turning off hardware graphics acceleration on the Keon.
Updated•11 years ago
No longer blocks: b2g-central-dogfood
Comment 20•11 years ago
Clearing the blocking flag, as no one can reproduce this.
Please renominate if it is reproduced again. Thanks.
blocking-b2g: koi? → ---
Comment 22•11 years ago
(In reply to Julien Wajsberg [:julienw] from comment #21)
> Joe, I definitely reproduce on Keon.
This needs to be reproducible on something other than Keon in order to block on this.
Comment 23•11 years ago
(In reply to Jason Smith [:jsmith] from comment #22)
> This needs to be reproducible on something other than Keon in order to block
> on this.
Isn't Keon an officially supported platform?
Comment 24•11 years ago
So, I tried again on Hamachi, and it doesn't reproduce there.
blocking-b2g: koi? → ---
Comment 25•11 years ago
To put this in perspective: this is the bug that forced me to stop dogfooding Firefox OS. This makes it very hard.
Comment 26•11 years ago
Hubert: yes, this is important to me too; it is just not blocking the release, if I understood correctly.
Comment 27•11 years ago
So let's put it that way: we are ready to release Firefox OS that won't work on one of the few unlocked devices in existence? This is ridiculous.
Updated•11 years ago
blocking-b2g: --- → koi?
Comment 28•11 years ago
Just to note, this also breaks the tiled layers backend on the Keon. As soon as dup() is called on an FD that has mapped shared memory on a process beyond the first one, it halts during the dup.
If it helps, this is very easy to reproduce using the tiled layers backend. Apply this patch: https://bugzilla.mozilla.org/attachment.cgi?id=792112
Then debug child processes, attach to the child process, break in ShareToProcess, then continue. If you step from this point, it gets stuck in some assembly inside dup().
This appears to be a regression; this worked fine for me a couple of weeks ago. Lost 3 days thinking it was a problem with my code :(
Comment 29•11 years ago
Just to clarify my last comment (comment #28), I think the real bug here is that dup'ing a file descriptor hangs.
I'm not sure how general this is. In my case, if more than one process tries to dup fds mapped to shared memory, every subsequent process hangs indefinitely inside the dup, after the 'swi' op.
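To make the failing operation concrete, here is a minimal sketch of the syscall pattern described in comments 28 and 29: a file descriptor is mapped as shared memory and then duplicated with dup(). It is not Gecko's ShareToProcess code; `dup_mapped_fd` and the temporary backing file are assumptions made for illustration, and the sketch does not reproduce the hang, which was only seen in child processes on the Keon.

```python
import mmap
import os
import tempfile


def dup_mapped_fd(size=4096):
    """Map a file as shared memory, dup() its descriptor, and report
    whether both descriptors refer to the same open file."""
    with tempfile.TemporaryFile() as backing:
        backing.truncate(size)  # give the file a length so it can be mapped
        fd = backing.fileno()
        # MAP_SHARED mapping of the descriptor, standing in for the shared
        # memory segment mentioned in the comments (Unix-only call).
        with mmap.mmap(fd, size, flags=mmap.MAP_SHARED) as mapping:
            mapping[:5] = b"hello"
            # This is the call reported to hang on the affected device.
            dup_fd = os.dup(fd)
            try:
                return os.path.sameopenfile(fd, dup_fd)
            finally:
                os.close(dup_fd)


print(dup_mapped_fd())
```

On an unaffected system the dup() returns immediately; on the affected builds the equivalent call in a child process reportedly never returned.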
Comment 30•11 years ago
Michael, this sounds like something for you (see comments 28 and 29).
Component: Gaia::SMS → General
Flags: needinfo?(mwu)
Comment 31•11 years ago
This sounds like a problem I was having with seccomp, for which I filed bug 907006: the syscall filter specifies that the process should be killed[*], and it does enter Z state but the usual notifications don't happen, which manifests as a hang. If the other devices mentioned don't have seccomp-bpf support in their kernels yet, then it makes sense that this wouldn't be reproducible there. We suspect a kernel bug; note that the kernel support had to be backported to the old kernel versions we're currently using.
This can be verified by doing `adb shell cat /proc/N/status`, where N is the pid. If the hung process has "State: Z" and "Seccomp: 2", then it's the same bug. And if the child processes — live or otherwise — *don't* have "Seccomp: 2", then seccomp-bpf isn't being used.
[*] There are a few known issues with Gecko's behavior and/or the seccomp whitelist; see bug 906996 (unlink) and bug 908907 (dup).
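The check described above can be scripted. This is a minimal sketch, assuming `adb` is on the PATH and a device is connected; `parse_proc_status`, `classify`, `read_status_over_adb`, and the sample text are hypothetical and exist only for this illustration.

```python
import subprocess


def parse_proc_status(text):
    """Turn the 'Key:<tab>value' lines of /proc/N/status into a dict."""
    fields = {}
    for line in text.splitlines():
        key, sep, value = line.partition(":")
        if sep:
            fields[key.strip()] = value.strip()
    return fields


def classify(status_text):
    """Apply the rule from this comment to one process's status text."""
    fields = parse_proc_status(status_text)
    if fields.get("Seccomp") != "2":
        return "seccomp-bpf not in use"
    if fields.get("State", "").startswith("Z"):
        return "same bug: zombie under a seccomp filter"
    return "seccomp-bpf active, process not a zombie"


def read_status_over_adb(pid):
    """Fetch /proc/<pid>/status from the device (needs adb and a device)."""
    result = subprocess.run(
        ["adb", "shell", "cat", f"/proc/{pid}/status"],
        capture_output=True, text=True, check=True,
    )
    return result.stdout


# Made-up sample text for illustration, not output captured from a device.
SAMPLE = "Name:\tMessages\nState:\tZ (zombie)\nSeccomp:\t2\n"
print(classify(SAMPLE))
```

On a real device, `classify(read_status_over_adb(pid))` would be called with the pid of the hung process.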
Comment 32•11 years ago
(In reply to Julien Wajsberg [:julienw] from comment #30)
> Michael, this sounds like something for you (see comments 28 and 29).
Sounds like a seccomp issue which I'm not familiar with.
Flags: needinfo?(mwu)
Comment 33•11 years ago
Jed, I can confirm that it's the case.
Is there anything we can do to help you here? This is really painful for me, because the Keon is the only phone with mozilla-central that I have which otherwise works correctly on my network (not to mention the dogfooders on Keon).
Comment 34•11 years ago
Also confirmed that Buri doesn't use seccomp.
Comment 35•11 years ago
Assuming that the current situation is considered untenable, where b2g is confusingly broken on devices that are commonly used for development or dogfooding but aren't part of normal testing, options are:
(1) Disable seccomp entirely until it's fixed. I think we don't want to do that, either.
(2) Whitelist any syscall that plugin-container is observed to use, even if it's not something we want to allow long-term (e.g., unlink), and file a bug to remove it. Also, --enable-content-sandbox-reporter by default on b2g to work around bug 907006, so that any failures will be obvious.
Comment 36•11 years ago
* Who could be a good assignee for (2)?
* What are the implications of the sandbox reporter? Is it a good idea to always enable it? Why is it not always enabled?
Thanks, Jed!
Comment 37•11 years ago
I have done these things, and I'm filing/adjusting bugs for it, with bug 912791 as the meta-bug.
As for why we don't want the reporter on all the time: I expect it's to reduce the attack surface. If the bad system call is in fact due to a compromise of the child process, it's safer (or, at least, no less safe) to kill the process immediately than to let it continue running in a signal handler. (The underlying assumption would be that we'll have done sufficient testing on this by the time it's released that there will be no false positives in production.)
Status: NEW → RESOLVED
Closed: 11 years ago
Resolution: --- → DUPLICATE
Updated•11 years ago
blocking-b2g: koi? → ---