Closed Bug 857633 (opened 12 years ago, closed 12 years ago)

[Leo] OTA updates will restart into an unusable blank white screen

Categories

Product/Component: Firefox OS Graveyard :: GonkIntegration
Hardware: ARM
OS: Gonk (Firefox OS)
Type: defect
Priority: Not set
Severity: normal

Tracking

Status: RESOLVED FIXED
blocking-b2g: leo+
Tracking status: b2g18: fixed

People

(Reporter: tchung, Assigned: leo.bugzilla.gecko)


Details

Attachments

(1 file)

Getting this bug on file first; I'll debug more later. I updated a Leo nightly build from 20130401070203 to 20130403070204 and was rebooted into a white screen. What happened? This doesn't affect Unagi nightly updates. Also, this is using the Mozilla RIL only. I'll try to reproduce with a logcat.

Repro:
1) Install the Leo nightly build from Apr 1:
   Gecko http://hg.mozilla.org/releases/mozilla-b2g18/rev/f9f11b8cbf8a
   Gaia 663101b6eb809383e5882d9bc3868a923a57998a
   BuildID 20130401070203
   Version 18.0
2) OTA update to:
   Gecko http://hg.mozilla.org/releases/mozilla-b2g18/rev/d467369d1b0c
   Gaia 06e0e5ce42bdfb62bdbe38271de6b5b2d9e40e75
   BuildID 20130403070204
   Version 18.0
3) Click "Restart" and apply the update.
4) Verify that the screen comes up white and nothing else can be done.

Expected:
- The screen comes up correctly, with icons and so on.

Actual:
- The update reboots into a white screen.
Dave, any idea what's happening here? (Or a better assignee?)
Assignee: nobody → dhylands
blocking-b2g: leo? → leo+
Not sure without more information. We'd need to check why it isn't starting.

A logcat leading up to the white screen would be useful. If you can adb shell into the phone, the output of the following would also be useful:

adb shell
stop b2g
/system/bin/b2g.sh

I'm swamped right now and wouldn't be able to look at this for a while. Perhaps try Marshall.
Assignee: dhylands → marshall
(In reply to Dave Hylands [:dhylands] from comment #2)

Okay, here's a logcat snippet. It's reproducible; it shows the white screen after you select "Apply update":

04-05 09:21:57.449: I/Gecko(131): ###!!! [Parent][AsyncChannel] Error: Channel error: cannot send/recv
04-05 09:21:57.449: I/Gecko(131): ###!!! [Parent][AsyncChannel] Error: Channel error: cannot send/recv
04-05 09:21:57.449: I/Gecko(131): ###!!! [Parent][AsyncChannel] Error: Channel error: cannot send/recv
04-05 09:21:57.459: I/Gecko(131): ###!!! [Parent][AsyncChannel] Error: Channel error: cannot send/recv
04-05 09:21:57.549: I/IdleService(131): Remove idle observer 488fc130 (600 seconds)
04-05 09:21:57.589: I/Gonk(131): no window to draw, bailing

I'll attach the full logcat now.
Attached file logcat (deleted)
More information: if I power-cycle the phone by hand, it boots up with the new build I had just OTA'd. So the problem is the white screen and the lack of an automatic reboot. I feel a little better knowing the OTA mechanism works when rebooting manually, but the white-screen issue still exists: there is no automatic reboot.
Interesting. That's extremely useful to know. It suggests that something is OOMing around the time the update is applied.

So it would be useful to get a more complete logcat, preferably everything from the time you start the update until the white screen.

If you can still get adb access to the phone when it's on the white screen, it would be good to get the output of dmesg:

adb shell
dmesg > /data/local/tmp/dmesg.txt

and then:

adb pull /data/local/tmp/dmesg.txt

and also the output of:

adb shell b2g-ps

around the time that you start the update.
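The data requested above could also be gathered in one go from the host. This is a minimal sketch, assuming `adb` is on the PATH and the device still answers adb while showing the white screen; the function name and output file names are hypothetical. Redirecting `adb shell dmesg` on the host side saves the output straight to the host, which skips the `adb pull` step (reading dmesg may need a root shell on some builds). Since the b2g-ps output is wanted around the start of the update, it could be run once then and again at the white screen, into different directories.

```shell
#!/bin/sh
# Hypothetical helper: save the diagnostics requested in this bug into a
# directory on the host (default: the current directory).
collect_white_screen_logs() {
  outdir=${1:-.}
  mkdir -p "$outdir" || return 1
  # -d dumps the current logcat buffer and exits instead of streaming.
  adb logcat -d > "$outdir/logcat.txt" || return 1
  # Kernel log; on some builds this may require a root shell.
  adb shell dmesg > "$outdir/dmesg.txt" || return 1
  # Process list, including the main b2g process if it is running.
  adb shell b2g-ps > "$outdir/b2g-ps.txt" || return 1
}
```

For example, `collect_white_screen_logs start` when the update begins and `collect_white_screen_logs white-screen` once the screen goes white.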
I took a look at the attached logcat (sorry, earlier I was only reading what was posted in comment 3).

It also looks like the logcat came from two different devices and was appended together? There's a big timestamp jump about halfway through.

So the relevant portion (from the end) seems to be:

04-05 09:21:57.229: I/Gecko(131): UpdatePrompt: Update downloaded, restarting to apply it
04-05 09:21:57.229: I/Gecko(131): Subprocesses are still alive. Doing emergency join.
04-05 09:21:57.249: V/AudioPolicyManagerBase(142): releaseOutput() 1
04-05 09:21:57.319: I/Gecko(131): ###!!! [Parent][AsyncChannel] Error: Channel error: cannot send/recv
04-05 09:21:57.319: I/Gecko(131): ###!!! [Parent][AsyncChannel] Error: Channel error: cannot send/recv

So it looks like something about the restart failed. On my Unagi, I don't normally see the "Subprocesses are still alive. Doing emergency join." message.
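To pull just the restart-related lines out of a long logcat like the attached one, a filter along these lines may help. This is a minimal sketch: the function name `restart_lines` is hypothetical, and the patterns are simply the message strings quoted in this thread, so other relevant messages would need to be added by hand.

```shell
#!/bin/sh
# Hypothetical helper: print only the logcat lines related to the
# update-restart sequence discussed in this bug. The patterns are the
# message strings seen in the excerpts quoted in this thread.
restart_lines() {
  # -E enables extended regexes so "|" separates the alternatives.
  grep -E 'UpdatePrompt|Subprocesses are still alive|AsyncChannel|no window to draw'
}
```

Usage: `adb logcat -d | restart_lines`, or `restart_lines < logcat.txt` on a saved log.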
(In reply to Dave Hylands [:dhylands] from comment #7)

Ignore the timestamp jump; mwu pointed out to me that I probably merged in something else. But everything from the 9:14 mark onward was recorded from when I started manually forcing the OTA check until the white screen.
OK, UpdatePrompt calls PowerManagerService::Restart:
https://mxr.mozilla.org/mozilla-central/source/dom/power/PowerManagerService.cpp#138

The emergency join message comes from ContentParent::JoinAllSubprocesses(), so that means that

StartForceQuitWatchdog(eHalShutdownMode_Restart, mWatchdogTimeoutSecs);

isn't working properly on Leo.
(In reply to Dave Hylands [:dhylands] from comment #9) > OK, UpdatePrompt calls PowerManagerService::Restart > https://mxr.mozilla.org/mozilla-central/source/dom/power/PowerManagerService. > cpp#138 > > The emergency join message comes from: ContentParent::JoinAllSubprocesses() > so that means that: > > StartForceQuitWatchdog(eHalShutdownMode_Restart, mWatchdogTimeoutSecs); > > isn't working properly on leo. Dave, you still need me to try and reproduce again, and get you that data from comment 6?
(In reply to Dave Hylands [:dhylands] from comment #9)

Nah, I don't think that's right. The watchdog timer defaults to 5 seconds. I may be misremembering about not seeing the emergency join message.
(In reply to Tony Chung [:tchung] from comment #10)

I think we're good for now.
I think there is something wrong with the Leo startup scripts, init, or something along those lines.

On my Unagi:

adb shell
b2g-ps
kill PID (where PID is the PID of the b2g process)

b2g automatically restarts.

On my Leo phone, b2g doesn't restart.
(In reply to Dave Hylands [:dhylands] from comment #13)

Any update on this? I can still reproduce this 100% of the time. I ran b2g-ps, and no PID is shown when it white-screens; only the header is printed:

shell@android:/ $ b2g-ps
APPLICATION USER PID PPID VSIZE RSS WCHAN PC NAME
shell@android:/ $
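The manual check from comment 13 (kill b2g, then see whether init brings it back) can be scripted. This is a minimal sketch, assuming the main process appears in `b2g-ps` output as a row whose APPLICATION column is `b2g` and whose third column is the PID, matching the header shown above; the function name `b2g_pid` is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: read `b2g-ps` output on stdin and print the PID of
# the main b2g process, or nothing if no such row exists (as in the
# white-screen case above, where only the header is printed).
b2g_pid() {
  # `adb shell` may emit CRLF line endings, so strip carriage returns first.
  # The header's first column is "APPLICATION", so matching $1 == "b2g"
  # skips it; print column 3 (PID) of the first matching row and stop.
  tr -d '\r' | awk '$1 == "b2g" { print $3; exit }'
}
```

Usage, mirroring comment 13 from the host (killing b2g may require a root shell, and the 10-second wait is an arbitrary guess):

    old=$(adb shell b2g-ps | b2g_pid)
    adb shell kill "$old"
    sleep 10
    new=$(adb shell b2g-ps | b2g_pid)

Where init restarts b2g, as on the Unagi, `new` should be a different, non-empty PID; in the Leo case reported here it would be empty.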
I think it's an issue with the Leo phone setup, and we should push back to the vendor.
Can someone from Leo's Gecko team take a look at this? Because Gecko isn't restarted properly, the phone is left on the white screen.
Flags: needinfo?(leo.bugzilla.gecko)
This is caused by differences between Leo's RIL and Mozilla's RIL. Leo's source wasn't taken into Mozilla's git repository, so the RIL differs.

Please rename this file as follows:

$> adb shell mv /system/b2g/distribution/bundles/libqc_b2g_ril/chrome.manifest /system/b2g/distribution/bundles/libqc_b2g_ril/chrome.manifest_

This should help the Mozilla teams.
Flags: needinfo?(leo.bugzilla.gecko)
(In reply to leo.bugzilla.gecko from comment #17)

I just talked to Jermey to explain the problem.

As we receive the Leo phone, it does, in fact, contain the distribution directory above, and if you take the phone and flash only Gecko, you need to perform the steps in comment 17 to get b2g to start at all.

However, if you build the entire system.img with our build, the distribution directory doesn't exist at all, so comment 17 doesn't apply.

The real issue is that init doesn't restart b2g when b2g is killed (as per comment 13). Our OTA update makes b2g kill itself and expects Android's init to restart b2g so that the updated b2g starts running.
OK, I misunderstood a little. Leo currently has some problems (we just found them last week). The root cause is that Leo runs multiple init daemons (two or three, to be exact). We are inspecting our init procedure and will inform the Mozilla team after fixing it.
Assignee: marshall → leo.bugzilla.gecko
Leo released a new version (v07e) on 25 April. Previously, Leo had a problem in which several init daemons were started on the device, so after an OTA update or a crash, the b2g process was never started by the valid init daemon. The new version (v07e) solves this problem. Tony and Dave will be notified through our system. If you have any problems, please let me know (kimminsoo.75@gmail.com).
Status: NEW → RESOLVED
Closed: 12 years ago
Resolution: --- → FIXED
Flags: in-moztrap?
UCID: sys-upd-001
Flags: in-moztrap? → in-moztrap+
