Closed
Bug 821194
Opened 12 years ago
Closed 12 years ago
/system/bin/b2g.sh needs to defend against failed critical section during updating
Categories
(Firefox OS Graveyard :: GonkIntegration, defect, P1)
Tracking
(blocking-basecamp:+)
People
(Reporter: cjones, Assigned: bjacob)
References
Details
(Whiteboard: [b2g-gfx])
Attachments
(4 files, 3 obsolete files)
(deleted),
patch
|
mwu
:
review+
dhylands
:
review+
|
Details | Diff | Splinter Review |
(deleted),
patch
|
Details | Diff | Splinter Review | |
(deleted),
patch
|
mwu
:
review+
|
Details | Diff | Splinter Review |
(deleted),
patch
|
dhylands
:
review+
|
Details | Diff | Splinter Review |
In updater.cpp, there's a critical section that looks like
mv /system/b2g /system/b2g.bak
mv /system/b2g.bak/updated /system/b2g
This has to be bulletproof. If it fails, we leave behind a brick.
This isn't atomic, so we have one known bricking failure mode currently
1, mv /system/b2g /system/b2g.bak
2. [changes are flushed to disk]
3. crash / power loss etc.
In this case, we're left with only /system/b2g.bak, and b2g won't start back up.
If we were updating a /system/b2g symlink, we could do this atomically. But in the current update algorithm, the only way to fix this is through /system/bin/b2g.sh. On startup, it needs to make a check like
if !exists(/system/b2g):
mv /system/b2g.bak /system/b2g
This blocks using gecko updates.
mwu, want to grab?
Updated•12 years ago
|
Assignee: nobody → mwu
Comment 1•12 years ago
|
||
Hm this is trickier than just moving b2g.bak. We also need to remount system rw so we can move the directory, and then remount system ro. I don't know of any way to do this within a shell script, so we may need a dedicated program to do this.
Updated•12 years ago
|
Assignee: mwu → bjacob
Whiteboard: [b2g-gfx]
Comment 2•12 years ago
|
||
That change wasn't intentional, right?
Assignee: bjacob → mwu
Whiteboard: [b2g-gfx]
Comment 3•12 years ago
|
||
(In reply to Michael Wu [:mwu] from comment #2)
> That change wasn't intentional, right?
It was, thought we could take some load off you, but if you prefer to handle it yourself, understood.
Comment 4•12 years ago
|
||
(Understood you were OOO until at least Thursday)
Assignee | ||
Comment 5•12 years ago
|
||
(Happy to try this bug, let me know; hadn't started work on it yet.)
Comment 6•12 years ago
|
||
Ah ok. This was marked as b2g-gfx but it's not a graphics bug so I thought the change was a mistake.
bjacob, feel free to take this bug. Basically I think we need a program that:
1. Checks for the existence of /system/b2g
2. Quits if it exists, otherwise
3. Remounts the system partition rw
4. Moves /system/b2g.bak to /system/b2g
5. Remounts the system partition ro
See http://mxr.mozilla.org/mozilla-central/source/toolkit/mozapps/update/updater/automounter_gonk.cpp for how remounting works.
This program would need to be run from b2g.sh in gonk-misc. The program can be added to gonk-misc and an appropriate module added to gonk-misc/Android.mk. Then PRODUCT_PACKAGSE in gonk-misc/b2g.mk and build/product/target/b2g.mk needs to have the new module/program added.
It's relatively straight forward but I just haven't been able to get to it due to travel.
Assignee | ||
Comment 7•12 years ago
|
||
Thanks for the explanation! Happy if I can save you some time.
Updated•12 years ago
|
Assignee: mwu → bjacob
Comment 8•12 years ago
|
||
Right, we're using [b2g-gfx] as a way to track what the graphics team is looking at, rather than just the bugs in graphics. Makes it easier for JP to chase me down :-)
Whiteboard: [b2g-gfx]
Assignee | ||
Comment 9•12 years ago
|
||
Since I don't understand the build system around here very well, and I had to link to automounter_gonk.cpp anyway, I figured that it would be simplest to make a proof of concept directly in the updater binary --- even if in the end that doesn't make any sense as this program must be installed outside of /system/b2g.
Usage:
> updater --b2g-ftw
I did minimal testing that it detects and renames directories correctly, but I just blindly trusted that GonkAutoMounter does its job.
For next steps,
- where should this program go, both in our source tree and in the device filesystem?
- do you have an example to follow for the build-system changes?
- what's the plan to link to automounter_gonk.cpp? Just have the source in the same directory?
Attachment #693576 -
Flags: feedback?(mwu)
Comment 10•12 years ago
|
||
(In reply to Benoit Jacob [:bjacob] from comment #9)
> For next steps,
> - where should this program go, both in our source tree and in the device
> filesystem?
> - do you have an example to follow for the build-system changes?
See the fakeperm module in https://github.com/mozilla-b2g/gonk-misc/blob/master/Android.mk#L56 for one example of how to put in a new module in the Android build system. It should go into the gonk-misc repo. The module then needs to be listed in PRODUCT_PACKAGES in gonk-misc/b2g.mk and build/product/target/b2g.mk.
> - what's the plan to link to automounter_gonk.cpp? Just have the source in
> the same directory?
We can just have a copy of automounter_gonk in gonk-misc. The only issue is that it's MPL licensed and code that's in gonk-misc should generally be apache 2 licensed.
Assignee | ||
Comment 11•12 years ago
|
||
Thanks! Another question for my local testing. I don't why it worked the first time I tried, but as expected, doing this in adb shell,
cd /system
mv b2g b2g.bak
fails as the partition is read-only. I can't easily unmount it as it's "busy". How can I remount /system read-write and then read-only from adb shell?
Comment 12•12 years ago
|
||
"adb remount" makes the system partition readwrite. Dunno if there's a way to make it readonly again.
Assignee | ||
Comment 13•12 years ago
|
||
Update: I now have a stand-alone executable in /system/bin, that works. I need to clean my patch a bit (currently it duplicates more gecko/ code than necessary; if I can use exported Gecko headers I'll be able to make a smaller patch). I've tested it by renaming b2g to b2g.bak, rebooting, and running my program; it just works. Will upload clean patch asap.
Question:
when this program terminates successfully, we probably want to reboot, right? Should this program take care of rebooting?
Comment 14•12 years ago
|
||
I don't think we need to reboot. This program can be called in https://github.com/mozilla-b2g/gonk-misc/blob/master/b2g.sh before attempting to start b2g.
Assignee | ||
Comment 15•12 years ago
|
||
How do you like this approach?
Pro:
- minimal size patch;
- no duplicate files;
- no modification to gecko files such as the automounter;
- no non-Apache-2 code in gonk-misc (and MPL2 is unambiguously not propagating copyleft through #includes).
Con:
- #including cpp files from gecko/ into gonk-misc is ugly.
I also have alternative patches if you prefer:
- either duplicating 3 files, automounter_gonk.* and updatelogging.cpp, into gonk-misc.
- or duplicating only automounter_gonk.* and tweaking them to remove the need for updatelogging.*.
Also, what name would you like for this program?
Attachment #693576 -
Attachment is obsolete: true
Attachment #693576 -
Flags: feedback?(mwu)
Attachment #693734 -
Flags: feedback?(mwu)
Assignee | ||
Comment 16•12 years ago
|
||
Note: this patch is only gonk-misc; also need this patch to project build/ :
$ ./repo diff
project build/
diff --git a/target/product/b2g.mk b/target/product/b2g.mk
index 7a11bd3..a036032 100644
--- a/target/product/b2g.mk
+++ b/target/product/b2g.mk
@@ -31,6 +31,7 @@ PRODUCT_PACKAGES := \
MozTT-Regular.ttf \
MozTT-Medium.ttf \
MozTT-Bold.ttf \
+ b2gftw \
$(NULL)
Assignee | ||
Comment 17•12 years ago
|
||
Also, regarding this line in the patch:
LOCAL_CFLAGS := -D__BEGIN_DECLS='extern "C" {' -D__END_DECLS='}'
This is because android_reboot.h, which is included by the automounter code, has __BEGIN_DECLS and __END_DECLS and it isn't clear to me where they are supposed to be defined, but it seemed clear that that was what they were supposed to expand to, from the linking errors that I got.
Assignee | ||
Comment 18•12 years ago
|
||
Figured that the issue I mentioned in my previous comment could be better addressed by #including this system header, which defines __{BEGIN,END}_DECLS.
Attachment #693734 -
Attachment is obsolete: true
Attachment #693734 -
Flags: feedback?(mwu)
Attachment #693741 -
Flags: feedback?(mwu)
Assignee | ||
Comment 19•12 years ago
|
||
Unless there is an objection to the design (see comment 15) this seems ready for review. Renamed to 'recover_from_backup'; removed another unneeded line from Android.mk.
Attachment #693741 -
Attachment is obsolete: true
Attachment #693741 -
Flags: feedback?(mwu)
Attachment #693889 -
Flags: review?(mwu)
Assignee | ||
Comment 20•12 years ago
|
||
The only tricky part, as explained in the comment, is that we must do that _before_ setting LD_PRELOAD or anything else that would add a dependency on /system/b2g.
Tested on an Otoro device:
- if I rename b2g to b2g.bak and reboot, it successfully recovers and boots
- if I rename b2g to something else and reboot, it fails with the expected error messages
Attachment #693956 -
Flags: review?(mwu)
Assignee | ||
Comment 21•12 years ago
|
||
Trivial but for the sake of completeness. Separate patch because separate git repo.
Attachment #693958 -
Flags: review?(mwu)
Updated•12 years ago
|
Attachment #693958 -
Flags: review?(mwu) → review+
Comment 22•12 years ago
|
||
Comment on attachment 693889 [details] [diff] [review]
implementation without code duplication #including .cpp files
Review of attachment 693889 [details] [diff] [review]:
-----------------------------------------------------------------
Overall, looks fine. Requesting review from dhylands for a sanity check on the recovery_from_backup.cpp implementation.
I don't like including those files from gecko, but I can't think of alternatives that are much better.
::: Android.mk
@@ +67,5 @@
> +LOCAL_MODULE_CLASS := EXECUTABLES
> +LOCAL_SRC_FILES := recover_from_backup.cpp
> +LOCAL_C_INCLUDES := $(GECKO_OBJDIR)/dist/include \
> + $(GECKO_OBJDIR)/dist/include/nspr \
> + gecko/toolkit/mozapps/update/common \
s/gecko/$(GECKO_PATH)/ (you may need to move this to make it work)
Attachment #693889 -
Flags: review?(mwu)
Attachment #693889 -
Flags: review?(dhylands)
Attachment #693889 -
Flags: review+
Comment 23•12 years ago
|
||
Comment on attachment 693956 [details] [diff] [review]
Use it in b2g.sh to actually recover
Review of attachment 693956 [details] [diff] [review]:
-----------------------------------------------------------------
::: b2g.sh
@@ +10,5 @@
> +#
> +# Important: this must be done before we set LD_PRELOAD, as this must run
> +# without any dependency on /system/b2g.
> +if [ ! -d /system/b2g ]; then
> + echo "No /system/b2g directory. Attempting recovery."
echo doesn't put any messages in the system log. We should also log to the system log when this happens. I think "log" is the command you'll want to use.
Comment 24•12 years ago
|
||
Comment on attachment 693889 [details] [diff] [review]
implementation without code duplication #including .cpp files
Review of attachment 693889 [details] [diff] [review]:
-----------------------------------------------------------------
All of my comments are really nits, so r=me with nits addressed.
::: recover_from_backup.cpp
@@ +15,5 @@
> + */
> +
> +#include "sys/cdefs.h" // for __BEGIN_DECLS needed under automounter_gonk.cpp
> +
> +#include "automounter_gonk.cpp"
I think its worth a comment explaining why you're #including the .cpp files
@@ +16,5 @@
> +
> +#include "sys/cdefs.h" // for __BEGIN_DECLS needed under automounter_gonk.cpp
> +
> +#include "automounter_gonk.cpp"
> +#include "updatelogging.cpp"
As far as I can tell, the only reason we're including this is because the automounter_gonk.cpp file calls LogFlush.
We don't seem to call LogInit anywhere so you might be better off to just #define LogFlush() and remove the #include
It might be even better to modify automounter_gonk.cpp to do:
#if !defined(LogFlush)
#define LogFlush()
#endif
and an appropriate comment
@@ +24,5 @@
> +#include <sys/types.h>
> +#include <sys/stat.h>
> +#include <unistd.h>
> +
> +#undef LOG_TAG
It's probably worth mentioning that we're #undef ing these because we #included automounter_gonk.cpp
@@ +45,5 @@
> +int main()
> +{
> + struct stat buf;
> +
> + if (stat(system_b2g_path, &buf) == 0) {
Do we care if /system/b2g isn't a directory?
Probably not, since we're only trying to cover off the case that the updater failed, and I'm not sure if we're trying to be super paranoid here or not.
@@ +71,5 @@
> + "We're probably going to fail, but we have nothing to lose trying...\n");
> + // continue, we have nothing to lose.
> + }
> +
> + if (rename(system_b2g_backup_path, system_b2g_path) == -1) {
You should use != 0 to be consistent with your comparisons to stat above
Attachment #693889 -
Flags: review?(dhylands) → review+
Comment 25•12 years ago
|
||
(In reply to Michael Wu [:mwu] from comment #23)
> Comment on attachment 693956 [details] [diff] [review]
> Use it in b2g.sh to actually recover
>
> Review of attachment 693956 [details] [diff] [review]:
> -----------------------------------------------------------------
...snip...
> echo doesn't put any messages in the system log. We should also log to the
> system log when this happens. I think "log" is the command you'll want to
> use.
log -p E "Some message here"
will log with error severity.
Comment 26•12 years ago
|
||
I think that this could all be done from the script level:
if [ ! -d /system/b2g -a -d /system/b2g.bak ]; then
log -p W "No /system/b2g directory. Attempting recovery."
mount -w -o remount /system
mv /system/b2g.bak /system/b2g
mount -r -o remount /system
if [ -d /system/b2g ]; then
log "Recovery successful."
else
log -p E "Recovery failed."
fi
fi
Comment 27•12 years ago
|
||
Comment on attachment 693956 [details] [diff] [review]
Use it in b2g.sh to actually recover
Clearing review, waiting for a patch with better logging.
Alternately, if dhylands' suggestion about using the mount command works, we should just do that.
Attachment #693956 -
Flags: review?(mwu)
Assignee | ||
Comment 28•12 years ago
|
||
Yup, the suggestion in comment 26 works from quickly trying in |adb shell|. Looks like CC'ing Dave on this would have saved a bit of time :-P Turning comment 26 into a patch now.
Assignee | ||
Comment 29•12 years ago
|
||
Works!
Tried to add meaningful logging.
Attachment #695018 -
Flags: review?(dhylands)
Comment 30•12 years ago
|
||
Comment on attachment 695018 [details] [diff] [review]
do it all in b2g.sh per comment 26
Review of attachment 695018 [details] [diff] [review]:
-----------------------------------------------------------------
Looks reasonable to me
Attachment #695018 -
Flags: review?(dhylands) → review+
Comment 31•12 years ago
|
||
Comment on attachment 695018 [details] [diff] [review]
do it all in b2g.sh per comment 26
Review of attachment 695018 [details] [diff] [review]:
-----------------------------------------------------------------
::: b2g.sh
@@ +4,5 @@
> mkdir -p $TMPDIR
> chmod 1777 $TMPDIR
>
> +if [ ! -d /system/b2g ]; then
> +
And I'd probably remove these blank lines.
@@ +11,5 @@
> + if [ -d /system/b2g.bak ]; then
> +
> + mount -w -o remount /system
> + if [ $? -ne 0 ]; then
> + log -p E "Failed to remount /system read-write"
Actually, a tip here. You can do:
if ! mount -w -o remount /system; then
deal with error
fi
Assignee | ||
Comment 32•12 years ago
|
||
Tested, works, pull request:
https://github.com/mozilla-b2g/gonk-misc/pull/63/files
Comment 33•12 years ago
|
||
Component: Application Update → Builds
Product: Toolkit → Boot2Gecko
Updated•12 years ago
|
Status: NEW → RESOLVED
Closed: 12 years ago
Resolution: --- → FIXED
You need to log in
before you can comment on or make changes to this bug.
Description
•