Closed Bug 821194 Opened 12 years ago Closed 12 years ago

/system/bin/b2g.sh needs to defend against failed critical section during updating

Categories

(Firefox OS Graveyard :: GonkIntegration, defect, P1)

ARM
Gonk (Firefox OS)
defect

Tracking

(blocking-basecamp:+)

RESOLVED FIXED
B2G C3 (12dec-1jan)
blocking-basecamp +

People

(Reporter: cjones, Assigned: bjacob)

References

Details

(Whiteboard: [b2g-gfx])

Attachments

(4 files, 3 obsolete files)

In updater.cpp, there's a critical section that looks like mv /system/b2g /system/b2g.bak mv /system/b2g.bak/updated /system/b2g This has to be bulletproof. If it fails, we leave behind a brick. This isn't atomic, so we have one known bricking failure mode currently 1, mv /system/b2g /system/b2g.bak 2. [changes are flushed to disk] 3. crash / power loss etc. In this case, we're left with only /system/b2g.bak, and b2g won't start back up. If we were updating a /system/b2g symlink, we could do this atomically. But in the current update algorithm, the only way to fix this is through /system/bin/b2g.sh. On startup, it needs to make a check like if !exists(/system/b2g): mv /system/b2g.bak /system/b2g This blocks using gecko updates. mwu, want to grab?
Assignee: nobody → mwu
Hm this is trickier than just moving b2g.bak. We also need to remount system rw so we can move the directory, and then remount system ro. I don't know of any way to do this within a shell script, so we may need a dedicated program to do this.
Assignee: mwu → bjacob
Whiteboard: [b2g-gfx]
That change wasn't intentional, right?
Assignee: bjacob → mwu
Whiteboard: [b2g-gfx]
(In reply to Michael Wu [:mwu] from comment #2) > That change wasn't intentional, right? It was, thought we could take some load off you, but if you prefer to handle it yourself, understood.
(Understood you were OOO until at least Thursday)
(Happy to try this bug, let me know; hadn't started work on it yet.)
Ah ok. This was marked as b2g-gfx but it's not a graphics bug so I thought the change was a mistake. bjacob, feel free to take this bug. Basically I think we need a program that: 1. Checks for the existence of /system/b2g 2. Quits if it exists, otherwise 3. Remounts the system partition rw 4. Moves /system/b2g.bak to /system/b2g 5. Remounts the system partition ro See http://mxr.mozilla.org/mozilla-central/source/toolkit/mozapps/update/updater/automounter_gonk.cpp for how remounting works. This program would need to be run from b2g.sh in gonk-misc. The program can be added to gonk-misc and an appropriate module added to gonk-misc/Android.mk. Then PRODUCT_PACKAGSE in gonk-misc/b2g.mk and build/product/target/b2g.mk needs to have the new module/program added. It's relatively straight forward but I just haven't been able to get to it due to travel.
Thanks for the explanation! Happy if I can save you some time.
Assignee: mwu → bjacob
Right, we're using [b2g-gfx] as a way to track what the graphics team is looking at, rather than just the bugs in graphics. Makes it easier for JP to chase me down :-)
Whiteboard: [b2g-gfx]
Since I don't understand the build system around here very well, and I had to link to automounter_gonk.cpp anyway, I figured that it would be simplest to make a proof of concept directly in the updater binary --- even if in the end that doesn't make any sense as this program must be installed outside of /system/b2g. Usage: > updater --b2g-ftw I did minimal testing that it detects and renames directories correctly, but I just blindly trusted that GonkAutoMounter does its job. For next steps, - where should this program go, both in our source tree and in the device filesystem? - do you have an example to follow for the build-system changes? - what's the plan to link to automounter_gonk.cpp? Just have the source in the same directory?
Attachment #693576 - Flags: feedback?(mwu)
(In reply to Benoit Jacob [:bjacob] from comment #9) > For next steps, > - where should this program go, both in our source tree and in the device > filesystem? > - do you have an example to follow for the build-system changes? See the fakeperm module in https://github.com/mozilla-b2g/gonk-misc/blob/master/Android.mk#L56 for one example of how to put in a new module in the Android build system. It should go into the gonk-misc repo. The module then needs to be listed in PRODUCT_PACKAGES in gonk-misc/b2g.mk and build/product/target/b2g.mk. > - what's the plan to link to automounter_gonk.cpp? Just have the source in > the same directory? We can just have a copy of automounter_gonk in gonk-misc. The only issue is that it's MPL licensed and code that's in gonk-misc should generally be apache 2 licensed.
Thanks! Another question for my local testing. I don't why it worked the first time I tried, but as expected, doing this in adb shell, cd /system mv b2g b2g.bak fails as the partition is read-only. I can't easily unmount it as it's "busy". How can I remount /system read-write and then read-only from adb shell?
"adb remount" makes the system partition readwrite. Dunno if there's a way to make it readonly again.
Update: I now have a stand-alone executable in /system/bin, that works. I need to clean my patch a bit (currently it duplicates more gecko/ code than necessary; if I can use exported Gecko headers I'll be able to make a smaller patch). I've tested it by renaming b2g to b2g.bak, rebooting, and running my program; it just works. Will upload clean patch asap. Question: when this program terminates successfully, we probably want to reboot, right? Should this program take care of rebooting?
I don't think we need to reboot. This program can be called in https://github.com/mozilla-b2g/gonk-misc/blob/master/b2g.sh before attempting to start b2g.
How do you like this approach? Pro: - minimal size patch; - no duplicate files; - no modification to gecko files such as the automounter; - no non-Apache-2 code in gonk-misc (and MPL2 is unambiguously not propagating copyleft through #includes). Con: - #including cpp files from gecko/ into gonk-misc is ugly. I also have alternative patches if you prefer: - either duplicating 3 files, automounter_gonk.* and updatelogging.cpp, into gonk-misc. - or duplicating only automounter_gonk.* and tweaking them to remove the need for updatelogging.*. Also, what name would you like for this program?
Attachment #693576 - Attachment is obsolete: true
Attachment #693576 - Flags: feedback?(mwu)
Attachment #693734 - Flags: feedback?(mwu)
Note: this patch is only gonk-misc; also need this patch to project build/ : $ ./repo diff project build/ diff --git a/target/product/b2g.mk b/target/product/b2g.mk index 7a11bd3..a036032 100644 --- a/target/product/b2g.mk +++ b/target/product/b2g.mk @@ -31,6 +31,7 @@ PRODUCT_PACKAGES := \ MozTT-Regular.ttf \ MozTT-Medium.ttf \ MozTT-Bold.ttf \ + b2gftw \ $(NULL)
Also, regarding this line in the patch: LOCAL_CFLAGS := -D__BEGIN_DECLS='extern "C" {' -D__END_DECLS='}' This is because android_reboot.h, which is included by the automounter code, has __BEGIN_DECLS and __END_DECLS and it isn't clear to me where they are supposed to be defined, but it seemed clear that that was what they were supposed to expand to, from the linking errors that I got.
Figured that the issue I mentioned in my previous comment could be better addressed by #including this system header, which defines __{BEGIN,END}_DECLS.
Attachment #693734 - Attachment is obsolete: true
Attachment #693734 - Flags: feedback?(mwu)
Attachment #693741 - Flags: feedback?(mwu)
Unless there is an objection to the design (see comment 15) this seems ready for review. Renamed to 'recover_from_backup'; removed another unneeded line from Android.mk.
Attachment #693741 - Attachment is obsolete: true
Attachment #693741 - Flags: feedback?(mwu)
Attachment #693889 - Flags: review?(mwu)
The only tricky part, as explained in the comment, is that we must do that _before_ setting LD_PRELOAD or anything else that would add a dependency on /system/b2g. Tested on an Otoro device: - if I rename b2g to b2g.bak and reboot, it successfully recovers and boots - if I rename b2g to something else and reboot, it fails with the expected error messages
Attachment #693956 - Flags: review?(mwu)
Attached patch The build/ part (deleted) — Splinter Review
Trivial but for the sake of completeness. Separate patch because separate git repo.
Attachment #693958 - Flags: review?(mwu)
Attachment #693958 - Flags: review?(mwu) → review+
Comment on attachment 693889 [details] [diff] [review] implementation without code duplication #including .cpp files Review of attachment 693889 [details] [diff] [review]: ----------------------------------------------------------------- Overall, looks fine. Requesting review from dhylands for a sanity check on the recovery_from_backup.cpp implementation. I don't like including those files from gecko, but I can't think of alternatives that are much better. ::: Android.mk @@ +67,5 @@ > +LOCAL_MODULE_CLASS := EXECUTABLES > +LOCAL_SRC_FILES := recover_from_backup.cpp > +LOCAL_C_INCLUDES := $(GECKO_OBJDIR)/dist/include \ > + $(GECKO_OBJDIR)/dist/include/nspr \ > + gecko/toolkit/mozapps/update/common \ s/gecko/$(GECKO_PATH)/ (you may need to move this to make it work)
Attachment #693889 - Flags: review?(mwu)
Attachment #693889 - Flags: review?(dhylands)
Attachment #693889 - Flags: review+
Comment on attachment 693956 [details] [diff] [review] Use it in b2g.sh to actually recover Review of attachment 693956 [details] [diff] [review]: ----------------------------------------------------------------- ::: b2g.sh @@ +10,5 @@ > +# > +# Important: this must be done before we set LD_PRELOAD, as this must run > +# without any dependency on /system/b2g. > +if [ ! -d /system/b2g ]; then > + echo "No /system/b2g directory. Attempting recovery." echo doesn't put any messages in the system log. We should also log to the system log when this happens. I think "log" is the command you'll want to use.
Comment on attachment 693889 [details] [diff] [review] implementation without code duplication #including .cpp files Review of attachment 693889 [details] [diff] [review]: ----------------------------------------------------------------- All of my comments are really nits, so r=me with nits addressed. ::: recover_from_backup.cpp @@ +15,5 @@ > + */ > + > +#include "sys/cdefs.h" // for __BEGIN_DECLS needed under automounter_gonk.cpp > + > +#include "automounter_gonk.cpp" I think its worth a comment explaining why you're #including the .cpp files @@ +16,5 @@ > + > +#include "sys/cdefs.h" // for __BEGIN_DECLS needed under automounter_gonk.cpp > + > +#include "automounter_gonk.cpp" > +#include "updatelogging.cpp" As far as I can tell, the only reason we're including this is because the automounter_gonk.cpp file calls LogFlush. We don't seem to call LogInit anywhere so you might be better off to just #define LogFlush() and remove the #include It might be even better to modify automounter_gonk.cpp to do: #if !defined(LogFlush) #define LogFlush() #endif and an appropriate comment @@ +24,5 @@ > +#include <sys/types.h> > +#include <sys/stat.h> > +#include <unistd.h> > + > +#undef LOG_TAG It's probably worth mentioning that we're #undef ing these because we #included automounter_gonk.cpp @@ +45,5 @@ > +int main() > +{ > + struct stat buf; > + > + if (stat(system_b2g_path, &buf) == 0) { Do we care if /system/b2g isn't a directory? Probably not, since we're only trying to cover off the case that the updater failed, and I'm not sure if we're trying to be super paranoid here or not. @@ +71,5 @@ > + "We're probably going to fail, but we have nothing to lose trying...\n"); > + // continue, we have nothing to lose. > + } > + > + if (rename(system_b2g_backup_path, system_b2g_path) == -1) { You should use != 0 to be consistent with your comparisons to stat above
Attachment #693889 - Flags: review?(dhylands) → review+
(In reply to Michael Wu [:mwu] from comment #23) > Comment on attachment 693956 [details] [diff] [review] > Use it in b2g.sh to actually recover > > Review of attachment 693956 [details] [diff] [review]: > ----------------------------------------------------------------- ...snip... > echo doesn't put any messages in the system log. We should also log to the > system log when this happens. I think "log" is the command you'll want to > use. log -p E "Some message here" will log with error severity.
I think that this could all be done from the script level: if [ ! -d /system/b2g -a -d /system/b2g.bak ]; then log -p W "No /system/b2g directory. Attempting recovery." mount -w -o remount /system mv /system/b2g.bak /system/b2g mount -r -o remount /system if [ -d /system/b2g ]; then log "Recovery successful." else log -p E "Recovery failed." fi fi
Comment on attachment 693956 [details] [diff] [review] Use it in b2g.sh to actually recover Clearing review, waiting for a patch with better logging. Alternately, if dhylands' suggestion about using the mount command works, we should just do that.
Attachment #693956 - Flags: review?(mwu)
Yup, the suggestion in comment 26 works from quickly trying in |adb shell|. Looks like CC'ing Dave on this would have saved a bit of time :-P Turning comment 26 into a patch now.
Works! Tried to add meaningful logging.
Attachment #695018 - Flags: review?(dhylands)
Comment on attachment 695018 [details] [diff] [review] do it all in b2g.sh per comment 26 Review of attachment 695018 [details] [diff] [review]: ----------------------------------------------------------------- Looks reasonable to me
Attachment #695018 - Flags: review?(dhylands) → review+
Comment on attachment 695018 [details] [diff] [review] do it all in b2g.sh per comment 26 Review of attachment 695018 [details] [diff] [review]: ----------------------------------------------------------------- ::: b2g.sh @@ +4,5 @@ > mkdir -p $TMPDIR > chmod 1777 $TMPDIR > > +if [ ! -d /system/b2g ]; then > + And I'd probably remove these blank lines. @@ +11,5 @@ > + if [ -d /system/b2g.bak ]; then > + > + mount -w -o remount /system > + if [ $? -ne 0 ]; then > + log -p E "Failed to remount /system read-write" Actually, a tip here. You can do: if ! mount -w -o remount /system; then deal with error fi
Component: Application Update → Builds
Product: Toolkit → Boot2Gecko
Status: NEW → RESOLVED
Closed: 12 years ago
Resolution: --- → FIXED
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Created:
Updated:
Size: