Closed Bug 1070442 Opened 10 years ago Closed 10 years ago

Support building gaia apps in parallel

Categories

(Firefox OS Graveyard :: Gaia::Build, defect)

defect
Not set
normal

Tracking

(Not tracked)

RESOLVED FIXED

People

(Reporter: mwu, Assigned: rickychien)

References

Details

Attachments

(5 files)

No reason to build one app at a time. Simply allowing apps to be built in parallel allows us to do a build in half the time on a 4 core cpu. It slows down single threaded builds, but it's worth it since nobody is building on a single core computer.
Had to rewrite some rules to make things parallel safe, but everything seems to work here. I see a consistent 2x speed up when doing -j8.
Attachment #8492540 - Flags: review?(yurenju)
Comment on attachment 8492540 [details] [PR] Support building gaia apps in parallel :mwu, since we are migrating build script from Makefile to xpcshell scripts, I don't expect we should implement more features in Makefile. for parallel building apps, I think we can do that in xpcshell script by chrome worker[1]. also needinfo Alex since he may use chrome worker before. [1] https://developer.mozilla.org/en-US/docs/Mozilla/JavaScript_code_modules/Using_workers_in_JavaScript_code_modules
Attachment #8492540 - Flags: review?(yurenju)
Flags: needinfo?(poirot.alex)
Yes, we have been migrating most of the build logic to JS over the past months and this changeset goes backward in this process. We have as goal to be able to build gaia from Firefox, without requiring to have make or any special environment. But the feature itself is meant to be implemented in JS. We used to build in parralel in the past, we should redo the same in JS.
Flags: needinfo?(poirot.alex)
Please balance the needs of "everything should run in firefox" with "this should not be stupidly slow to build gaia". We need parallel app builds, and proper dependency support to not rebuild the world every time, all things we lost when moving to not use Make.
The bottom line I think is the mwu patch helps *now*. Whatever you refactor, make it as good.
I couldn't agree more on this and should have been more strict during this refactoring... But given the current state of the build system I think there is more to win by landing bug 997004 rather than this patch. Note that there is a subtancial amount of time spent in spawning multiple instances of xpcshell, we have seen a significant win in running only once for all apps. I tried to evaluate this patch but got exceptions while trying to run it: run-js-command gaia/post-app Exception: [Exception... "Component returned failure code: 0x80520006 (NS_ERROR_FILE_TARGET_DOES_NOT_EXIST) [nsIFile.copyTo]" nsresult: "0x80520006 (NS_ERROR_FILE_TARGET_DOES_NOT_EXIST)" location: "JS frame :: resource://gre/modules/commonjs/toolkit/loader.js -> file:///home/alex/gaia/build/copy-build-stage-data.js :: copySettingsJStoProfile :: line 85" data: no] copySettingsJStoProfile@resource://gre/modules/commonjs/toolkit/loader.js -> file:///home/alex/gaia/build/copy-build-stage-data.js:85:2 execute@resource://gre/modules/commonjs/toolkit/loader.js -> file:///home/alex/gaia/build/copy-build-stage-data.js:102:4 execute@resource://gre/modules/commonjs/toolkit/loader.js -> file:///home/alex/gaia/build/post-app.js:11:2 CommonjsRunner.prototype.run@/home/alex/gaia/build/xpcshell-commonjs.js:109:4 run@/home/alex/gaia/build/xpcshell-commonjs.js:124:2 @-e:1:0
Hey! Any progress here?
Flags: needinfo?(mwu)
Alex, do you have an example for xpcshell thread? I'm interesting in this bug. Thanks!
Flags: needinfo?(poirot.alex)
We mentioned with Yuren ChromeWorker in the past, but the issue is that you can't use XPCOM in them, so that you can't read'n write files... That's not handy for a build system script! https://developer.mozilla.org/en-US/docs/Web/API/ChromeWorker
Flags: needinfo?(poirot.alex)
Attached patch parallel app build scripts (deleted) — Splinter Review
Here is a quick'n dirty patch to do execute each app build script in parallel. It isn't as parallel as makefile as it won't try to paralellize everything, but just app build script. For ex, it won't try to copy all naive apps that don't have any build script in parallel. This patch allows to save only 4s on my laptop (32 to 28). I'm not sure the x2 speed win described in comment 1 is real, the patch in attachment 8492540 [details] throws during post-app and stops the build process sooner. Ricky, as I said during the meeting, I'm not sure parallel building is the key work regarding performances. Bugs like bug 969215 are way more important for ex. I looked at the current state of the build system. That would be great to investigate in each step and see why it takes so long: pre-app takes 2s (isn't that much compared to other steps, but still pre-app are very basic stuff) app - 13s (that looks quite fast, I would have expected way more!) post-app - 10s (that looks like a lot, and will be executed each time you call $make!) Within post-app, the most consuming steps are: media - 1s optimize - 2s zip - 7s (there must be something wrong here!) So, feel free to take over this patch. But I would also suggest you to look at why webapp-zip takes so much time and investigate here and there if we can optimize the running speed of each step!
Currently, webapp-zip pick file into zip one by one, I think there should be a better way to compress just one time.
Cool! Thanks you alex. After I applied you patch, some errors occurred in my mac at JSON.parse [1]. However, I fixed it by giving a empty object to manifest. Surprised that it saves lots of time in my mac reducing from 33s to 14s. What a huge performance improvement! [1] https://github.com/RickyChien/gaia/blob/3c901411771b4854c07c300ebcded0a793c625ed/build/webapp-optimize.js#L253
Attached file diff-build.sh (deleted) —
Ricky, can you use this script to verify your profile is still correctly built? If the build time to really short, it may throw somewhere and just ignore a part of the build system... This script allows to compare the profile built from two gaia revision and allow to spot any regression easily! You have to pass two git revision, one with your modification and one just before: $ cd gaia $ git log $ ./diff-build.sh edbce405a3dc087520df73207e204dc9a8d0f803 1910ed84c3aeb90f7a125d124adc3876e259c760 (on my branch, these changesets are going to test my parallel build patch)
If I switch to |GAIA_OPTIMIZE=1 PRODUCTION=1 make| in diff-build.sh, it got a diff at webapps/webapps.json since these "installTime" and "updateTime" are different. However, I'm unable to build with |LOCALE_BASEDIR ...|. Because another fail occurred.
(In reply to Ricky Chien [:rickychien] from comment #14) > If I switch to |GAIA_OPTIMIZE=1 PRODUCTION=1 make| in diff-build.sh, it got > a diff at webapps/webapps.json since these "installTime" and "updateTime" > are different. Yes, differences in webapps.json are expected. I tried to submit a patch to make it reproducible, see bug 1020259.
Sorry I mess up the code in your patch, my result is same as yours.
Attached file Gaia PR (deleted) —
Current status: I moved all post-app parts to each app in order to speed up parallel build. Most of changes are pretty dirty and it still has errors in patch. Errors caused incomplete profile/webapps/<app> that still have to spend some time investigating the reason.
Thanks for looking at this Ricky. However, Makefile integration is still necessary to some degree. For builds driven by make, you want to take advantage of the jobserver. This ensures that when gaia is being built as part of a complete system, the gaia build system doesn't use more resources than allocated for building.
Flags: needinfo?(mwu)
Hi Michael, if you don't mind, I'd like to take this bug. :)
Flags: needinfo?(mwu)
Hi Doug Sherk, I ran into a bug in incorrect build by |make APP=callscreen|. If you look at build_stage/callscreen/index.html L80, aria-label="Hang Up" attribute doesn't show. But running |make| can make it wonderful. I guess that extracting callscreen from communications but loading resources from ../communications/dialer/ without maintaining own locales caused non-localized files. I think it isn't a good way so that build system cannot perform properly. Parallel build manages to build each app singly. Therefore, it may occur such errors. I want to make sure this problem and hope we can maintain the locales folder in callscreen. If that is reasonable, I'm going to file a bug for this. :)
Flags: needinfo?(drs.bugzilla)
No need to ask - you've already taken it. I'll file another bug for better integration with make at a later point if you don't do it here.
Assignee: mwu → ricky060709
Flags: needinfo?(mwu)
Depends on: 1091348
Flags: needinfo?(drs.bugzilla)
Currently, we can reduce build time approximately 10s from 32s to 22s.
Comment on attachment 8512697 [details] Gaia PR Updated PR and set r? to gduan.
Attachment #8512697 - Attachment description: Gaia PR WIP → Gaia PR
Attachment #8512697 - Flags: review?(gduan)
Comment on attachment 8512697 [details] Gaia PR Good job! I left some comments in github. I assume you have compare file list and contents with/without your patch, including normal make, GAIA_OPTIMIZE=1 and please also verify it with LOCALE_BASEDIR in case. And I'm also wondering we should turn parallel build as default.
Attachment #8512697 - Flags: review?(gduan)
LOCALE_BASEDIR case result: Files 353ed10/webapps/settings.gaiamobile.org/application.unzip/resources/gaia_commit.txt and 0888735/webapps/settings.gaiamobile.org/application.unzip/resources/gaia_commit.txt differ Files 353ed10/webapps/webapps.json and 0888735/webapps/webapps.json differ Only in 0888735/webapps: {03ec926d-c463-5f45-8ae0-55e5467ecf83} Only in 353ed10/webapps: {5c6e3130-6625-5549-af98-62aaf6d88bb4} Only in 353ed10/webapps: {60c8641d-fa5c-b54f-a685-f050f0c1be3c} Only in 0888735/webapps: {8292f5a2-7613-9249-b086-9dc0f47a088f} all differences seem to be insignificant.
Attachment #8512697 - Flags: review?(gduan)
Status: NEW → ASSIGNED
PR updated! It included parallel build flag P=1 for switching parallel build and added a workaround for callscreen in order to fix unit test failures.
Comment on attachment 8512697 [details] Gaia PR Hi Douglas, Ricky has done some workaround in callscreen's app. could we have your feedback on it?
Attachment #8512697 - Flags: feedback?(drs.bugzilla)
Depends on: 1093267
Comment on attachment 8512697 [details] Gaia PR For starters, since this change impacts the Dialer component, I should be providing a full review and not just feedback, unless you don't think that this is ready for that yet. It looks like you copied and pasted all of the locales files that are currently in ./apps/communications/dialer/locales/ to ./apps/callscreen/locales/. Unfortunately, I don't think that this is a suitable fix as it really hurts our maintainability. We may need to repeat some strings in some cases due to differences in context, but in general, we should be sharing these strings instead. Ideally, we would move all shared strings into ./shared/locales/dialer/, and have all single-app strings in their app's locales folder. However, figuring out which strings are being shared or not is quite an undertaking and hasn't been worth the effort for us up until now. Francesco, can you verify this and provide any insight here? The structure of the locales files here has bitten us a few times recently, so I'll put fixing this on our roadmap. I've filed bug 1093267 for this. I want to get to it next sprint, but we may not be able to. I would suggest going with a fix that doesn't require the string changes for now, and waiting for bug 1093267 to land before going with this patch.
Flags: needinfo?(francesco.lodolo)
Attachment #8512697 - Flags: feedback?(drs.bugzilla) → review-
I completely agree, bug 1093267 is the correct way to approach this issue. I also confess that I don't understand the apps/callscreen/locales/shared.en-US.properties shared part: en-US has 1 string, all the other files have ~20 (I'm really looking forward to getting rid of the extra locales).
Flags: needinfo?(francesco.lodolo)
Comment on attachment 8512697 [details] Gaia PR I think we still have two things to close this bug. 1. parse appOptions as string as Alex's suggestion, instead of use env. 2. workaround for comment 31: we probably need to bypass communications app when opening new process for parallel build, and remove it once bug 1093267 is closed.
Attachment #8512697 - Flags: review?(gduan)
I think this patch is good to land since this issue mentioned on comment 20 was not only occurred in this patch but also in master over a long period of time and parallel build feature will enable by giving |make P=1| so that it seems not to impact on original build behavior. If you're ok, we're going to remove dependencies (bug 1091348 & bug 1093267) and land it. :)
Flags: needinfo?(drs.bugzilla)
Thanks, I didn't realize that I had filed a bug for this already, so I reso dup'd bug 1091348 to bug 1093267. There's really only one issue here. As for landing this, we can't, as this makes the situation with the Callscreen worse when using |make install-gaia|. Previously, |APP=callscreen make install-gaia| was broken but |make install-gaia| worked. With this patch, both are broken. We will have to find a workaround for now, or wait for bug 1093267 to be fixed.
No longer depends on: 1091348
Flags: needinfo?(drs.bugzilla)
Comment on attachment 8512697 [details] Gaia PR After offline discuss with George. We decided to add a workaround (it will be removed after bug1093267 is landed) for callscreen in build script for ensuring we do not break multilocale on callscreen. PR has been updated! I'm sure callscreen work properly with new patch. Please try |make| again.
Attachment #8512697 - Flags: review?(gduan)
Attachment #8512697 - Flags: review-
Attachment #8512697 - Flags: feedback?(drs.bugzilla)
Comment on attachment 8512697 [details] Gaia PR This seems fine to me. Thanks for adding the workaround. I'd suggest dropping the dependency on bug 1093267 here, filing a new bug in Gaia::Build to remove the workaround introduced here, and making it dependent on bug 1093267.
Attachment #8512697 - Flags: feedback?(drs.bugzilla) → feedback+
Comment on attachment 8512697 [details] Gaia PR Great work! r=gduan
Attachment #8512697 - Flags: review?(gduan) → review+
Status: ASSIGNED → RESOLVED
Closed: 10 years ago
Resolution: --- → FIXED
Status: RESOLVED → REOPENED
Resolution: FIXED → ---
So I think what happen is that multilocale is colliding when run in parrallel against multiple apps. Because of shared locales. It is surprising we only get an exception on windows, but I'm expecting weird things to happen on other platforms as well. We should be more careful about shared locales handling within build/multilocale.js. First we shouldn't cleanup shared locales on each app call to multilocale. Only once, after all apps are built. Or may be we shouldn't clean them up at all? And just prevent shipping them in zip file?? Then I imagine we should also tweak getL10nRessources in order to fetch already computed shared locale and prevent overriding them.
But our patch disables parallel build by default, it shouldn't break. George found bug 1046514 that maybe relate to here. nsIFile.exists() and nsIFiles.isDirectory() work improperly on windows so that we remove a non-exist dir or file.
This bug is always reproducible on Windows. Moreover, we can wrap try catch in every risk API.
Attached file Gaia PR (deleted) —
Attachment #8519891 - Flags: review?(gduan)
There are some failures arose on windows because of nsIFile.remove behaviors. However, I cannot find out any reason after investigating all day. So I currently remove part of build_stage clean behaviors ex: cleanLocaleFiles() and clean-build-files.js and make sure we still can package correct files into zip. I didn't seen that such errors happen again on windows and also ./diff-build.sh to check differences are correct.
(In reply to Ricky Chien [:rickychien] from comment #43) > But our patch disables parallel build by default, it shouldn't break. > > George found bug 1046514 that maybe relate to here. nsIFile.exists() and > nsIFiles.isDirectory() work improperly on windows so that we remove a > non-exist dir or file. The error we meet is actually NS_ERROR_FILE_ACCESS_DENIED when calling nsiFile.isDirectory().
Comment on attachment 8519891 [details] Gaia PR r=gduan, I think it's a right way to filter those non-necessary files to zip instead of wasting time to remove them. Ricky has compared file list and contents with/without his patch and most of it is ok. Some locales.properties files it doesn't actually remove when GAIA_CONCAT_LOCALES=1, I think we should file a follow-up bug to handle it and check why the locales lists are different bet linux and mac on master branch.
Attachment #8519891 - Flags: review?(gduan) → review+
Status: REOPENED → RESOLVED
Closed: 10 years ago10 years ago
Resolution: --- → FIXED
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Created:
Updated:
Size: