Closed
Bug 1070442
Opened 10 years ago
Closed 10 years ago
Support building gaia apps in parallel
Categories
(Firefox OS Graveyard :: Gaia::Build, defect)
Firefox OS Graveyard
Gaia::Build
Tracking
(Not tracked)
RESOLVED
FIXED
People
(Reporter: mwu, Assigned: rickychien)
References
Details
Attachments
(5 files)
No reason to build one app at a time. Simply allowing apps to be built in parallel allows us to do a build in half the time on a 4 core cpu. It slows down single threaded builds, but it's worth it since nobody is building on a single core computer.
Reporter | ||
Comment 1•10 years ago
|
||
Had to rewrite some rules to make things parallel safe, but everything seems to work here. I see a consistent 2x speed up when doing -j8.
Attachment #8492540 -
Flags: review?(yurenju)
Comment 2•10 years ago
|
||
Comment on attachment 8492540 [details]
[PR] Support building gaia apps in parallel
:mwu,
since we are migrating build script from Makefile to xpcshell scripts, I don't expect we should implement more features in Makefile.
for parallel building apps, I think we can do that in xpcshell script by chrome worker[1].
also needinfo Alex since he may use chrome worker before.
[1] https://developer.mozilla.org/en-US/docs/Mozilla/JavaScript_code_modules/Using_workers_in_JavaScript_code_modules
Attachment #8492540 -
Flags: review?(yurenju)
Flags: needinfo?(poirot.alex)
Comment 3•10 years ago
|
||
Yes, we have been migrating most of the build logic to JS over the past months and this changeset goes backward in this process.
We have as goal to be able to build gaia from Firefox, without requiring to have make or any special environment.
But the feature itself is meant to be implemented in JS.
We used to build in parralel in the past, we should redo the same in JS.
Flags: needinfo?(poirot.alex)
Comment 4•10 years ago
|
||
Please balance the needs of "everything should run in firefox" with "this should not be stupidly slow to build gaia". We need parallel app builds, and proper dependency support to not rebuild the world every time, all things we lost when moving to not use Make.
Comment 5•10 years ago
|
||
The bottom line I think is the mwu patch helps *now*. Whatever you refactor, make it as good.
Comment 6•10 years ago
|
||
I couldn't agree more on this and should have been more strict during this refactoring...
But given the current state of the build system I think there is more to win by landing bug 997004
rather than this patch.
Note that there is a subtancial amount of time spent in spawning multiple instances of xpcshell,
we have seen a significant win in running only once for all apps.
I tried to evaluate this patch but got exceptions while trying to run it:
run-js-command gaia/post-app
Exception: [Exception... "Component returned failure code: 0x80520006 (NS_ERROR_FILE_TARGET_DOES_NOT_EXIST) [nsIFile.copyTo]" nsresult: "0x80520006 (NS_ERROR_FILE_TARGET_DOES_NOT_EXIST)" location: "JS frame :: resource://gre/modules/commonjs/toolkit/loader.js -> file:///home/alex/gaia/build/copy-build-stage-data.js :: copySettingsJStoProfile :: line 85" data: no]
copySettingsJStoProfile@resource://gre/modules/commonjs/toolkit/loader.js -> file:///home/alex/gaia/build/copy-build-stage-data.js:85:2
execute@resource://gre/modules/commonjs/toolkit/loader.js -> file:///home/alex/gaia/build/copy-build-stage-data.js:102:4
execute@resource://gre/modules/commonjs/toolkit/loader.js -> file:///home/alex/gaia/build/post-app.js:11:2
CommonjsRunner.prototype.run@/home/alex/gaia/build/xpcshell-commonjs.js:109:4
run@/home/alex/gaia/build/xpcshell-commonjs.js:124:2
@-e:1:0
Assignee | ||
Comment 8•10 years ago
|
||
Alex, do you have an example for xpcshell thread? I'm interesting in this bug. Thanks!
Flags: needinfo?(poirot.alex)
Comment 9•10 years ago
|
||
We mentioned with Yuren ChromeWorker in the past, but the issue is that you can't use XPCOM in them,
so that you can't read'n write files... That's not handy for a build system script!
https://developer.mozilla.org/en-US/docs/Web/API/ChromeWorker
Flags: needinfo?(poirot.alex)
Comment 10•10 years ago
|
||
Here is a quick'n dirty patch to do execute each app build script in parallel.
It isn't as parallel as makefile as it won't try to paralellize everything,
but just app build script. For ex, it won't try to copy all naive apps
that don't have any build script in parallel.
This patch allows to save only 4s on my laptop (32 to 28).
I'm not sure the x2 speed win described in comment 1 is real,
the patch in attachment 8492540 [details] throws during post-app and stops the build process sooner.
Ricky, as I said during the meeting, I'm not sure parallel building is
the key work regarding performances. Bugs like bug 969215 are way more important for ex.
I looked at the current state of the build system.
That would be great to investigate in each step and see why it takes so long:
pre-app takes 2s (isn't that much compared to other steps, but still pre-app are very basic stuff)
app - 13s (that looks quite fast, I would have expected way more!)
post-app - 10s (that looks like a lot, and will be executed each time you call $make!)
Within post-app, the most consuming steps are:
media - 1s
optimize - 2s
zip - 7s (there must be something wrong here!)
So, feel free to take over this patch.
But I would also suggest you to look at why webapp-zip takes so much time
and investigate here and there if we can optimize the running speed of each step!
Comment 11•10 years ago
|
||
Currently, webapp-zip pick file into zip one by one, I think there should be a better way to compress just one time.
Assignee | ||
Comment 12•10 years ago
|
||
Cool! Thanks you alex.
After I applied you patch, some errors occurred in my mac at JSON.parse [1]. However, I fixed it by giving a empty object to manifest.
Surprised that it saves lots of time in my mac reducing from 33s to 14s. What a huge performance improvement!
[1] https://github.com/RickyChien/gaia/blob/3c901411771b4854c07c300ebcded0a793c625ed/build/webapp-optimize.js#L253
Comment 13•10 years ago
|
||
Ricky, can you use this script to verify your profile is still correctly built?
If the build time to really short, it may throw somewhere and just ignore a part of the build system...
This script allows to compare the profile built from two gaia revision and allow to spot any regression easily!
You have to pass two git revision, one with your modification and one just before:
$ cd gaia
$ git log
$ ./diff-build.sh edbce405a3dc087520df73207e204dc9a8d0f803 1910ed84c3aeb90f7a125d124adc3876e259c760
(on my branch, these changesets are going to test my parallel build patch)
Assignee | ||
Comment 14•10 years ago
|
||
If I switch to |GAIA_OPTIMIZE=1 PRODUCTION=1 make| in diff-build.sh, it got a diff at webapps/webapps.json since these "installTime" and "updateTime" are different.
However, I'm unable to build with |LOCALE_BASEDIR ...|. Because another fail occurred.
Comment 15•10 years ago
|
||
(In reply to Ricky Chien [:rickychien] from comment #14)
> If I switch to |GAIA_OPTIMIZE=1 PRODUCTION=1 make| in diff-build.sh, it got
> a diff at webapps/webapps.json since these "installTime" and "updateTime"
> are different.
Yes, differences in webapps.json are expected.
I tried to submit a patch to make it reproducible, see bug 1020259.
Assignee | ||
Comment 16•10 years ago
|
||
Sorry I mess up the code in your patch, my result is same as yours.
Assignee | ||
Comment 17•10 years ago
|
||
Current status:
I moved all post-app parts to each app in order to speed up parallel build. Most of changes are pretty dirty and it still has errors in patch.
Errors caused incomplete profile/webapps/<app> that still have to spend some time investigating the reason.
Reporter | ||
Comment 18•10 years ago
|
||
Thanks for looking at this Ricky. However, Makefile integration is still necessary to some degree. For builds driven by make, you want to take advantage of the jobserver. This ensures that when gaia is being built as part of a complete system, the gaia build system doesn't use more resources than allocated for building.
Flags: needinfo?(mwu)
Assignee | ||
Comment 19•10 years ago
|
||
Hi Michael, if you don't mind, I'd like to take this bug. :)
Flags: needinfo?(mwu)
Assignee | ||
Comment 20•10 years ago
|
||
Hi Doug Sherk,
I ran into a bug in incorrect build by |make APP=callscreen|. If you look at build_stage/callscreen/index.html L80, aria-label="Hang Up" attribute doesn't show. But running |make| can make it wonderful.
I guess that extracting callscreen from communications but loading resources from ../communications/dialer/ without maintaining own locales caused non-localized files. I think it isn't a good way so that build system cannot perform properly.
Parallel build manages to build each app singly. Therefore, it may occur such errors. I want to make sure this problem and hope we can maintain the locales folder in callscreen.
If that is reasonable, I'm going to file a bug for this. :)
Flags: needinfo?(drs.bugzilla)
Reporter | ||
Comment 22•10 years ago
|
||
No need to ask - you've already taken it. I'll file another bug for better integration with make at a later point if you don't do it here.
Assignee: mwu → ricky060709
Flags: needinfo?(mwu)
Assignee | ||
Comment 24•10 years ago
|
||
Currently, we can reduce build time approximately 10s from 32s to 22s.
Assignee | ||
Comment 25•10 years ago
|
||
Comment on attachment 8512697 [details]
Gaia PR
Updated PR and set r? to gduan.
Attachment #8512697 -
Attachment description: Gaia PR WIP → Gaia PR
Attachment #8512697 -
Flags: review?(gduan)
Comment 26•10 years ago
|
||
Comment on attachment 8512697 [details]
Gaia PR
Good job!
I left some comments in github.
I assume you have compare file list and contents with/without your patch, including normal make, GAIA_OPTIMIZE=1 and please also verify it with LOCALE_BASEDIR in case. And I'm also wondering we should turn parallel build as default.
Attachment #8512697 -
Flags: review?(gduan)
Assignee | ||
Comment 27•10 years ago
|
||
LOCALE_BASEDIR case result:
Files 353ed10/webapps/settings.gaiamobile.org/application.unzip/resources/gaia_commit.txt and 0888735/webapps/settings.gaiamobile.org/application.unzip/resources/gaia_commit.txt differ
Files 353ed10/webapps/webapps.json and 0888735/webapps/webapps.json differ
Only in 0888735/webapps: {03ec926d-c463-5f45-8ae0-55e5467ecf83}
Only in 353ed10/webapps: {5c6e3130-6625-5549-af98-62aaf6d88bb4}
Only in 353ed10/webapps: {60c8641d-fa5c-b54f-a685-f050f0c1be3c}
Only in 0888735/webapps: {8292f5a2-7613-9249-b086-9dc0f47a088f}
all differences seem to be insignificant.
Assignee | ||
Updated•10 years ago
|
Attachment #8512697 -
Flags: review?(gduan)
Assignee | ||
Updated•10 years ago
|
Status: NEW → ASSIGNED
Assignee | ||
Comment 28•10 years ago
|
||
PR updated! It included parallel build flag P=1 for switching parallel build and added a workaround for callscreen in order to fix unit test failures.
Comment 29•10 years ago
|
||
Comment on attachment 8512697 [details]
Gaia PR
Hi Douglas,
Ricky has done some workaround in callscreen's app. could we have your feedback on it?
Attachment #8512697 -
Flags: feedback?(drs.bugzilla)
Assignee | ||
Comment 30•10 years ago
|
||
Try is all green. \o/
https://treeherder.mozilla.org/ui/#/jobs?repo=try&revision=57e6c0396c95
Comment 31•10 years ago
|
||
Comment on attachment 8512697 [details]
Gaia PR
For starters, since this change impacts the Dialer component, I should be providing a full review and not just feedback, unless you don't think that this is ready for that yet.
It looks like you copied and pasted all of the locales files that are currently in ./apps/communications/dialer/locales/ to ./apps/callscreen/locales/. Unfortunately, I don't think that this is a suitable fix as it really hurts our maintainability. We may need to repeat some strings in some cases due to differences in context, but in general, we should be sharing these strings instead.
Ideally, we would move all shared strings into ./shared/locales/dialer/, and have all single-app strings in their app's locales folder. However, figuring out which strings are being shared or not is quite an undertaking and hasn't been worth the effort for us up until now.
Francesco, can you verify this and provide any insight here?
The structure of the locales files here has bitten us a few times recently, so I'll put fixing this on our roadmap. I've filed bug 1093267 for this. I want to get to it next sprint, but we may not be able to. I would suggest going with a fix that doesn't require the string changes for now, and waiting for bug 1093267 to land before going with this patch.
Flags: needinfo?(francesco.lodolo)
Attachment #8512697 -
Flags: feedback?(drs.bugzilla) → review-
Comment 32•10 years ago
|
||
I completely agree, bug 1093267 is the correct way to approach this issue.
I also confess that I don't understand the apps/callscreen/locales/shared.en-US.properties shared part: en-US has 1 string, all the other files have ~20 (I'm really looking forward to getting rid of the extra locales).
Flags: needinfo?(francesco.lodolo)
Comment 33•10 years ago
|
||
Comment on attachment 8512697 [details]
Gaia PR
I think we still have two things to close this bug.
1. parse appOptions as string as Alex's suggestion, instead of use env.
2. workaround for comment 31: we probably need to bypass communications app when opening new process for parallel build, and remove it once bug 1093267 is closed.
Attachment #8512697 -
Flags: review?(gduan)
Assignee | ||
Comment 34•10 years ago
|
||
I think this patch is good to land since this issue mentioned on comment 20 was not only occurred in this patch but also in master over a long period of time and parallel build feature will enable by giving |make P=1| so that it seems not to impact on original build behavior.
If you're ok, we're going to remove dependencies (bug 1091348 & bug 1093267) and land it. :)
Flags: needinfo?(drs.bugzilla)
Comment 35•10 years ago
|
||
Thanks, I didn't realize that I had filed a bug for this already, so I reso dup'd bug 1091348 to bug 1093267. There's really only one issue here.
As for landing this, we can't, as this makes the situation with the Callscreen worse when using |make install-gaia|. Previously, |APP=callscreen make install-gaia| was broken but |make install-gaia| worked. With this patch, both are broken. We will have to find a workaround for now, or wait for bug 1093267 to be fixed.
No longer depends on: 1091348
Flags: needinfo?(drs.bugzilla)
Assignee | ||
Comment 36•10 years ago
|
||
Comment on attachment 8512697 [details]
Gaia PR
After offline discuss with George. We decided to add a workaround (it will be removed after bug1093267 is landed) for callscreen in build script for ensuring we do not break multilocale on callscreen.
PR has been updated! I'm sure callscreen work properly with new patch. Please try |make| again.
Attachment #8512697 -
Flags: review?(gduan)
Attachment #8512697 -
Flags: review-
Attachment #8512697 -
Flags: feedback?(drs.bugzilla)
Comment 37•10 years ago
|
||
Comment on attachment 8512697 [details]
Gaia PR
This seems fine to me. Thanks for adding the workaround. I'd suggest dropping the dependency on bug 1093267 here, filing a new bug in Gaia::Build to remove the workaround introduced here, and making it dependent on bug 1093267.
Attachment #8512697 -
Flags: feedback?(drs.bugzilla) → feedback+
Comment 38•10 years ago
|
||
Comment on attachment 8512697 [details]
Gaia PR
Great work! r=gduan
Attachment #8512697 -
Flags: review?(gduan) → review+
Assignee | ||
Comment 39•10 years ago
|
||
Final test before landing
Gaia-try:
https://treeherder.mozilla.org/ui/#/jobs?repo=gaia-try&revision=3fbcf5a9b022
Try:
https://treeherder.mozilla.org/ui/#/jobs?repo=try&revision=8c089cef55b9
Assignee | ||
Comment 40•10 years ago
|
||
Status: ASSIGNED → RESOLVED
Closed: 10 years ago
Resolution: --- → FIXED
Comment 41•10 years ago
|
||
sorry had to revert this for bustage like https://treeherder.mozilla.org/ui/logviewer.html#?job_id=783660&repo=b2g-inbound
Status: RESOLVED → REOPENED
Resolution: FIXED → ---
Comment 42•10 years ago
|
||
So I think what happen is that multilocale is colliding when run in parrallel against multiple apps.
Because of shared locales.
It is surprising we only get an exception on windows, but I'm expecting weird things to happen on other platforms as well.
We should be more careful about shared locales handling within build/multilocale.js.
First we shouldn't cleanup shared locales on each app call to multilocale. Only once, after all apps are built.
Or may be we shouldn't clean them up at all? And just prevent shipping them in zip file??
Then I imagine we should also tweak getL10nRessources in order to fetch already computed shared locale and prevent overriding them.
Assignee | ||
Comment 43•10 years ago
|
||
But our patch disables parallel build by default, it shouldn't break.
George found bug 1046514 that maybe relate to here. nsIFile.exists() and nsIFiles.isDirectory() work improperly on windows so that we remove a non-exist dir or file.
Assignee | ||
Comment 44•10 years ago
|
||
This bug is always reproducible on Windows. Moreover, we can wrap try catch in every risk API.
Assignee | ||
Comment 45•10 years ago
|
||
Attachment #8519891 -
Flags: review?(gduan)
Assignee | ||
Comment 46•10 years ago
|
||
There are some failures arose on windows because of nsIFile.remove behaviors. However, I cannot find out any reason after investigating all day. So I currently remove part of build_stage clean behaviors ex: cleanLocaleFiles() and clean-build-files.js and make sure we still can package correct files into zip.
I didn't seen that such errors happen again on windows and also ./diff-build.sh to check differences are correct.
Assignee | ||
Comment 47•10 years ago
|
||
Comment 48•10 years ago
|
||
(In reply to Ricky Chien [:rickychien] from comment #43)
> But our patch disables parallel build by default, it shouldn't break.
>
> George found bug 1046514 that maybe relate to here. nsIFile.exists() and
> nsIFiles.isDirectory() work improperly on windows so that we remove a
> non-exist dir or file.
The error we meet is actually NS_ERROR_FILE_ACCESS_DENIED when calling nsiFile.isDirectory().
Comment 49•10 years ago
|
||
Comment on attachment 8519891 [details]
Gaia PR
r=gduan,
I think it's a right way to filter those non-necessary files to zip instead of wasting time to remove them.
Ricky has compared file list and contents with/without his patch and most of it is ok. Some locales.properties files it doesn't actually remove when GAIA_CONCAT_LOCALES=1, I think we should file a follow-up bug to handle it and check why the locales lists are different bet linux and mac on master branch.
Attachment #8519891 -
Flags: review?(gduan) → review+
Assignee | ||
Comment 50•10 years ago
|
||
Thank you George!
merged.
https://github.com/mozilla-b2g/gaia/commit/d18a82562c3144e0eef3dddb61d76f53052eea00
Status: REOPENED → RESOLVED
Closed: 10 years ago → 10 years ago
Resolution: --- → FIXED
You need to log in
before you can comment on or make changes to this bug.
Description
•