Closed Bug 932361 Opened 11 years ago Closed 11 years ago

Intermittent failure when installing simulator addon

Categories

(Toolkit :: Add-ons Manager, defect)

ARM
All
defect
Not set
normal

Tracking

()

VERIFIED FIXED
mozilla28
Tracking Status
firefox26 --- unaffected
firefox27 + verified
firefox28 + verified

People

(Reporter: ochameau, Assigned: mossop)

References

Details

Attachments

(1 file, 1 obsolete file)

In recent nightly, the simulator addon often fails to install with the following error:
*** ERROR addons.xpi: extractFilesAsync: failed to extract file /home/alex/.mozilla/firefox/76054eqo.default/extensions/staged/fxos_1_3_simulator@mozilla.org/resources/fxos_1_3_simulator/data/linux64/b2g/modules/services-crypto/WeaveCrypto.js: Unix error 24 during operation open (Too many open files)

I have only seen the addon fail to install on Mac.

Bug 562674 regression?
Certainly. This is a problem because we queue up lots of extractions at once I guess. I'm thinking that retrying opening the file in this case might do the trick unless you have another thought Blair?
Blocks: 562674
Flags: needinfo?(bmcbride)
Ooohhh..... because we queue up a bunch of OS.File operations, as one operation finishes we close the file - but I guess that ends up being queued too. So by the time it gets to processing that part of the queue, we'd have opened quite a number of file handles.

Hmm :\ Retrying should work, in theory (although it also means we're effectively doing a DoS on any other code trying to open files at the same time). Your original method for doing the extraction piece-by-peice would be safest, if less efficient.
Flags: needinfo?(bmcbride)
Yoric, maybe you should shed some light on this.

AFAIK, OS.File doesn't provide a way to safely and efficiently handle operations on a large number of files at once. What we're doing here currently is queuing up a lot of operations, instead of waiting for previous operations to complete before writing the next file. In extreme cases, that's ending up with too many file handles open. I had wanted to queue everything at once because waiting for every operation to complete and eventually pass from OS.File's worker to the main thread seems wasteful when we're dealing with a large number of files.
Flags: needinfo?(dteller)
Assignee: nobody → dtownsend+bugmail
I would suggest decompressing sequentially instead of attempting to maximize overlap between CPU and IO.

Rationale:
1. as we now see, this approach DoSes the process, which is probably not a good idea – in particular, I have no idea on the effect of such a DoS on, say, the disk cache, or Cu.import, etc, but I'm sure it doesn't look good;
2. this cannot be fixed entirely in OS.File as OS.File is not the only part of the process that requires file handles;
3. Dave's recent algorithm optimizes for throughput, which is good, but the maximization of overlap does not improve responsiveness, and there is a chance (I haven't checked) that it might actually hurt responsiveness by saturating CPU with the decompression algorithm – recall that in most cases, throughput barely improves UX while responsiveness is the most important factor;
4. there is also a chance that the overlapping algorithm may hurt the system in the long run by increasing disk thrashing (bad for the system and bad for battery) and fragmentation.

This being said, we may want to have OS.File retry opening if it failed due to too many open files, in case of contention with other subsystems. But I believe that would be the wrong solution to the current problem.
Flags: needinfo?(dteller)
I'll revert back to the old method (keeping the updating file access code). I'll also spin in support for bug 921229 and bug 923540
(In reply to Dave Townsend (:Mossop) from comment #5)
> I'll revert back to the old method (keeping the updating file access code).
> I'll also spin in support for bug 921229 and bug 923540

Actually I won't, since bug 921229 isn't on aurora and this patch needs to be :(
Attached patch patch rev 1 (obsolete) (deleted) — Splinter Review
This switches back to a Task to manage the extraction.

While I'm here I added a test consisting of an XPI containing a corrupt bootstrap.js file (I mangled the CRC-32 in the zip file).

In packed mode the add-ons manager doesn't notice though does fail to load the bootstrap script.

In unpacked mode we detect and recover sanely (yay!).

One weird thing, during install we send an onInstalling event for the new add-on but nothing for when the install failed. I added an onOperationCancelled event there but don't know what you think about that Blair?
Attachment #827697 - Flags: review?(bmcbride)
I am spinning that through try as we speak but I'm pretty confident it is going to pass.
Comment on attachment 827697 [details] [diff] [review]
patch rev 1

Review of attachment 827697 [details] [diff] [review]:
-----------------------------------------------------------------

*sigh* The best laid plans of mice and men...


(In reply to Dave Townsend (:Mossop) from comment #6)
> (In reply to Dave Townsend (:Mossop) from comment #5)
> > I'll revert back to the old method (keeping the updating file access code).
> > I'll also spin in support for bug 921229 and bug 923540
> 
> Actually I won't, since bug 921229 isn't on aurora and this patch needs to
> be :(

File a followup bug?


(In reply to Dave Townsend (:Mossop) from comment #7)
> One weird thing, during install we send an onInstalling event for the new
> add-on but nothing for when the install failed. I added an
> onOperationCancelled event there but don't know what you think about that
> Blair?

Sounds good to me.
Attachment #827697 - Flags: review?(bmcbride) → review+
(In reply to Blair McBride [:Unfocused] from comment #9)
> Comment on attachment 827697 [details] [diff] [review]
> patch rev 1
> 
> Review of attachment 827697 [details] [diff] [review]:
> -----------------------------------------------------------------
> 
> *sigh* The best laid plans of mice and men...
> 
> 
> (In reply to Dave Townsend (:Mossop) from comment #6)
> > (In reply to Dave Townsend (:Mossop) from comment #5)
> > > I'll revert back to the old method (keeping the updating file access code).
> > > I'll also spin in support for bug 921229 and bug 923540
> > 
> > Actually I won't, since bug 921229 isn't on aurora and this patch needs to
> > be :(
> 
> File a followup bug?

Bug 935189, already in your review queue ;)
Comment on attachment 827697 [details] [diff] [review]
patch rev 1

Or maybe this fails try runs :(
Attachment #827697 - Attachment is obsolete: true
Depends on: 935596
Attached patch patch rev 2 (deleted) — Splinter Review
Failed to close the zipreader after it was done. This passes try (with the patch from the dependant bug)
Attachment #828733 - Flags: review?(bmcbride)
Attachment #828733 - Flags: review?(bmcbride) → review+
https://hg.mozilla.org/mozilla-central/rev/38b01389507d
Status: NEW → RESOLVED
Closed: 11 years ago
Resolution: --- → FIXED
Target Milestone: --- → mozilla28
Comment on attachment 828733 [details] [diff] [review]
patch rev 2

[Approval Request Comment]
Bug caused by (feature/regressing bug #): bug 562674
User impact if declined: Some large add-ons may fail to install
Testing completed (on m-c, etc.): On m-c since before the weekend
Risk to taking this patch (and alternatives if risky): The risk here is pretty low (the bulk of the patch is whitespace only changes) and this adds an automated test to verify a behaviour that wasn't tested previously
String or IDL/UUID changes made by this patch: None
Attachment #828733 - Flags: approval-mozilla-aurora?
Keywords: verifyme
Attachment #828733 - Flags: approval-mozilla-aurora? → approval-mozilla-aurora+
Depends on: 940232
Verified as fixed using the latest Firefox 27 beta 1 (20131209204824) on Win 7 64-bit, Ubuntu 32-bit and Mac OS X 10.8.5. The initial issue was reproducible on Mac OS X.
QA Contact: petruta.rasa
Mozilla/5.0 (Macintosh; Intel Mac OS X 10.9; rv:28.0) Gecko/20100101 Firefox/28.0
Mozilla/5.0 (X11; Linux x86_64; rv:28.0) Gecko/20100101 Firefox/28.0
Mozilla/5.0 (Windows NT 6.1; WOW64; rv:28.0) Gecko/20100101 Firefox/28.0

Verified as fixed on latest Aurora 28.0a2 (Build ID: 20131212004008): no errors in the Console when installing Simulator addon.
Status: RESOLVED → VERIFIED
Keywords: verifyme
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Created:
Updated:
Size: