Closed Bug 1202006 Opened 9 years ago Closed 8 years ago

Blob XHRs kept in memory

Categories

(Core :: DOM: Core & HTML, defect)

24 Branch
defect
Not set
normal

Tracking

()

RESOLVED FIXED
mozilla52
Tracking Status
firefox52 --- fixed

People

(Reporter: azakai, Assigned: baku)

References

(Blocks 1 open bug)

Details

(Whiteboard: [MemShrink:P2])

Attachments

(7 files, 5 obsolete files)

(deleted), application/x-bzip
Details
(deleted), application/octet-stream
Details
(deleted), patch
smaug
: review+
Details | Diff | Splinter Review
(deleted), patch
smaug
: review+
Details | Diff | Splinter Review
(deleted), patch
smaug
: review+
Details | Diff | Splinter Review
(deleted), patch
smaug
: review+
Details | Diff | Splinter Review
(deleted), patch
smaug
: review+
Details | Diff | Splinter Review
Attached file emscripten_temp.tar.bz2 (deleted) —
Comparing memory usage between Firefox and Chrome, Firefox uses a lot more memory when receiving large Blob XHRs. They appear to be kept in memory, while in Chrome, they seem to be kept on disk.

The attachment has a simple testcase that runs in a worker, and fetches 300MB of binary data in a Blob XHR. It then uses FileReaderSync to read from various locations in the file. In Chrome, memory usage hardly rises when loading the testcase; in Firefox, there is over 300MB of memory allocated for the Blob, showing up as "memory-file-data/large". This is also noticeable when viewing the OS processes.

Chrome's behavior where Blob XHRs are kept on disk would be helpful for applications with massive datasets, that can't even fit in memory. And FileReaderSync allows us to read from Blobs in a convenient manner in workers, which together would let us use massive datasets easily.
There are downsides to the Chrome approach as well. Firefox takes less than a second to load and read from those data files, while Chrome takes 4.5 seconds. Of which 3 seconds is the actual reads which for Firefox is just 0.023 (i.e. 23 ms). In other words, Chrome does a lot more disk IO here. Times are similar on two machines, on with an SSD and one without.
We've talked about paging out large blobs to disk before, but it's never something we've felt necessary to implement.
I understand. It's hard for me to guess at what the best thing here is, but I am looking into ways to reduce memory usage for large applications (games, image editors, etc.), that have lots of data and currently are struggling to run on the web. Luke suggested Blob+FileReaderSync as one promising approach.

In some more testing now, I tried to load a Blob, then store it to IDB, then read it from there and use FileReaderSync. This still takes the full amount of memory at first, but once in IDB, it looks like we do actually keep the Blob on disk efficiently. However, the initial process of retrieving the file and storing it in IDB is very memory-intensive, perhaps there are even two copies in memory at some point?

In Chrome on the other hand, memory usage is almost flat during the whole process. It receives the XHR, stores it to IDB, loads it from there, and uses it, all without it ever being actually in memory. This does seem to support their approach here, I think.

Benchmarking the times also supports them: Unlike before, both browsers now do save the file on disk. Now the times are quite similar, suggesting that Chrome stores the XHR Blob and then repurposes it for IDB. So both browsers end up spending the time to write and then read, and Chrome avoids a double write.

I'll attach a testcase showing this.
Blocks: gecko-games
Whiteboard: [MemShrink]
Attached file emscripten_temp.tar.bz2 (deleted) —
Testcase of XHR => IDB => use Blob in FileReaderSync.
khuey: I don't think we need anything fancy like paging out Blobs that were initially created from an in-memory source (like Blob(ArrayBuffer)), just for Blobs retrieved from the network.
Yes, I think Luke is right, after thinking about this some more I believe the issue here is binary XHRs where the user requested 'arraybuffer' vs 'blob' as the xhr.responseType.

It looks like right now in Firefox we handle memory the same in both cases, so the two are in effect almost the same, but in Chrome, when the user asked for a Blob, it is actually a normal on-disk Blob and memory is not used for it. That seems like very useful behavior.
Whiteboard: [MemShrink] → [MemShrink:P2]
We used to store XHR-blobs on the disk (bug 649133), but the ability was removed (bug 725993 and 758296), How does Chrome prevent XHR-blobs from running-out the disk space?
Ah, good to know the backstory.  So, scanning bug 725993, it looks like there was every intention to use files to avoid the OOM issues we're seeing here and it was only removed because the HTTP cache was the wrong place for it (and it sounds like there was no quota checking in place at the time).  It also appears there was the intention to implement a new file-backed mechanism (shared by both WebSockets and XHR) but we just never got around to doing it b/c XHR+Blob were still new and so it was a lower priority.

I'm don't know what Chrome does, but I have to assume there is a quota mechanism to prevent disk exhaustion.
Perhaps we could use the same quota mechanism we already have for IndexedDB, precompiled asm.js binaries, etc.? (and if a blob xhr fails the quota, we'd store it in memory; and if it is on disk, we delete it when the page is closed)
This would have been useful for PluotSorbet (j2me.js) as well, where we implemented a FileSystem on top of IndexedDB.
We have an interesting implementation in EncodedBufferCache (See /dom/media/EncodedBufferCache.*) where we go from a MemoryBlobImpl to a TemporaryBlobImpl when the buffer size is bigger than 1mb. Maybe we can use the same approach here.
Are we still interested in having this feature?
Yes, definitely still interested in this feature.
Assignee: nobody → amarchesini
Attachment #8763117 - Flags: review?(bugs)
Attached patch part 2 - Memory to Temporary BlobImpl (obsolete) (deleted) — Splinter Review
Smaug, I don't know who else can review these patches. If you are too busy, just move to them to somebody else. Thanks!
Attachment #8763118 - Flags: review?(bugs)
Comment on attachment 8763118 [details] [diff] [review]
part 2 - Memory to Temporary BlobImpl

Review of attachment 8763118 [details] [diff] [review]:
-----------------------------------------------------------------

::: dom/workers/WorkerPrivate.h
@@ +1300,5 @@
>  
> +  uint32_t
> +  MaxMemoryForTemporaryFile() const
> +  {
> +    return mMaxMemoryForTemporaryFile;

This value is not initialized in the CTOR of WorkerPrivate. I did it. If you want I can upload a new patch with this change.
Attached patch part 2 - Memory to Temporary BlobImpl (obsolete) (deleted) — Splinter Review
Attachment #8763118 - Attachment is obsolete: true
Attachment #8763118 - Flags: review?(bugs)
Attachment #8763311 - Flags: review?(bugs)
The good thing of these patches is that it works also for normal blobs. This means that if the user does:

var a = ""; for (var i = 0; i < 2048; ++i) { a += "hello world!"; }
var blob = new Blob([a]);

this is written in a temporary file on disk.
but only if the size of the blob is > 1024000 ? And at which point to we write to disk? And do we do it even in private browsing mode?
How does the patch work in e10s, assuming the child process is sandboxed?
Comment on attachment 8763311 [details] [diff] [review]
part 2 - Memory to Temporary BlobImpl

This is doing main thread I/O. We don't want that.
Attachment #8763311 - Flags: review?(bugs) → review-
(In reply to Olli Pettay [:smaug] (high review load, please consider other reviewers) from comment #19)
> How does the patch work in e10s, assuming the child process is sandboxed?

Right. We have to support private Browsing. For e10s, if the I/O fails, we continue in memory. This should be enough.

About the I/O on main-thread, yes... I can try to move everything to an I/O thread. The code will be a bit more complex because I must play with mutex and refcounting, but it makes sense to try.
(In reply to Andrea Marchesini (:baku) from comment #21)
> Right. We have to support private Browsing. For e10s, if the I/O fails, we
> continue in memory. This should be enough.
Is it? Don't we plan to have sandboxed content processes everywhere, in which case the code wouldn't ever succeed to store the data in a file, except when chrome js uses Blobs.
Comment on attachment 8763117 [details] [diff] [review]
part 1 - BlobSet - indentation, moving code, etc

>+
>+  if (!bufferLen.isValid())
>+    return false;
>+
>+  void* data = realloc(mData, bufferLen.value());
>+  if (!data)
>+    return false;
>+

since you're anyhow doing random other changes too, want to put {} to these ifs
Attachment #8763117 - Flags: review?(bugs) → review+
(In reply to Olli Pettay [:smaug] (high review load, please consider other reviewers) from comment #19)
> How does the patch work in e10s, assuming the child process is sandboxed?

Good news: NS_OpenAnonymousTemporaryFile (which the patch uses) already knows how to message the parent to do the filesystem operations and send back a file descriptor/handle.

Bad news: it does sync IPC and main thread I/O, even if called off-main-thread in the content process; see bug 1149313.
about the bad news: do you think this should be handled in this patch, or I can propose a patch as I wrote and leave the main-thread I/O issue for bug 1149313 ?
Flags: needinfo?(jld)
My inclination would be to just use NS_OpenAnonymousTemporaryFile as-is and defer fixing it to bug 1149313 — it's possible that the longer-term fix might be to give content processes access to a writable temporary directory because they need it anyway on desktop.  But people with more experience on the DOM side might be more concerned about the blocking/syncness here.  (To clarify, that's only about creating/opening the file; the writes themselves can be done off-main-thread in the child and won't block anything else.)
Flags: needinfo?(jld)
Depends on: 1281748
Lufi [1], a file upload/download service, is limited by this behavior. It is downloading a file from encrypted chunks on a server (through websocket), decrypting each one and storing the result in a memory object. Then it creates a blob from this memory object for the user to download. The problem is that the memory used is twice the size of the downloaded object during the blob creation, and it hits the available computer memory quite often on big files (don't even mentioning 32bit systems). I think solving this ticket would be of much help for this service. Or is there another way to do it?

Lufi is used by framasoft for their framadrop [2] service, as part of their "De-google-ify Internet" [3] initiative. This service is a libre alternative to wetransfer or mega.

[1] https://framagit.org/luc/lufi/
[2] https://framadrop.org/
[3] https://degooglisons-internet.org/
Attachment #8763117 - Attachment is obsolete: true
Attached patch part 1 - MutableBlobStorage for XHR (obsolete) (deleted) — Splinter Review
At the moment this is just a subset of what BlobSet does.
Attachment #8763311 - Attachment is obsolete: true
Attachment #8792922 - Flags: review?(bugs)
Attachment #8792924 - Flags: review?(bugs)
Attachment #8792922 - Attachment is obsolete: true
Attachment #8792922 - Flags: review?(bugs)
Attachment #8792937 - Flags: review?(bugs)
Comment on attachment 8792937 [details] [diff] [review]
part 1 - MutableBlobStorage for XHR

Though, do we want to keep using BlobSet with moz_blob.
Attachment #8792937 - Flags: review?(bugs) → review+
Attachment #8792924 - Flags: review?(bugs) → review+
Attachment #8793830 - Flags: review?(bugs)
Attached patch part 4 - temp files (deleted) — Splinter Review
Attachment #8793831 - Flags: review?(bugs)
Comment on attachment 8793830 [details] [diff] [review]
part 3 - BlobSet and MutableBlobStorage for XHR

odd, I thought I had reviewed this before.
Attachment #8793830 - Flags: review?(bugs) → review+
Comment on attachment 8793831 [details] [diff] [review]
part 4 - temp files


>+    // We want to wait until all the WriteRunnable are completed. The way we do
>+    // this is to go to the I/O thread and then we come back: the runnables are
>+    // executed in order and this LastRunnable will be... the last one.
>+    nsresult rv = DispatchToIOThread(new LastRunnable(this, aParent,
>+                                                      aContentType, aCallback));
This doesn't work during shutdown, and that means that WriteRunnable might be the last thing to keep 
MutableBlobStorage alive and it might get deleted in non-main-thread. I guess that is fine, since at that moment MutableBlobStorage doesn't keep
non-main-thread alive. right?

Looks good, but, please add couple of more tests in a separate patch. Something running on workers, and both main thread and worker test should also test that
what happens when XHR is abort()'ed at random time before the resource has been loaded.

So, I'm expecting to see at least one more patch in this bug, but r+ to this one.
Attachment #8793831 - Flags: review?(bugs) → review+
> non-main-thread alive. right?

Yes. It's fine. What it will happen is:

1. the DTOR or MutableBlobStorage create a CloseFileRunnable runnable trying to dispatch it to the I/O thread.
2. also this runnable will not be dispatched, but the DTOR of CloseFileRunnable will close the fd (on the main-thread).
Attached patch part 5 - tests (obsolete) (deleted) — Splinter Review
I'm not sure we want to land this test because if you run it with --repeat X, after a while the server stops to respond. I guess it's because of the abort and the request is truncated and not correctly handled on the server side.
Attachment #8795666 - Flags: review?(bugs)
Attached patch part 5 - tests (deleted) — Splinter Review
Attachment #8795666 - Attachment is obsolete: true
Attachment #8795666 - Flags: review?(bugs)
Attachment #8795680 - Flags: review?(bugs)
This patch works with --repeat.
Comment on attachment 8795680 [details] [diff] [review]
part 5 - tests

># HG changeset patch
># User Andrea Marchesini <amarchesini@mozilla.com>
># Parent  c8705bdf03f64f22341e991a336f03ff78339c64
>
>diff --git a/dom/xhr/tests/common_temporaryFileBlob.js b/dom/xhr/tests/common_temporaryFileBlob.js
>new file mode 100644
>--- /dev/null
>+++ b/dom/xhr/tests/common_temporaryFileBlob.js
>@@ -0,0 +1,101 @@
>+var data = "";
>+for (var i = 0; i < 256; ++i) data += "1234567890ABCDEF";

var data = new Array(256).join("1234567890ABCDEF") might be a bit faster, but probably
doesn't matter since the string isn't super long anyhow.
Attachment #8795680 - Flags: review?(bugs) → review+
Pushed by amarchesini@mozilla.com:
https://hg.mozilla.org/integration/mozilla-inbound/rev/218abd044ee6
Memory Blob to Temporary File - part 1 - MutableBlobStorage for XHR, r=smaug
https://hg.mozilla.org/integration/mozilla-inbound/rev/36f028447d58
Memory Blob to Temporary File - part 2 - BlobSet just for MultipartBlobImpl, r=smaug
https://hg.mozilla.org/integration/mozilla-inbound/rev/abd96b988887
Memory Blob to Temporary File - part 3 - BlobSet and MutableBlobStorage for XHR, r=smaug
https://hg.mozilla.org/integration/mozilla-inbound/rev/13637a65df3b
Memory Blob to Temporary File - part 4 - Temporary File, r=smaug
https://hg.mozilla.org/integration/mozilla-inbound/rev/2d109b8a96f0
Memory Blob to Temporary File - part 5 - tests, r=smaug
Pushed by amarchesini@mozilla.com:
https://hg.mozilla.org/integration/mozilla-inbound/rev/249aeef1b718
Memory Blob to Temporary File - part 1 - MutableBlobStorage for XHR, r=smaug
https://hg.mozilla.org/integration/mozilla-inbound/rev/4ba90bde149a
Memory Blob to Temporary File - part 2 - BlobSet just for MultipartBlobImpl, r=smaug
https://hg.mozilla.org/integration/mozilla-inbound/rev/5b1af9d87ecd
Memory Blob to Temporary File - part 3 - BlobSet and MutableBlobStorage for XHR, r=smaug
https://hg.mozilla.org/integration/mozilla-inbound/rev/aeb086e0404f
Memory Blob to Temporary File - part 4 - Temporary File, r=smaug
https://hg.mozilla.org/integration/mozilla-inbound/rev/c634201ba01d
Memory Blob to Temporary File - part 5 - tests, r=smaug
Flags: needinfo?(amarchesini)
Depends on: 1349862
Depends on: 1373222
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Created:
Updated:
Size: