Closed Bug 1262661 Opened 8 years ago Closed 8 years ago

Many large SessionStore:update messages

Categories

(Firefox :: Session Restore, defect, P1)


Tracking


RESOLVED FIXED
Firefox 49
Tracking Status
e10s + ---
firefox47 --- wontfix
firefox48 --- fixed
firefox49 --- fixed

People

(Reporter: mccr8, Assigned: gkrizsanits)

References

Details

Attachments

(3 files, 3 obsolete files)

Bill landed some telemetry in bug 1260908 to record the number and size of IPC messages, including messages sent by the message manager (MESSAGE_MANAGER_MESSAGE_SIZE). I haven't done any formal analysis, but it looks like the largest source of large messages (300KB or so and up) is the SessionStore:update message. In the last few days, hundreds of thousands of messages above 1MB in size were sent. Allocations this large are going to contribute to heap fragmentation, even if they are fairly short-lived. It would be nice if we could make these messages smaller somehow.
Blocks: 1262918
No longer blocks: e10s-oom
Here's the current data that telemetry is showing, for the child process.

No other message had more than a few thousand total in the 844.92k and above buckets, whereas this kind of message had 26k messages in the 8MB and up category.
I had thought we worked on something similar to this before. Turns out the older bug was related to update messages sent within the browser process. See bug 1212244.
(The parent process looks a little similar, though at the high end the messages are around 1/5 as frequent.)
SessionStore:history also produces some large messages in the parent process, but it is about 1/10th as common as :update.
Depends on: 1263028
No longer depends on: 1263028
Bear in mind that this won't really affect the OnMessageReceivedFromLink OOMs in the content process. SessionStore:update messages are always from the child to the parent. They only show up in telemetry for the parent process because of in-process tabs, like about: pages, which don't go over IPC.

(Also keep in mind that the telemetry is recorded on the sending side, so we should be looking at the parent if we want to figure out what's causing receive OOMs in the child.)
(In reply to Bill McCloskey (:billm) from comment #5)
> Bear in mind that this won't really affect the OnMessageReceivedFromLink
> OOMs in the content process.

Aren't the OnMessageReceivedFromLink OOMs going to be fixed by your patch in bug 1235633?

I realize it won't affect the receiver OOMs, but allocating many large messages is going to be bad for the address space of the sender, too.
(In reply to Andrew McCreight [:mccr8] from comment #6)
> (In reply to Bill McCloskey (:billm) from comment #5)
> > Bear in mind that this won't really affect the OnMessageReceivedFromLink
> > OOMs in the content process.
> 
> Aren't the OnMessageReceivedFromLink OOMs going to be fixed by your patch in
> bug 1235633?

Well, it avoids the one copy in that specific signature, but we're still allocating when reading the data off the network and also inside ParamTraits.

You know, it's funny that we haven't seen an OOM signature for when we read the data in from the socket. It seems like that would trigger just as much as the OnMessageReceivedFromLink OOM since they essentially happen in pairs.

> I realize it won't affect the receiver OOMs, but allocating many large
> messages is going to be bad for the address space of the sender, too.

That's true.
(In reply to Bill McCloskey (:billm) from comment #7)
> You know, it's funny that we haven't seen an OOM signature for when we read
> the data in from the socket. It seems like that would trigger just as much
> as the OnMessageReceivedFromLink OOM since they essentially happen in pairs.

I was thinking the same thing. My only theory is that all of the various copying that goes on when a message is received results in a high peak of memory usage (socket data -> message copy -> nsCString), and the initial read is before the peak really takes off.
(We do have at least one signature for the message copy -> nsCString part of the surge in memory, bug 1259183.)
Flags: needinfo?(kchen)
Assignee: nobody → kchen
Bill, you wrote in bug 1263004 comment 6 that maybe we can improve the structured cloning to return the data non-contiguously. I think you mean the underlying JSStructuredClone{Writer,Reader}, right? Then we could make StructuredCloneData::WriteIPCParams write data in chunks.

BTW, it seems if we use Move here we could avoid a copy
https://dxr.mozilla.org/mozilla-central/rev/21bf1af375c1fa8565ae3bb2e89bd1a0809363d4/dom/ipc/StructuredCloneData.cpp#129
Flags: needinfo?(wmccloskey)
tracking-e10s: --- → ?
I filed bug 1264642 about reducing the size of these messages at the structured clone level. I think we should leave this bug to be about improvements at the session storage level, so we don't end up duplicating effort.
Kanru, if you decide to pick up bug 1264642 Gabor can pick this one up (or vice versa).
Flags: needinfo?(gkrizsanits)
(In reply to Jim Mathies [:jimm] from comment #12)
> Kanru, if you decide to pick up bug 1264642 Gabor can pick this one up (or
> vice versa).

I'll pick up bug 1264642
Assignee: kchen → gkrizsanits
Kan-ru seems to have figured out the buffer stuff, so I'm clearing Bill's needinfo.
Flags: needinfo?(wmccloskey)
This should be mostly fixed by bug 1262671, but there are some improvements that could be done here. First, we always send the entire session storage [1]; instead we could just send messages about the changed [domain, key, value] entries. Then we could merge these changes more intelligently in the MessageQueue, and when the changes reach a certain size we could force the MessageQueue to fire instead of waiting for more changes to come in (currently the MessageQueue sends the messages in batches [2]).

[1]: http://hg.mozilla.org/mozilla-central/annotate/ae7413abfa4d/browser/components/sessionstore/content/content-sessionStore.js#l524

[2]: http://mxr.mozilla.org/mozilla-central/source/browser/components/sessionstore/content/content-sessionStore.js#588
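
For illustration, a minimal sketch of what "send only the changed entries and flush early" could look like (collectChange, flushChanges, MAX_PENDING_SIZE and the storagechange payload key are hypothetical names, not the actual MessageQueue API; it assumes the frame script's sendAsyncMessage):

  // Hypothetical sketch: accumulate per-domain/key changes instead of the full storage.
  let pendingChanges = {};            // { domain: { key: newValue } }
  let pendingSize = 0;                // rough byte estimate of what we would send
  const MAX_PENDING_SIZE = 64 * 1024; // assumed threshold, not a real pref

  function collectChange(domain, key, newValue) {
    if (!pendingChanges[domain]) {
      pendingChanges[domain] = {};
    }
    pendingChanges[domain][key] = newValue;
    pendingSize += (domain.length + key.length + String(newValue).length) * 2;
    if (pendingSize > MAX_PENDING_SIZE) {
      // Force the queue to fire instead of waiting for more changes.
      flushChanges();
    }
  }

  function flushChanges() {
    // Only the delta goes over IPC; the parent merges it into its cached copy.
    sendAsyncMessage("SessionStore:update", { storagechange: pendingChanges });
    pendingChanges = {};
    pendingSize = 0;
  }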
Depends on: 1262671
Flags: needinfo?(gkrizsanits)
Priority: -- → P1
Uhh. This is worse than I thought. There is more here. SessionStore:update can come from various places:
http://mxr.mozilla.org/mozilla-central/search?string=Causes+a+SessionStore%3Aupdate+message+to+be+sent&find=&findi=&filter=%5E%5B%5E%5C0%5D*%24&hitlimit=&tree=mozilla-central

From these places the problematic ones are: history, form data, session store 

1. history listener claims to be cheap: http://mxr.mozilla.org/mozilla-central/source/browser/components/sessionstore/content/content-sessionStore.js#212

but with some playing around, the nested history of even a short Gmail session turns out to be anything but cheap. Especially considering that we send the full nested history in many, many cases (navigating or refreshing any iframe, for example), we send this data a LOT. For Gmail users with long sessions this alone can probably be devastating (after a few days of a session for someone who reads/answers hundreds of emails per day, I don't want to know how big this might be)...

2. form data: I have one concern here: what if someone uses a form to send a big file? How does that work? It should be tested; I haven't yet. But even without that, I can imagine long forms where frequently sending the whole data is kind of a waste.

3. session storage: comment 15

Again, the fact that we can merge these messages into one does not help here either.

For these problems, what we need is a mechanism that creates a deep diff between two objects, so we can send the diff as the message. And on the other side, the inverse of this mechanism patches the cached data with that diff, instead of us sending the entire new state over and replacing the old one with it.

Do we have such a mechanism in the tree somewhere? Or should we use a library for it? (I would prefer not to write it myself from scratch, since we want this to be fast and reliable, just writing tests for it would take too much time, and in the end it would not be upliftable for sure.)
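
To make the idea concrete, here is a minimal deep diff/patch sketch (assuming plain JSON-able state objects; it ignores deleted keys and arrays, which a real implementation would have to handle, and it is not an existing in-tree module):

  function isObject(value) {
    return value && typeof value == "object";
  }

  // Returns an object containing only the properties of |next| that differ from |prev|.
  function deepDiff(prev, next) {
    let diff = {};
    for (let key of Object.keys(next)) {
      if (isObject(prev[key]) && isObject(next[key])) {
        let child = deepDiff(prev[key], next[key]);
        if (Object.keys(child).length) {
          diff[key] = child;
        }
      } else if (prev[key] !== next[key]) {
        diff[key] = next[key];
      }
    }
    return diff;
  }

  // Applies such a diff to the cached state on the receiving side.
  function applyDiff(target, diff) {
    for (let key of Object.keys(diff)) {
      if (isObject(target[key]) && isObject(diff[key])) {
        applyDiff(target[key], diff[key]);
      } else {
        target[key] = diff[key];
      }
    }
  }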
(In reply to Gabor Krizsanits [:krizsa :gabor] from comment #16)
> For these problems, what we need is a mechanism that creates a deep diff
> between two objects and then we can send the diff as the message. And on the
> other side using the inverse of this mechanism to patch the cached data with
> that diff instead of sending the entire new state over and replacing the old
> one with it.
> 
> Do we have such mechanism in tree somewhere? Or should we use a lib for it?
> (I would prefer not to write it myself from scratch since we want this fast
> and reliable, and just to write tests for it would take too much time, and
> in the end it would not be upliftable for sure)

Do you know if we have something like this already for sync?
Flags: needinfo?(rnewman)
We don't.

We're actually touching on this a little in Tofino; there we're using Immutable.js as part of a CQRS-style and Redux-ey command/action/diff loop. Our IPC world is even worse than yours: every browser window, every page, and the main data storage service are all in different processes.

On the one hand, I would be a little leery of trying to shoehorn a diff-based message system into one that wasn't designed for it: because both sides are likely mutable, you can end up with divergence, and that's bad.

On the other hand, perhaps it's a good thing that you'll have to tease out all of the places that might mutate state…
Flags: needinfo?(rnewman)
Hey ttaubert, I know you've been hands-off with SessionStore for a while, but I wonder if you had some ideas on how we could shrink these update messages. gabor's looking at sending diffs in the update instead of full-blown state. Do you have any other suggestions?
Flags: needinfo?(ttaubert)
I think the quickest win here is that for some reason we never clear the MessageQueue, which means that if a site quickly updates a numeric value in session storage, let's say, we always send the whole history with it as a bonus (and vice versa).

see this._data here: http://hg.mozilla.org/mozilla-central/annotate/ae7413abfa4d/browser/components/sessionstore/content/content-sessionStore.js#l705

I think I will start there, then see if we need to send the entire history in all of those cases.

It might make sense to do some extra analysis on telemetry about which of these cases cause the most trouble in general and try to quick-fix those. The diff-patch solution is likely more complex; I would not want to block e10s on it if I don't have to.
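
A rough sketch of the clearing part (simplified, not the exact MessageQueue code; it assumes the frame script's sendAsyncMessage and a _data map of message key -> collector function):

  let queueSketch = {
    // Maps keys like "history" or "storage" to functions that collect that data.
    _data: new Map(),

    send() {
      let data = {};
      for (let [key, collect] of this._data) {
        data[key] = collect();
      }
      sendAsyncMessage("SessionStore:update", { data });
      // The fix: drop the collectors after sending, so a later unrelated change
      // does not resend e.g. the whole history as a bonus.
      this._data.clear();
    },
  };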
Also, after opening some Gmail emails, the history was 100k within a minute... I think that's a more urgent use case, and it might be easier to fix given the fixed format of history entries compared to session storage.
I seem to recall there being some facility for doing compression on IPC messages. I'm not sure if message manager messages have access to that feature, but it might be useful here.
Attached patch clear the MessageQueue after send. v1 (obsolete) (deleted) — Splinter Review
This patch will not fix the problem, but it's trivial and will make a difference. If I have a 1-200k session history, I don't want to send the whole thing every time a site changes a flag in its session storage, for example (I've seen sites doing that every second...).
Attachment #8746057 - Flags: review?(ttaubert)
Comment on attachment 8746057 [details] [diff] [review]
clear the MessageQueue after send. v1

I can take this one.
Attachment #8746057 - Flags: review?(ttaubert) → review?(mconley)
(In reply to Mike Conley (:mconley) - Needinfo me! from comment #22)
> I seem to recall there being some facility for doing compression on IPC
> messages. I'm not sure if message manager messages have access to that
> feature, but it might be useful here.

Is there? There is a |compress| attribute in ipdl, but that does something else (it's used for things like mouse moves). If we do have something like that, we should use it at that low level for every message above a certain size. Although at that point we have already allocated a huge chunk for the Pickle... So yeah, maybe we should use that mechanism earlier, in the message manager. But anyway, I cannot find anything like that...

(In reply to Mike Conley (:mconley) - Needinfo me! from comment #24)
> I can take this one.

Thanks, it's all yours :)
Bill, do we have any kind of data compression support for IPC? If not, do you think we could/should hook one into StructuredCloneData? I'm not sure we should always compress the structured clone data in sendAsyncMessage; we could fire the compressor conditionally... (I was going to suggest gzip, but from IRC: bsmedberg> gabor, if we were going to support that it should probably be our xz compressor anyway: faster and more efficient)
Flags: needinfo?(wmccloskey)
I don't think compressing the data would help. We still have to uncompress it and store it on the other side. I guess it could make the IPC go a little faster, but I doubt that would be a net win given the time to compress/uncompress.
Flags: needinfo?(wmccloskey)
If the session store stuff is really sending the full session history every time, and there are often multiple of these messages in flight at once, then the compress attribute would actually be useful, because you don't really need to send the earlier full history if you are sending a later one. However, this would require using a separate kind of IPDL message for this history thing, as these are currently all just generic message manager messages, so it might be fairly involved.
If there are multiple messages in flight, we should just fix that.
Comment on attachment 8746057 [details] [diff] [review]
clear the MessageQueue after send. v1

Review of attachment 8746057 [details] [diff] [review]:
-----------------------------------------------------------------

Thanks!
Attachment #8746057 - Flags: review?(mconley) → review+
(In reply to Andrew McCreight [:mccr8] from comment #28)
> time, and there are often multiple of these messages in flight at once, then

I don't think that happens. The whole point of the MessageQueue is to avoid that. You push stuff to it and after a timeout it fires all of it. It's not actually a stack, it's just a map, so if you push a lot of history changes to it really quickly, only the last one will be in the map when the timeout fires and only that one will be sent. The patch I added only makes sure that if there were no history changes then it will not send the last one again.
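
A tiny illustration of that coalescing (plain Map semantics, not the real MessageQueue code; collectHistoryA/B stand in for the real collectors):

  let data = new Map();
  data.set("history", () => collectHistoryA());  // first change
  data.set("history", () => collectHistoryB());  // overwrites the first entry
  // When the timeout fires, the map holds a single "history" collector, so
  // only the latest history data is built and sent.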

(In reply to Bill McCloskey (:billm) from comment #27)
> I don't think compressing the data would help. We still have to uncompress
> it and store it on the other side. I guess it could make the IPC go a little
> faster, but I doubt that would be a net win given the time to
> compress/uncompress.

It depends on the structured clone algorithm and the compression algorithm. The structured cloning seems to happen in chunks; if we can feed these chunks into a compression algorithm then we win some memory on the sender side, since StructuredCloneData will only contain the compressed JSON. On the target side we will have to create the object, that is granted, but we don't have to allocate memory for the whole JSON string, just for the compressed one, and then we can feed the other end of the structured clone algorithm with uncompressed chunks. At least that was my idea. Probably a bit crazy. I'm not entirely familiar with the structured clone implementation, so this might be totally impossible, and this setup would be quite limiting for the compression algorithm too.
sorry had to back out for test failures like https://treeherder.mozilla.org/logviewer.html#?job_id=26801502&repo=mozilla-inbound
Flags: needinfo?(gkrizsanits)
(In reply to Carsten Book [:Tomcat] from comment #33)
> sorry had to back out for test failures like
> https://treeherder.mozilla.org/logviewer.html#?job_id=26801502&repo=mozilla-
> inbound

Thanks, I'll look into this.
Flags: needinfo?(gkrizsanits)
> (In reply to Bill McCloskey (:billm) from comment #27)
> > I don't think compressing the data would help. We still have to uncompress
> > it and store it on the other side. I guess it could make the IPC go a little
> > faster, but I doubt that would be a net win given the time to
> > compress/uncompress.
> 
> It depends on the structured clone algorithm and the compression algorithm.
> The structured cloning happens in chunks it seems, if we can feed these
> chunks into a compressor algorithm then we win some memory on the sender
> side since StructuredCloneData will only contain the compressed JSON.  On
> the target side we will have to create the object that is granted, but we
> don't have to allocate memory for the whole JSON string just for the
> compressed one, and then we can feed the other end of the structured cloned
> algorithm with uncompressed chunks. At least that was my idea. Probably a
> bit crazy. I'm not entirely familiar with the structured clone
> implementation, so this might be totally impossible, and this set up would
> be quite limiting for the compressor algorithm too.

Just to point out why I think this would matter: for stuff like session history the JSON is 100k, while zipped it's 4.5k. And I keep referring to JSON, but I don't know what the actual structured clone output format looks like... maybe it's more optimized already. Even if we're less efficient with compression because of the limitations, that can be a huge win in some cases. And when we actually create the object on the target side, the JS engine will highly optimize it (shapes/identifiers, so the same property names that appear hundreds of times are shared, for example) and it won't be one big chunk of memory anyway. So if we can get away with unzipping on the other side in chunks, I think this could help. Does that make sense? I'm sorry if I'm missing something trivial.
Flags: needinfo?(wmccloskey)
I guess if you did the compression in a streaming fashion, it could be a win. It seems like a lot of work though.

Right now we're trying to make the IPC code read messages in small chunks. After that we're going to make the structured clone code process those chunks without requiring them to be in one big contiguous buffer. Once that's done, we could think about compression, but I'm not sure it would be worth it at that point.

I think the focus of this bug should be on reducing how much data session restore sends rather than on platform improvements.
Flags: needinfo?(wmccloskey)
Not sure what's going wrong with the browser_tab_detach_restore.js test. It seems that when the tab is moved to a new window, a new SessionHistoryListener is created and initialized, which sends a history message with a single about:blank in it. And then when the window is closed, session store decides that a window with that single tab in it is not worth saving (even though the actual tab is displaying a page, not about:blank). If we don't clear the message queue, then we somehow resend the old history message after this about:blank report, and that overwrites it and fixes this problem (does that mean we have two SessionHistoryListeners reporting the same thing concurrently?). So basically two bugs negate each other. I cannot reproduce it without the test framework; if I just click on a tab to open it in a new window, that mysterious about:blank history message never appears, because SessionHistory.isEmpty(docShell) reports that the docshell is empty and we don't fire that meaningless history message.

On the one hand I'm tempted to just turn off the test and land this patch, but on the other I really want to understand what's going on here, and I suspect that there is an actual session restore bug that should be fixed here.

Mike, do you know how this works in concept? I mean transferring a tab to a window? I'm interested in the SessionHistoryListener bits... Is there a docshell swap? How do we transfer the history?
Flags: needinfo?(mconley)
(In reply to Gabor Krizsanits [:krizsa :gabor] from comment #38)
> Mike, do you know how this works in concept? I mean transferring a tab
> to a window? I'm interested in the SessionHistoryListener bits... Is there a
> docshell swap? How do we transfer the history?

As a bystander that has looked at some of this recently, I thought I would try to throw in what I know at least... 

When you move a tab to a new window, `replaceTabWithWindow` is called in tabbrowser.xml[1], which opens a new browser window and passes it the XUL <tab> element.  The new window starts up its own <xul:browser> and calls `swapBrowsersAndCloseOther`[2][3] to transfer browser state from the old browser to the new one.  This includes calling `swapDocShells` from browser.xml[4] on the <xul:browser>, which swaps various <browser> element fields including `_docShell`.  (However this field is null for remote browsers.)  This in turn calls the platform `swapFrameLoaders` machinery[5].

As for the history, I believe the browsers point into the session history cache via the `permanentKey` property, and that's one of the things between <browser> elements in these code paths[6].

[1]: https://dxr.mozilla.org/mozilla-central/rev/1461a4071341c282afcf7b72e33036412d2251d4/browser/base/content/tabbrowser.xml#2952
[2]: https://dxr.mozilla.org/mozilla-central/rev/1461a4071341c282afcf7b72e33036412d2251d4/browser/base/content/browser.js#1091-1123
[3]: https://dxr.mozilla.org/mozilla-central/rev/1461a4071341c282afcf7b72e33036412d2251d4/browser/base/content/tabbrowser.xml#2569
[4]: https://dxr.mozilla.org/mozilla-central/rev/1461a4071341c282afcf7b72e33036412d2251d4/toolkit/content/widgets/browser.xml#1167
[5]: https://dxr.mozilla.org/mozilla-central/rev/1461a4071341c282afcf7b72e33036412d2251d4/dom/base/nsFrameLoader.cpp#1085
[6]: https://dxr.mozilla.org/mozilla-central/rev/1461a4071341c282afcf7b72e33036412d2251d4/browser/base/content/tabbrowser.xml#2714-2716
(In reply to J. Ryan Stinnett [:jryans] (use ni?) from comment #39)
> (In reply to Gabor Krizsanits [:krizsa :gabor] from comment #38)
> > Mike, do you know how this works in concept? I mean transferring a tab
> > to a window? I'm interested in the SessionHistoryListener bits... Is there a
> > docshell swap? How do we transfer the history?
>
> As for the history, I believe the browsers point into the session history
> cache via the `permanentKey` property, and that's one of the things between
> <browser> elements in these code paths[6].

...one of the things _swapped_ between <browser> elements...
(In reply to J. Ryan Stinnett [:jryans] (use ni?) from comment #39)
> As a bystander that has looked at some of this recently, I thought I would
> try to throw in what I know at least... 

Thanks Ryan, this was extremely helpful!
Flags: needinfo?(mconley)
(In reply to Gabor Krizsanits [:krizsa :gabor] from comment #38)
> 
> Mike, do you know how this works in concept? I mean transferring a tab
> to a window? I'm interested in the SessionHistoryListener bits... Is there a
> docshell swap? How do we transfer the history?

Sorry to be late on this. :/ I was trying to remember how this all worked, but the Browser Toolbox was giving me hell yesterday, so it was slow going.

jryans pretty much knocked this out of the park with his answer. The one piece that's missing is that when we swap the browsers, we can't swap docshells (which is what we do in single-process mode). Instead, we swap frameloaders - see:

https://dxr.mozilla.org/mozilla-central/source/dom/base/nsFrameLoader.cpp#1147

which calls:

https://dxr.mozilla.org/mozilla-central/source/dom/base/nsFrameLoader.cpp#870

Swapping frameloaders essentially moves the about:blank browser that was in the new window to the original tab, and then tries to close that tab as soon as possible.

Thankfully, this swapping of frameloaders allows us to avoid having to create a new <browser> and send down state to it.
Hey gabor,

I looked at your latest patch, and it looks fine. Thanks for the nice documentation there, too.

One thing I'm curious about though - it says,

> Normally it is empty at this point
> but in a test env. the initial about:blank might have a children in which
> case we fire off a history message here with about:blank in it.

In what cases does browser_tab_detach_restore.js open a new window to about:blank where the about:blank has some history?

I've also done some bc7 retriggers on your try push just in case. If, when you see this, those look okay, you should be good to land.
Flags: needinfo?(gkrizsanits)
(In reply to Mike Conley (:mconley) - Needinfo me! from comment #44)
> > Normally it is empty at this point
> > but in a test env. the initial about:blank might have a children in which
> > case we fire off a history message here with about:blank in it.
> 
> In what cases does browser_tab_detach_restore.js open a new window to
> about:blank where the about:blank has some history?

This line creates a window and moves the tab to it: http://hg.mozilla.org/mozilla-central/annotate/0a25833062a8/browser/base/content/test/general/browser_tab_detach_restore.js#l18

Every docshell, I think, initially starts from about:blank, but we usually don't count that:

http://hg.mozilla.org/mozilla-central/annotate/77cead2cd203/browser/components/sessionstore/SessionHistory.jsm#l91 (this is where the "might have children" idea came from, though I'm
not too sure that's the real reason... I wish I knew this code better)

But to be honest, it's beyond me why history.count is 1 here in the mochitest environment, unlike when I test this manually (then it's 0, as I would expect it to be). In both cases the URI is about:blank.

Anyway, whatever the reason is for this function to report that this new docshell's history is not empty, I'm quite sure that we need this fix, so we don't confuse the messages during the frame loader swap.

> 
> I've also done some bc7 retriggers on your try push just in case. If, when
> you see this, those look okay, you should be good to land.

Thanks! It still looks green so I'll push it.
Flags: needinfo?(gkrizsanits)
https://hg.mozilla.org/mozilla-central/rev/35b65fe2ce5b
Status: NEW → RESOLVED
Closed: 8 years ago
Resolution: --- → FIXED
Target Milestone: --- → Firefox 49
Actually, I was going to land some more patches here; I just forgot to set the leave-open flag. Clearing the needinfo from Tim, I think that needinfo is outdated.
Status: RESOLVED → REOPENED
Flags: needinfo?(ttaubert)
Resolution: FIXED → ---
Attached patch optimization on session storage messages. v1 (obsolete) (deleted) — Splinter Review
I've been thinking about this and I think we should do the simple things first, as that might be enough, and do the heavyweight optimization later. This patch collects the individual changes to session storage, and if there is no reason to send over the entire data set, we send just those. What do you think about this approach?

The patch is still a bit unpolished (sorry in advance for the typos and rough coding style); I just wanted to get an early response. I'm planning to do something similar for history (adding a new entry / moving back and forth / whatever else is simple and frequent).
Attachment #8750778 - Flags: feedback?(mconley)
Comment on attachment 8750778 [details] [diff] [review]
optimization on session storage messages. v1

Review of attachment 8750778 [details] [diff] [review]:
-----------------------------------------------------------------

::: browser/components/sessionstore/content/content-sessionStore.js
@@ +549,5 @@
> +      if (!this._changes[domain]) {
> +        this._changes[domain] = {};
> +      }
> +      this._changes[domain][key] = newValue;
> +      this._changesLength = this.estimateStorageSize(this._changes);

What is _changesLength used for?

@@ +551,5 @@
> +      }
> +      this._changes[domain][key] = newValue;
> +      this._changesLength = this.estimateStorageSize(this._changes);
> +
> +      let _this = this;

The arrow function means that its inner `this` refers to the `this` in the enclosing scope, so you don't need _this. Just use this. :)
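
For example (a generic illustration, not the patch code):

  let queue = {
    _changes: {},
    schedule() {
      setTimeout(() => {
        // An arrow function does not bind its own |this|, so |this| here is
        // still |queue|; no |let _this = this| dance needed.
        console.log(Object.keys(this._changes));
      }, 0);
    },
  };
  queue.schedule();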
Attachment #8750778 - Flags: feedback?(mconley) → feedback+
(In reply to Mike Conley (:mconley) - Needinfo me! from comment #50)
> What is _changesLength used for?

I wanted to do some optimization with it, but I'm not sure it's worth it, so I removed that part... apparently not entirely, though.

> The arrow function means that its inner `this` refers to the `this` in the
> enclosing scope, so you don't need _this. Just use this. :)

Whoops... I forgot, thanks!

Let's see if it works out on try with the other part of the patch together: https://treeherder.mozilla.org/#/jobs?repo=try&revision=751948cc8367
Blocks: 1272423
Updating the first patch for uplift approval to match the landed version (transferring r=mconley on it from earlier).
Attachment #8746057 - Attachment is obsolete: true
Attachment #8752186 - Flags: review+
Comment on attachment 8752186 [details] [diff] [review]
clear the MessageQueue after send. v2

Approval Request Comment
[Feature/regressing bug #]:
[User impact if declined]: We send history/sessionstorage/etc. messages again and again, since the message queue is never cleared. This patch should reduce OOM issues for e10s.
[Describe test coverage new/current, TreeHerder]: Patch is on mc and the fix did not break any existing test.
[Risks and why]: I cannot see any risk.
[String/UUID change made/needed]: none
Attachment #8752186 - Flags: approval-mozilla-beta?
After the try server opened again, I could push a new run for part 2 of the fix: https://treeherder.mozilla.org/#/jobs?repo=try&revision=e564a5a95fd0
Comment on attachment 8752186 [details] [diff] [review]
clear the MessageQueue after send. v2

Helps reduce e10s OOM crashes, Beta47+
Attachment #8752186 - Flags: approval-mozilla-beta? → approval-mozilla-beta+
Hi Gabor, this might also need to be uplifted to Aurora 48. Right?
Flags: needinfo?(gkrizsanits)
(In reply to Ritu Kothari (:ritu) from comment #56)
> Hi Gabor, this might also need to be uplifted to Aurora 48. Right?

Yes, I should have set that flag as well, thanks.
Flags: needinfo?(gkrizsanits)
Comment on attachment 8752186 [details] [diff] [review]
clear the MessageQueue after send. v2

Approval Request Comment
Same as above for beta uplift.
Attachment #8752186 - Flags: approval-mozilla-aurora?
It's hard to tell if it's green or not on try with all these intermittents... I _think_ it's green, but I'm still poking at it...

https://treeherder.mozilla.org/#/jobs?repo=try&revision=e564a5a95fd0

It just seems impossible to retrigger some of the orange jobs :(

One thing I'm a bit concerned about is that I don't check the session storage limit in this patch... not sure how to do that in a smart way or how important it is; might be something for a follow-up...
Attachment #8750778 - Attachment is obsolete: true
Attachment #8753324 - Flags: review?(mconley)
Comment on attachment 8753324 [details] [diff] [review]
optimization on session storage and history messages. v2

Review of attachment 8753324 [details] [diff] [review]:
-----------------------------------------------------------------

Clearing review request until I can get a better explanation of what's going on with that MAX_SAFE_INTEGER stuff (if the reasoning is sound, which is likely, then this is probably a good opportunity to put in some inline documentation in the patch about it).

::: browser/components/sessionstore/content/content-sessionStore.js
@@ +258,5 @@
> +  // and send the entire history. We always send the additional info like the current selected
> +  // index (so for going back and forth between history entries we set the index to kLastIndex
> +  // if nothing else changed send an empty array and the additonal info like the selected index)
> +  collectFrom: function (idx) {
> +    if (this._fromIdx <= idx) {

I'm confused here - maybe you can help clear this up...

this._fromIdx starts out at MAX_SAFE_INTEGER, and this conditional gets entered (and returns) if anything comes in that's less than or equal to MAX_SAFE_INTEGER.

That's going to be true in the case of calls from OnHistoryGoBack, OnHistoryGoForward and OnHistoryGotoIndex - which pass in MAX_SAFE_INTEGER - 1 every time.

So what we're really waiting for the first time is the call to OnHistoryNewEntry. We're waiting for a call to OnHistoryNewEntry where oldIndex will be MAX_SAFE_INTEGER + 1 (or MAX_SAFE_INTEGER + 2, which according to [1] is equal to MAX_SAFE_INTEGER + 1 anyways. Lovely!)

Can you explain why we're waiting for such a large integer from OnHistoryGotoIndex? And once we use it, and set it to _fromIdx, don't the calls from OnHistoryGoBack, OnHistoryGoForward and OnHistoryGotoIndex continue to be ignored, since kLastIndex is going to be less than anything that was greater than MAX_SAFE_INTEGER?

Or have I totally misunderstood what's happening here?

[1]: https://developer.mozilla.org/en-US/docs/Web/JavaScript/Reference/Global_Objects/Number/MAX_SAFE_INTEGER

@@ +574,5 @@
> +    this._changes = undefined;
> +  },
> +
> +  collectFromEvent: function (event) {
> +    // TODO: we should take browser.sessionstore.dom_storage_limit into an account here.

Follow-up bug number in here?

::: docshell/shistory/nsISHistoryListener.idl
@@ +29,5 @@
>     * added to session history by docshell when new pages are loaded in a frame
>     * or content area, for example via nsIWebNavigation::loadURI()
>     *
>     * @param aNewURI     The URI of the document to be added to session history.
> +   * @param aLastIdx    The index of the current history item before the operation.

Should match what's in the definition, so either make this aOldIndex, or make aOldIndex aLastIndex.
Attachment #8753324 - Flags: review?(mconley)
(In reply to Mike Conley (:mconley) - (needinfo me!) from comment #60)
> this._fromIdx starts out at MAX_SAFE_INTEGER, and this conditional gets
> entered (and returns) if anything comes in that's less than or equal to
> MAX_SAFE_INTEGER.
> 
> That's going to be true in the case of calls from OnHistoryGoBack,
> OnHistoryGoForward and OnHistoryGotoIndex - which pass in MAX_SAFE_INTEGER -
> 1 every time.

So far so good. One thing to note here, which I mention in the comment too, is that
even though in this case the array we send is empty, we do update the currently
selected index at the time we actually send the message to the parent. So
the message we send looks like: {entries: [], currentIndex: x, someOtherProperties: ...}
And no matter how many changes happened to the index, it's enough to update it once.

> 
> So what we're really waiting for the first time is the call to
> OnHistoryNewEntry. We're waiting for a call to OnHistoryNewEntry where
> oldIndex will be MAX_SAFE_INTEGER + 1 (or MAX_SAFE_INTEGER + 2, which
> according to [1] is equal to MAX_SAFE_INTEGER + 1 anyways. Lovely!)

I think this is where you got confused. When OnHistoryNewEntry is called,
it gets the index of the previously selected history element according to
nsSHistory. It has nothing to do with the _fromIdx that we set to MAX_SAFE_INTEGER.
It will always be smaller than that (from 0 to 50 or so in practice). So
we are not waiting for a big integer here, we are waiting for a small one.

The _fromIdx that we change in JS is not an index that is currently selected or has been
selected. Its value means that any history element with a smaller index has not changed.
History elements with a bigger index might have changed, so we send them all starting from
this index.

> 
> Can you explain why we're waiting for such a large integer from
> OnHistoryGotoIndex?

So we are not.

> And once we use it, and set it to _fromIdx, don't the
> calls from OnHistoryGoBack, OnHistoryGoForward and OnHistoryGotoIndex
> continue to be ignored, since kLastIndex is going to be less than anything
> that was greater than MAX_SAFE_INTEGER?

I think I partially explained this already, but wanted to be sure we're on the
same page here. OnHistoryGoSomething messages are ignored if a new entry was added
because that message will send the current index anyway.

> 
> Or have I totally misunderstood what's happening here?

Does this make sense? I'm going to add some extra explanation if it is clear now, but let's talk
first if it is not.

> 
> [1]:
> https://developer.mozilla.org/en-US/docs/Web/JavaScript/Reference/
> Global_Objects/Number/MAX_SAFE_INTEGER

I could use an arbitrarily big integer instead... all I care about is that it's bigger than the maximum number of history elements we allow. Alternatively I could just use some flags; this just felt more natural to me. But because of my background in math, what is natural to me often just feels messy / confusing to many programmers... sorry if that's the case :)
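
To summarize the intended logic, here is a simplified sketch (illustrative only, not the exact patch; the sentinel values match what I described above):

  const kMaxIdx = Number.MAX_SAFE_INTEGER;         // nothing collected yet
  const kLastIndex = Number.MAX_SAFE_INTEGER - 1;  // only the selected index changed

  let historyCollector = {
    _fromIdx: kMaxIdx,

    // idx is the smallest history index that might have changed:
    // OnHistoryGoBack/Forward/GotoIndex pass kLastIndex, OnHistoryNewEntry
    // passes the (small) index that was current before the new entry.
    collectFrom(idx) {
      if (this._fromIdx <= idx) {
        // An equal or smaller starting index is already pending, so this
        // change is covered by what we will send anyway.
        return;
      }
      this._fromIdx = idx;
    },

    // Called when the queued message is actually sent.
    buildMessage(allEntries, selectedIndex) {
      let fromIdx = this._fromIdx;
      this._fromIdx = kMaxIdx;
      // Pure back/forward navigation sends an empty entries array; the parent
      // then only updates the selected index in its cached history.
      let entries = fromIdx == kLastIndex ? [] : allEntries.slice(fromIdx);
      return { entries, fromIdx, index: selectedIndex };
    },
  };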

> > +    // TODO: we should take browser.sessionstore.dom_storage_limit into an account here.
> 
> Follow-up bug number in here?

Alright I'll file a follow up.

> 
> ::: docshell/shistory/nsISHistoryListener.idl
> @@ +29,5 @@
> >     * added to session history by docshell when new pages are loaded in a frame
> >     * or content area, for example via nsIWebNavigation::loadURI()
> >     *
> >     * @param aNewURI     The URI of the document to be added to session history.
> > +   * @param aLastIdx    The index of the current history item before the operation.
> 
> Should match what's in the definition, so either make this aOldIndex, or
> make aOldIndex aLastIndex.

Thanks, seems like I forgot to update this one.
Flags: needinfo?(mconley)
Ritu, is there a chance to speed up the approval process for the first part here, or should I land it on aurora and beta? I would really like to have this in the tree.
Flags: needinfo?(rkothari)
Comment on attachment 8753324 [details] [diff] [review]
optimization on session storage and history messages. v2

Review of attachment 8753324 [details] [diff] [review]:
-----------------------------------------------------------------

::: browser/components/sessionstore/TabStateCache.jsm
@@ +60,5 @@
>      return this._data.get(browserOrTab.permanentKey);
>    },
>  
> +
> +  updatePartialStorageChange: function (data, change) {

Docstring for this and updatePartialHistoryChange please

@@ +95,5 @@
> +    for (let key of Object.keys(change)) {
> +      if (key == "entries") {
> +        if (change.fromIdx != kLastIndex) {
> +          history.entries.splice(change.fromIdx + 1);
> +          while (change.entries.length) {

I think you can just Array.prototype.concat these, no?
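
e.g. something along these lines (untested):

  history.entries = history.entries.slice(0, change.fromIdx + 1).concat(change.entries);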

::: browser/components/sessionstore/content/content-sessionStore.js
@@ +258,5 @@
> +  // and send the entire history. We always send the additional info like the current selected
> +  // index (so for going back and forth between history entries we set the index to kLastIndex
> +  // if nothing else changed send an empty array and the additonal info like the selected index)
> +  collectFrom: function (idx) {
> +    if (this._fromIdx <= idx) {

Okay, I understand my confusion. I was, for some reason, interpreting this as:

if (idx <= this._fromIdx) {

I think it's because I was internally understanding that this._fromIdx is large (maybe even max large), and that my brain found it unusual that we were checking if that was less than something.

I wonder if it'd make more sense to do:

if (idx >= this._fromIdx) {

instead, with a comment about what we're protecting from there. Meh. Or maybe I just need to read more carefully. :)
Attachment #8753324 - Flags: review+
Flags: needinfo?(mconley)
(In reply to Gabor Krizsanits [:krizsa :gabor] from comment #62)
> Ritu, is there a chance to speed up the approval process for the first part
> here, or should I land it on aurora and beta? I would really like to have
> this in the tree.

Sure. Lemme approve the uplift to Aurora48 now.
Flags: needinfo?(rkothari)
Comment on attachment 8752186 [details] [diff] [review]
clear the MessageQueue after send. v2

Helps improve e10s OOM situation, Aurora48+
Attachment #8752186 - Flags: approval-mozilla-aurora? → approval-mozilla-aurora+
Uploading the updated/landed patch for posterity.

Approval Request Comment
[Feature/regressing bug #]:
[User impact if declined]: This patch reduces the number and size of the messages we send for the most common session storage and history cases. My hope is that this patch will make a significant difference.
[Describe test coverage new/current, TreeHerder]: Seems to have stuck on m-c; I would wait a day or two just in case, but all the existing tests pass.
[Risks and why]: The patch is an optimization of session storage and history handling, so it comes with some risk. If something is not done right in the patch, the session history or session storage can break until the next full update message. I do not think that is the case, as we would likely see test failures, but it's a risk that should be taken into consideration (I cannot guarantee full test coverage for every use case, but the level of coverage seems decent to me).

For the beta uplift, I realize that it's late in the cycle and I expect a rejection, but I want to point out that this patch would only affect the e10s experiment and has no effect on the actual release (if e10s is off, this code does not run). On the other hand, I would love to see if this patch helps the OOM crashes on beta.
[String/UUID change made/needed]: none
Attachment #8753324 - Attachment is obsolete: true
Attachment #8756371 - Flags: review+
Attachment #8756371 - Flags: approval-mozilla-beta?
Attachment #8756371 - Flags: approval-mozilla-aurora?
Comment on attachment 8756371 [details] [diff] [review]
optimization on session storage and history messages. v3

At this point, I would like to accept uplifts only for critical recent regressions and severe stability, security, perf, or mlk issues. This is to minimize code churn and to ensure we ship a high-quality Fx47 in 2 weeks.

Aurora48+, Beta47-
Attachment #8756371 - Flags: approval-mozilla-beta?
Attachment #8756371 - Flags: approval-mozilla-beta-
Attachment #8756371 - Flags: approval-mozilla-aurora?
Attachment #8756371 - Flags: approval-mozilla-aurora+
https://hg.mozilla.org/mozilla-central/rev/8e697c3542ff
Status: REOPENED → RESOLVED
Closed: 8 years ago8 years ago
Resolution: --- → FIXED
Depends on: 1480951