Closed
Bug 743112
Opened 13 years ago
Closed 12 years ago
JS GC: Incremental deferred release
Categories
(Core :: XPConnect, defect)
Core
XPConnect
Tracking
()
RESOLVED
FIXED
mozilla17
People
(Reporter: billm, Assigned: billm)
References
Details
(Whiteboard: [Snappy])
Attachments
(2 files, 3 obsolete files)
(deleted),
patch
|
jonco
:
review+
|
Details | Diff | Splinter Review |
(deleted),
patch
|
smaug
:
review+
RyanVM
:
feedback+
|
Details | Diff | Splinter Review |
I've been looking at my own garbage collector time logs as well as some other people's. I've noticed that, on rare occasions, the XPConnect deferred release code takes a really long time. As an example, I'm looking at a GC that took 1841ms, with the deferred release phase taking 1709ms.
I don't know of any reason why we couldn't do deferred release in little slices. Could we inject a task into the event loop that would do deferred release for, say, 10ms. If it had more work to do after that, it could schedule another task.
Olli, does this seem feasible?
Comment 1•13 years ago
|
||
Yes.
Assignee | ||
Comment 2•13 years ago
|
||
I did some light testing and this seems to work okay.
Attachment #612945 -
Flags: review?(bugs)
Comment on attachment 612945 [details] [diff] [review]
patch
>+ public:
>+ IncrementalReleaseRunnable(nsTArray<nsISupports *> &items) : items(items) {}
FYI you can use SwapElements() here to avoid copying the array.
Assignee | ||
Comment 4•13 years ago
|
||
(In reply to ben turner [:bent] from comment #3)
> FYI you can use SwapElements() here to avoid copying the array.
The class stores a reference, so I don't think it's going to copy. Although now that you mention it, this could be a problem. It's possible that one of these events could be outstanding after the XPCJSRuntime has been destroyed. I'm also realizing now that we could maybe get shutdown leaks if we don't release right away during shutdown.
I'll post a new patch in a bit.
Comment 5•13 years ago
|
||
Comment on attachment 612945 [details] [diff] [review]
patch
>+NS_IMETHODIMP
>+IncrementalReleaseRunnable::Run()
>+{
>+ MOZ_ASSERT(NS_IsMainThread());
>+
>+ PRTime started = PR_Now();
>+ int counter = 0;
>+ while (1) {
>+ PRUint32 count = items.Length();
>+ if (!count)
>+ break;
>+
>+ nsISupports *wrapper = items[count-1];
space before and after -
>+ items.RemoveElementAt(count-1);
Same here, though, I wonder if RemoveElementAt shows up in the profiles.
This ends up doing lots of memmoves. Could you replace the existing value with null, and
remove the whole range of null elements at once.
>+ NS_RELEASE(wrapper);
>+
>+ /* We don't want to call PR_Now() too often. */
>+ counter++;
>+ if (counter == 20) {
20 is perhaps a bit low.
>+ counter = 0;
>+ if (PR_Now() - started >= SliceTime)
Perhaps you could use TimeStamp() for this.
Clearing the review request since I assume a new patch is coming.
Attachment #612945 -
Flags: review?(bugs)
(In reply to Olli Pettay [:smaug] from comment #5)
> >+ items.RemoveElementAt(count-1);
> Same here, though, I wonder if RemoveElementAt shows up in the profiles.
> This ends up doing lots of memmoves. Could you replace the existing value
> with null, and
> remove the whole range of null elements at once.
He's removing from the end, so there shouldn't be any memmove going on here.
Comment 7•13 years ago
|
||
Oops, I missed that.
Assignee | ||
Comment 8•13 years ago
|
||
In this version, the runnable has its own list of stuff to release. It uses SwapElements as Ben suggested. And I think it handles shutdown properly. I looked at the code for NS_DispatchToMainThread. It correctly returns an error if the event loop has stopped running. However, it prints some sort of warning in debug builds. Oh well.
Attachment #612945 -
Attachment is obsolete: true
Attachment #613001 -
Flags: review?(bugs)
Comment 9•13 years ago
|
||
Comment on attachment 613001 [details] [diff] [review]
patch v2
>+NS_IMETHODIMP
>+IncrementalReleaseRunnable::Run()
>+{
>+ MOZ_ASSERT(NS_IsMainThread());
>+
>+ TimeDuration sliceTime = TimeDuration::FromMilliseconds(SliceMillis);
>+ TimeStamp started = TimeStamp::Now();
>+ int counter = 0;
PRUint32
Attachment #613001 -
Flags: review?(bugs) → review+
Assignee | ||
Comment 10•12 years ago
|
||
This patch adds a field to the runtime saying whether the most recent GC was incremental or not. It exposes the field via jsfriendapi.h.
Attachment #613001 -
Attachment is obsolete: true
Attachment #646237 -
Flags: review?(jcoppeard)
Assignee | ||
Comment 11•12 years ago
|
||
The previous patch didn't work because we sometimes GC synchronously and expect certain objects to die right away. This patch fixes the problem by doing the incremental release only when the GC was incremental. It passes tryserver.
Olli, the last time we talked about this, you wanted me to addref nsLayoutStatics, I think. I think that the incremental GC check should take care of your concerns, since we never do incremental GC during shutdown.
Attachment #646238 -
Flags: review?(bugs)
Comment 12•12 years ago
|
||
Comment on attachment 646238 [details] [diff] [review]
patch v3
>+class IncrementalReleaseRunnable : public nsRunnable
>+{
>+ nsTArray<nsISupports *> items;
Extra space after nsISupports.
s/items/mItems/
>+
>+ static const PRTime SliceMillis = 10; /* ms */
>+
>+ public:
>+ IncrementalReleaseRunnable(nsTArray<nsISupports *> &items);
>+
>+ NS_DECL_NSIRUNNABLE
>+};
>+
>+IncrementalReleaseRunnable::IncrementalReleaseRunnable(nsTArray<nsISupports *> &items)
ditto, and & goes with type, not with param name. aItem.
(though, xpconnect coding style is odd.)
>+IncrementalReleaseRunnable::Run()
>+{
>+ MOZ_ASSERT(NS_IsMainThread());
>+
>+ TimeDuration sliceTime = TimeDuration::FromMilliseconds(SliceMillis);
>+ TimeStamp started = TimeStamp::Now();
>+ PRUint32 counter = 0;
>+ while (1) {
while (true) {
>+
>+ items.Compact();
Do we need this? I think not.
>+
>+ if (items.Length()) {
>+ nsresult rv = NS_DispatchToMainThread(this);
>+ if (NS_FAILED(rv))
>+ Run();
>+ }
>+
>+ return NS_OK;
>+}
>+
>+static void
>+ReleaseIncrementally(nsTArray<nsISupports *> &array)
nsTArray<nsISupports*>& aArray
Because ::Run may dispatch the runnable again, I think
IncrementalReleaseRunnable should addref/release
layoutstatics.
It is really just
nsLayoutStatics::AddRef(); in ctor and
nsLayoutStatics::Release(); in dtor
With those, r=me
Attachment #646238 -
Flags: review?(bugs) → review+
Comment 13•12 years ago
|
||
> (though, xpconnect coding style is odd.)
In theory, XPConnect is supposed to be JS style, so I don't think Bill needs to do the aArg stuff. ;)
Updated•12 years ago
|
Whiteboard: [Snappy]
Comment 14•12 years ago
|
||
/me argues that all js/* should move to use saner Gecko coding style :)
Comment 15•12 years ago
|
||
Comment on attachment 646237 [details] [diff] [review]
JS engine changes
Review of attachment 646237 [details] [diff] [review]:
-----------------------------------------------------------------
Look fine to me.
Attachment #646237 -
Flags: review?(jcoppeard) → review+
Assignee | ||
Comment 16•12 years ago
|
||
https://hg.mozilla.org/integration/mozilla-inbound/rev/a5b21cffeebf
I didn't make the syntactic changes to the xpconnect code, since it follows JS style, as Andrew said. I fixed the nsLayoutStatics refcounting.
Comment 17•12 years ago
|
||
Unfortunately after RyanVM and myself spent a good 8 hours of our weekend playing avoid the coalescing with retriggers, intermittent but pretty severe mochitest-other leaks were attributed to this landing.
Examples:
https://tbpl.mozilla.org/php/getParsedLog.php?id=13941428&tree=Mozilla-Inbound
https://tbpl.mozilla.org/php/getParsedLog.php?id=13941444&tree=Mozilla-Inbound
https://tbpl.mozilla.org/php/getParsedLog.php?id=13941458&tree=Mozilla-Inbound
These leaks happen predominantly on WinXP debug & Linux {32,64} debug, and only around ~20-30% of the time.
In addition, there were a few other unexplained failures that occurred on this push, in the course of the final stage of (bisecting) retriggers:
https://tbpl.mozilla.org/php/getParsedLog.php?id=13941312&tree=Mozilla-Inbound
https://tbpl.mozilla.org/php/getParsedLog.php?id=13941290&tree=Mozilla-Inbound
Backout:
https://hg.mozilla.org/integration/mozilla-inbound/rev/29bff59d3bbe
Before this lands again, please can we have a green try run (with URL pasted in-bug so sheriffs can use it for elimination purposes), with at least 10 retriggers of mochitest-other on each of WinXP debug, Linux32 debug and Linux 64 debug.
Out of curiosity, did this get sent to try before landing? (I can't see a try url in-bug)
Comment 18•12 years ago
|
||
Some more instances of the leak:
https://tbpl.mozilla.org/php/getParsedLog.php?id=13940006&tree=Mozilla-Inbound
https://tbpl.mozilla.org/php/getParsedLog.php?id=13938924&tree=Mozilla-Inbound
https://tbpl.mozilla.org/php/getParsedLog.php?id=13939362&tree=Mozilla-Inbound
https://tbpl.mozilla.org/php/getParsedLog.php?id=13940258&tree=Mozilla-Inbound
https://tbpl.mozilla.org/php/getParsedLog.php?id=13935653&tree=Mozilla-Inbound
https://tbpl.mozilla.org/php/getParsedLog.php?id=13938206&tree=Mozilla-Inbound
https://tbpl.mozilla.org/php/getParsedLog.php?id=13937756&tree=Mozilla-Inbound
https://tbpl.mozilla.org/php/getParsedLog.php?id=13930112&tree=Mozilla-Inbound
https://tbpl.mozilla.org/php/getParsedLog.php?id=13931939&tree=Mozilla-Inbound
https://tbpl.mozilla.org/php/getParsedLog.php?id=13932186&tree=Mozilla-Inbound
https://tbpl.mozilla.org/php/getParsedLog.php?id=13932761&tree=Mozilla-Inbound
https://tbpl.mozilla.org/php/getParsedLog.php?id=13932761&tree=Mozilla-Inbound
Assignee | ||
Comment 19•12 years ago
|
||
I'm sorry this happened, Ed. I did push the patch to try and it was as green as things ever are nowadays. There was one leak reported, but it was only three windows, which I thought we normally ignore.
https://tbpl.mozilla.org/?tree=Try&rev=fb75c988363e
The link no longer shows up, though. I don't know why.
If it helps at all, please consider that I'll be spending the next several days toiling away in frustration. These leak reports are almost certainly spurious, but tracking down why they happen is going to take forever.
Comment 20•12 years ago
|
||
You could try running with the patches in bug 728294, they supposedly reduce false positives a bit.
Comment 21•12 years ago
|
||
(In reply to Bill McCloskey (:billm) from comment #19)
> I'm sorry this happened, Ed. I did push the patch to try and it was as green
> as things ever are nowadays. There was one leak reported, but it was only
> three windows, which I thought we normally ignore.
You mean 3 nsGlobalWindow objects?
There should be no nsGlobalWindow object leaks ever.
Comment 22•12 years ago
|
||
(In reply to Bill McCloskey (:billm) from comment #19)
> I'm sorry this happened, Ed. I did push the patch to try and it was as green
> as things ever are nowadays. There was one leak reported, but it was only
> three windows, which I thought we normally ignore.
> https://tbpl.mozilla.org/?tree=Try&rev=fb75c988363e
> The link no longer shows up, though. I don't know why.
No worries, these things happen :-) Thank you for using try; the 3 DOMWindows leak is a common intermittent, in retrospect given the low rate of failure, I don't think there was much else you could have done here. (The try repo was reset unexpectedly recently, which is why the try link won;t work; bug 778062).
Not envious of the debugging - good luck!
Comment 23•12 years ago
|
||
(In reply to Ed Morley [:edmorley] from comment #22)
> the 3
> DOMWindows leak is a common intermittent,
It is? Do we have a bug # for it? It really should be fixed.
Assignee | ||
Comment 24•12 years ago
|
||
(In reply to Olli Pettay [:smaug] (limited connectivity July 26-29) from comment #21)
> You mean 3 nsGlobalWindow objects?
> There should be no nsGlobalWindow object leaks ever.
The leaks being discussed are not shutdown leaks. There's a different leak detector that uses the ++DOMWINDOW and --DOMWINDOW output to try to detect leaks before shutdown. It frequently reports false positives and seems pretty broken.
Comment 25•12 years ago
|
||
(In reply to Olli Pettay [:smaug] (limited connectivity July 26-29) from comment #23)
> It is? Do we have a bug # for it? It really should be fixed.
We do, sadly it's a pick three from:
https://bugzilla.mozilla.org/showdependencytree.cgi?id=754804&hide_resolved=1
Bug 728294 should cut out any false positives; how many remain after that is yet to be seen.
Comment 26•12 years ago
|
||
(In reply to Bill McCloskey (:billm) from comment #24)
>
> The leaks being discussed are not shutdown leaks.
Oh, nm then.
Assignee | ||
Comment 27•12 years ago
|
||
This seems to fix the leak. Here's the try link:
https://tbpl.mozilla.org/?tree=Try&rev=7206a5fca258
We never leak more than 4 windows in the tests, which I think is normal.
To look for leaks, we force a GC&CC at some point in the tests. This GC is non-incremental, so objects are released synchronously. However, there may have been an incremental GC that happened right before it that released objects asynchronously. In that case, there may still be some objects to be released even after the non-incremental GC finishes, and that's why we were reporting leaks.
This patch keeps a pointer to the outstanding IncrementalReleaseRunnable and ensures that it never lives beyond the next GC. This fixes the problem above, and it also ensures that we don't get into situations where the number of objects to be released incrementally grows without bound.
Attachment #646238 -
Attachment is obsolete: true
Attachment #648492 -
Flags: review?(bugs)
Comment 28•12 years ago
|
||
Comment on attachment 648492 [details] [diff] [review]
patch v4
>+XPCIncrementalReleaseRunnable::~XPCIncrementalReleaseRunnable()
>+{
>+ nsLayoutStatics::Release();
>+}
Could you add assertion here that 'this' != runtime->mReleaseRunnable
>+
>+void
>+XPCIncrementalReleaseRunnable::ReleaseNow(bool limited)
>+{
>+ MOZ_ASSERT(NS_IsMainThread());
>+
>+ TimeDuration sliceTime = TimeDuration::FromMilliseconds(SliceMillis);
>+ TimeStamp started = TimeStamp::Now();
>+ PRUint32 counter = 0;
>+ while (1) {
>+ PRUint32 count = items.Length();
>+ if (!count)
>+ break;
>+
>+ nsISupports *wrapper = items[count - 1];
>+ items.RemoveElementAt(count - 1);
>+ NS_RELEASE(wrapper);
>+
>+ /* We don't want to read the clock too often. */
>+ counter++;
I would put 'if (limited) {' before counter and remove limited below
>+ if (counter == 100) {
>+ counter = 0;
>+ if (limited && TimeStamp::Now() - started >= sliceTime)
>+ break;
>+ }
>+ }
case JSGC_BEGIN:
{
+ if (self->mReleaseRunnable)
+ self->mReleaseRunnable->ReleaseNow(false);
+
Add a comment why this is needed, and why it is safe to call ReleaseNow()
Attachment #648492 -
Flags: review?(bugs) → review+
Comment 29•12 years ago
|
||
Comment on attachment 648492 [details] [diff] [review]
patch v4
4 leaks is normal, yes. LGTM!
Attachment #648492 -
Flags: feedback+
Assignee | ||
Comment 30•12 years ago
|
||
Comment 31•12 years ago
|
||
Status: NEW → RESOLVED
Closed: 12 years ago
Resolution: --- → FIXED
Target Milestone: --- → mozilla17
Updated•12 years ago
|
Summary: Deferred release during JS GC should happen... more later → JS GC: Incremental deferred release
You need to log in
before you can comment on or make changes to this bug.
Description
•