Closed Bug 931864 Opened 11 years ago Closed 11 years ago

can we remove the activeGCInAtomsZone limitation in JS::CanCompileOffThread?

Categories

(Core :: JavaScript Engine, defect)

defect
Not set
normal

Tracking

()

RESOLVED FIXED
mozilla28
Tracking Status
firefox28 --- fixed

People

(Reporter: luke, Assigned: luke)

References

(Blocks 1 open bug)

Details

(Whiteboard: [games][qa-])

Attachments

(1 file, 2 obsolete files)

For sites to really be able to benefit from off-main-thread parsing, it would be good if it was very predictable when a script would be parsed off-main-thread.  The only problem I can see now is the !cx->runtime()->activeGCInAtomsZone() requirement in CanCompileOffThread.  In my testing, this happens not infrequently.  So, my question is: how hard would it be to remove this restriction?
One relatively easy option would be to put off the parallel parse until the GC is done.

Anything more than that is tricky. The parsing thread can point to atoms in the atoms zone, so we need to worry about marking that thread's stack and also about write barriers. Once we finish marking, it should be safe to run parallel parsing. Unfortunately, the atoms zone is usually the last one that we finish marking because it has pointers from all other zones.

If we're really serious about this, we could support marking multiple thread stacks in the GC. Brian has said he'd like to hold off on this until exact rooting lands, and that seems sensible. We could always revise if it's a major issue though. It would also be possible to make the write barriers be threadsafe. That would be a lot of work though, and we might have to switch to a different layout for mark bits.

Another way to address this would be to allow at least some atoms be zone-local (rather than always going in the atoms zone). I'm not entirely certain what requirements we have about equality between atoms in different zones. It would be interesting to break all these invariants and see what breaks. I've wanted to do that for a while.

Finally, we could just cancel the GC of the atoms compartment when we decide to do parallel parsing. I'd be worried about that though. We already skip the atoms compartment in a lot of cases, so there's a danger that it could get pretty big if you're parsing fairly often. Maybe I'm being overly conservative though.

I don't have time to try any of this though :-(. We can do the quick fix now and leave the rest for later, unless Brian has better ideas.
(In reply to Bill McCloskey (:billm) from comment #1)
> One relatively easy option would be to put off the parallel parse until the GC is done.

Is there a synchronous way to say "finish up the current igc" such that CanCompileOffThread to say

  if (cx->runtime()->activeGCInAtomsZone())
      FinishGC();

?  Otherwise, I guess we'd just return 'true', enqueue the ParseTask and add some extra logic to handleParseWorkload to only run ParseTasks when GC is complete?

> If we're really serious about this, we could support marking multiple thread
> stacks in the GC.

Would this require the worker-pause feature that bug 928050 is removing?  Speaking of, how does bug 928050 handle a GC that starts while workers are parse tasks are active?  Does it cancel GC'ing the atoms?
(In reply to Luke Wagner [:luke] from comment #2)
> (In reply to Bill McCloskey (:billm) from comment #1)
> > One relatively easy option would be to put off the parallel parse until the GC is done.
> 
> Is there a synchronous way to say "finish up the current igc" such that
> CanCompileOffThread to say
> 
>   if (cx->runtime()->activeGCInAtomsZone())
>       FinishGC();

Yes, this code does it:
        JS::PrepareForIncrementalGC(rt);
        JS::FinishIncrementalGC(rt, JS::gcreason::API);

> ?  Otherwise, I guess we'd just return 'true', enqueue the ParseTask and add
> some extra logic to handleParseWorkload to only run ParseTasks when GC is
> complete?

Yeah, that seems better to me since it doesn't block the main thread.

> > If we're really serious about this, we could support marking multiple thread
> > stacks in the GC.
> 
> Would this require the worker-pause feature that bug 928050 is removing? 

Yes, something like it. Although I think the state of the art is to ask each thread to scan its own stack and to resume the GC when they're all done.

> Speaking of, how does bug 928050 handle a GC that starts while workers are
> parse tasks are active?  Does it cancel GC'ing the atoms?

Yeah, the GC won't collect the atoms compartment if there's an ongoing parse task. I guess that's not fundamentally different than canceling the atoms compartment if the ordering is the other way.
Blocks: gecko-games
> > ?  Otherwise, I guess we'd just return 'true', enqueue the ParseTask and add
> > some extra logic to handleParseWorkload to only run ParseTasks when GC is
> > complete?> 
> Yeah, that seems better to me since it doesn't block the main thread.

Ok, this one sounds like the way to go.  No real downside, no main-thread pauses.  The patch is simple I think given the answers to a few questions:
 - what is the right lock to take to test cx->runtime->activeGCInAtomsZone from the parser thread (canStartParseTask)?
 - where is the right place to workerThreadState.notify(WorkerThreadState::PRODUCER) when the GC completes?  (In particular, it'd be important that we always hit this point (for all manner of GCs) and that, when we Notify(), activeGCInAtomsZone is already false)?
 - where is the condition tested that makes the GC not collect the atoms compartment if a parse task is active?

(If it's more work to answer than to just write the patch, feel free to take ;)
(In reply to Luke Wagner [:luke] from comment #4)
> Ok, this one sounds like the way to go.  No real downside, no main-thread
> pauses.  The patch is simple I think given the answers to a few questions:
>  - what is the right lock to take to test cx->runtime->activeGCInAtomsZone
> from the parser thread (canStartParseTask)?

I think it would be better to do everything on the main thread. We would have a separate queue of "deferred parse tasks". StartOffThreadParseScript would decide to place the new task in that queue based on activeGCInAtomsZone. Then when the GC is over, we could shift all these tasks over to the regular queue.

>  - where is the right place to
> workerThreadState.notify(WorkerThreadState::PRODUCER) when the GC completes?
> (In particular, it'd be important that we always hit this point (for all
> manner of GCs) and that, when we Notify(), activeGCInAtomsZone is already
> false)?

The easiest place is probably EndSweepPhase. I don't think we should hold any locks then. You could transfer the tasks at that point and notify.

>  - where is the condition tested that makes the GC not collect the atoms
> compartment if a parse task is active?

isGCScheduled() is called by the GC to decide what zones to collect. It calls canCollect(), which checks the atoms condition.
http://mxr.mozilla.org/mozilla-central/source/js/src/gc/Zone.h?force=1#184
Attached patch rm-atoms-zone-limitation (obsolete) (deleted) — Splinter Review
With all your advice, it was easy... almost too easy.
Assignee: nobody → luke
Status: NEW → ASSIGNED
Attachment #8336890 - Flags: review?(wmccloskey)
Blocks: 942276
Comment on attachment 8336890 [details] [diff] [review]
rm-atoms-zone-limitation

Review of attachment 8336890 [details] [diff] [review]:
-----------------------------------------------------------------

Looks great, thanks!

::: js/src/jsworkers.cpp
@@ +271,5 @@
>          return false;
>  
>      workercx.forget();
>  
> +    if (!task->init(cx, options))

Why did you move this below the workercx.forget()? If it fails, we won't clean up workercx.

Maybe it's because the enterCompartment call makes it a bad idea to release workercx? That's fine with me. Personally I think it's fine to leak once in a while on OOM paths.
Attachment #8336890 - Flags: review?(wmccloskey) → review+
(In reply to Bill McCloskey (:billm) from comment #7)
> Why did you move this below the workercx.forget()? If it fails, we won't
> clean up workercx.

This is to fix a possible double-free-on-OOM I happened to see: if the ParseTask construction succeeds, then the ParseTask takes ownership of workercx and will delete workercx in ~ParseTask.  That means if init() fails, both workercx and the ParseTask would try to free the ExclusiveContext.
Backed out in https://hg.mozilla.org/integration/mozilla-inbound/rev/c7c9fae33603

Of course, nothing can ever be easy, so the best two places to hit "Assertion failure: canCollect(), at ../../../js/src/gc/Zone.h:171" are 10.7 reftests (we intend to stop running tests on 10.7 by Tuesday and turn the slaves into 10.6 slaves) and Linux64 browser-chrome (which runs on the old Fedora slaves, so there are only 15 of the 31 remaining slaves actually working), but unless my retriggers take a turn for the worse, https://tbpl.mozilla.org/?tree=Mozilla-Inbound&tochange=5ecede5e0ceb&fromchange=9bd5d4cb820e is going to say this is what started it.
Bill: my theory is that, after we call EnqueuePendingParseTasksAfterGC, that another GC cycle happens and we collect the atoms compartment in this GC.  This would explain the
  Assertion failure: canCollect(), at ../../../js/src/gc/Zone.h:171
assertion failure.  Is there a more appropriate place to call EnqueuePendingParseTasksAfterGC?

To wit, I tried to add a JS_ASSERT(!cx->runtime()->activeGCInAtomsZone()) in EnqueuePendingParseTasksAfterGC, but this trivially fails because the GC isn't yet marked as complete.
philor: Looking at https://tbpl.mozilla.org/?tree=Mozilla-Inbound&rev=5ecede5e0ceb again, it looks like the only failures that were not starred to some other bug were the 2 assertions involving canCollect().  Does that mean that you think that only those two failures should be attributed to this patch and the others were random orange?
Flags: needinfo?(philringnalda)
So I tried my theory in comment 14 by moving the EqueuePendingParseTasksAfterGC out of GCCycle:
  https://tbpl.mozilla.org/?tree=Try&rev=4939850dd069
but it hits the same assertion.

I think I can see why this hits, though: we are in the middle of an igc, we start an off-thread parse including the ExclusiveContext which immediately makes atomsZone->canCollect() false.  Now if the GC needs a cycle, setGCState() will assert in BeginMarkPhase.  Does that sound plausible Bill?
Pretty as a picture, ain't it? Yeah, only the assertions were yours.
Flags: needinfo?(philringnalda)
Attached patch rm-atoms-zone-limitation v.2 (obsolete) (deleted) — Splinter Review
This contains the fixes to the two points I brought up above.  The main thing I'm unsure about is putting EnqueuePendingParseTasksAfterGC where I did.

Try's down, so I can't try this yet, but I'll post when I do.
Attachment #8336890 - Attachment is obsolete: true
Attachment #8338002 - Flags: review?(wmccloskey)
Comment on attachment 8338002 [details] [diff] [review]
rm-atoms-zone-limitation v.2

Review of attachment 8338002 [details] [diff] [review]:
-----------------------------------------------------------------

Thanks. Sorry I missed this.

::: js/src/jsgc.cpp
@@ +4112,5 @@
>  
>      FinishMarkingValidation(rt);
>  
>      rt->gcLastGCTime = PRMJ_Now();
> +

Remove.

@@ +4757,5 @@
>  
>      rt->gcShouldCleanUpEverything = ShouldCleanUpEverything(rt, reason, gckind);
>  
> +    {
> +        gcstats::AutoGCSlice agc(rt->gcStats, collectedCount, zoneCount, compartmentCount, reason);

Is it necessary that EnqueuePendingParseTasksAfterGC happen outside the scope of this? This class is to account for time spent in the GC, and I'd rather not leave anything out.

::: js/src/jsworkers.cpp
@@ +290,5 @@
> +    } else {
> +        // Delay making the runtime as used by an exclusive-context until the
> +        // context is actually enqueued.
> +        cx->runtime()->setUsedByExclusiveThread(global->zone());
> +        task->cx->enterCompartment(global->compartment());

Could we add a method ParseTask::active that would do these two actions? Then we could call it from both places.
Attachment #8338002 - Flags: review?(wmccloskey) → review+
(In reply to Bill McCloskey (:billm) from comment #19)
> Is it necessary that EnqueuePendingParseTasksAfterGC happen outside the
> scope of this? This class is to account for time spent in the GC, and I'd
> rather not leave anything out.

Oh, hah, there are two AutoGCSlice's, and I was reading the one in jsgc.cpp.  You're right.

> Could we add a method ParseTask::active that would do these two actions?
> Then we could call it from both places.

Yes, good idea.
Oops, I meant activate, not active.
(In reply to Bill McCloskey (:billm) from comment #21)
> Oops, I meant activate, not active.

Yeah, I figured.  Pushed to try: https://tbpl.mozilla.org/?tree=Try&rev=472a07113899
Well, unfortunately now there are two instances of:
Assertion failure: backgroundFinalizeState[thingKind] == BFS_DONE, at ../../../js/src/jsgc.cpp:1371

https://tbpl.mozilla.org/php/getParsedLog.php?id=31063883&tree=Try&full=1
https://tbpl.mozilla.org/php/getParsedLog.php?id=31068199&tree=Try&full=1

Any idea of what this means Bill?
Thinking about this, the assertion is probably too tight. I think it's valid for the state to be BFS_JUST_FINISHED as well. This would happen if the previous background sweep completely freed all the arenas and then no more were allocated before the next GC. I'm kind of surprised that we haven't hit this before, if it is something that can happen.

Anyway, maybe push the change to try and see if the problem gets fixed?
No, I'm wrong. This code here should prevent the problem I mentioned:
  http://mxr.mozilla.org/mozilla-central/source/js/src/jsgc.cpp#1418

So I'm not sure why this assertion would trigger. It would be useful to know the value of backgroundFinalizeState[thingKind].
Let's see.  Hopefully printf shows up in the logs..
  https://tbpl.mozilla.org/?tree=Try&rev=7d0cc733701a
D'oh, bhackett pointed out I need fprintf(stderr):
  https://tbpl.mozilla.org/?tree=Try&rev=340cf20c4812
Depends on: 939643
These pushes seem to expose a timing-dependent test bug that is fortunately fixed in bug 939643.  Interestingly, no BFS_DONE asserts in those pushes.  Trying again:
  https://tbpl.mozilla.org/?tree=Try&rev=f5cab3df33b9
I was able to catch this under a debugger.  The thingKind is OBJECT0_BACKGROUND and the state is BFS_JUST_FINISHED.  I also had some logging and, right before this assert, a Zone was enqueued in parseWaitingOnGC, but it's not the Zone containing the backgroundFinalizeState we're asserting on.  The faulting zone however is a Zone from a previous off-thread compilation that had completed and is scheduledForDestruction.
Attached patch rm-atoms-zone-limitation v.3 (deleted) — Splinter Review
Bill pointed out that we need do the normalization of backgroundFinalizeState on both ArenaLists now that an exclusive context's zone can be GC'd before the ParseTask starts executing.

https://tbpl.mozilla.org/?tree=Try&rev=5e7f6477b3f4
Attachment #8339590 - Flags: review?(wmccloskey)
Attachment #8338002 - Attachment is obsolete: true
Woohoo, looks good.
Attachment #8339590 - Flags: review?(wmccloskey) → review+
Unfortunately, it looks like this patch causes a mochiteest to time out a lot more frequently.  There is already a bug on this (bug 942473), so I'm assuming this is a pre-existing race condition.  I'll see if I can get someone to look at it or if I can disable it.
Depends on: 942473
Landed with the problem test disabled, with permission in bug 942473:
https://hg.mozilla.org/integration/mozilla-inbound/rev/1d9356c6c769
https://hg.mozilla.org/mozilla-central/rev/1d9356c6c769
Status: ASSIGNED → RESOLVED
Closed: 11 years ago
Resolution: --- → FIXED
Target Milestone: --- → mozilla28
Depends on: 961318
Whiteboard: [games] → [games][qa-]
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Created:
Updated:
Size: