723892 - investigate rooting issues with RegExpShared

Reporter

Description

•

13 years ago

Currently RegExpPrivate uses refcounting to keep itself alive when it is shared between multiple objects and when it is decoupled from any JS objects like in the case of a GC during regexp searching. However in the last case I do not see what exactly keeps RegExpPrivate::source alive. In particular, AFAICS there is no code that traces into RegExpPrivate::source.

Igor Bukanov

Reporter

Updated

•

13 years ago

Summary: investigate rooting issues withs RegExpPrivate → investigate rooting issues with RegExpPrivate

Luke Wagner [:luke]

Comment 1

•

13 years ago

btw: s/RegExpPrivate/RegExpShared/ If the RegExpShared-alive-without-RegExpObject case only shows up while the regexp is being executed (is that true?), it seems like an AutoStringRooter in RegExpMatcher would do the trick. Chris: given that RegExpShared can be shared between compartments and JSLinearString can be !isAtom, this seems like a cross-compartment hazard in waiting. With this in mind, the obvious fix seems to be to atomize the pattern string.

Igor Bukanov

Reporter

Comment 2

•

13 years ago

(In reply to Luke Wagner [:luke] from comment #1) > Chris: given that RegExpShared can be shared between compartments and > JSLinearString can be !isAtom, this seems like a cross-compartment hazard in > waiting. With this in mind, the obvious fix seems to be to atomize the > pattern string. Atoms can be collected just as well. We really need either to root the string or replace the string with malloced memory.

Luke Wagner [:luke]

Comment 3

•

13 years ago

Actually, it seems like RegExpShared doesn't even need to keep 'source' at all! In all but 1 case, 'source' is just transmitting a 'source' argument present in an outer call. The exception is that, when a regexp is finalized, we remove it from the cache. We could flip this around and say that the regexp cache (which is purged on every gc) holds a strong ref. IIUC, in the cases where we create a RegExpShared without a RegExpObject, we immediately remove it from the cache when the stack guard is popped, so this would actually allow those RegExpShared's to be reused.

Luke Wagner [:luke]

Comment 4

•

13 years ago

(In reply to Igor Bukanov from comment #2) > Atoms can be collected just as well. We really need either to root the > string or replace the string with malloced memory. I know, the atom comment was "in addition" to the rooting comment. (Except that both are, if I haven't missed a bit, nullified by comment 3.)

Igor Bukanov

Reporter

Comment 5

•

13 years ago

(In reply to Luke Wagner [:luke] from comment #3) > We could flip this around and say > that the regexp cache (which is purged on every gc) holds a strong ref. That is a nice idea and eliminates the need to store the keys in cache values.

Chris Leary [:cdleary] (not checking bugmail)

Comment 6

•

13 years ago

I think this is okay (but probably gross enough that it should be addressed), but please check my reasoning. Since all RegExpObjects get purged of their RegExpShared on every GC and the RegExpShared cache is deleted on every GC, the source field of the RegExpShared that's not referenced by any RegExpObject (only held in a RegExpMatcher) will be otherwise unreachable by the program. The AutoRefCount will just keep it alive for long enough to let the RegExpCode code to finish its work without be deallocated, which makes no use of the source field.

Luke Wagner [:luke]

Comment 7

•

13 years ago

I just learned at lunch that regexp objects drop their RegExpShared on every gc. That suggests a wholly simpler model: RegExpShared isn't ref-counted. There is a single linked list maintained by the runtime of all live RegExpShareds. The entire list gets deleted on every gc (comp or full) with the exception of RegExpShareds that are actively being used in a regexp operation at the time of gc. To know this, we (re-)add a ref count to RegExpShared counting the number of active uses of the regexp (so inc'd/dec'd in RegExpMatcher's ctor/dtor). (This also lets RegExpShared::source be removed.) Does that miss any corner cases?

Igor Bukanov

Reporter

Comment 8

•

13 years ago

(In reply to Chris Leary [:cdleary] from comment #6) > RegExpMatcher) will be otherwise unreachable by the program. The > AutoRefCount will just keep it alive for long enough to let the RegExpCode > code to finish its work without be deallocated, which makes no use of the > source field. The source field will be used in RegExpShared::decref to purge the regexp from the cache. As RegExpShared::decref also dereferences the source in source->isAtom, we do have a hazard here.

Luke Wagner [:luke]

Updated

•

13 years ago

Assignee: general → luke

Summary: investigate rooting issues with RegExpPrivate → investigate rooting issues with RegExpShared

Luke Wagner [:luke]

Comment 9

•

13 years ago

Duh, Bill pointed out obviously per-runtime causes dangling pointers during compartment GC from the non-GC-ed compartment. So of course the list and cache need to be per-compartment.

Igor Bukanov

Reporter

Comment 10

•

13 years ago

(In reply to Luke Wagner [:luke] from comment #9) > Duh, Bill pointed out obviously per-runtime causes dangling pointers during > compartment GC from the non-GC-ed compartment. So of course the list and > cache need to be per-compartment. Hm, which pointers? With the source field removed the RegExpShared shall contain only allocated data. So what exactly makes it compartment-sensitive? Also, do we really need the list? Why we cannot simply enumerate the cache hashtable entries?

Igor Bukanov

Reporter

Comment 11

•

13 years ago

(In reply to Igor Bukanov from comment #10) > With the source field removed the RegExpShared shall contain only allocated data. I meant malloc-allocated data.

Chris Leary [:cdleary] (not checking bugmail)

Comment 12

•

13 years ago

(In reply to Igor Bukanov from comment #8) > The source field will be used in RegExpShared::decref to purge the regexp > from the cache. As RegExpShared::decref also dereferences the source in > source->isAtom, we do have a hazard here. Oh yeah, so for operations that can create more regexps inside in a nested fashion (like String.prototype.replace) we should have a hazard. Nice catch. I vote for the AutoStringRooter (because RegExpShareds are created lazily anyway, so it shouldn't add a lot of auto-rooting overhead). If we're in agreement on that I can make the patch.

Luke Wagner [:luke]

Comment 13

•

13 years ago

(In reply to Igor Bukanov from comment #10) Dangling pointers from RegExpObjects to RegExpShareds. If every single RegExpShared in a compartment is in the hashtable then yes, the list would be superfluous. Right now that isn't the case, but it could be. Then the hashmap wouldn't so really be a "cache" at all but an owner of RegExpShareds. I like it.

Luke Wagner [:luke]

Comment 14

•

13 years ago

(In reply to Chris Leary [:cdleary] from comment #12) If you don't mind, I'd like to do what we were just talking about in the above comments. So far the patch is allowing a lot of simplifications to the code.

Igor Bukanov

Reporter

Comment 15

•

13 years ago

For branches (if this could not wait FF 13) we may need a simple solution. Always atomizing the source in RegExpShared could be that solution. This will avoid dereferencing the atom in decref.

Igor Bukanov

Reporter

Comment 16

•

13 years ago

The complication with the global cache that is wiped during each GC - this requires to traverse the heap of all compartments to find all references to RegExpShared to reset them to zero. So we either should use per-compartment cache or continue to ref count RegExpShared to avoid wiping RegExpShared pointed from other compartments. So the original idea from the comment 3 looks like the best option.

Luke Wagner [:luke]

Comment 17

•

13 years ago

(In reply to Igor Bukanov from comment #16) > So we either should use per-compartment cache Yes, that is what comment 9 said. (In reply to Igor Bukanov from comment #15) > For branches (if this could not wait FF 13) we may need a simple solution. The simplest solution is coment 1: root the source in the matcher. Now that I think about it, Chris, if you wanted to do that early, I can just file a non-ss followup with simplification patch.

Assignee: luke → general

Chris Leary [:cdleary] (not checking bugmail)

Comment 18

•

13 years ago

I suppose that fix would actually require me to create a new AutoRooter that hangs off of the runtime instead of a context (in order to be scoped to an appropriate lifetime). Instead, as a quick fix, I can keep a cache generation counter to avoid the problem mentioned in comment 8. If the cache generation isn't equal to the one where the regexp shared was created, we don't attempt a cache removal. Any objections to using that approach as a stopgap instead?

Igor Bukanov

Reporter

Comment 19

•

13 years ago

(In reply to Chris Leary [:cdleary] from comment #18) > I suppose that fix would actually require me to create a new AutoRooter that > hangs off of the runtime instead of a context (in order to be scoped to an > appropriate lifetime). Instead, as a quick fix, I can keep a cache > generation counter to avoid the problem mentioned in comment 8. You mean a counter that is increased each time we prune the cache during the GC? That should work I suppose. Still, why cannot we simply always atomize the source string?

Luke Wagner [:luke]

Comment 20

•

13 years ago

If a regexp shared's lifetime can only be extended past GC by regexp execution, why is a stack guard not appropriately scoped?

Luke Wagner [:luke]

Comment 21

•

13 years ago

The patch in bug 724748 should fix the issue on trunk (it removes RegExpShared::source and does the per-compartment ownership).

Daniel Veditz [:dveditz]

Comment 22

•

13 years ago

(In reply to Chris Leary [:cdleary] from comment #18) > Any objections to using that approach as a stopgap instead? Any hints on how our fuzzers might be able to flush out any actual vulnerabilities in this code? If we don't know of a specific problem we could skip the stop-gap and just wait for Firefox 13 to fix the problem. Hm, does "currently" in comment 0 include Firefox 10 and 10-esr? If so maybe we do need to worry about a stop-gap.

David Mandelin [:dmandelin]

Comment 23

•

13 years ago

Luke, with bug 724748 fixed, is this also fixed?

Luke Wagner [:luke]

Comment 24

•

13 years ago

Yes it is on trunk and aurora.

David Mandelin [:dmandelin]

Updated

•

13 years ago

Group: core-security

Status: NEW → RESOLVED

Closed: 13 years ago

Depends on: 724748

Resolution: --- → WORKSFORME