Closed Bug 723892 Opened 13 years ago Closed 13 years ago

investigate rooting issues with RegExpShared

Categories

(Core :: JavaScript Engine, defect)

defect
Not set
normal

Tracking

()

RESOLVED WORKSFORME

People

(Reporter: igor, Unassigned)

References

Details

Currently RegExpPrivate uses refcounting to keep itself alive when it is shared between multiple objects and when it is decoupled from any JS objects like in the case of a GC during regexp searching. However in the last case I do not see what exactly keeps RegExpPrivate::source alive. In particular, AFAICS there is no code that traces into RegExpPrivate::source.
Summary: investigate rooting issues withs RegExpPrivate → investigate rooting issues with RegExpPrivate
btw: s/RegExpPrivate/RegExpShared/ If the RegExpShared-alive-without-RegExpObject case only shows up while the regexp is being executed (is that true?), it seems like an AutoStringRooter in RegExpMatcher would do the trick. Chris: given that RegExpShared can be shared between compartments and JSLinearString can be !isAtom, this seems like a cross-compartment hazard in waiting. With this in mind, the obvious fix seems to be to atomize the pattern string.
(In reply to Luke Wagner [:luke] from comment #1) > Chris: given that RegExpShared can be shared between compartments and > JSLinearString can be !isAtom, this seems like a cross-compartment hazard in > waiting. With this in mind, the obvious fix seems to be to atomize the > pattern string. Atoms can be collected just as well. We really need either to root the string or replace the string with malloced memory.
Actually, it seems like RegExpShared doesn't even need to keep 'source' at all! In all but 1 case, 'source' is just transmitting a 'source' argument present in an outer call. The exception is that, when a regexp is finalized, we remove it from the cache. We could flip this around and say that the regexp cache (which is purged on every gc) holds a strong ref. IIUC, in the cases where we create a RegExpShared without a RegExpObject, we immediately remove it from the cache when the stack guard is popped, so this would actually allow those RegExpShared's to be reused.
(In reply to Igor Bukanov from comment #2) > Atoms can be collected just as well. We really need either to root the > string or replace the string with malloced memory. I know, the atom comment was "in addition" to the rooting comment. (Except that both are, if I haven't missed a bit, nullified by comment 3.)
(In reply to Luke Wagner [:luke] from comment #3) > We could flip this around and say > that the regexp cache (which is purged on every gc) holds a strong ref. That is a nice idea and eliminates the need to store the keys in cache values.
I think this is okay (but probably gross enough that it should be addressed), but please check my reasoning. Since all RegExpObjects get purged of their RegExpShared on every GC and the RegExpShared cache is deleted on every GC, the source field of the RegExpShared that's not referenced by any RegExpObject (only held in a RegExpMatcher) will be otherwise unreachable by the program. The AutoRefCount will just keep it alive for long enough to let the RegExpCode code to finish its work without be deallocated, which makes no use of the source field.
I just learned at lunch that regexp objects drop their RegExpShared on every gc. That suggests a wholly simpler model: RegExpShared isn't ref-counted. There is a single linked list maintained by the runtime of all live RegExpShareds. The entire list gets deleted on every gc (comp or full) with the exception of RegExpShareds that are actively being used in a regexp operation at the time of gc. To know this, we (re-)add a ref count to RegExpShared counting the number of active uses of the regexp (so inc'd/dec'd in RegExpMatcher's ctor/dtor). (This also lets RegExpShared::source be removed.) Does that miss any corner cases?
(In reply to Chris Leary [:cdleary] from comment #6) > RegExpMatcher) will be otherwise unreachable by the program. The > AutoRefCount will just keep it alive for long enough to let the RegExpCode > code to finish its work without be deallocated, which makes no use of the > source field. The source field will be used in RegExpShared::decref to purge the regexp from the cache. As RegExpShared::decref also dereferences the source in source->isAtom, we do have a hazard here.
Assignee: general → luke
Summary: investigate rooting issues with RegExpPrivate → investigate rooting issues with RegExpShared
Duh, Bill pointed out obviously per-runtime causes dangling pointers during compartment GC from the non-GC-ed compartment. So of course the list and cache need to be per-compartment.
(In reply to Luke Wagner [:luke] from comment #9) > Duh, Bill pointed out obviously per-runtime causes dangling pointers during > compartment GC from the non-GC-ed compartment. So of course the list and > cache need to be per-compartment. Hm, which pointers? With the source field removed the RegExpShared shall contain only allocated data. So what exactly makes it compartment-sensitive? Also, do we really need the list? Why we cannot simply enumerate the cache hashtable entries?
(In reply to Igor Bukanov from comment #10) > With the source field removed the RegExpShared shall contain only allocated data. I meant malloc-allocated data.
(In reply to Igor Bukanov from comment #8) > The source field will be used in RegExpShared::decref to purge the regexp > from the cache. As RegExpShared::decref also dereferences the source in > source->isAtom, we do have a hazard here. Oh yeah, so for operations that can create more regexps inside in a nested fashion (like String.prototype.replace) we should have a hazard. Nice catch. I vote for the AutoStringRooter (because RegExpShareds are created lazily anyway, so it shouldn't add a lot of auto-rooting overhead). If we're in agreement on that I can make the patch.
(In reply to Igor Bukanov from comment #10) Dangling pointers from RegExpObjects to RegExpShareds. If every single RegExpShared in a compartment is in the hashtable then yes, the list would be superfluous. Right now that isn't the case, but it could be. Then the hashmap wouldn't so really be a "cache" at all but an owner of RegExpShareds. I like it.
(In reply to Chris Leary [:cdleary] from comment #12) If you don't mind, I'd like to do what we were just talking about in the above comments. So far the patch is allowing a lot of simplifications to the code.
For branches (if this could not wait FF 13) we may need a simple solution. Always atomizing the source in RegExpShared could be that solution. This will avoid dereferencing the atom in decref.
The complication with the global cache that is wiped during each GC - this requires to traverse the heap of all compartments to find all references to RegExpShared to reset them to zero. So we either should use per-compartment cache or continue to ref count RegExpShared to avoid wiping RegExpShared pointed from other compartments. So the original idea from the comment 3 looks like the best option.
(In reply to Igor Bukanov from comment #16) > So we either should use per-compartment cache Yes, that is what comment 9 said. (In reply to Igor Bukanov from comment #15) > For branches (if this could not wait FF 13) we may need a simple solution. The simplest solution is coment 1: root the source in the matcher. Now that I think about it, Chris, if you wanted to do that early, I can just file a non-ss followup with simplification patch.
Assignee: luke → general
I suppose that fix would actually require me to create a new AutoRooter that hangs off of the runtime instead of a context (in order to be scoped to an appropriate lifetime). Instead, as a quick fix, I can keep a cache generation counter to avoid the problem mentioned in comment 8. If the cache generation isn't equal to the one where the regexp shared was created, we don't attempt a cache removal. Any objections to using that approach as a stopgap instead?
(In reply to Chris Leary [:cdleary] from comment #18) > I suppose that fix would actually require me to create a new AutoRooter that > hangs off of the runtime instead of a context (in order to be scoped to an > appropriate lifetime). Instead, as a quick fix, I can keep a cache > generation counter to avoid the problem mentioned in comment 8. You mean a counter that is increased each time we prune the cache during the GC? That should work I suppose. Still, why cannot we simply always atomize the source string?
If a regexp shared's lifetime can only be extended past GC by regexp execution, why is a stack guard not appropriately scoped?
The patch in bug 724748 should fix the issue on trunk (it removes RegExpShared::source and does the per-compartment ownership).
(In reply to Chris Leary [:cdleary] from comment #18) > Any objections to using that approach as a stopgap instead? Any hints on how our fuzzers might be able to flush out any actual vulnerabilities in this code? If we don't know of a specific problem we could skip the stop-gap and just wait for Firefox 13 to fix the problem. Hm, does "currently" in comment 0 include Firefox 10 and 10-esr? If so maybe we do need to worry about a stop-gap.
Luke, with bug 724748 fixed, is this also fixed?
Yes it is on trunk and aurora.
Group: core-security
Status: NEW → RESOLVED
Closed: 13 years ago
Depends on: 724748
Resolution: --- → WORKSFORME
You need to log in before you can comment on or make changes to this bug.