Closed Bug 599610 (Opened 14 years ago, Closed 14 years ago)
PurgeScriptFragments may not purge all fragments
Categories
(Core :: JavaScript Engine, defect)
Tracking
Status: RESOLVED FIXED

Tracking | Status
---|---
blocking2.0 | beta8+
status1.9.2 | wontfix
status1.9.1 | unaffected
People
(Reporter: igor, Assigned: igor)
References
Details
(Whiteboard: [sg:critical] [qa-examined-192] [qa-needs-STR])
Attachments
(2 files, 4 obsolete files)
patch (deleted) | gal: review+
patch (deleted) | gal: review+; dveditz: approval1.9.2.14-, approval1.9.2.18-, approval1.9.2.20-
Description
PurgeScriptFragments only looks for trace fragments in the TraceMonitor for the current thread. This misses other threads that may also contain fragments for the script. For example, the GC never runs on a web worker thread, so the GC will never purge any fragments for web worker functions. This is hazardous, as the stale fragments may match a newly allocated script that reuses memory from the deleted script.
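To make the hazard concrete, here is a minimal self-contained C++ sketch (illustrative only; Monitor, Fragment, and PurgeCurrentThreadOnly are made-up stand-ins, not the actual TraceMonkey code) of how a purge that consults only the current thread's monitor leaves stale fragments keyed by code addresses that a later script can reuse:

```cpp
#include <cstdio>
#include <map>
#include <vector>

// Hypothetical stand-ins for a trace monitor and its fragment table,
// keyed by the bytecode address the trace was recorded at.
struct Fragment { const void* ip; };
struct Monitor  { std::map<const void*, Fragment> fragments; };

// Each thread owns a monitor; the buggy purge only touches one of them.
static std::vector<Monitor> monitors(2); // [0] = main thread, [1] = worker

static void PurgeCurrentThreadOnly(Monitor& current, const void* scriptCode) {
    // Mirrors the flaw: only the caller's monitor is searched.
    current.fragments.erase(scriptCode);
}

int main() {
    char scriptCode[16];                              // pretend bytecode storage
    monitors[0].fragments[scriptCode] = {scriptCode}; // recorded on main thread
    monitors[1].fragments[scriptCode] = {scriptCode}; // recorded on a worker

    // The script is finalized on the main thread; only its monitor is purged.
    PurgeCurrentThreadOnly(monitors[0], scriptCode);

    // A new script reuses the same memory; the worker's stale fragment
    // still matches it by address.
    bool stale = monitors[1].fragments.count(scriptCode) != 0;
    std::printf("stale fragment on worker thread: %s\n", stale ? "yes" : "no");
}
```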
Comment 1 • 14 years ago (Assignee)
The safest fix for the branches would be to enumerate over all threads in search of the script's fragments. But with web workers active, this would mean that any finalized script (including function scripts) would need to scan at least two extra vmfragments instances, implying a read of at least two 2K segments per script finalization. That could affect GC pause timing.
An alternative is to put all of a JSScript's fragments into a list so the script can enumerate them all.
Assignee: general → igor
Updated • 14 years ago
Whiteboard: [sg:critical]
Updated • 14 years ago (Assignee)
blocking2.0: --- → ?
Updated • 14 years ago
blocking2.0: ? → beta8+
Comment 3 • 14 years ago (Assignee)
On the branches there is a problem with simply enumerating over all threads: there JS_DestroyScript really destroys the script, and that function can be called outside the GC. Thus enumerating over all threads brings in the complexity of using the GC machinery just to enumerate the thread data. This is too complex to add to the branches.
A safer solution would be to add a field to JSScript that tracks the thread used for the first recording and to abort subsequent recordings if that thread does not match the current one. That could hurt web workers if a worker is transferred to another thread, but that should be acceptable.
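A rough sketch of that owner-thread idea, using hypothetical names rather than anything from the eventual patch: the script remembers the thread of the first recording, and later recordings on any other thread are refused.

```cpp
#include <thread>

// Hypothetical sketch: the script remembers which thread recorded its first
// trace, and later recordings on a different thread are simply aborted.
struct Script {
    std::thread::id ownerThread{};
    bool hasOwner = false;
};

// Returns true if recording may proceed on the current thread.
static bool CheckRecordingThread(Script& script) {
    std::thread::id current = std::this_thread::get_id();
    if (!script.hasOwner) {
        script.ownerThread = current;          // first recording claims the script
        script.hasOwner = true;
        return true;
    }
    return script.ownerThread == current;      // abort recording on a mismatch
}

int main() {
    Script s;
    bool first = CheckRecordingThread(s);      // claims the current thread
    bool again = CheckRecordingThread(s);      // same thread: still allowed
    return (first && again) ? 0 : 1;
}
```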
Comment 4 • 14 years ago (Assignee)
The idea above of using an owner-thread field in JSScript would not lead to a simple patch: with such a change it would be necessary to clear that field when the JSThread is GC'd but the script is not, which brings a lot of extra complexity.
So here is an implementation of the enumerate-all-threads idea that should be easy to port to the branches. On the branches it would not address the issue of JS_DestroyScript being called outside the GC for scripts that have fragments on other threads, but such usage does not exist in the Firefox code base.
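As an illustration of the enumerate-all-threads approach (the thread list, ThreadData, and the purge routine below are stand-ins, not the attached patch): on script finalization, every thread's fragment table is walked, not just the caller's.

```cpp
#include <cstdint>
#include <map>
#include <vector>

struct Fragment { const void* ip; };

// Illustrative per-thread data: each thread's trace monitor keeps fragments
// keyed by the bytecode address they were recorded at.
struct ThreadData { std::map<const void*, Fragment> fragments; };

static std::vector<ThreadData> allThreads; // stand-in for the runtime's thread list

// Purge a finalized script's fragments from every thread, not just the caller's.
static void PurgeScriptFragmentsAllThreads(std::uintptr_t codeBegin, std::uintptr_t codeEnd) {
    for (ThreadData& td : allThreads) {
        for (auto it = td.fragments.begin(); it != td.fragments.end(); ) {
            std::uintptr_t ip = reinterpret_cast<std::uintptr_t>(it->first);
            if (ip >= codeBegin && ip < codeEnd)
                it = td.fragments.erase(it);   // fragment belonged to the dead script
            else
                ++it;
        }
    }
}

int main() {
    allThreads.resize(2);                      // pretend: main thread + one worker
    char code[8];                              // pretend bytecode range of the script
    std::uintptr_t begin = reinterpret_cast<std::uintptr_t>(code);
    allThreads[0].fragments[code] = {code};
    allThreads[1].fragments[code] = {code};
    PurgeScriptFragmentsAllThreads(begin, begin + sizeof(code));
    return allThreads[1].fragments.empty() ? 0 : 1;  // the worker's copy is gone too
}
```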
Comment 5 • 14 years ago
I would imagine this affects 1.9.1 as well, yes?
Comment 6 • 14 years ago (Assignee)
(In reply to comment #5)
> I would imagine this affects 1.9.1 as well, yes?
No. On 1.9.1 the GC flushes all JIT-generated code, so there the bug can only affect scripts explicitly destroyed outside the GC phase, and those scripts do not escape to other threads in the Firefox code base.
Updated • 14 years ago
blocking1.9.1: ? → ---
Comment 7 • 14 years ago (Assignee)
Rebased version of v1 with a minimal amount of changes.
Attachment #481230 - Attachment is obsolete: true
Comment 8 • 14 years ago (Assignee)
The new patch adds a hash of scripts with recorded fragments to offset the performance impact of enumerating the fragments for all threads when destroying a script.
Attachment #482507 - Attachment is obsolete: true
Attachment #482522 - Flags: review?(gal)
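As a rough illustration of the optimization described in comment 8 (made-up names; not the actual patch): scripts that ever recorded a trace are registered in a set, and only those pay for the cross-thread scan at finalization.

```cpp
#include <cstdio>
#include <unordered_set>

// Illustrative only: a registry of scripts that ever had a trace recorded,
// so the expensive cross-thread purge can be skipped for everything else.
struct Script {};

static std::unordered_set<const Script*> tracedScripts;

static void NoteScriptTraced(const Script* script) {
    tracedScripts.insert(script);          // called when recording starts
}

static void PurgeAllThreads(const Script*) {
    std::puts("walking every thread's fragment table"); // stand-in for the real purge
}

static void FinalizeScript(const Script* script) {
    // Only scripts present in the set pay for the all-threads scan.
    if (tracedScripts.erase(script))
        PurgeAllThreads(script);
}

int main() {
    Script traced, untraced;
    NoteScriptTraced(&traced);
    FinalizeScript(&untraced);             // cheap: never recorded a trace
    FinalizeScript(&traced);               // expensive path runs exactly once
}
```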
Comment 9 • 14 years ago (Assignee)
The new version adds the missing OOM reporting.
Attachment #482522 - Attachment is obsolete: true
Attachment #482547 - Flags: review?(gal)
Attachment #482522 - Flags: review?(gal)
Comment 10 • 14 years ago
gal, can you do this review please?
Updated • 14 years ago
Whiteboard: [sg:critical] → [sg:critical][needs r=gal]
Updated • 14 years ago
Attachment #482547 - Flags: review?(gal) → review+
Updated • 14 years ago
Whiteboard: [sg:critical][needs r=gal] → [sg:critical]
Comment 11 • 14 years ago
Fell off the plate until sayrer pinged me. Sorry for the delay.
Comment 12 • 14 years ago (Assignee)
Landed on TM: http://hg.mozilla.org/tracemonkey/rev/f9785814bdbc
For 1.9.2 the patch requires some porting effort, as the hashmap is not available there. So I suggest moving this to the next release on that branch.
blocking1.9.2: .13+ → ?
Comment 13 • 14 years ago
Status: NEW → RESOLVED
Closed: 14 years ago
Resolution: --- → FIXED
Comment 14 • 14 years ago
Per comment 12, it's now the "next release" on the branch.
blocking1.9.2: ? → .14+
Comment 15 • 14 years ago (Assignee)
Here is an untested backport to 1.9.2 that uses JSDHashTable to record traced scripts.
Comment 16 • 14 years ago (Assignee)
The patch for 1.9.2 deviates too much from the trunk and requires its own review. Part of the reason is that JSDHashTable does not have a clear method, so the table has to be finished in order to remove all elements during flushing.
Attachment #497967 - Attachment is obsolete: true
Attachment #498162 - Flags: review?(gal)
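For illustration, the "finish and re-initialize" clearing pattern that a table without a clear method forces during a flush, written against a small hypothetical table type rather than the real JSDHashTable API:

```cpp
#include <cstdlib>

// Hypothetical table shaped like a C hash table without a dedicated "clear"
// operation (the situation described above for JSDHashTable).
struct Table {
    void*  entries;
    size_t capacity;
};

static bool TableInit(Table* t, size_t capacity) {
    t->entries = std::calloc(capacity, sizeof(void*));
    t->capacity = capacity;
    return t->entries != nullptr;
}

static void TableFinish(Table* t) {
    std::free(t->entries);
    t->entries = nullptr;
    t->capacity = 0;
}

// Clearing without a clear() method: tear the table down and rebuild it.
// This is the extra work a clear()-less API forces during a flush.
static bool ClearByReinit(Table* t, size_t capacity) {
    TableFinish(t);
    return TableInit(t, capacity);          // table is empty again on success
}

int main() {
    Table t;
    if (!TableInit(&t, 64)) return 1;
    // ... table gets populated during tracing ...
    bool ok = ClearByReinit(&t, 64);        // "clear" during a flush
    TableFinish(&t);
    return ok ? 0 : 1;
}
```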
Comment 17 • 14 years ago (Assignee)
gal, do you have time to look at the 192 patch today, before the code freeze?
Comment 18 • 14 years ago
Comment on attachment 498162 [details] [diff] [review]
patch for 192
If I don't review a patch within 24h, please nag (I overlooked the email in that case).
Attachment #498162 - Flags: review?(gal) → review+
Updated • 14 years ago (Assignee)
Attachment #498162 - Flags: approval1.9.2.14?
Comment 19 • 14 years ago
It's blocking, no approval needed.
Comment 20 • 14 years ago
(In reply to comment #19)
> It's blocking, no approval needed.
That's never the case on the branch; please see the tree rules at the top of every tinderbox page. That may be true (most of the time) on the trunk; the branches, however, operate in permanent "last days of the release" mode. (I assume you're referring to the branch, since it was checked in on the trunk in comment 13.)
Comment 21 • 14 years ago
Comment on attachment 498162 [details] [diff] [review]
patch for 192
a=LegNeato for 1.9.2.14. Please land this asap today.
Attachment #498162 - Flags: approval1.9.2.14? → approval1.9.2.14+
Comment 22 • 14 years ago
I see, my mistake. In that case, though, what's the point of "blocking" a release branch? We should probably not use blocking on branches then, only wanted and approved.
Comment 23 • 14 years ago
Blocking means we'll hold the release for it; wanted means we want it but won't hold the release. Because of the short cycle times we want extra eyes on the actual patch -- think of it as a super-review-required type of step.
Comment 24 • 14 years ago
We briefly had a super-review-required rule for security patches, but we backed off of that because it was pointless.
Now the rule seems to be back, just different and even more pointless. Previously the super-review came from a super-reviewer who was at least potentially qualified to review the patch at a technical level. Now the super-review comes from a driver who most likely has no clue what the patch does?
Chris, do you have the slightest idea what the code in the patch you just approved does? Don't feel bad if not. It's a super-specialized corner case that probably 3-4 people on the planet really understand. You can't expect drivers to have a technical clue or review opinion on individual patches; that's simply not their job. These extra eyes are not qualified to see anything useful (and shouldn't be expected to). Reviews should come from module owners/peers.
For the last few branch patches I have seen, the approval step is basically just a delay/paperwork/stamping step. Why do it then? If you mark a bug blocking a branch, it means: module owners/peers, write a patch, have it reviewed, and land it ASAP. If you are worried about upcoming freezes/timing/scheduling and want to stabilize a branch, just close it.
Comment 25 • 14 years ago (Assignee)
Comment 26 • 14 years ago
(In reply to comment #24)
> Now the rule seems to be back, just different and even more pointless.
"back"? The branches have had the same rules now for as long as I can remember... probably at least 4-5 years, if not longer. This process isn't anything new. All branch patches require approval.
While it may seem burdensome, this process has caught numerous problems in the past that would have been missed otherwise had the branch drivers not specifically looked at the patch (even if they didn't understand all of it). Considering the branches get very little testing due to lack of nightly users and limited QA, having an extra pair of eyes is important.
Comment 27 • 14 years ago
(In reply to comment #24)
> Chris, do you have the slightest idea what the code in the patch does you just
> approved? Don't feel bad if not. Its a super specialized corner case that
> probably 3-4 people on the planet really understand. You can't expect drivers
> to have a technical clue/review opinion for individual patches. That's simply
> not their job. These extra eyes are not qualified to see anything useful (and
> shouldn't be expected to). Reviews should come from module owners/peers.
Nope, I do not. Sometimes I do, though more often I just look at the extent of the change and approve unless it looks huge. I see the approval requirement more as a way to meter what goes into a release, not to determine whether the fix is "correct". It also gives an additional chance for me (or other release drivers) to ask clarifying questions, make sure 1.9.1 gets a patch if needed, etc. Sure, the same thing could be done without the up-front approvals, but the reality is that the additional checking would go away after a month or so.
> For the last few branch patches I have seen the approval step is basically just
> a delay/paperwork/stamping step. Why do it then? If you mark a bug blocking a
> branch, it means: module owners/peers, write a patch, have it reviewed, and
> land it asap. If you are worried about upcoming freezes/timing/scheduling and
> want to stabilize a branch, just close it.
Yes, it is generally rubber-stamping for JS black magic these days :-). It makes a lot more sense for other areas of the code.
I do think we need to revamp the way we manage branch blocking/approvals; I just haven't figured out the best way to do it. FWIW, I've never encountered a release process that only approves bugs without also requiring approval on the individual patches.
Comment 28 • 14 years ago (Assignee)
On 192 the patch led to a busted leak test:
http://tinderbox.mozilla.org/showlog.cgi?log=Firefox3.6/1295472073.1295478386.7556.gz&fulltext=1
My plan is to check the patch one more time and back it out in an hour or so.
Comment 29 • 14 years ago (Assignee)
To 1.9.2 drivers:
I could not find anything suspicious in the patch. Given the closed tree status, I think it is better to wait for a fresh build once the tree reopens.
I will be offline for the next 9 hours or so (need to sleep in the CEST timezone).
Comment 30 • 14 years ago (Assignee)
The tinderbox failure was caused by bug 627225 and is unrelated to the landed patch.
Comment 31 • 14 years ago
Are there any STR for 1.9.2 to verify this fix?
Whiteboard: [sg:critical] → [sg:critical] [qa-examined-192] [qa-needs-STR]
Comment 32 • 14 years ago
I backed this out of 1.9.2 (both default and the .14 relbranch) because it probably caused bug 631105.
blocking1.9.2: .14+ → needed
status1.9.2: .14-fixed → ---
Comment 33 • 14 years ago
It probably didn't cause bug 631105, at least not directly. Backing this out moved the crash to bug 633869, but the underlying cause of both now appears to be frankenbuilds.
Comment 34 • 14 years ago
We've landed and then (fairly publicly) backed out this change. We should try to get it into the next release if we can.
blocking1.9.2: needed → ?
status1.9.2: --- → wanted
Updated • 14 years ago
Attachment #498162 - Flags: approval1.9.2.15?
Updated • 14 years ago
blocking1.9.2: ? → .15+
Attachment #498162 - Flags: approval1.9.2.15? → approval1.9.2.15+
Comment 35 • 14 years ago
Comment on attachment 498162 [details] [diff] [review]
patch for 192
This turned out not to be the culprit for the crash spike; we want it back in now. Approved for 1.9.2.15, a=dveditz for release-drivers.
clegnitto will try to re-land the patch he backed out, but it may need a merge if something else changed.
Attachment #498162 - Flags: approval1.9.2.14+ → approval1.9.2.14-
Updated • 14 years ago
blocking1.9.2: .17+ → .18+
Comment 36 • 14 years ago
Comment on attachment 498162 [details] [diff] [review]
patch for 192
Since this is one of the patches that led to the frankenbuild crash spike, we want to wait another release. 3.6.17 is getting additional install/update fixes that should help with that problem.
Attachment #498162 - Flags: approval1.9.2.17+ → approval1.9.2.18?
Comment 37 • 14 years ago
Comment on attachment 498162 [details] [diff] [review]
patch for 192
Approved for 1.9.2.18, a=dveditz for release-drivers.
Now that users have the install fixes, we want to land this.
Attachment #498162 - Flags: approval1.9.2.18? → approval1.9.2.18+
Updated • 14 years ago
Flags: in-testsuite?
Keywords: testcase-wanted
Updated • 14 years ago
blocking1.9.2: .18+ → .19+
Updated • 13 years ago
Attachment #498162 - Flags: approval1.9.2.19?
Attachment #498162 - Flags: approval1.9.2.18-
Attachment #498162 - Flags: approval1.9.2.18+
Comment 38 • 13 years ago
We decided there is just too much risk at this point to take this in 3.6 due to our lower test population. Marking as wontfix for 1.9.2.
blocking1.9.2: .20+ → ---
Updated • 13 years ago
Attachment #498162 - Flags: approval1.9.2.20? → approval1.9.2.20-
Updated • 10 years ago
Group: core-security
Updated • 9 years ago
Keywords: testcase-wanted