574313 - turning autorooters into conservative stack scan verifiers

Assignee

Description

14 years ago

+++ This bug was initially created as a clone of Bug #573709 +++

To increase test coverage for the conservative stack scanner, I suggest running the stack scan as the first thing during the GC marking phase and then changing autorooters to insist that the stack-based GC things are marked.

Igor Bukanov

Assignee

Comment 1

14 years ago

Attached patch v1 (obsolete) (deleted)

The patch moves the conservative scanner to the beginning of js_TraceRuntime so that the autoroot loop that follows can verify that nothing is missed. Note that the verifier part calls abort() even in optimized builds for maximum test coverage. The plan is to keep it like that for a couple of weeks and then remove the no-longer-needed autoroots.
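The scan-then-verify ordering described in this comment can be sketched roughly as follows. This is a toy model under stated assumptions, not the patch itself: `GCThing`, `Heap`, `conservativeScan`, and `countMissedRoots` are hypothetical names, the "stack" is just an array of words, and the verifier returns a count where the real patch is described as calling abort().

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <unordered_set>
#include <vector>

// Hypothetical toy model; none of these names are SpiderMonkey APIs.
struct GCThing {
    bool marked = false;
};

struct Heap {
    // Addresses of all live GC allocations (assumed bookkeeping).
    std::unordered_set<const GCThing*> things;

    // True if the word is exactly the address of a known GC thing.
    bool owns(std::uintptr_t word) const {
        return things.count(reinterpret_cast<const GCThing*>(word)) != 0;
    }
};

// Step 1: conservative scan. Every word in [begin, end) that happens to
// equal the address of a known GC thing causes that thing to be marked.
void conservativeScan(const Heap& heap,
                      const std::uintptr_t* begin,
                      const std::uintptr_t* end) {
    assert(begin <= end);
    for (const std::uintptr_t* p = begin; p != end; ++p) {
        if (heap.owns(*p))
            reinterpret_cast<GCThing*>(*p)->marked = true;
    }
}

// Step 2: verification. Each explicitly rooted thing must already have been
// marked by the scan. Null roots are skipped. Returns the number of misses.
std::size_t countMissedRoots(const std::vector<GCThing*>& autoroots) {
    std::size_t missed = 0;
    for (const GCThing* thing : autoroots) {
        if (thing && !thing->marked)
            ++missed;
    }
    return missed;
}
```

A nonzero result from `countMissedRoots` corresponds to the situation in which the verifier would abort: an explicit root exists that the conservative scan did not find.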

Igor Bukanov

Assignee

Comment 2

14 years ago

With the patch I am getting try server failures on file:///builds/slave/tryserver-macosx-debug-unittest-jsreftest/build/jsreftest/tests/jsreftest.html?test=js1_8_5/extensions/array-length-protochange.js

Andreas Gal :gal

Comment 3

14 years ago

Very suspicious test name. Might be a nice find.

Igor Bukanov

Assignee

Comment 4

14 years ago

Apparently there are some places in FF where the autoroots are used outside a request. This is bad. I will try to fix this, but it may not be straightforward: it is not clear at the call sites whether using AutoRequest is safe, as another cx could be active on the stack. An alternative is to always do the stack scan for a thread that calls the GC, even if the GC was called outside a request, and tolerate extra leaks. This assumes that web workers and extensions still properly respect the request model.

Igor Bukanov

Assignee

Comment 5

14 years ago

(In reply to comment #4)
> An alternative is to always do the stack scan for a thread that calls the GC
> even if the GC was called outside a request and tolerate extra leaks.

Note that using autoroots outside a request is bad in general. The autoroot link manipulations are not locked and rely on the request model to serialize with the GC. It only works in FF because there the GC can only run on the main thread.
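To illustrate why unlocked autoroot link manipulation depends on the request model, here is a minimal sketch of a stack-ordered root list. All names (`ToyContext`, `ToyRequest`, `ToyAutoRoot`, `countRootsOutsideRequest`) are hypothetical and only loosely modeled on the idea of AutoRequest and autorooters; the real data structures may differ.

```cpp
#include <cassert>

struct ToyAutoRoot;

// Hypothetical per-context state: the head of the autoroot list and a
// counter that is nonzero while the context is inside a request.
struct ToyContext {
    ToyAutoRoot* autoRoots = nullptr;
    int requestDepth = 0;
};

// RAII guard standing in for "being inside a request".
class ToyRequest {
public:
    explicit ToyRequest(ToyContext& cx) : cx_(cx) { ++cx_.requestDepth; }
    ~ToyRequest() { --cx_.requestDepth; }
    ToyRequest(const ToyRequest&) = delete;
    ToyRequest& operator=(const ToyRequest&) = delete;

private:
    ToyContext& cx_;
};

// RAII root: pushes itself onto the context's list on construction and pops
// on destruction. No lock is taken, so a collector walking the list from
// another thread is only safe if something else (here, the request model)
// keeps the two from running concurrently.
struct ToyAutoRoot {
    ToyAutoRoot(ToyContext& cx, void* thing)
        : cx_(cx), thing_(thing), down_(cx.autoRoots),
          insideRequest_(cx.requestDepth > 0) {
        cx_.autoRoots = this;  // unlocked push
    }
    ~ToyAutoRoot() {
        assert(cx_.autoRoots == this);  // roots must unwind in stack order
        cx_.autoRoots = down_;          // unlocked pop
    }
    ToyAutoRoot(const ToyAutoRoot&) = delete;
    ToyAutoRoot& operator=(const ToyAutoRoot&) = delete;

    ToyContext& cx_;
    void* thing_;
    ToyAutoRoot* down_;
    bool insideRequest_;
};

// Counts live roots that were created while no request was active, i.e. the
// kind of API violation discussed in this comment.
int countRootsOutsideRequest(const ToyContext& cx) {
    int count = 0;
    for (const ToyAutoRoot* r = cx.autoRoots; r; r = r->down_) {
        if (!r->insideRequest_)
            ++count;
    }
    return count;
}
```

In this model a root created before any `ToyRequest` exists is flagged by `countRootsOutsideRequest`, while one created inside the guard's scope is not.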

Andreas Gal :gal

Comment 6

14 years ago

Let's fix the FF code. This is a clear API violation, and we now have a great way to find such violations.

Andreas Gal :gal

Updated

14 years ago

Blocks: 572057

Andreas Gal :gal

Comment 7

14 years ago

Any ETA on this?

Igor Bukanov

Assignee

Comment 8

14 years ago

If I base the patch on bug 552266, then this could be ready before Friday. Without that, fixing request model violations requires a lot of extra testing.

Depends on: 552266

Igor Bukanov

Assignee

Comment 9

14 years ago

With a patch I am getting tryserver failures like:

REFTEST TEST-START | file:///Users/cltbld/talos-slave/tryserver-leopard-debug-u-jsreftest/build/jsreftest/tests/jsreftest.html?test=js1_8/regress/regress-464418.js
++DOMWINDOW == 10 (0xd1264f8) [serial = 5191] [outer = 0xcd4a8f0]
begin test: js1_8/regress/regress-464418.js
BUGNUMBER: 464418
STATUS: Do not assert: fp->slots + fp->script->nfixed + js_ReconstructStackDepth(cx, fp->script, fp->regs->pc) == fp->regs->sp
Conservative stack GC scanner has missed a root on the stack. Aborting.

Investigating...

Igor Bukanov

Assignee

Comment 10

14 years ago

Attached patch v2 (obsolete) (deleted)

With bug 580578 fixed, I no longer see any aborts on the tryserver due to missed roots.

Attachment #453855 - Attachment is obsolete: true

Attachment #460203 - Flags: review?(anygregor)

Igor Bukanov

Assignee

Comment 11

14 years ago

Attached patch v3 (deleted)

Patch with a more informative abort message.

Attachment #460203 - Attachment is obsolete: true

Attachment #460206 - Flags: review?(anygregor)

Attachment #460203 - Flags: review?(anygregor)

Igor Bukanov

Assignee

Comment 12

14 years ago

The plan for this bug is to have the abort code in at least one beta before we remove it together with the autorooters.

Gregor Wagner [:gwagner]

Comment 13

14 years ago

Comment on attachment 460206 v3

>- * The conservative stack scanner runs before we mark extra roots,
>- * which may use additional colors to implement cycle collection.
>+ * We mark extra roots at the last thing so it can use use additional
>+ * colors to implement cycle collection.

One "use" is enough :)

Attachment #460206 - Flags: review?(anygregor) → review+

Igor Bukanov

Assignee

Comment 14

14 years ago

http://hg.mozilla.org/tracemonkey/rev/0b6cbd5916c6 - compared with v3, I have added a missing null check to skip AutoObjectRooter if it contains a null pointer.

Whiteboard: fixed-in-tracemonkey

Igor Bukanov

Assignee

Updated

14 years ago

Depends on: 583763

Igor Bukanov

Assignee

Comment 15

14 years ago

Landing this bug led to sporadic mochitest failures on Windows. Unfortunately I do not have time to work on this today, so, if necessary, this should be backed out from the relevant branches.

Depends on: 583598

Michael Kraft [:morac]

Comment 16

14 years ago

Since this landed on the trunk, I'm getting the "Conservative GC scanner has missed the root...." assert quite often when a background thread calls a function that returns a large amount of data.

For example, in my code I have a function that reads a file and returns the data. I call this function in a loop to read all the files in a folder, as such:

    files.forEach(function(aFile) {
      let data = this.readFile(aFile);
      // do stuff
    }, module_object);

    function readFile(file, aHeaderOnly) {
      let data = doActualFileRead(file, aHeaderOnly);
      // do processing on data
      return data;
    }

The readFile function is called in a background thread in two instances in my add-on. In the first case, the aHeaderOnly flag is set, which causes only 1020 bytes to be read from the file. That works fine. The other instance doesn't set that flag (the code above readFile). That has a high tendency of crashing when executed in a background thread.

At first I thought it was the file reads, since it was happening when large files were read, but if I replace the "return data;" in the readFile function with "return null;" or even 'return "hello";' the crashes no longer occur. Since the files are still being read, it can't be the file reads that are causing the assert. I then thought that maybe it was the processing being done on the data being returned (the "do stuff" code) that caused the assert, so I commented it out and the asserts still happened.

The largest file reads in 4,170,162 bytes, and returning that data crashes pretty much every time. The second largest file is 295,050 bytes and that is returned fine.

I replaced the readFile function with the following:

    function readFile(file, aHeaderOnly) {
      let a = "";
      for (var i = 0; i < 1000000; i++)
        a += "1";
      return a;
    }

and that crashed as well, despite doing no file reads at all. I then started changing the "1000000" to see how low it would need to be and still work. When I changed it to "500000", Minefield aborted after the second time the function was called. When I changed it to "250000", the function was called about 15 times before Minefield aborted. This seemed to indicate that the less data returned, the less likely it is to abort. I also found that if my machine was busy (drive thrashing), it had less of a chance of crashing, which indicates a timing issue.

I tried removing the module_object "this" parameter from the forEach call and it still aborted. I tried changing the forEach to a normal for and even a while loop, and in both cases it still aborted. I then removed the forEach call altogether and just made a bunch of direct calls to module_object.readFile. The aborts no longer occurred.

I went back to my original code and changed it so that instead of looping, I created a function that does exactly what was inside the loop and added one call to that function for every file in the folder (52 lines, one for each file). That still aborted on occasion, but the vast majority of the time it worked, as opposed to aborting the vast majority of the time.

So it seems the simple act of returning data from a function call has a chance of causing data to get lost, resulting in the assert when doing a GC. The larger the amount of data returned, the better the chance of aborting. If the call to the function is made from inside a loop, the chance of asserting increases exponentially, such that for large amounts of data it's almost a certainty.

Igor Bukanov

Assignee

Comment 17

14 years ago

(In reply to comment #16)
> Since this landed on the trunk load, I'm getting the "Conservative GC scanner
> has missed the root...." assert quite often when a background thread calls a
> function that returns a large amount of data.

Is this with an optimized build of FF? If yes, could you try a debug one and see if you also hit other asserts?

Michael Kraft [:morac]

Comment 18

14 years ago

(In reply to comment #17)
> Is this with optimized build of FF? If yes, could you try a debug one and see
> if you also hits other asserts?

Yes, it's with the optimized nightly win32 FF build from http://ftp.mozilla.org/pub/mozilla.org/firefox/nightly/latest-trunk/.

Is there some place I can get a win32 debug build? I tried installing a build from the following location, but I keep getting a message stating that "This application has failed to start because the application configuration is incorrect. Reinstalling the application may fix this problem."

http://ftp.mozilla.org/pub/mozilla.org/firefox/tinderbox-builds/mozilla-central-win32-debug/

Michael Kraft [:morac]

Comment 19

14 years ago

I managed to get the tinderbox-builds debug builds to run (I needed to install VC 2005), but unfortunately the debug builds I've tried crash with an assertion on startup with a new profile in safe mode, so I can't even get to the point where I can test my add-on. I tried both the latest mozilla-central-win32-debug and tracemonkey-win32-debug builds, so there must be something about those builds that my system doesn't like.

If you're curious, the assertion is:

###!!! ASSERTION: Can't create document accessible!: 'docAcc', file e:/builds/moz2_slave/mozilla-central-win32-debug/build/accessible/src/base/nsAccDocManager.cpp, line 354 (or 351 for 1281091389 tracemonkey build)

Robert Sayre

Comment 20

14 years ago

http://hg.mozilla.org/mozilla-central/rev/0b6cbd5916c6

Status: NEW → RESOLVED

Closed: 14 years ago

Resolution: --- → FIXED

Attachments:
v1, 14 years ago, Igor Bukanov, patch (deleted)
v2, 14 years ago, Igor Bukanov, patch (deleted)
v3, 14 years ago, Igor Bukanov, patch (deleted), gwagner: review+