Closed Bug 623189 (crossfuzz-pvt) Opened 14 years ago Closed 4 years ago

[meta] private cross_fuzz tracking bug

Categories: Core :: Fuzzing, defect; x86; All; defect; Not set; normal
Tracking: RESOLVED FIXED
People: Reporter: chofmann; Unassigned
References: Blocks 2 open bugs
Details: Keywords: crash, meta, sec-other; Whiteboard: [sg:nse]
Attachments: 3 files

Let's use this bug as the security-closed version of the public bug 581539, with information from the Mozilla project's research and any privately reported info about running the fuzzer, the variety of signatures we see in crash stats, or reproducible test cases and possible 0-days.
Depends on: 622165, 622456, 622483, 622596, 623070
mz says in https://bugzilla.mozilla.org/show_bug.cgi?id=622456#c12 :

put a seed in location.hash (and if none is found there, a
random one is picked and put in the URL).

Your mileage reproducing crashes this way may vary, because there is also an
element of network timing involved; making a local copy of the fuzzer in
file:/// will likely reduce this dependence.
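
A minimal sketch of the seed handling described in the quote above, assuming the seed is a plain decimal integer after the `#`. The names `seedFromHash` and `randomSeed` are hypothetical and not taken from the fuzzer source; in a page the argument would be `location.hash`.

```javascript
// Hypothetical helper: pick a random 32-bit seed.
function randomSeed() {
  return Math.floor(Math.random() * 0x100000000) >>> 0;
}

// Hypothetical helper: read a decimal seed from a "#123"-style hash string.
// Returns { seed, fromHash }; falls back to a random seed if none is present.
function seedFromHash(hash) {
  const match = /^#(\d+)$/.exec(hash || "");
  if (match) {
    return { seed: Number(match[1]) >>> 0, fromHash: true };
  }
  return { seed: randomSeed(), fromHash: false };
}

// In a page, the chosen seed would be written back so the run can be replayed:
//   const { seed, fromHash } = seedFromHash(location.hash);
//   if (!fromHash) location.hash = "#" + seed;
```

Writing the random seed back into the URL is what makes a crashing run replayable later: the URL in the crash report or the address bar then carries the seed.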

----

In some testing so far, it seems like

/lcamtuf.coredump.cx/cross_fuzz/cross_fuzz_final_20100728.html#1

might increase the chances of hitting js::gc::MarkKind 

We need to see whether we can use this technique to feed in a variety of seeds and start isolating reproducible crashes tied to specific seed values, maybe on bc's or tomcat's automation.
Another hint: turning off popup blocking for file:/// URLs was quite a pain for me, so on my test setup I just disabled blocking entirely.
Depends on: 623096
One thing I'm finding is that breakpad doesn't seem to pick up the file:/// URLs while running the fuzzer, so as everyone starts to run locally we lose the ability to build an inventory of signatures to check out, or to see where they are being reproduced. file:/// URLs are reported in other cases, so the fuzzer, or maybe just my runs, might be tickling a breakpad bug.

Here is what I see in some of my reports:

signature: WrappedNativeTearoffSweeper 4.0b8 Mac OS X 10.6.4 10F2108 
URL: \N Comment: local #1
http://crash-stats.mozilla.com/report/index/9acb4ee0-8b40-424d-a576-6acac2110104

signature: js::gc::MarkKind 4.0b8 Mac OS X 10.6.4 10F2108 
URL: \N Comment: local #1
http://crash-stats.mozilla.com/report/index/5034d86f-2687-4c0c-bfab-a09032110104

signature: js::gc::MarkKind 4.0b8 Mac OS X 10.6.4 10F2108 
URL: \N Comment: local #1
http://crash-stats.mozilla.com/report/index/5e2560f3-3165-446b-a2f6-43f0f2110104

signature: js::gc::MarkKind 4.0b8 Mac OS X 10.6.4 10F2108 
URL: \N Comment: local
http://crash-stats.mozilla.com/report/index/e943be5f-60f9-4bc0-852b-63cff2110104


signature: nsSJISProber::HandleData 4.0b8 Mac OS X 10.6.4 10F2108 
URL: \N Comment: local #1
http://crash-stats.mozilla.com/report/index/6dc01f9c-5fb9-43d2-8f55-8b4db2110104

signature: _moz_cairo_surface_set_device_offset 4.0b8 Mac OS X 10.6.4 10F2108 
URL: \N Comment: hash #1
http://crash-stats.mozilla.com/report/index/a99dc69c-d043-4da9-b2a0-ddbd12110104

signature: js::gc::MarkKind 4.0b8 Mac OS X 10.6.4 10F2108 
URL: \N Comment: hash #1
http://crash-stats.mozilla.com/report/index/875efff3-adbe-4f63-9303-4ba1c2110104
Here is the latest inventory of signature / Firefox version / OS combos we have seen in crash data since the first of the year.

   7 WrappedNativeMarker 3.6.13 Windows NT 6.1.7600 
   4 WrappedNativeMarker 3.6.13 Windows NT 5.1.2600 Service Pack 3 
   2 WrappedNativeMarker 4.0b8 Mac OS X 10.6.5 10H574 
   1 WrappedNativeMarker 3.6.14pre Windows NT 5.1.2600 Service Pack 2 
   1 WrappedNativeMarker 3.6.13 Linux 2.6.32 x86_64 
   1 WrappedNativeMarker 3.6.12 Windows NT 6.1.7600

   3 WrappedNativeJSGCThingTracer 4.0b8 Windows NT 6.1.7600 
   1 WrappedNativeJSGCThingTracer 4.0b8 Windows NT 6.0.6002 Service Pack 2 

   2 XPCWrappedNativeProto::Mark() 4.0b8 Windows NT 6.1.7600 
   1 XPCWrappedNativeProto::Mark() 4.0b8 Windows NT 5.1.2600 Service Pack 3 

   2 XPCNativeSet::Mark() 4.0b8 Windows NT 5.1.2600 Service Pack 3 
   1 XPCNativeSet::Mark() 4.0b8 Windows NT 6.1.7600 

   2 XPCNativeSet::IsMarked() 4.0b8 Windows NT 6.1.7600 

   2 XPCNativeScriptableInfo::Mark() 4.0b8 Windows NT 6.1.7600 

   1 js::gc::MarkKind 4.0b8 Windows NT 6.1.7600 
   1 js::gc::MarkKind 4.0b8 Windows NT 5.1.2600 Dodatek Service Pack 3 
   1 js::gc::MarkKind 4.0b8 Mac OS X 10.6.5 10H574 

   1 js::gc::MarkChildren 4.0b8 Windows NT 6.1.7600 

   1 XPC_WN_Helper_NewResolve 4.0b8 Windows NT 6.1.7600 

   1 XPCNativeScriptableInfo::Mark() 4.0b8 Windows NT 5.1.2600 Service Pack 3 
   1 XPCNativeScriptableInfo::Mark() 4.0b8 Windows NT 5.1.2600 Service Pack 2 

   1 WrappedNativeTearoffSweeper 4.0b8 Windows NT 6.0.6002 Service Pack 2 
   1 WrappedNativeTearoffSweeper 4.0b8 Mac OS X 10.6.5 10H574 

   1 JS_StackFramePrincipals 3.5.16 Windows NT 6.1.7600 

   1 JS_CallTracer 3.6.13 Linux 2.6.35 x86_64 
   1 JS_CallTracer 3.6.10 Mac OS X 10.6.5 10H574 

   1 js_ConcatStrings(JSContext*, JSString*, JSString*) 4.0b8 Windows NT 6.1.7600 

   1 js::PropertyCache::fullTest(JSContext*, unsigned char*, JSObject**, JSObject**, js::PropertyCacheEntry*) 4.0b8 Windows NT 6.1.7600 

   2 js::StackSpace::pushSegmentForInvoke(JSContext*, unsigned int, js::InvokeArgsGuard*) 4.0b9pre Windows NT 5.1.2600 Service Pack 3 

   4 nsTypedSelection::ContainsNode(nsIDOMNode*, int, int*) 3.5.16 Windows NT 6.1.7600 

   4 _moz_cairo_surface_set_device_offset 4.0b9pre Mac OS X 10.6.5 10H574 

   3 nsSVGGlyphFrame::GetExtentOfChar(unsigned int, nsIDOMSVGRect**) 3.6.8 Windows NT 6.0.6002 Service Pack 2 

   1 nsSVGStyleElement::SetAttr(int, nsIAtom*, nsAString_internal const&, int) 3.6.13 Windows NT 5.1.2600 Service Pack 3 

   1 xul.dll@0x2c9cb7 4.0b8 Windows NT 5.1.2600 Service Pack 3 

   1 nsDiskCacheStreamIO::Flush() 3.6.13 Windows NT 5.1.2600 Service Pack 3 

   1 mozilla::layers::BasicLayerManager::PushGroupWithCachedSurface(gfxContext*, gfxASurface::gfxContentType, gfxPoint*) 4.0b8pre Windows NT 6.0.6002 Service Pack 2 

   1 cowpplg.dll@0x279b1 3.6.13 Windows NT 5.1.2600 Service Pack 3 

   1 XUL@0x11985b6 4.0b8pre Mac OS X 10.6.5 10H574
The top crashes up to and including JS_CallTracer are all the same bug, most likely. If we catch any of those on the replay box, we should be golden.
Shorter list of just the signatures:

  16 WrappedNativeMarker 
   4 nsTypedSelection::ContainsNode(nsIDOMNode*, int, int*) 
   4 _moz_cairo_surface_set_device_offset 
   4 XPCNativeScriptableInfo::Mark() 
   4 WrappedNativeJSGCThingTracer 
   3 nsSVGGlyphFrame::GetExtentOfChar(unsigned int, nsIDOMSVGRect**) 
   3 js::gc::MarkKind 
   3 XPCWrappedNativeProto::Mark() 
   3 XPCNativeSet::Mark() 
   2 js::StackSpace::pushSegmentForInvoke(JSContext*, unsigned int, js::InvokeArgsGuard*) 
   2 XPCNativeSet::IsMarked() 
   2 WrappedNativeTearoffSweeper 
   2 JS_CallTracer 
   1 xul.dll@0x2c9cb7 
   1 nsSVGStyleElement::SetAttr(int, nsIAtom*, nsAString_internal const&, int) 
   1 nsDiskCacheStreamIO::Flush() 
   1 mozilla::layers::BasicLayerManager::PushGroupWithCachedSurface(gfxContext*, gfxASurface::gfxContentType, gfxPoint*) 
   1 js_ConcatStrings(JSContext*, JSString*, JSString*) 
   1 js::gc::MarkChildren 
   1 js::PropertyCache::fullTest(JSContext*, unsigned char*, JSObject**, JSObject**, js::PropertyCacheEntry*) 
   1 cowpplg.dll@0x279b1 
   1 XUL@0x11985b6 
   1 XPC_WN_Helper_NewResolve 
   1 JS_StackFramePrincipals
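
For anyone regenerating the short list from the per-version inventory, here is a rough sketch. It assumes each row has the form "count signature version OS", that the version is the first token starting with digits, a dot, and a digit, and that the OS begins with Windows, Mac, or Linux. The function name `tallySignatures` is made up for illustration.

```javascript
// Collapse "count signature version OS" rows into per-signature totals.
// Assumption: everything between the count and the version token is the signature.
function tallySignatures(rows) {
  const totals = new Map();
  const rowPattern = /^\s*(\d+)\s+(.+?)\s+\d+\.\d+\S*\s+(?:Windows|Mac|Linux)\b/;
  for (const row of rows) {
    const m = rowPattern.exec(row);
    if (!m) continue; // skip blank or unrecognized lines
    const count = Number(m[1]);
    const signature = m[2];
    totals.set(signature, (totals.get(signature) || 0) + count);
  }
  // Highest count first.
  return [...totals.entries()].sort((a, b) => b[1] - a[1]);
}

// Three rows copied from the inventory above.
const sampleTotals = tallySignatures([
  "   7 WrappedNativeMarker 3.6.13 Windows NT 6.1.7600 ",
  "   4 WrappedNativeMarker 3.6.13 Windows NT 5.1.2600 Service Pack 3 ",
  "   1 js_ConcatStrings(JSContext*, JSString*, JSString*) 4.0b8 Windows NT 6.1.7600 ",
]);
```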
FWIW, I've been trying to modify cross_fuzz to record the steps for replay. I'm still having problems getting the replayed script to work.
I ran locally with a hash on a 10.6 Mac and got @ nsContentList::PopulateSelf on the first run, but the second time I got the familiar @ _moz_cairo_surface_set_device_offset.

http://crash-stats.mozilla.com/report/index/bp-d0eedf7e-4050-42ce-8963-dca4d2110105 is the report for the first stack.
Michal,

I see about 11 wyciwyg URLs in the crash data, indicating locally cached pages that were generated or modified by a script on the client side. I wonder whether having the fuzzer run out of these cached pages has any impact, one way or the other, on the reproducibility of crashes or on getting repeats of the same signature. Any thoughts?

WrappedNativeMarker wyciwyg://54/http://lcamtuf.coredump.cx/cross_fuzz/cross_fuzz_final_20100728.html
WrappedNativeMarker wyciwyg://57/http://lcamtuf.coredump.cx/cross_fuzz/cross_fuzz_final_20100728.html
XPCNativeScriptableInfo::Mark() wyciwyg://206/http://lcamtuf.coredump.cx/cross_fuzz/cross_fuzz_final_20100728.html
XPCNativeSet::IsMarked() wyciwyg://25/http://lcamtuf.coredump.cx/cross_fuzz/cross_fuzz_final_20100728.html
XPCWrappedNativeProto::Mark() wyciwyg://12/http://lcamtuf.coredump.cx/cross_fuzz/cross_fuzz_final_20100728.html
XPCWrappedNativeProto::Mark() wyciwyg://22/http://lcamtuf.coredump.cx/cross_fuzz/cross_fuzz_final_20100728.html
js::StackSpace::pushSegmentForInvoke(JSContext*, unsigned int, js::InvokeArgsGuard*) wyciwyg://134/http://lcamtuf.coredump.cx/cross_fuzz/cross_fuzz_randomized_20100729_seed.html
js_ConcatStrings(JSContext*, JSString*, JSString*) wyciwyg://38/http://lcamtuf.coredump.cx/cross_fuzz/cross_fuzz_final_20100728.html
nsTypedSelection::ContainsNode(nsIDOMNode*, int, int*) wyciwyg://21/http://lcamtuf.coredump.cx/cross_fuzz/cross_fuzz_final_20100728.html
xul.dll@0x2c9cb7 wyciwyg://10/http://lcamtuf.coredump.cx/cross_fuzz/cross_fuzz_final_20100728.html
I have done some more testing, and managed to get the fuzzer a lot more repeatable with the following changes:

1) wget -r -np the entire cross_fuzz directory to file:///

2) Edit targets/*.html to remove <embed> or replace it with a local file (this eliminates the last network dependency),

3) Set home page for the test profile to about:blank,

4) Configure the browser to always start with a blank page (no session restore),

5) Disable "automatically check for updates" in Advanced -> Update settings,

6) Choose "Never remember history" in Privacy settings.

This removes pretty much all the remaining network dependencies, I believe, and gets the whole thing more reliable.
To provide some context: if given the same seed, cross_fuzz will follow the same set of operations.

The one weak point where it can easily get desynchronized is if, at some point, the loaded "target" documents have a different DOM hierarchy than in a previous run. If it's as little as one DOM property missing, all synchronization is lost and there will be no recovery.

The obvious case in which this can happen is if the document or some of its subresources are not given enough time to fully load in one pass, but will be loaded in another. There may be other cases where DOM trees for identical documents will vary based on browser history or random factors, but I am not aware of anything obvious (?).

As for document loads, the factors here are:

1) Network latency and caching - causing a document not to be fully loaded in one cycle, and fully loaded in another. This is probably the most significant problem. Following the steps outlined above should mitigate this.

2) Non-requested operations, such as loading the home page, updating safebrowsing lists, checking for upgrades, etc - which may appreciably alter the timing and cause some delays in document retrieval.

3) Scheduler / CPU caching interference, and other OS-controlled factors - which may perhaps in turn affect timing of local document loads. I am not sure how much of an impact this may have.

I would imagine that wyciwyg:// should not have any relevance, as you start with a clean slate when the browser is restarted, and it behaves deterministically in every run. That said, I'm not intimate enough with the codebase to give a definitive answer.
Well good news everyone!

This made me think about the problem a bit, and I came up with a simple hack that, when combined with the advice in comment 10, makes it possible to hit repros pretty reliably in my setup.

The tweak is to essentially save a seed when recursing into a DOM node, before requesting any more randomness; and then restore this value when returning from that recursive crawl function. This prevents fluctuations from propagating down and ruining the remainder of the fuzzing process.
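
A rough sketch of that idea, not the actual fuzzer code. It assumes "saving a seed" means drawing one value from the generator on entry and reseeding with it on return; the LCG, the tree shape, and the names `makeRng`, `crawl`, and `firstValues` are all stand-ins invented for illustration.

```javascript
// Small seeded PRNG (32-bit LCG); a stand-in for whatever generator the fuzzer uses.
function makeRng(seed) {
  let state = seed >>> 0;
  return {
    next() {
      state = (Math.imul(state, 1664525) + 1013904223) >>> 0;
      return state;
    },
    reseed(value) { state = value >>> 0; },
  };
}

// Crawl a tree; each node consumes one random value per property.
// `log` records the first value each node saw.
function crawl(node, rng, log, isolate) {
  // Draw the resume seed before requesting any other randomness.
  const resume = isolate ? rng.next() : null;
  for (let i = 0; i < node.props; i++) {
    const v = rng.next();
    if (i === 0) log[node.name] = v;
  }
  for (const child of node.children) crawl(child, rng, log, isolate);
  // Whatever happened in this subtree, continue from a known state.
  if (isolate) rng.reseed(resume);
}

function firstValues(tree, seed, isolate) {
  const log = {};
  crawl(tree, makeRng(seed), log, isolate);
  return log;
}

// Two runs of the "same" document; in the second, node "a" has one extra property.
const run1 = { name: "root", props: 1, children: [
  { name: "a", props: 2, children: [] },
  { name: "b", props: 1, children: [] },
] };
const run2 = { name: "root", props: 1, children: [
  { name: "a", props: 3, children: [] },
  { name: "b", props: 1, children: [] },
] };

// Without isolation, the extra draw in "a" shifts everything after it.
const driftsWithout = firstValues(run1, 1, false).b !== firstValues(run2, 1, false).b;
// With isolation, "b" sees the same value in both runs.
const stableWith = firstValues(run1, 1, true).b === firstValues(run2, 1, true).b;
```

The fluctuation inside "a" still changes what happens within that subtree, but it no longer leaks into the siblings that follow.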

While it is not perfect, I am getting close to 70-80% seed repro rate this way. The new fuzzer is here:

http://lcamtuf.coredump.cx/cross_fuzz/cross_fuzz_randomized_20110105_seed.html

Note that you need to re-download it all, including the targets/ subdirectory.

HTH!
Attached file: 20110105 version, zipped up (deleted)
I have run the version in Comment 12 a few times locally in the QA lab. So far I can consistently reproduce a crash on 10.6 Mac, but I don't always get the same stack - the two most recent are [@ _moz_cairo_surface_set_device_offset ] and [@ js::gc::MarkKind ].

On my WinXP machine the fuzzer has continued to run but not yet reproduced a crash.
Whiteboard: [sg:nse]
Blocks: fuzz
FWIW, in addition to bug 622456, I am also seeing seemingly exploitable crashes when closing the browser after several minutes of fuzzing. This is most likely, and unfortunately, a delayed fallout from earlier heap corruption. The crash is either in xul!nsCacheService::DoomEntry_Internal via xul!nsCacheService::OnProfileShutdown, or in ProcessPendingRequest called from DoomEntry_Internal. Note that the second address looks like ASCII garbage, which isn't good:

035bdb80 0100            add     dword ptr [eax],eax  ds:0023:015e76c8=0331db30 

1039d32c 837f1000        cmp     dword ptr [edi+10h],0 ds:0023:645f7479=????????

My recommended fuzzing automation would be:

1) Configure the fuzzer and the browser as noted in the heading of http://lcamtuf.coredump.cx/cross_fuzz/cross_fuzz_randomized_20110105_seed.html (this is essentially comment 10, with a further suggestion to use -private in the cmdline).

2) Pick a random, 32-bit integer seed.

3) Launch the comment 12 fuzzer via file:/// with seed provided via # in the URL.

4) After a specific time limit without a crash and without hitting an interesting assertion (I'd experiment with anywhere from 1 to 15 minutes), close the browser via the UI (so that all the cleanup code gets to run).

Note that the browser alert()s and stops if an unexpected exception is thrown, on the assumption that this might be a sign of a problem in itself. Firefox often throws NS_ERROR_XPC_SECURITY_MANAGER_VETO; I am not sure why. If you do not want to stop on these, delete the first two occurrences of "setTimeout" in the fuzzer source and replace the third one:

- setTimeout('event_loop()', 1000);
+ setTimeout('setInterval("event_loop()", 5)', 1000);

5) Collect seed - assertion / crash location pairs. Go to 2 until enough data is collected.

6) For every interesting assertion / crash location, look up a seed that resulted in that crash in the least amount of time. This is your best bet for getting a reliable repro.

7) If that seed fails over several attempts, try the next best seed value for that crash.

Unfortunately, with heap corruption, there will be some cases that are still PITA to track down, but a reliable and fast seed still should be of great help.
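
Steps 5 to 7 could be sketched roughly as follows. The record shape (`seed`, `signature`, `seconds`) and the function name `rankSeedsBySignature` are assumptions for illustration, and the sample runs are made-up values, not real results.

```javascript
// Each record is one fuzzing run: the seed used, the crash or assertion
// location observed (null if the run hit the time limit), and seconds until it hit.
function rankSeedsBySignature(runs) {
  const bySignature = new Map();
  for (const run of runs) {
    if (run.signature === null) continue; // no crash within the time limit
    if (!bySignature.has(run.signature)) bySignature.set(run.signature, []);
    bySignature.get(run.signature).push(run);
  }
  const ranked = {};
  for (const [signature, hits] of bySignature) {
    // Fastest crash first: the best bet for a reliable repro (step 6),
    // with the slower seeds kept as fallbacks (step 7).
    ranked[signature] = hits
      .slice()
      .sort((a, b) => a.seconds - b.seconds)
      .map((hit) => hit.seed);
  }
  return ranked;
}

// Made-up sample data, for illustration only.
const rankedSeeds = rankSeedsBySignature([
  { seed: 101, signature: "js::gc::MarkKind", seconds: 240 },
  { seed: 202, signature: "js::gc::MarkKind", seconds: 35 },
  { seed: 303, signature: null, seconds: 900 },
  { seed: 404, signature: "WrappedNativeTearoffSweeper", seconds: 120 },
]);
// rankedSeeds["js::gc::MarkKind"] lists seed 202 before seed 101.
```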
Another thing: toggle_gc() in the fuzzer may not be optimized for Firefox, which may make heap corruption issues harder to spot. I am not sure how to reliably force GC in non-debug builds from non-privileged pages, because I couldn't find any documentation on the conditions that normally trigger it. If you're testing with debug builds, replacing toggle_gc() with a call to nsIDOMWindowUtils garbageCollect() may be a good idea to get better repros.

In non-debug builds, when I replaced the routine with one that quickly allocates and deallocates a ~1 GB blob of data, I got crashes much sooner with many seeds. But it's also slower.
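
A rough sketch of that kind of allocation churn, not the replacement routine itself. Splitting the blob into chunks and the name `churnMemory` are assumptions made here; whether and when this actually triggers a collection depends on the engine.

```javascript
// Allocate roughly `totalBytes` in chunks, touch each chunk, then drop all
// references so the memory becomes garbage. Returns the bytes allocated.
function churnMemory(totalBytes, chunkBytes) {
  let chunks = [];
  let allocated = 0;
  while (allocated < totalBytes) {
    const size = Math.min(chunkBytes, totalBytes - allocated);
    const chunk = new Uint8Array(size);
    chunk[size - 1] = 1; // write to the buffer so it is actually used
    chunks.push(chunk);
    allocated += size;
  }
  chunks = null; // release everything at once
  return allocated;
}

// Small demonstration size; the run described above used about 1 GB.
const churned = churnMemory(8 * 1024 * 1024, 1024 * 1024);
```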
Here's a patch to toggle GC reliably in non-debug builds:

http://lcamtuf.coredump.cx/cross_fuzz/moz_gc.patch

This crashes rather quickly.
Alias: crossfuzz-pvt
Depends on: 624493
Depends on: 622593
Depends on: 693053
Blocks: crossfuzz
Group: core-security → core-security-release

No activity for 9 years, closing.

Status: NEW → RESOLVED
Closed: 4 years ago
Component: General → Platform Fuzzing Team
Resolution: --- → FIXED
Summary: private cross_fuzz tracking bug → [meta] private cross_fuzz tracking bug
Group: core-security-release