Closed Bug 1165469 Opened 10 years ago Closed 9 years ago

Intermittent Gaia unit test "Return code: 1"/"Tests exited with return code 1: harness failures" due to a crash mid-run

Categories

(Firefox OS Graveyard :: General, defect)

ARM
Gonk (Firefox OS)
defect
Not set
normal

Tracking

(firefox40 unaffected, firefox41 affected, firefox42 fixed, firefox-esr31 unaffected, firefox-esr38 unaffected, b2g-v2.2 unaffected, b2g-master fixed)

RESOLVED FIXED
FxOS-S2 (10Jul)

People

(Reporter: RyanVM, Assigned: mccr8, NeedInfo)

References

Details

(Keywords: crash, intermittent-failure)

Attachments

(1 file)

Looks like it's crashing mid-run, but little to nothing to go off when it does :(
Gregor, can you please help find an owner for this frequent Gaia test crash?
Flags: needinfo?(anygregor)
We used to have crash stacks for child and parent crashes in the log. What happened to them?
Flags: needinfo?(anygregor) → needinfo?(jlal)
I'm getting quite concerned about the lack of responsiveness in this bug. Does anybody own Gu these days?
Flags: needinfo?(anygregor)
(In reply to Ryan VanderMeulen [:RyanVM UTC-4] from comment #174)
> I'm getting quite concerned about the lack of responsiveness in this bug.
> Does anybody own Gu these days?

It's James' team. James, any updates here?
Flags: needinfo?(anygregor)
Julien, can you look into this and try to get it solved? It appears to be unowned at the moment.
Flags: needinfo?(felash)
It's interesting that all the crashes seem to happen in SMS tests. Is it possible that it happened elsewhere but for some reason got reported in another bug?
Also, the log says "Build shut down unexpectedly". Is there a way we can know more?
(In reply to Julien Wajsberg [:julienw] from comment #367)
> Also the log says that the "Build shut down unexpectedly", is there a way we
> can know more ?

Possibly, but that is an issue that needs to be resolved with the task cluster team or the team working on the test runner. James Lal has had a needinfo for a few days; could you poke him to get this solved?
I'm in his timezone for the week so I'll ping him.
Hey Ryan, is it reasonably easy to try to get a window from treeherder?
Flags: needinfo?(ryanvm)
I don't understand what you're asking.
Flags: needinfo?(ryanvm)
(In reply to Ryan VanderMeulen [:RyanVM UTC-4] from comment #439)
> I don't understand what you're asking.

I guess he is asking for a regression window. We could try to find something around May 15th, but otherwise I don't think we have a good chance of finding the regression by looking at commits. These are random crashes and we don't have a stack. James said he has a patch ready to go that shows stacks for the crashes. That is our best option imho.
Just FYI I asked James yesterday and he told me he would look into it. No more information for now, but we _really_ need to look at this as this happens too much.
First time it happens in mozilla-inbound: https://treeherder.mozilla.org/#/jobs?repo=mozilla-inbound&revision=7fcf6bf43eda
First time it happens in b2g-inbound: https://treeherder.mozilla.org/#/jobs?repo=b2g-inbound&revision=b2d068bdde3d
I'm retriggering some jobs to see what happens before.
From the mozilla-inbound link, I'd bet this is a consequence of bug 866681. But it's difficult to say more without reproducing or a stack trace. Adding the assignee and reviewer in Cc in case this rings a bell for them.
We do end up running ContentUnbinder less often I think, but I'm not seeing anything hinting that this would be OOM.
I managed to reproduce locally by repeatedly launching the SMS tests. The crash id is 4a9410f4-2774-3a0b-75a2be64-17569523 (does not show up in Socorro yet).
Attached file 4a9410f4-2774-3a0b-75a2be64-17569523 (deleted) —
It still doesn't show up in Socorro, I don't know why... Here is the submission file, in case it contains interesting information.
Flags: needinfo?(felash)
Julien, do you happen to have some kind of regression range for this?
Flags: needinfo?(felash)
I think the mozilla-inbound build linked in comment 562 is the first failure; that's why I think it comes from bug 866681 (or is at least triggered by this change).

I just asked on #breakpad why the crash 4a9410f4-2774-3a0b-75a2be64-17569523 is not showing up in Crash Stats, but being in the middle of the work week makes this difficult too.

Here is how I reproduced locally:
1. Have an up-to-date Gaia. Make sure you have npm2 in your path too.
2. (optional) Install and use Xephyr:
   a. Install xserver-xephyr to have an in-memory X server.
   b. Run: Xephyr :1 -screen 700x900
3. In another terminal, run from the Gaia directory:
   DISPLAY=:1 bin/gaia-test -d -f
   Note 1: Use "DISPLAY=:1" only if you used Xephyr above.
   Note 2: You can use the environment variable "B2G" to point to a specific B2G binary.
4. In another terminal, run:
   while true ; do APP=sms make test-agent-test ; done
5. Wait until the B2G instance started in step 3 crashes. It can take a while.

I don't know if the crash happens on Mac OS X as well...
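The endless loop in step 4 keeps running after B2G has crashed. A minimal sketch of a variant that stops on its own is below. It assumes you know the PID of the B2G process started in step 3 and that the process exits when it crashes; the helper name repeat_while_alive and the variable B2G_PID are hypothetical and not part of Gaia.

```shell
#!/bin/sh
# Hypothetical helper (not part of Gaia): rerun a command for as long as the
# process <pid> is alive, giving up after <max> runs.
# Usage: repeat_while_alive <pid> <max> <command> [args...]
# Returns 0 if the process disappeared, 1 if it survived <max> runs.
repeat_while_alive() {
  pid="$1"
  max="$2"
  shift 2
  runs=0
  # "kill -0" sends no signal; it only checks that the process still exists.
  while kill -0 "$pid" 2>/dev/null; do
    if [ "$runs" -ge "$max" ]; then
      echo "process $pid still alive after $runs run(s)" >&2
      return 1
    fi
    # A failing test run alone does not stop the loop.
    "$@" || true
    runs=$((runs + 1))
  done
  echo "process $pid is gone after $runs run(s)" >&2
  return 0
}
```

With the function loaded, step 4 could then be run as `repeat_while_alive "$B2G_PID" 500 env APP=sms make test-agent-test`, where 500 is an arbitrary upper bound on the number of runs.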
Flags: needinfo?(felash)
(In reply to Julien Wajsberg [:julienw] from comment #686)
> 3. In another terminal, run from the Gaia directory:
>
> DISPLAY=:1 bin/gaia-test -d -f
>
> Note 1: Use "DISPLAY=:1" only if you used Xephyr above.
> Note 2: you can use the environment variable "B2G" to point to a specific
> B2G binary.

Also, you can hack the quite simple shell script bin/gaia-test to run B2G in gdb.
Depends on: 1176977
mccr8, could we back out bug 866681 to see if it causes this?
Flags: needinfo?(continuation)
Do we have a crash stack for this yet?

(In reply to Olli Pettay [:smaug] from comment #705)
> mccr8, could we backout bug 866681 to see if it causes this?

Sure, though that will require also backing out bug 861449 and a few other bugs. I can dig through that tomorrow.
Flags: needinfo?(continuation)
Flags: needinfo?(continuation)
The correct link to crash stats is https://crash-stats.mozilla.com/report/index/adf01ce8-8534-4ee6-bc28-3b1382150618

But it doesn't have a stack trace :/ I thought builds taken from task cluster had their symbols automatically uploaded to crash-stats :(
With the patch in bug 1177165 (PR https://github.com/mozilla-b2g/gaia/pull/30703) I made it easier to run the tests under gdb: use the "-g" option to "bin/gaia-test".
Here is a stack trace:

#0 0x00007ffff1ef8075 in nsGenericDOMDataNode::UnbindFromTree(bool, bool) () from /home/julien/travail/git/gaia/b2g/libxul.so
#1 0x00007ffff1e89f0f in mozilla::dom::Element::UnbindFromTree(bool, bool) () from /home/julien/travail/git/gaia/b2g/libxul.so
#2 0x00007ffff24a26bf in nsGenericHTMLElement::UnbindFromTree(bool, bool) () from /home/julien/travail/git/gaia/b2g/libxul.so
#3 0x00007ffff248d127 in mozilla::dom::HTMLSharedElement::UnbindFromTree(bool, bool) () from /home/julien/travail/git/gaia/b2g/libxul.so
#4 0x00007ffff1e89f0f in mozilla::dom::Element::UnbindFromTree(bool, bool) () from /home/julien/travail/git/gaia/b2g/libxul.so
#5 0x00007ffff24a26bf in nsGenericHTMLElement::UnbindFromTree(bool, bool) () from /home/julien/travail/git/gaia/b2g/libxul.so
#6 0x00007ffff248d127 in mozilla::dom::HTMLSharedElement::UnbindFromTree(bool, bool) () from /home/julien/travail/git/gaia/b2g/libxul.so
#7 0x00007ffff1ed601d in nsDocument::cycleCollection::Unlink(void*) () from /home/julien/travail/git/gaia/b2g/libxul.so
#8 0x00007ffff24b5d85 in nsHTMLDocument::cycleCollection::Unlink(void*) () from /home/julien/travail/git/gaia/b2g/libxul.so
#9 0x00007ffff1802e03 in nsCycleCollector::CollectWhite() () from /home/julien/travail/git/gaia/b2g/libxul.so
#10 0x00007ffff1802fb6 in nsCycleCollector::Collect(ccType, js::SliceBudget&, nsICycleCollectorListener*, bool) () from /home/julien/travail/git/gaia/b2g/libxul.so
#11 0x00007ffff180312e in nsCycleCollector::FinishAnyCurrentCollection() () from /home/julien/travail/git/gaia/b2g/libxul.so
#12 0x00007ffff18031a4 in mozilla::CycleCollectedJSRuntime::OnGC(JSGCStatus) () from /home/julien/travail/git/gaia/b2g/libxul.so
#13 0x00007ffff35f42cf in js::gc::GCRuntime::collect(bool, js::SliceBudget, JS::gcreason::Reason) () from /home/julien/travail/git/gaia/b2g/libxul.so
#14 0x00007ffff35f4dc3 in js::gc::GCRuntime::startGC(JSGCInvocationKind, JS::gcreason::Reason, long) () from /home/julien/travail/git/gaia/b2g/libxul.so
#15 0x00007ffff35f4e72 in js::gc::GCRuntime::gcIfRequested(JSContext*) () from /home/julien/travail/git/gaia/b2g/libxul.so
#16 0x00007ffff3235565 in js::gc::GCRuntime::gcIfNeededPerAllocation(JSContext*) () from /home/julien/travail/git/gaia/b2g/libxul.so
#17 0x00007ffff323c012 in js::AccessorShape* js::Allocate<js::AccessorShape, (js::AllowGC)1>(js::ExclusiveContext*) () from /home/julien/travail/git/gaia/b2g/libxul.so
#18 0x00007ffff32fdcb8 in js::NativeObject::removeProperty(js::ExclusiveContext*, jsid) () from /home/julien/travail/git/gaia/b2g/libxul.so
#19 0x00007ffff32fdf68 in js::NativeDeleteProperty(JSContext*, JS::Handle<js::NativeObject*>, JS::Handle<jsid>, JS::ObjectOpResult&) () from /home/julien/travail/git/gaia/b2g/libxul.so
#20 0x00007ffff319ea35 in js::DeleteProperty(JSContext*, JS::Handle<JSObject*>, JS::Handle<jsid>, JS::ObjectOpResult&) () from /home/julien/travail/git/gaia/b2g/libxul.so
#21 0x00007ffff327b21d in bool js::DeletePropertyJit<false>(JSContext*, JS::Handle<JS::Value>, JS::Handle<js::PropertyName*>, bool*) () from /home/julien/travail/git/gaia/b2g/libxul.so
#22 0x00007ffff7e14130 in ?? ()
#23 0xfffc7fffca66f140 in ?? ()
#24 0x00007fffffff770c in ?? ()
#25 0xfffc7fffd20351a0 in ?? ()
#26 0x00007fffd7e7ff26 in ?? ()
#27 0x00007ffff56af1a0 in DeletePropertyStrictInfo () from /home/julien/travail/git/gaia/b2g/libxul.so
#28 0x00007ffff6b145e0 in ?? ()
#29 0x00007fffd7ce63e9 in ?? ()
#30 0x0000000000000701 in ?? ()
#31 0xfffc7fffc98b22a0 in ?? ()
#32 0x00007ffff6b5d3a0 in ?? ()
#33 0x00007fffffff7798 in ?? ()
#34 0xfffc7fffc98b22a0 in ?? ()
#35 0xfffc7fffc98b22a0 in ?? ()
#36 0x00007fffd4ae19a0 in ?? ()
#37 0x00007fffffff77e0 in ?? ()
#38 0x0000000000000058 in ?? ()
#39 0x00007fffc27b6400 in ?? ()
#40 0x00007fffe13c54b0 in ?? ()
#41 0x0000000000000001 in ?? ()
#42 0x00007ffff7e119fc in ?? ()
#43 0x00000000e7f3f0a0 in ?? ()
#44 0x00007fffffff77d0 in ?? ()
#45 0x00007fffd7e84e84 in ?? ()
#46 0x0000000000000202 in ?? ()
#47 0x00007fffc2b64f40 in ?? ()
#48 0x0000000000000001 in ?? ()
#49 0xfffc7fffd20351a0 in ?? ()
#50 0xfffc7fffd33bbdc0 in ?? ()
#51 0x00007fffffff7848 in ?? ()
#52 0x00007fffc3da5390 in ?? ()
#53 0x00007fffd7ce5685 in ?? ()
#54 0x0000000000000601 in ?? ()
#55 0xfffc7fffd33bbdc0 in ?? ()
#56 0xfffc7fffd20351a0 in ?? ()
#57 0xfffc7fffc2b64f40 in ?? ()
#58 0x00007fffd4ae1b20 in ?? ()
#59 0x00007fffffff7890 in ?? ()
#60 0xfff8800000000060 in ?? ()
#61 0x00007fffc282dd60 in ?? ()
#62 0x0000000000000000 in ?? ()

Not sure what I should do to make it better. Gabriele offered to test with his own debug build too, so likely more to come.
Here is a stack trace from a debug build of current moz-central:

#0 0x00007ffff1e9a653 in nsGenericDOMDataNode::UnbindFromTree (this=0x7fffccc8c9d0, aDeep=<optimized out>, aNullParent=<optimized out>) at /home/julien/travail/git/gecko/dom/base/nsGenericDOMDataNode.cpp:582
#1 0x00007ffff1e2026b in mozilla::dom::Element::UnbindFromTree (this=this@entry=0x7fffc5798880, aDeep=aDeep@entry=true, aNullParent=aNullParent@entry=false) at /home/julien/travail/git/gecko/dom/base/Element.cpp:1777
#2 0x00007ffff24c2875 in nsGenericHTMLElement::UnbindFromTree (this=this@entry=0x7fffc5798880, aDeep=aDeep@entry=true, aNullParent=aNullParent@entry=false) at /home/julien/travail/git/gecko/dom/html/nsGenericHTMLElement.cpp:588
#3 0x00007ffff24b2c5e in mozilla::dom::HTMLSharedElement::UnbindFromTree (this=0x7fffc5798880, aDeep=<optimized out>, aNullParent=<optimized out>) at /home/julien/travail/git/gecko/dom/html/HTMLSharedElement.cpp:292
#4 0x00007ffff1e2026b in mozilla::dom::Element::UnbindFromTree (this=this@entry=0x7fffc57987c0, aDeep=aDeep@entry=true, aNullParent=aNullParent@entry=true) at /home/julien/travail/git/gecko/dom/base/Element.cpp:1777
#5 0x00007ffff24c2875 in nsGenericHTMLElement::UnbindFromTree (this=this@entry=0x7fffc57987c0, aDeep=aDeep@entry=true, aNullParent=aNullParent@entry=true) at /home/julien/travail/git/gecko/dom/html/nsGenericHTMLElement.cpp:588
#6 0x00007ffff24b2c5e in mozilla::dom::HTMLSharedElement::UnbindFromTree (this=0x7fffc57987c0, aDeep=<optimized out>, aNullParent=<optimized out>) at /home/julien/travail/git/gecko/dom/html/HTMLSharedElement.cpp:292
#7 0x00007ffff1e6fba8 in nsDocument::cycleCollection::Unlink (this=<optimized out>, p=p@entry=0x7fffc97e2000) at /home/julien/travail/git/gecko/dom/base/nsDocument.cpp:2057
#8 0x00007ffff24d86b3 in nsHTMLDocument::cycleCollection::Unlink (this=<optimized out>, p=0x7fffc97e2000) at /home/julien/travail/git/gecko/dom/html/nsHTMLDocument.cpp:187
#9 0x00007ffff16a2f3e in nsCycleCollector::CollectWhite (this=this@entry=0x7fffe8d68000) at /home/julien/travail/git/gecko/xpcom/base/nsCycleCollector.cpp:3274
#10 0x00007ffff16a375a in nsCycleCollector::Collect (this=0x7fffe8d68000, aCCType=SliceCC, aBudget=..., aManualListener=0x0, aPreferShorterSlices=false) at /home/julien/travail/git/gecko/xpcom/base/nsCycleCollector.cpp:3631
#11 0x00007ffff16a38f3 in nsCycleCollector::FinishAnyCurrentCollection (this=0x7fffccc8c9d0) at /home/julien/travail/git/gecko/xpcom/base/nsCycleCollector.cpp:3694
#12 0x00007ffff16a394c in mozilla::CycleCollectedJSRuntime::OnGC (this=0x7fffe372b800, aStatus=JSGC_BEGIN) at /home/julien/travail/git/gecko/xpcom/base/CycleCollectedJSRuntime.cpp:1214
#13 0x00007ffff37411ee in js::gc::GCRuntime::collect (this=0x7fffe2561338, incremental=incremental@entry=true, budget=..., reason=JS::gcreason::TOO_MUCH_MALLOC) at /home/julien/travail/git/gecko/js/src/jsgc.cpp:6150
#14 0x00007ffff374146a in js::gc::GCRuntime::startGC (this=<optimized out>, gckind=<optimized out>, reason=<optimized out>, millis=<optimized out>) at /home/julien/travail/git/gecko/js/src/jsgc.cpp:6223
#15 0x00007ffff3742622 in js::gc::GCRuntime::gcIfRequested (this=0x7fffe2561338, cx=<optimized out>) at /home/julien/travail/git/gecko/js/src/jsgc.cpp:6438
#16 0x00007ffff334b8d5 in js::gc::GCRuntime::gcIfNeededPerAllocation (this=0x7fffe2561338, cx=0x7fffc82051d0) at /home/julien/travail/git/gecko/js/src/gc/Allocator.cpp:34
#17 0x00007ffff3371473 in checkAllocatorState<(js::AllowGC)1> (kind=js::gc::STRING, cx=0x7fffc82051d0, this=<optimized out>) at /home/julien/travail/git/gecko/js/src/gc/Allocator.cpp:55
#18 js::Allocate<JSString, (js::AllowGC)1> (cx=cx@entry=0x7fffc82051d0) at /home/julien/travail/git/gecko/js/src/gc/Allocator.cpp:208
#19 0x00007ffff347389f in new_<(js::AllowGC)1, unsigned char> (length=471, chars=0x7fffd049ee00 "function () {\n\"use strict\";\n\n", ' ' <repeats 14 times>, "ConversationView.recipients.add({\n", ' ' <repeats 16 times>, "number: 'foo',\n", ' ' <repeats 16 times>, "isQuestionable: true\n", ' ' <repeats 14 times>, "});\n\n", ' ' <repeats 14 times>, "ConversationView.on.wi"..., cx=<optimized out>) at /home/julien/travail/git/gecko/js/src/vm/String-inl.h:223
#20 js::NewStringDontDeflate<(js::AllowGC)1, unsigned char> (cx=cx@entry=0x7fffc82051d0, chars=chars@entry=0x7fffd049ee00 "function () {\n\"use strict\";\n\n", ' ' <repeats 14 times>, "ConversationView.recipients.add({\n", ' ' <repeats 16 times>, "number: 'foo',\n", ' ' <repeats 16 times>, "isQuestionable: true\n", ' ' <repeats 14 times>, "});\n\n", ' ' <repeats 14 times>, "ConversationView.on.wi"..., length=length@entry=471) at /home/julien/travail/git/gecko/js/src/vm/String.cpp:1074
#21 0x00007ffff3447bf6 in FinishStringFlat<unsigned char, js::Vector<unsigned char, 64ul> > (cb=..., sb=..., cx=0x7fffc82051d0) at /home/julien/travail/git/gecko/js/src/vm/StringBuffer.cpp:86
#22 js::StringBuffer::finishString (this=this@entry=0x7fffffff0fe0) at /home/julien/travail/git/gecko/js/src/vm/StringBuffer.cpp:127
#23 0x00007ffff3747e9e in js::FunctionToString (cx=0x7fffc82051d0, fun=..., bodyOnly=<optimized out>, lambdaParen=<optimized out>) at /home/julien/travail/git/gecko/js/src/jsfun.cpp:1092
#24 0x00007ffff37486e2 in fun_toStringHelper (indent=<optimized out>, obj=..., cx=<optimized out>) at /home/julien/travail/git/gecko/js/src/jsfun.cpp:1109
#25 js::fun_toString (cx=0x7fffc82051d0, argc=1, vp=0x7fffffff1148) at /home/julien/travail/git/gecko/js/src/jsfun.cpp:1127
#26 0x00007fffd88d1361 in ?? ()
#27 0x00007fffc74a2be0 in ?? ()
#28 0x00007fffffff1120 in ?? ()
#29 0x0000000000000000 in ?? ()
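Both traces end in a run of frames that gdb could not symbolize ("in ?? ()"). A small sketch for trimming those from a saved backtrace is below. It assumes the backtrace was saved with one frame per line, as gdb prints it; the function name symbolized_frames and the file name backtrace.txt are hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: read a gdb backtrace on stdin and print only the
# frames that carry a symbol name.
symbolized_frames() {
  # Keep lines that start with a frame number such as "#12 ", then drop
  # frames gdb reported as "in ?? ()". -F matches the pattern literally.
  grep -E '^#[0-9]+ ' | grep -vF ' in ?? ()'
}
```

For example, `symbolized_frames < backtrace.txt` would print only the named frames of a saved trace.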
Hey Olli, is this stack trace useful to you?
Flags: needinfo?(bugs)
Oh, yes, that is very interesting. It does look related to my patch, at least indirectly. Maybe mParent is null somehow? I guess my patch could affect the order of DOM teardown somehow.
Do you have any information about what the actual value of the crash is? E.g., are we crashing on a null deref?
Flags: needinfo?(felash)
(gdb) p mParent
$1 = (nsINode *) 0x0

Looks like so :)
Flags: needinfo?(felash)
To give a little bit of context, the unit tests are constantly adding and removing nodes quite quickly; sometimes we keep references to nodes that are removed from the DOM before the references are themselves freed. Not sure if this is useful :)
Depends on: 1177627
Flags: needinfo?(continuation)
Flags: needinfo?(bugs)
Looks like bug 1177627 successfully fixed this -> closing. Let's reopen if new reports come in.
Status: NEW → RESOLVED
Closed: 9 years ago
Resolution: --- → FIXED
Assignee: nobody → continuation
Target Milestone: --- → FxOS-S2 (10Jul)