Closed Bug 1046967 Opened 10 years ago Closed 10 years ago

Performance (b2gperf) tests crashing on b2g-inbound builds

Categories

(Firefox OS Graveyard :: Performance, defect, P1)

ARM
Gonk (Firefox OS)
defect

Tracking

(Not tracked)

RESOLVED FIXED

People

(Reporter: davehunt, Assigned: hub)

References

Details

(4 keywords, Whiteboard: [c=automation p=1 s= u=])

Attachments

(2 files)

Since yesterday the performance tests have been crashing after just a few iterations. I can reproduce this locally and it doesn't appear to matter which application is being tested (replicated with both Phone and Contacts).

device_firmware_date: 1403855878
device_firmware_version_incremental: 110
device_firmware_version_release: 4.3
device_id: flame

Last good:
application_buildid: 20140730104209
application_changeset: b8d783033da7
build_changeset: 3aa6abd313f965a84aa86c6b213dc154e4875139
gaia_changeset: b67ddd7d40b52e65199478b8d6631c2c28fdf41d
gaia_date: 1406740488
platform_buildid: 20140730104209
platform_changeset: b8d783033da7

First bad:
application_buildid: 20140730105005
application_changeset: 4cc9e0c5dd67
build_changeset: 3aa6abd313f965a84aa86c6b213dc154e4875139
gaia_changeset: c2d7dafab9dcadf1b5a099972d4c7647dcc4e276
gaia_date: 1406740488
platform_buildid: 20140730105005
platform_changeset: 4cc9e0c5dd67
Hub, I need you to look into this and identify the root cause. Please work with Dave Hunt and anyone else needed to resolve this. Thanks, Mike
Severity: normal → blocker
Status: NEW → ASSIGNED
Component: General → Performance
Keywords: perf
Priority: -- → P1
Whiteboard: [c=automation p= s= u=]
Example console output demonstrating the issue is below. When I've replicated locally I see the device perform a reboot.

b2gperf --address=localhost:2828 --device=356cd072 --delay=10 --sources=sources.xml --testvars=/home/webqa/webqa-credentials/b2g/b2g-13.1.json --dz-project=b2g --dz-branch=master --dz-device=flame --dz-key=**** --dz-secret=**** --dz-build-url=http://jenkins1.qa.scl3.mozilla.com/job/flame.b2g-inbound.perf.b2gperf/950/ --reset Phone Contacts Messages Settings Gallery Video Music Camera Email Calendar Clock FM Radio Usage Template Browser
2014-07-30 11:56:51,959 B2GPerfRunner INFO | Running B2GPerfLaunchTest
2014-07-30 11:58:37,820 B2GPerfRunner INFO | Phone [1/30]
2014-07-30 11:58:49,098 B2GPerfRunner INFO | Phone [2/30]
2014-07-30 11:59:00,470 B2GPerfRunner INFO | Phone [3/30]
2014-07-30 11:59:11,797 B2GPerfRunner INFO | Phone [4/30]
2014-07-30 11:59:23,183 B2GPerfRunner INFO | Phone [5/30]
2014-07-30 11:59:34,167 B2GPerfRunner INFO | Phone [6/30]
2014-07-30 11:59:45,144 B2GPerfRunner INFO | Phone [7/30]
2014-07-30 11:59:56,051 B2GPerfRunner INFO | Phone [8/30]
2014-07-30 12:00:07,158 B2GPerfRunner INFO | Phone [9/30]
Traceback (most recent call last):
  File "/var/jenkins/1/workspace/flame.b2g-inbound.perf.b2gperf/.env/bin/b2gperf", line 9, in <module>
    load_entry_point('b2gperf==0.32', 'console_scripts', 'b2gperf')()
  File "/var/jenkins/1/workspace/flame.b2g-inbound.perf.b2gperf/.env/local/lib/python2.7/site-packages/b2gperf/b2gperf.py", line 595, in cli
    b2gperf.measure_app_perf(args)
  File "/var/jenkins/1/workspace/flame.b2g-inbound.perf.b2gperf/.env/local/lib/python2.7/site-packages/b2gperf/b2gperf.py", line 201, in measure_app_perf
    test.run()
  File "/var/jenkins/1/workspace/flame.b2g-inbound.perf.b2gperf/.env/local/lib/python2.7/site-packages/b2gperf/b2gperf.py", line 338, in run
    self.test()
  File "/var/jenkins/1/workspace/flame.b2g-inbound.perf.b2gperf/.env/local/lib/python2.7/site-packages/b2gperf/b2gperf.py", line 377, in test
    'launch("%s")' % self.app_name)
  File "/var/jenkins/1/workspace/flame.b2g-inbound.perf.b2gperf/.env/local/lib/python2.7/site-packages/marionette/marionette.py", line 1166, in execute_async_script
    filename=os.path.basename(frame[0]))
  File "/var/jenkins/1/workspace/flame.b2g-inbound.perf.b2gperf/.env/local/lib/python2.7/site-packages/marionette/decorators.py", line 35, in _
    return func(*args, **kwargs)
  File "/var/jenkins/1/workspace/flame.b2g-inbound.perf.b2gperf/.env/local/lib/python2.7/site-packages/marionette/marionette.py", line 590, in _send_message
    response = self.client.send(message)
  File "/var/jenkins/1/workspace/flame.b2g-inbound.perf.b2gperf/.env/local/lib/python2.7/site-packages/marionette_transport/transport.py", line 100, in send
    response = self.receive()
  File "/var/jenkins/1/workspace/flame.b2g-inbound.perf.b2gperf/.env/local/lib/python2.7/site-packages/marionette_transport/transport.py", line 57, in receive
    raise IOError(self.connection_lost_msg)
IOError: Connection to Marionette server is lost. Check gecko.log (desktop firefox) or logcat (b2g) for errors.
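For anyone comparing several failed runs, here is a rough sketch of pulling the last completed iteration out of console output like the above. It is not part of b2gperf; the function name `last_iteration` and the regular expression are assumptions based only on the log format shown in this comment.

```python
import re

# Hypothetical helper, not part of b2gperf. The pattern assumes progress
# lines of the form "... INFO | <App name> [<done>/<total>]" as seen above.
ITERATION_RE = re.compile(
    r"INFO \| (?P<app>[^|\[\n]+?) \[(?P<done>\d+)/(?P<total>\d+)\]"
)


def last_iteration(log_text):
    """Return (app, done, total) for the last progress line, or None."""
    last = None
    for match in ITERATION_RE.finditer(log_text):
        last = (
            match.group("app"),
            int(match.group("done")),
            int(match.group("total")),
        )
    return last


SAMPLE_LOG = """\
2014-07-30 11:56:51,959 B2GPerfRunner INFO | Running B2GPerfLaunchTest
2014-07-30 11:59:56,051 B2GPerfRunner INFO | Phone [8/30]
2014-07-30 12:00:07,158 B2GPerfRunner INFO | Phone [9/30]
Traceback (most recent call last):
"""

if __name__ == "__main__":
    print(last_iteration(SAMPLE_LOG))
```

In the run above the last progress line is Phone [9/30], so the connection to Marionette was lost during or just after the ninth of thirty launches.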
Summary: Performance tests crashing on b2g-inbound builds → Performance (b2gperf) tests crashing on b2g-inbound builds
I've just tested locally with the two b2g-inbound builds around the regression (without resetting gaia) and was unable to replicate the crash. This would imply it's a gaia issue between b67ddd7d40b52e65199478b8d6631c2c28fdf41d and c2d7dafab9dcadf1b5a099972d4c7647dcc4e276 however I've run out of time today for investigating this.
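To make the regression range explicit, here is a minimal sketch that diffs the "Last good" and "First bad" build metadata from the original report. The values are copied from that report; the helper name `changed_fields` is hypothetical.

```python
# Build metadata copied from the report; changed_fields is a hypothetical helper.
LAST_GOOD = {
    "application_buildid": "20140730104209",
    "application_changeset": "b8d783033da7",
    "build_changeset": "3aa6abd313f965a84aa86c6b213dc154e4875139",
    "gaia_changeset": "b67ddd7d40b52e65199478b8d6631c2c28fdf41d",
    "gaia_date": "1406740488",
    "platform_buildid": "20140730104209",
    "platform_changeset": "b8d783033da7",
}

FIRST_BAD = {
    "application_buildid": "20140730105005",
    "application_changeset": "4cc9e0c5dd67",
    "build_changeset": "3aa6abd313f965a84aa86c6b213dc154e4875139",
    "gaia_changeset": "c2d7dafab9dcadf1b5a099972d4c7647dcc4e276",
    "gaia_date": "1406740488",
    "platform_buildid": "20140730105005",
    "platform_changeset": "4cc9e0c5dd67",
}


def changed_fields(good, bad):
    """Map each field whose value differs to its (good, bad) pair."""
    return {
        key: (good[key], bad.get(key))
        for key in sorted(good)
        if good[key] != bad.get(key)
    }


if __name__ == "__main__":
    for field, (good, bad) in changed_fields(LAST_GOOD, FIRST_BAD).items():
        print("%s: %s -> %s" % (field, good, bad))
```

Both the gecko changeset and the gaia changeset differ between the two builds, which is why testing the two gecko builds without resetting gaia was needed to separate them.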
(In reply to Dave Hunt (:davehunt) from comment #3)
> I've just tested locally with the two b2g-inbound builds around the
> regression (without resetting gaia) and was unable to replicate the crash.
> This would imply it's a gaia issue between
> b67ddd7d40b52e65199478b8d6631c2c28fdf41d and
> c2d7dafab9dcadf1b5a099972d4c7647dcc4e276 however I've run out of time today
> for investigating this.

https://github.com/mozilla-b2g/gaia/compare/b67ddd7d40b52e65199478b8d6631c2c28fdf41d...c2d7dafab9dcadf1b5a099972d4c7647dcc4e276
(In reply to Jason Smith [:jsmith] - At Work Week, Slow to Respond from comment #4)
> (In reply to Dave Hunt (:davehunt) from comment #3)
> > I've just tested locally with the two b2g-inbound builds around the
> > regression (without resetting gaia) and was unable to replicate the crash.
> > This would imply it's a gaia issue between
> > b67ddd7d40b52e65199478b8d6631c2c28fdf41d and
> > c2d7dafab9dcadf1b5a099972d4c7647dcc4e276 however I've run out of time today
> > for investigating this.
>
> https://github.com/mozilla-b2g/gaia/compare/
> b67ddd7d40b52e65199478b8d6631c2c28fdf41d...
> c2d7dafab9dcadf1b5a099972d4c7647dcc4e276

Maybe bug 1045132 caused this? ahal - what do you think?
Flags: needinfo?(ahalberstadt)
(In reply to Jason Smith [:jsmith] - At Work Week, Slow to Respond from comment #5)
> (In reply to Jason Smith [:jsmith] - At Work Week, Slow to Respond from
> comment #4)
> > (In reply to Dave Hunt (:davehunt) from comment #3)
> > > I've just tested locally with the two b2g-inbound builds around the
> > > regression (without resetting gaia) and was unable to replicate the crash.
> > > This would imply it's a gaia issue between
> > > b67ddd7d40b52e65199478b8d6631c2c28fdf41d and
> > > c2d7dafab9dcadf1b5a099972d4c7647dcc4e276 however I've run out of time today
> > > for investigating this.
> >
> > https://github.com/mozilla-b2g/gaia/compare/
> > b67ddd7d40b52e65199478b8d6631c2c28fdf41d...
> > c2d7dafab9dcadf1b5a099972d4c7647dcc4e276
>
> Maybe bug 1045132 caused this?
>
> ahal - what do you think?

Ack. Mistyped the bug #. Meant to say bug 1045142.
I just dug into our smoketest reports for today as a point of comparison. The bug that's causing us trouble on our side is bug 1038854. It's causing the camera to fail to start (might also be the reason why email is crashing too).
Ting, can you help here? We need to understand what about your changes in bug 1038854 is causing these test crashes.
Flags: needinfo?(tchou)
ahal mentioned in IRC that bug 1045132 was unlikely to be the cause of this bug, as he thinks the runner service isn't used by anything yet.
Flags: needinfo?(ahalberstadt)
(In reply to Mike Lee [:mlee] from comment #9)
> Ting, can you help here? We need to understand what about your changes in
> bug 1038854 is causing these test crashes.

Note - if bug 1038854 is the cause, then this should be resolved when a new build gets spun with the backout included.
I noticed this yesterday on my device too. I'll update my tree and try again.
By "noticed this" I mean with |make test-perf|. It is an actual crash of Gecko, as the screen says "B2G crashed". Looking at bug 1045142, I doubt this code is used anywhere with |make test-perf|.
I just updated and rebuilt. My top gecko commit is:

commit c60b44a7b137ed1ebb3444efebb089d755424d54
Author: Wes Kocher <wkocher@mozilla.com>
Date:   Thu Jul 31 15:04:49 2014 -0700

    Backed out changeset f73cd738c1fe (bug 1038854) a=backout

which is the backout for the bug mentioned above. It still crashes. I'll try to dig further, but we might need to bisect.
Bisecting it right now.
Keywords: regression
Assignee: nobody → hub
Whiteboard: [c=automation p= s= u=] → [c=automation p=1 s= u=]
Clear NI per comment 14.
Flags: needinfo?(tchou)
To reproduce: |APP=clock RESTART_B2G=0 make test-perf|. It crashes b2g when doing that.
I confirm that bug 1038854 isn't the source, as the crash occurs both before that bug's patch was checked in and after it was reverted.
[Blocking Requested - why for this release]: Regression in an existing test suite that must stay up to allow us to do performance measurements.
blocking-b2g: --- → 2.1?
Keywords: qablocker
(In reply to Hubert Figuiere [:hub] from comment #18)
> I confirm that bug 1038854 isn't the source as the crash occurs before this
> bug was checked in and after it was reverted.

hub - can you get a crash stack for the crash being seen here? I could dig through bugzilla here to see if there's a stack already with the crash you are seeing if I know the crash stack.
Flags: needinfo?(hub)
Attached file: dmesg output (deleted)
Attached file: Stack trace (deleted)
Not sure if I am running into the same issue here, but when I do |make test-perf|, the b2g process eventually crashes, and keeps restarting and crashing even after a reboot. The only way to fix it is to reflash the phone. Attachment 8466516 [details] contains the dmesg output and Attachment 8466517 [details] shows the gdb stack trace.
Flags: needinfo?(hub)
(You can work around by not using a debug build.)
I don't see the "keeps restarting" behaviour that Wander describes, though.

Bisect result:

8062fdbcecee32574f64f4a0553a4da053a91d93 is the first bad commit
commit 8062fdbcecee32574f64f4a0553a4da053a91d93
Author: Sean Lin <selin@mozilla.com>
Date:   Tue Jun 24 10:51:48 2014 +0800

    Bug 874353 - Remove CPU wake lock control from ContentParent. r=gene, khuey

:040000 040000 08e93bb32d9c44606f7ea3860e37ed657258c16f f96f62965e32c498aafb5b384843b3bf08ac4dcc M dom
It is a git bisect using the B2G tree. The git sha1 needs to be mapped to the actual hg sha1, just in case we weren't clear on that.
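For readers unfamiliar with bisecting, the idea is a binary search over an ordered commit range. The sketch below is illustrative only and is not what git runs internally: `first_bad`, `is_bad`, and the commit labels are hypothetical, and it assumes every commit after the first bad one is also bad. In practice `is_bad` would mean building that revision and running |make test-perf| on the device.

```python
def first_bad(commits, is_bad):
    """Return the first bad commit in an oldest-to-newest list.

    Assumes commits[0] is known good, commits[-1] is known bad, and that
    once a commit is bad every later commit is bad as well.
    """
    lo, hi = 0, len(commits) - 1  # lo: known good, hi: known bad
    while hi - lo > 1:
        mid = (lo + hi) // 2
        if is_bad(commits[mid]):
            hi = mid  # regression is at mid or earlier
        else:
            lo = mid  # regression is after mid
    return commits[hi]


if __name__ == "__main__":
    # Hypothetical history: c0..c7, where the crash starts at c5.
    history = ["c%d" % i for i in range(8)]
    tested = []

    def crashes(commit):
        tested.append(commit)
        return int(commit[1:]) >= 5

    print(first_bad(history, crashes), "after testing", tested)
```

With eight commits this needs three test runs instead of up to six, which matters when each run means a full build and flash.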
Blocks: 874353
Also, the patch has already been backed out, possibly because of bug 1046956, and after that it no longer crashes. At the revision just before the backout: still crashes. At the backout: no longer crashes.

The backout is git revision:

commit 19e5d2d26c417bd79a6c33d7fb1b4bedfb4ec713
Author: Kyle Huey <khuey@kylehuey.com>
Date:   Fri Aug 1 11:02:55 2014 -0700

    Back out bug 874353, which is suspected of causing bug 1046956. r=me a=backout
Hub - Can we close this then if we've confirmed this no longer reproduces with the backout of bug 874353?
Flags: needinfo?(hub)
Of course we can. See comment 28 for the resolution.
Status: ASSIGNED → RESOLVED
Closed: 10 years ago
Flags: needinfo?(hub)
Resolution: --- → FIXED
blocking-b2g: 2.1? → ---