Open Bug 1521191 Opened 6 years ago Updated 2 years ago

Failing to produce a leak log does not cause a test failure

Tracking

(firefox-esr60 unaffected, firefox64 unaffected, firefox65 wontfix, firefox66 wontfix, firefox67 wontfix, firefox68 wontfix, firefox69 fix-optional, firefox70 fix-optional)

Status:

NEW

Tracking Flags:

Tracking

Status

firefox-esr60

---

unaffected

firefox64

---

unaffected

firefox65

---

wontfix

firefox66

---

wontfix

firefox67

---

wontfix

firefox68

---

wontfix

firefox69

---

fix-optional

firefox70

---

fix-optional

People

(Reporter: mccr8, Unassigned)

References

(Depends on 1 open bug, Blocks 1 open bug)

Details

(Keywords: regression, Whiteboard: [MemShrink:P2])

Attachments

(4 files, 1 obsolete file)

Never print out a leak log. DO NOT LAND 6 years ago Andrew McCreight [:mccr8] (deleted), patch		Details \| Diff \| Splinter Review
Bug 1521191, part 1 - Add method to disable dump statistics. 6 years ago Andrew McCreight [:mccr8] (deleted), text/x-phabricator-request		Details
Bug 1521191, part 2 - Ensure missing leak logs cause mozharness to fail. 6 years ago Andrew McCreight [:mccr8] (deleted), text/x-phabricator-request		Details
Bug 1521191, part 3 - Add a function to create a negative leak. 6 years ago Andrew McCreight [:mccr8] (deleted), text/x-phabricator-request		Details
Bug 1521191, part 4 - Add a test for producing an error if there's a negative leak. 6 years ago Andrew McCreight [:mccr8] (deleted), text/x-phabricator-request		Details

Andrew McCreight [:mccr8]

Reporter

Description

•

6 years ago

I did a try push with a patch that makes it so that it never prints out a leak log, and this does not cause an orange on TreeHerder:

https://treeherder.mozilla.org/logviewer.html#/jobs?job_id=222757782&repo=try&lineNumber=5844

Note that you do get a FAIL if you run the tests locally through mach.

Bug 1518115 is about how we print out the "missing output line" message even if the failure is expected, while this is how the test job does not turn orange if we actually hit that failure.

This is probably not hiding any leaks because the most common cause of this error is a crash, but we should make sure of that.

Andrew McCreight [:mccr8]

Reporter

Updated

•

6 years ago

status-firefox64: --- → unaffected

status-firefox65: --- → affected

status-firefox66: --- → affected

status-firefox-esr60: --- → unaffected

Keywords: regression

Andrew McCreight [:mccr8]

Reporter

Comment 1

•

6 years ago

I'm not exactly sure when this code is run, but it only handles the leaking failure case from mozleak_total, and not the missing log case, so maybe that's the reason for not failing? https://hg.mozilla.org/mozilla-central/rev/d40ebdfc91a0

Andrew McCreight [:mccr8]

Reporter

Comment 2

•

6 years ago

Attached patch Never print out a leak log. DO NOT LAND (obsolete) (deleted) — Details — Splinter Review

Here's the patch I used to disable leak logging.

Andrew McCreight [:mccr8]

Reporter

Comment 3

•

6 years ago

James, can you take a look at this? Thanks.

Flags: needinfo?(james)

Andrew McCreight [:mccr8]

Reporter

Comment 4

•

6 years ago

I think I have a handle on how to fix this. I think the logic for mozleak_total in statushandler.py needs to be extended to handle this case.

Flags: needinfo?(james)

Andrew McCreight [:mccr8]

Reporter

Updated

•

6 years ago

Assignee: nobody → continuation

Andrew McCreight [:mccr8]

Reporter

Comment 5

•

6 years ago

Attached file Bug 1521191, part 1 - Add method to disable dump statistics. (deleted) — Details

In order to test the test harness's handling of a process failing to
produce a leak log, add a special function that disables the bloat log
output.

Andrew McCreight [:mccr8]

Reporter

Comment 6

•

6 years ago

Attached file Bug 1521191, part 2 - Ensure missing leak logs cause mozharness to fail. (deleted) — Details

Depends on D17534

Andrew McCreight [:mccr8]

Reporter

Comment 7

•

6 years ago

Attached file Bug 1521191, part 3 - Add a function to create a negative leak. (deleted) — Details

This is needed to test the test harness's handling of a negative leak
being reported by the XPCOM leak checker.

Depends on D17535

Andrew McCreight [:mccr8]

Reporter

Comment 8

•

6 years ago

Attached file Bug 1521191, part 4 - Add a test for producing an error if there's a negative leak. (deleted) — Details

Depends on D17536

Andrew McCreight [:mccr8]

Reporter

Updated

•

6 years ago

Attachment #9037691 - Attachment is obsolete: true

Pulsebot

Comment 9

•

6 years ago

Pushed by amccreight@mozilla.com: https://hg.mozilla.org/integration/autoland/rev/d6ce4b187195 part 1 - Add method to disable dump statistics. r=froydnj https://hg.mozilla.org/integration/autoland/rev/38d3da4804d7 part 2 - Ensure missing leak logs cause mozharness to fail. r=ahal https://hg.mozilla.org/integration/autoland/rev/3782d011cc9f part 3 - Add a function to create a negative leak. r=froydnj https://hg.mozilla.org/integration/autoland/rev/138e162d2778 part 4 - Add a test for producing an error if there's a negative leak. r=ahal

Dorel Luca [:dluca]

Comment 10

•

6 years ago

Backed out 4 changesets (bug 1521191) for causing leaks on multiple tests

Log:
https://treeherder.mozilla.org/logviewer.html#/jobs?job_id=224150195&repo=autoland&lineNumber=53008

Push with failures:
https://treeherder.mozilla.org/#/jobs?repo=autoland&resultStatus=testfailed%2Cbusted%2Cexception&classifiedState=unclassified&selectedJob=224150195&searchStr=linux%2Cdebug%2Cweb%2Cplatform%2Ctests%2Ctest-linux32%2Fdebug-web-platform-tests-16%2Cw%28wpt16%29&revision=138e162d2778d37a1b3cc474aade64647bbc1bd6

Backout:
https://hg.mozilla.org/integration/autoland/rev/044a86a4032cf1ab76e4cd43f5d7929fcff0c9a0

Flags: needinfo?(continuation)

Andrew McCreight [:mccr8]

Reporter

Comment 11

•

6 years ago

There's a leak in the webrtc/ directory, which is hitting the "NSS is still alive" fatal assertion, so we crash and don't produce a leak log. That assertion should probably be changed because "crash on leak" is not helpful. Anyways, I'm not sure if this leak is new or I just failed to do a try push with these patches.

Andrew McCreight [:mccr8]

Reporter

Updated

•

6 years ago

Flags: needinfo?(continuation)

Andrew McCreight [:mccr8]

Reporter

Comment 12

•

6 years ago

There's a number of issues here:

I think this WebRTC leak is a regression, so that should get tracked down.
The "NSS_Shutdown failed" crash should be turned into a leak, not a crash.
There needs to be a way to specify in a dir.ini file that there is no leak for a process type, and that needs to be updated on import. I don't see anything like that right now, but maybe I'm just missing it. I meant to look at this before I landed this, but I forgot about it.

Andrew McCreight [:mccr8]

Reporter

Comment 13

•

6 years ago

I found my try push for these patches:
https://treeherder.mozilla.org/#/jobs?repo=try&revision=42f5f6be042d6a915f37bc3943cf3d11ee9411ba

It does contain the NSS_Shutdown crash, but it is still somehow green. I think this implies that:

The WebRTC leak isn't a regression.
My patch somehow doesn't actually detect a missing leak log for the WebRTC leak.
There's some other crash that has started happening?

The logs don't seem to contain the pids of things that crash which makes it hard to figure out what is going on.

In fact, with my try push, there are a bunch of other crashes in WebRTC that show up in the "failure summary" but didn't cause the test to go orange, which is peculiar:
PROCESS-CRASH | /webrtc/RTCPeerConnection-removeTrack.https.html | application crashed [@ libxul.so + 0xd1a42f]

Andrew McCreight [:mccr8]

Reporter

Comment 14

•

6 years ago

Oh, I see, bug 1519862 recently landed and made the leak checker ignore missing logs for content processes. The orange in my attempted landed was from a main process crash. That at least explains that.

Andrew McCreight [:mccr8]

Reporter

Updated

•

6 years ago

Blocks: 1517309

Andrew McCreight [:mccr8]

Reporter

Updated

•

6 years ago

Depends on: 1523445

Liz Henry (:lizzard Please n-i to RyanVM, jcristau, or pascal)

Updated

•

6 years ago

status-firefox65: affected → fix-optional

status-firefox66: affected → fix-optional

status-firefox67: --- → affected

Liz Henry (:lizzard Please n-i to RyanVM, jcristau, or pascal)

Updated

•

6 years ago

Whiteboard: [MemShrink]

Eric Rahm [:erahm]

Updated

•

6 years ago

Whiteboard: [MemShrink] → [MemShrink:P2]

Andrew Halberstadt [:ahal]

Updated

•

6 years ago

Priority: -- → P2

Pascal Chevrel:pascalc (PTO until August 21)

Updated

•

6 years ago

status-firefox65: fix-optional → wontfix

status-firefox66: fix-optional → wontfix

status-firefox67: affected → wontfix

status-firefox68: --- → affected

Andrew McCreight [:mccr8]

Reporter

Comment 16

•

6 years ago

I'm not going to be able to get to this any time soon.

Assignee: continuation → nobody

Liz Henry (:lizzard Please n-i to RyanVM, jcristau, or pascal)

Updated

•

5 years ago

status-firefox68: affected → wontfix

status-firefox69: --- → affected

Liz Henry (:lizzard Please n-i to RyanVM, jcristau, or pascal)

Comment 17

•

5 years ago

Marking fix-optional across all versions to remove this from weekly regression triage.

status-firefox69: affected → fix-optional

status-firefox70: --- → fix-optional

BMO Automation

Updated

•

2 years ago

Severity: normal → S3

You need to log in before you can comment on or make changes to this bug.