Closed Bug 1752417 Opened 3 years ago Closed 3 years ago

Setting a telemetry_mirror on labeled_counter glean probes leaks nsStringBuffer instances at shutdown

Categories

(Toolkit :: Telemetry, defect, P1)

defect

Tracking

()

RESOLVED FIXED
98 Branch
Tracking Status
firefox98 --- fixed

People

(Reporter: florian, Assigned: chutten)

References

Details

(Whiteboard: [telemetry:fog:m?])

Attachments

(1 file)

TEST-UNEXPECTED-FAIL | leakcheck | default 288 bytes leaked (nsStringBuffer)
TEST-UNEXPECTED-FAIL | leakcheck | tab 16 bytes leaked (nsStringBuffer)
...

There are "XPCOM object nsStringBuffer released from static ctor/dtor" warnings in the terminal. If I set the MOZ_FATAL_STATIC_XPCOM_CTORS_DTORS environment variable, the stack I get is:

 Hit MOZ_CRASH(XPCOM object nsStringBuffer released from static ctor/dtor) at <src>/xpcom/base/nsTraceRefcnt.cpp:204
Initializing stack-fixing for the first stack frame, this may take a while...
#01: AssertActivityIsLegal(char const*, char const*) [<src>/xpcom/base/nsTraceRefcnt.cpp:204]
#02: NS_LogRelease [<src>/xpcom/base/nsTraceRefcnt.cpp:937]
#03: nsStringBuffer::Release() [<src>/xpcom/string/nsSubstring.cpp:185]
#04: PLDHashTable::~PLDHashTable() [<src>/xpcom/ds/PLDHashTable.cpp:293]
#05: mozilla::UniquePtr<nsBaseHashtable<nsIntegralHashKey<unsigned int, 0>, mozilla::Tuple<mozilla::Telemetry::ScalarID, nsTString<char16_t> >, mozilla::Tuple<mozilla::Telemetry::ScalarID, nsTString<char16_t> >, nsDefaultConverter<mozilla::Tuple<mozilla::Telemetry::ScalarID, nsTString<char16_t> >, mozilla::Tuple<mozilla::Telemetry::ScalarID, nsTString<char16_t> > > >, mozilla::DefaultDelete<nsBaseHashtable<nsIntegralHashKey<unsigned int, 0>, mozilla::Tuple<mozilla::Telemetry::ScalarID, nsTString<char16_t> >, mozilla::Tuple<mozilla::Telemetry::ScalarID, nsTString<char16_t> >, nsDefaultConverter<mozilla::Tuple<mozilla::Telemetry::ScalarID, nsTString<char16_t> >, mozilla::Tuple<mozilla::Telemetry::ScalarID, nsTString<char16_t> > > > > >::reset(nsBaseHashtable<nsIntegralHashKey<unsigned int, 0>, mozilla::Tuple<mozilla::Telemetry::ScalarID, nsTString<char16_t> >, mozilla::Tuple<mozilla::Telemetry::ScalarID, nsTString<char16_t> >, nsDefaultConverter<mozilla::Tuple<mozilla::Telemetry::ScalarID, nsTString<char16_t> >, mozilla::Tuple<mozilla::Telemetry::ScalarID, nsTString<char16_t> > > >*) [<src>/obj-dbg/dist/include/mozilla/UniquePtr.h:305]
#06: ??? [/lib/x86_64-linux-gnu/libc.so.6 + 0x49a27]
#07: on_exit [/lib/x86_64-linux-gnu/libc.so.6 + 0x49be0]
#08: __libc_start_main [/lib/x86_64-linux-gnu/libc.so.6 + 0x270ba]
#09: ??? [<src>/obj-dbg/dist/bin/firefox + 0x40de9]

This looks a lot like sLabeledMirrors not being cleaned up at shutdown.

It most certainly isn't being cleaned, and I really should know better given I've done this before. There are also two other maps generated very similarly in GIFFT that could be cleared as well (they're only not leaking because the timing_distribution and timespan metrics all clean up after themselves if they're used correctly).

Should be a fairly quick matter to

  1. Reproduce the issue locally
  2. Write and apply the cleanup-on-shutdown patch
  3. Ensure it fixes it locally
  4. Update the codegen pytests' expected files
  5. Pop it up for review

(especially since I've already done Step 2 and am most of the way through the build for Step 1 at time of writing)

Assignee: nobody → chutten
Severity: -- → S3
Status: NEW → ASSIGNED
Priority: -- → P1
Whiteboard: [telemetry:fog:m?]

This wasn't caught by test_GIFFT because xpcshell doesn't run a leakcheck.

Backed out for failures on browser_labeled_gifft.js

[task 2022-01-28T18:22:13.554Z] 18:22:13     INFO - TEST-PASS | toolkit/components/glean/tests/browser/browser_labeled_gifft.js | . Can't get the value when you're error'd - 
[task 2022-01-28T18:22:13.554Z] 18:22:13     INFO - Buffered messages finished
[task 2022-01-28T18:22:13.554Z] 18:22:13     INFO - TEST-UNEXPECTED-FAIL | toolkit/components/glean/tests/browser/browser_labeled_gifft.js | {"a_label":1,"another_label":2,"InvalidLabel":3} deepEqual "undefined" - JS frame :: chrome://mochitests/content/browser/toolkit/components/glean/tests/browser/browser_labeled_gifft.js :: <TOP_LEVEL> :: line 37
[task 2022-01-28T18:22:13.554Z] 18:22:13     INFO - Stack trace:
[task 2022-01-28T18:22:13.554Z] 18:22:13     INFO - chrome://mochitests/content/browser/toolkit/components/glean/tests/browser/browser_labeled_gifft.js:null:37
[task 2022-01-28T18:22:13.554Z] 18:22:13     INFO - chrome://mochikit/content/browser-test.js:Tester_execTest/<:1130
[task 2022-01-28T18:22:13.554Z] 18:22:13     INFO - chrome://mochikit/content/browser-test.js:Tester_execTest:1170
[task 2022-01-28T18:22:13.554Z] 18:22:13     INFO - chrome://mochikit/content/browser-test.js:nextTest/<:978
[task 2022-01-28T18:22:13.554Z] 18:22:13     INFO - chrome://mochikit/content/tests/SimpleTest/SimpleTest.js:SimpleTest.waitForFocus/<:1041
[task 2022-01-28T18:22:13.554Z] 18:22:13     INFO - Leaving test bound 
[task 2022-01-28T18:22:13.554Z] 18:22:13     INFO - Console message: [JavaScript Error: "Metric had 1 error(s) of type invalid_label!"]
[task 2022-01-28T18:22:13.554Z] 18:22:13     INFO - GECKO(3409) | MEMORY STAT | vsize 11420MB | residentFast 501MB | heapAllocated 223MB
[task 2022-01-28T18:22:13.556Z] 18:22:13     INFO - TEST-OK | toolkit/components/glean/tests/browser/browser_labeled_gifft.js | took 70ms
Flags: needinfo?(chutten)

Well isn't this just hilarious.

Turns out the patch that landed isn't the patch I wrote. The patch I wrote added the line

release_channel_collection: 'opt-out'

to telemetry.test.keyed_unsigned_int

The patch that landed added it to telemetry.test.keyed_boolean_kind.

This is probably due to my patch not being based on latest tip of tree. I've fixed that and will repush.

Flags: needinfo?(chutten)
Regressions: 1752650
Status: ASSIGNED → RESOLVED
Closed: 3 years ago
Resolution: --- → FIXED
Target Milestone: --- → 98 Branch
Regressions: 1763593
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Created:
Updated:
Size: