Open
Bug 1009060
Opened 11 years ago
Updated 2 years ago
Limit depth of lateWrites stacks in Telemetry payloads
Categories
(Toolkit :: Telemetry, defect, P4)
Tracking
()
REOPENED
People
(Reporter: mreid, Unassigned)
References
(Depends on 1 open bug, Blocks 2 open bugs)
Details
Attachments
(2 files)
It appears that some of the chromehang reports contain a really huge stack trace (including millions of entries).
We should determine if this is due to a bug in the chromehang code, or if these are legitimate stacks. If the latter, we should probably truncate them before reporting.
I've attached one example payload, let me know if you need more info.
Reporter | ||
Comment 1•11 years ago
|
||
Correction - it's the lateWrites part (not chromeHangs) that has a lot of large stacks.
Summary: Overly large chromehang stacks in Telemetry payloads → Overly large lateWrites stacks in Telemetry payloads
Comment 2•11 years ago
|
||
This might be a bug in a 3rd party security tool.. PGHook.dll is part of "Avecto Privilege Guard" http://www.avecto.com/products/overview/
These are the top & bottom of the hang stack. Unfortunately the symbols for this Nightly (from April 7th) are no longer available so the xul frames can't be symbolicated.
0xca384 (in xul.pdb)
0x797bc (in xul.pdb)
0x79770 (in xul.pdb)
FlushFileBuffers (in wkernelbase.pdb)
FlushFileBuffersImplementation (in wkernel32.pdb)
0x175d7 (in PGHook.pdb)
0x1ffff (in PGHook.pdb)
MoveFileWithProgressTransactedW (in wkernel32.pdb)
MoveFileWithProgressTransactedA (in wkernel32.pdb)
MoveFileWithProgressA (in wkernel32.pdb)
MoveFileA (in wkernel32.pdb)
0x8ed1 (in nss3.pdb)
0xc16dd6 (in xul.pdb)
0xca384 (in xul.pdb)
0x797bc (in xul.pdb)
0x79770 (in xul.pdb)
FlushFileBuffers (in wkernelbase.pdb)
FlushFileBuffersImplementation (in wkernel32.pdb)
0x175d7 (in PGHook.pdb)
0x1ffff (in PGHook.pdb)
MoveFileWithProgressTransactedW (in wkernel32.pdb)
MoveFileWithProgressTransactedA (in wkernel32.pdb)
MoveFileWithProgressA (in wkernel32.pdb)
MoveFileA (in wkernel32.pdb)
0x8ed1 (in nss3.pdb)
0xc16dd6 (in xul.pdb)
0xca384 (in xul.pdb)
.. snip ...
0xca384 (in xul.pdb)
0x797bc (in xul.pdb)
0x79770 (in xul.pdb)
FlushFileBuffers (in wkernelbase.pdb)
FlushFileBuffersImplementation (in wkernel32.pdb)
0x175d7 (in PGHook.pdb)
0x1ffff (in PGHook.pdb)
MoveFileWithProgressTransactedW (in wkernel32.pdb)
MoveFileWithProgressTransactedA (in wkernel32.pdb)
MoveFileWithProgressA (in wkernel32.pdb)
MoveFileA (in wkernel32.pdb)
0x8ed1 (in nss3.pdb)
0xc16dd6 (in xul.pdb)
0xca384 (in xul.pdb)
0x797bc (in xul.pdb)
0x79770 (in xul.pdb)
FlushFileBuffers (in wkernelbase.pdb)
FlushFileBuffersImplementation (in wkernel32.pdb)
0x175d7 (in PGHook.pdb)
0x1ffff (in PGHook.pdb)
MoveFileWithProgressTransactedW (in wkernel32.pdb)
MoveFileWithProgressTransactedA (in wkernel32.pdb)
MoveFileWithProgressA (in wkernel32.pdb)
MoveFileA (in wkernel32.pdb)
0x8ed1 (in nss3.pdb)
0xc16dd6 (in xul.pdb)
0xca384 (in xul.pdb)
0x797bc (in xul.pdb)
0x79770 (in xul.pdb)
FlushFileBuffers (in wkernelbase.pdb)
FlushFileBuffersImplementation (in wkernel32.pdb)
0x175d7 (in PGHook.pdb)
0x1ffff (in PGHook.pdb)
MoveFileWithProgressTransactedW (in wkernel32.pdb)
MoveFileWithProgressTransactedA (in wkernel32.pdb)
MoveFileWithProgressA (in wkernel32.pdb)
MoveFileA (in wkernel32.pdb)
0x8ed1 (in nss3.pdb)
0x913db1 (in xul.pdb)
0x2a058e (in xul.pdb)
0x1753 (in firefox.pdb)
0x1a1b (in firefox.pdb)
0x1b26 (in firefox.pdb)
0x242f (in firefox.pdb)
BaseThreadInitThunk (in wkernel32.pdb)
__RtlUserThreadStart (in wntdll.pdb)
_RtlUserThreadStart (in wntdll.pdb)
Comment 4•11 years ago
|
||
Given the limited information that we have there, it looks like IOInterposer caught the FlushFileBuffers call but something in the handler caused us to repeatedly reenter.
There is a thread-local variable in IOInterposer that is supposed to guard against this kind of reentry. For some reason that didn't catch in this case. Furthermore, I agree that this PGHook module is not helping. It would sure be nice to have symbols. :-(
Mark, do we have any of these that are new enough to symbolicate?
Flags: needinfo?(aklotz) → needinfo?(mreid)
Comment 5•11 years ago
|
||
I traced the MoveFileA call to PR_Rename and then to http://dxr.mozilla.org/mozilla-central/source/xpcom/build/LateWriteChecks.cpp#208 ; I think that gives me enough to look further into this.
Comment 6•11 years ago
|
||
(In reply to Aaron Klotz [:aklotz] from comment #4)
> There is a thread-local variable in IOInterposer that is supposed to guard
> against this kind of reentry. For some reason that didn't catch in this
> case. Furthermore, I agree that this PGHook module is not helping. It would
> sure be nice to have symbols. :-(
I checked to see when the reentry variable was added; I checked it in on April 8, which would make sense given the timing of the stack in this bug. Mark, I'm amending my needinfo request: Have you seen anything like this appear since April 8?
Flags: needinfo?(mreid)
Comment 7•11 years ago
|
||
(In reply to Aaron Klotz [:aklotz] from comment #6)
> Mark, I'm amending my needinfo request: Have you seen anything like this
> appear since April 8?
Flags: needinfo?(mreid)
Reporter | ||
Comment 8•11 years ago
|
||
I see one from build id 20140409030203, but nothing since then.
Flags: needinfo?(mreid)
Comment 9•11 years ago
|
||
(In reply to Mark Reid [:mreid] from comment #8)
> I see one from build id 20140409030203, but nothing since then.
I don't think that bug 913653 made it into that build. I doubt that we'll see anything like this again. If we do, of course we can always reopen.
Status: NEW → RESOLVED
Closed: 11 years ago
Resolution: --- → INVALID
Reporter | ||
Updated•11 years ago
|
Status: RESOLVED → REOPENED
Resolution: INVALID → ---
Reporter | ||
Comment 10•11 years ago
|
||
Here's a newer one (from 20140427).
Comment 11•7 years ago
|
||
We still have this in the payload, although i don't know if anyone uses it actively.
For now we can just limit the stack depth here to be safe.
ni?me to set this up as a mentored bug.
Flags: needinfo?(gfritzsche)
Summary: Overly large lateWrites stacks in Telemetry payloads → Limit depth of lateWrites stacks in Telemetry payloads
Updated•6 years ago
|
Comment 12•6 years ago
|
||
I filed bug 1469146 to see investigate a bunch of performance data that's historically in Telemetry.
Maybe we can just remove this.
Depends on: 1469146
Flags: needinfo?(gfritzsche)
Updated•6 years ago
|
Priority: -- → P4
Updated•2 years ago
|
Severity: normal → S3
You need to log in
before you can comment on or make changes to this bug.
Description
•