Open Bug 1733812 Opened 3 years ago Updated 2 years ago

Weird profiler label frame order in these Windows stacks

Categories

(Core :: Gecko Profiler, defect, P2)

All
Windows
defect

Tracking

()

People

(Reporter: mstange, Unassigned)

References

(Blocks 1 open bug)

Details

(Whiteboard: [fxp])

Profile: https://share.firefox.dev/3D4s4Jw (captured with Firefox Beta 93 on Windows)

This profile has stacks with some strange orders. The profiler label frames I've annotated below should be moved up into the annotated empty rows:

        mozilla::ViewportFrame::Reflow(nsPresContext*, mozilla::ReflowOutput&, mozilla::ReflowInput const&, nsReflowStatus&)
    *
        mozilla::PresShell::DoReflow(nsIFrame*, bool, mozilla::OverflowChangedTracker*)
        mozilla::PresShell::ProcessReflowCommands(bool)
    *
        mozilla::PresShell::DoFlushPendingNotifications(mozilla::ChangesToFlush)
        mozilla::dom::Document::FlushPendingNotifications(mozilla::ChangesToFlush)
        nsIContent::GetPrimaryFrame(mozilla::FlushType)
        nsGenericHTMLElement::GetOffsetRect(mozilla::gfx::IntRectTyped<mozilla::CSSPixel>&)
    *
        mozilla::dom::HTMLElement_Binding::get_offsetTop(JSContext*, JS::Handle<JSObject *>, void*, JSJitGetterCallArgs)
        0x2c0e33c3170
  --->  Reflow https://<URL>
  --->  PresShell::DoFlushPendingNotifications Layout
  --->  get HTMLElement.offsetTop
        LichessAnalyse</Si<
        a
        An/<
        postpatch
        js::RunScript
        C
        C/<
        C
        C/<
        C
        C/<
        C
        u/<
        Vl/lichess.analysis<
        ye/<
        p
        l/t._handleKey
        y
        js::InternalCallOrConstruct(JSContext*, JS::CallArgs const&, js::MaybeConstruct, js::CallReason)
        js::RunScript
        mozilla::dom::EventListener::HandleEvent(mozilla::dom::BindingCallContext&, JS::Handle<JS::Value>, mozilla::dom::Event&, mozilla::ErrorResult&)

Do you think it's a separate issue from bug 1733810 that you filed just before?
Though in your example, labels seem to jump quite a few frames, not just one! 🤔

In bug 1733810 comment 1 (and 2), my suspicions were:

  • Labels use the address when the AutoProfilerLabel object is stored, the compiler could pick "optimized" stack addresses that don't match the source code flow in the function.
    Idea: Use the current stack pointer instead.
    Extra ideas: Add barriers to guarantee label ordering within the source code (but at added overhead costs, even when not running the profiler). Capture __func__?
  • Stack-walking may give us mismatched instruction & stack pointers. Or we're interpreting them incorrectly? To be debugged.

One thing to try, if possible: Debug during one stack walk, take notes about what the debugger tells us about the stack and where label objects live, and then compare with the profiler output.

Blocks: 1307215
Severity: -- → S2
Priority: -- → P2

(In reply to Gerald Squelart [:gerald] (he/him) from comment #1)

Though in your example, labels seem to jump quite a few frames, not just one! 🤔

Yeah, that's what made me think it's a different bug. Also the way the label frames are grouped up, and the fact that there is JIT code on the stack and the fact that this profile was captured on Windows (so the EnterJIT-stack-recovery might be involved in the bug).

Whiteboard: [fxp]

Another example, again from Windows: https://share.firefox.dev/3ZECDy8

You need to log in before you can comment on or make changes to this bug.