Open Bug 1544621 Opened 5 years ago Updated 2 years ago

[meta] Web platform tests that frequently/always crash running on GeckoView

Categories

(Testing :: web-platform-tests, defect)

Version 3
Unspecified
Android
defect

Tracking

(Not tracked)

People

(Reporter: KWierso, Unassigned)

References

(Depends on 8 open bugs, Blocks 1 open bug)

Details

(Keywords: meta, Whiteboard: [geckoview:p2])

I'm seeing a number of tests crash frequently now that they're running against GeckoView. I can update the expectation data for them to green up the test runs, but in the long run it would be better to track actual fixes for the crashes.

I'll file individual bugs for each crash and link them here for tracking. Here's the full list so far:

/element-timing/observe-elementtiming.html.ini
/feature-policy/payment-allowed-by-feature-policy-attribute-redirect-on-load.https.sub.html.ini
/feature-policy/payment-allowed-by-feature-policy.https.sub.html
/feature-policy/payment-default-feature-policy.https.sub.html
/feature-policy/payment-disabled-by-feature-policy.https.sub.html
/FileAPI/url/url-in-tags.window.html
/html/browsers/browsing-the-web/history-traversal/browsing_context_name_cross_origin.html
/html/browsers/browsing-the-web/unloading-documents/beforeunload-canceling.html
/html/browsers/the-window-object/apis-for-creating-and-navigating-browsing-contexts-by-name/open-features-tokenization-screenx-screeny.html
/html/browsers/the-window-object/window-open-noopener.html?_self
/html/browsers/windows/browsing-context-names/choose-_parent-001.html
/html/browsers/windows/browsing-context-names/choose-_top-002.html
/html/browsers/windows/targeting-with-embedded-null-in-target.html
/html/semantics/links/links-created-by-a-and-area-elements/target_blank_implicit_noopener_base.html
/html/webappapis/dynamic-markup-insertion/opening-the-input-stream/tasks.window.js.ini
/payment-request/allowpaymentrequest/basic.https.html
/payment-request/allowpaymentrequest/removing-allowpaymentrequest.https.sub.html
/payment-request/historical.https.html
/payment-request/interfaces.https.html
/payment-request/onpaymentmenthodchange-attribute.https.html
/payment-request/payment-is-showing.https.html.ini
/payment-request/payment-request-canmakepayment-method-protection.https.html
/payment-request/payment-request-ctor-currency-code-checks.https.html
/payment-request/payment-request-hasenrolledinstrument-method-protection.https.html
/payment-request/payment-request-id-attribute.https.html
/payment-request/payment-request-onshippingoptionchange-attribute.https.html
/payment-request/payment-request-shippingOption-attribute.https.html
/payment-request/payment-request-show-method.https.html
/payment-request/payment-response/onpayerdetailchange-attribute.https.html | expected ERROR
/payment-request/PaymentRequestUpdateEvent/updatewith-method.https.html
/WebCryptoAPI/derive_bits_keys/ecdh_bits.https.any.js.ini
/webrtc/RTCPeerConnection-setRemoteDescription-tracks.https.html.ini
/webrtc/RTCRtpTransceiver.https.html
/websockets/unload-a-document/002.html
/webxr/xrDevice_requestSession_immersive.https.html.ini
/webxr/xrFrame_getPose.https.html
/webxr/xrFrame_lifetime.https.html.ini
/webxr/xrRigidTransform_constructor.https.html.ini
/webxr/xrRigidTransform_matrix.https.html
/webxr/xrSession_cancelAnimationFrame.https.html.ini
/webxr/xrSession_cancelAnimationFrame_invalidhandle.https.html.ini
/webxr/xrSession_end.https.html.ini
/webxr/xrSession_identity_referenceSpace.https.html.ini
/webxr/xrSession_requestAnimationFrame_callback_calls.https.html.ini
/webxr/xrSession_requestAnimationFrame_data_valid.https.html.ini
/webxr/xrSession_requestAnimationFrame_getViewerPose.https.html.ini
/webxr/xrSession_requestReferenceSpace.https.html.ini
/workers/data-url.html
/workers/modules/dedicated-worker-import.any.js.ini
/worklets/animation-worklet-service-worker-interception.https.html
/worklets/audio-worklet-service-worker-interception.https.html
/worklets/layout-worklet-service-worker-interception.https.html
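
For reference, updating the expectation data for one of these tests would mean adding a metadata stub roughly like the one this sketch prints. The helper name crash_expectation is hypothetical, and scoping the expectation with os == "android" is an assumption; adjust the condition to whichever configuration actually crashes.

```python
import posixpath


def crash_expectation(test_path, condition='os == "android"'):
    """Render a wpt metadata stub that marks a test as an expected CRASH.

    Hypothetical helper for illustration. The default condition is an
    assumption, not something taken from the existing metadata files.
    """
    # The section heading is the test file name, including any query string.
    name = posixpath.basename(test_path)
    return "[{}]\n  expected:\n    if {}: CRASH\n".format(name, condition)


print(crash_expectation("/workers/data-url.html"))
```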

Nick, I'm not seeing any crash stacks in either the test log or the logcat files, for any of these crashes. That's going to make things difficult for people to fix. Am I just not looking in the right places to find them?

(As an aside, the timestamps in the test log files and in the logcat files seem to be offset by an hour, which made lining up the crash timelines between the two files a bit confusing. Any chance that the "No timezone override file found: /data/misc/zoneinfo/current/icu/icu_tzdata.dat" message that appears in the logcat files around the time of the crashes has something to do with that?)
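
Until the offset is explained, a small helper can shift logcat timestamps so they line up with the test log. This is only a sketch: the one-hour offset is what was observed here, the direction of the shift is an assumption (pass a negative value if it goes the other way), and the year has to be supplied because logcat lines do not carry one.

```python
from datetime import datetime, timedelta


def shift_logcat_time(stamp, offset_hours=1, year=2019):
    """Parse a logcat 'MM-DD HH:MM:SS.mmm' stamp and shift it by offset_hours."""
    # Prepend the year so the date is complete before parsing.
    parsed = datetime.strptime("{}-{}".format(year, stamp), "%Y-%m-%d %H:%M:%S.%f")
    return parsed + timedelta(hours=offset_hours)


print(shift_logcat_time("04-11 22:22:06.740"))
```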

Flags: needinfo?(nalexander)

As an example, here's one of the test logs and here's the logcat file backing it. If you ctrl-f in the logcat for 22:38 it should bring you down to around the time the crash occurs.

(In reply to Wes Kocher (:KWierso) from comment #2)

As an example, here's one of the test logs and here's the logcat file backing it. If you ctrl-f in the logcat for 22:38 it should bring you down to around the time the crash occurs.

In this case there was an actual crash and a minidump was likely generated:

04-11 22:22:06.740 7805 7820 W google-breakpad: ExceptionHandler::GenerateDump cloned child
04-11 22:22:06.740 7805 7820 W google-breakpad: 7919
04-11 22:22:06.740 7805 7820 W google-breakpad:
04-11 22:22:06.740 7805 7820 W google-breakpad: ExceptionHandler::SendContinueSignalToChild sent continue signal to child
04-11 22:22:06.740 7919 7820 W google-breakpad: ExceptionHandler::WaitForContinueSignal waiting for continue signal...

but I don't see any sign that wpt is pulling the minidumps from the device or running minidump_stackwalk.
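
To find these spots without scrolling, something like the following could scan a logcat file for the breakpad GenerateDump message. It is a rough sketch that assumes the line layout shown above (date, time, pid, tid, level, tag); the function name find_minidump_events is made up for illustration.

```python
import re

# Logcat line layout as seen above: date, time, pid, tid, level, tag, message.
LOGCAT_LINE = re.compile(
    r"^(?P<date>\d\d-\d\d) (?P<time>[\d:.]+)\s+(?P<pid>\d+)\s+(?P<tid>\d+)\s+"
    r"(?P<level>[A-Z]) (?P<tag>[^:]+):\s?(?P<msg>.*)$"
)


def find_minidump_events(lines):
    """Return (time, pid) for each line where breakpad reports GenerateDump."""
    events = []
    for line in lines:
        match = LOGCAT_LINE.match(line)
        if not match:
            continue
        is_breakpad = match.group("tag").strip() == "google-breakpad"
        if is_breakpad and "GenerateDump" in match.group("msg"):
            events.append((match.group("time"), int(match.group("pid"))))
    return events


sample = [
    "04-11 22:22:06.740 7805 7820 W google-breakpad: ExceptionHandler::GenerateDump cloned child",
    "04-11 22:22:06.740 7805 7820 W google-breakpad: 7919",
]
print(find_minidump_events(sample))
```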

Is it possible that crash reporting was never implemented for Android? Consider:

https://searchfox.org/mozilla-central/rev/d33d470140ce3f9426af523eaa8ecfa83476c806/testing/web-platform/tests/tools/wptrunner/wptrunner/browsers/base.py#173
https://searchfox.org/mozilla-central/rev/d33d470140ce3f9426af523eaa8ecfa83476c806/testing/web-platform/tests/tools/wptrunner/wptrunner/browsers/firefox.py#432

but I see no check_crashes implementation for browsers/fennec.py.

Note that, for Android, minidumps need to be retrieved from the device first, like

https://searchfox.org/mozilla-central/rev/d33d470140ce3f9426af523eaa8ecfa83476c806/testing/gtest/remotegtests.py#153
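
As a rough illustration of that retrieval step, here is a sketch that shells out to adb pull rather than using the device helper the gtest harness relies on. The remote path is a placeholder, and adb_pull_command and fetch_minidumps are hypothetical names, not existing harness functions.

```python
import glob
import os
import subprocess


def adb_pull_command(remote_dir, local_dir, serial=None):
    """Build the adb command line that copies remote_dir into local_dir."""
    cmd = ["adb"]
    if serial:
        # Target a specific device or emulator when several are attached.
        cmd += ["-s", serial]
    return cmd + ["pull", remote_dir, local_dir]


def fetch_minidumps(remote_dir, local_dir, serial=None):
    """Pull the remote directory and return the .dmp files found locally."""
    subprocess.check_call(adb_pull_command(remote_dir, local_dir, serial))
    pattern = os.path.join(local_dir, "**", "*.dmp")
    return sorted(glob.glob(pattern, recursive=True))


# Placeholder remote path; the real location depends on the profile in use.
print(adb_pull_command("/sdcard/profile/minidumps", "/tmp/dumps"))
```

The returned .dmp paths would then be handed to whatever runs minidump_stackwalk.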

So, if I use this patch:

# HG changeset patch
# User Wes Kocher <wkocher@mozilla.com>
# Date 1555448272 25200
#      Tue Apr 16 13:57:52 2019 -0700
# Node ID 8dac04b3ae73c609617008c5fb1541d3e5751cef
# Parent  7099ef4702173876665b684219aacd20b93a7004
Add crash checking

diff --git a/testing/web-platform/tests/tools/wptrunner/wptrunner/browsers/fennec.py b/testing/web-platform/tests/tools/wptrunner/wptrunner/browsers/fennec.py
--- a/testing/web-platform/tests/tools/wptrunner/wptrunner/browsers/fennec.py
+++ b/testing/web-platform/tests/tools/wptrunner/wptrunner/browsers/fennec.py
@@ -130,6 +130,7 @@ class FennecBrowser(FirefoxBrowser):
             self.used_ports.add(self.marionette_port)
 
         env = {}
+        #env["MINIDUMP_STACKWALK"] = ???
         env["MOZ_CRASHREPORTER"] = "1"
         env["MOZ_CRASHREPORTER_SHUTDOWN"] = "1"
         env["MOZ_DISABLE_NONLOCAL_CONNECTIONS"] = "1"
@@ -221,3 +222,6 @@ class FennecBrowser(FirefoxBrowser):
             # browser to shut down. This allows the leak log to be written
             self.runner.stop()
         self.logger.debug("stopped")
+
+    def check_crash(self, process, test):
+        return self.runner.check_for_crashes()

I see this in the test output:

 0:52.34 CRASH: pid:98409. Test:mozrunner-startup. Minidump anaylsed:False. Signature:[None]
Crash dump filename: /var/folders/pr/6b26m2pd1zd33ft5v57yw7wr0000gp/T/tmpb7G8ZL/60f549e1-a8e1-39e5-8dcb-753b9ba3b74d-browser.dmp
MINIDUMP_STACKWALK not set, can't process dump.

 0:52.34 CRASH: pid:98409. Test:mozrunner-startup. Minidump anaylsed:False. Signature:[None]
Crash dump filename: /var/folders/pr/6b26m2pd1zd33ft5v57yw7wr0000gp/T/tmpb7G8ZL/60f549e1-a8e1-39e5-8dcb-753b9ba3b74d.dmp
MINIDUMP_STACKWALK not set, can't process dump.

I'm not sure exactly what I need to set MINIDUMP_STACKWALK to be. Looking elsewhere in the codebase, I see it set to something like [DIR.tooltool]/breakpad-tools/minidump_stackwalk. Can I use that here, or is that just in automation?
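
One way to avoid hard-coding the path would be to prefer whatever is already in the environment and fall back to a list of candidate locations. This is only a sketch: resolve_minidump_stackwalk is a hypothetical helper, and the candidate paths would have to come from wherever the tooltool download lands.

```python
import os


def resolve_minidump_stackwalk(env, candidates=()):
    """Return the stackwalk path from env, else the first executable candidate."""
    path = env.get("MINIDUMP_STACKWALK")
    if path:
        return path
    for candidate in candidates:
        # Only accept candidates that exist and can be executed.
        if os.path.isfile(candidate) and os.access(candidate, os.X_OK):
            return candidate
    return None


env = dict(os.environ)
stackwalk = resolve_minidump_stackwalk(env)
if stackwalk:
    env["MINIDUMP_STACKWALK"] = stackwalk
```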

Flags: needinfo?(gbrown)

And here's what one of those failures looks like in CI https://treeherder.mozilla.org/logviewer.html#/jobs?job_id=240843703&repo=try&lineNumber=6114

If you look in the Job Details tab for this job there's a bunch of new artifacts uploaded. Still doesn't process the crash dumps because no MINIDUMP_STACKWALK, though.

We can probably just disable all the payments tests. We're not implementing payments, so it's probably not worth wasting cycles on them.

(In reply to Wes Kocher (:KWierso) from comment #6)

If you look in the Job Details tab for this job there's a bunch of new artifacts uploaded. Still doesn't process the crash dumps because no MINIDUMP_STACKWALK, though.

I would do something like:

https://treeherder.mozilla.org/#/jobs?repo=try&tier=1%2C2%2C3&revision=12352f8ec5ac6e29a2ab35fa9899276185a44b36

Flags: needinfo?(gbrown)

So that ends up doing stuff like this: https://treeherder.mozilla.org/logviewer.html#/jobs?job_id=241013581&repo=try&lineNumber=6094

So we see the crash stacks now. I'll run the rest of the wpt chunks now to see if the other crashes show similar stacks.

OS: Unspecified → Android
Summary: [meta] Web platform tests that frequently/always crash running on Geckoview → [meta] Web platform tests that frequently/always crash running on GeckoView
Whiteboard: [geckoview:p2]
Depends on: 1551725

Looks like the bits about checking for crashes were answered.

Flags: needinfo?(nalexander)
Severity: normal → S3