upload_file_minidump contents are the multipart header
Categories
(Socorro :: Antenna, defect, P1)
Tracking
(Not tracked)
People
(Reporter: willkg, Assigned: willkg)
Attachments
(1 file, deleted: text/x-github-pull-request)
We rewrote extract_payload to use Falcon's multipart/form-data handling in bug #1562641. I fixed a few issues that popped up in stage, but overall it looked good. Then we deployed that code to production. We fixed a few more things as they came up.
Now I'm seeing a number of incoming crashes where the upload_file_minidump content is something like this:
-----------------------------640A91CB49F2DAC2
Content-Disposition: form-data; name=CrashType
fatal native crash
Examples:
- bp-fe82d92c-fe80-46b7-b005-27c9d0220303
- bp-e3f0a12e-014f-4a81-9000-ae3bc0220303
- bp-b7a28291-b416-4ec1-9c1a-2a61c0220303
This bug covers figuring out what's going on.
Comment 1 (Assignee)•3 years ago
Assuming HeaderMismatch in the signature is always an indicator of the issue, it looks like it almost entirely affects Fenix crash reports, with a much smaller number from Focus:
$ supersearchfacet --start-date='2022-02-23' --end-date='2022-03-02' --period=daily \
--_facets=product --signature='HeaderMismatch' --format=markdown
date | -- | Fenix | Firefox | Focus |
---|---|---|---|---|
2022-02-23 00:00:00 | 0 | 0 | 0 | 0 |
2022-02-24 00:00:00 | 0 | 3 | 0 | 0 |
2022-02-25 00:00:00 | 0 | 1 | 0 | 0 |
2022-02-26 00:00:00 | 0 | 1 | 0 | 0 |
2022-02-27 00:00:00 | 0 | 1 | 0 | 0 |
2022-02-28 00:00:00 | 0 | 9 | 1 | 0 |
2022-03-01 00:00:00 | 0 | 5 | 0 | 1 |
2022-03-02 00:00:00 | 0 | 1800 | 0 | 57 |
If we look at signatures that have EMPTY in them (denoting some problem with the minidump), we see this:
$ supersearchfacet --start-date='2022-02-23' --end-date='2022-03-02' --period=daily \
--_facets=product --signature='EMPTY' --format=markdown
date | -- | Fenix | Firefox | Focus | ReferenceBrowser |
---|---|---|---|---|---|
2022-02-23 00:00:00 | 0 | 7241 | 772 | 395 | 0 |
2022-02-24 00:00:00 | 0 | 6323 | 832 | 307 | 0 |
2022-02-25 00:00:00 | 0 | 5804 | 661 | 299 | 2 |
2022-02-26 00:00:00 | 0 | 6648 | 748 | 342 | 0 |
2022-02-27 00:00:00 | 0 | 6349 | 487 | 342 | 0 |
2022-02-28 00:00:00 | 0 | 6101 | 870 | 225 | 0 |
2022-03-01 00:00:00 | 0 | 6344 | 838 | 237 | 0 |
2022-03-02 00:00:00 | 0 | 5700 | 1019 | 245 | 0 |
If we look at EMPTY signatures for Fenix over the last 7 days, we see this:
$ supersearchfacet --start-date='2022-02-23' --end-date='2022-03-02' --_facets=signature \
--signature='EMPTY' --product=Fenix --period=daily
date | -- | EMPTY: no crashing thread identified | EMPTY: no crashing thread identified; EmptyMinidump | EMPTY: no crashing thread identified; HeaderMismatch | EMPTY: no crashing thread identified; MissingSystemInfo | EMPTY: no crashing thread identified; MissingThreadList | EMPTY: no crashing thread identified; unknown error | OOM large EMPTY: no crashing thread identified; EmptyMinidump |
---|---|---|---|---|---|---|---|---|
2022-02-23 00:00:00 | 0 | 1184 | 5989 | 0 | 3 | 27 | 38 | 0 |
2022-02-24 00:00:00 | 0 | 965 | 5259 | 3 | 10 | 26 | 60 | 0 |
2022-02-25 00:00:00 | 0 | 943 | 4781 | 1 | 4 | 36 | 38 | 1 |
2022-02-26 00:00:00 | 0 | 841 | 5719 | 1 | 1 | 41 | 45 | 0 |
2022-02-27 00:00:00 | 0 | 832 | 5429 | 1 | 3 | 38 | 46 | 0 |
2022-02-28 00:00:00 | 0 | 816 | 5190 | 9 | 5 | 47 | 34 | 0 |
2022-03-01 00:00:00 | 0 | 955 | 5308 | 5 | 9 | 31 | 35 | 1 |
2022-03-02 00:00:00 | 0 | 785 | 3060 | 1800 | 2 | 24 | 29 | 0 |
Ergo, either there is a bug in the extract_payload code or the Fenix crash reports in question are malformed. Either way, I think the crash reports it affects have junk minidumps: rust-minidump would have kicked up an "EmptyMinidump" before and now kicks up a "HeaderMismatch".
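One way to sanity-check that reading is to add the EmptyMinidump and HeaderMismatch counts per day from the Fenix table above. If reports are moving from one signature to the other, rather than a new kind of failure showing up, the combined total should stay in the usual daily range. A minimal sketch, with the numbers copied from the table (the `combined` helper is only for illustration):

```python
# Daily Fenix counts copied from the signature table above.
# Each tuple: (date, EmptyMinidump count, HeaderMismatch count)
rows = [
    ("2022-02-23", 5989, 0),
    ("2022-02-24", 5259, 3),
    ("2022-02-25", 4781, 1),
    ("2022-02-26", 5719, 1),
    ("2022-02-27", 5429, 1),
    ("2022-02-28", 5190, 9),
    ("2022-03-01", 5308, 5),
    ("2022-03-02", 3060, 1800),
]


def combined(rows):
    """Return {date: EmptyMinidump + HeaderMismatch} for each day."""
    return {date: empty + mismatch for date, empty, mismatch in rows}


totals = combined(rows)
for date, total in totals.items():
    print(date, total)
```

For 2022-03-02 the combined total is 4860, which sits inside the 4782 to 5989 range of the earlier days, so the numbers are at least consistent with a shift between the two signatures.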
Comment 2•3 years ago
That looks like a bug in Fenix. I know we've always seen more malformed minidumps on Fenix than on any other platform (see bug 1644486) but I haven't figured out why it's happening yet.
Comment 3 (Assignee)•3 years ago
I tinkered with different variations of malformed payloads to see if I could reproduce what I'm seeing in the description. If I include a no-bytes upload_file_minidump and omit the \r\n after it, then extract_payload in the collector will slurp up the next multipart part as the upload_file_minidump body. That's exactly what I was seeing in the description.
Here's a raw form:
--c503c85c950243ae83ecb53354be8c5b\r\nContent-Disposition: form-data; name="DateStamp"\r\nContent-Type: text/plain; charset=utf-8\r\n\r\n2022-03-03T17:05:31.476349\r\n--c503c85c950243ae83ecb53354be8c5b\r\nContent-Disposition: form-data; name="ProductName"\r\nContent-Type: text/plain; charset=utf-8\r\n\r\nFenix\r\n--c503c85c950243ae83ecb53354be8c5b\r\nContent-Disposition: form-data; name="upload_file_minidump"; filename="file.dump"\r\nContent-Type: application/octet-stream\r\n\r\nabcde--c503c85c950243ae83ecb53354be8c5b\r\nContent-Disposition: form-data; name="CrashType"\r\nContent-Type: text/plain; charset=utf-8\r\n\r\nnative crash\r\n--c503c85c950243ae83ecb53354be8c5b--\r\n
Here's a (possibly) easier-to-read version where \r\n is replaced with newlines:
--c503c85c950243ae83ecb53354be8c5b
Content-Disposition: form-data; name="DateStamp"
Content-Type: text/plain; charset=utf-8

2022-03-03T17:05:31.476349
--c503c85c950243ae83ecb53354be8c5b
Content-Disposition: form-data; name="ProductName"
Content-Type: text/plain; charset=utf-8

Fenix
--c503c85c950243ae83ecb53354be8c5b
Content-Disposition: form-data; name="upload_file_minidump"; filename="file.dump"
Content-Type: application/octet-stream

--c503c85c950243ae83ecb53354be8c5b <-- there should be an additional \r\n here
Content-Disposition: form-data; name="CrashType"
Content-Type: text/plain; charset=utf-8

native crash
--c503c85c950243ae83ecb53354be8c5b--
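To make the slurping behavior concrete, here is a rough sketch of a toy parser. It is not Falcon's or Antenna's actual code; it only assumes the general shape of a streaming multipart parser: read headers up to the blank line, then read content up to the next CRLF + "--" + boundary. The names `build_body` and `toy_parse` are made up for this illustration.

```python
BOUNDARY = b"c503c85c950243ae83ecb53354be8c5b"


def build_body(dump: bytes, crlf_after_dump: bool) -> bytes:
    """Build a two-part body: upload_file_minidump, then CrashType."""
    delim = b"--" + BOUNDARY
    return (
        delim + b"\r\n"
        + b'Content-Disposition: form-data; name="upload_file_minidump"; '
        + b'filename="file.dump"\r\n'
        + b"Content-Type: application/octet-stream\r\n\r\n"
        + dump + (b"\r\n" if crlf_after_dump else b"")
        + delim + b"\r\n"
        + b'Content-Disposition: form-data; name="CrashType"\r\n\r\n'
        + b"native crash\r\n"
        + delim + b"--\r\n"
    )


def toy_parse(body: bytes) -> dict:
    """Toy streaming-style parser (an assumption, NOT Falcon's code)."""
    delim = b"\r\n--" + BOUNDARY
    parts = {}
    # Position just after the opening "--boundary".
    pos = body.index(b"--" + BOUNDARY) + len(BOUNDARY) + 2
    # "--" right after a boundary marks the closing delimiter.
    while not body.startswith(b"--", pos):
        pos += 2  # skip the CRLF that ends the boundary line
        # Headers run up to the first blank line.
        header_end = body.index(b"\r\n\r\n", pos)
        headers = body[pos:header_end]
        # Content runs up to the next CRLF + "--" + boundary.
        content_start = header_end + 4
        content_end = body.index(delim, content_start)
        name = headers.split(b'name="', 1)[1].split(b'"', 1)[0].decode()
        parts[name] = body[content_start:content_end]
        pos = content_end + len(delim)
    return parts


good = toy_parse(build_body(b"abcde", crlf_after_dump=True))
bad = toy_parse(build_body(b"", crlf_after_dump=False))
print(sorted(good))
print(bad["upload_file_minidump"])
```

With the CRLF present, both fields come out separately. Without it, the toy parser finds no delimiter until after the CrashType value, so the CrashType part ends up inside the upload_file_minidump content and there is no CrashType field at all.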
I'll look into this.
Comment 4 (Assignee)•3 years ago
I looked at other Fenix crash reports that have minidumps and they have the CrashType annotation at the end of the dump contents. Ergo, I think sendFile needs to be sending a \r\n after the file contents.
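For reference, here is roughly what a well-formed file part looks like from the sending side. This is a Python sketch only; the real sendFile lives in android-components and is not shown in this bug, so `write_file_part` and its parameters are hypothetical. The point is the final write: the CRLF after the file bytes is what lets the next boundary line be recognized as a delimiter.

```python
import io

BOUNDARY = b"c503c85c950243ae83ecb53354be8c5b"


def write_file_part(out, boundary: bytes, name: str, filename: str, data: bytes) -> None:
    """Write one multipart/form-data file part to a binary stream.

    Hypothetical helper for illustration; not the android-components code.
    """
    out.write(b"--" + boundary + b"\r\n")
    out.write(
        f'Content-Disposition: form-data; name="{name}"; '
        f'filename="{filename}"\r\n'.encode()
    )
    out.write(b"Content-Type: application/octet-stream\r\n\r\n")
    out.write(data)
    # Needed even when data is empty: without this CRLF the next
    # boundary line is glued onto the file contents.
    out.write(b"\r\n")


buf = io.BytesIO()
write_file_part(buf, BOUNDARY, "upload_file_minidump", "file.dump", b"")
buf.write(b"--" + BOUNDARY + b"--\r\n")  # closing delimiter
body = buf.getvalue()
print(body.decode())
```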
I wrote this up:
https://github.com/mozilla-mobile/android-components/issues/11809
I'll keep tabs on it.
Comment 5 (Assignee)•3 years ago
Comment 6 (Assignee)•3 years ago
Comment 7 (Assignee)•3 years ago
The fix for issue 11809 landed in the android-components repo. I looked at Fenix nightly crash reports where build id > 20220304000000 that have an upload_file_minidump:
- bp-3d74e36f-70cd-4456-8305-fcba10220308
- bp-e6af315c-055b-48d8-95dd-8e4d40220308
- bp-a09b3bc0-3370-4a49-80ed-7be8c0220308
All three of those have a CrashType annotation, and the upload_file_minidump content no longer ends with the next multipart part.
But then I remembered that for Fenix, the build id is the geckoview build id and not the product build id. The application build id is inscrutable. There isn't a way to get a list of application build ids for Fenix nightly and know when the builds happened.
I ran supersearchfacet and expected EmptyMinidump to jump back up and HeaderMismatch to drop, but that hasn't happened:
$ supersearchfacet --start-date='2022-03-01' --end-date='2022-03-08' --_facets=signature \
--signature='EMPTY' --product=Fenix --period=daily --format=markdown
date | EMPTY: no crashing thread identified | EMPTY: no crashing thread identified; EmptyMinidump | EMPTY: no crashing thread identified; HeaderMismatch | EMPTY: no crashing thread identified; MissingSystemInfo | EMPTY: no crashing thread identified; MissingThreadList | EMPTY: no crashing thread identified; unknown error |
---|---|---|---|---|---|---|
2022-03-01 00:00:00 | 955 | 5308 | 5 | 9 | 31 | 35 |
2022-03-02 00:00:00 | 785 | 3060 | 1800 | 2 | 24 | 29 |
2022-03-03 00:00:00 | 888 | 0 | 4888 | 3 | 42 | 28 |
2022-03-04 00:00:00 | 641 | 6 | 4674 | 2 | 29 | 33 |
2022-03-05 00:00:00 | 792 | 8 | 5165 | 10 | 63 | 43 |
2022-03-06 00:00:00 | 677 | 14 | 5765 | 2 | 35 | 133 |
2022-03-07 00:00:00 | 925 | 9 | 6038 | 7 | 42 | 153 |
2022-03-08 00:00:00 | 484 | 7 | 3320 | 1 | 42 | 94 |
I can't tell by looking at application build ids whether there's been enough uptake on Fenix nightly builds after 3/4/2022 that have the fix to show a change in numbers. I think I'm going to let it go a week and see what happens.
Comment 8 (Assignee)•3 years ago
In https://bugzilla.mozilla.org/show_bug.cgi?id=1757938#c3 Kevin says:
That was fixed about a week ago. Looking at the recent 99.0a1 and 100.0a1 data this is no longer happening. So ideally the crash data is being processed out into the respective crashes.
Given that, I'm going to mark this as FIXED.