Closed Bug 1760187 Opened 3 years ago Closed 3 years ago

mozillavpn crash reports aren't being accepted after extract_payload rewrite

Categories

(Socorro :: Antenna, defect, P1)

Tracking

(Not tracked)

RESOLVED FIXED

People

(Reporter: willkg, Assigned: willkg)

Details

Attachments

(4 files)

Looking at Crash Stats, we have no crash reports for MozillaVPN after March 2nd, 2022. That's when we did the Antenna deployment that rewrote extract_payload.

We need to figure out what's going on and why MozillaVPN crash reports aren't working anymore.

Marcus says he's seeing this when submitting crash reports to stage:

Discarded=malformed_no_annotations

I don't see any Sentry reports on stage in the last few days. Because the crash report is getting discarded, there aren't any collector_notes to look at. I checked the stage logs for "extract payload exception" and don't see anything. I'm not sure what's going on.

I'll add some more logging and deploy that to stage to see if that helps.

I grabbed a capture of our request that would go to the socorro server.

Here are two annotation errors that increased after the deploy:

date   bad_json   no_annotations   comment
2/28          1                2
3/1          37               13
3/2         194               46   deployed new code 16:00ish
3/3         333               69
3/4         694              103
3/5         850               93

"bad_json" gets kicked up here:

https://github.com/mozilla-services/antenna/blob/c0fbefceeb5e76414a8d988c0d9c5eb5d0f58e09/antenna/breakpad_resource.py#L203-L212

That code looks pretty good. In either of those situations, the payload is not well-formed and the collector shouldn't accept it. The general shape of that code didn't change between the old extract_payload and the new extract_payload.

"no_annotations" gets kicked up here:

https://github.com/mozilla-services/antenna/blob/c0fbefceeb5e76414a8d988c0d9c5eb5d0f58e09/antenna/breakpad_resource.py#L231-L236

When there are MultipartParseErrors, they get logged and I'd see them in the logs. I see a slight increase in the number of "invalid text or charset: utf-8".

In bug #1757854, we discovered Fenix was missing an EOL when building the payload between the upload_file_minidump and the next part. When we deployed the new extract_payload code, it was more strict about well-formedness and parsed the parts differently.
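To make the missing-EOL problem concrete, here is a minimal sketch of a check for it. It is not Antenna's code: the function name, the boundary value, and the sample payloads are made up for illustration. It only looks at whether each boundary delimiter is preceded by CRLF, and it would also flag a binary body that happens to contain the delimiter bytes.

```python
from typing import List

CRLF = b"\r\n"


def delimiters_missing_eol(payload: bytes, boundary: bytes) -> List[int]:
    """Return offsets of boundary delimiters that are not preceded by CRLF.

    Hypothetical helper for illustration. The first delimiter (offset 0)
    has no preceding part, so it is never reported.
    """
    delimiter = b"--" + boundary
    offsets = []
    start = payload.find(delimiter)
    while start != -1:
        if start != 0 and payload[start - 2:start] != CRLF:
            offsets.append(start)
        start = payload.find(delimiter, start + len(delimiter))
    return offsets


# Made-up payload: a minidump part followed by one annotation part.
good = (
    b"--XB\r\n"
    b'Content-Disposition: form-data; name="upload_file_minidump"\r\n\r\n'
    b"MDMP\r\n"
    b"--XB\r\n"
    b'Content-Disposition: form-data; name="ProductName"\r\n\r\n'
    b"MozillaVPN\r\n"
    b"--XB--\r\n"
)
# Same payload with the EOL after the minidump body dropped.
bad = good.replace(b"MDMP\r\n--XB", b"MDMP--XB")

print(delimiters_missing_eol(good, b"XB"))
print(delimiters_missing_eol(bad, b"XB"))
```

A check like this is only useful for eyeballing captured payloads; a strict parser remains the real arbiter of well-formedness.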

Marcus sent me a couple of captured payloads. Once I stopped corrupting them by "fixing" them in a text editor, I noticed that the upload_file_minidump part was missing a Content-Type declaration. Falcon's multipart handling assumes a part without a Content-Type is text/plain, but the minidump is binary, which is why we're seeing "invalid text or charset: utf-8" in the logs.
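A rough sketch of why that fails, and what a part with an explicit Content-Type looks like on the wire. This uses only the standard library and does not call Falcon; `build_part`, `decodes_as_text`, the boundary, the filename, and the sample bytes are all assumptions for illustration, and application/octet-stream is just the generic binary type, not necessarily what the client fix uses.

```python
from typing import Optional

CRLF = b"\r\n"


def build_part(boundary: str, name: str, body: bytes,
               filename: Optional[str] = None,
               content_type: Optional[str] = None) -> bytes:
    """Build one multipart/form-data part (hypothetical helper)."""
    disposition = f'form-data; name="{name}"'
    if filename is not None:
        disposition += f'; filename="{filename}"'
    lines = [
        b"--" + boundary.encode("ascii"),
        b"Content-Disposition: " + disposition.encode("utf-8"),
    ]
    if content_type is not None:
        lines.append(b"Content-Type: " + content_type.encode("ascii"))
    # A blank line separates headers from the body; the trailing CRLF
    # ends the part before the next boundary delimiter.
    return CRLF.join(lines) + CRLF + CRLF + body + CRLF


def decodes_as_text(body: bytes) -> bool:
    """Mimic what treating a part as text/plain utf-8 requires."""
    try:
        body.decode("utf-8")
        return True
    except UnicodeDecodeError:
        return False


# Made-up bytes standing in for a minidump: not valid utf-8.
minidump = b"MDMP\x93\xa7\x00\x00"

part = build_part(
    "XB", "upload_file_minidump", minidump,
    filename="crash.dmp",
    content_type="application/octet-stream",
)

print(decodes_as_text(minidump))
print(part)
```

With the Content-Type header present, a parser has no reason to attempt a text decode of the minidump bytes at all.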

Marcus is going to do a fix. I'll look into whether we can improve the error handling. "no_annotations" is the wrong error message here. Anything else would be better.

I think this will help.

I (finally) deployed this to prod in bug #1763206. I think this should fix Mozilla VPN.

I'm going to keep an eye on this over the next day and see if the numbers change.

This one came in just now: https://crash-stats.mozilla.org/report/index/cdac2060-10e9-49c7-b958-7440e0220405

I'm also no longer seeing the errors that we think were indications of the rejections happening (bad_json, no_annotations).

I'll check in again tomorrow.

Flags: needinfo?(willkg)

Everything checks out. We're getting MozillaVPN reports now and there have been no new instances of bad_json and no_annotations. Marking as FIXED.

Status: ASSIGNED → RESOLVED
Closed: 3 years ago
Flags: needinfo?(willkg)
Resolution: --- → FIXED
