Closed Bug 1615974 Opened 5 years ago Closed 3 years ago

Crash in [@ rust_cascade::Cascade::has]

Categories

(Core :: Security: PSM, defect, P1)

73 Branch
All
Windows
defect

Tracking

()

RESOLVED FIXED
100 Branch
Tracking Status
firefox-esr91 --- wontfix
firefox73 --- disabled
firefox74 --- disabled
firefox75 --- wontfix
firefox76 --- wontfix
firefox77 --- wontfix
firefox78 --- wontfix
firefox98 --- wontfix
firefox99 --- wontfix
firefox100 --- fixed

People

(Reporter: philipp, Assigned: keeler)

References

(Blocks 1 open bug, Regression)

Details

(Keywords: crash, regression, Whiteboard: [psm-assigned][tbird crash])

Crash Data

Attachments

(1 file)

This bug is for crash report bp-530c1593-d10f-433a-b7a5-72bc30200216.

Top 10 frames of crashing thread:

0 xul.dll rust_cascade::Cascade::has third_party/rust/rust_cascade/src/lib.rs:200
1 xul.dll cert_storage::{{impl}}::allocate::GetCRLiteRevocationState security/manager/ssl/cert_storage/src/lib.rs:1094
2 xul.dll mozilla::psm::NSSCertDBTrustDomain::CheckRevocation security/certverifier/NSSCertDBTrustDomain.cpp:635
3 xul.dll mozilla::pkix::PathBuildingStep::Check security/nss/lib/mozpkix/lib/pkixbuild.cpp:254
4 xul.dll mozilla::psm::CheckCandidates security/certverifier/NSSCertDBTrustDomain.cpp:193
5 xul.dll mozilla::psm::NSSCertDBTrustDomain::FindIssuer security/certverifier/NSSCertDBTrustDomain.cpp:348
6 xul.dll mozilla::pkix::BuildForward security/nss/lib/mozpkix/lib/pkixbuild.cpp:365
7 xul.dll mozilla::pkix::BuildCertChain security/nss/lib/mozpkix/lib/pkixbuild.cpp:414
8 xul.dll mozilla::psm::BuildCertChainForOneKeyUsage security/certverifier/CertVerifier.cpp:240
9 xul.dll mozilla::psm::CertVerifier::VerifyCert security/certverifier/CertVerifier.cpp:745

This crash signature started appearing during 73.0a1 and now seems to be showing up on the beta channel as well, after the transition to 74.0b.

Priority: -- → P2
Whiteboard: [psm-backlog]

In these crashes, it seems that the underlying storage for the memory-mapped file has become unreliable (with crash reasons like STATUS_DEVICE_DATA_ERROR). In other words, part of the disk died.

Is this even something that we can recover from?

Maybe JC can help answer comment 1?

Flags: needinfo?(jjones)

I can occasionally repro on macOS (once so far in 10 tries) by creating a separate volume for the profile's security_state folder, populated with the contents from an existing profile, and then force-unmounting it while Firefox is running:

hdiutil create -srcfolder /$profile/security_state/ -volname securitystate /tmp/securitystate.dmg
hdiutil attach -mountpoint /$profile/security_state /tmp/securitystate.dmg
mach run --allow-downgrade --profile /$profile
diskutil umount force /$profile/security_state

This doesn't seem like a common scenario for desktops, but I'd like to try putting a catch_unwind in place and see if we can intercept the exception signal. If we can, then that answers the question. If not, then I suppose that also answers the question.
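
For reference on the experiment proposed above, here is a minimal Rust sketch of what std::panic::catch_unwind intercepts. It catches unwinding Rust panics only; it is not documented to intercept OS-level faults or signals. The helper name lookup_guarded is hypothetical and is not part of cert_storage or rust_cascade.

```rust
use std::panic;

/// Hypothetical helper (not from cert_storage): runs a lookup closure and
/// turns an unwinding Rust panic into `None` instead of propagating it.
/// This only works when the crate is built with the default
/// `panic = "unwind"` strategy.
fn lookup_guarded<F>(lookup: F) -> Option<bool>
where
    F: FnOnce() -> bool + panic::UnwindSafe,
{
    // `catch_unwind` returns `Err` if the closure panicked; `.ok()` maps that to `None`.
    panic::catch_unwind(lookup).ok()
}
```

Calling lookup_guarded(|| true) yields Some(true), while a closure that panics yields None. A fault raised by the operating system while reading mapped memory is not a Rust panic, so this wrapper would only help if the failure surfaced as one.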

Flags: needinfo?(jjones)

I thought crlite wasn't shipping yet; is it actually enabled in beta now?
[...checks...]
Looks like it's enabled in telemetry mode. Should we turn that off for release to avoid shipping this crash?

Flags: needinfo?(jjones)

This can't possibly be that common of a crash. Numbers are low, and I strongly suspect we have something ignoring similar crashes -- because when the profile directory goes away, Firefox always crashes, somehow or other.

I need to go try and dig up the other crashes, perhaps, and figure out how to classify this the same way.

I do like the idea of trying to wrap it in a catch_unwind, but I'm not going to be able to do that anytime soon due to sudden childcare problems. Let me pass this and the above info to Dana for her take.

Flags: needinfo?(jjones) → needinfo?(dkeeler)

If I'm reading these crash reports correctly, we're faulting when trying to read mmapped memory for which the underlying storage has gone away. Since we're not panicking in Rust, catch_unwind won't help, unless I'm misunderstanding your suggestion. This appears to be a known issue with mmap (see e.g. https://bugs.chromium.org/p/chromium/issues/detail?id=537742). Since the filter is only ~1.4MB anyway, maybe we could load it into memory rather than mmapping it. That said, lmdb is going to have the exact same problem because it mmaps files too; we're not seeing those crashes yet because we're using rkv's safe mode for now.
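
A minimal sketch of the "load it into memory" idea, assuming the filter is an ordinary file that fits comfortably in RAM. This is not the patch that landed; load_filter_bytes is a hypothetical name, and the real code would hand the bytes to the filter implementation afterwards.

```rust
use std::fs;
use std::io;
use std::path::Path;

/// Hypothetical helper (not from cert_storage): reads the whole filter file
/// into an owned buffer. A storage failure during the read surfaces here as
/// an `io::Error` that the caller can handle, and later lookups only touch
/// the in-process copy rather than pages backed by the file.
fn load_filter_bytes(path: &Path) -> io::Result<Vec<u8>> {
    fs::read(path)
}
```

The trade-off is that the full filter (about 1.4MB per the comment above) stays resident in memory, in exchange for lookups that no longer depend on the backing file remaining readable.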

All that said, with crash volume this low, I'm not too concerned. If we do see too many crashes, we can disable this by remotely flipping a pref.

Flags: needinfo?(dkeeler)
Whiteboard: [psm-backlog] → [psm-backlog][tbird crash]
Severity: normal → S4
Crash Signature: [@ rust_cascade::Cascade::has] → [@ rust_cascade::Cascade::has] [@ rust_cascade::Cascade::has_internal]
Has Regression Range: --- → yes
Flags: needinfo?(dkeeler)

Well, we only process crlite filters on early beta or earlier, so it makes sense we don't see it on release/esr. Maybe the nightly population is too small to hit this?
Given the volume on beta, though, it seems like we should fix this before enabling crlite in release. I'll see about doing what I said in comment 6.

Severity: S4 → S3
Flags: needinfo?(dkeeler)
Priority: P2 → P1
Whiteboard: [psm-backlog][tbird crash] → [psm-assigned][tbird crash]
Assignee: nobody → dkeeler
Pushed by dkeeler@mozilla.com:
https://hg.mozilla.org/integration/autoland/rev/b2e8af2fabb0
avoid memmapping CRLite filters in cert_storage r=jschanck,robwu
Status: NEW → RESOLVED
Closed: 3 years ago
Resolution: --- → FIXED
Target Milestone: --- → 100 Branch