Closed
Bug 1197616
Opened 9 years ago
Closed 9 years ago
crash (on write, non-shutdown) in mozilla::Telemetry::Accumulate(mozilla::Telemetry::ID, unsigned int)
Categories
(Toolkit :: Telemetry, defect)
Tracking
()
RESOLVED
DUPLICATE
of bug 1141565
People
(Reporter: jesup, Unassigned)
References
Details
(Keywords: crash, csectype-uaf, sec-high)
Crash Data
This bug was filed from the Socorro interface and is
report bp-cec7adac-303a-4e6f-9e78-fce142150820.
=============================================================
Spin-off from bug 1193038 to cover non-shutdown crashes, in particular what appear to be random writes (or more likely UAF writes) in ::Accumulate()
See also https://crash-stats.mozilla.com/report/index/c05545b5-cc4f-47f7-bb89-129782150817#allthreads
At minimum sec-high; I'll let dveditz/abillings/etc decide if it's sec-crit or not.
Marking 40-42 as affected based on crash-stats.
Comment 2•9 years ago
|
||
I loaded bp-c05545b5-cc4f-47f7-bb89-129782150817 in MSVC and get these stacks:
main thread:
ntdll.dll!_KiFastSystemCallRet@0 Unknown
ntdll.dll!_ZwWaitForSingleObject@12 Unknown
ntdll.dll!_WaitForWerSvc@0 Unknown
ntdll.dll!_SendMessageToWERService@8 Unknown
ntdll.dll!_ReportExceptionInternal@16 Unknown
kernel32.dll!_WerpReportFaultInternal@8 Unknown
kernel32.dll!_WerpReportFault@8 Unknown
kernel32.dll!_BasepReportFault@8 Unknown
kernel32.dll!_UnhandledExceptionFilter@4 Unknown
ntdll.dll!___RtlUserThreadStart@8 Unknown
ntdll.dll!@_EH4_CallFilterFunc@8 Unknown
ntdll.dll!ExecuteHandler2@20 Unknown
ntdll.dll!ExecuteHandler@20 Unknown
ntdll.dll!_RtlDispatchException@8 Unknown
ntdll.dll!_KiUserExceptionDispatcher@8 Unknown
> xul.dll!mozilla::Telemetry::Accumulate Line 3721 C++
xul.dll!mozilla::Telemetry::AccumulateDelta_impl<1>::compute Line 111 C++
xul.dll!mozilla::Telemetry::AutoTimer<94,1>::~AutoTimer<94,1> Line 140 C++
xul.dll!nsCSSRendering::PaintGradient Line 2828 C++
xul.dll!nsImageRenderer::Draw Line 4942 C++
xul.dll!nsImageRenderer::DrawBackground Line 5031 C++
xul.dll!nsCSSRendering::PaintBackgroundWithSC Line 3029 C++
xul.dll!nsCSSRendering::PaintBackground Line 1669 C++
xul.dll!nsDisplayBackgroundImage::PaintInternal Line 2703 C++
xul.dll!nsDisplayBackgroundImage::Paint Line 2688 C++
xul.dll!mozilla::FrameLayerBuilder::PaintItems Line 5317 C++
xul.dll!mozilla::FrameLayerBuilder::DrawPaintedLayer Line 5516 C++
xul.dll!mozilla::layers::ClientPaintedLayer::PaintThebes Line 92 C++
xul.dll!mozilla::layers::ClientPaintedLayer::RenderLayerWithReadback Line 143 C++
xul.dll!mozilla::layers::ClientContainerLayer::RenderLayer Line 71 C++
xul.dll!mozilla::layers::ClientLayer::RenderLayerWithReadback Line 384 C++
xul.dll!mozilla::layers::ClientContainerLayer::RenderLayer Line 71 C++
xul.dll!mozilla::layers::ClientLayerManager::EndTransactionInternal Line 275 C++
xul.dll!mozilla::layers::ClientLayerManager::EndTransaction Line 318 C++
xul.dll!nsDisplayList::PaintRoot Line 1657 C++
xul.dll!nsLayoutUtils::PaintFrame Line 3330 C++
xul.dll!PresShell::Paint Line 6118 C++
xul.dll!nsViewManager::ProcessPendingUpdatesPaint Line 456 C++
xul.dll!nsViewManager::ProcessPendingUpdatesForView Line 392 C++
xul.dll!nsViewManager::ProcessPendingUpdates Line 1088 C++
xul.dll!nsRefreshDriver::Tick Line 1811 C++
xul.dll!mozilla::RefreshDriverTimer::TickDriver Line 196 C++
xul.dll!mozilla::RefreshDriverTimer::Tick Line 186 C++
xul.dll!mozilla::VsyncRefreshDriverTimer::RunRefreshDrivers Line 438 C++
xul.dll!mozilla::VsyncRefreshDriverTimer::RefreshDriverVsyncObserver::TickRefreshDriver Line 373 C++
xul.dll!nsRunnableMethodImpl<void (__thiscall mozilla::VsyncRefreshDriverTimer::RefreshDriverVsyncObserver::*)(mozilla::TimeStamp),1,mozilla::TimeStamp>::Run Line 828 C++
xul.dll!nsThread::ProcessNextEvent Line 849 C++
xul.dll!mozilla::ipc::MessagePump::Run Line 95 C++
xul.dll!MessageLoop::RunHandler Line 228 C++
xul.dll!MessageLoop::Run Line 202 C++
xul.dll!nsBaseAppShell::Run Line 167 C++
xul.dll!nsAppShell::Run Line 178 C++
xul.dll!nsAppStartup::Run Line 281 C++
xul.dll!XREMain::XRE_mainRun Line 4269 C++
xul.dll!XREMain::XRE_main Line 4353 C++
xul.dll!XRE_main Line 4443 C++
> xul.dll!mozilla::Telemetry::Accumulate Line 3721 C++
xul.dll!CheckPinsForHostname Line 330 C++
xul.dll!mozilla::psm::PublicKeyPinningService::ChainHasValidPins Line 364 C++
xul.dll!mozilla::psm::NSSCertDBTrustDomain::IsChainValid Line 780 C++
xul.dll!mozilla::pkix::BuildForward Line 296 C++
xul.dll!mozilla::pkix::PathBuildingStep::Check Line 195 C++
xul.dll!mozilla::psm::FindIssuerInner Line 111 C++
xul.dll!mozilla::psm::NSSCertDBTrustDomain::FindIssuer Line 139 C++
xul.dll!mozilla::pkix::BuildForward Line 323 C++
xul.dll!mozilla::pkix::PathBuildingStep::Check Line 195 C++
xul.dll!mozilla::psm::FindIssuerInner Line 111 C++
xul.dll!mozilla::psm::NSSCertDBTrustDomain::FindIssuer Line 145 C++
xul.dll!mozilla::pkix::BuildForward Line 323 C++
xul.dll!mozilla::pkix::BuildCertChain Line 360 C++
xul.dll!mozilla::psm::BuildCertChainForOneKeyUsage Line 93 C++
xul.dll!mozilla::psm::CertVerifier::VerifyCert Line 267 C++
xul.dll!mozilla::psm::CertVerifier::VerifySSLServerCert Line 471 C++
xul.dll!nsNSSSocketInfo::IsAcceptableForHost Line 390 C++
xul.dll!nsNSSSocketInfo::JoinConnection Line 423 C++
xul.dll!mozilla::net::nsHttpConnectionMgr::GetSpdyPreferredEnt Line 884 C++
xul.dll!mozilla::net::nsHttpConnection::CanDirectlyActivate Line 721 C++
xul.dll!mozilla::net::nsHttpConnectionMgr::GetSpdyPreferredConn Line 2342 C++
xul.dll!mozilla::net::nsHttpConnectionMgr::nsConnEvent::Run Line 635 C++
xul.dll!nsThread::ProcessNextEvent Line 849 C++
xul.dll!nsSocketTransportService::Run Line 898 C++
xul.dll!nsThread::ProcessNextEvent Line 849 C++
xul.dll!NS_ProcessNextEvent Line 265 C++
xul.dll!mozilla::ipc::MessagePumpForNonMainThreads::Run Line 356 C++
xul.dll!MessageLoop::RunHandler Line 228 C++
xul.dll!MessageLoop::Run Line 202 C++
xul.dll!nsThread::ThreadFunc Line 362 C++
Looking through GetHistogramByEnumId, there is a static "knownHistograms" which is not protected by any lock. In that case if there were two threads accumulating the *same* histogram we could call HistogramGet simultaneously on both threads. But since these use the same ID and there is synchronization in Histogram::FactoryGet, they should end up returning the same pointer. So assuming memory writes of these pointer-size words are atomic, there should be no hazard there.
I'll continue looking at the actual values.
Comment 3•9 years ago
|
||
At the point of the crash on the PSM thread:
Unhandled exception at 0x62424CC6 (xul.dll) in c05545b5-cc4f-47f7-bb89-129782150817.dmp: 0xC0000005: Access violation writing location 0x74C0890C.
3721: HistogramAdd(*h, aSample, gHistograms[aHistogram].dataset);
62424CC6 shl dword ptr [edi+74C08505h],1Bh
EDI = 0x00000407 (CERT_PINNING_RESULTS)
Except what's weird is that gHistograms is 0x63c72680 not 0x74c08505. And gHistograms is const, anyway, so it shouldn't be doing any math on it.
I hate to go straight to "memory corruption" theory for this, so I'm going to run that build and debug it to see that code in action.
Comment 4•9 years ago
|
||
This does appear a lot like memory corruption:
When running normally:
02A84CC4 75 1C jne mozilla::Telemetry::Accumulate+4Eh (02A84CE2h)
3721: HistogramAdd(*h, aSample, gHistograms[aHistogram].dataset);
02A84CC6 C1 E7 05 shl edi,5
02A84CC9 85 C0 test eax,eax
02A84CCB 74 1B je mozilla::Telemetry::Accumulate+54h (02A84CE8h)
02A84CCD 80 78 45 00 cmp byte ptr [eax+45h],0
From the minidump:
62424CC2 85 CD test ebp,ecx
62424CC4 75 1C jne mozilla::Telemetry::Accumulate+4Eh (62424CE2h)
3721: HistogramAdd(*h, aSample, gHistograms[aHistogram].dataset);
62424CC6 C1 A7 05 85 C0 74 1B shl dword ptr [edi+74C08505h],1Bh
62424CCD 80 78 45 00 cmp byte ptr [eax+45h],0
I really couldn't tell you how/why we're corrupting that one byte of memory. Possibilities include some malware trying to patch our binaries or some other kind of memory corruption. It's pretty unusual that this would happen by accident at the same place every time, though: this memory should be mapped read-only after initial relocations, so you'd have to do some work to corrupt it.
dmajor, any thoughts?
Flags: needinfo?(dmajor)
0:009> .formats e7 ^ a7
Evaluate expression:
Hex: 00000040
Decimal: 64
Octal: 00000000100
Binary: 00000000 00000000 00000000 01000000
In other words, a single-bit error. I wouldn't be surprised if this is a hardware failure.
> It's pretty unusual that this would happen by accident at the same place every time
I got the impression that this was a one-off. Are there other crash reports with this error? Are they from different machines? Different libxul base addresses?
Flags: needinfo?(dmajor)
Comment 6•9 years ago
|
||
(In reply to dmajor (away October 8-15) from comment #5)
> I got the impression that this was a one-off. Are there other crash reports
> with this error? Are they from different machines? Different libxul base
> addresses?
Randell, can you answer these questions? Thanks.
Flags: needinfo?(rjesup)
Reporter | ||
Comment 7•9 years ago
|
||
There are still a bunch that don't fall into the update-and-restart case already fixed/wallpapered, or the nsHostResolver. Some examples from the last 28 days:
via:
https://crash-stats.mozilla.com/report/list?product=Firefox&range_unit=days&range_value=28&signature=mozilla%3A%3ATelemetry%3A%3AAccumulate%28mozilla%3A%3ATelemetry%3A%3AID%2C+unsigned+int%29#tab-reports
https://crash-stats.mozilla.com/report/index/80fe42a4-d169-4408-ab85-6e7b52150925
https://crash-stats.mozilla.com/report/index/95b22ed0-162f-464c-bdfd-2b9b92150925
https://crash-stats.mozilla.com/report/index/051d9764-2842-4f40-9be0-29fc02151001
https://crash-stats.mozilla.com/report/index/705840a5-1bfb-43b8-9705-a8a492151002
https://crash-stats.mozilla.com/report/index/0d5a67dd-85af-4d17-aaac-cc6312151003
https://crash-stats.mozilla.com/report/index/3135da07-7131-4faa-a8b4-f86742150912
https://crash-stats.mozilla.com/report/index/1cc3d136-4b18-4c91-aaaa-cf4af2150911
A bunch are in GlobalWindow in RunTimeout(); the others are scattered all over the place.
You can ignore any with a crash address of 0x5a5a5a8e; those are the nsHostResolver crash on restart. (Sometimes it's other addresses, but not usually.)
This Telemetry problem is still a very real sec issue and UAF.
Flags: needinfo?(rjesup) → needinfo?(gfritzsche)
Updated•9 years ago
|
Crash Signature: [@ mozilla::Telemetry::Accumulate(mozilla::Telemetry::ID, unsigned int)] → [@ mozilla::Telemetry::Accumulate(mozilla::Telemetry::ID, unsigned int)]
[@ mozilla::Telemetry::Accumulate]
Updated•9 years ago
|
Flags: needinfo?(gfritzsche)
Comment 8•9 years ago
|
||
Benjamin, i don't think i can effectively make this actionable now - do we have anyone who could investigate this better until i can take it over?
Flags: needinfo?(benjamin)
Comment 9•9 years ago
|
||
Jesup, are you saying that my analysis from comment 4 is incorrect? That isn't a UAF, it's memory corruption probably from outside, and not something we'd normally track.
Georg, you could load additional minidumps to see whether others have the same pattern of memory corruption, especially over multiple builds.
Flags: needinfo?(benjamin) → needinfo?(rjesup)
Reporter | ||
Comment 10•9 years ago
|
||
(In reply to Benjamin Smedberg [:bsmedberg] from comment #9)
> Jesup, are you saying that my analysis from comment 4 is incorrect? That
> isn't a UAF, it's memory corruption probably from outside, and not something
> we'd normally track.
There are a bunch with other signatures, much like we had a bunch from nsHostResolver. The one you looked at might be a 1-byte corruption (malware or cosmic ray). But I doubt all of these are. And the thread-safeness of this code is *very* suspect -- and undocumented. And I was answering if the remaining issues here were a 1-off, and they don't appear to be.
> Georg, you could load additional minidumps to see whether others have the
> same pattern of memory corruption, especially over multiple builds.
That would be good, especially with some of the ones that happen multiple times in a similar fashion.
Flags: needinfo?(rjesup)
Comment 11•9 years ago
|
||
I can give it a try, although i'm probably not the best person to look at it.
I don't have access to the dumps though, who can i ask for the permissions?
Flags: needinfo?(benjamin)
Comment 12•9 years ago
|
||
Vladan, aklotz, froydnj and I all have access, and a few others. Send one of us a list of the UUIDs you want dumps for.
Flags: needinfo?(benjamin)
Reporter | ||
Comment 13•9 years ago
|
||
bsmedberg/vladan/froyd: the list from comment 7 are all different crash backtraces (some of which occur a number of times in the last month); that'd be an excellent set to start with. There are 7 there. (NI'ing 3 of you; if someone grabs it please clear the others. Thanks!)
Flags: needinfo?(vladan.bugzilla)
Flags: needinfo?(nfroyd)
Flags: needinfo?(benjamin)
Comment 14•9 years ago
|
||
I still haven't been granted minidump permissions, so clearing needinfo while I poke people on the permissions bug 1204548
Flags: needinfo?(vladan.bugzilla)
This crash is still showing up in low volume on 43-46.
Updated•9 years ago
|
Group: toolkit-core-security
Status: NEW → RESOLVED
Closed: 9 years ago
Flags: needinfo?(benjamin)
Resolution: --- → DUPLICATE
Updated•9 years ago
|
Flags: needinfo?(nfroyd)
You need to log in
before you can comment on or make changes to this bug.
Description
•