Closed Bug 936808 Opened 11 years ago Closed 11 years ago

Intermittent test_sanitySimpletest.html | application terminated with exit code 11 | application crashed [@ PK11_ImportCert]

Categories

(Core :: Security: PSM, defect)

defect
Not set
critical

Tracking

()

RESOLVED FIXED
mozilla29
Tracking Status
firefox26 --- wontfix
firefox27 + fixed
firefox28 + fixed
firefox29 + fixed
firefox-esr17 --- wontfix
firefox-esr24 27+ fixed
b2g18 --- fixed
b2g-v1.1hd --- fixed
b2g-v1.2 --- fixed
b2g-v1.3 --- fixed
b2g-v1.3T --- fixed
b2g-v1.4 --- fixed

People

(Reporter: RyanVM, Assigned: cviecco)

References

Details

(Keywords: crash, intermittent-failure, sec-high, Whiteboard: [fix is speculative; may not actually be fixed][adv-main27+][adv-esr24.3+][qa-])

Crash Data

Attachments

(1 file)

I'm afraid this might be a regression from yesterday's NSS update. Kai, can you please take a look? https://tbpl.mozilla.org/php/getParsedLog.php?id=30361293&tree=Mozilla-Aurora Ubuntu VM 12.04 mozilla-aurora debug test mochitest-1 on 2013-11-08 19:01:10 PST for push 51f5ae29c5de slave: tst-linux32-ec2-101 19:05:31 INFO - 161 INFO TEST-START | /tests/Harness_sanity/test_sanitySimpletest.html 19:05:32 INFO - (pkix_CacheCert_Add: PKIX_PL_HashTable_Add for Certs skipped: entry existed 19:05:32 INFO - (pkix_CacheCert_Add: PKIX_PL_HashTable_Add for Certs skipped: entry existed 19:05:32 INFO - ++DOMWINDOW == 38 (0xac91dd0) [serial = 45] [outer = 0xaddc9e0] 19:05:32 INFO - 162 INFO TEST-INFO | /tests/Harness_sanity/test_sanitySimpletest.html | Profile::SimpleTestLoadTime: 32 19:05:32 INFO - 163 INFO TEST-INFO | /tests/Harness_sanity/test_sanitySimpletest.html | Logging some info 19:05:32 INFO - 164 INFO TEST-PASS | /tests/Harness_sanity/test_sanitySimpletest.html | test ok - This should be true 19:05:32 INFO - 165 INFO TEST-PASS | /tests/Harness_sanity/test_sanitySimpletest.html | is() test failed 19:05:32 INFO - 166 INFO TEST-PASS | /tests/Harness_sanity/test_sanitySimpletest.html | isnot() test failed 19:05:32 INFO - 167 INFO TEST-KNOWN-FAIL | /tests/Harness_sanity/test_sanitySimpletest.html | test todo - todo() test should not pass 19:05:32 INFO - 168 INFO TEST-KNOWN-FAIL | /tests/Harness_sanity/test_sanitySimpletest.html | test todo_is - got false, expected true 19:05:32 INFO - 169 INFO TEST-KNOWN-FAIL | /tests/Harness_sanity/test_sanitySimpletest.html | test todo_isnot - didn't expect true, but got it 19:05:32 INFO - 170 INFO TEST-PASS | /tests/Harness_sanity/test_sanitySimpletest.html | testPluginIsOOP should be bool 19:05:32 INFO - 171 INFO TEST-PASS | /tests/Harness_sanity/test_sanitySimpletest.html | Clipboard has the correct value 19:05:32 INFO - 172 INFO TEST-PASS | /tests/Harness_sanity/test_sanitySimpletest.html | waitForClipboard should work 19:05:32 INFO - 173 INFO TEST-PASS | /tests/Harness_sanity/test_sanitySimpletest.html | createEl did not create element as expected 19:05:32 INFO - 174 INFO TEST-PASS | /tests/Harness_sanity/test_sanitySimpletest.html | $ helper did not get element as expected 19:05:32 INFO - 175 INFO TEST-PASS | /tests/Harness_sanity/test_sanitySimpletest.html | computedStyle did not get right display value 19:05:32 INFO - 176 INFO TEST-INFO | /tests/Harness_sanity/test_sanitySimpletest.html | Profile::SimpleTestRunTime: 117 19:05:32 INFO - 177 INFO TEST-PASS | /tests/Harness_sanity/test_sanitySimpletest.html | expected uncaught exception - uncaught exception: i am an uncaught exception at :0 19:05:32 INFO - JavaScript error: , line 0: uncaught exception: i am an uncaught exception 19:05:33 WARNING - TEST-UNEXPECTED-FAIL | /tests/Harness_sanity/test_sanitySimpletest.html | application terminated with exit code 11 19:05:33 INFO - INFO | runtests.py | Application ran for: 0:00:26.610075 19:05:33 INFO - INFO | zombiecheck | Reading PID log: /tmp/tmpHM6CODpidlog 19:05:47 WARNING - PROCESS-CRASH | /tests/Harness_sanity/test_sanitySimpletest.html | application crashed [@ PK11_ImportCert] 19:05:47 INFO - Crash dump filename: /tmp/tmpsH6iya/minidumps/4c711747-7f3f-8543-3baf75be-34474794.dmp 19:05:47 INFO - Operating system: Linux 19:05:47 INFO - 0.0.0 Linux 3.2.0-23-generic-pae #36-Ubuntu SMP Tue Apr 10 22:19:09 UTC 2012 i686 19:05:47 INFO - CPU: x86 19:05:47 INFO - GenuineIntel family 6 model 26 stepping 5 19:05:47 INFO - 1 CPU 19:05:47 INFO - Crash reason: SIGSEGV 19:05:47 INFO - Crash address: 0x14 19:05:47 INFO - Thread 31 (crashed) 19:05:47 INFO - 0 libnss3.so!PK11_ImportCert [pk11cert.c:51f5ae29c5de : 972 + 0x3] 19:05:47 INFO - eip = 0xb72d7a38 esp = 0xa10fd000 ebp = 0xa10fd098 ebx = 0xb74086a4 19:05:47 INFO - esi = 0xa39d0b18 edi = 0xa39d0b20 eax = 0x00000000 ecx = 0xb1400020 19:05:47 INFO - edx = 0x00000000 efl = 0x00210206 19:05:47 INFO - Found by: given as instruction pointer in context 19:05:47 INFO - 1 libxul.so!mozilla::psm::::AuthCertificate [SSLServerCertVerification.cpp:51f5ae29c5de : 960 + 0x13] 19:05:47 INFO - eip = 0xb489a8cb esp = 0xa10fd0a0 ebp = 0xa10fd118 ebx = 0xb6be96f4 19:05:47 INFO - esi = 0xa39d6a08 edi = 0x00000000 19:05:47 INFO - Found by: call frame info 19:05:47 INFO - 2 libxul.so!mozilla::psm::::SSLServerCertVerificationJob::Run [SSLServerCertVerification.cpp:51f5ae29c5de : 1058 + 0x18] 19:05:47 INFO - eip = 0xb489b6ee esp = 0xa10fd120 ebp = 0xa10fd198 ebx = 0xb6be96f4 19:05:47 INFO - esi = 0xb1402308 edi = 0xa10fd158 19:05:47 INFO - Found by: call frame info 19:05:47 INFO - 3 libxul.so!nsThreadPool::Run() [nsThreadPool.cpp:51f5ae29c5de : 204 + 0x8] 19:05:47 INFO - eip = 0xb4f3525b esp = 0xa10fd1a0 ebp = 0xa10fd1f8 ebx = 0xb6be96f4 19:05:47 INFO - esi = 0xb1402308 edi = 0xa10fd1d8 19:05:47 INFO - Found by: call frame info 19:05:47 INFO - 4 libxul.so!nsThread::ProcessNextEvent(bool, bool*) [nsThread.cpp:51f5ae29c5de : 622 + 0x13] 19:05:47 INFO - eip = 0xb4f3260e esp = 0xa10fd200 ebp = 0xa10fd258 ebx = 0xb6be96f4 19:05:47 INFO - esi = 0xb145ffe8 edi = 0xa10fd228 19:05:47 INFO - Found by: call frame info 19:05:47 INFO - 5 libxul.so!NS_ProcessNextEvent(nsIThread*, bool) [nsThreadUtils.cpp:51f5ae29c5de : 251 + 0xf] 19:05:47 INFO - eip = 0xb4eec1d6 esp = 0xa10fd260 ebp = 0xa10fd298 ebx = 0xb6be96f4 19:05:47 INFO - esi = 0xb145ffe8 edi = 0xa10fd2c0 19:05:47 INFO - Found by: call frame info 19:05:47 INFO - 6 libxul.so!nsThread::ThreadFunc(void*) [nsThread.cpp:51f5ae29c5de : 250 + 0x9] 19:05:47 INFO - eip = 0xb4f33141 esp = 0xa10fd2a0 ebp = 0xa10fd2e8 ebx = 0xb6be96f4 19:05:47 INFO - esi = 0xb145ffe8 edi = 0xa10fd2c0 19:05:47 INFO - Found by: call frame info
Thanks for the report. Looks like a memory corruption issue. :( The code that triggered the crash is here: PK11_ImportCert(...) { if (c->object.cryptoContext) { /* Delete the temp instance */ NSSCryptoContext *cc = c->object.cryptoContext; -> nssCertificateStore_Lock(cc->certStore, &lockTrace); I carefully looked at the changeset, but I cannot see anything that would influence that code. The memory allocation handling changes are adding protection against huge allocation attempts, but doesn't haven an influence for standard allocs. We crash while handling a new SSL/TLS connection, after we have passed server certificate authentication, attempting to cache an intermediate CA certificate. If I understand correctly, this didn't occur while landing the NSS code. It happened when building a later change, and so far it has only happened once. Since this same code is also being used on mozilla-central, I'd expect it to trigger such failures on m-c, too. But it hasn't happened yet, right? I'm hoping for an unrelated one-time failure.
I'll trigger a few additional builds.
Potentially a use-after-free or similar very bad badness -> security sensitive for now.
Group: crypto-core-security
None of the additional 10+ builds triggered this failure.
Depends on: 937721
https://crash-stats.mozilla.com/report/list?product=Firefox&query_search=signature&query_type=contains&reason_type=contains&date=2013-11-13&range_value=28&range_unit=days&hang_type=any&process_type=any&signature=PK11_ImportCert Earliest report is against Firefox 6.0. Ideally, we'd find the bug and fix it properly. Short of that, if this is a race in PK11_ImportCert, we may be able to work around this by holding a mutex that serializes these calls to PK11_ImportCert. Or, more drastically, we could try reducing the thread pool size of SSLServerCertVerification to 1 (eliminating parallel cert verification), to see if that helps. Neither of those is a good long-term solution. Another possibility is to stop saving intermediate certificates we encounter into the NSS certificate database. That's bug 733232.
Severity: normal → critical
Crash Signature: [@ PK11_ImportCert]
Keywords: sec-critical
OS: Linux → All
Hardware: x86 → All
Is there somebody who can look into this?
Whiteboard: [maybe a one-off?]
Attached patch mutex-pk11-calls-inpsm (deleted) — Splinter Review
This is only a workaround by serializing the pk11 calls done by authcertificate. There are other pk11 calls in psm. Do you want those serialized as well?
Attachment #8343406 - Flags: feedback?(brian)
Comment on attachment 8343406 [details] [diff] [review] mutex-pk11-calls-inpsm Review of attachment 8343406 [details] [diff] [review]: ----------------------------------------------------------------- ::: security/manager/ssl/src/SSLServerCertVerification.cpp @@ +986,5 @@ > > // We have found a signer cert that we want to remember. > char* nickname = nsNSSCertificate::defaultServerNickname(node->cert); > if (nickname && *nickname) { > + MutexAutoLock telemetryMutex(*gSSLVerificationPK11Mutex); Please add a comment that we suspect there may be a race condition of some kind in PK11_ImportCert and this mutex is an attempt to wallpaper over it. Also, trailing whitespace.
Attachment #8343406 - Flags: feedback?(brian) → review+
Assignee: nobody → cviecco
This got landed on inbound. https://hg.mozilla.org/integration/mozilla-inbound/rev/0e4260c7fa4d Shouldn't it have gotten security approval first?
Comment on attachment 8343406 [details] [diff] [review] mutex-pk11-calls-inpsm sec-approval+ (retroactive). Until we know a more reliable way to cause the race sec-high is a better rating. Still potentially exploitable, but not easily weaponized.
Attachment #8343406 - Flags: sec-approval+
Comment on attachment 8343406 [details] [diff] [review] mutex-pk11-calls-inpsm [Triage Comment] Safe fix, a=dveditz for aurora
Attachment #8343406 - Flags: approval-mozilla-aurora+
Comment on attachment 8343406 [details] [diff] [review] mutex-pk11-calls-inpsm [Triage Comment] missed the merge, so a=dveditz for beta (27) also
Attachment #8343406 - Flags: approval-mozilla-beta+
Status: NEW → RESOLVED
Closed: 11 years ago
Resolution: --- → FIXED
Target Milestone: --- → mozilla29
We are guessing that this patch may help. AFAICT, we've never reproduced the bug so we don't know for sure.
Whiteboard: [fix is speculative; may not actually be fixed]
Group: crypto-core-security → core-security
Whiteboard: [fix is speculative; may not actually be fixed] → [fix is speculative; may not actually be fixed][adv-main27+][adv-esr24.3+]
Whiteboard: [fix is speculative; may not actually be fixed][adv-main27+][adv-esr24.3+] → [fix is speculative; may not actually be fixed][adv-main27+][adv-esr24.3+][qa-]
Group: core-security
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Created:
Updated:
Size: