420187 - hang in nsNSSHttpRequestSession::internal_send_receive_attempt

Reporter

Description

•

17 years ago

On investigating fx-win32-tbox after 3hrs 30min building, it was stuck at:

OBJDIR=obj-fx-trunk python obj-fx-trunk/_profile/pgo/profileserver.py

firefox.exe was in process list but not on the taskbar. When I ended the firefox process, the output was:

Application pid: 7328
FAIL Exited with code 1 during test run
fx-win32-tbox.build.mozilla.org - - [28/Feb/2008 11:33:33] "GET /index.html HTTP/1.1" 200 -
fx-win32-tbox.build.mozilla.org - - [28/Feb/2008 11:33:33] "GET /quit.js HTTP/1.1" 200 -
fx-win32-tbox.build.mozilla.org - - [28/Feb/2008 11:33:34] code 404, message File not found
fx-win32-tbox.build.mozilla.org - - [28/Feb/2008 11:33:34] "GET /favicon.ico HTTP/1.1" 404 -

It continued to the second build, ignoring the exit status. If the build crashes out then tinderbox should start from the beginning.

Nick Thomas [:nthomas] (UTC+12)

Reporter

Updated

•

17 years ago

Component: Cmd-line Features → Build Config

QA Contact: cmd-line → build-config

Nick Thomas [:nthomas] (UTC+12)

Reporter

Comment 1

•

17 years ago

Attached file Stacks from "hung" firefox (deleted) — Details

Had this problem with profiling again today, so I attached the debugger and did a Break All. This is the resulting thread table and callstacks for each thread.

timeless

Updated

•

17 years ago

Assignee: nobody → kengert

Component: Build Config → Security: PSM

Keywords: hang

QA Contact: build-config → psm

Summary: Python profiler should pay attention to firefox exit code → hang in nsNSSHttpRequestSession::internal_send_receive_attempt

(not currently active) Ted Mielczarek

Comment 2

•

17 years ago

Should split this into two bugs: one for the Python script not exiting with failure if the browser does, and one for the actual hang (which it looks like this bug has already been morphed into).
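For the script half of that split, a minimal sketch of what "exiting with failure if the browser does" could look like. This is not the actual profileserver.py code; `run_and_propagate` is a hypothetical helper name, and treating a timeout as exit code 1 is an assumption made for illustration.

```python
import subprocess
import sys


def run_and_propagate(cmd, timeout=None):
    """Run cmd and return its exit code so the caller can fail the build.

    A timeout is reported as exit code 1 (an assumption for this sketch).
    """
    try:
        completed = subprocess.run(cmd, timeout=timeout)
    except subprocess.TimeoutExpired:
        # subprocess.run kills the child and waits for it before re-raising.
        return 1
    return completed.returncode


# Hypothetical usage in a wrapper script:
#   sys.exit(run_and_propagate(["firefox.exe", "http://localhost:8888/index.html"]))
```

A caller such as tinderbox could then check the script's own exit status instead of continuing to the second build.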

(not currently active) Ted Mielczarek

Comment 3

•

17 years ago

Filed bug 421281 on the profiling script.

(not currently active) Ted Mielczarek

Comment 4

•

17 years ago

I can reproduce this 100% of the time on OS X using the profiling script, even with a vanilla libxul build.

#0 0x908d7a46 in semaphore_timedwait_signal_trap ()
#1 0x90909daf in _pthread_cond_wait ()
#2 0x90954de7 in pthread_cond_timedwait ()
#3 0x0005cd09 in pt_TimedWait (cv=0x1c1bdcc4, ml=0x1c1bdcc4, timeout=<value temporarily unavailable, due to optimizations>) at /Users/luser/build/mozilla/nsprpub/pr/src/pthreads/ptsynch.c:280
#4 0x0005d05c in PR_WaitCondVar (cvar=0x1c1bdcc0, timeout=250) at /Users/luser/build/mozilla/nsprpub/pr/src/pthreads/ptsynch.c:407
#5 0x0215b97a in nsNSSHttpRequestSession::internal_send_receive_attempt (this=0x1c1bd280, retryable_error=@0xb02a348c, pPollDesc=0x0, http_response_code=0xb02a35bc, http_response_content_type=0x0, http_response_headers=0x0, http_response_data=0xb02a35b0, http_response_data_len=0xb02a35b4) at /Users/luser/build/mozilla/security/manager/ssl/src/nsNSSCallbacks.cpp:406

the main thread is sitting here:

#0 0x908d7a22 in semaphore_wait_trap ()
#1 0x9094fb7f in pthread_join ()
#2 0x00061eb3 in PR_JoinThread (thred=0x3203) at /Users/luser/build/mozilla/nsprpub/pr/src/pthreads/ptthread.c:594
#3 0x021578d5 in nsPSMBackgroundThread::requestExit (this=0x3203) at /Users/luser/build/mozilla/security/manager/ssl/src/nsPSMBackgroundThread.cpp:97
#4 0x0215c63c in nsNSSComponent::DoProfileChangeNetTeardown (this=0x3203) at /Users/luser/build/mozilla/security/manager/ssl/src/nsNSSComponent.cpp:2326

(not currently active) Ted Mielczarek

Comment 5

•

17 years ago

Oh, I should have pasted a full stack from that first thread:

#0 0x908d7a46 in semaphore_timedwait_signal_trap ()
#1 0x90909daf in _pthread_cond_wait ()
#2 0x90954de7 in pthread_cond_timedwait ()
#3 0x0005cd09 in pt_TimedWait (cv=0x1c1bdcc4, ml=0x1c1bdcc4, timeout=<value temporarily unavailable, due to optimizations>) at /Users/luser/build/mozilla/nsprpub/pr/src/pthreads/ptsynch.c:280
#4 0x0005d05c in PR_WaitCondVar (cvar=0x1c1bdcc0, timeout=250) at /Users/luser/build/mozilla/nsprpub/pr/src/pthreads/ptsynch.c:407
#5 0x0215b97a in nsNSSHttpRequestSession::internal_send_receive_attempt (this=0x1c1bd280, retryable_error=@0xb02a348c, pPollDesc=0x0, http_response_code=0xb02a35bc, http_response_content_type=0x0, http_response_headers=0x0, http_response_data=0xb02a35b0, http_response_data_len=0xb02a35b4) at /Users/luser/build/mozilla/security/manager/ssl/src/nsNSSCallbacks.cpp:406
#6 0x0215bbc1 in nsNSSHttpRequestSession::trySendAndReceiveFcn (this=0x1c1bd280, pPollDesc=0x0, http_response_code=0xb02a35bc, http_response_content_type=0x0, http_response_headers=0x0, http_response_data=0xb02a35b0, http_response_data_len=0xb02a35b4) at /Users/luser/build/mozilla/security/manager/ssl/src/nsNSSCallbacks.cpp:300
#7 0x00269a52 in ocsp_GetEncodedOCSPResponseFromRequest ()
#8 0x0026b339 in CERT_CheckOCSPStatus ()
#9 0x0026e4b6 in CERT_VerifyCert ()
#10 0x0026e513 in CERT_VerifyCertNow ()
#11 0x0023d482 in SSL_AuthCertificate ()
#12 0x0215a48b in AuthCertificateCallback (client_data=0x0, fd=0x1c1b85a0, checksig=-1869776314, isServer=-1869776314) at /Users/luser/build/mozilla/security/manager/ssl/src/nsNSSCallbacks.cpp:907
#13 0x0023b31e in ssl3_HandleHandshakeMessage ()
#14 0x0023ca1e in ssl3_HandleRecord ()
#15 0x0023d0b7 in ssl3_GatherCompleteHandshake ()
#16 0x0023e08c in ssl_GatherRecord1stHandshake ()
#17 0x00241fee in ssl_Do1stHandshake ()
#18 0x00242f53 in ssl_SecureSend ()
#19 0x00242fec in ssl_SecureWrite ()
#20 0x00246df4 in ssl_Write ()
#21 0x02157db1 in nsSSLThread::Run (this=0x1b51dba0) at /Users/luser/build/mozilla/security/manager/ssl/src/nsSSLThread.cpp:1029
#22 0x0006189c in _pt_root (arg=0x1b51dc60) at /Users/luser/build/mozilla/nsprpub/pr/src/pthreads/ptthread.c:221
#23 0x90908c55 in _pthread_start ()
#24 0x90908b12 in thread_start ()

Looks like it's trying to do some OCSP stuff? If I had to make an uneducated guess, I would say that we hit AUS looking for extension updates, and then are screwing around in SSL code when the browser wants to shut down.
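To illustrate the shape of the problem in the stacks (one thread polling a condition variable with a 250 ms timeout, another thread blocked joining it), here is a rough Python sketch. It is not the PSM code: `ResponseWaiter`, `deliver`, and `cancel` are hypothetical names, and the assumption is that the waiting loop only ever exits when a response arrives unless it is also given a cancel flag and an overall deadline.

```python
import threading
import time


class ResponseWaiter:
    """Polls a condition variable in short slices, like the timeout=250 wait above."""

    def __init__(self, poll_interval=0.25):
        self._cond = threading.Condition()
        self._response = None
        self._cancelled = False
        self._poll_interval = poll_interval

    def deliver(self, response):
        # Called by whoever completes the request.
        with self._cond:
            self._response = response
            self._cond.notify_all()

    def cancel(self):
        # Called at shutdown so a joining thread is not left waiting forever.
        with self._cond:
            self._cancelled = True
            self._cond.notify_all()

    def wait(self, max_seconds):
        # Returns the response, or None if cancelled or the deadline passed.
        deadline = time.monotonic() + max_seconds
        with self._cond:
            while self._response is None and not self._cancelled:
                remaining = deadline - time.monotonic()
                if remaining <= 0:
                    break
                self._cond.wait(min(self._poll_interval, remaining))
            return self._response
```

Without the `_cancelled` check and the deadline, a thread joining the waiter would block for as long as no response is delivered, which matches what the two stacks appear to show.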

(not currently active) Ted Mielczarek

Comment 6

•

17 years ago

I guess that's in Nick's stack too. This sucks, because this hangs fx-win32-tbox on occasion (usually during the nightly). I guess it could hang anyone's browser if they start up and shut down too quickly.

Severity: normal → major

Reed Loden [:reed]

Updated

•

17 years ago

Flags: blocking1.9?

Reed Loden [:reed]

Updated

•

17 years ago

Priority: -- → P1

Nick Thomas [:nthomas] (UTC+12)

Reporter

Comment 8

•

17 years ago

Hit this again today, which is painful because it takes another two and a half hours to get a nightly build out.

:Gavin Sharp [email: gavin@gavinsharp.com]

Updated

•

17 years ago

Flags: blocking1.9?

Reed Loden [:reed]

Updated

•

17 years ago

Flags: blocking1.9?

Mike Shaver (:shaver -- probably not reading bugmail closely)

Comment 9

•

17 years ago

Marking as blocking, bad effects on build infrastructure related to fast startup-shutdown, which is also a pretty common scenario during upgrade to a new Firefox (extension compat checks, EM restarts, etc.)

Mike Shaver (:shaver -- probably not reading bugmail closely)

Comment 10

•

17 years ago

Once more, with actual flags.

Flags: blocking1.9? → blocking1.9+

John O'Duinn [:joduinn] (please use "needinfo?" flag)

Comment 11

•

17 years ago

Gentle ping - any update on this?

Nick Thomas [:nthomas] (UTC+12)

Reporter

Comment 12

•

17 years ago

Attached patch [checked in] Temporarily increase the time the browser stays open to 10s (deleted) — Details — Splinter Review

This will hopefully prevent the tinderbox hanging until this is resolved.

Attachment #309626 - Flags: review?(sayrer)

Nick Thomas [:nthomas] (UTC+12)

Reporter

Updated

•

17 years ago

Attachment #309626 - Flags: review?(ted.mielczarek)

(not currently active) Ted Mielczarek

Comment 13

•

17 years ago

Comment on attachment 309626: [checked in] Temporarily increase the time the browser stays open to 10s

worth a shot!

Attachment #309626 - Flags: review?(ted.mielczarek) → review+

Robert Sayre

Updated

•

17 years ago

Attachment #309626 - Flags: review?(sayrer) → review+

Nick Thomas [:nthomas] (UTC+12)

Reporter

Updated

•

17 years ago

Attachment #309626 - Attachment description: Temporarily increase the time the browser stays open to 10s → [checked in] Temporarily increase the time the browser stays open to 10s

timeless

Comment 14

•

17 years ago

http://bonsai.mozilla.org/cvsblame.cgi?file=/mozilla/security/manager/ssl/src/nsNSSCallbacks.cpp&rev=1.62&mark=112#197

perhaps: nsHTTPDownloadEvent::Run or nsHTTPListener should register an observer for some "shutdown" message and do something when they get it.
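A rough sketch of that observer idea, in Python rather than the real XPCOM interfaces. Everything here is hypothetical: `ObserverService`, `PendingDownload`, and the "shutdown" topic string are stand-ins chosen for illustration, not names from the Mozilla tree.

```python
import threading


class ObserverService:
    """Tiny topic-based notifier, standing in for a real observer service."""

    def __init__(self):
        self._lock = threading.Lock()
        self._observers = {}

    def add_observer(self, topic, callback):
        with self._lock:
            self._observers.setdefault(topic, []).append(callback)

    def notify(self, topic):
        # Copy the list under the lock, then call back without holding it.
        with self._lock:
            callbacks = list(self._observers.get(topic, []))
        for callback in callbacks:
            callback()


class PendingDownload:
    """A request that wakes its waiter when shutdown is announced."""

    def __init__(self, observers):
        self.done = threading.Event()
        self.cancelled = False
        # "shutdown" is a placeholder topic name for this sketch.
        observers.add_observer("shutdown", self._on_shutdown)

    def _on_shutdown(self):
        self.cancelled = True
        self.done.set()  # release any thread blocked on done.wait()
```

A thread blocked in `done.wait()` returns as soon as `notify("shutdown")` runs, and can check `cancelled` to tell a cancellation from a completed download.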

Mike Schroepfer

Comment 15

•

17 years ago

We ready to go on this?

(not currently active) Ted Mielczarek

Comment 16

•

17 years ago

Schrep: we have a workaround in place in the PGO script, but I haven't seen any progress on the actual hang.

Stacks from "hung" firefox 17 years ago Nick Thomas [:nthomas] (UTC+12) (deleted), text/plain		Details
[checked in] Temporarily increase the time the browser stays open to 10s 17 years ago Nick Thomas [:nthomas] (UTC+12) (deleted), patch	sayrer : review+ ted : review+ beltzner : approval1.9+	Details \| Diff \| Splinter Review
Patch v1 17 years ago Kai Engert (:KaiE:) (deleted), patch		Details \| Diff \| Splinter Review
Patch v2 17 years ago Kai Engert (:KaiE:) (deleted), patch		Details \| Diff \| Splinter Review
another hang 17 years ago Kai Engert (:KaiE:) (deleted), text/plain		Details
yet another hang 17 years ago Kai Engert (:KaiE:) (deleted), text/plain		Details
Patch v3 17 years ago Kai Engert (:KaiE:) (deleted), patch	rrelyea : review+ beltzner : approval1.9b5+	Details \| Diff \| Splinter Review
fix red tinderbox on windows 17 years ago Kai Engert (:KaiE:) (deleted), patch	mtschrep : approval1.9b5+	Details \| Diff \| Splinter Review
fix leaks, v5 17 years ago Kai Engert (:KaiE:) (deleted), patch	rrelyea : review+	Details \| Diff \| Splinter Review
screenshot 17 years ago Carsten Book [:Tomcat] (deleted), image/jpeg		Details
regression culprit 17 years ago Kai Engert (:KaiE:) (deleted), patch		Details \| Diff \| Splinter Review
unexpected stack 17 years ago Kai Engert (:KaiE:) (deleted), text/plain		Details
Patch v7 17 years ago Kai Engert (:KaiE:) (deleted), patch		Details \| Diff \| Splinter Review
Patch v8 17 years ago Kai Engert (:KaiE:) (deleted), patch	KaiE : review+	Details \| Diff \| Splinter Review
Patch v9 17 years ago Kai Engert (:KaiE:) (deleted), patch	KaiE : review+ beltzner : approval1.9+	Details \| Diff \| Splinter Review