Bug 429110 (Closed)
Opened 17 years ago, closed 17 years ago
Random crashing ( [@ PR_AtomicIncrement] ?) ( [@ nsIOService::NewURI] ?)
Categories: Core :: Security: PSM, defect
Tracking: RESOLVED FIXED
People: Reporter: stevee; Assigned: KaiE
Details: Keywords: crash, regression, topcrash
Attachments: 8 files, 2 obsolete files
Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.9pre) Gecko/2008041406 Minefield/3.0pre ID:2008041406

This isn't going to be a very useful bug report, but recently I've been crashing occasionally for no apparent reason. I will be reading pages I normally visit and then POW! Firefox crashes. This is using my daily Firefox profile.

http://crash-stats.mozilla.com/report/index/9eb41a87-0a3c-11dd-8ecb-001cc4e2bf68
http://crash-stats.mozilla.com/report/index/185f4e5c-0adb-11dd-98a2-001b78bc73ea

Signature: PR_AtomicIncrement
UUID: 185f4e5c-0adb-11dd-98a2-001b78bc73ea
Time: 2008-04-15 03:58:51-07:00
Uptime: 26
Product: Firefox
Version: 3.0pre
Build ID: 2008041406
OS: Windows NT
OS Version: 5.1.2600 Service Pack 2
CPU: x86
CPU Info: AuthenticAMD family 6 model 8 stepping 1
Crash Reason: EXCEPTION_ACCESS_VIOLATION
Crash Address: 0x42004c

Stack (frame, module, signature, source):
0  nspr4.dll     PR_AtomicIncrement               mozilla/nsprpub/pr/src/misc/pratom.c:306
1  xul.dll       nsACString_internal::Assign      mozilla/xpcom/string/src/nsTSubstring.cpp:396
2  xul.dll       nsIOService::NewURI              mozilla/netwerk/base/src/nsIOService.cpp:485
3  xul.dll       nsIOService::NewChannel          mozilla/netwerk/base/src/nsIOService.cpp:579
4  xul.dll       nsHTTPDownloadEvent::Run         mozilla/security/manager/ssl/src/nsNSSCallbacks.cpp:119
5  xul.dll       nsThread::ProcessNextEvent       mozilla/xpcom/threads/nsThread.cpp:510
6  xul.dll       nsBaseAppShell::Run              mozilla/widget/src/xpwidgets/nsBaseAppShell.cpp:170
7  nspr4.dll     PR_GetEnv
8  firefox.exe   wmain                            mozilla/toolkit/xre/nsWindowsWMain.cpp:87
9  firefox.exe   firefox.exe@0x217f
10 kernel32.dll  BaseProcessStart

From http://crash-stats.mozilla.com/report/list?range_unit=weeks&version=Firefox%3A3.0pre&range_value=2&signature=PR_AtomicIncrement we can see that the crashes start with the 2008-04-12 build. Checkins to module PhoenixTinderbox between 2008-04-11 06:00 and 2008-04-12 07:00:
http://bonsai.mozilla.org/cvsquery.cgi?treeid=default&module=PhoenixTinderbox&branch=HEAD&branchtype=match&dir=&file=&filetype=match&who=&whotype=match&sortby=Date&hours=2&date=explicit&mindate=2008-04-11+06&maxdate=2008-04-12+07&cvsroot=%2Fcvsroot
There's only one thread in the crash report. Please find some other NSS/PSM bug where I complained that there's only one thread. :)
Reporter
Comment 3•17 years ago
timeless, this report http://crash-stats.mozilla.com/report/index/49a958fd-0aea-11dd-ad14-001cc45a2c28 has other threads listed. Is that a help at all?
Updated•17 years ago
Flags: blocking-firefox3?
Reporter
Comment 4•17 years ago
I had another random crash today at startup. This may be related, or a different thing, so I will post the stack here anyway.

http://crash-stats.mozilla.com/report/index/9df3a71c-0b1b-11dd-998a-001cc4e2bf68

Signature: nsIOService::NewURI(nsACString_internal const&, char const*, nsIURI*, nsIURI**)
UUID: 9df3a71c-0b1b-11dd-998a-001cc4e2bf68
Time: 2008-04-15 11:41:24-07:00
Uptime: 22
Product: Firefox
Version: 3.0pre
Build ID: 2008041506
OS: Windows NT
OS Version: 5.1.2600 Service Pack 2
CPU: x86
CPU Info: AuthenticAMD family 6 model 8 stepping 1
Crash Reason: EXCEPTION_ACCESS_VIOLATION
Crash Address: 0x43004e

Stack (frame, module, signature, source):
0  xul.dll       nsIOService::NewURI              mozilla/netwerk/base/src/nsIOService.cpp:485
1  xul.dll       nsIOService::NewChannel          mozilla/netwerk/base/src/nsIOService.cpp:579
2  xul.dll       nsHTTPDownloadEvent::Run         mozilla/security/manager/ssl/src/nsNSSCallbacks.cpp:119
3  xul.dll       nsThread::ProcessNextEvent       mozilla/xpcom/threads/nsThread.cpp:510
4  xul.dll       nsBaseAppShell::Run              mozilla/widget/src/xpwidgets/nsBaseAppShell.cpp:170
5  nspr4.dll     PR_GetEnv
6  firefox.exe   wmain                            mozilla/toolkit/xre/nsWindowsWMain.cpp:87
7  firefox.exe   firefox.exe@0x217f
8  kernel32.dll  BaseProcessStart
Summary: Random crashing ( [@ PR_AtomicIncrement] ?) → Random crashing ( [@ PR_AtomicIncrement] ?) ( [@ nsIOService::NewURI] ?)
Comment 5•17 years ago
We can't block on random crashes with no STR (steps to reproduce) or a specific area in which the crash occurs.
Flags: blocking-firefox3? → blocking-firefox3-
Comment 6•17 years ago
I'm getting random crashes as well. I think I've found one way to reproduce:
1) Go to a site containing Flash.
2) While that site is reloading, open another tab.
3) Try to scroll in the newly opened tab.
Result: crash.
This doesn't always happen, but it may be of some help.
Comment 7•17 years ago
Ouch! It could be my system, as I'm pushing rather old hardware to run Vista HP. However, using the steps in comment #6 I had a total hard crash: the system rebooted. I have never seen this in the more than a year I've been using Vista.
1. Had a Flash video playing from YouTube.
2. Went to betanews.com and, while the page was loading in another tab, started to scroll the page.
3. System crash/reboot!
Nothing in the Event Manager application log or the system logs points to the problem other than 'Unexpected Shutdown'. Yeah, no kidding.
Mozilla/5.0 (Windows; U; Windows NT 6.0; en-US; rv:1.9pre) Gecko/2008041610 Minefield/3.0pre Firefox/3.0 ID:2008041610
Comment 8•17 years ago
I am also getting random crashes. Mostly just the browser crashes, but Vista64 froze completely a few times too. I suspected Flash to be the cause, which, reading the comments in this bug, seems to have been a good guess.
Comment 9•17 years ago
Given the comments, I'm fairly sure my issue is related to this. I'm getting hangs, though; it never crashes. I have to kill the browser. Memory usage stays constant, and I've tried leaving it for several minutes without it ever coming back. I've had it happen 16+ times in a day. Recently I've been working with a lot of Flash applications in my development, which is probably the reason for that. -[Unknown]
Comment 11•17 years ago
This is topcrash #4 for the last 2 days. That makes it a serious regression, IMO.
Flags: blocking-firefox3- → blocking-firefox3?
Comment 12•17 years ago
Looks like either the pointer to the HTTP request session is garbage or the string is being accessed unsafely from multiple threads. Either way, it looks like a regression from the recent PSM changes...
Comment 13•17 years ago
I got another crash on orkut: http://crash-stats.mozilla.com/report/index/47c870e8-0fb7-11dd-8381-001321b13766
Comment 15•17 years ago
--> Core::General
Flags: blocking-firefox3?
Product: Firefox → Core
QA Contact: general → general
Comment 16•17 years ago
Carrying over Jesse's blocking nomination from comment 11
Flags: blocking1.9?
Comment 17•17 years ago
Is this the same crash that occurs for this site? http://www.yuribou.net/blog
I get: http://crash-stats.mozilla.com/report/index/c47864d9-1044-11dd-9b5b-0013211cbf8a
Assignee
Comment 18•17 years ago
Assignee
Comment 19•17 years ago
Assignee
Comment 20•17 years ago
Assignee
Comment 21•17 years ago
Yes, I can reproduce a crash using the URL given in comment 17. My first thought was that this can't be related to the PSM changes, because UI's URL is http. However, that page seems to load a webbug pixel GIF from PayPal over https... Before we crash, there is a period of time when Firefox hangs in a busy loop. I've captured a stack while it is still running; see attachment 317040. After we crash, I have the stack from attachment 317041. This is similar to what UI got; we have the same root. But we crash in a different location, which smells like memory corruption. I've also attached the stacks of the other threads after the crash, attachment 317044.
Assignee
Comment 22•17 years ago
I've backed out the patch from bug 420187 for testing purposes. The page from comment 17 still takes ages to load, and Firefox is stuck in a busy loop for a while. But then it eventually wakes up, displays the web page, connects to paypal, and succeeds in loading. No crash. I'm not yet fully convinced that my patch is the culprit for the crash, but I agree, at least it triggered it.
Assignee
Updated•17 years ago
Comment 23•17 years ago
Does it help to copy mRequestSession->mURL into the download event at event creation time, so you don't have multiple threads accessing the non-threadsafe string object? That assert in your first post-crash stack trace is Really Bad. At what point does it fire? What do the thread stacks look like then?
Comment 24•17 years ago
Trying to track down what it is on the page that caused the hang:
- Removing all sidebar stuff didn't affect the hang.
- Removing the paypal webbug didn't affect the hang (so PSM is probably ruled out).
- Removing the first block of bogus links (the one hidden with the font tag, tpao.org) didn't affect the hang.
- Removing the third block of bogus links (the one hidden with the u tag, secondlife.reuters.com) didn't affect the hang.
- Removing the second block of bogus links (the one not hidden due to the corrupted a tag, fortt.com) resulted in no hang. (As I suspected, but I wanted to clean up other stuff first.)
- Replacing the corrupted
  [a href="http://www.dreamhost.com%3E%20Dreamhost%3C/a%3E%3C/p%3E%0D%0A%20%20%20%20%3C/div%3E%0D%0A%3C/div%3E%0D%0A%3C%21--%20%7E%20--%3E%3Cu%20style=" display:none=""][/a]
  with
  [a href="http://www.dreamhost.com"]Dreamhost[/a][div style="display: none;"]
  resulted in no hang. [Note: replaced angle brackets with square brackets here because I'm not sure what will happen if I try to place the original markup in the bugzilla comment.]
- Leaving the above modification in, but creating a new bogus link block using the same markup as the original and a few of the bogus links, caused no hang.
- Reinserting all of the bogus links (1500) caused the hang again.

Gradually deleting chunks of that block reduced the hang time:
- 321 links: no more than a minor spike on the CPU
- 715 links: 6 seconds
- 977 links: 21 seconds
- 1500 links (all): 35 seconds

So some combination of the malformed markup that was trying to hide the links and the large quantity of links being hidden caused the hang. With proper markup, the large number of links was not an issue. With the improper markup, a small number of links was not an issue.
Assignee
Comment 25•17 years ago
(In reply to comment #24)
> Trying to track down what it is on the page that caused the hang:

David, in your experiments, did you ever crash after the hang?
Assignee
Comment 26•17 years ago
(In reply to comment #23)
> That assert in your first post-crash stack trace is Really Bad. At what point
> does it fire? What do the thread stacks look like then?

I think it fires immediately before the crash. I just ran with

XPCOM_DEBUG_BREAK=stack ./firefox -no-remote -P trunktest -g -d gdb

and I've attached the output and full stack from gdb. I think I should run once more with XPCOM_DEBUG_BREAK=stack so we get the real full stack of the assertion. I'll try Boris' other proposal soon.
Comment 27•17 years ago
The performance problem there is a separate issue from the crash. The stack trace in comment 18 indicates that we're just doing layout, which can easily get bogged down with certain kinds of deeply-nested DOMs. It's probably worth filing a separate bug on the performance issue, especially if it's easy to reproduce with a standalone HTML file that doesn't involve hitting this site. Oh, and bugzilla will escape whatever HTML you put into comments, so you can just use angle brackets without fear. ;)
Comment 28•17 years ago
Kai, XPCOM_DEBUG_BREAK=stack more or less outputs garbage unless you pipe the output through fix-linux-stack.pl (which you didn't in this case, hence all it hands out are library offsets on your system, which are of limited use to anyone else). Can you run with XPCOM_DEBUG_BREAK=break and do a gdb backtrace at that point? Or take that output and run it through fix-linux-stack.pl?
Comment 29•17 years ago
Oh, and as I said, I'd like to see all the thread stacks, not just the one the assert is firing on, at the point in time when we hit the assert.
Assignee
Comment 30•17 years ago
Full stack for the assertion (stack 1). I told the debugger to continue, and it immediately crashed (stack 2).
Comment 31•17 years ago
OK. Can you stop at the assert again, and see what |str| looks like there? I'd really like to see all the members of that object.
Assignee
Comment 32•17 years ago
Assignee
Comment 33•17 years ago
(In reply to comment #31) > OK. Can you stop at the assert again, and see what |str| looks like there? > I'd really like to see all the members of that object. (gdb) up #2 0x00360187 in nsACString_internal::Assign (this=0xbf93ca04, str=@0xb29b4c0) at /home/kaie/moz/head/mozilla/xpcom/string/src/nsTSubstring.cpp:387 387 NS_ASSERTION(str.mFlags & F_TERMINATED, "shared, but not terminated"); (gdb) print str $1 = (const nsACString_internal &) @0xb29b4c0: {<nsCSubstring_base> = {<No data fields>}, mData = 0xfb5628 "\034οΏ½οΏ½", mLength = 16471688, mFlags = 16471716}
Assignee
Comment 34•17 years ago
Boris, thanks a lot for asking helpful questions. I'm now convinced I'm guilty. Working on a patch.
Assignee
Updated•17 years ago
Assignee: nobody → kengert
Component: General → Security: PSM
OS: Windows XP → All
QA Contact: general → psm
Hardware: PC → All
Comment 36•17 years ago
(In reply to comment #25)
> (In reply to comment #24)
> > Trying to track down what it is on the page that caused the hang:
>
> David, in your experiments, did you ever crash after the hang?

The site was crashing the browser on a build a couple days ago, but the current build just hangs for a period of time. I've stripped the web page down to the basics and will file another bug (bug 430332) for that. It also involved 2 of the CSS rules, in addition to the other factors.
Assignee
Comment 37•17 years ago
Explanation: when we crash, we are accessing memory that has already been destroyed. Originally, the code that executes an OCSP request always waited for the result. If it was necessary to cancel, it would send a cancel event and wait until the download was really canceled. Because of this design, the nsHTTPDownloadEvent object simply used a pointer without any ownership or reference counting. With the recent work in bug 420187, I changed that design to avoid deadlocks: the caller no longer waits. And here I made the mistake: I missed the non-owning pointer. The patch I've attached introduces reference counting for the object that needs to survive longer.
Assignee
Updated•17 years ago
Attachment #317083 -
Flags: review?(rrelyea)
Comment 38•17 years ago
Someone more familiar with this code needs to review that patch (and in particular, the ownership model)...
Assignee
Comment 39•17 years ago
Thanks to David for reducing the web page. I used the test case he attached to bug 430332 and added the webbug (loading image from paypal with https). This attachment crashes for me, but works with the patch applied.
Comment 40•17 years ago
Comment on attachment 317083 Patch v1

r- because of the following reservations:

1) Is ++ deemed to be a valid atomic operation on all mozilla platforms for 32-bit values? That is, are there platforms where ++ expands to:
Load r1, mRefCount
Add r1, 1
Store r1, mRefCount
or do we always know that we generate:
Add mRefCount, 1
If not, then mRefCount needs to be incremented with PR_AtomicIncrement().

2) Explicitly incrementing the reference count seems wrong to me. It seems to need a function nsNSSHttpRequestSession::AddRef() or something similar that gets the reference (and probably returns 'this').

I think case 1 is a real bug; case 2 is more of a style thing.

bob
Attachment #317083 -
Flags: review?(rrelyea) → review+
Comment 41•17 years ago
Attachment #317083 -
Flags: review+ → review-
Assignee
Comment 42•17 years ago
Attachment #317083 -
Attachment is obsolete: true
Attachment #317103 -
Flags: review?(rrelyea)
Attachment #317083 -
Flags: review?(bzbarsky)
Comment 43•17 years ago
Comment on attachment 317103 Patch v2 r+ much better (though I would have liked to see this->AddRef() rather than just AddRef(), this patch is sufficient). bob
Attachment #317103 -
Flags: review?(rrelyea) → review+
Assignee
Comment 44•17 years ago
Addressed Bob's proposal to use this->AddRef(). Carrying forward r=rrelyea; requesting approval.
Attachment #317103 -
Attachment is obsolete: true
Attachment #317105 -
Flags: review+
Attachment #317105 -
Flags: approval1.9?
Comment on attachment 317105 Patch v2 with nit addressed a=shaver
Attachment #317105 -
Flags: approval1.9? → approval1.9+
Flags: in-testsuite?
Flags: blocking1.9?
Flags: blocking1.9+
Assignee
Comment 46•17 years ago
checked in
Status: NEW → RESOLVED
Closed: 17 years ago
Resolution: --- → FIXED
Comment 47•17 years ago
I just installed a post-patch build:
Mozilla/5.0 (Windows; U; Windows NT 6.0; en-US; rv:1.9pre) Gecko/2008042218 Minefield/3.0pre ID:2008042218
The testcase from #39 hangs Fx indefinitely for me. On first clicking it, there is a short period (10 seconds or so) where Fx freezes, then it loads the page. Then, some seconds later, Fx freezes and becomes unresponsive.
Comment 48•17 years ago
(In reply to comment #47)
> i just installed a build post-patch:
> Mozilla/5.0 (Windows; U; Windows NT 6.0; en-US; rv:1.9pre) Gecko/2008042218
> Minefield/3.0pre ID:2008042218
>
> The testcase from #39 hangs Fx indefinitely for me. On first clicking it, there
> is a short (10seconds or so) where Fx freezes, then it loads the page. Then
> some seconds later Fx will freeze and become unresponsive.
>

I see now: if you stop mousing over the links, the freeze will subside in about 10 seconds. Apologies! Still rather nasty :)
Comment 49•17 years ago
Bryan, please don't quote comments in their entirety. The whole point of comment 39 is that the testcase comes from a performance bug (the one referenced in that comment, and comment 36). So yes, you'll see a performance issue on that testcase. That has nothing to do with this bug.
Comment 50•17 years ago
I'm seeing this problem as well. The instructions in comment #6 sound strikingly similar to what I'm doing when it happens. Usually I'm at www.linuxtoday.com and doing lots of right-click -> "open in new tab" when this happens. I see this BZ is marked resolved. I am using 2.0.0.14. Does this mean the fix will be in 2.0.0.15 or just a standalone Gecko update?
Reporter
Comment 51•17 years ago
Bryan, this bug was filed against the trunk builds, not the 2.0 branch, so whatever you're seeing should be covered by another bug, not this one.
Comment 52•17 years ago
Operating System: Windows Vista Home Premium (32-bit)
I was closing a web page tab (a forum), leaving two tabs still open (MySpace and Yahoo Mail), when Firefox crashed. Windows showed a pop-up information box stating that the program was encountering issues, and gave the option to close the program. I clicked the "close program" button. I re-opened Firefox and was given the option of restoring the session, so nothing was lost from the prior session. I was playing an online playlist from a MySpace page in one of the remaining open tabs. More Flash issues?
Updated•13 years ago
Crash Signature: [@ PR_AtomicIncrement]
[@ nsIOService::NewURI]