Closed Bug 470274 Opened 16 years ago Closed 16 years ago

[Mac] Firefox cannot visit any websites when waking from sleep

Categories

(Core :: Networking, defect, P2)

x86
macOS
defect

RESOLVED FIXED

People

(Reporter: cmtalbert, Assigned: dcamp)

Details

(Keywords: fixed1.9.1)

Attachments

(1 file)

This happens consistently, but it is not readily reproducible on demand. It happens to me about every two days, and it occurs on the latest nightlies of both m-c trunk builds and Shiretoko (1.9.1) builds.

== Steps ==
1. Start Firefox in a new profile. Browse to three sites in three different tabs.
2. Ensure that one tab is a site where you have to log in, and have it remember your password (in my case the first two sites were mozilla.org sites (the default front pages) and the third site was Facebook).
3. Put your Mac to sleep (I do this by closing the lid).
4. Wake your Mac from sleep (by opening the lid).
5. Firefox will sit there, ready to go. But anything that requires Firefox to hit the network will fail. You can:
 * Create a new tab, type a URL in the URL bar, and hit Enter. Firefox will declare "Done" in the status bar and stare at you with a blank page.
 * Interact with the existing sites from step 2: nothing happens.
 * Put something in the search box and hit Enter: you get a blank page and the "Done" notification.

The only way out of this predicament is to restart the browser.

This started occurring intermittently in early November. When complaining on IRC, I found that others have also been experiencing this, so it isn't just me, and it isn't my profile, though I did manage to reproduce this on clean profiles as well. However, I could not get this to reproduce on a debug build with logging enabled. So unfortunately, I don't know why we seem to be dropping our networking stuff on the floor.

I'm running OS X 10.5.5. In fact, this only *started* happening after I upgraded from OS X 10.4, which I did in early November.

The Minefield UA string for the build I just repro'd it on:
Mozilla/5.0 (Macintosh; U; Intel Mac OS X 10.5; en-US; rv:1.9.2a1pre) Gecko/20081217 Minefield/3.2a1pre

jrabbit on #qa said that there are some sleep tools on Mac OS X that might help simulate and recreate this issue, in particular SleepX: /Developer/Applications/Utilities/SleepX.app

I don't think we want to ship with a bug like this in the product. Setting blocking ? and qawanted.
Flags: blocking-firefox3.1?
Also on nightly build 20081217020325 and OSX 10.5.6. I noticed something else. After the Mac sleeps overnight I *always* get this problem. But not for shorter sleeps during the day. Can't say how short at this point...
same problem if you open a new window? (instead of new tab)
(In reply to comment #2)
> same problem if you open a new window? (instead of new tab)

Yes, even if I open a new window. In fact, when just checking my bug mail, I encountered this issue this morning, and spent some time debugging it.

With Gavin's help, we found that for some reason, File->Work Offline is being set after the browser returns from sleep. When you get into this state, you can see that File->Work Offline is definitely checked.

Now, the interesting thing here is that when you are offline and you attempt to go to a new site, you are supposed to get an informational page telling you that you are in "Offline Mode" and therefore you cannot browse the web. However, with this bug, you do NOT get that message.

Digging deeper, I discovered that tabbrowser.xml::onLocationChange, which should be called any time you enter a new URL in the URL bar and hit Enter, is NOT called in this state. If you click the "Work Offline" menu entry, which seems to shock Firefox out of its odd state, then you will see that onLocationChange is called and you will get the standard "You are offline and cannot browse the web" message. If you click Work Offline again, then you will be online and everything functions normally.

So, I think (hypothesizing here) that this is what is happening:
1. Mac resumes from sleep.
2. Firefox comes back before the networking connection does, determines we have no network connection, and puts the "WorkOffline" observer into a weird state.
3. Mac obtains its network connection, but Firefox is none the wiser.
4. The first click to File->Work Offline causes the observer to reset itself to a coherent state, forcing you offline. This is why the offline subsystem starts working.
5. Clicking File->Work Offline again does what it's supposed to and causes you to go online.

I'm not entirely sure what component offline/online bugs should be in. A quick search shows them all over the place. CCing dcamp to figure that out.

Also, bug 327381 looks like it might be related to this behavior.

And the current build I just recreated this in:
Mozilla/5.0 (Macintosh; U; Intel Mac OS X 10.5; en-US; rv:1.9.2a1pre) Gecko/20090124 Minefield/3.2a1pre
Keywords: qawanted
Clint, this sounds like bug 469459, which should probably be reopened if you still have problems after hibernation.
--> Core::Networking for further evaluation based on comment 3
Component: Shell Integration → Networking
Flags: blocking-firefox3.1?
Product: Firefox → Core
QA Contact: shell.integration → networking
Version: 3.1 Branch → Trunk
Flags: blocking1.9.1?
Clint: shouldn't you be able to test your hypothesis in comment 3 by turning off the airport, getting a wired connection, sleeping the computer, pulling the wire, and then restoring? I sleep my computer every day, and wake it (often in a different environment, with a different network) and don't see this issue. We don't have really solid STR here, so I have a lot of trouble blocking on it. Please renominate if you can reproduce consistently.
Flags: blocking1.9.1? → blocking1.9.1-
Oh - also, have we tested the difference between sleep and hibernate here? You can install the "SmartSleep" prefpane to force your system to do one vs. the other.
I haven't been able to reproduce this. Next time someone can, can you run these commands in the console and let me know the output:

Components.classes["@mozilla.org/network/network-link-service;1"].getService(Components.interfaces.nsINetworkLinkService).isLinkUp
Components.classes["@mozilla.org/network/network-link-service;1"].getService(Components.interfaces.nsINetworkLinkService).linkStatusKnown
Components.classes["@mozilla.org/network/io-service;1"].getService(Components.interfaces.nsIIOService2).manageOfflineStatus
Components.classes["@mozilla.org/network/io-service;1"].getService(Components.interfaces.nsIIOService2).offline

And also verify the checked state of File->Work Offline. And while you're at it, the output of /sbin/ifconfig in that state might be useful too.
(In reply to comment #0)
> 2. Ensure that one tab is a site where you have to log in, and have it remember
> your password (in my case the first two sites were mozilla.org sites (the
> default front pages) and the third site was facebook.

This step might be the key. Perhaps reproducing this bug requires having sites loaded that have in-progress XMLHttpRequests when you put the computer to sleep? I have two instances of Gmail in two different browser processes running pretty much all the time, which might explain why I see this more than others would.
(In reply to comment #8)
I just reproduced this, and got the following values:
> nsINetworkLinkService.isLinkUp true
> nsINetworkLinkService.linkStatusKnown true
> nsIIOService2.manageOfflineStatus true
> nsIIOService2.offline false
which is pretty much what I would expect... unchecking File->Work Offline twice fixed it.
Thunderbird has a similar problem (bug 475922) - I think we make XMLHttpRequests for rss feeds, but we do it rarely...
We check for new mail if you've hibernated for more than 10 minutes, though...
I get the same results for Thunderbird, but necko is unable to make new connections, so it is not on the same page. Does it cache the ioService->GetOffline result somewhere and never check it again, unless it gets a notification? Thunderbird doesn't cache the result, so the mailnews code knows we're online and tries to run urls.
see https://bugzilla.mozilla.org/show_bug.cgi?id=473483#c26 for the cause of this problem - we're getting re-entrant calls to nsIOService::SetOffline.
What I see happening is that when I re-open my laptop, we process an event that puts us offline, and send out that notification. The socket transport service receives that notification and shuts down a thread, which causes events to get pumped again:

#29 0x11c0dcac in nsAppShell::OnProcessNextEvent at nsAppShell.mm:766
#30 0x004ff7b2 in nsThread::ProcessNextEvent at nsThread.cpp:497
#31 0x00488a96 in NS_ProcessNextEvent_P at nsThreadUtils.cpp:227
#32 0x004fff4d in nsThread::Shutdown at nsThread.cpp:465
#33 0x11796276 in nsSocketTransportService::Shutdown at nsSocketTransportService2.cpp:445

In the process of pumping events, OS X notices that it's actually online, and we generate a going-online event while we're still notifying listeners that we're going offline:

#4 0x004a095c in nsObserverService::NotifyObservers at nsObserverService.cpp:181
#5 0x118528fc in nsNetworkLinkService::SendEvent at nsNetworkLinkService.mm:207
#6 0x11852946 in nsNetworkLinkService::ReachabilityChanged at nsNetworkLinkService.mm:220
#7 0x95b38cc6 in rlsPerform
#8 0x931825f5 in CFRunLoopRunSpecific
#9 0x93182cd8 in CFRunLoopRunInMode
#10 0x900d9d75 in -[NSRunLoop(NSRunLoop) runMode:beforeDate:]
#11 0x11c0e4fb in nsAppShell::ProcessNextNativeEvent at nsAppShell.mm:615

So this is a core bug. I don't know how we want to fix it, though. One thought is to have the network link service wait a little bit before generating notifications, and check that the state has truly changed before sending the notification, perhaps by using a timeout. That handles the case where we get online->offline followed immediately by offline->online: it would prevent any notifications in this situation, at the cost of leaving a small window where the OS knows we're offline but the app does not. Or we could make the io service notice when it's re-entered and try to generate the correct notifications.
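For illustration, the first idea (wait briefly, then notify only if the state has truly changed) could look roughly like the sketch below. This is not code from the tree: LinkChangeCoalescer, OnRawChange, Tick, and the settle interval are all hypothetical names, and time is passed in explicitly rather than read from a real timer.

```cpp
#include <cassert>
#include <vector>

// Hypothetical coalescing filter for link-state changes.
// This is NOT the real nsNetworkLinkService; it only illustrates the idea.
class LinkChangeCoalescer {
public:
    LinkChangeCoalescer(long settleMs, bool initiallyUp)
        : mSettleMs(settleMs), mReportedUp(initiallyUp), mPendingUp(initiallyUp) {}

    // Record a raw reachability change reported by the OS at time nowMs.
    void OnRawChange(bool linkUp, long nowMs) {
        mPendingUp = linkUp;
        mLastChangeMs = nowMs;
        mDirty = true;
    }

    // Called periodically (for example from a timer). Emits at most one
    // notification, and only once the state has been stable for mSettleMs
    // and differs from what was last reported.
    void Tick(long nowMs) {
        if (!mDirty || nowMs - mLastChangeMs < mSettleMs) {
            return;
        }
        mDirty = false;
        if (mPendingUp != mReportedUp) {
            mReportedUp = mPendingUp;
            notifications.push_back(mReportedUp);
        }
    }

    // Notifications that would have been sent: true = link up, false = link down.
    std::vector<bool> notifications;

private:
    long mSettleMs;
    bool mReportedUp;
    bool mPendingUp;
    long mLastChangeMs = 0;
    bool mDirty = false;
};
```

With a 500 ms settle interval, an online->offline change followed 100 ms later by offline->online produces no notification at all, which is the trade-off described above: observers never see the short offline blip.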
Attached patch: one way (deleted)
This patch saves the last SetOffline() value when SetOffline() is called while we are in the process of bringing down the services. When control returns to the toplevel SetOffline() call, it re-applies the last value if necessary. I still can't reproduce this; could somebody who can reproduce it try this out?
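A minimal, self-contained sketch of the approach described in this comment follows. It is not the attached patch: OfflineController, onTransition, and transitions are hypothetical stand-ins, and only the member names mSettingOffline and mSetOfflineValue are taken from this bug. The observer callback models the nested event loop that re-enters SetOffline().

```cpp
#include <cassert>
#include <functional>
#include <string>
#include <vector>

// Hypothetical stand-in for the I/O service; NOT the real nsIOService.
class OfflineController {
public:
    // Observers may call SetOffline() again from inside this callback,
    // which models the re-entrant call seen in the stack traces.
    std::function<void(bool offline)> onTransition;

    // Record of the transitions that were actually applied, in order.
    std::vector<std::string> transitions;

    void SetOffline(bool offline) {
        // Always remember the most recent request.
        mSetOfflineValue = offline;
        if (mSettingOffline) {
            // Re-entrant call: the outermost call will apply the saved value.
            return;
        }
        mSettingOffline = true;
        // Keep applying until the current state matches the last request.
        while (mSetOfflineValue != mOffline) {
            mOffline = mSetOfflineValue;
            transitions.push_back(mOffline ? "offline" : "online");
            if (onTransition) {
                onTransition(mOffline);
            }
        }
        mSettingOffline = false;
    }

    bool IsOffline() const { return mOffline; }

private:
    bool mOffline = false;
    bool mSettingOffline = false;   // true while the outermost call is running
    bool mSetOfflineValue = false;  // last value requested by any caller
};
```

If an observer requests "online" while the "offline" transition is still being announced, the request is saved rather than acted on immediately, and the outermost call then performs a clean offline->online transition, so the service ends up in the state that was requested last.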
I'll try this out right now - one thing I've found is that the socket transport service does not like being Initted while the call to shutdown is still on the stack. I tweaked the ioservice to send the right notifications, but that still left the transport service in an unusable state.
so yes, this seems to be working much better - the patch doesn't quite cleanly apply to my 1.9.1 branch because it has mManageOfflineStatus(PR_FALSE)

+ , mSettingOffline(PR_FALSE)
+ , mSetOfflineValue(PR_FALSE)
  , mManageOfflineStatus(PR_TRUE)

but I applied that part by hand, leaving mManageOfflineStatus(PR_FALSE), and tried half a dozen times to break it w/o success.
The patch works perfectly with my 100% reproducible Thunderbird setup, too.
Dave, do you want to request code review from someone? I'm not sure who the necko peers are anymore :-)
And I flipped offline.autoDetect to false by default on OS X for Thunderbird in bug 473483, so anyone testing in Tb will need to set it back to true to tell what they're seeing.
Renomming, now that we have a much better idea of what it is.
Flags: blocking1.9.1- → blocking1.9.1?
Not sure if this is the right bug for me. I've had the same problems on Windows Vista 32-bit SP1 with Thunderbird nightlies for many months, maybe a year. After waking from Vista's sleep / energy saving mode (on my desktop PC), the Thunderbird 3 nightly isn't able to reconnect to the network, and Firefox 3.0.* is frozen until I close Thunderbird. This problem affects all Firefox instances across multiple user accounts. After closing / killing the Thunderbird process, Firefox works again across those user accounts.
Losing network connectivity in such a way that users are compelled to restart the app is serious; it seems like we should block.
Flags: blocking1.9.1? → blocking1.9.1+
Attachment #360738 - Flags: superreview?(bzbarsky)
Attachment #360738 - Flags: review?(bzbarsky)
Priority: -- → P2
Comment on attachment 360738: one way
Document the members a bit, and looks good.
Attachment #360738 - Flags: superreview?(bzbarsky)
Attachment #360738 - Flags: superreview+
Attachment #360738 - Flags: review?(bzbarsky)
Attachment #360738 - Flags: review+
What's next here? Is dcamp's patch good enough?
Assignee: nobody → dcamp
Status: NEW → RESOLVED
Closed: 16 years ago
Keywords: fixed1.9.1
Resolution: --- → FIXED
This appears to have busted the Thunderbird build.
Error is: /Volumes/Build/osx-comm-1.9.1-check/build/mozilla/netwerk/base/src/nsIOService.cpp:166: error: class 'nsIOService' does not have any field named 'mSettingOfflineValue'
Nevermind. Red seems to be clearing now.
Blocks: 473483