318801 - Exiting firefox with AFS-based profile: ~/.mozilla/firefox/default.xxx/.parentlock is not removed

Reporter

Description

•

19 years ago

User-Agent: Mozilla/5.0 (X11; U; Linux i686; en-US; rv:1.8) Gecko/20051111 Firefox/1.5
Build Identifier: Mozilla/5.0 (X11; U; Linux i686; en-US; rv:1.8) Gecko/20051111 Firefox/1.5

Very nearly always, when I quit firefox and try to run it again, it refuses to run unless I manually delete the ~/.mozilla/firefox/default.xxx/.parentlock file.

Reproducible: Always

Steps to Reproduce:
1. run firefox
2. open a few windows
3. close firefox
4. run firefox again

Actual Results: firefox complains that it's already running, and suggests closing all firefox windows, or restarting the system.

Expected Results: Firefox would simply run.

This reminds me of the profile already in use bugs in the firefox startup script. It shouldn't matter that it's already in use; *** I'M *** the one using it. Just open another window already, and stop bugging me! :)

Jason Quinn

Comment 1

•

19 years ago

(In reply to comment #0) I have the same problem. I also use Linux (Red Hat). The name of the profile directory is whatever it happens to be for a given user. Mine is "ao1g3bdz.default". This is a very annoying bug and novice users have no clue that such a thing as a lock file even exists so they are totally baffled.

Eddy De Greef

Comment 2

•

19 years ago

(In reply to comment #1) Same problem here with 1.5: quite often the .parentlock is left behind. System: Red Hat Enterprise Linux AS release 3 (Taroon Update 4)

Benjamin Smedberg

Comment 3

•

19 years ago

Brendan, this could be fallout from the new locking changes.

Brendan Eich [:brendan]

Comment 4

•

19 years ago

You mean roc's new changes? I don't see how. I happen to know the reporter (Hi Greg!) and he said that his home directory is AFS-hosted. So I bet this bug is peculiar to AFS, which is why it bites few enough people to be UNCO. Need some reproducibility to confirm the bug. Anyone have testing access to an AFS exported filesystem? /be

Brendan Eich [:brendan]

Comment 5

•

19 years ago

Jason, Eddy: are you guys using NFS or anything akin (AFS, Samba) for your home directories in which your profiles live? /be

Jason Quinn

Comment 6

•

19 years ago

(In reply to comment #5) > Jason, Eddy: are you guys using NFS or anything akin (AFS, Samba) for your home > directories in which your profiles live? Yes, I am on a network using AFS! I have been using 1.5 since Beta 1 and have noticed this problem on every release since then.

Greg Schussman

Reporter

Comment 7

•

19 years ago

(In reply to comment #6)
> (In reply to comment #5)
> > Jason, Eddy: are you guys using NFS or anything akin (AFS, Samba) for your home
> > directories in which your profiles live?
>
> Yes, I am on a network using AFS!
>
> I have been using 1.5 since Beta 1 and have noticed this problem on every
> release since then.

That matches my experience. The problem first showed up in 1.5 Beta 1.

Brendan Eich [:brendan]

Updated

•

19 years ago

Status: UNCONFIRMED → NEW

Component: Startup and Profile System → Profile: BackEnd

Ever confirmed: true

Product: Firefox → Core

QA Contact: startup → profile-manager-backend

Summary: On exiting firefox, ~/.mozilla/firefox/default.xxx/.parentlock is not removed → Exiting firefox with AFS-based profile ~/.mozilla/firefox/default.xxx/.parentlock is not removed

Version: unspecified → Trunk

Brendan Eich [:brendan]

Comment 8

•

19 years ago

The fix for 1.5 was roc's patch for bug 151188. It should not result in a hang on restart with an AFS-hosted profile directory, unless AFS has a bug where an fcntl file lock is not automatically cleaned up when the process that acquired it exits. Can someone try running a simple test program after cd'ing to an AFS directory that you can write in, and that contains no files? I'll attach one. /be

Summary: Exiting firefox with AFS-based profile ~/.mozilla/firefox/default.xxx/.parentlock is not removed → Exiting firefox with AFS-based profile: ~/.mozilla/firefox/default.xxx/.parentlock is not removed

Brendan Eich [:brendan]

Comment 9

•

19 years ago

Attached file test program (deleted) — Details

This should compile via gcc -o t t.c or whatever (cc, etc.). Run it once to create .parentlock in the cwd. Run it again and report if/how it fails. /be
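Since the attachment has been deleted, here is a minimal sketch of the kind of test described above. It is not the original t.c: it is written in Python, whose `fcntl.lockf` wraps the same fcntl record-locking calls, and the helper name `try_parentlock` and its return convention are assumptions made for illustration.

```python
import fcntl
import os


def try_parentlock(directory="."):
    """Try to take a non-blocking write lock on <directory>/.parentlock.

    Returns (fd, None) on success; the lock is held until fd is closed or
    the process exits. Returns (None, errno) if the lock could not be taken.
    Hypothetical helper, not the deleted test program.
    """
    path = os.path.join(directory, ".parentlock")
    # Create the lock file if needed; it is never unlinked afterwards.
    fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_TRUNC, 0o666)
    try:
        # LOCK_EX | LOCK_NB asks for an exclusive whole-file lock and
        # fails immediately instead of waiting if the lock is held.
        fcntl.lockf(fd, fcntl.LOCK_EX | fcntl.LOCK_NB)
    except OSError as exc:
        os.close(fd)
        return None, exc.errno
    return fd, None
```

Calling `try_parentlock(".")` from two separate runs in an otherwise empty directory should succeed both times on a healthy filesystem, because the kernel is expected to drop the lock when the first process exits. A failure on the second run would point at a lock that outlived its owner.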

Greg Schussman

Reporter

Comment 10

•

19 years ago

(In reply to comment #9)
> Created an attachment (id=205439) [edit]
> test program
>
> This should compile via gcc -o t t.c or whatever (cc, etc.). Run it once to
> create .parentlock in the cwd. Run it again and report if/how it fails.
>
> /be

I ran it in one of my afs directories. First time creates the .parentlock file. Second time removes the .parentlock file. Everything seems fine; no fireworks.

Greg

Jason Quinn

Comment 11

•

19 years ago

(In reply to comment #9) > This should compile via gcc -o t t.c or whatever (cc, etc.). Run it once to > create .parentlock in the cwd. Run it again and report if/how it fails. I ran it on two different machines. The first run creates a ".parentlock" file but a second run seems to do nothing. It does not delete the file. Here's the system info: REDHAT 2.4.21-32.0.1.EL OpenAFS 1.2.13 REDHAT 2.4.20-31.9 #1 OpenAFS 1.2.11

Brendan Eich [:brendan]

Updated

•

19 years ago

Attachment #205439 - Attachment is patch: false

Brendan Eich [:brendan]

Comment 12

•

19 years ago

(In reply to comment #10)
> I ran it in one of my afs directories. First time creates the .parentlock
> file. Second time removes the .parentlock file. Everything seems fine; no
> fireworks.

Odd, nothing in the test program calls unlink or remove. Are you sure the file is removed after every other run? Does this every-other-time behavior continue the third and fourth times?

(In reply to comment #11)
> I ran it on two different machines. The first run creates a ".parentlock" file
> but a second run seems to do nothing. It does not delete the file.
>
> Here's the system info:
>
> REDHAT 2.4.21-32.0.1.EL
> OpenAFS 1.2.13
>
> REDHAT 2.4.20-31.9 #1
> OpenAFS 1.2.11

Jason, please try this: comment out the very last close(fd) call in the program, recompile, and retest. Greg, if you can do the same, please post results as well.

/be

Jason Quinn

Comment 13

•

19 years ago

(In reply to comment #12) > Jason, please try this: comment out the very last close(fd) call in the > program, recompile, and retest. Greg, if you can do the same, please post > results as well. I get the same result. The first run creates the file and subsequent runs appear to do nothing. The ".parentlock" file still exists.

Brendan Eich [:brendan]

Comment 14

•

19 years ago

Ok, another idea: run strace(1) or equivalent on your firefox-bin that hangs. A system call trace from the prior, non-hanging firefox-bin startup would be helpful too. We should be able to see what's going on from the hanging case's trace. /be

Greg Schussman

Reporter

Comment 15

•

19 years ago

(In reply to comment #12)
> (In reply to comment #10)
> > I ran it in one of my afs directories. First time creates the .parentlock
> > file. Second time removes the .parentlock file. Everything seems fine; no
> > fireworks.
>
> Odd, nothing in the test program calls unlink or remove. Are you sure the file
> is removed after every other run? Does this every-other-time behavior continue
> the third and fourth times?

I thought that was odd too. If memory serves, it did that every other time. The problem is that the behavior is intermittent. I've had firefox startup fail once today with the .parentlock problem, but it's been starting up fine since then, and the .parentlock from your test program no longer disappears. I can't seem to reproduce the strange behavior right now. :(

> (In reply to comment #11)
> > I ran it on two different machines. The first run creates a ".parentlock" file
> > but a second run seems to do nothing. It does not delete the file.
> >
> > Here's the system info:
> >
> > REDHAT 2.4.21-32.0.1.EL
> > OpenAFS 1.2.13
> >
> > REDHAT 2.4.20-31.9 #1
> > OpenAFS 1.2.11
>
> Jason, please try this: comment out the very last close(fd) call in the
> program, recompile, and retest. Greg, if you can do the same, please post
> results as well.

Next time strange things start happening, I'll give it a try. But for now, I can't reproduce the problem.

> /be

Brendan Eich [:brendan]

Comment 16

•

19 years ago

Pretty clearly an AFS bug; perhaps we can develop a workaround. Even better if it's known and patched by some AFS update. Can those in the know about AFS check and report any such known bugs, and whether fixes are available or impending? /be

Andrew Brady

Comment 17

•

19 years ago

Just for info. I get the same intermittent behaviour with Mozilla/5.0 (X11; U; Linux i686; en-US; rv:1.8) Gecko/20051111 Firefox/1.5 on Linux 2.6.5-7.201-default (SuSE93) on an NFS mounted $HOME. I tried the test program but it works ok and gives no output and does not remove the file.

Jeffrey Altman

Comment 18

•

19 years ago

OpenAFS.org has opened a ticket to track this bug. http://rt.central.org/rt/Ticket/Display.html?id=25037 OpenAFS 1.2.x is rather old. Can someone please reproduce this problem with build 1.4.1-rc3? http://dl.openafs.org/dl/openafs/candidate/1.4.1-rc3/ Thank you. Jeffrey Altman

M.J.G.

Comment 19

•

19 years ago

(In reply to comment #16) > Pretty clearly an AFS bug; perhaps we can develop a workaround. Even better if > it's known and patched by some AFS update. Can those in the know about AFS > check and report any such known bugs, and whether fixes are available or > impending? I get the same behaviour (not in a reproducible way, though) with Thunderbird 1.5, but not with Firefox 1.5. This is on a Suse 9.3 box (not my choice...) with an AFS client running, BUT my $HOME is on NFS, not AFS. I'm using the release builds of TB and FF. Michael

Andrew Brady

Comment 20

•

19 years ago

Attached file Stub of an strace of firefox showing two cases (deleted) — Details

Andrew Brady

Comment 21

•

19 years ago

Unfortunately, at the time I did not check out the system proc information. Now I am back to using Mozilla (in order to plan migration to firefox/thunderbird) so there is less chance that I will be able to catch the instance again. Will try though.

Brendan Eich [:brendan]

Comment 22

•

19 years ago

(In reply to comment #19) > I get the same behaviour (not in a reproducible way, though) with Thunderbird > 1.5, but not with Firefox 1.5. This is on a Suse 9.3 box (not my choice...) > with an AFS client running, BUT my $HOME is on NFS, not AFS. Then you're seeing an NFS-still-sucks-after-two-decades bug, not this bug. Please find one on file, or file a new one if you can't find an existing one to add to. I did the first SGI NFS port in late '85. IIRC in '86 the first lockmgr/lockd disaster from Sun showed up. In '95 I wrote the symlink-based profile locking code for Netscape 2, based on the obvious need to work around numerous buggy lockd+NFS implementations in the field. Is there no hope of progress? Maybe in another ten years. In the mean time, should we revert to the symlink-not-fcntl locking on Linux and even Mac OS X? That sucks, especially if the latter is only to avoid a buggy and probably rarely used SMB client implementation (see bug 309323). Roc, sfraser, what do you think? /be

Simon Wilkinson

Comment 23

•

19 years ago

Just as a data point, I'm seeing this with Thunderbird and an AFS 1.4.0 client on Linux. It doesn't seem to be entirely repeatable, though. Relevant bits of an strace would seem to be:

open("/afs/inf.ed.ac.uk/user/s/sxw/.thunderbird/6cid6d4u.default/.parentlock", O_WRONLY|O_CREAT|O_TRUNC, 0666) = 5
fcntl64(5, F_GETLK, {type=F_WRLCK, whence=SEEK_SET, start=0, len=1, pid=18971}) = 0
fcntl64(5, F_SETLK, {type=F_WRLCK, whence=SEEK_SET, start=0, len=0}) = -1 EAGAIN (Resource temporarily unavailable)
close(5) = 0
write(1, "fcntl(F_SETLK) failed. errno = 1"..., 34) = 34

rm-ing the .parentlock file was sufficient to get Thunderbird working again.

Robert O'Callahan (:roc) (email my personal email if necessary)

Comment 24

•

19 years ago

The old code could (and did) hork users with no network filesystems at all, who I suspect are the vast majority of users. I've got another idea. How about we create a temporary file in the same directory with a random unassigned name, and try to lock that with SETLK. If that fails, we know the filesystem is broken and we can take the symlink path. What do you think?
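The probe proposed in this comment could look roughly like the following sketch. The function name `fcntl_locking_works` and the `.lockprobe-` file prefix are hypothetical, and Python's `fcntl.lockf` stands in for the fcntl SETLK call.

```python
import fcntl
import os
import tempfile


def fcntl_locking_works(directory):
    """Return True if a scratch file in `directory` can be write-locked.

    Hypothetical probe: a False result would send the caller down the
    symlink locking path instead of fcntl locking.
    """
    # mkstemp picks an unused random name and opens the file read-write.
    fd, path = tempfile.mkstemp(prefix=".lockprobe-", dir=directory)
    try:
        fcntl.lockf(fd, fcntl.LOCK_EX | fcntl.LOCK_NB)
        return True
    except OSError:
        return False
    finally:
        os.close(fd)     # closing the descriptor also releases the lock
        os.unlink(path)  # leave no scratch file behind
```

A passing probe only shows that locking worked once on that directory; it says nothing about whether a lock would later be released when its owner exits.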

Andrew Brady

Comment 25

•

19 years ago

I agree with Brendan that it is an NFS problem. If my opinion was worth anything in this, I would vote for removal of the parentlock mech and rely on the symlink only. It doesn't make a lot of difference to me though, I will simply rm the parentlock in the startup script. It appears that 99% of the time, the parentlock process works fine in my org's setup, but every now and again, when the firefox/thunderbird process exits legitimately or otherwise, somewhere in the client-nfs-kernel-network-nfs-server stack hiccups and the lock is not removed. If it makes any difference, the NFS server here is an HP-UX11 cluster, so moving the NFS service between nodes on the cluster would mean that the locking doesn't persist anyway (I think). Cheers, Andy.

Brendan Eich [:brendan]

Comment 26

•

19 years ago

(In reply to comment #24)
> The old code could (and did) hork users with no network filesystems at all, who
> I suspect are the vast majority of users.

Yes, due to the symlink containing a dangling IP address, or less likely, due to pid replay. It wasn't perfect, but here we are.

> I've got another idea. How about we create a temporary file in the same
> directory with a random unassigned name, and try to lock that with SETLK. If
> that fails, we know the filesystem is broken and we can take the symlink path.
> What do you think?

If the failures are intermittent, as reported, the random attempt might fail or not, and its success would not predict success locking .parentlock.

The SMB bug is easy to fix, no account of EOPNOTSUPP. I wish fcntl could make up its mind (from the man page) about whether EACCES or EAGAIN was the result of attempting to take a held lock. If we could count on EACCES meaning "lock taken" and EAGAIN meaning "I'm broken", ....

/be
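The errno ambiguity mentioned here can be made concrete with a small sketch. POSIX allows F_SETLK to report either EACCES or EAGAIN when another process holds a conflicting lock, so both have to be read as "held". Treating EOPNOTSUPP and ENOLCK as "locking unusable" is an assumption of this sketch, and `classify_lock_error` is a hypothetical helper.

```python
import errno

# POSIX permits either value when another process holds a conflicting lock.
LOCK_HELD_ERRNOS = {errno.EACCES, errno.EAGAIN}

# Errors read here as "fcntl locking is unusable on this filesystem".
# This grouping is an assumption made for the sketch.
LOCK_UNUSABLE_ERRNOS = {errno.EOPNOTSUPP, errno.ENOLCK}


def classify_lock_error(err):
    """Map an errno from a failed F_SETLK to 'held', 'unusable' or 'unknown'."""
    if err in LOCK_HELD_ERRNOS:
        return "held"
    if err in LOCK_UNUSABLE_ERRNOS:
        return "unusable"
    return "unknown"
```

Because a filesystem that leaks locks also answers with EAGAIN, as in the strace in comment 23, a classifier like this cannot tell a live holder from a stale lock; that is the limitation being lamented above.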

Robert O'Callahan (:roc) (email my personal email if necessary)

Comment 27

•

19 years ago

Okay then, is there any reliable way to detect that the link is on a network filesystem?

Jeffrey Altman

Comment 28

•

19 years ago

(In reply to comment #23) > Just as a data point, I'm seeing this with Thunderbird and an AFS 1.4.0 client > on Linux. It doesn't seem to be entirely repeatable, though. OpenAFS believes this problem has been fixed in OpenAFS 1.5.1 by the incorporation of this patch: http://www.openafs.org/cgi-bin/cvsweb.cgi/openafs/src/afs/LINUX/osi_vnodeops.c.diff?r1=1.124&r2=1.125 Other file systems may have a similar problem caused by failing to register their locks with the Linux kernel. Without the registration the file system will not be notified when the lock should be released. Jeffrey Altman

Albrecht Gebhardt

Comment 29

•

19 years ago

Attached patch avoid fcntl locks on AFS patch (deleted) — Details — Splinter Review

Recently I had the same problem with the Ubuntu dapper versions of firefox (1.5.0.2) and openafs (1.4.0). So I added an isOnAFS() check function to nsProfileLock.cpp to revert to the old style symlink locks for AFS.

The check uses a special feature of AFS: AFS supports hard links, but only within the same subdirectory, otherwise you get an error (EXDEV, Cross-device link). So I execute the following steps:

* create a file (lockFilePath + ".afs")
* create a subdir (lockFilePath + ".dir")
* hard link another file (lockFilePath + ".lnk") to file no 1
* if successful:
  * try to hardlink file no 1 to a file within the subdir: (lockFilePath + ".dir/lnk")
  * if this fails: AFS detected
  * otherwise: no AFS (or not detectable for other reasons)

Sorry for the ugly string handling in my patch (malloc ...), but I tried to use nsACStrings without success, I'm no C++ programmer :-)

With this patch I get the old style symlink locks back, which work better on AFS.

Albrecht
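The detection steps listed in this comment can be sketched as follows. This is not the attached patch (which is deleted), only an illustration of the described heuristic in Python; the function name `looks_like_afs` is hypothetical, and any failure of the cross-directory link is treated as "AFS detected", as the steps say.

```python
import os


def looks_like_afs(lock_path):
    """Heuristic: True if a same-directory hard link works but a hard
    link into a subdirectory is refused. Hypothetical helper."""
    base = lock_path + ".afs"
    subdir = lock_path + ".dir"
    same_dir_link = lock_path + ".lnk"
    cross_dir_link = os.path.join(subdir, "lnk")
    to_unlink = []
    made_dir = False
    try:
        try:
            os.close(os.open(base, os.O_WRONLY | os.O_CREAT | os.O_EXCL, 0o600))
            to_unlink.append(base)
            os.mkdir(subdir)
            made_dir = True
            os.link(base, same_dir_link)
            to_unlink.append(same_dir_link)
        except OSError:
            return False  # setup failed: not detectable
        try:
            os.link(base, cross_dir_link)
        except OSError:
            return True   # cross-directory link refused: AFS-like
        to_unlink.append(cross_dir_link)
        return False
    finally:
        # Remove the scratch files, then the scratch directory.
        for path in reversed(to_unlink):
            try:
                os.unlink(path)
            except OSError:
                pass
        if made_dir:
            try:
                os.rmdir(subdir)
            except OSError:
                pass
```

On an ordinary local filesystem both links succeed and the function returns False; a caller would fall back to symlink locks only when it returns True.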

andrewz

Comment 30

•

18 years ago

FWIW, I may be seeing this problem with /home on NFS. There are no FF/TB processes running for a given user, but FF 1.5.0.x and TB 1.5.0.x complain and refuse to start until the .parentlock file is deleted. I've only noticed it lately (since FF/TB 1.5 or since I started using nfslock on the NFS server), but it's happened so much I wrote a zenity wizard script for the users on our terminal server. I'm not yet sure whether the problem starts on closing Firefox or on rebooting the terminal server.

[:Aleksej]

Comment 31

•

18 years ago

Probably the same bug: bug 351477.

andrewz

Comment 32

•

18 years ago

I have entries in my kernel log that suggest this is related to NFS root squashing (but I am not sure what Firefox/Thunderbird are doing as root). Just a snippet:

Jan 15 11:43:55 aslan kernel: fh_verify: no root_squashed access at Cache/.nfs000574180000002c.
Jan 15 11:43:55 aslan kernel: fh_verify: no root_squashed access at Cache/.nfs0005742200000029.
Jan 15 11:43:55 aslan kernel: fh_verify: no root_squashed access at Cache/.nfs00056bc40000002a.
Jan 15 11:43:55 aslan kernel: fh_verify: no root_squashed access at Cache/.nfs000573ba0000002b.
Jan 15 11:43:55 aslan kernel: fh_verify: no root_squashed access at e5do9yn3.default/formhistory.dat.
Jan 15 11:43:55 aslan kernel: fh_verify: no root_squashed access at vu14xovq.learn/.parentlock.
Jan 15 11:43:55 aslan kernel: fh_verify: no root_squashed access at vu14xovq.learn/history.dat.
Jan 15 11:43:55 aslan kernel: fh_verify: no root_squashed access at vu14xovq.learn/search.sqlite.
Jan 15 11:43:55 aslan kernel: fh_verify: no root_squashed access at vu14xovq.learn/cert8.db.
Jan 15 11:43:55 aslan kernel: fh_verify: no root_squashed access at vu14xovq.learn/key3.db.
Jan 15 11:43:55 aslan kernel: fh_verify: no root_squashed access at Cache/_CACHE_MAP_.
Jan 15 11:43:55 aslan kernel: fh_verify: no root_squashed access at Cache/_CACHE_001_.

Calvin Liu

Comment 33

•

17 years ago

Any progress for this bug? I'm suffering this problem...

Jeff Walden [:Waldo]

Comment 34

•

17 years ago

No. As a general rule, if there aren't comments or activity on a bug in a certain period of time, no progress has been made on it.

David Cathey

Comment 35

•

17 years ago

I'm having this problem as well. Users with NFS'd /home will leave .parentlock files if Firefox is not closed normally. I've tried changing root_squashing and either way it still happens. This issue is keeping me from upgrading a lot of people's Firefox, since I don't want to spend my day deleting these .parentlock files for all the users!

Brendan Eich [:brendan]

Comment 37

•

17 years ago

What's root squashing? Translation of root to nobody? (In my day [uphill through snow both ways!] that was done, we just didn't call it root squashing. ;-) Firefox should never be run with euid 0, of course -- is someone doing that? This bug needs love for Firefox 3. First, can we confirm the belief cited in comment 28, that OpenAFS 1.5.1 fixes the problem for that remote filesystem implementation? If the remaining problem is misbegotten NFS locking code, a workaround only for NFS could be devised. roc, shaver, anyone else into pain: please help. /be

Flags: wanted1.9+

Flags: blocking1.9?

David Cathey

Comment 38

•

17 years ago

Yes, root_squash maps uid 0 to nobody. I've tried it both ways. The NFS server is running 2.6.10-1.771_FC2, Firefox version is 1.5.0.12 on Fedora 6. The only people having this problem are the NFS served home directories.

Brendan Eich [:brendan]

Comment 39

•

17 years ago

Root mapping to nobody or not shouldn't matter, but if you see log footprints for root on the client taking the profile lock, someone is running Firefox with euid 0 -- not good. But this is a separate issue. If (as we believe) our profile code is correct and a file lock over NFS seems stuck even though the Firefox that took it has exited, then as discussed above in this bug, the bug remains crappy NFS file locking, 22 years on. How to work around this NFS brokenness? David, can you get a system call trace of the failing case, manually remove the lock and trace the successful case too? Fresh traces would be helpful. /be

Brendan Eich [:brendan]

Comment 40

•

17 years ago

Also, if we can confirm comment 28, and up-rev AFS fixes the problem reported by this bug's summary, then this bug should be closed -- or else morphed (it is already morphing) to track the NFS problem. I'm ok with morphing, others should weigh in. /be

David Cathey

Comment 41

•

17 years ago

Firefox is not being run as root, comment 32 questioned if that was related, so I tried both cases. I'll try to get straces tomorrow. My users think the Linux clients are Windows boxes, so part of the issue is they are rebooting the linux client (not the typical "abnormal" Firefox exit), too.

Robert O'Callahan (:roc) (email my personal email if necessary)

Comment 42

•

17 years ago

Matthew might be able to help.

Damon Sicore (:damons)

Comment 43

•

17 years ago

+'ing w/ P4. If someone thinks this deserves a different priority, please comment.

Flags: wanted1.9+

Flags: blocking1.9?

Flags: blocking1.9+

Priority: -- → P4

Benjamin Smedberg

Comment 44

•

17 years ago

I don't think this should be a blocker, and if it were it should be a P5. We've lived with it since the beginning of time.

Matthew Gregan [:kinetik]

Comment 45

•

17 years ago

It sounds like we can consider the AFS problem closed, but it'd be awesome if someone could confirm that.

It shouldn't be a problem that the .parentlock file is left in place because it is not a dotlock file (that is, the presence of the file does not constitute a lock). Instead, we rely on the kernel (and network lock manager, in the case of networked filesystems) to manage the fcntl lock for us. When a process holding an fcntl lock dies (abruptly or otherwise), the kernel/NLM should clean the lock up and further attempts by us to lock using fcntl should succeed. When everything is working correctly, there should not be a stale lock problem with fcntl locking.

Removing the .parentlock hides the issue, because we will create a new .parentlock and lock it with fcntl (effectively creating a new lock, since they are keyed on the inode) when attempting to lock the profile, but the old fcntl lock (against the old now removed .parentlock) may still be floating around.

The presence of symlink dotlock files (named 'lock') does constitute a lock, so removing these when they are stale is a more valid workaround. We try to detect stale symlink locks by checking the lock holder's hostname against ours and, if equal, testing the holder's pid to see if it is still alive. We also ignore the symlink dotlock if fcntl worked for us and the earlier acquirer of the symlink dotlock indicated it had worked for them as well. This lock liveness test does not work in some cases, such as when the current purported lock holder and the new lock acquirer (who is testing for lock liveness) have different hostnames, or if pid-wrap has occurred, for example.

One method to detect stale symlink locks, which could also be applied to fcntl locks to work around problems like we're seeing with OpenAFS (prior to the bugfix) and has-working-but-buggy-fcntl NFS implementations, would be to update the timestamp on the lock at regular intervals. Mike Shaver suggested this to me on IRC when I started looking at the bug. This method would allow the lock-liveness test to determine if the holder is still alive by comparing the last lock timestamp update with the current time. I've implemented a test patch to do this, but it is gross for a bunch of reasons:

- It has to jump through some hoops to use XPCOM timers. XPCOM is initialized inside the lifetime of the profile lock, so XPCOM is not available at lock and unlock time. This is worked around by starting the lock timestamp update later (once the application has initialized fully) and stopping it earlier (before XPCOM is deinitialized). An alternative could be to use a kernel interval timer (via setitimer(2)), but this means the profile locking code steals SIGALRM, which I think we also expect to be able to use for JProf's realtime profiling (and possibly other things).
- It might not provide much relief to people with buggy network lock managers because we can only be confident that the lock is stale after the lock update timeout has passed, which does not help situations where the browser is restarted and a stale lock is left acquired in an NLM somewhere.
- It punishes every user of the XP_UNIX code path because we will always start the lock update timer in this code path. (If we had a reliable way to detect that we needed the lock update timer, we could only start it then. Robert O'Callahan suggested having a hidden pref to control this; users who know their profiles are on network filesystems would have to enable the pref to enable stale lock detection.)
- There is no "right" value to use for the timer interval. Waking up too frequently is bad for power saving because we'll cause a CPU and HD wakeup and will increase network load for networked FSes. Waking up too infrequently makes the stale lock detection less useful because users will end up having to check and remove lock files manually if the lock timeout has not yet expired.

I'll post the patch soon if people think this is an acceptable workaround method.

Another workaround that was suggested by Robert O'Callahan and Vladimir Vukićević is to have the ability to forcefully break the lock. This would involve adding new UI to the "profile in use dialog", but should only be available if it has been previously enabled (hidden prefs and an environment variable were suggested methods to enable the lock breaking UI--I don't think prefs will work because it would require the profile to be available...).

The other NFS-safe locking techniques that I'm aware of (such as doing a dance with link() and stat() to create an exclusive lock by ending up with a lock file with a link count of exactly 2) also suffer from stale lock file problems. Most other tools that deal with stale files when locking over NFS (MTA/MDA/MUAs are good candidates to look at) usually rely on detecting stale locks with a lock timestamp, but they can avoid using interval timers to touch the file by relying on the fact that the lock is only held for short durations (e.g. by assuming mail delivery does not last more than 5 minutes).

The NFS problem sounds suspiciously like Red Hat bug #229469 (https://bugzilla.redhat.com/show_bug.cgi?id=229469), which is still open against Fedora 7 and earlier. It turns out that I can reproduce what I think is the same problem pretty easily using two Fedora 8 VMs (one the NFS server, the other the client, rpc.statd and lockd running on both). I have a test program (which is basically the code from nsProfileLock) that attempts to lock using fcntl() and symlink(), then exits with success or failure. Running this test program on the client in a tight loop (with the lock path pointing at the NFS mount) along with another tight loop that sends SIGKILL to any running instance of the test program, I can get into a situation where fcntl() always fails with EAGAIN. Once in this state, attempting to lock the file with fcntl() fails with EAGAIN on both the NFS client and server. I couldn't reproduce it with a single machine acting as the server and client. I've filed a bug in the kernel.org bugzilla (http://bugzilla.kernel.org/show_bug.cgi?id=9601) about this.
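The symlink lock and its hostname-plus-pid liveness test described in this comment can be sketched as below. This is an illustration, not the nsProfileLock code: the `<hostname>:<pid>` link target format and both function names are assumptions, and the stale-lock removal is not race-free.

```python
import os
import socket


def holder_is_stale(lock_path):
    """True if the symlink names a dead process on this host."""
    try:
        holder = os.readlink(lock_path)
    except FileNotFoundError:
        return True   # lock vanished in the meantime: safe to retry
    except OSError:
        return False  # not a symlink, or unreadable: leave it alone
    host, _, pid_text = holder.rpartition(":")
    if host != socket.gethostname() or not pid_text.isdigit():
        return False  # cannot judge a lock taken on another host
    try:
        os.kill(int(pid_text), 0)  # signal 0 only checks for existence
    except ProcessLookupError:
        return True
    except PermissionError:
        return False  # process exists but belongs to another user
    return False


def acquire_symlink_lock(lock_path):
    """Try to create the lock symlink, breaking one stale lock if found."""
    me = "%s:%d" % (socket.gethostname(), os.getpid())
    for _ in range(2):
        try:
            os.symlink(me, lock_path)  # atomic: fails if the name exists
            return True
        except FileExistsError:
            pass
        if not holder_is_stale(lock_path):
            return False
        try:
            os.unlink(lock_path)
        except FileNotFoundError:
            pass
    return False
```

The two weaknesses named in the comment are visible here: a holder on a different host is never considered stale, and a recycled pid makes a dead holder look alive.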

Matthew Gregan [:kinetik]

Updated

•

17 years ago

Assignee: nobody → kinetik

Robert O'Callahan (:roc) (email my personal email if necessary)

Comment 46

•

17 years ago

What did you think of the idea of using "df" to check whether the lock file is in NFS?

Matthew Gregan [:kinetik]

Comment 47

•

17 years ago

Yeah, that should work. We might be able to use statfs() and look at the filesystem type, instead... it's not fantastically portable, but I think it'll work (with a bit of per-platform #ifdef magic) on all of the platforms we care about. I'll dig into this a bit more. Sorry, I forgot to mention that aspect--the "lock breaking" UI could be shown only when we detect that the profile is hosted on NFS (rather than using prefs or env vars), either using roc's method (run df against the lock and parse the output) or using some other technique (statfs and appropriate per-platform #ifdefs).
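One way to approximate the filesystem-type check discussed here, without statfs() or parsing df output, is to read the mount table. The sketch below is Linux-only and relies on /proc/mounts; the function names and the set of type names treated as "network" are assumptions.

```python
import os

# Filesystem type names treated as "network" here; the list is an assumption.
NETWORK_FS_TYPES = {"nfs", "nfs4", "afs", "cifs", "smbfs"}


def is_network_fs_type(fs_type):
    """True if the given type name is in the assumed network list."""
    return fs_type in NETWORK_FS_TYPES


def filesystem_type(path, mounts_file="/proc/mounts"):
    """Return the type of the filesystem holding `path`, or None if unknown."""
    target = os.path.realpath(path) + "/"
    best_len, best_type = -1, None
    try:
        with open(mounts_file) as mounts:
            for line in mounts:
                fields = line.split()
                if len(fields) < 3:
                    continue
                # /proc/mounts escapes spaces in mount points as \040.
                mount_point = fields[1].replace("\\040", " ")
                prefix = mount_point.rstrip("/") + "/"
                # Keep the longest matching mount point; on a tie the
                # later entry wins because it shadows the earlier one.
                if target.startswith(prefix) and len(mount_point) >= best_len:
                    best_len, best_type = len(mount_point), fields[2]
    except OSError:
        return None
    return best_type
```

A caller could show the lock-breaking UI only when `is_network_fs_type(filesystem_type(lock_path))` is true. Other platforms would need their own lookup, which is the per-platform #ifdef work mentioned above.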

RalfG

Comment 48

•

17 years ago

Same problem here with Debian Edu (NFS share). I added a line to delete .parentlock in the start script.

Mike Beltzner [:beltzner, not reading bugmail]

Updated

•

17 years ago

Flags: tracking1.9+ → wanted-next+

Matthew Gregan [:kinetik]

Updated

•

14 years ago

Assignee: kinetik → nobody

Emma Humphries ☕️🎸🧞‍♀️✨ (she/they) [:emceeaich] (Pacific Time) use needinfo

Updated

•

9 years ago

Blocks: 1243899

Benjamin Smedberg

Comment 49

•

9 years ago

This bug is filed in a bugzilla component related to pre-Firefox code which no longer exists. I believe it is no longer relevant and I am therefore closing it INCOMPLETE. If you believe that this bug is still valid and needs to be fixed, please reopen it and move it to the Toolkit:Startup and Profile System product/component.

No longer blocks: 1243899

Status: NEW → RESOLVED

Closed: 9 years ago

Resolution: --- → INCOMPLETE

Nobody; OK to take it and work on it

Assignee

Updated

•

9 years ago

Product: Core → Core Graveyard
