Closed Bug 464942 Opened 16 years ago Closed 16 years ago

|failed in test_removeDataFromDomain.js on (NS_ERROR_FILE_NOT_FOUND) [nsICacheService.evictEntries] nsPrivateBrowsingService.js :: PBS_removeDataFromDomain :: line 296|

Categories

(Firefox :: Private Browsing, defect, P3)

defect

Tracking

()

RESOLVED FIXED
Firefox 3.6a1

People

(Reporter: myk, Assigned: ehsan.akhgari)

References

()

Details

(Keywords: fixed1.9.1, intermittent-failure)

Attachments

(3 files, 2 obsolete files)

moz2-darwin8-slave01 turned orange this morning with TEST-UNEXPECTED-FAIL on /test_privatebrowsing/unit/test_removeDataFromDomain.js because of the following error:

(NS_ERROR_FILE_NOT_FOUND) [nsICacheService.evictEntries] location: nsPrivateBrowsingService.js :: PBS_removeDataFromDomain :: line 296

It turned green the next cycle, so it's not yet clear whether this was a one off or is an intermittent problem.
CCing Shawn who wrote the test...
I have documented my experience in writing tests for the cache service here:

<http://ehsanakhgari.org/blog/2008-10-17/testing-cache-service>

Not that it's directly relevant, but I thought I'd mention it.
Keep me posted if it continues.  That's a strange errors...
This happened again today:
moz2-darwin8-slave01 2008/11/17 09:40:28
http://tinderbox.mozilla.org/showlog.cgi?log=Firefox/1226943628.1226947830.18457.gz&fulltext=1
(as noted on https://wiki.mozilla.org/Tinderbox/Nov-2008-Orange-Compendium )
FWIW, this test also failed during this past weekend. (updated Compendium again)
moz2-darwin8-slave01 2008/11/15 15:47:38
http://tinderbox.mozilla.org/showlog.cgi?log=Firefox/1226792858.1226797085.30700.gz&fulltext=1
Version: unspecified → Trunk
Again today on moz2-darwin8-slave01 dep unit test 2008/11/24 04:22:51

http://tinderbox.mozilla.org/showlog.cgi?log=Firefox/1227529371.1227533653.23836.gz&fulltext=1
Shawn, do you have any idea why this could happen?

I'm CCing biesi to see if he has a clue.
I have no idea why it is happening.  This has to be coming from the cache service internals...
This is the evictEntries(STORE_ANYWHERE) call, right?  That call will try to initialize a disk cache if it hasn't happened yet.  I can totally imagine some of that being able to fail with NS_ERROR_FILE_NOT_FOUND, though I don't know what the conditions for that would have to be....
Still happening today (unsurprising, I know)

http://tinderbox.mozilla.org/showlog.cgi?log=Firefox/1227707508.1227710942.1713.gz
Note that the most recent failure was on a windows box.  Not platform-specific, it seems?
OS: Mac OS X → All
Hardware: Macintosh → All
http://tinderbox.mozilla.org/showlog.cgi?log=Firefox/1229023780.1229028958.25940.gz
(In reply to comment #9)
> This is the evictEntries(STORE_ANYWHERE) call, right?  That call will try to
> initialize a disk cache if it hasn't happened yet.  I can totally imagine some
> of that being able to fail with NS_ERROR_FILE_NOT_FOUND, though I don't know
> what the conditions for that would have to be....
Boris, would you consider it a bug to throw when we call evictEntries(STORE_ANYWHERE) (at least in this case)?

Specifically, I don't think it's right for us to throw here:
http://mxr.mozilla.org/mozilla-central/source/netwerk/cache/src/nsCacheService.cpp#827
or here:
http://mxr.mozilla.org/mozilla-central/source/netwerk/cache/src/nsCacheService.cpp#841

Am I right?
Hmm.  While that arguably isn't a failure for the eviction methods, it does seem like an unexpected condition that merits an exception.

As in, we should figure out why creation fails, if that's what's happening.
So, the only way I can see this happening is if the file got removed while the service was expecting it to be there.  I'm not sold on that though as this code jumps all over the place...
On mac at least, I think comment 15 is true because of this piece of code:
http://mxr.mozilla.org/mozilla-central/source/xpcom/io/nsLocalFileOSX.mm#2350

For windows, I think we might be failing here:
http://mxr.mozilla.org/mozilla-central/source/xpcom/io/nsLocalFileWin.cpp#797

(and in both cases, if the file was legitimately missing)
notice that doing a simple 

for(>10 loops)
  removeDataFromDomain(aDomain);

is enough to cause an intermittent failure, and that does not happen on first call, but in the middle of the loop, so sounds more like the cache service locking a file in use rather than a problem creating a new disk cache.
on Windows i see failures with NS_ERROR_FILE_ACCESS_DENIED, i suppose having some timeout between tests could help.

Also notice today on moz-central we are failing on
TEST-UNEXPECTED-FAIL | ../../../../_tests/xpcshell-simple/test_privatebrowsing/unit/test_removeDataFromDomain.js | test failed, see log
*** TEST-UNEXPECTED-FAIL | ../../../../_tests/xpcshell-simple/test_privatebrowsing/unit/test_removeDataFromDomain.js | 1 != 1
test_removeDataFromDomain.js :: add_permission :: line 282

i think the previous file access failure did not allow the test to clean up permission manager values, and we are still seeing old values, so probably it would be better to cleanup permission manager before starting tests.
Attached patch stupid hack (obsolete) (deleted) β€” β€” Splinter Review
This does not solve the underlying issue, but appear to demonstrate that having a delay between tests solves the intermittent failure, indeed with this change i'm no more able to fail on my VM, while i was definately able to reproduce failure before. This seems to confirm the file locked theory.
I've landed this to include permissions.sqlite in the files to clean up when running these tests. Not doing that was leaving the test permanently broken on tinderbox. I've filed bug 469829 to work on properly cleaning up profile folders in xpcshell tests.
Mass moving of all Firefox::General private browsing bugs to Firefox::Private Browsing.
Component: General → Private Browsing
QA Contact: general → private.browsing
(In reply to comment #19)
> Created an attachment (id=353218) [details]
> Cleanup up permissions.sqlite
> 
> I've landed this to include permissions.sqlite in the files to clean up when
> running these tests. 

this should be pushed to 1.9.1 too, i see similar errors due to that.
Attachment #353040 - Attachment is obsolete: true
Comment on attachment 353218 [details] [diff] [review]
Cleanup up permissions.sqlite
[Checkin: Comment 22]

(In reply to comment #19)

http://hg.mozilla.org/mozilla-central/rev/fc61fc41c4b4
http://hg.mozilla.org/releases/mozilla-1.9.1/rev/c09827749ba7

NB: Please, use current date and (explicitly) update the bug when pushing.
Attachment #353218 - Attachment description: Cleanup up permissions.sqlite → Cleanup up permissions.sqlite [Checkin: Comment 22]
(In reply to comment #21)
> this should be pushed to 1.9.1 too, i see similar errors due to that.

This patch was landed,
but the random failure is still there (with a different error code):
{
http://tinderbox.mozilla.org/showlog.cgi?log=Firefox3.1/1229784034.1229788511.5306.gz
MacOSX 10.5.2 mozilla-1.9.1 moz2-darwin9-slave05 dep unit test on 2008/12/20 06:40:34

[Exception... "Component returned failure code: 0x80004005 (NS_ERROR_FAILURE) [nsICacheService.evictEntries]"  nsresult: "0x80004005 (NS_ERROR_FAILURE)"  location: "JS frame :: file:///builds/slave/trunk_darwin-5/build/objdir/dist/bin/components/nsPrivateBrowsingService.js :: PBS_removeDataFromDomain :: line 296"  data: no]
}
Assignee: nobody → dtownsend
Blocks: 460086
Status: NEW → ASSIGNED
Flags: wanted-firefox3.1?
Whiteboard: [fixed1.9.1]
Target Milestone: --- → Firefox 3.1b3
The patch I landed was not intended to fix the random failure, only to stop the random failure breaking all future runs.
Assignee: dtownsend → nobody
Status: ASSIGNED → NEW
Whiteboard: [fixed1.9.1]
Target Milestone: Firefox 3.1b3 → ---
Attachment #353040 - Attachment is obsolete: false
(In reply to comment #23)

Yet an other error code:
{
http://tinderbox.mozilla.org/showlog.cgi?tree=Firefox&errorparser=unix&logfile=1229814045.1229821152.20822.gz&buildtime=1229814045&buildname=WINNT%205.2%20mozilla-central%20unit%20test&fulltext=1
WINNT 5.2 mozilla-central unit test on 2008/12/20 15:00:45

[Exception... "Component returned failure code: 0x80520015 (NS_ERROR_FILE_ACCESS_DENIED) [nsICacheService.evictEntries]"  nsresult: "0x80520015 (NS_ERROR_FILE_ACCESS_DENIED)"  location: "JS frame :: file:///e:/builds/moz2_slave/mozilla-central-win32-unittest/build/objdir/dist/bin/components/nsPrivateBrowsingService.js :: PBS_removeDataFromDomain :: line 296"  data: no]
}

(In reply to comment #24)

Ah, good.
Now, what is the next step ?
we should fix cache service, or if anyone wants, we can take my delay patch to add some delay between tests in this unit test, that seems to reduce the failures to almost 0.
http://tinderbox.mozilla.org/showlog.cgi?log=Firefox/1230447740.1230450992.5868.gz
WINNT 5.2 mozilla-central unit test on 2008/12/27 23:02:20
Summary: moz2-darwin8-slave01 failed in test_removeDataFromDomain.js on (NS_ERROR_FILE_NOT_FOUND) [nsICacheService.evictEntries] nsPrivateBrowsingService.js :: PBS_removeDataFromDomain :: line 296 → |failed in test_removeDataFromDomain.js on (NS_ERROR_FILE_NOT_FOUND) [nsICacheService.evictEntries] nsPrivateBrowsingService.js :: PBS_removeDataFromDomain :: line 296|
http://tinderbox.mozilla.org/showlog.cgi?log=Firefox3.1/1232109381.1232114827.24236.gz
WINNT 5.2 mozilla-1.9.1 unit test on 2009/01/16 04:36:21
http://tinderbox.mozilla.org/showlog.cgi?log=Firefox3.1/1232481221.1232488621.32145.gz#err0

WINNT 5.2 mozilla-1.9.1 unit test on 2009/01/20 11:53:41
http://tinderbox.mozilla.org/showlog.cgi?log=Firefox3.1/1233087408.1233092907.20628.gz
WINNT 5.2 mozilla-1.9.1 unit test on 2009/01/27 12:16:48

http://tinderbox.mozilla.org/showlog.cgi?log=Firefox3.1/1233088393.1233094689.23693.gz#err0
WINNT 5.2 mozilla-1.9.1 unit test on 2009/01/27 12:33:13
Another failure...
http://tinderbox.mozilla.org/showlog.cgi?log=Firefox/1233890070.1233894617.4881.gz
WINNT 5.2 mozilla-central unit test on 2009/02/05 19:14:30

Perhaps we should disable this test until it's fixed -- it's been sporadically failing for almost 3 months now.
Another one (not on Windows):
http://tinderbox.mozilla.org/showlog.cgi?log=Firefox/1234575080.1234584544.4555.gz
OS X 10.5.2 mozilla-central unit test on 2009/02/13 17:31:20
we should just disable this part of the test...
(In reply to comment #34)
> we should just disable this part of the test...

Wouldn't wrapping the evictEntries call in a try/catch block be a better solution here?  We could either check for NS_ERROR_FILE_NOT_FOUND explicitly and ignore it (and pass all other exceptions to the caller) or ignore all exceptions thrown from evictEntries.  What do you think, Shawn?
i can't see how disabling a test is better than adding a small delay to it, that probably could make it stop going random and still allow us to have a test running.
(In reply to comment #35)
> Wouldn't wrapping the evictEntries call in a try/catch block be a better
> solution here?  We could either check for NS_ERROR_FILE_NOT_FOUND explicitly
> and ignore it (and pass all other exceptions to the caller) or ignore all
> exceptions thrown from evictEntries.  What do you think, Shawn?
I don't like the idea of silently failing.  I suspect this may be a real bug that we can only tickle once in a while, but what is wrong isn't obvious.

(In reply to comment #36)
> i can't see how disabling a test is better than adding a small delay to it,
> that probably could make it stop going random and still allow us to have a test
> running.
I'm not saying disable the whole test file, just the check that is failing.
> (In reply to comment #36)
> > i can't see how disabling a test is better than adding a small delay to it,
> > that probably could make it stop going random and still allow us to have a test
> > running.
> I'm not saying disable the whole test file, just the check that is failing.

right, i still think taking a delay to allow the tests pass is better than not having a test (part of it)... We could maybe try taking the change, see for a couple weeks if randomness is fixed, if that still happens backout and disable.

Ignoring the failure or remove the check are exactly the same thing in the end.
(In reply to comment #38)
> right, i still think taking a delay to allow the tests pass is better than not
> having a test (part of it)... We could maybe try taking the change, see for a
> couple weeks if randomness is fixed, if that still happens backout and disable.
We don't even really understand why it's failing AFAIK, so how do we know it's going to help us?

> Ignoring the failure or remove the check are exactly the same thing in the end.
Except that if we wrap it in a try-catch block, it'll never put a error in the error console if it happens to a user in the real-world too.  We'd want to notify about that IMO.
(In reply to comment #39)
> We don't even really understand why it's failing AFAIK, so how do we know it's
> going to help us?

by yhe fact i can reproduce the randomness in my vm, and the above patch solves the issue locally not much i understand. i think the cached file could take some more time to be removed by the OS, when we try to clear them too fast some of them will still be in use by the previous call.
Attached patch delay patch (deleted) β€” β€” Splinter Review
in case this is the updated patch, i'm not saying this is a perfect way to handle the issue, but if the alternative is disable part of the test i would prefer trying this before. I leave the decision to the original test/components creator, no problem on my side.
Attachment #353040 - Attachment is obsolete: true
All adding a delay does is make the test go randomly orange when the sum of the delays goes over the global test timeout or when the delay proves insufficient for whatever reason (due to machine load, say).  What makes you think that a 200ms delay is always sufficient while a 10ms delays is not, for example?

The other problem with a delay is that it increases test run times, which is in fact bad for both tinderbox and people running tests locally.  You're adding over a second's worth of delay to this test.  If every test did that, we'd be looking at several hours worth of running time increase for a test run...
Priority: -- → P3
Target Milestone: --- → Firefox 3.2a1
(In reply to comment #39)
end.
> Except that if we wrap it in a try-catch block, it'll never put a error in the
> error console if it happens to a user in the real-world too.  We'd want to
> notify about that IMO.

We could still report the error to the console for notification purposes...
http://tinderbox.mozilla.org/showlog.cgi?log=Firefox/1236220657.1236224384.10828.gz
OS X 10.5.2 mozilla-central unit test on 2009/03/04 18:37:37
Whiteboard: [orange]
Assignee: nobody → ehsan.akhgari
Status: NEW → ASSIGNED
Attachment #366186 - Flags: review?(mconnor)
Attachment #366186 - Flags: review?(mconnor) → review-
Comment on attachment 366186 [details] [diff] [review]
Eat up exceptions from evictEntries, and report them to the console

that finally block is adding an unmatched bracket, what are you trying to do there?
Oh, please disregard the previous patch; I meant to attach it after qrefreshing.
Attachment #366186 - Attachment is obsolete: true
Attachment #366247 - Flags: review?(mconnor)
Attachment #366247 - Flags: review?(mconnor) → review+
I landed the patch which may fix the orange issue on trunk. <http://hg.mozilla.org/mozilla-central/rev/5a5fde47ba44>  I'll wait some time before marking this as FIXED to make sure that no orange happens after this on trunk.
Whiteboard: [orange] → [orange][potential fix landed on trunk]
Still wanted on 3.5:
{
http://tinderbox.mozilla.org/showlog.cgi?log=Firefox3.5/1237542016.1237549936.7960.gz
WINNT 5.2 mozilla-1.9.1 unit test on 2009/03/20 02:40:16

TEST-UNEXPECTED-FAIL | ../../../../_tests/xpcshell/test_privatebrowsing/unit/test_privatebrowsingwrapper_removeDataFromDomain.js | test failed, see log
}
This was landed a week ago on trunk, and since the failures seem to have stopped, I'm resolving this.
Status: ASSIGNED → RESOLVED
Closed: 16 years ago
Resolution: --- → FIXED
Whiteboard: [orange][potential fix landed on trunk] → [orange]
Attachment #366247 - Flags: approval1.9.1?
Attachment #366247 - Flags: approval1.9.1? → approval1.9.1+
http://hg.mozilla.org/releases/mozilla-1.9.1/rev/9a3b8a1c4035
Flags: wanted-firefox3.5?
Keywords: fixed1.9.1
Whiteboard: [orange]
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Created:
Updated:
Size: