Closed Bug 572223 Opened 14 years ago Closed 14 years ago

too much cookies.sqlite io

Categories

(Core :: Networking: Cookies, defect)

x86
Linux
defect
Not set
normal

Tracking

()

RESOLVED FIXED
mozilla2.0b4
Tracking Status
blocking2.0 --- beta4+

People

(Reporter: taras.mozilla, Assigned: dwitte)

References

(Depends on 1 open bug, Blocks 2 open bugs)

Details

(Keywords: main-thread-io)

Attachments

(11 files, 4 obsolete files)

(deleted), patch
sdwilsh
: review+
Details | Diff | Splinter Review
(deleted), patch
sdwilsh
: review+
Details | Diff | Splinter Review
(deleted), patch
sdwilsh
: review+
Details | Diff | Splinter Review
(deleted), patch
sdwilsh
: review+
Details | Diff | Splinter Review
(deleted), patch
sdwilsh
: review+
Details | Diff | Splinter Review
(deleted), patch
sdwilsh
: review+
Details | Diff | Splinter Review
(deleted), patch
sdwilsh
: review+
Details | Diff | Splinter Review
(deleted), image/png
Details
(deleted), image/jpeg
Details
(deleted), patch
sdwilsh
: review+
Details | Diff | Splinter Review
(deleted), patch
Details | Diff | Splinter Review
It causes almost 5x more io than the next closest thing on my system.
http://pastebin.mozilla.org/735864

Questions:

1) What do the numbers look like when starting to a homepage? Need to figure out if this is just database read cost or something else (i.e. sessionstore).

2) How many of those ops are reads and how many writes?
1) Same

2) Vast majority are reads. 5:1 or 6:1 read:write ratio.

On my ssd the time to read the database is ~2% of total startup time
Realized we already had a bug for this
Depends on: 559687
Blocks: 572459
Summary: cookies.sqlite dominates io while restoring tabs → too much cookies.sqlite io
Attached patch part 1: add baseDomain column (obsolete) (deleted) — Splinter Review
So we can read in the columns that matter.
Assignee: nobody → dwitte
Status: NEW → ASSIGNED
Attachment #454661 - Flags: review?(sdwilsh)
Attached patch part 2: async read (obsolete) (deleted) — Splinter Review
Kick off an async read for the whole db on startup.
Attachment #454663 - Flags: review?(sdwilsh)
Attached patch part 3: sync read per domain (obsolete) (deleted) — Splinter Review
Do sync reads for base domains on demand.
Attachment #454664 - Flags: review?(sdwilsh)
Attached patch part 4: baseDomain index (obsolete) (deleted) — Splinter Review
Index on baseDomain so reading is actually fast.
Attachment #454665 - Flags: review?(sdwilsh)
Attached patch part 5: tests (deleted) — Splinter Review
Test reading in a large database, piece by piece.
Attachment #454666 - Flags: review?(sdwilsh)
Attached patch part 6: async RemoveAll (deleted) — Splinter Review
Since db ops are async, we really don't want to be doing synchronous ops at the same time. Doing so could block or fail. So, make RemoveAll() async.
Attachment #454668 - Flags: review?(sdwilsh)
taras: tryserver builds with these patches will be at http://ftp.mozilla.org/pub/mozilla.org/firefox/tryserver-builds/dwitte@mozilla.com-6ae7db6f7734/ in a couple hours.

Perf testing would be much appreciated :)

I'd recommend the following:
1) Start up the build with a profile with your large cookies.sqlite. This will migrate the schema of your file. This only happens once.
2) Start it with a few sessionrestore tabs. This should synchronously load in just the required cookies.
3) Start it to about:blank. This should cause no sync loads and should be the fastest.

Since the full async read is still happening in 2) and 3) you will probably see a lot of IO still. You could PR_Now() time nsCookieService::InitDB() and nsCookieService::EnsureReadDomain(), and nsCookieService::EnsureReadComplete() to be sure it doesn't get hit, to see what the cost of the handful of sync reads is. I'll see if I can get these numbers.

Note that we could, in principle, arbitrarily delay the full async read to whenever it is optimal. At some point, the cost of doing sync reads for every pageload will outweigh the amortized cost of an eager async read, so we shouldn't delay it indefinitely. But this point might not be at startup. Perhaps delayed-startup, or even later... we can tune it later. If you want to figure out what that point should be, that would be appreciated :)
Attached patch part 1: add baseDomain column (deleted) — Splinter Review
Attachment #454661 - Attachment is obsolete: true
Attachment #455018 - Flags: review?(sdwilsh)
Attachment #454661 - Flags: review?(sdwilsh)
Attached patch part 2: async read (deleted) — Splinter Review
Attachment #454663 - Attachment is obsolete: true
Attachment #455019 - Flags: review?(sdwilsh)
Attachment #454663 - Flags: review?(sdwilsh)
Attached patch part 3: sync read per domain (deleted) — Splinter Review
Attachment #454664 - Attachment is obsolete: true
Attachment #455020 - Flags: review?(sdwilsh)
Attachment #454664 - Flags: review?(sdwilsh)
Attached patch part 4: baseDomain index (deleted) — Splinter Review
Attachment #454665 - Attachment is obsolete: true
Attachment #455021 - Flags: review?(sdwilsh)
Attachment #454665 - Flags: review?(sdwilsh)
Some numbers:

On a typical startup (restoring many tabs), I see a handful (< 5) sync reads for their respective domains. These reads take around 1-2ms without index, 50-200us with index. So the index is useful.

The async read takes around 100 - 200ms to come back complete, at which time no further sync reads are required. Which is why I'm only seeing a handful of sync reads when restoring many tabs.

We could potentially push the async read out further, or even not do it at all. Doing so would mean we pay a 50 - 200us sync read cost on every new base domain we see, as opposed to essentially zero for having it already in-memory. Not reading it in would save memory. So we have a memory vs. speed tradeoff; assuming that the background read is essentially free (i.e. IO bound), then we probably want to do the async read. If it's not free, then we can rethink.
Comment on attachment 455018 [details] [diff] [review]
part 1: add baseDomain column

>+        mozStorageTransaction transaction(mDBState->dbConn, PR_TRUE);
I suspect you want to not pass PR_TRUE here since we don't want to commit a partial schema change if an error occurs.  You'll have to manually commit at the end though.

>+        // add the baseDomain column and index to the table
nit: capitalization and punctuation please (this is a complete sentence after all)

>+        while (NS_SUCCEEDED(rv = select->ExecuteStep(&hasResult)) && hasResult) {
I think you want to drop the rv = in the loop.  You don't actually use it.

>+        // update the schema version
nit: capitalization and punctuation please (this is a complete sentence after all)

>+      // fall through to the next upgrade
nit: capitalization and punctuation please (this is a complete sentence after all)

Also, you always seem to set the schema version to be the current version which seems wrong for older ones (but not the last one).

r=sdwilsh with those fixed
Attachment #455018 - Flags: review?(sdwilsh) → review+
(In reply to comment #16)
> Also, you always seem to set the schema version to be the current version which
> seems wrong for older ones (but not the last one).
Which you've done in part 4 for case one (please move that to part 1).
Comment on attachment 455021 [details] [diff] [review]
part 4: baseDomain index

>+  // create an index on baseDomain
nit: capitalization and punctuation please (this is a complete sentence after all)

This whole part should be rolled into part 1.  r=sdwilsh with that change (qfold in your friend here).
Attachment #455021 - Flags: review?(sdwilsh) → review+
Comment on attachment 455019 [details] [diff] [review]
part 2: async read

>-namespace {
No reason to de-anonymize all of these so you can get just one in the header file.

>+ * RemoveCookieDBListener imp:
>+ * Static mozIStorageStatementCallback used to track asynchronous removal
>+ * operations.
nit: it's not static.

>+  PRBool mCanceled;
nit: use bool

>+      // We may receive a reason of READ_FINISHED after being canceled;
I don't know what READ_FINISHED is

>+    case mozIStorageStatementCallback::REASON_ERROR: {
>+      // XXX Ignore corruption for now.
wouldn't this be handled by the error handler anyway (the thing this class inherits from)?

>-  // delete expired cookies, before we read in the db
If we still want to do this in the ftu
>-  {
>-    // scope the deletion, so the write lock is released when finished
>-    nsCOMPtr<mozIStorageStatement> stmtDeleteExpired;
>-    rv = mDBState->dbConn->CreateStatement(NS_LITERAL_CSTRING(
>-      "DELETE FROM moz_cookies WHERE expiry <= ?1"),
>-      getter_AddRefs(stmtDeleteExpired));
>-    NS_ENSURE_SUCCESS(rv, rv);
>-
>-    rv = stmtDeleteExpired->BindInt64ByIndex(0, PR_Now() / PR_USEC_PER_SEC);
>-    NS_ENSURE_SUCCESS(rv, rv);
>-
>-    PRBool hasResult;
>-    rv = stmtDeleteExpired->ExecuteStep(&hasResult);
>-    NS_ENSURE_SUCCESS(rv, rv);
>-  }
Seems like we don't do this anymore?

>-      // It is purpose that created us; purpose that connects us;
>-      // purpose that pulls us; that guides us; that drives us.
>-      // It is purpose that defines us; purpose that binds us.
>-      // When a cookie no longer has purpose, it has a choice:
>-      // it can return to the source to be deleted, or it can go
>-      // into exile, and stay hidden inside the Matrix.
>-      // Let's choose deletion.
Sad to see this go.

>+  nsRefPtr<ReadCookieDBListener> readListener = new ReadCookieDBListener;
>+  rv = stmt->ExecuteAsync(readListener,
>+                          getter_AddRefs(mDefaultDBState.pendingRead));
>+  NS_ASSERT_SUCCESS(rv);
>+
>+  mDefaultDBState.readListener = readListener;
Any reason we don't do one of the following:
1) Make ReadCookieDBListener a class (not a ref ptr so not allocated separately) on mDefaultDBState.
2) Just assign the result of new to this variable?

>+  nsCAutoString name, value, host, path;
It's actually better to use a nsCString here otherwise the srogate code has to copy this to the stack, and then you copy it to the nsCookie member variable in Create.

>+  // create a new nsCookie and assign the data.
nit: capitalize "c" in create.

>+  ReadCookieDBListener*                 readListener; // weak ref
Don't see the point of this.  Prefer option (1) listed above since you seem to be trying to avoid AddRef/Release as much as possible.  You don't use it to track state, so this is doable.

r=sdwilsh with comments addressed.
Attachment #455019 - Flags: review?(sdwilsh) → review+
Comment on attachment 455020 [details] [diff] [review]
part 3: sync read per domain

>+      tuple->cookie = nsCookieService::gCookieService->ReadRow(row);
And now I think we should call this GetCookieFromRow.

>+  // Fast path 1: nothing to read, or we've already finished reading.
>+  if (!mDBState->dbConn || !mDefaultDBState.pendingRead)
>+    return;
>+
>+  // Fast path 2: already read in this particular domain.
>+  if (mDefaultDBState.readSet.GetEntry(aBaseDomain))
>+    return;
Maybe want to wrap these in NS_LIKELY

>+    // Cache the statement, since it's likely to be used again.
>+    rv = mDefaultDBState.dbConn->CreateStatement(NS_LITERAL_CSTRING(
>+      "SELECT "
>+        "id, "
>+        "name, "
>+        "value, "
>+        "host, "
>+        "path, "
>+        "expiry, "
>+        "lastAccessed, "
>+        "isSecure, "
>+        "isHttpOnly "
>+      "FROM moz_cookies "
>+      "WHERE baseDomain = ?1"),
>+      getter_AddRefs(mDefaultDBState.stmtReadDomain));
>+
>+    // XXX Ignore corruption for now.
>+    if (NS_FAILED(rv)) return;
Pull all this into it's own method for getting this statement please.

>+    // XXX Ignore corruption for now.
AFAIK, we have a bug on file to handle this stuff.  Reference it please in all the places you say this.

>+  // Read in the data synchronously.
>+  nsCOMPtr<mozIStorageStatement> stmt;
>+  nsresult rv = mDefaultDBState.dbConn->CreateStatement(NS_LITERAL_CSTRING(
>+    "SELECT "
>+      "id, "
>+      "name, "
>+      "value, "
>+      "host, "
>+      "path, "
>+      "expiry, "
>+      "lastAccessed, "
>+      "isSecure, "
>+      "isHttpOnly, "
>+      "baseDomain "
>+    "FROM moz_cookies"), getter_AddRefs(stmt));
We should cache this one too, but we can throw it away once our async read is done.  Also pulled into another method.

>+  nsCAutoString baseDomain, name, value, host, path;
ditto about nsCString from previous comments

>+  nsTHashtable<nsCStringHashKey>        readSet;
It's not clear what this is for, so a better name or a comment would be helpful here.

>     void                          ReadComplete();
>+    void                          CancelRead(PRBool aPurgeReadSet);
I actually think both of these should reference Async.  AsyncReadComplete and CancelAsyncRead

r=sdwilsh with comments addressed
Attachment #455020 - Flags: review?(sdwilsh) → review+
Comment on attachment 454666 [details] [diff] [review]
part 5: tests

r=sdwilsh
Attachment #454666 - Flags: review?(sdwilsh) → review+
Comment on attachment 454668 [details] [diff] [review]
part 6: async RemoveAll

r=sdwilsh
Attachment #454668 - Flags: review?(sdwilsh) → review+
(In reply to comment #19)
> >+    case mozIStorageStatementCallback::REASON_ERROR: {
> >+      // XXX Ignore corruption for now.
> wouldn't this be handled by the error handler anyway (the thing this class
> inherits from)?

Does storage guarantee that if HandleCompletion is called with REASON_ERROR, that HandleError was called prior?

If so, great -- we can do the errorhandling there instead.

> >-  // delete expired cookies, before we read in the db
> Seems like we don't do this anymore?

Yeah. We probably do want to do it more often than a purge would. But as an async write, after the db is read in. I'll do that in a separate part.

> >+  nsRefPtr<ReadCookieDBListener> readListener = new ReadCookieDBListener;
> >+  rv = stmt->ExecuteAsync(readListener,
> >+                          getter_AddRefs(mDefaultDBState.pendingRead));
> >+  NS_ASSERT_SUCCESS(rv);
> >+
> >+  mDefaultDBState.readListener = readListener;
> Any reason we don't do one of the following:
> 1) Make ReadCookieDBListener a class (not a ref ptr so not allocated
> separately) on mDefaultDBState.
> 2) Just assign the result of new to this variable?

Because readListener is attached to a database connection, which can change. For example, we start reading on one profile, then switch to another. We mark the listener as canceled and then AsyncClose the connection. After that point, mDefaultDBState has nothing to do with the old connection, but the read could still be in flight.

Thus, mDefaultDBState doesn't own the readListener, it merely points to it; storage itself owns the listener until the connection is closed.
(In reply to comment #20)
> Pull all this into it's own method for getting this statement please.

I'd rather keep it inline, close as possible to the code that uses it, since said code references indices in the statement etc... but if you really want me to factor it I can.

> >+    // XXX Ignore corruption for now.
> AFAIK, we have a bug on file to handle this stuff.  Reference it please in all
> the places you say this.

Got it, bug 547031.

> >+  // Read in the data synchronously.
> >+  nsCOMPtr<mozIStorageStatement> stmt;
> >+  nsresult rv = mDefaultDBState.dbConn->CreateStatement(NS_LITERAL_CSTRING(
> We should cache this one too, but we can throw it away once our async read is
> done.  Also pulled into another method.

No need to cache it -- it's used precisely once. After the sync read is done, we never do it again.
(In reply to comment #23)
> Does storage guarantee that if HandleCompletion is called with REASON_ERROR,
> that HandleError was called prior?
Yes.

> Yeah. We probably do want to do it more often than a purge would. But as an
> async write, after the db is read in. I'll do that in a separate part.
Let's do a follow-up bug.
(In reply to comment #24)
> I'd rather keep it inline, close as possible to the code that uses it, since
> said code references indices in the statement etc... but if you really want me
> to factor it I can.
This is why I advocate getting columns by name and not by index anymore.  Getting by index is too easy to get wrong (as you even saw).
Blocks: 576347
Create a fresh connection lazily for performing sync reads.
Attachment #455564 - Flags: review?(sdwilsh)
Bug 506131 and bug 576348 tagged as clones of this, it seems to affect all platforms.
Want this for Fx4 -- more startup perf is good, and this patchset includes some pretty good tests.
blocking2.0: --- → ?
Comment on attachment 455564 [details] [diff] [review]
part 7: separate sync connection

>diff --git a/netwerk/cookie/nsCookieService.cpp b/netwerk/cookie/nsCookieService.cpp
> void
>+nsCookieService::GetSyncDBConn()
This should return mDefaultDBState.dbConn so the code in other places gets simplified:
if (!GetSycnDBConn())
  return;

>   if (!mDefaultDBState.stmtReadDomain) {
Need to assert mDefaultDBState.syncConn is non-null when mDefaultDBState.stmtReadDomain is true to make sure invariants hold.

r=sdwilsh
Attachment #455564 - Flags: review?(sdwilsh) → review+
Comment on attachment 455018 [details] [diff] [review]
part 1: add baseDomain column

Seeking approval for this set of patches.
Attachment #455018 - Flags: approval2.0?
blocking2.0: ? → beta4+
Attachment #455018 - Flags: approval2.0?
There's definitely something wrong with cookies now that this patch landed. (Built from http://hg.mozilla.org/mozilla-central/rev/6d0d4c0c8890)
On some websites (i.e bugzilla) there's no problem at all, on other sites (i.e esreality) I'm always logged out whenever I close the browser and on others (i.e mozillazine) I'm not able to login at all.
Can you take a cookie log? What happens if you delete cookies.sqlite and restart the browser?
(In reply to comment #35)
> Can you take a cookie log? What happens if you delete cookies.sqlite and
> restart the browser?

Sorry I deleted cookies.sqlite before taking the log and now the problem is gone. I'm not the only one seeing this btw (http://forums.mozillazine.org/viewtopic.php?p=9691433#p9691433).
I can revert to today's nightly and then upgrade to the newest build again to reproduce the bug if you need me to.
It's probably a bug with migration then. If you can, generate a cookies.sqlite with an old build and then try the new one on it?
Weird, I deleted cookies.sqlite, went to an older build, logged on the sites mentioned in post 34 then upgraded again and wasn't able to reproduce the problem. Twice. Maybe this only happens if you have a huge number of cookies?
Hmm. Let me know if you find anything, and I'll see if I can reproduce...
Attached image duplicate cookies (deleted) —
I have not been able to find exact steps yet but the bug does also happen with new cookies.sqlite. Mozillazine is so far the only site where I have seen this, the site works properly for a while after deleting cookies.sqlite but eventually breaks. I noticed that when the issue occurs you end up with huge numbers of duplicate cookies.
Can you dump the contents of cookies.sqlite using sqlite3? Or attach it here, or email it to me?
Also seeing this. I have my old cookies.sqlite file if you want it
Attached image More duplicates (deleted) —
Same problem with forums.mozillazine.org, a lot of dup cookies.
Awesome, thanks. It's not a schema / binding params problem, must be a bug in the check for existing cookies.

Two questions:
1. Are you using sessionrestore?
2. If so, are the affected sites the ones that are loaded at startup? And only those sites?
Maybe related: I have to refresh pages twice until they load properly; first refresh doesn't load images.
(In reply to comment #45)
That was fixed in the followup patch in Bug 513086.
(In reply to comment #44)

> Two questions:
> 1. Are you using sessionrestore?
> 2. If so, are the affected sites the ones that are loaded at startup? And only
> those sites?

1) I am and I have seen this issue
2) Doesn't need to be startup, I hit it right after logging into mozillazine, change a page on the site and I was logged out.  Had to login again.
(In reply to comment #47)
> (In reply to comment #44)
> 
> > Two questions:
> > 1. Are you using sessionrestore?
> > 2. If so, are the affected sites the ones that are loaded at startup? And only
> > those sites?
> 
> 1) I am and I have seen this issue
> 2) Doesn't need to be startup, I hit it right after logging into mozillazine,
> change a page on the site and I was logged out.  Had to login again.

I also had to clear cookies in order to login and stay logged in.
(In reply to comment #44)
> Awesome, thanks. It's not a schema / binding params problem, must be a bug in
> the check for existing cookies.
> 
> Two questions:
> 1. Are you using sessionrestore?
> 2. If so, are the affected sites the ones that are loaded at startup? And only
> those sites?

1. No, not normally (I tried it once to see if it would keep me logged on), but it didn't work so I turned it off.
2. Just once in the above test (shouldn't have caused all those dup's)... The number of duplicate cookies might be the number of times I was sent back to the login page, and since I have the userid and psw's saved using a master password, I can retry logging in easily.

I also tried deleting the cookies for forums.mozillazine.org. This worked only a couple of times before failing again.
I checked other Cookies.sqlite I have (such as Firefox 3.6.8 and 4.0B2...) and noticed they don't have the 'baseDomain column' (also they have no dup's).
In the SQLite dump I made above it has baseDomain as the last column listed,(undoubtedly) when this patch landed there was some code used to build the new baseDomain column from the existing data, might this have caused problems?
Does all code handle/maintain this column, as needed?

After deleting cookies.sqlite, and letting it be rebuilt I have not had problems so far, but I'll watch for duplicates per comment #40.
I was able to reproduce the problem when updating to today's nightly. Whenever I close the browser I'm logged out of ESReality and I can't login to hotmail anymore, it just keeps trying to login but fails and ends up returning to the  previous page. I'm not using session restore btw. Do you still need the cookies log?
Might be useful, yup.
I am logged out of every forum by this and it is getting rather annoying as I belong to a great deal of Forums. I cannot keep deleting my cookies.sqlite on a daily basis.
(In reply to comment #52)
> Might be useful, yup.

I must be doing something stupid because following the instructions from http://www.mozilla.org/projects/netlib/cookies/cookie-log.html the cookies-log.txt file is generated but there's nothing there.
Also I noticed that if you try a new profile, it takes a while for this to happen but it will happen eventually, which led me to think that vacuuming cookies.sqlite is corrupting it, leading to duplicated cookies and login problems on some (though not all) websites.
> following the instructions from http://www.mozilla.org/projects/netlib/cookies/cookie-log.html the cookies-log.txt file is generated but there's nothing there.

Same here. I think this should be backed out at this point.
It appears that what is happening here is that deleting cookies is not actually removing them from the database.  If i delete cookies via the Preferences panel, the seem to go away, but if I then exit the browser and re-launch it, they are back.
I'm also having issues with cookies disappearing. I guess these issues (from comment 34 onwards) should be filed as a separate bug, unless this patch is going to be backed out.
Depends on: 583568
Yeah, I've backed this out -- I can't reproduce locally, so I've no idea how long it will take to fix. Hopefully will be able to re-land in a few days.

Thanks to everyone who helped debug; it should make it much easier for me to figure out what's happening...

http://hg.mozilla.org/mozilla-central/rev/f53433d38b84
Interestingly I seem to 'lose' my cookies only from sites powered by phpBB.

Mozilla/5.0 (Windows; Windows NT 6.1; WOW64; rv:2.0b3pre) Gecko/20100801 Minefield/4.0b3pre - Build ID: 20100801041348
> following the instructions from http://www.mozilla.org/projects/netlib/cookies/cookie-log.html the cookies-log.txt file is generated but there's nothing there.

Filed bug 583636.
Attached patch fix creationid (deleted) — Splinter Review
Fix creationid logic now that we're reading rowid's out-of-order.
Attachment #462226 - Flags: review?(sdwilsh)
Attached patch test for creationid (deleted) — Splinter Review
FTR, here's a test that makes sure creationid's are consistent.
Comment on attachment 462226 [details] [diff] [review]
fix creationid

r=sdwilsh
Attachment #462226 - Flags: review?(sdwilsh) → review+
Blocks: 583557
Target Milestone: --- → mozilla2.0b4
Depends on: 587767
Depends on: 597728
I have Firefox 4.0b6 with profile residing on SD card.
I experience extreme slowdown because of constant and intensive writing to cookies.sqlite on browser start. I have like 30 tabs opened so those are being restored on start.
Here's link to resource monitor screenshot:
http://jbtnkw.blu.livefilestore.com/y1pvpgIhiz545sUSI67IG8ywXGcW0w2cqYeKAft3FId00fwRHkYVYKZ6LG2j9duAEJxh1Z5xLAuYMW2DpGJxqym3EGd0Y4Oxkjw/firefox4.0b6-cookies-issue.png?psid=1

cookies.sqlite had size of 500kb before removal and is 98 kb now. The problem remains after deletion of cookies.sqlite and restart of browser.
That should be fixed by bug 595925.
Depends on: 600684
Depends on: 651810
No longer depends on: 651810
Depends on: 651810
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Created:
Updated:
Size: