870460 - Lazy load of cookie service blocks main thread while cookie database loads

Reporter

Description

•

12 years ago

Desktop currently lazily loads the cookie database (i.e. we wait until the first use of the DB to initialize the cookie service). So the first HTTP channel that does an AsyncOpen will call GetCookieStringFromHttp (a synchronous call), which winds up waiting for the cookie service to open the database and fetch the desired cookie synchronously off disk. Even if we make all the I/O in the cookie DB off-main-thread and/or async, we'd still have to have the main thread sit and wait for it since the GetCookie API is sync.

I suspect the best fix is to init the cookie service earlier in startup, loading it async (or sync on a non-main thread) and in most cases it will hopefully already be loaded by the time it's used. We'd have to measure if this is actually a win, though.

Measurement here is a little tricky. We could instrument the code so we get telemetry on the 1st cookie load, and compare that time. If I'm right it will be faster (hopefully instant) when we load the cookie service earlier. But this won't measure the impact that loading the cookie service earlier has on startup time overall (there will be some possibility that moving the cookie I/O earlier will cause I/O contention and slow startup). Not sure if there's a good way to measure that.

Taras, does this sound like a plan that's worth trying? It's a simple enough patch to write (as long as someone can point me at a good place during startup to init the cookie service).

Note that in b2g we already do the early init (bug 810209) to avoid delete races.

Jason Duell

Reporter

Comment 1

•

12 years ago

this would also fix bug 867798.

Flags: needinfo?(taras.mozilla)

(dormant account)

Comment 2

•

12 years ago

Can we make the network channel wake up the cookie db and have the main thread wait for the channel as it usually waits for data to load?

Flags: needinfo?(taras.mozilla)

Jason Duell

Reporter

Comment 3

•

12 years ago

If by "wait" you mean "wait asynchronously", we can do that, but only by changing the nsICookieService API to add an async version of GetCookie. That's doable, and we would presumably not have to change all call sites, just nsHttpChannel::AsyncOpen. Of course the first pageload will still be delayed by however long it takes to load the cookie database.

Patrick McManus [:mcmanus]

Updated

•

9 years ago

Whiteboard: [necko-backlog]

Shian-Yow Wu [:swu]

Updated

•

7 years ago

See Also: → https://bugzilla.mozilla.org/show_bug.cgi?id=1360202

Jason Duell

Reporter

Comment 6

•

7 years ago

So in bug 1360202 we're seeing startup jank of 90ms, which would be good to fix (I'm not sure I'd put it as a p1 myself, but I'm hoping the fix is easy.)

Per comment 0--we need to find a suitable, early-as-possible place to initialize the cookie service, instead of our current dumb lazy-load strategy.  If we do that, with any luck we'll be done with the async query that loads the whole SQLite cookie database into memory.  And if not, at least the amount of time we'll need to wait for it should be less.

Benjamin, where's a good place to put this during init? AFAIK the cookie service shouldn't need a whole lot of other stuff running already (SQLite being the big one).

Nick, if you're busy you could possibly give this to Junior. But it looks relatively easy and self-contained.

Assignee: nobody → hurley

Flags: needinfo?(benjamin)

Whiteboard: [necko-backlog] → [qf:p1][necko-quantum] [necko-active]

Will Wang [:WillWang]

Comment 7

•

7 years ago

Bring bug dependency from duplicated bug 1360202.

Blocks: ss-perf, 1356605

Kan-Ru Chen [:kanru] (UTC+9)

Comment 8

•

7 years ago

(In reply to Jason Duell [:jduell] (needinfo me) from comment #6)
> So in bug 1360202 we're seeing startup jank of 90ms, which would be good to
> fix (I'm not sure I'd put it as a p1 myself, but I'm hoping the fix is easy.)
> 
> Per comment 0--we need to find a suitable, early-as-possible place to
> initialize the cookie service, instead of our current dumb lazy-load
> strategy.  If we do that, with any luck we'll be done with the async query
> that loads the whole SQLite cookie database into memory.  And if not, at
> least the amount of time we'll need to wait for it should be less.
> 
> Benjamin, where's a good place to put this during init? AFAIK the cookie
> service shouldn't need a whole lot of other stuff running already (SQLite
> being the big one).

Note that if you want to use sqlite as early as possible, there is one potential blocker: bug 730495. We don't have a well-defined "sqlite is ready" point after which we can safely call the sqlite API.

Will Wang [:WillWang]

Comment 9

•

7 years ago

(In reply to Jason Duell [:jduell] (needinfo me) from comment #6)
> So in bug 1360202 we're seeing startup jank of 90ms, which would be good to
> fix (I'm not sure I'd put it as a p1 myself, but I'm hoping the fix is easy.)

For the 90ms jank, I'd like to share more info for reference:

I just tried to profile my old Mac mini (Mid 2010) with the latest m-c and it spent
130ms for nsCookieService::Init()
110ms for nsCookieService::InitDBStates()
the profile is here: https://perfht.ml/2qq6blw

Bug 1356605 comment 10 also shows a very different result for the profiling in InitDBStates() (9ms vs 90ms).

I think it might be mostly about the SSD vs HDD.

Jason Duell

Reporter

Comment 10

•

7 years ago

kanru: so I get from bug 730495 that there's no *programmatic* way to guarantee that sqlite is ready.  But we obviously must have some places in our startup at this point where we know that it effectively must be ready (i.e. where we can start using places, NSS, etc).  Do you know of the earliest point when that is (or at least some early point)?

Flags: needinfo?(kchen)

Kan-Ru Chen [:kanru] (UTC+9)

Comment 11

•

7 years ago

(In reply to Jason Duell [:jduell] (needinfo me) from comment #10)
> kanru: so I get from bug 730495 that there's no *programmatic* way to
> guarantee that sqlite is ready.  But we obviously must have some places in
> our startup at this point where we know that it effectively must be ready
> (i.e. where we can start using places, NSS, etc).  Do you know of the
> earliest point when that is (or at least some early point)?

It was working by coincidence. I think it was https://dxr.mozilla.org/mozilla-central/rev/1ec3a3ff68f2d1a54e6ed33e926c28fee286bdf1/toolkit/mozapps/extensions/internal/XPIProvider.jsm#1250-1278 that initialized sqlite, but that line was removed by bug 1277295. I can't tell where sqlite is initialized now without using a debugger.

Really, I think we should fix bug 730495 properly...

Flags: needinfo?(kchen)

Jason Duell

Reporter

Comment 12

•

7 years ago

OK, I'll ask in bug 730495 if it's likely to happen in the 57 time frame. I'm guessing not, so perhaps we'll just need to fire up a debugger for now and find a place to insert a cookieservice init.

Jason Duell

Reporter

Comment 13

•

7 years ago

Marco:

So currently our cookieService runs the following SQLite logic during startup.  Besides our plan to simply start this logic earlier, I'm wondering what else we can do here:

------

InitDBStates():
  ReadAheadFile(mDefaultDBState->cookieFile);
  mStorageService->OpenUnsharedDatabase(mDefaultDBState->cookieFile,
  insert a bunch of DB Listeners
  TableExists() check
  GetSchemaVersion()
  3 CreateAsyncStatement()s

  Call Read()
    2 more CreateAsyncStatement()s

    // Start a new connection for sync reads, to reduce contention with the
    // background thread [JD: not sure what that means?]. We need to do this
    // before we kick off write statements, since they can lock the database
    // and prevent connections from being opened. [JD: still true?]
    OpenUnsharedDatabase()   

    ExecuteAsync(SELECT * from the cookie table)
    ExecuteAsync(possibly no longer needed deletion of any records with NULL origin)

---

Which of these things block?  I'm assuming the major delay is calling OpenUnsharedDatabase. It doesn't look like we can avoid that call (i.e. switch to OpenAsyncDatabase() instead), since we need to check the schemaVersion of the database at startup, and that operation is only supported for synchronous connections. Could that easily be changed? Could we have an AsyncGetSchemaVersion (with a callback) for async connections?

Also, if you have any thoughts on a good, early place to init the cookieSvc during startup (see comment 6) that would be great too.

Flags: needinfo?(mak77)

Marco Bonardo [:mak]

Comment 14

•

7 years ago

(In reply to Jason Duell [:jduell] (needinfo me) from comment #13)
> Which of these things block?

Everything but ExecuteAsync.

>   mStorageService->OpenUnsharedDatabase(mDefaultDBState->cookieFile,

For sure this one is the biggest offender.

>   TableExists() check
>   GetSchemaVersion()

These are basically just tiny wrappers around queries. They are unlikely to do I/O since this info is probably in memory, though they run on the main thread and thus contend for the database resource with the async thread (if the async thread is busy doing anything, the main thread blocks). That should not be a problem in your current setup, since everything before them is synchronous...

> I'm assuming the major delay is calling
> OpenUnsharedDatabase. It doesn't look like we can avoid that call (i.e
> switch to OpenAsyncDatabase() instead), since we need to check the
> schemaVersion of the database at startup, and that operation is only
> supported for synchronous connections

Actually GetSchemaVersion() is just a wrapper that executes a "PRAGMA user_version" statement and fetches the result... You can do the same with ExecuteAsync. We didn't add an API because you can already do that through the current API. We could also add a helper if that helps, but you'll likely have to do that yourself: there are no dedicated Storage resources, just an owner and some peers taking care of needinfos, updates and emergencies. Also, TableExists() is a simple SELECT against sqlite_master.

Btw, it was my assumption that the cookie manager is synchronous, so won't you still have to block the main thread until the db is ready? Surely by opening early you could delay that, but it will then regress startup.
Imho the cleaner solution would be to expose 2 APIs, one asynchronous in the chrome process, the other synchronous in the content process. Content can use a ScriptBlocker. That would keep the UI responsive, and then it wouldn't matter much when you init cookies.

> Also, if you have any thoughts on a good, early place to init the cookieSvc
> during startup (see comment 6) that would be great too.

IIRC (it's been a while, so my memory may be rusty) bug 730495 should not be a problem for you. That bug is about NSS initializing the sqlite library before mozStorage. Your code always goes through mozStorage, so it should not be affected afaict, unless you also initialize NSS before the database in your code.
Offhand, it looks like running your code earlier will reduce the likelihood of bug 730495 happening, rather than cause it.

Flags: needinfo?(mak77)

u408661

Comment 15

•

7 years ago

So I've run this under a debugger, and everything said in comment 14 jibes with what I saw: initializing the cookie service causes mozstorage to be initialized, which calls sqlite3_config. So we're all good on the race; we'll never hit it no matter when we init the cookie service.

The question at this point, Jason, comes down to would we rather (1) slow down startup (by initing the cookie service, say, when we start up the socket thread), or (2) add a new async cookie api to always be used internally by gecko, and make the sync api use the scriptblocker thing. The latter is definitely way more work (I haven't looked into what would be involved yet), and I have to imagine it's been discussed at some point in the past (see also how we had both sync and async cache APIs) and decided against, but this may change the calculus. In the long run, though, the async api option could quite possibly be The Right Thing.

Flags: needinfo?(jduell.mcbugs)

Jason Duell

Reporter

Comment 16

•

7 years ago

The silly thing about making the cookie API async is that it doesn't provide any benefit except during startup. And I suspect it wouldn't even help startup potentially--all it takes is some page JS checking document.cookie and we'll have to fall back to a sync option.

I think it might be better and easier to have the cookieService attempt to load the database with the OpenAsync() version very early on in startup:

1) OpenAsyncDatabase()
2) In the callback when the DB is ready, asyncExecute a TableExists check.
3) In that callback, if the table does exist, asyncExecute("PRAGMA user_version").
4) In that callback, check if the schema is the right version.
5) If either the table was missing or the schema version is off, do the fallback logic: open a sync DB connection and do all the grunt work to re-create the tables (sucks to do it synchronously, but it's a rare failure pathway).
6) If the schema version was OK, asyncExecute the big select * that loads all the cookies into memory.
7) When we actually get our first GetCookie() etc. call on the cookieService, hopefully we'll have all the data already (telemetry!). If we don't, open a sync database connection and ask for just the one record in question (i.e. the logic that we currently have for GetCookie(), except we'd only open the sync connection if and when it's needed).

I.e. I think if we get lucky we can use only async APIs for the initial database read, which will get rid of the startup jank, and only do sync database calls when we hit a bad or out of date database.
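A minimal sketch of the callback chain described above, in standard C++ rather than mozStorage. Everything here is hypothetical scaffolding: `EventQueue`, `FakeAsyncDb`, `CookieLoader`, and the schema number are illustration-only stand-ins, and the sync rebuild path is reduced to a result code. It only shows the control flow: each async step runs in the previous step's callback, and a missing table or wrong schema version drops out to the sync fallback.

```cpp
#include <cassert>
#include <functional>
#include <queue>
#include <string>
#include <utility>
#include <vector>

// Hypothetical stand-in for an event loop: posted callbacks run later, in order.
struct EventQueue {
  std::queue<std::function<void()>> tasks;
  void Post(std::function<void()> task) { tasks.push(std::move(task)); }
  void RunAll() {
    while (!tasks.empty()) {
      std::function<void()> task = std::move(tasks.front());
      tasks.pop();
      task();
    }
  }
};

// Hypothetical fake database: every "async" call answers through the queue.
struct FakeAsyncDb {
  EventQueue& queue;
  bool tableExists;
  int userVersion;
  std::vector<std::string> rows;

  void AsyncTableExists(std::function<void(bool)> done) {
    queue.Post([this, done] { done(tableExists); });
  }
  void AsyncUserVersion(std::function<void(int)> done) {
    queue.Post([this, done] { done(userVersion); });
  }
  void AsyncSelectAll(std::function<void(std::vector<std::string>)> done) {
    queue.Post([this, done] { done(rows); });
  }
};

constexpr int kExpectedSchema = 7;  // illustrative value, not the real schema

enum class LoadResult { Pending, LoadedAsync, NeedsSyncRebuild };

struct CookieLoader {
  LoadResult result = LoadResult::Pending;
  std::vector<std::string> cookies;

  // Kicks off the chain; nothing below blocks the calling thread.
  void Start(FakeAsyncDb& db) {
    assert(result == LoadResult::Pending && "Start is meant to run once");
    db.AsyncTableExists([this, &db](bool exists) {
      if (!exists) {
        result = LoadResult::NeedsSyncRebuild;  // rare path: rebuild synchronously
        return;
      }
      db.AsyncUserVersion([this, &db](int version) {
        if (version != kExpectedSchema) {
          result = LoadResult::NeedsSyncRebuild;  // rare path: migrate synchronously
          return;
        }
        db.AsyncSelectAll([this](std::vector<std::string> loaded) {
          cookies = std::move(loaded);
          result = LoadResult::LoadedAsync;
        });
      });
    });
  }
};
```

A caller that needs a cookie while `result` is still `Pending` would take the existing single-record sync path; that part is not modeled here.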

This is a fairly simple plan, but there's probably some complexity/risk, and it might not hit 57 (or might take away dev hours we could more profitably use elsewhere). Hard to say exactly at this point. Maybe give it a stab for a day or two and if it winds up being harder than we think, simply moving the DB init earlier in startup may be all we can do for 57.

Flags: needinfo?(jduell.mcbugs)

Jason Duell

Reporter

Comment 17

•

7 years ago

Another option (which I just thought of, so maybe it's half-baked):  move all cookie DB access to a new thread (or maybe just use one from our thread pool for blocking disk I/O).  It can use sync APIs for everything (including the current async "select *"). Fire off init early in startup. When GetCookie gets called, if we haven't finished loading the cookies into the big hashtable, block the main thread on a mutex until we're done.  Present telemetry to show that the blocking hardly ever happens and point out that even if it does, the alternative is to block on I/O on the main thread.

This might be cleaner and easier to implement (and moves all the I/O off the main thread, except for the sometimes hitting a mutex issue).
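As a rough illustration of this option, here is a sketch in standard C++ (std::thread and std::condition_variable, not the Gecko threading classes). `CookieStore` and its `loadFromDisk` callback are hypothetical names; the callback stands in for the blocking "select *". The load starts as soon as the object is constructed, and `GetCookie` only blocks if the load has not finished yet.

```cpp
#include <cassert>
#include <condition_variable>
#include <functional>
#include <map>
#include <mutex>
#include <string>
#include <thread>
#include <utility>

// Hypothetical cookie store that loads its table on a dedicated thread.
class CookieStore {
 public:
  using Table = std::map<std::string, std::string>;

  // Starts the load immediately; `loadFromDisk` may block as long as it likes.
  explicit CookieStore(std::function<Table()> loadFromDisk) {
    assert(loadFromDisk && "a loader callback is required");
    mLoader = std::thread([this, load = std::move(loadFromDisk)] {
      Table loaded = load();  // blocking I/O happens off the caller's thread
      {
        std::lock_guard<std::mutex> lock(mMutex);
        mTable = std::move(loaded);
        mLoaded = true;
      }
      mCond.notify_all();
    });
  }

  ~CookieStore() { mLoader.join(); }

  // Blocks the caller only while the initial load is still in progress.
  std::string GetCookie(const std::string& host) {
    std::unique_lock<std::mutex> lock(mMutex);
    mCond.wait(lock, [this] { return mLoaded; });
    auto it = mTable.find(host);
    return it == mTable.end() ? std::string() : it->second;
  }

 private:
  std::mutex mMutex;
  std::condition_variable mCond;
  bool mLoaded = false;
  Table mTable;
  std::thread mLoader;
};
```

Timing the wait inside `GetCookie` would give the "how often and how long do we block" numbers mentioned above; this sketch takes the lock on every call, which is the simple variant.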

Marco Bonardo [:mak]

Comment 18

•

7 years ago

(In reply to Jason Duell [:jduell] (needinfo me) from comment #16)
> And I suspect it wouldn't even help
> startup potentially--all it takes is some page JS checking document.cookie
> and we'll have to fall back to a sync option.

The idea would be that at that point you don't block the main thread waiting for init; you block the page with a scriptBlocker instead.

Honza Bambas (:mayhemer)

Comment 19

•

7 years ago

(In reply to Marco Bonardo [::mak] from comment #18)
> (In reply to Jason Duell [:jduell] (needinfo me) from comment #16)
> > And I suspect it wouldn't even help
> > startup potentially--all it takes is some page JS checking document.cookie
> > and we'll have to fall back to a sync option.
> 
> The idea would be that at point you don't block the main-thread waiting for
> init, you block the page with a scriptBlocker instead.

Exactly.  OTOH, how is the

<head/><body><script>var allCookies = document.cookie;</script>....

case handled?

Benjamin Smedberg

Comment 20

•

7 years ago

Looks to me as if other people answered this better than I could have. Let me know if you still have questions for me.

Flags: needinfo?(benjamin)

Jason Duell

Reporter

Comment 21

•

7 years ago

After talking this through with Nick, our plan is to go with comment 17:  we're going to move all SQLite activity onto a new thread.

- We'll put code that initializes the cookieService (i.e. just grab it to a nsCOMPtr but then don't actually do anything with it) early on somewhere during startup (Nick suggests we do it when we initialize the socket transport thread: I'd like to ask around to find out if that's the earliest spot).  I'm not sure if we need to do this init on the main thread or not (the code will just launch a new thread, so maybe it doesn't matter which thread uses it--but generally the cookieService is main-thread only and I don't know if there's any XPCOM service infrastructure that requires that services are requested only on the main thread).

- When the cookieService is initialized it will create a new thread that does all the current logic to open (and if needed, alter/reconstruct) the cookie database.  One change: we can get rid of the current logic that does a blocking sync request for a single cookie if the large "select all cookies" async query hasn't finished.  Instead we can make the "select all" a sync query (and just get rid of the single cookie request).

- If something on the main thread calls into the cookieService before it is finished setting up the hashtable (i.e. the sqlite query isn't complete), we can either 1) simply block the main thread on a monitor until it's done, and gather telemetry to see how often it happens (and with how much delay).  If those numbers show any significant jank then we should 2) add an async version of GetCookie, and have necko use it in AsyncOpen, so we don't block the main thread.  

- We will need to make sure that *all* database access happens on the new "cookie I/O thread", so it will stay alive after initialization, and any calls to SetCookie/DeleteCookie/etc will need to 1) modify the hashtable synchronously on the main thread (see comment on locking, below), and then 2) proxy an event to the cookie I/O thread to do the SQLite update.
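The last bullet (update the hashtable synchronously, then proxy the database write to the cookie I/O thread) could look roughly like the sketch below. It uses standard C++ only; `IoThread`, `CookieService::FlushAndGetDbLog`, and the `mFakeDb` vector are hypothetical stand-ins for the real thread and the SQLite connection.

```cpp
#include <cassert>
#include <condition_variable>
#include <deque>
#include <functional>
#include <map>
#include <mutex>
#include <string>
#include <thread>
#include <utility>
#include <vector>

// Hypothetical stand-in for the cookie I/O thread: runs tasks in FIFO order.
class IoThread {
 public:
  IoThread() : mWorker([this] { Run(); }) {}
  ~IoThread() { Shutdown(); }

  void Dispatch(std::function<void()> task) {
    {
      std::lock_guard<std::mutex> lock(mMutex);
      assert(!mStopping && "Dispatch after Shutdown");
      mTasks.push_back(std::move(task));
    }
    mCond.notify_one();
  }

  // Drains the queue, then joins the worker. Safe to call more than once.
  void Shutdown() {
    {
      std::lock_guard<std::mutex> lock(mMutex);
      mStopping = true;
    }
    mCond.notify_one();
    if (mWorker.joinable()) mWorker.join();
  }

 private:
  void Run() {
    std::unique_lock<std::mutex> lock(mMutex);
    for (;;) {
      mCond.wait(lock, [this] { return mStopping || !mTasks.empty(); });
      if (mTasks.empty()) return;  // stopping and nothing left to run
      std::function<void()> task = std::move(mTasks.front());
      mTasks.pop_front();
      lock.unlock();
      task();  // the "SQLite update" runs without holding the queue lock
      lock.lock();
    }
  }

  std::mutex mMutex;
  std::condition_variable mCond;
  std::deque<std::function<void()>> mTasks;
  bool mStopping = false;
  std::thread mWorker;  // declared last so it starts after the other members
};

class CookieService {
 public:
  void SetCookie(const std::string& host, const std::string& value) {
    mTable[host] = value;  // 1) update the in-memory table on the calling thread
    mIo.Dispatch([this, host, value] {  // 2) persist on the I/O thread
      mFakeDb.push_back(host + "=" + value);
    });
  }

  std::string GetCookie(const std::string& host) const {
    auto it = mTable.find(host);
    return it == mTable.end() ? std::string() : it->second;
  }

  // Waits for pending writes, then exposes what the I/O thread "wrote".
  std::vector<std::string> FlushAndGetDbLog() {
    mIo.Shutdown();
    return mFakeDb;
  }

 private:
  std::map<std::string, std::string> mTable;  // calling thread only
  std::vector<std::string> mFakeDb;           // I/O thread only, until Shutdown
  IoThread mIo;  // declared last: destroyed first, so tasks never outlive mFakeDb
};
```

Because the table and the fake database are each touched by exactly one thread, only the task queue itself needs a lock in this sketch.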

Locking: in this plan we'll have two different threads modifying the cookie hashtable. The most obvious way to handle that would be to have them lock a mutex every time they need hashtable access. But the write pattern here is actually very simple, and we might not need to lock/unlock a mutex for every cookieService access. We basically have a two-stage access model: in stage 1 (initialization), only the cookie I/O thread is allowed to write (or read) the hashtable, and if the main thread tries to, it must be delayed (via monitor.wait) until init is complete. In stage 2 (post-init), the only thread that will read or write the hashtable (AFAICT) will be the main thread (the cookie I/O thread will do writes to the database, but shouldn't need to touch the hashtable after init). So we might be able to get away with keeping some "mInitialized" boolean that starts out at 'false' and only gets set to 'true' after the hashtable has been written AND we've guaranteed that we've flushed the CPU operations to memory (any mutex/monitor lock or unlock will flush memory). So the I/O thread would do something like

  // It's OK that we hold this lock while we do I/O on our non-main I/O thread
  MonitorAutoLock lock(mMonitor);
  ... Do synchronous "Select all" on database and populate the cookie hashtable
  mInitialized = true;
  mMonitor.Notify();

And then all synchronous CookieService functions like Get|SetCookie could then do something like

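    if (!mInitialized) {
      // Slow path: init is still running, so wait for the I/O thread's Notify()
      MonitorAutoLock lock(mMonitor);
      while (!mInitialized) {
        mMonitor.Wait();
      }
    }
    // Either way init is now complete: touch the hashtable, no need to lock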

This could be wrong, and/or overkill (maybe grabbing a mutex every time we call into the CookieService is cheap enough). I'll ask someone who knows our thread synchronization stuff better than I do.
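For comparison, here is how the same two-stage idea might be written in standard C++. One assumption worth flagging: in the C++ memory model an unlocked read of a plain bool that races with a write is a data race, so this sketch makes the flag a std::atomic<bool> with release/acquire ordering instead of relying on the monitor to flush memory. `InitGate` and `CookieTable` are hypothetical names, not Gecko classes.

```cpp
#include <atomic>
#include <cassert>
#include <condition_variable>
#include <mutex>
#include <string>
#include <thread>
#include <unordered_map>

// Hypothetical one-shot gate: cheap to pass once initialization is complete.
class InitGate {
 public:
  // Called once by the loading thread, after the table is fully populated.
  void MarkInitialized() {
    {
      std::lock_guard<std::mutex> lock(mMutex);
      assert(!mInitialized.load(std::memory_order_relaxed) && "init runs once");
      // Release: the table writes made before this store become visible to
      // any thread that later observes `true` with an acquire load.
      mInitialized.store(true, std::memory_order_release);
    }
    mCond.notify_all();
  }

  // Called by readers before touching the table.
  void EnsureInitialized() {
    if (mInitialized.load(std::memory_order_acquire)) {
      return;  // fast path: no lock taken after init
    }
    std::unique_lock<std::mutex> lock(mMutex);
    mCond.wait(lock, [this] {
      return mInitialized.load(std::memory_order_acquire);
    });
  }

 private:
  std::atomic<bool> mInitialized{false};
  std::mutex mMutex;
  std::condition_variable mCond;
};

// Hypothetical table: written by one thread before the gate opens, read by
// another thread only after it has passed the gate.
struct CookieTable {
  std::unordered_map<std::string, std::string> entries;
  InitGate gate;

  std::string Get(const std::string& host) {
    gate.EnsureInitialized();
    auto it = entries.find(host);
    return it == entries.end() ? std::string() : it->second;
  }
};
```

After initialization each call costs one atomic load. Whether that is measurably cheaper than an uncontended mutex is exactly the open question raised above.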

Jason Duell

Reporter

Comment 22

•

7 years ago

Nathan: can you look at comment 21 and tell me

1) If you know of a good, early place during startup to initialize the cookieService (requires SQLite to be working/available by then).

2) Does service instantiation need to happen on the main thread?

3) Does my plan for lockless access to the cookie hashtable seem sane?

Flags: needinfo?(nfroyd)

Jason Duell

Reporter

Comment 23

•

7 years ago

Note: in case I didn't make it clear, this plan would mean that *all* of our cookie SQLite would be done with a sync db connection on the I/O thread. AFAICT we would no longer need to do anything async, and would no longer need to open a 2nd, async db connection.

u408661

Comment 24

•

7 years ago

Junior - Jason says you're free to take this on. I'll help mentor you through it and do at least the first round or two of reviews. Sound good?

Assignee: hurley → juhsu

Part1_OMT_Init WIP1, v1 | 7 years ago | Junior [inactive] (deleted), patch
Part1_OMT_Init WIP1, v2 | 7 years ago | Junior [inactive] (deleted), patch | u408661 : feedback+
CookieDBStartupOMT - WIP, v3 | 7 years ago | Junior [inactive] (deleted), patch
CookieDBStartupOMT - WIP, v4 | 7 years ago | Junior [inactive] (deleted), patch
CookieDBStartupOMT, v5 | 7 years ago | Junior [inactive] (deleted), patch | francois : feedback-
CookieDBStartupOMT, v6 | 7 years ago | Junior [inactive] (deleted), patch | u408661 : review-
CookieDBStartupOMT, v7 | 7 years ago | Junior [inactive] (deleted), patch
CookieDBStartupOMT, v8 | 7 years ago | Junior [inactive] (deleted), patch | u408661 : review-
CookieDBStartupOMT, v9 | 7 years ago | Junior [inactive] (deleted), patch
Part1: CookieDBStartupOMT, v10 | 7 years ago | Junior [inactive] (deleted), patch | u408661 : review+ jdm : review+
Part2: close syncConn for edge cases, v1 | 7 years ago | Junior [inactive] (deleted), patch | u408661 : review+
Part1: CookieDBStartupOMT, v11 | 7 years ago | Junior [inactive] (deleted), patch | CuveeHsu : review+ francois : feedback+
Part2: close syncConn for edge cases, v2 | 7 years ago | Junior [inactive] (deleted), patch | CuveeHsu : review+
Part3: talos-whitelist, v1 | 7 years ago | Junior [inactive] (deleted), patch | jmaher : review+
Part4: threadLifecycle - v1 | 7 years ago | Junior [inactive] (deleted), patch | u408661 : review+
Part4: threadLifecycle - v2 | 7 years ago | Junior [inactive] (deleted), patch | CuveeHsu : review+