Closed Bug 870460 Opened 12 years ago Closed 7 years ago

Lazy load of cookie service blocks main thread while cookie database loads

Categories

(Core :: Networking: Cookies, defect, P2)

defect

Tracking

()

RESOLVED FIXED
mozilla58
Performance Impact medium
Tracking Status
firefox58 --- fixed

People

(Reporter: jduell.mcbugs, Assigned: CuveeHsu)

References

(Depends on 1 open bug, Blocks 1 open bug)

Details

(Keywords: perf, Whiteboard: [necko-quantum][necko-triaged])

Attachments

(4 files, 12 obsolete files)

(deleted), patch
CuveeHsu
: review+
francois
: feedback+
Details | Diff | Splinter Review
(deleted), patch
CuveeHsu
: review+
Details | Diff | Splinter Review
(deleted), patch
jmaher
: review+
Details | Diff | Splinter Review
(deleted), patch
CuveeHsu
: review+
Details | Diff | Splinter Review
Desktop currently lazily loads the cookie database (i.e. we wait until the first use of the DB to initialize the cookie service).  So the first HTTP channel that does an AsyncOpen will call GetCookieStringFromHttp (a synchronous call), which winds up waiting for the cookie service to open the database and fetch the desired cookie synchronously off disk.  Even if we make all the I/O in the cookie DB off-main-thread and/or async, we'd still have to have the main thread sit and wait for it since the GetCookie API is sync.

I suspect the best fix is to init the cookie service earlier in startup, loading it async (or sync on a non-main thread) and in most cases it will hopefully already be loaded by the time it's used.  We'd have to measure if this is actually a win, though.

Measurement here is a little tricky.  We could instrument the code so we get telemetry on the 1st cookie load, and compare that time.  If I'm right it will be faster (hopefully instant) when we load the cookie service earlier.  But this won't measure the impact that loading the cookie service earlier has on startup time overall (there will be some possibility that moving the cookie I/O earlier will cause I/O contention and slow startup).  Not sure if there's a good way to measure that.

Taras, does this sound like a plan that's worth trying? It's a simple enough patch to write (as long as someone can point me at a good place during startup to init the cookie service).

Note that in b2g we already do the early init (bug 810209) to avoid delete races.
this would also fix bug 867798.
Flags: needinfo?(taras.mozilla)
Can we make the network channel wake up the cookie db and have the main thread wait for the channel as it usually waits for data to load?
Flags: needinfo?(taras.mozilla)
If by "wait" you mean "wait asynchronously", we can do that, but only by changing the nsICookieService API to add an async version of GetCookie.  Doable, and we would presumably not have to change all call sites, just nsHttpChannel::AsyncOpen.  It's doable.  Of course the first pageload will still be delayed by however long it takes to load the SQL.
Whiteboard: [necko-backlog]
So in bug 1360202 we're seeing startup jank of 90ms, which would be good to fix (I'm not sure I'd put it as a p1 myself, but I'm hoping the fix is easy.)

Per comment 0--we need to find a suitable, early-as-possible place to initialize the cookie service, instead of our current dumb lazy-load strategy.  If we do that, with any luck we'll be done with the async query that loads the whole SQLite cookie database into memory.  And if not, at least the amount of time we'll need to wait for it should be less.

Benjamin, where's a good place to put this during init? AFAIK the cookie service shouldn't need a whole lot of other stuff running already (SQLite being the big one).

Nick, if you're busy you could possibly give this to Junior. But it looks relatively easy and self-contained.
Assignee: nobody → hurley
Flags: needinfo?(benjamin)
Whiteboard: [necko-backlog] → [qf:p1][necko-quantum] [necko-active]
Bring bug dependency from duplicated bug 1360202.
Blocks: ss-perf, 1356605
(In reply to Jason Duell [:jduell] (needinfo me) from comment #6)
> So in bug 1360202 we're seeing startup jank of 90ms, which would be good to
> fix (I'm not sure I'd put it as a p1 myself, but I'm hoping the fix is easy.)
> 
> Per comment 0--we need to find a suitable, early-as-possible place to
> initialize the cookie service, instead of our current dumb lazy-load
> strategy.  If we do that, with any luck we'll be done with the async query
> that loads the whole SQLite cookie database into memory.  And if not, at
> least the amount of time we'll need to wait for it should be less.
> 
> Benjamin, where's a good place to put this during init? AFAIK the cookie
> service shouldn't need a whole lot of other stuff running already (SQLite
> being the big one).

Note if you want to use sqlite as early as possible, there is one potential blocker bug 730495 that we don't have a well defined "sqlite is ready" point where we can safely call sqlite API.
(In reply to Jason Duell [:jduell] (needinfo me) from comment #6)
> So in bug 1360202 we're seeing startup jank of 90ms, which would be good to
> fix (I'm not sure I'd put it as a p1 myself, but I'm hoping the fix is easy.)

For the 90ms jank, I'd like to share more info for reference:

I just tried to profile my old mac mini(2010 Mid) with the latest m-c and it spent
130ms for nsCookieService::Init()
110ms for nsCookieService::InitDBStates()
the profile is here: https://perfht.ml/2qq6blw

Bug 1356605 comment 10 also shows that a very different result for the profiling in InitDBStates()  (9ms vs 90ms)

I think it might be mostly about the SSD vs HDD.
kanru: so I get from bug 730495 that there's no *programmatic* way to guarantee that sqlite is ready.  But we obviously must have some places in our startup at this point where we know that it effectively must be ready (i.e. where we can start using places, NSS, etc).  Do you know of the earliest point when that is (or at least some early point)?
Flags: needinfo?(kchen)
(In reply to Jason Duell [:jduell] (needinfo me) from comment #10)
> kanru: so I get from bug 730495 that there's no *programmatic* way to
> guarantee that sqlite is ready.  But we obviously must have some places in
> our startup at this point where we know that it effectively must be ready
> (i.e. where we can start using places, NSS, etc).  Do you know of the
> earliest point when that is (or at least some early point)?

It was working because coincidence. I think it was https://dxr.mozilla.org/mozilla-central/rev/1ec3a3ff68f2d1a54e6ed33e926c28fee286bdf1/toolkit/mozapps/extensions/internal/XPIProvider.jsm#1250-1278 initializing sqlite but the line was removed by bug 1277295. I'm not sure where sqlite is initialized now without using a debugger.

Really, I think we should fix bug 730495 properly...
Flags: needinfo?(kchen)
OK, I'll ask in bug 730495 if it's likely to happen in the 57 time frame. I'm guessing not, so perhaps we'll just need to fire up a debugger for now and find a place to insert a cookieservice init.
Marco:

So currently our cookieService runs the following SQLite logic during startup.  Besides our plan to simply start this logic earlier, I'm wondering what else we can do here:

------

InitDBStates():
  ReadAheadFile(mDefaultDBState->cookieFile);
  mStorageService->OpenUnsharedDatabase(mDefaultDBState->cookieFile,
  insert a bunch of DB Listeners
  TableExists() check
  GetSchemaVersion()
  3 CreateAsyncStatement()s

  Call Read()
    2 more CreateAsyncStatement()s

    // Start a new connection for sync reads, to reduce contention with the
    // background thread [JD: not sure what that means?]. We need to do this
    // before we kick off write statements, since they can lock the database
    // and prevent connections from being opened. [JD: still true?]
    OpenUnsharedDatabase()   

    ExecuteAsync(SELECT * from the cookie table)
    ExecuteAsync(possibly no longer needed deletion of any records with NULL origin)

---

Which of these things block?  I'm assuming the major delay is calling OpenUnsharedDatabase. It doesn't look like we can avoid that call (i.e switch to OpenAsyncDatabase() instead), since we need to check the schemaVersion of the database at startup, and that operation is only supported for synchronous connections (could that easily be changed? Could we have an AsyncGetSchemaVersion (with a callback) for async connections?

Also, if you have any thoughts on a good, early place to init the cookieSvc during startup (see comment 6) that would be great too.
Flags: needinfo?(mak77)
(In reply to Jason Duell [:jduell] (needinfo me) from comment #13)
> Which of these things block?

Everything but ExecuteAsync.

>   mStorageService->OpenUnsharedDatabase(mDefaultDBState->cookieFile,

For sure this one is the biggest offender.

>   TableExists() check
>   GetSchemaVersion()

These are basically just tiny wrappers around queries, they are unlikely to do IO since this info is probably in memory, though they are on main-thread and thus they contend the database resource with the async thread (if the async thread is busy doing anything, the main-thread blocks). Should not be a problem in your current setup, since everything before them is synchronous...

> I'm assuming the major delay is calling
> OpenUnsharedDatabase. It doesn't look like we can avoid that call (i.e
> switch to OpenAsyncDatabase() instead), since we need to check the
> schemaVersion of the database at startup, and that operation is only
> supported for synchronous connections

Actually GetSchemaVersion() is just a wrapper that executes a "PRAGMA user_version" statement and fetches the result... You can do the same with ExecuteAsync. We didn't add an API because through the current API you can do that already. We could also gain an helper if that helps, but likely you'll have to do that yourself, there's no Storage dedicated resources, just an owner and some peers taking care of needinfo, updates and emergencies. Also TableExists() is a simple SELECT against sqlite_master.

Btw, it was my assumption that the cookies manager is synchronous, so you'll still have to block the main-thread until the db is ready? Surely by opening early you could delay that, but it will then regress startup.
Imho the cleaner solution would be to expose 2 APIs, one asynchronous in the chrome process, the other synchronous in the content process. Content can use a ScriptBlocker. That would allow to keep the UI responsive and then it wouldn't matter much when you init cookies.

> Also, if you have any thoughts on a good, early place to init the cookieSvc
> during startup (see comment 6) that would be great too.

IIRC (it passed some time so I may have rusty memories) bug 730495 should not be a problem for you. That bug is about NSS initializing the sqlite library before mozStorage. Your code always goes through mozStorage, so it should not be affected afaict, unless you also initialize NSS before the database in your code.
By making your code running earlier off-hand looks like you will reduce the likely for bug 730495 to happen, rather than causing it.
Flags: needinfo?(mak77)
So I've run this under a debugger, and everything said in comment 14 jives with what I saw - initializing the cookie service causes mozstorage to be initialized, which calls sqlite3_config. So we're all good on the race, we'll never hit it no matter when we init the cookie service.

The question at this point, Jason, comes down to would we rather (1) slow down startup (by initing the cookie service, say, when we start up the socket thread), or (2) add a new async cookie api to always be used internally by gecko, and make the sync api use the scriptblocker thing. The latter is definitely way more work (I haven't looked into what would be involved yet), and I have to imagine it's been discussed at some point in the past (see also how we had both sync and async cache APIs) and decided against, but this may change the calculus. In the long run, though, the async api option could quite possibly be The Right Thing.
Flags: needinfo?(jduell.mcbugs)
The silly thing about making the cookie API async is that it doesn't provide any benefit except during startup.  And I suspect it wouldn't even help startup potentially--all it takes is some page JS checking document.cookie and we'll have to fall back to a sync option.

I think it might be better and easier to have the cookieService attempt to load the database with the OpenAsync() version very early on in startup:

   1) OpenAsyncDatabase()
   2) in callback when DB is ready, asyncExecute Table Exists check.
   3) In callback, if table does exist, asyncExecute("PRAGMA user_version")
   4) In that callback, check if the schema is the right version.  
   5) If either the table was missing, or scheme version is off, do fall back logic--open a sync DB connection and do all the grunt work to re-create the tables (sucks to do it synchronously, but it's a rare failure pathway.)
   4) If the schema version was OK, asyncExecute the big select * that loads all the cookies into memory.
   5) When we actually get our first GetCookie() etc call on the cookieService, hopefully we'll have all the data already (telemetry!).  If we don't, open a sync database connection and ask for just the one record in question (i.e. the logic that we currently have for GetCookie(), except we'd only open the sync connection if and when it's needed).

I.e. I think if we get lucky we can use only async APIs for the initial database read, which will get rid of the startup jank, and only do sync database calls when we hit a bad or out of date database.

This is a fairly simple plan, but there's probably some complexity/risk, and it might not hit 57 (or might take away dev hours we could more profitably use elsewhere).  Hard to say exactly at this point.  Maybe give it a stab for a day or two and if it winds up being harder than we think, simply moving the DB init earlier in startup may be all we can do for 57.
Flags: needinfo?(jduell.mcbugs)
Another option (which I just thought of, so maybe it's half-baked):  move all cookie DB access to a new thread (or maybe just use one from our thread pool for blocking disk I/O).  It can use sync APIs for everything (including the current async "select *"). Fire off init early in startup. When GetCookie gets called, if we haven't finished loading the cookies into the big hashtable, block the main thread on a mutex until we're done.  Present telemetry to show that the blocking hardly ever happens and point out that even if it does, the alternative is to block on I/O on the main thread.

This might be cleaner and easier to implement (and moves all the I/O off the main thread, except for the sometimes hitting a mutex issue).
(In reply to Jason Duell [:jduell] (needinfo me) from comment #16)
> And I suspect it wouldn't even help
> startup potentially--all it takes is some page JS checking document.cookie
> and we'll have to fall back to a sync option.

The idea would be that at point you don't block the main-thread waiting for init, you block the page with a scriptBlocker instead.
(In reply to Marco Bonardo [::mak] from comment #18)
> (In reply to Jason Duell [:jduell] (needinfo me) from comment #16)
> > And I suspect it wouldn't even help
> > startup potentially--all it takes is some page JS checking document.cookie
> > and we'll have to fall back to a sync option.
> 
> The idea would be that at point you don't block the main-thread waiting for
> init, you block the page with a scriptBlocker instead.

Exactly.  OTOH, how is the

<head/><body><script>var allCookies = document.cookie;</script>....

case handled?
Looks to me as if other people answered this better than I could have. Let me know if you still have questions for me.
Flags: needinfo?(benjamin)
After talking this through with Nick, our plan is to go with comment 17:  we're going to move all SQLite activity onto a new thread.

- We'll put code that initializes the cookieService (i.e. just grab it to a nsCOMPtr but then don't actually do anything with it) early on somewhere during startup (Nick suggests we do it when we initialize the socket transport thread: I'd like to ask around to find out if that's the earliest spot).  I'm not sure if we need to do this init on the main thread or not (the code will just launch a new thread, so maybe it doesn't matter which thread uses it--but generally the cookieService is main-thread only and I don't know if there's any XPCOM service infrastructure that requires that services are requested only on the main thread).

- When the cookieService is initialized it will create a new thread that does all the current logic to open (and if needed, alter/reconstruct) the cookie database.  One change: we can get rid of the current logic that does a blocking sync request for a single cookie if the large "select all cookies" async query hasn't finished.  Instead we can make the "select all" a sync query (and just get rid of the single cookie request).

- If something on the main thread calls into the cookieService before it is finished setting up the hashtable (i.e. the sqlite query isn't complete), we can either 1) simply block the main thread on a monitor until it's done, and gather telemetry to see how often it happens (and with how much delay).  If those numbers show any significant jank then we should 2) add an async version of GetCookie, and have necko use it in AsyncOpen, so we don't block the main thread.  

- We will need to make sure that *all* database access happens on the new "cookie I/O thread", so it will stay alive after initialization, and any calls to SetCookie/DeleteCookie/etc will need to 1) modify the hashtable synchronously on the main thread (see comment on locking, below), and then 2) proxy an event to the cookie I/O thread to do the SQLite update.

Locking: so in this plan we'll have two different threads modifying the cookie hashtable.  The most obvious way to handle that would be to have them lock a mutex every time they need hashtable access.  But the write pattern here is actually very simple, and we might not need to lock/unlock a mutex for every cookieService access.  We basically have a two-stage access model:  in stage 1 (initialization), only the cache I/O thread is allowed to write (or read) the hashtable, and if the main thread tries to, it must be delayed (via monitor.wait) until init is complete.  In stage 2 (post-init), the only thread that will read or write (AFAICT) the hashtable will be the main thread (the cache I/O thread will do writes to the database, but shouldn't need to touch the hashtable after init AFAICT).  So we might be able to get away with keeping some "mInitialized" boolean that starts out at 'false' and only gets set to 'true' after the hashtable has been written AND we've guaranteed that we've flushed the CPU operations to memory (Any mutex/monitor lock or unlock will flush memory).  So the I/O thread would do something like

  // It's OK that we hold this lock while we do I/O on our non-main I/O thread
  MonitorAutoLock lock(mMonitor);
  ... Do synchronous "Select all" on database and populate the cookie hashtable
  mInitialized = true;
  mMonitor.Notify();

And then all synchronous CookieService functions like Get|SetCookie could then do something like

    if (mInitialized) { 
      touch the hashtable, no need to lock
    } else { 
      MonitorAutoLock lock(mMonitor);
      while (!mInitialized) {
        mMonitor.Wait();
      }
    }

This could be wrong, and/or overkill (maybe grabbing a mutex every time we call into the CookieService is cheap enough). I'll ask someone who knows our thread synchronization stuff better than I do.
Nathan: can you look at comment 21 and tell me

1) If you know of a good, early place during startup to initialize the cookieService (requires SQLite to be working/available by then).

2) Does service instantiation need to happen on the main thread?

3) Does my plan for lockless access to the cookie hashtable seem sane?
Flags: needinfo?(nfroyd)
Note: in case I didn't make it clear, this plan would mean that *all* of our cookie SQLite would be done with a sync db connection on the I/O thread. AFAICT we would no need to do anything async, and would no longer need to open a 2nd, async db connection.
Junior - Jason says you're free to take this on. I'll help mentor you through it and do at least the first round or two of reviews. Sound good?
Assignee: hurley → juhsu
(In reply to Nicholas Hurley [:nwgh][:hurley] (also hurley@todesschaf.org) from comment #24)
> Junior - Jason says you're free to take this on. I'll help mentor you
> through it and do at least the first round or two of reviews. Sound good?

Yes, I'm willing to take this.
It's great to have your help with the bug.
Sorry for the delay here...

(In reply to Jason Duell [:jduell] (needinfo me) from comment #22)
> 1) If you know of a good, early place during startup to initialize the
> cookieService (requires SQLite to be working/available by then).

Do we have a single place that initializes SQLite?  I guess this would be the storage service, which needs to be instantiated on the main thread.  And the cookie service uses that.  So we'd need to find a good place to (transitively) start up the storage service; I guess NS_InitXPCOM2 or nsLayoutStatics::Initialize (called from NS_InitXPCOM2) would be early enough.

> 2) Does service instantiation need to happen on the main thread?

Service instantiation does not need to happen on the main thread.  That being said, a lot of our services do assert that they are created on the main thread, have single-threaded refcounting, and/or are used only on the main thread.

> 3) Does my plan for lockless access to the cookie hashtable seem sane?

I think so; all modifications to the cookie service hashtable will happen from a single thread after the initial load?  TSan will probably complain about the pattern, so that's a clear minus, but there are ways around that.
Flags: needinfo?(nfroyd)
> TSan will probably complain about the pattern

We could also probably just use a lock every time we enter the cookie service.  IIRC it's pretty cheap to grab an uncontested lock--an atomic operation and/or a memory flush is unlikely to cost us much given how infrequently we'll be doing this
Just to remind everybody that we care a lot about this getting fixed - here is a first-startup (ie. creating the profile) profile where nsCookieService::Init() blocks the main thread for 334ms when doing the first async XHR: https://perfht.ml/2sfMzSK
(In reply to Florian Quèze [:florian] [:flo] from comment #28)
> Just to remind everybody that we care a lot about this getting fixed - here
> is a first-startup (ie. creating the profile) profile where
> nsCookieService::Init() blocks the main thread for 334ms when doing the
> first async XHR: https://perfht.ml/2sfMzSK

Thanks for your reminder.

Nick and I had a long discussion about this several hours ago.
It could be a monster change, but it's ongoing.

We had a plan and will have an experiment result in one or two weeks.
Stay tuned.
After talking to Nfroyd and ehsan, it looks like the first place we should try to init the cookie DB is during the socket transport service init:

--

I guess we'll kick the init off when the socket transport starts up (seems fairly early), and see how it goes.  We'll be measuring how long the 1st HTTP request takes to get its cookies (ideally we want to make sure we init early enough that the answer will be 0).  Hopefully if we slow down other important bits of startup that'll get captured by some existing metric and people will come hassle us (and we can possibly move the cookie init later--we can certainly experiment with how late we can fire off init while still keeping 1st-cookie-wait to 0. We could also possibly switch to an async GetCookie API, which would still cause channels to be delayed till the cookies are loaded, but at least would stop janking the main thread when it that happens).
Florian:  re: comment 28. I looked at the gecko profiler report you posted--it turns out that this happens only when we need to create the tables for the database (i.e either 1st time profile, or rebuild of database).  Those are rare, so that's good.  I'd be surprised if Init for a regular, already created database would take that long.

OTOH the next part of the cookie logic may be expensive: after HttpBaseChannel::AddCookiesToRequest() grabs the cookie services, which is the :Init() timing you posted, it then calls GetCookieStringFromHttp(), which should also cause a fair amount of jank (it needs to block on disk I/O while we load the cookies from disk). In general I'd expect the loading cookies from disk part of a profile to actually take longer than the database init part of it.  (Not in this comment 28 case--with a new profile, all the work is in creating the SQlite tables, and there's no cookies in it to load).
I'm thinking if it's necessary to move _all_ the sqlite operation off-main-thread.
The jank is in start-up, especially with a new profile.

I'd rather than OMT the TryInitDB first.

What do you think, Nick?
Flags: needinfo?(hurley)
Yes, all sqlite ops should happen off main thread. However, we still need to move cookie service init earlier, and make sure we block any cookie requests until the db is initialized (moving init earlier will help ensure that is less likely to happen, and we are therefore less likely to jank the main thread). I was under the impression that this was included in what we had agreed on earlier, but perhaps I was mistaken or misremembering.
Flags: needinfo?(hurley)
Yes, we agreed that we should move all sqlite ops OMT.
The idea came from Comment 21 and Comment 23.

The thing is ops other than start/close don't jank much.
We could open two db connections, like current implementation.

Therefore, we can split to two milestones.
IMO it's a possible solution to solve the start-up jank in 57.

M1:
1. Move init as early as nsIOService::InitializeSocketTransportService
2. OMT the InitDBState to socket thread
3. Block main thread if we need something in hashtable and InitDBState has not done

M2:
1. Make all the sqlite ops use the same connection in |InitDBState|, so all ops are sync
2. OMT them

I'd like to do M1 first.
If the split does not make sense, lets land all the patches in one shot.
Flags: needinfo?(hurley)
FWIW, nsIOService::Init is not the place to init cookie service.

It will crash in
http://searchfox.org/mozilla-central/rev/bbc1c59e460a27b20929b56489e2e55438de81fa/netwerk/cookie/nsCookieService.cpp#739

No matter we're running in main thread or socket thread.

1) Main thread would hit |nsScriptSecurityManager::Init()| and go to get IOService. bang!

  * frame #0: 0x00000001024b6872 XUL`nsScriptSecurityManager::Init() [inlined] nsresult CallGetService<nsIIOService>(aContractID=<unavailable>, aDestination=<unavailable>) at nsServiceManagerUtils.h:89 [opt]
    frame #1: 0x00000001024b6872 XUL`nsScriptSecurityManager::Init(this=0x00000001006ee2e0) at nsScriptSecurityManager.cpp:1354 [opt]
    frame #2: 0x00000001024b6c82 XUL`nsScriptSecurityManager::InitStatics() at nsScriptSecurityManager.cpp:1428 [opt]

2) Socket thread might say we're not ready yet.
http://searchfox.org/mozilla-central/rev/bbc1c59e460a27b20929b56489e2e55438de81fa/xpcom/base/nsCycleCollector.cpp#3967
I did a NS_DispatchToCurrentThread trick in IOService::Init
It's better with some random crash on db read/first httpChannel::AsyncOpen2
(In reply to Junior[:junior] from comment #34)
> Yes, we agreed that we should move all sqlite ops OMT.
> The idea came from Comment 21 and Comment 23.
> 
> The thing is ops other than start/close don't jank much.
> We could open two db connections, like current implementation.
> 
> Therefore, we can split to two milestones.
> IMO it's a possible solution to solve the start-up jank in 57.
> 
> M1:
> 1. Move init as early as nsIOService::InitializeSocketTransportService
> 2. OMT the InitDBState to socket thread
> 3. Block main thread if we need something in hashtable and InitDBState has
> not done
> 
> M2:
> 1. Make all the sqlite ops use the same connection in |InitDBState|, so all
> ops are sync
> 2. OMT them
> 
> I'd like to do M1 first.
> If the split does not make sense, lets land all the patches in one shot.

My concern here is that having this bifurcated solution will take approximately the same amount of time as OMT-ing everything, so we split the work for no time savings (in terms of when it gets landed). There's also a concern about things more easily getting out of sync using multiple connections.

That said, I'll trust your judgement on timing and consistency, as you're the one writing the code, and you know better than I what kind of work this is going to take. The above are just a couple things I want to make sure we're taking into account when we take the decision here.
Flags: needinfo?(hurley)
> M1:
> 1. Move init as early as nsIOService::InitializeSocketTransportService
> 2. OMT the InitDBState to socket thread

Don't run anything on the socket thread.  I think we'll want a separate thread that just does cookie I/O.  That's what I was thinking anyway.  It's possible we could use the file I/O thread pool instead.  But I think a dedicated thread is fine and would be simplest.

> 3. Block main thread if we need something in hashtable and InitDBState has not done
> 
> M2:
> 1. Make all the sqlite ops use the same connection in |InitDBState|, so all ops are sync
> 2. OMT them

I'm not sure I understand what's in M1 exactly.  If we're blocking main thread cookie calls until the background I/O thread finishes reading from SQLite, then what SQL would still be using the main thread SQLite connection(s)?
Flags: needinfo?(juhsu)
(In reply to Jason Duell [:jduell] (needinfo me) from comment #38)
> > M1:
> > 1. Move init as early as nsIOService::InitializeSocketTransportService
> > 2. OMT the InitDBState to socket thread
> 
> Don't run anything on the socket thread.  I think we'll want a separate
> thread that just does cookie I/O.  That's what I was thinking anyway.  It's
> possible we could use the file I/O thread pool instead.  But I think a
> dedicated thread is fine and would be simplest.
> 

OK, I'm confused previously since comment 30 discussed about socket thread.
Now I guess you're thinking of the Cookie I/O thread initial time.

By the way, comment 35 shows it's too early to initial as we initial socket thread.
InitXPCOM2 doesn't work either since the DirectoryService can not get file correctly.

I'm trying to hack XRE_mainRun later.

> > 3. Block main thread if we need something in hashtable and InitDBState has not done
> > 
> > M2:
> > 1. Make all the sqlite ops use the same connection in |InitDBState|, so all ops are sync
> > 2. OMT them
> 
> I'm not sure I understand what's in M1 exactly.  If we're blocking main
> thread cookie calls until the background I/O thread finishes reading from
> SQLite, then what SQL would still be using the main thread SQLite
> connection(s)?
For insert/update/remove,
I'd like to keep use mDBState->dbConn, which is an async connection.

No more reading from sqlite is needed, so only block the main thread.
Flags: needinfo?(juhsu)
> > I'm not sure I understand what's in M1 exactly.  If we're blocking main
> > thread cookie calls until the background I/O thread finishes reading from
> > SQLite, then what SQL would still be using the main thread SQLite
> > connection(s)?
> For insert/update/remove,
> I'd like to keep use mDBState->dbConn, which is an async connection.
> 
> No more reading from sqlite is needed, so only block the main thread.

1. Mm... async connection isn't cheap enough to ignore.
Sometimes SetCookieStringInternal takes several ms, even larger than InitDBState.
So, eventually we need to OMT all the db operation.

2. For the initialization of nsCookieService, 
I believe we should wait for "profile-do-change" or "profile-after-change"

http://searchfox.org/mozilla-central/rev/f0e4ae5f8c40ba742214e89aba3f554da0b89a33/netwerk/cookie/nsCookieService.cpp#1945-1962
https://developer.mozilla.org/en-US/docs/Observer_Notifications
One thing to be aware of, since the code is currently using async connections and async execution ("ExecuteAsync" calls)... minor enhancements to mozStorage are needed if you want to use ExecuteAsync and friends off the main thread.  There's at least a few places in async code where we may assume the issuing thread is the main thread.

The main take-aways are:
- If you move code invoking ExecuteAsync() from the main thread to another thread, things may break.
- It won't be very hard to fix them, but you will have to fix them.
(In reply to Andrew Sutherland [:asuth] from comment #41)
> One thing to be aware of, since the code is currently using async
> connections and async execution ("ExecuteAsync" calls)... minor enhancements
> to mozStorage are needed if you want to use ExecuteAsync and friends off the
> main thread.  There's at least a few places in async code where we may
> assume the issuing thread is the main thread.
> 
> The main take-aways are:
> - If you move code invoking ExecuteAsync() from the main thread to another
> thread, things may break.
> - It won't be very hard to fix them, but you will have to fix them.

Thanks for this notice.
If possible, we might merge all the connections into one sync connection.
Attached patch Part1_OMT_Init WIP1, v1 (obsolete) (deleted) β€” β€” Splinter Review
This is a WIP patch in a very early stage, showing the idea about M1

Before I make the DB read sync, my profiler shows 1 sec between profile-do-change and first GetCookieString (except some handling in AsyncReadComplete [1])

Reading DB is fast in my environment - less than 5 ms.

What do you think about the setup, Nick?

TODO:
0. Make InitDBState OMT, still thinking how many things we'd like to OMT
   Bug 1331680 Comment 244 has a plan to init a brand new Cookie thread,
   which can be leveraged
1. Block the main thread if db is not read complete
   in EnsureReadDomain/EnsureReadComplete
   Probe those data to see if we need to do more.
2. Move [1] to sync read
3. Handle mDefaultDBState->readSet carefully
4. Handle DB corruption

[1] http://searchfox.org/mozilla-central/rev/30a47c4339bd397b937abdb2305f99d3bb537ba6/netwerk/cookie/nsCookieService.cpp#2758-2772
Attachment #8893710 - Flags: feedback?(hurley)
Comment on attachment 8893710 [details] [diff] [review]
Part1_OMT_Init WIP1, v1

Review of attachment 8893710 [details] [diff] [review]:
-----------------------------------------------------------------

First of all, sorry for the lag here. I'll do better next time (unless you happen to ask for more feedback while I'm gone the rest of this week!)

Anyway, this looks like it's headed in the right direction. I do have a few concerns, though. Most of which are inline below. The big thing is that this is still pretty bare in terms of what's changed, so it's hard to get a real good sense of what the end solution looks like. I've extrapolated as best as I can for my comments.

::: netwerk/base/nsIOService.cpp
@@ +1479,5 @@
>              nsCOMPtr<nsIPrefBranch> prefBranch;
>              GetPrefBranch(getter_AddRefs(prefBranch));
>              PrefsChanged(prefBranch, MANAGE_OFFLINE_STATUS_PREF);
> +
> +            // Read cookie database          

This needs to explain *why* we're reading the cookie database here, preferably with a pointer to this bug for even more history. This is one of the biggest things gecko in general is missing in comments, and it's been a regular thorn in my side since I've started working here.

::: netwerk/cookie/nsCookieService.cpp
@@ +1444,5 @@
>    if (aRecreateDB)
>      return RESULT_OK;
>  
>    // check whether to import or just read in the db
> +  if (tableExists) {

I would expect this to actually dispatch something to the IO thread. Right now, this is racy - if someone happens to call do_GetService(cookieservice) on the main thread before the new init call you added happens, then we're back in "jank the main thread" territory.

@@ +2598,5 @@
> +
> +void
> +nsCookieService::BlockMainThreadIfUninitialized()
> +{
> +  //TODO

It's important to know how this works before we carry on much farther :)

@@ +2603,5 @@
> +}
> +
> +void
> +nsCookieService::SyncRead()
> +{

In addition to dispatching this to the IO thread (as mentioned above), this needs to assert that it's not running on the main thread as added insurance against the jank coming back accidentally in the future.
Attachment #8893710 - Flags: feedback?(hurley)
(In reply to Nicholas Hurley [:nwgh][:hurley] (also hurley@todesschaf.org) from comment #44)
> Comment on attachment 8893710 [details] [diff] [review]
> Part1_OMT_Init WIP1, v1
> 
> Review of attachment 8893710 [details] [diff] [review]:
> -----------------------------------------------------------------
> 
> First of all, sorry for the lag here. I'll do better next time (unless you
> happen to ask for more feedback while I'm gone the rest of this week!)
> 
> Anyway, this looks like it's headed in the right direction. I do have a few
> concerns, though. Most of which are inline below. The big thing is that this
> is still pretty bare in terms of what's changed, so it's hard to get a real
> good sense of what the end solution looks like. I've extrapolated as best as
> I can for my comments.

The patch is based on Milestone 1 in Comment 34.
I reuse the sync read in EnsureReadComplete, so the diff looks weird :p

I'll make a better patch when you come back! 
> 
> ::: netwerk/base/nsIOService.cpp
> @@ +1479,5 @@
> >              nsCOMPtr<nsIPrefBranch> prefBranch;
> >              GetPrefBranch(getter_AddRefs(prefBranch));
> >              PrefsChanged(prefBranch, MANAGE_OFFLINE_STATUS_PREF);
> > +
> > +            // Read cookie database          
> 
> This needs to explain *why* we're reading the cookie database here,
> preferably with a pointer to this bug for even more history. This is one of
> the biggest things gecko in general is missing in comments, and it's been a
> regular thorn in my side since I've started working here.
> 

It's a good suggestion. Thanks!
> ::: netwerk/cookie/nsCookieService.cpp
> @@ +1444,5 @@
> >    if (aRecreateDB)
> >      return RESULT_OK;
> >  
> >    // check whether to import or just read in the db
> > +  if (tableExists) {
> 
> I would expect this to actually dispatch something to the IO thread. Right
> now, this is racy - if someone happens to call do_GetService(cookieservice)
> on the main thread before the new init call you added happens, then we're
> back in "jank the main thread" territory.
> 
Plan A is to simply block main thread and add a telemetry to see if it's
severe. In my local testing, around 1 second between profile-do-change and the
first getCookieString, and reading db sync takes only <10 ms (unless first time
usage due to creating the table.)
> 2. Move [1] to sync read
> 3. Handle mDefaultDBState->readSet carefully
> 
> [1]
> http://searchfox.org/mozilla-central/rev/
> 30a47c4339bd397b937abdb2305f99d3bb537ba6/netwerk/cookie/nsCookieService.
> cpp#2758-2772

Since 
1) we abandon reading one domain in EnsureReadDomain (readSet),
2) we're not async reading anything (hostArray),
we need to remove readSet and hostArray.

Hence, no need to worry 2 and 3.
Attached patch Part1_OMT_Init WIP1, v2 (obsolete) (deleted) β€” β€” Splinter Review
The patch works functionally. Still need some detail change.

TODO:
1. I'm still thinking how to get the frequency we block the main thread.
2. HandleCorruptDB access the TryInitDB, which should be OMT
3. If I add a PR_Sleep after InitDBState, the gecko profiler will let
   the child process crash.
Attachment #8893710 - Attachment is obsolete: true
Attachment #8897376 - Flags: review?(hurley)
Comment on attachment 8897376 [details] [diff] [review]
Part1_OMT_Init WIP1, v2

Should set a feedback flag. Sorry for spam.
Attachment #8897376 - Flags: review?(hurley) → feedback?(hurley)
Comment on attachment 8897376 [details] [diff] [review]
Part1_OMT_Init WIP1, v2

Review of attachment 8897376 [details] [diff] [review]:
-----------------------------------------------------------------

This is looking good so far. It will be interesting to see what kind of telemetry we get out of this.

::: netwerk/base/nsIOService.cpp
@@ +1479,5 @@
>              nsCOMPtr<nsIPrefBranch> prefBranch;
>              GetPrefBranch(getter_AddRefs(prefBranch));
>              PrefsChanged(prefBranch, MANAGE_OFFLINE_STATUS_PREF);
> +
> +            // Bug 970460 - Read cookie database at an early-as-possible time

nit: this is *8*70460, not 9... :)

::: netwerk/cookie/nsCookieService.cpp
@@ +892,5 @@
>    mDefaultDBState->removeListener = new RemoveCookieDBListener(mDefaultDBState);
>    mDefaultDBState->closeListener = new CloseCookieDBListener(mDefaultDBState);
>  
>    // Grow cookie db in 512KB increments
> +  mDefaultDBState->syncConn->SetGrowthIncrement(512 * 1024, EmptyCString());

We need to be sure this isn't a connection-specific setting before removing the call on dbConn, as well.

@@ +2606,5 @@
> +    while (!mInitialized) {
> +      mMonitor.Wait();
> +    }
> +    // TODO: Need to send a telemetry
> +    // int64_t((TimeStamp::Now() - startBlockTime).ToMilliseconds())

Use Telemetry::AccumulateTimeDelta(<key>, startBlockTime);

Also, we should either (1) accumulate a 0 if we never end up having to wait, or (2) have a separate telemetry key that tells whether or not we waited. Otherwise, we'll not get a sense of the ratio of times we have to wait to the times we never have to wait.
Attachment #8897376 - Flags: feedback?(hurley) → feedback+
(In reply to Junior[:junior] from comment #47)
> Created attachment 8897376 [details] [diff] [review]
> Part1_OMT_Init WIP1, v2
> 
> The patch works functionally. Still need some detail change.
> 
> TODO:
> 1. I'm still thinking how to get the frequency we block the main thread.

I think that option (1) above in my feedback comment is the right way here. You just need to make sure you only accumulate that once per session (so have a bool mAccumulatedWaitTelemetry or something, set that to true after accumulating, and only accumulate if it's false)

> 2. HandleCorruptDB access the TryInitDB, which should be OMT

Yes, this is important. We'll probably want to have separate telemetry for when we wait on this vs. when we wait on the initial read.

> 3. If I add a PR_Sleep after InitDBState, the gecko profiler will let
>    the child process crash.

That's... odd.
BlockMainThread for those in nsICookieManager and nsICookieService to see what happen
https://treeherder.mozilla.org/#/jobs?repo=try&revision=d2a12e2e4e0e6fa1b4e087d9eff2c8d6c1470c65
> ::: netwerk/cookie/nsCookieService.cpp
> @@ +892,5 @@
> >    mDefaultDBState->removeListener = new RemoveCookieDBListener(mDefaultDBState);
> >    mDefaultDBState->closeListener = new CloseCookieDBListener(mDefaultDBState);
> >  
> >    // Grow cookie db in 512KB increments
> > +  mDefaultDBState->syncConn->SetGrowthIncrement(512 * 1024, EmptyCString());
> 
> We need to be sure this isn't a connection-specific setting before removing
> the call on dbConn, as well.
> 

Hello Marco,
Sorry for a quick question.
Do you know if |SetGrowthIncrement| is connection-specific?
I guess it's sqlite file-based [1], but I'm not very sure.

http://searchfox.org/mozilla-central/rev/5696c3e525fc8222674eed6a562f5fcbe804c4c7/storage/mozIStorageConnection.idl#229-243
Flags: needinfo?(mak77)
it's an option set at runtime on the vfs underlying a connection, and thus imo it is connection specific. In general it doesn't matter much because you usually have only 1 r/w connection and multiple read-only connections (since with WAL you can have concurrent readers, but you can never have concurrent writers, the writes would be serialized in both cases).
Flags: needinfo?(mak77)
(In reply to Marco Bonardo [::mak] from comment #53)
> it's an option set at runtime on the vfs underlying a connection, and thus
> imo it is connection specific. In general it doesn't matter much because you
> usually have only 1 r/w connection and multiple read-only connections (since
> with WAL you can have concurrent readers, but you can never have concurrent
> writers, the writes would be serialized in both cases).

Thanks.
Therefore it seems great to move the SetGrowthIncrement to the r/w connection.
https://treeherder.mozilla.org/#/jobs?repo=try&revision=c00693ddeded52878f1949b9e1ffd0d72d565172&selectedJob=125826483
We have two connections: syncConn(sync, cookie thread), dbConn(async, main thread)
Every connection is restricted to one thread.

Hence, the flow will be:

-- Main Thread --
profile-do-change
InitDBStates
NS_GetSpecialDirectory <== have better in main thread since test mocked DirectorService with js code

mThread->Dispatch

-- Cookie Thread --
mMonitor lock
mInitializedDBStates = false
TryInitDB() <== start to do DB operations
  ReadAheadFile(mDefaultDBState->cookieFile)
  OpenUnsharedDatabase(syncConn)
  Read()
mInitializedDBStates = true
mMonitor release lock
mMonitor.Notify

DispatchToMainThread

-- Main Thread -- 
InitDBConn
  OpenUnsharedDatabase(dbConn) <== should be after readAheadFile and on main thread
Attached patch CookieDBStartupOMT - WIP, v3 (obsolete) (deleted) β€” β€” Splinter Review
Still suffered from test-failures, most of them operate CookieService before cookie-db-read
Attachment #8897376 - Attachment is obsolete: true
Short story about disable this test case:
http://searchfox.org/mozilla-central/rev/999385a5e8c2d360cc37286882508075fc2078bd/extensions/cookie/test/unit/test_cookies_async_failure.js#210-276

reading cookies from a host (EnsureReadDomain) works from a corrupt DB in the first time
Hence this test pass.

We change our architecture to read whole db in the start-up, so lots change in this test
Hello Eric,
We'd like to off-main-thread the cookie db start-up read.
However, just found a main thread assertion in the TLDService
http://searchfox.org/mozilla-central/rev/999385a5e8c2d360cc37286882508075fc2078bd/netwerk/dns/nsEffectiveTLDService.cpp#342

Any reasons for this constraint?

Moreover, we really seldom use TLD service off-main-thread 
(<10000 users in a year encounter this I guess)

http://searchfox.org/mozilla-central/rev/999385a5e8c2d360cc37286882508075fc2078bd/netwerk/cookie/nsCookieService.cpp#1035

Any workable (maybe not so fast) solution?

Thanks!
Flags: needinfo?(erahm)
Attached patch CookieDBStartupOMT - WIP, v4 (obsolete) (deleted) β€” β€” Splinter Review
I feel it's a little fragile with new architecture
If something we don't do good, main thread might be blocked forever.

TODO: TLDService main thread issue.
Attachment #8902629 - Attachment is obsolete: true
Attachment #8904510 - Flags: feedback?(hurley)
(In reply to Junior[:junior] from comment #59)
> Hello Eric,
> We'd like to off-main-thread the cookie db start-up read.
> However, just found a main thread assertion in the TLDService
> http://searchfox.org/mozilla-central/rev/
> 999385a5e8c2d360cc37286882508075fc2078bd/netwerk/dns/nsEffectiveTLDService.
> cpp#342
> 
> Any reasons for this constraint?

Going to look at the patch soon, but really, a quick look at the code shows access to a data structure that isn't threadsafe. As such, it either needs locking, or single-threaded access. Whenever this was written, it looks like single-threaded access was assumed. Seems at first glance like we'll need to make the caching in the etld service threadsafe with a lock... (and perhaps the TLDCacheEntry struct, as well? Needs a deeper look than I can give right now.)
(In reply to Nicholas Hurley [:nwgh][:hurley] (also hurley@todesschaf.org) from comment #61)
> (In reply to Junior[:junior] from comment #59)
> > Hello Eric,
> > We'd like to off-main-thread the cookie db start-up read.
> > However, just found a main thread assertion in the TLDService
> > http://searchfox.org/mozilla-central/rev/
> > 999385a5e8c2d360cc37286882508075fc2078bd/netwerk/dns/nsEffectiveTLDService.
> > cpp#342
> > 
> > Any reasons for this constraint?
> 
> Going to look at the patch soon, but really, a quick look at the code shows
> access to a data structure that isn't threadsafe. As such, it either needs
> locking, or single-threaded access. Whenever this was written, it looks like
> single-threaded access was assumed. Seems at first glance like we'll need to
> make the caching in the etld service threadsafe with a lock... (and perhaps
> the TLDCacheEntry struct, as well? Needs a deeper look than I can give right
> now.)

Yes the problem is the outer cache is not thread-safe. We could make it thread-safe and reasonably efficient by using a read-write lock [1]. Please file a blocking bug and I can implement this (or you can do it yourself and I'm happy to review).

[1] http://searchfox.org/mozilla-central/source/xpcom/threads/RWLock.h
Flags: needinfo?(erahm)
Depends on: 1396958
(In reply to Eric Rahm [:erahm] (please no mozreview requests) from comment #62)
> (In reply to Nicholas Hurley [:nwgh][:hurley] (also hurley@todesschaf.org)
> from comment #61)
> > (In reply to Junior[:junior] from comment #59)
> > > Hello Eric,
> > > We'd like to off-main-thread the cookie db start-up read.
> > > However, just found a main thread assertion in the TLDService
> > > http://searchfox.org/mozilla-central/rev/
> > > 999385a5e8c2d360cc37286882508075fc2078bd/netwerk/dns/nsEffectiveTLDService.
> > > cpp#342
> > > 
> > > Any reasons for this constraint?
> > 
> > Going to look at the patch soon, but really, a quick look at the code shows
> > access to a data structure that isn't threadsafe. As such, it either needs
> > locking, or single-threaded access. Whenever this was written, it looks like
> > single-threaded access was assumed. Seems at first glance like we'll need to
> > make the caching in the etld service threadsafe with a lock... (and perhaps
> > the TLDCacheEntry struct, as well? Needs a deeper look than I can give right
> > now.)
> 
> Yes the problem is the outer cache is not thread-safe. We could make it
> thread-safe and reasonably efficient by using a read-write lock [1]. Please
> file a blocking bug and I can implement this (or you can do it yourself and
> I'm happy to review).
> 
> [1] http://searchfox.org/mozilla-central/source/xpcom/threads/RWLock.h

I'll take care of this today.
Comment on attachment 8904510 [details] [diff] [review]
CookieDBStartupOMT - WIP, v4

Review of attachment 8904510 [details] [diff] [review]:
-----------------------------------------------------------------

Looking pretty good! Looks like the etld service issue is about to be fixed, too.

I am worried though... why are you concerned that we might block the main thread forever?

::: extensions/cookie/test/unit/xpcshell.ini
@@ +29,5 @@
>  [test_permmanager_idn.js]
>  [test_permmanager_subdomains.js]
>  [test_permmanager_local_files.js]
>  [test_permmanager_cleardata.js]
> +#[test_schema_2_migration.js]

Why is this commented out?

::: toolkit/components/telemetry/Histograms.json
@@ +4407,5 @@
> +  "MOZ_SQLITE_COOKIES_BLOCK_MAIN_THREAD_MS": {
> +    "record_in_processes": ["main"],
> +    "expires_in_version": "never",
> +    "kind": "exponential",
> +    "alert_emails": ["necko@mozilla.com", "bsmedberg@mozilla.com"],

bsmedberg@mozilla is no longer a valid address. Just use necko@mozilla.com
Attachment #8904510 - Flags: feedback?(hurley)
> 
> Looking pretty good! Looks like the etld service issue is about to be fixed,
> too.
> 
> I am worried though... why are you concerned that we might block the main
> thread forever?
> 
When I wrote the patch, mMonitor.Wait() is not notified for many cases I was
aware by test case. 

The flow control is complicated.
We have mInitializedDBStates and mInitializaedDBConn for controlling the status.
Also have (!mDBStates) to determine if the DBStates is closed or not.
I'm afraid if there's some corner flaw I'm not aware.

> ::: extensions/cookie/test/unit/xpcshell.ini
> @@ +29,5 @@
> >  [test_permmanager_idn.js]
> >  [test_permmanager_subdomains.js]
> >  [test_permmanager_local_files.js]
> >  [test_permmanager_cleardata.js]
> > +#[test_schema_2_migration.js]
> 
> Why is this commented out?
> 
Since it's blocked by bug 1396958.
Will remove it after it lands.

> ::: toolkit/components/telemetry/Histograms.json
> @@ +4407,5 @@
> > +  "MOZ_SQLITE_COOKIES_BLOCK_MAIN_THREAD_MS": {
> > +    "record_in_processes": ["main"],
> > +    "expires_in_version": "never",
> > +    "kind": "exponential",
> > +    "alert_emails": ["necko@mozilla.com", "bsmedberg@mozilla.com"],
> 
> bsmedberg@mozilla is no longer a valid address. Just use necko@mozilla.com
Thanks for your remind.
Do you have a suggested candidate for data review?
Flags: needinfo?(hurley)
I'm thinking we could ignore the old cookie format.
Simply create a new database for them, since the cookies should be stale.

It could save some CPU times and code complexity

For example, we migrate from .txt to .sqlite 9 years ago but we still able to import .txt
Everytime we create a new table causing this check.

bug: https://bugzilla.mozilla.org/show_bug.cgi?id=230933
changeset: https://github.com/mozilla/gecko-dev/commit/5493e24575253cdf46b42878d33d1262e4f82eae
back-compat code http://searchfox.org/mozilla-central/rev/4d8e389498a08668cce9ebf6232cc96be178c3e4/netwerk/cookie/nsCookieService.cpp#1518-1531

Another example is a migration from schema 1 to 2. 
bug: https://bugzilla.mozilla.org/show_bug.cgi?id=599799
In each startup, we're doing a no-op for >99% case to delete a non-exist null baseDomain record.
http://searchfox.org/mozilla-central/rev/4d8e389498a08668cce9ebf6232cc96be178c3e4/netwerk/cookie/nsCookieService.cpp#2682-2685
Attached patch CookieDBStartupOMT, v5 (obsolete) (deleted) β€” β€” Splinter Review
Hello :francois,
We off-main-thread the startup cookie read.
If anyone access to the cookie during the OMT read, we simply block
main thread (it's should be rare)

Here we put a probe to see if it's rare or not.
Attachment #8904510 - Attachment is obsolete: true
Flags: needinfo?(hurley)
Attachment #8904905 - Flags: review?(hurley)
Attachment #8904905 - Flags: feedback?(francois)
Comment on attachment 8904905 [details] [diff] [review]
CookieDBStartupOMT, v5

Review of attachment 8904905 [details] [diff] [review]:
-----------------------------------------------------------------

::: toolkit/components/telemetry/Histograms.json
@@ +4407,5 @@
> +  "MOZ_SQLITE_COOKIES_BLOCK_MAIN_THREAD_MS": {
> +    "record_in_processes": ["main"],
> +    "expires_in_version": "never",
> +    "kind": "exponential",
> +    "alert_emails": ["necko@mozilla.com", "francois@mozilla.com"],

I don't want these alerts :)

I think you meant to use your own email address here.
Attachment #8904905 - Flags: feedback?(francois) → feedback-
> 
> I don't want these alerts :)
> 
> I think you meant to use your own email address here.

Will remove in the next patch :)
Comment on attachment 8904905 [details] [diff] [review]
CookieDBStartupOMT, v5

Review of attachment 8904905 [details] [diff] [review]:
-----------------------------------------------------------------

Looks good, but I do have one worry. See below.

::: netwerk/cookie/nsCookie.h
@@ +30,5 @@
>  class nsCookie : public nsICookie2
>  {
>    public:
>      // nsISupports
> +    NS_DECL_THREADSAFE_ISUPPORTS

Do we know this is safe? Generally when we've made something non-threadsafe (especially something this old), it's for a purpose. Based on a quick look, I'm not so sure that all the uses of the members are threadsafe.

::: netwerk/cookie/nsCookieService.cpp
@@ +1383,5 @@
>    }
>  
> +  // if we deleted a corrupt db, don't attempt to import - return now
> +  if (aRecreateDB)
> +    return RESULT_OK;

braces around body, please.

@@ +1401,5 @@
> +    );
> +    return RESULT_OK;
> +  }
> +
> +  nsCOMPtr<nsIRunnable> runnable = 

nit: trailing whitespace on this line.

@@ +1406,5 @@
> +    NS_NewRunnableFunction("TryInitDB.ImportCookies", [self] {
> +      nsCOMPtr<nsIFile> oldCookieFile;
> +      nsresult rv = NS_GetSpecialDirectory(NS_APP_USER_PROFILE_50_DIR,
> +        getter_AddRefs(oldCookieFile));
> +      if (NS_FAILED(rv)) return;

braces around body, and body on its own line, please.

::: toolkit/components/telemetry/Histograms.json
@@ +4407,5 @@
> +  "MOZ_SQLITE_COOKIES_BLOCK_MAIN_THREAD_MS": {
> +    "record_in_processes": ["main"],
> +    "expires_in_version": "never",
> +    "kind": "exponential",
> +    "alert_emails": ["necko@mozilla.com", "francois@mozilla.com"],

Either what Francois said, or just leave it as necko@mozilla.com without emails direct to you.
Attachment #8904905 - Flags: review?(hurley)
(In reply to Nicholas Hurley [:nwgh][:hurley] (also hurley@todesschaf.org) from comment #70)
> Either what Francois said, or just leave it as necko@mozilla.com without
> emails direct to you.

Having necko in there is fine, but we also need the email of a person responsible for this probe. Most often an engineer, sometimes a PM.
> ::: netwerk/cookie/nsCookie.h
> @@ +30,5 @@
> >  class nsCookie : public nsICookie2
> >  {
> >    public:
> >      // nsISupports
> > +    NS_DECL_THREADSAFE_ISUPPORTS
> 
> Do we know this is safe? Generally when we've made something non-threadsafe
> (especially something this old), it's for a purpose. Based on a quick look,
> I'm not so sure that all the uses of the members are threadsafe.
> 

Thanks for catching this.

With respect to the usage of nsCookie, two things happen to cookie thread:
1) InitDBStates, protect by mMonitor and mInitializedDBStates.
2) RebuildCorruptDB, only "read" the hash table

RebuildCurruptDB is a little dangerous since there's a little chance that
someone writing cookie when we're rebuilding.

Rebuilding is rare, so mutex doesn't make sense.

I'm thinking a deep-copy of the cookie hash table on the main thread and pass
it to rebuild. What do you think?
Flags: needinfo?(hurley)
(In reply to Junior[:junior] from comment #72)
> > ::: netwerk/cookie/nsCookie.h
> > @@ +30,5 @@
> > >  class nsCookie : public nsICookie2
> > >  {
> > >    public:
> > >      // nsISupports
> > > +    NS_DECL_THREADSAFE_ISUPPORTS
> > 
> > Do we know this is safe? Generally when we've made something non-threadsafe
> > (especially something this old), it's for a purpose. Based on a quick look,
> > I'm not so sure that all the uses of the members are threadsafe.
> > 
> 
> Thanks for catching this.
> 
> With respect to the usage of nsCookie, two things happen to cookie thread:
> 1) InitDBStates, protect by mMonitor and mInitializedDBStates.
> 2) RebuildCorruptDB, only "read" the hash table
> 
> RebuildCurruptDB is a little dangerous since there's a little chance that
> someone writing cookie when we're rebuilding.
> 
> Rebuilding is rare, so mutex doesn't make sense.
> 
> I'm thinking a deep-copy of the cookie hash table on the main thread and pass
> it to rebuild. What do you think?

I found that we don't do many things for rebuilding in cookie thread.
|TryInitDB(false)| recreates a brand-new db without touching hosttable, which should be taken care.

And read the hosttable in the main thread.
Flags: needinfo?(hurley)
Attached patch CookieDBStartupOMT, v6 (obsolete) (deleted) β€” β€” Splinter Review
Attachment #8904905 - Attachment is obsolete: true
Attachment #8906509 - Flags: review?(hurley)
Comment on attachment 8906509 [details] [diff] [review]
CookieDBStartupOMT, v6

Review of attachment 8906509 [details] [diff] [review]:
-----------------------------------------------------------------

::: netwerk/cookie/nsCookie.h
@@ +30,5 @@
>  class nsCookie : public nsICookie2
>  {
>    public:
>      // nsISupports
> +    NS_DECL_THREADSAFE_ISUPPORTS

This is still not safe. While you may have ensured the uses of it *in nsCookieService* are now threadsafe (I haven't looked that far in the patch), there are other consumers of this interface that could do something wrong. If you are making the refcounting threadsafe, you need to ensure its other data members are threadsafe too, otherwise you're just introducing a footgun that's basically guaranteed to go off at some point, likely resulting in a chemspill.
Attachment #8906509 - Flags: review?(hurley) → review-
(In reply to Nicholas Hurley [:nwgh][:hurley] (also hurley@todesschaf.org) from comment #75)
> Comment on attachment 8906509 [details] [diff] [review]
> CookieDBStartupOMT, v6
> 
> Review of attachment 8906509 [details] [diff] [review]:
> -----------------------------------------------------------------
> 
> ::: netwerk/cookie/nsCookie.h
> @@ +30,5 @@
> >  class nsCookie : public nsICookie2
> >  {
> >    public:
> >      // nsISupports
> > +    NS_DECL_THREADSAFE_ISUPPORTS
> 
> This is still not safe. While you may have ensured the uses of it *in
> nsCookieService* are now threadsafe (I haven't looked that far in the
> patch), there are other consumers of this interface that could do something
> wrong. If you are making the refcounting threadsafe, you need to ensure its
> other data members are threadsafe too, otherwise you're just introducing a
> footgun that's basically guaranteed to go off at some point, likely
> resulting in a chemspill.

Let's try this:
After we finish database reads, we pass the "keys" (parameters of nsCookie::Create)
to main thread, and let main thread convert those to nsCookie.
Bulk priority update: https://bugzilla.mozilla.org/show_bug.cgi?id=1399258
Priority: -- → P1
Changing to P2 as this will not be landed in 57, and we would like to land it in the next iteration.
Priority: P1 → P2
Whiteboard: [qf:p1][necko-quantum] [necko-active] → [qf:p1][necko-quantum]
Based on #comment78, I am moving this from qf:p1 to qf:p2.
Whiteboard: [qf:p1][necko-quantum] → [qf:p2][necko-quantum]
Attached patch CookieDBStartupOMT, v7 (obsolete) (deleted) β€” β€” Splinter Review
Introduce a ConstCookie for threadsafe nsCookie generator. And use UniquePtr to avoid deep copy.
Attachment #8906509 - Attachment is obsolete: true
Attachment #8910694 - Flags: review?(hurley)
Comment on attachment 8910694 [details] [diff] [review]
CookieDBStartupOMT, v7

Review of attachment 8910694 [details] [diff] [review]:
-----------------------------------------------------------------

Nice progress, but one other thing I noticed this time around.

::: netwerk/cookie/nsCookieService.h
@@ +152,5 @@
>    {
>    }
>  
>  public:
> +  NS_INLINE_DECL_THREADSAFE_REFCOUNTING(DBState)

I just went back and looked and saw this was in the previous version, too. Sorry I didn't catch it that time. Anyway, this is yet another case of making something threadsafe without ensuring its members (specifically the hash table) are not accessed in a manner inconsistent with that assumption.
Attachment #8910694 - Flags: review?(hurley)
Attached patch CookieDBStartupOMT, v8 (obsolete) (deleted) β€” β€” Splinter Review
Use gCookieService instead of adding refcount
Attachment #8910694 - Attachment is obsolete: true
Attachment #8911686 - Flags: review?(hurley)
Comment on attachment 8911686 [details] [diff] [review]
CookieDBStartupOMT, v8

Review of attachment 8911686 [details] [diff] [review]:
-----------------------------------------------------------------

I'm still not convinced here. Now, you're trading refcounting on separate threads (which actually might be the right idea, you just need to make sure it's used safely) for using (effectively) a raw pointer on multiple threads, without even null-checking it. Again, it may be the case that this is safe in practice, as the code is written, but there's no guarantees of that in the code. That's a recipe for crashes and/or data races that we don't want.

Since you're going to be accessing the cookie service on multiple threads, you need to have some sort of protection around the things that are accessed on multiple threads (mDBState comes to mind, at least, though there are almost certainly more).
Attachment #8911686 - Flags: review?(hurley) → review-
Attached patch CookieDBStartupOMT, v9 (obsolete) (deleted) β€” β€” Splinter Review
Add some guards.

Suffer from a treeherder only fail now.
Might be a test problem, however stderr doesn't work in RunnableFunction lambda . Finding a way to debug
Attachment #8911686 - Attachment is obsolete: true
Attachment #8912181 - Flags: review?(hurley)
> Might be a test problem, however stderr doesn't work in RunnableFunction
> lambda . Finding a way to debug
Neither an implementation of nsIRunnable
(In reply to Junior[:junior] from comment #85)
> > Might be a test problem, however stderr doesn't work in RunnableFunction
> > lambda . Finding a way to debug
> Neither an implementation of nsIRunnable

stderr works in treeherder, which is good
Attached patch Part1: CookieDBStartupOMT, v10 (obsolete) (deleted) β€” β€” Splinter Review
Ensure the task is sent after we set the mInitializedDBStates.
Also fix potential-intermittent tests.

Rebase from Bug 1286858.
Attachment #8912181 - Attachment is obsolete: true
Attachment #8912181 - Flags: review?(hurley)
Attachment #8912685 - Flags: review?(hurley)
Attached patch Part2: close syncConn for edge cases, v1 (obsolete) (deleted) β€” β€” Splinter Review
Note that syncConn is able to accessed by cookie thread, same as RELEASE refcount. Hence we remove the null-out in main thread.
Attachment #8913110 - Flags: review?(hurley)
Attachment #8912685 - Attachment description: CookieDBStartupOMT, v10 → Part1: CookieDBStartupOMT, v10
Comment on attachment 8912685 [details] [diff] [review]
Part1: CookieDBStartupOMT, v10

Review of attachment 8912685 [details] [diff] [review]:
-----------------------------------------------------------------

So... I think this is probably ok, the more I look at it (I wasn't convinced at first). But I want to get a second set of eyes on this, as well, to make sure I'm not just going wonky-eyed from looking at this patch for too long :)

So, r+ from me, but let's ask someone else who's familiar with the code. jdm, do you have time to take a look?
Attachment #8912685 - Flags: review?(josh)
Attachment #8912685 - Flags: review?(hurley)
Attachment #8912685 - Flags: review+
Attachment #8913110 - Flags: review?(hurley) → review+
Whiteboard: [qf:p2][necko-quantum] → [qf:p2][necko-quantum][necko-triaged]
Keywords: perf
Comment on attachment 8912685 [details] [diff] [review]
Part1: CookieDBStartupOMT, v10

Review of attachment 8912685 [details] [diff] [review]:
-----------------------------------------------------------------

This is a clever solution, and I can't think of any way to make how it all fits together more clear.

::: netwerk/cookie/nsCookieService.cpp
@@ +643,5 @@
> +  nsCOMPtr<nsIThread> thread;
> +  rv = NS_NewNamedThread("Cookie", getter_AddRefs(thread));
> +  NS_ENSURE_SUCCESS(rv, rv);
> +
> +  mThread = thread;

Can we use `NS_NewNamedThread("Cookie", getter_AddRefs(mThread))` instead?
Attachment #8912685 - Flags: review?(josh) → review+
Attached patch Part1: CookieDBStartupOMT, v11 (deleted) β€” β€” Splinter Review
Need a data review.
Comment 67 had the description of the telemetry usage.
Attachment #8912685 - Attachment is obsolete: true
Attachment #8916517 - Flags: review+
Attachment #8916517 - Flags: feedback?(francois)
Attached patch Part2: close syncConn for edge cases, v2 (deleted) β€” β€” Splinter Review
rebase, carry r+
Attachment #8913110 - Attachment is obsolete: true
Attachment #8916518 - Flags: review+
Comment on attachment 8916517 [details] [diff] [review]
Part1: CookieDBStartupOMT, v11

Review of attachment 8916517 [details] [diff] [review]:
-----------------------------------------------------------------

This is Category 1 data. datareview+
Attachment #8916517 - Flags: feedback?(francois) → feedback+
https://treeherder.mozilla.org/#/jobs?repo=try&revision=a05a8fceb4962b38cb915eb2c5f6c00671cf1ca8
Keywords: checkin-needed
I had some experiment at reference device (Acer).

Results:
a. Performance Improvement
   Normal case: 6.8ms [1] -> 3.2ms [2]
   Corrupted case: 245ms [3] -> 24ms [4]

b. Main thread blocking problem
   We finished the init sequence at 0.5-0.6 sec, and
   the first access of cookieService (getCookieStringInternal) is at 1.3s
   
Explanations:
We have two connections for cookie db access.
We OMT the sync startup one, so we have around 50% off for normal case.

OTOH corrupted DB janks heavily (comment 28) and brings this patch into full play.
We OMT 363ms to cookie thread in another experiment [4] (I remove the cookie.sqlite manually)
Still cost 24ms on main thread to import cookie.txt, which I tent to remove (comment 66).
Again, 0.7s between initialization and the first getCookieStringInternal here, too.

[1] https://perfht.ml/2xYIgfb
[2] https://perfht.ml/2xYQSlS
[3] https://perfht.ml/2xYj48j
[4] https://perfht.ml/2xZOlba
Pushed by ryanvm@gmail.com:
https://hg.mozilla.org/integration/mozilla-inbound/rev/5aa7980268cb
Part 1: Let cookie db startup-read off-main-thread. r=nwgh, r=jdm, data-r=francois
https://hg.mozilla.org/integration/mozilla-inbound/rev/50368fa3e040
Part 2: Close syncconn for edge cases. r=nwgh
Keywords: checkin-needed
Backed out in https://hg.mozilla.org/integration/mozilla-inbound/rev/ede558a89106 for xperf failures, https://treeherder.mozilla.org/logviewer.html#?job_id=136291092&repo=mozilla-inbound
Attached patch Part3: talos-whitelist, v1 (deleted) β€” β€” Splinter Review
Hello :jmaher,
Previously we use a lazy load of cookie service, so startup
won't have any IO on cookies.sqlite and cookies.sqlite-shm.

Here we intentionally load the cookie service at startup and
make db loading off-main-thread to earn some main thread
performance win. comment 21 for a more detailed description.

Hence, we should put the db files into xperf whitelist.
Attachment #8917667 - Flags: review?(jmaher)
Comment on attachment 8917667 [details] [diff] [review]
Part3: talos-whitelist, v1

Review of attachment 8917667 [details] [diff] [review]:
-----------------------------------------------------------------

thanks for updating this.  As a note, did you test on pgo?  Often we get different file access patterns on PGO vs OPT.
Attachment #8917667 - Flags: review?(jmaher) → review+
(In reply to Joel Maher ( :jmaher) (UTC-5) from comment #99)
> Comment on attachment 8917667 [details] [diff] [review]
> Part3: talos-whitelist, v1
> 
> Review of attachment 8917667 [details] [diff] [review]:
> -----------------------------------------------------------------
> 
> thanks for updating this.  As a note, did you test on pgo?  Often we get
> different file access patterns on PGO vs OPT.

Thanks for reviewing and reminding the PGO.
Let's wait for the result.
https://treeherder.mozilla.org/#/jobs?repo=try&revision=79d30898f16f15ef5e4f2a5f50209ca6fd10e8d6&group_state=expanded&filter-searchStr=talos
Comment 100 looks good
Keywords: checkin-needed
Pushed by ryanvm@gmail.com:
https://hg.mozilla.org/integration/mozilla-inbound/rev/fe37f3401424
Part 1: Let cookie db startup-read off-main-thread. r=nwgh, r=jdm, data-r=francois
https://hg.mozilla.org/integration/mozilla-inbound/rev/d76e2193be25
Part 2: Close syncconn for edge cases. r=nwgh
https://hg.mozilla.org/integration/mozilla-inbound/rev/5d316717a69a
Part 3: Add cookies.sqlite{|-shm} to xperf whitelist. r=jmaher
Keywords: checkin-needed
When this bug landed, these performance regressions were spotted. They are serious, so please take them into account.

== Change summary for alert #9984 (as of October 12 2017 14:56 UTC) ==

Regressions:

 19%  tp5n nonmain_startup_fileio windows7-32 pgo e10s     2,021,375.67 -> 2,404,610.00
  6%  tp5n nonmain_startup_fileio windows7-32 opt e10s     2,034,222.38 -> 2,149,956.09

For up to date results, see: https://treeherder.mozilla.org/perf.html#/alerts?id=9984
The backout canceled the regressions:

== Change summary for alert #9991 (as of October 12 2017 18:32 UTC) ==

Improvements:

 16%  tp5n nonmain_startup_fileio windows7-32 pgo e10s     2,402,032.33 -> 2,021,094.25
  1%  tp5n nonmain_startup_fileio windows7-32 opt e10s     2,080,908.83 -> 2,050,527.25

For up to date results, see: https://treeherder.mozilla.org/perf.html#/alerts?id=9991
(In reply to IonuΘ› Goldan [:igoldan], Performance Sheriffing from comment #106)
> The backout canceled the regressions:
> 
> == Change summary for alert #9991 (as of October 12 2017 18:32 UTC) ==
> 
> Improvements:
> 
>  16%  tp5n nonmain_startup_fileio windows7-32 pgo e10s     2,402,032.33 ->
> 2,021,094.25
>   1%  tp5n nonmain_startup_fileio windows7-32 opt e10s     2,080,908.83 ->
> 2,050,527.25
> 
> For up to date results, see:
> https://treeherder.mozilla.org/perf.html#/alerts?id=9991

Thanks for your notice. 
As comment 98 said, we move the io earlier.
(Before this patch, we load the cookie file when we access the cookie first time.
Usually we open browser for browsing a web, hence the io is almost necessary.)
Flags: needinfo?(juhsu)
Depends on: 1401883
(In reply to Sebastian Hengst [:aryx][:archaeopteryx] (needinfo on intermittent or backout) from comment #103)
> Backed out for frequently failing chrome tests on Windows 10:
> 
> Push with failures:
> https://treeherder.mozilla.org/#/jobs?repo=mozilla-
> inbound&revision=5d316717a69a8bbc6892ac97c415d79c463d2673&filter-
> resultStatus=testfailed&filter-resultStatus=busted&filter-
> resultStatus=exception&filter-resultStatus=retry&filter-
> resultStatus=usercancel
> Failure log example, please also check other failures:
> https://treeherder.mozilla.org/logviewer.html#?job_id=136543672&repo=mozilla-
> inbound

I've re-triggered other failures and taken a look.
Other failures all passed and I think they are un-related to cookie.

On the other hand, 
I have no idea about the Windows 10 leakages.

Per bug 1401883 comment 5, nsBaseAppShell is not released for some reason.
And my patch unveils this leakage frequently.

I have some random try on treeherder and wait for the result now.

> > TEST-UNEXPECTED-FAIL | leakcheck | default process: 176 bytes leaked (Mutex, nsBaseAppShell)
> 
> The issue could also be observed on the previous landing:
> https://treeherder.mozilla.org/#/jobs?repo=mozilla-inbound&filter-
> resultStatus=testfailed&filter-resultStatus=busted&filter-
> resultStatus=exception&tochange=ede558a891066160cef9ec410526b64e33537c22&from
> change=0bf9f589e2428cd6fdd49fea0c93632870ec2cb4
Should be a thread life-cycle issue. 
It works to replace nsIThread to StreamTransportService.
Attached patch Part4: threadLifecycle - v1 (obsolete) (deleted) β€” β€” Splinter Review
I believe changing threading life-cycle should work.

https://treeherder.mozilla.org/#/jobs?repo=try&revision=b6963a183d2032a456d6f58f031251e89601f5d0
Comment on attachment 8919205 [details] [diff] [review]
Part4: threadLifecycle - v1

treeherder shows the intermittent rate is decreased from 30% to <10%.
Attachment #8919205 - Flags: review?(hurley)
(In reply to Junior[:junior] from comment #111)
> Comment on attachment 8919205 [details] [diff] [review]
> Part4: threadLifecycle - v1
> 
> treeherder shows the intermittent rate is decreased from 30% to <10%.

As Comment 109 said, if I use a short life-cycle thread (StreamTransportService).
The intermittent rate is down to <5%
https://treeherder.mozilla.org/#/jobs?repo=try&revision=0bfd99ad67846e6b54f24ad1f6107d0a03176063

If we only omt startup and rebuild, it's fine to use StreamTransportService.
[1] indicates that we should Shutdown at event "xpcom-shutdown-threads"
Shutdown sequence would be: profile-before-change -> xpcom-shutdown-threads -> xpcom-shutdown

FWIW, before Part4, cookie service is released during nsCycleCollector_shutdown, later than xpocm-shutdown.

[1] http://searchfox.org/mozilla-central/rev/dca019c94bf3a840ed7ff50261483410cfece24f/xpcom/build/XPCOMInit.cpp#805
ni? for needing review in Comment 111
Flags: needinfo?(hurley)
Sorry, this got lost in the shuffle last week. I'm reviewing today.
Flags: needinfo?(hurley)
Comment on attachment 8919205 [details] [diff] [review]
Part4: threadLifecycle - v1

Review of attachment 8919205 [details] [diff] [review]:
-----------------------------------------------------------------

LGTM. Seems like the leak is a pre-existing condition, and these patches just made it more common, right? And with this patch it returns to its previous level of intermittent-ness?
Attachment #8919205 - Flags: review?(hurley) → review+
(In reply to Nicholas Hurley [:nwgh][:hurley] (also hurley@todesschaf.org) from comment #116)
> Comment on attachment 8919205 [details] [diff] [review]
> Part4: threadLifecycle - v1
> 
> Review of attachment 8919205 [details] [diff] [review]:
> -----------------------------------------------------------------
> 
> LGTM. Seems like the leak is a pre-existing condition, and these patches
> just made it more common, right? And with this patch it returns to its
> previous level of intermittent-ness?

As far as I can tell,
1) yes we are supposed to shutdown the thread before xpcom-shutdown-thread.
2) The intermittent rate decreases a lot. With some confidence, it goes back to previous level.
Keywords: checkin-needed
Pushed by ryanvm@gmail.com:
https://hg.mozilla.org/integration/mozilla-inbound/rev/6c4818576443
Part 1: Let cookie db startup-read off-main-thread. r=nwgh, r=jdm, data-r=francois
https://hg.mozilla.org/integration/mozilla-inbound/rev/b0ddf460db07
Part 2: Close syncconn for edge cases. r=nwgh
https://hg.mozilla.org/integration/mozilla-inbound/rev/224e86671743
Part 3: Add cookies.sqlite{|-shm} to xperf whitelist. r=jmaher
https://hg.mozilla.org/integration/mozilla-inbound/rev/edacb0144b17
Part 4: Make the lifecycle of cookie thread alongwith the profile. r=ngwh
Keywords: checkin-needed
Backed out for failing many xpcshell tests, e.g. extensions/cookie/test/unit/test_cookies_async_failure.js:

https://hg.mozilla.org/integration/mozilla-inbound/rev/69d0650dfbe945e0b44d28efafdd483d1ef4e738
https://hg.mozilla.org/integration/mozilla-inbound/rev/df2b4f5744e320ede6a31985c008dbf21195338d
https://hg.mozilla.org/integration/mozilla-inbound/rev/4d652fad1384c147b6e904a7644dff7f323d5067
https://hg.mozilla.org/integration/mozilla-inbound/rev/c560488dbd1157aa1f143e12c4c488775f69269c

Push with failures (also check the TV jobs): https://treeherder.mozilla.org/#/jobs?repo=mozilla-inbound&revision=edacb0144b171382406a69e1b74cd9a4df83b9fd&filter-resultStatus=usercancel&filter-resultStatus=runnable&filter-resultStatus=retry&filter-resultStatus=testfailed&filter-resultStatus=busted&filter-resultStatus=exception

Failure log: https://treeherder.mozilla.org/logviewer.html#?job_id=139182828&repo=mozilla-inbound
07:07:35     INFO -  TEST-START | extensions/cookie/test/unit/test_cookies_async_failure.js
07:12:35  WARNING -  TEST-UNEXPECTED-TIMEOUT | extensions/cookie/test/unit/test_cookies_async_failure.js | Test timed out
07:12:35     INFO -  TEST-INFO took 300000ms
07:12:35     INFO -  >>>>>>>
07:12:35     INFO -  (xpcshell/head.js) | test MAIN run_test pending (1)
07:12:35     INFO -  (xpcshell/head.js) | test pending (2)
07:12:35     INFO -  TEST-PASS | extensions/cookie/test/unit/test_cookies_async_failure.js | do_run_test - [do_run_test : 43] true == true
07:12:35     INFO -  TEST-PASS | extensions/cookie/test/unit/test_cookies_async_failure.js | do_run_test - [do_run_test : 44] true == true
07:12:35     INFO -  (xpcshell/head.js) | test MAIN run_test finished (2)
07:12:35     INFO -  running event loop
07:12:35     INFO -  "CONSOLE_MESSAGE: (info) No chrome package registered for chrome://branding/locale/brand.properties"
07:12:35     INFO -  TEST-PASS | extensions/cookie/test/unit/test_cookies_async_failure.js | observe - [observe : 67] "cookie-db-closed" == "cookie-db-closed"
07:12:35     INFO -  TEST-PASS | extensions/cookie/test/unit/test_cookies_async_failure.js | run_test_1 - [run_test_1 : 140] 2 == 2
07:12:35     INFO -  <<<<<<<
Flags: needinfo?(juhsu)
Attached patch Part4: threadLifecycle - v2 (deleted) β€” β€” Splinter Review
Since nsThread::Shutdown would execute all the tasks dispatched to main thread, we should do those task before closing everything.

https://treeherder.mozilla.org/#/jobs?repo=try&revision=febe2b5fccf196006e58ca12ce5ad5590f46cfbc
Attachment #8919205 - Attachment is obsolete: true
Flags: needinfo?(juhsu)
Attachment #8921791 - Flags: review+
Keywords: checkin-needed
Pushed by ryanvm@gmail.com:
https://hg.mozilla.org/integration/mozilla-inbound/rev/1d83d62684e2
Part 1: Let cookie db startup-read off-main-thread. r=nwgh, r=jdm, data-r=francois
https://hg.mozilla.org/integration/mozilla-inbound/rev/17f26189df9a
Part 2: Close syncconn for edge cases. r=nwgh
https://hg.mozilla.org/integration/mozilla-inbound/rev/3e5213e725a2
Part 3: Add cookies.sqlite{|-shm} to xperf whitelist. r=jmaher
https://hg.mozilla.org/integration/mozilla-inbound/rev/3531caed0bd9
Part 4: Make the lifecycle of cookie thread alongwith the profile. r=ngwh
Keywords: checkin-needed
Depends on: 1411880
Blocks: 1412218
No longer depends on: 1401883
Blocks: 1413839
Depends on: 1419548
Depends on: 1420175
Depends on: 1534745
Performance Impact: --- → P2
Whiteboard: [qf:p2][necko-quantum][necko-triaged] → [necko-quantum][necko-triaged]
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Created:
Updated:
Size: