Closed Bug 1462046 Opened 7 years ago Closed 6 years ago

mozilla::storage::AsyncExecuteStatements::Run spins the cpu at 100% for longer than I'm willing to wait

Categories

(Toolkit :: Places, defect, P1)


Tracking


RESOLVED FIXED
mozilla62
Tracking Status
firefox-esr52 --- unaffected
firefox-esr60 --- unaffected
firefox60 --- unaffected
firefox61 --- unaffected
firefox62 + fixed

People

(Reporter: jrmuizel, Assigned: mak)

References

Details

(Keywords: regression, Whiteboard: [fxsearch])

I've run into this twice in the last two days. Both times Firefox ended up locking up completely. It has been suggested that this might be caused by bug 1239708.
We have some heavy schema migration work to do, which may take a few minutes. I'm not sure whether it's related to that bug; it's possible, but in that case it would be a one-time thing.
This happened to me again. Is there something I can do to find out what's causing it?
Flags: needinfo?(mak77)
Yes, it was very likely due to the new autofill: we had a few queries taking too long on the helper thread, and the main thread was locking up while waiting for them. Bug 1461753 should have solved this, but we're still monitoring the situation. Please let us know if things improve. Regarding how to find the cause, the about:telemetry page should have a Slow SQL section that may point at multi-second queries, which are the most likely culprits.
Blocks: 1239708
Depends on: 1461753
Flags: needinfo?(mak77)
Please let us know whether this is still common/happening.
Flags: needinfo?(jmuizelaar)
How are you monitoring? Is there any particular telemetry query/alert we could use?
Flags: needinfo?(mak77)
We have a couple of telemetry probes that showed a spike related to this, PLACES_IDLE_FRECENCY_DECAY_TIME_MS and PLACES_IDLE_MAINTENANCE_TIME_MS. We're also monitoring reported bugs and common support forums for newly reported hangs. Unfortunately, Slow SQL telemetry has been removed from the servers; I filed bug 1367023 to get it back, since it would be extremely useful, but there has been no action there. I was hoping to also use arewesmoothyet, but it seems to be a bit behind in time (showing April?). Mike, do we have any "real-time" (even a couple of builds back) tracking of the number of hangs in Nightly? arewesmoothyet seems to be behind by 15 days or more.
Flags: needinfo?(mak77) → needinfo?(mconley)
Priority: -- → P1
Whiteboard: [fxsearch]
The telemetry probes involving frecency (see comment 6) have gone back to their usual values after the fix for bug 1461753.
Redirecting to dthayer - I'm not sure when the jobs run.
QA Contact: gwimberly
Trying to kick the arewesmoothyet job manually now. It just has issues when run automatically, even though it should be doing the exact same thing. I tried to run it manually yesterday, but I forgot to extend the lifetime of the cluster and it timed out. Anyway, just giving an update. Leaving the needinfo until it's done.
Hi Marco, Jeff, do you have any suggested STR so I can see whether I can reproduce the hang reported here?
Flags: needinfo?(mak77)
Flags: qe-verify?
This is still happening for me. It makes the Awesomebar completely unusable, and since it usually happens after the computer wakes from hibernation and needs a restart for the daily Nightly update, I also noticed that the update fails the first time and I have to force it again after starting Nightly again. So I suspect Nightly can't actually restart but hangs in shutdown without a browser window showing anymore.

I found a very slow SQL statement on helper threads from earlier today, which I probably cancelled with a browser restart after 17875 milliseconds. The statement is:

UPDATE moz_places SET frecency = CALCULATE_FRECENCY(id) WHERE frecency = -1 /* places.sqlite */

Here is a screenshot: https://abload.de/img/unbenanntxsuoo.png

I would be willing to share my profile data if that helps in any way. I'm running a pretty powerful setup with an i7 6700k, so that is probably not the cause. I can't reproduce on my other machine, a much slower laptop.
Before https://bugzilla.mozilla.org/show_bug.cgi?id=1461753 landed, I noticed these very slow statements:

6977 ms  PRAGMA integrity_check /* places.sqlite */
3535 ms  DELETE FROM moz_updateoriginsinsert_temp /* places.sqlite */
(In reply to TMart from comment #14)
> UPDATE moz_places SET frecency = CALCULATE_FRECENCY(id) WHERE frecency = -1
> /* places.sqlite */

Aha! That's almost certainly the mirror: https://searchfox.org/mozilla-central/rev/dc6d85680539cb7e5fe2843d38a30a0581bfefe1/toolkit/components/places/SyncedBookmarksMirror.jsm#4532-4535

The `BEGIN IMMEDIATE` is probably queuing up behind that.

TMart, I'm guessing you have `services.sync.engine.bookmarks.buffer` enabled? Would you mind sharing how many bookmarks you have?

Mak, is there a good way for us to chunk frecency recalculation for new URLs? Or is there an approximate frecency we can use that's "good enough" for bookmarks? Bookmarks sync before history, so there's a very good chance we'll recalculate the frecency when we sync history anyway.
Yes, I have the new Bookmark Sync enabled. Sorry, totally forgot about that. I have about 1500 bookmarks in total.
(In reply to Kit Cambridge (they/them) [:kitcambridge] from comment #16)
> Mak, is there a good way for us to chunk frecency recalculation for new
> URLs? Or, is there an approximate frecency we can use that's "good enough"
> for bookmarks? Bookmarks sync before history, so there's a very good chance
> we'll recalculate the frecency when we sync history, anyway.

In general, I think we should completely re-evaluate the way we calculate frecency, probably by having a single invalidation field that we set and then recalculating in chunks on idle.
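A minimal sketch of that chunked approach, purely illustrative: it reuses the CALCULATE_FRECENCY function and the frecency = -1 marker from the statement quoted in comment 14 as the invalidation flag, and the chunk size of 50 is an arbitrary placeholder, not a tuned value. The idea would be to run something like this repeatedly on idle until no invalidated rows remain:

-- Recalculate frecency for at most 50 invalidated pages per pass;
-- repeat on idle until the subquery matches no rows.
UPDATE moz_places
SET frecency = CALCULATE_FRECENCY(id)
WHERE id IN (SELECT id FROM moz_places WHERE frecency = -1 LIMIT 50);

Each pass would then hold the write lock on places.sqlite for a short, bounded time instead of the 17+ seconds reported in comment 14.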
(In reply to Grover Wimberly IV [:Grover-QA] from comment #12)
> Any suggested STR here to be able to see if I can reproduce the hang
> reported here?

It's tricky: it requires a profile with a large places.sqlite (70MiB) and the new Sync disabled (to avoid false positives). Just browsing and then immediately opening a bookmarks view (the menu or the toolbar overflow) could be enough, but it may depend on timing, because some of the expensive operations happen on idle or, even worse, on idle-daily, which is hard to reproduce at will. Checking Slow SQL in about:telemetry for multi-second statement executions is also a good hint. I fear we'll have to rely on telemetry and arewesmoothyet (when it gets updated), plus we're trying to get Slow SQL telemetry reports back.
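As a rough aid for that STR, here is a hypothetical check one might run with the sqlite3 CLI against a copy of the profile's places.sqlite; the frecency = -1 marker comes from the statement quoted in comment 14, and the 70MiB figure from the comment above.

PRAGMA page_size;    -- bytes per page
PRAGMA page_count;   -- number of pages; the product approximates the file size
SELECT count(*) FROM moz_places WHERE frecency = -1;  -- pages still awaiting frecency recalculation

A large database with many pending recalculations should make it more likely that the expensive idle work, and therefore the hang, can be observed.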
Flags: needinfo?(mak77)
Flags: qe-verify? → qe-verify+
Actually, this should stay ? until we discuss this more on Wednesday.
Flags: qe-verify+ → qe-verify?
This may be fixed now in bug 1461753. mak, can you verify that from telemetry results or do we need more time?
Flags: needinfo?(mak77)
The frecency telemetry I pointed at in comment 6 confirms that the computation regression disappeared (the spike is clearly visible). We didn't get new reports of this problem, and last week I spent some time on arewesmoothyet to check whether anything would jump out at me; I didn't notice any particular stacks that could be related to this. That said, verifying this problem is tricky. Our current long-term plan includes:
1. Getting back Slow SQL telemetry (work is in progress) and monitoring it.
2. Evaluating a new approach to frecency recalculation (which Kit filed as bug 1466267).
I don't think we can do more here, but if you still see this problem in Nightly, please let us know ASAP.
Assignee: nobody → mak77
Status: NEW → RESOLVED
Closed: 6 years ago
Flags: needinfo?(mak77)
Flags: needinfo?(dothayer)
Resolution: --- → FIXED
Firefox Nightly on macOS has been frequently hanging for me since mid-May, particularly when my MacBook Pro wakes from hibernation. With today's Nightly build (2018-06-05), however, the hangs are much worse. STR seems to be to enable one or more extensions and then open Firefox's History menu. I tried bisecting the hangs with mozregression, which pointed to autofill bug 1239708 and led me to this hang bug. Marco, do my hangs sound like a different problem than this bug?
Flags: needinfo?(mak77)
When it happens, could you please check for Slow SQL telemetry in about:telemetry and report any queries? Hangs after hibernation are something I've heard about often in the past, so they may not be particularly new. Can you post some examples of WebExtensions causing this? Fwiw, I'm not sure bisecting will work properly here, because bug 1239708 effectively added a hang problem (now resolved) on top of the existing ones; that may have made it easier to reproduce a hang just after bug 1239708 landed, but it may still not be the same hang.
Flags: needinfo?(mak77)
Flags: needinfo?(cpeterson)
Here are the slow SQL queries logged after I reproduced the (~70 second) beach ball hang.

Slow SQL Statements on Main Thread:
1 hit, avg. time 75231 ms:
SELECT h.id, h.url, h.title AS page_title, h.rev_host, h.visit_count, h.last_visit_date, null, null, null, null, null, null AS tags , h.frecency, h.hidden, h.guid, null, null, null FROM moz_places h WHERE h.hidden = 0 AND EXISTS (SELECT id FROM moz_historyvisits WHERE place_id = h.id AND visit_type NOT IN (0,4,8) LIMIT 1) ORDER BY last_visit_date DESC LIMIT 15 /* places.sqlite */

Slow SQL Statements on Helper Threads:
2 hits, avg. time 39383 ms:
DELETE FROM moz_updateoriginsinsert_temp /* places.sqlite */

I can usually reproduce the hang when enabling one or both of these simple extensions (and then opening the History menu):
https://addons.mozilla.org/en-US/firefox/addon/always-right/
https://addons.mozilla.org/en-US/firefox/addon/in-page-pop-up-reporter/
Flags: needinfo?(cpeterson)
Thank you. Basically, the second statement took an unexpectedly long time to run (40s), and due to a legacy architectural problem we haven't solved yet (UI queries still run on the main thread), the main thread tried to run a query in the meantime and blocked on the connection mutex.

It doesn't look related to the autofill change though, so I think it should be filed as a separate bug that we'll investigate apart. Would you mind filing such a bug?

Honestly, I don't think the add-ons have any relation to this, or at least I can't see one. Is the disk particularly busy and loud (if mechanical) before these lock-ups happen? Did you enable the new Sync mirroring?
Flags: needinfo?(cpeterson)
(In reply to Marco Bonardo [::mak] from comment #26) > It doesn't look like related to the autofill change though, and as such I > think it should be filed as a separate new bug that we'll investigate apart. > Would you mind filing such a bug? I filed bug 1467537 and answered your questions there.
Flags: needinfo?(cpeterson)
Per prior discussion on this bug on Vidyo with Marco/Drew, taking off the qe-verify+ tag. If the situation on this changes, please set the flag back and NeedInfo me.
Flags: qe-verify?
Flags: needinfo?(jmuizelaar)