Closed Bug 723611 (VacuumManagerPragmas) Opened 13 years ago Closed 13 years ago

Browser hung while trying to vacuum

Categories

(Toolkit :: Telemetry, defect)

x86_64
Windows 7
defect
Not set
normal

Tracking

()

RESOLVED FIXED
mozilla13

People

(Reporter: joe, Assigned: mak)

References

(Blocks 1 open bug)

Details

(Whiteboard: [snappy:p1])

Attachments

(2 files)

Attached file stack and locals (deleted) —
I walked away from my computer for a few minutes, and when I came back, my browser was hung. Some investigation shows that it's trying to vacuum my telemetry database. I presume that it's currently open in another thread.
I captured a dump that includes the heap: http://people.mozilla.org/~jdrew/telemetry-dump.7z
I should mention that I am running 2012-02-02's nightly build on Windows 7.
there is no telemetry database. Do you mean it was running vacuum while doing some telemetry queries?

I suspect we can make use of the hang detector to detect sqlite contention and break it by cancelling background queries once bug 722243 is done.
Depends on: 722243
If it was doing telemetry queries on mainthread, this is likely related to bug 722242 that Paolo pushed yesterday.

Btw, you exactly caught the single month day where we do vacuum to go back to your idle computer :) Apart jokes, we should do what taras said and priority the async  bookmarking api as the immediate next Places project.
Ehsan and I found that sqlite was apparently working on a telemetry database:

zName	0x66c85118 "telemetry-vfs"	const char *

Perhaps this was a red herring?
that's just the telemetry vfs we use to measure and report performances, it's a transparent layer.
Whiteboard: [snappy]
Just to check, you determined that it was vacuuming by looking at zSQL on the relevant object?

Do we have any idea what the main thread was doing at this time?  If it's in the stack trace attached, I don't know how to read it.

It's curious that neither Joe nor I (bug 724288) had run into long hangs like this before, and we're seeing them all of a sudden.
You can actually download the .dmp file if you want to debug, but the stack I uploaded *was* the main thread.
We vacuum on the main thread?  O.o

Looking at the stacks again, I'm not sure that bug 724288 is a dupe.  This bug's main thread is doing

	sIdleServiceDaily::Observe
 	nsIdleService::TryNotifyIdleState

but bug 724288 has the main thread stuck on nsNavBookmarks::RunInBatchMode (I couldn't figure out exactly what statements it was trying to run).
(In reply to Justin Lebar [:jlebar] from comment #9)
> We vacuum on the main thread?  O.o

no

> Looking at the stacks again, I'm not sure that bug 724288 is a dupe.  This
> bug's main thread is doing
> 
> 	sIdleServiceDaily::Observe
>  	nsIdleService::TryNotifyIdleState
> 
> but bug 724288 has the main thread stuck on nsNavBookmarks::RunInBatchMode
> (I couldn't figure out exactly what statements it was trying to run).

The fact is that decayFrecency has become extremely expensive, and it pushes all the async stuff down by seconds into the actual session, included vacuum.
Bug 723044 will bring back the situation to what it was before, that is a small probability to have contention vs a large one.
that means, all this hang after idle bugs are just evidence of the same bug.
Okay, I think I understand bug 723044; thanks.

> The fact is that decayFrecency has become extremely expensive, and it pushes all the async stuff 
> down by seconds into the actual session, included vacuum.

I think I understand "pushing all the async stuff down by seconds", but I don't understand, "into the actual session".

Why does the stack in this bug show the main thread doing a vacuum?  Or is that not what the main thread stack shows?
(In reply to Justin Lebar [:jlebar] from comment #12)
> I think I understand "pushing all the async stuff down by seconds", but I
> don't understand, "into the actual session".

Sorry, I was unclear indeed. Before decayFrecency() became expensive, the possibility to have some maintenance task happen when the user comes back from idle was really small, on my system that call takes about 1 or 2 seconds. Vacuum may likely happen after this call, and if decayFrecency() takes seconds, or even minutes, the possibility for vacuum to be delayed up to when the user is back and tries to actually use the browser, is very high.

> Why does the stack in this bug show the main thread doing a vacuum?  Or is
> that not what the main thread stack shows?

Ah, this is interesting, it's not the vacuum creating contention here, rather the PRAGMAS that are run before it are contending with frecency.
Alias: VacuumManagerPragmas
Blocks: StorageJank
Whiteboard: [snappy] → [snappy:p1]
Blocks: 722188
Assignee: nobody → mak77
Status: NEW → ASSIGNED
Summary: Browser hung while trying to vacuum telemetry database → Browser hung while trying to vacuum
Attached patch patch v1.0 (deleted) — Splinter Review
The problem:
- synchronous pragmas executed before vacuum may contend with
  asynchronous queries executing on the connection

What this patch does:
- removes unused NotifyCallback class
- removes all synchronous pragmas by simplifying page_size fixup.
  The database will just be fixed at its turn.
- makes assertions fatal
Attachment #598026 - Flags: review?(sdwilsh)
Comment on attachment 598026 [details] [diff] [review]
patch v1.0

Review of attachment 598026 [details] [diff] [review]:
-----------------------------------------------------------------

r=sdwilsh

::: storage/src/VacuumManager.cpp
@@ +223,5 @@
>    if (os) {
> +    DebugOnly<nsresult> rv =
> +      os->NotifyObservers(nsnull, OBSERVER_TOPIC_HEAVY_IO,
> +                          OBSERVER_DATA_VACUUM_BEGIN.get());
> +    MOZ_ASSERT(NS_SUCCEEDED(rv), "Should be able to notify");

\o/
Attachment #598026 - Flags: review?(sdwilsh) → review+
Thanks. Someone complained we were casting to void too many result values :)

https://hg.mozilla.org/integration/mozilla-inbound/rev/616f4f937da2
Target Milestone: --- → mozilla13
https://hg.mozilla.org/mozilla-central/rev/616f4f937da2
Status: ASSIGNED → RESOLVED
Closed: 13 years ago
Resolution: --- → FIXED
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Created:
Updated:
Size: