Closed
Bug 552023
Opened 15 years ago
Closed 14 years ago
Kill Places partitioned temp tables
Categories
(Toolkit :: Places, defect)
Toolkit
Places
Tracking
()
RESOLVED
FIXED
mozilla2.0b9
People
(Reporter: mak, Assigned: sdwilsh)
References
(Blocks 1 open bug)
Details
(Whiteboard: [fixed-in-places])
Attachments
(2 files, 18 obsolete files)
(deleted),
patch
|
Details | Diff | Splinter Review | |
(deleted),
patch
|
Details | Diff | Splinter Review |
There are nice reasons to do so, but we can't till we have both icons and visits adding async. Thus we need to kill LAZY_ADD.
Even if the tables avoid doing some I/O on visits adding, they make all queries unreadable, much more slower, and bookmarks adding causes a sync at each change, so the gain will be smaller than what we would like.
Once adding is async we should provide a non-temp version to Portable Firefox guys and Fennec and get feedback on perfs.
Reporter | ||
Updated•14 years ago
|
Assignee: nobody → mak77
Reporter | ||
Comment 1•14 years ago
|
||
this bug should block because now that we have async visits we can get a really nice perf improvement here (in some area I expect a 20-30% improvement, a more global but smaller improvement in all the rest of Places). It will for sure reduce statements time, greatly reducing possibility of lock contentions (see bug 563538)
Reporter | ||
Updated•14 years ago
|
blocking2.0: --- → ?
Updated•14 years ago
|
Blocks: placesAsyncBookmarks
Reporter | ||
Comment 2•14 years ago
|
||
will need a way for extensions (like Sync) to query temp tables when needed, but don't query them if they don't exist.
Assignee | ||
Comment 3•14 years ago
|
||
(In reply to comment #2)
> will need a way for extensions (like Sync) to query temp tables when needed,
> but don't query them if they don't exist.
We can keep the views in place I guess.
Reporter | ||
Comment 4•14 years ago
|
||
(In reply to comment #3)
> (In reply to comment #2)
> > will need a way for extensions (like Sync) to query temp tables when needed,
> > but don't query them if they don't exist.
> We can keep the views in place I guess.
that means keeping all triggers too, and temp tables. This is all Ts stuff that we could remove.
Reporter | ||
Comment 5•14 years ago
|
||
saving work as a backup. it should work, but I'm still tweaking some query.
Reporter | ||
Comment 6•14 years ago
|
||
Reporter | ||
Comment 7•14 years ago
|
||
Attachment #459207 -
Attachment is obsolete: true
Reporter | ||
Comment 8•14 years ago
|
||
Attachment #459208 -
Attachment is obsolete: true
Reporter | ||
Comment 9•14 years ago
|
||
Attachment #459795 -
Attachment is obsolete: true
Reporter | ||
Comment 10•14 years ago
|
||
Attachment #459796 -
Attachment is obsolete: true
Reporter | ||
Comment 11•14 years ago
|
||
I'm still investigating a couple oranges.
a timeout in test_history_expiration.js (dm test) that is visible only on the branch, but I can't reproduce it locally.
a failure in browser_tabMatchesInAwesomebar.js that could be fixed already by these patches.
I'm waiting for try server results now.
Reporter | ||
Comment 12•14 years ago
|
||
there is also a failure in test_history_removeAllPages.js due to a timing issue on when we fix invalid frecencies (sounds like it could run before our check)... actually we could even comment out the negative frecency check for now and file it as followup, if needed.
the download test_history_expiration.js failure that was on places branch has instead disappeared on tryserver... but I also see other failures on places branch that i've never seen on central, so hard to tell.
Reporter | ||
Comment 13•14 years ago
|
||
Attachment #459917 -
Attachment is obsolete: true
Reporter | ||
Comment 14•14 years ago
|
||
Attachment #459918 -
Attachment is obsolete: true
Reporter | ||
Comment 15•14 years ago
|
||
I've splitted out Bug 581660, Bug 581657 and Bug 581648 that I've found while working on this, and could be needed before landing this.
Notice I've seen a Ts increase on Places branch, I suspect it's an fsync effect. Now Ts takes in count first visit addition, without temp tables the addition happens on disk rather than in the memory table, adding a couple fsyncs.
Assignee | ||
Updated•14 years ago
|
blocking2.0: ? → betaN+
Comment 16•14 years ago
|
||
Is this ready for review?
Reporter | ||
Comment 17•14 years ago
|
||
there are still 2 random oranges and the regression, so I'm reordering patches so that WAL can land first.
I'm unsure we can do much for the regression due to additional fsyncs, but I have to check it again on the branch with the new ordering
Reporter | ||
Comment 18•14 years ago
|
||
Attachment #460027 -
Attachment is obsolete: true
Reporter | ||
Comment 19•14 years ago
|
||
All oranges should be gone. I have to fix bug 584658 before this one though because it ended up being persistent.
Attachment #460028 -
Attachment is obsolete: true
Reporter | ||
Comment 20•14 years ago
|
||
Attachment #463124 -
Attachment is obsolete: true
Attachment #463296 -
Flags: review?(sdwilsh)
Reporter | ||
Comment 21•14 years ago
|
||
Attachment #463126 -
Attachment is obsolete: true
Attachment #463297 -
Flags: review?(sdwilsh)
Reporter | ||
Updated•14 years ago
|
Whiteboard: [needs bug 584658 for a persistent orange][WAL will land later]
Assignee | ||
Comment 22•14 years ago
|
||
Comment on attachment 463297 [details] [diff] [review]
Tests changes v1.5
For review comments with expandable code context, please see http://reviews.visophyte.org/r/463297/.
on file: docshell/test/test_bug94514.html line 47
> // Because adding visits is async, we will not be notified imemdiately.
wow...how the hell did this test pass before? :(
on file: toolkit/components/places/tests/unit/test_history_removeAllPages.js line 93
> // Memory table has been updated, disk table has not
this comment is wrong, right?
on file: toolkit/components/places/tests/unit/test_history_removeAllPages.js line 95
> "SELECT h.id FROM moz_places h WHERE h.frecency > 0 " +
> "AND EXISTS (SELECT id FROM moz_bookmarks WHERE fk = h.id) LIMIT 1");
This is a behavior change that I don't see why it is needed. Can you please
explain it?
on file: toolkit/components/places/tests/unit/test_history_removeAllPages.js line 179
> QueryInterface: function(iid) {
> if (iid.equals(Ci.nsINavHistoryObserver) ||
> iid.equals(Ci.nsISupports)) {
> return this;
> }
> throw Cr.NS_ERROR_NO_INTERFACE;
> }
XPCOMUtils please
r=sdwilsh
Attachment #463297 -
Flags: review?(sdwilsh) → review+
Reporter | ||
Comment 23•14 years ago
|
||
(In reply to comment #22)
> on file: toolkit/components/places/tests/unit/test_history_removeAllPages.js
> line 95
> > "SELECT h.id FROM moz_places h WHERE h.frecency > 0 " +
> > "AND EXISTS (SELECT id FROM moz_bookmarks WHERE fk = h.id) LIMIT 1");
>
> This is a behavior change that I don't see why it is needed. Can you please
> explain it?
actually it's an orange, even if it never shown up, I can't tell which order onClearHistory handlers are called, so this could run too early or too late. FixInvalidIndices can run before it and the test would fail. Plus we are going to make frecency async so that would fail regardless as it was.
Reporter | ||
Comment 24•14 years ago
|
||
FixInvalidFrecencies, I meant.
Assignee | ||
Comment 25•14 years ago
|
||
Comment on attachment 463296 [details] [diff] [review]
Code changes v1.5
This only took me like three hours. That's not so bad at all!
For review comments with expandable code coverage, please see http://reviews.visophyte.org/r/463296/.
on file: toolkit/components/places/src/nsNavHistory.cpp line 364
> NS_IMPL_ISUPPORTS2(PlacesEvent, mozIStorageCompletionCallback, nsIRunnable)
nit: this doesn't follow our normal style
on file: toolkit/components/places/src/nsNavHistory.cpp line 846
> // Visits triggers.
> rv = mDBConn->ExecuteSimpleSQL(CREATE_HISTORYVISITS_AFTERINSERT_TRIGGER);
> NS_ENSURE_SUCCESS(rv, rv);
> rv = mDBConn->ExecuteSimpleSQL(CREATE_HISTORYVISITS_AFTERDELETE_TRIGGER);
> NS_ENSURE_SUCCESS(rv, rv);
Hmm, did we have these before?
on file: toolkit/components/places/src/nsNavHistory.cpp line 1020
> rv = mDBConn->CreateStatement(NS_LITERAL_CSTRING(
> "SELECT id, session, visit_date "
> "FROM moz_historyvisits "
> "WHERE place_id = (SELECT id FROM moz_places WHERE url = :page_url) "
> "ORDER BY visit_date DESC "
> "LIMIT 1 "
LIMIT 1 seems not needed here and adds extra opcodes
on file: toolkit/components/places/src/nsNavHistory.cpp line 1029
> rv = mDBConn->CreateStatement(NS_LITERAL_CSTRING(
> "SELECT id FROM moz_historyvisits "
> "WHERE place_id = :page_id "
> "AND visit_date = :visit_date "
> "AND session = :session "
> "LIMIT 1 "
Likewise, here. LIMIT 1 seems uneeded
on file: toolkit/components/places/src/nsNavHistory.cpp line 1696
> if (!triggerExists) {
Please add a comment explaining that we only have to check one trigger because
both should have been removed in the past.
on file: toolkit/components/places/src/nsNavHistory.cpp line 4059
> nsresult rv = mDBConn->CreateStatement(NS_LITERAL_CSTRING(
> "SELECT url, visit_date FROM moz_historyvisits v "
> "JOIN moz_places h ON v.place_id = h.id "
> "WHERE h.hidden <> 1 "
> "ORDER BY visit_date DESC "
> "LIMIT 1 "),
LIMIT 1 is not needed here either.
on file: toolkit/components/places/src/nsPlacesAutoComplete.js line 190
> const SQL_BASE = "SELECT h.url, h.title, f.url, " + kBookTagSQLFragment + ", "
> + "h.visit_count, h.typed, h.id, :query_type, "
> + "t.open_count "
> + "FROM moz_places h "
> + "LEFT JOIN moz_favicons f ON f.id = h.favicon_id "
> + "LEFT JOIN moz_openpages_temp t ON t.place_id = h.id "
> + "WHERE h.frecency <> 0 "
> + "AND AUTOCOMPLETE_MATCH(:searchString, h.url, "
> + "IFNULL(bookmark, h.title), tags, "
> + "h.visit_count, h.typed, parent, "
> + "t.open_count, "
> + ":matchBehavior, :searchBehavior) "
> + "{ADDITIONAL_CONDITIONS} "
> + "ORDER BY h.frecency DESC, h.id DESC "
> + "LIMIT :maxResults";
oh wow this is easier to read!
on file: toolkit/components/places/src/nsPlacesAutoComplete.js line 237
> return this._db.createAsyncStatement(
ugh...wow. I thought I fixed this!
on file: toolkit/components/places/src/nsPlacesAutoComplete.js line 283
> "/* do not warn (bug 487789) */ "
I just want to confirm that this warns still if we remove this, correct?
on file: toolkit/components/places/src/nsPlacesAutoComplete.js line 311
> "/* do not warn (bug 487787) */ "
ditto here - I think doing the ORDER BY h.frecency means we shouldn't warn
anymore.
r=sdwilsh
Attachment #463296 -
Flags: review?(sdwilsh) → review+
Reporter | ||
Comment 26•14 years ago
|
||
(In reply to comment #25)
mDBConn->ExecuteSimpleSQL(CREATE_HISTORYVISITS_AFTERINSERT_TRIGGER);
> > NS_ENSURE_SUCCESS(rv, rv);
> > rv = mDBConn->ExecuteSimpleSQL(CREATE_HISTORYVISITS_AFTERDELETE_TRIGGER);
> > NS_ENSURE_SUCCESS(rv, rv);
>
> Hmm, did we have these before?
yes!
> on file: toolkit/components/places/src/nsNavHistory.cpp line 1020
> > rv = mDBConn->CreateStatement(NS_LITERAL_CSTRING(
> > "SELECT id, session, visit_date "
> > "FROM moz_historyvisits "
> > "WHERE place_id = (SELECT id FROM moz_places WHERE url = :page_url) "
> > "ORDER BY visit_date DESC "
> > "LIMIT 1 "
>
> LIMIT 1 seems not needed here and adds extra opcodes
is is gettin the most recent visit for that url, so it does not need to fetch more than 1 record. limiting usually helps perf in these cases because sqlite knows before to stop after first result.
> on file: toolkit/components/places/src/nsNavHistory.cpp line 1029
> > rv = mDBConn->CreateStatement(NS_LITERAL_CSTRING(
> > "SELECT id FROM moz_historyvisits "
> > "WHERE place_id = :page_id "
> > "AND visit_date = :visit_date "
> > "AND session = :session "
> > "LIMIT 1 "
>
> Likewise, here. LIMIT 1 seems uneeded
probably here we can avoid it since it's hard that we have 2 visits at the same time, but covers that rare case
> on file: toolkit/components/places/src/nsNavHistory.cpp line 4059
> > nsresult rv = mDBConn->CreateStatement(NS_LITERAL_CSTRING(
> > "SELECT url, visit_date FROM moz_historyvisits v "
> > "JOIN moz_places h ON v.place_id = h.id "
> > "WHERE h.hidden <> 1 "
> > "ORDER BY visit_date DESC "
> > "LIMIT 1 "),
>
> LIMIT 1 is not needed here either.
looks like same as case 1, there are more visits for a single page, and we need just one
> on file: toolkit/components/places/src/nsPlacesAutoComplete.js line 283
> > "/* do not warn (bug 487789) */ "
>
> I just want to confirm that this warns still if we remove this, correct?
It should warn due to tags fragment
> on file: toolkit/components/places/src/nsPlacesAutoComplete.js line 311
> > "/* do not warn (bug 487787) */ "
>
> ditto here - I think doing the ORDER BY h.frecency means we shouldn't warn
> anymore.
I think we warn for the tags fragment, but will check
Reporter | ||
Comment 27•14 years ago
|
||
So, about the LIMIT stuff, I've found this comment from drh: http://www.mail-archive.com/sqlite-users@sqlite.org/msg35473.html
LIMIT can save memory if it's used with an ORDER BY that is not using an index. In our cases we should always be on an index and I can remove them.
Reporter | ||
Comment 28•14 years ago
|
||
Attachment #463296 -
Attachment is obsolete: true
Attachment #463297 -
Attachment is obsolete: true
Reporter | ||
Comment 29•14 years ago
|
||
http://hg.mozilla.org/mozilla-central/rev/b1e658a33221
and disabled test_bug500328.html with
http://hg.mozilla.org/mozilla-central/rev/83ae1826aa80
From what I see we try to add a page to moz_places twice (I guess from real visit and some other api, either registerOpenPage and/or pushstate), and the second addition fails, so visit is not added. Will file a bug for this.
Notice this also shown an increadible Ts increase on Linux, that we'll have to figure out. will file a bug for it as well, it could cause a backout though.
Status: NEW → RESOLVED
Closed: 14 years ago
Resolution: --- → FIXED
Target Milestone: --- → mozilla2.0b4
Reporter | ||
Comment 30•14 years ago
|
||
backed out both changes, due to Talos and stil la xpcshell test failure (typo), but mostly what is blocking this is the strange Ts regression on Linux.
http://hg.mozilla.org/mozilla-central/rev/d2f7b3ee3a73
http://hg.mozilla.org/mozilla-central/rev/a673a956d339
Status: RESOLVED → REOPENED
Resolution: FIXED → ---
Reporter | ||
Updated•14 years ago
|
Whiteboard: [needs bug 584658 for a persistent orange][WAL will land later] → [Caused regressions that need investigation]
Reporter | ||
Comment 31•14 years ago
|
||
small unbitrot that applies on top of my long queue. Hope to be able to push something from that queue in the next hours.
Attachment #464068 -
Attachment is obsolete: true
Comment 32•14 years ago
|
||
test_bug94514.html started failing intermittently after I pushed my patch to bug 580069. I'm going to try to pull out the test fix from patch v1.7 and see if that fixes the problem. If it does, I'll report back here.
Comment 33•14 years ago
|
||
I've been trying to pull out the fix for test_bug94514.html from this patch, but I've been causing much more orange on try than I've been fixing.
From the tryserver logs [1], it appears that the observer added in the test never gets removed. (Notice that tests after test_bug94514.html fail *in* test_bug94514.html.) Something must be wrong with the removeObserver call, since the line after it certainly executes [2].
I've been unable to reproduce this failure locally, unfortunately, so I don't have much more insight into what's going on. But I'd really like to figure out how to fix it so I can resolve bug 580069.
[1] http://tinderbox.mozilla.org/showlog.cgi?log=MozillaTry/1282415594.1282416222.4180.gz
[2] http://hg.mozilla.org/try/pushloghtml?changeset=6c5c9a6c3ba4
Reporter | ||
Comment 34•14 years ago
|
||
looks like you did not apply the change to bug94514-postpage.html though
Comment 35•14 years ago
|
||
Ah, that would probably do it. Thanks.
Assignee | ||
Comment 36•14 years ago
|
||
Grabbing this since mak is busy.
Assignee: mak77 → sdwilsh
Status: REOPENED → ASSIGNED
Assignee | ||
Comment 37•14 years ago
|
||
This is the v1.5 patch just modified to apply to tip. This does mean I need to address my review comments still.
Attachment #464835 -
Attachment is obsolete: true
Reporter | ||
Comment 38•14 years ago
|
||
why not using v1.7?
Assignee | ||
Comment 39•14 years ago
|
||
Same as above, but tests this time.
Reporter | ||
Comment 40•14 years ago
|
||
fwiw I could have done additional fixed in v1.6 and v1.7, so please interdiff them with v1.5
Assignee | ||
Comment 41•14 years ago
|
||
(In reply to comment #38)
> why not using v1.7?
Largely because I didn't notice the difference and thought it was just a combined patch. I noticed when I was uploading and had already spent two hours rebasing it. :(
(In reply to comment #40)
> fwiw I could have done additional fixed in v1.6 and v1.7, so please interdiff
> them with v1.5
Sadly, interdiff isn't going to be terribly useful
Reporter | ||
Comment 42•14 years ago
|
||
well 1.6 was combined, but unbitrotted and with your review comments fixed.Inded that's what we pushed.
1.7 was just a further unbitrot over 1.6.
Assignee | ||
Comment 43•14 years ago
|
||
v1.7 rebased. This doesn't pass tests yet.
Attachment #474740 -
Attachment is obsolete: true
Attachment #474742 -
Attachment is obsolete: true
Assignee | ||
Comment 44•14 years ago
|
||
Only failing on unit\test_history_removeAllPages.js right now. Out of time to debug, but I'll come back to this friday.
Assignee | ||
Comment 45•14 years ago
|
||
Changes I used to get stuff passing.
Reporter | ||
Comment 46•14 years ago
|
||
(In reply to comment #45)
> Changes I used to get stuff passing.
These make sense, I thought I had fixed that FROM FROM typo in the last patch... looks like I did not. r=me to merge these changes to the patch.
Reporter | ||
Comment 47•14 years ago
|
||
test_bug94514.html got a double merge in your patch, my fix to that test has been already pushed by jlebar in another bug. So you should remove that change.
Assignee | ||
Comment 48•14 years ago
|
||
Minor other fixes that let tests pass locally (or at least tests in toolkit/components/places)
Attachment #474829 -
Attachment is obsolete: true
Attachment #474843 -
Attachment is obsolete: true
Assignee | ||
Comment 49•14 years ago
|
||
(In reply to comment #29)
> http://hg.mozilla.org/mozilla-central/rev/b1e658a33221
> and disabled test_bug500328.html with
> http://hg.mozilla.org/mozilla-central/rev/83ae1826aa80
> From what I see we try to add a page to moz_places twice (I guess from real
> visit and some other api, either registerOpenPage and/or pushstate), and the
> second addition fails, so visit is not added. Will file a bug for this.
Note that I'm still seeing this on windows on the places branch. Commented in bug 585703 about that.
Assignee | ||
Comment 50•14 years ago
|
||
Trivial fix for test_clearHistory_shutdown.js that I'm going to roll into v1.9.
Reporter | ||
Comment 51•14 years ago
|
||
yeah, we won't receive sync anymore, the test fix can be merged.
The failure as I commented in bug 585703 must be something other, unless you still see a "constraint 19 failed" message
Assignee | ||
Comment 52•14 years ago
|
||
And for the sake of interesting numbers, this is bug 573492 (WAL) and bug 552023 (kill temp tables) perf numbers: http://tinyurl.com/2dodkwm
Assignee | ||
Comment 53•14 years ago
|
||
So, I'm renominating this for blocking because I think that since we now understand this problem a bit better and I'm starting to lean towards not blocking the release on this. So let's start with what we know:
1) This blocks a bunch of bugs, only one of which is a blocker that absolutely can not happen without this (bug 582703). I'm not even sure that needs to be a blocker.
2) This greatly reduces the complexity in all of our SQL queries for places, which makes our lives way easier.
3) This change nets us several performance wins:
* Slight tp4 win on linux (1-2%)
* Memory usage reduction on during tp4 on all platforms (0-6%)
* tp4_shutdown on all platforms (4-30%)
* ts on windows (2%) and 32-bit OS X (3%)
* Slight ts_cold on 32-bit OS X (1%) and linux (0-2%)
[note: these hold true for the generated profile tests too]
* ts_cold_shutdown (0-16%)
[note: these hold true for the generated profile tests too] * txul on OS X (0-3%)
4) This change gives us some performance regressions
* 39% tp4 regression on windows
* ts regression on linux (18-25%)
* ts_cold regression on 64-bit OS X (18%)
[note: this turns into a win in ts_cold_generated_max_shutdown]
* txul on windows (0-5%) and linux (0-2%)
5) Punting on this means we'll have to continue trying to maintain it and prevent it from bitrotting until we can get this into a workable state.
And what we don't know:
a) How much work the Sync team would also need to do to update themselves due to this change (I literally just realized this is a problem now, which is
unfortunate. I thought we'd be running Sync tests on our tinderbox...)
b) The cause of the tp4 regression on windows
c) The cause of the ts regression on linux, although increased number of fsyncs, even if they are not on the main thread, are highly suspect.
d) Why 32-bit and 64-bit OS X are getting different numbers.
e) What is wrong with test_bug500328.html and fix it.
f) What else might break from this that doesn't have test coverage, including add-ons that make raw SQL queries.
g) How much time it will take to get i) answers to (a)-(f) and ii) how much time solutions based on those answers might take.
What we don't know is what concerns me. I can see it taking a month or two of an engineer's time, and some of the questions would be answered far too late, like (f), to deal with at that time. For these reasons, I'm leaning towards making this not block, but someone else should make that call.
blocking2.0: betaN+ → ?
Assignee | ||
Comment 54•14 years ago
|
||
And since I never put this in the bug, here are the perf numbers for this (on top of bug 573492 but not with it like comment 52):
http://tinyurl.com/234ewzo
Comment 55•14 years ago
|
||
(In reply to comment #53)
> a) How much work the Sync team would also need to do to update themselves due
> to this change (I literally just realized this is a problem now, which is
> unfortunate.
Sync is ready for this ever since bug 583852 which got us from querying views back to tables + temp tables. Since this bug was very close to landing (and it did, for a brief while, land) we put the temp-less setup in for forwards compatibility (if not Sync would've been broken instantly).
> I thought we'd be running Sync tests on our tinderbox...)
We are. They're xpcshell-tests under services/sync/tests/. When they break they show up as orange X on tbpl as I painfully discovered myself recently ;)
Assignee | ||
Comment 56•14 years ago
|
||
(In reply to comment #55)
> Sync is ready for this ever since bug 583852 which got us from querying views
> back to tables + temp tables. Since this bug was very close to landing (and it
> did, for a brief while, land) we put the temp-less setup in for forwards
> compatibility (if not Sync would've been broken instantly).
Yeah, maybe that's why Places branch didn't go orange when I landed this then.
> We are. They're xpcshell-tests under services/sync/tests/. When they break they
> show up as orange X on tbpl as I painfully discovered myself recently ;)
To be clear, I meant the Places tinderbox, which may not be an exact copy of mozilla-central
Glad to know that (a) isn't an issue.
Assignee | ||
Comment 57•14 years ago
|
||
(In reply to comment #53)
> 4) This change gives us some performance regressions
> * ts regression on linux (18-25%)
mak and I suspect this is related to an increased number of fsyncs, which would likely be mitigated by bug 599969 but there is no way we'd have time for that in 2.0.
Comment 58•14 years ago
|
||
The level of performance regression as currently stands is not something we can live with - and it's especially surprising because this change was supposed to improve performance!
Given that we don't know what caused the various problems, and the fact that these kinds of changes tend to result in many months of bugs, I'm sufficiently scared, and want to punt on it for Firefox 4.
However, this blocks 3 other blockers:
bug 582703
bug 519514
bug 563538
All of these are bad, and should be fixed. This makes me hesitant to unblock on this.
Maybe it'd be worth having Shawn or Marco spend a week in solitary to look at nothing but this?
blocking2.0: ? → betaN+
Comment 59•14 years ago
|
||
eep, reset the mobile flag somehow, and don't have privs to set it back - requesting re-blocking there.
tracking-fennec: --- → ?
Assignee | ||
Comment 60•14 years ago
|
||
(In reply to comment #58)
> However, this blocks 3 other blockers:
Sorry, I should have elaborated on these more...
> bug 582703
This bug blocks bug 563538 which is a blocker (and the reason why it became a blocker). However, I just removed that dependency and renominated it (bug 563538 comment 5 for rationale).
> bug 519514
This would be accomplished by making an asynchronous bookmarks API. It'd still be potentially slow even with this fixed because we'd still be doing disk I/O on the main thread. Sadly, I missed this in my original assessment because I wasn't looking for the fennec blocking flag. Regardless, fixing this bug won't change the approach that needs to be done to fix this bug. It just means slightly more complicated queries, but we are already used to that. In fact, I'm going to remove this bug from blocking it. (see also bug 519514 comment 17)
> bug 563538
Everything in the dependency list for that bug is "oh this should help make this less of a problem". No single thing there is going to make it go away, and missing one or two isn't likely to hurt. The only way we can make that go away completely is to make *all* of places async, which is totally out of scope for Firefox 4.
I'm renoming this since I actually have rationale about the blockers here (down two, maybe all three) and dietrich's and my timezone offsets make it so that we can't discuss this in person. I think the only point of contention here is ug 563538.
No longer blocks: placesAsyncBookmarks
blocking2.0: betaN+ → ?
Comment 61•14 years ago
|
||
Honestly, I'm having trouble understanding the last few comments. It seems like we have a known performance regression - if that's true, then it should block. Plain and simple.
Ultimately I agree with Dietrich, though, in that we should fix the dependent bugs first and then hopefully not need this fix for Firefox 4. It feels large, intensive and dangerous at this time.
Comment 62•14 years ago
|
||
It's not that cut and dry. Temp tables was a change in 3.6 that improved performance. But now we want to improve performance in further and broader ways, and that 3.6 change is holding us back.
Removing the 3.6 change (this bug) results in a fabulously contradictory mix of perf wins and losses, sometimes straddling platforms in double-digit deltas (eg: 2% tp4 win on linux, 39% tp4 regression on windows).
Comment 63•14 years ago
|
||
So is it the case that we have a known perf regression, or a reported-in-some instances regression? How bad is the regression? We might have to sequester Mak and sdwilsh, as you suggest.
Assignee | ||
Comment 64•14 years ago
|
||
(In reply to comment #61)
> Honestly, I'm having trouble understanding the last few comments. It seems like
> we have a known performance regression - if that's true, then it should block.
> Plain and simple.
It's a known performance regression in that we know we have one. What we don't know, is what is causing it. That will be at least a days worth of work, if not more to figure out. The performance regression is also only related to the patch in this bug - it doesn't exist in the tree now.
> Ultimately I agree with Dietrich, though, in that we should fix the dependent
> bugs first and then hopefully not need this fix for Firefox 4. It feels large,
> intensive and dangerous at this time.
I'm not sure what you are referencing here to be honest.
(In reply to comment #63)
> So is it the case that we have a known perf regression, or a reported-in-some
> instances regression? How bad is the regression? We might have to sequester Mak
> and sdwilsh, as you suggest.
To restate: this patch will introduce a performance regression. A very large one. See http://tinyurl.com/234ewzo for all the details, or comment 53 for the summary (points (3) and (4)).
Reporter | ||
Comment 65•14 years ago
|
||
Windows Tp4 regression is what hurts, and it is so because of the way Tp4 works (a bunch of websites loaded in a short time) that is quite far from the common user's perspective. Personally I'd expect various UI Tsnappiness real benefits instead, but we trust Talos, so this can't land with that regression in.
I'd suggest to check where we will be when async frecency will land, if this will still be so bad, I fear there won't be time for this to make fx4, we could try 4.1.
Assignee | ||
Comment 66•14 years ago
|
||
(In reply to comment #65)
> Windows Tp4 regression is what hurts, and it is so because of the way Tp4 works
> (a bunch of websites loaded in a short time) that is quite far from the common
> user's perspective. Personally I'd expect various UI Tsnappiness real benefits
> instead, but we trust Talos, so this can't land with that regression in.
> I'd suggest to check where we will be when async frecency will land, if this
> will still be so bad, I fear there won't be time for this to make fx4, we could
> try 4.1.
Well, we are doing something that makes us take significantly longer to accomplish our work on windows that isn't the case on other platforms. Even if talos doesn't measure what the user might actually see, it's measuring something here that we very likely care about.
Assignee | ||
Comment 67•14 years ago
|
||
Even better: the tp4 regression doesn't reproduce locally. There are only two differences in my setup: I'm running win7 64-bit and the talos boxes run 32-bit, and the pageset is not what we test in production. Gonna get the production page set now, but I'm not yet optimistic.
Assignee | ||
Comment 69•14 years ago
|
||
So far I have been unable to reproduce the persistent orange on tinderbox locally or in a record & replay session.
Assignee | ||
Comment 70•14 years ago
|
||
(In reply to comment #53)
> * ts regression on linux (18-25%)
Seeing 0 change in the number of fsyncs, and the number of reads and writes seems to be about the same between before and after. :(
Assignee | ||
Comment 71•14 years ago
|
||
(In reply to comment #53)
> * txul on windows (0-5%) and linux (0-2%)
Likewise here on linux. Sadfaces. It is possible that this isn't a real regression though.
Reporter | ||
Comment 72•14 years ago
|
||
(In reply to comment #70)
> (In reply to comment #53)
> > * ts regression on linux (18-25%)
> Seeing 0 change in the number of fsyncs, and the number of reads and writes
> seems to be about the same between before and after. :(
maybe a query that we run on startup that in the rewrite has become slower? Sounds strange but it's possible that sqlite optimizer is taking the wrong way like in the other case we saw.
Assignee | ||
Comment 73•14 years ago
|
||
(In reply to comment #72)
> maybe a query that we run on startup that in the rewrite has become slower?
> Sounds strange but it's possible that sqlite optimizer is taking the wrong way
> like in the other case we saw.
I would expect that to be cross platform regressions then since the optimizer is not platform specific.
Assignee | ||
Comment 74•14 years ago
|
||
Whiteboard: [Caused regressions that need investigation] → [fixed-in-places]
Assignee | ||
Comment 75•14 years ago
|
||
(In reply to comment #69)
> So far I have been unable to reproduce the persistent orange on tinderbox
> locally or in a record & replay session.
And it's not persistent orange anymore! I haven't even seen it go orange yet. Maybe we had a bad places tree before...
Comment 77•14 years ago
|
||
How soon will the glory that is this patch land on m-c?
Assignee | ||
Comment 78•14 years ago
|
||
(In reply to comment #77)
> How soon will the glory that is this patch land on m-c?
When the performance regressions are all taken care of. I realize that's not a great estimate, but it's my number one focus right now.
Comment 79•14 years ago
|
||
Is there anything I can do to help?
Assignee | ||
Comment 80•14 years ago
|
||
(In reply to comment #79)
> Is there anything I can do to help?
I could use some help with the Ts regression on linux if you have linux hardware. Also, if you can reproduce the Tp4 regression on windows, getting profiles may be helpful. I can't reproduce it locally myself, which is making it hard to analyses.
Also, unclear why we have regressions on only 64-bit OS X (I have no insight on that at all). If you want to jump on any of those, that would be helpful.
Reporter | ||
Comment 81•14 years ago
|
||
An interesting question: with temp tables we know PlacesDBFlush was initing Places pretty early. Do we know who is the king of the hill today? We could be initing Places at a point where we can avoid it. Did nsBrowserGlue take back the lead?
Reporter | ||
Comment 82•14 years ago
|
||
hm, or probably it's not nsBrowserGlue that listens for init-complete... I'm maybe thinking to tryBookmarkCharset and bug 582712 could help Ts quite a bit if that's the case.
Assignee | ||
Comment 83•14 years ago
|
||
(In reply to comment #81)
> An interesting question: with temp tables we know PlacesDBFlush was initing
> Places pretty early. Do we know who is the king of the hill today? We could be
> initing Places at a point where we can avoid it. Did nsBrowserGlue take back
> the lead?
Well, we add the startup test page to history anyway, so we are still going to init ourselves in the startup path, and there really is no way around that AFAICT.
Reporter | ||
Comment 84•14 years ago
|
||
yes but there is an interesting difference, the startup page is going to fire async stuff that can walk in parallel, while other sync stuff would hurt much more.
Assignee | ||
Comment 85•14 years ago
|
||
My current hunch on the tp numbers is that we are constantly idle, and therefore are doing expiration all the time. Since we no longer have temp tables, that results in more writes. I just pushed a patch to the try server that effectively disables the idle service (http://hg.mozilla.org/try/rev/a42e3a7b0d9c) to see if the Tp4 numbers go down. Right now they are about 20-25% on windows with various other work we've landed on the places branch to fix these regressions:
http://tinyurl.com/23obk3a
Reporter | ||
Comment 86•14 years ago
|
||
if that would be true idle service would be completely messed up and we had noticed a regression when landing expiration. Actually expiration on idle should be running just once and then stopping any activity till 'back' is received. If idle would be notified multiple times I'd be pretty much sad. Also from my experience OSX is always reporting idle, but it doesn't show regression.
Assignee | ||
Comment 87•14 years ago
|
||
(In reply to comment #86)
> if that would be true idle service would be completely messed up and we had
> noticed a regression when landing expiration. Actually expiration on idle
> should be running just once and then stopping any activity till 'back' is
> received. If idle would be notified multiple times I'd be pretty much sad. Also
> from my experience OSX is always reporting idle, but it doesn't show
> regression.
I'll find out soon - builds just finished (http://hg.mozilla.org/try/rev/58adeeb9e0ca).
Assignee | ||
Comment 88•14 years ago
|
||
And...no significant change. Back to do the drawing board on the tp4 regression :(
http://tinyurl.com/2fzktd8
Updated•14 years ago
|
blocking2.0: ? → betaN+
Comment 89•14 years ago
|
||
Assignee | ||
Comment 90•14 years ago
|
||
(In reply to comment #89)
> http://hg.mozilla.org/mozilla-central/rev/676388cb7893
Thanks, but I was going to close all these bugs tomorrow with the m-c changesets ;)
Reporter | ||
Updated•14 years ago
|
Status: ASSIGNED → RESOLVED
Closed: 14 years ago → 14 years ago
Resolution: --- → FIXED
Target Milestone: mozilla2.0b4 → mozilla2.0b9
Updated•14 years ago
|
Attachment #476842 -
Attachment description: v1.8 → v1.8
[Checked in: Comment 74+89]
Updated•14 years ago
|
Attachment #477632 -
Attachment description: fix test_clearHistory_shutdown.js → fix test_clearHistory_shutdown.js
[Checked in: Comment 74+89]
Comment 91•14 years ago
|
||
+
http://hg.mozilla.org/mozilla-central/rev/c8d8af0fcc38
"Fix test failure due to merge. _views are no longer around on the places branch."
Updated•14 years ago
|
Attachment #476842 -
Attachment description: v1.8
[Checked in: Comment 74+89] → v1.8
[Checked in: Comment 74 & 89]
Updated•14 years ago
|
Attachment #477632 -
Attachment description: fix test_clearHistory_shutdown.js
[Checked in: Comment 74+89] → fix test_clearHistory_shutdown.js
[Checked in: Comment 74 & 89]
Updated•14 years ago
|
Flags: in-testsuite+
You need to log in
before you can comment on or make changes to this bug.
Description
•