Closed Bug 834545 Opened 12 years ago Closed 10 years ago

Implement History.remove

Categories

(Toolkit :: Places, defect)

defect
Not set
normal
Points:
13

Tracking

()

RESOLVED FIXED
mozilla36
Iteration:
36.1

People

(Reporter: mak, Assigned: Yoric)

References

Details

Attachments

(1 file, 10 obsolete files)

(deleted), patch
Yoric
: review+
Details | Diff | Splinter Review
Should replace:
removePage(in nsIURI aURI);
removePages([array, size_is(aLength)] in nsIURI aURIs, in unsigned long aLength);
removePagesFromHost(in AUTF8String aHost, in boolean aEntireDomain);
removePagesByTimeframe(in PRTime aBeginTime, in PRTime aEndTime);
removeVisitsByTimeframe(in PRTime aBeginTime, in PRTime aEndTime);
removeAllPages();

There is already some reusable code in History.cpp for removals.
Blocks: 580374
Assignee: nobody → mak77
Status: NEW → ASSIGNED
Blocks: 854886
Whiteboard: p=0
No longer blocks: fxdesktopbacklog
Flags: firefox-backlog+
Whiteboard: p=0 → p=13
Attached patch wip (obsolete) (deleted) — Splinter Review
I found this old wip, saving it here.
Assignee: mak77 → nobody
Status: ASSIGNED → NEW
Points: --- → 13
Flags: qe-verify?
Whiteboard: p=13
Attached patch WIP (obsolete) (deleted) — Splinter Review
Unbitrotten patch.
Assignee: nobody → dteller
Attachment #8453600 - Attachment is obsolete: true
Attached patch mozIAsyncHistory.removePlaces (obsolete) (deleted) — Splinter Review
Attachment #8482682 - Attachment is obsolete: true
Attached patch mozIAsyncHistory.removePlaces, v1 (obsolete) (deleted) — Splinter Review
Here's a first attempt. Marco, I am heading in the direction you expect?
Attachment #8487221 - Attachment is obsolete: true
Attachment #8487231 - Flags: feedback?(mak77)
Comment on attachment 8487231 [details] [diff] [review]
mozIAsyncHistory.removePlaces, v1

Ah, there was a problem in my merge. I'll repost once I have finished rebasing.
Attachment #8487231 - Flags: feedback?(mak77)
Attached patch mozIAsyncHistory.removePlaces, v2 (obsolete) (deleted) — Splinter Review
This time, the rebase seems to have succeeded. Sorry for the snafu, I'm still new at hg bookmarks.
Attachment #8487231 - Attachment is obsolete: true
Attachment #8487920 - Flags: feedback?(mak77)
(In reply to David Rajchenbach-Teller [:Yoric] (use "needinfo") from comment #7)
> I'm still new at hg bookmarks.

I'm even newer than you, since I like my mercurial queues workflow...
Comment on attachment 8487920 [details] [diff] [review]
mozIAsyncHistory.removePlaces, v2

Review of attachment 8487920 [details] [diff] [review]:
-----------------------------------------------------------------

I just did a quick read through it, not a line by line review, since I didn't have enough time for that

::: toolkit/components/places/History.cpp
@@ +220,5 @@
> +  /**
> +   * The host of the place, for example "mozilla.com".  Only
> +   * subdomain wildcard is accepted, in the form "*.mozilla.com".
> +   *
> +   * If `void`, do not restrict to a subdomain.

why just void? is Empty a valid filter?

@@ +222,5 @@
> +   * subdomain wildcard is accepted, in the form "*.mozilla.com".
> +   *
> +   * If `void`, do not restrict to a subdomain.
> +   */
> +  nsString mHost;

nsCString should be fine here

@@ +1772,5 @@
> +    // more performant way, by initiating a refresh after a limited number of
> +    // single changes.
> +    (void)mNavHistory->BeginUpdateBatch();
> +    mPlaces.EnumerateEntries(NotifyVisitRemoval, this);
> +    (void)mNavHistory->EndUpdateBatch();

we should probably do the batching only if the number of notifications is over a given threshold, like 3

@@ +1805,5 @@
> +    MOZ_ASSERT(visits.Length() > 0);
> +
> +    nsCOMPtr<nsIURI> uri;
> +    (void)NS_NewURI(getter_AddRefs(uri), visits[0].spec);
> +    // XXX visitCount should really just be unsigned (bug 1049812)

this bug has been fixed so the comment can probably be removed

@@ +1930,5 @@
> +		 
> +    return rv;
> +  }
> +
> +  NS_IMETHOD DoIt() {

well, might get a slightly better name :)

@@ +2011,5 @@
> +
> +    nsresult rv;
> +    WhereConditions conditions;
> +
> +    // Extract `mFilter` into `whereConditions` and `whereParameters`. 

Hm, unfortunately I really dislike systems that try to abstract queries into pieces, cause they make very hard to check and optimize the queries... I also think it's inflating the code that would be smaller without it. 

The other problem is that IN operator is quite slow, the best thing to do would be to either implement BindArray in Storage, or, easier to implement, create a temp table for removals, add the ids into it here, then later join with it. since everything is serialized there should be no risk of using data from another call, and regardless both calls would have to remove that stuff.

I'm not sure I'll block on this, cause if it works we can change it later, it's just that I'd like to see explicit and readable queries, we have another query builder in the tree and it's hard to optimize.

@@ +2106,5 @@
> +    while (NS_SUCCEEDED((rv = stmt->ExecuteStep(&hasResult))) && hasResult) {
> +      int64_t id;
> +      rv = stmt->GetInt64(0, &id);
> +      if (NS_FAILED(rv)) {
> +        return rv;

maybe we should continue... or just use AsInt64 here... I don't think this is going to fail

@@ +2121,5 @@
> +   *
> +   * @param aIDs If provided, restrict to the visits with the corresponding `place_id`.
> +   */
> +  nsresult
> +  RemoveVisitsFromDatabase(nsTArray<int64_t>* aIDs)

please bring over the aPlaceIds name, ID is too generic, everything has ids

Please note the subtle difference between RemovePagesByTimeframe and RemoveVisitsByTimeframe

The former removes any page that has even just one visit into the timeframe, the latter removes all visits in the timeframe (but maybe not the page).

Looks like you are implementing the latter, the API is currently unable to replace RemovePagesByTimeframe (that should really be named RemovePagesWithVisitsInTimeframe). We should figure what to do for it. The use-case was removing an history container (Like Yesterday, or August in the history sidebar). I don't know if we should bring that API on though.

The problem is figuring a way to remove the container when we get the onDeleteVisits notifications, they only report the larger visit time removed, not the interval, for performance reasons. But if we move OMT we could notify that properly.

@@ +2188,5 @@
> +   * @param aRemoved If provided, extract all relevant information on the places that
> +   * are going to beremoved, for notification purposes.
> +   */
> +  nsresult
> +  RemovePlacesFromDatabase(nsTArray<int64_t>* aIDs,

So, the point to bring on place objects instead of ids (in the original implementation) was to not have to requery again later when notifying...
Using a temp table this query would be almost free, but instead here we are effectively hitting the db once to collect ids and once to collect other data. I feel like it can be done in one sweep.
The original query could already tell if an entry is bookmarked or if it has visits.

The only case where it might be problematic to not requery is when removing only some visits on pages that are not bookmarked, then we need to requery to see if any visits is left. Though even in that case we might store the first and last visit date and guess based on that.

My ideal final implementation runs just 2 queries, one to find places, one to remove them.

@@ +2214,5 @@
> +    // ?
> +    conditions.AppendCondition("foreign_count = 0");
> +
> +    // Entry does not have a bookmark
> +    conditions.AppendCondition("NOT EXISTS (SELECT 1 FROM moz_bookmarks WHERE moz_bookmarks.fk = id LIMIT 1)");

please use foreign_count = 0 here
Attachment #8487920 - Flags: feedback?(mak77)
(In reply to Marco Bonardo [::mak] (needinfo? me) from comment #9)
> ::: toolkit/components/places/History.cpp
> @@ +220,5 @@
> > +  /**
> > +   * The host of the place, for example "mozilla.com".  Only
> > +   * subdomain wildcard is accepted, in the form "*.mozilla.com".
> > +   *
> > +   * If `void`, do not restrict to a subdomain.
> 
> why just void? is Empty a valid filter?

For the moment, "" restricts to subdomain "", which is not very useful. If you wish, I can make it behave as `void`.

> @@ +1772,5 @@
> > +    // more performant way, by initiating a refresh after a limited number of
> > +    // single changes.
> > +    (void)mNavHistory->BeginUpdateBatch();
> > +    mPlaces.EnumerateEntries(NotifyVisitRemoval, this);
> > +    (void)mNavHistory->EndUpdateBatch();
> 
> we should probably do the batching only if the number of notifications is
> over a given threshold, like 3

For the moment, I haven't changed the behavior in place. Do you wish me to?

> @@ +1930,5 @@
> > +		 
> > +    return rv;
> > +  }
> > +
> > +  NS_IMETHOD DoIt() {
> 
> well, might get a slightly better name :)

Possibly :)

> 
> @@ +2011,5 @@
> > +
> > +    nsresult rv;
> > +    WhereConditions conditions;
> > +
> > +    // Extract `mFilter` into `whereConditions` and `whereParameters`. 
> 
> Hm, unfortunately I really dislike systems that try to abstract queries into
> pieces, cause they make very hard to check and optimize the queries... I
> also think it's inflating the code that would be smaller without it. 

I have tried to find a better way and, so far, that's the best one I found. Unless perhaps I put the query directly inside the RemoveVisitsFilter, which I would find extremely ugly, I need to build the query from that filter, i.e. from pieces. Some of my previous attempts included building it from big sequences of string concatenations/nsPrintfCString, and this didn't end up well. The code was longer, harder to read, and failed in a few places, either because I had forgotten to encode/escape some strings or because nsPrintfCString doesn't seem to handle 64 bit integers too well.

> The other problem is that IN operator is quite slow,

Ok.

> the best thing to do
> would be to either implement BindArray in Storage,

I don't understand what you suggest.

> or, easier to implement,
> create a temp table for removals, add the ids into it here, then later join
> with it. since everything is serialized there should be no risk of using
> data from another call, and regardless both calls would have to remove that
> stuff.

This "don't forget to remove your temporary table that pollutes the global namespace" approach does sound a bit scary, I admit.

> I'm not sure I'll block on this, cause if it works we can change it later,
> it's just that I'd like to see explicit and readable queries, we have
> another query builder in the tree and it's hard to optimize.

Well, for the moment, I don't see any alternative that works.

> @@ +2106,5 @@
> > +    while (NS_SUCCEEDED((rv = stmt->ExecuteStep(&hasResult))) && hasResult) {
> > +      int64_t id;
> > +      rv = stmt->GetInt64(0, &id);
> > +      if (NS_FAILED(rv)) {
> > +        return rv;
> 
> maybe we should continue... or just use AsInt64 here... I don't think this
> is going to fail

I picked that code from a previous version. Your call.

> @@ +2121,5 @@
> > +   *
> > +   * @param aIDs If provided, restrict to the visits with the corresponding `place_id`.
> > +   */
> > +  nsresult
> > +  RemoveVisitsFromDatabase(nsTArray<int64_t>* aIDs)
> 
> please bring over the aPlaceIds name, ID is too generic, everything has ids
> 
> Please note the subtle difference between RemovePagesByTimeframe and
> RemoveVisitsByTimeframe
> 
> The former removes any page that has even just one visit into the timeframe,
> the latter removes all visits in the timeframe (but maybe not the page).
> 
> Looks like you are implementing the latter, the API is currently unable to
> replace RemovePagesByTimeframe (that should really be named
> RemovePagesWithVisitsInTimeframe). We should figure what to do for it. The
> use-case was removing an history container (Like Yesterday, or August in the
> history sidebar). I don't know if we should bring that API on though.

How should we decide between both behaviors?

> The problem is figuring a way to remove the container when we get the
> onDeleteVisits notifications, they only report the larger visit time
> removed, not the interval, for performance reasons. But if we move OMT we
> could notify that properly.

I'm not sure what you want me to do here.

> @@ +2188,5 @@
> > +   * @param aRemoved If provided, extract all relevant information on the places that
> > +   * are going to beremoved, for notification purposes.
> > +   */
> > +  nsresult
> > +  RemovePlacesFromDatabase(nsTArray<int64_t>* aIDs,
> 
> So, the point to bring on place objects instead of ids (in the original
> implementation) was to not have to requery again later when notifying...
> Using a temp table this query would be almost free, but instead here we are
> effectively hitting the db once to collect ids and once to collect other
> data. I feel like it can be done in one sweep.
> The original query could already tell if an entry is bookmarked or if it has
> visits.

I admit I didn't manage to adapt the original query to the more powerful filters that we now have. Unless I'm mistaken, the logics of that query was replicated several times in the code (including notifications), and I'm a bit uncomfortable with that.

> The only case where it might be problematic to not requery is when removing
> only some visits on pages that are not bookmarked, then we need to requery
> to see if any visits is left. Though even in that case we might store the
> first and last visit date and guess based on that.

Mmmh.... Ok, I guess.

> 
> My ideal final implementation runs just 2 queries, one to find places, one
> to remove them.
> 
> @@ +2214,5 @@
> > +    // ?
> > +    conditions.AppendCondition("foreign_count = 0");
> > +
> > +    // Entry does not have a bookmark
> > +    conditions.AppendCondition("NOT EXISTS (SELECT 1 FROM moz_bookmarks WHERE moz_bookmarks.fk = id LIMIT 1)");
> 
> please use foreign_count = 0 here

I don't understand. Do you wish me to change the order of AND operands?
(In reply to David Rajchenbach-Teller [:Yoric] (use "needinfo") from comment #10)
> For the moment, "" restricts to subdomain "", which is not very useful. If
> you wish, I can make it behave as `void`.

I think isEmpty() is the right check for this (that will catch both void and empty string.

> > we should probably do the batching only if the number of notifications is
> > over a given threshold, like 3
> 
> For the moment, I haven't changed the behavior in place. Do you wish me to?

yes please, now that we use this for more APIs it makes sense.

> nsPrintfCString doesn't seem to handle 64 bit integers too well.

heh, strange it should support 64 integers without many issues using %lld.

I'm ok with keeping this code, even if I guess I'd change it soon cause I prefer plain built queries. But it's matter of personal taste that should probably be not part of a review.

> > The other problem is that IN operator is quite slow,
> 
> Ok.
> 
> > the best thing to do
> > would be to either implement BindArray in Storage,
> 
> I don't understand what you suggest.

stmt->BindArrayByName
that would be bug 483318, that was hard at the time but recently Sqlite team added support for binding arrays. though that may take time to implement properly in Storage with correct XPCOM.

> > or, easier to implement,
> > create a temp table for removals, add the ids into it here, then later join
> > with it. since everything is serialized there should be no risk of using
> > data from another call, and regardless both calls would have to remove that
> > stuff.
> 
> This "don't forget to remove your temporary table that pollutes the global
> namespace" approach does sound a bit scary, I admit.

Well the temporary table would be created just after we clone the connection and would stay there. The risk of a call intermixing data is low, if one wants to be safe could introduce a static operation counter and annotate each entry with it, so each call would have its operation counter.

> > @@ +2106,5 @@
> > > +    while (NS_SUCCEEDED((rv = stmt->ExecuteStep(&hasResult))) && hasResult) {
> > > +      int64_t id;
> > > +      rv = stmt->GetInt64(0, &id);
> > > +      if (NS_FAILED(rv)) {
> > > +        return rv;
> > 
> > maybe we should continue... or just use AsInt64 here... I don't think this
> > is going to fail
> 
> I picked that code from a previous version. Your call.

previous code is not always correct, not just cause it was already there :)

> > Please note the subtle difference between RemovePagesByTimeframe and
> > RemoveVisitsByTimeframe
> > 
> > The former removes any page that has even just one visit into the timeframe,
> > the latter removes all visits in the timeframe (but maybe not the page).
> > 
> > Looks like you are implementing the latter, the API is currently unable to
> > replace RemovePagesByTimeframe (that should really be named
> > RemovePagesWithVisitsInTimeframe). We should figure what to do for it. The
> > use-case was removing an history container (Like Yesterday, or August in the
> > history sidebar). I don't know if we should bring that API on though.
> 
> How should we decide between both behaviors?
> 
> > The problem is figuring a way to remove the container when we get the
> > onDeleteVisits notifications, they only report the larger visit time
> > removed, not the interval, for performance reasons. But if we move OMT we
> > could notify that properly.
> 
> I'm not sure what you want me to do here.

The right path forward would be to change onDeleteVisits to notify the beginTime and endTime of the operation that generated it. then here implement only the removeVisitsByTimeframe behavior, and in nsNavHistoryResult handle onDeleteVisits so that it will recognize it and rebuild a single date container based on the timeframe.
The wrong (but likely faster to implement) path forward is to add something to the removePlaces API to be able to choose between RemoveVisitsByTimeframe or RemovePagesWithVisitsInTimeframe (the current RemovePagesByTimeframe)...

Personally I'd go for the former, but I know the history result code better, and that approach could also fail...
Note we could also have an onDeletePages notification in future (bug 937560) that could help solving this problem in a more efficient and elegant way (it could say "I removed these pages that are in this timeframe")

What do you think is better?

> > @@ +2214,5 @@
> > > +    // ?
> > > +    conditions.AppendCondition("foreign_count = 0");
> > > +
> > > +    // Entry does not have a bookmark
> > > +    conditions.AppendCondition("NOT EXISTS (SELECT 1 FROM moz_bookmarks WHERE moz_bookmarks.fk = id LIMIT 1)");
> > 
> > please use foreign_count = 0 here
> 
> I don't understand. Do you wish me to change the order of AND operands?

NO, to check for bookmarks existance, use foreign_count.
foreign_count = 0 means no bookmarks, foreign_count > 0 means there are bookmarks.
Blocks: 871908
Status: NEW → ASSIGNED
(In reply to Marco Bonardo [::mak] (needinfo? me) from comment #11)
> The other problem is that IN operator is quite slow,

Double-checking the code, I realize that operator IN is probably faster than extending what we have at the moment (which is re-doing the complex WHERE). Do you think that it's worth optimizing this in the current bug or would a followup do?


> > > please use foreign_count = 0 here
> > 
> > I don't understand. Do you wish me to change the order of AND operands?
> 
> NO, to check for bookmarks existance, use foreign_count.
> foreign_count = 0 means no bookmarks, foreign_count > 0 means there are
> bookmarks.

So does this mean that I can replace the existing "EXISTS(SELECT 1 FROM moz_bookmarks WHERE fk = moz_places.id) as bookmarked" query with just "FOREIGN_COUNT > 0"?
Flags: needinfo?(mak77)
(In reply to David Rajchenbach-Teller [:Yoric] (use "needinfo") from comment #12)
> (In reply to Marco Bonardo [::mak] (needinfo? me) from comment #11)
> > The other problem is that IN operator is quite slow,
> 
> Double-checking the code, I realize that operator IN is probably faster than
> extending what we have at the moment (which is re-doing the complex WHERE).
> Do you think that it's worth optimizing this in the current bug or would a
> followup do?

I think a large IN is actually slower than any query, even if I think latest Sqlite improved it a little bit. Sure we can attack that in a follow-up with proper measurement.

> > > > please use foreign_count = 0 here
> > > 
> > > I don't understand. Do you wish me to change the order of AND operands?
> > 
> > NO, to check for bookmarks existance, use foreign_count.
> > foreign_count = 0 means no bookmarks, foreign_count > 0 means there are
> > bookmarks.
> 
> So does this mean that I can replace the existing "EXISTS(SELECT 1 FROM
> moz_bookmarks WHERE fk = moz_places.id) as bookmarked" query with just
> "FOREIGN_COUNT > 0"?

yes (lowercase though :D)
Flags: needinfo?(mak77)
(In reply to Marco Bonardo [::mak] (needinfo? me) from comment #11)
> The right path forward would be to change onDeleteVisits to notify the
> beginTime and endTime of the operation that generated it. then here
> implement only the removeVisitsByTimeframe behavior, and in
> nsNavHistoryResult handle onDeleteVisits so that it will recognize it and
> rebuild a single date container based on the timeframe.

I don't follow. Are you talking about `nsNavHistoryQueryResultNode::OnDeleteVisits`? `nsNavHistoryResult::OnDeleteVisits`?

> The wrong (but likely faster to implement) path forward is to add something
> to the removePlaces API to be able to choose between RemoveVisitsByTimeframe
> or RemovePagesWithVisitsInTimeframe (the current RemovePagesByTimeframe)...
> 
> Personally I'd go for the former, but I know the history result code better,
> and that approach could also fail...
> Note we could also have an onDeletePages notification in future (bug 937560)
> that could help solving this problem in a more efficient and elegant way (it
> could say "I removed these pages that are in this timeframe")
(In reply to David Rajchenbach-Teller [:Yoric] (use "needinfo") from comment #14)
> (In reply to Marco Bonardo [::mak] (needinfo? me) from comment #11)
> > The right path forward would be to change onDeleteVisits to notify the
> > beginTime and endTime of the operation that generated it. then here
> > implement only the removeVisitsByTimeframe behavior, and in
> > nsNavHistoryResult handle onDeleteVisits so that it will recognize it and
> > rebuild a single date container based on the timeframe.
> 
> I don't follow. Are you talking about
> `nsNavHistoryQueryResultNode::OnDeleteVisits`?
> `nsNavHistoryResult::OnDeleteVisits`?

yes, the result (that is what populates the views like menu, toolbar, tree) listens to history notifications and updated its contents.
one of the possible views in the history sidebar is "by date", there you can right click a date container and ask for it to be removed. Today we use RemovePagesByTimeframe there that removed not just the visits among those dates, but all of the pages with visits among those dates. We do that because from such a view it's easier to search and remove a given uri. but if we'd have an onDeleteVisits notification with a timeframe perfectly mathing the container, we could know that we can just remove the container.

Mind, this may not be easy.
Attached patch mozIAsyncHistory.removePlaces, v3 (obsolete) (deleted) — Splinter Review
As discussed over IRC, let's keep the changes to time-based removal for a followup bug.
Attachment #8487920 - Attachment is obsolete: true
Attachment #8491553 - Flags: review?(mak77)
Marco, please add this to the current iteration.
Flags: needinfo?(mmucci)
ehr, sorry wrong tab.
Flags: needinfo?(mmucci)
Comment on attachment 8491553 [details] [diff] [review]
mozIAsyncHistory.removePlaces, v3

Review of attachment 8491553 [details] [diff] [review]:
-----------------------------------------------------------------

::: .hgtags
@@ +102,5 @@
>  83c9853e136451474dfa6d1aaa60a7fca7d2d83a FIREFOX_AURORA_30_BASE
>  cfde3603b0206e119abea76fdd6e134b634348f1 FIREFOX_AURORA_31_BASE
>  16f3cac5e8fe471e12f76d6a94a477b14e78df7c FIREFOX_AURORA_32_BASE
>  dc23164ba2a289a8b22902e30990c77d9677c214 FIREFOX_AURORA_33_BASE
> +1cf181c7d2ac2a8ba4051f996dc38537cb988200 try

This change looks unrelated.

::: toolkit/components/places/History.cpp
@@ +186,5 @@
> +    return !mClearEverything
> +      &&   mTransitionType != UINT32_MAX
> +      &&   mURIs.Length() == 0
> +      &&   mGUIDs.Length() == 0
> +      &&   mHost.IsEmpty();

this styling is exotic, even referring to the coding style... we usually append the &&.
if you want to retain this for blame reasons, at least please avoid triple spacing after &&

@@ +190,5 @@
> +      &&   mHost.IsEmpty();
> +  }
> +
> +
> +public:

please remove one newline above this
also, are not members of a struct always public by default?

@@ +220,5 @@
> +   * subdomain wildcard is accepted, in the form "*.mozilla.com".
> +   *
> +   * If empty, do not restrict to a subdomain.
> +   */
> +  nsString mHost;

why not an nsCString?

@@ +227,5 @@
> +// Utility enumerator for nsThashtable. Used to avoid type errors.
> +template<typename T, typename Acc>
> +size_t Enumerate(nsTHashtable<T>& aTable,
> +               PLDHashOperator (*aEnumerator)(T* aBinding, Acc* aAccumulator),
> +               Acc* aAccumulator)

fix indentation

@@ +238,5 @@
> +/**
> + * An entry in a hashtable used to map a mozStorage binding variable
> + * (as a C string) to its value (as a nsIVariant).
> + */
> +class BindingHashEntry: public nsCStringHashKey

MOZ_FINAL

@@ +241,5 @@
> + */
> +class BindingHashEntry: public nsCStringHashKey
> +{
> +public:
> +  BindingHashEntry(const nsACString& aSpec)

explicit

@@ +245,5 @@
> +  BindingHashEntry(const nsACString& aSpec)
> +    : nsCStringHashKey(&aSpec)
> +    , mValue(new nsVariant())
> +  { }
> +  BindingHashEntry(const nsACString* aSpec)

ditto

@@ +249,5 @@
> +  BindingHashEntry(const nsACString* aSpec)
> +    : nsCStringHashKey(aSpec)
> +    , mValue(new nsVariant())
> +  { }
> +  BindingHashEntry(const BindingHashEntry& aOther)

ditto

@@ +279,5 @@
> +    mConditions.AppendElement(aCondition);
> +  }
> +  void AppendCondition(const char* aCondition)
> +  {
> +    nsCString condition(aCondition);

please use an auto string here

@@ +441,5 @@
> +   * `true` if the entire place is being removed, `false` if we are
> +   * only removing visits to this place.
> +   */
> +  bool IsPlaceBeingRemoved() {
> +    return !IsBookmarked() && mVisits.Length() == VisitCount();

This means for any removal we need to know how many visits a place has, but in most cases we don't care about that info (RemovePage for example already knows all of the visits should be removed).
In such cases we should not join with visits, and we won't know the visit count.

(not I didn't look at the code below this yet, so I'm not sure what the query is doing)

@@ +464,5 @@
> +
> +
> +  PRTime VisitTime() const
> +  {
> +    return mVisits[0].visitTime;

will this be first visit, last visit, random visit?

@@ +965,4 @@
>                            bool aIsSingleVisit,
>                            nsresult aResult)
>    : mCallback(aCallback)
> +  , mPlace(aPlace?new VisitData(*aPlace):nullptr)

add spacing please..

btw, how can the consumer recognize what caused an error if we don't pass out a VisitData?
do we notify an error for each entry or just once?

@@ +1010,5 @@
> +
> +      // The frecency isn't exposed because it may not reflect the updated value
> +      // in the case of InsertVisitedURIs.
> +      return
> +        new PlaceInfo(mPlace->placeId, mPlace->guid, uri.forget(), mPlace->title,

please reindent these correctly

@@ +1016,5 @@
> +    }
> +    else {
> +      // Same as above.
> +      return
> +        new PlaceInfo(mPlace->placeId, mPlace->guid, uri.forget(), mPlace->title,

ditto

@@ +1025,3 @@
>  private:
>    nsMainThreadPtrHandle<mozIVisitInfoCallback> mCallback;
> +  const UniquePtr<const VisitData> mPlace;

Did you ensure this is created and destroyed on the same thread?

@@ +1165,5 @@
>        bool known = lastPlace && lastPlace->IsSamePlaceAs(place);
>        if (!known) {
>          nsresult rv = mHistory->FetchPageInfo(place, &known);
>          if (NS_FAILED(rv)) {
> +          if (!!mCallback) {            

trailing spaces

@@ +1817,5 @@
> +   * @param aPlaces The list of places effectively removed.
> +   * @param aTransitionType If the places have been removed because we cleared all the values for
> +   * a given transition type, the transition type. Otherwise, specify UINT32_MAX.
> +   */
> +  explicit NotifyRemoveVisits(nsTHashtable<PlaceHashKey>& aPlaces,

since you incremented the number of arguments this doesn't need anymore to be explicit

@@ +1846,5 @@
> +      NS_WARNING("Cannot notify without the history service!");
> +      return NS_OK;
> +    }
> +
> +    if (mPlaces.Count() > 3) {

please move the number to a constant, I think we'll have to tweak this a little bit and that will make tweaking easier.

@@ +1914,5 @@
> +   * Strong reference to the History object because we do not want it to
> +   * disappear out from under us.
> +   */
> +  nsRefPtr<History> mHistory;
> +  nsRefPtr<nsNavHistory> mNavHistory;

History supports thread-safe refcounting, but navHistory doesn't... I'd suggest to just call nsNavHistory::GetHistoryService(); where needed, or you should switch to mainthreadptrhandle

@@ +2013,5 @@
> +    }
> +
> +    if (NS_FAILED(rv)) {
> +      nsCOMPtr<nsIRunnable> cb =
> +        new NotifyPlaceInfoCallback(mCallback,

please add comment explaining when this is expected to be fired (once, one per error, whether partial changes are executed or not in case of error)

@@ +2022,5 @@
> +    }
> +
> +    nsCOMPtr<nsIRunnable> cb = new NotifyCompletion(mCallback);
> +    (void)NS_DispatchToMainThread(cb);
> +		 

trailing spaces

@@ +2050,5 @@
>      mozStorageTransaction transaction(mDBConn, false,
>                                        mozIStorageConnection::TRANSACTION_IMMEDIATE);
>  
> +    // Remove the visits
> +    rv = RemoveVisitsFromDatabase(clearHistory?nullptr:&places);

add spacing please.

@@ +2063,5 @@
>  
> +    if (clearHistory) {
> +      rv = NS_DispatchToMainThread(new NotifyClearHistory());
> +    } else {
> +      uint32_t transition = mFilter.IsOnlyTransitionFilter()?mFilter.mTransitionType:UINT32_MAX;

ditto

@@ +2097,5 @@
>  
> +    nsresult rv;
> +    WhereConditions conditions;
> +
> +    // Extract `mFilter` into `whereConditions` and `whereParameters`. 

trailing space

@@ +2189,5 @@
> +    nsCOMPtr<mozIStorageStatement> stmt;
> +    rv = conditions.GetStatement("SELECT moz_historyvisits.id, moz_places.id, url, guid, visit_date, visit_type, foreign_count, "
> +                                 "(SELECT count(*) FROM moz_historyvisits WHERE place_id = moz_places.id) as full_visit_count "
> +                                 "FROM moz_historyvisits "
> +                                 "JOIN moz_places ON place_id = moz_places.id",

ok this was my fear.

The existing option (transition) is a visits filter, if it's defined then it makes sense to query visits, as well as if we want to just remove visits in a given timeframe.

Though most of the filters are page filters, that means for them all of the visits are removed. in those most common cases the above join between moz_historyvisits and moz_places can generate a giant dataset that will slow down all of the operation with no benefit.

We need ti distinguish visits filters from page filters, and use the most appropriate query, that means if we don't have visits filters we should use the faster query (no join with moz_historyvisits). This also means in this case we don't have info about existing visits (apart the last_visit_date field)

@@ +2209,5 @@
> +      visit.visitTime = stmt->AsInt64(4); // visit_date
> +      visit.transitionType = stmt->AsInt32(5); // visit_type
> +
> +      int32_t bookmarked = stmt->AsInt32(6); // foreign_count
> +      int32_t visitCount = stmt->AsInt32(7); // full_visit_count

the visit_count that is in the database IS NOT the full visit count, it excludes certain transition. So it cannot be used to know the real visit count.

@@ +2244,5 @@
> +
> +    if (aPlaces) {
> +      // Restrict ourselves to a subset
> +      nsCString visitIdsToRemove;
> +      Enumerate(*aPlaces, ListToBeRemovedVisitIds, &visitIdsToRemove);

this is an overkill in most cases where we don't need to filter on visits... moreover it should be faster to query directly than to provide thousands-long list of ids cause in the former case the db can just find begin-end from the index, rather then having to walk the index for each id.

::: toolkit/components/places/mozIAsyncHistory.idl
@@ +199,5 @@
> +   *                    identified by either the GUID or the URI.
> +   *          - host: the host of the place, for example mozilla.com.  Only
> +   *                  subdomain wildcard is accepted, in the form *.mozilla.com.
> +   * @param [optional] aCallback
> +   *        A mozIVisitInfoCallback object which consists of callbacks to be 

trailing space

@@ +204,5 @@
> +   *        notified for successful or failed removals.
> +   *        handleResult is invoked for each place passed into, not once for
> +   *        each removed visit.  In case of a clear history operation, thus when
> +   *        no rules are provided, handleResult won't be called.
> +   *        The returned places info objects DO NOT include the visits data (the

cropped?
Attachment #8491553 - Flags: review?(mak77) → feedback+
So, I feel like this has enough architectural problems (see the visits/places query) that it's worth discussing a totally different approach, or the code will become totally unmanageable.

We could do something like we are doing for the new bookmarking API. Create an History.jsm module, implement update() and remove() there. Update will just wrap updatePlaces() while in remove() you can port this patch to js.

You could use wrapStorageConnection there (on PlacesUtils.history.DBConnection), so the nicer Sqlite.jsm API.

Then you could Proxy nsINavHistoryService, mozIAsyncHistory, nsPIPlacesDatabase and History.jsm out of PlacesUtils.history (like we do for Bookmarks.jsm: http://mxr.mozilla.org/mozilla-central/source/toolkit/components/places/PlacesUtils.jsm#1832), so that becomes again our main point of access (PlacesUtils.asyncHistory will just become a deprecated alias)

Advantages:
1. we could move much faster here
2. people could contribute much easier
3. everything will again be available through PlacesUtils.history
4. we have a place for further js porting of some history pieces
5. we can breakdown the bug more easily (first create the module with empty API for SR, then create the proxy, then implement the API and test, then convert old tests to use it).

Disadvantages:
1. we must throw away the cpp work done so far (even if that was useful to define the behavior), that means at least a couple weeks of development
2. you might become sad due to that

My vision about this is that even if the cost is very high, I'm sure you'd move much faster in js world and reviewing the patch would be ten times easier for me, so in the end we'd finish this earlier...
(In reply to Marco Bonardo [::mak] (needinfo? me) from comment #19)
> @@ +441,5 @@
> > +   * `true` if the entire place is being removed, `false` if we are
> > +   * only removing visits to this place.
> > +   */
> > +  bool IsPlaceBeingRemoved() {
> > +    return !IsBookmarked() && mVisits.Length() == VisitCount();
> 
> This means for any removal we need to know how many visits a place has, but
> in most cases we don't care about that info (RemovePage for example already
> knows all of the visits should be removed).
> In such cases we should not join with visits, and we won't know the visit
> count.
> 
> (not I didn't look at the code below this yet, so I'm not sure what the
> query is doing)

Fwiw, I'm basically consolidating in this method code that existed in several places throughout the file.

> @@ +464,5 @@
> > +
> > +
> > +  PRTime VisitTime() const
> > +  {
> > +    return mVisits[0].visitTime;
> 
> will this be first visit, last visit, random visit?

Here, too. We had code hitting arbitrarily in mVisits[0] without any explanation.

> add spacing please..
> 
> btw, how can the consumer recognize what caused an error if we don't pass
> out a VisitData?
> do we notify an error for each entry or just once?

I may be misunderstanding you, but in `mozIStorage::Remove()`, errors are not related to a visit but to a query, so the `VisitData` is pretty useless here.

> @@ +1817,5 @@
> > +   * @param aPlaces The list of places effectively removed.
> > +   * @param aTransitionType If the places have been removed because we cleared all the values for
> > +   * a given transition type, the transition type. Otherwise, specify UINT32_MAX.
> > +   */
> > +  explicit NotifyRemoveVisits(nsTHashtable<PlaceHashKey>& aPlaces,
> 
> since you incremented the number of arguments this doesn't need anymore to
> be explicit

When exactly do we need `explicit`? Is this a recent convention? As far as I can tell, these appeared in a recent merge.


> @@ +2189,5 @@
> > +    nsCOMPtr<mozIStorageStatement> stmt;
> > +    rv = conditions.GetStatement("SELECT moz_historyvisits.id, moz_places.id, url, guid, visit_date, visit_type, foreign_count, "
> > +                                 "(SELECT count(*) FROM moz_historyvisits WHERE place_id = moz_places.id) as full_visit_count "
> > +                                 "FROM moz_historyvisits "
> > +                                 "JOIN moz_places ON place_id = moz_places.id",
> 
> ok this was my fear.

> The existing option (transition) is a visits filter, if it's defined then it
> makes sense to query visits, as well as if we want to just remove visits in
> a given timeframe.
> 
> Though most of the filters are page filters, that means for them all of the
> visits are removed. in those most common cases the above join between
> moz_historyvisits and moz_places can generate a giant dataset that will slow
> down all of the operation with no benefit.
> 
> We need ti distinguish visits filters from page filters, and use the most
> appropriate query, that means if we don't have visits filters we should use
> the faster query (no join with moz_historyvisits). This also means in this
> case we don't have info about existing visits (apart the last_visit_date
> field)

Yeah, I was querying because apparently, we needed moz_historyvisits.id for NotifyPlaceInfoCallback. I'll be quite happy to get rid of it, if I manage to do without `visitId`.

> @@ +2209,5 @@
> > +      visit.visitTime = stmt->AsInt64(4); // visit_date
> > +      visit.transitionType = stmt->AsInt32(5); // visit_type
> > +
> > +      int32_t bookmarked = stmt->AsInt32(6); // foreign_count
> > +      int32_t visitCount = stmt->AsInt32(7); // full_visit_count
> 
> the visit_count that is in the database IS NOT the full visit count, it
> excludes certain transition. So it cannot be used to know the real visit
> count.

So is it sufficient to find out whether the place is orphaned and should be removed?


> @@ +2244,5 @@
> > +
> > +    if (aPlaces) {
> > +      // Restrict ourselves to a subset
> > +      nsCString visitIdsToRemove;
> > +      Enumerate(*aPlaces, ListToBeRemovedVisitIds, &visitIdsToRemove);
> 
> this is an overkill in most cases where we don't need to filter on visits...
> moreover it should be faster to query directly than to provide
> thousands-long list of ids cause in the former case the db can just find
> begin-end from the index, rather then having to walk the index for each id.

I don't understand your last sentence.
I'll try and sketch a History.jsm and see how that works out.
(In reply to David Rajchenbach-Teller [:Yoric] (use "needinfo") from comment #21)
> Fwiw, I'm basically consolidating in this method code that existed in
> several places throughout the file.

yeah I'm not sure anymore this approach is the right one, mixing up visits and pages.

> > will this be first visit, last visit, random visit?
> 
> Here, too. We had code hitting arbitrarily in mVisits[0] without any
> explanation.

that was fine for transition, since all of the visits had the same transition, checking any was fine.

> When exactly do we need `explicit`? Is this a recent convention? As far as I
> can tell, these appeared in a recent merge.

if a constructor has a single mandatory argument, to avoid implicit constructor
so one argument, or one mandatory argument and multiple optional ones.
 
> Yeah, I was querying because apparently, we needed moz_historyvisits.id for
> NotifyPlaceInfoCallback. I'll be quite happy to get rid of it, if I manage
> to do without `visitId`.

we should, in the cases where all visits to a page are removed we should not populate the visits property of the object. 

> > @@ +2209,5 @@
> > > +      visit.visitTime = stmt->AsInt64(4); // visit_date
> > > +      visit.transitionType = stmt->AsInt32(5); // visit_type
> > > +
> > > +      int32_t bookmarked = stmt->AsInt32(6); // foreign_count
> > > +      int32_t visitCount = stmt->AsInt32(7); // full_visit_count
> > 
> > the visit_count that is in the database IS NOT the full visit count, it
> > excludes certain transition. So it cannot be used to know the real visit
> > count.
> 
> So is it sufficient to find out whether the place is orphaned and should be
> removed?

it is NOT sufficient cause it doesn't track all visits. last_visit_date is sufficient, visit_count is not.

> > this is an overkill in most cases where we don't need to filter on visits...
> > moreover it should be faster to query directly than to provide
> > thousands-long list of ids cause in the former case the db can just find
> > begin-end from the index, rather then having to walk the index for each id.
> 
> I don't understand your last sentence.

an IN list is like a giant OR, for each entry it must look through the index, while with a range it will likely be faster to lookup the index.
(In reply to David Rajchenbach-Teller [:Yoric] (use "needinfo") from comment #22)
> I'll try and sketch a History.jsm and see how that works out.

yes please, this will make everything easier for everyone.
You can look at how we proxied bookmarks in PlacesUtils to expose your method.
Re-scoping
Summary: Add new async removePlaces API in mozIAsyncHistory → Implement History.remove
Attached patch History.remove (obsolete) (deleted) — Splinter Review
Attachment #8491553 - Attachment is obsolete: true
Attachment #8503059 - Flags: review?(mak77)
Comment on attachment 8503059 [details] [diff] [review]
History.remove

Review of attachment 8503059 [details] [diff] [review]:
-----------------------------------------------------------------

let's create an history subfolder in tests and create the new tests there. this should be test_history_remove.js

note the original RemovePage(s) methods also invoke clearEmbedVisits(); that method is internal, we need to expose it to be able to call it from here, I think it's fine to expose it in nsINavHistoryService

::: toolkit/components/places/History.jsm
@@ +211,5 @@
> +    // Normalize and type-check arguments
> +    if (Array.isArray(pages)) {
> +      if (pages.length == 0) {
> +        // Nothing to do, bailout early.
> +        return Promise.resolve(false);

I'd rather throw here, passing an empty array is a clear coding error.

@@ +220,5 @@
> +
> +    let guids = [];
> +    let urls = [];
> +    for (let page of pages) {
> +      let normalized = normalizeToURLOrGUID(page);

nit: it's a little bit unclear that normalizeToURLorGUID throws, what about renaming it to validateURLorGUID?

@@ +231,5 @@
> +    // At this stage, we know that either `guids` is not-empty
> +    // or `urls` is not-empty.
> +
> +    if (onResult && typeof onResult != "function") {
> +      throw new TypeError("Invalid function: " + onResult);

let's throw simple Error for now, cause of Task limitations.

@@ +236,5 @@
> +    }
> +
> +    // Now perform queries
> +    return Task.spawn(function*() {
> +      let db = yield DBConnPromised;

I'd probably move most of the implementation to a generator function (like removePages) in the module scope and here keep the code as simple as possible.
What I'm doing in Bookmarks for example is keeping most of immediate error reporting in the main method, once I'm sure the input is as good as possible, I pass it to the internal function that does the job.

The problem I'm trying to address with that, is that if we inline all of the implementations in the API object, it will shortly be unreadable.

@@ +249,5 @@
> +        // We need to extract more data for the sake of `onResult`.
> +        query = "SELECT id, url, guid, foreign_count, title, frecency FROM moz_places " +
> +                "WHERE (" +
> +                  condGuids + condOperator + condUrls +
> +                ")";

I suggest you to use template strings to build queries, they are SO MUCH nicer.

I'm also not sure why you distinguish the onResult case, the cost you pay to select a couple more fields is ignorable, much lower than the maintenance cost for this double code

finally the parenthesis around the where condition are not very useful

@@ +254,5 @@
> +      } else {
> +        query = "SELECT id, url, guid, foreign_count FROM moz_places " +
> +                "WHERE (" +
> +                  condGuids + condOperator + condUrls +
> +                ")";

I think you could just do

`SELECT id, url, guid, foreign_count, title, frecency FROM moz_places
 WHERE guid IN(${guids.join(", ")})
 OR url IN (${urls.map(u => JSON.stringify(u.href)).join(", ")})`

it's quite simple and doesn't hide any logic.

I'm not even sure it's needed to generate 2 separate arrays for urls and guids, you could just use pages here, so

`SELECT id, url, guid, foreign_count, title, frecency FROM moz_places
 WHERE guid IN( ${pages.filter(p => typeof p === "string").join(", ")} )
 OR url IN ( ${[JSON.stringify(p.href) for (p of pages) if (p instanceof URL)].join(", ")} )`

note that an empty IN() will just select nothing, so there's no need to avoid it.

@@ +257,5 @@
> +                  condGuids + condOperator + condUrls +
> +                ")";
> +      }
> +
> +      let pages = [];

you cannot shadow an argument with a local var.. that makes me wonder if you set "use strict" in History.jsm?
Regardless, it's a bad idea, maybe infos or rename the argument to urlsOrGuids

@@ +259,5 @@
> +      }
> +
> +      let pages = [];
> +      let idPagesToRemove = [];
> +      let idPagesToKeep = [];

so, exactly why can't you keep a single array of objects with a keep/remove property and use .filter later?

@@ +260,5 @@
> +
> +      let pages = [];
> +      let idPagesToRemove = [];
> +      let idPagesToKeep = [];
> +      yield db.execute(query, null, Task.async(function*(row) {

executeCached please

@@ +277,5 @@
> +            guid: guid,
> +            title: row.getResultByName("title"),
> +            frecency: row.getResultByName("frecency"),
> +            url: new URL(url)
> +          };

this should be unified since ther's little point into separating them.
you might build an internal representation object with not-enumerable private properties (for example _id, _uri) and later use Object.assign({}, internal_object) to copy enumerable properties not a new object for onResult.

@@ +282,5 @@
> +          try {
> +            yield onResult(pageInfo);
> +          } catch (ex) {
> +            // Errors should be reported but should not stop `remove`.
> +            Promise.reject(ex);

since you are inside a task, won't the original exception already reject? I think Sqlite onRow handler ignores (but reports) exceptions thrown by the onRow handler, please check?

@@ +290,5 @@
> +        if (toRemove) {
> +          idPagesToRemove.push(id);
> +        } else {
> +          idPagesToKeep.push(id);
> +        }

yeah I don't think this is useful, it's very easy to use .filter or array comprehensions
at a maximum you might need bool caches like hasPagesToRemove or hasPagesToKeep (easy to build with array.some)

@@ +302,5 @@
> +      return db.executeTransaction(function*() {
> +        // 2. Remove all visits to these pages.
> +        let visitsToRemove = pages.map(x => x.id);
> +        yield db.execute("DELETE FROM moz_historyvisits " +
> +                         "WHERE (place_id IN " + sqlList(visitsToRemove) + ")");

no need for parenthesis around where clause.
visitsToRemove is a misleading name. please just inline the .map call into a template string

@@ +322,5 @@
> +            /*visitTime*/ 0, // There are no more visits
> +            /*wholeEntry*/ toRemove,
> +            /*guid*/ guid,
> +            /*reason*/ Ci.nsINavHistoryObserver.REASON_DELETED,
> +            /*transition*/ -1);

could you please move /*argname*/ to // argname ? I'm asking cause often we use /**/ to disable block of codes for debugging purposes, and having /**/ in the middle of functions makes that harder.

@@ +426,5 @@
> +  }
> +  if (key instanceof Ci.nsIURI) {
> +    return new URL(key.spec);
> +  }
> +  throw new TypeError("Invalid url or guid: " + key);

just throw Error, as said

@@ +446,5 @@
> + * @param idList: (Array)
> + *      The `moz_places` identifiers for the places to invalidate.
> + * @return (Promise)
> + */
> +let invalidateFrecencies = Task.async(function*(db, idList) {

is it not enough for it to be
function* invalidateFrecencies(db, idList) ?

@@ +447,5 @@
> + *      The `moz_places` identifiers for the places to invalidate.
> + * @return (Promise)
> + */
> +let invalidateFrecencies = Task.async(function*(db, idList) {
> +  if (idList.length == 0) {

what's the point? regardless you do this check outside already...

@@ +450,5 @@
> +let invalidateFrecencies = Task.async(function*(db, idList) {
> +  if (idList.length == 0) {
> +    return Promise.resolve();
> +  }
> +  let ids = sqlList(idList);

just embed in the template...

::: toolkit/components/places/tests/unit/test_browserhistory.js
@@ +10,5 @@
>  const TEST_SUBDOMAIN_URI = NetUtil.newURI("http://foobar.mozilla.com/");
>  
> +Cu.importGlobalProperties(["URL"]);
> +
> +add_task(function* test_addPage()

please do not change the old test for now, make a new one specific for the new code.
We'll convert old tests at a later step.

@@ +69,5 @@
>  
> +// Test `History.remove`, as implemented in History.jsm, to remove visits to a single page
> +add_task(function* test_remove_single()
> +{
> +  let WITNESS_URI = NetUtil.newURI("http://mozilla.com/test_browserhistory/test_remove/" + Math.random());;

double semicolon

@@ +149,5 @@
> +    if (options.useCallback) {
> +      removed = yield PlacesUtils.history.remove(removeArg, page => {
> +        Assert.equal(page.url.href, uri.spec, "Callback provides the correct url");
> +        Assert.equal(page.guid, do_get_guid_for_uri(uri), "Callback provides the correct guid");
> +        Assert.equal(page.title, title, "Callback provides the correct title");

you should also check the callback has been invoked... if its not invoked these tests won't run.
also check it is invoked the expected number of times.

If the page has not title, I think there should not be a title property in the returned object, the same for visits.

you should also check frecency (even if honestly I'm not sure we have an use for that currently...)

@@ +158,5 @@
> +
> +    yield promiseObserved;
> +    PlacesUtils.history.removeObserver(observer);
> +
> +    Assert.ok(visits_in_database(uri) == 0, "History entry has disappeared");

equal(something, 0)

@@ +159,5 @@
> +    yield promiseObserved;
> +    PlacesUtils.history.removeObserver(observer);
> +
> +    Assert.ok(visits_in_database(uri) == 0, "History entry has disappeared");
> +    Assert.ok(visits_in_database(WITNESS_URI) != 0, "Witness URI still has visits");

notEqual

and so on

@@ +300,5 @@
> +    Assert.ok(origin, "onResult has a valid page");
> +    Assert.ok(!origin.onResultCalled, "onResult has not seen this page yet");
> +    origin.onResultCalled = true;
> +    Assert.equal(page.guid, origin.guid, "onResult has the right guid");
> +    Assert.equal(page.title, origin.title, "onResult has the right title");

ditto for null title, visits and frecency

@@ +357,5 @@
> +  Assert.throws(
> +    () => PlacesUtils.history.remove("http://example.org", "not a function, obviously"),
> +    /TypeError: Invalid function/,
> +    "History.remove with a second argument that is not a function argument should throw a TypeError"
> +  );

add a test for passing [], passing null, passing [ null ], also passing null as onResult (I think it should be ignored... what do you think?)
Attachment #8503059 - Flags: review?(mak77) → feedback+
(In reply to Marco Bonardo [::mak] (needinfo? me) from comment #27)
> note the original RemovePage(s) methods also invoke clearEmbedVisits(); that
> method is internal, we need to expose it to be able to call it from here, I
> think it's fine to expose it in nsINavHistoryService

Ok.

> @@ +231,5 @@
> > +    // At this stage, we know that either `guids` is not-empty
> > +    // or `urls` is not-empty.
> > +
> > +    if (onResult && typeof onResult != "function") {
> > +      throw new TypeError("Invalid function: " + onResult);
> 
> let's throw simple Error for now, cause of Task limitations.

What do you mean "limitations"? If there is a TypeError, we want to report it, and that's what Task does.

> @@ +249,5 @@
> > +        // We need to extract more data for the sake of `onResult`.
> > +        query = "SELECT id, url, guid, foreign_count, title, frecency FROM moz_places " +
> > +                "WHERE (" +
> > +                  condGuids + condOperator + condUrls +
> > +                ")";
> 
> I suggest you to use template strings to build queries, they are SO MUCH
> nicer.

Unfortunately, they kill js2-mode :/

> I'm also not sure why you distinguish the onResult case, the cost you pay to
> select a couple more fields is ignorable, much lower than the maintenance
> cost for this double code

Oh? I must have misunderstood one of your remarks from the C++ version. I'll collapse the two codepaths.

> I'm not even sure it's needed to generate 2 separate arrays for urls and
> guids, you could just use pages here, so
> 
> `SELECT id, url, guid, foreign_count, title, frecency FROM moz_places
>  WHERE guid IN( ${pages.filter(p => typeof p === "string").join(", ")} )
>  OR url IN ( ${[JSON.stringify(p.href) for (p of pages) if (p instanceof
> URL)].join(", ")} )`
> 
> note that an empty IN() will just select nothing, so there's no need to
> avoid it.

That feels less readable than generating the list above. Also, counting on JSON.stringify is kind of ugly, so I'd rather doing it in a single place that we will be able to fix if necessary.

> 
> @@ +257,5 @@
> > +                  condGuids + condOperator + condUrls +
> > +                ")";
> > +      }
> > +
> > +      let pages = [];
> 
> you cannot shadow an argument with a local var.. that makes me wonder if you
> set "use strict" in History.jsm?
> Regardless, it's a bad idea, maybe infos or rename the argument to
> urlsOrGuids

Technically, I can, since I'm in a different scope. But yeah, let's get rid of this.

> @@ +259,5 @@
> > +      }
> > +
> > +      let pages = [];
> > +      let idPagesToRemove = [];
> > +      let idPagesToKeep = [];
> 
> so, exactly why can't you keep a single array of objects with a keep/remove
> property and use .filter later?

Because that looked nicer.

> @@ +260,5 @@
> > +
> > +      let pages = [];
> > +      let idPagesToRemove = [];
> > +      let idPagesToKeep = [];
> > +      yield db.execute(query, null, Task.async(function*(row) {
> 
> executeCached please

That looks weird. Since I'm passing the IDs as part of the argument, I'm just going to waste memory and time attempting to memoize something that cannot be reused.

> 
> @@ +277,5 @@
> > +            guid: guid,
> > +            title: row.getResultByName("title"),
> > +            frecency: row.getResultByName("frecency"),
> > +            url: new URL(url)
> > +          };
> 
> this should be unified since ther's little point into separating them.
> you might build an internal representation object with not-enumerable
> private properties (for example _id, _uri) and later use Object.assign({},
> internal_object) to copy enumerable properties not a new object for onResult.

The only field in common is `guid`. What exactly would we win with your suggestion?

> @@ +282,5 @@
> > +          try {
> > +            yield onResult(pageInfo);
> > +          } catch (ex) {
> > +            // Errors should be reported but should not stop `remove`.
> > +            Promise.reject(ex);
> 
> since you are inside a task, won't the original exception already reject? I
> think Sqlite onRow handler ignores (but reports) exceptions thrown by the
> onRow handler, please check?

Ok. If I get rid of idPagesTo{Remove, Keep}, I should be able to do that.

> is it not enough for it to be
> function* invalidateFrecencies(db, idList) ?

That looks weird. Placing generators and hoping that the caller doesn't forget to call `Task.spawn` when calling them seems more fragile.

> ::: toolkit/components/places/tests/unit/test_browserhistory.js
> @@ +10,5 @@
> >  const TEST_SUBDOMAIN_URI = NetUtil.newURI("http://foobar.mozilla.com/");
> >  
> > +Cu.importGlobalProperties(["URL"]);
> > +
> > +add_task(function* test_addPage()
> 
> please do not change the old test for now, make a new one specific for the
> new code.
> We'll convert old tests at a later step.

Ok. I switched it to `add_task` because 1/ I needed to test asynchronous code; 2/ it was just a simple search and replace.

> add a test for passing [], passing null, passing [ null ], also passing null
> as onResult (I think it should be ignored... what do you think?)

Agreed, `null` as `onResult` should be ignored, especially since we consider eventually adding support for a third argument.
(In reply to David Rajchenbach-Teller [:Yoric] (use "needinfo") from comment #28)
> > @@ +231,5 @@
> > > +    // At this stage, we know that either `guids` is not-empty
> > > +    // or `urls` is not-empty.
> > > +
> > > +    if (onResult && typeof onResult != "function") {
> > > +      throw new TypeError("Invalid function: " + onResult);
> > 
> > let's throw simple Error for now, cause of Task limitations.
> 
> What do you mean "limitations"? If there is a TypeError, we want to report
> it, and that's what Task does.

The problem is that with TypeError, even if you handle the rejection it will still dump stuff to the debug console, we don't want to add trash to the debug console (this is by design, ask paolo for details).

> > @@ +249,5 @@
> > > +        // We need to extract more data for the sake of `onResult`.
> > > +        query = "SELECT id, url, guid, foreign_count, title, frecency FROM moz_places " +
> > > +                "WHERE (" +
> > > +                  condGuids + condOperator + condUrls +
> > > +                ")";
> > 
> > I suggest you to use template strings to build queries, they are SO MUCH
> > nicer.
> 
> Unfortunately, they kill js2-mode :/

yes, they also kill Sublime Text hilighter, but it's a minor annoyance compared to the gain.


> > I'm not even sure it's needed to generate 2 separate arrays for urls and
> > guids, you could just use pages here, so
> > 
> > `SELECT id, url, guid, foreign_count, title, frecency FROM moz_places
> >  WHERE guid IN( ${pages.filter(p => typeof p === "string").join(", ")} )
> >  OR url IN ( ${[JSON.stringify(p.href) for (p of pages) if (p instanceof
> > URL)].join(", ")} )`
> > 
> > note that an empty IN() will just select nothing, so there's no need to
> > avoid it.
> 
> That feels less readable than generating the list above. Also, counting on
> JSON.stringify is kind of ugly, so I'd rather doing it in a single place
> that we will be able to fix if necessary.

I'm find if you want to keep the stinrgify and join parts in an helper, but pleae inline the call in the template string and keep the IN parenthesis in the query.

> > executeCached please
> 
> That looks weird. Since I'm passing the IDs as part of the argument, I'm
> just going to waste memory and time attempting to memoize something that
> cannot be reused.

uh whoops, you are right! damn IN operator.

> > @@ +277,5 @@
> > > +            guid: guid,
> > > +            title: row.getResultByName("title"),
> > > +            frecency: row.getResultByName("frecency"),
> > > +            url: new URL(url)
> > > +          };
> > 
> > this should be unified since ther's little point into separating them.
> > you might build an internal representation object with not-enumerable
> > private properties (for example _id, _uri) and later use Object.assign({},
> > internal_object) to copy enumerable properties not a new object for onResult.
> 
> The only field in common is `guid`. What exactly would we win with your
> suggestion?

uh, I misremembered what onDeleteURI got... I think it might still look simpler to go through a single object, but it's NIT now.

> > is it not enough for it to be
> > function* invalidateFrecencies(db, idList) ?
> 
> That looks weird. Placing generators and hoping that the caller doesn't
> forget to call `Task.spawn` when calling them seems more fragile.

It's internal code and we have nice warnings when rejections are not handled. I think it's fine.
(In reply to Marco Bonardo [::mak] (needinfo? me) from comment #29)
> The problem is that with TypeError, even if you handle the rejection it will
> still dump stuff to the debug console, we don't want to add trash to the
> debug console (this is by design, ask paolo for details).

Well, this is a programmer error. Why would we want to hide it?

> I'm find if you want to keep the stinrgify and join parts in an helper, but
> pleae inline the call in the template string and keep the IN parenthesis in
> the query.

Sounds good.

> > That looks weird. Placing generators and hoping that the caller doesn't
> > forget to call `Task.spawn` when calling them seems more fragile.
> 
> It's internal code and we have nice warnings when rejections are not
> handled. I think it's fine.

Having had to review and debug code that happily does this, I'd rather avoid reproducing what I have seen. If it's ok with you, I'd rather keep it a Task.async.
ok.
I still think the debug spew of TypeError is not really useful, and the error message is very confusing like "a rejection is not handled" when instead it is.
(In reply to Marco Bonardo [::mak] (needinfo? me) from comment #27)
> note the original RemovePage(s) methods also invoke clearEmbedVisits(); that
> method is internal, we need to expose it to be able to call it from here, I
> think it's fine to expose it in nsINavHistoryService

> nit: it's a little bit unclear that normalizeToURLorGUID throws, what about
> renaming it to validateURLorGUID?

I tried and it doesn't seem better.

> @@ +282,5 @@
> > +          try {
> > +            yield onResult(pageInfo);
> > +          } catch (ex) {
> > +            // Errors should be reported but should not stop `remove`.
> > +            Promise.reject(ex);
> 
> since you are inside a task, won't the original exception already reject? I
> think Sqlite onRow handler ignores (but reports) exceptions thrown by the
> onRow handler, please check?

Actually, `onRow` handler ignores but _logs_ exceptions, which means that we effectively lose them.
By opposition, `Promise.reject` causes test failures, which is something that we certainly want. Bug 1080457 will introduce a better way to handle this properly, but it's somewhat low priority.

> @@ +447,5 @@
> > + *      The `moz_places` identifiers for the places to invalidate.
> > + * @return (Promise)
> > + */
> > +let invalidateFrecencies = Task.async(function*(db, idList) {
> > +  if (idList.length == 0) {
> 
> what's the point? regardless you do this check outside already...

It feels safer and more robust wrt refactorings that way.
If you prefer, I can throw an Error instead.

> If the page has not title, I think there should not be a title property in
> the returned object, the same for visits.

What is the benefit? I can only see drawbacks to this change (i.e. this increases the number of shapes and decreases the accuracy of inline caching).

(In reply to Marco Bonardo [::mak] (needinfo? me) from comment #31)
> ok.
> I still think the debug spew of TypeError is not really useful, and the
> error message is very confusing like "a rejection is not handled" when
> instead it is.

Have you ever encountered a case in which it was a false alarm?
Attached patch History.remove, v2 (obsolete) (deleted) — Splinter Review
Applied feedback.
Attachment #8503059 - Attachment is obsolete: true
Attachment #8507032 - Flags: review?(mak77)
(In reply to David Rajchenbach-Teller [:Yoric] (use "needinfo") from comment #32)
> What is the benefit? I can only see drawbacks to this change (i.e. this
> increases the number of shapes and decreases the accuracy of inline caching).

Hm, I'm not sure there's downsides from a js point of view, if there is we should also change bookmarks API, since it's actually removing undefined entries.
Worth discussing this with Mano, the problem I think it's mostly that in some cases we might not return a property, in other cases we might return null or undefined, depending on the internal code. it feels safer to be coherent... whether that means sticking to undefined or removing the property is up for discussion. let's discuss this in #places when we are all there.

> (In reply to Marco Bonardo [::mak] (needinfo? me) from comment #31)
> > ok.
> > I still think the debug spew of TypeError is not really useful, and the
> > error message is very confusing like "a rejection is not handled" when
> > instead it is.
> 
> Have you ever encountered a case in which it was a false alarm?

Yes, multiple times in tests where we actually "want" to fail and the output is really confusing.
(In reply to Marco Bonardo [::mak] (needinfo? me) from comment #34)
> > Have you ever encountered a case in which it was a false alarm?
> 
> Yes, multiple times in tests where we actually "want" to fail and the output
> is really confusing.

Bug 1080457 should improve that.
(In reply to David Rajchenbach-Teller [:Yoric] (use "needinfo") from comment #35)
> Bug 1080457 should improve that.

but the test should not fail, or better I should be able to handle the failure nicely.
That's the idea: each test will be able to whitelist expected errors.
Comment on attachment 8507032 [details] [diff] [review]
History.remove, v2

Review of attachment 8507032 [details] [diff] [review]:
-----------------------------------------------------------------

::: toolkit/components/places/History.jsm
@@ +219,5 @@
> +
> +    let guids = [];
> +    let urls = [];
> +    for (let page of pages) {
> +      let normalized = normalizeToURLOrGUID(page);

please add a comment that this is going to throw in case input is invalid

@@ +379,5 @@
> +
> +  // 1. Find out what needs to be removed
> +  let query =
> +    `SELECT id, url, guid, foreign_count, title, frecency FROM moz_places
> +     WHERE 

trailing space

@@ +381,5 @@
> +  let query =
> +    `SELECT id, url, guid, foreign_count, title, frecency FROM moz_places
> +     WHERE 
> +         guid IN (${ sqlList(guids) })
> +      OR url  IN (${ sqlList(urls)  })

nit
I usually align like:
WHERE something
   OR something_else

@@ +428,5 @@
> +
> +  return db.executeTransaction(function*() {
> +    // 2. Remove all visits to these pages.
> +    yield db.execute(`DELETE FROM moz_historyvisits
> +                     WHERE place_id IN (${ sqlList([p.id for (p of pages)]) })`);

align WHERE with DELETE and `); on a new line please (ticks aligned)

@@ +438,5 @@
> +
> +    // 4. For pages that should be removed, remove page.
> +    if (hasPagesToRemove) {
> +      yield db.execute(`DELETE FROM moz_places WHERE id IN
> +        (${ sqlList([p.id for (p of pages) if (p.toRemove)]) })`);

might be worth a temp var and better align the query as above.

::: toolkit/components/places/tests/unit/history/test_remove.js
@@ +21,5 @@
> +    do_print(name);
> +    do_print(JSON.stringify(options));
> +
> +
> +    do_print("Setting up visit");

too many newlines

@@ +301,5 @@
> +    /TypeError: Expected at least one page/,
> +    "History.remove with an empty array should throw a TypeError"
> +  );
> +  Assert.throws(
> +    () => PlacesUtils.history.remove([null]),

might we also test [ "valid_guid", null ]
and undefined
Attachment #8507032 - Flags: review?(mak77) → review+
Attached patch Cumulative patch (obsolete) (deleted) — Splinter Review
I had forgotten to call clearEmbedVisits.
Did that and applied feedback.
Attachment #8508634 - Flags: review?(mak77)
Blocks: 1086549
Comment on attachment 8508634 [details] [diff] [review]
Cumulative patch

Review of attachment 8508634 [details] [diff] [review]:
-----------------------------------------------------------------

::: toolkit/components/places/History.jsm
@@ +210,2 @@
>     */
>    remove: function (pages, onResult = null) {

nit: now you can user shorthands in objects, like 

remove(pages, onResult = null) {

(see https://developer.mozilla.org/en-US/docs/Web/JavaScript/Reference/Functions/Method_definitions)

::: toolkit/components/places/nsINavHistoryService.idl
@@ +1406,5 @@
> +
> +  /**
> +   * Clear all TRANSITION_EMBED visits.
> +   */
> +  void clearEmbedVisits();

you must rev the UUID
Attachment #8508634 - Flags: review?(mak77) → review+
I like the new syntax, but I'm not using it yet as it breaks my emacs mode. Since I trust my emacs mode to inform me of undefined/typo-ed/out of scope variables, I find it more useful than just removing a few chars.
Attached patch History.remove, v3 (obsolete) (deleted) — Splinter Review
Applied feedback, folded.
Try: https://treeherder.mozilla.org/ui/#/jobs?repo=try&revision=5840b1c13fdb
Attachment #8507032 - Attachment is obsolete: true
Attachment #8508634 - Attachment is obsolete: true
Attachment #8509382 - Flags: review+
https://hg.mozilla.org/integration/fx-team/rev/1842c1026e8c
Flags: in-testsuite+
Keywords: checkin-needed
Whiteboard: [fixed-in-fx-team]
https://hg.mozilla.org/mozilla-central/rev/1842c1026e8c
Status: ASSIGNED → RESOLVED
Closed: 10 years ago
Resolution: --- → FIXED
Whiteboard: [fixed-in-fx-team]
Target Milestone: --- → mozilla36
Iteration: --- → 36.1
Depends on: 1091693
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Created:
Updated:
Size: