Closed Bug 1276823 Opened 8 years ago Closed 8 years ago

Bookmarks de-duping changes item GUID without changing reference in parent record

Categories

(Firefox :: Sync, defect, P1)


Tracking


RESOLVED FIXED
Firefox 51
Tracking Status
firefox51 --- fixed

People

(Reporter: markh, Assigned: markh)

References

Details

(Whiteboard: [data-integrity])

Attachments

(1 file, 1 obsolete file)

On a test profile, the validator tells me:

> Server record references child DbFRuGKzK7e2 that doesn't exist on the server.

and the record with the reference is "menu" with childGUIDs ["1upn0DjHWM0h", "q1otz3OaoK-6", "DbFRuGKzK7e2", "SpnHPUdolgVw", "ZsXNl9WUqjIe", "MCk-v4SG0eyk", "buaNIJIeTStt", "I6s-dTVEHJxm"]

Grepping through the logs:

> 1464576351527 Sync.Engine.Bookmarks TRACE Incoming: { id: menu index: 1000000 modified: 1464576300.37 ttl: undefined payload: {"id":"menu","children":["1upn0DjHWM0h","q1otz3OaoK-6","DbFRuGKzK7e2","SpnHPUdolgVw","ZsXNl9WUqjIe","MCk-v4SG0eyk","buaNIJIeTStt","I6s-dTVEHJxm"],"parentid":"places","title":"Bookmarks Menu","type":"folder","parentName":""} collection: bookmarks }

So an incoming "menu" record, with 2 children of interest: DbFRuGKzK7e2 and MCk-v4SG0eyk - both are separators.

> ...
> 1464576351626 Sync.Engine.Bookmarks TRACE Incoming: { id: MCk-v4SG0eyk index: 0 modified: 1464576300.37 ttl: undefined payload: {"id":"MCk-v4SG0eyk","parentid":"menu","type":"separator","pos":2,"parentName":"Bookmarks Menu"} collection: bookmarks }
> 1464576351626 Sync.Store.Bookmarks TRACE Number of rows matching GUID MCk-v4SG0eyk: 0

So MCk-v4SG0eyk is a new record we haven't seen before.

> 1464576351627 Sync.Engine.Bookmarks DEBUG MCk-v4SG0eyk mapped to DbFRuGKzK7e2
> 1464576351627 Sync.Engine.Bookmarks TRACE Local item DbFRuGKzK7e2 is a duplicate for incoming item MCk-v4SG0eyk

We've decided the incoming item MCk-v4SG0eyk is a dupe of DbFRuGKzK7e2 - probably because of "pos" being wrong (bug 1228827 comment 4).

> 1464576351628 Sync.Store.Bookmarks TRACE Number of rows matching GUID DbFRuGKzK7e2: 1
> 1464576351628 Sync.Engine.Bookmarks DEBUG Switching local ID to incoming: DbFRuGKzK7e2 -> MCk-v4SG0eyk
> 1464576351628 Sync.Store.Bookmarks DEBUG Changing GUID DbFRuGKzK7e2 to MCk-v4SG0eyk
> 1464576351628 Sync.Store.Bookmarks TRACE Number of rows matching GUID DbFRuGKzK7e2: 1
> 1464576351628 Sync.Store.Bookmarks TRACE Creating SQL statement: UPDATE moz_bookmarks SET guid = :guid WHERE id = :item_id
> 1464576351628 Sync.Engine.Bookmarks DEBUG Local item after duplication: age=null; modified=false; exists=true
> 1464576351628 Sync.Store.Bookmarks TRACE Number of rows matching GUID MCk-v4SG0eyk: 1

So now there's no local record with DbFRuGKzK7e2 - the existing one had its GUID changed to MCk-v4SG0eyk.

> 1464576351629 Sync.Engine.Bookmarks TRACE Finding mapping: Bookmarks Menu, s2
> 1464576351629 Sync.Engine.Bookmarks TRACE Mapped dupe: DbFRuGKzK7e2
> 1464576351630 Sync.Engine.Bookmarks TRACE Ignoring incoming item because the local item is identical.
> 1464576351630 Sync.Engine.Bookmarks TRACE Skipping reconciled incoming item MCk-v4SG0eyk

Then later we reorder the children of menu:

> 1464576351894 Sync.Store.Bookmarks TRACE reordering children of: menu
> ...
> 1464576351895 Sync.Store.Bookmarks TRACE Number of rows matching GUID DbFRuGKzK7e2: 0
> 1464576351895 Sync.Store.Bookmarks TRACE Could not locate record DbFRuGKzK7e2

At which point we've removed the child from the local version of "menu", and finally:

> 1464576352161 Sync.Collection DEBUG mesg: DELETE success 200 https://sync-345-us-west-2.sync.services.mozilla.com/1.5/46723051/storage/bookmarks?ids=DbFRuGKzK7e2

We deleted the record from the server - but the parent record "menu" has not been updated on the server, so the server record now references *both* GUIDs as being children.

While this is a separator, I see no reason to believe the same thing isn't going to happen if a real bookmark is de-duped. There also don't seem to be any existing tests for bookmark duplicates :(

Naively, it looks like _orderChildren should record a boolean to indicate if it actually changed the ordering, and if so, that item should be marked again for upload. The obvious risk here is a "sync loop" though - another client does its own faulty de-duplication and re-uploads the parent, then we do the same, repeat ad nauseam. Richard, any insights here?
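The _orderChildren idea above can be pictured with a tiny sketch (illustrative names only, not the real Sync code): the reorder step reports whether it actually changed anything, and if so the parent is flagged for re-upload so the server stops referencing the dead GUID.

```javascript
// Sketch: reorder a folder's children to match an incoming list, dropping
// any GUIDs that no longer exist locally, and report whether anything changed.
function orderChildren(parent, desiredOrder) {
  const before = parent.children.join(",");
  // Keep only the desired children that still exist locally, in desired order.
  parent.children = desiredOrder.filter(id => parent.children.includes(id));
  return parent.children.join(",") !== before; // true if anything changed
}

const menu = { id: "menu", children: ["DbFRuGKzK7e2", "MCk-v4SG0eyk"] };
const changedTracker = new Set();
if (orderChildren(menu, ["MCk-v4SG0eyk"])) {
  // Re-upload "menu" so the server parent record matches local reality.
  changedTracker.add(menu.id);
}
```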
Flags: needinfo?(rnewman)
Flags: firefox-backlog+
And FTR:

(In reply to Mark Hammond [:markh] from comment #0)
> > 1464576352161 Sync.Collection DEBUG mesg: DELETE success 200 https://sync-345-us-west-2.sync.services.mozilla.com/1.5/46723051/storage/bookmarks?ids=DbFRuGKzK7e2
>
> We deleted the record from the server

Note that this is a "real" delete (this ID is now never returned by the server) rather than a "logical" delete (where the record would still exist but with deleted=true).
Regarding the deletion:

Let's assume two records: A and B. Clients will dupe them. I think the rationale here is that a client is in one of four states.

1. It has seen A but not B. Deleting B from the server is a no-op.
2. It has seen A and B. It should have duped B to A itself, so deleting B from the server is a no-op.
3. It has seen neither A nor B. When it downloads A it'll get the correct state, so deleting B from the server is a no-op.
4. It has seen B but not A. When it downloads A, which wasn't deleted, it'll take guid A, and try to delete B, which is a no-op.

The potential bug is that one client has seen A but not B, and deletes in the wrong direction. I believe I've seen history records where two live server records have the same URL, and so duping will be unpredictable.

We *could* upload a deleted record -- and I think some other clients do so! -- but for bookmarks that will leave a growing set of deletions on the server forever, and the benefit isn't clear.
Flags: needinfo?(rnewman)
Re the main thrust of this bug:

We should be maintaining a list of remapped IDs, and/or rewriting our reparenting bookkeeping. IIRC we do so on Android, where we keep detailed records of reparenting work that needs to happen at the end of a sync; if we saw:

"foo": ["bar", "baz"]

and then we duped "bar" to "noo", we need to update that bookkeeping to say:

"foo": ["noo", "baz"]

We'd use that both for local reparenting/reordering and also to construct a new remote record if necessary. I don't remember if Android uploads a new remote record when remapping a child.

On iOS the merged tree includes this information explicitly via pointers to both parent records, and then explicitly records the action by outputting a remote deletion of the absorbed record, a local deletion of the same, and a merged insertion, as well as a new remote record if a change was required.

The only way a sync loop could occur here, I think, is if two clients disagreed on duping and didn't see the other client's deletion, but my hunch is that wouldn't loop.
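A hedged sketch of the bookkeeping rewrite described above (the data shape is illustrative; Android's actual structures differ): when an item is duped away, every pending parent-to-children entry that mentions the old GUID is rewritten to the new one.

```javascript
// Pending reparenting bookkeeping: parent GUID -> ordered child GUIDs.
// When "oldGUID" is duped to "newGUID", rewrite every reference to it.
function remapChild(bookkeeping, oldGUID, newGUID) {
  for (const parent of Object.keys(bookkeeping)) {
    bookkeeping[parent] = bookkeeping[parent].map(
      id => (id === oldGUID ? newGUID : id));
  }
}

const pending = { foo: ["bar", "baz"] };
remapChild(pending, "bar", "noo");
// pending is now { foo: ["noo", "baz"] }
```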
Flags: firefox-backlog+
Blocks: 1269904
No longer blocks: 1257961
I'm afraid I don't quite understand what you are saying here, in terms of concrete actions desktop should take in this scenario.

(In reply to Richard Newman [:rnewman] from comment #3)
> Re the main thrust of this bug:
>
> We should be maintaining a list of remapped IDs, and/or rewriting our
> reparenting bookkeeping.

In the context of desktop, isn't this just "add the parent to the tracker"?

> IIRC we do so on Android, where we keep detailed records of reparenting work
> that needs to happen at the end of a sync;

On desktop we perform this problematic de-duping and child reordering at incoming-record time, so I think just marking the parent as changed at this time will ensure a new parent record is written that no longer references the now-deleted child.

> if we saw:
>
> "foo": ["bar", "baz"]
>
> and then we duped "bar" to "noo", we need to update that bookkeeping to say:
>
> "foo": ["noo", "baz"]

so in our scenario we duped "bar" to "baz" and deleted "bar" - I think we just need to track the parent. In your scenario, we probably need to add "foo", "noo", and whatever the previous parent of "noo" was to the tracker, right?

> We'd use that both for local reparenting/reordering and also to construct a
> new remote record if necessary.

I guess what I'm saying is that if a goal is to have the server in a consistent state, we *must* construct at least one new remote record, and possibly up to 3 new remote records?

Further, I think the delete of the dupe child should be logical. Let's consider the simpler scenario of starting with { foo: ["bar", "baz"] } and de-duping ending up with { foo: ["baz"] } - I think it's the same as the more complex example above that introduces "noo", but easier to picture.

* client 1 has a new record "bar", so uploads the new "bar" and the parent { foo: ["bar", "baz"] }.
* client 2 sees incoming "bar", but because it has the wrong parentName or wrong pos, it (questionably) dupes "bar" to "baz" - it hard-deletes the ID "baz", and changes the local GUID of "baz" to the incoming "bar". We change the current behaviour so it also uploads the new { foo: ["bar"] }.
* client 1 sees incoming "foo" (now missing "baz") but doesn't see a deletion for "baz" as it was hard-deleted - but it will exist locally as a valid child of "foo". The right thing for client 1 to do here is decide that "baz" must have been added locally since the last Sync, so it reconciles by re-adding "baz" back to "foo" and re-uploading a brand-new "baz" and a modified "foo" - ie, uploads the same as it did in step 1.
* goto step 2.

I think a soft-deletion of "baz" would prevent this - on the second sync client 2 would accept that "baz" is deleted, and its local copy of "foo" would match the server once it processes the delete.

> The only way a sync loop could occur here, I think, is if two clients
> disagreed on duping and didn't see the other client's deletion,

Right, I think that's what will happen above if we (a) hard-delete the dupe and (b) re-upload the parent record. I don't think there's a sync loop with the current behaviour, but only because we leave the server in an incorrect state and client 1 simply never sees the de-duping done by client 2. However, client 1 *would* currently resurrect "baz" next time it ended up doing a full sync (eg, after node reassignment) because it still exists locally.

IOW, I think I'm saying:
* All de-duping must upload new records for all parents affected by the de-dupe.
* Any records deleted due to de-duping should be logically deleted.

But I'm not sure if you are saying something similar or different?
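To make the hard-vs-logical distinction concrete, here is a minimal sketch. The tombstone shape follows the deleted=true convention mentioned in comment 1; the function names and the in-memory "server" are made up for illustration.

```javascript
// A "hard" delete removes the record entirely, so other clients never
// learn the record is gone. A "logical" (soft) delete leaves a tombstone
// that other clients can download and apply.
function hardDelete(server, id) {
  delete server[id];
}
function softDelete(server, id) {
  server[id] = { id, deleted: true }; // tombstone survives on the server
}

const server = { baz: { id: "baz", type: "bookmark" } };
softDelete(server, "baz");
// client 1's next sync sees { id: "baz", deleted: true } and removes its
// local copy instead of resurrecting it.
```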
Flags: needinfo?(rnewman)
Whiteboard: [sync-data-integrity]
(In reply to Mark Hammond [:markh] from comment #4)
> I'm afraid I don't quite understand what you are saying here, in terms of
> concrete actions desktop should take in this scenario.
>
> (In reply to Richard Newman [:rnewman] from comment #3)
> > Re the main thrust of this bug:
> >
> > We should be maintaining a list of remapped IDs, and/or rewriting our
> > reparenting bookkeeping.
>
> In the context of desktop, isn't this just "add the parent to the tracker?"

I don't think so.

* Local records that we previously uploaded with the old parent ID now need to be reuploaded. So we have to re-track all of the children of the remapped folder.
** Unless they're remotely changed already, in which case we probably don't? (This could happen in a three-device scenario.)
* In the unusual case, new incoming remote records with the obsolete parent ID -- again, a three-device scenario -- need to be processed differently. There's an ordering dependency here: one sequence of records will result in a GUID match, and another will hit guidMap.

> so in our scenario we duped "bar" to "baz" and deleted "bar" - I think we
> just need to track the parent. In your scenario, we probably need to add
> "foo", "noo", and whatever the previous parent of "noo" was to the tracker,
> right?

Yes. We need to ensure that all records that used to mention the remapped record -- its children, its parent -- are uploaded, and that a deleted version of that record also makes it to the server. Otherwise a fetch of all records will grab some that don't make sense.

> But I'm not sure if you are saying something similar or different?

I will re-read later after some sleep and see if we really do agree :D
Whiteboard: [sync-data-integrity] → [data-integrity]
(In reply to Richard Newman [:rnewman] from comment #5)
> > so in our scenario we duped "bar" to "baz" and deleted "bar" - I think we
> > just need to track the parent. In your scenario, we probably need to add
> > "foo", "noo", and whatever the previous parent of "noo" was to the tracker,
> > right?
>
> Yes. We need to ensure that all records that used to mention the remapped
> record -- its children, its parent -- are uploaded

In almost all cases, the item being duplicated can't have children and is being duplicated within the same parent. So that only leaves the parent (which has just had one of its "children" elements removed).

> and that a deleted version of that record also makes it to the server.

Yep - currently it is hard-deleted. So I still think this is much like comment 4:

> IOW, I think I'm saying:
> * All de-duping must upload new records for all parents affected by the de-dupe.
> * Any records deleted due to de-duping should be logically deleted.

You are correct though that it's more problematic if the deleted item has children and I hadn't really considered that - although if I'm reading the code correctly a folder could be de-duped (and then presumably doing bad things wrt children). However, a simple de-dupe of 2 folders without considering the children is going to be broken from the user's POV already - so maybe we just shouldn't do that? :)

Richard, were you mainly concerned with the "de-dupe folders" case, or are there additional complications (beyond my "IOW" above) if we only consider leafs?
Comment on attachment 8767849 [details]
Bug 1276823 - ensure bookmarks engine tracks parents when de-duping incoming bookmarks.

This patch contains tests which demonstrate the problem. It forces the bookmarks engine to see what it thinks is a dupe bookmark (which is fixed with the following patch) and also a dupe folder (which is *not* fixed with the following patch).

The "folder" issue is tricky - to do "proper" de-duping we need to inspect records we may not have seen yet - but even then "proper" is difficult to define here. I'd like to propose that we only de-duplicate folders when the list of children is identical (ie, has identical GUIDs in .children) - in the rare cases we hit this I assert it's better to just leave the dupe in place - there's less risk of losing bookmarks and the user can clean it up if they desire, which we should then handle correctly. What do you think?
Attachment #8767849 - Flags: feedback?(rnewman)
Comment on attachment 8767850 [details]
Bug 1276823 - partial fix

This patch allows the "switch local record to new GUID" logic to be overridden by the engine and adds such an override for bookmarks. It just flags the affected parent as requiring upload, so the server ends up with correct data for the parent. The "bookmark" test passes with this patch; the "folder" one doesn't - see previous comments.
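As a rough illustration of what such an override might look like (class, method, and store shapes here are simplified stand-ins, not the actual patch): when the local item is switched to the incoming GUID, the engine flags the item's parent as modified so a corrected parent record gets uploaded.

```javascript
// Simplified engine sketch: _switchItemToDupe changes the local GUID and
// marks the parent folder as modified so it is re-uploaded this sync.
class BookmarksEngineSketch {
  constructor(store, tracker) {
    this._store = store;     // GUID -> local item
    this._tracker = tracker; // supplies timestamps
    this._modified = {};     // GUID -> timestamp of records needing upload
  }
  _switchItemToDupe(localDupeGUID, incomingItem) {
    this._store.changeItemID(localDupeGUID, incomingItem.id);
    const parentGUID = this._store.parentGUIDOf(incomingItem.id);
    this._modified[parentGUID] = this._tracker.now();
  }
}

// Tiny in-memory store standing in for the Places-backed one.
const store = {
  items: { DbFRuGKzK7e2: { parent: "menu" } },
  changeItemID(oldID, newID) {
    this.items[newID] = this.items[oldID];
    delete this.items[oldID];
  },
  parentGUIDOf(guid) { return this.items[guid].parent; },
};
const engine = new BookmarksEngineSketch(store, { now: () => 1464576351 });
engine._switchItemToDupe("DbFRuGKzK7e2", { id: "MCk-v4SG0eyk" });
// engine._modified now contains "menu", so the parent gets re-uploaded.
```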
Attachment #8767850 - Flags: feedback?(rnewman)
Priority: -- → P1
(In reply to Mark Hammond [:markh] from comment #6)
> In almost all cases, the item being duplicated can't have children and is
> being duplicated within the same parent.

Can you expand on this? Desktop's duping depends only on type, parent name, and the details of the record itself, right? So it's possible that whole subtrees will unify, where two duped folders don't have the same parent, or that individual records will dupe to records elsewhere in the tree.

> You are correct though that it's more problematic if the deleted item has
> children and I hadn't really considered that - although if I'm reading the
> code correctly a folder could be de-duped (and then presumably doing bad
> things wrt children). However, a simple de-dupe of 2 folders without
> considering the children is going to be broken from the user's POV already -
> so maybe we just shouldn't do that? :)

But we can't get away with not duping folders; that'll just leave a trail of duplicate empty folders in the case of a full merge.

> Richard, were you mainly concerned with the "de-dupe" folders case, or are
> there additional complications (beyond my "IOW" above) if we only consider
> leafs?

It's possible for an incoming leaf node to dupe against a local leaf in a different folder. The local leaf will be assigned the remote GUID, then the remote record will be applied, causing a move. That's supposed to work for records without children.

I worry about:

* Records with children: The children need to have their parent ID changed. But, horrifyingly, they themselves might dupe against incoming server records, and do so in any order. If the child is duped first, we won't know the remote record yet. If the parent is duped first, we won't know the new GUID of the child. Neither is a good thing. When a remote folder record is duped, it'll be applied… but its child list might be partially or completely different to the re-GUIDed local dupe, and that could have bad consequences. This is particularly hairy when you have some local changes -- you can't trust remote duping to work flawlessly even in the best case, but when there's some small local difference you definitely can't.
* Separators not having their position recomputed when folders are merged or records are moved or deleted. That's a separate (haha) but related bug.

In the broadest case, my feeling is that duping is unavoidable, because GUIDs aren't always preserved across operations that desktop can perform… but that the more I think about it, duping as written (one-pass, non-structural) is fundamentally broken. I'm still thinking about whether there is anything we can do to make this safe without a rewrite.
Flags: needinfo?(rnewman)
It's worth noting explicitly that the results of a bookmark sync likely depend on the order of evaluation of incoming records, which is not a good property. Different orders can cause different dupes to be detected if multiple parent/child pairs match, and different behavior is likely if the tree is seen bottom-up, top-down, or random. :scream:
(In reply to Richard Newman [:rnewman] from comment #11)
> (In reply to Mark Hammond [:markh] from comment #6)
> > In almost all cases, the item being duplicated can't have children and is
> > being duplicated within the same parent.
>
> Can you expand on this?

That was vague on my part. I meant that all types other than "folder" don't have children we sync.

(In reply to Richard Newman [:rnewman] from comment #12)
> :scream:

Yeah :( The tree problem is difficult. However, I think fixing leafs is tractable, shouldn't depend on the record order, and shouldn't make things any worse WRT folder duping. They are also easier to repair. It's even possible that this problem mainly occurs for leafs :)

What do you think about explicitly excluding folders here and tackling folders in a different bug where we can take a more considered approach and after we have telemetry?
Assignee: nobody → markh
Comment on attachment 8767850 [details]
Bug 1276823 - partial fix

https://reviewboard.mozilla.org/r/62242/#review66434

::: services/sync/modules/engines/bookmarks.js:442
(Diff revision 1)
> +    // contract were stronger, this could be changed.
> +    this._log.debug("Switching local ID to incoming: " + dupeID + " -> " +
> +                    item.id);
> +    this._store.changeItemID(dupeID, item.id);
> +    // And mark the parent as being modified. We just use the new item's parent
> +    // as we only retur a dupe when old and new items have the same parent.

That's not strictly true, is it? The GUID map lookup is based on the parent *name*, so we can dupe two leaf nodes that have the same URL, title, and parent folders with the same name, even if the parents differ.

I don't think we should content-dupe records across folder trees, but not doing so is a bit of a challenge. We can't even use the complete name-path, because this is valid:

Bookmarks Menu
  Stuff
    Slashdot.org
  Foo
    Stuff
      Slashdot.org

We _probably_ will do OK if we track both parents here, for some value of 'OK'.

Nit: retur -> return
Attachment #8767850 - Flags: review+
Attachment #8767850 - Flags: review+
Attachment #8767850 - Flags: feedback?(rnewman)
Attachment #8767850 - Flags: feedback+
(In reply to Mark Hammond [:markh] from comment #13)
> What do you think about explicitly excluding folders here and tackling
> folders in a different bug where we can take a more considered approach and
> after we have telemetry?

Actually, I think that's wrong. We still want to treat folders just like bookmarks - if we de-dupe a folder, we still want the parent references updated. However, this bug will make no attempt to fix up the children of affected folders - I opened bug 1293163 for that.

> That's not strictly true, is it? The GUID map lookup is based on the parent
> *name*, so we can dupe two leaf nodes that have the same URL, title, and
> parent folders with the same name, even if the parents differ.
...
> We _probably_ will do OK if we track both parents here, for some value of 'OK'.

Doh - of course you are correct. The new version of the patch does track both IDs, and I've tried to exercise the relevant scenarios in the test (ie, 2 folders of the same name, with an incoming dupe being in a different folder than the other). Where the child ends up depends on whether we applied the incoming record or not (ie, whether the local item was also marked as modified), and both cases are in the tests.

The tests for folders are more limited than in the last version - if a de-duped folder has children the validator fails (which is bug 1293163).
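A minimal sketch of the "track both IDs" idea (the structures are illustrative, not the real Places/Sync APIs): because duping keys off the parent *name*, the local item's current parent and the parent named in the incoming record may differ, so both are flagged for upload.

```javascript
// When a local item is duped to an incoming record, flag for re-upload:
// 1) the folder the local item currently lives in, and
// 2) the folder the incoming record says it belongs to (if different).
function trackParentsOnDupe(localItems, modified, localDupeGUID, incoming, now) {
  const localParent = localItems[localDupeGUID].parent;
  modified[localParent] = now;
  if (incoming.parentid !== localParent) {
    modified[incoming.parentid] = now;
  }
}

const localItems = { baz: { parent: "folderA" } };
const modified = {};
trackParentsOnDupe(localItems, modified, "baz",
                   { id: "bar", parentid: "folderB" }, 1000);
// modified now flags both "folderA" and "folderB" for re-upload
```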
Attachment #8767850 - Attachment is obsolete: true
Comment on attachment 8767849 [details]
Bug 1276823 - ensure bookmarks engine tracks parents when de-duping incoming bookmarks.

https://reviewboard.mozilla.org/r/62240/#review67422

::: services/sync/modules/engines.js:1354
(Diff revision 3)
>     //
>     // If we find a duplicate, we change the local ID to the incoming ID and we
>     // refresh the metadata collected above. See bug 710448 for the history
>     // of this logic.
>     if (!existsLocally) {
>       let dupeID = this._findDupe(item);

Can we rename this to `localDupeGUID`?

::: services/sync/modules/engines/bookmarks.js:430
(Diff revision 3)
>       return mapped;
> -  }
> +  },
> +
> +  // Called when _findDupe returns a dupe item and the engine has decided to
> +  // switch the existing item to the new incoming item.
> +  _switchItemToDupe(dupeID, item) {

`(localDupeGUID, incomingItem)`

::: services/sync/modules/engines/bookmarks.js:449
(Diff revision 3)
> +    let localParentID = PlacesUtils.bookmarks.getFolderIdForItem(localID);
> +    let parentGUID = this._store.GUIDForId(localParentID);
> +    this._modified[parentGUID] = now;
> +
> +    // And we also add the parent as reflected in the incoming record as the
> +    // de-dupe process might have used an existing item in a different folder.

Firstly, I think you want to avoid tracking the root here. I'm sure we have safeguards, but…

Secondly, unconditional use of the remote parent worries me a little.

Tracking an unknown GUID will cause a deletion to be uploaded if the download phase of a sync completes successfully without seeing that GUID. Indeed, we use exactly that to upload the deletion for a duped-away local record.

Imagine that a partial write is seen here. Client A has uploaded Record X with parent Y, but has not yet uploaded Y. Client B downloads X, duping it to Record D with parent E. It'll now track D (deleted), X (has new parent), E (has new child), and Y (which doesn't exist locally). B uploads two deletions (D and Y) and an updated X and E.

A now syncs again. It downloads the deletion for Y and recursively deletes the folder, including any non-uploaded children. Oops.

Atomic uploads will eventually help with this, but in general, I think clients should only upload deletions for records that they know were deleted.

Here I think we should instead fetch the old local parent ID prior to `changeItemID` on line 436, then again after the change, and track those -- both local IDs, both safe.
Attachment #8767849 - Flags: review?(rnewman)
(In reply to Richard Newman [:rnewman] from comment #19)
> Firstly, I think you want to avoid tracking the root here. I'm sure we have
> safeguards, but…

TBH I can't see any such existing safeguards :/

> Secondly, unconditional use of the remote parent worries me a little.
>
> Tracking an unknown GUID will cause a deletion to be uploaded if the
> download phase of a sync completes successfully without seeing that GUID.
> Indeed, we use exactly that to upload the deletion for a duped-away local
> record.
>
> Imagine that a partial write is seen here.

Yes, thanks - that's very true - also tricky - see below.

> Here I think we should instead fetch the old local parent ID prior to
> `changeItemID` on line 436, then again after the change, and track those --
> both local IDs, both safe.

Actually, I don't think that's quite correct - changeItemID doesn't do any reparenting. The reparenting is done later in update(), which only happens if the local item isn't marked as changed (hence one test with the local item unmodified and another with it modified).

But, regardless of how we actually do that, it still leads to a problem WRT the local state vs the server state. Consider existing bookmark B1 in existing folder F1, then an incoming record B2 that we treat as a dupe of B1, but where the remote record has a parent F2 that does not exist locally. The correct thing to do here depends on being able to see into the future.

1) If the F2 record really doesn't exist on the server and never will, the correct thing to do would be to re-upload the B2 record to refer to an existing GUID.
2) If the F2 record doesn't currently exist on the server due to another client failing to upload it with B2, but will successfully upload it later, any action we take now will be "wrong".
3) If the F2 record does exist on the server (but we simply missed it somehow), we could try and fetch it by ID and apply it. Let's ignore this for now.

So the choice between doing (1) or (2) depends on what happens in the future. Further, any action assuming (2) *must* leave the server and the local client in an invalid state - we are going to reference B2 *somewhere* locally, so one of our existing local folders must reference it as a child. Thus, F1 will list B2 as a child, but B2 on the server will have F2 as the parent.

ISTM the only sane thing to do is (1) - and possibly (3) in a different bug. That other client that failed to upload F2 will then see B2 as being moved to F1 and a user will be sad. But this is an edge-case that will be even edgier with atomic writes, so maybe that's OK.

Does the above make sense, and if so, WDYT? (This bug is making my brain hurt - MFBT for me!)
Flags: needinfo?(rnewman)
We also go to lengths to reparent at the end of a sync to account for parents and children coming in different orders, and I'm fairly sure that any bookmarks that remain in unfiled after this process also will have mis-matched server records. If the parent ever arrives, we will *eventually* end up with consistent local and remote copies, but in general, this is a similar issue to the above - is our local state "hopefully eventually consistent" with the server, or "currently consistent with the server"? If the former, I'm struggling to define both "hopefully" and "eventually" in this context ;) /me gets another MFB!
(In reply to Mark Hammond [:markh] from comment #20)
> ISTM the only sane thing to do is (1) - and possibly (3) in a different bug.
> That other client that failed to upload F2 will then see B2 as being moved
> to F1 and a user will be sad.

There's no good solution here except to recognize this kind of inconsistency and take no action, which -- as you demonstrated by filing Bug 1294599! -- requires having a snapshot of the server rather than doing streaming record application.

In the mean time, the question becomes: assuming that we'll see F2 either later or in a subsequent sync, what's the most consistent thing to do?

The worst thing to do is to put B2 in Unsorted Bookmarks. Better is to put it in F1. It's likely that F1 will be duped to F2 later in the sync or in the next sync. But we don't want to upload a record for B2 until we have its parent sorted… and if that's a dupe, we have to wait for that, and so on up the tree -- we can't finalize the sync and upload adjusted records until we have the entire parent chain. And knowing that is a special case of proper tree merging.

So… dunno, this is hard. We pretty much have two choices: after a sync, make sure the server reflects local state; or, don't. Android does the former, and what that means is that an Android client immediately 'freezes' corruption. Desktop kinda does the latter, which allows for divergence and increasingly odd behavior. It's hard to imagine an option that we can take now other than (1), but the risk is concerning.

> But this is an edge-case that will be even
> edgier with atomic-writes, so maybe that's OK.

Not entirely so -- if you interrupt Firefox before it finishes processing incoming records (a partial read) then any actions taken during the processing, such as reassigning IDs, can't be damaging and must be replayable. (I wonder what the scope of that problem is?)
Flags: needinfo?(rnewman)
(In reply to Mark Hammond [:markh] from comment #21)
> in general, this is a
> similar issue to the above - is our local state "hopefully eventually
> consistent" with the server, or "currently consistent with the server"?

Heh, yes, exactly. Or the inverse: do we make the server consistent with our current state or not?

And this is why I use the bookmark engine's one-shot streaming record application as a cardinal example of how not to do syncing. It's just so fundamentally broken!
(In reply to Richard Newman [:rnewman] from comment #22)
> There's no good solution here except to recognize this kind of inconsistency
> and take no action, which -- as you demonstrated by filing Bug 1294599! --
> requires having a snapshot of the server rather than doing streaming record
> application.

Yeah :(

> In the mean time, the question becomes: assuming that we'll see F2 either
> later or in a subsequent sync, what's the most consistent thing to do?

The patch I'm about to push simply ignores these edge-cases - ie, some scenarios captured in the test do leave us with inconsistent client and server states, even when the parent does arrive later, primarily because our "reparenting" step also doesn't update the server with that info (eg, the record remained in the location it was in before it was duped, then later the "real" parent comes in, we reparent at that time, but don't update the record with the old location.)

These are real edge-cases though (ie, a dupe found in a different folder due to the parent names matching) and this patch doesn't make that worse, so I'm ignoring it here. I opened bug 1297955 which might go some way to fixing these, but as implied above, I think it might be best to reconsider how we dupe in those cases - IMO it probably leaves the user confused when de-duping causes an item to move to a completely different folder.

I suspect we need to rewrite the engine to be more in line with Android or iOS (ie, obtaining a local copy of the entire server state before working out how to apply new records) to actually fix some of these edge-cases. Fun times.
Gentle reminder that this is waiting for review from rnewman - it was probably missed in all the other review requests we are making :)
Will get to it today!
Status: NEW → ASSIGNED
Comment on attachment 8767849 [details]
Bug 1276823 - ensure bookmarks engine tracks parents when de-duping incoming bookmarks.

https://reviewboard.mozilla.org/r/62240/#review75224

::: services/sync/modules/engines/bookmarks.js:438
(Diff revision 4)
> +    // And mark the parent as being modified. Given we de-dupe based on the
> +    // parent *name* it's possible the item having its GUID changed has a
> +    // different parent from the incoming record.
> +    // So we need to find the GUID of the local parent.
> +    let now = this._tracker._now();
> +    let localID = this._store.idForGUID(incomingItem.id);

If `changeItemID` fails, this might fail, right? Is it possible for `changeItemID` to not result in a local record with the new GUID?

::: services/sync/modules/engines/bookmarks.js:441
(Diff revision 4)
> +    // So we need to find the GUID of the local parent.
> +    let now = this._tracker._now();
> +    let localID = this._store.idForGUID(incomingItem.id);
> +    let localParentID = PlacesUtils.bookmarks.getFolderIdForItem(localID);
> +    let localParentGUID = this._store.GUIDForId(localParentID);
> +    this._modified[localParentGUID] = now;

If the parent itself is subsequently duped, this has the handy side effect of uploading a deleted record for it.

::: services/sync/tests/unit/test_bookmark_duping.js:372
(Diff revision 4)
> +  // this is still unknown.)
> +  collection.insert(newParentGUID, encryptPayload({
> +    id: newParentGUID,
> +    type: "folder",
> +    title: "Folder 1",
> +    parentName: "A second folder",

Can we get two variants of this test, in which `newParentGUID` is supposed to dupe to `folder1_guid`, and the new folder arrives either before or after `newGUID`?
Attachment #8767849 - Flags: review?(rnewman) → review+
Comment on attachment 8767849 [details]
Bug 1276823 - ensure bookmarks engine tracks parents when de-duping incoming bookmarks.

https://reviewboard.mozilla.org/r/62240/#review75224

> If `changeItemID` fails, this might fail, right? Is it possible for
> `changeItemID` to not result in a local record with the new GUID?

changeItemID may throw, which will certainly stop us in our tracks. If I read the code correctly, it may succeed but not result in an item with the new GUID if the old GUID didn't exist - in which case we will fail attempting to look up the ID for the GUID we just failed to change. I believe this patch hasn't changed those semantics, and the failures should be "catastrophic" (in terms of the sync failing), so I think this is fine.

> Can we get two variants of this test, in which `newParentGUID` is supposed
> to dupe to `folder1_guid`, and the new folder arrives either before or after
> `newGUID`?

New test added.
Pushed by mhammond@skippinet.com.au: https://hg.mozilla.org/integration/autoland/rev/c42cb9b636fd ensure bookmarks engine tracks parents when de-duping incoming bookmarks. r=rnewman
Status: ASSIGNED → RESOLVED
Closed: 8 years ago
Resolution: --- → FIXED
Target Milestone: --- → Firefox 51