Closed Bug 615284 · Opened 14 years ago · Closed 14 years ago

download chunking needs to be more resilient against app shutdowns

Categories

(Firefox :: Sync, defect)

Type: defect
Priority: Not set
Severity: normal

Tracking

Status: RESOLVED FIXED
Tracking Status:
blocking2.0: betaN+
tracking-fennec: 2.0+

People

(Reporter: mconnor, Assigned: philikon)

References

Details

(Whiteboard: [softblocker][has patch][has review])

Attachments

(1 file, 1 obsolete file)

I've had an Android device running Fennec for four days and haven't finished a sync yet. Android kills background processes, and we may or may not be hitting OOM due to lack of GC inside the sync call (some conflicting reports). We need to:

* Sync more records per GET (I have 17k history records, so it's 340 GETs to sync everything as it stands).
* Periodically add remaining IDs to the tracker so we don't start from scratch on interrupt.
* Consider fetching only some cap of records on mobile (i.e. limit the query to 5k records, same as the initial 5k upload cap?).

These changes should help us catch up much quicker without really hurting our disk I/O story.
We need this ASAP.
blocking2.0: --- → beta8+
tracking-fennec: --- → 2.0b3+
The same thing has happened on one of my test accounts on the Nexus One. It syncs fine on a desktop build, but once I switch to mobile it remains "in progress" indefinitely.
(In reply to comment #0)
> * sync more records per GET (I have 17k history records, so it's 340 gets to
> sync everything, as it stands).

Yes, that's painful. We need to watch memory here, though (see bug 615167).

> * periodically add remaining IDs to the tracker so we don't start from scratch
> on interrupt.

I think we should probably do it the other way around: get the changedIDs from the engine (which probably uses the tracker, but it might not in the future) and add them to the tracker at the beginning of the sync. Then, as we fetch items in batches, we can remove successfully fetched items from the tracker (in batches, too). A sketch of that flow is below.
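A minimal sketch of that resumable flow, assuming a hypothetical fetchAndApplyBatch helper and batch size; this is illustrative, not the actual engine code:

    // Hypothetical sketch: seed the tracker up front, then trim it as
    // batches succeed, so an interrupted sync resumes from what remains.
    const BATCH_SIZE = 50;

    function processIncoming(engine) {
      let ids = engine.getChangedIDs();
      ids.forEach(function(id) { engine.tracker.addChangedID(id); });

      while (ids.length) {
        let batch = ids.splice(0, BATCH_SIZE);
        fetchAndApplyBatch(engine, batch);  // download + apply these records
        // Forget only the IDs we know were applied successfully.
        batch.forEach(function(id) { engine.tracker.removeChangedID(id); });
      }
    }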
Assignee: philipp → rnewman
blocking2.0: beta8+ → beta9+
tracking-fennec: 2.0b3+ → 2.0b4+
Dropping this from me, as I'm not actively working on the contents of this bug.
Assignee: rnewman → nobody
(In reply to comment #4)
> Dropping this from me, as I'm not actively working on the contents of this bug.

No probs, I already started messing around with this anyway (see comment 3).
Assignee: nobody → philipp
As per today's meeting, beta 9 will be a time-based release. Marking these all betaN+. Please move it back to beta9+ if you believe it MUST be in the next beta (i.e., trunk is in an unshippable state without this).
blocking2.0: beta9+ → betaN+
Fixing fields my automated script accidentally blanked. Apologies for the bugspam.
Whiteboard: [softblocker]
Summary: sync chunking needs to be more resilient against app shutdowns → Upload chunking needs to be more resilient against app shutdowns
tracking-fennec: 2.0b4+ → 2.0+
mconnor, can you comment on what the plan for this bug is? From what I read on IRC, this wasn't needing beta testers, and there are other bugs tracking perf fixes that are more important for testing. Also, tying bug dependencies to those bugs would be helpful. Thanks.
Effectively, this is about backing off on some of the changes to no longer write IDs we're fetching to disk (bug 609398). However, instead of wholesale reverting that, we'll just periodically flush things to the toFetch file, so that we don't have to redownload and dupe-check everything from scratch. This is 100% unrelated to other perf work.
Summary: Upload chunking needs to be more resilient against app shutdowns → download chunking needs to be more resilient against app shutdowns
Oh, this is about *down*load chunking? Your mentioning the tracker in comment 0 threw me off, and I thought this was about upload (a separate issue, but my comment 3 about upload still holds for that case).

(In reply to comment #9)
> Effectively, this is about backing off on some of the changes to no longer
> write IDs we're fetching to disk (bug 609398). However, instead of wholesale
> reverting that, we'll just periodically flush things to the toFetch file, so
> that we don't have to redownload and dupe-check everything from scratch.

Bringing back toFetch would also help us with bug 622762.

Then again, we might not need to bring back toFetch at all. We could sync items down in the order of their timestamp and update our lastSync timestamp as we process individual WBOs. That way we could restart exactly where we left off last time (see the sketch below). This might put a bit more stress on the server, though the records should mostly already be in input order, so the sort operation shouldn't be too expensive. Thoughts?
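A minimal sketch of that timestamp-ordered resumption; the server's newer= and sort=oldest query parameters are real, but fetchRecords and applyRecord are hypothetical stand-ins:

    // Hypothetical sketch: fetch oldest-first and advance lastSync per
    // record, so a killed sync resumes at the first unapplied WBO.
    function downloadSince(engine) {
      let records = fetchRecords({newer: engine.lastSync, sort: "oldest"});
      records.forEach(function(record) {
        applyRecord(engine, record);
        engine.lastSync = record.modified;  // high-water mark
      });
    }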
Not using sort order means we lose the advantage of syncing folders first, so we don't need to reparent on a large sync.
(In reply to comment #11)
> Not using sort order means we lose the advantage of syncing folders first, so
> we don't need to reparent on a large sync.

Good point! (Not that we do this right now on the desktop... bug 625295)
Blocks: 622762
Attached patch wip v1 (obsolete) (deleted) — Splinter Review
Bring back the toFetch file that was removed in bug 609398. We still sync all batches in one sync on mobile, so bug 609398 wasn't backed out entirely. Unlike before, toFetch is now saved on the next event loop tick so that we don't do synchronous I/O while syncing stuff down (see the sketch below). We still have bug 608757 for making Utils.json{Load|Save} async. I have a WIP for that, so if we decide the synchronous I/O (now at least not interfering with Places I/O and the like) is bad on mobile, I can always resurrect that patch.
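A minimal sketch of deferring the write to the next tick via the thread manager; the saveToFetch name and the Utils.jsonSave signature are assumed here:

    // Hypothetical sketch: dispatch the toFetch write to the next event
    // loop tick instead of blocking the current download batch.
    Components.utils.import("resource://gre/modules/Services.jsm");

    function saveToFetch(engine, ids) {
      Services.tm.currentThread.dispatch(function() {
        Utils.jsonSave("toFetch/" + engine.name, engine, ids);
      }, Components.interfaces.nsIThread.DISPATCH_NORMAL);
    }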
Attachment #506862 - Flags: review?(mconnor)
Whiteboard: [softblocker] → [softblocker][has patch][needs review mconnor]
Comment on attachment 506862 [details] [diff] [review]
wip v1

Hmm, we don't update lastSync until the end of processIncoming, when all records have been downloaded. So if the app gets shut down in the middle of processIncoming, we still refetch all records. Once we're in batching mode and saving to toFetch, we should update lastSync, not lastModified.
Attachment #506862 - Attachment description: v1 → wip v1
Attachment #506862 - Flags: review?(mconnor)
Attached patch v2 (deleted) — Splinter Review
This patch now also ensures that lastSync is fast-forwarded when in batch mode, so that on the next sync we resume where we left off rather than refetching everything (see the sketch below).
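A minimal sketch of fast-forwarding lastSync between batches; the batching loop, applyBatch, and BATCH_SIZE are hypothetical stand-ins for what the patch actually does:

    // Hypothetical sketch: after each batch, persist the remaining IDs
    // and bump lastSync, so a shutdown mid-sync loses at most one batch.
    function fetchBatches(engine, guidsToFetch) {
      while (guidsToFetch.length) {
        let batch = guidsToFetch.splice(0, BATCH_SIZE);
        let newest = applyBatch(engine, batch); // newest modified time applied
        engine.toFetch = guidsToFetch;          // flushed on the next tick
        engine.lastSync = newest;               // resume point if killed now
      }
    }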
Attachment #506862 - Attachment is obsolete: true
Attachment #506915 - Flags: review?(mconnor)
Sorry to rain on this parade, but loadToFetch calls Utils.jsonLoad, which does sync disk I/O. If we are doing all this work to avoid synchronous disk I/O in the first place, it seems rather silly to end up adding synchronous disk I/O.
Although it shouldn't be hard to make Utils.jsonLoad async. It would probably be a lot like bug 627772.
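A minimal sketch of what an async load could look like, using NetUtil.asyncFetch; the jsonLoadAsync name and callback shape are assumptions, not the eventual Utils API:

    // Hypothetical sketch: read and parse the JSON file off the
    // synchronous I/O path via NetUtil.jsm.
    Components.utils.import("resource://gre/modules/NetUtil.jsm");

    function jsonLoadAsync(file, callback) {
      NetUtil.asyncFetch(file, function(inputStream, status) {
        if (!Components.isSuccessCode(status)) {
          callback(null);
          return;
        }
        let data = NetUtil.readInputStreamToString(inputStream,
                                                   inputStream.available());
        callback(JSON.parse(data));
      });
    }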
(In reply to comment #16)
> Sorry to rain on this parade, but loadToFetch calls Utils.jsonLoad, which does
> sync disk I/O. If we are doing all this work to avoid synchronous disk I/O in
> the first place, it seems rather silly to end up adding synchronous disk I/O.

We're talking about much different scales here, though. Syncing down 1000 history items in batches of 50 will have us do a tiny bit of synchronous I/O 20 times, rather than N*1000 times (for some number N) like now.

(In reply to comment #17)
> Although it shouldn't be hard to make Utils.jsonLoad async.

It's not; it just hasn't been a priority up until now. As I said, we could decide to nominate bug 608757 if it turns out to be a problem. It's a small fix that can land between beta and RC.
(In reply to comment #18)
> We're talking about much different scales here, though. Syncing down 1000
> history items in batches of 50 will have us do a tiny bit of synchronous I/O
> 20 times, rather than N*1000 times (for some number N) like now.

Unfortunately, it looks like any synchronous I/O on mobile ends up having fairly bad pauses (for example, bug 627635). This is probably partly because on Android prior to Gingerbread (which uses ext4), the file system serializes all access to itself. If we are already doing I/O on a background thread, the main thread cannot do anything, because it's blocked by the background thread.
(In reply to comment #19)
> fairly bad pauses (for example, bug 627635). This is probably partly because
> on Android prior to Gingerbread (which uses ext4), the file system serializes
> all access to itself. If we are already doing I/O on a background thread, the
> main thread cannot do anything, because it's blocked by the background thread.

F-I-N-E. Have it your way! :p I uploaded patches for bug 608757 just now. Pls to be stopping with the rain on my parade nao. kthxbai.
Comment on attachment 506915 [details] [diff] [review]
v2

Looks good here, woo.
Attachment #506915 - Flags: review?(mconnor) → review+
Whiteboard: [softblocker][has patch][needs review mconnor] → [softblocker][has patch][has review]
Status: NEW → RESOLVED
Closed: 14 years ago
Resolution: --- → FIXED
Blocks: 627511
Component: Firefox Sync: Backend → Sync
Product: Cloud Services → Firefox