Closed Bug 802723 Opened 12 years ago Closed 4 years ago

Sync error logging during large failed upload/download drains battery

Categories

(Firefox for Android Graveyard :: Android Sync, defect, P5)

ARM
Android
defect

Tracking

(Not tracked)

RESOLVED INCOMPLETE

People

(Reporter: sanfordarmstrong, Unassigned)

References

Details

(Whiteboard: [sync:scale])

Latest FF17 beta in Google Play
T-Mobile G2 (HTC)
Android 2.3.4

1) Enable sync

Expected results:

Yay, sync is on and I can still use my phone all day.

Actual results:

Boo, sync is killing my battery within a few hours.


Normally my G2 gets a day or two of battery life.  I've been using FF for Android Betas for several versions, with sync enabled, and a recent update has started killing my battery within a few hours.

I installed BatteryMix so I could be sure, and it's definitely org.mozilla.firefox_beta that's been responsible for the massive drain.  I don't have to actually launch the app.  I've tested for three days, never using it.  The first day I kept sync enabled until my battery got too low.  The second day I kept sync disabled and everything was back to normal.  Today I reenabled sync and had to disable after ~4 hours took me from full charge to < 20% battery life.

Please let me know if there's anything I can do to help debug this.  I love FF on Android and sync is obviously the killer feature!
Component: General → Android Sync
Product: Firefox for Android → Mozilla Services
Version: Firefox 17 → unspecified
Sandy, thanks for your detailed report.  I'm sorry Android Sync is killing your battery!

We work hard to keep Sync from eating too much battery, so we'd really like to get to the bottom of your issue.  What I could really use is logs from your device: can you try to follow the instructions at

http://160.twinql.com/how-to-file-a-good-android-sync-bug

for capturing logs, and either attach them to this bug or mail them to me:

nalexander AT mozilla DOT com

For this type of issue, the time you'd need to log might be too big.  If you can check your phone and see how frequently it seems to be syncing (Android will say "last synced at XXX") that would help a lot too.

Thanks!
Flags: needinfo?(sanfordarmstrong)
Taking while for sync to complete, but so far in the log I see this repeated over and over again:

10-18 05:47:38.890 W/FxSync  (17538): firefox_beta :: SynchronizerSession :: Second RecordsChannel onFlowStoreFailed. Logging remote store error.
10-18 05:47:38.890 W/FxSync  (17538): org.mozilla.gecko.sync.Server11PreviousPostFailedException
10-18 05:47:38.890 W/FxSync  (17538): 	at org.mozilla.gecko.sync.repositories.Server11RepositorySession$RecordUploadRunnable.run(Server11RepositorySession.java:585)
10-18 05:47:38.890 W/FxSync  (17538): 	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1088)
10-18 05:47:38.890 W/FxSync  (17538): 	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:581)
10-18 05:47:38.890 W/FxSync  (17538): 	at java.lang.Thread.run(Thread.java:1019)
Flags: needinfo?(sanfordarmstrong)
Oh, I also just noticed that if I go in my Android Sync preferences and selected Firefox Beta, the date given as the last sync is 10/14/2012 3:36PM.  So I guess that means that the many syncs on the 10/15 and 10/17 that killed my battery were not successful.

Log is still going, occasionally seeing references to gecko, will post when done, or if you have something specific you want me to look for. 4MB so far.
Log uploaded here: http://armstrong-clan.net/dump/adblog

13MB :-)
(In reply to Sandy Armstrong from comment #4)
> Log uploaded here: http://armstrong-clan.net/dump/adblog
> 
> 13MB :-)

This helps a lot, and definitely explains your battery drain: all that logging is not so awesome.  One action item will be to deal more intelligently with that.

The relevant log line is

  4300:10-18 05:46:27.841 W/FxSync  (17538): Caused by: java.lang.IllegalStateException: Connection pool shut down

... which should never happen.  Also, I see

  18346:10-18 05:58:03.530 W/FxSync  (17733): javax.net.ssl.SSLException: Write error: ssl=0x66cc58: I/O error during system call, Broken pipe

Sandy, can you tell me a little bit about your set up?

* Were you syncing on cellular data or WiFi?

* Do you have a "large" profile in Firefox for Mobile (lots of bookmarks, lots of browsing history)?  (Estimate is totally fine.)

* Same question, but for Firefox on your desktop machine.

Also, are you willing to try Firefox Aurora for some comparison data?  Aurora and Beta sync separately, so you could install Aurora, see how it works for you, and then remove it without affecting your Beta install at all.

Thanks again for getting back to me so quickly!
Flags: needinfo?(sanfordarmstrong)
(In reply to Nick Alexander :nalexander from comment #5)
> Sandy, can you tell me a little bit about your set up?
> 
> * Were you syncing on cellular data or WiFi?

Wifi, at home, very good reception (I do all of my calls over Wifi when I'm at home).

> * Do you have a "large" profile in Firefox for Mobile (lots of bookmarks,
> lots of browsing history)?  (Estimate is totally fine.)

I'm not sure how to answer this question well for mobile.  I think my mobile profile is probably fairly small.  I mostly use the desktop and then occasionally load something up on my phone to read or lookup if I'm away.  I appear to only have desktop bookmarks show up on my phone, so I guess I have no mobile bookmarks.

I do more mobile browsing on my Nexus 7, where I also have FF Beta installed, but it is not having any battery or sync issues.

> * Same question, but for Firefox on your desktop machine.

Yeah, a ton of history.  My quota usage according to Sync prefs on the desktop is:

Total: 389MB (not sure why this doesn't add up from what's below)
Bookmarks: 89.8KB
History: 9.9MB
Passwords: 16.6KB
Preferences: 20.8KB
Tabs: 26.7KB

> Also, are you willing to try Firefox Aurora for some comparison data? 
> Aurora and Beta sync separately, so you could install Aurora, see how it
> works for you, and then remove it without affecting your Beta install at all.

Sure.  I can pair an Aurora install with my FF Beta desktop client?

> Thanks again for getting back to me so quickly!

No problem!
Flags: needinfo?(sanfordarmstrong)
(In reply to Sandy Armstrong from comment #6)
> (In reply to Nick Alexander :nalexander from comment #5)
> > Sandy, can you tell me a little bit about your set up?
> > 
> > * Were you syncing on cellular data or WiFi?
> 
> Wifi, at home, very good reception (I do all of my calls over Wifi when I'm
> at home).

Interesting.  I expected you to hit upload errors due to bad connections, but it sounds like that's not likely.

> > * Do you have a "large" profile in Firefox for Mobile (lots of bookmarks,
> > lots of browsing history)?  (Estimate is totally fine.)
> 
> I'm not sure how to answer this question well for mobile.  I think my mobile
> profile is probably fairly small.

Thanks.

> > Also, are you willing to try Firefox Aurora for some comparison data? 
> > Aurora and Beta sync separately, so you could install Aurora, see how it
> > works for you, and then remove it without affecting your Beta install at all.
> 
> Sure.  I can pair an Aurora install with my FF Beta desktop client?

You can, but I would suggest you create a new FF Beta desktop profile (http://support.mozilla.org/en-US/kb/profile-manager-create-and-remove-firefox-profiles) and a new Sync account and do some testing with those two.  I'd hate to corrupt your desktop experience trying to improve your phone experience.

So if you could set up desktop sync, with a new account in a new profile, and try Aurora on mobile, we could see if that works and start working backwards to find the issue.
Status: UNCONFIRMED → NEW
Ever confirmed: true
(In reply to Nick Alexander :nalexander from comment #7)
> So if you could set up desktop sync, with a new account in a new profile,
> and try Aurora on mobile, we could see if that works and start working
> backwards to find the issue.

Okay, that might take a little more time to get back to you on, but I'll do it when I can.
This explains matters:

 FATAL EXCEPTION: pool-45-thread-1
 java.lang.OutOfMemoryError
 	at org.json.simple.parser.Yylex.<init>(Yylex.java:231)
 	at org.json.simple.parser.JSONParser.<init>(JSONParser.java:34)
 	at org.mozilla.gecko.sync.ExtendedJSONObject.parse(ExtendedJSONObject.java:44)
 	at org.mozilla.gecko.sync.ExtendedJSONObject.parse(ExtendedJSONObject.java:50)
 	at org.mozilla.gecko.sync.net.SyncResponse.jsonBody(SyncResponse.java:83)
 	at org.mozilla.gecko.sync.net.SyncResponse.jsonObjectBody(SyncResponse.java:102)
 	at org.mozilla.gecko.sync.repositories.Server11RepositorySession$RecordUploadRunnable.handleRequestSuccess(Server11RepositorySession.java:460)


Then the sync gets killed:

  D/SyncManager( 1294): canceling and rescheduling sync because it ran too long: authority: org.mozilla.firefox_beta.db.browser account: Account {name=sanfordarmstrong@gmail.com, type=org.mozilla.firefox_sync} extras: [] syncSource: 0 when: 391555344 expedited: false


The error Nick observed in Comment 5 is a consequence of this plus Bug 708521:

 D/SyncManager( 1294): failed sync operation authority: org.mozilla.firefox_beta.db.browser account: Account {name=sanfordarmstrong@gmail.com, type=org.mozilla.firefox_sync} extras: [upload=true ] syncSource: 1 when: 391594784 expedited: false, SyncResult: syncAlreadyInProgress: true stats []
 D/SyncManager( 1294): retrying sync operation as a two-way sync because an upload-only sync encountered an error: authority: org.mozilla.firefox_beta.db.browser account: Account {name=sanfordarmstrong@gmail.com, type=org.mozilla.firefox_sync} extras: [] syncSource: 1 when: 391856322 expedited: false
 W/FxSync  (17538): firefox_beta :: SyncAdapter :: Waiting on sync monitor interrupted.
 W/FxSync  (17538): java.lang.InterruptedException
 W/FxSync  (17538): 	at java.lang.Object.wait(Native Method)
 W/FxSync  (17538): 	at java.lang.Object.wait(Object.java:358)
 W/FxSync  (17538): 	at org.mozilla.gecko.sync.syncadapter.SyncAdapter.onPerformSync(SyncAdapter.java:371)
 W/FxSync  (17538): 	at android.content.AbstractThreadedSyncAdapter$SyncThread.run(AbstractThreadedSyncAdapter.java:164)
 D/class ch.boye.httpclientandroidlib.impl.conn.tsccm.ThreadSafeClientConnManager(17538): Closing expired connections
 D/class ch.boye.httpclientandroidlib.impl.conn.tsccm.ConnPoolByRoute(17538): Closing expired connections
 D/class ch.boye.httpclientandroidlib.impl.conn.tsccm.ThreadSafeClientConnManager(17538): Shutting down
 D/class ch.boye.httpclientandroidlib.impl.conn.DefaultClientConnection(17538): Connection closed
 D/class ch.boye.httpclientandroidlib.impl.conn.DefaultClientConnection(17538): Connection closed
 D/class ch.boye.httpclientandroidlib.impl.conn.DefaultClientConnection(17538): Connection shut down
 D/class ch.boye.httpclientandroidlib.impl.conn.DefaultClientConnection(17538): Connection closed
 W/FxSync  (17538): firefox_beta :: RecordUploadRunnable :: Got request error.
 W/FxSync  (17538): java.io.IOException
 W/FxSync  (17538): 	at org.mozilla.gecko.sync.net.BaseResource.execute(BaseResource.java:271)
 W/FxSync  (17538): 	at org.mozilla.gecko.sync.net.BaseResource.retryRequest(BaseResource.java:292)
 W/FxSync  (17538): 	at org.mozilla.gecko.sync.net.BaseResource.execute(BaseResource.java:275)
 W/FxSync  (17538): 	at org.mozilla.gecko.sync.net.BaseResource.go(BaseResource.java:316)
 W/FxSync  (17538): 	at org.mozilla.gecko.sync.net.BaseResource.post(BaseResource.java:336)
 W/FxSync  (17538): 	at org.mozilla.gecko.sync.net.SyncStorageRequest.post(SyncStorageRequest.java:170)
 W/FxSync  (17538): 	at org.mozilla.gecko.sync.repositories.Server11RepositorySession$RecordUploadRunnable.run(Server11RepositorySession.java:607)
 W/FxSync  (17538): 	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1088)
 W/FxSync  (17538): 	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:581)
 W/FxSync  (17538): 	at java.lang.Thread.run(Thread.java:1019)
 W/FxSync  (17538): Caused by: java.lang.IllegalStateException: Connection pool shut down


We get an OOM, but we don't terminate the sync. Our threads get interrupted after 5 minutes, which breaks out of a wait, which causes a loop to finish and spew log output because everything died while they were waiting.

There are seven or 8 OOMs happening throughout this log, each of which then causes cascading failures, which generate a lot of log output.


Sandy, how much free memory does your phone say it has? My suspicion is simply that Sync is failing to complete because you're running out of memory, and it's just not handling that case well.
Flags: needinfo?(sanfordarmstrong)
(In reply to Richard Newman [:rnewman] from comment #9)
> Sandy, how much free memory does your phone say it has? My suspicion is
> simply that Sync is failing to complete because you're running out of
> memory, and it's just not handling that case well.

Settings -> Applications -> Running Services says I have ~100MB free.

Running the Memory Usage app from Google Play it says:
Total: 368MB
Inact: 157MB
Thres: 24MB
Avail 59MB
Act: 148MB
Low: No

Wikipedia says the G2 is a 512MB device: http://en.wikipedia.org/wiki/HTC_Desire_Z
Flags: needinfo?(sanfordarmstrong)
Nick: is it still worth me making an Aurora install with a fresh profile?  Or is it clear that the problem is memory-related, and so testing a new profile would not be helpful?

For what it's worth, FF Beta sync works fine with this same account on my Nexus 7.
(In reply to Sandy Armstrong from comment #11)
> Nick: is it still worth me making an Aurora install with a fresh profile? 
> Or is it clear that the problem is memory-related, and so testing a new
> profile would not be helpful?
> 
> For what it's worth, FF Beta sync works fine with this same account on my
> Nexus 7.

Hi Sandy,

I have no reason to think newer Sync versions are less resource hungry, so no, I don't think additional testing is needed.  (I'm irritated that I missed the memory log errors when I looked -- sorry.)

I think we'll take this ticket as, "improve our logging to not spew so badly".  We are trying to support ARMv6 devices with 256M of RAM, so it's possible we'll move on memory errors like yours.  Until then, sorry that Sync brings down your hardware!
Summary: Sync is massive battery drain in latest beta → Sync error logging during large failed upload/download drains battery
Nick: I was about to ask if there was any reason why my quota usage was so high, and if that could represent some sort of backend bug.  Pasting from above:

Total: 389MB (not sure why this doesn't add up from what's below)
Bookmarks: 89.8KB
History: 9.9MB
Passwords: 16.6KB
Preferences: 20.8KB
Tabs: 26.7KB

But then I rechecked my quota usage and today it is down to 10MB (which does add up).  I am going to see if sync is working now.
Woo!  It seemed to take 20 minutes and 9% of my battery life, but sync settings now say that I have had a successful sync for the first time in 5 days (and I can see my desktop tabs in FF on Android now).

I'll let you guys decide if a server bug needs to be opened about my account growing so large in the first place.
Sigh, looks like I spoke to soon.  My account is now at 31.4MB (according to my desktop FF) and FF on Android is once again devouring my battery.  Can't check logs now.
Pretty confident that the quota report was wrong, rather than having uploaded 400MB!
(In reply to Richard Newman [:rnewman] from comment #16)
> Pretty confident that the quota report was wrong, rather than having
> uploaded 400MB!

It seems to change every time I look at it.  So I dunno.  One working sync in 5 days is still weird (and free memory seems unchanged).
(In reply to Nick Alexander :nalexander from comment #12)

> I think we'll take this ticket as, "improve our logging to not spew so
> badly".  We are trying to support ARMv6 devices with 256M of RAM, so it's
> possible we'll move on memory errors like yours.  Until then, sorry that
> Sync brings down your hardware!

There are a few other things we could do here:

* Extend our local backoff interval if we die catastrophically; at least we won't keep hammering the device.
* If we get an OOM, enter a resource-constrained state, where we sync only one collection at a time. Special case of tuning for available resources.
* Track collections that fail due to OOM, and do the others first (until they're done), then last (so we eventually do try the failing connection with a possibly clean VM).


A really, really important thing to note: Android's GC doesn't compact. That's why we can end up with OOM in JSON parsing -- we get a big blob of text, we need to allocate a large chunk to process it, but the (ample!) free memory is too fragmented to allocate.

We *might* still see a benefit from running System.gc() at various points — between each engine, for example, or after each record batch. But otherwise we just have to hope we get lucky and get a fresh VM at some point.

That's probably what Sandy described in Comment 14 -- the process got restarted, so we got a fresh heap and were lucky enough to complete a sync before it fragmented.

Running engines in a different order (biggest first after failure, so it gets a clean slate) might get past a logjam.
> A really, really important thing to note: Android's GC doesn't compact.
> That's why we can end up with OOM in JSON parsing -- we get a big blob of
> text, we need to allocate a large chunk to process it, but the (ample!) free
> memory is too fragmented to allocate.

So, this is interesting, but I'm not sure it's relevant.  What do you think these large JSON blobs are?  I'm assuming we have them, since we fail in YYLex.java fairly frequently, but it was my understanding that the "average" record was something like 500 bytes.

I just did some testing, and our application/newlines streaming does in fact stream, so we should only be accumulating relatively small JSON objects.  The Sync server does toss back a large chunk of state after upload, but it *should* just be an array of GUIDs.
Severity: critical → major
Depends on: 818218
Priority: -- → P2
Whiteboard: [sync:scale]
Product: Mozilla Services → Android Background Services
Depends on: 726055
Blocks: 967996
On one hand, we still do a terrible job with logging in case of errors. On another hand, it's been 5 years, and modern devices might just not care (as much).

We also do have a considerate rate limiting for how often we'd sync, but we now also have restart triggering in case of failed collections.
Severity: major → normal
Priority: P2 → P4
Product: Android Background Services → Firefox for Android
Re-triaging per https://bugzilla.mozilla.org/show_bug.cgi?id=1473195

Needinfo :susheel if you think this bug should be re-triaged.
Priority: P4 → P5
We have completed our launch of our new Firefox on Android. The development of the new versions use GitHub for issue tracking. If the bug report still reproduces in a current version of [Firefox on Android nightly](https://play.google.com/store/apps/details?id=org.mozilla.fenix) an issue can be reported at the [Fenix GitHub project](https://github.com/mozilla-mobile/fenix/). If you want to discuss your report please use [Mozilla's chat](https://wiki.mozilla.org/Matrix#Connect_to_Matrix) server https://chat.mozilla.org and join the [#fenix](https://chat.mozilla.org/#/room/#fenix:mozilla.org) channel.
Status: NEW → RESOLVED
Closed: 4 years ago
Resolution: --- → INCOMPLETE
Product: Firefox for Android → Firefox for Android Graveyard
You need to log in before you can comment on or make changes to this bug.