Closed Bug 985128 · Opened 11 years ago · Closed 7 years ago

WebappsUpdateTimer.js attempts to hit network during test runs, causing intermittent failures

Categories

(Firefox OS Graveyard :: Runtime, defect)

Hardware: ARM
OS: Gonk (Firefox OS)
Type: defect
Priority: Not set
Severity: normal

Tracking

(Not tracked)

RESOLVED WONTFIX

People

(Reporter: ahal, Unassigned)

References

(Blocks 1 open bug)

Details

According to philor, if this timer [1] fires during a mochitest run, it tries to hit the network and causes a test failure. I haven't had a chance to look into this much. philor, could you elaborate on anything I'm missing and add this as a dependency to any other intermittent bugs you think it might be causing? [1] http://mxr.mozilla.org/mozilla-central/source/b2g/components/WebappsUpdateTimer.js#27
Flags: needinfo?(philringnalda)
This log seems to corroborate philor's theory: https://tbpl.mozilla.org/php/getParsedLog.php?id=35954437&full=1&branch=try#error0 In it, notice that the webapps update timer was notified just before the tests started timing out.
Of course, it can't be as simple as "if it fires, it will hang." There's the combination of bug 970239 and bug 983015, where it does appear to be the intermittent cause of the intermittent hang in the inputmethod tests in mochitest-7. There's bug 975867, where it appears to intermittently hang startup. And there's https://tbpl.mozilla.org/?tree=Try&rev=94f2424bf18b, which strongly suggests that it really is webapps-update, user-agent-updates, or both doing it. But there's also the fact that it fires on every run, in every mochitest chunk, and as far as I know only hangs mochitest-7. And there's the ever-so-fun fact from that Try push that (unless I accidentally disabled something else while I was beating on network access with a club) there's actually a test in mochitest-3 and another in mochitest-5 that depend on webapps-update having run in order to work at all. (I'm pretty sure that last point isn't caused by any of the other things I disabled: I accidentally did a push without having neutered webapps-update, and in that push neither test failed, but mochitest-7 did hang.)
Blocks: 983015, 975867
Flags: needinfo?(philringnalda)
Do we have any theories as to why hitting the network would cause all of this? I'm not sure if I understand our working theory here very well, and am really curious to know more!
It's possible that the "hitting the network" part is unrelated. I believe what we know is:
A) For the intermittents in bug 970239, the webapps-update-timer and user-agent-updates-timer fire right before the tests start timing out.
B) This patch [1] (which disables the timers) seems to fix the intermittents, though the patch also causes some other problems.
I think the "hitting the network" part is just an assumption we jumped to, since the webapps-update-timer does hit the network and that has been a source of failures in the past.
[1] http://mxr.mozilla.org/mozilla-central/source/b2g/components/WebappsUpdateTimer.js#27
Thanks for the explanation. Looking at that code, it doesn't actually hit the network, at least not synchronously: <http://mxr.mozilla.org/mozilla-central/source/b2g/components/WebappsUpdater.jsm#60> I think we should try to understand what is going on here before disabling app updates in our tests. It _could_ be that we're doing something destructive to the apps that are currently running, or something similar. But this sounds like the kind of issue that might bite our users too.
Hitting the network is guilty until proven innocent, not the other way around (and it's not possible to prove innocence), as you know very well from zombocom. But you're probably right, and I should find some other short way of describing this busted behavior. I *think* (without ever having traced the connection to be sure) that the URL we fetch updates from comes from ping.manifestURL. If so, when I set that to localhost but didn't comment out the call to WebappsUpdater.updateApps(), it didn't solve my problems. So, perhaps, if that's the URL it hits, then s/attempts to hit network/is of and by itself a disruption to tests and a cause of hangs/.
Oh, except those m3 and m5 failures on my first Try run say that ping.manifestURL is not it. So maybe someone is going to have to actually install Wireshark and an emulator build in a VM, run mochitests, and see exactly what it hits.
(In reply to comment #6)
> Hitting the network is guilty until proven innocent, not the other way around
> (and it's not possible to prove innocence), as you know very well from
> zombocom.

Fair enough! :-) I *do* remember that glorious day when I debugged that problem for like 12 hours! One simple thing for someone to try is turning on NSPR logging in netwerk/ and having a look at it. That should be much easier than resorting to Wireshark etc.
Also interesting: https://tbpl.mozilla.org/php/getParsedLog.php?id=36205115&tree=Cedar is mochitests on the emulator-jb build, hanging in the very first test (https://tbpl.mozilla.org/php/getParsedLog.php?id=36277523&tree=Cedar hangs in the next test when that first test is disabled). https://tbpl.mozilla.org/php/getParsedLog.php?id=36347229&tree=Cedar shows them still hanging in the very first test with only the call to WebappsUpdater.updateApps() commented out. https://tbpl.mozilla.org/php/getParsedLog.php?id=36357267&tree=Cedar shows them actually getting quite a way in (with a timeout in the test that was running when the user-agent-updates-timer fired, but at least not a hang) with my full set of pref changes, which "disable anything that looks even vaguely like the network-hitting sort of things that we disabled in desktop test harnesses years ago, and got enormous gains in test reliability from." And if, as I strongly suspect, we don't have any docs on "how do I, for various platforms/products, turn on and then see NSPR logging for network accesses to hosts other than localhost/mochi.test/etc. in a Try push," we certainly should. That assumes it's possible, and that it wouldn't produce 100K lines of mostly example.com and mochi.test to dig through, stopping when the log overflows the maximum size.
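One way around the "100K lines of mostly example.com and mochi.test" problem is to post-filter the log instead of reading it. HTTP logging via NSPR is usually enabled with environment variables such as NSPR_LOG_MODULES=nsHttp:5 and NSPR_LOG_FILE=<path>. The Python sketch below is only illustrative. It assumes URLs appear verbatim in the log lines, since the exact nsHttp line format isn't shown in this bug. The allowlist of harness hosts (including example.org) and the function names are hypothetical. It counts every host that isn't localhost, 127.0.0.1, mochi.test, example.com, example.org, or a subdomain of one of these.

```python
import re
from collections import Counter

# Hypothetical allowlist of hosts the mochitest harness is expected to use;
# adjust to whatever the harness actually serves.
EXPECTED_HOSTS = ("localhost", "127.0.0.1", "mochi.test", "example.com", "example.org")

# Assumption: URLs appear verbatim somewhere in each log line.
# Captures the host part (stops at ':', '/', or any non-host character).
URL_HOST_RE = re.compile(r"\b(?:https?|wss?)://([A-Za-z0-9.\-]+)")


def is_expected(host):
    """Return True if host is an allowlisted host or a subdomain of one."""
    return any(host == h or host.endswith("." + h) for h in EXPECTED_HOSTS)


def unexpected_hosts(lines):
    """Count occurrences of non-allowlisted hosts across the given log lines."""
    counts = Counter()
    for line in lines:
        for raw_host in URL_HOST_RE.findall(line):
            host = raw_host.lower().rstrip(".")  # normalize case and trailing dot
            if not is_expected(host):
                counts[host] += 1
    return counts


if __name__ == "__main__":
    import sys

    # Usage: python filter_log.py nspr.log
    with open(sys.argv[1], errors="replace") as f:
        for host, n in unexpected_hosts(f).most_common():
            print(f"{n:6d}  {host}")
```

Because it only reports host counts, the output stays small even when the raw log is huge. It can't help if the log has already been truncated at its maximum size, though.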
Blocks: 984620
I recently did some hacks in bug 968200 to turn on NSPR logging, but you're right: the maximum log size *will* bite us if we attempt that...
Blocks: 1001246
Closing all intermittent test failures for Firefox OS (since we're not focusing on it anymore). Please reopen if my search included your bug by mistake.
Firefox OS is not being worked on
Status: NEW → RESOLVED
Closed: 7 years ago
Resolution: --- → WONTFIX