Closed Bug 934938 Opened 11 years ago Closed 11 years ago

Intermittent ftp.m.o "ERROR 503: Server Too Busy" during download-and-extract step

Categories

(Infrastructure & Operations Graveyard :: WebOps: Product Delivery, task, P1)

Tracking

(Not tracked)

RESOLVED FIXED

People

(Reporter: cbook, Unassigned)


Details

Ubuntu VM 12.04 mozilla-central pgo test jetpack on 2013-11-05 02:40:55 PST for push 770de5942471
slave: tst-linux32-ec2-021

Not sure if there is anything we can do about this.

https://tbpl.mozilla.org/php/getParsedLog.php?id=30130965&tree=Mozilla-Central

--2013-11-05 02:41:19--  http://ftp.mozilla.org/pub/mozilla.org/firefox/tinderbox-builds/mozilla-central-linux-pgo/1383638406/firefox-28.0a1.en-US.linux-i686.tar.bz2
Resolving ftp.mozilla.org (ftp.mozilla.org)... 63.245.215.46
Connecting to ftp.mozilla.org (ftp.mozilla.org)|63.245.215.46|:80... connected.
HTTP request sent, awaiting response... 503 Server Too Busy
2013-11-05 02:41:19 ERROR 503: Server Too Busy.
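A minimal Python 3 sketch of how a download step could treat a 503 as transient and retry a few times before failing the job; this is an illustration only, not the actual buildbot/mozharness download-and-extract logic, and the helper name is hypothetical.

import time
import urllib.error
import urllib.request

def download_with_retry(url, dest, attempts=5, backoff=30):
    """Retry a download when the server answers 503 Server Too Busy."""
    for attempt in range(1, attempts + 1):
        try:
            urllib.request.urlretrieve(url, dest)
            return
        except urllib.error.HTTPError as err:
            # Treat only 503 as transient; re-raise anything else, or give up
            # once the retry budget is exhausted.
            if err.code != 503 or attempt == attempts:
                raise
            time.sleep(backoff * attempt)  # simple linear backoff between tries

download_with_retry(
    "http://ftp.mozilla.org/pub/mozilla.org/firefox/tinderbox-builds/"
    "mozilla-central-linux-pgo/1383638406/firefox-28.0a1.en-US.linux-i686.tar.bz2",
    "firefox-28.0a1.en-US.linux-i686.tar.bz2",
)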
Assignee: relops → server-ops-webops
Component: RelOps → WebOps: Product Delivery
QA Contact: arich → nmaul
:Tomcat - could you help us find other occurrences of these HTTP 503s? You mention that you're seeing them intermittently; did this just start recently, or have you observed it for some time now? Any more details you can provide would be helpful.
Flags: needinfo?(cbook)
Hardware: x86 → All
(In reply to Chris Turra [:cturra] from comment #1)
> :Tomcat - could you help us find other occurrences of these HTTP 503s? You
> mention that you're seeing them intermittently; did this just start recently,
> or have you observed it for some time now? Any more details you can provide
> would be helpful.

Hey Chris, so far this is the first time I have seen this, and TBPL has also reported it only once. Ed Morley mentioned that 503s on ftp.m.o were seen in the past too (and that if they happen frequently, the trees will be closed); maybe he has a history of this issue here. :)
Flags: needinfo?(cbook) → needinfo?(emorley)
We intermittently see 503s (maybe a few times a week), and often don't file, since the issue has resolved itself by the time it is spotted. However, it seems worthwhile to track them in this bug, so we can see if there is a pattern (e.g. a logrotate cron job causing issues, or other similar things we've had happen in the past).
Flags: needinfo?(emorley)
:edmorley - I agree it's worth looking into, but I'm going to need more information to do that. Can you help me track down dates/times when you have observed these in the past? And if there is any header information from those failed attempts, that would also be useful.
Flags: needinfo?(emorley)
Not easily, sadly. What I meant in comment 3 is that this bug is what we will track them in from this point forward - i.e. no action is required here until we have sufficient data points.
Flags: needinfo?(emorley)
Summary: Intermittent ERROR 503: Server Too Busy. on ftp.m.o → Intermittent ftp.m.o "ERROR 503: Server Too Busy" or "command timed out: 1200 seconds without output, attempting to kill" during download-and-extract step
This most recent episode appears to only be hitting EC2 slaves, which makes me think that this is our recurring link-to-AWS-is-slow issue. I'm getting 12M/sec now though, so it might be over already.
(In reply to Ben Hearsum [:bhearsum] from comment #8)
> This most recent episode appears to only be hitting EC2 slaves, which makes
> me think that this is our recurring link-to-AWS-is-slow issue. I'm getting
> 12M/sec now though, so it might be over already.

Have all the failures noted in this bug been from EC2, or have any been from a Mozilla DC?

Dropping to normal for now.
Severity: blocker → normal
Raising severity since this spiked last night.
Severity: normal → major
(In reply to Carsten Book [:Tomcat] from comment #583)
> Raising severity since this spiked last night.

Raising the severity is fine, but that also causes this bug to page the on-call sysadmin. Reading the comments here, I have no idea what action needs to be taken.

-> P1:normal so it doesn't page.
Severity: major → normal
Priority: -- → P1
What to do is say: "in its current state, this bug is purely dependent on bug 957502, so there's no point in twiddling flags here."
Depends on: 957502
Depends on: 961030
I would like to request that we stop sending these TBPL robot comments to this bug. It's great that we're tracking the frequency of these timeouts, and I firmly believe we should continue to do so, but I don't think that filling up a bug with comments is the right way to do it. Additionally, while this information is useful, there are a couple of dependent bugs here that have been identified as the root cause (bugs 957502 & 961030). Might it be better to track the frequency of timeouts against those instead?
Flags: needinfo?(emorley)
There isn't a way to stop them on a per-bug basis, and we need to be able to star these failures, so there's really no way to avoid it. For frequently-occurring bugs like this, we recommend leaving this bug as a dumping ground for stars and using dependencies for investigating and fixing the underlying causes.
Flags: needinfo?(emorley)
I switched ftp.mozilla.org away from Dynect DNS load balancing to simple DNS round robin. This should result in better load balancing between the two IPs hosting ftp.mozilla.org.

The suspected problem solved by this is a thundering herd: many (all?) of the releng nodes downloading files from ftp.mozilla.org do so in a very bursty fashion, and they all use the same DNS resolvers. Dynect only returns one record, not both. The effect is that all of these machines burst all at once to the same IP and (I think) hit its 2 Gbps data transfer cap (licensing). That's why there are two IPs in the first place: to get up to 4 Gbps.

DNS round robin returns both IPs and leaves it up to the client to decide which to use. Thanks to the law of large numbers, this results in a very even distribution of traffic between the two IPs, because most operating systems will randomize (or otherwise alternate) which IP in a result set they use.

Since making this change, traffic has not yet been high enough to reach the point where we think the problem was triggering, so at the moment I can't say for certain how effective this change is. I can say the bandwidth distribution between the two nodes is much more even and consistent; what I can't say is whether this will actually cure the problem reported here.
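For illustration, a minimal Python sketch of the client-side selection that round robin relies on (an assumption about typical client behaviour, not a description of the releng tooling): the resolver returns every A record and each client picks one, so across a large fleet connections spread roughly evenly over the available IPs.

import random
import socket

def pick_address(hostname, port=80):
    """Resolve all A records for hostname and pick one at random."""
    infos = socket.getaddrinfo(hostname, port, socket.AF_INET, socket.SOCK_STREAM)
    addresses = sorted({info[4][0] for info in infos})
    return random.choice(addresses)

# With DNS round robin in place, repeated calls spread connections across
# both ftp.mozilla.org IPs rather than piling onto a single one.
print(pick_address("ftp.mozilla.org"))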
(In reply to Jake Maul [:jakem] from comment #1519)
> Since making this change, traffic has not yet been high enough to reach the
> point where we think the problem was triggering, so at the moment I can't say
> for certain how effective this change is. I can say the bandwidth distribution
> between the two nodes is much more even and consistent; what I can't say is
> whether this will actually cure the problem reported here.

I'm leaning towards it not being the cure.
In order to prevent further tbpl-robot comments on this bug, I've enabled comment restriction.
Restrict Comments: true
It was the cure for a different disease: the tiny smattering of 503 Too Busy failures during the day. The choking out of the VPN around 19:30 every night, which results in the timeouts, is another thing entirely.
https://tbpl.mozilla.org/php/getParsedLog.php?id=33463523&tree=Mozilla-Inbound, though, says it wasn't entirely a magic bullet for 503 Too Busy.
Trees closed again; not inclined to open them any time soon. This bug / bug 957502 needs escalating ASAP - my patience is wearing somewhat thin :-( I'll send some emails out.
Things seem to have settled down, so I've reopened for now.
Thanks for the 503 fix, sorry we got 86ed from the bug.
Status: NEW → RESOLVED
Closed: 11 years ago
Resolution: --- → FIXED
Summary: Intermittent ftp.m.o "ERROR 503: Server Too Busy" or "command timed out: 1200 seconds without output, attempting to kill" during download-and-extract step → Intermittent ftp.m.o "ERROR 503: Server Too Busy" during download-and-extract step
Product: Infrastructure & Operations → Infrastructure & Operations Graveyard