Bug 1760052 (Closed): opened 3 years ago, closed 3 years ago

(Production trees closed) abort: HTTP Error 500: Internal Server Error / HTTP error fetching bundle: HTTP Error 403: Forbidden

Categories: Developer Services :: Mercurial: hg.mozilla.org, defect
Tracking: Not tracked; RESOLVED FIXED
People: Reporter: imoraru, Unassigned
References: Regression
Details: Keywords: intermittent-failure

We got failures like this one: https://treeherder.mozilla.org/logviewer?job_id=371401329&repo=mozilla-release&lineNumber=37
and this one: https://treeherder.mozilla.org/logviewer?job_id=371404088&repo=autoland&lineNumber=34

The pull command also doesn't work on VMs, and at the same time as these failures we got the following alert messages in the vcs channel on Slack:

Thu 15:35:13 UTC [95233] [devsvcslist] hgssh1.dmz.mdc1.mozilla.com:Zookeeper - hg is WARNING ENSEMBLE WARNING - node (hgweb1.dmz.mdc1.mozilla.com) is alive but not available
Thu 15:35:13 UTC [95235] [devsvcslist] hgweb4.dmz.mdc1.mozilla.com:Zookeeper - hg is WARNING ENSEMBLE WARNING - node (hgweb1.dmz.mdc1.mozilla.com) is alive but not available
Thu 15:35:15 UTC [95237] [devsvcslist] hgweb2.dmz.mdc1.mozilla.com:Zookeeper - hg is WARNING ENSEMBLE WARNING - only have 3/4 expected followers
Thu 15:35:18 UTC [95239] [devsvcslist] hgweb1.dmz.mdc1.mozilla.com:Zookeeper - hg is WARNING ENSEMBLE WARNING - only have 3/4 expected followers
Thu 15:35:44 UTC [95241] [devsvcslist] hgweb4.dmz.mdc1.mozilla.com:httpd max clients is WARNING Using 53 out of 53 Clients
Thu 15:35:59 UTC [95243] [devsvcslist] hgweb2.dmz.mdc1.mozilla.com:hg vcsreplicator lag is CRITICAL CHECK_NRPE STATE CRITICAL: Socket timeout after 10 seconds.
Thu 15:36:21 UTC [95245] [devsvcslist] hgweb2.dmz.mdc1.mozilla.com:Zookeeper - hg is OK zookeeper node and ensemble OK
Thu 15:36:21 UTC [95247] [devsvcslist] hgssh1.dmz.mdc1.mozilla.com:Zookeeper - hg is OK zookeeper node and ensemble OK
Thu 15:36:29 UTC [95249] [devsvcslist] hgweb1.dmz.mdc1.mozilla.com:Zookeeper - hg is CRITICAL CHECK_NRPE STATE CRITICAL: Socket timeout after 10 seconds.
Thu 15:37:14 UTC [95251] [devsvcslist] hgweb4.dmz.mdc1.mozilla.com:Zookeeper - hg is OK zookeeper node and ensemble OK
Thu 15:37:29 UTC [95253] [devsvcslist] hgweb1.dmz.mdc1.mozilla.com:Zookeeper - hg is OK zookeeper node and ensemble OK
Thu 15:38:02 UTC [95255] [devsvcslist] hgweb2.dmz.mdc1.mozilla.com:hg vcsreplicator lag is OK OK - 2/8 consumers out of sync but within tolerances
Thu 15:38:34 UTC [95257] [devsvcslist] hgssh1.dmz.mdc1.mozilla.com:hg pulse notifier lag is CRITICAL CHECK_NRPE STATE CRITICAL: Socket timeout after 10 seconds.
Thu 15:38:48 UTC [95259] [devsvcslist] hgweb2.dmz.mdc1.mozilla.com:Load is OK OK - load average: 16.35, 32.30, 32.47
Thu 15:39:10 UTC [95261] [devsvcslist] hgweb3.dmz.mdc1.mozilla.com:hg vcsreplicator lag is CRITICAL CHECK_NRPE STATE CRITICAL: Socket timeout after 10 seconds.
Thu 15:39:44 UTC [95263] [devsvcslist] hgweb4.dmz.mdc1.mozilla.com:httpd max clients is OK Using 14 out of 53 Clients
Thu 15:39:55 UTC [95265] [devsvcslist] hgweb4.dmz.mdc1.mozilla.com:Load is WARNING WARNING - load average: 34.26, 41.01, 33.51
Thu 15:40:34 UTC [95267] [devsvcslist] hgssh1.dmz.mdc1.mozilla.com:hg pulse notifier lag is OK OK - 1/1 consumers completely in sync
Thu 15:41:12 UTC [95269] [devsvcslist] hgweb3.dmz.mdc1.mozilla.com:hg vcsreplicator lag is WARNING WARNING - 2/8 partitions out of sync
Thu 15:41:42 UTC [95271] [devsvcslist] hgssh1.dmz.mdc1.mozilla.com:hg sns notifier lag is CRITICAL CHECK_NRPE STATE CRITICAL: Socket timeout after 10 seconds.
Thu 15:43:05 UTC [95273] [devsvcslist] hgssh1.dmz.mdc1.mozilla.com:hg push data aggregator lag is OK OK - 19 messages from 1 partitions behind
Thu 15:43:23 UTC [95275] [devsvcslist] hgweb3.dmz.mdc1.mozilla.com:hg vcsreplicator lag is CRITICAL CHECK_NRPE STATE CRITICAL: Socket timeout after 10 seconds.
Thu 15:44:55 UTC [95278] [devsvcslist] hgweb4.dmz.mdc1.mozilla.com:Load is OK OK - load average: 18.34, 24.63, 28.33
Thu 15:46:42 UTC [95280] [devsvcslist] hgssh1.dmz.mdc1.mozilla.com:hg sns notifier lag is OK OK - 1/1 consumers completely in sync
Thu 15:46:59 UTC [95282] [devsvcslist] hg.public.mdc1.mozilla.com:https - /try is OK HTTP OK: HTTP/1.1 200 Script output follows - 29241 bytes in 4.384 second response time
Thu 15:48:04 UTC [95284] [devsvcslist] hgweb1.dmz.mdc1.mozilla.com:hg vcsreplicator lag is CRITICAL CHECK_NRPE STATE CRITICAL: Socket timeout after 10 seconds.
Thu 15:48:45 UTC [95286] [devsvcslist] hgssh1.dmz.mdc1.mozilla.com:hg push data aggregator lag is CRITICAL CHECK_NRPE STATE CRITICAL: Socket timeout after 10 seconds.

The conversation can be found in the vcs channel on Slack; the issue has cleared up.

Status: NEW → RESOLVED
Closed: 3 years ago
Resolution: --- → FIXED

The 403s were due to an incorrect ACL permission on a new GCS bucket. I updated the ACLs manually via the cloud console and I'll fix the inconsistency in Terraform this afternoon.

Regressed by: 1760180
Depends on: 1760180
Regressed by: 1749820
No longer regressed by: 1760180