Closed Bug 1211415 Opened 9 years ago Closed 9 years ago

Treeherder's API is getting hammered again, causing HTTP 500s

Categories

(Tree Management :: Treeherder, defect, P1)

Tracking

(Not tracked)

RESOLVED FIXED

People

(Reporter: mlankford, Unassigned)

References

Details

Intermittent errors on https://treeherder.mozilla.org/. The page shows: "Unknown repository. This repository is either unknown to Treeherder or it doesn't exist. If this repository does exist, please file a bug against the Treeherder product in Bugzilla to get it added to the system."
I also see "Service Unavailable" on most reload retries. CC'ing some treeherder folks too.
Group: mozilla-employee-confidential
Probably similar to the incidents from bug 1203518.
Trunk trees are closed. Escalating to P1: Blocker.
Severity: normal → blocker
Priority: -- → P1
Summary: Intermittent errors with page - https://treeherder.mozilla.org/ Unknown repository → Treeherder's API is getting hammered again, causing HTTP 500s
Briefly banned 217.111.161.212, as it had a reasonable number of connections open at the time, all of which had a somewhat crazy-looking query (see below). Treeherder became responsive shortly thereafter. Unbanned the IP to see if it was actually the problem, and we haven't had any further issues. Will have to look through the logs to see if there's a more likely culprit.
treeherder.mozilla.org/api/project/mozilla-inbound/jobs/?count=2000&last_modified__gt=2015-09-25T20:36:20.000&result_set_id__in=21015,21014,21013,21012,21011,21010,21009,21008,21007,21006,21005,21004,21003,21002,21001,21000,20999,20998,20997,20996,20995,20994,20993,20992,20991,20990,20989,20988,20987,20986,20985,20984,20983,20982,20981,20980,20979,20978,20977,20976,20975,20974,20973,20972,20971,20970,20969,20968,20967,20966,20965,20964,20963,20962,20961,20960,20959,20958,20957,20956,20955,20954,20953,20952,20951,20950,20949,20948,20947,20946,20945,20944,20943,20942,20941,20940,20939,20938,20937,20936,20935,20934,20933,20932,20931,20930,20929,20928,20927,20926,20925,20924,20923,20922,20921,20920,20919,20918,20917,20916,20915,20914,20913,20912,20911,20910,20909,20908,20907,20906,20905,20904,20903,20902,20901,20900,20899,20898,20897,20896,20895,20894,20893,20892,20891,20890,20889,20888,20887,20886,20885,20884,20883,20882,20881,20880,20879,20878,20877,20876,20875,20874,20873,20872,20871,20870,20869,20868,20867,20866,20865,20864,20863,20862,20861,20860,20859,20858,20857,20856,20855,20854,20853,20852,20851,20850,20848,20849,20847,20846,20845,20844,20843,20842,20841,20840,20839,20838,20837,20836,20835,20834,20833,20832,20831,20830,20829,20828,20827,20826,20825,20824,20823,20822,20821,20820,20819,20818,20817,20816,20815,20814,20813,20812,20811,20810,20809,20808,20807,20806,20805,20804,20803,20802,20801,20800,20799,20798,20797,20796,20795,20794,20793,20792,20791,20790,20789,20788,20787,20786,20785,20784,20783,20782,20781,20780,20779,20778,20777,20776,20775,20774,20773,20772,20771,20770,20769,20768,20767,20766,20765,20764,20763,20762,20761,20760,20759,20758,20757,20756,20755,20754,20753,20752,20751,20750,20749,20748,20747,20746,20745,20744,20743,20742,20741,20740,20739,20738,20737,20736,20735,20734,20733,20732,20731,20730,20729,20728,20727,20726,20725,20724,20723,20722,20721,20720,20719,20718,20717,20716,20715,20714,20713,20712,20711,20710,20709,20708,20707,20706,20705,20704,20703,20702,20701,20700,20699,20698,20697,20696,20695,20694,20693,20692,20691,20690,20689,20688,20687,20686,20685,20684,20683,20682,20681,20680,20679,20678,20677,20676,20675,20674,20673,20672,20671,20670,20669,20668,20667,20666,20665,20664,20663,20662,20661,20660,20659,20658,20657,20656,20655,20654,20653,20652,20651,20650,20649,20648,20647,20646,20645,20644,20643,20642,20641,20640,20639,20638,20637,20636,20635,20634,20633,20632,20631,20630,20629,20628,20627,20626,20625,20624,20623,20622,20621,20620,20619,20618,20617,20616,20615,20614,20613,20612,20611,20610,20609,20608,20607,20606,20605,20604,20603,20602,20601,20600,20599,20598,20597,20596,20595,20594,20593,20592,20591,20590,20589,20588,20587,20586,20585,20584,20583,20582,20581,20580,20579,20578,20577,20576,20575,20574,20573,20572,20571,20570,20569,20568,20567,20566,20565,20564,20563,20562,20561,20560,20559,20558,20557,20556,20555,20554,20553,20552,20551,20550,20549,20548,20547,20546,20545,20544,20543,20542,20541,20540,20539,20538,20537,20536,20535,20534,20533,20532,20531,20530,20529,20528,20527,20526,20525,20524,20523,20522,20521,20520,20519,20518,20517,20516,20515,20514,20513,20512,20511,20510,20509,20508,20507,20506,20505,20504,20503,20502,20501,20500,20499,20498,20497,20496,20495,20494,20493,20492,20491,20490,20489,20488,20487,20486,20485,20484,20483,20482,20481,20480,20479,20478,20477,20476,20475,20474,20473,20472,20471,20470,20469,20468,20467,20466,20465,20464,20463,20462,20461,20460,20459,20458,20457,20456,20455,20454,20453,20452,20451,20450,20449,20448,20447,20446,20445,20444,20443,20442,20441,20440,20439,20438,20437,20436,20435,20434,20433,20432,20431,20430,20429,20428,20427,20426,20425,20424,20423,20422,20421,20420,20419,20418,20417,20416,20415,20414,20413,20412,20411,20410,20409,20408,20407,20406,20405,20404,20403,20402,20401,20400,20399,20398,20397,20396,20395,20394,20393,20392,20391,20390,20389,20388,20387,20386,20385,20384,20383,20382,20381,20380,20379,20378,20377,20376,20375,20374,20373,20372,20371,20370,20369,20368,20367,20366,20365,20364,20363,20362&return_type=list
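To size up a request like the one above, a quick script can pull the project name and count the ids passed in result_set_id__in. This is only a sketch: the function name summarize_jobs_query is made up for illustration, and it assumes the URL shape seen in this comment (/api/project/<project>/jobs/).

```python
from urllib.parse import urlsplit, parse_qs


def summarize_jobs_query(url):
    """Summarize a Treeherder /jobs/ API URL (hypothetical helper).

    Returns the project name, the number of ids in result_set_id__in,
    and the raw value of the count parameter.
    """
    # The URL in the comment has no scheme, so add one before parsing.
    if "://" not in url:
        url = "https://" + url
    parts = urlsplit(url)
    params = parse_qs(parts.query)

    # result_set_id__in is a single comma-separated value.
    raw_ids = params.get("result_set_id__in", [""])[0]
    ids = [i for i in raw_ids.split(",") if i]

    # Assumed path shape: /api/project/<project>/jobs/
    segments = [s for s in parts.path.split("/") if s]
    project = None
    if len(segments) > 2 and segments[:2] == ["api", "project"]:
        project = segments[2]

    return {
        "project": project,
        "result_set_ids": len(ids),
        "count": params.get("count", [None])[0],
    }
```

Running it over the request URLs extracted from the web logs would show which clients send unusually long id lists.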
We had some 500s again today and in /var/log/gunicorn/treeherder_error.log-20151006 on th-prod-web1 I saw a fair number of:
[2015-10-12 15:28:44 +0000] [2019] [CRITICAL] WORKER TIMEOUT (pid:28111)
[2015-10-12 15:29:08 +0000] [2019] [CRITICAL] WORKER TIMEOUT (pid:28251)
[2015-10-12 15:29:40 +0000] [2019] [CRITICAL] WORKER TIMEOUT (pid:28107)
[2015-10-12 15:29:58 +0000] [2019] [CRITICAL] WORKER TIMEOUT (pid:28303)
[2015-10-12 15:33:07 +0000] [2019] [CRITICAL] WORKER TIMEOUT (pid:28427)
[2015-10-12 15:33:41 +0000] [2019] [CRITICAL] WORKER TIMEOUT (pid:28466)
Not sure if that's related.
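To check whether the worker timeouts line up with the periods of 500s, the log entries can be bucketed by minute. The sketch below assumes the line format quoted in this comment; the function name timeouts_per_minute is hypothetical.

```python
import re
from collections import Counter

# Matches entries such as:
# [2015-10-12 15:28:44 +0000] [2019] [CRITICAL] WORKER TIMEOUT (pid:28111)
TIMEOUT_RE = re.compile(
    r"\[(?P<minute>\d{4}-\d{2}-\d{2} \d{2}:\d{2}):\d{2} [+-]\d{4}\] "
    r"\[\d+\] \[CRITICAL\] WORKER TIMEOUT \(pid:(?P<pid>\d+)\)"
)


def timeouts_per_minute(log_text):
    """Count WORKER TIMEOUT entries per minute (hypothetical helper)."""
    # finditer scans the whole text, so it works whether the entries
    # are on separate lines or run together on one line.
    return Counter(m.group("minute") for m in TIMEOUT_RE.finditer(log_text))
```

Comparing these per-minute counts against the timestamps of the 500 responses would help confirm or rule out a connection.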
Depends on: 1221064
Depends on: 1221806
Depends on: 1221816
Bug 1221064 has improved the performance of the /jobs/ endpoint quite considerably: the API request in comment 4 now takes only 7 seconds to run.
It's also worth noting that the IP in comment 4 belongs to the London office. It looks like someone had Treeherder open on mozilla-inbound and had pressed the "get next 100" button many times, to the point where they had 650 pushes open, hence the massive list of result_set_id__in ids. The best fix for that would be bug 1107667.
Finally, shortening the gunicorn timeout from 120s to 30s (bug 1221806) will help reduce the impact of any other footguns in the future :-)
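For reference, a 30-second worker timeout could be expressed in a gunicorn Python config file roughly as follows. This is a sketch only: the file name is hypothetical, and the actual change made in bug 1221806 may have been applied differently (for example through the --timeout command-line option).

```python
# gunicorn.conf.py (hypothetical file name), loaded with: gunicorn -c gunicorn.conf.py <app>
# Workers that stay silent for longer than this many seconds are
# killed and restarted, which produces the WORKER TIMEOUT log entries.
timeout = 30
```

A shorter timeout means a single slow request ties up a worker for at most 30 seconds instead of 120.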
Status: NEW → RESOLVED
Closed: 9 years ago
Resolution: --- → FIXED