Closed
Bug 1332821
Opened 8 years ago
Closed 7 years ago
mochitest-browser-chrome-screenshots jobs with screenshots don't get marked as completed
Categories
(Testing :: mozscreenshots, defect)
Testing
mozscreenshots
Tracking
(Not tracked)
RESOLVED
WONTFIX
People
(Reporter: MattN, Unassigned)
Details
The jobs are completing successfully on buildbot, but Treeherder [1] has been showing them as still "running" for over 1000 minutes. I found the logs [2] on archive.mozilla.org and they look normal at first glance.
[1] https://treeherder.mozilla.org/#/jobs?repo=mozilla-central&revision=aa3e49299a3aa5cb0db570532e3df9e75d30c2d1&filter-searchStr=screenshot&filter-tier=1&filter-tier=2&filter-tier=3&filter-resultStatus=success&filter-resultStatus=testfailed&filter-resultStatus=busted&filter-resultStatus=exception&filter-resultStatus=retry&filter-resultStatus=usercancel&filter-resultStatus=running&filter-resultStatus=pending&filter-resultStatus=runnable&filter-resultStatus=coalesced&group_state=expanded&selectedJob=70666805
[2] https://archive.mozilla.org/pub/firefox/tinderbox-builds/mozilla-central-win32-pgo/1484910134/mozilla-central_win7_ix_test_pgo-mochitest-browser-screenshots-bm110-tests1-windows-build7.txt.gz
Comment 1 • 8 years ago (Reporter)
The problem seemed to start after bug 1329262 landed, which added 60 more captured screenshot artifacts. Note that this job only captures screenshots on "Nightly" builds (not regular m-c builds), so if you see green builds since bug 1329262 landed on m-c (rev: fe22af79bacf), those are probably non-Nightly builds with no screenshot artifacts captured.
I'm guessing there is a problem ingesting so many screenshot artifacts. Maybe a request or database columns size limit?
I think I'll back out bug 1329262 for now, and hopefully there are server logs somewhere that will pinpoint the issue.
Blocks: 1329262
Comment 2 • 8 years ago
I don't think this is a treeherder issue, but a buildbot one. Coop, could you get someone to investigate (ordinarily I would ask :catlee, but he seems to be away)?
Flags: needinfo?(coop)
Comment 3 • 8 years ago
I'll check the database, but from a casual check of the running and pending interfaces I don't think these are actually scheduled.
I'm curious how the extra runs were scheduled in the first place. Did someone retrigger?
Flags: needinfo?(coop)
Comment 4 • 8 years ago (Reporter)
(In reply to Chris Cooper [:coop] from comment #3)
> I'll check the database, but from a casual check of the running and pending
> interfaces I don't think these are actually scheduled.
BuildAPI shows them as completed and the logs were uploaded to archive.mozilla.org. See [2] in comment 0.
> I'm curious how the extra runs were scheduled in the first place. Did
> someone retrigger?
There are three "ss" jobs for each of the affected platforms (from left to right):
1) Regular PGO/opt scheduling in buildbot. These are green and completed because screenshots (at the time of the push) don't get captured if the update channel isn't Nightly; regular PGO/opt builds use "default" as the channel. This check is done in the test code itself, since there wasn't a way to schedule the jobs only on Nightlies via BB.
2) These were scheduled for the Nightly builds that got triggered on this push. Since the update channel is "nightly", this job was expected to generate dozens of PNG artifacts using blobber. In [2] you can see that the images were successfully captured and uploaded, but Treeherder (TH) was never told about the finished job.
3) These were scheduled by me retriggering the #2 jobs when I saw that they weren't finishing, to see whether it was an intermittent infra issue or a permanent one. After seeing them also not complete in the expected time, I filed this bug.
Comment 5 • 8 years ago
(In reply to Matthew N. [:MattN] (PM me if requests are blocking you) from comment #1)
> I'm guessing there is a problem ingesting so many screenshot artifacts.
> Maybe a request or database columns size limit?
Looking at the job output directly on the buildbot-master, I see the following:
blobber_files {"20170120030214-primaryUI_099_tabsOutsideTitlebar_fiveTabs_maximized_allToolbars_compactLight.png": "http://mozilla-releng-blobs.s3.amazonaws.com/blobs/mozilla-central/sha512/f46de60815d0077800e24b39ba8064503d79cb3a4be7b2c619d6a4a96bcefa3e118aadcab2b4af40bd35d2968764c4d82ebd811246bbe4b561e5386d679f3969", "20170120030214-controlCenter_011_noLWT_mixedPassive.png": "http://mozilla-releng-blobs.s3.amazonaws.com/blobs/mozilla-central/sha512/0484f32a58dc9e3dedae5682d4c4ce80cf93b40475931418d7b182bd0b9 .. [property value too long]
That value is going into a TEXT field which by default in MySQL holds 65,535 chars. I checked the complete blobber_files prop string, and it's 187,866 chars.
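To make the size problem concrete, here is a minimal Python sketch, assuming the property is stored as JSON text in a MySQL TEXT column, whose documented maximum is 65,535 bytes (the same as characters for ASCII data like these URLs). The screenshot filenames and hashes are synthetic, and `fake_blobber_files`, `property_size_bytes`, and `fits_in_text_column` are hypothetical helpers; only the URL prefix is copied from the log above. This is not how buildbot itself measures properties.

```python
import hashlib
import json

# MySQL's TEXT type holds at most 2**16 - 1 = 65,535 bytes.
MYSQL_TEXT_MAX_BYTES = 2**16 - 1

# URL prefix copied from the blobber_files output above.
BLOB_URL_PREFIX = "http://mozilla-releng-blobs.s3.amazonaws.com/blobs/mozilla-central/sha512/"


def fake_blobber_files(count):
    """Hypothetical stand-in for the blobber_files property.

    Filenames are synthetic; each URL ends in a SHA-512 hex digest, like the
    real ones, so every entry has a realistic length.
    """
    files = {}
    for i in range(count):
        name = "20170120030214-screenshot_%03d.png" % i
        digest = hashlib.sha512(name.encode("utf-8")).hexdigest()
        files[name] = BLOB_URL_PREFIX + digest
    return files


def property_size_bytes(value):
    """Bytes needed to store `value` as JSON text (UTF-8)."""
    return len(json.dumps(value).encode("utf-8"))


def fits_in_text_column(value, limit=MYSQL_TEXT_MAX_BYTES):
    """True if the JSON-serialized value fits in a TEXT column."""
    return property_size_bytes(value) <= limit


if __name__ == "__main__":
    for count in (60, 300, 700):
        files = fake_blobber_files(count)
        print(count, property_size_bytes(files), fits_in_text_column(files))
```

With these synthetic entries (about 243 bytes each), roughly 270 screenshots are enough to overflow the column, which is consistent with the real value reaching 187,866 chars.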
We're going to have to rethink how we do this. I would suggest figuring out a way to make this work in TaskCluster vs expending effort in buildbot.
Comment 6 • 8 years ago (Reporter)
(In reply to Chris Cooper [:coop] from comment #5)
> (In reply to Matthew N. [:MattN] (PM me if requests are blocking you) from comment #1)
> > I'm guessing there is a problem ingesting so many screenshot artifacts.
> > Maybe a request or database columns size limit?
>
> Looking at the job output directly on the buildbot-master, I see the following:
>
> blobber_files {"20170120030214-primaryUI_099_tabsOutsideTitlebar_fiveTabs_maximized_allToolbars_compactLight.png": "http://mozilla-releng-blobs.s3.amazonaws.com/blobs/mozilla-central/sha512/f46de60815d0077800e24b39ba8064503d79cb3a4be7b2c619d6a4a96bcefa3e118aadcab2b4af40bd35d2968764c4d82ebd811246bbe4b561e5386d679f3969", "20170120030214-controlCenter_011_noLWT_mixedPassive.png": "http://mozilla-releng-blobs.s3.amazonaws.com/blobs/mozilla-central/sha512/0484f32a58dc9e3dedae5682d4c4ce80cf93b40475931418d7b182bd0b9 .. [property value too long]
>
> That value is going into a TEXT field which by default in MySQL holds 65,535 chars. I checked the complete blobber_files prop string, and it's 187,866 chars.
Thanks for investigating, Chris! I figured some limit was getting hit.
> We're going to have to rethink how we do this. I would suggest figuring out a way to make this work in TaskCluster vs expending effort in buildbot.
OK, I just landed a patch in bug 1332727 to run this job on TaskCluster for linux64 builds, but the problem is that I need this job to run on every OS configuration where the UI differs, and AFAIK TaskCluster doesn't support all of those OSs yet.
It seems like I should switch to TC as much as possible, but for the remaining OSs I guess I will have to split the job into two parts if there's no simple fix in BB.
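If some platforms have to stay on buildbot, one way to approach splitting is to keep each property value under the column limit. The sketch below greedily packs a {filename: URL} mapping into chunks whose JSON serialization fits; `split_property` and `entry_size` are hypothetical helpers, not part of buildbot, mozharness, or blobber, and the 65,535-byte limit assumes a MySQL TEXT column.

```python
import json

# Assumed limit: MySQL TEXT column, 2**16 - 1 bytes.
MYSQL_TEXT_MAX_BYTES = 2**16 - 1


def entry_size(name, url):
    """Bytes one '"name": "url"' pair adds to json.dumps output.

    json.dumps uses ', ' and ': ' as separators by default.
    """
    return len(json.dumps(name).encode("utf-8")) + 2 + len(json.dumps(url).encode("utf-8"))


def split_property(files, limit=MYSQL_TEXT_MAX_BYTES):
    """Greedily split a {filename: url} dict into dicts that each serialize
    to at most `limit` bytes. A single entry larger than `limit` still ends
    up alone in an oversized chunk, since it cannot be split further."""
    chunks, current, size = [], {}, 2  # an empty dict serializes as "{}"
    for name, url in files.items():
        # Every entry after the first also needs a ", " separator.
        added = entry_size(name, url) + (2 if current else 0)
        if current and size + added > limit:
            chunks.append(current)
            current, size = {}, 2
            added = entry_size(name, url)
        current[name] = url
        size += added
    if current:
        chunks.append(current)
    return chunks
```

Note the arithmetic this implies: the 187,866-character value from comment 5 needs at least three chunks of 65,535 bytes, so if the screenshots were divided evenly between two jobs, each job's property would still be around 94,000 chars.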
Component: Treeherder: Data Ingestion → General
Product: Tree Management → Firefox
Version: --- → unspecified
Updated • 7 years ago (Reporter)
Component: General → mozscreenshots
Product: Firefox → Testing
Comment 7 • 7 years ago (Reporter)
We've switched to TC and will disable BB in bug 1411811.
Status: NEW → RESOLVED
Closed: 7 years ago
Resolution: --- → WONTFIX