Closed
Bug 1385900
Opened 7 years ago
Closed 7 years ago
generic-worker: Gracefully handle public/logs/live_backing.log being included via a directory artifact
Categories
(Taskcluster :: Workers, enhancement)
Tracking
(Not tracked)
RESOLVED
FIXED
People
(Reporter: pmoore, Unassigned)
References
Details
If public/logs/live_backing.log gets included in a payload artifact, the worker panics: the worker already uploads this file explicitly, so when it is included a second time, the queue returns an HTTP 409 status code, which causes the worker to panic and exit.
Options
1) internally, artifacts should be stored in a map, so they never get included more than once (sketched below)
and/or
2) the public/logs/live_backing.log artifact should be located outside of the task directory, so that the task cannot interfere with it and it cannot be included twice
In the case that a different file is included as an artifact twice, we should make sure this also doesn't crash the worker.
It goes without saying that tests should be added for all of these things.
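A minimal sketch of option 1 in Go, assuming hypothetical names throughout (Artifact and artifactSet are illustrative, not the actual generic-worker types): keying artifacts by name means a duplicate is caught before any upload is attempted, so the queue never sees a second createArtifact call and never returns the 409 that currently panics the worker.

package main

import "fmt"

// Artifact is a hypothetical stand-in for the worker's internal
// artifact representation.
type Artifact struct {
	Name string // e.g. "public/logs/live_backing.log"
	Path string // location of the file inside the task directory
}

// artifactSet stores artifacts keyed by name, so the same artifact
// can never be queued for upload more than once.
type artifactSet map[string]Artifact

// add records an artifact, reporting a duplicate rather than letting
// a second upload of the same name reach the queue.
func (s artifactSet) add(a Artifact) error {
	if existing, ok := s[a.Name]; ok {
		return fmt.Errorf("artifact %q included twice (paths %q and %q)",
			a.Name, existing.Path, a.Path)
	}
	s[a.Name] = a
	return nil
}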
Reporter
Comment 1 • 7 years ago
Note: if we can determine upfront that a file would be included twice as an artifact, we have the option of
1) rejecting the task as malformed-payload, with an exception message explaining why
2) allowing the duplicate artifact(s) but only uploading them once
Instinctively, 1) feels safer than 2), since it is stricter and might lead to fewer obscure issues downstream. I also can't think of a use case where including the same artifact more than once would be useful rather than an accident (although perhaps in a complex task-generation process, where the task payload is dynamically generated and different logic produces different artifacts, it might be difficult to intercept duplicates).
Also, going with option 1) and then changing to option 2) is easier than the other way around (both behaviours are sketched below).
But I'm still not sure.
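Both candidate behaviours could share one code path behind a flag; continuing the Go sketch above (same hypothetical package and fmt import; resolveDuplicates is illustrative, not the actual generic-worker API):

// resolveDuplicates sketches the two candidate behaviours:
//   strict == true  -> option 1: reject the payload outright
//   strict == false -> option 2: keep the first occurrence, upload once
func resolveDuplicates(names []string, strict bool) ([]string, error) {
	seen := make(map[string]bool, len(names))
	unique := make([]string, 0, len(names))
	for _, name := range names {
		if seen[name] {
			if strict {
				return nil, fmt.Errorf("malformed-payload: artifact %q listed more than once", name)
			}
			continue // option 2: silently drop the duplicate
		}
		seen[name] = true
		unique = append(unique, name)
	}
	return unique, nil
}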
Reporter
Comment 2 • 7 years ago
In the end, I went with the following implementation details:
1) Artifacts that are uploaded by features (such as log files, chain-of-trust artifacts, etc.) are "reserved" when the task features start, which prevents them being uploaded as payload artifacts (i.e. if public/logs/live_backing.log gets included via a task payload file/directory artifact, it will be skipped, and a warning will be logged to the task log).
2) When an HTTP 409 status code is returned by the queue to a createArtifact request, the worker resolves the task as exception/malformed-payload, since this implies two different versions of the same artifact have been submitted.
3) Artifacts that are uploaded by features are placed in a generic-worker/ folder inside the task directory, rather than directly in public/, to further reduce the chance of a collision, or of a task overwriting an internal artifact.
4) I added numerous tests: that colliding artifacts result in a malformed-payload exception, that task artifacts don't override internal feature artifacts, etc. (Details 1) and 2) are sketched below.)
Note, for simplicity, the malformed-payload exception is only calculated at the end of the task run, rather than upfront, since it is not always possible to know which artifacts will exist when the task completes: directory artifacts, for example, are only determined once the task has finished. File artifacts could be determined upfront; however, if two files had the same content, it would be valid for the artifact to be listed twice in the payload, once pointing to each file. In fact, this mechanism could be used to check that two files are identical: you could construct a task that uploads both files under the same artifact name, and rely on the task being resolved as exception/malformed-payload if they have different content. It isn't pretty, but it could conceivably be a valid use case.
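A minimal sketch of details 1) and 2) in Go; reservedArtifacts, ErrMalformedPayload, and uploadPayloadArtifact are hypothetical names chosen for illustration, not the actual generic-worker identifiers:

package main

import (
	"fmt"
	"log"
	"net/http"
)

// reservedArtifacts holds names claimed by worker features (live log,
// chain of trust, ...) before any payload artifacts are processed.
var reservedArtifacts = map[string]bool{
	"public/logs/live_backing.log": true,
}

// ErrMalformedPayload marks a task that should resolve as
// exception/malformed-payload.
var ErrMalformedPayload = fmt.Errorf("exception/malformed-payload")

// uploadPayloadArtifact skips artifacts reserved by a feature (detail 1)
// and maps a 409 Conflict from createArtifact to a malformed-payload
// resolution (detail 2). createArtifact is passed in as a function to
// keep the sketch self-contained.
func uploadPayloadArtifact(name string, createArtifact func(string) (*http.Response, error)) error {
	if reservedArtifacts[name] {
		log.Printf("WARNING: skipping payload artifact %q: reserved by a worker feature", name)
		return nil
	}
	resp, err := createArtifact(name)
	if err != nil {
		return err
	}
	defer resp.Body.Close()
	if resp.StatusCode == http.StatusConflict {
		// Two different versions of the same artifact were submitted.
		return fmt.Errorf("artifact %q already exists with different content: %w",
			name, ErrMalformedPayload)
	}
	return nil
}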
I've released this in generic-worker 10.1.7.
Status: NEW → RESOLVED
Closed: 7 years ago
Resolution: --- → FIXED
Assignee
Updated • 6 years ago
Component: Generic-Worker → Workers