Closed Bug 1150304 Opened 10 years ago Closed 10 years ago

index: Restrict characters allowed in namespace (or encode them)

Categories

(Taskcluster :: General, defect)

defect
Not set
normal

Tracking

(Not tracked)

RESOLVED FIXED

People

(Reporter: jonasfj, Assigned: jonasfj)

References

(Blocks 1 open bug)

Details

Attachments

(1 file)

(deleted), text/x-github-pull-request
garndt
: review+
Details
The current index API looks as follows: GET /task/<namespace> (get task from index) PUT /task/<namespace> (put task into index) See: http://docs.taskcluster.net/services/index/ It would be nice if we could have end-points like: GET /task/<namespace>/artifacts/<artifactName> Which redirects to the queue, which redirects to the artifact... (Might need special tricks for secret artifacts). Anyways, we can't have that because current <namespace> format in the URI doesn't encode / or any other special characters. Doing so is obviously a breaking change. But I don't think there're many if any of the current namespaces that would change if we required that the argument be encoded by encodeURIComponent. Ie. in the future we should always do: task.routes = [ 'index.' + encodeURIComponent('my.namespace.something') ] And: var index = new taskcluster.Index(...); index.findTask(encodeURIComponent('my.namespace.something')) The only real change is that we just reduce the number if legal characters and require that illegal characters be encoded. Implementation-wise there is no difference in the index service, except we validate and reject namespace names that aren't URI encoded. IMO, We don't have to decode before storing the database. Is this crazy, will I break anything... If now shall we speed this up :) This would reduce allowed characters in namespace names to: [a-zA-Z0-9_.!~*'()-] With "." still carrying special meaning. taskcluster-index side I would change: route: '/task/:namespace(*)', Into: route: '/task/:namespace', At: https://github.com/taskcluster/taskcluster-index/blob/master/routes/api/v1.js#L119
@mshal, @jlal, @auswerk, Are you cool with this change? And do you know of anyone else using index who would be in trouble by this change? Note, we would only do: index.findTask(encodeURIComponent('my.namespace.something')) If the namespace we really really wanted to use contained a character not in the set of allowed characters: [a-zA-Z0-9_.!~*'()%-] (Sorry, I forgot to add % before, obviously it's allowed otherwise URI component encoding won't work). For most namespace name we would refrain from using illegal characters, so no encoding is necessary. (From a brief look at the inspector I could see any with "/" or other special characters in use)
Flags: needinfo?(mshal)
Flags: needinfo?(jlal)
Flags: needinfo?(aus)
I don't think that restricting the namespaces to that allowed set of characters would cause problems. AFAIK the only place we use characters outside of the set is in the artifact names, not the namespace (eg: a release build package may have spaces in the filename).
Flags: needinfo?(mshal)
Artifacts names also contains slashes "/", example: "public/logs/live.log" is a good example. Note, for the HTTP API using illegal characters in namespace will return some 4xx error. For indexing using routing key a message will be printed in log. At the moment we do allow indexing with routing keys like: "index.my/key.otherkey" After this, that wouldn't be allowed. But I think the concept of illegal values already exists in some form, AMQP allows for routing keys as bytes, so you can have invalid unicode in there. I suspect using that would cause internal errors in various places of the index :) So maybe limiting allow characters would harden the index too.
Oh right, I was only thinking of the final pathname of the artifact for some reason.
That seems fine to me. I don't actually need to change anything but, to be safe, I would have one small minor update to make to the taskcluster-npm-cache creation process to encodeURIComponent(namespace) but it's not a necessity before this change is committed.
Flags: needinfo?(aus)
Attached file Github PR (deleted) —
This also upgrades TC-base, so we keep azure connections alive, and upgrade to node 0.12, this improves latency a lot... @jlal, I can't assign your for review... Anyways, I'm not rolling this out before you clear the NI. I don't see any conflicts as all existing namespaces are within the allowed character set. This requires that task are indexed under string on the form: /^([a-zA-Z0-9_!~*'()%-]+\.)*[a-zA-Z0-9_!~*'()%-]+$/ So no empty strings between the dots, ie. "my.namespace..test" is not valid.
Assignee: nobody → jopsen
Status: NEW → ASSIGNED
Attachment #8596914 - Flags: review?(garndt)
Comment on attachment 8596914 [details] Github PR r+ with a couple of comments in GH.
Attachment #8596914 - Flags: review?(garndt) → review+
Flags: needinfo?(jlal)
Rolled out the patch... Allowed characters are: a-zA-Z0-9_!~*'()%- Please, let me know if this causes problems anywhere... @mshal, @jlal, you might want keep an eye on the things you index to make sure they are still being indexed...
Status: ASSIGNED → RESOLVED
Closed: 10 years ago
Resolution: --- → FIXED
Blocks: 1158376
Component: TaskCluster → General
Product: Testing → Taskcluster
Target Milestone: --- → mozilla41
Version: unspecified → Trunk
Resetting Version and Target Milestone that accidentally got changed...
Target Milestone: mozilla41 → ---
Version: Trunk → unspecified
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Created:
Updated:
Size: