Closed
Bug 1295948
Opened 8 years ago
Closed 8 years ago
query_build_dir_url() doesn't support encoded characters in build URL
Categories
(Release Engineering :: Applications: MozharnessCore, defect)
Release Engineering
Applications: MozharnessCore
Tracking
(firefox53 fixed)
RESOLVED
FIXED
Tracking | Status | |
---|---|---|
firefox53 | --- | fixed |
People
(Reporter: whimboo, Assigned: svezauzeto12, Mentored)
Details
(Whiteboard: [lang=py])
Attachments
(1 file, 1 obsolete file)
query_build_dir_url() [1] is doing some magic to find test packages as referenced in the test_packages.json file. It searches for the last '/' and replaces the remaining string with the appropriate test file. As I just noticed this doesn't work if the '/' is HTML encoded. Here an example:
https://queue.taskcluster.net/v1/task/PnwWXZOuT8CFZ8TkL0G0CA/runs/0/artifacts/public%2Fbuild%2Ftarget.test_packages.json
With this above logic we end up with a common.tests.zip file as:
https://queue.taskcluster.net/v1/task/PnwWXZOuT8CFZ8TkL0G0CA/runs/0/artifacts/common.tests.zip
It strips off the 'public/build/' path which forces TC to ask for permissions to download the file, which ends-up with a 403 forbidden failure.
We should hardening the method so it can also cope with encoded slashes.
[1] https://dxr.mozilla.org/mozilla-central/rev/fe895421dfbe1f1f8f1fc6a39bb20774423a6d74/testing/mozharness/mozharness/mozilla/testing/testbase.py#150
Comment 1•8 years ago
|
||
I'm guessing that instead of looking for '/' in the url, we should use something that already parses urls like urlparse, and urldecode the path if we want to allow for encoded slashes.
Reporter | ||
Comment 2•8 years ago
|
||
Sounds like a good idea, and I feel it would be great for a mentored project. Do you want to take that Aki?
Reporter | ||
Updated•8 years ago
|
Whiteboard: [lang=py]
Assignee | ||
Comment 4•8 years ago
|
||
Can I work on this ? Thanks
Comment 5•8 years ago
|
||
Sure.
It looks like you've already contributed patches; let me know if you have any questions or need any guidance to write this patch.
Assignee | ||
Comment 6•8 years ago
|
||
If I understood correctly I should just decode url so it goes from :
https://queue.taskcluster.net/v1/task/PnwWXZOuT8CFZ8TkL0G0CA/runs/0/artifacts/public%2Fbuild%2Ftarget.test_packages.json
to :
https://queue.taskcluster.net/v1/task/PnwWXZOuT8CFZ8TkL0G0CA/runs/0/artifacts/public/build/target.test_packages.json
than can be done ( as you mentioned above ) with
urllib2.unquote(our_url)
after that I should parse url and return url path, which in our case should be:
/v1/task/PnwWXZOuT8CFZ8TkL0G0CA/runs/0/artifacts/public/build/target.test_packages.json
and then I should strip "target.test_packages.json" so filename can be added.
We should get this:
/v1/task/PnwWXZOuT8CFZ8TkL0G0CA/runs/0/artifacts/public/build (/filename is added after build )
Is that correct ?
If it is please review this commit. It is possible that I have made error even if the above assumption is correct, I have never worked on this large project before.
I have added comments in code so you follow my thoughts-slow more easily, we can just remove them later when you are satisfied with commit.
Thanks.
Assignee | ||
Comment 7•8 years ago
|
||
error when trying to push commit:
error publishing review request 95750: Error publishing: Bugzilla error: Aki Sasaki [:aki] (PTO, back Nov28) <aki@mozilla.com> is not currently accepting 'review' requests. (HTTP 500, API Error 225)
(review requests not published; visit review url to attempt publishing there)
Comment 8•8 years ago
|
||
I just unblocked review requests. Try again?
Comment hidden (mozreview-request) |
Assignee | ||
Comment 10•8 years ago
|
||
commited, thanks :)
Comment 11•8 years ago
|
||
mozreview-review |
Comment on attachment 8814566 [details]
Bug 1295948 - fixing method so it works with encoded URLs
https://reviewboard.mozilla.org/r/95752/#review95792
r- because this patch changes the return value from a url to a path.
In the review comments, I've shown a couple ways we can do this.
::: testing/mozharness/mozharness/mozilla/testing/testbase.py:169
(Diff revision 1)
>
> + #first we decode url
> + reference_url = urllib2.unquote(reference_url)
> +
> + #now we parse url so we can extract path
> + reference_url = urlparse(reference_url)
Here you overwrite the reference_url string with a `urlparse.ParseResult`.
::: testing/mozharness/mozharness/mozilla/testing/testbase.py:172
(Diff revision 1)
> +
> + #now we parse url so we can extract path
> + reference_url = urlparse(reference_url)
> +
> + #now we should extract path to string so we can find last slash
> + reference_url = reference_url.path
Here you overwrite `reference_url` with just the path. So if `reference_url` was once `https://server.name/path/to/artifact`, it is now `/path/to/artifact`.
::: testing/mozharness/mozharness/mozilla/testing/testbase.py:177
(Diff revision 1)
> + reference_url = reference_url.path
> +
> + # and now this should work even with encoded urls ( which are not encoded anymore
> + # when they come by the time they come here
> last_slash = reference_url.rfind('/')
> base_url = reference_url[:last_slash]
Since `reference_url` no longer has the scheme or netloc (or any other url information, just the path), `base_url` will also be just a path.
::: testing/mozharness/mozharness/mozilla/testing/testbase.py:179
(Diff revision 1)
> + # and now this should work even with encoded urls ( which are not encoded anymore
> + # when they come by the time they come here
> last_slash = reference_url.rfind('/')
> base_url = reference_url[:last_slash]
>
> return '%s/%s' % (base_url, file_name)
Now, since `base_url` is just a path, we're returning a path instead of a url.
I can think of two things that might work:
1. remove the urlparse lines (lines 168-172). That means we'll just unquote and then remove everything in the url from the rightmost `/` onwards as the `base_url`, and add the filename. This is not proper behavior, because a url can have lots of info post-filename (e.g., https://server.name/path/to/artifact?query=string&query2=string2#anchor) but I don't think we use queries or anchors in the artifact urls, so this will probably be sufficient.
2. Instead of saying `reference_url = urlparse(reference_url)` and `reference_url = reference_url.path`, you can do something like
```python
from urlparse import urlparse, ParseResult
reference_url = "https://server/path/to/artifact?query=string#anchor"
file_name = "new_artifact"
# parts will be a list from the ParseResult:
# ['https', 'server', '/path/to/artifact', '', 'query=string', 'anchor']
parts = list(urlparse(reference_url))
last_slash = parts[2].rfind('/')
parts[2] = '/'.join([parts[2][:last_slash], file_name])
print ParseResult(*parts).geturl()
```
which changes `https://server/path/to/artifact?query=string#anchor` to `https://server/path/to/new_artifact?query=string#anchor` and is the proper behavior.
Does that help?
Attachment #8814566 -
Flags: review?(aki) → review-
Updated•8 years ago
|
Assignee: nobody → svezauzeto12
Assignee | ||
Comment 12•8 years ago
|
||
I have made new commit based on what we discussed.
Hope it is working now, I have made few small tests in my own script but I am still not 100% sure that it is correct.
I have made changes according to your 2. method which you described above.
Thanks
Comment hidden (mozreview-request) |
Comment 14•8 years ago
|
||
Thanks! There's at least one nit. I triggered https://treeherder.mozilla.org/#/jobs?repo=try&revision=3b2c13ed853747fdf554fb0a2cddd200825ff2de to verify this works.
Comment 15•8 years ago
|
||
mozreview-review |
Comment on attachment 8814732 [details]
Bug 1295948 - fixing method so it works with encoded URLs, changed return method
https://reviewboard.mozilla.org/r/95826/#review96166
One small nit, below. I'd like to see a try run finish with a unit test and a talos test successfully passing this part.
I triggered a new one because the old one wasn't cooperating for some reason:
https://treeherder.mozilla.org/#/jobs?repo=try&revision=63db69e1f268f3ad22f99d17a3a75244f3a3953e
Using a small python script for testing was a good approach! I'm pretty sure this is good, but I'd rather be on the safe side.
::: testing/mozharness/mozharness/mozilla/testing/testbase.py:174
(Diff revision 1)
> - # and now this should work even with encoded urls ( which are not encoded anymore
> - # when they come by the time they come here
> - last_slash = reference_url.rfind('/')
> - base_url = reference_url[:last_slash]
>
> - return '%s/%s' % (base_url, file_name)
> + return '%s' %url
You only need to `return url` :)
The '%s' is string replacement. It was there to join the `base_url` and `file_name` earlier, but we don't need that.
Attachment #8814732 -
Flags: review?(aki) → review-
Comment 16•8 years ago
|
||
Looks like the try run looks good, so the return statement is the only thing that needs fixing. Thank you!
Assignee | ||
Comment 17•8 years ago
|
||
I think this should be it.
Thanks for mentoring.
Comment hidden (mozreview-request) |
Assignee | ||
Updated•8 years ago
|
Attachment #8814732 -
Attachment is obsolete: true
Comment 19•8 years ago
|
||
mozreview-review |
Comment on attachment 8814566 [details]
Bug 1295948 - fixing method so it works with encoded URLs
https://reviewboard.mozilla.org/r/95752/#review96524
Thanks!
Attachment #8814566 -
Flags: review?(aki) → review+
Comment 20•8 years ago
|
||
Pushed by asasaki@mozilla.com:
https://hg.mozilla.org/integration/autoland/rev/46a0a73fde9a
fixing method so it works with encoded URLs r=aki
Comment 21•8 years ago
|
||
bugherder |
You need to log in
before you can comment on or make changes to this bug.
Description
•