Python 3 "TypeError: a bytes-like object is required, not 'str'"
Categories
(Tree Management :: Treeherder, defect, P1)
Tracking
(Not tracked)
People
(Reporter: emorley, Assigned: emorley)
Under Python 3, most of the log parsing tests fail with:
> if 'PERFHERDER_DATA' not in line:
E TypeError: a bytes-like object is required, not 'str'
(e.g. https://travis-ci.org/mozilla/treeherder/jobs/490990871#L1200)
I believe this is due to our using the Python requests package's iter_lines() function without passing decode_unicode=True.
Comment 1 (Assignee) • 6 years ago
This is because iter_lines() returns bytes by default, unless decode_unicode=True is passed:
http://docs.python-requests.org/en/master/user/advanced/#streaming-requests
However, if we use decode_unicode=True, iter_lines() now splits log lines not only on the standard newline characters (\n, \r and \r\n) but also on any newline-related Unicode characters such as \u0085.
One might think that wouldn't affect us; however, it turns out some test logs include characters like \u0085, and so using decode_unicode=True means:
- the line numbers in such logs are changed (which means the line number we record for errors may differ, affecting log viewer linking)
- the error message is split over multiple lines
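The effect on line numbers can be reproduced without any network access. The following is a minimal sketch that uses Python's built-in bytes.splitlines() and str.splitlines() as stand-ins for what iter_lines() does with and without decode_unicode=True (that mapping is an assumption about its internals, and the sample log text is invented for illustration):

```python
# Invented log fragment with U+0085 (NEXT LINE) inside an error message.
raw = "line one\nerror: bad \u0085 char\nline three\n".encode("utf-8")

# Splitting the raw bytes: only \n, \r and \r\n act as line boundaries.
bytes_lines = raw.splitlines()

# Splitting the decoded text: U+0085 is treated as a line boundary too.
text_lines = raw.decode("utf-8").splitlines()

print(len(bytes_lines))  # 3
print(len(text_lines))   # 4
print(text_lines[1:3])   # the error message is now split in two
```

In the decoded case "line three" moves from index 2 to index 3, which is the kind of shift that would make a recorded error line number point at the wrong line in the log viewer.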
We need to ensure we handle such characters the same way as react-lazylog, so that when we link to a particular error line, the correct line is highlighted. Looking at their implementation, only traditional newlines are treated as newlines:
https://github.com/mozilla-frontend-infra/react-lazylog/blob/v3.1.3/src/utils.js#L3-L7
...and as such the existing Treeherder behaviour seems to be correct here, so we must preserve that.
One possible workaround appeared to be to use iter_lines()'s delimiter option; however, that takes the passed string literally and doesn't allow passing several variations (to be able to specify all of \n, \r and \r\n etc.).
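To illustrate why a single literal delimiter is not enough, here is a small sketch that uses bytes.split() as an approximation of splitting on one fixed delimiter (an assumption for illustration, not a trace of the requests internals), compared with the default byte-level line splitting:

```python
# Invented sample mixing all three traditional newline styles.
raw = b"first\nsecond\r\nthird\rfourth"

# A literal delimiter matches only that exact byte sequence.
split_on_lf = raw.split(b"\n")

# Default bytes line splitting recognises \n, \r and \r\n.
split_default = raw.splitlines()

print(split_on_lf)    # [b'first', b'second\r', b'third\rfourth']
print(split_default)  # [b'first', b'second', b'third', b'fourth']
```

With b"\n" as the only delimiter, the \r\n line keeps a stray \r and the bare \r is not treated as a line break at all.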
Therefore the best approach seems to be to not use decode_unicode=True at all, and instead manually decode each line only after iter_lines() has already split the response.
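A minimal sketch of that approach follows. It is not the code from the eventual fix: the helper name iter_decoded_lines, the UTF-8 encoding, the "replace" error handling and the FakeResponse stand-in (used here so the example runs without a network request) are all assumptions made for illustration.

```python
def iter_decoded_lines(response, encoding="utf-8"):
    """Yield str lines from a response that was split while still bytes.

    `response` is anything with an iter_lines() method that yields bytes,
    such as a requests Response used without decode_unicode=True.
    """
    for raw_line in response.iter_lines():
        # Decode only after splitting, so characters such as U+0085 stay
        # inside the line. "replace" is an assumed policy for bad bytes.
        yield raw_line.decode(encoding, "replace")


class FakeResponse:
    """Hypothetical stand-in for a streamed response, for demonstration."""

    def __init__(self, body):
        self._body = body

    def iter_lines(self):
        # bytes.splitlines() splits only on \n, \r and \r\n.
        return iter(self._body.splitlines())


body = "start\nPERFHERDER_DATA: {}\nerror \u0085 tail\n".encode("utf-8")
lines = list(iter_decoded_lines(FakeResponse(body)))

# The membership test from the failing code now works on str lines.
perf_lines = [line for line in lines if "PERFHERDER_DATA" in line]
print(len(lines), perf_lines)
```

Because the split happens on bytes, the line count stays at three and the line containing \u0085 remains a single line, while checks such as 'PERFHERDER_DATA' not in line operate on str and no longer raise TypeError.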
Comment 2•6 years ago
Comment 3 (Assignee) • 6 years ago
https://github.com/mozilla/treeherder/commit/4bdf9f91018b3a3306bf78db947a512f24225bab
https://github.com/mozilla/treeherder/commit/be58ab1b48b91ec9c4ba4f811a2e8e3a600ca4ab
https://github.com/mozilla/treeherder/commit/e1ebb72c1d0b24bc61411005f133f24a81a11f23
Updated•3 years ago