Closed Bug 1526743 Opened 6 years ago Closed 6 years ago

Python 3 "TypeError: a bytes-like object is required, not 'str'"

Tracking

(Not tracked)

Status:

RESOLVED FIXED

People

(Reporter: emorley, Assigned: emorley)

Attachments

(1 file)

Link to GitHub pull-request: https://github.com/mozilla/treeherder/pull/4628 (deleted), text/x-github-pull-request, added 6 years ago by GitHub Bugzilla PR Linker

Ed Morley [:emorley]

Assignee

Description

•

6 years ago

Under Python 3, most of the log parsing tests fail with:

>       if 'PERFHERDER_DATA' not in line:
E       TypeError: a bytes-like object is required, not 'str'

(e.g. https://travis-ci.org/mozilla/treeherder/jobs/490990871#L1200)

I believe this is due to our using the Python requests package's iter_lines() function without passing decode_unicode=True.

Ed Morley [:emorley]

Assignee

Comment 1

•

6 years ago

This is because iter_lines() returns bytes by default, unless decode_unicode=True is passed:
http://docs.python-requests.org/en/master/user/advanced/#streaming-requests
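
For illustration, the failure can be reproduced without any network access. This is only a sketch: the sample log line below is made up, and it stands in for one item yielded by iter_lines() when decode_unicode is not set.

```python
# Made-up sample line, standing in for one bytes item from iter_lines().
line = b'PERFHERDER_DATA: {"framework": {"name": "example"}}'

error = None
try:
    # str needle, bytes haystack: this is the comparison that fails on Python 3.
    'PERFHERDER_DATA' not in line
except TypeError as exc:
    error = str(exc)

print(error)

# Decoding the line first makes the same membership test valid.
decoded = line.decode('utf-8')
found = 'PERFHERDER_DATA' in decoded
print(found)
```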

However, passing decode_unicode=True means that iter_lines() splits log lines not only on the standard newline sequences (\n, \r and \r\n) but also on other Unicode line-boundary characters such as \u0085.

One might think that wouldn't affect us. However, some test logs do include characters like \u0085, so using decode_unicode=True means:

* the line numbers in such logs are changed (so the line number we record for an error may differ, which affects log viewer linking)
* the error message is split over multiple lines
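
The difference can be seen with plain bytes and str objects, assuming (as I understand it) that iter_lines() relies on splitlines() of each chunk when no delimiter is given. The log text here is invented for the example.

```python
# Invented log excerpt containing U+0085 (NEXT LINE) inside an error message.
text = 'TEST-UNEXPECTED-FAIL | test.js | bad char \u0085 here\nnext line\r\nlast'
raw = text.encode('utf-8')

# bytes.splitlines() only treats \n, \r and \r\n as line boundaries.
byte_lines = raw.splitlines()

# str.splitlines() also splits on Unicode boundaries such as \u0085.
text_lines = raw.decode('utf-8').splitlines()

print(len(byte_lines), len(text_lines))
for number, line in enumerate(text_lines, start=1):
    print(number, repr(line))
```

With the str version the error message is cut in two and every later line number shifts by one.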

We need to ensure we handle such characters the same as react-lazylog, so that when we link to a particular error line, the correct line is highlighted. Looking at their implementation, only traditional newlines are treated as newlines:
https://github.com/mozilla-frontend-infra/react-lazylog/blob/v3.1.3/src/utils.js#L3-L7

...and as such the existing Treeherder behaviour seems to be correct here, so we must preserve that.

One possible workaround appeared to be iter_lines()'s delimiter option. However, that treats the passed string as a single literal delimiter, so there is no way to pass several alternatives (i.e. to split on all of \n, \r and \r\n).
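
A rough sketch of why a single literal delimiter is not enough, using plain bytes methods rather than a real response. It assumes that a delimiter passed to iter_lines() ends up in an ordinary split() call, which is my reading of the behaviour rather than something stated in this bug.

```python
# Invented chunk mixing all three traditional newline styles.
chunk = b'first\nsecond\r\nthird\rfourth'

# A literal b'\n' delimiter leaves stray \r characters and misses the lone \r.
by_lf = chunk.split(b'\n')

# splitlines() handles \n, \r and \r\n together.
by_any = chunk.splitlines()

print(by_lf)
print(by_any)
```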

Therefore the best approach seems to be to not use decode_unicode=True at all, and instead manually decode each line only after iter_lines() has already split the response.
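
A minimal sketch of that approach follows. The helper name decoded_lines, the UTF-8 default and errors='replace' are assumptions made for the example, not necessarily what the linked pull request does.

```python
def decoded_lines(byte_lines, encoding='utf-8'):
    """Yield each already-split bytes line as str (hypothetical helper).

    Splitting has already happened on bytes, so characters such as \u0085
    stay inside their original line instead of starting a new one.
    """
    for line in byte_lines:
        # errors='replace' is an assumption, to tolerate malformed log bytes.
        yield line.decode(encoding, errors='replace')


# Stand-in for response.iter_lines() called without decode_unicode=True.
sample = [b'bad char \xc2\x85 here', b'PERFHERDER_DATA: {}']

result = list(decoded_lines(sample))
for line in result:
    if 'PERFHERDER_DATA' not in line:
        continue
    print(line)
```

In real use the sample list would be replaced by the iterator returned from iter_lines() on a streamed response.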

GitHub Bugzilla PR Linker

Comment 2

•

6 years ago

Attached file: Link to GitHub pull-request: https://github.com/mozilla/treeherder/pull/4628 (deleted)

Ed Morley [:emorley]

Assignee

Comment 3

•

6 years ago

https://github.com/mozilla/treeherder/commit/4bdf9f91018b3a3306bf78db947a512f24225bab
https://github.com/mozilla/treeherder/commit/be58ab1b48b91ec9c4ba4f811a2e8e3a600ca4ab
https://github.com/mozilla/treeherder/commit/e1ebb72c1d0b24bc61411005f133f24a81a11f23

Status: ASSIGNED → RESOLVED

Closed: 6 years ago

Resolution: --- → FIXED

Nobody; OK to take it and work on it

Updated

•

3 years ago

Component: Treeherder: Log Parsing & Classification → TreeHerder

Categories

(Tree Management :: Treeherder, defect, P1)
