Closed Bug 1526743 Opened 6 years ago Closed 6 years ago

Python 3 "TypeError: a bytes-like object is required, not 'str'"

Categories

(Tree Management :: Treeherder, defect, P1)

Tracking

(Not tracked)

RESOLVED FIXED

People

(Reporter: emorley, Assigned: emorley)

Attachments

(1 file)

Under Python 3, most of the log parsing tests fail with:

>       if 'PERFHERDER_DATA' not in line:
E       TypeError: a bytes-like object is required, not 'str'

(eg https://travis-ci.org/mozilla/treeherder/jobs/490990871#L1200)

I believe this is because we use the Python requests package's iter_lines() function without passing decode_unicode=True.

iter_lines() yields bytes by default, and only yields str when decode_unicode=True is passed:
http://docs.python-requests.org/en/master/user/advanced/#streaming-requests
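
A minimal sketch of the failure, for illustration only. The byte string below is a hypothetical stand-in for one line yielded by iter_lines() and is not taken from a real log:

```python
# Hypothetical stand-in for one line yielded by iter_lines()
# when decode_unicode=True is not passed (i.e. a bytes object).
line = b"PERFHERDER_DATA: {}"

try:
    # str needle, bytes haystack: Python 3 refuses to mix the two types.
    found = 'PERFHERDER_DATA' in line
except TypeError as exc:
    error_message = str(exc)

# Comparing like with like works: either use a bytes needle...
found_bytes = b'PERFHERDER_DATA' in line
# ...or decode the line to str before testing it.
found_str = 'PERFHERDER_DATA' in line.decode('utf-8')
```

Under Python 2 the same check passed silently because str and bytes were the same type, which is presumably why the tests only fail under Python 3.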

However, using decode_unicode=True means that iter_lines() splits log lines not only on the standard newline sequences (\n, \r and \r\n) but also on other Unicode line-boundary characters such as \u0085.

One might think that wouldn't affect us, but it turns out some test logs do include characters like \u0085, so using decode_unicode=True means:

  • the line numbers in such logs change (so the line number we record for an error may differ, which affects log viewer linking)
  • the error message is split over multiple lines
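
The difference can be reproduced without requests at all, since (as far as I can tell) iter_lines() relies on splitlines() when no delimiter is given, and splitlines() behaves differently for bytes and str. A small sketch, using a made-up log excerpt:

```python
# Hypothetical log excerpt: one error message containing U+0085,
# followed by a second, ordinary line.
text = "ERROR first half\u0085second half\nnext line\n"
raw = text.encode("utf-8")

# bytes.splitlines() only splits on \n, \r and \r\n.
bytes_lines = raw.splitlines()

# str.splitlines() also splits on U+0085 (and other Unicode line boundaries),
# so the error message is broken in two and later line numbers shift by one.
str_lines = text.splitlines()
```

Here bytes_lines has two entries while str_lines has three, so "next line" is line 2 in one case and line 3 in the other.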

We need to ensure we handle such characters the same as react-lazylog, so that when we link to a particular error line, the correct line is highlighted. Looking at their implementation, only traditional newlines are treated as newlines:
https://github.com/mozilla-frontend-infra/react-lazylog/blob/v3.1.3/src/utils.js#L3-L7

...and as such the existing Treeherder behaviour seems to be correct here, so we must preserve that.

One possible workaround appeared to be iter_lines()'s delimiter option. However, that treats the passed string as a single literal delimiter, so there is no way to specify several variants at once (\n, \r and \r\n).
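
To illustrate why a single literal delimiter is not enough, here is a sketch that uses plain bytes.split() as an approximation of what a literal delimiter does. That iter_lines() behaves this way internally is an assumption on my part, and the sample data is made up:

```python
# Hypothetical data mixing all three traditional newline styles.
raw = b"one\ntwo\r\nthree\rfour"

# A single literal delimiter only catches \n: the \r of "\r\n" is left
# attached to "two", and the bare \r is not treated as a line break at all.
split_on_lf = raw.split(b"\n")

# The default bytes behaviour handles \n, \r and \r\n uniformly.
split_default = raw.splitlines()
```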

Therefore the best approach seems to be to not use decode_unicode=True at all, and instead manually decode each line only after iter_lines() has already split the response.
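
A rough sketch of that approach, not the actual patch. The helper name iter_decoded_lines, the FakeResponse stand-in, the UTF-8 encoding and the errors="replace" policy are all assumptions made for illustration:

```python
class FakeResponse:
    """Hypothetical stand-in for a streamed requests response."""

    def __init__(self, body):
        self._body = body

    def iter_lines(self):
        # Mimics the default bytes behaviour: split on \n, \r and \r\n only.
        return iter(self._body.splitlines())


def iter_decoded_lines(response, encoding="utf-8"):
    """Yield str lines, decoding only after the bytes have been split."""
    for raw_line in response.iter_lines():
        # errors="replace" is an assumption: it avoids raising on invalid bytes.
        yield raw_line.decode(encoding, errors="replace")


body = "ERROR first half\u0085second half\nPERFHERDER_DATA: {}\n".encode("utf-8")
lines = list(iter_decoded_lines(FakeResponse(body)))
```

With this ordering the U+0085 character stays inside the first line, so line numbers match what a bytes-based split produces, while checks such as 'PERFHERDER_DATA' in line operate on str.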

Component: Treeherder: Log Parsing & Classification → TreeHerder