Closed Bug 1453967 Opened 7 years ago Closed 6 years ago

hashlib "TypeError: Unicode-objects must be encoded before hashing" under Python 3

Categories

(Tree Management :: Treeherder: Data Ingestion, enhancement, P3)

enhancement

Tracking

(Not tracked)

RESOLVED FIXED

People

(Reporter: emorley, Assigned: emorley)

Attachments

(1 file)

Under Python 3, any test that ingests jobs currently fails with:

"""
vagrant ~/treeherder $ pytest tests/etl/test_job_ingestion.py -x
...
    @staticmethod
    def calculate_hash(options):
        """returns an option_collection_hash given a list of options"""
        options = sorted(list(options))
        sha_hash = sha1()
        # equivalent to loop over the options and call sha_hash.update()
>       sha_hash.update(''.join(options))
E       TypeError: Unicode-objects must be encoded before hashing

treeherder/model/models.py:297: TypeError
"""

This is because Python 3's hashlib's .update() must be passed bytes not unicode:
https://docs.python.org/3/library/hashlib.html#hash-algorithms
https://docs.python.org/3/library/hashlib.html#hashlib.hash.update

...which means we just need to add an `.encode('utf-8')` or similar.

There are several other uses of hashlib in the codebase that will also need adjusting.

The question is:

1) Should we use `.encode('ascii')` or `.encode('utf-8')` ?
   - At the moment presumably under Python 2 we're using strings here, rather than unicode
   - Will one encoding or another change the resultant hashes for the strings that are actually seen here?

2) Do we have adequate test coverage to ensure we don't inadvertently change the existing hashes?
Assignee: nobody → emorley
Status: NEW → ASSIGNED