Open
Bug 1268484
Opened 9 years ago
Updated 3 years ago
Fuzzy autoclassification using ElasticSearch
Categories
(Tree Management :: Treeherder, defect, P3)
Tracking
(Not tracked)
NEW
People
(Reporter: jgraham, Unassigned)
Attachments
(1 file)
To deal with cases where the autoclassifier doesn't make a match because of variable data, try using ElasticSearch to do word-only matching.
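To illustrate what word-only matching could mean here, the following is a minimal sketch in plain Python, not Treeherder's implementation. The helper name `word_tokens` and the two sample messages are hypothetical. The idea is to drop numbers and punctuation (the variable data) before comparing messages:

```python
import re

# Hypothetical helper: keep only alphabetic words, so that numbers,
# punctuation and similar variable data do not affect the comparison.
WORD_RE = re.compile(r"[A-Za-z]+")


def word_tokens(message):
    """Return the lowercase alphabetic words of a failure message."""
    return [token.lower() for token in WORD_RE.findall(message)]


# Two made-up messages that differ only in their numeric values.
first = "Test timed out after 30000 ms (pid 4121)"
second = "Test timed out after 45000 ms (pid 977)"

# An exact comparison fails, but the word-only comparison succeeds.
exact_match = first == second
word_only_match = word_tokens(first) == word_tokens(second)
```

A real deployment would hand the tokenization to an ElasticSearch analyzer rather than a regular expression; the regular expression only stands in for that step.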
Reporter
Updated•9 years ago
Comment 1•9 years ago
The output from running `./manage.py es_import_failure_lines` with a larger dyno size, to prevent the one-off dyno from being killed:
https://emorley.pastebin.mozilla.org/8869563
I've also filed an issue with Heroku asking for `heroku run` to show error messages more clearly, to save others from spending ages debugging as we did:
https://help.heroku.com/tickets/359578
Reporter
Comment 2•9 years ago
https://treeherder-heroku.herokuapp.com/#/jobs?repo=try&revision=778e8b422d15&autoclassify
This seems to work, at least for simple things.
Comment 3•9 years ago
Reporter
Updated•9 years ago
Attachment #8751686 - Flags: review?(wlachance)
Attachment #8751686 - Flags: review?(emorley)
Comment 4•9 years ago
Comment on attachment 8751686 [details]
[treeherder] mozilla:es_matcher > mozilla:master
Left some initial feedback. Please could you also add a multi-line commit message explaining the reasoning for the change and an overview of the feature. There are a few other places where there could be some additional inline comments/docstrings.
Re-request review when you'd like me to take a final look :-)
Attachment #8751686 - Flags: review?(emorley)
Reporter
Updated•8 years ago
Attachment #8751686 - Flags: review?(emorley)
Updated•8 years ago
Assignee: nobody → james
Updated•8 years ago
Attachment #8751686 - Flags: review?(emorley) → review+
Comment 5•8 years ago
Comment on attachment 8751686 [details]
[treeherder] mozilla:es_matcher > mozilla:master
I don't have any experience with elasticsearch, but this seems to make sense to me.
Attachment #8751686 - Flags: review?(wlachance) → review+
Reporter
Comment 6•8 years ago
Comment on attachment 8751686 [details]
[treeherder] mozilla:es_matcher > mozilla:master
I added some new commits which I think/hope improve the performance somewhat. At least it hasn't entirely blown up on heroku at the moment.
Attachment #8751686 - Flags: review?(wlachance)
Attachment #8751686 - Flags: review?(emorley)
Attachment #8751686 - Flags: review+
Comment 7•8 years ago
Comment on attachment 8751686 [details]
[treeherder] mozilla:es_matcher > mozilla:master
This pretty much all looks fine to me, but I'm definitely not an elasticsearch expert.
Attachment #8751686 - Flags: review?(wlachance) → review+
Updated•8 years ago
Attachment #8751686 - Flags: review?(emorley) → review+
Reporter
Updated•8 years ago
Attachment #8751686 - Flags: review+ → review?(emorley)
Updated•8 years ago
Attachment #8751686 - Flags: review?(emorley) → review+
Comment 8•8 years ago
Commit pushed to master at https://github.com/mozilla/treeherder
https://github.com/mozilla/treeherder/commit/52607c2a57c9f9780b2b7b96c39124db700357d2
Bug 1268484 - Add elastic-search based matcher for test failure lines (#1488)
Add support for matching test failures where the test, subtest, status,
and expected status are all exact matches, but the message is not an
exact match. The matching uses ElasticSearch and is initially optimised
for cases where the messages differ only in numeric values since this is
a relatively common case.
This commit also adds ElasticSearch to the travis environment.
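For readers unfamiliar with this kind of matching, here is a rough sketch of how such a query could be expressed in the Elasticsearch query DSL, built as a plain Python dict. It is not the code from the commit: the function name `build_failure_query` is hypothetical, the field names simply mirror the wording of the commit message, and the index mapping is assumed to store the exact-match fields as non-analysed values so that `term` filters apply.

```python
import re


def build_failure_query(line):
    """Build an Elasticsearch query body (a plain dict) for one failure line.

    Assumption: `line` is a dict with the keys test, subtest, status,
    expected and message. These names are illustrative only.
    """
    # Drop numeric values, since they are the part expected to vary.
    message = " ".join(re.sub(r"\d+", " ", line["message"]).split())
    exact_fields = ("test", "subtest", "status", "expected")
    return {
        "query": {
            "bool": {
                # Exact matches: these clauses filter without affecting the score.
                "filter": [{"term": {name: line[name]}} for name in exact_fields],
                # Fuzzy part: every remaining word of the message must be present.
                "must": [
                    {"match": {"message": {"query": message, "operator": "and"}}}
                ],
            }
        }
    }


# Made-up failure line for illustration.
example_line = {
    "test": "dom/tests/test_example.html",
    "subtest": "loads quickly",
    "status": "FAIL",
    "expected": "PASS",
    "message": "Timed out after 30000 ms",
}
query_body = build_failure_query(example_line)
```

The resulting dict could be passed as the request body of a search call; scoring and any minimum-score cutoff are left out of this sketch.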
Comment 9•8 years ago
Elasticsearch 5.1.2 is now out and supported by Elastic Cloud.
The addon on the treeherder-prototype app is currently using Elasticsearch 2.3.5.
Given we'll need to set up new Elasticsearch addons on stage/prod when this lands, I think it makes sense to use 5.x from the outset.
Please can we try updating both the treeherder-prototype's addon and also the Python clients to match?
Depends on: 1331397
Reporter
Comment 10•7 years ago
We now have a clearer plan on how to improve this:
* ML approaches seem like overkill; text search probably does what we need whilst benefiting from robust implementations like ES.
* Instead of working line-by-line, we should consider the full set of lines from a document together. In the simplest case we can tokenize to remove expected-useless data and do an exact match on the full set of lines from a previous job. This has a couple of nice properties, notably that any context-dependent classifications (cases where the classification of identical lines with text M depends on whether they follow lines with text X or Y) will be retained as long as we see the same lines in the errorsummary file.
* For cases where a full match fails to produce a result, we can take the lines and match using TF-IDF weighting to find the most similar previous classifications, and apply a threshold so that only good-enough matches are accepted.
* To the extent that it's possible to work with full job summaries, it should be possible to work with job classification data rather than line classification data, which will allow more jobs to be autoclassified. It's not clear to me if this can work for the case where we aren't matching a full error summary.
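The two-stage plan above (exact match on the tokenized summary, then a TF-IDF fallback with a threshold) can be sketched in plain Python as follows. This is an illustration under stated assumptions, not a proposed implementation: all function names, the smoothed IDF formula, the 0.8 threshold and the sample data are hypothetical, and a real version would lean on ES for the similarity search.

```python
import math
import re
from collections import Counter


def tokenize(lines):
    """Flatten an error summary into lowercase word tokens, dropping digits."""
    return [t.lower() for line in lines for t in re.findall(r"[A-Za-z_]+", line)]


def tfidf_vectors(docs):
    """Return one {token: weight} dict per tokenized document (smoothed IDF)."""
    n = len(docs)
    df = Counter(token for doc in docs for token in set(doc))
    return [
        {t: c * (math.log((1 + n) / (1 + df[t])) + 1) for t, c in Counter(doc).items()}
        for doc in docs
    ]


def cosine(u, v):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    norm = math.sqrt(sum(w * w for w in u.values())) * math.sqrt(
        sum(w * w for w in v.values())
    )
    return dot / norm if norm else 0.0


def classify(new_lines, history, threshold=0.8):
    """history is a list of (summary_lines, classification) pairs."""
    new_tokens = tokenize(new_lines)
    docs = [tokenize(lines) for lines, _ in history]
    # Stage 1: exact match on the full tokenized summary. Line order is kept,
    # so context-dependent classifications survive.
    for tokens, (_, classification) in zip(docs, history):
        if tokens == new_tokens:
            return classification, 1.0
    if not docs:
        return None, 0.0
    # Stage 2: most similar previous summary by TF-IDF cosine similarity.
    vectors = tfidf_vectors(docs + [new_tokens])
    score, index = max((cosine(vectors[-1], v), i) for i, v in enumerate(vectors[:-1]))
    return (history[index][1], score) if score >= threshold else (None, score)


# Made-up history of previously classified error summaries.
history = [
    (["TEST-UNEXPECTED-FAIL | test_a.html | Timed out after 30000 ms"], "bug-1"),
    (
        [
            "TEST-UNEXPECTED-FAIL | test_b.html | Assertion failed: expected true got false",
            "PROCESS-CRASH | test_b.html | application crashed",
        ],
        "bug-2",
    ),
]
exact = classify(["TEST-UNEXPECTED-FAIL | test_a.html | Timed out after 45000 ms"], history)
fuzzy = classify(["TEST-UNEXPECTED-FAIL | test_a.html | Timed out after 45000 ms waiting"], history)
unknown = classify(["PROCESS-CRASH | test_c.html | out of memory"], history)
```

Here `exact` resolves in the first stage, `fuzzy` falls through to the similarity stage, and `unknown` is rejected by the threshold. Recomputing the vectors on every call is wasteful and only acceptable in a sketch.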
Updated•7 years ago
Component: Treeherder → Treeherder: Log Parsing & Classification
Updated•7 years ago
Assignee: james → nobody
Updated•5 years ago
Priority: -- → P3
Assignee
Updated•3 years ago
Component: Treeherder: Log Parsing & Classification → TreeHerder