Closed Bug 1494069 Opened 6 years ago Closed 6 years ago

[mozlint] Be more explicit with arguments and return values in `filterpaths`

Categories

(Developer Infrastructure :: Lint and Formatting, enhancement)

enhancement
Not set
normal

Tracking

(firefox64 fixed)

RESOLVED FIXED
mozilla64
Tracking Status
firefox64 --- fixed

People

(Reporter: ahal, Assigned: ahal)

References

Details

Attachments

(8 files, 1 obsolete file)

(deleted), text/x-phabricator-request
Details
(deleted), text/x-phabricator-request
Details
(deleted), text/x-phabricator-request
Details
(deleted), text/x-phabricator-request
Details
(deleted), text/x-phabricator-request
Details
(deleted), text/x-phabricator-request
Details
(deleted), text/x-phabricator-request
Details
(deleted), text/x-phabricator-request
Details
Currently the pathutils.filterpaths function accepts a linter definition and lintargs. It then extracts the values out of those that it needs. It can also modify these dictionaries in-place sometimes. This can lead to surprising behaviour and make debugging more difficult. I'd like to change it so that we need to explicitly pass in only the values that it needs to perform the filtering. This way not only will it be easier to keep track of, but we won't accidentally modify any references in-place. Instead, we'll need to explicitly return the values that the greater mozlint module expects. This will give us clear insight into the inputs and outputs of this function, in a sense making it more "dumb".
Scandir is a faster implementation of os.listdir since it caches file metadata as it works. Using listdir and then calling things like os.path.isfile after the fact will result in multiple system calls. Whereas with scandir, there will only be a single system call per file. Scandir is part of the stdlib in Python 3.6+, so the following can be used for Python 2/3 compatible code: try: from os import scandir, walk except ImportError: from scandir import scandir, walk
Often we specify globs in our exclude patterns, e.g: exclude: - **/node_modules - obj* However, these globs get expanded out to *every* file that matches them. This can sometimes be thousands or even tens of thousands of files. We then pass these paths on to the underlying linters and tell them to exclude them all. This causes a lot of overhead and slows down performance. This commit implements a "collapse" function. Given a set of paths, it'll collapse them into the smallest set of parent directories that contain the original set, and that don't contain any extra files. For example, given a directory structure like this: a -- foo.txt -- b -- bar.txt -- baz.txt -- c -- ham.txt -- d -- spam.txt Then the following will happen: >>> collapse(['a/foo.txt', 'a/b/bar.txt', 'a/c/ham.txt', 'a/c/d/spam.txt']) ['a/foo.txt', 'b/bar.txt', 'c'] Since all files under directory 'c' are specified by the original set (both 'c/ham.txt' and 'c/d/spam.txt'), we can collapse it down to just 'c'. However not all files under 'b' are specified (we're missing 'a/b/baz.txt'), so we can't collapse 'b' (and by extension also can't collapse 'a'). If we had included 'a/b/baz.txt': >>> collapse(['a/foo.txt', 'a/b/bar.txt', 'a/b/baz.txt', 'a/c/ham.txt', 'a/c/d/spam.txt']) ['a'] In both cases, the smallest set of paths that contains the original set (and only the original set) is computed. The collapse function has a little bit of overhead but it's not too bad. For example collapsing all files matched by '**/node_modules' takes ~0.015s. Collapsing two full objdirs, takes ~0.6s. But a follow up commit is planned to make sure we stop using 'obj*' to reduce that overhead. Depends on D7738
When using globs in exclude directorives, FileFinder will return every *file* that gets matched. This is can be thousands of files in the case of an objdir. While we now collapse these files down to highest possible directories, this collapse operation can still take a noticeable amount of time (0.6s). This simply scans topsrcdir for files that start with 'obj' to avoid the glob. This also moves the '_activate_virtualenv' call to the top of the function because in CI, this will cause an objdir to be created (to store the virtualenv). If this happens *after* calculating the global excludes, we won't catch it since it doesn't exist yet. This will result in the objdir's virtualenv being linted and erroneous failures. Depends on D7739
Assignee: nobody → ahal
Status: NEW → ASSIGNED
Pushed by ahalberstadt@mozilla.com: https://hg.mozilla.org/integration/autoland/rev/a2956764213e Vendor scandir==1.9.0 to third_party/python, r=davehunt https://hg.mozilla.org/integration/autoland/rev/fe0fb280dbfc [mozlint] Collapse exclude paths into their smallest possible set, r=egao https://hg.mozilla.org/integration/autoland/rev/9752f179b9c3 [lint] Explicitly list out objdirs rather than depend on 'obj*' in the global exclude, r=rwood
Blocks: 1498215
Backout by ncsoregi@mozilla.com: https://hg.mozilla.org/mozilla-central/rev/580c03c8ae38 Backed out 3 changesets for blocking 1498215. a=backout
Status: RESOLVED → REOPENED
Resolution: FIXED → ---
Turns out vendoring scandir causes problems if people happen to have an older version of it installed on their system. Also I decided to try using os.listdir and do some profiling comparisons. It turns out that using scandir isn't even much of a perf improvement for mozlint's common case (where we use ** with FileFinder). There likely are edge cases where we would see some bigger improvements, but it's not really worth the headaches that vendoring scandir causes. So I'm just going to re-implement this with os.listdir. If we ever end up vendoring scandir for some other purpose (or start requiring python 3), we can switch back to using it at that point.
Keywords: leave-open
Attachment #9014452 - Attachment is obsolete: true
Pushed by ahalberstadt@mozilla.com: https://hg.mozilla.org/integration/autoland/rev/b7e586708ecc [mozlint] Collapse exclude paths into their smallest possible set, r=egao https://hg.mozilla.org/integration/autoland/rev/2064477895c3 [lint] Explicitly list out objdirs rather than depend on 'obj*' in the global exclude, r=rwood
Patch dump incoming. This bug kind of ended up being a bit of a catch-all for all the minor issues I found with path filtering as I was working through my mozlint refactor. The good news is that I've been adding tests along the way and I'm now way more confident that the filtering logic is doing what it's supposed to. Also while there may be a large number of patches, I hope this makes them a lot easier to review than one giant monolithic patch.
I missed an edge case where the computed base itself was specified in the paths input.
This fixes a latent bug that is currently not being hit (by sheer luck). Basically the 'ignore' argument of a FileFinder object needs all paths to be relative to the base. As luck would have it, most of the time it worked out that way if you were running |mach lint| from the root of the repo. However there are edge cases where this will cause an 'exclude' directive to get missed. Plus this bug is about to be exposed 100% of the time in the next commit :). Depends on D8842
Right now there are excludes defined in the linter definition (via the .yml files), as well as excludes defined in lintargs (via the mach_commands.py). This is a minor simplification that extends each linter definition's local excludes with the global ones right off the bat. This just makes it a bit easier to keep track of. Depends on D5863
We were currently adding exclude paths to the "discard" set if the path contains the include, but we weren't adding them if the include contains the path. Depends on D5863
Currently pathutils.filterpaths computes some extra paths that the underlying linter should try to exclude if it can. It sets these on lintargs['exclude'], and relies on the fact that it was passed by reference to work. However, it seems like pytest does some magic under the hood that prevents this value from being propagated back. It will also fail if 'filterpaths' was invoked in a subprocess. It's much simpler and cleaner to pass this value back directly, rather than rely on a reference. Depends on D5862
Attachment #9017500 - Attachment description: Bug 1494069 - [mozlint] Return extra 'excludes' directly from pathutils.filterpaths, r=rwood → Bug 1494069 - [mozlint] Verify the expected_exclude paths in test_filterpaths, r?rwood
Attachment #9017500 - Attachment is obsolete: true
Currently pathutils.filterpaths computes some extra paths that the underlying linter should try to exclude if it can. It sets these on lintargs['exclude'], and relies on the fact that it was passed by reference to work. However, it seems like pytest does some magic under the hood that prevents this value from being propagated back. It will also fail if 'filterpaths' was invoked in a subprocess. It's much simpler and cleaner to pass this value back directly, rather than rely on a reference.
Attachment #9017500 - Attachment is obsolete: false
Attachment #9017496 - Attachment description: Bug 1494069 - [mozlint] Fix edge case in pathutils.collapse, r?egao → Bug 1494069 - [mozlint] Fix edge case in pathutils.collapse, r?rwood
Pushed by ahalberstadt@mozilla.com: https://hg.mozilla.org/integration/autoland/rev/fbd19192e4fc [mozlint] Fix edge case in pathutils.collapse, r=rwood https://hg.mozilla.org/integration/autoland/rev/3cbd82a01bc2 [mozlint] Properly exclude files in LineType linters, r=rwood https://hg.mozilla.org/integration/autoland/rev/9c587734513a [mozlint] Return extra 'excludes' directly from pathutils.filterpaths, r=rwood https://hg.mozilla.org/integration/autoland/rev/53abbe1d64d4 [mozlint] Append global 'excludes' to each linter at parse time, r=rwood https://hg.mozilla.org/integration/autoland/rev/b9beb91e9992 [mozlint] Verify the expected_exclude paths in test_filterpaths, r=rwood https://hg.mozilla.org/integration/autoland/rev/8a1fd7460ad5 [mozlint] Make sure exclude paths always get discarded when necessary, r=rwood
Status: REOPENED → RESOLVED
Closed: 6 years ago6 years ago
Keywords: leave-open
Resolution: --- → FIXED
Product: Firefox Build System → Developer Infrastructure
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Created:
Updated:
Size: