Bug 1712131 (Open) · Opened 3 years ago · Updated 3 years ago

[meta] Centralize Python dependency definitions

Categories

(Firefox Build System :: Mach Core, task, P2)

task

Tracking

(Not tracked)

People

(Reporter: mhentges, Unassigned)

References

(Depends on 8 open bugs, Blocks 2 open bugs)

Details

(Keywords: meta)

We currently define Python dependencies in a bunch of different locations, sometimes vendored and sometimes from PyPI.

We should centralize this and manage constraints (such as the "build" mach command not permitting non-vendored dependencies) accordingly.

See "Phase 1" of the "Consistently Handle Python Dependencies" proposal.

Depends on: 1712133
Depends on: 1712137
Depends on: 1712151
Depends on: 1712159
Priority: -- → P2
Depends on: 1717051
Depends on: 1717104
Depends on: 1723237
Depends on: 1724273
Depends on: 1724274
Depends on: 1724277
Depends on: 1724279
Depends on: 1725708
Depends on: 1726814
Depends on: 1727819

Had minor Matrix chatter with :glandium and a good discussion with :ahal about some python-dependency constraints and how we can find a solution within them, which I'd like to document here for future reference.

Constraints:

  • [main point of this ticket]: different Mach commands should be able to have different, conflicting Python requirements
  • We can't vendor native Python package code because:
    • We'd need to vendor wheels (space-inefficient), or
    • We'd need to build packages before doing work in CI (time-inefficient, also have to set up Python build environment dependencies)
  • We can't download Python packages from PyPI.org because:
    • We don't want PyPI.org downtime to affect releases/chemspills
    • Some downstream builders of Firefox don't allow network access during a build whatsoever
  • We can't lock/pin python dependencies for all commands because:
    • We install native python packages onto our CI workers' system python at worker-provision time (so, the worker can't be compatible with all worked-on revisions' lockfiles).
  • In-process activation of virtualenvs is "risky", and has to be manually verified to be compatible (a minimal sketch of what in-process activation amounts to follows this list). We verify compatibility only between the mach venv and each command venv. Activating from one command venv into another is unsafe, since they may have conflicting packages.
    • Therefore, Mach must not be invoked with a command venv's python, because then it would erroneously be doing command-venv -> Mach venv -> another command-venv, which is unsafe.
      • This is a new constraint that doesn't match existing CI expectations, and requires changes. (note: ahal, this is the cause of the failure of the task I sent to you today)
  • [existing behaviour] It must be possible to run scripts in a new Python process with the current virtualenv (keeping in-tree sys.path entries)
  • [ideal] A command virtualenv should behave identically between:
    • ./mach $command (virtualenv lifecycle: system python -> Mach venv -> $command-venv)
    • $objdir/_virtualenvs/$command/bin/python (virtualenv lifecycle: $command-venv)
    • This currently isn't true when the Mach venv has dependencies that $command-venv doesn't have. For example:
      • ./mach $command => import glean_sdk βœ…
      • $objdir/_virtualenvs/$command/bin/python -c "import glean_sdk" ❌
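
For readers less familiar with the "in-process activation" constraint above, here is a minimal sketch of what activating a virtualenv inside the already-running interpreter amounts to; the function name is hypothetical and Mach's real implementation differs:

```python
import site


def activate_in_process(site_packages_dir):
    """Hypothetical sketch: "activate" a virtualenv inside the current
    interpreter by making its site-packages importable.

    site.addsitedir() appends the directory to sys.path and processes
    any .pth files it contains. Crucially, it cannot un-import modules
    that are already loaded, which is why chaining
    command-venv -> Mach venv -> another command-venv is unsafe when
    the venvs contain conflicting packages.
    """
    site.addsitedir(site_packages_dir)
```

Because already-imported modules stay imported, activation can only safely move to a venv that has been verified compatible with everything loaded so far.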

Solutions:

  • Virtualenv consistency:

    • On environments where we can download Python packages:
      • Create Mach env as normal
      • Create command venv, add Mach env's site-packages to a pthfile, and add Mach env's sys.path modifications to the command venv too (see the pthfile sketch after this list)
    • On environments where we cannot download Python packages:
      • For the Mach env: first, verify that the external Python package top-level requirements are met (e.g. psutil>=5.4.0,<=5.8.0, etc). Then, create the venv with in-tree sys.path entries. Add system's site-packages to a pthfile.
      • For the command env: first, verify that the external Python package top-level requirements are met (e.g.: zstandard==0.15.2, etc). Then, create the command venv with in-tree sys.path entries, add Mach env's sys.path modifications, then finally add system site-packages to a pthfile.
    • Why?
      • This makes virtualenv-creation consistent: always use a Mach venv, always have a virtualenv for commands. (Right now in CI, the Mach virtualenv isn't created)
      • Regardless of how a virtualenv is used, import will always behave consistently
  • Pinning packages

    • For developer machines, Moz CI (non-build): talk to PyPI.org directly, use lockfiles
    • [discussion with :glandium upcoming] For Moz CI (build): use pypi.pub.b.m.o and lockfiles. This ensures we still don't have a dependency on PyPI for chemspills, while still allowing the embracing of lockfiles.
    • For downstream workers that don't have network access: Warn that lockfiles aren't used, but continue operation after verifying each venv's top-level requirements (e.g. psutil>=5.4.0,<=5.8.0) against the system python.
    • Why?
      • This embraces lockfiles everywhere possible while still supporting the downstream builder network-less constraint.
      • The assertion of "top-level requirements" against system Python is effective because Firefox developers will know how to determine their minimum/maximum library version bounds, while downstream will have sufficient information to simply change package versions on their workers without having to dig into Mach command source code.
      • Note: the assertion of "top-level requirements" is a bonus feature we can provide for downstream, but isn't necessarily required up-front.
  • ./mach python changes:

    • --no-virtualenv is obsolete (was needed for get_and_diffoscope task in the past, but no longer). Remove it.
    • --no-activate is a stand-in for supporting the "get native python packages from system python" use case. This will be solved by the "virtualenv consistency" solution listed above (system site-packages in-scope for command venvs)
    • --requirements is only used to download and install zstandard: replace this with a --virtualenv argument that uses a centralized virtualenv. Use the mach virtualenv by default.
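
To make the pthfile chaining from the "Virtualenv consistency" solution concrete, here is a minimal sketch. The helper name is hypothetical, the stdlib venv module stands in for Mach's own virtualenv machinery, and the objdir path is a placeholder:

```python
import sysconfig
import venv
from pathlib import Path


def create_command_venv(command, mach_site_packages, in_tree_paths):
    """Hypothetical sketch: create a command virtualenv whose pthfile
    pulls in the Mach venv's site-packages and the in-tree sys.path
    entries, so imports behave the same whether the venv is entered
    via `./mach $command` or via its python binary directly.
    """
    venv_dir = Path("objdir/_virtualenvs") / command  # placeholder path
    venv.create(venv_dir, with_pip=False)

    # Locate the new venv's site-packages, where the pthfile must live.
    site_packages = Path(sysconfig.get_path(
        "purelib", vars={"base": str(venv_dir), "platbase": str(venv_dir)}))

    # Non-import lines in a .pth file are appended to sys.path at
    # interpreter startup, so every listed path becomes importable.
    lines = [str(p) for p in in_tree_paths] + [str(mach_site_packages)]
    (site_packages / "mach_chain.pth").write_text("\n".join(lines) + "\n")
```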

./mach $command => import glean_sdk βœ…

Isn't that a special case because mach itself imports glean_sdk?

For the Mach env: first, verify that the external Python package top-level requirements are met (e.g. psutil>=5.4.0,<=5.8.0, etc). Then, create the venv with in-tree sys.path entries. Add system's site-packages to a pthfile.

It should be fine for mach native dependencies to be missing. The ones we have are psutil, glean_sdk and zstandard, and you can perfectly build with none of them installed. Note that adding the system's site-packages wholesale can be dangerous if different versions of what is available in-tree are also installed (at the very least, it would need to be guaranteed to come last). Also note that upper bounds can be a PITA.

For the command env: first, verify that the external Python package top-level requirements are met (e.g.: zstandard==0.15.2, etc). Then, create the command venv with in-tree sys.path entries, add Mach env's sys.path modifications, then finally add system site-packages to a pthfile.

(note zstandard is actually optional)

--no-virtualenv is obsolete (was needed for get_and_diffoscope task in the past, but no longer). Remove it.

It still is necessary if removing it prints an error message when psutil ends up being loaded. See bug 1620936.

--no-activate is a stand-in for supporting the "get native python packages from system python" use case

Not that I remember. It's used to not set $PATH. See bug 1675707.

--requirements is only used to download and install zstandard

The only uses I see in-tree for this are for psutil.

(In reply to Mike Hommey [:glandium] from comment #2)

./mach $command => import glean_sdk βœ…

Isn't that a special case because mach itself imports glean_sdk?

BTW, there's a timely complication, as per Matrix, @chutten is trying to add a mach command that uses the in-tree version of glean_parser, which conflicts with mach loading the one from the glean_sdk install. It's a problem today because the command uses the mach virtualenv, but a command virtualenv wouldn't solve the problem: for anything that wants to use a different version of a module that mach itself already loads, the in-process activation of the command virtualenv is a hindrance.

Isn't that a special case because mach itself imports glean_sdk?

That's exactly the case I'm referring to: code running in a command virtualenv should behave identically whether it's activated through the Mach venv, or is executed in a new python process started from the command virtualenv directly. We resolve this by ensuring that the mach venv's packages are also always in scope for command venvs.

BTW, there's a timely complication, as per Matrix, @chutten is trying to add a mach command that uses the in-tree version of glean_parser, which conflicts with mach loading the one from the glean_sdk install. It's a problem today because the command uses the mach virtualenv, but a command virtualenv wouldn't solve the problem: for anything that wants to use a different version of a module that mach itself already loads, the in-process activation of the command virtualenv is a hindrance.

Exactly. No matter which way we turn this problem, the command venv won't be able to use a different version of glean_parser than mach, because Mach imports it at the beginning of its process.
My solution is to embrace this: we're already validating that all command venvs are compatible with the mach venv (this caught the glean-parser mismatch, btw), so it's good behaviour if command virtualenvs always contain mach virtualenv packages.

It should be fine for mach native dependencies to be missing. The ones we have are psutil, glean_sdk and zstandard, and you can perfectly build with none of them installed.

(note zstandard is actually optional)

Yes, these packages are optional, but that doesn't mean that we, in Moz CI, aren't using them. And when we are using, say, psutil during the build, we want to be using a compatible version.
So, to append details to make the plan here more explicit: for each external python package dependency:

  • If it's required: see description in my original comment
  • If it's optional:
    • If it's not installed in the system environment, log a warning (e.g.: telemetry won't be collected), continue
    • If it's installed in the system environment: assert that it matches loose requirements. If this assertion fails, then raise an error.
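
A minimal sketch of that per-package assertion, assuming the `packaging` library is available and that the check runs under the system python; the function name and messages are made up:

```python
from importlib.metadata import PackageNotFoundError, version

from packaging.requirements import Requirement


def check_system_package(requirement_str, optional=False):
    """Hypothetical sketch: verify one top-level requirement, e.g.
    "psutil>=5.4.0,<=5.8.0", against the system python's installed
    packages, following the required/optional rules above."""
    req = Requirement(requirement_str)
    try:
        installed = version(req.name)
    except PackageNotFoundError:
        if optional:
            # e.g.: telemetry won't be collected; carry on regardless.
            print(f"warning: optional package {req.name} not installed")
            return
        raise RuntimeError(f"required package missing: {requirement_str}")

    if not req.specifier.contains(installed, prereleases=True):
        raise RuntimeError(
            f"{req.name}=={installed} does not satisfy {requirement_str}")


# e.g.: check_system_package("psutil>=5.4.0,<=5.8.0", optional=True)
```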

Note that adding the system's site-packages wholesale can be dangerous if different versions of what is available in-tree are also installed (at the very least, it would need to be guaranteed to come last)

It is dangerous, but we're already doing it today in some cases.
Since the system site-packages will indeed be last in the sys.path, shadowed packages won't be imported, which is good. The only risk is a python script with conditional import some_unexpected_package behaviour: that could cause issues if that package is in the system python environment and doesn't have its version validated. This is unlikely, though, especially since a package containing such behaviour should explicitly describe that dependency.

Also note that upper bounds can be a PITA.

Can you elaborate?

It still is necessary if removing it prints an error message when psutil ends up being loaded. See bug 1620936.

It used to print an error because we used to try to build psutil from vendored source when we built virtualenvs. We don't anymore.

Not that I remember. It's used to not set $PATH. See bug 1675707.

It's used to not set $PATH for supporting the "get native python packages from system python" use case (in this case, zstandard). Once virtualenvs include a sys.path entry to the system python (when we can't download packages from a PyPI repo), it'll be OK to use a command virtualenv directly.

The only uses I see in-tree for this are for psutil.

Right, psutil, not zstandard. Mixed those up, thanks.

Hmm, there is one potential failure case with native packages coming from the system, though.
Imagine this scenario:

  1. A top-level Mach requirement is glean-sdk>=36.0.0,<=40.0.0
  2. In the build virtualenv, we vendor glean-parser==3.6.0 (compatible with glean-sdk==40.0.0, not glean-sdk==36.0.0)
  3. On machines that download pinned dependencies, we have glean-sdk==40.0.0 and glean-parser==3.6.0. βœ…
  4. On some CI workers, we have glean-sdk==36.0.0. However, when we run the build command, we have glean-sdk==36.0.0 in scope, but our version of glean-parser (3.6.0) is too new for it. It fails at runtime.
    • If we flipped this around and used glean-parser from the system (2.5.0, as is compatible with glean-sdk==36.0.0), then the build would fail because it expects 3.6.0, not 2.5.0.

We have three options here:

  1. Determine a way to verify that the "flexible requirements" are all compatible with vendored packages in-scope. This may be non-trivial
  2. Don't allow flexible requirements, ensure that all external python packages depended on by the mach or build venvs are optional requirements.
  3. Accept this risk, since it will only affect downstream builders and our CI (until if/when it embraces pypi.pub and pinning)

Hmm, thinking about this more, when using packages from the system, simply asserting that the top-level package meets requirements is insufficient.
This is because transitive dependencies may conflict with vendored requirements. For example:


Imagine a virtualenv with the requirements:

  • pypi:native_package==1.0
  • vendored:shared_package (is ==1.0)

The dependencies of native_package==1.0 are:

native_package==1.0
  └── requires middle_package>=1.0,<=2.0
        ├── middle_package==1.0 requires shared_package==1.0
        └── middle_package==2.0 requires shared_package==2.0

In CI, we have a test for each virtualenv to check "are the specified pypi and vendored packages compatible?"

  • Essentially (for this venv) this does pip install native_package==1.0 AND shared_package==1.0
  • However, this asserts that there exists a compatible combination of $(vendored packages) + $(pypi packages and dependencies), but NOT that every combination is compatible

So, for this example, it would find that:

  • native_package==1.0
  • middle_package==1.0
  • shared_package==1.0
  • βœ… Are compatible βœ…

However, if a system Python environment has the following installed:

  • native_package==1.0
  • middle_package==2.0
  • shared_package==2.0

When we run the command, the following packages are in-scope:

  • native_package==1.0 (from system)
  • middle_package==2.0 (from system)
  • shared_package==1.0 (from vendored)

But middle_package==2.0 is NOT compatible with shared_package==1.0!
Simply asserting native_package==1.0 is not sufficient to confirm compatibility.


There's a new solution here though, that allows us to still depend on system packages:

  • In addition to checking that top-level package requirements are met, do:
  • For each vendored package, assert that it doesn't conflict with the system environment. So, it should either:
    • Not be installed in the system environment
    • Be the exact same version (==) as the one installed in the system environment
  • Finally, run pip check on the system environment to ensure that the system environment is internally correct.

This should have us covered. Unfortunately, it will cause some additional work for system environments, because when vendored packages shared by the system environment change versions, the system environment will need to be updated - even if it has a compatible version. This is because we're doing a == check instead of a "compatible" check, but doing a "compatible" check requires knowledge of all packages in scope, and pip check doesn't allow us to hack our vendored packages into the scope. So, we'd have to create a brand new virtualenv, install the vendored packages plus system packages (without network/hitting PyPI!), then run pip check inside of it, which is a lot harder.
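
A sketch of those assertions, assuming a name-to-version mapping of vendored packages is at hand and that the code runs under the system python; the helper name is hypothetical:

```python
import subprocess
import sys
from importlib.metadata import PackageNotFoundError, version


def assert_vendored_compatible_with_system(vendored):
    """Hypothetical sketch: each vendored package must either be absent
    from the system environment or exactly match its system version;
    the system environment itself must then pass `pip check`.

    `vendored` is assumed to be a {name: version} mapping derived from
    the in-tree vendored-package manifest."""
    for name, vendored_version in vendored.items():
        try:
            system_version = version(name)
        except PackageNotFoundError:
            continue  # not installed on the system: no conflict possible
        if system_version != vendored_version:
            raise RuntimeError(
                f"vendored {name}=={vendored_version} conflicts with "
                f"system {name}=={system_version}")

    # Ensure the system environment is internally consistent.
    subprocess.run([sys.executable, "-m", "pip", "check"], check=True)
```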

An additional constraint and associated solution is worth documenting here.
Right now in CI, we don't differentiate between tasks that can fetch packages from a PyPI repo and which tasks can't. Rather, it's done implicitly (and inconsistently, heh) - the general trend is that ./mach build shouldn't download pip packages (it does in our CI, but A: it shouldn't and B: downstream, it doesn't have to). All Mach commands that aren't needed for a chemspill (IIRC, only ./mach build is) should be able to fetch from PyPI.

Anyways, this means that, in the early part of Mach initialization, we don't know whether we'll have access to PyPI or whether we need to access pip packages via the system. We can't lean on the MOZ_AUTOMATION environment variable because it's even used when we are allowed to talk to external PyPI repos.

So, instead, I propose that we embrace the MACH_USE_SYSTEM_PYTHON environment variable in our CI. When it's set, Mach will:

  • Restrict the downloading of pip packages (for any venv, whether it's the "mach" venv, a command venv, or otherwise)
  • Perform the system-package-assertions listed in the previous comment.
| Venv | MACH_USE_SYSTEM_PYTHON=1 | MACH_USE_SYSTEM_PYTHON unset |
|---|---|---|
| Mach venv | Add Mach pths to sys.path | Create Mach virtualenv |
| Mach venv | Check system packages match pypi: requirements | Install pypi: requirements to Mach virtualenv |
| Mach venv | Do pip check with system pip | Scrub system site-packages from sys.path |
| Mach venv | Check that system packages are compatible with vendored: requirements | Activate Mach virtualenv |
| Command venv | Create command virtualenv | Create command virtualenv |
| Command venv | Add Mach pths to command virtualenv pthfile | Add Mach pths to command virtualenv pthfile |
| Command venv | Check system packages match pypi: requirements | Install pypi: requirements to command virtualenv |
| Command venv | Check that system packages are compatible with vendored: requirements | Add Mach virtualenv's site-packages to command virtualenv pthfile |
| Command venv | Add system site-packages to pthfile | Activate command virtualenv |
| Command venv | Activate command virtualenv | |

So far so good.
However, now let's consider the current situation where we aren't specifically using MACH_USE_SYSTEM_PYTHON=1 in CI for tasks that can't hit pip.
We want to push forward our virtualenv consistency, without causing breakage in the interim.


First, my proposed solution doesn't reach the same level of stability/guarantees as above, so I'm going to prime your thoughts by listing the downsides with the approach that we've been using so far:

  • Due to command virtualenvs not containing Mach's scope, we've had to do workarounds to access packages only installed to the system.
  • As mentioned above, a command virtualenv behaves differently (has different packages in scope for import) depending on whether it is activated via a ./mach <command>, or whether <command-venv>/bin/python is used directly.
  • In cases where we're doing ./mach <command> with a pip install, we've been at risk of system packages conflicting with the installed packages.

The CI-without-MACH_USE_SYSTEM_PYTHON solution I'm leaning towards looks like the following:

  • When MOZ_AUTOMATION isn't set, create virtualenvs and install from PyPI "as normal". This is identical to the above solution when MACH_USE_SYSTEM_PYTHON isn't set.
  • When MOZ_AUTOMATION=1, gate whether or not we can access PyPI based on the command. build will force us to use system packages, while others (e.g.: doc) will allow PyPI downloading.

As mentioned above, since we don't know the command at Mach-initialization time, we're forced to use system packages for the Mach env.
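
A sketch of that per-command gate; the restricted-command set and helper name are assumptions, not existing Mach code:

```python
import os

# Hypothetical: the set of commands that must stay functional without
# network access (and therefore without PyPI) in automation.
PIP_RESTRICTED_COMMANDS = {"build"}


def can_download_pip_packages(command):
    """Sketch of the gate described above: developer machines always
    install from PyPI; under MOZ_AUTOMATION=1, only commands outside
    the restricted set may download pip packages."""
    if os.environ.get("MOZ_AUTOMATION") != "1":
        return True
    return command not in PIP_RESTRICTED_COMMANDS
```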

| Venv | build | something else, e.g.: doc |
|---|---|---|
| Mach venv | Add Mach pths to sys.path | (identical) |
| Mach venv | Check system packages match pypi: requirements | (identical) |
| Mach venv | Do pip check with system pip | (identical) |
| Mach venv | Check that system packages are compatible with vendored: requirements | (identical) |
| Command venv | Create command virtualenv | Create command virtualenv |
| Command venv | Add Mach pths to command virtualenv pthfile | Add Mach pths to command virtualenv pthfile |
| Command venv | Check system packages match pypi: requirements | Install pypi: requirements to command virtualenv |
| Command venv | Check that system packages are compatible with vendored: requirements | Activate command virtualenv |
| Command venv | Add system site-packages to pthfile | |
| Command venv | Activate command virtualenv | |

Notes and downsides with this approach:

  • When MOZ_AUTOMATION=1 and the command allows fetching from PyPI (e.g.: ./mach doc):
    • We aren't able to verify that the command's pypi: dependencies are compatible with the system packages.
    • The virtualenv will behave differently between ./mach <command> and <command-venv>/bin/python <script>.
      • For ./mach <command>, system packages will still be in scope. This makes sense because, for example, Mach's glean handle is in scope, and it depends on the native glean-sdk package installed to the system.
      • For <command-venv>/bin/python <script>, system packages will not be in scope. This is to reduce the window in which incompatible system packages and virtualenv packages are imported at the same time. This also matches existing behaviour.

The benefits of implementing this approach to CI are:

  • The ./mach build virtualenv will behave more consistently and be easier to use
  • ./mach <command> will behave similarly to before, without making any downsides worse, and
  • We'll be in a more stable, well-defined position when we do the small migration to MACH_USE_SYSTEM_PYTHON.

Side-discovery here: pip respects sys.path (and therefore pth files as documented in site), so it automatically picks up all packages with a .dist-info or .egg-info directory.
So, we don't need to get fancy with how we verify requirements with the system Python. Instead, we can:

  1. Create a temporary virtualenv
  2. Add a pth file with:
     import sys; sys.path.extend([$vendored_package_paths])
     import sys; sys.path.extend([$system_site_paths])
  3. Finally, run pip check within that virtualenv.
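
A minimal sketch of that check, assuming Mach supplies the lists of vendored-package paths and system site-packages paths; the helper name is hypothetical and a posix venv layout is assumed:

```python
import subprocess
import sysconfig
import tempfile
import venv
from pathlib import Path


def pip_check_combined(vendored_paths, system_site_paths):
    """Hypothetical sketch: build a throwaway virtualenv, splice the
    vendored packages and the system site-packages into its sys.path
    via a .pth file, then let `pip check` judge the combined set."""
    with tempfile.TemporaryDirectory() as tmp:
        venv_dir = Path(tmp) / "check-venv"
        venv.create(venv_dir, with_pip=True)

        site_packages = Path(sysconfig.get_path(
            "purelib", vars={"base": str(venv_dir), "platbase": str(venv_dir)}))

        # Lines starting with "import" in a .pth file are executed at
        # interpreter startup, extending sys.path before pip runs.
        (site_packages / "combined.pth").write_text(
            f"import sys; sys.path.extend({[str(p) for p in vendored_paths]!r})\n"
            f"import sys; sys.path.extend({[str(p) for p in system_site_paths]!r})\n")

        # pip respects sys.path, so it sees every package that has a
        # .dist-info or .egg-info directory on the extended path.
        python = venv_dir / "bin" / "python"  # posix layout assumed
        subprocess.run([str(python), "-m", "pip", "check"], check=True)
```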

This is a slightly additive solution on top of the one described in this comment:

For each vendored package, assert that it doesn't conflict with the system environment. So, it should either [not be installed] or be the exact same version (==) as the one installed in the system environment

Instead, now if a package exists both as a vendored package and as installed in the system environment, it doesn't have to be an == match, but rather just compatible. Great!
Additionally, there's less work that we have to do - we just entirely lean on pip.

The downside is that it can't see packages that were copied in manually - those without .dist-info or .egg-info.
However, there are relatively few of these packages, especially since we ported all possible third_party/python/* packages to use pip to vendor.

Depends on: 1730712
Depends on: 1731836
Depends on: 1731838
Depends on: 1732946
Blocks: 1732947
No longer blocks: 1732947
Depends on: 1732947
Depends on: 1732948
Depends on: 1755516
Depends on: 1755562

It's been 10 months, so it's worth summarizing the improvements that we've seen from this work so far:

Summary of landed improvements

What's next?

Depends on: 1763244
Depends on: 1764111