Python environments revisited#
Note
This page was originally in PA12 README (TSA week; 1 week before Opt week).
Until now, we have been able to complete our work in MUDE with a few packages like numpy
and scipy
in our mude
environment, which we create and manage with conda
. However, as we start to cover more advanced analysis techniques, the need to use special packages increases, because they include, for example, a) specialized numerical techniques that are difficult or tedious to implement in (small) pieces of code, b) numerical schemes are implemented in a way to make computation faster, or c) advanced visualization to help interpret analysis and results.
Unfortunately, each of these packages are themselves require a number of different packages to function properly. To see this, try executing the following command in your Anaconda prompt or terminal (replacing ENV_NAME
with mude
for example). You can also try it for base
.
conda env export -n ENV_NAME
You should see a long list of packages. Do you remember installing all of them with conda install
or pip install
? You shouldn’t—but where did they come from? Many of them are packages required by the packages that you did specifically install. These packages are called dependencies, and are necessary to make your Python packages function as expected. When you run conda install numpy
, for example, conda
checks all of the dependent packages that are needed and makes sure they are also provided in the environment
that is being created. In reality, this is simply a folder on your computer with all of the *.py
files stored in it. This package management is what conda
and pip
are really doing when you run them with an install
command. It also checks that it has a suitable version of each dependency; this is why it sometime takes a long time to install a package (imagine you, as conda install
, going around to all of the other packages stored on your computer and asking what version of package X do you prefer? then trying to figure out how to make everything match with each of them, and doing it for all depedencies!). Unfortunately, this means that as you add more packages to a particular environment, it gets more and more difficult to make sure everything works well together. Luckily, there is a practical solution: create new environments for specific projects to make sure the proper packages can function properly!
Create environment with specific packages#
As you saw during Q1, it is easy to create a new environment with a specific version of Python, for example, with the command: conda create -n ENV_NAME python=3.11 anaconda
. We can install as many packages as we want when creating the environment by adding them to the end of the list. For example, can you see which packages would be installed by running the following command?
conda create -n ENV_NAME python=3.11 numpy scipy
Once you require a large number of packages, this can be tedious! Luckily there is a solution: listing the required packages in a text-based file, and then telling conda
to create the environment based on the contents of the file!
Create environment from text-based file#
All we need to do to create an environment from a file is to write a list of what we want and then tell conda
to read it. That’s it!
List requirements in *.yml
file#
To write our list of requirements, we will use a file with a new (to us) file extension: the *.yml
file (pronounced “yah-mul”). It is a text-readable file, that stands for “Yet another Markup Language.” You don’t need to worry about this, except to recognize that this is one of many types of files that use a particular type of text formatting to give a computer specific instructions. It is very similar to the way Markdown formatting works.
Take a look at the contents of the file environment.yml
in this repository. Can you understand what is being described? For each section (name
and dependencies
) you should see that it uses a colon :
to list the information. This will be processed by conda
when creating the new environment.
There is another special type of formatting with two colons ::
. This is how we tell conda
to look on a specific channel for the particular package. Conda channels are the locations where packages are stored; you can think of them as a specific URL web address. This is where the creator of the package can manage and maintain its distribution (e.g., publishing new versions, installation information, etc). Conda packages are downloaded from these URL’s, and if you know where a particular package is stored, you can give conda
explicit instructions. For example, we can see that Gurobi is stored on the gurobi
channel, because the URL is https://anaconda.org/gurobi/gurobi
(note that Anaconda is an organization that provides a wide variety of software; the website anaconda.com is used to provide documentation and information about the organization, whereas anaconda.org is explicitly used for package distribution). This is specified in the environment file using the channel::package
notation. In the *.yml
file, gurobi::gurobi
is equivalent to using the command conda install -c gurobi gurobi
in Anaconda prompt.
In summary, as you can see from reading the file, we will set up an environment specifically for this assignment, PA12, along with a number of dependency packages, two of which are installed from special conda channels.
Create environment from *.yml
file#
The command for creating the environment is simple. Do the following:
Open Anaconda Prompt (Windows) / your default terminal app (Mac)
Navigate to your working directory (where this file and
environment.yml
is located)Execute this command:
conda env create -f environment.yml
Keep reading this assignment as you wait (this may take several minutes)
Do you know why this takes so long? Because we are installing many packages at once! Keep an eye on the terminal window as this process is completed. First conda
is collecting information about the dependencies, then it will solve the environment; in other words, figure out which version of each package it should use. Once it is ready, it will present the list of packages and peoceed with the “installation” (really just downloading *.py
files and putting them in a folder on your computer (note that the prompt may ask you to confirm that the installation should proceed, depending on your system settings).
Once the environment is created, we can activate it, and also check that everything was installed properly. Try conda env export -n ENV_NAME
to see what was installed by “default.” The list is very long, even though we only asked for a few packages!
It is also interesting to try conda env export --from-history
(make sure you activated it already), which shows the specific packages requested. Do you notice anything in particular when looking at the output? That’s right, it’s exactly the same as our file environment.yml
! The only thing extra is that it identifies default
as the conda channel (since we didn’t specify anything else in the *.yml
file).