- Installing Python Modules¶
- Key terms¶
- Basic usage¶
- How do I …?¶
- … install pip in versions of Python prior to Python 3.4?¶
- … install packages just for the current user?¶
- … install scientific Python packages?¶
- … work with multiple versions of Python installed in parallel?¶
- Common installation issues¶
- Installing into the system Python on Linux¶
- Pip not installed¶
- Installing binary extensions¶
- 5. Building C and C++ Extensions on Windows¶
- 5.1. A Cookbook Approach¶
- 5.2. Differences Between Unix and Windows¶
- 5.3. Using DLLs in Practice¶
- 6. Modules¶
- 6.1. More on Modules¶
- 6.1.1. Executing modules as scripts¶
- 6.1.2. The Module Search Path¶
- 6.1.3. “Compiled” Python files¶
- 6.2. Standard Modules¶
- 6.3. The dir() Function¶
- 6.4. Packages¶
- 6.4.1. Importing * From a Package¶
- 6.4.2. Intra-package References¶
- 6.4.3. Packages in Multiple Directories¶
Installing Python Modules¶
As a popular open source development project, Python has an active supporting community of contributors and users that also make their software available for other Python developers to use under open source license terms.
This allows Python users to share and collaborate effectively, benefiting from the solutions others have already created to common (and sometimes even rare!) problems, as well as potentially contributing their own solutions to the common pool.
This guide covers the installation part of the process. For a guide to creating and sharing your own Python projects, refer to the distribution guide.
For corporate and other institutional users, be aware that many organisations have their own policies around using and contributing to open source software. Please take such policies into account when making use of the distribution and installation tools provided with Python.
Key terms¶
pip is the preferred installer program. Starting with Python 3.4, it is included by default with the Python binary installers.
A virtual environment is a semi-isolated Python environment that allows packages to be installed for use by a particular application, rather than being installed system wide.
venv is the standard tool for creating virtual environments, and has been part of Python since Python 3.3. Starting with Python 3.4, it defaults to installing pip into all created virtual environments.
virtualenv is a third party alternative (and predecessor) to venv . It allows virtual environments to be used on versions of Python prior to 3.4, which either don’t provide venv at all, or aren’t able to automatically install pip into created environments.
The Python Packaging Index is a public repository of open source licensed packages made available for use by other Python users.
The Python Packaging Authority is the group of developers and documentation authors responsible for the maintenance and evolution of the standard packaging tools and the associated metadata and file format standards. They maintain a variety of tools, documentation, and issue trackers on both GitHub and Bitbucket.
distutils is the original build and distribution system first added to the Python standard library in 1998. While direct use of distutils is being phased out, it still laid the foundation for the current packaging and distribution infrastructure, and it not only remains part of the standard library, but its name lives on in other ways (such as the name of the mailing list used to coordinate Python packaging standards development).
Changed in version 3.5: The use of venv is now recommended for creating virtual environments.
Basic usage¶
The standard packaging tools are all designed to be used from the command line.
The following command will install the latest version of a module and its dependencies from the Python Packaging Index:
For POSIX users (including Mac OS X and Linux users), the examples in this guide assume the use of a virtual environment.
For Windows users, the examples in this guide assume that the option to adjust the system PATH environment variable was selected when installing Python.
It’s also possible to specify an exact or minimum version directly on the command line. When using comparison operators such as >, or any other special character that is interpreted by the shell, the package name and the version should be enclosed within double quotes:
Normally, if a suitable module is already installed, attempting to install it again will have no effect. Upgrading existing modules must be requested explicitly:
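The command lines for this section did not survive in this copy of the text. As a hedged sketch, the helper below builds the equivalent python -m pip install argument lists from Python. The function name pip_install_command and the package name SomePackage are placeholders invented for illustration, and nothing is installed unless a list is actually handed to subprocess.run.

```python
import shlex
import sys


def pip_install_command(requirement, upgrade=False, user=False):
    """Return the argv list for ``python -m pip install`` (hypothetical helper)."""
    command = [sys.executable, "-m", "pip", "install"]
    if upgrade:
        command.append("--upgrade")
    if user:
        command.append("--user")
    command.append(requirement)
    return command


# Latest version, an exact version, a minimum version, and an explicit upgrade.
for args in (
    pip_install_command("SomePackage"),
    pip_install_command("SomePackage==1.0.4"),
    pip_install_command("SomePackage>=1.0.4"),
    pip_install_command("SomePackage", upgrade=True),
):
    # shlex.join quotes the ">=" requirement so that a POSIX shell would not
    # treat ">" as output redirection.
    print(shlex.join(args))
```

Using sys.executable rather than a bare python ensures that the pip being addressed belongs to the interpreter that is running the code.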
More information and resources regarding pip and its capabilities can be found in the Python Packaging User Guide.
Creation of virtual environments is done through the venv module. Installing packages into an active virtual environment uses the commands shown above.
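A virtual environment can also be created programmatically through the same venv module. The sketch below is only an illustration: the directory name demo-env and the use of a temporary directory are assumptions, and with_pip=False is chosen so the example runs quickly and without network access.

```python
import tempfile
import venv
from pathlib import Path

# Create the environment in a throwaway directory (an assumption for this
# demo; a real project would usually keep it next to the code).
env_dir = Path(tempfile.mkdtemp()) / "demo-env"

# with_pip=False keeps the sketch fast; pass True to have pip bootstrapped
# into the new environment.
venv.create(env_dir, with_pip=False)

# Every environment gets a pyvenv.cfg file recording its base interpreter.
print((env_dir / "pyvenv.cfg").read_text())
```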
How do I …?¶
These are quick answers or links for some common tasks.
… install pip in versions of Python prior to Python 3.4?¶
Python only started bundling pip with Python 3.4. For earlier versions, pip needs to be “bootstrapped” as described in the Python Packaging User Guide.
… install packages just for the current user?¶
Passing the --user option to python -m pip install will install a package just for the current user, rather than for all users of the system.
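To see where a per-user installation would end up, the standard site module can be queried. This is a small illustrative sketch; the printed paths depend on the platform and on how Python was installed.

```python
import site

# Base directory for per-user installations (for example ~/.local on Linux).
print(site.getuserbase())

# Directory where per-user packages become importable.
print(site.getusersitepackages())

# True when the user site directory is in use; False or None when it has
# been disabled, for instance inside an isolated virtual environment.
print(site.ENABLE_USER_SITE)
```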
… install scientific Python packages?¶
A number of scientific Python packages have complex binary dependencies, and aren’t currently easy to install using pip directly. At this point in time, it will often be easier for users to install these packages by other means rather than attempting to install them with pip .
… work with multiple versions of Python installed in parallel?¶
On Linux, Mac OS X, and other POSIX systems, use the versioned Python commands in combination with the -m switch to run the appropriate copy of pip :
Appropriately versioned pip commands may also be available.
On Windows, use the py Python launcher in combination with the -m switch:
Common installation issues¶
Installing into the system Python on Linux¶
On Linux systems, a Python installation will typically be included as part of the distribution. Installing into this Python installation requires root access to the system, and may interfere with the operation of the system package manager and other components of the system if a component is unexpectedly upgraded using pip .
On such systems, it is often better to use a virtual environment or a per-user installation when installing packages with pip .
Pip not installed¶
It is possible that pip does not get installed by default. One potential fix is:
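The original command is missing from this copy. As a hedged sketch, the check below reports whether pip can be imported by the running interpreter and, if not, prints the ensurepip invocation that would normally bootstrap it. The messages are illustrative, and some Linux distributions ship ensurepip in a separate system package.

```python
import importlib.util
import sys

# find_spec returns None when a top-level module cannot be located.
pip_available = importlib.util.find_spec("pip") is not None
ensurepip_available = importlib.util.find_spec("ensurepip") is not None

if pip_available:
    print("pip is importable for", sys.executable)
elif ensurepip_available:
    # ensurepip bootstraps the copy of pip bundled with this Python.
    print("pip is missing; try:", sys.executable, "-m ensurepip --default-pip")
else:
    print("neither pip nor ensurepip was found; check your platform's packages")
```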
There are also additional resources for installing pip.
Installing binary extensions¶
Python has typically relied heavily on source based distribution, with end users being expected to compile extension modules from source as part of the installation process.
With the introduction of support for the binary wheel format, and the ability to publish wheels for at least Windows and Mac OS X through the Python Packaging Index, this problem is expected to diminish over time, as users are more regularly able to install pre-built extensions rather than needing to build them themselves.
Some of the solutions for installing scientific software that are not yet available as pre-built wheel files may also help with obtaining other binary extensions without needing to build them locally.
5. Building C and C++ Extensions on Windows¶
This chapter briefly explains how to create a Windows extension module for Python using Microsoft Visual C++, and follows with more detailed background information on how it works. The explanatory material is useful for both the Windows programmer learning to build Python extensions and the Unix programmer interested in producing software which can be successfully built on both Unix and Windows.
Module authors are encouraged to use the distutils approach for building extension modules, instead of the one described in this section. You will still need the C compiler that was used to build Python; typically Microsoft Visual C++.
This chapter mentions a number of filenames that include an encoded Python version number. These filenames are represented with the version number shown as XY ; in practice, ‘X’ will be the major version number and ‘Y’ will be the minor version number of the Python release you’re working with. For example, if you are using Python 2.2.1, XY will actually be 22 .
5.1. A Cookbook Approach¶
There are two approaches to building extension modules on Windows, just as there are on Unix: use the distutils package to control the build process, or do things manually. The distutils approach works well for most extensions; documentation on using distutils to build and package extension modules is available in Distributing Python Modules (Legacy version) . If you find you really need to do things manually, it may be instructive to study the project file for the winsound standard library module.
5.2. Differences Between Unix and Windows¶
Unix and Windows use completely different paradigms for run-time loading of code. Before you try to build a module that can be dynamically loaded, be aware of how your system works.
In Unix, a shared object ( .so ) file contains code to be used by the program, and also the names of functions and data that it expects to find in the program. When the file is joined to the program, all references to those functions and data in the file’s code are changed to point to the actual locations in the program where the functions and data are placed in memory. This is basically a link operation.
In Windows, a dynamic-link library ( .dll ) file has no dangling references. Instead, an access to functions or data goes through a lookup table. So the DLL code does not have to be fixed up at runtime to refer to the program’s memory; instead, the code already uses the DLL’s lookup table, and the lookup table is modified at runtime to point to the functions and data.
In Unix, there is only one type of library file ( .a ) which contains code from several object files ( .o ). During the link step to create a shared object file ( .so ), the linker may find that it doesn’t know where an identifier is defined. The linker will look for it in the object files in the libraries; if it finds it, it will include all the code from that object file.
In Windows, there are two types of library, a static library and an import library (both called .lib ). A static library is like a Unix .a file; it contains code to be included as necessary. An import library is basically used only to reassure the linker that a certain identifier is legal, and will be present in the program when the DLL is loaded. So the linker uses the information from the import library to build the lookup table for using identifiers that are not included in the DLL. When an application or a DLL is linked, an import library may be generated, which will need to be used for all future DLLs that depend on the symbols in the application or DLL.
Suppose you are building two dynamic-load modules, B and C, which should share another block of code A. On Unix, you would not pass A.a to the linker for B.so and C.so ; that would cause it to be included twice, so that B and C would each have their own copy. In Windows, building A.dll will also build A.lib . You do pass A.lib to the linker for B and C. A.lib does not contain code; it just contains information which will be used at runtime to access A’s code.
In Windows, using an import library is sort of like using import spam ; it gives you access to spam’s names, but does not create a separate copy. On Unix, linking with a library is more like from spam import * ; it does create a separate copy.
5.3. Using DLLs in Practice¶
Windows Python is built in Microsoft Visual C++; using other compilers may or may not work (though Borland seems to). The rest of this section is MSVC++ specific.
When creating DLLs in Windows, you must pass pythonXY.lib to the linker. To build two DLLs, spam and ni (which uses C functions found in spam), you could use these commands:
The first command created three files: spam.obj, spam.dll and spam.lib. spam.dll does not contain any Python functions (such as PyArg_ParseTuple()), but it does know how to find the Python code thanks to pythonXY.lib.
The second command created ni.dll (and .obj and .lib ), which knows how to find the necessary functions from spam, and also from the Python executable.
Not every identifier is exported to the lookup table. If you want any other modules (including Python) to be able to see your identifiers, you have to say _declspec(dllexport) , as in void _declspec(dllexport) initspam(void) or PyObject _declspec(dllexport) *NiGetSpamData(void) .
6. Modules¶
If you quit from the Python interpreter and enter it again, the definitions you have made (functions and variables) are lost. Therefore, if you want to write a somewhat longer program, you are better off using a text editor to prepare the input for the interpreter and running it with that file as input instead. This is known as creating a script. As your program gets longer, you may want to split it into several files for easier maintenance. You may also want to use a handy function that you’ve written in several programs without copying its definition into each program.
To support this, Python has a way to put definitions in a file and use them in a script or in an interactive instance of the interpreter. Such a file is called a module; definitions from a module can be imported into other modules or into the main module (the collection of variables that you have access to in a script executed at the top level and in calculator mode).
A module is a file containing Python definitions and statements. The file name is the module name with the suffix .py appended. Within a module, the module’s name (as a string) is available as the value of the global variable __name__ . For instance, use your favorite text editor to create a file called fibo.py in the current directory with the following contents:
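The file contents did not survive in this copy of the text. A minimal sketch of what such a module might contain follows; the function names fib and fib2 and their bodies are assumptions chosen for illustration.

```python
# fibo.py: Fibonacci numbers module (contents are an illustrative assumption)

def fib(n):
    """Print the Fibonacci numbers below n on one line."""
    a, b = 0, 1
    while a < n:
        print(a, end=" ")
        a, b = b, a + b
    print()


def fib2(n):
    """Return the Fibonacci numbers below n as a list."""
    result = []
    a, b = 0, 1
    while a < n:
        result.append(a)
        a, b = b, a + b
    return result
```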
Now enter the Python interpreter and import this module with the following command:
This does not enter the names of the functions defined in fibo directly in the current symbol table; it only enters the module name fibo there. Using the module name you can access the functions:
If you intend to use a function often you can assign it to a local name:
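The steps above can be sketched end to end as follows. To keep the example self-contained it writes a tiny fibo.py into a temporary directory first; that setup, and the function name fib2, are assumptions made for the demonstration rather than part of the original text.

```python
import sys
import tempfile
from pathlib import Path

# Write a tiny fibo.py into a temporary directory so the sketch runs anywhere.
module_dir = tempfile.mkdtemp()
Path(module_dir, "fibo.py").write_text(
    "def fib2(n):\n"
    "    result, a, b = [], 0, 1\n"
    "    while a < n:\n"
    "        result.append(a)\n"
    "        a, b = b, a + b\n"
    "    return result\n"
)
sys.path.insert(0, module_dir)
sys.modules.pop("fibo", None)  # forget any copy imported earlier in this session

import fibo

print(fibo.__name__)    # the module's name as a string
print(fibo.fib2(100))   # access a function through the module name

fib2 = fibo.fib2        # bind a frequently used function to a local name
print(fib2(20))
```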
6.1. More on Modules¶
A module can contain executable statements as well as function definitions. These statements are intended to initialize the module. They are executed only the first time the module name is encountered in an import statement. [1] (They are also run if the file is executed as a script.)
Each module has its own private symbol table, which is used as the global symbol table by all functions defined in the module. Thus, the author of a module can use global variables in the module without worrying about accidental clashes with a user’s global variables. On the other hand, if you know what you are doing you can touch a module’s global variables with the same notation used to refer to its functions, modname.itemname .
Modules can import other modules. It is customary but not required to place all import statements at the beginning of a module (or script, for that matter). The imported module names are placed in the importing module’s global symbol table.
There is a variant of the import statement that imports names from a module directly into the importing module’s symbol table. For example:
This does not introduce the module name from which the imports are taken in the local symbol table (so in the example, fibo is not defined).
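The example this paragraph refers to is missing from this copy. A hedged reconstruction is sketched below; the temporary fibo.py and the function name fib2 are assumptions made so the snippet can run on its own.

```python
import sys
import tempfile
from pathlib import Path

# Self-contained setup: a minimal fibo.py in a temporary directory.
module_dir = tempfile.mkdtemp()
Path(module_dir, "fibo.py").write_text(
    "def fib2(n):\n"
    "    result, a, b = [], 0, 1\n"
    "    while a < n:\n"
    "        result.append(a)\n"
    "        a, b = b, a + b\n"
    "    return result\n"
)
sys.path.insert(0, module_dir)
sys.modules.pop("fibo", None)  # forget any copy imported earlier in this session

from fibo import fib2

print(fib2(50))

# Only the requested name was bound; the module name itself was not.
try:
    fibo
    module_name_bound = True
except NameError:
    module_name_bound = False
print(module_name_bound)
```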
There is even a variant to import all names that a module defines:
This imports all names except those beginning with an underscore ( _ ). In most cases Python programmers do not use this facility since it introduces an unknown set of names into the interpreter, possibly hiding some things you have already defined.
Note that in general the practice of importing * from a module or package is frowned upon, since it often causes poorly readable code. However, it is okay to use it to save typing in interactive sessions.
If the module name is followed by as , then the name following as is bound directly to the imported module.
This effectively imports the module in the same way that import fibo does, the only difference being that it is available under the name fib.
It can also be used when utilising from with similar effects:
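Both uses of as can be sketched together. The temporary fibo.py, the function name fib2, and the alias fibonacci are assumptions for illustration.

```python
import sys
import tempfile
from pathlib import Path

# Self-contained setup: a minimal fibo.py in a temporary directory.
module_dir = tempfile.mkdtemp()
Path(module_dir, "fibo.py").write_text(
    "def fib2(n):\n"
    "    result, a, b = [], 0, 1\n"
    "    while a < n:\n"
    "        result.append(a)\n"
    "        a, b = b, a + b\n"
    "    return result\n"
)
sys.path.insert(0, module_dir)
sys.modules.pop("fibo", None)  # forget any copy imported earlier in this session

import fibo as fib                    # the module, bound to the name "fib"
from fibo import fib2 as fibonacci    # one function, bound to "fibonacci"

print(fib.fib2(30))
print(fibonacci(30))
print(fib.__name__)   # still "fibo": only the local binding is renamed
```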
For efficiency reasons, each module is only imported once per interpreter session. Therefore, if you change your modules, you must restart the interpreter – or, if it’s just one module you want to test interactively, use importlib.reload() , e.g. import importlib; importlib.reload(modulename) .
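The effect of the import cache and of importlib.reload() can be observed with a throwaway module. In this sketch the module name reload_demo and its GREETING variable are hypothetical and exist only for the demonstration.

```python
import importlib
import sys
import tempfile
from pathlib import Path

module_dir = tempfile.mkdtemp()
source = Path(module_dir, "reload_demo.py")  # hypothetical module name
source.write_text("GREETING = 'hello'\n")
sys.path.insert(0, module_dir)

import reload_demo
print(reload_demo.GREETING)

# Edit the file on disk; repeating the import statement does not re-run it.
source.write_text("GREETING = 'hello again'\n")
import reload_demo
print(reload_demo.GREETING)      # still the value from the first import

# reload() re-executes the module's code in the existing module object.
importlib.reload(reload_demo)
print(reload_demo.GREETING)
```

Note that names previously copied out of the module with from ... import keep referring to the old objects after a reload.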
6.1.1. Executing modules as scripts¶
When you run a Python module with
the code in the module will be executed, just as if you imported it, but with the __name__ set to "__main__". That means that by adding this code at the end of your module:
you can make the file usable as a script as well as an importable module, because the code that parses the command line only runs if the module is executed as the “main” file:
If the module is imported, the code is not run:
This is often used either to provide a convenient user interface to a module, or for testing purposes (running the module as a script executes a test suite).
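The listings for this subsection are missing from this copy. The sketch below reconstructs the idea under stated assumptions: it writes a fibo.py whose final lines are guarded by a __name__ check, then runs it once as a script and once as an import in child interpreters. The function name fib and the argument 50 are illustrative.

```python
import subprocess
import sys
import tempfile
from pathlib import Path

script = Path(tempfile.mkdtemp(), "fibo.py")
script.write_text(
    "def fib(n):\n"
    "    a, b = 0, 1\n"
    "    while a < n:\n"
    "        print(a, end=' ')\n"
    "        a, b = b, a + b\n"
    "    print()\n"
    "\n"
    "if __name__ == '__main__':\n"
    "    import sys\n"
    "    fib(int(sys.argv[1]))\n"
)

# Run the file as a script: __name__ is "__main__", so the guarded code runs.
as_script = subprocess.run(
    [sys.executable, str(script), "50"], capture_output=True, text=True
)
print(as_script.stdout)

# Import the same file: __name__ is "fibo", so the guarded code is skipped.
as_import = subprocess.run(
    [sys.executable, "-c", "import fibo"],
    capture_output=True, text=True, cwd=script.parent,
)
print(repr(as_import.stdout))
```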
6.1.2. The Module Search Path¶
When a module named spam is imported, the interpreter first searches for a built-in module with that name. If not found, it then searches for a file named spam.py in a list of directories given by the variable sys.path . sys.path is initialized from these locations:
The directory containing the input script (or the current directory when no file is specified).
PYTHONPATH (a list of directory names, with the same syntax as the shell variable PATH ).
The installation-dependent default.
On file systems which support symlinks, the directory containing the input script is calculated after the symlink is followed. In other words the directory containing the symlink is not added to the module search path.
After initialization, Python programs can modify sys.path . The directory containing the script being run is placed at the beginning of the search path, ahead of the standard library path. This means that scripts in that directory will be loaded instead of modules of the same name in the library directory. This is an error unless the replacement is intended. See section Standard Modules for more information.
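The search order described above can be inspected from within Python. This short sketch only reads interpreter state; the entries printed depend on the installation.

```python
import sys

# Built-in modules are compiled into the interpreter and are found first.
print("sys" in sys.builtin_module_names)

# sys.path is an ordinary list of locations, searched in order.
for entry in sys.path:
    print(repr(entry))
```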
6.1.3. “Compiled” Python files¶
To speed up loading modules, Python caches the compiled version of each module in the __pycache__ directory under the name module.version.pyc, where the version encodes the format of the compiled file; it generally contains the Python version number. For example, in CPython release 3.3 the compiled version of spam.py would be cached as __pycache__/spam.cpython-33.pyc. This naming convention allows compiled modules from different releases and different versions of Python to coexist.
Python checks the modification date of the source against the compiled version to see if it’s out of date and needs to be recompiled. This is a completely automatic process. Also, the compiled modules are platform-independent, so the same library can be shared among systems with different architectures.
Python does not check the cache in two circumstances. First, it always recompiles and does not store the result for the module that’s loaded directly from the command line. Second, it does not check the cache if there is no source module. To support a non-source (compiled only) distribution, the compiled module must be in the source directory, and there must not be a source module.
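The cache file name for a given source file can be computed with importlib.util.cache_from_source(). The sketch below assumes a CPython-style interpreter that writes bytecode caches; the exact tag printed depends on the running version.

```python
import importlib.util
import sys

# Tag that identifies the interpreter and version, for example "cpython-312".
print(sys.implementation.cache_tag)

# Where the bytecode for spam.py would be cached.
cached = importlib.util.cache_from_source("spam.py")
print(cached)

# The same file compiled at optimization level 1 carries an "opt-1" marker.
optimized = importlib.util.cache_from_source("spam.py", optimization=1)
print(optimized)
```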
Some tips for experts:
You can use the -O or -OO switches on the Python command to reduce the size of a compiled module. The -O switch removes assert statements, the -OO switch removes both assert statements and __doc__ strings. Since some programs may rely on having these available, you should only use this option if you know what you’re doing. “Optimized” modules have an opt- tag and are usually smaller. Future releases may change the effects of optimization.
A program doesn’t run any faster when it is read from a .pyc file than when it is read from a .py file; the only thing that’s faster about .pyc files is the speed with which they are loaded.
The module compileall can create .pyc files for all modules in a directory.
There is more detail on this process, including a flow chart of the decisions, in PEP 3147.
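As an illustration of the compileall tip above, the following sketch compiles a one-file directory and lists the resulting cache. The file name spam.py and its contents are placeholders.

```python
import compileall
import tempfile
from pathlib import Path

package_dir = Path(tempfile.mkdtemp())
(package_dir / "spam.py").write_text("ANSWER = 42\n")

# compile_dir returns a true value when every file compiled successfully.
ok = compileall.compile_dir(package_dir, quiet=1)
print(bool(ok))

# The bytecode lands in __pycache__ next to the source file.
compiled = sorted(p.name for p in (package_dir / "__pycache__").iterdir())
print(compiled)
```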
6.2. Standard Modules¶
Python comes with a library of standard modules, described in a separate document, the Python Library Reference (“Library Reference” hereafter). Some modules are built into the interpreter; these provide access to operations that are not part of the core of the language but are nevertheless built in, either for efficiency or to provide access to operating system primitives such as system calls. The set of such modules is a configuration option which also depends on the underlying platform. For example, the winreg module is only provided on Windows systems. One particular module deserves some attention: sys , which is built into every Python interpreter. The variables sys.ps1 and sys.ps2 define the strings used as primary and secondary prompts:
These two variables are only defined if the interpreter is in interactive mode.
The variable sys.path is a list of strings that determines the interpreter’s search path for modules. It is initialized to a default path taken from the environment variable PYTHONPATH , or from a built-in default if PYTHONPATH is not set. You can modify it using standard list operations:
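The listings for this section are missing from this copy. A hedged sketch of both points follows: reading the prompt strings defensively, since they exist only in interactive mode, and modifying sys.path with ordinary list operations. The temporary directory stands in for a directory that holds your own modules.

```python
import sys
import tempfile

# sys.ps1 and sys.ps2 exist only in interactive mode, so read them defensively.
print(getattr(sys, "ps1", None), getattr(sys, "ps2", None))

# sys.path is a plain list, so the usual list operations apply.
extra_dir = tempfile.mkdtemp()   # stand-in for a directory holding your modules
sys.path.append(extra_dir)
print(sys.path[-1] == extra_dir)

sys.path.remove(extra_dir)       # undo the change
```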
6.3. The dir() Function¶
The built-in function dir() is used to find out which names a module defines. It returns a sorted list of strings:
Without arguments, dir() lists the names you have defined currently:
Note that it lists all types of names: variables, modules, functions, etc.
dir() does not list the names of built-in functions and variables. If you want a list of those, they are defined in the standard module builtins :
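The three uses of dir() described in this section can be sketched as follows. The standard math module is used as a stand-in for a module of your own, and the helper local_names is invented for the demonstration.

```python
import builtins
import math


def local_names():
    """dir() without arguments lists the names in the current scope."""
    a = [1, 2, 3]
    label = "demo"
    return dir()


print(dir(math)[:5])            # names defined by a module, sorted
print(local_names())            # names defined in the function's scope
print("len" in dir(builtins))   # built-ins live in the builtins module
```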
6.4. Packages¶
Packages are a way of structuring Python’s module namespace by using “dotted module names”. For example, the module name A.B designates a submodule named B in a package named A . Just like the use of modules saves the authors of different modules from having to worry about each other’s global variable names, the use of dotted module names saves the authors of multi-module packages like NumPy or Pillow from having to worry about each other’s module names.
Suppose you want to design a collection of modules (a “package”) for the uniform handling of sound files and sound data. There are many different sound file formats (usually recognized by their extension, for example: .wav , .aiff , .au ), so you may need to create and maintain a growing collection of modules for the conversion between the various file formats. There are also many different operations you might want to perform on sound data (such as mixing, adding echo, applying an equalizer function, creating an artificial stereo effect), so in addition you will be writing a never-ending stream of modules to perform these operations. Here’s a possible structure for your package (expressed in terms of a hierarchical filesystem):
When importing the package, Python searches through the directories on sys.path looking for the package subdirectory.
The __init__.py files are required to make Python treat directories containing the file as packages. This prevents directories with a common name, such as string , unintentionally hiding valid modules that occur later on the module search path. In the simplest case, __init__.py can just be an empty file, but it can also execute initialization code for the package or set the __all__ variable, described later.
Users of the package can import individual modules from the package, for example:
This loads the submodule sound.effects.echo . It must be referenced with its full name.
An alternative way of importing the submodule is:
This also loads the submodule echo , and makes it available without its package prefix, so it can be used as follows:
Yet another variation is to import the desired function or variable directly:
Again, this loads the submodule echo , but this makes its function echofilter() directly available:
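The three import forms can be compared side by side. The sketch below builds a cut-down sound package in a temporary directory; the signature and body of echofilter are assumptions made for illustration, not the real function.

```python
import sys
import tempfile
from pathlib import Path

# Build a cut-down version of the sound package in a temporary directory.
root = Path(tempfile.mkdtemp())
effects = root / "sound" / "effects"
effects.mkdir(parents=True)
(root / "sound" / "__init__.py").write_text("")
(effects / "__init__.py").write_text("")
(effects / "echo.py").write_text(
    "def echofilter(data, delay=0.7, atten=4):\n"
    "    # Placeholder body: a real filter would process the samples.\n"
    "    return (data, delay, atten)\n"
)
sys.path.insert(0, str(root))
for name in [m for m in sys.modules if m.split(".")[0] == "sound"]:
    del sys.modules[name]   # forget any copy imported earlier in this session

import sound.effects.echo                    # full name required at call sites
print(sound.effects.echo.echofilter("in", delay=0.5))

from sound.effects import echo               # submodule without package prefix
print(echo.echofilter("in", delay=0.5))

from sound.effects.echo import echofilter    # the function itself
print(echofilter("in", delay=0.5))
```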
Note that when using from package import item , the item can be either a submodule (or subpackage) of the package, or some other name defined in the package, like a function, class or variable. The import statement first tests whether the item is defined in the package; if not, it assumes it is a module and attempts to load it. If it fails to find it, an ImportError exception is raised.
Contrarily, when using syntax like import item.subitem.subsubitem , each item except for the last must be a package; the last item can be a module or a package but can’t be a class or function or variable defined in the previous item.
6.4.1. Importing * From a Package¶
Now what happens when the user writes from sound.effects import * ? Ideally, one would hope that this somehow goes out to the filesystem, finds which submodules are present in the package, and imports them all. This could take a long time and importing sub-modules might have unwanted side-effects that should only happen when the sub-module is explicitly imported.
The only solution is for the package author to provide an explicit index of the package. The import statement uses the following convention: if a package’s __init__.py code defines a list named __all__ , it is taken to be the list of module names that should be imported when from package import * is encountered. It is up to the package author to keep this list up-to-date when a new version of the package is released. Package authors may also decide not to support it, if they don’t see a use for importing * from their package. For example, the file sound/effects/__init__.py could contain the following code:
This would mean that from sound.effects import * would import the three named submodules of the sound.effects package.
If __all__ is not defined, the statement from sound.effects import * does not import all submodules from the package sound.effects into the current namespace; it only ensures that the package sound.effects has been imported (possibly running any initialization code in __init__.py ) and then imports whatever names are defined in the package. This includes any names defined (and submodules explicitly loaded) by __init__.py . It also includes any submodules of the package that were explicitly loaded by previous import statements. Consider this code:
In this example, the echo and surround modules are imported in the current namespace because they are defined in the sound.effects package when the from...import statement is executed. (This also works when __all__ is defined.)
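The behaviour with and without __all__ can be checked experimentally. In this hedged sketch the helper star_import is hypothetical: it builds a throwaway sound.effects package with the given __init__.py source, optionally imports some submodules first, and reports which of the three submodule names the star import bound.

```python
import sys
import tempfile
from pathlib import Path

SUBMODULES = ("echo", "surround", "reverse")


def star_import(init_source, preload=()):
    """Build a throwaway sound.effects package and run ``import *`` on it."""
    root = Path(tempfile.mkdtemp())
    effects = root / "sound" / "effects"
    effects.mkdir(parents=True)
    (root / "sound" / "__init__.py").write_text("")
    (effects / "__init__.py").write_text(init_source)
    for name in SUBMODULES:
        (effects / f"{name}.py").write_text(f"NAME = {name!r}\n")
    # Forget earlier copies so each call imports the freshly written package.
    for mod in [m for m in sys.modules if m.split(".")[0] == "sound"]:
        del sys.modules[mod]
    sys.path.insert(0, str(root))
    try:
        namespace = {}
        for module in preload:
            exec(f"import {module}", namespace)
        exec("from sound.effects import *", namespace)
    finally:
        sys.path.remove(str(root))
    return sorted(n for n in namespace if n in SUBMODULES)


# With __all__, the listed submodules are imported.
print(star_import('__all__ = ["echo", "surround", "reverse"]\n'))

# Without __all__, no submodule is imported implicitly.
print(star_import(""))

# Without __all__, submodules that were already loaded are picked up.
print(star_import("", preload=("sound.effects.echo", "sound.effects.surround")))
```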
Although certain modules are designed to export only names that follow certain patterns when you use import * , it is still considered bad practice in production code.
Remember, there is nothing wrong with using from package import specific_submodule ! In fact, this is the recommended notation unless the importing module needs to use submodules with the same name from different packages.
6.4.2. Intra-package References¶
When packages are structured into subpackages (as with the sound package in the example), you can use absolute imports to refer to submodules of sibling packages. For example, if the module sound.filters.vocoder needs to use the echo module in the sound.effects package, it can use from sound.effects import echo.
You can also write relative imports, with the from module import name form of import statement. These imports use leading dots to indicate the current and parent packages involved in the relative import. From the surround module for example, you might use:
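The example listing is missing from this copy. The sketch below reconstructs the idea: it builds a skeletal sound package whose surround module uses three relative imports, then imports that module and shows what each name resolved to. The formats subpackage and the equalizer module are assumptions introduced to give the relative imports something to find.

```python
import sys
import tempfile
from pathlib import Path

root = Path(tempfile.mkdtemp())
for package in ("sound", "sound/effects", "sound/filters", "sound/formats"):
    (root / package).mkdir()
    (root / package / "__init__.py").write_text("")
(root / "sound/effects/echo.py").write_text("")
(root / "sound/filters/equalizer.py").write_text("")
(root / "sound/effects/surround.py").write_text(
    "from . import echo                 # sibling module in sound.effects\n"
    "from .. import formats             # subpackage of the parent package\n"
    "from ..filters import equalizer    # module in a sibling subpackage\n"
)

sys.path.insert(0, str(root))
for name in [m for m in sys.modules if m.split(".")[0] == "sound"]:
    del sys.modules[name]   # forget any copy imported earlier in this session

import sound.effects.surround as surround

# Each relative import resolved to a fully qualified module name.
print(surround.echo.__name__)
print(surround.formats.__name__)
print(surround.equalizer.__name__)
```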
Note that relative imports are based on the name of the current module. Since the name of the main module is always "__main__", modules intended for use as the main module of a Python application must always use absolute imports.
6.4.3. Packages in Multiple Directories¶
Packages support one more special attribute, __path__ . This is initialized to be a list containing the name of the directory holding the package’s __init__.py before the code in that file is executed. This variable can be modified; doing so affects future searches for modules and subpackages contained in the package.
While this feature is not often needed, it can be used to extend the set of modules found in a package.
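As a hedged illustration of extending a package through __path__, the sketch below creates a package in one directory and a loose module in another, then appends the second directory to the package's __path__. The names demo_pkg, plugin, and VALUE are hypothetical.

```python
import sys
import tempfile
from pathlib import Path

# A regular package in one directory...
base = Path(tempfile.mkdtemp())
(base / "demo_pkg").mkdir()
(base / "demo_pkg" / "__init__.py").write_text("")

# ...and an extra module that lives somewhere else entirely.
extra = Path(tempfile.mkdtemp())
(extra / "plugin.py").write_text("VALUE = 'loaded from the extra directory'\n")

sys.path.insert(0, str(base))
import demo_pkg

print(demo_pkg.__path__)                 # initially just the package directory
demo_pkg.__path__.append(str(extra))     # extend the submodule search

import demo_pkg.plugin                   # now found via the added directory
print(demo_pkg.plugin.VALUE)
```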
[1] In fact function definitions are also ‘statements’ that are ‘executed’; the execution of a module-level function definition enters the function name in the module’s global symbol table.