- Lee On Coding
- How does python find packages?
- sys.path
- How sys.path gets populated
- You can manipulate sys.path
- The module __file__ attribute
- The imp module
- Ubuntu Python
- Ubuntu Python ( /usr/bin/python ):
- Python compiled from source ( /usr/local/bin/python )
- How did Ubuntu manipulate the sys.path ?
- Using PYTHONPATH
- Setting PYTHONPATH more permanently
- If you are on a Mac
- If you are on Linux
- If you are on Windows
- Python import, sys.path, and PYTHONPATH Tutorial
- Introduction
- Modules versus packages
- How import works
- Import versus from
- Import by string
- How __init__ and __main__ work
- In a package
- In a module
- Manage import paths
- sys.path
- PYTHONPATH
- The site module
- PYTHONHOME
Lee On Coding
My blog about coding and stuff.
How does python find packages?
I just ran into a situation where I compiled and installed Python 2.7.9 from source on Ubuntu, but Python could not find the packages I had previously installed. This naturally raises the question: how does Python know where to find packages when you call import? This post applies specifically to Python 2.7.9, but I'm guessing Python 3.x works very similarly.
In this post I first describe how Python finds packages, and then I’ll finish with the discovery I made regarding the default Python that ships with Ubuntu and how it differs from vanilla Python in how it finds packages.
sys.path
Python imports work by searching the directories listed in sys.path .
Using my default Ubuntu 14.04 Python:
So Python will find any packages that have been installed to those locations.
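To inspect the search path of whichever interpreter you are running, a minimal sketch is shown below. The output depends entirely on your installation, so none is listed here.

```python
import sys

# Print each search-path entry on its own line, in lookup order.
for index, entry in enumerate(sys.path):
    # An empty string, if present, stands for the current working directory.
    print(index, repr(entry))
```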
How sys.path gets populated
As the docs explain, sys.path is populated with the directory containing the script being run (or the current working directory in an interactive session), followed by directories listed in your PYTHONPATH environment variable, followed by installation-dependent default paths, which are controlled by the site module.
You can read more about sys.path in the Python docs.
Assuming your PYTHONPATH environment variable is not set, sys.path will consist of the current working directory plus any manipulations made to it by the site module.
The site module is automatically imported when you start Python. You can read more about how it manipulates your sys.path in the Python docs.
It’s a bit involved.
You can manipulate sys.path
You can manipulate sys.path during a Python session and this will change how Python finds modules. For example:
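As a minimal sketch of this, assume a hypothetical module file hi.py sitting in a directory that is not yet on the search path (here a temporary directory created just for the demonstration). Appending that directory to sys.path makes the import succeed:

```python
import os
import sys
import tempfile

# Hypothetical module: a file named hi.py in a throwaway directory.
module_dir = tempfile.mkdtemp()
with open(os.path.join(module_dir, "hi.py"), "w") as handle:
    handle.write("def greet():\n    return 'hi there'\n")

# Add the directory to the search path for the rest of this session.
sys.path.append(module_dir)

import hi  # found via the directory we just appended

print(hi.greet())
print(hi.__file__)
```

The change lasts only for the current session; a new interpreter starts with the default sys.path again.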
The module __file__ attribute
When you import a module, you usually can check the __file__ attribute of the module to see where the module is in your filesystem:
However, the Python docs state that:
The __file__ attribute is not present for C modules that are statically linked into the interpreter; for extension modules loaded dynamically from a shared library, it is the pathname of the shared library file.
So, for example this doesn’t work:
It makes sense that the sys module is statically linked to the interpreter — it is essentially part of the interpreter!
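A short sketch of both cases, using os as an example of an ordinary module loaded from a file and sys as the statically linked one:

```python
import os
import sys

# A module loaded from a file on disk reports where it lives.
print(os.__file__)

# sys is compiled into the interpreter, so it carries no __file__ attribute;
# accessing sys.__file__ directly would raise AttributeError.
print(hasattr(sys, "__file__"))

# Modules compiled into the interpreter are listed by name here.
print("sys" in sys.builtin_module_names)
```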
The imp module
Python 2 exposes much of the import machinery through the imp module. It's pretty cool that all of this stuff is exposed for us to abuse, if we wanted to.
imp.find_module can be used to find a module:
You can also import an arbitrary Python source file as a module using imp.load_source. This is the same example as before, except that it imports our module using imp instead of by manipulating sys.path:
Passing ‘hi’ to imp.load_source simply sets the __name__ attribute of the module.
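The imp module is deprecated in Python 3 and was removed in Python 3.12, so the sketch below swaps in importlib.util, which covers the same ground. It is not the code from the original post: the file name hi_source.py and its contents are hypothetical, and json is used only as a convenient module to look up.

```python
import importlib.util
import os
import tempfile

# Hypothetical source file standing in for the module used in the post.
source_path = os.path.join(tempfile.mkdtemp(), "hi_source.py")
with open(source_path, "w") as handle:
    handle.write("def greet():\n    return 'hi there'\n")

# Roughly what imp.find_module does: locate a module without importing it.
spec = importlib.util.find_spec("json")
print(spec.origin)

# Roughly what imp.load_source does: load a file under a chosen module name.
file_spec = importlib.util.spec_from_file_location("hi", source_path)
hi_module = importlib.util.module_from_spec(file_spec)
file_spec.loader.exec_module(hi_module)

print(hi_module.__name__)
print(hi_module.greet())
```

As with imp.load_source, the first argument only decides the module's __name__; the file itself can be called anything.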
Ubuntu Python
Now back to the issue of missing packages after installing a new version of Python compiled from source. By comparing the sys.path from both the Ubuntu Python, which resides at /usr/bin/python , and the newly installed Python, which resides at /usr/local/bin/python , I could sort things out:
Ubuntu Python ( /usr/bin/python ):
Python compiled from source ( /usr/local/bin/python )
Turns out what mattered for me was dist-packages vs. site-packages. Using Ubuntu's Python, my packages were installed to /usr/local/lib/python2.7/dist-packages, whereas the new Python I installed expects packages to be installed to /usr/local/lib/python2.7/site-packages. I just had to set the PYTHONPATH environment variable to point to dist-packages in order to gain access to the previously installed packages with the newly installed version of Python.
How did Ubuntu manipulate the sys.path ?
So how does the Ubuntu distribution of Python know to use /usr/local/lib/python2.7/dist-packages in sys.path ? It’s hardcoded into their site module! First, find where the site module code lives:
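A minimal way to locate it on any interpreter (the path printed depends on the installation):

```python
import os
import site

# Path of the file that the running interpreter loaded the site module from.
print(site.__file__)

# The directory that holds it, which is usually the standard library directory.
print(os.path.dirname(site.__file__))
```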
Here is an excerpt from Ubuntu Python’s site.py , which I peeked by opening /usr/lib/python2.7/site.py in a text editor. First, a comment at the top:
For Debian and derivatives, this sys.path is augmented with directories for packages distributed within the distribution. Local addons go into /usr/local/lib/python<version>/dist-packages, Debian addons install into /usr/{lib,share}/python<version>/dist-packages. /usr/lib/python<version>/site-packages is not used.
OK so there you have it. They explain how the Debian distribution of Python is different.
And now, for the code that implements this change:
It’s all there, if you are crazy enough to dig this deep.
© Lee Mendelowitz – Built with Pure Theme for Pelican
Using PYTHONPATH
The PYTHONPATH variable has a value that is a string with a list of directories that Python should add to the sys.path directory list.
The main use of PYTHONPATH is when we are developing some code that we want to be able to import from Python, but that we have not yet made into an installable Python package (see: making a Python package ).
Returning to the example module and script in Where does Python look for modules? :
At the moment, on my machine, PYTHONPATH is empty:
Before we set PYTHONPATH correctly, a_script.py will fail with:
Now I set the PYTHONPATH environment variable value to be the path to the code directory:
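The before-and-after behaviour can be sketched from Python itself by running the script in a child interpreter with and without PYTHONPATH. The layout here is an assumption made for illustration: a hypothetical a_module.py in one temporary directory (the "code" directory) and a_script.py, which imports it, in another.

```python
import os
import subprocess
import sys
import tempfile

# Hypothetical layout: code_dir/a_module.py and scripts_dir/a_script.py.
code_dir = tempfile.mkdtemp()
scripts_dir = tempfile.mkdtemp()
with open(os.path.join(code_dir, "a_module.py"), "w") as handle:
    handle.write("def func():\n    return 'from a_module'\n")
script_path = os.path.join(scripts_dir, "a_script.py")
with open(script_path, "w") as handle:
    handle.write("import a_module\nprint(a_module.func())\n")


def run_script(pythonpath=None):
    """Run a_script.py in a child interpreter, optionally setting PYTHONPATH."""
    env = dict(os.environ)
    env.pop("PYTHONPATH", None)  # start from an empty PYTHONPATH
    if pythonpath is not None:
        env["PYTHONPATH"] = pythonpath
    return subprocess.run(
        [sys.executable, script_path],
        env=env, capture_output=True, text=True, cwd=scripts_dir,
    )


without_path = run_script()       # import fails: a_module is not on sys.path
with_path = run_script(code_dir)  # import succeeds via PYTHONPATH

print(without_path.returncode)
print(with_path.stdout.strip())
```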
Setting PYTHONPATH more permanently
You probably don’t want to have to set PYTHONPATH every time you start up a terminal and run a Python script.
Luckily, we can make the PYTHONPATH value be set for any terminal session, by setting the environment variable default.
For example, let's say I wanted to add the directory /Users/my_user/code to the PYTHONPATH:
If you are on a Mac
Open Terminal.app;
Open the file ~/.bash_profile in your text editor, e.g. atom ~/.bash_profile;
Add the following line to the end:
    export PYTHONPATH="/Users/my_user/code"
Start Terminal.app again, to read in the new settings, and type this:
    echo $PYTHONPATH
It should show something like /Users/my_user/code .
If you are on Linux
Open your favorite terminal program;
Open the file ~/.bashrc in your text editor, e.g. atom ~/.bashrc;
Add the following line to the end:
    export PYTHONPATH="/home/my_user/code"
Close your terminal application;
Start your terminal application again, to read in the new settings, and type this:
    echo $PYTHONPATH
It should show something like /home/my_user/code .
If you are on Windows
Go to the Windows menu, right-click on "Computer" and select "Properties":
From the computer properties dialog, select “Advanced system settings” on the left:
From the advanced system settings dialog, choose the “Environment variables” button:
In the Environment variables dialog, click the “New” button in the top half of the dialog, to make a new user variable:
Give the variable name as PYTHONPATH and the value as the path to the code directory. Choose OK and OK again to save this variable.
Now open a cmd window (Windows key, then type cmd and press Return). Type echo %PYTHONPATH% to confirm the environment variable is correctly set.
If you want your IPython sessions to see this new PYTHONPATH variable, you’ll have to restart your terminal and restart IPython so that it picks up PYTHONPATH from the environment settings.
Python import, sys.path, and PYTHONPATH Tutorial
Introduction
The import statement is usually the first thing you see at the top of any Python file. We use it all the time, yet it is still a bit mysterious to many people. This tutorial will walk through how import works and how to view and modify the directories used for importing.
If you want to learn how to import a module by using a string variable name to reference the module, check out my tutorial on Import Python Module by String Name
Also check out my Python Virtual Environments Tutorial to learn more about isolated Python environments.
Modules versus packages
First, let’s clarify the difference between modules and packages. They are very closely related, and often confused. They both serve the same purpose which is to organize code, but they each provide slightly different ways of doing that.
- A module is a single .py file with Python code.
- A package is a directory that can contain multiple Python modules.
A module can be thought of as a self-contained package, and a package is like a module that is separated out across multiple files. It really depends on how you want to organize your code and how large your project is. I always start with a module and turn it into a package later if needed.
How import works
The import keyword in Python is used to load other Python source code files into the current interpreter session. This is how you re-use code and share it among multiple files or different projects.
There are a few different ways to use import. For example, suppose we want to use the function join() that lives in the path module of the os package. Its full name is os.path.join(), and we have a few ways of importing and using it. Read more about os.path at https://docs.python.org/3/library/os.path.html.
Import versus from
There are a few different ways you can import a package or a module. You can call import directly or use the from x import y format. The from keyword tells Python what package or module to look in for the name specified with import. Here are a few examples that all accomplish the same end goal.
Different ways to import and execute os.path.join() :
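A minimal sketch of the equivalent forms follows. The path pieces "a" and "b" are arbitrary placeholders, and the wildcard form is left as a comment because it is not recommended.

```python
# Import the top-level name and use the full dotted path.
import os

# Import the submodule explicitly; os.path is still reached through os.
import os.path

# Import just the path module from os.
from os import path

# Import just the join function from os.path.
from os.path import join

# Wildcard form (would also bind join, but hides where names come from):
# from os.path import *

via_full_name = os.path.join("a", "b")
via_module = path.join("a", "b")
via_function = join("a", "b")

# All three calls reach the same function and give the same result.
print(via_full_name == via_module == via_function)
```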
As you can see, you can import the whole package, a specific module within a package, or a specific function from within a module. The * wildcard imports every public name the module exposes (or only those listed in its __all__, if it defines one). I do not recommend using the wildcard because it is too ambiguous. It is better to explicitly list each import so you can identify where it came from. A good IDE like PyCharm will help you manage these easily.
When you call import, the Python interpreter searches through a set of directories for the name provided. The list of directories that it searches is stored in sys.path and can be modified during run-time. To modify the paths before starting Python, you can modify the PYTHONPATH environment variable. Both sys.path and PYTHONPATH are covered more below.
Import by string
If you want to import a module programmatically, you can use importlib.import_module() . This function is useful if you are creating a plugin system where modules need to be loaded at run-time based on string names.
This method is not commonly used, and is only useful in special circumstances. For example, if you are building a plugin system where you want to load every file in a directory as a module based on the filepath string.
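A minimal sketch, using the standard-library json module as a stand-in for whatever module name would arrive as a string at run-time:

```python
import importlib

# In a plugin system this name would come from a config file or a
# directory listing; "json" is only a placeholder here.
module_name = "json"

# Equivalent to "import json", but driven by a string.
loaded = importlib.import_module(module_name)

# The returned object is the module itself.
encoded = loaded.dumps({"answer": 42})
print(loaded.__name__)
print(encoded)
```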
How __init__ and __main__ work
Names that start and end with double underscores, often called ‘dunders’, are special names in Python. Two of them are special names related to modules and packages: __init__ and __main__ . Depending on whether you are organizing your code as a package or a module, they are treated slightly differently.
We will look at the difference between a module and a package in a moment, but the main idea is this:
- When you import a package it runs the __init__.py file inside the package directory.
- When you execute a package (e.g. python -m my_package ) it executes the __main__.py file.
- When you import a module it runs the entire file from top to bottom.
- When you execute a module it runs the entire file from top to bottom and sets the __name__ variable to the string "__main__".
In a package
In a Python package (a directory), you can have a module named __init__.py and another named __main__.py .
Here is an example structure of a package:
If a package is invoked directly (e.g. python -m my_package ), the __main__.py module is executed. The __init__.py file is executed when a package is imported (e.g. import my_package ).
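To see both behaviours side by side, the sketch below builds a throwaway package and runs it in child interpreters. The package name my_package matches the commands above; the file contents (simple print calls) are assumptions made for the demonstration.

```python
import os
import subprocess
import sys
import tempfile

# Hypothetical package layout:
#   my_package/__init__.py
#   my_package/__main__.py
root = tempfile.mkdtemp()
package_dir = os.path.join(root, "my_package")
os.mkdir(package_dir)
with open(os.path.join(package_dir, "__init__.py"), "w") as handle:
    handle.write("print('__init__.py executed')\n")
with open(os.path.join(package_dir, "__main__.py"), "w") as handle:
    handle.write("print('__main__.py executed')\n")


def run(*args):
    """Run a child interpreter from the directory containing my_package."""
    result = subprocess.run(
        [sys.executable, *args], cwd=root, capture_output=True, text=True
    )
    return result.stdout.splitlines()


imported = run("-c", "import my_package")  # import only
executed = run("-m", "my_package")         # invoke the package

print(imported)
print(executed)
```

Note that invoking the package with -m imports it first, so __init__.py runs before __main__.py in that case.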
In a module
In the previous section, we saw how a package can have separate files for __init__.py and __main__.py. In a module (a single .py file) the equivalents of __init__ and __main__ are contained in the single file. The entire file itself essentially becomes both the __init__.py and the __main__.py.
When a module is imported, it runs the whole file, loading any functions defined.
When a module is invoked directly, for example, python my_module.py or python -m my_module, then it does the same thing as importing it, but also sets the __name__ variable to the string "__main__".
You can take advantage of this and execute a section of code only if the module is invoked directly, and not when it is imported. To do this, you need to explicitly check the __name__ variable, and see if it equals __main__ . If it is set to the string __main__ , then you know the module was invoked directly, and not simply imported.
Take this example. Create a file named my_module.py with the following contents:
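A minimal sketch of what such a file might contain. The function name my_function matches the commands in this section; the messages are assumptions made for illustration.

```python
def my_function():
    """Example function that other code can call after importing the module."""
    print("my_function() was called")
    return "done"


# This line runs both on import and on direct invocation.
print("my_module loaded, __name__ is", repr(__name__))

# This block runs only when the file is invoked directly
# (python my_module.py or python -m my_module), not when it is imported.
if __name__ == "__main__":
    print("my_module was invoked directly")
    my_function()
```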
Try out a few different things to understand how it works:
- Run the file directly with Python: python my_module.py
- Invoke the module with -m flag: python -m my_module
- Import the module from another Python file: python -c "import my_module"
- Import and call the function defined: python -c "import my_module; my_module.my_function()"
Manage import paths
sys.path
When you start a Python interpreter, one of the things it creates automatically is a list of all the directories it will search for modules when importing. This list is available in a variable named sys.path. Here is an example of printing out sys.path. Note that the empty '' entry means the current directory.
You are allowed to modify sys.path during run-time. Just be sure to modify it before you call import . It will search the directories in order stopping at the first place it finds the specified modules.
PYTHONPATH
PYTHONPATH is related to sys.path very closely. PYTHONPATH is an environment variable that you set before running the Python interpreter. PYTHONPATH, if it exists, should contain directories that should be searched for modules when using import. If PYTHONPATH is set, Python will include the directories in sys.path for searching. To list multiple directories, separate them with a semicolon on Windows or a colon on Linux and Mac.
Here is an example of setting the environment variable in Windows and listing the paths in Python:
And in Linux and Mac you can do the equivalent like this:
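A portable way to see the same effect from Python is to launch a child interpreter with PYTHONPATH set. This is a sketch under assumptions: the two directories are temporary ones created for the demonstration, and os.pathsep supplies the right separator for the platform (a semicolon on Windows, a colon elsewhere).

```python
import os
import subprocess
import sys
import tempfile

# Two hypothetical directories we want Python to search.
extra_dirs = [tempfile.mkdtemp(), tempfile.mkdtemp()]

# Build the PYTHONPATH value with the platform's separator.
env = dict(os.environ, PYTHONPATH=os.pathsep.join(extra_dirs))

# Ask a fresh interpreter to print its sys.path, one entry per line.
child = subprocess.run(
    [sys.executable, "-c", "import sys; print('\\n'.join(sys.path))"],
    env=env, capture_output=True, text=True,
)
child_path = child.stdout.splitlines()

for directory in extra_dirs:
    print(directory, directory in child_path)
```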
So, in order to import modules or packages, they need to reside in one of the paths listed in sys.path. You can modify the sys.path list manually if needed from within Python. It is just a regular list so it can be modified in all the normal ways. For example, you can append to the end of the list using sys.path.append() or insert at an arbitrary position using sys.path.insert(). For more help, refer to https://docs.python.org/3/tutorial/datastructures.html
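Because the directories are searched in order, where you put an entry matters. The sketch below assumes two temporary directories that each contain a hypothetical module named shadow_demo; the one inserted at the front of sys.path wins.

```python
import os
import sys
import tempfile

# Two directories that both provide a module called shadow_demo.
first_dir = tempfile.mkdtemp()
second_dir = tempfile.mkdtemp()
for directory, label in ((first_dir, "first"), (second_dir, "second")):
    with open(os.path.join(directory, "shadow_demo.py"), "w") as handle:
        handle.write("ORIGIN = %r\n" % label)

# Appended entries are searched last; index 0 is searched first.
sys.path.append(second_dir)
sys.path.insert(0, first_dir)

import shadow_demo  # resolved from first_dir, the earlier entry

print(shadow_demo.ORIGIN)
```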
The site module
You can also use the site module to modify sys.path . See more at https://docs.python.org/3/library/site.html.
You can also directly invoke the site module to get a list of default paths:
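On the command line this is python -m site. A sketch of the same call from Python, capturing the report so it can be inspected (the content depends on your installation):

```python
import subprocess
import sys

# Run "python -m site" with the current interpreter and capture its report.
report = subprocess.run(
    [sys.executable, "-m", "site"], capture_output=True, text=True
).stdout

# The report lists sys.path plus the per-user site directory settings.
print(report)
```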
PYTHONHOME
The PYTHONHOME environment variable is similar to PYTHONPATH except it should define where the standard libraries are. If PYTHONHOME is set, it will assume some default paths relative to the home, which can be supplemented with PYTHONPATH .
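To see which home location the running interpreter settled on, you can inspect sys.prefix and sys.exec_prefix, which are where the default library paths are derived from. A minimal sketch:

```python
import os
import sys

# PYTHONHOME is usually unset; None means the interpreter worked out
# its home location on its own.
print(os.environ.get("PYTHONHOME"))

# Prefix for platform-independent library files.
print(sys.prefix)

# Prefix for platform-dependent files such as compiled extension modules.
print(sys.exec_prefix)
```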
This is particularly relevant if you have embedded Python into a C application and it is trying to determine the path of Python using the PYTHONHOME environment variable.
Just for reference, here is a quick example of how you would build a C application with Python embedded in it.