- Data Analysis (Software Engineering)/Installing and Configuring Python
- Contents
- Windows
- Installing a prebuilt package
- Manual installation
- Running IPython Notebook
- Mac OS X and Linux
- Mac OS
- Installing Python via brew
- Manual installation of Python
- Third-party tutorial
- Linux
- Running IPython Notebook
- Using virtualenv
- Installing scikit-learn
- Installing the latest release
- Installing the development version of scikit-learn
- Installing nightly builds
- Building from source
- Dependencies
- Runtime dependencies
- Build dependencies
- Test dependencies
- Building a specific version from a tag
- Editable mode
- Platform-specific instructions
- Windows
- macOS
- macOS compilers from conda-forge
- macOS compilers from Homebrew
- Linux
- Linux compilers from the system
- Linux compilers from conda-forge
- FreeBSD
- Alternative compilers
- Building with Intel C Compiler (ICC) using oneAPI on Linux
- Parallel builds
Data Analysis (Software Engineering)/Installing and Configuring Python
Windows
Installing a prebuilt package
You can install Python and all the required libraries and add-ons separately, but that takes too long. Instead, we will use the prebuilt Python(X,Y) distribution.
- Download Python(X,Y) from its download page, in the Current release section.
- Install it, making sure to tick the option that installs all Python(X,Y) plugins.
- That's it.
For reasons that are not clear, the installation sometimes ends up broken. Try running the following commands in your Python:
Unit tests show right away whether everything is in order. Run them for the pandas, pylab and sklearn libraries as well. If an error comes up, try installing Python and the libraries another way.
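Before running the full unit-test suites, a lighter check is simply to import each library and print its version. The sketch below is an illustration rather than part of the original instructions; the helper name `check_imports` is hypothetical, and the module list mirrors the libraries named above.

```python
import importlib


def check_imports(module_names):
    """Try to import each module; map its name to a version string or an error."""
    report = {}
    for name in module_names:
        try:
            module = importlib.import_module(name)
        except Exception as exc:  # a broken install may raise more than ImportError
            report[name] = "FAILED: {}: {}".format(type(exc).__name__, exc)
        else:
            # Not every module defines __version__, so fall back to a placeholder.
            report[name] = getattr(module, "__version__", "unknown version")
    return report


if __name__ == "__main__":
    libraries = ["numpy", "pandas", "pylab", "sklearn"]
    for name, status in check_imports(libraries).items():
        print("{:10s} {}".format(name, status))
```

A line starting with FAILED points at the library that needs to be reinstalled; a clean report does not replace the unit tests, it only confirms that the imports work.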
Manual installation
Minimum set of tools:
- Python 2.7
- IPython Notebook
- NumPy
- Matplotlib
- Pandas
- SciKit-Learn
Prefer the 32-bit versions, because the 64-bit ones do not always work correctly on Windows. Note that some libraries have additional dependencies on other libraries, which you will have to install too.
Running IPython Notebook
Let's try to start IPython Notebook. Open a command prompt (press Win+R and type cmd) and enter ipython notebook --pylab inline. A browser should open with IPython Notebook running from the current directory. All notebooks are saved to the current directory, the one from which IPython Notebook was launched.
Mac OS X and Linux
Mac OS
Installing Python via brew
If you do not have Python, you will have to install brew, which in turn requires installing Xcode. Follow the instructions on the Homebrew site and everything will work out. Install a recent version of Python and virtualenv:
Install Fortran (needed to build NumPy and SciPy):
Create a virtual environment:
Install the required Python packages:
Manual installation of Python
You can try installing everything by hand; for the list of required libraries, see the Windows section.
Third-party tutorial
Linux
For the sake of your health, use Ubuntu 12.04 LTS or later. Install the required Python tools:
Install the packages needed to build NumPy, SciPy and Matplotlib:
Create a Python virtual environment (virtualenv).
Install the packages required for the course:
Why not just use apt-get install?
You can do something like
and install the Python packages system-wide from the Debian repository. However, the Debian packages contain fairly old versions of the Python packages (for example, your IPython Notebook will be considerably less up to date). Recent versions are downloaded by the pip utility from the PyPI repository.
Running IPython Notebook
To have plots embedded in the report rather than opened in a separate window, start IPython Notebook as follows:
Or, in an already running Notebook, execute
Using virtualenv
virtualenv lets you confine the required versions of Python packages to a separate directory and use only those. With virtualenv you can install recent package versions from the Python Package Index without running into version conflicts with the packages installed in the system. Installing Python packages into the system directories with pip is also a reasonable option. It does not involve virtualenv at all, but pip then has to be run as root:
But keep in mind that such packages may conflict with the system ones, the build may fail, old versions may get imported, and other problems may arise. To create a virtual environment, run
which creates a directory yourenv holding a clean environment with no packages at all. To work with the virtual environment, you can call the commands from the corresponding directory:
To avoid typing the yourenv/bin prefix, it is convenient to set the necessary environment variables in the current command-line session (that is, to activate the virtual environment):
After activation, the command prompt gains the prefix (yourenv). To deactivate the virtual environment, run the deactivate command.
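The section above relies on the virtualenv tool. As a rough equivalent, Python 3 ships a standard-library `venv` module that can create the same kind of clean environment from Python code. The sketch below assumes Python 3 and swaps virtualenv for `venv`; the helper name `create_env` is hypothetical, and `yourenv` is the example directory name used in the text.

```python
import os
import tempfile
import venv


def create_env(env_dir):
    """Create a clean virtual environment and return the path to its interpreter."""
    # with_pip=False keeps the sketch offline-friendly; pass True to bootstrap pip.
    venv.create(env_dir, with_pip=False)
    # Windows puts executables in Scripts\, other platforms in bin/.
    bin_dir = "Scripts" if os.name == "nt" else "bin"
    exe = "python.exe" if os.name == "nt" else "python"
    return os.path.join(env_dir, bin_dir, exe)


if __name__ == "__main__":
    # A temporary location is used here so that the demo leaves the
    # current directory untouched.
    target = os.path.join(tempfile.mkdtemp(), "yourenv")
    print(create_env(target))
```

The returned path plays the role of the yourenv/bin prefix: running that interpreter directly uses the environment without activating it.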
Installing scikit-learn
There are different ways to install scikit-learn:
- Install the latest official release. This is the best approach for most users. It will provide a stable version and pre-built packages are available for most platforms.
- Install the version of scikit-learn provided by your operating system or Python distribution. This is a quick option for those who have operating systems or Python distributions that distribute scikit-learn. It might not provide the latest release version.
- Building the package from source. This is best for users who want the latest-and-greatest features and aren't afraid of running brand-new code. This is also needed for users who wish to contribute to the project.
Installing the latest release
In order to check your installation you can use
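One way to perform such a check from Python itself is to query the installed package metadata. This is a minimal sketch, assuming Python 3.8 or later (for `importlib.metadata`); the helper `installed_version` is a hypothetical name, and the list of distributions is only an example.

```python
from importlib import metadata


def installed_version(distribution):
    """Return the installed version of a distribution, or None if it is absent."""
    try:
        return metadata.version(distribution)
    except metadata.PackageNotFoundError:
        return None


if __name__ == "__main__":
    # Example list: scikit-learn plus the dependencies named in this guide.
    for dist in ("scikit-learn", "numpy", "scipy", "joblib"):
        print(dist, installed_version(dist) or "not installed")
```

Note that the distribution is called scikit-learn, while the module imported in code is sklearn.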
Note that in order to avoid potential conflicts with other packages it is strongly recommended to use a virtual environment (venv) or a conda environment.
Using such an isolated environment makes it possible to install a specific version of scikit-learn with pip or conda and its dependencies independently of any previously installed Python packages. In particular under Linux it is discouraged to install pip packages alongside the packages managed by the package manager of the distribution (apt, dnf, pacman…).
Note that you should always remember to activate the environment of your choice prior to running any Python command whenever you start a new terminal session.
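Forgetting to activate the environment is easy to miss, so it can help to ask the interpreter where it is running. The sketch below is an assumption-laden illustration, not an official check: `active_environment` is a hypothetical helper, it detects venv/virtualenv through `sys.prefix`, and it treats the `CONDA_PREFIX` variable as a sign of an activated conda environment (conda also sets it for its base environment).

```python
import os
import sys


def active_environment(environ=None):
    """Describe the isolated environment the interpreter runs in, or return None."""
    environ = os.environ if environ is None else environ
    # venv and recent virtualenv make sys.prefix differ from sys.base_prefix;
    # older virtualenv releases set sys.real_prefix instead.
    base_prefix = getattr(sys, "base_prefix", sys.prefix)
    if hasattr(sys, "real_prefix") or sys.prefix != base_prefix:
        return "virtualenv: " + sys.prefix
    # conda exports CONDA_PREFIX when an environment is activated.
    if environ.get("CONDA_PREFIX"):
        return "conda: " + environ["CONDA_PREFIX"]
    return None


if __name__ == "__main__":
    print(active_environment() or "no isolated environment detected")
```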
If you have not installed NumPy or SciPy yet, you can also install these using conda or pip. When using pip, please ensure that binary wheels are used, and NumPy and SciPy are not recompiled from source, which can happen when using particular configurations of operating system and hardware (such as Linux on a Raspberry Pi).
Installing the development version of scikit-learn
This section introduces how to install the main branch of scikit-learn. This can be done by either installing a nightly build or building from source.
Installing nightly builds
The continuous integration servers of the scikit-learn project build, test and upload wheel packages for the most recent Python version on a nightly basis.
Installing a nightly build is the quickest way to:
- try a new feature that will be shipped in the next release (that is, a feature from a pull-request that was recently merged to the main branch);
- check whether a bug you encountered has been fixed since the last release.
Building from source
Building from source is required to work on a contribution (bug fix, new feature, code or documentation improvement).
Use Git to check out the latest source from the scikit-learn repository on GitHub:
If you plan on submitting a pull-request, you should clone from your fork instead.
Install a compiler with OpenMP support for your platform. See instructions for Windows, macOS, Linux and FreeBSD.
Optional (but recommended): create and activate a dedicated virtualenv or conda environment.
Install Cython and build the project with pip in Editable mode:
Check that the installed scikit-learn has a version number ending with .dev0:
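The same check can be scripted. In this minimal sketch, `is_dev_version` is a hypothetical helper, and the version strings in the comments are made-up examples rather than actual release numbers.

```python
def is_dev_version(version):
    """Return True when a version string ends with the .dev0 suffix."""
    # Illustrative inputs: "1.0.dev0" is a development build, "0.24.2" is not.
    return version.endswith(".dev0")


if __name__ == "__main__":
    try:
        import sklearn
    except ImportError:
        print("scikit-learn is not installed in this environment")
    else:
        print(sklearn.__version__, is_dev_version(sklearn.__version__))
```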
Please refer to the Developer’s Guide and Useful pytest aliases and flags to run the tests on the module of your choice.
You will have to run the pip install --no-build-isolation --editable . command every time the source code of a Cython file (ending in .pyx or .pxd) is updated. Use the --no-build-isolation flag to avoid compiling the whole project each time, so that only the files you have modified are recompiled.
Dependencies
Runtime dependencies
Scikit-learn requires the following dependencies both at build time and at runtime:
Those dependencies are automatically installed by pip if they were missing when building scikit-learn from source.
For running on PyPy, PyPy3-v5.10+, NumPy 1.14.0+, and SciPy 1.1.0+ are required. For PyPy, only installation instructions with pip apply.
Build dependencies
Building Scikit-learn also requires:
A C/C++ compiler and a matching OpenMP runtime library. See the platform-specific instructions for more details.
If OpenMP is not supported by the compiler, the build will be done with OpenMP functionalities disabled. This is not recommended since it will force some estimators to run in sequential mode instead of leveraging thread-based parallelism. Setting the SKLEARN_FAIL_NO_OPENMP environment variable (before cythonization) will force the build to fail if OpenMP is not supported.
Since version 0.21, scikit-learn automatically detects and uses the linear algebra library used by SciPy at runtime. Scikit-learn therefore has no build dependency on BLAS/LAPACK implementations such as OpenBLAS, ATLAS, BLIS or MKL.
Test dependencies
Running tests requires:
Some tests also require pandas.
Building a specific version from a tag
If you want to build a stable version, you can run git checkout with the tag of that particular version to get its code, or download a zip archive of the version from GitHub.
Editable mode
If you run the development version, it is cumbersome to reinstall the package each time you update the sources. Therefore it is recommended that you install it with the pip install --no-build-isolation --editable . command, which allows you to edit the code in-place. This builds the extension in place and creates a link to the development directory (see the pip docs).
This is fundamentally similar to using the command python setup.py develop (see the setuptools docs). It is however preferred to use pip.
On Unix-like systems, you can equivalently type make in from the top-level folder. Have a look at the Makefile for additional utilities.
Platform-specific instructions
Here are instructions to install a working C/C++ compiler with OpenMP support to build scikit-learn Cython extensions for each supported platform.
Windows
You DO NOT need to install Visual Studio 2019. You only need the “Build Tools for Visual Studio 2019”, under “All downloads” -> “Tools for Visual Studio 2019”.
Secondly, find out if you are running 64-bit or 32-bit Python. The building command depends on the architecture of the Python interpreter. You can check the architecture by running the following in cmd or powershell console:
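A minimal sketch of such a check, using only the standard `struct` module: the size of a C pointer ("P") in bytes, multiplied by 8, gives the bitness of the running interpreter. The function name `interpreter_bits` is hypothetical.

```python
import struct


def interpreter_bits():
    """Return the pointer size of the running interpreter in bits (32 or 64)."""
    return struct.calcsize("P") * 8


if __name__ == "__main__":
    print("{}-bit Python".format(interpreter_bits()))
```

The result describes the Python interpreter, not the operating system: a 32-bit Python installed on 64-bit Windows reports 32.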
For 64-bit Python, configure the build environment by running the following commands in cmd or an Anaconda Prompt (if you use Anaconda):
Replace x64 by x86 to build for 32-bit Python.
Please be aware that the path above might be different from user to user. The aim is to point to the “vcvarsall.bat” file that will set the necessary environment variables in the current command prompt.
Finally, build scikit-learn from this command prompt:
macOS
The default C compiler on macOS, Apple clang (confusingly aliased as /usr/bin/gcc), does not directly support OpenMP. We present two alternatives to enable OpenMP support:
- either install conda-forge::compilers with conda;
- or install libomp with Homebrew to extend the default Apple clang compiler.
For Apple Silicon M1 hardware, only the conda-forge method below is known to work at the time of writing (January 2021). You can install the macos/arm64 distribution of conda using the miniforge installer.
macOS compilers from conda-forge
If you use the conda package manager (version >= 4.7), you can install the compilers meta-package from the conda-forge channel, which provides OpenMP-enabled C/C++ compilers based on the llvm toolchain.
First install the macOS command line tools:
It is recommended to use a dedicated conda environment to build scikit-learn from source:
If you get any conflicting dependency error message, try commenting out any custom conda configuration in the $HOME/.condarc file. In particular the channel_priority: strict directive is known to cause problems for this setup.
You can check that the custom compilers are properly installed from conda forge using the following command:
which should include compilers and llvm-openmp.
The compilers meta-package will automatically set custom environment variables:
They point to files and folders from your sklearn-dev conda environment (in particular in the bin/, include/ and lib/ subfolders). For instance -L/path/to/conda/envs/sklearn-dev/lib should appear in LDFLAGS.
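The LDFLAGS check described above can be done programmatically. This is a sketch under stated assumptions: the helper `ldflags_point_into` is hypothetical, and the variable list (CC, CXX, CFLAGS, CXXFLAGS, LDFLAGS) is the conventional set of build variables, of which only LDFLAGS is named in the text.

```python
import os

# Conventional build variables; only LDFLAGS is named in the text above.
BUILD_VARIABLES = ("CC", "CXX", "CFLAGS", "CXXFLAGS", "LDFLAGS")


def ldflags_point_into(env_prefix, environ=None):
    """Return True if LDFLAGS contains the flag -L<env_prefix>/lib."""
    environ = os.environ if environ is None else environ
    expected = "-L" + env_prefix.rstrip("/") + "/lib"
    # Compare whole flags so that a longer path is not matched by accident.
    return expected in environ.get("LDFLAGS", "").split()


if __name__ == "__main__":
    for name in BUILD_VARIABLES:
        print("{}={}".format(name, os.environ.get(name, "<unset>")))
    # Placeholder prefix from the text; replace it with your environment's path.
    print(ldflags_point_into("/path/to/conda/envs/sklearn-dev"))
```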
In the log, you should see the compiled extension being built with the clang and clang++ compilers installed by conda with the -fopenmp command line flag.
macOS compilers from Homebrew
Another solution is to enable OpenMP support for the clang compiler shipped by default on macOS.
First install the macOS command line tools:
Install the Homebrew package manager for macOS.
Install the LLVM OpenMP library:
Set the following environment variables:
Finally, build scikit-learn in verbose mode (to check for the presence of the -fopenmp flag in the compiler commands):
Linux
Linux compilers from the system
Installing scikit-learn from source without using conda requires the Python development headers and a working C/C++ compiler with OpenMP support (typically the GCC toolchain).
Install build dependencies for Debian-based operating systems, e.g. Ubuntu:
then proceed as usual:
Cython and the pre-compiled wheels for the runtime dependencies (numpy, scipy and joblib) should automatically be installed in $HOME/.local/lib/pythonX.Y/site-packages. Alternatively you can run the above commands from a virtualenv or a conda environment to get full isolation from the Python packages installed via the system packager. When using an isolated environment, pip3 should be replaced by pip in the above commands.
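The exact per-user directory depends on the Python version and the platform. A small sketch that asks the interpreter for it, using the standard `site` module (the helper name `user_site_packages` is hypothetical):

```python
import site


def user_site_packages():
    """Return the per-user site-packages directory of the running interpreter."""
    return site.getusersitepackages()


if __name__ == "__main__":
    print(user_site_packages())
```

The path is reported even when the directory does not exist yet or when a virtual environment disables it, so it indicates where a per-user install would go, not that anything is installed there.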
When precompiled wheels of the runtime dependencies are not available for your architecture (e.g. ARM), you can install the system versions:
On Red Hat and clones (e.g. CentOS), install the dependencies using:
Linux compilers from conda-forge
Alternatively, install a recent version of the GNU C Compiler toolchain (GCC) in the user folder using conda:
FreeBSD
The clang compiler included in FreeBSD 12.0 and 11.2 base systems does not include OpenMP support. You need to install the openmp library from packages (or ports):
This will install header files in /usr/local/include and libs in /usr/local/lib. Since these directories are not searched by default, you can set the environment variables to these locations:
Finally, build the package using the standard command:
For the upcoming FreeBSD 12.1 and 11.3 versions, OpenMP will be included in the base system and these steps will not be necessary.
Alternative compilers
The standard build command will build scikit-learn using your default C/C++ compiler. If you want to build scikit-learn with another compiler handled by distutils or by numpy.distutils, use the following command:
To see the list of available compilers run:
If your compiler is not listed here, you can specify it via the CC and LDSHARED environment variables (does not work on Windows):
Building with Intel C Compiler (ICC) using oneAPI on Linux
Intel provides access to all of its oneAPI toolkits and packages through a public APT repository. First you need to get and install the public key of this repository:
Then, add the oneAPI repository to your APT repositories:
Install ICC, packaged under the name intel-oneapi-icc:
Before using ICC, you need to set up environment variables:
Finally, you can build scikit-learn. For example on Linux x86_64:
Parallel builds
It is possible to build scikit-learn compiled extensions in parallel by setting an environment variable as follows before calling the pip install or python setup.py build_ext commands:
On a machine with 2 CPU cores, it can be beneficial to use a parallelism level of 3 to overlap IO bound tasks (reading and writing files on disk) with CPU bound tasks (actually compiling).
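The rule of thumb above (2 cores, parallelism level 3) can be generalised as "number of cores plus one". The sketch below rests on two assumptions: that generalisation, and the variable name `SKLEARN_BUILD_PARALLEL`, which is recalled from the scikit-learn documentation rather than stated in the text above and should be checked against the documentation for your version. Both helper names are hypothetical.

```python
import os


def build_parallelism(cpu_count=None):
    """Suggest a parallelism level of one more than the number of CPU cores."""
    if cpu_count is None:
        cpu_count = os.cpu_count() or 1  # os.cpu_count() may return None
    return cpu_count + 1


def build_environment(environ=None, cpu_count=None,
                      variable="SKLEARN_BUILD_PARALLEL"):
    """Return a copy of the environment with the parallel-build variable set."""
    env = dict(os.environ if environ is None else environ)
    # The variable name is an assumption; verify it for your scikit-learn version.
    env[variable] = str(build_parallelism(cpu_count))
    return env


if __name__ == "__main__":
    env = build_environment()
    print("SKLEARN_BUILD_PARALLEL=" + env["SKLEARN_BUILD_PARALLEL"])
```

The returned dictionary can be passed as the env argument of `subprocess.run` when launching the build command, which leaves the environment of the calling shell unchanged.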