- Installation Guide
- Stable Release
- Python
- Nightly Build
- Python
- Building XGBoost on OS X
- How to Install XGBoost for Python on macOS
- Tutorial Overview
- 1. Install MacPorts
- 2. Build XGBoost
- 3. Install XGBoost
- Further Reading
- Summary
- XGBoost Python Package
- Installation
- Building From Source
- Obtaining the Source Code
- Building the Shared Library
- Building on Linux and other UNIX-like systems
- Building on MacOS
- Building on Windows
- Building with GPU support
- Building Python Package from Source
- Building Python Package with Default Toolchains
- Building Python Package for Windows with MinGW-w64 (Advanced)
- Building R Package From Source
- Installing the development version (Linux / Mac OSX)
- Installing the development version with Visual Studio (Windows)
- Building R package with GPU support
- Building JVM Packages
- Enabling OpenMP for Mac OS
- Building with GPU support
- Building the Documentation
- Makefiles
Installation Guide
XGBoost provides binary packages for some language bindings. The binary packages support the GPU algorithm ( gpu_hist ) on machines with NVIDIA GPUs. Please note that training with multiple GPUs is only supported on the Linux platform. See XGBoost GPU Support. We also have both stable releases and nightly builds; see below for how to install them. For building from source, visit this page.
Stable Release
Python
Pre-built binaries are uploaded to PyPI (Python Package Index) for each release. Supported platforms are Linux (x86_64, aarch64), Windows (x86_64) and MacOS (x86_64).
You might need to run the command with the --user flag or use virtualenv if you run into permission errors. Python pre-built binary capabilities for each platform:
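The installation command itself is not shown in this extract; for the stable release it is the usual pip invocation (the `--user` fallback below is a sketch for permission-restricted environments):

```shell
# Install the latest stable release from PyPI
pip install xgboost

# If you hit permission errors, install into the user site-packages instead
pip install --user xgboost
```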
Using all CPU cores (threads) on Mac OSX
If you are using Mac OSX, you should first install the OpenMP library ( libomp ) by running
and then run install.packages("xgboost") . Without OpenMP, XGBoost will only use a single CPU core, leading to suboptimal training speed.
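The command elided above is presumably the Homebrew install of libomp; a sketch of the full sequence, assuming Homebrew is already set up:

```shell
# Install the OpenMP runtime that XGBoost uses on macOS
brew install libomp

# Then, inside an R session, install the CRAN package:
#   install.packages("xgboost")
```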
We also provide an experimental pre-built binary with GPU support. With this binary, you will be able to use the GPU algorithm without building XGBoost from source. Download the binary package from the Releases page. The file name will be of the form xgboost_r_gpu_[os]_[version].tar.gz , where [os] is either linux or win64 . (We build the binaries for 64-bit Linux and Windows.) Then install XGBoost by running:
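The install command is missing from this extract; based on the file name given above it would look roughly like the following, with the version number a placeholder:

```shell
# After downloading the GPU-enabled R binary from the Releases page:
R CMD INSTALL ./xgboost_r_gpu_linux_[version].tar.gz
```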
You can use XGBoost4J in your Java/Scala application by adding XGBoost4J as a dependency:
This will check out the latest stable version from Maven Central.
For the latest release version number, please check the release page.
To enable the GPU algorithm ( tree_method='gpu_hist' ), use the artifacts xgboost4j-gpu_2.12 and xgboost4j-spark-gpu_2.12 instead (note the gpu suffix).
Windows not supported in the JVM package
Currently, XGBoost4J-Spark does not support the Windows platform, as the distributed training algorithm does not work on Windows. Please use Linux or MacOS.
Nightly Build
Python
Nightly builds are available. You can go to this page, find the wheel with the commit ID you want and install it with pip:
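A sketch of installing a nightly wheel, with the URL and file name as placeholders for the ones listed on the nightly-build page:

```shell
# Replace the URL with the wheel matching your platform and desired commit ID
pip install <url-to-nightly-wheel>/xgboost-<version>+<commit>-py3-none-manylinux2010_x86_64.whl
```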
The capabilities of the Python pre-built wheel are the same as for the stable release.
Other than the standard CRAN installation, we also provide an experimental pre-built binary with GPU support. You can go to this page, find the commit ID you want to install, and then locate the file xgboost_r_gpu_[os]_[commit].tar.gz , where [os] is either linux or win64 . (We build the binaries for 64-bit Linux and Windows.) Download it and run the following commands:
First add the following Maven repository hosted by the XGBoost project:
Building XGBoost on OS X
XGBoost is a C++ library implementing gradient boosting methods, which appears more and more often in write-ups of winning algorithms on Kaggle. To use it from R or Python there are corresponding bindings, but the library itself must be built from source. Running make, I saw a mass of errors about missing headers and unsupported OpenMP. Well, not for the first time.
Progress marches on, and the procedure has become somewhat simpler:
- /usr/bin/ruby -e "$(curl -fsSL https://raw.githubusercontent.com/Homebrew/install/master/install)"
- brew install gcc --without-multilib
- pip install xgboost
- Conquer the Kaggle leaderboards
Word from the field is that XGBoost may complain at startup about a missing library, /usr/local/lib/gcc/5/libgomp.1.dylib. In that case, locate the library and place it at that path.
Previously, one had to do the following:
- Download XCode
- Install the command line tools
- Build Clang with OpenMP support
- Build the Intel OpenMP library
- Set the paths to the OpenMP library and the corresponding headers
- Build XGBoost
- Install the Python bindings
- Conquer the Kaggle leaderboards
1. Download XCode
XCode can be downloaded from the App Store completely free of charge. After installation, typing gcc -v at the terminal prompt should show something like the following:
2. Install the command line tools
If you skip this step, the compiler will not be able to find the standard C and C++ libraries. In the terminal, run
and follow the instructions.
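The command referred to above is the standard Apple one for installing the command line tools:

```shell
# Launches the Command Line Tools installer dialog
xcode-select --install
```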
3. Build Clang with OpenMP support
This version of the compiler supports OpenMP directives for parallelization and is developed by Intel engineers. One can hope that one fine day this branch will be merged back into trunk and OpenMP will be available in stock Clang out of the box. Apparently, some time ago clang-omp could be installed with brew, but that happy time has passed. So, let's build the compiler:
If your machine has more than 4 cores, it makes sense to adjust the number in the last command.
4. Build the Intel OpenMP library
The OpenMP support library is likewise built from source. Download, unpack, build:
5. Set the paths to the OpenMP library and headers
For the compiler and linker to find the components they need when building XGBoost, the paths must be configured. To do so, add the following lines to ~/.bash_profile:
As you might guess, PATH_TO_LIBOMP is the path to the folder containing the library. For the changes to take effect, run
We need to make sure everything works correctly. To do so, let's create a sample program
and try to compile it:
If all went well, running the program will print messages from several threads.
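The sample program is missing from this extract; here is a minimal OpenMP test of the kind described, written as a shell snippet so the compile step stays visible (the compiler name and -fopenmp flag are assumptions based on the OpenMP-enabled Clang built above):

```shell
# Create a tiny OpenMP test program
cat > omp_hello.c <<'EOF'
#include <stdio.h>
#include <omp.h>

int main(void) {
    /* Each thread prints its own ID */
    #pragma omp parallel
    printf("Hello from thread %d of %d\n",
           omp_get_thread_num(), omp_get_num_threads());
    return 0;
}
EOF

# Compile with the OpenMP-enabled compiler built earlier and run it
clang -fopenmp omp_hello.c -o omp_hello
./omp_hello
```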
6. Build XGBoost
We are almost there. The xgboost folder contains a Makefile whose first lines need to be edited as follows:
7. Install the Python bindings
You can verify that the XGBoost bindings work as intended using the demo scripts in xgboost/demo.
8. Conquer the Kaggle leaderboards
The power suddenly went out, and this most important part was unfortunately lost; it will be covered in a future installment.
How to Install XGBoost for Python on macOS
Published 2018-01-17
XGBoost is a library for developing very fast and accurate gradient boosting models.
It is the library at the heart of many winning solutions in Kaggle data science competitions.
In this tutorial, you will discover how to install the XGBoost library for Python on macOS.
Tutorial Overview
This tutorial is divided into 3 parts; they are:
- Install MacPorts
- Build XGBoost
- Install XGBoost
Note: I have used this procedure for years on a range of macOS versions and it has not changed. This tutorial was written and tested on macOS High Sierra (10.13.1).
1. Install MacPorts
To build and install XGBoost for Python, you need GCC and a Python environment installed.
I recommend GCC 7 and Python 3.6, and I recommend installing these prerequisites using MacPorts.
- 1. For a step-by-step guide to installing MacPorts and a Python environment, see this tutorial:
- 2. Once MacPorts and a working Python environment are installed, you can install and select GCC 7 as follows:
- 3. Confirm that the GCC installation was successful, as follows:
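The MacPorts commands themselves are not in this extract; a sketch, assuming a standard MacPorts setup (port and package names are the usual ones for GCC 7):

```shell
# Install GCC 7 and make it the default gcc (this may take a while)
sudo port install gcc7
sudo port select --set gcc mp-gcc7

# Verify the installation
gcc -v
```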
You should see the GCC version printed; for example:
Which version did you see?
Let me know in the comments below.
2. Build XGBoost
The next step is to download and compile XGBoost for your system.
- 1. First, check out the code repository from GitHub:
- 2. Change into the xgboost directory.
- 3. Copy the configuration we intend to use for compiling XGBoost into place.
- 4. Compile XGBoost; this requires that you specify the number of cores in your system (e.g. 8, adjust as needed).
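The four steps above can be sketched as follows; the config file name reflects the old Makefile-based build this tutorial describes and is an assumption:

```shell
# 1. Check out the repository, including submodules
git clone --recursive https://github.com/dmlc/xgboost.git

# 2. Change into the directory
cd xgboost

# 3. Copy the build configuration into place (old Makefile-based build)
cp make/config.mk ./config.mk

# 4. Compile, using e.g. 8 cores
make -j8
```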
The build process may take a minute and should not produce any error messages, although you may see some warnings that can safely be ignored.
For example, the tail end of the compilation might look like the following:
Did this step work for you?
Let me know in the comments below.
3. Install XGBoost
You are now ready to install XGBoost on your system.
- 1. Change into the Python package directory of the xgboost project.
- 2. Install the XGBoost Python package.
The installation is very fast.
For example, at the end of the installation you may see messages like the following:
- 3. Confirm the installation was successful by printing the xgboost version, which requires the library to be loaded.
Save the following code in a file named version.py.
Run the script from the command line:
You should see the XGBoost version printed to the screen:
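Since the script body is missing from this extract, here is a minimal version.py of the kind described, created and run from the shell:

```shell
# Save the version-check script
cat > version.py <<'EOF'
import xgboost
print("xgboost", xgboost.__version__)
EOF

# Run it from the command line
python3 version.py
```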
How did you do?
Post your results in the comments below.
Further Reading
This section provides more resources on the topic if you are looking to go deeper.
Summary
In this tutorial, you discovered how to install XGBoost for Python on macOS step by step.
Do you have any questions?
Ask your questions in the comments below and I will do my best to answer.
XGBoost Python Package
Installation
We are on PyPI now. For the stable version, please install using pip:
- pip install xgboost
- Since this package contains C++ source code, pip needs a C++ compiler from the system to compile the source code on the fly. Please follow the instructions below for each supported platform.
- Note for Mac OS X users: please install gcc from brew first with brew tap homebrew/versions; brew install gcc --without-multilib .
- Note for Linux users: please install gcc first with sudo apt-get install build-essential or the corresponding package manager of your system.
- Note for Windows users: this pip installation may not work in some Windows environments and may cause unexpected errors. pip installation on Windows is currently disabled pending further investigation; please install from GitHub.
For the up-to-date version, please install from GitHub.
- To make the python module, type ./build.sh in the root directory of the project
- Make sure you have setuptools installed
- Install with cd python-package; python setup.py install from this directory.
- For Windows users, please use the Visual Studio project file under the windows folder. See also the installation tutorial from the Kaggle Otto forum.
Add MinGW to the system PATH in Windows if you are using the latest version of xgboost which requires compilation:
Building From Source
This page gives instructions on how to build and install XGBoost from the source code on various systems. If the instructions do not work for you, please feel free to ask questions at the user forum.
Pre-built binary is available: now with GPU support
Consider installing XGBoost from a pre-built binary to avoid the trouble of building it from source. Check out the Installation Guide.
Obtaining the Source Code
To obtain the development repository of XGBoost, one needs to use git .
Use of Git submodules
XGBoost uses Git submodules to manage dependencies. So when you clone the repo, remember to specify the --recursive option:
For Windows users who use the GitHub tools, you can open the Git Shell and type the following command:
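The clone command referenced above, using the official repository URL:

```shell
git clone --recursive https://github.com/dmlc/xgboost

# If you cloned without --recursive, fetch the submodules afterwards:
cd xgboost
git submodule update --init --recursive
```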
Building the Shared Library
This section describes the procedure to build the shared library and CLI interface independently. For building language specific package, see corresponding sections in this document.
On Linux and other UNIX-like systems, the target library is libxgboost.so
On MacOS, the target library is libxgboost.dylib
On Windows the target library is xgboost.dll
This shared library is used by different language bindings (with some additions depending on the binding you choose). The minimal building requirements are:
A recent C++ compiler supporting C++11 (g++-5.0 or higher)
CMake 3.13 or higher.
For a list of CMake options like GPU support, see #— Options in CMakeLists.txt on top level of source tree.
Building on Linux and other UNIX-like systems
After obtaining the source code, one builds XGBoost by running CMake:
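A sketch of the CMake invocation, following the standard out-of-source layout the page assumes:

```shell
cd xgboost
mkdir build && cd build
cmake ..
make -j$(nproc)
```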
Building on MacOS
Obtain libomp from Homebrew:
Now clone the repository:
Create the build/ directory and invoke CMake. After invoking CMake, you can build XGBoost with make :
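Putting the three MacOS steps together (Homebrew libomp, clone, out-of-source CMake build):

```shell
# Obtain the OpenMP runtime
brew install libomp

# Clone the repository with submodules
git clone --recursive https://github.com/dmlc/xgboost
cd xgboost

# Configure and build
mkdir build && cd build
cmake ..
make -j4
```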
Building on Windows
You need to first clone the XGBoost repo with the --recursive option, to clone the submodules. We recommend you use Git for Windows, as it comes with a standard Bash shell; this greatly eases the installation process.
XGBoost supports compilation with Microsoft Visual Studio and MinGW. To build with Visual Studio, we will need CMake. Make sure to install a recent version of CMake. Then run the following from the root of the XGBoost directory:
This specifies an out-of-source build using the Visual Studio 64-bit generator. (Change the -G option appropriately if you have a different version of Visual Studio installed.)
After the build process successfully ends, you will find an xgboost.dll library file inside the ./lib/ folder. Some notes on using MinGW are added in Building Python Package for Windows with MinGW-w64 (Advanced).
Building with GPU support
XGBoost can be built with GPU support for both Linux and Windows using CMake. See Building R package with GPU support for special instructions for R.
An up-to-date version of the CUDA toolkit is required.
Checking your compiler version
CUDA is really picky about supported compilers; a table of the compatible compilers for the latest CUDA version on Linux can be seen here.
Some distros package a compatible gcc version with CUDA. If you run into compiler errors with nvcc , try specifying the correct compiler with -DCMAKE_CXX_COMPILER=/path/to/correct/g++ -DCMAKE_C_COMPILER=/path/to/correct/gcc . On Arch Linux, for example, both binaries can be found under /opt/cuda/bin/ .
From the command line on Linux starting from the XGBoost directory:
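The command-line sequence referenced above, with USE_CUDA as the documented CMake switch for GPU support:

```shell
cd xgboost
mkdir build && cd build
cmake .. -DUSE_CUDA=ON
# Add -DUSE_NCCL=ON here to enable distributed GPU training (Linux only)
make -j$(nproc)
```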
Specifying compute capability
To speed up compilation, the compute version specific to your GPU could be passed to cmake as, e.g., -DGPU_COMPUTE_VER=50 . A quick explanation and numbers for some architectures can be found in this page.
Enabling distributed GPU training
By default, distributed GPU training is disabled and only a single GPU will be used. To enable distributed GPU training, set the option USE_NCCL=ON . Distributed GPU training depends on NCCL2, available at this link. Since NCCL2 is only available for Linux machines, distributed GPU training is available only for Linux.
On Windows, run CMake as follows:
(Change the -G option appropriately if you have a different version of Visual Studio installed.)
Visual Studio 2017 Win64 Generator may not work
Choosing the Visual Studio 2017 generator may cause compilation failure. When it happens, specify the 2015 compiler by adding the -T option:
The above cmake configuration run will create an xgboost.sln solution file in the build directory. Build this solution in release mode as a x64 build, either from Visual studio or from command line:
To speed up compilation, run multiple jobs in parallel by appending the option -- /MP .
Building Python Package from Source
The Python package is located at python-package/ .
Building Python Package with Default Toolchains
There are several ways to build and install the package from source:
Use Python setuptools directly
The XGBoost Python package supports most of the setuptools commands; here is a list of tested commands:
Running python setup.py install will compile XGBoost using default CMake flags. For passing additional compilation options, append the flags to the command. For example, to enable CUDA acceleration and NCCL (distributed GPU) support:
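The flag names below are taken from the setup.py options the text refers to:

```shell
# Compile with CUDA acceleration and NCCL (distributed GPU) support enabled
python setup.py install --use-cuda --use-nccl
```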
Please refer to setup.py for a complete list of available options. Some other options used for development are only available when using CMake directly. See the next section on how to use CMake with setuptools manually.
You can install the created distribution packages using pip. For example, after running sdist setuptools command, a tar ball similar to xgboost-1.0.0.tar.gz will be created under the dist directory. Then you can install it by invoking the following command under dist directory:
For details about these commands, please refer to the official document of setuptools, or just Google “how to install Python package from source”. XGBoost Python package follows the general convention. Setuptools is usually available with your Python distribution, if not you can install it via system command. For example on Debian or Ubuntu:
For cleaning up the directory after running above commands, python setup.py clean is not sufficient. After copying out the build result, simply running git clean -xdf under python-package is an efficient way to remove generated cache files. If you find weird behaviors in Python build or running linter, it might be caused by those cached files.
For using develop command (editable installation), see next section.
Build C++ core with CMake first
This is mostly for C++ developers who don't want to go through the hooks in Python setuptools. You can build the C++ library directly using CMake as described in the sections above. After compilation, a shared object (also called a dynamic-link library, the jargon depending on your platform) will appear in XGBoost's source tree under the lib/ directory. On Linux distributions it's lib/libxgboost.so . From there, all Python setuptools commands will reuse that shared object instead of compiling it again. This is especially convenient if you are using the editable installation, where the installed package is simply a link to the source tree, so we can perform rapid testing during development. Here is a simple bash script that does that:
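A sketch of such a script, assuming the standard source-tree layout described above:

```shell
# Build the C++ core once with CMake...
mkdir -p build && cd build
cmake ..
make -j$(nproc)
cd ..

# ...then make an editable install that reuses lib/libxgboost.so
cd python-package
pip install -e .
```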
Use libxgboost.so on system path.
This is for distributing xgboost in a language-independent manner, where libxgboost.so is packaged separately from the Python package. Assuming libxgboost.so is already present in the system library path, which can be queried via:
Then one only needs to provide a user option when installing the Python package to reuse the shared object on the system path:
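The user option referred to is --use-system-libxgboost; a sketch of the two commands:

```shell
# Check whether the loader can already find libxgboost.so
ldconfig -p | grep libxgboost

# Install the Python package without bundling another copy of the library
cd python-package
python setup.py install --use-system-libxgboost
```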
Building Python Package for Windows with MinGW-w64 (Advanced)
Windows versions of Python are built with Microsoft Visual Studio. Usually Python binary modules are built with the same compiler the interpreter is built with. However, you may not be able to use Visual Studio, for following reasons:
VS is proprietary and commercial software. Microsoft provides a freeware “Community” edition, but its licensing terms impose restrictions as to where and how it can be used.
Visual Studio contains telemetry, as documented in Microsoft Visual Studio Licensing Terms. Running software with telemetry may be against the policy of your organization.
So you may want to build XGBoost with GCC at your own risk. This presents some difficulties, because MSVC uses the Microsoft runtime while MinGW-w64 uses its own, and the two runtimes have incompatible memory allocators. But in fact this setup is usable if you know how to deal with it. Here is some accumulated experience.
The Python interpreter will crash on exit if XGBoost was used. This is usually not a big issue.
Don't use the -march=native gcc flag; using it causes the Python interpreter to crash if the DLL was actually used.
-mtune=native, however, is OK.
You may need to provide the lib with the runtime libs. If mingw32/bin is not in PATH , build a wheel ( python setup.py bdist_wheel ), open it with an archiver, and put the needed DLLs into the directory where xgboost.dll is situated. Then you can install the wheel with pip .
Building R Package From Source
By default, the package installed by running install.packages is built from source. Here we list some other options for installing development version.
Installing the development version (Linux / Mac OSX)
Make sure you have installed git and a recent C++ compiler supporting C++11 (See above sections for requirements of building C++ core).
Due to the use of git-submodules, devtools::install_github can no longer be used to install the latest version of R package. Thus, one has to run git to check out the code first:
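The check-out and install sequence referenced above, using the CMake-based R build that the next sentence mentions (R_LIB is the documented switch):

```shell
git clone --recursive https://github.com/dmlc/xgboost
cd xgboost
mkdir build && cd build
cmake .. -DR_LIB=ON
make install -j$(nproc)
```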
If all fails, try Building the shared library to see whether a problem is specific to R package or not. Notice that the R package is installed by CMake directly.
Installing the development version with Visual Studio (Windows)
On Windows, CMake with Visual C++ Build Tools (or Visual Studio) can be used to build the R package.
While not required, this build can be faster if you install the R package processx with install.packages("processx") .
Setting correct PATH environment variable on Windows
If you are using Windows, make sure to include the right directories in the PATH environment variable.
If you are using R 4.x with RTools 4.0:
- C:\rtools40\usr\bin
- C:\rtools40\mingw64\bin
If you are using R 3.x with RTools 3.x:
Open the Command Prompt and navigate to the XGBoost directory, and then run the following commands. Make sure to specify the correct R version.
Building R package with GPU support
The procedure and requirements are similar as in Building with GPU support , so make sure to read it first.
On Linux, starting from the XGBoost directory type:
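A sketch of the Linux commands, combining the R-package and CUDA switches described above:

```shell
mkdir build && cd build
cmake .. -DUSE_CUDA=ON -DR_LIB=ON
make install -j$(nproc)
```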
When the default target is used, an R package shared library is built in the build area. The install target, in addition, assembles the package files with this shared library under build/R-package and runs R CMD INSTALL .
On Windows, CMake with Visual Studio has to be used to build an R package with GPU support. Rtools must also be installed.
Setting correct PATH environment variable on Windows
If you are using Windows, make sure to include the right directories in the PATH environment variable.
If you are using R 4.x with RTools 4.0:
If you are using R 3.x with RTools 3.x:
Open the Command Prompt and navigate to the XGBoost directory, and then run the following commands. Make sure to specify the correct R version.
If CMake can't find your R during the configuration step, you might provide the location of R to CMake like this: -DLIBR_HOME="C:\Program Files\R\R-4.0.0" .
If on Windows you get a “permission denied” error when trying to write to …Program Files/R/… during the package installation, create a .Rprofile file in your personal home directory (if you don’t already have one in there), and add a line to it which specifies the location of your R packages user library, like the following:
You might find the exact location by running .libPaths() in R GUI or RStudio.
Building JVM Packages
Building XGBoost4J using Maven requires Maven 3 or newer, Java 7+ and CMake 3.13+ for compiling Java code as well as the Java Native Interface (JNI) bindings.
Before you install XGBoost4J, you need to define environment variable JAVA_HOME as your JDK directory to ensure that your compiler can find jni.h correctly, since XGBoost4J relies on JNI to implement the interaction between the JVM and native libraries.
After JAVA_HOME is defined correctly, installing XGBoost4J is as simple as running mvn package under the jvm-packages directory. You can also skip the tests by running mvn -DskipTests=true package if you are sure about the correctness of your local setup.
To publish the artifacts to your local maven repository, run
Or, if you would like to skip tests, run
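The two Maven invocations referred to above:

```shell
# Publish to the local Maven repository (runs tests)
mvn install

# Same, but skipping tests
mvn -DskipTests install
```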
This command will publish the xgboost binaries, the compiled java classes as well as the java sources to your local repository. Then you can use XGBoost4J in your Java projects by including the following dependency in pom.xml :
For sbt, please add the repository and dependency to build.sbt as follows:
If you want to use XGBoost4J-Spark, replace xgboost4j with xgboost4j-spark .
XGBoost4J-Spark requires Apache Spark 2.3+
XGBoost4J-Spark now requires Apache Spark 2.3+. The latest versions of XGBoost4J-Spark use the facilities of org.apache.spark.ml.param.shared extensively to provide tight integration with the Spark MLlib framework, and these facilities are not fully available in earlier versions of Spark.
Also, make sure to install Spark directly from Apache website. Upstream XGBoost is not guaranteed to work with third-party distributions of Spark, such as Cloudera Spark. Consult appropriate third parties to obtain their distribution of XGBoost.
Enabling OpenMP for Mac OS
If you are on Mac OS and using a compiler that supports OpenMP, you need to go to the file xgboost/jvm-packages/create_jni.py and comment out the line
in order to get the benefit of multi-threading.
Building with GPU support
If you want to build XGBoost4J that supports distributed GPU training, run
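The documented Maven property for a GPU-enabled XGBoost4J build is use.cuda; a sketch:

```shell
cd jvm-packages
mvn -Duse.cuda=ON install
```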
Building the Documentation
XGBoost uses Sphinx for documentation. To build it locally, you need an installed XGBoost with all its dependencies, along with:
Under the xgboost/doc directory, run make <format> with <format> replaced by the format you want. For a list of supported formats, run make help under the same directory.
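A sketch of a local documentation build; the requirements-file location is an assumption, not confirmed by this extract:

```shell
cd doc
pip install -r requirements.txt   # assumed location of the Sphinx dependencies
make html                         # or: make help, to list supported formats
```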
Makefiles
The top-level Makefile is only used for creating shorthands for running linters, performing packaging tasks, etc. The remaining makefiles are legacy.