Installing pandas for Python on Linux

Contents

Introduction to the pandas library: installation and first steps / pd 1
A Python library for data analysis
Installing pandas
Installing with Anaconda
Installing from PyPI
Installing on Linux
Installing from source
A repository for Windows
Testing the pandas installation
First steps with pandas
Installation
Python version support
Installing pandas
Installing pandas with Anaconda
Installing pandas with Miniconda
Installing from PyPI
Installing using your Linux distribution’s package manager
Installing from source
Running the test suite
Dependencies
Recommended Dependencies
Optional Dependencies

Introduction to the pandas library: installation and first steps / pd 1

The pandas library is a well-suited tool for anyone who does data analysis in the Python programming language.

This material first covers the main aspects of the library and how to install it on your system. You will then meet two data structures: Series and DataFrame. You will work with the basic set of functions that pandas provides for core data-processing operations. Knowing them is a key skill for a specialist in this field, so it is worth rereading the material until it is completely clear.

The examples will help you understand a concept introduced by the library: indexing of data structures. You will learn how to use it properly to manage data. Finally, you will see how to extend indexing to work with several levels at once by using hierarchical indexing.

A Python library for data analysis

Pandas is an open-source Python library for specialized data analysis. Today, anyone who uses Python for statistical analysis and decision-making should be familiar with it.

The library was designed and developed primarily by Wes McKinney starting in 2008. In 2012 his colleague Chang She joined him. Together they created one of the most widely used libraries in the Python community.

Pandas arose from the need for a simple tool for processing, extracting, and manipulating data.

This Python package is built on top of the NumPy library. That choice contributed to the success and rapid spread of pandas: it inherits all the benefits of NumPy and is compatible with most other modules.

Another important decision was to develop dedicated data structures for data analysis. Instead of using structures built into Python or provided by other libraries, two new ones were developed.

They are designed for relational and labeled data, which lets you manage data in a way similar to relational SQL databases and Excel spreadsheets.

Further on you will find examples of basic data-analysis operations of the kind usually performed on relational database tables or Excel spreadsheets. Pandas provides an even broader set of functions and methods that perform these operations more efficiently.

The main goal of pandas is to provide the building blocks for anyone entering the world of data analysis.

Installing pandas

The simplest way to install pandas is to use a prepackaged distribution, that is, to install it through Anaconda or Enthought.

Installing with Anaconda

With Anaconda, installation takes a couple of minutes. First check whether pandas is already installed and, if so, which version. To do that, enter the following command in the terminal: conda list pandas

If the module is already installed (for example, on Windows), the output lists the package together with its version and build.

If pandas is not installed, install it with this command: conda install pandas

Anaconda checks all dependencies right away and installs any additional modules that are needed.

To update the package to a newer version, use this command: conda update pandas

The system checks the version of pandas and of all related modules, proposes the appropriate updates, and then asks you to confirm before proceeding.
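The same check can also be sketched from inside Python, without conda. This is a minimal sketch that assumes Python 3.8 or newer (for importlib.metadata); the helper name installed_version is made up for illustration and is not part of pandas or conda.

```python
from importlib import metadata


def installed_version(package):
    """Return the installed version string of a package, or None if absent."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None


# Report whether pandas is present in the current environment.
version = installed_version("pandas")
if version is None:
    print("pandas is not installed")
else:
    print("pandas", version, "is installed")
```

The helper works for any distribution name, so the same call can be reused to check NumPy or other dependencies.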

Installing from PyPI

Pandas can also be installed from PyPI with this command: pip install pandas

Installing on Linux

If you work on a Linux distribution and decide not to use the options above, pandas can be installed like any other package.

On Debian and Ubuntu the command is: sudo apt-get install python-pandas

On openSUSE it is zypper in python-pandas, and on Fedora it is dnf install python-pandas.

Installing from source

If you want to compile pandas from source, the code is available on GitHub at https://github.com/pandas-dev/pandas.

Make sure Cython is installed. You can read more about this method in the documentation (http://pandas.pydata.org/pandas-docs/stable/install.html).

A repository for Windows

If you work on Windows and prefer to manage packages yourself so that the latest version is always installed, there is a resource where modules built for Windows can be downloaded: Christoph Gohlke’s Python Extension Packages for Windows (www.lfd.uci.edu/~gohlke/pythonlibs/). Each module is distributed as a WHL file for 32-bit and 64-bit systems. To install one, run pip install followed by the name of the downloaded .whl file.

For example, to install pandas you need to find and download the pandas wheel from that page.

When choosing a module, it is important to pick the right Python version and architecture. Moreover, while NumPy requires no other packages, pandas has dependencies, and they must be installed as well. The installation order does not matter.

The drawback of this approach is that you have to install packages one by one, without a package manager to help work out the right versions and the dependencies between packages. The advantage is that you get more familiar with the modules and can obtain the latest versions regardless of what your distribution ships.
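Picking the right wheel requires knowing the Python version and the architecture of the interpreter you run. A minimal sketch for reading both is shown below; it assumes CPython, whose wheel file names usually contain a tag such as cp36 for CPython 3.6.

```python
import platform
import struct
import sys

# CPython wheels carry a tag such as cp36 (CPython 3.6) in the file name.
python_tag = "cp{}{}".format(sys.version_info.major, sys.version_info.minor)

# The pointer size in bits tells whether the interpreter is a 32- or 64-bit build.
bits = struct.calcsize("P") * 8

print("Python tag:", python_tag)
print("Interpreter:", bits, "bit")
print("Machine:", platform.machine())
```

The interpreter's bitness matters more than the operating system's: a 32-bit Python on 64-bit Windows still needs the 32-bit wheel.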

Testing the pandas installation

After installation, pandas can run a self-check to verify its internals (the documentation states that the tests cover 97% of the code).

First, make sure the nose module is installed. If it is, the tests are run with the following command: nosetests pandas

The run takes several minutes and ends with a list of any problems found.

The nose module is designed for testing Python code during the development of a project or a Python module. It extends the capabilities of the unittest module and simplifies the testing process.

You can read more about it here: http://pythontesting.net/framework/nose/nose-introduction/.

First steps with pandas

The best way to get to know pandas is to open a Python console and enter commands one at a time. This way you will become familiar with all the functions and data structures.

Moreover, the data and functions defined here will also work in the examples of later materials. At the end of each example you are free to experiment with them.

To begin, open a Python shell and import the pandas library. The standard practice is to import it under an alias, together with NumPy: import pandas as pd and import numpy as np.

From now on, whenever you see pd and np, they refer to an object or method belonging to these two libraries. You will often be tempted to import the module this way instead: from pandas import *

In that case you no longer need the pd prefix to reference a function, object, or method, but this is not considered good practice among Python developers.
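Why the wildcard form is discouraged can be sketched without pandas at all. The example below uses the standard-library os module as a stand-in (an assumption made for illustration; the text above only discusses pandas): a wildcard import silently replaces the built-in open with os.open.

```python
import builtins
import os

# Simulate a fresh module that runs "from os import *".
namespace = {}
exec("from os import *", namespace)

# The wildcard import brought in os.open, which now hides the built-in open.
shadowed = namespace["open"] is os.open
print("open was replaced by os.open:", shadowed)

# With a qualified import the two functions stay separate.
print("builtins.open is os.open:", builtins.open is os.open)
```

Keeping the pd and np prefixes avoids this kind of silent name clash and makes it clear which library each call comes from.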


Installation

The easiest way for the majority of users to install pandas is to install it as part of the Anaconda distribution, a cross platform distribution for data analysis and scientific computing. This is the recommended installation method for most users.

Instructions for installing from source, PyPI, various Linux distributions, or a development version are also provided.

Python version support

Officially Python 2.7, 3.4, 3.5, and 3.6

Installing pandas

Installing pandas with Anaconda

Installing pandas and the rest of the NumPy and SciPy stack can be a little difficult for inexperienced users.

The simplest way to install not only pandas, but Python and the most popular packages that make up the SciPy stack (IPython, NumPy, Matplotlib, ...) is with Anaconda, a cross-platform (Linux, Mac OS X, Windows) Python distribution for data analytics and scientific computing.

After running a simple installer, the user will have access to pandas and the rest of the SciPy stack without needing to install anything else, and without needing to wait for any software to be compiled.

Installation instructions for Anaconda are available in the Anaconda documentation.

A full list of the packages available as part of the Anaconda distribution is published there as well.

An additional advantage of installing with Anaconda is that you don’t require admin rights to install it. It installs in the user’s home directory, which also makes it trivial to delete Anaconda at a later date (just delete that folder).

Installing pandas with Miniconda

The previous section outlined how to get pandas installed as part of the Anaconda distribution. However, this approach means you will install well over one hundred packages and involves downloading an installer that is a few hundred megabytes in size.

If you want more control over which packages are installed, or have limited internet bandwidth, then installing pandas with Miniconda may be a better solution.

Conda is the package manager that the Anaconda distribution is built upon. It is both cross-platform and language agnostic (it can play a similar role to a pip and virtualenv combination).

Miniconda allows you to create a minimal self-contained Python installation, and then use the conda command to install additional packages.

First you will need conda to be installed; downloading and running the Miniconda installer will do this for you. The installer is available from the Miniconda download page.

The next step is to create a new conda environment (these are analogous to a virtualenv, but they also let you specify precisely which Python version to install). Run the following command from a terminal window, where name_of_my_env is a placeholder for the environment name: conda create -n name_of_my_env python

This will create a minimal environment with only Python installed in it. To put yourself inside this environment, run: source activate name_of_my_env

On Windows the command is: activate name_of_my_env

The final step is to install pandas. This can be done with the following command: conda install pandas

To install a specific pandas version, append it to the package name, for example: conda install pandas=0.20.3

To install other packages, IPython for example: conda install ipython

To install the full Anaconda distribution: conda install anaconda

If you require any packages that are available to pip but not conda, simply install pip with conda install pip, and then use pip install to install these packages.

Installing from PyPI

pandas can be installed via pip from PyPI: pip install pandas

This will likely require the installation of a number of dependencies, including NumPy. It also requires a compiler to build some of the code, and can take a few minutes to complete.

Installing using your Linux distribution’s package manager

The commands in this table will install pandas for Python 2 from your distribution. To install pandas for Python 3 you may need to use the package python3-pandas.

Distribution	Status	Download / Repository Link	Install method
Debian	stable	official Debian repository	sudo apt-get install python-pandas
Debian & Ubuntu	unstable (latest packages)	NeuroDebian	sudo apt-get install python-pandas
Ubuntu	stable	official Ubuntu repository	sudo apt-get install python-pandas
Ubuntu	unstable (daily builds)	PythonXY PPA; activate by: sudo add-apt-repository ppa:pythonxy/pythonxy-devel && sudo apt-get update	sudo apt-get install python-pandas
openSUSE	stable	openSUSE Repository	zypper in python-pandas
Fedora	stable	official Fedora repository	dnf install python-pandas
CentOS/RHEL	stable	EPEL repository	yum install python-pandas

Installing from source

See the contributing documentation for complete instructions on building from the git source tree. Further, see creating a development environment if you wish to create a pandas development environment.

Running the test suite

pandas is equipped with an exhaustive set of unit tests covering about 97% of the codebase as of this writing. To run it on your machine to verify that everything is working (and that you have all of the dependencies, soft and hard, installed), make sure you have pytest and run pd.test() from a Python session after import pandas as pd.

Dependencies

setuptools
NumPy: 1.7.1 or higher
python-dateutil: 1.5 or higher
pytz: Needed for time zone support
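The minimum versions above can be checked against what is installed. The following is a minimal sketch, assuming Python 3.8 or newer (for importlib.metadata) and simple dotted version numbers; the names MINIMUMS, version_tuple and meets_minimum are hypothetical helpers, not part of pandas.

```python
import re
from importlib import metadata

# Minimum versions taken from the dependency list above.
MINIMUMS = {"numpy": "1.7.1", "python-dateutil": "1.5"}


def version_tuple(text):
    """Turn '1.7.1' into (1, 7, 1); stop at the first non-numeric part."""
    parts = []
    for piece in text.split("."):
        match = re.match(r"\d+", piece)
        if match is None:
            break
        parts.append(int(match.group()))
    return tuple(parts)


def meets_minimum(installed, minimum):
    """Compare two dotted versions numerically, padding the shorter with zeros."""
    a, b = version_tuple(installed), version_tuple(minimum)
    width = max(len(a), len(b))
    a = a + (0,) * (width - len(a))
    b = b + (0,) * (width - len(b))
    return a >= b


for name, minimum in MINIMUMS.items():
    try:
        found = metadata.version(name)
    except metadata.PackageNotFoundError:
        print(name, "is not installed")
        continue
    status = "ok" if meets_minimum(found, minimum) else "too old"
    print(name, found, status)
```

Comparing integer tuples rather than raw strings matters here: as strings, "1.10.0" would sort before "1.7.1".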

Recommended Dependencies

numexpr: for accelerating certain numerical operations. numexpr uses multiple cores as well as smart chunking and caching to achieve large speedups. If installed, it must be version 2.4.6 or higher.
bottleneck: for accelerating certain types of nan evaluations. bottleneck uses specialized Cython routines to achieve large speedups.

You are highly encouraged to install these libraries, as they provide large speedups, especially if working with large data sets.

Optional Dependencies

Cython: Only necessary to build development version. Version 0.23 or higher.

SciPy: miscellaneous statistical functions

xarray: pandas-like handling for more than 2 dimensions, needed for converting Panels to xarray objects. Version 0.7.0 or higher is recommended.

PyTables: necessary for HDF5-based storage. Version 3.0.0 or higher required, Version 3.2.1 or higher highly recommended.

Feather Format: necessary for feather-based storage, version 0.3.1 or higher.

SQLAlchemy: for SQL database support. Version 0.8.1 or higher recommended. Besides SQLAlchemy, you also need a database specific driver. You can find an overview of supported drivers for each SQL dialect in the SQLAlchemy docs. Some common drivers are:

psycopg2: for PostgreSQL
pymysql: for MySQL.
SQLite: for SQLite. This is included in Python’s standard library by default.

xlrd/xlwt: Excel reading (xlrd) and writing (xlwt)
openpyxl: openpyxl version 1.6.1 or higher (but lower than 2.0.0), or version 2.2 or higher, for writing .xlsx files (xlrd >= 0.9.0)
XlsxWriter: Alternative Excel writer
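The openpyxl constraint above has a gap between its two ranges, which is easy to misread. It can be sketched as a small predicate; the function names are hypothetical, and the sketch assumes purely numeric dotted versions.

```python
def version_tuple(text):
    """Turn '2.2' into (2, 2, 0): integers padded to three components."""
    parts = [int(part) for part in text.split(".")]
    parts += [0] * (3 - len(parts))
    return tuple(parts)


def openpyxl_supported(version):
    """True for 1.6.1 <= version < 2.0.0, or for version >= 2.2."""
    v = version_tuple(version)
    in_old_range = (1, 6, 1) <= v < (2, 0, 0)
    in_new_range = v >= (2, 2, 0)
    return in_old_range or in_new_range


# Versions 2.0.x and 2.1.x fall into the unsupported gap.
for candidate in ["1.6.0", "1.8.6", "2.0.5", "2.2.0"]:
    print(candidate, openpyxl_supported(candidate))
```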

Jinja2: Template engine for conditional HTML formatting.

s3fs: necessary for Amazon S3 access (s3fs >= 0.0.7).

blosc: for msgpack compression using blosc

One of PyQt4, PySide, pygtk, xsel, or xclip: necessary to use read_clipboard(). Most package managers on Linux distributions will have xclip and/or xsel immediately available for installation.

For Google BigQuery I/O, see the pandas-gbq documentation.

Backports.lzma: Only for Python 2, for writing to and/or reading from an xz compressed DataFrame in CSV; Python 3 support is built into the standard library.

One of the following combinations of libraries is needed to use the top-level read_html() function:

BeautifulSoup4 and html5lib (Any recent version of html5lib is okay.)
BeautifulSoup4 and lxml
BeautifulSoup4 and html5lib and lxml
Only lxml, although see HTML Table Parsing for reasons as to why you should probably not take this approach.

If you install BeautifulSoup4 you must install either lxml or html5lib or both. read_html() will not work with only BeautifulSoup4 installed.
You are highly encouraged to read HTML Table Parsing gotchas. It explains issues surrounding the installation and usage of the above three libraries.
You may need to install an older version of BeautifulSoup4: versions 4.2.1, 4.1.3 and 4.0.2 have been confirmed for 64- and 32-bit Ubuntu/Debian.

If you’re on a system with apt-get, you can run sudo apt-get build-dep python-lxml to get the necessary dependencies for installation of lxml. This will prevent further headaches down the line.
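The parser combinations listed above for read_html() can be checked before calling it. This is a minimal sketch under the assumption that the packages are importable as bs4, html5lib and lxml; read_html_ready is a hypothetical helper, not a pandas function.

```python
from importlib import util


def read_html_ready(available):
    """Apply the listed combinations to a set of importable module names."""
    has_bs4 = "bs4" in available
    has_html5lib = "html5lib" in available
    has_lxml = "lxml" in available
    # lxml works alone; BeautifulSoup4 needs html5lib or lxml next to it.
    return has_lxml or (has_bs4 and has_html5lib)


# Probe the current environment without importing the packages themselves.
candidates = ["bs4", "html5lib", "lxml"]
available = {name for name in candidates if util.find_spec(name) is not None}

print("importable parsers:", sorted(available))
print("read_html() requirements met:", read_html_ready(available))
```

Passing the set of available modules as an argument keeps the rule itself separate from the environment probe, so it can be checked against any combination.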

Without the optional dependencies, many useful features will not work, so installing them is highly recommended. A packaged distribution like Anaconda or Enthought Canopy may be worth considering.
