Содержание

Intel® Parallel Studio XE 2015 Professional Edition for Windows, Fortran Compiler
Sample Applications
Identify Performance Hotspots
Leverage Existing Performance Libraries
Optimize matrix operations using the Intel® Math Kernel Library
Leveraging the Intel® Visual Fortran Compiler
Optimize a Pythagorean prime number finder using OpenMP* with the Intel(R) Fortran Compiler
Leverage Parallelism
Identifying Candidates for Parallelization using Intel® Advisor XE 2015
Check Correctness
Checking Correctness using Intel® Inspector XE 2015
Redistributable Libraries for Intel(R) Parallel Studio XE 2015 Composer Edition for C++ and Fortran Linux*
Overview
OS requirement for redistributable packages
Installation instructions
Redistributable Libraries for Intel(R) Parallel Studio XE 2015 for C++ and Fortran Windows*
Intel® Parallel Studio XE 2015 — разговор о новых именах и «фишках»
Компиляторы
VTune Amplifier XE
Inspector и Advisor

Intel® Parallel Studio XE 2015 Professional Edition for Windows, Fortran Compiler

Published: 04/14/2015 Last Updated: 04/14/2015

Product tour with videos and samples

Identify the performance hotspots in your application using the VTune™ Amplifier XE
Leverage performance libraries to speed up the hotspots
Compile and optimize using the Intel® Fortran compiler, and link the optimized binary into your application
Check for issues using Intel® Inspector XE.

Note: Microsoft* Visual Studio is required to build and run the following library and compiler samples

Sample Applications

Some of the sample applications referred to by this article are available in your default installation directory, while others can be downloaded. To download the samples:

Click on download link
Download and unzip the file to a local folder
From the sample folder, double-click to open the version of the sln file that matches your version of Visual Studio*
Build the sample by pressing F7
Run it by pressing CTRL-F5
Click the Compute All button to run the unoptimized and the optimized version(s) on your system and see the results. Make a note of the improved time in the optimized version of the code

Identify Performance Hotspots

As a first step, use the Intel® VTune™ Amplifier to identify the functions, loops, and files that have the biggest impact on your application’s performance.

The following video and sample application demonstrate how to find the hotspots in a 3D rendering application called Tachyon, implement code changes to improve performance, and verify the performance improvements.

The sample used in the video can be found it the default installation location:
C:\Program Files (x86)\Intel\VTune Amplifier XE 2015\samples\en\C++\tachyon_vtune_amp_xe.zip

Leverage Existing Performance Libraries

Pick one strategic loop or function that consumes a significant portion of your application runtime. Explore performance libraries, such as Intel® Threading Building Blocks, Intel® Math Kernel Library, or Intel® Integrated Performance Primitives to discover already tuned algorithms that you can simply drop in and build into your application at link time.

Optimize matrix operations using the Intel® Math Kernel Library

Suppose you discover that a matrix multiplication has been identified as your chief hotspot. See how to use Intel® Math Kernel Library (MKL) to improve performance.

Leveraging the Intel® Visual Fortran Compiler

Optimize a Pythagorean prime number finder using OpenMP* with the Intel(R) Fortran Compiler

Learn how to parallelize a Pythagorean prime number finder using Intel® Visual Fortran with OpenMP*..

Leverage Parallelism

Modern architectures provide ample CPU cores to compute with. Before your implement parallelism use Intel® Advisor XE to get guidance and play “what if” scenarios to see where you should focus the parallelism design effort.

Identifying Candidates for Parallelization using Intel® Advisor XE 2015

Learn how to get guidance for where to start parallelizing your code using Intel® Advisor XE to survey, annotate, check suitability, correctness and scalability of your code for parallel execution.

This sample can be found in the default installation location:
C:\Program Files (x86)\Intel\Advisor XE 2015\samples\en\C++\tachyon_Advisor.zip

Check Correctness

At some point in your optimization process you will want to check that your application is computing correctly, avoiding memory and resource leaks, avoiding threading deadlocks and race conditions. Use Intel® Inspector XE to check your application for such issues.

Checking Correctness using Intel® Inspector XE 2015

Explore how to check for memory and resource issues, as well as how to do thread checking of your application. Use Intel® Inspector XE to check for memory leaks and thread correctness issues, such as race conditions and deadlocks.

Redistributable Libraries for Intel(R) Parallel Studio XE 2015 Composer Edition for C++ and Fortran Linux*

Published: 08/27/2014 Last Updated: 09/02/2016

Overview

This article contains links to the redistributable installation packages for Intel(R) Parallel Studio XE 2015 Composer Edition for C++ Linux* and Intel(R) Parallel Studio XE 2015 Composer Edition for Fortran Linux*.

If you are looking for other versions, please go to Redistributable Libraries by Version.

The redistributable packages are for the end users who use applications that are built with Intel Compilers. Please note that there is one redistributable package for every compiler update. Make sure you download and install the one recommended by the application vendor.

OS requirement for redistributable packages

Please read the Release Notes of the update for supported OS distributions:

Installation instructions

First of all, use following command to untar the .tgz file:
$tar -xzvf l_ccompxe_2015.0.090_redist.tgz

To start the installation, run following shell command:
$. ./l_ccompxe_2015.0.090_redist/install.sh
The installation shell program (install.sh) of the redistributable package will guide you through the installation. You will need to accept the EULA and the installation will install all the libraries to the following directory. But you can change the installation directory.

For the redistributable package, the default installation directory is
$HOME/intel/

Redistributable Libraries for Intel(R) Parallel Studio XE 2015 for C++ and Fortran Windows*

Published: 11/15/2014 Last Updated: 09/02/2016

This article contains links to the redistributable library packages for the Intel® C++ and Visual Fortran Compilers bundled in the following products:

Intel® Parallel Studio XE 2015 Composer Edition for C++ Windows*
Intel® Parallel Studio XE 2015 Professional Edition for C++ Windows*
Intel® Parallel Studio XE 2015 Cluster Edition for C++ Windows*
Intel® Parallel Studio XE 2015 Composer Edition for Fortran Windows*
Intel® Parallel Studio XE 2015 Professional Edition for Fortran Windows*
Intel® Parallel Studio XE 2015 Cluster Edition for Fortran Windows*

The redistributable library packages (.msi) are for the end users who use applications that are built with Intel Compilers. Please note that there is one redistributable package for every compiler update. Make sure you download and install the one recommended by the application vendor. If you are creating your own installation you can use the Windows Installer merge module (.msm) files that are found in the Redist subfolder of the Intel® C++ or Fortran Composer XE installation.

OS and prerequisite library requirements for redistributable packages for Intel® Parallel Studio XE 2015 for C++ Windows* and Intel(R) Parallel Studio XE 2015 for Fortran Windows*. Please see

The supported Operating Systems:

Windows 7*, Windows 8* & 8.1*, Windows Server 2008*, Windows HPC Server 2008*, or Windows Server 2012*
Note: in addition to a supported OS and Intel redistributable libraries, the target system must have either Microsoft Visual Studio with Visual C++ libraries installed OR the Microsoft Visual C++ redistributable library kit installed. In addition, the target system must have the same version of Microsoft Visual C++ libraries installed as the version of Visual C++ used on the development platform when the Intel compiled application was built. Visit the Microsoft website or search the web for «Microsoft Visual C++ Redistributable Package» and find the appropriate version to match your development system.
The Microsoft Visual C++ redistributable libraries OR Visual Studio with the C++ tools (libraries) may be installed before or after the Intel compiler redistributable libraries.

Installation instructions

The installation program of the redistributable package will guide you through the installation. You will need to accept the EULA and the installation will install all the libraries to the fixed directory: [Common Files]\Intel\Shared Libraries\

The installation creates a new env-var «INTEL_DEV_REDIST» with the value of above installation directory, and the PATH env-var is updated with [INTEL_DEV_REDIST]\redist\[ia32|intel64]\compiler and [INTEL_DEV_REDIST]\redist\[ia32|intel64]\mpirt (for Fortran packages). The «redist\intel64» directory is added only on 64-bit systems. See below for more information on PATH changes.

Additionally on 64-bit systems there is another subfolder [INTEL_DEV_REDIST]\compiler\lib\mic with redistributable libraries for Intel® Many Integrated Core Architecture(Intel MIC) architecture. And an environment variable MIC_LD_LIBRARY_PATH is set to this location.

If you wish to install the redistributable package «silently», so that no output is presented to the user, run the executable with the following options added to the command line like:

System PATH Environment Variable Changes

Installation of the redistributable libraries, in either MSI or MSM form, adds folders containing the installed DLLs to the system PATH environment variable. Microsoft Windows has a limit on the total size of the value of PATH; in versions later than Windows 7 the limit is 4095 characters. This limit applies not only to the system-wide definition, but the length as modified by any batch files or scripts run. If the length is exceeded, the value of PATH can be truncated and this can cause WIndows or some applications to operate improperly.

If you are concerned that PATH may get truncated, you can prevent the redistributable installer from modifying PATH, but then it is your responsibility to make sure that the proper folders are named in PATH when programs built using the Intel compilers are executed.

If you are using the MSI installer, use the command line and add the parameter NO_UPDATE_PATH=yes.For example:
msiexec /i w_ccompxe_redist_ia32_2013_sp1.0.103.msi NO_UPDATE_PATH=yes
If you are using the MSM merge module, set the update property NO_UPDATE_PATH=yes in the installer properties.

Testing your Installation:

After installation of the Intel redistributable libraries AND the prerequisite Microsoft Visual C++ redistributables or Visual Studio with C++ tools and libraries, try to run your Intel-compiled binary. If there are any issues, please try to determine the missing DLLs or libraries using a tool such as Dependency Walker. If you are still having issues, please consult the Intel User Forums:

The redistributable library packages

Intel® Parallel Studio XE 2015 — разговор о новых именах и «фишках»

Отныне все пакеты средств предлагаются под именем Intel Parallel Studio XE 2015, но в разных версиях. Выходит, что теперь для того, чтобы пользоваться компилятором и библиотеками, вам нужен Intel Parallel Studio XE 2015 Composer Edition. И жизнь у тысяч разработчиков сразу стала проще, а все баги ушли, испугавшись такого грозного названия. Ладно, давайте дальше без иронии. Если же к подобному базовому комплекту добавить широко известные и применяемые VTune + Inspector + Advisor, получим версию Pro. Накинув сверху Intel MPI и ITAC, мы получаем набор по имени Cluster Edition. Предвосхищаю путаницу с подобными именами, но, тем не менее, думаю, что народ постепенно к такому переименованию привыкнет, ведь логика присутствует.

Естественно, что это только маркетинговые новшества. Давайте посмотрим, что мы получаем с приходом новой версии в техническом плане, а здесь действительно есть что попробовать.

Компиляторы

Естественно, компилятор, как обычно, в новой версии стал ещё быстрее и производительнее прежнего. Частично, я уже описывал некоторые новые «плюшки» здесь, в основном фокусируясь на новых компиляторных отчетах.

Здесь же я приведу больше конкретики о том, что появилось. Скажем, в данной версии компилятора «отныне и во веки веков» полностью поддерживаются стандарты C++11 и Fortran 2003.
В первом добавлены новые строковые литералы, явное замещение виртуальных функций, thread_local и улучшение конструкторов объектов.
В Фортрановском компиляторе добавили поддержку параметризованных производных типов, закрыв тем самым «брешь» в поддержке стандарта Fortran 2003. Интересная кстати возможность – некий отдаленный аналог шаблонов из С++, дающий нам право контролировать размер данных во время компиляции и выполнения программы:

В этом примере, d можно задавать в рантайме и контролировать размер двумерного массива element. Кстати, k должен быть известен во время компиляции для того, чтобы задать длину целого типа в данном примере.

Кроме того, расширилась поддержка стандартов Fortran 2008 и OpenMP 4.0. В том же горячо мной любимом Фортране появилась конструкция BLOCK, весьма полезная при работе с DO CONCURRENT. Для OpenMP добавили поддержку директив CANCEL, CANCELLATION POINT и DEPEND для задач. Вообще, на OpenMP обратили самое пристальное внимание не только в новой версии компилятора, но и активно поработали над его расширенной поддержкой в профилировщике. Но об этом чуть позже.

Кроме такой богатой добавки в стандартах, были существенно переработаны отчеты об оптимизации, и в частности, о работе векторизатора. В упомянутом мной блоге я рассказывал про это подробнее. Здесь же хочу добавить, что интеграция с Visual Studio действительно работает, и теперь отчеты радуют глаз своей наглядностью:

Ну и в завершении темы компилятора – добавлена возможность offload’а вычислений на интегрированную графику – Intel Graphics Technology, реализованная с помощью директив. Тема объёмная, поэтому не буду распыляться в данном посте, а зарезервирую за собой право написать о ней отдельную историю.

VTune Amplifier XE

В новой версии студии изменился в лучшую сторону и VTune. Теперь у нас ещё больше возможностей для профилировки как на CPU, так и GPU. Например, раз уж мы теперь можем делать оффлоад на GPU, то и соответствующий анализ имеется, правда, пока только на Windows. Там же расширили и поддержку для OpenCL. Кроме того, появилась функция анализа TSX транзакций. Кстати, очень хороший обзор транзакционной памяти в процессоре Haswell представлен здесь.

Но это все полезные приятности для весьма ограниченного числа разработчиков. А вот то, что точно смогут оценить почти все, кто сталкивается с параллелизмом – это анализ масштабируемости OpenMP. Теперь по окончании профилировки, на странице Summary будет выдаваться отдельный пункт для OpenMP при условии, что приложение собиралось с соответствующим ключиком (и Intel’овским компилятором) и в нем есть параллельные области.

Мы наглядно можем увидеть, какое количество времени наше приложение работало последовательно, и, соответственно, оценить, как оно в дальнейшем будет масштабироваться, принимая во внимание закон Амдаля. Более того, есть аппроксимация идеального времени выполнения параллельной части, которая вычисляется без учёта оверхеда и с идеальным балансом загрузки. Значит, мы можем теперь заранее оценить, стоит ли вкладывать усилия в тот или иной кусок кода или алгоритм. Как это принято, есть возможность сразу перейти на интересующий нас код. Удобная штука, которая, я уверен, будет полезной для многих разработчиков.

Что ещё? Возможность использовать внешние коллекторы. Допустим, мы написали скрипт, который собирает различные события, происходящие в ОС, и хотим, чтобы они были собраны во время профилировки нашего приложения. Теперь есть и такая возможность, причём результат будет сгруппирован и показан на одной временной шкале.

Для любителей поработать на «яблоках», создан графический интерфейс для OS X. Профилировки там нет, зато можно просмотреть результаты, собранные на Linux или Windows, ну или удаленно собрать профили.

Inspector и Advisor

Напоследок оставил средства, которые могут значительно упростить жизнь любому разработчику, эдакий бонус в пакете Parallel Studio XE.

В Инспекторе значительно переработан механизм нахождения ошибок, связанных с общими данными. По заверениям разработчиков и основываясь на сравнении версии XE 2013 Update 3 и «свежевыпущенной», ускорение работы достигает 20 раз, что не может не радовать. Кроме того, уменьшилось и количество используемой при этом памяти.

Интересная возможность – это новый граф, который показывает использование памяти в реальном времени. Запустили анализ работы с памятью, и смотрим на него, осознавая при этом, как активно тратится память. Выглядит он следующим образом:

По завершении анализа, вы сможете также найти возможные ошибки, которые привели к значительному увеличению используемой памяти, перейдя на нужный кусок кода и пробежавшись по стеку:

Что нового появилось в Advisor’е? Это конечно возможность моделирования для Xeon Phi!
Напомню, что это средство без реализации какой либо параллельной модели, или проще, ничего не переписывая в коде, позволяет оценить возможный прирост производительности в разных частях нашего приложения. Причем имеется возможность отпрофилировать его и узнать, на какие места обратить пристальное внимание. Скажем, что будет если мы распараллелим в этом цикле и запустим вычисление? Во сколько раз ускоримся? Все что нужно, это вставить в интересующий нас участок кода аннотации и запустить средство.
Так вот, теперь можно заранее оценить, насколько наш алгоритм хорош для запуска на Xeon Phi:

На этих изображениях видно, что в одном случае алгоритм хорошо масштабируется и целесообразность использования Xeon Phi высока, а вот другой, напротив, начиная с 16 потоков не масштабируется.
Кроме того, появилась возможность спрогнозировать поведение при изменении количества итераций и их продолжительности/сложности:

В общем, появилось множество разных «вкусняшек», которые точно пригодятся при разработке высокооптимизированного параллельного и последовательного кода. Всем их можно и нужно пробовать здесь совершенно безвозмездно, то есть даром, как обычно, 30 дней.

Intel parallel studio xe 2015 windows