NVIDIA CUDA Linux Download

Contents

CUDA Toolkit 3.2 Downloads
New and Improved CUDA Libraries
CUDA Driver & CUDA C Runtime
Development Tools
Miscellaneous
New GPU Computing SDK Code Samples
CUDA 7.0 Downloads
CUDA Toolkit 2.3 Downloads
CUDA Toolkit 2.3 (June 2009)
CUDA Toolkit 3.1 Downloads
CUDA Toolkit 3.1
Windows XP, Windows VISTA, Windows 7
Linux

CUDA Toolkit 3.2 Downloads

Individual code samples from the SDK are also available.

New and Improved CUDA Libraries

CUBLAS performance improved 50% to 300% on Fermi architecture GPUs, for matrix multiplication of all datatypes and transpose variations
CUFFT performance tuned for radix-3, -5, and -7 transform sizes on Fermi architecture GPUs, now 2x to 10x faster than MKL
New CUSPARSE library of GPU-accelerated sparse matrix routines for sparse/sparse and dense/sparse operations delivers 5x to 30x faster performance than MKL
New CURAND library of GPU-accelerated random number generation (RNG) routines, supporting Sobol quasi-random and XORWOW pseudo-random routines at 10x to 20x faster than similar routines in MKL
H.264 encode/decode libraries now included in the CUDA Toolkit
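
For readers unfamiliar with the XORWOW name in the CURAND item above, here is a minimal CPU-side Python sketch of Marsaglia's xorwow recurrence (a xorshift generator whose output is offset by a Weyl counter). It is an illustration only: the class name `XorWow`, the method names, and the seeding scheme are assumptions made for this sketch. It does not reproduce CURAND's seeding or output stream, and it says nothing about GPU performance.

```python
MASK32 = 0xFFFFFFFF  # keep every value inside 32 bits, as unsigned C arithmetic would


class XorWow:
    """Hypothetical CPU-side xorwow generator, for illustration only."""

    def __init__(self, seed=1):
        # Assumed seeding for this sketch: fill the five state words with a
        # simple linear congruential step. CURAND seeds its state differently.
        s = seed & MASK32
        state = []
        for _ in range(5):
            s = (1664525 * s + 1013904223) & MASK32
            state.append(s)
        if not any(state):
            state[0] = 1  # an all-zero xorshift state would stay stuck at zero
        self.x, self.y, self.z, self.w, self.v = state
        self.d = 0  # Weyl counter; starting at 0 is an assumption of this sketch

    def next_u32(self):
        """Return the next 32-bit unsigned integer."""
        t = self.x ^ (self.x >> 2)
        # Shift the state words down by one position.
        self.x, self.y, self.z, self.w = self.y, self.z, self.w, self.v
        self.v = (self.v ^ ((self.v << 4) & MASK32)) ^ (t ^ ((t << 1) & MASK32))
        self.d = (self.d + 362437) & MASK32
        return (self.d + self.v) & MASK32

    def next_float(self):
        """Return a float in [0, 1) built from one 32-bit output."""
        return self.next_u32() / 2.0**32
```

Two generators created with the same seed produce the same sequence, which is the property that makes pseudo-random streams reproducible across runs.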

CUDA Driver & CUDA C Runtime

Support for new 6GB Quadro and Tesla products
New support for enabling high performance Tesla Compute Cluster (TCC) mode on Tesla GPUs in Windows desktop workstations

Development Tools

Multi-GPU debugging support for both cuda-gdb and Parallel Nsight
Expanded cuda-memcheck support for all Fermi architecture GPUs
NVCC support for Intel C Compiler (ICC) v11.1 on 64-bit Linux distros
Support for debugging GPUs with more than 4GB device memory

Miscellaneous

Support for memory management using malloc() and free() in CUDA C compute kernels
New NVIDIA System Management Interface (nvidia-smi) support for reporting % GPU busy, and several GPU performance counters

New GPU Computing SDK Code Samples

Several code samples demonstrating how to use the new CURAND library, including MonteCarloCURAND, EstimatePiInlineP, EstimatePiInlineQ, EstimatePiP, EstimatePiQ, SingleAsianOptionP, and randomFog
Conjugate Gradient Solver, demonstrating the use of CUBLAS and CUSPARSE in the same application
Function Pointers, a sample that shows how to use function pointers to implement the Sobel Edge Detection filter for 8-bit monochrome images
Interval Computing, demonstrating the use of interval arithmetic operators using C++ templates and recursion
Simple Printf, demonstrating best practices for using both printf and cuprintf in compute kernels
Bilateral Filter, an edge-preserving non-linear smoothing filter for image recovery and denoising implemented in CUDA C with OpenGL rendering
SLI with Direct3D Texture, a simple example demonstrating the use of SLI and Direct3D interoperability with CUDA C
cudaEncode, showing how to use the NVIDIA H.264 Encoding Library using YUV frames as input
Vflocking Direct3D/CUDA, which simulates and visualizes the flocking behavior of birds in flight
simpleSurfaceWrite, demonstrating how CUDA kernels can write to 2D surfaces on Fermi GPUs
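
As a rough idea of what the EstimatePi samples compute, the following CPU-side Python sketch estimates pi by Monte Carlo sampling: draw points uniformly in the unit square and count the fraction that falls inside the quarter circle of radius 1. The function name `estimate_pi` and its parameters are hypothetical, and Python's `random` module stands in for CURAND here; this is not the SDK sample code.

```python
import random


def estimate_pi(num_samples, seed=0):
    """Estimate pi from the fraction of random points inside the quarter circle."""
    rng = random.Random(seed)  # seeded so that repeated runs give the same estimate
    inside = 0
    for _ in range(num_samples):
        x = rng.random()
        y = rng.random()
        if x * x + y * y <= 1.0:
            inside += 1
    # The quarter circle has area pi/4 and the unit square has area 1.
    return 4.0 * inside / num_samples
```

The statistical error shrinks roughly with the square root of the sample count, which is why this kind of workload benefits from generating very large numbers of random values quickly.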

Windows developers should be sure to check out the new debugging and profiling features in Parallel Nsight v1.5 for Visual Studio at www.nvidia.com/ParallelNsight.

Please refer to the Release Notes and Getting Started Guides for more information.

In CUDA Toolkit 3.2 and the accompanying release of the CUDA driver, some important changes have been made to the CUDA Driver API to support large memory access for device code and to enable further system calls such as malloc and free. Please refer to the CUDA Toolkit 3.2 Readiness Tech Brief for a summary of these changes.

Note: The developer driver packages below provide baseline support for the widest range of NVIDIA products in the smallest number of installers. More recent production driver packages for developers and end users may be available at www.nvidia.com/drivers.

For additional tools and solutions for Windows, Linux, and Mac OS, such as CUDA Fortran, CULA, and CUDA-GDB, please visit our Tools and Ecosystem page.


CUDA 7.0 Downloads

Please note: There is a recommended patch for CUDA 7.0 that resolves an issue in the cuFFT library. The issue can lead to incorrect results for certain input sizes less than or equal to 1920 in any dimension when cufftSetStream() is passed a non-blocking stream (e.g., one created using the cudaStreamNonBlocking flag of the CUDA Runtime API or the CU_STREAM_NON_BLOCKING flag of the CUDA Driver API).

Version	Network Installer	Local Installer
Windows 8.1, Windows 7, Win Server 2012 R2, Win Server 2008 R2	EXE (8.0MB)	EXE (939MB)
cuFFT Patch	ZIP (52MB), README
Windows Getting Started Guide

Q: Where is the notebook installer?
A: Previous releases of the CUDA Toolkit had separate installation packages for notebook and desktop systems. Beginning with CUDA 7.0, these packages have been merged into a single package that is capable of installing on all supported platforms.

Q: What is the difference between the Network Installer and the Local Installer?
A: The Local Installer has all of the components embedded into it (toolkit, driver, samples). This makes the installer very large, but once downloaded, it can be installed without an internet connection. The Network Installer is a small executable that will only download the necessary components dynamically during the installation so an internet connection is required.

Q: Where do I get the GPU Deployment Kit (GDK) for Windows?
A: The installers give you an option to install the GDK. If you only want to install the GDK, then you should use the network installer, for efficiency.

Q: Where can I find old versions of the CUDA Toolkit?
A: Older versions of the toolkit can be found on the Legacy CUDA Toolkits page.

Q: Is cuDNN included as part of the CUDA Toolkit?
A: cuDNN is our library for Deep Learning frameworks, and can be downloaded separately from the cuDNN home page.

Version	Network Installer	Local Package Installer	Runfile Installer
Fedora 21	RPM (3KB)	RPM (1GB)	RUN (1.1GB)
OpenSUSE 13.2	RPM (3KB)	RPM (1GB)	RUN (1.1GB)
OpenSUSE 13.1	RPM (3KB)	RPM (1GB)	RUN (1.1GB)
RHEL 7 CentOS 7	RPM (10KB)	RPM (1GB)	RUN (1.1GB)
RHEL 6 CentOS 6	RPM (18KB)	RPM (1GB)	RUN (1.1GB)
SLES 12	RPM (3KB)	RPM (1.1GB)	RUN (1.1GB)
SLES 11 (SP3)	RPM (3KB)	RPM (1.1GB)	RUN (1.1GB)
SteamOS 1.0-beta	RUN (1.1GB)
Ubuntu 14.10	DEB (3KB)	DEB (1.5GB)	RUN (1.1GB)
Ubuntu 14.04 *	DEB (10KB)	DEB (902MB)	RUN (1.1GB)
Ubuntu 12.04	DEB (3KB)	DEB (1.3GB)	RUN (1.1GB)
GPU Deployment Kit	Included in Installer	Included in Installer	RUN (4MB)
cuFFT Patch	TAR (122MB), README
Linux Getting Started Guide

* Includes POWER8 cross-compilation tools.

Q: Where can I find the CUDA 7 Toolkit for my Jetson TK1?
A: Jetson TK1 is not supported by the CUDA 7 Toolkit. Please download the CUDA 6.5 Toolkit for Jetson TK1 instead.

Q: What is the difference between the Network Installer and the Local Installer?
A: The Local Installer has all of the components embedded into it (toolkit, driver, samples). This makes the installer very large, but once downloaded, it can be installed without an internet connection. The Network Installer is a small executable that will only download the necessary components dynamically during the installation, so an internet connection is required to use this installer.

Q: Is cuDNN included as part of the CUDA Toolkit?
A: cuDNN is our library for Deep Learning frameworks, and can be downloaded separately from the cuDNN home page.

Version	Network Installer	Local Package Installer	Runfile Installer
Ubuntu 14.10	DEB (3KB)	DEB (588MB)
Ubuntu 14.04	DEB (3KB)	DEB (588MB)
GPU Deployment Kit	n/a	n/a	RUN (1.7MB)
cuFFT Patch	TAR (105MB), README
Linux Getting Started Guide

Q: What is the difference between the Network Installer and the Local Installer?
A: The Local Installer has all of the components embedded into it (toolkit, driver, samples). This makes the installer very large, but once downloaded, it can be installed without an internet connection. The Network Installer is a small executable that will only download the necessary components dynamically during the installation, so an internet connection is required to use this installer.

Q: Is cuSOLVER available for the POWER8 architecture?
A: The initial release of the CUDA 7.0 toolkit omitted the cuSOLVER library from the installer. On May 29, 2015, new CUDA 7.0 installers that include the cuSOLVER library were posted for the POWER8 architecture. If you downloaded the CUDA 7.0 toolkit for POWER8 on or before this date and you need to use cuSOLVER, you will need to download the latest installer and re-install.

Version	Network Installer	Local Installer
10.9 10.10	DMG (0.4MB)	PKG (977MB)
cuFFT Patch	TAR (104MB), README
Mac Getting Started Guide

Q: What is the difference between the Network Installer and the Local Installer?
A: The Local Installer has all of the components embedded into it (toolkit, driver, samples). This makes the installer very large, but once downloaded, it can be installed without an internet connection. The Network Installer is a small executable that will only download the necessary components dynamically during the installation, so an internet connection is required to use this installer.

Q: Is cuDNN included as part of the CUDA Toolkit?
A: cuDNN is our library for Deep Learning frameworks, and can be downloaded separately from the cuDNN home page.

Q: What do I do if the Network Installer fails to run with the error message "The package is damaged and can't be opened. You should eject the disk image"?
A: Check that your security preferences are set to allow apps downloaded from anywhere to run. This setting can be found under: System Preferences > Security & Privacy > General


CUDA Toolkit 2.3 Downloads

CUDA Toolkit 2.3 (June 2009)

The CUFFT Library now supports double-precision transforms and includes significant performance improvements for single-precision transforms as well. See the CUDA Toolkit release notes for details.
The cuda-gdb hardware debugger and CUDA Visual Profiler are now included in the CUDA Toolkit installer, and the CUDA-GDB debugger is now available for all supported Linux distros.
Each GPU in an SLI group is now enumerated individually, so compute applications can now take advantage of multi-GPU performance even when SLI is enabled for graphics.
The 64-bit versions of the CUDA Toolkit now support compiling 32-bit applications. Please note that the installation location of the libraries has changed, so developers on 64-bit Linux must update their LD_LIBRARY_PATH to contain either /usr/local/cuda/lib or /usr/local/cuda/lib64.
New support for fp16/fp32 conversion intrinsics allows storage of data in fp16 format with computation in fp32. Use of fp16 format is ideal for applications that require higher numerical range than 16-bit integer but less precision than fp32 and reduces memory space and bandwidth consumption.
The Visual Profiler includes several enhancements:
- All memory transfer API calls are now reported
- Support for profiling multiple contexts per GPU
- Synchronized clocks for requested start time on the CPU and start/end times on the GPU for all kernel launches and memory transfers
- Global memory load and store efficiency metrics for GPUs with compute capability 1.2 and higher
The CUDA Driver for MacOS now has its own installer, and is available separately from the CUDA Toolkit.
Support for major Linux distros, MacOS X, and Windows:
- MacOS X 10.5.6 and later (32-bit)
- Windows XP/Vista/7 with Visual Studio 8 (VC2005 SP1) and 9 (VC2008)
- Fedora 10, RHEL 4.7 & 5.3, SLED 10.2 & 11.0, OpenSUSE 11.1, and Ubuntu 8.10 & 9.04
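
The fp16 storage idea in the list above can be tried on the CPU with a short Python sketch. It uses the standard `struct` module's `e` format (IEEE 754 half precision) to store values in 2 bytes each and then unpacks them into wider floats for arithmetic. The helper names `pack_fp16` and `unpack_fp16` and the sample values are hypothetical; Python floats are 64-bit, so this only mimics the "store narrow, compute wide" pattern and is not the CUDA intrinsics themselves.

```python
import struct


def pack_fp16(values):
    """Store each value as a 2-byte IEEE 754 half-precision float."""
    return struct.pack("<%de" % len(values), *values)


def unpack_fp16(buf):
    """Read 2-byte half-precision floats back into ordinary Python floats."""
    count = len(buf) // struct.calcsize("<e")
    return list(struct.unpack("<%de" % count, buf))


# 40000.0 is above the 16-bit signed integer maximum (32767) but fits in fp16.
# 0.1 shows the cost: it comes back slightly off because fp16 keeps fewer bits.
samples = [0.1, 1.5, 40000.0]
stored = pack_fp16(samples)      # 2 bytes per value instead of 4 for fp32
restored = unpack_fp16(stored)   # widened again before doing any arithmetic
total = sum(restored)            # the computation runs in the wider format
```

The trade-off is the one the release note describes: half the storage and bandwidth of fp32 and more range than a 16-bit integer, in exchange for reduced precision in each stored value.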

New CUDA SDK code samples:

A new pitchLinearTexture code sample that shows how to efficiently texture from pitch linear memory.
A new PTXJIT code sample illustrating how to use cuModuleLoadDataEx() to load PTX source from memory instead of loading a file.
Two new code samples for Windows, showing how to use the NVCUVID library to decode MPEG-2, VC-1, and H.264 content and pass frames to OpenGL or Direct3D for display.
Updated code samples showing how to properly align CUDA kernel function parameters so the same code works on both x32 and x64 systems.
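
The parameter alignment point in the last item can be illustrated with a small CPU-side Python sketch. When kernel arguments are packed into a buffer one after another, each argument's offset has to be rounded up to that argument's alignment; because a pointer-sized argument differs between 32-bit and 64-bit builds, hard-coded offsets that work on one platform can be wrong on the other. The function names, the example parameter list, and the assumption that a pointer-sized parameter occupies and aligns to 4 bytes on a 32-bit build and 8 bytes on a 64-bit build are all hypothetical and chosen for illustration; this is not the SDK's code.

```python
def align_up(offset, alignment):
    """Round offset up to the next multiple of alignment (a power of two)."""
    return (offset + alignment - 1) & ~(alignment - 1)


def param_offsets(params):
    """Compute the byte offset of each (name, size, alignment) parameter.

    Returns a dict of offsets and the total size of the packed buffer.
    """
    offsets = {}
    offset = 0
    for name, size, alignment in params:
        offset = align_up(offset, alignment)  # pad so the parameter is aligned
        offsets[name] = offset
        offset += size
    return offsets, offset


def example_params(pointer_size):
    """Hypothetical kernel signature: (char flag, T* data, int n, double scale)."""
    return [
        ("flag", 1, 1),
        ("data", pointer_size, pointer_size),
        ("n", 4, 4),
        ("scale", 8, 8),
    ]


offsets32, total32 = param_offsets(example_params(4))
offsets64, total64 = param_offsets(example_params(8))
```

Computing offsets this way, rather than writing them as constants, is what lets the same launch code work on both kinds of host system.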

All Toolkit and Library Documentation included with the Toolkit and SDK Installers


CUDA Toolkit 3.1 Downloads

CUDA Toolkit 3.1

For the latest releases, see the CUDA Toolkit and GPU Computing SDK home page.

GPUDirect(tm) gives 3rd party devices direct access to CUDA Memory
Support for 16-way concurrency allows up to 16 different kernels to run at the same time on Fermi architecture GPUs
Runtime / Driver interoperability enables applications to mix and match use of the CUDA Driver API with the CUDA C Runtime and math libraries via buffer sharing and context migration
New language features added to CUDA C / C++ include:
- Support for printf() in device code
- Support for function pointers and recursion makes it easier to port many existing algorithms to Fermi GPUs
Unified Visual Profiler now supports both CUDA C/C++ and OpenCL, and now includes support for CUDA Driver API tracing
Math Libraries Performance Improvements, including:
- Improved performance of selected transcendental functions from the log, pow, erf, and gamma families
- Significant improvements in double-precision FFT performance on Fermi-architecture GPUs for 2^n transform sizes
- Streaming API now supported in CUBLAS for overlapping copy and compute operations
- CUFFT Real-to-complex (R2C) and complex-to-real (C2R) optimizations for 2^n data sizes
- Improved performance for GEMV and SYMV subroutines in CUBLAS
- Optimized double-precision implementations of divide and reciprocal routines for the Fermi architecture
New and updated SDK code samples demonstrating how to use:
- Function pointers in CUDA C/C++ kernels
- OpenCL / Direct3D buffer sharing
- Hidden Markov Model in OpenCL
- Microsoft Excel GPGPU example showing how to run an Excel function on the GPU

For additional tools and solutions for Windows, Linux, and Mac OS, such as CUDA Fortran, CULA, and CUDA-GDB, please visit our Tools and Ecosystem page.

Windows XP, Windows VISTA, Windows 7

C/C++ compiler
CUDA Visual Profiler
OpenCL Visual Profiler
GPU-accelerated BLAS library
GPU-accelerated FFT library
Additional tools and documentation

*New* Updated versions of the CUDA C Programming Guide (Version 3.1.1) and the Fermi Tuning Guide (Version 1.2) are available via the links to the right.

Description of Download	Link to Binaries	Documents
C2050 Support Drivers	download
Developer Drivers for WinXP (257.21)	32-bit 64-bit
Developer Drivers for WinVista and Win7 (257.21)	32-bit 64-bit
Notebook Developer Drivers for WinXP (257.21)	32-bit 64-bit
Notebook Developer Drivers for WinVista and Win7 (257.21)	32-bit 64-bit
32-bit 64-bit	Getting Started Guide Windows Release Notes Updated CUDA C Programming Guide CUDA C Best Practices Guide OpenCL Programming Guide OpenCL Best Practices Guide OpenCL Implementation Notes CUDA Reference Manual API Reference PTX ISA 2.1 Visual Profiler User Guide Visual Profiler Release Notes Fermi Compatibility Guide * Updated * Fermi Tuning Guide CUBLAS User Guide CUFFT User Guide CUDA Developer Guide for Optimus Platforms License
NVIDIA Performance Primitives (NPP) library	32-bit 64-bit	NPP Release Notes NPP License
GPU Computing SDK code samples	32-bit 64-bit	OpenCL Release Notes CUDA C/C++ Release Notes DirectCompute Release Notes CUDA Occupancy Calculator License
NVIDIA OpenCL Extensions	Compiler_Options D3D9 Sharing D3D10 Sharing D3D11 Sharing Device Attribute Query Pragma Unroll

Linux

C/C++ compiler
cuda-gdb debugger
CUDA Visual Profiler
OpenCL Visual Profiler
GPU-accelerated BLAS library
GPU-accelerated FFT library
Additional tools and documentation

*New* Updated versions of the CUDA C Programming Guide (Version 3.1.1) and the Fermi Tuning Guide (Version 1.2) are available via the links to the right.


Description of Download	Link to Binaries	Documents
Developer Drivers for Linux (256.40)	32-bit 64-bit	README_Linux.txt