NVIDIA CUDA Linux Download

CUDA Toolkit 3.2 Downloads

Individual code samples from the SDK are also available.

New and Improved CUDA Libraries

  • CUBLAS performance improved 50% to 300% on Fermi architecture GPUs, for matrix multiplication of all datatypes and transpose variations
  • CUFFT performance tuned for radix-3, -5, and -7 transform sizes on Fermi architecture GPUs, now 2x to 10x faster than MKL
  • New CUSPARSE library of GPU-accelerated sparse matrix routines for sparse/sparse and dense/sparse operations delivers 5x to 30x faster performance than MKL
  • New CURAND library of GPU-accelerated random number generation (RNG) routines, supporting Sobol quasi-random and XORWOW pseudo-random routines at 10x to 20x faster than similar routines in MKL
  • H.264 encode/decode libraries now included in the CUDA Toolkit
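
The CUFFT item above mentions tuning for radix-3, -5, and -7 transform sizes. As a rough illustration only (this is not part of the toolkit), the sketch below checks whether a transform length factors entirely into 2, 3, 5, and 7, on the assumption that such lengths are the ones most likely to benefit from the tuned code paths. Both helper names are hypothetical.

```python
def has_small_radix_factors(n, radices=(2, 3, 5, 7)):
    """Return True if n is a positive integer whose only prime factors are in radices."""
    if n < 1:
        return False
    for r in radices:
        # Divide out every occurrence of this radix.
        while n % r == 0:
            n //= r
    # Anything left over is a prime factor outside the radix set.
    return n == 1


def next_small_radix_size(n):
    """Smallest length >= n that factors entirely into 2, 3, 5, and 7."""
    candidate = max(n, 1)
    while not has_small_radix_factors(candidate):
        candidate += 1
    return candidate
```

A caller that is free to pad its data could use `next_small_radix_size()` to choose a transform length; whether padding pays off in practice depends on the GPU and the transform, so it is worth measuring.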

CUDA Driver & CUDA C Runtime

  • Support for new 6GB Quadro and Tesla products
  • New support for enabling high performance Tesla Compute Cluster (TCC) mode on Tesla GPUs in Windows desktop workstations

Development Tools

  • Multi-GPU debugging support for both cuda-gdb and Parallel Nsight
  • Expanded cuda-memcheck support for all Fermi architecture GPUs
  • NVCC support for Intel C Compiler (ICC) v11.1 on 64-bit Linux distros
  • Support for debugging GPUs with more than 4GB device memory

Miscellaneous

  • Support for memory management using malloc() and free() in CUDA C compute kernels
  • New NVIDIA System Management Interface (nvidia-smi) support for reporting % GPU busy, and several GPU performance counters

New GPU Computing SDK Code Samples

  • Several code samples demonstrating how to use the new CURAND library, including MonteCarloCURAND, EstimatePiInlineP, EstimatePiInlineQ, EstimatePiP, EstimatePiQ, SingleAsianOptionP, and randomFog
  • Conjugate Gradient Solver, demonstrating the use of CUBLAS and CUSPARSE in the same application
  • Function Pointers, a sample that shows how to use function pointers to implement the Sobel Edge Detection filter for 8-bit monochrome images
  • Interval Computing, demonstrating the use of interval arithmetic operators using C++ templates and recursion
  • Simple Printf, demonstrating best practices for using both printf and cuprintf in compute kernels
  • Bilateral Filter, an edge-preserving non-linear smoothing filter for image recovery and denoising implemented in CUDA C with OpenGL rendering
  • SLI with Direct3D Texture, a simple example demonstrating the use of SLI and Direct3D interoperability with CUDA C
  • cudaEncode, showing how to use the NVIDIA H.264 Encoding Library using YUV frames as input
  • Vflocking Direct3D/CUDA, which simulates and visualizes the flocking behavior of birds in flight
  • simpleSurfaceWrite, demonstrating how CUDA kernels can write to 2D surfaces on Fermi GPUs

Windows developers should be sure to check out the new debugging and profiling features in Parallel Nsight v1.5 for Visual Studio at www.nvidia.com/ParallelNsight.

Please refer to the Release Notes and Getting Started Guides for more information.

In CUDA Toolkit 3.2 and the accompanying release of the CUDA driver, some important changes have been made to the CUDA Driver API to support large memory access for device code and to enable further system calls such as malloc and free. Please refer to the CUDA Toolkit 3.2 Readiness Tech Brief for a summary of these changes.

Note: The developer driver packages below provide baseline support for the widest number of NVIDIA products in the smallest number of installers. More recent production driver packages for developers and end users may be available at www.nvidia.com/drivers.

For additional tools and solutions for Windows, Linux, and Mac OS, such as CUDA Fortran, CULA, and CUDA-GDB, please visit our Tools and Ecosystem page.

CUDA 7.0 Downloads

Please Note: There is a recommended patch for CUDA 7.0 which resolves an issue in the cuFFT library that can lead to incorrect results for certain input sizes less than or equal to 1920 in any dimension when cufftSetStream() is passed a non-blocking stream (e.g., one created using the cudaStreamNonBlocking flag of the CUDA Runtime API or the CU_STREAM_NON_BLOCKING flag of the CUDA Driver API).

Version                                                         Network Installer   Local Installer
Windows 8.1, Windows 7, Win Server 2012 R2, Win Server 2008 R2  EXE (8.0MB)         EXE (939MB)
cuFFT Patch                                                     ZIP (52MB), README
Windows Getting Started Guide

Q: Where is the notebook installer?
A: Previous releases of the CUDA Toolkit had separate installation packages for notebook and desktop systems. Beginning with CUDA 7.0, these packages have been merged into a single package that is capable of installing on all supported platforms.

Q: What is the difference between the Network Installer and the Local Installer?
A: The Local Installer has all of the components embedded into it (toolkit, driver, samples). This makes the installer very large, but once downloaded, it can be installed without an internet connection. The Network Installer is a small executable that will only download the necessary components dynamically during the installation so an internet connection is required.

Q: Where do I get the GPU Deployment Kit (GDK) for Windows?
A: The installers give you an option to install the GDK. If you only want to install the GDK, then you should use the network installer, for efficiency.

Q: Where can I find old versions of the CUDA Toolkit?
A: Older versions of the toolkit can be found on the Legacy CUDA Toolkits page.

Q: Is cuDNN included as part of the CUDA Toolkit?
A: cuDNN is our library for Deep Learning frameworks, and can be downloaded separately from the cuDNN home page.

Version             Network Installer       Local Package Installer   Runfile Installer
Fedora 21           RPM (3KB)               RPM (1GB)                 RUN (1.1GB)
OpenSUSE 13.2       RPM (3KB)               RPM (1GB)                 RUN (1.1GB)
OpenSUSE 13.1       RPM (3KB)               RPM (1GB)                 RUN (1.1GB)
RHEL 7, CentOS 7    RPM (10KB)              RPM (1GB)                 RUN (1.1GB)
RHEL 6, CentOS 6    RPM (18KB)              RPM (1GB)                 RUN (1.1GB)
SLES 12             RPM (3KB)               RPM (1.1GB)               RUN (1.1GB)
SLES 11 (SP3)       RPM (3KB)               RPM (1.1GB)               RUN (1.1GB)
SteamOS 1.0-beta                                                      RUN (1.1GB)
Ubuntu 14.10        DEB (3KB)               DEB (1.5GB)               RUN (1.1GB)
Ubuntu 14.04 *      DEB (10KB)              DEB (902MB)               RUN (1.1GB)
Ubuntu 12.04        DEB (3KB)               DEB (1.3GB)               RUN (1.1GB)
GPU Deployment Kit  Included in Installer   Included in Installer     RUN (4MB)
cuFFT Patch         TAR (122MB), README
Linux Getting Started Guide

* Includes POWER8 cross-compilation tools.

Q: Where can I find the CUDA 7 Toolkit for my Jetson TK1?
A: Jetson TK1 is not supported by the CUDA 7 Toolkit. Please download the CUDA 6.5 Toolkit for Jetson TK1 instead.

Q: What is the difference between the Network Installer and the Local Installer?
A: The Local Installer has all of the components embedded into it (toolkit, driver, samples). This makes the installer very large, but once downloaded, it can be installed without an internet connection. The Network Installer is a small executable that will only download the necessary components dynamically during the installation, so an internet connection is required to use this installer.

Q: Is cuDNN included as part of the CUDA Toolkit?
A: cuDNN is our library for Deep Learning frameworks, and can be downloaded separately from the cuDNN home page.

Version             Network Installer       Local Package Installer   Runfile Installer
Ubuntu 14.10        DEB (3KB)               DEB (588MB)
Ubuntu 14.04        DEB (3KB)               DEB (588MB)
GPU Deployment Kit  n/a                     n/a                       RUN (1.7MB)
cuFFT Patch         TAR (105MB), README
Linux Getting Started Guide

Q: What is the difference between the Network Installer and the Local Installer?
A: The Local Installer has all of the components embedded into it (toolkit, driver, samples). This makes the installer very large, but once downloaded, it can be installed without an internet connection. The Network Installer is a small executable that will only download the necessary components dynamically during the installation, so an internet connection is required to use this installer.

Q: Is cuSOLVER available for the POWER8 architecture?
A: The initial release of the CUDA 7.0 toolkit omitted the cuSOLVER library from the installer. On May 29, 2015, new CUDA 7.0 installers that include the cuSOLVER library were posted for the POWER8 architecture. If you downloaded the CUDA 7.0 toolkit for POWER8 on or before this date and you need to use cuSOLVER, you will need to download the latest installer and re-install.

Version       Network Installer   Local Installer
10.9, 10.10   DMG (0.4MB)         PKG (977MB)
cuFFT Patch   TAR (104MB), README
Mac Getting Started Guide

Q: What is the difference between the Network Installer and the Local Installer?
A: The Local Installer has all of the components embedded into it (toolkit, driver, samples). This makes the installer very large, but once downloaded, it can be installed without an internet connection. The Network Installer is a small executable that will only download the necessary components dynamically during the installation, so an internet connection is required to use this installer.

Q: Is cuDNN included as part of the CUDA Toolkit?
A: cuDNN is our library for Deep Learning frameworks, and can be downloaded separately from the cuDNN home page.

Q: What do I do if the Network Installer fails to run with the error message "The package is damaged and can’t be opened. You should eject the disk image"?
A: Check that your security preferences are set to allow apps downloaded from anywhere to run. This setting can be found under: System Preferences > Security & Privacy > General

CUDA Toolkit 2.3 Downloads

CUDA Toolkit 2.3 (June 2009)

  • The CUFFT Library now supports double-precision transforms and includes significant performance improvements for single-precision transforms as well. See the CUDA Toolkit release notes for details.
  • The cuda-gdb hardware debugger and CUDA Visual Profiler are now included in the CUDA Toolkit installer, and cuda-gdb is now available for all supported Linux distros.
  • Each GPU in an SLI group is now enumerated individually, so compute applications can now take advantage of multi-GPU performance even when SLI is enabled for graphics.
  • The 64-bit versions of the CUDA Toolkit now support compiling 32-bit applications. Please note that the installation location of the libraries has changed, so developers on 64-bit Linux must update their LD_LIBRARY_PATH to contain either /usr/local/cuda/lib or /usr/local/cuda/lib64.
  • New support for fp16/fp32 conversion intrinsics allows storage of data in fp16 format with computation in fp32. Use of fp16 format is ideal for applications that require higher numerical range than 16-bit integer but less precision than fp32 and reduces memory space and bandwidth consumption.
  • The Visual Profiler includes several enhancements:
    • All memory transfer API calls are now reported
    • Support for profiling multiple contexts per GPU
    • Synchronized clocks for requested start time on the CPU and start/end times on the GPU for all kernel launches and memory transfers
    • Global memory load and store efficiency metrics for GPUs with compute capability 1.2 and higher
  • The CUDA Driver for MacOS now has its own installer, and is available separately from the CUDA Toolkit.
  • Support for major Linux distros, MacOS X, and Windows:
    • MacOS X 10.5.6 and later (32-bit)
    • Windows XP/Vista/7 with Visual Studio 8 (VC2005 SP1) and 9 (VC2008)
    • Fedora 10, RHEL 4.7 & 5.3, SLED 10.2 & 11.0, OpenSUSE 11.1, and Ubuntu 8.10 & 9.04
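
The fp16/fp32 item in the list above describes storing data in 16-bit floating point while computing at higher precision. The sketch below illustrates that trade-off on the host using Python's standard `struct` module, whose `e` format is IEEE 754 half precision. It does not call the CUDA intrinsics themselves, the helper names are hypothetical, and Python floats stand in for fp32 during the arithmetic.

```python
import struct


def to_fp16_bits(x):
    """Round a Python float to IEEE 754 half precision and return its 16-bit pattern."""
    return struct.unpack("<H", struct.pack("<e", x))[0]


def from_fp16_bits(bits):
    """Expand a 16-bit half-precision pattern back to a float for computation."""
    return struct.unpack("<e", struct.pack("<H", bits))[0]


def scaled_sum(stored, scale):
    """Storage is fp16 (2 bytes per value); the arithmetic runs at higher precision."""
    return sum(from_fp16_bits(b) * scale for b in stored)


# 60000 is above the 16-bit signed integer maximum of 32767 but fits in fp16.
values = [0.5, 1000.3, 60000.0]
stored = [to_fp16_bits(v) for v in values]
restored = [from_fp16_bits(b) for b in stored]
```

Here `restored` shows the cost of the smaller format: 1000.3 comes back as 1000.5, because representable fp16 values near 1000 are spaced 0.5 apart, while 0.5 and 60000.0 survive unchanged.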

New CUDA SDK code samples:

  • A new pitchLinearTexture code sample that shows how to efficiently texture from pitch linear memory.
  • A new PTXJIT code sample illustrating how to use cuModuleLoadDataEx() to load PTX source from memory instead of loading a file.
  • Two new code samples for Windows, showing how to use the NVCUVID library to decode MPEG-2, VC-1, and H.264 content and pass frames to OpenGL or Direct3D for display.
  • Updated code samples showing how to properly align CUDA kernel function parameters so the same code works on both x32 and x64 systems.
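
The last item above concerns kernel parameter alignment. One common reason the same code can behave differently on 32-bit and 64-bit systems is that pointers are 4 bytes on the former and 8 bytes on the latter, so any hand-computed argument offsets shift. The sketch below shows only the offset arithmetic, assuming each argument must start at a multiple of its own alignment. It calls no CUDA API, and the function names and the example kernel signature are hypothetical.

```python
def align_up(offset, alignment):
    """Round offset up to the next multiple of alignment (a power of two)."""
    return (offset + alignment - 1) & ~(alignment - 1)


def layout_params(params):
    """Given (name, size, alignment) tuples, return ({name: offset}, total_size)."""
    offsets = {}
    offset = 0
    for name, size, alignment in params:
        offset = align_up(offset, alignment)
        offsets[name] = offset
        offset += size
    return offsets, offset


def example_params(pointer_size):
    # Hypothetical kernel signature: (int n, float *data, float scale)
    return [("n", 4, 4), ("data", pointer_size, pointer_size), ("scale", 4, 4)]


offsets32, size32 = layout_params(example_params(4))
offsets64, size64 = layout_params(example_params(8))
```

With 4-byte pointers the three arguments land at offsets 0, 4, and 8. With 8-byte pointers the pointer moves to offset 8 and the trailing float to 16, which is why offsets should be computed from sizes and alignments and not hard-coded.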

All toolkit and library documentation is included with the Toolkit and SDK installers.

CUDA Toolkit 3.1 Downloads

CUDA Toolkit 3.1

For the latest releases, see the CUDA Toolkit and GPU Computing SDK home page.

  • GPUDirect(tm) gives 3rd party devices direct access to CUDA Memory
  • Support for 16-way concurrency allows up to 16 different kernels to run at the same time on Fermi architecture GPUs
  • Runtime / Driver interoperability enables applications to mix and match use of the CUDA Driver API with the CUDA C Runtime and math libraries via buffer sharing and context migration
  • New language features added to CUDA C / C++ include:
    • Support for printf() in device code
    • Support for function pointers and recursion makes it easier to port many existing algorithms to Fermi GPUs
  • Unified Visual Profiler now supports both CUDA C/C++ and OpenCL, and now includes support for CUDA Driver API tracing
  • Math Libraries Performance Improvements, including:
    • Improved performance of selected transcendental functions from the log, pow, erf, and gamma families
    • Significant improvements in double-precision FFT performance on Fermi-architecture GPUs for 2^n transform sizes
    • Streaming API now supported in CUBLAS for overlapping copy and compute operations
    • CUFFT Real-to-complex (R2C) and complex-to-real (C2R) optimizations for 2^n data sizes
    • Improved performance for GEMV and SYMV subroutines in CUBLAS
    • Optimized double-precision implementations of divide and reciprocal routines for the Fermi architecture
  • New and updated SDK code samples demonstrating how to use:
    • Function pointers in CUDA C/C++ kernels
    • OpenCL / Direct3D buffer sharing
    • Hidden Markov Model in OpenCL
    • Microsoft Excel GPGPU example showing how to run an Excel function on the GPU

Note: The developer driver packages below provide baseline support for the widest number of NVIDIA products in the smallest number of installers. More recent production driver packages for developers and end users may be available at www.nvidia.com/drivers.

For additional tools and solutions for Windows, Linux, and Mac OS, such as CUDA Fortran, CULA, and CUDA-GDB, please visit our Tools and Ecosystem page.

Windows XP, Windows Vista, Windows 7

  • C/C++ compiler
  • CUDA Visual Profiler
  • OpenCL Visual Profiler
  • GPU-accelerated BLAS library
  • GPU-accelerated FFT library
  • Additional tools and documentation

*New* Updated versions of the CUDA C Programming Guide (Version 3.1.1) and the Fermi Tuning Guide (Version 1.2) are available via the links to the right.

Description of Download                                     Link to Binaries   Documents
C2050 Support Drivers                                       download
Developer Drivers for WinXP (257.21)                        32-bit, 64-bit
Developer Drivers for WinVista and Win7 (257.21)            32-bit, 64-bit
Notebook Developer Drivers for WinXP (257.21)               32-bit, 64-bit
Notebook Developer Drivers for WinVista and Win7 (257.21)   32-bit, 64-bit
32-bit
64-bit
Getting Started Guide Windows
Release Notes
*Updated* CUDA C Programming Guide
CUDA C Best Practices Guide
OpenCL Programming Guide
OpenCL Best Practices Guide
OpenCL Implementation Notes
CUDA Reference Manual
API Reference
PTX ISA 2.1
Visual Profiler User Guide
Visual Profiler Release Notes
Fermi Compatibility Guide
*Updated* Fermi Tuning Guide
CUBLAS User Guide
CUFFT User Guide
CUDA Developer Guide for Optimus Platforms
License
NVIDIA Performance Primitives (NPP) library   32-bit, 64-bit
NPP Release Notes
NPP License
GPU Computing SDK code samples   32-bit, 64-bit
OpenCL Release Notes
CUDA C/C++ Release Notes
DirectCompute Release Notes
CUDA Occupancy Calculator
License
NVIDIA OpenCL Extensions Compiler_Options
D3D9 Sharing
D3D10 Sharing
D3D11 Sharing
Device Attribute Query
Pragma Unroll

Linux

  • C/C++ compiler
  • cuda-gdb debugger
  • CUDA Visual Profiler
  • OpenCL Visual Profiler
  • GPU-accelerated BLAS library
  • GPU-accelerated FFT library
  • Additional tools and documentation

*New* Updated versions of the CUDA C Programming Guide (Version 3.1.1) and the Fermi Tuning Guide (Version 1.2) are available via the links to the right.

Description of Download               Link to Binaries   Documents
Developer Drivers for Linux (256.40)  32-bit, 64-bit     README_Linux.txt