Windows HPC Pack 2016

Overview of Microsoft HPC Pack 2016

Learn how to evaluate, set up, deploy, maintain, and submit jobs to a high-performance computing (HPC) cluster that is created by using Microsoft HPC Pack 2016. HPC Pack allows you to create and manage HPC clusters consisting of dedicated on-premises Windows or Linux compute nodes, part-time servers, workstation computers, and dedicated or on-demand compute resources that are deployed in Microsoft Azure.

Based on where the compute resources are located, Microsoft HPC Pack deployments fall into three cluster modes:

HPC Pack On-premises (Get started with HPC Pack On-premises)
— Supports Windows and Linux compute nodes
— Advanced job scheduling and resource management
— Proven and scale-tested capabilities
— Free of charge
— Easy to extend to hybrid

HPC Pack Hybrid (Get started with HPC Pack Hybrid)
— Burst to the cloud to handle peaks in demand or special projects
— Automate the deployment of Windows and Linux Azure VMs
— Use your current HPC scheduler or HPC Pack
— Pay only for what you use

HPC Pack IaaS (Get started with HPC Pack IaaS)
— Deploy a cluster entirely in the cloud, on demand
— Use your current scheduler or HPC Pack
— Readily shift existing applications to the cloud
— Use templates, scripts, and gallery images to deploy on demand

Microsoft also offers a cloud-based HPC job scheduling service called Azure Batch. You can either use Azure Batch directly, or use HPC Pack as your scheduler and burst your jobs to Azure Batch.

Follow the Get started links above to begin working with HPC Pack.

Migration to HPC Pack 2016 Update 1

This article describes the steps to migrate your HPC Pack cluster from the HPC Pack 2016 RTM version to HPC Pack 2016 Update 1.

Only migration from HPC Pack 2016 RTM to HPC Pack 2016 Update 1 is supported. Upgrading or migrating an existing HPC Pack 2012 R2 or earlier cluster to HPC Pack 2016 Update 1 is not supported.

Before migration

Before the migration, you need to do the following:

  1. Stop all running jobs.
  2. Stop all Azure nodes (PaaS), if you have deployed them.
  3. Stop and delete all Azure Batch pools, if you have deployed them. Remove the Azure Batch node templates because there is a breaking change in the Burst to Azure Batch feature in HPC Pack 2016 Update 1.
  4. Back up the HPC databases manually.

Step 1: Download HPC Pack 2016 Update 1 installation package and create a network share

1.1: Decide on which head node to create the network share

For a single head node cluster, create the HPC Pack 2016 Update 1 installation network share on the head node.

For a high availability cluster, run the following PowerShell command as administrator to determine on which head node to create the network share.
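
The exact command ships with the migration package. As a hedged sketch, the standard Service Fabric cmdlets can show which head node currently hosts the primary replicas of the HPC services; picking that node is an assumption here, not the documented rule:

    # Connect to the local Service Fabric cluster from any head node
    Connect-ServiceFabricCluster

    # Show the node that hosts the primary replica of each HPC service
    Get-ServiceFabricApplication | ForEach-Object {
        Get-ServiceFabricService -ApplicationName $_.ApplicationName
    } | ForEach-Object {
        Get-ServiceFabricPartition -ServiceName $_.ServiceName
    } | ForEach-Object {
        Get-ServiceFabricReplica -PartitionId $_.PartitionInformation.Id
    } | Where-Object { $_.ReplicaRole -eq 'Primary' } |
        Select-Object NodeName, ReplicaRole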

The output identifies the head node on which to create the network share.

1.2: Download the HPC Pack 2016 Update 1 installation package

Download the HPC Pack 2016 Update 1 installation zip package and the migration zip package from the Microsoft Download Center to the head node you chose in Step 1.1.

1.3: Unblock the installation package zip file and migration zip file

Right-click the downloaded zip file and select Properties. If there is a security alert on the General page that the file is blocked, click Unblock.
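
Alternatively, the files can be unblocked from an elevated PowerShell prompt (the file names below are illustrative):

    # Remove the "downloaded from the Internet" mark from both zip files
    Unblock-File -Path 'D:\Downloads\HPCPack2016Update1.zip'
    Unblock-File -Path 'D:\Downloads\HPCPack2016Update1-Migration.zip'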

1.4: Create a network share for HPC Pack 2016 installation package

Extract the HPC Pack 2016 Update 1 installation package to a local directory (for example, d:\HPCPack2016Update1), extract the migration package to the same directory, and then create a network share as follows.

Right-click the folder d:\HPCPack2016Update1, choose Properties > Security, and click Edit.

Click Add to grant Read & execute permissions to “Everyone”. Click OK twice.

On the Sharing tab click Share. Create a network share for the folder and add Read permission for “Everyone”.
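
If you prefer to script the extraction and share creation, a minimal PowerShell sketch (assuming the d:\HPCPack2016Update1 directory and share name used above, and illustrative zip file paths) is:

    # Extract both packages into the same local directory
    Expand-Archive -Path 'D:\Downloads\HPCPack2016Update1.zip' -DestinationPath 'D:\HPCPack2016Update1'
    Expand-Archive -Path 'D:\Downloads\HPCPack2016Update1-Migration.zip' -DestinationPath 'D:\HPCPack2016Update1'

    # Grant Everyone read access on the folder, then share it with read permission
    icacls 'D:\HPCPack2016Update1' /grant 'Everyone:(OI)(CI)RX'
    New-SmbShare -Name 'HPCPack2016Update1' -Path 'D:\HPCPack2016Update1' -ReadAccess 'Everyone'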

If your cluster is not in an Active Directory domain, you need to modify the local group policy. Do the following:

a. Run gpedit.msc.

b. Click Computer Configuration > Windows Settings > Security Settings > Local Policies > Security Options.

c. Enable Network access: Let everyone permissions apply to anonymous users, and add “HPCPack2016Update1” to Network access: Shares that can be accessed anonymously.
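
These two policy settings map to well-known registry values, so on a non-domain-joined head node they can also be applied from PowerShell; verify the change against your own security requirements before using this sketch:

    # "Network access: Let everyone permissions apply to anonymous users" = Enabled
    Set-ItemProperty -Path 'HKLM:\SYSTEM\CurrentControlSet\Control\Lsa' `
        -Name 'EveryoneIncludesAnonymous' -Value 1 -Type DWord

    # "Network access: Shares that can be accessed anonymously" = HPCPack2016Update1
    Set-ItemProperty -Path 'HKLM:\SYSTEM\CurrentControlSet\Services\LanmanServer\Parameters' `
        -Name 'NullSessionShares' -Value @('HPCPack2016Update1') -Type MultiString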

Step 2: Upgrade Windows compute, broker, and workstation nodes

Open HPC Cluster Manager on the head node, and click Resource Management > Nodes. Select all the compute, broker, and workstation nodes, and click Run Command. In the Command line field, enter the following command line, and click Run.
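
The command runs the Update 1 setup from the share created in Step 1. The exact switches are defined by the HPC Pack installer and are documented with the installation package; the line below is only a hypothetical shape (replace <HeadNodeName> with the head node that hosts the share):

    \\<HeadNodeName>\HPCPack2016Update1\Setup.exe -unattend -update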

All the nodes will be shown in the "Error" state after you run the command because of a breaking change in HPC Pack 2016 Update 1: compute, broker, and workstation nodes upgraded to Update 1 cannot connect to a head node that is still running the RTM version. After you upgrade the head node(s), the connection is restored.

Step 3: Upgrade Linux compute nodes

Scenario 1: Upgrade on-premises Linux compute nodes

If your Linux compute nodes were installed manually as per Add Linux nodes to the cluster, use the following steps to migrate.

Open HPC Cluster Manager on the head node, and click Resource Management > Nodes. Select all the Linux nodes, click Run Command, and run the following commands in sequence.

First, create a temp directory on all Linux nodes.

Second, mount the HPC Pack 2016 Update 1 installation share.

Third, schedule a task on all Linux nodes to run the migration one minute later.
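
A hedged sketch of the three command lines follows; the temp directory, share name, credentials, and the name of the Linux upgrade script shipped in the migration package are all illustrative placeholders:

    # 1. Create a temp directory on every Linux node
    mkdir -p /tmp/hpc2016update1

    # 2. Mount the installation share (replace the head node name, domain, and credentials)
    mount -t cifs //<HeadNodeName>/HPCPack2016Update1 /tmp/hpc2016update1 \
        -o vers=2.1,username=<user>,password='<password>',domain=<domain>

    # 3. Schedule the upgrade script from the migration package to run one minute later
    echo "bash /tmp/hpc2016update1/<linux-upgrade-script>.sh" | at now + 1 minute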

Scenario 2: Upgrade Azure Linux compute nodes

If you deployed your HPC Pack 2016 cluster for Linux workloads by using the Azure Resource Manager template, open a PowerShell console as administrator on one head node and run the following commands. You need to specify the resource group and location in which your nodes were deployed.
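
The original commands are not reproduced here. As an illustrative alternative only (not the documented procedure), and assuming the current Az PowerShell module plus a hypothetical upgrade script named upgrade-linux-nodeagent.sh, the Azure Run Command feature could push the upgrade to each Linux VM in the resource group:

    # Illustrative only: run a hypothetical upgrade script on every Linux VM in the resource group
    $resourceGroup = '<YourResourceGroupName>'
    Get-AzVM -ResourceGroupName $resourceGroup |
        Where-Object { $_.StorageProfile.OsDisk.OsType -eq 'Linux' } |
        ForEach-Object {
            Invoke-AzVMRunCommand -ResourceGroupName $resourceGroup -VMName $_.Name `
                -CommandId 'RunShellScript' -ScriptPath '.\upgrade-linux-nodeagent.sh'
        }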

All the Linux nodes will be shown in the "Error" state after you run the commands because of the same breaking change in HPC Pack 2016 Update 1: Linux nodes upgraded to Update 1 cannot connect to a head node that is still running the RTM version. After you upgrade the head node(s), the connection is restored.

Step 4: Upgrade the head node(s)

Scenario 1: Upgrade the head node for a single head node cluster

1: Back up cluster configuration settings

Open a PowerShell Console as administrator on the head node, and run the following commands.
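
A minimal backup sketch, assuming the HPC PowerShell module and the Get-HpcClusterRegistry cmdlet are available on the head node and using an illustrative backup folder:

    # Load the HPC PowerShell module (the module name may vary by installation)
    Import-Module Microsoft.Hpc

    # Export the HPC cluster registry settings to a backup file
    New-Item -ItemType Directory -Path 'C:\HpcBackup' -Force | Out-Null
    Get-HpcClusterRegistry | Export-Clixml -Path 'C:\HpcBackup\HpcClusterRegistry.xml'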

2: Run script to upgrade head node

Open a new PowerShell console as administrator, and run the following command:
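
The upgrade is driven by a script shipped in the package extracted in Step 1.4; its exact name and parameters are documented there, so the invocation below is purely hypothetical:

    # Hypothetical: run the head node upgrade script from the extracted package
    Set-Location 'D:\HPCPack2016Update1'
    .\<HeadNodeUpgradeScript>.ps1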

Scenario 2: Upgrade the high availability head nodes

For an HPC Pack 2016 cluster with head node high availability, there are three head nodes. On one head node you downloaded the HPC Pack 2016 Update 1 installation package and created the network share; that node is referred to here as head node A. The other two head nodes are head node B and head node C.

1: Back up cluster configuration settings

Open a PowerShell console as administrator on head node A, and back up the cluster configuration settings in the same way as in Scenario 1.

2: Upgrade the Service Fabric cluster

Run the following PowerShell commands on head node A to upgrade the Service Fabric cluster to the latest version.

Connect to the cluster and get the list of available versions that you can upgrade to.
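
For example, run the following locally on head node A (standard Service Fabric cmdlets):

    # Connect to the local Service Fabric cluster
    Connect-ServiceFabricCluster

    # List the cluster code versions that are registered and available for upgrade
    Get-ServiceFabricRegisteredClusterCodeVersion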

Start a cluster upgrade to the latest version from the list (for example 6.0.232.9494).
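
A sketch using the monitored upgrade mode with automatic rollback on failure; substitute the latest version from your own list for the example value:

    # Start a monitored code (runtime) upgrade to the chosen version
    Start-ServiceFabricClusterUpgrade -Code -CodePackageVersion '6.0.232.9494' `
        -Monitored -FailureAction Rollback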

During the upgrade, the original PowerShell console will close. Open a new one as administrator, connect to the Service Fabric cluster again with the Connect-ServiceFabricCluster command, and run the following command to monitor the upgrade progress.
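
For example, calling Get-ServiceFabricClusterUpgrade repeatedly reports the current status:

    # Reconnect after the console restarts, then check the upgrade status
    Connect-ServiceFabricCluster
    Get-ServiceFabricClusterUpgrade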

The upgrade is complete when the UpgradeState becomes RollingForwardCompleted.

3: Remove the HPC application from the Service Fabric cluster

Run the following PowerShell commands on head node A to remove the old version of the HPC application from the Service Fabric cluster.

Remove the HPC application.
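
A hedged sketch that discovers the HPC application by name instead of assuming its exact application URI:

    # Find the HPC application and remove it (-Force suppresses the confirmation prompt)
    $hpcApp = Get-ServiceFabricApplication | Where-Object { $_.ApplicationName -match 'Hpc' }
    Remove-ServiceFabricApplication -ApplicationName $hpcApp.ApplicationName -Force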

Remove the HPC application type and application package.
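
Continuing the sketch above, unregister the application type by using the type name and version discovered from the removed application. Removing the package from the image store also requires its image-store path, which depends on how the package was originally copied, so it is left as a placeholder:

    # Unregister the old HPC application type
    Unregister-ServiceFabricApplicationType -ApplicationTypeName $hpcApp.ApplicationTypeName `
        -ApplicationTypeVersion $hpcApp.ApplicationTypeVersion -Force

    # Remove the application package from the image store (path is a placeholder)
    Remove-ServiceFabricApplicationPackage -ImageStoreConnectionString 'fabric:ImageStore' `
        -ApplicationPackagePathInImageStore '<HpcApplicationPackagePath>'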

4: Upgrade the first two head nodes

Run the following PowerShell command as administrator separately on head node B and head node C.

5: Upgrade the last head node

Run the following PowerShell command as administrator on head node A.

What’s New in HPC Pack 2016

This document lists the new features and changes that are available in Microsoft HPC Pack 2016.

Operating system and software requirements

HPC Pack 2016 has an updated set of requirements for operating system and other prerequisite software. Among other updates, HPC Pack 2016 provides support for Windows Server 2016 on the head node and several other node roles.

For the head node role, HPC Pack 2016 is not supported on Windows Server 2012.

High availability

In HPC Pack 2016, the head node services have been migrated from the Failover Clustering service to Service Fabric, so you can now deploy a highly available HPC Pack cluster much more easily in Azure or on-premises. See the Get started guide for Microsoft HPC Pack 2016 to create a highly available HPC Pack cluster on-premises. If you want to deploy a highly available HPC Pack cluster in Azure, see Deploy an HPC Pack 2016 cluster in Azure.

Azure Active Directory integration

With previous versions of HPC Pack set up in Azure virtual machines, you needed to set up a domain controller for your HPC cluster. This is because HPC Pack requires Active Directory authentication for cluster administrators and cluster users. In HPC Pack 2016, the administrator can alternatively configure Azure Active Directory for cluster authentication. For more details, see Manage an HPC Pack cluster in Azure using Azure Active Directory.

Enhanced GPU support

HPC Pack has supported GPUs on Windows compute nodes since HPC Pack 2012 R2 Update 3. HPC Pack 2016 extends that support to Linux compute nodes. With the Azure N-series VM sizes, you can deploy an HPC Pack cluster with GPU capabilities in Azure. For more details, see Get started with HPC Pack and Azure N-Series VMs.

GUI improvements

Hold job — In the job management UI (HPC Job Manager), you can now hold an active job with a hold-until date and time. Queued tasks within the held job are not dispatched, and if the job has any running tasks, its state is shown as Draining instead of Running.

Custom properties page — In the Job dialog, you can now view and edit a job’s custom properties. If the value of a property is a link, it is displayed on the page and can be clicked. If you would like a file location to be clickable as well, use the file:/// format, for example, file:///c:/users.

Substitution of mount point — When a task is executed on a Linux node, the user usually can’t open the working directory. Now within the job management UI you can substitute the mount point by specifying the job custom properties linuxMountPoint and windowsMountPoint so that the user can access the folder as well. For example, you can create a job with the following settings:

  • Custom Property: linuxMountPoint = /gpfs/Production
  • Custom Property: windowsMountPoint = Z:\Production
  • Task Working Directory: /gpfs/Production/myjob

Then, when you view the job in the GUI, the working directory shown on the Job dialog > View Tasks page > Details tab is z:\production\myjob. If you previously mounted /gpfs to your local Z: drive, you can open the job output files directly.

Activity log — Job modification logs are now also logged in the job’s activity log.

Set subscribed information for nodes — The administrator can set a node’s subscribed cores or sockets from the GUI. Select offline nodes and perform the Edit Properties action.

No copy job – If you set the job custom property noGUICopy to true, the Copy action in the GUI is disabled.

Scheduler improvements

Task execution filter — HPC Pack 2016 introduces a task execution filter for Linux compute nodes that calls administrator-customized scripts each time a task is executed on a Linux node. This enables scenarios such as executing tasks with an Active Directory account on Linux nodes and mounting a user’s home folder for task execution. For more information, see Get started with HPC Pack task execution filter.

Release task issue fix – HPC Pack 2016 fixes an issue where a job release task might not be executed for exclusive jobs.

Job stuck issue – HPC Pack 2016 fixes an issue where a job could become stuck in the Queued state.

SOA improvements

4 MB message limit removed — SOA requests can now be larger than 4 MB. A large request is split into smaller messages before being persisted to MSMQ, which has a 4 MB message size restriction.

HoldUntil for SOA sessions — Users can now pause a running SOA session by setting the session job’s HoldUntil property to a future time.

SOA session survival during head node failover

SOA sessions can run on non-domain-joined compute nodes — For non-domain-joined compute nodes, the broker back-end binding configuration in the service registration file can be updated with None or Certificate security.

New nethttp transport scheme — The nethttp scheme is based on WebSocket, which can greatly improve message throughput compared with basic HTTP connections.

Configurable broker dispatcher capacity — Users can specify the broker dispatcher capacity instead of relying on the calculated core count, which gives more accurate grow and shrink behavior when the resource type is node or socket.

Multiple SOA sessions in a shared session pool — To specify the pool size for a SOA service, add the optional setting in the service registration file. When creating a shared SOA session with the session pool, set both sessionStartInfo.ShareSession and sessionStartInfo.SessionPool to true. After using the session, close it without purging it so that it remains in the pool.

Updated EchoClient.exe — Updates for random message size and time, flush per number of requests support, message operation (send/flush/EOM/get) timeout parameter, and new nethttp scheme support.

Extra optional parameters in the ExcelClient.OpenSession method for Excel VBA — Extra parameters include jobname, projectName, and jobPriority.

Added GPU type support for SOA session API

Miscellaneous stability and performance fixes in SOA services

Management

Autogrow/shrink service supports Linux nodes — when the HPC Pack cluster is deployed in Azure virtual machines.

New property for autogrow/shrink service — The ExcludeNodeGroup property enables you to specify the node group or node groups to exclude from automatic node starts and stops.

Built-in REST API Service – Now the REST API Service is installed on every head node instance by default.

Non-domain-joined Windows compute nodes – The cluster administrator can set up a Windows compute node which is not domain-joined. A local account will be created and used when a job is executed on this type of node.
