Windows Server issues troubleshooting

Contents
  1. Troubleshoot issues with WSUS client agents
  2. Verify that the client is configured correctly
  3. Check for issues relating to BITS
  4. BITS fails to start
  5. BITS jobs are failing
  6. Repair a corrupted BITS configuration
  7. Issues with the WSUS agent service
  8. Make sure the WSUS server is reachable from the client
  9. Rebuild the Automatic Update Agent Store
  10. Check for clients with the same SUSclient ID
  11. Troubleshoot the Windows Server Software Defined Networking Stack
  12. Error types
  13. Diagnostic tools
  14. Network controller diagnostics
  15. Hyper-V host diagnostics
  16. GitHub
  17. Troubleshooting Workflows and Guides
  18. [Hoster] Validate System Health
  19. Check network connectivity between the network controller and Hyper-V Host (NC Host Agent service)
  20. Check Host Agent services
  21. Check health of network controller
  22. Check for corresponding HostIDs and certificates between network controller and each Hyper-V Host
  23. Check the SLB Configuration State
  24. Gateway Validation
  25. [Hoster] Validate Data-Plane
  26. Check HNV Provider Logical Network Connectivity
  27. Check MTU and Jumbo Frame support on HNV Provider Logical Network
  28. Check Tenant VM NIC connectivity
  29. Specific Troubleshooting Scenarios
  30. No network connectivity between two tenant virtual machines
  31. Example
  32. Look at IP Configuration and Virtual Subnets which are referencing this ACL
  33. Logging, Tracing and advanced diagnostics
  34. Network controller centralized logging
  35. Enable logging
  36. Change logging settings
  37. Collecting Logs and Traces
  38. SLB Diagnostics
  39. SLBM Fabric errors (Hosting service provider actions)
  40. SLB Mux Tracing
  41. VFP and vSwitch Tracing

Troubleshoot issues with WSUS client agents

This article helps you diagnose and resolve issues with the Windows Server Update Services (WSUS) client agents.

Original product version: Windows Server Update Services
Original KB number: 10132

When you experience issues with the WSUS client agents, they can manifest themselves in many ways. Some common problems are listed here:

  • It could be an issue with the client settings for Group Policy.
  • It could be an issue with BITS.
  • It could be an issue with the WSUS agent service.
  • It could be related to a network issue that prevents the client from reaching the server.
  • It could be an issue with the Automatic Update Agent Store.
  • It could be an issue in which clients have duplicate WSUS client IDs caused by disk cloning.

Verify that the client is configured correctly

When you troubleshoot issues with a WSUS client agent, first make sure the client is properly configured. Make sure the proper Active Directory Group Policy is being received by the client, and the details of the WSUS server are present. You can do so by running the following command:
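
A minimal sketch of such a command, writing the report to the GPRESULT.TXT file referenced below:

  gpresult /v > GPRESULT.TXT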

Open the text file in Notepad and find the name of your WSUS policy. For example, if your WSUS policy is named WSUS, you can find it in the GPRESULT.TXT file within the Computer Settings section under the Applied Group Policy Objects heading. Below is an example:
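
  COMPUTER SETTINGS
  ------------------
      Applied Group Policy Objects
      -----------------------------
          WSUS
          Default Domain Policy

(The GPO names shown are placeholders; your output will list the policies applied in your domain.)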

If the WSUS settings aren’t present, possible causes include:

  • The system doesn’t have the Group Policy from the domain.
  • The Group Policy isn’t targeted to the client system.

To fix this issue, ensure that the Group Policy is successfully updated on each client, and that the WSUS setting is properly configured.

To update the Group Policy on the client, run GPUpdate /force from a Command Prompt.

For more information about configuring Group Policy for WSUS clients, see Configure Automatic Updates by Using Group Policy.

Check for issues relating to BITS

Background Intelligent Transfer Service (BITS) is the service used by WSUS to download updates from Microsoft Update to the main WSUS server, and from WSUS servers to their clients. Some download issues may be caused by problems with BITS on the server or client computers. When you troubleshoot download problems, you should ensure that BITS is running properly on all affected computers.

The BITS service must run under the LocalSystem account by default. To configure the service to run under the correct account, follow these steps:

Open a Command Prompt and run the following command:
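
A sketch of the command:

  sc config bits obj= LocalSystem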

A space must occur between obj= and LocalSystem. If successful, you should receive the following output:
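
  [SC] ChangeServiceConfig SUCCESS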

Stop and restart BITS.

To view the BITS service status, open a Command Prompt and run the following command:
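
A sketch of the status check:

  sc query bits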

If BITS is running, you should see the following output:

If BITS isn’t running, you’ll see the following output:

Usually it’s possible to resolve BITS issues by stopping the service and restarting it. To stop and restart the BITS service, run the following commands from a Command Prompt:
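
A sketch of those commands (consistent with the sc stop/sc start form used later in this article for the wuauserv service):

  sc stop bits
  sc start bits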

You must be logged on as a local administrator to stop and restart BITS.

BITS fails to start

If the BITS service fails to start, look in the event log for any BITS-related error. You can use the following table to diagnose the cause of these errors.

Error name Error code Description
ERROR_SERVICE_DOES_NOT_EXIST 0x80070424 See the section on repairing the BITS configuration below.
ERROR_SERVICE_NOT_IN_EXE 0x8007043B BITS isn’t listed as one of the services in the netsvcs svchost group
ERROR_SERVICE_DISABLED 0x80070422 BITS has been disabled. Enable the BITS service.
ERROR_SERVICE_DEPENDENCY_DELETED ERROR_SERVICE_DEPENDENCY_FAIL 0x80070433, 0x8007042c A service appearing in the BITS service dependency list cannot be started. Make sure the dependency list for the BITS service is correct:
Windows Vista: RpcSs, EventSystem (also http.sys and LanManWorkstation when peer caching is enabled)
Windows Server 2003: Rpcss, EventSystem
Windows XP: Rpcss
Windows 2000: Rpcss, SENS, Wmi
ERROR_PATH_NOT_FOUND 0x80070003 Pre-Windows Vista: %ALLUSERSPROFILE%\Microsoft\Network doesn’t exist
ERROR_FILE_NOT_FOUND 0x80070002 The Parameters key is missing. Ensure that the following keys and values exist:
HKLM\SYSTEM\CurrentControlSet\Services\BITS\Parameters\ServiceDll = %SystemRoot%\System32\qmgr.dll
REGDB_E_CLASSNOTREG, EVENT_E_INTERNALERROR 0x80040154, 0x80040206 BITS for Windows 2000 is dependent on SENS and EventSystem services. If the COM+ catalog is corrupted, BITS may fail with this error code.

BITS jobs are failing

If the client is properly configured to receive updates, BITS is configured correctly, and BITS appears to start and run properly, you may be experiencing an issue where BITS jobs themselves are failing. To verify it, look in the event log for any BITS-related errors. You can use the following table to diagnose the cause of these errors.

Error name Error code Description
E_INVALIDARG 0x80070057 An incorrect proxy server name was specified in the user’s Internet Explorer proxy settings. This error is also seen when credentials are supplied for authentication schemes that aren’t NTLM/Negotiate, but the user name or password is null. Change the user’s Internet Explorer proxy settings to be a valid proxy server. Or change the credentials not to be NULL user name/password for schemes other than NTLM/Negotiate.
ERROR_WINHTTP_NAME_NOT_RESOLVED 0x80072ee7 The server/proxy could not be resolved by BITS. Internet Explorer on the same machine in the context of the job owner would see the same problem. Try downloading the same file via the web browser using the context of the job owner.
ERROR_HTTP_INVALID_SERVER_RESPONSE 0x80072f78 It’s a transient error and the job will continue downloading.
BG_E_INSUFFICIENT_RANGE_SUPPORT 0x80200013 BITS uses range headers in HTTP requests to request parts of a file. If the server or proxy server doesn’t understand range requests and returns the full file instead of the requested range, BITS puts the job into the ERROR state with this error. Capture the network traffic during the error and examine if HTTP GET requests with Range header are getting valid responses. Check proxy servers to ensure that they are configured correctly to support Range requests.
BG_E_MISSING_FILE_SIZE 0x80200011 When BITS sends a HEAD request and the server/proxy doesn’t return Content-Length header in the response, BITS puts the job in ERROR state with this error. Check the proxy server and WSUS server to ensure that they are configured correctly. Some versions of the Apache 2.0 proxy server are known to exhibit this behavior.
BG_E_HTTP_ERROR_403 0x80190193 When the server returns HTTP 403 response in any of the requests, BITS puts the job in ERROR state with this error code. HTTP 403 corresponds to Forbidden: Access is denied. Check access permissions for the account running the job.
ERROR_NOT_LOGGED_ON 0x800704dd The SENS service isn’t receiving user logon notifications. BITS (version 2.0 and later) depends on logon notifications from Service Control Manager, which in turn depends on the SENS service. Ensure that the SENS service is started and running correctly.

Repair a corrupted BITS configuration

To repair corrupted BITS service configuration, you can enter the BITS service configuration manually.

This action should only be taken in circumstances where all other troubleshooting attempts have failed. You must be an administrator to modify the BITS configuration.

To repair a corrupted BITS configuration, follow these steps:

Open a Command Prompt.

Enter the following commands, pressing ENTER after each command:
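
As a rough sketch only (the values below are assumptions inferred from the default BITS configuration described in the error table above; verify them against a healthy machine running the same OS version before applying):

  sc config bits binpath= "%systemroot%\system32\svchost.exe -k netsvcs"
  sc config bits depend= RpcSs/EventSystem
  sc config bits start= delayed-auto
  sc config bits obj= LocalSystem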

Stop and restart BITS.

Issues with the WSUS agent service

Make sure that the Windows Update service can start successfully.

To view the current status of the Windows Update service, open a Command Prompt and run the following command:
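
For example:

  sc query wuauserv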

If WUAUSERV is running, you should see the following output:

If WUAUSERV isn’t running, you see the following output:

Verify that you can start the WUAUSERV service successfully. You must be logged on as a local administrator to stop and restart WUAUSERV.

To start the WUAUSERV service, run the following commands from a Command Prompt:
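
For example:

  sc start wuauserv

If the service needs a restart, run sc stop wuauserv first.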

If the client agent fails to start and run properly, check the Windows Update Agent version. If the agent isn’t up to date, update the Windows Update Agent to the latest version.

After you run the fix or update the agent, run wuauclt /detectnow. Check windowsupdate.log to make sure there are no issues.

Make sure the WSUS server is reachable from the client

Make sure that you can access the URL http://<WSUSServerName>/iuident.cab and download the file without errors.

If the WSUS server is unreachable from the client, the most likely causes include:

  • There’s a name resolution issue on the client.
  • There’s a network-related issue, such as a proxy configuration issue.

Use standard troubleshooting procedures to verify name resolution is working on the network. If name resolution is working, the next step is to check for proxy issues. Check windowsupdate.log (C:\windows) to see if there are any proxy related errors. You can run the proxycfg command to check the WinHTTP proxy settings.

If there are proxy errors, go to Internet Explorer > Tools > Connections > LAN Settings, configure the correct proxy, and then make sure you can access the WSUS URL specified.

Once done, you can copy these user proxy settings to the WinHTTP proxy settings by using the proxycfg -u command. After the proxy settings are specified, run wuauclt /detectnow from a Command Prompt and check windowsupdate.log for errors.

Rebuild the Automatic Update Agent Store

When there are issues downloading updates and there are errors relating to the software distribution store, complete the following steps on the client:

  • Stop the Automatic Updates service by running sc stop wuauserv from a Command Prompt.
  • Rename the software distribution folder (for example, C:\Windows\SoftwareDistribution).
  • Restart the Automatic Update service by running sc start wuauserv from a Command Prompt.
  • From a Command Prompt, run wuauclt /resetauthorization /detectnow.
  • From a Command Prompt, run wuauclt /reportnow.

Check for clients with the same SUSclient ID

You may experience an issue where only one WSUS client appears in the console. Or you may notice that out of a group of clients, only one appears in the console at a time but the exact one that does appear may change over time. This issue can happen when systems are imaged and the clients end up having the same SUSclientID.

For those clients that aren't working properly because of the same SUSclientID, complete the following steps:

Stop the Automatic Updates service by running sc stop wuauserv from a Command Prompt.

Delete the SUSclientID registry key from the following location:
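
In typical installations this value is found under the following key (verify on your systems before deleting anything):

  HKEY_LOCAL_MACHINE\SOFTWARE\Microsoft\Windows\CurrentVersion\WindowsUpdate

Delete the SusClientId value (and the SusClientIdValidation value, if present) under this key.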

Restart the Automatic Update service by running sc start wuauserv from a Command Prompt.

From a Command Prompt, run wuauclt /resetauthorization /detectnow.

From a Command Prompt, run wuauclt /reportnow.

Troubleshoot the Windows Server Software Defined Networking Stack

Applies to: Windows Server 2019, Windows Server 2016

This guide examines the common Software Defined Networking (SDN) errors and failure scenarios and outlines a troubleshooting workflow that leverages the available diagnostic tools.

For more information about Microsoft’s Software Defined Networking, see Software Defined Networking.

Error types

The following list represents the classes of problems most often seen with Hyper-V Network Virtualization (HNVv1) in Windows Server 2012 R2 in-market production deployments; in many respects, the same types of problems appear in Windows Server 2016 HNVv2 with the new Software Defined Network (SDN) Stack.

Most errors can be classified into a small set of classes:

Invalid or unsupported configuration: A user invokes the NorthBound API incorrectly or with invalid policy.

Error in policy application: Policy from the Network Controller was not delivered to a Hyper-V host, was significantly delayed, or is not up to date on all Hyper-V hosts (for example, after a Live Migration).

Configuration drift or software bug: Data-path issues resulting in dropped packets.

External error related to NIC hardware / drivers or the underlay network fabric: Misbehaving task offloads (such as VMQ) or a misconfigured underlay network fabric (such as MTU).

This troubleshooting guide examines each of these error categories and recommends best practices and diagnostic tools available to identify and fix the error.

Diagnostic tools

Before discussing the troubleshooting workflows for each of these types of errors, let's examine the diagnostic tools available.

To use the Network Controller (control-path) diagnostic tools, you must first install the RSAT-NetworkController feature and import the NetworkControllerDiagnostics module:
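
A sketch of those steps (run from an elevated PowerShell session):

  Add-WindowsFeature RSAT-NetworkController -IncludeManagementTools
  Import-Module NetworkControllerDiagnostics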

To use the HNV Diagnostics (data-path) diagnostic tools, you must import the HNVDiagnostics module:
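
For example:

  Import-Module HNVDiagnostics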

Network controller diagnostics

These cmdlets are documented on TechNet in the Network Controller Diagnostics Cmdlet Topic. They help identify problems with network policy consistency in the control-path between Network Controller nodes and between the Network Controller and the NC Host Agents running on the Hyper-V hosts.

The Debug-ServiceFabricNodeStatus and Get-NetworkControllerReplica cmdlets must be run from one of the Network Controller node virtual machines. All other NC Diagnostic cmdlets can be run from any host which has connectivity to the Network Controller and is either in the Network Controller Management security group (Kerberos) or has access to the X.509 certificate for managing the Network Controller.

Hyper-V host diagnostics

These cmdlets are documented on TechNet in the Hyper-V Network Virtualization (HNV) Diagnostics Cmdlet Topic. They help identify problems in the data-path between tenant virtual machines (East/West) and ingress traffic through an SLB VIP (North/South).

The Debug-VirtualMachineQueueOperation, Get-CustomerRoute, Get-PACAMapping, Get-ProviderAddress, Get-VMNetworkAdapterPortId, Get-VMSwitchExternalPortId, and Test-EncapOverheadSettings cmdlets are all local tests which can be run from any Hyper-V host. The other cmdlets invoke data-path tests through the Network Controller and therefore need access to the Network Controller as described above.

GitHub

The Microsoft/SDN GitHub Repo has a number of sample scripts and workflows which build on top of these in-box cmdlets. In particular, diagnostic scripts can be found in the Diagnostics folder. Please help us contribute to these scripts by submitting Pull Requests.

Troubleshooting Workflows and Guides

[Hoster] Validate System Health

There is an embedded resource named Configuration State in several of the Network Controller resources. Configuration state provides information about system health including the consistency between the network controller’s configuration and the actual (running) state on the Hyper-V hosts.

To check configuration state, run the following from any Hyper-V host with connectivity to the Network Controller.
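
A sketch of that check (the cmdlet comes from the NetworkControllerDiagnostics module imported above; replace the placeholder with your Network Controller REST FQDN or IP):

  Debug-NetworkControllerConfigurationState -NetworkController <FQDN or IP>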

The value for the NetworkController parameter should be either the FQDN or IP address based on the subject name of the X.509 certificate created for Network Controller.

The Credential parameter only needs to be specified if the network controller is using Kerberos authentication (typical in VMM deployments). The credential must be for a user who is in the Network Controller Management Security Group.

A sample Configuration State message is shown below:

There is a bug in the system where the Network Interface resources for the SLB Mux Transit VM NIC are in a Failure state with the error "Virtual Switch - Host Not Connected To Controller". This error can be safely ignored if the IP configuration in the VM NIC resource is set to an IP Address from the Transit Logical Network's IP Pool. There is a second bug in the system where the Network Interface resources for the Gateway HNV Provider VM NICs are in a Failure state with the error "Virtual Switch - PortBlocked". This error can also be safely ignored if the IP configuration in the VM NIC resource is set to null (by design).

The table below shows the list of error codes, messages, and follow-up actions to take based on the configuration state observed.

Code Message Action
Unknown Unknown error
HostUnreachable The host machine is not reachable Check the Management network connectivity between Network Controller and Host
PAIpAddressExhausted The PA IP addresses are exhausted Increase the HNV Provider logical subnet's IP Pool Size
PAMacAddressExhausted The PA MAC addresses are exhausted Increase the MAC Pool Range
PAAddressConfigurationFailure Failed to plumb PA addresses to the host Check the Management network connectivity between Network Controller and Host.
CertificateNotTrusted Certificate is not trusted Fix the certificates used for communication with the host.
CertificateNotAuthorized Certificate not authorized Fix the certificates used for communication with the host.
PolicyConfigurationFailureOnVfp Failure in configuring VFP policies This is a runtime failure. No definite workarounds. Collect logs.
PolicyConfigurationFailure Failure in pushing policies to the hosts, due to communication failures or other errors in the Network Controller. No definite actions. This is due to a failure in goal state processing in the Network Controller modules. Collect logs.
HostNotConnectedToController The Host is not yet connected to the Network Controller Port Profile not applied on the host or the host is not reachable from the Network Controller. Validate that HostID registry key matches the Instance ID of the server resource
MultipleVfpEnabledSwitches There are multiple VFP-enabled switches on the host Delete one of the switches, since the Network Controller Host Agent only supports one vSwitch with the VFP extension enabled
PolicyConfigurationFailure Failed to push VNet policies for a VmNic due to certificate errors or connectivity errors Check if proper certificates have been deployed (Certificate subject name must match FQDN of host). Also verify the host connectivity with the Network Controller
PolicyConfigurationFailure Failed to push vSwitch policies for a VmNic due to certificate errors or connectivity errors Check if proper certificates have been deployed (Certificate subject name must match FQDN of host). Also verify the host connectivity with the Network Controller
PolicyConfigurationFailure Failed to push Firewall policies for a VmNic due to certificate errors or connectivity errors Check if proper certificates have been deployed (Certificate subject name must match FQDN of host). Also verify the host connectivity with the Network Controller
DistributedRouterConfigurationFailure Failed to configure the Distributed router settings on the host vNic TCPIP stack error. May require cleaning up the PA and DR Host vNICs on the server on which this error was reported
DhcpAddressAllocationFailure DHCP address allocation failed for a VMNic Check if the static IP address attribute is configured on the NIC resource
CertificateNotTrusted, CertificateNotAuthorized Failed to connect to Mux due to network or cert errors Check the numeric code provided in the error message code: this corresponds to the winsock error code. Certificate errors are granular (for example, cert cannot be verified, cert not authorized, etc.)
HostUnreachable MUX is Unhealthy (Common case is BGPRouter disconnected) BGP peer on the RRAS (BGP virtual machine) or Top-of-Rack (ToR) switch is unreachable or not peering successfully. Check BGP settings on both Software Load Balancer Multiplexer resource and BGP peer (ToR or RRAS virtual machine)
HostNotConnectedToController SLB host agent is not connected Check that SLB Host Agent service is running; Refer to SLB host agent logs (auto running) for reasons why, in case SLBM (NC) rejected the cert presented by the host agent running state will show nuanced information
PortBlocked The VFP port is blocked, due to lack of VNET / ACL policies Check if there are any other errors, which might cause the policies to be not configured.
Overloaded Loadbalancer MUX is overloaded Performance issue with MUX
RoutePublicationFailure Loadbalancer MUX is not connected to a BGP router Check if the MUX has connectivity with the BGP routers and that BGP peering is setup correctly
VirtualServerUnreachable Loadbalancer MUX is not connected to SLB manager Check connectivity between SLBM and MUX
QosConfigurationFailure Failed to configure QOS policies See if sufficient bandwidth is available for all VMs if QOS reservation is used

Check network connectivity between the network controller and Hyper-V Host (NC Host Agent service)

Run the netstat command below to validate that there are three ESTABLISHED connections between the NC Host Agent and the Network Controller node(s) and one LISTENING socket on the Hyper-V Host
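
The command (it also appears in later steps of this guide) is:

  netstat -anp tcp | findstr 6640

The expected entries on the host are: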

  • LISTENING on port TCP:6640 on Hyper-V Host (NC Host Agent Service)
  • Two established connections from Hyper-V host IP on port 6640 to NC node IP on ephemeral ports (> 32000)
  • One established connection from Hyper-V host IP on ephemeral port to Network Controller REST IP on port 6640

There may only be two established connections on a Hyper-V host if there are no tenant virtual machines deployed on that particular host.

Check Host Agent services

The network controller communicates with two host agent services on the Hyper-V hosts: SLB Host Agent and NC Host Agent. It is possible that one or both of these services is not running. Check their state and restart if they’re not running.
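
A quick sketch (the service names NCHostAgent and SlbHostAgent are those used in typical SDN deployments; adjust if yours differ):

  Get-Service NCHostAgent, SlbHostAgent
  Start-Service NCHostAgent    # if stopped; likewise for SlbHostAgent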

Check health of network controller

If there are not three ESTABLISHED connections or if the Network Controller appears unresponsive, check to see that all nodes and service modules are up and running by using the following cmdlets.
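
A sketch of those checks (Get-NetworkControllerReplica and Debug-ServiceFabricNodeStatus, noted earlier, must be run from one of the Network Controller node virtual machines; Get-NetworkControllerNode is assumed here for listing node status):

  Get-NetworkControllerNode
  Get-NetworkControllerReplica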

The network controller service modules are:

  • ControllerService
  • ApiService
  • SlbManagerService
  • ServiceInsertion
  • FirewallService
  • VSwitchService
  • GatewayManager
  • FnmService
  • HelperService
  • UpdateService

Check that the Replica Status is Ready for each service.

Check for corresponding HostIDs and certificates between network controller and each Hyper-V Host

On a Hyper-V Host, run the following commands to check that the HostID corresponds to the Instance Id of a server resource on the Network Controller
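
A sketch of the comparison (the registry path is the one used by typical SDN deployments, and the ConnectionUri value is a placeholder for your Network Controller REST endpoint):

  # On the Hyper-V host
  Get-ItemProperty "HKLM:\SYSTEM\CurrentControlSet\Services\NcHostAgent\Parameters" -Name HostId

  # Against the Network Controller
  Get-NetworkControllerServer -ConnectionUri <Network Controller REST URI> | Select-Object InstanceId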

Remediation: If using SDNExpress scripts or manual deployment, update the HostId key in the registry to match the Instance Id of the server resource, and then restart the Network Controller Host Agent on the Hyper-V host (physical server). If using VMM, delete the Hyper-V server from VMM and remove the HostId registry key; then re-add the server through VMM.

Check that the thumbprints of the X.509 certificates used by the Hyper-V host (the hostname will be the cert's Subject Name) for (SouthBound) communication between the Hyper-V Host (NC Host Agent service) and Network Controller nodes are the same. Also check that the Network Controller's REST certificate has a subject name of CN=<Network Controller REST FQDN or IP>.

You can also check the following parameters of each cert to make sure the subject name is what is expected (hostname or NC REST FQDN or IP), the certificate has not yet expired, and that all certificate authorities in the certificate chain are included in the trusted root authority.

  • Subject Name
  • Expiration Date
  • Trusted by Root Authority
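
One way to inspect these properties on the host (local machine personal certificate store):

  Get-ChildItem Cert:\LocalMachine\My | Select-Object Subject, Thumbprint, NotAfter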

Remediation: If multiple certificates have the same subject name on the Hyper-V host, the Network Controller Host Agent will randomly choose one to present to the Network Controller. This may not match the thumbprint of the server resource known to the Network Controller. In this case, delete one of the certificates with the same subject name on the Hyper-V host and then restart the Network Controller Host Agent service. If a connection still cannot be made, delete the other certificate with the same subject name on the Hyper-V host and delete the corresponding server resource in VMM. Then, re-create the server resource in VMM, which will generate a new X.509 certificate and install it on the Hyper-V host.

Check the SLB Configuration State

The SLB Configuration State can be determined as part of the output to the Debug-NetworkController cmdlet. This cmdlet will also output the current set of Network Controller resources in JSON files, all IP configurations from each Hyper-V host (server) and local network policy from Host Agent database tables.

Additional traces will be collected by default. To not collect traces, add the -IncludeTraces:$false parameter.

The default output location will be the \NCDiagnostics\ directory. The default output directory can be changed by using the -OutputDirectory parameter.
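
A sketch of the invocation (the -NetworkController parameter name is assumed by analogy with the configuration-state check earlier; the other parameters are those described above):

  Debug-NetworkController -NetworkController <FQDN or IP> -OutputDirectory C:\NCDiagnostics -IncludeTraces:$false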

The SLB Configuration State information can be found in the diagnostics-slbstateResults.Json file in this directory.

This JSON file can be broken down into the following sections:

  • Fabric
    • SlbmVips — This section lists the IP address of the SLB Manager VIP address which is used by the Network Controller to coordinate configuration and health between the SLB Muxes and SLB Host Agents.
    • MuxState — This section will list one value for each SLB Mux deployed giving the state of the mux
    • Router Configuration — This section will list the Upstream Router’s (BGP Peer) Autonomous System Number (ASN), Transit IP Address, and ID. It will also list the SLB Muxes ASN and Transit IP.
    • Connected Host Info — This section will list the Management IP address of all of the Hyper-V hosts available to run load-balanced workloads.
    • Vip Ranges — This section will list the public and private VIP IP pool ranges. The SLBM VIP will be included as an allocated IP from one of these ranges.
    • Mux Routes — This section will list one value for each SLB Mux deployed containing all of the Route Advertisements for that particular mux.
  • Tenant
    • VipConsolidatedState — This section will list the connectivity state for each Tenant VIP including advertised route prefix, Hyper-V Host and DIP endpoints.

SLB State can be ascertained directly by using the DumpSlbRestState script available on the Microsoft SDN GitHub repository.

Gateway Validation

From Network Controller:

From Gateway VM:

From Top of Rack (ToR) Switch:

sh ip bgp summary (for 3rd party BGP Routers)

Windows BGP Router
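
On a Windows Server (RRAS) BGP router, the equivalent checks would typically use the BGP cmdlets from the RemoteAccess module (Get-BgpPeer and Get-BgpRouteInformation are also referenced later in this guide; Get-BgpRouter is assumed to be available as well):

  Get-BgpRouter
  Get-BgpPeer
  Get-BgpRouteInformation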

In addition to these, from the issues we have seen so far (especially on SDNExpress-based deployments), the most common reason for the Tenant Compartment not getting configured on GW VMs seems to be that the GW Capacity in FabricConfig.psd1 is less than what people try to assign to the Network Connections (S2S Tunnels) in TenantConfig.psd1. This can be checked easily by comparing the outputs of the following commands:
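
A rough sketch of that comparison (the cmdlet names below are assumptions; the intent is to compare the configured gateway capacity with the bandwidth assigned to the tenant network connections):

  $uri = "https://<Network Controller REST FQDN>"
  Get-NetworkControllerGatewayPool -ConnectionUri $uri | ConvertTo-Json -Depth 8
  Get-NetworkControllerVirtualGateway -ConnectionUri $uri | ConvertTo-Json -Depth 8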

[Hoster] Validate Data-Plane

After the Network Controller has been deployed, tenant virtual networks and subnets have been created, and VMs have been attached to the virtual subnets, additional fabric level tests can be performed by the hoster to check tenant connectivity.

Check HNV Provider Logical Network Connectivity

After the first guest VM running on a Hyper-V host has been connected to a tenant virtual network, the Network Controller will assign two HNV Provider IP Addresses (PA IP Addresses) to the Hyper-V Host. These IPs will come from the HNV Provider logical network's IP Pool and be managed by the Network Controller. To find out what these two HNV IP Addresses are, run the Get-ProviderAddress cmdlet on the Hyper-V host.

These HNV Provider IP Addresses (PA IPs) are assigned to Ethernet Adapters created in a separate TCPIP network compartment and have an adapter name of VLANX where X is the VLAN assigned to the HNV Provider (transport) logical network.

Connectivity between two Hyper-V hosts using the HNV Provider logical network can be done by a ping with an additional compartment (-c Y) parameter where Y is the TCPIP network compartment in which the PAhostVNICs are created. This compartment can be determined by executing:
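
One way to list the compartments (and the VLANX adapters created in them) on the host is:

  ipconfig /allcompartments /all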

The PA Host vNIC Adapters are not used in the data-path and so do not have an IP assigned to the «vEthernet (PAhostVNic) adapter».

For instance, assume that Hyper-V hosts 1 and 2 have HNV Provider (PA) IP Addresses of:

Hyper-V Host PA IP Address 1 PA IP Address 2
Host 1 10.10.182.64 10.10.182.65
Host 2 10.10.182.66 10.10.182.67

we can ping between the two using the following command to check HNV Provider logical network connectivity.
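
For example, assuming the compartment check above showed the PA compartment on Host 1 to be compartment 3 (an illustrative value), run the following from Host 1 to reach one of Host 2's PA addresses:

  ping -c 3 10.10.182.66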

Remediation If HNV Provider ping does not work, check your physical network connectivity including VLAN configuration. The physical NICs on each Hyper-V host should be in trunk mode with no specific VLAN assigned. The Management Host vNIC should be isolated to the Management Logical Network’s VLAN.

Check MTU and Jumbo Frame support on HNV Provider Logical Network

Another common problem in the HNV Provider logical network is that the physical network ports and/or Ethernet card do not have a large enough MTU configured to handle the overhead from VXLAN (or NVGRE) encapsulation.

Some Ethernet cards and drivers support the new *EncapOverhead keyword, which will automatically be set by the Network Controller Host Agent to a value of 160. This value will then be added to the value of the *JumboPacket keyword, whose sum is used as the advertised MTU. For example, *EncapOverhead = 160 and *JumboPacket = 1514 gives MTU = 1674B.

To test whether or not the HNV Provider logical network supports the larger MTU size end-to-end, use the Test-LogicalNetworkSupportsJumboPacket cmdlet:
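
A sketch of the invocation (the parameter names shown are assumptions; check Get-Help Test-LogicalNetworkSupportsJumboPacket for the exact syntax in your build):

  Test-LogicalNetworkSupportsJumboPacket -SourceHost <Hyper-V host 1> -DestinationHost <Hyper-V host 2>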

If the test fails, remediation options include:

  • Adjust the MTU size on the physical switch ports to be at least 1674B (including the 14B Ethernet header and trailer)
  • If your NIC card does not support the EncapOverhead keyword, adjust the JumboPacket keyword to be at least 1674B

Check Tenant VM NIC connectivity

Each VM NIC assigned to a guest VM has a CA-PA mapping between the private Customer Address (CA) and the HNV Provider Address (PA) space. These mappings are kept in the OVSDB server tables on each Hyper-V host and can be found by executing the following cmdlet.
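
The cmdlet in question is Get-PACAMapping (listed earlier under Hyper-V host diagnostics); run it locally on the Hyper-V host:

  Get-PACAMapping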

If the CA-PA mappings you expect are not output for a given tenant VM, please check the VM NIC and IP Configuration resources on the Network Controller using the Get-NetworkControllerNetworkInterface cmdlet. Also, check the established connections between the NC Host Agent and Network Controller nodes.

With this information, a tenant VM ping can now be initiated by the Hoster from the Network Controller using the Test-VirtualNetworkConnection cmdlet.

Specific Troubleshooting Scenarios

The following sections provide guidance for troubleshooting specific scenarios.

No network connectivity between two tenant virtual machines

  1. [Tenant] Ensure Windows Firewall in tenant virtual machines is not blocking traffic.
  2. [Tenant] Check that IP addresses have been assigned to the tenant virtual machine by running ipconfig.
  3. [Hoster] Run Test-VirtualNetworkConnection from the Hyper-V host to validate connectivity between the two tenant virtual machines in question.

The VSID refers to the Virtual Subnet ID. In the case of VXLAN, this is the VXLAN Network Identifier (VNI). You can find this value by running the Get-PACAMapping cmdlet.

Example

Create CA-ping between «Green Web VM 1» with SenderCA IP of 192.168.1.4 on Host «sa18n30-2.sa18.nttest.microsoft.com» with Mgmt IP of 10.127.132.153 to ListenerCA IP of 192.168.1.5 both attached to Virtual Subnet (VSID) 4114.
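
A hypothetical sketch of this check, run from the Network Controller (the parameter names below are assumptions mapping the values in the sentence above onto Test-VirtualNetworkConnection; consult Get-Help Test-VirtualNetworkConnection for the exact syntax in your build):

  Test-VirtualNetworkConnection -HostName sa18n30-2.sa18.nttest.microsoft.com -MgmtIp 10.127.132.153 `
      -Creds (Get-Credential) -VMName "Green Web VM 1" `
      -SenderCAIP 192.168.1.4 -ListenerCAIP 192.168.1.5 -SenderVSID 4114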

  1. [Tenant] Check that there are no distributed firewall policies specified on the virtual subnet or VM network interfaces which would block traffic.

Query the Network Controller REST API, found in the demo environment at sa18n30nc in the sa18.nttest.microsoft.com domain.
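
A sketch of such a query against that endpoint (Get-NetworkControllerAccessControlList is the cmdlet for ACL resources; the URI is the demo endpoint named above):

  $uri = "https://sa18n30nc.sa18.nttest.microsoft.com"
  Get-NetworkControllerAccessControlList -ConnectionUri $uri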

Look at IP Configuration and Virtual Subnets which are referencing this ACL

  1. [Hoster] Run Get-ProviderAddress on both Hyper-V hosts hosting the two tenant virtual machines in question and then run Test-LogicalNetworkConnection or ping -c <compartment> from the Hyper-V host to validate connectivity on the HNV Provider logical network
  2. [Hoster] Ensure that the MTU settings are correct on the Hyper-V hosts and any Layer-2 switching devices in between the Hyper-V Hosts. Run Test-EncapOverheadValue on all Hyper-V hosts in question. Also check that all Layer-2 switches in between have MTU set to at least 1674 bytes to account for the maximum overhead of 160 bytes.
  3. [Hoster] If PA IP Addresses are not present and/or CA Connectivity is broken, check to ensure network policy has been received. Run Get-PACAMapping to see if the encapsulation rules and CA-PA mappings required for creating overlay virtual networks are correctly established.
  4. [Hoster] Check that the Network Controller Host Agent is connected to the Network Controller. Run netstat -anp tcp |findstr 6640 to see if the NC Host Agent has established connections to the Network Controller nodes.
  5. [Hoster] Check that the Host ID in HKLM/ matches the Instance ID of the server resources hosting the tenant virtual machines.
  6. [Hoster] Check that the Port Profile ID matches the Instance ID of the VM Network Interfaces of the tenant virtual machines.

Logging, Tracing and advanced diagnostics

The following sections provide information on advanced diagnostics, logging, and tracing.

Network controller centralized logging

The Network Controller can automatically collect debugger logs and store them in a centralized location. Log collection can be enabled when you deploy the Network Controller for the first time or any time later. The logs are collected from the Network Controller, and network elements managed by Network Controller: host machines, software load balancers (SLB) and gateway machines.

These logs include debug logs for the Network Controller cluster, the Network Controller application, gateway logs, SLB, virtual networking and the distributed firewall. Whenever a new host/SLB/gateway is added to the Network Controller, logging is started on those machines. Similarly, when a host/SLB/gateway is removed from the Network Controller, logging is stopped on those machines.

Enable logging

Logging is automatically enabled when you install the Network Controller cluster using the Install-NetworkControllerCluster cmdlet. By default, the logs are collected locally on the Network Controller nodes at %systemdrive%\SDNDiagnostics. It is STRONGLY RECOMMENDED that you change this location to be a remote file share (not local).

The Network Controller cluster logs are stored at %programData%\Windows Fabric\log\Traces. You can specify a centralized location for log collection with the DiagnosticLogLocation parameter, with the recommendation that this also be a remote file share.

If you want to restrict access to this location, you can provide the access credentials with the LogLocationCredential parameter. If you provide the credentials to access the log location, you should also provide the CredentialEncryptionCertificate parameter, which is used to encrypt the credentials stored locally on the Network Controller nodes.

With the default settings, it is recommended that you have at least 75 GB of free space in the central location, and 25 GB on the local nodes (if not using a central location) for a 3-node Network Controller cluster.

Change logging settings

You can change logging settings at any time using the Set-NetworkControllerDiagnostic cmdlet. The following settings can be changed:

  • Centralized log location. You can change the location to store all the logs, with the DiagnosticLogLocation parameter.
  • Credentials to access log location. You can change the credentials to access the log location, with the LogLocationCredential parameter.
  • Move to local logging. If you have provided centralized location to store logs, you can move back to logging locally on the Network Controller nodes with the UseLocalLogLocation parameter (not recommended due to large disk space requirements).
  • Logging scope. By default, all logs are collected. You can change the scope to collect only Network Controller cluster logs.
  • Logging level. The default logging level is Informational. You can change it to Error, Warning, or Verbose.
  • Log Aging time. The logs are stored in a circular fashion. You will have 3 days of logging data by default, whether you use local logging or centralized logging. You can change this time limit with LogTimeLimitInDays parameter.
  • Log Aging size. By default, you will have a maximum 75 GB of logging data if using centralized logging and 25 GB if using local logging. You can change this limit with the LogSizeLimitInMBs parameter.
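
For example, a sketch combining several of the parameters above (the share path and limits are illustrative only):

  $cred = Get-Credential
  Set-NetworkControllerDiagnostic -DiagnosticLogLocation "\\fileshare\SDNDiagnostics" `
      -LogLocationCredential $cred -LogTimeLimitInDays 3 -LogSizeLimitInMBs 76800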

Collecting Logs and Traces

VMM deployments use centralized logging for the Network Controller by default. The file share location for these logs is specified when deploying the Network Controller service template.

If a file location has not been specified, local logging will be used on each Network Controller node with logs saved under C:\Windows\tracing\SDNDiagnostics. These logs are saved using the following hierarchy:

  • CrashDumps
  • NCApplicationCrashDumps
  • NCApplicationLogs
  • PerfCounters
  • SDNDiagnostics
  • Traces

The Network Controller uses (Azure) Service Fabric. Service Fabric logs may be required when troubleshooting certain issues. These logs can be found on each Network Controller node at C:\ProgramData\Microsoft\Service Fabric.

If a user has run the Debug-NetworkController cmdlet, additional logs will be available on each Hyper-V host which has been specified with a server resource in the Network Controller. These logs (and traces if enabled) are kept under C:\NCDiagnostics

SLB Diagnostics

SLBM Fabric errors (Hosting service provider actions)

  1. Check that Software Load Balancer Manager (SLBM) is functioning and that the orchestration layers can talk to each other: SLBM -> SLB Mux and SLBM -> SLB Host Agents. Run DumpSlbRestState from any node with access to Network Controller REST Endpoint.
  2. Validate the SDNSLBMPerfCounters in PerfMon on one of the Network Controller node VMs (preferably the primary Network Controller node — Get-NetworkControllerReplica):
    1. Is Load Balancer (LB) engine connected to SLBM? (SLBM LBEngine Configurations Total > 0)
    2. Does SLBM at least know about its own endpoints? (VIP Endpoints Total >= 2 )
    3. Are Hyper-V (DIP) hosts connected to SLBM? (HP clients connected == num servers)
    4. Is SLBM connected to Muxes? (Muxes Connected == Muxes Healthy on SLBM == Muxes reporting healthy = # SLB Muxes VMs).
  3. Ensure the BGP router configured is successfully peering with the SLB MUX
    1. If using RRAS with Remote Access (i.e. BGP virtual machine):
      1. Get-BgpPeer should show connected
      2. Get-BgpRouteInformation should show at least a route for the SLBM self VIP
    2. If using physical Top-of-Rack (ToR) switch as BGP Peer, consult your documentation
      1. For example: # show bgp instance
  4. Validate the SlbMuxPerfCounters and SLBMUX counters in PerfMon on the SLB Mux VM
  5. Check configuration state and VIP ranges in Software Load Balancer Manager Resource
    1. Get-NetworkControllerLoadBalancerConfiguration -ConnectionUri <Network Controller REST URI>
  6. Check HostId in nchostagent service regkey (reference HostNotConnected error code in the Appendix) matches the corresponding server resource’s instance Id ( Get-NCServer |convertto-json -depth 8 )
  7. Check port profile id for virtual machine port matches corresponding virtual machine NIC resource’s Instance Id
  • [Hosting provider] Collect logs

SLB Mux Tracing

Information from the Software Load Balancer Muxes can also be determined through Event Viewer.

  1. Click Show Analytic and Debug Logs under the Event Viewer View menu.
  2. Navigate to Applications and Services Logs > Microsoft > Windows > SlbMuxDriver > Trace in Event Viewer.
  3. Right-click it and select Enable Log.

It is recommended that you only have this logging enabled for a short time while you are trying to reproduce a problem.

VFP and vSwitch Tracing

From any Hyper-V host which is hosting a guest VM attached to a tenant virtual network, you can collect a VFP trace to determine where problems might lie.
