Cluster Verification Tools & Wizards

Posted by Nageswari Vijayakumar | Posted on 1:51 AM

What is cluster validation?
With the cluster validation wizard, you can run a set of focused tests on a collection of servers that are planned for use as cluster nodes. The cluster validation process tests the underlying hardware and software directly, and individually, to obtain an accurate assessment of how well failover clustering can be supported on a given configuration.
Important
Before you create a failover cluster, we strongly recommend that you run all tests in the cluster validation wizard.

Cluster validation is intended to catch hardware or configuration problems before the cluster goes into production. Cluster validation helps to ensure that the solution you are about to deploy is truly dependable. Cluster validation can also be performed on configured failover clusters as a diagnostic tool.
Considerations for performing cluster validation on an existing cluster
When you perform cluster validation on an already configured cluster, you might not always run all tests. If you include storage tests in the set of tests you run, there are different considerations to keep in mind than if you do not include storage tests. This section outlines the main considerations:
• Considerations when including storage tests: When cluster validation is performed on an already configured cluster, if the default tests (which include storage tests) are selected, only disk resources that are in an Offline state or are not assigned to a clustered service or application will be used for testing the storage. This builds in a safety mechanism, and the cluster validation wizard warns you when storage tests have been selected but will not run on storage in an Online state, that is, storage used by clustered services or applications. This is by design to avoid disruption to highly available services or applications that depend upon these disk resources being online.

One scenario where Microsoft Customer Service and Support (CSS) may request you to run the validation on a production cluster is a cluster storage failure that could be caused by an underlying storage configuration change or failure. By default, the wizard warns you if storage tests have been selected but will not be run on storage that is online, that is, storage used by clustered services or applications. In this situation, you can perform a valid test by creating or choosing a new logical unit number (LUN) from the same shared storage device and presenting it to all nodes. By testing this LUN, you can avoid disruption to clustered services and applications already online within the cluster and still test the underlying storage subsystem.
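The disk selection rule described above can be sketched in Python. This is only an illustration of the rule, not the wizard's implementation; the DiskResource class, its field names, and the sample disks are hypothetical.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class DiskResource:
    """Hypothetical model of a cluster disk resource (names are assumptions)."""
    name: str
    state: str                   # "Online" or "Offline"
    assigned_to: Optional[str]   # clustered service or application, or None


def disks_eligible_for_storage_tests(disks):
    """Keep only disks that are Offline or not assigned to a clustered
    service or application, mirroring the rule described in the text."""
    return [d for d in disks if d.state == "Offline" or d.assigned_to is None]


disks = [
    DiskResource("Disk 1", "Online", "FileServer"),  # in use: skipped
    DiskResource("Disk 2", "Offline", "SQL"),        # offline: eligible
    DiskResource("Test LUN", "Online", None),        # unassigned: eligible
]
print([d.name for d in disks_eligible_for_storage_tests(disks)])
# prints ['Disk 2', 'Test LUN']
```

The unassigned "Test LUN" entry corresponds to the new LUN presented to all nodes so that the storage subsystem can be tested without touching online storage.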

If a failover cluster passed the full set of validation tests and has no future hardware or software changes, then it will continue to be a supported configuration. However, when you perform routine updates to software components such as drivers and firmware, it may be necessary to re-run the validation wizard to ensure that the current configuration of the failover cluster is supported. The following guidelines can help in this process:
    • All components of the storage stack must be identical across all nodes in the cluster. These components consist of the HBA and HBA drivers and firmware, multi-path I/O software, and Device Specific Module (DSM) components.
    • To minimize impact to highly available applications and services, a best practice is to keep a small LUN available to allow the validation wizard to run tests on available storage without negatively impacting clustered services and applications. This way, if Microsoft CSS requests you to run a full set of cluster validation tests, the wizard will follow the default behavior and run tests on the available storage (the new LUN only).
• Considerations when not including storage tests: System configuration tests, inventory tests, and network tests have very low overhead, and can be performed without significant effect on servers in a cluster.
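The guideline above that all storage stack components must be identical across nodes can be pre-checked from an inventory you collect yourself. The following is a minimal Python sketch; the inventory layout, node names, component keys, and version strings are all hypothetical, and it does not replace the wizard's own checks.

```python
def storage_stack_mismatches(inventory):
    """Compare storage stack components across nodes.

    inventory: {node name: {component name: version string}}
    Returns {component: {node: version or None}} for every component
    whose version is not identical on all nodes.
    """
    components = set()
    for versions in inventory.values():
        components.update(versions)

    mismatches = {}
    for component in sorted(components):
        per_node = {node: versions.get(component)
                    for node, versions in inventory.items()}
        if len(set(per_node.values())) > 1:
            mismatches[component] = per_node
    return mismatches


# Hypothetical inventory for a two-node cluster.
inventory = {
    "NODE1": {"HBA driver": "1.2.0", "HBA firmware": "5.0", "MPIO DSM": "2.1"},
    "NODE2": {"HBA driver": "1.3.0", "HBA firmware": "5.0", "MPIO DSM": "2.1"},
}
print(storage_stack_mismatches(inventory))
# prints {'HBA driver': {'NODE1': '1.2.0', 'NODE2': '1.3.0'}}
```

A component missing on one node is reported with None for that node, so it also counts as a mismatch.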

Microsoft CSS may request you to run the cluster validation on a production cluster as part of normal troubleshooting procedures (not focused on storage). In this scenario, you will use the wizard to inventory hardware and software, perform network testing, and validate system configuration. There may be certain scenarios in which only a subset of the tests is needed. For example, if troubleshooting a problem with networking on a production cluster, Microsoft CSS may request that you run only the hardware and software inventory and the network tests.
How to provide a validation report when obtaining support from Microsoft
Microsoft will help you collect the validation report through the Microsoft Support Diagnostic Tool (MSDT), which is the replacement for the MPSReports data collection utility. Microsoft CSS will send the MSDT via e-mail with instructions on how to capture the data. In some situations, Microsoft CSS may request that the contents of the C:\Windows\Cluster\Reports folder be zipped and sent in for analysis. Either method will collect the required cluster validation report.
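If you are asked to zip the Reports folder, any archiving tool will do. As one possible approach, here is a small Python sketch that uses only the standard library. The function names are hypothetical, and the fallback path C:\Windows is an assumption for systems where the SystemRoot environment variable is not set.

```python
import os
import shutil
from pathlib import Path


def default_reports_dir():
    """Return <SystemRoot>\\Cluster\\Reports, assuming C:\\Windows when the
    SystemRoot environment variable is not available."""
    system_root = os.environ.get("SystemRoot", r"C:\Windows")
    return Path(system_root) / "Cluster" / "Reports"


def zip_cluster_reports(reports_dir, archive_base):
    """Zip everything under reports_dir and return the archive path.

    archive_base is the output path without the .zip extension. Keep it
    outside reports_dir so the archive does not try to include itself.
    """
    return shutil.make_archive(str(archive_base), "zip", root_dir=str(reports_dir))
```

On a cluster node, a call such as zip_cluster_reports(default_reports_dir(), Path.home() / "cluster-reports") would write cluster-reports.zip to your profile folder, ready to send in for analysis.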
How to run the cluster validation wizard for a failover cluster
To validate a new or existing failover cluster
1. Identify the server or servers that you want to test and confirm that the failover cluster feature is installed:
• If the cluster does not yet exist, choose the servers that you want to include in the cluster, and make sure you have installed the failover cluster feature on those servers. To install the feature, on a server running Windows Server 2008 or Windows Server 2008 R2, click Start, click Administrative Tools, click Server Manager, and under Features Summary, click Add Features. Use the Add Features wizard to add the Failover Clustering feature.
• If the cluster already exists, make sure that you know the name of the cluster or a node in the cluster.
2. Review network or storage hardware that you want to validate, to confirm that it is connected to the servers. For more information, see http://go.microsoft.com/fwlink/?LinkId=111555.
3. Decide whether you want to run all or only some of the available validation tests. For detailed information about the tests, see the topics listed in http://go.microsoft.com/fwlink/?LinkId=111554.
The following guidelines can help you decide whether to run all tests:
• For a planned cluster with all hardware connected: Run all tests.
• For a planned cluster with parts of the hardware connected: Run System Configuration tests, Inventory tests, and tests that apply to the hardware that is connected (that is, Network tests if the network is connected or Storage tests if the storage is connected).
• For a cluster to which you plan to add a server: Run all tests. Before you run them, be sure to connect the networks and storage for all servers that you plan to have in the cluster.
• For troubleshooting an existing cluster: If you are troubleshooting an existing cluster, you might run all tests, although you could run only the tests that relate to the apparent issue.
Important
If a clustered service or application is using a disk when you start the wizard, the wizard will prompt you about whether to take that clustered service or application offline for the purposes of testing. If you choose to take a clustered service or application offline, it will remain offline until the tests finish.
4. In the failover cluster snap-in, in the console tree, make sure Failover Cluster Management is selected and then, under Management, click Validate a Configuration.

5. Follow the instructions in the wizard to specify the servers and the tests, and run the tests.
Note that when you run the cluster validation wizard on unclustered servers, you must enter the names of all the servers you want to test, not just one.
The Summary page appears after the tests run.
6. While still on the Summary page, click View Report to view the test results.
To view the results of the tests after you close the wizard, see SystemRoot\Cluster\Reports\Validation Report date and time.html where SystemRoot is the folder in which the operating system is installed (for example, C:\Windows).
7. To view Help topics that will help you interpret the results, click More about cluster validation tests.
To view Help topics about cluster validation after you close the wizard, in the failover cluster snap-in, click Help, click Help Topics, click the Contents tab, expand the contents for the failover cluster Help, and click Validating a Failover Cluster Configuration.
Understanding validation results
After the validation wizard has completed, the Summary Report will display the results. All tests must pass with either a green check mark or in some cases a yellow triangle (warning). The following table shows the symbols in the summary and tells what they mean:

Symbol: Meaning
• Green check mark: The corresponding validation test passed, indicating that this aspect of the cluster can be supported.
• Yellow triangle: The corresponding validation test produced a warning, indicating that this aspect of the cluster can be supported, but it might not meet the recommended best practices and should be reviewed. Microsoft CSS might ask you to investigate or address the problem if it appears to be directly linked to the issue that you are troubleshooting.
• Red X: The corresponding validation test failed, and this aspect of the cluster is not supported. You must correct the problem before you can create a failover cluster that is supported.
• Canceled: The corresponding validation test was canceled. This can occur when the test depended on another test that did not complete successfully.
When looking for problem areas (red Xs or yellow triangles), in the part of the report that summarizes the test results, click an individual test to review the details. Also review the summary statement for information about whether or not the cluster is a supported configuration.
After you take action to correct the problem, you can rerun the wizard as needed to confirm that the configuration passes the tests.
What to do if validation tests fail
In most cases, if any tests in the cluster validation wizard fail, then Microsoft does not consider the solution to be supported. There are exceptions to this rule, such as the case with multi-site (geographically dispersed) clusters where there is no shared storage. In this scenario the expected result of the validation wizard is that the storage tests will fail. This is still a supported solution if the remainder of the tests complete successfully.
The type of test that fails is a guideline to the corrective action to take. For example, if the storage test "List all disks" fails, and subsequent storage tests do not run (because these would also fail), contact the storage vendor to troubleshoot. Similarly, if a network test related to IP addresses fails, consult with your network infrastructure team. Not all warnings or errors indicate a need to call Microsoft CSS. Most of the warnings or errors should result in working with internal teams or with a specific hardware vendor.
For information about correcting failures listed in a validation report, see the previous section, Understanding validation results.
After the issues have been resolved, re-run the cluster validation wizard. For the configuration to be supported, all tests must run and complete without failures.
Multi-site or geographically dispersed clusters
Failover cluster solutions that do not have a common shared disk and instead leverage data replication between nodes might not pass the cluster validation "storage" tests. This is a common configuration in cluster solutions where nodes are stretched across geographic regions. If a cluster solution does not require external storage to fail over from one node to another, it does not need to pass the "storage" tests to be a fully supported solution.
For more information on multi-site or geographically dispersed clusters, see the following whitepaper (http://go.microsoft.com/fwlink/?LinkId=112125).
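The support rule, including the storage exception for clusters without shared storage, can be expressed as a short check. This Python sketch is a simplified reading of the text above, not an official support determination; the function name, the (category, status) tuples, and the status strings are hypothetical.

```python
ACCEPTABLE = {"passed", "warning"}


def is_supported_configuration(results, uses_shared_storage=True):
    """results: iterable of (test category, status) tuples.

    Every test must pass (a warning is acceptable). When the cluster does
    not use shared storage, as in a multi-site cluster that replicates
    data between nodes, storage test outcomes are ignored.
    """
    for category, status in results:
        if category == "Storage" and not uses_shared_storage:
            continue
        if status not in ACCEPTABLE:
            return False
    return True


multi_site = [
    ("Inventory", "passed"),
    ("Network", "warning"),
    ("System Configuration", "passed"),
    ("Storage", "failed"),
]
print(is_supported_configuration(multi_site))                             # prints False
print(is_supported_configuration(multi_site, uses_shared_storage=False))  # prints True
```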
Logos for Windows Server 2008 and Windows Server 2008 R2
Designed for line-of-business and mission-critical applications, the "Certified for Windows Server 2008" and "Certified for Windows Server 2008 R2" logos indicate that the application or hardware has been independently tested to meet the highest bar for stability, security, reliability, availability, Windows operating system fundamentals, and platform compatibility.
Hardware components that can run Windows Server 2008, Windows Server 2008 R2, or both, are eligible to receive the corresponding logo or logos. A logo covers each of the individual server hardware components such as the host bus adapter (HBA) or network adapter, and each associated driver or firmware revision is eligible for the appropriate logo. Components such as routers, hubs, or switches are not eligible to receive a logo.
Specific validation scenarios
The following lists describe scenarios in which validation is needed or useful.
• Validation before the cluster is configured:
    • A set of servers ready to become a failover cluster: This is the most straightforward validation scenario. The hardware components (systems, networks, and storage) are connected, but the systems are not functioning as a cluster. Running tests in this situation has no impact on availability.
    • Cloned or imaged systems: With systems that you have cloned or imaged to different hardware, you must run the cluster validation wizard as you would with any other new cluster. We recommend that you run the wizard just after you connect the hardware components and install the failover cluster feature, before the cluster begins being used by clients.
    • Virtualized servers: With virtualized servers in a cluster, run the cluster validation wizard as you would with any other new cluster. The requirement for running the wizard is the same regardless of whether you have a "host cluster" (where failover will occur between two physical computers), a "guest cluster" (where failover will occur between guest operating systems all on the same physical computer), or some other configuration that includes one or more virtualized servers.
• Validation when the cluster has only one node: You might want to run a limited number of validation tests on a single server that you intend to use in a cluster. Some tests cannot be run in this situation: tests that confirm that the software and software updates match between servers, and storage tests that simulate failover between nodes. When you bring one or more servers into the configuration, you must run the cluster validation wizard again so that all tests can complete. In other words, you must have at least two nodes in a cluster before you can complete the cluster validation process.
• Validation after the cluster is configured and in use:
    • For confirmation that the cluster is supported, or to rule out configuration problems: If you need support and it is necessary to rule out configuration problems with hardware, drivers, and basic system configuration, Microsoft CSS might require you to provide the report from the cluster validation wizard. If you have not already run the wizard and saved the report, you might need to take the cluster offline to run the wizard. The report shows whether your configuration is supported and can help with troubleshooting the issues on the cluster.
    • Before adding a node: When you add a server to a cluster, we strongly recommend that you start by connecting the server to the cluster networks and storage and then run the cluster validation wizard, specifying both the existing cluster nodes and the new node. With some advance planning, running validation before adding a server can have relatively little impact on cluster availability. The network tests and system inventory tests have little or no impact on availability. For the storage tests, if you make a small, unused LUN available (as described earlier in this document in What is cluster validation?), the impact on availability is also small.
    • When attaching new storage: When you attach new storage to the cluster (different from exposing a new LUN in existing storage), you must run the cluster validation wizard to confirm that the new storage will function correctly. To minimize the impacts to availability, we recommend that you run the wizard after attaching the storage but before beginning to use any of the new LUNs in clustered services or applications.
    • When making changes that affect firmware or drivers: If you want to upgrade or make other changes to the cluster that would require changing the firmware or drivers, you must run the cluster validation wizard to confirm that the new combination of hardware, firmware, drivers, and software supports failover cluster functionality. If the change affects firmware or drivers for the storage, we recommend that you keep a small LUN available (unused by clustered services and applications) so that you can run the storage validation tests without taking your services and applications offline.
    • After restoring a system from backup: After you restore a system from backup, run the cluster validation wizard to confirm that the system can function correctly as part of a cluster. The system is not considered a supported system until the validation tests are run.
Understanding the validation tests required for your scenario
You do not always need to run all tests in the cluster validation wizard when making a change to your cluster. This section lists the kinds of changes you might make to a cluster and the corresponding tests to run.
Important
To begin the process of adding hardware (such as an additional server) to a failover cluster, connect the hardware to the failover cluster. Then run the cluster validation wizard, specifying all servers that you want to include in the cluster. The wizard tests cluster connectivity and failover, not just isolated components (such as individual servers).
Categories of validation tests
• Full: The complete set of tests. This requires some cluster downtime.
• Single LUN: The complete set of tests, where you run the storage tests on only one LUN. The LUN might be a small LUN that you set aside for testing purposes, or the witness disk (if your cluster uses a witness disk). This validates the storage subsystem, but not specifically each individual LUN or disk. You can run these validation tests without causing downtime to your clustered services or applications.
• Omit storage tests: The system configuration, inventory, and network tests, but not the storage tests. You can run these validation tests without causing downtime to your clustered services or applications.
• None: No validation tests are needed.
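The four categories above can be captured in a small lookup table, which is handy when planning a maintenance window. This Python sketch simply restates the list; the class and field names are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ValidationCategory:
    test_groups: tuple       # test groups that are run
    storage_scope: str       # which storage the storage tests touch
    causes_downtime: bool    # whether clustered services go offline


NON_STORAGE = ("System Configuration", "Inventory", "Network")

CATEGORIES = {
    "Full": ValidationCategory(NON_STORAGE + ("Storage",), "all LUNs", True),
    "Single LUN": ValidationCategory(NON_STORAGE + ("Storage",), "one LUN", False),
    "Omit storage tests": ValidationCategory(NON_STORAGE, "none", False),
    "None": ValidationCategory((), "none", False),
}


def categories_without_downtime():
    """Return the category names that can be run without taking clustered
    services or applications offline."""
    return [name for name, c in CATEGORIES.items() if not c.causes_downtime]


print(categories_without_downtime())
# prints ['Single LUN', 'Omit storage tests', 'None']
```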
