Wednesday, March 31, 2010

Multiple Path I/O (MPIO) for iSCSI storage

To ensure server-to-storage path continuity, we usually deploy redundant physical path components, including NICs (for iSCSI) and HBAs (for FC, SCSI, etc.). If one or more of these components fails, taking a path down, the multipathing logic redirects I/O to an alternate path so that the servers and applications can still access their data.

Each storage vendor may offer its own Device Specific Module (DSM) solution, such as PowerPath for EMC. If you have no budget to purchase a vendor-specific MPIO module, Microsoft provides a generic DSM for free. I tried it on my W2K8 R2 server with Hyper-V VMs and a Dell-EMC AX4-5i storage array, using Round Robin to load-balance across the four iSCSI paths. If you are also using W2K8, install MPIO as a feature using Server Manager, or the "Add-WindowsFeature" cmdlet on Server Core.
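
On R2, the feature can also be added from PowerShell (a minimal sketch; "Multipath-IO" is the feature name as I recall it - confirm with Get-WindowsFeature on your build):

Import-Module ServerManager
Add-WindowsFeature Multipath-IO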


Click on Administrative Tools -> MPIO and check "Add Support for iSCSI devices" on the "Discover Multi-Paths" tab. Reboot the server. On Server Core, you may run:
mpclaim -r -i -d "MSFT2005iSCSIBusType_0x9"


Invoke the iSCSI Initiator and connect to the iSCSI targets - make sure the "Enable multi-path" option is checked. On Server Core, you may invoke the same initiator using the "iscsicpl.exe" command.



Click "Advanced", connect using different path each time. Repeat and rinse for the number of different paths that you have. Click on "Devices" -> "MPIO". And you should the load balancing policy and the multiple paths linking to the Disk devices.



 For a more complete Server Core configuration, see "MPIO with Windows 2008 R2 Server Core and iSCSI".

Monday, March 29, 2010

Storage options for Hyper-V

Found this excellent online resource that discusses various storage options for Hyper-V. This table is particularly useful:

Sunday, March 28, 2010

Missing Deployed Printer Node in Windows 2008 GPMC

Windows Server 2003 R2 onwards supports "Printer Deployment with GPO". There is an excellent guide on WindowsNetworking.com that illustrates the step-by-step deployment.

However, if you attempt the same steps in W2K8, you will notice that the "Deployed Printers" node is missing from the GPO editor. To get it back, add the "RSAT - Print & Document Services Tools" using "Add Features" in Server Manager, as below:
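
If you prefer PowerShell, the equivalent should be (a sketch; "RSAT-Print-Services" is the feature name as I recall it - verify with Get-WindowsFeature):

Import-Module ServerManager
Add-WindowsFeature RSAT-Print-Services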



Go back to the GPO editor and expand "Computer Configuration" or "User Configuration" (depending on whether you deploy per computer or per user), then Policies, then Windows Settings, as below:

Live Migration on Multi-Site Clustering POC

In an earlier post, I mentioned the "Cheapskate Multi-site Cluster for Hyper-V R2 HA". I did a simple POC using a low-cost host-based replication product called "SteelEye Data Keeper" (as opposed to SAN replication) to provide asynchronous data replication over the network.

This is my 2-node cluster POC setup using MS Failover Clustering with quorum type "Node and File Share Majority". In this setup, we can afford the failure of any one site (but not two) and still provide continuity.


In this Hyper-V cluster, I have a few VMs running on both nodes. Let's do a live migration of one of the VMs, called "PrintSrv", from Cluster01 to Cluster02.
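
Incidentally, the same live migration can be triggered from the W2K8 R2 FailoverClusters PowerShell module instead of the GUI (a hedged sketch using my own node & VM names; the -Name parameter takes the VM's cluster group name):

Import-Module FailoverClusters
Move-ClusterVirtualMachineRole -Name "PrintSrv" -Node Cluster02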



During the migration, I kept my RDP session to PrintSrv open to confirm that the VM remained up and running while migrating.



After a while, the current owner node is Cluster02. Live migration completed without any downtime.

Thursday, March 25, 2010

Failover Clustering Error 80070005

When I ran validation of my test failover cluster on Hyper-V, I got the following error report:


Validate Cluster Network Configuration: Validate the cluster networks that would be created for these servers. An error occurred while executing the test. There was an error initializing the network tests. There was an error creating the server side agent (CPrepSrv). Creating an instance of the COM component with CLSID {E1568352-586D-43E4-933F-8E6DC4DE317A} from the IClassFactory failed due to the following error: 80070005.

This was despite the fact that I had checked through all the network connections & properties. I even made sure that the network binding order was correct on both nodes - the Public network card at the top, above the Private network card.

This error usually appears when you have cloned VMs. To check whether you have a duplicate SID on the cloned VM, check out this blog post. To resolve it, sysprep the cloned VM (thanks to Farseeker) and the problem should go away.
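
A quick way to compare machine SIDs is the Sysinternals PsGetSid tool (a hedged sketch; the node names are from my own lab):

psgetsid \\cluster01
psgetsid \\cluster02

If both nodes report the same machine SID, the VM was cloned without sysprep.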

According to this post, another less common cause is that your environment has only Windows 2008 domain controllers (note: this error persists even if the forest/domain functional level is at W2K3). To resolve this error:

1) Log in to your domain controller with domain admin rights

2) Click on Start -> Run -> dcomcnfg

3) Expand Component Services -> Computers, right-click on My Computer and click Properties

4) Go to the Default Properties tab

5) Under "Default Impersonation Level", select Impersonate and apply it

6) Reboot your domain controller and then try validation again

Note: if validation still fails, disjoin the machines from the domain, rejoin, and try again.

Monday, March 22, 2010

Cheapskate Multi-site Cluster for Hyper-V R2 HA

MS W2K8 R2 introduces two important new HA features for virtualization: (1) the new Cluster Shared Volume (CSV) for Hyper-V HA clustering & live migration, and (2) multi-site clustering.

A Hyper-V R2 cluster supports VM HA & live migration using the Cluster Shared Volume (CSV). CSV is still a shared volume among the cluster nodes, except that multiple nodes can now access the volume concurrently (instead of the previous single-node ownership). You may have a single large CSV that stores multiple VHDs and load-balance the VMs within the cluster. However, the common shared CSV remains the major single point of site failure (imagine a total site outage, e.g. earthquake, power, flood, etc.).

On the other hand, multi-site clustering solves the site-disaster issue above - each server node owns its storage at one site & replication occurs between the storage boxes at the different locations. One side is defined as the source storage and the other as the target. It presents a single virtual volume to the pair of cluster nodes. However, CSV does not support replicated volumes; hence, only one node may own this virtual LUN at any one time. Two types of replication exist - host-based and storage-based. Multi-site clustering supports Hyper-V R2. The recommended quorum type for this setup is "Node and File Share (instead of disk) Majority", whereby the file share server also carries a vote. For better resiliency, the file share server should be hosted at a third site.
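
For reference, the quorum model can be switched via the FailoverClusters PowerShell module (a hedged sketch; the witness share path is a made-up example):

Import-Module FailoverClusters
Set-ClusterQuorum -NodeAndFileShareMajority \\witness-svr\quorum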

I found this online demo that used a host-based replication solution - software called "SteelEye Data Keeper Cluster". Compared with expensive SAN replication solutions, an advantage of this software is that it retrofits onto any existing storage, as it allows mirroring across different storage types & volumes, e.g. iSCSI to NAS, DAS to iSCSI, etc.

Friday, March 19, 2010

Presentation Virtualization is back in Las Vegas

I thought the term "Presentation Virtualization" had been dropped since the launch of Windows Server 2008 R2, since it is hardly mentioned in any new Microsoft Windows 2008 R2 literature. It was used almost synonymously with Remote Desktop Services (f.k.a. Terminal Services) RemoteApp.

Right now, I'm attending the Virtualization Pro Summit 2010 in Las Vegas. Presentation Virtualization is still mentioned by a few MVP speakers, including Sean Deuby. Sean defined Presentation Virtualization as the display being abstracted from the originating processes.

Friday, March 12, 2010

Delete Volume Group in Openfiler

Openfiler is a free open-source iSCSI solution, which I mentioned earlier. I've been trying to delete a Volume Group (VG) via the browser GUI on Openfiler, but the VG still remains.

I searched the Internet and found this blog post on how to remove the VG using the CLI instead.

Step 1: Disable the VG
vgchange -a n <vg_name>

Step 2: Remove the VG
vgremove <vg_name>
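
If the VG still refuses to go away, it usually has logical volumes left inside. A fuller sequence looks like this (a hedged sketch, assuming a VG named vg0 with a leftover LV named lv0):

vgdisplay
lvremove /dev/vg0/lv0
vgchange -a n vg0
vgremove vg0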

Thursday, March 11, 2010

How I assign storage to a VM

Someone asked how I typically assign storage to a VM. This is what I usually do - separate the data from the binaries to minimize the chance of corruption in the event of an outage.

C: is the system OS drive. I typically assign a ~60GB fixed-size VHD.

D: is the application drive, where I install the application binaries in a VHD. Size varies with application requirements.

E: is the data or log drive, where I store the system & application data & logs. If syslog or FTP is the application, expect it to need a lot of space. Typically, I assign a direct SAN LUN with RAID (e.g. iSCSI) to this volume. I also redirect the host Firewall/IIS logs here for audit purposes.

In summary, the C & D drives are typically assigned VHDs that are stored on the host's direct or SAN storage, while I assign a direct LUN with redundant RAID to the E drive. The rationale is that the system & application binaries on C & D can easily be restored by reinstallation, but not the logs/data on the E drive. Always remember to keep data and binaries separate.
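
For what it's worth, the fixed VHDs for the C & D drives can be pre-created with diskpart on W2K8 R2 (a hedged sketch; the file names and sizes are made-up examples, and maximum is in MB):

diskpart
DISKPART> create vdisk file="D:\VHDs\vm1_os.vhd" maximum=61440 type=fixed
DISKPART> create vdisk file="D:\VHDs\vm1_app.vhd" maximum=40960 type=fixed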

Wednesday, March 10, 2010

Don't VM your PDC emulator

I learnt from a mistake: virtualizing my Primary Domain Controller (PDC) emulator, which is the default master NTP clock in a Windows domain. The PDC emulator is one of the five essential FSMO roles in maintaining Microsoft Active Directory. Despite its misleading NT 4.0-era name, it still supports several AD operations, including serving as the default master NTP clock, password replication & DFS namespace metadata within the domain.

To find out which DC is the PDC emulator, run this on any DC: netdom query fsmo

The virtualized PDC always seemed to "trust" Hyper-V time synchronization (part of the Hyper-V Integration Services) more than the external NTP server (a Linux box), which I had manually configured using w32tm (see this). Although time was in sync within the domain, it was out of sync with the real world.
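
For the record, this is roughly how I pointed the PDC emulator at the external NTP source with w32tm (a hedged sketch; the peer address is a made-up example):

w32tm /config /manualpeerlist:"192.168.1.10" /syncfromflags:manual /reliable:yes /update
net stop w32time
net start w32time
w32tm /query /status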

Frustrated, I had to set aside an R200 1U DELL server, run "dcpromo" and take over the PDC role. Finally, the clock is in sync. To sync the rest of the domain controllers running as VMs, you've got to shut down the VMs, turn off the time synchronization service in the Hyper-V Integration Services settings and boot them up one by one.

KMS requirements

Microsoft Volume Licensing activation comes in 2 forms: Multiple Activation Key (MAK) and Key Management Service (KMS). It is also well publicized on the Microsoft website that if you have at least 5 servers or 25 clients, you should go for KMS.

Now, we have about a dozen servers in a particular network activated by KMS. Recently, I joined the first Win7 client to the domain and was unable to activate it. Error: "The count reported by your KMS server is insufficient". Oops! I thought I already had more than 5 servers in this domain!?

A further check with Microsoft now confirms this:
KMS volume activation requires a minimum number of physical Windows clients: five (5) for Windows Server 2008, or twenty-five (25) for Windows Vista. However, KMS does not differentiate between the two systems when counting the total number of clients. For example, a KMS host with a count of three (3) Windows Vista clients and two (2) Windows Server 2008 clients would activate the two (2) Windows Server 2008 clients because the cumulative count is five (5) clients. But KMS would not activate the three (3) Windows Vista computers until the total client count reached twenty-five (25). Each time a new machine contacts a KMSHOST, it is added to the count for thirty (30) calendar days, after which its record is deleted, similar to Time-To-Live (TTL) for Domain Name System (DNS) records.
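
On the KMS host, the current count can be checked with the slmgr script (run from an elevated prompt):

cscript %windir%\system32\slmgr.vbs /dlv

The /dlv output includes a "Current count" field, which must reach 5 (for servers) or 25 (for clients) before the respective activations succeed.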

Monday, March 8, 2010

Cisco Flexible Netflow

Cisco NetFlow is an IP traffic monitoring protocol used on Cisco IOS devices - mainly for bandwidth monitoring and other reporting purposes, such as billing. A simple NetFlow configuration may look like this:

1) To create flow export to a server:
ip flow-export destination {hostname|ip_address} {port no.}

2) Apply on interface:
interface {interface} {interface_number}
ip route-cache flow

As you can see, almost all traffic will be exported. What if you want to monitor only a specific flow? Cisco now introduces Flexible NetFlow, which exports v9 and v5 records (from Cisco IOS 12.4(22)T). A simple configuration may now look like this:

(define the specific flow that you are interested in)
flow record app-traffic-analysis
description This flow record tracks TCP application usage
match transport tcp destination-port
match transport tcp source-port
match ipv4 destination address
match ipv4 source address
collect counter bytes
collect counter packets

(export to a netflow analyzer)
flow exporter export-to-server
destination 172.16.1.1

(tie the flow record and exporter together in a monitor)
flow monitor my-flow-monitor
record app-traffic-analysis
exporter export-to-server

(apply on an interface)
interface Ethernet 1/0
ip flow monitor my-flow-monitor input
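
To confirm that the monitor is actually collecting and exporting, a couple of show commands help (hedged from memory; verify on your IOS release):

show flow monitor my-flow-monitor cache
show flow exporter export-to-server statistics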

Of course, you would also need NetFlow analyzer software to process the collected data. There are several on the Internet that you can try out, including the free version of ManageEngine NetFlow Analyzer, which supports up to 2 interfaces.

References:
  1. Getting Started with Configuring Cisco IOS Flexible NetFlow
  2. Cisco IOS Flexible NetFlow Technology Q&A

Saturday, March 6, 2010

Hello Remote Desktop Services, Goodbye Terminal Services

With the major launch of Microsoft Windows 2008 R2, Terminal Services has been renamed Remote Desktop Services (RDS) to indicate its additional functionality. The major addition is support for Virtual Desktop Infrastructure (VDI).

Terminal Server is now renamed Session Host. Session Broker (the built-in load balancer) is now renamed Connection Broker. Presentation Virtualization (Present-V) has apparently been taken out of the Microsoft dictionary - RDS RemoteApp is used in its place. The term Present-V, which you saw in my earlier posts, can now be replaced with RDS RemoteApp instead.

Securing enterprise applications using RDS RemoteApp

Windows 2008 has a feature in Remote Desktop Services (RDS, a.k.a. Terminal Services) that allows individual applications to be presented to users via RDP. Although the applications are installed and run on the Terminal Server (now known as the Session Host), users interact with the virtualised applications as if they were installed locally. This feature is known as RemoteApp.

There's a growing security demand for Internet traffic to be segregated from corporate applications due to the recent high-profile APT incidents. We conducted a trial that leveraged primarily on RDS RemoteApp. Internet-facing applications (i.e. Internet Explorer, etc.) are virtualised and executed via RDP, which effectively permits only screen updates, keystrokes and mouse clicks to be transmitted between client and server. Even if the Internet applications were subverted by Trojans, there would be no impact on existing corporate applications. Corporate applications are protected, and there's no drop in user experience. The setup is simple and fits well into existing infrastructure. The trial was a huge success.

Tuesday, March 2, 2010

Zero Downtime Firmware Upgrade for Cisco ASA Active/Standby

We have a pair of Cisco ASA 5520s configured in Active/Standby mode. Both management interfaces share the same IP address. But how do you upgrade the firmware on both with zero downtime, remotely? (Note: the two nodes sync their configuration and state, but not the ASA image.)

SSH to the active node, then:

1) Upgrade its image with "copy tftp: flash:" and configure the system to boot from the new image with "boot system <image>".

2) Force the standby unit to take over by executing "failover exec standby failover active". The first part, "failover exec standby", sends a command to the standby unit; "failover active" forces that unit to take over the active role. Your connection will drop.

3) Once you reconnect, you will be on the other node. Repeat step 1 on this newly active node.

4) When the upgrade is complete, execute "failover reload-standby" from the active node to reload the standby unit so the new firmware takes effect.
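
Put together, the session looks roughly like this (a hedged sketch; the TFTP server address and image filename are made-up examples):

asa# copy tftp://192.168.1.5/asa824-k8.bin flash:
asa(config)# boot system disk0:/asa824-k8.bin
asa# failover exec standby failover active
(reconnect to the same IP, now the other node, and repeat the copy and boot system steps)
asa# failover reload-standby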

Monday, March 1, 2010

CPU Type of VM in SCVMM R2

The performance of one of the Hyper-V VMs deteriorated severely. I started Task Manager and noticed that CPU utilization hit 100%! I logged on to SCVMM R2, checked the VM's hardware properties and realized that the CPU type was just an ancient Pentium III?! I wasn't even able to choose the CPU type when the VM was managed by Hyper-V Manager.

Fortunately, someone posted this interesting article that explains that the CPU type does not specify actual hardware but is used to calculate host ratings. In addition, SCVMM also uses it to set the CPU resource allocation accordingly. As I set my "busy" VM to a higher CPU type, the host should assign more CPU cycles to it.
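
If I recall correctly, the CPU type can also be changed through the VMM PowerShell snap-in (a hedged sketch; the VM name and CPU filter are made-up examples, and the exact cmdlet parameters should be verified against VMM 2008 R2):

$vm = Get-VM -Name "PrintSrv"
$cpu = Get-CPUType | Where-Object { $_.Name -match "Xeon" } | Select-Object -First 1
Set-VM -VM $vm -CPUType $cpu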