Thursday, March 31, 2011

Key Performance Monitor Counters

How do you interpret key PerfMon counters on a Windows server when there are over a thousand counters to choose from? This article suggests some common key counters to look for, especially for system availability (uptime), processor, memory and disk I/O performance. It also explains how to interpret the data received.

Saturday, March 26, 2011

Route filtering using route tags

In enterprise routing, route filtering is often used to prevent routing loops and sometimes for security reasons. Instead of relying solely on IP access lists and addresses, route filtering can also be performed by route tagging. In fact, this method is more scalable for a larger network, where managing access lists across a large number of routers can be a challenge.

Consider this corporate network (see picture below). The company has 3 remote sites, each with its own IP subnet. A corporate policy states that Network A should link to all 3 remote sites via ISP X. Network B should link to the first 2 remote sites via leased lines and to the last remote site via ISP X only. Network A peers with ISP X over eBGP. The IGP between the internal networks is OSPF, and the remote sites reached via leased lines run RIP. To implement this routing policy using route tags:

Router A:

    access-list 1 permit
    access-list 1 permit
    access-list 2 permit
    !
    route-map route-tag permit 10
     match ip address 1
     ! tag the first two remote sites with 111
     set tag 111
    !
    route-map route-tag permit 20
     match ip address 2
     ! tag the third remote site with 222
     set tag 222
    !
    ! without this sequence, all other routes will be dropped
    route-map route-tag permit 30
    !
    router ospf 1
     ! redistribute ISP routes into the IGP
     redistribute bgp 65001 subnets route-map route-tag
    ...
    ...

Router B:

    route-map tag-filter deny 10
     ! filter off sites with tag 111
     match tag 111
    !
    route-map tag-filter permit 20
     ! permit only sites with tag 222
     match tag 222
    !
    router ospf 2
     distribute-list route-map tag-filter in
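The tag-and-filter logic of the two route-maps can be modelled in a few lines of Python. This is only a sketch to reason about which ISP-learned routes Router B ends up accepting. It is not router code, and the labels `remote-site-1` to `remote-site-3` are hypothetical stand-ins for the three remote-site subnets.

```python
from dataclasses import dataclass
from typing import List, Optional

# Hypothetical labels standing in for the three remote-site subnets.
ACL_1 = {"remote-site-1", "remote-site-2"}  # models access-list 1
ACL_2 = {"remote-site-3"}                   # models access-list 2


@dataclass(frozen=True)
class Route:
    prefix: str
    tag: Optional[int] = None


def tag_on_router_a(route: Route) -> Route:
    """Model of route-map route-tag during BGP-to-OSPF redistribution."""
    if route.prefix in ACL_1:       # permit 10: match ip address 1
        return Route(route.prefix, 111)
    if route.prefix in ACL_2:       # permit 20: match ip address 2
        return Route(route.prefix, 222)
    return route                    # permit 30: pass through untagged


def permitted_on_router_b(route: Route) -> bool:
    """Model of route-map tag-filter used by the inbound distribute-list."""
    if route.tag == 111:            # deny 10: match tag 111
        return False
    if route.tag == 222:            # permit 20: match tag 222
        return True
    return False                    # implicit deny at the end of a route-map


def routes_installed_on_b(prefixes: List[str]) -> List[str]:
    """Tag each prefix as Router A would, then filter as Router B would."""
    tagged = [tag_on_router_a(Route(p)) for p in prefixes]
    return [r.prefix for r in tagged if permitted_on_router_b(r)]
```

Calling `routes_installed_on_b` with the three site labels returns only the third site, which matches the policy for Network B. The model also makes the implicit deny visible: any route without tag 222 is rejected by tag-filter.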

To verify, run the relevant "show ip route" commands on both Router A and Router B to check that the route entries are as expected. Do note that tagging does not work with BGP. The alternative in BGP is to use a community string in AA:NN format (e.g. 100:300). On the advertising routers (typically on the customer edge), use "set community" in place of "set tag" in the route-map statement. On the receiving routers (typically on the provider edge), use "ip community-list" to describe the community string and "match community" to match it. For a further example of using BGP communities, see this Cisco example.

Friday, March 25, 2011

Verifying Cisco IOS Image Checksum

It's always advisable to check the MD5 checksum after you have downloaded a new IOS image, and again after you have uploaded the image to the Cisco device. Otherwise, your routers may not even be able to boot up because of a corrupted image. There are free MD5 checksum programs available on the Internet. One such program is MD5 Checker. It's also probably a good idea to store the MD5 value alongside the image.

Once you have uploaded the new image and before you reload the router, run this command:

#verify /md5 ‹ ios image location ›
example: verify /md5 flash:c1841-adventerprisek9-mz.124-25e.bin

Compare the output value with the MD5 sum that you noted earlier.
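If you prefer not to install a separate checksum program on your workstation, the MD5 of the downloaded file can also be computed with Python's standard `hashlib` module. A minimal sketch follows. The function names are my own, and the image path and expected value are whatever you noted from the download page.

```python
import hashlib


def md5_of_file(path: str, chunk_size: int = 1 << 20) -> str:
    """Return the MD5 hex digest of a file, reading it in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as fh:
        # Read in 1 MiB chunks so large images are not loaded into memory at once.
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def md5_matches(path: str, expected_hex: str) -> bool:
    """Compare the file's MD5 with a published value, ignoring case and spaces."""
    return md5_of_file(path) == expected_hex.strip().lower()
```

Run `md5_matches` against the local copy before uploading, then compare the same value with the output of verify /md5 on the router.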

Wednesday, March 23, 2011

SolarWinds Orion Part 3 - Netflow

Basic interface monitoring over SNMP will probably only get you bandwidth utilization rates. It won't give further insight into, or a breakdown of, the network applications: which are the most talkative applications? Which nodes generate the most traffic? To gain deeper insight, you have to enable NetFlow monitoring on the routers and use a NetFlow management console (e.g. the SolarWinds NetFlow module known as "NetFlow Traffic Analyzer") to view the reports.

To enable netflow on the managed node:

Choose a NetFlow version (either 5 or 9) depending on what your NetFlow console supports. SolarWinds supports both versions.
(config)# ip flow-export version 9

Send the NetFlow traffic to your NetFlow server's IP address and designated port number. The VRF is optional, but we use it for out-of-band monitoring.
(config)# ip flow-export destination 1055 vrf vrf-name

Choose the interface that should report to the NetFlow server.
(config)# ip flow-export source interface-name

On the interfaces that you want to monitor, add these commands at the interface level.
(config-if)# ip flow ingress
(config-if)# ip flow egress

The above example monitors all traffic entering and leaving the interface. If you wish to monitor a specific flow, you can replace the above with Cisco Flexible NetFlow (click on example). For Juniper J-Flow configuration, refer to this example.

Log in to your NetFlow server console and you should see NetFlow messages saying that new NetFlow information is being received and added automatically. If not, ensure that the node and interfaces have been added to the Orion core. Leave it running for some time and you should start seeing detailed graphs.
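If nothing shows up in the console, it can help to check whether export packets are reaching the collector port at all. Below is a rough sketch of a UDP listener in Python that prints the NetFlow version and record count of each datagram. It relies only on the first two 16-bit header fields (version and count), which versions 5 and 9 share. The port 1055 is taken from the example above. The function names are my own, and this is a troubleshooting aid, not a replacement for the collector.

```python
import socket
import struct

# The first two header fields in NetFlow v5 and v9: version and record count,
# both unsigned 16-bit integers in network byte order.
VERSION_AND_COUNT = struct.Struct("!HH")


def parse_version_and_count(datagram: bytes):
    """Return (version, count) from the start of a NetFlow export datagram."""
    if len(datagram) < VERSION_AND_COUNT.size:
        raise ValueError("datagram too short to be a NetFlow export")
    return VERSION_AND_COUNT.unpack_from(datagram)


def listen(port: int = 1055, bind_addr: str = "0.0.0.0") -> None:
    """Print one line per received export datagram until interrupted."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    sock.bind((bind_addr, port))
    try:
        while True:
            data, (src_ip, _src_port) = sock.recvfrom(65535)
            version, count = parse_version_and_count(data)
            print(f"{src_ip}: NetFlow v{version}, {count} records")
    finally:
        sock.close()
```

Note that `listen()` cannot bind to the port while the real collector service is using it, so this is best run on a test host, or with the collector stopped for a moment.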

Monday, March 14, 2011

Rebuilding Perfmon WMI for SCVMM

Just recently, SCVMM (a great management tool for Hyper-V) reported that one of our Hyper-V cluster nodes had stopped responding. We raised a ticket with Microsoft TechNet. After some days of troubleshooting with the great Chinese Microsoft engineers (with my limited Chinese vocabulary), it was discovered that the Performance Monitor (Perfmon) WMI of the affected node was corrupted and hence unable to report to the SCVMM host.

To rebuild the Perfmon WMI, enter the following command from an elevated command prompt in the system32 directory:

C:\Windows\system32 > lodctr /R

Re-sync the perfmon counter with WMI by running winmgmt /resyncperf.

C:\Windows\system32 > winmgmt /resyncperf

Then restart the WMI service. The 'R' parameter for lodctr must be in upper case for the rebuild. This parameter is not even documented on Microsoft TechNet.
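For anyone who needs to repeat this on several nodes, the sequence can be wrapped in a small Python script. This is only a sketch. The lodctr and winmgmt commands are the ones shown above, while restarting the service with `net stop winmgmt /y` and `net start winmgmt` is my assumption of how to do the restart step. The function defaults to a dry run that only prints the commands.

```python
import subprocess

# Commands in the order described above. The last two (service restart via
# "net stop"/"net start") are an assumption, not taken from the support case.
REBUILD_STEPS = [
    ["lodctr", "/R"],                  # rebuild the Perfmon counter settings
    ["winmgmt", "/resyncperf"],        # re-sync the counters with WMI
    ["net", "stop", "winmgmt", "/y"],  # stop WMI and its dependent services
    ["net", "start", "winmgmt"],       # start WMI again
]


def rebuild_perfmon(dry_run: bool = True):
    """Print each step and, unless dry_run is set, execute it."""
    for cmd in REBUILD_STEPS:
        print(" ".join(cmd))
        if not dry_run:
            # check=True raises CalledProcessError if a step fails, so the
            # later steps are not run after an error.
            subprocess.run(cmd, check=True)
    return REBUILD_STEPS
```

Run it from an elevated prompt with `rebuild_perfmon(dry_run=False)`. Services that depend on WMI and were stopped by the /y switch may need to be started again afterwards.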

Repairing System Center Data Protection Manager

Microsoft System Center Data Protection Manager (DPM) is used to protect and back up other Windows systems, which is especially useful for backing up Hyper-V virtual machines. But what happens if DPM itself is corrupted or broken?

In the event of corruption of the Microsoft Windows registry, system files, or the System Center Data Protection Manager (DPM) 2010 binaries, you can repair DPM by reinstalling it. Repairing DPM involves backing up the existing DPM database (using the DPMBackup.exe -db command), uninstalling DPM, reinstalling DPM and then restoring the database. See this TechNet article for step-by-step instructions.

Friday, March 4, 2011

SID duplication doesn't matter?!

I have just come across a TechNet blog declaring that SID duplication doesn't matter, especially in a domain environment where the domain SID is used instead of the machine SID. The domain SID is re-generated whenever a computer leaves and re-joins a domain, which is typical for disk imaging purposes. For years, we were taught to use sysprep or newsid to generate a new SID for every cloned image.

"I realize that the news that it’s okay to have duplicate machine SIDs comes as a surprise to many, especially since changing SIDs on imaged systems has been a fundamental principle of image deployment since Windows NT’s inception. This blog post debunks the myth with facts by first describing the machine SID, explaining how Windows uses SIDs, and then showing that - with one exception - Windows never exposes a machine SID outside its computer, proving that it’s okay to have systems with the same machine SID."

Nevertheless, the blog concluded that sysprep is still necessary for Microsoft's support:

"Note that Sysprep resets other machine-specific state that, if duplicated, can cause problems for certain applications like Windows Server Update Services (WSUS), so Microsoft’s support policy will still require cloned systems to be made unique with Sysprep"

I would take this with a pinch of salt, as I have experienced strange problems in the past from having duplicated SIDs. Or rather, I would interpret the statement this way: even though SID duplication per se may not cause problems, unpredictable outcomes may still occur because other machine-specific state has not been reset. A duplicated SID is an indicator that this has happened.