Love and Hate

 


If you have worked with Windows servers long enough, there are a few things you both love and hate. Two of them are likely ‘Directory Services Restore Mode’ and ‘System State Restore’. When things look grim, and an Active Directory server will not boot properly or its database is corrupted, you hope both tools will be your best friends and cooperate with your efforts to restore the server. Successfully booting into ‘Directory Services Restore Mode’ is step one, and you feel excited and nervous at the same time. Keep your fingers crossed, pray to whatever deity you believe in, rub a rabbit's foot, or use whatever method of generating luck you have, and hope that you have a valid system state backup that is neither too old nor corrupt. Once you locate your backup, restore it successfully, and boot into Windows normally, you can then check AD and the other functions of the server to make sure they are working properly. If they are, you know you can get a good night's sleep. If not, you keep looking for another valid system state backup, an image backup, or something else. The moral of this story: have good backups, and make system state backups part of your plan. Even better, also keep image-based backups both locally and offsite as a last resort.

Yes, I lucked out tonight: a system state restore quickly fixed a down server. Time for bed...

Unable to log into web interface 3Com 4500 Switch

I had a 3Com 4500 switch whose web interface I was unable to log into. Not being familiar with the CLI for this device, I was at a bit of a disadvantage. Even after a full factory reset I still could not access the web interface, although a second unit that went through the same steps worked just fine. I noticed the firmware was a 2006 version, so I decided the first step would be a firmware update. The HP site had the required files in versions as new as 2012. I downloaded the files and set up my TFTP server; I use http://tftpd32.jounin.net/, which works really well. Then I went forward. When I did a dir listing of the device, I noticed it was missing one of the bootrom files. I updated the files, set the boot options and restarted the device. I was still unable to log into the web interface, but after some Google-fu I found the procedure to reset web access to the device.

Upgrade process:
First, get a list of files:
<4500>dir
Directory of unit1>flash:/
0 -rw- 5195 Feb 04 2007 13:21:21 3comoscfg.def
1 -rw- 642479 Feb 04 2007 13:21:52 s3p02_01.web
2 -rw- 4088266 Feb 04 2007 13:24:15 s3n03_02_00s56.app
3 -rw- 1364 Apr 02 2000 00:47:45 3comoscfg.cfg

Next, delete the old files and clear the recycle bin (there is not enough flash space to store both sets, so make sure you have backups):
<4500>delete s3p02_01.web
<4500>delete s3n03_02_00s56.app
<4500>reset recycle-bin
Clear flash:/3comoscfg.cfg ?[Y/N]:y
Clearing files from flash may take a long time. Please wait…
……..
%Cleared file unit1>flash:/~/s3p02_01.web.
Clear flash:/s3n03_02_00s56.app ?[Y/N]:y
Clearing files from flash may take a long time. Please wait…
………………………………………….

Upload new files
<4500>tftp 192.168.18.54 get s3p05_01.web
File will be transferred in binary mode.
Downloading file from remote tftp server, please wait…………………..
TFTP: 1083788 bytes received in 16 second(s).
File downloaded successfully.
<4500>tftp 192.168.18.54 get s3o04_06.btm
File will be transferred in binary mode.
Downloading file from remote tftp server, please wait……….
TFTP: 195022 bytes received in 3 second(s).
File downloaded successfully.
<4500>tftp 192.168.18.54 get s3n03_03_02s168p21.app
File will be transferred in binary mode.
Downloading file from remote tftp server, please wait…………………………………………………………..
TFTP: 4243013 bytes received in 63 second(s).
File downloaded successfully.

Set boot files
<4500>boot boot-loader flash:/s3n03_03_02s168p21.app
The specified file will be booted next time on unit 1!
<4500>boot bootrom flash:/s3o04_06.btm
This will update BootRom file on unit 1. Continue? [Y/N] y
Upgrading BOOTROM, please wait…
Upgrade BOOTROM succeeded!
<4500>boot web-package s3p05_01.web main
<4500>save
The configuration will be written to the device.
Are you sure?[Y/N]y

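Reset web interface (these commands run from system view, which shows the [4500] prompt and is entered with system-view)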
[4500]local-user admin
New local user added.
[4500-luser-admin]attribute access-limit 1
[4500-luser-admin]level 3
[4500-luser-admin]service-type telnet level 3
[4500-luser-admin]service-type lan-access
[4500-luser-admin]
<4500>save
The configuration will be written to the device.
Are you sure?[Y/N]y

Microsoft 70-413 and 70-414 exam updates for R2

Microsoft updated the 70-413 and 70-414 exams on Friday, April 4th, ahead of the April 7th date it had announced publicly. Many people who hoped to pass the original exam before the update cutoff failed as a result. I was one of them. I got the email address of a Microsoft employee, who was very helpful, apologized for the early release of the new test, and gave me the following update guides showing the changes in the test material. I asked whether I could share them, and she said yes.

413_OD_R2-April-2014

414_OD_R2-April-2014

Unexplained High CPU usage on Hyper-V host and Guests

I have a client with two identical Hyper-V servers running almost identical VMs. Out of the blue, one of the servers started showing high CPU utilization. The host was bouncing between 35% and 50%, and the guests were at 99% CPU utilization. I turned off the guests and rebooted the server: no change, still 35-50% utilization. I made sure any unnecessary hardware was disabled or disconnected: again, no change. Experimenting with one of the guest machines, I noticed that the CPU utilization would sometimes show system interrupts at 99%, go away for a bit, and then come back with whichever process was active taking over the 99% utilization. After seeing that, I wanted to look into system interrupts on each host machine and compare them.

In the past I had used KernView on 32-bit machines; however, it does not work on modern 64-bit machines. After some digging around on the internet, it turned out that KernRate works on 64-bit machines and is included in the Windows Driver Development Kit 7, found here: http://www.microsoft.com/en-us/download/details.aspx?id=11800. If you choose the default install path, the files can be found in C:\WinDDK\7600.16385.1\Tools\Other\amd64.

I wanted to log the output and have it run for a fixed time for comparison. After looking through the help files, I settled on the command ‘kernrate -s 30 -yo filename.txt’, which gives a 30-second sample and writes it to a file with the chosen name in the same path. I ran the command on both the host that was not having issues and the one that was. To save space in this post, I will cut to the interesting parts of the resulting log files.

Server specs (both servers are the same):
Dell 320
32 GB RAM
Intel E5-2420 CPU (6 hyper-threaded cores)
Server 2012 with Hyper-V role installed

Server with issues:

Results for Kernel Mode:
-----------------------------

OutputResults: KernelModuleCount = 147

Percentage in the following table is based on the Total Hits for the Kernel

ProfileTime   276703 hits, 10002 events per hit --------

Module      Hits      msec     %Total   Events/Sec
NTOSKRNL    138197    30074    49 %     45961508
HAL         126880    30074    45 %     42197704
WIN32K      7230      30026    2 %      2408394
NTFS        1030      30055    0 %      342773

Server without issues:

Results for Kernel Mode:
-----------------------------

OutputResults: KernelModuleCount = 145
Percentage in the following table is based on the Total Hits for the Kernel

ProfileTime   289145 hits, 10009 events per hit --------

Module      Hits      msec     %Total   Events/Sec
NTOSKRNL    244130    29999    84 %     81452620
HAL         41760     29999    14 %     13932992
WIN32K      1650      29999    0 %      550513
IPMIDRV     625       30000    0 %      208520

So I noticed right away that on the server with issues, 45% of the kernel-mode profile hits were landing in the HAL, compared with 14% on the server without issues. HAL is short for Hardware Abstraction Layer, the piece of the operating system that allows other parts of the operating system to interact with the physical hardware of the computer. Modern versions of Windows automatically select the HAL based on the processor type, but I still verified that both servers were using the same one. Again I disabled any unnecessary hardware, turned the guest machines off, and updated drivers, running KernRate between each step, all with very similar results.

After testing many configurations and drivers, and after multiple reboots, I was frustrated at the hours lost and the lack of understanding of why this was occurring. I had one last resort before declaring a bad CPU or motherboard and calling Dell for warranty service: I upgraded the BIOS and rebooted. I had left all disabled devices disabled and the guest machines off in order to limit the changes. A few minutes after rebooting, I logged back in and opened Task Manager to a pleasant 10% CPU utilization. I re-enabled all devices and turned the guests back on. Everything seemed nice and fast, including the guest performance. I ran KernRate again to see if there was any difference in the results.

After BIOS update on bad machine:

OutputResults: KernelModuleCount = 144
Percentage in the following table is based on the Total Hits for the Kernel

ProfileTime   341514 hits, 10009 events per hit --------

Module      Hits      msec     %Total   Events/Sec
NTOSKRNL    332831    29999    97 %     111047217
HAL         6673      29999    1 %      2226409
IPMIDRV     835       29999    0 %      278593
NTFS        395       29999    0 %      131789

Wow, that is quite a difference from the previous result, and even better than the machine that was seemingly working well. I am going to schedule a window of time to do the BIOS update on the second machine and see whether it achieves a similar result. As with any update or change, please back up your data and double-check that your BIOS update is for the correct machine, as a BIOS update can go south and leave your machine unable to boot.
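The comparison of the three KernRate runs can also be scripted rather than read by eye. Below is a minimal Python sketch, not part of the original troubleshooting: it assumes the module rows look like the excerpts in this post (name, hits, msec, %Total, events/sec), and the names `module_hits`, `hit_share` and `RUNS` are made up for illustration. It recomputes the HAL share from the raw hit counts and the ProfileTime totals shown above.

```python
import re

# One module row: name, hits, msec, %Total (with or without a space
# before the percent sign), events/sec. Layout assumed from the excerpts.
ROW = re.compile(r"^(\w+)\s+(\d+)\s+(\d+)\s+(\d+)\s*%\s+(\d+)$")


def module_hits(table_text):
    """Return {module name: hit count} for every row that parses."""
    hits = {}
    for line in table_text.splitlines():
        match = ROW.match(line.strip())
        if match:
            hits[match.group(1)] = int(match.group(2))
    return hits


def hit_share(hits, total_hits, module):
    """Percentage of all kernel profile hits that landed in one module."""
    return 100.0 * hits.get(module, 0) / total_hits


# Total hits (from the ProfileTime line) and the top rows of each run.
RUNS = {
    "server with issues": (276703, """
Module Hits msec %Total Events/Sec
NTOSKRNL 138197 30074 49 % 45961508
HAL 126880 30074 45 % 42197704
"""),
    "server without issues": (289145, """
Module Hits msec %Total Events/Sec
NTOSKRNL 244130 29999 84 % 81452620
HAL 41760 29999 14 % 13932992
"""),
    "after BIOS update": (341514, """
Module Hits msec %Total Events/Sec
NTOSKRNL 332831 29999 97% 111047217
HAL 6673 29999 1 % 2226409
"""),
}

for label, (total, table) in RUNS.items():
    share = hit_share(module_hits(table), total, "HAL")
    print(f"{label}: HAL = {share:.1f}% of kernel hits")
```

Run against the numbers in this post, the HAL share comes out at roughly 46%, 14% and 2%, which matches the %Total column once you allow for KernRate dropping the fractional part.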

Server 2012 RDS licensing problem

On Server 2012, when setting up RDS, the RD Licensing Diagnoser will still show the server as not licensed even though RD Licensing Manager shows licenses. This will affect users when the licensing grace period (120 days) runs out. To fix this, the licensing mode and the license server need to be set in the local computer policy or in group policy.

Local Computer Policy -> Computer Configuration -> Administrative Templates -> Windows Components -> Remote Desktop Services -> Remote Desktop Session Host -> Licensing
Use the specified RD license servers = (your license server)
Set the Remote Desktop licensing mode = Per User or Per Device, depending on the licenses bought

SonicWALL intermittent connection issues

Some older SonicWall routers, or newer ones with an imported configuration, have default NAT rules listed for “WAN Primary Subnet”. These rules cause the SonicWall to respond to all ARP queries for the entire subnet of the WAN interface, even for IPs the ISP has not assigned to the client. If you see these rules, disable them and flush the ARP cache to help prevent issues with the internet connection. This article helped resolve the problem once the ISP pointed out that it was an issue: http://serverfault.com/questions/294817/how-can-i-stop-my-sonicwall-tz-210-sonicos-enhanced-5-5-1-0-5o-from-responding

Oddly, even though the SonicWall answers the ISP router's ARP queries, it does not put these entries in its own ARP table, so the only way to see the problem is to have the ISP check the ARP table on their connected router.

In this case the client only had .198-.200 assigned to them, but the SonicWall was responding to ARP on the entire usable block of .194-.201:

PEMTK82#sh ip arp | inc 98.XXX.XXX
Internet  98.XXX.XXX.177    -  0014.f1eb.3bd9  ARPA  Bundle1
Internet  98.XXX.XXX.185    0  0006.b13a.a2ca  ARPA  Bundle1
Internet  98.XXX.XXX.193    -  0014.f1eb.3bd9  ARPA  Bundle1
Internet  98.XXX.XXX.194   51  c0ea.e458.XXXX  ARPA  Bundle1
Internet  98.XXX.XXX.195   53  c0ea.e458.XXXX  ARPA  Bundle1
Internet  98.XXX.XXX.196   45  c0ea.e458.XXXX  ARPA  Bundle1
Internet  98.XXX.XXX.197  222  c0ea.e458.XXXX  ARPA  Bundle1
Internet  98.XXX.XXX.198    0  c0ea.e458.XXXX  ARPA  Bundle1
Internet  98.XXX.XXX.199    0  c0ea.e458.XXXX  ARPA  Bundle1
Internet  98.XXX.XXX.200   10  c0ea.e458.XXXX  ARPA  Bundle1
Internet  98.XXX.XXX.201   67  c0ea.e458.XXXX  ARPA  Bundle1
Internet  98.XXX.XXX.202    0  0012.1ebd.99a8  ARPA  Bundle1
Internet  98.XXX.XXX.203    6  000b.8660.4b74  ARPA  Bundle1
Internet  98.XXX.XXX.204  134  0025.614a.af00  ARPA  Bundle1
Internet  98.XXX.XXX.205    2  a0f3.c1c3.f5a3  ARPA  Bundle1
Internet  98.XXX.XXX.206  255  0017.c54e.b575  ARPA  Bundle1

To disable the rules uncheck the highlighted boxes and then go to the ARP page and clear the cache.
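If the ISP can send you their ARP table, spotting the extra answers can be automated. The following is a rough Python sketch under a few assumptions: the lines follow the "Internet address age MAC type interface" layout shown above, the redacted addresses are compared by their last octet only, and the helper names `parse_arp` and `unexpected_addresses` are invented for this example.

```python
from collections import defaultdict

# A shortened copy of the ISP router output shown above.
ARP_OUTPUT = """\
Internet 98.XXX.XXX.193 - 0014.f1eb.3bd9 ARPA Bundle1
Internet 98.XXX.XXX.194 51 c0ea.e458.XXXX ARPA Bundle1
Internet 98.XXX.XXX.195 53 c0ea.e458.XXXX ARPA Bundle1
Internet 98.XXX.XXX.196 45 c0ea.e458.XXXX ARPA Bundle1
Internet 98.XXX.XXX.197 222 c0ea.e458.XXXX ARPA Bundle1
Internet 98.XXX.XXX.198 0 c0ea.e458.XXXX ARPA Bundle1
Internet 98.XXX.XXX.199 0 c0ea.e458.XXXX ARPA Bundle1
Internet 98.XXX.XXX.200 10 c0ea.e458.XXXX ARPA Bundle1
Internet 98.XXX.XXX.201 67 c0ea.e458.XXXX ARPA Bundle1
Internet 98.XXX.XXX.202 0 0012.1ebd.99a8 ARPA Bundle1
"""


def parse_arp(text):
    """Group the last octet of each address by the MAC that answered for it."""
    by_mac = defaultdict(list)
    for line in text.splitlines():
        fields = line.split()
        if len(fields) < 6 or fields[0] != "Internet":
            continue  # skip prompts, blank lines and anything unexpected
        address, mac = fields[1], fields[3]
        by_mac[mac].append(int(address.rsplit(".", 1)[1]))
    return dict(by_mac)


def unexpected_addresses(by_mac, firewall_mac, assigned):
    """Last octets the firewall answers for that the ISP never assigned."""
    return sorted(o for o in by_mac.get(firewall_mac, []) if o not in assigned)


table = parse_arp(ARP_OUTPUT)
extra = unexpected_addresses(table, "c0ea.e458.XXXX", assigned={198, 199, 200})
print("Firewall is answering for unassigned addresses:", extra)
```

An empty list after disabling the rules and clearing the cache (and after the ISP's entries age out) would suggest the fix took effect.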

Server 2012 Multi-Path I/O with SMB3 (SMB Multichannel)

For those of you who have not tested this feature yet, I have to say it is pretty wonderful. However, I have also run into some issues with it. The quick and somewhat obvious gotcha: it only works if the server and client both support SMB3.

In case you don’t know what it does, here is a brief and dumbed-down version. The feature, which Microsoft calls SMB Multichannel (it is separate from the MPIO feature used for block storage), allows multiple network cards to handle the SMB3 connections. This spreads the load across the cards, in effect increasing both bandwidth and reliability. It is not the same as NIC teaming: every port keeps its own IP on the network, and it only works with SMB3 and any other service that supports it.

The issue I had was that even though there were five 1 Gb NICs in the server, the aggregate bandwidth for transfers topped out at about 1 Gb/s. One NIC was a built-in Atheros and the other four were ports on a single Intel PRO/1000 PCI-X card. I could disable all but one NIC and get the same bandwidth as with all five running. The load was being distributed: about 190 Mb/s per NIC with all five online, or about 950 Mb/s with just one. Unfortunately there were no newer drivers for this Intel NIC, which would only work with the built-in Windows drivers, but replacing it with a PCI-Express card let me break the 1 Gb/s barrier, reaching 1.5 Gb/s with only two NICs. I will update this when I get more NICs installed. I suspect the issue was either the driver or some limitation of the PCI-X card.

This is a good result:

multipathio smb3
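To put numbers on how badly the PCI-X setup scaled, here is a small back-of-the-envelope calculation in Python. It is only a sketch: it uses the approximate figures quoted in this post, treats 1.5 Gb/s as 1500 Mb/s, and the function names are invented for the example.

```python
def aggregate_mbps(per_nic_mbps, nic_count):
    """Total throughput when every NIC carries the same share."""
    return per_nic_mbps * nic_count


def scaling_efficiency(observed_total_mbps, single_nic_mbps, nic_count):
    """Observed total divided by the ideal (one NIC's rate times NIC count)."""
    return observed_total_mbps / (single_nic_mbps * nic_count)


SINGLE_NIC_MBPS = 950  # what one NIC managed on its own

# PCI-X card plus onboard NIC: five ports at roughly 190 Mb/s each.
pcix_total = aggregate_mbps(190, 5)
pcix_efficiency = scaling_efficiency(pcix_total, SINGLE_NIC_MBPS, 5)

# After the swap to PCI-Express: about 1500 Mb/s over two NICs.
pcie_efficiency = scaling_efficiency(1500, SINGLE_NIC_MBPS, 2)

print(f"PCI-X, 5 NICs: {pcix_total} Mb/s total, {pcix_efficiency:.0%} of ideal")
print(f"PCIe,  2 NICs: 1500 Mb/s total, {pcie_efficiency:.0%} of ideal")
```

Five NICs delivering the same total as one is what you would expect if they all shared a single bottleneck, which fits the suspicion about the driver or the PCI-X card.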

SBS 2003 server lockups

My remote management software was alerting me that a client’s SBS 2003 server was at 100% CPU utilization, would randomly stop responding to input, and was even showing offline. I was eventually able to RDP in and started to look for the cause. Event Viewer would not open and most input produced no result, but I was able to open Task Manager and found that four processes were consuming all of the CPU between them. Ending svchost.exe did not free up any cycles; the other processes, which are protected, simply went up in utilization.

After waiting 20 minutes for the server to respond to a forced reboot command, I looked through the event log and found the likely culprit. Shortly after the backup started, there were errors like “lsass (432) Shadow copy 371 time-out (70000 ms).” for the lsass.exe, ntfrs.exe and tcpsvcs.exe processes. This is due to some VSS issues in Server 2003. To be safe, I reset the VSS writers using a batch file with the following commands:

cd /d %windir%\system32
net stop vss
net stop swprv
regsvr32 ole32.dll
regsvr32 oleaut32.dll
regsvr32 vss_ps.dll
vssvc /register
regsvr32 /i swprv.dll
regsvr32 /i eventcls.dll
regsvr32 es.dll
regsvr32 stdprov.dll
regsvr32 vssui.dll
regsvr32 msxml.dll
regsvr32 msxml3.dll
regsvr32 msxml4.dll
net start vss
net start swprv

Looking further into the issue, I found a KB article, http://support.microsoft.com/kb/826936, which provides a hotfix for time-out issues of essential services during VSS copies. Remember to make sure you have a good backup before installing any hotfix, as they are not always well tested. If you can, try the hotfix in a test environment first to be safe.

Server 2012 Learning Resources

Free Server 2012 E-Book

http://blogs.technet.com/b/keithmayer/archive/2012/06/04/microsoft-releases-free-ebook-for-it-professionals-to-learn-windowsserver2012-hyperv-virtualization-privatecloud-itcamp.aspx#.Uc0nYfmubao

Server 2012 MCSA resources

http://borntolearn.mslearn.net/mcsa90/#fbid=0FVGVztw56Z

http://blogs.technet.com/b/keithmayer/archive/2013/05/20/upgrade-your-mcitp-or-mcsa-to-windows-server-2012-with-this-free-certification-exam-study-guide-for-exam-70-417.aspx#.Uc0nNvmubao

http://blogs.technet.com/b/keithmayer/archive/2012/09/20/windows-server-2012-quot-early-experts-quot-challenge-exam-70-410-nic-teaming.aspx#.Uc0novmubao

Exchange 2007 update HELL

What should have been a simple 30-45 minute task, updating two Exchange Server 2007 SP3 servers to Update Rollup 10, ended up being many hours of hell. One server upgraded just fine, first shot, no issue. The second server, on the other hand, would get most of the way through and then roll back. Ultimately it was having problems finding files during the install process, which seemed very odd. The fix was in the Windows environment variable settings. %SystemRoot%\Temp should be equivalent to C:\Windows\Temp, but in this case it did not work. After changing both TEMP and TMP to the literal path, the update installed just fine.

Just looking at it from this screen is deceiving:

image001

If you edit one of them, you find they are using %SystemRoot%\Temp like:

image002

Changing it to C:\Windows\Temp fixed the issue in this case

image003
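A quick way to check for this kind of problem before an update is to look at what a process actually sees in TEMP and TMP. The Python sketch below is only an illustration: I never confirmed the exact failure mechanism, so it assumes the symptom is a %VAR% reference reaching the installer unexpanded, and the helper names `unexpanded_references` and `check_temp_variables` are invented for this example.

```python
import os
import re

# Matches a Windows-style %NAME% reference left inside a value.
UNEXPANDED = re.compile(r"%([^%\\/:]+)%")


def unexpanded_references(value):
    """Return the variable names still written as %NAME% in value."""
    return UNEXPANDED.findall(value)


def check_temp_variables(environ):
    """Map TEMP/TMP to any %NAME% references they still contain."""
    problems = {}
    for name in ("TEMP", "TMP"):
        refs = unexpanded_references(environ.get(name, ""))
        if refs:
            problems[name] = refs
    return problems


if __name__ == "__main__":
    # Check the environment of the current process.
    for name, refs in check_temp_variables(os.environ).items():
        print(f"{name} still contains unexpanded reference(s): {refs}")
```

For example, `check_temp_variables({"TEMP": r"%SystemRoot%\Temp", "TMP": r"C:\Windows\Temp"})` flags only TEMP. An empty result does not prove the variables are healthy; it only rules out this one symptom.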