TeamDME! FAQ: Hardware Case Study

When a problem cannot be easily resolved by talking to TeamDME! technical support, it is often the result of a hardware or network issue. This guide will assist in the discovery of possible hardware as well as software issues.

A. Problems Log: The first step toward isolating the problems is to create a log of problem events over a period of 3 to 4 days. Whenever a problem occurs, record the following information: User, Workstation, Time of event, and a Description of what the user was doing in the system when the problem occurred.
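
The log can be kept on paper, but a small script makes the later analysis easier. The following is a minimal sketch in Python, assuming a CSV file is acceptable; the function names, column names and time format are our own illustrative choices and are not part of TeamDME!.

```python
import csv
from datetime import datetime

# The four columns follow the items listed above; the exact names are our choice.
FIELDS = ["user", "workstation", "time", "description"]

def log_problem(path, user, workstation, description, when=None):
    """Append one problem event to a CSV log, writing the header on first use."""
    when = when or datetime.now()
    try:
        with open(path, newline="") as f:
            has_header = f.readline().strip() != ""
    except FileNotFoundError:
        has_header = False
    with open(path, "a", newline="") as f:
        writer = csv.DictWriter(f, fieldnames=FIELDS)
        if not has_header:
            writer.writeheader()
        writer.writerow({
            "user": user,
            "workstation": workstation,
            "time": when.strftime("%Y-%m-%d %H:%M"),
            "description": description,
        })

def read_log(path):
    """Return all logged events as a list of dictionaries."""
    with open(path, newline="") as f:
        return list(csv.DictReader(f))
```

A call such as log_problem("problems.csv", "mary", "WS-03", "1010 Read Error while posting payments") adds one line to the log; the user and workstation names here are made up for illustration.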


B. Problem Log Analysis: Once you have created the problem log, you can begin to determine what the most common problems are. For example, a 1010 Read error and a 1011 Write error point to hardware/network problems. A "Variable does not exist" error points to a software bug. Errors like 1132, or an individual workstation locking up, could be either hardware or software. Once you have determined what most of the errors are, and possibly which category they fall into, you are in a position to begin looking for the cause.

As you analyze the Problem Log, you are trying to determine two things. The first is the general direction of your search. Are we looking for hardware or software problems? TeamDME!'s technical support can help you here as we have many years of experience with the program. The second thing you are trying to determine is a demonstrable error.
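
As a rough illustration of this analysis, the sketch below tallies log descriptions by the direction they suggest. It uses only the examples given above (1010, 1011, 1132, "Variable does not exist", a workstation locking up); the function names and the simple text matching are assumptions made for the example, and anything else is left unclassified for a person to review.

```python
import re
from collections import Counter

# Mapping taken from the examples in the text; other codes are left unclassified.
DIRECTION = {
    "1010": "hardware/network",
    "1011": "hardware/network",
    "1132": "hardware or software",
}

def classify(description):
    """Return the likely search direction for one log description."""
    text = description.lower()
    if "variable does not exist" in text:
        return "software"
    # Look for four-digit error codes anywhere in the description.
    for code in re.findall(r"\b\d{4}\b", description):
        if code in DIRECTION:
            return DIRECTION[code]
    if "lock" in text:  # e.g. "workstation locked up"
        return "hardware or software"
    return "unclassified"

def summarize(descriptions):
    """Count how many logged events point in each direction."""
    return Counter(classify(d) for d in descriptions)
```

If most of the events land under hardware/network, that is the general direction of the search; a large unclassified count is a good reason to call technical support.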


C. Identify a Demonstrable Error: No, this is not a demon-possessed workstation, although it may seem like that at times. What we are trying to find is a simple error that we can demonstrate and repeat on a consistent basis. This is an important first step toward resolving the problem, as we will use this as our baseline. As we try various solutions (replacing a workstation or a NIC card), we will immediately try to recreate the problem to see if our solution works. The demonstrable error may be unique to your problem, but we have included a few examples.

1. Tax the network using XCOPY: This is a simple test the users can perform. All we are trying to test is the network connection to the file server and its ability to handle a large amount of network traffic. In this test we will create a folder on the file server called TEST and copy into it about 500 files. These files can come from the \SPECTRUM\DATA folder or other places, but we need several large files (100-200MB) as well as small files. Then we copy these files from the file server to the local workstation. This should take at least 5-10 minutes; if it takes less, add more files to the TEST folder.

Click Start, Programs, MS-DOS Prompt (this opens a DOS session)
MD F:\TEST (this creates the test folder)
XCOPY F:\SPECTRUM\DATA\*.* F:\TEST (this fills the test folder with files)

XCOPY F:\TEST\*.* C:\TEMP (this begins the network test)

Note: Your DATA folder may be somewhere other than the default location, so you may need to modify these sample commands to your situation.

This is the simplest test possible for the network as a whole. You might run this same command (XCOPY) 3-10 times on each workstation. It would be very common for this test to work 4 times in a row and fail on the 5th attempt. You should also attempt to run this test on multiple workstations simultaneously. You could run this on as many as 100 workstations and should get the same results.
Note: Each workstation may take longer to complete a series as you add more stations. This is normal, as the file server and network cable have a limit to the amount of traffic they can handle (bandwidth). If this test fails, you have eliminated the TeamDME! software and can narrow your search to the workstation, NIC card, cable or file server.
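
For sites that prefer a script to retyping XCOPY, the same idea can be sketched in Python. This is only an illustration under our own assumptions: the function names are hypothetical, and comparing a SHA-256 hash of each source and destination file is an extra check that XCOPY as used above does not perform. Note that hashing the source re-reads it from the server, which adds to the network traffic (not a bad thing for this test).

```python
import hashlib
import shutil
from pathlib import Path

def file_digest(path):
    """Hash a file in 1MB chunks so large files need not fit in memory."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(1 << 20), b""):
            h.update(chunk)
    return h.hexdigest()

def copy_and_verify(src_dir, dst_dir):
    """Copy every file in src_dir to dst_dir; return names that failed or differ."""
    src_dir, dst_dir = Path(src_dir), Path(dst_dir)
    dst_dir.mkdir(parents=True, exist_ok=True)
    failures = []
    for src in sorted(p for p in src_dir.iterdir() if p.is_file()):
        dst = dst_dir / src.name
        try:
            shutil.copyfile(src, dst)
            if file_digest(src) != file_digest(dst):
                failures.append(src.name)
        except OSError:
            # A read or write error during the copy counts as a failure.
            failures.append(src.name)
    return failures

def run_passes(src_dir, dst_dir, passes=5):
    """Repeat the copy several times; return one failure list per pass."""
    return [copy_and_verify(src_dir, dst_dir) for _ in range(passes)]
```

Calling run_passes(r"F:\TEST", r"C:\TEMP", passes=5) would mirror the five attempts described above, with the paths adjusted to your situation. An empty list for every pass means no failure was seen on that workstation.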

2. Tax the network using INDEX: This is another simple test the user can perform. Again we are testing the network connection to the file server by sending a large amount of data between the file server and the workstation(s). Simply have everyone log out of TeamDME!, choose System, Reindex, All Files and enter an X to reindex all files. You might run this same command 3-10 times on each workstation. It would again be very common for this test to work 4 times in a row and fail on the 5th attempt. You could also attempt to run this test on multiple workstations simultaneously, but if two workstations attempt to reindex the same database, one will stop with an error, and this is normal.

This is another very simple test for the network and software. If this fails, you have identified a demonstrable error, but have not eliminated anything yet as we have all elements involved in this test.

3. Problems inside TeamDME!: Sometimes you can only seem to demonstrate a problem inside of TeamDME!. This is fine, but again, the problem needs to be repeatable.


D. Check the Cables: Cabling is estimated to account for up to 70% of all network problems and is so simple that it is often overlooked. Installing and testing cabling is a job for a professional, although I have done it in the past with success. The problem is that a novice will never know if he/she has done a good job or not until the network starts crashing. The following are some simple checks that every user can do to rule out an obvious problem:

1. Check that you have a connection light (if available) on the workstation and the hub. This tells you that the system is getting a connection from the hub to the workstation. This does not tell you if the cable is crimped properly or if the cable will carry full bandwidth, but only that a connection is complete.

2. Check that your cabling seems intact. Check that cables have no obvious severe bends or exposed wires. Check that cables do not run across the floor where they can be stepped on and kicked loose. Check that cables run along the wall and across the ceiling and do not come in contact with, or close proximity to, electrical wires, light fixtures (especially fluorescent lights) or any other source of electrical interference.

3. Check that your twisted-pair cable keeps its twists. Many times when cable installers are crimping twisted-pair cables, they will strip off more than 1 inch of insulation and untwist the cable. This results in more than 3/4" of untwisted cable, which can introduce noise in the cable. The twist is designed to cancel out electrical interference, so removing the twist removes this ability.

4. Check that your cable meets the requirements of the speed desired. If you are using 10Mbps NIC cards and hubs, you must have Cat 3 or better cable. If you are using 100Mbps devices, you must have Cat 5 cable. While it may be true that you can transmit 100Mbps over Cat 3 cable in the noise-free environment of a lab, most of us don't work in noise-free environments, electrically speaking.

5. If you have identified a demonstrable error, you can eliminate the cable by creating a new cable and running it directly (along the floor if necessary, but temporarily) from the workstation to the network hub. You should attempt to bypass everything possible (patch cables, wall cables, patch panels, etc) with as direct a connection as possible. If repeated tests are successful, return the cabling to its original configuration and retest. If failures reappear, you have narrowed the problem down to what was bypassed (patch cables, wall cables, patch panels, etc).

6. If these simple tests don't seem to identify your problem, you will need professional help. There are a few (read: expensive) tools (digital cable analyzers) available to cabling professionals to test the integrity of the cables and their ability to transport the full bandwidth (10/100Mbps) of signal from the workstation to the file server.

Note: We had one client running a Windows NT server with 40 Win98 workstations who was having frequent problems within TeamDME! which resulted in 1010 Read errors, 1011 Write errors and 1132 Bound errors. He had had 2 Windows NT certified technicians look at his network and declare that it was set up correctly. But for 6 months he continued to have problems. The network technician swore that the problem must be software (they always say that). So we worked onsite for 2 days running multiple tests and found, using the NetBench program, that 14 of his 40 workstations could not even pass the simple connection test (VERIFY). He called the cabling technician, who tested each of his cables and found cables running over fluorescent lights, poorly crimped connectors, etc. These were numerous cabling problems which could only have been identified with a dedicated tool.

7. Running the NetBench program described below (section H) can test cabling as well as other components of the network.


E. Check the Workstations: Once a demonstrable problem is identified, you may want to test each workstation. The first step will be to ensure you don't have a simple problem with the workstation.

1. Delete all temporary files: Usually, these are stored in the C:\WINDOWS\TEMP folder, but you can check this by typing SET at a DOS prompt and noting where the TMP and TEMP variables point:

Click Start, Programs, MS-DOS Prompt (this opens a DOS session)
Type SET

Then start the Windows Explorer and delete all files in each of these folders.

Note: You may have some files which cannot be deleted. This is normal, as the system will not allow you to delete files in use. Sometimes a workstation may have too many temporary files to work with, and this can cause problems.
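
To see how much has piled up before deleting anything, the check can be sketched in Python as follows. This is an illustrative sketch only: it reads the same TMP and TEMP variables that SET displays, the function names are our own, and it only counts files; it deletes nothing.

```python
import os

def temp_folders(env=os.environ):
    """Return the distinct folders named by the TMP and TEMP variables, if set."""
    seen = []
    for name in ("TMP", "TEMP"):
        value = env.get(name)
        if value and value not in seen:
            seen.append(value)
    return seen

def count_files(folder):
    """Count the files under a folder and total their size in bytes."""
    count = size = 0
    for root, _dirs, files in os.walk(folder):
        for name in files:
            try:
                size += os.path.getsize(os.path.join(root, name))
                count += 1
            except OSError:
                pass  # file vanished or is locked; skip it
    return count, size

if __name__ == "__main__":
    for folder in temp_folders():
        n, total = count_files(folder)
        print(f"{folder}: {n} files, {total} bytes")
```

A folder holding thousands of files is a good candidate for the cleanup described above.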


2. SCANDISK the workstation: This checks the workstation's hard drive for read/write and FAT (File Allocation Table) errors and will automatically fix any errors it finds.

Click Start, Programs, Accessories, System Tools, ScanDisk (this starts ScanDisk)
Check Fix Automatically and Thorough

Note: Some MSWindows versions may have these tools in different locations or may not have them available at all.

3. DEFRAG the workstation: This consolidates files on the workstation's hard drive to ensure each file is stored in a physically contiguous block on the hard drive. This is usually done to enhance performance, but is a good system check.

Click Start, Programs, Accessories, System Tools, Defrag (this starts the Defragmenter)
Check Drive C:\ (defragment each drive in turn)

Note: Some MSWindows versions may have these tools in different locations or may not have them available at all.

4. Isolate the Workstation: One way to isolate the general direction of the problem is to attempt to narrow the problem to one or a few workstations. You would do this by breaking the workstations up into groups of four and running only four workstations for a period of time. All other workstations should be turned off. If the first set runs fine for the selected period of time, add another set of four workstations into the workgroup. Continue this until a) an error occurs or b) all the workstations are in the system.

This is a simple test that the average user can perform. What you are looking for is to isolate a single workstation, so that when we have everyone in the system except this workstation, we run fine, but when we add this workstation to the workgroup, we begin to have problems. If you can isolate one or a few workstations, you can focus your attention on those. If you are unable to isolate a single workstation or group of workstations, your problem is probably not a single workstation, cable or NIC card.
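
The group-of-four procedure can be written down as a short sketch. In the Python below, runs_clean is a hypothetical check supplied by whoever runs the test (in practice it means "these workstations ran for the chosen period without an error"); nothing here talks to the network itself.

```python
def isolate_by_groups(workstations, runs_clean, group_size=4):
    """Add workstations in groups; return the group whose addition first failed.

    runs_clean(active) is a caller-supplied check (hypothetical here): it should
    return True if the listed workstations ran for the chosen period without
    error. Returns None if every group was added without an error.
    """
    active = []
    for start in range(0, len(workstations), group_size):
        group = workstations[start:start + group_size]
        active.extend(group)
        if not runs_clean(list(active)):
            return group  # the suspect is among these workstations
    return None
```

Once a group is returned, the same function can be rerun on just that group with group_size=1, keeping the known-good workstations running alongside, to narrow the search to a single workstation.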

5. Replace the NIC card: You might also consider replacing the NIC card in the workstation with either another NIC card of the same brand/model or a name-brand card. NIC cards are often cheap, but very important components of the network and cheap cards may not be as robust as a card from a reputable vendor.

Note: We had one client running a Novell file server with 10 Win95 workstations who was having sporadic problems within TeamDME! which resulted in 1010 Read errors and 1011 Write errors. None of their other programs, including Microsoft Word and Excel, seemed to exhibit the problem. They had tried replacing all the workstations (a very expensive test, but they were willing to do it) and had two separate Novell certified technicians examine their system and determine it was not at fault. We came onsite and, after two days of testing, found that the network cards in each workstation would not transmit reliably to the file server using the INDEX test described elsewhere. The error rate was extremely small, about 1 error after every 500MB of data transmitted (roughly 1 in 4,000,000,000 bits), but any error would result in an unrecoverable error within TeamDME! (of course). The solution was to replace all their network cards with 3Com cards, and they have been very happy since.
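
To put that error rate in perspective, the conversion from megabytes to bits is simple arithmetic. The short Python sketch below works it out; the function names are our own, and the only assumption is whether a megabyte is counted as 1,000,000 or 1,048,576 bytes.

```python
def bits_transferred(megabytes, bytes_per_megabyte=1_000_000):
    """Convert a data volume in megabytes to bits (8 bits per byte)."""
    return megabytes * bytes_per_megabyte * 8

def error_rate(errors, megabytes, bytes_per_megabyte=1_000_000):
    """Errors per bit transferred."""
    return errors / bits_transferred(megabytes, bytes_per_megabyte)

print(f"{bits_transferred(500):,} bits")             # 4,000,000,000 bits
print(f"{bits_transferred(500, 1_048_576):,} bits")  # 4,194,304,000 bits
```

Either way, 500MB is on the order of 4 billion bits, so one error in that much traffic is far too rare to notice in everyday programs, yet a long reindex moves enough data to run into it.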

6. Running the NetBench program described below (section H) can test the workstations as well as other components of the network.



F. Check the File Server: There are a number of tests to run on the file server as well to ensure it is working properly. A full treatment is beyond the scope of our little white paper, but we can suggest several things to try.

1. Check and make sure the file server is not running any screen saver. While these are pretty, they eat up processing time and memory at an alarming rate.

Note: We had one client with a Windows NT server and 25 users on the system. They had moved from a Novell server to Windows NT and almost immediately saw their workstation performance fall to an unacceptable level (30 seconds from choosing a patient until they saw the Order window). They had had 3 separate Windows NT certified technicians in to evaluate their system, and each had said that the file server and network were set up properly. Of course they screamed that the software (TeamDME!) would not work well on a Windows NT system and demanded that we do something about this problem. However, we noted that at various times during the day the system would speed up and performance was great; the users would work for 10-15 minutes and then performance would fall again to abysmal. What we found was that this user had 3-D Pipes set up as a screen saver on the file server. While the screen saver was active, the server was spending the majority of its processing time (CPU) calculating 3-D pipes (which was very pretty) and the users suffered. Whenever someone walked by and bumped the table where the file server was sitting, the screen saver would disappear and performance was great, but in 15 minutes the screen saver would start again (very difficult to identify). Once we set the screen saver to NONE, their performance problems went away.

2. If running on a Windows 95/98 server, run Scandisk in the same way you would on a workstation.

3. If running on a Windows 95/98/NT server, run Defrag in the same way you would on a workstation.

Note: Windows NT does not have a Defrag program included in the operating system, but one is available as a part of the Norton Utilities for NT program. This can help your system run faster in addition to finding and fixing disk problems.

4. After the Scandisk and/or Defrag procedures are done, shut down the system normally, leave the power off for 5 minutes and turn it back on. Retest the system using your demonstrable problem.

Note: It is important to shut down the file server after any test where changes have been made to the file server configuration. This makes sure that our previous tests do not interfere with the new test/configuration.

5. Running the NetBench program described below (section H) can test the file server as well as other components of the network.


G. Check the Software: After the hardware is thoroughly checked out, the next step is to check the software. We are often criticized for making all hardware checks before looking at the software, but as with building a house, you have to make sure you have a firm foundation. If the foundation is crumbling, you will never be able to fix sagging walls, cracks in the ceiling, etc. because you are not fixing the real problem. In the same way, if you have hardware problems, they will, given enough time, create software problems.

1. Check that all workstations access the program in the same way. This normally means that everyone has the latest version of the software, and the software is installed on the workstation in the same folder. Check the properties of the icon which launches the program and make sure they are consistent from workstation to workstation.

2. Check that temporary files stored inside the \SPECTRUM\DATA folder are deleted. TeamDME! will normally create temporary files in the DATA folder as you use the system. These files either end with a .TMP extension or start with a number and end with a .DBF extension. The commands below delete anything that begins with a number, whatever its extension.

Click Start, Programs, MS-DOS Prompt (this opens a DOS session)
F: (sets the current drive to the network)
CD \SPECTRUM\DATA (changes the current folder to DATA)
DEL *.TMP (deletes the undesired files)
DEL 0*.*
DEL 1*.*
DEL 2*.*
DEL 3*.*
DEL 4*.*
DEL 5*.*
DEL 6*.*
DEL 7*.*
DEL 8*.*
DEL 9*.*

Note: Your DATA folder may be somewhere other than the default location, so you may need to modify these sample commands to your situation. You are erasing data here, so contact your network technician if you have questions about these commands.

Note: If you have created forms or reports or other files beginning with numbers, you will want to make a backup copy of these before deleting them.
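
Because these commands erase files, it can help to list what they would match before running them. The following Python sketch does only that, assuming the same two patterns as the DEL commands above (*.TMP and names beginning with a digit); the function name is our own and nothing is deleted.

```python
from pathlib import Path

def temp_file_candidates(data_dir):
    """List files that end in .TMP or whose name starts with a digit 0-9."""
    matches = []
    for p in sorted(Path(data_dir).iterdir()):
        if not p.is_file():
            continue
        if p.suffix.upper() == ".TMP" or p.name[0] in "0123456789":
            matches.append(p.name)
    return matches
```

Running temp_file_candidates(r"F:\SPECTRUM\DATA") (with the path adjusted to your situation) gives a list to review; any form or report whose name begins with a number will show up in it and should be backed up first.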

3. Reindex all data files by starting TeamDME!, choosing System, Reindex, All Files and entering an X beside each section.

Note: After each test in which you are running TeamDME! (entering an order, posting payments, etc.) you will need to reindex all files before starting the next test. This establishes an uncorrupted index as a starting point. If you corrupt an index during Test1 and you begin Test2 and find an error, you will not be able to determine if the second error was due to what you were testing or a corrupt index resulting from Test1.

4. Check Single User processing. Another thing to try is to copy the TeamDME! program and all the data files to a workstation and run the program on the local workstation. Now, if you have a 40-user network, you may not be able to reduce your workforce down to 1 user and 1 workstation. But what you could do is give this person a day's worth of work that will be duplicated by another user in the "live" system. If you continue to have problems with the system set up in this fashion, you have eliminated the NIC card, cabling and file server, and the problem must be in the software, data or workstation.

5. Check Background Processing. TeamDME! normally assumes that your main job is using its program. As a result it takes a fairly selfish view of system resources. In other words, if it can take a little more memory or more processor time to make itself run faster, it will. You can disable this behavior by starting the program with the /NOMOUSE parameter. This disables the mouse and reduces the performance of TeamDME!, but it also reduces the processor time TeamDME! takes, increasing the background processing time available to the workstation.

Note: The /NOMOUSE parameter has no effect on the file server at all in either memory or processor time in a typical LAN configuration.

Note: If you are working in an environment where TeamDME! is being executed on the file server (Win 95/98 as a server, or WinNT Terminal Server or Citrix environments), you will need to make sure you are using the /NOMOUSE parameter for all users. If you don't, you will find that performance for all users drops to an unacceptably low level.

Note: Once you have tried the /NOMOUSE parameter and found it causes no difference, remove it from the command line to allow TeamDME! to run at full speed again.


H. Running the Ziff-Davis NetBench 6.0. NetBench 6.0 is a freeware product from Ziff-Davis Publishing (the same people who bring you PC Magazine) that loads a network environment and tests its performance by simulating a number of users working with the network. Setting up, running and viewing the results of this tool is beyond the easy reach of the average user, so we recommend that you have your network technician do it. We have not attempted to give you step-by-step instructions on its use; it is left to you to read the instructions included with NetBench and follow them.

1. Download NetBench from the Ziff-Davis web site. The site address is: http://www.zdnet.com/zdbop/ The program is about 3.8MB and the documentation is an additional 0.8MB.

2. Install NetBench to your network using the enclosed instructions.

3. Verify the network connections. The first test is a simple network test to make sure each workstation can read/write to/from the file server. Start NetBench and start the client on each workstation you intend to participate in this test. Then choose the VERIFY.NDB when asked which test you want to run. This test takes approximately 15 minutes to run if no errors are found. If the VERIFY test finds problems with one or more workstations, stop here and determine why. This is a simple test and should only fail if you have obvious hardware problems.

Note: This test is designed to be repeatable. If you are having sporadic problems, you may want to run this test 3 to 5 times dropping failing workstations from the test, until you are getting consistent results from the test.

4. Stress Test the Network. The second test is more comprehensive and is designed to make sure each workstation can read/write/seek large blocks of data to/from the file server. Start NetBench and start the client on each workstation you intend to participate in this test. Then choose the NB_DM60.NDB test when asked which test you want to run. This test takes approximately 3 to 4 hours to run. It uses a series of mixes to stress the workstations, network and all connections. If there are errors, a workstation will turn Red on the controller screen. NetBench will even give you a basic explanation of the test being run when the error appeared and in some cases, even suggest what may be the problem.

Note: This test, like the VERIFY test, is designed to be repeatable. If you are having sporadic problems, you may want to run this test 3 to 5 times, dropping failing workstations from the test, until you are getting consistent results.

5. View and Analyze the Results. NetBench saves the results of both the VERIFY and NB_DM60 tests for later viewing. Do this by starting the NetBench program and clicking the View Results button. Then, select the test you wish to view and you should see a list of Result files, one for each test. Click on the test results, then click the View button. NetBench will then create an Excel spreadsheet with the results of the selected test. It's important to use a workstation with Excel installed on it to view the results. The results will show client/server responses, throughputs, errors, etc. A network technician should be able to use these results to spot possible problems within the hardware/network.

Note: This is where you would concentrate if you are not having obvious network problems, but rather performance issues. The results of your network test will show the performance of your network with 1, 4, 8, 12, 16, and more users in the system. If you see that your performance is sharply falling off with the number of users added to the system, you need to analyze your system for the performance bottleneck. Depending on what you find, you can change configurations, add equipment, etc. to remove the bottleneck.

For example, you may find that you are maxing out the server's cache memory, so you increase the amount of RAM available to the server and its cache. You may find that the bandwidth of your cable is being overloaded, so you would replace your 10Mbps hub with a 10/100Mbps hub and begin replacing NIC cards with 100Mbps cards. There are a number of bottlenecks in the typical LAN that your network technician should be able to identify and recommend solutions for (of course, most solutions = $$$).
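
One rough way to spot the fall-off point in such results is to look for the first step where adding users no longer adds meaningful total throughput. The Python sketch below illustrates the idea; it does not read NetBench's own result files, the function names are our own, and the 10% threshold is an arbitrary assumption to adjust to taste. You would type in the total throughput figures reported for each client count.

```python
def per_client_throughput(results):
    """results maps client count -> total throughput; return throughput per client."""
    return {n: total / n for n, total in sorted(results.items())}

def first_saturation_point(results, min_gain=0.10):
    """Return the first client count at which total throughput grew by less than
    min_gain (10% by default) over the previous measurement, or None if it
    never did."""
    points = sorted(results.items())
    for (_, prev_total), (n, total) in zip(points, points[1:]):
        if total < prev_total * (1 + min_gain):
            return n
    return None
```

If first_saturation_point reports, say, 12 clients, the network was already close to its limit with the previous mix, and that is where your network technician should start looking for the bottleneck.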

Some things to consider on the file server would be available memory, free hard disk space, interface to the hard disk (SCSI, IDE, EIDE, etc.), NIC card (10 or 100Mbps), processor speed, type of processor chip, and number of processors (1, 2, 3 or 4, although current versions of both Novell and Windows NT profit little from multiple processors).

Some things to consider on the workstations would be similar: available memory, free hard disk space, processor speed and type of processor chip. You can tell fairly easily whether upgrading your workstations would be profitable or not. If you have a faster computer on the network, have someone work on the slower workstation for an hour or two and then use the faster workstation for the same amount of time. If the user sees a noticeable difference, local workstation performance is a factor. If there is little or no difference, network performance is the problem you want to address.

Note: This represents our desire to enhance the performance of one workstation or the network as a whole. Obviously the performance of the network is dependent on the proper functioning of the workstation and network. Trying to enhance performance would be worthless before you have ensured that the system is working properly.


I. Intervention, the Last Straw. Okay, what do you do if you have run each of the tests and come up empty-handed? Well, we have designed this white paper to exhaust our knowledge of hardware problems and solutions (which admittedly is only a little knowledge). We will be the first to acknowledge that your best defense is a good network technician. And while we have poked a little bit of fun at their knee-jerk reaction to place the blame for anything they don't understand on the software, most are knowledgeable and well-intentioned. Theirs is a difficult job and we really don't envy them. It is similar to ours: people only call when they are having problems and are often mad, and when we leave, their systems are generally working but they have an unexpected bill, so they are still not happy.

Traditionally the customer has been caught in the crossfire: the hardware person says the problem is software and the software person says the problem is hardware. How is the customer, who just wants it to work, supposed to arbitrate between two "professionals"? Well, the customer should never be placed in that difficult position, and if he/she is, it represents a failure of the hardware/software profession. We will always try to work with your hardware/network person to assist them in solving your problems, but sometimes this is not enough.

If you feel that you are not getting the results you want from either of us, we will be glad to step up to help you. We will work with you over the phone to do as much as we can including creating a problems log, analyzing it, identifying a demonstrable error and moving toward a solution. When we have done as much off-site as possible, we will come on-site and continue the process using the steps we have listed. Of course, this intervention costs more in time and travel than having your hardware person come on-site, but this way we eliminate the conflict between the hardware and software people because we will work on both.

If you think you would like us to intervene in your problem, call us for more information about this option.

Last updated: October 21, 2002
Copyright © 2002 Spectrum Software, Inc. All rights reserved