I heard the phone clang down and my colleague Steve mumble, distraught, "She's going to kill the fish." His wife had called to tell him about a phosphorus problem in their fish tank at home. She's a medical researcher, a biologist by training. Steve's first reaction when she told him there was a phosphorus problem was to ask if she had actually run a phosphorus test. No, she said, but she'd run through all of the other chemical and algae tests, so of course it had to be the phosphorus, and she'd started adding more phosphorus to the tank; they'd know in a few days if that was the problem. Steve, imagining coming home to a tank of dead fish, was not impressed that his scientist wife had failed to use the scientific method at home.
It's so often like that in technology as well. Despite years of rigorous training to use the scientific method to guide our actions (it is called "computer science" for a reason), it's easy to throw all that away when faced with a challenge. A customer came to me the other day asking about monitoring tools to help triage a failing web service in production. A developer assigned to the task interrupted us to say that a fix had been deployed ten minutes earlier and appeared to be working. Let's reflect upon that:
a) No load or performance testing scripts existed for this web service.
b) No monitoring or profiling tools had been deployed with this service in either a pre-production or production setting.
c) A hopeful fix had been hot-deployed to production and left to run for a mere ten minutes before victory was declared.
d) No permanent monitoring was put in place to catch the next occurrence of the problem.
e) Apart from a few manual executions of the service and a face-value assessment by one individual, no further validation was done to correlate the fix with the perceived problem.
Chances are good that Steve's fish will be fine, but can the same be said for those cases where we play roulette with mission-critical IT systems? Just as in the case of Steve's fish, there is no legitimate reason for a lack of objective, quantitative analysis except basic human apathy. Anyone who has ever taken a statistics course or been face-to-face with a serious production issue knows that ruling out many other options does not make it safe to jump ahead on gut feeling. Why abandon a working method for one that brings doubt, risk, and exposure to criticism? Run the phosphorus test and let the results be your guide.
Monday, July 6, 2009
Friday, July 3, 2009
A video speaks a thousand words
It's nothing new for us to be constantly developing new educational tools: demos and lab materials for on-site trainings, or content for our evolving KnowledgeBase that augments the HP software support we provide to our customers. But the videos are the biggest hits so far. They pack a three-minute punch of information without leaning on those lazy PowerPoint icons. Check 'em out.
Business Transaction Management in palatable terms (no yawning required):
http://www.youtube.com/watch?v=49tQ9BpnrT0
In case you missed the first one, here it is:
Why J9? Well, since you asked...
http://www.youtube.com/watch?v=FjPlvO01SmA
Please rate them! We'd love to get some feedback on how well these videos connect with you, and for God's sake, if they are still boring, please let us know.
Thursday, July 2, 2009
How would you test a 4000 user community?
That question was the lead-in to a discussion I had with a colleague this week. He had been interviewing someone for a performance testing role, and that was the key question that could make or break a candidate. The typical response goes something like "I'd start with one user, then move on to five, then ten, then 50, then 100, then... all the way up to 4000." While it is the most common answer, it is entirely wrong. This kind of common yet broken testing process explains why the group of us who joined the conversation could each retell case studies of customers who had spent multiple years (and millions of dollars) on failed testing efforts.
The right answer goes like this:
a) Ask the hard questions
How many of the 4000 users are concurrent users, and what is their usage pattern? For example, many batch billing systems do nothing for 29 days per month, then run through a massive number of transactions on the last day. Other systems see limited daily use until 5pm, when their user community arrives home from work and signs in. Are the users spread across multiple time zones?
If the data to discern the number of real concurrent users isn't available, that actually means two things to our project:
1) A separate project is needed to put in place tools that capture user behavior. The lack of such information can drive poor decisions in testing, capacity planning, security, and product design for both usability and functionality.
2) If no such data exists and the 4000 number simply means we have 4000 users in our database, we can back into a more realistic upper bound through some basic calculations.
b) Functional performance test
Start with one user as a functional performance test. This lets you validate your test cases and test scripts and flush out any immediate functional problems with the application(s). (A minimal sketch of such a single-user check appears at the end of this post.)
c) Longevity testing, peak testing, failover testing
There are a variety of other tests that tell you more about the application's serviceability than simply running the same script with an arbitrarily increasing number of virtual users.
d) Load and Performance testing
If simply starting with one user and incrementally ramping up isn't the right process for load testing our application, then what is the right heuristic for getting to the Nth user? The answer is that the heuristic matters far less than the answers to our questions about the user community. If we have 4000 users in our database but don't know how and when they use the application, a test that tops out at 200 users is just as valid as one that tops out at 2000. Using these numbers, though, we can arrive at some guidelines by looking at the length of a user day. For example, if our application is used by an internal business customer that only works standard business hours in the eastern time zone, we can assume a roughly 8-hour work day, 5 days per week. Divide 4000 users by 8 hours and we get an educated guess of about 500 users per hour. Multiply the 8-hour day by 60 to get 480 minutes, divide the 4000 users by 480, and we can expect roughly 8 users on the system in any one-minute interval. In the absence of further information about our user community, we now have real, actionable numbers to test against. Rather than the dozens and dozens of incremental tests we were potentially facing, we can break our cases into one user, 10 users, and 500 users, with anything above that essentially probing the upper bound of our capacity.
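To put that arithmetic in concrete form, here is a minimal Python sketch of the same back-of-the-envelope estimate. The 4000-user population and the 8-hour work day are the example's assumptions, not measured values; substitute whatever your own hard questions about the user community uncover.

# Back-of-the-envelope concurrency estimate from the example above.
# TOTAL_USERS and WORK_HOURS_PER_DAY are assumed values; replace them
# with whatever your questions about the user community actually reveal.
TOTAL_USERS = 4000
WORK_HOURS_PER_DAY = 8

users_per_hour = TOTAL_USERS / WORK_HOURS_PER_DAY    # about 500 users per hour
minutes_per_day = WORK_HOURS_PER_DAY * 60            # 480 minutes
users_per_minute = TOTAL_USERS / minutes_per_day     # roughly 8 users in any given minute

print(f"Estimated users per hour:   {users_per_hour:.0f}")
print(f"Estimated users per minute: {users_per_minute:.1f}")

The two printed figures, roughly 500 users per hour and 8 users per minute, are the same numbers derived in the paragraph above.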
These steps are a productive tool to improve the quality of your testing, as well as a great way to gain new insight into the candidates you interview.
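As a companion to step b), here is a minimal sketch of what a single-user functional performance check might look like, assuming a hypothetical service endpoint and the Python requests library. The URL and the two-second threshold are illustrative placeholders, not values from a real project; an actual script would exercise your own test cases.

# Minimal single-user functional performance check (illustrative sketch).
# SERVICE_URL and MAX_SECONDS are assumptions, not real project values.
import requests

SERVICE_URL = "http://example.com/api/health"   # hypothetical endpoint
MAX_SECONDS = 2.0                               # assumed acceptable response time

response = requests.get(SERVICE_URL, timeout=10)
elapsed = response.elapsed.total_seconds()

assert response.status_code == 200, f"Unexpected status: {response.status_code}"
assert elapsed < MAX_SECONDS, f"Too slow: {elapsed:.2f}s"
print(f"OK: {response.status_code} in {elapsed:.2f}s")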