Thursday, July 2, 2009

How would you test a 4000 user community?

That question was the lead-in to a discussion I had with a colleague this week. He had been interviewing someone for a performance testing role, and that was the key question that could make or break a candidate. The typical response goes something like: "I'd start with one user, then move on to five, then ten, then 50, then 100, then... all the way up to 4000." Though it is the most common answer, it is entirely wrong. This kind of common yet broken testing process explains why each of us who joined the conversation could retell case studies of customers who had spent years (and millions of dollars) on failed testing efforts.

The right answer goes like this:

a) Ask the hard questions
How many of the 4000 users are concurrent, and what is their usage pattern? For example, many batch billing systems do nothing for 29 days a month, then run through a massive number of transactions on the last day. Other systems see limited daytime use until 5pm, when their user community arrives home from work and signs in. Are the users spread across multiple time zones?
If the data to discern the number of real concurrent users isn't available, that actually tells us two things about our project:
1) A separate project is needed to put tools in place to capture user behavior. The lack of such information can drive poor decisions in testing, capacity planning, security, and product design. (A sketch of estimating concurrency from captured session data follows this list.)
2) If no such data exists and the 4000 number simply means we have 4000 users in our database, we can back into a more realistic upper bound through some basic calculations.
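As an illustration of point 1, here is a minimal sketch of what one could do once session data is being captured. It assumes a hypothetical CSV of session records with login and logout timestamp columns (the file name and schema are assumptions, not anything from the post); a sweep over the sorted session boundaries yields the peak number of simultaneously active users.

import csv
from datetime import datetime

def peak_concurrency(path):
    # Each session contributes +1 at login and -1 at logout;
    # sweeping the sorted events finds the maximum overlap.
    events = []
    with open(path, newline="") as f:
        for row in csv.DictReader(f):
            events.append((datetime.fromisoformat(row["login"]), 1))
            events.append((datetime.fromisoformat(row["logout"]), -1))
    events.sort()
    current = peak = 0
    for _, delta in events:
        current += delta
        peak = max(peak, current)
    return peak

print(peak_concurrency("sessions.csv"))  # hypothetical capture file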

b) Functional performance test
Start with one user as a functional performance test. This lets you validate your test cases and test scripts and flush out any immediate functional problems with the application(s).
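As a sketch of what that single-user pass might look like (the URLs and transaction names here are illustrative assumptions, not from the post), each scripted transaction is timed once and checked for an obviously broken response before any load is applied:

import time
import urllib.error
import urllib.request

# Hypothetical transactions for illustration only.
TRANSACTIONS = {
    "login": "https://example.com/login",
    "search": "https://example.com/search?q=widgets",
}

for name, url in TRANSACTIONS.items():
    start = time.perf_counter()
    try:
        with urllib.request.urlopen(url, timeout=30) as resp:
            status, size = resp.status, len(resp.read())
    except urllib.error.URLError as err:
        # A failing transaction is a functional defect to fix
        # before any multi-user load test is worth running.
        print(f"{name}: FAILED ({err})")
        continue
    elapsed = time.perf_counter() - start
    print(f"{name}: HTTP {status}, {size} bytes, {elapsed:.2f}s")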

c) Longevity testing, peak testing, failover testing
A variety of other tests have greater pertinence and validity for understanding the application's serviceability than simply running the same script with an arbitrarily increasing number of virtual users: longevity (soak) tests expose resource leaks that only appear over time, peak tests probe behavior at the busiest expected moment, and failover tests confirm the system degrades gracefully when components fail.
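For instance, a longevity test is less about the number of users than about time. As a rough sketch (the duration, interval, and URL are assumptions for illustration), drive a steady trickle of transactions for hours and watch whether latency drifts upward at constant load:

import time
import urllib.request

DURATION_S = 8 * 60 * 60   # soak for a full working day
INTERVAL_S = 5             # one transaction every 5 seconds

deadline = time.monotonic() + DURATION_S
while time.monotonic() < deadline:
    start = time.perf_counter()
    with urllib.request.urlopen("https://example.com/health", timeout=30) as resp:
        resp.read()
    latency = time.perf_counter() - start
    # Latency that rises over hours at a constant rate suggests a leak
    # that a short ramp-up test would never surface.
    print(f"{time.strftime('%H:%M:%S')} latency={latency:.3f}s")
    time.sleep(INTERVAL_S)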

d) Load and Performance testing
If we've determined that simply starting with one user and doubling from there isn't the right process for load testing our application, then what is the right heuristic for getting to the Nth user? The answer is that it doesn't really matter: we've already determined, in effect, all of the above through the answers to our questions about the user community. If we have 4000 users in our database but don't know how and when they use the application, a test that tops out at 200 users is just as valid as one that tops out at 2000.

Using these numbers, though, we can arrive at some guidelines by looking at the length of a user day. For example, if our application is used by an internal business customer that works standard business hours in the eastern time zone, we can surmise a roughly 8-hour work day, 5 days per week. Dividing 4000 users by 8 hours gives an educated guess of 500 users per hour. Multiplying the 8-hour day by 60 gives 480 minutes, and dividing 4000 users by 480 suggests that in any one-minute interval there are likely to be about 8 users on the system. In the absence of further information about our user community, we now have real, actionable numbers to test against. Rather than the dozens and dozens of incremental tests we were potentially facing, we can break our cases into one user, 10 users, and 500 users; anything above that is essentially to discover the upper bound of our capacity.
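The same back-of-envelope arithmetic, written out (the 4000-user, 8-hour figures come straight from the example above):

total_users = 4000
hours_per_day = 8

users_per_hour = total_users / hours_per_day        # 500 users/hour
minutes_per_day = hours_per_day * 60                # 480 minutes
users_per_minute = total_users / minutes_per_day    # ~8.3 users/minute

print(f"~{users_per_hour:.0f} users per hour, "
      f"~{users_per_minute:.0f} users in any given minute")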


These steps are a productive tool to improve the quality of your testing, as well as a great way to gain new insight into the candidates you interview.

2 comments:

Clay Roach said...

Great article, Jeff. I always found it a bit strange that there's still a big misconception around what constitutes a valid test scenario. It would be nice if we developed some better terminology and formulas around typical usage scenarios and then used those as the basis for defining the load.

"User" - a user could be a physical end user in the case of a GUI-based application, but could be more broadly extended to be a "consumer" of a web service or any of a number of entry points into an application
"# of Users" - not specific enough and should be completely disregarded for a performance test
"Concurrent user" - users that are simultaneously logged into the system at the same time. Typically, this can be calculated by counting the # of open 'sessions', however, this # can be skewed when the session duration is set arbitrarily long (i.e. one day).
"Transaction" - Single unit of functionality that corresponds directly to a use case or can be explained by a non-technical business user. For instance: Login, Enter Address, Submit Order, etc. Timing of transactions, when they occur and their distribution over a test scenario is the key to achieving a valid load test.
"Usage Pattern" - Pattern of clickstreams or login/logout behavior that is typical of a typical time period.
"Scenario Time Period" - Period of time that will be considered for the test. If this is a batch process, then it may just take 1 hour to complete, but in an endurance test may last days. For many apps, this also corresponds to the standard working hours for the users of the system (9x5, 24x7, 24x7 - multiple regions).

Process:
* Define concurrent user (use session metric or combination of # of hits/hour for certain key pages)
* Develop set of expected transactions (may use production data, or if a new app, then it may just be an estimate)
* Create a script that models a specific set of usage patterns (made up of the key transactions)
* Define the scenario using a calculation based on the # of concurrent users over the scenario time period
* Run the test and validate the time period, transaction distribution and expected latencies
* Re-run and adjust think times to tune the scenario to match the expected transaction mix

I probably left a few key steps out, but I think this summarizes the typical flow; a rough sketch of the idea follows.
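A minimal sketch of the scenario-modeling steps above, assuming a weighted transaction mix and think-time range invented purely for illustration: one virtual user picks transactions according to the expected distribution and pauses between them, and tuning the think times (the last step) is what brings the observed mix in line with production.

import random
import time

# Expected distribution of transactions over the scenario (illustrative).
TRANSACTION_MIX = {
    "login": 0.10,
    "browse": 0.60,
    "submit_order": 0.25,
    "logout": 0.05,
}
THINK_TIME_RANGE_S = (2.0, 8.0)  # tune until the mix matches production

def run_virtual_user(iterations=20):
    names = list(TRANSACTION_MIX)
    weights = list(TRANSACTION_MIX.values())
    for _ in range(iterations):
        tx = random.choices(names, weights=weights)[0]
        # A real test would issue the HTTP request for this transaction here.
        print(f"executing transaction: {tx}")
        time.sleep(random.uniform(*THINK_TIME_RANGE_S))

run_virtual_user()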

Ryan said...

This question, I tend to find, can make or break an interview without the candidate feeling awkward that they didn't answer it properly; they give their answer and simply wait for the next one. Just as this blog hit many different questions, if this question is answered with any level of enthusiasm the candidate can essentially take over the interview, and the interviewer can just sit back and listen to a person who could head up a Performance Testing COE. The truth of the matter is that most candidates you speak to can't answer this question. They can use the tool, script, run a test, report results (maybe), but in most cases they will have a hard time explaining transactional and usage patterns and will bail out to "well, we had 100 users in this test." It's the difference between those that paint in black and white and those that can paint in color.
