Sunday, December 6, 2009

Performance Testing Web Applications

This post explains how performance, load, and stress testing fit together into end-to-end performance testing

Performance testing:

What to measure:
  • Response time of an action (Performance testing tool)
  • Actual throughput vs target throughput (Performance testing tool)
  • Responsiveness of the action/app (for example, successful HTTP requests with a 200 response code) (Performance testing tool)

Baselining:
Exercise each action in the application with no load/concurrency
In the absence of a specific requirement for supported throughput, measure the actual throughput under no load, without increasing or manipulating the target throughput

If there is a clear requirement for the minimum throughput to be supported, execute the action with the target throughput set to that minimum value and, after execution, verify the actual throughput generated

Throughput is transaction rate (business transactions in a given interval of time)
For instance: bytes transferred/second, requests/second, etc.
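
To make the baseline measurement concrete, here is a minimal Python sketch (assuming Python 3) that exercises one action with a single user, records response times, and derives the actual throughput. The URL and request count are placeholders for illustration, not values from a real application; a performance testing tool would normally report these numbers for you.

import time
import urllib.request

ACTION_URL = "http://localhost:8080/app/login"   # hypothetical action under test
REQUESTS = 50                                    # illustrative sample size

response_times = []
start = time.time()
for _ in range(REQUESTS):
    t0 = time.time()
    with urllib.request.urlopen(ACTION_URL) as resp:
        resp.read()                    # drain the body so timing covers the full transfer
        assert resp.status == 200      # responsiveness check: successful HTTP request
    response_times.append(time.time() - t0)
elapsed = time.time() - start

print("avg response time: %.3f s" % (sum(response_times) / len(response_times)))
print("actual throughput: %.2f requests/s" % (REQUESTS / elapsed))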


Load testing:

What is load:
  • Number of concurrent users/connections
  • Number of concurrent threads accessing each action in the application
  • Throughput for each action

What to measure:
1. Breaking/stress point
Steadily increase the load (as defined above) for each action and execute the test
The point at which the test starts returning 500 response codes (server errors due to high load) is the breaking/stress point for your application (to be continued in a future post on stress testing); a scripted sketch of this step-up follows below
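
As an illustration only (a sketch under assumptions, not a replacement for a proper load testing tool), the following Python script raises concurrency in steps and stops once 5xx responses appear. The action URL, load steps, and requests per step are made up for the example.

import urllib.request
import urllib.error
from concurrent.futures import ThreadPoolExecutor

ACTION_URL = "http://localhost:8080/app/search"   # hypothetical action under test

def hit(url):
    # Return the HTTP status code for one request
    try:
        with urllib.request.urlopen(url) as resp:
            return resp.status
    except urllib.error.HTTPError as e:
        return e.code                 # 4xx/5xx responses raise HTTPError; keep the code
    except urllib.error.URLError:
        return 503                    # treat connection failures as server-side errors

for concurrency in (5, 10, 25, 50, 100, 200):     # illustrative load steps
    with ThreadPoolExecutor(max_workers=concurrency) as pool:
        codes = list(pool.map(hit, [ACTION_URL] * (concurrency * 10)))
    errors = sum(1 for c in codes if c >= 500)
    print("concurrency=%d -> %d/%d server errors" % (concurrency, errors, len(codes)))
    if errors:
        print("breaking/stress point reached around %d concurrent users" % concurrency)
        break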

2. Stability under acceptable load
Acceptable load is the load that you can expect your application to be subject to in production
Three ways to arrive at this acceptable load:
  • Management specifies in a requirement doc
  • Monitor production usage of the application (for example, by tailing logs) and determine the load or frequency of use of each action
  • 60% of the breaking point/stress point, if accepted by stakeholders

Measuring stability under increased load means checking responsiveness and verifying that the latency introduced by load is within an acceptable level (latency in this context is the response time with load minus the response time without load); a small worked example follows at the end of this subsection

What counts as an acceptable level defines the verification criteria
In the absence of a response time requirement, the observed latency is presented to stakeholders. If accepted, this becomes the baseline/yardstick
If the latency is not accepted by stakeholders, it's back to the drawing board for developers to tune the performance of the system (that's a topic for another day)
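
To make the latency arithmetic concrete, here is a tiny Python sketch with made-up numbers; the threshold is whatever the stakeholders accept, not a value from this post.

baseline_response_time = 0.120   # seconds, measured with no load (baselining step above)
loaded_response_time = 0.310     # seconds, measured at the acceptable load level

latency = loaded_response_time - baseline_response_time
print("latency introduced by load: %.3f s" % latency)   # 0.190 s in this example

ACCEPTABLE_LATENCY = 0.250       # assumed stakeholder-approved limit
if latency > ACCEPTABLE_LATENCY:
    print("latency out of proportion -> file a performance bug")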

What to monitor:

1. System resources on the application server (system monitoring tools - Nagios, VMStat, PerfMon)
Monitor CPU and memory utilization, disk and network I/O, and swap size at different load levels and between load tests to track down memory leaks
The Nagios UI displays all the utilization info from the server as graphs
PerfMon results can also be analyzed as graphs
VMStat results must be collected and exported to Excel, and graphs generated from that data
The reason I stress graphs is that it's easier to spot spikes when utilization is plotted against time
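
As one way to handle that export step, here is a rough Python sketch that collects vmstat samples into a CSV file that Excel (or any charting tool) can turn into graphs. It assumes a Linux box with vmstat on the PATH; the sampling interval, duration, and output file name are arbitrary choices for the example.

import csv
import subprocess

# "vmstat 5 120" = one sample every 5 seconds for 10 minutes (run alongside the load test)
out = subprocess.run(["vmstat", "5", "120"], capture_output=True, text=True).stdout
lines = out.strip().splitlines()
header = lines[1].split()            # second line of vmstat output holds the column names
rows = [line.split() for line in lines[2:]]

with open("vmstat_load_test.csv", "w", newline="") as f:
    writer = csv.writer(f)
    writer.writerow(header)
    writer.writerows(rows)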

One quick test for memory leaks is to run a high-load test, stop the test, and re-run it. Expect to see a dip in utilization before it rises again. If utilization keeps climbing with no dip, there you have your memory leak bug .. woo hoo!

2. File handles - small scripts can be written to monitor the count of open file handles; one possible sketch follows
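
One possible version of such a script on Linux counts the entries under /proc/<pid>/fd. The PID and polling interval below are placeholders; run it as the same user as the application server, or as root.

import os
import time

APP_SERVER_PID = 12345        # hypothetical PID of the application server process
INTERVAL_SECONDS = 10

while True:
    open_handles = len(os.listdir("/proc/%d/fd" % APP_SERVER_PID))
    print("%s  open file handles: %d" % (time.strftime("%H:%M:%S"), open_handles))
    time.sleep(INTERVAL_SECONDS)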

3. Garbage collection - profiling tools (JProfiler) -- note: profiling tools slow down the system, so do not measure response times while profiling

4. Sessions in the app server - monitor the application server manager to ensure sessions are cleaned up appropriately after the user logs out/test ends

Verification criteria:
If the tests show acceptable latency, 200 response codes, and no memory leaks, the test passes -- that's your baseline!
If the tests show disproportionate latency, 404/500 response codes, or memory leaks, file a bug

Performance regression testing
All the measurements of baselines/yardsticks should be noted and compared against all future builds.

If performance is better in a future build, the new measurement becomes the new baseline
If performance is worse, file a bug and don't change the baseline. If the bug cannot be resolved, hold on to the old baseline and the new results to track this long term
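
A minimal sketch of what that baseline comparison could look like, with an assumed metrics file, metric name, and tolerance (none of these come from a real project):

import json

TOLERANCE = 0.10   # allow 10% degradation before calling it a regression (assumed value)

with open("baseline.json") as f:                  # e.g. {"login_response_time": 0.12}
    baseline = json.load(f)
current = {"login_response_time": 0.15}           # measured on the new build

for metric, base_value in baseline.items():
    new_value = current[metric]
    if new_value <= base_value:
        baseline[metric] = new_value              # better result becomes the new baseline
    elif new_value > base_value * (1 + TOLERANCE):
        print("regression in %s: %.3f -> %.3f (file a bug, keep the old baseline)"
              % (metric, base_value, new_value))

with open("baseline.json", "w") as f:             # persist the (possibly updated) baseline
    json.dump(baseline, f, indent=2)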

Yet to come in future posts:

Persistence testing
Stress testing
Scalability testing
Best practices in performance testing
Tools for performance testing will be covered in detail (JMeter, HTTPerf, OpenSTA, etc.)
Performance testing Linux/Windows systems (IOMeter, etc.)

Stay tuned .. !