Wednesday, October 29, 2008

Sometimes it's not about the technology

In a recent conversation, I was asked about tool support for enforcing application architecture. The team on this project had grown from fewer than five people to more than 20, and many of the newcomers failed to grasp the overall architecture, choosing instead to forge their own way ahead for each new functional component. One symptom was copy-and-paste coding: blocks of functionality duplicated with minor variations throughout the code base. What had started as an application rewrite aimed at creating a model for future application projects throughout the department was quickly turning into a very sloppy project.

The question I was asked is a common but unfortunate one. Things have been lean in many IT departments for a long time, even before the current economic downturn. I can't remember the last time I met a development team that was using anything other than cost-free tools. It is also becoming very rare to meet a development team with more than a skeleton crew, since so much work has been sent offshore in the name of cost cutting. This is not a complaint about either of those factors but a recognition of the state of affairs: this is reality, and it isn't going to change. The real frustration is the lack of skilled managers who recognize the difference between management problems and technology problems, a shortage directly correlated with the emphasis on cost over quality.

In the particular case I mentioned, it is impractical to try to enforce architecture through technology. Even if it were practical, it would be a poor substitute for good management. Don't misunderstand: the right architecture is addictive. Because it makes developers want to follow the pattern, it can act as a natural enforcer. My argument is that expecting technology to separate the wheat from the chaff when it comes to talent or productivity is asking for trouble, and it lets managers act irresponsibly. If a team has "renegade" developers who are creating sloppy, unmaintainable code outside the constraints of a selected architecture, then that's as much a reflection on ineffective technical management as on those developers. A tool might tell you who broke the build or how many times the same code block has been copied and pasted. It is still the responsibility of technical managers to notice problems, quantify them, respond, guide, and otherwise act as stewards. The root cause is that today's IT managers are grossly ill-equipped to either understand or handle these problems. That stems from an endemic lack of training, rampant age discrimination, short-sighted and ultimately ineffective cost cutting, and the outright gutting of departments.

Here's another story. I once worked for a company with a publicly stated policy of never hiring consultants or contractors of any kind in its IT division. The history behind this was that many years before, a large, well-known technology consultancy had come in and spent several years and several million dollars evaluating the technology operations. The final deliverable was two binders' worth of "best practices." Needless to say, the CIO at the time was less than pleased with the way his money had been spent. The irony, of course, is that he should have been pointing the finger at himself, not at the consultants.

The problem in this case wasn't the vendor, or the technology and methodology being recommended, but the way that CIO mismanaged the project. Anyone who has been in this industry for even a few years has dozens of these stories to tell, and a pattern becomes clear. It really doesn't matter what tools you are using or how modern your methodologies are. The differentiating factor between successfully delivering working technology solutions and utter disasters boils down to one thing: people. Perhaps your managers are simply filling a space on the organizational chart rather than actually being effective. Worse, they may expect software itself to act as a panacea for personnel matters. Either way, you've got problems regardless of how SOA, Agile, and Cloud you may wish things to be.

Friday, October 17, 2008

Tips and tricks for discovering performance issues before they become production issues

Nobody wants to get that phone call: the one that comes at 6 a.m., informing you that it is going to be a very long day because of a crisis in the software you built. It could be an outright crash, data corruption, or users fleeing your application because of its lagging performance. It doesn't have to be like this. There are reliable ways to put your application under the microscope with a reasonable amount of effort.

First and foremost, you need tools. There is a range of tools available for every budget, and as with most things, you get what you pay for. Start simple: you need a way to generate load against your application, and you need a code profiler. Freely available open source tools like JMeter are a good place to start; you can become productive in under an hour. There are countless inexpensive code profilers available, just a web search away. What these lack in features, they make up for in simplicity and price. If you've never examined the performance of your application, then anything is a step up.

Once we are generating consistent load against our application and running a code profiler, what should we be looking for? The approach we use at J9 is one of pragmatism: focus on the areas with the largest potential return. For example, many books on Java application performance start with a discussion of String versus StringBuffer or StringBuilder. Perhaps they include information on choosing the appropriate Collection types to reduce synchronization overhead. These are fantastic suggestions, but there are plenty of other, more pragmatic improvements to be made before we get to this level of granularity. Take initial JVM heap size as an example. It is now one of the first questions we ask customers when evaluating their application performance: have you explicitly set an appropriate size? We've seen countless customers who were pulling their hair out over server crashes find those problems resolved by this trivial change.
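On HotSpot JVMs, the heap is normally sized with the standard -Xms (initial heap) and -Xmx (maximum heap) options. Before tuning, it helps to confirm what the JVM actually received. The minimal sketch below reads the heap figures back through java.lang.Runtime. The class name is our own, and the 512m values in the comment are placeholders, not a recommendation. Size the heap from your own measurements under load.

```java
// HeapReport.java: prints the heap figures the running JVM actually received.
// Launch with explicit sizes, for example (placeholder values):
//   java -Xms512m -Xmx512m HeapReport
final class HeapReport {
    private static final long MB = 1024L * 1024L;

    /** Upper bound the JVM will try to use (roughly -Xmx), in MB.
     *  Runtime.maxMemory() returns Long.MAX_VALUE if there is no inherent limit. */
    static long maxHeapMb() {
        return Runtime.getRuntime().maxMemory() / MB;
    }

    /** Heap currently committed by the JVM (starts near -Xms), in MB. */
    static long committedHeapMb() {
        return Runtime.getRuntime().totalMemory() / MB;
    }

    /** Heap in use right now: committed minus free, in MB. */
    static long usedHeapMb() {
        Runtime rt = Runtime.getRuntime();
        return (rt.totalMemory() - rt.freeMemory()) / MB;
    }

    public static void main(String[] args) {
        System.out.println("max heap (MB):       " + maxHeapMb());
        System.out.println("committed heap (MB): " + committedHeapMb());
        System.out.println("used heap (MB):      " + usedHeapMb());
    }
}
```

If the committed figure starts far below the maximum, the JVM has to grow the heap while under load. Setting -Xms and -Xmx explicitly removes that guesswork.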

We've seen a common list of problems over the years that yield significant improvements with a few simple changes. Besides the initial JVM heap setting, these include:

-- Object pooling for external dependencies: Are you using it? Are your pools sized correctly for expected demand?
-- XML serialization: This one normally shows up as high CPU use, with your application spending its effort processing wrappers rather than the business problem at hand.
-- Poor database interaction: There are a number of basic issues here, like cumbersome SQL statements and tables that have not been properly indexed.
-- Lack of caching: Dynamically fetching otherwise static data (like JNDI entries), or a weak caching strategy leading to a poor hit ratio. This assumes that any caching has been implemented at all.
-- Slow report or page rendering: This is common, especially in PDF generation with large data sets. Typically it is an architectural problem stemming from a monolithic approach.
-- Slow network, inadequate hardware: The basics need to be in place before we can expect performance.
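For the caching item, here is a minimal sketch of caching otherwise static data while tracking the hit ratio. The loader function is a hypothetical stand-in for whatever expensive lookup you are caching, such as a JNDI lookup. The class name and structure are our own illustration. A production cache would also need eviction or refresh if the data can ever change.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicLong;
import java.util.function.Function;

// Caches the results of an expensive lookup for data that rarely changes
// and counts hits and misses so the hit ratio can be monitored.
final class StaticDataCache<K, V> {
    private final Map<K, V> entries = new ConcurrentHashMap<>();
    private final Function<K, V> loader;          // the expensive lookup
    private final AtomicLong hits = new AtomicLong();
    private final AtomicLong misses = new AtomicLong();

    StaticDataCache(Function<K, V> loader) {
        this.loader = loader;
    }

    V get(K key) {
        V cached = entries.get(key);
        if (cached != null) {
            hits.incrementAndGet();
            return cached;
        }
        misses.incrementAndGet();
        // computeIfAbsent runs the loader at most once per absent key,
        // even if several threads miss on the same key at the same time.
        return entries.computeIfAbsent(key, loader);
    }

    double hitRatio() {
        long h = hits.get();
        long m = misses.get();
        return (h + m) == 0 ? 0.0 : (double) h / (h + m);
    }

    public static void main(String[] args) {
        AtomicLong loads = new AtomicLong();
        // Hypothetical stand-in for a slow lookup such as a JNDI lookup.
        StaticDataCache<String, String> cache = new StaticDataCache<>(name -> {
            loads.incrementAndGet();
            return "resource:" + name;
        });
        for (int i = 0; i < 10; i++) {
            cache.get("jdbc/OrdersDB");
        }
        System.out.println("loads=" + loads.get() + " hitRatio=" + cache.hitRatio());
    }
}
```

Running main performs one real lookup and nine cache hits, so it prints a hit ratio of 0.9. A low ratio in production is the "weak caching strategy" symptom described above.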

Even under a consistent, moderate load, a rudimentary code profiler should offer hints about the common problems above.

Once we've got our tools selected and set up and we start looking for trouble, what should our approach be? Ideally, we're striving to employ the scientific method, beginning with simple divide-and-conquer. First, I look across the entire application, noting which transactions stand out in terms of latency. My search begins there because that tends to be the low-hanging fruit: many problems, regardless of their root cause, manifest as latency issues. It also lets me focus on getting an early win. If I can reduce an annoying 10-second response time to 5 seconds, an end user notices the difference. Saving an extra 256 megabytes of RAM is something only a purist would notice. With the information about the slowest-performing transactions, I can then take a tier or layer perspective. This step involves examining the performance of my application at each step. Where are the slowdowns in the web tier? Are there bottlenecks in the database? What about web services or message-oriented middleware? Maybe there are problems within the business logic itself. The key to this examination is keeping concerns separated: look at the performance within one layer at a time, ignoring the performance within other layers.
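As a sketch of that first pass, assume you can export per-transaction response times from your load tool. The class below averages the timings per transaction and ranks them slowest first. The class name, transaction names, and sample numbers are hypothetical. Averages hide outliers, so in practice you would also look at percentiles.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Aggregates observed response times per transaction and ranks the slowest,
// which is where a divide-and-conquer investigation starts.
final class LatencyRanking {
    // transaction name -> {sample count, total milliseconds}
    private final Map<String, long[]> stats = new HashMap<>();

    void record(String transaction, long millis) {
        long[] s = stats.computeIfAbsent(transaction, k -> new long[2]);
        s[0]++;
        s[1] += millis;
    }

    double averageMillis(String transaction) {
        long[] s = stats.get(transaction);
        return s == null ? 0.0 : (double) s[1] / s[0];
    }

    /** Transaction names ordered from slowest to fastest average latency. */
    List<String> slowestFirst() {
        List<String> names = new ArrayList<>(stats.keySet());
        names.sort((a, b) -> Double.compare(averageMillis(b), averageMillis(a)));
        return names;
    }

    public static void main(String[] args) {
        LatencyRanking ranking = new LatencyRanking();
        // Hypothetical sample timings, e.g. exported from a load-test run.
        ranking.record("checkout", 9800);
        ranking.record("checkout", 10200);
        ranking.record("login", 350);
        ranking.record("search", 1200);
        for (String name : ranking.slowestFirst()) {
            System.out.printf("%-10s %8.1f ms%n", name, ranking.averageMillis(name));
        }
    }
}
```

Keying the recorder on names like "web:checkout" or "db:findOrders" gives the per-tier view described above without changing the code.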

Once you've identified the layer or tier where a problem is occurring, begin looking for data that enables you to build a testable hypothesis. Be careful not to assume the first problem you find is the root cause; many problems are side effects of the real issue. For example, is the SQL statement slow because it has not been optimized, because the tables it touches have not been indexed, or because of poor data validation and data quality? A typical workflow might be as follows:

-- Slowness is identified at the database.
-- The slowest-running SQL statements are identified.
-- Statement latency is divided into connection latency and execution latency. Which one is worse?
-- Assuming connection latency is significant, look for obvious issues:
   -- Are we using connection pooling?
   -- Are we actually pulling our connections from the pools we have configured?
   -- Are the pools sized adequately for the expected transaction throughput?
   -- Are there network or connectivity issues that would cause connections to expire or be slow to create?
-- Create a test or collect supporting data to eliminate each of these potential concerns.
-- For each issue identified, devise a solution and test the effectiveness of that solution.
-- Repeat process until application achieves acceptable performance levels.
-- Implement monitoring, thresholds, and alerts to proactively catch future issues.
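The connection-versus-execution step can be sketched in Java as below. It is a minimal illustration, not a profiler. In JDBC terms, the acquire step would wrap something like DataSource.getConnection() and the work step the statement execution. To keep the example runnable without a database, main uses hypothetical stand-in lambdas that simply sleep. Real code should also close the connection, for example with try-with-resources.

```java
import java.util.concurrent.Callable;

// Times the two phases of a database call separately so you can tell whether
// connection acquisition or statement execution dominates.
final class ConnectionVsExecution {

    /** A unit of work on an acquired resource that may throw, as JDBC calls do. */
    interface Work<C, R> {
        R run(C resource) throws Exception;
    }

    /** Elapsed nanoseconds for each phase of one call. */
    static final class Timing {
        final long connectNanos;
        final long executeNanos;

        Timing(long connectNanos, long executeNanos) {
            this.connectNanos = connectNanos;
            this.executeNanos = executeNanos;
        }

        String dominantPhase() {
            return connectNanos > executeNanos ? "connection" : "execution";
        }
    }

    static <C, R> Timing measure(Callable<C> acquire, Work<C, R> execute) throws Exception {
        long t0 = System.nanoTime();
        C resource = acquire.call();      // e.g. dataSource.getConnection()
        long t1 = System.nanoTime();
        execute.run(resource);            // e.g. running the prepared statement
        long t2 = System.nanoTime();
        return new Timing(t1 - t0, t2 - t1);
    }

    public static void main(String[] args) throws Exception {
        // Hypothetical stand-ins: a slow, unpooled connection and a fast query.
        Timing t = measure(
                () -> { Thread.sleep(200); return "connection"; },
                conn -> { Thread.sleep(20); return "rows"; });
        System.out.printf("connect=%d ms execute=%d ms -> investigate %s%n",
                t.connectNanos / 1_000_000, t.executeNanos / 1_000_000, t.dominantPhase());
    }
}
```

Repeating the measurement under load and comparing the two numbers tells you which branch of the workflow above to follow.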

As you work through problems, be aware of what you can and cannot control. Some physical limits of computing are outside your control, just as code from a vendor will seldom be repaired quickly.


We can summarize the approach recommended by J9 as follows:

-- Look for solutions with a high probability of success.
-- Be aware of the basic limits and issues with your environment.
-- Use a scientific, quantitative approach.
-- Put in place tools that make finding and testing for issues easier.

Tuesday, October 14, 2008

SOA does not mean agnostic

As a follow-up to our last post about the nature of SOA, I want to address a common misconception. Here are two statements I hear all too often:

1) SOA is about masking the route to a service from its implementation.

2) The underlying transport protocols used in SOA should always be hidden.


It seems that in technology there is an ever-pervasive interest in adding more and more layers of indirection. If we don't understand the reason for such layers, we can quickly add unnecessary overhead and complexity to otherwise simple problems. In the case of the statements above, the underlying goal is a good one: creating flexible implementations while masking details from a service user. Without an understanding of the nuances involved, though, these commonly heard refrains are badly mistaken.

Let's address point one. It's true that UDDI and other kinds of service discovery help us change the location or even the implementation of our services. Let's be clear, though, that implementing UDDI is not a primary goal of SOA but a by-product of one implementation approach. You can do SOA without an intermediary routing requests between service providers and service consumers; you will simply have a more tightly coupled implementation. And there have been ways of creating this kind of indirection since long before the word SOA entered our lexicon.

Next we should address the idea of protocol agnosticism. If I had a nickel for every time I heard the statement "SOA masks the underlying protocol," I'd be retired. Let's reflect on reality for a minute. Even if we ask a service directory for the means of reaching a service, we've had to make a conscious effort to ask that service directory, which inherently means we've chosen a protocol. As a related example, suppose we offered access to a service over both message-oriented middleware and HTTP. We'd then be forced to supply an SDK/API to service consumers that would mask the underlying protocol used. Whoops: SDKs? APIs? Isn't part of the very draw of Web Services not having to support binary APIs in multiple languages for all of our consumers? In the end, protocol agnosticism is a fallacy: it is neither a realistic goal of SOA nor, in some cases, even a desirable one.
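To make the SDK point concrete, here is roughly the kind of client library you end up shipping when you promise consumers that the transport is hidden. Every name here is hypothetical, and the transports are stubs that only tag their responses. The point is that the consumer still links against OrderStatusClient and Transport. That is a binary API that has to be built and versioned for each consumer language.

```java
// A hypothetical client SDK that hides whether a service call travels over
// HTTP or message-oriented middleware. Both transports are stubs.
interface Transport {
    String send(String operation, String payload);
}

final class HttpTransport implements Transport {
    public String send(String operation, String payload) {
        // Real code would POST the request to an endpoint URL here.
        return "http:" + operation + ":" + payload;
    }
}

final class MessagingTransport implements Transport {
    public String send(String operation, String payload) {
        // Real code would put a request message on a queue and await the reply.
        return "mom:" + operation + ":" + payload;
    }
}

/** The API every consumer must link against to get "protocol agnosticism". */
final class OrderStatusClient {
    private final Transport transport;

    OrderStatusClient(Transport transport) {
        this.transport = transport;
    }

    String statusOf(String orderId) {
        return transport.send("getOrderStatus", orderId);
    }

    public static void main(String[] args) {
        System.out.println(new OrderStatusClient(new HttpTransport()).statusOf("42"));
        System.out.println(new OrderStatusClient(new MessagingTransport()).statusOf("42"));
    }
}
```

The consumer never sees the protocol, but only because someone wrote, and must now maintain, this library for every platform the consumers use.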

Monday, October 13, 2008

Systems are not Services

In our ongoing discussion about what SOA is and is not, it's time to address another popular misconception -- that everything is a service. Nothing could be further from the truth, which is why we also have to follow up with this reality check: SOA is hard.

Too often when working with customers, I hear the misconception that they are already doing SOA. After all, they have a claims service, a billing service, and an order management service that have all been "web-enabled," so they are completely buzzword compliant! This is where we need to pin down some terminology:

-- Service: A concrete, decomposed, independent functional offering.
-- Application: An aggregation of related services
-- System: An aggregation of related applications.

Now time for an example:

-- Service: User login, Change customer address.
-- Application: User account management
-- System: Revenue Management (encompassing applications like user account management, bill pay, and product ordering)
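One way to picture these definitions in code is the Java sketch below. Each service is a narrow interface, and the application is nothing more than an aggregation of services. All type names are hypothetical, and the in-memory lambdas stand in for independently deployed services.

```java
import java.util.HashMap;
import java.util.Map;

// Services: concrete, decomposed, independent functional offerings.
interface UserLoginService {
    boolean login(String userId, String password);
}

interface ChangeAddressService {
    void changeAddress(String customerId, String newAddress);
}

// Application: an aggregation of related services (user account management).
final class UserAccountManagement {
    final UserLoginService login;
    final ChangeAddressService changeAddress;

    UserAccountManagement(UserLoginService login, ChangeAddressService changeAddress) {
        this.login = login;
        this.changeAddress = changeAddress;
    }
}

// A system such as Revenue Management would in turn aggregate applications
// (user account management, bill pay, product ordering).
final class ServiceModelDemo {
    public static void main(String[] args) {
        Map<String, String> addresses = new HashMap<>();
        // In-memory stand-ins; each service could be deployed and reused on its own.
        UserLoginService login = (user, pw) -> "secret".equals(pw);
        ChangeAddressService change = addresses::put;

        UserAccountManagement accounts = new UserAccountManagement(login, change);
        accounts.changeAddress.changeAddress("c-1", "1 Main St");
        System.out.println(accounts.login.login("alice", "secret") + " " + addresses.get("c-1"));
    }
}
```

Contrast this with two monolithic systems that merely exchange HTTP requests. There, nothing like UserLoginService exists as a separately reusable unit.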


Thus, having two or more monolithic systems or applications now speaking http does not constitute SOA.

Now, about SOA being hard. SOA makes a lot of sense for greenfield development or in times of major IT overhauls. Even so, it's a challenge to decompose a system or application into independent, reusable services and to carry their implementation through. It will take aggressive shepherding by technical architects both to lay out such a vision and to keep services in their appropriate scope. For many organizations, SOA is enticing but will ultimately prove too overwhelming in the face of declining budgets and more pressing priorities. And as long as vendors benefit from system lock-in, the likelihood of SOA adoption through vendor selection is unclear.

Friday, October 10, 2008

Why is J9 building protocols for HP LoadRunner?

Our experience last week at StarWest was filled with eureka moments. That's really the greatest benefit we get from attending conferences like StarWest: the opportunity to get hands-on with IT practitioners and get a dose of honest feedback.

One of the questions we found ourselves answering over and over related to our JMS and JDBC protocol offerings for HP LoadRunner, typically phrased as "Why do I need that?" In fact, the most surprising yet common question we answered was "Doesn't HP already offer this?"

Here's how this situation plays out. When you license HP LoadRunner, whether you realize it or not, you are paying for specific protocols that plug into it. The most common scenario involves purchasing the web protocol, which enables you to record and replay a series of actions against a browser-delivered application. This is a fantastic start down the road of performance and scalability testing, but it isn't the complete story. What if, in the course of your performance testing, you want to understand the scalability of your database separately from the overall scalability of your application? With just the web protocol, you can't do it. And HP doesn't offer anything in its current protocol lineup that targets your databases and message-oriented middleware. This is where HP partners like J9 are stepping in to fill a gap, with HP's blessing. J9's JDBC protocol lets you record and replay all of the SQL statements that make up the interactions between your application and the database. That allows you to measure just the database performance. The same concept applies to your messaging providers. With J9's JMS protocol, JMS messages are captured, and both the consumer and provider perspectives can be simulated under load conditions.

Drawing on feedback like what we received at StarWest, J9 is continuing along our road map of building protocols that enable customers to maximize their investment in HP LoadRunner. We encourage you to download our free, full-featured trial version and to send us feedback on your performance testing experiences.