Thursday, August 31, 2006

Java Tip #11 - Beware the Fickle Session ID

In March of 2005 I posted the article Performance 101 - Avoiding Work. In that tip I described a process where we store the user's last JDBC connection in a HashMap, using the HTTP Session ID as the key. Over the last few months, several of our customers have been experiencing random, unexplained connection leaks under heavy load.

We were unable to duplicate this issue in-house, even though we executed the same business process against a copy of the customer database. This of course made it very hard to find... we had no choice but to debug a production system. A couple of weeks ago I flew to Minneapolis (nice place, btw) for a week, and the hunt was on.

I had been looking at all our fancy connection handling code trying to find the leak, and missed the problem for several days. Finally after staring at logs for several hours on the day we were scheduled to leave, we noticed something odd. The HTTP Session ID changed for a LIVE session. I had never even considered that a Session ID could change after a user logged in. Here is the scenario that caused the leak.
1) Connection X goes into map with key 123!456!NONE
2) Session ID changes to 123!456!7890
3) The next request comes in, there is NO connection in the map using key 123!456!7890, so connection Y is retrieved from the connection pool.
4) Connection X just leaked.
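
The four steps above can be reproduced in a few lines of plain Java. This is only a sketch: LeakDemo and its methods are hypothetical names, and a String stands in for the real JDBC connection and pool, which this post does not show.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical stand-in for the map described above. A String plays the role
// of a JDBC connection so the example runs without a database.
class LeakDemo {
    private final Map<String, String> connectionsBySessionId = new HashMap<String, String>();
    private int handedOut = 0;

    // Reuse the connection stored under this exact key, or take a new one
    // from the (simulated) pool and remember it under that key.
    String connectionFor(String sessionId) {
        String connection = connectionsBySessionId.get(sessionId);
        if (connection == null) {
            connection = "connection-" + (++handedOut);
            connectionsBySessionId.put(sessionId, connection);
        }
        return connection;
    }

    // Number of connections currently held in the map.
    int heldConnections() {
        return connectionsBySessionId.size();
    }
}
```

Calling connectionFor("123!456!NONE") and then connectionFor("123!456!7890") for the same live session hands out two different connections. The first one stays in the map under a key that no later request will ever look up.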

The Servlet API doesn't say anything about the ID changing, only that it must be unique. The underlying issue is that WebLogic appends proprietary information to the session "cookie" and returns the whole thing as the ID when HttpSession.getId() is called. We use the default cookie name, JSESSIONID, and a sample looks like:
JSESSIONID=1E9Xwn7nLYfOsc1oSD7iaWHMXzpHga5cQj!-1587343083!-1587348922

According to the HTTP Session Replication Failures support pattern, the JSESSIONID format is:
SessionId!PrimaryServer JVM Hash!SecondaryServer JVMHash

If you experience a session replication failure, the SecondaryServer JVMHash will change to NONE.
JSESSIONID=1E9Xwn7nLYfOsc1oSD7iaWHMXzpHga5cQj!-1587343083!NONE
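
To make the three segments concrete, here is a small sketch that splits such a value on the "!" separator. WebLogicSessionValue is a hypothetical class name, not part of any WebLogic API, and the code assumes the value always has exactly the three parts described above.

```java
// Hypothetical value class; assumes the three-part format
// SessionId!PrimaryJvmHash!SecondaryJvmHash quoted above.
final class WebLogicSessionValue {
    final String sessionId;
    final String primaryJvmHash;
    final String secondaryJvmHash;

    private WebLogicSessionValue(String sessionId, String primary, String secondary) {
        this.sessionId = sessionId;
        this.primaryJvmHash = primary;
        this.secondaryJvmHash = secondary;
    }

    // Splits the raw value (without the "JSESSIONID=" prefix) into its parts.
    static WebLogicSessionValue parse(String rawValue) {
        String[] parts = rawValue.split("!");
        if (parts.length != 3) {
            throw new IllegalArgumentException("Expected 3 '!'-separated parts: " + rawValue);
        }
        return new WebLogicSessionValue(parts[0], parts[1], parts[2]);
    }

    // True when a secondary server hash is present, false when it is NONE.
    boolean hasSecondary() {
        return !"NONE".equals(secondaryJvmHash);
    }
}
```

Only the first segment identifies the session. The other two describe where the session currently lives, which is why they can change while the session is still active.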

We actually saw the opposite: the ID started with NONE and later picked up a JVMHash. The theory is that replication failed at first and then succeeded at a later time. We also wonder whether, under heavy load, a delay in replication could cause the ID to begin with NONE and only later be assigned a secondary hash. We didn't have the opportunity to research this, but we plan to during our next load test.

Our fix was to use only the first 52 characters of the value returned by HttpSession.getId() as the key into the HashMap. The session ID length is configurable in weblogic.xml, so you need to use that configured value as the length.
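
A minimal sketch of that fix might look like the following. SessionKeys and stableKey are hypothetical names, and idLength is assumed to be whatever ID length your weblogic.xml configures (52 in our case).

```java
// Hypothetical helper; not part of the Servlet API or WebLogic.
final class SessionKeys {
    private SessionKeys() {
    }

    // Returns the leading idLength characters of the raw ID, dropping any
    // "!hash!hash" suffix that follows them. IDs that are already no longer
    // than idLength are returned unchanged.
    static String stableKey(String rawSessionId, int idLength) {
        if (rawSessionId == null) {
            throw new IllegalArgumentException("rawSessionId must not be null");
        }
        if (idLength <= 0) {
            throw new IllegalArgumentException("idLength must be positive");
        }
        if (rawSessionId.length() <= idLength) {
            return rawSessionId;
        }
        return rawSessionId.substring(0, idLength);
    }
}
```

In a servlet this would be called as SessionKeys.stableKey(session.getId(), 52), with the result used as the map key in place of the raw ID. If the length passed in does not match the configured one, the key either still contains part of the suffix or cuts the real ID short, so it is worth reading the value from configuration rather than hard-coding it.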

I'm sure BEA just doesn't know what to do here. There is probably tons of code out there relying on HttpSession.getId() returning the full JSESSIONID cookie, so changing its default behavior would not be a good idea.

Watch out for the fickle session ID! Please let me know if you've experienced this before on application servers other than WebLogic. I tested this on WebLogic 8.1 through 9.2 and saw the same behavior.

2 Comments:

Blogger Unknown said...

Why on earth would you do this instead of just using any Connection Pool implementation?

This does not sound like a good idea:

1.- JDBC connections are not thread safe, and a single session could have more than one concurrent request (more than one tab open in Firefox, AJAX requests, for example)
2.- This does not scale well. The reason products like Tangosol exist is that your server farm can grow indefinitely, but your database cannot. You are proposing a solution that opens a connection to the database for each user.
3.- You will end up doing funky things to prevent JDBC connection timeouts if the connection timeout is shorter than the session timeout.

More monsters ahead: HA, failover, etc. You should use any open source connection pool implementation instead.

7:38 AM  
Blogger Billy Bob Bain said...

Thanks for the response.
We do use BEA's connection pool. This map is used to hold the connection obtained from the pool while the transaction is open. This is not a webapp, so #1 does not apply. It is a COM application talking to a servlet. On #2, it does scale. We've proven it. It only holds that connection for the life of the transaction. You are right on #3. We did have to put a session listener in to handle rolling back the transaction on session timeout.

Wondering if JTA might help with this. Thanks again.

1:36 PM  
