Distributable J2EE Web Applications
A Container Provider's View of the current Servlet Specification.
The
'Java(tm) Servlet Specification, Version 2.4'
makes a number of references to 'distributable' web applications
and httpsession 'migration'. It states that compliant deployments
"...can ensure scalability and quality of service features like
load-balancing and failover..." (SRV.7.7.2). In today's demanding
enterprise environments, such features are increasingly
required. This paper sets out to distil and understand the
relevant contents of the specification, construct a model of the
functionality that this seems to support, assess this
functionality with regard to feasibility and popular requirements
and finally make suggestions as to how a compliant implementation
might be architected.
Prerequisites.
TODO - A good understanding of what an HttpSession is, what it is
used for and how it behaves will be necessary for a full
understanding of this content. A comprehensive grasp of the
requirements driving architectures towards clustering and of
common cluster components (such as load-balancers) will also be
highly beneficial.
The Servlet Specification - distilled:
When a webapp declares itself <distributable/> it enters into a
contract with it's container. The Servlet Specification includes a dry
bones description of this contract which we will distil from it and
flesh out in this paper.
For a successful outcome the implementors of both Container and
Containee need to be agreed on exactly what behaviour is expected of
each other. For a really deep understanding of the contract they will
need to know why it is as it is (TODO - This paper will provide such a
view, from both sides).
The Specification mandates the following behaviour for distributable
Servlets:
Non-Distributable Servlets
Only Servlets deployed within a webapp may be distributable. (TODO -
Ed.: is there any other standard way to deploy a Servlet? Perhaps
through the InvokerServlet?) (SRV.3.2) TODO - WHY?
Single Threaded Servlets
SingleThreadedModel Servlets, whilst discouraged (since it is
generally more efficient for the Servlet writer, who understands the
problem domain, to deal with application synchronisation issues) are
limited to a single instance pool per JVM.(SRV.2.3.3.1)
Multi-Threaded Servlets
Multithreaded HttpServlets are restricted to one Servlet
instance per JVM, thus delegating all application
synchronisation issues to a single point where the Servlet's
writer may resolve them with application-level
knowledge (SRV.2.2).
Distributable State
The only state to be distributed will be the HttpSession. Thus all
application state that requires distribution must be housed in an
HttpSession or alternative distributed resource (e.g. EJB, DB,
etc.). The contents of the ServletContext are NOT distributed.
(SRV.3.2, SRV.3.4.1, SRV.14.2.8)
HttpSession Migration
Moving HttpSessions between process boundaries (i.e. from JVM to
JVM, or JVM to store) is termed 'migration'.In order that the
container should know how to migrate application-space Objects,
stored in an HttpSession, they must be of mutually agreed type.
In a J2EE (Version 1.4) environment (e.g. in a web container
embedded in an application server), the set of supported types
for HttpSession attributes is as follows, although web
containers are free to extend this set (J2EE.6.4): (Note that
using an extended type would impact your webapp's portability).
java.io.Serializable
javax.ejb.EJBObject,
javax.ejb.EJBHome
javax.ejb.EJBLocalObject
javax.ejb.EJBLocalHome
javax.transaction.UserTransaction (TODO ??)
- "a
javax.naming.Context object for the java:comp/env context" (TODO)
Breaking this contract through use of an unagreed type will
result in the container throwing an
IllegalArgumentException upon its introduction to
the HttpSession, since the container must maintain the
migratability of this resource (SRV.7.7.2).
Migration Implementation
How migration is actually implemented is undefined and left up to
the container provider (SRV.7.7.2). The application is not even
guaranteed that the container will use readObject()
and writeObject() (TODO explain) methods if they are
present on an attribute. The only guarantee given by the
specification is that their "serializable closure" will be
"preserved" (SRV.7.7.2). This is to allow the container provider
maximum flexibility in this area.
HttpSessionActivationListener
The specification describes an
HttpSessionActivationListener interface. Attributes
requiring notification before or after migration can implement
this. The container will call their willPassivate()
method just before passivation, thus giving them the chance to
e.g. release non-serialisable resources. Immediately after
activation the container will call their
didActivate() method, giving them the chance to
e.g. reacquire such resources. (SRV.7.7.2, SRV.10.2.1, SRV.15.1.7,
SRV.15.1.8). Support for a number of other such listeners are
required in a compliant implementation, but these are not directly
related to session migration.
HttpSession Affinity
Given that:
-
Multiple instances of a distributable webapp will be running
in multiple different JVMs within our proposed cluster
-
A client browser may throw multiple concurrent requests
for the same session at this cluster
-
The spirit of the specification and performance
requirements call for such a grouping of requests to be
processed concurrently, rather than serially,
we can see that any implementation must resolve these
apparently contradictory issues satisfactorily.
The Servlet Specification states:
"All requests that are part of a session must be handled by
one Java Virtual Machine (JVM) at a time." (SRV.7.7.2).
The intention of this statement is to resolve such
concurrency issues. It prunes the tree of possible
implementations substantially, insisting that all concurrent
requests for a particular session are delivered to the same
node.
Delivering requests for the same session to the same node is
known variously as 'session affinity', 'sticky sessions',
persistent sessions' etc., depending on your container's
vendor. The specification is trading complexity in the
web-container tier for complexity in the load-balancer
tier. This added requirement will impact the latency of this
tier, in that the load-balancer will generally need to parse the
uri or headers of each http request travelling through it (in a
non-encrypted form) in order to extract the target session
id. However, the reduction of potentially awkward concurrency
issues/race conditions in the web-container tier is a gain
considered worth this sacrifice.
It is worth noting that, since we have now introduced a
requirement for the load-balancer tier to have knowledge of
the location of httpsessions within the web-container tier,
the ability to 'migrate' these objects may, therefore,
require a certain amount of coordination between the two
tiers.
Background Threads
The previous requirement reduces our problem from race
conditions between distributed objects in different JVMs, to a
situation where we simply have to manage coordination between
multiple threads in the same JVM. The purpose of this
coordination is to ensure that access to container managed
resources that are available to multiple concurrent application
space threads is properly synchronised.
Whilst the container has implicit knowledge about any thread,
executing application code, for the lifecycle of which it is
responsible (i.e. request threads), it has no control over any
thread that is entirely managed by application code - Background
thread. Such threads might execute across request boundaries,
accessing otherwise predictably dormant resources that might
otherwise be passivated or migrated elsewhere.
Fortunately, the specification also recommends that references
to container-managed objects should not be given to threads that
have been created by an application (SRV.2.3.3.3, SRV.S.17) and
whose lifecycle is not entirely bounded by that of a request
thread. The container is encouraged to generate warnings if this
should occur. Application developers should understand that
recommendations such as this become all the more important when
working in a distributed environment.
This concept of "container-managed objects" needs more careful
discussion and we shall look at it more closely later.
HttpSession Events
Finally, given that HttpSessions are the only type to be
distributed and that they should only ever be in one JVM at one
time, it should come as no surprise that ServletContext and
HttpSession events are not propagated outside the JVM in which
they were raised (SRV.10.7) as this would result in container
owned objects becoming active in a JVM through which no relevant
request thread was passing.
Is this adequate ?
Armed now with a deeper understanding of exactly what the
specification says about distributable webapps, we can begin to
speculate on what a compliant implementation might look like.
The specification has done a reasonably good job of outlining our area
of interest. Before implementing a container, however, there are a
number of issues that we still need to address.
Catastrophic failure
TODO -
Looking at what this specification actually says about
distributable webapps, it can be seen immediately that it seems
to reliably outline a mechanism for the controlled shutdown of a
node and the attendant migration of it's sessions to [an]other
node[s], or persistant storage.
The ability to migrate sessions on controlled shutdown is useful
functionality (maintenance will be one of the main reasons
behind the occurrence of session migration), but it does not go
far enough for many enterprise-level users, who require a
solution capable of transparent recovery, without data loss,
even in the case of a node's catastrophic failure. If a node is
simply switched off, thus having no chance to perform a shutdown
sequence, then volatile state will simply be lost. It is too
late to call HttpSessionActivationListener.willPassivate() where
necessary and serialise all user state to a safe place!
Container implementors must ask themselves the question - 'What,
within the bounds of the current specification, can we do to
mitigate this event?'.
Before moving into more detailed discussion about session
migration we need to discuss the synchronisation of session
attributes and to introduce the concepts of 'Reference vs. Value
Based Semantics' and 'Object Identity'.
Session Attribute Synchronisation
We have shown that there are many times at which a container may
wish to take a backup copy, via serialisation, of a session or
session attribute. In a multi-threaded environment the container
needs to be able to ensure a consistent view of the object that
it is backing up. i.e. the object must remain unchanged
throughout the process of serialisation, otherwise the backup
copy can not be guaranteed valid.
If we classify session attributes as "container-managed
objects", then we can see that the specification 'recommends'
their references not being given to any application thread
running beyond the scope of a request. This means that, provided
that no request threads for this session are running in the
container, we can be assured of thread-safe access to it's
attributes and thus a consistent snapshot of the session's
state.
If we classify sessions but not session attributes as
"container-managed objects", then this assumption breaks down.
Even given this asumption, backing up of sessions when a
relevant request or background thread is running (e.g. 'When'
policies 'Immediate' and 'Request') become problematic. This is
unfortunate, because inability to implement these policies
impacts on the guarantees that the container can make and thus
the quality of service that it can offer.
These issues are not isolated to the management of HttpSessions,
they are present throughout distributed software
architectures. Aside from an explicit synchronisation protocol a
common and practical solution is to alter the semantics of object equality.
Because the design of HttpSessions did not originally encompass their distributability
- explicit session attribute synchronisation protocol between
application and container code.
- shift from reference to value based semantics
Object Identity is also an issue.
Reference vs Value Based Semantics - TODO - needs refactoring.
Given the following Servlet code snippet:
Foo foo1=new Foo();
session.setAttribute("foo", foo1);
Foo foo2=session.getAttribute("foo");
Which of these assertions (assuming that Foo.equals()
is well implemented) would you expect to be true?
-
foo1==foo2;
-
foo1.equals(foo2);
If you expect foo1==foo2 then you are expecting
reference-based semantics.
If you are expecting reference-based semantics you might well
write code such as this in order to avoid unnecessary
de/rehashes:
Point p=new Point(0,0);
session.setAttribute("point", p);
p.setX(100);
p.setY(100);
and then might expect that :
((Point)session.getAttribute("point")).getX()==100;
Using value based-semantics, out of these three (TODO)
assertions, only the second of the equality tests would succeed.
Every parameter passed to and from a value based API must be
assumed to be copied from an original, since it may have come
across the wire from another address space.
For this reason, when you start dealing with (possibly) remote
objects in a distributed scenario, you generally shift your
semantics from reference to value. (c.f. Remote EJB APIs)
Unfortunately, the Servlet Specification, whilst clearly
mandating that every session attribute must be of a type that
the container knows how to move from VM to VM omits to mention
that a possible impact of doing this is an important shift in
semantics. This is exacerbated by the fact that, unlike EJBs,
which have been designed specifically for distributed use, the
httpsession API does not change (c.f. Local/Remote) according to
the semantic that is required, which is simply a single
deployment option. This encourages developers to believe that
they can make a webapp that has been written for Local use, into
a fully functional distributed component, simply by adding the
relevant tag to the web.xml. All attendant problems are
delegated, by spec and developer, to the unfortunate container
provider.
Thus the container provider must make a choice here
-
continue to support reference-based semantics in which case
migration may only occur when there are no active threads for
a session and there is an implicit contract between container
and containee that objects deriving from a session will not
have their references compared to objects deriving from
elsewhere, whose lifecycles may span across such periods.
-
make an explicit new contract that states that all interaction
with session attributes is by value and comparisons should
only be made in this way. The full ramifications of this
choice should become apparent as we progress further in this
paper.
Object Identity, Object Streams and Synchronisation
TODO - I guess Object Identity can only be preserved within a
single Object tree ? so attribute-based distribution will not
recognise the same object shared between different attributes
How can we guarantee, unless we know that no other threads are
running, the synchronisation of values as we stream them out of
the container ?
Session Backup - When
The answer to the concern of lost data is to frequently ship
backup copies off-node, so that in the case of its catastrophic
failure, we have a fallback position. The freshness of our
backup data depends directly on the frequency of this
process. This frequency is bounded by resource concerns and the
contract between container and containee, as discussed above.
Let us examine some of the possibilities:
-
Immediate - As soon as a change is made to a session, it is
backed up.
-
Most Accurate - This policy constrains our window of
data-loss as much as is reasonably possible.
-
Most Expensive - This accuracy has a cost. Every write to
a session object will result in expensive back up code
being triggered.
-
TODO - Without some agreement on value-based semantics or
attribute synchronisation, the container cannot guarantee
thread-safety as it serialises its backup.
-
Request - All changes to a session are backed up at the end of
each relevant request.
-
Less Accurate - This is less accurate than the 'Immediate'
policy described above, since a failure halfway through a
request thread would result in the loss of all changes
that it had made to it's session.
-
Less Expensive - Since backups are only done at the end of
each request, they will be fewer than 'Immediate' mode,
resulting in much less expense.
-
Inconsistency - Assuming that the session is somehow in a
'consistent' state at the end of each request is
misguided, since multiple requests may overlap. Although,
it may be, with the benefit of knowledge of the
application at one's disposal, that this problem can be
safely discounted.
-
Since there may be more application threads than just this
request running in the container, this policy suffers the
same synchronisation/semantic issues as 'Immediate',
although to a lesser extent.
-
Request Group - All changes to a session are backed up as soon
as all associated active requests have been processed.
-
Less Accurate - This policy will be less accurate still,
if requests for the same session overlap within the
container.
-
Less Expensive - For the above reason this will lead to it
being still less expensive.
-
Consistency - This policy guarantees that what it backs up
is a consistent view of the session's contents. No request
is only half processed.
-
Thread-Safe - The real win for this policy is that, with
no contract between containee and container regarding
attribute synchronisation/semantics, other than that
described in the Servlet Specification, the container may
access session attributes for backup, in the knowledge
that no other application code is concurrently modifying
them.
-
WebApplication - Backup occurs in line with web application
lifecycle. i.e. Not until the web application is
stop()-ed by it's container.
-
This policy gives no protection against catastrophic
failure, but is fine for maintenance-only scenarios.
-
There is no associated runtime overhead.
-
As with the 'Request Group' policy, there are no
consistency or synchronisation issues. All request and
background threads will have terminated before the session
is backed up.
-
Timed - 'dirty' sessions are backed up at regular intervals,
orthogonal to the lifecycles of requests, web applications
etc...
-
This might conceivably be useful to overlay on top of
e.g. the 'Request' policy if your request threads took a
long time to run, or the 'RequestGroup' policy if you
expected long periods without backing up because of many
overlapping requests for he same session.
-
A backup policy that worked in the manner would be of
conveniently tunable accuracy and impact.
-
However such a policy would suffer from all the
synchronisation and semantic issues common to 'Immediate'
and 'Request' approaches.
TODO - NEEDS CONCLUDING
Session Backup - What ?
Once we have decided when to backup, we must think about what to
backup. Candidates include the following:
-
Session - We backup the whole session every time it changes.
-
Simple - This is the simple, brute-force solution.
-
Expensive - This will be an expensive approach if there
are e.g. many attributes in your session and you regularly
touch one of them.
-
Robust - Even if a backup message went astray, perhaps due
to a failure in your transport, the next change to the
session would bring the off-node backup fully up to date.
-
Object identity may be maintained across the scope of the
whole session. If the same object appears multiple times
in your session, then this may be an important feature and
worth the price.
-
Delta - Every time some change occurs to the session,
encapsulate this change in an object and back that up. Deltas
may be batched until the 'When' policy flushes this onto the
distribution layer.
-
This is a more complex solution, requiring infrastructure
capable of detecting changes and classes to encapsulate
these.
-
This will be a more lightweight solution in the case of
sessions with many attributes and few changes but the
additional complexity might outweigh this benefit in the
case of a very small session.
-
This solution depends more upon the reliability of the
transport used to distribute the backups, since if one is
lost, the backup may become invalid. To be guaranteed
valid, it must contain the complete set of changes that
have occurred to the session, correctly ordered (TODO:
excepting some optimisation which we will discuss later).
-
Object identity may only be scoped within the delta. If
the same object appears multiple time in the same session,
both inside and outside the delta, object identity will
not be preserved and reference-based semantics will break
down.
-
All Sessions - This, of course can only be done if you select
'WebApp' as your 'When' policy.
-
A major win in this strategy is that Object Identity may
be scoped across all sessions. i.e. if you have common
objects referenced by many different sessions,
reference-based semantics will hold for them, since all
sessions will be serialised together in a single
synchronised block.
HttpSessionActivationListener:
The Servlet Specification has one final curve ball to throw at
the Container Provider here. We have already seen how
HttpSessionActivationListeners are notified around
passivation/activation. Assuming that they require this in order
to prepare themselves for serialisation, or recover from
deserialisation it is likely that when the container calls their
willPassivate() method, that they will move to a
new state that, whilst valid for serialisation, is invalid for
normal runtime operation. They might e.g. release a resource
that would be too expensive or awkward to passivate, knowing
that they can reacquire a replacement upon re-activation.
Imagine now that rather than simply migrate a session from one
node to another, we are simply taking a backup of it at the end
of a request group, a guard point against the node's
catastrophic failure. If we simply call
willPassivate() and then serialise a copy into our
backup store, we will have the backup that we required, but will
have left the attribute in a state which may mean it is invalid
for normal operations
The solution is to call didActivate() immediately
after taking the copy, thus restoring the attribute to its
previous valid state. In effect the backup procedure may be
thought of as a mini-migration off a node and then straight back
onto it again, leaving a spare copy off-node.
This has interesting ramifications for the whole 'Session'
backup policy which may end up doing this to many attributes
which have not actually been added or altered since the last
backup was taken. If this involves an expensive release and
reacquisiton of resources, the impact may be substantial. The
'Delta' policy will not suffer from this inefficiency, since it
will only concern itself with attributes that have changed.
Optimisations
TODO - These need to be discussed here so that we can draw upon
them when discussing different impls.
-
lastAccessedTime - don't distribute, ignore until last minute
? If a node dies.all sessions upon it must be considered
just touched, since a request thread for them may have caused
the crash. Time of node death should be noted and associated
with these sessions. upon rehydration this is the lastAcessed
value that they should adopt.
TODO - perhaps move this into previous section - a naive
impl may..., but: maybe not - think about it...
Unfortunately, the specification requires that every session
object carries a 'LastAccessedTime' value. Which is updated
every time the session is retrieved by an application thread
for reading or writing. Thus any request requiring stateful
interaction within the webapp will have the side effect of
writing a change to the session. Taken literally these
changes can be very expensive in a distributable scenario as
a naive implementation will require each such change to be
exported to another vm in case of catastrophic node failure.
-
Batching of deltas - compression of e.g. rem(X);set(X) or
se(X);set(X) etc..
-
TODO - lowering contention on session table...
DefaultServlet is stateless (session will never be fetched) so
can be excluded from the equation. This means that requests for
images etc can be excluded from concurrent request groups,
reducing them substantially. A smart lb would know about this
and could relax affinity as well (although a caching tier might
be serving this content before you hit the lb - this tier also
needs coordination with web-container teir so it can be
selectively flushed upon webapp redeployment).
Conclusions
In conclusion, we have been able to show that, whilst the spec
does not explicitly cater for recovery from catastrophic
failure, it does provide the Container Provider with enough
structure to be able to implement various solutions to this
problem. Unfortunately, because of its implicit reference-based
semantics and failure to impose a mandatory protocol for the
synchronisation of distributable session attributes it does not
go quite far enough to allow such implementations suffient room
to manoeuvre that they can deliver optimium results.
Given the current state of the specification, therefore, the
solution space is an area of trade-off and compromise between
not only accuracy and economy as might well be expected but also
the semantics of reference, value and identity, which are really
areas that should not be open to interpretation. Any application
developer involving themselves in this would therefore be well
advised to aqcuaint themselves fully with these concepts so as
to be prepared for the unexpected side-effects that they are
likely to cause.
No single set of the above policies is likely to implement the
desired "silver bullet", however, with a solid understanding of
the issues involved and the route that various implementations
take through this maze, the application architect has a much
improved chance of a successful outcome.
With this in mind, we may now survey existing implementations in
the open source arena, with particular respect to the solutions
that they have chosen to overcome the problems that we have
identified.
Current Open Source Implementations
The Jetty distribution contains a pluggable distributable session
manager, written by the author, which relies on value-based
semantics to implement an immediate, by default, delta-based
replication strategy over JGroups (TODO - link), although other
distribution policies, notably by CMP EJB and JBoss(tm) clustering
(see below) are alos available.
With mod_jk and session affinity, via a pluggable session id
generator, backups may be done asynchronously, or if deployment
happens under a dumb load-balancer, backups may be taken
synchronously to ensure the consistency of the session no matter
where in the cluster a request lands.
Tomcat 4.x - Filip Hanik
Tomcat 5.x - ???
JBoss(tm) contains a ClusteredHttpSession service (TODO - check
name) which backs onto the JBoss clustering layer which is
implemented through replication using JGroups (TODO -
link). Replication is done on according to whole 'Session'
policy. The regularity of the backing up depends on the Web
Container making use of the service.
The Jetty integration provides a pluggable 'Store' component which
allows it to make use of this medium. Backups are taken
immediately any change is made to a session. Jetty's other
distribution policies may also be used.
The Tomcat integration relies on this service for it's
transport. (TODO - finish). CONFIRM.
Apache Geronimo (TODO - URL)
The Geronimo implementation is currently being undertaken by the
author of this paper, and therefore takes into account all points
raised herein.
Further Reading:
-
SRV.7 Sessions
-
SRV.7.6 Last Accessed Times
-
SRV.15.1.7 HttpSession
-
SRV.15.1.8 HttpSessionActivationListener
-
SRV.15.1.9 HttpSessionAttributeListener
-
SRV.15.1.10 HttpSessionBindingEvent
-
SRV.15.1.11 HttpSessionBindingListener
-
SRV.15.1.13 HttpSessionEvent
-
SRV.15.1.14 HttpSessionListener
TODO - more readings needed.
Further Isues
-
ClassLoading - ?? where does this impact ?
Further Notes
TODO - Look into Geronimo impl... SRV.10.6 Listener Exceptions
TODO - we can use a SecurityManager to prevent background threads
being created. We can prevent access from such a thread to a
container managed object, but we can't prevent such a reference
being held by such a thread...
does anything else other than session need to be distributed ?
security info
application level data (as opposed to user level)
etc
TODO - replication is faster than shared-store because
'getAttribute' is not a remote call. Effectively, with
replication, each replicant IS a shared store which processes
requests locally.
have we mentioned that migration is bad because cache hits go
down ?
Q for Niall - if Object Identity table is scoped within a single
ObjectOutputStream, then howcome contention on this table is
meant to be a VM-wide problem ? Since the table only appears to
scope the lifecycle of a single instance of this stream, how
could it make sense for it to be a longlived global construct?