Profiling with AspectJ
David J. Pearce (1), Matthew Webster (2), Robert Berry (2) and Paul H.J. Kelly (3)

(1) School of Mathematics, Statistics and Computer Science, Victoria University of Wellington, NZ.
    Email: [email protected]. Tel: +64 (0)44635833.
(2) IBM Corporation, Hursley Park, Winchester, UK
(3) Department of Computing, Imperial College, London, UK
SUMMARY
This paper investigates whether AspectJ can be used for efficient profiling of Java programs. Profiling
differs from other applications of AOP (e.g. tracing), since it necessitates efficient and often complex
interactions with the target program. As such, it was uncertain whether AspectJ could achieve this goal.
Therefore, we investigate four common profiling problems (heap usage, object lifetime, wasted time and
time-spent) and report on how well AspectJ handles them. For each, we provide an efficient implementation,
discuss any trade-offs or limitations and present the results of an experimental evaluation into the costs
of using it. Our conclusions are mixed. On the one hand, we find that AspectJ is sufficiently expressive
to describe the four profiling problems and reasonably efficient in most cases. On the other hand, we
find several limitations with the current AspectJ implementation that severely hamper its suitability for
profiling.
KEY WORDS: AspectJ, AOP, Java, Profiling, Performance
1. INTRODUCTION
Profiling program behaviour is a common technique for identifying performance problems caused
by, for example, inefficient algorithms, excessive heap usage or synchronisation. Profiling can be
formalised as the collection and interpretation of program events and is a well-understood problem
with a significant body of previous work. However, one area of this field has been largely unexplored
in the past: effective deployment. That is, given a program, how can it be easily profiled in the desired
manner? In some situations, this is relatively straightforward because the underlying hardware provides
support. For example, time profiling can be implemented using a timer interrupt to give periodic access
to the program state. Alternatively, hardware performance counters can be used to profile events such as
cache misses, cycles executed and more [1, 2]. The difficulty arises when there is no hardware support
for the events of interest. In this case, instrumentation code must be added and various strategies are
used to do this. For example, gprof — perhaps the most widely used profiler — relies upon specific
support from gcc to insert instrumentation at the start of each method [3]. Unfortunately, it is very
difficult to capitalise on this infrastructure for general purpose profiling simply because gcc has no
mechanism for directing where the instrumentation should be placed.
In a similar vein, binary rewriters (e.g. [4, 5]) or program transformation systems (e.g. [6, 7]) can
help automate the process of adding instrumentation. While these tools do enable profiling, they are
cumbersome to use since they operate at a low level. For example, binary rewriters provide only simple
interfaces for program manipulation and, hence, code must still be written to apply the instrumentation.
Likewise, program transformation tools operate on the abstract syntax tree and require the user to provide
complex rewrite rules to enable instrumentation. In a sense, these tools are too general to provide an
easy solution to the profiling problem. What is needed is a simple and flexible mechanism for succinctly
specifying how and where instrumentation should be deployed.
One solution is to provide support for profiling through a general purpose virtual machine interface.
For example, the Java Virtual Machine Profiler Interface (JVMPI) enables several different types of
profiling [8]. However, there are some drawbacks: firstly, it is a fixed interface and, as such, can
only enable predefined types of profiling; secondly, enabling the JVMPI often dramatically reduces
performance. The Java Virtual Machine Tool Interface (JVMTI) replaces the JVMPI in Java 1.5 and
attempts to address both of these points [9]. However, as we will see in Section 8.3, this comes at a
cost — the JVMTI no longer supports profiling directly. Instead, it simply enables the manipulation of
Java bytecodes at runtime, placing the burden of performing the manipulation itself on the user.
An alternative to this has recently become possible with the advent of Aspect Oriented Programming
(AOP) — a paradigm introduced by Kiczales et al. [10]. In this case, the programmer specifies in
the AOP language how and where to place the instrumentation, while the language compiler/runtime
takes care of its deployment. This does not require special support (e.g. from the JVM), as the
target program is modified directly. However, very few works have considered AOP in the context of
profiling. Therefore, we address this by taking the most successful AOP language, namely AspectJ, and
evaluating whether it is an effective tool for profiling. We do this by selecting four common profiling
problems and investigating whether they can be implemented in AspectJ or not. We also examine what
performance can be expected in practice from the current AspectJ implementation, as this is critical to
the adoption of AspectJ by the profiling community. Our reasoning is that the outcome of this provides
some evidence as to whether AspectJ is suitable for general purpose profiling or not. For example, if we
could not implement these straight-forward cases, we would have little hope that other, more complex
types of profiling were possible. Likewise, if we were able to implement them, but the performance
was poor, this would indicate AspectJ was not yet ready for the profiling community.
The outcome of our investigation is somewhat mixed. We find that, while the language itself
can express the profiling examples we consider, several limitations with the current AspectJ
implementation prevent us from generating results comparable with other profilers (such as hprof).
As such, we believe these must be addressed before AspectJ can be considered a serious platform for
profiling. Specifically, the main contributions of this paper are as follows:
1. We investigate AspectJ as a profiling tool — both in terms of performance and descriptive
ability. This is done by evaluating four case studies across 10 benchmarks, including 6 from
SPECjvm98.
2. We present novel techniques, along with source code, for profiling heap usage, object lifetime,
wasted time and time-spent with AspectJ.
3. We identify several issues with the current AspectJ implementation which prohibit complete
implementations of our profiling case-studies.
Throughout this paper we use the term “AspectJ” to refer to the language itself, whilst “AspectJ
implementation” refers to the AspectJ implementation available from http://www.aspectj.org
(since this is very much the standard implementation at this time).
The remainder is organised as follows. Section 2 provides a brief introduction to AspectJ. Sections
3, 4, 5 and 6 develop AspectJ solutions for profiling heap usage, object lifetime, wasted time and time-spent respectively. After this, Section 7 presents the results of an experimental evaluation into the cost
of using them. Section 8 discusses related work and, finally, Section 9 concludes.
2. INTRODUCTION TO ASPECTJ
In this section, we briefly review those AspectJ constructs relevant to this work. For a more complete
examination of the language, the reader should consult one of the well-known texts (e.g. [11, 12]).
AspectJ is a language extension to Java allowing new functionality to be systematically added to an
existing program. To this end, AspectJ provides several language constructs for describing where the
program should be modified and in what way. The conceptual idea is that, as a program executes, it
triggers certain events and AspectJ allows us to introduce new code immediately before or after these
points. Under AOP terminology, an event is referred to as a join point, whilst the introduced code is
called advice. The different join points supported by AspectJ include method execution, method call
and field access (read or write). We can attach advice to a single join point or to a set of join points
by designating them with a pointcut. The following example, which profiles the number of calls to
MyClass.toString() versus those to any toString() method, illustrates the syntax:
1. aspect ToStringCountingAspect {
2. private int totalCount = 0;
3. private int myCount = 0;
4.
5. pointcut myCall() : call(String MyClass.toString());
6. pointcut allCalls() : call(String *.toString());
7.
8. before(): myCall() { myCount++; }
9. after() : allCalls() { totalCount++; }
10.}
This creates two pointcuts, myCall() and allCalls(), which respectively describe the act of
calling MyClass.toString() and toString() on any class. They are each associated with
advice which is executed whenever a matching join point is triggered. Here, before() signals the
advice should be executed just before the join point triggers, while after() signals it should be
executed immediately afterwards (although it makes no difference which is used in this example). The
advice is wrapped inside an aspect which performs a similar role to the class construct. Aspects
permit inheritance, polymorphism and implementation hiding. When the aspect is composed with
a Java program — a process known as weaving — the program behaviour is changed such that
myCount is incremented whenever MyClass.toString() is called. Likewise, totalCount
is incremented whenever any toString() method is called (including MyClass.toString()).
Note, the current AspectJ implementation does not alter the program’s source code, rather the change is
seen in its generated bytecode. A problem can arise with a pointcut that matches something inside the
aspect itself as this can cause an infinite loop, where the aspect continually triggers itself. This would
happen, for example, if our aspect had a toString() method that was called from the after advice.
To overcome this, we can specify that a class C should not be advised by including !within(C) in
the pointcut definition.
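For example, the allCalls() pointcut from the aspect above could be guarded as follows (a minimal sketch of the idiom, not part of the original listing):

  pointcut allCalls() : call(String *.toString()) && !within(ToStringCountingAspect);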
Another interesting issue is determining which particular join point triggered the execution of some
advice. For this purpose, AspectJ provides a variable called thisJoinPoint which is similar in
spirit to the this variable found in OOP. It refers to an instance of JoinPoint which contains both
static and dynamic information unique to the join point in question. Here, static information includes
method name, class name and type information, while dynamic information includes parameter values,
virtual call targets and field values. To provide the dynamic information, the AspectJ implementation
creates a fresh instance of JoinPoint every time a join point is triggered, passing it to the
advice as a hidden parameter (much like the this variable). For efficiency reasons, this is only
done if the advice actually references it. For the static information, the AspectJ implementation
constructs an instance of JoinPoint.StaticPart which is retained for the duration of the
program's execution. This can be accessed through either the thisJoinPoint.getStaticPart() method
or the thisJoinPointStaticPart variable. The latter is preferred as it allows greater
optimisation. We can alter our example in the following way to record the total number of calls to
toString() on a per-class basis, rather than lumping them all together:
2.   private Map totalCounts = new HashMap();
     ...
6.   after() : allCalls() {
7.     Class c = thisJPSP.getSignature().getDeclaringType();
8.     Integer i = (Integer) totalCounts.get(c);
9.     if(i != null) totalCounts.put(c, new Integer(i.intValue()+1));
10.    else totalCounts.put(c, new Integer(1));
11.  }
Note, thisJPSP is an abbreviation for thisJoinPointStaticPart and is used throughout
the remainder of this paper to improve the presentation of our code examples. Also, totalCounts
replaces totalCount from before. It can also be useful to access information about the enclosing
join point. That is, the join point whose scope encloses that triggering the advice. For example,
the enclosing join point of a method call is the method execution containing it and AspectJ
provides thisEnclosingJoinPoint to access its JoinPoint object. The corresponding static
component is accessed via thisEnclosingJoinPointStaticPart (henceforth thisEJPSP).
The final AspectJ feature of relevance is the inter-type declaration, which gives the ability to define
new fields or methods for existing classes and/or to alter the class hierarchy. For example, the following
alters MyClass to implement the Comparable interface:
1. aspect ComparableMyClass {
2. declare parents: MyClass implements Comparable;
3. int MyClass.compareTo(Object o) { return 0; }
4. }
This first declares that MyClass implements Comparable and, second, defines the required
compareTo() method (which effectively adds this method to MyClass). Note, if MyClass already
had a compareTo() method, then weaving this aspect would give a weave-time error.
3. PROFILING HEAP USAGE
In this section, we investigate AspectJ as a tool for determining which methods allocate the most heap
storage. The general idea is to advise all calls to new with code recording the size of the allocated
object. This amount is then added to a running total for the enclosing method (of that call) to yield
exact results:
 1. aspect HeapProfiler {
 2.   Hashtable totals = new Hashtable();
 3.
 4.   before() : call(*.new(..)) && !within(HeapProfiler) {
 5.     MutInteger tot = getTotal(thisEJPSP);
 6.     Class c = thisJPSP.getSignature().getDeclaringType();
 7.     if(c.isArray()) {
 8.       Object[] ds = thisJoinPoint.getArgs(); // dims for array
 9.       tot.value += sizeof(c,ds);
10.     } else {
11.       tot.value += sizeof(c);
12.   }}
13.
14.   MutInteger getTotal(Object k) {
15.     MutInteger s = (MutInteger) totals.get(k);
16.     if(s == null) {
17.       s = new MutInteger(0);
18.       totals.put(k,s);
19.     }
20.     return s;
21.   }
22.   int sizeof(Class c, Object... arrayDims) { ... }
23. }
Here, sizeof() computes the size of an object and, for now, assume it behaves as expected —
we discuss its implementation later. Also, getTotal() maps each method to its accumulated
total. Notice that, since the JoinPoint.StaticPart object given by thisEJPSP uniquely
identifies the enclosing method, it can be used as the key. Furthermore, getTotal() is implemented
with a Hashtable to provide synchronised access, although more advanced containers (e.g.
ConcurrentHashMap) could be used here. The use of !within(HeapProfiler) is crucial
as it prevents the advice from being applied to code within the aspect itself. Without this, an infinite
loop can arise with the advice being repeatedly triggered in getTotal(). Notice that we have
used an anonymous pointcut to specify where the advice should be applied, instead of an explicit
designation via the pointcut keyword. The purpose of distinguishing between the creation of
arrays and other objects is subtle. The key point is that, to determine the size of an array, we need
its dimensions and AspectJ gives access to these via getArgs() since the arguments of an array
constructor are its dimensions (Section 3.1 discusses this in more detail). As an optimisation, we call
getArgs() only on array types, since this requires accessing the JoinPoint object (which is
created lazily in the current AspectJ implementation). Finally, the MutInteger class is similar to
java.lang.Integer, except that its value can be updated.
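For completeness, a MutInteger along the lines assumed above could be as simple as the following sketch (the paper does not show this helper class itself):

  class MutInteger {
    int value;
    MutInteger(int v) { value = v; }
  }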
A subtle aspect of our approach is that bytes allocated inside an object’s constructor are not
attributed to the enclosing method creating it. Instead, they are attributed to the constructor itself and,
to see that this makes sense, consider the following:
1. class T { T() { for(...) new X(); }}
2. int foo() { T x = new T(); }
This example, while perhaps somewhat contrived, highlights an important point: if T’s constructor
allocates a lot of unnecessary storage, who is to blame? By including bytes allocated by T() in
foo()’s total, we are misdirecting optimisation efforts toward foo() rather than T(). Of course, we
could devise situations where the problem stems from foo() calling T() too often. In this case, the
inclusive approach seems to make more sense, since it focuses attention toward foo(). However, this
is misleading as it is really a fundamentally different problem regarding call frequency. For example,
foo() may call T() frequently because it is itself called frequently and neither will catch this.
Furthermore, while both approaches could be extended to catch problems relating to call frequency,
the inclusive approach could never catch the example highlighted above.
We now identify our first limitation with the current AspectJ implementation which affects
the precision of our scheme. The issue is that the pointcut call(*.new(..)) does not catch
allocations of array objects — meaning they are not included in the heap measurements. In
fact, a fix for this issue has been recently included in the AspectJ implementation (as a direct
result of this work), although it is not currently activated by default (the command-line switch
“-Xjoinpoints:arrayconstruction” is required).
3.1. IMPLEMENTING SIZEOF
At this juncture, we must briefly discuss our implementation of sizeof as Java does not provide
this primitive. To determine an object’s size, we iterate every field of its class (using reflection) and
total their sizes. Of course, this is an estimate since alignment/packing issues are ignored, the size of
references is unknown (we assume 4 bytes) and the object header size is also unknown (we assume
8 bytes). Also, we do not traverse references and accumulate the size of the objects they target, since
only the bytes allocated by the current call to new are relevant (and the objects targeted by such fields
must have been allocated previously). For arrays, the innermost dimension is calculated using the type
held by the array, whilst the outer dimensions are assumed to be arrays of references to arrays (the
dimensions themselves being obtained from the join point, as discussed previously). Again, we do not
traverse the references of objects held by the innermost dimension since an array cannot be populated
until after being created. To improve performance (as reflection is notoriously slow), we also employ a
Hashtable to cache results and make subsequent requests for the same type cheaper.
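The following sketch illustrates the reflective estimate for non-array types under the assumptions just stated (8-byte headers, 4-byte references, no alignment); it is our simplification, not the implementation used by djprof, which additionally handles arrays and caches its results (see Appendix B):

  import java.lang.reflect.Field;
  import java.lang.reflect.Modifier;

  class SizeofEstimator {
    // Estimate the size of an instance of c by summing the sizes of all
    // non-static fields declared in c and its superclasses.
    static int sizeof(Class c) {
      int size = 8;                                   // assumed object header
      for (Class k = c; k != null; k = k.getSuperclass()) {
        for (Field f : k.getDeclaredFields()) {
          if (Modifier.isStatic(f.getModifiers())) { continue; }
          Class t = f.getType();
          if (t == long.class || t == double.class) { size += 8; }
          else if (t == int.class || t == float.class) { size += 4; }
          else if (t == short.class || t == char.class) { size += 2; }
          else if (t == byte.class || t == boolean.class) { size += 1; }
          else { size += 4; }                         // assumed reference size
        }
      }
      return size;
    }
  }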
One issue with this implementation is that a Hashtable lookup is needed to access cached type
sizes. If we could eliminate this, the cost of using our heap profiling aspect might be reduced. AspectJ
version 1.5.0 introduced a new primitive, pertypewithin(..), which makes this possible. This
allows us to specify that a separate aspect instance should be instantiated for every type matching a
given type pattern. For example:
1. aspect TestAspect pertypewithin(mypkg..*) {
2. ...
3. }
This results in a separate instance of TestAspect (created lazily) for every class within the
package mypkg, rather than just a single instance of TestAspect being created (as for normal
aspects). Thus, pertypewithin allows us to associate state (in our case, sizeof information) with
a type. The information associated with a type C is stored as a static class variable of C. This (in
theory at least) allows us to access cached sizeof information using a static field lookup, rather than
a Hashtable lookup. Unfortunately, we find in practice that using pertypewithin to implement
the cache actually gives worse performance than using a Hashtable. The reason for this appears to
be that, although the information is stored in a static field, the current AspectJ implementation accesses
it via a reflective call. We expect future optimisation of the AspectJ implementation will eliminate
this overhead, leading to better performance of pertypewithin. A complete implementation of
sizeof using pertypewithin is given in Appendix B for reference.
4. PROFILING OBJECT LIFETIME
In this section, we look at profiling object lifetime, where the aim is to identify which allocation sites
generate the longest-living objects. This can be used, for example, to help find memory leaks as long-lived objects are candidates [13]. Another application is in generational garbage collectors, where it is
desirable to place long lived objects immediately into older generations, often known as pretenuring
(see e.g. [14, 15]).
As we have already demonstrated how allocation sites can be instrumented with AspectJ, the
remaining difficulty lies in developing a notification mechanism for object death. In Java there are
two obvious constructs to use: weak references and finalizers. An implementation based on the latter
would rely upon introducing special finalizers for all known objects to signal their death. This poses a
problem as introducing a method foo() into a class which already has a foo() is an error in AspectJ.
To get around this, we could manually specify which classes need finalizers introduced into them (i.e.
all those which don't already have them) with a pointcut; we could then advise all finalizers
to signal object death. Note that, while the process of determining which classes don't have finalizers
could be automated, this cannot be done within AspectJ itself, making this approach rather inelegant.
In light of this, we choose weak references and, indeed, they have been used for this purpose before
[14].
 1. aspect lifetimeProfiler {
 2.   static int counter = 0;
 3.   static int period = 100;
 4.   static Set R; static Monitor M;
 5.   static ReferenceQueue Q;
 6.
 7.   after() returning(Object o) : call(*.new(..)) &&
 8.       if(++counter >= period) && !within(lifetimeProfiler) {
 9.     MyRef mr = new MyRef(thisJPSP, System.currentTimeMillis(), o, Q);
10.     R.add(mr);
11.     counter = 0;
12.   }
13.
14.   lifetimeProfiler() {
15.     HashSet tmp = new HashSet();
16.     R = Collections.synchronizedSet(tmp);
17.     Q = new ReferenceQueue();
18.     M = new Monitor(); M.start();
19.   }
20.
21.   class MyRef extends PhantomReference {
22.     public JoinPoint.StaticPart sjp;
23.     public long creationTime;
24.
25.     MyRef(JoinPoint.StaticPart s, long c, Object o, ReferenceQueue q) {
26.       super(o,q); sjp = s; creationTime = c;
27.   }}
28.
29.   class Monitor extends Thread {
30.     public void run() { while(true) { try {
31.       MyRef mr = (MyRef) Q.remove();
32.       R.remove(mr);
33.       long age = System.currentTimeMillis() - mr.creationTime;
34.       getSample(mr.sjp).log(age);
35.     } catch(InterruptedException e) {
36.   }}}}
37.
38.   class AvgSample {
39.     double avg = 0; int num = 0;
40.     public void log(long v) { avg = ((avg * num) + v) / ++num; }
41.   }
42.   AvgSample getSample(Object k) { ... }}
Figure 1. The outline of our lifetime profiler aspect. The key feature is the after() returning(..) notation,
which gives access to the newly allocated object returned by new. The advice then attaches an extended phantom
reference containing the creation time and allocation site. When an object dies, its reference is removed from
R by the Monitor and its lifetime logged. Here, getSample() is similar to getTotal() from before.
AvgSample is used to maintain the average lifetime of all objects created at a given allocation site. Additional
code is needed to catch immortal objects: on program termination this would iterate through R to identify and log
the lifetime of any unclaimed objects. Finally, counter-based sampling is used to reduce the number of objects
being tracked. This lowers overhead and causes less perturbation on the target program.
The java.lang.ref package was introduced to give a degree of control over garbage collection
and provides three types of weak reference. These share the characteristic that they do not prevent the
referenced object, called the referent, from being collected. That is, if the only references to an object
are weak, it is open to collection. The different weak reference types provide some leverage over when
this happens:
1. Soft references. The garbage collector must free softly referenced objects before throwing an
OutOfMemoryError exception. Thus, they are useful for caches, whose contents should stay
for as long as possible. Soft references are “cleared” (i.e. set to null) before finalisation, so
their referents can no longer be accessed.
2. Weak references. Their referents are always reclaimed at the earliest convenience. Weak
references are also “cleared” before finalisation.
3. Phantom references. Again, phantomly referenced objects are always reclaimed at the earliest
convenience. However, they are not “cleared” until after finalisation.
To see which is best suited to our purpose, we must understand the relevance of clearing. When creating
a reference, we can (optionally) indicate a ReferenceQueue onto which it will be placed (by the
garbage collector) when cleared. Thus, this is a form of callback mechanism, allowing notification of
when the reference is cleared. Note, if the reference were not cleared before being placed on the queue, our application
could resurrect it by adding new references. In fact, objects are not truly dead until after finalisation
because their finalizer can resurrect them [14]. From these facts, it follows that phantom references
give the most accurate indication of object lifetime.
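As a minimal, self-contained illustration of this callback mechanism (separate from our aspect, and with behaviour dependent on when the collector actually runs), consider:

  import java.lang.ref.PhantomReference;
  import java.lang.ref.Reference;
  import java.lang.ref.ReferenceQueue;

  public class PhantomDemo {
    public static void main(String[] args) throws InterruptedException {
      ReferenceQueue q = new ReferenceQueue();
      PhantomReference r = new PhantomReference(new Object(), q);
      System.gc();                       // request collection (no guarantee)
      Reference dead = q.remove(1000);   // blocks until enqueued, or 1s timeout
      System.out.println(dead == r);     // true if the referent was reclaimed
    }
  }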
The basic outline of our scheme is now becoming clear: at object creation, we attach a phantom
reference and record a timestamp and an identifier for the allocation site. The phantom reference is
associated with a global reference queue, monitored by a daemon thread. This is made efficient by
ReferenceQueue.remove(), which sleeps until a reference is enqueued. Thus, when an object
is garbage collected, the daemon thread is awoken (by the reference queue) to record the time of death
and, hence, compute the object’s lifetime. Figure 1 provides the core of our implementation.
5. PROFILING WASTED TIME
In the last section, we developed a technique for profiling object lifetime. In fact, we can go beyond
this by breaking up the lifetime into its Lag, Drag and Use phases [16]. Under this terminology, lag is
the time between creation and first use, drag is that between last use and collection, while use covers
the rest. Thus, we regard lag and drag as wasted time and the aim is to identify which allocation sites
waste the most.
An important question is what it means for an object to be used. In this work, we consider an object
is used when either of the following occurs: a public, non-static method is called; or a public, non-static field is read or written. We ignore read/writes to private fields and methods, since these must
have arisen from a call to a public method, in which case the object use has been registered. Methods
which run for a long period of time updating the internal state of some object may cause imprecision
if there is sufficient difference between the time of method entry and the actual last use. This is really
a trade-off as, by ignoring changes to the internal state of an object, the profiling data associated with
it needs to be updated less frequently, leading to greater performance in practice. We wanted a more
complex definition of object use, which additionally ignored changes to public fields from within the
object’s own methods. As it turned out, we could not express this constraint efficiently in AspectJ.
The main difficulty in this endeavour actually lies in efficiently associating state with an object.
Here, the state consists of timestamps for the first and last use which, upon object death, can be used
to determine lag, drag and use. This state is updated by advice associated with the get/set join points
as the program proceeds. As such advice will be executed frequently, access to the state must be as
fast as possible. We considered, when embarking upon this project, that there should be three possible
approaches:
1. Using a Map. This is the simplest solution — state is associated with each object using a
HashMap (or similar). The downside, of course, is the need to perform a lookup on every field
access (which is expensive).
2. Member Introduction. In this case, we physically add new fields to every object to hold the state
using member introduction. The advantage is constant time access, while the disadvantage is an
increase in the size of every object.
3. Using pertarget. The pertarget specifier is designed for this situation. It indicates that one
aspect should be created for every object instance, instead of using a singleton (which is the
norm). Again, a disadvantage is that every object is larger.
The issue of increasing object size is important as it reduces the advantages of sampling — where the
aim is to reduce overheads by monitoring only a few objects, rather than all. In particular, sampling
should dramatically reduce the amount of additional heap storage needed, but this is clearly impossible
if the size of every object must be increased. Now, approach 3 gives something like:
1. aspect ptWaste pertarget(call(*.new(..))) {
2.   State theState = new State();
3.   before(Object o) : target(o) && (set(public !static * *.*) ||
4.       get(public !static * *.*) ||
5.       call(public !static * *.*(..))) {
6.     ...
7. }}
The pertarget(X) specifier declares that a separate instance of the aspect should be created
for every object that is the target of the join points identified by X. Thus, a separate instance of
the ptWaste aspect will be created for every constructible object. Each would be created the first
time its corresponding object is the target of some invoked advice. This allows theState to be
shared between invocations of advice on the same object. We have already seen that the call join
point captures method invocation. In this case, we have annotated it to specify that only public,
non-static methods should be captured. Likewise, the get/set join points capture all public, non-static field reads/writes. Thus, these join points taken together define what it means for an object
to be used. Unfortunately, this approach of using pertarget fails as there are no valid target
objects for a call(*.new(..)) — meaning the pertarget(call(*.new(..))) specifier
does not match anything. This arises because the target object is not considered to exist until after the
call(*.new(..)) pointcut. Using other pointcuts for the pertarget(...) specifier (such as
get(), set() and call()) does not help because these will not match objects which are created
but not used. This constitutes our second limitation with the current AspectJ implementation. As this
behaviour was intentional, it is perhaps more significant than the others identified so far. In particular,
it remains uncertain whether or not it can be resolved.
The second approach, which uses Inter-Type Declarations (ITD), provides a manual implementation
of the above:
 1. aspect itdWaste {
 2.   private interface WI { }
 3.   declare parents: * && !java.lang.Object implements WI;
 4.   State WI.theState = new State();
 5.   before(Object o): target(o) &&
 6.       (set(public !static * *.*) ||
 7.        get(public !static * *.*) ||
 8.        call(public !static * *.*(..))) {
 9.     if(o instanceof WI) {
10.       WI w = (WI) o;
11.       ... // access w.theState directly
12.     } else {
13.       ... // use map
14. }}}
Here, line 3 is an ITD which declares every class to implement interface WI (except
java.lang.Object, as this is prohibited by the current AspectJ implementation), while line 4
introduces theState into WI. The effect of all this is to introduce a new instance variable theState
into every user-defined class in the class hierarchy. This ensures that every corresponding object has exactly one copy∗ of theState
and, through this, we can associate each object with a unique instance of State. Only user-defined
classes are affected by the ITD because, in practice, classes in the standard library cannot be altered
using the current AspectJ implementation (Section 7.1 discusses the reasons for this in more detail).
The pointcut for the advice matches all uses (including method invocation) of any object. In the case of a user-defined object (i.e. an object implementing WI), we obtain constant-time access to theState (since it is a field). For other objects, we use a map to associate the necessary state as a fall back (this is outlined in more detail below). To complete the design, we must also advise all new calls to record creation time and include our technique from the previous section for catching object death.

∗ Note, AspectJ does not introduce a field F into a class whose supertype is also a target for the introduction of F. Thus, an instance of any class can have at most one copy of F, rather than potentially one for every supertype in its class hierarchy.
We now consider our final design, which uses a map to associate state with each object. A key
difficulty is that the map must not prevent an object from being garbage collected. Thus, we use weak
references to prevent this, which adds additional overhead. The main body of our implementation is
detailed in Figure 2 and the reader should find it similar to those discussed so far. Note the use of
sampling to reduce the number of objects being tracked. This improves space consumption as fewer
state objects are instantiated, although it has little impact upon runtime overhead. In fact, our ITD
implementation also uses sampling for this reason, although it must still pay the cost of an extra word
per user-defined object (for theState).
The astute reader may notice something slightly odd about our implementation of Figure 2 — it
contains a bug! The problem is subtle and manifests itself only when the target program contains
objects with user-defined hashCode() implementations that read/write public fields. It arises
because WeakKey invokes an object's hashCode() method, which is needed to ensure different
WeakKeys referring to the same object match in the Hashtable. This invocation will correspond to
a use of the object if hashCode() reads/writes public fields. The invocation itself is not a use, since
it occurs within WasteAspect and this is explicitly discounted using !within(..). The problem
causes an infinite loop where looking up the state associated with an object is a use of it, which triggers
the before() advice, which again tries to lookup the state and so on. To get around this is not trivial.
We cannot use an alternative map, such as TreeMap, since this uses the object’s compareTo()
method, leading to the same problem. We could, however, employ AspectJ’s cflow() construct to
include in our definition of an object use the constraint that a method within WasteAspect cannot
be on the call stack. Unfortunately, this would almost certainly impose a large performance penalty
[17]. Thus, we choose simply to acknowledge this problem, rather than resolving it, since it is unlikely
to occur in practice.
 1. public final aspect WasteAspect {
 2.   static int counter = 0;
 3.   static int period = 100;
 4.   static Map R = new Hashtable();
 5.   static ReferenceQueue Q = new ReferenceQueue();
 6.
 7.   after() returning(Object newObject) : call(*.new(..))
 8.       && !within(WasteAspect) && if(++counter >= period) {
 9.     WeakKey wk = new WeakKey(newObject,Q);
10.     R.put(wk, new State(thisJPSP, System.currentTimeMillis()));
11.     counter = 0;
12.   }
13.
14.   before(Object o) : target(o) && !within(WasteAspect) && (
15.       call(public !static * *.*(..)) ||
16.       set(public !static * *.*) || get(public !static * *.*)) {
17.     Object t = R.get(new WeakKey(o));
18.     if(t != null) {
19.       State s = (State) t;
20.       s.lastUse = System.currentTimeMillis();
21.       if(s.firstUse == -1) { s.firstUse = s.lastUse; }
22.   }}
23.
24.   class WeakKey extends WeakReference {
25.     int hash;
26.     WeakKey(Object o) { super(o); hash = o.hashCode(); }
27.     WeakKey(Object o, ReferenceQueue q) { super(o,q); hash = o.hashCode(); }
28.     public int hashCode() { return hash; }
29.     public boolean equals(Object o) {
30.       if (this == o) return true;
31.       Object t = this.get();
32.       Object u = ((WeakKey) o).get();
33.       if ((t == null) || (u == null)) { return false; }
34.       return t == u;
35.     }
36.   }
37.
38.   private final class State {
39.     long lastUse, firstUse = -1;
40.     long creationTime;
41.     JoinPoint.StaticPart sjp = null;
42.     State(JoinPoint.StaticPart s, long c) { creationTime = c; sjp = s; }
43.   }
44. }
Figure 2. The core parts of an aspect for profiling wasted time. The key features are the Hashtable which
associates state with an object and the use of counter-based sampling to reduce overhead. Note that, while
sampling does help reduce storage, it does not prevent a table look up on each field access. To complete this
design, a daemon thread must monitor Q to catch object death and log usage information, as for Figure 1. Finally,
WeakKey.equals() deals with two awkward problems: firstly, its hashcode must be identical for identical
referents to allow correct look up from the after() advice; secondly, look up must still be possible after the
referent is cleared.
6. PROFILING TIME SPENT
In this section, we consider time sampling, where the idea is to periodically log the currently executing
method. Thus, on termination, those methods with the most samples (i.e. logs) accrued are considered
to be where most time was spent. Our approach is to track the currently executing method with AspectJ,
so it can be periodically sampled by a daemon thread. This is done by updating a global variable
whenever a method is entered (through normal entry/call return) or left (through normal exit/calling
another):
 1. aspect CurrentMethodAspect {
 2.   static JoinPoint.StaticPart current;
 3.   before() : (execution(* *.*(..)) || execution(*.new(..)))
 4.       && !within(CurrentMethodAspect) {
 5.     current = thisJPSP;
 6.   }
 7.
 8.   after() returning : (execution(* *.*(..)) || execution(*.new(..)))
 9.       && !within(CurrentMethodAspect) {
10.     current = null;
11.   }
12.
13.   before() : (call(* *.*(..)) || call(*.new(..)))
14.       && !within(CurrentMethodAspect) {
15.     current = null;
16.   }
17.
18.   after() returning : (call(* *.*(..)) || call(*.new(..)))
19.       && !within(CurrentMethodAspect) {
20.     current = thisEJPSP;
21. }}
Here, the unique JoinPoint.StaticPart object is used to identify the currently executing
method. Notice that current is assigned to null when a method is left. This may seem redundant,
since it will be overwritten as soon as the next method is (re-)entered. Indeed, if we could guarantee
that all methods were advised, this would be the case. Unfortunately, we cannot necessarily make
this guarantee for reasons discussed in Section 7.1. With regard to multithreading, our approach can
be inaccurate as, following a context switch, a sample may be taken before current is updated by
the newly executing method. This results in time being incorrectly charged to the method which was
running before the switch. Following [18], we argue that this is not a serious cause for concern as
sampling is inexact anyway and it is unlikely that such behaviour would consistently affect the same
method. Certainly, synchronizing on current would cause greater problems and making it a thread
local means a thread lookup before/after every method call/execution. Thus, we choose against either
of these on the grounds of efficiency.
Another interesting point is the use of after() returning, instead of just after() advice.
The former only catches normal return from a method, whilst the latter also catches thrown exceptions.
Our reason, then, for choosing after() returning is that we have observed it offers better
performance (up to 20% in some cases), while the issues of missing return by exception seem
negligible. Note, if this were considered important, then after() could simply be used in place
of after() returning to ensure current was updated correctly after an exception.
7. EXPERIMENTAL RESULTS
In this section, we present and discuss the results of an experimental evaluation into the costs of using
the profiling aspects developed in the previous sections. We also introduce djprof, a command-line tool which packages these aspects together so they can be used without knowledge of AspectJ.
This was used to generate the results presented later on and we hope it will eventually find future
use as a non-trivial AspectJ benchmark. Indeed, previous work has commented on the lack of such
benchmarks [17]. The djprof tool itself is available for download under an open source license from
http://www.mcs.vuw.ac.nz/~djp/djprof.
The benchmark suite used in our experiments consisted of 6 benchmarks from the SPECjvm98
suite [19] as well as 4 candidates which were eventually dropped from inclusion in it† . Table I details
these. Where possible, we also compared the performance and precision of djprof against hprof
† Note, the _222_mpegaudio benchmark is also part of the SPECjvm98 suite. This could not be used due to a bug in the current
implementation of the new array join point.
Benchmark        Size (KB)   Time (s)   Heap (MB)   SPECjvm98   Multi-threaded
_227_mtrt            56.0        4.7        25          Y            Y
_202_jess           387.2        6.5        17          Y            N
_228_jack           127.8        5.5        17          Y            N
_213_javac          548.3       12.3        31          Y            N
_224_richards       138.5        7.4        18          N            Y
_210_si              18.2        9.0        16          N            N
_208_cst             23.2       17.8        30          N            N
_201_compress        17.4       21.1        24          Y            N
_209_db               9.9       35.1        28          Y            N
_229_tsgp             7.7       36.5        26          N            N

Table I. The benchmark suite. Size indicates the amount of bytecode making up the benchmark, excluding harness code and standard libraries. Time and Heap give the execution time and maximum heap usage for one run of the benchmark.
(a well-known JVMTI profiler — see [8]) and pjprof, a pure Java time profiler described below. In
doing this, our aim was twofold: firstly, to ascertain whether the current AspectJ implementation is
competitive, compared with alternative approaches; secondly, to validate the results produced by our
profiling aspects. We now provide further discussion of djprof and pjprof, detail the experimental
procedure used and present the results themselves.
7.1. THE DJPROF TOOL
In this section, we consider issues related to the deployment of our aspects as part of a general purpose
profiling tool. We believe it desirable that such a tool can be used on existing Java programs without
knowledge of AspectJ. One solution is for the tool to statically weave aspects behind the scenes. In
this case, they are combined with the original binaries to produce modified binaries in some temporary
location. However, the current AspectJ implementation allows a more efficient mechanism, through a
feature known as load-time weaving. In this case, all weaving is performed by a special classloader
allowing it to be done lazily — thereby reducing costs. Therefore, we used this to implement djprof
— a command-line tool which encompasses the profiling aspects considered in the previous sections.
Unfortunately, there is one significant drawback with the current load-time weaving implementation:
code in the standard libraries cannot be woven against. This is very restrictive and constitutes our third
limitation with the current AspectJ implementation. In fact, while in theory it is possible to statically
weave against the standard libraries, we find this impractical due to the massive amounts of time and
storage required. Furthermore, static weaving requires every class in the standard libraries be woven
against, regardless of whether it is used or not. Thus, it seems clear that, if this limitation with the
load-time weaver were overcome, then it would offer the best approach as only classes actually used
by the program would be woven.
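For reference, AspectJ 5 load-time weaving is typically enabled via the weaving agent, roughly as follows (a sketch only: the jar names and main class are placeholders, the aspects to weave are declared in a META-INF/aop.xml file on the classpath, and djprof hides this step behind its own launcher):

  java -javaagent:aspectjweaver.jar -classpath djprof-aspects.jar:app.jar pkg.Main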
At this point, we must clarify how this limitation affects the results produced by our tool. The
inability to weave against the standard libraries means that djprof cannot report results for methods
within the libraries themselves. For heap and lifetime, accurate results are still obtained for all objects
allocated in the target application. However, for wasted-time profiling, uses of objects allocated in the
target application which occur in library methods are missed. This, in theory at least, could affect the
precision of the wasted-time results (if a significant number of uses occur in library methods), although
it remains unclear whether this really happens in practice or not. Finally, for time-spent profiling,
accurate results are obtained for all methods in the target application (subject to the issues of multithreading already discussed in Section 6).
Aside from issues of imprecision, the inability to weave against the standard libraries also gives
djprof an inherent advantage over hprof and pjprof, since they must pay the cost of profiling all
methods where djprof does not. While this does compromise our later performance comparison of
djprof against hprof and pjprof, it does not render it completely meaningless. This is because we
are still able to make general observations about the performance of djprof and, hence, the current
AspectJ implementation (namely, that it is not outrageously slow in most cases and, most likely, will
be competitive should this limitation be overcome).
The output produced by djprof consists of a list of methods, along with the amount of the profiled
quantity (e.g. bytes allocated) used by them. The output is ordered so that methods consuming the most
appear first. As such, djprof does not provide any additional context (i.e. stack-trace) information. In
contrast, hprof is capable of providing context-sensitive information, where information is reported
for individual stack-traces (to a given depth). This information can be more useful in practice, as it
can help determine the circumstances (if there are any) under which a method performs badly. In fact,
djprof can easily be extended to record such information, although we have opted against doing
this to simplify our evaluation. Since recording this additional context information can be expensive,
we restrict the amount of context recorded by hprof to a depth of one (which is equivalent to that
recorded by djprof) to ensure a fair comparison. This is achieved using the depth=1 command-line
switch to hprof.
7.2. PJPROF - A PURE JAVA TIME PROFILER
The ability to write a time profiler without AspectJ is made possible in Java 1.5 with the new
Thread.getAllStackTraces() and Thread.getState() methods. The former allows a
daemon thread to iterate, at set intervals, the stack trace of all other threads to record their currently
executing method. In doing this, Thread.getState() is used to ignore those which are blocked,
as samples should not be recorded for them [20, 8]. This was developed by us in the course of this work
and is the first pure Java time profiler we are aware of. A complete implementation, which we refer to
as pjprof, can be found in Appendix A.
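A sketch of the sampling loop is shown below; it is a simplification of pjprof (the complete implementation is in Appendix A) and the helper logSample() is a placeholder for the per-method counter update:

  import java.util.Map;

  class SamplerThread extends Thread {
    private final long interval;                      // sampling period (ms)
    SamplerThread(long interval) { this.interval = interval; setDaemon(true); }

    public void run() {
      while (true) {
        Map<Thread, StackTraceElement[]> traces = Thread.getAllStackTraces();
        for (Map.Entry<Thread, StackTraceElement[]> e : traces.entrySet()) {
          Thread t = e.getKey();
          StackTraceElement[] stack = e.getValue();
          // skip blocked/waiting threads and threads with no frames
          if (t.getState() != Thread.State.RUNNABLE || stack.length == 0) { continue; }
          logSample(stack[0]);                        // charge the topmost frame
        }
        try { Thread.sleep(interval); } catch (InterruptedException ex) { return; }
      }
    }

    void logSample(StackTraceElement top) { /* increment counter for top */ }
  }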
7.3. EXPERIMENTAL PROCEDURE
The SPECjvm98 benchmark harness provides an autorun feature allowing each benchmark to be run
repeatedly for N iterations in the same JVM process. Generally speaking, the first run has higher
overhead than the others as it takes time before JIT optimisations are applied and it also includes the
weaving time. Therefore, we report the average of five runs from a six iteration autorun sequence
(we discard the first run), using a problem size of 100. We believe this reflects the overheads that
can be expected in practice, since most real world programs are longer running than our benchmarks
(hence, these startup costs will be amortised) and, for short running programs, such overheads will be
insignificant anyway. In all cases, the variation coefficient (i.e. standard deviation over mean) for the
five measured runs was ≤ 0.15 — indicating low variance between runs.
To generate time-spent profiling data using hprof, we used the following command line:
java -Xrunhprof:cpu=samples,depth=1,interval=100 ...
The interval indicates hprof should record a sample every 100ms (the same value was used for
djprof and pjprof). The depth value indicates no context information should be recorded (as
discussed in Section 7.1), whilst the cutoff indicates the precision (as a percentage) of information
which hprof should report. To generate heap profiling data, we used the following:
java -Xrunhprof:heap=sites,depth=1,cutoff=0.0 ...
The hprof tool produces a breakdown per stack trace of the live bytes allocated (i.e. those actually
used), as well as the total number of bytes allocated. The results are ranked by live bytes allocated,
rather than total bytes allocated. However, hprof does not report stack traces where the number of live
bytes allocated, relative to the total number of live bytes allocated overall, is below a certain threshold
(this is the cutoff value). Since djprof reports total bytes allocated only, a discrepancy can occur
between the profilers when a method allocates a large number of bytes which are not considered live
by hprof (since these will be reported by djprof, but cut off by hprof). Setting cutoff=0.0
ensures a fair comparison with djprof, since it forces hprof to report all results. Note, this does not
in any way affect the performance of hprof.
The output of djprof and pjprof is similar, providing a breakdown of the total allocated by each
method. A script was used to convert hprof’s output into a form identical to that of djprof and
pjprof. A slight complication is that, in the case of heap profiling, using a depth of 1 with hprof
also does not provide comparable information with djprof. This is because hprof charges storage
allocated for a type X to its constructor (indeed, its supermost constructor), rather than the method
calling new X(..) (as djprof does). Therefore, to ensure the fairest comparison possible, we ran
hprof twice for each benchmark when generating the heap profiling data: the first had depth=1
and was used to generate the performance data (since djprof does not record context information);
the second had depth=5 and was used to compare the outputs of hprof and djprof (since the
additional context allowed the true calling method to be determined).
Finally, to determine the maximum amount of heap storage used by the VM (used to measure a
profiler’s space overhead), we used a simple program to periodically parse /proc/PID/stat and
record the highest value for the Resident Set Size (RSS). The experiments were performed on a 900MHz Athlon-based machine with 1GB of main memory, running Mandrake Linux 10.2, Sun's Java 1.5.0
(J2SE 5.0) Runtime Environment and AspectJ version 1.5.2.
7.4. DISCUSSION OF RESULTS
Before looking at the results, we must detail our metrics. Time overhead was computed as (TP − TU) / TU
for each benchmark, where TP and TU are the execution times of the profiled and unprofiled versions
respectively. Space overhead was computed in a similar way.
7.4.1. HEAP PROFILING
Figure 3 looks at the overheads (in time and space) of the heap profiling implementation developed in
Section 3, as well as those of hprof. Regarding djprof, perhaps the most important observations
are: firstly, time overhead is quite low — especially on the longer-running benchmarks; secondly, space
overhead is comparatively higher. We suspect the latter stems from our implementation of sizeof,
which indefinitely caches the size of previously seen types. For hprof, we see significantly higher
time overheads, while the space overheads are (roughly) of the same magnitude as djprof. The exact
reasons behind hprof’s poor runtime performance remain unclear. A very likely explanation is that
the additional cost of instrumenting the standard libraries (which are not profiled by djprof) is to blame.
Figure 4 details our attempts to validate the output of the heap profiling aspect against hprof. To
do this, we compare the profilers against each other using a metric called overlap percentage [18]. This
[Figure 3 charts omitted: bar charts of Time Overhead % and Space Overhead % per benchmark, comparing djprof and hprof; several hprof time overheads run off-scale (from 442% up to 6734%).]

Figure 3. Experimental results comparing the overhead (in time and space) of our heap profiling implementation against hprof using the heap=sites switch. Note, empty columns (e.g. for compress) do not indicate missing data — only that the relevant value was very small.
[Figure 4 chart omitted: bar chart of Overlap % (0–100) per benchmark.]

Figure 4. Experimental results comparing the precision of our heap profiling implementation against hprof using the heap=sites switch. The overlap metric indicates the amount of similarity between the output of the two profilers (see Section 7.4.1 for more discussion on this). A higher value indicates greater similarity, with the maximum being 100% overlap.
works as follows: first, the output of each profiler is normalised to report the amount allocated by each
method as a percentage of the total allocated by any method in the target application, not including the
standard libraries; second, each method is considered in turn and the minimum score given for it by
either profiler is added to the overlap percentage. For example, if djprof reports that Foo.bar()
accounts for 25% of the total storage allocated, whilst hprof gives it a score of only 15%, then the
lower value (i.e. 15%) is counted toward the overlap percentage. Thus, two profilers with identical
results produce an overlap of 100%, whilst completely different results have no overlap. We can think
of the overlap percentage as the intersection of the scores given by the two profilers. Methods in the
standard libraries are not included in the calculation because djprof cannot profile them (see Section
7.1 for more on why). In general, we find this is a useful way to evaluate profiler precision.
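To make the calculation concrete, the following sketch computes the overlap percentage from two normalised profiles. It is illustrative code only, not part of djprof; the map-based representation and the class and method names are our own.

    import java.util.HashMap;
    import java.util.Map;

    class Overlap {
      // each profile maps a method name to its share (%) of the total reported
      // for the target application (excluding the standard libraries)
      static double overlap(Map<String, Double> p1, Map<String, Double> p2) {
        double total = 0.0;
        for (Map.Entry<String, Double> e : p1.entrySet()) {
          Double other = p2.get(e.getKey());
          if (other != null) {
            total += Math.min(e.getValue(), other); // count the lower score
          }
        }
        return total; // 100 = identical profiles, 0 = no overlap
      }

      public static void main(String[] args) {
        Map<String, Double> djprof = new HashMap<String, Double>();
        Map<String, Double> hprof = new HashMap<String, Double>();
        djprof.put("Foo.bar()", 25.0); hprof.put("Foo.bar()", 15.0);
        djprof.put("Foo.baz()", 75.0); hprof.put("Foo.baz()", 85.0);
        System.out.println(overlap(djprof, hprof)); // prints 90.0
      }
    }

Running the example reproduces the worked case above: Foo.bar() contributes its lower score of 15%, giving an overall overlap of 90%.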
[Figure 5 (bar chart): Object Lifetime Profiling Overheads (period = 100 objects); time and space Overhead % for each benchmark.]
Figure 5. Experimental results looking at the overheads of our object lifetime implementations. The period indicates that every 100th object was monitored. Again, empty columns (e.g. for jack) do not indicate missing data — only that the relevant value was very small.
Looking at Figure 4 we see that on all benchmarks hprof and djprof have an overlap of over
90%, indicating an excellent correlation between them. We would not expect identical results since
djprof estimates object size (recall Section 3.1), whereas hprof does not.
Our overall conclusions from these results are mixed. Clearly, the inability to profile the standard
libraries makes it difficult to properly compare the performance of djprof and hprof. In spite of
this, the results are still interesting since they indicate that: firstly, the performance of djprof is not
outrageously bad, compared with hprof, and, hence, could well be competitive should this limitation
be overcome; secondly, that the precision obtained by djprof (when ignoring methods in the standard
libraries) is good.
7.4.2. OBJECT LIFETIME
Figure 5 looks at the overheads of the lifetime profiling technique developed in Section 4. The main
observations are: firstly, that time overheads are similar to, but consistently lower than, those for heap profiling; secondly, that space overheads are also similar, but consistently higher. The first point can most likely be put down to the cost of using sizeof, which is non-trivial (especially for previously unseen types) and which the lifetime aspect does not incur. The second point almost certainly arises because we are associating additional state with individual object instances.
7.4.3. WASTED TIME
Figure 6 details the overheads of using the two wasted-time implementations of Section 5. The main
observation is that the Member Introduction (MI) approach generally performs better than just using a
Map (java.util.Hashtable in this case). Indeed, although its overhead is still large, we feel the
MI approach works surprisingly well considering it is advising every public field and method access.
As expected from its implementation (where an extra field is added to every user-defined object), the
storage needed for the MI approach is consistently greater than for the Map approach.
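To illustrate the difference between the two strategies, the following minimal sketch contrasts Map-based state association with member introduction. It is purely illustrative: the names (Tracked, lastUse, lastUseTime) and the com.example package pattern are hypothetical, and the real aspects are those developed in Section 5.

    import java.util.IdentityHashMap;
    import java.util.Map;

    public aspect StateAssociationSketch {
      // Approach 1 (Map): per-object state lives in an external table keyed on
      // the object instance (an identity map avoids calling user hashCode/equals).
      private final Map<Object, Long> lastUse = new IdentityHashMap<Object, Long>();

      // Approach 2 (MI): an inter-type declaration introduces the state directly
      // into user-defined types, so every instance carries its own field.
      public interface Tracked {}
      declare parents: com.example..* implements Tracked; // hypothetical pattern
      private long Tracked.lastUseTime = 0;
    }

The Map variant adds no fields to user classes but pays a table lookup on every access, whereas the MI variant trades one extra field per instance for direct field access, which is consistent with the time/space trade-off observed above.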
7.4.4. TIME-SPENT PROFILING
Figure 7 compares the overheads of our time-spent profiling implementation against hprof and
pjprof. The results show that the overheads of djprof are much higher than for either of the
other two profilers. However, there are several other issues to consider: firstly, pjprof only works in
Java 1.5; secondly, in other experiments not detailed here, we have found the performance of hprof
on Java 1.4 environments to be significantly worse than that of djprof. This latter point is almost certainly because, under Java 1.4, hprof uses the JVMPI whilst, under Java 1.5, it
uses the more efficient JVMTI (see Section 8.3 for more on this). While these points are only relevant
[Figure 6 (bar charts): Wasted Time Profiling Overheads (period = 100 objects); Time Overhead % (top) and Space Overhead % (bottom) for the Map and MI approaches on each benchmark, with off-scale and timed-out values annotated.]
Figure 6. Experimental results looking at the overheads of our wasted-time implementations. Here, Map corresponds to approach 1 from Section 5, whilst MI (short for Member Introduction) corresponds to approach 2. TO indicates the benchmark had not completed after 1 hour (i.e. timeout) and the period indicates that every 100th object was monitored.
[Figure 7 (bar charts): Time Sampling Overheads (period = 100ms); Time Overhead % (top) and Space Overhead % (bottom) for hprof, djprof and pjprof on each benchmark, with off-scale values annotated.]
Figure 7. Experimental results looking at the overheads in time (top) and space (bottom) of our time profiling implementation, compared with hprof and pjprof. Again, empty columns (e.g. for cst) do not indicate missing data — only that the relevant value was very small.
[Figure 8 (bar chart): Time Sampling Accuracy (period = 100ms); Overlap % for the hprof-djprof, hprof-pjprof and djprof-pjprof pairings on each benchmark.]
Figure 8. Experimental results looking at the precision of our time profiling implementation, compared with hprof and pjprof. The overlap metric indicates the amount of correlation between the output of the two profilers (see Section 7.4.1 for more discussion on this). A higher value indicates a better correlation, with the maximum being 100% overlap.
to those using the older Java 1.4 VMs, we expect this user-base to remain significant for some time to
come.
Figure 8 details our attempts to validate the output of the time profiling aspect against hprof and
pjprof. Again, overlap percentage is used to make the comparison, with each profiler normalised to
report the time spent by each method as a percentage of the total spent by any method in the target application,
not including the standard libraries. As there are three time profilers, we compared each against the
others separately in an effort to identify their relative accuracy. Looking at Figure 8, we see that
hprof and pjprof have consistently higher overlaps when compared with each other. This suggests
djprof is the least precise of the three. Since time spent in the standard libraries is not included in
the overlap scores, djprof’s inability to profile them does not explain this observation. While the
other inaccuracies mentioned in Section 6 may be a factor, we believe the main problem is simply that
djprof causes the most perturbation on the target program. To see why, recall that our time profiling
implementation adds advice before and after every method execution and method call. Even if this
advice is inlined, its effect on short, frequently executed methods will still be quite high and this would
skew the results significantly.
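As a rough illustration of the advice density involved, consider the following sketch. It is not the aspect from Section 6, merely an indication of how much woven code even trivial before/after advice on every execution and call implies (the aspect name and counter are ours).

    aspect AdviceDensitySketch {
      static long events = 0; // bumped on every method entry and exit

      // every method execution and every call site in the woven code, excluding
      // the aspect itself so that its own advice is not advised
      pointcut probe(): (execution(* *(..)) || call(* *(..)))
                        && !within(AdviceDensitySketch);

      before(): probe() { events++; }
      after():  probe() { events++; }
    }

Even this single counter update runs at both the call and the execution join point, before and after each, so a short, frequently executed method can incur up to four advice executions per invocation.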
Our overall conclusion from these results is that djprof (and, hence, the current AspectJ implementation) is not well-suited to this kind of profiling. In fact, should the restriction on profiling
the standard libraries be overcome, we would only expect the performance of djprof to deteriorate
further.
8. RELATED WORK
We now consider two categories of related work: AspectJ/AOP and profiling. We also examine the
JVMPI/JVMTI in more detail.
8.1. ASPECTJ AND AOP
Aspect-Oriented Programming was first introduced by Kiczales et al. [10] and, since then, it has
received considerable attention. Many language implementations have arisen, although their support
for AOP varies widely. Some, such as AspectC [21], AspectC++ [22] and AspectC# [23], are similar
to AspectJ but target different languages. Others, like Hyper/J [24] and Jiazzi [25], are quite different
as they do not integrate AOP functionality into the source language. Instead, they provide a separate
configuration language for dictating how pieces of code in the source language compose together.
AspectWerkz [26] and PROSE [27] focus on run-time weaving, where aspects can be deployed (or
removed) whilst the target program is executing. The advantage is that, when the aspect is not applied,
no overheads are imposed. While static weaving techniques can enable/disable their effect at runtime,
there is almost always still some overhead involved. In fact, the ideas of run-time weaving share much
in common with Dynamic Instrumentation (e.g. [5, 28, 29]) and Metaobject Protocols [30].
Several works have focused on AspectJ itself. In particular, Dufour et al. investigated the
performance of AspectJ over a range of benchmarks [17]. Each benchmark consisted of an aspect
applied to some program and was compared against a hand-woven equivalent. They concluded that
some uses of AspectJ, specifically pertarget and cflow, suffered high overheads. In addition,
after() returning() was found to outperform around() advice when implementing
equivalent functionality. A detailed discussion of the implementation of these features can be found in [31],
while [32] focuses on efficient implementations of around() advice.
Hanenberg et al. [33] consider parametric introductions, which give member introductions access
to the target type. Without this, they argue, several common examples of crosscutting code, namely
the singleton, visitor and decorator patterns, cannot be properly modularised into aspects. In fact,
introductions share much in common with mixins [34] and open classes [35], as these also allow
new functionality to be added at will. Another extension to AspectJ is investigated by Sakurai et al.,
who propose a variant on pertarget which allows aspect instances to be associated with groups of
objects, instead of all objects [36].
8.2. PROFILING
Profiling is a well-known topic which has been studied extensively in the past. Generally speaking, we can divide the literature into approaches which use sampling (e.g. [14, 1, 37, 18, 3]) and those which
use exact measurements (e.g. [28, 38]). However, exact measurements are well known to impose
significant performance penalties. For this reason, hybrid approaches have been explored where exact
measurements are performed on a few methods at a time, rather than all at once [28, 38]. However, it
remains unclear what advantages (in terms of accuracy) are obtained. Most previous work has focused
on accounting for time spent in a program (e.g. [3, 37, 1, 39, 20, 28, 38]). As mentioned already, gprof
is perhaps the best known example [3]. It uses a combination of CPU sampling and instrumentation to
approximate a call-path profile. That is, it reports time spent by each method along with a distribution
of that incurred by its callees. To generate this, gprof assumes the time taken by a method is constant
regardless of calling context and this leads to imprecision [39].
DCPI uses hardware performance counters to profile without modifying the target program [1].
These count events such as cache misses and cycles executed, and generate hardware interrupts
on overflow. Thus, they provide a simple mechanism for sampling events other than time and are
accurate to the instruction level. More recent work has focused on guiding Just-In-Time optimisation
of frequently executed and time consuming methods [37, 40].
Techniques for profiling heap usage, such as those developed in Sections 3, 4 and 5, have received
relatively little attention in the past. Röjemo and Runciman first introduced the notions of lag, drag
and use [16]. They focused on improving memory consumption in Haskell programs and relied upon
compiler support to enable profiling. Building on this, Shaham et al. looked at reducing object drag
in Java programs [41]. Other works use lifetime information for pretenuring (e.g. [14, 15]). Of these,
perhaps the most relevant is that of Agesen and Garthwaite who use phantom references (as we do) to
measure object lifetime. The main difference from our approach is the use of a modified JVM to enable
profiling.
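For readers unfamiliar with the mechanism, the sketch below shows the basic pattern of using a phantom reference and a reference queue to observe the death of a sampled object. It is a generic illustration of the java.lang.ref API, not the lifetime aspect of Section 4; the Tracker class and its fields are our own.

    import java.lang.ref.PhantomReference;
    import java.lang.ref.ReferenceQueue;

    class LifetimeSketch {
      static final ReferenceQueue<Object> queue = new ReferenceQueue<Object>();

      // subclass so we can attach the data we need when the referent dies;
      // the referent itself cannot be retrieved through a phantom reference
      static class Tracker extends PhantomReference<Object> {
        final long allocTime = System.currentTimeMillis();
        final String type;
        Tracker(Object o) { super(o, queue); type = o.getClass().getName(); }
      }

      public static void main(String[] args) throws InterruptedException {
        Tracker t = new Tracker(new Object()); // sample this allocation
        System.gc(); // request collection of the (now unreachable) referent
        Tracker dead = (Tracker) queue.remove(1000); // may be null if no GC ran
        if (dead != null) {
          System.out.println(dead.type + " lived approx. "
              + (System.currentTimeMillis() - dead.allocTime) + "ms");
        }
      }
    }

The phantom reference is enqueued once the referent becomes unreachable and has been collected, at which point the profiler can compute the elapsed lifetime from the recorded allocation time.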
The heap profiler mprof requires the target application be linked against a modified system library
[42]. Unfortunately, this cannot be applied to Java since the JVM controls memory allocation. Another
heap profiler, mtrace++, uses source-to-source translation of C++ to enable profiling [43]. However,
translating complex languages like C++ is not easy and we feel that building on tools such as AspectJ
offers considerable benefit.
We are aware of only two other works which cross the boundary between AOP and profiling. The
first of these does not use AOP to enable profiling, but instead is capable of profiling AOP programs
[44]. The second uses AspectJ to enable profiling, but focuses on the visualisation of profiling data
rather than the intricacies of using AspectJ in this context [45]. Finally, there are many other types
of profiling which could be explored in conjunction with AspectJ in the future. These include lock
contention profiling (e.g. [8, 44]), object equality profiling (e.g. [46]), object connectivity profiling
(e.g. [47]), value profiling (e.g. [48]) and leak detection (e.g. [13]).
Conversely, others, such as path, vertex or edge profiling (e.g. [49, 50]), are perhaps not well suited to
AspectJ, since they require instrumentation at the basic block level.
8.3. COMPARISON WITH JVMPI/JVMTI
An interesting question is what is gained with AspectJ over what can already be achieved through
the standard profiling interface found in JVMs. Prior to Java 1.5, this was the Java Virtual Machine
Profiler Interface (JVMPI). With Java 1.5 this has been replaced by the Java Virtual Machine Tool
Interface (JVMTI) [9]. The latter is a refined version of the JVMPI, designed to be more flexible and
more efficient than its predecessor.
The JVMTI comprises two main features: an event call-back mechanism and a Byte Code Insertion
(BCI) interface. The former provides a number of well-defined events which can be set to automatically
call the JVMTI client when triggered. The latter allows classes to be modified at the bytecode
level during execution. However, the BCI does not itself provide support for manipulating bytecodes
and, instead, the user must do this manually (perhaps via some third-party library such as BCEL
[51]). Example events supported by JVMTI include: MonitorWait, triggered when a thread begins
waiting on a lock; MethodEntry, triggered on entry to a Java/JNI method; and FieldAccessed,
triggered when a predetermined field is accessed. In general, events supported by the JVMTI tend
to be those which cannot otherwise be implemented via the BCI interface. In particular, there is
no event for catching objects allocated by Java programs. Furthermore, while the MethodEntry
and MethodExit events exist, they are not recommended because they can severely impair JVM
performance. Instead, using the BCI to catch method entry/exit should be preferred since this gives
full-speed events. That is, since the triggers inserted to catch method entry/exit are simply bytecodes
themselves, they can be fully optimised via the JVM. Note, this applies to other events such as
FieldAccessed and FieldModified. Thus, it becomes apparent that AspectJ and the JVMTI
actually complement each other, rather than providing alternate solutions to the same problem.
The older JVMPI did not provide a Byte Code Insertion interface, making it less flexible. However,
it did provide some events not found in the JVMTI. In particular, there was direct support for heap
profiling via the OBJECT_ALLOC event type, which caught all object allocations made by the JVM.
Finally, both the JVMPI and the JVMTI can be used to profile synchronisation issues. This is not currently
possible in AspectJ, as there is no join point for synchronized blocks. However, this feature has
been requested as an enhancement and we hope this work will help further motivate its inclusion in the
language.
9. CONCLUSION
Profiling tools typically restrict the user to a set of predefined metrics, enforce a particular profiling
strategy (e.g. sampling or exact counting) and require the whole program be profiled regardless of
whether this is desired or not. Aspect-oriented programming languages offer an alternative to this
dogma by allowing the user to specify exactly what is to be profiled, how it is to be profiled and which
parts of the program should be profiled. In this work, we have investigated how well an aspect-oriented
programming language (namely AspectJ) lives up to this claim. We have developed and evaluated
solutions to four well-known profiling problems in an effort to answer this question. The results of our
investigation are mixed. On the one hand, we found AspectJ was sufficiently flexible to support the four
profiling examples and that it was reasonably efficient in most cases; on the other hand, we uncovered
several limitations, some of which are quite severe, which would need to be addressed before AspectJ
could be considered a serious profiling platform. To summarise, these issues are:
1. Load-time weaving standard libraries - the inability to perform load-time weaving against the
standard libraries severely handicaps any profiler (see Section 7.1). One solution to this problem
may be possible through the Java 1.5 instrument package and we hope this is explored in the
future.
2. State association - We found that associating state through pertarget failed as
pertarget(call(*.new(..))) does not match any join points (see Section 5). This
prevented us from using pertarget to provide a more natural implementation of the wasted
time aspect.
3. Synchronisation - We would like to have explored the possibility of profiling lock contention
(among other things), but this is not possible as synchronized blocks have no join point
associated with them. As this feature has already been requested by others, we feel this adds
further support for its inclusion.
4. Array allocation join point - The current version of AspectJ (1.5.2) does not support the array
join point by default — meaning array objects are not profiled (see Section 3). A fix for this has
been recently included in the AspectJ implementation (as a direct result of this work) and we
hope this will be activated by default in future releases.
The limitations identified here are limitations with the current AspectJ implementation, rather than
Aspect-Oriented Programming (AOP). Nevertheless, we believe that AspectJ — and AOP in general
— has much to offer the profiling community. For example, two of the problems studied (namely,
object lifetime and wasted time) are not generally supported by mainstream profilers (such as hprof),
and yet were easily expressed in AspectJ. Furthermore, building upon our techniques to develop more
powerful profilers should be straightforward and opens up many possibilities that would otherwise be
hard to achieve.
We do not expect AspectJ will excel at all types of profiling, since it operates on a fairly abstracted
program model which, most notably, ignores many details of a method’s implementation. Profiling for
branch prediction is, therefore, impossible (since there is no branch join point). Likewise, profiling the
flow of values (e.g. reference values) through a program is hampered by the inability to monitor value
flow through local variables (this would require, at the very least, an assignment join point). Of course,
both of these would be possible with a language supporting a richer variety of join points. Thus, these
issues are not with AOP in general; rather, they are artifacts of a particular language (i.e. AspectJ) and
we could easily imagine an AOP language tailored more specifically to profiling.
In the future, we would also like to investigate how well AspectJ applies to other profiling problems,
such as those discussed in Section 8.2. We would also like to investigate the amount of perturbation
caused by djprof, although this is well-known to be a difficult undertaking [52].
ACKNOWLEDGEMENTS
We thank everyone on the AspectJ team at IBM Hursley for their help, as well as all members of the
aspectj-users mailing list. We also thank Prof. James Noble and the anonymous SP&E (and other) referees
for helpful comments on earlier drafts of this paper. Finally, Dr. Paul H. J. Kelly is supported by an Eclipse
Innovation Grant.
REFERENCES
1. J. M. Anderson, L. M. Berc, J. Dean, S. Ghemawat, M. R. Henzinger, S. A. Leung, R. L. Sites, M. T. Vandervoorde, C. A.
Waldspurger, and W. E. Weihl. Continuous profiling: Where have all the cycles gone? In Proceedings of the Symposium
on Operating Systems Principles, pages 1–14. ACM Press, 1997.
2. G. Ammons, T. Ball, and J. R. Larus. Exploiting hardware performance counters with flow and context sensitive profiling.
In ACM Conference on Programming Language Design and Implementation, pages 85–96. ACM Press, 1997.
3. S. L. Graham, P.B. Kessler, and M.K. McKusick. gprof: a call graph execution profiler. In ACM Symposium on Compiler
Construction, pages 120–126. ACM Press, 1982.
4. A. Srivastava and A. Eustace. ATOM: a system for building customized program analysis tools. In Proceedings of the
ACM Conference on Programming Language Design and Implementation, pages 196–205. ACM Press, 1994.
5. D. J. Pearce, P. H. J. Kelly, T. Field, and U. Harder. Gilk: A Dynamic Instrumentation Tool for the Linux Kernel. In
Proceedings of the International TOOLS Conference, pages 220–226. Springer-Verlag, 2002.
6. I. D. Baxter, C. Pidgeon, and M. Mehlich. DMS: program transformations for practical scalable software evolution. In
Proceedings of the IEEE International Conference on Software Engineering, pages 625–634. IEEE Computer Society
Press, 2004.
7. E. Visser. Stratego: A language for program transformation based on rewriting strategies. In Proceedings of the
International Conference on Rewriting Techniques and Applications, pages 357–362. Springer-Verlag, 2001.
8. S. Liang and D. Viswanathan. Comprehensive profiling support in the Java Virtual Machine. In Proceedings of the
USENIX Conference On Object Oriented Technologies and Systems, pages 229–240. USENIX Association, 1999.
9. K. O’Hair. The JVMPI transition to JVMTI, http://java.sun.com/developer/technicalArticles/
Programming/jvmpitransition/, 2004.
10. G. Kiczales, J. Lamping, A. Menhdhekar, C. Maeda, C. Lopes, J. Loingtier, and J. Irwin. Aspect-oriented programming.
In Proceedings of the European Conference on Object-Oriented Programming, pages 220–242. Springer-Verlag, 1997.
11. J. D. Gradecki and N. Lesiecki. Mastering AspectJ : Aspect-Oriented Programming in Java. Wiley, 2003.
12. R. Laddad. AspectJ in Action. Manning Publications Co., Greenwich, Conn., 2003.
13. N. Mitchell and G. Sevitsky. LeakBot: An automated and lightweight tool for diagnosing memory leaks in large Java
applications. In Proceedings of the European Conference on Object-Oriented Programming, pages 351–377. Springer-Verlag, 2003.
14. O. Agesen and A. Garthwaite. Efficient object sampling via weak references. In Proceedings of the international
Symposium on Memory Management, pages 121–126. ACM Press, 2000.
15. P. Cheng, R. Harper, and P. Lee. Generational stack collection and profile-driven pretenuring. In Proceedings of the ACM
Conference on Programming Language Design and Implementation, pages 162–173. ACM Press, 1998.
16. N. Röjemo and C. Runciman. Lag, drag, void and use — heap profiling and space-efficient compilation revisited. In
Proceedings of the ACM International Conference on Functional Programming, pages 34–41. ACM Press, 1996.
17. B. Dufour, C. Goard, L. Hendren, O. de Moor, G. Sittampalam, and C. Verbrugge. Measuring the dynamic behaviour of
AspectJ programs. In Proceedings of the ACM Conference on Object-Oriented Programming, Systems, Languages and
Applications, pages 150–169. ACM Press, 2004.
18. M. Arnold and B.G. Ryder. A framework for reducing the cost of instrumented code. In Proceedings of the ACM
Conference on Programming Language Design and Implementation, pages 168–179. ACM Press, 2001.
19. The Standard Performance Evaluation Corporation. SPEC JVM98 benchmarks, http://www.spec.org/osg/jvm98, 1998.
20. T. E. Anderson and E. D. Lazowska. Quartz: a tool for tuning parallel program performance. In Proceedings of the ACM
Conference on Measurement and modeling of computer systems, pages 115–125. ACM Press, 1990.
21. Y. Coady, G. Kiczales, M. Feeley, and G. Smolyn. Using AspectC to improve the modularity of Path-Specific customization
in operating system code. In Proceedings of the Joint European Software Engineering Conference and ACM Symposium on the Foundations of Software Engineering, pages 88–98. ACM Press, 2001.
22. O. Spinczyk, A. Gal, and W. Schröder-Preikschat. AspectC++: an aspect-oriented extension to the C++ programming
language. In Proceedings of the Conference on Technology of Object Oriented Languages and Systems, pages 53–60.
Australian Computer Society, Inc., 2002.
23. H. Kim. AspectC#: An AOSD implementation for C#. Master’s thesis, Department of Computer Science, Trinity College,
Dublin, 2002.
24. H. Ossher and P. Tarr. Hyper/J: multi-dimensional separation of concerns for Java. In Proceedings of the International
Conference on Software Engineering, pages 734–737. ACM Press, 2000.
25. S. Mcdirmid and W. C. Hsieh. Aspect-oriented programming with Jiazzi. In Proceedings of the ACM Conference on
Aspect Oriented Software Development. ACM Press, 2003.
26. J. Bonér. What are the key issues for commercial AOP use: how does AspectWerkz address them? In Proceedings of the
Conference on Aspect-oriented software development, pages 5–6. ACM Press, 2004.
27. A. Popovici, T. Gross, and G. Alonso. Dynamic weaving for aspect-oriented programming. In Proceedings of the
Conference on Aspect-Oriented Software Development, pages 141–147. ACM Press, 2002.
28. H. W. Cain, B. P. Miller, and B. J.N. Wylie. A callgraph-based search strategy for automated performance diagnosis. In
Proceedings of the European Conference on Parallel Processing (Euro-Par), pages 108–122. Springer-Verlag, 2001.
29. K. Yeung, P. H. J. Kelly, and S. Bennett. Dynamic instrumentation for Java using a virtual JVM. In Performance Analysis
and Grid Computing, pages 175–187. Kluwer, 2004.
30. G. Kiczales, J. des Rivières, and D. G. Bobrow. The Art of the Metaobject Protocol. MIT Press, 1991.
31. E. Hilsdale and J. Hugunin. Advice weaving in AspectJ. In Proceedings of the ACM Conference on Aspect-Oriented
Software Development, pages 26–35. ACM Press, 2004.
32. S. Kuzins. Efficient implementation of around-advice for the aspectbench compiler. Master’s thesis, Oxford University,
2004.
33. S. Hanenberg and R. Unland. Parametric introductions. In Proceedings of the Conference on Aspect-Oriented Software
Development, pages 80–89. ACM Press, 2003.
34. G. Bracha and W. Cook. Mixin-based inheritance. In Proceedings of the ACM Conference on Object-Oriented Programming, Systems, Languages, and Applications, pages 303–311. ACM Press, 1990.
35. C. Clifton, G. T. Leavens, C. Chambers, and T. Millstein. MultiJava: Modular open classes and symmetric multiple dispatch
for Java. In Proceedings of the ACM Conference on Object-Oriented Programming, Systems, Languages, and Applications,
pages 130–145. ACM Press, 2000.
36. K. Sakurai, H. Masuhara, N. Ubayashi, S. Matsuura, and S. Komiya. Association aspects. In Proceedings of the ACM
Conference on Aspect-Oriented Software Development, pages 16–25. ACM Press, 2004.
37. J. Whaley. A portable sampling-based profiler for Java Virtual Machines. In Proceedings of the ACM Java Grande
Conference, pages 78–87. ACM Press, 2000.
38. D. J. Brear, T. Weise, T. Wiffen, K. C. Yeung, S. A.M. Bennett, and P. H. J. Kelly. Search strategies for Java bottleneck
location by dynamic instrumentation. IEE Proceedings — Software, 150(4):235–241, 2003.
39. M. Spivey. Fast, accurate call graph profiling. Software — Practice and Experience, 34:249–264, 2004.
40. M. Arnold, M. Hind, and B. G. Ryder. Online feedback-directed optimization of Java. In Proceedings of the ACM
Conference on Object-Oriented Programming, Systems, Languages, and Applications, pages 111–129. ACM Press, 2002.
41. R. Shaham, E. K. Kolodner, and M. Sagiv. Heap profiling for space-efficient Java. In Proceedings of the ACM Conference
on Programming Language Design and Implementation, pages 104–113. ACM Press, 2001.
42. B. Zorn and P. Hilfinger. A memory allocation profiler for C and Lisp programs. In Proceedings of the USENIX
Conference, pages 223–237. USENIX Association, 1988.
43. W. H. Lee and J. M. Chang. An integrated dynamic memory tracing tool for C++. Information Sciences, 151:27–49, 2003.
44. R. Hall. CPPROFJ: aspect-capable call path profiling of multi-threaded Java applications. In Proceedings of the IEEE
Conference on Automated Software Engineering, pages 107–116. IEEE Computer Society Press, 2002.
45. M. Hull, O. Beckmann, and P. H. J. Kelly. MEProf: Modular extensible profiling for eclipse. In Proceedings of the Eclipse
Technology eXchange (eTX) Workshop. ACM Digital Library, 2004.
46. D. Marinov and R. O’Callahan. Object equality profiling. In Proceedings of the ACM Conference on Object-Oriented Programming, Systems, Languages, and Applications, pages 313–325. ACM Press, 2003.
47. M. Hirzel, J. Henkel, A. Diwan, and M. Hind. Understanding the connectivity of heap objects. In Proceedings of the ACM
symposium on Memory management, pages 36–49. ACM Press, 2002.
48. S. A. Watterson and S. K. Debray. Goal-directed value profiling. In Proceedings of the Conference on Compiler
Construction, pages 319–333. Springer-Verlag, 2001.
49. T. Ball and J. R. Larus. Optimally profiling and tracing programs. ACM Transactions on Programming Languages and Systems,
16(4):1319–1360, 1994.
50. D. Melski and T. W. Reps. Interprocedural path profiling. In Proceedings of the International Conference on Compiler Construction, pages 47–62. Springer-Verlag, 1999.
51. Markus Dahm. Byte code engineering with the BCEL API. Technical Report B-17-98, Freie Universität Berlin, 2001.
52. A. D. Malony and S. Shende. Overhead compensation in performance profiling. In Proceedings of the European
Conference on Parallel Processing (Euro-Par), pages 119–132. Springer-Verlag, 2004.
APPENDIX A — PJPROF IMPLEMENTATION
import java.lang.reflect.Method;
import java.util.Hashtable;
import java.util.Iterator;
import java.util.Map;

class PureJavaTimeProfiler {
  Hashtable totals = new Hashtable(); // maps sampled stack frames to counters
  TimerThread timer = null;
  long startTime; int period = 100; // 100ms

  // trivial mutable counter assumed by the listing (definition not shown in the original)
  static class MutInteger { int value; MutInteger(int v) { value = v; } }

  PureJavaTimeProfiler() {
    startTime = System.currentTimeMillis(); timer = new TimerThread();
    timer.setDaemon(true); timer.start();
  }

  void sample() { // do the sampling
    Map<Thread,StackTraceElement[]> m = Thread.getAllStackTraces();
    Iterator<Thread> i = m.keySet().iterator();
    while(i.hasNext()) {
      Thread t = i.next();
      if(t != timer && t.isAlive() && !t.getThreadGroup().getName().equals("system")
         && t.getState() == Thread.State.RUNNABLE) {
        StackTraceElement ste[] = m.get(t);
        if(ste.length > 0) {
          // discard line number
          StackTraceElement s = new StackTraceElement(ste[0].getClassName(),
              ste[0].getMethodName(), ste[0].getFileName(), -1);
          getTotal(s).value++;
  }}}}

  MutInteger getTotal(Object k) {
    MutInteger s = (MutInteger) totals.get(k);
    if(s == null) { s = new MutInteger(0); totals.put(k,s); }
    return s;
  }

  class TimerThread extends Thread {
    public void run() {
      while(true) { try { Thread.sleep(period); sample(); }
                    catch(InterruptedException e) {} }
  }}

  public static void main(String argv[]) {
    new PureJavaTimeProfiler();
    try {
      Class clazz = Class.forName(argv[0]);
      Method mainMethod = clazz.getDeclaredMethod("main", argv.getClass());
      // construct args array for target
      String nArgv[] = new String[argv.length-1];
      Object args[] = new Object[1];
      for(int i=1;i<argv.length;++i) { nArgv[i-1] = argv[i]; }
      args[0] = nArgv;
      mainMethod.invoke(null, args);
    } catch(Exception e) {}
  }}
APPENDIX B — SIZEOF ASPECT
import java.lang.reflect.Field;
import java.lang.reflect.Modifier;
import java.util.Hashtable;

aspect SizeOf pertypewithin(*) {
  static final private Hashtable cache = new Hashtable();
  private int size = -1;

  // compute (and retain) the size of each woven type once it has been initialised
  after() returning(): staticinitialization(*) && !within(SizeOf) {
    size = sizeof(thisJoinPointStaticPart.getSignature().getDeclaringType()) + 8;
  }

  public static int get(Object o) {
    Class c = o.getClass();
    if(SizeOf.hasAspect(c)) {
      SizeOf a = SizeOf.aspectOf(c);
      return a.size;
    } else { // for classes which AspectJ cannot weave
      Integer r = (Integer) cache.get(c);
      if(r != null) { return r.intValue(); }
      else {
        int x = sizeof(c,o) + 8;
        cache.put(c, new Integer(x));
        return x;
  }}}

  static public int sizeof(Class c, Object... dims) {
    int tot = 0, m = 1;
    if(c.isArray()) {
      for(int i=0;i!=dims.length;++i) {
        c = c.getComponentType(); // move toward type held by array
        int d = ((Integer) dims[i]).intValue();
        if(i != (dims.length-1)) { tot += m * ((d*4) + 8); }
        else { tot += m * ((d*primitiveSize(c)) + 8); }
        m = m * d;
    }} else {
      Field fs[] = c.getDeclaredFields();
      for(int i=0;i!=fs.length;++i) {
        Field f = fs[i];
        if(isInstance(f)) { // count only instance fields
          Class ft = f.getType();
          tot += primitiveSize(ft);
      }}
      Class s = c.getSuperclass();
      if(s != null) { tot += sizeof(s); } // include fields inherited from superclasses
    }
    return tot;
  }

  static private boolean isInstance(Field f) {
    return !Modifier.isStatic(f.getModifiers());
  }

  static private int primitiveSize(Class pt) {
    if (pt == boolean.class || pt == byte.class) return 1;
    else if (pt == short.class || pt == char.class) return 2;
    else if (pt == long.class || pt == double.class) return 8;
    else { return 4; } // object references, floats + ints
  }}