D73450
June 2011
Edition 1.0
D69518GC10
Student Guide
Java Performance Tuning and Optimization
Disclaimer
This document contains proprietary information and is protected by copyright and other intellectual property laws. You may copy and
print this document solely for your own use in an Oracle training course. The document may not be modified or altered in any way.
Except where your use constitutes "fair use" under copyright law, you may not use, share, download, upload, copy, print, display,
perform, reproduce, publish, license, post, transmit, or distribute this document in whole or in part without the express authorization
of Oracle.
The information contained in this document is subject to change without notice. If you find any problems in the document, please
report them in writing to: Oracle University, 500 Oracle Parkway, Redwood Shores, California 94065 USA. This document is not
warranted to be error-free.
Trademark Notice
Oracle and Java are registered trademarks of Oracle and/or its affiliates. Other names may be trademarks of their respective
owners.
Authors
Clarence Tauro, Michael Williams
Table of Contents
Introduction .............................................................................................................................................. 1-1
Introduction............................................................................................................................................. 1-2
Course Goal ........................................................................................................................................... 1-3
Course Objectives................................................................................................................................... 1-4
Class Introductions ................................................................................................................................. 1-5
Audience ................................................................................................................................................ 1-6
Prerequisites........................................................................................................................................... 1-7
Course Map ............................................................................................................................................ 1-8
Course Topics ........................................................................................................................................ 1-9
Preface
Profile
Related Publications
Additional Publications
• System release bulletins
• Installation and user’s guides
• Read-me files
• International Oracle User’s Group (IOUG) articles
• Oracle Magazine
Typographic Conventions
The following two lists explain Oracle University typographical conventions for words that
appear within regular text or within code samples.
Notations:
(N) = Navigator
(M) = Menu
(T) = Tab
(B) = Button
(I) = Icon
(H) = Hyperlink
(ST) = Sub Tab
1. In the navigation frame of the help system window, expand the General Ledger entry.
2. Under the General Ledger entry, expand Journals.
3. Under Journals, select Enter Journals.
4. Review the Enter Journals topic that appears in the document frame of the help system
window.
Copyright © 2011, Oracle and/or its affiliates. All rights reserved.
THESE eKIT MATERIALS ARE FOR YOUR USE IN THIS CLASSROOM ONLY. COPYING eKIT MATERIALS FROM THIS COMPUTER IS STRICTLY PROHIBITED
Introduction
Chapter 1 - Page 1
Chapter 1
Introduction
Course Goal
Course Objectives
Class Introductions
Take a moment to share the above information with the class and instructor.
Audience
The target audience for this course is listed in the slide.
Prerequisites
The slide lists the key prerequisites for this course.
Course Map
[Course map diagram: lessons grouped into sections, including Language and GC Concerns, Language Tuning, and Performance Tuning]
The course map shows all the lessons of this course, and how they are grouped into logical
sections.
Course Topics
Course Schedule
Session 1 (A.M.): Introduction
The class schedule might vary according to the pace of the class. The instructor will provide
updates.
At the end of the course, the instructor facilitates a feedback session that includes a written
questionnaire. Oracle University uses your feedback to improve our training programs. We
appreciate your honest evaluation.
Course Environment
Classroom PC
In this course, the following products are preinstalled for the lesson practices:
• JDK 6
• Firefox 3
• Visual VM
• Oracle Solaris Studio
• NetBeans 6.9.1
• Solaris Performance Tools CD 3.0
Additional Resources
Topic Website
Education and Training http://education.oracle.com
Chapter 2
JVM and Performance Overview
Objectives
JVM Overview
Java was originally developed by James Gosling at Sun Microsystems (now a subsidiary of Oracle Corporation) and released in 1995. Features include object-oriented syntax, automatic garbage collection, and platform-independent program execution.
Java programs are compiled into bytecode and stored in .class files. Java applications run by loading the .class files into a Java Virtual Machine (JVM) and executing the bytecode. The default JVM included as part of Java 6 is called the Java HotSpot Virtual Machine.
[Diagram: HotSpot JVM architecture — Execution Engine, JIT Compiler, Garbage Collector, Native Method Interface, and Native Method Libraries]
The HotSpot JVM architecture provides a strong foundation of features and capabilities that support high performance and massive scalability. For example, the HotSpot JVM JIT compilers perform dynamic optimization: they make optimization decisions while the Java application is running and generate high-performing native machine instructions targeted at the underlying system architecture.
In addition, through the maturing evolution and continuous engineering of its runtime
environment and multithreaded garbage collector, the HotSpot JVM yields high scalability on
even the largest available computer systems.
What Is Performance?
There are a number of ways in which you can define performance. Some of these aspects of
performance impact the JVM.
Each of these aspects should be a well-defined requirement for an application, and its importance should be prioritized. This clarification is important input for anyone tackling the application's performance issues.
Memory Footprint
It is very important to know the environment and ecosystem in which your application runs.
Shared resources can have a significant impact on performance.
Virtual memory swapping should be minimized for any Java application. Consider a scenario
where a garbage collection occurs with a large portion of the Java heap in virtual memory.
Scanning for referenced objects would take much longer than when the heap is in physical
memory. It is very important to configure a system and the Java heap to avoid virtual memory
swapping.
Startup Time
Time ‘til Performance: The time it takes for the JVM to JIT compile the most frequently executed methods. This is an area of interest to many financial services companies: if the busiest trading hours of the day occur shortly after a JVM or application is launched, the JVM is trying to JIT compile code at the same time the application is trying to handle peak load. In other words, the application does not run at peak efficiency until the code is "fully" JIT compiled, and the JIT compiler competes for CPU cycles while the application is receiving peak load.
In general, the larger the number of classes loaded and the longer and more complex the
classpath, the longer it will take to start an application.
Note: Initial start time may not be indicative of an application’s performance after the HotSpot JVM has had a chance to make optimizations.
Tip: Consider disabling bytecode verification, but only if your company's security policy and
the application are not subject to bytecode tampering. Bytecode verification does add
overhead to the loading of classes.
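The warm-up effect described above can be observed with a simple timing sketch. This is not part of the course materials; the class name, method name, and loop counts are illustrative assumptions:

```java
// Sketch: observing JIT warm-up. The first timed batch runs interpreted or
// freshly compiled code; later batches run fully JIT-compiled code and are
// usually faster. Run with -XX:+PrintCompilation to correlate the timings
// with compilation events.
public class WarmupDemo {
    // A small hot method for the JIT to optimize.
    static long sumOfSquares(int n) {
        long total = 0;
        for (int i = 1; i <= n; i++) {
            total += (long) i * i;
        }
        return total;
    }

    public static void main(String[] args) {
        for (int batch = 0; batch < 5; batch++) {
            long start = System.nanoTime();
            long result = sumOfSquares(1000000);
            long elapsed = System.nanoTime() - start;
            System.out.println("batch " + batch + ": " + elapsed
                    + " ns (result " + result + ")");
        }
    }
}
```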
Scalability
Scalability can apply to not only an application but also to many other aspects of a system of
which the application is a part.
Tip: It is very important to have a development or qualification environment that replicates production situations. This reduces the chance of being caught off guard by an application that does not scale well.
Application Scalability
In this example, response times of the application with poor scalability rise exponentially as
the number of users increases.
The response times of the application with good scalability rise in a more linear fashion.
Responsiveness
Responsiveness can be measured in a number of ways. But the bottom line is that a
responsive application returns requested data quickly. Client applications are typically more
focused on responsiveness.
Throughput
Server applications typically focus more on throughput and less on responsiveness.
This course focuses on optimizing Java application performance. Understanding your code
and its interaction with the JVM is key to developing high-performing Java applications.
Performance Methodology
• Performance methodology
– Monitor
– Profile
– Tune
Not only will we define these terms, but this is the basic approach for the course and for tuning
a Java application. First we monitor, then profile, and finally tune. This process is followed
throughout this course.
Performance Monitoring
Non-intrusive: The act of monitoring the application does not materially impact the
performance of the application.
In most cases, monitoring is a preventive or proactive action. However, it can be an initial step
in a reactive action when troubleshooting a problem. Monitoring helps identify or isolate
potential issues without having a severe impact on runtime responsiveness or throughput of
an application.
This course focuses on using monitoring techniques and tools to improve application
throughput or responsiveness issues. There’s less emphasis on using monitoring for
troubleshooting or debugging.
Performance Profiling
Performance Tuning
[Diagram: traditional development cycle — Start → Analysis → Code → Test → Quality OK? No: return to Code; Yes: done]
The diagram shows a typical development process. In the analysis phase, the problem to be
solved is defined and application requirements are created and evaluated. Next, in the design
phase, the higher level requirements are turned into more detailed requirements and
strategies for the programs and systems that will make up the application. With this
information, the coding and testing phases begin. Code is created to fulfill the requirements,
then tested, and refactored until the application is complete.
Typically, many applications developed using this traditional model give little attention to
performance or scalability requirements until the application is released.
[Diagram: performance-aware development cycle — Start → Analysis → Code → Benchmark → Performance OK? No: Profile and return to Code; Yes: Deploy and Monitor]
A better approach would be to include performance criteria beginning with the analysis phase.
Then, performance and scalability requirements could be tested throughout the development
of the application. This would result in a better performing application at deployment.
Summary
The original author of this course has written the book described in the slide.
Additional Resources
The slide lists materials that you might find interesting.
Objectives
We want to observe the system to get a sense of what the performance problem is. In other words, what "poor performance symptoms" is the application exhibiting? Based on the symptoms observed, we can determine the next step in diagnosing the performance issue. It is somewhat analogous to visiting the doctor when you don't feel well. The doctor has various tools and instruments to check your health and, depending on the observed symptoms, makes recommendations as to the next step to get you back on “performance par.”
• User time: The amount of CPU time spent in running a process outside the kernel.
• System time: The amount of CPU time spent using resources in the operating system
kernel.
• Idle time: The amount of time the CPU is not being utilized.
• Voluntary context switch (VCX): A thread is given a certain amount of CPU time to
run. The thread voluntarily yields the CPU after running for its scheduled time.
• Involuntary context switch (ICX): A thread is given a certain amount of CPU time to
run. The thread is interrupted and yields the CPU before completing its scheduled run
time.
High sys/kernel CPU time indicates that many CPU cycles are being spent in the kernel. High sys CPU time can also indicate shared resource contention (in other words, locking). Reducing the amount of time spent executing code in the kernel gives the application more CPU time to execute.
The vertical bars represent a time quantum. A time quantum is the amount of time the
operating system scheduler makes available for an application thread to run. In this example,
an application has multiple threads and the scheduler allocates a time quantum for each
thread. Each thread executes until the end of its time quantum and then releases the CPU for
the next thread. This demonstrates a voluntary context switching situation.
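Voluntary yielding can be sketched at the Java level with Thread.yield(), which hints to the scheduler that another thread may run. A minimal sketch (class and method names are illustrative, not from the course):

```java
// Sketch: a thread voluntarily giving up the CPU with Thread.yield(),
// loosely analogous to the voluntary context switches (VCX) counted by
// prstat -Lm on Solaris.
public class YieldDemo {
    static int countdown(int n) {
        int steps = 0;
        while (n > 0) {
            n--;
            steps++;
            Thread.yield(); // hint to the scheduler: let another thread run
        }
        return steps;
    }

    public static void main(String[] args) {
        System.out.println("steps: " + countdown(1000));
    }
}
```

Note that Thread.yield() is only a hint; whether an actual context switch occurs is up to the operating system scheduler.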
In this example, the first thread executes to completion. The second thread is interrupted and is not allowed to finish its time quantum; the thread is involuntarily interrupted, causing a thread context switch. From a CPU perspective, context switches are expensive: each one can take thousands of CPU cycles.
CPU Tools
• prstat: Similar to top on Linux. On Solaris prstat is less intrusive than top.
• Gnome System Monitor: A graphical representation of CPU utilization on Linux. Also
works on Solaris if Gnome desktop is installed.
• cpubar: A graphical representation of CPU utilization on Solaris.
• iobar: A graphical representation of I/O and CPU utilization.
Column Legend
usr: user time; sys: system time; idl: idle time; csw: context switches; icsw: involuntary
context switches
Column Legend
USR: user time
SYS: system time
VCX: voluntary context switches
ICX: involuntary context switches
Graphical tools often provide more information faster than command-line tools.
On the far left, the 0 and 1 identify the two CPUs on this system. The average bar is the
average of the two CPUs. Dashed lines are average CPU utilization.
Bar Colors
Green: User
Red: System
Blue: Idle
Use Solaris prstat -Lm to locate the LWP ID consuming the most CPU (usr or sys).
prstat -Lm
This will list microstate information including all the threads in an application. Find the thread
using the most CPU (usr or sys).
1. Note the process ID (leftmost column) and the thread or LWPID (rightmost column).
2. Use HotSpot’s jstack to dump thread information about the running application. For
example:
jstack 2874 > temp.txt
3. Convert the LWPID from step 1 into hex. Search temp.txt for nid=0xHexValue. You
should find the name of the thread in the jstack output.
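The decimal-to-hex conversion in step 3 can be done with a one-line Java call; a minimal sketch (the class name is illustrative, and LWPID 2874 is just an example value):

```java
// Sketch: converting a decimal LWPID (from prstat) to the hex "nid" value
// that appears in jstack output, e.g. LWPID 2874 -> nid=0xb3a.
public class NidConverter {
    static String toNid(int lwpid) {
        return "0x" + Integer.toHexString(lwpid);
    }

    public static void main(String[] args) {
        System.out.println(toNid(2874)); // prints 0xb3a
    }
}
```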
• Data of interest
– Network utilization in terms of Transmission Control Protocol (TCP)
If an application is using the network and/or disk, it realizes the best performance by
minimizing the number of network and disk interactions. So, if an application accesses the
network and disks, ask yourself: Is the application suffering from too much network and/or
disk interaction? Monitoring the application gives you a sense of network and disk activity.
tcptop can identify who is doing what and how much traffic they are generating. If this were a production system, you would want to know who is running rcp on this system and why.
• Data of interest
– Number of disk accesses
Number of disk accesses: Each disk access you make is a costly event. You want to
minimize the number of times you go to disk and read. The round trip time is very expensive.
Latency and average latencies: This is the time it takes to find something on the disk.
In this example, the find command is keeping disk cmdk0 busy almost 60% of time during
the 5-second interval. If a Java application were doing something similar, we would want to
investigate the reason why.
Column Legend
pi: pages in
po: pages out
sr: page scan rate
Scan Rate: The rate at which the operating system is scanning for free pages
On Solaris, as a system begins to swap to virtual memory, the page-in, page-out, and scan rates increase. Page-in and page-out activity with no scan rate activity, or a small amount of scan rate activity that terminates very quickly, is acceptable.
• Data of interest
– Footprint size
Process questions
Why are footprint size, number of threads, thread state, lock contention, and context switching
important to monitor?
What do lock contention and context switching look like on Solaris?
How can you find the lock or locks causing problems?
How can you address the thread context switching problem?
Processes: prstat -Lm
Processes: mpstat
• Data of interest
– Kernel CPU utilization, locks, system calls, interrupts,
Why are high sys/kernel CPU time, run queue depth, lock contention, migrations, and context switching important to monitor?
CPU Usage: Moving usage out of kernel provides more cycles for running applications and
better performance.
Lock contention: Inhibits scalability
Migrations: Process migration from one CPU to another should be minimized. When a process moves to another CPU, its data is not cached there until it is moved over, which can take thousands of cycles; migration is quite an expensive operation.
Run Queue Depth: If there are not enough CPUs available for the number of threads that want to execute, the waiting threads are queued in the run queue. If the queue gets very deep, the system is saturated with work.
Kernel: vmstat
Kernel: mpstat
Kernel: prstat -Lm
Summary
Objectives
What to Monitor
Garbage Collector: The portion of the JVM responsible for freeing memory no longer utilized
by application logic; the “magic” that lets programmers not have to worry about “managing
memory”
Garbage collection involves traversing Java heap spaces where application objects are
allocated and managed by the JVM's garbage collector.
JIT compilation: The portion of the JVM responsible for turning bytecode into executable instructions for the target hardware platform
Questions to Ask
• How frequently are garbage collections occurring?
• What is the amount of time that is taken for a GC to complete?
Note: The amount of time that garbage collection takes is not necessarily related to the size of a
given Java heap space. Instead, it is related to the size or number of live objects in that heap
space.
HotSpot GC Basics
Young Generation
The young generation holds newly created objects and consists of an eden space plus two survivor spaces.
Eden space – The memory space where objects are initially allocated.
Survivor spaces – The memory space used to age newer objects.
Permanent Generation
The permanent generation holds data needed by the virtual machine to describe objects that
do not have an equivalence at the Java language level. For example, objects describing
classes and methods are stored in the permanent generation.
Eden
All new objects are allocated to the eden space. When the eden space is full, a minor garbage
collection is triggered. The garbage collection is considered minor as it only affects the young
generation, which is generally smaller and has fewer objects than the old generation. Minor
garbage collections are faster with a minimal impact on performance.
Stop the World Event – A minor garbage collection is a Stop the World operation. This
means that all application threads are stopped until the operation completes. Minor garbage
collections are always Stop the World events.
[Diagram: after a minor GC — eden cleared, referenced objects (age 1) copied to a survivor space (S0 or S1), unreferenced objects collected]
• Next minor GC
– Referenced objects from last GC become “from” survivor
Once the eden space fills up, it again triggers a minor garbage collection.
1. The survivor space with referenced objects from the last GC is designated as the “from”
survivor space.
2. Referenced objects from the eden and the “from” space are now copied to the previously
empty survivor space and incremented. The survivor space is designated as the “to”
survivor space.
3. The eden space and the “from” survivor spaces are cleared of objects.
[Diagram: a later minor GC — referenced objects from the last GC aged in the “from” survivor space; objects reaching the age threshold copied to tenured space]
Eventually, when objects reach a certain age threshold, they are promoted to tenured space.
• The “to” survivor space from the last GC becomes the “from” survivor space for this GC.
• Referenced objects from the eden and “from” spaces are copied to the “to” survivor
spaces.
• In this example, objects surviving 8 GCs are promoted to tenured space.
• The eden space and “from” survivor space are cleared.
[Diagram: after promotion — eden and “from” survivor space cleared; surviving objects in the “to” survivor space and promoted objects in tenured space]
The process as outlined in the last four steps repeats itself until an object age threshold is
met. At that point, aged objects are copied to the old generation space. The more often minor
garbage collections occur, the more quickly objects are promoted to old generation space.
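The allocation and promotion behavior described above can be provoked with a small sketch. The class name, allocation sizes, and retention counts below are illustrative assumptions, not from the course:

```java
import java.util.ArrayList;
import java.util.List;

// Sketch: allocation churn that exercises the young generation. Short-lived
// arrays die in eden; the retained list ages through the survivor spaces and
// is eventually promoted to tenured space. Run with options such as
// -verbose:gc -Xmx12m -Xmn1m to watch the minor GCs on the console.
public class PromotionDemo {
    static List<byte[]> churn(int iterations, int retainEvery) {
        List<byte[]> retained = new ArrayList<byte[]>();
        for (int i = 0; i < iterations; i++) {
            byte[] shortLived = new byte[1024]; // dies young, reclaimed in eden
            if (i % retainEvery == 0) {
                retained.add(shortLived); // survives, ages, may be promoted
            }
        }
        return retained;
    }

    public static void main(String[] args) {
        System.out.println("retained: " + churn(100000, 1000).size());
    }
}
```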
• Young generation
– Eden
Serial Garbage Collector: The single-threaded young generation garbage collector. It can be
specified on the command line with: -XX:+UseSerialGC. This enables a serial garbage
collector for both young and old generation. The command-line option is discussed in more
detail later in the course.
Multithreaded (Parallel) Garbage Collector: For multithreaded young generation GC, you
have two options, both of which are considered parallel collectors. By default, each option is
used with a specific old generation collector.
-XX:+UseParallelGC (used with the old generation throughput collector)
-XX:+UseParNewGC (used with the old generation concurrent collector)
Both the throughput and concurrent old generation collectors are discussed in more detail later in the course.
Server applications rarely use the serial collector. Desktop client applications typically use it.
However, as client applications grow in size, they are moving toward using multithreaded
collectors as well.
Permanent Generation
The permanent generation contains metadata required by the JVM to describe the classes
and methods used in the application. The permanent generation is populated by the JVM at
runtime based on classes in use by the application. In addition, Java SE library classes and
methods may be stored here.
Classes may get collected (unloaded) if the JVM finds they are no longer needed and space
may be needed for other classes. The permanent generation is included in a full garbage
collection.
Interned Strings: A string literal defined in double quotes in Java, or the result of calling the intern method on a String variable (for example, stringVar.intern())
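A minimal sketch of interning behavior (the class and method names are illustrative):

```java
// Sketch: string interning. Literals are interned automatically; calling
// intern() on a runtime-constructed string returns the canonical copy held
// in the interned string pool.
public class InternDemo {
    static boolean sameAfterIntern(String s) {
        String constructed = new String(s); // a distinct heap object
        return constructed.intern() == s.intern(); // same canonical instance
    }

    public static void main(String[] args) {
        String literal = "hello";
        String built = new String("hello");
        System.out.println(built == literal);          // false: different objects
        System.out.println(built.intern() == literal); // true: canonical copy
    }
}
```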
• -verbose:gc
• -XX:+PrintGCTimeStamps
Using -verbose:gc
• Adding -XX:+PrintGCTimeStamps
host:~ $ java -client -verbose:gc -XX:+PrintGCTimeStamps
-Xmx12m -Xms3m -Xmn1m -XX:PermSize=20m -XX:MaxPermSize=20m
-jar /usr/java/demo/jfc/Java2D/Java2Demo.jar
...
3.791: [GC 1884K->1299K(5056K), 0.0031820 secs]
...
-verbose:gc displays detailed garbage collection information to the console for each GC while the Java application runs.
What to Look For: Over a period of GC events, the overall amount of live data constantly increases until a full GC occurs and a large amount of space is reclaimed; then the pattern repeats. This pattern indicates that objects are being promoted too quickly or that the heap size is too small.
-verbose:gc Output Defined
• GC: Means a minor GC. “Full GC” is listed for a full GC.
• 1884K: Heap size before the GC
• 1299K: Heap size after the GC
• 5056K: Overall size of the Java heap
• 0.0031820 secs: Time to complete the GC
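The fields above can also be extracted programmatically; a minimal sketch that parses the minor-GC line shown earlier (the class name and regular expression are illustrative, and real logs contain other record shapes as well):

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Sketch: pulling the heap-before, heap-after, and total-heap fields out of
// a -verbose:gc minor-GC line with a regular expression.
public class GcLineParser {
    private static final Pattern LINE = Pattern.compile(
        "\\[GC (\\d+)K->(\\d+)K\\((\\d+)K\\), ([\\d.]+) secs\\]");

    // Returns {before, after, total} in KB, or null if the line does not match.
    static long[] parse(String line) {
        Matcher m = LINE.matcher(line);
        if (!m.find()) {
            return null;
        }
        return new long[] {
            Long.parseLong(m.group(1)),  // heap size before the GC
            Long.parseLong(m.group(2)),  // heap size after the GC
            Long.parseLong(m.group(3))   // overall Java heap size
        };
    }

    public static void main(String[] args) {
        long[] f = parse("3.791: [GC 1884K->1299K(5056K), 0.0031820 secs]");
        System.out.println(f[0] + "K -> " + f[1] + "K of " + f[2] + "K");
    }
}
```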
-verbose:gc -XX:+PrintGCTimeStamps
Prepends the seconds since the application started to each line of -verbose:gc output.
3.791 – Time since the launch of the JVM.
Format: YYYY-MM-DDTHH:MM:SS.mmm-tttt
YYYY = year
MM = month
DD = day of month
HH = hour
MM = minute
SS = seconds
mmm = milliseconds
tttt = time zone offset
Using -XX:+PrintGCDetails
-XX:+PrintGCDetails
...
This command line is used most frequently by engineering and is recommended for customers to use frequently as well. Look for the same sort of pattern as described for -verbose:gc.
GC: Minor GC
DefNew: The serial collector is being used (Default New)
Young Space
490K: Amount of live data in the young generation before the minor GC
64K: Amount of live data in the young generation after the minor GC
(960K): Young generation space is 960K
0.0032800 secs: Time for the minor GC
5470K: Overall Java Heap space before GC
5151K: Overall Java Heap space after GC
7884K: Total size of the Java Heap
Times: user=0.00 sys=0.00, real=0.00 secs – CPU time in user, sys, and overall.
Full GC (System) – A full GC triggered by a System.gc() call in code.
Tenured Space
5087K: Tenured space before GC
5151K: Tenured space after full GC
6924K: Total Tenured space.
0.0971070 secs: Time for the full GC
• -XX:+PrintGCApplicationStoppedTime
• -XX:+PrintGCApplicationConcurrentTime
-XX:+PrintGCApplicationStoppedTime
The amount of time an application has been paused for a safe point operation. The most
common safe point operation is a GC.
Safe Point Operations – JVM operations that require a Stop the World event. Safe point
operations require a known state before they can take place.
For example, JDK 5 introduced the idea of biased locking. Generally, a lock is most often re-acquired by the last thread that acquired it, so by default locking is biased toward that thread. However, if this turns out not to be the case in an application, the JVM requires a safe point operation to revoke the lock bias.
-XX:+PrintGCApplicationConcurrentTime
The amount of time the application runs between safe point operations. The time between the
last safe point operation and the current safe point operation.
Note: These two command-line options can be very useful in identifying JVM-induced
latencies.
Using jps
jps
• Command-line utility to find running Java processes
jps is very similar to ps in Solaris. Its purpose is to identify the process ID of Java processes.
Note: See jps man page on oracle.com for details about -q, -mlvV options:
http://download.oracle.com/javase/6/docs/technotes/tools/share/jps.html
Using jstat
jstat
• Command-line utility that runs on a local or remote JVM
jstat is a command-line tool that displays detailed performance statistics for a local or remote HotSpot VM. The -gcutil command-line option is used most frequently.
See the jstat man page on download.oracle.com for details on garbage collection options:
http://download.oracle.com/javase/1.5.0/docs/tooldocs/share/jstat.html
Caution: When using the Concurrent Mark Sweep (CMS) collector (also known as concurrent
collector), jstat reports two full GC events per CMS cycle, which is obviously misleading.
However, young generation stats are accurate with the CMS collector.
Using jconsole
jconsole
• Graphical Monitoring and
— threads
— CPU usage
— class loading
jconsole is a graphical monitoring and management console that comes with the HotSpot
JDK. jconsole supports both Java Management Extensions (JMX) and MBean technology.
This allows jconsole to monitor multiple JVMs at the same time. In addition, more than one
jconsole session can monitor a single JVM session at the same time. jconsole can
monitor the following JVM features:
• Memory usage by memory pool/spaces
• Class loading
• JIT compilation
• Garbage collection
• Threading and logging
• Thread monitor contention
Note: MBeans are managed beans, Java objects that represent resources to be managed.
They can be used with JMX applications.
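The same heap data jconsole displays can be read programmatically through the standard java.lang.management API; a minimal sketch (the class name is illustrative):

```java
import java.lang.management.ManagementFactory;
import java.lang.management.MemoryMXBean;
import java.lang.management.MemoryUsage;

// Sketch: querying the platform MemoryMXBean for current heap usage, the
// same MBean data jconsole shows in its Memory tab.
public class HeapProbe {
    static MemoryUsage heapUsage() {
        MemoryMXBean memory = ManagementFactory.getMemoryMXBean();
        return memory.getHeapMemoryUsage();
    }

    public static void main(String[] args) {
        MemoryUsage heap = heapUsage();
        System.out.println("used:      " + heap.getUsed() + " bytes");
        System.out.println("committed: " + heap.getCommitted() + " bytes");
    }
}
```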
Using VisualVM
VisualVM
• Graphical JVM monitoring tool
VisualVM is a lightweight graphical monitoring tool that is included with JDK 6 update 7 or
later. Based on an open source project (https://visualvm.dev.java.net), it has a number of cool
features including:
• Integration with JDK tools including jconsole and a subset of NetBeans Profiler
• Performance analysis and troubleshooting abilities including thread deadlock detection
and thread monitor contention
• Easily extended through its plug-in API
• Plug-ins can be installed directly into VisualVM using its plug-in center. Available plug-
ins include:
- jconsole
- VisualGC: covered in the next slide
- Glassfish plug-in
- btrace plug-in: A bytecode tracing plug-in for the JVM. Similar to DTrace, but for a JVM.
VisualVM can reveal visually the same sort of GC patterns discussed earlier in the course.
Using VisualGC
VisualGC
• Stand-alone GUI or
VisualGC is a stand-alone graphical JVM monitor or a VisualVM plug-in. In this course, as is typical, VisualGC is used as a VisualVM plug-in. You can install it directly using the VisualVM plug-in center. With VisualGC, a picture is worth a thousand words, because you
can see visually exactly what is going on with the garbage collector. In addition to garbage
collection, VisualGC also provides information about class loading and JIT compilation.
Using GCHisto
• Stand-alone GUI
• Open Source project: http://gchisto.dev.java.net
• Not included with JDK
GCHisto is a stand-alone GUI for analyzing GC log data. (There is also a VisualVM plug-in under development.) You can analyze multiple log files at the same time. GCHisto also allows you to compare heap sizes or collector types for JVM tuning by comparing GC logs. A GCHisto lab is included in Lesson 7.
JIT compilation is monitored infrequently. However, it is good to know which tools you can use when JIT compilation analysis is required.
• Opt/de-opt Cycles: The JIT compiler optimizes a piece of code; then, at some point later, it decides to de-optimize the same piece of code. When the compiler repeats this cycle on the same piece of code, performance can suffer.
• Failed Compilation: A JIT bug causes the compiler to fail to perform an optimization. This was a more frequent occurrence in Java 1.2 and 1.3; it is much rarer in Java 5 and 6.
The JIT compiler de-optimizes when it learns that an assumption made during a previous optimization turned out to be wrong. It must then de-optimize and re-optimize the affected piece of code.
Note: When you start a JVM with the -server option, many more JIT compiler optimizations are performed than with -client. This is why server applications take more time to start up.
Using -XX:+PrintCompilation
• -XX:+PrintCompilation Sample
– Blue text added to output
– Shows an opt/de-opt cycle
If you see this endless de-optimize/re-optimize cycle repeating for the same method, you can tell HotSpot not to compile it. However, this tends not to be an issue with Java 5 and Java 6.
Focusing on Throughput
• Have as highest priority the raw throughput of the information or data being processed
Throughput-Sensitive Applications
A Java application that focuses on throughput emphasizes the raw throughput of the
information or data being processed. This is the most important quality for the application.
Pause times resulting from JVM garbage collection events are not an issue, or of very little interest. As long as the overall throughput of the application over a period of time is not sacrificed, long pause times are acceptable. Examples of applications that focus on throughput
include:
• A large phone company printing bills or statements
• A large credit card company printing statements
• A bank calculating interest for accounts
Focusing on Responsiveness
• Have as highest priority the servicing of all requests within a maximum tolerable pause time
Responsiveness-Sensitive Applications
A Java application that focuses on responsiveness emphasizes how quickly an application
responds in a given scenario rather than focusing on the raw throughput of the application.
Most applications emphasizing responsiveness have a maximum pause time the application
can tolerate. Examples of applications that focus on responsiveness include:
• Applications connected to a user interface such as a web browser or an IDE
• Financial trading applications
• Telecommunication applications
Java applications focusing on responsiveness are sensitive to the time it takes for garbage
collection events to complete.
Summary
Performance Profiling
Chapter 5 - Page 1
Chapter 5
Performance Profiling
THESE eKIT MATERIALS ARE FOR YOUR USE IN THIS CLASSROOM ONLY. COPYING eKIT MATERIALS FROM THIS COMPUTER IS STRICTLY PROHIBITED
Objectives
• Characteristics:
– CPU performance profiling using bytecode instrumentation
The NetBeans Profiler is included in the standard NetBeans distribution. It works by injecting
bytecode into your application’s bytecode. The NetBeans Profiler is powerful and easy to use.
The profiler can minimize its overhead through root method selection, which also makes it easy to target specific parts of an application.
NetBeans
• Supported platforms:
– Solaris (SPARC and x86)
Oracle Solaris Studio is minimally intrusive right out of the box. Typically, intrusion is only 10% of application performance or less. This is much less than typical Java profilers, which have a 30% to 50% performance impact.
With Oracle Solaris Studio you collect data from the application, then store it and analyze it
later. This differs from traditional Java profilers, which take snapshots and analyze
applications while they run. It also enables you to analyze both user and system CPU time.
Inclusive method time is the time it takes to execute a method and anything that method calls.
Exclusive method time is the time it takes to execute the method by itself. Inclusive times are
good for measuring the performance of algorithms in code.
CPU counters enable you to analyze which pieces of code experience the most CPU cache
misses or translation lookaside buffer misses. A translation lookaside buffer is a cache used to
improve virtual address translation in a CPU.
Oracle Solaris Studio is really simple to use: just prepend collect -j on to your normal Java command line. For example:
collect -j on java BatchProcessor
Note: Although the product is called Oracle Solaris Studio, it also works on Linux.
After running collect with the -j option, an experiment file is created. The slide shows the experiment file opened in the Analyzer. If you want a command-line alternative, you can use er_print. The er_print utility can be controlled by scripts, which makes automated analysis possible.
• Used in combination:
– jmap: Produces a heap profile
– jhat: Analyzes and displays the heap profile
Both tools are distributed with the JDK. Both are command-line tools that are not as flashy as
Studio.
Profiling Tips
• CPU profiling
• Heap profiling
• CPU Profiling: Used when there is a large amount of CPU time in system or kernel
utilization. Also used when there is not enough application throughput.
• Heap Profiling: Should be used for throughput or responsiveness issues. Also should
be used when full garbage collections are occurring frequently.
• Memory Leak: Used when a Java heap continues to grow and grow over time without
bound. This can lead to “out-of-memory” errors from the JVM.
• Lock Contention: Used when there is a large number of context switches, which correlates with high CPU utilization.
• Tool Selection: Selecting the right tool for the kind of issue that needs to be addressed
• Inlining Effect: A JVM optimization in which the code of small methods is merged into their callers for execution. This reduces method-call overhead (pushing and popping stack frames). It can cause confusion when profiling, because two- or three-line methods may appear to behave in a confusing manner.
Determine where the application is spending the most amount of time. Instead of focusing on
individual methods, focus on the use case and call space of where your application is
spending the most time. How could the algorithm be improved?
Methods that have a high system CPU usage without performing much I/O are generally a
good place to start.
Inclusive method time: The time it takes to execute a method and anything that method
calls. Inclusive times are good for measuring the performance of algorithms in code. Looking
at inclusive times may help identify a change in implementation or design that could be a good
corrective approach.
Exclusive method time: The time it takes to execute the method by itself. Looking at
exclusive times focuses on specific implementation details within a method.
By isolating a portion of the application, you can narrow the focus of your investigation. The
NetBeans Profiler is typically only 10% intrusive compared to some commercial profiling tools,
which can be anywhere from 50% to 90% intrusive.
The sampling interval can be increased to as much as five seconds. This reduces the size of the data set produced when analyzing an application over a few hours.
DTrace
Creating your own DTrace scripts can be a bit daunting for most Java developers. However,
there are several DTrace scripts in the samples directory of the JDK.
Remote profiling can be a bit more difficult to set up compared to local profiling. To make this
easier, the NetBeans IDE includes wizards to help design a Java command line that you can
use to remote profile.
Accuracy
With the NetBeans Profiler, instrumentation is added to your application. This can affect the performance of your application, especially for very small methods, and may result in misleading output.
Oracle Solaris Studio should give you better data because it does not add the instrumentation
to your application.
Heap profiling provides information about the memory allocation footprint of an application.
In general, if you can minimize the number of objects being allocated, you should be able to reduce the frequency of GC events and improve application performance. However, avoid taking this concept to the extreme of trying to eliminate object allocations completely.
Look for objects that may have lengthy initialization times and allocate large amounts of
memory. They are good candidates for caching.
Although not as sophisticated as the NetBeans Profiler, jmap and jhat are a quick-and-dirty
way to look at the memory profile of an application.
jmap captures the heap snapshot, and jhat displays the data.
Consider alternative classes, objects, and possibly caching approaches for large allocators.
Note: jmap can be intrusive in that it must stop all the application threads to take the
snapshot. Generally, the more live objects there are in the heap, the longer it will take for the
JVM to write the dump file to disk.
Map Example
The most common example of an unintended memory leak is the use of maps, such as HashMap. Developers continue to add data to the map without ever removing it. The map grows until the JVM runs out of memory, and performance suffers along the way due to frequent garbage collections.
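A minimal sketch of this anti-pattern (class and method names here are hypothetical, for illustration only):

```java
import java.util.HashMap;
import java.util.Map;

public class LeakyCache {
    // Entries are added but never removed, so the map (and everything it
    // references) stays strongly reachable and can never be garbage collected.
    private static final Map<Long, byte[]> CACHE = new HashMap<>();

    static void handleRequest(long requestId) {
        CACHE.put(requestId, new byte[1024]); // grows without bound
    }

    static int size() {
        return CACHE.size();
    }

    public static void main(String[] args) {
        for (long i = 0; i < 10_000; i++) {
            handleRequest(i);
        }
        // The map never shrinks; in a long-running server this eventually
        // triggers frequent full GCs and then an OutOfMemoryError.
        System.out.println("Cached entries: " + size());
    }
}
```

A bounded cache, or a structure that drops entries when keys become unreachable, avoids this growth pattern.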
Pay close attention to surviving generations or object age when looking for memory leaks.
Surviving generations is the number of different object ages for a given class. An increasing
number of surviving generations over a period of time can be a strong indicator of a source of
a memory leak.
Remember, the object age is the number of garbage collections that the object has survived.
Capturing multiple heap profiles for comparison could easily be applied to the map scenario
discussed previously.
-XX:+HeapDumpOnOutOfMemoryError
Use this JVM command-line switch when starting an application. It can be combined with -XX:HeapDumpPath=<path>/<file> to control where the heap dump is written.
For example, the following jhat OQL query returns the number of live HTTP requests for the
application:
select s from com.sun.grizzly.ReadTask s where s.byteBuffer.position > 0
HotSpot and many modern JVMs do optimistic locking within the JVM. Therefore, detecting
these locking operations outside the JVM can be challenging. Voluntary context-switching can
be an indicator of this sort of activity.
Fortunately, Collector/Analyzer is very good at identifying locking issues.
Identify ways to partition the “guarded” data so that, as a result of partitioning, multiple locks can be used at a finer-grained level.
If writes are much less frequent than reads, separate read locks from write locks by using a
Java SE 5 ReentrantReadWriteLock. This class allows multiple threads to read an object
simultaneously. However, only a single write thread has access to the object for an update. In
this case, reads are blocked only in the rare instance of a write.
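The read/write separation described above can be sketched as follows (the Counter class is illustrative, not from the course labs):

```java
import java.util.concurrent.locks.ReentrantReadWriteLock;

public class Counter {
    private final ReentrantReadWriteLock lock = new ReentrantReadWriteLock();
    private int value;

    // Many reader threads may hold the read lock simultaneously.
    public int get() {
        lock.readLock().lock();
        try {
            return value;
        } finally {
            lock.readLock().unlock();
        }
    }

    // Only one writer may hold the write lock, and it excludes all readers.
    public void increment() {
        lock.writeLock().lock();
        try {
            value++;
        } finally {
            lock.writeLock().unlock();
        }
    }

    public static void main(String[] args) {
        Counter c = new Counter();
        c.increment();
        System.out.println(c.get());
    }
}
```

When writes are rare, readers almost never block each other, which is the payoff over a single exclusive lock.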
Concurrent data structures might introduce additional CPU utilization overhead and might, in some cases, not perform as well as a synchronized collection. This is due to JVM optimizations targeted at synchronized usage. Therefore, compare the approaches with meaningful workloads before selecting one.
Concurrent data structures tend to optimistically update data. The concurrent structure
anticipates the data states before and after writing. This can result in the JVM spinning in a
tight loop while waiting for the expected state to occur.
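The optimistic update pattern described above is typically a compare-and-set retry loop, as in this sketch using java.util.concurrent.atomic (the class name is illustrative):

```java
import java.util.concurrent.atomic.AtomicInteger;

public class OptimisticCounter {
    private final AtomicInteger value = new AtomicInteger();

    // Classic CAS retry loop: read the current value, compute the next one,
    // and retry if another thread changed the value in the meantime.
    public int increment() {
        while (true) {
            int current = value.get();
            int next = current + 1;
            if (value.compareAndSet(current, next)) {
                return next;
            }
            // CAS failed: another thread won the race; spin and try again.
        }
    }

    public static void main(String[] args) {
        OptimisticCounter c = new OptimisticCounter();
        System.out.println(c.increment());
    }
}
```

Under heavy contention, the retry loop is where the CPU time goes; a profiler will show threads spinning here rather than blocking on a monitor.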
Biased Locking
Biased Locking: In most cases, the last thread that used a lock is the most likely to request
that lock again. So the JVM will bias the lock to the last thread that held the lock.
• Introduced in JDK 5.0_06
• Improved in JDK 5.0_08
Inlining Effect
Inlining: A JIT optimization where the code from a called method is included in the calling
method. So, in effect, the method that was called no longer exists as its code is now part of
the method that originally called it. For example, you may see a method that merely calls other
methods having a high CPU utilization when it should not.
To disable inlining, use the following JVM command-line switch: -XX:-Inline
Note: Disabling inlining may distort the “actual” performance profile of your application. In
effect, you are testing a different application with this command-line option.
Caution: This command-line option should be used only for testing purposes.
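As an illustration, a trivial accessor like the hypothetical getX() below is a prime inlining candidate: under a profiler, its time is usually charged to the caller, so the small method may appear to vanish or the caller may look unexpectedly hot.

```java
public class Inlined {
    private final int x;

    Inlined(int x) {
        this.x = x;
    }

    // A one-line accessor: the JIT compiler will almost certainly inline
    // this into sum(), so no actual method call happens at runtime.
    int getX() {
        return x;
    }

    static long sum(Inlined[] items) {
        long total = 0;
        for (Inlined item : items) {
            total += item.getX(); // after inlining, this is a field read
        }
        return total;
    }

    public static void main(String[] args) {
        Inlined[] items = new Inlined[1000];
        for (int i = 0; i < items.length; i++) {
            items[i] = new Inlined(i);
        }
        System.out.println(sum(items));
    }
}
```

Running with -XX:-Inline would make getX() visible to a profiler again, at the cost of measuring a different (slower) program.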
Identifying Anti-Patterns
Monitor Contention: A situation where multiple threads hold global locks too frequently or too
long.
Summary
Chapter 6 - Page 2
Copyright © 2011, Oracle and/or its affiliates. All rights reserved.
Garbage Collection Schemes
Objectives
A typical garbage collector is responsible for a high-speed memory allocation with optimal
memory utilization and makes sure that there are no long-term fragmentation problems. To
understand how to use the garbage collector efficiently and the various performance problems
that you may face while running a Java program in a garbage collected environment, it is
important to understand the basics of garbage collection and how garbage collectors work.
[Figure: heap before and after marking; live (referenced) objects are marked, while unreferenced objects remain unmarked in the memory space]
The first step that the garbage collector performs is called marking. The garbage collector
iterates one-by-one through the application graph, checks if the object is being referenced,
and if so marks the object as being used. The marked objects will not be deleted in the
sweeping stage.
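The marking step described above can be sketched as a reachability traversal over a toy object graph (the class names are illustrative, not HotSpot internals):

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

public class MarkPhase {
    // A toy heap object: an id plus its outgoing references.
    static class Obj {
        final int id;
        final List<Obj> refs = new ArrayList<>();
        Obj(int id) { this.id = id; }
    }

    // Marking: walk the object graph from the roots and record every object
    // reached. Anything not in the marked set is garbage for the sweep phase.
    static Set<Obj> mark(List<Obj> roots) {
        Set<Obj> marked = new HashSet<>();
        List<Obj> stack = new ArrayList<>(roots);
        while (!stack.isEmpty()) {
            Obj o = stack.remove(stack.size() - 1);
            if (marked.add(o)) {        // add() is false if already marked,
                stack.addAll(o.refs);   // which also keeps cycles from looping
            }
        }
        return marked;
    }

    public static void main(String[] args) {
        Obj a = new Obj(1), b = new Obj(2), c = new Obj(3);
        a.refs.add(b); // b is reachable through the root a; c is not
        Set<Obj> marked = mark(java.util.Arrays.asList(a));
        System.out.println("b marked: " + marked.contains(b)); // true
        System.out.println("c marked: " + marked.contains(c)); // false
    }
}
```

Real collectors track roots (stacks, statics, JNI handles) rather than an explicit root list, but the reachability idea is the same.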
Normal Deletion
After normal deletion
The deletion of objects happens in the sweeping stage. The traditional and easiest way is to
mark the space as free and let the allocator use complex data structures to search the
memory for the required free space.
After deletion with compacting
It is obvious that the traditional way of freeing memory has many problems associated with it.
An improved way is by providing a defragmenting system that compacts memory by moving
objects closer to each other and removes fragments of free space, if any. In this way, the
allocation of future objects is much faster.
HotSpot uses a type of garbage collection that is termed generational; what that means is the
Java heap is partitioned into generational spaces. The default arrangement of generations (for
all collectors with the exception of the throughput collector) looks something like the image
above.
There are three types of generational spaces:
• Young Generation
• Tenured Generation
• Permanent Generation
Each of these spaces and the role they play are discussed in the subsequent slides.
Non-generational garbage collectors iterate over every object in the heap and check whether the object has any active references to it. As the number of objects in the heap increases, this process takes longer to complete and therefore becomes inefficient.
A careful observation of a typical object would tell us the following two characteristics:
• Most allocated objects will die young.
• Few references from older to younger objects exist.
To take advantage of this, the Java HotSpot VM splits the heap into two physical areas, which
are called generations.
Major Spaces
At initialization, a maximum address space is virtually reserved but not allocated to physical
memory unless it is needed. The complete address space reserved for object memory can be
divided into the young and tenured generations.
The young generation consists of eden plus two survivor spaces. Objects are initially allocated
in eden. One survivor space is empty at any time, and serves as a destination of the next,
copying collection of any live objects in eden and the other survivor space. Objects are copied
between survivor spaces in this way until they are old enough to be tenured, or copied to the
tenured generation.
Other virtual machines, including the production virtual machine for the J2SE Platform version
1.2 for the Solaris Operating System, use two equally sized spaces for copying rather than
one large eden plus two small spaces. This means the options for sizing the young generation
are not directly comparable.
A third generation closely related to the tenured generation is the permanent generation. The
permanent generation is special because it holds data needed by the virtual machine to
describe objects that do not have an equivalence at the Java language level. For example,
objects describing classes and methods are stored in the permanent generation.
All newly allocated objects are allocated in the young generation, which is relatively smaller
than the Java heap, and is collected more frequently. Because most objects in it are expected
to become unreachable quickly, the number of objects that survive a young generation
collection (also referred to as a minor garbage collection) is expected to be low. In general,
minor garbage collections are very efficient because they concentrate on a space that is
usually small and is likely to contain a lot of garbage objects.
Objects that are longer-lived are eventually promoted, or tenured, to the old generation. This
generation is typically larger than the young generation and its occupancy grows more slowly.
As a result, old generation collections (also referred to as major garbage collections or full
garbage collections) are infrequent, but when they do occur they can be quite lengthy.
[Figure: object allocation in the young generation and promotion to the old generation]
All newly allocated objects are allocated in the young generation, which is typically small and
collected frequently. Because most objects in the young generation are expected to die
quickly, the number of objects that survive a young generation collection is expected to be
low. In general, minor collections are very efficient because they concentrate on a space that
is usually small and is likely to contain a lot of garbage objects. The young generation collection is also called a minor collection.
Objects that are longer-lived are eventually promoted, or tenured, to the old generation. This
generation is typically larger than the young generation and its occupancy grows more slowly.
As a result, old generation collections are infrequent, but when they do occur they are quite lengthy. The tenured generation collection is also called a major collection.
[Figure: young generation (eden and survivor spaces) before marking, with object ages shown]
GC Performance Metric
There are two primary measures of garbage collection performance. Throughput is the
percentage of total time not spent in garbage collection, considered over long periods of time.
Throughput includes time spent in allocation (but tuning for speed of allocation is generally not
needed.) Pauses are the times when an application appears unresponsive because garbage
collection is occurring.
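As a sketch, throughput can be computed from the total run time and the accumulated GC time; the numbers below are hypothetical, as would normally come from GC logs or jstat output:

```java
public class GcThroughput {
    // Throughput = percentage of total run time NOT spent in garbage
    // collection, measured over a long period.
    static double throughputPercent(double totalSeconds, double gcSeconds) {
        return 100.0 * (totalSeconds - gcSeconds) / totalSeconds;
    }

    public static void main(String[] args) {
        // Hypothetical numbers: 600 s of run time, 18 s spent in GC pauses.
        System.out.println(throughputPercent(600.0, 18.0)); // 97.0
    }
}
```

A common tuning target is throughput of 95% or better, but the acceptable figure depends entirely on the application's requirements.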
Users have different requirements of garbage collection. For example, some consider the right
metric for a web server to be throughput, because pauses during garbage collection may be
tolerable, or simply obscured by network latencies. However, in an interactive graphics
program even short pauses may negatively affect the user experience.
Some users are sensitive to other considerations. Footprint is the working set of a process,
measured in pages and cache lines. On systems with limited physical memory or many
processes, footprint may dictate scalability. Promptness is the time between when an object
becomes dead and when the memory becomes available, an important consideration for
distributed systems, including remote method invocation (RMI).
There are numerous ways to size generations. The best choice is determined by the way the
application uses memory as well as user requirements. Therefore, the virtual machine's
choice of a garbage collector is not always optimal, and may be overridden by the user in the
form of command-line options.
[Figure: parallel garbage collection running across multiple hardware threads (HT = hardware thread)]
Types of GC Collectors
The Java Virtual Machine assumes no particular type of automatic storage management
system, and the storage management technique may be chosen according to the
implementer’s system requirements.
Serial Collector
The Serial GC is the garbage collector of choice for most applications that do not have low
pause time requirements and run on client-style machines. It takes advantage of only a single
virtual processor for garbage collection work (therefore, its name). Still, on today's hardware,
the Serial GC can efficiently manage a lot of non-trivial applications with a few hundred MBs
of Java heap, with relatively short worst-case pauses (around a couple of seconds for full
garbage collections).
Another popular use for the Serial GC is in environments where a high number of JVMs are
run on the same machine (in some cases, more JVMs than available processors!). In such
environments when a JVM does a garbage collection it is better to use only one processor to
minimize the interference on the remaining JVMs, even if the garbage collection might last
longer. And the Serial GC fits this trade-off nicely.
In the Serial GC, the young generation operates as described above. Both minor and full garbage collections take place in a stop-the-world fashion (that is, the
application is stopped while a collection is taking place). Only after garbage collection has
finished is the application restarted.
[Figure: young generation after a minor collection; eden and one survivor space are empty]
Old Generation
In the Serial GC, the old generation is managed by a sliding compacting mark-sweep, also known as a mark-compact, garbage collector. The mark-compact garbage
collector first identifies which objects are still live in the old generation. It then slides them
towards the beginning of the heap, leaving any free space in a single contiguous chunk at the
end of the heap. This allows any future allocations into the old generation, which will most
likely take place as objects are being promoted from the young generation, to use the fast
bump-the-pointer technique.
The parallel garbage collector is similar to the young generation collector in the default
garbage collector but uses multiple threads to do the collection. By default on a host with N
CPUs, the parallel garbage collector uses N garbage collector threads in the collection. The
number of garbage collector threads can be controlled with command-line options. On a host
with a single CPU the default garbage collector is used even if the parallel garbage collector
has been requested. On a host with two CPUs the parallel garbage collector generally
performs as well as the default garbage collector and a reduction in the young generation
garbage collector pause times can be expected on hosts with more than two CPUs.
This new parallel garbage collector can be enabled by using the command-line product flag -XX:+UseParallelGC. The number of garbage collector threads can be controlled with the ParallelGCThreads command-line option (-XX:ParallelGCThreads=<desired number>). This collector cannot be used with the concurrent low pause collector.
The parallel young generation collector is similar to the parallel garbage collector (-XX:+UseParallelGC) in intent but differs in implementation. Unlike the parallel garbage collector (-XX:+UseParallelGC), this parallel young generation collector can be used with the concurrent low pause collector that collects the tenured generation.
Note: The old and permanent generations are collected through a serial mark-sweep-compact
collection algorithm.
In the purist sense, the difference between a parallel collector and a parallel compacting collector is the “compacting.” Compacting describes the act of moving objects so that there are no holes between them. After a garbage collection sweep, holes may be left between live objects; compacting moves objects so that no holes remain. It is possible for a garbage collector to be non-compacting. Therefore, the difference between a parallel collector and a parallel compacting collector is that the latter compacts the space after a garbage collection sweep, and the former does not.
A parallel collector also implies a multithreaded garbage collector that uses multiple threads to perform the garbage collection. A multithreaded parallel compacting garbage collector implies multithreaded garbage collection and possibly a multithreaded compaction capability as well.
In the context of HotSpot, the term parallel collector suggests that the garbage collector is
enabled via -XX:+UseParallelGC or -XX:+UseParallelOldGC. These are also
described as the throughput collectors and both can be considered parallel compacting
collectors. The former is a multi-thread young generation collector with a single-threaded old
generation collector that also does single-threaded compaction of old generation.
The Concurrent Mark Sweep (CMS) collector (also referred to as the concurrent low pause
collector) collects the tenured generation. It attempts to minimize the pauses due to garbage
collection by doing most of the garbage collection work concurrently with the application
threads.
Normally the concurrent low pause collector does not copy or compact the live objects. A
garbage collection is done without moving the live objects. If fragmentation becomes a
problem, allocate a larger heap.
Note: For the young generation, the CMS collector uses the same algorithm as the parallel collector.
• Remark
– Finds objects that were missed by the concurrent mark
(Collector comparison — concurrency: serial No, parallel No, CMS Yes.)
New in the J2SE Platform version 1.5 is a feature referred to here as ergonomics. The goal of
ergonomics is to provide good performance from the JVM with a minimum of command-line
tuning. Ergonomics attempts to match the best selection of the following for an application:
• Garbage collector
• Heap size
• Runtime compiler
This selection assumes that the class of the machine on which the application is run is a hint
as to the characteristics of the application (that is, large applications run on large machines).
In addition to these selections is a simplified way of tuning garbage collection.
In the Java platform version 5.0, a class of machine referred to as a server-class machine has
been defined as a machine with
• Two or more physical processors
• 2 GB or more of physical memory
On server-class machines, the following are selected by default:
• Throughput garbage collector
• Heap sizes:
  - Initial heap size of 1/64 of physical memory, up to 1 GB
  - Maximum heap size of 1/4 of physical memory, up to 1 GB
• Server runtime compiler
Ergonomics
Summary
Copyright © 2011, Oracle and/or its affiliates. All rights reserved.
Garbage Collection Tuning
Objectives
A typical garbage collector is responsible for high-speed memory allocation with optimal
memory utilization, and ensures that there are no long-term fragmentation problems. To use
the garbage collector efficiently, and to understand the various performance problems that
you may face while running a Java program in a garbage-collected environment, you must
understand the basics of garbage collection and how garbage collectors work.
In general, a garbage collector is responsible for three tasks:
• Allocating memory for new objects
• Ensuring that any referenced objects (live objects) remain in memory
• Recovering memory used by objects that are no longer reachable (dead objects)
The first step that the garbage collector performs is called marking. The garbage collector
traverses the application's object graph, starting from the root references, and marks each
reachable object as live. The marked objects will not be deleted in the sweeping stage.
Object creation has a cost, in terms of both CPU usage and memory. The garbage collector
takes care of the deallocation and recycling of the memory used by objects, but this itself
consumes a significant amount of memory and CPU time. Responsible coding practices, such
as avoiding extra temporary objects and not creating objects unnecessarily, can lead to a
significant speed gain.
Figure: HotSpot heap layout — the young generation (eden plus the from and to survivor
spaces), the tenured generation, and the permanent generation. Committed sizes are bounded
by -Xms/-Xmx for the heap, -XX:MaxNewSize for the young generation, and
-XX:PermSize/-XX:MaxPermSize for the permanent generation; space beyond the committed
size is reserved.
Concurrent Collector
In the Java Standard Edition Platform version 1.5, there are four collectors:
• Serial collector (the default)
• Throughput collector
• Concurrent collector
• Incremental concurrent collector
The incremental concurrent collector is recommended only for systems with small Java
heaps and a small number of hardware threads. It is not meant for large Java heaps or
systems with a large number of hardware threads.
Each is a generational collector that has been implemented to emphasize the throughput of
the application or low garbage collection pause times.
• Throughput: how much work can be done in a given interval of time
• Responsiveness: maintaining an elapsed time limit in which the application must respond
Throughput Collector
Solaris Zones is an implementation of operating system–level virtualization technology for x86
and SPARC systems first made available in 2005 as part of Solaris 10. It is present in newer
OpenSolaris-based distributions, like OpenIndiana and Solaris 11 Express.
The CMS collection cycle proceeds through these phases:
1. Initial mark
2. Concurrent mark
3. Remark — finds objects that were missed by the concurrent mark
4. Concurrent sweep
5. Concurrent reset — prepares for the next concurrent collection
6. To date, classes will not by default be unloaded from permanent generation when using
the concurrent collector unless explicitly instructed to do so by using both of the
following:
- -XX:+CMSClassUnloadingEnabled
- -XX:+PermGenSweepingEnabled
Note: The second switch is not needed in post HotSpot 6.0u4 JVMs.
7. If relying on explicit GC and you want them to be concurrent, use:
- -XX:+ExplicitGCInvokesConcurrent (requires 1.6.0 and later)
The concurrent collector is likely to require larger tenured heap space sizing than other
collectors due to heap fragmentation and floating garbage. The concurrent collector by default
is a multithreaded young generation collector and -XX:+UseParNewGC is enabled by default
with the concurrent collector.
Note
A serial collector can work better on applications with small young generation heaps versus a
parallel throughput collector. A throughput collector's parallel GC threads may compete for
work in small young generation heaps, resulting in thrashing. For small Java heaps, or small
young generation Java heaps, try both a serial collector and parallel throughput collector.
1. Use JConsole, VisualVM or jstat to observe the sizing behavior of the permanent
generation.
2. Use -XX:MaxPermSize=<n> to increase the maximum size. Also consider setting
-XX:PermSize=<n> to the same value to avoid performance overhead of permanent
generation space expansion.
3. The concurrent collector can be specified to collect permanent generation by using
-XX:+CMSClassUnloadingEnabled and -XX:+PermGenSweepingEnabled (not required
for HotSpot JVMs on Java 6).
Understanding -XX:+PrintGCDetails
In the case of a minor collection:
• DefNew is the young generation space. DefNew indicates the type of young generation
collector and that it is also a minor GC. DefNew indicates that the Serial Collector is
used for young generation. GC indicates a minor garbage collection.
• The old generation space is not collected.
• The permanent generation space is not collected.
Understanding -XX:+PrintGCDetails
In the case of a full collection:
• Tenured is the old generation space.
• Perm is the permanent generation space.
• Young generation stats are reported.
Minor Collection
Understanding -XX:+PrintGCDetails
Here is an analysis of the output:
[GC [PSYoungGen: 13737K->1978K(14080K)]
17407K->6303K(43840K), 0.2144150 secs]
The occupancy of the 14080-KB young generation space dropped from 13737 KB to 1978 KB
during the collection.
The occupancy of the overall heap (43840 KB in size) dropped from 17407 KB to 6303 KB.
Time: it took 0.2144150 seconds to perform the collection.
PSYoungGen immediately indicates that the garbage collector used is the
-XX:+UseParallelGC or -XX:+UseParallelOldGC collector, that is, the throughput collector.
Also, "GC" indicates that it is a minor GC.
Other Information:
[GC[1 CMS-initial-mark:41011K(81920K)]41011K(96704K),0.0001346 secs]
CMS-initial-mark indicates the start of a concurrent collection cycle.
[CMS-concurrent-mark: 0.007/0.007 secs]
CMS-concurrent-mark indicates the end of the concurrent marking phase.
[CMS-concurrent-preclean: 0.000/0.000 secs]
CMS-concurrent-preclean indicates work performed concurrently in preparation for the
remark phase.
[1 CMS-remark: 55994K(81920K)] 59315K(96704K), 0.0003651 secs]
CMS-remark indicates the stop-the-world remark phase, which completes the marking of live
objects.
“Losing the race” occurs when the rate at which objects are being promoted to the old
generation exceeds the rate at which objects are being collected by the CMS cycle. The
corrective action to “losing the race” is tuning the Java heap size for effective object aging, or
starting the CMS cycle earlier, or a combination of both.
PrintGCStats prints a summary of garbage collection statistics taken from a log file
generated by java -verbose:gc -XX:+PrintGCDetails. These statistics can be very
helpful when tuning the GC parameters for a Java application. The tool is applicable only for
32-bit JVMs and can be downloaded from:
http://java.sun.com/developer/technicalArticles/Programming/turbo/PrintGCStats.zip
PrintGCStats:
• Summarizes GC activity obtained from GC logs
• Allows comparison of JVM tunings, such as heap sizes or collector choice
Summary
Objectives
Do not call java.lang.System.gc(). The garbage collector generally does a much better job
than System.gc() in deciding when to perform garbage collection. In fact, performance is likely
to decrease if your application repeatedly calls System.gc(). If you are having problems with
memory usage, pause times for garbage collection, or similar issues, you should configure the
memory management system appropriately.
A common use of explicit GC is RMI distributed garbage collection (DGC).
-XX:+DisableExplicitGC will also disable RMI DGC. Consider tuning RMI DGC if needed
rather than disabling explicit GC.
The default RMI distributed GC interval is once per minute (60000 ms). To change this, use
-Dsun.rmi.dgc.client.gcInterval and -Dsun.rmi.dgc.server.gcInterval. The maximum value
either can take is Long.MAX_VALUE.
It is important to consider the following issues if large strings are added to the ArrayList:
• Several array-resizing operations will take place.
• They will allocate several large arrays.
• They will cause a large amount of array copying.
• They might cause fragmentation issues on noncompacting GCs.
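The resizing and copying costs above can be avoided by sizing the list up front. The sketch below contrasts default construction with presizing; the element count and names are illustrative, not from the course material.

```java
import java.util.ArrayList;
import java.util.List;

public class PresizedList {
    public static void main(String[] args) {
        int expected = 100_000; // assumed to be known in advance for this sketch

        // Default construction starts with a small backing array; each time it
        // fills up, a larger array is allocated and the old contents are copied.
        List<String> grown = new ArrayList<>();

        // Supplying the expected capacity up front avoids the repeated
        // allocate-and-copy cycles (and the resulting garbage) entirely.
        List<String> presized = new ArrayList<>(expected);

        for (int i = 0; i < expected; i++) {
            grown.add("item-" + i);
            presized.add("item-" + i);
        }
        System.out.println(grown.size() == presized.size()); // prints "true"
    }
}
```

For an existing list, ensureCapacity(n) achieves the same effect before a bulk insert.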
package java.lang.ref;
Java has three kinds of references, called soft references, weak references, and phantom
references, in order of increasing weakness.
Java has three orders of strength in holding onto objects.
1. Soft references can be deleted from a container if the clients are no longer referencing
them and memory is tight.
2. Weak references are automatically deleted from a container as soon as clients stop
referencing them.
3. Phantom references point to objects that are already dead and have been finalized.
The JVM holds onto regular objects until they are no longer reachable by either clients or any
container. In other words, objects are garbage collected when there are no more live
references to them. Dead references do not count.
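The reachability rules above can be observed directly with a weak reference — a minimal sketch; note that clearing after System.gc() is a likelihood, not a guarantee, so the final line hedges both outcomes.

```java
import java.lang.ref.WeakReference;

public class WeakRefDemo {
    public static void main(String[] args) {
        Object strong = new Object();
        WeakReference<Object> ref = new WeakReference<>(strong);

        // While a strong reference exists, the referent is reachable.
        System.out.println(ref.get() == strong);   // prints "true"

        strong = null;      // drop the only strong reference
        System.gc();        // a hint only: clearing is not guaranteed immediately

        // After collection the weak reference is cleared and get() returns null;
        // timing depends on the collector, so code must handle both cases.
        Object o = ref.get();
        System.out.println(o == null ? "referent collected" : "referent still alive");
    }
}
```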
Figure: Reference-object processing — the garbage collector finds the referent of a reference
object; when the referent becomes dead, the reference is cleared; the reference object is
then queued on its reference queue.
Once a reference object is discovered by the garbage collector, it is queued for reference
processing, which can extend the lifetime of a reference object until the reference processing
is completed for that reference object.
Note: If there are many reference objects, the number of reference processing threads can
have an impact on the latency of retiring the reference objects.
In addition, many reference objects also give the garbage collector more work to do because
unreachable reference objects need to be discovered and queued during garbage collection.
Reference object processing can extend the time it takes to perform garbage collections,
especially if there are consistently many unreachable reference objects to process.
Object o = sr.get();   // sr is a previously created SoftReference
if (o != null) {
    System.out.println(o);   // the referent is still alive
} else {
    // the referent has been collected and the reference cleared
}
Soft Reference
Soft references are kept alive longer in HotSpot Server JVM.
Use -XX:SoftRefLRUPolicyMSPerMB=<n> to control the clearing rate; the default is 1000 ms.
This specifies the number of milliseconds a soft reference is kept alive, for each megabyte of
free heap space, after its referent is no longer strongly reachable. Keep in mind that soft
references are cleared only during garbage collection, which may not occur as frequently as
the SoftRefLRUPolicyMSPerMB value implies.
Soft references are commonly used for caching.
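A minimal sketch of such a cache follows; the class and method names are illustrative, not from any library. Because the collector may clear entries under memory pressure, a caller must treat a null result as a miss and rebuild the value.

```java
import java.lang.ref.SoftReference;
import java.util.HashMap;
import java.util.Map;

// A soft-reference cache sketch: entries survive until memory gets tight,
// at which point the collector may clear them.
public class SoftCache<K, V> {
    private final Map<K, SoftReference<V>> map = new HashMap<>();

    public void put(K key, V value) {
        map.put(key, new SoftReference<>(value));
    }

    // Returns the cached value, or null if it was never cached
    // or has since been cleared by the garbage collector.
    public V get(K key) {
        SoftReference<V> ref = map.get(key);
        return (ref == null) ? null : ref.get();
    }
}
```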
System.out.println(reference.get());        // the referent, or null if cleared
System.out.println(map.get(reference));     // the value stored under the reference key
System.out.println(reference.isEnqueued()); // true once queued on its reference queue
Memory Leaks
A memory leak means the garbage collector is not able to reclaim a certain amount of
memory, as the portion of the memory is still being referenced.
You can prevent memory leaks by watching for some common problems. Collection classes,
such as hashtables and vectors, are common places to find the cause of a memory leak. This
is particularly true if the class has been declared static and exists for the life of the application.
Memory Leaks
void someMethod() {
    int count = 0;
    while (true) {
        // someIntegers is a long-lived (for example, static) collection;
        // entries are added but never removed, so memory grows without bound
        someIntegers.add(new Integer(count));
        count++;
    }
}
There is a severe performance penalty for using finalizers, because of the way the garbage
collector works. When objects have finalizers, the GC must identify the collected objects that
need to be finalized and queue them for the thread that actually executes the finalizers.
The GC cannot finish cleaning up such objects efficiently: it either has to keep them alive
longer than it should, or has to delay collecting other objects, or both.
Relying on garbage collection to manage resources other than memory is not a good idea.
Here are some tips when using finalizers:
• Try to limit the use of the finalizer as a safety net. Use other mechanisms for releasing
resources.
• Either include an explicit method for “final” cleanup, or use an alternative “reference-
handling” approach by using WeakReferences or SoftReferences.
• If using a finalizer cannot be avoided, try to keep the work being done as small as
possible. For instance, do not rely on a finalizer to close file descriptors.
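One such explicit-cleanup mechanism is try-with-resources, which releases a resource deterministically instead of waiting for a finalizer — a minimal sketch, using an in-memory reader so it is self-contained:

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.StringReader;

public class ExplicitCleanup {
    public static void main(String[] args) throws IOException {
        // try-with-resources calls close() deterministically when the block
        // exits, so the underlying resource is released immediately -- no
        // finalizer, and no dependence on when (or whether) a GC cycle runs.
        try (BufferedReader in = new BufferedReader(new StringReader("line1\nline2"))) {
            System.out.println(in.readLine());  // prints "line1"
        }
    }
}
```

The same pattern closes file descriptors promptly when the reader wraps a FileReader instead.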
Summary
Objectives
Strings: An Introduction
Strings have a special status in the Java programming language. They are the only objects
with their own operators, such as + and +=. Characters surrounded by double quotes are also
Strings.
A String cannot be altered once created. Although a number of methods, such as
.toLowerCase(), .substring(), .trim(), and so on, appear to manipulate Strings, they
are unable to do so. Instead, the method returns an altered copy of the String.
2. String s1 = "Java is Fun";
   String s2 = "Java" + " is " + new StringBuffer("Fun");
   // s1 is a compile-time string; s2 is a runtime string.
3. public String sayHello(String name){
       return "Hello " + name;
   }
   // An expression involving String concatenation with a variable
   // cannot be resolved at compile time.
At compile time, Strings are resolved to eliminate the concatenation operator if possible as in
point one in the slide and therefore, the boolean expression evaluated to true.
If you create a String using a StringBuffer, the compiler cannot resolve the String during
compile time. Compile time Strings are always more efficient than Strings being created
during runtime. In short, when a String can be fully resolved at compile time, the
concatenation operator is more efficient than using a StringBuffer. But when the String cannot
be resolved at compile time, the concatenation operator is less efficient than using a
StringBuffer. Therefore, the boolean expression (2) evaluates to false.
The String generated by the method in (3) cannot be resolved at compile time because the
variable name can have any value. The compiler is free to generate code to optimize the
String creation, but it does not have to. Consequently, the String-creation line could be
compiled as:
return (new StringBuffer( ))
.append("Hello").append(name).toString( );
String s1 = "Java";
String s2 = "Java";
String s3 = new String("Java");
String s4 = new String("Java");
System.out.println(s1 == s2);
(Figure: s1 and s2 both refer to the single interned "Java" object; s3 and s4 each refer to a
distinct String object on the heap.)
The Java Virtual Machine maintains an internal list of references for interned Strings, to avoid
duplicate String objects in heap memory. Whenever the JVM loads a String literal from a class
file, it checks whether that String already exists in the internal list. If it does, the JVM does not
create a new String; it uses the reference to the existing String object. The JVM performs this
check internally for String literals, but not for String objects created through the new keyword.
You can explicitly force the JVM to perform this check for such Strings by using the
String.intern() method, which makes the JVM consult the internal list and use the existing
String object if one is already present.
The diagram in the slide shows the creation of String Objects without using the intern()
method.
In situations where String objects would otherwise be duplicated unnecessarily, the
String.intern() method avoids the duplication. The figure shows how String.intern() works: the
method checks whether an equal String already exists and, if so, returns a reference to the
existing object rather than creating a new one.
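The behavior above can be verified with a few reference comparisons — a short sketch:

```java
public class InternDemo {
    public static void main(String[] args) {
        String literal = "Java";                 // taken from the interned pool
        String runtime = new String("Java");     // a distinct heap object

        System.out.println(literal == runtime);           // prints "false"
        System.out.println(literal == runtime.intern());  // prints "true"
    }
}
```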
The String class is used to manipulate character strings that cannot be changed. Simply
stated, objects of the String type are read only and immutable. The StringBuffer class is
used to represent characters that can be modified.
According to javadoc, StringBuilder is designed as a replacement for StringBuffer in
single-threaded usage. Their key differences in simple terms are:
• StringBuffer is designed to be thread-safe and all public methods in StringBuffer
are synchronized. StringBuilder does not handle the thread-safety issue and none of
its methods are synchronized.
• StringBuilder has better performance than StringBuffer under most
circumstances.
• Use the new StringBuilder wherever possible.
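The single-threaded advantage shows up most clearly when building a string incrementally — a minimal sketch reusing one StringBuilder instead of concatenating with "+" inside the loop:

```java
public class BuilderDemo {
    public static void main(String[] args) {
        // Concatenating inside a loop with "+" creates a throwaway builder
        // (and an intermediate String) on every iteration; reusing one
        // StringBuilder performs a single incremental construction instead.
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < 5; i++) {
            sb.append(i).append(',');
        }
        sb.setLength(sb.length() - 1);   // drop the trailing comma
        System.out.println(sb);          // prints "0,1,2,3,4"
    }
}
```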
The proper use of exceptions can make your programs easier to develop and maintain, freer
from bugs, and simpler to use. When exceptions are misused, the opposite situation prevails:
programs perform poorly, confuse users, and are harder to maintain. Preventing exception
scenarios is one strategy: Use good programming habits such as checking object references
for null before accessing a member/method.
A best practice is to use primitive data types wherever possible, because they are faster than
their object wrappers. Some unavoidable situations where you have to box a primitive data
type as an object are:
• When storing the data in collections
• When it is a requirement to pass the data as a reference type to a method
The program was tested on JDK 1.5.1, running the loops 1,000,000 times. The results very
clearly show that primitive data types are faster than object types, and that boxing/unboxing
has a cost.
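The cost being measured can be sketched as follows (the loop bound is illustrative): the boxed loop allocates a new Long on every iteration, while the primitive loop allocates nothing.

```java
public class BoxingCost {
    public static void main(String[] args) {
        final int n = 1_000_000;

        long primitiveSum = 0;
        for (int i = 0; i < n; i++) {
            primitiveSum += i;   // pure primitive arithmetic, no allocation
        }

        Long boxedSum = 0L;
        for (int i = 0; i < n; i++) {
            boxedSum += i;       // unbox, add, re-box: a new Long per iteration
        }

        // Same result, very different cost.
        System.out.println(primitiveSum == boxedSum);  // prints "true"
    }
}
```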
Instead of calling a method that returns a mutable object several times, it is better to capture
the object once and use the data related to it. This avoids the unnecessary creation of
objects.
Thread Synchronization
The main issue with thread synchronization is a liveness problem. A concurrent application's
ability to execute in a timely manner is known as its liveness. The most common types of
liveness problems are deadlock, starvation, and livelock.
• Deadlock describes a situation where two or more threads are blocked forever, waiting
for each other.
• Starvation describes a situation where a thread is unable to gain regular access to
shared resources and is unable to make progress. This happens when shared resources
are made unavailable for long periods by “greedy” threads.
• A thread often acts in response to the action of another thread. If the other thread's
action is also a response to the action of another thread, then livelock may result.
Thread Synchronization
2. int getBalance(){
       synchronized(this){        // same as above: the current object is locked
           Account account = // ...
           return account.balance;
       }
   }
3. int getBalance(){
       Account account = // ...
       synchronized(account){     // only the account object is locked
           return account.balance;
       }
   }
Using synchronization has a high performance cost. Improper synchronization can also cause
a deadlock, which can result in complete loss of service because the system usually has to be
shut down and restarted. But performance overhead cost is not a sufficient reason to avoid
synchronization completely. Failing to make sure that your application is thread-safe in a
multithreaded environment can cause data corruption, which can be much worse than losing
performance.
The following are some practices that you can consider to minimize the overhead:
• Synchronize critical sections only
• Use private fields
• Use a thread-safe wrapper
• Use immutable objects
• Know which Java objects already have synchronization built-in
• Do not undersynchronize
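The first practice — synchronizing critical sections only — can be sketched as below; the class and method names are illustrative. Thread-confined work stays outside the lock, and only the shared-state update is guarded.

```java
public class EventCounter {
    private long count;                          // shared mutable state
    private final Object lock = new Object();

    public String record(String event) {
        // Thread-confined work needs no lock: each thread has its own 'event'.
        String normalized = event.trim().toLowerCase();

        // Hold the lock only around the shared-state update,
        // keeping the critical section -- and contention -- small.
        synchronized (lock) {
            count++;
        }
        return normalized;
    }

    public long count() {
        synchronized (lock) {
            return count;
        }
    }
}
```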
Collections
Some Java collection classes, such as ArrayList, LinkedList, HashSet, and HashMap, are not
synchronized by default. If you are working in a multithreaded environment, use their
synchronized counterparts: Vector in place of ArrayList or LinkedList, and Hashtable in place
of HashMap; for a HashSet, use a synchronized wrapper such as
Collections.synchronizedSet.
Collections
The SynchronizedList synchronizes all access methods to the list by providing a synchronized
wrapper on top of the List. You must, however, always use the wrapper to access the list
elements.
The Iterator is not synchronized, because the moment anything changes in the List you get a
ConcurrentModificationException. This means that simply synchronizing individual access
methods (next, remove) will not be useful, because if the List is modified externally, the
Iterator becomes useless.
When using an Iterator, you actually have to synchronize the entire block that iterates over the
List.
The correct usage when using Lists in multiple threads is to get a SynchronizedList and use it
everywhere. In the code that uses an Iterator over that List, you have to be sure that you
synchronize the block on the SynchronizedList object.
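The correct pattern can be sketched as follows — individual calls go through the wrapper, while the traversal is guarded by synchronizing on the wrapper object itself:

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

public class SyncListDemo {
    public static void main(String[] args) {
        List<Integer> list = Collections.synchronizedList(new ArrayList<>());
        for (int i = 1; i <= 4; i++) {
            list.add(i);   // individual calls are synchronized by the wrapper
        }

        // Iteration spans many calls, so the whole traversal must be guarded
        // manually -- and the monitor must be the wrapper object itself.
        int sum = 0;
        synchronized (list) {
            for (int value : list) {
                sum += value;
            }
        }
        System.out.println(sum);   // prints "10"
    }
}
```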
• Tested for:
– add()
– get()
– remove()
Benchmark Results
• ArrayList
– The get() method is very fast.
The benchmark results give a clear understanding of when to use each collection. If the
application performs mostly random lookups, the best practice is to use ArrayList instead of
LinkedList, whereas if the application performs many delete operations, LinkedList is the
better choice.
For better performance, avoid using a loop to copy the contents of an array. Instead, use
System.arraycopy() when you have to copy the entire contents of one array to another.
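A short sketch of the bulk-copy call and its parameter order:

```java
import java.util.Arrays;

public class CopyDemo {
    public static void main(String[] args) {
        int[] src = {1, 2, 3, 4, 5};
        int[] dst = new int[5];

        // One native bulk copy instead of an element-by-element loop:
        // (source, sourcePos, destination, destPos, length)
        System.arraycopy(src, 0, dst, 0, src.length);

        System.out.println(Arrays.equals(src, dst));  // prints "true"
    }
}
```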
I/O Performance
Buffering Tokenization
Formatting Costs
Random Access
When discussing Java I/O, it is worth noting that the Java programming language assumes
two distinct types of disk file organization. One is based on streams of bytes, the other on
character sequences. In the Java language a character is represented by using two bytes, not
one byte as in other common languages such as C. Because of this, some translation is
required to read characters from a file.
Speeding Up I/O
The first approach simply uses the read method on a FileInputStream. However, this
approach triggers several calls to the underlying runtime system, that is, FileInputStream.read,
a native method that returns the next byte of the file.
int cnt = 0;
int b;
while ((b = fis.read()) != -1) {
    if (b == '\n') cnt++;   // count the newlines in the file
}
The second approach avoids the above problem by using a large buffer.
BufferedInputStream.read takes the next byte from the input buffer, and only rarely accesses
the underlying system.
• Direct Buffering
...
The third approach avoids BufferedInputStream and performs buffering directly, thereby
eliminating the read method calls.
The huge speedup does not necessarily prove that you should always emulate the third
approach, in which you perform your own buffering. Such an approach may be error-prone,
especially in handling end-of-file events, if it is not carefully implemented. It may also be less
readable than the alternatives. But it is useful to keep in mind where the time goes, and how it
can be reclaimed when necessary.
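The third approach can be sketched as below. An in-memory stream stands in for the FileInputStream so the sketch is self-contained; the buffering logic is identical for a real file.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;

public class DirectBuffering {
    // Count newline bytes by filling a private buffer in large chunks and
    // scanning it with plain array indexing -- no per-byte read() calls.
    static int countLines(InputStream in) throws IOException {
        byte[] buf = new byte[2048];
        int lines = 0;
        int n;
        while ((n = in.read(buf)) != -1) {   // one bulk read per chunk
            for (int i = 0; i < n; i++) {
                if (buf[i] == '\n') lines++;
            }
        }
        return lines;
    }

    public static void main(String[] args) throws IOException {
        InputStream in = new ByteArrayInputStream("one\ntwo\nthree\n".getBytes());
        System.out.println(countLines(in));   // prints "3"
    }
}
```

Note the end-of-file handling: read may return fewer bytes than the buffer holds, so the scan must stop at n, not at buf.length — exactly the kind of detail that makes hand-rolled buffering error-prone.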
Buffering
Buffering is a technique where large chunks of a file are read from disk, and then accessed a
byte or character at a time. Buffering is a basic and important technique for speeding I/O, and
several Java classes support buffering (BufferedInputStream for bytes,
BufferedReader for characters).
An obvious question is: Will making the buffer bigger make I/O go faster? Java buffers
typically are by default 1024 or 2048 bytes long. A buffer larger than this may help speed I/O,
but often by only a few percent, say 5 to 10%.
• DataInputStream.readLine is obsolete. Use BufferedReader.readLine instead:
FileReader fr = new FileReader("filename");
BufferedReader br = new BufferedReader(fr);
ps.println("\uffff\u4321\u1143");
pw.println("\uffff\u3214\u1243");
3. The program in (3) writes an output file, but without preserving the Unicode characters
that are actually output. The Reader/Writer I/O classes are character-based, and
are designed to resolve this issue. OutputStreamWriter is where the encoding of
characters to bytes is applied.
4. This program uses the UTF8 encoding, which has the property of encoding ASCII text as
itself, and other characters as two or three bytes.
Formatting Costs
Writing data to a file is only part of the cost of output. Another significant cost is data
formatting.
Formatting Costs
• Approach 1: The first approach is simply to write out a fixed string to get an idea of the
intrinsic I/O cost.
• Approach 2: The second approach uses simple formatting with the “+” character.
• Approach 3: The third approach uses the MessageFormat class from the java.text
package.
The fact that approach 3 is quite a bit slower than approaches 1 and 2 does not mean that you
should not use it. But you need to be aware of the cost in time.
Message formats are quite important in internationalization contexts, and an application
concerned about this issue might typically read the format from a resource bundle, and then
use it.
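A short sketch of the MessageFormat usage described above; the pattern string is illustrative, and the count is passed as a pre-formatted String here to keep the output locale-independent.

```java
import java.text.MessageFormat;

public class FormatDemo {
    public static void main(String[] args) {
        // The pattern would normally come from a resource bundle so that
        // translators can reorder the placeholders per locale.
        String pattern = "On {0}, user {1} deleted {2} files.";

        String msg = MessageFormat.format(pattern, "Tuesday", "alice", "12");
        System.out.println(msg);  // prints "On Tuesday, user alice deleted 12 files."
    }
}
```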
Random Access
RandomAccessFile is a Java class for doing random access I/O (at the byte level) on files.
The class provides a seek method, similar to that found in C/C++, to move the file pointer to
an arbitrary location, from which point bytes can then be read or written.
The seek method accesses the underlying runtime system, and as such, tends to be
expensive. One cheaper alternative is to set up your own buffering on top of a
RandomAccessFile and implement a read method for bytes directly. The parameter to read is
the byte offset >= 0 of the desired byte. This technique is helpful if you have locality of access,
where nearby bytes in the file are read at about the same time. For example, if you are
implementing a binary search scheme on a sorted file, this approach might be useful. It is of
less value if you are truly doing random access at arbitrary points in a large file.
Compression
zos.putNextEntry(ze);
Java provides classes for compressing and uncompressing byte streams. These are found in
the java.util.zip package, and also serve as the basis for .jar files.
Whether compression helps or hurts I/O performance depends to a large extent on your local
hardware setup; specifically the relative speeds of the processor and disk drives.
Compression using Zip technology implies typically a 50% reduction in data size, but at the
cost of some time to compress and decompress.
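A compress/decompress round trip using the java.util.zip package can be sketched as follows; in-memory buffers stand in for file streams so the sketch is self-contained.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.util.zip.GZIPInputStream;
import java.util.zip.GZIPOutputStream;

public class GzipRoundTrip {
    static byte[] compress(byte[] data) throws IOException {
        ByteArrayOutputStream bos = new ByteArrayOutputStream();
        try (GZIPOutputStream gz = new GZIPOutputStream(bos)) {
            gz.write(data);
        }                                  // close() flushes the gzip trailer
        return bos.toByteArray();
    }

    static byte[] decompress(byte[] gzipped) throws IOException {
        try (GZIPInputStream gz = new GZIPInputStream(new ByteArrayInputStream(gzipped))) {
            ByteArrayOutputStream out = new ByteArrayOutputStream();
            byte[] buf = new byte[1024];
            int n;
            while ((n = gz.read(buf)) != -1) {
                out.write(buf, 0, n);
            }
            return out.toByteArray();
        }
    }

    public static void main(String[] args) throws IOException {
        byte[] original = "highly repetitive text text text text text".getBytes();
        byte[] restored = decompress(compress(original));
        System.out.println(new String(restored).equals(new String(original)));  // prints "true"
    }
}
```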
Tokenization
st.resetSyntax();
st.wordChars('a', 'z');
int tok;
Tokenization refers to the process of breaking byte or character sequences into logical
chunks, for example words. Java offers a StreamTokenizer class to do so.
StreamTokenizer is something of a hybrid class: it reads from character-based streams
(such as BufferedReader), but at the same time operates in terms of bytes, treating all
characters with two-byte values (greater than 0xff) as though they were alphabetic
characters. Writing your own low-level tokenizing code can be faster than using the
StreamTokenizer class.
Serialization
oos.writeObject(employeeObject);
• Deserialization example:
FileInputStream fis = new FileInputStream("filename");
BufferedInputStream bis = new BufferedInputStream(fis);
ObjectInputStream ois = new ObjectInputStream(bis);
Employee employeeObject = (Employee) ois.readObject();
There is probably no faster way than serialization to write out large volumes of data, and then
read it back, except in special cases. For example, suppose that you decide to write out a 64-
bit long integer as text instead of as a set of 8 bytes. The maximum length of a long integer as
text is around 20 characters, or 2.5 times as long as the binary representation. So it seems
likely that this format would not be any faster. In some cases, however, such as bitmaps, a
special format might be an improvement. However, using your own scheme does work against
the standard offered by serialization, so doing so involves some tradeoffs.
Beyond the actual I/O and formatting costs of serialization (using DataInputStream and
DataOutputStream), there are other costs, for example, the need to create new objects
when deserializing.
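A complete serialize/deserialize round trip can be sketched as below. The Employee type is hypothetical, and in-memory buffers stand in for the file streams used in the examples above; note that readObject allocates a brand-new instance, which is one of the costs just mentioned.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;

public class SerializationRoundTrip {
    // A hypothetical employee type used only for this sketch.
    static class Employee implements Serializable {
        private static final long serialVersionUID = 1L;
        final String name;
        final long id;
        Employee(String name, long id) { this.name = name; this.id = id; }
    }

    public static void main(String[] args) throws Exception {
        Employee original = new Employee("alice", 42L);

        // Serialize to an in-memory buffer (a file stream works the same way).
        ByteArrayOutputStream bos = new ByteArrayOutputStream();
        try (ObjectOutputStream oos = new ObjectOutputStream(bos)) {
            oos.writeObject(original);
        }

        // Deserialize: readObject allocates a new Employee instance.
        Employee copy;
        try (ObjectInputStream ois =
                new ObjectInputStream(new ByteArrayInputStream(bos.toByteArray()))) {
            copy = (Employee) ois.readObject();
        }
        System.out.println(copy.name + " " + copy.id);  // prints "alice 42"
    }
}
```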
Summary
Objectives
CPU Tools
• prstat: similar to top on Linux. On Solaris, prstat is less intrusive than top.
• GNOME System Monitor: a graphical representation of CPU utilization on Linux
• cpubar: a graphical representation of CPU utilization on Solaris
• iobar: a graphical representation of I/O and CPU utilization
Data of interest:
• us: user time
• sy: system time
• id: idle time
To launch:
# pidstat -u -w 5
Because 1420 bytes is the maximum MTU for TCP, there was an application using a lot of
bandwidth during the middle of this monitoring session.
Summary
Appendix B
Chapter 11
THESE eKIT MATERIALS ARE FOR YOUR USE IN THIS CLASSROOM ONLY. COPYING eKIT MATERIALS FROM THIS COMPUTER IS STRICTLY PROHIBITED
Objectives
Default values are listed for Java SE 6 for Solaris SPARC with the -server option. Some
options may vary per architecture, OS, or JVM version.
Behavioral Options
-XX:-AllowUserSignalHandlers
Do not complain if the application installs signal handlers. (relevant to Solaris and Linux only)
-XX:AltStackSize=16384
Alternate signal stack size (in Kbytes) (relevant to Solaris only, removed from 5.0.)
-XX:-DisableExplicitGC
Disable calls to System.gc(), JVM still performs garbage collection when necessary.
-XX:+FailOverToOldVerifier
Fail over to old verifier when the new type checker fails. (introduced in 6)
-XX:+HandlePromotionFailure
The youngest generation collection does not require a guarantee of full promotion of all live
objects. (introduced in 1.4.2 update 11) [5.0 and earlier: false]
-XX:+MaxFDLimit
-XX:+UseSplitVerifier
Use the new type checker with StackMapTable attributes. (Introduced in 5.0.)[5.0: false]
-XX:+UseThreadPriorities
Use native thread priorities.
-XX:+UseVMInterruptibleIO
Thread interrupt before or with EINTR for I/O operations results in OS_INTRPT. (Introduced in
6. Relevant to Solaris only.)
Performance Options
-XX:+AggressiveOpts
-XX:+UseFastAccessorMethods
Use optimized versions of Get<Primitive>Field.
-XX:-UseISM
Use Intimate Shared Memory. [Not accepted for non-Solaris platforms.] For details, see
Intimate Shared Memory.
-XX:+UseLargePages
Use large page memory. (Introduced in 5.0 update 5.) For details, see Java Support for Large
Memory Pages.
-XX:+UseMPSS
Use Multiple Page Size Support with 4-MB pages for the heap. Do not use with ISM, as this
replaces the need for ISM.
Debugging Options
-XX:-CITime
Prints time spent in JIT Compiler. (Introduced in 1.4.0.)
-XX:ErrorFile=./hs_err_pid<pid>.log
If an error occurs, save the error data to this file. (Introduced in 6.)
-XX:-ExtendedDTraceProbes
Enable performance-impacting dtrace probes. (Introduced in 6. Relevant to Solaris only.)
-XX:HeapDumpPath=./java_pid<pid>.hprof
Path to directory or filename for heap dump. Manageable. (Introduced in 1.4.2 update 12, 5.0
update 7.)
-XX:-HeapDumpOnOutOfMemoryError
Dump heap to file when java.lang.OutOfMemoryError is thrown. Manageable. (Introduced in
1.4.2 update 12, 5.0 update 7.)
-XX:OnError="<cmd args>;<cmd args>"
Run user-defined commands on fatal error. (Introduced in 1.4.2 update 9.)
allocated object. Each Java thread has its own allocation point. The default value varies with
the platform on which the JVM is running.
-XX:InlineSmallCode=
Inline a previously compiled method only if its generated native code size is less than this. The
default value varies with the platform on which the JVM is running.
-XX:MaxInlineSize=35
Maximum bytecode size of a method to be inlined.
-XX:FreqInlineSize=
Maximum bytecode size of a frequently executed method to be inlined. The default value
varies with the platform on which the JVM is running.