TL;DR: if you do nothing else, then make sure you set
-XX:TieredStopAtLevel=1
and -noverify
as default JVM args in your
IDE. They might not be recommended in a long-running, production
process, but they have a really big effect on startup time. Get a fast
disk (SSD). Also, don’t starve your app of CPU (you need 3 or 4 CPUs
to take advantage of some of the parallelization in Spring Boot, 8 is
better). And use an up-to-date JVM (although actually 8u131-zulu
seems to to be faster than 8u141-openjdk
).
Spring and Spring Boot are not slow (after many optimizations, some of which were motivated by data here). Neither component scanning, autoconfiguration, condition processing, nor anything else they do on their own is a significant source of startup time. There is some evidence that most of the cost is associated with JVM overheads, loading and parsing classes. So essentially, if you want more features (more classes) you get slower startup. See in particular the data here for more detail.
Note
|
if you are looking for more information on memory consumption, as opposed to startup time, please see https://github.com/dsyer/spring-boot-memory-blog. |
This project uses JMH to test and report on various the effect of various parameters on Spring Boot startup times. The apps are launched in a forked JVM and the measurements below are all "seconds to start up", which includes the time from the app being executed to the point where it reports "Started" on the console.
Various combinations of things are tested:
-
Spring Boot version, 1.5.1, 1.4.2 and 1.3.8
-
Some JVM flags (e.g.
-noverify
has an effect-Xmx
not so much) -
Fat jar vs thin jar
-
Exploding the archive and running with
JarLauncher
or the application main -
Shading vs Spring Boot fat jars
-
Different application features: a basic web app with actuator, the petclinic and a Spring Cloud config server
-
Hardware (running on bare metal vs AWS for instance)
-
The scaling of startup time with bean count is investigated in more detail in the static module
Note
|
There is a Spring Boot issue with some comments and more benchmark results. This might be useful to read if you have slow start up times and want some company. Very important: check your anti-virus software and make sure it isn’t slowing down access to jar files. |
Spring Boot (and other) issues for follow up:
-
PropertySourcesPropertyResolver
: spring-boot #7572 -
Condition processing: spring-boot #7573
-
Cache configuration: spring-boot #7576
-
Lazy Actuator: spring-boot #7578
-
Lazy Validation: spring-boot 7579
-
Jackson: spring-boot 8028
-
Cache annotation processing: spring-boot 8056
-
Startup JVM args: spring-ide #85
You need Java 8 and Docker to build all the code cleanly (you can
build selectively the bits that don’t depend on Docker if you don’t
have it). To run the JsaBenchmark
you need an Oracle JVM.
Then do this:
$ ./mvnw clean install
$ (cd benchmarks/; java -jar target/benchmarks.jar)
If you see odd (longer than expected) results in the devtoolsRestart
benchmarks, check that you don’t have a
~/.spring-boot-devtools.properties
overriding the polling properties.
Install vagrant and the AWS plugin
(https://github.com/mitchellh/vagrant-aws). Then make sure you have an
EC2 security group called "vagrant" that allows ssh (open it up for
all traffic to be on the safe side). The way the Vagrantfile
is
configured you need a default VPC in us-east-2 (an account that was
created recently will have one by default).
To provide your AWS credentials, define some environment variables:
$ export AWS_ACCESS_KEY=...
$ export AWS_SECRET_KEY=...
$ export AWS_KEYPAIR=...
where AWS_KEYPAIR
is the AWS name of the key pair that you have
already registered with Amazon. For the configuration to work without
changes you also need the private key in a file with read-only (600)
permissions called ~/.ssh/id_rsa.${AWS_KEYPAIR}
. Then you can
$ ./mvnw clean install
$ cd vagrant
$ vagrant up --provider aws
$ vagrant ssh
> cd /build/benchmarks
> java -jar target/benchmarks.jar
Edit the Vagrantfile
to change the instance type and repeat the
process. You have to vagrant destroy
in between changes of
instance type.
If you need to update the benchmarks just mvnw clean install
locally
and then vagrant reload
to sync the changes to the remote box.
Note
|
I had a lot of issues getting SSH to work. Had to create my own
security group, and there was a hiccup with the ssh client wanting to
prompt me to change my ~/.ssh/known_hosts . Good luck.
|
The basic webapp also has a "minimal" version with a stripped down
classpath and handpicked configuration classes (instead of
@EnableAutoConfiguration
). I noticed that logback showed up a lot in
the -verbose:class
logs so I took it out of the minimal
sample. Ditto hibernate-validator
(lots of classes initialized
compared to the actual feature set I use).
I also tried a super-minimal sample, where I included, instead of a
handpicked set of autoconfiguration, the whole set of configuration
from the vanilla demo. That’s a bit slower (maybe 150ms) to start up,
but not much in it. Also I copied the config files and manually
stripped out the @Conditional
annotations, so there is no condition
processing. This made no appreciable difference (with Spring Boot
1.5.4, although I suspect it might have done with 1.3.8).
Summary:
-
It is possible to start a non-trivial app from cold in well under 2 seconds (well under 1 second in a warm JVM).
-
Spring Boot 1.4 and 1.3 are barely distinguishable (1.4 slightly slower but within the error bars). 1.5 is fast too. 2.0 a bit slower, but not much.
-
Devtools restart is much faster than a cold start.
-
Running the app’s main method from an exploded jar is marginally faster.
-
Hardware matters a lot, but affects all apps roughly the same.
-
Spring Boot (and Spring Data) slow things down (but only a bit).
-
Common Data Sharing (CDS) can improve startup a bit (see
JsaBenchmark
).
N.B. the times reported by the Spring Boot stop watch on the console are always shorter than those measured by the benchmark. The difference is a feature of the hardware used (200ms on faster platforms, 1700ms on slowest) and also the test ("thin" jars have a wider margin because they do a little bit of work checking caches before Spring Boot actually starts).
Spring Boot is 2.0 is a bit slower and 2.1 is a bit faster. Early milestones were fine, but degradation continued through the 2.0 development cycle. A lot of progress was made between M7 and RC1, so I suppose it could be worse. Results (before RC2):
Benchmark Mode Cnt Score Error Units
SpringBoot13xBenchmark.explodedJarLauncher avgt 10 2.103 ± 0.260 s/op
SpringBoot13xBenchmark.explodedJarMain avgt 10 1.447 ± 0.038 s/op
SpringBoot13xBenchmark.fatJar avgt 10 1.959 ± 0.081 s/op
SpringBoot14xBenchmark.explodedJarLauncher avgt 10 2.155 ± 0.080 s/op
SpringBoot14xBenchmark.explodedJarMain avgt 10 1.659 ± 0.053 s/op
SpringBoot14xBenchmark.fatJar avgt 10 2.097 ± 0.096 s/op
SpringBoot15xBenchmark.explodedJarLauncher avgt 10 1.964 ± 0.060 s/op
SpringBoot15xBenchmark.explodedJarMain avgt 10 1.495 ± 0.045 s/op
SpringBoot15xBenchmark.fatJar avgt 10 1.852 ± 0.055 s/op
SpringBoot20xBenchmark.explodedJarLauncher avgt 10 2.371 ± 0.107 s/op
SpringBoot20xBenchmark.explodedJarMain avgt 10 1.815 ± 0.053 s/op
SpringBoot20xBenchmark.fatJar avgt 10 2.248 ± 0.071 s/op
Spring Boot 2.0.4.RELEASE:
Benchmark Mode Cnt Score Error Units
PetclinicLatestBenchmark.devtoolsRestart avgt 10 1.210 ± 0.034 s/op
PetclinicLatestBenchmark.explodedJarMain avgt 10 4.110 ± 0.092 s/op
PetclinicLatestBenchmark.fatJar avgt 10 5.599 ± 0.057 s/op
PetclinicLatestBenchmark.noverify avgt 10 4.892 ± 0.152 s/op
Spring Boot 2.1.0 (snaphots after M1):
Benchmark Mode Cnt Score Error Units
PetclinicLatestBenchmark.devtoolsRestart avgt 10 1.208 ± 0.070 s/op
PetclinicLatestBenchmark.explodedJarMain avgt 10 3.897 ± 0.067 s/op
PetclinicLatestBenchmark.fatJar avgt 10 4.996 ± 0.032 s/op
PetclinicLatestBenchmark.noverify avgt 10 4.399 ± 0.029 s/op
There are a few improvements in Spring 1.5 that affect startup time (mainly to do with configuration processing). Here are some highlighted results, where "latest" means Pet Clinic with Boot 1.5:
Benchmark Mode Cnt Score Error Units
ConfigServerBenchmark.fatJar138 avgt 10 2.959 ± 0.082 s/op
ConfigServerBenchmark.fatJar142 avgt 10 2.786 ± 0.078 s/op
ConfigServerBenchmark.fatJar150 avgt 10 2.668 ± 0.084 s/op
SpringBoot138Benchmark.explodedJarLauncher avgt 10 2.202 ± 0.111 s/op
SpringBoot138Benchmark.explodedJarMain avgt 10 1.516 ± 0.087 s/op
SpringBoot138Benchmark.fatJar avgt 10 2.024 ± 0.068 s/op
SpringBoot142Benchmark.explodedJarLauncher avgt 10 2.210 ± 0.061 s/op
SpringBoot142Benchmark.explodedJarMain avgt 10 1.649 ± 0.089 s/op
SpringBoot142Benchmark.fatJar avgt 10 2.138 ± 0.081 s/op
SpringBoot150Benchmark.explodedJarLauncher avgt 10 2.210 ± 0.078 s/op
SpringBoot150Benchmark.explodedJarMain avgt 10 1.664 ± 0.085 s/op
SpringBoot150Benchmark.fatJar avgt 10 2.128 ± 0.112 s/op
PetclinicBenchmark.devtoolsRestart avgt 10 1.116 ± 0.070 s/op
PetclinicBenchmark.explodedJarMain avgt 10 3.469 ± 0.069 s/op
PetclinicBenchmark.fatJar avgt 10 5.003 ± 0.350 s/op
PetclinicBenchmark.noverify avgt 10 4.358 ± 0.113 s/op
PetclinicLatestBenchmark.devtoolsRestart avgt 10 1.056 ± 0.067 s/op
PetclinicLatestBenchmark.explodedJarMain avgt 10 3.441 ± 0.106 s/op
PetclinicLatestBenchmark.fatJar avgt 10 4.787 ± 0.112 s/op
PetclinicLatestBenchmark.noverify avgt 10 4.289 ± 0.070 s/op
MinimalBenchmark.explodedJarMain avgt 10 1.250 ± 0.029 s/op
MinimalBenchmark.fatJar avgt 10 1.579 ± 0.072 s/op
The configserver benefits by a few hundred millseconds. The vanilla demo (web plus actuator) interestingly does not. The Pet Clinic also gets a couple of hundred milliseconds boost, but not so much in the exploded jar case (which mimic what people get in the IDE). The minimal sample is quite a bit zippier.
New Dell desktop
Benchmark Mode Cnt Score Error Units
ConfigServerBenchmark.devtoolsRestart avgt 10 1.034 ± 0.201 s/op
ConfigServerBenchmark.explodedJarMain avgt 10 3.271 ± 0.234 s/op
ConfigServerBenchmark.fatJar138 avgt 10 3.267 ± 0.173 s/op
ConfigServerBenchmark.fatJar142 avgt 10 3.423 ± 0.160 s/op
PetclinicBenchmark.devtoolsRestart avgt 10 1.387 ± 0.399 s/op
PetclinicBenchmark.explodedJarMain avgt 10 5.491 ± 0.357 s/op
PetclinicBenchmark.explodedShadedMain avgt 10 5.266 ± 0.332 s/op
PetclinicBenchmark.fatJar avgt 10 6.130 ± 0.271 s/op
PetclinicBenchmark.noverify avgt 10 5.430 ± 0.217 s/op
PetclinicBenchmark.shaded avgt 10 7.975 ± 0.307 s/op
SpringBoot138Benchmark.explodedJarLauncher avgt 10 2.764 ± 0.149 s/op
SpringBoot138Benchmark.explodedJarMain avgt 10 2.004 ± 0.074 s/op
SpringBoot138Benchmark.fatJar avgt 10 2.589 ± 0.107 s/op
SpringBoot142Benchmark.explodedJarLauncher avgt 10 2.873 ± 0.105 s/op
SpringBoot142Benchmark.explodedJarMain avgt 10 2.229 ± 0.083 s/op
SpringBoot142Benchmark.fatJar avgt 10 2.677 ± 0.209 s/op
SpringBootThinBenchmark.basic138Thin avgt 10 2.474 ± 0.134 s/op
SpringBootThinBenchmark.basic142Thin avgt 10 2.764 ± 0.160 s/op
MinimalBenchmark.explodedJarMain avgt 10 1.602 ± 0.120 s/op
MinimalBenchmark.fatJar avgt 10 1.901 ± 0.089 s/op
JsaBenchmark.basicThin avgt 10 5.725 ± 0.460 s/op
JsaBenchmark.sharedClasses avgt 10 4.302 ± 0.203 s/op
JsaBenchmark.thinMain avgt 10 4.808 ± 0.161 s/op
Excluding groovy and thymeleaf layout (unused dependencies) from the Pet Clinic had a big effect on the startup time, but oddly only in the exploded main test:
<dependency>
<groupId>org.springframework.boot</groupId>
<artifactId>spring-boot-starter-thymeleaf</artifactId>
<exclusions>
<exclusion>
<groupId>org.codehaus.groovy</groupId>
<artifactId>groovy</artifactId>
</exclusion>
<exclusion>
<groupId>nz.net.ultraq.thymeleaf</groupId>
<artifactId>thymeleaf-layout-dialect</artifactId>
</exclusion>
</exclusions>
</dependency>
Before:
PetclinicBenchmark.explodedJarMain avgt 10 5.491 ± 0.357 s/op
and after:
PetclinicBenchmark.explodedJarMain avgt 10 4.894 ± 0.186 s/op
Setting -XX:TieredStopAtLevel=1
was quite a big boost, so we set
that permanently in the benchmarks (the rest of the results pre-date
this change):
Benchmark Mode Cnt Score Error Units
ConfigServerBenchmark.devtoolsRestart avgt 10 0.842 ± 0.047 s/op
ConfigServerBenchmark.explodedJarMain avgt 10 2.451 ± 0.100 s/op
ConfigServerBenchmark.fatJar138 avgt 10 3.002 ± 0.091 s/op
ConfigServerBenchmark.fatJar142 avgt 10 2.936 ± 0.101 s/op
MinimalBenchmark.explodedJarMain avgt 10 1.268 ± 0.077 s/op
MinimalBenchmark.fatJar avgt 10 1.537 ± 0.078 s/op
PetclinicBenchmark.devtoolsRestart avgt 10 1.107 ± 0.054 s/op
PetclinicBenchmark.explodedJarMain avgt 10 3.618 ± 0.149 s/op
PetclinicBenchmark.fatJar avgt 10 5.040 ± 0.229 s/op
PetclinicBenchmark.noverify avgt 10 4.402 ± 0.149 s/op
ShadedBenchmark.explodedShadedMain avgt 10 3.764 ± 0.085 s/op
ShadedBenchmark.shaded avgt 10 5.871 ± 0.123 s/op
SpringBoot138Benchmark.explodedJarLauncher avgt 10 2.224 ± 0.083 s/op
SpringBoot138Benchmark.explodedJarMain avgt 10 1.542 ± 0.044 s/op
SpringBoot138Benchmark.fatJar avgt 10 2.102 ± 0.126 s/op
SpringBoot142Benchmark.explodedJarLauncher avgt 10 2.232 ± 0.067 s/op
SpringBoot142Benchmark.explodedJarMain avgt 10 1.684 ± 0.087 s/op
SpringBoot142Benchmark.fatJar avgt 10 2.146 ± 0.080 s/op
SpringBootThinBenchmark.basic138Thin avgt 10 1.908 ± 0.120 s/op
SpringBootThinBenchmark.basic142Thin avgt 10 2.081 ± 0.088 s/op
SpringBootThinBenchmark.petclinicThin avgt 10 4.038 ± 0.267 s/op
JsaBenchmark.basicThin avgt 10 4.230 ± 0.216 s/op
JsaBenchmark.sharedClasses avgt 10 3.124 ± 0.125 s/op
JsaBenchmark.thinMain avgt 10 3.573 ± 0.229 s/op
This wouldn’t be a sensible setting for a long-running process in production, but it seems to have a positive effect on startup, so at dev time it’s a good idea.
Class Data Sharing is a JDK feature (since Java 5). Some people have reported success in improving startup time with simple apps (e.g. see this blog).
CDS is not a "standard" JVM option, but it works for core JDK classes
with OpenJDK. You have to use the Oracle JDK for the JsaBenchmark
which also uses
application
class data sharing. IBM and Open J9 have their own version of the
same thing. Zulu works with the vanilla CdsBenchmark
.
There are data for a JsaBenchmark
that uses commercial features in
the Oracle JDK, and also for a "vanilla" CdsBenchmark (not using
commercial features).
With 8u151-oracle
:
Benchmark Mode Cnt Score Error Units
CdsBenchmark.sharedClasses avgt 10 2.871 ± 0.088 s/op
CdsBenchmark.thinMain avgt 10 2.933 ± 0.154 s/op
JsaBenchmark.sharedClasses avgt 10 2.893 ± 0.068 s/op
JsaBenchmark.thinMain avgt 10 2.898 ± 0.085 s/op
With 8u152-zulu
:
Benchmark Mode Cnt Score Error Units
CdsBenchmark.sharedClasses avgt 10 2.854 ± 0.042 s/op
CdsBenchmark.thinMain avgt 10 2.837 ± 0.072 s/op
So there is really nothing in it (which is what I remembered from the JDK9 results I looked at as well). The difference in the old results is probably entirely explained by the effect of -noverify, which I forgot to include in the non-shared classes execution.
This is a PetClinic with Boot 1.4.6. There’s no reason to put any effort into exploring the effect with other versions and other apps based on this data.
Java 10 has app class sharing as a new (non-commercial) feature. It is slower in general than Java 8, but the class sharing does improve startup times:
Benchmark (sample) Mode Cnt Score Error Units
Java10CdsBenchmark.sharedClasses older avgt 10 2.780 ± 0.071 s/op
Java10CdsBenchmark.sharedClasses latest avgt 10 3.407 ± 0.223 s/op
Java10CdsBenchmark.thinMain older avgt 10 3.217 ± 0.783 s/op
Java10CdsBenchmark.thinMain latest avgt 10 3.747 ± 1.026 s/op
The "older" is Boot 1.5, and "latest" is 2.0, so the effect is more dramatic for Boot 1.5. Things are a bit better with Java 11 (apparently there was a bug in Java 10 that made it slower on startup overall). See the static module for more data.
By switching between Logback and other libraries in the minimal sample
we can test the effect of the logging library. We did this by removing
the logging starter completely, and adding in commons-logging
(which
drops down to plain JDK logging by default). The results show that
logback slows down the fat jar compared to plain JDK logging, but not
the exploded main. The effect is only 100ms. Here are the data (with
all the rest of the test set up as per the previous section):
MinimalBenchmark.explodedJarMain avgt 10 1.292 ± 0.105 s/op
MinimalBenchmark.fatJar avgt 10 1.620 ± 0.082 s/op
Log4j2 is actually slower:
MinimalBenchmark.explodedJarMain avgt 10 1.425 ± 0.098 s/op
MinimalBenchmark.fatJar avgt 10 1.820 ± 0.069 s/op
Remember there are only about 40 lines of logging in these tests. It might well be that logback or log4j2 performs much better under load, and this test is just for initialization time really.
Stripping everything else back reveals EhCache initialization as a hotspot in the Pet Clinic (it shows up in YourKit when you launch in "trace" mode). Removing it from the classpath boosts startup by a couple of hundred milliseconds:
PetclinicBenchmark.devtoolsRestart avgt 10 1.095 ± 0.086 s/op
PetclinicBenchmark.explodedJarMain avgt 10 3.465 ± 0.145 s/op
PetclinicBenchmark.fatJar avgt 10 4.827 ± 0.241 s/op
PetclinicBenchmark.noverify avgt 10 4.249 ± 0.116 s/op
If you add -noverify
as well, you will see the startup time in an
exploded jar (so in the IDE) at a little more than 3s (down from over
6s initially). Note that EhCache is not used by the app unless the
"production" profile is enabled, so you can switch it off just by
disabling the profile. The benchmarks allow you to do that by setting
-Dbench.args
, e.g:
$ cd benchmarks/
$ java -Dbench.args='-Dspring.profiles.active=default' -Ddebug \
-jar target/benchmarks.jar PetclinicBenchmark
Removing the actuator actually changes the features of the application, so in some ways it’s an unfair test, but on the other hand, if you only want to test the business features you might not care if the actuator was not there on startup. The results are promising (cumulative on top of disabling ehcache):
PetclinicBenchmark.devtoolsRestart avgt 10 0.781 ± 0.056 s/op
PetclinicBenchmark.explodedJarMain avgt 10 2.845 ± 0.089 s/op
PetclinicBenchmark.fatJar avgt 10 4.083 ± 0.192 s/op
PetclinicBenchmark.noverify avgt 10 3.566 ± 0.146 s/op
What if we ditch the validator (also shows up as a hotspot in YourKit)? Again this is cutting into application features - you even have to edit the Pet Clinic quite heavily to get it to work without validation. Results are good though, so maybe we could make it lazy somehow:
Benchmark Mode Cnt Score Error Units
PetclinicBenchmark.devtoolsRestart avgt 10 0.724 ± 0.047 s/op
PetclinicBenchmark.explodedJarMain avgt 10 2.670 ± 0.136 s/op
PetclinicBenchmark.fatJar avgt 10 3.682 ± 0.097 s/op
PetclinicBenchmark.noverify avgt 10 3.213 ± 0.146 s/op
Initializing a Jackson ObjectMapper
is the next hotspot after
EhCache is disabled. It turns out that this is a side effect of
initializing Thymeleaf. It happens in a class initializer in
SpringTemplateEngine
, so there’s no way to switch it off with lazy
beans in Spring, other than to remove Jackson from the classpath.
Even if we removed Thymeleaf, and replaced it with Mustache for instance, Jackson is needed by the actuator. So we would pay the startup cost anyway, despite the fact that it isn’t used. This raises the question of why we need to initialize these things at all. They aren’t going to be used until a request is processed.
Here’s the result of replacing Thymeleaf with Mustache (on top of all the other improvements above) and excluding Jackson:
Benchmark Mode Cnt Score Error Units
PetclinicBenchmark.devtoolsRestart avgt 10 0.727 ± 0.052 s/op
PetclinicBenchmark.explodedJarMain avgt 10 2.577 ± 0.091 s/op
PetclinicBenchmark.fatJar avgt 10 3.514 ± 0.107 s/op
PetclinicBenchmark.noverify avgt 10 3.046 ± 0.142 s/op
There is a small improvement, but it’s not clear that it has precisely
the origin we attribute using the reasoning above. The reasoning
behind that assertion is that if we do a background initialization of
Thymeleaf in an ApplicationInitializer
(same pattern as
BackgroundPreinitializer
from Spring Boot) then there is no
improvement in startup time, despite the fact that Jackson and
Thymeleaf are already initialized. It is interesting that removing
Thymeleaf completely has an impact, but it must be something other
than simply the static initializer. Also notable is that we saw an
improvement above when we removed Hibernate Validator from the
project, but this is also pre-initialized by Spring Boot, so the
inefficiency is not in the initialization there either.
A lot of components in a Spring Boot app are in the category of "needed at some point, but maybe not even for all requests, and definitely not at startup". Another one that shows up in YourKit on startup is the Webjars Locator. You only get <100ms improvement in startup time by leaving it out (and it changes the behaviour of the application, so you want it in there at runtime), but you don’t need it until after startup.
Removing Jackson means removing the actuators as well, as things stand. But it has quite a dramatic effect (probably Thymeleaf has no impact at all and we are just seeing Jackson in the results above). Results:
Benchmark Mode Cnt Score Error Units
PetclinicBenchmark.devtoolsRestart avgt 10 0.851 ± 0.018 s/op
PetclinicBenchmark.explodedJarMain avgt 10 2.823 ± 0.075 s/op
PetclinicBenchmark.fatJar avgt 10 3.895 ± 0.039 s/op
PetclinicBenchmark.noverify avgt 10 3.397 ± 0.060 s/op
These should be compared with the same thing but fully loaded (not the latest result in the previous paragraph). Here’s a reminder of what they said:
Benchmark Mode Cnt Score Error Units
PetclinicBenchmark.devtoolsRestart avgt 10 1.107 ± 0.054 s/op
PetclinicBenchmark.explodedJarMain avgt 10 3.618 ± 0.149 s/op
PetclinicBenchmark.fatJar avgt 10 5.040 ± 0.229 s/op
PetclinicBenchmark.noverify avgt 10 4.402 ± 0.149 s/op
That’s pretty huge!
Spring 5.0 has an annotation processor that creates an index of
components on startup. It is easy to set up (you just put
spring-context-indexer
on the classpath), but since it only indexes
a component scan, which is a pretty small part of a trivial Spring
Boot application like the ones we are testing, it has no discernable
effect at runtime.
>5 year old Dell desktop
Benchmark Mode Cnt Score Error Units
ConfigServerBenchmark.devtoolsRestart avgt 10 2.041 ± 0.173 s/op
ConfigServerBenchmark.explodedJarMain avgt 10 6.930 ± 0.228 s/op
ConfigServerBenchmark.fatJar138 avgt 10 7.395 ± 0.203 s/op
ConfigServerBenchmark.fatJar142 avgt 10 7.439 ± 0.247 s/op
PetclinicBenchmark.devtoolsRestart avgt 10 2.868 ± 0.195 s/op
PetclinicBenchmark.fatJar avgt 10 13.579 ± 0.222 s/op
PetclinicBenchmark.noverify avgt 10 12.186 ± 0.373 s/op
JsaBenchmark.basicThin avgt 10 10.830 ± 0.443 s/op
JsaBenchmark.sharedClasses avgt 10 8.158 ± 0.217 s/op
JsaBenchmark.thinMain avgt 10 9.568 ± 0.488 s/op
SpringBoot138Benchmark.explodedJarLauncher avgt 10 5.821 ± 0.191 s/op
SpringBoot138Benchmark.explodedJarMain avgt 10 4.365 ± 0.117 s/op
SpringBoot138Benchmark.fatJar avgt 10 5.550 ± 0.132 s/op
SpringBoot142Benchmark.explodedJarLauncher avgt 10 6.238 ± 0.225 s/op
SpringBoot142Benchmark.explodedJarMain avgt 10 4.840 ± 0.137 s/op
SpringBoot142Benchmark.fatJar avgt 10 6.238 ± 0.560 s/op
SpringBootThinBenchmark.basic138Thin avgt 10 5.248 ± 0.130 s/op
SpringBootThinBenchmark.basic142Thin avgt 10 5.715 ± 0.115 s/op
One year old Lenovo X1 laptop.
Benchmark Mode Cnt Score Error Units
ConfigServerBenchmark.devtoolsRestart avgt 10 1.458 ± 0.256 s/op
ConfigServerBenchmark.explodedJarMain avgt 10 5.217 ± 0.280 s/op
ConfigServerBenchmark.fatJar138 avgt 10 5.651 ± 0.221 s/op
ConfigServerBenchmark.fatJar142 avgt 10 5.812 ± 0.163 s/op
PetclinicBenchmark.devtoolsRestart avgt 10 1.952 ± 0.249 s/op
PetclinicBenchmark.fatJar avgt 10 9.973 ± 0.637 s/op
PetclinicBenchmark.noverify avgt 10 9.312 ± 0.350 s/op
SpringBoot138Benchmark.explodedJarLauncher avgt 10 4.696 ± 0.242 s/op
SpringBoot138Benchmark.explodedJarMain avgt 10 3.599 ± 0.192 s/op
SpringBoot138Benchmark.fatJar avgt 10 4.427 ± 0.249 s/op
SpringBoot142Benchmark.explodedJarLauncher avgt 10 4.926 ± 0.313 s/op
SpringBoot142Benchmark.explodedJarMain avgt 10 3.824 ± 0.150 s/op
SpringBoot142Benchmark.fatJar avgt 10 4.814 ± 0.199 s/op
SpringBootThinBenchmark.basic138Thin avgt 10 4.370 ± 0.200 s/op
SpringBootThinBenchmark.basic142Thin avgt 10 4.796 ± 0.165 s/op
Slightly knackered old Lenovo X220 laptop.
Benchmark Mode Cnt Score Error Units
ConfigServerBenchmark.devtoolsRestart avgt 10 1.568 ± 0.253 s/op
ConfigServerBenchmark.fatJar138 avgt 10 6.267 ± 0.274 s/op
ConfigServerBenchmark.fatJar142 avgt 10 6.426 ± 0.398 s/op
ConfigServerBenchmark.explodedJarMain avgt 10 5.880 ± 0.326 s/op
PetclinicBenchmark.devtoolsRestart avgt 10 2.053 ± 0.226 s/op
PetclinicBenchmark.fatJar avgt 10 11.241 ± 0.365 s/op
PetclinicBenchmark.noverify avgt 10 10.726 ± 0.526 s/op
SpringBoot138Benchmark.explodedJarLauncher avgt 10 5.526 ± 0.341 s/op
SpringBoot138Benchmark.explodedJarMain avgt 10 4.190 ± 0.201 s/op
SpringBoot138Benchmark.fatJar avgt 10 5.325 ± 0.206 s/op
SpringBoot142Benchmark.explodedJarLauncher avgt 10 5.958 ± 0.190 s/op
SpringBoot142Benchmark.explodedJarMain avgt 10 4.589 ± 0.258 s/op
SpringBoot142Benchmark.fatJar avgt 10 5.638 ± 0.269 s/op
SpringBootThinBenchmark.basic138Thin avgt 10 5.082 ± 0.307 s/op
SpringBootThinBenchmark.basic142Thin avgt 10 5.731 ± 0.211 s/op
Benchmark Mode Cnt Score Error Units
ConfigServerBenchmark.devtoolsRestart avgt 10 1.760 ± 0.236 s/op
ConfigServerBenchmark.explodedJarMain avgt 10 8.597 ± 0.218 s/op
ConfigServerBenchmark.fatJar138 avgt 10 9.114 ± 0.387 s/op
ConfigServerBenchmark.fatJar142 avgt 10 9.092 ± 0.358 s/op
PetclinicBenchmark.devtoolsRestart avgt 10 2.709 ± 0.180 s/op
PetclinicBenchmark.fatJar avgt 10 15.277 ± 0.414 s/op
PetclinicBenchmark.noverify avgt 10 14.011 ± 0.340 s/op
SpringBoot138Benchmark.explodedJarLauncher avgt 10 7.327 ± 0.223 s/op
SpringBoot138Benchmark.explodedJarMain avgt 10 5.684 ± 0.099 s/op
SpringBoot138Benchmark.fatJar avgt 10 7.017 ± 0.218 s/op
SpringBoot142Benchmark.explodedJarLauncher avgt 10 7.825 ± 0.193 s/op
SpringBoot142Benchmark.explodedJarMain avgt 10 6.150 ± 0.160 s/op
SpringBoot142Benchmark.fatJar avgt 10 7.372 ± 0.204 s/op
SpringBootThinBenchmark.basic138Thin avgt 10 7.055 ± 0.175 s/op
SpringBootThinBenchmark.basic142Thin avgt 10 7.553 ± 0.255 s/op
Benchmark Mode Cnt Score Error Units
ConfigServerBenchmark.devtoolsRestart avgt 10 1.071 ± 0.170 s/op
ConfigServerBenchmark.explodedJarMain avgt 10 4.504 ± 0.250 s/op
ConfigServerBenchmark.fatJar138 avgt 10 4.706 ± 0.110 s/op
ConfigServerBenchmark.fatJar142 avgt 10 4.730 ± 0.159 s/op
PetclinicBenchmark.devtoolsRestart avgt 10 1.555 ± 0.198 s/op
PetclinicBenchmark.fatJar avgt 10 8.181 ± 0.396 s/op
PetclinicBenchmark.noverify avgt 10 7.393 ± 0.304 s/op
SpringBoot138Benchmark.explodedJarLauncher avgt 10 3.857 ± 0.141 s/op
SpringBoot138Benchmark.explodedJarMain avgt 10 3.113 ± 0.145 s/op
SpringBoot138Benchmark.fatJar avgt 10 3.729 ± 0.088 s/op
SpringBoot142Benchmark.explodedJarLauncher avgt 10 4.074 ± 0.170 s/op
SpringBoot142Benchmark.explodedJarMain avgt 10 3.468 ± 0.159 s/op
SpringBoot142Benchmark.fatJar avgt 10 3.916 ± 0.207 s/op
SpringBootThinBenchmark.basic138Thin avgt 10 3.935 ± 0.176 s/op
SpringBootThinBenchmark.basic142Thin avgt 10 4.250 ± 0.208 s/op
Benchmark Mode Cnt Score Error Units
ConfigServerBenchmark.devtoolsRestart avgt 10 1.638 ± 0.255 s/op
ConfigServerBenchmark.explodedJarMain avgt 10 7.222 ± 0.153 s/op
ConfigServerBenchmark.fatJar138 avgt 10 7.641 ± 0.212 s/op
ConfigServerBenchmark.fatJar142 avgt 10 7.616 ± 0.182 s/op
PetclinicBenchmark.devtoolsRestart avgt 10 2.502 ± 0.314 s/op
PetclinicBenchmark.explodedJarMain avgt 10 11.447 ± 0.209 s/op
PetclinicBenchmark.fatJar avgt 10 12.835 ± 0.341 s/op
PetclinicBenchmark.noverify avgt 10 11.833 ± 0.438 s/op
SpringBoot138Benchmark.explodedJarLauncher avgt 10 6.195 ± 0.193 s/op
SpringBoot138Benchmark.explodedJarMain avgt 10 4.939 ± 0.217 s/op
SpringBoot138Benchmark.fatJar avgt 10 5.893 ± 0.190 s/op
SpringBoot142Benchmark.explodedJarLauncher avgt 10 6.471 ± 0.126 s/op
SpringBoot142Benchmark.explodedJarMain avgt 10 5.360 ± 0.209 s/op
SpringBoot142Benchmark.fatJar avgt 10 6.302 ± 0.199 s/op
SpringBootThinBenchmark.basic138Thin avgt 10 6.013 ± 0.159 s/op
SpringBootThinBenchmark.basic142Thin avgt 10 6.437 ± 0.301 s/op
Benchmark Mode Cnt Score Error Units
ConfigServerBenchmark.devtoolsRestart avgt 10 1.874 ± 0.344 s/op
ConfigServerBenchmark.explodedJarMain avgt 10 7.967 ± 0.270 s/op
ConfigServerBenchmark.fatJar138 avgt 10 8.375 ± 0.229 s/op
ConfigServerBenchmark.fatJar142 avgt 10 8.541 ± 0.317 s/op
PetclinicBenchmark.devtoolsRestart avgt 10 2.707 ± 0.333 s/op
PetclinicBenchmark.explodedJarMain avgt 10 12.403 ± 0.385 s/op
PetclinicBenchmark.fatJar avgt 10 13.956 ± 0.439 s/op
PetclinicBenchmark.noverify avgt 10 13.023 ± 0.382 s/op
SpringBoot138Benchmark.explodedJarLauncher avgt 10 6.802 ± 0.272 s/op
SpringBoot138Benchmark.explodedJarMain avgt 10 5.370 ± 0.341 s/op
SpringBoot138Benchmark.fatJar avgt 10 6.609 ± 0.328 s/op
SpringBoot142Benchmark.explodedJarLauncher avgt 10 7.157 ± 0.233 s/op
SpringBoot142Benchmark.explodedJarMain avgt 10 5.946 ± 0.133 s/op
SpringBoot142Benchmark.fatJar avgt 10 6.855 ± 0.173 s/op
SpringBootThinBenchmark.basic138Thin avgt 10 6.527 ± 0.262 s/op
SpringBootThinBenchmark.basic142Thin avgt 10 7.175 ± 0.286 s/op
MacBook Pro (Retina, 15-inch, Late 2013) running OS X Yosemite (10.10.5) and Java 1.8.0_102
Benchmark Mode Cnt Score Error Units
ConfigServerBenchmark.devtoolsRestart avgt 10 0.999 ± 0.078 s/op
ConfigServerBenchmark.explodedJarMain avgt 10 3.747 ± 0.177 s/op
ConfigServerBenchmark.fatJar138 avgt 10 3.810 ± 0.158 s/op
ConfigServerBenchmark.fatJar142 avgt 10 3.846 ± 0.049 s/op
PetclinicBenchmark.devtoolsRestart avgt 10 1.431 ± 0.171 s/op
PetclinicBenchmark.fatJar avgt 10 6.995 ± 0.299 s/op
PetclinicBenchmark.noverify avgt 10 6.292 ± 0.308 s/op
SpringBoot138Benchmark.explodedJarLauncher avgt 10 3.256 ± 0.104 s/op
SpringBoot138Benchmark.explodedJarMain avgt 10 2.518 ± 0.074 s/op
SpringBoot138Benchmark.fatJar avgt 10 3.012 ± 0.046 s/op
SpringBoot142Benchmark.explodedJarLauncher avgt 10 3.374 ± 0.082 s/op
SpringBoot142Benchmark.explodedJarMain avgt 10 2.738 ± 0.040 s/op
SpringBoot142Benchmark.fatJar avgt 10 3.247 ± 0.109 s/op
SpringBootThinBenchmark.basic138Thin avgt 10 3.035 ± 0.083 s/op
SpringBootThinBenchmark.basic142Thin avgt 10 3.271 ± 0.093 s/op
There’s a tool in the JDK called jcmd
that prints out a load of useful stuff
about a running process. E.g:
$ jcmd <PID> PerfCounter.print
where <PID>
is the process id. The petclinic (on "tower") has these
times in ns (highlights):
...
sun.classloader.findClassTime=2051631382
sun.classloader.parentDelegationTime=220795236
sun.cls.appClassLoadTime=2315218900
sun.cls.classInitTime=1058084598
sun.cls.classLinkedTime=1142874583
sun.cls.classVerifyTime=1037315846
sun.cls.defineAppClassTime=946831715
sun.cls.lookupSysClassTime=28253463
sun.cls.parseClassTime=1055187254
sun.cls.sharedClassLoadTime=581133
sun.cls.sysClassLoadTime=223416957
sun.urlClassLoader.readClassBytesTime=299665148
...
So you can immediately save up to a second by using -noverify
(probably not advisable in production, but the jury is out on
that). The benchmark results confirm this the saving is actually only
700ms, but that’s a start.
Other things that stick out are
-
findClassTime
- time spent working out where the damn things are, loading the bytes, parsing the bytes and defining them to build Class instances (2052ms). -
readClassBytesTime
- time from when you have a reference to where the bytes are to the time you have read them in (300ms).
Also interesting (not shown above) is java.cls.loadedClasses
, at
10660. Perhaps this is over optimistic but maybe some chunk of them is
being touched by accident that aren’t necessary and they can be
skipped, or deferred.