In this module we study the effect of Spring and Spring Boot on startup time by first stripping out as much of it as we can, and then piling on more features, adding more dependencies and letting Spring Boot figure out the desired configuration. In the table below, the "demo" sample is the canonical, empty Spring Boot web app. All the smaller and faster apps are stripped down versions of that where we remove various things and finally get rid of even using reflection in the BeanFactory
(using the new "functional bean registration").
Results:
-
With a couple of outliers we discuss below, the startup time is directly proportional to the number of classes loaded. This usually correlates with the number of beans in the application context, but things like Hibernate and Zuul add more classes than beans.
-
Reflection is not a bottleneck - more work would have to be done with more complex scenarios, but the sample that uses functional bean registration for everything fits the curve of the more heavyweight applications, so it isn’t intrinsically much faster, it just has fewer beans (and therefore fewer features).
-
Condition processing in Spring Boot is not expensive - when we remove conditions partially ("slim", "thin") and completely ("lite", "func") the startup time is right on the curve. The same is true of component scanning, except possibly in the very largest of applications.
-
There is some evidence that most of the cost is associated with JVM overheads, loading and parsing classes (this is supported by other data, for instance on devtools restarts, which are much quicker than cold starts).
Since more beans mean more features, you are paying at startup for actual functionality, so in some ways it should be an acceptable cost. On the other hand, there might be features that you end up never using, or don’t use until later and you would be willing to defer the cost. Spring doesn’t allow you to do that easily. It has some features that might defer the cost of creating beans, but that might not even help if the bean definitions still have to be created and most of the cost is actually to do with loading and parsing classes.
Benchmark (sample) Mode Cnt Score Error Units Beans Classes
MainBenchmark zuul avgt 10 4.510 ± 0.095 s/op 516 9656
MainBenchmark erkb avgt 10 3.137 ± 0.049 s/op 494 7654
MainBenchmark busr avgt 10 2.525 ± 0.038 s/op 392 7129
MainBenchmark erko avgt 10 2.237 ± 0.071 s/op 313 6392
MainBenchmark jpaa avgt 10 3.232 ± 0.057 s/op 257 8297
MainBenchmark jpag avgt 10 2.897 ± 0.023 s/op 178 8294
MainBenchmark jpaf avgt 10 2.865 ± 0.043 s/op 177 8291
MainBenchmark jpae avgt 10 2.829 ± 0.038 s/op 176 8284
MainBenchmark jpad avgt 10 2.739 ± 0.073 s/op 175 7946
MainBenchmark conf avgt 10 1.636 ± 0.029 s/op 250 6232
MainBenchmark actj avgt 10 1.540 ± 0.049 s/op 226 6033
MainBenchmark actr avgt 10 1.316 ± 0.060 s/op 186 5666
MainBenchmark jdbc avgt 10 1.237 ± 0.050 s/op 147 5625
MainBenchmark demo avgt 10 1.056 ± 0.040 s/op 111 5266
MainBenchmark slim avgt 10 1.003 ± 0.011 s/op 105 5208
MainBenchmark thin avgt 10 0.855 ± 0.028 s/op 60 4892
MainBenchmark lite avgt 10 0.694 ± 0.015 s/op 30 4580
MainBenchmark func avgt 10 0.652 ± 0.017 s/op 25 4378
Legend:
-
Zuul: same as "busr" sample but with Zuul proxy
-
Jpad: same as "demo" sample but with 1 JPA entity (and 0 repositories)
-
Jpae: same as "demo" sample but with 1 JPA entity (and 1 repository)
-
Jpaf: same as "demo" sample but with 2 JPA entities (and 2 repositories)
-
Jpag: same as "demo" sample but with 3 JPA entities (and 3 repositories)
-
Jpaa: same as "actr" sample but with 3 JPA entities (and 3 repositories)
-
Erkb: same as "busr" sample but with Eureka client
-
Busr: same as "conf" but adds Spring Cloud Bus and Rabbit
-
Erko: same as "actr" sample but with Eureka client (disabled)
-
Conf: same as "actr" sample plus config client
-
Actj: same as "actr" sample plus JDBC
-
Actr: same as "demo" sample plus Actuator
-
Jdbc: same as "demo" sample plus JDBC
-
Demo: vanilla Spring Boot MVC app with one endpoint (no Actuator)
-
Slim: same thing but explicitly
@Imports
all configuration -
Thin: reduce the
@Imports
down to a set of 4 that are needed for the endpoint -
Lite: copy the imports from "thin" and make them into hard-coded, unconditional configuration
-
Func: extract the configuration methods from "lite" and register bits of it using the function bean API
N.B. The "thin" sample has @EnableWebMvc
(implicitly), but "lite"
and "func" pulled the relevant features of that out into a separate
class (so a few beans were dropped).
Only 2 samples didn’t fit the trend for classes vs. startup. One starts up slower (Sleuth) and one faster (Erka). The JPA and Zuul samples are outliers for the beans vs. startup correlation, so we include those here again.
Benchmark (sample) Mode Cnt Score Error Units Beans Trend Delta Classes
MainBenchmark slth avgt 10 5.110 ± 0.065 s/op 453 2762 2403 7674
MainBenchmark erka avgt 10 2.183 ± 0.076 s/op 287 2760 -577 7893
MainBenchmark jpad avgt 10 2.739 ± 0.073 s/op 175 1385 1354 7946
MainBenchmark jpae avgt 10 2.829 ± 0.038 s/op 176 1390 1439 8284
MainBenchmark jpaf avgt 10 2.865 ± 0.043 s/op 177 1396 1469 8291
MainBenchmark jpag avgt 10 2.897 ± 0.023 s/op 178 1401 1496 8294
MainBenchmark jpaa avgt 10 3.232 ± 0.057 s/op 257 1814 1418 8297
MainBenchmark zuul avgt 10 4.510 ± 0.095 s/op 516 3080 1430 9596
Legend:
-
Slth: same as "busr" sample but with Sleuth
-
Erka: same as "actr" sample but with Eureka client
The "Trend" number is the best fit prediction of the startup time from the number of classes (or beans for the JPA samples), taken from the non-outlier data. "Delta" is the difference between the actual startup time and the trend value (so it is the extra cost of the features being added).
The "erka" sample started up faster then predicted, but it also has a suspiciously large number of loaded classes (even more classes than with Eureka and Bus). The loaded classes measurements are not stable - you get different answers from run to run - but they don’t usually fluctuate by enough to explain the difference here.
Here’s an explanation for the "slth" result. Spring processes @annotation
matchers in @Pointcuts
extremely inefficiently, so the startup time scales with the number of pointcuts with @annotations
, not so much the number of beans. If the pointcuts are driving it (as suggested by results in these aspectj benchmarks), then the 4 pointcuts with @annotation
matchers would be costing 2403ms or around 600ms each, which is horrendous but consistent with the aspectj benchmarks.
With AspectJ 1.8.13
Benchmark (sample) Mode Cnt Score Error Units Beans Classes
MainBenchmark.main slth ss 10 4.002 ± 0.113 s/op 450 8358
(Makes a huge difference, but still slower than the trend.)
Hibernate fixed startup cost is about 1300ms (the "delta" on "jpad"), which more or less doubles the startup time for a JPA app compared to the vanilla "demo". Spring Data JPA repository creation seems to have a fixed cost of about 90ms, which isn’t nothing but isn’t very large in comparison. Adding repositories and entities might cost something, but it isn’t a lot - the best estimate would be about 30ms per entity from these data (these were very basic, vanilla JpaRepositories
, so maybe it would be more for more complex requirements). The JPA samples (and even Zuul) are a pretty good fit for number of classes loaded versus startup time, so Hibernate isn’t doing a lot of intensive stuff beyond forcing a lot of classes to be loaded.
We can’t easily exclude Jackson from all the sample, but anything that doesn’t use the Actuator can be run with and without to see the difference. Here’s the vanilla "demo" sample
Benchmark (sample) Mode Cnt Score Error Units
SnapBenchmark.snap demo ss 10 1.150 ± 0.076 s/op
and with exclusions.spring-boot-jackson=org.springframework.boot:spring-boot-starter-json
in thin.properties
:
Benchmark (sample) Mode Cnt Score Error Units
SnapBenchmark.snap demo ss 10 1.069 ± 0.036 s/op
So that’s probably worth having.
TL;DR: You need Java 8.
$ ../mvnw clean install
$ java -jar target/benchmarks.jar
and the whole suite takes quite a long time to run, so to just try it out quickly, it’s best to cherry pick some specific samples, e.g.
$ java -jar target/benchmarks.jar main -p sample=actr
There are 4 groups of benchmarks:
-
MainBenchmark
- add features to the "main" demo by manipulating the classpath -
StripBenchmark
- "slim", "thin", "lite", "func" - stripping away from the "main" demo by hardcoding config -
OldBenchmark
- same asMainBenchmarks
but with Spring Boot 1.5.6. -
SnapBenchmark
- same asMainBenchmarks
but with Spring Boot 2.0.0 snapshots (and a restricted set of samples, "empt", "demo", "actr", "jdbc").
The JMH benchmarks are mostly just named after the class (so
StripBenchmarks
are all called "strip") but they have a @Param
called "sample" whose value is the name of the sample. They can be run
individually or as a group using a comma-separated list of sample
names, e.g:
$ java -jar target/benchmarks.jar strip -p sample=func,slim
or altogether as
$ java -jar target/benchmarks.jar strip
Also to get decent results from the erk*
samples you need Eureka running locally on port 8761. You can do that with the Spring Boot CLI (for example):
$ spring install org.springframework.cloud:spring-cloud-cli:1.3.4.RELEASE
$ spring cloud eureka
J9 is the IBM JVM, which they open sourced and is now available also as Eclipse J9. The benchmarks are tuned to use different command line optimizations depending on the JVM in use. Here’s a comparison between the regular OpenJDK Hotspot and the OpenJDK Eclipse J9 (still JDK 1.8) build:
Benchmark (sample) Mode Cnt Score Error Units JVM
MainBenchmark.main demo ss 10 1.171 ± 0.044 s/op 8u152-zulu
MainBenchmark.main demo ss 10 1.015 ± 0.116 s/op 8u152-openj9
Eclipse J9 is about 10% faster than HotSpot, probably owing to the ability to cache class data between runs (which is switched on by default in the benchmarks but not in general).
Java 10 is quite a bit slower than Java 8, but you can get back most or all of the difference by switching on Class Data Sharing (CDS):
Benchmark (sample) Mode Cnt Score Error Units JVM
OldBenchmark.old demo ss 10 1.070 ± 0.031 s/op 8u152-zulu
OldBenchmark.old actr ss 10 1.370 ± 0.042 s/op 8u152-zulu
MainBenchmark.main demo ss 10 1.128 ± 0.044 s/op 8u152-zulu
MainBenchmark.main actr ss 10 1.554 ± 0.068 s/op 8u152-zulu
OldBenchmark.old demo ss 10 1.155 ± 0.035 s/op jdk-10
OldBenchmark.old actr ss 10 1.432 ± 0.043 s/op jdk-10
MainBenchmark.main demo ss 10 1.195 ± 0.061 s/op jdk-10
MainBenchmark.main actr ss 10 1.605 ± 0.060 s/op jdk-10
CdsBenchmark.main demo ss 10 0.912 ± 0.051 s/op jdk-10
CdsBenchmark.main actr ss 10 1.286 ± 0.044 s/op jdk-10
CdsBenchmark.old demo ss 10 0.875 ± 0.050 s/op jdk-10
CdsBenchmark.old actr ss 10 1.134 ± 0.032 s/op jdk-10
Spring Boot 1.5 still wins all the races though (the "old" benchmarks above).
Benchmark (sample) Mode Cnt Score Error Units JVM
MainBenchmark.main demo ss 10 1.171 ± 0.044 s/op 8u152-zulu
MainBenchmark.main demo ss 10 1.015 ± 0.116 s/op 8u152-openj9
MainBenchmark.main demo ss 10 1.253 ± 0.076 s/op OpenJDK10
MainBenchmark.main demo ss 10 1.280 ± 0.066 s/op 9.0.4-zulu
There’s a bean factory post processor in Spring Boot issue 9685 that makes all beans lazy by default. It’s quite interesting to see what happens if we add that to our sample applications:
Benchmark (sample) Mode Cnt Score Error Units Classes
MainBenchmark.main empt ss 10 0.477 ± 0.018 s/op 3233
MainBenchmark.main jlog ss 10 0.811 ± 0.016 s/op 4374
MainBenchmark.main demo ss 10 0.913 ± 0.035 s/op 5463
MainBenchmark.main flux ss 10 0.885 ± 0.030 s/op 5325
MainBenchmark.main actr ss 10 1.241 ± 0.030 s/op 6225
MainBenchmark.main jdbc ss 10 1.001 ± 0.033 s/op 5618
MainBenchmark.main actj ss 10 1.388 ± 0.062 s/op 6432
MainBenchmark.main jpae ss 10 1.994 ± 0.055 s/op 8824
MainBenchmark.main conf ss 10 1.599 ± 0.118 s/op 6711
MainBenchmark.main erka ss 10 1.819 ± 0.045 s/op 6804
MainBenchmark.main busr ss 10 2.431 ± 0.068 s/op 7721
MainBenchmark.main zuul ss 10 3.029 ± 0.086 s/op 8348
MainBenchmark.main erkb ss 10 2.886 ± 0.107 s/op 8083
MainBenchmark.main slth ss 10 3.127 ± 0.041 s/op 8314
c.f. the non-lazy results for the same samples:
Benchmark (sample) Mode Cnt Score Error Units Classes Lazy Premium
MainBenchmark.main empt ss 10 0.578 ± 0.028 s/op 3703 17.47%
MainBenchmark.main jlog ss 10 0.953 ± 0.012 s/op 4647 14.90%
MainBenchmark.main demo ss 10 1.105 ± 0.049 s/op 5808 17.38%
MainBenchmark.main flux ss 10 0.988 ± 0.038 s/op 5726 10.43%
MainBenchmark.main actr ss 10 1.542 ± 0.043 s/op 6692 19.52%
MainBenchmark.main jdbc ss 10 1.281 ± 0.045 s/op 6068 21.86%
MainBenchmark.main actj ss 10 1.819 ± 0.191 s/op 6953 23.69%
MainBenchmark.main jpae ss 10 2.003 ± 0.053 s/op 8824 0.45%
MainBenchmark.main conf ss 10 1.948 ± 0.097 s/op 7216 17.92%
MainBenchmark.main erka ss 10 2.703 ± 0.106 s/op 8909 32.70%
MainBenchmark.main busr ss 10 3.111 ± 0.157 s/op 8282 21.86%
MainBenchmark.main zuul ss 10 3.834 ± 0.086 s/op 9325 21.00%
MainBenchmark.main erkb ss 10 4.026 ± 0.119 s/op 10176 28.32%
MainBenchmark.main slth ss 10 4.066 ± 0.073 s/op 8901 23.09%
and the same thng for the Petclinic:
Benchmark (sample) Mode Cnt Score Error Units Classes Lazy Premium
PetclinicLatestBenchmark.noverify (lazy) avgt 10 3.495 ± 0.059 s/op 9687 25.80%
PetclinicLatestBenchmark.explodedJarMain (lazy) avgt 10 3.023 ± 0.092 s/op 10644 27.40%
PetclinicLatestBenchmark.noverify none avgt 10 4.710 ± 0.053 s/op 11099
PetclinicLatestBenchmark.explodedJarMain none avgt 10 4.164 ± 0.068 s/op 12132
Spring Boot 2.0.0 snapshots (before RC2):
Benchmark (sample) Mode Cnt Score Error Units Beans Classes
StripBenchmark.strip slim ss 10 1.102 ± 0.041 s/op 107 5754
StripBenchmark.strip thin ss 10 0.941 ± 0.034 s/op 62 5444
StripBenchmark.strip lite ss 10 0.767 ± 0.021 s/op 30 5094
StripBenchmark.strip func ss 10 0.718 ± 0.010 s/op 26 5030
Even in the "lite" and "func" samples, where all the beans are hard coded (no scanning, no autoconfig, no condition evaluation), Boot 2.0 loads way more classes.
(Boot 1.5.4 without -noverify
)
sample | configs | beans | startup(millis) |
---|---|---|---|
slth |
176 |
460 |
5366 |
zuul |
181 |
495 |
4336 |
busr |
151 |
389 |
2758 |
erka |
127 |
310 |
2423 |
conf |
100 |
245 |
1779 |
actr |
72 |
183 |
1430 |
demo |
32 |
108 |
1154 |
slim |
31 |
103 |
1112 |
thin |
14 |
60 |
968 |
lite |
4 |
30 |
813 |
func |
1 |
25 |
742 |
(Boot 1.5.6, 2.0.0.M3 and 2.0.0.BUILD-SNAPSHOT)
Benchmark (sample) Mode Cnt Score Error Units Beans Classes
OldBenchmark.old empt avgt 10 0.738 ± 0.031 s/op 23 3031
OldBenchmark.old demo avgt 10 1.623 ± 0.069 s/op 109 4965
OldBenchmark.old actr avgt 10 2.098 ± 0.093 s/op 187 5384
OldBenchmark.old jdbc avgt 10 1.920 ± 0.083 s/op 140 5280
OldBenchmark.old actj avgt 10 2.417 ± 0.123 s/op 222 5715
OldBenchmark.old jpae avgt 10 2.536 ± 0.124 s/op 165 6841
OldBenchmark.old conf avgt 10 2.639 ± 0.146 s/op 251 5906
OldBenchmark.old erka avgt 10 2.960 ± 0.101 s/op 294 6077
OldBenchmark.old busr avgt 10 3.555 ± 0.125 s/op 370 6443
OldBenchmark.old zuul avgt 10 4.736 ± 0.507 s/op 433 6922
OldBenchmark.old erkb avgt 10 4.519 ± 0.365 s/op 434 6889
OldBenchmark.old slth avgt 10 7.331 ± 0.186 s/op 444 7058
MainBenchmark.main empt avgt 10 0.848 ± 0.059 s/op 22 3271
MainBenchmark.main demo avgt 10 1.773 ± 0.074 s/op 112 5360
MainBenchmark.main actr avgt 10 2.204 ± 0.121 s/op 187 5756
MainBenchmark.main jdbc avgt 10 2.081 ± 0.082 s/op 147 5625
MainBenchmark.main actj avgt 10 2.508 ± 0.091 s/op 226 6033
MainBenchmark.main jpae avgt 10 2.807 ± 0.100 s/op 176 8284
MainBenchmark.main conf avgt 10 2.781 ± 0.159 s/op 350 6232
MainBenchmark.main erka avgt 10 3.311 ± 0.407 s/op 294 6491
MainBenchmark.main busr avgt 10 3.777 ± 0.102 s/op 392 7129
MainBenchmark.main zuul avgt 10 4.758 ± 0.113 s/op 516 9656
MainBenchmark.main erkb avgt 10 4.773 ± 0.105 s/op 494 7654
MainBenchmark.main slth avgt 10 7.926 ± 0.197 s/op 453 7674
StripBenchmark.strip func avgt 10 1.112 ± 0.032 s/op 25 4378
StripBenchmark.strip lite avgt 10 1.205 ± 0.076 s/op 30 4580
StripBenchmark.strip slim avgt 10 1.743 ± 0.099 s/op 105 5208
StripBenchmark.strip thin avgt 10 1.501 ± 0.071 s/op 60 4892
SnapBenchmark.endp N/A avgt 10 2.515 ± 0.509 s/op 199 5838
SnapBenchmark.snap empt avgt 10 0.969 ± 0.123 s/op 22 3269
SnapBenchmark.snap demo avgt 10 1.880 ± 0.205 s/op 112 5356
SnapBenchmark.snap actr avgt 10 2.296 ± 0.101 s/op 198 5833
SnapBenchmark.snap jdbc avgt 10 2.136 ± 0.117 s/op 148 5716