- Coding challenge 5 - Load balancer
- Technologies and Libraries
- Install and run
- Steps by challenge
- Final remark
from https://codingchallenges.fyi/challenges/challenge-load-balancer/ by https://twitter.com/johncrickett
I have been working in C# for a few years, but I wanted to try out Java for this challenge.
Package manager: Gradle
I am using the following libraries as support:
- command line tool:
- info.picocli:picocli
- https://picocli.info/
- webserver:
- org.eclipse.jetty:jetty-server
- https://www.eclipse.org/jetty/
- http client:
- com.squareup.okhttp3
- https://square.github.io/okhttp/
Shell scripts start the backends and the load balancer.
I have also implemented a chaos engineering script to kill some backends and start them again.
I use Gradle 8.1, which works with Java 20 (OpenJDK).
- check out the compatibility matrix: https://docs.gradle.org/current/userguide/compatibility.html
You need to start in the no-5 folder (not in the parent folder):
./gradlew clean installDist
check
Open three terminals and start one backend in each:
./be.sh 9000
INFO: Start Backend on port 9000
[main] INFO org.eclipse.jetty.server.Server - jetty-11.0.15; built: 2023-04-11T18:37:53.775Z; git: 5bc5e562c8d05c5862505aebe5cf83a61bdbcb96; jvm 20.0.2+9-78
...
./be.sh 9001
INFO: Start Backend on port 9001
[main] INFO org.eclipse.jetty.server.Server - jetty-11.0.15; built: 2023-04-11T18:37:53.775Z; git: 5bc5e562c8d05c5862505aebe5cf83a61bdbcb96; jvm 20.0.2+9-78
...
./be.sh 9002
INFO: Start Backend on port 9002
[main] INFO org.eclipse.jetty.server.Server - jetty-11.0.15; built: 2023-04-11T18:37:53.775Z; git: 5bc5e562c8d05c5862505aebe5cf83a61bdbcb96; jvm 20.0.2+9-78
...
Open another terminal and start the load balancer.
Note: lb.sh automatically connects to the three backends on ports 9000, 9001, and 9002 started above.
./lb.sh
INFO: Start Loadbalancer on port 8080
[main] INFO org.eclipse.jetty.server.Server - jetty-11.0.15; built: 2023-04-11T18:37:53.775Z; git: 5bc5e562c8d05c5862505aebe5cf83a61bdbcb96; jvm 20.0.2+9-78
[main] INFO org.eclipse.jetty.server.session.DefaultSessionIdManager - Session workerName=node0
[main] INFO org.eclipse.jetty.server.handler.ContextHandler - Started o.e.j.s.ServletContextHandler@4f638935{/,null,AVAILABLE}
[main] INFO org.eclipse.jetty.server.AbstractConnector - Started ServerConnector@cd2dae5{HTTP/1.1, (http/1.1)}{0.0.0.0:8080}
[main] INFO org.eclipse.jetty.server.Server - Started Server@40f08448{STARTING}[11.0.15,sto=0] @625ms
Nov 26, 2023 9:52:36 AM lb.strategies.LoadBalancerStrategy healthy
INFO: + Healthy backend: http://localhost:9000
Nov 26, 2023 9:52:36 AM lb.strategies.LoadBalancerStrategy healthy
INFO: + Healthy backend: http://localhost:9001
Nov 26, 2023 9:52:36 AM lb.strategies.LoadBalancerStrategy healthy
INFO: + Healthy backend: http://localhost:9002
Open another terminal and start test.sh
./test.sh
from http://localhost:9002 // Replied with a hello message
from http://localhost:9001 // Replied with a hello message
from http://localhost:9000 // Replied with a hello message
from http://localhost:9002 // Replied with a hello message
from http://localhost:9001 // Replied with a hello message
This will start 20 backend servers and the load balancer, each writing to its own log file under ./log:
./be-servers.sh
all killed
start all backends
start load balancer
It starts everything in the background.
Now run test.sh:
./test.sh
from http://localhost:9000 // Replied with a hello message
from http://localhost:9002 // Replied with a hello message
from http://localhost:9006 // Replied with a hello message
from http://localhost:9009 // Replied with a hello message
from http://localhost:9001 // Replied with a hello message
Start the backends and test.sh as above:
./be-servers.sh
all killed
start all backends
start load balancer
./test.sh
from http://localhost:9000 // Replied with a hello message
Then use chaos.sh to kill some processes and watch test.sh for the reduced list, e.g. kill 19 to see that only one backend serves for about 10s before all 20 come back.
./chaos.sh 19
kill now 63168
kill now 63177
kill now 63164
kill now 63179
kill now 63174
kill now 63173
kill now 63166
kill now 63170
kill now 62545
kill now 63176
kill now 63171
kill now 63172
kill now 63175
kill now 63182
kill now 63180
kill now 63181
kill now 63167
kill now 63165
kill now 63169
wait 10s to see test.sh spread across less servers
start the missing ones again
start all backends
In the test.sh output you can see that e.g. http://localhost:9000 was still serving at the end; later all of them were back up:
.....
from http://localhost:9000 // Replied with a hello message
from http://localhost:9000 // Replied with a hello message
from http://localhost:9000 // Replied with a hello message
from http://localhost:9000 // Replied with a hello message
from http://localhost:9008 // Replied with a hello message
from http://localhost:9009 // Replied with a hello message
from http://localhost:9005 // Replied with a hello message
To stop everything:
./be-servers.sh -kill
You can also inspect the load balancer log file to verify the logic:
Nov 26, 2023 10:40:53 AM lb.strategies.LoadBalancerStrategy unhealthy
I am using Visual Studio Code with some plugins for Java, Gradle, linting, and testing. I am using GitHub as the repository.
I have chosen Jetty because it is simple, small, and makes it easy to create one or two APIs, which is enough for this challenge. With picocli it is very simple to implement a small command line tool. I decided to use the same main class for all backends and the load balancer.
Starting a backend on port 9000:
app -b -p=9000
Starting a load balancer on port 8080:
app -p=8080
Later I refactored it to accept an additional parameter with a backend list.
Starting a load balancer with a backend list:
app -p=8080 -blist="http://localhost:9000,http://localhost:9001"
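For illustration, a picocli entry point for these commands could look roughly like the sketch below. The option names (-b, -p, -blist) are the ones used above; the class and field names are my assumptions, not the project's actual code.

```java
import picocli.CommandLine;
import picocli.CommandLine.Command;
import picocli.CommandLine.Option;

// rough sketch, not the actual source
@Command(name = "app", mixinStandardHelpOptions = true)
public class App implements Runnable {

    @Option(names = "-b", description = "run as a backend instead of the load balancer")
    boolean backend;

    @Option(names = "-p", description = "port to listen on", required = true)
    int port;

    @Option(names = "-blist", description = "comma separated list of backend URLs", split = ",")
    String[] backendList;

    @Override
    public void run() {
        // delegate to SimpleBackend or SimpleLoadBalancer depending on -b
    }

    public static void main(String[] args) {
        System.exit(new CommandLine(new App()).execute(args));
    }
}
```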
The design uses two main classes, SimpleBackend and SimpleLoadBalancer, instantiated via delegation from the main class:
protected SimpleBackend(int port)
protected SimpleLoadBalancer(int port)
Later I introduced an additional parameter for the load balancer strategy, defaulted to RoundRobin:
protected SimpleLoadBalancer(int port, LoadBalancerStrategy lbStrategy)
To serve many requests in parallel I added a Jetty server with a thread pool, here with 20 threads:
var server = new Server(new QueuedThreadPool(20));
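Putting the pieces together, the Jetty wiring could look roughly like this minimal sketch; HelloServlet and the helper method name are assumptions for illustration, and the actual setup inside SimpleBackend/SimpleLoadBalancer may differ.

```java
import org.eclipse.jetty.server.Server;
import org.eclipse.jetty.server.ServerConnector;
import org.eclipse.jetty.servlet.ServletContextHandler;
import org.eclipse.jetty.util.thread.QueuedThreadPool;

// minimal sketch, assuming a HelloServlet class that answers /hello
static Server startServer(int port) throws Exception {
    var server = new Server(new QueuedThreadPool(20)); // bounded pool: max 20 worker threads

    var connector = new ServerConnector(server);
    connector.setPort(port);                           // 9000.. for backends, 8080 for the LB
    server.addConnector(connector);

    var context = new ServletContextHandler();
    context.setContextPath("/");
    context.addServlet(HelloServlet.class, "/hello");  // a backend would also register /health
    server.setHandler(context);

    server.start();                                    // produces the "Started ServerConnector" lines above
    return server;
}
```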
For the load balancer implementation I am using the OkHttp3 library. I tried several others, but they interfered with the Jetty web server.
Here is the refactored method to execute a request against a backend server URL:
private Response executeRequest(String beUrl, HttpServletRequest request) throws IOException {
    var httpClient = new OkHttpClient();
    // build the backend request from the original request URI
    var reqBuilder = new Request.Builder()
            .url(beUrl + request.getRequestURI());
    // copy all incoming headers over to the backend request
    var reqHeaders = request.getHeaderNames();
    while (reqHeaders.hasMoreElements()) {
        var reqHeaderKey = reqHeaders.nextElement();
        var reqHeaderValue = request.getHeader(reqHeaderKey);
        reqBuilder.header(reqHeaderKey, reqHeaderValue);
    }
    var beRequest = reqBuilder.build();
    // execute synchronously; the caller copies the response back to the client and closes it
    return httpClient.newCall(beRequest).execute();
}
I also dynamically copy all headers from the incoming request and forward them to the backend. For simplicity only GET is implemented and no request body is forwarded; this could be added on the beRequest builder:
var beRequest = reqBuilder.build();
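For illustration only, a hedged sketch of how a request body could be forwarded right before build(); the guard and the use of OkHttp's RequestBody/MediaType here are my assumptions, not part of the challenge code, and are only valid for methods that allow a body (POST, PUT, ...).

```java
// sketch only: forward the incoming body to the backend for non-GET requests
var bodyBytes = request.getInputStream().readAllBytes();
if (bodyBytes.length > 0) {
    var contentType = request.getContentType();
    var mediaType = contentType != null ? MediaType.parse(contentType) : null;
    reqBuilder.method(request.getMethod(), RequestBody.create(mediaType, bodyBytes));
}
var beRequest = reqBuilder.build();
```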
As described in the challenge, there are many load balancer strategies. Here I implement only RoundRobin, but I use a superclass so that more strategies can be added later.
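A minimal sketch of what such a superclass could look like, assuming it keeps a thread-safe list of healthy backends and logs the add/remove events that show up in the logs; the real class in lb.strategies may differ in detail.

```java
import java.util.List;
import java.util.Optional;
import java.util.concurrent.CopyOnWriteArrayList;
import java.util.logging.Logger;

// sketch only, not the actual source of lb.strategies.LoadBalancerStrategy
public abstract class LoadBalancerStrategy {
    private static final Logger LOG = Logger.getLogger(LoadBalancerStrategy.class.getName());

    // thread-safe list: health checks and request handling run concurrently
    protected final List<String> backends = new CopyOnWriteArrayList<>();

    // next backend to use, or empty if none is healthy
    public abstract Optional<String> getNext();

    public void healthy(String url) {
        if (!backends.contains(url)) {
            backends.add(url); // re-joins at the end of the list
            LOG.info("+ Healthy backend: " + url);
        }
    }

    public void unhealthy(String url) {
        if (backends.remove(url)) {
            LOG.info("- Unhealthy backend: " + url);
        }
    }
}
```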
As described in Step 1, I refactored the main classes to accept multiple servers. To simulate a dynamic list of backends, I implemented some bash shell scripts to launch and kill servers.
Start a single backend server on a given port; the logs appear in the console:
./be.sh 9000
INFO: Start Backend on port 9000
[main] INFO org.eclipse.jetty.server.Server - jetty-11.0.15; built: 2023-04-11T18:37:53.775Z; git: 5bc5e562c8d05c5862505aebe5cf83a61bdbcb96; jvm 11.0.9+7-LTS
[main] INFO org.eclipse.jetty.server.session.DefaultSessionIdManager - Session workerName=node0
[main] INFO org.eclipse.jetty.server.handler.ContextHandler - Started o.e.j.s.ServletContextHandler@793f29ff{/,null,AVAILABLE}
[main] INFO org.eclipse.jetty.server.AbstractConnector - Started ServerConnector@4facf68f{HTTP/1.1, (http/1.1)}{0.0.0.0:9000}
[main] INFO org.eclipse.jetty.server.Server - Started Server@3c0be339{STARTING}[11.0.15,sto=0] @524ms
Start the load balancer, here with an example of two backends; the logs appear in the console:
./lb.sh
May 06, 2023 12:02:17 AM lb.SimpleLoadBalancer <init>
INFO: Start Loadbalancer on port 8080
[main] INFO org.eclipse.jetty.server.Server - jetty-11.0.15; built: 2023-04-11T18:37:53.775Z; git: 5bc5e562c8d05c5862505aebe5cf83a61bdbcb96; jvm 11.0.9+7-LTS
[main] INFO org.eclipse.jetty.server.session.DefaultSessionIdManager - Session workerName=node0
[main] INFO org.eclipse.jetty.server.handler.ContextHandler - Started o.e.j.s.ServletContextHandler@32502377{/,null,AVAILABLE}
[main] INFO org.eclipse.jetty.server.AbstractConnector - Started ServerConnector@50eac852{HTTP/1.1, (http/1.1)}{0.0.0.0:8080}
[main] INFO org.eclipse.jetty.server.Server - Started Server@548e6d58{STARTING}[11.0.15,sto=0] @541ms
I have implemented a dynamic script that starts the backends and the load balancer; it also kills existing processes first to allow a fast restart.
Here you can start everything; the logs are written to the ./logs/ folder for tailing:
./be-servers.sh
all killed
start all backends
start load balancer
You can adapt the number of backends in the script:
N=20
Additionally, you can also just kill everything or just start everything.
Just kill all:
./be-servers.sh -kill
all killed
Just start all backends, without the load balancer and without killing first:
./be-servers.sh -nokill
start all backends
A simple while loop fires requests in parallel at a high rate, with a sleep of 0.5s; this can be adapted in the script:
./test.sh
from http://localhost:9007 // Replied with a hello message
from http://localhost:9004 // Replied with a hello message
from http://localhost:9003 // Replied with a hello message
from http://localhost:9001 // Replied with a hello message
from http://localhost:9005 // Replied with a hello message
The load balancer returns the backend URL together with its message for testing. Of course a real load balancer would not do this, but it is good for debugging and showcasing.
Now it gets interesting and fun. I wanted to verify that I do not lose any request to the load balancer, even if many backend servers are killed suddenly.
Some additional remarks:
- a health check runs every 2s for all backends (see the sketch after this list)
- a simple retry mechanism kicks in if the backend chosen by RoundRobin was suddenly killed
- if the retry fails, the backend server is moved to unhealthy
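A minimal sketch of how the 2-second health check could be scheduled; isHealthy(...) is the method shown further below, while configuredBackends and the exact wiring inside SimpleLoadBalancer are assumptions.

```java
// sketch only: periodic health check for all configured backends
private void startHealthCheck(List<String> configuredBackends, LoadBalancerStrategy lbStrategy) {
    var scheduler = Executors.newSingleThreadScheduledExecutor(); // java.util.concurrent
    scheduler.scheduleAtFixedRate(() -> {
        for (var url : configuredBackends) {
            if (isHealthy(url)) {
                lbStrategy.healthy(url);   // (re-)add to the rotation
            } else {
                lbStrategy.unhealthy(url); // remove from the rotation
            }
        }
    }, 0, 2, TimeUnit.SECONDS);
}
```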
You can now start killing N backend servers:
./chaos.sh 15
kill now 33283
kill now 33285
kill now 33276
...
kill now 33272
start all backends
This will randomly kill 15 backends, sleep 2s in between, and then call be-servers.sh -nokill to start them again.
Now you should see whether you lose any of your client requests. I hope not.
I use very restricted logging to avoid flooding the logs; only the most important events are written out.
Every backend and the load balancer has its own log file:
-rw-r--r-- 1 Lolo staff 14780 May 6 00:18 be-9000.log
-rw-r--r-- 1 Lolo staff 36722 May 6 00:18 be-9001.log
-rw-r--r-- 1 Lolo staff 36601 May 6 00:18 be-9002.log
-rw-r--r-- 1 Lolo staff 36722 May 6 00:18 be-9003.log
...
-rw-r--r-- 1 Lolo staff 13929 May 6 00:18 be-9017.log
-rw-r--r-- 1 Lolo staff 13929 May 6 00:18 be-9018.log
-rw-r--r-- 1 Lolo staff 36601 May 6 00:18 be-9019.log
-rw-r--r-- 1 Lolo staff 23857 May 6 00:18 lb.log
The logged events are:
- a backend server being added based on a positive health check
- a backend server being removed based on a negative health check or after a failed retry
- a success or failure access log entry for every request to the load balancer
If you run the chaos.sh script:
./chaos.sh 5
kill now 33839
kill now 33847
kill now 33836
kill now 33850
kill now 33281
start all backends
you can see the following events
tail -f lb.log
May 06, 2023 12:18:19 AM lb.SimpleLoadBalancer log
INFO: http://localhost:9019 [0:0:0:0:0:0:0:1] GET /hello HTTP/1.1 User-Agent: Wget/1.20.3 (darwin20.1.0)
May 06, 2023 12:20:15 AM lb.SimpleLoadBalancer log
INFO: http://localhost:9011 [0:0:0:0:0:0:0:1] GET /hello HTTP/1.1 User-Agent: Wget/1.20.3 (darwin20.1.0)
May 06, 2023 12:20:15 AM lb.SimpleLoadBalancer log
INFO: http://localhost:9005 [0:0:0:0:0:0:0:1] GET /hello HTTP/1.1 User-Agent: Wget/1.20.3 (darwin20.1.0)
May 06, 2023 12:20:16 AM lb.SimpleLoadBalancer log
...
May 06, 2023 12:20:02 AM lb.strategies.LoadBalancerStrategy unhealthy
INFO: - Unhealthy backend: http://localhost:9007
May 06, 2023 12:20:02 AM lb.SimpleLoadBalancer log
INFO: http://localhost:9015 [0:0:0:0:0:0:0:1] GET /hello HTTP/1.1 User-Agent: Wget/1.20.3 (darwin20.1.0)
May 06, 2023 12:20:03 AM lb.strategies.LoadBalancerStrategy unhealthy
INFO: - Unhealthy backend: http://localhost:9015
...
May 06, 2023 12:20:16 AM lb.strategies.LoadBalancerStrategy healthy
INFO: + Healthy backend: http://localhost:9007
May 06, 2023 12:20:16 AM lb.strategies.LoadBalancerStrategy healthy
INFO: + Healthy backend: http://localhost:9004
May 06, 2023 12:20:16 AM lb.strategies.LoadBalancerStrategy healthy
INFO: + Healthy backend: http://localhost:9013
details - remove unhealthy
May 06, 2023 12:20:03 AM lb.strategies.LoadBalancerStrategy unhealthy
INFO: - Unhealthy backend: http://localhost:9015
details - add healthy back
May 06, 2023 12:20:16 AM lb.strategies.LoadBalancerStrategy healthy
INFO: + Healthy backend: http://localhost:9007
details - normal access log
May 06, 2023 12:18:19 AM lb.SimpleLoadBalancer log
INFO: http://localhost:9019 [0:0:0:0:0:0:0:1] GET /hello HTTP/1.1 User-Agent: Wget/1.20.3 (darwin20.1.0)
All the details used to validate and test RoundRobin are already documented in Step 2.
Important properties of a health check:
- it should execute very fast
- it should not block or need a long timeout just to decide 'it is not running'
- fail fast if layer 4 (TCP) is not working, and only then call the /health API
- implement the client side with connection and read timeouts
I added a TCP socket verification:
private boolean isTcpHealthy(String urlString) {
    int timeout = 1000; // 1 second
    try (var socket = new Socket()) {
        var url = new URL(urlString);
        // layer 4 check only: connect to host and port, the socket is closed automatically
        socket.connect(new InetSocketAddress(url.getHost(), url.getPort()), timeout);
        return true;
    } catch (IOException e) {
        return false;
    }
}
Here is the full health check: 1s connection timeout, 3s read timeout, TCP first, then the API. A backend is only healthy if the status code is 200 and the body starts with "healthy". Don't use .equals(...), because there is very often a \n character at the end of the body.
private boolean isHealthy(String url) {
    // layer 4 first: no point in an HTTP call if nothing is listening on the port
    if (!this.isTcpHealthy(url)) {
        return false;
    }
    try {
        var httpClient = new OkHttpClient.Builder()
                .connectTimeout(1, java.util.concurrent.TimeUnit.SECONDS)
                .readTimeout(3, java.util.concurrent.TimeUnit.SECONDS)
                .build();
        var request = new Request.Builder()
                .url(url + "/health").get().build();
        // try-with-resources closes the response body and releases the connection
        try (var response = httpClient.newCall(request).execute()) {
            int statusCode = response.code();
            var responseBody = response.body().string();
            return statusCode == 200 && responseBody.startsWith("healthy");
        }
    } catch (Exception e) {
        return false;
    }
}
First attempt: keep the index
- keep an index over all servers, increment it and apply % size every time
- BAD:
  - not all servers in the list are healthy
  - the index can be very wrong if backends with low indices are dying
Second attempt: keep the URL and dynamically order the servers
- if servers become unhealthy they are removed from the list
- if they are healthy again they come back, at the end of the list
- a linear search by URL is needed, then move to the next one % size (see the sketch below)
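A hedged sketch of how this second approach could be implemented on top of the superclass sketch above; the real RoundRobin class may differ.

```java
import java.util.Optional;

// sketch only: remember the URL of the last backend used and advance with a
// linear search, wrapping around with % size
public class RoundRobinStrategy extends LoadBalancerStrategy {
    private String lastUsed;

    @Override
    public synchronized Optional<String> getNext() {
        if (backends.isEmpty()) {
            return Optional.empty();
        }
        // if the last used backend was removed, indexOf returns -1 and we start over at 0
        int idx = (lastUsed == null) ? -1 : backends.indexOf(lastUsed);
        var next = backends.get((idx + 1) % backends.size());
        lastUsed = next;
        return Optional.of(next);
    }
}
```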
Third: retry if there is a failure during execution
- try twice to send the request; an IOException means the server is not responding at all
- an HTTP error status, in contrast, would mean the server is running but failed while processing
@Override
protected void doGet(HttpServletRequest request, HttpServletResponse response)
        throws ServletException, IOException {
    var retryCount = 2;
    while (retryCount > 0) {
        // ask the strategy (RoundRobin) for the next healthy backend
        var be = loadbalancer.lbStrategy.getNext();
        if (be.isEmpty()) {
            throw new ServletException("No backend found");
        }
        var beUrl = be.get();
        try {
            var beResponse = this.executeRequest(beUrl, request);
            fetchResponseAndSendBack(request, response, beUrl, beResponse);
            return;
        } catch (IOException e) {
            // the backend died between two health checks: mark it unhealthy and retry
            loadbalancer.lbStrategy.unhealthy(beUrl);
            retryCount--;
        }
    }
}
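fetchResponseAndSendBack is not shown here; a hedged sketch of what it could do is below: copy the backend status and body to the client response and prefix it with the backend URL, which could produce output like the test.sh lines above. This is an assumption, not the actual implementation.

```java
// sketch only, not the actual implementation
private void fetchResponseAndSendBack(HttpServletRequest request, HttpServletResponse response,
        String beUrl, Response beResponse) throws IOException {
    try (beResponse) { // always close the OkHttp response to release the connection
        response.setStatus(beResponse.code());
        var body = beResponse.body() != null ? beResponse.body().string() : "";
        response.getWriter().write("from " + beUrl + " // " + body);
    }
}
```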
Thanks https://twitter.com/johncrickett for the great challenge.
I fully recommend these challenges to all software engineers who would like to get better: https://codingchallenges.fyi/challenges/challenge-load-balancer/