Commit

update README.md to hint to new branch
cookieID committed Oct 10, 2024
1 parent 1083545 commit 9328483
Showing 2 changed files with 49 additions and 43 deletions.
5 changes: 4 additions & 1 deletion README.md
@@ -1,2 +1,5 @@
# ArDoCo REST
Details concerning the architectural decisions as well as the API responses can be found [here](architecture_decisions.md).

Remark: There also exists another branch (`extra-result-controller`) which introduces a new controller to handle the retrieval of the TraceLinks separately
from starting the pipeline. The branch also contains a `.md` file explaining the approach further.
87 changes: 45 additions & 42 deletions architecture_decisions.md
@@ -13,75 +13,78 @@ Redis insight: http://localhost:5540/

This is the outermost layer of the REST API and is responsible for accepting the HTTP input and forwarding it to the service layer.\
Each ArDoCo runner is represented by one controller: sad-sam, sad-code, sam-code, sad-sam-code\
Each controller has 4 endpoints:
- runPipeline: starts the ArDoCo pipeline for the runner and returns a unique id which can be used to retrieve the result
- runPipelineAndWait: starts the ArDoCo pipeline for the runner and waits up to 60 seconds for the result; otherwise it
simply returns the id with which the result can be queried
- getResult: returns the result for the given id, if it already exists
- waitForResult: waits up to 60 seconds for the result of the given id

The endpoints that start the pipeline each proceed in a similar way (a sketch follows the list):
1. convert the input MultipartFiles into Files
2. generate a unique id from the input files, the projectName and the runner/tracelink-type (sad-sam, sad-code, ...) using an MD5 hash, so the result can be identified later
3. set up the runner using the input Files
4. forward the runner to the service layer to run the pipeline asynchronously in case no result is in the database yet, or simply get the result from the database
5. case runPipeline: return the unique id
6. case runPipelineAndWait: return unique id and result if present after 60 seconds
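
A minimal sketch of how such a pipeline-starting endpoint could look is shown below. The class names, the request mapping and the placeholder types (`RunnerSketch`, `SadCodeServiceSketch`) are assumptions for illustration, not the actual ArDoCo REST code.

```java
// Minimal sketch of a pipeline-starting endpoint, not the actual ArDoCo REST code:
// the runner and the service layer are reduced to placeholder types so the five steps stay visible.
import java.io.File;
import java.io.IOException;
import java.nio.file.Files;

import org.springframework.http.ResponseEntity;
import org.springframework.web.bind.annotation.*;
import org.springframework.web.multipart.MultipartFile;

interface RunnerSketch { }                 // stand-in for an ArDoCo runner (e.g. ArDoCoForSadCodeTraceabilityLinkRecovery)
interface SadCodeServiceSketch {           // stand-in for the service layer
    void runPipeline(RunnerSketch runner, String id);
}

@RestController
@RequestMapping("/api/sad-code")           // the path is an assumption
public class SadCodeControllerSketch {

    private final SadCodeServiceSketch service;

    public SadCodeControllerSketch(SadCodeServiceSketch service) {
        this.service = service;
    }

    @PostMapping("/runPipeline")
    public ResponseEntity<String> runPipeline(@RequestParam String projectName,
            @RequestParam MultipartFile inputText,
            @RequestParam MultipartFile inputCode) throws IOException {
        // 1. convert the MultipartFiles into regular Files
        File textFile = toFile(inputText);
        File codeFile = toFile(inputCode);

        // 2. generate the unique id: an MD5 hash over the files, the projectName and the
        //    runner/tracelink type (see the hashing sketch further below)
        String id = "SadCodeResult:" + projectName + md5Of(textFile, codeFile);

        // 3. + 4. set up the runner from the input files and forward it to the service layer,
        //    which runs the pipeline asynchronously unless the result is already in the database
        RunnerSketch runner = new RunnerSketch() { }; // real code: a configured ArDoCo runner
        service.runPipeline(runner, id);

        // 5. return the unique id so the client can later call getResult/waitForResult
        return ResponseEntity.ok(id);
    }

    private File toFile(MultipartFile upload) throws IOException {
        File file = Files.createTempFile("ardoco-", upload.getOriginalFilename()).toFile();
        upload.transferTo(file);
        return file;
    }

    private String md5Of(File... files) {
        return "..."; // omitted here, see the hashing section below
    }
}
```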

## Notes on the Architecture
Handling the result from calling the service is similar for all controllers. This shared behaviour can be found in the AbstractController class.\
Moreover, the endpoints to retrieve the results are the same for each controller (meaning you can also use the
getResult endpoint from sad-sam to query the result of a sam-code pipeline).

Currently, all controllers have these 2 methods so they can function on their own, but in order to minimize code duplication and to make the API more intuitive
it might make more sense to put them in a separate controller whose only task is to retrieve the result.
This idea is implemented in the `extra-result-controller` branch.
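
A dedicated result controller, as implemented in the `extra-result-controller` branch, could look roughly like the following sketch; the paths and the `ResultServiceSketch` type are assumptions.

```java
// Sketch of a controller whose only task is retrieving results; paths and types are assumptions.
import java.util.Optional;

import org.springframework.http.ResponseEntity;
import org.springframework.web.bind.annotation.*;

interface ResultServiceSketch {
    Optional<String> getResult(String id);                   // look up a stored result
    Optional<String> waitForResult(String id, long seconds);  // block up to the given time
}

@RestController
@RequestMapping("/api/result")
public class ResultControllerSketch {

    private final ResultServiceSketch resultService;

    public ResultControllerSketch(ResultServiceSketch resultService) {
        this.resultService = resultService;
    }

    /** Returns the result for the given id if it already exists, otherwise 202 Accepted. */
    @GetMapping("/{id}")
    public ResponseEntity<String> getResult(@PathVariable String id) {
        return resultService.getResult(id)
                .map(ResponseEntity::ok)
                .orElseGet(() -> ResponseEntity.accepted().body(id));
    }

    /** Waits up to 60 seconds for the result of the given id. */
    @GetMapping("/wait/{id}")
    public ResponseEntity<String> waitForResult(@PathVariable String id) {
        return resultService.waitForResult(id, 60)
                .map(ResponseEntity::ok)
                .orElseGet(() -> ResponseEntity.accepted().body(id));
    }
}
```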

### Remarks
- So far, the API doesn't allow users to define additional configs (in the controller classes).
  This is because, at the time of implementation, these configs (which can be used to define the pipeline in the
  ArDoCoForSadCodeTraceabilityLinkRecovery) are not used.
  They can be added later as a parameter in the methods of the controller (see the sketch below).
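
If the configs are exposed later, one hypothetical option is an additional optional request parameter on the pipeline-starting endpoints. The following fragment is purely illustrative; the parameter name and format are assumptions.

```java
// Purely illustrative fragment: an optional parameter carrying additional configs as JSON,
// to be forwarded to the runner's setup. Parameter name and format are assumptions.
@PostMapping("/runPipeline")
public ResponseEntity<String> runPipeline(@RequestParam String projectName,
        @RequestParam MultipartFile inputText,
        @RequestParam MultipartFile inputCode,
        @RequestParam(required = false) String additionalConfigsJson) throws IOException {
    // parse additionalConfigsJson (if present) and pass it on to the runner setup,
    // otherwise proceed exactly as in the runPipeline sketch above
    return ResponseEntity.ok(projectName);
}
```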

### Accepted file types:
- So far, no file type checks have been implemented; this is left to ArDoCo itself. It is only checked whether the
  file is empty or not.

## Service
### Purpose:
This layer is responsible for processing the input and making the needed calls to ArDoCo to run the pipeline in
order to retrieve a result.

### Architectural Remarks
The controllers already set up the runner. The controllers then feed the runner to the runPipeline() and
runPipelineAndWaitForResult() methods. This has the advantage that runPipeline() and runPipelineAndWaitForResult() have the same
signature, which maximizes code reusability without adding too much complexity. Moreover, there is a unified interface in the form of an abstract class
for the services which the controllers can use. This works since the runners in ArDoCo are part of an inheritance hierarchy
as well. However, the current approach has the disadvantage that ArDoCo is already invoked in the controller and not only
in the service layer, and that the runner is always set up for the runPipeline methods regardless of whether the result is already
in the database or not.\
Another option would be to only invoke ArDoCo in the service layer. This means that setting up the runner would
need to be done there as well. But since the setup methods of the runners each require different parameters, there can't be
a unified interface containing the startPipeline() methods without introducing a lot of complexity through generics.
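
Assuming the runners share a common superclass (represented here only by a placeholder), a sketch of such a unified abstract service could look like this; all names are assumptions.

```java
// Sketch of the unified abstract service the controllers can program against;
// the runner superclass and the method names are assumptions.
import java.util.Optional;

public abstract class AbstractRunnerServiceSketch {

    /** Placeholder for the common superclass of all ArDoCo runners. */
    public interface ArDoCoRunnerSketch { }

    /** Runs the already set-up runner asynchronously (unless a result for the id is stored) and returns the id. */
    public abstract String runPipeline(ArDoCoRunnerSketch runner, String id);

    /** Like runPipeline, but waits up to 60 seconds and returns the JSON result if it is ready in time. */
    public abstract Optional<String> runPipelineAndWaitForResult(ArDoCoRunnerSketch runner, String id);
}
```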

### Remarks

- The ids of the ongoing asynchronous calls are stored in a ConcurrentHashMap. This has the advantage that
  when a user calls getResult to potentially receive the result, it can first be checked in the ConcurrentHashMap whether
  the asynchronous ArDoCo call has finished yet, instead of unnecessarily doing a database call.
  Additionally, storing the CompletableFutures in the map allows waiting for the ArDoCoResult without constantly
  querying the database for a result (see the sketch below).
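
A sketch of this check-the-map-before-the-database idea; all names, including the reduced database interface, are assumptions.

```java
// Sketch of checking the ConcurrentHashMap of running CompletableFutures before hitting the database.
import java.util.Optional;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

public class ResultLookupSketch {

    interface DatabaseAccessorSketch {   // stand-in, see the repository-layer sketch below
        String getResult(String id);     // returns null if nothing is stored for the id
    }

    // id -> future of the asynchronous ArDoCo run that is still in progress
    private final ConcurrentHashMap<String, CompletableFuture<String>> runningPipelines = new ConcurrentHashMap<>();
    private final DatabaseAccessorSketch database;

    public ResultLookupSketch(DatabaseAccessorSketch database) {
        this.database = database;
    }

    /** getResult: while the asynchronous call is still running, no database query is needed. */
    public Optional<String> getResult(String id) {
        CompletableFuture<String> running = runningPipelines.get(id);
        if (running != null && !running.isDone()) {
            return Optional.empty(); // still running, skip the database call
        }
        return Optional.ofNullable(database.getResult(id));
    }

    /** waitForResult: wait on the future instead of repeatedly polling the database. */
    public Optional<String> waitForResult(String id) throws Exception {
        CompletableFuture<String> running = runningPipelines.get(id);
        if (running == null) {
            return Optional.ofNullable(database.getResult(id));
        }
        try {
            return Optional.of(running.get(60, TimeUnit.SECONDS));
        } catch (TimeoutException e) {
            return Optional.empty(); // not finished within 60 seconds
        }
    }
}
```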

## Remarks on Interacting with ArDoCo

- The output directory, which is required by ArDoCo when running any pipeline, is internally set to a temporary directory
  and is not made available to the outside, since the result is returned in the form of a response entity.

- Only the direct interaction with ArDoCo is asynchronous. Handling the input file (including conversion and
  checking whether its file type is correct) is done beforehand, so that the user gets quicker feedback if
  something went wrong (see the sketch below).
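
A sketch of this split between synchronous input handling and the asynchronous ArDoCo call; the actual pipeline invocation is only a placeholder method here.

```java
// Sketch: validate the input synchronously, then run ArDoCo asynchronously with a temporary output directory.
import java.io.File;
import java.io.IOException;
import java.nio.file.Files;
import java.util.concurrent.CompletableFuture;

import org.springframework.web.multipart.MultipartFile;

public class PipelineStartSketch {

    public CompletableFuture<String> start(MultipartFile upload, String id) throws IOException {
        // synchronous part: validate the input first so the user gets quick feedback on errors
        if (upload.isEmpty()) {
            throw new IllegalArgumentException("The uploaded file must not be empty");
        }

        // the output directory required by ArDoCo is just a temporary directory; it is never
        // exposed, because the result is returned as a response entity instead
        File outputDir = Files.createTempDirectory("ardoco-output-").toFile();

        // asynchronous part: only the direct interaction with ArDoCo runs in the background
        return CompletableFuture.supplyAsync(() -> runArDoCo(id, outputDir));
    }

    private String runArDoCo(String id, File outputDir) {
        // placeholder: run the configured ArDoCo runner and convert its TraceLinks to JSON
        return "{\"requestId\": \"" + id + "\"}";
    }
}
```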

## Hashing (Generating the ProjectID)
Only the files, the projectName and the controller/traceLink-type are used to create the hash, not the configs, meaning
that if only the configs change, the same hash is generated. In the future, the configs might need to be hashed as well.
An MD5 hash is used to obtain a hash space large enough that the probability of collisions is practically 0.
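
A sketch of how such an id could be generated, modelled after the requestIds in the response examples below; the class name and the exact id layout are assumptions.

```java
// Sketch of generating the project id as an MD5 hash over the input files, the projectName and
// the controller/traceLink type; names and the exact layout are assumptions.
import java.io.File;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.HexFormat;

public final class ProjectIdGeneratorSketch {

    private ProjectIdGeneratorSketch() {
    }

    public static String generateId(String traceLinkType, String projectName, File... inputFiles) throws IOException {
        try {
            MessageDigest md5 = MessageDigest.getInstance("MD5");
            md5.update(traceLinkType.getBytes(StandardCharsets.UTF_8));
            md5.update(projectName.getBytes(StandardCharsets.UTF_8));
            for (File file : inputFiles) {
                md5.update(Files.readAllBytes(file.toPath())); // the configs are deliberately not hashed
            }
            // e.g. "SadCodeResult:bigBlueButtonF2BD9453..." as in the response schemas below
            return traceLinkType + ":" + projectName + HexFormat.of().withUpperCase().formatHex(md5.digest());
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException("MD5 is always available in the JDK", e);
        }
    }
}
```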

@@ -91,28 +94,28 @@ large enough this should work fine since there are few enough entries being stored
## Database (Repository Layer)
The NoSQL database Redis is used. The results of querying ArDoCo are stored like in a giant hash table.
This means that everything is stored as key-result pairs (the result in JSON format). The key is identical to the hash used
to check whether the result has been calculated before, to avoid calculating it again. All entries have a
Time To Live of 24h, so that the database never gets too large because of stored results which are not needed anymore
(because the client's request was made too long ago).

To be able to smoothly change the database used, the repositories implement a DatabaseAccessor interface, which is used
by the classes that use the database (e.g. the services).
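
A sketch of the DatabaseAccessor idea with a Redis-backed implementation via Spring Data Redis; the method names and the TTL handling shown here are assumptions based on the description above.

```java
// Sketch: a database-agnostic accessor interface plus a Redis implementation with a 24h TTL.
// The method names are assumptions, not the actual DatabaseAccessor API.
import java.time.Duration;

import org.springframework.data.redis.core.StringRedisTemplate;
import org.springframework.stereotype.Repository;

public interface DatabaseAccessor {
    void saveResult(String id, String jsonResult);
    String getResult(String id);
    boolean keyExists(String id);
}

@Repository
class RedisAccessorSketch implements DatabaseAccessor {

    private static final Duration TTL = Duration.ofHours(24); // entries expire after 24h

    private final StringRedisTemplate redis;

    RedisAccessorSketch(StringRedisTemplate redis) {
        this.redis = redis;
    }

    @Override
    public void saveResult(String id, String jsonResult) {
        redis.opsForValue().set(id, jsonResult, TTL); // key = hash/id, value = result as raw JSON
    }

    @Override
    public String getResult(String id) {
        return redis.opsForValue().get(id);
    }

    @Override
    public boolean keyExists(String id) {
        return Boolean.TRUE.equals(redis.hasKey(id));
    }
}
```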

## Converting the TraceLinks to JSON
The found TraceLinks are converted into a raw JSON string directly after the pipeline has finished and are stored in the database as raw JSON,
to avoid having to convert the result multiple times in case a user queries the finished result multiple times.
The TraceLinkConverter class provides functionality to convert the different types of TraceLinks into JSON.
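
A sketch of such a conversion using Jackson; the record and the method shown are assumptions, not the actual TraceLinkConverter API.

```java
// Sketch of converting TraceLinks into a raw JSON string with Jackson; names are assumptions.
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

import com.fasterxml.jackson.core.JsonProcessingException;
import com.fasterxml.jackson.databind.ObjectMapper;

public final class TraceLinkConverterSketch {

    /** Simplified stand-in for a single sad-code trace link. */
    public record SadCodeTraceLink(int sentenceNumber, String codeElementId) {
    }

    private static final ObjectMapper MAPPER = new ObjectMapper();

    private TraceLinkConverterSketch() {
    }

    /** Converted once, directly after the pipeline finishes, and then stored as raw JSON. */
    public static String toJson(String requestId, List<SadCodeTraceLink> traceLinks) throws JsonProcessingException {
        Map<String, Object> result = new LinkedHashMap<>();
        result.put("requestId", requestId);
        result.put("traceLinks", traceLinks);
        return MAPPER.writeValueAsString(result);
    }
}
```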

## Exception Handling
Exceptions are centrally handled by the GlobalExceptionHandler, which produces an ErrorResponse for
the user in case an exception is thrown that is not caught elsewhere. This central handling of exceptions standardizes
the way the system deals with errors.
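
A sketch of such a central handler using Spring's @RestControllerAdvice; the ErrorResponse fields are modelled after the error schema shown further below, everything else is an assumption.

```java
// Sketch of central exception handling; field names follow the error response example below.
import java.time.LocalDateTime;
import java.time.format.DateTimeFormatter;

import org.springframework.http.HttpStatus;
import org.springframework.http.ResponseEntity;
import org.springframework.web.bind.annotation.ExceptionHandler;
import org.springframework.web.bind.annotation.RestControllerAdvice;

@RestControllerAdvice
public class GlobalExceptionHandlerSketch {

    public record ErrorResponse(String timestamp, int status, String error, String message) {
    }

    /** Fallback for every exception that is not caught elsewhere. */
    @ExceptionHandler(Exception.class)
    public ResponseEntity<ErrorResponse> handleAnyException(Exception exception) {
        String timestamp = LocalDateTime.now().format(DateTimeFormatter.ofPattern("dd-MM-yyyy HH:mm:ss"));
        ErrorResponse body = new ErrorResponse(timestamp, HttpStatus.INTERNAL_SERVER_ERROR.value(),
                "Internal Server Error", exception.getMessage());
        return ResponseEntity.status(HttpStatus.INTERNAL_SERVER_ERROR).body(body);
    }
}
```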

## API Response schemas
The API has 2 response schemas:
1. **Schema for expected behaviour** \
- #### Sad-Code
Example:
```json
{
"requestId": "SadCodeResult:bigBlueButtonF2BD94533508F2F2DE4130AB43403B63",
@@ -131,7 +134,7 @@
}
```
- #### Sam-Code
Example:
```json
{
"requestId": "SamCodeResult:bigBlueButton2B867FE03AF1FE8DE3C1DEE7F1D9CB4E",
@@ -154,7 +157,7 @@
}
```
- #### Sad-Sam
Example:
```json
{
"requestId": "SadSamResult:bigBlueButton6AA76050BA630F6D8A6E099A30D1053C",
@@ -175,7 +178,7 @@
}
```
- #### Sad-Sam-Code
Example:
```json
{
"requestId": "SadSamCodeResult:bigBlueButton8E0E764E3B368781CF0DDDC67F19ABC0",
@@ -196,7 +199,7 @@
Note: Depending on the invoked endpoint and on the concrete result, some parameters (especially traceLinks) might be null.

2. **Schema for when an error occurred**
Example:
```json
{
"timestamp": "09-10-2024 12:58:42",
