Showing 13 changed files with 398 additions and 363 deletions.
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,32 @@ | ||
# Golem pipeline | ||
|
||
Golem pipelines is a library for spawning Golem tasks in a declarative form. This approach unlocks several useful features. | ||
|
||
## Features | ||
- **declarative** Declarative code should be easier to read, maintain and extend, especially for highly parallelized tasks. | ||
- **composability** Thanks to the pipeline/stage/executable abstraction it is easy to build reusable tasks. This could be an interesting direction of development, especially on the web. | ||
- **pipelining** A task starts as soon as the tasks it depends on have finished. For example, `task_3` will start as soon as `task_0` finishes; it does not have to wait for `task_1`. The dependency is defined by `output <> input` variables between two or more tasks. | ||
- **checkpointing** Every task result is persisted after successful execution, so we don't lose progress on interrupted pipelines. It also allows building new stages iteratively, without re-running already working stages over and over again. | ||
  - `FsCheckPointer` - persists checkpoints to the file system | ||
  - `S3CheckPointer` - (TBD) the interface can easily be implemented for S3, which would be a good solution for browser execution | ||
- **Input** The input abstraction allows choosing between different input resources. | ||
  - `FileInput` - uploads files to the Provider and passes their paths as command line arguments | ||
  - `ArgInput` - command line arguments | ||
  - `S3Input` - (TBD) the interface can easily be implemented for S3, which would be a good solution for browser execution | ||
- **Executable** Different modes of executing tasks, which reduce boilerplate code. | ||
  - `ExecutableToStdout` - executes a program and treats `stdout` as the result | ||
  - `ExecutableToFiles` - allows producing multiple results (files) from a single input | ||
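The pipelining idea from the list above can be sketched in a few lines of plain JavaScript. This is only an illustration and not this library's API: `runPipeline` and the `{ output, inputs, run }` task shape are hypothetical names. Each task declares the variables it reads and the one it writes, and it starts as soon as its own inputs are ready. The sketch assumes the dependency graph is acyclic.

```javascript
// Hypothetical sketch of output <> input pipelining; not the library's API.
// tasks:   [{ output: "name", inputs: ["name", ...], run: async (...values) => value }]
// initial: variables that are known before any task runs.
function runPipeline(tasks, initial = {}) {
  const byOutput = new Map(tasks.map((task) => [task.output, task]));
  const started = new Map(); // output name -> promise of its value

  // Returns a promise for a variable, starting its producer at most once.
  // Assumes the dependency graph has no cycles.
  const valueOf = (name) => {
    if (Object.hasOwn(initial, name)) return Promise.resolve(initial[name]);
    if (!started.has(name)) {
      const task = byOutput.get(name);
      if (!task) return Promise.reject(new Error(`no task produces "${name}"`));
      // The task waits only for its own inputs, not for unrelated tasks.
      const result = Promise.all(task.inputs.map(valueOf)).then((args) => task.run(...args));
      started.set(name, result);
    }
    return started.get(name);
  };

  return Promise.all(tasks.map((task) => valueOf(task.output))).then((values) =>
    Object.fromEntries(tasks.map((task, i) => [task.output, values[i]]))
  );
}
```

With three tasks where one reads the output of a fast task and another slow task is independent, the dependent task starts right after the fast one finishes, while the slow one is still running.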
|
||
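Checkpointing to the file system could look roughly like the sketch below. It is not the actual `FsCheckPointer` implementation: `FileCheckpoints` and `withCheckpoint` are hypothetical names, and storing results as JSON files keyed by a task name is an assumption made for the example.

```javascript
const fs = require("fs/promises");
const path = require("path");

// Hypothetical file-system checkpoint store; not the library's FsCheckPointer.
class FileCheckpoints {
  constructor(dir) {
    this.dir = dir;
  }

  fileFor(key) {
    return path.join(this.dir, `${encodeURIComponent(key)}.json`);
  }

  // Resolves to { hit: true, value } when a checkpoint exists, { hit: false } otherwise.
  async load(key) {
    try {
      const text = await fs.readFile(this.fileFor(key), "utf8");
      return { hit: true, value: JSON.parse(text) };
    } catch (err) {
      if (err.code === "ENOENT") return { hit: false };
      throw err;
    }
  }

  async save(key, value) {
    await fs.mkdir(this.dir, { recursive: true });
    const tmp = `${this.fileFor(key)}.tmp`;
    await fs.writeFile(tmp, JSON.stringify(value));
    // Rename last so an interrupted write never leaves a half-written checkpoint.
    await fs.rename(tmp, this.fileFor(key));
  }
}

// Runs `run` only when no checkpoint exists for `key`; persists the result on success.
async function withCheckpoint(store, key, run) {
  const cached = await store.load(key);
  if (cached.hit) return cached.value;
  const value = await run();
  await store.save(key, value);
  return value;
}
```

Because a failed `run` never reaches `save`, only successful results are persisted, and re-running an interrupted pipeline skips every stage that already has a checkpoint.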
|
||
## Future improvements | ||
- **visualization** - Pipelines organized this way could easily be visualized using [graphs](https://www.npmjs.com/package/react-json-graph). Bills etc. could also be visualized in real time by implementing an event bus. I didn't make it in time :( | ||
- **s3 interfaces** - I didn't make it in time :( | ||
|
||
## Feedback for Golem team | ||
- Really great experience. Most things worked out of the box. The interface (js-sdk) is very intuitive. Generally it was a pleasure to use this library. Good job! | ||
- What could be better | ||
  - More `stream API`, both for uploading data to the provider and for fetching it. It would avoid copying data and allow processing bigger files. (I had a problem with the `ctx.spawn` API and didn't manage to make it work.) | ||
  - I was getting an error when I tried to upload/download files concurrently | ||
  - A global event bus for better observability | ||
|
||
Check out an example of using this library: [example](./example/README.md) |
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,9 @@ | ||
# Example | ||
|
||
1. In this example we have two files which contain William Shakespeare texts. | ||
2. We first partition the sentences in these texts by a given list of 8 words [ "I", "you", "We", "god", "devil", "mother", "father"]. For this we use [partition_by.js](../golem_tasks/partition_by.js). (2 parallel tasks) | ||
3. As a result we get 2x8 outputs. For every output we create a task with a [go program](../golem_tasks//go//main.go) which scores the sentiment of every sentence in parallel. A score of 1 means positive, 0 means negative. | ||
4. The last stage is responsible for aggregating the sentiments and generating a report, which concludes that "the most positive word" in William Shakespeare's poetry is "father" and the least positive is "god". | ||
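The partition and aggregation stages described above can be illustrated with a small local sketch. It does not reproduce `partition_by.js` or the Go scoring program: `partitionBy` and `aggregate` are hypothetical helpers, whole-word case-insensitive matching is an assumption, and the scores are taken as given (1 for positive, 0 for negative).

```javascript
// Hypothetical stand-ins for the partition and aggregation stages.

// Groups sentences by the keywords they contain (whole-word, case-insensitive).
function partitionBy(sentences, words) {
  const buckets = Object.fromEntries(words.map((word) => [word, []]));
  for (const sentence of sentences) {
    const tokens = new Set(sentence.toLowerCase().match(/[a-z']+/g) ?? []);
    for (const word of words) {
      if (tokens.has(word.toLowerCase())) buckets[word].push(sentence);
    }
  }
  return buckets;
}

// Takes { word: [score, ...] } and ranks words by mean sentiment, highest first.
function aggregate(scoresByWord) {
  const report = Object.entries(scoresByWord)
    .filter(([, scores]) => scores.length > 0)
    .map(([word, scores]) => ({
      word,
      mean: scores.reduce((sum, score) => sum + score, 0) / scores.length,
    }))
    .sort((a, b) => b.mean - a.mean);
  return {
    report,
    mostPositive: report[0]?.word,
    leastPositive: report[report.length - 1]?.word,
  };
}
```

In the pipeline each bucket produced by the partition stage would become the input of one scoring task, and the aggregation stage would run once all scores are available.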
|
||
|
||
I understand that Golem is better suited to tasks that demand significant CPU and RAM resources than to tasks that involve extensive data transmission. Unfortunately, due to time constraints, I was unable to develop a more suitable example. The primary purpose of this example is to showcase the library's capabilities. |