
Lithops

Lithops is a Python multi-cloud distributed computing framework. It allows running unmodified local Python code at massive scale on the main serverless computing platforms. Lithops delivers the user's code to the cloud without requiring knowledge of how it is deployed and run. Moreover, its multi-cloud agnostic architecture ensures portability across cloud providers, overcoming vendor lock-in.

Lithops is especially suited for highly parallel programs with little or no need for communication between processes, but it also supports parallel applications that need to share state among processes. Examples of applications that run with Lithops include Monte Carlo simulations, deep learning and machine learning processes, metabolomics computations, and geospatial analytics, to name a few.
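
As a quick illustration of the embarrassingly parallel style Lithops targets, the following minimal sketch estimates pi with a Monte Carlo simulation by fanning out independent sampling tasks through the Futures API (the worker count and sample sizes are arbitrary choices for this example):

import random
from lithops import FunctionExecutor

def sample_pi(num_points):
    # Count how many random points fall inside the unit quarter circle
    inside = 0
    for _ in range(num_points):
        x, y = random.random(), random.random()
        if x * x + y * y <= 1.0:
            inside += 1
    return inside

with FunctionExecutor() as fexec:
    workers, points = 10, 100000            # arbitrary example sizes
    futures = fexec.map(sample_pi, [points] * workers)
    hits = sum(fexec.get_result(futures))   # gather the partial counts
    print('pi ~', 4 * hits / (workers * points))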

Installation

  1. Install Lithops from the PyPI repository:

    $ pip install lithops
  2. Execute a Hello World test function:

    $ lithops test

Configuration

Lithops provides an extensible backend architecture (compute, storage) designed to work with different cloud providers and on-premise backends. In this sense, you can write code in Python and run it unmodified in IBM Cloud, AWS, Azure, Google Cloud and Alibaba Aliyun. Moreover, it provides support for running jobs on vanilla Kubernetes, or by using a Kubernetes serverless framework such as Knative or OpenWhisk.

Follow these instructions to configure your compute and storage backends.
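
Once configured, the backends can also be selected in code by passing a configuration dictionary to the executor. The following is a minimal sketch only: the section and key names shown (an AWS Lambda compute backend with S3 storage) are illustrative assumptions, so check the configuration instructions above for the exact keys of your providers.

from lithops import FunctionExecutor

def increment(x):
    return x + 1

# Illustrative config only: section and key names depend on the chosen
# backends (see the configuration instructions for the real keys and values)
config = {
    'lithops': {'backend': 'aws_lambda', 'storage': 'aws_s3'},
    'aws': {'access_key_id': '<ACCESS_KEY_ID>',
            'secret_access_key': '<SECRET_ACCESS_KEY>'},
    'aws_lambda': {'execution_role': '<ROLE_ARN>', 'region_name': 'us-east-1'},
    'aws_s3': {'storage_bucket': '<BUCKET_NAME>', 'region_name': 'us-east-1'}
}

with FunctionExecutor(config=config) as fexec:
    print(fexec.call_async(increment, 41).result())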

Multicloud Lithops

High-level API

Lithops ships with two high-level Compute APIs and two high-level Storage APIs.

Futures API

from lithops import FunctionExecutor

def hello(name):
    return 'Hello {}!'.format(name)

with FunctionExecutor() as fexec:
    fut = fexec.call_async(hello, 'World')
    print(fut.result())

Multiprocessing API

from lithops.multiprocessing import Pool

def double(i):
    return i * 2

with Pool() as pool:
    result = pool.map(double, [1, 2, 3, 4, 5])
    print(result)
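
For comparison, the same map can be expressed with the Futures API. This is a minimal sketch using fexec.map() to fan out one task per input element and get_result() to gather the outputs:

from lithops import FunctionExecutor

def double(i):
    return i * 2

with FunctionExecutor() as fexec:
    # Fan out one task per input element, then collect all results
    futures = fexec.map(double, [1, 2, 3, 4, 5])
    print(fexec.get_result(futures))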

Storage API

from lithops import Storage

if __name__ == "__main__":
    st = Storage()
    st.put_object(bucket='mybucket',
                  key='test.txt',
                  body='Hello World')

    print(st.get_object(bucket='mybucket',
                        key='test.txt'))

Storage OS API

from lithops.storage.cloud_proxy import os

if __name__ == "__main__":
    filepath = 'bar/foo.txt'
    with os.open(filepath, 'w') as f:
        f.write('Hello world!')

    dirname = os.path.dirname(filepath)
    print(os.listdir(dirname))
    os.remove(filepath)
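
The Storage API can also be used from inside worker functions, a common pattern for data-parallel jobs over objects. A minimal sketch, where the bucket name and object keys are placeholders for this example:

from lithops import FunctionExecutor, Storage

def count_words(key):
    # Each worker fetches its own object and returns a word count.
    # 'mybucket' and the keys below are placeholders for this sketch.
    data = Storage().get_object(bucket='mybucket', key=key)
    return len(data.split())

with FunctionExecutor() as fexec:
    futures = fexec.map(count_words, ['docs/a.txt', 'docs/b.txt'])
    print(fexec.get_result(futures))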

You can find more usage examples in the examples folder.

Execution Modes

Lithops ships with three execution modes. The execution mode determines where and how the functions are executed.

Localhost Mode

This mode runs functions on your local machine using processes. It is the default execution mode if no configuration is provided.

Serverless Mode

This mode runs functions on publicly accessible serverless compute services, such as IBM Cloud Functions, Amazon Lambda or Google Cloud Functions, among others. In this mode, each function invocation corresponds to a parallel task running in the cloud in an isolated environment.

Standalone Mode

This mode runs functions on one or multiple virtual machines (VMs), either in a private cluster or in the cloud. In each VM, functions run using parallel processes, as in Localhost mode.
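
The execution mode can be selected in the configuration, or, as a minimal sketch, by instantiating the mode-specific executors directly. Each call below assumes the corresponding backend has already been configured:

from lithops import LocalhostExecutor, ServerlessExecutor, StandaloneExecutor

def hello(name):
    return 'Hello {}!'.format(name)

# Localhost mode: local processes (the default when no configuration is given)
with LocalhostExecutor() as fexec:
    print(fexec.call_async(hello, 'localhost').result())

# Serverless mode: requires a configured serverless compute backend
with ServerlessExecutor() as fexec:
    print(fexec.call_async(hello, 'serverless').result())

# Standalone mode: requires a configured VM backend
with StandaloneExecutor() as fexec:
    print(fexec.call_async(hello, 'standalone').result())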

Documentation

For documentation on using Lithops, see the User guide.

If you are interested in contributing, see CONTRIBUTING.md and DEVELOPMENT.md.

Additional resources

Blogs and Talks

Papers

Acknowledgements


This project has received funding from the European Union's Horizon 2020 research and innovation programme under grant agreement No 825184.
