HPT

Hpt is a high-performance N-dimensional array library. It is highly optimized and designed to be easy to use. Most operators are implemented following the ONNX operator list, so you can use it to build most deep learning models.

Features

Memory Layout

Optimized memory layout with support for both contiguous and non-contiguous tensors.
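To make the contiguous versus non-contiguous distinction concrete, here is a minimal sketch of how a strided layout maps an N-dimensional index to a flat offset. It is an illustration in plain Rust, not Hpt's internal code; `row_major_strides`, `offset`, and `is_contiguous` are hypothetical helper names, and the contiguity check is simplified (it does not special-case dimensions of size 1).

```rust
/// Row-major (C-order) strides, measured in elements, for a given shape.
fn row_major_strides(shape: &[usize]) -> Vec<usize> {
    let mut strides = vec![1; shape.len()];
    for i in (0..shape.len().saturating_sub(1)).rev() {
        strides[i] = strides[i + 1] * shape[i + 1];
    }
    strides
}

/// Flat offset of a multi-dimensional index under the given strides.
fn offset(index: &[usize], strides: &[usize]) -> usize {
    index.iter().zip(strides).map(|(i, s)| i * s).sum()
}

/// Simplified check: a view is contiguous when its strides equal the
/// row-major strides of its shape.
fn is_contiguous(shape: &[usize], strides: &[usize]) -> bool {
    row_major_strides(shape) == strides
}
```

For a 2x3 buffer, `row_major_strides(&[2, 3])` gives `[3, 1]`. A transposed 3x2 view over the same buffer uses strides `[1, 3]`, so no data is copied, but `is_contiguous` returns `false` for it.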

SIMD Support

Leverages CPU SIMD instructions (SSE/AVX/NEON) for vectorized operations.
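The idea behind vectorized operations can be sketched without intrinsics: process the data in fixed-width chunks so that each chunk maps onto one SIMD register, then handle the leftover elements with scalar code. The following is a portable illustration in plain Rust, not Hpt's kernels; `LANES` and `add_chunked` are names chosen for this example, and whether the compiler actually emits SSE/AVX/NEON instructions depends on the optimization level and target features used at build time.

```rust
/// Number of f32 values per chunk; 8 matches one 256-bit register.
const LANES: usize = 8;

/// Element-wise `out[i] = a[i] + b[i]`, with a fixed-width main loop
/// and a scalar tail.
fn add_chunked(a: &[f32], b: &[f32], out: &mut [f32]) {
    assert!(a.len() == b.len() && a.len() == out.len());

    let mut a_chunks = a.chunks_exact(LANES);
    let mut b_chunks = b.chunks_exact(LANES);
    let mut out_chunks = out.chunks_exact_mut(LANES);

    // Main body: fixed-size chunks that the optimizer can turn into vector adds.
    for ((oc, ac), bc) in (&mut out_chunks).zip(&mut a_chunks).zip(&mut b_chunks) {
        for i in 0..LANES {
            oc[i] = ac[i] + bc[i];
        }
    }

    // Scalar tail for the elements that do not fill a whole chunk.
    let out_tail = out_chunks.into_remainder();
    for ((o, x), y) in out_tail
        .iter_mut()
        .zip(a_chunks.remainder())
        .zip(b_chunks.remainder())
    {
        *o = x + y;
    }
}
```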

Iterator API

Flexible iterator API for efficient element-wise/broadcast operations and custom implementations.
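As a rough picture of what iterating over a tensor view involves, the sketch below walks a strided view in logical (row-major) order using only the standard library. `StridedIter` is a hypothetical type written for this illustration and is not part of Hpt's iterator API; see the user guide for the real one.

```rust
/// Iterates over a strided N-dimensional view in logical (row-major) order.
struct StridedIter<'a, T> {
    data: &'a [T],
    shape: Vec<usize>,
    strides: Vec<usize>,
    index: Vec<usize>, // current multi-dimensional index
    remaining: usize,  // number of elements left to yield
}

impl<'a, T> StridedIter<'a, T> {
    fn new(data: &'a [T], shape: &[usize], strides: &[usize]) -> Self {
        assert_eq!(shape.len(), strides.len());
        StridedIter {
            data,
            shape: shape.to_vec(),
            strides: strides.to_vec(),
            index: vec![0; shape.len()],
            remaining: shape.iter().product(),
        }
    }
}

impl<'a, T> Iterator for StridedIter<'a, T> {
    type Item = &'a T;

    fn next(&mut self) -> Option<&'a T> {
        if self.remaining == 0 {
            return None;
        }
        self.remaining -= 1;
        // Flat offset of the current index.
        let offset: usize = self
            .index
            .iter()
            .zip(&self.strides)
            .map(|(i, s)| i * s)
            .sum();
        // Advance the index like an odometer, last dimension fastest.
        for d in (0..self.shape.len()).rev() {
            self.index[d] += 1;
            if self.index[d] < self.shape[d] {
                break;
            }
            self.index[d] = 0;
        }
        let data: &'a [T] = self.data;
        Some(&data[offset])
    }
}
```

Iterating the buffer `[1, 2, 3, 4, 5, 6]` with shape `[3, 2]` and strides `[1, 3]` (a transposed 2x3 matrix) visits the elements in transposed order without copying the data.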

Multi-Threading

Automatic, efficient parallel processing for CPU-intensive operations.
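The general pattern behind this kind of parallelism is to split the data into disjoint chunks and process each chunk on its own thread. The sketch below shows that pattern with scoped threads from the standard library; it is not Hpt's scheduler, and `par_map_in_place` is a hypothetical helper written for this example.

```rust
use std::thread;

/// Applies `f` to every element in place, splitting the slice across
/// up to `threads` scoped threads.
fn par_map_in_place<F>(data: &mut [f64], threads: usize, f: F)
where
    F: Fn(f64) -> f64 + Sync,
{
    let threads = threads.max(1);
    // Ceiling division so every element lands in exactly one chunk.
    let chunk_len = ((data.len() + threads - 1) / threads).max(1);
    let f = &f;
    thread::scope(|s| {
        for chunk in data.chunks_mut(chunk_len) {
            // Each thread owns a disjoint mutable chunk, so no locking is needed.
            s.spawn(move || {
                for x in chunk.iter_mut() {
                    *x = f(*x);
                }
            });
        }
    });
}
```

The scope joins all threads before returning, so the caller sees the fully updated slice. A library would normally reuse a thread pool instead of spawning threads per call.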

Broadcasting

Automatic shape broadcasting for element-wise operations, similar to NumPy.
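For readers unfamiliar with the NumPy rule, the sketch below computes the result shape of broadcasting two shapes: shapes are aligned from the trailing dimension, and two sizes are compatible when they are equal or one of them is 1. `broadcast_shape` is a hypothetical helper for illustration, not a function from Hpt.

```rust
/// Computes the broadcast shape of `a` and `b`, or `None` if the
/// shapes are incompatible.
fn broadcast_shape(a: &[usize], b: &[usize]) -> Option<Vec<usize>> {
    let ndim = a.len().max(b.len());
    let mut out = vec![0; ndim];
    for i in 0..ndim {
        // Walk from the trailing dimension; missing leading dimensions count as 1.
        let da = if i < a.len() { a[a.len() - 1 - i] } else { 1 };
        let db = if i < b.len() { b[b.len() - 1 - i] } else { 1 };
        out[ndim - 1 - i] = match (da, db) {
            (x, y) if x == y => x,
            (1, y) => y,
            (x, 1) => x,
            _ => return None,
        };
    }
    Some(out)
}
```

For example, shapes `[3, 1]` and `[1, 4]` broadcast to `[3, 4]`, `[2, 3, 4]` and `[4]` broadcast to `[2, 3, 4]`, and `[2, 3]` with `[4]` is rejected.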

Automatic Type Promotion

Automatically promotes types when computing with operands of different types.
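One way to express this kind of promotion in Rust is a trait that maps a pair of element types to the type the result is computed in. The sketch below is an assumption about the general technique, not Hpt's implementation: the `Promote` trait and `add_promoted` function are hypothetical, and only the `i64`/`f64` pair mentioned in this README is covered.

```rust
use std::ops::Add;

/// Maps the pair (Self, Rhs) to the type the result is computed in.
trait Promote<Rhs> {
    type Output;
    fn promote_lhs(self) -> Self::Output;
    fn promote_rhs(rhs: Rhs) -> Self::Output;
}

// f64 combined with i64 is computed in f64.
impl Promote<i64> for f64 {
    type Output = f64;
    fn promote_lhs(self) -> f64 { self }
    fn promote_rhs(rhs: i64) -> f64 { rhs as f64 }
}

// i64 combined with f64 is also computed in f64.
impl Promote<f64> for i64 {
    type Output = f64;
    fn promote_lhs(self) -> f64 { self as f64 }
    fn promote_rhs(rhs: f64) -> f64 { rhs }
}

/// Element-wise addition of two slices with different element types.
fn add_promoted<A, B>(a: &[A], b: &[B]) -> Vec<<A as Promote<B>>::Output>
where
    A: Promote<B> + Copy,
    B: Copy,
    <A as Promote<B>>::Output: Add<Output = <A as Promote<B>>::Output>,
{
    a.iter()
        .zip(b)
        .map(|(&x, &y)| x.promote_lhs() + <A as Promote<B>>::promote_rhs(y))
        .collect()
}
```

With these impls, adding `[1.0, 2.0, 3.0]` (`f64`) and `[4, 5, 6]` (`i64`) yields an `f64` result regardless of operand order.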

Customizable

Allows users to define their own data types for computation (CPU only) and their own allocators for memory allocation (all backends).
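To give a feel for what a user-defined allocator can do, the sketch below wraps the global allocator and tracks the number of live bytes. The `TensorAllocator` trait is purely hypothetical and exists only for this illustration; Hpt's actual allocator trait and its method signatures are documented in the user guide and may look quite different.

```rust
use std::alloc::{alloc, dealloc, Layout};
use std::sync::atomic::{AtomicUsize, Ordering};

/// Hypothetical allocator interface (an assumption, not Hpt's trait).
trait TensorAllocator {
    fn allocate(&self, layout: Layout) -> *mut u8;
    /// Safety: `ptr` must come from `allocate` on this allocator with
    /// the same `layout`, and must not be used afterwards.
    unsafe fn deallocate(&self, ptr: *mut u8, layout: Layout);
}

/// Forwards to the global allocator and counts live bytes.
struct CountingAllocator {
    live_bytes: AtomicUsize,
}

impl TensorAllocator for CountingAllocator {
    fn allocate(&self, layout: Layout) -> *mut u8 {
        assert!(layout.size() > 0, "zero-sized allocations are not supported");
        // SAFETY: the layout has a non-zero size, as checked above.
        let ptr = unsafe { alloc(layout) };
        if !ptr.is_null() {
            self.live_bytes.fetch_add(layout.size(), Ordering::Relaxed);
        }
        ptr
    }

    unsafe fn deallocate(&self, ptr: *mut u8, layout: Layout) {
        // SAFETY: upheld by the caller, see the trait documentation.
        unsafe { dealloc(ptr, layout) };
        self.live_bytes.fetch_sub(layout.size(), Ordering::Relaxed);
    }
}
```

Requesting a layout with a 64-byte alignment, for instance, returns a pointer suitable for aligned SIMD loads, and the counter makes leaks visible in tests.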

Note

Hpt is at an early stage: bugs and incorrect results may occur, and the API may change.

Cargo Features

cuda: enables CUDA support.
bound_check: enables bounds checking. This is experimental and reduces performance.
normal_promote: enables automatic type promotion. More type promotion features may be added in the future.

Getting Started

use hpt::Tensor;
use hpt::ops::FloatUnaryOps;
fn main() -> anyhow::Result<()> {
    let x = Tensor::new(&[1f64, 2., 3.]);
    let y = Tensor::new(&[4i64, 5, 6]);

    let result: Tensor<f64> = x + &y; // with `normal_promote` feature enabled, i64 + f64 will output f64
    println!("{}", result); // [5. 7. 9.]

    // All the available methods are listed in https://jianqoq.github.io/Hpt/user_guide/user_guide.html
    let result: Tensor<f64> = y.sin()?;
    println!("{}", result); // [-0.7568 -0.9589 -0.2794]
    Ok(())
}

To use CUDA, enable the cuda feature. Note that CUDA support is still in development and has not been tested.

use hpt::{Tensor, backend::Cuda};
use hpt::ops::FloatUnaryOps;

fn main() -> anyhow::Result<()> {
    let x = Tensor::<f64>::new(&[1f64, 2., 3.]).to_cuda::<0/*Cuda device id*/>()?;
    let y = Tensor::<i64>::new(&[4i64, 5, 6]).to_cuda::<0/*Cuda device id*/>()?;

    let result = x + &y; // with `normal_promote` feature enabled, i64 + f64 will output f64
    println!("{}", result); // [5. 7. 9.]

    // All the available methods are listed in https://jianqoq.github.io/Hpt/user_guide/user_guide.html
    let result: Tensor<f64, Cuda, 0> = y.sin()?;
    println!("{}", result); // [-0.7568 -0.9589 -0.2794]
    Ok(())
}

For more examples, see here and the documentation.

How To Get Highest Performance

Compile your program with the following profile settings in Cargo.toml. Note that lto is very important.

[profile.release]
opt-level = 3
lto = "fat"
codegen-units = 1

Make sure the RUSTFLAGS environment variable enables the best features your CPU supports, for example -C target-feature=+avx2 -C target-feature=+fma.
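A quick way to confirm that the flags were picked up is to ask the compiler which target features the binary was built with. The sketch below uses the standard `cfg!(target_feature = "...")` macro; `compiled_simd_features` is a hypothetical helper, and the list of features it checks is just a sample.

```rust
/// Returns the SIMD-related target features this binary was compiled with.
/// The check happens at compile time, so it reflects RUSTFLAGS, not the
/// CPU the program happens to run on.
fn compiled_simd_features() -> Vec<&'static str> {
    let mut features = Vec::new();
    if cfg!(target_feature = "sse2") {
        features.push("sse2");
    }
    if cfg!(target_feature = "avx2") {
        features.push("avx2");
    }
    if cfg!(target_feature = "fma") {
        features.push("fma");
    }
    if cfg!(target_feature = "neon") {
        features.push("neon");
    }
    features
}
```

Printing this list at startup is a cheap sanity check: if `avx2` is missing on a build intended for an AVX2 machine, the RUSTFLAGS setting did not reach the compiler.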

Benchmarks

benchmarks

Backend Support

Backend	Supported
CPU	✅
Cuda	🚧

CPU instruction set	Supported
AVX2	✅
AVX512	❌
SSE	✅
Neon	✅

Contributions that add support for machines not in the list are welcome. Before contributing, please read the dev guide.

Documentation

For more details, visit https://jianqoq.github.io/Hpt/

License

Licensed under either of

Apache License, Version 2.0 (LICENSE-APACHE or http://www.apache.org/licenses/LICENSE-2.0)
MIT license (LICENSE-MIT or http://opensource.org/licenses/MIT)

at your option.

Contribution

Contributions are welcome. Please check https://jianqoq.github.io/Hpt/dev_guide/dev_guide.html for more details.
