forked from scikit-hep/awkward
ByteMaskedArray, BitMaskedArray, and tomask operation (scikit-hep#143)
* [WIP] Add a 'tomask' operation to make masked data, rather than filtering.
* Added stubs for ByteMaskedArray and BitMaskedArray.
* Linked ByteMaskedArray and BitMaskedArray to Python.
* [skip ci] Save work.
* [skip ci] Save work.
* Added UnmaskedArray stubs.
* Some of 'getitem' for ByteMaskedArray is done.
* ByteMaskedArray::asslice.
* ByteMaskedArray, BitMaskedArray, UnmaskedArray integrated into Python (everywhere there had been a reference to IndexedOptionArray64).
* Found a jagged indexing case that hadn't been covered before.
* [skip ci] save work
* [skip ci] save work; added 'carry'.
* [skip ci] This isn't working: 'jagged getitem on an array containing Nones'.
* Removed projections and carrying of SliceItems.
* Masked jagged arrays may now be used to slice masked jagged arrays (though the missing values have to be in the same places, of course).
* Finally, we're triggering ByteMaskedArray::getitem_next_jagged_generic
* And ByteMaskedArray::getitem_next_jagged_generic was easy.
* ByteMaskedArray::setidentities is done.
* ByteMaskedArray::deep_copy implemented (without testing).
* ByteMaskedArray::validityerror is implemented.
* ByteMaskedArray::num is implemented and tested.
* ByteMaskedArray::offsets_and_flattened is implemented and tested.
* ByteMaskedArray::rpad/rpad_and_clip are implemented and tested, fixing IndexedOptionArray::rpad/rpad_and_clip in the process.
* ByteMaskedArray::reducers are implemented and tested.
* ByteMaskedArray::localindex is implemented and tested.
* ByteMaskedArray::choose is implemented and tested, fixing List/ListOffset/RegularArray::choose in the process.
* Finished merging functions for option type arrays, but all other type arrays have to check for the new option type arrays.
* All other type arrays now check for the new option type arrays.
* The option-type and pass-through type 'simplify' methods are all aware of each other.
* Renamed to 'simplify_optiontype' and 'simplify_uniontype' and introduced a 'shallow_simplify'. (Not sure if we'll ever need a 'deep_simplify'...)
* Defined BitMaskedArray::toByteMaskedArray (and verified lsb_order against awkward0).
* Defined (but didn't test) toIndexedOptionArray64 as well.
* Implemented getitem_at/iteration and it agrees with conversions to ByteMaskedArray/IndexedOptionArray64.
* BitMaskedArray::bytemask uses the same code as BitMaskedArray::toByteMaskedArray.
* BitMaskedArray::getitem_range_nowrap remains a BitMaskedArray only if start % 8 == 0.
* BitMaskedArray is done.
* UnmaskedArray has been fully implemented, though the tests are minimal.
* Byte/Bit/UnmaskedArray boxing and unboxing in Numba is done and tested; needs 'hasfield', 'getitem_at', and 'lower_getitem_at'.
* ByteMaskedArray for Numba is done.
* BitMaskedArray for Numba is done.
* UnmaskedArray for Numba is done.
* All cases of IndexedOptionArray in Python that would be better served by ByteMaskedArray have been changed.
* Found all the places where ByteMaskedArray was needed in C++ and fixed the validwhen/lsb_order conventions.
* Implemented and tested tomask.
* Stubs for the 'semigroup' parameter.
* Remove those 'semigroup' stubs because it's already there as 'mask'. The high-level parameter name has been renamed to 'maskidentity' to be a little more clear. All I need to do is replace awkward_numpyarray_reduce_mask_indexedoptionarray64 with a ByteMaskedArray version.
* The whole 'semigroup'/'maskidentity' thing turned out to be just changing the already-implemented IndexedOptionArray64 into a ByteMaskedArray. Done with the PR.
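The first item contrasts masking with filtering. As a rough illustration of that distinction (not the code from this commit), the sketch below filters a flat array into a shorter one and, separately, keeps the full length while recording which entries are missing in a byte mask. The names `MaskedSketch`, `filter_values`, `to_mask`, and `is_missing` are hypothetical, and the `validwhen` handling is an assumption based on the constructor parameter of the same name.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical stand-in for a byte-masked array: full-length content
// plus one mask byte per element and the convention for "valid".
struct MaskedSketch {
  std::vector<double> content;
  std::vector<int8_t> mask;
  bool validwhen;
};

// Filtering: the result is shorter, rejected elements are dropped.
std::vector<double> filter_values(const std::vector<double>& values,
                                  const std::vector<bool>& keep) {
  assert(values.size() == keep.size());
  std::vector<double> out;
  for (std::size_t i = 0; i < values.size(); i++) {
    if (keep[i]) {
      out.push_back(values[i]);
    }
  }
  return out;
}

// Masking: the result has the same length, rejected elements are
// marked as missing instead of being removed.
MaskedSketch to_mask(const std::vector<double>& values,
                     const std::vector<bool>& keep,
                     bool validwhen) {
  assert(values.size() == keep.size());
  MaskedSketch out{values, std::vector<int8_t>(values.size()), validwhen};
  for (std::size_t i = 0; i < values.size(); i++) {
    // Store the byte that means "valid" under this validwhen convention.
    out.mask[i] = static_cast<int8_t>(keep[i] == validwhen ? 1 : 0);
  }
  return out;
}

// An element is missing when its mask byte disagrees with validwhen.
bool is_missing(const MaskedSketch& array, std::size_t i) {
  return (array.mask[i] != 0) != array.validwhen;
}
```

With values `{1.1, 2.2, 3.3}` and `keep = {true, false, true}`, filtering yields two elements while masking yields three, with the middle one flagged as missing, so positions still line up with the original array.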
Showing 53 changed files with 3,582 additions and 123 deletions.
@@ -0,0 +1,96 @@
// BSD 3-Clause License; see https://github.com/jpivarski/awkward-1.0/blob/master/LICENSE

#ifndef AWKWARD_BITMASKEDARRAY_H_
#define AWKWARD_BITMASKEDARRAY_H_

#include <string>
#include <memory>
#include <vector>

#include "awkward/cpu-kernels/util.h"
#include "awkward/Slice.h"
#include "awkward/Index.h"
#include "awkward/Content.h"
#include "awkward/array/ByteMaskedArray.h"
#include "awkward/array/IndexedArray.h"

namespace awkward {
  class EXPORT_SYMBOL BitMaskedArray: public Content {
  public:
    BitMaskedArray(const std::shared_ptr<Identities>& identities, const util::Parameters& parameters, const IndexU8& mask, const std::shared_ptr<Content>& content, bool validwhen, int64_t length, bool lsb_order);
    const IndexU8 mask() const;
    const std::shared_ptr<Content> content() const;
    bool validwhen() const;
    bool lsb_order() const;
    const std::shared_ptr<Content> project() const;
    const std::shared_ptr<Content> project(const Index8& mask) const;
    const Index8 bytemask() const;
    const std::shared_ptr<Content> simplify_optiontype() const;
    const std::shared_ptr<ByteMaskedArray> toByteMaskedArray() const;
    const std::shared_ptr<IndexedOptionArray64> toIndexedOptionArray64() const;

    const std::string classname() const override;
    void setidentities() override;
    void setidentities(const std::shared_ptr<Identities>& identities) override;
    const std::shared_ptr<Type> type(const std::map<std::string, std::string>& typestrs) const override;
    const std::string tostring_part(const std::string& indent, const std::string& pre, const std::string& post) const override;
    void tojson_part(ToJson& builder) const override;
    void nbytes_part(std::map<size_t, int64_t>& largest) const override;
    int64_t length() const override;
    const std::shared_ptr<Content> shallow_copy() const override;
    const std::shared_ptr<Content> deep_copy(bool copyarrays, bool copyindexes, bool copyidentities) const override;
    void check_for_iteration() const override;
    const std::shared_ptr<Content> getitem_nothing() const override;
    const std::shared_ptr<Content> getitem_at(int64_t at) const override;
    const std::shared_ptr<Content> getitem_at_nowrap(int64_t at) const override;
    const std::shared_ptr<Content> getitem_range(int64_t start, int64_t stop) const override;
    const std::shared_ptr<Content> getitem_range_nowrap(int64_t start, int64_t stop) const override;
    const std::shared_ptr<Content> getitem_field(const std::string& key) const override;
    const std::shared_ptr<Content> getitem_fields(const std::vector<std::string>& keys) const override;
    const std::shared_ptr<Content> getitem_next(const std::shared_ptr<SliceItem>& head, const Slice& tail, const Index64& advanced) const override;
    const std::shared_ptr<Content> carry(const Index64& carry) const override;
    const std::string purelist_parameter(const std::string& key) const override;
    bool purelist_isregular() const override;
    int64_t purelist_depth() const override;
    const std::pair<int64_t, int64_t> minmax_depth() const override;
    const std::pair<bool, int64_t> branch_depth() const override;
    int64_t numfields() const override;
    int64_t fieldindex(const std::string& key) const override;
    const std::string key(int64_t fieldindex) const override;
    bool haskey(const std::string& key) const override;
    const std::vector<std::string> keys() const override;

    // operations
    const std::string validityerror(const std::string& path) const override;
    const std::shared_ptr<Content> shallow_simplify() const override;
    const std::shared_ptr<Content> num(int64_t axis, int64_t depth) const override;
    const std::pair<Index64, std::shared_ptr<Content>> offsets_and_flattened(int64_t axis, int64_t depth) const override;
    bool mergeable(const std::shared_ptr<Content>& other, bool mergebool) const override;
    const std::shared_ptr<Content> reverse_merge(const std::shared_ptr<Content>& other) const;
    const std::shared_ptr<Content> merge(const std::shared_ptr<Content>& other) const override;
    const std::shared_ptr<SliceItem> asslice() const override;
    const std::shared_ptr<Content> rpad(int64_t length, int64_t axis, int64_t depth) const override;
    const std::shared_ptr<Content> rpad_and_clip(int64_t length, int64_t axis, int64_t depth) const override;
    const std::shared_ptr<Content> reduce_next(const Reducer& reducer, int64_t negaxis, const Index64& parents, int64_t outlength, bool mask, bool keepdims) const override;
    const std::shared_ptr<Content> localindex(int64_t axis, int64_t depth) const override;
    const std::shared_ptr<Content> choose(int64_t n, bool diagonal, const std::shared_ptr<util::RecordLookup>& recordlookup, const util::Parameters& parameters, int64_t axis, int64_t depth) const override;

    const std::shared_ptr<Content> getitem_next(const SliceAt& at, const Slice& tail, const Index64& advanced) const override;
    const std::shared_ptr<Content> getitem_next(const SliceRange& range, const Slice& tail, const Index64& advanced) const override;
    const std::shared_ptr<Content> getitem_next(const SliceArray64& array, const Slice& tail, const Index64& advanced) const override;
    const std::shared_ptr<Content> getitem_next(const SliceJagged64& jagged, const Slice& tail, const Index64& advanced) const override;
    const std::shared_ptr<Content> getitem_next_jagged(const Index64& slicestarts, const Index64& slicestops, const SliceArray64& slicecontent, const Slice& tail) const override;
    const std::shared_ptr<Content> getitem_next_jagged(const Index64& slicestarts, const Index64& slicestops, const SliceMissing64& slicecontent, const Slice& tail) const override;
    const std::shared_ptr<Content> getitem_next_jagged(const Index64& slicestarts, const Index64& slicestops, const SliceJagged64& slicecontent, const Slice& tail) const override;

  private:
    const IndexU8 mask_;
    const std::shared_ptr<Content> content_;
    const bool validwhen_;
    const int64_t length_;
    const bool lsb_order_;
  };

}

#endif // AWKWARD_BITMASKEDARRAY_H_
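The BitMaskedArray constructor takes a packed `IndexU8` mask together with `validwhen`, `length`, and `lsb_order`, and the commit message notes that a range slice stays a BitMaskedArray only if `start % 8 == 0`. The following is a minimal sketch of how those parameters could fit together, not the kernel added in this commit. It assumes `lsb_order` means element 0 lives in the least significant bit of the first byte, and that `length` is needed because the packed buffer is padded to a whole number of bytes. The functions `unpack_bits`, `is_valid`, and `range_stays_bit_aligned` are hypothetical names.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Expand a packed bit mask into one byte (0 or 1) per element.
// Assumption: lsb_order == true reads bits from least to most
// significant within each byte; false reads most significant first.
std::vector<int8_t> unpack_bits(const std::vector<uint8_t>& bits,
                                int64_t length,
                                bool lsb_order) {
  assert(length >= 0);
  assert(static_cast<std::size_t>(length) <= 8 * bits.size());
  std::vector<int8_t> out;
  out.reserve(static_cast<std::size_t>(length));
  for (int64_t i = 0; i < length; i++) {
    uint8_t byte = bits[static_cast<std::size_t>(i / 8)];
    int shift = lsb_order ? static_cast<int>(i % 8)
                          : static_cast<int>(7 - i % 8);
    out.push_back(static_cast<int8_t>((byte >> shift) & 1));
  }
  return out;
}

// An element counts as valid when its mask value equals validwhen.
bool is_valid(int8_t maskvalue, bool validwhen) {
  return (maskvalue != 0) == validwhen;
}

// A slice can reuse the packed buffer without shifting bits only when
// it starts on a byte boundary, which is presumably why the commit
// keeps a BitMaskedArray only for start % 8 == 0.
bool range_stays_bit_aligned(int64_t start) {
  return start % 8 == 0;
}
```

Under these assumptions, the byte `0x05` unpacks to `1, 0, 1` for the first three elements in LSB order, and the byte `0xA0` gives the same result in MSB order. A slice starting at element 3 would need every bit shifted, so converting to a byte mask first is the simpler path.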
@@ -0,0 +1,96 @@
// BSD 3-Clause License; see https://github.com/jpivarski/awkward-1.0/blob/master/LICENSE

#ifndef AWKWARD_BYTEMASKEDARRAY_H_
#define AWKWARD_BYTEMASKEDARRAY_H_

#include <string>
#include <memory>
#include <vector>

#include "awkward/cpu-kernels/util.h"
#include "awkward/Slice.h"
#include "awkward/Index.h"
#include "awkward/Content.h"

namespace awkward {
  class EXPORT_SYMBOL ByteMaskedArray: public Content {
  public:
    ByteMaskedArray(const std::shared_ptr<Identities>& identities, const util::Parameters& parameters, const Index8& mask, const std::shared_ptr<Content>& content, bool validwhen);
    const Index8 mask() const;
    const std::shared_ptr<Content> content() const;
    bool validwhen() const;
    const std::shared_ptr<Content> project() const;
    const std::shared_ptr<Content> project(const Index8& mask) const;
    const Index8 bytemask() const;
    const std::shared_ptr<Content> simplify_optiontype() const;
    const std::shared_ptr<Content> toIndexedOptionArray64() const;

    const std::string classname() const override;
    void setidentities() override;
    void setidentities(const std::shared_ptr<Identities>& identities) override;
    const std::shared_ptr<Type> type(const std::map<std::string, std::string>& typestrs) const override;
    const std::string tostring_part(const std::string& indent, const std::string& pre, const std::string& post) const override;
    void tojson_part(ToJson& builder) const override;
    void nbytes_part(std::map<size_t, int64_t>& largest) const override;
    int64_t length() const override;
    const std::shared_ptr<Content> shallow_copy() const override;
    const std::shared_ptr<Content> deep_copy(bool copyarrays, bool copyindexes, bool copyidentities) const override;
    void check_for_iteration() const override;
    const std::shared_ptr<Content> getitem_nothing() const override;
    const std::shared_ptr<Content> getitem_at(int64_t at) const override;
    const std::shared_ptr<Content> getitem_at_nowrap(int64_t at) const override;
    const std::shared_ptr<Content> getitem_range(int64_t start, int64_t stop) const override;
    const std::shared_ptr<Content> getitem_range_nowrap(int64_t start, int64_t stop) const override;
    const std::shared_ptr<Content> getitem_field(const std::string& key) const override;
    const std::shared_ptr<Content> getitem_fields(const std::vector<std::string>& keys) const override;
    const std::shared_ptr<Content> getitem_next(const std::shared_ptr<SliceItem>& head, const Slice& tail, const Index64& advanced) const override;
    const std::shared_ptr<Content> carry(const Index64& carry) const override;
    const std::string purelist_parameter(const std::string& key) const override;
    bool purelist_isregular() const override;
    int64_t purelist_depth() const override;
    const std::pair<int64_t, int64_t> minmax_depth() const override;
    const std::pair<bool, int64_t> branch_depth() const override;
    int64_t numfields() const override;
    int64_t fieldindex(const std::string& key) const override;
    const std::string key(int64_t fieldindex) const override;
    bool haskey(const std::string& key) const override;
    const std::vector<std::string> keys() const override;

    // operations
    const std::string validityerror(const std::string& path) const override;
    const std::shared_ptr<Content> shallow_simplify() const override;
    const std::shared_ptr<Content> num(int64_t axis, int64_t depth) const override;
    const std::pair<Index64, std::shared_ptr<Content>> offsets_and_flattened(int64_t axis, int64_t depth) const override;
    bool mergeable(const std::shared_ptr<Content>& other, bool mergebool) const override;
    const std::shared_ptr<Content> reverse_merge(const std::shared_ptr<Content>& other) const;
    const std::shared_ptr<Content> merge(const std::shared_ptr<Content>& other) const override;
    const std::shared_ptr<SliceItem> asslice() const override;
    const std::shared_ptr<Content> rpad(int64_t length, int64_t axis, int64_t depth) const override;
    const std::shared_ptr<Content> rpad_and_clip(int64_t length, int64_t axis, int64_t depth) const override;
    const std::shared_ptr<Content> reduce_next(const Reducer& reducer, int64_t negaxis, const Index64& parents, int64_t outlength, bool mask, bool keepdims) const override;
    const std::shared_ptr<Content> localindex(int64_t axis, int64_t depth) const override;
    const std::shared_ptr<Content> choose(int64_t n, bool diagonal, const std::shared_ptr<util::RecordLookup>& recordlookup, const util::Parameters& parameters, int64_t axis, int64_t depth) const override;

    const std::shared_ptr<Content> getitem_next(const SliceAt& at, const Slice& tail, const Index64& advanced) const override;
    const std::shared_ptr<Content> getitem_next(const SliceRange& range, const Slice& tail, const Index64& advanced) const override;
    const std::shared_ptr<Content> getitem_next(const SliceArray64& array, const Slice& tail, const Index64& advanced) const override;
    const std::shared_ptr<Content> getitem_next(const SliceJagged64& jagged, const Slice& tail, const Index64& advanced) const override;
    const std::shared_ptr<Content> getitem_next_jagged(const Index64& slicestarts, const Index64& slicestops, const SliceArray64& slicecontent, const Slice& tail) const override;
    const std::shared_ptr<Content> getitem_next_jagged(const Index64& slicestarts, const Index64& slicestops, const SliceMissing64& slicecontent, const Slice& tail) const override;
    const std::shared_ptr<Content> getitem_next_jagged(const Index64& slicestarts, const Index64& slicestops, const SliceJagged64& slicecontent, const Slice& tail) const override;

  protected:
    template <typename S>
    const std::shared_ptr<Content> getitem_next_jagged_generic(const Index64& slicestarts, const Index64& slicestops, const S& slicecontent, const Slice& tail) const;

    const std::pair<Index64, Index64> nextcarry_outindex(int64_t& numnull) const;

  private:
    const Index8 mask_;
    const std::shared_ptr<Content> content_;
    const bool validwhen_;
  };

}

#endif // AWKWARD_BYTEMASKEDARRAY_H_
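The protected helper `nextcarry_outindex(int64_t& numnull)` returns a pair of `Index64` values, but the header alone does not say what they contain. One plausible reading, inferred only from the names and the signature, is that the first index lists the positions of valid elements (usable to project the content) and the second maps every element either to its position in that projection or to -1 when it is missing. The sketch below implements that reading on plain vectors; `nextcarry_outindex_sketch` is a hypothetical free function, not the method from this commit.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <utility>
#include <vector>

// Assumed semantics, not confirmed by the header:
//   first  ("nextcarry"): positions of the valid elements, in order
//   second ("outindex"):  per element, its position in the projected
//                         content, or -1 if the element is missing
//   numnull:              number of missing elements
std::pair<std::vector<int64_t>, std::vector<int64_t>>
nextcarry_outindex_sketch(const std::vector<int8_t>& mask,
                          bool validwhen,
                          int64_t& numnull) {
  std::vector<int64_t> nextcarry;
  std::vector<int64_t> outindex(mask.size());
  numnull = 0;
  for (std::size_t i = 0; i < mask.size(); i++) {
    bool valid = (mask[i] != 0) == validwhen;
    if (valid) {
      outindex[i] = static_cast<int64_t>(nextcarry.size());
      nextcarry.push_back(static_cast<int64_t>(i));
    } else {
      outindex[i] = -1;
      numnull++;
    }
  }
  assert(nextcarry.size() + static_cast<std::size_t>(numnull) == mask.size());
  return std::make_pair(nextcarry, outindex);
}
```

Splitting the work this way would let an operation run on the projected, non-missing content only and then restore the missing entries through the second index, which matches the shape of an IndexedOptionArray64 index. That connection is an interpretation, not something the diff states.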