`ak.records_to_regular` to convert `[{"x": 1, "y": 2}, {"x": 3, "y": 4}]` into `[[1, 2], [3, 4]]` #3257

jpivarski · 2024-09-25T16:34:22Z

Description of new feature

Awkward Array's idiomatic form for data points with named features is to use RecordArray, which keeps each record field in a separate array (useful for loading or working with a subset of columns).

Machine learning libraries like to see a feature-set (an input vector into a neural network) as a regular dimension, either RegularArray or NumpyArray with inner_shape != () (which become the same thing after conversion out of Awkward). Unlike a RecordArray, the different features of the same vector are contiguous in memory.

Also unlike a RecordArray, the elements of a feature vector have no names. I do not know if there's a way to preserve these feature names, in PyTorch for instance, but it would be nice to do so in a conversion from Awkward Arrays into PyTorch Tensors.

For an array in which the records are one level deep,

>>> array = ak.Array([[{"pt": 0.0, "eta": 1.1}, {"pt": 2.2, "eta": 3.3}], [], [{"pt": 4.4, "eta": 5.5}]])

ak.records_to_regular can be implemented as

>>> ak.unflatten(ak.concatenate(ak.unzip(array), axis=1), 2, axis=1)
<Array [[[0, 2.2], [1.1, 3.3]], ..., [[4.4, ...]]] type='3 * var * 2 * float64'>

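but that snippet pairs values by field rather than by record whenever a list holds more than one record (the first list above comes out as [pt, pt], [eta, eta]), and in any case we're interested in a function that can be applied regardless of how deep the first level of records is. It would be written with recursively_apply. At some level of recursively_apply, you'd have passed through the list-type node and would be seeing the RecordArray directly: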

>>> array = ak.Array([{"pt": 0.0, "eta": 1.1}, {"pt": 2.2, "eta": 3.3}, {"pt": 4.4, "eta": 5.5}])

and then you'd want to do something like

>>> ak.concatenate([x[:, np.newaxis] for x in ak.unzip(array)], axis=1)
<Array [[0, 1.1], [2.2, 3.3], [4.4, 5.5]] type='3 * 2 * float64'>

(This preserves the length, 3, so it's suitable for use inside recursively_apply.)

This function would be useful for Awkward → ML conversions regardless of whether the data are ragged or not.

If RecordArrays are nested within one another, this function can be applied multiple times, turning each record level into a dimension.


jpivarski · 2024-10-02T16:01:09Z

Cc: @livaage, @GageDeZoort, @maxymnaumchyk, @ianna

jpivarski added the feature (New feature or request) label Sep 25, 2024

This was referenced Sep 25, 2024

feat: to/from PyTorch JaggedTensor #3246

Open

Function to convert ragged arrays into (Python) _lists_ of Tensors? #3265

Open

maxymnaumchyk mentioned this issue Oct 3, 2024

Add interoperability between Awkward Array and ML libraries #3267

Open

7 tasks
