The goal of silicate is to provide a general common form for complex multi-dimensional data.
There are two main motivations for silicate:
- to provide a central common form of structured data, and a "universal converter"
- to work with topological primitives for analysis and interaction.
Central is the PATH model, a normalized form of spatial data that is intermediate between standard explicit path forms and topological structures composed of simpler elements. A future version will provide a variety of PRIMITIVE models, but the final forms have not been settled yet. Together, PATH and its PRIMITIVE counterparts give a dual view of the two main types of structure used in complex data.
Paths are the turtle-head-down coordinate lists used by lines, polygons, polypath, geom_line.
Primitives are the edge-lists or triangle-lists or quad-lists used in rgl and in many topological structures. The key thing that makes them topological is a unique-vertex-pool, indexed by other types.
The need to provide a language of conversions between these forms for spatial data is illustrated here:
http://rpubs.com/cyclemumner/305595
Paths can be partly topological (as per PATH) in that a unique vertex pool is indexed by variable-length paths, and this is a key distinction from primitives, which have a constant number of indexed vertices per element. There's a clash here, because the most efficient representation for paths is very different from the most efficient representation for primitives.
silicate provides an intermediate form for paths: all instances of all coordinates in one table, and another "path" table that records how many of the coordinates (in native order) are used per path. This is a kind of run-length encoding, and it provides a common model that any path-based structure can use as a decomposition or re-composition form. A family of other functions provides the other component elements that we need, including segments, edges, arcs, nodes and vertices.
There are three key worker functions, sc_coord, sc_object and sc_path, which provide the basis for interpreting simple features from a silicate perspective.
- sc_coord returns the table of coordinates completely flattened, and with no normalization
- sc_object returns the highest level feature metadata
- sc_path returns the table of individual paths, with a coordinate count
From these three components we can generate other forms, and we can round-trip back to the original features.
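The decomposition and round trip described above can be sketched outside R. The following is a minimal Python illustration, not silicate's API: the feature layout, the `decompose` and `recompose` helpers and the column names `object` and `ncoords` are all assumptions made for the example.

```python
# Illustrative only: plain Python structures, not silicate's R tables.
# A "feature" here is a hypothetical dict with an id and a list of paths.
features = [
    {"id": "a", "paths": [[(0, 0), (1, 0), (1, 1), (0, 0)],
                          [(2, 2), (3, 2), (3, 3)]]},
    {"id": "b", "paths": [[(5, 5), (6, 6)]]},
]

def decompose(features):
    """Split features into object, path and coordinate tables."""
    objects, paths, coords = [], [], []
    for feature in features:
        objects.append({"object": feature["id"]})
        for path in feature["paths"]:
            # The path table stores only a count; coordinates keep native order.
            paths.append({"object": feature["id"], "ncoords": len(path)})
            coords.extend(path)
    return objects, paths, coords

def recompose(objects, paths, coords):
    """Rebuild the features by consuming coordinates run by run."""
    rebuilt = {row["object"]: [] for row in objects}
    start = 0
    for row in paths:
        end = start + row["ncoords"]
        rebuilt[row["object"]].append(coords[start:end])
        start = end
    return [{"id": key, "paths": value} for key, value in rebuilt.items()]

objects, paths, coords = decompose(features)
round_trip = recompose(objects, paths, coords)
```

Because the path table holds only counts, the coordinate table must stay in its native order; the link between the two is positional rather than keyed.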
There are further workers for the vertices (sc_node), and paths (sc_arc, sc_segment, sc_edge) which provide the necessary lower-level components. These inherently work with identifiers for components, and so they only make sense with reference to a PATH (or PRIMITIVE) silicate form.
- sc_node returns only those vertices that are met by three or more paths
- sc_arc returns the paths between nodes, and standalone paths that don't visit any nodes
- sc_segment returns all instances of any two-vertex line segment (directed)
- sc_edge returns only those unique edges, which are inherently undirected
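To make the difference between segments and edges concrete, here is a small Python sketch. It is not how silicate implements sc_segment or sc_edge: the integer vertex indices and the `segments` and `edges` helpers are assumptions standing in for silicate's identifiers.

```python
# Illustrative only: vertex indices stand in for silicate's vertex identifiers.
# Two closed triangular paths that share the boundary between vertices 1 and 2.
paths = [
    [0, 1, 2, 0],
    [1, 3, 2, 1],
]

def segments(paths):
    """All directed two-vertex segments, one per consecutive pair in each path."""
    return [(a, b) for path in paths for a, b in zip(path, path[1:])]

def edges(segs):
    """Unique undirected edges: order within a pair is ignored."""
    seen = []
    for a, b in segs:
        edge = (min(a, b), max(a, b))
        if edge not in seen:
            seen.append(edge)
    return seen

segs = segments(paths)
uniq = edges(segs)
```

The shared boundary appears twice among the segments, once as (1, 2) and once as (2, 1), but only once among the edges.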
(We possibly also want sc_vertex to return only the unique coordinates, but that can be obtained from PATH, though this area needs some care since unique might be in X-Y, Y-Z or X-Y-Z or any geometric space).
This generic set of workers is chosen because we often want the complete set of vertices in their pure form. Returning them with no grouping or identifiers and without any de-duplication means we have a representation of the pure geometry. Since the table has no other columns, generic code can be sure that all columns contain a coordinate. That means we don't need specialist code for 'XYZ', 'XYZM', 'XYT', 'TYX' and so on.
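As a rough illustration of why an all-coordinate table helps generic code, the Python sketch below computes an extent without knowing which dimensions are present. The dict-of-columns layout, the sample values and the `extent` helper are assumptions for the example, not part of silicate.

```python
# Illustrative only: a coordinate table as a dict of equal-length columns.
# The column names are hypothetical; the code never inspects them.
coord_xyz = {"x": [0.0, 1.0, 1.0], "y": [0.0, 0.0, 2.0], "z": [5.0, 7.0, 6.0]}
coord_xyt = {"x": [10.0, 12.0], "y": [-3.0, -1.0], "t": [100.0, 160.0]}

def extent(coord):
    """Per-column (min, max), valid because every column holds a coordinate."""
    return {name: (min(values), max(values)) for name, values in coord.items()}

extent_xyz = extent(coord_xyz)
extent_xyt = extent(coord_xyt)
```

The same function handles both tables because it never has to separate coordinate columns from identifier or grouping columns.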
The table of individual paths records which object it belongs to, how many coordinates there are and an ID for the path. This is not intended to be 'relational'; it's an intermediate form linked by pure indexing. Inserting more levels between the paths and the highest objects is possible, but it is not yet clear exactly how to do this.
The unjoin concept is central to mapping between unique vertices and a path's instances of those vertices as coordinates, in the right order. We can use the unjoin engine to add structure to other more generic data streams, like GPS, animal tracking, and general sensors.
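A minimal Python sketch of the idea follows, assuming that unjoin means splitting one table into a table of unique key combinations plus a link table that indexes it. The `unjoin` helper here is hypothetical and named only after the concept; it is not the R function, and the `vertex` column name is an assumption.

```python
# Illustrative only: a hypothetical `unjoin` helper, not the R function.
def unjoin(rows, keys):
    """Split rows into a table of unique key combinations and a link table.

    Returns (unique, links): `unique` lists each distinct key tuple once, in
    first-seen order; `links` repeats every input row with the key columns
    replaced by an index into `unique`.
    """
    index = {}
    unique, links = [], []
    for row in rows:
        key = tuple(row[k] for k in keys)
        if key not in index:
            index[key] = len(unique)
            unique.append(key)
        rest = {k: v for k, v in row.items() if k not in keys}
        links.append({"vertex": index[key], **rest})
    return unique, links

# A track that returns to its starting position.
track = [
    {"x": 0, "y": 0, "time": 1},
    {"x": 1, "y": 0, "time": 2},
    {"x": 0, "y": 0, "time": 3},
]
vertices, instances = unjoin(track, keys=("x", "y"))
```

The link table keeps the original row order, so the path can be rebuilt by looking up each `vertex` index in turn, while per-instance attributes such as time stay with the instance rather than the vertex.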
NOTE: An early implementation had PATH and PRIMITIVE forms, but the latter was too simplistic for real use. The functions sc_arc, sc_node, sc_segment and sc_edge were added to the basic sc_coord, sc_path and sc_object, as these are all that are required to move between any data structure in simple features or topological form. This is an ongoing area of development. See also sphier and svgplotr for related work.
The model functions PRIMITIVE and PATH should work in the following cases.
- to flip from one to another, PRIMITIVE(PATH(PRIMITIVE(x))) should work for any kind of 'x' model
- for any sf object
- convert to igraph objects WIP: https://github.com/mdsumner/scgraph
- spatstat objects WIP: https://github.com/mdsumner/scspatstat
- ...
The classes for all variants of simple features are not yet worked out; for instance, a MULTIPOINT can end up with a degenerate (and expensive) segment table.
More functions: sc_uid provides unique IDs, and sc_node is a worker for an arc-node intermediate model.
Intermediate models
- Arc-node (WIP)
- Monotone polygons (future work)
There is a hierarchy of sorts with layer, object, path, primitives, coordinates, and vertices.
The current design uses capitalized function names (PATH, PRIMITIVE, ...) that act on layers, while prefixed lower-case function names produce or derive the named entity at a given level for a given input. E.g. sc_path will decompose all the geometries in an sf layer to the PATH model and return them in generic form. PATH will decompose the layer as a whole, including the component geometries.
PATH() is the main model used to decompose inputs, as it is a more general form of the GIS idioms (simple features and georeferenced raster data). This treats connected paths as fully-fledged entities like vertices and objects are, creating a relational model that stores all vertices in one table, all paths in another, and all highest-level objects in another. The PATH model also takes the extra step of normalizing vertices, finding duplicates in a given geometric space and creating an intermediate link table to record all instances of the vertices. The PATH model does not currently normalize paths, but this is something that could be done, and is close to what arc-node topology is.
The PRIMITIVE function decomposes a layer into actual primitives rather than "paths"; these are point, line segment, triangle, tetrahedron, and so on.
Currently PATH() and PRIMITIVE() are the highest-level functions to decompose simple features objects.
There are decomposition functions for lower-level sf objects organized as sc_path, sc_coord, and sc_object. sc_path does all the work, building a simple map of all the parts and the vertex count. This is used to classify the vertex table when it is extracted, which makes the unique-id management for path-vertex normalization much simpler than it was in gris or rangl.
The key ones are OSM-like data with names; LiDAR data (xyz multipoints with time, colour, intensity, and groupings such as "contiguous surface"; lidR is good for this); animal track data (grouped multilines with time, depth, temperature etc. on the coordinates); and triangulations (again grouped structures, from rgl).
NOTE: earlier versions of this used the concept of "branch" rather than path, so there is some ongoing migration of the use of these words. Branch is a more general concept than is generally implemented in geo-spatial systems, and so path is more accurate. We reserve branch for possible future models that are general. A "point PATH" has meaning in the sense of being a single-vertex "path", and so a multipoint is a collection of these degenerate forms. "Path" as a concept is clearly rooted in optimization suited to planar forms, and so is more accurate than "branch".
In our terminology a branch or path is the group between the raw geometry and the objects, and so applies to a connected polygon ring, a closed or open linestring, or a single coordinate within a multipoint (a path with one vertex). In this scheme a polygon ring and a closed linestring are exactly the same (since they actually are exactly the same) and there are no plane-filling branches, or indeed volume-filling branches. This is a clear limitation of the branch model and it matches that used by GIS.
There are a number of notable exceptions in the spatial world, but unfortunately this highlights how fragmented the landscape is.
TopoJSON, Eonfusion, PostGIS, QGIS geometry generators, Fledermaus, ...
Other technologies are Mapbox, WebGL, Threejs, D3, AFrame, Lavavu ...
The silicate family is composed of a small number of packages that apply the principles here, either to read from path forms or primitive forms.
scdb, sctrip, scrgl, scraster, scicosa,