names = ['long', 'lat', 'depth', 'sigma_h', 'sigma_d', 'source_id', 'pred_depth', 'dens20', 'dens60', 'gravity', 'age', 'rate', 'sed thick', 'roughness']
This is the list of feature names used in the Python script that I'm running.
This section describes the columns in sample.cm, in the order in which they appear in the file.
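As a minimal sketch of how the names line up with the columns, the snippet below pairs each field of one record with the matching entry of names. It assumes sample.cm holds one record per line with whitespace-separated numeric fields, which these notes do not confirm. parse_cm_line is a hypothetical helper, not part of the script, and the example record is made up.

```python
names = ['long', 'lat', 'depth', 'sigma_h', 'sigma_d', 'source_id',
         'pred_depth', 'dens20', 'dens60', 'gravity', 'age', 'rate',
         'sed thick', 'roughness']


def parse_cm_line(line, names=names):
    """Map one record to {column name: value}.

    Assumption: fields are whitespace-separated and all numeric.
    """
    fields = line.split()
    if len(fields) != len(names):
        raise ValueError(f"expected {len(names)} fields, got {len(fields)}")
    return dict(zip(names, (float(f) for f in fields)))


# Made-up record, for illustration only (not real data).
example = "10.5 -20.25 -3500 1 0 42 -3480 0.1 0.2 5.0 30.0 20.0 100.0 15.0"
record = parse_cm_line(example)
print(record['long'], record['sed thick'])
```

Raising an error on a wrong field count makes a mismatch between names and the file layout show up immediately instead of silently shifting columns.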
long is, as the name suggests, the longitude at which the data was collected.
lat is the latitude at which the data was collected.
depth is the measured depth.
sigma_h is the horizontal standard deviation; sigma has its usual statistical meaning here.
sigma_d is a placeholder feature. We use this column to store the bad-data flag.
source_id is the ID assigned to the ship's cruise. It will probably make a good key for the SQL database.
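A rough sketch of how source_id could act as a key, using Python's built-in sqlite3 module. The table names, the integer type of source_id, and the inserted rows are all assumptions for illustration. Because one cruise contributes many measurements, the sketch treats source_id as the primary key of a cruise table and as an indexed foreign key in the measurement table, rather than as a unique key per row.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE cruise (
        source_id INTEGER PRIMARY KEY
    );
    CREATE TABLE sounding (
        id INTEGER PRIMARY KEY,
        source_id INTEGER NOT NULL REFERENCES cruise(source_id),
        depth REAL
    );
    CREATE INDEX idx_sounding_source ON sounding(source_id);
""")

# Made-up rows, for illustration only.
conn.execute("INSERT INTO cruise (source_id) VALUES (?)", (42,))
conn.executemany(
    "INSERT INTO sounding (source_id, depth) VALUES (?, ?)",
    [(42, -3500.0), (42, -3520.0)],
)

# Look up every measurement that belongs to one cruise.
count = conn.execute(
    "SELECT COUNT(*) FROM sounding WHERE source_id = ?", (42,)
).fetchone()[0]
print(count)
```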
pred_depth is the predicted seafloor depth. It is computed by an algorithm that combines depth predictions from gravity with the measured depth from a previous iteration of the model.
Yoav: I think it would make sense to have features that characterize the known depth in a region around the current point. Similarly, it would be good to have features that characterize the measurements before and after the current measurement point.
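One possible reading of the before/after suggestion, sketched below: for measurements ordered along a single cruise, compute the depth difference to the previous and to the next measurement. The function name and the choice of simple differences are assumptions for illustration; neither comes from the script, and the track values are made up.

```python
def neighbour_depth_features(depths):
    """For each depth in an ordered track, return (diff_prev, diff_next).

    diff_prev = depth[i] - depth[i-1], diff_next = depth[i+1] - depth[i].
    The end points get None where no neighbour exists.
    """
    features = []
    for i, d in enumerate(depths):
        diff_prev = d - depths[i - 1] if i > 0 else None
        diff_next = depths[i + 1] - d if i < len(depths) - 1 else None
        features.append((diff_prev, diff_next))
    return features


# Made-up track, for illustration only.
track = [-3500.0, -3520.0, -3510.0]
print(neighbour_depth_features(track))
# [(None, -20.0), (-20.0, 10.0), (10.0, None)]
```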
dens20 and dens60 are the data density filtered at 10 km and 30 km half-wavelengths, respectively (that is, 20 km and 60 km full wavelengths, matching the column names).
gravity is the vertical gravity gradient derived from satellite altimetry. It shows small-scale structures on the seafloor.
age is the age of the seafloor.
rate is the spreading rate, derived from the age grid above.
sed thick is the sediment thickness on the seafloor. Areas of thick sediment have a flat seafloor.
roughness is a measure of how rough the seafloor is. The topography is high-pass filtered at an 80 km half-wavelength and squared, then low-pass filtered at 80 km, and finally the square root is taken.
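The roughness recipe (high-pass, square, low-pass, square root) can be sketched on a 1-D depth profile as follows. This is an illustration under stated assumptions, not the processing actually used: a simple boxcar moving average stands in for the real filter, the 80 km cutoff is replaced by a hypothetical half_width given in samples, and the profiles are made up.

```python
import math


def moving_average(values, half_width):
    """Boxcar low-pass filter; the window shrinks at the ends."""
    out = []
    for i in range(len(values)):
        lo = max(0, i - half_width)
        hi = min(len(values), i + half_width + 1)
        window = values[lo:hi]
        out.append(sum(window) / len(window))
    return out


def roughness(topography, half_width):
    # High-pass: subtract the smoothed (long-wavelength) part.
    smooth = moving_average(topography, half_width)
    high_passed = [t - s for t, s in zip(topography, smooth)]
    # Square, low-pass, then take the square root.
    squared = [h * h for h in high_passed]
    return [math.sqrt(v) for v in moving_average(squared, half_width)]


# Made-up profiles, for illustration only.
flat = [-4000.0] * 9
bumpy = [-4000.0, -3900.0] * 4 + [-4000.0]
print(max(roughness(flat, 2)))       # flat profile
print(min(roughness(bumpy, 2)) > 0)  # alternating profile
```

The result is a local root-mean-square of the short-wavelength topography, so a perfectly flat profile gives zero everywhere and an alternating one gives positive values.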