Name		Name	Last commit message	Last commit date
parent directory ..
README.md		README.md
copy.scala		copy.scala
optd_por_public_filtered.csv		optd_por_public_filtered.csv

README.md

README

These datasets are based on Open Travel Data datasets (or OPTD for short).

File optd_por_public_filtered.csv

This dataset contains airports from all over the world.

The schema is as follows:

id: replica of the iata_code, is the primary key
iata_code: unique identifier of an airport (for instance, SFO for San Francisco)
envelope_id: the version (as in OPTD) taken for the airport in this dataset
name: name of the airport
latitude
longitude
date_from: start validity of the record
date_until: end validity of the record
comment
country_code
country_name
continent_name
timezone
wiki_link

The CSV has been generated from this csv on OPTD using the following snippet:

// This scripts simply prepares the input dataset, filtering the airports with non null IATA code and location type A.
// Only the most recent entry for each IATA code is kept.
import org.apache.spark.sql.functions.col
val input = ??? // path to optd_por_public.csv from open
val outputPath = ???
val rawCsv = spark.read.option("delimiter","^").option("header","true").csv(input)
val projected = rawCsv.withColumn("id", col("iata_code")).select("id", "iata_code", "envelope_id", "name", "latitude", "longitude", "date_from", "date_until", "comment", "country_code", "country_name", "continent_name", "timezone", "wiki_link")
projected.where(col("location_type")==="A" and col("iata_code").isNotNull).createOrReplaceTempView("table")
val airports = spark.sql("SELECT row_number() OVER (PARTITION BY iata_code ORDER BY envelope_id, date_from DESC) as r, * FROM table").where(col("r") === 1).drop("r")
airports.coalesce(1).write.mode("overwrite").option("delimiter","^").option("header", "true").csv(outputPath)

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

optd

optd

README.md

README

File optd_por_public_filtered.csv

Files

optd

Directory actions

More options

Directory actions

More options

Latest commit

History

optd

Folders and files

parent directory

README.md

README

File optd_por_public_filtered.csv