How to export and store trajectory and airspace data?

The traffic library is based on the pandas library for representing and manipulating trajectories and on the shapely library for manipulating geographical information like airways, beacons, airports and airspaces.

Various parsers are provided (feel free to file a PR if you use other tools with different data sources) but after applying different operations to imported data, you may want to export data for storing, sharing and loading again later.

Traffic and Flight structures

The question applies to pandas dataframes as well, with many opinions available on the net. In general, the question boils down to whether you want to distribute the data, how fast you need to access it and how long you need to keep it.

  • CSV (comma separated values) is a pretty standard and widely acknowledged format (modulo the definition of the separator). It is easy to parse but it can be slow when data gets large. Also it doesn’t contain information about types so you need to check dtypes and transform them manually if need be. A good rule of thumb could be to parse CSV data only once and to use another format for storing it for future use.
  • JSON (JavaScript Object Notation) is another lightweight text notation, human readable, also slow to parse. The added value compared to CSV is that you can distinguish boolean, numerical and string values.
  • Pickle is the standard format for Python serialisation. The binary representation of data is dumped as is in a file, no question asked. It is fast to read and write and you are sure to recover your data after you restart your Python interpreter/kernel. The downside is that the serialisation format may change with Python and pandas versions so it is not a good format for sharing and storing data in the time.
  • HDF (Hierarchical Data Format) is a cross platform and cross language standard format for storing very large amounts of data. You may need extra dependencies to read and write from this format.
  • Apache Parquet is another columnar cross platform and cross language standard storage format. Its implementation inside pandas is very fast for both read and write operations and the resulting files are rather compact. Types are respected but all Python structures (like sets, lists and dictionaries) may not be directly exportable. You may need extra dependencies to read and write from this format.

The Flight and Traffic structures implement the following methods:

class traffic.core.mixins.DataFrameMixin(data, *args, **kwargs)

DataFrameMixin aggregates a pandas DataFrame and provides the same representation methods.

classmethod from_file(filename, **kwargs)

Read data from various formats.

This class method dispatches the loading of data in various format to the proper pandas.read_* method based on the extension of the filename. Potential compression of the file is inferred by pandas itself based on the extension. :rtype: Optional[TypeVar(T, bound= DataFrameMixin)]

Other extensions return None. Specific arguments may be passed to the underlying pandas.read_* method with the kwargs argument.

Example usage:

>>> from traffic.core import Traffic
>>> t = Traffic.from_file(filename)
to_csv(filename, *args, **kwargs)

Exports to CSV format.

Options can be passed to pandas.DataFrame.to_csv() as args and kwargs arguments.

Read more: How to export and store trajectory and airspace data?

Return type:

None

to_hdf(filename, *args, **kwargs)

Exports to HDF format.

Options can be passed to pandas.DataFrame.to_hdf() as args and kwargs arguments.

Read more: How to export and store trajectory and airspace data?

Return type:

None

to_json(filename, *args, **kwargs)

Exports to JSON format.

Options can be passed to pandas.DataFrame.to_json() as args and kwargs arguments.

Read more: How to export and store trajectory and airspace data?

Return type:

None

to_parquet(filename, *args, **kwargs)

Exports to parquet format.

Options can be passed to pandas.DataFrame.to_parquet() as args and kwargs arguments.

Read more: How to export and store trajectory and airspace data?

Return type:

None

to_pickle(filename, *args, **kwargs)

Exports to pickle format.

Options can be passed to pandas.DataFrame.to_pickle() as args and kwargs arguments.

Read more: How to export and store trajectory and airspace data?

Return type:

None

Airspace structures

Airspaces are manipulated as GeoPandas GeoDataFrame.

The most appropriate format for exporting such data is probably the JSON format, which happens to also be compatible with many GIS toolsets and JavaScript visualisation libraries.

However, there are two types of JSON formats for geometries:

  • the GeoJSON format is the closest to the data we parse. You can export it directly from any GeoDataFrame, with the .to_json(). The community tends to prefer the .geojson suffix for this kind of files.
  • the TopoJSON format presents many advantages over the GeoJSON format. Data is stored as arcs and objects, and a single segment (an arc) may be referenced by many objects. More advanced tricks help keeping a file size often about 6 times smaller than the size of a GeoJSON.

Exports to TopoJSON may require the use of external libraries, such as topojson.

Warning

The difficult part about exporting to TopoJSON is the detection of arcs which may be shared by several objects. If arcs are not exact matches for both sides of a border, this may lead to invalid topologies. This is not a problem of the TopoJSON format, but a problem of data curation, under the responsibility of the data provider.