Evaluate metadata encoding #10

Open
mactrem opened this issue Aug 15, 2023 · 1 comment
mactrem commented Aug 15, 2023

A COVTiles layer is divided into a metadata section and a FeatureTable that holds the actual vector data.
The FeatureTable has a column-oriented layout and uses a custom binary encoding.
The structure of a FeatureTable is described in the metadata section; see, for example, the TypeScript decoder or the Protobuf schema. Currently, the metadata section is also encoded in a custom binary format.

However, encoding the metadata section in an existing file format would have several advantages: an existing Interface Definition Language (IDL) to describe the structure of the metadata, easy implementation of encoders and decoders, and out-of-the-box support for versioning that allows backward and forward compatibility of the metadata section. Other widely used state-of-the-art file formats take the same approach, combining an existing format for the metadata with a custom binary encoding for the actual data: for example, ORC with Protobuf, Parquet with Thrift, and Arrow with FlatBuffers.

Below is a brief comparison of the different formats with the currently used custom binary encoding:
Protobuf
Pros

  • Integrated Interface Definition Language (IDL) for describing the structure of the metadata, based on .proto files
  • Integrated support for versioning enables backward and forward compatibility of the metadata section (schema evolution)
  • Easy implementation of encoder/decoder

Cons

  • An additional library is needed to decode the metadata section, which increases the bundle size, although Mapbox's pbf library, for example, is only 3 KB in size
  • (Small) increase in tile size compared to the currently used binary encoding, mainly due to the additional tags (for versioning)
  • Compatibility only covers the metadata, not the actual data
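To make the tag overhead and the forward-compatibility argument concrete, here is a minimal sketch that hand-rolls a Protobuf-style tagged varint encoding next to an untagged one. It is not the COVTiles encoder and does not use a Protobuf library; the field numbers and names (`version`, `extent`, `num_features`) are hypothetical and chosen only for illustration.

```python
def encode_varint(value: int) -> bytes:
    """Encode a non-negative integer as a base-128 varint."""
    out = bytearray()
    while True:
        byte = value & 0x7F
        value >>= 7
        if value:
            out.append(byte | 0x80)  # more bytes follow
        else:
            out.append(byte)
            return bytes(out)


def decode_varint(buf: bytes, pos: int) -> tuple[int, int]:
    """Decode a varint starting at pos; return (value, next position)."""
    result = 0
    shift = 0
    while True:
        byte = buf[pos]
        pos += 1
        result |= (byte & 0x7F) << shift
        if not byte & 0x80:
            return result, pos
        shift += 7


def encode_tagged(fields: dict[int, int]) -> bytes:
    """Protobuf style: every value is preceded by a tag (field number << 3 | wire type 0)."""
    out = bytearray()
    for number, value in fields.items():
        out += encode_varint((number << 3) | 0)
        out += encode_varint(value)
    return bytes(out)


def encode_untagged(values: list[int]) -> bytes:
    """Custom-binary style: values only, in an order the decoder must already know."""
    return b"".join(encode_varint(v) for v in values)


def decode_tagged(buf: bytes, known: dict[int, str]) -> dict[str, int]:
    """Decode varint fields and skip field numbers this decoder does not know."""
    pos = 0
    result = {}
    while pos < len(buf):
        tag, pos = decode_varint(buf, pos)
        number, wire_type = tag >> 3, tag & 0x7
        if wire_type != 0:
            raise ValueError("this sketch only handles varint fields")
        value, pos = decode_varint(buf, pos)
        if number in known:
            result[known[number]] = value
    return result


# Hypothetical metadata values: version, extent, number of features.
v1_fields = {1: 1, 2: 4096, 3: 250}
tagged = encode_tagged(v1_fields)
untagged = encode_untagged(list(v1_fields.values()))
tag_overhead = len(tagged) - len(untagged)  # one tag byte per field here

# A newer writer adds field 4; an older reader only knows fields 1 to 3.
v2_bytes = encode_tagged({**v1_fields, 4: 7})
old_reader_view = decode_tagged(
    v2_bytes, {1: "version", 2: "extent", 3: "num_features"}
)
```

In this sketch the tagged form costs one extra byte per field, and in exchange the older reader can decode the newer message by skipping the field it does not recognize. With the untagged form, adding a field would shift the positions the old reader relies on.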

FlatBuffers
Pros

  • Same as Protobuf (IDL, versioning, convenient to use)
  • Faster decoding of the metadata section as FlatBuffers is optimized for speed

Cons

  • Same as Protobuf
  • In addition, because FlatBuffers is optimized for accessing serialized data without parsing/unpacking, the encoded size is larger than with Protobuf, mainly because there is no varint support and alignment adds overhead, ...
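The two size effects named above can be illustrated with a small sketch. It does not reproduce the actual FlatBuffers layout; it only compares varint sizes with fixed-width 32-bit integers and shows how padding to an alignment boundary adds bytes. The sample values are hypothetical.

```python
import struct


def varint_size(value: int) -> int:
    """Number of bytes a non-negative integer occupies as a base-128 varint."""
    size = 1
    while value >= 0x80:
        value >>= 7
        size += 1
    return size


def align(offset: int, alignment: int) -> int:
    """Round offset up to the next multiple of alignment."""
    return (offset + alignment - 1) // alignment * alignment


# Hypothetical metadata values, mostly small as is typical for counts and ids.
values = [3, 120, 4096, 70000]

# Varint encoding: small values take fewer bytes.
varint_total = sum(varint_size(v) for v in values)

# Fixed-width encoding: every value takes 4 bytes regardless of magnitude.
fixed_total = len(struct.pack(f"<{len(values)}I", *values))

# Alignment: a 1-byte field followed by a 4-byte field aligned to 4 bytes.
offset_after_u8 = 1
u32_start = align(offset_after_u8, 4)  # 3 padding bytes are inserted
padded_total = u32_start + 4
```

For these sample values the varint form needs fewer than half the bytes of the fixed-width form, and the aligned two-field layout spends 3 of its 8 bytes on padding. Real metadata will differ, so this is only meant to show the direction of the effect.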

Currently, I'm leaning toward replacing the custom binary encoding with one of the formats listed above, as the increase in tile size and library bundle size is rather negligible. Since COVTiles is optimized for size, with transfer over the network as its main use case, I would prefer Protobuf over FlatBuffers.

See #6 for the discussions around this topic.


mactrem commented Aug 17, 2023

The following table shows the increase in overall tile sizes when using Protobuf instead of the current custom binary encoding for the metadata:

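| Zoom Level | Delta Compressed % | Delta Gzip Compressed % |
|---|---|---|
| 2 | 0.65 | 0.28 |
| 3 | 0.89 | 0.38 |
| 4 | 1.12 | 0.44 |
| 5 | 1.15 | 0.47 |
| 6 | 1.41 | 0.63 |
| 7 | 1.33 | 0.63 |
| 8 | 1.68 | 0.78 |
| 9 | 2.22 | 0.65 |
| 10 | 3.36 | 0.89 |
| 11 | 4.49 | 1.15 |
| 12 | 2.73 | 0.78 |
| 13 | 3.14 | 0.85 |
| 14 | 0.79 | 0.31 |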

The increase in overall tile size is noticeable when the tiles are not Gzip-compressed, especially for the smaller tiles.
With Gzip compression, the increase is rather negligible.
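For a quick summary of the table above, the following sketch computes the mean and the maximum of both delta columns. The rows are copied from the table; the variable names are my own, and nothing here reflects how the original measurements were taken.

```python
from statistics import mean

# (zoom level, delta %, delta % with Gzip), copied from the table above.
rows = [
    (2, 0.65, 0.28),
    (3, 0.89, 0.38),
    (4, 1.12, 0.44),
    (5, 1.15, 0.47),
    (6, 1.41, 0.63),
    (7, 1.33, 0.63),
    (8, 1.68, 0.78),
    (9, 2.22, 0.65),
    (10, 3.36, 0.89),
    (11, 4.49, 1.15),
    (12, 2.73, 0.78),
    (13, 3.14, 0.85),
    (14, 0.79, 0.31),
]

# Row with the largest increase in each column.
worst_plain = max(rows, key=lambda r: r[1])
worst_gzip = max(rows, key=lambda r: r[2])

# Unweighted averages over the listed zoom levels.
mean_plain = mean(r[1] for r in rows)
mean_gzip = mean(r[2] for r in rows)

print(f"largest delta: zoom {worst_plain[0]} ({worst_plain[1]} %)")
print(f"largest Gzip delta: zoom {worst_gzip[0]} ({worst_gzip[2]} %)")
print(f"mean delta: {mean_plain:.2f} %, mean Gzip delta: {mean_gzip:.2f} %")
```

Note that the averages are unweighted across zoom levels; weighting by the number of tiles per zoom level would need data that is not given here.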
