I'm trying to understand this library better. Since every fst can be used with mmap, I know that the build process creates a serialized fst representation, and at read time, this representation is used in a zero-alloc way. After reading the code in the raw crate, I can kind of follow along, but I still don't understand the file format.
Is there an overview of the file format anywhere? Maybe just an example, to start? Thanks!
Unfortunately, I've never written docs for the format. The folks over at Couchbase did port this library to Go, though, and in the course of doing so, they sketched out the format in writing: https://github.com/couchbase/vellum/blob/master/docs/format.md. I haven't combed over it in detail, so I don't know how accurate it is. There have also been some minor changes (like support for CRC32) since that document was written.
Overall, the format itself was of my own devising, but it took a lot of inspiration from the various papers linked in the docs with respect to compressing finite automata. An FST is a "compressed data structure," meaning that its native, queryable form is itself compressed; there is no separate decompression step. So for example, most states inside an FST are represented by a single byte.
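To make the "single byte per state" idea a bit more concrete, here is a toy sketch. It is not the crate's real encoding: the tag value, the `COMMON_INPUTS` table, the `FINAL_STATE` marker, and the function names are all hypothetical, chosen only to illustrate how a state with one transition to the adjacent state could be packed into one byte and decoded straight out of a borrowed byte slice (such as an mmap'd file) without allocating.

```rust
// Toy illustration only: NOT the fst crate's actual on-disk format.
// A state with exactly one outgoing transition, whose target is the
// state stored at the previous address, is packed into one byte:
//   bits 7..6: tag (0b11 = "one transition, target is adjacent")
//   bits 5..0: index into a small table of common input bytes

const TAG_ONE_TRANS_NEXT: u8 = 0b11;
// Hypothetical marker for a final state with no transitions.
const FINAL_STATE: u8 = 0x00;
// Hypothetical table of frequent input bytes (at most 64 entries fit).
const COMMON_INPUTS: [u8; 4] = [b'a', b'e', b'o', b't'];

/// Pack a single-transition state into one byte, if the input is in the table.
fn encode_state(input: u8) -> Option<u8> {
    let idx = COMMON_INPUTS.iter().position(|&b| b == input)?;
    Some((TAG_ONE_TRANS_NEXT << 6) | idx as u8)
}

/// Decode the state at `addr` directly from the serialized bytes.
/// Returns the input byte and the target address; nothing is copied or allocated.
fn decode_state(data: &[u8], addr: usize) -> Option<(u8, usize)> {
    let byte = *data.get(addr)?;
    if byte >> 6 != TAG_ONE_TRANS_NEXT {
        return None; // not a packed single-transition state
    }
    let input = *COMMON_INPUTS.get((byte & 0b0011_1111) as usize)?;
    let target = addr.checked_sub(1)?; // target is implicit: the previous address
    Some((input, target))
}

/// Follow transitions from `start` until a non-packed state is reached.
/// The Vec exists only to show the result; decoding itself does not allocate.
fn walk(data: &[u8], start: usize) -> Vec<u8> {
    let mut out = Vec::new();
    let mut addr = start;
    while let Some((input, target)) = decode_state(data, addr) {
        out.push(input);
        addr = target;
    }
    out
}
```

With the bytes `[FINAL_STATE, encode_state(b'a'), encode_state(b'e'), encode_state(b't')]` (each `Option` unwrapped), calling `walk(&data, 3)` yields the bytes of `"tea"`: three transitions stored in three bytes, with the target addresses implied by position rather than written out.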