To help you get started quickly with zq, this repository contains small sample sets of Zeek data. There are six different log formats available, all representing events based on the same network traffic:
Directory | Format |
---|---|
zeek-default/ | Zeek default output format |
zeek-json/ | JSON as output by the Zeek package for JSON Streaming Logs |
zng/ | Binary ZNG, output with zq's default LZ4-compressed format |
zng-uncompressed/ | Binary ZNG, output with zq's option -zng.compress=false to disable compression |
zson/ | ZSON, a Zed text output format that has the look and feel of JSON |
This sample data is used frequently for a simple Zed performance test and to check for unexpected changes in the Zed output formats.
Because prior changes to the ZNG and ZSON output formats have added some bulk to the revision history, you'll typically want to save time by just downloading the latest revision:
# git clone --depth=1 https://github.com/brimdata/zed-sample-data.git
This sample data set was generated from a subset of the packet capture archives (formerly at https://archive.wrccdc.org/pcaps, though the site has been down of late) that are distributed by the WRCCDC.
This sample data is licensed under a Creative Commons Attribution-ShareAlike 4.0 International License, as it is built upon the WRCCDC PCAP data that is distributed under the same license.
We would like to express our thanks to the WRCCDC for generously making their packet capture archives available to the public and for commercial use. The terabytes of "real world" data have been invaluable to us in testing the foundations of zq at scale.
The data set was made from several PCAP files in the 2018 set. Zeek v6.2.0 was used in its default configuration, the only change being the addition and enabling of the JSON Streaming Logs package. The packet captures were then processed with these commands:
# mergecap -w wrccdc.pcap wrccdc.2018-03-24.10*.pcap
# zeek -r wrccdc.pcap local "JSONStreaming::enable_log_rotation=F"
This produced the logs in Zeek default and JSON formats. As ZNG and ZSON are not yet output directly by Zeek, these logs were created by sending each Zeek default log through zq, e.g.:
# mkdir -p zng && \
for file in zeek-default/*
do
  zq -f zng "$file" \
    | gzip -n > zng/"$(basename "$file" | sed 's/\.log\.gz//')".zng.gz
done
# mkdir -p zng-uncompressed && \
for file in zeek-default/*
do
  zq -f zng -zng.compress=false "$file" \
    | gzip -n > zng-uncompressed/"$(basename "$file" | sed 's/\.log\.gz//')".zng.gz
done
# mkdir -p zson && \
for file in zeek-default/*
do
  zq -f zson "$file" \
    | gzip -n > zson/"$(basename "$file" | sed 's/\.log\.gz//')".zson.gz
done
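The three loops above differ only in the output format, the output directory, the file extension, and one extra zq flag. As a sketch only, they could be folded into a single bash function. Note that convert_logs is a hypothetical helper, not a script shipped in this repository, and it assumes every file in zeek-default/ is named like conn.log.gz, so that basename's suffix argument can stand in for the sed call.

```shell
# Hypothetical helper (not part of this repository) that folds the three
# conversion loops into one bash function.
#   $1 = zq output format, e.g. zng or zson
#   $2 = output directory
#   $3 = output file extension, without the trailing .gz
#   any remaining arguments are passed through to zq, e.g. -zng.compress=false
convert_logs() {
  local fmt=$1 outdir=$2 ext=$3 file name
  shift 3
  mkdir -p "$outdir" || return
  for file in zeek-default/*; do
    # basename's second argument strips a trailing .log.gz,
    # matching the sed call for files named like conn.log.gz
    name=$(basename "$file" .log.gz)
    # gzip -n leaves the original name and timestamp out of the gzip header,
    # as in the loops above
    zq -f "$fmt" "$@" "$file" | gzip -n > "$outdir/$name.$ext.gz"
  done
}
```

For example, running convert_logs zng zng-uncompressed zng -zng.compress=false from the repository root should write the same files as the second loop.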
Since the sample ZNG and ZSON logs are generated by zq, regenerating these outputs is a useful zq test. Assuming zq is in your $PATH, a script is provided to regenerate the hash for each ZNG and ZSON log and compare it to a last known "good" hash stored in the md5sums/ directory.
# scripts/check_md5sums.sh zng
capture_loss:62949d22a0a557342d28ee5ee4b64d50
...
x509:10333d3d004c718b04cbedb8ee195cca
diff'ing current "zq -f zng" output hashes vs. committed hashes:
7c7
< ftp:c84824c8114df4db745399ff875b0d92
---
> ftp:2d8d90df3c4b84eb9e281a3f10767aa5
======> diffs detected! Check for a zq bug or intentional zng format change.
Current hashes are in /var/folders/yn/jbkxxkpd4vg142pc3_bd_krc0000gn/T/tmp.9X7Gab9I
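The kind of comparison the script performs can be sketched roughly as follows. This is not the contents of scripts/check_md5sums.sh. It is a minimal bash sketch that assumes the hashes are taken over the uncompressed zq output, that the known-good list holds one name:hash line per log (as in the output above), and that the list is a file passed as an argument. The function names are hypothetical, and the sketch relies on GNU coreutils md5sum; on macOS, md5 -q prints just the hash instead.

```shell
# Minimal sketch of an md5-based regression check for zq output.
# Assumptions (not taken from scripts/check_md5sums.sh): hashes cover the
# uncompressed zq output, and the known-good list holds one "name:hash"
# line per log, in the same order as the glob below.

# Print "name:hash" for the bytes arriving on stdin (GNU md5sum).
hash_line() {
  printf '%s:%s\n' "$1" "$(md5sum | cut -d' ' -f1)"
}

# Regenerate hashes for every Zeek default log in format $1 and
# diff them against the known-good list in file $2.
check_format() {
  local fmt=$1 good=$2 current file
  current=$(mktemp)
  for file in zeek-default/*; do
    zq -f "$fmt" "$file" | hash_line "$(basename "$file" .log.gz)"
  done > "$current"
  if diff "$good" "$current"; then
    echo "no diffs"
  else
    echo "diffs detected; current hashes are in $current"
    return 1
  fi
}
```

A call such as check_format zng md5sums/zng.md5 would then play the role of scripts/check_md5sums.sh zng, though the file name md5sums/zng.md5 is hypothetical.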