Data Schema
Reference for everything CollarID writes to the SD card and transmits over LoRa/LoRaWAN. Use this when parsing files yourself or citing data formats in publications.
SD Card Layout
The SD card uses an exFAT filesystem. The firmware writes the following structure on the card root:
/
├── config.csv - Schedule + radio configuration
├── lora_state.bin - LoRaWAN session state (nonces, session buffer ~314 B)
├── METADATA.CSV - Periodic samples + status messages (GPS, light, mag, env, particulate, battery, status)
├── audio/
│   └── YYYY/MM/DD/
│       └── YYYYMMDDHHMMSS.wav - 16 kHz, 16-bit, mono PCM
└── accelerometer/
    └── YYYY/MM/DD/
        └── YYYYMMDDHHMMSS.wav - 3-channel (X/Y/Z) 16-bit PCM, daily rollover
Folders are created on first write. Year/month/day folders use 4-digit and zero-padded 2-digit names respectively (e.g. 2026/04/15).
METADATA.CSV
The primary log file. Every line is a single sample, comma-separated, no header. Format:
<unix_epoch_seconds>,<tag>,<tag-specific payload columns…>
The SD Card page parses this file into per-subsystem CSVs you can download. The tags below match the firmware's snprintf format strings exactly — if you parse the file yourself, treat any unknown tag as forward-compatible noise and skip it.
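If you parse the file yourself, the skip-unknown-tags rule can be sketched as follows. This is a minimal illustration, not the website parser: the function name parse_metadata_line and the KNOWN_TAGS set are assumptions made for the example, and the tag names are the ones documented on this page.

```python
# Tags documented on this page; anything else is skipped.
KNOWN_TAGS = {
    "gps", "light", "mag", "battery", "particulate",
    "COMP_T", "COMP_H", "RAW_GAS", "RAW_P", "status",
}

def parse_metadata_line(line):
    """Split one METADATA.CSV line into (epoch, tag, payload), or None to skip it."""
    parts = line.strip().split(",")
    if len(parts) < 3:
        return None                 # blank or truncated line
    try:
        epoch = int(parts[0])
    except ValueError:
        return None                 # timestamp is not an integer
    tag = parts[1]
    if tag not in KNOWN_TAGS:
        return None                 # unknown tag: treat as forward-compatible noise
    # Note: status text messages may contain commas and need rejoining later.
    return epoch, tag, parts[2:]
```

Returning None for anything unrecognised keeps the caller's loop simple: filter out None and dispatch on the tag.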
Tag: gps
One row per accepted GPS fix. Coordinates are stored as fixed-point integers to avoid float imprecision in CSV.
| Column | Type | Decoding / units |
|---|---|---|
| latitude_e7 | int32 | ÷ 1e7 → degrees north (negative = south) |
| longitude_e7 | int32 | ÷ 1e7 → degrees east (negative = west) |
| altitude_mm | int32 | ÷ 1000 → metres above sea level (MSL) |
| h_acc_dm | uint32 | ÷ 10 → metres horizontal accuracy (1σ) |
| fix_type | uint8 | GNSS fix type: 0=no fix, 2=2D, 3=3D, 4=DR+GNSS |
| fix_time_ms | uint32 | ÷ 1000 → seconds spent acquiring this fix |
Example: 1777629812,gps,374215600,-1224310500,42100,15,3,8500 → lat 37.4215600°, lon -122.4310500°, alt 42.1 m, h_acc 1.5 m, 3D fix, took 8.5 s.
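The decoding rules in the table can be sketched in a few lines. The function name decode_gps and the output key names are illustrative choices, not part of the firmware or the website parser; the input is the example row above.

```python
def decode_gps(payload):
    """Convert the six gps payload columns (strings) to physical units."""
    lat_e7, lon_e7, alt_mm, h_acc_dm, fix_type, fix_ms = (int(v) for v in payload)
    return {
        "lat_deg": lat_e7 / 1e7,        # degrees north, negative = south
        "lon_deg": lon_e7 / 1e7,        # degrees east, negative = west
        "alt_m": alt_mm / 1000,         # metres above mean sea level
        "h_acc_m": h_acc_dm / 10,       # horizontal accuracy in metres
        "fix_type": fix_type,           # 0=no fix, 2=2D, 3=3D, 4=DR+GNSS
        "fix_time_s": fix_ms / 1000,    # seconds spent acquiring the fix
    }

row = "1777629812,gps,374215600,-1224310500,42100,15,3,8500"
fix = decode_gps(row.split(",")[2:])    # skip the epoch and tag columns
```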
Tag: light
Ambient light sensor.
| Column | Type | Notes |
|---|---|---|
| clear | uint32 | Clear-channel raw count |
| als | uint32 | ALS (lux) raw count — convert using the sensor's gain & integration time setting from the datasheet |
Tag: mag
3-axis magnetometer (raw counts, not µT). See Sensor Axis Map for the body-frame orientation.
| Column | Type | Notes |
|---|---|---|
| x | int16 | Body-frame X axis raw count (1 LSB ≈ 1.5 mGauss / 0.15 µT) |
| y | int16 | Body-frame Y axis raw count |
| z | int16 | Body-frame Z axis raw count |
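A rough conversion from raw counts to microtesla might look like the sketch below. It assumes the approximate 0.15 µT per LSB scale quoted for the X axis applies to all three axes, and it does no hard-iron or soft-iron calibration; the names UT_PER_LSB and mag_counts_to_ut are illustrative.

```python
import math

UT_PER_LSB = 0.15  # approximate scale from the table (1 LSB ~ 1.5 mGauss); assumed equal on all axes

def mag_counts_to_ut(x, y, z):
    """Return (bx, by, bz, magnitude) in microtesla from raw int16 counts."""
    bx, by, bz = (c * UT_PER_LSB for c in (x, y, z))
    return bx, by, bz, math.sqrt(bx * bx + by * by + bz * bz)
```

Because the scale is approximate, treat the result as indicative field strength rather than a calibrated measurement.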
Tag: battery
| Column | Type | Decoding / units |
|---|---|---|
| voltage_mv | uint32 | ÷ 1000 → battery terminal voltage in volts. Sampled every 60 s by the master loop. |
Tag: particulate
Particulate-matter (PM) sensor. One summary row per measurement window.
| Column | Type | Units |
|---|---|---|
| pm1 | uint32 | µg/m³ (PM1.0) |
| pm2_5 | uint32 | µg/m³ (PM2.5) |
| pm10 | uint32 | µg/m³ (PM10) |
Tags: COMP_T · COMP_H · RAW_GAS · RAW_P
Environmental sensor, processed by the BSEC2 calibration library. Each tag is its own row with a single value plus a BSEC2 accuracy rank (0–3, where 3 is fully calibrated).
| Tag | Signal | Type | Units | Notes |
|---|---|---|---|---|
| COMP_T | Compensated temperature | float | °C | Self-heating compensated |
| COMP_H | Compensated relative humidity | float | % | Self-heating compensated |
| RAW_GAS | Gas sensor resistance | float | Ω | For VOC / IAQ post-processing |
| RAW_P | Pressure | float | Pa | Divide by 100 for hPa / mbar |
Layout per row: <epoch>,<tag>,<value>,<accuracy>. Sample:
1777629682,RAW_P,94529.398,0
1777629682,RAW_GAS,279475.969,0
1777629682,COMP_T,21.922,0
1777629682,COMP_H,39.226,0
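Because each environmental signal is its own row, it is often convenient to regroup rows that share a timestamp. The sketch below is one way to do that; group_env_rows and ENV_TAGS are illustrative names, and the sample rows are the ones shown above.

```python
from collections import defaultdict

ENV_TAGS = {"COMP_T", "COMP_H", "RAW_GAS", "RAW_P"}

def group_env_rows(lines):
    """Return {epoch: {tag: (value, accuracy)}} for the environmental rows."""
    out = defaultdict(dict)
    for line in lines:
        parts = line.strip().split(",")
        if len(parts) != 4 or parts[1] not in ENV_TAGS:
            continue                      # not an environmental row
        epoch, tag, value, accuracy = parts
        out[int(epoch)][tag] = (float(value), int(accuracy))
    return dict(out)

sample = [
    "1777629682,RAW_P,94529.398,0",
    "1777629682,RAW_GAS,279475.969,0",
    "1777629682,COMP_T,21.922,0",
    "1777629682,COMP_H,39.226,0",
]
env = group_env_rows(sample)
pressure_hpa = env[1777629682]["RAW_P"][0] / 100   # Pa to hPa
```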
Tag: status
Free-form status messages emitted by the firmware. Two layouts:
- Step counter: <epoch>,status,step,<count> (periodic accumulator read).
- Text message: <epoch>,status,<message>. The message field may itself contain commas, so rejoin everything from the third column onward.
The website parser categorizes text messages by prefix: BOOT:, DIAG:, GPS fix, GPS fine, with everything else falling to Other. New prefixes can be added without breaking existing exports.
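A status row can be handled by splitting on the first two commas only, which keeps any commas inside the message intact. The sketch below is an approximation of the behaviour described above, not the website parser itself: parse_status is an illustrative name, using the prefix string as the category label is an assumption, and the message texts in the usage lines are hypothetical.

```python
PREFIXES = ("BOOT:", "DIAG:", "GPS fix", "GPS fine")

def parse_status(line):
    """Parse one status row into a step-counter or text-message record."""
    epoch, tag, rest = line.rstrip("\n").split(",", 2)   # message stays in one piece
    if tag != "status":
        raise ValueError("not a status row")
    if rest.startswith("step,"):
        count = rest.split(",", 1)[1]
        if count.isdigit():
            return {"epoch": int(epoch), "kind": "step", "count": int(count)}
    # Anything else is a text message; categorise by documented prefix.
    category = next((p for p in PREFIXES if rest.startswith(p)), "Other")
    return {"epoch": int(epoch), "kind": "text",
            "category": category, "message": rest}

step = parse_status("1777629812,status,step,1234")
text = parse_status("1777629812,status,BOOT: example, with comma")  # hypothetical message
```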
Audio WAV Files
Recorded by the PDM MEMS microphone via the MCU's hardware PDM-to-PCM filter. Each recording window in the schedule produces one file; if a path collision occurs the firmware appends _<n>.
| Property | Value |
|---|---|
| Container | RIFF/WAVE with a 44-byte header (PCM) |
| Sample rate | 16 000 Hz |
| Bit depth | 16-bit signed, little-endian |
| Channels | 1 (mono) |
| Path | /audio/YYYY/MM/DD/YYYYMMDDHHMMSS.wav |
| Filename time | UTC, derived from the file's open time |
| Duration | Determined by the active schedule's microphone window |
Files open and play directly in any standard audio tool (Audacity, ffmpeg, librosa, scipy.io.wavfile). The 16 kHz sample rate captures audio up to 8 kHz (Nyquist), which is well-suited to vocalisations and ambient sound but excludes high-frequency ultrasonics.
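When batch-processing recordings, the start time can be recovered from the filename and the header checked against the table above. This is a sketch using only the standard library; audio_start_utc and check_audio_header are illustrative names, and the handling of the _<n> collision suffix assumes it is appended to the timestamp stem before the extension.

```python
import wave
from datetime import datetime, timezone
from pathlib import Path

def audio_start_utc(path):
    """Parse the UTC open time from a YYYYMMDDHHMMSS[_n].wav filename."""
    stem = Path(path).stem.split("_")[0]   # drop an optional _<n> collision suffix
    return datetime.strptime(stem, "%Y%m%d%H%M%S").replace(tzinfo=timezone.utc)

def check_audio_header(path):
    """Verify 16 kHz / 16-bit / mono and return the duration in seconds."""
    with wave.open(str(path), "rb") as w:
        fmt = (w.getframerate(), w.getsampwidth(), w.getnchannels())
        if fmt != (16000, 2, 1):
            raise ValueError(f"unexpected audio format: {fmt}")
        return w.getnframes() / w.getframerate()
```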
Accelerometer WAV Files
Raw 3-axis accelerometer samples are streamed off the sensor FIFO and written as a 3-channel PCM WAV file. The WAV container is convenient because most tools handle it natively, even though the data is motion rather than audio.
| Property | Value |
|---|---|
| Container | RIFF/WAVE, 44-byte header, PCM |
| Channels | 3 — channel order X, Y, Z (body frame; see Sensor Axis Map) |
| Bit depth | 16-bit signed, little-endian |
| Sample rate | Configurable per schedule (see table below); written into the WAV header |
| Range | Configurable per schedule (see table below) |
| Path | /accelerometer/YYYY/MM/DD/YYYYMMDDHHMMSS.wav |
| Rollover | One file per UTC day. If the firmware detects an accelerometer subsystem error, the current file is closed and a new one is opened in the same daily folder with a fresh timestamp. |
Sample-rate options
Configured per schedule slot. The actual rate is written into the WAV header, so any standard tool will read the file at the correct rate.
| Sample rate |
|---|
| 25 Hz |
| 50 Hz (default) |
Higher sample rates are supported by the underlying hardware but are not exposed as options in the current software revision.
Range options & LSB-to-g conversion
The accelerometer reports signed 12-bit samples that the firmware sign-extends and stores as int16. To convert int16 samples to acceleration in g, divide by the LSB-per-g factor for the configured range:
| Full scale | LSB per g | Resolution per LSB |
|---|---|---|
| ±2 g | 1024 | ≈ 0.98 mg |
| ±4 g (default) | 512 | ≈ 1.95 mg |
| ±8 g | 256 | ≈ 3.91 mg |
To pick the right factor, use the range that was set in config.csv at the time of the deployment. Cross-referencing by file timestamp against schedule windows works for most cases.
Example Python decoder:
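import wave
import numpy as np

def load_accel(path, range_g=4):
    """Load an accelerometer WAV; return time (s) and acceleration (g)."""
    with wave.open(path, "rb") as w:
        rate = w.getframerate()  # sample rate as written in the WAV header
        n = w.getnframes()
        # "<i2" = little-endian int16, independent of host byte order
        raw = np.frombuffer(w.readframes(n), dtype="<i2").reshape(-1, 3)
    lsb_per_g = {2: 1024, 4: 512, 8: 256}[range_g]  # counts per g for each range
    g = raw / lsb_per_g
    t = np.arange(len(g)) / rate
    return t, g  # t in seconds, g shape (N, 3) for X/Y/Z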
config.csv
Persisted device configuration. Written on every boot and re-read on startup; also re-written by BLE configuration commits. Three sections, each with key-value rows.
System
firmware_version,<git-hash>
system_uid,<MCU unique ID word 0>
engaged,yes|no
Radio
lorawan_activation_type,OTAA|ABP
lorawan_region,US915|EU868|AU915
lorawan_dev_eui,<16-char hex>
lorawan_join_eui,<16-char hex>
lorawan_app_key,<32-char hex>
lora_spreading_factor,SF7|SF8|SF9|SF10|SF11|SF12
lora_bandwidth,125|250|500
lora_coding_rate,4/5|4/6|4/7|4/8
lora_tx_power_dbm,<int>
lora_frequency,<MHz>
lost_mode_enabled,0|1
lost_mode_activation_epoch,<unix epoch>
lost_mode_transmit_interval_min,<int>
Schedule (5 columns, one per slot)
parameter_name,schedule1,schedule2,schedule3,schedule4,schedule5
start_hour,0,6,12,18,0
end_hour,6,12,18,24,0
gps_enabled,1,1,0,1,0
gps_sample_interval_min,30,15,0,30,0
gps_accuracy,7,8,0,7,0
... (continues for accelerometer, microphone, light, environmental,
particulate, magnetometer, lorawan, lora)
If every parameter in a schedule slot is left at 0, that slot is treated as inactive and the device will not execute it. In the example above, schedule5 is all zeros and is therefore skipped — the active schedule rotation is just slots 1–4.
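The all-zeros rule can be checked offline with a short script. The sketch below assumes the schedule section has already been cut out of config.csv as text, since how the three sections are delimited is not specified here. The name inactive_slots is illustrative, and the example contains only the rows shown above, not the full parameter list.

```python
import csv
import io

def inactive_slots(schedule_text):
    """Return the slot names whose parameters are all 0."""
    rows = [r for r in csv.reader(io.StringIO(schedule_text)) if r]
    header, params = rows[0], rows[1:]
    inactive = []
    for col, name in enumerate(header[1:], start=1):
        values = [r[col].strip() for r in params if len(r) > col]
        if values and all(v == "0" for v in values):
            inactive.append(name)
    return inactive

example = """parameter_name,schedule1,schedule2,schedule3,schedule4,schedule5
start_hour,0,6,12,18,0
end_hour,6,12,18,24,0
gps_enabled,1,1,0,1,0
gps_sample_interval_min,30,15,0,30,0
gps_accuracy,7,8,0,7,0
"""
skipped = inactive_slots(example)
```

Note that schedule3 has GPS disabled but non-zero start and end hours, so it still counts as an active slot.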
For reproducibility in publications, snapshot config.csv alongside your data — it captures the exact firmware version and per-schedule sensor settings active during the deployment.
LoRa / LoRaWAN Payload
All over-the-air payloads are nanopb-encoded protobuf. Audio and raw accelerometer samples are not transmitted — only summary statistics — due to LoRa bandwidth constraints.
What's transmitted
Each periodic uplink can carry any subset of the following, depending on which sensors are active in the current schedule:
- Up to 5 most recent GPS fixes (lat / lon / altitude / accuracy / fix duration)
- Environmental summary (temperature, humidity, gas resistance, pressure, lux)
- Particulate summary (PM1, PM2.5, PM10)
- Battery percentage and SD-card free space
- Step counter
- Accelerometer windowed statistics (ODBA / VeDBA mean & max) — live on-device computation is currently TBD; the field exists in the wire format but is not yet populated from real samples
- Recent error-flag bitfield
- Radio statistics (RSSI / SNR) appended on RX
Downlinks (server → device) carry a small set of operational commands — time sync, GPS-accuracy override, remote engage/disengage, system reset, scheduled deployment drop-off, and transactional schedule updates — plus an epoch field that the firmware uses to nudge the RTC.
LoRaWAN region presets
US915, EU868, and AU915 are LoRaWAN regional specifications — they apply only in LoRaWAN mode. Raw LoRa (P2P) mode uses a user-configurable carrier frequency (lora_frequency in config.csv) and is not tied to these regions.
| Region | Activation | SF range |
|---|---|---|
| US915 | OTAA or ABP | SF7–SF12 |
| AU915 | OTAA or ABP | SF7–SF12 |
| EU868 | OTAA or ABP | SF7–SF12 |
Adaptive data rate (P2P mode only)
When operating in raw LoRa P2P mode, the firmware picks a data rate based on payload size. The selection table the firmware uses depends on which sub-GHz band the radio is configured for (US/AU 915 MHz vs EU 868 MHz):
| Band | Adaptive DR ladder |
|---|---|
| 915 MHz (US / AU) | ≤40 B → DR0; ≤65 B → DR2; ≤110 B → DR3 |
| 868 MHz (EU) | ≤40 B → DR1; ≤65 B → DR3; ≤140 B → DR4 |
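For receiver-side tooling it can help to mirror the ladder in code. The sketch below encodes the table as data; pick_data_rate and the band keys "915" and "868" are illustrative names, and the behaviour for payloads larger than the last rung is not documented here, so the function returns None in that case.

```python
# (max payload bytes, data rate) pairs, smallest first, copied from the table.
DR_LADDER = {
    "915": [(40, 0), (65, 2), (110, 3)],   # US / AU
    "868": [(40, 1), (65, 3), (140, 4)],   # EU
}

def pick_data_rate(band, payload_len):
    """Return the DR index for a payload size, or None if it exceeds the ladder."""
    for max_len, dr in DR_LADDER[band]:
        if payload_len <= max_len:
            return dr
    return None   # larger than the last rung: firmware behaviour not documented
```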
Wire-format schema
The full .proto definition (top-level message_packet, every nested message, exact field tags and types) is available upon request for users building their own LoRa receivers, ChirpStack integrations, or downstream pipelines. Reach out via the Support page (category: Data) and we'll send you the canonical schema and decoding examples.
GPS Uplink
GPS fixes appear in two places. On the SD card, each accepted fix is logged as a single gps,... row in METADATA.CSV. Over LoRa/LoRaWAN, the most recent fixes (up to 5) are appended to each periodic uplink — each fix carries latitude, longitude, altitude, fix timestamp, and optional accuracy metrics (HDOP and horizontal accuracy in metres). The exact wire-format schema is available upon request via the Support page.
Timestamps & Time Sync
- All on-device timestamps are Unix epoch in seconds, UTC.
- The RTC is set from the first GPS fix at boot. If GPS times out (10 minutes), the device begins executing its schedule with an unsynced clock and resyncs at its next GPS sampling interval.
- LoRaWAN downlinks include an epoch field that the firmware uses to nudge the RTC if drift accumulates.
- File and folder names use UTC year/month/day (e.g. /audio/2026/04/15/) regardless of the deployment's local timezone; this avoids day-boundary ambiguity when collars cross meridians.
- Files written before the first successful GPS fix carry RTC defaults (typically 2000-01-01 until set). The website parser ignores any row with epoch < 2020-01-01 UTC.
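If you parse METADATA.CSV yourself, the same cutoff the website parser applies can be reproduced as below. MIN_VALID_EPOCH and has_valid_time are illustrative names; only the 2020-01-01 UTC threshold comes from this page.

```python
from datetime import datetime, timezone

# 2020-01-01T00:00:00Z as Unix seconds (1577836800)
MIN_VALID_EPOCH = int(datetime(2020, 1, 1, tzinfo=timezone.utc).timestamp())

def has_valid_time(epoch):
    """True if the row was stamped after the RTC had plausibly been set."""
    return epoch >= MIN_VALID_EPOCH
```

A row stamped with the RTC default near 2000-01-01 (epoch 946684800) fails this check, while the example rows on this page pass it.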
Display vs. stored time
The CollarID dashboard converts UTC timestamps to the local timezone of each fix's GPS coordinates for display, so a deployment in Botswana shows in CAT and a deployment in Chile shows in CLT without you having to set anything. This conversion is purely a presentation layer.
All stored and exported data preserves UTC. That includes METADATA.CSV, file/folder names, WAV file metadata, GeoJSON exports, and Movebank CSV exports. If you join CollarID data with another sensor stream or analysis pipeline, work in UTC and apply your own timezone conversion at presentation time only.
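Keeping UTC internally and converting only for display can be sketched with the standard library. The example assumes a fixed UTC+2 offset for CAT, which has no daylight saving; for zones with DST transitions, the standard-library zoneinfo module is the usual choice. The names CAT and to_display are illustrative.

```python
from datetime import datetime, timedelta, timezone

CAT = timezone(timedelta(hours=2), "CAT")   # Central Africa Time, fixed UTC+2

def to_display(epoch, tz):
    """Convert a stored UTC epoch to an aware datetime in tz, for presentation only."""
    return datetime.fromtimestamp(epoch, tz=timezone.utc).astimezone(tz)

utc_time = to_display(1777629812, timezone.utc)
local_time = to_display(1777629812, CAT)
```

Both values describe the same instant, so joins and comparisons should keep using the epoch or the UTC datetime.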