CollarID — Data Schema

SD Card Layout

The SD card uses an exFAT filesystem. The firmware writes the following structure on the card root:

/
├── config.csv              — Schedule + radio configuration
├── lora_state.bin          — LoRaWAN session state (nonces, session buffer ~314 B)
├── METADATA.CSV            — Periodic samples + status messages (GPS, light, mag, env, particulate, battery, status)
├── audio/
│   └── YYYY/MM/DD/
│       └── YYYYMMDDHHMMSS.wav      — 16 kHz, 16-bit, mono PCM
└── accelerometer/
    └── YYYY/MM/DD/
        └── YYYYMMDDHHMMSS.wav      — 3-channel (X/Y/Z) 16-bit PCM, daily rollover

Folders are created on first write. Year folders use 4-digit names; month and day folders use zero-padded 2-digit names (e.g. 2026/04/15).

METADATA.CSV

The primary log file. Every line is a single record (a sensor sample or a status message), comma-separated, with no header row. Format:

<unix_epoch_seconds>,<tag>,<tag-specific payload columns…>

The SD Card page parses this file into per-subsystem CSVs you can download. The tags below match the firmware's snprintf format strings exactly — if you parse the file yourself, treat any unknown tag as forward-compatible noise and skip it.
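If you parse the file yourself, the skip-unknown-tags rule can be sketched roughly as follows. This is a minimal illustration, not the website parser: the names `split_row`, `iter_known_rows`, and `KNOWN_TAGS` are hypothetical, and the tag set is simply the list of tags documented on this page.

```python
# Tags documented on this page; anything else is skipped as forward-compatible.
KNOWN_TAGS = {
    "gps", "light", "mag", "battery", "particulate",
    "COMP_T", "COMP_H", "RAW_GAS", "RAW_P", "status",
}


def split_row(line):
    """Split one METADATA.CSV line into (epoch, tag, payload_fields)."""
    epoch_text, tag, *payload = line.strip().split(",")
    return int(epoch_text), tag, payload


def iter_known_rows(lines):
    """Yield parsed rows, silently skipping blank, malformed, or unknown-tag lines."""
    for line in lines:
        if not line.strip():
            continue
        try:
            epoch, tag, payload = split_row(line)
        except ValueError:
            continue  # too few columns or a non-integer epoch
        if tag in KNOWN_TAGS:
            yield epoch, tag, payload
```

Because `iter_known_rows` accepts any iterable of strings, an open file object can be passed to it directly.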

Tag: `gps`

One row per accepted GPS fix. Coordinates are stored as fixed-point integers to avoid float imprecision in CSV.

Column	Type	Decoding / units
latitude_e7	int32	÷ 1e7 → degrees north (negative = south)
longitude_e7	int32	÷ 1e7 → degrees east (negative = west)
altitude_mm	int32	÷ 1000 → metres above sea level (MSL)
h_acc_dm	uint32	÷ 10 → metres horizontal accuracy (1σ)
fix_type	uint8	GNSS fix type: 0=no fix, 2=2D, 3=3D, 4=DR+GNSS
fix_time_ms	uint32	÷ 1000 → seconds spent acquiring this fix

Example: 1777629812,gps,374215600,-1224310500,42100,15,3,8500 → lat 37.4215600°, lon -122.4310500°, alt 42.1 m, h_acc 1.5 m, 3D fix, took 8.5 s.
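The decoding in the table above can be sketched as a small helper. The `GpsFix` type and the `decode_gps` name are illustrative assumptions; only the column order and scale factors come from the table.

```python
from typing import NamedTuple


class GpsFix(NamedTuple):
    epoch: int          # Unix seconds, UTC
    lat_deg: float      # degrees north (negative = south)
    lon_deg: float      # degrees east (negative = west)
    alt_m: float        # metres above mean sea level
    h_acc_m: float      # horizontal accuracy in metres
    fix_type: int       # 0 = no fix, 2 = 2D, 3 = 3D, 4 = DR+GNSS
    fix_time_s: float   # seconds spent acquiring the fix


def decode_gps(line):
    """Decode one `gps` row of METADATA.CSV into physical units."""
    fields = line.strip().split(",")
    if len(fields) != 8 or fields[1] != "gps":
        raise ValueError(f"not a gps row: {line!r}")
    epoch = int(fields[0])
    lat_e7, lon_e7, alt_mm, h_acc_dm, fix_type, fix_ms = map(int, fields[2:])
    return GpsFix(
        epoch,
        lat_e7 / 1e7,
        lon_e7 / 1e7,
        alt_mm / 1000,
        h_acc_dm / 10,
        fix_type,
        fix_ms / 1000,
    )
```

Applied to the example row, this yields the same values as the worked example: 37.42156°, -122.43105°, 42.1 m, 1.5 m, fix type 3, 8.5 s.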

Tag: `light`

Ambient light sensor.

Column	Type	Notes
clear	uint32	Clear-channel raw count
als	uint32	ALS-channel raw count (not lux); convert to lux using the sensor's gain and integration-time settings from the datasheet

Tag: `mag`

3-axis magnetometer (raw counts, not µT). See Sensor Axis Map for the body-frame orientation.

Column	Type	Notes
x	int16	Body-frame X axis raw count (1 LSB ≈ 1.5 mGauss / 0.15 µT)
y	int16	Body-frame Y axis raw count
z	int16	Body-frame Z axis raw count
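As a rough sketch, raw counts can be scaled to approximate microtesla using the nominal 0.15 µT per LSB figure from the table. The constant and function names are illustrative, and the result is uncalibrated: no hard-iron or soft-iron correction is applied.

```python
import math

# Nominal sensitivity taken from the table above (approximate, uncalibrated).
UT_PER_LSB = 0.15


def mag_to_ut(x, y, z):
    """Scale raw body-frame counts to approximate microtesla."""
    return (x * UT_PER_LSB, y * UT_PER_LSB, z * UT_PER_LSB)


def field_magnitude_ut(x, y, z):
    """Approximate total field strength in microtesla from raw counts."""
    bx, by, bz = mag_to_ut(x, y, z)
    return math.sqrt(bx * bx + by * by + bz * bz)
```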

Tag: `battery`

Column	Type	Decoding / units
voltage_mv	uint32	÷ 1000 → battery terminal voltage in volts. Sampled every 60 s by the master loop.

Tag: `particulate`

Particulate-matter (PM) sensor. One summary row per measurement window.

Column	Type	Units
pm1	uint32	µg/m³ (PM1.0)
pm2_5	uint32	µg/m³ (PM2.5)
pm10	uint32	µg/m³ (PM10)

Tags: `COMP_T` · `COMP_H` · `RAW_GAS` · `RAW_P`

Environmental sensor, processed by the BSEC2 calibration library. Each tag is its own row with a single value plus a BSEC2 accuracy rank (0–3, where 3 is fully calibrated).

Tag	Signal	Type	Units	Notes
COMP_T	Compensated temperature	float	°C	Self-heating compensated
COMP_H	Compensated relative humidity	float	%	Self-heating compensated
RAW_GAS	Gas sensor resistance	float	Ω	For VOC / IAQ post-processing
RAW_P	Pressure	float	Pa	Divide by 100 for hPa / mbar

Layout per row: <epoch>,<tag>,<value>,<accuracy>. Sample:

1777629682,RAW_P,94529.398,0
1777629682,RAW_GAS,279475.969,0
1777629682,COMP_T,21.922,0
1777629682,COMP_H,39.226,0
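A minimal sketch of reading these rows, assuming the four-column layout above. `parse_env` and `pressure_hpa` are hypothetical helper names; the units mapping repeats the table.

```python
# Units per environmental tag, as listed in the table above.
ENV_UNITS = {"COMP_T": "°C", "COMP_H": "%", "RAW_GAS": "Ω", "RAW_P": "Pa"}


def parse_env(line):
    """Parse `<epoch>,<tag>,<value>,<accuracy>` into typed fields."""
    epoch, tag, value, accuracy = line.strip().split(",")
    if tag not in ENV_UNITS:
        raise ValueError(f"not an environmental row: {line!r}")
    return int(epoch), tag, float(value), int(accuracy)


def pressure_hpa(pressure_pa):
    """Convert a RAW_P value from Pa to hPa (equivalently mbar)."""
    return pressure_pa / 100
```

Rows with a low accuracy rank (as in the sample, where every rank is 0) can then be filtered on the fourth element if only calibrated readings are wanted.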

Tag: `status`

Free-form status messages emitted by the firmware. Two layouts:

Step counter: <epoch>,status,step,<count> — periodic accumulator read.
Text message: <epoch>,status,<message> — the message field may itself contain commas; treat everything after the second comma (the third column onward) as the message.

The website parser categorizes text messages by prefix: BOOT:, DIAG:, GPS fix, GPS fine, with everything else falling to Other. New prefixes can be added without breaking existing exports.
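The two layouts and the prefix-based categorisation can be sketched as below. This is an approximation of the behaviour described here, not the website parser itself; `parse_status` is a hypothetical name, and the category label returned for a text message is simply the matched prefix.

```python
# Prefixes named in the text; anything else is categorised as "Other".
PREFIXES = ("BOOT:", "DIAG:", "GPS fix", "GPS fine")


def parse_status(line):
    """Return (epoch, category, value) for one `status` row."""
    # maxsplit=2 keeps any commas inside the message intact.
    epoch_text, tag, rest = line.rstrip("\r\n").split(",", 2)
    if tag != "status":
        raise ValueError(f"not a status row: {line!r}")
    epoch = int(epoch_text)
    kind, _, count = rest.partition(",")
    if kind == "step" and count.strip().isdigit():
        return epoch, "step", int(count)
    for prefix in PREFIXES:
        if rest.startswith(prefix):
            return epoch, prefix, rest
    return epoch, "Other", rest
```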

Audio WAV Files

Recorded by the PDM MEMS microphone via the MCU's hardware PDM-to-PCM filter. Each recording window in the schedule produces one file; if a path collision occurs the firmware appends _<n>.

Property	Value
Container	RIFF/WAVE with a 44-byte header (PCM)
Sample rate	16 000 Hz
Bit depth	16-bit signed, little-endian
Channels	1 (mono)
Path	`/audio/YYYY/MM/DD/YYYYMMDDHHMMSS.wav`
Filename time	UTC, derived from the file's open time
Duration	Determined by the active schedule's microphone window

Files open and play directly in any standard audio tool (Audacity, ffmpeg, librosa, scipy.io.wavfile). The 16 kHz sample rate captures audio up to 8 kHz (Nyquist), which is well-suited to vocalisations and ambient sound but excludes ultrasonic content.
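To line recordings up with other data, the UTC start time can be recovered from the filename. The sketch below assumes that the collision suffix `_<n>` is placed directly before the `.wav` extension, which this page does not state explicitly; `recording_start_utc` is a hypothetical helper name.

```python
import re
from datetime import datetime, timezone
from pathlib import PurePosixPath

# YYYYMMDDHHMMSS, optionally followed by _<n> (assumed position), then .wav
_NAME_RE = re.compile(r"^(\d{14})(?:_(\d+))?\.wav$", re.IGNORECASE)


def recording_start_utc(path):
    """Return the timezone-aware UTC datetime encoded in a recording's filename."""
    match = _NAME_RE.match(PurePosixPath(path).name)
    if match is None:
        raise ValueError(f"unexpected filename: {path!r}")
    stamp = datetime.strptime(match.group(1), "%Y%m%d%H%M%S")
    return stamp.replace(tzinfo=timezone.utc)
```

The same naming pattern is used under /accelerometer/, so the helper should apply to those files as well.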

Accelerometer WAV Files

Raw 3-axis accelerometer samples are streamed off the sensor FIFO and written as a 3-channel PCM WAV file. The WAV container is convenient because most tools handle it natively, even though the data is motion rather than audio.

Property	Value
Container	RIFF/WAVE, 44-byte header, PCM
Channels	3 — channel order X, Y, Z (body frame; see Sensor Axis Map)
Bit depth	16-bit signed, little-endian
Sample rate	Configurable per schedule (see table below); written into the WAV header
Range	Configurable per schedule (see table below)
Path	`/accelerometer/YYYY/MM/DD/YYYYMMDDHHMMSS.wav`
Rollover	One file per UTC day. If the firmware detects an accelerometer subsystem error, the current file is closed and a new one is opened in the same daily folder with a fresh timestamp.

Sample-rate options

Configured per schedule slot. The actual rate is written into the WAV header, so any standard tool will read the file at the correct rate.

Sample rate
25 Hz
50 Hz (default)

Higher sample rates are supported by the underlying hardware but are not exposed in the current software revision.

Range options & LSB-to-g conversion

The accelerometer reports signed 12-bit samples that the firmware sign-extends and stores as int16. To convert int16 samples to acceleration in g, divide by the LSB-per-g factor for the configured range:

Full scale	LSB per g	Resolution per LSB
±2 g	1024	≈ 0.98 mg
±4 g (default)	512	≈ 1.95 mg
±8 g	256	≈ 3.91 mg

The configured range is not stored inside the WAV file. To convert raw samples to physical units, you'll need to know which range was active during the recording — this comes from the device's config.csv at the time of the deployment. Cross-referencing by file timestamp against schedule windows works for most cases.

Example Python decoder:

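import wave
import numpy as np
def load_accel(path, range_g=4):
    """Return (t, g): time in seconds and acceleration in g, shape (N, 3)."""
    # LSB-per-g factor for the configured full-scale range (see table above)
    lsb_per_g = {2: 1024, 4: 512, 8: 256}[range_g]
    with wave.open(path, "rb") as w:  # context manager closes the file
        rate = w.getframerate()  # sample rate from the WAV header
        frames = w.readframes(w.getnframes())
    # 16-bit little-endian samples, interleaved X, Y, Z
    raw = np.frombuffer(frames, dtype="<i2").reshape(-1, 3)
    g = raw / lsb_per_g
    t = np.arange(len(g)) / rate
    return t, g  # t in seconds, g shape (N, 3) for X/Y/Z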

config.csv

Persisted device configuration. Written on every boot and re-read on startup; also re-written by BLE configuration commits. Three sections, each with key-value rows.

System

firmware_version,<git-hash>
system_uid,<MCU unique ID word 0>
engaged,yes|no

Radio

lorawan_activation_type,OTAA|ABP
lorawan_region,US915|EU868|AU915
lorawan_dev_eui,<16-char hex>
lorawan_join_eui,<16-char hex>
lorawan_app_key,<32-char hex>
lora_spreading_factor,SF7|SF8|SF9|SF10|SF11|SF12
lora_bandwidth,125|250|500
lora_coding_rate,4/5|4/6|4/7|4/8
lora_tx_power_dbm,<int>
lora_frequency,<MHz>
lost_mode_enabled,0|1
lost_mode_activation_epoch,<unix epoch>
lost_mode_transmit_interval_min,<int>

Schedule (5 columns, one per slot)

parameter_name,schedule1,schedule2,schedule3,schedule4,schedule5
start_hour,0,6,12,18,0
end_hour,6,12,18,24,0
gps_enabled,1,1,0,1,0
gps_sample_interval_min,30,15,0,30,0
gps_accuracy,7,8,0,7,0
... (continues for accelerometer, microphone, light, environmental,
     particulate, magnetometer, lorawan, lora)

If every parameter in a schedule slot is left at 0, that slot is treated as inactive and the device will not execute it. In the example above, schedule5 is all zeros and is therefore skipped — the active schedule rotation is just slots 1–4.
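The inactive-slot rule can be checked offline with a short sketch like the one below. It assumes you have already extracted the schedule table (the `parameter_name,...` header row and the rows beneath it) as text; how the three sections are delimited inside config.csv is not covered here, and `active_slots` is a hypothetical name.

```python
import csv
import io


def active_slots(schedule_csv_text):
    """Return names of schedule slots that have at least one non-zero parameter."""
    rows = [r for r in csv.reader(io.StringIO(schedule_csv_text)) if r]
    header, params = rows[0], rows[1:]
    active = []
    for col, slot_name in enumerate(header[1:], start=1):
        values = [r[col].strip() for r in params if len(r) > col]
        # A slot is inactive only when every parameter is 0 (or empty).
        if any(v not in ("0", "") for v in values):
            active.append(slot_name)
    return active
```

For the example table above, this returns schedule1 through schedule4 and omits schedule5.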

For reproducibility in publications, snapshot config.csv alongside your data — it captures the exact firmware version and per-schedule sensor settings active during the deployment.

LoRa / LoRaWAN Payload

All over-the-air payloads are nanopb-encoded protobuf. Audio and raw accelerometer samples are not transmitted — only summary statistics — due to LoRa bandwidth constraints.

What's transmitted

Each periodic uplink can carry any subset of the following, depending on which sensors are active in the current schedule:

Up to 5 most recent GPS fixes (lat / lon / altitude / accuracy / fix duration)
Environmental summary (temperature, humidity, gas resistance, pressure, lux)
Particulate summary (PM1, PM2.5, PM10)
Battery percentage and SD-card free space
Step counter
Accelerometer windowed statistics (ODBA / VeDBA mean & max) — live on-device computation is currently TBD; the field exists in the wire format but is not yet populated from real samples
Recent error-flag bitfield
Radio statistics (RSSI / SNR) appended on RX
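Until the on-device accelerometer statistics are populated, comparable ODBA / VeDBA values can be computed offline from decoded accelerometer samples. The sketch below uses the common textbook definitions and estimates the static (gravity) component as the per-axis mean over the whole window; both the windowing and the static-component estimate are assumptions and may differ from what the firmware eventually implements. `dba_stats` is a hypothetical name.

```python
import math


def dba_stats(samples_g):
    """Compute ODBA / VeDBA mean and max for a window of (x, y, z) samples in g."""
    n = len(samples_g)
    if n == 0:
        raise ValueError("empty window")
    # Static component: per-axis mean over the window (a simplifying assumption).
    static = [sum(s[axis] for s in samples_g) / n for axis in range(3)]
    odba, vedba = [], []
    for s in samples_g:
        dyn = [s[axis] - static[axis] for axis in range(3)]
        odba.append(sum(abs(v) for v in dyn))              # |dx| + |dy| + |dz|
        vedba.append(math.sqrt(sum(v * v for v in dyn)))   # vector norm
    return {
        "odba_mean": sum(odba) / n,
        "odba_max": max(odba),
        "vedba_mean": sum(vedba) / n,
        "vedba_max": max(vedba),
    }
```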

Downlinks (server → device) carry a small set of operational commands — time sync, GPS-accuracy override, remote engage/disengage, system reset, scheduled deployment drop-off, and transactional schedule updates — plus an epoch field that the firmware uses to nudge the RTC.

LoRaWAN region presets

US915, EU868, and AU915 are LoRaWAN regional specifications — they apply only in LoRaWAN mode. Raw LoRa (P2P) mode uses a user-configurable carrier frequency (lora_frequency in config.csv) and is not tied to these regions.

Region	Activation	SF range
US915	OTAA or ABP	SF7–SF12
AU915		SF7–SF12
EU868		SF7–SF12

Adaptive data rate (P2P mode only)

When operating in raw LoRa P2P mode, the firmware picks a data rate based on payload size. The selection table the firmware uses depends on which sub-GHz band the radio is configured for (US/AU 915 MHz vs EU 868 MHz):

Band	Adaptive DR ladder
915 MHz (US / AU)	≤40 B → DR0; ≤65 B → DR2; ≤110 B → DR3
868 MHz (EU)	≤40 B → DR1; ≤65 B → DR3; ≤140 B → DR4
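For receiver-side tooling, the ladder in the table can be expressed as a simple lookup. This is a sketch of the table only, not the firmware's code: `pick_data_rate` is a hypothetical name, and the behaviour for payloads above the largest threshold is not documented here, so the function returns None in that case.

```python
# (max payload bytes, data rate) pairs per band, copied from the table above.
DR_LADDERS = {
    915: ((40, 0), (65, 2), (110, 3)),   # US / AU
    868: ((40, 1), (65, 3), (140, 4)),   # EU
}


def pick_data_rate(band_mhz, payload_bytes):
    """Return the DR index for a payload size, or None if it exceeds the ladder."""
    for max_bytes, data_rate in DR_LADDERS[band_mhz]:
        if payload_bytes <= max_bytes:
            return data_rate
    return None  # larger payloads: behaviour not documented on this page
```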

Wire-format schema

The full .proto definition (top-level message_packet, every nested message, exact field tags and types) is available upon request for users building their own LoRa receivers, ChirpStack integrations, or downstream pipelines. Reach out via the Support page (category: Data) and we'll send you the canonical schema and decoding examples.

GPS Uplink

GPS fixes appear in two places. On the SD card, each accepted fix is logged as a single gps,... row in METADATA.CSV. Over LoRa/LoRaWAN, the most recent fixes (up to 5) are appended to each periodic uplink — each fix carries latitude, longitude, altitude, fix timestamp, and optional accuracy metrics (HDOP and horizontal accuracy in metres). The exact wire-format schema is available upon request via the Support page.

Timestamps & Time Sync

All on-device timestamps are Unix epoch in seconds, UTC.
The RTC is set from the first GPS fix at boot. If GPS times out (10 minutes), the device begins executing its schedule with an unsynced clock and resyncs at its next GPS sampling interval.
LoRaWAN downlinks include an epoch field that the firmware uses to nudge the RTC if drift accumulates.
File and folder names use UTC year/month/day (e.g. /audio/2026/04/15/) regardless of the deployment's local timezone — this avoids day-boundary ambiguity when collars cross meridians.
Files written before the first successful GPS fix carry RTC defaults (typically 2000-01-01 until set). The website parser ignores any row with epoch < 2020-01-01 UTC.
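If you parse METADATA.CSV yourself, the same cutoff can be applied with a couple of helpers. The names below are illustrative; the threshold is the 2020-01-01 UTC value stated above.

```python
from datetime import datetime, timezone

# 2020-01-01T00:00:00Z as Unix seconds (1577836800).
MIN_VALID_EPOCH = int(datetime(2020, 1, 1, tzinfo=timezone.utc).timestamp())


def is_synced(epoch):
    """True if the row was written after the RTC had been set to a plausible time."""
    return epoch >= MIN_VALID_EPOCH


def to_utc(epoch):
    """Convert Unix seconds to a timezone-aware UTC datetime."""
    return datetime.fromtimestamp(epoch, tz=timezone.utc)
```

Rows stamped near the RTC default (2000-01-01, epoch 946684800) fail `is_synced` and can be dropped or set aside for manual review.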

Display vs. stored time

The CollarID dashboard converts UTC timestamps to the local timezone of each fix's GPS coordinates for display, so a deployment in Botswana shows in CAT and a deployment in Chile shows in CLT without you having to set anything. This conversion is purely a presentation layer.

All stored and exported data preserves UTC. That includes METADATA.CSV, file/folder names, WAV file metadata, GeoJSON exports, and Movebank CSV exports. If you join CollarID data with another sensor stream or analysis pipeline, work in UTC and apply your own timezone conversion at presentation time only.
