Data Schema

Reference for everything CollarID writes to the SD card and transmits over LoRa/LoRaWAN. Use this when parsing files yourself or citing data formats in publications.

SD Card Layout

The SD card uses an exFAT filesystem. The firmware writes the following structure on the card root:

/
├── config.csv              — Schedule + radio configuration
├── lora_state.bin          — LoRaWAN session state (nonces, session buffer ~314 B)
├── METADATA.CSV            — Periodic samples + status messages (GPS, light, mag, env, particulate, battery, status)
├── audio/
│   └── YYYY/MM/DD/
│       └── YYYYMMDDHHMMSS.wav      — 16 kHz, 16-bit, mono PCM
└── accelerometer/
    └── YYYY/MM/DD/
        └── YYYYMMDDHHMMSS.wav      — 3-channel (X/Y/Z) 16-bit PCM, daily rollover

Folders are created on first write. Year/month/day folders use 4-digit and zero-padded 2-digit names respectively (e.g. 2026/04/15).
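The naming scheme above can be sketched in a few lines of Python. This is an illustrative helper, not firmware code: the function names `recording_path` and `recording_time` are assumptions, and the sketch relies only on the layout and the UTC file naming described in this document.

```python
from datetime import datetime, timezone
from pathlib import PurePosixPath


def recording_path(kind: str, opened_at: datetime) -> PurePosixPath:
    """Build the card-relative path for a recording opened at `opened_at`.

    `opened_at` should be timezone-aware; it is converted to UTC first.
    """
    if kind not in ("audio", "accelerometer"):
        raise ValueError(f"unknown recording kind: {kind!r}")
    utc = opened_at.astimezone(timezone.utc)
    folder = utc.strftime("%Y/%m/%d")          # e.g. 2026/04/15
    name = utc.strftime("%Y%m%d%H%M%S") + ".wav"
    return PurePosixPath("/", kind, folder, name)


def recording_time(path: str) -> datetime:
    """Recover the UTC open time from a YYYYMMDDHHMMSS.wav file name.

    Only the first 14 characters are used, so a collision suffix such as
    _1 on an audio file is ignored.
    """
    stem = PurePosixPath(path).stem
    return datetime.strptime(stem[:14], "%Y%m%d%H%M%S").replace(tzinfo=timezone.utc)
```

Going from a file name back to a timestamp is usually the more useful direction when aligning recordings with rows in METADATA.CSV.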

METADATA.CSV

The primary log file. Every line is a single sample or status message, comma-separated, with no header row. Format:

<unix_epoch_seconds>,<tag>,<tag-specific payload columns…>

The SD Card page parses this file into per-subsystem CSVs you can download. The tags below match the firmware's snprintf format strings exactly — if you parse the file yourself, treat any unknown tag as forward-compatible noise and skip it.
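If you parse the file yourself, a minimal line splitter might look like the sketch below. The name `parse_line` and the `KNOWN_TAGS` set are assumptions built from the tags listed on this page. Status rows are kept as one unsplit string because free-form text may itself contain commas.

```python
KNOWN_TAGS = {
    "gps", "light", "mag", "battery", "particulate",
    "COMP_T", "COMP_H", "RAW_GAS", "RAW_P", "status",
}


def parse_line(line: str):
    """Split one METADATA.CSV line into (epoch, tag, payload).

    Returns None for blank, malformed, or unknown-tag lines so callers
    can simply skip them.
    """
    fields = line.rstrip("\r\n").split(",", 2)
    if len(fields) < 3:
        return None
    epoch_text, tag, rest = fields
    if tag not in KNOWN_TAGS or not epoch_text.isdigit():
        return None
    # Keep status text intact; split every other payload into columns.
    payload = [rest] if tag == "status" else rest.split(",")
    return int(epoch_text), tag, payload
```

Returning None rather than raising keeps a long file readable even when a newer firmware revision adds tags this parser does not know.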

Tag: gps

One row per accepted GPS fix. Coordinates are stored as fixed-point integers to avoid float imprecision in CSV.

Column       | Type   | Decoding / units
latitude_e7  | int32  | ÷ 1e7 → degrees north (negative = south)
longitude_e7 | int32  | ÷ 1e7 → degrees east (negative = west)
altitude_mm  | int32  | ÷ 1000 → metres above sea level (MSL)
h_acc_dm     | uint32 | ÷ 10 → metres horizontal accuracy (1σ)
fix_type     | uint8  | GNSS fix type: 0=no fix, 2=2D, 3=3D, 4=DR+GNSS
fix_time_ms  | uint32 | ÷ 1000 → seconds spent acquiring this fix

Example: 1777629812,gps,374215600,-1224310500,42100,15,3,8500 → lat 37.4215600°, lon -122.4310500°, alt 42.1 m, h_acc 1.5 m, 3D fix, took 8.5 s.
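The same decoding can be written as a short Python sketch. The `GpsFix` type and the `decode_gps` function are illustrative names, not part of any CollarID tooling; the scale factors are the ones listed in the column table for this tag.

```python
from typing import NamedTuple, Sequence


class GpsFix(NamedTuple):
    lat_deg: float
    lon_deg: float
    alt_m: float
    h_acc_m: float
    fix_type: int
    fix_time_s: float


def decode_gps(payload: Sequence[str]) -> GpsFix:
    """Convert the six payload columns of a gps row to physical units."""
    lat, lon, alt, h_acc, fix_type, fix_ms = (int(v) for v in payload)
    return GpsFix(
        lat_deg=lat / 1e7,        # fixed-point 1e-7 degrees
        lon_deg=lon / 1e7,
        alt_m=alt / 1000,         # millimetres to metres
        h_acc_m=h_acc / 10,       # decimetres to metres
        fix_type=fix_type,
        fix_time_s=fix_ms / 1000, # milliseconds to seconds
    )
```

The payload is everything after the tag column, for instance `"374215600,-1224310500,42100,15,3,8500".split(",")` for the example row.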

Tag: light

Ambient light sensor.

Column | Type   | Notes
clear  | uint32 | Clear-channel raw count
als    | uint32 | ALS (lux) raw count; convert using the sensor's gain and integration-time settings from the datasheet

Tag: mag

3-axis magnetometer (raw counts, not µT). See Sensor Axis Map for the body-frame orientation.

Column | Type  | Notes
x      | int16 | Body-frame X axis raw count (1 LSB ≈ 1.5 mGauss / 0.15 µT)
y      | int16 | Body-frame Y axis raw count
z      | int16 | Body-frame Z axis raw count
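For a rough conversion of raw counts to microtesla, the approximate scale quoted above (1 LSB ≈ 0.15 µT) can be applied directly. The sketch below is a hedged illustration: the function names are assumptions, and the scale is the approximate figure from this page, not a calibrated value.

```python
import math

UT_PER_LSB = 0.15  # approximate: 1 LSB is about 1.5 mGauss, i.e. 0.15 µT


def mag_to_microtesla(x: int, y: int, z: int):
    """Scale raw body-frame counts to approximate microtesla."""
    return tuple(count * UT_PER_LSB for count in (x, y, z))


def field_magnitude_ut(x: int, y: int, z: int) -> float:
    """Total field strength in approximate microtesla."""
    bx, by, bz = mag_to_microtesla(x, y, z)
    return math.sqrt(bx * bx + by * by + bz * bz)
```

The total magnitude is a convenient sanity check because it does not depend on collar orientation.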

Tag: battery

Column     | Type   | Decoding / units
voltage_mv | uint32 | ÷ 1000 → battery terminal voltage in volts. Sampled every 60 s by the master loop.

Tag: particulate

Particulate-matter (PM) sensor. One summary row per measurement window.

Column | Type   | Units
pm1    | uint32 | µg/m³ (PM1.0)
pm2_5  | uint32 | µg/m³ (PM2.5)
pm10   | uint32 | µg/m³ (PM10)

Tags: COMP_T · COMP_H · RAW_GAS · RAW_P

Environmental sensor, processed by the BSEC2 calibration library. Each tag is its own row with a single value plus a BSEC2 accuracy rank (0–3, where 3 is fully calibrated).

Tag     | Signal                        | Type  | Units | Notes
COMP_T  | Compensated temperature       | float | °C    | Self-heating compensated
COMP_H  | Compensated relative humidity | float | %     | Self-heating compensated
RAW_GAS | Gas sensor resistance         | float | Ω     | For VOC / IAQ post-processing
RAW_P   | Pressure                      | float | Pa    | Divide by 100 for hPa / mbar

Layout per row: <epoch>,<tag>,<value>,<accuracy>. Sample:

1777629682,RAW_P,94529.398,0
1777629682,RAW_GAS,279475.969,0
1777629682,COMP_T,21.922,0
1777629682,COMP_H,39.226,0
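Because each environmental signal arrives on its own row, it is often handy to regroup rows that share an epoch. The following is a minimal sketch under that assumption; `group_env_rows` and `pressure_hpa` are hypothetical helper names.

```python
from collections import defaultdict

ENV_TAGS = {"COMP_T", "COMP_H", "RAW_GAS", "RAW_P"}


def group_env_rows(lines):
    """Collect environmental rows into {epoch: {tag: (value, accuracy)}}."""
    grouped = defaultdict(dict)
    for line in lines:
        fields = line.strip().split(",")
        if len(fields) != 4 or fields[1] not in ENV_TAGS:
            continue  # not an environmental row; leave it to other parsers
        epoch, tag, value, accuracy = fields
        grouped[int(epoch)][tag] = (float(value), int(accuracy))
    return dict(grouped)


def pressure_hpa(pascals: float) -> float:
    """Convert a RAW_P value from Pa to hPa (equivalently mbar)."""
    return pascals / 100.0
```

Keeping the accuracy rank next to each value makes it easy to filter out readings taken before the calibration library has settled.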

Tag: status

Free-form status messages emitted by the firmware. Two layouts:

The website parser categorizes text messages by prefix: BOOT:, DIAG:, GPS fix, GPS fine, with everything else falling to Other. New prefixes can be added without breaking existing exports.
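Prefix-based categorization of this kind can be sketched as follows. The prefix list comes from the paragraph above; the function name and the choice to return the prefix itself as the category label are assumptions, and the message texts in the usage note are invented for illustration.

```python
STATUS_PREFIXES = ("BOOT:", "DIAG:", "GPS fix", "GPS fine")


def categorize_status(message: str) -> str:
    """Return the matching prefix as the category, or "Other"."""
    for prefix in STATUS_PREFIXES:
        if message.startswith(prefix):
            return prefix
    return "Other"
```

For example, `categorize_status("BOOT: cold start")` falls under `BOOT:`, while any message with an unrecognized prefix falls under `Other`, which is what lets new prefixes appear without breaking existing exports.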

Audio WAV Files

Recorded by the PDM MEMS microphone via the MCU's hardware PDM-to-PCM filter. Each recording window in the schedule produces one file; if a path collision occurs the firmware appends _<n>.

Property      | Value
Container     | RIFF/WAVE with a 44-byte header (PCM)
Sample rate   | 16 000 Hz
Bit depth     | 16-bit signed, little-endian
Channels      | 1 (mono)
Path          | /audio/YYYY/MM/DD/YYYYMMDDHHMMSS.wav
Filename time | UTC, derived from the file's open time
Duration      | Determined by the active schedule's microphone window

Files open directly in standard audio tools and libraries (Audacity, ffmpeg, librosa, scipy.io.wavfile). The 16 kHz sample rate captures audio up to 8 kHz (Nyquist), which suits vocalisations and ambient sound but excludes ultrasonic content.
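Before batch-processing a card, it can be worth confirming that each audio file matches the properties listed above. The sketch below uses only Python's standard `wave` module; the function name `describe_audio_wav` and the returned dictionary keys are assumptions made for this example.

```python
import wave


def describe_audio_wav(path: str) -> dict:
    """Read the WAV header and compare it with the documented audio format."""
    with wave.open(path, "rb") as w:
        rate = w.getframerate()
        info = {
            "rate_hz": rate,
            "channels": w.getnchannels(),
            "bits": w.getsampwidth() * 8,
            "duration_s": w.getnframes() / rate,
        }
    # Documented format: 16 kHz, mono, 16-bit PCM.
    info["matches_spec"] = (
        info["rate_hz"], info["channels"], info["bits"]
    ) == (16000, 1, 16)
    return info
```

A file that fails the check is not necessarily corrupt, but it is worth inspecting by hand before feeding it to a pipeline that assumes 16 kHz mono.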

Accelerometer WAV Files

Raw 3-axis accelerometer samples are streamed off the sensor FIFO and written as a 3-channel PCM WAV file. The WAV container is convenient because most tools handle it natively, even though the data is motion rather than audio.

Property    | Value
Container   | RIFF/WAVE, 44-byte header, PCM
Channels    | 3; channel order X, Y, Z (body frame; see Sensor Axis Map)
Bit depth   | 16-bit signed, little-endian
Sample rate | Configurable per schedule (see table below); written into the WAV header
Range       | Configurable per schedule (see table below)
Path        | /accelerometer/YYYY/MM/DD/YYYYMMDDHHMMSS.wav
Rollover    | One file per UTC day. If the firmware detects an accelerometer subsystem error, the current file is closed and a new one is opened in the same daily folder with a fresh timestamp.

Sample-rate options

Configured per schedule slot. The actual rate is written into the WAV header, so any standard tool will read the file at the correct rate.

Sample rate
25 Hz
50 Hz (default)

Higher sample rates are supported by the underlying hardware but are not exposed in the current software revision.

Range options & LSB-to-g conversion

The accelerometer reports signed 12-bit samples that the firmware sign-extends and stores as int16. To convert int16 samples to acceleration in g, divide by the LSB-per-g factor for the configured range:

Full scale     | LSB per g | Resolution per LSB
±2 g           | 1024      | ≈ 0.98 mg
±4 g (default) | 512       | ≈ 1.95 mg
±8 g           | 256       | ≈ 3.91 mg

The configured range is not stored inside the WAV file. To convert raw samples to physical units, you'll need to know which range was active during the recording; this comes from the device's config.csv at the time of the deployment. Cross-referencing by file timestamp against schedule windows works for most cases.

Example Python decoder:

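import wave

import numpy as np

LSB_PER_G = {2: 1024, 4: 512, 8: 256}  # full-scale range (±g) -> LSB per g

def load_accel(path, range_g=4):
    """Return (t, g): t in seconds, g as an (N, 3) float array of X/Y/Z in g."""
    with wave.open(path, "rb") as w:  # closes the file even if parsing fails
        if w.getnchannels() != 3 or w.getsampwidth() != 2:
            raise ValueError("expected a 3-channel, 16-bit accelerometer WAV")
        rate = w.getframerate()  # sample rate comes from the WAV header
        frames = w.readframes(w.getnframes())
    # Samples are interleaved X, Y, Z as little-endian int16.
    raw = np.frombuffer(frames, dtype="<i2").reshape(-1, 3)
    g = raw / LSB_PER_G[range_g]  # range_g must match config.csv for this recording
    t = np.arange(len(g)) / rate
    return t, g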

config.csv

Persisted device configuration. Written on every boot and re-read on startup; also rewritten by BLE configuration commits. It has three sections: System and Radio hold key-value rows, and Schedule holds one row per parameter with one column per slot.

System

firmware_version,<git-hash>
system_uid,<MCU unique ID word 0>
engaged,yes|no

Radio

lorawan_activation_type,OTAA|ABP
lorawan_region,US915|EU868|AU915
lorawan_dev_eui,<16-char hex>
lorawan_join_eui,<16-char hex>
lorawan_app_key,<32-char hex>
lora_spreading_factor,SF7|SF8|SF9|SF10|SF11|SF12
lora_bandwidth,125|250|500
lora_coding_rate,4/5|4/6|4/7|4/8
lora_tx_power_dbm,<int>
lora_frequency,<MHz>
lost_mode_enabled,0|1
lost_mode_activation_epoch,<unix epoch>
lost_mode_transmit_interval_min,<int>

Schedule (5 columns, one per slot)

parameter_name,schedule1,schedule2,schedule3,schedule4,schedule5
start_hour,0,6,12,18,0
end_hour,6,12,18,24,0
gps_enabled,1,1,0,1,0
gps_sample_interval_min,30,15,0,30,0
gps_accuracy,7,8,0,7,0
... (continues for accelerometer, microphone, light, environmental,
     particulate, magnetometer, lorawan, lora)

If every parameter in a schedule slot is left at 0, that slot is treated as inactive and the device will not execute it. In the example above, schedule5 is all zeros and is therefore skipped — the active schedule rotation is just slots 1–4.
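The all-zeros rule can be checked programmatically. The sketch below assumes you have already isolated the schedule rows (header plus parameter rows) as text; how the three sections are delimited inside config.csv is not described here, so that step is left out. The function name `active_slots` is hypothetical.

```python
import csv
import io


def active_slots(schedule_text: str) -> list:
    """Return the names of schedule slots that have any non-zero parameter."""
    rows = [row for row in csv.reader(io.StringIO(schedule_text)) if row]
    header, body = rows[0], rows[1:]
    active = []
    # Column 0 is the parameter name; columns 1..N are the slots.
    for index, slot_name in enumerate(header[1:], start=1):
        values = [row[index].strip() for row in body if len(row) > index]
        if any(value not in ("", "0") for value in values):
            active.append(slot_name)
    return active
```

Applied to the five example rows shown above, this reports schedule1 through schedule4 as active and omits schedule5. A complete check would need every parameter row, including the ones elided in the example.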

For reproducibility in publications, snapshot config.csv alongside your data — it captures the exact firmware version and per-schedule sensor settings active during the deployment.

LoRa / LoRaWAN Payload

All over-the-air payloads are nanopb-encoded protobuf. Audio and raw accelerometer samples are not transmitted — only summary statistics — due to LoRa bandwidth constraints.

What's transmitted

Each periodic uplink can carry any subset of the following, depending on which sensors are active in the current schedule:

Downlinks (server → device) carry a small set of operational commands — time sync, GPS-accuracy override, remote engage/disengage, system reset, scheduled deployment drop-off, and transactional schedule updates — plus an epoch field that the firmware uses to nudge the RTC.

LoRaWAN region presets

US915, EU868, and AU915 are LoRaWAN regional specifications — they apply only in LoRaWAN mode. Raw LoRa (P2P) mode uses a user-configurable carrier frequency (lora_frequency in config.csv) and is not tied to these regions.

Region | Activation  | SF range
US915  | OTAA or ABP | SF7–SF12
AU915  | OTAA or ABP | SF7–SF12
EU868  | OTAA or ABP | SF7–SF12

Adaptive data rate (P2P mode only)

When operating in raw LoRa P2P mode, the firmware picks a data rate based on payload size. The selection table the firmware uses depends on which sub-GHz band the radio is configured for (US/AU 915 MHz vs EU 868 MHz):

Band              | Adaptive DR ladder
915 MHz (US / AU) | ≤40 B → DR0; ≤65 B → DR2; ≤110 B → DR3
868 MHz (EU)      | ≤40 B → DR1; ≤65 B → DR3; ≤140 B → DR4
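The ladder can be expressed as a small lookup, which may help when estimating airtime on the receiver side. This is a sketch of the table above, not the firmware's implementation: the band keys and the function name are assumptions, and the behaviour for payloads larger than the top rung is not documented here, so the sketch returns None in that case.

```python
# (max payload bytes, data rate) rungs, smallest first, per band.
DR_LADDERS = {
    "915": ((40, 0), (65, 2), (110, 3)),   # US / AU
    "868": ((40, 1), (65, 3), (140, 4)),   # EU
}


def pick_data_rate(band: str, payload_bytes: int):
    """Return the DR index for a payload size, or None above the top rung."""
    for max_bytes, data_rate in DR_LADDERS[band]:
        if payload_bytes <= max_bytes:
            return data_rate
    return None  # undocumented case: payload exceeds the largest rung
```

For instance, a 41-byte payload on the 915 MHz band lands on the second rung.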

Wire-format schema

The full .proto definition (top-level message_packet, every nested message, exact field tags and types) is available upon request for users building their own LoRa receivers, ChirpStack integrations, or downstream pipelines. Reach out via the Support page (category: Data) and we'll send you the canonical schema and decoding examples.

Timestamps & Time Sync

Display vs. stored time

The CollarID dashboard converts UTC timestamps to the local timezone of each fix's GPS coordinates for display, so a deployment in Botswana shows in CAT and a deployment in Chile shows in CLT without you having to set anything. This conversion is purely a presentation layer.

All stored and exported data preserves UTC. That includes METADATA.CSV, file/folder names, WAV file metadata, GeoJSON exports, and Movebank CSV exports. If you join CollarID data with another sensor stream or analysis pipeline, work in UTC and apply your own timezone conversion at presentation time only.
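In practice, working in UTC means converting the epoch column to a timezone-aware UTC datetime and only shifting to local time when presenting results. The sketch below uses a fixed UTC+2 offset to stand in for CAT, which is an assumption made to keep the example free of timezone-database dependencies; the helper names are hypothetical.

```python
from datetime import datetime, timedelta, timezone

# Fixed-offset stand-in for Central Africa Time (UTC+2, no daylight saving).
CAT = timezone(timedelta(hours=2), "CAT")


def epoch_to_utc(epoch_seconds: int) -> datetime:
    """Interpret a METADATA.CSV epoch as a timezone-aware UTC datetime."""
    return datetime.fromtimestamp(epoch_seconds, tz=timezone.utc)


def for_display(moment_utc: datetime, tz: timezone) -> datetime:
    """Shift a UTC datetime to a display timezone without changing the instant."""
    return moment_utc.astimezone(tz)
```

Joins and comparisons should use the value from `epoch_to_utc`; the result of `for_display` represents the same instant and is only meant for labels and plots.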