# HP-UDP Changelogs

**Changelog from v5.1 → v5.2 (C Implementation Alignment):**

- **§4.5B HKDF Key Derivation:** HKDF-SHA256 now derives **24 bytes** (not 16): bytes 0–15 are the AES-128 session key; bytes 16–23 replace the randomly-generated `iv_base` from the init step. Both sides derive the same `iv_base` deterministically from the shared secret — no additional wire transmission of `iv_base` is needed.

- **§4.5C Nonce Construction:** Redesigned. The 12-byte GCM nonce is now `iv_base(8B) || seq_low32_be(4B)`: the HKDF-derived `iv_base` in bytes 0–7 and the low 32 bits of `SequenceNum` in big-endian in bytes 8–11. The previous 3-field scheme (`SessionID + PacketType + UniqueID`) has been replaced. Uniqueness is preserved: `seq` is monotonically increasing per session, and 32-bit overflow cannot occur within the 1 TB file size limit (~810M packets < 2³²).

- **§4.5E PUSH_ACCEPT payload:** The `Port(2B)` prefix has been removed. The serve daemon uses a single shared UDP socket — no ephemeral data port is negotiated. Encrypted PUSH_ACCEPT now carries only `PubKey(32B)`; unencrypted PUSH_ACCEPT carries no payload.

- **§4.5E PUSH_REQ payload:** Standard PUSH_REQ payload now includes `FileHash(8B)` and `InitialRate(4B)` (matching SESSION_REQ layout) in addition to `FileSize(8B)` and `FileName`. The encrypted variant inserts `PubKey(32B)` after `InitialRate`.

- **§11 Serve Daemon:** Updated from single-lane to **16 concurrent sessions** (`HPUDP_MAX_SESSIONS`). Single-socket model: all sessions share one UDP socket, dispatched by `(src_addr, session_id)` tuple. No ephemeral per-session socket.

- **§11C LIST_RESP:** Each line is tab-separated `filename\tsize\n` (not just a filename). The size is the file's byte count as a decimal integer.

- **§11D PUSH flow:** Rewritten for single-socket architecture. PUSH_ACCEPT no longer carries a data port; the data flows to the same socket and port that received the PUSH_REQ.

- **§12 Resumable Transfers:** Complete redesign. The wire-level RESUME_REQ / RESUME_ACCEPT negotiation (v5.1) has been replaced by a transparent receiver-side checkpoint mechanism. The receiver saves a binary sidecar (`.hpudp-ckpt`) every heartbeat. On a matching SESSION_REQ (same file hash), the receiver restores its receive bitset and `HighestContiguous` from the sidecar; the sender needs no knowledge of the resume. Packet types `0x0B RESUME_REQ` and `0x0C RESUME_ACCEPT` are reserved but unused in the current implementation.

- **Appendix A:** GCM Nonce Size description and Key Derivation output updated to reflect the 24-byte HKDF and `iv_base`-based nonce.

**Changelog from v4.2 → v5.0 (End-to-End Encryption):**

- **§4.5 End-to-End Encryption (new section):** Optional AES-128-GCM per-packet encryption with ephemeral X25519 key exchange. Pure ephemeral keys — no persistent key material, perfect forward secrecy. Encryption is negotiated via the `Encrypted` flag (`0x04`) in the packet header.

- **§3 PacketType:** Added `0x0A SESSION_ACCEPT` — carries receiver's ephemeral public key during encrypted handshake.

- **§3 Flags:** Added `0x04 = Encrypted` flag bit.

- **§4C Handshake:** 0-RTT (unencrypted) / 1-RTT (encrypted). New Step 1.5: `SESSION_ACCEPT` key exchange. Calibration burst is encrypted in encrypted mode.

- **§4.5A-G:** Full encryption specification — key exchange flow for all modes (send/recv, push, pull), HKDF session key derivation, nonce construction (SessionID + PacketType + UniqueID, not transmitted), encrypt-after-FEC data path, extended payload formats, security properties, performance budget.

- **§11A Reason Codes:** Added `0x06 ENCRYPTION_UNSUPPORTED`.

- **Appendix A:** Encrypted MaxPayload (1352 bytes), cipher/nonce/key-exchange constants.

- **Backward compatible:** Unencrypted transfers are unchanged. The `Encrypted` flag defaults to 0.

**Changelog from v4.1 → v4.2 (C Implementation Readiness):**

- **§3 Wire Format / Byte Order:** Explicit statement that *all* multi-byte fields — header, heartbeat payload, SESSION_REQ payload, PUSH_REQ payload — use big-endian (network byte order). Previously only the header stated this; payload byte order was implicit from Go's `binary.BigEndian`.

- **§3 Header / SenderTimestampNs:** Language-neutral wording. References to `time.Now().UnixNano()` replaced with "monotonic nanosecond timestamp" and platform guidance for C (`clock_gettime(CLOCK_MONOTONIC)`) and Go (`time.Now().UnixNano()`).

- **§5A / Receiver Disk I/O:** C implementation note added: `io_uring` submission queue replaces the Go async flush goroutine. Single-threaded `epoll` event loop architecture eliminates the socket ownership race described in Lesson F.

- **§6B / Heartbeat payload byte order:** Explicit network-byte-order requirement added to the Heartbeat payload table preamble. C implementations must `htonl`/`ntohl` every field.

- **§6D / Deficit Accumulator:** Expanded C-specific pacing guidance: `clock_gettime(CLOCK_MONOTONIC)` for elapsed time, `clock_nanosleep(CLOCK_MONOTONIC, TIMER_ABSTIME)` for sleeps, optional `SCHED_FIFO` for sub-100µs precision.

- **§8A / Teardown socket ownership:** Architecture-neutral rewrite. Go: stop heartbeat goroutine. C (single-threaded epoll): no action needed — one reader by construction.

- **§10.H.3 / Sliding Window slot count:** Recommended power-of-2 slot count (65,536) for C implementations. Bitmask index wrapping (`idx & 0xFFFF`) replaces modulo, eliminating a division per packet on the hot path.

- **§11C / Filename sanitization:** Language-neutral: manual scan for `/` and `\` to extract base name, replacing Go-specific `filepath.Base()`.

- **§11D / Manifest locking:** Language-neutral: read-write lock (C: `pthread_rwlock_t`, Go: `sync.RWMutex`). Ephemeral socket bind: C uses `bind()` with port 0.

**Changelog from v4.0 → v4.1 (Sender Sliding Window):**

- **§10.H.3 / Sender retransmit buffer:** `sentChunks map[uint64][]byte` replaced by a bounded `SlidingWindow` ring buffer (50,000 slots, ~68 MB peak). Entries are evicted on each `HighestContiguous` advance from incoming heartbeats; the tail pointer never holds confirmed data. The sender blocks new packet sends — but continues processing NACK retransmits — when the window is full, providing natural memory-safe backpressure.

- **§10.K / Backpressure NACK starvation deadlock fix:** The original backpressure implementation used `for sw.IsFull(seqNum) { time.Sleep(1ms) }`, a bare spin that never returned to the NACK-processing step at the top of the outer loop. If the first DATA packet was dropped, `HighestContiguous` stayed at 0 (`Advance(0)` is a no-op), the window filled at exactly seq 50,000 (~68 MB = ~6% of a 1 GB file), and NACKs in `nackPending` went unserviced, causing a receiver inactivity timeout. Fixed by replacing `for IsFull { sleep }` with `if IsFull { sleep; continue }`, so every backpressure iteration still executes NACK retransmits before sleeping.

**Changelog from v3.1 → v4.0 (WAN Reliability Overhaul):**

- **§3 Header:** `SenderTimestampNs` (8 bytes, unix nanoseconds) added at offset `0x18`. `HeaderSize` 24→32 bytes. `MaxPayload` 1376→1368 bytes. Sender timestamp is now in the fixed header (not the payload) for all DATA and PARITY packets.

- **§6 RTT Measurement — same-clock design:** Receiver echoes `pkt.Header.SenderTimestampNs` verbatim as `EchoTimestampNs`. Sender computes `RTT = time.Now().UnixNano() − EchoTimestampNs` using only its own clock, eliminating cross-machine clock-skew error entirely.

- **§6 Frozen-timestamp RTT guard:** `TokenBucket` tracks `lastEchoNs`; RTT is only updated when `echoNs > lastEchoNs`. Stale repeated echoes (sender idle during NACK cooldown) are silently ignored, preventing RTT inflation past the receiver's 5-second inactivity timeout.

- **§8 RTT-aware NACK cooldown:** Teardown retransmit loop gates each sequence to at most one retransmit per RTT × 1.25, eliminating self-reinforcing congestion spirals from redundant retransmits on short-RTT paths.

- **§8 Tail-drop deadlock prevention:** If a teardown heartbeat reports 0 NACKs but `HighestContiguous < totalChunks−1`, the sender proactively injects up to 167 missing tail sequences into the retransmit pipeline, breaking the deadlock where tail drops prevent `HighestReceived` from advancing.

- **§8 Teardown micro-burst prevention:** Retransmits are chunked into batches of 10 with a 2ms sleep between batches, preventing ~232 KB bursts from overwhelming the OS UDP socket buffer and the serve daemon's 256-slot receive channel.

- **§6 CC effective rate:** `rawEffective = NetworkDeliveryRate` only. `StorageFlushRate` has been removed from the rate formula: because out-of-order writes land in a pre-allocated ring buffer, its disk-flush reading was always 0, and that reading no longer falsely triggers the delivery-collapse guard.

- **§6 Delivery-collapse threshold:** 50%→25%. High-latency paths have ~50% of packets in-flight during warm-up; the old threshold fired prematurely on measurement lag.

- **§9 Progress bar repair state:** After main send loop completes, progress bar shows `Repairing...` instead of a speed figure, indicating tail recovery rather than a hang.

**Changelog from v3.0 → v3.1 (retained):**

- **§4 Calibration:** Burst reduced from 100 to 10 packets (packet train). Receiver measures inter-arrival dispersion and reports bottleneck bandwidth in the first heartbeat via new `DispersionNs` field.

- **§6 RTT Measurement:** DATA packets carry sender timestamps; heartbeats echo them back in new `EchoTimestamp` field. Sender computes RTT for rate-gating.

- **§6 Phased Growth:** Replaced single-phase 1.5× multiplicative increase with two-phase model: Phase 1 (probe, 1.25× per RTT) transitions permanently to Phase 2 (additive, +MaxPayload/RTT per RTT) upon first observation of 1–5% loss.

- **§6 Decrease Formula:** Changed from `E × 1.05` to `E × 0.85`. The old formula sustained congestion by targeting above the capacity that caused loss; the new formula drops below to drain router queues, relying on FEC to bridge the gap.

- **§6 Rate Gating:** Increases are limited to once per RTT. Decreases remain immediate. Prevents runaway probing on high-latency links.

- **§9 Phase 2:** Added `sendmmsg()`/`recvmmsg()` batching, `io_uring`, and throughput analysis explaining the Phase 1 FTP speed gap (~30 MB/s vs 93 MB/s due to per-packet syscall overhead).

**Changelog from v1.0 (retained):**

- Specified SessionID generation and collision handling for 0-RTT.

- Added receiver-side inactivity timeout and sender failure recovery.

- Separated flow control (receiver capacity) from congestion control (network capacity).

- Added receiver-side linger timeout for teardown resilience.

- Clarified MTU hard cap: 1400 bytes total, 1376 bytes max payload.

- Widened SequenceNum and BlockGroup to 64-bit (8 bytes each). Header is now 24 bytes, naturally aligned for 64-bit systems.

- Introduced adaptive FEC ratio based on observed loss.

- Made heartbeat interval rate-proportional.

- Specified FEC tail block handling for partial final groups.