Skip to content

GSID Format

Structure

GSID Structure

Global Subject IDs (GSIDs) in this system use a custom format designed to be unique, roughly sortable by time, and human-readable.

The format is a 21-character string: GSID-XXXXXXXXXXXXXXXX

GSID-TTTTTRRRRRRRRRRR
│     │   │
│     │   └─ Random component (11 characters)
│     └───── Timestamp component (5 characters)
└─────────── Static Prefix "GSID-"
  • Prefix: Always GSID-.
  • Timestamp (TTTTT): A 5-character Base32 string representing the milliseconds since the Unix epoch (modulo 32^5). This allows GSIDs to be roughly sortable by creation time.
  • Randomness (RRRRRRRRRRR): An 11-character Base32 string generated from a cryptographically secure random number. This provides approximately 55 bits of randomness, ensuring a high degree of uniqueness for IDs generated within the same millisecond.

Characteristics

  • Length: 21 characters total.
  • Character set: A custom Base32 alphabet (0123456789ABCDEFGHJKMNPQRSTVWXYZ) that excludes potentially confusing characters like 'I', 'L', 'O', and 'U'.
  • Sortable: Roughly sortable by creation time due to the timestamp component.
  • Unique: Offers ~80 bits of uniqueness (25 from timestamp + 55 from random), making collisions extremely unlikely.

Generation Algorithm

GSID Generation Logic (Python)

The generation logic is custom and does not use the standard ULID library.

import logging
import secrets
import time

# Custom Base32 alphabet (Crockford's base32 without confusing characters)
BASE32_ALPHABET = "0123456789ABCDEFGHJKMNPQRSTVWXYZ"  # 32 chars, no I, L, O, U

def encode_base32(num: int, length: int) -> str:
    """Encode number as base32 string with custom alphabet"""
    if num == 0:
        return BASE32_ALPHABET[0] * length

    result = []
    while num > 0:
        result.append(BASE32_ALPHABET[num % 32])
        num //= 32

    # Pad with leading zeros
    while len(result) < length:
        result.append(BASE32_ALPHABET[0])

    return "".join(reversed(result))


def generate_gsid() -> str:
    """
    Generate a single GSID in format: GSID-XXXXXXXXXXXXXXXX (21 characters total)
    """
    # Get current timestamp in milliseconds
    timestamp_ms = int(time.time() * 1000)

    # Encode timestamp as 5 base32 characters
    timestamp_part = encode_base32(timestamp_ms % (32**5), length=5)

    # Generate 11 characters of randomness
    max_random = 32**11
    random_int = secrets.randbelow(max_random)
    random_part = encode_base32(random_int, length=11)

    return f"GSID-{timestamp_part}{random_part}"

Validation

GSID Validation Logic (Python)

Validation must check for the prefix and the 16-character payload.

import re

def validate_gsid(gsid: str) -> bool:
    """Validate the custom GSID format."""
    if not gsid.startswith("GSID-"):
        return False

    payload = gsid[5:]
    if len(payload) != 16:
        return False

    # Check character set
    pattern = r'^[0-9A-HJKMNP-TV-Z]{16}$'
    return bool(re.match(pattern, payload.upper()))

# Example
# validate_gsid("GSID-4A1B2C3D4E5F6G7H") -> True