GSID Format
Structure
GSID Structure
Global Subject IDs (GSIDs) in this system use a custom format designed to be unique, roughly sortable by time, and human-readable.
The format is a 21-character string: GSID-XXXXXXXXXXXXXXXX
GSID-TTTTTRRRRRRRRRRR
│ │ │
│ │ └─ Random component (11 characters)
│ └───── Timestamp component (5 characters)
└─────────── Static Prefix "GSID-"
- Prefix: Always
GSID-. - Timestamp (
TTTTT): A 5-character Base32 string representing the milliseconds since the Unix epoch (modulo 32^5). This allows GSIDs to be roughly sortable by creation time. - Randomness (
RRRRRRRRRRR): An 11-character Base32 string generated from a cryptographically secure random number. This provides approximately 55 bits of randomness, ensuring a high degree of uniqueness for IDs generated within the same millisecond.
Characteristics
- Length: 21 characters total.
- Character set: A custom Base32 alphabet (
0123456789ABCDEFGHJKMNPQRSTVWXYZ) that excludes potentially confusing characters like 'I', 'L', 'O', and 'U'. - Sortable: Roughly sortable by creation time due to the timestamp component.
- Unique: Offers ~80 bits of uniqueness (25 from timestamp + 55 from random), making collisions extremely unlikely.
Generation Algorithm
GSID Generation Logic (Python)
The generation logic is custom and does not use the standard ULID library.
import logging
import secrets
import time
# Custom Base32 alphabet (Crockford's base32 without confusing characters)
BASE32_ALPHABET = "0123456789ABCDEFGHJKMNPQRSTVWXYZ" # 32 chars, no I, L, O, U
def encode_base32(num: int, length: int) -> str:
"""Encode number as base32 string with custom alphabet"""
if num == 0:
return BASE32_ALPHABET[0] * length
result = []
while num > 0:
result.append(BASE32_ALPHABET[num % 32])
num //= 32
# Pad with leading zeros
while len(result) < length:
result.append(BASE32_ALPHABET[0])
return "".join(reversed(result))
def generate_gsid() -> str:
"""
Generate a single GSID in format: GSID-XXXXXXXXXXXXXXXX (21 characters total)
"""
# Get current timestamp in milliseconds
timestamp_ms = int(time.time() * 1000)
# Encode timestamp as 5 base32 characters
timestamp_part = encode_base32(timestamp_ms % (32**5), length=5)
# Generate 11 characters of randomness
max_random = 32**11
random_int = secrets.randbelow(max_random)
random_part = encode_base32(random_int, length=11)
return f"GSID-{timestamp_part}{random_part}"
Validation
GSID Validation Logic (Python)
Validation must check for the prefix and the 16-character payload.
import re
def validate_gsid(gsid: str) -> bool:
"""Validate the custom GSID format."""
if not gsid.startswith("GSID-"):
return False
payload = gsid[5:]
if len(payload) != 16:
return False
# Check character set
pattern = r'^[0-9A-HJKMNP-TV-Z]{16}$'
return bool(re.match(pattern, payload.upper()))
# Example
# validate_gsid("GSID-4A1B2C3D4E5F6G7H") -> True