Skip to content

GSID Service Documentation

Overview

The Global Subject ID (GSID) Service is a centralized microservice responsible for managing subject identities across the IDhub platform. It provides GSID generation, local ID resolution, and fuzzy matching capabilities.

Architecture

graph TB
    subgraph "GSID Service"
        API[FastAPI Application]
        AUTH[API Key Authentication]
        ROUTER[API Router]

        subgraph "Core Services"
            GEN[GSID Generator]
            RES[ID Resolver]
            FUZZY[Fuzzy Matcher]
        end

        subgraph "Data Layer"
            REPO[Subject Repository]
            DB[(PostgreSQL)]
        end
    end

    CLIENT[Client Application] -->|HTTPS| API
    API --> AUTH
    AUTH --> ROUTER

    ROUTER -->|/generate| GEN
    ROUTER -->|/resolve| RES
    ROUTER -->|/fuzzy-match| FUZZY

    GEN --> REPO
    RES --> REPO
    FUZZY --> REPO

    REPO --> DB

    style API fill:#9C27B0
    style DB fill:#4CAF50

Technology Stack

  • Framework: FastAPI 0.104+
  • Language: Python 3.11+
  • Database: PostgreSQL 15+ (asyncpg)
  • Authentication: API Key (Header-based)
  • Deployment: Docker, Uvicorn

API Endpoints

Health Check

Endpoint: GET /health

Description: Service health and version information

Authentication: None required

Response:

{
  "status": "healthy",
  "service": "gsid-service",
  "version": "1.0.0",
  "database": "connected"
}

Example:

curl https://api.idhub.ibdgc.org/api/gsid/health

Generate GSID

Endpoint: POST /api/gsid/generate

Description: Generate a new GSID for a subject

Authentication: Required (API Key)

Request Body:

{
  "center_id": 1,
  "local_subject_id": "GAP-001",
  "identifier_type": "consortium_id",
  "metadata": {
    "sex": "F",
    "diagnosis": "CD",
    "age_at_diagnosis": 25
  }
}

Response (201 Created):

{
  "gsid": "GSID-4A1B2C3D4E5F6G7H",
  "subject_id": "550e8400-e29b-41d4-a716-446655440000",
  "local_subject_id": "GAP-001",
  "center_id": 1,
  "identifier_type": "consortium_id",
  "created_at": "2024-01-15T10:00:00Z"
}

Error Responses:

  • 400 Bad Request: Invalid input data
  • 401 Unauthorized: Missing or invalid API key
  • 409 Conflict: Local ID already exists

Example:

curl -X POST https://api.idhub.ibdgc.org/api/gsid/generate \
  -H "X-API-Key: your-api-key" \
  -H "Content-Type: application/json" \
  -d '{
    "center_id": 1,
    "local_subject_id": "GAP-001",
    "identifier_type": "consortium_id"
  }'

Python Example:

import requests

response = requests.post(
    "https://api.idhub.ibdgc.org/api/gsid/generate",
    headers={"X-API-Key": "your-api-key"},
    json={
        "center_id": 1,
        "local_subject_id": "GAP-001",
        "identifier_type": "consortium_id"
    }
)

if response.status_code == 201:
    gsid = response.json()["gsid"]
    print(f"Generated GSID: {gsid}")

Resolve GSID

Endpoint: POST /api/gsid/resolve

Description: Resolve a local subject ID to a GSID

Authentication: Required (API Key)

Request Body:

{
  "center_id": 1,
  "local_subject_id": "GAP-001",
  "identifier_type": "consortium_id"
}

Response (200 OK):

{
  "gsid": "GSID-4A1B2C3D4E5F6G7H",
  "subject_id": "550e8400-e29b-41d4-a716-446655440000",
  "local_subject_id": "GAP-001",
  "center_id": 1,
  "identifier_type": "consortium_id",
  "found": true
}

Response (404 Not Found):

{
  "gsid": null,
  "found": false,
  "message": "No subject found for center_id=1, local_subject_id=GAP-001"
}

Example:

curl -X POST https://api.idhub.ibdgc.org/api/gsid/resolve \
  -H "X-API-Key: your-api-key" \
  -H "Content-Type: application/json" \
  -d '{
    "center_id": 1,
    "local_subject_id": "GAP-001"
  }'

Batch Resolve

Endpoint: POST /api/gsid/batch/resolve

Description: Resolve multiple local IDs in a single request

Authentication: Required (API Key)

Request Body:

{
  "identifiers": [
    {
      "center_id": 1,
      "local_subject_id": "GAP-001",
      "identifier_type": "consortium_id"
    },
    {
      "center_id": 1,
      "local_subject_id": "GAP-002",
      "identifier_type": "consortium_id"
    }
  ]
}

Response (200 OK):

{
  "results": [
    {
      "local_subject_id": "GAP-001",
      "gsid": "GSID-4A1B2C3D4E5F6G7H",
      "found": true
    },
    {
      "local_subject_id": "GAP-002",
      "gsid": "GSID-5B2C3D4E5F6G7H8J",
      "found": true
    }
  ],
  "total": 2,
  "found": 2,
  "not_found": 0
}

Example:

import requests

response = requests.post(
    "https://api.idhub.ibdgc.org/api/gsid/batch/resolve",
    headers={"X-API-Key": "your-api-key"},
    json={
        "identifiers": [
            {"center_id": 1, "local_subject_id": "GAP-001"},
            {"center_id": 1, "local_subject_id": "GAP-002"}
        ]
    }
)

results = response.json()["results"]
for result in results:
    print(f"{result['local_subject_id']} -> {result['gsid']}")

Fuzzy Match

Endpoint: POST /api/gsid/fuzzy-match

Description: Find potential subject matches using fuzzy matching

Authentication: Required (API Key)

Request Body:

{
  "first_name": "John",
  "last_name": "Doe",
  "date_of_birth": "1990-01-15",
  "threshold": 0.85
}

Response (200 OK):

{
  "matches": [
    {
      "gsid": "GSID-4A1B2C3D4E5F6G7H",
      "confidence": 0.95,
      "matched_fields": {
        "first_name": "John",
        "last_name": "Doe",
        "date_of_birth": "1990-01-15"
      }
    },
    {
      "gsid": "GSID-5B2C3D4E5F6G7H8J",
      "confidence": 0.87,
      "matched_fields": {
        "first_name": "Jon",
        "last_name": "Doe",
        "date_of_birth": "1990-01-15"
      }
    }
  ],
  "total_matches": 2
}

Parameters:

  • threshold: Minimum confidence score (0.0-1.0, default: 0.8)

Example:

curl -X POST https://api.idhub.ibdgc.org/api/gsid/fuzzy-match \
  -H "X-API-Key: your-api-key" \
  -H "Content-Type: application/json" \
  -d '{
    "first_name": "John",
    "last_name": "Doe",
    "date_of_birth": "1990-01-15",
    "threshold": 0.85
  }'

Get Subject by GSID

Endpoint: GET /api/gsid/subjects/{gsid}

Description: Retrieve subject details by GSID

Authentication: Required (API Key)

Response (200 OK):

{
  "gsid": "GSID-4A1B2C3D4E5F6G7H",
  "subject_id": "550e8400-e29b-41d4-a716-446655440000",
  "sex": "F",
  "diagnosis": "CD",
  "age_at_diagnosis": 25,
  "local_identifiers": [
    {
      "center_id": 1,
      "local_subject_id": "GAP-001",
      "identifier_type": "consortium_id"
    },
    {
      "center_id": 1,
      "local_subject_id": "MRN-123456",
      "identifier_type": "mrn"
    }
  ],
  "created_at": "2024-01-15T10:00:00Z",
  "updated_at": "2024-01-20T14:30:00Z"
}

Response (404 Not Found):

{
  "detail": "Subject not found"
}

Example:

curl https://api.idhub.ibdgc.org/api/gsid/subjects/GSID-4A1B2C3D4E5F6G7H \
  -H "X-API-Key: your-api-key"

GSID Format

The technical specification for the Global Subject ID, including its structure, generation, and validation, has been moved to a dedicated page.

➡️ See the GSID Format Documentation


Configuration

Environment Variables

# Database Configuration
DB_HOST=idhub_db
DB_NAME=idhub
DB_USER=idhub_user
DB_PASSWORD=your_secure_password
DB_PORT=5432

# API Configuration
GSID_API_KEY=your-secure-random-key-here-min-32-chars
API_HOST=0.0.0.0
API_PORT=8000

# Logging
LOG_LEVEL=INFO

# CORS (optional)
CORS_ORIGINS=https://idhub.ibdgc.org,https://qa.idhub.ibdgc.org

# Fuzzy Matching
FUZZY_MATCH_THRESHOLD=0.8

Database Connection

# core/database.py
import asyncpg
from typing import Optional
import os

class Database:
    def __init__(self):
        self.pool: Optional[asyncpg.Pool] = None

    async def connect(self):
        """Create database connection pool"""
        self.pool = await asyncpg.create_pool(
            host=os.getenv("DB_HOST"),
            port=int(os.getenv("DB_PORT", 5432)),
            database=os.getenv("DB_NAME"),
            user=os.getenv("DB_USER"),
            password=os.getenv("DB_PASSWORD"),
            min_size=5,
            max_size=20,
            command_timeout=60
        )

    async def disconnect(self):
        """Close database connection pool"""
        if self.pool:
            await self.pool.close()

db = Database()

Authentication

API Key Authentication

The GSID Service uses header-based API key authentication:

# core/auth.py
from fastapi import Security, HTTPException, status
from fastapi.security import APIKeyHeader
import os

API_KEY_HEADER = APIKeyHeader(name="X-API-Key")

async def verify_api_key(api_key: str = Security(API_KEY_HEADER)):
    """Verify API key"""
    expected_key = os.getenv("GSID_API_KEY")

    if not expected_key:
        raise HTTPException(
            status_code=status.HTTP_500_INTERNAL_SERVER_ERROR,
            detail="API key not configured"
        )

    if api_key != expected_key:
        raise HTTPException(
            status_code=status.HTTP_401_UNAUTHORIZED,
            detail="Invalid API key"
        )

    return api_key

Usage in Endpoints

from fastapi import Depends
from core.auth import verify_api_key

# Assuming router is defined elsewhere
# from fastapi import APIRouter
# router = APIRouter()

# Placeholder for GenerateRequest if it exists
# class GenerateRequest:
#     pass

# @router.post("/generate")
# async def generate_gsid(
#     request: GenerateRequest,
#     api_key: str = Depends(verify_api_key)
# ):
#     # Endpoint logic
#     pass

Client Authentication

import requests

# Set API key in headers
headers = {
    "X-API-Key": "your-api-key-here",
    "Content-Type": "application/json"
}

response = requests.post(
    "https://api.idhub.ibdgc.org/api/gsid/generate",
    headers=headers,
    json={"center_id": 1, "local_subject_id": "GAP-001"}
)

Error Handling

Error Response Format

{
  "detail": "Error message",
  "error_code": "ERROR_CODE",
  "timestamp": "2024-01-15T10:00:00Z"
}

Common Error Codes

Status Code Error Code Description
400 INVALID_INPUT Invalid request data
401 UNAUTHORIZED Missing or invalid API key
404 NOT_FOUND Resource not found
409 CONFLICT Resource already exists
500 INTERNAL_ERROR Server error

Error Handling Example

import requests

try:
    response = requests.post(
        "https://api.idhub.ibdgc.org/api/gsid/generate",
        headers={"X-API-Key": "your-api-key"},
        json={"center_id": 1, "local_subject_id": "GAP-001"}
    )
    response.raise_for_status()

    gsid = response.json()["gsid"]
    print(f"Generated GSID: {gsid}")

except requests.exceptions.HTTPError as e:
    if e.response.status_code == 409:
        print("Subject already exists")
    elif e.response.status_code == 401:
        print("Invalid API key")
    else:
        print(f"Error: {e.response.json()['detail']}")

except requests.exceptions.RequestException as e:
    print(f"Request failed: {e}")

Performance

Benchmarks

Operation Avg Response Time Throughput
Generate GSID 15ms 500 req/s
Resolve GSID 10ms 800 req/s
Batch Resolve (100) 50ms 2000 IDs/s
Fuzzy Match 100ms 100 req/s

Optimization Tips

  1. Use Batch Operations: Resolve multiple IDs in a single request
  2. Connection Pooling: Reuse HTTP connections
  3. Caching: Cache frequently accessed GSIDs
  4. Async Requests: Use async HTTP clients for concurrent requests

Batch Processing Example

import asyncio
import aiohttp

async def batch_resolve_async(identifiers, api_key):
    """Resolve multiple IDs asynchronously"""
    async with aiohttp.ClientSession() as session:
        async with session.post(
            "https://api.idhub.ibdgc.org/api/gsid/batch/resolve",
            headers={"X-API-Key": api_key},
            json={"identifiers": identifiers}
        ) as response:
            return await response.json()

# Usage
identifiers = [
    {"center_id": 1, "local_subject_id": f"GAP-{i:03d}"}
    for i in range(1, 101)
]

results = asyncio.run(batch_resolve_async(identifiers, "your-api-key"))
print(f"Resolved {results['found']} out of {results['total']} IDs")

Monitoring

Health Checks

# Basic health check
curl https://api.idhub.ibdgc.org/api/gsid/health

# Database connectivity check
curl https://api.idhub.ibdgc.org/api/gsid/health/db

Metrics

The service exposes the following metrics:

  • Request count: Total API requests
  • Request duration: Response time percentiles
  • Error rate: Failed requests per endpoint
  • Database connections: Active/idle connections
  • GSID generation rate: GSIDs created per minute

Logging

# Structured logging format
{
  "timestamp": "2024-01-15T10:00:00Z",
  "level": "INFO",
  "service": "gsid-service",
  "endpoint": "/api/gsid/generate",
  "method": "POST",
  "status_code": 201,
  "duration_ms": 15,
  "gsid": "GSID-4A1B2C3D4E5F6G7H",
  "center_id": 1,
  "local_subject_id": "GAP-001"
}

Development

Local Setup

# Clone repository
git clone https://github.com/ibdgc/idhub.git
cd idhub/gsid-service

# Create virtual environment
python -m venv venv
source venv/bin/activate  # On Windows: venv\Scripts\activate

# Install dependencies
pip install -r requirements.txt
pip install -r requirements-test.txt

# Set environment variables
cp .env.example .env
# Edit .env with your configuration

# Run database migrations
python -m alembic upgrade head

# Start development server
uvicorn main:app --reload --port 8000

Running Tests

# Run all tests
pytest

# Run with coverage
pytest --cov=. --cov-report=html

# Run specific test file
pytest tests/test_api.py -v

# Run integration tests only
pytest -m integration

Docker Development

# Build image
docker build -t gsid-service:dev .

# Run container
docker run -p 8000:8000 \
  -e DB_HOST=host.docker.internal \
  -e DB_NAME=idhub \
  -e GSID_API_KEY=test-key \
  gsid-service:dev

# Run with docker-compose
docker-compose up gsid-service

Deployment

Production Deployment

# Build production image
docker build -t gsid-service:latest .

# Run with production settings
docker run -d \
  --name gsid-service \
  -p 8000:8000 \
  -e DB_HOST=prod-db.example.com \
  -e DB_NAME=idhub \
  -e DB_USER=idhub_user \
  -e DB_PASSWORD=${DB_PASSWORD} \
  -e GSID_API_KEY=${GSID_API_KEY} \
  -e LOG_LEVEL=INFO \
  --restart unless-stopped \
  gsid-service:latest

Health Check Configuration

# docker-compose.yml
services:
  gsid-service:
    image: gsid-service:latest
    healthcheck:
      test: ["CMD", "curl", "-f", "http://localhost:8000/health"]
      interval: 30s
      timeout: 10s
      retries: 3
      start_period: 40s

Troubleshooting

Common Issues

1. Database Connection Errors

Symptom: asyncpg.exceptions.ConnectionDoesNotExistError

Solution:

# Check database connectivity
psql -h $DB_HOST -U $DB_USER -d $DB_NAME

# Verify environment variables
echo $DB_HOST
echo $DB_PORT
echo $DB_NAME

# Check connection pool settings
# Increase max_size if needed

2. API Key Authentication Failures

Symptom: 401 Unauthorized

Solution:

# Verify API key is set
echo $GSID_API_KEY

# Check header format
curl -H "X-API-Key: your-key" https://api.idhub.ibdgc.org/api/gsid/health

# Ensure no trailing whitespace in .env file

3. Slow Response Times

Symptom: Requests taking >1 second

Solution:

-- Check database indexes
SELECT schemaname, tablename, indexname
FROM pg_indexes
WHERE tablename IN ('subjects', 'local_subject_ids');

-- Analyze query performance
EXPLAIN ANALYZE
SELECT * FROM local_subject_ids
WHERE center_id = 1 AND local_subject_id = 'GAP-001';

-- Rebuild indexes if needed
REINDEX TABLE local_subject_ids;

Support

For issues or questions:

  • GitHub Issues: https://github.com/ibdgc/idhub/issues
  • Email: support@ibdgc.org