GSID Service Documentation
Overview
The Global Subject ID (GSID) Service is a centralized microservice responsible for managing subject identities across the IDhub platform. It provides GSID generation, local ID resolution, and fuzzy matching capabilities.
Architecture
graph TB
subgraph "GSID Service"
API[FastAPI Application]
AUTH[API Key Authentication]
ROUTER[API Router]
subgraph "Core Services"
GEN[GSID Generator]
RES[ID Resolver]
FUZZY[Fuzzy Matcher]
end
subgraph "Data Layer"
REPO[Subject Repository]
DB[(PostgreSQL)]
end
end
CLIENT[Client Application] -->|HTTPS| API
API --> AUTH
AUTH --> ROUTER
ROUTER -->|/generate| GEN
ROUTER -->|/resolve| RES
ROUTER -->|/fuzzy-match| FUZZY
GEN --> REPO
RES --> REPO
FUZZY --> REPO
REPO --> DB
style API fill:#9C27B0
style DB fill:#4CAF50
Technology Stack
- Framework: FastAPI 0.104+
- Language: Python 3.11+
- Database: PostgreSQL 15+ (asyncpg)
- Authentication: API Key (Header-based)
- Deployment: Docker, Uvicorn
API Endpoints
Health Check
Endpoint: GET /health
Description: Service health and version information
Authentication: None required
Response:
Example:
Generate GSID
Endpoint: POST /api/gsid/generate
Description: Generate a new GSID for a subject
Authentication: Required (API Key)
Request Body:
{
"center_id": 1,
"local_subject_id": "GAP-001",
"identifier_type": "consortium_id",
"metadata": {
"sex": "F",
"diagnosis": "CD",
"age_at_diagnosis": 25
}
}
Response (201 Created):
{
"gsid": "GSID-4A1B2C3D4E5F6G7H",
"subject_id": "550e8400-e29b-41d4-a716-446655440000",
"local_subject_id": "GAP-001",
"center_id": 1,
"identifier_type": "consortium_id",
"created_at": "2024-01-15T10:00:00Z"
}
Error Responses:
- 400 Bad Request: Invalid input data
- 401 Unauthorized: Missing or invalid API key
- 409 Conflict: Local ID already exists
Example:
curl -X POST https://api.idhub.ibdgc.org/api/gsid/generate \
-H "X-API-Key: your-api-key" \
-H "Content-Type: application/json" \
-d '{
"center_id": 1,
"local_subject_id": "GAP-001",
"identifier_type": "consortium_id"
}'
Python Example:
import requests

response = requests.post(
    "https://api.idhub.ibdgc.org/api/gsid/generate",
    headers={"X-API-Key": "your-api-key"},
    json={
        "center_id": 1,
        "local_subject_id": "GAP-001",
        "identifier_type": "consortium_id"
    }
)
if response.status_code == 201:
    gsid = response.json()["gsid"]
    print(f"Generated GSID: {gsid}")
Resolve GSID
Endpoint: POST /api/gsid/resolve
Description: Resolve a local subject ID to a GSID
Authentication: Required (API Key)
Request Body:
{
"center_id": 1,
"local_subject_id": "GAP-001"
}
Response (200 OK):
{
"gsid": "GSID-4A1B2C3D4E5F6G7H",
"subject_id": "550e8400-e29b-41d4-a716-446655440000",
"local_subject_id": "GAP-001",
"center_id": 1,
"identifier_type": "consortium_id",
"found": true
}
Response (404 Not Found):
{
"gsid": null,
"found": false,
"message": "No subject found for center_id=1, local_subject_id=GAP-001"
}
Example:
curl -X POST https://api.idhub.ibdgc.org/api/gsid/resolve \
-H "X-API-Key: your-api-key" \
-H "Content-Type: application/json" \
-d '{
"center_id": 1,
"local_subject_id": "GAP-001"
}'
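For Python callers, the two documented response shapes (200 with `found: true`, 404 with `found: false`) can be folded into a single return value. The following is a minimal sketch that assumes the response body has already been decoded from JSON; `gsid_from_resolve_response` is a hypothetical helper name, not part of the service.

```python
from typing import Optional


def gsid_from_resolve_response(status_code: int, body: dict) -> Optional[str]:
    """Return the GSID from a resolve response, or None when no subject was found."""
    if status_code in (200, 404):
        # The documented 404 body carries gsid=null and found=false.
        return body.get("gsid") if body.get("found") else None
    # Any other status (400, 401, 500, ...) is treated as an error in this sketch.
    raise RuntimeError(f"Unexpected resolve response: HTTP {status_code}")


found = {"gsid": "GSID-4A1B2C3D4E5F6G7H", "found": True}
missing = {"gsid": None, "found": False, "message": "No subject found"}
print(gsid_from_resolve_response(200, found))
print(gsid_from_resolve_response(404, missing))
```

Returning `None` for a miss keeps "not found" distinct from authentication or validation failures, which raise instead.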
Batch Resolve
Endpoint: POST /api/gsid/batch/resolve
Description: Resolve multiple local IDs in a single request
Authentication: Required (API Key)
Request Body:
{
"identifiers": [
{
"center_id": 1,
"local_subject_id": "GAP-001",
"identifier_type": "consortium_id"
},
{
"center_id": 1,
"local_subject_id": "GAP-002",
"identifier_type": "consortium_id"
}
]
}
Response (200 OK):
{
"results": [
{
"local_subject_id": "GAP-001",
"gsid": "GSID-4A1B2C3D4E5F6G7H",
"found": true
},
{
"local_subject_id": "GAP-002",
"gsid": "GSID-5B2C3D4E5F6G7H8J",
"found": true
}
],
"total": 2,
"found": 2,
"not_found": 0
}
Example:
import requests

response = requests.post(
    "https://api.idhub.ibdgc.org/api/gsid/batch/resolve",
    headers={"X-API-Key": "your-api-key"},
    json={
        "identifiers": [
            {"center_id": 1, "local_subject_id": "GAP-001"},
            {"center_id": 1, "local_subject_id": "GAP-002"}
        ]
    }
)
results = response.json()["results"]
for result in results:
    print(f"{result['local_subject_id']} -> {result['gsid']}")
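When many IDs are resolved at once, it is often convenient to turn the `results` array into a lookup table. The sketch below assumes the response shape shown above; `index_batch_results` is a hypothetical helper, and it keys on `local_subject_id` alone, so it only suits batches drawn from a single center.

```python
from typing import Dict, Optional


def index_batch_results(payload: dict) -> Dict[str, Optional[str]]:
    """Map each local_subject_id to its GSID, or to None when it was not found."""
    return {
        item["local_subject_id"]: item.get("gsid") if item.get("found") else None
        for item in payload["results"]
    }


payload = {
    "results": [
        {"local_subject_id": "GAP-001", "gsid": "GSID-4A1B2C3D4E5F6G7H", "found": True},
        {"local_subject_id": "GAP-002", "gsid": None, "found": False},
    ],
    "total": 2,
    "found": 1,
    "not_found": 1,
}
lookup = index_batch_results(payload)
unresolved = [local_id for local_id, gsid in lookup.items() if gsid is None]
print(unresolved)
```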
Fuzzy Match
Endpoint: POST /api/gsid/fuzzy-match
Description: Find potential subject matches using fuzzy matching
Authentication: Required (API Key)
Request Body:
{
"first_name": "John",
"last_name": "Doe",
"date_of_birth": "1990-01-15",
"threshold": 0.85
}
Response (200 OK):
{
"matches": [
{
"gsid": "GSID-4A1B2C3D4E5F6G7H",
"confidence": 0.95,
"matched_fields": {
"first_name": "John",
"last_name": "Doe",
"date_of_birth": "1990-01-15"
}
},
{
"gsid": "GSID-5B2C3D4E5F6G7H8J",
"confidence": 0.87,
"matched_fields": {
"first_name": "Jon",
"last_name": "Doe",
"date_of_birth": "1990-01-15"
}
}
],
"total_matches": 2
}
Parameters:
threshold: Minimum confidence score (0.0-1.0, default: 0.8)
Example:
curl -X POST https://api.idhub.ibdgc.org/api/gsid/fuzzy-match \
-H "X-API-Key: your-api-key" \
-H "Content-Type: application/json" \
-d '{
"first_name": "John",
"last_name": "Doe",
"date_of_birth": "1990-01-15",
"threshold": 0.85
}'
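A fuzzy match can return several candidates, so callers usually need a rule for choosing among them. The sketch below picks the highest-confidence candidate at or above a cutoff. The `best_match` helper and its default cutoff are illustrative assumptions; whether a single automatic pick is appropriate depends on the use case.

```python
from typing import Optional


def best_match(payload: dict, min_confidence: float = 0.8) -> Optional[dict]:
    """Return the highest-confidence match at or above min_confidence, else None."""
    candidates = [
        match for match in payload.get("matches", [])
        if match["confidence"] >= min_confidence
    ]
    return max(candidates, key=lambda match: match["confidence"], default=None)


payload = {
    "matches": [
        {"gsid": "GSID-4A1B2C3D4E5F6G7H", "confidence": 0.95},
        {"gsid": "GSID-5B2C3D4E5F6G7H8J", "confidence": 0.87},
    ],
    "total_matches": 2,
}
top = best_match(payload, min_confidence=0.9)
print(top["gsid"] if top else "no confident match")
```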
Get Subject by GSID
Endpoint: GET /api/gsid/subjects/{gsid}
Description: Retrieve subject details by GSID
Authentication: Required (API Key)
Response (200 OK):
{
"gsid": "GSID-4A1B2C3D4E5F6G7H",
"subject_id": "550e8400-e29b-41d4-a716-446655440000",
"sex": "F",
"diagnosis": "CD",
"age_at_diagnosis": 25,
"local_identifiers": [
{
"center_id": 1,
"local_subject_id": "GAP-001",
"identifier_type": "consortium_id"
},
{
"center_id": 1,
"local_subject_id": "MRN-123456",
"identifier_type": "mrn"
}
],
"created_at": "2024-01-15T10:00:00Z",
"updated_at": "2024-01-20T14:30:00Z"
}
Response (404 Not Found):
Example:
curl https://api.idhub.ibdgc.org/api/gsid/subjects/GSID-4A1B2C3D4E5F6G7H \
-H "X-API-Key: your-api-key"
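Because a subject can carry several local identifiers, client code often needs to select them by type. The following sketch works on the decoded 200 response shown above; `local_ids_of_type` is a hypothetical helper name.

```python
from typing import List


def local_ids_of_type(subject: dict, identifier_type: str) -> List[str]:
    """Collect the local_subject_id values that have the given identifier_type."""
    return [
        entry["local_subject_id"]
        for entry in subject.get("local_identifiers", [])
        if entry.get("identifier_type") == identifier_type
    ]


subject = {
    "gsid": "GSID-4A1B2C3D4E5F6G7H",
    "local_identifiers": [
        {"center_id": 1, "local_subject_id": "GAP-001", "identifier_type": "consortium_id"},
        {"center_id": 1, "local_subject_id": "MRN-123456", "identifier_type": "mrn"},
    ],
}
print(local_ids_of_type(subject, "mrn"))
```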
GSID Format
The technical specification for the Global Subject ID, including its structure, generation, and validation, has been moved to a dedicated page.
➡️ See the GSID Format Documentation
Configuration
Environment Variables
# Database Configuration
DB_HOST=idhub_db
DB_NAME=idhub
DB_USER=idhub_user
DB_PASSWORD=your_secure_password
DB_PORT=5432
# API Configuration
GSID_API_KEY=your-secure-random-key-here-min-32-chars
API_HOST=0.0.0.0
API_PORT=8000
# Logging
LOG_LEVEL=INFO
# CORS (optional)
CORS_ORIGINS=https://idhub.ibdgc.org,https://qa.idhub.ibdgc.org
# Fuzzy Matching
FUZZY_MATCH_THRESHOLD=0.8
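Validating these variables once at startup tends to surface misconfiguration earlier than a failed request would. The sketch below is one way to do that with the standard library. The `Settings` and `load_settings` names are hypothetical, and the 32-character minimum is only inferred from the placeholder key above, so it is exposed as a parameter rather than treated as a rule of the service.

```python
import os
from dataclasses import dataclass, field
from typing import List, Mapping


@dataclass(frozen=True)
class Settings:
    db_host: str
    db_name: str
    db_user: str
    db_password: str
    api_key: str
    db_port: int = 5432
    log_level: str = "INFO"
    cors_origins: List[str] = field(default_factory=list)
    fuzzy_match_threshold: float = 0.8


def load_settings(env: Mapping[str, str] = os.environ, min_key_length: int = 32) -> Settings:
    """Read and validate the service configuration from an environment mapping."""
    required = ["DB_HOST", "DB_NAME", "DB_USER", "DB_PASSWORD", "GSID_API_KEY"]
    missing = [name for name in required if not env.get(name)]
    if missing:
        raise ValueError(f"Missing required variables: {', '.join(missing)}")
    if len(env["GSID_API_KEY"]) < min_key_length:
        raise ValueError(f"GSID_API_KEY must be at least {min_key_length} characters")
    threshold = float(env.get("FUZZY_MATCH_THRESHOLD", "0.8"))
    if not 0.0 <= threshold <= 1.0:
        raise ValueError("FUZZY_MATCH_THRESHOLD must be between 0.0 and 1.0")
    # CORS_ORIGINS is optional and comma-separated.
    origins = [o.strip() for o in env.get("CORS_ORIGINS", "").split(",") if o.strip()]
    return Settings(
        db_host=env["DB_HOST"],
        db_name=env["DB_NAME"],
        db_user=env["DB_USER"],
        db_password=env["DB_PASSWORD"],
        api_key=env["GSID_API_KEY"],
        db_port=int(env.get("DB_PORT", "5432")),
        log_level=env.get("LOG_LEVEL", "INFO"),
        cors_origins=origins,
        fuzzy_match_threshold=threshold,
    )
```

Passing the environment in as a mapping keeps the loader easy to exercise in tests with a plain dictionary.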
Database Connection
# core/database.py
import asyncpg
from typing import Optional
import os


class Database:
    def __init__(self):
        self.pool: Optional[asyncpg.Pool] = None

    async def connect(self):
        """Create database connection pool"""
        self.pool = await asyncpg.create_pool(
            host=os.getenv("DB_HOST"),
            port=int(os.getenv("DB_PORT", 5432)),
            database=os.getenv("DB_NAME"),
            user=os.getenv("DB_USER"),
            password=os.getenv("DB_PASSWORD"),
            min_size=5,
            max_size=20,
            command_timeout=60
        )

    async def disconnect(self):
        """Close database connection pool"""
        if self.pool:
            await self.pool.close()


db = Database()
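To illustrate how a repository function might borrow a connection from this pool, here is a hedged sketch. The table and column names come from the troubleshooting query later in this document. The `find_local_id` function is hypothetical, and the real Subject Repository may be organised differently. It relies only on asyncpg's `pool.acquire()` and `connection.fetchrow()`.

```python
from typing import Any, Optional

RESOLVE_QUERY = """
    SELECT *
    FROM local_subject_ids
    WHERE center_id = $1 AND local_subject_id = $2
"""


async def find_local_id(pool: Any, center_id: int, local_subject_id: str) -> Optional[dict]:
    """Fetch one local_subject_ids row as a dict, or None when there is no match."""
    # acquire() borrows a connection and returns it to the pool on exit.
    async with pool.acquire() as conn:
        row = await conn.fetchrow(RESOLVE_QUERY, center_id, local_subject_id)
    return dict(row) if row is not None else None
```

Inside the service this would be awaited as `await find_local_id(db.pool, 1, "GAP-001")` once `db.connect()` has run.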
Authentication
API Key Authentication
The GSID Service uses header-based API key authentication:
# core/auth.py
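import os
import secrets
from typing import Optional

from fastapi import Security, HTTPException, status
from fastapi.security import APIKeyHeader

# auto_error=False passes a missing header through as None, so it is
# rejected below with the documented 401 instead of FastAPI's default response
API_KEY_HEADER = APIKeyHeader(name="X-API-Key", auto_error=False)


async def verify_api_key(api_key: Optional[str] = Security(API_KEY_HEADER)):
    """Verify the API key sent in the X-API-Key header"""
    expected_key = os.getenv("GSID_API_KEY")
    if not expected_key:
        raise HTTPException(
            status_code=status.HTTP_500_INTERNAL_SERVER_ERROR,
            detail="API key not configured"
        )
    # Constant-time comparison avoids leaking key contents through timing
    if api_key is None or not secrets.compare_digest(
        api_key.encode(), expected_key.encode()
    ):
        raise HTTPException(
            status_code=status.HTTP_401_UNAUTHORIZED,
            detail="Missing or invalid API key"
        )
    return api_key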
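The value of `GSID_API_KEY` should come from a cryptographically secure source. One way to produce such a key is Python's standard `secrets` module, sketched below. The `generate_api_key` helper is hypothetical, and the service does not mandate any particular generator.

```python
import secrets


def generate_api_key(num_bytes: int = 32) -> str:
    """Return a URL-safe random key; 32 random bytes encode to 43 characters."""
    return secrets.token_urlsafe(num_bytes)


key = generate_api_key()
print(len(key))  # 43
```

The output contains only letters, digits, `-` and `_`, so it can be placed in a `.env` file or an HTTP header without quoting.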
Usage in Endpoints
from fastapi import APIRouter, Depends
from pydantic import BaseModel

from core.auth import verify_api_key

router = APIRouter()


class GenerateRequest(BaseModel):
    """Placeholder model; see the Generate GSID request body for all fields"""
    center_id: int
    local_subject_id: str


@router.post("/generate")
async def generate_gsid(
    request: GenerateRequest,
    api_key: str = Depends(verify_api_key)
):
    # Endpoint logic goes here; the dependency has already validated the key
    ...
Client Authentication
import requests

# Set API key in headers
headers = {
    "X-API-Key": "your-api-key-here",
    "Content-Type": "application/json"
}
response = requests.post(
    "https://api.idhub.ibdgc.org/api/gsid/generate",
    headers=headers,
    json={"center_id": 1, "local_subject_id": "GAP-001"}
)
Error Handling
Error Response Format
Common Error Codes
| Status Code | Error Code | Description |
|---|---|---|
| 400 | INVALID_INPUT | Invalid request data |
| 401 | UNAUTHORIZED | Missing or invalid API key |
| 404 | NOT_FOUND | Resource not found |
| 409 | CONFLICT | Resource already exists |
| 500 | INTERNAL_ERROR | Server error |
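Client code can mirror this table as a small lookup so that logs and retry logic use the same vocabulary as the service. The mapping below is taken directly from the table. The `is_retryable` rule (retry only on 5xx) is an assumption of this sketch, not documented service behaviour.

```python
ERROR_CODES = {
    400: "INVALID_INPUT",
    401: "UNAUTHORIZED",
    404: "NOT_FOUND",
    409: "CONFLICT",
    500: "INTERNAL_ERROR",
}


def error_code_for(status_code: int) -> str:
    """Translate an HTTP status into the documented error code."""
    return ERROR_CODES.get(status_code, "UNKNOWN")


def is_retryable(status_code: int) -> bool:
    """Assumption: only server-side failures are worth retrying."""
    return status_code >= 500


print(error_code_for(409), is_retryable(409))
print(error_code_for(500), is_retryable(500))
```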
Error Handling Example
import requests

try:
    response = requests.post(
        "https://api.idhub.ibdgc.org/api/gsid/generate",
        headers={"X-API-Key": "your-api-key"},
        json={"center_id": 1, "local_subject_id": "GAP-001"}
    )
    response.raise_for_status()
    gsid = response.json()["gsid"]
    print(f"Generated GSID: {gsid}")
except requests.exceptions.HTTPError as e:
    if e.response.status_code == 409:
        print("Subject already exists")
    elif e.response.status_code == 401:
        print("Invalid API key")
    else:
        print(f"Error: {e.response.json()['detail']}")
except requests.exceptions.RequestException as e:
    print(f"Request failed: {e}")
Performance
Benchmarks
| Operation | Avg Response Time | Throughput |
|---|---|---|
| Generate GSID | 15ms | 500 req/s |
| Resolve GSID | 10ms | 800 req/s |
| Batch Resolve (100) | 50ms | 2000 IDs/s |
| Fuzzy Match | 100ms | 100 req/s |
Optimization Tips
- Use Batch Operations: Resolve multiple IDs in a single request
- Connection Pooling: Reuse HTTP connections
- Caching: Cache frequently accessed GSIDs
- Async Requests: Use async HTTP clients for concurrent requests
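The caching tip can be sketched as a small in-memory cache with a time-to-live in front of whatever function performs the resolve call. Everything here is illustrative: `GsidCache`, the five-minute default and the choice not to cache misses are assumptions, and a shared cache such as Redis may suit multi-process clients better.

```python
import time
from typing import Callable, Dict, Optional, Tuple

Key = Tuple[int, str]


class GsidCache:
    """Cache resolved GSIDs per (center_id, local_subject_id) for ttl_seconds."""

    def __init__(
        self,
        resolve: Callable[[int, str], Optional[str]],
        ttl_seconds: float = 300.0,
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self._resolve = resolve
        self._ttl = ttl_seconds
        self._clock = clock
        self._entries: Dict[Key, Tuple[float, str]] = {}

    def get(self, center_id: int, local_subject_id: str) -> Optional[str]:
        key = (center_id, local_subject_id)
        now = self._clock()
        cached = self._entries.get(key)
        if cached is not None and now - cached[0] < self._ttl:
            return cached[1]
        gsid = self._resolve(center_id, local_subject_id)
        if gsid is not None:
            # Misses are not cached, since the subject may be registered later.
            self._entries[key] = (now, gsid)
        return gsid
```

The `resolve` callable would wrap the HTTP request to `/api/gsid/resolve`; injecting it, along with the clock, keeps the cache testable without network access.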
Batch Processing Example
import asyncio
import aiohttp


async def batch_resolve_async(identifiers, api_key):
    """Resolve multiple IDs asynchronously"""
    async with aiohttp.ClientSession() as session:
        async with session.post(
            "https://api.idhub.ibdgc.org/api/gsid/batch/resolve",
            headers={"X-API-Key": api_key},
            json={"identifiers": identifiers}
        ) as response:
            return await response.json()


# Usage
identifiers = [
    {"center_id": 1, "local_subject_id": f"GAP-{i:03d}"}
    for i in range(1, 101)
]
results = asyncio.run(batch_resolve_async(identifiers, "your-api-key"))
print(f"Resolved {results['found']} out of {results['total']} IDs")
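For lists longer than a single request should carry, the identifiers can be split into fixed-size batches and the per-batch summaries merged afterwards. The sketch below assumes a batch size of 100, which matches the benchmark row but is not a documented limit. `chunked` and `merge_batch_responses` are hypothetical helpers.

```python
from typing import Iterable, Iterator, List, Sequence


def chunked(identifiers: Sequence[dict], size: int = 100) -> Iterator[List[dict]]:
    """Yield consecutive slices of at most `size` identifiers."""
    if size < 1:
        raise ValueError("size must be at least 1")
    for start in range(0, len(identifiers), size):
        yield list(identifiers[start:start + size])


def merge_batch_responses(responses: Iterable[dict]) -> dict:
    """Combine several batch-resolve responses into one summary."""
    merged = {"results": [], "total": 0, "found": 0, "not_found": 0}
    for response in responses:
        merged["results"].extend(response["results"])
        for counter in ("total", "found", "not_found"):
            merged[counter] += response[counter]
    return merged


identifiers = [
    {"center_id": 1, "local_subject_id": f"GAP-{i:03d}"} for i in range(1, 251)
]
print([len(batch) for batch in chunked(identifiers)])  # [100, 100, 50]
```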
Monitoring
Health Checks
# Basic health check
curl https://api.idhub.ibdgc.org/api/gsid/health
# Database connectivity check
curl https://api.idhub.ibdgc.org/api/gsid/health/db
Metrics
The service exposes the following metrics:
- Request count: Total API requests
- Request duration: Response time percentiles
- Error rate: Failed requests per endpoint
- Database connections: Active/idle connections
- GSID generation rate: GSIDs created per minute
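As an illustration of what a response-time percentile means, the sketch below computes nearest-rank percentiles from a list of request durations. It is a teaching aid only: the `percentile` helper is hypothetical, and this document does not specify how the service computes or exports its metrics.

```python
import math
from typing import Sequence


def percentile(durations_ms: Sequence[float], pct: float) -> float:
    """Nearest-rank percentile: the smallest sample with at least pct% of samples at or below it."""
    if not durations_ms:
        raise ValueError("durations_ms must not be empty")
    if not 0 < pct <= 100:
        raise ValueError("pct must be in the range (0, 100]")
    ordered = sorted(durations_ms)
    rank = math.ceil(pct * len(ordered) / 100)
    return ordered[rank - 1]


samples = [10, 12, 15, 18, 100]
print(percentile(samples, 50))  # 15
print(percentile(samples, 95))  # 100
```

The example also shows why percentiles are reported alongside averages: one slow request dominates the tail while leaving the median unchanged.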
Logging
# Structured logging format
{
"timestamp": "2024-01-15T10:00:00Z",
"level": "INFO",
"service": "gsid-service",
"endpoint": "/api/gsid/generate",
"method": "POST",
"status_code": 201,
"duration_ms": 15,
"gsid": "GSID-4A1B2C3D4E5F6G7H",
"center_id": 1,
"local_subject_id": "GAP-001"
}
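A log line in this shape can be produced with the standard `logging` module and a custom formatter. The sketch below is one possible approach, not the service's actual logging setup. `JsonFormatter` is a hypothetical class that copies the documented fields from the record's `extra` attributes when they are present.

```python
import json
import logging
from datetime import datetime, timezone


class JsonFormatter(logging.Formatter):
    """Render log records as single-line JSON using the documented field names."""

    EXTRA_FIELDS = (
        "endpoint", "method", "status_code", "duration_ms",
        "gsid", "center_id", "local_subject_id",
    )

    def format(self, record: logging.LogRecord) -> str:
        created = datetime.fromtimestamp(record.created, tz=timezone.utc)
        entry = {
            "timestamp": created.strftime("%Y-%m-%dT%H:%M:%SZ"),
            "level": record.levelname,
            "service": "gsid-service",
        }
        # Fields supplied via logger.info(..., extra={...}) become record attributes.
        for name in self.EXTRA_FIELDS:
            if hasattr(record, name):
                entry[name] = getattr(record, name)
        return json.dumps(entry)


handler = logging.StreamHandler()
handler.setFormatter(JsonFormatter())
logger = logging.getLogger("gsid-service")
logger.addHandler(handler)
logger.setLevel(logging.INFO)
logger.info("request", extra={"endpoint": "/api/gsid/generate", "method": "POST", "status_code": 201})
```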
Development
Local Setup
# Clone repository
git clone https://github.com/ibdgc/idhub.git
cd idhub/gsid-service
# Create virtual environment
python -m venv venv
source venv/bin/activate # On Windows: venv\Scripts\activate
# Install dependencies
pip install -r requirements.txt
pip install -r requirements-test.txt
# Set environment variables
cp .env.example .env
# Edit .env with your configuration
# Run database migrations
python -m alembic upgrade head
# Start development server
uvicorn main:app --reload --port 8000
Running Tests
# Run all tests
pytest
# Run with coverage
pytest --cov=. --cov-report=html
# Run specific test file
pytest tests/test_api.py -v
# Run integration tests only
pytest -m integration
Docker Development
# Build image
docker build -t gsid-service:dev .
# Run container
docker run -p 8000:8000 \
-e DB_HOST=host.docker.internal \
-e DB_NAME=idhub \
-e GSID_API_KEY=test-key \
gsid-service:dev
# Run with docker-compose
docker-compose up gsid-service
Deployment
Production Deployment
# Build production image
docker build -t gsid-service:latest .
# Run with production settings
docker run -d \
--name gsid-service \
-p 8000:8000 \
-e DB_HOST=prod-db.example.com \
-e DB_NAME=idhub \
-e DB_USER=idhub_user \
-e DB_PASSWORD="${DB_PASSWORD}" \
-e GSID_API_KEY="${GSID_API_KEY}" \
-e LOG_LEVEL=INFO \
--restart unless-stopped \
gsid-service:latest
Health Check Configuration
# docker-compose.yml
services:
  gsid-service:
    image: gsid-service:latest
    healthcheck:
      test: ["CMD", "curl", "-f", "http://localhost:8000/health"]
      interval: 30s
      timeout: 10s
      retries: 3
      start_period: 40s
Troubleshooting
Common Issues
1. Database Connection Errors
Symptom: asyncpg.exceptions.ConnectionDoesNotExistError
Solution:
# Check database connectivity
psql -h $DB_HOST -U $DB_USER -d $DB_NAME
# Verify environment variables
echo $DB_HOST
echo $DB_PORT
echo $DB_NAME
# Check connection pool settings
# Increase max_size if needed
2. API Key Authentication Failures
Symptom: 401 Unauthorized
Solution:
# Verify API key is set
echo $GSID_API_KEY
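# Check header format against an authenticated endpoint
# (the health endpoint does not require a key, so it cannot confirm the key works)
curl -i -H "X-API-Key: your-key" https://api.idhub.ibdgc.org/api/gsid/subjects/GSID-4A1B2C3D4E5F6G7H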
# Ensure no trailing whitespace in .env file
3. Slow Response Times
Symptom: Requests taking >1 second
Solution:
-- Check database indexes
SELECT schemaname, tablename, indexname
FROM pg_indexes
WHERE tablename IN ('subjects', 'local_subject_ids');
-- Analyze query performance
EXPLAIN ANALYZE
SELECT * FROM local_subject_ids
WHERE center_id = 1 AND local_subject_id = 'GAP-001';
-- Rebuild indexes if needed
REINDEX TABLE local_subject_ids;
Support
For issues or questions:
- GitHub Issues: https://github.com/ibdgc/idhub/issues
- Email: support@ibdgc.org