# GitHub Actions Workflows Guide
This guide provides an overview of the GitHub Actions workflows used in the IDhub project for continuous integration, deployment, and data operations.
## Overview
The project uses a series of workflows to automate testing, deployment, and routine data management tasks. These workflows are defined as YAML files in the `.github/workflows` directory.
The general philosophy is:
- CI on every push/PR: All code is tested automatically.
- CD for key branches: Pushes to `qa` and `prod` trigger deployments.
- Manual triggers for sensitive operations: Data ingestion and direct deployments can be triggered manually by authorized users.
- Scheduled tasks for routine syncs: The REDCap sync runs on a schedule to keep data fresh.
## Continuous Integration (CI)

### test-and-coverage.yml
This is the primary CI workflow that ensures code quality and correctness.
- Trigger: Runs on any push or pull request to the `main`, `develop`, `prod`, or `qa` branches.
- Purpose: To run the test suite for each microservice and upload code coverage reports.
#### Workflow Diagram

```mermaid
graph TD
    A[Push or PR] --> B{Branch Check};
    B -->|main, dev, etc.| C[Start 'test' Job];
    subgraph "Matrix Job: test"
        direction LR
        D[gsid-service]
        E[redcap-pipeline]
        F[fragment-validator]
        G[table-loader]
    end
    C --> D & E & F & G;
    D --> H[Build, Test, Upload Coverage];
    E --> I[Build, Test, Upload Coverage];
    F --> J[Build, Test, Upload Coverage];
    G --> K[Build, Test, Upload Coverage];
    K -- needs --> L[Start 'coverage-summary' Job];
    L --> M[Download Artifacts & Post Summary];
```
#### Key Steps
- Matrix Strategy: The `test` job runs in parallel for each of the four main services: `gsid-service`, `redcap-pipeline`, `fragment-validator`, and `table-loader`.
- Build & Test: For each service, it builds a dedicated test container using the `docker-compose.test.yml` file and runs the tests within that container.
- Upload Artifacts: It uploads the generated HTML coverage reports and JUnit test reports as artifacts. This allows developers to download and inspect test results and coverage locally.
- Codecov Upload: Coverage reports in XML format are uploaded to Codecov for tracking and analysis.
- Coverage Summary: A final job runs after all tests are complete, downloads the coverage artifacts, and posts a summary to the GitHub job summary page, making it easy to see the coverage for each service at a glance.
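The matrix and summary jobs described above could be expressed roughly as follows. This is a hypothetical sketch, not the actual file: job names, compose service names, artifact paths, and action versions are assumptions.

```yaml
# Sketch of test-and-coverage.yml (illustrative only)
name: Test and Coverage

on:
  push:
    branches: [main, develop, qa, prod]
  pull_request:
    branches: [main, develop, qa, prod]

jobs:
  test:
    runs-on: ubuntu-latest
    strategy:
      fail-fast: false
      matrix:
        service: [gsid-service, redcap-pipeline, fragment-validator, table-loader]
    steps:
      - uses: actions/checkout@v4
      # Build and run the service's tests inside its dedicated test container
      - name: Run tests
        run: docker compose -f docker-compose.test.yml run --rm ${{ matrix.service }}
      # Keep the HTML coverage and JUnit reports for download
      - name: Upload coverage artifacts
        uses: actions/upload-artifact@v4
        with:
          name: coverage-${{ matrix.service }}
          path: coverage/

  coverage-summary:
    needs: test          # waits for every matrix leg to finish
    runs-on: ubuntu-latest
    steps:
      - uses: actions/download-artifact@v4
      - name: Post summary
        run: echo "## Coverage by service" >> "$GITHUB_STEP_SUMMARY"
```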
## Continuous Deployment (CD)

### deploy.yml

This workflow handles deploying the entire application stack to the `qa` and `prod` environments.
- Trigger:
    - Automatically on pushes to the `qa` and `prod` branches.
    - Manually via a `workflow_dispatch` event, allowing a user to choose the target environment.
- Purpose: To securely connect to the target server, set up the environment, and restart the application services using Docker Compose.
#### Key Steps
- Determine Environment: The job first determines whether it is deploying to `qa` or `prod` based on the branch name or the manual input.
- Set up SSH: It uses a secret SSH key to establish a secure connection to the deployment server.
- Create `.env` file: An environment-specific `.env.deploy` file is created locally using secrets stored in GitHub. This file contains all necessary environment variables for the application.
- Deploy to Server:
    - The `.env.deploy` file is securely copied to `/opt/idhub/.env` on the server.
    - An SSH command is executed on the server to:
        - Pull the latest code from the correct branch.
        - Stop the running services.
        - Rebuild the Docker images for the services.
        - Restart the services using `docker-compose up -d`.
        - Run health checks to ensure the services started correctly.
- Cleanup: The local SSH key is removed.
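The deployment steps above might be sketched like this. All names here are placeholders: the secret name (`DEPLOY_SSH_KEY`), the server host and user, and the exact remote commands are assumptions based on the description, not the real file.

```yaml
# Sketch of the deploy job in deploy.yml (illustrative only)
jobs:
  deploy:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      # Pick qa or prod from the branch name, or from the manual input
      - name: Determine environment
        id: env
        run: |
          if [ "${{ github.event_name }}" = "workflow_dispatch" ]; then
            echo "target=${{ inputs.environment }}" >> "$GITHUB_OUTPUT"
          else
            echo "target=${{ github.ref_name }}" >> "$GITHUB_OUTPUT"
          fi
      - name: Set up SSH key
        run: |
          mkdir -p ~/.ssh
          echo "${{ secrets.DEPLOY_SSH_KEY }}" > ~/.ssh/deploy_key
          chmod 600 ~/.ssh/deploy_key
      - name: Copy .env and restart services
        run: |
          scp -i ~/.ssh/deploy_key .env.deploy deploy@example-host:/opt/idhub/.env
          ssh -i ~/.ssh/deploy_key deploy@example-host '
            cd /opt/idhub &&
            git pull &&
            docker-compose down &&
            docker-compose build &&
            docker-compose up -d
          '
      # Always remove the key, even if a previous step failed
      - name: Cleanup
        if: always()
        run: rm -f ~/.ssh/deploy_key
```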
## Documentation Workflow

### docs.yml
This workflow automates the build and deployment of this documentation site.
- Trigger: Runs on pushes to the `main` branch that modify files in the `docs/` directory or the `mkdocs.yml` file.
- Purpose: To build the MkDocs site and deploy it to GitHub Pages.
#### Key Steps
- Build: It installs the Python dependencies (including MkDocs and the Material theme) and runs the `mkdocs build` command to generate the static HTML site.
- Upload Artifact: The generated `site/` directory is uploaded as a Pages artifact.
- Deploy: A second job, `deploy`, uses the `actions/deploy-pages@v4` action to deploy the artifact to the `github-pages` environment.
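A docs workflow following these steps could look roughly like this. The Python version, dependency list, and action versions (other than `actions/deploy-pages@v4`, which the description names) are assumptions.

```yaml
# Sketch of docs.yml (illustrative only)
name: Docs

on:
  push:
    branches: [main]
    paths: ['docs/**', 'mkdocs.yml']

permissions:
  pages: write
  id-token: write

jobs:
  build:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - uses: actions/setup-python@v5
        with:
          python-version: '3.12'
      - run: pip install mkdocs mkdocs-material
      - run: mkdocs build
      # Package the generated static site for GitHub Pages
      - uses: actions/upload-pages-artifact@v3
        with:
          path: site/

  deploy:
    needs: build
    runs-on: ubuntu-latest
    environment: github-pages
    steps:
      - uses: actions/deploy-pages@v4
```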
## Data & Operational Workflows

### fragment-ingestion.yml
This is a manually triggered workflow for running the Table Loader service to ingest validated data fragments into the database.
- Trigger: `workflow_dispatch` only. A user must manually start this workflow.
- Purpose: To provide a safe and audited way to load data into the `qa` or `prod` databases.
- Inputs:
    - `environment`: The target environment (`qa` or `prod`).
    - `batch_id`: The ID of the fragment batch to load.
    - `dry_run`: If `true`, runs the loader in preview mode.
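The manual trigger and its inputs could be declared like this. Descriptions and defaults are assumptions; only the input names and the qa/prod choice come from the documentation above.

```yaml
# Sketch of the trigger block in fragment-ingestion.yml (illustrative only)
on:
  workflow_dispatch:
    inputs:
      environment:
        description: Target environment
        type: choice
        options: [qa, prod]
        required: true
      batch_id:
        description: ID of the fragment batch to load
        type: string
        required: true
      dry_run:
        description: Run the loader in preview mode
        type: boolean
        default: true   # assumed default; a safe-by-default choice
```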
#### Key Steps
- Validate Inputs: Ensures the `batch_id` format is correct.
- SSH Tunnel: Establishes a secure SSH tunnel to the database for the `qa` or `prod` environment.
- Verify Connectivity: Tests the database connection and verifies that the `fragment_resolutions` table exists for audit logging.
- Verify S3 Batch: Checks that the specified `batch_id` exists in the correct S3 bucket before proceeding.
- Run Table Loader: Executes the `table-loader`'s `main.py` script with the provided `batch_id` and `dry_run` flag.
- Upload Artifacts: The logs and any generated reports from the loader are uploaded as artifacts for review.
### redcap-sync.yml
This workflow runs the REDCap Pipeline to sync data from REDCap projects.
- Trigger:
    - Scheduled to run daily (`cron: 0 8 * * *`).
    - Manually via a `workflow_dispatch` event.
- Purpose: To keep the IDhub database up-to-date with the latest data from connected REDCap projects.
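The dual trigger described above could be declared like this; the optional project input is an assumption based on the Key Steps below, and its name is hypothetical.

```yaml
# Sketch of the trigger block in redcap-sync.yml (illustrative only)
on:
  schedule:
    - cron: '0 8 * * *'   # daily at 08:00 UTC
  workflow_dispatch:
    inputs:
      project:
        description: Optional specific REDCap project to sync
        type: string
        required: false
```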
#### Key Steps
- SSH Tunnel: Establishes a secure SSH tunnel to the database.
- Run REDCap Pipeline: Executes the `redcap-pipeline`'s `main.py` script. It can be run for all enabled projects or for a specific project if provided as a manual input.
- Upload Logs: The pipeline logs are uploaded as artifacts for auditing and troubleshooting.