Understanding Audit & Resolution Tables

To ensure data quality, transparency, and traceability, IDhub uses a set of "audit" and "resolution" tables. As a data curator, you can use these tables (and the helpful views built on top of them) to troubleshoot ingestion issues, track the history of a record, and manage data conflicts.

This guide provides a detailed overview of these tables and how to use them.

Key Audit & Resolution Tables

These tables store the raw logs and data related to the ingestion and identity resolution processes. You can query them directly in NocoDB for in-depth analysis.

Table: data_change_audit

What it is: A complete, unchangeable log of every single INSERT and UPDATE that happens in the database via the ingestion workflows.
Why it's useful: This is your go-to table for answering "Who changed this record, when, and what did they change?". It provides a full history for any piece of data.
Key Columns:
table_name: The database table that was modified (e.g., lcl, subjects).
record_key: The unique "natural key" of the record that was changed (e.g., {"sample_id": "SAMPLE-001"}).
changes: A JSON object showing the exact fields that were changed, including their old and new values. For new records, this shows all the inserted values.
changed_by: The system or user that initiated the change (e.g., table_loader, manual_update).
changed_at: The timestamp of when the change occurred.
batch_id: Links the change back to a specific validation batch from the Fragment Validator.

Table: identity_resolutions

What it is: A log of every attempt to resolve a subject's local IDs to a Global Subject ID (GSID).
Why it's useful: If a subject is not being created correctly or is being linked to the wrong GSID, this table provides the full story of the resolution decision made by the GSID service.
Key Columns:
local_subject_id: The incoming ID that was being resolved.
identifier_type: The type of the incoming ID (e.g., consortium_id).
input_center_id: The center associated with the incoming record.
matched_gsid: The final GSID that the local ID was resolved to.
action: The decision made (e.g., create_new, link_existing, conflict_resolved).
match_strategy: The logic used to make the decision (e.g., exact_match, center_agnostic_match).
requires_review: A flag (true/false) indicating if this resolution created a conflict that a curator needs to address.
review_reason: An explanation of why a review is required (e.g., multiple_gsid_conflict).

Table: conflict_resolutions

What it is: A queue of data conflicts that require manual review and action from a data curator. These are typically generated by the Fragment Validator when it detects inconsistencies.
Why it's useful: This is a primary work queue for data curators. By working through the items in this table, you actively clean and maintain the integrity of the database.
Key Columns:
conflict_type: The type of conflict (e.g., center_mismatch, multi_gsid).
local_subject_id: The ID that is causing the conflict.
existing_gsid / incoming_gsid: Shows the conflicting GSIDs.
existing_center_id / incoming_center_id: Shows the conflicting center IDs.
resolution_action: Your action goes here! You or an administrator can update this field in NocoDB to tell the system how to resolve the conflict (e.g., keep_existing, use_incoming, merge).
resolved: A flag that is set to true once the system has successfully applied your resolution action.

Table: fragment_resolutions

What it is: A log of every file (or "fragment batch") processed by the Fragment Validator and Table Loader.
Why it's useful: It gives you a high-level summary of a file ingestion. You can see if a file succeeded, failed, or was only partially loaded, and how many rows were affected.
Key Columns:
batch_id: The unique ID for the file processing run.
fragment_key: The path to the file in the S3 staging area.
load_status: The final status of the ingestion (success, partial, failed).
rows_attempted, rows_loaded, rows_failed: Counts for the records in the file.
error_message: If the load failed, this contains the reason why.

Helpful Views for Data Curation

Views are essentially saved, pre-built queries that present the information from the audit tables in a more focused and user-friendly way.

View: v_subjects_requiring_review

What it shows: A filtered list of all subjects who have been automatically flagged for review by the system or have withdrawn consent.
When to use it: Check this view regularly to find subjects whose identity or status is uncertain. You can then investigate using the other audit tables and add notes to the subjects.review_notes field.

Views: v_multi_gsid_conflicts & v_center_conflicts

What it shows: These views are filtered lists from the identity_resolutions table, showing only the specific identity conflicts that need attention.
v_multi_gsid_conflicts: Shows cases where a single local ID is associated with more than one GSID.
v_center_conflicts: Shows cases where a single local ID is being submitted from a different center than the one it's already associated with.
When to use it: Use these views as a starting point to investigate and resolve identity conflicts in the conflict_resolutions table.

View: v_recent_data_changes

What it shows: A simple, reverse-chronological list of the 1,000 most recent changes made to any table in the database.
When to use it: A great way to get a quick, high-level overview of recent activity in the database, especially after a large data load.

View: v_subjects_by_source

What it shows: A summary of how many subjects have been created by each data source (e.g., redcap_pipeline, fragment_validator).
When to use it: Useful for tracking the volume of data coming from different sources and for auditing purposes.

View: v_resolution_summary_by_center

What it shows: A statistical summary of identity resolution decisions, grouped by center.
When to use it: Helps identify if a particular center is having a high rate of identity conflicts or other resolution issues, which may point to a systematic problem in their data entry.