Understanding Audit & Resolution Tables
To ensure data quality, transparency, and traceability, IDhub uses a set of powerful "audit" and "resolution" tables. As a data curator, you can use these tables (and the helpful views built on top of them) to troubleshoot ingestion issues, track the history of a record, and manage data conflicts.
This guide provides a detailed overview of these tables and how to use them.
Key Audit & Resolution Tables
These tables store the raw logs and data related to the ingestion and identity resolution processes. You can query them directly in NocoDB for in-depth analysis.
data_change_audit
- What it is: A complete, unchangeable log of every single
INSERTandUPDATEthat happens in the database. - Why it's useful: This is your go-to table for answering "Who changed this record, when, and what did they change?". It provides a full history for any piece of data.
- Key Columns:
table_name: The database table that was modified (e.g.,lcl,subjects).record_key: The unique "natural key" of the record that was changed (e.g.,{"sample_id": "SAMPLE-001"}).changes: A JSON object showing the exact fields that were changed, including theiroldandnewvalues. For new records, this shows all the inserted values.changed_by: The system or user that initiated the change (e.g.,table_loader,manual_update).changed_at: The timestamp of when the change occurred.batch_id: Links the change back to a specific validation batch from the Fragment Validator.
identity_resolutions
- What it is: A log of every attempt to resolve a subject's local IDs to a Global Subject ID (GSID).
- Why it's useful: If a subject is not being created correctly or is being linked to the wrong GSID, this table provides the full story of the resolution decision made by the GSID service.
- Key Columns:
local_subject_id: The incoming ID that was being resolved.identifier_type: The type of the incoming ID (e.g.,consortium_id).input_center_id: The center associated with the incoming record.matched_gsid: The final GSID that the local ID was resolved to.action: The decision made (e.g.,create_new,link_existing,conflict_resolved).match_strategy: The logic used to make the decision (e.g.,exact_match,center_agnostic_match).requires_review: A flag (true/false) indicating if this resolution created a conflict that a curator needs to address.review_reason: An explanation of why a review is required (e.g.,multiple_gsid_conflict).
conflict_resolutions
- What it is: A queue of data conflicts that require manual review and action from a data curator. These are typically generated by the Fragment Validator when it detects inconsistencies.
- Why it's useful: This is a primary work queue for data curators. By working through the items in this table, you actively clean and maintain the integrity of the database.
- Key Columns:
conflict_type: The type of conflict (e.g.,center_mismatch,multi_gsid).local_subject_id: The ID that is causing the conflict.existing_gsid/incoming_gsid: Shows the conflicting GSIDs.existing_center_id/incoming_center_id: Shows the conflicting center IDs.resolution_action: Your action goes here! You or an administrator can update this field in NocoDB to tell the system how to resolve the conflict (e.g.,keep_existing,use_incoming,merge).resolved: A flag that is set totrueonce the system has successfully applied your resolution action.
fragment_resolutions
- What it is: A log of every file (or "fragment batch") processed by the Fragment Validator and Table Loader.
- Why it's useful: It gives you a high-level summary of a file ingestion. You can see if a file succeeded, failed, or was only partially loaded, and how many rows were affected.
- Key Columns:
batch_id: The unique ID for the file processing run.fragment_key: The path to the file in the S3 staging area.load_status: The final status of the ingestion (success,partial,failed).rows_attempted,rows_loaded,rows_failed: Counts for the records in the file.error_message: If the load failed, this contains the reason why.
Helpful Views for Data Curation
Views are essentially saved, pre-built queries that present the information from the audit tables in a more focused and user-friendly way. As a curator, you will likely interact with these views more often than the raw tables.
v_subjects_requiring_review
- What it shows: A filtered list of all subjects who have been automatically flagged for review by the system or have withdrawn consent.
- When to use it: Check this view regularly to find subjects whose identity or status is uncertain. You can then investigate using the other audit tables and add notes to the
subjects.review_notesfield.
v_multi_gsid_conflicts & v_center_conflicts
- What it shows: These views are filtered lists from the
identity_resolutionstable, showing only the specific identity conflicts that need attention. v_multi_gsid_conflicts: Shows cases where a single local ID is associated with more than one GSID.v_center_conflicts: Shows cases where a single local ID is being submitted from a different center than the one it's already associated with.- When to use it: Use these views as a starting point to investigate and resolve identity conflicts in the
conflict_resolutionstable.
v_recent_data_changes
- What it shows: A simple, reverse-chronological list of the 1,000 most recent changes made to any table in the database.
- When to use it: A great way to get a quick, high-level overview of recent activity in the database, especially after a large data load.
v_subjects_by_source
- What it shows: A summary of how many subjects have been created by each data source (e.g.,
redcap_pipeline,fragment_validator). - When to use it: Useful for tracking the volume of data coming from different sources and for auditing purposes.
v_resolution_summary_by_center
- What it shows: A statistical summary of identity resolution decisions, grouped by center.
- When to use it: Helps identify if a particular center is having a high rate of identity conflicts or other resolution issues, which may point to a systematic problem in their data entry.