Skip to content

Understanding Audit & Resolution Tables

To ensure data quality, transparency, and traceability, IDhub uses a set of powerful "audit" and "resolution" tables. As a data curator, you can use these tables (and the helpful views built on top of them) to troubleshoot ingestion issues, track the history of a record, and manage data conflicts.

This guide provides a detailed overview of these tables and how to use them.


Key Audit & Resolution Tables

These tables store the raw logs and data related to the ingestion and identity resolution processes. You can query them directly in NocoDB for in-depth analysis.

data_change_audit

  • What it is: A complete, unchangeable log of every single INSERT and UPDATE that happens in the database.
  • Why it's useful: This is your go-to table for answering "Who changed this record, when, and what did they change?". It provides a full history for any piece of data.
  • Key Columns:
  • table_name: The database table that was modified (e.g., lcl, subjects).
  • record_key: The unique "natural key" of the record that was changed (e.g., {"sample_id": "SAMPLE-001"}).
  • changes: A JSON object showing the exact fields that were changed, including their old and new values. For new records, this shows all the inserted values.
  • changed_by: The system or user that initiated the change (e.g., table_loader, manual_update).
  • changed_at: The timestamp of when the change occurred.
  • batch_id: Links the change back to a specific validation batch from the Fragment Validator.

identity_resolutions

  • What it is: A log of every attempt to resolve a subject's local IDs to a Global Subject ID (GSID).
  • Why it's useful: If a subject is not being created correctly or is being linked to the wrong GSID, this table provides the full story of the resolution decision made by the GSID service.
  • Key Columns:
  • local_subject_id: The incoming ID that was being resolved.
  • identifier_type: The type of the incoming ID (e.g., consortium_id).
  • input_center_id: The center associated with the incoming record.
  • matched_gsid: The final GSID that the local ID was resolved to.
  • action: The decision made (e.g., create_new, link_existing, conflict_resolved).
  • match_strategy: The logic used to make the decision (e.g., exact_match, center_agnostic_match).
  • requires_review: A flag (true/false) indicating if this resolution created a conflict that a curator needs to address.
  • review_reason: An explanation of why a review is required (e.g., multiple_gsid_conflict).

conflict_resolutions

  • What it is: A queue of data conflicts that require manual review and action from a data curator. These are typically generated by the Fragment Validator when it detects inconsistencies.
  • Why it's useful: This is a primary work queue for data curators. By working through the items in this table, you actively clean and maintain the integrity of the database.
  • Key Columns:
  • conflict_type: The type of conflict (e.g., center_mismatch, multi_gsid).
  • local_subject_id: The ID that is causing the conflict.
  • existing_gsid / incoming_gsid: Shows the conflicting GSIDs.
  • existing_center_id / incoming_center_id: Shows the conflicting center IDs.
  • resolution_action: Your action goes here! You or an administrator can update this field in NocoDB to tell the system how to resolve the conflict (e.g., keep_existing, use_incoming, merge).
  • resolved: A flag that is set to true once the system has successfully applied your resolution action.

fragment_resolutions

  • What it is: A log of every file (or "fragment batch") processed by the Fragment Validator and Table Loader.
  • Why it's useful: It gives you a high-level summary of a file ingestion. You can see if a file succeeded, failed, or was only partially loaded, and how many rows were affected.
  • Key Columns:
  • batch_id: The unique ID for the file processing run.
  • fragment_key: The path to the file in the S3 staging area.
  • load_status: The final status of the ingestion (success, partial, failed).
  • rows_attempted, rows_loaded, rows_failed: Counts for the records in the file.
  • error_message: If the load failed, this contains the reason why.

Helpful Views for Data Curation

Views are essentially saved, pre-built queries that present the information from the audit tables in a more focused and user-friendly way. As a curator, you will likely interact with these views more often than the raw tables.

v_subjects_requiring_review

  • What it shows: A filtered list of all subjects who have been automatically flagged for review by the system or have withdrawn consent.
  • When to use it: Check this view regularly to find subjects whose identity or status is uncertain. You can then investigate using the other audit tables and add notes to the subjects.review_notes field.

v_multi_gsid_conflicts & v_center_conflicts

  • What it shows: These views are filtered lists from the identity_resolutions table, showing only the specific identity conflicts that need attention.
  • v_multi_gsid_conflicts: Shows cases where a single local ID is associated with more than one GSID.
  • v_center_conflicts: Shows cases where a single local ID is being submitted from a different center than the one it's already associated with.
  • When to use it: Use these views as a starting point to investigate and resolve identity conflicts in the conflict_resolutions table.

v_recent_data_changes

  • What it shows: A simple, reverse-chronological list of the 1,000 most recent changes made to any table in the database.
  • When to use it: A great way to get a quick, high-level overview of recent activity in the database, especially after a large data load.

v_subjects_by_source

  • What it shows: A summary of how many subjects have been created by each data source (e.g., redcap_pipeline, fragment_validator).
  • When to use it: Useful for tracking the volume of data coming from different sources and for auditing purposes.

v_resolution_summary_by_center

  • What it shows: A statistical summary of identity resolution decisions, grouped by center.
  • When to use it: Helps identify if a particular center is having a high rate of identity conflicts or other resolution issues, which may point to a systematic problem in their data entry.