High Level Principles for Data Integration - Principle Six - Preserving Privacy & Confidentiality
Policies and procedures used in data integration must minimise any potential impact on privacy and confidentiality.
This principle ensures privacy and confidentiality are preserved to the maximum extent possible.
Operational, administrative and personal identifiers should be removed from datasets as soon as they are no longer required to meet the approved purposes of the statistical data integration. Where identifiers need to be retained, for example for longitudinal studies, they should be kept separate from the integrated dataset.
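As an illustration only, the separation of identifiers from the integrated dataset could be sketched as follows. The function name `separate_identifiers`, the `link_key` field and the sample field names are hypothetical and are not part of the principles. The sketch assumes records are held as simple dictionaries.

```python
import secrets


def separate_identifiers(records, identifier_fields):
    """Split each record into an identifier part and an analysis part.

    Both parts share a random linkage key, so they can be stored apart and
    rejoined only by someone who holds the identifier file.
    """
    identifier_file = {}
    analysis_file = []
    for record in records:
        # Random key (an assumption of this sketch): it carries no
        # information about the unit it refers to.
        key = secrets.token_hex(8)
        identifier_file[key] = {
            f: record[f] for f in identifier_fields if f in record
        }
        analysis_row = {
            f: v for f, v in record.items() if f not in identifier_fields
        }
        analysis_row["link_key"] = key
        analysis_file.append(analysis_row)
    return identifier_file, analysis_file


# Hypothetical sample data, for illustration only.
records = [
    {"name": "A. Citizen", "tax_id": "123", "region": "North", "income_band": "C"},
    {"name": "B. Resident", "tax_id": "456", "region": "South", "income_band": "A"},
]
identifier_file, analysis_file = separate_identifiers(records, {"name", "tax_id"})
```

Under this arrangement the analysis file can be used for the approved purposes without exposing names or administrative identifiers, and the identifier file can be destroyed, or stored under tighter access controls, once it is no longer required.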
The number of unit records and data variables to be included in an integrated dataset should be no more than required to support the approved purposes.
The type of matching used (exact, probabilistic or statistical) should be chosen as the minimum needed to support the approved purposes, and the range of attributes used to establish a common identity should be the minimum necessary for the linking operation to succeed.
Access to potentially identifiable data for statistical and research purposes outside secure and trusted institutional environments should only occur where legislation allows it, it is necessary to achieve the approved purposes, and it complies with agreements with source data agencies.
Risks of indirect as well as direct identification should be carefully managed when data is disseminated outside secure and trusted institutions, particularly in terms of units with unusual characteristics. This management must take account of the potential increase in identifiability of one set of data when combined with another set. It might involve strict data use licensing conditions, reducing detail, perturbing data, or seeking the consent of the individual or business involved to release potentially identifiable data, the last of these being most likely in the case of business data.
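One possible way to look for units with unusual characteristics before release is to count how many records share each combination of indirectly identifying attributes and to flag the combinations that occur only rarely. The sketch below is an assumption about how such a check might be written, not a prescribed method. The function name `rare_combinations`, the sample attributes and the threshold of 3 are all hypothetical.

```python
from collections import Counter


def rare_combinations(records, quasi_identifiers, min_count=3):
    """Return attribute combinations shared by fewer than min_count records."""
    counts = Counter(
        tuple(record[q] for q in quasi_identifiers) for record in records
    )
    return {combo: n for combo, n in counts.items() if n < min_count}


# Hypothetical sample data, for illustration only.
records = [
    {"region": "North", "industry": "Retail"},
    {"region": "North", "industry": "Retail"},
    {"region": "North", "industry": "Retail"},
    {"region": "South", "industry": "Mining"},
]
flagged = rare_combinations(records, ["region", "industry"])
```

Here the single mining business in the South region is flagged. Records in flagged combinations are candidates for the treatments described above, such as reducing detail, perturbing data or seeking consent. A check of this kind only covers the attributes it is given, so it does not by itself account for the increase in identifiability when the data is combined with another dataset.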
Once the approved purpose of the project is met, the related datasets should be destroyed. If they are retained, the reasons for and necessity of retention should be documented and a review process set up. If such retention was not part of the initial approval process, re-approval of the decision to retain is required.
Archiving of statistically integrated datasets should be restricted to confidentialised datasets.