Data Integration Projects - How to determine the risk level - Appendix A
Projects in Scope series
- What's in scope?
- Public register of Data Integration Projects
- How to determine the risk level
- Key Concepts
- Risk Assessment Process
- Risk Assessment Guidelines
- Appendix A
The purpose of this appendix is to demonstrate, through a case study, the process of determining the post-mitigation risk rating of data integration projects using the draft risk framework guidelines. A practical example is provided in this document with both pre- and post-mitigation risk assessments.
For the purpose of classifying data integration projects, the following definitions are used:
|Single agency||There is only one Commonwealth data custodian involved in the data integration project.|
|Multiple agencies||There is more than one Commonwealth data custodian involved in the data integration project.|
|Non-Commonwealth||There are one or more non-Commonwealth data custodians involved in the data integration project.|
Risks at different stages of the project
There are varying levels of risk associated with the different stages of a data integration project. Broadly, the stages include extraction, file transfer, linkage, analysis, publication, storage and destruction. Not all stages will be applicable to all data integration projects. For example, a data integration project may retain the linked dataset indefinitely, in which case the destruction stage would not be applicable.
Mitigation strategies can be applied at various stages to reduce the likelihood of a breach occurring. The data integration project can be designed in such a way that even if the consequence of a breach occurring is high, the likelihood is reduced such that the project does not require an accredited Integrating Authority. For example, even if the data required for the integration project are sensitive, using the separation principle may mean that the project does not necessarily require an accredited Integrating Authority.
Application of the framework through a selected case study
The following section provides a case study with pre- and post-mitigation assessments. The final risk assessment can assist in determining whether or not an accredited Integrating Authority is required.
The selected case study is:
|#||Type of case study||Project Name|
|4.1||High Risk – Single agency||Client Data Collection (DSS)|
Case studies from other agencies will be included later. This will help ensure the framework continues to be developed in a way that is fit for all situations.
High Risk – Single agency
Client Data Collection (CDC) data integration project (DSS)
The Department of Social Services' (DSS) proposed Client Data Collection (CDC) is an example of a high risk single agency project. DSS is the sole data custodian and, in this case, is also the integrating authority.
The purpose of the CDC project is to enable better monitoring and research within and between DSS programs and payments data. This project is still in the planning stage. The following strategies are hypothetical and subject to change.
Three of the eight agreed risk dimensions influence the consequence of a breach. The table below outlines the consequence (or impact) on individuals if a breach occurs.
|Sensitivity||High||The project involves integrating all DSS programs and payments data. Information collected includes educational background, health status, income level and much more. The data is considered to be highly sensitive. If leaked, there is the potential to cause harm to individuals and the Commonwealth Government as a whole.|
|Consent||High||Some of the programs directly obtain consent (Endnote 17) from clients; however, the majority do not. Generally, both programs and payments data are collected as an administrative by-product.|
|Amount of information about a data provider||High||There may be twenty or more variables with different personal information about a data provider.|
|Pre-mitigation Consequence Assessment||High|
There are five dimensions that influence the likelihood of a breach (from the agreed eight risk dimensions).
|Managerial complexity||Low||There will only be one agency involved in this project. However, a considerable number of internal stakeholders will be part of the project team. The number of DSS staff directly involved in the integration is fewer than ten.|
|Nature of access||Low||Restricted: access is granted to approved staff only, and access controls are reviewed regularly. The separation principle is applied.|
|Duration of the project||High||Data is proposed to be retained for more than three years.|
|Likelihood of identification||High||A high rating is given as there are many variables, including quasi-identifying variables (such as date of birth, address and Indigenous status), contained in the programs and payments data that will be used in the data integration project.|
|Technical complexity||Low||Technical complexity here refers to the output: that is, how difficult it is to confidentialise data for external publication and/or to provide external users who need unit record data with access to it. For example, external users may need access to linked data for research purposes.
At this stage, external output is not proposed.
|Pre-mitigation Likelihood Assessment||Medium|
This pre-mitigation likelihood assessment of medium aligns with the risk framework guidelines.
|Overall pre-mitigation assessment||High|
Based on the above assessment, this project is classified as ‘high’ risk. The data is highly sensitive, with a large number of identifiable variables on both the programs and payments data. Having assessed the initial risk, DSS can now take actions to reduce the risk of undertaking this project.
There are a number of strategies that can reduce the likelihood of a breach occurring, so that even where the consequence of a breach is high, the project may not require an accredited Integrating Authority. The following mitigation strategies are applied to reduce the consequence risk:
|Elements||Mitigation strategies (reducing the consequence of a breach)|
|Sensitivity||Initially, the data is assessed as highly sensitive. However, the project design is such that the entire dataset is not required. The separation principle plays a significant role here.
It is proposed that a statistical linkage key (SLK) would be created for the programs and payments data using the same algorithm. This negates the need for access to the highly sensitive variables on the original dataset. A file with the record identifiers linking the programs and payments data would be retained, meaning that the linked file would not contain any sensitive data. Researchers would only be given access to the linked dataset if their research request is approved.
|Consent||All of the data being linked are an administrative by-product. The purpose of this linking activity is to analyse client pathways through the whole social security system in order to improve programs and policies.
Ultimately, this is the objective of the organisation, and this project is thus a strategic move to enable DSS to provide sound policy advice and better design its programs to achieve quality outcomes.
|Amount of information about a data provider||Linkage stage:
It is proposed that only five or fewer variables be used to create the SLK.
The linked file would only contain the SLK, a weight (the strength indicator of the link) and the record identifier. There is only one quasi-identifying variable in the linked file, as the SLK includes date of birth; this is not sufficient on its own to identify an individual.
Internal researchers would need to go through an approval stage, and access would only be granted once it can be shown that the public benefit of the research outweighs the risks.
Throughout, the separation principle is applied.
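The document does not specify which SLK algorithm is proposed. As an illustration only, the sketch below implements the widely used AIHW SLK-581 construction (letters 2, 3 and 5 of the family name, letters 2 and 3 of the given name, date of birth as DDMMYYYY, and a sex code); the function name and the '2' padding for short names follow the AIHW convention, and nothing here should be read as DSS's actual method:

```python
def slk581(family_name: str, given_name: str, dob_ddmmyyyy: str, sex_code: str) -> str:
    """Build an SLK-581-style statistical linkage key (illustrative sketch).

    Combines letters 2, 3 and 5 of the family name, letters 2 and 3 of the
    given name, date of birth (DDMMYYYY) and a sex code (1/2). Positions
    beyond the end of a short name are padded with '2', per the AIHW
    convention. Note the resulting key is not spontaneously identifying.
    """
    def letters(name: str, positions: tuple) -> str:
        # Keep alphabetic characters only, in upper case.
        cleaned = "".join(ch for ch in name.upper() if ch.isalpha())
        # Pick the required positions, padding missing ones with '2'.
        return "".join(cleaned[p - 1] if p <= len(cleaned) else "2" for p in positions)

    return letters(family_name, (2, 3, 5)) + letters(given_name, (2, 3)) + dob_ddmmyyyy + sex_code
```

For example, `slk581("Smith", "John", "01011990", "1")` yields `"MIHOH010119901"`: a 14-character key containing no directly identifying variables.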
Below are the mitigation strategies that DSS proposes to adopt to reduce the likelihood of a breach:
|Dimension||Initial rating||Mitigation strategies (reducing likelihood of breach)||Revised rating|
|Managerial complexity||Low||DSS will be responsible for managing the internal stakeholders and ensuring that there is clarity around the complex data governance of this project. The project team will report to a steering committee, with clear terms of reference, to ensure that the risk of breaches is minimised.||Low|
|Nature of access||Low||Different staff members require access to various aspects of the data at different stages of the project. To mitigate against this risk, the separation principle is applied throughout the project.
Staff members with appropriate security clearance will create the SLK from the four input variables.
File transfer stage:
No external file transfer is required for this project. Internally, access to the system(s) would only be granted on a need-to-know basis. Internal data transfers (if required) would only be undertaken by staff with an appropriate level of clearance.
Staff members responsible for data linking would only have access to variables needed for the linkage – in this case this is the SLK variable.
The internal researcher is only given data extracts required for their research.
The variables from both datasets are never stored in full in a single file.
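The separation of identifying and content variables described above can be sketched in code. This is a minimal illustration, assuming hypothetical field names (`record_id`, `slk`, `name`, and so on); the actual DSS file structures and variable names are not described in this document:

```python
# Assumed set of identifying variables; the real list is project-specific.
IDENTIFYING = {"slk", "name", "date_of_birth"}

def separate(records: list) -> tuple:
    """Apply the separation principle: split each record so that no single
    file holds both identifying variables and content variables.

    Returns a linkage file (record id + SLK only, seen by linkage staff)
    and a content file (record id + content only, seen by approved
    researchers). The record id is the only field common to both.
    """
    linkage_file = [{"record_id": r["record_id"], "slk": r["slk"]} for r in records]
    content_file = [{k: v for k, v in r.items() if k not in IDENTIFYING} for r in records]
    return linkage_file, content_file
```

The key design point is that the two output files are only ever joined under an approved research request, so a breach of either file alone exposes neither identity nor content together.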
|Duration of the project||High||While the duration of the project is long-term, the stored linked file does not contain any variables that would pose a risk to individuals in the event of a breach.
It is only when the content data are extracted for researchers that there is a risk of a breach. However, there are already policies and protocols in place (such as departmental protective security) to ensure this does not occur.
|Likelihood of identification||High||Risk mitigation strategies can be applied to various stages in the data linking cycle, including extraction, file transfer, linkage, analysis and storage.
Extraction stage:
There are two extraction stages: one to create the SLK and the other to extract the content data for the researcher. In the first stage, only four variables are needed to create the SLK (and the SLK in itself cannot spontaneously identify an individual).
In the second stage, the extraction of content data for research purposes is already subject to systems that protect the privacy of data providers and the confidentiality of data, including protective security measures.
File transfer stage:
As there is only one data custodian involved and access to data is already established, there are no security issues involved with file transfer.
Linkage stage:
As SLKs are used to link the two datasets together, only one quasi-identifying variable (date of birth) is used in the linkage stage.
Analysis stage:
The release of content data to the researcher would go through already established practices and protective security. No name data are provided for analysis.
Storage stage:
The linked file would contain content variables, the SLK and a weight variable. There are no spontaneously identifying variables.
|Technical complexity||Low||At this stage, output is not proposed to be published externally.||Low|
|Post-mitigation Likelihood Assessment||Low|
|Post-mitigation risk rating||Medium|
In this particular case, an accredited Integrating Authority is not required (Endnote 18).
Next Steps: Register project on the Data Integration project register. An accredited Integrating Authority is not required for this project.
ENDNOTES:
17 Here, clients gave consent for the information to be collected for statistical purposes.
18 It is worth noting that an accredited Integrating Authority may itself introduce an increased element of risk, and mitigation strategies would be required to address it. For example, if an accredited Integrating Authority is required, an extra file transfer stage is introduced, and mitigation strategies would be needed to minimise that introduced risk.