Risk

Appendix A

Purpose

The purpose of this appendix is to demonstrate, through a case study, the process of determining the post-mitigation risk rating of data integration projects using the draft risk framework guidelines. A practical example is provided with both pre- and post-mitigation risk assessments.

Definitions

For the purpose of classifying data integration projects, the following definitions are used:

Project Types	Definition
Single agency	There is only one Commonwealth data custodian involved in the data integration project.
Multiple agencies	There is more than one Commonwealth data custodian involved in the data integration project.
Non-Commonwealth	There is one or more non-Commonwealth data custodian involved in the data integration project.

Risks at different stages of the project

There are varying levels of risks associated with the different stages of a data integration project. Broadly, the stages include extraction, file transfer, linkage, analysis, publication, storage and destruction. Not all stages will be applicable to all data integration projects. For example, a data integration project may retain the linked dataset indefinitely. Therefore, the destruction stage may not be applicable.

Mitigation strategies can be applied at various stages to reduce the likelihood of a breach occurring. The data integration project can be designed in such a way that even if the consequence of a breach occurring is high, the likelihood is reduced such that the project does not require an accredited Integrating Authority. For example, even if the data required for the integration project are sensitive, using the separation principle may mean that the project does not necessarily require an accredited Integrating Authority.

Application of the framework through a selected case study

The following section provides a case study of pre- and post-mitigation assessments. The final risk assessment can assist in determining whether an accredited Integrating Authority is required.

The selected case study is:

#	Type of case study	Project Name
4.1	High Risk – Single agency	Client Data Collection (DSS)

Case studies from other agencies will be included later. This will help ensure the framework continues to be developed in a way that is fit for all situations.

High Risk – Single agency

Client Data Collection (CDC) data integration project (DSS)

The Department of Social Services' (DSS) proposed Client Data Collection (CDC) is an example of a high risk single agency project. DSS is the sole data custodian. In this case, DSS is also the integrating authority.

The purpose of the CDC project is to enable better monitoring and research within and between DSS programs and payments data. This project is still in the planning stage. The following strategies are hypothetical and subject to change.

Pre-mitigation Assessment

There are three dimensions that influence the consequence of a breach (from the eight risk dimensions agreed upon). The table below outlines the consequence (or impact) on individuals if there is a breach.

Dimension	Impact	Comments
Sensitivity	High	The project involves integrating all DSS programs and payments data. Information collected includes educational background, health status, income level and much more. The data is considered to be highly sensitive. If leaked, there is the potential to cause harm to individuals and the Commonwealth Government as a whole.
Consent	High	Some of the programs directly obtain consent (Endnote 1) from clients. However, the majority of them do not. Generally, both programs and payments data are collected as an administrative by-product.
Amount of information about a data provider	High	There may be twenty or more variables with different personal information about a data provider.
Pre-mitigation Consequence Assessment	High

There are five dimensions that influence the likelihood of a breach (from the agreed eight risk dimensions).

Dimension	Rating	Comments
Managerial complexity	Low	There will only be one agency involved in this project. However, a considerable number of internal stakeholders will be part of the project team. The number of DSS staff directly involved in the integration is fewer than ten.
Nature of access	Low	Restricted. Access granted to approved staff and access control to be reviewed regularly. The separation principle is applied.
Duration of the project	High	Data is proposed to be retained for more than three years.
Likelihood of identification	High	A high rating is given because the programs and payments data used in the data integration project contain many different variables, including quasi-identifying variables (such as date of birth, address and Indigenous status).
Technical complexity	Low	Technical complexity here refers to the output. That is, how difficult it is to confidentialise data for external publication and/or to ensure that external users who need access can access unit record data. For example, external users may need access to linked data for research purposes. At this stage, external output is not proposed.

Pre-mitigation Likelihood Assessment - Medium

This pre-mitigation likelihood assessment of medium aligns with the risk framework guidelines.

Overall pre-mitigation assessment - High

Based on the above assessment, this project is classified as ‘high’ risk. The data is highly sensitive, with a large number of identifying variables in both the programs and payments data. Having assessed the initial risk, DSS can now take actions to reduce the risk of undertaking this project.

Post-mitigation Assessment

A number of things can be done to reduce the likelihood of a breach occurring. The data integration project can be designed in such a way that even if the consequence of a breach occurring is high, the likelihood is reduced such that the project does not require an accredited Integrating Authority. These are the mitigation strategies applied to reduce the consequence risk:

Elements	Reducing the likelihood of a breach occurring in the first place
Sensitivity	Initially, the data is assessed as highly sensitive. However, the project design is such that the entire dataset is not required, and the separation principle plays a major role here. It is proposed that a statistical linkage key (SLK) would be created for the programs and payments data using the same algorithm. This would remove the need for access to the highly sensitive variables in the original datasets. A file holding the record identifiers that link the programs and payments data would be retained. This means that the linked file would not contain any sensitive data. Researchers would be given access to the linked dataset only if their research request is approved. DSS already has processes in place to handle research requests. All internal research requests would also need to go through an approval stage.
Consent	The data being linked is all an administrative by-product. The purpose of this linking activity is to analyse client pathways through the whole social security system to improve programs and policies. Ultimately, this is the objective of the organisation, and thus this project is a strategic move to enable DSS to provide sound policy advice and better design its programs to achieve quality outcomes.
Amount of information about a data provider	Linkage stage: It is proposed that only five or fewer variables be used to create the SLK. These would include variables such as name (surname and given name), sex and date of birth. Storage stage: The linked file would only contain the SLK, weight (the strength indicator of the link) and record identifier. There is only one quasi-identifying variable in the linked file, as an SLK includes date of birth. This is not enough for identification. Analysis stage: Internal researchers would need to go through an approval stage, and access would only be granted once it can be shown that the public benefit of the research outweighs the risk. The separation principle is applied throughout.

Below are the mitigation strategies that DSS proposes to adopt to reduce the likelihood of a breach:

Dimension	Initial rating	Mitigation strategies (reducing likelihood of breach)	Revised rating
Managerial complexity	Low	DSS will be responsible for managing the internal stakeholders and ensuring that there is clarity around the complex data governance of this project. The project team will report to a steering committee to ensure that the risk of breaches is minimised. There would be clear terms of reference for the steering committee.	Low
Nature of access	Low	Different staff members require access to various aspects of the data at different stages of the project. To mitigate this risk, the separation principle is applied throughout the project. Extraction stage: Staff members with appropriate security clearance will create the SLK based on the four variables identified above. File transfer stage: No external file transfer is required for this project. Internally, access to the system/s would only be granted on a need-to-know basis. Internal data transfers (if required) would only be undertaken by staff with an appropriate level of clearance. Linkage stage: Staff members responsible for data linking would only have access to variables needed for the linkage – in this case this is the SLK variable. Analysis stage: The internal researcher is only given data extracts required for their research. Storage stage: The variables from both datasets are never stored in full in a single file.	Low
Duration of the project	High	While the duration of the project is long-term, the stored linked file does not contain any variables that would pose a risk to individuals in the event of a breach. It’s only when the content data is extracted for researchers that there is a risk of a breach. However, there are already policies and protocols in place (such as departmental protective security) to ensure this does not occur.	Medium
Likelihood of identification	High	Risk mitigation strategies can be applied to various stages in the data linking cycle, including extraction, file transfer, linkage, analysis and storage. Extraction stage: There are two extraction stages – one to create the SLK and the other to extract the content data for the researcher. In the first stage, only four variables are needed to create the SLK (and an SLK in itself cannot spontaneously identify an individual). In the second stage, the extraction of content data for research purposes is already subject to systems that protect the privacy of data providers and confidentiality of data, including protective security measures. File transfer stage: As there is only one data custodian involved and access to data is already established, there are no security issues involved with file transfer. Linkage stage: As SLKs are used to link the two datasets together, only one quasi-identifying variable (date of birth) is used in the linkage stage. Analysis stage: The release of content data to the researcher would go through already established practices and Protective Security. No name data is provided for analysis. Storage stage: The linked file would contain content variables, the SLK and a weight variable. There are no spontaneously identifying variables.	Medium
Technical complexity	Low	At this stage, output is not proposed to be published externally.	Low
Post-mitigation Likelihood Assessment	Low
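
The separation principle applied at the linkage and storage stages can be sketched as follows. This is an illustration under stated assumptions rather than the DSS design: the function name, field names and record identifiers are hypothetical, linkage is shown as an exact match on the SLK, and the weight of 1.0 for an exact match is a placeholder for whatever link strength measure is actually used.

```python
def build_link_file(programs, payments):
    """Link two datasets on SLK, seeing only record identifiers and SLKs.

    Each input is an iterable of (record_id, slk) pairs, so the person
    doing the linkage never handles content variables. Returns one row
    per matched pair of records.
    """
    payments_by_slk = {}
    for record_id, slk in payments:
        payments_by_slk.setdefault(slk, []).append(record_id)

    links = []
    for programs_id, slk in programs:
        for payments_id in payments_by_slk.get(slk, []):
            links.append({
                "slk": slk,
                "programs_id": programs_id,
                "payments_id": payments_id,
                "weight": 1.0,  # assumed: exact SLK match given full weight
            })
    return links


# Hypothetical linkage inputs: identifiers and SLKs only.
programs = [("P1", "ITZAN070319802"), ("P2", "MITOH150619751")]
payments = [("Y1", "ITZAN070319802"), ("Y2", "ONEAR010119902")]

link_file = build_link_file(programs, payments)
```

In this sketch, content data would be joined to the link rows only when an approved research request requires it, and only for the variables that request covers.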

Post-mitigation risk rating - Medium

In this particular case, an accredited Integrating Authority is not required (Endnote 2).
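
The arithmetic behind the ratings in this case study can be sketched as follows. The aggregation rule and the matrix are assumptions: the risk framework guidelines define the actual method, which is not reproduced in this appendix. The rule below (averaging the dimension ratings on a 1 to 3 scale and rounding to the nearest level) is simply one rule that reproduces the ratings reported here, and the matrix holds only the two cells this case study supports, on the assumption that the consequence rating stays High after mitigation.

```python
LEVELS = {"Low": 1, "Medium": 2, "High": 3}
NAMES = {value: name for name, value in LEVELS.items()}


def aggregate(ratings):
    """Combine dimension ratings into one rating (assumed rule: rounded mean)."""
    mean = sum(LEVELS[r] for r in ratings) / len(ratings)
    return NAMES[int(mean + 0.5)]  # round half up to the nearest level


# Only the (consequence, likelihood) cells evidenced by this case study.
KNOWN_MATRIX = {
    ("High", "Medium"): "High",
    ("High", "Low"): "Medium",
}


def overall_risk(consequence, likelihood):
    """Look up the overall rating; other cells are not given in this appendix."""
    try:
        return KNOWN_MATRIX[(consequence, likelihood)]
    except KeyError:
        raise ValueError("combination not covered by this case study") from None


# Ratings taken from the tables in this appendix.
consequence = aggregate(["High", "High", "High"])
pre_likelihood = aggregate(["Low", "Low", "High", "High", "Low"])
post_likelihood = aggregate(["Low", "Low", "Medium", "Medium", "Low"])

pre_rating = overall_risk(consequence, pre_likelihood)
post_rating = overall_risk(consequence, post_likelihood)
```

Under these assumptions the mitigation strategies lower two of the five likelihood dimensions from High to Medium, which is enough to move the likelihood assessment from Medium to Low and the overall rating from High to Medium.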

Next Steps: Register project on the Data Integration project register. An accredited Integrating Authority is not required for this project.

ENDNOTES

1. Here clients gave consent for the information to be collected for statistical purposes.
2. It is worth noting that an accredited Integrating Authority may also introduce an increased element of risk. Mitigation strategies would be required to address the introduced risk. For example, if an accredited Integrating Authority is required, then an extra file transfer stage is introduced. There would need to be mitigation strategies to minimise the introduced risk.
