Data Integration - FAQ
- 1 General Questions
- 1.1 What is ‘data integration for statistical and research purposes’?
- 1.2 Why integrate data?
- 1.3 What are some examples of data integration for statistical and research purposes?
- 1.4 What are the Commonwealth arrangements for data integration?
- 1.5 Is the project in scope of the Commonwealth arrangements?
- 1.6 Who oversees the Commonwealth arrangements?
- 2 PRIVACY AND CONFIDENTIALITY
- 3 DATA CUSTODIAN
- 4 INTEGRATING AUTHORITY
- 5 ACCREDITATION
- 6 DATA USERS
- 7 DATA MANAGEMENT
What is ‘data integration for statistical and research purposes’?
Data integration involves combining data about an individual entity from different administrative and/or survey sources, at the unit record level (i.e. for an individual person or organisation) or micro level (e.g. information for a small geographic area), to produce new datasets for statistical and research purposes. This approach leverages more information from the combination of individual datasets than is available from the individual datasets separately. For more information on the process of linking data see Data Linking Information Series, in particular Sheet 1.
Using integrated data for statistical purposes means that the information is used specifically to help in producing statistics or research, not to monitor an individual person, household, family or business. High Level Principle 5 requires that where data integration is approved and implemented for statistical and research purposes, the integrated data is not then used for regulatory or compliance purposes.
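The combination of sources at the unit record level described above can be illustrated with a minimal sketch. The record layouts, field names and the `link_key` field are invented for illustration and do not come from any real dataset; in practice a linkage key would be created through the linking process described in the Data Linking Information Series.

```python
# Hypothetical unit records from two separate sources.
# All field names and values are invented for illustration.
administrative_records = [
    {"link_key": "A1", "benefit_type": "carer"},
    {"link_key": "B2", "benefit_type": "study"},
]
survey_records = [
    {"link_key": "A1", "hours_worked": 12},
    {"link_key": "C3", "hours_worked": 38},
]


def integrate(left, right, key="link_key"):
    """Combine records that share the same key value into one unit record."""
    right_by_key = {row[key]: row for row in right}
    merged = []
    for row in left:
        match = right_by_key.get(row[key])
        if match is not None:
            # The merged record carries information from both sources.
            merged.append({**row, **match})
    return merged


integrated = integrate(administrative_records, survey_records)
```

Only the record present in both sources (key `A1`) appears in `integrated`, and it holds more information than either source record holds on its own.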
Why integrate data?
Integrated datasets provide public benefits in terms of improved research, supporting good government policy making, program management and service delivery. Data integration has already been used to improve people’s health. It has led to the use of folate in pregnancy to prevent neural tube defects like spina bifida, and a linkage study done in Western Australia has been used to provide more information to help quantify the relationship between long haul air travel and deep vein thrombosis (DVT).
A major advantage of data integration is that it allows better use of data that is already available, so it can be a cost effective and timely way of gathering more information in order to help improve social, economic and environmental wellbeing. It also reduces the duplication of information collection from people and businesses, as integration projects make use of existing information which was collected from them for other purposes. For example, the Western Australian study on DVT and long haul air travel, mentioned above, used information from hospitals and data collected on the passenger cards which people complete on international flights.
One of the governing principles for data integration involving Commonwealth data is that data integration should only occur where it provides significant overall benefit to the public. Data integration will also maximise the value of existing datasets, while protecting and maintaining privacy and confidentiality.
What are some examples of data integration for statistical and research purposes?
See the Public Register of Data Integration Projects for a current list of projects.
What are the Commonwealth arrangements for data integration?
The arrangements for Data Integration involving Commonwealth Data for Statistical and Research Purposes (the Commonwealth arrangements) provide a framework that builds a safe and effective environment for statistical data integration activities. The arrangements aim to maximise the potential statistical value of existing and new datasets, to improve community health, as well as social and economic wellbeing by integrating data across multiple sources and by working with governments, the community and researchers.
A set of principles and governance and institutional arrangements were approved by the Commonwealth Secretaries Board (i.e. heads of all Commonwealth government agencies and the Australian Public Service Commission) in 2010. These provide assurances that the privacy of individuals and businesses will be protected by ensuring strong and consistent governance, methods, policies and protocols around integration of Commonwealth data for statistical and research purposes. The arrangements are an administrative scheme and the authority for them is vested in the agreements between agencies to participate in the arrangements, as agreed by the Portfolio Secretaries.
For more information, see About the Commonwealth arrangements in the guide.
Is the project in scope of the Commonwealth arrangements?
Who oversees the Commonwealth arrangements?
The Cross Portfolio Data Integration Oversight Board (Oversight Board) has been established as part of the Governance and Institutional arrangements for Data Integration involving Commonwealth Data for Statistical and Research Purposes (as endorsed by the Secretaries Board). The Oversight Board oversees the development of a safe and effective cross government environment for data integration involving Commonwealth data for statistical and research purposes.
The Oversight Board has responsibility to help manage the systemic risk associated with conducting data integration projects involving Commonwealth data.
The Oversight Board is chaired by the Australian Statistician (Australian Bureau of Statistics) and membership currently comprises the heads of the Department of Health; the Department of Social Services; and the Department of Human Services.
PRIVACY AND CONFIDENTIALITY
How is my privacy protected?
There are a number of ways that your personal information is protected when data is linked:
1. Through legislation. The Privacy Act 1988 sets out the rights of individuals in relation to the collection, storage, use and sharing of access to information provided to government. All Commonwealth and ACT government departments must comply with the Privacy Act 1988. In addition, more specific legislation governs the use of particular datasets collected for specific government activities, for example the Social Security (Administration) Act 1999, the Health Insurance Act 1973 and the Census and Statistics Act 1905.
2. By removing name, address, and other identifiers (such as Australian Business Number) that directly identify an individual or organisation and also, removing or altering other information that might enable an individual or organisation to be indirectly recognised in the data.
3. By using ethics committees to balance the benefits of a project with any risk that the project may cause harm, inconvenience or discomfort to an individual.
Human Research Ethics Committees review research proposals that involve people, their data or their tissue. Their job is to protect the welfare and rights of individuals by ensuring that the proposed project is ethically acceptable and in accordance with relevant standards and guidelines.
There are more than 200 Human Research Ethics Committees in institutions and organisations across Australia. Each project must have research merit and integrity, select participants fairly and minimise the burden imposed, and respect the privacy of participants and the confidentiality of their data.
4. By adopting agreed principles, governance and institutional arrangements.
A set of principles and governance and institutional arrangements were approved by the Commonwealth Secretaries Board (i.e. heads of all Commonwealth government agencies and the Australian Public Service Commission) in 2010. These provide assurances that the privacy of individuals and businesses will be protected by ensuring strong and consistent governance, methods, policies and protocols around integration of Commonwealth data for statistical and research purposes.
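The removal of identifiers described in point 2 above can be sketched as follows. This is an illustrative sketch only: the field names, the choice of which fields count as direct identifiers, and the decision to coarsen date of birth to a year are assumptions, not requirements taken from the Commonwealth arrangements.

```python
# Fields treated as direct identifiers in this sketch (an assumption).
DIRECT_IDENTIFIERS = {"name", "address", "abn"}


def deidentify(record):
    """Return a copy of the record without direct identifiers.

    Date of birth is altered to year of birth so that it is less likely
    to identify someone indirectly (an illustrative choice).
    """
    cleaned = {k: v for k, v in record.items() if k not in DIRECT_IDENTIFIERS}
    if "date_of_birth" in cleaned:
        # Keep only the year part of an ISO date such as "1980-05-17".
        cleaned["birth_year"] = cleaned.pop("date_of_birth")[:4]
    return cleaned


# Hypothetical record; every value is invented.
record = {
    "name": "Jane Citizen",
    "address": "1 Example Street",
    "date_of_birth": "1980-05-17",
    "benefit_type": "carer",
}
safe = deidentify(record)
```

The original `record` is left unchanged; `safe` contains only the analysis field and the coarsened year of birth.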
What happens if something goes wrong?
In addition to the Privacy Act 1988, Commonwealth agencies have specific legislation that provides additional protection governing the use of, and access to, identifiable information. If an unauthorised disclosure of confidential information relating to an individual or a business occurs, the penalties can include heavy fines and jail terms up to two years.
Government agencies are also subject to other general requirements regarding the security of information, for example, the Australian Government’s Protective Security Framework and the Information Security Manual. There may also be common law duties relating to confidentiality of particular information.
The High Level Principles and the governance and institutional arrangements for data integration involving Commonwealth data for statistical and research purposes were endorsed by the Commonwealth Secretaries Board (i.e. heads of all Commonwealth government agencies and the Australian Public Service Commission) in 2010. This framework provides added assurances that the privacy of individuals and businesses will be protected by ensuring strong and consistent governance, methods, policies and protocols around the integration of Commonwealth data for statistical and research purposes.
For more information, see Data breaches in the guide.
What if I don't want my information to be used in this way?
Consent is sought for data integration projects involving Commonwealth data when it is practical and possible to do so (for example, the ‘Growing up in Australia’ study is only done with consent). It is not always practical to seek consent. In the case of population-based data collection and linkage, for example, it would be prohibitively costly to seek written or verbal consent from the whole population.
Some data integration projects for statistical and research purposes are undertaken without asking for consent. This only happens where the legislation allows all participating parties (data custodians and integrating authorities) to undertake these projects. Details of the data integration projects will be made publicly available on the Public Register of Data Integration Projects. In all cases, data linkage projects are only approved after careful consideration of the disclosure risks involved and the public benefit which would result from the project.
There are strong protections and safeguards in place in all cases to minimise the possibility of identification of an individual or business. These include jail terms for those who disclose information, except as permitted by law. A set of principles and governance and institutional arrangements apply to data integration projects involving Commonwealth data for statistical and research purposes, which help to safeguard the privacy of individuals and businesses who provide data.
DATA CUSTODIAN
What is a ‘data custodian’?
A data custodian is the agency responsible for managing the collection, use, disclosure and protection of source data e.g. a survey or administrative dataset. Data custodians collect and hold information on behalf of data providers (defined as individuals, households, businesses or other organisations), who supply their data/information to the data custodian for statistical or administrative purposes.
For data integration projects where the data custodian differs from the integrating authority, the data custodian must be authorised to release identifiable data to the integrating authority. This authorisation comes from:
1) legislated authority; or
2) consent from the data provider (that is, the person, household, business or other organisation who originally supplied the data for statistical or administrative purposes), where this is not precluded by legislation.
Note: If at any time a data custodian is unsure if their legislation allows for the release of data for an integration project, independent legal advice should be sought (for example, from the agency’s legal department in the first instance).
For more information, see Authorisation to release identifiable data in the guide.
How does a data custodian remain accountable for their data?
Data custodians have continued accountability for the source data within integrated datasets and must establish adequate controls over the use of personal or other sensitive data. The agreement between the data custodian(s) and the integrating authority provides a mechanism for the data custodians to exercise their accountability for the security and confidentiality of the source data. Agreements should therefore include conditions relating to data security obligations, privacy and confidentiality requirements, data access provisions to be passed on to the data users and potential sanctions which may apply to misuse of the data.
The decision of whether or not to approve a project is at the discretion of each data custodian. Where an agency does not agree to the use of its source data in a data integration proposal, that data will not be included. Commonwealth administrative data cannot be used for statistical or research purposes if this contravenes legislation or any commitment made to data providers regarding the purpose for which their data may be used.
The data custodians should be kept informed of, and must agree to, any changes to existing and approved projects.
For more information, see Principle two – custodian’s accountability and the Data custodian section in the guide.
How do I assess the risk of a project?
Prior to giving in principle approval for a project, it is the responsibility of the data custodian(s) to assess the risk of a project to determine whether a project should proceed and whether an accredited Integrating Authority is required to manage the project. The risk assessment framework describes a two stage process that assesses the risk of the data integration project against criteria agreed by the Oversight Board. The first stage (the pre-mitigation risk assessment) identifies and rates the elements of risk presented by the project. The second stage assesses the residual risk after accounting for risk mitigation strategies (the post-mitigation risk assessment). If the project risk is high after mitigation then the project must be managed by an accredited Integrating Authority.
For more information, see Risk Assessment Guidelines and the Risk framework section in the guide.
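The two stage assessment described above can be sketched in a few lines. The criteria names, the three-level rating scale and the rule that the overall rating is the highest individual rating are all assumptions made for illustration; the actual criteria and rating method are set out in the Risk Assessment Guidelines.

```python
# Rating scale assumed for this sketch.
LEVELS = {"low": 0, "medium": 1, "high": 2}


def overall_rating(ratings):
    """Return the highest individual rating (an assumed aggregation rule)."""
    return max(ratings.values(), key=LEVELS.__getitem__)


def needs_accredited_authority(post_mitigation_ratings):
    """A project that is still high risk after mitigation must be managed
    by an accredited Integrating Authority."""
    return overall_rating(post_mitigation_ratings) == "high"


# Stage 1: pre-mitigation ratings for hypothetical criteria.
pre_mitigation = {"sensitivity": "high", "dataset_size": "medium"}
# Stage 2: residual ratings after hypothetical mitigation strategies.
post_mitigation = {"sensitivity": "medium", "dataset_size": "low"}

pre_rating = overall_rating(pre_mitigation)
post_rating = overall_rating(post_mitigation)
accredited_required = needs_accredited_authority(post_mitigation)
```

In this invented example the project is rated high before mitigation and medium afterwards, so an accredited Integrating Authority would not be required.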
INTEGRATING AUTHORITY
What is an ‘integrating authority’?
Integrating authorities are an essential pillar of establishing a safe and effective environment for data integration involving Commonwealth data. The High Level Principles for Commonwealth data require that an integrating authority be nominated for each data integration project. The integrating authority is the single organisation responsible for the sound conduct of the statistical data integration project. It is responsible for the implementation of the data integration project and the management of the integrated datasets throughout their life cycle, ensuring full compliance with commitments made to data custodians, and in line with the High Level Principles and supporting governance and institutional arrangements. The integrating authority is also responsible for providing researchers with safe and secure access to the integrated data, in line with the requirements of data custodians.
For more information, see integrating authorities in the guide.
Who can be an integrating authority?
The integrating authority has an important role in managing the increased risk of identification that exists when two or more datasets are integrated. Generally an integrating authority will be chosen because they are authorised to receive the data and have expertise to manage the confidentiality of the integrated dataset and capability for maintaining security.
An integrating authority must be:
- nominated by the data custodians for each statistical data integration project involving Commonwealth data. The data custodian(s) must ensure they are authorised to release identifiable data to the integrating authority, either by the data custodian’s legislation or by consent from the data provider.
- a secure and trusted institution. In addition, the governance arrangements agreed by the Portfolio Secretaries require integrating authorities undertaking high risk (post mitigation) projects (see Risk Assessment Guidelines) to be accredited.
- in a position to comply with the requirements of the Privacy Act 1988 (in regards to information about individuals) and secrecy provisions generally (in regards to information with respect to the affairs of any third party, corporate or individual).
For more information see Integrating authorities in the guide, the Legal framework for integrating authorities undertaking low and medium risk projects and the Legal framework for integrating authorities undertaking high risk projects - project level requirements.
Yes. The role of an integrating authority is unique to the Commonwealth arrangements for statistical data integration. An integrating authority is responsible for the overall management of a project for its entire life cycle. This includes:
- the management of the data integration project on behalf of data custodians;
- minimising privacy concerns associated with the use of data once it is received from data custodians and after integration; and
- facilitating the use of the integrated data within the constraints of privacy legislation.
To ensure the integrator’s accountability (Principle 3), the integrating authority is responsible for undertaking two key aspects of the linkage process: the merging of the data and its confidentialisation.
Some aspects of the project can be outsourced. For example, with the agreement of the data custodians, an integrating authority can outsource the creation of linkage keys to specific linkage units with the expertise and infrastructure to undertake this component of work.
For more information, see outsourcing or working in partnership in the guide.
Yes. All data integration projects involving Commonwealth data for statistical and research purposes should nominate an integrating authority and register the project on the Public Register of Data Integration Projects.
For more information, see ‘What is an integrating authority?’ and ‘Who can be an integrating authority?’.
How do I register a project?
Once the project has been approved and formal agreements have been signed, the integrating authority can register the project. To do this, complete the online form on the Public Register of Data Integration Projects on the National Statistical Service website.
For more information on the registration process, see the help documentation.
At the time of registration the risk assessment should also be submitted to the Oversight Board through the Secretariat.
For more information, see Risk Assessment Guidelines.
ACCREDITATION
What is accreditation?
Accreditation of an integrating authority is the recognition by the Cross Portfolio Data Integration Oversight Board (the Oversight Board) that the organisation has the requisite expertise, skills and knowledge, infrastructure and secure environment to undertake high risk (post mitigation) data integration projects involving Commonwealth data for statistical and research purposes.
For more information on the accreditation process and how to apply for accreditation, see ‘The interim accreditation process for integrating authorities’.
For more information, see Accreditation in the guide.
Who can apply for accreditation?
The interim accreditation arrangements will be tested on Commonwealth government agencies first. While this does not preclude state government agencies applying now for accreditation against the interim arrangements (provided that they meet all the requirements), it will not be possible for any state government agencies to be accredited in the short term, as this would not allow time for sufficient testing and evaluation of the arrangements with Commonwealth agencies. The system is not yet mature enough to ensure that adequate safeguards apply to private firms. State government agencies and private firms can continue to apply for access to Commonwealth data under existing arrangements.
The Cross Portfolio Data Integration Oversight Board will only consider applications for accreditation, against the interim arrangements, for those agencies covered by privacy legislation (either the Privacy Act or state/territory equivalent).
No. Appointment of an integrating authority will be considered for each individual project, or family of projects, on a case by case basis. Choice of an integrating authority is at the discretion of the data custodians and must be mutually agreed. The decision about which accredited Integrating Authority to use for a specific project will depend on a range of factors, including whether the data custodians are authorised to release the data to the integrating authority by legislation or consent.
For more information on the legal framework, see the legal framework for integrating authorities undertaking high risk projects – project level requirements.
When is an accredited Integrating Authority required?
An accredited Integrating Authority is required for all high risk (post mitigation) projects. The risk rating of a project is determined through the Risk Assessment Guidelines.
Accredited Integrating Authorities have been approved by the Cross Portfolio Data Integration Oversight Board (the Oversight Board) as having the capacity to deal with high risk (post mitigation) data integration projects.
For more information, see Accreditation in the guide.
What are the steps involved in gaining accreditation?
The interim accreditation process includes:
- an assessment by the applicant against eight accreditation criteria (these include the ability to ensure secure data management, availability of appropriate skills, transparency of operation and a culture and set of values that ensure the protection of confidential information and support the use of data);
- an audit by an independent third party to assess whether a prospective agency or organisation meets the criteria;
- a final decision made by the Cross Portfolio Data Integration Oversight Board; and
- inclusion on a public list of accredited Integrating Authorities, together with a summarised version of the integrating authority’s application. This list is maintained by the Cross Portfolio Data Integration Secretariat on the National Statistical Service website.
For more information on the accreditation process and criteria, see ‘The interim accreditation process for integrating authorities’.
It is recommended that potential applicants contact the Cross Portfolio Data Integration Secretariat (phone: 02 6252 7198 or email email@example.com) before starting the application, as well as at any point where guidance is needed during the application.
For more information, see The interim accreditation process for integrating authorities, ‘Who can apply for accreditation?’ and the Accreditation section in the guide.
DATA USERS
Who are ‘data users’?
Data users (researchers) are those involved in accessing and investigating integrated datasets at the unit record level, for statistical and research purposes.
Data users include academics working in research institutions and employees undertaking research in Commonwealth and State/Territory agencies. The term can also include multiple data users working as part of a consortium, alliance or collaborative network.
How will the data integration arrangements affect data users?
The High Level Principles and supporting governance and institutional arrangements will bring about some changes in the role data users play in integration projects involving Commonwealth data. Historically, data users have often been responsible for merging the data using linkage keys and providing an appropriately secure environment for accessing and storing the integrated dataset. Under the Commonwealth arrangements for data integration, integrating authorities will carry out these tasks as part of their end-to-end management of the data integration project.
The integrating authority has an important role in managing the increased risk of identification that exists when two or more datasets are integrated. Generally an integrating authority will be chosen because they are authorised to receive the data and have expertise to manage the confidentiality of the integrated dataset and capability for maintaining security. Data users will access integrated datasets through secure arrangements provided by the integrating authority and the integrating authority will de-identify and confidentialise the data according to the requirements of data custodians before the integrated dataset is released.
The aim of the Commonwealth arrangements is to maximise the use of the rich data sources held by government bodies, as articulated in Principle 1 of the High Level Principles for Data Integration Involving Commonwealth Data. By creating a safe environment for data integration, the new arrangements should enable greater access to Commonwealth data for data integration projects in the future.
How do I propose a project?
The data user (researcher) wishing to undertake a statistical or research project that is in scope of the Commonwealth arrangements should prepare a project proposal. During this stage, data users may need to consult with the data custodian(s) to clarify some of the details for the project, particularly around the data items required for the project.
In some circumstances an integrating authority may assist with the preparation of the project proposal. In such circumstances it should be noted that the final decision concerning which integrating authority to appoint will remain subject to the outcome of the risk assessment.
For more information, see Project proposal in the guide.
Who can access integrated data?
Integrated data will be made available to approved data users, as specified in the project proposal. Data access agreements between the integrating authority and data users should set out the conditions and arrangements for accessing and producing output from the integrated dataset. The conditions and nature of access may vary from project to project, according to the requirements of data custodians.
Part of the integrating authority’s role in managing integrated datasets is to provide data users with secure access to the data as specified in the project agreements. It is recommended that data users be provided with integrated data for their specified research after it has been merged and confidentialised by the integrating authority, according to the requirements of the data custodians.
For more information, see Access to integrated data in the guide.
What is confidentialising?
Confidentiality refers to the obligation of data custodians (agencies that collect information) to keep the confidential information they are entrusted with secret. It is the responsibility of the integrating authority, on behalf of the data custodians, to ensure that information is only released to data users in a way that is not likely to enable identification, either directly or indirectly, of individuals or organisations.
For more information, see Confidentiality Information Series.
What is the separation principle?
The separation principle is a mechanism to protect the identities of individuals and organisations in datasets, applied as part of the linking and merging process used to form the integrated dataset. The separation principle means that no individual working with the data can view both the linking (identifying) information (such as name, address, date of birth or ABN) together with the merged analysis (content) data (such as clinical information, benefit details or company profits) in an integrated dataset.
For more information, see the separation principle in the guide.
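One way to picture the separation principle is to split each record into a linking part and a content part that share only a random record identifier. The sketch below is an illustration under assumed field names; the use of a random identifier and the two-file layout are assumptions, not a description of how any integrating authority implements the principle.

```python
import secrets


def separate(records, linking_fields):
    """Split records into a linking file and a content file.

    The two files share only a random record_id, so a person given one
    file cannot see the fields held in the other.
    """
    linking_file, content_file = [], []
    for record in records:
        record_id = secrets.token_hex(8)  # random, carries no meaning
        linking_part = {f: record[f] for f in linking_fields}
        content_part = {k: v for k, v in record.items() if k not in linking_fields}
        linking_file.append({"record_id": record_id, **linking_part})
        content_file.append({"record_id": record_id, **content_part})
    return linking_file, content_file


# Hypothetical record; every value is invented.
records = [
    {"name": "Jane Citizen", "date_of_birth": "1980-05-17", "benefit_type": "carer"},
]
linking_file, content_file = separate(records, ("name", "date_of_birth"))
```

Staff carrying out the linking would work only with `linking_file`, and staff preparing the analysis data only with `content_file`.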
What happens to the integrated data once a project is complete?
Once the approved purpose of the project is met, the related datasets will be destroyed, unless otherwise agreed by data custodians. If an integrated dataset is retained, the reasons for and necessity of retention will be documented and a review process set up (for example, if the project is ongoing, or the integrated dataset is required to support a family of projects). If such retention was not part of the initial approval process, re-approval of the decision to retain the integrated dataset should be obtained from all data custodians.
Archiving of statistically integrated datasets should be restricted to confidentialised datasets.
For more information, see Principle 6 – Preserving privacy and confidentiality.
It is the responsibility of the integrating authority to ensure the above tasks are completed. This includes initiating and managing the regular review of the project if the integrated dataset is retained.
For more information, see Evaluation and project completion in the guide.