Part 1 - What is confidentiality and why is it important?
Agencies that collect information from people and organisations have a legal and ethical responsibility to ensure that:
they respect the privacy of those providing the information; and
individuals and organisations cannot be identified in a disseminated dataset.
There is a clear relationship between confidentiality and privacy. A breach of confidentiality can result in disclosure of information which might intrude on the privacy of a person or an organisation.
Confidentiality refers to the obligation of data custodians (agencies that collect information) to keep secret the information entrusted to them.
Why is confidentiality important?
Agencies collecting data often rely on the trust and goodwill of the Australian people to provide information.
Maintaining public trust helps to achieve better quality data and higher response rates to data collections.
Protecting confidentiality is a key element in maintaining the trust of data providers.
Trust in turn leads to reliable data that can inform governments, researchers and the community.
Confidentiality and therefore trust can be broken when a person or organisation can be identified in a disseminated dataset, either directly or indirectly.
For example, a person could be directly identified in a dataset if that dataset contains their name and address. However, a person or an organisation could also be indirectly identified if there is a combination of information in the dataset from which their identity can be deduced.
Example: the combination of date of birth and a detailed area code (for example, a town where 300 people live) may enable identification as there will be some unique dates of birth in such a small area.
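The risk in this example can be checked by counting how many records share each combination of values. The sketch below is illustrative only: the records, the field names date_of_birth and area_code, and the helper unique_combinations are assumptions made for this example, not a method prescribed by any agency.

```python
from collections import Counter

# Hypothetical records with no names or addresses, only a date of birth
# and a detailed area code (all values invented for illustration).
records = [
    {"date_of_birth": "1950-03-14", "area_code": "T300"},
    {"date_of_birth": "1950-03-14", "area_code": "T300"},
    {"date_of_birth": "1987-11-02", "area_code": "T300"},
    {"date_of_birth": "1992-06-21", "area_code": "T300"},
]


def unique_combinations(rows, fields):
    """Return the value combinations of `fields` that occur exactly once."""
    counts = Counter(tuple(row[f] for f in fields) for row in rows)
    return [combo for combo, n in counts.items() if n == 1]


# Combinations held by a single record are candidates for indirect identification.
at_risk = unique_combinations(records, ["date_of_birth", "area_code"])
print(at_risk)
```

Each combination returned belongs to a single record, so anyone who already knows that person's date of birth and town could pick out their record.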
What does ‘confidentialise’ mean?
The term confidentialise refers to the steps a data custodian must take to mitigate the risk that a particular person or organisation could be identified in a dataset, either directly or indirectly. Confidentialisation requires two key steps:
de-identification of the data, that is, the removal of any direct identifiers (e.g. name and address) from the data; and
assessment and management of the risk of indirect identification occurring in the de-identified dataset.
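The first of these steps can be sketched in a few lines. This is a minimal illustration, assuming each record is a simple mapping of field names to values; the field names, the DIRECT_IDENTIFIERS set and the de_identify helper are hypothetical, and a real collection would define its own list of direct identifiers.

```python
# Assumed set of direct identifier fields (hypothetical, for illustration only).
DIRECT_IDENTIFIERS = {"name", "address"}


def de_identify(row, direct_identifiers=DIRECT_IDENTIFIERS):
    """Return a copy of `row` with the direct identifier fields removed."""
    return {k: v for k, v in row.items() if k not in direct_identifiers}


# Invented record for illustration.
record = {"name": "A. Citizen", "address": "1 Example St", "age": 87, "town": "Town A"}
deidentified = de_identify(record)
print(deidentified)
```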
De-identifying data does not necessarily protect the identity of individuals or organisations.
Removing identifying information such as name and address protects data providers from direct identification.
However, it may still be possible to indirectly identify a person or an organisation in a de-identified dataset. If enough detail is available, the identity of a particular person or organisation may be derived from the presence of a very rare characteristic or the combination of unique or remarkable characteristics.
Example: the identity of a person could be deduced if a dataset indicates the person is over 85 years old, has yearly income of more than one million dollars, and resides in a town of 400 people.
Example: the identity of a person with a very rare disease or health condition could be deduced even in highly aggregated data.
To protect the identity of individuals and organisations, both direct and indirect identification need to be considered.
Confidentialising data involves removing or altering information, or collapsing detail, to ensure that no person or organisation is likely to be identified in the data (either directly or indirectly).
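Collapsing detail can be sketched as replacing exact values with broader bands. The example below is illustrative only: the collapse helper, the field names, the 10-year age bands and the caps of 85 years and 100,000 dollars are all assumptions chosen for this sketch, not recommended thresholds.

```python
def collapse(row, age_cap=85, income_cap=100_000, income_step=50_000):
    """Replace exact age and income with bands, top-coding both at a cap."""
    age, income = row["age"], row["income"]

    if age >= age_cap:
        age_band = f"{age_cap}+"
    else:
        low = age // 10 * 10
        # Stop the last ordinary band just below the cap (e.g. 80-84).
        age_band = f"{low}-{min(low + 9, age_cap - 1)}"

    if income >= income_cap:
        income_band = f"{income_cap}+"
    else:
        low = income // income_step * income_step
        income_band = f"{low}-{min(low + income_step - 1, income_cap - 1)}"

    return {"age_band": age_band, "income_band": income_band, "town": row["town"]}


# Invented record for illustration.
print(collapse({"age": 86, "income": 1_200_000, "town": "Town A"}))
```

Whether a collapsed record is still identifiable depends on how many other records share the same banded values, so the risk of indirect identification needs to be reassessed after the detail is collapsed.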
There are various methods used to confidentialise data. These methods aim to protect the identity of individuals and organisations while enabling sufficiently detailed information to be released to make the data useful for statistical and research purposes.
The main techniques for confidentialising data are described below in "How to confidentialise data: the basic principles".
For more information about assessing and managing the risks of indirect identification in microdata, see below: "Managing the risk of disclosure in the release of microdata".
The confidentiality information series
This information sheet is part of a series designed to explain, and provide advice on, a range of issues around confidentialising data. The other sheets are below.