Definitions

From Data.gov.au

What is Open Data

Put briefly, open data is data that is freely available, easily discoverable, accessible and published in ways and under licences that allow reuse. Data may also be published in forms that only partly meet those standards. For example, data published in a PDF file with all rights reserved is less open than data in a spreadsheet file published under a Creative Commons BY licence. See below for more advice about open data.

All Commonwealth Government data should be hosted on or linked through data.gov.au as the authoritative point of discovery for Commonwealth data.

Assess upcoming agency publications, websites, mobile apps and other agency information resources to identify datasets suitable for release as open data

Consider whether upcoming publications contain information that is suitable for release in an open data format. Many government reports include agency data about financial, economic, social or regulatory activity or trends, or present data in formats unsuited to reuse (such as publishing tabular data in a PDF file rather than a reusable spreadsheet format). Often the underlying raw data could be released alongside the report in a reusable open data format.
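As a minimal sketch of what a reusable format can look like in practice, the table behind a report can be written out as CSV using Python's standard library. The column names and figures below are invented for illustration and do not come from any agency publication.

```python
import csv
import io

# Hypothetical rows standing in for a table that appears in a published report.
ROWS = [
    {"financial_year": "2012-13", "program": "Program A", "expenses_aud": 1200000},
    {"financial_year": "2013-14", "program": "Program A", "expenses_aud": 1350000},
]


def table_to_csv(rows):
    """Serialise a list of row dictionaries to CSV text with a header row."""
    buffer = io.StringIO()
    writer = csv.DictWriter(
        buffer, fieldnames=list(rows[0].keys()), lineterminator="\n"
    )
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()


csv_text = table_to_csv(ROWS)
```

One row per observation, with a single header row and no merged cells or footnotes inside the table, is usually the easiest shape for others to reuse.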

For example, as part of the budget process the Government now publishes financial data from the portfolio budget statements in a reusable format on data.gov.au in addition to the statements themselves which are available on departmental websites.

This approach can also work for agency websites and mobile apps that present information to the public. For example, the Department of Human Services’ (DHS) Service Locator tool allows people to find their nearest Centrelink, Medicare or Child Support location. DHS released the raw Service Locator geospatial data on data.gov.au, so that members of the public can build their own apps using the data. Another example is the Department of Finance’s release of historical government contract data from the AusTender website on data.gov.au.

Agencies can also assess whether any material they have already published online in publications, websites or apps is suitable for release as open data. This could include material published as part of the FOI Act Information Publication Scheme or in an agency disclosure log.

Ensure you will have access to the raw data and the right to publish it

Effective information governance (PSI Principle 3) will help you ensure that your agency has access to the raw data that was used to create existing publications or apps.

When commissioning research, collecting data or establishing a new ICT system, adopt information management and procurement practices that ensure you have access to associated raw data in an open format and the right to publish that data online under an open licence. This is important if a service provider is contracted to collect the data or to develop a website or mobile app built on an agency data source.

Prepare, publish and refine

As noted in the Australian Government Web Guide, once the data is ready to publish in an open format, the agency should:

  • prepare appropriate metadata to accompany the dataset, to ensure the data is discoverable and meaningful to the public
  • publish the data in an appropriate place, such as the agency website, data.gov.au, or an existing domain-specific collection or catalogue repository.

You should also consider how to refine your approach so that the data (or subsequent releases of equivalent data) remains relevant and useful in future. This could include engaging with stakeholders, assessing how the data was reused and considering whether the data should be presented in a different format or made accessible in different ways (for example, through an application programming interface or API that allows programmers to easily reuse the data).

Next steps: releasing unpublished data

Consider whether your agency has unpublished data that could be released as open data.

Agencies wishing to convert or release previously unpublished data should first consider legislative and policy requirements that may prevent publication or require modification of the data before release.

In particular, you should consider obligations under the Australian Privacy Principles (APPs) in the Privacy Act 1988. Guidance about the Privacy Act is available in the OAIC's APP Guidelines. In addition, the OAIC's Information Policy Agency Resource 1 — De-identification of data and information discusses de-identification as a technique that allows agencies to balance privacy and transparency objectives when publishing open data.

The information provided in this resource is of a general nature. It is not a substitute for legal advice.

Licensing

Under Australian Government policy, the default licence for publishing data and information is Creative Commons Attribution 3.0 Australia (CC BY 3.0 AU). This is stated at point 11(b) in the mandatory Statement of IP Principles for Australian Government Agencies.

Agencies are advised that processed data or data in aggregate does not inherit the licensing or ownership of the source data, because it is a new dataset. It can be licensed however the agency wants, preferably under CC BY as per the default Commonwealth position above.

Standards, Specifications and Formats

Standards and specifications developed in processes with the attributes identified above enable data, products and services to be used by anyone, at any time, and spur innovation and growth. Agencies should prioritise the use of open data formats that are non-proprietary, publicly available and place no restrictions upon their use. Use of standards, specifications and formats can provide significant benefits to agencies and stakeholders while helping to implement open data priorities. See the ODI website for some useful examples of open formats.

Best Practices for Data Standards

System owners and data owners should, wherever possible, consider relevant international and Australian standards for data elements. Standards bodies dealing with data include:

International Standards

Australian Government Standards that have been adopted for use:

  • AGLS (National Archives)
  • ANZLIC Spatial Profile
  • SDMX (ABS)
  • Tax
  • AIXM (Aeronautical Information Exchange Model standard)

This section will be updated as guidance evolves.

Common Core Metadata

This section contains guidance to support the use of the common core metadata to list agency datasets and application programming interfaces (APIs) as hosted at data.gov.au and the geospatial data held on FIND.ga.gov.au.

Metadata is information about data. It describes the content, format, quality, currency and availability of data in a structured, machine-readable way. Establishing a common vocabulary for the metadata allows data to be managed in a well-defined, structured way. As we evolve our infrastructure, detailed metadata will not only allow more people to find your data, it will also allow them to make the most use of it.

“Common Core” Required Fields for data.gov.au

The core discovery metadata for data.gov.au is maintained at http://data.gov.au/dataset/data-gov-au-metadata-and-other-schemas. It was recently updated, with extensive stakeholder and expert engagement, to align with the international best-practice DCAT vocabulary while mapping to important local standards including AGLS, ANZLIC, ISO 19115 and AGIFT.
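To illustrate what a discovery metadata record might look like, here is a minimal sketch in Python. The field names loosely follow DCAT property names (title, description, keyword, publisher, license, issued, distribution, downloadURL), but the set of required fields, the agency name and the URL are assumptions made for this example. The authoritative field list is the schema published at the address above.

```python
import json

# Assumed minimum set of fields for this sketch only; consult the published
# data.gov.au schema for the real required fields.
REQUIRED_FIELDS = ("title", "description", "publisher", "license", "distribution")


def missing_fields(record):
    """Return the assumed-required fields that are absent or empty."""
    return [field for field in REQUIRED_FIELDS if not record.get(field)]


# Hypothetical record; every value here is a placeholder.
record = {
    "title": "Example service locations",
    "description": "Locations of example service centres.",
    "keyword": ["services", "locations"],
    "publisher": "Example Agency",
    "license": "CC BY 3.0 AU",
    "issued": "2014-01-01",
    "distribution": [
        {"format": "CSV", "downloadURL": "https://example.org/locations.csv"}
    ],
}

record_json = json.dumps(record, indent=2)
```

Checking a record with `missing_fields` before publishing is one simple way to catch incomplete metadata early.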

Common Core Metadata Schema for FIND.ga.gov.au

Creating and maintaining quality metadata is a significant organisational commitment; however, it should not be seen as a major burden on resources or business processes. Organisations that conform to the ANZLIC Metadata Profile should find that the creation and maintenance of metadata becomes an integral and seamless component of their business processes.

The ANZLIC Metadata Profile will facilitate efficient access to descriptions of information resources, and in particular geographic (or spatial) data. Adoption of, and compliance with, the ANZLIC Metadata Profile will ensure a consistent approach to spatial information resources throughout Australia and New Zealand. This will help people and applications to locate resources without detailed knowledge of the data or resources being sought or an understanding of complex jurisdictional or organisational structures.

The use of standardised descriptions will enable online search engines to process queries more efficiently. This helps to ensure that people and applications conducting searches are presented with relevant and meaningful results. Custodians of geospatial data assets will benefit as their information resources become discoverable, at negligible cost, by a much wider range of potential users than could ordinarily be reached through traditional marketing and distribution channels.

Please download the FIND metadata profile, noting that elements are marked M: Mandatory, C: Conditional (required if you select another element) or O: Optional in the standard but recommended as good practice.
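The Mandatory, Conditional and Optional codes can be checked mechanically. The sketch below is one possible approach: the element names and the dependency between them are hypothetical and are not taken from the FIND profile itself.

```python
# Hypothetical profile: element name -> (obligation code, element that
# triggers the requirement when the obligation is Conditional).
PROFILE = {
    "title": ("M", None),
    "abstract": ("M", None),
    "spatial_representation_type": ("O", None),
    "spatial_resolution": ("C", "spatial_representation_type"),
}


def check_record(record, profile=PROFILE):
    """Return (errors, warnings) as lists of element names.

    Missing Mandatory elements are errors. A missing Conditional element is
    an error only when the element it depends on is present. Missing Optional
    elements are reported as warnings, since they are recommended.
    """
    errors, warnings = [], []
    for element, (obligation, depends_on) in profile.items():
        present = bool(record.get(element))
        if obligation == "M" and not present:
            errors.append(element)
        elif obligation == "C" and not present and record.get(depends_on):
            errors.append(element)
        elif obligation == "O" and not present:
            warnings.append(element)
    return errors, warnings


errors, warnings = check_record({"title": "Example", "abstract": "Example text"})
```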

See the ANZLIC profile document and guidance or contact spatial@communications.gov.au for more information or assistance on best practice implementation.

Types of data

Below is a diagram that nicely articulates the differences and commonalities between open, big and government data. Published with permission from author Joel Gurin:

[Diagram: open, big and government data, by Joel Gurin (Open-big-government-data-diagram-joelgurin.jpg)]

Tabular Data

Much data held by agencies is tabular in nature, that is, held in tables or spreadsheets. When machine-readable tabular data is published on data.gov.au, an API that developers can use is generated automatically.
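As a rough sketch of how a developer might call such an API, the code below builds a query URL. It assumes the generated API follows the CKAN DataStore `datastore_search` convention (`resource_id`, `limit` and `q` parameters). The endpoint address and the resource ID are placeholders, so check the API details shown on the dataset page before relying on them.

```python
from urllib.parse import urlencode

# Assumption: a CKAN-style DataStore endpoint. The real address may differ.
DEFAULT_ENDPOINT = "https://data.gov.au/api/3/action/datastore_search"


def build_search_url(resource_id, limit=5, query=None, endpoint=DEFAULT_ENDPOINT):
    """Build a datastore_search URL for one tabular resource."""
    params = {"resource_id": resource_id, "limit": limit}
    if query:
        params["q"] = query  # full-text search across the rows
    return f"{endpoint}?{urlencode(params)}"


# Placeholder resource ID; a real one is shown on each dataset's page.
url = build_search_url(
    "00000000-0000-0000-0000-000000000000", limit=3, query="Canberra"
)
```

Fetching `url` with any HTTP client would, under the same assumptions, return the matching rows as JSON.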

Spatial Data

The Department of Communications is the Commonwealth Government policy authority for spatial data and you can find information about spatial policy and practice at http://spatial.gov.au/. We will add more content here in the near future, but please see the work on the ANZ Foundation Spatial Data Framework (FSDF) for major spatial data work happening across the Commonwealth.

FIND is the national spatial directory, listing government, research and private sector spatial datasets. Spatial data hosted on data.gov.au automatically generates appropriate spatial web services (WFS, WMS, etc.) and automatically shows up on FIND.

Agencies can also find useful guidance on geocoding on the Statistical Spatial Framework Home Page.

Big Data

The Department of Finance is the Commonwealth Government policy authority for big data. Below are a few useful big data resources:

Aggregated Data

Aggregated and other types of processed data do not tend to have the same privacy issues as unit record level data. Unfortunately, agencies aggregate data in myriad ways, so the results are not always comparable. Agencies are encouraged to publish aggregate data using the most appropriately granular level of the Australian Statistical Geography Standard (ASGS), such as Mesh Blocks or SA1s, as this makes aggregate data comparable over time and across agencies. If agencies use postcodes or electorates to publish data, it is difficult to compare the data over time because these administrative boundaries change regularly.
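A minimal sketch of this kind of aggregation is shown below, assuming each unit record already carries an ASGS region code. The records, the field names and the region codes are invented placeholders.

```python
from collections import Counter

# Hypothetical unit records; the sa1_code values are placeholders, not real codes.
UNIT_RECORDS = [
    {"record_id": 1, "sa1_code": "10000000001"},
    {"record_id": 2, "sa1_code": "10000000001"},
    {"record_id": 3, "sa1_code": "10000000002"},
    {"record_id": 4, "sa1_code": "10000000001"},
]


def count_by_region(records, region_field="sa1_code"):
    """Count records per region code, returned in sorted code order."""
    counts = Counter(record[region_field] for record in records)
    return dict(sorted(counts.items()))


aggregate = count_by_region(UNIT_RECORDS)
```

Only the region code and the count are published, not the individual records. Very small counts can still be disclosive, which is the kind of issue the confidentiality guidance referred to in this section addresses.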

Agencies are encouraged to review the Confidentiality Information Series and the Protecting Privacy for Geospatially Enabled Statistics guidance (Guidance_Geographic Differencing_1.pdf) provided by the National Statistical Service for good general information and further links to more detailed information relating to both aggregate and unit record data, including specific issues around data for regions.

Further guidance on this will be made available in late 2014.

Unit Record and Integrated Data

Unit record data is generally the most granular level of data an agency has to deal with, as it usually means the individual records in a dataset. It is often very difficult to make this data publicly available in its raw state, even with unique identifiers removed, because individuals can be re-identified if care is not taken. There are a few ways agencies can go about dealing with such data:

  • With anonymisation-on-the-fly APIs, as is the case with ABS data through ABS.Stat. Agencies are encouraged to speak to the ABS about this approach to share expertise.
  • Agencies can publish the data in aggregate (see above). Agencies are encouraged to publish aggregate data according to ASGS Mesh Blocks or SA1s.
  • Sometimes unit record data can be sufficiently de-identified to publish in bulk, but this depends strongly on the data. Agencies are encouraged to seek advice from Finance, Communications or the Privacy Commissioner.

Government Entities are encouraged to use the Guide for Data Integration Projects Involving Commonwealth Data for Statistical and Research Purposes for integrated data purposes. It includes principles, guidance, how to become an "Integrating Authority" and appropriate approaches to data integration projects. The Guide has the stated goal of "Creating a safe and effective environment for data integration: an Australian Government approach to facilitate linkage of social, economic and environmental data for statistical and research purposes".

The National Statistical Service has also developed resources on data linking.

It is also worth noting that it is generally researchers (both internal and external) who need access to such data in this format. Researchers generally already have the ethical, legal, technical and other appropriate frameworks and mechanisms to access this data, though there are significant challenges for researchers in accessing and analysing data across government(s). Lowering the granularity of data reduces the privacy issues but also reduces the value to researchers. There is currently some work underway to identify issues and improve researcher access to data.

A useful case study for enabling research access is the ABS Remote Access Data Laboratory (RADL).

Unit record data that is published in raw form can be reidentified given enough data points for an individual record. For example:
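The following minimal sketch uses invented records that contain no names or identifiers. It counts how many records are the only one with a given combination of attribute values. The field names and values are hypothetical.

```python
from collections import Counter

# Invented records with direct identifiers already removed.
RECORDS = [
    {"postcode": "2600", "birth_year": 1975},
    {"postcode": "2600", "birth_year": 1982},
    {"postcode": "2913", "birth_year": 1975},
    {"postcode": "2913", "birth_year": 1982},
]


def unique_combinations(records, fields):
    """Return the value combinations that occur in exactly one record."""
    keys = [tuple(record[field] for field in fields) for record in records]
    counts = Counter(keys)
    return [key for key in counts if counts[key] == 1]


unique_on_postcode = unique_combinations(RECORDS, ["postcode"])
unique_on_both = unique_combinations(RECORDS, ["postcode", "birth_year"])
```

In this toy data no record is unique on postcode alone, but every record is unique once postcode and birth year are combined. Anyone who knows those two facts about a person could pick out their record.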

The Business Case for Open Data

The three core benefit and value areas of opening government data are generally seen as:

  1. social and economic public benefits and opportunities;
  2. more efficient and effective government; and
  3. transparency and accountability.

These aspects are given varying priority in different government jurisdictions around the world and even within Australia. In Australia the focus of open data as a public good is well understood but difficult to sustain in an environment of shrinking resources and funding. The application of data for improving the public service, however, is becoming better understood. Articulating the benefits to the public service of opening up data is a vital part of encouraging agencies to develop sustainable data publishing strategies. Benefits include productivity gains, cutting red tape through more efficient practices and data collection and management, more effective online and mobile service delivery, improved policy development, cost savings through common data access across the whole of government, and improved application of data for better spending decisions and whole-of-government strategic development.

The Australian Government holds a large amount of data but does not currently use it to best effect. Some agencies collect large amounts of data in the natural course of their operations, and tend to focus more on collection than on analysis or broader utility. Some hold ‘single sources of truth’ fundamental to the operation of government. Others have developed a number of data initiatives in isolation which have provided key insights to a small number of people. Agencies’ efforts are rarely connected, sometimes duplicative, and of variable quality with inconsistent standards. The value of such initiatives to the whole of government is rarely articulated or realised.

Better use of available data can identify problems, improve decisions, strengthen policy-making and support more efficient, effective and sustainable programs. Exposing quality data to business and the community can assist accountability, transparency and innovation.

Given the wide variety of data sources, the differing levels of capability and capacity within government, the constraints currently limiting agencies in their use of data, and the lack of a mandate or imperative on agencies to publish, the challenge seems considerable. But the Government cannot afford not to improve its use of data. Finance’s own experience running one of the key single sources of truth (the Budget forecast and actuals), its collection and management of data for a number of whole-of-government programs it administers, and its carriage of data.gov.au demonstrate that where data is analysed from a whole-of-government perspective, it improves knowledge of products and services, industry, decision-making and outcomes, and the overall productivity of government.

The eGovernment and Digital Economy Policy pre-election commitment clearly identifies open data as a key building block to improve government services and policy, facilitate private and public sector innovation, and stimulate economic development.

The data.gov.au team have a short presentation that might be useful, called The Shift to {Open|Big|Linked} Data.