Publishing your data

Click here to go straight to information on adding data to data.gov.au

Publishing open data

A fundamental aspect of open data is that it is available in formats and under licensing that allow others to re-use and remix it. This section guides you through the topics below.

Process to opening a dataset

While each organisation’s approach to open data will vary, the first step is to determine the classification of the data. If it is unclassified then it is appropriate for public access. If it is classified, then you need to consider secure and non-open approaches to sharing the data. For a guide to improving your organisation’s open data capability, see the open data self-assessment.

Steps

  1. Choose data sets for release in line with your organisation’s approach and your users’ needs.
  2. Clarify who will be responsible for preparing, releasing and updating the data.
  3. Apply an open licence – the APS default is Creative Commons Attribution.
  4. Make the data available.
  5. Make the data discoverable on or through data.gov.au.

Sourcing data

These possible sources of data that your organisation could open should be considered as part of a larger process to develop and maintain an open data strategy.

Data from new projects

When commissioning research, collecting data or establishing a new ICT system, adopt information management and procurement practices that ensure you have access to associated raw data in an open format and the right to publish that data online under an open licence. See Licensing your data for more information. This is important if a service provider is contracted to collect the data or to develop a website or mobile app built on an agency data source. Effective information governance (PSI Principle 3) will help you ensure that your agency has access to the raw data that was used to create existing publications or apps.

You should also ensure that other people in your organisation are aware that these governance processes and practices exist and are followed.

Releasing unpublished data

Consider whether your agency has unpublished data that could be released as open data. This may come from public reports, studies and newsletters that have only included processed data with select results from internal analysis.

Internal data such as project locations, demographic research and administrative data should also be considered for release.

Agencies wishing to convert or release previously unpublished data should first consider legislative and policy requirements that may prevent publication or require modification of the data before release.

In particular, you should consider obligations under the Australian Privacy Principles (APPs) in the Privacy Act 1988. Guidance about the Privacy Act is available in the OAIC's APP Guidelines. In addition, the OAIC's Information Policy Agency Resource 1 — De-identification of data and information discusses de-identification as a technique that allows agencies to balance privacy and transparency objectives when publishing open data.

Creating datasets

Creating a basic open data set

Creating a dataset can be a quick and easy process. At its most basic, a data set is simply a structured presentation of data, such as a spreadsheet, with some special features. These features can be designed as part of the data set from the beginning, or changed before publishing.

An open data set must be:

Saved in an open format

Any type of data can be shared in an open format, but sometimes this means transforming the data from its original format to a different one. The benefit to agencies of publishing data in an open format is that it is easier for someone else, such as another government agency or a company, to reuse the data. The benefits of open data often come from the ability to analyse and remix data alongside other data sets.

The table below rates common file types for their accessibility to users with a range of computing systems and access requirements.

For data.gov.au, it should be noted that users can publish any data file type, and that Finance encourages organisations to publish the most machine readable and open format. Data.gov.au automatically generates full API access to tabular and spatial datasets uploaded to data.gov.au (through CKAN and Geoserver) and is investigating similar support for other data types. Agencies should contact the data.gov.au team if they are considering publishing relational databases, realtime data or other data types.

If you are creating data for analysis or machine processing, it is important to note that spatial files, CSV and XLS are the only formats that automatically generate visualisations or API access for your data set on data.gov.au. CSV/XLS files will need to be structured according to the advice on the creating data sets page. Data.gov.au provides mapping services for some geospatial data types including KML, and will advise on additional formats as they are supported.
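
As a rough illustration of the API access described above, the sketch below builds a request for CKAN's standard datastore_search action and pulls the rows out of a decoded response. The base URL and the resource ID used here are placeholders (assumptions), not values from this guide; check the data.gov.au API documentation for the actual endpoint.

```python
from urllib.parse import urlencode

# Assumption: placeholder base URL for a CKAN site; not taken from this guide.
CKAN_BASE_URL = "https://example.org"


def datastore_search_url(resource_id, limit=5, base_url=CKAN_BASE_URL):
    """Build a URL for CKAN's datastore_search action for one tabular resource."""
    query = urlencode({"resource_id": resource_id, "limit": limit})
    return f"{base_url}/api/3/action/datastore_search?{query}"


def extract_records(response_json):
    """Return the rows from a decoded datastore_search response, or [] on failure."""
    if not response_json.get("success"):
        return []
    return response_json.get("result", {}).get("records", [])
```

The URL can be fetched with any HTTP client. The response is JSON, so the rows can be passed straight to analysis or visualisation tools.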

Data formats

Tabular
File type      Openness  Notes
CSV            High      The best format for opening structured data (e.g. as spreadsheets)
XLS or XLSX    Low       Limits machine reading and use on non-Microsoft systems

Spatial
File type      Openness  Notes
KML            High      An open standard developed for Google Earth. May not translate to other systems. KMZ is also available as a packaged set of KML files.
WMS            High      Standardised format for georeferenced map images
WFS            High      Standardised format for geographical features

Text
File type      Openness  Notes
TXT            High      Simple text format readable on most operating systems. No formatting is available
RTF            High      Simple text format readable on most operating systems which retains some formatting
ODT            Medium    Limits machine reading
DOC or DOCX    Low       Limits machine reading and use on non-Microsoft systems
PDF            Low       Useful for document exchange to preserve formatting, but has limitations for machine reading, character recognition and remixing.

Formatted properly for tabular data

Any tabular data should be published in a CSV file as well as being included as a report. This allows users to analyse the data without having to convert it to an appropriate format. This is especially important for reports in formats such as PDF which restrict access to data and limit the ability for people to share and remix. PDFs should be made accessible or converted to an alternative format whenever possible. Tabular data for publishing should be both:

  • raw – presented in the simplest possible format with a single header row – and
  • clean – using uniform data formatting (e.g. numerical dates, postcodes in every field) with no missing entries, no embedded non-text information and as few mistakes as possible.

Obtaining raw, clean data can be a challenge if you’re converting an existing file into a file for uploading as part of a data set. It’s particularly important to look out for elements like merged cells and formulas which can prevent the data from being read.

The examples below show how clean data can be easily compared and combined by a computer, whereas the non-clean data would confuse the system. For example, Fem, Female and F could be processed as separate genders, and the ‘Copyright of Dept. X’ line could cause an error in automatic processing of the data.

Examples of raw, clean data

Date        Age  Gender  Postcode
20/10/2013  12   M       2580
10/01/2013  -    F       1462
02/11/2011  22   M       -
12/05/2012  45   F       1464
19/01/2010  75   F       1800

Example of data that is not raw or clean

Copyright of Dept. X
Date         Age      Gender  Postcode
01/20/2013   Fifteen  Female  Barton
10th Dec 11  15       Fem     -
02/11/2011   xx       Male    3652
12/05/2012   45*      F       1464
* Footnote information
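
A check like the one in the examples above can be automated before upload. The following is a minimal sketch, not an official validator: it assumes the four column names from the examples, DD/MM/YYYY dates, M/F gender codes, four-digit postcodes, and "-" as the marker for an unknown value. All of these rules are illustrative assumptions and should be adapted to your own data.

```python
import csv
import io
from datetime import datetime

# Assumption: "-" marks an unknown value, as in the clean example above.
MISSING = "-"


def is_clean(column, value):
    """Check one cell against the uniform format assumed for its column."""
    if not isinstance(value, str):
        return False  # short or over-long rows produce non-string cells
    if column == "Date":
        try:
            datetime.strptime(value, "%d/%m/%Y")
        except ValueError:
            return False
        return True
    if column == "Age":
        return value.isdigit()
    if column == "Gender":
        return value in {"M", "F"}
    if column == "Postcode":
        return len(value) == 4 and value.isdigit()
    return True


def find_problems(csv_text):
    """Return (line_number, column, value) for every cell that is not clean."""
    problems = []
    reader = csv.DictReader(io.StringIO(csv_text))
    for line_number, row in enumerate(reader, start=2):  # line 1 is the header
        for column, value in row.items():
            if value == MISSING:
                continue
            if not is_clean(column, value):
                problems.append((line_number, column, value))
    return problems
```

Run against the non-clean example, a check of this kind would flag the American-style date, the spelled-out age, the inconsistent gender labels and the suburb name in the postcode column.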

Accompanied by supportive/contextual documentation

Supportive documentation, caveats and contextual information should be included in descriptive information about the data set. If the information is extensive, it may also be possible to upload it as an additional resource to the dataset. Please do not put the data into the documentation itself, as it will restrict access to the data. This means the data will become less accessible to users, and will not be able to be picked up by APIs, data visualisation tools or other machine-to-machine processes.

Formatted to be useful

Data should be published with consideration for how it will be most useful. For example, column labels with internal codes like ‘DBQ-12-W’ will be a lot less useful than human-readable labels like ‘Drop Bear Queries 2012 Western Site’.

This also applies to how data is grouped for publication. For example, a single data set on ‘procurement contract data’ with individual files for each year will make it easier for users to locate related data than separate data sets for each year. It will also be easier for data custodians to manage and maintain.
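
Relabelling columns before publication can be scripted. The sketch below is one possible approach, assuming a CSV file with a single header row; the mapping is hypothetical and reuses the label from the example in this section.

```python
import csv
import io

# Hypothetical mapping from internal codes to human-readable labels.
LABELS = {"DBQ-12-W": "Drop Bear Queries 2012 Western Site"}


def relabel_header(csv_text, labels=LABELS):
    """Return the CSV text with known header codes replaced by readable labels."""
    rows = list(csv.reader(io.StringIO(csv_text)))
    if not rows:
        return csv_text
    # Only the first row is treated as the header; unknown labels are kept.
    rows[0] = [labels.get(cell, cell) for cell in rows[0]]
    out = io.StringIO()
    csv.writer(out, lineterminator="\n").writerows(rows)
    return out.getvalue()
```

Keeping the mapping in one place also gives data custodians a simple record of what each internal code means.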

Extra credit

As noted in the Australian Government Web Guide, once the data is ready to publish in an open format, the agency should:

  • prepare appropriate metadata to accompany the dataset to ensure the data is discoverable and meaningful to the public. See intro to metadata for more information.
  • publish the data in an appropriate place, such as the agency website, data.gov.au, or an existing domain-specific collection or catalogue repository. See the where to publish page and the section on using data.gov.au.

You should also consider how to refine your approach so that the data (or subsequent releases of equivalent data) remains relevant and useful in future. This could include engaging with stakeholders; assessing how the data was reused; and considering whether the data should be presented in a different format or made accessible in different ways, such as through an application programming interface or API that allows programmers to easily reuse the data.

Where to publish

There are a range of data publishing options available to government organisations based on the jurisdiction and type of data. See below for national data publishing options, or visit the data portals section for more information on state, territory and local data sites, as well as other resources like state globes.

Federal

  • Data.gov.au: The single point of discovery for Federal open data. See What is data.gov.au for more information.
  • National Map: A spatial visualisation tool for government open data. Users can’t publish information directly to the site, but it draws data from data.gov.au and directly from agencies when relevant. See the National Map section for more information.
  • FIND: A catalogue of spatial data or services from governments, the private sector and research and education organisations. Published through negotiation with Geoscience Australia. See the FIND section for more information.

Intro to metadata

What is metadata?

Metadata is information about data. It describes the content, format, quality, currency and availability of data in a consistent and meaningful way.

Metadata is useful for cataloguing single documents, but is most important for managing a large body of data. This is particularly important for open government data, as people who work with the data may be combining data sets from a wide range of sources, both inside and outside of government.

Establishing a common vocabulary for metadata – using standards – makes it possible for users to find and remix data in a clear and structured way. As open data infrastructure evolves, detailed metadata will not only allow more people to find your data, it will also allow them to re-use the data in more meaningful ways.

How is it used?

There are a range of metadata standards that are used based on the data that is being described and where it is available. Each metadata standard contains elements, or fields, that describe the data. A common example of a metadata element is the ‘Title’, which contains the name of the dataset.

The data.gov.au metadata section has information on the simple form on data.gov.au that is used to make data discoverable through the site. For people who need technical information, the metadata requirements for specific data types such as spatial data are also described there.

Agencies considering the establishment of their own data catalogues should consider the data.gov.au metadata profile, based on DCAT. Spatial data catalogues should use the ANZLIC metadata schema (which also maps to the data.gov.au DCAT schema).
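
To give a sense of what a DCAT-based record looks like, the sketch below builds a minimal dataset description as JSON-LD using terms from the W3C DCAT and Dublin Core vocabularies. It is illustrative only: the function name and the choice of fields are assumptions, and the data.gov.au metadata profile may require additional or different elements.

```python
import json


def dcat_dataset(title, description, keywords, licence_url, download_url, file_format):
    """Build a minimal DCAT dataset description as a JSON-LD dictionary."""
    return {
        "@context": {
            "dcat": "http://www.w3.org/ns/dcat#",
            "dct": "http://purl.org/dc/terms/",
        },
        "@type": "dcat:Dataset",
        "dct:title": title,
        "dct:description": description,
        "dcat:keyword": list(keywords),
        "dcat:distribution": [
            {
                "@type": "dcat:Distribution",
                # URLs are wrapped in "@id" so they are read as links, not text.
                "dcat:downloadURL": {"@id": download_url},
                "dct:license": {"@id": licence_url},
                "dct:format": file_format,
            }
        ],
    }


def to_json(record):
    """Serialise a record for publishing or exchange."""
    return json.dumps(record, indent=2)
```

For example, a record for a hypothetical ‘Procurement contract data’ data set would pass the CSV file's download address and the address of the CC BY 3.0 AU licence on the Creative Commons website.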

FIND, the Australian Government spatial data catalogue, works with data.gov.au to provide access to a network of government data. The FIND metadata profile includes information on how to structure spatial metadata (XLS) so that your data can be made interoperable and accessible, as well as discoverable through FIND.

Licensing your data

Licences provide a clear and standardised guide for other people about how they can use your data, including the option to reuse, remix and share the content.

The Statement of IP Principles for Australian Government Agencies requires agencies to encourage public use and easy access to published material. This includes permission for public use and re-use of material without requiring royalties and on a non-exclusive basis.

The default licence for the Australian Government is the Creative Commons Attribution 3.0 Australia (CC BY 3.0 AU) for publishing data and information, unless a clear case is made for another open licence. You can access the licence text on the Creative Commons website in either plain English or legal code.

For support and guidance with licensing, see the Australian Governments Open Access and Licensing Framework (AusGOAL) website, which gives more detail on the open licences that can be used by government, risk management for implementing open licences and information on licensing for special cases such as software.

Privacy and security

It is important to ensure you approach data publishing with privacy and security principles in mind. For more information about privacy and security considerations, please refer to the Principles on Open Public Sector Information from the Office of the Australian Information Commissioner.

Standards

Standards are an important aspect of open data, as they ensure data is accessible and interoperable. Please see the section on Data formats for information about open data standards. Below are some additional standards agencies should consider.

Implementing standards

There are a wide range of standards that may be relevant to data projects, relating to issues like spatial information, metadata and addressing. The spatial and metadata pages of this toolkit have more information on how to use their required standards.

Standards management

System owners and data owners should, wherever possible, consider relevant international and Australian standards for their data.

Standards bodies dealing with data include:

Australian Government Standards that have been adopted for use include:

  • AGLS (National Archives standard for making online information and services visible, manageable and interoperable)
  • ANZLIC Spatial Metadata Profile
  • SDMX (Statistical Data and Metadata Exchange standard managed by the Australian Bureau of Statistics)
  • AIXM (Aeronautical Information Exchange Model standard)
  • AS/NZS4819 Rural and Urban Addressing
  • AS4590 Interchange of client information

This section will be updated as guidance evolves.

Intro to spatial data

Spatial data is any data that refers to places in the physical world. This can include:

  • Geographic features like mountains and lakes
  • Man-made objects like houses and roads
  • Non-physical objects or information about the location like electoral boundaries or internet quality

Spatial data sets are fundamentally the same as all other data sets; they simply contain fields for spatial information such as latitude and longitude as well as their other information. This means all the guides to making data open still apply.

The Statistical Spatial Framework

The ABS's Statistical Spatial Framework provides a principles-based framework for spatially enabling socio-economic datasets, including administrative datasets, to ensure consistency and comparability. Implementation of the framework is supported by guidance material and resources, and references existing standards and infrastructure. Key guidance materials include:

FIND

FIND is the Australian Government spatial data catalogue, and in conjunction with data.gov.au, provides access to a network of government data. It enables users to search a wide range of spatially-referenced datasets, many of which are available for free download. FIND queries ‘nodes’ from Australian state, territory and federal governments, research institutions and other authoritative organisations to provide a centralised search for Australian spatial data information.

A key objective of FIND is to improve access to and availability of nationally consistent spatial datasets, as part of the Australian Government’s declaration of Open Government. Technology used for FIND is based on the open source GeoNetwork platform. The platform has been endorsed by a number of international organisations including various United Nations Organisations and the World Bank as world best practice for the collection, management and discovery of spatial metadata. The standards underlying FIND were developed by the Open Geospatial Consortium.

FIND is developed and maintained by Geoscience Australia on behalf of the Department of Communications and ANZLIC - the Spatial Information Council.

Becoming a node of FIND

For a metadata record to be eligible to be added to FIND it must be:

  1. metadata for spatial data or a spatial service; and
  2. produced by government or an authoritative body such as an educational or research institution.

Enquiries can be directed to spatial@communications.gov.au.

Figure: Diagram of the qualifiers for an organisation to become a FIND node.

National Map

What is the National Map

The National Map is a website for map-based access to government spatial data. It is designed to:

  • Provide easy access to data for government, business and the public
  • Integrate datasets into a ‘front end map’ for data.gov.au
  • Provide an open framework of geospatial data services that supports commercial and community innovation
  • Provide agencies with an easy map to embed on their own websites

The National Map is an initiative of the Australian Commonwealth Government's Department of Communications. The software was developed by NICTA working closely with the Department and Geoscience Australia.

How do you use the data?

To view a data set on the National Map, go to http://nationalmap.gov.au/, select Data then National Data Sets. This will display a list of available data topics.

Other tips:

  • To zoom to a data set’s area on the map, click on the name of the data set.
  • For detail about specific information captured in a data set, click on the specific point, line or area on the map.
  • For detail about the entire data set, click ‘info’ next to the name of a data set for more information, conditions of use and a link to download the data.

National Map is best used with a browser with WebGL support such as the latest versions of Google Chrome, Mozilla Firefox and Internet Explorer 11. It will work with limited functionality in older browsers such as Internet Explorer 9 and Internet Explorer 10.

Possible uses for the National Map include:

  • Finding data sets and services (for any data set or service visible in the National Map, click "info" to view how to access the data set or service directly).
  • Setting up a Web Map Service (WMS) or Web Feature Service (WFS) and loading the URL for that service into National Map. For example, you can use the open source software Geoserver to do this, or use one of many commercial GIS systems such as ESRI ArcGIS Server, Pitney Bowes' Mapinfo or Google Maps Engine and enable WMS and/or WFS services from it.
  • Building a website that uses the value-add service API. Email nationalmap@lists.nicta.com.au to find out how.
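
Before loading a WMS URL into National Map, it can help to confirm that the service responds to a standard GetCapabilities request. The sketch below only builds that request URL from the parameters defined in the OGC WMS standard; the endpoint shown in the test is hypothetical, and the version number is an assumption that should match what your server supports.

```python
from urllib.parse import urlencode


def wms_capabilities_url(service_url, version="1.3.0"):
    """Build a WMS GetCapabilities request URL for a service endpoint."""
    query = urlencode(
        {"service": "WMS", "version": version, "request": "GetCapabilities"}
    )
    # Append to an existing query string if the endpoint already has one.
    separator = "&" if "?" in service_url else "?"
    return f"{service_url}{separator}{query}"
```

Opening the resulting URL in a browser should return an XML document listing the layers the service offers.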

How do you add data?

Government data can be added to the National Map by uploading it to data.gov.au in a common spatial data format. The main data formats supported by the National Map are GeoJSON, KML, KMZ and CSV (with latitude and longitude columns). The National Map routinely harvests spatial services from data.gov.au and FIND. It takes between 24 and 48 hours from when a spatial data set is uploaded to data.gov.au for it to appear on the National Map.
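
If your data is a CSV file with latitude and longitude columns and you would rather publish GeoJSON, the conversion is straightforward. The sketch below is a minimal example, assuming point data and columns named "latitude" and "longitude"; both column names are assumptions and can be overridden.

```python
import csv
import io


def csv_to_geojson(csv_text, lat_column="latitude", lon_column="longitude"):
    """Convert CSV rows with latitude/longitude columns to a GeoJSON FeatureCollection."""
    features = []
    for row in csv.DictReader(io.StringIO(csv_text)):
        # Remove the coordinate columns so only attributes remain as properties.
        lat = float(row.pop(lat_column))
        lon = float(row.pop(lon_column))
        features.append(
            {
                "type": "Feature",
                # GeoJSON positions are ordered longitude, then latitude.
                "geometry": {"type": "Point", "coordinates": [lon, lat]},
                "properties": dict(row),
            }
        )
    return {"type": "FeatureCollection", "features": features}
```

The result can be written to a file with json.dump and uploaded like any other spatial resource. Note the coordinate order, which is a common source of points appearing in the wrong place.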

If spatial data uploaded to data.gov.au is not showing on National Map, start by checking that the data is in one of the formats listed above. If the issue is not resolved, email spatial@communications.gov.au.

A data set can be added to National Map for a single session by dragging it on to the map or clicking on ‘Add Data’ under the Data tab. This is particularly relevant when working with personal, private or temporary data that cannot be uploaded to data.gov.au. Data added to a map in this way will not be saved on the National Map and cannot be shared using the share button on the site.

For large data sets that are better streamed than uploaded, email nationalmap@lists.nicta.com.au to discuss options for making data available in more detail.

How does the National Map work?

The National Map is a fully open architecture that provides a direct link between the user and the government department or agency that is the custodian of the data. For example, if you access data relating to "broadband availability and quality", you are accessing that directly from the Department of Communications; when you access data relating to surface geology, it is accessed directly from Geoscience Australia. The National Map itself does not store any data - it provides a map-based view to data that is stored by a growing number of government bodies.

Open source software

The National Map was created with the following open source software. The developers contribute back to the software projects as appropriate.

See the NICTA Github page for more information about the National Map.

Data portals

See below for an incomplete list of open data portals and resources in Australia and New Zealand. Please send any suggestions for additions to the list to spatial@communications.gov.au.

Federal

  • The Federal Open Data Toolkit including information about all data policies and guidance for the Federal Government.
  • The data.gov.au portal for Federal data. Also includes metadata sharing from other Australian governments.
  • National Map provides a mapping service auto-generated from data.gov.au, with the capability to add private data sets for visualisation and comparison.
  • The Australian Open Data 500 was a Department of Communications initiative to identify private sector demand for public sector data.
  • FIND is a catalogue of Australian spatial datasets which queries data from Australian federal, state and territory governments as well as research organisations.
  • The data.gov.au team created a mindmap of the government data landscape in Australia. Any updates to this mindmap are welcome.

Australian Capital Territory

New South Wales

Queensland

  • The data.qld.gov.au portal for Queensland government data.
  • Queensland Globe, a mapping and data application for exploring spatial data and spatially referenced Queensland open data.

South Australia

  • The data.sa.gov.au portal for South Australian government data, notable for releasing an enormous amount of new and high value datasets.
  • The South Australian Open Data Toolkit, launched in November 2014 including departmental reporting.
  • The Open Data Declaration launched in September 2013.

Victoria

Western Australia

Tasmania

Local Governments

On data.gov.au

Glenorchy

City of Melbourne

Brisbane City Council

Community

  • GovHack was first run in 2009 to draw together people from government, industry, academia and, of course, the general public to mash up, reuse and remix government data. In 2014, this included over 1300 participants and observers in 11 cities creating more than 170 added-value open data projects.
  • The Random Hacks of Kindness communities in Adelaide, Melbourne and Sydney.

International

New Zealand

United States of America

This page is maintained by the data.gov.au team. Please contact data.gov@finance.gov.au with any questions, comments or congratulations.