Publishing your data
Publishing open data
A fundamental aspect of open data is that it is available in formats and under licensing that allow others to re-use and remix it. This section will guide you through topics like:
- How to create datasets from new or existing data
- What metadata is and how you should add it
- What licences can be used for open data
- Tools for finding published data
Process for opening a dataset
While each organisation’s approach to open data will vary, the first step is to determine the classification of the data. If it is unclassified then it is appropriate for public access. If it is classified, then you need to consider secure and non-open approaches to sharing the data. For a guide to improving your organisation’s open data capability, see the open data self-assessment.
- Choose data sets for release in line with your organisation’s approach and your users’ needs.
- Clarify who will be responsible for preparing, releasing and updating the data.
- Apply an open licence – the APS default is Creative Commons Attribution
- Make the data available
- Make the data discoverable on or through data.gov.au
The sources of data described below should be considered as part of a larger process to develop and maintain an open data strategy.
Data from new projects
When commissioning research, collecting data or establishing a new ICT system, adopt information management and procurement practices that ensure you have access to associated raw data in an open format and the right to publish that data online under an open licence. See Licensing your data for more information. This is important if a service provider is contracted to collect the data or to develop a website or mobile app built on an agency data source. Effective information governance (PSI Principle 3) will help you ensure that your agency has access to the raw data that was used to create existing publications or apps.
You should also ensure that other people in your organisation are aware that these governance processes and practices exist and are followed.
Releasing unpublished data
Consider whether your agency has unpublished data that could be released as open data. This may come from public reports, studies and newsletters that have only included processed data with select results from internal analysis.
Internal data such as project locations, demographic research and administrative data should also be considered for release.
Agencies wishing to convert or release previously unpublished data should first consider legislative and policy requirements that may prevent publication or require modification of the data before release.
In particular, you should consider obligations under the Australian Privacy Principles (APPs) in the Privacy Act 1988. Guidance about the Privacy Act is available in the OAIC's APP Guidelines. In addition, the OAIC's Information Policy Agency Resource 1 — De-identification of data and information discusses de-identification as a technique that allows agencies to balance privacy and transparency objectives when publishing open data.
Creating a basic open data set
Creating a dataset can be a quick and easy process. At its most basic, a data set is simply a structured presentation of data, such as a spreadsheet, with some special features. These features can be designed as part of the data set from the beginning, or changed before publishing.
An open data set must be:
Saved in an open format
Any type of data can be shared in an open format, but sometimes this means transforming the data from the original format to a different format. The benefit to agencies of publishing data in an open format is that it makes it easier for someone else to reuse the data, such as another government agency or company. The benefits of open data often come from the ability to analyse and remix data alongside other data sets.
For data.gov.au, users can publish any data file type, and Finance encourages organisations to publish in the most machine-readable and open format available. Data.gov.au automatically generates full API access to tabular and spatial datasets uploaded to data.gov.au (through CKAN and Geoserver) and is investigating similar support for other data types. Agencies should contact the data.gov.au team if they are considering publishing relational databases, real-time data or other data types.
If you are creating data for analysis or machine processing, note that spatial files, CSV and XLS are the only formats that automatically generate visualisations or API access for your data set on data.gov.au. CSV/XLS files will need to be structured according to the advice on the creating data sets page. Data.gov.au provides mapping services for some geospatial data types including KML, and will advise on additional formats as they are supported.
The table below rates common file types for their accessibility to users with a range of computing systems and access requirements.
|Format||Accessibility||Notes|
|CSV||High||The best format for opening structured data (e.g. as spreadsheets)|
|XLS or XLSX||Low||Limits machine reading and use on non-Microsoft systems|
|KML||High||An open standard developed for Google Earth. May not translate to other systems. KMZ is also available as a packaged set of KML files.|
|WMS||High||Standardised format for georeferenced map images|
|WFS||High||Standardised format for geographical features|
|TXT||High||Simple text format readable on most operating systems. No formatting is available|
|RTF||High||Simple text format readable on most operating systems which retains some formatting|
|ODT||Medium||Limits machine reading|
|DOC or DOCX||Low||Limits machine reading and use on non-Microsoft systems|
|PDF||Low||Useful for document exchange to preserve formatting, but has limitations for machine reading, character recognition and remixing.|
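As a rough illustration of the API access mentioned above for tabular data, the sketch below builds a request URL for CKAN's `datastore_search` action, which CKAN offers for tabular resources held in its DataStore. The base URL and the resource ID are placeholders invented for this example, not real data.gov.au values; check the portal's own API documentation for the current endpoint.

```python
from urllib.parse import urlencode

# Assumed base URL for a CKAN portal's Action API (placeholder, not a real portal).
CKAN_API_BASE = "https://data.example.gov.au/api/3/action"


def datastore_search_url(resource_id, limit=5, query=None):
    """Build a URL for CKAN's datastore_search action."""
    params = {"resource_id": resource_id, "limit": limit}
    if query:
        params["q"] = query  # full-text search across the rows of the resource
    return f"{CKAN_API_BASE}/datastore_search?{urlencode(params)}"


# "hypothetical-resource-id" is a placeholder, not a real resource.
example_url = datastore_search_url("hypothetical-resource-id", limit=3, query="canberra")
print(example_url)
```

Fetching that URL from a real portal would return JSON, which is why publishing tabular data as CSV rather than inside a document makes it usable by other programs.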
Formatted properly for tabular data
Any tabular data should be published in a CSV file as well as being included as a report. This allows users to analyse the data without having to convert it to an appropriate format. This is especially important for reports in formats such as PDF which restrict access to data and limit the ability for people to share and remix. PDFs should be made accessible or converted to an alternative format whenever possible. Tabular data for publishing should be both:
- raw – presented in the simplest possible format with a single header row – and
- clean – using uniform data formatting (e.g. numerical dates, postcodes in every field) with no missing entries, no embedded non-text information, data in every field and as few mistakes as possible.
Obtaining raw, clean data can be a challenge if you’re converting an existing file into a file for uploading as part of a data set. It’s particularly important to look out for elements like merged cells and formulas which can prevent the data from being read.
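One way to spot some of these problems before uploading is a simple automated check. The sketch below is an assumption about how such a check might look, not a data.gov.au tool: it reads CSV text, treats the first row as the single header row, and reports rows with the wrong number of fields or with empty fields. The sample data is made up for illustration.

```python
import csv
import io


def find_problems(csv_text):
    """Return (row_number, problem) pairs for rows that are not clean."""
    reader = csv.reader(io.StringIO(csv_text))
    header = next(reader)  # the single header row
    problems = []
    # Row 1 is the header, so data rows are numbered from 2.
    for row_number, row in enumerate(reader, start=2):
        if len(row) != len(header):
            problems.append((row_number, "wrong number of fields"))
        elif any(cell.strip() == "" for cell in row):
            problems.append((row_number, "empty field"))
    return problems


# Illustrative data only: one clean row, one missing entry, one stray note.
sample = (
    "date,count,gender\n"
    "2011-12-10,15,Female\n"
    "2011-12-11,,Male\n"
    "Copyright of Dept. X\n"
)
print(find_problems(sample))
```

A check like this will not catch every issue (merged cells and formulas are lost or flattened when a spreadsheet is exported to CSV), so a manual review of the exported file is still worthwhile.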
The examples below show how clean data can be easily compared and combined by a computer, whereas the non-clean data would confuse the system. For example, the use of Fem, Female and F could be processed as separate genders, and the ‘Copyright of Dept. X’ could cause an error in automatic processing of the data.
|Copyright of Dept. X|
|10th Dec 11||15||Fem||-|
|* Footnote information|
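To make the point about inconsistent values concrete, here is a minimal sketch of normalising one non-clean row. The column names (`date`, `count`, `gender`) and the mapping of gender labels are assumptions made for this example; the date parsing also assumes English month abbreviations.

```python
import re
from datetime import datetime

# Assumed mapping from the variants seen in the data to one uniform label.
GENDER_LABELS = {
    "f": "Female", "fem": "Female", "female": "Female",
    "m": "Male", "male": "Male",
}


def clean_row(row):
    """Return a copy of the row with a numerical date and a uniform gender label."""
    # Drop ordinal suffixes such as "10th" so the date can be parsed.
    date_text = re.sub(r"(?<=\d)(st|nd|rd|th)\b", "", row["date"].strip())
    iso_date = datetime.strptime(date_text, "%d %b %y").date().isoformat()
    # A KeyError here flags an unrecognised label instead of silently keeping it.
    gender = GENDER_LABELS[row["gender"].strip().lower()]
    return {"date": iso_date, "count": int(row["count"]), "gender": gender}


messy_row = {"date": "10th Dec 11", "count": "15", "gender": "Fem"}
cleaned_row = clean_row(messy_row)
print(cleaned_row)
```

Notes such as copyright statements and footnotes are better moved into the dataset's descriptive information than kept as rows in the data file.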
Accompanied by supportive/contextual documentation
Supportive documentation, caveats and contextual information should be included in descriptive information about the data set. If the information is extensive, it may also be possible to upload it as an additional resource to the dataset. Please do not put the data into the documentation itself, as it will restrict access to the data. This means the data will become less accessible to users, and will not be able to be picked up by APIs, data visualisation tools or other machine-to-machine processes.
Formatted to be useful
Data should be published with consideration for how it will be most useful. For example, column labels with internal codes like ‘DBQ-12-W’ will be a lot less useful than human-readable labels like ‘Drop Bear Queries 2012 Western Site’.
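If the source system can only export coded labels, the header row can be relabelled before publishing. The following sketch assumes a hand-maintained mapping from internal codes to readable labels; the mapping and the sample rows are illustrative only.

```python
import csv
import io

# Assumed mapping from internal codes to human-readable labels.
READABLE_LABELS = {"DBQ-12-W": "Drop Bear Queries 2012 Western Site"}


def relabel_header(csv_text, labels):
    """Replace coded column labels in the header row; leave data rows unchanged."""
    rows = list(csv.reader(io.StringIO(csv_text)))
    rows[0] = [labels.get(name, name) for name in rows[0]]
    output = io.StringIO()
    csv.writer(output, lineterminator="\n").writerows(rows)
    return output.getvalue()


coded = "Month,DBQ-12-W\nJanuary,4\n"
relabelled = relabel_header(coded, READABLE_LABELS)
print(relabelled)
```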
This is also a consideration when publishing data. For example, a data set on ‘procurement contract data’ with individual files for each year will make it easier for users to locate related data than individual data sets for each year. It will also be easier for data custodians to manage and maintain.
As noted in the Australian Government Web Guide, once the data is ready to publish in an open format, the agency should:
- prepare appropriate metadata to accompany the dataset to ensure the data is discoverable and meaningful to the public. See intro to metadata for more information.
- publish the data in an appropriate place, such as the agency website, data.gov.au, or an existing domain-specific collection or catalogue repository. See the where to publish page and the section on using data.gov.au.
You should also consider how to refine your approach so that the data (or subsequent releases of equivalent data) remains relevant and useful in future. This could include engaging with stakeholders; assessing how the data was reused; and considering whether the data should be presented in a different format or made accessible in different ways, such as through an application programming interface or API that allows programmers to easily reuse the data.
Where to publish
There are a range of data publishing options available to government organisations based on the jurisdiction and type of data. See below for national data publishing options, or visit the data portals section for more information on state, territory and local data sites, as well as other resources like state globes.
- Data.gov.au: The single point of discovery for Federal open data. See What is data.gov.au for more information
- NationalMap: A spatial visualisation tool for government open data. Users can’t publish information directly to the site, but it draws data from data.gov.au and directly from agencies when relevant. See the NationalMap section for more information
- FIND: A catalogue of spatial data or services from governments, the private sector and research and education organisations. Published through negotiation with Geoscience Australia. See the FIND section for more information
Intro to metadata
What is metadata?
Metadata is information about data. It describes the content, format, quality, currency and availability of data in a consistent and meaningful way.
Metadata is useful for cataloguing single documents, but is most important for managing a large body of data. This is particularly important for open government data, as people who work with the data may be combining data sets from a wide range of sources, both inside and outside of government.
Establishing a common vocabulary for metadata – using standards – makes it possible for users to find and remix data in a clear and structured way. As open data infrastructure evolves, detailed metadata will not only allow more people to find your data, it will also allow them to re-use the data in more meaningful ways.
How is it used?
There are a range of metadata standards that are used based on the data that is being described and where it is available. Each metadata standard contains elements, or fields, that describe the data. A common example of a metadata element is the ‘Title’, which contains the name of the dataset.
The data.gov.au metadata section has information on the simple form on data.gov.au that is used to make data discoverable through the sites. For people who need technical information, the metadata requirements for specific data types such as spatial data are also described there.
Agencies considering the establishment of their own data catalogues should consider the data.gov.au metadata profile, based on DCAT. Spatial data catalogues should use the ANZLIC metadata schema (which also maps to the data.gov.au DCAT schema).
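To give a feel for what DCAT-style metadata looks like, the sketch below builds a simplified JSON-LD record using terms from the W3C DCAT and Dublin Core vocabularies. It is a generic illustration, not the data.gov.au metadata profile; the dataset title, description and download URL are invented placeholders.

```python
import json


def dcat_dataset(title, description, licence_url, download_url, media_type):
    """Build a simplified DCAT dataset description as a JSON-LD style dict."""
    return {
        "@context": {
            "dcat": "http://www.w3.org/ns/dcat#",
            "dct": "http://purl.org/dc/terms/",
        },
        "@type": "dcat:Dataset",
        "dct:title": title,  # the 'Title' element described above
        "dct:description": description,
        "dct:license": {"@id": licence_url},
        "dcat:distribution": [
            {
                "@type": "dcat:Distribution",
                "dcat:downloadURL": {"@id": download_url},
                "dcat:mediaType": media_type,
            }
        ],
    }


# All values below are placeholders for illustration.
record = dcat_dataset(
    title="Example procurement contract data",
    description="Illustrative record only.",
    licence_url="https://creativecommons.org/licenses/by/3.0/au/",
    download_url="https://data.example.gov.au/contracts-2014.csv",
    media_type="text/csv",
)
print(json.dumps(record, indent=2))
```

Because every catalogue that follows the same vocabulary uses the same element names, a harvester can read the title, licence and download location without knowing anything else about the publishing agency.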
FIND, the Australian Government spatial data catalogue, works with data.gov.au to provide access to a network of government data. The FIND metadata profile includes information on how to structure spatial metadata (XLS) so that your data can be made interoperable and accessible, as well as discoverable through FIND.
Licensing your data
Licences provide a clear and standardised guide for other people about how they can use your data, including the option to reuse, remix and share the content.
The Statement of IP Principles for Australian Government Agencies requires agencies to encourage public use and easy access to published material. This includes permission for public use and re-use of material without requiring royalties and on a non-exclusive basis.
The default licence for Australian Government data and information is Creative Commons Attribution 3.0 Australia (CC BY 3.0 AU), unless a clear case is made for another open licence. You can access the licence text on the Creative Commons website in either plain English or legal code.
For support and guidance with licensing, see the ODI website, which gives more detail on the open licences that can be used by government, risk management for implementing open licences and information on licensing for special cases such as software.
Privacy and security
It is important to ensure you approach data publishing with privacy and security principles in mind. For more information about privacy and security considerations, please refer to the Principles on Open Public Sector Information from the Office of the Australian Information Commissioner.
Standards are an important aspect of open data, as they ensure data is accessible and interoperable. Please see the section on [Data Formats] for information about open data standards. Below are some additional standards agencies should consider.
There are a wide range of standards that may be relevant to data projects, relating to issues like spatial information, metadata and addressing. The spatial and metadata pages of this toolkit have more information on how to use their required standards.
System owners and data owners should, wherever possible, consider relevant international and Australian standards for their data.
Standards bodies dealing with data include:
- International Organization for Standardization (ISO)
- Open Geospatial Consortium (OGC)
- Standards Australia
Australian Government Standards that have been adopted for use include:
- AGLS (National Archives standard for making online information and services visible, manageable and interoperable)
- ANZLIC Spatial Metadata Profile
- SDMX (Statistical Data and Metadata Exchange standard managed by the Australian Bureau of Statistics)
- AIXM (Aeronautical Information Exchange Model standard)
- AS/NZS4819 Rural and Urban Addressing
- AS4590 Interchange of client information
This section will be updated as guidance evolves.
Intro to spatial data
Spatial data is any data that refers to places in the physical world. This can include:
- Geographic features like mountains and lakes
- Man-made objects like houses and roads
- Non-physical objects or information about the location like electoral boundaries or internet quality
Spatial data sets are fundamentally the same as all other data sets; they simply contain fields for spatial information such as latitude and longitude as well as their other information. This means all the guides to making data open still apply.
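Because the spatial fields are ordinary columns, they can be checked like any other data. As a small sketch, assuming columns named `latitude` and `longitude` in decimal degrees, the function below catches a common mistake where the two values are swapped. The site names and coordinates are made up.

```python
def valid_coordinates(latitude, longitude):
    """Check that a latitude/longitude pair is within the valid ranges."""
    return -90.0 <= latitude <= 90.0 and -180.0 <= longitude <= 180.0


# Illustrative records: "Site B" has its latitude and longitude swapped.
records = [
    {"name": "Site A", "latitude": -35.3, "longitude": 149.1},
    {"name": "Site B", "latitude": 149.1, "longitude": -35.3},
]
invalid_sites = [
    record["name"]
    for record in records
    if not valid_coordinates(record["latitude"], record["longitude"])
]
print(invalid_sites)
```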
The Statistical Spatial Framework
The ABS's Statistical Spatial Framework provides a principles-based framework for spatially enabling socio-economic datasets, including administrative datasets, to ensure consistency and comparability. Implementation of the framework is supported by guidance material and resources, and references existing standards and infrastructure. Key guidance materials include:
- SSF-on-a-page (includes resource links)
- Geocoding Using Address
- Using Geography with Statistics
- Protecting Privacy for Geospatially Enabled Statistics
What is the NationalMap
The NationalMap is a website for map-based access to government spatial data. It is designed to:
- Provide easy access to data for government, business and the public
- Integrate datasets into a ‘front end map’ for data.gov.au
- Provide an open framework of geospatial data services that supports commercial and community innovation
- Provide agencies with an easy map to embed on their own websites
How do you use the data?
To view a data set on the NationalMap, go to http://nationalmap.gov.au/, select Data then National Data Sets. This will display a list of available data topics.
- To zoom to a data set’s area on the map, click on the name of the data set.
- For detail about specific information captured in a data set, click on the specific point, line or area on the map.
- For detail about the entire data set, click ‘info’ next to the name of a data set for more information, conditions of use and a link to download the data.
NationalMap is best used with a browser with WebGL support such as the latest versions of Google Chrome, Mozilla Firefox and Internet Explorer 11. It will work with limited functionality in older browsers such as Internet Explorer 9 and Internet Explorer 10.
Possible uses for the NationalMap include:
- Finding data sets and services (for any data set or service visible in the NationalMap, click "info" to view how to access the data set/service directly).
- Setting up a Web Map Service (WMS) or Web Feature Service (WFS) and loading the URL for that service into NationalMap. For example, you can use the open source software Geoserver to do this, or use one of many commercial GIS systems such as ESRI ArcGIS Server, Pitney Bowes' Mapinfo or Google Maps Engine and enable WMS and/or WFS services from it.
- Building a website that uses the value-add service API. Email email@example.com to find out how.
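Before loading a WMS URL into a map client, it can help to confirm that the service responds to a standard GetCapabilities request. The sketch below only builds that request URL using the parameters defined by the OGC WMS standard; the service address is a placeholder, not a real endpoint.

```python
from urllib.parse import urlencode


def wms_capabilities_url(service_url, version="1.3.0"):
    """Build a WMS GetCapabilities request URL for a service endpoint."""
    params = {"service": "WMS", "version": version, "request": "GetCapabilities"}
    # Append to an existing query string if the endpoint already has one.
    separator = "&" if "?" in service_url else "?"
    return f"{service_url}{separator}{urlencode(params)}"


# Placeholder endpoint for illustration only.
capabilities_url = wms_capabilities_url("https://maps.example.gov.au/geoserver/ows")
print(capabilities_url)
```

Opening the resulting URL in a browser should return an XML document listing the layers the service offers.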
How do you add data?
Government data can be added to the NationalMap by uploading it to data.gov.au in a common spatial data format. The main data formats supported by the NationalMap are GeoJSON, KML, KMZ and CSV (with latitude and longitude columns). The NationalMap routinely harvests spatial services from data.gov.au and FIND. It takes between 24 and 48 hours from when a spatial data set is uploaded to data.gov.au for it to appear on the NationalMap.
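To show how two of these formats relate, the sketch below converts CSV text with latitude and longitude columns into a GeoJSON FeatureCollection of points. The column names and sample row are assumptions for this example; note that GeoJSON stores coordinates as longitude first, then latitude.

```python
import csv
import io
import json


def csv_to_geojson(csv_text, lat_field="latitude", lon_field="longitude"):
    """Convert CSV rows with latitude/longitude columns to GeoJSON point features."""
    features = []
    for row in csv.DictReader(io.StringIO(csv_text)):
        latitude = float(row.pop(lat_field))
        longitude = float(row.pop(lon_field))
        features.append({
            "type": "Feature",
            # GeoJSON coordinate order is [longitude, latitude].
            "geometry": {"type": "Point", "coordinates": [longitude, latitude]},
            "properties": row,  # remaining columns become feature properties
        })
    return {"type": "FeatureCollection", "features": features}


# Illustrative data only.
sites_csv = "name,latitude,longitude\nSite A,-35.3,149.1\n"
sites_geojson = csv_to_geojson(sites_csv)
print(json.dumps(sites_geojson, indent=2))
```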
A data set can be added to NationalMap for a single session by dragging it on to the map or clicking on ‘Add Data’ under the Data tab. This is particularly relevant when working with personal, private or temporary data that cannot be uploaded to data.gov.au. Data added to a map in this way will not be saved on the NationalMap and cannot be shared using the share button on the site.
For large data sets that are better streamed than uploaded, email firstname.lastname@example.org to discuss options for making data available in more detail.
How does the NationalMap work?
The NationalMap is a fully open architecture that provides a direct link between the user and the government department or agency who is the custodian of the data. For example, if you access data relating to "broadband availability and quality", you are accessing that directly from the Department of Communications and the Arts; when you access data relating to surface geology, it is accessed directly from Geoscience Australia. The NationalMap itself does not store any data - it provides a map-based view to data that is stored by a growing number of government bodies.
Open source software
The NationalMap was created with the following open source software. The developers contribute back to the software projects as appropriate.
- Cesium (open source under the Apache 2.0 licence)
- Leaflet (open source under the simplified BSD licence)
- Geoserver (open source under the GNU GPL 2.0 licence)
- jquery, URI.js, proj4js, html2canvas, knockout (all open source under the MIT licence)
- esri-leaflet.js (open source under the Apache 2.0 licence)
- togeojson, Tilelayer.Bing.js (open source under the WTFPL version 2 licence)
See the Data61 Github page for more information about the NationalMap.
See below for an incomplete list of open data portals and resources in Australia and New Zealand. Please send any suggestions for additions to the list to email@example.com.
- The Federal Open Data Toolkit including information about all data policies and guidance for the Federal Government.
- The data.gov.au portal for Federal data. Also includes metadata sharing from other Australian governments.
- NationalMap provides a mapping service auto-generated from data.gov.au, with the capability to add private data sets for visualisation and comparison.
- The Australian Open Data 500 was a Department of Communications initiative to identify private sector demand for public sector data.
- FIND is a catalogue of Australian spatial datasets which queries data from Australian federal, state and territory governments as well as research organisations.
- The data.gov.au team created a mindmap of the government data landscape in Australia. Any updates to this mindmap are welcome.
Australian Capital Territory
- The data.act.gov.au portal for ACT government data.
- The ACT Open Data policy, released on GitHub in 2013.
New South Wales
- The NSW Open Data Policy
- The data.nsw.gov.au portal for NSW government data.
- The NSW Open Data Implementation Plan and Dashboard
- The NSW Open Data Policy, Implementation Plan and Guidelines.
- NSW Globe, a mapping and data application for exploring spatial data and spatially referenced NSW open data.
Queensland
- The data.qld.gov.au portal for Queensland government data.
- Queensland Globe, a mapping and data application for exploring spatial data and spatially referenced Queensland open data.
South Australia
- The data.sa.gov.au portal for South Australian government data, notable for releasing an enormous amount of new and high value datasets.
- The South Australian Open Data Toolkit, launched in November 2014 including departmental reporting.
- The Open Data Declaration launched in September 2013.
Victoria
- The data.vic.gov.au portal for Victorian government data.
- The DataVic Access Policy and the Intellectual Property Policy launched in August 2012.
- The DataVic Access Policy Standards and Guidelines released in April 2013.
Western Australia
- The Western Australian Government Open Data policy and portal
- The Shared Land Information Platform (SLIP), which gives users access to Western Australia's significant land and geographic information resources over the web.
Tasmania
- All foundational spatial datasets (property, contours, roads, etc.) are now available on the redeveloped LIST system, with 50 high value datasets freely available under CC-BY.
- The state has been working primarily on spatial data requirements and with the Sense-T sensor data project.
- Several Local Councils are publishing data on data.gov.au. The latest list can be found (with a few exceptions) by searching for council on data.gov.au
- The Glenorchy City Council open data policy was released in November 2014.
- Data for the council area is available on data.gov.au.
City of Melbourne
- The Melbourne beta open data portal launched in May 2014.
Brisbane City Council
- GovHack was first run in 2009 to draw together people from government, industry, academia and the general public to mash up, reuse and remix government data. In 2014, it included over 1300 participants and observers in 11 cities creating more than 170 added-value open data projects.
- The Random Hacks of Kindness communities in Adelaide, Melbourne and Sydney.
New Zealand
- The NZ Government’s data.govt.nz portal for New Zealand government data.
- The NZ open data case studies site, which Finance highly recommends. The case study of the release of traffic density data, which led to improved prediction of GDP growth and decline in the New Zealand economy, is particularly recommended.
- FYI: Brasil is unlocking the value of government information and data on the New Zealand Government Web Toolkit blog.
United States of America
- data.gov has a list of international data portals
This page is maintained by the data.gov.au team. Please contact firstname.lastname@example.org with any questions, comments or congratulations.