Introduction to Data.gov.au
What is data.gov.au
data.gov.au is the national open data portal. The vision is that all Federal Government data is discoverable in the one place, machine readable, permissively licensed and API enabled, with some visualisation capability. We are on a journey to realise that vision across all government datasets, and have made significant headway in recent times.
How does it work
data.gov.au is run by the Department of Finance (Finance). It was relaunched on 17 July 2013 on the CKAN (Comprehensive Knowledge Archive Network) platform, hosted on the Australian-based Amazon cloud and modelled on data.gov.uk as an example of international best practice. CKAN is a basic but useful tool that provides a catalogue of data sources with good metadata, as well as API-enabled hosting of tabular data. By default, CKAN supports hosting of any file type, and it automatically generates API access for machine-readable tabular datasets that are uploaded. CKAN itself is a tool for:
- Publishing data files – the platform best supports tabular and spatial data files, but any file type can be uploaded to data.gov.au.
- Searching for data – using the metadata and descriptive information stored on each dataset. Data.gov.au will also share metadata with other Australian data initiatives to make finding data easier across jurisdictions and disciplines.
- Organising data files – data is organised by government organisation, tags or government functions based on the Australian Governments’ Interactive Functions Thesaurus (AGIFT).
- Accessing data programmatically – CKAN generates API access to uploaded CSV data files and some spatial files for ease of use by analysts and developers.
- Linking to externally hosted data services (APIs) – data custodians can add links to their external data services, such as ABS or Geoscience Australia data APIs.
- Please note, however, that the new data.gov.au publishing policy is to no longer link to individual files hosted on government websites, as these addresses often change and users and entities lose the advantage of consistent access to data.
- Basic data visualisation – CKAN supports basic spreadsheet, graphing and mapping views of datasets that are uploaded in CSV or spatial formats. Finance are looking to support additional data visualisation tools down the track and can assist entities in finding appropriate tools in the meantime.
You can read more on the CKAN platform at http://ckan.org – it is an open source community and there are active CKAN usergroups around the world.
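The search and programmatic access functions listed above are exposed through CKAN's Action API. The following is a minimal sketch in Python, not the data.gov.au team's own client. The `BASE_URL` value is an assumption about where the portal serves the API and should be checked against the live site. The helper names `action_url`, `unwrap` and `call_action` are hypothetical.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

# Assumption: root of the portal's CKAN Action API; confirm the current path on the site.
BASE_URL = "https://data.gov.au/api/3/action"


def action_url(action, **params):
    """Build a CKAN Action API URL, e.g. .../package_search?q=budget."""
    query = urlencode(params)
    return f"{BASE_URL}/{action}?{query}" if query else f"{BASE_URL}/{action}"


def unwrap(payload):
    """Return the 'result' part of a CKAN response, or raise if 'success' is false."""
    if not payload.get("success"):
        raise RuntimeError(f"CKAN call failed: {payload.get('error')}")
    return payload["result"]


def call_action(action, **params):
    """Send the GET request and decode the JSON envelope (requires network access)."""
    with urlopen(action_url(action, **params)) as response:
        return unwrap(json.load(response))
```

For example, `call_action("package_search", q="budget", rows=5)` would search dataset metadata, and `call_action("datastore_search", resource_id=..., limit=10)` would read rows from an uploaded tabular file, where the resource ID is taken from the dataset's page on the portal.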
Since the relaunch, the data.gov.au team has taken an agile, iterative approach to improvement, expanding the system to meet the needs of data custodians and data users. Below is a live list of the functional improvements the data.gov.au team has made to date:
- Software stack architectural design to improve system scalability.
- The addition of a GeoServer, integrated with CKAN, to add spatial web services capability to the platform for hosting spatial datasets.
- Integration of the data.gov.au blog on the front page of data.gov.au.
- Integration of Google Analytics with CKAN for data.gov.au site/data reporting.
- Improvements to the metadata schema to meet Australian (AGLS) and international (DCAT, ANZLIC, ISO19115) data discovery best practices.
- Improved services around automation for data extraction, transformation and publishing.
- The ability to regularly harvest from CSW for ease of data discovery from existing data catalogs.
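To illustrate what harvesting from CSW involves, the sketch below builds a standard CSW 2.0.2 `GetRecords` request and reads the paging information from a response. It is not the harvester used by data.gov.au. The function names `csw_getrecords_url` and `next_start` are hypothetical, and the endpoint must be supplied by the caller.

```python
import xml.etree.ElementTree as ET
from urllib.parse import urlencode

# XML namespace of CSW 2.0.2 response elements.
CSW_NS = "{http://www.opengis.net/cat/csw/2.0.2}"


def csw_getrecords_url(endpoint, start=1, page_size=10):
    """Build a key-value GetRecords URL for one page of summary records."""
    params = {
        "service": "CSW",
        "version": "2.0.2",
        "request": "GetRecords",
        "typeNames": "csw:Record",
        "elementSetName": "summary",
        "resultType": "results",
        "startPosition": start,
        "maxRecords": page_size,
    }
    return f"{endpoint}?{urlencode(params)}"


def next_start(response_xml):
    """Return the next startPosition, or None once the last page has been read."""
    results = ET.fromstring(response_xml).find(f"{CSW_NS}SearchResults")
    matched = int(results.get("numberOfRecordsMatched", "0"))
    nxt = int(results.get("nextRecord", "0"))
    # A nextRecord of 0, or one past the total, means there are no more pages.
    return nxt if 0 < nxt <= matched else None
```

A harvester would call `csw_getrecords_url` repeatedly, passing the value returned by `next_start` as `start`, until `next_start` returns `None`.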
All code developed for data.gov.au is available on our data.gov.au GitHub.
You can see more technical information about the data.gov.au stack and change log at /index.php?title=Data.gov.au_Platform
What datasets should entities publish
Please see the 'What is open data' section on the wiki, which articulates different data types and how to deal with them.
There are broadly three data types found in Government Entities:
- Raw data generated out of business as usual activities – such as spatial data from a program, energy ratings data, crime statistics, administration data, etc. Often this data is stored in databases and primarily used in business applications.
- Processed data – new data that results from a process such as tables from annual reports, FOI logs, other data created through the functions and running of entities. This could also be an aggregate view of a raw data set, fit for public access.
- System data – data that is automatically generated from other processes such as web analytics, project management, access logs and other systems.
All three types have different values and benefits in publishing.
Identifying different data across your organisation means looking beyond the traditional data teams to other datasets that exist and how you can leverage them to improve services, policies and efficiencies. For instance, we recommend you analyse your FOI and helpdesk logs for common requests, as you may be able to identify a number of datasets that you could actively publish, thereby reducing the time and resources spent on repeatedly providing the same data in response to individual requests.
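As a rough illustration of analysing a request log for common requests, the sketch below counts how often each subject appears in a CSV export. The `subject` column name and the function name `top_requests` are assumptions for the example only; a real FOI or helpdesk log will have its own layout and will usually need manual grouping of similar requests.

```python
import csv
import io
from collections import Counter


def top_requests(log_csv, column="subject", limit=10):
    """Return the most frequent values of one column in a CSV log, most common first."""
    reader = csv.DictReader(io.StringIO(log_csv))
    # Normalise case and surrounding whitespace so trivial variants are counted together.
    counts = Counter(
        row[column].strip().lower() for row in reader if row.get(column)
    )
    return counts.most_common(limit)
```

Subjects that appear near the top of the result are candidates for proactive publication.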
It is also beneficial to look at data the entity already publishes, either in data form or PDF form. Where possible, publishing the data form on data.gov.au will improve accessibility, reuse and discoverability, and make it easier for your entity to reuse its own data. For instance, publishing the tables from your annual report, budget, grants, administrative data or entity-specific mandatory reporting is all useful. The 2012-2013 Productivity Commission report discusses the value of administrative data at length; see http://www.pc.gov.au/annual-reports/2012-13
You may also want to assess the known datasets your entity holds and identify a top 10 that would provide the greatest economic, transparency or policy benefits if made publicly available.
Mitigating Cost of FOI
The FOI Act contains an Information Publication Scheme (IPS) that requires entities to publish certain categories of information online, including information routinely given through FOI requests. Part 13 of the Information Commissioner’s FOI Guidelines mentions that the IPS is designed to lessen the number of individual FOI requests made to entities, including by releasing data frequently requested under FOI.
The IPS also gives entities the discretion to publish other information. The FOI Guidelines recommend that entities should exercise that discretion to publish datasets they hold that could be made available for access and reuse, including on data.gov.au (see the ‘Managing an IPS entry’ section, under the ‘Publication on a website’ heading). The Guidelines also contain a template IPS agency plan that includes scope for publishing data on data.gov.au.
Section 93A of the FOI Act requires entities to ‘have regard’ to the FOI Guidelines when performing functions or exercising powers under the FOI Act.
Finance recommends entities analyse their FOI (and helpdesk) requests to identify specific datasets and other information that would create efficiencies through proactive publishing online.