Publishing Data


Format

You must post tabular data in CSV if you want API access to be automatically generated for your dataset. The platform will host files of nearly any type, but it will only enable API or data visualisation for ‘clean’ CSV files. Finance also provides support for some geospatial data types including KML, and will advise on additional formats as they are supported.

However, you can publish any data file type, and Finance encourages entities to publish multiple formats where appropriate or useful to users.
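
This page does not define what counts as a "clean" CSV. As a rough pre-upload check, the Python sketch below assumes it means a single header row with unique, non-empty column names and the same number of fields in every row. That definition and the function name `csv_problems` are illustrative assumptions, not platform rules.

```python
import csv
import io


def csv_problems(text):
    """Return a list of problems found in CSV text (empty list if none).

    Assumption: "clean" means one header row with unique, non-empty
    column names, and every data row has as many fields as the header.
    """
    rows = list(csv.reader(io.StringIO(text)))
    if not rows:
        return ["file is empty"]
    header = rows[0]
    problems = []
    if any(not name.strip() for name in header):
        problems.append("header has an empty column name")
    if len(set(header)) != len(header):
        problems.append("header has duplicate column names")
    # Rows are numbered from 1, with the header counted as row 1.
    for row_no, row in enumerate(rows[1:], start=2):
        if len(row) != len(header):
            problems.append(
                f"row {row_no} has {len(row)} fields, expected {len(header)}"
            )
    return problems
```

An empty list means the file passed these particular checks; it does not guarantee that the platform will generate an API for it.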

Compressing a CSV into a ZIP

Compressing CSVs reduces the time it takes to upload and download a file. This is particularly useful when a CSV contains a large amount of data.

For Windows users:

  1. Browse to the relevant CSV
  2. Select the file you wish to compress
  3. Right click on the CSV
  4. Hover over "Send to"
  5. Click "Compressed (zipped) file"

You will now have a ZIP file in the same folder as your original CSV.

  • Please note that you should only compress one CSV at a time if you are uploading it to data.gov.au.

To upload your newly created ZIP file, follow the instructions under Manual Publishing.
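
For users who are not on Windows, or who want to script the step, the same result can be produced with Python's standard `zipfile` module. This is a minimal sketch; the function name `zip_single_csv` is made up for illustration, and it places exactly one CSV in each ZIP, as recommended above.

```python
import zipfile
from pathlib import Path


def zip_single_csv(csv_path):
    """Compress one CSV into a ZIP next to it and return the ZIP path."""
    csv_path = Path(csv_path)
    zip_path = csv_path.with_suffix(".zip")
    # ZIP_DEFLATED enables compression; the default (ZIP_STORED) does not.
    with zipfile.ZipFile(zip_path, "w", compression=zipfile.ZIP_DEFLATED) as archive:
        # arcname keeps only the file name, not the full folder path.
        archive.write(csv_path, arcname=csv_path.name)
    return zip_path
```

Calling `zip_single_csv("planning.csv")` (a hypothetical file name) would write `planning.zip` in the same folder.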

Supportive documentation/context

Publish any supporting documentation, caveats or contextual information in the description text of the dataset. If the information is extensive, it can also be uploaded as an additional resource to the dataset. Please do not put the data into the documentation itself, as the platform will then treat it as just another file, like a PDF, rather than as a data file. This makes the data inaccessible to users, and no API or data visualisation will be available for your dataset.

Automated Publishing

To automate your data uploads, you will need to either:

  • Have technical expertise in house to develop scripts that use the CKAN API;
  • Use software such as FME (proprietary) or Kettle (open source) to extract data from your data sources, clean them and push appropriate data up to data.gov.au; or
  • Pay a small amount to a services provider to create automation tools for you.

There are both push and pull methods for dataset updates. Documentation on the CKAN API is at http://docs.ckan.org/en/943-writing-extensions-tutorial/api.html#example-importing-datasets-with-the-ckan-api
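
For the first option (in-house scripts), a starting point might look like the Python sketch below. It prepares a request to CKAN's `package_show` action, which reads a dataset's metadata. The `/api/3/action/` path and the `Authorization` header are standard CKAN conventions; the function names, the base URL, the dataset name and the key in the usage note are placeholders, and nothing is sent until `fetch_dataset` is called.

```python
import json
import urllib.parse
import urllib.request


def build_package_show_request(base_url, dataset_id, api_key=None):
    """Prepare (but do not send) a CKAN `package_show` request."""
    query = urllib.parse.urlencode({"id": dataset_id})
    url = f"{base_url.rstrip('/')}/api/3/action/package_show?{query}"
    request = urllib.request.Request(url)
    if api_key:
        # CKAN reads the API key from the Authorization header.
        request.add_header("Authorization", api_key)
    return request


def fetch_dataset(request):
    """Send the request and return the dataset metadata (needs network)."""
    with urllib.request.urlopen(request) as response:
        body = json.load(response)
    if not body.get("success"):
        raise RuntimeError(f"CKAN reported an error: {body.get('error')}")
    return body["result"]
```

A script would call `build_package_show_request("https://data.gov.au", "example-dataset", api_key="YOUR-KEY")` and pass the result to `fetch_dataset`. Whether a given CKAN version and site configuration accepts this exact call should be checked against the CKAN API documentation linked above.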

Automating data harvest from your ArcGIS Open Data site

If you are making use of ArcGIS Online, you can also set up and expose your spatial data via an ArcGIS Open Data site. To sync your items from this Open Data site to data.gov.au, please follow the steps below:

  1. Point your browser to https://data.gov.au/harvest
  2. Click the Add Harvest Source button
  3. Locate the data.json file on your ArcGIS Open Data site. To access the data.json of your site, append /data.json to the home page URL, e.g. http://vicroadsopendata.vicroadsmaps.opendata.arcgis.com/data.json
  4. Paste the location of your data.json file into the URL textbox
  5. Name your harvest; we recommend a name like [Organisation Name] ArcGIS Harvest
  6. Select ‘data.json’ as your harvest source
  7. You don’t need to enter any other settings as the entry will be reviewed and approved by the data.gov.au team
  8. Click Save
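
Step 3 above amounts to a small string operation. A minimal Python sketch (the function name `data_json_url` is made up for illustration) that handles home page URLs with or without a trailing slash:

```python
def data_json_url(home_page_url):
    """Return the data.json location for an ArcGIS Open Data home page URL."""
    # Strip any trailing slash first so the result never contains "//data.json".
    return home_page_url.rstrip("/") + "/data.json"
```

It is worth opening the resulting URL in a browser to confirm that it returns JSON before pasting it into the harvest form.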

Related screenshots

Automating data transformation and uploading with Taverna

The data.gov.au team is still exploring this tool, but it was strongly recommended by colleagues in the Western Australian Government for data workflows and automating data uploads.

Details at http://www.taverna.org.uk/introduction/

Automating CKAN data uploads using FME

An FME workbench has been developed that uses the CKAN API to update existing datasets previously published to data.gov.au.

Before you start you will need:

  1. A unique API key. This is assigned to you once you have set up an account with data.gov.au.
  2. Your dataset, already published to your organisation.
  3. The dataset's URL. Example: https://data.gov.au/api/rest/dataset/ballarat-planning-applications-currently-on-advertising
  4. A private parameter in the workbench that holds your API key. This is easier than copying and pasting your API key multiple times.

You can find a sample FME workbench (created by Matthew Swards from City of Ballarat) on the datagovau GitHub.
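
The same prerequisites apply to scripts written outside FME. The Python sketch below is not part of the FME workbench; it shows one way to keep the API key out of the script itself, much as the private parameter does, and to pull the dataset name out of the dataset URL. The environment variable name `CKAN_API_KEY` and both function names are assumptions chosen for illustration.

```python
import os
from urllib.parse import urlparse


def dataset_name_from_url(dataset_url):
    """Return the last path segment of a dataset URL (the dataset name)."""
    path = urlparse(dataset_url).path.rstrip("/")
    return path.rsplit("/", 1)[-1]


def load_api_key(variable="CKAN_API_KEY"):
    """Read the API key from an environment variable (name is an assumption)."""
    key = os.environ.get(variable)
    if not key:
        raise RuntimeError(f"Set the {variable} environment variable first")
    return key
```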

Manual Publishing

  1. Log in with an authenticated account
  2. Browse to Datasets (top menu)
  3. Click Add Dataset – Remember you may have multiple files in the one dataset
  4. On the first page you’ll be asked to complete fields to describe your data (the metadata).
    • Title (required): name of the dataset.
    • Description: descriptive information about the dataset, which can also include further information or caveats pertaining to the data. You can use markdown for formatting; a table of the markdown you are most likely to use on data.gov.au appears below.
    • Keywords: some keywords (or tags) that describe your data.
    • License (required): a dropdown of available licenses for data.gov.au (the default is Creative Commons Attribution 3.0 Australia)
    • Organisation: a dropdown of organisations you can publish to. Most users can only publish to a single organisation. This will be automatically filled in.
    • Visibility: whether the dataset will be viewable to all users once complete. The default is private.
    • Geospatial Coverage (required): the area which the data covers, inherited from the organisation metadata. It can be:
      • a point/polygon (Well-known text);
      • an administrative boundary API; or
      • a reference URL (website address) from the National Gazetteer. Gazetteer reference URLs can be found by searching for a place at http://www.ga.gov.au/place-names/, clicking through to the "Reference ID" of the most appropriate location, and then copying and pasting the URL from the page into the Geospatial field in data.gov.au.
    • Temporal Coverage From / To (required): the span of time from/to which the data is applicable. If the data applies only to a single point in time you should only fill in the Temporal Coverage From field.
    • Language: the language in which the dataset is published. The default is English.
    • Data Status (required): the status of the data with regard to whether it is kept updated (active, yes) or historic (inactive, no).
    • Update Frequency (required): how often the dataset is updated, e.g. Daily, Weekly, Never. For remote machine-readable files, this field will be used to fetch new versions of the data.
    • Expose User Contact Information: display additional contact information for the dataset.
    • AGIFT Function/Theme: the AGIFT top level government function to which the dataset relates.
    • Publisher: name of entity/publishing organisation. The default is set to the organisation’s name.
    • Jurisdiction: name of the jurisdiction in which the dataset belongs. The default is set to the organisation’s jurisdiction.
  5. Click Next: Add Data
  6. Click Upload and select the file you wish to add to the dataset. Add a description for this file and add the (likely) 3 letter file extension to the Format field. You can add additional files to the dataset by clicking Save and add another and repeating the process. When finished adding resources click Next: Additional Info.
  7. You may choose to fill out the information on the Additional Info page.
    • Geospatial Topic: a dropdown list of high-level ISO 19115 topics. Multiple topics can be selected.
    • Data Models: add any links to information on relevant data models, ontologies, taxonomies etc specific to your dataset. You can upload data models to the data.gov.au data model repository.
    • Field of Research: Australian and New Zealand Standard Research Classification (ANZSRC), 2008 defined field or fields of research relevant to the dataset.
  8. Click the Finish button.

Type            Markdown formatting                      Result
Header text     ### Heading                              <h3>Heading</h3>
Italic text     *Italic text*                            Italic text
Bold text       **Bold text**                            Bold text
Link            [data.gov.au](http://www.data.gov.au)    data.gov.au
Bullet list     *   List Item                            • List Item
                *   List Item                            • List Item
                *   List Item                            • List Item
Numbered list   1. List Item                             1. List Item
                2. List Item                             2. List Item
                3. List Item                             3. List Item

NB: in a bullet list there are at least 3 spaces between the asterisk and the text.
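
The Geospatial Coverage field in step 4 accepts a point or polygon in Well-known text (WKT). For readers unfamiliar with the syntax, the Python sketch below builds both kinds of string. The function names and the coordinates in the usage note are illustrative only. WKT writes x before y, taken here to mean longitude before latitude, which is an assumption worth checking against how data.gov.au displays your entry.

```python
def wkt_point(lon, lat):
    """Return a WKT point; WKT lists x (longitude) before y (latitude)."""
    return f"POINT ({lon} {lat})"


def wkt_polygon(ring):
    """Return a WKT polygon from a list of (lon, lat) pairs.

    WKT requires the ring to be closed, so the first pair is repeated
    at the end when the caller has not already done so.
    """
    points = list(ring)
    if points and points[0] != points[-1]:
        points.append(points[0])
    # A closed ring needs at least three corners plus the repeated start.
    if len(points) < 4:
        raise ValueError("a polygon needs at least three distinct points")
    coords = ", ".join(f"{lon} {lat}" for lon, lat in points)
    return f"POLYGON (({coords}))"
```

For example, `wkt_point(149.13, -35.28)` gives `POINT (149.13 -35.28)`, and `wkt_polygon([(0, 0), (1, 0), (1, 1)])` gives `POLYGON ((0 0, 1 0, 1 1, 0 0))`.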

Manual Publishing Screenshots

Updating an Existing Dataset

Using the steps below you can change metadata and update or add additional resources (files) to the dataset.

Updating a Dataset’s Metadata

  1. Browse to the relevant dataset
  2. Click the Manage button located near the top right of the page
  3. On the subsequent page the metadata associated with a dataset can be updated
    1. Fields marked with a red asterisk are mandatory
  4. Once finished click the Update Dataset button at the bottom of the page

Updating a Dataset’s Metadata Screenshots

Adding Additional Resources to an Existing Dataset

  1. Browse to the relevant dataset
  2. Click the Manage button located near the top right of the page
  3. Click the Resources link at the top of the page
  4. Click the + Add new resource button
  5. If uploading a file click the Upload button
  6. If linking to an existing service, file or site click the Link button
    1. When linking to a file the user will be presented with an option to generate an API using the linked data. This option will only work with formats compatible with CKAN or Geoserver
  7. Enter a Name for the resource
  8. If required enter a specific Description for the resource (markdown applies)
  9. Enter a Format (filetype eg, csv, kml, shp…) for the resource
    1. data.gov.au will attempt to guess the format of the file if the field is left blank
    2. Format determines which visualisation will be automatically attached to the resource
    3. As of June 2015 additional visualisations can be added to a dataset
  10. If left blank Last Modified will default to time of addition
  11. Click the Add button
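
The Link path through these steps has a rough scripted equivalent in CKAN's `resource_create` action. The Python sketch below only prepares the request; the field names (`package_id`, `url`, `name`, `description`, `format`) are standard CKAN resource fields, while the function name and every value in the usage note are placeholders. File uploads need a multipart request and are not covered here.

```python
import json
import urllib.request


def build_resource_create_request(base_url, api_key, dataset_id, link, name,
                                  file_format, description=""):
    """Prepare (but do not send) a CKAN `resource_create` request for a link."""
    payload = {
        "package_id": dataset_id,    # dataset the resource is added to
        "url": link,                 # existing service, file or site
        "name": name,
        "description": description,  # markdown applies
        "format": file_format,       # e.g. csv, kml, shp
    }
    return urllib.request.Request(
        f"{base_url.rstrip('/')}/api/3/action/resource_create",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Authorization": api_key, "Content-Type": "application/json"},
        method="POST",
    )
```

Passing the returned object to `urllib.request.urlopen` would submit it. Doing so against a real dataset changes published data, so test against a private dataset first.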

Adding Additional Resources to an Existing Dataset Screenshots

Updating a Resource in an Existing Dataset

  1. Browse to the relevant dataset
  2. Click on the resource that requires an update
  3. Click the Manage button located near the top right of the page
  4. Click the Red X button to remove the current version of the resource
  5. If uploading an updated file click the Upload button
  6. If updating a link to an existing service, file or site click the Link button
  7. If required update the Name
  8. If required update the Description (markdown applies)
  9. If required update the Format (filetype eg, csv, kml, shp…)
  10. If left blank Last Modified will default to time of update
  11. Click the Update Resource button

Updating a Resource in an Existing Dataset Screenshots

Changing order in which Resources are Displayed

  1. Browse to the relevant dataset
  2. Click the Manage button located near the top right of the page
  3. Click the Resources link at the top of the page
  4. Click the Reorder resources button
  5. Resources can now be reordered by dragging and dropping into a new position
  6. Once finished click the Save order button

Changing order in which Resources are Displayed Screenshots

Deleting Resources from a Dataset

Deleting a resource means that it is no longer available for use. If you are planning to remove a resource and replace it with an updated version, we recommend overwriting the resource instead. Updating a resource means its unique identifier will not change.

  1. Browse to the relevant dataset
  2. Click on the resource that is to be deleted
  3. Click the Manage button located near the top right of the page
  4. Click the Delete button
  5. Click the Confirm button

Deleting Resources from a Dataset Screenshots

A note on updating existing resources

If you wish to replace an existing resource with a new version, ensure that wherever possible the new file keeps the same structure. Developers and others access the resource via its unique identifier (eg: [ad5c6594-571e-4874-994c-a9f964d789df]). If you overwrite a resource with a new file that is formatted differently or is a different file type (for example, replacing a CSV with an XLS), you risk disrupting applications which use the resource.

If you plan to radically change the structure of the data or the format in which it is delivered, you should consider adding the new file as a different resource in the existing dataset. This will allow developers to continue to use the existing resource's API while they make the necessary changes to their existing applications.
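
One way to check whether a replacement CSV keeps the same structure is to compare its header row with that of the file it replaces. The Python sketch below does only that; it does not look at data types or values, and the function name `header_changes` is made up for illustration.

```python
import csv
import io


def header_changes(old_csv_text, new_csv_text):
    """Compare the header rows of two CSV texts.

    Returns (removed, added, reordered): columns dropped from the old file,
    columns new in the replacement, and whether shared columns changed order.
    """
    old_header = next(csv.reader(io.StringIO(old_csv_text)), [])
    new_header = next(csv.reader(io.StringIO(new_csv_text)), [])
    removed = [c for c in old_header if c not in new_header]
    added = [c for c in new_header if c not in old_header]
    shared_old = [c for c in old_header if c in new_header]
    shared_new = [c for c in new_header if c in old_header]
    return removed, added, shared_old != shared_new
```

If the result is anything other than `([], [], False)`, adding the new file as a separate resource is probably the safer choice.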

Dataset Comments

CKAN is integrated with Disqus for comment management. We have turned full moderation on for comments to allow entities the ability (and confidence) to manage and respond to comments in an appropriate and timely fashion.

The Disqus moderation panel is at http://datagovau.disqus.com/admin/moderate/#/approved

Entities need to request the username and password for moderating comments. data.gov.au administrators monitor comments on a daily basis.

Please find more information on CKAN commenting at http://docs.ckan.org/en/ckan-1.7/commenting.html