Discovering Metadata

Metadata

Data.gov.au uses components of the AGLS, Dublin Core and Data Catalog Vocabulary (DCAT) metadata standards, as set out below. Finance is analysing this approach and expects to improve it so that discovery metadata is consistent across all data types. This work is being done in collaboration with several entities and the Australian Government Linked Data Working Group.

See the Publishing PSI guide for more information on metadata.

You can download a copy of the metadata and other schemas used by data.gov.au from the "metadata and other schemas" dataset on data.gov.au.

Metadata Fields

Metadata Mapping

Human Readable Name | AGLS Map | DCAT Map | ANZLIC Map | G8 ODC Map
Title | agls:title | dcat:Dataset/dct:title | MD_Metadata.identificationInfo:MD_DataIdentification.citation:CI_Citation.title | 2.8.M Title
Description | agls:description | dcat:Dataset/dct:description | MD_Metadata.identificationInfo:MD_DataIdentification.abstract | 2.5.M Description
Keyword | agls:subject | dcat:Dataset/dcat:keyword | MD_Metadata.identificationInfo:MD_DataIdentification.topicCategory:MD_TopicCategoryCode; MD_Metadata.identificationInfo:MD_DataIdentification.descriptiveKeywords:MD_Keywords.keyword | 2.6.M Keyword
Theme | agls:function | dcat:Dataset/dcat:theme | MD_Metadata.identificationInfo:MD_DataIdentification.topicCategory:MD_TopicCategoryCode; MD_Metadata.identificationInfo:MD_DataIdentification.descriptiveKeywords:MD_Keywords.keyword | 2.9.M Category
Language | agls:language | dcat:Dataset/dct:language | MD_Metadata.identificationInfo:MD_DataIdentification.language | 5.7.M Language
Licence | agls:license | dcat:Dataset/dcat:distribution; dcat:Distribution/dct:license | MD_LegalConstraints | 5.1.M Licence
Rights | agls:rights | dcat:Dataset/dcat:distribution; dcat:Distribution/dct:rights | MD_Metadata.identificationInfo.resourceConstraints.useLimitation; MD_LegalConstraints (with MD_RestrictionCode = 'copyright' or 'intellectualPropertyRights') | 5.2.M Copyright
Data Status | - | - | MD_Metadata.identificationInfo:MD_Identification.status | -
Update Frequency | - | dcat:Dataset/dct:accrualPeriodicity | MD_MaintenanceInformation.maintenanceAndUpdateFrequency in 19115 (have not checked ANZLIC profile) | 2.7.H Frequency of Update
Expose User Contact Information | - | - | - | -
Landing page | agls:identifier; agls:source | dcat:Dataset/dcat:landingPage | anzlic:linkage; MD_DigitalTransferOptions.onLine:CI_OnlineResource.linkage:URL | 4.1.M Documentation URL - resource
Publish date | agls:date | dcat:Dataset/dct:issued | MD_Metadata.identificationInfo:MD_DataIdentification.citation:CI_Citation.date:CI_Date.date where CI_Date.dateType = 'publication' | 2.2.M Release Date
Update date | agls:modified | dcat:Dataset/dct:modified | MD_Metadata.identificationInfo:MD_DataIdentification.citation:CI_Citation.date:CI_Date.date where CI_Date.dateType = 'revision' | 2.3.M Modified
Identifier | agls:fileIdentifier | dcat:Dataset/dct:identifier | MD_Metadata.fileIdentifier | 2.1.M Unique Identifier
Metadata URI | - | - | MD_Metadata.dataSetURI | -
Download URL | agls:identifier | dcat:Dataset/dcat:distribution/dcat:Distribution/dcat:downloadURL | MD_Metadata.dataSetURI or MD_DigitalTransferOptions.onLine:CI_OnlineResource.linkage:URL | 5.4.M URL - resource
File size | agls:SizeorDuration | dcat:Dataset/dcat:distribution/dcat:Distribution/dcat:byteSize | MD_DigitalTransferOptions.transferSize | 5.3.M Size
Access URL | agls:identifier; agls:source | dcat:Dataset/dcat:distribution; dcat:Distribution/dcat:accessURL | anzlic:linkage.URL; also CI_OnlineResource (19115): MD_DigitalTransferOptions.onLine:CI_OnlineResource.linkage:URL | 5.4.M URL - resource
Media type | - | dcat:Dataset/dcat:distribution; dcat:Distribution/dcat:mediaType | MD_Metadata.identificationInfo.resourceFormat:MD_Format.name; MD_Metadata.identificationInfo.resourceFormat:MD_Format.version (version is mandatory in ISO 19115/ANZLIC if a format name is specified); MD_Metadata.identificationInfo.distributionFormat:MD_Format.name; MD_Metadata.identificationInfo.distributionFormat:MD_Format.version | 5.6.M Format - resource
Format | agls:medium; agls:format | dcat:Dataset/dcat:distribution; dcat:Distribution/dct:format | MD_Metadata.identificationInfo.resourceFormat:MD_Format.name; MD_Metadata.identificationInfo.resourceFormat:MD_Format.version (version is mandatory in ISO 19115/ANZLIC if a format name is specified); MD_Metadata.identificationInfo.distributionFormat:MD_Format.name; MD_Metadata.identificationInfo.distributionFormat:MD_Format.version; MD_Metadata.identificationInfo:MD_Distribution:MediumName | 5.6.M Format - resource
Publisher | agls:corporateName | dct:publisher (foaf:Agent) | MD_Metadata.contact:CI_ResponsibleParty.organisationName; MD_Metadata.identificationInfo:MD_DataIdentification.citation:CI_Citation.citedResponsibleParty:CI_ResponsibleParty | 1.4.M Publisher; 1.3.M Organisation (Owner)
Contact | agls:AglsAgent | dcat:contactPoint (vCard) | MD_Metadata.contact:CI_ResponsibleParty.organisationName | 1.1.M Person; 1.2.M Contact Email - Dataset
Data Portal | - | - | - | -
Jurisdiction | agls:jurisdiction | - | MD_Metadata.identificationInfo:MD_DataIdentification.extent:EX_Extent:EX_GeographicDescription | 3.4.M Geographic Region Name
Homepage | agls:agentterms:web | dct:publisher (foaf:homepage) | 19115:CI_Contact/onlineResource.linkage | 5.5.M Homepage URL
Publisher (User Account) | - | dct:publisher (foaf:Agent) | MD_Metadata.identificationInfo:MD_DataIdentification.citation:CI_Citation.citedResponsibleParty:CI_ResponsibleParty | 1.4.M Publisher
Contact (User Account) | agls:AglsAgent | dcat:contactPoint (vCard) | MD_Metadata.contact:CI_ResponsibleParty.organisationName | 1.1.M Person; 1.2.M Contact Email - Dataset
Temporal coverage from | agls:temporal; agls:coverage | dcat:Dataset/dct:temporal | MD_Metadata.identificationInfo:MD_DataIdentification.extent:EX_Extent.temporalElement | 3.10.M Temporal coverage starts
Temporal coverage to | agls:temporal; agls:coverage | dcat:Dataset/dct:temporal | MD_Metadata.identificationInfo:MD_DataIdentification.extent:EX_Extent.temporalElement | 3.11.M Temporal coverage ends
Geospatial coverage | agls:spatial | dcat:Dataset/dct:spatial | MD_Metadata.identificationInfo:MD_DataIdentification.extent:EX_Extent:EX_GeographicDescription | 3.1.M Spatial coverage
ISO19115 Topic | agls:subject | dcat:Dataset/dcat:theme | 19115 MD_TopicCategoryCode | -
Field(s) of Research | - | dcat:Dataset/dcat:theme | MD_Metadata.identificationInfo:MD_DataIdentification.descriptiveKeywords:MD_Keywords.keyword; MD_Metadata.identificationInfo:MD_DataIdentification.descriptiveKeywords:MD_Keywords.thesaurusName (e.g. Australian and New Zealand Standard Research Classification (ANZSRC), 2008) | -
Data Models | - | - | - | -
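
A mapping table like the one above can be used to re-key a record from human-readable field names to the terms of one target standard. The following is a minimal sketch under stated assumptions: the names FIELD_MAP and to_standard are hypothetical, only four rows of the table are copied in, and a "-" cell is represented as None.

```python
# Hypothetical illustration: a few rows of the mapping table as a lookup.
# FIELD_MAP and to_standard are names invented for this sketch.
FIELD_MAP = {
    "Title": {
        "agls": "agls:title",
        "dcat": "dcat:Dataset/dct:title",
        "g8": "2.8.M Title",
    },
    "Description": {
        "agls": "agls:description",
        "dcat": "dcat:Dataset/dct:description",
        "g8": "2.5.M Description",
    },
    "Language": {
        "agls": "agls:language",
        "dcat": "dcat:Dataset/dct:language",
        "g8": "5.7.M Language",
    },
    # "-" in the table means "no mapping"; it is stored as None here.
    "Data Status": {"agls": None, "dcat": None, "g8": None},
}


def to_standard(record, standard):
    """Re-key a record to one standard's terms.

    Returns (mapped, unmapped): a dict keyed by the standard's term, and a
    list of field names that have no mapping in that standard.
    """
    mapped, unmapped = {}, []
    for name, value in record.items():
        term = FIELD_MAP.get(name, {}).get(standard)
        if term is None:
            unmapped.append(name)
        else:
            mapped[term] = value
    return mapped, unmapped


record = {
    "Title": "Location of Medicare offices",
    "Language": "en",
    "Data Status": "Active",
}
mapped, unmapped = to_standard(record, "dcat")
print(mapped)
print(unmapped)
```

Returning the unmapped names separately makes it visible which fields, such as Data Status, would be lost when exporting to a given standard.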

Dataset

Human Readable Name | Class Attribute | Description | Example | Vocab Control | Commonwealth Definition | System Generated | Mandatory | Repeatable
Title | dcat:Dataset/dct:title | Title of dataset | Location of Medicare offices | Free text | N/A | No | Yes | No
Description | dcat:Dataset/dct:description | Description of the dataset | The Department of Human Services Service Centre locator contains information updated weekly, a search function and maps. | Free text | N/A | No | Yes | No
Keyword | dcat:Dataset/dcat:keyword | Keywords, subjects, topics of dataset | health, health-care | Free text tagging with autocompletion / LCSH | N/A | No | Yes | Yes
Theme | agls:function | The government jurisdiction defined business function to which the resource relates | Communications, broadcasting standards | Limited choice | AGIFT top level categories | No - selected from a list (list defined per jurisdiction) | Yes | Yes
Language | dcat:Dataset/dct:language | If not English, language should be assigned a value | en | Language codes (consider use of RFC 4646) | en | Yes (default set, but changeable) | Yes | No
Licence | dcat:Dataset/dcat:distribution; dcat:Distribution/dct:license | Licence details | Creative Commons Attribution 3.0 Australia | Predefined list determined by the jurisdiction. | As currently defined | No - selected from a list | Yes | No
Rights | dcat:Dataset/dcat:distribution; dcat:Distribution/dct:rights | Will automatically populate based on what is chosen in the licence field. |  | Text (automatic) |  | No | Yes | No
Data Status | Boolean | The status of the data with regard to whether it is kept updated (active, yes) or historic (inactive, no) | Active | Limited choice | Active; Inactive | No | Yes | No
Update Frequency | dcat:Dataset/dct:accrualPeriodicity | How often the dataset is updated | Daily | Limited choice | Daily; Weekly; Monthly; Quarterly; Yearly; As Required | No | Yes - conditional on Active Status | No
Expose User Contact Information |  | Whether the user contact details should be exposed as well as the organisation contact details. Relevant only to researcher/scientist users who want | No | Limited choice | Yes; No | No | Yes - is selected as "No" by default | No
NOTE: the rest of the dataset-specific attributes are automatically generated
Landing page | dcat:Dataset/dcat:landingPage | URL with information on resource. | http://data.gov.au/dataset/559708e5-480e-4f94-8429-c49571e82761 | URL - Automatically generated from system. Must be the UUID URL. | N/A | Yes | Yes | No
Publish date | dcat:Dataset/dct:issued | Original publish date of record | 1994-11-05T08:15:30-05:00 | Datetime - automated | ISO 8601 | Yes | Yes | No
Update date | dcat:Dataset/dct:modified | Date modified | 1994-11-05T08:15:30-05:00 | Date - automated (date of most recently updated resource) | ISO 8601 | Yes | Yes | No
Identifier | dcat:Dataset/dct:identifier | The fileIdentifier for a metadata record must never change, irrespective of where that metadata record is stored. Should be system generated. In CKAN's case the UUID is common to the dataset and its metadata record, and is carried with the record across new systems. | URN:UUID (example 559708e5-480e-4f94-8429-c49571e82761) | Automatically generated unique ID. Decided against DOI because a unique ID is already generated in CKAN. DOI records created in ANDS can be leveraged by those who want them, given that data.gov.au metadata is to be harvested by ANDS. | N/A | Yes | Yes | No
Metadata URI | MD_Metadata.dataSetURI | Automatically generated metadata URI. | http://data.gov.au/dataset/559708e5-480e-4f94-8429-c49571e82761.rdf |  | N/A | Yes | Yes | No
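
The system-generated attributes in the table above all follow from the dataset UUID and two timestamps. The sketch below shows one way they could be derived. It is an illustration only, not how CKAN or data.gov.au implements it: the function name system_generated_fields is hypothetical, and the URL prefix is simply taken from the examples in the table.

```python
import uuid
from datetime import datetime

# URL prefix copied from the Landing page example in the table.
BASE_URL = "http://data.gov.au/dataset/"


def system_generated_fields(dataset_id, issued, modified):
    """Derive the automatically generated dataset attributes.

    dataset_id must be a UUID string; issued and modified are
    timezone-aware datetimes, serialised here as ISO 8601.
    """
    # uuid.UUID raises ValueError for a malformed identifier, and
    # str() normalises it to the lowercase hyphenated form.
    canonical_id = str(uuid.UUID(dataset_id))
    landing_page = BASE_URL + canonical_id
    return {
        "Identifier": canonical_id,
        "Landing page": landing_page,
        "Metadata URI": landing_page + ".rdf",
        "Publish date": issued.isoformat(),
        "Update date": modified.isoformat(),
    }


stamp = datetime.fromisoformat("1994-11-05T08:15:30-05:00")
fields = system_generated_fields(
    "559708e5-480e-4f94-8429-c49571e82761", stamp, stamp
)
for name, value in fields.items():
    print(f"{name}: {value}")
```

Because the landing page and metadata URI are both built from the identifier, they stay stable for as long as the UUID does, which is the property the Identifier row asks for.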


Individual Resources (within datasets)

Human Readable Name | Class Attribute | Description | Example | Vocab Control | Commonwealth Definition | System Generated | Mandatory | Repeatable
Download URL | dcat:Dataset/dcat:distribution; dcat:Distribution/dcat:downloadURL | URL with information on resource. | http://data.gov.au/dataset/559708e5-480e-4f94-8429-c49571e82761 | URL - Automatically generated from system. Must be the UUID URL. | N/A | Yes | Conditional | No
File size | dcat:Dataset/dcat:distribution; dcat:Distribution/dcat:byteSize | Conditional if download URL is used. Automatically generated from the system where locally hosted, otherwise numerical entry. | 84MB | Automatically generated | N/A | Yes | Conditional | No
Access URL | dcat:Dataset/dcat:distribution; dcat:Distribution/dcat:accessURL | Conditional: Use Access URL when resource is not a direct download (i.e. index page, SPARQL endpoint, feed etc.) | http://data.gov.au/geoserver/geelong-trees/wfs?request=GetCapabilities | URL | N/A | Yes | Conditional | No
Media type | dcat:Dataset/dcat:distribution; dcat:Distribution/dcat:mediaType | Conditional: Media type of distribution as defined by IANA | text/csv | Automatically generated based on file type from IANA definitions - http://www.iana.org/assignments/media-types/media-types.xhtml | IANA definitions | Yes | Conditional | No
Format | dcat:Dataset/dcat:distribution; dcat:Distribution/dct:format | Conditional: File format of the distribution. If available in IANA, use Media Type |  | Free text | N/A | No - conditional only if media type isn't automatically detected. | Conditional | No
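
The conditional rules in the resource table (Download URL versus Access URL, and Media type versus free-text Format) can be sketched as a small function. This is a hedged illustration rather than the portal's actual logic: describe_resource and MEDIA_TYPES are hypothetical names, the extension table is a tiny hand-picked subset of IANA media types, and the "WFS" format label and the example.org URL are assumptions made for the example.

```python
import os
from urllib.parse import urlparse

# Illustrative subset of IANA media types keyed by file extension.
# A real system would consult the full IANA registry instead.
MEDIA_TYPES = {
    ".csv": "text/csv",
    ".json": "application/json",
    ".pdf": "application/pdf",
}


def describe_resource(url, direct_download, fallback_format=None):
    """Return the conditional resource attributes for one distribution."""
    fields = {}
    # Download URL is for direct downloads; Access URL covers index
    # pages, SPARQL endpoints, feeds and similar.
    fields["Download URL" if direct_download else "Access URL"] = url

    # Detect the media type from the extension of the URL path only,
    # ignoring any query string.
    extension = os.path.splitext(urlparse(url).path)[1].lower()
    media_type = MEDIA_TYPES.get(extension)
    if media_type:
        fields["Media type"] = media_type
    elif fallback_format:
        # Free-text Format is only used when no media type was detected.
        fields["Format"] = fallback_format
    return fields


print(describe_resource("http://example.org/offices.csv", direct_download=True))
print(
    describe_resource(
        "http://data.gov.au/geoserver/geelong-trees/wfs?request=GetCapabilities",
        direct_download=False,
        fallback_format="WFS",
    )
)
```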

Entity

Human Readable Name | Class Attribute | Description | Example | Vocab Control | Commonwealth Definition | System Generated | Mandatory | Repeatable
Publisher | dct:publisher (foaf:Agent) | Name of the Entity/publishing organisation. Controlled via CKAN accounts. | Department of Human Services | Automatically taken from "Organisation", able to be modified. | N/A | Yes - inherited from organisation information. Able to be overridden | Yes | No
Contact | dcat:contactPoint (vCard) | Contact details of the publishing organisation. Controlled via CKAN accounts. Includes full name, telephone, email | FN: Spatial Team; Tel Type: work; Tel: xx xxx xxxx; Email: spatial@entity.gov.au | Automatically taken from additional metadata added to the CKAN "Organisation" with these details able to be modified. | N/A | Yes - inherited from organisation information. Able to be overridden | Yes | No
Data Portal |  | Which data portal, necessary for representing search results in search federation project. | http://data.gov.au/ | Automatic, drawn from data portals. | http://data.gov.au/ | Yes - inherited from system. | Yes | Yes
Jurisdiction | agls:jurisdiction | Which Australian Government Jurisdiction | Commonwealth of Australia, New South Wales, Adelaide City Council, etc. | Drop down choice of jurisdiction. | "Commonwealth of Australia" and any existing jurisdictions already on data.gov.au. To add as required. | Yes - inherited from organisation information, available from list. Able to be overridden | Yes | No
Homepage | dct:publisher (foaf:homepage) | Entity/Publisher homepage | http://www.humanservices.gov.au/ | Automatically taken from homepage of the "Organisation". | N/A | Yes - inherited from organisation information, available from list. Able to be overridden | No | No

User Account

Human Readable Name | Class Attribute | Description | Example | Vocab Control | Commonwealth Definition | System Generated | Mandatory | Repeatable
Publisher | dct:publisher (foaf:Agent) | Name of the individual who has published the data in the organisation. Controlled via CKAN accounts. | Bruce Wayne | Automatically taken from the user account. Only enabled for public visibility if chosen by the user on a dataset by dataset basis. Only likely to be taken up by scientists and those who want personal citation. | N/A | Yes - inherited from user details. | Yes | No
Contact | dcat:contactPoint (vCard) | Contact details of the publisher. Controlled via CKAN accounts. Includes full name, telephone, email. | FN: Joe Bloggs; Tel Type: work; Tel: xx xxx xxxx; Email: info@entity.com | Automatically taken from additional metadata added to the CKAN "Organisation" with these details able to be modified. | N/A | Yes - inherited from user details. | Yes | No

Extent

Human Readable Name | Class Attribute | Description | Example | Vocab Control | Commonwealth Definition | System Generated | Mandatory | Repeatable
Temporal coverage from | dcat:Dataset/dct:temporal | Start of temporal series in dataset. If only a point in time, then the user doesn't fill in the "to" temporal coverage. Can make it very user friendly | 2001/10/1 | Date | ISO 8601 | No | Yes | No
Temporal coverage to | dcat:Dataset/dct:temporal | End of temporal series used in dataset | 2001/10/2 | Date | ISO 8601 | No | Conditional | No
Geospatial coverage | dcat:Dataset/dct:spatial | Spatial description of resource (gazetteer) | Sydney | Free text with a mandatory requirement to use one of the following: (1) a point/polygon (WKT); (2) an administrative boundary API; or (3) a reference URL (website address) from the National Gazetteer. Gazetteer reference URLs can be found by searching for a place at http://www.ga.gov.au/place-names/, clicking through to the most appropriate location "Reference ID", and then copying and pasting the URL from the page into the Geospatial field in data.gov.au. | N/A but with explanatory text linking to Gazetteer and WKT information and strongly recommending either a Gazetteer record ID link or a point/polygon definition. | No | Yes - defaults to the Jurisdiction (Australia, South Australia, Geelong, etc) | No
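
The Geospatial coverage rule accepts a WKT point or polygon, an administrative boundary API reference, or a Gazetteer reference URL. A rough shape check for such a value might look like the sketch below. It rests on several assumptions: classify_spatial is a hypothetical name, the regular expression only checks the outer form of a WKT POINT or POLYGON and does not validate coordinates, and both kinds of URL are reported simply as "url" because they cannot be told apart by shape alone.

```python
import re
from urllib.parse import urlparse

# Matches the outer shape of a WKT POINT or POLYGON, e.g.
# "POINT (151.2093 -33.8688)". This is not a full WKT parser.
WKT_PATTERN = re.compile(
    r"^\s*(POINT|POLYGON)\s*\(.*\)\s*$", re.IGNORECASE | re.DOTALL
)


def classify_spatial(value):
    """Return 'wkt', 'url' or 'free text' for a Geospatial coverage value."""
    if WKT_PATTERN.match(value):
        return "wkt"
    parsed = urlparse(value.strip())
    if parsed.scheme in ("http", "https") and parsed.netloc:
        return "url"
    return "free text"


for candidate in (
    "POINT (151.2093 -33.8688)",
    "http://www.ga.gov.au/place-names/",
    "Sydney",
):
    print(candidate, "->", classify_spatial(candidate))
```

A value classified as "free text", such as a bare place name, is the case the explanatory text is meant to steer users away from.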

Additional Commonwealth Specific Options

Human Readable Name | Class Attribute | Description | Example | Vocab Control | Commonwealth Definition | System Generated | Mandatory | Repeatable
ISO19115 Topic | dcat:Dataset/dcat:theme | Main theme(s) of the dataset if spatial | Buildings and Structures | ANZLIC: Open Spatial Data Taxonomy [options: farming, biota, boundaries, climatology / meteorology / atmosphere, economy, elevation, environment, geoscientific information, health, imagery / base maps / earth cover, intelligence / military, inland waters, location, oceans, planning / cadastre, society, transportation, utilities / communication] |  | No | No | Yes
Field(s) of Research | dcat:Dataset/dcat:theme | The Australian and New Zealand Standard Research Classification (ANZSRC), 2008 defined field or fields of research relevant to the dataset. | Atomic and Molecular Physics, Classical and Physical Optics | Predefined text, but chosen as tags. | Available Here | No | No | Yes
Data Models |  | See notes |  | Free text - integrated with CKAN as well as possible. Free text option for data custodians to add information for any relevant data models, ontologies, taxonomies etc specific to their dataset. | N/A | No | No | No