This project automated the collection of state and territory governments’ open data feeds about current roadworks and road closures to form a national harmonised dataset and a historical record of roadworks data.
This project contributes to our understanding of many enduring questions for freight, including:
- What and where are the physical and regulatory bottlenecks and barriers for the efficient and safe movement of freight?
- How well are Australia’s freight transport networks performing?
- What and where are the opportunities for freight movements to be more efficient and safe?
The purpose of the visualisation is to demonstrate the benefits of having a national harmonised dataset. It enables the user to see where roadworks are occurring and how border restrictions are shifting due to COVID-19. The historic information is useful for understanding where investment is happening across the network, and could be combined with other data to analyse and plan for improved safety as well as efficiency.
The type and format of data reported vary across state Application Programming Interfaces (APIs). Some states provided a dedicated roadworks feed; others provided a roadworks category as part of a wider road events API. Each feed was recorded in a different format. We have automated the collection and harmonisation of this data from the state roadworks data feeds. Data in this visualisation is collected and updated daily, and historical data can be viewed using the time slider.
Consolidating the category mappings has created a semi-harmonised national dataset. The category mappings are currently automated in the data gathering software, and this daily capture of data is what enables the insight to be generated.
An examination of the mapping for the ‘Roadworks’ category shows there are multiple formats and names for capturing roadworks data.
|Category|Source Data Field Name (Case Sensitive)|State(s)|
Category mapping was established to normalise the roadworks data for use in the Hub. This created a common set of fields which forms the harmonised dataset. For example, all of the ‘Source Data Field Names’ in the above table are referred to as ‘Roadworks’ in the harmonised dataset.
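A category mapping of this kind can be sketched as a lookup table. This is a minimal illustration rather than the Hub's actual software: the state codes and source labels below are hypothetical placeholders and are not the field names from the table above. The lookup is an exact, case-sensitive match, in line with the "Case Sensitive" note in the table header.

```python
# Hypothetical mapping from (state, source label) to a harmonised category.
# The state codes and source labels are placeholders for illustration only.
CATEGORY_MAP = {
    ("STATE_A", "Roadwork"): "Roadworks",
    ("STATE_B", "ROADWORKS"): "Roadworks",
    ("STATE_C", "Road Works"): "Roadworks",
    ("STATE_B", "Closure"): "Road closure",
}


def harmonise_category(state, source_category):
    """Return the harmonised category, or None when no mapping exists.

    The lookup is an exact, case-sensitive match on the source label.
    """
    return CATEGORY_MAP.get((state, source_category))


def harmonise_records(records):
    """Return copies of the records with a harmonised 'category' field added.

    Unmapped records are kept and flagged so they can be reviewed later.
    """
    harmonised = []
    for record in records:
        category = harmonise_category(record["state"], record["source_category"])
        harmonised.append({**record, "category": category or "Unmapped"})
    return harmonised
```

Flagging unmapped labels instead of dropping them is a design choice for this sketch: it makes new or renamed source categories visible when a state changes its feed.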
This complexity increases as additional filters are added to the insight. To mitigate this, the following categories were applied to the state and territory governments’ data:
- Road conditions
- Road closure
Border restrictions, discussed in the Observation section, present an interesting insight into the effects of COVID-19. Most states have categorised border restrictions as ‘hazards’. Text matching against the description has been used to identify these records and re-categorise them as ‘road closures’. This text matching uses the following search terms:
- Border Control
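The text matching described above could look something like the following sketch. It uses only the search term listed here. The record layout, the category labels "Hazard" and "Road closure", and the choice to match case-insensitively are assumptions made for illustration, not details confirmed by the source.

```python
import re

# Search terms taken from the list above; extend as needed.
SEARCH_TERMS = ["Border Control"]

# Assumption: matching is case-insensitive. The source does not say either way.
_BORDER_PATTERN = re.compile(
    "|".join(re.escape(term) for term in SEARCH_TERMS),
    re.IGNORECASE,
)


def recategorise_border_restrictions(record):
    """Return the record with hazards that mention a search term
    re-categorised as road closures. Other records are returned unchanged.
    """
    description = record.get("description") or ""
    if record.get("category") == "Hazard" and _BORDER_PATTERN.search(description):
        return {**record, "category": "Road closure"}
    return record
```

For example, a record with category "Hazard" and the description "COVID-19 border control checkpoint" would come back with category "Road closure", while a hazard describing a fallen tree would be left as it is.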
Future improvements to the data
More consistent data would make it easier to draw out insights. Three important improvements would be consistent reporting of category mapping, date filters and reference numbers.
- Category mapping – clearer category definitions will improve the accuracy of the insight. For example, COVID-19 border closures have been defined as both a hazard and a road closure.
- Date filters – events in the data currently have start dates, but the end date is inconsistently reported. From the Hub’s perspective, the disappearance of a record could be taken to mean the event has finished; however, this is not possible without consistent reference numbers (IDs).
- Reference numbers (IDs) – some states and territories have different reference numbers for the same event. Other data sources are missing reference numbers, or the numbers are not globally unique. As a result, when a record for an event disappears, it might reappear at the same time under a different number. This makes it difficult to track the start and end dates of the event.
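To illustrate why consistent IDs matter, the sketch below infers when each event was first and last seen from a series of daily snapshots. It is a hypothetical approach, not the Hub's method: the snapshot structure (a mapping from date to the set of event IDs present that day) and the function name are assumptions.

```python
from datetime import date


def infer_event_spans(snapshots):
    """Infer (first_seen, last_seen) per event ID from daily snapshots.

    `snapshots` maps a date to the set of event IDs present on that day.
    last_seen is None for events still present in the latest snapshot,
    because those events cannot yet be treated as finished.
    """
    days = sorted(snapshots)
    first_seen, last_seen = {}, {}
    for day in days:
        for event_id in snapshots[day]:
            first_seen.setdefault(event_id, day)
            last_seen[event_id] = day
    latest = days[-1] if days else None
    return {
        event_id: (
            first_seen[event_id],
            None if last_seen[event_id] == latest else last_seen[event_id],
        )
        for event_id in first_seen
    }


# Illustrative IDs only: "B7" disappears and "B9" appears later.
snapshots = {
    date(2020, 10, 1): {"A1", "B7"},
    date(2020, 10, 2): {"A1"},
    date(2020, 10, 3): {"A1", "B9"},
}
spans = infer_event_spans(snapshots)
```

If "B9" is really the same event as "B7" re-issued under a new number, this approach records one event ending and a separate one starting, which is the tracking problem described above. It only gives reliable end dates when IDs are stable and unique.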
About the data
- Dataset name: Harmonised National Roadworks.
- Data owner: Department of Infrastructure, Transport, Regional Development and Communications.
- Geographical coverage: all roads in Australia.
- Date range: 21 June 2019 to 10 June 2020 and 21 September 2020 to present day.
- Frequency: near real-time (daily).
- Description: Collecting roadworks and road closures data from state and territory governments’ APIs generates a daily snapshot. This builds a database of historical roadworks and road closures for all of Australia.
Limitations of the data
- Data was collected from 21 June 2019 to 10 June 2020 as a proof of concept within the Department of Infrastructure. From 21 September 2020, the Hub has collected daily data using the state and territory governments’ APIs.
- The type and format of data reported vary across APIs. Some states and territories provided a dedicated roadworks feed; others provided a roadworks category as part of a wider road events API. These were all recorded in different formats, and mappings were established to normalise the roadworks data for use in the prototype website.
- In the visualisation it is possible to search by month. The month filter is based on the start date of each event. However, the Department of Infrastructure was not collecting data in January to May 2019 and July to August 2020. Events still appear for those periods because some events are backdated.
- The month filter doesn’t show all roadworks occurring in the selected month, as not all events have end dates. As a result, the filter shows only roadworks starting in that month.
- Standardised reporting of category mapping, date filters and reference numbers will improve the accuracy of the visualisation, and adding spatial descriptors and direction of travel will provide further detail.
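The difference between filtering on start date and filtering on activity within a month can be sketched as follows. The field names "start" and "end" and both function names are assumptions for illustration; the visualisation's actual filter logic is not documented here.

```python
from datetime import date


def starts_in_month(event, year, month):
    """Start-date filter: does the event start in the given month?"""
    start = event["start"]
    return start.year == year and start.month == month


def active_in_month(event, year, month):
    """Overlap filter: is the event active at any time in the given month?

    Returns None when the answer depends on a missing end date.
    """
    month_start = date(year, month, 1)
    if month == 12:
        next_month_start = date(year + 1, 1, 1)
    else:
        next_month_start = date(year, month + 1, 1)
    if event["start"] >= next_month_start:
        return False  # starts after the month, so it cannot overlap
    if event.get("end") is None:
        return None  # started before or during the month; end unknown
    return event["end"] >= month_start
```

An event that starts on 20 October 2020 and ends on 5 November 2020 is active in November but would not appear under a start-date filter for that month. Without an end date, the overlap test cannot decide, which is why the start-date filter is the fallback.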