Use of machine readable open data to streamline government reporting processes

The purpose of this post is to analyse how the publication of open data can benefit existing government reporting processes and to emphasise the importance of making open data available in machine readable format.

For this we are going to take an example in New South Wales, related to the reporting process around Development Applications.

Note: Although we have used real data for this post, the purpose of this post is solely to comment on the process itself, and not the performance of the organisations from which we have used the data.

Situation:

A Development Application (DA) is an administrative process that any party (an individual or a business) needs to follow before undertaking construction work, e.g. modifying an existing building or building a new property. This process is largely managed by local councils, which also hold the associated data. The majority of councils in NSW make this data available to the public for consultation (open data), in various formats.

Every year, a government agency processes data from the 152 councils to produce the Local Development Performance Monitoring (LDPM) report. The report contains a variety of metrics such as the number of DAs, their value and their processing time, and is made available to the public.

The process involved in producing this report is largely manual. The agency collects data from the 152 councils once a year and then compiles and produces the PDF report. Due to the manual nature of the process, the report is produced only once a year and is typically published several months after the end of the financial year.

Possible improvements using open data technologies:

The government agency asked us to investigate how the process could be streamlined and to develop a proof-of-concept of what a new generation report could look like using machine readable open data.

We started by analysing the format of the DA data sets made available by councils. Almost all of them publish DA data through online tools as HTML pages, i.e. in a form that is not machine readable, which makes it impossible to automate the generation of the LDPM report. It is also worth noting that the set of attributes available varies greatly between councils.
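
To illustrate why machine readability matters, here is a minimal sketch of how LDPM-style metrics (number of DAs, total value, processing time) could be computed once DA data is available as CSV. The column names and the three sample rows are invented for illustration and do not come from any council's data set.

```python
import csv
import io
from datetime import date
from statistics import median

# Invented sample extract; the column names are assumptions, not a real schema.
SAMPLE_CSV = """da_id,lodged,determined,estimated_cost
DA-001,2014-01-06,2014-02-05,250000
DA-002,2014-01-20,2014-03-21,1200000
DA-003,2014-02-03,2014-02-17,80000
"""


def summarise(csv_text):
    """Return the DA count, total value and median processing time in days."""
    rows = list(csv.DictReader(io.StringIO(csv_text)))
    # Processing time = days between lodgement and determination.
    days = [
        (date.fromisoformat(r["determined"]) - date.fromisoformat(r["lodged"])).days
        for r in rows
    ]
    return {
        "count": len(rows),
        "total_value": sum(int(r["estimated_cost"]) for r in rows),
        "median_days": median(days),
    }


summary = summarise(SAMPLE_CSV)
print(summary)
```

The same few lines cannot be written reliably against HTML pages, whose layout differs from council to council and can change without notice.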

Our analysis identified only two councils that follow good open data practice and publish DA data in machine readable format: Mosman and the City of Sydney.

The City of Sydney is particularly interesting as it provides access to approximately ten years of historical DA data. We used those two data sources to demonstrate what a new generation DA report could look like. Its main characteristics can be summarised as follows:

  • It can be refreshed on an ongoing basis (e.g. weekly) as soon as new DA data is made available by councils, as opposed to the yearly publication of the current report
  • It can be presented interactively and made accessible through various channels, e.g. web, mobile and tablet, rather than just PDF
  • Access to the raw data can be given to the public through a data API, so that they can create services around it. For example, construction companies could use the data to track new development applications in which they may see commercial opportunities; other organisations may get value from accessing and analysing this data.

A sample output from our Proof-Of-Concept (POC) DA dashboard can be accessed by clicking on the image below. For the purpose of this blog post we are only displaying a sanitised subset of the reports.

[Image: DA dashboard]

The solution

In order to produce this POC we’ve developed ETL jobs to extract, transform and load DAs from Mosman and two years of DAs from the City of Sydney into OpenDataSoft (2013 and 2014, totalling approx. 6,600 DAs).

[Image: ETL job]
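
The transform and load steps of such a job can be sketched roughly as follows. This is not the actual ETL job we built: the council labels, source column names, date formats and sample records are all hypothetical, and a plain Python list stands in for the target repository.

```python
from datetime import datetime

# Hypothetical per-council field mappings: source column -> common column.
FIELD_MAPS = {
    "council_a": {"ApplicationNo": "da_id", "LodgedDate": "lodged", "Cost": "value"},
    "council_b": {"ref": "da_id", "date_lodged": "lodged", "est_value": "value"},
}
# Hypothetical date format used by each source.
DATE_FORMATS = {"council_a": "%d/%m/%Y", "council_b": "%Y-%m-%d"}


def transform(council, record):
    """Rename fields to the common schema and normalise dates and values."""
    mapping = FIELD_MAPS[council]
    out = {common: record[source] for source, common in mapping.items()}
    out["council"] = council
    # Store every lodgement date as an ISO 8601 string.
    parsed = datetime.strptime(out["lodged"], DATE_FORMATS[council])
    out["lodged"] = parsed.date().isoformat()
    # Strip currency formatting before converting the value to a number.
    out["value"] = float(str(out["value"]).replace("$", "").replace(",", ""))
    return out


def load(repository, records):
    """Append transformed records to the shared repository (a list here)."""
    repository.extend(records)
    return repository


repository = []
load(repository, [transform("council_a", {
    "ApplicationNo": "D/2014/1", "LodgedDate": "06/01/2014", "Cost": "$250,000"})])
load(repository, [transform("council_b", {
    "ref": "8.2014.12", "date_lodged": "2014-01-20", "est_value": "1200000"})])
print(repository)
```

Keeping the per-council differences in mapping tables means that adding a new council is mostly a matter of adding one more mapping entry, rather than writing a new job.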

The raw dataset can be accessed by clicking on the image below. Data can be filtered according to a number of parameters including location, value and assessment time. It can be visualised on maps and charts, and accessed through a data API.

[Image: raw dataset]
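
As an illustration of the kind of filtering described above, here is a small sketch that filters DA records by location, value and assessment time. It works on an in-memory list and does not use the OpenDataSoft API; the field names and sample records are assumptions made for this example.

```python
def filter_das(records, suburb=None, min_value=None, max_days=None):
    """Return the records that match every filter that is not None."""
    result = []
    for r in records:
        if suburb is not None and r["suburb"].lower() != suburb.lower():
            continue
        if min_value is not None and r["value"] < min_value:
            continue
        if max_days is not None and r["assessment_days"] > max_days:
            continue
        result.append(r)
    return result


# Invented sample records using hypothetical field names.
records = [
    {"da_id": "A1", "suburb": "Glebe", "value": 250000, "assessment_days": 30},
    {"da_id": "A2", "suburb": "Mosman", "value": 1200000, "assessment_days": 60},
    {"da_id": "A3", "suburb": "Glebe", "value": 80000, "assessment_days": 14},
]

matches = filter_das(records, suburb="glebe", min_value=100000)
print([r["da_id"] for r in matches])
```

A construction company tracking opportunities could run a filter like this on each refresh of the data, for instance to list higher value DAs in the suburbs it operates in.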

With such an integrated open data solution now available, it would be straightforward to consume DAs from additional councils as soon as they become available in machine readable format. Over time, this would allow further automation of the production of the LDPM report and the aggregation of all councils' DAs into one centralised and standardised repository.