New tool boosts safe data sharing

The national science agency has developed a computer model that can help government agencies minimise the risk of releasing sensitive personal information in open datasets.


The Personal Information Factor (PIF) uses a data analytics algorithm to identify the risk of de-identified personal information contained in a dataset being matched to its owner.

CSIRO says this means that data and privacy experts who traditionally do the analysis can now rely on a computer model to validate their work.

Tracking COVID-19

An early version of the tool is currently being used by the NSW government to analyse datasets tracking the spread of COVID-19, and it’s also being used in areas like domestic violence data and public transport use.

CSIRO has been working with the Cyber Security CRC (CSCRC) to enhance the tool since 2020.

“Every day, it helps us analyse the security and privacy risks of releasing de-identified datasets of people infected with COVID-19 in NSW and the testing cases for COVID-19, allowing us to minimise the re-identification risk before releasing to the public,” Chief NSW Data Scientist Dr Ian Oppermann says.

“Given the very strong community interest in growing COVID-19 cases, we needed to release critical and timely information at a fine-grained level detailing when and where COVID-19 cases were identified.

“This also included information such as the likely cause of infection and, earlier in the pandemic, the age range of people confirmed to be infected.

“We wanted the data to be as detailed and granular as possible, but we also needed to protect the privacy and identity of the individuals associated with those datasets.”

Attack scenarios

Project lead researcher and Senior Research Scientist at CSIRO’s Data61, Dr Sushmita Ruj, says the PIF takes a tailored approach to each dataset by considering the various ‘attack scenarios’ that could be used to re-identify information.

“The tool then assigns a PIF score to each set,” she says.

“If the PIF is higher than a desired threshold, the program makes recommendations on how to design a more secure and safe framework.”
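The article does not describe how the PIF score is calculated, so the following is only an illustrative sketch of the general idea of scoring a dataset and comparing the score with a threshold. It assumes a simple uniqueness-based measure (the average chance of singling out a record among those sharing the same attribute values), which is not the PIF algorithm itself. The function names, column names, sample records and the 0.6 threshold are all hypothetical.

```python
from collections import Counter


def reidentification_risk(records, quasi_identifiers):
    """Illustrative risk score between 0 and 1 (NOT the actual PIF).

    Each record contributes 1 / (number of records sharing its
    quasi-identifier values); the score is the average over all records.
    """
    if not records:
        return 0.0
    keys = [tuple(r[q] for q in quasi_identifiers) for r in records]
    group_sizes = Counter(keys)
    return sum(1 / group_sizes[k] for k in keys) / len(keys)


def exceeds_threshold(records, quasi_identifiers, threshold):
    """Return (score, flagged); flagged is True when score > threshold."""
    score = reidentification_risk(records, quasi_identifiers)
    return score, score > threshold


# Hypothetical, made-up records for illustration only.
records = [
    {"age_range": "30-39", "postcode": "2000"},
    {"age_range": "30-39", "postcode": "2000"},
    {"age_range": "40-49", "postcode": "2000"},
    {"age_range": "70-79", "postcode": "2650"},
]

detailed = exceeds_threshold(records, ["age_range", "postcode"], 0.6)
coarser = exceeds_threshold(records, ["postcode"], 0.6)
print(detailed)
print(coarser)
```

In this toy example, releasing both age range and postcode is flagged against the assumed threshold, while releasing postcode alone is not. This mirrors the trade-off Dr Oppermann describes between granular data and privacy protection.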

The CSCRC’s Research Director, Professor Helge Janicke, says PIF is unique because it provides a scale by which risk can be assessed.

“PIF is hugely valuable in achieving the ethical and responsible sharing of critical data, with this technology allowing data owners to fully assess the risks and residual impacts associated with data sharing,” Professor Janicke says.

The PIF was developed by CSIRO’s digital unit Data61 in collaboration with the state and Commonwealth governments, the Australian Computer Society (ACS) and industry groups.

It’s expected to be made available for wider public use by June 2022.
