As manager of the largest fleet of autonomous underwater research gliders in Canada, Richard Davis oversees a team of 15 to 20 people, including a number of data managers and software developers, at Dalhousie University’s Ocean Frontier Institute (OFI).
“Gliders generate a ton of data. And so it’s very important to us that we have really good data management,” said Mr. Davis, especially since his team deploys the gliders on behalf of researchers who need the data and metadata managed in a way that allows them to properly interpret and analyze it.
But before Mr. Davis was a pro at research data management, he was a biological oceanographer working as a technician on a project to put automated research buoys in Lunenburg Bay in order to develop a model for predicting ocean conditions. Along with principal investigator John Cullen, Mr. Davis spent a lot of time designing and deploying the buoys to make sure everything was working.
“And then all this data started to come back, and then we looked at each other and went, ‘wow, that’s a lot of data, we should probably do something with that.’ We had zero data management plan, it was a complete afterthought,” he said.
Mr. Davis had never worked on such a large multi-investigator project before. The biggest struggle was integrating the data to see the bigger picture.
“It was just assumed that individual researchers who were subject-matter experts would get the data and they would be able to process it and then make it available. But nobody quite realized the amount of effort that was required to try to integrate all that data into a system that made it accessible to everybody else,” he said.
If Mr. Davis could go back and do things differently, he would have had a data management plan from the start, which is exactly what researchers applying for OFI’s help today have to submit.
As in Mr. Davis’ case, intentional research data management planning can make for easier data sharing, but it can also protect against data loss and lead to better reproducibility, verifiability and overall better science.
Changing the culture
Supporting Canadian research excellence across all disciplines is the reason behind a new Research Data Management (RDM) policy released in March by Canada’s federal granting agencies.
“Research is increasingly data intensive, so how you manage your data has a big impact on your research,” said Matthew Lucas, executive director of corporate strategy and performance for the Social Sciences and Humanities Research Council.
But data management is not always a top priority for researchers.
“Research data management is not exciting until you need it,” said Donna Bourne-Tyson, dean of libraries at Dalhousie University and past president of the Canadian Association of Research Libraries, which, along with the tri-agencies and a consortium of like-minded organizations, hopes the new RDM policy will help transform attitudes.
“We need to change the culture so that well-documented data, metadata and code are considered to be as valuable as a well-written and nicely published journal article,” said Jeff Moon, the director of Canada’s New Digital Research Infrastructure Organization Portage Network.
The tri-agencies’ policy promotes data management through three pillars. The first calls on postsecondary institutions and research hospitals that receive agency funding to develop institutional data management strategies by March 2023. The strategies are meant to assess gaps in RDM and educate researchers about resources. The second pillar requires researchers to submit data management plans (DMPs) as part of grant proposals for select funding opportunities starting in the spring of 2022, and the third requires the deposit of grant-funded data into digital repositories. The timeline for the third pillar remains undefined.
“The objective is to ensure researchers are planning at the outset how they’re going to manage their data during the course of their research … that institutions clearly state how they’re able to support their researchers in thinking about and managing their data … [and] to ensure that the data directly related to research outputs … is preserved for the longer term,” said Dr. Lucas.
Leah Cowen, associate vice-president of research at the University of Toronto, said the phased-in approach has broadly been welcomed because it allows for adequate time to prepare supports and implement the policy, which was delayed by a year due to COVID-19.
The policy builds on consultations with university offices of research, research ethics boards and libraries that began following the release of the Tri-Agency Statement of Principles on Digital Data Management in 2016, said Dr. Lucas.
Many institutions had already begun implementing their data strategies and providing resources for researchers ahead of the tri-agency policy release. Dalhousie’s strategy was approved in 2019 and will be updated now to fully align with the tri-agencies’ policy, including a strengthened focus on Indigenous data sovereignty that arose from feedback on a draft version of the policy.
Resources to help plan
As implementation moves forward, Portage is offering resources such as an institutional RDM strategy template and an online bilingual data management planning tool called the DMP Assistant. The assistant helps researchers think through the kind of data they are collecting, how to format, back up and share it, questions of ethics and legal compliance, and more.
Dr. Lucas said that while all of these considerations eventually come up within the course of a research project, creating a DMP at the outset allows researchers “to think about issues in advance of challenges emerging.”
Neither the larger policy nor the tools offered by Portage are meant to be prescriptive, say Dr. Lucas and Mr. Moon. Both are looking for ongoing feedback as the policy is adopted.
Many disciplines already have strong practices around data management and dedicated repositories for their data. For those that don’t, Portage has also developed, among other data deposit initiatives, a national, multidisciplinary, big-data-capable repository: the Federated Research Data Repository (FRDR). FRDR is not only a repository itself, but also indexes metadata from almost 90 other Canadian repositories, directing researchers to data held within them. Through a partnership, that data is also indexed in the European Union repository OpenAIRE, making it available to an even broader group of international researchers.
Though the tri-agencies are clear that the RDM policy does not require deposited data to be open access, depositing that data does offer Canadian researchers an avenue to increase the visibility and re-use of their research.
“We are putting our data on a global stage and it’s driving traffic back to Canadian repositories that house that data,” said Mr. Moon.
Now that’s a reason to make it a top priority.