Over 60 million pages of digitized Canadian heritage documents now accessible

The Canadian Research Knowledge Network made its Canadiana collections open to the public on January 1.

BY MICHAEL RANCIC | JAN 21 2019

As of January 1, the Canadian Research Knowledge Network has made its Canadiana collections – the largest online collections of early textual artifacts pertaining to Canadian culture – fully accessible to the public at no charge.

Given the historical and public significance of the collections, the organization, which represents a partnership of 75 universities, considered it its highest priority to make them widely available not only to its members but to the public as well. “It isn’t just the academic researcher but the citizen researcher who is interested in this content,” explained Clare Appavoo, executive director at CRKN. “So there’s a much broader interest in this content for the full public.”

The network acquired the collections as part of a 2018 merger with Canadiana.org, which was both a subscription-based platform for historical research and a coalition of memory institutions dedicated to the digital documentation and preservation of Canadian heritage. Ms. Appavoo added that CRKN is also committed to making sure that the Canadiana content, all 60 million pages of it, is open access. “Part of the commitment that the CRKN members made when we went through the merger [with Canadiana] last April, was that the content should be accessible, open access ultimately.”

The Canadiana Collection is divided into three sub-collections: Early Canadiana Online, Canadiana Online and Héritage. The first two collections feature over 19 million pages of historical content, including monographs, government publications and newspapers, primarily published prior to 1920.

The largest collection is Héritage, created in partnership with Library and Archives Canada, which identified items from its own holdings to be included in the Canadiana repository. Its 41 million pages combine collections from government departments and personal correspondence from prime ministers, with content generally dating from the 1600s to the mid-1900s, all scanned from microfilm. “These are the multiple stages of our digital world – going from the original print version that then was microfilmed and now we’re digitizing that to the current standard,” Ms. Appavoo said.

A photo of children playing in the snow from Canadian Pictorial, vol. 7, no. 1, published in December 1911.

“It’ll be huge for my grad students,” said Daniel Ross, an assistant professor of history at Université du Québec à Montréal and public outreach coordinator for ActiveHistory.ca. “They’ll no longer have the often onerous financial cost associated with going to Ottawa for a week to get into the archives. They’re able to access it from their home or from the university, so that’s something they can do in consultation with me.” Dr. Ross also uses materials from the collection in his undergrad class, so he now has the added benefit of introducing the site to the classroom.

“I think virtually anyone who’s studying Canadian history will consult this archive at one moment or another, it’s just the essentials,” Dr. Ross added.

How searchable the collections are depends on their sources. All of the material in the Canadiana Online and Early Canadiana Online collections has been put through optical character recognition (OCR) software, which converts scanned images of text into editable text documents, so it is full-text searchable. However, Beth Stover, manager of digitization and heritage collections at CRKN, said the Héritage collection proved more difficult to prepare because a great deal of its content is handwritten. That collection is still being processed using a combination of OCR and transcription.

“It’s a very slow process,” Ms. Stover said. “The Héritage microfilm reels have 7,000 pages on each reel. It’s really hard for the OCR software to process it. Over 80 different collections were sent out for transcription to another organization that went through page by page and wrote down what was in that collection.”

Although moving the documents online increases their accessibility, it also puts an increased demand on CRKN, which hosts and maintains the vast amount of data. “That’s an ongoing cost and maintenance, and that’s what the CRKN members have agreed to continue to fund with a three-year commitment,” Ms. Appavoo said. “We are working toward updating to current and best practices in such a way that we can continue to evolve. And that the platform continues and can continue to meet those new standards as they arrive.”

COMMENTS

  1. Paul Knox / January 24, 2019 at 08:44

    The CRKN and its partners deserve high praise for this effort to make primary-source material widely accessible. The rapidly growing trove of searchable documents has transformed certain areas of inquiry by making names, places, linguistic features and visual elements much easier to find and analyze.

    Nevertheless, the practice of digitizing from microfilm where original documents still exist raises important questions. In some cases it should be considered an interim step.

    Digitized microfilm must be openly acknowledged as a deficient technology. The quality of microfilm images is generally poor. Sloppy filming practices have produced blurred, underexposed or overexposed text that may be unreadable by the human eye, let alone OCR software. Poor framing can crop originals and render images incomplete. Bleed-through can occur. Even where photography was high-quality, microfilm does not capture the colour of paper, ink or other material. Existing originals of microfilm-to-digital materials should therefore never be discarded, and access to them should also be open. Their location and condition should be noted in metadata, ideally with details of size, material (e.g. type of paper, ink and printing process) and relevant conditions of production such as medium, creator, audience and number of copies produced.

    Image capture from originals, to international archive specifications, remains the gold standard. It should remain a priority as funds become available, starting with fragile and poorly microfilmed materials. Where microfilmed copies are the only ones available, they too should be priorities for digitization. Both scholars and the public should be aware of these considerations. Preserving what remains of our documented past is more complex than it might seem at first glance.

