Wednesday, October 1, 2008

Katie R - Blog 4 - Yale Daily News Historical Archive


The Yale University Libraries are working on archiving the Yale Daily News, the oldest student-run daily college newspaper in the country. The Yale Daily News Historical Archive is a digital project started in 2006 by the Yale library system and two other partners.

Collection Principles
The project aims to digitize the entire run of the Yale Daily News up to 2001, when the paper began archiving its issues online. So far only selected runs of years have been digitized, but they wisely started with important eras: the first year of publication (1878-79), the WWI years, the WWII years, the late 1960s, and 1978-81, the start of A. Bartlett Giamatti's presidency at Yale. The group responsible for the digitization, DPIP (the Digital Production and Integration Program), seeks funding from grants and from alumni interested in sponsoring specific years. Starting with historically significant years makes the project more useful to researchers from the outset.

Object Characteristics
To search the collection, users must go to a separate site that supports searching and browsing (this site is actually the main collection URL). ContentYale explains that during digitization "an image of each page is made. These images are 'cleaned up' to improve readability on the screen and to improve the quality of printing." The cleanup removes any yellowing of the paper, which is somewhat questionable from an archival standpoint but understandable from a content-research one. Also commendable is the zoning technology: articles are zoned "when the material is scanned, so that articles that display on different pages are joined together." Clicking on a zoned area brings up the entire article, and the zoom feature lets users get a close look at individual words.


Metadata
Each page of the newspaper has been indexed and OCR'd. Yale provides detailed information about its digitization process, including the different contractors and their locations. The site explains that "the images are delivered in lossless JPEG 2000 format (300 ppi, 8-bit, high-contrast grayscale, or 24-bit color)," and then "word positions, article boundaries, document structure, and OCR output are encoded following METS/ALTO standards. Finally, cleanup work (such as verification of article boundaries) and quality control checks are performed by staff in Cambodia." The files are stored on a server at Yale.
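To make "word positions ... encoded following METS/ALTO standards" a little more concrete, here is a small Python sketch. The ALTO fragment below is a hypothetical example I made up, not a file from the YDN archive, and the function names (`word_positions`, `find_word`) are my own. It just shows how each OCR'd word is stored with its coordinates on the page image, which is what lets a search hit be located on the scan.

```python
import xml.etree.ElementTree as ET

# Hypothetical ALTO (v2) fragment, NOT taken from the actual archive:
# one text line containing two OCR'd words with their page coordinates.
SAMPLE_ALTO = """<alto xmlns="http://www.loc.gov/standards/alto/ns-v2#">
  <Layout>
    <Page ID="P1" WIDTH="5000" HEIGHT="7000">
      <PrintSpace>
        <TextBlock ID="TB1">
          <TextLine>
            <String CONTENT="Yale" HPOS="120" VPOS="340" WIDTH="210" HEIGHT="60"/>
            <SP/>
            <String CONTENT="News" HPOS="350" VPOS="340" WIDTH="230" HEIGHT="60"/>
          </TextLine>
        </TextBlock>
      </PrintSpace>
    </Page>
  </Layout>
</alto>"""

# Namespace prefix used in the XPath-style queries below.
NS = {"alto": "http://www.loc.gov/standards/alto/ns-v2#"}


def word_positions(alto_xml):
    """Return (word, hpos, vpos, width, height) for every String element."""
    root = ET.fromstring(alto_xml)
    words = []
    for s in root.iterfind(".//alto:String", NS):
        # ALTO coordinates are not always integers, so parse them as floats.
        words.append((
            s.get("CONTENT"),
            float(s.get("HPOS")),
            float(s.get("VPOS")),
            float(s.get("WIDTH")),
            float(s.get("HEIGHT")),
        ))
    return words


def find_word(alto_xml, term):
    """Return the entries whose word matches term, ignoring case."""
    return [w for w in word_positions(alto_xml) if w[0].lower() == term.lower()]


if __name__ == "__main__":
    print(find_word(SAMPLE_ALTO, "news"))
```

Searching for "news" returns the word together with its box (HPOS 350, VPOS 340, width 230, height 60), which a page viewer could use to highlight the hit on the image.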
Because every page is captured in full, copyright has become a concern for the later issues. At my internship this summer I sat in on a DPIP meeting about how to handle copyright for the comics that appeared in the paper in the 1970s and '80s. The group was discussing restricting access to those items to Yale-affiliated users in order to avoid copyright problems with strips like Family Circus and Garfield.


Intended Audience
This digitization project is aimed at researchers, and it also benefits the library, since digital copies of the older YDN issues will reduce handling of the physical originals. The collection is also of interest to Yale alumni, a strong and engaged community.
