Wednesday, December 3, 2008
Hannah Norton Blog 10: Herbert Simon Collection, Carnegie Mellon Libraries
Herbert Simon was a Nobel Prize-winning economist whose work also incorporated aspects of computer science and cognitive psychology and helped develop the field of artificial intelligence. The Herbert Simon Collection at Carnegie Mellon is a full-text digital archive of his papers, including his personal papers, lectures and talks, materials relevant to his schoolwork and his work at the Illinois Institute of Technology and Carnegie Mellon University, his professional publications, and awards.
Collection Principles
The Carnegie Mellon Libraries own the physical archive of Herbert Simon, which was presumably provided to the university directly by him or his family. The digital collection was brought into being through funding from an early IMLS grant (the copyright on the “about” web page is from 2001) in order to create a “Smart Web Exhibit” in conjunction with the Carnegie Museum of Natural History. According to that page, the collection includes 100,000 digital images (i.e., PDFs).
Object Characteristics
Objects are primarily scans of written documents, including Simon’s publications, course schedules, bibliographies, etc. Both an image and a text version are available for each object. The image is a PDF scan of the document that has not undergone OCR. The text is complete but littered with miscellaneous metadata. For example, the “text” of a bibliography contains this type of information at the division between each page of the original document:
..CLPAGE: 3
..PageImage: [image/tiff;00000003.tif]
[text/xdoc;00000003.xdc]
..OrigQlty: good
..OCRQlty: good
..ScanQlty: good
..OCR:
Apparently, this text represents uncorrected OCR, given statements such as “`Now at Slanford Univeraity.” This sort of thing is explained by the fact that this is a relatively old digitization project. Nonetheless, the “text” version is difficult to read through and browse, while the image is impossible to search – not exactly an ideal situation.
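As a rough illustration of how the “text” version could be made easier to read, the page-break markers shown above might be separated from the OCR text with a short script. This is only a sketch based on the single sample quoted in this post: it assumes that every marker line has the form `..Name: value`, that a bare `[type;file]` line continues the preceding marker, and that `..CLPAGE` starts a new page. The function name `split_pages` is hypothetical, and the real files may contain other fields or layouts.

```python
import re

# Assumed from the sample in this post: marker lines look like "..Name: value".
FIELD = re.compile(r"^\.\.(\w+):\s*(.*)$")
# Assumed: a bare "[type;file]" line continues the preceding marker line.
BRACKETED = re.compile(r"^\[[^\];]+;[^\]]+\]$")


def split_pages(raw):
    """Return a list of pages, each a dict with 'meta' and 'text' keys."""
    pages = []
    page = None
    in_header = False
    for line in raw.splitlines():
        stripped = line.strip()
        field = FIELD.match(stripped)
        if field:
            key, value = field.groups()
            # "..CLPAGE" is treated as the start of a new page.
            if key == "CLPAGE" or page is None:
                page = {"meta": {}, "lines": []}
                pages.append(page)
            page["meta"][key] = value
            in_header = True
        elif in_header and BRACKETED.match(stripped):
            continue  # extra file reference inside the header; skip it
        else:
            if page is None:  # text that appears before any page marker
                page = {"meta": {}, "lines": []}
                pages.append(page)
            page["lines"].append(line)
            in_header = False
    return [
        {"meta": p["meta"], "text": "\n".join(p["lines"]).strip()}
        for p in pages
    ]
```

Joining the `text` values of the returned pages would give a version of the document without the interruptions, while the `meta` values (for example `OCRQlty`) stay available for anyone who wants to filter out poorly scanned pages. It would not, of course, fix the OCR errors themselves.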
Metadata
In the text version of each document, a number of metadata elements are available, including document type, location in the physical archive (quite useful), title, and date. The elements mentioned above that are visible throughout the document indicate that data on the quality of the scan was collected throughout the scanning process and that the master scans are TIFF files.
Intended Audience
The intended audience is clearly scholars of economics, computer science, and/or psychology. In fact, in the description of the collection’s contents, the following statement appears at the bottom of the page: “A working knowledge of economic studies, artificial intelligence, computer science, and cognitive psychology would assist in understanding the collection.”
Like other digital collections we’ve discussed in class, this digital archive was most likely quite an accomplishment at the time it was produced, but it is now outdated. Scholars with an interest in Herbert Simon (of whom there are probably many, since he is a Nobel laureate) will most likely find these documents useful, even in their current condition, but it is a shame that nothing has been done to update the interface to this collection.