FAQ
Scanning and OCR
We used OmniPage Pro 9.0 with a Scantek III scanner for scanning and OCR, and saved the output in Microsoft Word. Handwritten notes, however, had to be typed in manually. Most of the text came through cleanly. Some of the text was rewritten with digital access in mind: for instance, we added related links and shortened the biographical history. Some of the unittitle elements were reformatted. Rather than imitate the paper copy, we tried to think from scratch about the content of the digital copy, with access in mind.
Tagging
We have been developing an EAD/SGML template for our finding aids over the last several months, and we recently converted it to XML. We are now writing an up-to-date version of the XSL layout, creating a workflow for future finding aids, and converting the finding aids that are already in digital format.
We've experimented with a number of solutions for tagging the raw electronic files:
Parsing and validating
We used James Clark's SP to parse the XML document. IE5 also validates the document as it loads, using the new MSXML parser that ships with IE5. If the document does not validate, IE5 does not display the data in the browser.
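As a rough sketch of how a validation failure could be reported instead of leaving an empty page, the function below formats the `parseError` object that MSXML documents expose (`errorCode`, `reason`, `line` and `linepos` are MSXML properties). The function name `describeParseError` is a hypothetical helper and is not part of our pages.

```javascript
// Sketch: turn an MSXML parseError object into a readable message.
// errorCode, reason, line and linepos are MSXML parseError properties;
// the function name is hypothetical.
function describeParseError(parseError) {
  if (parseError.errorCode === 0) {
    return "No parse error reported.";
  }
  // MSXML reason strings usually end with a line break; trim it.
  var reason = String(parseError.reason).replace(/\s+$/, "");
  return "Error " + parseError.errorCode +
         " at line " + parseError.line +
         ", column " + parseError.linepos +
         ": " + reason;
}
```

In IE5 this could be called after loading a data island, for example `alert(describeParseError(data.XMLDocument.parseError));`, assuming an `<XML id="data">` element like the one in our transformation page.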
Web pages (green = htm, yellow = xml and blue = xsl)
General navigation
Figure 1. General navigation of .htm pages
The idea is to have a "default" page linked to a "browser" page, which checks whether the user has IE 4.0 or higher and directs the user to the appropriate page. We use JavaScript to do the checking.
    <html>
    <head><title>Browser verify</title></head>
    <body onload="browser();">
    <script language="JavaScript">
    function browser() {
      browser_name = navigator.appName;
      browser_version = parseFloat(navigator.appVersion);
      goodBrowser = false;
      if( browser_name == "Microsoft Internet Explorer" && browser_version >= 4.0 ){
        goodBrowser = true;
      }
      if( goodBrowser ) {
        location.replace("Fuertes.htm");
      } else {
        location.replace("Fuertes-html.htm");
      }
    }
    </script>
    </body>
    </html>
Code 1. Browser
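To show why the `>= 4.0` test in Code 1 also accepts IE5, here is a minimal sketch of the same check written as a function of two strings, so it can be tried with sample values. The function name `isGoodBrowser` is hypothetical, and the sample `appVersion` strings are assumed to be typical of browsers of this period (IE5 reports an `appVersion` that begins with `4.0`).

```javascript
// Sketch: the Code 1 check as a function of two strings.
// The function name is hypothetical.
function isGoodBrowser(appName, appVersion) {
  // parseFloat reads the leading number and ignores the rest of the string.
  var version = parseFloat(appVersion);
  return appName == "Microsoft Internet Explorer" && version >= 4.0;
}

// Sample strings below are assumptions, not captured from real sessions.
isGoodBrowser("Microsoft Internet Explorer",
              "4.0 (compatible; MSIE 5.0; Windows 98)");   // true
isGoodBrowser("Netscape", "4.7 [en] (Win98; I)");           // false
isGoodBrowser("Microsoft Internet Explorer",
              "2.0 (compatible; MSIE 3.02; Windows 95)");   // false
```

Any browser that is not Internet Explorer is sent to the plain HTML version, whatever its version number.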
Fuertes-HTML (an HTML version of the XML document) was created using JavaScript to load both Fuertes.xml and Fuertes.xsl. The script presents the resulting HTML in the browser, from which the output can be copied ('view source'). This HTML output was then tweaked using Microsoft FrontPage.
    <HTML>
    <SCRIPT>
    function display() {
      data.transformNodeToObject(ss.XMLDocument, resultTree.XMLDocument);
      result.document.write(resultTree.xml);
    }
    function refresh() {
      result.document.body.innerHTML = "";
      display();
    }
    </SCRIPT>
    <SCRIPT FOR="window" EVENT="onload">
      data.async = false;
      data.load("fuertes.xml");
      ss.async = false;
      ss.load("fuertes.xsl");
      display();
    </SCRIPT>
    <XML id="data"></XML>
    <XML id="ss"></XML>
    <XML id="resultTree"></XML>
    <FRAMESET>
      <FRAME ID="result">
    </FRAMESET>
    </HTML>
Code 2. Transformation
The Fuertes page
Figure 2. What happens when Fuertes is loaded
Fuertes.htm is a frameset that loads two frames, Navigation (left) and Main (right). It is defined so that every hyperlink in Navigation opens in Main.
The source for Navigation is Fuertes-nav.htm, which contains JavaScript that loads Fuertes.xml and Fuertes-nav.xsl. This is useful because we only want to maintain one XML document. The script is the same as the one used for the transformation (Code 2), except that it loads a different XSL document.
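Since the navigation script and the transformation script differ only in the stylesheet they load, the shared part could be written once and given the file names as parameters. The following is a sketch under that assumption, not the script we actually use: `transformWith` is a hypothetical name, and it uses the MSXML `transformNode` method (which returns the result as a string) rather than the `transformNodeToObject` call in Code 2.

```javascript
// Sketch: load one XML document and one stylesheet, return the result text.
// xmlDoc and xslDoc stand for MSXML document objects (for example the
// XMLDocument of an <XML> data island); the function name is hypothetical.
function transformWith(xmlDoc, xslDoc, xmlUrl, xslUrl) {
  xmlDoc.async = false;   // wait for each load to finish before continuing
  xslDoc.async = false;
  if (!xmlDoc.load(xmlUrl)) {
    throw new Error("Could not load " + xmlUrl);
  }
  if (!xslDoc.load(xslUrl)) {
    throw new Error("Could not load " + xslUrl);
  }
  // transformNode applies the stylesheet and returns the output as a string.
  return xmlDoc.transformNode(xslDoc);
}
```

With data islands named `data` and `ss` as in Code 2, the navigation frame would call `transformWith(data.XMLDocument, ss.XMLDocument, "Fuertes.xml", "Fuertes-nav.xsl")`, and the HTML version would pass `"Fuertes.xsl"` instead.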
The source for Main is Fuertes.xml, which in turn loads Fuertes.xsl to display itself.