Monday, June 20, 2011

integrating text into the Naval Reactors History Database

With my XTF presentation at Code4Lib Northwest completed, I've begun to make more significant modifications to the XTF instance that supports the Naval Reactors History Database service. One need was prompted by the inclusion of text content, in the form of documents that describe NR's work in Project Prometheus.

Previously, the database was composed solely of image files, many of which contain internal text metadata that is indexed by XTF. Now, I'm adding textual content, in PDF format, to the index. This change introduced a problem: in a displayed record, the Matches field shows text snippets from both the image metadata and the text files. These two types (image and text) need to be differentiated so that the display is more comprehensible.

Solution: Modify resultFormatter.xsl so that the Matches display is contextually customized. Two xsl:if elements are added, with the differentiation based upon the data in the XML metadata's Dublin Core Type field.
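A minimal sketch of what such a pair of xsl:if elements might look like follows. It is an illustration under assumptions, not the exact change made to the site: it assumes the Dublin Core Type value is exposed on each hit as meta/type, that the values are "Image" and "Text", and that matching snippets appear as snippet children handled by templates in a "text" mode. The template name matches-row and the row labels are hypothetical; in practice the two tests would sit inside the existing hit template in resultFormatter.xsl.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet version="2.0"
                xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

  <!-- Hypothetical named template, called with a search hit as the
       context node. It only illustrates the two conditional branches. -->
  <xsl:template name="matches-row">

    <!-- Assumption: the Dublin Core Type value is available as meta/type.
         Lower-casing makes the comparison tolerant of "Image" vs "image". -->
    <xsl:variable name="dcType" select="lower-case(string(meta/type[1]))"/>

    <!-- Branch 1: hits that came from metadata embedded in an image file -->
    <xsl:if test="snippet and $dcType = 'image'">
      <tr>
        <td><b>Matches in image metadata:</b></td>
        <td><xsl:apply-templates select="snippet" mode="text"/></td>
      </tr>
    </xsl:if>

    <!-- Branch 2: hits that came from the full text of a PDF document -->
    <xsl:if test="snippet and $dcType = 'text'">
      <tr>
        <td><b>Matches in document text:</b></td>
        <td><xsl:apply-templates select="snippet" mode="text"/></td>
      </tr>
    </xsl:if>

  </xsl:template>

</xsl:stylesheet>
```

Two separate xsl:if tests, rather than an xsl:choose, mean that a record whose Type is neither value simply shows no Matches row, which may or may not be the behavior you want.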

I'm continuing to look at XTF programming possibilities, including those described in Rowan Brownlee's XTF guide.

Also, I have to say that in coding this change, I took the NRHDB site down for several seconds or even minutes. For that reason, I'll be testing changes in a test instance, running parallel to the production site, in order to eliminate downtime. I can do this using EC2 micro instances that I terminate upon completion of testing. It's at this point that the open source/cloud blend is most advantageous. Instead of licensing commercial digital collections software for both production and test and running both servers locally, I can run the production instance 24/7/365 using an Amazon Web Services EC2 Reserved Instance and spin up an EC2 micro instance on demand for several hours to customize and extend XTF as needed. And since XTF and its supporting components are all open source, there are no software licensing costs for this work.
