Monday, June 20, 2011

integrating text into the Naval Reactors History Database

With my XTF presentation at Code4Lib Northwest completed, I've begun to make more significant modifications to the XTF instance that supports the Naval Reactors History Database service. One need was prompted by the inclusion of text content, in the form of documents that describe NR's work in Project Prometheus.

Previously, the database was composed solely of image files, many of which contain internal text metadata that is indexed by XTF. Now, I'm adding textual content, in PDF format, to the index. This change introduced a problem: in a displayed record, the Matches field shows text snippets from both the image metadata and the text files. These two types (image and text) need to be differentiated so that the display is more comprehensible.

Solution: Modify the resultFormatter.xsl file so that the Matches display is contextually customized. Two xsl:if elements are added, with the differentiation based upon the data in the XML metadata's Dublin Core Type field.
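
As a rough sketch of the idea (not the exact code running on the site), a stand-alone stylesheet along these lines shows how two xsl:if tests can switch the label. The docHit, meta/type, and snippet element names, the 'image' and 'text' type values, and the label wording are all assumptions made for illustration; the real resultFormatter.xsl and metadata may differ.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<!-- Illustrative sketch only: element names and type values are assumptions. -->
<xsl:stylesheet version="1.0"
                xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

  <xsl:output method="html" indent="yes"/>

  <!-- Assumed input: each search hit is a docHit element that carries
       its metadata in meta/type and its matched text in snippet children. -->
  <xsl:template match="docHit">

    <!-- Hit whose Dublin Core Type is 'image' (assumed value) -->
    <xsl:if test="snippet and meta/type = 'image'">
      <p>
        <b>Matches (image metadata): </b>
        <xsl:for-each select="snippet">
          <xsl:value-of select="normalize-space(.)"/>
          <!-- Separate consecutive snippets with an ellipsis -->
          <xsl:if test="position() != last()">
            <xsl:text> ... </xsl:text>
          </xsl:if>
        </xsl:for-each>
      </p>
    </xsl:if>

    <!-- Hit whose Dublin Core Type is 'text' (assumed value) -->
    <xsl:if test="snippet and meta/type = 'text'">
      <p>
        <b>Matches (document text): </b>
        <xsl:for-each select="snippet">
          <xsl:value-of select="normalize-space(.)"/>
          <xsl:if test="position() != last()">
            <xsl:text> ... </xsl:text>
          </xsl:if>
        </xsl:for-each>
      </p>
    </xsl:if>

  </xsl:template>

  <!-- Suppress stray text outside docHit so only the labeled lines appear -->
  <xsl:template match="text()"/>

</xsl:stylesheet>
```

One consequence of using two independent xsl:if tests is that a record whose type is missing or has some other value gets no Matches line at all, so the type values need to match the metadata exactly, including case.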

I'm continuing to look at XTF programming possibilities, including those described in Rowan Brownlee's XTF guide.

Also, I have to say that in coding this change, I took the NRHDB site down for several seconds or even minutes. For that reason, I'll be testing future changes in a test instance, running parallel to the production site, in order to eliminate downtime. I can do this using EC2 micro instances that I terminate upon completion of testing. It's at this point that the open source/cloud blend is most advantageous. Instead of licensing commercial digital collections software for both production and test and running both servers locally, I can run the production instance 24/7/365 using an Amazon Web Services EC2 Reserved Instance and spin up an EC2 micro instance on demand for several hours to customize and extend XTF as needed. And since XTF and its supporting components are all open source, there are no software costs for this work.

Wednesday, June 15, 2011

gaaa....Powell's Technical Books is closed


In yet another sign of the times, Powell's Technical Bookstore is now closed. This happened last fall, I believe, but I only found out while attending Code4Lib Northwest in Portland earlier this week.

There is a much smaller (relative to the previous Technical Books) Powell's 2 location, with computer and science books. This store is on the same block as the large Powell's store.

I'm learning to love reading on my Kindle, but I'll miss the bricks-and-mortar stores, no doubt.

Tuesday, June 7, 2011

customizing stop words in XTF

I've been immersing myself in the inner workings of the California Digital Library's XTF platform. I expect to make a number of changes to my XTF-based Naval Reactors History Database service in the next few months, in preparation for a fall LITA Forum presentation. The change described in this post is actually pretty trivial - adding a customized stop words list for an XTF instance - but it illustrates the kind of back-end customizations that are possible.

I decided to use the stop words list provided on the SEO Tools website.
To employ the list in XTF, I copied the file to the xtf/conf/stopwords directory, replacing the stopwords.txt file that was included in the release version of XTF with the one that I obtained from the SEO Tools site.

I then stopped Apache Tomcat and rebuilt the XTF index. A clean build is recommended, as described in this XTF users group post. (I received the error described in that post before resorting to a clean build.) Upon restarting Tomcat, the new stop words list was in use.