Dec 11
Introduction to SOLR “Enterprise Search”
Filed Under Lucene
SOLR bills itself as an open source enterprise search engine. I would not go so far as to call it “enterprise,” but I do believe SOLR is a nice value-added wrapper around the already powerful Lucene package. It may become “enterprise class,” but it has a way to go. That said, it’s certainly good to aim high!
Raw Lucene search engines have a few issues that SOLR addresses very nicely, including:
- Platform Lock-in (Java)
- Indexing requires custom Java coding
- Existing document update issues
- Index Replication
- IndexSearcher warming (fake automated queries to prepopulate the cache)
SOLR runs as a Java web application. The nightly build of SOLR that I downloaded came bundled with Jetty, which makes it very easy to run, test, and even deploy in a light-duty production role.
Instead of being a Java library that you call directly from your Java code, SOLR works as a web application: you index documents by POSTing them over HTTP, and you query with HTTP GET requests. This HTTP interaction means your application does not need to be written in Java. It can be in any language that can post and request data via HTTP.
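To make the HTTP interaction concrete, here is a minimal Python sketch of both halves. It assumes the bundled Jetty example is listening at http://localhost:8983/solr with /update and /select handlers, and the field names (id, name) are made up for illustration; check your own build and schema before relying on any of it.

```python
import urllib.parse
import urllib.request
import xml.etree.ElementTree as ET

# Assumption: base URL of the bundled Jetty example; adjust for your setup.
SOLR_BASE = "http://localhost:8983/solr"


def build_add_xml(doc):
    """Build an <add><doc>...</doc></add> payload from a dict of field values."""
    add = ET.Element("add")
    doc_el = ET.SubElement(add, "doc")
    for name, value in doc.items():
        field = ET.SubElement(doc_el, "field", name=name)
        field.text = str(value)
    return ET.tostring(add, encoding="utf-8")


def build_select_url(query, rows=10):
    """Build the GET URL for a search; q and rows are URL-encoded."""
    params = urllib.parse.urlencode({"q": query, "rows": rows})
    return f"{SOLR_BASE}/select?{params}"


def post_xml(payload):
    """POST an XML payload to the update handler (needs a running server)."""
    request = urllib.request.Request(
        f"{SOLR_BASE}/update",
        data=payload,
        headers={"Content-Type": "text/xml; charset=utf-8"},
    )
    with urllib.request.urlopen(request) as response:
        return response.read()


# Hypothetical document; the field names must match your schema.
payload = build_add_xml({"id": "SM-1", "name": "Moonlight Sonata"})
url = build_select_url("name:sonata")
```

With a server running, you would call post_xml(payload), then post_xml(b"<commit/>") to make the document searchable, and finally fetch url with any HTTP client.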
When it comes to setting up your index, you do not have to code your field information into Java. Instead, it is set up in an XML file. This file, among other things, includes the list of document fields and the document primary key needed to maintain your index.
If you are familiar with Lucene, my mention of the primary key may have piqued your interest. For those not familiar with Lucene, it does not have any notion of primary keys or document updates. To update a document in Lucene, you first need to locate and delete the previous version of the document through a rather indirect process. When provided with a primary key, SOLR handles that process automatically for you.
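The difference is easy to see in a toy model. The class below is a plain in-memory list written only for illustration; it is not Lucene or SOLR code, and the names ToyIndex and key_field are invented. Without a key, adding a changed document leaves two copies behind. With a key, the old copy is removed before the new one is added.

```python
class ToyIndex:
    """Toy in-memory stand-in for an index; not Lucene or SOLR code."""

    def __init__(self, key_field=None):
        self.key_field = key_field  # None means "no notion of a primary key"
        self.docs = []

    def add(self, doc):
        if self.key_field is not None:
            # Delete any earlier version that has the same key, then add.
            key = doc[self.key_field]
            self.docs = [d for d in self.docs if d[self.key_field] != key]
        self.docs.append(doc)


# No key: the "update" just piles a second copy on top of the first.
raw = ToyIndex()
raw.add({"id": "SM-1", "price": "9.99"})
raw.add({"id": "SM-1", "price": "7.99"})

# With a key: the second add replaces the first.
keyed = ToyIndex(key_field="id")
keyed.add({"id": "SM-1", "price": "9.99"})
keyed.add({"id": "SM-1", "price": "7.99"})
```

After this runs, raw.docs holds both versions of SM-1 while keyed.docs holds only the newer one, which is the behavior you would otherwise have to code by hand.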
The SOLR example that is included with the nightly build includes a fairly simple script and sample documents to show how indexing works. Run the example to index the samples, then run searches through the admin interface. To use the search results in your own application, query the same URL that shows up in the admin interface and parse the XML response.
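Parsing the response takes only a few lines. The sketch below works on a hand-written sample rather than a live server, and it assumes the response has a result element with a numFound attribute and doc children holding named fields. Your build's output may differ, so compare it against what the admin interface actually returns.

```python
import xml.etree.ElementTree as ET

# Hand-written sample in the assumed response shape; not captured output.
SAMPLE_RESPONSE = """<response>
  <lst name="responseHeader"><int name="status">0</int></lst>
  <result name="response" numFound="2" start="0">
    <doc><str name="id">SM-1</str><str name="name">Moonlight Sonata</str></doc>
    <doc><str name="id">SM-2</str><str name="name">Clair de Lune</str></doc>
  </result>
</response>"""


def parse_results(xml_text):
    """Return (total hit count, list of docs as dicts of field name to text)."""
    root = ET.fromstring(xml_text)
    result = root.find("result")
    total = int(result.get("numFound"))
    docs = [
        {field.get("name"): field.text for field in doc}
        for doc in result.findall("doc")
    ]
    return total, docs


total, docs = parse_results(SAMPLE_RESPONSE)
for doc in docs:
    print(doc["id"], doc["name"])
```

The total count is what you would use to decide whether a next page link is needed.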
As an initial experiment, I have used SOLR to index product information on my sheetmusic site. It’s used for the sheetmusic search engine as well as for the related products query. My search engine implementation is only really “quick hack” quality (I have not implemented next/previous page links yet), but the related products feature is more polished.