SOLR bills itself as an open source enterprise search engine.  I would not go as far as to call it “enterprise,” but I certainly believe SOLR is a nice value-added wrapper around the already powerful Lucene package.  It may become “enterprise class,” but it has a ways to go.  That being said, it’s certainly good to aim high!

Raw Lucene search engines have a few issues that SOLR addresses very nicely, including:

  • Platform Lock-in (Java)
  • Indexing requires custom Java coding
  • Existing document update issues
  • Index Replication
  • IndexSearcher warming (fake automated queries to prepopulate the cache)

SOLR runs as a Java web application.  The nightly version of SOLR that I downloaded came bundled with Jetty, in a package that is very easy to run, test, and even deploy into a light-duty production role.

Instead of being a Java library that you call directly from your Java code, SOLR works as a web application: you index documents by POSTing them over HTTP, and you query with HTTP GET requests.  This HTTP interaction means your application does not need to be written in Java.  It can be in any language that can post and request data via HTTP.
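To make the HTTP interaction a little more concrete, here is a minimal sketch that only builds the two things a client sends: an XML body for indexing and a URL for querying. The base URL you pass in, the /select path, and the field names "id" and "title" are assumptions for illustration; check them against your own SOLR install and schema.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLEncoder;

// Sketch only: builds the payloads, it does not send them.
// The field names and the /select path are assumptions.
class SolrHttpSketch {

    // Escape the characters that are special in XML text and attribute values.
    static String escapeXml(String s) {
        return s.replace("&", "&amp;")
                .replace("<", "&lt;")
                .replace(">", "&gt;")
                .replace("\"", "&quot;");
    }

    // XML body to POST for indexing one document with an id and a title.
    static String buildAddXml(String id, String title) {
        return "<add><doc>"
             + "<field name=\"id\">" + escapeXml(id) + "</field>"
             + "<field name=\"title\">" + escapeXml(title) + "</field>"
             + "</doc></add>";
    }

    // URL to GET for a query; the query text is URL-encoded.
    static String buildSelectUrl(String baseUrl, String query) {
        try {
            return baseUrl + "/select?q=" + URLEncoder.encode(query, "UTF-8");
        } catch (UnsupportedEncodingException e) {
            // UTF-8 is always available, so this should never happen.
            throw new IllegalStateException(e);
        }
    }
}
```

Any language that can produce these two strings and speak HTTP can play the same role, which is the point of the HTTP interface.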

When it comes to setting up your index, you do not have to hard-code your field information in Java.  Instead, it is set up in an XML file.  This file, among other things, includes the list of document fields and the document primary key needed to maintain your index.

If you are familiar with Lucene, my mention of the primary key may have piqued your interest.  To those not familiar with Lucene, it does not have any notion of primary keys or document updates.  To update a document in Lucene, you first need to locate and delete the previous version of the document through a rather indirect process.  When provided with a primary key, SOLR will handle that process automatically for you.
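The difference between a plain add and a keyed update can be shown with a toy in-memory model. This is only a sketch of the delete-then-add idea, not how SOLR or Lucene implement it; the class and method names are made up for illustration.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;

// Toy model: each "document" is just a {key, body} pair.
class KeyedIndexSketch {
    private final List<String[]> docs = new ArrayList<String[]>();

    // Plain append, like adding to a raw index: old versions pile up.
    void add(String key, String body) {
        docs.add(new String[] {key, body});
    }

    // Keyed update: delete every document with this key, then add the new one.
    void update(String key, String body) {
        for (Iterator<String[]> it = docs.iterator(); it.hasNext();) {
            if (it.next()[0].equals(key)) {
                it.remove();
            }
        }
        add(key, body);
    }

    int size() {
        return docs.size();
    }
}
```

Calling add twice with the same key leaves two documents behind, while update leaves exactly one, which is the behavior you get for free once the index knows its primary key.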

The SOLR example that is included with the nightly build includes a fairly simple script and sample documents to show how indexing works.  Run the example script to index the samples, then run searches through the admin interface.  To use the search results in your own application, query the same URL that shows up in the admin interface and parse the XML response.
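Parsing the response needs nothing beyond the XML parser that ships with Java. The sketch below assumes each hit comes back as a doc element with str children carrying a name attribute; that shape and the field names are assumptions, so compare against what your admin interface actually returns.

```java
import java.io.ByteArrayInputStream;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;

class SolrResponseSketch {

    // Returns the text of every <str name="FIELD"> element found inside a <doc>.
    static List<String> fieldValues(String xml, String field) {
        List<String> values = new ArrayList<String>();
        try {
            Document dom = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new ByteArrayInputStream(xml.getBytes("UTF-8")));
            NodeList docs = dom.getElementsByTagName("doc");
            for (int i = 0; i < docs.getLength(); i++) {
                NodeList strs = ((Element) docs.item(i)).getElementsByTagName("str");
                for (int j = 0; j < strs.getLength(); j++) {
                    Element str = (Element) strs.item(j);
                    if (field.equals(str.getAttribute("name"))) {
                        values.add(str.getTextContent());
                    }
                }
            }
        } catch (Exception e) {
            // Malformed XML or parser setup problems end up here.
            throw new IllegalArgumentException("Could not parse response XML", e);
        }
        return values;
    }
}
```

In a real application the XML string would be the body of the HTTP response from the query URL.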

As an initial experiment, I have used SOLR to index product information on my sheet music site.  It’s used for the sheet music search engine as well as for related-products queries.  My search engine implementation is only really “quick hack” quality (I have not implemented next/previous page links yet), but the related-products usage is more polished.

In some instances, indexing into RAM rather than directly to disk can give a large indexing performance increase. Here’s one way to do it. You may need to increase the JVM memory parameters with the arguments -Xms128M -Xmx256M, of course modifying the sizes to fit your needs. Tweaking foldCount will affect how much memory is required, since it sets how large the RAMDirectory is allowed to grow in terms of the number of Lucene Documents it can hold. Each time foldCount is reached, and again when indexing is complete, the index is flushed to disk.

Lucene Example Code: RAM to Disk


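int foldCount = 500000;   // documents to buffer in RAM before folding to disk
int indexSize = 0;        // documents currently buffered in RAM
int count = 0;            // total documents indexed
try {
   // analyzer, indexDir and rs are assumed to be set up elsewhere
   RAMDirectory ramDir    = new RAMDirectory();
   IndexWriter  ramWriter = new IndexWriter(ramDir, analyzer, true);

   IndexWriter writer = new IndexWriter(indexDir, analyzer, true);
   writer.mergeFactor = 100000;

   while(rs.next()){
      Document doc = new Document();
      // add the fields for this row to doc here
      ramWriter.addDocument(doc);
      count++;
      indexSize++;
      if(indexSize == foldCount){
         // RAM buffer is full: merge it into the disk index and start over
         foldToDisk(ramDir, ramWriter, writer);
         ramWriter = new IndexWriter(ramDir, analyzer, true);
         indexSize = 0;
      }
   }

   // fold whatever is left in RAM, then optimize and close the disk index
   foldToDisk(ramDir, ramWriter, writer);
   writer.optimize();
   writer.close();
} catch (IOException e) {
   e.printStackTrace();
}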


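public static void foldToDisk(RAMDirectory ramDir,
      IndexWriter ramWriter,
      IndexWriter writer) throws IOException {
   // close the RAM writer so its contents are complete before merging
   ramWriter.close();
   Directory dirA[] = new Directory[1];
   dirA[0] = ramDir;
   System.out.print(".");
   // mergeDirs is a helper not shown here; it presumably wraps writer.addIndexes(dirA)
   mergeDirs(writer, dirA);
   System.out.println(".");
}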
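
If you want to play with the fold logic without a Lucene install, the control flow can be sketched with plain lists standing in for the RAMDirectory and the on-disk index. This is a simplified stand-in, not Lucene code; the class and method names are hypothetical, and unlike the code above it skips a fold when the buffer is empty.

```java
import java.util.ArrayList;
import java.util.List;

class FoldSketch {
    private final int foldCount;
    private final List<String> ram = new ArrayList<String>();   // stands in for the RAMDirectory
    private final List<String> disk = new ArrayList<String>();  // stands in for the on-disk index
    private int folds = 0;

    FoldSketch(int foldCount) {
        this.foldCount = foldCount;
    }

    // Buffer one document; fold to "disk" as soon as the buffer hits foldCount.
    void add(String doc) {
        ram.add(doc);
        if (ram.size() == foldCount) {
            fold();
        }
    }

    // Move everything buffered in "RAM" to "disk" and start a fresh buffer.
    void fold() {
        if (ram.isEmpty()) {
            return;
        }
        disk.addAll(ram);
        ram.clear();
        folds++;
    }

    int folds() {
        return folds;
    }

    int onDisk() {
        return disk.size();
    }
}
```

With a foldCount of 3 and 7 documents, you get two automatic folds during indexing and one final fold for the leftover document, which mirrors the final foldToDisk call after the loop.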

Storing and searching more than one field is very easy to do in Lucene, and it can make your Lucene search engine much more powerful!

Tip: If you’re not already familiar with how to index and search single field documents, this is intended to build on our Simple Lucene Example.

Lucene Example Code: Multi-Field Documents

Create a Lucene document with more than one field

   String content = "This is the example text I want to have Lucene index";

   Document doc = new Document();

   doc.add(Field.Keyword("keyword","Java"));

   doc.add(Field.Text("title","My Document Title"));

   doc.add(Field.Text("content",content));

You would then add the document to the index like normal.

Create a Lucene MultiFieldQueryParser

   String fields[] = {"keyword","title","content"};

   String queryString = "Java";

   try {
      Query query = MultiFieldQueryParser.parse(queryString, fields, new StandardAnalyzer());
   } catch (ParseException e) {
      System.out.println("Lucene ParseException: " + e.getMessage());
      e.printStackTrace();
   }

Read the additional fields from the returned hits

   int hitCount = hits.length();

   for(int i=0; (i < hitCount && i < 10); i++){
      Document doc = hits.doc(i);
      System.out.println(doc.get("keyword") + ", " + doc.get("title") + ", " + doc.get("content"));
   }

That’s it!

That’s all you need to do to step up from single-field Lucene documents and searching to multi-field.

Lucene is a great core for a Java search engine. Here is some simple Lucene example code to index single-field data, along with a very basic search function. Together they make a simple Java search engine. In this example code, each block catches the exceptions it throws so you can see what is thrown. In a real-world Lucene implementation, you may handle this differently.

Lucene Example Code: Steps to Index the data

    Create a new Lucene index using an IndexWriter
    Create a Lucene Document
    Add the Lucene document to the index
    Optimize and close the index

Create a new Lucene index using an IndexWriter

   String indexPath = "/path/to/whereYou/wantThe/IndexStored";
   IndexWriter writer = null;
   try {
      // Make a Lucene writer and create a new Lucene index with arg3 = true
      writer = new IndexWriter(indexPath, new StandardAnalyzer(), true);
   } catch (IOException e) {
      System.out.println("IOException opening Lucene IndexWriter: " + e.getMessage());
   }

Create a Lucene document

   String content = "This is the example text I want to have Lucene index";
   Document doc = new Document();
   doc.add(Field.Text("content", content));

Add the document to the index

   try {
      writer.addDocument(doc);
   } catch (IOException e) {
      System.out.println("IOException adding Lucene Document: " + e.getMessage());
   }

Optimize and close the IndexWriter

   try {
      writer.optimize();
      writer.close();
   } catch (IOException e) {
      System.out.println("IOException closing Lucene IndexWriter: " + e.getMessage());
   }

Lucene Example Code: Steps to Search the Lucene Index

Open a Lucene IndexSearcher

   IndexSearcher indexSearcher = new IndexSearcher(indexPath);

If you are using the Lucene search engine from a web page, you should store and reuse the same IndexSearcher for each query. The Lucene IndexSearcher caches information to make queries after the first one faster. Reusing the Lucene IndexSearcher also takes it easy on the Java garbage collector, improving performance and memory utilization. Not reusing the IndexSearcher is a common mistake and a cause of frustration for many first-time Lucene users. For use on the web, here is some simple JSP code to store the IndexSearcher in an application attribute and reuse it for future page loads.

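   // Look for a searcher stored by an earlier page load
   IndexSearcher indexSearcher = (IndexSearcher) application.getAttribute("searcher");
   if(indexSearcher == null){
      // First request: open the searcher and store it for later requests
      indexSearcher = new IndexSearcher(indexPath);
      application.setAttribute("searcher", indexSearcher);
   }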
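
The JSP snippet above can let two requests that arrive at the same moment each open their own searcher. A small synchronized holder is one way to avoid that. This is a generic sketch, not part of Lucene or the servlet API; SharedInstanceSketch and its methods are hypothetical names, and the factory would be whatever opens your IndexSearcher.

```java
import java.util.function.Supplier;

// Holds one shared object and creates it at most once, even under concurrent access.
class SharedInstanceSketch<T> {
    private T instance;      // the one shared object, created on first use
    private int created = 0; // how many times the factory actually ran

    // Returns the shared instance, building it with the factory only on the first call.
    synchronized T get(Supplier<T> factory) {
        if (instance == null) {
            instance = factory.get();
            created++;
        }
        return instance;
    }

    synchronized int timesCreated() {
        return created;
    }
}
```

Every caller after the first gets the same object back, so the expensive open happens once per application rather than once per page load.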

Construct a Lucene Query

   String queryString = "example";
   Query query = null;
   try {
      query = QueryParser.parse(queryString, "content", new StandardAnalyzer());
   } catch (ParseException e) {
      System.out.println("Lucene ParseException: " + e.getMessage());
      e.printStackTrace();
   }

Have Lucene perform the Search

   Hits hits = null;
   try {
      hits = indexSearcher.search(query);
   } catch (IOException e) {
      System.out.println("Lucene Searching Exception: " + e.getMessage());
   }

Display the top Lucene Hits

   int hitCount = hits.length();
   for(int i=0; (i < hitCount && i < 10); i++){
      Document doc = hits.doc(i);
      System.out.println(doc.get("content"));
   }

That’s it!

Those are the bits needed to create a simple, one-field Lucene search engine in Java. In terms of the try and catch blocks and variables, you’d probably implement things in a more combined manner, but the samples on this page are designed, at least at some level, to exist in isolation from each other.

Want a more powerful Search Engine?

I also have a Multi-Field Search Engine Example if you want to get a little bit more powerful.