Dec 11
High Performance Lucene Indexing
Filed Under Lucene
In some instances, indexing into RAM rather than directly to disk can yield a large increase in indexing performance. Here’s one way to do it. You may need to raise the JVM memory limits with the arguments -Xms128M -Xmx256M, adjusting the sizes to fit your needs. Tweaking foldCount affects how much memory is required, because it sets how many Lucene Documents the RAMDirectory is allowed to hold. Each time foldCount is reached, and again when indexing is complete, the RAM index is flushed to disk.
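To confirm that the memory arguments actually took effect, a small check like the following can help. It is only a sketch: the class name HeapCheck and the method maxHeapMiB are made up for illustration, and the figure the JVM reports may differ slightly from the -Xmx value depending on the garbage collector in use.

```java
// Hypothetical helper, not part of Lucene: reports the JVM's heap ceiling.
class HeapCheck {
    /** Returns the maximum heap the JVM will attempt to use, in mebibytes. */
    static long maxHeapMiB() {
        return Runtime.getRuntime().maxMemory() / (1024L * 1024L);
    }

    public static void main(String[] args) {
        // Run with, for example: java -Xms128M -Xmx256M HeapCheck
        System.out.println("Max heap (MiB): " + maxHeapMiB());
    }
}
```

If the reported ceiling is far below what the chosen foldCount needs, either raise -Xmx or lower foldCount.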
Lucene Example Code: RAM to Disk
int foldCount = 500000;   // documents held in RAM before each flush
int indexSize = 0;        // documents currently in the RAM index
int count = 0;            // total documents indexed

try {
    RAMDirectory ramDir = new RAMDirectory();
    IndexWriter ramWriter = new IndexWriter(ramDir, analyzer, true);
    IndexWriter writer = new IndexWriter(indexDir, analyzer, true);
    writer.mergeFactor = 100000;

    while (rs.next()) {
        Document doc = new Document();
        // add fields to doc from the current row of rs here
        ramWriter.addDocument(doc);
        count++;
        indexSize++;

        if (indexSize == foldCount) {
            foldToDisk(ramDir, ramWriter, writer);
            // create=true starts a fresh, empty RAM index
            ramWriter = new IndexWriter(ramDir, analyzer, true);
            indexSize = 0;
        }
    }

    // flush whatever is left in RAM, then finish the disk index
    foldToDisk(ramDir, ramWriter, writer);
    writer.optimize();
    writer.close();
} catch (IOException e) {
    e.printStackTrace();
}

public static void foldToDisk(RAMDirectory ramDir, IndexWriter ramWriter,
                              IndexWriter writer) throws IOException {
    ramWriter.close();
    Directory dirA[] = new Directory[1];
    dirA[0] = ramDir;
    System.out.print(".");
    mergeDirs(writer, dirA);   // helper not shown in this post
    System.out.println(".");
}
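The fold logic can be tried out without Lucene on the classpath. The sketch below is a plain-Java stand-in, not the code above: FoldingBuffer and FoldingDemo are hypothetical names, the list plays the role of the RAMDirectory, and the Consumer plays the role of foldToDisk. Unlike the original, it skips the final fold when nothing is buffered.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;

/** Hypothetical stand-in for the RAM index plus disk writer; not a Lucene class. */
class FoldingBuffer<T> {
    private final int foldCount;
    private final Consumer<List<T>> sink;   // plays the role of foldToDisk
    private List<T> buffer = new ArrayList<>();
    private int folds = 0;

    FoldingBuffer(int foldCount, Consumer<List<T>> sink) {
        if (foldCount <= 0) {
            throw new IllegalArgumentException("foldCount must be positive");
        }
        this.foldCount = foldCount;
        this.sink = sink;
    }

    /** Buffers one item and folds automatically once foldCount is reached. */
    void add(T item) {
        buffer.add(item);
        if (buffer.size() == foldCount) {
            fold();
        }
    }

    /** Hands the buffered items to the sink and starts a fresh buffer. */
    void fold() {
        if (buffer.isEmpty()) {
            return;   // nothing to flush
        }
        sink.accept(buffer);
        buffer = new ArrayList<>();
        folds++;
    }

    int folds() {
        return folds;
    }
}

class FoldingDemo {
    public static void main(String[] args) {
        List<Integer> batchSizes = new ArrayList<>();
        FoldingBuffer<String> buf =
                new FoldingBuffer<>(3, batch -> batchSizes.add(batch.size()));
        for (int i = 0; i < 7; i++) {
            buf.add("doc" + i);
        }
        buf.fold();   // final flush once the input is exhausted
        System.out.println(batchSizes);   // prints [3, 3, 1]
    }
}
```

A smaller foldCount means more frequent, smaller merges into the disk index; a larger one trades heap for fewer merges.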