Index updating performance is very, very slow.

Apr 4, 2012 at 5:53 PM

I'm not sure if this is due to the context of my application or a general problem.  However, when updating an existing index with a new set of records, I see lots of file system activity, but it takes forever.  Visual Studio is outputting thousands of the following errors:

A first chance exception of type 'System.IO.IOException' occurred in Lucene.Net.DLL
A first chance exception of type 'System.IO.IOException' occurred in mscorlib.dll

They come in pairs: one in "Lucene.Net.DLL", the next in "mscorlib.dll".  I don't know what is going on, as the errors seem to be occurring deep in the DLLs or the system API.  When I pause the debugger, I'm in the Lucene.Linq.Search.Index<TEntity> class's Add(TEntity, bool) method, at the "_context.Modifier.Flush();" statement on line 287 of the "~\trunk\Lucene.Linq\Search\Index`1.cs" file.

Apr 4, 2012 at 5:58 PM

Turning on the Visual Studio Exception Helper and setting it to break on all System.IO.IOExceptions shows that the files to be modified or deleted are already being accessed by another process.  Is Lucene.Net or Lucene.Linq spawning multiple threads to perform file operations, and are these colliding?

Developer
Apr 5, 2012 at 8:45 PM

You are correct - this happens deep in the Lucene.Net DLL itself as it attempts to spawn multiple threads to read/write/index content. And those are colliding.

Apr 5, 2012 at 8:51 PM

I don't suppose there's a way to serialize Lucene's file system access so different threads aren't trying to simultaneously access the same files or send the drive heads to widely separated places on the drive?
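
For illustration only, here is a minimal C# sketch of what serializing updates at the application level could look like. The `SerializedUpdater<T>` class and the delegate passed to it are hypothetical names, not part of Lucene.Net or Lucene.Linq, and the sketch assumes every index update made by the application goes through this one wrapper.

```csharp
using System;
using System.Threading.Tasks;

// Hypothetical wrapper; not part of Lucene.Net or Lucene.Linq.
public sealed class SerializedUpdater<T>
{
    private readonly object _gate = new object();
    private readonly Action<T> _update; // assumed: a call that writes one record to the index

    public SerializedUpdater(Action<T> update)
    {
        if (update == null) throw new ArgumentNullException("update");
        _update = update;
    }

    // Only one thread at a time can be inside the update delegate.
    public void Update(T item)
    {
        lock (_gate)
        {
            _update(item);
        }
    }
}

public static class Demo
{
    public static void Main()
    {
        int inside = 0, maxInside = 0, total = 0;

        // Stand-in for a real index update: it only records how many
        // callers are inside the delegate at the same time.
        var updater = new SerializedUpdater<int>(i =>
        {
            inside++;
            if (inside > maxInside) maxInside = inside;
            total++;
            inside--;
        });

        // Many threads call Update, but the lock admits them one by one.
        Parallel.For(0, 1000, i => updater.Update(i));

        Console.WriteLine("updates: {0}, max concurrent: {1}", total, maxInside);
        // prints "updates: 1000, max concurrent: 1"
    }
}
```

A wrapper like this only serializes calls that pass through it. It would not prevent collisions between threads started inside Lucene.Net itself, or with another process that holds the index files open.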

Developer
Apr 5, 2012 at 9:33 PM

I've worked with Lucene.Net for several years, but never so deep within its internals.

I'll see about getting the source to check whether it's something obvious. However, I am not very comfortable stepping away from the standard Lucene.Net library, and I know for a fact that the folks on that project follow these guidelines:

  1. Maintain the existing line-by-line port from Java to C#, fully automating and commoditizing the process such that the project can easily synchronize with the Java Lucene release schedule;
  2. Maintaining the high-performance requirements expected of a first class C# search engine library;
  3. Maximize usability and power when used within the .NET runtime. To that end, it will present a highly idiomatic, carefully tailored API that takes advantage of many of the special features of the .NET runtime.

I've been quite vocal before about my disagreement with the order of goals #1 and #2; those should be reversed. But I also see where they are coming from: if they reverse those, it will essentially take them away from the benefits of the Java version of Lucene.

In other words: I can maybe fix it, but I'll also have to give them the patch, which they will have to approve, which won't go into Lucene.Net but rather into Lucene (Java) and then finally be automatically ported from Java to C#...

If you have another trick, I'm open to it.

Apr 5, 2012 at 9:42 PM
Edited Apr 5, 2012 at 9:43 PM

I wish I had a trick to do that - but I'm nowhere near as familiar with Lucene as you are.  And my knowledge of file system programming is merely at the surface.  I just know that seeing hundreds of System.IO.IOExceptions per minute due to file access conflicts is not a good thing for any system where performance is an important issue.  I'd actually expect an issue like this to be a problem for Java as well as .Net - unless their development is on radically different hardware...