It doesn't work for my case. My requirement is: today I may create an index for the pages www.sample.com/page1.aspx and www.sample.com/page2.aspx. Tomorrow I may add www.sample.com/page3.aspx to the index. When I then search the index, the search should cover all three pages.

In this case, when I update the index daily with new page URLs, it creates compound files like _0.cfs, _1.cfs, _2.cfs, and so on, and the segments file is also updated.

I got stuck here. Could anyone please guide me?

What I have tried:

Is there a possibility to achieve this in my case?
Posted
Updated 26-Jul-20 5:10am

You already posted this question at How to read multiple .cfs files from lucene.net index files in C# and received a suggested solution. If you have a problem with that suggestion, please use the "Have a Question or Comment?" link below the posted message instead of reposting the question.
Ignore the files that Lucene is creating; just let it do what it does. Your issue is that you keep adding new documents. Think of the index like a database table: every time you add a document with properties, you add a new row. So on day one your data for page1 is

ID Title
1  Hello


Let's say on day two you change the title to "Hello World" and re-index; you now have

ID Title
1  Hello
1  Hello World
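
The duplicate-row behaviour above is easy to reproduce. The sketch below uses Lucene.Net 3.0.3; the directory path and the "ID"/"Title" field names are just illustrative, matching the table above.

```csharp
using System.IO;
using Lucene.Net.Analysis.Standard;
using Lucene.Net.Documents;
using Lucene.Net.Index;
using Lucene.Net.Store;

var directory = FSDirectory.Open(new DirectoryInfo("index"));
var analyzer = new StandardAnalyzer(Lucene.Net.Util.Version.LUCENE_30);

using (var writer = new IndexWriter(directory, analyzer,
    IndexWriter.MaxFieldLength.UNLIMITED))
{
    var doc = new Document();
    doc.Add(new Field("ID", "1", Field.Store.YES, Field.Index.NOT_ANALYZED));
    doc.Add(new Field("Title", "Hello World", Field.Store.YES, Field.Index.ANALYZED));

    // AddDocument never replaces anything: running this on day two leaves
    // both the old "Hello" document and this new one in the index, so a
    // search for ID:1 will return two hits.
    writer.AddDocument(doc);
}
```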


That might be what you want, or it might not. It's probably not. The easiest, cheapest way to deal with this is to clear the documents before you index:

C#
using (var writer = new IndexWriter(directory, analyzer, IndexWriter.MaxFieldLength.UNLIMITED))
{
    // Wipe every existing document, then rebuild from scratch
    writer.DeleteAll();

    // ... re-add all of your pages here with writer.AddDocument(...)
}

Another way, which takes more work, is: when you re-index a page, first look for the matching document if it exists, so you'll search for "+ID:1". If you don't find it, build the document and add it. If you do find it, remove the existing fields and re-add them with the new data. Doing it this way also gives you another issue: you'll need to know when to remove documents as pages get deleted.
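Lucene's writer can do the find-and-replace step for you in one call. The sketch below again assumes Lucene.Net 3.0.3 and an "ID" key field stored as NOT_ANALYZED (so the term matches exactly); UpdateDocument deletes any document whose ID term matches, then adds the new one, so each page keeps exactly one document in the index.

```csharp
using System.IO;
using Lucene.Net.Analysis.Standard;
using Lucene.Net.Documents;
using Lucene.Net.Index;
using Lucene.Net.Store;

var directory = FSDirectory.Open(new DirectoryInfo("index"));
var analyzer = new StandardAnalyzer(Lucene.Net.Util.Version.LUCENE_30);

using (var writer = new IndexWriter(directory, analyzer,
    IndexWriter.MaxFieldLength.UNLIMITED))
{
    var doc = new Document();
    doc.Add(new Field("ID", "1", Field.Store.YES, Field.Index.NOT_ANALYZED));
    doc.Add(new Field("Title", "Hello World", Field.Store.YES, Field.Index.ANALYZED));

    // Delete-then-add in one atomic step: any existing document with
    // ID == "1" is removed before this one is written.
    writer.UpdateDocument(new Term("ID", "1"), doc);
}
```

Note that the key field must not be analysed, otherwise the Term("ID", "1") lookup won't match the tokenised value and you'll get duplicates anyway.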

When working with Lucene I strongly advise you to download Luke:

Google Code Archive - Long-term storage for Google Code Project Hosting.

It's a tool that lets you see what data is in your index: the documents, their fields, and their scores. It also lets you run ad-hoc searches with different analysers, etc. You'll need Java on your machine to run it, but it's worth its weight in gold.

This content, along with any associated source code and files, is licensed under The Code Project Open License (CPOL)
