It doesn't work for my case. My requirement is: today I may create an index for the pages www.sample.com/page1.aspx and www.sample.com/page2.aspx. Tomorrow I may add www.sample.com/page3.aspx to the index. When I then search the index, the search should cover all three pages.

In this case, when I update the index daily with new page URLs, it creates compound files like _0.cfs, _1.cfs, _2.cfs, and so on, and the segments file is also updated.

I got stuck here. Could anyone please guide me?

What I have tried:

Is there a possibility to achieve this in my case?
Posted
Updated 26-Jul-20 5:10am

You already posted this question at How to read multiple .cfs files from lucene.net index files in C# and received a suggested solution. If you have a problem with that suggestion, please use the "Have a Question or Comment?" link below the posted message instead of reposting the question.
Ignore the files that Lucene is creating; just let it do what it does. Your issue is that you keep adding new documents. Think of the index like a database table: every time you add a document with properties, you add a new row. So on day one your data for page1 is

ID Title
1  Hello


Let's say on day two you change the title to "Hello World" and re-index; you now have

ID Title
1  Hello
1  Hello World
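
The duplicate-row behaviour above is easy to reproduce. The sketch below uses Lucene.Net 3.0.3; the directory path and the "ID"/"Title" field names are just illustrative, matching the table above.

```csharp
using System.IO;
using Lucene.Net.Analysis.Standard;
using Lucene.Net.Documents;
using Lucene.Net.Index;
using Lucene.Net.Store;

var directory = FSDirectory.Open(new DirectoryInfo("index"));
var analyzer = new StandardAnalyzer(Lucene.Net.Util.Version.LUCENE_30);

using (var writer = new IndexWriter(directory, analyzer,
    IndexWriter.MaxFieldLength.UNLIMITED))
{
    var doc = new Document();
    doc.Add(new Field("ID", "1", Field.Store.YES, Field.Index.NOT_ANALYZED));
    doc.Add(new Field("Title", "Hello World", Field.Store.YES, Field.Index.ANALYZED));

    // AddDocument never replaces anything: running this on day two leaves
    // both the old "Hello" document and this new one in the index, so a
    // search for ID:1 will return two hits.
    writer.AddDocument(doc);
}
```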


That might be what you want, or it might not. It's probably not. The easiest, cheapest way to deal with this is to clear the documents before you index:

C#
using (var writer = new IndexWriter(directory, analyzer, IndexWriter.MaxFieldLength.UNLIMITED))
{
    // Wipe every existing document, then rebuild from scratch
    writer.DeleteAll();

    // ... re-add all of your pages here with writer.AddDocument(...)
}

Another way, which takes more work, is: when you re-index a page, first look for the matching document if it exists, so you'll search for "+ID:1". If you don't find it, build the document and add it. If you do find it, remove the existing fields and re-add them with the new data. Doing it this way also gives you another issue: you'll need to know when to remove documents as pages get deleted.
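Lucene's writer can do the find-and-replace step for you in one call. The sketch below again assumes Lucene.Net 3.0.3 and an "ID" key field stored as NOT_ANALYZED (so the term matches exactly); UpdateDocument deletes any document whose ID term matches, then adds the new one, so each page keeps exactly one document in the index.

```csharp
using System.IO;
using Lucene.Net.Analysis.Standard;
using Lucene.Net.Documents;
using Lucene.Net.Index;
using Lucene.Net.Store;

var directory = FSDirectory.Open(new DirectoryInfo("index"));
var analyzer = new StandardAnalyzer(Lucene.Net.Util.Version.LUCENE_30);

using (var writer = new IndexWriter(directory, analyzer,
    IndexWriter.MaxFieldLength.UNLIMITED))
{
    var doc = new Document();
    doc.Add(new Field("ID", "1", Field.Store.YES, Field.Index.NOT_ANALYZED));
    doc.Add(new Field("Title", "Hello World", Field.Store.YES, Field.Index.ANALYZED));

    // Delete-then-add in one atomic step: any existing document with
    // ID == "1" is removed before this one is written.
    writer.UpdateDocument(new Term("ID", "1"), doc);
}
```

Note that the key field must not be analysed, otherwise the Term("ID", "1") lookup won't match the tokenised value and you'll get duplicates anyway.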

When working with Lucene I strongly advise you to download Luke:

Google Code Archive - Long-term storage for Google Code Project Hosting.

It's a tool that lets you see what data is in your index: the documents, their fields, and their scores. It also lets you run ad-hoc searches with different analysers, etc. You'll need Java on your machine to run it, but it's worth its weight in gold.

This content, along with any associated source code and files, is licensed under The Code Project Open License (CPOL)
