The task is to make an itinerary searchable.
Input: (Sample line of an itinerary)
“Arrived at Manali. Visited hadimba temple, vashist springs, nehru kund, van vihar. Van vihar is adorned with big deodar trees. Evening time spent on mall road and shopping in local market. Had dinner at corner house restaurant.”
Processing:
The aim is to translate this line into tokens so that it can be searched. For example, the places/points of interest in the line above are “hadimba temple, vashist springs, nehru kund, van vihar”; we don’t want to store the names of restaurants. Using the IDs assigned to these places in the database, we can store the line as
Output:
12 395 454 123
Sample database table
id | name
395 | hadimba temple
454 | vashist springs
123 | nehru kund
12 | Manali
… | ….
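To make the intended transformation concrete, here is a minimal sketch of the exact-match lookup, assuming the sample table is loaded into memory as a dictionary. The names `PLACES` and `tokenize_itinerary` are hypothetical and only illustrate the idea; they are not part of any existing code.

```python
import re

# Hypothetical in-memory copy of the sample table (id -> name).
PLACES = {
    395: "hadimba temple",
    454: "vashist springs",
    123: "nehru kund",
    12: "Manali",
}

def tokenize_itinerary(text, places=PLACES):
    """Return the IDs of known places, in the order they first appear in text."""
    lowered = text.lower()
    hits = []
    for place_id, name in places.items():
        # Whole-word, case-insensitive match of the place name.
        match = re.search(r"\b" + re.escape(name.lower()) + r"\b", lowered)
        if match:
            hits.append((match.start(), place_id))
    # Sort by position in the text so the output follows the itinerary order.
    return [place_id for _, place_id in sorted(hits)]

line = ("Arrived at Manali. Visited hadimba temple, vashist springs, "
        "nehru kund, van vihar.")
print(tokenize_itinerary(line))  # [12, 395, 454, 123]
```

This only finds names that are spelled exactly as in the table, and "van vihar" is silently skipped because it has no row.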
Challenges:
Everyone phrases things differently, so someone else may have written the same trip in another style, such as “After arriving at Manali went to hadimba temple, vashist springs. After lunch then went to Nehru kund, van vihar.”
Others may write the same hadimba temple as hadimba devi temple or temple of hadimba or, in the worst case, with a spelling mistake as hedimba temple.
Finding places / points of interest which do not exist in the table, such as van vihar in this case.
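One possible direction for the name variants and misspellings is approximate string matching. The sketch below uses `difflib.SequenceMatcher` from the Python standard library, after removing stopwords and sorting the words so that word order does not matter. The stopword set, the `0.8` cutoff, and the function names are assumptions chosen for illustration and would need tuning on real itineraries.

```python
import difflib
import re

# Hypothetical in-memory copy of the sample table (id -> name).
PLACES = {
    395: "hadimba temple",
    454: "vashist springs",
    123: "nehru kund",
    12: "Manali",
}
# Assumed stopword list; a real one would come from reading itineraries.
STOPWORDS = {"of", "the", "at", "to"}

def normalize(phrase):
    """Lowercase, drop punctuation and stopwords, and sort the words."""
    words = re.findall(r"[a-z]+", phrase.lower())
    return " ".join(sorted(w for w in words if w not in STOPWORDS))

def match_place(phrase, places=PLACES, cutoff=0.8):
    """Return the ID of the most similar known place, or None if none is close enough."""
    key = normalize(phrase)
    best_id, best_score = None, cutoff
    for place_id, name in places.items():
        score = difflib.SequenceMatcher(None, key, normalize(name)).ratio()
        if score >= best_score:
            best_id, best_score = place_id, score
    return best_id

for candidate in ["hedimba temple", "hadimba devi temple",
                  "temple of hadimba", "van vihar"]:
    print(candidate, "->", match_place(candidate))
```

With the sample table, the three hadimba variants resolve to 395, while "van vihar" returns `None`. Phrases that return `None` could be collected as candidates for new rows in the table. The sketch assumes candidate phrases have already been extracted from the sentence (for example by splitting on commas or trying short word windows); that step is not shown here.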
I have been able to create a stopword list after reading some of the itineraries, but how do I overcome the listed challenges?
What I have tried:
I tried basic search and replace, but it fails most of the time.