The task is to make an itinerary searchable.

Input: (Sample line of an itinerary)
“Arrived at Manali. Visited hadimba temple, vashist springs, nehru kund, van vihar. Van vihar is adorned with big deodar trees. Evening time spent on mall road and shopping in local market. Had dinner at corner house restaurant.”

Processing:
The aim is to translate this line into tokens so that it can be searched. For example, the places/points of interest in the line above are “hadimba temple, vashist springs, nehru kund, van vihar”; we don’t want to store the names of restaurants. Using the IDs assigned to these places in the database, we can store the line as

Output:
12 395 454 123

Sample database table
id | name
395 | hadimba temple
454 | vashist springs
123 | nehru kund
12 | Manali
… | ….
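
To make the intended transformation concrete, here is a minimal sketch of an exact dictionary lookup. It assumes the table has been loaded into an in-memory dict; the names PLACES and tokenize_itinerary are hypothetical and only for illustration. It reproduces the output above for the sample line but handles none of the variations described under Challenges.

```python
import re

# Hypothetical in-memory copy of the sample table (lower-cased name -> id).
PLACES = {
    "hadimba temple": 395,
    "vashist springs": 454,
    "nehru kund": 123,
    "manali": 12,
}

def tokenize_itinerary(text, places=PLACES):
    """Return the IDs of known places in order of first appearance."""
    words = re.findall(r"[a-z]+", text.lower())
    max_len = max(len(name.split()) for name in places)
    ids = []
    i = 0
    while i < len(words):
        # Try the longest phrase first so multi-word names win.
        for n in range(min(max_len, len(words) - i), 0, -1):
            phrase = " ".join(words[i:i + n])
            if phrase in places:
                if places[phrase] not in ids:
                    ids.append(places[phrase])
                i += n
                break
        else:
            i += 1  # no known place starts at this word
    return ids

line = ("Arrived at Manali. Visited hadimba temple, vashist springs, "
        "nehru kund, van vihar.")
print(" ".join(str(place_id) for place_id in tokenize_itinerary(line)))
# prints: 12 395 454 123
```

Note that van vihar is silently skipped here because it has no row in the table.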

Challenges:

Everyone expresses things differently, so someone else may have written the same trip in another style, such as “After arriving at Manali went to hadimba temple, vashist springs. After lunch then went to Nehru kund, van vihar.”
Others may write the same hadimba temple as hadimba devi temple or temple of hadimba or, in the worst case, with a spelling mistake such as hedimba temple.
Finding places / points of interest that do not exist in the table, such as van vihar in this case.
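
One possible way to tolerate the first two kinds of variation is approximate, order-insensitive matching over a small sliding window of words. The sketch below uses get_close_matches from Python's standard difflib module; the function name find_places, the 0.8 cutoff and the one-word slack are assumptions chosen for illustration and would need tuning on real itineraries. It does not address the third challenge of discovering places that are missing from the table.

```python
import re
from difflib import get_close_matches

# Hypothetical in-memory copy of the sample table (lower-cased name -> id).
PLACES = {
    "hadimba temple": 395,
    "vashist springs": 454,
    "nehru kund": 123,
    "manali": 12,
}

def find_places(text, places=PLACES, cutoff=0.8, slack=1):
    """Return IDs of places whose words all appear, approximately and in
    any order, inside one short window of the text."""
    words = re.findall(r"[a-z]+", text.lower())
    first_seen = {}  # place id -> start index of the first matching window
    for name, place_id in places.items():
        tokens = name.split()
        # Allow `slack` extra words in the window, e.g. "of" or "devi".
        size = len(tokens) + slack
        for i in range(len(words)):
            window = words[i:i + size]
            # Every word of the place name needs a close match in the window.
            if all(get_close_matches(t, window, n=1, cutoff=cutoff)
                   for t in tokens):
                first_seen[place_id] = i
                break
    # Order the IDs by where their first matching window starts.
    return sorted(first_seen, key=first_seen.get)

for phrase in ("hadimba devi temple", "temple of hadimba", "hedimba temple"):
    print(phrase, "->", find_places(phrase))
```

With these settings all three phrases resolve to 395. A lower cutoff accepts more misspellings but also more false matches, so the threshold is a trade-off rather than a fixed value.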

I have been able to create a stopword list after reading some of the itineraries, but how do I overcome the listed challenges?

What I have tried:

I tried a basic search and replace, but it fails most of the time.
Updated 27-Dec-20 0:38am
Comments
Richard MacCutchan 27-Dec-20 7:18am    
The key to this is to hold all the important keywords in your dictionary. You cannot create a program that is able to guess the difference between a temple and a restaurant.

In Britain it is not uncommon for Indian restaurants to be named Taj Mahal, which would be even more confusing.
[no name] 27-Dec-20 16:01pm    
If you had used their "proper names", it would have been simpler. Sort of like searching for the golden bar of sloppiness.

This content, along with any associated source code and files, is licensed under The Code Project Open License (CPOL)