Click here to Skip to main content
15,886,199 members
Please Sign up or sign in to vote.
0.00/5 (No votes)
See more:
I'm trying to write a code that split a spaceless string into meaningful words but when I give sentence like "arealways" it returns ['a', 'real', 'ways'] and what I want is ['are', 'always'] and my dictionary contains all this words. How can I can write a code that keep backtracking till find the best matching?

the code that returns 'a', 'real', 'ways'

splitter.java:

public class splitter {

    HashMap<String, String> map = new HashMap<>();
    Trie dict;

    public splitter(Trie t) {
        dict = t;
    }

    public String split(String test) {
        if (dict.contains(test)) {
            return (test);
        } else if (map.containsKey(test)) {
            return (map.get(test));
        } else {
            for (int i = 0; i < test.length(); i++) {
                String pre = test.substring(0, i);
                if (dict.contains(pre)) {
                    String end = test.substring(i);
                    String fixedEnd = split(end);
                        if(fixedEnd != null){
                            map.put(test, pre + " " + fixedEnd);
                            return pre + " " + fixedEnd;
                        }else {
                        }
                    
                }
            }

        }
        map.put(test,null);
        return null;
    }
} 

Trie.java:
public class Trie {
    public static class TrieNode {
        private HashMap<Character, TrieNode> charMap = new HashMap<>();
        public char c;
        public boolean endOWord;
        public void insert(String s){
        }
        public boolean contains(String s){
            return true;
        }
    }
    public TrieNode root;
    
    public Trie() {
        root = new TrieNode();
    }
    
    public void insert(String s){
        TrieNode p = root;
        for(char c : s.toCharArray()) {
            if(! p.charMap.containsKey(c)) {
                TrieNode node = new TrieNode();
                node.c = c;
                p.charMap.put(c, node);
            }
            p = p.charMap.get(c);
        }
        p.endOWord = true;
    }
    public boolean contains(String s){
        TrieNode p = root;
        for(char c : s.toCharArray()) {
            if(!p.charMap.containsKey(c)) {
                return false;
            }
            p = p.charMap.get(c);
        }
        return p.endOWord;
    }
    public void insertDictionary(String filename) throws FileNotFoundException{
        File file = new File(filename);
        Scanner sc = new Scanner(file);
        while(sc.hasNextLine())
            insert(sc.nextLine());
    }
    
    public void insertDictionary(File file) throws FileNotFoundException{
        Scanner sc = new Scanner(file);
        while(sc.hasNextLine())
            insert(sc.nextLine());
    }
} 


What I have tried:

I tried to find longest meaningful word in dictionary till you encounter with a word that not in the dictionary.txt
Posted
Updated 3-Mar-22 7:35am
Comments
Richard MacCutchan 3-Mar-22 10:34am    
Maybe you should be looking for the longest word first. Once you have removed that then you can deal with the remaining letters.
abc174 3-Mar-22 10:47am    
what if the remaining words doesn't split into meaningful words?
Richard MacCutchan 3-Mar-22 10:57am    
Then perhaps you need to re-evaluate and try shorter ones. There really is no simple way to do this other than repeatedly looking for different words.

I would do that this way:
1) get all permutations of string[^] to list (length: 1 to inputstring.Length)
2) compare all generated string-words with the words in a dictionary
2a) if a string-word does does NOT exist in a dictionary, remove it from the list
3) display "real" words

Good luck!
 
Share this answer
 
Quote:
till find the best matching
That is the source of your problems.
First, you have to define 'best matching': it can be obvious for a human, but machines have no common sense. So you should find a 'measure' of 'good matching', and create an algorithm which tries to maximize it.
 
Share this answer
 
Quote:
How can I can write a code that keep backtracking till find the best matching?

Finding all possible ways to split a string with words in your dictionary, no problem with backtracking.
The concept of "best matching" is far beyond what can do a dictionary, it is still a domain of research. AI with huge amount od sentences is probably the best approach.
Dear Highlander, the way you say "best matching", you will need to use some kind of magic.

You gaol is far beyond reasonable.
 
Share this answer
 

This content, along with any associated source code and files, is licensed under The Code Project Open License (CPOL)



CodeProject, 20 Bay Street, 11th Floor Toronto, Ontario, Canada M5J 2N8 +1 (416) 849-8900