For one, this is not a 'simple string' problem; what you want is a lexical analyzer. There are libraries that handle tokenizing text and structuring it in a sensible way, given a set of grammatical rules. See the various references provided on the site linked above.
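
To give a rough idea of what a lexical analyzer does, here is a minimal sketch in Python using only the standard `re` module. The token names (`NUMBER`, `IDENT`, `OP`) and the tiny rule set are assumptions made up for illustration, not the rules for your input; a real library would let you declare something similar and handle the harder cases for you.

```python
import re
from typing import Iterator, NamedTuple


class Token(NamedTuple):
    kind: str
    text: str


# Hypothetical rule set: each entry is (token name, regular expression).
# Order matters, since earlier alternatives win when several could match.
TOKEN_SPEC = [
    ("NUMBER", r"\d+(?:\.\d+)?"),   # integer or decimal literal
    ("IDENT", r"[A-Za-z_]\w*"),     # identifier
    ("OP", r"[+\-*/=()]"),          # single-character operators and parentheses
    ("SKIP", r"\s+"),               # whitespace, discarded
    ("MISMATCH", r"."),             # any other character is an error
]

# Combine all rules into one pattern made of named groups.
MASTER = re.compile(
    "|".join(f"(?P<{name}>{pattern})" for name, pattern in TOKEN_SPEC)
)


def tokenize(text: str) -> Iterator[Token]:
    """Yield tokens from text, raising ValueError on unknown characters."""
    for match in MASTER.finditer(text):
        kind = match.lastgroup  # name of the rule that matched
        if kind == "SKIP":
            continue
        if kind == "MISMATCH":
            raise ValueError(
                f"unexpected character {match.group()!r} at position {match.start()}"
            )
        yield Token(kind, match.group())
```

Calling `list(tokenize("x = 3.5 * (y + 2)"))` gives you a flat list of `(kind, text)` pairs that a parser can then structure according to your grammar, which is the part plain string functions do not do well.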