Click here to Skip to main content
15,867,686 members
Please Sign up or sign in to vote.
4.90/5 (9 votes)
See more:
This is the first of our weekly coding challenges. Once a week we'll post a simple programming problem and the person who posts the best answer wins a T-shirt. The winner is decided by votes, or be comments, or be the audacity of their answer. Use any language in your answer except profanity.

Imagine that you allow users of your website to post comments, but some users, mainly the Australians, get a little colourful with their language. You decided to implement a Bad Word Filter that will replace certain words in a sentence with safer versions of that word.

The list of bad words and their replacements is

Bad: "poophead", "PHB", "gotten"
Replacement: "p**phead", "boss", "become"

So "My PHB is such a poophead. It's gotten worse since his promotion" should be "My boss is such a p**phead. It's become worse since his promotion".

We also have to allow shouting. So

"My PHB is such a POOPHEAD!" should become "My boss is such a P**PHEAD!"

Bonus points:

Let's make it harder. If the "bad" word starts with "*" then it means any word that ends with that word. If it ends with a star then any word starting with that. If it ends with an "!" then it means that it should do the match case sensitive.

Bad words: "poop*", "PHB!", "gotten"
Replacement: "p**p", "boss", "become"

"My PHB has started his new blog phblog.com. He's SUCH A MISBEGOTTEN POOPHEAD!"

should become

"My boss has started his new blog phblog.com. He's SUCH A MISBEGOTTEN P**PHEAD!"

What I have tried:

Remember: any programming language can be used.
Posted
Updated 27-Jul-18 8:04am
v2
Comments
Suvendu Shekhar Giri 25-Nov-16 10:33am    
That's really an awesome initiative. Will come up with a solution soon. Thanks Chris :)
PIEBALDconsult 25-Nov-16 10:36am    
"My PHB is such a POOPHEAD!" should become "My PHB boss is such a P**PHEAD!"
Right?
Chris Maunder 25-Nov-16 11:19am    
Correct, and thanks for pointing that out. Updated
Jochen Arndt 25-Nov-16 11:03am    
Deadline for posting answers?
Chris Maunder 25-Nov-16 11:20am    
End of the day? Monday? We're pretty relaxed about it, but there's a limit to our attention span.

Yay! A chance to use Regular Expressions without summoning the elder gods! :)

Start with a structure to convert a string to a bad word, taking the "bonus points" rules into account:
C#
public struct BadWord
{
    public BadWord(string word)
    {
        if (string.IsNullOrWhiteSpace(word)) throw new ArgumentNullException(nameof(word));
        
        int startIndex = 0;
        int length = word.Length;
        
        // Skip leading / trailing white-space:
        while (length > 0 && char.IsWhiteSpace(word[startIndex]))
        {
            startIndex++;
            length--;
        }
        while (length > 0 && char.IsWhiteSpace(word[startIndex + length - 1]))
        {
            length--;
        }
        
        // If the word ends with "!", then it's a case-sensitive match:
        if (length > 0 && word[startIndex + length - 1] == '!')
        {
            CaseSensitive = true;
            length--;
        }
        else
        {
            CaseSensitive = false;
        }
        
        // If the word ends with "*", filter anything starting with the word:
        if (length > 0 && word[startIndex + length - 1] == '*')
        {
            Suffix = "(?=\\w*\\b)";
            length--;
        }
        else
        {
            Suffix = "\\b";
        }
        
        // If the word starts with "*", filter anything ending with the word:
        if (length > 0 && word[startIndex] == '*')
        {
            Prefix = "(?<=\\b\\w*)";
            startIndex++;
            length--;
        }
        else
        {
            Prefix = "\\b";
        }
        
        Word = length != 0 ? word.Substring(startIndex, length) : null;
    }
    
    public string Word { get; }
    public string Prefix { get; }
    public string Suffix { get; }
    public bool CaseSensitive { get; }
    
    public Regex ToRegularExpression()
    {
        if (string.IsNullOrWhiteSpace(Word)) return null;
        
        string pattern = Prefix + Regex.Escape(Word) + Suffix;
        var options = CaseSensitive ? RegexOptions.ExplicitCapture : RegexOptions.ExplicitCapture | RegexOptions.IgnoreCase;
        return new Regex(pattern, options);
    }
}

Then a class to represent a single bad word and its replacement:
EDIT: Now with the part of the spec the "customer" forgot to mention! :)
C#
public sealed class WordReplacement
{
    public WordReplacement(BadWord word, string replacement)
    {
        if (string.IsNullOrWhiteSpace(word.Word)) throw new ArgumentNullException(nameof(word));
        
        Pattern = word.ToRegularExpression();
        CaseSensitive = word.CaseSensitive;
        Replacement = replacement;
        
        if (CaseSensitive || replacement == null || replacement.Any(char.IsUpper))
        {
            Replacer = (Match m) => Replacement;
        }
        else
        {
            Replacer = (Match m) => MatchCase(m.Value, Replacement);
        }
    }
    
    public WordReplacement(string word, string replacement) : this(new BadWord(word), replacement)
    {
    }
    
    public Regex Pattern { get; }
    public string Replacement { get; }
    public bool CaseSensitive { get; }
    public MatchEvaluator Replacer { get; }
    
    public static string MatchCase(string wordToReplace, string replacement)
    {
        if (null == replacement) return string.Empty;
        if (wordToReplace.All(char.IsLower)) return replacement;
        if (wordToReplace.All(char.IsUpper)) return replacement.ToUpperInvariant();
        
        char[] result = replacement.ToCharArray();
        bool changed = false;
        
        if (wordToReplace.Length == replacement.Length)
        {
            for (int index = 0; index < result.Length; index++)
            {
                if (char.IsUpper(wordToReplace[index]))
                {
                    char c = result[index];
                    result[index] = char.ToUpperInvariant(c);
                    if (result[index] != c) changed = true;
                }
            }
        }
        else
        {
            if (char.IsUpper(wordToReplace[0]))
            {
                char c = result[0];
                result[0] = char.ToUpperInvariant(c);
                if (result[0] != c) changed = true;
            }
            if (char.IsUpper(wordToReplace[wordToReplace.Length - 1]))
            {
                int index = result.Length - 1;
                char c = result[index];
                result[index] = char.ToUpperInvariant(c);
                if (result[index] != c) changed = true;
            }
        }
        
        return changed ? new string(result) : replacement;
    }
    
    public string Replace(string input) => Pattern.Replace(input, Replacer);
}

And finally, a class to represent a list of bad word replacements:
C#
public sealed class Clbuttifier2000
{
    public Clbuttifier2000(IEnumerable<KeyValuePair<string, string>> replacements)
    {
        Replacements = replacements.Select(p => new WordReplacement(p.Key, p.Value)).ToList().AsReadOnly();
    }
    
    public IReadOnlyList<WordReplacement> Replacements { get; }
    
    public string Clbuttify(string message)
    {
        if (!string.IsNullOrWhiteSpace(message))
        {
            foreach (var replacement in Replacements)
            {
                message = replacement.Replace(message);
            }
        }
        
        return message;
    }
}

Sample usage:
C#
var filter = new Clbuttifier2000(new Dictionary<string, string>
{
    ["poop*"] = "p**p",
    ["PHB!"] = "boss",
    ["gotten"] = "become",
});

string input = "My PHB has gotten started on his new blog phblog.com. He's SUCH A MISBEGOTTEN Poophead!";
string expected = "My boss has become started on his new blog phblog.com. He's SUCH A MISBEGOTTEN P**phead!";
string actual = filter.Clbuttify(input);
Debug.Assert(actual == expected);
 
Share this answer
 
v3
Comments
Chris Maunder 25-Nov-16 12:00pm    
Nicely done. Lots of string allocations though. Now: does it do "Poophead" -> "P**phead"? This wasn't in the specs, but I don't think it does.
Richard Deeming 25-Nov-16 12:02pm    
Bl**dy customers! You follow the specs, and then they complain that it doesn't do something they forgot to mention in the specs, that's suddenly the most important part! :)
Richard Deeming 25-Nov-16 12:36pm    
Updated to account for the missing requirement. :)

Still lots of string allocations in the Clbuttify loop, so an answer that doesn't use regular expressions might be better. But they should all be gen-0, so you might get away with it.
Chris Maunder 25-Nov-16 12:05pm    
You should try writing surveys one day ;)

Your answer solves the problem. The point is to make the question a little loose to allow invention, improvement, and most importantly, pointless religious wars.
Nelek 25-Nov-16 16:56pm    
allowing pointless religion wars outside the soapbox? That's going to create a precedent... :rolleyes:
I was reading up on Python online, then chanced upon this pooping oops I mean coding challenge, thought why not try this out on Python. Here it is fresh from the loo oops again I mean oven.
Python
"""
poop.py

by Peter Leow the pooper

"""

import re

def functionStartWithPoop(m):
    wordFound = m.group(0)

    if wordFound[:5].lower()=='poop*':
        wordRepl = wordFound[0] + '**' + wordFound[3] + wordFound[5:]
    else: #wordFound[:4].lower()=='poop':
        wordRepl = wordFound[0] + '**' + wordFound[3] + wordFound[4:] 

    return wordRepl

def functionEndWithPoop(m):
    wordFound = m.group(0)

    if wordFound[-5:].lower()=='*poop':
        wordRepl = wordFound[:-5] + wordFound[-4] + '**' + wordFound[-1]
    else: #wordFound[-4:].lower()=='poop':
        wordRepl = wordFound[:-4] + wordFound[-4] + '**' + wordFound[-1]

    return wordRepl

def main():
    originalSentence = '''
    poop*ing is in front of make*poop.
    Whether poop* or *poop, there are just pOoP!
    A POOPHEAD cannot change but an exclaimed POOPHEAD! can.'''

    print('Before:')
    print(originalSentence)
    print()
    print('After:')
    
    # Without ! ending
    patternStartWithPoop=r'(?<!\S)poop\*?[\S]*'
    patternEndWithPoop=r'[\S]*\*?poop(?=[?!,.;]?$|[?!,.;]?\s+)'

    # with ! ending
    patternStartWithPoopEndWithExclamation = r'(?<!\S)poop\*?[\S]*!(?=\s|$)'
    patternEndWithPoopAndExclamation=r'[\S]*\*?poop!(?=[?!,.;]?$|[?!,.;]?\s+)'

    # Case sensitive
    filteredSentence = re.sub(patternStartWithPoop, functionStartWithPoop, originalSentence, flags=0)
    #print(filteredSentence)
    filteredSentence = re.sub(patternEndWithPoop, functionEndWithPoop, filteredSentence, flags=0)
    #print(filteredSentence)

    # Case ignorance
    filteredSentence = re.sub(patternStartWithPoopEndWithExclamation, functionStartWithPoop, filteredSentence, flags=re.IGNORECASE)
    #print(filteredSentence)
    filteredSentence = re.sub(patternEndWithPoopAndExclamation, functionEndWithPoop, filteredSentence, flags=re.IGNORECASE)
    print(filteredSentence)
  
main()

Try this out at Coding challenge bad word filter | Python Fiddle[^] and you should see the following OUTPOOP:
Before:

    poop*ing is in front of make*poop.
    Whether poop* or *poop, there are just pOoP!
    A POOPHEAD cannot change but an exclaimed POOPHEAD! can.

After:

    p**ping is in front of makep**p.
    Whether p**p or p**p, there are just p**P!
    A POOPHEAD cannot change but an exclaimed P**PHEAD! can.

I have ignored 'PHB' and 'gotten' as they are simply too trivial.
 
Share this answer
 
v5
Comments
Chris Maunder 26-Nov-16 12:52pm    
>I have ignored 'PHB' and 'gotten' as they are simply too trivial.

:) Except they force you to do do the "starts with" and "ends with"
Peter Leow 26-Nov-16 13:00pm    
You got me!
PIEBALDconsult 26-Nov-16 16:37pm    
Python is never the answer.
Jon McKee 26-Nov-16 19:33pm    
Unless the question is "what is a nonvenomous snake indigenous to Africa, Asia, and Australia with some of the largest snake species currently known in its genus." ;D
Peter Leow 26-Nov-16 22:38pm    
May be for next week.
Not sure on the rules, just wanted to add this for fun. I love code challenges.

This solution, in contrast to my other one, reduces the number of strings nuked during processing. It also uses only a single Regex call to process all bad words. Even though I construct an additional dictionary, my diagnostic tool shows me coming in at less memory consumed than my previous solution. This solution does everything the previous did, plus adds optional casing.

C#
public class BadWordFilter
{
    public static string Replace(string input, 
        IDictionary<string, string> badWordMap)
    {
        if (string.IsNullOrWhiteSpace(input))
            throw new ArgumentException(nameof(input),
                "String cannot be null, empty, or whitespace.");

        Dictionary<string, string> idMap = new Dictionary<string, string>();
        StringBuilder pattern = new StringBuilder();
        int idCounter = 0;
        //For each bad word pair, create an ID mapped to the value and construct the match pattern using the ID and key
        foreach (KeyValuePair<string, string> badWord in badWordMap)
        {
            string id = "ID" + idCounter++;
            idMap.Add(id, badWord.Value);
            ConstructMatchPattern(badWord.Key, id, pattern);
        }
        //Remove the first | from the pattern
        pattern.Remove(0, 1);

        Regex filter = new Regex(pattern.ToString(), RegexOptions.IgnoreCase);
        string[] groupNames = filter.GetGroupNames();
        MatchEvaluator evaluator = match =>
        {
            string replacement = "";
            //Find which group was matched and retrieve the replacement value
            for (int i = 1; i < groupNames.Length; i++)
                if (match.Groups[groupNames[i]].Success)
                {
                    replacement = idMap[groupNames[i]];
                    break;
                }

            //Handle casing
            if (replacement.StartsWith("!"))
            {
                replacement = replacement.Remove(0, 1);
                //All caps check
                if (match.Value == match.Value.ToUpper())
                    replacement = replacement.ToUpper();
                //First letter caps check
                else if (match.Value[0] == char.ToUpper(match.Value[0]))
                    replacement = char.ToUpper(replacement[0]) + replacement.Substring(1);
            }
            return replacement;
        };

        return filter.Replace(input, evaluator);
    }

    private static void ConstructMatchPattern(string badWord, string id, 
        StringBuilder pattern)
    {
        if (string.IsNullOrWhiteSpace(badWord))
            return;
        int patternLength = pattern.Length;
        pattern.Append($@"|(?<{id}>(?:\b){badWord.Trim('*')}");
        if (badWord.StartsWith("*"))
            pattern.Insert(patternLength + id.Length + 11, @"\w*", 1);
        if (badWord.EndsWith("*"))
            pattern.Append(@"\w*");
        pattern.Append(')');
    }
}


Used the same way as the previous example, with the inclusion of the "!" option.

C#
static void Main(string[] args)
{
    Dictionary<string, string> badWords = new Dictionary<string, string>()
    {
        {"poop*", "p**phead"},
        {"*HB", "boss"},
        {"gotten", "become"},
        {"*crap*", "taco supreme"}
    };
    string input = "My PHB is such a poophead. It's gotten worse since his promotion. In fact you might call him a supercraphead.";
    string filteredInput = BadWordFilter.Replace(input, badWords);
    Console.WriteLine(filteredInput);
    Console.ReadKey();
}


Replacing {"*HB", "boss"} with {"*HB", "!boss"} will yield BOSS instead of boss since it cases to its key's match.
 
Share this answer
 
v2
Comments
PIEBALDconsult 26-Nov-16 19:42pm    
The only rule is "no profanity". :cool:
Jon McKee 26-Nov-16 20:55pm    
I see what you did there /golfclap :)
Actually, stating that this challenge is easy is misleading. Handling text is never easy, if you want to work with Unicode. Here is a my take on it.

C#
using System;
using System.Collections.Generic;
using System.Globalization;
using System.Linq;
using System.Text;

class WordFilter
{
    private static Dictionary<string, string> badWords = null;

    private static List<KeyValuePair<string, string>> badWordsStart = new List<KeyValuePair<string, string>>();
    private static List<KeyValuePair<string, string>> badWordsEnd = new List<KeyValuePair<string, string>>();
    private static List<KeyValuePair<string, string>> badWordsMiddle = new List<KeyValuePair<string, string>>();
    private static Dictionary<string, string> badWordsCaseSensitive = new Dictionary<string, string>();
    private static Dictionary<string, string> badWordsCaseInsensitive = new Dictionary<string, string>();
        
    private static void Init()
    {
        if (badWords != null)
        {
            return;
        }

        badWords = new Dictionary<string, string>()
        {
            {"poop*", "p**p"},
            {"PHB!", "boss"},
            {"gotten", "become"},
            {"*crap*", "taco supreme"}
        };

        foreach (var item in badWords)
        {
            bool startWord = item.Key.EndsWith("*");
            bool endWord = item.Key.StartsWith("*");
            bool caseSensitive = item.Key.EndsWith("!");

            if (startWord && endWord)
            {
                badWordsMiddle.Add(new KeyValuePair<string, string>(item.Key.Trim('*').ToLower(CultureInfo.InvariantCulture), item.Value));
            }
            else if (startWord)
            {
                badWordsStart.Add(new KeyValuePair<string, string>(item.Key.TrimEnd('*').ToLower(CultureInfo.InvariantCulture), item.Value));
            }
            else if (endWord)
            {
                badWordsEnd.Add(new KeyValuePair<string, string>(item.Key.TrimStart('*').ToLower(CultureInfo.InvariantCulture), item.Value));
            }
            else if (caseSensitive)
            {
                badWordsCaseSensitive.Add(item.Key.TrimEnd('!'), item.Value);
            }
            else
            {
                badWordsCaseInsensitive.Add(item.Key.ToLower(CultureInfo.InvariantCulture), item.Value);
            }
        }
    }

    public WordFilter()
    {
        Init();
    }

    public string Filter(string s)
    {
        var word = new StringBuilder();
        var sout = new StringBuilder();

        foreach (var c in s.GetUTF32Chars())
        {
            if (c.IsLetter)
            {
                word.Append(c);
            }
            else 
            {
                if (word.Length > 0)
                {
                    var niceWord = Replace(word.ToString());
                    word.Clear();
                    sout.Append(niceWord);
                }
                sout.Append(c);
            }
        }
        return sout.ToString();
    }

    private string Replace(string word)
    {
        string newWord;
        if (badWordsCaseSensitive.TryGetValue(word, out newWord)) return newWord;

        var lword = word.ToLower(CultureInfo.InvariantCulture);
        if (badWordsCaseInsensitive.TryGetValue(word, out newWord)) return newWord;

        newWord = badWordsStart.Where(it => word.StartsWith(it.Key)).Select(it => word.Replace(it.Key, it.Value)).FirstOrDefault();
        if (newWord != null) return newWord;

        newWord = badWordsEnd.Where(it => word.EndsWith(it.Key)).Select(it => word.Replace(it.Key, it.Value)).FirstOrDefault();
        if (newWord != null) return newWord;

        newWord = badWordsMiddle.Where(it => word.Contains(it.Key)).Select(it => word.Replace(it.Key, it.Value)).FirstOrDefault();
        if (newWord != null) return newWord;

        return word;
    }
}

public static class StringExtensions
{
    public static System.Collections.Generic.IEnumerable<UTF32Char> GetUTF32Chars(this string s)
    {
        var tee = System.Globalization.StringInfo.GetTextElementEnumerator(s);

        while (tee.MoveNext())
        {
            yield return new UTF32Char(s, tee.ElementIndex);
        }
    }
}

public struct UTF32Char
{
    private string s;
    private int index;

    public UTF32Char(string s, int index)
    {
        this.s = s;
        this.index = index;
    }

    public override string ToString()
    {
        return char.ConvertFromUtf32(this.UTF32Code);
    }

    public int UTF32Code {  get { return char.ConvertToUtf32(s, index); } }
    public double NumericValue { get { return char.GetNumericValue(s, index); } }
    public UnicodeCategory UnicodeCategory { get { return char.GetUnicodeCategory(s, index); } } 
    public bool IsControl { get { return char.IsControl(s, index); } }
    public bool IsDigit { get { return char.IsDigit(s, index); } }
    public bool IsLetter { get { return char.IsLetter(s, index); } }
    public bool IsLetterOrDigit { get { return char.IsLetterOrDigit(s, index); } }
    public bool IsLower { get { return char.IsLower(s, index); } }
    public bool IsNumber { get { return char.IsNumber(s, index); } }
    public bool IsPunctuation { get { return char.IsPunctuation(s, index); } }
    public bool IsSeparator { get { return char.IsSeparator(s, index); } }
    public bool IsSurrogatePair { get { return char.IsSurrogatePair(s, index); } }
    public bool IsSymbol { get { return char.IsSymbol(s, index); } }
    public bool IsUpper { get { return char.IsUpper(s, index); } }
    public bool IsWhiteSpace { get { return char.IsWhiteSpace(s, index); } }
}

class Program
{
    static void Main(string[] args)
    {
        var wf = new WordFilter();

        var rawSentence = "My PHB is such a poophead. It's gotten worse since his promotion.  In fact you might call him a supercraphead!";
        var niceSentence = wf.Filter(rawSentence);
        Console.WriteLine(rawSentence);
        Console.WriteLine(niceSentence);
    }
}
 
Share this answer
 
Comments
PIEBALDconsult 28-Nov-16 19:14pm    
How does that handle p00p and cr@p?
Not sure if my Excel VBA solution posted, so I'm trying again
VB.NET
Public Function noBad(ByVal badstr As String) As String
    Dim badWords As Variant, goodWords As Variant
    Dim i As Integer, bw As String
    badWords = Array("poop*", "PHB", "gotten", "POOP*")
    goodWords = Array("p**p", "boss", "become", "P**P")
    For i = 0 To UBound(badWords)
        bw = badWords(i)
        With CreateObject("VBScript.RegExp")
            .Pattern = "\b" & bw & IIf(InStr(bw, "*"), "", "\b")
            .Global = True
            badstr = .Replace(badstr, goodWords(i))
        End With
    Next i
    noBad = badstr
End Function


Worth a go, I thought. Okay so the case thing isn't so good, but I only get a short lunch
 
Share this answer
 
Comments
Kornfeld Eliyahu Peter 28-Nov-16 8:46am    
It has to be really boring there to go for VBA :-)
DarrenWheatley 29-Nov-16 3:27am    
Why the downer on VBA? To be fair, it's as much as I expected, but it's not like we're a code shop or anything.
Kornfeld Eliyahu Peter 29-Nov-16 3:31am    
:-) <-- did you saw it?
I was joking... I believe in the "right tool for..."
DarrenWheatley 29-Nov-16 3:36am    
Well, I guess it was a quiet lunch...
COBOL has awesome string handling:

C#
identification division.
program-id. FixBadWords.

data division.
  
working-storage section.
   
	01 wsData.
	    03 wsComment occurs 5 times pic a(255).
	01 i         pic 9(1).

procedure division. 
	
	move "My PHBis such a poophead. It's gotten worse since his promotion" 
            to wscomment(1)
	move "My PHB is such a POOPHEAD!" to wscomment(2)

	perform varying i from 1 by 1 until i > 5
	    
		inspect wsComment(i) 
			replacing all 'poophead' by 'p**phead'
				      'POOPHEAD' BY 'P**PHEAD'
				      'PHB'      BY 'boss'
				      'gotten'   BY 'become'
		
		display wsComment(i)

	end-perform

stop run
.
 
Share this answer
 
v2
Comments
Chris Maunder 1-Dec-16 16:04pm    
So easy it's almost cheating!
Robert g Blair 1-Dec-16 19:33pm    
Well, you could do a lot to make it more gooderer:

- The string literals can be variables, which you could get from an input source, eg, a database, an input form, xml, whatever.

- You can also use TALLYING and BEFORE INITIAL ...

But case - case in COBOL is not so easy.
Sorry, but for COBOL solutions you have to choose a case. Upper or lower.

Case is merely a social construct y'know :)
Robert g Blair 1-Dec-16 23:02pm    
You can test this code here: https://www.tutorialspoint.com/compile_cobol_online.php
EDIT:
* Figured out way to make 'poopoopoop' become 'p**p**p**p' (without recursion) and keep other design constraints.

* Figured out an 'Oh Duh - that's so simple!' way to keep 'nincompoop' as 'nincompoop' - just replace it with itself!

* Made class static, which resulted in about 8% inprovement. But the additional 'p**p**p**p' routine (replaceRepeatingStrings) dropped speed from 0.24ms to 0.37ms. Still 7 times faster than Solution 10 on my machine.
-END EDIT

Quote:
The point is to make the question a little loose to allow ... pointless religious wars
Good, I'll use some 'goto's :)

I'm not very familiar with regex, but believe it doesn't allow you to change the case of a word based on the cases of the surrounding words. (If I'm wrong, please enlighten me.) If that is the case, 'PHB' will always become 'boss' in the regex solutions, regardless of whether the input 'shouted' the phrase.

The following solution is C++ all the way, and is about 7 times faster than Solution 10 on my machine using Grant's newest timing routine. It does take the capitalization of the surrounding words into effect. It will also change 'pooppoop' to 'p**pp**p', which doesn't seem to have been specified by the specifications, although logically should occur. Additionally, it changes 'poopoopoop' into 'p**p**p**p' without recursion.

Quote:
A weekly simple programming problem that should take no more than half an hour or so
Yeah, right Chris... :rolleyes: This logic was a friggin' pain!

C++
#include <map>
#include <string>
#include <algorithm>
#include <vector>
#include <tuple>
#include <ctime>
#include <iostream>
#include "windows.h" //For QueryPerformanceCounter
using namespace std;

//I am going to use a tuple of: the input word, the small case output word, and the
//capped case output word, so they don't have to be calculated each time.
//So some defines are just for additional clarity:
#define GetTupleWord      get<0>
#define GetTupleSmallCase get<1>
#define GetTupleCapCase   get<2>


//Utility class for the timing...
class TimerLowRes {
   private:
      std::clock_t begC;
      std::clock_t avgTotC;
      std::clock_t diffC;
      int numTimesC;
   public:
      TimerLowRes() : avgTotC(0), numTimesC(0) { }

      void start() { begC = std::clock(); }

      std::clock_t stop() {
         diffC = std::clock() - begC;
         avgTotC = avgTotC + diffC;
         numTimesC++;
         return diffC;
         }

      std::clock_t getAvg() {
         if (numTimesC == 0) return 0;
         return avgTotC / numTimesC;
         }

      void reset() {
         numTimesC = 0;
         avgTotC = 0;
         }

      std::clock_t getLapTime() { return std::clock() - begC; }
   };




//High precision timer utility class for the timing.
//Derived from https://msdn.microsoft.com/en-us/library/windows/desktop/dn553408(v=vs.85).aspx
class TimerHighRes {
   private:
      LARGE_INTEGER StartingTime, EndingTime, ElapsedMicroseconds, Frequency;
      int numTimesC;
   public:
      TimerHighRes() : numTimesC(0) {
         QueryPerformanceFrequency(&Frequency);
         }

      void start() {
         QueryPerformanceCounter(&StartingTime);
         }

      LARGE_INTEGER stop() {
         QueryPerformanceCounter(&EndingTime);
         ElapsedMicroseconds.QuadPart = EndingTime.QuadPart - StartingTime.QuadPart;
         ElapsedMicroseconds.QuadPart *= 1000000;
         ElapsedMicroseconds.QuadPart /= Frequency.QuadPart;
         numTimesC++;
         return ElapsedMicroseconds;
         }
   };


class Reformatter {
   private:
      //Good chance that the implied design requirements involve UNICODE text, but for quick
      //prototype of similar design use wstring:

      typedef tuple<wstring, wstring, wstring> WordTuples;
      vector<WordTuples> repeatingStringsC;     //This will address items like 'poopoopoop'
      vector<WordTuples> caseSensitiveSubsC;    //This means that the inputted word must be
                                                //matched on exact case.
      vector<WordTuples> nonCaseSensitiveSubsC; //By this, it is meant that the inputted word
                                                //can be either caps or small case in the text.
      vector<WordTuples> asteriskSubsC;         //The CapCase will be ignored - only the
                                                //asterisks in the small case will be modified.

      wstring resultC;
      size_t  strLenC;
      wstring::const_iterator beginLocC;
      wstring::const_iterator curLocC;
      wstring::const_iterator endLocC;

      bool replaceAsterisks();
      bool replaceNonCaseSensitiveBegin();
      bool replaceCaseSensitive();
      bool replaceRepeatingStrings();
      bool previousWordIsCapitalized(wstring::const_iterator & thisWord, wstring::const_iterator & prevWord);
      bool nextWordIsCapitalized(wstring::const_iterator & thisWord, wstring::const_iterator & nextWord);
      void findBeginningOfPrevious(wstring::const_iterator & thisWord, wstring::const_iterator & prevWord);
      void findBeginningOfNext(wstring::const_iterator & thisWord, wstring::const_iterator & nextWord);

      bool isWhiteSpaceOrPunctuation(const wstring::const_iterator & temp);

   public:
      Reformatter() {
         repeatingStringsC.push_back(make_tuple(L"poo", L"p**", L"p"));

         caseSensitiveSubsC.push_back(make_tuple(L"PHB", L"boss", L"BOSS"));

         //The following 'gotten to be' must be defined before 'gotten':
         nonCaseSensitiveSubsC.push_back(make_tuple(L"gotten to be", L"become", L"BECOME"));
         nonCaseSensitiveSubsC.push_back(make_tuple(L"gotten", L"become", L"BECOME"));

         asteriskSubsC.push_back(make_tuple(L"nincompoop", L"nincompoop", L"nincomp##p"));
         asteriskSubsC.push_back(make_tuple(L"poop", L"p**p", L"P**P"));
         asteriskSubsC.push_back(make_tuple(L"p##p", L"p**p", L"P**P"));
         asteriskSubsC.push_back(make_tuple(L"ass", L"a**", L"A**"));
         }

      void reformat(const wstring & str);
      void outputResult();
   };


void Reformatter::outputResult() {
   wcout << L"OUTPUT: " << resultC << endl << endl;
   }


bool Reformatter::isWhiteSpaceOrPunctuation(const wstring::const_iterator & it) {
   if (*it == L' ' || *it == L'\t' || *it == L'.' || *it == L'!' || *it == L'_' ||
               *it == L'\r' || *it == L'\n') return true;
   return false;
   }


void Reformatter::findBeginningOfNext(wstring::const_iterator & thisWord, wstring::const_iterator & nextWord) {
   //There is the possibility that there is no next word, but there is whitespace.
   //If that is the case, return prevWord = thisWord = curLocC.

   //Go to the whitespace at the end of the current word:
   while (thisWord != endLocC && !isWhiteSpaceOrPunctuation(thisWord)) ++thisWord;
   //Move 'thisWord' back to the beginning of the whitespace:
   nextWord = thisWord;
   //Now skip any additional whitespace:
   while (nextWord != endLocC && isWhiteSpaceOrPunctuation(nextWord)) ++nextWord;
   if (nextWord == endLocC) {
      nextWord = endLocC;
      thisWord = endLocC;
      return;
      }
   }


void Reformatter::findBeginningOfPrevious(wstring::const_iterator & thisWord, wstring::const_iterator & prevWord) {
   //There is the possibility that there is no previous word, but there is whitespace.
   //If that is the case, return prevWord = thisWord = (beginning of word).

   //Go to the whitespace before the current word:
   while (thisWord != beginLocC && !isWhiteSpaceOrPunctuation(thisWord)) --thisWord;
   //Move 'thisWord' back to the beginning (one space forward), and set 'prevWord' current pos:
   prevWord = thisWord;
   ++thisWord;
   //Now skip any additional whitespace:
   while (prevWord != beginLocC && isWhiteSpaceOrPunctuation(prevWord)) --prevWord;
   if (prevWord == beginLocC) {
      if (isWhiteSpaceOrPunctuation(prevWord)) prevWord = thisWord;
      return;
      }
   //We are now on the last character of the previous word.  Iterate to the beginning of it:
   while (prevWord != beginLocC && !isWhiteSpaceOrPunctuation(prevWord)) --prevWord;
   //Check for the case where the user starts the input with a space character:
   if (isWhiteSpaceOrPunctuation(prevWord)) ++prevWord;
   }


bool Reformatter::previousWordIsCapitalized(wstring::const_iterator & thisWord,
               wstring::const_iterator & prevWord) {

   //We are working from the 'curLocC' position in the string.
   //Create a temporary iterator and find the beginning of the previous word.
   //If it reaches the beginning of the string, return 'true' so the 'shouting'
   //routines only rely on the following word.
   findBeginningOfPrevious(thisWord, prevWord);
   if (thisWord == prevWord) return true; //We will default to 'true' for the previous word
               //and 'false' for the next word, so next word processing can occur.
   //Now find the case of each letter until the next whitespace:
   while (!isWhiteSpaceOrPunctuation(prevWord)) {
      wchar_t temp = *prevWord;
      if (iswlower(*prevWord)) return false;
      ++prevWord;
      }
   return true;
   }


bool Reformatter::nextWordIsCapitalized(wstring::const_iterator & thisWord, wstring::const_iterator & nextWord) {
   //We are working from the 'curLocC' position in the string.
   //Create a temporary iterator and find the beginning of the previous word.
   //If it reaches the beginning of the string, return 'true' so the 'shouting'
   //routines only rely on the following word.
   findBeginningOfNext(thisWord, nextWord);
   if (thisWord == nextWord) return false;   //We are defaulting to 'true' for previous word
               //processing, and 'false' for this, so if there isn't any text before or
               //after the word, both processings will occur.
   //Now find the case of each letter until the next whitespace:
   while (nextWord != endLocC && !isWhiteSpaceOrPunctuation(nextWord)) {
      wchar_t temp = *nextWord;
      if (iswlower(*nextWord)) return false;
      ++nextWord;
      }
   return true;
   }


bool Reformatter::replaceCaseSensitive() {
   //This returns true if a replacement has been made.
   bool found = false;
   wstring::const_iterator inputIterator;
   for (auto it = caseSensitiveSubsC.begin(); it != caseSensitiveSubsC.end(); ++it) {
      const wstring & str = GetTupleWord(*it);
      inputIterator = curLocC;
      for (int pos=0, len=str.length(); pos<len; ++pos) {
         found = true;
         if (inputIterator == endLocC || *inputIterator != str[pos]) {
            //The string doesn't match the criteria
            found = false;
            break;
            }
         ++inputIterator;
         }
      if (found) {
         //There is a non-specified scenario to take care of.  If the user inputs something
         //like 'PHBblog.com', in reality we may not want this to substitute 'boss'.  Here
         //is logic to take care of that scenario:
         if (inputIterator == endLocC || !isWhiteSpaceOrPunctuation(inputIterator)) {
            return false;
            }
         //We now know that we need to replace the string, but we don't know if it needs to
         //be 'shouted'.
         //curLocC is still at the beginning of the text being checked against.  This can
         //only be at the beginning of a word.
         wstring::const_iterator thisWord = curLocC;
         wstring::const_iterator prevWord = curLocC;
         wstring::const_iterator nextWord = curLocC;
         //If someone has different length words for cap and small, this will need to be changed in the cases:
         if (previousWordIsCapitalized(thisWord, prevWord) && nextWordIsCapitalized(thisWord, nextWord)) {
            resultC.append(GetTupleCapCase(*it));
            }
         else {
            resultC.append(GetTupleSmallCase(*it));
            }
         //Now append any whitespace after the string:
         curLocC += str.length();
         while (curLocC != endLocC && isWhiteSpaceOrPunctuation(curLocC)) {
            resultC.append(1, *curLocC);
            ++curLocC;
            }
         return found;
         }
      }
   return found;
   }


bool Reformatter::replaceNonCaseSensitiveBegin() {
   //By 'non-case-sensitive', it is meant that the routine will check for either capped or
   //non-capped matches.
   bool found;
   vector<tuple<wstring, wstring, wstring>>::iterator it;
   wstring::const_iterator inputIterator;
   const wstring * str;

   //If there isn't a character before 'curLocC' we don't need to look any further.
   if (curLocC == beginLocC) return false;
   if (!(curLocC-1==beginLocC || isWhiteSpaceOrPunctuation(curLocC-1))) return false;
   if (!isWhiteSpaceOrPunctuation(curLocC-1)) return false;

   for (it = nonCaseSensitiveSubsC.begin(); it != nonCaseSensitiveSubsC.end(); ++it) {
      str = &GetTupleWord(*it);
      inputIterator = curLocC;
      for (int pos=0, len=str->length(); pos<len; ++pos) {
         found = true;
         if (inputIterator == endLocC || towlower(*inputIterator) != (*str)[pos]) {
            found = false;
            break;
            }
         ++inputIterator;
         }
      if (found) break;
      }
   if (found) {
      //We need to see if it is capped, and if so, cap the output:
      inputIterator = curLocC;
      bool isCapped = true;
      for (int pos=0, len=(*str).length(); pos<len; ++pos) {
         if (iswlower(*inputIterator)) {
            isCapped = false;
            break;
            }
         ++inputIterator;
         }
      if (isCapped) {
         resultC.append(GetTupleCapCase(*it));
         }
      else {
         resultC.append(GetTupleSmallCase(*it));
         }
      curLocC += GetTupleWord(*it).length();
      return true;
      }
   return false;
   }


bool Reformatter::replaceAsterisks() {
   bool found;
   vector<tuple<wstring, wstring, wstring>>::iterator it;
   wstring::const_iterator inputIterator;
   const wstring * str;

   for (it = asteriskSubsC.begin(); it != asteriskSubsC.end(); ++it) {
      str = &GetTupleWord(*it);
      inputIterator = curLocC;
      for (int pos=0, len=str->length(); pos<len; ++pos) {
         found = true;
         if (inputIterator == endLocC || towlower(*inputIterator) != (*str)[pos]) {
            found = false;
            break;
            }
         ++inputIterator;
         }
      if (found) break;
      }
   if (found) {
      //Go through the small cap tuple and if a character is an asterisk, replace
      //the corresponding character with an asterisk:
      str = &GetTupleSmallCase(*it);
      inputIterator = curLocC;
      for (int pos=0, len=(*str).length(); pos<len; ++pos) {
         if ((*str)[pos] == L'*') {
            resultC.append(L"*");
            }
         else {
            resultC.append(1, *curLocC);
            }
         ++curLocC;
         ++inputIterator;
         }
      return true;
      }
   return false;
   }


bool Reformatter::replaceRepeatingStrings() {
   bool found;
   vector<tuple<wstring, wstring, wstring>>::iterator it;
   wstring::const_iterator inputIterator;
   const wstring * str;

   for (it = repeatingStringsC.begin(); it != repeatingStringsC.end(); ++it) {
      str = &GetTupleWord(*it);
      inputIterator = curLocC;
      for (int pos=0, len=str->length(); pos<len; ++pos) {
         found = true;
         if (inputIterator == endLocC || towlower(*inputIterator) != (*str)[pos]) {
            found = false;
            break;
            }
         ++inputIterator;
         }
      if (found) {
         //The inputIterator is pointing to the letter right after the initial string
         //(ie, the final 'p' in 'poop')
         str = &GetTupleCapCase(*it);
         if (*inputIterator != (*str)[0]) found = false;
         }
      if (found) break;
      }
   if (found) {
      //Go through the small cap tuple and if a character is an asterisk, replace
      //the corresponding character with an asterisk:
      str = &GetTupleSmallCase(*it);
      inputIterator = curLocC;
      for (int pos=0, len=(*str).length(); pos<len; ++pos) {
         if ((*str)[pos] == L'*') {
            resultC.append(L"*");
            }
         else {
            resultC.append(1, *curLocC);
            }
         ++curLocC;
         ++inputIterator;
         }
      return true;
      }
   return false;
   }


void Reformatter::reformat(const wstring & str) {
   resultC.reserve(str.length()+1);
   resultC.clear();
   beginLocC = str.begin();
   curLocC = beginLocC;
   strLenC = str.length();
   endLocC = str.end();
   while (curLocC != endLocC) {
      //Skip processing on whitespace:
      while (curLocC != endLocC && isWhiteSpaceOrPunctuation(curLocC)) {
         resultC.append(1, *curLocC);
         ++curLocC;
         }
      if (curLocC == endLocC) return;
      //Now do the non-whitespace processing:
      if (!(replaceCaseSensitive() || replaceNonCaseSensitiveBegin() ||
                  replaceRepeatingStrings() || replaceAsterisks())) {
         resultC.append(1, *curLocC);
         ++curLocC;
         }
      }
   }


int main() {
   static Reformatter reformatter;

   //First, some tests to show that it is working:
   wstring str = L"I GOTTEN PHB lettuce!";
   reformatter.reformat(str);
   wcout << L"INPUT:  " << str << endl;
   reformatter.outputResult();

   str = L"I GOTTEN phb lettuce!";
   reformatter.reformat(str);
   wcout << L"INPUT:  " << str << endl;
   reformatter.outputResult();

   str = L"Not Poopoop, and PoopPoop nInComPoop!!";
   reformatter.reformat(str);
   wcout << L"INPUT:  " << str << endl;
   reformatter.outputResult();

   str = L"My PHB is such a poophead. It's gotten worse since his promotion";
   reformatter.reformat(str);
   wcout << L"INPUT:  " << str << endl;
   reformatter.outputResult();

   //Changed the following to 'phb.com', so that program checks for more general case.
   //phb.com redirects to another site in reality, but there is no reason not to check
   //against being a URL in the implied design requirements.
   cout << "NOTE THAT 'boss' BECOMES CAPPED OR NON-CAPPED DEPENDING ON THE SURROUNDING WORDS:"
               << endl << "ALSO, 'PHB' BECOMES 'boss' DEPENDING UPON WHETHER 'PHB' IS BY ITSELF."
               << endl;
   str = L"MY PHB HAS started his new BLOG PHB_BLOG.com. And PHBBlog.com! "
               "He's GOTTEn TO BE SUCH A MISBEGOTTEN POOPHEAD!";
   reformatter.reformat(str);
   wcout << L"INPUT:  " << str << endl;
   reformatter.outputResult();

   //Now do an timing run:
   TimerLowRes timerLowRes;
   TimerHighRes timerHighRes;
   LARGE_INTEGER timeRighRes;
   vector<float> timeList;
   int numIterations = 100;
   int totalTests = 10;
   for (int numTimes=0; numTimes<totalTests; ++numTimes) {
      timerLowRes.start();
      timerHighRes.start();
      for (int x=0; x<numIterations; ++x) {
         //reformatter.reformat(L"MY PHB HAS started his new BLOG PHB_BLOG.com. And PHBBlog.com! He's gotten TO BE "
         //            " SUCH A MISBEGOTTEN POOPHEAD!");
         reformatter.reformat(L"My PHB is such a poophead. It's gotten worse since his promotion");
         //reformatter.reformat(L"My PHB has started his new blog phblog.com. He's SUCH A MISBEGOTTEN POOPHEAD!");
         //reformatter.reformat(L"I GOTTEN PHB lettuce!");
         //reformatter.reformat(L"Not Poopoop, but PoopPoop!!");  //Note that this will handle subsequent 'poops' correctly if 2 'p's together at end.
         }
      timeRighRes = timerHighRes.stop();
      clock_t timeLowRes(timerLowRes.stop());
      cout << "low res time: " << timeLowRes << " milliseconds" << endl;
      cout << "high res time: " << timeRighRes.QuadPart/(float)1000 << " milliseconds" << endl;
      timeList.push_back(timeRighRes.QuadPart/(float)1000);
      }
   float avg = 0.0;
   for (int i=0; i<totalTests; ++i) avg += timeList[i];
   avg = avg/totalTests;
   cout << endl << "Average = " << avg << " milliseconds";
   cout << endl << "Enter any key, then an 'Enter' to exit: ";
   cin >> timeRighRes.QuadPart; //Junk way to exit console, but good enough!
   return 0;
   }


Didn't use any goto's, for those who are counting, but wouldn't have given a crap if I did.

It might be possible to streamline the logic in the replaceCaseSensitive and other replacement routines. All the 'found' checking and setting seemed excessive, but it worked and I moved on.

David
 
Share this answer
 
v2
Comments
Robert g Blair 1-Dec-16 15:59pm    
David - my COBOL solution took just 15 minutes to write (and I haven't coded COBOL for many years).

And it will run blindingly fast. Everything is handled in a single intrinsic, compiler optimized, command.

I didn't add the "extra" bits you have, because they weren't in the spec, and I didn't want to spend any time on it.

To implement "pooppoop" -> "p**pp**p" is a simple REPLACING operation.

To implement "poopoopoop" -> "p**p**p**p" would involve a REPLACING "poo" BEFORE INITIAL "p**" - and recursing.
David O'Neil 1-Dec-16 20:22pm    
I like the the simplicity of your COBOL solution! I can see a C++ STL equivalent in my head that would be a little longer, but keep your simplicity. Out of curiosity, can you determine how fast your routine compares to Graeme's output for Solution 10 on your machine? I'm just curious how efficient intrinsics are - I've never played with them.

Your "REPLACING "poo" BEFORE INITIAL "p**"" comment triggered my new approach to solving that part of the problem - without recursion! Thanks!
Robert g Blair 1-Dec-16 22:04pm    
David - its very hard to compare.

If you remove the Display statement it will run in less than a millisecond on, say, a Fujitsu mainframe.
It runs in a couple milliseconds on a Ubuntu box with GnuCobol.

But what I coded was very compiler friendly - hard coded literals, single statement etc.

FYI: When I say "intrinsic" I mean in this sense: https://en.wikipedia.org/wiki/Intrinsic_function
Ie, the compiler is optimizing the string handling involved in the INSPECT statement execution. Whether it does it well or no is a function of the compiler's competence.
EDIT: Ah, the wonders sleep can do. Even though a solution was already chosen, I had an idea to improve my solution.

This was fun. I don't get to use regex too much in my day to day sadly.

C#
using System;
using System.Collections.Generic;
using System.Text;
using System.Text.RegularExpressions;

public class BadWordFilter
{
    public static void Replace(ref string input,
        IDictionary<string, string> badWordMap)
    {
        if (string.IsNullOrWhiteSpace(input))
            throw new ArgumentException(nameof(input),
                "String cannot be null, empty, or whitespace.");

        foreach (KeyValuePair<string, string> badWord in badWordMap)
            ReplaceBadWord(ref input, badWord);
    }

    private static void ReplaceBadWord(ref string input,
        KeyValuePair<string, string> badWord)
    {
        if (string.IsNullOrWhiteSpace(badWord.Key))
            throw new ArgumentException(nameof(badWord.Key),
                "Key cannot be null, empty, or whitespace.");

        string pattern = GetReplacementPattern(badWord.Key);
        MatchEvaluator evaluator = match =>
        {
            string replacement = badWord.Value;
            if (match.Value == match.Value.ToUpper())
            {
                if (badWord.Key != badWord.Key.ToUpper())
                    replacement = badWord.Value.ToUpper();
            }
            else if (match.Value[0] == char.ToUpper(match.Value[0]))
                replacement = char.ToUpper(badWord.Value[0]) + badWord.Value.Substring(1);
            return replacement;
        };
        input = Regex.Replace(input, pattern, evaluator, RegexOptions.IgnoreCase);
    }

    private static string GetReplacementPattern(string badWordKey)
    {
        StringBuilder pattern = new StringBuilder(
            $@"(?:\b){badWordKey.Trim('*')}"
        );
        if (badWordKey.StartsWith("*"))
            pattern.Insert(6, @"\w*", 1);
        if (badWordKey.EndsWith("*"))
            pattern.Append(@"\w*");
        return pattern.ToString();
    }
}


You use it like this:
C#
static void Main(string[] args)
{
    Dictionary<string, string> badWords = new Dictionary<string, string>()
    {
        {"poop*", "p**phead"},
        {"*HB", "boss"},
        {"gotten", "become"},
        {"*crap*", "taco supreme"}
    };
    string input = "My PHB is such a poophead. It's gotten worse since his promotion.  In fact you might call him a supercraphead!";
    BadWordFilter.Replace(ref input, badWords);
    Console.WriteLine(input);
    Console.ReadKey();
}
//Output:
//My boss is such a p**phead.  It's become worse since his promotion.  In fact you might call him a taco supreme!


Handles *text, text*, and *text* wildcards. Will not capitalize a replacement for a word that is by default capitalized (ex. PHB -> boss). Maintains first-letter capitalization when appropriate. Also support any regex code you'd like to put in the key since I directly inject it. For example {"p[oO0]+p*", "p**phead"} will catch poop, poOp, p0OPhead, or even PoO0o0Ophead.
 
Share this answer
 
v7
After a little more refactoring and such:

C#
PIEBALD.Type.ReplaceOmatic replacer = 
  new PIEBALD.Type.ReplaceOmatic
  (
    new System.Tuple<string,string> ( "PHB!"        , "boss!"  ) 
  ,
    new System.Tuple<string,string> ( "gotten"      , "become" ) 
  ,
    new System.Tuple<string,string> ( "*p[o0]{2}p*" , "p**p"   ) 
  ) ;

string intext = "My PHB is such a poophead. It's gotten worse since his promotion. My PHB has started his new blog phblog.com. He's SUCH A MISBEGOTTEN POOPHEAD!" ;

string outtext = intext ;

int total = replacer.Replace ( ref outtext ) ;

System.Console.WriteLine ( "{0}\n{1}\n{2}" , intext , total , outtext ) ;


Produces:

My PHB is such a poophead. It's gotten worse since his promotion. My PHB has started his new blog phblog.com. He's SUCH A MISBEGOTTEN POOPHEAD!
5
My boss is such a p**phead. It's become worse since his promotion. My boss has started his new blog phblog.com. He's SUCH A MISBEGOTTEN P**PHEAD!



This just in (2016-11-27): ReplaceOmatic[^]


Late breaking news (2016-11-28):
I now have it tracking how many characters were replaced in each section, so it will stop applying replacements once the entire section has been replaced.
This is a reason to sort the more-frequently-hit replacements to the front of the List.

And I've tightened up the code for copying characters to a StringBuilder.

Part of the change:
C#
int r = 0 ;

/* Apply ReplacementSpecs to the section until we run out of them or we've replaced the entire section. */
for ( int i = 0 ; ( i < this.rep.Count ) && ( r < Candidate.Length ) ; i++ )
{
  /* Track how many characters have been replaced so far. */
  r += this.rep [ i ].Replace ( ref temp ) ;
}



News flash! (2016-11-30):
Added Sorting and updated the Article and uploaded the latest code (I hope).
 
Share this answer
 
v9
I saw that no one addressed JavaScript jet...
I did...
JavaScript
function Bad2Good(text)
{
	var T = {
		'poophead': 'p**phead',
		'PHB!': 'boss',
		'gotten': 'become',
		'poop*': 'p**p',
		'*poop': 'p**p'
	};

	for (var prop in T)
	{
		var flags = 'ig';

		// starts with
		if (prop[prop.length-1] == '*')
		{
			prop = /(\s|[!?.,])poop(\w*)/;
		}

		// ends with
		if (prop[0] == '*')
		{
			prop = /(\w*)poop(\s|[!?.,])/;
		}

		// case sensitive
		if (prop[prop.length - 1] == '!')
		{
			flags = 'g';
			prop = prop.substr(0, prop.length - 1);
		}

		text = text.replace(new RegExp(prop, flags), function (match, p1, p2, p3, p4)
		{
			if (Number.isInteger(p1))
			{
				if (match.toUpperCase() == match)
				{
					// case sensitive
					if (T[match.toLowerCase()] == undefined)
					{
						return (T[match + '!']);
					}
					// shouting
					else
					{
						return (T[match.toLowerCase()].toUpperCase());
					}
				}
				// normal
				else
				{
					return (T[match]);
				}
			}
			else
			{
				// starts with
				if(/(\s|[!?.,])/.test(p1))
				{
					match = match.substr(p1.length, match.length - p1.length - p2.length);

					return (p1 + ((match.toUpperCase() == match) ? T[match.toLowerCase() + '*'].toUpperCase() : T[match.toLowerCase() + '*']) + p2);
				}
				// ends with
				else if (/(\s|[!?.,])/.test(p2))
				{
					match = match.substr(p1.length, match.length - p1.length - p2.length);

					return (p1 + ((match.toUpperCase() == match) ? T['*' + match.toLowerCase()].toUpperCase() : T['*' + match.toLowerCase()]) + p2);
				}
				else
				{
					return (match);
				}
			}
		});
	}

	return (text);
}


Got home. Got time. Saw some of my bugs and smashed them...
JavaScript
function Bad2Good(text)
{
	var T = {
		'PHB!': 'boss',
		'gotten': 'become',
		'poop*': 'p**p',
		'*poop': 'p**p'
	};

	for (var prop in T)
	{
		var flags = 'ig';

        // case sensitive
		if (prop[prop.length - 1] == '!')
		{
			flags = 'g';
			prop = prop.substr(0, prop.length - 1);
		}

		// starts with
		if (prop[prop.length - 1] == '*')
		{
			prop = '(\\b)' + prop.substr(0, prop.length - 1) + '(\\w*)(\\b)';
		} 
        
        // ends with
		if (prop[0] == '*')
		{
			prop = '(\\b)(\\w*)' + prop.substr(1, prop.length) + '(\\b)';
		}

		text = text.replace(new RegExp(prop, flags), function (match, len, p1)
		{
            var lookup;
            var replace;
            var good;

            if(Number.isInteger(len))
            {
                replace = match;
                lookup = match;
            }
            else 
            {
                if(match.startsWith(p1)) 
                {
                    replace = match.replace(p1, '');
                    lookup = '*' + replace;
                }
                else
                {
                    replace = match.replace(p1, '');
                    lookup = replace + '*';
                }
            }

            good = T[lookup.toLowerCase()] || T[lookup + '!'];

            if(T[lookup + '!'] == undefined)
            {
                if((replace.toUpperCase() == replace))
                {
                    good = good.toUpperCase();
                }
                else if((replace.toLowerCase() == replace))
                {
                    good = good.toLowerCase();
                }
            }

            return(match.replace(replace, good));
		});
	}

	return (text);
}
 
Share this answer
 
v3
Comments
PIEBALDconsult 27-Nov-16 17:26pm    
Are your replacements hard-coded?
Kornfeld Eliyahu Peter 27-Nov-16 17:40pm    
As a bug... I just updated with the home-made version :-)
Quick and dirty Javascript

JavaScript
var bad = ["poop*", "PHB!", "gotten"];
var replacement = ["p**p", "boss", "become"];
function escapeRegExp(str) {
  return str.replace(/[\-\[\]\/\{\}\(\)\*\+\?\.\\\^\$\|]/g, "\\$&");
}
function replaceBad(sentence){
	return sentence.split(/\b/).map(w => {
		for(var i=0, m=bad.length; i<m; i++){
			var badToken = bad[i];
			if(badToken.slice(-1)==='*') { // ends with *
				return w.replace(new RegExp('^' + escapeRegExp(badToken.slice(0,-1)) + '(.*)$', 'i'), replacement[i]+'$1');
			}
			if(badToken.slice(0)==='*') { // starts with *
				return w.replace(new RegExp('^(.*)'+ escapeRegExp(badToken.slice(1)) + '$', 'i'), '$1' + replacement[i]);
			}
			if(badToken.slice(-1)==='!') { // ens with !
				return w.replace(new RegExp('^'+ escapeRegExp(badToken.slice(0,-1)) + '$', ''), replacement[i]);
			}
			return w.replace(new RegExp('^'+badToken+'$', 'i'), replacement[i]);
		}
	}).join('');
}

var test=['Hello poophead; glad your PHB has "gotten" less rotten',
'Hi poopheadpoop; glad your PhB has "undergotten" less NOPHB'
];

test.forEach(s => {
	console.log(s + ' => ' + replaceBad(s)); 
})
 
Share this answer
 
Only spotted this 20 min ago, so hopefully not too late to the tea party!

C# code allows for individual word case sensitivity + shortcut override (in bonus requirement). All cases use the same bad word replacement extension methods. ReplaceWords string extension for a set of words or ReplaceText string extension for single word granularity.

C#
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text;

namespace BadWordFilter
{
    class Program
    {
        static void Main(string[] args)
        {
            FirstTest();
            BonusTest();

            Console.WriteLine("-- Press any key to exit --");
            Console.ReadKey();
        }

        private static void FirstTest()
        {
            var badWords = new Dictionary<string, tuple<string, bool>>()
            {
                {"poophead", new Tuple<string, bool>("p**phead", false)},
                {"PHB", new Tuple<string, bool>("boss", false)},
                {"gotten",new Tuple<string, bool>("become", false) }
            };

            string badMessage = "My        PHB is such a poophead. It's gotten worse since his promotion";
            string goodMessage = badMessage.ReplaceWords(badWords, filtered: false);

            Console.WriteLine("First Test");
            Console.WriteLine("==========");
            Console.WriteLine($"Before: {badMessage}");
            Console.WriteLine($"After:  {goodMessage}");
            Console.WriteLine();

            badWords = new Dictionary<string, tuple<string, bool>>()
            {
                {"POOPHEAD", new Tuple<string, bool>("P**PHEAD", true)},
                {"poophead", new Tuple<string, bool>("p**phead", false)},
                {"PHB", new Tuple<string, bool>("boss", true)}
            };

            badMessage = "My PHB is such a POOPHEAD!";
            goodMessage = badMessage.ReplaceWords(badWords, filtered: false);

            Console.WriteLine($"Before: {badMessage}");
            Console.WriteLine($"After:  {goodMessage}");
            Console.WriteLine();

        }

        private static void BonusTest()
        {
            var badWords = new Dictionary<string, tuple<string, bool>>()
            {
                {"POOP*", new Tuple<string, bool>("P**P", true) },
                {"poop*", new Tuple<string, bool>("p**p", false) },
                {"PHB!", new Tuple<string, bool>("boss", true) },
                {"gotten",new Tuple<string, bool>("become", true) }
            };

            string badMessage = "My PHB has started his new blog phblog.com. He's SUCH A MISBEGOTTEN POOPHEAD!";
            string goodMessage = badMessage.ReplaceWords(badWords, filtered: true);

            Console.WriteLine("Bonus Test");
            Console.WriteLine("==========");
            Console.WriteLine($"Before: {badMessage}");
            Console.WriteLine($"After:  {goodMessage}");
            Console.WriteLine();
        }
    }

    public static class StringExtension
    {
        private static readonly string[] spaceSplitChar = { " " };
        private static readonly char[] wordSplitChars = @"?<=[\.!\?])".ToCharArray();

        public static string ReplaceWords(this string input, Dictionary<string, tuple<string, bool>> badWords, bool filtered = false, bool trimWhitespaces = true)
        {
            if (string.IsNullOrEmpty(input))
                return input;

            foreach (var word in badWords)
                input = input.ReplaceText(word.Value.Item1, word.Key, filtered: filtered, isCaseSensitive: word.Value.Item2, trimWhitespaces: trimWhitespaces);

            return input;
        }

        public static string ReplaceText(this string input, string replaceText, string findText = "", bool filtered = false, bool isCaseSensitive = false, bool trimWhitespaces = true)
        {
            if (string.IsNullOrEmpty(input))
                return input;

            var sb = new StringBuilder();
            bool isMatchStart = false, 
                 isMatchEnd = false;
            StringComparison compareMode = isCaseSensitive ? StringComparison.InvariantCulture : StringComparison.InvariantCultureIgnoreCase;

            if (filtered)
            {
                isMatchStart = findText.EndsWith("*");
                isMatchEnd = findText.StartsWith("*");
                compareMode = findText.EndsWith("!") ? StringComparison.InvariantCulture : compareMode;
                findText = findText.Trim(new[] { '*', '!' });
            }

            int lenOldValue = findText.Length,
                curPosition = 0,
                idxNext = input.IndexOf(findText, compareMode);

            while (idxNext >= 0)
            {
                if (isMatchStart || !isMatchEnd)
                {
                    if (input.Substring(idxNext - 1, 1).Equals(" "))
                        sb.Append(input, curPosition, idxNext - curPosition)
                          .Append(replaceText);
                }
                else if (isMatchEnd)
                {
                    sb.Append(input, curPosition, idxNext - curPosition)
                      .Append(curPosition < input.Length && input.Substring(curPosition, curPosition + 1)[0].IsEndChar() ? findText : replaceText);
                }

                curPosition = idxNext + lenOldValue;
                idxNext = input.IndexOf(findText, curPosition, compareMode);
            }
            sb.Append(input, curPosition, input.Length - curPosition);

            input = sb.ToString();
            if (trimWhitespaces)
                input= input.TrimCharacter(' ');

            return input;
        }

        public static bool IsEndChar(this char c)
            => wordSplitChars.Contains(c);

        public static string TrimCharacter(this string input, char c)
            => string.Join(c.ToString(), input.Split(c).Where(str => str != string.Empty).ToArray());

    }
}


Output from execution:
First Test
==========

Before: My        PHB is such a poophead. It's gotten worse since his promotion
After:  My boss is such a p**phead. It's become worse since his promotion

Before: My PHB is such a POOPHEAD!
After:  My boss is such a P**PHEAD!


Bonus Test
==========

Before: My PHB has started his new blog phblog.com. He's SUCH A MISBEGOTTEN POOPHEAD!
After:  My boss has started his new blog phblog.com. He's SUCH A MISBEGOTTEN P**PHEAD!

-- Press any key to exit --


This code is built for efficiency, so no RegEx used! ;)

[edit: hopefully I have fixed all the weird stuff when code pasted in...]
 
Share this answer
 
v12
VISUAL PROLOG 7.5 Solution.

Note: the bad_nice/2 clauses can be moved to an external file for maintenance independent of the rest of the code.

This could be tidied up more, but I considered speed of submission to be the priority here.

Harrison Pratt


class predicates
    badWordFix : ( string TextToFilter ) -> string FixedText.
clauses
    badWordFix( Text ) = FixedText :-
        Splitters = " .,:!",
        FixedText = fixText ( Text, Splitters, "" ).

        class predicates
            fixText : ( string, string Separators, string Temp ) -> string.
        clauses
            fixText( "",_,S ) = S :-!.
            fixText( S, Separators, Temp ) = FixedText :-
                string::splitStringBySeparators(S,Separators,HeadStr,CurrSep,Rest ),
                !,
                FixedText = fixText( Rest, Separators, string::concat( Temp, filteredWord(HeadStr), string::charToString(CurrSep) )).
            fixText( S, Separators, Temp ) = FixedText :-
                FixedText = fixText("",Separators,string::concat(Temp,S )).

    class predicates
        filteredWord : ( string ) -> string .
    clauses
        filteredWord( S ) = NiceWord :-
            CleanS = stripPrefix( stripSuffix(S,"*"), "*" ),
            bad_nice( CleanS, NiceWord ), !.
        filteredWord( S ) = S.

    class predicates
        bad_nice : ( string, string [out] ) determ.
    clauses
        bad_nice( "poophead", "p***head" ).
        bad_nice( "PHB", "boss" ).
        bad_nice( "gotten", "become" ).

    class predicates
        stripPrefix : ( string, string ) -> string.
    clauses
        stripPrefix( S, Pre ) = StripStr :-
            string::hasPrefix(S,Pre,StripStr), !.
        stripPrefix( S, _ ) = S.
    
    class predicates
        stripSuffix : ( string, string ) -> string.
    clauses    
        stripSuffix( S, Suff ) = StripStr :-
            string::hasSuffix(S,Suff,StripStr), !.
        stripSuffix( S, _ ) = S.
 
Share this answer
 
I'm a fan of clean code.
Which for me is at least:
- It should be expressive-> meaningful names
- Smaller is better -> methods should be short
- Can be easily extended by any other developer -> if requiremets changes easy to adapt
- Reading the code should be pleasant -> because you can follow the logic without overstress your brain

Here it is (BadWordsMatch.cs):

C#
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text;
using System.Text;
using System.Text.RegularExpressions;

namespace BadWords
{
    public enum eHow { easy, hard };

    public class BadWordsMatchAndReplace
    {
            public enum eMatchMode { startsWith, Endswith, Equals };
            public struct sWildCardAndCase
            {
                public eMatchMode matchMode;
                public bool respectCase;
                public string key;
            }

            public static Dictionary<sWildCardAndCase, string> dictBadWordsEasy = new Dictionary<sWildCardAndCase, string>(){
                {new sWildCardAndCase() { matchMode = eMatchMode.Equals, respectCase = true, key ="poophead"}, "p**phead"},
                {new sWildCardAndCase() { matchMode = eMatchMode.Equals, respectCase = true, key ="PHB"}, "boss"},
                {new sWildCardAndCase() { matchMode = eMatchMode.Equals, respectCase = true, key ="gotten"} , "become"}
            };

            public static Dictionary<sWildCardAndCase, string> dictBadWordsHard = new Dictionary<sWildCardAndCase, string>(){
                {new sWildCardAndCase() { matchMode = eMatchMode.startsWith, respectCase = false, key = "poop" }, "p**p" },
                {new sWildCardAndCase() { matchMode = eMatchMode.Equals, respectCase = true, key = "PHB" }, "boss" },
                {new sWildCardAndCase() { matchMode = eMatchMode.Equals, respectCase = false, key = "gotten" }, "become" }

           };

            // shouting is retrieved by allowing an exclamation mark in words and later on compared to last char therein
            public static Regex r = new Regex(@"([!\w])+", RegexOptions.None);
            public static bool shouting = false;

            /// <summary>
            /// checks an input sentence for bad words and replaces them by safer expressions
            /// </summary>
            /// <param name="input">sentence to check</param>
            /// <param name="how">easy or hard substitution rule</param>
            /// <returns>corrected string</returns>           
            /// 
            public static string MatchAndReplace(string input, eHow how)
            {

                string output = input;
                string s = string.Empty;
                shouting = false;

                Dictionary<sWildCardAndCase, string> baseDict = (how == eHow.easy ? dictBadWordsEasy : dictBadWordsHard);
                Match m = r.Match(input);

                while (m.Success)
                {
                    s = m.Value;
                    bool found = false;

                    foreach (KeyValuePair<sWildCardAndCase, string> kvp in baseDict)
                    {
                        bool respectCase = kvp.Key.respectCase;
                        if ( s.EndsWith("!"))
                        {
                            // case insensitive and make replacement uppercase
                            shouting = true;
                            respectCase = false;
                        }
                        // looks for matches under given constraints and updates output accordingly
                        found = ProcessWord(s, kvp.Key.key, respectCase, kvp.Key.matchMode, kvp.Value, ref output);
                        if (found) break;
                    }
                    m = m.NextMatch();
                }
                return output;
 

            }

            /// <summary>
            /// looks for a match of word under given constraints and updates output accordingly
            /// </summary>
            /// <param name="word">word from input string</param>
            /// <param name="key">bad word</param>
            /// <param name="respectCase">match only case exactly</param>
            /// <param name="matchMode">wildcard specifier</param>
            /// <param name="value">replacement</param>
            /// <param name="output">output -> output with replacement</param>
            /// <returns>no match -> false else true</returns>
            ///             
            private static bool ProcessWord(string word, string key, bool respectCase, eMatchMode matchMode, string value, ref string output)
            {
                if (respectCase && !word.Contains(key)) return false;
                if (!respectCase && !word.ToLower().Contains(key.ToLower())) return false;
                if (!ConstraintsMet(  word,  key,  respectCase,  matchMode)) {
                    return false;
                }
                if (respectCase)
                {
                    output = output.Replace(key, value);
                }
                else
                {
                    if (word.Any(char.IsUpper))
                        output = output.Replace(key.ToUpper(), value.ToUpper());
                    else if (shouting)
                        output = output.Replace(key, value.ToUpper());
                    else
                        output = output.Replace(key, value);
                }
                return true;

            }

            /// <summary>
            /// checks for constraints
            /// </summary>
            /// <param name="word">word from input string</param>
            /// <param name="key">bad word</param>
            /// <param name="respectCase">match only case exactly</param>
            /// <param name="matchMode">wildcard specifier</param>
            /// <returns>constraints met -> true else false</returns>
            ///              
            private static bool ConstraintsMet( string word, string key, bool respectCase, eMatchMode matchMode)
            {
                string wordToCompare = word;
                if (shouting) { wordToCompare = wordToCompare.TrimEnd('!'); };
                switch (matchMode)
                {
                    case eMatchMode.Equals:
                        if (respectCase && !wordToCompare.Equals(key)) { return false; }
                        else if (!respectCase && !wordToCompare.ToLower().Equals(key.ToLower())) { return false; }
                        break;
                    case eMatchMode.startsWith:
                        if (respectCase && !wordToCompare.StartsWith(key)) { return false; }
                        else if (!respectCase && !wordToCompare.ToLower().StartsWith(key.ToLower())) { return false; }
                         break;
                    case eMatchMode.Endswith:
                        if (respectCase && !wordToCompare.EndsWith(key)) { return false; }
                        else if (!respectCase && !wordToCompare.ToLower().EndsWith(key.ToLower())) { return false; }
                        break;
                }
                return true;

            }

    }
}



It should be self explaining.
Most important is: I concentrate the requirements about case sensitivity and wildcard rules in a data structure which relieves me from looking for *s before or
after the search key.
For the 'Easy part' I would not had a need for this but once I made it for the hard
part it's nice to have it for this part too, because the code can be one for both
(
C#
Dictionary<sWildCardAndCase, string> baseDict = (how == eHow.easy ? dictBadWordsEasy : dictBadWordsHard);


For Testing my Code I have provided a little Console Main, which is here (Program.cs):

C#
using System;
using System.Collections.Generic;
using System.Linq;


namespace BadWords
{
    class Program
    {
        static void Main(string[] args)
        {        
            string input = string.Empty;
            string s = string.Empty;
            string output = string.Empty;

            while (!(s.Equals("q")))
            {
                while (!(s = Console.ReadLine()).Equals("."))
                {
                    input = input + s;
                }
                Console.WriteLine("------------- Easy Match ---------------"); 
                Console.WriteLine(input);
                output = BadWordsMatchAndReplace.MatchAndReplace(input, eHow.easy);
                Console.WriteLine(output);
                Console.WriteLine("------------- Bonus Match ---------------");
                Console.WriteLine(input);
                output = BadWordsMatchAndReplace.MatchAndReplace(input, eHow.hard);
                Console.WriteLine(output);
                Console.WriteLine("-------------- Next Input ---------------");
                s = Console.ReadLine();
                input = s;

            }
        }
    }
}


Usage: Paste your input (multiline allowed), then type <return>.<return>
If you want to stop the program then type <return>q<return>
 
Share this answer
 
Comments
PIEBALDconsult 29-Nov-16 18:51pm    
Oh, the humanity! Wasteful wasteful. You throw away perfectly good information that can be used later.
You don't even allow for multiple keys to match a word.
I decided to use F#…

Each filter specification is converted to a closure that takes a string, potentially sanitises it and returns the good string. To process all words in a string, use Regex.Replace with a MatchEvaluator that runs the filter list over the matched word.

F#
open System
open System.Text.RegularExpressions

// Define an active pattern to match a string starting with a prefix and return the remnant of the string
let (|Prefix|_|) (ignoreCase:bool) (prefix:string) (s:string) =
   let maybePrefix = s.Substring(0, (min prefix.Length s.Length))
   match String.Compare(maybePrefix, prefix, ignoreCase) with
   | 0 -> Some(maybePrefix, s.Substring(prefix.Length))
   | _ -> None

// Define an active pattern to match a string ending with a suffix and return the remnant of the string
let (|Suffix|_|) (ignoreCase:bool) (suffix:string) (s:string) =
   let maybeSuffix = s.Substring((max 0 (s.Length - suffix.Length)))
   match String.Compare(maybeSuffix, suffix, ignoreCase) with
   | 0 -> Some(maybeSuffix, s.Substring(0, s.Length - suffix.Length))
   | _ -> None

// Adjust case of a good word to reflect how the bad word's been used
let AdjustCase (suffix:string) (badBit:string) (goodWord:string) (ignoreCase:bool) =
   if not ignoreCase then
      goodWord
   else if badBit = suffix.ToUpper() then
      goodWord.ToUpper()
   else if badBit = suffix.ToLower() then
      goodWord.ToLower()
   else
      goodWord

// Create a filter (a closure of string -> string) from a filter spec (pair of strings, 
// first = bad word, second = replacement). Applying the closure to a string will sanitise it.
let createFilter (spec:string*string) =
   let rec createFilterHelper (badWord:string) (goodWord:string) (ignoreCase:bool) =
      match badWord with
      // Case sensitive?
      | Suffix true "!" (_, prefix) -> createFilterHelper prefix goodWord false
      // badWord is a prefix (<string>*)
      | Suffix true "*" (_, prefix) ->
         fun (word:string) ->
            match word with
            | Prefix ignoreCase prefix (badBit, restOfWord) -> (AdjustCase prefix badBit goodWord ignoreCase) + restOfWord
            | anyOtherWord -> anyOtherWord
      // badWord is a sufffix (*<string>)
      | Prefix true "*" (_, suffix) ->
         fun (word:string) ->
            match word with
            | Suffix ignoreCase suffix (badBit, restOfWord) -> restOfWord + (AdjustCase suffix badBit goodWord ignoreCase)
            | anyOtherWord -> anyOtherWord
      // badWord is fixed
      | anyOtherWord -> 
         fun (word:string) ->
            match String.Compare(word, badWord, ignoreCase) with
            | 0 -> AdjustCase badWord word goodWord ignoreCase
            | _ -> word
   // Invoke createFilterHelper
   createFilterHelper (fst spec) (snd spec) true

// Create a filter list from a spec list
let createFilters specs = specs |> List.map createFilter

// Apply a sequence of filters to a word
let filterAWord filters word = filters |> Seq.fold (fun word filter -> filter word) word

// Apply a sequence of filters to each word in a string
let filterWords filters text = Regex.Replace(text, "\w+", MatchEvaluator(fun regexMatch -> filterAWord filters regexMatch.Value))

// These are my test filter specs
let filterSpecs = [ ("poop*", "p**p"); ("PHB!", "boss") ; ("gotten", "become") ]

// And my test filters
let compiledFilters = createFilters filterSpecs

let tests = 
   printfn "Poophead -> p**phead = %b" ((filterAWord compiledFilters "Poophead") = "p**phead")
   printfn "PHB -> boss = %b" ((filterAWord compiledFilters "PHB") = "boss")
   printfn "Phb -> Phb = %b" ((filterAWord compiledFilters "Phb") = "Phb")
   printfn "gotten -> become = %b" ((filterAWord compiledFilters "gotten") = "become")
   printfn "<long string=""> - %b" ((filterWords compiledFilters "My PHB has started his new blog phblog.com. He's SUCH A MISBEGOTTEN POOPHEAD!") = "My boss has started his new blog phblog.com. He's SUCH A MISBEGOTTEN P**PHEAD!")
</long></string></string>
 
Share this answer
 
Comments
PIEBALDconsult 29-Nov-16 18:52pm    
Does the filter indicate whether or not anything was replaced?
Stuart Dootson 30-Nov-16 2:09am    
No, but they could easily do so (return tuple of bool & string?). To stop filtering once a substitutions made, I presume? If that's the case, the Seq.fold needs to become a 'Seq.map |> Seq.find' - that's lazy evaluated, so not all the filters would be processed.
Stuart Dootson 30-Nov-16 5:25am    
In fact - the easiest way to accomplish this is to:

1. Change the filter to return a string option (Some string indicates that filtering was done)

2. Add a 'null' filter (always returns the input word) at the end of the filter list to ensure there's always a succeeding filter.

3. Change the filterAWord function to be:

let filterAWord (filters:(string -> string option) seq) (word:string) =
      filters
      |> Seq.map (fun filter -> filter word)
      |> Seq.find Option.isSome
      |> Option.get
NOTE: This is a performance summary comparison of the above C# solutions only

** Updated 2/12/16: Stuart Dootson supplied test project, so timings updated.

I was curious to see how the other C# solutions compared performance wise to my own, so I build a test bed that you can download[^] and try yourself. I balanced out all the solutions so they ran equally against the same test strings.

Disclaimer: I did not check each solution to see if they all gave the same output or not. I have now run tests on each of the solutions and 3 of them fail the benchmark set. (I suggest that the authors of solutions 3, 5 & 12 download the link above and check for errors.)

Here are the results (best to worse ranked by average) on my dev machine compiled in Release Mode and run from the command line.

(Note: Results will differ for you based on your system configuration)

After the initial positioning of results and tweaks to all the tests (v2), I felt that I could improve my results a little, so I've made some minor changes (can be found in the download link above), and the revised results are now:

-----------------------------------------------------------------------------------
 
Tests run:        10
Iterations/test:  100
Total executions: 1,000
 
Basic Test
 
   Solution 10   : MIN:  0.72670 ms | MAX:  0.96770 ms | AVG:  0.75483 ms
   Solution 1    : MIN:  1.17980 ms | MAX:  2.08300 ms | AVG:  1.30762 ms
   Solution 3    : MIN:  2.92230 ms | MAX:  3.71300 ms | AVG:  3.03632 ms
   Solution 8    : MIN:  3.97410 ms | MAX:  4.34740 ms | AVG:  4.06542 ms
   Solution 12   : MIN:  4.97950 ms | MAX:  5.31250 ms | AVG:  5.04871 ms
   Solution 13*2 : MIN:  7.2594 ms  | MAX: 10.2251 ms  | AVG:  8.18209 ms
   Solution 2    : MIN:  7.33320 ms | MAX:  8.42080 ms | AVG:  7.63468 ms
   Solution 13*1 : MIN:  8.4053 ms  | MAX: 14.0585 ms  | AVG: 10.4869 ms
   Solution 5    : MIN: 14.79730 ms | MAX: 17.29360 ms | AVG: 15.54762 ms
 
Bonus Test
 
   Solution 10   : MIN:  1.08750 ms | MAX:  1.13540 ms | AVG:  1.10539 ms
   Solution 1    : MIN:  1.09130 ms | MAX:  1.35580 ms | AVG:  1.15636 ms
   Solution 3    : MIN:  2.95310 ms | MAX:  3.38000 ms | AVG:  3.04058 ms
   Solution 8    : MIN:  5.37370 ms | MAX:  5.74020 ms | AVG:  5.46218 ms
   Solution 12   : MIN:  7.44160 ms | MAX:  7.72250 ms | AVG:  7.50184 ms
   Solution 13*2 : MIN:  7.4998 ms  | MAX: 12.9918 ms  | AVG:  8.99166 ms
   Solution 13*1 : MIN:  8.2866 ms  | MAX: 11.5506 ms  | AVG:  9.42119 ms
   Solution 2    : MIN:  8.54510 ms | MAX:  8.86510 ms | AVG:  8.64252 ms
   Solution 5    : MIN: 16.79520 ms | MAX: 17.75920 ms | AVG: 17.19089 ms
 
*1 F#Solution
*2 PreCompiledF#Solution
-----------------------------------------------------------------------------------
 
Tests run:        10
Iterations/test:  1000
Total executions: 10,000
 
Basic Test
 
   Solution 10   : MIN:   7.24660 ms | MAX:   7.52710 ms | AVG:   7.35793 ms
   Solution 1    : MIN:  10.38490 ms | MAX:  10.72050 ms | AVG:  10.53257 ms
   Solution 3    : MIN:  29.32480 ms | MAX:  29.52170 ms | AVG:  29.41673 ms
   Solution 8    : MIN:  39.87840 ms | MAX:  40.39850 ms | AVG:  40.05387 ms
   Solution 12   : MIN:  49.83420 ms | MAX:  50.64540 ms | AVG:  50.08076 ms
   Solution 2    : MIN:  73.15800 ms | MAX:  74.28660 ms | AVG:  73.62445 ms
   Solution 13*2 : MIN:  78.6023 ms  | MAX:  93.5715 ms  | AVG:  83.0625 ms
   Solution 13*1 : MIN:  88.4249 ms  | MAX:  93.8609 ms  | AVG:  90.9543 ms
   Solution 5    : MIN: 140.52420 ms | MAX: 148.83820 ms | AVG: 143.44404 ms
 

Bonus Test
 
   Solution 1    : MIN:   9.54210 ms | MAX:  10.12520 ms | AVG:   9.72796 ms
   Solution 10   : MIN:  10.84060 ms | MAX:  11.22680 ms | AVG:  10.94738 ms
   Solution 3    : MIN:  29.66310 ms | MAX:  30.13180 ms | AVG:  29.79657 ms
   Solution 8    : MIN:  54.05920 ms | MAX:  59.03390 ms | AVG:  56.43869 ms
   Solution 12   : MIN:  74.48620 ms | MAX:  75.41370 ms | AVG:  74.66223 ms
   Solution 13*2 : MIN:  78.3943 ms  | MAX:  98.3158 ms  | AVG:  83.7617 ms
   Solution 2    : MIN:  87.62770 ms | MAX:  92.20400 ms | AVG:  91.17817 ms
   Solution 13*1 : MIN:  87.9642 ms  | MAX:  99.3946 ms  | AVG:  91.8193 ms
   Solution 5    : MIN: 168.75850 ms | MAX: 201.33640 ms | AVG: 175.94069 ms

*1 F#Solution
*2 PreCompiledF#Solution
-----------------------------------------------------------------------------------


NOTE: with a small adjustment to the test to better reflect real-world examples, many of the solutions received a performance gain.

Lastly, if I have not correctly used a solution, please let me know so that I can update results. If any authors make changes, also please let me know and I will make sure that the tests are re-run and results published.

* old test: download[^]
* old v2 test: download[^]
 
Share this answer
 
v6
Comments
Stuart Dootson 30-Nov-16 11:21am    
Out of inerest, I converted your benchmarking code to F# to measure my F# solution & got the following timings:

Statistics
----------
Tests run:       10
Iterations/test: 100
-----------------------------------------------------------------------------
F# Solution - Basic: MIN = 2.6989, MAX = 3.2336, AVG = 2.78732
F# Solution - Bonus: MIN = 2.8122, MAX = 3.1169, AVG = 2.8664

Statistics
----------
Tests run:       10
Iterations/test: 1000
-----------------------------------------------------------------------------
F# Solution - Basic: MIN = 25.4413, MAX = 27.2665, AVG = 25.8091
F# Solution - Bonus: MIN = 28.5614, MAX = 30.4572, AVG = 29.0724
Graeme_Grant 30-Nov-16 11:38am    
Send me a link to the F# code and I'll run it here so I can update the results. :)
Stuart Dootson 30-Nov-16 19:23pm    
https://gist.github.com/studoot/f883c041cf6aaeaa1a752114963c16b7 You (of course) meant the timing code - silly me posted the link to my solution code... I'll get the timing harness code posted up when I can get to it...
Stuart Dootson 1-Dec-16 6:01am    
Here we go - it's a Github gist

Easiest way for you to build it is probably create a new F# Console Application in VS2015, paste the code in that gist into the source file that's created by the template and then build/run the Release variant.
Graeme_Grant 1-Dec-16 6:37am    
Cool... Compiled and ran first time! Never done F# before.

I set my 3+yr old MBP (Macbook Pro running Win10 in bootcamp) the same as the other speed tests and the results were as follows:

-----------------------------------------------------------------------------

Tests run:       10
Iterations/test: 100

Basic Test

  FsSolution   - Basic : MIN: 7.8449 ms | MAX 8.8849 ms | AVG 8.52395 ms
  *PreCompiled - Basic : MIN: 7.2831 ms | MAX 8.2619 ms | AVG 7.61122 ms

Bonus Test

  FsSolution   - Bonus : MIN: 7.7423 ms | MAX 8.4193 ms | AVG 7.92343 ms
  *PreCompiled - Bonus : MIN: 7.2196 ms | MAX 10.7562 ms | AVG 7.76979 ms

-----------------------------------------------------------------------------

Tests run:       10
Iterations/test: 1000

Basic Test

  FsSolution   - Basic : MIN: 77.7693 ms | MAX 80.7415 ms | AVG 78.8205 ms
  *PreCompiled - Basic : MIN: 72.5419 ms | MAX 77.3086 ms | AVG 73.9815 ms

Bonus Test

  FsSolution   - Bonus : MIN: 78.1498 ms | MAX 84.8175 ms | AVG 80.1699 ms
  *PreCompiled - Bonus : MIN: 73.0482 ms | MAX 76.4278 ms | AVG 74.1332 ms

-----------------------------------------------------------------------------


Once PIEBALDconsult finishes his bit I'll update the details with this table and add a link to the F# solution.
A Haskell solution… Essentially a rough translation of my F# one!

{-# LANGUAGE OverloadedStrings, TemplateHaskell, QuasiQuotes #-}

module Main where

import Criterion.Main
import Data.CaseInsensitive (mk, CI)
import Data.Char as C
import Data.List (find)
import Data.Maybe (fromJust, isJust)
import Data.Stringable
import Data.Text (Text)
import qualified Data.Text as T
import Text.Regex.PCRE.Heavy

data FilterSpec = FilterSpec { badPattern::Text, goodWord::Text }
type FilterSpecs = [FilterSpec]
type Filter = Text -> Maybe Text
type Filters = [Filter]

double :: Text -> Text
double match = (match `T.append` match)

wordRegex = [re|\w+|]

stringsAreEqual ignoreCase left right
  | ignoreCase && (mk left) == (mk right) = True
  | (not ignoreCase) && left == right = True
  | otherwise = False

adjustCase badBit actualBit goodWord ignoreCase
  | not ignoreCase = goodWord
  | actualBit == (T.toUpper badBit) = T.toUpper goodWord
  | actualBit == (T.toLower badBit) = T.toLower goodWord
  | (T.head actualBit) == (C.toUpper $ T.head badBit) = makeFirstCharUpper
  | otherwise = goodWord
  where
    upperFirstChar = C.toUpper $ T.head goodWord
    makeFirstCharUpper = T.cons upperFirstChar (T.tail goodWord)

prefixFilter :: Text -> Text -> Bool -> Text -> Maybe Text
prefixFilter badWord goodWord ignoreCase word
  | stringsAreEqual ignoreCase prefix badWord = Just $ T.append caseAdjusted rest
  | otherwise = Nothing
  where
    (prefix, rest) = T.splitAt (T.length badWord) word
    caseAdjusted = adjustCase badWord prefix goodWord ignoreCase

suffixFilter :: Text -> Text -> Bool -> Text -> Maybe Text
suffixFilter badWord goodWord ignoreCase word
  | stringsAreEqual ignoreCase suffix badWord = Just $ T.append rest caseAdjusted
  | otherwise = Nothing
  where
    (rest, suffix) = T.splitAt (T.length word - T.length badWord) word
    caseAdjusted = adjustCase badWord suffix goodWord ignoreCase
    
wordFilter :: Text -> Text -> Bool -> Text -> Maybe Text
wordFilter badWord goodWord ignoreCase word
  | stringsAreEqual ignoreCase word badWord = Just goodWord
  | otherwise = Nothing
  where
    caseAdjusted = adjustCase badWord word goodWord ignoreCase

nullFilter :: Text -> Maybe Text
nullFilter = Just

createFilter :: FilterSpec -> Filter
createFilter spec = createFilterHelper (badPattern spec) (goodWord spec) True
  where
    createFilterHelper spec goodWord ignoreCase
      | (T.last spec) == '!' = createFilterHelper (T.init spec) goodWord False
      | (T.last spec) == '*' = prefixFilter (T.init spec) goodWord ignoreCase
      | (T.head spec) == '*' = suffixFilter (T.tail spec) goodWord ignoreCase
      | otherwise = wordFilter spec goodWord ignoreCase

createFilters specs = (map createFilter specs) ++ [nullFilter]

applyFilters :: Filters -> Text -> Text
applyFilters filters word = fromJust activeFilter
  where
    filteredWords = map ($ word) filters
    activeFilter = fromJust $ find isJust filteredWords 

filterWords :: Filters -> Text -> Text
filterWords filters text = gsub wordRegex sanitiseWord text
  where
    sanitiseWord word = applyFilters filters word
 
Share this answer
 

This content, along with any associated source code and files, is licensed under The Code Project Open License (CPOL)



CodeProject, 20 Bay Street, 11th Floor Toronto, Ontario, Canada M5J 2N8 +1 (416) 849-8900