Hi,

I've spent quite some time writing a syntax highlighter using Regex.

These are the permitted matches:

1) String|int|boolean followed by a space (a variable declaration, e.g. String myString)
2) FilterableList<String|int|boolean>
3) An open bracket, then either nothing or (something followed by a space), then "a quote", then either nothing or (a space followed by something), then a close bracket, e.g. (myString + "hello")
4) A space, then +|-|*|/|==|&& or ||, then a space, e.g. 1 + 2
5) ( or )
6) Any digit 0-9

This almost works perfectly, but there are a few issues (see Regex Syntax Highlighter - Album on Imgur):

1) The entire quote, including the '"' characters, should be yellow.
2) FilterableList<> should be blue; what's inside should be green if it is int|String|boolean.
3) One of the quotes has its colour completely offset.
4) The ) isn't red.

It seems that certain matches interfere with others. Any ideas how to overcome this?
VB
Private Sub updateCodeSyntaxHighlighting()
        allowCodeInput = False

        Dim prefix As String = "(?<return>"
        Dim suffix As String = ")( .*)?(?!.)"

        Dim listMatches As New Regex(prefix & "FilterableList<.*>" & suffix)
        Dim variableMatches As New Regex(prefix & "(int|String|boolean)|<(?<return>int|String|boolean)>" & suffix)
        Dim quoteMatches As New Regex("\((.* )?" & prefix & """.*"")( .*)?\)")
        Dim symbolMatches As New Regex(prefix & "( ([+\-/*]|==|&&|\|\|) )|\(|\))")
        Dim numberMatches As New Regex(prefix & "[0-9])")

        Dim selPos As Integer = codeEditorBox.SelectionStart
        Dim selPos2 As Integer = codeEditorBox.GetFirstCharIndexOfCurrentLine

        If Not codeEditorBox.Text = "" Then
            codeEditorBox.Select(0, codeEditorBox.Lines(codeEditorBox.GetLineFromCharIndex(selPos)).Length)
            codeEditorBox.SelectionStart = selPos2
            codeEditorBox.SelectionColor = codeEditorBox.ForeColor
            codeEditorBox.SelectionStart = selPos
        End If

        For Each match As Match In listMatches.Matches(codeEditorBox.Text)
            highlight(match, selPos, Color.FromArgb(64, 255, 255))
        Next

        For Each match As Match In variableMatches.Matches(codeEditorBox.Text)
            highlight(match, selPos, Color.FromArgb(64, 255, 64))
        Next

        For Each match As Match In quoteMatches.Matches(codeEditorBox.Text)
            highlight(match, selPos, Color.FromArgb(255, 255, 64))
        Next

        For Each match As Match In symbolMatches.Matches(codeEditorBox.Text)
            highlight(match, selPos, Color.FromArgb(255, 64, 64))
        Next

        For Each match As Match In numberMatches.Matches(codeEditorBox.Text)
            highlight(match, selPos, Color.FromArgb(50, 155, 155))
        Next

        codeEditorBox.SelectionLength = 0

        allowCodeInput = True
    End Sub

    Private Sub highlight(match As Match, selPos As Integer, c As Color)
        Dim index As Integer = match.Index
        Dim length As Integer = match.Result("${return}").Length

        codeEditorBox.Select(index, length)
        codeEditorBox.SelectionColor = c
        codeEditorBox.SelectionStart = index + length
        codeEditorBox.SelectionColor = codeEditorBox.ForeColor
        codeEditorBox.SelectionStart = selPos
    End Sub
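
One possible cause of the offset quote colour, offered as a guess rather than a confirmed diagnosis: highlight() starts its selection at match.Index, which is where the whole match begins, but takes its length from the "return" group. When the pattern has text before the group, as quoteMatches does, the two disagree. The sketch below reproduces the effect with Python's re module, which treats this pattern the same way; the sample line is made up, and (?P<ret>...) stands in for .NET's (?<return>...).

```python
import re

# Same shape as quoteMatches; (?P<ret>...) stands in for .NET's (?<return>...).
quote_pattern = re.compile(r'\((.* )?(?P<ret>".*")( .*)?\)')

line = 'print(myString + "hello")'  # hypothetical sample input
m = quote_pattern.search(line)

whole_start = m.start()          # where the whole match begins: the "("
group_start = m.start("ret")     # where the quoted text begins
group_len = len(m.group("ret"))

# What highlight() effectively selects: whole-match index + group length.
wrongly_coloured = line[whole_start:whole_start + group_len]
# What was intended: the group's own index and length.
intended = line[group_start:group_start + group_len]

print(repr(wrongly_coloured))
print(repr(intended))
```

In .NET the group's own position is available as match.Groups("return").Index and match.Groups("return").Length, so highlight() could select with those instead of match.Index.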


What I have tried:

Multiple regex generators
-------------------------
Updated 13-Jun-17 19:47pm

I once did something like this in C# for SQL code; maybe this will be of use to you.
The full source code can be found on CodeProject: SMO Tutorial 3 of n - Scripting
Note that newlines are replaced by spaces in the temporary string used for searching, because newlines are problematic for regex searches (by default, . does not match a newline).
C#
/// <summary>
/// Syntax coloring for rich text box.
/// </summary>
/// <param name="colorComments">Also color comments green.</param>
private void SyntaxColoring(bool colorComments)
{
    this.richTextScript.Focus();
    this.richTextScript.SelectionStart = 0;
    this.richTextScript.SelectionLength = 0;
    string richText = this.richTextScript.Text.Replace("\n", " ");

    // Check for keywords and apply blue color.
    foreach (var keyword in this.Keywords)
    {
        var reg = Regex.Matches(richText, @"([ ,()-])" + Regex.Escape(keyword) + @"([ ,()-])");

        foreach (Match match in reg)
        {
            this.richTextScript.SelectionStart = match.Index;
            this.richTextScript.SelectionLength = match.Length;
            this.richTextScript.SelectionColor = Color.Blue;
            this.richTextScript.SelectionFont = new Font("Courier New", 10, FontStyle.Regular);
        }
    }

    // Find strings and apply red color.
    foreach (Match match in Regex.Matches(richText, @"([""'])(?:(?=(\\?))\2.)*?\1"))
    {
        this.richTextScript.SelectionStart = match.Index;
        this.richTextScript.SelectionLength = match.Length;
        this.richTextScript.SelectionColor = Color.IndianRed;
        this.richTextScript.SelectionFont = new Font("Courier New", 10, FontStyle.Regular);
    }

    // Check for comments and apply green color.
    if (colorComments)
    {
        foreach (Match match in Regex.Matches(richText, @"/\*.*?\*/"))
        {
            this.richTextScript.SelectionStart = match.Index;
            this.richTextScript.SelectionLength = match.Length;
            this.richTextScript.SelectionColor = Color.Green;
            this.richTextScript.SelectionFont = new Font("Courier New", 10, FontStyle.Regular);
        }
    }
}
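
The newline replacement in the answer above can be illustrated with a small Python sketch (the sample text is hypothetical). It shows two things: a comment pattern built on . misses a comment that spans two lines, and swapping each newline for a single space keeps every character index unchanged, so match positions found in the flattened copy still apply to the original text.

```python
import re

text = "SELECT 1 /* first\nsecond */ FROM t"  # hypothetical script text
comment = re.compile(r"/\*.*?\*/")

# "." does not match "\n" by default, so the two-line comment is missed.
missed = comment.search(text)

# Swapping each newline for one space keeps the length, and therefore
# every character index, unchanged.
flattened = text.replace("\n", " ")
found = comment.search(flattened)

print(missed)
print(text[found.start():found.end()])
```

In .NET, RegexOptions.Singleline is an alternative that makes . match newlines without building a second string.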
 
Comments
[no name] 13-Jun-17 13:30pm    
I'll definitely have a look into it, thank you!
[no name] 13-Jun-17 14:20pm    
I have an issue as you can see here https://www.debuggex.com/r/c7IRbxMbAlRz0-dN

It's not returning just int|String|boolean like I want it to; it's returning the entire string.
RickZeeland 13-Jun-17 15:05pm    
I find debuggex a bit confusing, but this seems to work for strings:
([""'])(?:(?=(\\?))\2.)*?\1
RickZeeland 13-Jun-17 15:24pm    
And this for your keywords, as long as they are surrounded by a space or special chars (the keyword list needs its own group, otherwise the | splits the whole pattern):
([ ,()-])(int|bool|String)([ ,()-])
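
Alternation binds more loosely than concatenation, so a keyword list written without its own group splits the whole pattern into three unrelated alternatives. A short Python sketch of the difference, on a made-up input:

```python
import re

sample = "boolean flag (int x)"  # hypothetical input

# Ungrouped: parsed as  ([ ,()-])int  |  bool  |  String([ ,()-])
ungrouped = re.compile(r"([ ,()-])int|bool|String([ ,()-])")
# Grouped: a delimiter, one keyword, a delimiter.
grouped = re.compile(r"([ ,()-])(int|bool|String)([ ,()-])")

ungrouped_hits = [m.group(0) for m in ungrouped.finditer(sample)]
grouped_hits = [m.group(2) for m in grouped.finditer(sample)]

print(ungrouped_hits)
print(grouped_hits)
```

The ungrouped pattern picks up the bare "bool" inside "boolean" and keeps the leading bracket of "(int"; the grouped one reports only the keyword, in group 2.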
[no name] 13-Jun-17 15:26pm    
I'm really struggling here.

For Each keyword As String In variableTypes
    For Each match As Match In New Regex("((.* |(?<!.))" & keyword & "|(?<=.)<" & keyword & ">)( .*|(?!.))").Matches(codeEditorBox.Text)
        highlight(match, selPos, Color.FromArgb(64, 255, 64))
    Next
Next

Private Sub highlight(match As Match, selPos As Integer, c As Color)
    Dim index As Integer = match.Index
    Dim length As Integer = match.Length

    codeEditorBox.Select(index, length)
    codeEditorBox.SelectionColor = c
    codeEditorBox.SelectionStart = index + length
    codeEditorBox.SelectionColor = codeEditorBox.ForeColor
    codeEditorBox.SelectionStart = selPos
End Sub

If I run this on "String goaway", it returns "String goaway" as opposed to just "String", which I want.
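
A sketch of why the whole line comes back, written in Python (whose re module handles these constructs the same way): the trailing ( .*|(?!.)) is part of the match, so everything after the keyword is consumed along with it. Word boundaries check the surroundings without consuming them. The second pattern is only one possible alternative, not something taken from the thread.

```python
import re

code = "String goaway"

# Same shape as the pattern in the comment above, with the keyword filled in.
greedy = re.compile(r"((.* |(?<!.))String|(?<=.)<String>)( .*|(?!.))")
m1 = greedy.search(code)

# Word boundaries assert the surroundings without consuming them.
bounded = re.compile(r"\b(?:int|String|boolean)\b")
m2 = bounded.search(code)

print(repr(m1.group(0)))
print(repr(m2.group(0)), m2.span())

# The bounded form also finds a type inside angle brackets and skips
# longer identifiers such as "Strings".
others = bounded.findall("FilterableList<int> x; Strings")
print(others)
```

With the bounded pattern, match.Index and match.Length describe just the keyword, so highlight() can use them directly.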
Solved! I ended up using a combination of Regex and good old programming logic.

The issue was the order in which I was colouring the different strings. If you colour parts of a RichTextBox out of order, so that a SelectionStart is not greater than the previous one, some parts lose their colour. That's what confused me: because certain parts had no colour, I assumed the Regex hadn't captured them correctly.

To fix the ordering issue I stored the index and a SyntaxHighlightOperation holding the other necessary values for my highlight function in a SortedDictionary. That effectively sorted the highlighting operations by index, completely resolving the colouring issue.

Here's the result: [^]

And here's the final code:
VB
Dim operations As New SortedDictionary(Of Integer, SyntaxHighlightOperation)

        Dim code As String = codeEditorBox.Text

        Dim anything As String = "[\(\);]"
        Dim codeElements As String = "Server|Files"
        Dim variableTypes As String = "string|int|bool|dec"
        Dim enumerations As String = "SyntaxRating|Mood"

        For Each match As Match In New Regex(anything).Matches(code)
            operations.Add(match.Index, New SyntaxHighlightOperation(match.Index, 1, selPos, SyntaxColours.Symbol))
        Next

        For Each match As Match In New Regex(variableTypes).Matches(code)
            If code.hasAnyAround(match.Value, match.Index, New String()() {
                                 New String() {"""", """"},
                                 New String() {"(", ")"}
                                 }) Then
                operations.Add(match.Index - 1, New SyntaxHighlightOperation(match.Index - 1, match.Value.Length + 2, selPos, SyntaxColours.Variable))
            ElseIf code.hasAround(match.Value, match.Index, New String() {"List<", ">"}) AndAlso code.hasAround(match.Value, match.Index, New String() {"<", ">"}) Then
                ' "List<" is five characters long, so colour exactly those five.
                operations.Add(match.Index - 5, New SyntaxHighlightOperation(match.Index - 5, 5, selPos, SyntaxColours.List))
                operations.Add(match.Index, New SyntaxHighlightOperation(match.Index, match.Value.Length, selPos, SyntaxColours.Variable))
                operations.Add(match.Index + match.Value.Length, New SyntaxHighlightOperation(match.Index + match.Value.Length, 1, selPos, SyntaxColours.List))
            End If
        Next

        For Each match As Match In New Regex(codeElements).Matches(code)
            operations.Add(match.Index, New SyntaxHighlightOperation(match.Index, match.Value.Length, selPos, SyntaxColours.CodeElement))
        Next

        For Each match As Match In New Regex(enumerations).Matches(code)
            If code.hasAfter(match.Value, match.Index, ".") Then
                operations.Add(match.Index, New SyntaxHighlightOperation(match.Index, match.Value.Length, selPos, SyntaxColours.Enumeration))
            End If
        Next

        ''''''''''''''''''''''''''''''''''

        For Each operation As SyntaxHighlightOperation In operations.Values
            highlight(operation.getIndex(), operation.getLength(), operation.getSelPos(), operation.getColor())
        Next
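
The ordering idea can be sketched independently of the rich text box. The Python below is a minimal illustration under assumptions: HighlightOp, ordered and apply_ops are hypothetical names, and a list of colour labels stands in for the control. It collects operations in any order, sorts them by start index, and applies them left to right.

```python
from typing import NamedTuple


class HighlightOp(NamedTuple):
    index: int
    length: int
    colour: str


def ordered(ops):
    # sorted() is stable, so operations that share an index keep the
    # order in which they were collected.
    return sorted(ops, key=lambda op: op.index)


def apply_ops(text, ops, default="plain"):
    # One colour label per character, standing in for the rich text box.
    colours = [default] * len(text)
    for op in ordered(ops):
        end = min(op.index + op.length, len(text))
        for i in range(op.index, end):
            colours[i] = op.colour
    return colours


code = "List<int>"
ops = [
    HighlightOp(5, 3, "variable"),  # int
    HighlightOp(0, 5, "list"),      # List<
    HighlightOp(8, 1, "list"),      # >
]
colours = apply_ops(code, ops)
print(colours)
```

One design note: SortedDictionary.Add throws an ArgumentException when a key already exists, so two operations that start at the same index (for example a bracket coloured as a symbol and a bracketed type that starts one character before its match) cannot both be added. A sorted list of operations, as sketched here, does not have that restriction.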


I also made an extensions class to help with the logic for requiring certain strings before, after or before and after the match:

VB
Imports System.Runtime.CompilerServices

Module ExtensionMethods

    ' True when "before" immediately precedes and "after" immediately follows
    ' the text found at "index" in "code".
    <Extension()>
    Public Function hasAround(code As String, text As String, index As Integer, around() As String) As Boolean
        Return code.hasBefore(text, index, around(0)) AndAlso code.hasAfter(text, index, around(1))
    End Function

    ' True when any (before, after) pair in "around" surrounds the match.
    <Extension()>
    Public Function hasAnyAround(code As String, text As String, index As Integer, around()() As String) As Boolean
        For Each pair As String() In around
            ' A pair that does not fit is skipped rather than ending the search.
            If code.hasAround(text, index, pair) Then Return True
        Next

        Return False
    End Function

    <Extension()>
    Public Function hasBefore(code As String, text As String, index As Integer, before As String) As Boolean
        If index - before.Length < 0 Then Return False
        Return code.Substring(index - before.Length, before.Length) = before
    End Function

    <Extension()>
    Public Function hasAfter(code As String, text As String, index As Integer, after As String) As Boolean
        ' Guard the full length of "after" so Substring cannot run past the end.
        If index + text.Length + after.Length > code.Length Then Return False
        Return code.Substring(index + text.Length, after.Length) = after
    End Function

End Module
 

This content, along with any associated source code and files, is licensed under The Code Project Open License (CPOL)


