OriginalGriff's solution is the perfect solution to this question. I'm just posting another "answer" here as it allows me to format the code I'm about to post.
My original intent was to DEAL with such strings, basically finding a way to remove such "garbage" from the string without affecting the string too much.
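Such strings were causing a problem when writing to XML. The following would fail, because ChrW(&HD800) is a lone high surrogate (no low surrogate follows it) and cannot be written out as valid XML: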
Dim sFoo As String = "a" + ChrW(&HD800) + "b"
Dim oXML As System.Xml.XmlDocument
Dim oRoot As System.Xml.XmlElement
oXML = New System.Xml.XmlDocument
oRoot = oXML.CreateElement("foo")
oXML.AppendChild(oRoot)
oRoot.InnerText = sFoo
oXML.Save(System.IO.Path.Combine(System.IO.Path.GetTempPath, "foo.xml"))
To fix it, or at least prevent the error, I round-trip the string through Encoding.Unicode whenever normalization throws:
Dim sFoo As String = "a" + ChrW(&HD800) + "b"
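Try
    ' IsNormalized and Normalize throw an ArgumentException when the
    ' string contains invalid Unicode, such as a lone surrogate.
    If Not sFoo.IsNormalized Then
        sFoo = sFoo.Normalize
    End If
Catch ex As ArgumentException
    ' Round-trip through UTF-16 bytes: the encoder's replacement fallback
    ' substitutes a valid character for each invalid code unit.
    Dim bytes As Byte()
    bytes = System.Text.Encoding.Unicode.GetBytes(sFoo)
    sFoo = System.Text.Encoding.Unicode.GetString(bytes)
End Try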
Dim oXML As System.Xml.XmlDocument
Dim oRoot As System.Xml.XmlElement
oXML = New System.Xml.XmlDocument
oRoot = oXML.CreateElement("foo")
oXML.AppendChild(oRoot)
oRoot.InnerText = sFoo
oXML.Save(System.IO.Path.Combine(System.IO.Path.GetTempPath, "foo.xml"))
A note to the reader: this replaces the "garbage" with other characters (typically the Unicode replacement character, U+FFFD) that a human reader might perceive as "random" or nonsensical. For my purposes that is perfectly fine, but if you need to do better than that, you'll have to address that problem yourself :-)
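For anyone who would rather drop the offending characters than have them replaced, here is a minimal sketch of one possible approach. The module and function names (SurrogateCleaner, RemoveLoneSurrogates) are made up for illustration and are not part of the code above. It only handles unpaired surrogates, not other characters that XML disallows, so treat it as a starting point rather than a complete sanitizer.

```vb
Imports System.Text

Module SurrogateCleaner
    ' Hypothetical helper: returns a copy of the input with every surrogate
    ' code unit removed unless it is part of a valid high/low pair.
    Function RemoveLoneSurrogates(ByVal input As String) As String
        If String.IsNullOrEmpty(input) Then Return input

        Dim sb As New StringBuilder(input.Length)
        Dim i As Integer = 0
        While i < input.Length
            Dim c As Char = input(i)
            If Char.IsHighSurrogate(c) AndAlso i + 1 < input.Length AndAlso Char.IsLowSurrogate(input(i + 1)) Then
                ' Valid surrogate pair: keep both code units.
                sb.Append(c)
                sb.Append(input(i + 1))
                i += 2
            ElseIf Char.IsSurrogate(c) Then
                ' Unpaired high or low surrogate: skip it.
                i += 1
            Else
                ' Ordinary character: keep it.
                sb.Append(c)
                i += 1
            End If
        End While
        Return sb.ToString()
    End Function
End Module
```

With this in place, sFoo = RemoveLoneSurrogates(sFoo) could be called before assigning oRoot.InnerText. Whether silently dropping characters is acceptable depends on your data; replacing them with a visible marker instead is an equally reasonable choice.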