I am parsing some text and need to look within an individual sentence (already parsed).

I am looking to pull a numeric range. The problem is that the range may be written with a dash, a dash with spaces, or simply the word "to" or "through", and it sometimes includes a degree symbol on one or both numbers. There may also be an additional character after the degree symbol (which is not always a standard degree symbol). The actual numbers can be anything.

Examples:
30-70°
30 - 70°
30°-70°
30°C-70°C
30° to 70° C
30°C-70°C
30 to 70°C

etc, any combination of the examples.

Also, sometimes the "degree" symbol is not the standard ° character.
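As a possible starting point for the variations listed above, here is a minimal Python sketch (the thread itself does not name a language). The set of look-alike degree characters, the C/F unit letters, and the names `DEGREE`, `NUMBER`, `RANGE_RE` and `find_range` are assumptions made for illustration; they are not part of the original data description.

```python
import re

# Assumption: "non-standard" degree signs are look-alikes such as
# º (masculine ordinal, U+00BA) or ˚ (ring above, U+02DA).
DEGREE = r"[°º˚]"
NUMBER = r"\d+(?:\.\d+)?"

RANGE_RE = re.compile(
    rf"""
    (?P<low>{NUMBER})                              # first number
    \s*(?:{DEGREE}\s*(?:(?P<low_unit>[CF])\b)?)?   # optional degree sign and unit
    \s*(?:[-–]|to|through)\s*                      # hyphen, en dash, or a separator word
    (?P<high>{NUMBER})                             # second number
    \s*(?:{DEGREE}\s*(?:(?P<high_unit>[CF])\b)?)?  # optional degree sign and unit
    """,
    re.VERBOSE,
)


def find_range(sentence):
    """Return (low, high, unit) for the first range found, or None."""
    m = RANGE_RE.search(sentence)
    if m is None:
        return None
    # The unit may follow either number; prefer the one after the second.
    unit = m.group("high_unit") or m.group("low_unit")
    return float(m.group("low")), float(m.group("high")), unit


for text in ["30-70°", "30 - 70°", "30°C-70°C", "30° to 70° C", "3º through 8º"]:
    print(text, "->", find_range(text))
```

Note that a pattern this loose will also pick up things like the first two parts of a date (2013-05), so it probably needs to be restricted to sentences that are known to talk about temperature.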

The sentence may also include two ranges, such as:
30°C-70°C (68°F-85°F)
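For a sentence that carries two ranges like this one, one option is to collect every match instead of stopping at the first. The following is a rough Python sketch, not a tested production pattern; `find_all_ranges` and the list of accepted degree characters are hypothetical choices made for the example.

```python
import re

# Assumption: units are a single C or F, and the degree sign may be one of
# a few look-alike characters (°, º, ˚).
RANGE_RE = re.compile(
    r"(\d+)\s*(?:[°º˚]\s*(?:([CF])\b)?)?"   # first number, optional degree sign and unit
    r"\s*(?:[-–]|to|through)\s*"            # separator
    r"(\d+)\s*(?:[°º˚]\s*(?:([CF])\b)?)?"   # second number, optional degree sign and unit
)


def find_all_ranges(sentence):
    """Return a list of (low, high, unit) tuples, one per range in the sentence."""
    ranges = []
    for m in RANGE_RE.finditer(sentence):
        low, low_unit, high, high_unit = m.groups()
        ranges.append((int(low), int(high), high_unit or low_unit))
    return ranges


print(find_all_ranges("30°C-70°C (68°F-85°F)"))
```

Keeping the unit with each tuple makes it possible to tell the Celsius range from the Fahrenheit one afterwards, rather than relying on their order in the sentence.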

I am stumped, but the users don't seem to understand why this is so hard.
Again, it could be 3° to 8° just as well, it is just a number range.

Any hard-core experts know where to start? A START would be great even if the entire scenario is unclear or not doable. If I can cull the low-hanging fruit I can convince the managers that we simply cannot automate the entire process.....

Let me add that it gets even more complicated: it may say Temperature above/below 70° or not to exceed... but I really want the range if it is available. That is, the sentence may have only ONE number and no range, and that number may be expressed in both F and C. Help me, almighty coding people.....
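For the single-number case, a separate pattern could serve as a fallback for sentences where no range is found. This is only a sketch in Python: the qualifier list (above, below, not to exceed) comes from the wording in this post, while `LIMIT_RE`, `find_limit` and the accepted degree characters are assumptions made for illustration.

```python
import re

# Assumption: the qualifier comes directly before the number, and the unit,
# if present, is a single C or F after the degree sign.
LIMIT_RE = re.compile(
    r"\b(?P<qualifier>above|below|not\s+to\s+exceed)\s+"
    r"(?P<value>\d+(?:\.\d+)?)"
    r"\s*(?:[°º˚]\s*(?:(?P<unit>[CF])\b)?)?",
    re.IGNORECASE,
)


def find_limit(sentence):
    """Return (qualifier, value, unit) for a single-number limit, or None."""
    m = LIMIT_RE.search(sentence)
    if m is None:
        return None
    # Normalise case and internal whitespace of the qualifier.
    qualifier = " ".join(m.group("qualifier").lower().split())
    unit = m.group("unit")
    return qualifier, float(m.group("value")), unit.upper() if unit else None


print(find_limit("Temperature not to exceed 70°F"))
print(find_limit("Store below 25° C (77° F)"))
```

When the same limit is given in both scales, this sketch only reports the first value; the second one would need its own pass over the rest of the sentence.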

Buddy
Posted
Updated 22-May-13 11:20am
v4
Comments
_Damian S_ 15-May-13 1:51am    
Where is the data coming from? This is less a regex issue and more like a data issue. Parsing really bad data will yield reasonable results if it follows one of several formats, but completely random text is almost impossible to deal with.

Are you stuck with the data coming through however it comes, or can you enforce better data entry at another point in the system to make your life easier?
Buddy Farrell 15-May-13 2:21am    
Thanks Damian,
I did not notice the reply option so there are some comments and an excellent suggestion by another member!
Buddy Farrell 15-May-13 1:56am    
Damian,
The data is coming from XML documents out of my control. This is not an issue of poor internal data management. I need to parse these documents and try to make sense of them. The source is the US government.
Buddy Farrell 15-May-13 2:02am    
I see it as completely a regEx issue, if it can be tackled at all.
Prasad Khandekar 15-May-13 2:06am    
Hello,

Damian is right. However, if the incoming data format is not in your control, you can use the following regular expression for the sample data you have provided.

(\d{2})(?:°?\s?[CF]?)(?:\s?(-|to)\s?)(\d{2})(?:°\s?[CF]?)

You may want to use an excellent regular expression testing tool called Expresso available at http://www.ultrapico.com.
Regards,

1 solution

Test it:
VB
' Requires a reference to Microsoft VBScript Regular Expressions 5.5
Sub CheckDegrees()
Dim sTmp As String, sPattern As String
Dim iCounter As Integer
Dim oRegex As VBScript_RegExp_55.RegExp
Dim oMatch As VBScript_RegExp_55.Match
Dim oMatchColl As VBScript_RegExp_55.MatchCollection

On Error GoTo Err_CheckDegrees

sTmp = "30-70°" & vbCrLf
sTmp = sTmp & "30 - 70°" & vbCrLf
sTmp = sTmp & "30°-70°" & vbCrLf
sTmp = sTmp & "30°C-70°C" & vbCrLf
sTmp = sTmp & "30° to 70° C" & vbCrLf
sTmp = sTmp & "30°C-70°C" & vbCrLf
sTmp = sTmp & "30 to 70°C"

' [CF] replaces [C|F], which also matched a literal "|"; \d+ allows numbers of any length
sPattern = "(\d+)(°?\s?[CF]?)(\s?)(-|to|through)(\s?)(\d+)(°\s?[CF]?)"

Set oRegex = New VBScript_RegExp_55.RegExp
With oRegex
    .Pattern = sPattern
    .MultiLine = True
    .Global = True
    .IgnoreCase = False
    Set oMatchColl = .Execute(sTmp)
    
    For Each oMatch In oMatchColl
        iCounter = iCounter + 1
        MsgBox oMatch.Value & vbCr, vbInformation, iCounter
    Next
End With

Exit_CheckDegrees:
    On Error Resume Next
    Set oMatchColl = Nothing
    Set oMatch = Nothing
    Set oRegex = Nothing
    Exit Sub
    
Err_CheckDegrees:
    MsgBox Err.Description, vbExclamation, Err.Number
    Resume Exit_CheckDegrees

End Sub
 
 

This content, along with any associated source code and files, is licensed under The Code Project Open License (CPOL)


