|
BillWoodruff wrote: (code on request)
Codes Plz so I can do my homework assignment.
|
|
|
|
|
JimmyRopes wrote: Codes Plz
To hear is to obey, Master: [^] *
* plain-vanilla text file
“But I don't want to go among mad people,” Alice remarked.
“Oh, you can't help that,” said the Cat: “we're all mad here. I'm mad. You're mad.”
“How do you know I'm mad?” said Alice.
“You must be,” said the Cat, “or you wouldn't have come here.” Lewis Carroll
|
|
|
|
|
Ahhhhhhhhh. I knew I used RegEx for a reason.
I have been using RegEx since my first stint at Bell Laboratories in the late 1970s. If you couldn't "grep" you had no street cred.
Since then I have used RegEx on many platforms and in many languages. I don't even think about searching for patterns any other way.
I know people who despise RegEx, but it is to their detriment. They will have to write and debug a lot of code to do any kind of complex pattern recognition when they could use a RegEx and be done with it.
As for efficiency, that will depend on whether the RegEx engine is a backtracking NFA (Nondeterministic Finite Automaton) or a DFA (Deterministic Finite Automaton).
In general a backtracking NFA engine is slower at complex pattern recognition but much easier to implement. Unless your application is time critical you will not notice any serious delay in processing as a result of using RegEx, and they are easier to implement and debug. That is why I use them wherever I need to parse data.
Just my opinion.
modified 6-Feb-14 22:10pm.
|
|
|
|
|
BillWoodruff wrote: I believe learning, and mastering, something new is one of the very best things in life !
Learning is often strapped to pain, and effort.
Mastering the learnt stuff is joy!
BillWoodruff wrote: Of course the question of "constraints" immediately arises: what makes parsing the range of inputs you show more difficult is:
1. possible ambiguity of the "-" glyph: it is a separator for the Date component, and a sign-indicator for the time-offset.
2. possible ambiguity of the "." glyph: it is a presence/absence indicator for milliseconds, and a decimal-point for the time-offset.
It's all specified like that. [-->]
BillWoodruff wrote: If you, the creator, have control over all possible inputs, and can ensure there will always be something like ".0Z" indicating no milliseconds, and there will always be some other character than "-" separating year, month, day, then, obviously, parsing becomes so much simpler.
I don't have any control over the formatting, I only know the constraints of the allowed values.
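To illustrate why the two glyphs are only ambiguous out of context, here is a minimal split-based sketch. It assumes the sample shape used in this thread (2014-2-5T21:36:14.315Z+1.5), with "T" and "Z" always present, and it is not a full RFC 5424 parser. The class and method names are made up for the example.

```csharp
using System;
using System.Globalization;

public static class SplitSketch
{
    // Hypothetical helper: fills fields with year, month, day, hour, minute,
    // second, millisecond and returns false on any malformed piece.
    public static bool TryParse(string s, out int[] fields, out double offset)
    {
        fields = new int[7];
        offset = 0;
        int t = s.IndexOf('T');
        int z = s.IndexOf('Z');
        if (t < 0 || z < t) return false;

        // Left of 'T' a '-' can only be a date separator.
        string[] date = s.Substring(0, t).Split('-');
        // Between 'T' and 'Z' a '.' can only introduce the milliseconds.
        string[] time = s.Substring(t + 1, z - t - 1).Split(':', '.');
        // Right of 'Z' a '-' is a sign and a '.' is a decimal point.
        string tail = s.Substring(z + 1);

        if (date.Length != 3 || time.Length < 3 || time.Length > 4) return false;
        for (int i = 0; i < 3; i++)
            if (!int.TryParse(date[i], out fields[i])) return false;
        for (int i = 0; i < time.Length; i++)
            if (!int.TryParse(time[i], out fields[3 + i])) return false;

        // An empty tail is treated as offset 0 (an assumption of this sketch).
        return tail.Length == 0
            || double.TryParse(tail, NumberStyles.Float, CultureInfo.InvariantCulture, out offset);
    }

    public static void Main()
    {
        int[] fields;
        double offset;
        bool ok = TryParse("2014-2-5T21:36:14.315Z-1.5", out fields, out offset);
        Console.WriteLine(ok + " " + string.Join(",", fields) + " " + offset.ToString(CultureInfo.InvariantCulture));
    }
}
```

Once the string is cut at "T" and "Z", each segment contains only one meaning of "-" and ".", which is the same positional information the regular expression encodes.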
By the way, I wrote some unit tests to compare the performance of split and RegEx: a split does not provide a measurable performance improvement over the regular expression. What I really like about the RegEx solution is that the parsing method becomes a lot shorter and is therefore more readable (see the method public bool FromString(string dateTime) in the code sample below).
I have written the following code so far:
using System;
using System.Collections.Generic;
using System.Globalization;
using System.Linq;
using System.Text;
using System.Text.RegularExpressions;
namespace Springlog.Com.Messaging
{
public class SyslogTimestamp
{
#region Properties
static short GetDayMaxDaysOfMonth(int month, int year)
{
if (month != 2)
{
if (month == 1 || month == 3 || month == 5 || month == 7 || month == 8 || month == 10 || month == 12)
{
return 31;
}
else
{
return 30;
}
}
else
{
if (IsLeapYear(year))
{
return 29;
}
return 28;
}
}
private static bool IsLeapYear(int year)
{
if (year % 4 == 0)
{
if (year % 100 == 0)
{
if (year % 400 == 0)
{
return true;
}
return false;
}
return true;
}
return false;
}
int year;
public int Year
{
get { return year; }
}
int month;
public int Month
{
get { return month; }
}
int dayOfMonth;
public int DayOfMonth
{
get { return dayOfMonth; }
}
int hours;
public int Hours
{
get { return hours; }
}
int minutes;
public int Minutes
{
get { return minutes; }
}
int seconds;
public int Seconds
{
get { return seconds; }
}
int miliseconds;
public int Miliseconds
{
get { return miliseconds; }
}
double utcOffset;
public double UtcOffset
{
get { return utcOffset; }
set { utcOffset = value; }
}
#endregion
public SyslogTimestamp()
{
Reset();
}
public SyslogTimestamp(string timestamp)
{
Reset();
FromString(timestamp);
}
public SyslogTimestamp(DateTime timestamp)
{
Reset();
FromDateTime(timestamp);
}
private void Reset()
{
miliseconds = 0;
seconds = 0;
minutes = 0;
hours = 0;
dayOfMonth = 0;
month = 0;
year = 0;
utcOffset = 0;
}
public bool FromString(string dateTime)
{
Reset();
Regex splitRegex = new Regex(@"([0-9]{4})-([1-9]|[1][0-2])-([0-2]?[0-9]|[3][0-1])[T]([0-1]?[0-9]|[2][0-3])[:]([0-5]?[0-9])[:]([0-5]?[0-9])?.?([0-9]{1,6})[Z]([+-][0-9][\.|,]?[0-9]?|[0-9]{2}?|[+-]?[0][1][0-2][\.]?[0-9]?|[0-9]{2}?)", RegexOptions.IgnoreCase);
Match timestamp = splitRegex.Match(dateTime);
if (timestamp.Success)
{
AddYears(Int32.Parse(timestamp.Groups[1].Value));
AddMonths(Int32.Parse(timestamp.Groups[2].Value));
AddDays(Int32.Parse(timestamp.Groups[3].Value));
AddHours(Int32.Parse(timestamp.Groups[4].Value));
AddMinutes(Int32.Parse(timestamp.Groups[5].Value));
AddSeconds(Int32.Parse(timestamp.Groups[6].Value));
AddMilliseconds(Int32.Parse(timestamp.Groups[7].Value));
utcOffset = Double.Parse(timestamp.Groups[8].Value, CultureInfo.InvariantCulture);
return true;
}
else
{
return false;
}
}
public string ToFormattedString()
{
string timezonePreSign = "+";
if (utcOffset < 0)
{
timezonePreSign = "";
}
return string.Format("{0}-{1}-{2}T{3}:{4}:{5}.{6}Z{7}{8}", year, month, dayOfMonth, hours, minutes, seconds, miliseconds, timezonePreSign, utcOffset.ToString().Replace(',','.'));
}
public DateTime ToDateTime()
{
DateTimeKind dateTimeKind;
if(utcOffset == 0)
{
dateTimeKind = DateTimeKind.Utc;
}
else
{
dateTimeKind = DateTimeKind.Local;
}
return new DateTime(year, month, dayOfMonth, hours, minutes, seconds, miliseconds, dateTimeKind);
}
public override string ToString()
{
return this.ToFormattedString();
}
public void FromDateTime(DateTime timestamp)
{
TimeSpan utcOffset = TimeZone.CurrentTimeZone.GetUtcOffset(timestamp);
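// Offset in fractional hours, e.g. +01:30 becomes 1.5 (60.0 forces floating-point division).
this.utcOffset = utcOffset.Hours + (utcOffset.Minutes / 60.0);
// Year and month go first, because AddDays looks up the length of the current month.
AddYears(timestamp.Year);
AddMonths(timestamp.Month);
AddDays(timestamp.Day);
AddHours(timestamp.Hour);
AddMinutes(timestamp.Minute);
AddSeconds(timestamp.Second);
AddMilliseconds(timestamp.Millisecond);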
}
public void Add(TimeSpan timeSpan)
{
AddMilliseconds(timeSpan.Milliseconds);
AddSeconds(timeSpan.Seconds);
AddMinutes(timeSpan.Minutes);
AddHours(timeSpan.Hours);
AddDays(timeSpan.Days);
}
public void AddMilliseconds(int val)
{
// Carry on the running total so milliseconds already stored are kept.
int total = miliseconds + val;
if (total >= 1000)
{
AddSeconds(total / 1000);
miliseconds = total % 1000;
}
else
{
miliseconds = total;
}
}
private void AddSeconds(int val)
{
// Carry on the running total so seconds already stored are kept.
int total = seconds + val;
if (total >= 60)
{
AddMinutes(total / 60);
seconds = total % 60;
}
else
{
seconds = total;
}
}
private void AddMinutes(int val)
{
// Overflowing minutes carry into hours, based on the running total.
int total = minutes + val;
if (total >= 60)
{
AddHours(total / 60);
minutes = total % 60;
}
else
{
minutes = total;
}
}
private void AddHours(int val)
{
// Carry on the running total so hours already stored are kept.
int total = hours + val;
if (total >= 24)
{
AddDays(total / 24);
hours = total % 24;
}
else
{
hours = total;
}
}
private void AddDays(int val)
{
short dayCount = GetDayMaxDaysOfMonth(month, year);
if (dayOfMonth + val > dayCount)
{
AddMonths(val / dayCount);
dayOfMonth = (val % dayCount);
}
else
{
dayOfMonth += val;
}
}
private void AddMonths(int val)
{
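// Months run from 1 to 12, so the carry is computed on a zero-based total.
int total = month + val;
if (total > 12)
{
AddYears((total - 1) / 12);
month = ((total - 1) % 12) + 1;
}
else
{
month = total;
}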
}
private void AddYears(int val)
{
year += val;
}
public override bool Equals(object o)
{
if (o is SyslogTimestamp)
{
return DoMatch((SyslogTimestamp)o, this);
}
else
{
return false;
}
}
public static bool DoMatch(SyslogTimestamp a, SyslogTimestamp b)
{
bool doMatch = (a.year == b.Year)
&& (a.month == b.Month)
&& (a.dayOfMonth == b.DayOfMonth)
&& (a.hours == b.Hours)
&& (a.minutes == b.Minutes)
&& (a.seconds == b.Seconds)
&& (a.miliseconds == b.Miliseconds)
&& (a.utcOffset == b.UtcOffset);
return doMatch;
}
public static bool operator ==(SyslogTimestamp a, SyslogTimestamp b)
{
if (System.Object.ReferenceEquals(a, b))
{
return true;
}
if (((object)a == null) || ((object)b == null))
{
return false;
}
return DoMatch(a, b);
}
public static bool operator !=(SyslogTimestamp a, SyslogTimestamp b)
{
return !(a == b);
}
public static bool operator <(SyslogTimestamp a, SyslogTimestamp b)
{
if (a.Year < b.Year) { return true; }
if (a.Year > b.Year) { return false; }
if (a.Month < b.Month) { return true; }
if (a.Month > b.Month) { return false; }
if (a.DayOfMonth < b.DayOfMonth) { return true; }
if (a.DayOfMonth > b.DayOfMonth) { return false; }
if ((a.Hours + a.UtcOffset) < (b.Hours + b.UtcOffset)) { return true; }
if ((a.Hours + a.UtcOffset) > (b.Hours + b.UtcOffset)) { return false; }
if (a.Minutes < b.Minutes) { return true; }
if (a.Minutes > b.Minutes) { return false; }
if (a.Seconds < b.Seconds) { return true; }
if (a.Seconds > b.Seconds) { return false; }
if (a.Miliseconds < b.Miliseconds) { return true; }
if (a.Miliseconds > b.Miliseconds) { return false; }
return false;
}
public static bool operator >(SyslogTimestamp a, SyslogTimestamp b)
{
if (a.Year > b.Year) { return true; }
if (a.Year < b.Year) { return false; }
if (a.Month > b.Month) { return true; }
if (a.Month < b.Month) { return false; }
if (a.DayOfMonth > b.DayOfMonth) { return true; }
if (a.DayOfMonth < b.DayOfMonth) { return false; }
if ((a.Hours + a.UtcOffset) > (b.Hours + b.UtcOffset)) { return true; }
if ((a.Hours + a.UtcOffset) < (b.Hours + b.UtcOffset)) { return false; }
if (a.Minutes > b.Minutes) { return true; }
if (a.Minutes < b.Minutes) { return false; }
if (a.Seconds > b.Seconds) { return true; }
if (a.Seconds < b.Seconds) { return false; }
if (a.Miliseconds > b.Miliseconds) { return true; }
if (a.Miliseconds < b.Miliseconds) { return false; }
return false;
}
public static bool operator >=(SyslogTimestamp a, SyslogTimestamp b)
{
return (a > b) || DoMatch(a, b);
}
public static bool operator <=(SyslogTimestamp a, SyslogTimestamp b)
{
return (a < b) || DoMatch(a, b);
}
public override int GetHashCode()
{
return base.GetHashCode();
}
}
}
It is checked by the following unit tests:
using System;
using Microsoft.VisualStudio.TestTools.UnitTesting;
using Springlog.Com.Messaging;
namespace Springlog.Com.Messaging.UnitTests
{
[TestClass]
public class SyslogTimestampUnitTest
{
[TestMethod]
public void SyslogTimestampComparisonIntegrity()
{
SyslogTimestamp timestamp1 = new SyslogTimestamp();
SyslogTimestamp timestamp2 = new SyslogTimestamp();
SyslogTimestamp timestamp3 = new SyslogTimestamp();
SyslogTimestamp timestamp4 = new SyslogTimestamp();
timestamp1.FromDateTime(DateTime.Now);
timestamp2.FromDateTime(DateTime.Now.AddDays(-1));
timestamp3 = timestamp1;
timestamp4.FromDateTime(DateTime.Now.AddMinutes(2));
Assert.AreEqual(true, (timestamp1 > timestamp2));
Assert.AreEqual(true, (timestamp4 > timestamp1));
Assert.AreEqual(false, (timestamp1 < timestamp2));
Assert.AreEqual(true, (timestamp1 >= timestamp2));
Assert.AreEqual(false, (timestamp1 <= timestamp2));
Assert.AreEqual(true, (timestamp1 == timestamp3));
Assert.AreEqual(false, (timestamp1 != timestamp3));
Assert.AreEqual(false, (timestamp1 == timestamp4));
Assert.AreEqual(false, (timestamp1 == timestamp2));
}
[TestMethod]
public void SyslogTimestampStringConversionIntegrity()
{
DateTime now = DateTime.Now;
SyslogTimestamp timestamp = new SyslogTimestamp();
SyslogTimestamp timestamp2 = new SyslogTimestamp();
timestamp.FromDateTime(now);
timestamp.UtcOffset = 1;
Assert.AreEqual(true, timestamp2.FromString(timestamp.ToFormattedString()));
Assert.AreEqual(true, (timestamp2 == timestamp));
timestamp.UtcOffset = 1.5;
Assert.AreEqual(true, timestamp2.FromString(timestamp.ToFormattedString()));
Assert.AreEqual(true, (timestamp2 == timestamp));
}
[TestMethod]
public void SyslogTimestampDateTimeConversionIntegrity()
{
DateTime now = DateTime.Now;
SyslogTimestamp timestamp = new SyslogTimestamp();
timestamp.FromDateTime(now);
DateTime timestampDateTime = timestamp.ToDateTime();
Assert.AreEqual(now.Year, timestampDateTime.Year);
Assert.AreEqual(now.Month, timestampDateTime.Month);
Assert.AreEqual(now.Day, timestampDateTime.Day);
Assert.AreEqual(now.Hour, timestampDateTime.Hour);
Assert.AreEqual(now.Minute, timestampDateTime.Minute);
Assert.AreEqual(now.Second, timestampDateTime.Second);
Assert.AreEqual(now.Millisecond, timestampDateTime.Millisecond);
}
}
}
Clean-up crew needed, grammar spill... - Nagy Vilmos
|
|
|
|
|
Hi Marco,
I'm enjoying reading your novel-in-code, and I looked at the RFC5424 spec, which I consider brain-damaged.
I note that use of "Z" is optional, so that would mean my little flirtation with parsing your data would have to be rejiggered.
What were the authors of that spec thinking when they allowed ambiguous characters that have varying meanings depending on position ?
You'd also think that specs like this would provide a reference parser implementation in pseudo-code, or some computer language; however, for all I know, there is a reference implementation somewhere.
cheers, Bill
“But I don't want to go among mad people,” Alice remarked.
“Oh, you can't help that,” said the Cat: “we're all mad here. I'm mad. You're mad.”
“How do you know I'm mad?” said Alice.
“You must be,” said the Cat, “or you wouldn't have come here.” Lewis Carroll
|
|
|
|
|
BillWoodruff wrote: and I looked at the RFC5424 spec which I consider brain-damaged.
Not at all - the timestamp may be completely brain-damaged, but at least it is clearly specified.
BillWoodruff wrote: I note that use of "Z" is optional, so that would mean my little flirtation with parsing your data would have to be rejiggered.
That's in fact brain-damaged.
BillWoodruff wrote: What were the authors of that spec thinking when they allowed ambiguous characters that have varying meanings depending on position ?
As far as I can recall, the spec was written by Rainer Gerhards as the only author.
That explains the complexity: the guy makes a fortune consulting on it (at least I suspect so).
BillWoodruff wrote: You'd also think that specs like this would provide a reference parser implementation in pseudo-code, or some computer language; however, for all I know, there is a reference implementation somewhere.
Not if the author can make consultant money from it.
Clean-up crew needed, grammar spill... - Nagy Vilmos
|
|
|
|
|
|
Marco, OriginalGriff and I recommend Expresso. You might want to add RegexBuilder[^] to your regex toolbox as well. RegexBuilder lets you try your expression on multiple input strings, which makes it very useful for testing your expression.
If there is one thing more dangerous than getting between a bear and her cubs it's getting between my wife and her chocolate.
|
|
|
|
|
Marco Bertschi wrote: (I have tried to avoid them, but I eventually got that they are absolutely unavoidable, especially when it comes to parsing complex strings)
be the coolest boy in the sandbox.
~RaGE();
I think words like 'destiny' are a way of trying to find order where none exists. - Christian Graus
Do not feed the troll ! - Common proverb
|
|
|
|
|
Marco Bertschi wrote: And that's the string it will parse: 2014-2-5T21:36:14.315Z+1.5
You didn't need to tell us what it will parse, it's immediately obvious what the regex is for. Anyone can see that.
To alcohol! The cause of, and solution to, all of life's problems - Homer Simpson
----
Our heads are round so our thoughts can change direction - Francis Picabia
|
|
|
|
|
Nothing beautiful, looks like a dump. [0-9] is just a \d.
|
|
|
|
|
You could simplify/shorten it by replacing each instance of [0-9] with \d and getting rid of [] around single characters, though I suppose that could make it less readable.
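One caveat on swapping [0-9] for \d, at least in .NET: by default \d matches any Unicode decimal digit, not only the ASCII ones, and it only becomes equivalent to [0-9] when RegexOptions.ECMAScript is set. A small sketch to show the difference (the class and method names are invented for this example):

```csharp
using System;
using System.Text.RegularExpressions;

public static class DigitClassDemo
{
    // True when the whole input is a single character matched by the pattern.
    public static bool IsSingleMatch(string input, string pattern, RegexOptions options = RegexOptions.None)
    {
        return Regex.IsMatch(input, "^" + pattern + "$", options);
    }

    public static void Main()
    {
        string arabicIndicThree = "\u0663"; // a Unicode decimal digit outside ASCII

        Console.WriteLine(IsSingleMatch("3", @"\d"));                                    // ASCII digit
        Console.WriteLine(IsSingleMatch(arabicIndicThree, @"\d"));                       // default \d
        Console.WriteLine(IsSingleMatch(arabicIndicThree, "[0-9]"));                     // explicit range
        Console.WriteLine(IsSingleMatch(arabicIndicThree, @"\d", RegexOptions.ECMAScript)); // ASCII-only \d
    }
}
```

For timestamps coming from syslog senders this probably never matters, but it is the reason some people keep [0-9] on purpose.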
|
|
|
|
|
|
I have managed to avoid this convoluted mess for over 30 years. And with a little luck, when I die I will STILL not speak regex.
.NET string methods work just fine for me. About as fast for 90% of your needs and more readable for 100% of your needs.
The 2 or 3 times I have actually needed the power of regex in 30 years, I just sub'd out that line of code. And rather than learning/debugging/banging head... I went to the beach swilling cheap whiskey and chasing cheaper women.
My advice to those who don't yet know regex... impress your peers with cheap whiskey and women and forget about the regex. Both make your head hurt... but one is less fun...
|
|
|
|
|
I didn't find RegEx to be too hard, at least after finding out the basic principles. It's the same as with every new thing: learning by doing!
Clean-up crew needed, grammar spill... - Nagy Vilmos
|
|
|
|
|
I think this site is the best tutorial/guide to regular expressions:
http://www.regular-expressions.info/tutorial.html
Clear descriptions of everything from simple to complex.
---
I like this site for testing expressions:
http://www.regexr.com/
It's an online testing tool (plus a downloadable version if you like). Paste in your expression, paste in your text, see where it matches, with analysis, hover over matches to see groups.
|
|
|
|
|
You can probably compress it a little. If you are matching a single character, then you don't need []:
[0] => 0
[:] => :
[0][1][0-2][\.]? => 01[0-2]\.?
.? needs to be escaped, I think, where you mean a literal ".": .? => \.?
Also, you can often use meta characters to save some finger work:
[0-9] => \d
[0-9]{4} => \d{4}
Also, some of the expressions don't need to be parenthesized:
([0-9]{4}) => \d{4}
You only need the parentheses if you want to capture the value (as the posted code does via Groups[n]) or have the parser treat it as a unit in larger ops.
Just some ideas. Nice work though!
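As a rough check of these suggestions, the sketch below applies them to the posted pattern and compares the captured groups on the sample string from the thread. The capturing parentheses are kept because FromString reads Groups[1] to Groups[8]. The class and member names are made up for the example, and agreeing on one input is of course no proof that the two patterns are equivalent (escaping the dot, for one, deliberately changes what is accepted).

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;

public static class PatternComparison
{
    // The pattern exactly as posted in the thread.
    public const string Original =
        @"([0-9]{4})-([1-9]|[1][0-2])-([0-2]?[0-9]|[3][0-1])[T]([0-1]?[0-9]|[2][0-3])[:]([0-5]?[0-9])[:]([0-5]?[0-9])?.?([0-9]{1,6})[Z]([+-][0-9][\.|,]?[0-9]?|[0-9]{2}?|[+-]?[0][1][0-2][\.]?[0-9]?|[0-9]{2}?)";

    // After the suggested rewrites: \d for [0-9], no [] around single
    // characters, and \.? so that only a literal dot is accepted.
    public const string Shortened =
        @"(\d{4})-([1-9]|1[0-2])-([0-2]?\d|3[0-1])T([0-1]?\d|2[0-3]):([0-5]?\d):([0-5]?\d)?\.?(\d{1,6})Z([+-]\d[.|,]?\d?|\d{2}?|[+-]?01[0-2]\.?\d?|\d{2}?)";

    // Returns the values of groups 1..n, or an empty array when nothing matches.
    public static string[] Capture(string pattern, string input)
    {
        Match m = Regex.Match(input, pattern, RegexOptions.IgnoreCase);
        return m.Success
            ? m.Groups.Cast<Group>().Skip(1).Select(g => g.Value).ToArray()
            : new string[0];
    }

    public static void Main()
    {
        string sample = "2014-2-5T21:36:14.315Z+1.5";
        string[] a = Capture(Original, sample);
        string[] b = Capture(Shortened, sample);
        Console.WriteLine(string.Join(" | ", a));
        Console.WriteLine(string.Join(" | ", b));
        Console.WriteLine(a.SequenceEqual(b) ? "same captures" : "captures differ");
    }
}
```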
|
|
|
|
|
One approach I have used with varying levels of utility is to treat the regex pattern like an algebraic expression: factor out common terms and assign them to variables, use repetition counts, and so on. This can simplify the final result and make it more maintainable.
Your pattern can be simplified a bit and made somewhat more readable this way. I found that other patterns, such as the one for valid IP addresses, can be greatly simplified this way too.
Maybe this is useful to you!?
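A minimal sketch of that factoring idea, assuming the sample shape from this thread rather than the full RFC 5424 grammar. All names and sub-patterns here are invented for illustration and are not the original poster's pattern:

```csharp
using System;
using System.Text.RegularExpressions;

public static class ComposedPattern
{
    // Each term is named once and reused when the full pattern is assembled.
    const string Year = @"(?<year>\d{4})";
    const string Month = @"(?<month>1[0-2]|0?[1-9])";
    const string Day = @"(?<day>3[01]|[12]\d|0?[1-9])";
    const string Hour = @"(?<hour>2[0-3]|[01]?\d)";
    const string Minute = @"(?<minute>[0-5]?\d)";
    const string Second = @"(?<second>[0-5]?\d)";
    const string Fraction = @"(?:\.(?<fraction>\d{1,6}))?";
    const string Offset = @"(?<offset>[+-]\d{1,2}(?:\.\d)?)?";

    const string Date = Year + "-" + Month + "-" + Day;
    const string Time = Hour + ":" + Minute + ":" + Second;

    public static readonly Regex Timestamp =
        new Regex("^" + Date + "T" + Time + Fraction + "Z" + Offset + "$");

    public static void Main()
    {
        Match m = Timestamp.Match("2014-2-5T21:36:14.315Z+1.5");
        Console.WriteLine(m.Success);
        Console.WriteLine(m.Groups["year"].Value + " " + m.Groups["fraction"].Value + " " + m.Groups["offset"].Value);
    }
}
```

Named groups also mean the parsing code no longer depends on counting parentheses, which helps when a term is added or removed later.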
"Courtesy is the product of a mature, disciplined mind ... ridicule is lack of the same - DPM"
|
|
|
|
|
Has anyone looked at the MSDN pricing lately?
I do some MS Office (Excel) integration for my customers, so it requires that I get their version of MS Office (all the versions, 97 to 2013) to program against. To have access to Office I need to upgrade from MSDN Pro to MSDN Premium... which costs $5000 more!
Can anyone explain to me how they can justify charging developers $5000 to integrate with their products? I'm not writing the president's speeches using these installs. I'm writing software so my customers will continue to use MS products and purchase licenses.
I don't understand. What am I missing... besides the money to do it?
Joel Palmer
Data Integration Engineer
|
|
|
|
|
Nope, you didn't miss anything, it's the money.
It was broke, so I fixed it.
|
|
|
|
|
Joel Palmer wrote: I don't understand. What am I missing...
It's MS missing common sense.
Clean-up crew needed, grammar spill... - Nagy Vilmos
|
|
|
|
|
Joel Palmer wrote: how they can justify charging developers $5000
It's called business.
Veni, vidi, abiit domum
|
|
|
|
|
You've missed the point. Developers are MS's lifeblood. If I'm buying an MSDN subscription then I'm a pretty serious developer. As a developer I keep my customers coming back for more when I meet their requirements, and if I do that using MS technology then the customers buy more MS licenses.
Good business would be to let developers increase the MS bottom line by having them using more MS products in their solutions. Pricing their products out of a developer's solutions does not make any business sense.
Joel Palmer
Data Integration Engineer
|
|
|
|
|
You are making a couple of faulty assumptions: that MS care about developers, and that MS care about end users. They don't: they care about money. And the large companies they expect to sell the top-end product to will just pay up and move on, because it isn't the money of the people signing the purchase order. While there are people who will pay it, they will continue to gouge charge.
Have you considered buying second-hand copies of Office on FleaBay to complete your collection that way? It would probably cost less than $500 for the lot...
Those who fail to learn history are doomed to repeat it. --- George Santayana (December 16, 1863 – September 26, 1952)
Those who fail to clear history are doomed to explain it. --- OriginalGriff (February 24, 1959 – ∞)
|
|
|
|
|
OriginalGriff wrote: You are making a couple of faulty assumptions: that MS care about developers, and that MS care about end users.
Er, no, I didn't make those assumptions anywhere.
Veni, vidi, abiit domum
|
|
|
|
|