I am having trouble splitting a string in C#. I have this string:

start and dffdfdddddddfd<m>one</m><m>two</m><m>three</m><m>four</m>dbfjnbjvbnvbnjvbnv and end


I want to extract the text between <m> and </m>, and I need 3 outputs:

Output 1: one two three four

Output 2: four

Output 3: one



How would I do this? Please give me some sample code.
Thanks and regards.
Updated 8-Oct-11 20:33pm

Use a Regex:
(?<=\<m\>)[^\<]+(?=\</m\>)

C#
//  using System.Text.RegularExpressions;

/// <summary>
///  Regular expression built for C# on: Sun, Oct 9, 2011, 07:37:07 AM
///  Using Expresso Version: 3.0.3634, http://www.ultrapico.com
///
///  A description of the regular expression:
///
///  Match a prefix but exclude it from the capture. [\<m\>]
///      \<m\>
///          Literal <
///          m
///          Literal >
///  Any character that is NOT in this class: [\<], one or more repetitions
///  Match a suffix but exclude it from the capture. [\</m\>]
///      \</m\>
///          Literal <
///          /m
///          Literal >
///
///
/// </summary>
public static Regex regex = new Regex("(?<=\\<m\\>)[^\\<]+(?=\\</m\\>)",
    RegexOptions.CultureInvariant | RegexOptions.Compiled);

// Capture all Matches in the InputText
// (this statement belongs inside a method; InputText is the string to search)
MatchCollection ms = regex.Matches(InputText);
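The code above stops at the MatchCollection. To get the three outputs the question asks for, the match values can be collected into an array: output 1 is all of them joined with spaces, output 2 is the last one and output 3 is the first one. A minimal sketch, assuming the same pattern (written here as a verbatim string without the optional backslashes before < and >, which .NET treats as literal characters either way); the names MTagOutputs and Extract are hypothetical:

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;

// Hypothetical helper: builds the three outputs from the <m>...</m> matches.
public static class MTagOutputs
{
    // Same pattern as above: lookbehind for <m>, lookahead for </m>
    private static readonly Regex MTag =
        new Regex(@"(?<=<m>)[^<]+(?=</m>)", RegexOptions.CultureInvariant);

    public static (string All, string Last, string First) Extract(string input)
    {
        // Each Match.Value is only the text between the tags
        string[] values = MTag.Matches(input)
                              .Cast<Match>()
                              .Select(m => m.Value)
                              .ToArray();

        // No <m>...</m> pairs found: return empty strings for all outputs
        if (values.Length == 0)
            return (string.Empty, string.Empty, string.Empty);

        return (string.Join(" ", values),   // output 1: one two three four
                values[values.Length - 1],  // output 2: four
                values[0]);                 // output 3: one
    }
}
```

With the string from the question, Extract returns ("one two three four", "four", "one").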



Get a copy of Expresso (http://www.ultrapico.com): it's free, and it examines and generates regular expressions. I really wish I'd written it!
 
 
Comments
Mehdi Gholam 9-Oct-11 2:42am    
You beat me to it, 5!

Mine was: (\<m\>)(?<m>\w*)(\<\/m\>)
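With this named-group pattern the tags are part of each match, so Match.Value would be e.g. "<m>one</m>"; the wanted text is read from the group named "m" instead. A short sketch of consuming it; the class and method names (NamedGroupDemo, Values) are hypothetical:

```csharp
using System.Collections.Generic;
using System.Text.RegularExpressions;

// Hypothetical helper showing how the named-group pattern is read.
public static class NamedGroupDemo
{
    // The tags are captured too, so the inner text is in group "m"
    private static readonly Regex MTag = new Regex(@"(\<m\>)(?<m>\w*)(\<\/m\>)");

    public static List<string> Values(string input)
    {
        var values = new List<string>();
        foreach (Match match in MTag.Matches(input))
        {
            values.Add(match.Groups["m"].Value);
        }
        return values;
    }
}
```

Because \w only covers word characters, a value such as "a, b" between the tags is not matched at all with this pattern.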
OriginalGriff 9-Oct-11 4:23am    
:laugh:
I used "[^\<]+" instead of "\w*" so that punctuation would be included: probably I should have used ".*?" instead, but that is harder for a beginner to read and understand!
Have a look at the "(?<=)" and "(?=)" groups: they are zero-width lookbehind and lookahead assertions, so the prefix and suffix must be present but are not part of the match. That's handy sometimes, as it reduces the match collection to just the relevant data. :)
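To see both points (lookarounds keep the tags out of Match.Value, and the inner expression decides whether punctuation gets through), the three inner expressions mentioned can be run against a sample containing punctuation. A sketch; the sample string and the InnerPatternDemo name are made up for illustration:

```csharp
using System.Linq;
using System.Text.RegularExpressions;

public static class InnerPatternDemo
{
    // Returns the Value of every match; with lookbehind/lookahead the
    // tags themselves are not part of Match.Value.
    public static string[] Values(string pattern, string input) =>
        Regex.Matches(input, pattern)
             .Cast<Match>()
             .Select(m => m.Value)
             .ToArray();
}

// Usage with sample = "<m>Hello, world!</m><m>two</m>":
//   Values(@"(?<=<m>)[^<]+(?=</m>)", sample)  -> "Hello, world!", "two"
//   Values(@"(?<=<m>)\w*(?=</m>)", sample)    -> "two" only
//   Values(@"(?<=<m>).*?(?=</m>)", sample)    -> "Hello, world!", "two"
```

Note that [^<]+ still fails if the text between the tags contains a '<', while .*? does not have that limitation.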
Mehdi Gholam 9-Oct-11 4:28am    
nice to know, thanks
OriginalGriff 9-Oct-11 4:31am    
You're welcome - there is so much stuff in Regexes it's way too easy to miss bits! (I do it all the time) :laugh:
BillWoodruff 9-Oct-11 9:14am    
+5 Tasty !
I am voting +5 for OriginalGriff's elegant solution above ... and you should too :) ... but ... I was curious to know the pain of implementing this using 'Split' so here goes:
C#
// this code uses Linq: be sure 'using System.Linq;' is among
// your Form's using-declarations
//
// assume a WinForm with:
// textBox1, textBox2, button1
// textBox1 (Multiline = false) holds the string to be split
// textBox2 (Multiline = true) will hold the result of splitting
// button1 triggers the parsing

// variables to hold the results of indexing the split string
private List<string> method1;
private string method2;
private string method3;

// whole-string separators: splitting on the characters of "<m>"
// (i.e. '<', 'm', '>' individually) would leave a "/" entry for every
// "</m>" and would also cut any word containing the letter 'm'
private readonly string[] tagsForSplit = { "<m>", "</m>" };

private void button1_Click(object sender, EventArgs e)
{
    var result = textBox1.Text
      .Split(tagsForSplit, StringSplitOptions.None)
        .Where(s => !String.IsNullOrWhiteSpace(s))
          .ToList();

    // need the leading text, at least one tagged item and the trailing text
    if (result.Count < 3) return;

    // ignore the first and last entries in the result
    // (the text before the first <m> and after the last </m>)
    result = result.GetRange(1, result.Count - 2);

    method1 = result;           // output 1: all items
    method2 = result.Last();    // output 2: last item
    method3 = result.First();   // output 3: first item

    // examine the result ...
    textBox2.Lines = result.ToArray();
}
Discussion:

1. It would be interesting to compare the performance of Split and Regex for this scenario.

2. Really comparing the 'generalized usefulness' of this technique with Regex would take more Regex knowledge than I have.
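For point 1, a rough comparison can be made by timing both approaches on the same input with System.Diagnostics.Stopwatch. A minimal sketch: the class and method names and the iteration count are arbitrary assumptions, no measured results are implied, and a dedicated benchmarking tool (e.g. BenchmarkDotNet) would give more reliable numbers than a single Stopwatch loop.

```csharp
using System;
using System.Collections.Generic;
using System.Diagnostics;
using System.Linq;
using System.Text.RegularExpressions;

// Hypothetical timing harness for the Split and Regex approaches.
public static class SplitVsRegex
{
    private static readonly string[] Tags = { "<m>", "</m>" };
    private static readonly Regex MTag =
        new Regex(@"(?<=<m>)[^<]+(?=</m>)", RegexOptions.Compiled);

    // Split approach: assumes text before the first <m> and after the last </m>
    public static List<string> BySplit(string input)
    {
        var parts = input.Split(Tags, StringSplitOptions.None)
                         .Where(s => !string.IsNullOrWhiteSpace(s))
                         .ToList();
        return parts.Count < 3 ? new List<string>() : parts.GetRange(1, parts.Count - 2);
    }

    // Regex approach: the lookarounds leave only the inner text in Match.Value
    public static List<string> ByRegex(string input) =>
        MTag.Matches(input).Cast<Match>().Select(m => m.Value).ToList();

    // Runs one approach 'iterations' times and returns the elapsed time
    public static TimeSpan Time(Func<string, List<string>> parse, string input, int iterations)
    {
        parse(input); // one warm-up call so JIT and regex setup are not timed
        var sw = Stopwatch.StartNew();
        for (int i = 0; i < iterations; i++)
        {
            parse(input);
        }
        sw.Stop();
        return sw.Elapsed;
    }
}
```

For example, Console.WriteLine(SplitVsRegex.Time(SplitVsRegex.BySplit, input, 100000)) and the same call with SplitVsRegex.ByRegex print the two timings for comparison.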
 

This content, along with any associated source code and files, is licensed under The Code Project Open License (CPOL)


