Hi.
I have a small problem extracting information from HTML that I receive as a string.
I send a POST request to a website and get the page's HTML back as a string. It looks like this:
XML
<h3 class="category">Asia</h3>
                                <div class="server-list">
    <div class="server">
        <div class="status-icon up" data-tooltip="Available">
        </div>
        <div class="server-name">


I get around 20 of these in the same string, each with a different name and one of only two possible data-tooltip values: "Available" or "Unavailable".
What I am having trouble with is searching through the string and checking whether each item is available.
Does anyone know a good method for doing this?

1 solution

Assuming all your entries look like this and you want pairs such as "Asia"-"Available", you can use the following regular expression with the Singleline option:
<h3.*?>(.*?)</h3>.*?data-tooltip="(.*?)"

Iterate over all the matches: group 1 holds the category name and group 2 holds the tooltip value.
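
As an illustration only, here is a minimal sketch of how that expression could be applied to the snippet from the question. The class name `CategoryStatusDemo` and the helper `Parse` are made up for this example and are not part of the original answer.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

public static class CategoryStatusDemo
{
    // Hypothetical helper: returns (category, tooltip) pairs found in the HTML.
    public static List<KeyValuePair<string, string>> Parse(string html)
    {
        var pairs = new List<KeyValuePair<string, string>>();
        // Singleline lets '.' match newlines, so one match can span several lines.
        var regex = new Regex(@"<h3.*?>(.*?)</h3>.*?data-tooltip=""(.*?)""", RegexOptions.Singleline);
        foreach (Match m in regex.Matches(html))
        {
            pairs.Add(new KeyValuePair<string, string>(m.Groups[1].Value, m.Groups[2].Value));
        }
        return pairs;
    }

    public static void Main()
    {
        // The fragment posted in the question.
        string html = @"<h3 class=""category"">Asia</h3>
<div class=""server-list"">
    <div class=""server"">
        <div class=""status-icon up"" data-tooltip=""Available"">
        </div>";

        foreach (var pair in Parse(html))
        {
            Console.WriteLine("{0}-{1}", pair.Key, pair.Value); // prints "Asia-Available"
        }
    }
}
```

Note that this expression pairs each heading only with the first data-tooltip that follows it, so it yields one result per category rather than one per server.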

Update: the second text sample you posted needs a different regular expression. Here is a complete example:

C#
using System;
using System.Collections.Generic;
using System.Text;
using System.Text.RegularExpressions;

namespace t1
{
    class Program
    {
        public struct ServerStatus
        {
            public string ServerName { get; set; }
            public string Status { get; set; }
        }

        public static IList<ServerStatus> GetStatusFromHtml(string HTMLString)
        {
            List<ServerStatus> result = new List<ServerStatus>();
            // Group 1 captures the data-tooltip value, group 2 the trimmed server name.
            // Singleline makes '.' match newlines so a match can span the whole server block.
            Regex r = new Regex(@"class=""server.*?data-tooltip=""(.*?)"".*?class=""server-name"">\s*(.*?)\s*</div>", RegexOptions.Singleline);

            // Walk through every server block found in the string.
            Match m = r.Match(HTMLString);
            while (m.Success)
            {
                result.Add(new ServerStatus() { ServerName = m.Groups[2].Value, Status = m.Groups[1].Value });
                m = m.NextMatch();
            }

            return result;
        }
        
        static void Main(string[] args)
        {
            string x = @"<div class=""server alt"">
		<div class=""status-icon up"" data-tooltip=""Available"">
		</div>
		<div class=""server-name"">
				Hardcore
		</div>
	<span class=""clear""><!-- --></span>
	</div>
	<div class=""server"">
		<div class=""status-icon down"" data-tooltip=""Maintenance"">
		</div>
		<div class=""server-name"">
				USD
		</div>
	<span class=""clear""><!-- --></span>
	</div>";

            foreach (ServerStatus s in GetStatusFromHtml(x))
            {
                Console.WriteLine("{0}:{1}", s.ServerName, s.Status);
            }
        }
    }
}


Update 2: using the real source.

C#
using System;
using System.Collections.Generic;
using System.Text;
using System.Text.RegularExpressions;
using System.Net;
using System.IO;

namespace t1
{
    class Program
    {
        public struct ServerStatus
        {
            public string ServerName { get; set; }
            public string Status { get; set; }
        }

        public static IList<ServerStatus> GetStatusFromHtml(string HTMLString)
        {
            List<ServerStatus> result = new List<ServerStatus>();
            // Group 1 captures the data-tooltip value, group 2 the trimmed server name.
            Regex r = new Regex(@"class=""server.*?data-tooltip=""(.*?)"".*?class=""server-name"">\s*(.*?)\s*</div>", RegexOptions.Singleline);

            Match m = r.Match(HTMLString);
            while (m.Success)
            {
                string serverName = m.Groups[2].Value;
                string status = m.Groups[1].Value;
                // The real page can produce matches with an empty name or status; skip those.
                if (serverName != string.Empty && status != string.Empty)
                {
                    result.Add(new ServerStatus() { ServerName = serverName, Status = status });
                }
                m = m.NextMatch();
            }

            return result;
        }
        
        static void Main(string[] args)
        {
            WebRequest request = WebRequest.Create(@"http://us.battle.net/d3/en/status");
            using (WebResponse response = request.GetResponse())
            {
                using (StreamReader reader = new StreamReader
                   (response.GetResponseStream(), Encoding.UTF8))
                {
                    string content = reader.ReadToEnd();
                    
                    foreach (ServerStatus s in GetStatusFromHtml(content))
                    {
                        Console.WriteLine("{0}:{1}", s.ServerName, s.Status);
                    }
                }
            }            
        }
    }
}
 
Comments
sigsand 24-May-12 12:50pm    
In c# like this:
string expression = @"<h3.*?>(.*?).*?data-tooltip=(.*?)";

?
Notice that I removed one " from your expression.
Zoltán Zörgő 24-May-12 13:22pm    
No, like this:
string expression = @"<h3.*?>(.*?).*?data-tooltip=""(.*?)""";
The quotes are part of the expression; if you remove one, the pattern no longer captures the value. In your version nothing follows the final (.*?), and since that quantifier is lazy, the group will match an empty string.
Zoltán Zörgő 24-May-12 13:36pm    
Oh! The page stripped part of my expression! This is the correct one:
@"<h3.*?>(.*?)</h3>.*?data-tooltip=""(.*?)"""
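
To make the quoting point concrete, here is a small sketch (the class and method names are invented for illustration). Inside a C# verbatim string, `""` stands for one literal quote, and dropping the quotes around the last group leaves the lazy `(.*?)` with nothing to stop at, so it captures an empty string.

```csharp
using System;
using System.Text.RegularExpressions;

public static class VerbatimQuoteDemo
{
    // The corrected pattern from the comment above; "" inside @"..." is one literal quote.
    public const string Pattern = @"<h3.*?>(.*?)</h3>.*?data-tooltip=""(.*?)""";

    // The same pattern without the quotes around the second group.
    public const string PatternWithoutQuotes = @"<h3.*?>(.*?)</h3>.*?data-tooltip=(.*?)";

    // Hypothetical helper: returns what group 2 captures for the given pattern.
    public static string Tooltip(string pattern, string input)
    {
        return Regex.Match(input, pattern, RegexOptions.Singleline).Groups[2].Value;
    }

    public static void Main()
    {
        string input = "<h3>Asia</h3><div data-tooltip=\"Available\">";

        Console.WriteLine(Pattern);                                             // <h3.*?>(.*?)</h3>.*?data-tooltip="(.*?)"
        Console.WriteLine("[{0}]", Tooltip(Pattern, input));                   // [Available]
        Console.WriteLine("[{0}]", Tooltip(PatternWithoutQuotes, input));      // []
    }
}
```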
Zoltán Zörgő 27-May-12 13:31pm    
Any progress?
sigsand 30-May-12 11:54am    
Sorry for not replying earlier.
I am not that good with regular expressions and I am still having trouble doing what I want.
I'll tell you the whole idea, in case you are still following this thread.
I want to make a program that retrieves information from a website and displays some of it on its GUI.
The GUI consists of a lot of labels that simply show a boolean value, on or off.
Every second (or at some user-defined interval) the program should make a POST request to the website and store the page as a string (already achieved).
The retrieved string looks like this (with a lot of garbage at both the beginning and the end):
<div class="server alt">
<div class="status-icon up" data-tooltip="Available">
</div>
<div class="server-name">
Hardcore
</div>
<span class="clear"><!-- --></span>
</div>
<div class="server">
<div class="status-icon down" data-tooltip="Maintenance">
</div>
<div class="server-name">
USD
</div>
<span class="clear"><!-- --></span>
</div>

The information I need is whether data-tooltip shows Available or Maintenance. Then I want to change the labels in the GUI to show either on or off.
I have not figured out how to do this yet.
I think the webpage always has the same content, so I was thinking of storing some specific line numbers as constants and simply checking each of those lines for the word Maintenance or Available.
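
One possible alternative to fixed line numbers, sketched here under the assumption that the page keeps the markup shown above: reuse the regular expression from the solution and build a lookup from server name to a boolean. The class `AvailabilityDemo` and the method `GetAvailability` are hypothetical names, and how the result is bound to the labels is left open.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

public static class AvailabilityDemo
{
    // Hypothetical helper: server name -> true when the tooltip reads "Available".
    public static Dictionary<string, bool> GetAvailability(string html)
    {
        var availability = new Dictionary<string, bool>();
        // Same expression as in the solution: group 1 = tooltip, group 2 = server name.
        var regex = new Regex(
            @"class=""server.*?data-tooltip=""(.*?)"".*?class=""server-name"">\s*(.*?)\s*</div>",
            RegexOptions.Singleline);

        foreach (Match m in regex.Matches(html))
        {
            string name = m.Groups[2].Value;
            if (name.Length == 0) continue; // skip matches without a server name
            availability[name] = m.Groups[1].Value == "Available";
        }
        return availability;
    }

    public static void Main()
    {
        string html = @"<div class=""server alt"">
<div class=""status-icon up"" data-tooltip=""Available"">
</div>
<div class=""server-name"">
Hardcore
</div>
<span class=""clear""><!-- --></span>
</div>
<div class=""server"">
<div class=""status-icon down"" data-tooltip=""Maintenance"">
</div>
<div class=""server-name"">
USD
</div>
<span class=""clear""><!-- --></span>
</div>";

        foreach (var entry in GetAvailability(html))
        {
            Console.WriteLine("{0}: {1}", entry.Key, entry.Value ? "on" : "off");
        }
    }
}
```

Looking servers up by name means the result does not depend on where a line happens to sit in the page, so extra markup before or after the server list should not change the outcome.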

This content, along with any associated source code and files, is licensed under The Code Project Open License (CPOL)


