
APOD Website Scraper, a HOPE demonstration

2 Jun 2014 | CPOL | 21 min read
Using the Higher Order Programming Environment, scrape 20 years of photos from the APOD website and explore them.

Image 1

The source code for this release can be found here: https://github.com/cliftonm/HOPE/tree/release-6-1-2014

Watch the video!

 

Introduction

In this article, I'm going to demonstrate how to write a simple application to work with the APOD website.  The goals are:

  1. Every day at a specific time, scrape the current APOD and display a thumbnail
  2. The thumbnail should include metadata to:
    1. Allow the user to click on the thumbnail and open a browser window to the APOD webpage
    2. Display metadata which can be read or opened in a separate text window.
    3. Launch the associated webpage in the default browser
  3. The user can also create a local database of all historical APOD webpages
  4. The user can search for keywords which will display thumbnails of images that match those keywords
    1. The thumbnails will have the same behavior as described above.

This article expands on my previous article introducing the Higher Order Programming Environment.  Here we develop a much more complex application.

User Warning

I cannot be held responsible for people spending hours and hours searching the database and exploring the great images!

Image 2

Running the Application

The executable filename is TypeSystemExplorer.exe.  If you download just the binaries (.NET 4.5 is required), run this program.  A bunch of things will start happening, including creating the database, creating an Images folder, and scraping today's APOD.

If you want to scrape every APOD ever published, first disable the Thumbnail Converter receptor by double-clicking on it; we don't need it for the moment.  Then open the folder with the binaries and drag and drop the assembly APODEventGeneratorReceptor.dll onto the surface.  This will create some 6000+ carriers, each with a date from 6/16/1995 (the first APOD) to today.  These carriers will then be processed by the system.  On my computer, it takes about an hour or two to scrape 20 years of APODs.  The logger will occasionally emit a message indicating some problem or other.  When you're all done, don't forget to re-enable the Thumbnail Converter.  For those who want to know what happens when you don't disable the Thumbnail Converter: you'll get the first 100 images in the viewer, and then the viewer disables itself.

To search, drag and drop the SearchForReceptor.dll assembly onto the surface.  A window will appear where you can enter search strings.

You can also drag and drop additional thumbnail viewers, text display receptors, etc., enable and disable things, and so forth, like I do in the video.

If you just want to look at some random images, drag and drop the JPGs onto the surface.  They'll be displayed, and any metadata for those images will be automatically acquired from the database.

Most of all, have fun!

Receptors

Image 3

This would be a fairly simple program to write in a purely monolithic manner.  Using the Higher Order Programming Environment (HOPE) we're going to break the task down into APOD-specific processes and general purpose processes.

There are seven receptors that we'll create for this activity:

  1. Key-Value pair persistence
  2. Daily Timer
  3. Web Page Reader
  4. APOD Page Scraper
  5. APOD URL Generator
  6. Record Persistence
  7. Record Searcher

And we'll utilize the Text-to-Speech and Thumbnail Viewer receptors that we already have, with some additional enhancement to the visualizer component (which is not a receptor.)

Please be familiar with the terminology in the article describing HOPE.

Persistence Engine

We need a simple way to persist application data.  Rather than have a bunch of separate databases (or pseudo-databases) lying around, having a receptor as the interface to a database implementation will be useful.

With the HOPE architecture, you can easily replace a receptor.  Here, we can implement a completely different storage mechanism, transparent to the rest of the application.

Let's define the protocol used by the receptors first.

RequireTable Protocol

We'll use this protocol so that receptors can let the persistence engine know that they require certain tables to exist.

<SemanticTypeStruct DeclType="RequireTable">
  <Attributes>
    <NativeType Name="TableName" ImplementingType="string"/>
    <NativeType Name="SemanticTypeName" ImplementingType="string"/>
  </Attributes>
</SemanticTypeStruct>

We'll use semantic types to define the table schema.  For example, let's create the semantic type protocol for saving the date/time at which a timer event last occurred:

<SemanticTypeStruct DeclType="LastEventDateTime">
  <Attributes>
    <NativeType Name="ID" ImplementingType="long?"/>
    <NativeType Name="EventName" ImplementingType="string"/>
    <NativeType Name="EventDateTime" ImplementingType="DateTime?"/>
  </Attributes>
</SemanticTypeStruct>

Note the nullable types.  These help determine which fields to populate when inserting records.  Also note, for SQLite, the primary key autoincrementing field must be a long.  There are undoubtedly other type conversion issues that I haven't addressed yet.

DatabaseRecord Protocol

<SemanticTypeStruct DeclType="DatabaseRecord">
  <Attributes>
    <NativeType Name="ResponseProtocol" ImplementingType="string"/>
    <NativeType Name="TableName" ImplementingType="string"/>
    <NativeType Name="Action" ImplementingType="string"/> <!-- insert, update, delete, select -->
    <NativeType Name="Row" ImplementingType="ICarrier"/> <!-- the carrier with the actual row data -->
    <NativeType Name="Where" ImplementingType="string"/>
    <NativeType Name="OrderBy" ImplementingType="string"/>
    <NativeType Name="GroupBy" ImplementingType="string"/>
  </Attributes>
</SemanticTypeStruct>

Here the protocol tells the receptor the table name on which to operate, the action to take, and, in the case of an insert or update statement, the carrier containing the data for the record to create or update.  We can also specify where, order by, and group by clauses.  This is by no means an entity framework!  Notice that for queries, a response protocol is provided.  This means that one receptor could issue the query but direct the result to a different receptor rather than have the data returned to itself.

Receptor Implementation

We'll use SQLite as the database for getting a no-fuss implementation up and running (no fuss at least on the part of the user who would otherwise need to specify credentials, make sure a database server was installed and running, etc.)  Let's first look at how I created the receptor template:

public class ReceptorDefinition : IReceptorInstance
{
public string Name { get { return "Persistor"; } }
public bool IsEdgeReceptor { get { return true; } }
public bool IsHidden { get { return false; } }

protected IReceptorSystem rsys;
protected SQLiteConnection conn;
protected Dictionary<string, Action<dynamic>> protocolActionMap;
const string DatabaseFileName = "hope.db";

public ReceptorDefinition(IReceptorSystem rsys)
{
  this.rsys = rsys;
  protocolActionMap = new Dictionary<string, Action<dynamic>>();
  protocolActionMap["RequireTable"] = new Action<dynamic>((s) => RequireTable(s));
  protocolActionMap["DatabaseRecord"] = new Action<dynamic>((s) => DatabaseRecord(s));
  CreateDBIfMissing();
  OpenDB();
}

public string[] GetReceiveProtocols()
{
  return protocolActionMap.Keys.ToArray();
}

public void Terminate()
{
  conn.Close();
  conn.Dispose();

  // As per this post:
  // http://stackoverflow.com/questions/12532729/sqlite-keeps-the-database-locked-even-after-the-connection-is-closed
  // GC.Collect() is required to ensure that the file handle is released NOW (not when the GC gets a round tuit. ;)
  GC.Collect();
}

public void ProcessCarrier(ICarrier carrier)
{
  protocolActionMap[carrier.Protocol.DeclTypeName](carrier.Signal);
}

Notice how I'm creating a dictionary to map protocols to methods so that we have a one-liner for vectoring the protocol to the desired method.
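
The dispatch pattern can be tried in isolation.  The following is a minimal sketch, not the HOPE code: the class name ProtocolDispatcher, the Log list, and the string payloads are made up for illustration, and Action<object> stands in for the Action<dynamic> used above.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical stand-in for a receptor: maps protocol names to handler methods.
public class ProtocolDispatcher
{
    private readonly Dictionary<string, Action<object>> protocolActionMap =
        new Dictionary<string, Action<object>>();

    // Records which handlers ran, so the behavior can be inspected.
    public readonly List<string> Log = new List<string>();

    public ProtocolDispatcher()
    {
        protocolActionMap["RequireTable"] = s => Log.Add("RequireTable:" + s);
        protocolActionMap["DatabaseRecord"] = s => Log.Add("DatabaseRecord:" + s);
    }

    // The keys of the map double as the list of protocols the receptor accepts.
    public string[] GetReceiveProtocols()
    {
        return protocolActionMap.Keys.ToArray();
    }

    // Looks up the handler by protocol name; returns false for an unknown protocol.
    public bool Process(string protocolName, object signal)
    {
        Action<object> handler;
        if (!protocolActionMap.TryGetValue(protocolName, out handler)) return false;
        handler(signal);
        return true;
    }
}
```

Unlike the one-liner in ProcessCarrier, this sketch uses TryGetValue, so an unregistered protocol is reported to the caller instead of raising a KeyNotFoundException.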

Next, we implement a couple of methods to create the database if it doesn't exist and to open it:

/// <summary>
/// Create the database if it doesn't exist.
/// </summary>
protected void CreateDBIfMissing()
{
  if (!File.Exists(DatabaseFileName))
  {
    SQLiteConnection.CreateFile(DatabaseFileName);
  }
}

protected void OpenDB()
{
  conn = new SQLiteConnection("Data Source = " + DatabaseFileName); 
  conn.Open();
}

...we now have enough code at this point to test whether the database is created.  Simply dropping the receptor onto the surface should create the database, and sure enough, the database is created:

Image 4

Another interesting quality of the HOPE architecture is that it is very easy to exercise individual behaviors without having to run the entire application or write unit tests.  Drop a carrier onto the surface to test a specific behavior.  To inspect a carrier, remove the receptors that would otherwise process it.

Now let's implement the essentials for creating tables (if they are missing):

protected void RequireTable(dynamic signal)
{
  if (!TableExists(signal.TableName))
  {
    StringBuilder sb = new StringBuilder("create table " + signal.TableName + "(");

    // Always create a primary key as the field ID. 
    // There is no need to put this into the semantic type definition unless it's required for queries.
    sb.Append("ID INTEGER PRIMARY KEY AUTOINCREMENT");
    List<INativeType> types = rsys.SemanticTypeSystem.GetSemanticTypeStruct(signal.SemanticTypeName).NativeTypes;

    // Ignore ID field in the schema, as we specifically create it above.
    types.Where(t=>t.Name.ToLower() != "id").ForEach(t =>
    {
      sb.Append(", ");
      sb.Append(t.Name);
      // we ignore types, as per the SQLite 3 documentation:
      // "Any column in an SQLite version 3 database, except an INTEGER PRIMARY KEY column, 
      // may be used to store a value of any storage class."
      // http://www.sqlite.org/datatype3.html
    });

    sb.Append(");");

    Execute(sb.ToString());
  }
}

protected bool TableExists(string tableName)
{
  string sql = "SELECT name FROM sqlite_master WHERE type='table' AND name=" + tableName.SingleQuote() + ";";
  string name = QueryScalar<string>(sql);

  return tableName == name;
}

protected T QueryScalar<T>(string query)
{
  SQLiteCommand cmd = conn.CreateCommand();
  cmd.CommandText = query;
  T result = (T)cmd.ExecuteScalar();

  return result;
}
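
To see the statement that RequireTable ends up executing without touching a database, the string building can be pulled into a pure function.  This is a sketch under assumptions: CreateTableSketch and BuildCreateTableSql are hypothetical names, and a plain list of field names stands in for the NativeTypes collection of the semantic type.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class CreateTableSketch
{
    // Builds the same kind of statement RequireTable assembles: an autoincrementing ID
    // followed by untyped columns, skipping any field that is itself named "ID".
    public static string BuildCreateTableSql(string tableName, IEnumerable<string> fieldNames)
    {
        var columns = new List<string> { "ID INTEGER PRIMARY KEY AUTOINCREMENT" };
        columns.AddRange(fieldNames.Where(
            n => !n.Equals("id", StringComparison.OrdinalIgnoreCase)));

        return "create table " + tableName + "(" + String.Join(", ", columns) + ");";
    }
}
```

For the LastEventDateTime type (fields ID, EventName, EventDateTime) this yields `create table LastEventDateTime(ID INTEGER PRIMARY KEY AUTOINCREMENT, EventName, EventDateTime);`.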

We'll now test this by creating a carrier XML file:

<Carriers>
  <Carrier Protocol="RequireTable" TableName="LastEventDateTime" SemanticTypeName="LastEventDateTime"/>
</Carriers>

Drop the Persistor receptor onto the surface, drop the carrier onto the surface, and voila, the table is created (using the ubiquitous SQLite Database Browser):

Image 5

Now we can declare our schema as semantic types and create the table if it doesn't exist, and we've successfully tested this process without writing any overhead code.

The last thing to do is implement the CRUD operations, and we'll create test carriers to verify the behavior.  We need this:

protected Dictionary<string, object> GetColumnValueMap(ICarrier carrier)
{
  List<INativeType> types = rsys.SemanticTypeSystem.GetSemanticTypeStruct(carrier.Protocol.DeclTypeName).NativeTypes;
  Dictionary<string, object> cvMap = new Dictionary<string, object>();
  types.ForEach(t => cvMap[t.Name] = t.GetValue(carrier.Signal));

  return cvMap;
}

...to get the key-value pairs for the insert and update operations.

Insert Record

protected void Insert(dynamic signal)
{
  Dictionary<string, object> cvMap = GetColumnValueMap(signal.Row);
  StringBuilder sb = new StringBuilder("insert into " + signal.TableName + "(");
  sb.Append(String.Join(", ", (from c in cvMap where c.Value != null select c.Key).ToArray()));
  sb.Append(") values (");
  sb.Append(String.Join(",", (from c in cvMap where c.Value != null select "@" + c.Key).ToArray()));
  sb.Append(");");

  SQLiteCommand cmd = conn.CreateCommand();
  (from c in cvMap where c.Value != null select c).ForEach(kvp => 
    cmd.Parameters.Add(new SQLiteParameter("@" + kvp.Key, kvp.Value)));
  cmd.CommandText = sb.ToString();
  cmd.ExecuteNonQuery();
  cmd.Dispose();
}
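
The effect of the nullable fields is easiest to see by looking at the generated SQL on its own.  The sketch below repeats only the string-building half of Insert; InsertSketch and BuildInsertSql are hypothetical names, and an ordered list of key-value pairs stands in for the column-value map so that the column order is predictable.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class InsertSketch
{
    // Mirrors the string-building half of Insert: null-valued columns are left out,
    // and each remaining column gets a named parameter placeholder.
    public static string BuildInsertSql(
        string tableName, IEnumerable<KeyValuePair<string, object>> columnValues)
    {
        List<string> names = columnValues
            .Where(c => c.Value != null)
            .Select(c => c.Key)
            .ToList();

        return "insert into " + tableName + "(" + String.Join(", ", names) +
               ") values (" + String.Join(",", names.Select(n => "@" + n)) + ");";
    }
}
```

With ID left null and the other two fields populated, the result is `insert into LastEventDateTime(EventName, EventDateTime) values (@EventName,@EventDateTime);`, which lets SQLite assign the autoincrementing ID.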

Test Carrier

<Carriers>
  <Carrier Protocol="LastEventDateTime" EventName="Test" EventDateTime="8/19/1962 12:15 PM"/>
  <Carrier Protocol="DatabaseRecord" TableName="LastEventDateTime" Action="Insert" Row="{LastEventDateTime}"/>
</Carriers>

As usual, we drop the receptor onto the surface, then the test carrier above, and then inspect the database:

Image 6

Update Record

protected void Update(dynamic signal)
{
  Dictionary<string, object> cvMap = GetColumnValueMap(signal.Row);
  StringBuilder sb = new StringBuilder("update " + signal.TableName + " set ");
  sb.Append(String.Join(",", (from c in cvMap where c.Value != null select c.Key + "= @" + c.Key).ToArray()));
  sb.Append(" where " + signal.Where);

  SQLiteCommand cmd = conn.CreateCommand();
  (from c in cvMap where c.Value != null select c).ForEach(kvp => 
    cmd.Parameters.Add(new SQLiteParameter("@" + kvp.Key, kvp.Value)));
  cmd.CommandText = sb.ToString();
  cmd.ExecuteNonQuery();
  cmd.Dispose();
}

Test Carrier

<Carriers>
  <Carrier Protocol="LastEventDateTime" EventName="I've been updated!"/>
  <Carrier Protocol="DatabaseRecord" TableName="LastEventDateTime" Action="Update" Row="{LastEventDateTime}" Where="ID=1"/>
</Carriers>

Image 7

Delete Record

protected void Delete(dynamic signal)
{
  string sql = "delete from " + signal.TableName + " where " + signal.Where;
  SQLiteCommand cmd = conn.CreateCommand();
  cmd.CommandText = sql;
  cmd.ExecuteNonQuery();
  cmd.Dispose();
}

Test Carrier

<Carriers>
  <Carrier Protocol="DatabaseRecord" TableName="LastEventDateTime" Action="Delete" Where="ID=1"/>
</Carriers>

Image 8

All gone!

Select Record(s)

Each selected record will be emitted as its own carrier.  This is rather inefficient for large record sets, but for our purposes, will be quite sufficient.  Other implementations could, for example, return a collection of records within a semantic type.  We could also easily implement paging with this mechanism.  We'll leave that for later, when we actually need to deal with this problem.  And of course, table joins are completely ignored, etc.  We just need the basics here!

protected void Select(dynamic signal)
{
  StringBuilder sb = new StringBuilder("select ");
  List<INativeType> types = rsys.SemanticTypeSystem.GetSemanticTypeStruct(signal.ResponseProtocol).NativeTypes;
  sb.Append(String.Join(",", (from c in types select c.Name).ToArray()));
  sb.Append(" from " + signal.TableName);
  if (signal.Where != null) sb.Append(" where " + signal.Where);
  // support for group by is sort of pointless since we're not supporting any mechanism for aggregate functions.
  if (signal.GroupBy != null) sb.Append(" group by " + signal.GroupBy);
  if (signal.OrderBy != null) sb.Append(" order by " + signal.OrderBy);

  SQLiteCommand cmd = conn.CreateCommand();
  cmd.CommandText = sb.ToString();
  SQLiteDataReader reader = cmd.ExecuteReader();

  while (reader.Read())
  {
    ISemanticTypeStruct protocol = rsys.SemanticTypeSystem.GetSemanticTypeStruct(signal.ResponseProtocol);
    dynamic outSignal = rsys.SemanticTypeSystem.Create(signal.ResponseProtocol);
    Type type = outSignal.GetType();

    // Populate the output signal with the fields retrieved from the query, as specified by the requested response protocol
    types.ForEach(t =>
    {
      object val = reader[t.Name];

      PropertyInfo pi = type.GetProperty(t.Name);
      val = Converter.Convert(val, pi.PropertyType);
      pi.SetValue(outSignal, val);
    });

    rsys.CreateCarrier(this, protocol, outSignal);
  }

  cmd.Dispose();
}

Test Carrier

<Carriers>
  <Carrier Protocol="DatabaseRecord" TableName="LastEventDateTime" ResponseProtocol="LastEventDateTime" Action="select"/>
</Carriers>

Mousing over the emitted carrier, we can inspect the contents in the property grid:

Image 9

And we see the data returned.

There are of course a few (maybe numerous!) loose ends which we'll deal with at some point.

At this point though, we have a workable persistor receptor.  Here's one with three records waiting to be processed:

Image 10

Timer

On to something simpler!

We need a timer receptor to generate carriers at specific intervals.  These carriers need only specify a protocol with a message specific for the kind of timer, for example, "Daily", "Every Quarter Hour", etc.  As a side note, we could use "Every Quarter Hour" to create a grandfather clock chime, but that's not the purpose here.

Because I turn off my computer at night (and reboot possibly once or more during the day) and I suspect others do as well, the timer needs to remember the last carrier it issued and determine if the time period has expired since the last event.  If so, a new carrier will be emitted.  This will keep the system from re-firing carriers for a particular time interval that have already been processed.

One observation about the HOPE platform is that it really wants real-time feeds.  It's well suited for handling streams of data, events, and situations where data is continuously changing.

Let's start with the protocols:

IntervalTimerConfiguration Protocol

<SemanticTypeStruct DeclType="IntervalTimerConfiguration">
  <Attributes>
    <NativeType Name="StartDateTime" ImplementingType="DateTime"/>
    <NativeType Name="Interval" ImplementingType="int"/>
    <NativeType Name="EventName" ImplementingType="string"/>
    <NativeType Name="IgnoreMissedIntervals" ImplementingType="bool"/>
  </Attributes>
</SemanticTypeStruct>

We'll need this protocol for declaring different starting times, their intervals, and the event names.  The interval will be in seconds, so, for example, one day is 24 hours * 60 minutes / hr * 60 seconds / minute, or 86,400.  The event name describes the name to stuff into the TimerEvent signal.

StartDateTime acts as a seed for when the next event occurs.  If the date part is missing (only a time is given), the DateTime class uses the current date.

TimerEvent Protocol

<SemanticTypeStruct DeclType="TimerEvent">
  <Attributes>
    <NativeType Name="EventName" ImplementingType="string"/>
    <NativeType Name="EventDateTime" ImplementingType="DateTime"/>
  </Attributes>
</SemanticTypeStruct>

A very simple protocol describing the event name and the date/time at which the event fired.

Note how easy it would be to create carriers of different events to simulate the timer, which makes it a lot easier to test the receptors!

Configuration Carrier

Now let's create the configuration carrier (carrier-config_timer.xml).  It's rather dull, since we only want daily events:

<Carriers>
  <Carrier 
      Protocol="IntervalTimerConfiguration" 
      StartDateTime="8:00 AM"
      Interval="86400" 
      EventName="Daily Event"
      IgnoreMissedIntervals="false"/>
</Carriers>

For something a little more exciting, create a 5 second event as well:

<Carrier Protocol="IntervalTimerConfiguration" 
         StartDateTime="1/1/2014 12:01 AM" 
         Interval="5" 
         EventName="5sec" 
         IgnoreMissedIntervals="true"/>

So, at 8 AM every morning (or whenever you turn your computer on) the event will fire.  What happens if you don't turn your computer on for a few days?  Well, then we need to trigger events for missed intervals.  We'll implement this so only one event is triggered for every missed interval--for the APOD scraper, we want to retrieve any missed days.  We see now why we need the persistence receptor--we need the ability to store when the last event was fired.
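
The missed-interval arithmetic can be sketched on its own.  This is not the receptor's actual calculation, which is not shown in this article; IntervalSketch and DueEvents are hypothetical names, and treating a missing database record as "fire only the newest interval" is an assumption made for the example.

```csharp
using System;
using System.Collections.Generic;

public static class IntervalSketch
{
    // Returns the scheduled times that are due at 'now' and not yet covered by 'lastEvent'.
    // Scheduled times are seed, seed + interval, seed + 2 * interval, and so on.
    public static List<DateTime> DueEvents(
        DateTime seed, int intervalSeconds, DateTime? lastEvent, DateTime now,
        bool ignoreMissedIntervals)
    {
        var due = new List<DateTime>();

        // Index of the most recent scheduled time at or before 'now' (negative if none yet).
        long newest = IndexAtOrBefore(seed, intervalSeconds, now);
        if (newest < 0) return due;

        // Index of the last interval already handled; -1 when nothing was ever recorded.
        long handled = lastEvent.HasValue
            ? Math.Max(-1L, IndexAtOrBefore(seed, intervalSeconds, lastEvent.Value))
            : -1L;
        if (handled >= newest) return due;

        // Either catch up on every missed interval, or jump straight to the newest one.
        long first = (ignoreMissedIntervals || !lastEvent.HasValue) ? newest : handled + 1;
        for (long i = first; i <= newest; i++)
        {
            due.Add(seed.AddSeconds((double)i * intervalSeconds));
        }

        return due;
    }

    private static long IndexAtOrBefore(DateTime seed, int intervalSeconds, DateTime t)
    {
        return (long)Math.Floor((t - seed).TotalSeconds / intervalSeconds);
    }
}
```

For a daily 8:00 AM event last fired on May 1 and a machine switched back on at 9:00 AM on May 4, this returns three times (May 2, 3 and 4 at 8:00 AM) when IgnoreMissedIntervals is false, and only May 4 at 8:00 AM when it is true.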

We'll use the carrier we defined earlier to return last timer event records:

<SemanticTypeStruct DeclType="LastEventDateTime">
  <Attributes>
    <NativeType Name="ID" ImplementingType="long?"/>
    <NativeType Name="EventName" ImplementingType="string"/>
    <NativeType Name="EventDateTime" ImplementingType="DateTime?"/>
  </Attributes>
</SemanticTypeStruct>

Now we get to our first "real world problem" with the HOPE architecture -- completion events.  The issue is this: when I initialize the event receptor, how do I know whether I've received the last event time from the database at that point?  If this is a new event, there is no record in the database and the persistor will not send anything back.  This requires a completion notification -- but do we implement this as a carrier or as something the receptor system manages?  I'll elect to use a completion notification carrier, for the simple reason that, in a distributed computing environment, the receptor system will hand off the carrier to another system and at that point it has no idea when the remote process completes.

The downside to this is that we have to guarantee the order of carriers.  At the moment, since carriers are created only on the main application thread, this is not an issue.  Even with the visualizer hooked in, the carriers are processed sequentially -- see the CurveIndex property of each carrier animation, which counts from 0 to 50.  Furthermore, whether a carrier can be processed asynchronously should be determined, not by the receptor system, but by the receptor.  Ordering also only matters when synchronization is required, such as when the persistor returns multiple carriers.  Indeed, this whole problem could be avoided if we actually returned the row collection rather than one row per carrier!  This seems like the best approach, so I've made 5 lines of modification to the persistor for this new behavior.

The actual implementation is a bit arduous because we have to sync up with the response from the persistor.  Also, the calculation for the next event time is complex because we're taking into consideration that the system is restarting after being down for a while, and we have the option to create events for all missed time points.  Ignoring all the hoops one has to go through to set everything up, here's what happens when an event fires (the code comments should be sufficiently explanatory):

/// <summary>
/// Fire an event NOW.
/// </summary>
protected void FireEvent(DateTime now)
{
  LastEventTime = now;
  UpdateRecord();
  CreateEventCarrier();
}

/// <summary>
/// Updates an existing record with the new "LastEventTime"
/// or inserts a new record if it didn't already exist.
/// </summary>
protected void UpdateRecord()
{
  ICarrier rowCarrier = CreateRow();

  // If it already exists in the DB, update it
  if (PreExisting)
  {
    UpdateRecord(rowCarrier);
  }
  else
  {
    // Otherwise, create it
    PreExisting = true;
    InsertRecord(rowCarrier);
  }
}

/// <summary>
/// Creates the carrier for the timer event.
/// </summary>
protected void CreateEventCarrier()
{
  ISemanticTypeStruct protocol = rsys.SemanticTypeSystem.GetSemanticTypeStruct("TimerEvent");
  dynamic signal = rsys.SemanticTypeSystem.Create("TimerEvent");
  signal.EventName = EventName;
  signal.EventDateTime = (DateTime)LastEventTime;
  rsys.CreateCarrier(receptor, protocol, signal);
}

/// <summary>
/// Creates the carrier (as an internal carrier, not exposed to the system) for containing
/// our record information. Assuming only an update, this sets only the EventDateTime field.
/// </summary>
protected ICarrier CreateRow()
{
  // Create the type for the updated data.
  ISemanticTypeStruct rowProtocol = rsys.SemanticTypeSystem.GetSemanticTypeStruct("LastEventDateTime");
  dynamic rowSignal = rsys.SemanticTypeSystem.Create("LastEventDateTime");
  rowSignal.EventDateTime = LastEventTime;
  ICarrier rowCarrier = rsys.CreateInternalCarrier(rowProtocol, rowSignal);

  return rowCarrier;
}

/// <summary>
/// Creates a carrier instructing the persistor to update the LastEventDateTime field for our event name.
/// </summary>
protected void UpdateRecord(ICarrier rowCarrier)
{
  ISemanticTypeStruct protocol = rsys.SemanticTypeSystem.GetSemanticTypeStruct("DatabaseRecord");
  dynamic signal = rsys.SemanticTypeSystem.Create("DatabaseRecord");
  signal.TableName = "LastEventDateTime";
  signal.Row = rowCarrier;
  signal.Action = "update";
  signal.Where = "EventName = " + EventName.SingleQuote();
  rsys.CreateCarrier(receptor, protocol, signal);
}

/// <summary>
/// Creates a carrier instructing the persistor to create a new entry. Note that we also
/// add the EventName to the field set.
/// </summary>
protected void InsertRecord(ICarrier rowCarrier)
{
  rowCarrier.Signal.EventName = EventName;
  ISemanticTypeStruct protocol = rsys.SemanticTypeSystem.GetSemanticTypeStruct("DatabaseRecord");
  dynamic signal = rsys.SemanticTypeSystem.Create("DatabaseRecord");
  signal.TableName = "LastEventDateTime";
  signal.Row = rowCarrier;
  signal.Action = "insert";
  rsys.CreateCarrier(receptor, protocol, signal);
}

The Webpage Scraper Receptor

Generically, we need the ability to return the HTML of a given web page.  We'll use two protocols.  This one defines the request, in which we provide the URL of the page to scrape and the response protocol to put the HTML into.

<SemanticTypeStruct DeclType="ScrapeWebpage">
  <Attributes>
    <NativeType Name="URL" ImplementingType="string"/>
    <NativeType Name="ResponseProtocol" ImplementingType="string"/>
  </Attributes>
</SemanticTypeStruct>

For our purposes, we want the response handled by a protocol that the APOD webpage scraper receptor (see below) will use to parse the HTML:

<SemanticTypeStruct DeclType="APODWebpage">
  <Attributes>
    <NativeType Name="URL" ImplementingType="string"/>
    <NativeType Name="HTML" ImplementingType="string"/>
    <NativeType Name="Errors" ImplementingType="string"/>
  </Attributes>
</SemanticTypeStruct>

We also return the requested URL and any exceptions that might occur in reading the page.  The receptor implementation is straightforward.  Note that the processing is performed asynchronously:

public async void ProcessCarrier(ICarrier carrier)
{
  try
  {
    string html = await Task.Run(() =>
    {
      // http://stackoverflow.com/questions/599275/how-can-i-download-html-source-in-c-sharp
      using (WebClient client = new WebClient())
      {
        // For future reference, if there are parameters, like:
        // www.somesite.it/?p=1500
        // use:
        // client.QueryString.Add("p", "1500"); //add parameters
        return client.DownloadString(carrier.Signal.URL);
      }
    });

    Emit(html, carrier.Signal.ResponseProtocol);
  }
  catch (Exception ex)
  {
    EmitError(ex.Message, carrier.Signal.ResponseProtocol);
  }
}

protected void Emit(string html, string protocolName)
{
  ISemanticTypeStruct protocol = rsys.SemanticTypeSystem.GetSemanticTypeStruct(protocolName);
  dynamic signal = rsys.SemanticTypeSystem.Create(protocolName);
  signal.HTML = html;
  rsys.CreateCarrier(this, protocol, signal);
}

protected void EmitError(string error, string protocolName)
{
  ISemanticTypeStruct protocol = rsys.SemanticTypeSystem.GetSemanticTypeStruct(protocolName);
  dynamic signal = rsys.SemanticTypeSystem.Create(protocolName);
  signal.Errors = error;
  rsys.CreateCarrier(this, protocol, signal);
}

We now have a simple re-usable receptor for getting a web page's HTML.

The APOD Webpage Scraper Receptor

This receptor receives two protocols: the TimerEvent that tells it to scrape a page, and the APODWebpage carrying the HTML returned by the web page scraper:

protected Dictionary<string, Action<dynamic>> protocolActionMap;

public ReceptorDefinition(IReceptorSystem rsys)
{
  this.rsys = rsys;

  protocolActionMap = new Dictionary<string, Action<dynamic>>();
  protocolActionMap["TimerEvent"] = new Action<dynamic>((s) => TimerEvent(s));
  protocolActionMap["APODWebpage"] = new Action<dynamic>((s) => ProcessPage(s));
}

public string[] GetReceiveProtocols()
{
  return protocolActionMap.Keys.ToArray();
}

When this receptor gets the daily event to process today's APOD, it formats the URL and sends off the request to scrape the webpage:

protected void TimerEvent(dynamic signal)
{
  if (signal.EventName == "ScrapeAPOD")
  {
    DateTime eventDate = signal.EventDateTime;
    // Create a URL in this format:
    // http://apod.nasa.gov/apod/ap140528.html
    string url = "http://apod.nasa.gov/apod/ap" + eventDate.ToString("yyMMdd") + ".html";
    EmitUrl(url);
  }
}

protected void EmitUrl(string url)
{
  ISemanticTypeStruct protocol = rsys.SemanticTypeSystem.GetSemanticTypeStruct("ScrapeWebpage");
  dynamic signal = rsys.SemanticTypeSystem.Create("ScrapeWebpage");
  signal.URL = url;
  signal.ResponseProtocol = "APODWebpage";
  rsys.CreateCarrier(this, protocol, signal);
}
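
The URL formatting is easy to check by itself.  In this small sketch, ApodUrlSketch and ForDate are hypothetical names; the only change from TimerEvent above is that the format call passes CultureInfo.InvariantCulture, so the two-digit year always comes from the Gregorian calendar regardless of the machine's culture settings.

```csharp
using System;
using System.Globalization;

public static class ApodUrlSketch
{
    // Formats the page URL for a given date, e.g. ap140528.html for 28 May 2014.
    public static string ForDate(DateTime date)
    {
        return "http://apod.nasa.gov/apod/ap" +
               date.ToString("yyMMdd", CultureInfo.InvariantCulture) + ".html";
    }
}
```

Calling `ApodUrlSketch.ForDate(new DateTime(2014, 5, 28))` returns `http://apod.nasa.gov/apod/ap140528.html`, the format shown in the comment above.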

Now, scraping a web page for its content tends to be more art than science.  The layout of the content can change, and the people who design these pages don't think in terms of how their page might be used by automation, so there are no divs and certainly no useful id attributes.  I won't bore you with the details of the implementation, which I've tried to make as robust as possible to handle 19 years of daily APODs (that's almost 7000 pages).

However, for a successful page scrape, we package up the image filename for other receptors, and we also log some information to the database.  Emitting the image file uses the ImageFilename protocol I created in the previous article, and can trigger the processes implemented in the thumbnail creator, viewer, and writer receptors.

protected void EmitImageFile(string fn)
{
  ISemanticTypeStruct protocol = rsys.SemanticTypeSystem.GetSemanticTypeStruct("ImageFilename");
  dynamic fsignal = rsys.SemanticTypeSystem.Create("ImageFilename");
  fsignal.Filename = fn;
  rsys.CreateCarrier(this, protocol, fsignal);
}
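
As for the scraping step itself, which is not shown in this article, the flavor of it can be illustrated with a deliberately naive sketch.  Everything here is an assumption made for the example: the names ApodScrapeSketch and FindImagePath, and the premise that the full-size image is the target of a link whose href starts with "image/".  The real receptor has to cope with far more variation than this.

```csharp
using System;
using System.Text.RegularExpressions;

public static class ApodScrapeSketch
{
    // Assumption: the full-size image is the target of a link whose href starts with "image/".
    private static readonly Regex ImageLink = new Regex(
        "<a\\s+href\\s*=\\s*\"(image/[^\"]+)\"", RegexOptions.IgnoreCase);

    // Returns the relative image path from the first matching link,
    // or null when the page contains no such link.
    public static string FindImagePath(string html)
    {
        Match m = ImageLink.Match(html ?? "");
        return m.Success ? m.Groups[1].Value : null;
    }
}
```

A null result is the kind of condition that would be recorded as a scraper error, so the page can be inspected by hand later.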

To log the entry in the database, we need a semantic type defining the schema:

<SemanticTypeStruct DeclType="APOD">
  <Attributes>
    <NativeType Name="ID" ImplementingType="long?"/>
    <NativeType Name="URL" ImplementingType="string"/>
    <NativeType Name="ImageFile" ImplementingType="string"/>
    <NativeType Name="Title" ImplementingType="string"/>
    <NativeType Name="Keywords" ImplementingType="string"/>
    <NativeType Name="Explanation" ImplementingType="string"/>
    <NativeType Name="Errors" ImplementingType="string"/>
  </Attributes>
</SemanticTypeStruct>

Note that keywords are missing for older APOD pages.  Also, we're logging any scraper errors so we can visit the page and see what's up with the HTML and correct the algorithm.

We need to make sure this table exists, so the receptor issues a RequireTable carrier-protocol:

public void Initialize()
{
  RequireAPODTable();
}

protected void RequireAPODTable()
{
  ISemanticTypeStruct protocol = rsys.SemanticTypeSystem.GetSemanticTypeStruct("RequireTable");
  dynamic signal = rsys.SemanticTypeSystem.Create("RequireTable");
  signal.TableName = "APOD";
  signal.SemanticTypeName = "APOD";
  rsys.CreateCarrier(this, protocol, signal);
}

Now, logging the data is straightforward:

protected void LogImage(string url, string fn, string keywords, string title, string explanation, List<string> errors)
{
  ISemanticTypeStruct rowProtocol = rsys.SemanticTypeSystem.GetSemanticTypeStruct("APOD");
  dynamic record = rsys.SemanticTypeSystem.Create("APOD");
  record.URL = url;
  record.ImageFile = fn;
  record.Keywords = keywords;
  record.Explanation = explanation;
  record.Title = title;
  record.Errors = String.Join(", ", errors.ToArray());

  ISemanticTypeStruct protocol = rsys.SemanticTypeSystem.GetSemanticTypeStruct("DatabaseRecord");
  dynamic signal = rsys.SemanticTypeSystem.Create("DatabaseRecord");
  signal.TableName = "APOD";
  signal.Action = "insert";
  // Row is declared as an ICarrier, so wrap the record in an internal carrier
  // (as CreateRow does in the timer) rather than assigning the bare signal.
  signal.Row = rsys.CreateInternalCarrier(rowProtocol, record);
  rsys.CreateCarrier(this, protocol, signal);
}

And here's an example of 8 entries in the database:

Image 11

Of course it's nearly as fun to watch HOPE in action processing these pages as it is to look at the beautiful APOD photos!

Handling Errors

We'll also send errors to the Logger receptor, which creates a flyout for any "debug message":

// Use the debug message receptor to display error counts.
if (errors.Count > 0)
{
  ++totalErrors;
  ISemanticTypeStruct dbgMsgProtocol = rsys.SemanticTypeSystem.GetSemanticTypeStruct("DebugMessage");
  dynamic dbgMsgSignal = rsys.SemanticTypeSystem.Create("DebugMessage");
  dbgMsgSignal.Message = totalErrors.ToString() + ": " + record.Errors;
  rsys.CreateCarrier(this, dbgMsgProtocol, dbgMsgSignal);
}  

This way, we can visually determine if there are a slew of errors suddenly occurring in the page scraping process.

Scraping all the Images

To actually scrape all the pages, we're going to create a receptor that does nothing more than simulate timer events with the dates, starting from the first APOD on June 16, 1995, to today.  The carrier looks like this:

<Carrier Protocol="TimerEvent" EventName="ScrapeAPOD" EventDateTime="5/19/2014"/>

This is a "drop and go" receptor -- as soon as it's dropped onto the surface, it starts emitting carriers.  Crazy.

public void Initialize()
{
  DateTime start = DateTime.Parse("6/16/1995");
  DateTime stop = DateTime.Parse("7/16/1995"); // One month for this demo; use DateTime.Now to scrape every APOD to date.

  for (DateTime date = start; date <= stop; date = date.AddDays(1))
  {
    ISemanticTypeStruct protocol = rsys.SemanticTypeSystem.GetSemanticTypeStruct("TimerEvent");
    dynamic signal = rsys.SemanticTypeSystem.Create("TimerEvent");
    signal.EventName = "ScrapeAPOD";
    signal.EventDateTime = date;
    rsys.CreateCarrier(this, protocol, signal);
  }

  rsys.Remove(this);
}  

Note also that when this receptor is done generating the carriers, it removes itself.
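The date loop can also be pulled out into a small helper, sketched below.  The ApodDates class is hypothetical, and the page URL pattern (apYYMMDD.html) is inferred from the example URL that appears in the carrier log later in this article, so treat it as an assumption rather than a documented API.

```csharp
using System;
using System.Collections.Generic;
using System.Globalization;

public static class ApodDates
{
    // The first APOD, per the article.
    public static readonly DateTime FirstApod = new DateTime(1995, 6, 16);

    // Yields every calendar day from start to stop, inclusive.
    public static IEnumerable<DateTime> EachDay(DateTime start, DateTime stop)
    {
        for (DateTime date = start.Date; date <= stop.Date; date = date.AddDays(1))
        {
            yield return date;
        }
    }

    // Assumed URL pattern: apYYMMDD.html under the APOD site root.
    public static string PageUrl(DateTime date)
    {
        return "http://apod.nasa.gov/apod/ap"
            + date.ToString("yyMMdd", CultureInfo.InvariantCulture)
            + ".html";
    }
}
```

Constructing the start date with new DateTime(1995, 6, 16) sidesteps the culture-dependent parsing of "6/16/1995", which DateTime.Parse would reject on a machine configured for day/month/year dates.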

One month of dates being sent to the APOD receptor:

Image 12

One month of images displayed in a thumbnail viewer receptor (the mouse wheel rotates the images around like a carousel):

Image 13

Bells and Whistles

Viewing more than 300 or so images in a thumbnail viewer starts dragging down the system.  Also, once all these images are in the system, it would be nice to be able to filter on them with simple text searching.  For the main image in the viewer carousel, it would also be nice to be able to perform the following actions:

  • View image full size
  • Read (or speak) the "Explanation"
  • Go to the actual web page

Image Metadata

Let's deal with image metadata first.  The key to this process is to make it as general purpose as possible.  One approach would be to associate metadata with the image itself.  This has significant drawbacks. We can't guarantee that the metadata is always present.  What happens if the user drags and drops a thumbnail into the viewer?  The only metadata that we can possibly associate with the image that helps us is the filename -- this we can put into the Tag property of the image when it's loaded.  And we can certainly query the database for any metadata information (the URL, title, full image filename, and explanation) given an image.  The thumbnail viewer can request the metadata package from some receptor that can provide it:

<SemanticTypeStruct DeclType="GetImageMetadata">
  <Attributes>
    <NativeType Name="ImageFile" ImplementingType="string"/>
    <NativeType Name="ResponseProtocol" ImplementingType="string"/>
  </Attributes>
</SemanticTypeStruct>

and for the response:

<SemanticTypeStruct DeclType="HaveImageMetadata">
  <Attributes>
    <NativeType Name="ImageFile" ImplementingType="string"/>
    <NativeType Name="Metadata" ImplementingType="dynamic"/>
  </Attributes>
</SemanticTypeStruct>

The visualizer can generate this request for an image that is "focused" in the carousel:

protected void GetImageMetadata(IReceptor r)
{
  CarouselState cstate = carousels[r];
  int idx = cstate.Offset % cstate.Images.Count;

  if (idx < 0)
  {
    idx += cstate.Images.Count;
  }

  Image img = cstate.Images[idx];
  ISemanticTypeStruct protocol = Program.Receptors.SemanticTypeSystem.GetSemanticTypeStruct("GetImageMetadata");
  dynamic signal = Program.Receptors.SemanticTypeSystem.Create("GetImageMetadata");
  signal.ImageFile = img.Tag.ToString();
  signal.ResponseProtocol = "HaveImageMetadata";
  Program.Receptors.CreateCarrier(null, protocol, signal);
}
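The index arithmetic at the top of that method is worth a closer look, since C#'s remainder operator can return a negative value.  Here is a minimal sketch of the same wrap-around logic as a standalone helper; the Carousel class and WrapIndex name are made up for this example.

```csharp
using System;

public static class Carousel
{
    // Maps any offset, positive or negative, to a valid index in [0, count).
    public static int WrapIndex(int offset, int count)
    {
        if (count <= 0)
        {
            throw new ArgumentOutOfRangeException(nameof(count), "The carousel must contain at least one image.");
        }

        // C#'s % operator keeps the sign of the dividend, so -1 % 5 is -1.
        // Adding count and taking the remainder again folds it back into range.
        return ((offset % count) + count) % count;
    }
}
```

The explicit count check also covers a case the listing above does not guard against: an empty carousel, where the remainder operation would throw a DivideByZeroException.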

The APOD receptor is interested in this protocol and sends a request off to the Persistor receptor, since the APOD receptor is, after all, what logged the images in the database.  We can also re-use the protocol that defines what we want back, namely the APOD schema-protocol.

In the APOD receptor:

protected void GetImageMetadata(dynamic signal)
{
  string imageFile = signal.ImageFile;

  // Sort of kludgy, we're stripping off the "-thumbnail" portion of the filename if the user
  // happens to have dropped a thumbnail file. Rather dependent upon the fact that the thumbnail
  // writer writes image files with this string added to the filename!
  imageFile = imageFile.Surrounding("-thumbnail");

  ISemanticTypeStruct dbprotocol = rsys.SemanticTypeSystem.GetSemanticTypeStruct("DatabaseRecord");
  dynamic dbsignal = rsys.SemanticTypeSystem.Create("DatabaseRecord");
  dbsignal.TableName = "APOD";
  dbsignal.Action = "select";
  dbsignal.ResponseProtocol = "APOD";
  // Wildcard prefix to ignore path information.
  dbsignal.Where = "ImageFile LIKE '%" + imageFile + "'";
  rsys.CreateCarrier(this, dbprotocol, dbsignal);
}
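The Surrounding extension method is not shown in this article.  As a rough sketch of the intent described in the comment, the hypothetical helper below removes a trailing "-thumbnail" marker that sits just before the file extension.  The class and method names are invented for illustration, and the real extension method may well behave differently.

```csharp
using System;
using System.IO;

public static class ThumbnailNames
{
    private const string Marker = "-thumbnail";

    // Returns the filename of the full-size image for a thumbnail filename,
    // or the input unchanged if it does not carry the marker.
    public static string StripThumbnailSuffix(string fileName)
    {
        if (string.IsNullOrEmpty(fileName))
        {
            return fileName;
        }

        string ext = Path.GetExtension(fileName);
        string stem = fileName.Substring(0, fileName.Length - ext.Length);

        return stem.EndsWith(Marker, StringComparison.OrdinalIgnoreCase)
            ? stem.Substring(0, stem.Length - Marker.Length) + ext
            : fileName;
    }
}
```

Anchoring the match to the end of the stem avoids mangling a file that merely has "-thumbnail" somewhere in the middle of its name.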

Here, in the debug view, is an example response:

Image 14

or we can inspect the carrier log (created by the carrier export receptor) when I drop an image onto the surface:

<Carriers>
  <Carrier Protocol="ImageFilename" Filename="E:\HOPE\TypeSystemExplorer\bin\Debug\Images\02mantar_feresten.jpg" />
  <Carrier Protocol="ThumbnailImage" Filename="E:\HOPE\TypeSystemExplorer\bin\Debug\Images\02mantar_feresten-thumbnail.jpg" Image="System.Drawing.Bitmap" />
  <Carrier Protocol="GetImageMetadata" ImageFile="E:\HOPE\TypeSystemExplorer\bin\Debug\Images\02mantar_feresten.jpg" ResponseProtocol="HaveImageMetadata" />
  <Carrier Protocol="DatabaseRecord" ResponseProtocol="APOD" TableName="APOD" Action="select" Where="ImageFile LIKE '%02mantar_feresten.jpg'" Tag="HaveImageMetadata" />
  <Carrier Protocol="APODRecordset">
    <Recordset>
      <APOD ID="3003" URL="http://apod.nasa.gov/apod/ap031206.html" ImageFile="Images\02mantar_feresten.jpg" Keywords="sundial, observatory, india" Title="Jaipur Observatory Sundial" Explanation="..." />
    </Recordset>
  </Carrier>
</Carriers>

Now we "simply" send this response back to the requestor with the field values we want to include (a bit klunky but good enough for now):

protected void ProcessAPODRecordset(dynamic signal)
{
  // Allows for custom protocols.
  ISemanticTypeStruct respProtocol = rsys.SemanticTypeSystem.GetSemanticTypeStruct("HaveImageMetadata");
  dynamic respSignal = rsys.SemanticTypeSystem.Create("HaveImageMetadata");
  List<dynamic> records = signal.Recordset;

  // TODO: What if more than one image filename matches?
  if (records.Count > 0)
  {
    dynamic firstMatch = records[0];
    respSignal.ImageFile = firstMatch.ImageFile;
    ICarrier responseCarrier = CreateAPODRecordCarrier();
    responseCarrier.Signal.URL = firstMatch.URL;
    responseCarrier.Signal.Keywords = firstMatch.Keywords;
    responseCarrier.Signal.Title = firstMatch.Title;
    responseCarrier.Signal.Explanation = firstMatch.Explanation;
    respSignal.Metadata = responseCarrier;

    // Off it goes!
    rsys.CreateCarrier(this, respProtocol, respSignal);
  }
  // else, APOD knows nothing about this image file, so there's no response.
}

One Step Backwards for Two Steps Forward

However, there's an important piece missing, and that's describing to the front end what exactly can be done with this information.  How does the visualizer indicate that, for example, a web page can be opened, or that the title or explanation can be read or spoken?  The answer to that is to properly utilize the semantic type system.  We'll rewrite the APOD semantic type like this (introducing the "SemanticElement" element):

<SemanticTypeStruct DeclType="APOD">
  <Attributes>
    <SemanticElement Name="PrimaryKey"/>
    <SemanticElement Name="URL"/>
    <SemanticElement Name="ImageFilename"/>
    <SemanticElement Name="Keywords"/>
    <SemanticElement Name="Title"/>
    <SemanticElement Name="Explanation"/>
    <NativeType Name="Errors" ImplementingType="string"/>
  </Attributes>
</SemanticTypeStruct>

If we now look at the carriers, we can see how the semantic types are elements instead of values.  Here's the request and the response:

<Carrier Protocol="DatabaseRecord" ResponseProtocol="APOD" TableName="APOD" Action="select" Where="ImageFilename LIKE '%02mantar_feresten.jpg'" />
<Carrier Protocol="APODRecordset">
  <Recordset>
    <APOD Errors="" />
    <URL Value="http://apod.nasa.gov/apod/ap031206.html" />
    <ImageFilename Filename="Images\02mantar_feresten.jpg" />
    <Keywords>
      <Text Value="sundial, observatory, india" />
    </Keywords>
    <Title>
      <Text Value="Jaipur Observatory Sundial" />
    </Title>
    <Explanation>
      <Text Value="&lt;a href=&quot;http://www.bomhard.de/_englisch/index.htm&quot;&gt;Walk through&lt;/a&gt; these doors and up the stairs to begin your journey along a line from Jaipur, India toward the &lt;a href=&quot;ap980912.html&quot;&gt;North Celestial Pole&lt;/a&gt;. Such cosmic alignments abound in &lt;a href=&quot;http://www.ferestenphoto.com/mantar_subdirect.html&quot;&gt;marvelous Indian observatories&lt;/a&gt; where the architecture itself allows astronomical measurements. The structures were &lt;a href=&quot;http://www.atco-fr.com/cadrans/jaipur/jaip_uk.php3&quot;&gt;built in Jaipur&lt;/a&gt; and other cities in the eighteenth century by the Maharaja &lt;a href=&quot;http://www.atributetohinduism.com/ articles_hinduism/46.htm&quot;&gt;Jai Singh II (1686-1743)&lt;/a&gt;. Rising about 90 feet high, this stairway actually forms a shadow caster or &lt;a href=&quot;http://www.cosmicgnomon.com/sdindex.htm&quot;&gt;gnomon&lt;/a&gt;, part of what is still perhaps the largest &lt;a href=&quot;http://www.sundials.org/&quot;&gt;sundial on planet Earth&lt;/a&gt;. Testaments to Jai Singh II's passion for astronomy, the design and large scale of his observatories' structures still provide impressively accurate measurements of &lt;a href=&quot;http://www.nsta.org/awsday&quot;&gt;shadows and sightings&lt;/a&gt; of celestial angles." />
    </Explanation>
  </Recordset>
</Carrier>

How does this solve the problem of what to do with the metadata information?  Well, now we know exactly what protocol and signal to package up into a carrier if the user clicks on, say, the URL -- the protocol is the semantic type "URL."  If we have a receptor interested in carriers with this protocol, then it will respond when the visualizer packages up that metadata item, as we'll see next.

Moving Forward: Processing Metadata

We basically want to show the user what metadata is available for the associated image, so we'll first collect the metadata and determine whether the metadata is actionable (it will be a semantic element rather than a native type):

public void ProcessImageMetadata(dynamic signal)
{
  ICarrier metadata = signal.Metadata;
  string protocol = metadata.Protocol.DeclTypeName;
  string path = signal.ImageFilename.Filename;
  string fn = Path.GetFileName(path);
  var carousel = carousels.FirstOrDefault(kvp => kvp.Value.ActiveImageFilename == fn);

  // The user could have removed the viewer by the time we get a response.
  if (carousel.Value != null)
  {
    InitializeMetadata(protocol, carousel.Value.MetadataPackets, metadata.Signal);
  }
}

// This is a complex piece of code.
/// <summary>
/// Gets the metadata tags reflectively, so that we have a general purpose function for displaying image metadata.
/// </summary>
protected void InitializeMetadata(string protocol, List<MetadataPacket> packets, dynamic signal)
{
  packets.Clear();
  // Get all the native and semantic types so we can get the values of these types from the signal.
  List<IGetSetSemanticType> types = Program.SemanticTypeSystem.GetSemanticTypeStruct(protocol).AllTypes;
  // Get the type of the signal for reflection.
  Type t = signal.GetType();
  // For each property in the signal where the value of the property isn't null (this check may not be necessary)...
  t.GetProperties().Where(p => p.GetValue(signal) != null).ForEach(p =>
  {
    // We get the value, which is either a NativeType or SemanticElement
    object obj = p.GetValue(signal);
    string itemProtocol = null;

    // If it's a SemanticElement, then we have a protocol that we can use for actionable metadata.
    // We would package up this protocol into a carrier with the metadata signal in order to let
    // other receptors process the protocol.
    if (obj is IRuntimeSemanticType)
    {
      itemProtocol = p.Name;
    }

    // Here we get the IGetSetSemanticType instance (giving us access to Name, GetValue and SetValue operations) for the type.
    IGetSetSemanticType protocolType = types.Single(ptype => ptype.Name == p.Name);
    // Create a metadata packet.
    MetadataPacket metadataPacket = new MetadataPacket() { ProtocolName = itemProtocol, Name = p.Name };
    // Get the object value. This does some fancy work in the semantic type system,
    // depending on whether we're dealing with a native type (simple) or a semantic element (complicated).
    object val = protocolType.GetValue(Program.SemanticTypeSystem, signal);
    // If the type value isn't null, then we have some metadata we can display for the image.
    val.IfNotNull(v =>
    {
      metadataPacket.Value = v.ToString();
      packets.Add(metadataPacket);
    });
  });
}
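Stripped of the semantic type system, the core idea is to reflect over the signal's properties, skip the nulls, and flag the values that are semantic elements as actionable.  The sketch below shows that idea in isolation.  ISemanticValue, UrlValue, SampleRecord, Packet, and MetadataReader are all hypothetical stand-ins invented for this example; HOPE's own types (IRuntimeSemanticType, MetadataPacket, and so on) do more than this.

```csharp
using System;
using System.Collections.Generic;
using System.Reflection;

// Hypothetical marker for values that are semantic elements rather than native types.
public interface ISemanticValue { }

public sealed class UrlValue : ISemanticValue
{
    public string Value { get; set; }
    public override string ToString() { return Value; }
}

// Hypothetical signal with one semantic element and two native values.
public sealed class SampleRecord
{
    public UrlValue URL { get; set; }
    public string Errors { get; set; }
    public string Title { get; set; }
}

public sealed class Packet
{
    public string Name { get; set; }
    public string ProtocolName { get; set; }   // null when the value is not actionable
    public string Value { get; set; }
}

public static class MetadataReader
{
    // Collects one packet per non-null, non-indexed public property.
    public static List<Packet> Read(object signal)
    {
        var packets = new List<Packet>();

        foreach (PropertyInfo p in signal.GetType().GetProperties())
        {
            if (p.GetIndexParameters().Length != 0) continue;

            object value = p.GetValue(signal);
            if (value == null) continue;

            packets.Add(new Packet
            {
                Name = p.Name,
                // Only semantic elements get a protocol name, which is what
                // lets a viewer emit a carrier for them later.
                ProtocolName = value is ISemanticValue ? p.Name : null,
                Value = value.ToString()
            });
        }

        return packets;
    }
}
```

Note that reflection does not guarantee the order in which properties are returned, so a viewer that cares about display order would need to sort the packets itself.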

Now we can come up with some sort of scheme to display the metadata, say, under the image.  We will limit the content width though:

kvp.Value.MetadataPackets.ForEach(meta =>
{
  Rectangle region = new Rectangle(location.X, y, location.Width, 15);
  string data = meta.Name + ": " + meta.Value;
  e.Graphics.DrawString(data, font, whiteBrush, region);
  y += 15;
});

where kvp.Value is the carousel for the "focused" image.  The result now is the metadata appearing below the focused image in the viewer carousel:

Image 15

Now, the fun part is, when we click on one of these metadata fields, we can create carriers for the semantic elements that do something, based of course on what the receptor that "listens" to that protocol wants to do.  This is handled in the visualizer and is a bit complex due to the interaction with the semantic type system:

protected bool TestImageMetadataDoubleClick(Point p)
{
  ISemanticTypeSystem sts = Program.SemanticTypeSystem;

  foreach(var kvp in carousels)
  {
    Rectangle imgArea = kvp.Value.ActiveImageLocation;
    int imgidx = kvp.Value.ActiveImageIndex;
    int idx = -1;

    foreach(var meta in kvp.Value.Images[imgidx].MetadataPackets)
    {
      ++idx;
      Rectangle metaRect = new Rectangle(imgArea.Left, imgArea.Bottom + 10 + (MetadataHeight * idx), imgArea.Width, MetadataHeight);

      if (metaRect.Contains(p))
      {
        // This is the metadata the user clicked on.
        // Now check if it's semantic data. In all cases, this should be true, right?
        if (!String.IsNullOrEmpty(meta.ProtocolName))
        {
          // The implementing type is a semantic type requiring a drill into?
          if (sts.GetSemanticTypeStruct(meta.Name).SemanticElements.Exists(st => st.Name == meta.PropertyName))
          {
            // Yes it is. Emit a carrier with the protocol and signal.
            string implementingPropertyName = sts.GetSemanticTypeStruct(meta.ProtocolName).SemanticElements.Single(e => e.Name == meta.PropertyName).GetImplementingName(sts);
            ISemanticTypeStruct protocol = Program.SemanticTypeSystem.GetSemanticTypeStruct(meta.PropertyName);
            dynamic signal = Program.SemanticTypeSystem.Create(meta.PropertyName);
            protocol.AllTypes.Single(e => e.Name == implementingPropertyName).SetValue(Program.SemanticTypeSystem, signal, meta.Value);
            Program.Receptors.CreateCarrier(null, protocol, signal);

            // Ugh, I hate doing this, but it's a lot easier to just exit all these nests.
            return true;
          }
          else if (sts.GetSemanticTypeStruct(meta.Name).NativeTypes.Exists(st => st.Name == meta.PropertyName))
          {
            // No, it's just a native type.
            ISemanticTypeStruct protocol = Program.SemanticTypeSystem.GetSemanticTypeStruct(meta.ProtocolName);
            dynamic signal = Program.SemanticTypeSystem.Create(meta.ProtocolName);
            sts.GetSemanticTypeStruct(meta.ProtocolName).NativeTypes.Single(st => st.Name == meta.PropertyName).SetValue(Program.SemanticTypeSystem, signal, meta.Value);
            Program.Receptors.CreateCarrier(null, protocol, signal);

            // Ugh, I hate doing this, but it's a lot easier to just exit all these nests.
            return true;
          }
          // else: we don't have anything we can do with this.
        }
      }
    }
  }

  return false;
}
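One way to tame the nesting in that method would be to separate the hit test from the carrier creation.  The sketch below covers only the hit-test half, under the assumption that the metadata rows are stacked directly below the image as in the listing above.  MetadataHitTest and RowAt are hypothetical names, and the rowHeight and topMargin parameters stand in for MetadataHeight and the literal 10.

```csharp
using System.Drawing;

public static class MetadataHitTest
{
    // Returns the index of the metadata row under the point, or -1 if the
    // point is not over any row.
    public static int RowAt(Rectangle imageArea, int rowCount, int rowHeight, int topMargin, Point p)
    {
        for (int idx = 0; idx < rowCount; idx++)
        {
            var row = new Rectangle(
                imageArea.Left,
                imageArea.Bottom + topMargin + rowHeight * idx,
                imageArea.Width,
                rowHeight);

            if (row.Contains(p))
            {
                return idx;
            }
        }

        return -1;
    }
}
```

With the row index in hand, the caller could look up the metadata packet and build the carrier in a flat method body, without needing to return from inside nested loops.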

Url Metadata Opens Page in Web Browser

For example, we can open up a web browser when we click on the URL.  Why does this work?  Because there is a semantic type "URL":

<SemanticTypeStruct DeclType="URL">
  <Attributes>
    <NativeType Name="Value" ImplementingType="string"/>
  </Attributes>
</SemanticTypeStruct>

This defines the protocol, which is something the URL receptor pays attention to:

public string[] GetReceiveProtocols()
{
  return new string[] { "URL" };
}

public void ProcessCarrier(ICarrier carrier)
{
  string url = carrier.Signal.Value;
  try
  {
    Process.Start(url);
  }
  catch
  {
    // Eat exceptions.
  }
}
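Since the string handed to Process.Start comes from scraped data, a more defensive receptor might check it first.  The following is a minimal sketch of that idea, not the receptor's actual code; UrlLauncher and its methods are hypothetical names.

```csharp
using System;
using System.Diagnostics;

public static class UrlLauncher
{
    // Accepts only absolute http/https URLs, so arbitrary strings
    // (file paths, executable names) are never handed to the shell.
    public static bool IsLaunchable(string url)
    {
        Uri uri;
        return Uri.TryCreate(url, UriKind.Absolute, out uri)
            && (uri.Scheme == Uri.UriSchemeHttp || uri.Scheme == Uri.UriSchemeHttps);
    }

    // Returns true if the launch was attempted without an exception.
    public static bool TryOpen(string url)
    {
        if (!IsLaunchable(url))
        {
            return false;
        }

        try
        {
            // UseShellExecute defaults to true on .NET Framework but to false
            // on .NET Core and later, where it must be set to open a URL.
            Process.Start(new ProcessStartInfo(url) { UseShellExecute = true });
            return true;
        }
        catch (Exception)
        {
            // As in the receptor above, a failed launch is not fatal.
            return false;
        }
    }
}
```

Returning a bool instead of silently eating the exception would let the caller emit a "DebugMessage" carrier when a launch fails.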

Similarly, the title, explanation, and keywords are semantic types implementing the "Text" semantic type, for which we've already established a protocol, so they can be received by the appropriate receptor:

Image 16

 

If you've slogged through the article to this point, it is perhaps becoming blindingly obvious how flexible this system is!  Lightweight receptors implement responses to protocols, and they can be added and removed as desired to configure the system for one's specific purposes.
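To make the add-and-remove idea concrete, here is a heavily simplified sketch of protocol-based dispatch.  None of these types come from HOPE: IMiniReceptor, MiniReceptorSystem, and RecordingReceptor are invented for this example, and the real receptor system (asynchronous processing, the visualizer, semantic types) is considerably richer.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical, simplified stand-in for a receptor interface.
public interface IMiniReceptor
{
    string[] GetReceiveProtocols();
    void ProcessCarrier(string protocol, object signal);
}

public sealed class MiniReceptorSystem
{
    private readonly Dictionary<string, List<IMiniReceptor>> byProtocol =
        new Dictionary<string, List<IMiniReceptor>>();

    public void Register(IMiniReceptor receptor)
    {
        foreach (string protocol in receptor.GetReceiveProtocols())
        {
            List<IMiniReceptor> list;
            if (!byProtocol.TryGetValue(protocol, out list))
            {
                list = new List<IMiniReceptor>();
                byProtocol[protocol] = list;
            }
            list.Add(receptor);
        }
    }

    public void Remove(IMiniReceptor receptor)
    {
        foreach (List<IMiniReceptor> list in byProtocol.Values)
        {
            list.Remove(receptor);
        }
    }

    // Delivers the signal to every receptor registered for the protocol and
    // returns how many received it; zero means the carrier went unclaimed.
    public int CreateCarrier(string protocol, object signal)
    {
        List<IMiniReceptor> list;
        if (!byProtocol.TryGetValue(protocol, out list))
        {
            return 0;
        }

        // Copy first so a receptor may remove itself while handling a carrier.
        IMiniReceptor[] targets = list.ToArray();
        foreach (IMiniReceptor receptor in targets)
        {
            receptor.ProcessCarrier(protocol, signal);
        }
        return targets.Length;
    }
}

// Test receptor that simply records the signals it receives.
public sealed class RecordingReceptor : IMiniReceptor
{
    public readonly List<object> Received = new List<object>();
    private readonly string[] protocols;

    public RecordingReceptor(params string[] protocols) { this.protocols = protocols; }
    public string[] GetReceiveProtocols() { return protocols; }
    public void ProcessCarrier(string protocol, object signal) { Received.Add(signal); }
}
```

The sender never names a receiver.  It only names a protocol, which is why a receptor can be dropped in or removed without touching any other receptor's code.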

Searching the Database

This is an important piece, implemented with a receptor that creates a small window for entering search strings and emitting a carrier with the "SearchFor" protocol, which the APOD receptor understands.  If other receptors associated with data stores receive this, they'll perform searches as well.

Imagine entering a single search string and being able to query a variety of data sources (especially ones not available for Google to index) rather than having to launch each application that interfaces to a database and using it to query the results.

A simple SearchFor receptor provides the ability to enter a search string, and when the user presses the Enter key, a carrier is created with the SearchFor protocol and a signal with the TextBox text:

protected void CreateForm()
{
  Form form = new Form();
  form.Text = "Search For:";
  form.Location = new Point(100, 100);
  form.Size = new Size(500, 60);
  form.TopMost = true;
  tb = new TextBox();
  tb.KeyPress += OnKeyPress;
  form.Controls.Add(tb);
  tb.Dock = DockStyle.Fill;
  form.Show();
  form.FormClosed += WhenFormClosed;
}

protected void OnKeyPress(object sender, KeyPressEventArgs e)
{
  if (e.KeyChar == '\r')
  {
    ISemanticTypeStruct protocol = rsys.SemanticTypeSystem.GetSemanticTypeStruct("SearchFor");
    dynamic signal = rsys.SemanticTypeSystem.Create("SearchFor");
    signal.SearchString = tb.Text;
    rsys.CreateCarrier(this, protocol, signal);
  }
}

The APOD receptor is interested in this protocol and when a carrier is received, it queries the database.  Any return records are converted into "ImageFilename" carrier-protocol-signals:

/// <summary>
/// Search the APOD database for matches.
/// </summary>
protected void SearchFor(dynamic signal)
{
  string searchFor = signal.SearchString;

  ISemanticTypeStruct dbprotocol = rsys.SemanticTypeSystem.GetSemanticTypeStruct("DatabaseRecord");
  dynamic dbsignal = rsys.SemanticTypeSystem.Create("DatabaseRecord");
  dbsignal.TableName = "APOD";
  dbsignal.Action = "select";
  dbsignal.ResponseProtocol = "APODSearchResults"; // will actually respond with "APODRecordset"
  // TODO: Use parameters
  dbsignal.Where = "Keywords LIKE '%" + searchFor + "%' or Title LIKE '%" + searchFor + "%' or Explanation LIKE '%" + searchFor + "%'";
  rsys.CreateCarrier(this, dbprotocol, dbsignal);
}

/// <summary>
/// Create carriers for the images that meet the returned search criteria.
/// </summary>
protected void ProcessSearchResults(dynamic signal)
{
  List<dynamic> records = signal.Recordset;

  foreach (dynamic d in records)
  {
    ISemanticTypeStruct outprotocol = rsys.SemanticTypeSystem.GetSemanticTypeStruct("ImageFilename");
    dynamic outsignal = rsys.SemanticTypeSystem.Create("ImageFilename");
    outsignal.Filename = d.ImageFilename.Filename;
    rsys.CreateCarrier(this, outprotocol, outsignal);
  }
}
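Regarding the "TODO: Use parameters" comment in SearchFor: concatenating the search text into the WHERE clause breaks on a search string containing a single quote and invites SQL injection.  Below is a minimal sketch of one way to build the clause with a named parameter instead.  SearchClause is a hypothetical helper, and the article does not show whether the DatabaseRecord protocol can carry parameter values, so passing the value through to the persistence receptor is left as an open assumption.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class SearchClause
{
    // Escapes LIKE wildcards so that input such as "100%" is matched literally.
    public static string EscapeLike(string text)
    {
        return text.Replace("\\", "\\\\").Replace("%", "\\%").Replace("_", "\\_");
    }

    // Builds "col1 LIKE @search ESCAPE '\' OR col2 LIKE @search ESCAPE '\' ..."
    // and returns the value to bind to @search through the out parameter.
    // Column names must come from trusted code, never from user input.
    public static string Build(string searchFor, IEnumerable<string> columns, out string parameterValue)
    {
        parameterValue = "%" + EscapeLike(searchFor) + "%";
        return string.Join(" OR ", columns.Select(c => c + " LIKE @search ESCAPE '\\'"));
    }
}
```

The search text then travels only as a bound parameter value, so quotes and wildcard characters in the user's input cannot change the shape of the query.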

Nothing else needs to be done, as we can take advantage of the thumbnail converter and thumbnail viewer receptors to display the matching images, get the metadata, etc.  Incidentally, we could also simply drop a whole bunch of images onto the surface and the system will try to find the metadata for them as well.  You could obviously have multiple sources for the metadata.

Now, if we do some fancy stuff, we can, for example, compare two galaxies, M31 and M33.  This is accomplished by using two different thumbnail viewers and disabling the first one (double-click on it to toggle enable/disable) to vector the second search to the second thumbnail.  Here's the result of searching for M31 (notice the disabled viewer on the left):

Image 17

Now, we disable the thumbnail viewer on the right, enable the one on the left, and search for M33:

Image 18

Conclusion

The receptors that I've created so far are:

  • APODEventGeneratorReceptor: Generates 20 years' worth of "scrape this page" events.  Drop this receptor onto the surface to start the scraping process and build the database and image library.
  • APODScraperReceptor: Responsible for actually scraping the HTML from the APOD website.
  • CarrierExporterReceptor: Creates a log file of all carriers received by receptors (only those received; if there's no receiver, you won't see a log entry).
  • HelloWorldReceptor: My first test receptor.
  • ImageViewerReceptor: Displays an image in a separate window, proportionally scaled to the window dimensions.
  • ImageWriterReceptor: Writes thumbnail images, appending "-thumbnail" to the filename.
  • LoggingReceptor: The second test receptor I wrote; generates flyouts for debug message signals.
  • PersistenceReceptor: Implements SQLite database persistence.
  • SearchForReceptor: Displays a small window to enter search text and issues a carrier with that text when the user presses the Enter key.
  • TextDisplayReceptor: Displays text for text protocols.
  • TextToSpeechReceptor: Speaks both text-to-speech protocols and Text protocols.
  • ThumbnailCreatorReceptor: Converts an image to a thumbnail and emits the image on a carrier.
  • ThumbnailViewerReceptor: Responds to thumbnail image protocols and displays the images in a carousel.
  • TimerReceptor: Fires events (declared in XML) when the designated time interval occurs.
  • UrlReceptor: Launches the default browser with the specified URL.
  • WeatherInfoReceptor: Collates weather and zipcode information and issues text-to-speech carriers.
  • WeatherServiceReceptor: Receives a zipcode protocol and queries NOAA for today's weather at that location.
  • ZipCodeReceptor: Receives a zipcode protocol and queries a web service for the town/city and state at that location.

We've implemented two receptors that are specific to generating the APOD scrape events and the actual parsing of the APOD web page.  We've leveraged existing receptors for creating and displaying thumbnails.  We've added several general purpose receptors.  In the process, we've also added a lot of underlying behaviors, especially in the visualizer, such as the carousel (which could use some polishing!)

Here's a receptor that I didn't implement -- translate text to a different language.  Here's a Code Project article by Ravi Bhavnani -- put this into a receptor that receives Text protocols and you have instant translation of any Text signal.

Overall, we've also demonstrated that HOPE can be used as an environment to do something actually useful.  The webpage scraper demonstrates the robustness of the system, processing thousands of pages, leveraging the asynchronous Task library in .NET.

License

This article, along with any associated source code and files, is licensed under The Code Project Open License (CPOL)


Written By
Architect Interacx
United States
Blog: https://marcclifton.wordpress.com/
Home Page: http://www.marcclifton.com
Research: http://www.higherorderprogramming.com/
GitHub: https://github.com/cliftonm

All my life I have been passionate about architecture / software design, as this is the cornerstone to a maintainable and extensible application. As such, I have enjoyed exploring some crazy ideas and discovering that they are not so crazy after all. I also love writing about my ideas and seeing the community response. As a consultant, I've enjoyed working in a wide range of industries such as aerospace, boatyard management, remote sensing, emergency services / data management, and casino operations. I've done a variety of pro-bono work for non-profit organizations related to nature conservancy, drug recovery and women's health.
