Technical Blog

Querying Wikipedia in ASP.NET using LINQ-to-Wiki

23 Apr 2013 | CPOL | 5 min read

Have you ever visited Wikipedia and simply gotten lost in the sheer vastness of knowledge available there? If only something existed that let you easily create complex queries returning exactly what you needed, using syntax you’re already familiar with (such as LINQ). Well, then this may be just the post for you!

Introducing LINQ-to-Wiki

LINQ-to-Wiki is a library designed by Petr Onderka for querying any site running MediaWiki (which includes Wikipedia) from any .NET language. It provides extensive functionality for performing complex queries, and it isn’t limited to reading wiki pages: it can also perform edits, add content and more. You can request a variety of items that would otherwise require a significant amount of scrolling and clicking, and that usually end in the inevitable “how did I get here?” several hours later, long after you have lost sight of your original goal thanks to the sheer magnitude of the site and the borderline addiction to knowledge it can evoke.

A few of the many kinds of Wikipedia content that can be accessed through LINQ-to-Wiki queries are:

  • Listing all of the articles within a category
  • Listing all of the links contained within a page
  • Grabbing images and related articles
  • Full query and search support

LINQ-to-Wiki uses the traditional LINQ queries that any .NET developer will be accustomed to, and the library translates them into requests against the MediaWiki API for whatever big plans you are trying to conquer the world with.
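To get a feel for what that translation involves, here is a minimal sketch of roughly the kind of raw MediaWiki API request that an allimages() query filtered by prefix corresponds to. The class and method names (MediaWikiRequestSketch, BuildAllImagesUrl) and the English Wikipedia endpoint are assumptions for illustration; the parameters action=query, list=allimages, aiprefix and aiprop are standard MediaWiki API parameters, but the exact request LINQ-to-Wiki sends may differ.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical helper, NOT part of LINQ-to-Wiki: builds a raw MediaWiki API URL
// roughly corresponding to allimages().Where(i => i.prefix == prefix).
public static class MediaWikiRequestSketch
{
    // Assumed endpoint: English Wikipedia's api.php.
    public const string ApiEndpoint = "https://en.wikipedia.org/w/api.php";

    public static string BuildAllImagesUrl(string prefix, params string[] props)
    {
        var parameters = new List<(string Key, string Value)>
        {
            ("action", "query"),
            ("list", "allimages"),  // the allimages list module
            ("aiprefix", prefix)    // corresponds to the Where(i => i.prefix == ...) filter
        };

        // aiprop chooses which image properties come back, e.g. "url" or "size" (bytes plus width/height).
        if (props.Length > 0)
            parameters.Add(("aiprop", string.Join("|", props)));

        parameters.Add(("format", "json"));

        // Percent-encode every key and value, then join them into a query string.
        var query = string.Join("&", parameters.Select(
            p => Uri.EscapeDataString(p.Key) + "=" + Uri.EscapeDataString(p.Value)));
        return ApiEndpoint + "?" + query;
    }
}
```

Calling BuildAllImagesUrl("Microsoft", "url", "size") produces https://en.wikipedia.org/w/api.php?action=query&list=allimages&aiprefix=Microsoft&aiprop=url%7Csize&format=json. A real client would also have to send the request, parse the JSON and follow the API’s continuation parameters when results span several batches.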

Getting Started

LINQ-to-Wiki can be obtained in either of the following two ways:

Once you have added references to the LINQ-to-Wiki assemblies in your project, you are ready to get started!

Your First Query

Querying is really where LINQ-to-Wiki shines (as you can imagine, given the cosmos of data within Wikipedia)! The actual querying process is very straightforward and doesn’t differ much from using the traditional DataContext you would be accustomed to working with in any other flavor of LINQ-to-X (SQL, Entities, etc.).

You’ll first need to initialize a Wiki object that will act as your DataContext and the source of all of your queries. You can initialize it with actual login information (if you plan on editing and performing more advanced actions), but in this demonstration we will just be focusing on querying, so any placeholder value will do:

var wikipedia = new Wiki("Example");

Once you have created your Wiki object, you are ready to start querying. However, Wikipedia is a huge, complex, data-filled cosmos, and before we start adventuring around in our LINQ-powered spaceship, let’s take a look at a map to see where we can go.

Exploring the Cosmos of Wikipedia

Before we delve too deep into some serious querying, let’s review some of the properties and collections that we can use from our Wiki object. Since this post is primarily concerned with querying, we will be looking at the Query property of our Wiki object.

// allimages() is just one destination; swap in any of the query methods listed below
var query = wikipedia.Query.allimages();

Some of the major query methods available on our Query object are:

  • allcategories: an enumeration of all of the available categories
  • allimages: an enumeration of all of the available images
  • alllinks: an enumeration of all of the available links
  • categorymembers: lists all of the pages in a given category
  • backlinks: finds all pages that link back to a specific page
  • search: performs a full-text search

From each of these we can use the LINQ methods that we all know and love, such as .Where() and .Select(), and then wrap everything up and execute our query using the .ToEnumerable() method (or .ToList()). Each of these items also exposes specific properties that can be used within your inner clauses to further narrow your search, so don’t neglect how wonderful IntelliSense can be.
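The names in the list above match MediaWiki API list modules, and each module takes its own prefixed parameters. As an illustration of the underlying API (not of how LINQ-to-Wiki is implemented), the sketch below pairs each module with the parameter that carries its most common filter. QueryModuleSketch, FilterParameter and Fragment are hypothetical names introduced here.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical lookup, NOT part of LINQ-to-Wiki: the MediaWiki "list=" module behind
// each Query method above, paired with the parameter that carries its usual filter.
public static class QueryModuleSketch
{
    public static readonly IReadOnlyDictionary<string, string> FilterParameter =
        new Dictionary<string, string>
        {
            ["allcategories"] = "acprefix",   // categories whose names start with a prefix
            ["allimages"] = "aiprefix",       // images whose names start with a prefix
            ["alllinks"] = "alprefix",        // link targets whose titles start with a prefix
            ["categorymembers"] = "cmtitle",  // the category to list, e.g. "Category:Physics"
            ["backlinks"] = "bltitle",        // the page that other pages link to
            ["search"] = "srsearch"           // the full-text search terms
        };

    // Builds the query-string fragment for one module, e.g. "list=backlinks&bltitle=LINQ".
    public static string Fragment(string module, string value)
    {
        if (!FilterParameter.TryGetValue(module, out var parameter))
            throw new ArgumentException($"Unknown module: {module}", nameof(module));
        return $"list={module}&{parameter}={Uri.EscapeDataString(value)}";
    }
}
```

For example, Fragment("categorymembers", "Category:Physics") yields list=categorymembers&cmtitle=Category%3APhysics; the colon is percent-encoded because it is not an unreserved URI character.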

Blasting off into the Cosmos (Finally!)

So let’s start out with a simple query to get ourselves off the launch pad. We will query Wikipedia for all of the images whose names start with “Microsoft” and return the title of each:

//This will retrieve all of the images that begin with "Microsoft" (using the built-in prefix property) and select the title of each.
var query = wikipedia.Query.allimages().Where(i => i.prefix == "Microsoft").Select(s => s.title).ToEnumerable();

That’s it! Using a simple controller action in ASP.NET MVC (for this example), we can output each of our results to a basic list within our View:

public ActionResult QueryWiki()
{
     var wikipedia = new Wiki("Example");
     var query = wikipedia.Query.allimages().Where(i => i.prefix == "Microsoft").Select(s => s.title).ToEnumerable();
     return View(query);
}

along with this simple View:

<ul>
     @foreach (var image in Model){
         <li>@image</li> 
     }
</ul>

will result in a huge (and very ugly) list of all of the images within Wikipedia that begin with “Microsoft”.

Figure: Query results containing all Wikipedia images that begin with “Microsoft”

Let’s spice it up a bit (because just text is boring)

Let’s make things a little more appealing to the eyes by pulling some additional properties besides the title of the images. We can use the url, height and width properties available from our images to create a similar list that will feature images of each of these items instead of just a plain-jane unordered list.

First, we will create a very simple class to store the properties we care about, so that we can pass them to the View for display:

public class WikiImage
{
     public string Url { get; set; }
     public int Height { get; set; }
     public int Width { get; set; }
     //Simple Constructor
     public WikiImage(string url, int height, int width)
     {
          Url = url;
          Height = height;
          Width = width;
     }
}

Using our new and improved query (which selects the url, height and width properties of each image)

var query = wikipedia.Query.allimages()
            .Where(i => i.prefix == "Microsoft")
            .Select(s => new WikiImage(s.url,s.height,s.width)).ToList();

along with a few minor adjustments to the View (the controller action remains basically the same),

@foreach (var image in Model){
     <img src='@image.Url' height='@image.Height' width='@image.Width' /><br />
}

gives us our result…

(Err, the result is too big to display easily at full size. I’ll adjust the height and width in the View to provide a better example.)

*ahem* And gives us our result!

Figure: Results from our new query to grab all of the images on Wikipedia that start with “Microsoft”
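The parenthetical above mentions shrinking the images in the View so they fit on screen. Rather than hard-coding one size for every image, one option is to scale each image’s Width and Height by the same factor so its aspect ratio is preserved. The ImageScaling.FitWithin helper below is a hypothetical addition, not part of the original example or of LINQ-to-Wiki.

```csharp
using System;

// Hypothetical helper (not from the article): shrinks a width/height pair so it fits
// inside a bounding box while keeping the original aspect ratio.
public static class ImageScaling
{
    public static (int Width, int Height) FitWithin(int width, int height, int maxWidth, int maxHeight)
    {
        if (width <= 0 || height <= 0 || maxWidth <= 0 || maxHeight <= 0)
            throw new ArgumentException("All dimensions must be positive.");

        // Use the tighter of the two limits; never scale up (factor capped at 1).
        double scale = Math.Min(1.0, Math.Min((double)maxWidth / width, (double)maxHeight / height));

        // Round to whole pixels, keeping at least 1px in each direction.
        int newWidth = Math.Max(1, (int)Math.Round(width * scale));
        int newHeight = Math.Max(1, (int)Math.Round(height * scale));
        return (newWidth, newHeight);
    }
}
```

Inside the View’s foreach loop you could call FitWithin(image.Width, image.Height, 200, 200) and emit the returned Width and Height in the img tag instead of the raw values; the 200-pixel box is an arbitrary choice.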

Additional Complexity Coming Soon!

This post is just a simple example of some of the things you can do with LINQ-to-Wiki. Next time, we will cover some of the more advanced features, such as using PageResults to create even more complex queries, pulling some additional data, and who knows what else!

For More Information (if you just can’t wait to dig in)

If you are interested in learning more about LINQ-to-Wiki, visit its GitHub page, where you can find a plethora of documentation detailing each of the individual methods and properties you can query against. I would also highly recommend downloading the LINQ-to-Wiki Samples project, which contains all kinds of samples to get you started.

You can also download this example from GitHub via the link below:

License

This article, along with any associated source code and files, is licensed under The Code Project Open License (CPOL)


Written By
Software Developer (Senior)
United States
An experienced Software Developer and Graphic Designer with an extensive knowledge of object-oriented programming, software architecture, design methodologies and database design principles. Specializing in Microsoft Technologies and focused on leveraging a strong technical background and a creative skill-set to create meaningful and successful applications.

Well versed in all aspects of the software development life-cycle and passionate about embracing emerging development technologies and standards, building intuitive interfaces and providing clean, maintainable solutions for even the most complex of problems.
