Parsing HTML in Windows Store Apps – Boredom Challenge Day 1

By reading and parsing the HTML code of a given website, we can display any information on that website in our application in any way we desire. This is usually needed when our source does not provide a web service or API that we can use, but shows the information we need on its own webpage. For example, if we want to make an app that shows information about the latest movies, we can read and parse the HTML of a movie site and then present the results in our application.

Part of the HTML code of my blog's main page.

For this article, we’ll make a simple Windows Store app that takes the link of a Windows Phone app on the Windows Phone Marketplace and then parses that page’s HTML with HTML Agility Pack to show the app’s information.

We’ll start by creating a blank Windows 8 app.

First, we’ll arrange our interface in MainPage.xaml:

    <Grid Background="{StaticResource ApplicationPageBackgroundThemeBrush}">
        <TextBlock HorizontalAlignment="Left" Margin="50,100,0,0" TextWrapping="Wrap" Text="Windows Phone Marketplace Link:" VerticalAlignment="Top" FontSize="16"/>
        <TextBox x:Name="TextBoxLink" HorizontalAlignment="Left" Margin="50,124,0,0" TextWrapping="Wrap" VerticalAlignment="Top" Width="500" Height="32"/>
        <StackPanel Margin="596,124,50,100">
            <Image x:Name="ImageAppLogo" Height="150" HorizontalAlignment="Left" Width="150"/>
            <TextBlock x:Name="TextBlockName" TextWrapping="Wrap" FontSize="16" Margin="0,20,0,0"/>
            <TextBlock x:Name="TextBlockPublisher" TextWrapping="Wrap" FontSize="16" Margin="0,5,0,0"/>
            <TextBlock x:Name="TextBlockCategory" TextWrapping="Wrap" FontSize="16" Margin="0,5,0,0"/>
            <TextBlock x:Name="TextBlockPrice" TextWrapping="Wrap" FontSize="16" Margin="0,5,0,0"/>
            <TextBlock x:Name="TextBlockDescription" TextWrapping="Wrap" FontSize="16" Margin="0,5,0,0"/>
        </StackPanel>
        <Button x:Name="ButtonParse" Content="Parse HTML" HorizontalAlignment="Left" Margin="50,161,0,0" VerticalAlignment="Top"/>
    </Grid>

As you can see, we’ve added a textbox to get the link, a button to start the action, and several textblocks and an image to show the information and the logo of the app. Next, we’ll add HTML Agility Pack to our project. To do this, right-click “References” in the Solution Explorer, select “Manage NuGet Packages”, search for “html agility pack”, and click Install when you find it.

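Now we will fill in the Click event handler of our button. If the handler does not exist yet, double-clicking the button in the designer creates it and hooks it up in the XAML.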

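        // These using directives belong at the top of MainPage.xaml.cs.
        using System;
        using System.IO;
        using System.Net;
        using HtmlAgilityPack;
        using Windows.UI.Popups;
        using Windows.UI.Xaml;
        using Windows.UI.Xaml.Media.Imaging;

        // The handler itself goes inside the MainPage class.
        private async void ButtonParse_Click(object sender, RoutedEventArgs e)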
        {
            if (TextBoxLink.Text.StartsWith("http://www.windowsphone.com/"))
            {
                ButtonParse.IsEnabled = false;
                string html;
                HttpWebRequest req = (HttpWebRequest)WebRequest.Create(TextBoxLink.Text);
                try
                {
                    WebResponse x = await req.GetResponseAsync();
                    HttpWebResponse res = (HttpWebResponse)x;
                    if (res != null)
                    {
                        if (res.StatusCode == HttpStatusCode.OK)
                        {
                            Stream stream = res.GetResponseStream();
                            using (StreamReader reader = new StreamReader(stream))
                            {
                                //Read asynchronously so the UI thread is not blocked while the page downloads.
                                html = await reader.ReadToEndAsync();
                            }
                            HtmlDocument htmlDocument = new HtmlDocument();
                            htmlDocument.LoadHtml(html);

                            string appName = htmlDocument.GetElementbyId("application").ChildNodes[3].InnerText;
                            TextBlockName.Text = "Name: " + WebUtility.HtmlDecode(appName);

                            string appPublisher = htmlDocument.GetElementbyId("publisher").ChildNodes[3].InnerText;
                            TextBlockPublisher.Text = "Publisher: " + WebUtility.HtmlDecode(appPublisher);

                            string appImage = htmlDocument.GetElementbyId("appSummary").ChildNodes[1].ChildNodes[1].GetAttributeValue("src", "");
                            ImageAppLogo.Source = new BitmapImage(new Uri(appImage, UriKind.Absolute));

                            string appDescription = htmlDocument.GetElementbyId("appDescription").ChildNodes[1].ChildNodes[1].InnerText;
                            TextBlockDescription.Text = "Description:\n" + WebUtility.HtmlDecode(appDescription);

                            string appPrice = htmlDocument.GetElementbyId("offer").ChildNodes[1].InnerText;
                            TextBlockPrice.Text = "Price: " + WebUtility.HtmlDecode(appPrice);

                            string appCategory = htmlDocument.GetElementbyId("crumb").ChildNodes[1].ChildNodes[1].InnerText;
                            //appCategory = char.ToUpper(appCategory[0]) + appCategory.Substring(1);
                            TextBlockCategory.Text = "Category: " + WebUtility.HtmlDecode(appCategory);
                        }
                        res.Dispose();
                        ButtonParse.IsEnabled = true;
                    }
                }
                catch
                {
                    ButtonParse.IsEnabled = true;
                    MessageDialog messageDialog =
                        new MessageDialog("A tear occurred in the space-time continuum. Please try again when all planets in the solar system are aligned (excluding Pluto, which isn't a planet anymore).");
                    //Meaningful error messages are important.
                    //Not awaited: C# 5 does not allow await inside a catch block.
                    var ignored = messageDialog.ShowAsync();
                }
            }
            else
            {
                ButtonParse.IsEnabled = true;
                MessageDialog messageDialog = new MessageDialog("Please enter a valid Windows Phone Marketplace URL.");
                await messageDialog.ShowAsync();
            }
        }

With this addition, our application is complete. However, I’d like to shed some light on the code above.

As you can see, I’ve reached the relevant information by selecting an HTML node by its id, and then selecting a specific child node of it. To find the right ids and child indices, you have to take the whole HTML as a string (as we do at first) and inspect it manually. Setting breakpoints and looking at the contents of the nodes in debug mode is quite useful, as it lets you quickly locate the data you need.
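
If stepping through nodes in the debugger gets tedious, a small helper can list the children of a node together with their indices. This is only a sketch: `NodeInspector` and `DescribeChildren` are hypothetical names made up for illustration, not part of HTML Agility Pack, and the sample markup in the usage note below is invented rather than taken from the real Marketplace page.

```csharp
using System.Collections.Generic;
using HtmlAgilityPack;

// Hypothetical helper for illustration; not part of HTML Agility Pack.
public static class NodeInspector
{
    // Returns one line per direct child of the element with the given id,
    // in the form "[index] nodeName: trimmed inner text".
    public static List<string> DescribeChildren(HtmlDocument document, string id)
    {
        List<string> lines = new List<string>();
        HtmlNode parent = document.GetElementbyId(id);
        if (parent == null)
        {
            // No element with this id: return an empty list instead of throwing.
            return lines;
        }

        for (int i = 0; i < parent.ChildNodes.Count; i++)
        {
            HtmlNode child = parent.ChildNodes[i];
            // Whitespace between tags is kept as "#text" nodes, which is why
            // the interesting children often sit at odd indices such as 1 or 3.
            lines.Add("[" + i + "] " + child.Name + ": " + child.InnerText.Trim());
        }
        return lines;
    }
}
```

For example, with the made-up markup `<div id="application">\n<span>Name</span>\n<h1>My App</h1>\n</div>`, the helper reports five children: three whitespace `#text` nodes at indices 0, 2 and 4, the `span` at index 1 and the `h1` at index 3.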

Also, we need to use WebUtility.HtmlDecode in order to prevent the special characters (‘, &, ” etc.) from showing up in their HTML-encoded forms.
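
To see what the decoding step does, here is a tiny sketch. The class name `DecodeExample` and the sample string are made up for illustration; only `WebUtility.HtmlDecode` comes from the code above.

```csharp
using System.Net;

// Hypothetical wrapper class, used only to make the example self-contained.
public static class DecodeExample
{
    // Turns HTML entities such as &amp; or &#39; back into plain characters.
    public static string Decode(string encoded)
    {
        return WebUtility.HtmlDecode(encoded);
    }
}
```

Calling `DecodeExample.Decode("Tom &amp; Jerry&#39;s &quot;Quiz&quot;")` returns `Tom & Jerry's "Quiz"`. Without this step, the textblocks would show the raw `&amp;` and `&#39;` sequences.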

Alright then, let’s build and run our application, paste the link of an app, and click Parse HTML. 🙂

[Screenshot: the running app showing the parsed app information]

In the screenshot above I’ve used the link of my own app (from the Turkish store); therefore the price, category etc. are in Turkish. They will be in whatever language the linked page uses, since we display the text as-is without checking it.

Before finishing, I’d like to give one final warning. This method depends entirely on the HTML code of the page not changing (at least, the ids we’ve selected and the order of the nodes under them). If you use it in your application, you need to either
a) Be extremely vigilant, and fix the app very quickly if the HTML structure of the page changes,
b) Have some kind of backup plan in the app, such as other sources or some other functionality,
c) Be prepared to get loads of “a tear occurred in the space-time continuum” errors, along with a nosedive in your ratings.
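
One small building block for a backup plan is a lookup that returns null instead of throwing when the page structure has changed, so that each field can fall back on its own. The following is only a sketch under that assumption: `SafeParsing` and `TryGetText` are hypothetical names, not part of HTML Agility Pack or the original sample.

```csharp
using HtmlAgilityPack;

// Hypothetical helper for illustration; not part of HTML Agility Pack.
public static class SafeParsing
{
    // Starts at the element with the given id and follows the child indices
    // in order. Returns the inner text of the node it ends on, or null if the
    // id or any index along the way does not exist.
    public static string TryGetText(HtmlDocument document, string id, params int[] childPath)
    {
        HtmlNode node = document.GetElementbyId(id);
        foreach (int index in childPath)
        {
            if (node == null || index < 0 || index >= node.ChildNodes.Count)
            {
                return null;
            }
            node = node.ChildNodes[index];
        }
        return node == null ? null : node.InnerText;
    }
}
```

With this, a line such as `string appPrice = SafeParsing.TryGetText(htmlDocument, "offer", 1) ?? "(not available)";` keeps the rest of the fields working even if one part of the page has moved. It does not remove the need to update the app when the HTML changes; it only makes the failure less dramatic.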

Thank you very much for reading. You can get the full source code of our example here.

See you in tomorrow’s article. 🙂

One thought on “Parsing HTML in Windows Store Apps – Boredom Challenge Day 1”

  1. KoopaKing

    I seem to have a problem when trying to run this application.

    I get this error:

    Unable to activate Windows Store app ‘9eb1e833-c542-461b-d701-da43679084a3_0s13d9ped16de!App’. The activation request failed with error ‘This app does not support the contract specified or is not installed’

    Any idea what this might be?
