Quick Start Guide - MapReduce - Running WordCount Sample

This guide walks you through running the Twister4Azure WordCount sample (built from the latest version of the source code) on the Azure local development fabric. After that, we’ll explore how to run it in the Azure Cloud.

Pre-requisites

  1. Visual Studio 2010 (or 2012).
  2. Azure SDK. The 0.9 release of the Twister4Azure source code supports version 1.7 of the Azure SDK; the development code supports SDK version 1.8.
  3. Download the latest version of the Twister4Azure source code.

Running WordCount Sample Locally

  • Start Visual Studio 2010 as administrator and open the “Twister4Azure” solution. Run it (F5).
  • Create a container named “wcinput” in the blob store of your local development storage and upload a set of text files to it. You can use a freely available third-party Azure storage client (e.g., CloudBerry Explorer for Azure Blob Storage) to perform these operations.
  • Open the “SampleClients” solution in a different Visual Studio instance. Make sure the “WordCountMRClient” project is selected as the startup project. Go to the project properties of the “WordCountMRClient” project and open the Debug tab. Specify the client arguments in the “Command line arguments” text area. The arguments for the WordCount client are:
    <Job ID (needs to be unique across different runs)> <Input Data Container> <Output Container> <Number Of Reduce Tasks>
    E.g.: test1 wcinput wcoutput 2
  • Run the client (F5). You’ll be able to monitor the computation using the Twister4Azure monitoring console, which will open in a new browser window.
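
If you want to sanity-check the four command line arguments before pasting them into Visual Studio, the following is a minimal Python sketch. It is not part of Twister4Azure: the function names parse_wordcount_args and is_valid_container_name are hypothetical, and the sketch only assumes the standard Azure blob container naming rules (3 to 63 characters; lowercase letters, digits and single hyphens; starting and ending with a letter or digit). Whether the real client performs similar checks is not stated in this guide. Job ID uniqueness across runs cannot be verified locally, so it is not checked here.

```python
import re

# Azure blob container names: lowercase letters, digits and single hyphens,
# starting and ending with a letter or digit (length is checked separately).
_CONTAINER_NAME = re.compile(r"[a-z0-9](?:-?[a-z0-9])+")


def is_valid_container_name(name):
    """Return True if name satisfies the Azure container naming rules."""
    return 3 <= len(name) <= 63 and _CONTAINER_NAME.fullmatch(name) is not None


def parse_wordcount_args(argv):
    """Validate the four WordCount client arguments (hypothetical helper).

    argv is expected to be:
    [job_id, input_container, output_container, number_of_reduce_tasks]
    """
    if len(argv) != 4:
        raise ValueError(
            "expected 4 arguments: <Job ID> <Input Data Container> "
            "<Output Container> <Number Of Reduce Tasks>"
        )
    job_id, input_container, output_container, reduce_tasks = argv

    for name in (input_container, output_container):
        if not is_valid_container_name(name):
            raise ValueError("invalid container name: %r" % name)

    try:
        num_reduce_tasks = int(reduce_tasks)
    except ValueError:
        raise ValueError("number of reduce tasks must be an integer")
    if num_reduce_tasks < 1:
        raise ValueError("number of reduce tasks must be at least 1")

    return {
        "job_id": job_id,
        "input_container": input_container,
        "output_container": output_container,
        "num_reduce_tasks": num_reduce_tasks,
    }
```

For example, parse_wordcount_args(["test1", "wcinput", "wcoutput", "2"]) accepts the sample arguments shown above, while a container name such as "WCInput" is rejected because container names must be lowercase.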

Running the WordCount Sample in Azure Cloud

  • Open the “Twister4Azure” solution.
  • Double-click the “Twister4AzureWorker” role under “Roles” in the “Twister4AzureCloud” project. Go to the “Settings” tab. Click the “DataConnectionString” setting, click “...” in the value field, and select “Enter storage account credentials”. Enter your Azure storage account credentials.
  • Do the same for “DiagnosticsConnectionString” and “Microsoft.WindowsAzure.Plugins.Diagnostics.ConnectionString”. Make sure you perform this configuration for both the Worker Role (Twister4AzureWorker) and the Web Role (Twister4AzureUI).
  • Follow the MSDN tutorials on deploying Azure applications directly from Visual Studio to deploy the application.
  • Upload the sample data to your Azure storage account in the same way you uploaded it to local development storage.
  • Open the “SampleClients” solution in a different Visual Studio instance. Make sure the “WordCountMRClient” project is selected as the startup project. Go to the project properties of the “WordCountMRClient” project and open the Debug tab. Specify the client arguments in the “Command line arguments” text area. The arguments for the WordCount client are:
    <Job ID (needs to be unique across different runs)> <Input Data Container> <Output Container> <Number Of Reduce Tasks>
    E.g.: test1 wcinput wcoutput 2
    
  • Open “ClientCredentials.cs” in the “Credentials” project of the “SampleClients” solution. Comment out the return statement (return Cloud....Develo..;). Uncomment the block comment above that return statement, and specify the details of your Azure storage account in the GetCloudStorageAccount() method. GetClientStorageAccount() should then look like this:
    public static CloudStorageAccount GetClientStorageAccount()
    {
        /* Uncomment to use a real Azure storage account */
        return GetCloudStorageAccount();
        
        /* Comment out when using a real Azure storage account */
        //return CloudStorageAccount.DevelopmentStorageAccount;
    }
  • Run the client (F5).
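
To make the role of the <Number Of Reduce Tasks> argument more concrete, here is a small, self-contained Python sketch of the word count computation in MapReduce style. It is only an illustration, not the Twister4Azure implementation (which is written in C#): the function names are hypothetical, the tokenization (lowercasing and splitting on whitespace) is an assumption, and the CRC32-based partitioning is a stand-in for whatever partitioning scheme the framework actually uses.

```python
import zlib
from collections import defaultdict


def map_words(text):
    """Map step: emit a (word, 1) pair for every word in the text."""
    return [(word, 1) for word in text.lower().split()]


def partition(word, num_reduce_tasks):
    """Assign a word to a reduce task (assumed scheme: CRC32 modulo task count)."""
    return zlib.crc32(word.encode("utf-8")) % num_reduce_tasks


def word_count(documents, num_reduce_tasks):
    """Return one {word: count} dictionary per reduce task."""
    # Map and shuffle: group the emitted counts by word within each partition.
    shuffled = [defaultdict(list) for _ in range(num_reduce_tasks)]
    for text in documents:
        for word, count in map_words(text):
            shuffled[partition(word, num_reduce_tasks)][word].append(count)

    # Reduce: sum the counts of each word, independently per partition.
    return [
        {word: sum(counts) for word, counts in bucket.items()}
        for bucket in shuffled
    ]
```

Calling word_count(["the quick brown fox", "the lazy dog"], 2) yields two dictionaries, one per reduce task. Each distinct word lands in exactly one of them, so the full result is the union of the per-task outputs.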

Last edited Apr 12, 2013 at 8:20 PM by thilina, version 14
