Quick Start Guide - Iterative MapReduce - KMeans Clustering

This guide takes you through running a couple of Twister4Azure samples using the Azure local development fabric. It then shows how to run them in the Azure cloud.

Pre-requisites

  1. Visual Studio 2010
  2. Azure SDK. The latest version of the Twister4Azure source code supports version 1.7 of the Azure SDK.
  3. Download the latest release of Twister4Azure and unzip it.

Running KMeans Clustering Sample Locally

  • Start Visual Studio 2010 as administrator and open the “Twister4Azure” solution. Run it (F5).
  • Create a blob container named “kminput”. Download and unzip kminput.zip, then upload the contents of the unzipped “kminput” folder to the “kminput” container. You can use a freely available third-party Azure storage client (e.g. CloudBerry Explorer for Azure Blob Storage) to do this.
  • Create a blob container named “kmcenters”. Download and unzip kmcenters.zip, then upload the contents of the unzipped “kmcenters” folder to the “kmcenters” container.
  • Open the “SampleClients” solution in a different Visual Studio instance. Make sure the “KMeansMRClient” project is selected as the startup project. Go to the project properties of the “KMeansMRClient” project and open the Debug tab. Specify the following in the “Command line arguments” text area. The arguments for the KMeans client are:
    <jobID(needs to be unique across different runs)> <inputContainerName>
           <outContainerName> <NumberOfReduceTasks> <cluster centers>
           <vector length>
    E.g.: test1  kminput kmoutput 1 kmcenters/centers_400_20 20
  • Run the client (F5). You’ll be able to monitor the computation using the Twister4Azure monitoring console, which will open in a new browser window.
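
The six client arguments above can be assembled with a small helper. The following is a minimal Python sketch and is not part of Twister4Azure: the function name build_kmeans_args and the job_prefix parameter are hypothetical, and it assumes a job ID may be any short alphanumeric string (as in “test1”). It appends a random suffix so that each run gets a distinct job ID, then prints a string you can paste into the “Command line arguments” text area.

```python
import uuid


def build_kmeans_args(input_container, output_container, num_reduce_tasks,
                      centers_blob, vector_length, job_prefix="test"):
    """Return the KMeansMRClient argument string in the documented order.

    Hypothetical helper: the job ID format (prefix + 8 hex characters)
    is an assumption, chosen only to keep IDs unique across runs.
    """
    job_id = f"{job_prefix}{uuid.uuid4().hex[:8]}"
    parts = [
        job_id,                 # <jobID>
        input_container,        # <inputContainerName>
        output_container,       # <outContainerName>
        str(num_reduce_tasks),  # <NumberOfReduceTasks>
        centers_blob,           # <cluster centers>
        str(vector_length),     # <vector length>
    ]
    return " ".join(parts)


if __name__ == "__main__":
    # Same values as the example above, with a generated job ID.
    print(build_kmeans_args("kminput", "kmoutput", 1,
                            "kmcenters/centers_400_20", 20))
```

Generating the job ID avoids reusing an ID from an earlier run, which the argument description says must not happen.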

Running KMeans Clustering Sample in Azure Cloud

  • Open the "Twister4Azure" solution.
  • Double-click the “Twister4AzureWorker” role under “Roles” in the "Twister4AzureCloud" project and go to the “Settings” tab. Select the “DataConnectionString” setting, click “...” in the value field, and select "Enter storage account credentials". Enter your Azure storage account credentials.
  • Do the same for "DiagnosticsConnectionString" and "Microsoft.WindowsAzure.Plugins.Diagnostics.ConnectionString". Make sure you perform this configuration for both the Worker Role (Twister4AzureWorker) and the Web Role (Twister4AzureUI).
  • Follow the MSDN tutorials on deploying Azure applications directly from Visual Studio.
  • Upload the sample data to your Azure storage account in the same way you uploaded it to local storage.
  • Open the “SampleClients” solution in a different Visual Studio instance. Make sure the “KMeansMRClient” project is selected as the startup project. Go to the project properties of the “KMeansMRClient” project and open the Debug tab. Specify the following in the “Command line arguments” text area. The arguments for the KMeans client are:
    <jobID(needs to be unique across different runs)> <inputContainerName>
           <outContainerName> <NumberOfReduceTasks> <cluster centers>
           <vector length>
    E.g.: test1  kminput kmoutput 1 kmcenters/centers_400_20 20
  • Open "ClientCredentials.cs" in the "Credentials" project. Comment out the return statement ( return Cloud....Develo..;). Uncomment the block comment above the return statement and specify the details of your Azure storage account.
  • Run the client (F5).
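
The storage settings configured above hold Azure storage connection strings. As a rough illustration, the Python sketch below builds the two standard forms of such a string: the local development storage setting and the account name/key form. The helper name storage_connection_string and the account values shown are hypothetical placeholders, and this is not code from the Twister4Azure source.

```python
def storage_connection_string(account_name=None, account_key=None,
                              use_https=True):
    """Build an Azure storage connection string.

    Hypothetical helper for illustration. With no account name it
    returns the local development storage setting; otherwise it
    returns the account name/key form used for a cloud account.
    """
    if account_name is None:
        return "UseDevelopmentStorage=true"
    if not account_key:
        raise ValueError("account_key is required for a cloud storage account")
    protocol = "https" if use_https else "http"
    return (f"DefaultEndpointsProtocol={protocol};"
            f"AccountName={account_name};"
            f"AccountKey={account_key}")


if __name__ == "__main__":
    # Local development fabric.
    print(storage_connection_string())
    # Cloud account; "myaccount" and the key are placeholders.
    print(storage_connection_string("myaccount", "<your-account-key>"))
```

Seeing both forms side by side can help when checking that each setting was switched from development storage to your own account.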

Last edited Sep 1, 2012 at 9:52 PM by thilina, version 7
