Twister4Azure Application Development Guide

  1. Add new MapReduce application project
  2. Implement the Mapper
  3. Implement the Reducer
  4. Implement the Driver for your MapReduce Program
  5. Add the new MapReduce application to Twister4Azure
  6. Run/Debug/Deploy
  7. Client API

Check out the quick start guides on traditional MapReduce and Iterative MapReduce to learn how to run Twister4Azure applications.

1. Add new MapReduce application project

  • Start Visual Studio 2010 or 2012 with the Azure SDK installed. (If you want to use the local fabric, run Visual Studio as administrator.) Twister4Azure currently supports version 1.7 (June/August 2012) of the Azure SDK. We try our best to keep Twister4Azure up to date with the latest version of the Azure SDK.
  • Open the Twister4Azure.sln solution in Visual Studio.
  • Add a new project (e.g. HelloT4ASample) to the Twister4Azure solution: File –> New –> New Project –> Visual C# –> Class Library, then provide a name for your project.
  • Add AzureMRCore.dll (or the AzureMRCore project) as a reference to the newly created HelloT4ASample project (right click on the project –> Add Reference).

2. Implement the Mapper

  • Add a new C# class to the HelloT4ASample project. Derive the new class from the Mapper<INKEY, INVALUE, OUTKEY, OUTVALUE, BCASTINKEY, BCASTINVALUE> class and use the appropriate types for the generic type parameters.
  • Implement the Map method: protected override int Map(INKEY key, INVALUE value, List<KeyValuePair<BCASTINKEY, BCASTINVALUE>> dynamicData, IOutputCollector<OUTKEY, OUTVALUE> outputCollector, string programArgs){....}

Eg: WordCount Mapper

using System;
using System.Collections.Generic;
using AzureMRCore;
using AzureMRCore.DataTypes;
using AzureMRCore.MapRed;

namespace HelloT4ASample
{
    // Emits a (word, 1) pair for every word in each input line.
    // WordCount uses no broadcast data, hence the NullKey/NullValue type arguments.
    internal class WordCountMapper : Mapper<IntKey, StringValue, StringKey, IntValue, NullKey, NullValue>
    {
        protected override int Map(IntKey key, StringValue value, List<KeyValuePair<NullKey, NullValue>> dynamicData, IOutputCollector<StringKey, IntValue> outputCollector, string programArgs)
        {
            string line = value.GetTextValue();
            // Split on spaces, skipping the empty entries that repeated spaces would produce.
            string[] words = line.Split(new[] { ' ' }, StringSplitOptions.RemoveEmptyEntries);
            foreach (string word in words)
            {
                outputCollector.Collect(StringKey.GetInstance(word), IntValue.GetInstance(1));
            }
            return 0;
        }
    }
}

 

3. Implement the Reducer

  • Add a new C# class to the HelloT4ASample project. Derive the new class from Reducer<INKEY, INVALUE, OUTKEY, OUTVALUE> and use the appropriate types for the generic parameters.
  • Make sure the INKEY and INVALUE types of the Reducer implementation are the same as the OUTKEY and OUTVALUE types of the Mapper implementation.
  • Implement the Reduce method: public override int Reduce(INKEY key, List<INVALUE> values, IOutputCollector<OUTKEY, OUTVALUE> outputCollector, string programArgs){....}

Eg: WordCount Reducer

using System.Collections.Generic;
using System.Linq;
using AzureMRCore;
using AzureMRCore.DataTypes;
using AzureMRCore.MapRed;

namespace HelloT4ASample
{
    // Sums the counts collected for one word and emits a single (word, total) pair.
    internal class WordCountReducer : Reducer<StringKey, IntValue, StringKey, IntValue>
    {
        public override int Reduce(StringKey key, List<IntValue> values, IOutputCollector<StringKey, IntValue> outputCollector, string programArgs)
        {
            // Add up every count received for this key.
            int count = values.Sum(value => value.Value);
            var outValue = new IntValue {Value = count};
            outputCollector.Collect(key, outValue);
            return 0;
        }
    }
}
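
To see what the WordCount mapper and reducer compute together, the following sketch reproduces the same data flow in plain C#, without any Twister4Azure types. It is only an illustration: the LocalWordCount class and its Map, Reduce and Run methods are hypothetical names made up for this example, and the in-memory GroupBy merely stands in for the grouping of map output by key that the framework performs between the two phases.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Plain C# sketch of the WordCount data flow; no Twister4Azure types are used.
public static class LocalWordCount
{
    // "Map" step: emit a (word, 1) pair for every word in a line.
    public static IEnumerable<KeyValuePair<string, int>> Map(string line)
    {
        foreach (string word in line.Split(new[] { ' ' }, StringSplitOptions.RemoveEmptyEntries))
        {
            yield return new KeyValuePair<string, int>(word, 1);
        }
    }

    // "Reduce" step: sum all the counts collected for one word.
    public static int Reduce(string word, IEnumerable<int> counts)
    {
        return counts.Sum();
    }

    // Group the map output by key, then reduce each group to one total.
    public static Dictionary<string, int> Run(IEnumerable<string> lines)
    {
        return lines
            .SelectMany(line => Map(line))
            .GroupBy(pair => pair.Key, pair => pair.Value)
            .ToDictionary(group => group.Key, group => Reduce(group.Key, group));
    }

    public static void Main()
    {
        Dictionary<string, int> counts = Run(new[] { "to be or not to be", "to do" });
        foreach (KeyValuePair<string, int> entry in counts)
        {
            Console.WriteLine("{0}\t{1}", entry.Key, entry.Value);
        }
    }
}
```

With the two input lines above, "to" is counted three times and "be" twice. The real Mapper and Reducer classes follow the same shape, except that the pairs travel through the IOutputCollector instead of being returned.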

 

4. Implement the Driver for your MapReduce Program

The driver program configures your MapReduce application.

  • Add a new C# class to the HelloT4ASample project. Derive the new class from MapReduceDriver<TMapInKey, TMapInValue, TMapOutKey, TMapOutValue, TReduceOutKey, TReduceOutValue> and use the appropriate types for the generic parameters. Note: drivers for iterative MapReduce applications should instead extend the IterativeMRDriver<TMapInKey, TMapInValue, TMapOutKey, TMapOutValue, TReduceOutKey, TReduceOutValue, TMergeOutKey, TMergeOutValue, TBcastInKey, TBcastInValue> abstract class, and drivers for pleasingly parallel applications should extend the MapOnlyDriver<TMapInKey, TMapInValue, TMapOutKey, TMapOutValue> abstract class.
  • Override the Name, MapperType, ReducerType and InputFormatType properties to provide the name, Mapper implementation, Reducer implementation and InputFormat for your MapReduce application. You can also use the driver to optionally specify the PartitionerType, OutputFormatType and CombinerType for your MapReduce computations.

Eg: WordCount Driver

using AzureMRCore.DataTypes;
using AzureMRCore.Drivers;
using AzureMRCore.InputFormat;
using AzureMRCore.MapRed;
using AzureMRCore.OutputFormat;
using AzureMRCore.Partitioners;

namespace HelloT4ASample
{
    public class WordCountMR : MapReduceDriver<IntKey, StringValue, StringKey, IntValue, StringKey, IntValue>
    {
        public override string Name
        {
            get { return "WordCountMR"; }
        }

        public override Mapper<IntKey, StringValue, StringKey, IntValue, NullKey, NullValue> MapperType
        {
            get { return new WordCountMapper(); }
        }

        public override IInputFormat<IntKey, StringValue> InputFormatType
        {
            get { return new CachedLineInputFormat(); }
        }

        public override Reducer<StringKey, IntValue, StringKey, IntValue> ReducerType
        {
            get { return new WordCountReducer(); }
        }

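        // Optional: reuse the reducer as the combiner.
        public override Reducer<StringKey, IntValue, StringKey, IntValue> CombinerType
        {
            get { return new WordCountReducer(); }
        }

        // Optional: choose how map output keys are partitioned.
        public override IPartitioner PartitionerType
        {
            get { return new HashPartitioner(); }
        }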

        public override IOutputFormat<StringKey, IntValue> MapOutputFormatType
        {
            get { return new SequenceOutputFormat<StringKey, IntValue>(); }
        }
    }
}

 

5. Add the new MapReduce application to Twister4Azure

  • Add the new MapReduce application project as a reference to the Twister4AzureWorker project. Right click the Twister4AzureWorker project –> Add Reference –> Projects –> select “HelloT4ASample” (or the name you gave the MapReduce application project).
  • Open the Twister4AzureCloud project. Double click Projects –> Twister4AzureWorker to open the configuration view, then select the Settings tab.

  • Click the value of the TwisterMRDrivers setting and add the fully qualified name of your application's MapReduceDriver class to this field. The fully qualified name is the full class name, including the namespace, followed by a comma and the assembly name of your MapReduce application project. (By default the assembly name is the project name, unless you specifically changed it.) Entries for different MapReduce applications are separated by a semicolon (;), e.g. WordCountSample.WordCountMR,WordCountSample; HelloT4ASample.WordCountMR,HelloT4ASample
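
If you are unsure what the entry for your driver should look like, a small helper can derive it from the type itself using standard .NET reflection. This is a sketch only: the DriverSettingSketch class and its EntryFor and SettingFor methods are hypothetical names invented here and are not part of Twister4Azure, and the empty WordCountMR class is a stand-in so the example compiles on its own.

```csharp
using System;
using System.Linq;

namespace HelloT4ASample
{
    // Stand-in for the driver class from section 4, so the sketch compiles on its own.
    public class WordCountMR { }

    public static class DriverSettingSketch
    {
        // Builds "Namespace.ClassName,AssemblyName" for one driver type.
        public static string EntryFor(Type driverType)
        {
            return driverType.FullName + "," + driverType.Assembly.GetName().Name;
        }

        // Joins the entries for several driver types with semicolons.
        public static string SettingFor(params Type[] driverTypes)
        {
            return string.Join(";", driverTypes.Select(type => EntryFor(type)));
        }

        public static void Main()
        {
            // The part after the comma is the name of whichever assembly contains WordCountMR.
            Console.WriteLine(SettingFor(typeof(WordCountMR)));
        }
    }
}
```

Run from inside the HelloT4ASample project, the printed value should match the HelloT4ASample entry in the example above, provided the assembly name has not been changed from the project name.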

 

6. Run/Debug/Deploy

  • Local: Assuming you started Visual Studio as administrator (required for deploying to the development fabric), simply run or debug the sample solution in Visual Studio. The web-based monitoring console will open in a browser window.
  • Deploying in Azure Cloud
    Follow the MSDN tutorials on deploying Azure applications from Visual Studio.

 

7. Client API

NOTE: Make sure the queue names in your service deployment are the same as the ones used in the client program.

  • Open the SampleClients.sln solution in Visual Studio.
  • Configure your storage account in the ClientCredentials utility class of the Credentials project.
  • Create a new project and add the AzureMRCore and Microsoft.WindowsAzure.StorageClient libraries as references to the project.
  • Creating tasks using files in a BLOB container
AzureMRCore.Client.ClientUtils.ProcessMapRed(string mrAppName, string inputBlobContainerURI, int iteration, string programParams, int numReduceTasks, string outputContainerName, string bcastURI = null, Boolean doMerge = false)

"programParams" can be used to pass an optional program parameter to all the Map and Reduce tasks. The container at "inputBlobContainerURI" should contain the files that need to be processed. The output will be stored in "outputContainerName".

  • Waiting for completion

After submitting the job using the above API, you can optionally wait for the completion of the job using the following method.

AzureMRCore.Client.ClientUtils.waitForCompletion(string jobid, CloudStorageAccount storageAccount, int sleepTime)

"sleepTime" is the polling interval that will be used to poll for the status of the job.

Eg: WordCount Client

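// jobID, inputContainer, outputContainer and numReduceTasks are assumed to be defined by the caller.
CloudStorageAccount storageAccount = ClientCredentials.GetClientStorageAccount();
// The queue names must match the ones configured in the service deployment.
TwisterAzureClient twisterAzureClient = new TwisterAzureClient(storageAccount, jobID, "mapschedq", "reduceschedq");
// Submit the job: iteration 0, no program parameters.
twisterAzureClient.ProcessMapRed("WordCountMR", inputContainer, 0, "", numReduceTasks, outputContainer);
// Wait for the job to finish, polling with an interval of 500.
twisterAzureClient.WaitForCompletion(500);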
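
The role of the polling interval can be pictured with a generic wait loop. The sketch below is not the Twister4Azure implementation: PollingSketch and WaitUntil are hypothetical names, the isComplete delegate stands in for whatever job status check the client performs, and treating the interval as milliseconds is an assumption, since the guide does not state the unit of "sleepTime".

```csharp
using System;
using System.Threading;

public static class PollingSketch
{
    // Calls isComplete repeatedly, sleeping sleepTimeMs between checks, until it returns true.
    // Returns the number of checks that were made.
    public static int WaitUntil(Func<bool> isComplete, int sleepTimeMs)
    {
        int checks = 1;
        while (!isComplete())
        {
            Thread.Sleep(sleepTimeMs); // assumed to be milliseconds
            checks++;
        }
        return checks;
    }

    public static void Main()
    {
        // Hypothetical job that reports completion on the fourth status check.
        int remaining = 3;
        int checks = WaitUntil(() => remaining-- == 0, 10);
        Console.WriteLine("Job reported complete after {0} checks", checks);
    }
}
```

A shorter interval notices completion sooner at the cost of more status requests; a longer one does the opposite.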

Last edited Sep 13, 2012 at 5:21 AM by thilina, version 11
