data-lake-analytics-u-sql-develop-user-defined-operators.md

title: Develop U-SQL user-defined operators for Azure Data Lake Analytics jobs | Microsoft Docs
description: Learn how to develop user-defined operators to be used and reused in Data Lake Analytics jobs.
services: data-lake-analytics
documentationcenter:
author: edmacauley
manager: jhubbard
editor: cgronlun
ms.assetid: e5189e4e-9438-46d1-8686-ed4836bf3356
ms.service: data-lake-analytics
ms.devlang: na
ms.topic: article
ms.tgt_pltfrm: na
ms.workload: big-data
ms.date: 12/05/2016
ms.author: edmaca

Develop U-SQL user-defined operators for Azure Data Lake Analytics jobs

Learn how to develop user-defined operators that can be used and reused in Data Lake Analytics jobs. You will develop a custom operator that converts country names to a common English form (for example, "Deutschland" to "Germany").

For instructions on developing general-purpose assemblies for U-SQL, see Develop U-SQL assemblies for Azure Data Lake Analytics jobs.

Prerequisites

Define and use a user-defined operator in U-SQL

To create and submit a U-SQL job

  1. From the File menu, click New, and then click Project.

  2. Select the U-SQL Project type.

    new U-SQL Visual Studio project

  3. Click OK. Visual Studio creates a solution with a Script.usql file.

  4. From Solution Explorer, expand Script.usql, and then double-click Script.usql.cs.

  5. Paste the following code into the file:

     using Microsoft.Analytics.Interfaces;
     using System.Collections.Generic;
    
     namespace USQL_UDO
     {
         public class CountryName : IProcessor
         {
             private static IDictionary<string, string> CountryTranslation = new Dictionary<string, string>
             {
                 {
                     "Deutschland", "Germany"
                 },
                 {
                     "Schwiiz", "Switzerland"
                 },
                 {
                     "UK", "United Kingdom"
                 },
                 {
                     "USA", "United States of America"
                 },
                 {
                     "中国", "PR China"
                 }
             };
    
             public override IRow Process(IRow input, IUpdatableRow output)
             {
    
                 string UserID = input.Get<string>("UserID");
                 string Name = input.Get<string>("Name");
                 string Address = input.Get<string>("Address");
                 string City = input.Get<string>("City");
                 string State = input.Get<string>("State");
                 string PostalCode = input.Get<string>("PostalCode");
                 string Country = input.Get<string>("Country");
                 string Phone = input.Get<string>("Phone");
    
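                 // Replace the country with its translated name when the
                 // dictionary has an entry; otherwise keep the original value.
                 string translated;
                 if (CountryTranslation.TryGetValue(Country, out translated))
                 {
                     Country = translated;
                 }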
                 output.Set<string>(0, UserID);
                 output.Set<string>(1, Name);
                 output.Set<string>(2, Address);
                 output.Set<string>(3, City);
                 output.Set<string>(4, State);
                 output.Set<string>(5, PostalCode);
                 output.Set<string>(6, Country);
                 output.Set<string>(7, Phone);
    
                 return output.AsReadOnly();
             }
         }
     }
    
  6. Open Script.usql, and paste the following U-SQL script:

     @drivers =
         EXTRACT UserID      string,
                 Name        string,
                 Address     string,
                 City        string,
                 State       string,
                 PostalCode  string,
                 Country     string,
                 Phone       string
         FROM "/Samples/Data/AmbulanceData/Drivers.txt"
         USING Extractors.Tsv(Encoding.Unicode);
    
     @drivers_CountryName =
         PROCESS @drivers
         PRODUCE UserID string,
                 Name string,
                 Address string,
                 City string,
                 State string,
                 PostalCode string,
                 Country string,
                 Phone string
         USING new USQL_UDO.CountryName();    
    
     OUTPUT @drivers_CountryName
         TO "/Samples/Outputs/Drivers.csv"
         USING Outputters.Csv(Encoding.Unicode);
    
  7. From Solution Explorer, right-click Script.usql, and then click Build Script.

  8. From Solution Explorer, right-click Script.usql, and then click Submit Script.

  9. If you haven't connected to your Azure subscription, you will be prompted to enter your Azure account credentials.

  10. Click Submit. The submission results and a job link appear in the Results window when the submission completes.

  11. Click the Refresh button to see the latest job status and refresh the screen.
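
Before submitting a job, it can help to check the translation logic on its own, outside the U-SQL runtime. The following is a minimal sketch under stated assumptions: `CountryLookup`, `Translate`, and `Program` are hypothetical names introduced here for illustration and are not part of the U-SQL SDK or of the sample project. The sketch reuses the dictionary from the `CountryName` processor and mirrors its behavior of passing unknown country names through unchanged. The null guard is an addition of this sketch; the processor in step 5 does not include one.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical helper (not part of the U-SQL SDK): isolates the lookup that
// CountryName.Process applies to the Country column so it can run locally.
public static class CountryLookup
{
    // Same entries as the dictionary in the CountryName processor.
    private static readonly IDictionary<string, string> Translation =
        new Dictionary<string, string>
        {
            { "Deutschland", "Germany" },
            { "Schwiiz", "Switzerland" },
            { "UK", "United Kingdom" },
            { "USA", "United States of America" },
            { "中国", "PR China" }
        };

    // Returns the translated name when the key is known; otherwise returns
    // the input unchanged. The null check is an assumption of this sketch.
    public static string Translate(string country)
    {
        string translated;
        if (country != null && Translation.TryGetValue(country, out translated))
        {
            return translated;
        }
        return country;
    }
}

public static class Program
{
    public static void Main()
    {
        // "France" has no dictionary entry, so it is passed through as-is.
        foreach (var name in new[] { "Deutschland", "UK", "France" })
        {
            Console.WriteLine(name + " -> " + CountryLookup.Translate(name));
        }
    }
}
```

Run as a console application, this prints `Deutschland -> Germany`, `UK -> United Kingdom`, and `France -> France`. A single `TryGetValue` call is used so that each name is looked up only once.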

To see the job output

  1. From Server Explorer, expand Azure, expand Data Lake Analytics, expand your Data Lake Analytics account, expand Storage Accounts, right-click the Default Storage, and then click Explorer.
  2. Expand Samples, expand Outputs, and then double-click Drivers.csv.

See also