title | description | services | documentationcenter | author | manager | editor | ms.assetid | ms.service | ms.devlang | ms.topic | ms.tgt_pltfrm | ms.workload | ms.date | ms.author |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
Manage Azure Data Lake Analytics using Azure .NET SDK \| Microsoft Docs | Learn how to manage Data Lake Analytics jobs, data sources, users. | data-lake-analytics | mumian | jhubbard | cgronlun | 811d172d-9873-4ce9-a6d5-c1a26b374c79 | data-lake-analytics | na | article | na | big-data | 03/3/2017 | jgao |
[!INCLUDE manage-selector]
Learn how to manage Azure Data Lake Analytics accounts, data sources, users, and jobs using the Azure .NET SDK. To see management topics that use other tools, select the corresponding tab in the selector above.
- Visual Studio 2015, Visual Studio 2013 Update 4, or Visual Studio 2012 with Visual C++ installed.
- Microsoft Azure SDK for .NET version 2.5 or above. Install it using the Web Platform Installer.
- Required NuGet packages. To install them:
  - In Visual Studio, right-click the project name in Solution Explorer and click Manage NuGet Packages.
  - In the NuGet Package Manager tab, make sure that Package source is set to nuget.org and that the Include prerelease check box is selected.
  - Search for and install the following NuGet packages:
    - Microsoft.Rest.ClientRuntime.Azure.Authentication - This tutorial uses V2.2.12
    - Microsoft.Azure.Management.DataLake.Analytics - This tutorial uses V2.1.0 preview
    - Microsoft.Azure.Management.DataLake.Store - This tutorial uses V2.1.0 preview
  - Close the NuGet Package Manager.
The Azure Data Lake Analytics and Azure Data Lake Store APIs include sets of client management objects from which you do most of your programming. These objects are in these two namespaces:
- Microsoft.Azure.Management.DataLake.Analytics
- Microsoft.Azure.Management.DataLake.Store
The following table shows the client management objects and the variables used for them in the code examples throughout this article.
Client Management Object | Code Variable |
---|---|
DataLakeStoreAccountManagementClient | adlsClient |
DataLakeAnalyticsAccountManagementClient | adlaClient |
DataLakeStoreFileSystemManagementClient | adlsFileSystemClient |
DataLakeAnalyticsCatalogManagementClient | adlaCatalogClient |
DataLakeAnalyticsJobManagementClient | adlaJobClient |
- DataLakeStoreAccountManagementClient - Use to create and manage Data Lake Store accounts.
- DataLakeStoreFileSystemManagementClient - Use for file system tasks such as creating folders and files, uploading files, listing files, accessing ACLs and credentials, and adding links to Azure Storage blobs.
Although you can create links to Azure Storage from Data Lake, you cannot access its content. To do so, you must use the Azure Storage SDK APIs. You can, however, run U-SQL scripts on Azure Storage blobs.
- DataLakeAnalyticsAccountManagementClient - Use to create and manage Data Lake Analytics accounts.
- DataLakeAnalyticsCatalogManagementClient - Use to explore the catalog items in Data Lake Analytics.
- DataLakeAnalyticsJobManagementClient - Use to submit and manage jobs in Data Lake Analytics.
Initialize client management objects using your credentials (creds) as obtained by your preferred authentication method, described next in this article.
// Only the Data Lake Analytics and Data Lake Store account
// management clients need a subscription ID.
adlsClient = new DataLakeStoreAccountManagementClient(creds);
adlsClient.SubscriptionId = "<Subscription-ID>";
adlaClient = new DataLakeAnalyticsAccountManagementClient(creds);
adlaClient.SubscriptionId = "<Subscription-ID>";
adlsFileSystemClient = new DataLakeStoreFileSystemManagementClient(creds);
adlaCatalogClient = new DataLakeAnalyticsCatalogManagementClient(creds);
adlaJobClient = new DataLakeAnalyticsJobManagementClient(creds);
// Methods to create and manage Data Lake Analytics
. . .
You have multiple options for signing in to Azure Data Lake Analytics.
The following snippet shows the simplest option, interactive authentication, in which the user provides credentials such as a username and password or a PIN.
// User login via interactive popup
// Use the client ID of an existing AAD "native client" application.
SynchronizationContext.SetSynchronizationContext(new SynchronizationContext());
var tenantId = "<Tenant ID>"; // Replace this string with the user's Azure Active Directory tenant ID.
var nativeClientApp_clientId = "1950a258-227b-4e31-a9cf-717495945fc2"; // Sample client ID
var activeDirectoryClientSettings = ActiveDirectoryClientSettings.UsePromptOnly(nativeClientApp_clientId, new Uri("urn:ietf:wg:oauth:2.0:oob"));
var creds = UserTokenProvider.LoginWithPromptAsync(tenantId, activeDirectoryClientSettings).Result;
We recommend creating your own application and service principal within your Azure Active Directory tenant, then using the client ID for that application, rather than the sample ID used here.
You can use the following snippet to authenticate your application non-interactively, using the client secret / key for an application / service principal. Use this authentication option with an existing Azure AD "Web App" Application.
// Service principal / application authentication with client secret / key
// Use the client ID and client secret of an existing AAD "Web App" application.
SynchronizationContext.SetSynchronizationContext(new SynchronizationContext());
var tenantId = "<Azure tenant ID>";
var webApp_clientId = "<AAD-application-clientid>";
var clientSecret = "<AAD-application-client-secret>";
var clientCredential = new ClientCredential(webApp_clientId, clientSecret);
var creds = ApplicationTokenProvider.LoginSilentAsync(tenantId, clientCredential).Result;
As a third option, the following snippet can be used to authenticate your application non-interactively, using the certificate for an application / service principal. Use this authentication option with an existing Azure AD "Web App" Application.
// Service principal / application authentication with certificate
// Use the client ID and certificate of an existing AAD "Web App" application.
SynchronizationContext.SetSynchronizationContext(new SynchronizationContext());
var tenantId = "<Azure tenant ID>";
var webApp_clientId = "<AAD-application-clientid>";
System.Security.Cryptography.X509Certificates.X509Certificate2 clientCert = <AAD-application-client-certificate>; // Replace the placeholder with your X509Certificate2 instance.
var clientAssertionCertificate = new ClientAssertionCertificate(webApp_clientId, clientCert);
var creds = ApplicationTokenProvider.LoginSilentWithCertificateAsync(tenantId, clientAssertionCertificate).Result;
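The certificate placeholder in the snippet above must be an X509Certificate2 instance. As a minimal sketch, assuming the certificate is either installed in the current user's personal certificate store or available as a password-protected .pfx file, it could be obtained with the standard .NET certificate APIs. The class and method names (CertificateLoader, FindByThumbprint, LoadFromPfx) are hypothetical helpers for illustration and are not part of the Azure SDK.

```csharp
using System;
using System.Security.Cryptography.X509Certificates;

// Hypothetical helper class; not part of the Azure SDK.
public static class CertificateLoader
{
    // Returns the first certificate in the current user's personal store whose
    // thumbprint matches, or null when no such certificate is installed.
    public static X509Certificate2 FindByThumbprint(string thumbprint)
    {
        var store = new X509Store(StoreName.My, StoreLocation.CurrentUser);
        try
        {
            store.Open(OpenFlags.ReadOnly);
            // validOnly: false, so self-signed test certificates are also returned.
            X509Certificate2Collection matches =
                store.Certificates.Find(X509FindType.FindByThumbprint, thumbprint, false);
            return matches.Count > 0 ? matches[0] : null;
        }
        finally
        {
            store.Close();
        }
    }

    // Loads a certificate, including its private key, from a password-protected .pfx file.
    public static X509Certificate2 LoadFromPfx(string pfxPath, string password)
    {
        return new X509Certificate2(pfxPath, password);
    }
}
```

Either result could then be assigned to clientCert in the snippet above. Check for a null return from FindByThumbprint before using it.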
Before running any Data Lake Analytics jobs, you must have a Data Lake Analytics account. Also, a Data Lake Analytics account requires at least one Data Lake Store account. For more information, see Azure Data Lake Analytics Overview.
Your Data Lake Analytics components must be created in an Azure Resource Group. If you don't already have one, create it first. You need your authentication credentials, subscription ID, and a location. The following code shows how to create a resource group:
string rgName = "<value>"; // specify a name for the new resource group
var resourceManagementClient = new ResourceManagementClient(creds) { SubscriptionId = subscriptionId };
var resourceGroup = new ResourceGroup { Location = location };
resourceManagementClient.ResourceGroups.CreateOrUpdate(rgName, resourceGroup);
For more information, see Azure Resource Groups and Data Lake Analytics.
The following code shows how to create a Data Lake Store account. Before you use the Create method, you must define its parameters by specifying a location.
var adlsParameters = new DataLakeStoreAccount(location: _location);
adlsClient.Account.Create(_resourceGroupName, _adlsAccountName, adlsParameters);
The following code shows how to create a Data Lake Analytics account. The account definition takes a collection of Data Lake Store accounts, which must be populated with DataLakeStoreAccountInfo objects. In this example, the DataLakeStoreAccountInfo objects are created inline.
For any Data Lake Analytics account, include only the Data Lake Store accounts required for the analytics you plan to run. One of these Data Lake Store accounts must be the default Data Lake Store account.
try
{
    var adlaAccount = new DataLakeAnalyticsAccount()
    {
        // The default account must also appear in DataLakeStoreAccounts.
        DefaultDataLakeStoreAccount = "Accounting",
        Location = _location,
        DataLakeStoreAccounts = new DataLakeStoreAccountInfo[]
        {
            new DataLakeStoreAccountInfo("Expenditures"),
            new DataLakeStoreAccountInfo("Accounting")
        }
    };
    adlaClient.Account.Create(_resourceGroupName, newAccountName, adlaAccount);
}
catch (Exception ex)
{
    Console.WriteLine(ex.Message);
}
The following code lists the Data Lake Store accounts and then the Data Lake Analytics accounts in a subscription. List operations do not always return all the properties of an object. In some cases, you need to do a Get operation on the object.
var adlsAccounts = adlsClient.Account.List().ToList();
foreach (var adls in adlsAccounts)
{
    Console.WriteLine($"\t{adls.Name}");
}
var adlaAccounts = adlaClient.Account.List().ToList();
foreach (var adla in adlaAccounts)
{
    Console.WriteLine($"\t{adla.Name}");
}
The following code uses a DataLakeAnalyticsAccountManagementClient to get a Data Lake Analytics account. First a check is made to see if the account exists.
DataLakeAnalyticsAccount adlaGet;
if (adlaClient.Account.Exists(_resourceGroupName, accountName))
{
    adlaGet = adlaClient.Account.Get(_resourceGroupName, accountName);
    Console.WriteLine($"{adlaGet.Name}\tCreated: {adlaGet.CreationTime}");
}
You can use DataLakeStoreAccountManagementClient (adlsClient) in the same way to get a Data Lake Store account.
The following code deletes a Data Lake Analytics account if it exists.
if (adlaClient.Account.Exists(_resourceGroupName, accountName))
{
    adlaClient.Account.Delete(_resourceGroupName, accountName);
    Console.WriteLine($"{accountName} Deleted");
}
else
{
    Console.WriteLine($"{accountName} does not exist.");
}
You can also delete a Data Lake Store account in the same way with a DataLakeStoreAccountManagementClient.
Every Data Lake Analytics account requires a default Data Lake Store account. Use this code to determine the default Store account for an Analytics account.
if (adlaClient.Account.Exists(_resourceGroupName, accountName))
{
    DataLakeAnalyticsAccount adlaGet = adlaClient.Account.Get(_resourceGroupName, accountName);
    Console.WriteLine($"{adlaGet.Name} default DL store account: {adlaGet.DefaultDataLakeStoreAccount}");
}
Data Lake Analytics currently supports the following data sources:
- Azure Data Lake Store
- Azure Storage
When you create an Analytics account, you must designate an Azure Data Lake Store account to be the default Data Lake Store account. The default Data Lake Store account is used to store job metadata and job audit logs. After you have created an Analytics account, you can add more Data Lake Store accounts and links to Azure Storage (blob) accounts.
You can create links to Azure Storage accounts.
AddStorageAccountParameters addParams = new AddStorageAccountParameters("<storage key value>");
adlaClient.StorageAccounts.Add(_resourceGroupName, _adlaAccountName, "<Azure Storage Account Name>", addParams);
The following code lists the Azure Storage accounts linked to a specified Data Lake Analytics account, followed by the Data Lake Store accounts returned by adlsClient.
var sAccnts = adlaClient.StorageAccounts.ListByAccount(_resourceGroupName, acctName);
if (sAccnts != null)
{
    Console.WriteLine("Azure Storage accounts:");
    foreach (var a in sAccnts)
    {
        Console.WriteLine($"\t{a.Name}");
    }
}
var stores = adlsClient.Account.List();
if (stores != null)
{
    Console.WriteLine("\nData stores:");
    foreach (var s in stores)
    {
        Console.WriteLine($"\t{s.Name}");
    }
}
You can use the Data Lake Store file system client management object to transfer individual files or folders between Azure and your local computer, using the following methods:
- UploadFolder
- UploadFile
- DownloadFolder
- DownloadFile
The first parameter for these methods is the name of the Data Lake Store Account, followed by parameters for the source path and the destination path.
The following example shows how to download a folder from the Data Lake Store after checking that the source path exists.
try
{
    if (adlsFileSystemClient.FileSystem.PathExists(account, sourcePath))
    {
        adlsFileSystemClient.FileSystem.DownloadFolder(account, sourcePath, destinationPath);
    }
    else
    {
        Console.WriteLine("Path does not exist");
    }
}
catch (IOException ioex)
{
    Console.WriteLine(ioex.Message);
}
You can use .NET Framework IO operations to create content for a file in a Data Lake Store. The following code writes the first five byte values of 100 random byte arrays to a .csv file.
MemoryStream azMem = new MemoryStream();
StreamWriter sw = new StreamWriter(azMem, UTF8Encoding.UTF8);
for (int i = 0; i < 100; i++)
{
    // Each new GUID supplies 16 bytes of sample data.
    byte[] gA = Guid.NewGuid().ToByteArray();
    string dataLine = $"{gA[0]},{gA[1]},{gA[2]},{gA[3]},{gA[4]}";
    sw.WriteLine(dataLine);
}
// Flush the writer and rewind the stream before handing it to the file system client.
sw.Flush();
azMem.Position = 0;
adlsFileSystemClient.FileSystem.Create(adlsAccountName, "/Samples/Output/randombytes.csv", azMem);
sw.Dispose();
azMem.Dispose();
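The in-memory part of this pattern can be checked locally before any upload. The following is a minimal sketch that uses only the .NET base class library. RandomCsv and Build are hypothetical names introduced for illustration, and the row and column counts are parameters rather than values taken from the SDK.

```csharp
using System;
using System.IO;
using System.Text;

// Hypothetical helper class; not part of the Azure SDK.
public static class RandomCsv
{
    // Builds `rows` CSV lines, each holding the first `valuesPerRow` bytes of a new GUID,
    // and returns them as a stream rewound to the start.
    public static MemoryStream Build(int rows, int valuesPerRow)
    {
        if (valuesPerRow < 1 || valuesPerRow > 16)
        {
            throw new ArgumentOutOfRangeException(nameof(valuesPerRow), "A GUID has 16 bytes.");
        }

        var stream = new MemoryStream();
        // leaveOpen: true keeps the MemoryStream usable after the writer is disposed.
        // UTF8Encoding(false) omits the byte order mark.
        using (var writer = new StreamWriter(stream, new UTF8Encoding(false), 1024, true))
        {
            for (int i = 0; i < rows; i++)
            {
                byte[] bytes = Guid.NewGuid().ToByteArray();
                var fields = new string[valuesPerRow];
                for (int j = 0; j < valuesPerRow; j++)
                {
                    fields[j] = bytes[j].ToString();
                }
                writer.WriteLine(string.Join(",", fields));
            }
        }
        stream.Position = 0;
        return stream;
    }
}
```

A stream returned by RandomCsv.Build(100, 5) could be passed to the Create call shown above in place of azMem. Disposing the writer inside the helper flushes it, so no separate Flush call is needed.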
The following code lists the containers for a specified Azure Storage account.
string ADLAName = "<specify Data Lake Analytics account name>";
string azStorageName = "<specify Azure Storage account name>";
var containers = adlaClient.StorageAccounts.ListStorageContainers(_resourceGroupName, ADLAName, azStorageName);
foreach (var c in containers)
{
    Console.WriteLine(c.Name);
}
The following code checks if an Azure Storage account (storageAccntName) exists in a Data Lake Analytics account (analyticsAccountName), and if a container (containerName) exists in the Azure Storage account.
bool accountExists = adlaClient.Account.StorageAccountExists(_resourceGroupName, analyticsAccountName, storageAccntName);
bool containerExists = adlaClient.Account.StorageContainerExists(_resourceGroupName, analyticsAccountName, storageAccntName, containerName);
The DataLakeAnalyticsCatalogManagementClient object provides methods for managing the SQL database provided for each Azure Data Lake Store. The DataLakeAnalyticsJobManagementClient provides methods to submit and manage jobs run on the database with U-SQL scripts.
Of the catalog items you can list, the most common are databases and their schemas. The following code obtains a collection of databases, and then enumerates the schemas for each database.
var databases = adlaCatalogClient.Catalog.ListDatabases(adlaAccountName);
foreach (var db in databases)
{
    Console.WriteLine($"Database: {db.Name}");
    Console.WriteLine(" - Schemas:");
    var schemas = adlaCatalogClient.Catalog.ListSchemas(adlaAccountName, db.Name);
    foreach (var schm in schemas)
    {
        Console.WriteLine($"\t{schm.Name}");
    }
}
Run on the default master database, the output of this example is as follows:
Database: master
- Schemas:
dbo
INFORMATION_SCHEMA
sys
usql
The following code shows how to access the database with a Data Lake Analytics Catalog management client to list the columns in a specified table.
var tbl = adlaCatalogClient.Catalog.GetTable(_adlaAnalyticsAccountTest, "master", "dbo", "MyTableName");
IEnumerable<USqlTableColumn> columns = tbl.ColumnList;
foreach (USqlTableColumn utc in columns)
{
    Console.WriteLine($"\t{utc.Name}");
}
The following code reads a U-SQL script from the Data Lake Store and submits it as a job with a Data Lake Analytics Job management client.
try
{
    // Read the script text from a file in the Data Lake Store.
    string scriptPath = "/Samples/Scripts/SearchResults_Wikipedia_Script.txt";
    Stream scriptStrm = adlsFileSystemClient.FileSystem.Open(_adlsAccountName, scriptPath);
    string scriptTxt = string.Empty;
    using (StreamReader sr = new StreamReader(scriptStrm))
    {
        scriptTxt = sr.ReadToEnd();
    }
    // Describe the job and submit it.
    var jobName = "SR_Wikipedia";
    var jobId = Guid.NewGuid();
    var properties = new USqlJobProperties(scriptTxt);
    var parameters = new JobInformation(jobName, JobType.USql, properties, priority: 1, degreeOfParallelism: 1, jobId: jobId);
    var jobInfo = adlaJobClient.Job.Create(_adlaAnalyticsAccountTest, jobId, parameters);
    Console.WriteLine($"Job {jobName} submitted.");
}
catch (Exception ex)
{
    Console.WriteLine(ex.Message);
}
The following code lists information about jobs that failed.
// The first argument is the Data Lake Analytics account name, not a client object.
var jobs = adlaJobClient.Job.List(_adlaAnalyticsAccountTest,
    new ODataQuery<JobInformation> { Filter = "result eq 'Failed'" });
foreach (var j in jobs)
{
    Console.WriteLine($"{j.Name}\t{j.JobId}\t{j.Type}\t{j.StartTime}\t{j.EndTime}");
}