Use Java to manage directories and files in Azure Data Lake Storage
This article shows you how to use Java to create and manage directories and files in storage accounts that have a hierarchical namespace.
To learn about how to get, set, and update the access control lists (ACL) of directories and files, see Use Java to manage ACLs in Azure Data Lake Storage.
Package (Maven) | Samples | API reference | Give Feedback
Prerequisites
An Azure subscription. For more information, see Get Azure trial.
A storage account that has hierarchical namespace enabled. Follow these instructions to create one.
Set up your project
To get started, open this page and find the latest version of the Java library. Then, open the pom.xml file in your text editor. Add a dependency element that references that version.
If you plan to authenticate your client application by using Microsoft Entra ID, then add a dependency to the Azure Identity library. For more information, see Azure Identity client library for Java.
Next, add these import statements to your code file.
import com.azure.identity.*;
import com.azure.storage.common.StorageSharedKeyCredential;
import com.azure.core.http.rest.PagedIterable;
import com.azure.core.util.BinaryData;
import com.azure.storage.file.datalake.*;
import com.azure.storage.file.datalake.models.*;
import com.azure.storage.file.datalake.options.*;
Note
Multi-protocol access on Data Lake Storage enables applications to use both Blob APIs and Data Lake Storage Gen2 APIs to work with data in storage accounts with hierarchical namespace (HNS) enabled. When working with capabilities unique to Data Lake Storage Gen2, such as directory operations and ACLs, use the Data Lake Storage Gen2 APIs, as shown in this article.
When choosing which APIs to use in a given scenario, consider the workload and the needs of your application, along with the known issues and impact of HNS on workloads and applications.
Authorize access and connect to data resources
To work with the code examples in this article, you need to create an authorized DataLakeServiceClient instance that represents the storage account. You can authorize a DataLakeServiceClient
object using Microsoft Entra ID, an account access key, or a shared access signature (SAS).
You can use the Azure identity client library for Java to authenticate your application with Microsoft Entra ID.
Create a DataLakeServiceClient instance and pass in a new instance of the DefaultAzureCredential class.
public static DataLakeServiceClient GetDataLakeServiceClient(String accountName) {
    DefaultAzureCredential defaultCredential = new DefaultAzureCredentialBuilder().build();

    DataLakeServiceClient dataLakeServiceClient = new DataLakeServiceClientBuilder()
        .endpoint("https://" + accountName + ".dfs.core.chinacloudapi.cn")
        .credential(defaultCredential)
        .buildClient();

    return dataLakeServiceClient;
}
To learn more about using DefaultAzureCredential
to authorize access to data, see Azure Identity client library for Java.
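If you authorize with the account access key instead, a minimal sketch looks like the following. It uses the StorageSharedKeyCredential class from the import statements above; the method name is illustrative, and the account key should come from a secure store such as Azure Key Vault rather than being hard-coded in source.
public static DataLakeServiceClient GetDataLakeServiceClientWithSharedKey(
        String accountName, String accountKey) {

    // StorageSharedKeyCredential authorizes each request with the account name and key
    StorageSharedKeyCredential sharedKeyCredential =
        new StorageSharedKeyCredential(accountName, accountKey);

    return new DataLakeServiceClientBuilder()
        .endpoint("https://" + accountName + ".dfs.core.chinacloudapi.cn")
        .credential(sharedKeyCredential)
        .buildClient();
}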
Create a container
A container acts as a file system for your files. You can create a container by using the DataLakeServiceClient.createFileSystem method.
The following code example creates a container and returns a DataLakeFileSystemClient object for later use:
public DataLakeFileSystemClient CreateFileSystem(
        DataLakeServiceClient serviceClient,
        String fileSystemName) {

    DataLakeFileSystemClient fileSystemClient = serviceClient.createFileSystem(fileSystemName);

    return fileSystemClient;
}
Create a directory
You can create a directory reference in the container by using the DataLakeFileSystemClient.createDirectory method.
The following code example adds a directory to a container, then adds a subdirectory and returns a DataLakeDirectoryClient object for later use:
public DataLakeDirectoryClient CreateDirectory(
        DataLakeFileSystemClient fileSystemClient,
        String directoryName,
        String subDirectoryName) {

    DataLakeDirectoryClient directoryClient = fileSystemClient.createDirectory(directoryName);

    return directoryClient.createSubdirectory(subDirectoryName);
}
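As a rough sketch of how these pieces fit together (assuming the methods shown earlier in this article are defined in the same class), the following example goes from a storage account to a container to a directory. The account, container, and directory names are placeholders:
public void CreateDirectoryExample() {
    // Placeholder names; replace with your own account, container, and directory names
    DataLakeServiceClient serviceClient = GetDataLakeServiceClient("<storage-account-name>");
    DataLakeFileSystemClient fileSystemClient = CreateFileSystem(serviceClient, "sample-container");
    DataLakeDirectoryClient directoryClient =
        CreateDirectory(fileSystemClient, "my-directory", "my-subdirectory");

    System.out.println("Created directory: " + directoryClient.getDirectoryPath());
}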
Rename or move a directory
You can rename or move a directory by using the DataLakeDirectoryClient.rename method.
Pass the path of the desired directory as a parameter. The following code example shows how to rename a subdirectory:
public DataLakeDirectoryClient RenameDirectory(
        DataLakeFileSystemClient fileSystemClient,
        String directoryPath,
        String subdirectoryName,
        String subdirectoryNameNew) {

    DataLakeDirectoryClient directoryClient = fileSystemClient
        .getDirectoryClient(String.join("/", directoryPath, subdirectoryName));

    return directoryClient.rename(
        fileSystemClient.getFileSystemName(),
        String.join("/", directoryPath, subdirectoryNameNew));
}
The following code example shows how to move a subdirectory from one directory to a different directory:
public DataLakeDirectoryClient MoveDirectory(
        DataLakeFileSystemClient fileSystemClient,
        String directoryPathFrom,
        String directoryPathTo,
        String subdirectoryName) {

    DataLakeDirectoryClient directoryClient = fileSystemClient
        .getDirectoryClient(String.join("/", directoryPathFrom, subdirectoryName));

    return directoryClient.rename(
        fileSystemClient.getFileSystemName(),
        String.join("/", directoryPathTo, subdirectoryName));
}
Upload a file to a directory
You can upload content to a new or existing file by using the DataLakeFileClient.uploadFromFile method.
The following code example shows how to upload a local file to a directory:
public void UploadFile(
        DataLakeDirectoryClient directoryClient,
        String fileName) {

    DataLakeFileClient fileClient = directoryClient.getFileClient(fileName);

    // Upload the contents of a local file (example path) to the file in the directory
    fileClient.uploadFromFile("filePath/sample-file.txt");
}
You can use this method to create and upload content to a new file, or you can set the overwrite parameter to true to overwrite an existing file.
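For example, a minimal sketch of an upload that overwrites an existing file might look like the following; the method name and the localFilePath parameter are illustrative:
public void UploadFileOverwrite(
        DataLakeDirectoryClient directoryClient,
        String fileName,
        String localFilePath) {

    DataLakeFileClient fileClient = directoryClient.getFileClient(fileName);

    // Passing true for the overwrite parameter replaces the file if it already exists
    fileClient.uploadFromFile(localFilePath, true);
}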
Append data to a file
You can upload data to be appended to a file by using the DataLakeFileClient.append method, and then commit it by calling the DataLakeFileClient.flush method.
The following code example shows how to append data to the end of a file using these steps:
- Create a DataLakeFileClient object to represent the file resource you're working with.
- Upload data to the file using the DataLakeFileClient.append method.
- Complete the upload by calling the DataLakeFileClient.flush method to write the previously uploaded data to the file.
public void AppendDataToFile(
        DataLakeDirectoryClient directoryClient) {

    DataLakeFileClient fileClient = directoryClient.getFileClient("sample-file.txt");

    // Get the current size of the file so the new data is appended at the end
    long fileSize = fileClient.getProperties().getFileSize();

    String sampleData = "Data to append to end of file";
    fileClient.append(BinaryData.fromString(sampleData), fileSize);

    // Commit the appended data; the position is the final length of the file
    fileClient.flush(fileSize + sampleData.length(), true);
}
Download from a directory
The following code example shows how to download a file from a directory to a local file using these steps:
- Create a DataLakeFileClient object to represent the file that you want to download.
- Use the DataLakeFileClient.readToFile method to read the file. This example sets the overwrite parameter to true, which overwrites an existing file.
public void DownloadFile(
        DataLakeDirectoryClient directoryClient,
        String fileName) {

    DataLakeFileClient fileClient = directoryClient.getFileClient(fileName);

    // The second argument overwrites the local file (example path) if it already exists
    fileClient.readToFile("filePath/sample-file.txt", true);
}
List directory contents
You can list directory contents by using the DataLakeFileSystemClient.listPaths method and enumerating the result.
Enumerating the paths in the result may make multiple requests to the service while fetching the values.
The following code example prints the name of each file located in a directory:
public void ListFilesInDirectory(
        DataLakeFileSystemClient fileSystemClient,
        String directoryName) {

    ListPathsOptions options = new ListPathsOptions();
    options.setPath(directoryName);

    PagedIterable<PathItem> pagedIterable = fileSystemClient.listPaths(options, null);

    // Iterating the PagedIterable fetches results from the service as needed
    for (PathItem item : pagedIterable) {
        System.out.println(item.getName());
    }
}
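If you want more control over how those requests happen, you can also enumerate the results page by page. The following is a minimal sketch that uses iterableByPage; each page corresponds to one response from the service, and the method name is illustrative:
public void ListFilesInDirectoryByPage(
        DataLakeFileSystemClient fileSystemClient,
        String directoryName) {

    ListPathsOptions options = new ListPathsOptions();
    options.setPath(directoryName);

    PagedIterable<PathItem> pagedIterable = fileSystemClient.listPaths(options, null);

    // Each page corresponds to one request/response exchange with the service
    for (com.azure.core.http.rest.PagedResponse<PathItem> page : pagedIterable.iterableByPage()) {
        for (PathItem item : page.getElements()) {
            System.out.println(item.getName());
        }
    }
}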
Delete a directory
You can delete a directory by using one of the following methods:
- DataLakeDirectoryClient.delete
- DataLakeDirectoryClient.deleteIfExists
- DataLakeDirectoryClient.deleteWithResponse
The following code example uses deleteWithResponse
to delete a nonempty directory and all paths beneath the directory:
public void DeleteDirectory(
        DataLakeFileSystemClient fileSystemClient,
        String directoryName) {

    DataLakeDirectoryClient directoryClient = fileSystemClient.getDirectoryClient(directoryName);

    // Set to true to delete all paths beneath the directory
    boolean recursive = true;

    directoryClient.deleteWithResponse(recursive, null, null, null);
}
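If you prefer not to handle an exception when the directory might not exist, deleteIfExists returns a boolean instead. The following is a minimal sketch; note that this simple form doesn't take a recursive flag, so it's best suited to directories that you expect to be empty:
public boolean DeleteDirectoryIfExists(
        DataLakeFileSystemClient fileSystemClient,
        String directoryName) {

    DataLakeDirectoryClient directoryClient = fileSystemClient.getDirectoryClient(directoryName);

    // Returns true if the directory existed and was deleted, false if it didn't exist
    return directoryClient.deleteIfExists();
}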