Issue
Hi guys. I'm using a personal compute cluster on Azure Databricks and creating a mount point as follows:
configs = {"fs.azure.account.auth.type": "OAuth",
           "fs.azure.account.oauth.provider.type": "org.apache.hadoop.fs.azurebfs.oauth2.ClientCredsTokenProvider",
           "fs.azure.account.oauth2.client.id": "xxxxxx",
           "fs.azure.account.oauth2.client.secret": "xxxxx",
           "fs.azure.account.oauth2.client.endpoint": "https://login.microsoftonline.com/xxxxxx/oauth2/token"}
dbutils.fs.mount(
    source = "abfss://[email protected]/xxx/yyy",
    mount_point = "/mnt/MyMount/",
    extra_configs = configs)
When I list the mount using dbutils.fs.ls("/mnt/MyMount/"), it's accessible. But when I read the source data and write it to the mount with the following code:
from pyspark.sql import SparkSession
spark = SparkSession.builder.appName("MyApp").getOrCreate()
views = ['view_1', 'view_2', 'view_3', 'view_4', 'view_5']
logs = ['Log_views']
for view in views:
    print(f'Start saving view {view}.')
    spark.read.format("delta").load(f"/mnt/source/{view}").toPandas().to_csv(f"/mnt/MyMount/{view}.csv", index=False)
    print(f'View {view} saved successfully.')
for log in logs:
    spark.read.format("parquet").load(f"/mnt/source/{log}").toPandas().to_csv(f"/mnt/MyMount/{log}.csv", index=False)
    print(f'Logs {log} saved successfully.')
I got an error:
OSError: Cannot save file into a non-existent directory: '/mnt/MyMount'
I tried to remove and recreate the mount point and confirmed all access is working fine.
Solution
You need to include /dbfs/ in the path.
pandas' to_csv runs outside the Spark context and writes through the local file API, so the path has to be given from the root of the driver's filesystem, where your mount is exposed under the /dbfs folder.
Code
spark.read.format("parquet").load(f"/mnt/source/{log}").toPandas().to_csv(f"/dbfs/mnt/MyMount/{log}.csv", index=False)
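The prefixing above can also be wrapped in a small helper so that every pandas write goes through the same rule. This is only a minimal sketch: to_local_path is a hypothetical name, not a Databricks API, and it assumes the standard layout in which DBFS is exposed on the driver under /dbfs.

```python
import posixpath


def to_local_path(dbfs_path: str) -> str:
    """Map a DBFS path to its driver-local equivalent under /dbfs.

    Hypothetical helper for illustration; not part of dbutils or pandas.
    """
    # Accept both the "dbfs:/mnt/..." URI form and the bare "/mnt/..." form.
    if dbfs_path.startswith("dbfs:"):
        dbfs_path = dbfs_path[len("dbfs:"):]
    # Leave paths that already point at the local /dbfs folder untouched.
    if dbfs_path == "/dbfs" or dbfs_path.startswith("/dbfs/"):
        return posixpath.normpath(dbfs_path)
    # Otherwise prepend /dbfs so local file APIs (pandas, open) can find it.
    return posixpath.normpath(posixpath.join("/dbfs", dbfs_path.lstrip("/")))


print(to_local_path("/mnt/MyMount/Log_views.csv"))
# /dbfs/mnt/MyMount/Log_views.csv
```

In the loops from the question this would be used as .to_csv(to_local_path(f"/mnt/MyMount/{view}.csv"), index=False), while the spark.read ... load(...) calls keep the plain /mnt/... paths.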
For more information on working with files in Databricks, refer to the Databricks documentation.
Answered By - JayashankarGS Answer Checked By - Marilyn (WPSolving Volunteer)