I'm working on the deployment of the Purview ADB Lineage Solution Accelerator. In step 3 of the "Install OpenLineage on Your Databricks Cluster" section, the author asks you to run the following in PowerShell to upload the init script and jar to DBFS using the Databricks CLI:
dbfs mkdirs dbfs:/databricks/openlineage
dbfs cp --overwrite ./openlineage-spark-*.jar dbfs:/databricks/openlineage/
dbfs cp --overwrite ./open-lineage-init-script.sh dbfs:/databricks/openlineage/open-lineage-init-script.sh
Question: Do I correctly understand the above code as follows? If that is not the case, before running the code, I would like to know what exactly the code is doing.
- The first line creates a folder openlineage in the root directory of dbfs
- It's assumed that you are running the PowerShell command from the location where the .jar and open-lineage-init-script.sh are located
- The second and third lines of the code are copying the jar and .sh files from your local directory to dbfs:/databricks/openlineage/ in the dbfs of Databricks
CodePudding user response:
1. dbfs mkdirs is an equivalent of UNIX mkdir -p, i.e. under the DBFS root it will create a folder named databricks, and inside it another folder named openlineage, and it will not complain if these directories already exist.
2. and 3. Yes. Files/directories not prefixed with dbfs:/ refer to your local filesystem. Note that you can copy from DBFS to local or vice versa, or between two DBFS locations, but not from one local path to another.
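To see the two behaviours described above without touching a workspace, here is a small local sketch. It does not call the Databricks CLI at all: plain mkdir -p stands in for dbfs mkdirs, the scratch directory stands in for the DBFS root, and path_kind is a hypothetical helper invented here to show the dbfs:/ prefix rule.

```shell
#!/bin/sh
# Local illustration only; nothing here talks to Databricks.
set -eu

# Hypothetical scratch directory standing in for the DBFS root.
workdir=$(mktemp -d)

# The first call creates both levels: databricks/ and databricks/openlineage/.
mkdir -p "$workdir/databricks/openlineage"

# Running it again succeeds silently, even though the directories already exist.
mkdir -p "$workdir/databricks/openlineage"

# Hypothetical helper: a path starting with dbfs:/ is treated as DBFS,
# anything else as the local filesystem.
path_kind() {
  case "$1" in
    dbfs:/*) echo "dbfs" ;;
    *)       echo "local" ;;
  esac
}

path_kind "dbfs:/databricks/openlineage/"
path_kind "./open-lineage-init-script.sh"
```

In the commands from the question, each dbfs cp therefore has a local source and a DBFS destination, which is one of the allowed combinations.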
