I have a script in a GCS bucket that I want to run from my Airflow DAG using BashOperator. Airflow runs on a VM.
My constraint is that I cannot copy the script to the VM and run it there, because it runs jobs and opens connections internally. If I copied the script, I would also have to copy its dependent JARs and files.
I tried gsutil cat <script path in bucket> | sh, but it does not work.
I also came across a post that accesses a file in the bucket with the code below, but I do not know how to use it with BashOperator or how to run it.
from google.cloud import storage
client = storage.Client()
bucket = client.get_bucket('test_bucket')
blob = bucket.get_blob('temp_files_folder/test.txt')
Any suggestions?
CodePudding user response:
To use Cloud Storage files as if they were local, you can use GCSFuse, which mounts a bucket as a folder on the VM. You can then use the file directly from your VM, while the file itself stays stored in Cloud Storage.
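As a rough sketch of how this could look in a DAG, assuming the bucket has already been mounted on the Airflow VM (for example with gcsfuse test_bucket /mnt/gcs/test_bucket) and assuming Airflow 2.4 or later: the mount point, the script path inside the bucket, the DAG id, and the helper names below are all hypothetical placeholders, not values from the question.

```python
import shlex
from datetime import datetime

# Hypothetical values: replace with your own mount point and script path.
MOUNT_POINT = "/mnt/gcs/test_bucket"     # folder where the bucket is mounted
SCRIPT_IN_BUCKET = "scripts/run_job.sh"  # object path inside the bucket


def mounted_script_command(mount_point: str, object_path: str) -> str:
    """Build a bash_command that runs a script from a GCSFuse mount."""
    script = f"{mount_point.rstrip('/')}/{object_path.lstrip('/')}"
    # The trailing space matters: Airflow treats a bash_command that ends
    # in ".sh" as the name of a Jinja template file and tries to load it.
    return f"bash {shlex.quote(script)} "


def build_dag():
    # Imported lazily so the helper above works without Airflow installed.
    from airflow import DAG
    from airflow.operators.bash import BashOperator

    with DAG(
        dag_id="run_script_from_gcs_mount",
        start_date=datetime(2023, 1, 1),
        schedule=None,
        catchup=False,
    ) as dag:
        BashOperator(
            task_id="run_gcs_script",
            bash_command=mounted_script_command(MOUNT_POINT, SCRIPT_IN_BUCKET),
        )
    return dag
```

In a real DAG file you would assign dag = build_dag() at module level so the scheduler can discover it. Because the whole bucket is mounted, JARs and files that sit next to the script are reachable through the same folder, provided the script refers to them by paths that resolve under the mount.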
CodePudding user response:
If you don't want to use the current Airflow VM, you have to run the script in another environment (for example, another VM). I think the simplest way is to wrap your Java program in a Docker container, add the needed packages, and run it, for example in Google Cloud Run or Cloud Functions. Instead of storing the Java file(s) in GCP, store the Docker image in Docker Hub, Google Container Registry, or another registry.
You can pass connection settings as environment variables or command arguments when running the container. You can also use the current Airflow VM (if Docker is installed) to run the Docker container. Some helpful links:
- Airflow DockerOperator https://airflow.apache.org/docs/apache-airflow-providers-docker/stable/_api/airflow/providers/docker/operators/docker/index.html
- Create workflow for Cloud Run/Function https://airflow.apache.org/docs/apache-airflow-providers-google/stable/operators/cloud/workflows.html
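For the case where Docker is installed on the Airflow VM, a minimal sketch with DockerOperator might look like the following. It assumes Airflow 2.4 or later with the apache-airflow-providers-docker package installed. The image name, the java command, the environment variable names, and the connection values are hypothetical and only illustrate passing settings into the container.

```python
from datetime import datetime

# Hypothetical image name, for illustration only.
IMAGE = "my-registry/my-java-job:latest"


def connection_env(host: str, port: int, user: str) -> dict:
    """Turn connection settings into environment variables for the container."""
    return {"DB_HOST": host, "DB_PORT": str(port), "DB_USER": user}


def build_dag():
    # Imported lazily; needs the apache-airflow-providers-docker package.
    from airflow import DAG
    from airflow.providers.docker.operators.docker import DockerOperator

    with DAG(
        dag_id="run_job_in_container",
        start_date=datetime(2023, 1, 1),
        schedule=None,
        catchup=False,
    ) as dag:
        DockerOperator(
            task_id="run_java_job",
            image=IMAGE,
            # Assumed entry point inside the image.
            command="java -jar /app/job.jar",
            # Settings are passed to the container as environment variables.
            environment=connection_env("db.example.internal", 5432, "etl"),
        )
    return dag
```

As with any DAG file, assign dag = build_dag() at module level so the scheduler picks it up. Secrets such as passwords are better kept in Airflow connections or a secret manager than hard-coded in the DAG.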
