The GlobalSolutions

Apache Airflow Powered by GlobalSolutions

Apache Airflow is an open-source platform for programmatically authoring, scheduling, and monitoring workflows, and it is widely used by data engineers to orchestrate pipelines. With Airflow, users author workflows as Directed Acyclic Graphs (DAGs) of tasks. Airflow's user interface makes it easy to visualize pipelines running in production, monitor progress, and troubleshoot issues when needed.

Airflow connects with multiple data sources and can send alerts via email or Slack when a task completes or fails. It is distributed, scalable, and flexible, making it well-suited to handle the orchestration of complex business logic.

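Airflow is a distributed system made up of several cooperating components. In this AMI the components run as separate containers, which are listed in the Container Stack section below.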

Why Subscribe to Our Offering in AWS Marketplace

Accessing Your AMI from AWS Marketplace

To get started with your Apache Airflow stack:

  1. Subscribe: Purchase the Apache Airflow AMI from the AWS Marketplace.
  2. Connect via SSH:
    • In the AWS Console, select your launched instance and click Connect.
    • Choose SSH Client and follow the connection instructions shown.
    • From your local terminal, connect using your .pem key file:
        ssh -i yourpemfile.pem ubuntu@<public-ip-of-your-server>
    • Once logged in, you will land in the home directory.

For more information, refer to the AWS Instance Connection Guide.
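
OpenSSH refuses a private key whose file permissions let other users read it, which is a common reason the ssh command above fails on the first attempt. As a minimal sketch, the Python helper below checks the key file's mode and assembles the same ssh command as an argument list. The function names are illustrative and not part of the AMI.

```python
import os
import stat


def key_is_private(pem_path: str) -> bool:
    """Return True if the key file grants no permissions to group or others."""
    mode = stat.S_IMODE(os.stat(pem_path).st_mode)
    return (mode & (stat.S_IRWXG | stat.S_IRWXO)) == 0


def build_ssh_command(pem_path: str, public_ip: str, user: str = "ubuntu") -> list:
    """Assemble the ssh invocation from the steps above as an argument list."""
    return ["ssh", "-i", pem_path, f"{user}@{public_ip}"]
```

If key_is_private returns False, running chmod 400 on the key file restricts it to your user. The list returned by build_ssh_command can be passed to subprocess.run.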

Installation — Container Stack

Apache Airflow has been installed as a set of containers. The following containers make up the stack:

Container                Description
airflow-apiserver        Serves the Airflow REST API.
airflow-dag-processor    Parses and processes DAG files.
airflow-scheduler        Schedules and triggers task execution.
airflow-triggerer        Handles deferrable operator triggers.
airflow-worker           Executes the tasks assigned by the scheduler.
postgres                 Metadata database storing workflow and task state.
redis                    Message broker used for task queuing between scheduler and workers.
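
To confirm that every container listed above is up, you can compare the list against what the container runtime reports. The sketch below assumes the stack runs under Docker and that each running container's name contains the name shown above (Compose often adds a project prefix and an index suffix). Both are assumptions, since this guide does not state the runtime or the exact container names, and the function names are illustrative.

```python
import subprocess

# Container names from the list above.
EXPECTED = [
    "airflow-apiserver",
    "airflow-dag-processor",
    "airflow-scheduler",
    "airflow-triggerer",
    "airflow-worker",
    "postgres",
    "redis",
]


def missing_containers(running_names, expected=EXPECTED):
    """Return expected names that match no running container (substring match)."""
    return [
        name for name in expected
        if not any(name in running for running in running_names)
    ]


def running_container_names():
    """List running container names via the Docker CLI (assumes docker is on PATH)."""
    result = subprocess.run(
        ["docker", "ps", "--format", "{{.Names}}"],
        capture_output=True, text=True, check=True,
    )
    return result.stdout.split()
```

On the instance, missing_containers(running_container_names()) should return an empty list when the whole stack is running. Depending on how Docker was set up, the call may need to run with sudo.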

Configuring the Password

Once logged into the machine, follow the steps below to set the Airflow password to your EC2 Instance ID:

  1. Navigate to the Airflow directory from home:
    cd airflow
  2. You should see instanceid.py in the directory.
  3. Run it with the following command:
    python3 instanceid.py
  4. This will set the Airflow login password to your EC2 Instance ID.

Finding Your Instance ID: Log in to the AWS Console and select your instance, or run the following command from within the instance:

curl -s http://169.254.169.254/latest/meta-data/instance-id
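
The curl command above uses the original (IMDSv1) request style. On instances where IMDSv2 is required, that request is rejected and a session token has to be obtained first. Whether this AMI enforces IMDSv2 is not stated here, so treat the following as a fallback. It is a minimal sketch of the token flow using only the Python standard library, with illustrative function names.

```python
import urllib.request

IMDS_BASE = "http://169.254.169.254/latest"


def token_request(ttl_seconds: int = 21600) -> urllib.request.Request:
    """Build the PUT request that asks IMDSv2 for a session token."""
    return urllib.request.Request(
        f"{IMDS_BASE}/api/token",
        method="PUT",
        headers={"X-aws-ec2-metadata-token-ttl-seconds": str(ttl_seconds)},
    )


def instance_id_request(token: str) -> urllib.request.Request:
    """Build the GET request for the instance ID, authenticated with the token."""
    return urllib.request.Request(
        f"{IMDS_BASE}/meta-data/instance-id",
        headers={"X-aws-ec2-metadata-token": token},
    )


def fetch_instance_id(timeout: float = 2.0) -> str:
    """Run both requests in order; this only works from inside an EC2 instance."""
    with urllib.request.urlopen(token_request(), timeout=timeout) as resp:
        token = resp.read().decode()
    with urllib.request.urlopen(instance_id_request(token), timeout=timeout) as resp:
        return resp.read().decode()
```

Calling fetch_instance_id() from a Python shell on the instance returns the same value that the AWS Console shows for the instance.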

Connecting to the Application

To access the Apache Airflow web interface from your local machine, open a browser and navigate to:

http://<your-server-ip>:8080

Log in with the following credentials:

Username    Password
airflow     Your EC2 Instance ID (e.g. i-0abc123def456789)

Important: Make sure you have run python3 instanceid.py as described above before attempting to log in.
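
If the page does not load at all, it helps to separate a network problem from a login problem. The sketch below only tests whether a TCP connection to port 8080 can be opened from your machine. The function name is illustrative. A False result suggests that the port is not reachable (for example, because of firewall or security group rules, or because the stack is not running) rather than a wrong password.

```python
import socket


def port_is_open(host: str, port: int = 8080, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within the timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and unreachable hosts.
        return False
```

Run port_is_open("<your-server-ip>") from your local machine, substituting the public IP of your instance.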

Getting Started with DAGs

Directed Acyclic Graphs (DAGs) define your workflows in Airflow. Follow these steps to get started:

  1. Access the Airflow web interface using the URL and credentials above.
  2. Create DAGs to define your workflows.
  3. Monitor task execution and view task logs from the UI.
  4. Trigger and pause workflows as needed.
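
Before writing a DAG for step 2, it can help to see what the "directed acyclic" constraint means in practice: each task names the tasks it depends on, and no chain of dependencies may loop back on itself. The sketch below illustrates this with Python's standard-library graphlib module (Python 3.9 or later) rather than Airflow's own API, and the task names are hypothetical.

```python
from graphlib import CycleError, TopologicalSorter


def execution_order(dependencies):
    """Return one valid run order for tasks mapped to the tasks they depend on."""
    return list(TopologicalSorter(dependencies).static_order())


def is_valid_dag(dependencies):
    """Return False if the dependencies contain a cycle."""
    try:
        execution_order(dependencies)
        return True
    except CycleError:
        return False


# Hypothetical three-step pipeline: each key depends on the tasks in its set.
pipeline = {
    "extract": set(),
    "transform": {"extract"},
    "load": {"transform"},
}
```

For this linear pipeline, execution_order(pipeline) yields extract, transform, load, while a mapping in which two tasks depend on each other fails the is_valid_dag check. In an Airflow DAG file the same kind of relationships are declared between tasks, and by default a task runs only after its upstream tasks have succeeded.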

Sample DAG

A sample DAG is pre-installed at /home/ubuntu/airflow/dag/my_first_dag.py. To see it running, open the Airflow web interface and search for sample_dag; it should appear in a running state.

Note: If you need help creating new DAGs, the GlobalSolutions team can assist, but this is handled as a separate engagement.

AWS Cost Optimizer — CloudInsider

Our other popular offering is AWS Cost Optimizer, also known as CloudInsider, available in AWS Marketplace. This service has helped our customers save significantly on AWS and other cloud spending. It is easy to subscribe to, and you can see the savings in minutes.

Support

For any questions or assistance with our AWS Marketplace offering, reach out to us at support@theglobalsolutions.net.