Apache Airflow Powered by GlobalSolutions
Apache Airflow is an open-source tool to programmatically author, schedule, and monitor workflows. It is one of the most robust platforms used by Data Engineers for orchestrating workflows or pipelines. With Airflow, users can author workflows as Directed Acyclic Graphs (DAGs) of tasks. Airflow's rich user interface makes it easy to visualize pipelines running in production, monitor progress, and troubleshoot issues when needed.
Airflow connects with multiple data sources and can send alerts via email or Slack when a task completes or fails. It is distributed, scalable, and flexible, making it well-suited to handle the orchestration of complex business logic.
Airflow is a distributed system consisting of the following components:
- Webserver: Provides the user interface for managing Airflow workflows.
- Scheduler: Decides when tasks are due and submits them for execution.
- Worker: Worker nodes that execute Airflow tasks.
- Executor: Determines how and where submitted tasks run, for example by dispatching them to worker nodes.
- Metadata database: Stores the state of Airflow workflows, tasks, and other entities.
Why Subscribe to Our Offering in AWS Marketplace
- We regularly update the software to the latest version to address security issues.
- Customers can kick-start their core work right away with our pre-packaged AMIs.
- Production-ready application stacks.
Accessing Your AMI from AWS Marketplace
To get started with your Apache Airflow stack:
- Subscribe: Purchase the Apache Airflow AMI from the AWS Marketplace.
- Connect via SSH:
  - In the AWS Console, select your launched instance and click Connect.
  - Choose SSH Client and follow the connection instructions shown.
  - From your local terminal, connect using your .pem key file:
    ssh -i yourpemfile.pem ubuntu@<public-ip-of-your-server>
  - Once logged in, you will land in the home directory.
Installation — Container Stack
Apache Airflow has been installed as a set of containers. The following containers make up the stack:
| Container | Description |
|---|---|
| airflow-apiserver | Serves the Airflow REST API. |
| airflow-dag-processor | Parses and processes DAG files. |
| airflow-scheduler | Schedules and triggers task execution. |
| airflow-triggerer | Handles deferrable operator triggers. |
| airflow-worker | Executes the tasks assigned by the scheduler. |
| postgres | Metadata database storing workflow and task state. |
| redis | Message broker used for task queuing between scheduler and workers. |
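To confirm that the stack is up after logging in, the container list above can be checked with a short script. This is only a sketch: it assumes the containers are managed by Docker and that the docker CLI is on the PATH, neither of which is stated above. Actual container names may carry a prefix or suffix, so the hypothetical helper below matches names by substring.

```python
import subprocess

# Service names taken from the container table above.
EXPECTED = [
    "airflow-apiserver",
    "airflow-dag-processor",
    "airflow-scheduler",
    "airflow-triggerer",
    "airflow-worker",
    "postgres",
    "redis",
]


def missing_services(running_names, expected=EXPECTED):
    """Return the expected services that no running container name contains."""
    return [svc for svc in expected if not any(svc in name for name in running_names)]


def running_container_names():
    """List running container names (assumption: the Docker CLI is available)."""
    result = subprocess.run(
        ["docker", "ps", "--format", "{{.Names}}"],
        capture_output=True,
        text=True,
        check=True,
    )
    return result.stdout.split()
```

On the instance, calling missing_services(running_container_names()) should return an empty list when every container in the table is running.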
Configuring the Password
Once logged into the machine, follow the steps below to set the Airflow password to your EC2 Instance ID:
- Navigate to the Airflow directory from home:
cd airflow
- You should see instanceid.py in the directory.
- Run it with the following command:
python3 instanceid.py
- This sets the Airflow login password to your EC2 Instance ID.
- To look up the instance ID from the instance itself, query the instance metadata service:
curl -s http://169.254.169.254/latest/meta-data/instance-id
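The plain curl request above is an IMDSv1-style call. On instances configured to require IMDSv2, the metadata service rejects it unless a session token is supplied. Whether this AMI requires IMDSv2 is not stated here, so the following is only a sketch of the token-based flow using Python's standard library. The function names are illustrative and not part of the AMI.

```python
import urllib.request

IMDS_BASE = "http://169.254.169.254/latest"


def token_request(ttl_seconds=21600):
    """Build the IMDSv2 request that asks for a session token."""
    return urllib.request.Request(
        f"{IMDS_BASE}/api/token",
        method="PUT",
        headers={"X-aws-ec2-metadata-token-ttl-seconds": str(ttl_seconds)},
    )


def instance_id_request(token=None):
    """Build the instance-ID request, attaching the token when one is given."""
    headers = {"X-aws-ec2-metadata-token": token} if token else {}
    return urllib.request.Request(f"{IMDS_BASE}/meta-data/instance-id", headers=headers)


def fetch_instance_id(timeout=2):
    """Return the instance ID. Only works when run on the EC2 instance itself."""
    with urllib.request.urlopen(token_request(), timeout=timeout) as resp:
        token = resp.read().decode()
    with urllib.request.urlopen(instance_id_request(token), timeout=timeout) as resp:
        return resp.read().decode().strip()
```

Run on the instance, fetch_instance_id() should return the same value that the curl command prints.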
Connecting to the Application
To access the Apache Airflow web interface from your local machine, open a browser and navigate to:
http://<your-server-ip>:8080
Log in with the following credentials:
| Username | Password |
|---|---|
| airflow | Your EC2 Instance ID (e.g. i-0abc123def4567890) |
Note: Run python3 instanceid.py as described above before attempting to log in.
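If the login page does not load, it can help to first check whether port 8080 is reachable from your machine. A common cause, though not the only one, is a security group that does not allow inbound TCP traffic on 8080 from your IP address. The helper below is a minimal sketch using Python's standard library; the function name is illustrative.

```python
import socket


def is_port_open(host, port=8080, timeout=3.0):
    """Return True if a TCP connection to host:port succeeds within the timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and unreachable hosts.
        return False
```

For example, is_port_open("<your-server-ip>") returning False suggests a network or firewall problem rather than an Airflow login problem.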
Getting Started with DAGs
Directed Acyclic Graphs (DAGs) define your workflows in Airflow. Follow these steps to get started:
- Access the Airflow web interface using the URL and credentials above.
- Create DAGs to define your workflows.
- Monitor task execution and view task logs from the UI.
- Trigger and pause workflows as needed.
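As a conceptual illustration of what "directed acyclic graph" means for a workflow, the sketch below orders a hypothetical four-task pipeline so that every task runs after the tasks it depends on. It uses Python's standard-library graphlib rather than the Airflow API, and the task names are invented for the example. An actual Airflow DAG is written with Airflow's own classes.

```python
from graphlib import CycleError, TopologicalSorter

# Hypothetical pipeline: each key is a task, each value is the set of
# tasks that must finish before it can start.
dependencies = {
    "extract": set(),
    "transform": {"extract"},
    "validate": {"extract"},
    "load": {"transform", "validate"},
}


def execution_order(deps):
    """Return one valid run order; raises CycleError if the graph contains a cycle."""
    return list(TopologicalSorter(deps).static_order())


order = execution_order(dependencies)
print(order)
```

The "acyclic" part matters: if two tasks depended on each other, no valid order would exist, and execution_order would raise CycleError instead of returning a list.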
Sample DAG
A sample DAG is pre-installed at /home/ubuntu/airflow/dag/my_first_dag.py. To see it running, open the Airflow web interface and search for sample_dag (the name shown in the UI, which differs from the file name). It should appear in a running state.
AWS Cost Optimizer — CloudInsider
Our other popular offering is the AWS Cost Optimizer aka CloudInsider, available in AWS Marketplace. This service has helped our customers save significantly on AWS and other cloud spending. It is easy to subscribe and you can see the savings in minutes.
Support
For any questions or assistance with our AWS Marketplace offering, reach out to us at support@theglobalsolutions.net.