Apache Airflow Powered by GlobalSolutions
Apache Airflow is an open-source platform for programmatically authoring, scheduling, and monitoring workflows, and it is one of the most widely used tools among data engineers for orchestrating data pipelines. With Airflow, users author workflows as Directed Acyclic Graphs (DAGs) of tasks. Airflow's rich user interface makes it easy to visualize pipelines running in production, inspect a pipeline's dependencies, progress, logs, code, and success status, trigger tasks, and troubleshoot issues when needed. It connects to multiple data sources and can send an alert via email or Slack when a task completes or fails. Because Airflow is distributed, scalable, and extensible, it is well suited to orchestrating complex business logic and automating complex data pipelines.
Airflow is a distributed system that consists of the following components:
- Webserver: The webserver provides a user interface for managing Airflow workflows.
- Scheduler: The scheduler monitors all DAGs and triggers task instances once their dependencies are met, handing them off to the executor.
- Worker: The worker nodes execute Airflow tasks.
- Executor: The executor defines how and where tasks run, dispatching them to worker nodes or processes.
- Metadata database: The metadata database stores information about Airflow workflows, tasks, and other entities.
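A quick way to confirm that the scheduler and metadata database are up is the webserver's /health endpoint, which Airflow 2 exposes as JSON. The sketch below queries it with the requests library; the host and port are placeholders for your instance's address.

```python
import requests

# Base URL of the Airflow webserver; replace with your instance's address.
AIRFLOW_BASE_URL = "http://<instance-public-ip>:8080"  # placeholder

# /health reports the status of the metadata database and the scheduler
# (including the scheduler's latest heartbeat).
resp = requests.get(f"{AIRFLOW_BASE_URL}/health", timeout=10)
resp.raise_for_status()
health = resp.json()

print("Metadata DB:", health["metadatabase"]["status"])
print("Scheduler:  ", health["scheduler"]["status"])
```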
How to Connect to the Application (from your local machine):
To access the Apache Airflow web interface from your local machine:
- URL: http://<instance-public-IP>:8080 (substitute your EC2 instance's public IP or hostname)
- Username: admin
- Password: global
Steps to create a DAG, with a sample DAG for reference:
- Access the Airflow web interface using the URL above.
- Log in with the username admin and password global.
- Create DAGs to define workflows.
- Monitor task execution and view task logs.
- Trigger and pause workflows as needed, from the UI or programmatically (see the sketch after this list).
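For programmatic triggering and pausing, Airflow 2's stable REST API can be used instead of the UI. The sketch below is illustrative only: it assumes the instance has the basic-auth API backend enabled, the host is a placeholder, and only the credentials and the DAG id sample_dag come from this guide.

```python
import requests

AIRFLOW_BASE_URL = "http://<instance-public-ip>:8080"  # placeholder
AUTH = ("admin", "global")  # credentials from this guide

# Trigger a new run of the sample DAG via the stable REST API.
run = requests.post(
    f"{AIRFLOW_BASE_URL}/api/v1/dags/sample_dag/dagRuns",
    auth=AUTH,
    json={"conf": {}},
    timeout=10,
)
run.raise_for_status()
print("Triggered run:", run.json()["dag_run_id"])

# Pause the DAG so the scheduler stops creating new runs.
pause = requests.patch(
    f"{AIRFLOW_BASE_URL}/api/v1/dags/sample_dag",
    auth=AUTH,
    json={"is_paused": True},
    timeout=10,
)
pause.raise_for_status()
```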
The sample DAG that we created is in /home/ec2-user/airflow/dag. There you will find a Python file named my_first_dag.py containing the first DAG. Open the Airflow console and search for sample_dag; you should see it in a running state. If you need help creating new DAGs, we can assist, but that would be a separate engagement with us.
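For reference, a minimal DAG along these lines might look like the sketch below. It is illustrative only; the actual contents of my_first_dag.py may differ, and only the DAG id sample_dag is taken from this guide.

```python
from datetime import datetime

from airflow import DAG
from airflow.operators.bash import BashOperator

# A minimal illustrative DAG; the real my_first_dag.py may differ.
with DAG(
    dag_id="sample_dag",
    start_date=datetime(2024, 1, 1),
    schedule="@daily",  # run once a day (Airflow 2.4+ parameter name)
    catchup=False,      # do not backfill runs for past dates
) as dag:
    hello = BashOperator(
        task_id="say_hello",
        bash_command="echo 'Hello from Airflow!'",
    )
```

Once a file like this is placed in the DAGs folder, the scheduler picks it up automatically and the DAG appears in the web interface, where it can be triggered or paused.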
Support
Please contact us at support@theglobalsolutions.net for any questions on this offering in AWS Marketplace.