Simple Airports Analysis - Malaysia

Made with Python Python version GitHub code size in bytes GitHub repo size Website GitHub

Contents

Introduction

In this project, we want to run simple analysis on airport and arrival data.

We want to know:

  1. How many airports are there in Malaysia?
  2. What is the distance between the airports in Malaysia?
  3. How many flights are going to land in Malaysian airports the next day from the point of query?
  4. Which airport is most congested base on the information gathered in question 3 above?

To answer these questions, we built an end-to-end framework with ETL pipelines, a database, and an analytics platform as seen in the Project Framework diagram below. (Admittedly a little over-engineered but, hey! Why not…? :trollface:)

The relationship between the tables used to answer the questions above can be visualised using the Table Lineage.

We also hosted a version of the dbt documentation on GitHub which would contain the table descriptions, data dictionaries, and the SQL queries—both raw and compiled—used to generate the tables.

Finally, to visualise the data and help us answer the questions, we used Metabase—an open source business intelligence tool—to create an Analytics Dashboard.


Project Framework

Covered in Step 1, 2, and 3 airports-analysis-framework

Table Lineage (via dbt)

Covered in Step 4 (Optional) - view the dbt documentation table-lineage-dbt

Table Documentation (via dbt)

Covered in Step 4 (Optional) - view the dbt documentation table-lineage-dbt

Metabase Dashboard

Covered in Step 3 (Optional) airport-analysis-metabase


Technology Used

| Database | (ETL/ELT) Pipeline | Workflow Orchestration | Analytics Platform | |—————————|—————————–|—————————-|—————————| | Postgres (using Docker) | Python 3.6 | Luigi (using Python) | Metabase (using Docker) | | | dbt (Data Transformation) | | |

Developer notes

This project was created on macOS. It has not yet been tested on Windows.

Requirements

Data Source

We primarily use Airport and Arrival data from these two sources:

Using this project

Step 1. Initial setup

1) Install Python 3.6 using these instructions.

(Optional) Use a Python virtual environment

2) IMPORTANT - Install required libraries, at the command line: bash $ pip install -r requirements.txt This will install the Python libraries required for this project:

3) Install Docker using these instructions.

4) Install PostgresSQL via Docker using the instructions below (source), at the command line: ```bash $ docker pull postgres

$ mkdir -p $HOME/docker/volumes/postgres

$ docker run --rm --name pg-docker -e POSTGRES_PASSWORD=docker -d -p 5432:5432 -v $HOME/docker/volumes/postgres:/var/lib/postgresql/data postgres
```

5) Install a suitable SQL Client such as DBeaver.

Use the default connection configuration to verify the database connection and view the data that will be added in the later steps:
```yaml
Host:     localhost 
Database: postgres
Port:     5432
User:     postgres
Password: docker
```

![database-sql-client-dbeaver](/simple-airports-analysis/docs/img/database-sql-client-dbeaver.png)

Step 2. Running the workflow

This step utilises the Luigi workflow orchestration that automates the entire ELT pipeline.

To run each tasks in the workflow manually, see this guide. 1) Run the workflow.py file which will initiate the luigi workflow orchestration, at the command line: bash $ python workflow.py --local-scheduler DbtRunAnalysis Luigi will work through all the steps as defined in workflow.py.

> _Notes: This step may take some time since it is running the entire pipeline._
> 
> _**(Optional)** Luigi has a web interface which can be used to monitor workflow and view dependency graphs as shown below. To replicate, follow [this guide](/simple-airports-analysis/docs/visualising-luigi.html)._ 

![workflow-visualisation-luigi](/simple-airports-analysis/docs/img/workflow-visualisation-using-luigi.png)

2) The following will be returned when the previous step completes successfully: ```bash INFO: ===== Luigi Execution Summary =====

Scheduled 7 tasks of which:
* 7 ran successfully:
    - 1 DbtDeps()
    - 1 DbtRunAirports()
    - 1 DbtRunAnalysis()
    - 1 DbtSeedAirports()
    - 1 DbtSeedArrivals()
    ...

This progress looks :) because there were no failed tasks or missing dependencies

===== Luigi Execution Summary =====
```
> _Notes: You can view the 'logs' for each step in the files with `.output` extension that is generated in the root of the project directory._

3) The data should ready in the database as tables and can be viewed using the SQL Client that was installed in Step 1.5 above.

Step 3. Analysis on Metabase

Optional Step! Alternatively, the data can be viewed directly using the SQL client as mentioned in Step 1.5.

1) Install Metabase via Docker using the instructions below (source), at the command line: bash $ docker run -d -p 3000:3000 --name metabase metabase/metabase

Wait a little while as it takes time to initialise. You can use docker logs -f metabase to follow the rest of the initialization progress.

2) Once initialised, visit the local Metabase portal at http://localhost:3000/.

3) You will be directed to http://localhost:3000/setup, click on Let's get started. setup-metabase-0

4) Fill in the details for your admin account. setup-metabase-1

5) Important step - Use the following database connection configuration to connect Metabase to our Postgres instance in Docker: yaml Name: Local (Or any other name) Host: host.docker.internal Port: 5432 Database name: postgres Database username: postgres Database password: docker > Please note that this has been tested to work on macOS. For Windows, if Metabase fails to connect to the above Host, please try referring to this official documentation on Docker Networking.

![setup-metabase-2](/simple-airports-analysis/docs/img/setup-metabase-2.png)

6) Choose your usage data preference. setup-metabase-3

7) Done! You should see the main Metabase page below which lists the available tables in the Postgres Database that was created in Step 2. setup-metabase-4

8) Unfortunately, there is no export feature to share the dashboard and charts seen in the Introduction. However, it is just a few simple SELECT * FROM table queries to the main fct_ and some of the stg_ tables available in the Postgres database. :smile:

Step 4. Viewing dbt documentation

Optional Step!

1) In the ./dbt directory, run the following commands to generate the documentation: bash $ cd ./dbt $ dbt docs generate --profiles-dir ./

2) Once the documentation has been generated, run the following commands to host the documentation website locally: bash $ dbt docs serve

3) The documentation website should automatically be launched in your browser.

If it does not, just navigate to [http://localhost:8080/](http://localhost:8080/).

4) Alternatively, you can view the version we hosted on GitHub - dbt documentation.

Step 5. Done

:shipit:

Future Features and Improvements

As this project was only done in more or less 3 days, there were some features or improvements that were not implemented.

Here are a few suggestions:

Author