Price difference project (Airflow data pipeline)
I prepared this project to demonstrate a data pipeline built with Airflow. It is a simple example that fetches a single user's Uniswap swap data (WETH/USDC) from Dune via their API and compares the execution prices with prices from the Crypto Compare API.
Project Components
- GitHub repo: See the full repo here: Price difference project
- Airflow: Orchestrates the pipeline’s tasks on a daily basis.
- AWS PostgreSQL Database: Stores all data.
- Dune Analytics: Swap data is fetched from a Dune query that supports parameterized fetching, so only the most recent data is retrieved (see the fetch sketch after this list). View the query here.
- Crypto Compare API: Provides near-real-time market prices to compare against trade execution prices.
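For reference, here is a minimal sketch of what the parameterized fetch could look like against Dune's v1 REST API. The query id, the `last_fetched_at` parameter name, and the polling interval are illustrative assumptions, not the project's actual values:

```python
import os
import time

import requests

API_BASE = "https://api.dune.com/api/v1"
HEADERS = {"X-Dune-API-Key": os.environ["DUNE_API_KEY"]}
QUERY_ID = 1234567  # placeholder, not the project's real Dune query id


def fetch_recent_swaps(last_fetched_at: str) -> list[dict]:
    """Execute the parameterized Dune query and return the result rows."""
    # Kick off an execution, passing the watermark as a query parameter
    # ("last_fetched_at" is a hypothetical parameter name).
    resp = requests.post(
        f"{API_BASE}/query/{QUERY_ID}/execute",
        headers=HEADERS,
        json={"query_parameters": {"last_fetched_at": last_fetched_at}},
    )
    resp.raise_for_status()
    execution_id = resp.json()["execution_id"]

    # Poll until the execution finishes.
    while True:
        status = requests.get(
            f"{API_BASE}/execution/{execution_id}/status", headers=HEADERS
        ).json()
        if status["state"] == "QUERY_STATE_COMPLETED":
            break
        if status["state"] in ("QUERY_STATE_FAILED", "QUERY_STATE_CANCELLED"):
            raise RuntimeError(f"Dune execution did not complete: {status}")
        time.sleep(5)

    # Fetch and return the result rows.
    results = requests.get(
        f"{API_BASE}/execution/{execution_id}/results", headers=HEADERS
    ).json()
    return results["result"]["rows"]
```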
Repository Structure
In this repository, please explore the following key directories and files:
- dags/: Contains the Airflow DAGs defining the workflow, along with the Python scripts that fetch data from the Dune and Crypto Compare APIs and calculate the price difference (a minimal DAG sketch follows this list).
- db_management/: Includes scripts for database operations such as creating views (see the view-creation sketch after this list).
- Dockerfile, docker-compose.yml, .env and requirements.txt: These files are used to build and run the Docker containers.
- notebooks/: Jupyter notebooks for interacting with the code locally. To use them, activate the .conda virtual environment or build your own from the requirements_local_conda.txt file.
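For orientation, here is a minimal sketch of what the DAG in dags/ could look like, assuming a recent Airflow 2.x. The stub callables and the two fetch task ids are assumptions; only the price_difference_dag id and the calculate_price_difference task name appear elsewhere in this README:

```python
from datetime import datetime

from airflow import DAG
from airflow.operators.python import PythonOperator


# Stub callables standing in for the repo's actual fetch/compare scripts.
def fetch_dune_data(**context):
    """Pull the latest WETH/USDC swaps from the Dune query."""


def fetch_cryptocompare_prices(**context):
    """Pull reference prices from Crypto Compare."""


def calculate_price_difference(**context):
    """Compare execution prices against market prices."""


with DAG(
    dag_id="price_difference_dag",
    schedule="@daily",               # the pipeline runs on a daily basis
    start_date=datetime(2024, 1, 1),
    catchup=False,
) as dag:
    fetch_swaps = PythonOperator(
        task_id="fetch_dune_data", python_callable=fetch_dune_data
    )
    fetch_prices = PythonOperator(
        task_id="fetch_cryptocompare_prices",
        python_callable=fetch_cryptocompare_prices,
    )
    compare = PythonOperator(
        task_id="calculate_price_difference",
        python_callable=calculate_price_difference,
    )

    # Both fetches must complete before the comparison runs.
    [fetch_swaps, fetch_prices] >> compare
```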
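And a hypothetical db_management-style script creating a view over the results; the table and column names here are assumptions for illustration:

```python
import os

import psycopg2

# View exposing per-swap price differences in percent
# (swaps_with_prices and its columns are assumed names).
CREATE_VIEW_SQL = """
CREATE OR REPLACE VIEW price_difference_summary AS
SELECT
    swap_time,
    execution_price,
    market_price,
    (execution_price - market_price) / market_price * 100 AS price_diff_pct
FROM swaps_with_prices;
"""


def main() -> None:
    conn = psycopg2.connect(os.environ["POSTGRES_DSN"])  # assumed env var
    try:
        # The connection context manager commits the transaction on success.
        with conn, conn.cursor() as cur:
            cur.execute(CREATE_VIEW_SQL)
    finally:
        conn.close()


if __name__ == "__main__":
    main()
```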
Running the Project
- Environment Setup: Ensure Docker and VS Code are installed.
- Start the Application:
- Open the project folder in VS Code.
- Run the Docker command below and wait for the services to initialize:
docker compose up
- Access Airflow UI:
- Navigate to http://localhost:8080/home in your web browser.
- Log in with username airflow and password airflow.
- Execute and Monitor Pipeline:
- Locate the price_difference_dag and trigger it manually by clicking the “play” button.
- Monitor task progress and inspect the task dependencies in the DAG’s Graph View by clicking the DAG’s name and then “Graph”.
- View Results:
- Access results from the DAG’s execution logs under logs/dag_id=price_difference_dag/run_id=manual__[timestamp]/task_id=calculate_price_difference. Look for the attempt=1.log file in the folder with the latest timestamp. Inside the file, search for three hash marks (“###”) to find a line like this: [2024-07-29T13:04:29.342+0000] {calculate_price_difference.py:24} INFO - ### The average price difference for all the swaps was -0.175 percent ### (A sketch of the underlying calculation follows.)
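The percentage in that log line is a simple relative difference averaged over all swaps. A minimal sketch of the calculation, with assumed column names:

```python
import pandas as pd


def average_price_difference(swaps: pd.DataFrame) -> float:
    """Average percent difference between execution and market prices.

    Expects 'execution_price' and 'market_price' columns (assumed names).
    """
    diff_pct = (
        (swaps["execution_price"] - swaps["market_price"])
        / swaps["market_price"]
        * 100
    )
    return diff_pct.mean()


# Example: an execution price of 2495 vs. a market price of 2500 is -0.2 percent.
df = pd.DataFrame({"execution_price": [2495.0], "market_price": [2500.0]})
print(
    f"### The average price difference for all the swaps was "
    f"{average_price_difference(df):.3f} percent ###"
)
```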
Stopping the Project and Cleaning Up
To halt the project and clean up Docker images, use:
docker compose down --rmi all
Considerations and improvements for production readiness
Unfortunately, Crypto Compare only provides the last 7 days of prices at minute granularity, so I opted for Crypto Compare’s daily closing prices instead.
These are obviously not very accurate, but the goal of the project was to show a working pipeline rather than interesting analytical findings. A sketch of the daily-close fetch is shown below.
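A minimal sketch of pulling those daily closes from Crypto Compare’s public min-api (the histoday endpoint; verify the details against their current API reference):

```python
import os

import requests


def fetch_daily_closes(fsym: str = "ETH", tsym: str = "USDC", days: int = 30) -> dict:
    """Return a mapping of unix day timestamp -> daily closing price."""
    resp = requests.get(
        "https://min-api.cryptocompare.com/data/v2/histoday",
        params={"fsym": fsym, "tsym": tsym, "limit": days},
        headers={
            "authorization": f"Apikey {os.environ['CRYPTOCOMPARE_API_KEY']}"
        },
    )
    resp.raise_for_status()
    candles = resp.json()["Data"]["Data"]
    return {c["time"]: c["close"] for c in candles}
```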
In transitioning to a production environment, I would consider the following:
- Security: Implement managed secret handling services like AWS Secrets Manager for sensitive data.
- Scalability: Utilize Celery with Airflow for distributed task execution.
- Reliability: Enhance error handling with alerts. Add comprehensive logging and monitoring.
- Data Integrity: Implement data quality checks and validations throughout the data pipeline (see the sketch after this list).
- Compliance and Documentation: Maintain thorough documentation and ensure compliance with data protection regulations.
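As an example of the data integrity point above, a small validation step could fail the run early on bad input; the column names here are assumptions:

```python
import pandas as pd


def validate_swaps(swaps: pd.DataFrame) -> None:
    """Raise if the fetched swap data fails basic quality checks."""
    required = {"swap_time", "execution_price", "market_price"}

    # All expected columns must be present.
    missing = required - set(swaps.columns)
    if missing:
        raise ValueError(f"missing columns: {missing}")

    # No nulls in the columns the calculation depends on.
    if swaps[list(required)].isna().any().any():
        raise ValueError("null values found in swap data")

    # Prices must be strictly positive.
    if (swaps[["execution_price", "market_price"]] <= 0).any().any():
        raise ValueError("non-positive prices found in swap data")
```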