Data Visualization Case Study

Here’s what the data say about air pollution in Delhi!

I downloaded OpenAQ air quality data for 2016 to 2024 using Python and built visualizations in Tableau to understand how pollution in Delhi has evolved over that period and to estimate where it is heading. The situation isn’t good.

Aayush Malik
4 min read · Nov 18, 2024
Photo by Nik Shuliahin 💛💙 on Unsplash

The workflow involved getting the data and then letting the data answer some questions about air pollution in Delhi, and the article follows the same structure. The first part is for readers interested in the code behind the data. The second part is for those interested in the insights the data reveal.

Data Extraction and Data Transformation

The first step involved downloading data from openaq.org. According to their website, “OpenAQ is a nonprofit organization providing universal access to air quality data to empower a global community of changemakers to solve air inequality — the unequal access to clean air.” The steps I took to download the data included:

  1. Sign up for an account on openaq.org.
  2. Get the API key from Account Settings.
  3. Use this Python script to download the data and process it into a form suitable for visualization in Tableau:
```python
# importing packages
import time

import pandas as pd
import requests

# get your API Key from OpenAQ (Account Settings)
API_KEY = ""

# the URL to get daily measurements for sensor ID 23534
# (one can look up the sensor ID for a specified location)
url = "https://api.openaq.org/v3/sensors/23534/measurements/daily"

headers = {
    "X-API-KEY": API_KEY,
}

# empty list to collect the records from every page
all_records = []

with requests.Session() as s:
    # results are paginated: at most 1000 records can be retrieved per request
    page = 1
    while True:
        params = {
            "limit": 1000,
            "page": page,
        }
        print(f"Making request for page number {page}")
        r = s.get(url, headers=headers, params=params)
        r.raise_for_status()  # fail fast on HTTP errors, e.g. a missing or invalid API key
        data = r.json()

        # extract and store records; an empty page means everything has been fetched
        records = data.get("results", [])
        print(len(records))
        if not records:
            break
        all_records.extend(records)

        time.sleep(0.5)  # pause briefly between requests to avoid hammering the API
        page = data["meta"]["page"] + 1

# convert all records to a DataFrame
df = pd.DataFrame(all_records)


# extract the start and end dates (YYYY-MM-DD) from the nested "period" column
def get_start_date(value):
    return value["datetimeFrom"]["local"][:10]


def get_end_date(value):
    return value["datetimeTo"]["local"][:10]


# applying map to the specified columns
df["start_date"] = df["period"].map(get_start_date)
df["end_date"] = df["period"].map(get_end_date)

# removing zero and negative values, which are not meaningful concentrations
df = df[df["value"] > 0]

# final CSV for Tableau
df.to_csv("output.csv", index=False)
```
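The two `map` helpers in the script pull dates out of the nested `period` dictionaries. As a hedged alternative, `pandas.json_normalize` can flatten those nested keys into dotted column names in one call. The sample records below are hypothetical and only mimic the fields the script reads (`value`, `period.datetimeFrom.local`, `period.datetimeTo.local`); real OpenAQ responses carry more keys, and the helper name `records_to_frame` is illustrative.

```python
import pandas as pd

# Hypothetical records shaped like the fields the script above reads
# (an assumption for illustration; real API responses contain more keys).
sample_records = [
    {
        "value": 182.4,
        "period": {
            "datetimeFrom": {"local": "2023-11-05T00:00:00+05:30"},
            "datetimeTo": {"local": "2023-11-06T00:00:00+05:30"},
        },
    },
    {
        "value": -1.0,
        "period": {
            "datetimeFrom": {"local": "2023-11-06T00:00:00+05:30"},
            "datetimeTo": {"local": "2023-11-07T00:00:00+05:30"},
        },
    },
]


def records_to_frame(records: list[dict]) -> pd.DataFrame:
    """Hypothetical helper: flatten records, add date columns, keep positive values."""
    # json_normalize turns nested keys into dotted names such as "period.datetimeFrom.local"
    flat = pd.json_normalize(records)
    out = pd.DataFrame({
        "value": flat["value"],
        # the first 10 characters of an ISO timestamp are the YYYY-MM-DD date
        "start_date": flat["period.datetimeFrom.local"].str[:10],
        "end_date": flat["period.datetimeTo.local"].str[:10],
    })
    # drop zero and negative readings, as the original script does
    return out[out["value"] > 0].reset_index(drop=True)


clean = records_to_frame(sample_records)
print(clean)
```

Using the vectorized `.str[:10]` accessor avoids a Python-level function call per row, which matters little at this data size but keeps the transformation in one place.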

Data Visualization in Tableau

After downloading and processing the data, I imported it into Tableau as a text file and asked the following questions to visualize the trends over the years.

How is the concentration of PM 2.5 evolving over the years?

Under India’s National Ambient Air Quality Standards (NAAQS), the limit for PM2.5 is 40 μg/m³ as an annual average (and 60 μg/m³ as a 24-hour average).

The chart shows PM 2.5 pollution rising year after year. Because the data for November and December 2024 are incomplete, we can only estimate that 2024 may end up among the worst years for PM 2.5.
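The yearly trend in the chart can be cross-checked outside Tableau. The sketch below assumes the `start_date` and `value` columns produced by the script above; the helper name `yearly_summary` and the sample readings are hypothetical. It uses the Indian NAAQS limits for PM2.5 of 40 μg/m³ (annual average) and 60 μg/m³ (24-hour average). Comparing a simple mean of daily values with the annual standard is a simplification, so the `days_measured` column is included to flag incomplete years such as 2024.

```python
import pandas as pd

ANNUAL_LIMIT = 40.0  # Indian NAAQS annual average limit for PM2.5, in μg/m³
DAILY_LIMIT = 60.0   # Indian NAAQS 24-hour limit for PM2.5, in μg/m³


def yearly_summary(df: pd.DataFrame) -> pd.DataFrame:
    """Hypothetical helper: summarize daily PM2.5 readings by calendar year."""
    years = pd.to_datetime(df["start_date"]).dt.year
    summary = df.groupby(years)["value"].agg(
        mean_pm25="mean",
        days_measured="count",
        days_above_daily_limit=lambda vals: int((vals > DAILY_LIMIT).sum()),
    )
    summary.index.name = "year"
    summary["above_annual_limit"] = summary["mean_pm25"] > ANNUAL_LIMIT
    return summary


# Hypothetical readings, for illustration only
daily = pd.DataFrame({
    "start_date": ["2023-01-01", "2023-01-02", "2024-01-01", "2024-01-02"],
    "value": [30.0, 90.0, 20.0, 50.0],
})
print(yearly_summary(daily))
```

On the real data, one would load it with `pd.read_csv("output.csv")` and pass the result to `yearly_summary`.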

Is there a seasonal trend in PM 2.5 values in Delhi over the years?

Over the years, the monsoon months in Delhi (July, August, and September) have been the “cleanest”.

Looking at PM 2.5 values across all months, the cleanest are July, August, and September, which can be attributed to the monsoon winds from the east and southeast. The summer months, especially April, May, and June, tend to be polluted too, because warm winds from the west carry dust from the Thar Desert. The same pattern appears in the chart below: across the years, only July, August, and September meet the limit set by Indian authorities for PM 2.5 concentration.

Only July, August, and September have average PM 2.5 concentrations below the prescribed norm.
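The seasonal comparison can be sketched in pandas by averaging readings per calendar month across all years and comparing each month with the 40 μg/m³ annual limit, mirroring the chart. This assumes the `start_date` and `value` columns from the script; `monthly_profile` and the sample readings are hypothetical.

```python
import pandas as pd

ANNUAL_LIMIT = 40.0  # Indian NAAQS annual average limit for PM2.5, in μg/m³


def monthly_profile(df: pd.DataFrame) -> pd.Series:
    """Hypothetical helper: average PM2.5 per calendar month, pooled across years."""
    months = pd.to_datetime(df["start_date"]).dt.month
    profile = df.groupby(months)["value"].mean()
    profile.index.name = "month"
    return profile


# Hypothetical readings, for illustration only
daily = pd.DataFrame({
    "start_date": ["2022-07-10", "2023-07-12", "2022-11-03", "2023-11-04"],
    "value": [25.0, 35.0, 240.0, 260.0],
})
profile = monthly_profile(daily)
cleanest = profile[profile < ANNUAL_LIMIT].index.tolist()
print(profile)
print("Months below the limit:", cleanest)
```

Pooling daily values gives years with more readings more weight; averaging each year’s monthly mean first would weight every year equally, which may be preferable when some years have gaps.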

Is pollution decreasing in the worst months, November and January?

November PM 2.5 Concentration
January PM 2.5 Concentration
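To check the two charts numerically, one can tabulate the mean PM2.5 for November and January in each year and take year-over-year differences. This is a sketch assuming the `start_date` and `value` columns from the script; `worst_month_trend` and the sample readings are hypothetical. Note that it groups by calendar year, so a January is not paired with the preceding November of the same winter.

```python
import pandas as pd


def worst_month_trend(df: pd.DataFrame, months=(11, 1)) -> pd.DataFrame:
    """Hypothetical helper: mean PM2.5 per year for the given months (one column each)."""
    dates = pd.to_datetime(df["start_date"])
    subset = df.assign(year=dates.dt.year, month=dates.dt.month)
    subset = subset[subset["month"].isin(months)]
    table = subset.pivot_table(index="year", columns="month", values="value", aggfunc="mean")
    return table.reindex(columns=list(months))


# Hypothetical readings, for illustration only
daily = pd.DataFrame({
    "start_date": ["2022-11-01", "2022-11-02", "2023-01-05", "2023-11-01", "2024-01-03"],
    "value": [300.0, 200.0, 280.0, 240.0, 260.0],
})
table = worst_month_trend(daily)
changes = table.diff()  # negative values mean the month was cleaner than the year before
print(table)
print(changes)
```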

Can we estimate PM 2.5 values for 2025?

Pollution Estimates for 2025 as per PM 2.5 Concentration

Using Tableau’s time series forecasting tools, one can get an initial estimate of the trend in pollution levels for 2025. Such an estimate can help authorities take preemptive action against the rise in pollution that can be seen from October onwards. Timely GRAP (Graded Response Action Plan) measures can mitigate the effects of pollution before it gets worse.
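Tableau’s built-in forecasting fits exponential smoothing models. The sketch below deliberately swaps in a simpler, more transparent method: a linear trend fitted to annual means plus an average offset for each calendar month. It is not what Tableau computes and is only meant to illustrate how a trend and a seasonal pattern combine into a forecast. It assumes the `start_date` and `value` columns from the script; `trend_plus_season_forecast` and the synthetic data are hypothetical.

```python
import numpy as np
import pandas as pd


def trend_plus_season_forecast(df: pd.DataFrame, target_year: int) -> pd.Series:
    """Hypothetical helper: forecast monthly mean PM2.5 for target_year.

    Linear trend on annual means plus average monthly offsets
    (a simpler stand-in for Tableau's exponential smoothing).
    """
    dates = pd.to_datetime(df["start_date"])
    monthly = (
        df.assign(year=dates.dt.year, month=dates.dt.month)
        .groupby(["year", "month"])["value"]
        .mean()
        .unstack("month")              # rows: years, columns: months
        .reindex(columns=range(1, 13))  # missing months become NaN
        .dropna()                       # keep only complete years
    )
    if len(monthly) < 2:
        raise ValueError("need at least two complete years to fit a trend")
    annual = monthly.mean(axis=1)
    # least-squares line through the annual means
    slope, intercept = np.polyfit(annual.index.to_numpy(dtype=float), annual.to_numpy(), deg=1)
    # how far each month sits from its year's mean, averaged over years
    offsets = monthly.sub(annual, axis=0).mean(axis=0)
    forecast = slope * target_year + intercept + offsets
    forecast.index.name = "month"
    return forecast


# Hypothetical synthetic data: one reading per month for 2016-2023,
# built from a known trend (+10 per year) and a fixed seasonal pattern.
season = [30, 20, 10, 0, -10, -20, -30, -20, -10, 0, 10, 20]
rows = [
    {"start_date": f"{year}-{month:02d}-01",
     "value": 100 + 10 * (year - 2016) + season[month - 1]}
    for year in range(2016, 2024)
    for month in range(1, 13)
]
forecast_2025 = trend_plus_season_forecast(pd.DataFrame(rows), target_year=2025)
print(forecast_2025.round(1))
```

Because only complete years are kept, 2024 (with November and December missing) would be left out of the fit on the real data rather than biasing its annual mean.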

For more insights into similar phenomena using open data, follow me on Medium and LinkedIn.

Written by Aayush Malik

Open Data | Causal Inference | Machine Learning | Data Visualization and Communication | https://www.linkedin.com/in/aayushmalik/

No responses yet