
Urban PM2.5 Methodology
A reproducible and extensible analytical pipeline for applied data science
Overview
This repository hosts the open, reproducible materials developed as a methodological extension of a doctoral thesis in Applied Data Science, focused on the analysis of urban PM2.5 concentrations.
Rather than proposing a single optimal predictive model, the project introduces a coherent analytical pipeline designed for real-world environmental data, where temporal structure, interpretability, and reproducibility are prioritised over algorithmic benchmarking.
The project is presented as an example of research continuity, illustrating how a doctoral research line can evolve beyond the thesis itself.
Research motivation
Urban air quality analysis is often approached through fragmented workflows, ad hoc modelling decisions, or performance-driven comparisons that overlook methodological coherence.
This project responds to several recurrent limitations observed in applied data science practice:
- Insufficient attention to the temporal structure of environmental data.
- Overemphasis on predictive performance at the expense of interpretability.
- Superficial adoption of emerging technologies without methodological grounding.
- Limited reproducibility in applied environmental analytics.
The proposed pipeline addresses these issues by prioritising analytical clarity, transparency, and methodological robustness.
Reproducible methodological workflow
The analytical workflow follows a structured and sequential pipeline:
1. Data preparation and harmonisation
   Validation of raw observations, temporal parsing, and explicit handling of missing values.
2. Exploratory and structural analysis
   Examination of temporal dynamics, seasonal behaviour, and data gaps to understand the data-generating process.
3. Classical modelling and evaluation
   Simple baseline and linear models with time-based validation to establish methodological reference points.
4. Quantum methodological extension
   Integration of a quantum kernel-based model under NISQ constraints, without claims of performance superiority.
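The first step of the workflow above (data preparation and harmonisation) can be illustrated with a minimal sketch. It assumes hourly observations supplied as (ISO timestamp, raw value) string pairs aligned to the hour; the input layout, the hourly frequency, and the function name `harmonise_hourly` are hypothetical and are not taken from the repository's scripts. Invalid readings and absent hours are turned into explicit `None` gaps rather than being dropped or interpolated silently.

```python
from datetime import datetime, timedelta
import math


def harmonise_hourly(rows):
    """Return a gap-free hourly series as (timestamp, value-or-None) pairs.

    `rows` is an iterable of (iso_timestamp, raw_value) string pairs
    (an assumed layout). Invalid or missing readings become None so
    that gaps stay visible downstream.
    """
    parsed = {}
    for stamp, raw in rows:
        ts = datetime.fromisoformat(stamp)
        try:
            value = float(raw)
        except (TypeError, ValueError):
            value = None
        # A PM2.5 concentration cannot be negative or non-finite.
        if value is not None and (not math.isfinite(value) or value < 0):
            value = None
        parsed[ts] = value

    if not parsed:
        return []

    series = []
    current, end = min(parsed), max(parsed)
    while current <= end:
        # Hours absent from the raw data are inserted as explicit gaps.
        series.append((current, parsed.get(current)))
        current += timedelta(hours=1)
    return series


raw = [
    ("2024-01-01T00:00:00", "12.5"),
    ("2024-01-01T01:00:00", "-999"),  # sentinel-style invalid reading
    ("2024-01-01T03:00:00", "14.0"),  # 02:00 is missing entirely
]
series = harmonise_hourly(raw)
n_missing = sum(1 for _, value in series if value is None)
print(len(series), n_missing)
```

Keeping gaps as explicit entries means the exploratory step can count and locate them, instead of inferring them from irregular timestamps.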
This workflow reflects a methodological stance, not a technology-driven comparison.
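To give a sense of what a quantum kernel computes, the sketch below classically simulates the fidelity kernel of a very simple feature map in which each feature is angle-encoded by an RY rotation on its own qubit. This feature map is an illustrative assumption; the repository's model is built with Qiskit and its actual circuit is not described here. For this product-state encoding the kernel reduces to k(x, y) = Π cos²((x_i − y_i) / 2), so it can be evaluated without a quantum backend.

```python
import math


def encode(x):
    """Per-qubit amplitudes (on |0>, on |1>) after RY(x_i) acts on |0>."""
    return [(math.cos(v / 2), math.sin(v / 2)) for v in x]


def fidelity_kernel(x, y):
    """Squared overlap |<psi(x)|psi(y)>|^2 of two product states."""
    overlap = 1.0
    for (a0, a1), (b0, b1) in zip(encode(x), encode(y)):
        # Amplitudes are real, so the per-qubit inner product is a dot product.
        overlap *= a0 * b0 + a1 * b1
    return overlap ** 2


def kernel_matrix(samples):
    """Gram matrix of the fidelity kernel over a list of feature vectors."""
    return [[fidelity_kernel(a, b) for b in samples] for a in samples]


# Hypothetical feature vectors, assumed to be pre-scaled to a bounded range.
samples = [[0.1, 0.8], [0.2, 0.7], [2.5, 0.1]]
K = kernel_matrix(samples)
for row in K:
    print([round(value, 4) for value in row])
```

A matrix like `K` could be handed to any kernel method that accepts a precomputed kernel. The sketch makes no statement about performance; it only shows the kind of similarity measure involved.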
Analytical philosophy
The pipeline is guided by the following principles:
- Sequential structure: each analytical step builds explicitly on the previous one.
- Time-aware validation: no random splits are applied to temporal data.
- Model parsimony: simple and interpretable models are favoured where appropriate.
- Reproducibility by design: all results are generated programmatically from documented scripts.
- Extensibility: emerging analytical paradigms can be integrated without structural changes.
This philosophy underpins both the doctoral thesis and its extension presented here.
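The time-aware validation and model parsimony principles above can be sketched as follows. The example splits a series chronologically, then compares a persistence baseline with a lag-1 linear model using one-step-ahead mean absolute error. The data values, the 80/20 split, the lag-1 specification, and all function names are assumptions made for illustration, not the repository's configuration.

```python
def chronological_split(series, train_fraction=0.8):
    """Split an ordered series so every test point follows every train point."""
    cut = int(len(series) * train_fraction)
    return series[:cut], series[cut:]


def mae(actual, predicted):
    """Mean absolute error between two equally long sequences."""
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)


def fit_lag1_linear(train):
    """Ordinary least squares for y_t = intercept + slope * y_{t-1}."""
    x, y = train[:-1], train[1:]
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope


# Hypothetical hourly PM2.5 values, already ordered in time.
values = [12.0, 13.5, 15.0, 14.0, 16.5, 18.0,
          17.0, 19.5, 21.0, 20.0, 22.5, 24.0]
train, test = chronological_split(values)

# One-step-ahead inputs: the observation immediately before each test point.
previous = [train[-1]] + test[:-1]

# Persistence baseline: the forecast is simply the last observed value.
persistence_mae = mae(test, previous)

# Lag-1 linear model, fitted on the training period only.
intercept, slope = fit_lag1_linear(train)
linear_mae = mae(test, [intercept + slope * p for p in previous])

print(f"persistence MAE = {persistence_mae:.2f}")
print(f"lag-1 linear MAE = {linear_mae:.2f}")
```

Because the split is by position in time rather than at random, no information from the test period can leak into the fitted coefficients.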
Repository structure
urban-pm25-methodology/
├── docs/ # Rendered Quarto website (GitHub Pages)
│   ├── index.html
│   ├── scripts.html
│   └── notebook.html
├── data/ # Clean analysis-ready dataset
├── figures_tiff/ # High-resolution figures for publication
├── scripts/ # R and Python scripts implementing the pipeline
├── index.qmd
├── scripts.qmd
├── notebook.qmd
├── _quarto.yml
├── LICENSE
└── README.md
Technologies Used
| Category | Tools / Packages |
|---|---|
| Programming | R 4.4+ · Quarto |
| Data Handling | tidyverse · arrow · lubridate · readr |
| Visualisation | ggplot2 · cowplot |
| Modelling | Linear models · Persistence baselines |
| Quantum Computing | Qiskit · Qiskit Aer · Qiskit Machine Learning |
| Reproducibility | Git · GitHub Pages · Conda |
Reproducibility and openness
All analyses are fully reproducible and rely exclusively on open-source tools.
- Scripts run in a fixed sequence, and each one can be executed as a standalone step.
- No manual data manipulation is performed.
- Computational environments are explicitly documented.
- Results are generated programmatically from source code.
This approach supports transparent, verifiable, and reusable research practices.
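One possible way to make generated results verifiable is to record a small manifest alongside them. The sketch below stores the interpreter version, the platform, and a SHA-256 checksum for every file in an output directory. It is an assumption about how such a record could look, not the mechanism used in this repository; `build_manifest` and `file_sha256` are hypothetical names.

```python
import hashlib
import json
import platform
import sys
from pathlib import Path


def file_sha256(path):
    """Return the SHA-256 hex digest of a file, read in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(65536), b""):
            digest.update(chunk)
    return digest.hexdigest()


def build_manifest(output_dir):
    """Describe the environment and checksum every file under output_dir."""
    root = Path(output_dir)
    files = sorted(p for p in root.rglob("*") if p.is_file())
    return {
        "python": sys.version.split()[0],
        "platform": platform.platform(),
        "files": {
            p.relative_to(root).as_posix(): file_sha256(p) for p in files
        },
    }


if __name__ == "__main__":
    # "figures_tiff" comes from the repository layout; adjust as needed.
    target = Path("figures_tiff")
    if target.is_dir():
        print(json.dumps(build_manifest(target), indent=2, sort_keys=True))
```

Re-running the pipeline and comparing manifests shows whether an output changed. Some file formats embed timestamps or other metadata, so differing checksums do not always mean differing results.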
Fig. 1. Methodological pipeline for urban PM2.5 analysis, illustrating the sequential integration of data preparation, exploratory analysis, classical modelling, and a demonstrative quantum extension under NISQ constraints.
Relation to the doctoral thesis
This project constitutes a direct methodological extension of the doctoral thesis and is presented as evidence that the research line remains active, adaptable, and open to emerging analytical methodologies.
It is not intended as a standalone technological breakthrough, but as a coherent continuation of a broader research programme in applied data science and urban environmental analytics.
Bibliographic Resources
Bibliographic resources associated with the doctoral thesis and related publications will be made available in this repository.
Citation
If you reuse or adapt this resource, please cite as:
Cáceres-Tello, J., & Galán-Hernández, J. J. (2025).
Urban PM2.5 Methodology: A reproducible analytical pipeline for applied data science. Available at: https://jcaceres-academic.github.io/urban-pm25-methodology/
License
- Code and notebooks: Creative Commons Attribution 4.0 (CC BY 4.0)
- Data (if reused): CC0 1.0 Public Domain Dedication
Contact
Jesús Cáceres Tello
Department of Computer Systems and Computing
Universidad Complutense de Madrid
jcaceres.academic@gmail.com
jescacer@ucm.es
This repository supports open, transparent, and reproducible research in environmental data science and STEM education.