Scripts and Evaluation Workflow
Reproducible Python pipeline for Urban PM2.5 imbalance-aware classification
Purpose of this section
This page documents the Python scripts that implement the imbalance-aware evaluation framework developed in this project.
The objective is not algorithmic benchmarking but a transparent, modular analytical architecture for evaluating classification models under:
- Temporal constraints
- Structural class imbalance
- Operational computational cost
All scripts are openly available in the repository and can be executed independently.
Methodological position
The workflow is structured around the following principles:
- Chronological integrity (no random splits)
- Explicit imbalance analysis
- Separation between preprocessing, training, and evaluation
- Cost-aware performance assessment
- Full reproducibility using open Python tools
The scripts are modular, traceable, and designed for deployment-oriented evaluation rather than experimental optimisation.
Script overview
01 - Data acquisition (EEA / OpenAQ)
eea_read.py
This script handles the acquisition of raw PM2.5 data from official sources.
Main objectives:
- Load hourly PM2.5 measurements.
- Standardise column formats.
- Validate temporal consistency.
- Store structured raw datasets.
It establishes the empirical foundation of the pipeline.
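The acquisition step can be sketched as follows. The column names (`datetime`, `value`) and the one-hour gap check are illustrative assumptions, not the exact schema or validation logic used by eea_read.py:

```python
import pandas as pd

def load_hourly_pm25(path):
    """Load raw hourly PM2.5 measurements and standardise the schema.

    Column names ('datetime', 'value') are illustrative; raw EEA / OpenAQ
    exports may use different headers.
    """
    df = pd.read_csv(path, parse_dates=["datetime"])
    df = df.rename(columns={"value": "pm25"})
    # Enforce temporal consistency: sorted, unique timestamps
    df = df.sort_values("datetime").drop_duplicates("datetime")
    # Flag gaps longer than one hour for later inspection
    gaps = df["datetime"].diff().dt.total_seconds().gt(3600).sum()
    if gaps:
        print(f"Warning: {gaps} gap(s) longer than one hour detected")
    return df
```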
02 - Preprocessing and harmonisation
preprocess_pm25_lisbon.py
Transforms raw hourly observations into analysis-ready daily data.
Main objectives:
- Daily aggregation.
- WHO-based class definition.
- Missing value inspection.
- Label encoding (Low / Moderate / High).
- Export of consolidated parquet files.
This step defines the structural imbalance observed in the High category.
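The aggregation and labelling logic can be sketched with pandas. The cut points below are illustrative placeholders; the actual WHO-based thresholds applied in preprocess_pm25_lisbon.py may differ:

```python
import pandas as pd

# Illustrative thresholds (µg/m³); the actual WHO-based cut points
# used in the preprocessing script may differ.
BINS = [0, 15, 35, float("inf")]
LABELS = ["Low", "Moderate", "High"]

def to_daily_classes(hourly: pd.DataFrame) -> pd.DataFrame:
    """Aggregate hourly PM2.5 to daily means and attach class labels."""
    daily = (
        hourly.set_index("datetime")["pm25"]
        .resample("D").mean()
        .to_frame("pm25_daily")
    )
    daily["label"] = pd.cut(daily["pm25_daily"], bins=BINS, labels=LABELS)
    # Integer encoding for modelling (Low=0, Moderate=1, High=2)
    daily["y"] = daily["label"].cat.codes
    return daily
```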
03 - Model-ready dataset construction
model_ready.py
Prepares the dataset for classification modelling.
Main objectives:
- Feature construction.
- Temporal ordering enforcement.
- Train/Test split:
  - Train: 2021–2022
  - Test: 2023
- Final matrix preparation for modelling.
Chronological integrity is strictly preserved.
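The split logic reduces to a calendar-year filter rather than a random shuffle; a minimal sketch, assuming a DatetimeIndex as produced by the daily aggregation step:

```python
import pandas as pd

def chronological_split(daily: pd.DataFrame):
    """Split by calendar year, never randomly: train on 2021-2022, test on 2023.

    Assumes `daily` carries a DatetimeIndex.
    """
    train = daily[daily.index.year <= 2022]
    test = daily[daily.index.year == 2023]
    # Sanity check: no temporal leakage from the test period into training
    assert train.index.max() < test.index.min()
    return train, test
```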
04 - Model training
entrenamiento.py
Implements supervised classification models:
- Logistic Regression
- Random Forest
- XGBoost
- Multi-Layer Perceptron (MLP)
Main objectives:
- Fit models on training data.
- Compute predictions on 2023 test data.
- Measure training and inference time.
- Export raw prediction outputs.
The script prioritises structured evaluation rather than hyperparameter search.
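The fit-predict-time loop can be sketched as below. Only two of the four models are shown, with default hyperparameters in line with the no-tuning policy; the model set and any settings in entrenamiento.py may differ:

```python
import time
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression

# Two representative models with default hyperparameters (no tuning);
# the actual script also trains XGBoost and an MLP.
MODELS = {
    "logreg": LogisticRegression(max_iter=1000),
    "rf": RandomForestClassifier(n_estimators=100, random_state=0),
}

def train_and_time(X_train, y_train, X_test):
    """Fit each model, predict on the test period, and record wall-clock cost."""
    results = {}
    for name, model in MODELS.items():
        t0 = time.perf_counter()
        model.fit(X_train, y_train)
        t_fit = time.perf_counter() - t0
        t0 = time.perf_counter()
        y_pred = model.predict(X_test)
        t_pred = time.perf_counter() - t0
        results[name] = {"y_pred": y_pred, "train_s": t_fit, "infer_s": t_pred}
    return results
```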
05 - Metrics extraction and export
extraer_metricas.py
exportar_metricas_balanced.py
These scripts compute imbalance-aware metrics:
- Accuracy
- Macro-F1
- Balanced Accuracy
- Class-specific Precision / Recall / F1 (High class)
- Computational cost (training + inference time)
Outputs are formatted for:
- Publication tables
- Web visualisation
- Trade-off analysis
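The headline metrics map directly onto scikit-learn; a minimal sketch, assuming the High class is encoded as 2 (the encoding is an assumption, not taken from the scripts):

```python
from sklearn.metrics import (accuracy_score, balanced_accuracy_score,
                             f1_score, precision_recall_fscore_support)

def imbalance_aware_metrics(y_true, y_pred, high_label=2):
    """Compute the headline metrics; `high_label` marks the rare 'High' class."""
    prec, rec, f1, _ = precision_recall_fscore_support(
        y_true, y_pred, labels=[high_label], zero_division=0
    )
    return {
        "accuracy": accuracy_score(y_true, y_pred),
        "macro_f1": f1_score(y_true, y_pred, average="macro"),
        "balanced_accuracy": balanced_accuracy_score(y_true, y_pred),
        "high_precision": prec[0],
        "high_recall": rec[0],
        "high_f1": f1[0],
    }
```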
06 - Visual analytics
plot_radar_balanced.py
plot_ranking_scenarios.py
tradeoff_map.py
generate_tradeoff_map_pm25_lisbon_2021_2023.py
These scripts produce the graphical synthesis of the evaluation:
- Radar comparison under the balanced scenario
- Multi-criteria ranking across decision scenarios
- Performance–Cost trade-off map
- Efficient frontier visualisation
Figures are exported in:
- figures_publication/ (300 dpi TIFF)
- images/ (PNG for web rendering)
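The efficient frontier underlying the trade-off map is a Pareto filter over (performance, cost) pairs. A minimal sketch with purely illustrative (macro-F1, time) values; the actual scripts plot the frontier rather than just listing it:

```python
def efficient_frontier(models):
    """Return models not dominated on (higher macro-F1, lower cost).

    `models` maps a model name to an illustrative (macro_f1, total_time_s) pair.
    A model is dominated if another model is at least as accurate and at
    least as cheap, without being identical on both axes.
    """
    frontier = []
    for name, (f1, cost) in models.items():
        dominated = any(
            other_f1 >= f1 and other_cost <= cost
            and (other_f1, other_cost) != (f1, cost)
            for other_f1, other_cost in models.values()
        )
        if not dominated:
            frontier.append(name)
    return frontier
```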
Execution logic
Scripts are intended to be executed sequentially:
- Data acquisition
- Preprocessing
- Model-ready dataset construction
- Model training
- Metric extraction
- Visual analytics
This structure ensures full traceability and reproducibility.
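The sequential execution can be orchestrated with a small driver. This is a simplified sketch: one representative script per stage, with the visual-analytics stage reduced to a single plotting script:

```python
import subprocess

# Ordered pipeline stages, one documented script per stage
# (stage 06 actually comprises several plotting scripts).
PIPELINE = [
    "eea_read.py",
    "preprocess_pm25_lisbon.py",
    "model_ready.py",
    "entrenamiento.py",
    "extraer_metricas.py",
    "tradeoff_map.py",
]

def run_pipeline(dry_run=True):
    """Execute each stage in order; with dry_run=True only report the plan."""
    for script in PIPELINE:
        if dry_run:
            print(f"would run: python {script}")
        else:
            subprocess.run(["python", script], check=True)

run_pipeline()
```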
Reproducibility and transparency
All scripts:
- Are fully documented.
- Rely exclusively on open-source tools.
- Produce results programmatically.
- Can be executed independently.
- Require no manual data manipulation.
The computational environment is managed using Conda.
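An environment specification along the following lines keeps the setup reproducible. This `environment.yml` is illustrative only; the exact package list and version pins are not stated in this documentation:

```yaml
# environment.yml - illustrative sketch; exact versions are assumptions
name: pm25-eval
channels:
  - conda-forge
dependencies:
  - python=3.11
  - pandas
  - scikit-learn
  - xgboost
  - matplotlib
  - pyarrow   # parquet export
```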
Final remark
This script-based architecture reflects a deployment-oriented evaluation philosophy, where:
- Temporal integrity is preserved
- Rare-event detection is prioritised
- Computational efficiency is quantified
- Model trade-offs are explicitly visualised
The objective is methodological clarity and operational realism, not technological novelty.
The structure is intended for both academic research and advanced teaching in applied data science.