Scripts and Evaluation Workflow
Reproducible Python pipeline for Urban PM2.5 imbalance-aware classification
Purpose of this section
This page documents the Python scripts that implement the imbalance-aware evaluation framework developed in this project.
The objective is not algorithmic benchmarking but a transparent, modular analytical architecture for evaluating classification models under:
- Temporal constraints
- Structural class imbalance
- Operational computational cost
All scripts are openly available in the repository and can be executed independently.
Methodological position
The workflow is structured around the following principles:
- Chronological integrity (no random splits)
- Explicit imbalance analysis
- Separation between preprocessing, training, and evaluation
- Cost-aware performance assessment
- Full reproducibility using open Python tools
The scripts are modular, traceable, and designed for deployment-oriented evaluation rather than experimental optimisation.
Script overview
01 - Data acquisition (EEA / OpenAQ)
eea_read.py
This script handles the acquisition of raw PM2.5 data from official sources.
Main objectives:
- Load hourly PM2.5 measurements.
- Standardise column formats.
- Validate temporal consistency.
- Store structured raw datasets.
It establishes the empirical foundation of the pipeline.
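The acquisition step can be sketched as follows. The column names (`datetime`, `value`) and the one-hour gap check are illustrative assumptions, not the exact schema or validation logic used by eea_read.py:

```python
import pandas as pd

def load_hourly_pm25(path):
    """Load raw hourly PM2.5 measurements and standardise the schema.

    Column names ('datetime', 'value') are illustrative; raw EEA / OpenAQ
    exports may use different headers.
    """
    df = pd.read_csv(path, parse_dates=["datetime"])
    df = df.rename(columns={"value": "pm25"})
    # Enforce temporal consistency: sorted, unique timestamps
    df = df.sort_values("datetime").drop_duplicates("datetime")
    # Flag gaps longer than one hour for later inspection
    gaps = df["datetime"].diff().dt.total_seconds().gt(3600).sum()
    if gaps:
        print(f"Warning: {gaps} gap(s) longer than one hour detected")
    return df
```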
02 - Preprocessing and harmonisation
preprocess_pm25_lisbon.py
Transforms raw hourly observations into analysis-ready daily data.
Main objectives:
- Daily aggregation.
- WHO-based class definition.
- Missing value inspection.
- Label encoding (Low / Moderate / High).
- Export of consolidated parquet files.
This step defines the structural imbalance observed in the High category.
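The aggregation and labelling logic can be sketched with pandas. The cut points below are illustrative placeholders; the actual WHO-based thresholds applied in preprocess_pm25_lisbon.py may differ:

```python
import pandas as pd

# Illustrative thresholds (µg/m³); the actual WHO-based cut points
# used in the preprocessing script may differ.
BINS = [0, 15, 35, float("inf")]
LABELS = ["Low", "Moderate", "High"]

def to_daily_classes(hourly: pd.DataFrame) -> pd.DataFrame:
    """Aggregate hourly PM2.5 to daily means and attach class labels."""
    daily = (
        hourly.set_index("datetime")["pm25"]
        .resample("D").mean()
        .to_frame("pm25_daily")
    )
    daily["label"] = pd.cut(daily["pm25_daily"], bins=BINS, labels=LABELS)
    # Integer encoding for modelling (Low=0, Moderate=1, High=2)
    daily["y"] = daily["label"].cat.codes
    return daily
```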
03 - Model-ready dataset construction
model_ready.py
Prepares the dataset for classification modelling.
Main objectives:
- Feature construction.
- Temporal ordering enforcement.
- Train/Test split:
  - Train: 2021–2022
  - Test: 2023
- Final matrix preparation for modelling.
Chronological integrity is strictly preserved.
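The split logic reduces to a calendar-year filter rather than a random shuffle; a minimal sketch, assuming a DatetimeIndex as produced by the daily aggregation step:

```python
import pandas as pd

def chronological_split(daily: pd.DataFrame):
    """Split by calendar year, never randomly: train on 2021-2022, test on 2023.

    Assumes `daily` carries a DatetimeIndex.
    """
    train = daily[daily.index.year <= 2022]
    test = daily[daily.index.year == 2023]
    # Sanity check: no temporal leakage from the test period into training
    assert train.index.max() < test.index.min()
    return train, test
```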
04 - Model training
entrenamiento.py
Implements supervised classification models:
- Logistic Regression
- Random Forest
- XGBoost
- Multi-Layer Perceptron (MLP)
Main objectives:
- Fit models on training data.
- Compute predictions on 2023 test data.
- Measure training and inference time.
- Export raw prediction outputs.
The script prioritises structured evaluation rather than hyperparameter search.
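The fit-predict-time loop can be sketched as below. Only two of the four models are shown, with default hyperparameters in line with the no-tuning policy; the model set and any settings in entrenamiento.py may differ:

```python
import time
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression

# Two representative models with default hyperparameters (no tuning);
# the actual script also trains XGBoost and an MLP.
MODELS = {
    "logreg": LogisticRegression(max_iter=1000),
    "rf": RandomForestClassifier(n_estimators=100, random_state=0),
}

def train_and_time(X_train, y_train, X_test):
    """Fit each model, predict on the test period, and record wall-clock cost."""
    results = {}
    for name, model in MODELS.items():
        t0 = time.perf_counter()
        model.fit(X_train, y_train)
        t_fit = time.perf_counter() - t0
        t0 = time.perf_counter()
        y_pred = model.predict(X_test)
        t_pred = time.perf_counter() - t0
        results[name] = {"y_pred": y_pred, "train_s": t_fit, "infer_s": t_pred}
    return results
```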
05 - Metrics extraction and export
extraer_metricas.py
exportar_metricas_balanced.py
These scripts compute imbalance-aware metrics:
- Accuracy
- Macro-F1
- Balanced Accuracy
- Class-specific Precision / Recall / F1 (High class)
- Computational cost (training + inference time)
Outputs are formatted for:
- Publication tables
- Web visualisation
- Trade-off analysis
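The headline metrics map directly onto scikit-learn; a minimal sketch, assuming the High class is encoded as 2 (the encoding is an assumption, not taken from the scripts):

```python
from sklearn.metrics import (accuracy_score, balanced_accuracy_score,
                             f1_score, precision_recall_fscore_support)

def imbalance_aware_metrics(y_true, y_pred, high_label=2):
    """Compute the headline metrics; `high_label` marks the rare 'High' class."""
    prec, rec, f1, _ = precision_recall_fscore_support(
        y_true, y_pred, labels=[high_label], zero_division=0
    )
    return {
        "accuracy": accuracy_score(y_true, y_pred),
        "macro_f1": f1_score(y_true, y_pred, average="macro"),
        "balanced_accuracy": balanced_accuracy_score(y_true, y_pred),
        "high_precision": prec[0],
        "high_recall": rec[0],
        "high_f1": f1[0],
    }
```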
06 - Visual analytics
plot_radar_balanced.py
plot_ranking_scenarios.py
tradeoff_map.py
generate_tradeoff_map_pm25_lisbon_2021_2023.py
These scripts produce the graphical synthesis of the evaluation:
- Radar comparison under the balanced scenario
- Multi-criteria ranking across decision scenarios
- Performance–Cost trade-off map
- Efficient frontier visualisation
Figures are exported in:
- figures_publication/ (300 dpi TIFF)
- images/ (PNG for web rendering)
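The efficient frontier underlying the trade-off map is a Pareto filter over (performance, cost) pairs. A minimal sketch with purely illustrative (macro-F1, time) values; the actual scripts plot the frontier rather than just listing it:

```python
def efficient_frontier(models):
    """Return models not dominated on (higher macro-F1, lower cost).

    `models` maps a model name to an illustrative (macro_f1, total_time_s) pair.
    A model is dominated if another model is at least as accurate and at
    least as cheap, without being identical on both axes.
    """
    frontier = []
    for name, (f1, cost) in models.items():
        dominated = any(
            other_f1 >= f1 and other_cost <= cost
            and (other_f1, other_cost) != (f1, cost)
            for other_f1, other_cost in models.values()
        )
        if not dominated:
            frontier.append(name)
    return frontier
```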
Execution logic
Scripts are intended to be executed sequentially:
- Data acquisition
- Preprocessing
- Model-ready dataset construction
- Model training
- Metric extraction
- Visual analytics
This structure ensures full traceability and reproducibility.
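The sequential execution can be orchestrated with a small driver. This is a simplified sketch: one representative script per stage, with the visual-analytics stage reduced to a single plotting script:

```python
import subprocess

# Ordered pipeline stages, one documented script per stage
# (stage 06 actually comprises several plotting scripts).
PIPELINE = [
    "eea_read.py",
    "preprocess_pm25_lisbon.py",
    "model_ready.py",
    "entrenamiento.py",
    "extraer_metricas.py",
    "tradeoff_map.py",
]

def run_pipeline(dry_run=True):
    """Execute each stage in order; with dry_run=True only report the plan."""
    for script in PIPELINE:
        if dry_run:
            print(f"would run: python {script}")
        else:
            subprocess.run(["python", script], check=True)

run_pipeline()
```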
Reproducibility and transparency
All scripts:
- Are fully documented.
- Rely exclusively on open-source tools.
- Produce results programmatically.
- Can be executed independently.
- Require no manual data manipulation.
The computational environment is managed using Conda.
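An environment specification along the following lines keeps the setup reproducible. This `environment.yml` is illustrative only; the exact package list and version pins are not stated in this documentation:

```yaml
# environment.yml - illustrative sketch; exact versions are assumptions
name: pm25-eval
channels:
  - conda-forge
dependencies:
  - python=3.11
  - pandas
  - scikit-learn
  - xgboost
  - matplotlib
  - pyarrow   # parquet export
```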
Final remark
This script-based architecture reflects a deployment-oriented evaluation philosophy, where:
- Temporal integrity is preserved
- Rare-event detection is prioritised
- Computational efficiency is quantified
- Model trade-offs are explicitly visualised
The objective is methodological clarity and operational realism, not technological novelty.
The structure is intended for both academic research and advanced teaching in applied data science.