Abstract
Based on air quality index data for the period 2018–2022, Hungary ranks as the 80th most polluted country in the world. Given the pollution levels measured in Hungary and the health impact of air pollution, it is of utmost importance to monitor air quality in the country, focusing on the PM10 and PM2.5 pollutants. One possible solution for high-density measurement is to deploy low-cost sensors at the population level. The calibration procedure has to be carried out in a way that does not incur extra costs and maintenance at the physical level. A potential solution is the development of an algorithm that performs the calibration via remote access. This publication presents a part of this development, in which we implemented the procedure using a neural network and performed a comparative analysis with official data.
1 Introduction
On 12 May 2021, the European Commission adopted the EU Action Plan “Towards Zero Pollution for Air, Water and Soil” [1], a key commitment of the European Green Deal. The zero pollution goal is an overarching objective that contributes to the UN 2030 Agenda for Sustainable Development and its 17 Sustainable Development Goals, and complements the EU's climate neutrality target for 2050, in synergy with the goals for a clean and circular economy and the restoration of biodiversity. The main objective of the Action Plan is to provide guidance for integrating pollution prevention into all relevant EU policies, maximizing synergies in an effective and proportionate way, enhancing implementation and identifying possible gaps or trade-offs. In order to steer the European Union (EU) towards the vision of a Healthy Planet for All by 2050, the Action Plan sets out key targets for 2030 to accelerate the reduction of pollution:
Key target 1: Reduce the health impacts of air pollution (premature deaths) by more than 55% – based on Directive (EU) 2016/2284 of the European Parliament and of the Council on national emission reduction commitments.
Key target 2: The EU should reduce the proportion of EU ecosystems where air pollution threatens biodiversity by 25% by 2030 – based on Directive (EU) 2016/2284 of the European Parliament and of the Council on national emission reduction commitments.
In June 2022, the Environment Council adopted a general approach on the revision of the EU Emissions Trading System (EU ETS). In December 2022, the Council reached a provisional agreement with the European Parliament. Under this agreement, the overall 2030 emissions reduction target for sectors covered by the EU ETS was increased from the 61% proposed by the Commission to 62%.
The aim of the project GINOP-2.3.4-15-2016-00004 (Establishment of Advanced Materials and Smart Technologies at the University of Miskolc) was to establish the foundations for intelligent building management and urban management systems and services. These systems and services can be integrated into sustainable building management systems, contribute to more efficient and sustainable operation, and support decisions and interventions by analyzing measured environmental, comfort perception and structural parameters. The sensor-based urban monitoring and operation support system consists of two main parts: firstly, the low-cost sensor units for air quality measurement developed within the GINOP-2.3.4 project and, secondly, a methodology that describes the system and its applicability.
The low-cost sensor units used for air quality measurement also include sensors for measuring particulate matter concentrations. Several of these sensor units have been installed in the Miskolc-Martinkertváros area, and their data is stored and summarized in a central database.
The measured data are recorded every 10 min along with time stamps and coordinates. In addition to the station deployed by the National Air Pollution Monitoring Network (NAPMN) in Miskolc-Martinkertváros, a proprietary sensor unit was also installed. The data logger placed next to the NAPMN station provides data every 6 s when the particulate matter sensor is active.
The sensors used in NAPMN stations are subjected to a rigorous calibration process every six months, which involves considerable costs. A prerequisite for using the low-cost sensor unit developed within the GINOP-2.3.4 project for residential and municipal surveillance purposes is that the sensors are installed in a calibrated condition and are automatically recalibrated at regular intervals without the need for dismantling or replacement. This requires the development of a calibration method. The method involves processing the measured data using neural networks and linear regression. The test results are evaluated against validated data; one year of measured data was obtained from the two data sources.
In this way, the long-term reliability (in terms of accuracy) of the outdoor sensor-based air quality measurement system can be maintained without periodic on-site servicing of the physical devices, since the calibration can be performed by means of appropriate software. This methodological element means significant cost and resource savings in operating the system, and the accurately measured data ensure more efficient and sustainable operation of municipalities and public buildings.
2 Data matching and presenting data
The analysis was preceded by matching the existing data. The fundamental challenge was to create, from all the available sources, a single dataset in which the NAPMN data and the data of the sensor deployed by the University of Miskolc can be analyzed together. The available data consist of air parameters measured independently by the University of Miskolc and the data collected by the NAPMN. The data are available for the period between February 2019 and February 2020.
First, the data collected by the University of Miskolc are presented. These data were provided in text format. Each file contains data for at most one day, although in some cases the data of a single day have to be assembled from 5 to 8 separate files. The measurement frequency is set at 6 s; assuming all sensors are read successfully, this results in a maximum of 10 records per minute, but the rate can drop to as few as 1–2 records per minute. Table 1 lists the recorded quantities and their units of measurement for the three kinds of sensors used: the Particle Measure Systems 7003 (PMS7003), the Budapest University of Technology and Economics (BUTE) sensor, and the Digital Humidity and Temperature (DHT) sensor. Both the BUTE and DHT sensors measure humidity and temperature; in the tests, the values of the sensor that were closer to the reference values were used.
Table 1. Sensor data from the University of Miskolc
Name | Unit of measurement | Example
Timestamp | Date | 2019.2.8, 0:0:5
BUTE Pressure | Pa | 1009.02
BUTE Temperature | Celsius | −1.65
BUTE Humidity | % | 80.43
DHT Temperature | Celsius | 20.58
DHT Humidity | % | 77.56
PMS7003 PM10 | μg m−3 | 34
PMS7003 PM2.5 | μg m−3 | 25
PMS7003 PM1 | μg m−3 | 13
PMS7003 0.3 μm number of particles | pcs dm−3 | 2,345
PMS7003 0.5 μm number of particles | pcs dm−3 | 254
PMS7003 1 μm number of particles | pcs dm−3 | 23
PMS7003 2.5 μm number of particles | pcs dm−3 | 10
PMS7003 5 μm number of particles | pcs dm−3 | 3
PMS7003 10 μm number of particles | pcs dm−3 | 1
Status | OK/Error | –
The data described above are included in the files provided by the University of Miskolc. Within these data files, entries are separated by commas, with each row representing a single measurement. For the purposes of analysis and validation, a total of 852 files were acquired, requiring approximately 547 MB of storage space.
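As a minimal illustration of the import step, the following MATLAB sketch reads one such comma-separated file into a timetable. The file name, the column positions and the timestamp format are assumptions inferred from Table 1; they are not the authors' actual implementation, which is described below.

```matlab
% Illustrative sketch only: read one comma-separated sensor file into a
% timetable. Column positions and the timestamp format are assumptions
% based on Table 1 (the timestamp itself contains a comma, so it is split
% across the first two fields).
raw = readtable('sensor_2019_02_08.txt', 'Delimiter', ',', ...
                'ReadVariableNames', false);

ts = datetime(string(raw.Var1) + " " + strtrim(string(raw.Var2)), ...
              'InputFormat', 'yyyy.M.d H:m:s');

% Hypothetical column positions: Var8 = PM10, Var9 = PM2.5.
TT_sensor = timetable(ts, raw.Var8, raw.Var9, ...
                      'VariableNames', {'PM10_lowcost', 'PM25_lowcost'});
```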
The NAPMN data were provided by the University of Miskolc strictly for research purposes. These datasets are stored in a standard file format that can be opened in applications such as Excel, although importing the data programmatically is faster than working in Excel. The NAPMN data file contains the information shown in Table 2.
Table 2. NAPMN data
Name | Unit of measurement | Example |
Time of measurement | Date Time | 28/02/2019 23:03 |
PM10 | μg m−3 | 24.6 |
Wind direction | degree | 300.0 |
Wind speed | km h−1 | 3.0 |
Temperature | Celsius | 34.6 |
Humidity | % | 56.0 |
PM2.5 | μg m−3 | 13.4 |
The NAPMN data contain minute averages; however, occasional gaps of up to one day exist. Given the non-standard structure of these text files, a tailored software solution was developed to process them. This software comprises a custom parser and a dedicated data model. Based on past experience, a C# console application was chosen, as it is well suited to the anticipated range of software tasks.
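Since the low-cost unit logs roughly every 6 s while the NAPMN file contains minute averages, the two sources have to be brought to a common time base before they can be compared. The MATLAB sketch below shows one possible way to do this; the timetable names carry over from the earlier sketch and from an analogous import of the NAPMN file, and are assumptions rather than the authors' actual code.

```matlab
% Sketch: align the ~6 s low-cost readings with the NAPMN minute averages.
% TT_sensor comes from the previous sketch; TT_napmn is assumed to be an
% analogous timetable with variables PM10_napmn, PM25_napmn, Temp_napmn
% and Hum_napmn.
TT_min  = retime(TT_sensor, 'minutely', 'mean');          % minute averages
matched = synchronize(TT_min, TT_napmn, 'intersection');  % common minutes only
matched = rmmissing(matched);                             % drop gaps in either source
```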
3 Theoretical background of the examined procedures
A requirement for using the developed low-cost sensor unit for residential and municipal surveillance purposes is that the sensors are calibrated and automatically recalibrated at regular intervals without the need for dismantling or replacement. This requires the development of a calibration method. Through the training and testing conducted by the authors, the data from the sensors developed within the GINOP-2.3.4 project are made to align as closely as possible with the temperature, humidity, PM10, and PM2.5 parameters measured by the NAPMN station. To assess the accuracy of the neural network-based classifier, a classification based on the Air Quality Index (AQI) of particulate matter concentration was employed. Regardless of their chemical composition, particulate matter is categorized based on the size of the airborne particles, and its concentration is classified using a scale. From a health perspective, particulate matter smaller than 10–16 μm is of great importance. Measuring instruments typically gauge sizes of 10, 4, 2.5, and 1 μm separately. Based on the obtained data series, tests were performed for both PM10 and PM2.5 sizes [2].
The AQI serves as an indicator of how polluted the air is and how that pollution might affect health. Concentrations of PM2.5, measured in μg m−3 are categorized into six classifications based on their potential health impact. An AQI value ranging from 0 to 50, corresponding to PM2.5 concentrations between 0 and 12 μg m−3, indicates ‘Good’ air quality. As the AQI value increases, indicating higher PM2.5 concentrations, the air quality deteriorates. For instance, AQI values between 51 and 100, which correspond to PM2.5 levels of 12–35 μg m−3, are classified as ‘Moderately good.’ When AQI surpasses 300, reflecting PM2.5 concentrations greater than 250 μg m−3, the air quality is deemed ‘Dangerous.’ This classification system aids in understanding the potential health risks associated with varying levels of air pollution [2].
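As a minimal sketch of this classification step, the PM2.5 concentrations can be binned into the six AQI categories. The edges 12, 35 and 250 μg m−3 follow the text above; the intermediate edges (55 and 150) and the labels other than ‘Good’, ‘Moderately good’ and ‘Dangerous’ are assumptions taken from the commonly used AQI scale.

```matlab
% AQI binning of PM2.5 concentrations (ug/m3). Edges 12, 35 and 250 follow
% the text; 55 and 150 and the intermediate category names are assumptions
% from the commonly used scale.
edges  = [0 12 35 55 150 250 Inf];
labels = ["Good", "Moderately good", "Unhealthy for sensitive groups", ...
          "Unhealthy", "Very unhealthy", "Dangerous"];

pm25     = [8.4 21.0 300.2];                 % example concentrations
aqiClass = labels(discretize(pm25, edges))   % -> "Good" "Moderately good" "Dangerous"
```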
PM10 refers to airborne particles smaller than 10 μm in diameter. They can be of any composition: organic, inorganic, solid, or even droplets of dew in humid weather. When inhaled, they pass through the pharynx, usually irritating the upper respiratory tract. They can increase the number and severity of asthma attacks, trigger or aggravate bronchitis and other lung diseases, and reduce the body's defences against infections. PM10 includes the fine particles defined as PM2.5, which are 2.5 μm or less in diameter.
PM2.5 refers to particles 2.5 μm or less in diameter, roughly thirty times smaller than the diameter of a human hair. These minuscule particles can penetrate deep into the respiratory tract, reaching as far as the lungs. They can cause irritation in the eyes, nose, and throat, leading to symptoms such as coughing, sneezing, and a runny nose. Prolonged exposure to PM2.5 can result in shortness of breath, reduced lung function, and adverse effects on individuals with asthma and heart conditions.
These particles primarily originate from outdoor combustion sources such as fuel burning, power plants, heating systems, and natural fires. However, they can also form from gases and tiny vapor droplets in the atmosphere, traveling considerable distances from their point of emission. Forest fires or a volcanic eruption can significantly increase airborne concentrations up to hundreds of kilometers away [2].
3.1 Neural networks
Deep neural networks are currently among the most extensively researched topics in the field of applied artificial intelligence. Their applications span a wide range and are continuously growing. Given that neural networks have been successfully utilized in other studies for calibration and parameterization of gases [3, 4], the authors were motivated to explore this direction. Neural networks allow us to solve problems that would previously have been considered impractical due to limited computing resources. Through the addition of multiple hidden layers, deep neural networks can effectively extract complex data abstractions. These abstractions play a crucial role in building more accurate models compared to conventional machine learning methods.
The smallest component of a neural network is the perceptron (Fig. 1), i.e., a processing element. The classical artificial neuron is a multi-input, single-output unit that implements some kind of nonlinear mapping between inputs and outputs. One of its important properties is that it invokes a nonlinear activation function on the weighted sum of the inputs of each neuron, which provides the output of the neuron. During network training, these weights must be adjusted to produce the desired output [5].
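A minimal sketch of the neuron computation described above (a weighted sum of the inputs passed through a nonlinear activation function) is given below; the input values, the weights and the choice of tanh as activation are purely illustrative.

```matlab
% Classical artificial neuron: weighted sum of the inputs plus a bias,
% passed through a nonlinear activation function (here tanh).
x = [0.4; 0.7; 0.1];      % example input vector
w = [0.2; -0.5; 0.9];     % synaptic weights (adjusted during training)
b = 0.1;                  % bias term

y = tanh(w.' * x + b);    % output of the neuron
```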
Neural networks can be divided into two main groups according to how neurons are connected to each other.
The first type is the feed-forward neural network [5, 7]. In a feed-forward network, the only quantity that requires adjustment is the weight vector of the synaptic connections between neurons. Such networks give a consistent output for a given input regardless of the number of iterations.
The second type is the recurrent network [5, 8–10]. In recurrent networks, the synaptic connections depend not only on the inputs of other neurons, but also on the neuron's own previous inputs. These networks can form cyclic structures, introducing feedback loops of different lengths: a recurrent network corresponds to a graph that can contain cycles of arbitrary length.
Since in these networks the output of any neuron can be connected to the input of another neuron, the current output of the network no longer depends only on the vector generated by the weights of the synaptic connections in the network, but also on the previous state of the network [5]. This creates a kind of memory in recurrent networks.
Network training is the process of optimizing the computational parameters so that approximately correct output values are obtained for the input data. This process depends both on the data used for training and on the training algorithm. The training algorithm selects different combinations of computational parameters and evaluates these combinations by applying them to each training case, thereby determining how good the network's responses are; each parameter combination is in effect a test run. An important property of neural networks is their ability to learn. During the learning process, the network adapts its weights to the sample points used during training [5]. The goal of modifying the parameters during learning is to produce the desired responses for the given inputs, so that later the network also produces correct outputs for input data not used during training (for both regression and classification problems). The back-propagation algorithm tries to reduce the error of the model after each learning iteration (learning epoch) by modifying the weights of the network [6].
For testing the calibration procedure on the validated data, the feed-forward backprop network was chosen from among the available neural network types; the reasons for this choice and the advantages of this network are described below.
The essential characteristic of a neural network is that it learns from a set of samples, so the effectiveness of the network depends to a large extent on the number of training samples. If the network is built on poor, sparse, or excessively noisy sample sets, the system will compute output values based on faulty relationships. The composition of the training set is therefore essential, as the network tends to over-learn the input data: in that case it does not learn the relationships within the set of samples but the individual samples themselves, leading to obvious bias [6].
The correction of the data for possible deviations and the compilation of statistics for the full one-year measurement period (availability time, calculation of mean and standard deviation values by day) were carried out using MATLAB software. MATLAB software and its add-ons (Neural Network Toolbox) were also used to build, train and test the neural networks.
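As an illustration of these statistics, the sketch below computes the daily means and standard deviations and the availability time from the matched minute-level timetable of the earlier sketches; the variable names are assumptions, not the authors' scripts.

```matlab
% Daily mean and standard deviation of the matched series, plus the
% availability time. 'matched' and its variable names are assumptions
% carried over from the sketches in Section 2.
dailyMean = retime(matched, 'daily', 'mean');
dailyStd  = retime(matched, 'daily', @std);

t = matched.Properties.RowTimes;
availability = height(matched) / minutes(max(t) - min(t));  % fraction of minutes with data
```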
4 Neural network learning and training results
In the study of neural networks, three types of training were conducted. In each case, the predefined feed-forward backprop neural network type was selected within the Neural Network Toolbox add-on of MATLAB software. For two of the performed tests, the full annual dataset from February 2019 to January 2020 was used both as training and testing dataset. In these two cases, calibration of particulate matter concentrations of different sizes (PM10 and PM2.5) was attempted. In the third case, the calibration of PM10 particulate matter concentrations was tested on a monthly basis. For each month, training and testing were performed using the respective monthly data. The following sections summarize the defined exercise and the results of the tests.
The effectiveness of the neural network is measured by the percentage of correspondence between the AQI class classification of the PM10 and PM2.5 data measured by the NAPMN and the AQI class classification of the data calibrated by the neural network.
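A minimal sketch of this efficiency measure is given below; pm_ref and pm_cal stand for the NAPMN reference concentrations and the network-calibrated output, and 'edges' is the AQI binning sketched in Section 3. All names are assumptions.

```matlab
% Efficiency = percentage of samples whose AQI class of the calibrated
% output matches the AQI class of the NAPMN reference.
refClass   = discretize(pm_ref, edges);   % AQI classes of the reference data
calClass   = discretize(pm_cal, edges);   % AQI classes of the calibrated output
efficiency = 100 * mean(refClass == calClass);
```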
4.1 Calibration of the PM10 particle concentration data series with a one-year training set
Task: to study a feed-forward backprop neural network trained on the validated and migrated annual dataset (525 538 samples). The training is performed in four different ways according to the type and number of training parameters (a training sketch is given after the list):
using PM10 data;
using PM10 and NAPMN temperature data;
using PM10 and NAPMN humidity data;
using PM10, NAPMN humidity and NAPMN temperature data.
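The following MATLAB sketch shows how one such training run could look in the Neural Network Toolbox, here for the PM10 plus NAPMN humidity combination. The hidden layer size, the training function, the data split and the variable names are illustrative assumptions, not the authors' exact settings.

```matlab
% One training run: feed-forward backprop network calibrating the low-cost
% PM10 readings against the NAPMN PM10 values, with NAPMN humidity as a
% second input. Settings are illustrative assumptions.
X = [matched.PM10_lowcost.'; matched.Hum_napmn.'];  % inputs: features x samples
T = matched.PM10_napmn.';                           % target: NAPMN reference

net = feedforwardnet(10, 'trainlm');                % feed-forward backprop network
net.divideParam.trainRatio = 0.70;                  % training / validation / test split
net.divideParam.valRatio   = 0.15;
net.divideParam.testRatio  = 0.15;

[net, tr] = train(net, X, T);                       % back-propagation training
pm10_cal  = net(X);                                 % calibrated PM10 series
```

The efficiency of such a run can then be obtained with the AQI-class agreement sketched at the beginning of this section.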
The differences between the efficiency results presented in Table 3 are minimal, within a few tenths of a percentage point. The training was most successful when the humidity measured by the NAPMN station was used as a correction factor in addition to the PM10 concentration. This may be because air humidity affects the measured particulate matter concentration: the higher the humidity, the lower the concentration. The use of the NAPMN temperature data as a correction factor can be justified on the same grounds. Using both parameters together performs about 0.1 percentage points worse than using humidity alone, but still improves on the result obtained with PM10 data only.
Table 3. Training efficiency with varying parameterization
Teaching dataset(s) | Number of teaching datasets | Teaching efficiency
PM10 | 1 | 91.5648%
PM10, NAPMN temperature | 2 | 91.7505%
PM10, NAPMN humidity | 2 | 91.7875%
PM10, NAPMN temperature, NAPMN humidity | 3 | 91.6908%
Figures 2–4 illustrate the different neural network architectures, as exported from MATLAB and edited by the authors. When using the three parameters, the number of hidden layers should be increased.
Fig. 2. Neural network structure in the case of one teaching parameter
Fig. 3. Neural network structure in the case of two teaching parameters
Fig. 4. Neural network structure in the case of three teaching parameters
4.2 Calibration of the PM2.5 particulate matter concentration data series with a one-year training set
Task: to study a feed-forward backprop neural network trained on the validated and migrated annual dataset (525 538 samples). The training is performed in four different ways according to the type and number of training parameters (see the sketch after the list):
using PM2.5 data;
using PM2.5 and NAPMN temperature data;
using PM2.5 and NAPMN humidity data;
using PM2.5, NAPMN humidity and NAPMN temperature data.
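The four input combinations can be compared in a single loop, as in the sketch below; the variable names, the network size and the 'edges' binning are assumptions carried over from the earlier sketches.

```matlab
% Comparing the four input combinations of Table 4 in one loop. All
% variable names and settings are illustrative assumptions.
pm25   = matched.PM25_lowcost.';
temp   = matched.Temp_napmn.';
hum    = matched.Hum_napmn.';
target = matched.PM25_napmn.';

combos = {pm25; [pm25; temp]; [pm25; hum]; [pm25; temp; hum]};

eff = zeros(1, numel(combos));
for k = 1:numel(combos)
    net    = train(feedforwardnet(10), combos{k}, target);
    eff(k) = 100 * mean(discretize(net(combos{k}), edges) == ...
                        discretize(target, edges));   % AQI-class agreement
end
```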
The differences between the efficiency results in Table 4 are also minimal in this case, with variations within one-tenth of a percentage point. Here the training proved to be most successful when both the humidity and the temperature measured by the NAPMN station were used as correction factors in addition to the PM2.5 concentration value. The reason can be explained by the interaction of particulate matter concentration, humidity, and temperature described in the previous section. Overall, the PM2.5 particulate matter concentration calibration tests showed approximately 4.5 percentage points better efficiency than the PM10 calibration tests. Since the numerical value in each designation denotes the particle size, the difference may be due to the fact that the PM10 values include the PM2.5 fraction, i.e. the PM2.5 values fluctuate over a smaller range.
Table 4. Efficiency of training with different parameterization
Teaching dataset(s) | Number of teaching datasets | Teaching efficiency
PM2.5 | 1 | 96.0606%
PM2.5, NAPMN temperature | 2 | 96.1147%
PM2.5, NAPMN humidity | 2 | 96.0785%
PM2.5, NAPMN temperature, NAPMN humidity | 3 | 96.1456%
Figures 5–7 illustrate the different neural network architectures. When using three parameters, the number of hidden layers should be increased.
Fig. 5. Neural network structure in the case of one teaching parameter
Fig. 6. Neural network structure in the case of two teaching parameters
Fig. 7. Neural network structure in the case of three teaching parameters
4.3 Calibration of PM10 particulate matter data series with monthly training set
Task: to study the feed-forward backprop neural network trained on the validated and migrated annual dataset on a monthly basis. A separate training run is conducted for each month and its efficiency is assessed.
Figure 8 illustrates the neural network architecture used in the training, consisting of one hidden layer with a single neuron. The previous test results indicate that the efficiency values differ only minimally during calibration, while the complexity of the neural network increases much more steeply. The aim of this study is to find out whether there is an efficiency improvement when the sensor data are calibrated for a given month, i.e. over a smaller time interval. The study was carried out using PM10 particulate matter concentration data; a sketch of the month-by-month procedure is given after Fig. 8.
Fig. 8. The architecture of the trained neural network
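A minimal MATLAB sketch of this month-by-month loop is shown below; the variable names and the 'edges' binning are assumptions carried over from the earlier sketches, and only the single-neuron hidden layer follows Fig. 8.

```matlab
% Month-by-month calibration: one network (a single hidden layer with one
% neuron, as in Fig. 8) is trained and evaluated per month. Variable names
% are illustrative assumptions.
t      = matched.Properties.RowTimes;
months = unique(dateshift(t, 'start', 'month'));

effMonthly = zeros(1, numel(months));
for k = 1:numel(months)
    idx = dateshift(t, 'start', 'month') == months(k);
    X   = matched.PM10_lowcost(idx).';   % low-cost PM10 for the month
    T   = matched.PM10_napmn(idx).';     % NAPMN reference for the month

    net = feedforwardnet(1);             % one hidden layer, one neuron
    net = train(net, X, T);
    effMonthly(k) = 100 * mean(discretize(net(X), edges) == ...
                               discretize(T, edges));
end
```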
Table 5 lists the efficiency of the monthly training runs, and Fig. 9 presents them visually. A significant variation is observed in the percentage results: the highest efficiency value was 99.4578% in August, while the lowest was 82.8172% in January, a difference of almost 17 percentage points. This variation can be attributed to temperature and humidity values. During the heating season and winter's low temperatures, particulate matter concentrations tend to be higher, and grain sizes can be more variable than in the summer months. The average of the monthly efficiency values is 92.8763%, which is approximately 1% better than the result of learning from the annual PM10 data together with the compensation values.
Table 5. Training effectiveness and averages by month
Month | Teaching efficiency | Season | Average efficiency |
March 2019 | 91.7065% | Spring | 93.8101% |
April 2019 | 90.2884% | ||
May 2019 | 99.4354% | ||
June 2019 | 99.0833% | Summer | 99.2292% |
July 2019 | 99.1464% | ||
August 2019 | 99.4578% | ||
September 2019 | 98.6781% | Fall | 90.1891% |
October 2019 | 83.4698% | ||
November 2019 | 88.4194% | ||
December 2019 | 89.1371% | Winter | 84.9638% |
January 2020 | 82.8172% | ||
February 2020 | 82.9371% | ||
Annual average | | | 92.8763%
Fig. 9. PM10 particulate matter concentration calibration efficiency by month
This suggests the value of conducting further studies along this line. Furthermore, a significant discrepancy exists between the seasonal and average efficiencies, further supporting the case for additional testing.
5 Summary
The study presented in this paper helps to improve the accuracy of the instrument developed by the University of Miskolc and provides a transparent and clear data set from which further statistics and calibration procedures can be developed.
The neural networks were investigated by performing three types of training. In each case, the predefined feed-forward backprop type neural network was chosen within the Neural Network Toolbox add-on of MATLAB software. In two of the tests performed, the full annual dataset from February 2019 to January 2020 was used as both training and testing dataset; in these two cases, calibration of particulate matter concentrations of different sizes (PM10 and PM2.5) was attempted. In the third case, the calibration of PM10 particulate matter concentrations was tested on a monthly basis, with training and testing performed for each month using the respective monthly data series.
In summary, the results show that the neural network method is suitable for calibrating particle concentration data, with the initial tests providing an average result of over 90%. The calibration of the PM10 particulate matter concentration data proved to be the more challenging task. The findings also indicate the benefits of incorporating temperature and humidity compensation values during the calibration process. A comparison between the outcomes of learning on a yearly and on a monthly basis points towards investigating the use of distinct calibration neural networks throughout the year, considering either seasons or individual months. Moreover, it is worth exploring where to optimally divide the individual training sessions and the related data. Overall, over the one-year measurement period, a total of 523,772 min of data was collected, which means that 1,828 min of data were not available for the full year; the sensors were thus operating with 99.66% availability.
Acknowledgements
The research work was carried out with the support of the European Union and the Hungarian state, co-funded by the European Regional Development Fund, in the framework of the project GINOP-2.3.4-15-2016-00004, with the aim of promoting cooperation between higher education and industry.
References
[1] European Commission, EU Action Plan: “Road to pollution-free air, water and soil”, 2021. [Online]. Available: https://eur-lex.europa.eu/legal-content/HU/ALL/?uri=CELEX:52021DC0400. Accessed: Apr. 16, 2024.
[2] Air pollution (in Hungarian). [Online]. Available: http://levego.enum.hu/wplevego/index.php/szallo-por/. Accessed: Feb. 21, 2023.
[3] I. Bolkeny and V. Fuvesi, “AI based predictive detection systems,” Pollack Period., vol. 13, no. 2, pp. 137–146, 2018.
[4] I. Bolkeny and L. Czap, “AI based detection of gas hydrate formation in the field,” Pollack Period., vol. 15, no. 3, pp. 72–78, 2020.
[6] Y. LeCun, B. Boser, J. S. Denker, D. Henderson, R. E. Howard, W. Hubbard, and L. D. Jackel, “Backpropagation applied to handwritten zip code recognition,” Neural Comput., vol. 1, no. 4, pp. 541–551, 1989.
[7] Sensor data. [Online]. Available: https://tdk.bme.hu/VIK/Jelfeld//Tevekenyseg-modellezes-szenzoradatok-alapjan1. Accessed: Apr. 15, 2024.
[8] K. E. Parsopoulos, Particle Swarm Optimization and Intelligence: Advances and Applications (Premier Reference Source). IGI Global, 2010.
[9] J. A. Anderson, Introduction to Neural Networks. Bradford Books, 1995.
[10] B. Luitel and G. K. Venayagamoorthy, “A PSO with quantum infusion algorithm for training simultaneous recurrent neural networks,” in 2009 International Joint Conference on Neural Networks, Atlanta, GA, USA, June 14–19, 2009, pp. 1923–1933.