Abstract
Because they are relatively independent of operational parameters, linear retention indices (LRIs) are a useful tool in qualitative analysis by gas chromatography-mass spectrometry (GC-MS). The aim of the current study was to develop a multiple linear regression (MLR) model for the prediction of LRIs as a function of selected molecular descriptors. Liquid injection GC-MS was used for the analysis of essential oils (Rose, Lavender and Peppermint), with the components separated on a semi-standard non-polar stationary phase. As a result, a total of 103 compounds were identified and their experimental LRIs were derived from reference measurements of a standard mixture of n-alkanes (from C8 to C20). As a next step, a set of molecular descriptors was generated for the identified chemical structures. A stepwise MLR was then applied to select the significant descriptors (variables) that can be used to predict the LRIs. From an initial set of over 2000 molecular descriptors, only 16 were found to be significant and independent variables. At this point split validation was applied: the identified compounds were randomly divided into training (85%) and validation (15%) sets. The training set (87 compounds) was used to derive two MLR models by applying i) the ‘enter’ algorithm (R2 = 0.9960, RMSE = 17) and ii) the ‘stepwise’ one (R2 = 0.9958, RMSE = 17). The predictive power was assessed with the validation set (16 compounds) as follows: i) q2F1 = 0.9896, RMSE = 25 and ii) q2F1 = 0.9886, RMSE = 26, respectively. The adequacy of both regression approaches was further evaluated. Newly developed headspace solid-phase microextraction (HS-SPME) procedures in combination with GC-MS were used for an alternative analysis of the studied essential oils. Twelve additional compounds, not previously detected by the liquid sample introduction mode of analysis, were identified for which the values of the significant descriptors were within the working range of the developed MLR models. For these compounds, the LRIs were calculated and the experimental data were used as an external set for assessment of the regression models. The predictive power of both regression approaches was assessed as follows: Enter RMSE = 41, q2F2 = 0.9503 and Stepwise RMSE = 40, q2F2 = 0.9521.
1 Introduction
Essential oils are complex mixtures of many chemical components, usually obtained from plant material by steam distillation. They are mainly composed of lipophilic and highly volatile metabolites such as mono- and sesquiterpenes. Other classes of compounds found are oxides, alcohols, aldehydes, ketones, esters, and heterocycles [1]. However, their composition varies significantly depending on the place of growth, seasonal fluctuations and weather conditions [2, 3]. Because of the complexity of the oils and the danger of adulteration with other oils or synthetic materials, a selective technique capable of characterizing the variety of compounds in the oils is required. Since essential oil components are volatile or semi-volatile, they can be analyzed using gas chromatography in combination with mass spectrometry [4, 5]. Gas chromatography is used for the qualitative analysis of volatile compounds, providing retention times as the criterion for peak identification. The mass spectrometer, used as a detector, offers additional information for the identification of individual compounds by comparing the spectrum of each component with those in a spectral library. Since essential oils are mixtures of terpene or phenylpropane derivatives, their spectra are similar and the correct identification of the peaks is difficult, sometimes impossible [1]. To increase the reliability of analytical results on the composition of essential oils, it is necessary to use a combination of criteria and different identification approaches, including methods of separation and concentration, comparison with spectra from databases and calculation of retention indices (RI) [1, 6, 7].
Retention indices (RI) express the retention behavior of the compounds of interest on a uniform scale defined by a series of closely related standard substances (the n-alkane scale) under isothermal conditions. The linear retention indices (LRIs), used for temperature-programmed gas chromatographic runs, can be calculated by applying the equation proposed by van den Dool and Kratz [1]. Considerable effort has been devoted to developing predictive models for LRIs in the analysis of essential oils by GC-MS in order to facilitate and shorten the identification process [8, 10, 11].
The development of prediction models includes the selection of descriptors (physicochemical properties of the compounds), validation of the model using an internal set of compounds and testing of the developed model using an external set of compounds. The prediction coefficient q2 is the main criterion used to assess the actual predictive power of the models [9, 12–14], together with the root mean square error (RMSE).
In the present study, linear regression models were developed as a simple approach for predicting the LRIs of volatile components of essential oils in gas chromatographic analysis on a semi-standard non-polar stationary phase. Two different techniques, i) direct liquid injection with appropriate dilution and ii) HS-SPME [15], were used for the analysis of different essential oils bought from the local market. Newly developed HS-SPME procedures are proposed. The data collected from the real samples were used to calculate the LRIs. Regression models were developed by selecting significant descriptors (independent variables, x) correlated with the LRI (dependent variable, y), evaluated by split validation and tested with an external set of compounds.
2 Experimental
2.1 Chemicals and reagents
Dichloromethane (GC grade) was purchased from Honeywell (Riedel-de Haën GmbH, Seelze, Germany). For the solid-phase microextraction procedure, a conventional Carboxen (CAR)/polydimethylsiloxane (PDMS) SPME fiber with a coating thickness of 85 μm and an SPME holder from Supelco (Bellefonte, PA, USA) were used. Chromatography vials with a volume of 10 mL were purchased from Supelco (Bellefonte, PA, USA) and used for HS-SPME. An alkane mixture C8–C20 in hexane at a concentration of 40 mg L−1 was purchased from Supelco (Bellefonte, PA, USA).
2.2 Instrumentation
The study was carried out using a TSQ 9000 GC-MS/MS system (Thermo Scientific, USA) with electron impact ionization (EI) at 70 eV and a PTV injector. A volume of 1 µL of each liquid essential oil sample, after appropriate dilution, was injected using an AI 1300 autosampler equipped with a 10 µL glass syringe. HS-SPME injections were performed manually. The PTV injector was operated in split mode (split ratio 10:1) at a constant inlet temperature of 280 °C. A metal liner (PTV Siltek Metal Liner, 2 mm ID, 2.75 mm OD, 120 mm length, Thermo Fisher Scientific, USA), suitable for both types of analysis, was used. A TG SQC MS GC column (15 m × 0.25 mm × 0.25 µm film thickness, Thermo Fisher Scientific, USA) was used for the chromatographic separation. The temperature program of the column oven was optimized to obtain adequate chromatographic separation as follows: initial temperature of 40 °C, held for 2 min; ramp at 1 °C min−1 to 200 °C; ramp at 30 °C min−1 to a final temperature of 280 °C, which was held for 3 min. The solvent delay time and total analysis time were 2 and 168 min, respectively. Helium (purity 99.9999%) at a flow rate of 1.0 mL min−1 was used as the carrier gas. The transfer line and ion source temperatures were set at 250 °C and 230 °C, respectively. Full scan mode in the range 35–500 amu was used, with a dwell time of 0.2 s. Data analysis was performed using Xcalibur 4.1 software, and initial identification of volatile organic compounds was performed by comparing their spectra with those in the NIST Mass Spectral Library (NIST MS Search 2.3). The following acceptance criteria were applied: a retention index match within ±20 units and a mass spectral similarity of at least 80%.
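As a quick consistency check, the oven program can be summed segment by segment. The sketch below assumes ideal linear ramps with no equilibration or cool-down time; the function and variable names are illustrative only and do not come from any instrument software.

```python
# Oven program from Section 2.2 as (target temperature in °C,
# ramp rate in °C/min, hold time in min). A rate of None marks the
# initial isothermal step. Names are illustrative assumptions.
OVEN_PROGRAM = [
    (40.0, None, 2.0),   # initial temperature, held for 2 min
    (200.0, 1.0, 0.0),   # 1 °C/min up to 200 °C
    (280.0, 30.0, 3.0),  # 30 °C/min up to 280 °C, held for 3 min
]


def total_run_time(program):
    """Return the total run time in minutes, assuming ideal linear ramps."""
    total = 0.0
    current_temp = None
    for target, rate, hold in program:
        if rate is not None:
            # Time spent ramping from the previous temperature to the target.
            total += (target - current_temp) / rate
        total += hold
        current_temp = target
    return total


run_time = total_run_time(OVEN_PROGRAM)
print(f"Total run time: {run_time:.1f} min")
```

Under these assumptions the segments add up to roughly 167.7 min, in line with the stated total analysis time of 168 min.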
2.3 Sample type
Rose oil (Rosa damascena), Peppermint oil (Mentha piperita L.) and Lavender oil (Lavandula angustifolia) were purchased from the local market. All samples were stored in the dark at 4–8 °C before analysis.
2.4 Liquid sample analysis
The essential oils were appropriately diluted in dichloromethane (Lavender oil DF = 100, Rose oil DF = 200, and Peppermint oil DF = 400). The diluted samples were prepared in triplicate and measured by liquid injection GC-MS/MS under the instrumental conditions listed in Section 2.2.
2.5 HS-SPME of VOCs
Before HS-SPME, the SPME fiber was preconditioned once in the injection port of the gas chromatograph for 30 min at 280 °C, according to the manufacturer's recommendations. A blank fiber injection was measured daily. A piece of paper (0.5 × 0.5 cm) was placed on the bottom of a 10 mL glass headspace vial. Then 5 µL of Peppermint oil, or 10 µL of Lavender oil or Rose oil, was transferred onto the paper surface, and the vial was closed with a polytetrafluoroethylene (PTFE)-coated silicone rubber septum and an aluminium cap. The vials were left for the specified conditioning time (Rose oil, 45 min; Lavender oil, 45 min; Peppermint oil, 30 min) at room temperature (24 °C ± 2 °C) without stirring. After the prescribed time, the SPME fiber was introduced into the headspace for the experimentally determined sorption time (Rose oil, 10 min; Lavender oil, 10 min; Peppermint oil, 5 min) and then inserted into the GC injection port for desorption for 3 min. A schematic illustration of the optimized HS-SPME procedure is presented in Fig. 1.
2.6 Prediction of LRIs by multiple linear regression (MLR)
TSS – total sum of squares
RSS – residual sum of squares
R2 – coefficient of determination;
n – number of observations
m – number of independent variables in the data set
Ri2 – coefficient of determination obtained when the i-th molecular descriptor is modelled linearly from all other independent variables j (i ≠ j).
VIF values below 15 were taken to indicate the absence of collinearity among the variables [16].
xi: the value of the i-th molecular descriptor;
βi: the i-th regression coefficient.
yi – observed values of linear retention indices
ŷi – predicted values of linear retention indices
ȳtr – mean of the observed linear retention indices of the training set
ȳext – mean value of the observed linear retention indices of the external set
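For reference, the symbols defined in this section correspond to the following standard expressions. This is a sketch under stated assumptions rather than the authors' exact notation: the forms are the usual textbook ones; the symbols z (carbon number of the n-alkane eluting just before the analyte x), t_R (retention time), β0 (intercept) and n_ext (size of the validation or external set) are introduced here for illustration; the RMSE is written with n in the denominator; and only the numbers (1) and (5) are confirmed by references elsewhere in the text.

```latex
\begin{align}
\mathrm{LRI}_x &= 100\left[z + \frac{t_{R(x)} - t_{R(z)}}{t_{R(z+1)} - t_{R(z)}}\right] \\
R^2 &= 1 - \frac{\mathrm{RSS}}{\mathrm{TSS}} \\
R^2_{\mathrm{adj}} &= 1 - \left(1 - R^2\right)\frac{n - 1}{n - m - 1} \\
\mathrm{VIF}_i &= \frac{1}{1 - R_i^2} \\
y &= \beta_0 + \sum_{i=1}^{m} \beta_i x_i \\
\mathrm{RMSE} &= \sqrt{\frac{1}{n}\sum_{i=1}^{n}\left(y_i - \hat{y}_i\right)^2} \\
q^2_{F1} &= 1 - \frac{\sum_{i=1}^{n_{\mathrm{ext}}}\left(y_i - \hat{y}_i\right)^2}{\sum_{i=1}^{n_{\mathrm{ext}}}\left(y_i - \bar{y}_{\mathrm{tr}}\right)^2} \\
q^2_{F2} &= 1 - \frac{\sum_{i=1}^{n_{\mathrm{ext}}}\left(y_i - \hat{y}_i\right)^2}{\sum_{i=1}^{n_{\mathrm{ext}}}\left(y_i - \bar{y}_{\mathrm{ext}}\right)^2}
\end{align}
```

The two prediction coefficients differ only in the reference mean: q2F1 compares the prediction errors with the spread around the training-set mean, whereas q2F2 uses the mean of the external set itself.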
3 Results and discussion
3.1 Analysis of essential oils by liquid injection GC-MS
The diluted essential oils were analyzed in triplicate. According to the prescribed acceptance criteria (Section 2.2), 49 components were identified in the Lavender oil, 51 compounds in two commercial brands of Rose oil and 32 compounds in the Peppermint oil. A total of 103 unique compounds were selected, and the SMILES notation for each compound was taken from the PubChem database.
3.2 Multiple linear regression for predicting linear retention indices
3.2.1 Selection of significant independent variables (descriptors)
The LRI of each of the 103 compounds identified by liquid injection GC-MS was calculated according to the equation of H. van den Dool and P. D. Kratz (Eq. (1)) using the average retention time determined from three replicates and the average tR of the n-alkanes from the standard solution (n = 3). The obtained LRI values and the SMILES notations for the compounds are presented in Table 1.
Table 1. Calculated LRIs and SMILES notations for the compounds identified by liquid injection GC-MS in Rose, Lavender and Peppermint oils
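The calculation described above can be sketched as follows, using the usual van den Dool and Kratz interpolation between the two n-alkanes that bracket the analyte. The function name, the dictionary layout and all retention times in the example are hypothetical and serve only as illustration; they are not measured values from this study.

```python
from statistics import mean


def linear_retention_index(t_x, alkane_times):
    """LRI of an analyte in a temperature-programmed run.

    t_x          -- (average) retention time of the analyte, in min
    alkane_times -- dict {carbon number: retention time in min} of the n-alkanes
    """
    carbons = sorted(alkane_times)
    for z, z_next in zip(carbons, carbons[1:]):
        t_z, t_next = alkane_times[z], alkane_times[z_next]
        if t_z <= t_x <= t_next:
            # Linear interpolation between the two bracketing n-alkanes.
            return 100.0 * (z + (z_next - z) * (t_x - t_z) / (t_next - t_z))
    raise ValueError("analyte elutes outside the n-alkane window")


# Hypothetical retention times (min), for illustration only.
alkanes = {10: 20.0, 11: 30.0, 12: 41.0}
replicates = [24.9, 25.0, 25.1]  # three injections of the same analyte
lri = linear_retention_index(mean(replicates), alkanes)
print(round(lri))
```

An analyte eluting exactly with an n-alkane receives 100 times the carbon number of that alkane, which is why Pentadecane and n-Octadecane appear in Table 1 with LRIs of 1500 and 1800.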
Compound | LRI | SMILES notation |
Ether, hexyl methyl | 822 | CCCCCCOC |
3-Methylcyclopentanone | 840 | CC1CCC(=O)C1 |
3-hexenol | 855 | CCC=CCCO |
Hexyl alcohol | 871 | CCCCCCO |
Heptanal | 903 | CCCCCCC=O |
α-Thujene | 920 | CC1=CCC2(C1C2)C(C)C |
α-Pinene | 924 | CC1=CCC2CC1C2(C)C |
Camphene | 936 | CC1(C2CCC(C2)C1=C)C |
3-Methylcyclohexanone | 941 | CC1CCCC(=O)C1 |
Sabinene | 964 | CC(C)C12CCC(=C)C1C2 |
3-p-Menthene | 975 | CC1CCC(=CC1)C(C)C |
n-Octanone-3 | 987 | CCCCCC(=O)CC |
β-Pinene | 990 | CC1(C2CCC(=C)C1C2)C |
β-Myrcene | 990 | CC(=CCCC(=C)C=C)C |
n-Octan-3-ol | 1002 | CCCCCC(CC)O |
δ-3-Carene | 1004 | CC1=CCC2C(C1)C2(C)C |
α-Terpinene | 1010 | CC1=CC=C(CC1)C(C)C |
n-Hexyl acetate | 1016 | CCCCCCOC(=O)C |
β-Cymene | 1018 | CC1=CC(=CC=C1)C(C)C |
Limonene | 1020 | CC1=CCC(CC1)C(=C)C |
Eucalyptol | 1022 | CC1(C2CCC(O1)(CC2)C)C |
trans-β-Ocimene | 1040 | CC(=CCC=C(C)C=C)C |
cis-β-Ocimene | 1044 | CC(=CCC=C(C)C=C)C |
γ-Terpinene | 1050 | CC1=CCC(=CC1)C(C)C |
4-Thujanol | 1060 | CC(C)C12CCC(C1C2)(C)O |
Linalool oxide | 1066 | CC(=CCCC(C)(C1CO1)O)C |
Linalool oxide furanoide | 1073 | CC1(CCC(O1)C(C)(C)O)C=C |
α-Terpinolene | 1080 | CC1=CCC(=C(C)C)CC1 |
Linalool | 1104 | CC(=CCCC(C)(C=C)O)C |
cis-Rose oxide | 1108 | CC1CCOC(C1)C=C(C)C |
Phenylethyl Alcohol | 1115 | C1=CC=C(C=C1)CCO |
1-Octen-3-yl-acetate | 1119 | CCCCCC(C=C)OC(=O)C |
trans-Rose oxide | 1122 | CC1CCOC(C1)C=C(C)C |
2,4,6-Octatriene, 3,4-dimethyl- | 1127 | CC=CC=C(C)C(=CC)C |
3-Octanol, acetate | 1129 | CCCCCC(CC)OC(=O)C |
Camphor | 1132 | CC1(C2CCC1(C(=O)C2)C)C |
Isopulegol | 1137 | CC1CCC(C(C1)O)C(=C)C |
Menthone | 1146 | CC1CCC(C(=O)C1)C(C)C |
Nerol oxide | 1151 | CC1=CCOC(C1)C=C(C)C |
n-Hexyl isobutyrate | 1151 | CCCCCCOC(=O)C(C)C |
Isomenthone | 1154 | CC1CCC(C(=O)C1)C(C)C |
Borneol | 1159 | CC1(C2CCC1(C(C2)O)C)C |
Neo menthol | 1160 | CC1CCC(C(C1)O)C(C)C |
Terpinen-4-ol | 1168 | CC1=CCC(CC1)(C(C)C)O |
Menthol | 1175 | CC1CCC(C(C1)O)C(C)C |
Cryptone | 1176 | CC(C)C1CCC(=O)C=C1 |
α-Terpineol | 1185 | CC1=CCC(CC1)C(C)(C)O |
Butyric acid, hexyl ester | 1197 | CCCCCCOC(=O)CCC |
γ-Terpineol | 1205 | CC(=C1CCC(CC1)(C)O)C |
Borneol formate | 1216 | CC1(C2CCC1(C(C2)OC=O)C)C |
Pulegone | 1229 | CC1CCC(=C(C)C)C(=O)C1 |
Nerol | 1232 | CC(=CCCC(=CCO)C)C |
Carvone | 1236 | CC1=CCC(CC1=O)C(=C)C |
Citronellol | 1241 | CC(CCC=C(C)C)CCO |
Piperitone | 1244 | CC1=CC(=O)C(CC1)C(C)C |
Isogeraniol | 1249 | CC(=CCC=C(C)CCO)C |
β-Phenylethyl acetate | 1256 | CC(=O)OCCC1=CC=CC=C1 |
Linalyl acetate | 1262 | CC(=CCCC(C)(C=C)OC(=O)C)C |
Geraniol | 1265 | CC(=CCCC(=CCO)C)C |
Citral | 1271 | CC(=CCCC(=CC=O)C)C |
Citronellyl formate | 1278 | CC(CCC=C(C)C)CCOC=O |
Bornyl acetate | 1278 | CC(=O)OC1CC2CCC1(C2(C)C)C |
Menthyl acetate | 1291 | CC1CCC(C(C1)OC(=O)C)C(C)C |
Geranic acid methyl ester | 1325 | CC(=CCCC(=CC(=O)OC)C)C |
Hexyl tiglate | 1333 | CCCCCCOC(=O)C(=CC)C |
Eugenol | 1357 | COC1=C(C=CC(=C1)CC=C)O |
Copaene | 1358 | CC1=CCC2C3C1C2(CCC3C(C)C)C |
β-Bourbonene | 1365 | CC(C)C1CCC2(C1C3C2CCC3=C)C |
Nerol acetate | 1368 | CC(=CCCC(=CCOC(=O)C)C)C |
β-Elemene | 1378 | CC(=C)C1CCC(C(C1)C(=C)C)(C)C=C |
Geranyl acetate | 1388 | CC(=CCCC(=CCOC(=O)C)C)C |
Hexyl hexanoate | 1390 | CCCCCCOC(=O)CCCCC |
β-Caryophyllene | 1397 | CC1=CCCC(=C)C2CC(C2CC1)(C)C |
β-Cubebene | 1399 | CC1CCC(C2C13C2C(=C)CC3)C(C)C |
α-Santalene | 1406 | CC(=CCCC1(C2CC3C1(C3C2)C)C)C |
Methyleugenol | 1410 | COC1=C(C=C(C=C1)CC=C)OC |
α-Guaiene | 1423 | CC1CCC(CC2=C1CCC2C)C(=C)C |
trans-α-Bergamotene | 1424 | CC1=CCC2CC1C2(C)CCC=C(C)C |
α-Caryophyllene | 1430 | CC1=CCC(C=CCC(=CCC1)C)(C)C |
Germacrene | 1459 | CC1=CCCC(=C)C=CC(CC1)C(C)C |
δ-Guaiene | 1488 | CC1CCC2=C(CCC(CC12)C(=C)C)C |
γ-Cadinene | 1495 | CC1=CC2C(CC1)C(=C)CCC2C(C)C |
Pentadecane | 1500 | CCCCCCCCCCCCCCC |
β-Cadinene | 1509 | CC1=CCC2C(C1)C(CC=C2C)C(C)C |
Elemol | 1538 | CC(=C)C1CC(CCC1(C)C=C)C(C)(C)O |
β-Caryophyllene oxide | 1556 | CC1(CC2C1CCC3(C(O3)CCC2=C)C)C |
Nerolidol | 1562 | CC(=CCCC(=CCCC(C)(C=C)O)C)C |
Globulol | 1568 | CC1CCC2C1C3C(C3(C)C)CCC2(C)O |
2-Phenylethyl tiglate | 1580 | CC=C(C)C(=O)OCCC1=CC=CC=C1 |
γ-Eudesmol | 1613 | CC1=C2CC(CCC2(CCC1)C)C(C)(C)O |
α-epi-Cadinol | 1624 | CC1=CC2C(CCC(C2CC1)(C)O)C(C)C |
β-Eudesmol | 1628 | CC12CCCC(=C)C1CC(CC2)C(C)(C)O |
α-Eudesmol | 1632 | CC1=CCCC2(C1CC(CC2)C(C)(C)O)C |
3-Heptadecene | 1674 | CCCCCCCCCCCCCC=CCC |
Heptadecane | 1701 | CCCCCCCCCCCCCCCCC |
Farnesol | 1723 | CC(=CCCC(=CCCC(=CCO)C)C)C |
Farnesal | 1737 | CC(=CCCC(=CCCC(=CC=O)C)C)C |
Benzyl Benzoate | 1748 | C1=CC=C(C=C1)COC(=O)C2=CC=CC=C2 |
n-Octadecane | 1800 | CCCCCCCCCCCCCCCCCC |
β-Phenylethyl benzoate | 1834 | C1=CC=C(C=C1)CCOC(=O)C2=CC=CC=C2 |
1-Nonadecene | 1869 | CCCCCCCCCCCCCCCCCC=C |
Nonadecane | 1901 | CCCCCCCCCCCCCCCCCCC |
Eicosane | 1999 | CCCCCCCCCCCCCCCCCCCC |
PaDEL-Descriptor software 2.21 was used to calculate a set of 1D and 2D descriptors and PubChem fingerprints (2,325 molecular descriptors in total) from the SMILES notations. A stepwise linear regression algorithm was used to identify the significant and independent variables (xi) correlating with the calculated LRI as the dependent variable (y) (Eq. (5)). Only 16 significant descriptors (from the pool of 2,325) remained in the regression equation (Table 2).
Table 2. Selected significant descriptors and their descriptions, taken from the PaDEL descriptor list
Descriptor | Class | Description |
MLFER_L | Molecular linear free energy relation. | Solute gas-hexadecane partition coefficient |
MLFER_S | Molecular linear free energy relation. | Combined dipolarity/polarizability |
ATSC5c | Autocorrelation descriptor. 2D | Centered Broto-Moreau autocorrelation – lag 5/weighted by charges |
ATS5i | Autocorrelation descriptor. 2D | Broto-Moreau autocorrelation - lag 5/weighted by first ionization potential |
n3HeteroRing | Ring count descriptor. 2D | Number of 3-membered rings containing heteroatoms (N, O, P, S, or halogens) |
n4Ring | Ring count descriptor. 2D | Number of 4-membered rings |
n11Ring | Ring count descriptor. 2D | Number of 11-membered rings |
GATS3c | Autocorrelation descriptor. 2D | Geary autocorrelation – lag 3/weighted by charges |
TIC3 | Information content descriptor. 2D | Total information content index (neighborhood symmetry of 3rd order) |
PubchemFP143 | Fingerprint | ≥ 1 any ring size 5 |
PubchemFP147 | Fingerprint | ≥ 1 unsaturated non-aromatic carbon-only ring size 5 |
PubchemFP553 | Fingerprint | O=CC=C |
PubchemFP582 | Fingerprint | CCCCC |
PubchemFP639 | Fingerprint | OCCCO |
PubchemFP672 | Fingerprint | O=CC=C-[#1] |
PubchemFP688 | Fingerprint | CCCCCCC |
Table 3 shows that the following descriptors have a statistically significant linear relationship with the linear retention indices of the compounds at the 95% confidence level (R2 = 0.9960, Adj. R2 = 0.9951): MLFER_L, MLFER_S, ATSC5c, ATS5i, n3HeteroRing, n4Ring, n11Ring, GATS3c, TIC3 and the PubChem fingerprint descriptors PubchemFP143, 147, 553, 582, 639, 672 and 688. Additional information about the listed descriptors can be found in [24, 25]. From the VIF values, also presented in Table 3, it can be concluded that there is no significant collinearity among the listed variables.
Table 3. Regression model coefficients and their statistical assessment obtained by stepwise multiple linear regression for the selected significant descriptors (experimental set of 103 compounds)
Model Descriptor | Coefficient | Standard deviation | P-value | VIF |
Intercept | 241 | 18 | <0.001 | |
MLFER_L | 177 | 3 | <0.001 | 3.84 |
MLFER_S | 140 | 12 | <0.001 | 2.61 |
PubchemFP639 | −279 | 27 | <0.001 | 2.21 |
ATSC5c | −121 | 55 | 0.030 | 1.64 |
PubchemFP672 | 26 | 13 | 0.048 | 3.91 |
PubchemFP582 | −47 | 10 | <0.001 | 1.18 |
PubchemFP147 | −53 | 14 | <0.001 | 1.16 |
n3HeteroRing | 97 | 20 | <0.001 | 2.39 |
n4Ring | −55 | 8 | <0.001 | 1.32 |
GATS3c | −44 | 8 | <0.001 | 1.35 |
ATS5i | 0.006 | 0.001 | <0.001 | 11.37 |
n11Ring | −74 | 19 | <0.001 | 1.08 |
PubchemFP143 | −35 | 7 | <0.001 | 1.77 |
PubchemFP688 | −26 | 5 | <0.001 | 1.80 |
PubchemFP553 | 26 | 11 | 0.02 | 3.75 |
TIC3 | −0.27 | 0.1 | 0.04 | 5.79 |
It should be emphasized that the number of significant and independent variables is relatively small: only 16 molecular descriptors. This is encouraging with a view to developing a simple regression model with a limited number of variables.
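The collinearity check on Table 3 can be reproduced with a few lines. The sketch below assumes the usual definition VIF = 1/(1 − Ri2) and the acceptance limit of 15 stated in Section 2.6; the VIF values are copied from Table 3, while the variable and function names are illustrative.

```python
# VIF values copied from Table 3 (stepwise MLR, 103 compounds).
VIF_TABLE_3 = {
    "MLFER_L": 3.84, "MLFER_S": 2.61, "PubchemFP639": 2.21, "ATSC5c": 1.64,
    "PubchemFP672": 3.91, "PubchemFP582": 1.18, "PubchemFP147": 1.16,
    "n3HeteroRing": 2.39, "n4Ring": 1.32, "GATS3c": 1.35, "ATS5i": 11.37,
    "n11Ring": 1.08, "PubchemFP143": 1.77, "PubchemFP688": 1.80,
    "PubchemFP553": 3.75, "TIC3": 5.79,
}
VIF_LIMIT = 15.0  # acceptance threshold stated in Section 2.6


def implied_r2(vif):
    """Invert VIF = 1 / (1 - R_i^2) to recover R_i^2 (assumed definition)."""
    return 1.0 - 1.0 / vif


flagged = [name for name, vif in VIF_TABLE_3.items() if vif >= VIF_LIMIT]
worst = max(VIF_TABLE_3, key=VIF_TABLE_3.get)
print(f"Descriptors at or above the VIF limit: {flagged}")
print(f"Largest VIF: {worst} = {VIF_TABLE_3[worst]} "
      f"(implied R_i^2 = {implied_r2(VIF_TABLE_3[worst]):.3f})")
```

No descriptor reaches the limit. The largest value belongs to ATS5i and corresponds to an Ri2 of about 0.91, so this descriptor is the one most strongly explained by the others while still being accepted under the stated criterion.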
3.2.2 Development and validation of mathematical models
The set of 103 compounds was randomly divided into a training set and a validation set: 85% of the compounds (n = 87) were included in the training set and the remaining 15% (n = 16) in the validation set.
Two multiple linear regression models were developed, one with the Enter and one with the Stepwise algorithm, using the calculated LRIs of the 87 compounds in the training set as the dependent variable (y) and the 16 significant and independent molecular descriptors as the independent variables (xi).
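A random split of this kind can be sketched as below. The seed, the rounding rule (rounding the training-set size down) and the placeholder compound IDs are assumptions made for illustration; the source does not state how the random division was implemented.

```python
import math
import random


def split_compounds(compounds, train_fraction=0.85, seed=0):
    """Randomly split a list into training and validation subsets."""
    shuffled = list(compounds)
    random.Random(seed).shuffle(shuffled)  # fixed seed for reproducibility
    n_train = math.floor(train_fraction * len(shuffled))
    return shuffled[:n_train], shuffled[n_train:]


compound_ids = list(range(1, 104))  # placeholder IDs for the 103 compounds
training, validation = split_compounds(compound_ids)
print(len(training), len(validation))
```

With 103 compounds and the training size rounded down, this reproduces the 87/16 division used in the study.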
The coefficients of the two developed regression models and their statistical assessment are presented in Tables 4 and 5, respectively.
Table 4. Regression model coefficients and their statistical assessment obtained by the Enter algorithm (training set, n = 87)
Descriptor | Coefficient | Standard deviation | P-value | VIF |
Intercept | 233 | 19 | <0.001 | |
MLFER_L | 179 | 3 | <0.001 | 3.74 |
MLFER_S | 139 | 14 | <0.001 | 2.98 |
PubchemFP639 | −276 | 28 | <0.001 | 2.21 |
ATSC5c | −178 | 68 | 0.011 | 1.71 |
PubchemFP672 | 42 | 16 | 0.010 | 5.20 |
PubchemFP582 | −50 | 12 | <0.001 | 1.12 |
PubchemFP147 | −55 | 20 | 0.007 | 1.11 |
n3HeteroRing | 99 | 21 | <0.001 | 2.41 |
n4Ring | −54 | 9 | <0.001 | 1.38 |
GATS3c | −38 | 8 | <0.001 | 1.29 |
ATS5i | 0.006 | 0.001 | <0.001 | 11.80 |
n11Ring | −71 | 20 | 0.001 | 1.09 |
PubchemFP143 | −30 | 7 | <0.001 | 1.54 |
PubchemFP688 | −25 | 6 | <0.001 | 1.98 |
PubchemFP553 | 10 | 14 | 0.47 | 4.99 |
TIC3 | −0.29 | 0.15 | 0.05 | 5.88 |
Table 5. Regression model coefficients and their statistical assessment obtained by the Stepwise algorithm (training set, n = 87)
Descriptor | Coefficient | Standard deviation | P-value | VIF |
Intercept | 226 | 19 | <0.001 | |
MLFER_L | 179 | 3 | <0.001 | 3.73 |
MLFER_S | 126 | 12 | <0.001 | 2.27 |
PubchemFP639 | −276 | 29 | <0.001 | 2.21 |
ATSC5c | −205 | 67 | 0.003 | 1.60 |
PubchemFP672 | 52 | 9 | <0.001 | 1.61 |
PubchemFP582 | −53 | 12 | <0.001 | 1.10 |
n4Ring | −58 | 9 | <0.001 | 1.29 |
n3HeteroRing | 101 | 21 | <0.001 | 2.41 |
GATS3c | −40 | 8 | <0.001 | 1.28 |
PubchemFP147 | −58 | 20 | 0.006 | 1.10 |
ATS5i | 0.004 | 0.001 | <0.001 | 5.06 |
PubchemFP143 | −29 | 7 | <0.001 | 1.53 |
PubchemFP688 | −19 | 5 | 0.01 | 1.45 |
n11Ring | −70 | 20 | 0.001 | 1.09 |
The range of the predicted LRIs using both algorithms was from 833 to 1993.
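Applying a model of this form amounts to evaluating the intercept plus the weighted sum of the descriptor values. The sketch below uses the Stepwise coefficients as printed in Table 5; the descriptor values in the example are hypothetical and do not describe a real compound, and the function name is illustrative.

```python
# Coefficients of the Stepwise model, copied from Table 5 (training set, n = 87).
STEPWISE_INTERCEPT = 226.0
STEPWISE_COEFFS = {
    "MLFER_L": 179.0, "MLFER_S": 126.0, "PubchemFP639": -276.0,
    "ATSC5c": -205.0, "PubchemFP672": 52.0, "PubchemFP582": -53.0,
    "n4Ring": -58.0, "n3HeteroRing": 101.0, "GATS3c": -40.0,
    "PubchemFP147": -58.0, "ATS5i": 0.004, "PubchemFP143": -29.0,
    "PubchemFP688": -19.0, "n11Ring": -70.0,
}


def predict_lri(descriptors):
    """Evaluate y = b0 + sum(b_i * x_i); every model descriptor must be given."""
    missing = set(STEPWISE_COEFFS) - set(descriptors)
    if missing:
        raise KeyError(f"missing descriptors: {sorted(missing)}")
    return STEPWISE_INTERCEPT + sum(
        coeff * descriptors[name] for name, coeff in STEPWISE_COEFFS.items()
    )


# Hypothetical descriptor values for illustration only (not a real compound).
example = dict.fromkeys(STEPWISE_COEFFS, 0.0)
example.update({"MLFER_L": 5.0, "MLFER_S": 0.5, "GATS3c": 1.0, "ATS5i": 10000.0})
print(round(predict_lri(example)))
```

Because the tabulated coefficients are rounded (the ATS5i coefficient to a single significant figure), values computed this way may differ from the authors' predictions by several index units.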
The adequacy of the regression model obtained with the Enter algorithm was assessed as follows: R2 = 0.9960, Adj. R2 = 0.9951, and RMSE = 17.
With the Stepwise algorithm, the PubchemFP553 and TIC3 descriptors were excluded from the model equation as non-significant. The adequacy of the regression model obtained with the Stepwise algorithm was assessed as follows: R2 = 0.9958, Adj. R2 = 0.9949, and RMSE = 17.
The values of R2 and Adj. R2 close to one, together with the low RMSE values for both algorithms (Enter and Stepwise), show that the developed regression models can be regarded as adequate. This is further confirmed by the plots of experimentally determined (observed) versus predicted values (Figs 2 and 3). For the training set, the slope was statistically identical to one and the intercept practically equal to zero for both algorithms.
The mathematical models were further validated by comparing the observed and calculated LRIs of the compounds from the validation set (16 compounds). The results (Figs 2 and 3) are equivalent to those found for the training set, i.e. for both algorithms the slope can be regarded as equal to one and the intercept is statistically identical to zero.
In addition, the RMSE and the prediction coefficient q2F1 were calculated for the validation set. For the Enter algorithm, RMSE = 25 and q2F1 = 0.9896 were found. For the Stepwise algorithm, RMSE = 26 and q2F1 = 0.9886 were found. The acceptably low RMSE values in combination with q2F1 values close to one show that the developed MLR models remain adequate for the prediction of LRIs of compounds not used to build the regression function.
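The relation between R2 and its adjusted value can be checked numerically. The sketch below assumes the standard adjustment for n observations and m independent variables and uses the rounded R2 values reported above; the function name is illustrative.

```python
def adjusted_r2(r2, n, m):
    """Adjusted R^2 for n observations and m independent variables."""
    return 1.0 - (1.0 - r2) * (n - 1) / (n - m - 1)


# Training set of 87 compounds; 16 (Enter) and 14 (Stepwise) descriptors.
enter = adjusted_r2(0.9960, n=87, m=16)
stepwise = adjusted_r2(0.9958, n=87, m=14)
print(f"Enter: {enter:.4f}, Stepwise: {stepwise:.4f}")
```

With the rounded inputs this gives 0.9951 for the Enter model, matching the reported value, and about 0.9950 for the Stepwise model against the reported 0.9949; a difference of this size is consistent with R2 having been rounded before it was reported.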
3.2.3 Development and optimization of HS-SPME
The validated MLR models were tested with an external set of compounds, experimentally identified by HS-SPME analysis of the essential oils. For the HS-SPME, a semi-polar CAR/PDMS coated fiber was selected based on the chemical properties of the target compounds and their volatility, and because it has been reported in previously proposed methods as a good option for the extraction of a wide range of volatile and semi-volatile compounds [26, 27]. SPME can provide a better signal-to-noise ratio and eliminate matrix components, owing to the affinity of the volatile compounds for the fiber coating and the resulting preconcentration.
The effects of the conditioning time, extraction time and sample volume on the HS-SPME extraction efficiency were studied using a one-variable-at-a-time approach. For the optimization study, nine analytes from Rose oil (α-Pinene, β-Pinene, Linalyl alcohol, cis-Rose oxide, Phenethyl alcohol, Nerol, Citronellol, Geraniol, Methyleugenol), seven analytes from Lavender oil (α-Thujene, α-Pinene, Camphene, Sabinene, β-Cymene, Eucalyptol and γ-Terpinene) and nine from Peppermint oil (α-Pinene, Sabinene, Eucalyptol, Menthone, Isomenthone, Menthol, Pulegone, Menthyl acetate and β-Caryophyllene) were monitored. The compounds were selected so as to represent different chemical classes with different properties.
The effect of sample volume was investigated using 5 and 10 µL volumes of Rose, Lavender and Peppermint oils using a conditioning time of 30 min, 10 min sorption time, room temperature and no stirring. It was found that in the analysis of Lavender and Rose oil, 10 µL of the sample is sufficient to obtain high peak intensities under the prescribed chromatographic conditions. A 5 µL volume was found to be sufficient to obtain high intensities in the analysis of the Peppermint oil sample.
The conditioning time was studied at 15, 30, 45, and 60 min using a 10 min sorption time, room temperature and no stirring. The results showed that the maximum peak areas for the selected volatile compounds in Rose oil and Lavender oil are achieved after 45 min of conditioning, except for α-Pinene, β-Pinene and Camphene. In Rose oil, the signals of α-Pinene and β-Pinene decrease with increasing time, while in Lavender oil β-Pinene and Camphene reach their maximum peak area at 60 min of conditioning. Nevertheless, the signals achieved at 45 min were high enough for successful identification by GC-MS. For Peppermint oil, the highest peak areas of all selected analytes are achieved at 30 min of conditioning. Therefore, the conditioning time was set to 45 min for the analysis of Rose and Lavender oil and to 30 min for the analysis of Peppermint oil.
The sorption time was optimized at 2.5, 5, 7.5, and 10 min using a 45 min conditioning time for Rose and Lavender oil and 30 min for Peppermint oil, at room temperature and without stirring. Because of the higher signals obtained for Peppermint oil, the effect of sorption time was studied only at 2.5 and 5 min for this oil. The results show that the sorption efficiency for some volatile compounds, such as α-Pinene, differs among the three matrices. In the analysis of Rose oil and Peppermint oil, the sorption time needs to be shorter to avoid the loss of α-Pinene. For all other volatile compounds, a longer sorption time is preferred. For the analysis of Rose and Peppermint oil, 5 min was chosen for HS-SPME, while for Lavender oil the time was set to 10 min.
3.2.4 Testing of the LRI prediction models
According to the criteria prescribed in Section 2.2, a total of 19 additional compounds were identified using HS-SPME that were not registered (being below the method limit of detection) in the liquid injection GC-MS analysis. However, of these 19 newly detected compounds, only 12 were selected as a test set, since only for them were the values of the molecular descriptors within the corresponding ranges established by the training set (Table 6).
Table 6. Range of the values of the molecular descriptors in the training and test sets
Descriptor | Training set (n = 87), minimum | Training set (n = 87), maximum | Test set (n = 12), minimum | Test set (n = 12), maximum |
MLFER_L | 3.52 | 9.75 | 4.42 | 7.10 |
MLFER_S | 0.13 | 1.41 | 0.16 | 0.73 |
ATSC5c | −0.12 | 0.09 | −0.04 | 0.06 |
ATS5i | 1665 | 27156 | 7458 | 22638 |
n3HeteroRing | 0 | 1 | 0 | 1 |
n4Ring | 0 | 1 | 0 | 1 |
n11Ring | 0 | 1 | 0 | 0 |
GATS3c | 0.63 | 2.46 | 0.69 | 1.90 |
TIC3 | 59 | 186 | 77 | 161 |
PubchemFP143 | 0 | 1 | 0 | 1 |
PubchemFP147 | 0 | 1 | 0 | 1 |
PubchemFP553 | 0 | 1 | 0 | 1 |
PubchemFP582 | 0 | 1 | 0 | 1 |
PubchemFP639 | 0 | 1 | 0 | 0 |
PubchemFP672 | 0 | 1 | 0 | 1 |
PubchemFP688 | 0 | 1 | 0 | 1 |
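The range check behind Table 6 can be expressed as a simple applicability test. The sketch below uses the training-set minima and maxima from Table 6; the function name and the two example descriptor sets are hypothetical and serve only as illustration.

```python
# Training-set descriptor ranges (minimum, maximum) copied from Table 6.
TRAINING_RANGES = {
    "MLFER_L": (3.52, 9.75), "MLFER_S": (0.13, 1.41), "ATSC5c": (-0.12, 0.09),
    "ATS5i": (1665, 27156), "n3HeteroRing": (0, 1), "n4Ring": (0, 1),
    "n11Ring": (0, 1), "GATS3c": (0.63, 2.46), "TIC3": (59, 186),
    "PubchemFP143": (0, 1), "PubchemFP147": (0, 1), "PubchemFP553": (0, 1),
    "PubchemFP582": (0, 1), "PubchemFP639": (0, 1), "PubchemFP672": (0, 1),
    "PubchemFP688": (0, 1),
}


def out_of_range(descriptors, ranges=TRAINING_RANGES):
    """Return {name: value} for descriptors outside the training-set range."""
    return {
        name: value
        for name, value in descriptors.items()
        if not ranges[name][0] <= value <= ranges[name][1]
    }


# Hypothetical descriptor values for two candidate compounds (illustration only).
inside = {"MLFER_L": 5.10, "MLFER_S": 0.40, "ATS5i": 12000, "GATS3c": 1.10}
outside = {"MLFER_L": 10.20, "MLFER_S": 0.40, "ATS5i": 12000, "GATS3c": 1.10}

print(out_of_range(inside))   # nothing is reported for this compound
print(out_of_range(outside))  # MLFER_L exceeds the training maximum of 9.75
```

A compound would be admitted to the test set only if the returned dictionary is empty, which mirrors the selection of 12 out of 19 newly detected compounds.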
The experimentally determined LRIs for the test set of compounds as well as their SMILES notations are presented in Table 7.
Table 7. Experimentally determined LRIs and SMILES notations for the 12 compounds additionally identified using HS-SPME
Set | Designation | LRI | SMILES |
Test set | Dehydrosabinene | 942 | CC(C)C12CC1C(=C)C=C2 |
Test set | 1,3,5-Cycloheptatriene, 3,7,7-trimethyl | 959 | CC1=CC=CC(C=C1)(C)C |
Test set | p-Menthane | 979 | CC(C1)CCC(C1)C(C)C |
Test set | δ-2-Carene | 1009 | CC1=CC2C(C2(C)C)CC1 |
Test set | Rosefuran | 1098 | CC1=C(OC=C1)CC=C(C)C |
Test set | 2,4,6-Octatriene, 2,6-dimethyl | 1128 | CC=C(C)C=CC=C(C)C |
Test set | β-Ocimene epoxide | 1143 | CC(=CCC1C(O1)(C)C)C=C |
Test set | Butyric acid, 2-methyl-, hexyl ester | 1239 | CCCCCCOC(=O)C(C)CC |
Test set | Neryl formate | 1304 | CC(=CCCC(=CCOC=O)C)C |
Test set | 1,7-epi-Sesquithujene | 1382 | CC1=CCC2(C1C2)C(C)CCC=C(C)C |
Test set | cis-α-Bergamotene | 1403 | CC1=CCC2CC1C2(C)CCC=C(C)C |
Test set | δ-Cadinene | 1509 | CC1=CC2C(CCC(=C2CC1)C)C(C)C |
The LRIs of the compounds from the test set were predicted with the two MLR models. The root mean square error and the coefficient of prediction were calculated for both algorithms: Enter RMSE = 41, q2F2 = 0.9503 and Stepwise RMSE = 40, q2F2 = 0.9521. Table 8 presents the observed values and the values predicted from the model equations using the Enter or the Stepwise algorithm. The results for the test set are worse than those for the validation set but still informative, bearing in mind that a simple linear model with only 14 (or 16) descriptors is used.
Table 8. Observed and predicted LRIs for the test set obtained with the Enter and Stepwise MLR models
Compound | Observed LRI (y) | Enter: predicted LRI (ŷ) | Enter: y − ŷ | Stepwise: predicted LRI (ŷ) | Stepwise: y − ŷ |
Dehydrosabinene | 942 | 897 | 45 | 898 | 44 |
1,3,5-Cycloheptatriene, 3,7,7-trimethyl | 959 | 1063 | −103 | 1061 | −102 |
p-Menthane | 979 | 981 | −2 | 975 | 4 |
δ-2-Carene | 1009 | 988 | 21 | 992 | 17 |
Rosefuran | 1098 | 1113 | −16 | 1111 | −13 |
2,4,6-Octatriene, 2,6-dimethyl | 1128 | 1054 | 74 | 1057 | 71 |
β-Ocimene epoxide | 1143 | 1118 | 25 | 1117 | 26 |
Butyric acid, 2-methyl-, hexyl ester | 1239 | 1246 | −7 | 1254 | −15 |
Neryl formate | 1304 | 1316 | −12 | 1314 | −10 |
1,7-epi-Sesquithujene | 1382 | 1381 | 1 | 1387 | −5 |
cis-α-Bergamotene | 1403 | 1420 | −17 | 1420 | −17 |
δ-Cadinene | 1509 | 1520 | −10 | 1521 | −12 |
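The test-set statistics can be recomputed directly from Table 8. The sketch below assumes the RMSE is taken over the 12 test compounds and that q2F2 uses the mean of the observed test-set LRIs as reference; function and variable names are illustrative.

```python
import math

# Observed and predicted LRIs copied from Table 8 (test set, n = 12).
observed = [942, 959, 979, 1009, 1098, 1128, 1143, 1239, 1304, 1382, 1403, 1509]
pred_enter = [897, 1063, 981, 988, 1113, 1054, 1118, 1246, 1316, 1381, 1420, 1520]
pred_stepwise = [898, 1061, 975, 992, 1111, 1057, 1117, 1254, 1314, 1387, 1420, 1521]


def rmse(y, y_hat):
    """Root mean square error over all observations."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(y, y_hat)) / len(y))


def q2_f2(y, y_hat):
    """Prediction coefficient referenced to the mean of the external set."""
    mean_ext = sum(y) / len(y)
    rss = sum((a - b) ** 2 for a, b in zip(y, y_hat))
    tss = sum((a - mean_ext) ** 2 for a in y)
    return 1.0 - rss / tss


for label, pred in (("Enter", pred_enter), ("Stepwise", pred_stepwise)):
    print(f"{label}: RMSE = {rmse(observed, pred):.1f}, "
          f"q2F2 = {q2_f2(observed, pred):.4f}")
```

Under these assumptions the tabulated values give RMSEs of about 41 (Enter) and 40 (Stepwise) and q2F2 values of roughly 0.950 and 0.952, in agreement with the reported statistics; small deviations in the last digit are expected because the tabulated LRIs are rounded to integers.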
In a recent study using MLR to predict the Kovats retention indices of essential oil components in gas chromatographic analysis, Teuku et al. used 30 molecular descriptors [10]. For the training set the authors obtained R2 = 0.970 and RMSE = 57, and for the testing set R2 = 0.970 and RMSE = 57 [10]. Yan et al. developed a regression model incorporating 10 descriptors for a diverse set of flavour compounds (n = 107), achieving R2 = 0.9741, RMSEtraining = 54, RMSEcv = 66 and Q2 = 0.9619 [28]. In another study focused on the prediction of the Kovats retention indices of a large set of terpenes (n = 523), Hemmateenejad et al. proposed two MLR equations, each including 8 descriptors (selected from a pool of 55 descriptors), with the following characteristics: i) R2 = 0.915, Q2 = 0.910, RMSCV (root mean square error of cross-validation) = 60; ii) R2 = 0.925, Q2 = 0.921, RMSCV = 56 [29]. Yan et al. built three MLR models for the prediction of the retention indices of compounds from plant essential oils separated on different types of chromatographic columns, i.e. column-NP (nonpolar), column-SP (slightly polar) and column-P (polar) [30]. A pool of 600 molecular descriptors was used to derive 8, 7, and 12 significant and independent descriptors, respectively, which were included in the mathematical equations. The authors reported the following results: i) column-NP model (n = 468): R2 = 0.9522, RMSEtraining = 67, RMSEvalidation = 70, RMSETest = 69, Q2 = 0.9491; ii) column-SP model (n = 469): R2 = 0.9528, RMSEtraining = 67, RMSEvalidation = 68, RMSETest = 68, Q2 = 0.9508; iii) column-P model (n = 457): R2 = 0.9415, RMSEtraining = 94, RMSEvalidation = 102, RMSETest = 105, Q2 = 0.9307 [30].
These results from other researchers show that the MLR models developed in the current work give comparable or better results in terms of the obtained R2, RMSE and q2 values.
4 Conclusions
It can be summarized that two approaches (direct liquid injection and HS-SPME) have been successfully developed for the qualitative analysis of volatile compounds in rose, lavender and peppermint oils by gas chromatography-mass spectrometry. The proposed multiple linear regression models for predicting LRI values on a semi-standard non-polar stationary phase were developed using experimental data obtained from the analysis of real samples of different Essential oils. The two regression models are relatively simple (including only 16 and 14 independent variables, respectively), and both give comparable and adequate predictions of the linear retention indices of volatile components in Essential oils. An important requirement for the adequacy of the models is that the values of the molecular descriptors entered in the regression equations for a given compound lie within the ranges studied and specified in the current work.
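The range requirement stated above can be checked before a prediction is made. The following is a minimal sketch, assuming the working range of each descriptor is available as a (low, high) pair. The descriptor names and numeric ranges are hypothetical placeholders, not the values specified in this work.

```python
def out_of_range(descriptors, ranges):
    """Return the names of descriptors whose value lies outside [low, high]."""
    flagged = []
    for name, value in descriptors.items():
        low, high = ranges[name]
        if not (low <= value <= high):
            flagged.append(name)
    return flagged


# Hypothetical working ranges (placeholders for the ranges of the training set).
working_ranges = {"desc_A": (0.5, 3.2), "desc_B": (10.0, 45.0)}

# Hypothetical candidate compounds described by the same two descriptors.
candidate_inside = {"desc_A": 1.7, "desc_B": 30.0}
candidate_outside = {"desc_A": 3.9, "desc_B": 30.0}

for label, candidate in (("inside", candidate_inside),
                         ("outside", candidate_outside)):
    flagged = out_of_range(candidate, working_ranges)
    if flagged:
        print(f"{label}: outside working range for {flagged}; prediction not supported")
    else:
        print(f"{label}: all descriptors within the working range")
```

A compound flagged by such a check would fall outside the conditions under which the models were assessed, so its predicted LRI should not be relied upon.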
Author contributions
Conceptualization, Asya Hristozova; methodology, Asya Hristozova and Slava Tsoneva; software, Asya Hristozova; validation, Kiril Simitchiev, Asya Hristozova and Margarita Batmazyan; formal analysis, Asya Hristozova and Margarita Batmazyan; investigation, Asya Hristozova and Margarita Batmazyan; resources, Asya Hristozova; data curation, Asya Hristozova and Kiril Simitchiev; writing – original draft preparation, Asya Hristozova and Margarita Batmazyan; writing – review and editing, Kiril Simitchiev; visualization, Asya Hristozova and Margarita Batmazyan; supervision, Erwin Rosenberg and Kiril Simitchiev; project administration, Veselin Kmetov; funding acquisition, Veselin Kmetov. All authors have read and agreed to the published version of the manuscript.
Acknowledgments
This work was supported by the Bulgarian National Science Fund, Project KP-06-Austria/3 (2021), "Challenges and development of GC-MS/MS analytical methods – academic partnership".
Assoc. Prof. Nikolay Kochev and Assist. Prof. Veselina Paskaleva (University of Plovdiv "Paisii Hilendarski", Department of Analytical Chemistry and Computer Chemistry) are gratefully acknowledged for their support in this work.
References
- 1. Trovato, E.; Micalizzi, G.; Dugo, P.; Utczás, M.; Mondello, L. In Handbook of Essential Oils: Science, Technology, and Applications; Baser, K. H. C., Buchbauer, G., Eds.; CRC Press, 3rd ed., 2021; pp 229–251.
- 3. Koksall, N.; Aslancan, H.; Sadighazadi, S.; Kafkas, E. Acta Sci. Pol. Hortorum Cultus 2015, 14, 105–114.
- 4. Sparkman, O. D.; Penton, Z. E.; Kitson, F. G. Gas Chromatography and Mass Spectrometry: a Practical Guide; Elsevier, 2011; p 15.
- 7. Rigano, F.; Arigò, A.; Oteri, M.; La Tella, R.; Dugo, P.; Mondello, L. J. Chromatogr. A 2021, 1640, 461963.
- 8. Liapikos, T.; Zisi, C.; Kodra, D.; Kademoglou, K.; Diamantidou, D.; Begou, O.; Pappa-Louisi, A.; Theodoridis, G. J. Chromatogr. B 2022, 1191, 123132.
- 10. Noviandy, T. R.; Maulana, A.; Sasmita, N. R.; Suhendra, R.; Irvanizam, I.; Muslem, M.; Idroes, G. M.; Yusuf, M.; Sofyan, H.; Abidin, T. F.; Idroes, R. J. Eng. Sci. Technol. 2022, 17, 0306–0326.
- 14. Chirico, N.; Gramatica, P. J. Chem. Inf. Model. 2012, 52, 2044–2058.
- 16. Yan, J.; Cao, D.-S.; Guo, F.-Q.; Zhang, L.-X.; He, M.; Huang, J.-H.; Xu, Q.-S.; Liang, Y.-Z. J. Chromatogr. A 2012, 1223, 118–125.
- 17. Seber, G. A. F.; Lee, A. L. Linear Regression Analysis; John Wiley & Sons: New Jersey, 2003; p 35.
- 18. Garkani-Nejad, Z.; Karlovits, M.; Demuth, W.; Stimpfl, T.; Vycudilik, W.; Jalali-Heravi, M.; Varmuza, K. J. Chromatogr. A 2004, 1028, 287–295.
- 19. Idroes, R.; Noviandy, T. R.; Maulana, A.; Suhendra, R.; Sasmita, N. R.; Muslem, M.; Idroes, G. M.; Irvanizam, I. Int. Rev. Model. Simul. 2019, 12, 373.
- 20. National Library of Medicine. PubChem. Available online at https://pubchem.ncbi.nlm.nih.gov (accessed Jan 24, 2024).
- 21. PaDEL-Descriptor. Available online at http://www.yapcwsoft.com/dd/padeldescriptor (accessed Jan 24, 2024).
- 24. Platts, J. A.; Butina, D.; Abraham, M. H.; Hersey, A. J. Chem. Inf. Comput. Sci. 1999, 39, 835–845.
- 25. Antanasijević, J.; Antanasijević, D.; Pocajt, V.; Trišović, N.; Fodor-Csorba, K. RSC Adv. 2016, 6, 18452–18464.
- 26. Kalogiouri, N. P.; Manousi, N.; Rosenberg, E.; Zachariadis, G. A.; Paraskevopoulou, A.; Samanidou, V. Food Chem. 2021, 363, 130331.
- 27. Lo, M.-M.; Benfodda, Z.; Bénimélis, D.; Fontaine, J.-X.; Molinié, R.; Meffre, P. ACS Omega 2021, 6, 12691–12698.
- 28. Yan, J.; Cao, D.-S.; Guo, F.-Q.; Zhang, L.-X.; He, M.; Huang, J.-H.; Xu, Q.-S.; Liang, Y.-Z. J. Chromatogr. A 2012, 1223, 118–125.
- 30. Yan, J.; Huang, J.; He, M.; Lu, H.; Yang, R.; Kong, B.; Xu, Q.; Liang, Y. J. Sep. Sci. 2013, 36, 2464–2471.