Title: Cross-domain assessment of feature effectiveness in functional materials prediction via multiple machine learning models
Document type: multimedia document
Authors: Maroua Rogti, Author; Ali Benghia, Thesis supervisor; Bouchra Kaima Mechraoui, Thesis supervisor
Publisher: Laghouat: Université Amar Telidji - Département des sciences de la matière
Publication year: 2025
Extent: 56 p.
General note: Option: Materials physics
Languages: English
Keywords: Pymatgen library; Functional materials; Machine learning; Regression models; Principal component analysis
Abstract: This study explores the impact of feature selection on the accuracy of band gap prediction across two distinct material datasets: nonlinear optical (NLO) compounds and perovskite-structured ABX₃ materials. Features were extracted using the Pymatgen library in Python. The NLO dataset includes 224 chemically diverse compounds collected from the literature, while the perovskite dataset consists of 643 ABX₃ materials, with X being O, S, Se, F, Cl, Br, or I, sourced from the OQMD database. Several machine learning models were employed, with the Random Forest Regressor (RFR) consistently achieving the best performance. For the ABX₃ dataset, RFR yielded a mean absolute error (MAE) of 0.4941 eV using the top 10 features and 0.5044 eV with all features; for the NLO dataset, the model achieved 0.3309 eV and 0.3238 eV, respectively. Dimensionality reduction through Principal Component Analysis (PCA) did not improve the results compared to manually selected or full feature sets. These findings highlight the significance of proper feature engineering, especially when relying on descriptors generated by automated tools like Pymatgen, to improve the efficiency and reliability of machine learning applications in materials discovery.
Thesis note: Master's thesis in physics