Feature selection is essential for dimensionality reduction in big data, but it becomes considerably harder on high-dimensional and sparse datasets, where many entries are missing. To address this, this paper proposes Unconstrained Latent Factorization-based Improved Relief-F (ULF-IR), a feature selection method tailored to such data. The method integrates two components: (1) a double factorization (DF)-based unconstrained latent factor model that accurately reconstructs missing data without relying on pre-imputation or strict non-negativity constraints; and (2) an improved Relief-F (IRelief-F) algorithm that assigns reliable importance weights to features and differentiates among highly similar features even in the presence of noise introduced when missing values are filled in. Experiments on three real-world datasets show that ULF-IR consistently surpasses state-of-the-art methods in both classification accuracy and robustness, indicating that it is a dependable solution for feature selection on high-dimensional, incomplete data.
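The two-stage idea (reconstruct missing entries with a latent factor model, then weight features with a Relief-style score) can be illustrated with a minimal sketch. This is not the paper's DF-based model or its IRelief-F algorithm: it assumes a plain rank-k factorization trained by stochastic gradient descent on observed entries only, followed by the basic two-class Relief update with one nearest hit and one nearest miss. The function names `lf_reconstruct` and `relief_weights`, the hyperparameters, and the toy data are all hypothetical.

```python
import numpy as np


def lf_reconstruct(X, mask, rank=2, lr=0.01, reg=0.01, epochs=500, seed=0):
    """Fill missing entries of X using an unconstrained factorization X ~ P @ Q.T.

    Only entries where mask is True are used for training, and the factors
    are free to take negative values (no non-negativity constraint).
    Assumption: plain SGD with L2 regularization, not the paper's DF scheme.
    """
    rng = np.random.default_rng(seed)
    n, m = X.shape
    P = rng.normal(scale=0.1, size=(n, rank))
    Q = rng.normal(scale=0.1, size=(m, rank))
    rows, cols = np.nonzero(mask)
    for _ in range(epochs):
        for i, j in zip(rows, cols):
            err = X[i, j] - P[i] @ Q[j]
            p_old = P[i].copy()  # keep the old row so both updates use it
            P[i] += lr * (err * Q[j] - reg * P[i])
            Q[j] += lr * (err * p_old - reg * Q[j])
    # Keep observed values as they are; use the model only for missing ones.
    return np.where(mask, X, P @ Q.T)


def relief_weights(X, y):
    """Basic Relief feature weights (one nearest hit, one nearest miss).

    Assumption: every class has at least two samples and X has no missing
    values (for example, after lf_reconstruct).
    """
    n, m = X.shape
    span = X.max(axis=0) - X.min(axis=0)
    span[span == 0] = 1.0  # avoid division by zero for constant features
    Xs = (X - X.min(axis=0)) / span
    w = np.zeros(m)
    for i in range(n):
        dist = np.abs(Xs - Xs[i]).sum(axis=1)  # Manhattan distance
        dist[i] = np.inf  # a sample is never its own neighbour
        same = y == y[i]
        hit = np.argmin(np.where(same, dist, np.inf))
        miss = np.argmin(np.where(~same, dist, np.inf))
        # Reward features that differ on the miss, penalize those that differ on the hit.
        w += np.abs(Xs[i] - Xs[miss]) - np.abs(Xs[i] - Xs[hit])
    return w / n


if __name__ == "__main__":
    # Toy data: feature 0 follows the class label, feature 1 does not.
    X = np.array([[0.0, 0.3], [0.1, 0.9], [1.0, 0.2], [0.9, np.nan]])
    y = np.array([0, 0, 1, 1])
    mask = ~np.isnan(X)
    X_full = lf_reconstruct(X, mask)
    print(relief_weights(X_full, y))
```

Features would then be ranked by descending weight and the top-scoring ones retained. Training only on observed entries is what lets the factorization skip a separate pre-imputation step in this sketch.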