Machine Learning and Feature Selection Methods for Disease Classification

Summary

Lung cancer screening suffers from a high false positive rate, as reported in studies such as the National Lung Screening Trial (NLST), which leads to unnecessary follow-up procedures and patient anxiety. Our project addressed this problem by developing and applying machine learning and feature selection methods to improve the accuracy of lung nodule classification from CT scan image data.

I contributed to this effort by analyzing 416 quantitative imaging biomarkers from 200 patient CT scans. By applying several families of predictive classification models (linear, nonlinear, and ensemble) together with a range of feature selection methods, we sought to identify the most effective combinations for distinguishing cancerous from benign nodules.
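
As an illustration of what a feature selection step of this kind can look like, the following minimal Python sketch applies a pairwise-correlation filter to synthetic data. It is not the study's implementation: the greedy rule, the 0.9 cutoff, the helper names `pearson` and `correlation_filter`, and the feature names are all assumptions made for the example.

```python
import math
import random


def pearson(x, y):
    """Pearson correlation of two equal-length sequences (0.0 if either is constant)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0
    return sxy / math.sqrt(sxx * syy)


def correlation_filter(columns, cutoff=0.9):
    """Greedily keep a feature only if |r| with every already-kept feature is <= cutoff.

    columns: dict mapping feature name -> list of values (one per patient).
    Returns the kept feature names in their original order.
    """
    kept = []
    for name, values in columns.items():
        if all(abs(pearson(values, columns[k])) <= cutoff for k in kept):
            kept.append(name)
    return kept


# Synthetic stand-in data (NOT the study's biomarkers): 200 "patients", 3 features.
rng = random.Random(0)
volume = [rng.gauss(0, 1) for _ in range(200)]
features = {
    "volume": volume,
    # Near-duplicate of "volume", so the filter should drop it.
    "volume_rescaled": [2.0 * v + 0.01 * rng.gauss(0, 1) for v in volume],
    # Drawn independently of "volume", so the filter should keep it.
    "texture": [rng.gauss(0, 1) for _ in range(200)],
}

selected = correlation_filter(features, cutoff=0.9)
print(selected)
```

A filter like this runs before model fitting, so the classifier sees fewer redundant inputs. With 416 biomarkers and only 200 scans, reducing redundancy of this sort is one plausible motivation for a selection step, though the cutoff would need to be tuned rather than fixed at an arbitrary value.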

Our research showed that specific machine learning classifiers, such as elastic net and support vector machine, improved the accuracy of lung nodule classification when combined with linear combination or correlation feature selection methods (Delzell et al., 2019). These methods also substantially reduced the false positive rate relative to earlier studies such as the NLST. This work contributes to more precise and efficient lung cancer screening, with the aim of improving patient outcomes by reducing unnecessary procedures and providing more reliable diagnoses.
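
For reference, the false positive rate discussed here is the share of benign nodules that a classifier flags as cancerous: FP / (FP + TN). The short Python sketch below computes it from predicted and true labels. The helper name `false_positive_rate` and the labels are made up for illustration and are not study data.

```python
def false_positive_rate(y_true, y_pred):
    """Return FP / (FP + TN), where 1 = cancerous and 0 = benign."""
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    if fp + tn == 0:
        raise ValueError("no benign cases: false positive rate is undefined")
    return fp / (fp + tn)


# Illustrative labels only: 5 benign nodules (0) and 3 cancerous nodules (1).
y_true = [0, 0, 0, 0, 1, 1, 1, 0]
y_pred = [0, 1, 0, 0, 1, 0, 1, 0]

fpr = false_positive_rate(y_true, y_pred)
print(f"False positive rate: {fpr:.2f}")
```

Here one of the five benign nodules is misclassified, giving a rate of 0.20. Because the metric depends only on the benign cases, it can be compared across classifiers alongside overall accuracy.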