A team of researchers led by Dr. Honggang Wang, chair of the graduate Department of Computer Science and Engineering at the Katz School, presented two significant advances in data science at the 2024 Joint Statistical Meetings in Portland, Ore., in August. Both focused on applying deep learning and federated learning to medical research.
“Our presentations underscored the importance of integrating advanced mathematical tools like the Choquet integral into data science, particularly in the healthcare sector,” said Dr. Wang. “Our research not only advances the field of medical data analysis but has the potential to transform how healthcare providers approach patient care, especially in the era of big data and decentralized research.”
The conference, one of the largest gatherings of statisticians and data scientists in the world, focused this year on “Statistics and Data Science: Informing Policy and Countering Misinformation.” During the weeklong event, Dr. Wang, Matthew Fried, a Ph.D. candidate in mathematics at the Katz School, and Semyon Lomasov, a research assistant pursuing graduate studies at Stanford University, contributed to the vibrant discussion on the latest developments in statistical learning and data science.
In one presentation, titled “A New Choquet Activation Function Based Deep Neural Network for Drug Interaction Detection,” Fried and Dr. Wang introduced a novel method for improving how neural networks detect drug interactions. These interactions can be complex, involving multiple variables whose joint effects standard methods might overlook. To address this, the team developed a new activation function for neural networks based on the Choquet integral, a mathematical tool that, unlike a simple weighted sum, can capture intricate relationships between variables.
Their research showed that this approach could more accurately model the synergistic and antagonistic effects of different drugs, offering valuable insights into areas such as weight loss and drug interactions. The method outperformed conventional techniques, particularly at recognizing complex patterns in medical data, and holds promise for improving big-data-driven decision-making in healthcare.
In another session, Lomasov and Dr. Wang presented “Federated Choquet Regression with Categorical Variables for Outcome Prediction in Longitudinal Trial Data,” which addressed the growing need for decentralized models in biomedical research, driven by the privacy concerns that arise when pooling patient data from multiple clinical sites. In federated learning, each site keeps its data local and shares only model information. The team developed a new federated regression algorithm based on the Choquet integral; the approach is particularly effective for handling complex, non-additive medical data without making prior statistical assumptions.
Non-additive medical data are data in which the combined effect of two or more variables does not simply equal the sum of their individual effects. For example, the effect of taking two drugs together might be greater or less than the sum of the effects of each drug taken separately, because the drugs can interact in ways that enhance or diminish each other's effects.
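To make the idea concrete, here is a minimal Python sketch of a discrete Choquet integral, the mathematical tool behind both projects. It is an illustration only, not the team's activation function or regression model, and the drug names, measure values and scores are invented. A "fuzzy measure" assigns a weight to every group of variables; when the weight of a pair differs from the sum of its members' weights, the integral reflects synergy (larger) or antagonism (smaller).

```python
from typing import FrozenSet, Mapping


def choquet_integral(x: Mapping[str, float],
                     mu: Mapping[FrozenSet[str], float]) -> float:
    """Discrete Choquet integral of non-negative scores x with respect to fuzzy measure mu."""
    remaining = set(x)      # variables whose score is >= the current level
    total, prev = 0.0, 0.0
    for name, value in sorted(x.items(), key=lambda kv: kv[1]):
        # Each increase in score is weighted by the measure of the group still "active"
        total += (value - prev) * mu[frozenset(remaining)]
        prev = value
        remaining.remove(name)
    return total


# Hypothetical two-drug example; all numbers are invented for illustration.
A, B = "drug_A", "drug_B"
scores = {A: 0.5, B: 0.8}
measures = {
    "additive":     {frozenset({A}): 0.3, frozenset({B}): 0.3, frozenset({A, B}): 0.6},
    "synergistic":  {frozenset({A}): 0.3, frozenset({B}): 0.3, frozenset({A, B}): 1.0},
    "antagonistic": {frozenset({A}): 0.3, frozenset({B}): 0.3, frozenset({A, B}): 0.4},
}
for label, mu in measures.items():
    print(f"{label:>12}: {choquet_integral(scores, mu):.2f}")
```

With the additive measure, the result equals the ordinary weighted sum 0.3 * 0.5 + 0.3 * 0.8 = 0.39. Raising the pair's weight to 1.0 (synergy) lifts the result to 0.59, and lowering it to 0.4 (antagonism) drops it to 0.29, even though the individual drug weights are unchanged.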
The algorithm was tested on both synthetic and real medical data and performed strongly in global data analysis. However, the research also revealed that its effectiveness in decentralized settings depends heavily on the methods used to combine results from the individual sites. The work marks the first time Choquet-based regression has been applied in a federated learning context, offering a new approach to predicting outcomes in clinical trials while preserving data privacy.
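Why aggregation matters can be illustrated with a deliberately simplified sketch. Here ordinary straight-line regression stands in for the team's Choquet regression, and two hypothetical clinical sites each fit a model on their own patients, sharing only the fitted coefficients. Combining those coefficients with equal weights, versus weighting by each site's sample size (as in the widely used federated averaging, or FedAvg, scheme), produces different global models from identical local results. The site data are made up.

```python
from typing import List, Sequence, Tuple

Params = Tuple[float, float]  # (intercept, slope)


def fit_local(xs: Sequence[float], ys: Sequence[float]) -> Params:
    """Least-squares fit of y = a + b*x using only one site's data."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return my - slope * mx, slope


def aggregate(params: List[Params], weights: Sequence[float]) -> Params:
    """Server-side weighted average of site parameters; raw patient data never leave a site."""
    total = sum(weights)
    intercept = sum(w * p[0] for w, p in zip(weights, params)) / total
    slope = sum(w * p[1] for w, p in zip(weights, params)) / total
    return intercept, slope


# Hypothetical sites (invented data): a larger site and a smaller one.
sites = [
    ([0.0, 1.0, 2.0, 3.0], [1.0, 3.0, 5.0, 7.0]),
    ([0.0, 1.0], [3.0, 3.0]),
]
local = [fit_local(xs, ys) for xs, ys in sites]
sizes = [len(xs) for xs, _ in sites]

equal = aggregate(local, [1.0] * len(local))  # every site counts the same
by_size = aggregate(local, sizes)             # FedAvg-style: weight by sample size
print("equal weights:", equal)
print("size-weighted:", by_size)
```

Both rules keep the data at each site, yet the equal-weight model has slope 1.0 while the size-weighted model has slope of about 1.33, a small-scale version of the sensitivity to aggregation the researchers reported.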
“As the need for more sophisticated data analysis methods continues to grow, our work sets the stage for future research that could lead to more accurate predictions and better health outcomes,” said Dr. Wang.