Speaker: Guozhu Dong, Wright State University, Ohio, USA
Time: Tuesday, July 7, 2015, 1:30-3:00 PM
Location: IBM Conference Room, Room 105, Software Building, Zhangjiang Campus
Contact: Wang Xiaoyang (王晓阳), xywangcs@fudan.edu.cn
Abstract:
Constructing accurate numerical prediction models is a fundamental task for a wide range of modeling and forecasting applications, including scientific modeling, medical/healthcare modeling, insurance risk modeling, loan default risk modeling, economic forecasting, and severe weather forecasting. As a result, predictive modeling is a key ingredient of data science. In this talk I will introduce (a) a new type of regression model, namely the pattern aided regression (PXR) model, and (b) a contrast pattern aided regression (CPXR) method for building accurate and easy-to-explain PXR models. PXR models were motivated by two observations: (1) Regression modeling applications often involve complex, diverse predictor-response relationships, which occur when the optimal regression models (of given popular model types) fitted to distinct subgroups of an application's data are highly different. (2) State-of-the-art regression methods are often unable to adequately model such highly diverse predictor-response relationships. To model these relationships accurately, a PXR model defines its prediction model using several pairs, each consisting of a pattern and a local regression model, which respectively serve as the logical and behavioral characterizations of a distinct predictor-response relationship. In experiments, the PXR models built by CPXR are generally very accurate, often outperforming state-of-the-art regression methods by wide margins. Because they use around seven simple patterns on average, together with linear local regression models, those PXR models are easy to interpret. CPXR is especially effective for high-dimensional data. The CPXR methodology can also be used to analyze and improve prediction models and to correct their prediction errors. I will also discuss how to use CPXR for classification, including results on medical risk prediction for traumatic brain injury and heart failure.
This talk is based on the following recent paper: Guozhu Dong and Vahid Taslimitehrani. Pattern-Aided Regression Modeling and Prediction Model Analysis. IEEE Transactions on Knowledge and Data Engineering. In press.
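As a rough illustration of how the pattern and local regression model pairs described in the abstract might combine into a single predictor, here is a minimal Python sketch. It is not the CPXR method and does not mine patterns or fit models. The class names (LinearModel, PatternModelPair, PXRSketch), the per-pair weight, the weighted-average rule for overlapping patterns, and the default model used when no pattern matches are all assumptions made for illustration. The abstract does not specify them.

```python
from dataclasses import dataclass
from typing import Callable, List, Sequence

Vector = Sequence[float]


@dataclass
class LinearModel:
    """A linear local regression model: intercept + sum(coef_i * x_i)."""
    coefficients: Sequence[float]
    intercept: float = 0.0

    def predict(self, x: Vector) -> float:
        return self.intercept + sum(c * v for c, v in zip(self.coefficients, x))


@dataclass
class PatternModelPair:
    """One (pattern, local model) pair; names and weight are hypothetical."""
    pattern: Callable[[Vector], bool]  # logical characterization of a subgroup
    model: LinearModel                 # behavioral characterization of it
    weight: float = 1.0                # assumed positive combination weight


@dataclass
class PXRSketch:
    """Illustrative predictor built from pattern/model pairs (an assumption)."""
    pairs: List[PatternModelPair]
    default_model: LinearModel  # assumed fallback when no pattern matches

    def predict(self, x: Vector) -> float:
        matched = [p for p in self.pairs if p.pattern(x)]
        if not matched:
            return self.default_model.predict(x)
        # Assumed rule: weighted average of the matching local models.
        total_weight = sum(p.weight for p in matched)
        return sum(p.weight * p.model.predict(x) for p in matched) / total_weight


# Toy example with two hand-written patterns over two predictors.
pxr = PXRSketch(
    pairs=[
        PatternModelPair(lambda x: x[0] < 5, LinearModel([2.0, 0.0])),
        PatternModelPair(lambda x: x[1] >= 10, LinearModel([-1.0, 0.0], 100.0)),
    ],
    default_model=LinearModel([0.0, 0.0], 1.0),
)

print(pxr.predict([2.0, 0.0]))   # only the first pattern matches -> 4.0
print(pxr.predict([8.0, 20.0]))  # only the second pattern matches -> 92.0
print(pxr.predict([2.0, 20.0]))  # both match, equal weights -> 51.0
print(pxr.predict([8.0, 0.0]))   # no pattern matches, default model -> 1.0
```

Each pattern here is a readable condition on the predictors, and each local model is linear, which is what makes a model of this shape easy to inspect: a prediction can be traced back to the few patterns that matched and their coefficients.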
Bio:
Guozhu Dong is a full professor at Wright State University. His main research interests are data science, data mining and machine learning, bioinformatics, and databases. He has published over 150 articles and two books entitled “Sequence Data Mining” and “Contrast Data Mining,” and he holds 4 US patents. He is widely known for his work on contrast/emerging pattern mining and applications, and for his work on first-order maintenance of recursive and transitive closure queries/views.