Speaker: Daisy Zhe Wang, Ph.D.
Time: Monday, August 5, 2013, 10:00 a.m.
Venue: Room 102, Software Building, Zhangjiang Campus
Contact: Xiao Yanghua (shawyh@fudan.edu.cn)
Abstract: Keyword search engines have been the state-of-the-art information retrieval tool over large text corpora for two decades. To date, most search engines have little understanding that keywords and documents refer to entities and relations in real life. Better search results and a better search experience can be achieved by understanding entities and relations in documents as well as in queries. A knowledge base (KB) containing relevant entities and relations should be the backbone of any application that is fueled by text. Given a large amount of text data, a system is needed that can automatically construct a knowledge base using statistical machine learning (SML) methods, manage the uncertainty inherent in the extracted knowledge, and maintain the knowledge base over time.
In this talk, I first summarize the major results from BayesStore, a probabilistic database system that natively supports SML models and various inference algorithms to perform query-driven knowledge extraction from text and probabilistic query processing over uncertain extractions. Results show that BayesStore can significantly improve performance and answer quality for queries over unstructured text.
With BayesStore as a foundation, I propose to build a probabilistic knowledge base (ProbKB) system that deeply integrates SML methods with scalable data processing frameworks. A ProbKB system should be designed to support the full life cycle of a knowledge base, including KB extraction, expansion, evolution, and integration. I will discuss in detail the challenges and our current progress in the following three research directions: (1) scalable statistical information extraction; (2) probabilistic deductive inference and incremental maintenance over large uncertain KBs; and (3) probabilistic knowledge integration from both SML and crowdsourcing.
Bio: Daisy Zhe Wang is an Assistant Professor in the CISE Department at the University of Florida. She obtained her Ph.D. degree from the EECS Department at the University of California, Berkeley in 2011 and her Bachelor’s degree from the ECE Department at the University of Toronto in 2005. At Berkeley, she was a member of the Database Group and the AMP/RAD Lab. She is particularly interested in bridging scalable data management and processing systems with probabilistic models and statistical methods. Her current research topics include probabilistic databases, probabilistic knowledge bases, large-scale inference engines, query-driven interactive machine learning, and crowd-assisted machine learning. Her research is currently funded by DARPA, Greenplum/EMC, SurveyMonkey, and the Law School at UF.