ACM's KDD 2008 Conference – Day 1 Proceedings

ACM's KDD is the premier annual international forum for data mining researchers and practitioners from academia, industry, and government to share their ideas, research results, and experiences. This year the event was held at the Loews Lake Las Vegas resort, where Jeff Bergman and I attended it. Details of the program can be found at http://www.kdd2008.com/program.html; the Day 1 schedule is summarized below.

9:00 am - 5:00 pm

Full Day Workshop W1 - ADKDD'08
Full Day Workshop W2 - WEBKDD'08
Full Day Workshop W3 - Sensor-KDD
Full Day Workshop W4 - PinKDD'08
Full Day Workshop W5 - SNA-KDD
Full Day Workshop W13 - Multimedia Data Mining

9:00 am - 12:00 pm
Half Day Workshop W6 - KDD CUP and Mining Medical data
Half Day Workshop W7 - Multiple Information Sources
Half Day Workshop W11 - BIOKDD08
Half Day Workshop W12 - Mining for Business Applications

9:00 am - 12:00 pm
Tutorial - Mining Massive RFID, Trajectory, and Traffic Data Sets
Tutorial - Predictive Modeling with Social Networks
Tutorial - Mining Uncertain and Probabilistic Data: Problems, Challenges, Methods, and Applications
Tutorial - Detecting Clusters in Moderate-to-High Dimensional Data: Subspace Clustering, Pattern-based Clustering, and Correlation Clustering

2:00 pm - 5:30 pm
Half Day Workshop W8 - Large Scale Recommender Systems and Netflix Prize
Half Day Workshop W10 - Mining using Matrices and Tensors

2:00 pm - 5:00 pm
Tutorial - Blogosphere: Research Issues, Applications, and Tools
Tutorial - Graph Mining and Graph Kernels
Tutorial - Applied Text Mining

6:15 pm - 6:45 pm : Award Presentations

6:45 pm - 7:30 pm : Innovation Award Talk

Day 1 was very informative and provided a good learning experience. The program included several full-day workshops and tutorials; the tutorials and their presenters are listed below.

·         J. Han, J. Lee, H. Gonzalez, X. Li, "Mining Massive RFID, Trajectory, and Traffic Data Sets"
Jiawei Han, Jae-Gil Lee, Hector Gonzalez, Xiaolei Li
Department of Computer Science, University of Illinois at Urbana-Champaign

·         J. Neville, F. Provost, "Predictive Modeling with Social Networks"
Jennifer Neville, Purdue University
Foster Provost, New York University

·         J. Pei, M. Hua, Y. Tao, X. Lin, "Mining Uncertain and Probabilistic Data: Problems, Challenges, Methods, and Applications"
Jian Pei, Simon Fraser University, Canada
Ming Hua, Simon Fraser University, Canada
Yufei Tao, The Chinese University of Hong Kong
Xuemin Lin, The University of New South Wales, Australia

·         H. Kriegel, P. Kröger, A. Zimek, "Detecting Clusters in Moderate-to-High Dimensional Data: Subspace Clustering, Pattern-based Clustering, and Correlation Clustering"
Hans-Peter Kriegel, Peer Kröger, and Arthur Zimek
Institute for Informatics, Ludwig-Maximilians-Universität München, Germany

·         H. Liu, N. Agarwal, "Blogosphere: Research Issues, Applications, and Tools"
Huan Liu, Arizona State University
Nitin Agarwal, Arizona State University

·         R. Feldman, L. Ungar, "Applied Text Mining"

With social networking a prominent theme at the conference, I decided to get a head start by attending the half-day tutorial on "Predictive Modeling with Social Networks" by Jennifer Neville and Foster Provost. The abstract of the tutorial is as follows.

Recently there has been a surge of interest in methods for analyzing complex social networks: from communication networks, to friendship networks, to professional and organizational networks. The dependencies among linked entities in the networks present an opportunity to improve inference about properties of individuals, as birds of a feather do indeed flock together. For example, when deciding how to market a product to people in MySpace or Facebook, it may be helpful to consider whether a person's friends are likely to purchase the product.

This tutorial will explore the unique opportunities and challenges for modeling social network data. We will begin with a description of the problem setting, including examples of various applications of social network mining (e.g., marketing, fraud detection). We will then present a number of characteristics of social network data that differentiate it from traditional inference and learning settings, and outline the resulting opportunities for significantly improved inference and learning. We will discuss specific techniques for capitalizing on each of the opportunities in statistical models, and outline both methodological issues and potential modeling pathologies that are unique to network data. We will give links to the recent literature to guide study, and present results demonstrating the effectiveness of the techniques.

Dr. Provost started by laying the foundations of social network modeling and then went into depth on network targeting, disjoint inference, learning and classification, wvRN, ACORA, RBC, RPT, SLR, and the context for collective inference. Dr. Neville continued with Gaussian random fields and elaborated on her work on detecting questionable brokers. Semi-supervised learning, conventional bias vs. variance analysis, homophily, social influence, external factors, and open research issues were also part of the tutorial. In a later discussion, Dr. Provost mentioned that the collaborative techniques described can also be applied to outlier analysis, which was encouraging.
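To make wvRN (the weighted-vote relational neighbor classifier) and collective inference a little more concrete, here is a minimal sketch of my own, not code from the tutorial. It assumes an undirected graph with positive edge weights, and the function name `wvrn`, the toy "buyer" graph, and the fixed iteration count are all hypothetical choices for illustration. Each unlabeled node's class probabilities are set to the weighted average of its neighbors' probabilities, and all unlabeled nodes are updated together in each round, which is a simple relaxation-labeling style of collective inference.

```python
from collections import defaultdict


def wvrn(edges, known, classes, iterations=50):
    """Sketch of weighted-vote relational neighbor classification.

    edges:   iterable of (u, v, weight) undirected edges, weight > 0
    known:   dict mapping labeled nodes to their class
    classes: list of all class labels
    Returns a dict: node -> {class: estimated probability}.
    """
    neighbors = defaultdict(list)
    for u, v, w in edges:
        neighbors[u].append((v, w))
        neighbors[v].append((u, w))

    # Class prior estimated from the labeled nodes; used to initialize
    # the unlabeled ones.
    prior = {c: sum(1 for y in known.values() if y == c) / len(known)
             for c in classes}

    probs = {}
    for node in neighbors:
        if node in known:
            probs[node] = {c: 1.0 if known[node] == c else 0.0 for c in classes}
        else:
            probs[node] = dict(prior)

    for _ in range(iterations):
        updated = {}
        for node in neighbors:
            if node in known:
                continue  # labeled nodes stay fixed
            total = sum(w for _, w in neighbors[node])
            # Weighted vote: average the neighbors' current estimates.
            updated[node] = {
                c: sum(w * probs[nbr][c] for nbr, w in neighbors[node]) / total
                for c in classes
            }
        probs.update(updated)  # simultaneous update of all unlabeled nodes
    return probs


# Toy chain a - b - c - d, where only the two end points are labeled.
edges = [("a", "b", 1.0), ("b", "c", 1.0), ("c", "d", 1.0)]
known = {"a": "buyer", "d": "non-buyer"}
result = wvrn(edges, known, ["buyer", "non-buyer"])
print(round(result["b"]["buyer"], 3))  # 0.667
print(round(result["c"]["buyer"], 3))  # 0.333
```

In this toy chain the node next to the known buyer ends up more likely to be a buyer than the node next to the known non-buyer, which is the "birds of a feather" assumption from the abstract put to work.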

For the second tutorial, I attended the "Graph Mining and Graph Kernels" tutorial by Karsten M. Borgwardt (http://mlg.eng.cam.ac.uk/~karsten/) and Xifeng Yan (IBM Research Center). This tutorial presented a comprehensive overview of the techniques developed in graph mining and graph kernels and examined the connection between them. As described by the authors, “The goal of this tutorial is i) to introduce newcomers to the field of graph mining, ii) to introduce people with database background to graph mining using kernel machines, iii) to introduce people with machine learning background to database-oriented graph mining, and iv) to present exciting research problems at the interface of both fields.”
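For readers new to graph kernels, the sketch below is my own illustration of the general idea, not material from the tutorial: a kernel compares two graphs by counting substructures they share. It assumes unlabeled, undirected graphs given as adjacency dictionaries, and it implements a truncated random-walk-style kernel in which walks of length l are weighted by decay ** l. The names `walk_counts`, `random_walk_kernel`, `max_length`, and `decay` are hypothetical. For unlabeled graphs, the number of length-l walks in the direct product graph equals the product of the walk counts of the two graphs, so no product graph needs to be built.

```python
def walk_counts(adjacency, max_length):
    """Return [w_0, ..., w_L], where w_l is the number of walks with l edges.

    adjacency: dict mapping each node to a list of its neighbors
               (undirected, so every edge appears in both directions).
    """
    nodes = list(adjacency)
    # ending_at[v] = number of walks of the current length that end at v
    ending_at = {v: 1 for v in nodes}
    counts = [sum(ending_at.values())]
    for _ in range(max_length):
        ending_at = {v: sum(ending_at[u] for u in adjacency[v]) for v in nodes}
        counts.append(sum(ending_at.values()))
    return counts


def random_walk_kernel(g1, g2, max_length=4, decay=0.1):
    """Truncated walk kernel: sum over l of decay**l * w_l(g1) * w_l(g2)."""
    c1 = walk_counts(g1, max_length)
    c2 = walk_counts(g2, max_length)
    return sum((decay ** l) * a * b for l, (a, b) in enumerate(zip(c1, c2)))


# Two small example graphs: a triangle and a three-node path.
triangle = {0: [1, 2], 1: [0, 2], 2: [0, 1]}
path = {0: [1], 1: [0, 2], 2: [1]}
print(random_walk_kernel(triangle, path, max_length=2, decay=0.5))  # 39.0
```

Because the value is an inner product of the two graphs' decayed walk-count vectors, it is symmetric and positive semidefinite, which is what lets it be plugged into a kernel machine such as an SVM. Kernels used in practice also take node and edge labels into account.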

The Applied Text Mining tutorial by Dr. Ronen Feldman and Dr. Lyle Ungar was also an excellent talk. Dr. Feldman, co-author of the Applied Text Mining tutorial, has a great style of pragmatic discussion and connects with the audience really well. I am looking forward to his future presentations and to discussing with him the idea of natural language corpus extraction in text mining for my Urdu machine translation work; he must have some great ideas about it.

After the tutorials, Bing Liu, the program chair, presented the conference statistics. Among the salient numbers: 323 papers were submitted from the US, of which 81 were accepted (about 25%). In total there were 593 submissions and 118 acceptances, an acceptance rate of just under 20%, or fewer than 1 in 5! These guys are picky.

Then came the best research paper award, best application paper award, student travel awards, KDD dissertation award, KDD Cup awards, and KDD innovation award, and the evening concluded with the innovation award talk by Raghu Ramakrishnan. The KDD Cup 2008 task, in medical data mining, was a highly practical and quite challenging problem. Details of the cup submissions can be seen at http://www.kdd2008.com/kddcup.html

Dr. Ramakrishnan is a co-author of the “Cow Book” (Database Management Systems), and his talk, the last of the day, covered his past research and a broad spectrum of future directions in information retrieval. With educated “predictions” from a seasoned data miner, the first day concluded.

I’m very much looking forward to tomorrow’s sessions; till then, happy mining.

I took a lot of photos of the presentations, and they are shared on Facebook.





8/26/2008 1:16:50 PM (Pacific Standard Time, UTC-08:00)

 

All content © 2008, Adnan Masood