ACM's KDD 2008 Conference – Day 1 Proceedings

ACM's KDD 2008 is the annual premier international forum for data mining researchers and practitioners from academia, industry, and government to share their ideas, research results and experiences. This year's event was held at the Loews Lake Las Vegas resort, where Jeff Bergman and I attended. Details of the program can be found at http://www.kdd2008.com/program.html; a summary follows.

9:00 am - 5:00 pm

Full Day Workshop W1 - ADKDD'08
Full Day Workshop W2 - WEBKDD'08
Full Day Workshop W3 - Sensor-KDD
Full Day Workshop W4 - PinKDD'08
Full Day Workshop W5 - SNA-KDD
Full Day Workshop W13 - Multimedia Data Mining

9:00 am - 12:00 pm
Half Day Workshop W6 - KDD CUP and Mining Medical data
Half Day Workshop W7 - Multiple Information Sources
Half Day Workshop W11 - BIOKDD08
Half Day Workshop W12 - Mining for Business Applications

9:00 am - 12:00 pm
Tutorial - Mining Massive RFID, Trajectory, and Traffic Data Sets
Tutorial - Predictive Modeling with Social Networks
Tutorial - Mining Uncertain and Probabilistic Data: Problems, Challenges, Methods, and Applications
Tutorial - Detecting Clusters in Moderate-to-High Dimensional Data: Subspace Clustering, Pattern-based Clustering, and Correlation Clustering

2:00 pm - 5:30 pm
Half Day Workshop W8 - Large Scale Recommender Systems and Netflix Prize
Half Day Workshop W10 - Mining using Matrices and Tensors

2:00 pm - 5:00 pm
Tutorial - Blogosphere: Research Issues, Applications, and Tools
Tutorial - Graph Mining and Graph Kernels
Tutorial - Applied Text Mining

6:15 pm - 6:45 pm : Award Presentations

6:45 pm - 7:30 pm : Innovation Award Talk

Day 1 was very informative and provided a good learning experience. The program included several full-day workshops, plus the tutorials listed below.

· J. Han, J. Lee, H. Gonzalez, X. Li, "Mining Massive RFID, Trajectory, and Traffic Data Sets"
Jiawei Han, Jae-Gil Lee, Hector Gonzalez, Xiaolei Li
Department of Computer Science, University of Illinois at Urbana-Champaign

· J. Neville, F. Provost, "Predictive Modeling with Social Networks"
Jennifer Neville, Purdue University
Foster Provost, New York University

· J. Pei, M. Hua, Y. Tao, X. Lin, "Mining Uncertain and Probabilistic Data: Problems, Challenges, Methods, and Applications"
Jian Pei, Simon Fraser University, Canada
Ming Hua, Simon Fraser University, Canada
Yufei Tao, The Chinese University of Hong Kong
Xuemin Lin, The University of New South Wales, Australia

· H. Kriegel, P. Kröger, A. Zimek, "Detecting Clusters in Moderate-to-High Dimensional Data: Subspace Clustering, Pattern-based Clustering, and Correlation Clustering"
Hans-Peter Kriegel, Peer Kröger, and Arthur Zimek
Institute for Informatics, Ludwig-Maximilians-Universität München, Germany

· H. Liu, N. Agarwal, "Blogosphere: Research Issues, Applications, and Tools"
Huan Liu, Arizona State University
Nitin Agarwal, Arizona State University

· R. Feldman, L. Ungar, "Applied Text Mining"

With social networking being the prominent theme at the conference, I decided to get a head start by attending the half-day tutorial on "Predictive Modeling with Social Networks" by Jennifer Neville and Foster Provost. The abstract from the tutorial is as follows.

Recently there has been a surge of interest in methods for analyzing complex social networks: from communication networks, to friendship networks, to professional and organizational networks. The dependencies among linked entities in the networks present an opportunity to improve inference about properties of individuals, as birds of a feather do indeed flock together. For example, when deciding how to market a product to people in MySpace or Facebook, it may be helpful to consider whether a person's friends are likely to purchase the product.

This tutorial will explore the unique opportunities and challenges for modeling social network data. We will begin with a description of the problem setting, including examples of various applications of social network mining (e.g., marketing, fraud detection). We will then present a number of characteristics of social network data that differentiate it from traditional inference and learning settings, and outline the resulting opportunities for significantly improved inference and learning. We will discuss specific techniques for capitalizing on each of the opportunities in statistical models, and outline both methodological issues and potential modeling pathologies that are unique to network data. We will give links to the recent literature to guide study, and present results demonstrating the effectiveness of the techniques.

Dr. Provost started by establishing the core foundation for social networking and then went in depth into network targeting, disjoint inference, learning & classification, wvRN, ACORA, RBC, RPT, SLR and the context of collective inference. Dr. Neville then continued with Gaussian random fields and elaborated on her work on questionable broker detection. Semi-supervised learning, conventional bias vs. variance analysis, homophily, social influence, external factors and open research issues were also part of the tutorial. Later, in a discussion, Dr. Provost mentioned that the collaborative techniques described can also be applied to outlier analysis, which was encouraging.
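To make the "birds of a feather" idea concrete, here is a minimal sketch in the spirit of wvRN (the weighted-vote relational neighbor classifier): an unlabeled person's score is the weighted average of the scores of the people they are linked to, and the update is repeated so that estimates propagate through the network. This is my own illustration, not code from the tutorial; the function name, the toy friendship data and the plain repeated-averaging loop (standing in for a proper collective inference procedure) are all assumptions.

```python
from collections import defaultdict


def wvrn_scores(edges, known, iterations=50):
    """Estimate P(positive) for every node from its neighbours' scores.

    edges: list of (node_a, node_b, weight) with positive weights (hypothetical format).
    known: dict of node -> 1.0 / 0.0 for nodes whose label is observed (must be non-empty).
    """
    neighbors = defaultdict(list)
    for u, v, weight in edges:
        neighbors[u].append((v, weight))
        neighbors[v].append((u, weight))

    # Unlabeled nodes start at the base rate of the labeled ones.
    prior = sum(known.values()) / len(known)
    scores = {node: known.get(node, prior) for node in neighbors}

    for _ in range(iterations):
        updated = {}
        for node, linked in neighbors.items():
            if node in known:
                updated[node] = known[node]  # observed labels stay fixed
            else:
                total_weight = sum(weight for _, weight in linked)
                weighted_votes = sum(weight * scores[other] for other, weight in linked)
                updated[node] = weighted_votes / total_weight
        scores = updated
    return scores


# Toy data: alice bought the product, erin did not; the others are unknown.
friendships = [
    ("alice", "bob", 1.0),
    ("bob", "carol", 1.0),
    ("carol", "dave", 1.0),
    ("dave", "erin", 1.0),
]
purchases = {"alice": 1.0, "erin": 0.0}

for person, score in sorted(wvrn_scores(friendships, purchases).items()):
    print(f"{person}: {score:.2f}")
```

In this toy chain, people closer to a known buyer end up with higher scores than people closer to a known non-buyer, which is the homophily assumption at work.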

For the second tutorial, I attended the "Graph Mining and Graph Kernels" tutorial by Karsten M. Borgwardt (http://mlg.eng.cam.ac.uk/~karsten/) and Xifeng Yan (IBM Research Center). This tutorial presented a comprehensive overview of the techniques developed in graph mining and graph kernels and examined the connection between them. As described by the authors, “The goal of this tutorial is i) to introduce newcomers to the field of graph mining, ii) to introduce people with database background to graph mining using kernel machines, iii) to introduce people with machine learning background to database-oriented graph mining, and iv) to present exciting research problems at the interface of both fields.”

The Applied Text Mining tutorial by Dr. Ronen Feldman and Dr. Lyle Ungar was also an excellent talk. Dr. Feldman, author of applied text mining, has a great style of pragmatic discussion and connects with the audience really well. I am looking forward to his future presentations and to discussing with him the idea of natural language corpus extraction in text mining for my Urdu machine translation work; he must have some great ideas about it.

After the tutorials, Bing Liu, the program chair, presented conference statistics. Among the salient numbers: 323 papers were submitted from the US, of which 81 were accepted. In total there were 593 submissions and 118 accepted papers, an acceptance rate just under 20%, or fewer than 1 in 5! These guys are picky.
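For anyone who wants to check the arithmetic, here is a quick sketch. It assumes the four figures quoted above are exact; the non-US row is simply derived by subtraction and was not part of the presented statistics.

```python
def acceptance_rate(accepted, submitted):
    """Return the acceptance rate as a percentage."""
    return 100.0 * accepted / submitted


overall = acceptance_rate(118, 593)
us_only = acceptance_rate(81, 323)
# Derived by subtraction from the two quoted pairs of numbers.
non_us = acceptance_rate(118 - 81, 593 - 323)

print(f"Overall: {overall:.1f}%")  # 19.9%
print(f"US:      {us_only:.1f}%")  # 25.1%
print(f"Non-US:  {non_us:.1f}%")   # 13.7%
```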

Then came the best research paper award, best application paper award, student travel awards, KDD dissertation award, KDD Cup awards and KDD innovation award, and the evening concluded with the innovation award talk by Raghu Ramakrishnan. The KDD Cup 2008 task, in medical data mining, was a highly practical and quite challenging problem. Details of the cup submissions can be seen at http://www.kdd2008.com/kddcup.html

Dr. Ramakrishnan is the author of the “Cow Book”, and his final talk of the day covered his past research and a broad spectrum of future directions in information retrieval. With educated “predictions” from a seasoned data miner, the first day concluded.

I’m very much looking forward to tomorrow’s sessions; till then, happy mining.

I've taken a lot of photos of the presentations. Photos of the event are shared on Facebook. Click here to see them.





8/26/2008 1:16:50 PM (Pacific Standard Time, UTC-08:00)

 

YABE – ASP.NET MVC based Blog Engine – Release 0.8 Published on CodePlex
YABE (Yet Another Blog Engine) is an effort to make a blog engine based on ASP.NET MVC. In this release, we have modified the current build to work with ASP.NET MVC Preview 3 and added new features such as a tag cloud and themes. Please check it out at CodePlex (www.CodePlex.com/YABE).

Last but not least, an honorable mention goes to Joel Cochran for a very informative post on Updating from ASP.NET MVC Preview 2 to Preview 3. It was quite helpful, Joel.





8/22/2008 11:50:14 PM (Pacific Standard Time, UTC-08:00)

 

Testing Web Services :: Web Service Studio and WCF Test Client

Testing web services is a pivotal part of the contemporary enterprise project life cycle; developers, QA and even system guys do it to validate different aspects of the middleware. This testing comes with its own set of challenges: the default browser-based test client is ineffective and cannot test complex types, and WCF services don’t even offer one! So what’s the remedy?

This webcast demonstrates Web Service Studio and WCF Test Client, two tools specifically designed to test web services without the need to write custom test harnesses. Web Service Studio is a CodePlex project that revives the good old .NET Webservice Studio tool. It is a tool for invoking web methods interactively. The user provides a WSDL endpoint. On clicking the Get button, the tool fetches the WSDL, generates a .NET proxy from it and displays the list of available methods. The user can choose any method and provide the required input parameters. On clicking Invoke, the SOAP request is sent to the server and the response is parsed to display the return value. My intent is to enhance it further by adding support for WCF, nullable types and REST-style APIs, to allow complete composite type testing from one tool. For details on WCF Test Client, please see my article here.
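The last two steps of that flow (send a SOAP request, parse the response for the return value) can be sketched with nothing but the Python standard library. This is not how Web Service Studio is implemented (it generates a .NET proxy); it is only an illustration of what goes over the wire. The `Add` operation, the `http://tempuri.org/` namespace, the canned response and the `<Method>Response` / `<Method>Result` element naming are assumptions made for the example, and no network call is made.

```python
import xml.etree.ElementTree as ET

SOAP_NS = "http://schemas.xmlsoap.org/soap/envelope/"  # SOAP 1.1 envelope namespace


def build_request(service_ns, method, params):
    """Build a SOAP 1.1 envelope that calls `method` with the given parameters."""
    envelope = ET.Element(f"{{{SOAP_NS}}}Envelope")
    body = ET.SubElement(envelope, f"{{{SOAP_NS}}}Body")
    call = ET.SubElement(body, f"{{{service_ns}}}{method}")
    for name, value in params.items():
        ET.SubElement(call, f"{{{service_ns}}}{name}").text = str(value)
    return ET.tostring(envelope, encoding="unicode")


def parse_result(response_xml, service_ns, method):
    """Pull the return value out of a response, or None if it is not there.

    Assumes the response wraps the value as <MethodResponse><MethodResult>.
    """
    namespaces = {"soap": SOAP_NS, "svc": service_ns}
    root = ET.fromstring(response_xml)
    result = root.find(f"soap:Body/svc:{method}Response/svc:{method}Result", namespaces)
    return None if result is None else result.text


# Hypothetical service namespace and a hand-written response for illustration.
SERVICE_NS = "http://tempuri.org/"
request_xml = build_request(SERVICE_NS, "Add", {"a": 2, "b": 3})
canned_response = (
    f'<soap:Envelope xmlns:soap="{SOAP_NS}"><soap:Body>'
    f'<AddResponse xmlns="{SERVICE_NS}"><AddResult>5</AddResult></AddResponse>'
    "</soap:Body></soap:Envelope>"
)

print(request_xml)
print("Return value:", parse_result(canned_response, SERVICE_NS, "Add"))
```

A real test client would POST `request_xml` to the service endpoint with the matching SOAPAction header; the sketch stops short of that so it can run without a server.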

Webcast on Web Service Studio and WCF Test Client






Screen Shot of the Web Service Studio.



References

http://www.codeplex.com/WebserviceStudio

www.codeproject.com/KB/WCF/WCF35Utils.aspx






8/8/2008 7:03:24 PM (Pacific Standard Time, UTC-08:00)

 

Going Places - PDC, KDD and IASA Connections and Teaching WCF @ UCSD

August and the next couple of months look really busy. I’ll be teaching WCF at UCSD and attending the following conferences, along with a doctoral cluster meeting. Therefore I am seriously considering “The Terminal” style living.

KDD 2008, 24 – 27 Aug 2008, Loews Lake Las Vegas, Las Vegas, NV
The annual ACM SIGKDD conference is the premier international forum for data mining researchers and practitioners from academia, industry, and government to share their ideas, research results and experiences. KDD-08 will feature keynote presentations, oral paper presentations, poster sessions, workshops, tutorials, panels, exhibits, demonstrations, and the KDD Cup competition.

IASA Connections, October 6 - 8, 2008, San Francisco Marriott, San Francisco, CA
I'll be speaking at the IASA Connections conference in San Francisco on Aspect Oriented Programming in Distributed Systems. More details here.

Microsoft PDC 2008, 27 – 30 Oct 2008, Los Angeles Convention Center, Los Angeles, CA
Since 1991, the Professional Developers Conference (PDC) has been Microsoft’s premier gathering of leading-edge developers and architects. Attend the PDC to understand the future of the Microsoft platform and to exchange ideas with fellow professionals. You’ll learn about upcoming products, meet Microsoft’s leaders and top engineers, write some code, and be inspired! Unplug for a few days and think about the future.  

Programming Windows Communication Foundation (WCF) (Summer 2008)
Sa, 8:00 a.m. - 5:00 p.m.
8/9/2008 - 8/23/2008
Room 134, UCSD Extension Complex, 9600 N Torrey Pines Rd, La Jolla

Programming Windows Communication Foundation (WCF) (Fall 2008)
Sa, 8:00 a.m. - 5:00 p.m.
10/4/2008 - 10/18/2008
Room 110, UCSD Extension Sorrento Mesa Center, 6925 Lusk Blvd, San Diego





8/6/2008 8:26:51 PM (Pacific Standard Time, UTC-08:00)

 

INETA Community Champion Award
This morning I was informed by the INETA team that I have won the INETA Community Champion Award.



It's a real honor to even be considered for this award, let alone to win it. I'd like to thank the SGV.NET User Group team, Richard Trinh, Ben Pirih and Vipul Shah, for their tireless contributions to keeping our user group running.

The .NET developer community in Southern California is quite strong and, with over 25 active user groups, probably the most active in the country.

Last but not least, if you need any assistance regarding user group speakers or want me to speak to your UG, please feel free to drop me a line at




8/5/2008 3:02:27 PM (Pacific Standard Time, UTC-08:00)

 

Building REST based services using WCF 3.5 - Webcast

Here is my webcast explaining how to build a simple REST based service using WCF 3.5. Slides and sample code can be downloaded from the links below.


Downloads
REST using WCF 3.5 - Webcast.pptx (137.68 KB)
MyRestService.zip (3.67 KB)
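For readers who just want the gist before watching: WCF 3.5's REST support pairs a URI template with a service operation, so that the variable parts of the URL become the operation's arguments. The sketch below shows that mapping idea in plain Python. It is not WCF code and not the contents of MyRestService.zip; the template syntax handling, the routes and the handler names are all hypothetical.

```python
import re


def compile_template(template):
    """Turn a template such as '/customers/{id}' into a regex with named groups."""
    # re.split with a capturing group alternates literal text and variable names.
    parts = re.split(r"\{(\w+)\}", template)
    pattern = ""
    for index, part in enumerate(parts):
        if index % 2 == 0:
            pattern += re.escape(part)          # literal path text
        else:
            pattern += f"(?P<{part}>[^/]+)"      # one path segment per variable
    return re.compile("^" + pattern + "$")


def get_customer(id):
    return f"customer {id}"


def get_order(id, orderId):
    return f"order {orderId} of customer {id}"


# (HTTP method, URI template, handler) triples; purely illustrative.
ROUTES = [
    ("GET", compile_template("/customers/{id}"), get_customer),
    ("GET", compile_template("/customers/{id}/orders/{orderId}"), get_order),
]


def dispatch(method, path):
    """Call the first handler whose method and template match, else return None."""
    for route_method, pattern, handler in ROUTES:
        match = pattern.match(path)
        if route_method == method and match:
            return handler(**match.groupdict())
    return None  # a real service would answer 404 here


print(dispatch("GET", "/customers/42"))
print(dispatch("GET", "/customers/42/orders/7"))
print(dispatch("GET", "/unknown"))
```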





7/21/2008 5:03:48 PM (Pacific Standard Time, UTC-08:00)

 

OWASP Top 10 and Data Mining in Financial Sector

OWASP’s Top 10 list has changed since 2004 in terms of priorities; XSS and injection flaws are on the rise. Details can be found on OWASP’s website.

2007 | 2004
A1 - Cross Site Scripting (XSS) | A1 - Unvalidated Input
A2 - Injection Flaws | A2 - Broken Access Control
A3 - Malicious File Execution | A3 - Broken Authentication and Session Management
A4 - Insecure Direct Object Reference | A4 - Cross Site Scripting
A5 - Cross Site Request Forgery (CSRF) | A5 - Buffer Overflow
A6 - Information Leakage and Improper Error Handling | A6 - Injection Flaws
A7 - Broken Authentication and Session Management | A7 - Improper Error Handling
A8 - Insecure Cryptographic Storage | A8 - Insecure Storage
A9 - Insecure Communications | A9 - Application Denial of Service
A10 - Failure to Restrict URL Access | A10 - Insecure Configuration Management

 

OWASP .NET Projects
http://www.owasp.org/index.php/Category:OWASP_.NET_Project

References and Papers on Financial Data Mining

  • Mine Your Way to Combat Money Laundering
  • OFAC SDN List www.ustreas.gov/offices/enforcement/ofac/sdn/
  • FinCEN www.fincen.gov/
  • FATF www.fatf-gafi.org/
  • Suspicious Activity Report
  • Keys to a Well Prepared Suspicious Activity Report
  • A framework for data mining-based anti-money laundering research
  • Profiling Behavior: The social construction of categories in the detection of financial crime; dissertation by Ana Canhoto
  • Towards a Proactive Fraud Management Framework for Financial Data Streams
  • T. Senator. "The financial crimes enforcement network AI system (FAIS)." AI Magazine 4, 1995.
  • M. Sparrow. "The State of the Fraud Control Game; and the Impact of Electronic Claims Processing on Fraud and Fraud Control." Proceedings of the International Symposium on Criminal Justice Information Systems and Technology, 1994.
  • U.S. Congress, Office of Technology Assessment (OTA). "Information Technologies for Control of Money Laundering." OTA-ITC-630. Washington, DC: U.S. Government Printing Office, September 1995.
  • Zdanowicz, J.S. (2004), "Detecting money laundering and terrorist financing via data mining", Communications of the ACM, Vol. 47 No.5
  • Watkins, R.C., Reynolds, K.M., Demara, R., Georgiopoulos, M., Gonzalez, A., Eaglin, R. (2003), "Tracking dirty proceeds: exploring data mining technologies as tools to investigate money laundering", Police Practice and Research, Vol. 4 No.2, pp.163-78.
  • Vikram, A., Chennuru, S., Rao, H.R., Upadhyaya, S. (2004), "A solution architecture for financial institutions to handle illegal activities: a neural networks approach", Proceedings of the 37th Hawaii International Conference on System Sciences-2004
  • Zhang, Z., Salerno, J.J., Yu, P.S. (2003), "Applying data mining in investigating money laundering crimes", paper presented at SIGKDD'03, Washington, DC, pp.747-52.
  • Senator, T.E., Goldberg, H.G., Wooton, J. (1995), "The financial crimes enforcement network AI system (FAIS): identifying potential money laundering from reports of large cash transactions", AI Magazine, Vol. 16 No.4, pp.21-39.
  • Tang, J., Yin, J. (2005), "Developing an intelligent data discriminating system of anti-money laundering based on SVM", Proceedings of the Fourth International Conference on Machine Learning and Cybernetics. Guangzhou, pp.3453-7.
  • Kingdon, J. (2004), "AI fights money laundering", IEEE Intelligent Systems, Vol. 5/6 pp.87
  • Goldberg, H.G., Wong, R.W.H. (1998), "Restructuring transactional data for link analysis in the FinCEN AI System", Proceedings of 1998 AAAI Fall Symposium on Artificial Intelligence and Link Analysis, AAAI Press, Menlo Park, CA.
  • Fawcett, T., Provost, F. (1997), "Adaptive fraud detection", Data Mining and Knowledge Discovery, Vol. 1 No.3, pp.291-316.




7/20/2008 9:50:26 PM (Pacific Standard Time, UTC-08:00)

 

REST and WCF 3.5 Talk Slides and Code Samples
On Thursday, July 17th, I presented "RESTful Web Services – UriTemplates and REST support with WCF 3.5" to the SoCal.NET architecture group (http://www.socaldotnetarchitecture.org/). It was well received and I got good feedback.

The code samples and slides are as follows.