A KDD 2006 panel produced a report on grand challenges
in the field of data mining. The report was published in the ACM SIGKDD
Explorations Newsletter; the citation is listed under References. The
report identifies the following areas as prominent ones for active research and development. It makes a good
repository for those looking for an idea to expand upon for their dissertation :)
- Will you cheat for me please, my dear computer: A text-mining and understanding
system that can use the web to pass standard tests, e.g. the SAT in World History.
A related example is literature-based discovery of drug X side effects.
- Nip in the bud: Fraud detection based on company financial statements. (Can
we find another Enron before it collapses?)
- Autonomous Tagging: Automatic tagging and classification of 1 billion digital
photos on the web.
- Social Networking 2.0: Mining user behaviors in interactions with multimedia
data and using the knowledge extracted in this process to anticipate future behaviors
or to diagnose medical or psychological conditions of the users. This generally
falls under the area of crossing the semantic gap between multimedia data
and semantics.
- Where do I belong?: The link mining challenge (extracting graphs describing
entities and relationships from unstructured data).
- Lots of Traffic!: Estimating a predictive model over a large dataset collected
from 833 traffic sensors in the Chicago metropolitan region, with the goal of
identifying anomalous traffic patterns.
- Gold in the Text: Entity extraction and autonomous text analysis from large-scale
unstructured text repositories.
- And of course the genetics side, mining the proteome (analysis of large-scale
databases from sequencing projects, microarray studies, gene-function
studies, protein-protein interactions, comparative genomics, structural
biology, and open source journal articles).
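To make the traffic challenge above a little more concrete, here is a minimal sketch of flagging anomalous readings from a single sensor. It is not the method from the panel report: the modified z-score (median and median absolute deviation) is a swapped-in textbook technique, and the function name `flag_anomalies`, the threshold of 3.5, and the sample speeds are all assumptions made for illustration.

```python
from statistics import median


def flag_anomalies(readings, threshold=3.5):
    """Return indices of readings whose modified z-score exceeds threshold.

    Hypothetical helper: uses the median and the median absolute
    deviation (MAD), which are less sensitive to outliers than the
    mean and standard deviation.
    """
    med = median(readings)
    mad = median(abs(x - med) for x in readings)
    if mad == 0:
        # All readings are (nearly) identical; nothing stands out.
        return []
    # 0.6745 rescales the MAD so the score is comparable to a z-score
    # for normally distributed data.
    return [
        i for i, x in enumerate(readings)
        if 0.6745 * abs(x - med) / mad > threshold
    ]


# Made-up average speeds (mph) for one sensor; index 6 is a sudden slowdown.
speeds = [61, 59, 60, 62, 58, 60, 12, 61, 59, 60]
anomalies = flag_anomalies(speeds)
print(anomalies)
```

A real system over hundreds of sensors would also have to account for time of day, neighbouring sensors and missing data, which this sketch ignores.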
Other areas of research interest mentioned in the data
mining literature include:
- Parallelization of data mining algorithms.
- Designing and developing scalable algorithms to operate on massive data sets.
- Distributed data mining across multiple topologies (local data, distributed app and so on …)
- Standardizing the languages, underlying protocols, and application-level integration for
data mining and predictive modeling.
- Systems that preserve privacy and security in data mining.
- Visualization of large datasets; mapping their corresponding associations, hierarchies and underlying patterns.
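As a rough illustration of the parallelization item in the list above, the sketch below counts co-occurring item pairs (the support-counting step in frequent itemset mining) by splitting the data into partitions, counting each partition independently, and merging the partial counts. The function names, the basket data and the choice of a thread pool are assumptions for this example. Threads keep it short and self-contained; for CPU-bound counting in CPython a process pool or a cluster framework would be the more realistic choice.

```python
from collections import Counter
from concurrent.futures import ThreadPoolExecutor
from itertools import combinations


def count_pairs(transactions):
    """Count how often each unordered pair of items occurs together."""
    counts = Counter()
    for items in transactions:
        # Sort and de-duplicate so each pair has one canonical form.
        counts.update(combinations(sorted(set(items)), 2))
    return counts


def parallel_pair_counts(transactions, n_parts=2):
    """Partition the data, count each partition, then merge the results."""
    parts = [transactions[i::n_parts] for i in range(n_parts)]
    total = Counter()
    with ThreadPoolExecutor(max_workers=n_parts) as pool:
        for partial in pool.map(count_pairs, parts):
            total.update(partial)  # merge step: add partial counts
    return total


# Made-up market baskets for illustration.
baskets = [
    ["milk", "bread"],
    ["bread", "milk", "eggs"],
    ["eggs", "bread"],
    ["milk", "bread"],
]
pair_counts = parallel_pair_counts(baskets)
print(pair_counts.most_common(1))
```

Because the partial counts are simply added together, the merged result is the same as counting the whole dataset in one pass, which is what makes this step easy to distribute.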
References
What Are The Grand Challenges for Data Mining? KDD-2006 Panel Report