The Instrument will enable
research on data mining through its numerous built-in
data mining algorithms for vector and raster geospatial
data. These range from data reduction algorithms
(feature selection, feature extraction, and instance selection, e.g.,
sampling) to spatial data mining algorithms, including spatial co-location
pattern discovery, spatial outlier detection, spatial clustering, and spatial
classification.
In many applications, the
data are naturally multi-modal, in the sense that they are represented by
multiple sets of features. With multiple information
sources available, it is challenging to conduct integrated exploratory analysis
that extracts more information than is possible from any
single source. In the literature, many multi-view learning algorithms have been designed
to learn from multiple information sources.
From the information fusion
perspective, multi-view learning methods can be categorized into three
different types based on the way that information from different sources is
used:
(1) Feature Integration, where the
feature representation is enlarged to incorporate all attributes from the different
sources and a unified feature space is generated [WOC99].
(2) Semantic Integration, where
computational methods are first applied to each dataset separately and the results
on the different datasets are then combined [Bis06].
(3) Intermediate Integration (or kernel
integration), where the datasets are kept in their original form and
are integrated at the similarity computation or kernel level [BS02,
LCB+06].
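The three fusion levels above can be illustrated with a minimal Python sketch. It is not the method of any cited work: the function names, the RBF kernel, the uniform kernel weights, and the co-association matrix used to combine per-view clustering results are all assumptions chosen for illustration.

```python
import numpy as np

def rbf_kernel(X, gamma=1.0):
    """Pairwise RBF similarities between the rows of X (illustrative kernel choice)."""
    sq = np.sum(X ** 2, axis=1)
    d2 = sq[:, None] + sq[None, :] - 2.0 * (X @ X.T)
    return np.exp(-gamma * np.maximum(d2, 0.0))  # clip tiny negative distances

def feature_integration(views):
    """(1) Concatenate the attributes of all views into one unified feature space."""
    return np.hstack(views)

def semantic_integration(label_sets):
    """(2) Combine results computed separately per view.

    Assumption: each view has already been clustered; the results are merged
    into a co-association matrix (fraction of views that place i and j together).
    """
    co = np.zeros((len(label_sets[0]),) * 2)
    for labels in label_sets:
        labels = np.asarray(labels)
        co += (labels[:, None] == labels[None, :]).astype(float)
    return co / len(label_sets)

def intermediate_integration(views, weights=None, gamma=1.0):
    """(3) Keep each view in its original form and fuse at the kernel level.

    Assumption: uniform weights stand in for weights that could be learned.
    """
    if weights is None:
        weights = np.full(len(views), 1.0 / len(views))
    return sum(w * rbf_kernel(V, gamma) for w, V in zip(weights, views))

# Hypothetical data: six instances described by two views.
rng = np.random.default_rng(0)
view_a = rng.normal(size=(6, 3))
view_b = rng.normal(size=(6, 2))

X = feature_integration([view_a, view_b])        # unified feature matrix
K = intermediate_integration([view_a, view_b])   # combined kernel matrix
C = semantic_integration([[0, 0, 0, 1, 1, 1],    # per-view cluster labels
                          [0, 0, 1, 1, 1, 1]])
```

Any kernel-based or similarity-based learner could then consume K or C, while X can be passed to a standard single-view algorithm.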
From the learning
perspective, most multi-view learning algorithms to date have been designed
under the semi-supervised and clustering frameworks. Semi-supervised
learning deals with scarce labeled examples alongside abundant
unlabeled examples, while clustering focuses on learning the hidden
patterns of the data set in an unsupervised way.
The team of Dr. Tao Li
[http://users.cis.fiu.edu/~taoli/research-project.html] at FIU will utilize the Instrument to establish a comprehensive
framework for large-scale data mining from multiple information sources. The
framework focuses on unsupervised learning and semi-supervised learning and is
able to perform fusion at all three levels: feature integration,
semantic integration, and intermediate integration.
With the explosive growth of
the volume and complexity of document data, it has become a necessity to
semantically understand documents and deliver meaningful information to users.
These problems cut across data mining, information
retrieval, and machine learning.
Dr. Tao Li's group has been focusing on developing
advanced data mining and machine learning algorithms for
1) improving
document clustering and summarization performance;
2) integrating
document clustering and summarization to obtain meaningful document clusters
with summarized interpretation;
3) summarizing
the difference and evolution of different document sources; and
4) building
document understanding systems to solve real-world applications.
The Instrument, with its vast
repository of geo-located OCR-ed documents, will allow
the group to scale up their methods.
References Cited
[WOC99] L. Wu, S.L. Oviatt, and P.R. Cohen. Multimodal
Integration - A Statistical View. IEEE Transactions on Multimedia, 1(4):
334-341, 1999.
[Bis06] C.M. Bishop. Pattern Recognition and Machine
Learning. Springer, 2006.
[BS02] B. Schölkopf and A.J. Smola.
Learning with Kernels: Support Vector Machines, Regularization, Optimization,
and Beyond. MIT Press, 2002.
[LCB+06] G.R. Lanckriet, N. Cristianini, P.L. Bartlett, L.E. Ghaoui, and M.I. Jordan. Learning the kernel matrix with semi-definite programming. In Proceedings of the International Conference on Machine Learning (ICML), pages 323-330, 2006.