With the explosive growth in the volume and complexity of document data, it has
become necessary to understand documents semantically and deliver meaningful
information to users. These problems lie at the intersection of data mining,
information retrieval, and machine learning.
Document clustering and summarization are two fundamental techniques for
understanding document data and have attracted much attention in recent years.
The team of Dr. Tao Li [http://users.cis.fiu.edu/~taoli/research-project.html] at FIU has been developing
advanced data mining and machine learning algorithms to
1) improve document clustering and summarization performance;
2) integrate document clustering and summarization to obtain meaningful document
clusters with summarized interpretations;
3) summarize the differences and evolution across different document sources; and
4) build document understanding systems for real-world applications.
The Instrument, with its vast repository of geo-located OCR-ed documents, will
allow the team to scale up these methods.