With the explosive growth in the volume and complexity of document data, it has become necessary to understand documents semantically and deliver meaningful information to users. These problems lie at the intersection of data mining, information retrieval, and machine learning.

 

Document clustering and summarization are two fundamental techniques for understanding document data and have attracted much attention in recent years. The team of Dr. Tao Li [http://users.cis.fiu.edu/~taoli/research-project.html] at FIU has been developing advanced data mining and machine learning algorithms for

 

1) improving document clustering and summarization performance;

 

2) integrating document clustering and summarization to obtain meaningful document clusters with summarized interpretation;

 

3) summarizing the difference and evolution of different document sources; and

 

4) building document understanding systems to solve real-world applications.

 

The Instrument, with its vast repository of geo-located OCR-ed documents, will allow the team to scale up their methods.