With large, diverse geospatial datasets and an integrated, user-centric interface for running and evaluating user-defined algorithms, the Instrument will give researchers the opportunity to experiment with spatially aware information and knowledge discovery algorithms.


This will support geospatial pattern mining, that is, the discovery of interesting patterns among spatial objects, in rare event detection, spatio-temporal change and trend detection, and correlation mining.
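
As one concrete illustration of trend detection, the sketch below fits a least-squares linear slope to each grid cell's time series and flags cells with a steep trend. It is a minimal sketch under stated assumptions, not the Instrument's method: the gridded array layout `(time, rows, cols)`, the function names `cell_trends` and `flag_trending_cells`, and the `min_abs_slope` threshold are all hypothetical.

```python
import numpy as np


def cell_trends(series: np.ndarray) -> np.ndarray:
    """Least-squares linear slope for every grid cell (hypothetical helper).

    `series` has shape (T, rows, cols): one value per time step per cell.
    Returns an array of shape (rows, cols) in value units per time step.
    """
    t = np.arange(series.shape[0], dtype=float)
    t_c = t - t.mean()                  # centered time index
    y_c = series - series.mean(axis=0)  # centered values, per cell
    # slope = sum(t_c * y_c) / sum(t_c**2), computed for all cells at once
    return np.tensordot(t_c, y_c, axes=(0, 0)) / np.sum(t_c ** 2)


def flag_trending_cells(series: np.ndarray, min_abs_slope: float):
    """Return (indices of cells whose |slope| >= min_abs_slope, all slopes)."""
    slopes = cell_trends(series)
    return np.argwhere(np.abs(slopes) >= min_abs_slope), slopes


# Usage: ten time steps on a 2x2 grid; only cell (0, 1) has a rising trend.
series = np.zeros((10, 2, 2))
series[:, 0, 1] = 3.0 * np.arange(10) + 5.0
cells, slopes = flag_trending_cells(series, min_abs_slope=1.0)
print(cells.tolist())  # [[0, 1]]
```

A per-cell slope is the simplest trend statistic; a robust alternative such as the Mann-Kendall test could be substituted where noise or outliers dominate.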


For example, in rare event detection, unpredictable events are extremely difficult to detect because they occur infrequently or at a time or location where they are not expected (e.g., detecting traffic congestion during a disaster). The Instrument will provide researchers with historical data that can be used to establish baseline models of dynamic event behavior, along with scenarios in which deviations from the normal model have been identified. Researchers can then build algorithms that improve prediction models and feed them back into the Instrument for testing and for future rare event detection and prediction.
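
The baseline-and-deviation idea can be sketched in a few lines of Python. This is a hypothetical illustration, assuming historical observations arrive as `(cell_id, time_slot, value)` triples (for example, hourly vehicle counts per road segment) and using a simple z-score test; the names `build_baseline` and `is_rare_event`, the identifiers in the usage example, and the default thresholds are illustrative choices, not part of the Instrument.

```python
from collections import defaultdict
from statistics import mean, pstdev


def build_baseline(history):
    """Summarize historical observations into a normal-behavior baseline.

    `history` yields (cell_id, time_slot, value) triples, e.g. hourly vehicle
    counts per road segment. Returns {(cell_id, time_slot): (mean, std)}.
    """
    groups = defaultdict(list)
    for cell, slot, value in history:
        groups[(cell, slot)].append(value)
    return {key: (mean(vals), pstdev(vals)) for key, vals in groups.items()}


def is_rare_event(baseline, cell, slot, value, z_threshold=3.0, min_std=1.0):
    """Flag a new observation that deviates strongly from its baseline."""
    if (cell, slot) not in baseline:
        return False  # no history for this cell/slot, so no deviation can be judged
    mu, sigma = baseline[(cell, slot)]
    sigma = max(sigma, min_std)  # floor keeps near-constant cells from over-triggering
    return abs(value - mu) / sigma > z_threshold


# Usage: five historical Monday 8 a.m. counts for one segment, then two new readings.
history = [("seg-17", "mon-08", v) for v in (100, 104, 98, 102, 96)]
baseline = build_baseline(history)
print(is_rare_event(baseline, "seg-17", "mon-08", 160))  # True
print(is_rare_event(baseline, "seg-17", "mon-08", 103))  # False
```

Keying the baseline by time slot as well as location matters: a count that is normal at rush hour may be anomalous at 3 a.m.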


The Instrument will also aid in the investigation of resampling and bootstrapping techniques [CBO+02, JDC87] and cost-sensitive learning approaches [Elk01, ZE01] for semantically aware rare event detection in spatial data.
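
Two of the cited ideas can be sketched briefly. SMOTE [CBO+02] rebalances a rare class by interpolating synthetic samples between a minority example and one of its nearest minority-class neighbors, and Elkan [Elk01] shows that, when correct decisions cost nothing, the cost-minimizing rule predicts the rare class whenever its estimated probability is at least C_FP / (C_FP + C_FN). The function names, the brute-force neighbor search, and the argument names are assumptions made for this sketch; bootstrap error estimation [JDC87] is not shown.

```python
import numpy as np


def smote_oversample(minority, n_new, k=5, rng=None):
    """SMOTE-style synthesis of `n_new` minority-class samples (sketch).

    Each synthetic point lies on the segment between a randomly chosen
    minority sample and one of its k nearest minority-class neighbors.
    Requires at least two minority samples.
    """
    rng = np.random.default_rng(rng)
    x = np.asarray(minority, dtype=float)
    n = len(x)
    k = min(k, n - 1)
    # Brute-force pairwise Euclidean distances (adequate for small sets).
    d = np.linalg.norm(x[:, None, :] - x[None, :, :], axis=-1)
    np.fill_diagonal(d, np.inf)               # exclude each point from its own neighbors
    neighbors = np.argsort(d, axis=1)[:, :k]  # indices of the k nearest neighbors
    base = rng.integers(0, n, size=n_new)
    chosen = neighbors[base, rng.integers(0, k, size=n_new)]
    gap = rng.random((n_new, 1))              # interpolation factor in [0, 1)
    return x[base] + gap * (x[chosen] - x[base])


def cost_sensitive_threshold(cost_fp, cost_fn):
    """Elkan's optimal decision threshold when correct decisions cost nothing:
    predict the rare class iff P(rare | x) >= cost_fp / (cost_fp + cost_fn)."""
    return cost_fp / (cost_fp + cost_fn)


# Usage: three rare-event points on the line y = x, and a missed event
# assumed to cost nine times as much as a false alarm.
rare = [[0.0, 0.0], [1.0, 1.0], [2.0, 2.0]]
synthetic = smote_oversample(rare, n_new=20, k=2, rng=0)
print(synthetic.shape)                     # (20, 2)
print(cost_sensitive_threshold(1.0, 9.0))  # 0.1
```

In practice the threshold would be applied to calibrated probabilities; [ZE01] addresses the case where costs and probabilities must themselves be estimated.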


The general event detection problem has been studied intensively in traditional document collections as Topic Detection and Tracking (TDT) [APL98]. Unlike traditional documents, however, Twitter data is challenging to harvest for event detection because tweets are very short and noisy, containing nonstandard terms such as abbreviations, acronyms, and emoticons [Eis13, LWJ12]. The Instrument would allow tweets to be linked with many static data sources and would help identify messages that refer to the same real-world entity. The resulting integration can provide an aggregated view of the disaster domain, in which attributes and their values are assembled and fused from thousands of largely unstructured messages, improving the accuracy of event detection.
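
A toy version of the normalize-then-link step is sketched below, assuming a tiny hand-written abbreviation lexicon and a gazetteer standing in for a static data source; both dictionaries, the sample messages, and the functions `normalize` and `link_messages` are hypothetical. Broad-coverage normalization [LWJ12] and full entity resolution go well beyond this; the sketch uses exact phrase matching only.

```python
import re
from collections import defaultdict

# Hypothetical normalization lexicon (a real one would be far larger).
ABBREVIATIONS = {"rd": "road", "st": "street", "hwy": "highway", "pls": "please", "u": "you"}

# Hypothetical static source: canonical entity -> known surface forms.
GAZETTEER = {
    "I-95": ["i95", "i-95", "interstate 95"],
    "Main Street Bridge": ["main street bridge"],
}

URL_RE = re.compile(r"https?://\S+")
TOKEN_RE = re.compile(r"[a-z0-9#@-]+")


def normalize(text):
    """Lowercase, drop URLs and emoticon/punctuation debris, expand abbreviations."""
    text = URL_RE.sub(" ", text.lower())
    tokens = TOKEN_RE.findall(text)
    return " ".join(ABBREVIATIONS.get(t, t) for t in tokens)


def link_messages(messages):
    """Group raw messages by the gazetteer entity whose surface form they mention."""
    by_entity = defaultdict(list)
    for msg in messages:
        norm = f" {normalize(msg)} "  # pad so forms match whole tokens only
        for entity, forms in GAZETTEER.items():
            if any(f" {form} " in norm for form in forms):
                by_entity[entity].append(msg)
    return dict(by_entity)


messages = [
    "Main St bridge flooded :( pls avoid",
    "I95 backed up near exit 12 http://t.co/abc",
    "Traffic on I-95 is crazy!!!",
    "stay safe everyone",
]
groups = link_messages(messages)
print({entity: len(msgs) for entity, msgs in groups.items()})
# {'Main Street Bridge': 1, 'I-95': 2}
```

Messages grouped under the same entity can then be fused, for instance by aggregating reported conditions per road or bridge, to build the aggregated view described above.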


Oak Ridge National Laboratory (ORNL) [http://www.ornl.gov/] (see letter) will use the Instrument to integrate sensor data from across the US into a real-time detection and alert system. ORNL's SensorNet project will use the Instrument's analytics and super-resolution data.


The team of Dr. S.S. Iyengar [http://users.cis.fiu.edu/~iyengar/] at FIU will use the Instrument to perform two case studies on trend analysis and data-behavior monitoring. The sensor data collected by the Instrument can be used to optimize risk-assessment strategies and to detect critical events from mined information. The team's optimization algorithm monitors the dynamic behavior of the data and identifies distinctive features in the data set.


References Cited


[CBO+02] N. V. Chawla, K. W. Bowyer, L. O. Hall, and W. P. Kegelmeyer. "SMOTE: Synthetic Minority Over-sampling Technique." Journal of Artificial Intelligence Research, 16:321-357, 2002.

[JDC87] A. K. Jain, R. C. Dubes, and C.-C. Chen. "Bootstrap techniques for error estimation." IEEE Transactions on Pattern Analysis and Machine Intelligence, 9:628-633, 1987.

[Elk01] C. Elkan. "The foundations of cost-sensitive learning." In Proceedings of the International Joint Conference on Artificial Intelligence (IJCAI), pages 973-978, 2001.

[ZE01] B. Zadrozny and C. Elkan. "Learning and making decisions when costs and probabilities are both unknown." In Proceedings of the Seventh ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pages 204-213, 2001.

[APL98] J. Allan, R. Papka, and V. Lavrenko. "On-line new event detection and tracking." In Proceedings of the 21st Annual International ACM SIGIR Conference on Research and Development in Information Retrieval, pages 37-45. ACM, 1998.

[Eis13] J. Eisenstein. "What to do about bad language on the internet." In Proceedings of the 2013 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies (NAACL-HLT), 2013.

[LWJ12] F. Liu, F. Weng, and X. Jiang. "A broad-coverage normalization system for social media language." In Proceedings of the 50th Annual Meeting of the Association for Computational Linguistics (ACL), pages 1035-1044, 2012.