As stated by Mike Norman, SDSC’s director, “gathering data is easy. In fact, it’s so easy it’s exceeding our capacity to validate, analyze, visualize, store and curate. And, many of our critical scientific problems can only be solved by harnessing this data.” By integrating the methodologies of mathematics, statistics, visualization, and computer science, data scientists can address the Big Data challenge. PACE is nurturing a rich, collaborative learning environment in which this Big Data research community can share knowledge and learn critical data analysis tools for discovering patterns and relationships in data that can support valid predictions. Predicting future trends and behaviors, from the epic to the everyday, enables proactive, knowledge-driven decisions.
The PACE boot camp series is designed to give professionals in business enterprises and scientific communities the skills critical for designing, building, verifying, and testing predictive data models. Data mining (the art and science of learning from data) covers a number of different procedures. This hands-on course emphasizes key learning techniques, including decision trees, numeric prediction, clustering, Bayesian learning, artificial neural networks (ANNs), and support vector machines (SVMs). Workshop participants will have access to a comprehensive set of data mining tools on SDSC’s Gordon, one of the world’s most powerful supercomputers, with 300 terabytes of flash memory. With this computing resource, participants can sharpen their skills by applying data mining algorithms to real data and interpreting the results.
For industry to be competitive in the global marketplace, it must have access to the human and technological resources that increase capacity and productivity. PACE aims to offer an exemplary data mining program, renowned for educating and training 21st-century data scientists.