Resources

Presentation

PACE / XSEDE Intro to R

R is a statistics package, created by statisticians, that can be compared to a spreadsheet without a visual representation.
 
Pros
• Programmable data analysis
• Pre-processing data from different sources
• Data Visualization
• Interactive: interpreted, not compiled
Cons
• Graphical User Interface
• Low-level programming

Introduction to Data Mining

Data Mining is NOT...
• Data Warehousing
• (Deductive) query processing
• SQL/Reporting
• Software Agents
• Expert Systems
• Online Analytical Processing (OLAP)
• Statistical Analysis Tool
• Data visualization
• BI – Business Intelligence

Data Mining on Gordon

What can we do with Data Mining?
• Exploratory Data Analysis
• Predictive Modeling: Classification and Regression
• Descriptive Modeling
• Cluster analysis/segmentation
• Discovering Patterns and Rules
• Association/Dependency rules
• Sequential patterns
• Temporal sequences
• Deviation detection
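
As one concrete illustration of the tasks listed above, deviation detection can be sketched in a few lines. This is a minimal sketch, not taken from the presentation: the function name `find_deviations`, the z-score method, the threshold of 2.0, and the sample readings are all assumptions chosen for illustration.

```python
from statistics import mean, stdev


def find_deviations(values, threshold=2.0):
    """Return (index, value) pairs whose z-score magnitude exceeds threshold.

    Hypothetical helper: flags points that lie more than `threshold`
    sample standard deviations away from the mean.
    """
    mu = mean(values)
    sigma = stdev(values)  # sample standard deviation; needs >= 2 values
    if sigma == 0:
        return []  # all values identical, so nothing deviates
    return [(i, v) for i, v in enumerate(values)
            if abs(v - mu) / sigma > threshold]


# Made-up sensor readings with one obvious anomaly at index 5.
readings = [10.1, 9.8, 10.3, 10.0, 9.9, 25.0, 10.2, 10.1]
outliers = find_deviations(readings)
print(outliers)
```

A z-score rule is only one simple approach and is itself sensitive to the outliers it is looking for; more robust alternatives (for example, median-based rules) may suit real data better.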

PACE and Gordon

PACE and Gordon:
• DataOasis = High-performance Lustre parallel file system
• SDSC Project = Scalable, flexible, robust NFS storage
• SDSC Cloud = Reliable storage for easy access, sharing and collaboration

PACE and SDSC

Predictive Analytics Center of Excellence (PACE) @ San Diego Supercomputer Center (SDSC) / University of California, San Diego (UCSD). Includes a brief history of SDSC and its cyberinfrastructure components, along with several PACE project descriptions.