This NSF-funded summer program teaches students the principles of data science and machine learning. Students will learn core concepts in data modeling, databases, SQL, data cleaning, data wrangling, and visualization. They will also learn basic Python programming for processing data, along with machine learning techniques, including basic concepts, training, classification, and sentiment analysis.
The program uses the open-source Texera platform to help students become familiar with these concepts even if they have a limited computing background. In a capstone project, students apply what they have learned by analyzing real data (e.g., social media data) with ML-based data science techniques. The instructors and staff include professors and Ph.D. students from UCI and UCLA who are experts in data management, data science, and machine learning.
- Program date: 07/10/2023 – 07/21/2023
- Deadline to apply: 05/22/2023
- Acceptance notification: 05/26/2023
- Eligibility: Rising high school students in grades 9-12 and graduating 12th graders.
- Prerequisites: Algebra II or Integrated Math II
- Contact email: ds4all AT ics DOT uci DOT edu
- If interested, please click here to apply.
|Date||Day||Morning (Lecture)||Afternoon (Lab)||Institution|
|7/10/23||1||Orientation, Texera overview, sample tweets, operators, form teams. (Slides)||Explore Texera by using basic operators to create a Tweet analysis workflow for a topic of interest.||UCI|
|7/11/23||2||Python, User-defined functions, Lambda functions. (Slides)||Use UDFs and Lambda Functions to do text cleaning and visualizations on Tweet data.||UCI|
|7/12/23||3||Python UDF (continued)||Use UDFs to implement custom aggregations for Tweet data.||UCI|
|7/13/23||4||Control Statements in Python, Semi-structured data (JSON)||Use Table APIs to implement custom visualizations on Tweet data.||UCI|
|7/14/23||5||JSON Parsing, Geolocation-based visualization||Milestone presentation; complete the whole pipeline of Tweet analysis (from JSONL to visualizations).||UCI|
|7/17/23||6||ML model: classification||Use UDF for a classification task||UCLA/UCI|
|7/18/23||7||ML model: clustering (Slides)||Use UDF for a clustering task||UCLA/UCI|
|7/19/23||8||ML model: neural networks, CNN, RNN, MLP, transformer (Slides)||Use UDF for representative NN models, e.g., CNN||UCLA/UCI|
|7/20/23||9||ML model: recommender systems (Slides)||Use Twitter data to build a workflow including multiple components||UCLA/UCI|
|7/21/23||10||ML model: Large Language Models (No Slides)||Use Twitter data to build a workflow including multiple components||UCLA/UCI|
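
To give a flavor of the first-week labs (lambda functions, text cleaning, custom aggregations, and JSONL parsing), here is a minimal sketch in plain Python. It is only an illustration and not the program's actual lab material: it does not use Texera's UDF interface, and the sample tweets, the `clean_text` lambda, and the helper names `parse_tweets` and `count_hashtags` are all hypothetical.

```python
import json
import re
from collections import Counter

# Hypothetical sample data: two tweets in JSONL form (one JSON object per line).
SAMPLE_JSONL = """\
{"id": 1, "text": "Loving the #DataScience camp! https://example.com"}
{"id": 2, "text": "RT @friend: #datascience and #Python are fun"}
"""

# Lambda function that strips URLs and @mentions, then lowercases the text.
clean_text = lambda text: re.sub(r"https?://\S+|@\w+:?", "", text).lower().strip()


def parse_tweets(jsonl):
    """Turn JSONL text into a list of dicts, skipping blank lines."""
    return [json.loads(line) for line in jsonl.splitlines() if line.strip()]


def count_hashtags(tweets):
    """Custom aggregation: count hashtags across all cleaned tweets."""
    counts = Counter()
    for tweet in tweets:
        counts.update(re.findall(r"#\w+", clean_text(tweet["text"])))
    return counts


tweets = parse_tweets(SAMPLE_JSONL)
hashtag_counts = count_hashtags(tweets)
print(hashtag_counts.most_common())
```

Lowercasing before counting means `#DataScience` and `#datascience` are treated as the same hashtag, which is the kind of cleaning decision the labs ask students to reason about.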