Data Science for ALL (2023)


This NSF-funded summer program teaches students the principles of data science and machine learning. Students will learn concepts in data modeling, databases, SQL, data cleaning, data wrangling, and visualization. They will also learn basic Python programming for processing data, along with machine learning techniques, including basic concepts, training, classification, and sentiment analysis.

The program will use the open-source Texera platform to help students become familiar with these concepts even if they have a limited computing background. A capstone project will let students practice these skills by analyzing real data (e.g., social media data) and applying what they have learned to ML-based data science. The instructors and staff include professors and Ph.D. students from UCI and UCLA who are experts in data management, data science, and machine learning.


Summer 2023 DS4ALL Photos

Summer 2023 DS4ALL Students and Instructors

Faculty Instructors

Dr. Chen Li, Department of Computer Science, UCI

Dr. Wei Wang, Department of Computer Science, UCLA

Teaching Assistants (TAs)

Xiaozhen Liu, Department of Computer Science, UCI

Mingyu Derek Ma, Department of Computer Science, UCLA

Yanqiao Zhu, Department of Computer Science, UCLA

Alexander Taylor, Department of Computer Science, UCLA

Jeehyun Hwang, Department of Computer Science, UCLA

Details

  • Program date: 07/10/2023 – 07/21/2023
  • Deadline to apply: 05/22/2023
  • Acceptance notification: 05/26/2023
  • Eligibility: Rising high school students in grades 9 through 12 and graduating 12th graders.
  • Prerequisites: Algebra II or Integrated Math II
  • Contact email: ds4all AT ics DOT uci DOT edu
  • If interested, please click here to apply.

Instruction Location

Room 110 (Kay Lab), Information & Computer Science II (ICS2), University of California, Irvine
Irvine, CA 92617


Program Schedule

Lunch included.

Each day has a morning lecture and an afternoon lab. The instructor/TA team leading each day is shown in parentheses.

Week 1

  • Day 1, 7/10/23 (UCI). Morning lecture: Orientation, Texera overview, sample tweets, operators, forming teams (Slides). Afternoon lab: Explore Texera by using basic operators to create a Tweet analysis workflow for a topic of interest.
  • Day 2, 7/11/23 (UCI). Morning lecture: Python, user-defined functions (UDFs), lambda functions (Slides). Afternoon lab: Use UDFs and lambda functions to clean Tweet data and create visualizations.
  • Day 3, 7/12/23 (UCI). Morning lecture: Python UDFs, continued (Slides). Afternoon lab: Use UDFs to implement custom aggregations for Tweet data.
  • Day 4, 7/13/23 (UCI). Morning lecture: Control statements in Python, semi-structured data (JSON) (Slides). Afternoon lab: Use Table APIs to implement custom visualizations on Tweet data.
  • Day 5, 7/14/23 (UCI). Morning lecture: JSON parsing, geolocation-based visualization (Slides). Afternoon lab: Milestone presentation; complete the whole Tweet analysis pipeline (from JSONL to visualizations).

Week 2

  • Day 6, 7/17/23 (UCLA/UCI). Morning lecture: ML/AI overview (Slides); ML model: classification (Slides). Afternoon lab: Use a UDF for a classification task.
  • Day 7, 7/18/23 (UCLA/UCI). Morning lecture: ML model: clustering (Slides). Afternoon lab: Use a UDF for a clustering task.
  • Day 8, 7/19/23 (UCLA/UCI). Morning lecture: ML models: neural networks, CNN, RNN, MLP, transformer (Slides). Afternoon lab: Use UDFs for representative neural network models, e.g., a CNN.
  • Day 9, 7/20/23 (UCLA/UCI). Morning lecture: ML model: recommender systems (Slides). Afternoon lab: Use Twitter data to build a workflow with multiple components.
  • Day 10, 7/21/23 (UCLA/UCI). Morning lecture: ML model: large language models (no slides). Afternoon lab: Use Twitter data to build a workflow with multiple components.