I am Pin Li

Data Scientist, Product Analyst, Researcher, Photographer, Learner

Name: Pin Li

Profile: Data Scientist

Email: pinli0206@gmail.com

Location: Rochester, NY

Skills

Python 90%
SQL 85%
JavaScript 60%
Java 50%
PHP 45%
HTML 45%
About Me

My name is Pin Li, and I have 3 years of work experience as a data analyst. I enjoy uncovering valuable insights through code and making an impact for all stakeholders. The languages I use most often are Python and SQL, and I am familiar with common data tools such as dbt, Snowflake, Tableau, and Power BI.

I learn quickly and love to make an impact. During my internship at iSmileTechnologies Inc., I learned Node.js in one week and finished the chatbot and recommendation logic in two months. Throughout my analyst career I have volunteered to automate the boring stuff. For example, I scraped tracking data from the web for my previous company in the logistics industry and stored it in a database for future reference. I am open-minded, and I want to be a team player who dedicates all my skills and talents to developing high-quality, unique products.

Passions

I am passionate about...

Data

3 years of work experience
Research experience with 2 publications
MS in Data Science from the University of Rochester
Love to play with social media data

Coding

Keen on automating boring routines
Built a chatbot and a recommendation system
Built a mini restaurant system (school project)
Enjoy watching algorithms course videos over dinner

Photography

Love animals
Love coincidences
Wish to tell a story through my lens
A 10-year amateur

Portfolio

Projects I am proud of

NLP

The COVID-19 Pandemic and Mental Health Concerns on Twitter

We examined mental health discussions on Twitter during the COVID-19 pandemic in the US and inferred the demographic composition of Twitter users who had mental health concerns. We extracted mental health-related tweets from the US and conducted topic modeling using LDA to monitor users’ discussions surrounding mental health concerns. Published in Health Data Science in 2022.

NLP

Portraits of Working from Home during the COVID-19 Pandemic

We conducted a large-scale social media study using Twitter data to portray the different groups who hold positive or negative opinions about working from home (WFH). We performed OLS regression to investigate the relationship between sentiment about WFH and user characteristics, and used LDA to extract topics and discover how tweet content relates to people's attitudes. Published in JMIR Medical Informatics in 2021.

Database & Web

Local Restaurant Pickup Order System using PHP & HTML

The goal of our project was to build a database and web interfaces that let a local restaurant run a food ordering system, including both a customer interface and an admin management interface. The project involved designing and creating the database with MySQL, building the web interfaces with HTML, and connecting the interfaces to the server database using PHP.

Modeling & Web

Credit Risk Prediction Model With LightGBM

Our goal for this project was not only to find a relatively accurate and robust model for FICO risk prediction but, more importantly, to improve its interpretability and give understandable explanations that sales representatives at a bank or credit card company can use to decide whether to accept or reject applications. We built an interactive page showing the model results with Streamlit.

Real Scenario Case

National Chain Retailer Attraction Plan using Python & Tableau

The project objective was to attract a national retailer and identify an optimal location for it in Downtown Rochester, taking into account economic status, demographic attributes, and business opportunities. The work combined quantitative and qualitative research: data augmentation, peer city comparable analysis, descriptive analysis, predictive modeling, and a final pitch deck.

Business Case

Wine Retailer Case - Causal Effect Analysis using R & Tableau

This insight-oriented case study examines the effectiveness of email advertising and the potential target markets. My team used R to run linear regression to estimate the average causal effect, and a causal forest to decide whether to target a given customer based on the expected benefit compared to the promotion cost.

Gallery

Pictures I take, stories I tell.