ShuMing Peh

+65 9899 0206

Blk 650 Senja Link
670650
Singapore

shuming.peh@gmail.com
GitHub / LinkedIn

Summary

Highlight: Data Scientist/Machine Learning Engineer with 5+ years’ experience building end-to-end data applications, from data pipelines and modelling through continuous retraining to deployment


Mathematical/Statistical Skills

Derivation: Advanced calculus of differential equations, stochastic calculus and real analysis

Modeling: Machine learning, time series analysis, sampling, ANOVA, A/B testing, linear algebra, model/feature selection

Hands-on experience: applied ML/statistical algorithms to real-world problems, including (deep) neural networks, recurrent neural networks, gradient boosting, ensembling techniques (including stacking and blending), clustering, GLMs, Markov models, game theory and simulation models


Programming Skills

Functional programming: Python, PySpark

Distributed systems: Hadoop, Hive, Spark

Databases: MySQL, Postgres, Redshift

Data visualization: R (including Shiny), Python, Tableau

Model deployment/Orchestration: Airflow, Docker, Kubernetes, MLflow

Tech Stack: AWS

Professional Experience

Senior Machine Learning Engineer

ShopBack

  • Developing and productionizing in-house machine learning and deep learning recommendation models
  • Responsible for end-to-end MLOps
      • Data pipeline and model development
      • Continuous retraining of models
      • Model deployment, maintenance and A/B testing
  • Productionized models
      • Deals recommendation for offline vertical:
          • Designed and built a learning-to-rank (LTR) model from gradient-boosted trees
          • MLflow is used for model tracking and as the model registry
          • Batch prediction, updated daily
          • Daily Airflow job that predicts user-cohort recommendations and uploads the results to Postgres
          • FastAPI service is dockerized and deployed onto a Kubernetes (EKS) cluster
          • A/B tested against manual curation, with the model convincingly outperforming it
          • Automated weekly retraining pipeline that decides whether the retrained model replaces the current production model by comparing the evaluation metric (MRR) on the same test dataset
      • Merchant recommendation for online vertical:
          • Designed and built an RNN deep learning model (GRU4Rec) from scratch
          • MLflow is used for model tracking and as the model registry
          • Real-time prediction
          • Model client service (FastAPI) is dockerized and deployed onto a Kubernetes (EKS) cluster
          • Model serving (BentoML) is dockerized and deployed onto a Kubernetes (EKS) cluster
          • A/B tested against an Amazon Personalize model, with the new model comfortably outperforming it
          • Automated weekly retraining pipeline that decides whether the retrained model replaces the current production model by comparing the evaluation metric (MRR)
  • Productionized model results
      • Deals recommendation for (SBGO) offline vertical: +~15% in CTR and +~20% in transactions
      • Merchant recommendation for (SBOC) online vertical: +~3% in CTR and +~4% in transactions

Data Scientist/Founder

UpTick (Data Consultancy)

  • Providing data science and data engineering consultancy services to SMEs and start-ups
  • Working with clients to build end-to-end data products; notable clients:
      • Ablr (fintech)
          • Built out ETL and data infrastructure
          • Created and integrated a credit-scoring engine that leverages two deep neural network (DNN) models: one predicting loan default and one predicting whether a user will be late on payments
          • Created and integrated a differential interest rate pricing model for loans with differing credit scores
      • 3PlayGrounds
          • Built out ETL and data infrastructure as a baseline for more robust and usable data to support future developments such as predictive analytics
      • GrowthOS
          • Creating content on data analysis and (pirate) growth metrics

Data Scientist 2

Skyscanner Ltd

  • Attached to the Skyscanner accommodation ranking team
  • Worked on the hotel ranking problem: returning the best-sorted search results, personalized to the user and search parameters
  • Implemented new features and a new model (LightGBM) that delivered a 10% relative improvement in hotel bookings
  • Created a DNN model to replace the current model; offline evaluation showed an MRR improvement of ~5%

Data Scientist 2

Skyscanner Ltd

  • Attached to the Skyscanner car hire vertical
  • Integrated an improved deep learning cross-sell model for car hire that is more scalable and scores better on evaluation metrics, increasing car hire searches and redirects by 5% and 1% respectively
  • Integrated a lead-time model that determines the best time interval to send a push notification for the car hire cross-sell model
  • Created and integrated a price prediction model for car hire itineraries, with 80% accuracy (using RMSE)
  • Created a vertical recommendation that decides whether to recommend car hire or airport transfer for different cities
  • Created and tested a result-sorting algorithm for the car hire search results page, which improved redirects and bookings by 3% and 1% respectively

Growth

Skyscanner Ltd

  • Developed two in-house models from scratch for multi-touch attribution
  • Created a predictive model for 7-day retention as part of an early warning system for business metrics
  • Improved the app onboarding experience by using ML to determine which app features highly retained users use in their first month after install
  • Maintained and improved a self-service statistical tool that demonstrates the causal impact of changes made to Skyscanner

Data Analyst

Firemonkeys Electronic Arts

  • Scoped A/B tests and conducted statistical analysis of test results to provide insights and recommendations
  • Created simulation models for game events to validate and help tune values set by product managers, e.g. time taken or total spend to complete events
  • Created and tested a price optimization model that increased weekend revenue by 80%
  • Developed a bottom-up revenue forecasting model that is now used company-wide at EA
  • Created a user acquisition spend optimization model that minimizes risk and maximizes returns, guiding the acquisition team's buying strategy and direction
  • Managed an associate analyst and developed his technical and soft skills for a data analyst role

Awards/Side Projects

Sky High Award

Skyscanner Ltd

  • Built out data products for teams outside of car hire
  • Successfully mentored a marketing graduate through the transition into a Data Analyst role

Speaker at AnalytiCon

Firemonkeys Electronic Arts

  • Presented as the final speaker on day 1 of EA's inaugural Analytics Conference, on content planning saturation using the statistical theory of mean reversion

Talent Spotlight Award

Firemonkeys Electronic Arts

  • Recognized at the EA Mobile quarterly general meeting for building the bottom-up revenue forecasting model for company-wide use

Education

Major in Quantitative Finance

Singapore Management University

Personal Details

Nationality

  • Singapore

Working rights in Australia

  • Requires sponsorship

Languages

  • English (native)
  • Mandarin

Find me on GitHub