NTU

2026 — 2027 · Full-time Student

M.Sc. Data Science in CCDS

Nanyang Technological University, Singapore

"After 5 years at POI-TECH — leading teams, earning recognition, and operating in my comfort zone — I chose to step away. Not because I had to, but because I knew AI was reshaping everything, and I wanted to be on the right side of that shift. Singapore, CCDS, and a deep dive into data science felt like the only honest next step."

The Decision

🌏

International Reach

Five years building products for Chinese enterprises made me realize how much I wanted to think and operate at a global scale. Singapore was the bridge.

🤖

AI Is Inevitable

I had already shipped industrial AI products. But I knew the difference between directing AI engineers and truly understanding the models. I wanted the latter.

📊

Obsessed with Data

Every product decision I made was rooted in data. Formalizing that instinct — with statistics, ML theory, and systems — felt like the most natural investment.

Location: Singapore (SG)
School: College of Computing and Data Science (CCDS)
Intake: Aug 2026 – 2027
Projects: 3 in first semester

Academic Projects

Python · RaBitQ · PCA · K-Means · Mar – Apr 2026

High-Performance Vector Search System on DBLP Dataset via RaBitQ

100k+ docs
Processed via byte-stream parsing with 384-d semantic embeddings
1-bit quantization
RaBitQ + Popcount — memory reduced to tens of MBs
66% compute cut
PCA 384→128 dims, retaining >90% of the variance to preserve recall
[Diagram: High-Performance Vector Search System on DBLP Dataset via RaBitQ]
  • Processed 100k+ documents via byte-stream parsing and generated 384-d semantic embeddings.
  • Designed IVF + K-Means index to reduce search space and improve query efficiency.
  • Applied RaBitQ quantization (1-bit) with Popcount acceleration, reducing memory to tens of MBs.
  • Optimized with PCA (384→128), cutting compute by 66% while retaining >90% of the variance to preserve recall.
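
To illustrate the 1-bit idea mentioned above, here is a minimal sketch, not the project's actual code. It keeps only the sign of each dimension and ranks candidates by Hamming distance (XOR followed by a popcount). This is a simplification: the full RaBitQ method also applies a random rotation and corrects its distance estimate, which this sketch omits. The helper names `to_code` and `hamming`, the random data, and the noise level are assumptions made for illustration.

```python
import random

DIM = 384  # embedding width used in the project description


def to_code(vec):
    """Pack the sign of each dimension into one Python int (1 bit per dim)."""
    code = 0
    for x in vec:
        code = (code << 1) | (1 if x > 0 else 0)
    return code


def hamming(a, b):
    """XOR then popcount: the number of dimensions whose signs differ."""
    return bin(a ^ b).count("1")


# Synthetic stand-in data (hypothetical, not DBLP embeddings).
random.seed(0)
docs = [[random.gauss(0, 1) for _ in range(DIM)] for _ in range(1000)]
codes = [to_code(v) for v in docs]

# A query that is a slightly perturbed copy of document 42.
query = [x + random.gauss(0, 0.1) for x in docs[42]]
q = to_code(query)

# Rank by Hamming distance; a real system would re-rank the top hits
# with full-precision vectors.
nearest = min(range(len(codes)), key=lambda i: hamming(q, codes[i]))
print(nearest)
```

Under these assumptions each 384-d vector occupies 384 bits (48 bytes) instead of 1,536 bytes as float32, and the distance loop reduces to integer XOR and bit counting.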
Python · LLM Evaluation · Data Analysis · Mar – Apr 2026 (ongoing)

Safety Evaluation of Generative Autonomous Driving Videos

100+ videos
Labeled dataset for driving video safety evaluation
Multi-level errors
Semantic, logical, and decision-level failure detection
LLM-powered
Automated prompts for safety scoring at scale
  • Built a safety evaluation pipeline for generative driving video models, detecting semantic, logical, and decision-level errors.
  • Created a labeled dataset of 100+ videos and developed LLM-based prompts for automated safety evaluation.
  • Conducted failure mode analysis to quantify and analyze safety issue distributions.
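
The failure mode analysis above can be pictured as a tally over per-video labels. The sketch below is only an illustration under stated assumptions: the record layout, the video IDs, and the label names (`semantic`, `logical`, `decision`) are hypothetical placeholders, and the numbers are toy values rather than project results.

```python
from collections import Counter

# Hypothetical annotation records; IDs and labels are illustrative only.
annotations = [
    {"video": "v001", "errors": ["semantic"]},
    {"video": "v002", "errors": ["logical", "decision"]},
    {"video": "v003", "errors": []},
    {"video": "v004", "errors": ["semantic", "decision"]},
]


def failure_distribution(records):
    """Return the share of videos that exhibit each error level."""
    counts = Counter(err for rec in records for err in rec["errors"])
    total = len(records)
    return {level: n / total for level, n in counts.items()}


dist = failure_distribution(annotations)
for level, share in sorted(dist.items()):
    print(f"{level}: {share:.0%} of videos")
```

Because one video can carry several error levels, the shares are per-level rates and need not sum to 100%.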
PostgreSQL · Python · Mar – Apr 2026

Large-Scale DBLP Database System: Parsing, Modeling and Optimization

PostgreSQL
Robust relational schema for academic publication data
XML → SQL
Full pipeline from raw DBLP XML to structured tables
Indexing study
Scalability experiments across different DB sizes
  • Architected a robust PostgreSQL relational database for large-scale academic publication data.
  • Parsed and transformed the DBLP XML dataset into structured tables for analysis.
  • Evaluated query performance and scalability by experimenting with different database sizes and indexing strategies.
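
As a rough sketch of the XML-to-rows step described above, the snippet below streams DBLP-style records with the standard library's `iterparse` and yields one dictionary per publication. It is an illustration, not the project's pipeline: the set of record tags, the output field names, and the inline sample are assumptions, and the real DBLP dump relies on a DTD with character entities that this sketch does not handle. Loading the rows into PostgreSQL is left out.

```python
import io
import xml.etree.ElementTree as ET

# Record types to extract; the real dump contains more (assumption: two suffice here).
RECORD_TAGS = {"article", "inproceedings"}


def iter_publications(source):
    """Stream publication records without loading the whole file into memory."""
    for _event, elem in ET.iterparse(source, events=("end",)):
        if elem.tag in RECORD_TAGS:
            year = elem.findtext("year")
            yield {
                "key": elem.get("key"),
                "type": elem.tag,
                "title": elem.findtext("title"),
                "year": int(year) if year else None,
                "authors": [a.text for a in elem.findall("author")],
            }
            elem.clear()  # drop the record's children once it has been emitted


# Tiny made-up sample in DBLP's general shape.
sample = b"""<dblp>
<article key="journals/x/Doe20"><author>Jane Doe</author><author>John Roe</author><title>A Sample Title.</title><year>2020</year></article>
<inproceedings key="conf/y/Poe21"><author>Ann Poe</author><title>Another Title.</title><year>2021</year></inproceedings>
</dblp>"""

rows = list(iter_publications(io.BytesIO(sample)))
for row in rows:
    print(row["key"], row["year"], len(row["authors"]))
```

Each dictionary maps naturally onto a publication row, with the `authors` list feeding a separate author table and a many-to-many link table in a normalized schema.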