Hello!

I'm Desmond Kao

Software Engineer, Data Scientist, AI Engineer, Quant, Classical Pianist

My skills include...

AWS
Azure
C#
C++
Docker
FastAPI
Java
JavaScript
Kubernetes
MySQL
Node.js
NumPy
Pandas
PostgreSQL
PyTorch
Python
R
React
Scikit-learn
Snowflake
TensorFlow
TypeScript

About Me

Originally from San Francisco, I'm now a Computer Science and Data Science student at NYU building AI systems that solve tangible business problems across New York City's financial and tech landscape. I've engineered full-stack platforms and machine learning pipelines for NYC hedge funds, built automated pricing models for real estate companies, and developed NLP tools for academic research centers including NYU's Carter Journalism Institute and Yale. Whether it's automating investment research workflows that process millions of data points daily, or building AI systems that help teams answer complex questions in seconds instead of hours, I love turning messy data challenges into production-ready solutions. Beyond engineering, I'm a classical pianist with 16 years of experience, and I explore creative outlets through Muay Thai and music production.

Languages

Python, JavaScript, TypeScript, SQL, Java, C++, C#, R, MATLAB, HTML/CSS

Cloud & Infrastructure

AWS (EC2, S3, Lambda), Azure, Snowflake, Firebase, Docker, Kubernetes

Frameworks & Libraries

React, Node.js, Express, FastAPI, Flask, Angular, Spark, PySpark, Streamlit

AI & Machine Learning

TensorFlow, PyTorch, Scikit-learn, Pandas, NumPy, SciPy, HuggingFace, LangGraph, RAG, NLP, Deep Learning, Time Series, Statistics, A/B Testing, Causal Inference, Matplotlib, Seaborn, Plotly, Vector DB

Tools & Databases

PostgreSQL, MySQL, Git, PowerBI, Tableau, Jupyter, Excel, Datadog, CI/CD

Projects

Proprietary Code
01

Compass – AI Financial Dashboard

Built equities-focused AI dashboard with Python/FastAPI integrating Grok API for factor report generation. Developed statistical analysis engine using SciPy for traditional quant metrics and portfolio analytics. Deployed full-stack system combining LLM capabilities with rigorous statistical methods for investment research.

Python
FastAPI
Grok API
React
SciPy
Statistics
RAG
Proprietary Code
02

Hedge Fund Data Intelligence Platform

Built firmwide AI chatbot with Python/FastAPI connecting directly to Snowflake for flexible querying across all company data. Integrated Grok API and PostgreSQL for natural language queries enabling instant access to research reports, market data, and internal documents. Production system serving entire investment staff.

Python
FastAPI
RAG
Snowflake
Grok API
PostgreSQL
Proprietary Code
03

Iris – Investor Relations Platform

Developed full-stack investor relations tool for NYC hedge fund connected to Snowflake data warehouse. Built automated data validation pipelines with PostgreSQL backend and report generation system that exports client-ready documents in existing formats, eliminating manual data pulling and validation. One-stop platform for generating all client-facing investment reports.

Python
React
Snowflake
FastAPI
PostgreSQL
Data Validation
Proprietary Code
07

ML Stock Price Predictor – Short Seller Analysis

Developed machine learning models predicting stock price movements following short seller report releases. Built feature engineering pipeline extracting signals from report text using NLP and combining with market data. Implemented and compared logistic regression and deep Q-learning (DQL) models for binary classification of post-report price direction, achieving strong predictive performance for trading strategy development.

Python
Scikit-learn
Deep Learning
NLP
Feature Engineering
Logistic Regression
Proprietary Code
08

Property Distress Prediction Model

Built logistic regression model predicting property distress for real estate portfolio management. Developed automated pipeline processing and scoring properties daily with strong predictive accuracy, enabling proactive risk assessment and investment decisions.

Python
Scikit-learn
Logistic Regression
Feature Engineering
Proprietary Code
012

NLP Web Scraping API

Built NLP API for extracting structured data from unstructured sources using Gemini and Claude APIs. Developed automated web scraping pipeline with intelligent feature extraction and validation, transforming raw web data into clean, actionable datasets. Containerized with Docker for scalable deployment.

Python
Gemini API
Claude API
NLP
Web Scraping
FastAPI
Docker
Proprietary Code
013

DataLens – Enterprise Data Quality Dashboard

Built ETL monitoring pipeline validating schema integrity and detecting anomalies across millions of records daily. Developed PowerBI dashboard tracking data freshness and pipeline health for multiple datasets. Significantly reduced manual audit time with automated alerts and LLM-generated anomaly summaries.

Python
SQL
PySpark
PowerBI
Azure

Experience

Software & AI Engineering Intern

AlphaQuest

June 2025 - Present
  • Built AI chatbot in Python/FastAPI with Snowflake RAG + OpenAI API, significantly reducing research time for investment staff
  • Developed AI-driven commentary pipeline automating daily report generation, synthesizing news, research, and portfolio data
  • Created full-stack investor relations platform in React/Python automating chart generation, analysis reports, and ad hoc queries, streamlining all IR workflows for client communications
Python
FastAPI
React
Snowflake
OpenAI API
RAG

Software & Data Engineering Intern

Catenary Alternatives Asset Management

Dec 2024 - May 2025
  • Developed firmwide Flask/SQL research API enabling teams to query live data with dramatically improved retrieval speeds
  • Built autonomous LLM scraping pipeline in Flask/Azure with daily data collection, significantly reducing research time
  • Created AI Excel assistant in Python using Perplexity + Tavily APIs, automating large-scale data entry tasks
Python
Flask
SQL
Azure
NLP

ML & Data Engineering Intern

Neue Urban

Nov 2024 - Feb 2025
  • Developed ML valuation API in Python, significantly improving property pricing accuracy and reducing appraisal costs
  • Automated end-to-end valuation pipeline from data ingestion to prediction, dramatically reducing review time
  • Integrated live market/listing APIs to enrich model features, improving valuation precision across portfolios
Python
Scikit-learn
FastAPI
Machine Learning

AI Research Engineering Intern

Arthur L. Carter Journalism Institute

Nov 2024 - Mar 2025
  • Built LLM/NLP pipeline in Python/HuggingFace analyzing large-scale social media posts for sentiment trends
  • Developed large-scale document scraper with LLM-based summarization, significantly reducing publication review time
  • Created automated bias detection system in Python evaluating news articles for neutrality and language framing
Python
NLP
HuggingFace
Web Scraping

ML Engineering Intern

Yale University (HP Funded Research)

June 2024 - July 2024
  • Built lightweight LLM with TensorFlow/Keras, using pruning and quantization for low-power on-device inference
  • Benchmarked model latency and power consumption across CPU/GPU setups, optimizing for sustainable local deployment
  • Led ML/AI sustainability workshop for industry professionals, sharing practical techniques for efficient model design
Python
TensorFlow
Keras
Machine Learning

Education

B.A. in Computer Science and Data Science

New York University

Sep 2023 - May 2027
  • GPA: 3.75
  • Coursework: Data Structures, Basic Algorithms, Database Management, Linear Algebra, Discrete Mathematics, Calculus I & II, Principles of Data Science I & II, Causal Inference, AI Ethics
  • Member of BUGS (Open Source Club @ NYU)