Wei Zhen Teoh

AI Researcher & Engineer

About Me

I work at Descript as an AI researcher, developing audio processing technology for software products. In school I studied computer science, mathematics, statistics, and finance. I am originally from Malaysia.

Work Experience

Lyrebird (acquired by Descript)

AI Researcher

I work across audio processing projects, mainly speech synthesis. I develop machine learning algorithms and their interfaces with the product backend. I also design evaluation systems to benchmark technical progress. On the product side, I work with product managers to diagnose technical issues faced by users.

Education

University of Toronto

Sept 2017 - Dec 2018

MSc Applied Computing

This professional program trains students for applied research roles in the software technology industry. My concentration was in Machine Learning. I completed courses and technical projects covering both theory and applications in the field.


Courses completed:
CSC2541 - Scalable and Flexible Models of Uncertainty
CSC2547 - Learning Discrete Latent Structure
CSC2548 - Machine Learning in Computer Vision

University of Toronto

Sept 2013 - Apr 2017

BSc Mathematical Applications, Statistics

My program concentrations spanned Mathematics, Statistics, Economics, and Finance.


Courses completed:
MAT357 - Real Analysis
APM466 - Mathematical Finance
ECO326 - Game Theory
STA447 - Stochastic Processes
STA414 - Statistical Methods for Machine Learning
CSC321 - Neural Networks

Publications

Salient Facial Features from Humans and Deep Neural Networks
S Sun*, WZ Teoh*, M Guerzhoy
arXiv preprint arXiv:2003.08765

In this work, we explore the features that humans and ConvNets use to classify faces. We use Guided Backpropagation to visualize the facial features that most influence the output of a ConvNet when identifying specific individuals. We develop a human intelligence task to find out which facial features humans find most important. We then explore the differences between the saliency information gathered from humans and from ConvNets.

Paper

MelGAN: Generative Adversarial Networks for Conditional Waveform Synthesis
K Kumar, R Kumar, T de Boissiere, L Gestin, WZ Teoh, J Sotelo, A Brébisson, Y Bengio, A Courville
Advances in Neural Information Processing Systems, 14881-892

We successfully train GANs to generate high-quality, coherent audio waveforms. We apply our models to speech synthesis and music applications. Our model is non-autoregressive, has significantly fewer parameters than competing models, and generalizes to unseen speakers for speech synthesis. Our PyTorch implementation runs more than 100x faster than real time on a GTX 1080Ti GPU and more than 2x faster than real time on CPU, without any hardware-specific optimization tricks.

Blog Post Paper Code

Selected Projects

Generating Emotional Speech in Cloned Voices
Master Program Internship Project, 2018
Lyrebird & University of Toronto Department of Computer Science

In this project we work towards a Text-to-Speech (TTS) model for cloned voices with controllable prosody. Our goal is two-fold: to copy the vocal identity of a new speaker in a time- and data-efficient manner, and to generate artificial speech in the cloned voice with emotions that have not previously been observed in the new speaker’s recordings.

Poster

Uncertainty Guided Recommendation with Bandits
Master Program Course Project, 2017
University of Toronto Department of Computer Science

We provide a model for Instantaneous Feedback Recommendation Systems. We model the ratings in the MovieLens dataset using neural network probabilistic factorization models. The posterior distributions of the user and item latent vectors are approximated using stochastic variational inference. We recast the items in the ratings matrix as arms in a multi-armed bandit setting. The predictive reward distribution of a user consuming each item depends on the respective posterior latents and is updated as new ratings are observed. By applying bandit policy functions, we provide recommendations that balance exploration and exploitation of the user's interests.

Report

Handwriting Synthesis with RNN
Reproducibility Project, 2018

The purpose of this project is to reproduce the results in the paper "Generating Sequences With Recurrent Neural Networks" by Alex Graves. This paper is a milestone for generative models of variable-length sequences. The location-based attention model used to condition handwriting generation on text is an important engineering innovation, and it has inspired many modern text-to-speech models.

Code

Skills