I am a PhD student in the Department of Computer Science and Engineering at Michigan State University, working in the Computer Vision Lab and advised by Dr. Xiaoming Liu.

My primary interests are in deep learning, computer vision, and autonomous driving. I mainly focus on object recognition in urban scenes, which primarily involves research in pedestrian detection, 2D/3D object detection, and scene forecasting, among others.

Selected projects conducted as part of my research in the Michigan State University CSE doctoral program.

Kinematic 3D Object Detection in Monocular Video (ECCV 2020)

In this work, we propose a novel method for monocular video-based 3D object detection which leverages kinematic motion to extract scene dynamics and improve localization accuracy. We propose a novel decomposition of object orientation and a self-balancing 3D confidence. Collectively, using only a single model, we efficiently leverage 3D kinematics from monocular videos to improve the overall localization precision in 3D object detection while also producing useful by-products of scene dynamics (ego-motion and per-object velocity).

Find out more at project website and publication.

The Edge of Depth: Explicit Constraints between Segmentation and Depth (CVPR 2020)

In this work, we study the mutual benefits of two common computer vision tasks: self-supervised depth estimation and semantic segmentation from images. We propose to explicitly measure the border consistency between segmentation and depth and minimize it in a greedy manner by iteratively supervising the network towards a locally optimal solution. Our proposed approach advances the state of the art on unsupervised monocular depth estimation on the KITTI dataset.

Find out more at project website, publication, and source code.

M3D-RPN: Monocular 3D Region Proposal Network for Object Detection (ICCV 2019)

We propose a Monocular 3D Region Proposal Network to perform single-shot multi-class 3D object detection. We further design depth-aware convolutional layers which enable location-specific feature development. M3D-RPN significantly improves the performance of both the monocular 3D Object Detection and Bird's Eye View tasks within the KITTI urban autonomous driving dataset, while efficiently using a shared multi-class model.

Find out more at project website, publication, and source code.

Pedestrian Detection with Autoregressive Network Phases (CVPR 2019)

We propose an autoregressive cascaded phase network designed to progressively improve precision. The proposed framework utilizes a lightweight and stackable de-encoder module with inner-lateral convolutions designed to encourage features to both refine and diversify. We explicitly encourage increasing levels of precision by assigning strict labeling policies to each consecutive phase, such that early phases develop features primarily focused on achieving high recall and later phases focus on high precision. Collectively, our framework achieves new state-of-the-art performance on challenging settings of the Caltech pedestrian dataset.

Find out more at project website, publication, and source code.
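As a rough illustration of the phase-wise labeling idea in the pedestrian detection work above (not the code from the paper), one way to make labels stricter in each consecutive phase is to raise the overlap required for a positive label. The IoU-based policy, the threshold values, and the names `iou` and `phase_labels` are all assumptions made for this sketch.

```python
def iou(box_a, box_b):
    """Intersection over union of two boxes given as (x1, y1, x2, y2)."""
    ax1, ay1, ax2, ay2 = box_a
    bx1, by1, bx2, by2 = box_b
    inter_w = max(0.0, min(ax2, bx2) - max(ax1, bx1))
    inter_h = max(0.0, min(ay2, by2) - max(ay1, by1))
    inter = inter_w * inter_h
    union = (ax2 - ax1) * (ay2 - ay1) + (bx2 - bx1) * (by2 - by1) - inter
    return inter / union if union > 0 else 0.0


def phase_labels(anchors, gt_boxes, thresholds=(0.4, 0.5, 0.6)):
    """Return one list of 0/1 labels per phase.

    Hypothetical policy: an anchor is positive in a phase if it overlaps
    any ground-truth box by at least that phase's threshold. Later phases
    use higher thresholds, so their positives are a stricter subset.
    """
    labels = []
    for threshold in thresholds:
        labels.append([
            int(any(iou(anchor, gt) >= threshold for gt in gt_boxes))
            for anchor in anchors
        ])
    return labels


if __name__ == "__main__":
    anchors = [(0, 0, 10, 10), (0, 0, 10, 22), (0, 0, 10, 18)]
    ground_truth = [(0, 0, 10, 10)]
    for phase, row in enumerate(phase_labels(anchors, ground_truth)):
        print("phase", phase, row)
```

With a loose first threshold, early phases keep more anchors as positives (favoring recall), while later phases keep only the tightly overlapping ones (favoring precision).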

Recurrent Flow-Guided Semantic Forecasting (WACV 2019)

We decompose the semantic forecasting task into two subtasks: current frame segmentation and future optical flow prediction. We build an efficient, effective, low-overhead model with three core components: a flow prediction network, a feature-flow aggregation LSTM, and an end-to-end learnable warp layer. Our method achieves state-of-the-art accuracy on short-term and moving-object semantic forecasting while reducing model parameters by up to 95% and increasing efficiency by more than 40x.

Find out more at project website, publication, and source code.
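To make the decomposition in the semantic forecasting work above concrete, here is a minimal sketch of warping a current-frame label map with a predicted flow field. It is not the warp layer from the paper: it uses nearest-neighbor lookup on plain Python lists, which is not differentiable, whereas a learnable warp layer would typically rely on something like bilinear sampling. The function name `warp_labels` and the backward-flow convention are assumptions made for this sketch.

```python
def warp_labels(labels, flow):
    """Backward-warp a 2D label map with a per-pixel flow field.

    labels: 2D list of class ids for the current frame.
    flow:   2D list of (dx, dy) pairs; for each output pixel (x, y) the
            source pixel is (x + dx, y + dy) in the current frame.
    Source coordinates are rounded and clamped to the image border.
    """
    height = len(labels)
    width = len(labels[0])
    warped = []
    for y in range(height):
        row = []
        for x in range(width):
            dx, dy = flow[y][x]
            src_x = min(max(int(round(x + dx)), 0), width - 1)
            src_y = min(max(int(round(y + dy)), 0), height - 1)
            row.append(labels[src_y][src_x])
        warped.append(row)
    return warped


if __name__ == "__main__":
    current = [[0, 1, 0],
               [0, 0, 0]]
    # Every output pixel reads from one pixel to its left,
    # so content appears shifted one pixel to the right.
    shift_right = [[(-1, 0)] * 3 for _ in range(2)]
    print(warp_labels(current, shift_right))
```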

Illuminating Pedestrians via Simultaneous Detection & Segmentation (ICCV 2017)

In this work, we explore how semantic segmentation can be used to boost pedestrian detection accuracy while having little to no impact on network efficiency. We propose a segmentation infusion network to enable joint supervision on semantic segmentation and pedestrian detection.

Find out more at project website, publication, and source code.

Pedestrian Detection with 30 Class Semantic Segmentation (Video Demonstration)

Monocular urban pedestrian detection system featuring a cascade of networks and a 30-class semantic segmentation side task, jointly trained using Caltech and Cityscapes.

The classes utilized for segmentation are based on classes commonly seen in urban street scenes, including roads, buildings, vehicles, pedestrians, traffic signs, and trees. The video was captured on the Michigan State University campus in conjunction with the CANVAS autonomous driving effort.

Course Projects

Selected projects from various coursework at Michigan State University as part of the CSE doctoral program.

Fooling Pedestrian Detection CNNs (Computer and Network Security - 2018)

Course Project Spoofing GAN

We implemented a GAN loss to generate residual images that augment urban driving scenes with two primary goals: photorealism and maximum confusion/anarchy for the pedestrian detector. In essence, we aim to fool and attack a state-of-the-art detector. We further investigate the effects of using the synthetic data for alternating training, thus pitting the GAN against the detector. We find that state-of-the-art systems are not naturally robust to such attacks: the miss rate quadruples (4x) unless the network is trained directly on both the real and synthetic image data.

Natural Language Person Retrieval in Traffic Surveillance (Language and Interaction - 2018)

Course Project Person Retrieval NLP

We build a convolutional word embedding network using a series of fully-connected layers followed by two LSTMs and utilize attribute labeling as an auxiliary task, resulting in a natural language person retrieval system that functions on in-the-wild traffic surveillance data (manually collected at MSU). The system works as a proof-of-concept and, in our experience, performs only modestly due to apparently poor generalization between the training and test data domains. We merge the CUHK person description dataset with the PETA pedestrian attribute dataset for training the respective loss functions.

Generating Semi-Synthetic Pedestrian Data (Advanced Computer Graphics - 2017)

Course Project Synthetic Graphics

We build a proof-of-concept project for generating synthetic pedestrians. We use the MakeHuman tool to generate synthetic 3D pedestrian models with highly variable pose, shape, race, hair, clothing, etc. In addition to synthetic models, we use a simple cut-and-paste method based on pixel-level segmentation masks to extract real people from images. Finally, we place synthetic or real pedestrians onto arbitrary background images. We learn depth cues to place pedestrians at proper scales on the sidewalk or road regions only. In retrospect, this method, while a useful proof-of-concept, could be drastically improved by using a GAN to enforce realistic shadows, color consistency, and borders.
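The cut-and-paste step can be sketched in a few lines. This is only an illustration under simplifying assumptions, not the course project's code: images are plain 2D lists of pixel values, the placement offset is given rather than derived from learned depth cues, and the function name `paste_with_mask` is hypothetical.

```python
def paste_with_mask(background, patch, mask, top, left):
    """Copy masked patch pixels onto a copy of the background.

    background: 2D list of pixel values.
    patch:      2D list of pixel values for the pedestrian crop.
    mask:       2D list of 0/1 flags, same shape as patch; 1 marks
                pixels that belong to the pedestrian.
    top, left:  where the patch's upper-left corner lands.
    Pixels falling outside the background are silently skipped.
    """
    out = [row[:] for row in background]  # leave the input untouched
    for r, (patch_row, mask_row) in enumerate(zip(patch, mask)):
        for c, (pixel, keep) in enumerate(zip(patch_row, mask_row)):
            y, x = top + r, left + c
            if keep and 0 <= y < len(out) and 0 <= x < len(out[0]):
                out[y][x] = pixel
    return out


if __name__ == "__main__":
    scene = [[0, 0, 0] for _ in range(3)]
    person = [[5, 6], [7, 8]]
    person_mask = [[1, 0], [1, 1]]
    for row in paste_with_mask(scene, person, person_mask, top=1, left=1):
        print(row)
```

Because only masked pixels are copied, the background shows through wherever the mask is 0, which is also why hard borders and missing shadows remain visible without further blending.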

For Fun

Selected computer science projects from recent years, conducted for little to no reason.

Blasphemy 2016

Javascript Open Source

A simple offline JavaScript solution for checking and filtering profanity. The project has a large collection of profane words built in. Each of these words was run through an English dictionary to generate a list of potential false positives, which the filter checks against before it finishes.

Simply import blasphemy.js and start having fun!
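The false-positive idea can be illustrated with a short sketch. The library itself is JavaScript; this Python version only shows the general technique and does not mirror the blasphemy.js API. The function names `build_false_positives` and `censor`, and the mild placeholder word lists, are hypothetical.

```python
import re


def build_false_positives(profane_words, dictionary_words):
    """Collect dictionary words that merely contain a profane substring."""
    profane = {word.lower() for word in profane_words}
    return {
        word.lower()
        for word in dictionary_words
        if word.lower() not in profane
        and any(bad in word.lower() for bad in profane)
    }


def censor(text, profane_words, false_positives, mask="*"):
    """Mask profane substrings, skipping words known to be false positives."""
    # Longest words first so longer matches win over their prefixes.
    ordered = sorted({w.lower() for w in profane_words}, key=len, reverse=True)
    if not ordered:
        return text
    pattern = re.compile("|".join(re.escape(w) for w in ordered), re.IGNORECASE)

    def replace_token(match):
        token = match.group(0)
        if token.lower() in false_positives:
            return token  # innocent dictionary word, leave it alone
        return pattern.sub(lambda m: mask * len(m.group(0)), token)

    return re.sub(r"[A-Za-z']+", replace_token, text)


if __name__ == "__main__":
    bad_words = ["heck", "darn"]
    dictionary = ["check", "darning", "hello", "heck"]
    safe_words = build_false_positives(bad_words, dictionary)
    print(censor("Heck, check the darning!", bad_words, safe_words))
```

Precomputing the false-positive list once means the filter only needs a set lookup per word at run time, which suits an offline library.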

View Project

Multipi 2015

C# Unity Game

Multipi is a simple Unity game based on an imaginary operation in mathematics called multipiecation. The only objective of the game is to throw pies at other pies. When pies collide they multiply in size or quantity depending on the circumstance.

A man in the game creeps around eating pies, eventually multiplying himself. Score is kept in terms of pi and pie.

The game is playable using Unity's web player below.

Play game

KetteringJS 2015

Javascript API Open Source

KetteringJS is a library meant to give easy and structured access to various information and functions through JavaScript. It uses a combination of basic authentication, reliable REST calls, and unfortunate web scraping. Documentation is available.

Created as a way to abstract the functions used in the KUMobile project (listed below) so that others could make use of them as well. The latest build is available through the RawGit CDN service here.

View Project

OUScheduler 2013

Java Open Source

The Oakland Scheduler is a tool created specifically for Oakland University in order to make choosing classes easier. Features: visual representation, filter by day, filter by sections, sort by time, export schedule, and print schedule.

Created for fun using old quick-and-dirty algorithms left over from very early prototypes of KUMobile.

View Project

Lightsaber Simulator 2013

C++ OpenCV OpenGL

Simulates up to two 3D lightsabers on the user's webcam feed in real time. The application uses OpenCV to track target props and then OpenGL to draw the 3D lightsabers on top. Flicker and blur effects are also implemented.

This project was lots of fun and very rewarding hard work! It was completed as part of a team for a virtual reality course at Kettering University.

View Project