Prathamesh Mandke

I work as a Senior ML Engineer at Qualcomm AI Research developing software for on-device personalization and adaptation of Large Language and Vision Models.

Previously, during my Master's at Virginia Tech, I worked with Prof. Anuj Karpatne in the Dept. of CS on using domain-specific prior knowledge in machine learning to accelerate scientific discovery. I spent my summers interning at Qualcomm, working on model efficiency (AIMET); at Flytbase, Inc., working on deep learning for autonomous drones; and at Siemens, working on industrial autonomous systems.

I enjoy writing code in Python and C++.
In my free time, I love to play soccer, take road trips and play the keyboard.
Here's a cool map of places I have visited in the United States.

Email: pkmandke AT vt DOT edu

GitHub / LinkedIn / Blog / Google Scholar / CV

Experience

	Qualcomm Senior Machine Learning Engineer July 2021 - Present
	- Implemented Qualcomm's first LLM LoRA fine-tuning on the Snapdragon mobile CPU using PyTorch in C++.
	- Leveraged LLM block-quantization, checkpointing, etc. to significantly reduce peak memory; US patent accepted.
	- Worked with Microsoft Research on co-developing a Federated Learning SDK using C++/gRPC/Azure. Work published at Interspeech'23: PDF.
	- Played a key role in developing end-to-end software for Federated and On-Device Personalization R&D using C++/Python. Work demonstrated at NeurIPS'21 (YouTube) and NeurIPS'23 (YouTube).
	Qualcomm Software Engineer Intern May 2020 - August 2020 I worked with the AI Model Efficiency Toolkit (AIMET) team at Qualcomm AI Research. I developed visualization utilities for AIMET features such as Quantization Simulation and Data-Free Quantization, including Cross-Layer Scaling and Bias Correction. I also implemented a utility to convert AIMET’s internal model graph representation to ONNX, setting up a framework for extending visualization support to additional features.
	Virginia Tech Graduate Research Assistantship January 2020 - May 2020 poster / I worked on the problem of sparse-view CT image reconstruction using deep Convolutional Neural Networks. Modified the U-Net CNN architecture with transposed convolutions and residual connections, and reduced its parameter footprint, to achieve robust reconstruction performance in terms of SSIM and PSNR. Implemented Wasserstein GAN-based reconstruction pipelines in PyTorch and evaluated them against vanilla GANs.
	Flytbase, Inc. Summer Intern May 2019 - July 2019 Worked on deep-learning-based barcode localization for warehouse automation using autonomous drones. Trained and tested YOLOv2, YOLOv3, Faster R-CNN and SSD models with Inception, ResNet and MobileNet backbones on a custom-built dataset using an NVIDIA Titan X GPU. Explored embedded deployment of the models on Intel’s Movidius Neural Compute Stick using Docker on Linux. Tools: TensorFlow, Darknet, Python, C, Bash.
	Siemens, Ltd. Intern - Autonomous Systems Summer'17 & Summer'18 Domain - Industrial Autonomous Systems Worked in the R&D division of the Switchgear Digital Factory. I designed, built and programmed a contactor testing fixture automaton that achieved the target reduction in cycle time and has been deployed on the assembly line at Siemens in Mumbai, India. Keywords: PLC programming, programmable DMM, linear position transducer and auto-transformer interfacing and control.

Education

Virginia Tech

Master's degree, Computer Engineering, GPA: 4.0/4.0
August 2019 - Present

Courses: Deep Learning, Adv Machine Learning, Adv Parallel Computing, Information Storage & Retrieval, Electronic Design & Automation

College of Engineering, Pune (COEP)

Bachelor of Technology, Electronics and Telecommunication, GPA: 9.11/10
May 2019

Bachelor’s thesis: Deep Knowledge Distillation: Model Compression of FaceNet CNNs for OpenCL-FPGA Implementation
Courses: Data Structures, Information Theory & Coding, Embedded Software & RTOS, Object Oriented Programming, Speech Processing, Soft Computing.
Activities: Center of Excellence in Signal & Image Processing, COEP Amateur Radio Club, COEP Athletics

Projects

	Deep Knowledge Distillation: Model Compression of FaceNet CNNs for OpenCL-FPGA implementation Bachelor's thesis, College of Engineering, Pune December 2018 - May 2019 website / Our work extends the idea of distillation-based knowledge transfer, as suggested by Hinton et al., to the regression-based FaceNet model by training a MobileNet architecture from a pre-trained Inception-based CNN in a student-teacher setting. By training multiple models on ~1M face images from the VGG2 dataset, our work demonstrates that the student networks show similar and even marginally better performance than the teacher on the LFW face verification task, thus corroborating the theory behind knowledge transfer. We also benchmarked the performance of the Inception (teacher) and the MobileNet (student) networks on the DE10 Nano SoC FPGA using OpenCL.
	Clustering Large Scale Text Corpora for Efficient Information Retrieval Virginia Tech August 2019 - December 2019 website / code / Vectorized two large text corpora, viz. the ETD corpus with ~33k documents and the Tobacco Settlement articles corpus with ~1M documents, using Doc2Vec, a neural-network-based algorithm by Quoc Le and Tomas Mikolov. Implemented K-Means clustering, Agglomerative clustering, DBSCAN and BIRCH on the document vectors and benchmarked them on metrics such as the Calinski-Harabasz index, the Davies-Bouldin score and the Silhouette score. Also implemented robust cross-validation for cluster size. All experiments were conducted on the Kubernetes cluster in the Dept. of CS at VT, using Docker for containerization and the Ceph file system.
	Simulated Annealing for the Travelling Tournament Problem ECE-5534: Electronic Design Automation (Virginia Tech) March 2020 code / Implemented the Simulated Annealing algorithm for the travelling tournament problem in C++. Benchmarked performance for non-trivial problem sizes.
	IEEE: Human Posture Recognition using Artificial Neural Networks H. Kale, P. Mandke, H. Mahajan, V. Deshpande, 2018 IEEE 8th International Advance Computing Conference, Greater Noida, India, 2018, pp. 272-278. January 2018 - May 2018 paper / dataset / This work proposes the design of an embedded human posture recognition system using Artificial Neural Networks as the classifiers. We present the design, build a prototype and demonstrate the results of experimentation with human subjects. The MEMS IMU MPU-6050 accelerometer sensor is interfaced to the ESP8266-based NodeMCU to wirelessly transmit the data to a central Raspberry Pi server using HTTP over TCP/IP for real-time inference. Our design deploys two sensor modules, one each on the thigh and chest of the human subject. We train an artificial neural network on our own custom dataset of 44,800 points across 6 postures.
	Lempel-Ziv-Welch Text File Compression - A Python Package College of Engineering, Pune March 2018 - July 2018 code / I worked with Prof. P. P. Bartakke to build this Python package for UTF-8 text file compression based on the Lempel-Ziv-Welch universal coding algorithm. It achieves O(logN) phrase look-up complexity by using the trie data structure to store codebook entries. As an extension to our work, we studied the variation of the compression ratio as a function of the underlying file distribution. By generating synthetic files with Poisson, Uniform, Gaussian and Exponential distributions, we demonstrated a strong correlation between the file distribution and the compression ratio.
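The trie-based codebook idea in the Lempel-Ziv-Welch entry above can be illustrated with a short sketch. This is not the package's actual code: the names `TrieNode`, `lzw_compress` and `lzw_decompress` are hypothetical, the alphabet is assumed to be the 256 possible UTF-8 byte values, and codes are returned as plain Python integers rather than packed into a bitstream.

```python
class TrieNode:
    """One codebook phrase; each child extends the phrase by a single byte."""
    __slots__ = ("code", "children")

    def __init__(self, code):
        self.code = code
        self.children = {}


def lzw_compress(text):
    """Encode a str as a list of integer LZW codes (hypothetical helper)."""
    data = text.encode("utf-8")
    root = TrieNode(None)
    for b in range(256):              # seed the codebook with all single bytes
        root.children[b] = TrieNode(b)
    next_code = 256
    codes = []
    node = root                       # node tracks the longest phrase matched so far
    for byte in data:
        child = node.children.get(byte)
        if child is not None:
            node = child              # phrase + byte is already in the codebook
        else:
            codes.append(node.code)   # emit the code of the longest known phrase
            node.children[byte] = TrieNode(next_code)  # learn phrase + byte
            next_code += 1
            node = root.children[byte]                 # restart from this byte
    if node is not root:
        codes.append(node.code)       # flush the final phrase
    return codes


def lzw_decompress(codes):
    """Invert lzw_compress by rebuilding the same codebook on the fly."""
    if not codes:
        return ""
    table = {i: bytes([i]) for i in range(256)}
    prev = table[codes[0]]
    out = [prev]
    for code in codes[1:]:
        if code in table:
            entry = table[code]
        elif code == len(table):      # phrase defined by the step that emitted it
            entry = prev + prev[:1]
        else:
            raise ValueError("invalid LZW code: %d" % code)
        out.append(entry)
        table[len(table)] = prev + entry[:1]
        prev = entry
    return b"".join(out).decode("utf-8")


if __name__ == "__main__":
    sample = "TOBEORNOTTOBEORTOBEORNOT"
    encoded = lzw_compress(sample)
    print(len(sample), "bytes ->", len(encoded), "codes")
    print(lzw_decompress(encoded) == sample)
```

In this sketch the encoder never re-scans a phrase: each input byte costs one child lookup from the current trie node, which is the property a trie-backed codebook is chosen for.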