I am a second-year graduate student specializing in Machine Learning in the Computer Science Department at Columbia University, New York. My interests revolve around Machine Learning and its applications in finance and medicine. Currently, I am working under Prof. Hod Lipson in the Creative Machines Lab on 3D Food Printing. I am also extending my work under Prof. Pat Stokes on developing a math puzzle to enhance the logical capabilities of young children.
I completed my Bachelor's in Computer Engineering at St. Francis Institute of Technology, Mumbai, in 2015. During my undergraduate studies, I worked under Dr. Kavita Sonawane on improving existing video summarization techniques. The work was recognized as one of the best projects of 2015. Besides academics, I took part in various extracurricular activities during this time. I was a core committee member of the Entrepreneurship Cell and the Placement Cell, and I co-founded a non-profit organization that spreads awareness of cleanliness.
In the summer of 2017, I worked with the Algorithmic Trading Team at Jefferies LLC, a New York-based investment banking firm, as a Software Developer Intern. My responsibilities revolved around regression analysis, data visualization and software development. Before this internship, I worked as a Software Developer on the Data Analytics team at a start-up based in Mumbai.
I would describe myself as a confident, bubbly individual with a drive to keep learning, the ability to work well in a team, stay calm under pressure and manage time effectively. I would like to build on these strengths by working in a close-knit team of intellectuals in the domain of Machine Learning. My ultimate objective is to employ learning techniques for the advancement of medical informatics.
Software Developer Intern (May 2017 - September 2017)
During my internship at Jefferies, I worked on two projects, Client Alpha and BARRA Risk Analysis, with the high-frequency algorithmic trading team in the Equity Division.
Client Alpha aimed to systematically produce a client's alpha, which is the excess information shared by that client. I implemented various regression models in Python that could fit the data with statistical confidence and accurately predict residual risk for every client. I also developed an end-to-end internal web application that visualizes financial details such as the volatility curve for each client and reports the predicted alpha score. Technologies used included C#, AngularJS, OneTick and R.
In the BARRA Risk Analysis project, I developed software that periodically looked for files, updated the database and connected to a web service to compute the ultimate risk of a portfolio for various projects. The entire process had to be optimized to reduce the time taken to crawl across a multitude of files and update over 10M rows at once. The application was scheduled as a batch job to update whenever there was a modification. The project was developed in C#, and its initial scope was researched in R. A Java utility was also recreated for the same purpose.
Teaching Assistant (January 2017 - May 2017)
Worked as a Course Assistant in Spring 2017 under Prof. Hardeep Johar for the MBA courses:
Software Developer (July 2015 - June 2016)
CPConverge is a data analytics start-up that set out to analyze empirical student data based on aptitude tests and career choices.
As a Developer, I had diverse responsibilities in data analytics, algorithmic modelling, software development and content generation. I also co-headed a small team that built an adaptive aptitude test, in which various internal machine learning models were tested and implemented. Some features of this test were: (1) fully dynamic test creation from the client side, with choices such as the number of sections, type of questions, number of questions in each section, adaptive algorithm and duration of the test; (2) analysis of the user's performance in the test, including time-based and topic-based suggestions for improvement; and (3) a full-fledged secure authentication system that used the SendGrid API to send emails for password recovery and registration welcome messages.
Student Instructor (August 2013 - August 2014)
Student Assistant for
Developed a C++ library to implement ensemble methods, which are believed to have an edge in performance over standalone algorithms. Supervised algorithms such as logistic regression, SVM, KNN and perceptron were used as the standalone counterparts. The library was tested against two use cases: Viola-Jones face detection and cancer prediction. It was also benchmarked against C++'s mlpack, Java's Weka and Python's scikit-learn. The Armadillo implementation of the library outperformed its peers in logistic regression and in the cumulative ensemble method implementation.
Built a graph library using templates and concepts, with three separate implementations for directed graphs, directed acyclic graphs and directed trees. Included functions for graph search and for finding the minimum-cost path between two nodes. The library was tested rigorously to ensure that it does not leak memory.
This application lets any driver pick up passengers on their way such that the maximum travel time of every passenger is optimized. The discrete combinatorial optimization problem was solved using a newly devised algorithm which, unlike many current technologies, provides an optimal automated solution for both passengers and drivers. AWS components such as DynamoDB, SNS, SQS and Elastic Beanstalk were used to realize the project.
The goal of the application is to collect tweets in categories such as sports, films and technology, preprocess them, and render the filtered tweets on a map. The Twitter Streaming API was used to fetch tweets from Twitter in real time; AWS Elasticsearch to store the filtered tweets in the backend and to show tweets within a certain distance of a point chosen by the user; Elastic Beanstalk to deploy the application in an auto-scaling environment; and the Google Maps API for the heatmap.
Scraped movie reviews along with their ratings from the IMDb website and generated a training set with an equal number of positive and negative reviews. A Stochastic Gradient Descent classifier and Naïve Bayes were used to train the model. An accuracy of 83.6% was achieved for SGD and 74.9% for Naïve Bayes.
The lack of ways to automatically generate summaries from videos can make watching surveillance footage tedious and time-consuming. It is also only human to miss interesting features in such long CCTV tapes. This problem motivated the project, in which various clustering techniques were applied to sparsely coded videos generated using histograms of optical flow and histograms of gradients. The techniques were compared on three parameters: processing time, accuracy and length of the output video. An average processing time of 1/24 was achieved on videos longer than 65 minutes. The final application was recognized as one of the best in the gradschool in 2015.
Academic Achievement Award (2012 and 2013), St. Francis Institute of Technology
Co-founder of Agli Mumbai, a non-profit organization in Mumbai
Core Organizer of Eureka, IIT Bombay
Received a certificate for Exemplary Performance in Theory of Computer Science, Discrete Mathematics and Data Structures
Awarded First Prize and Best Speaker for Intra-College Technical Presentation on Big Data Analytics
Ranked within the top 1% in the Regional Mathematics Olympiad, 2010