Projects

Anime Recommendation System

April 10, 2023

The goal of the recommendation system is to spread the fun of anime to more people. We also want to encourage competition in the animation industry by ranking different anime. Our motivation for choosing this topic comes from our group’s personal interest in different anime genres. The system is aimed both at viewers who are already familiar with a variety of anime series and at those exploring which series might match their interests. Members may be able to obtain a recommendation. We have information on genre, ratings, synopsis, episodes, production studio, etc. at our disposal. We aim to design a recommendation system that recommends the next anime show a user should watch. We perform data exploration to find the important features (in our case, Score and Rank). We use a ‘hybrid’ recommender approach that weights different methods based on popularity, randomness, and recommendation, and we also try to incorporate the features above into the recommender model. The results for our method are provided in the report.
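
The weighting idea behind a hybrid recommender can be sketched in a few lines. The Python below is a minimal illustration under stated assumptions, not the project's implementation: each unwatched show gets a score that blends popularity (its average score, assumed to be on a 0 to 10 scale), content similarity (Jaccard overlap between its genres and the genres of shows the user has already watched), and a small random term that surfaces less obvious titles. The `Anime` class, the `hybrid_recommend` function, and the default weights are all hypothetical.

```python
import random
from dataclasses import dataclass


@dataclass(frozen=True)
class Anime:
    title: str
    genres: frozenset
    score: float  # average user score, assumed to be on a 0-10 scale


def jaccard(a, b):
    """Genre overlap between two sets: 1.0 means identical, 0.0 means disjoint."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def hybrid_recommend(watched, catalog, k=3, w_pop=0.4, w_rec=0.5, w_rand=0.1, seed=0):
    """Rank unwatched shows by a weighted blend of popularity, similarity and randomness.

    The weights are illustrative assumptions, not values from the project.
    """
    rng = random.Random(seed)  # seeded so results are reproducible
    liked_genres = frozenset().union(*(a.genres for a in watched))
    watched_titles = {a.title for a in watched}
    ranked = []
    for show in catalog:
        if show.title in watched_titles:
            continue
        popularity = show.score / 10.0                   # normalise to [0, 1]
        similarity = jaccard(show.genres, liked_genres)  # content-based signal
        noise = rng.random()                             # exploration term in [0, 1)
        total = w_pop * popularity + w_rec * similarity + w_rand * noise
        ranked.append((total, show.title))
    ranked.sort(reverse=True)
    return [title for _, title in ranked[:k]]
```

With `w_rand=0.0` the ranking is fully deterministic: a user who has watched an action/fantasy show is steered towards other action/fantasy titles unless a very popular show in another genre outweighs the similarity term. Raising `w_rand` trades some relevance for variety.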

Predicting Corrosion in Surface Coatings (PPG Paints)

December 01, 2022

The final project for my machine learning course was sponsored by PPG Industries, a Fortune 500 company. Surface coatings play a role in manufacturing products ranging from common household goods and eyeglasses to buildings, cars, planes, and more! Coatings prevent corrosion and prolong the useful life of industrial materials, components, and machinery. Without a properly designed surface coating, the materials we interact with would not last as long as we want them to! Designing coating materials properly requires expertise in chemistry and chemical engineering, materials science, manufacturing, experimental design, and more. Experiments are performed to find the optimal material chemistry and manufacturing process conditions that minimize the amount of corroded surface after a test. Machine learning models are trained on historical data to predict material performance, and the trained models are then used to find the constituents and process settings that minimize corrosion. This project contributes to that workflow through a comparative study of several Bayesian machine learning techniques for predicting the percentage of corroded surface.
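
Bayesian regression models return a predictive distribution rather than a single number, which is useful when each new experiment is expensive. As a minimal sketch of one such technique (not necessarily one of those compared in the project), the Python below fits a Bayesian linear regression with a zero-mean Gaussian prior and a known noise precision to a single hypothetical input feature, then reports the predictive mean and standard deviation of the corrosion percentage. The function names, the prior precision `alpha`, and the noise precision `beta` are assumptions for illustration.

```python
import math


def fit_bayes_linear(xs, ys, alpha=1.0, beta=25.0):
    """Posterior over weights w = (intercept, slope) for y = w0 + w1*x + noise.

    alpha: precision of the zero-mean Gaussian prior on w (assumed).
    beta:  precision (1/variance) of the Gaussian observation noise, assumed known.
    Returns the posterior mean (m0, m1) and covariance entries (s11, s12, s22).
    """
    # Posterior precision A = alpha*I + beta * Phi^T Phi, where Phi has rows (1, x).
    n = len(xs)
    sx = sum(xs)
    sxx = sum(x * x for x in xs)
    a11 = alpha + beta * n
    a12 = beta * sx
    a22 = alpha + beta * sxx
    det = a11 * a22 - a12 * a12
    # Posterior covariance S = A^{-1}, using the closed form for a 2x2 matrix.
    s11, s12, s22 = a22 / det, -a12 / det, a11 / det
    # Posterior mean m = beta * S * Phi^T y.
    sy = sum(ys)
    sxy = sum(x * y for x, y in zip(xs, ys))
    m0 = beta * (s11 * sy + s12 * sxy)
    m1 = beta * (s12 * sy + s22 * sxy)
    return (m0, m1), (s11, s12, s22)


def predict(mean, cov, beta, x):
    """Predictive mean and standard deviation at input x."""
    m0, m1 = mean
    s11, s12, s22 = cov
    mu = m0 + m1 * x
    # Noise variance plus parameter uncertainty phi^T S phi, with phi = (1, x).
    var = 1.0 / beta + s11 + 2 * s12 * x + s22 * x * x
    return mu, math.sqrt(var)
```

Calling `fit_bayes_linear` on a handful of (feature, corrosion %) pairs and then `predict` at a new setting yields both an estimate and an uncertainty. The predictive standard deviation grows as the query moves away from the training inputs, one reason Bayesian models are often used to decide which formulation to test next.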

Detecting Geospatialness of Prepositions

June 01, 2022

Spatial relations in natural language are frequently expressed through prepositions. For example, in the locative expressions “New York in the United States” and “the house on the river”, the prepositions “in” and “on” respectively communicate the spatial relationship between the subject and the object of the preposition. Automatically detecting when a preposition is used in a spatial sense, and in particular in a geo-spatial sense that refers to geographic context, can support automated methods for determining the actual geographic location referred to by a locative expression. This work focuses on disambiguating prepositions in natural language, with the goal of distinguishing whether a preposition is used in a specifically geo-spatial sense. Our machine learning experiments demonstrate the clear benefit of transformer-based deep learning models for geo-spatial sense detection compared with a variety of other methods, including Naive Bayes, Support Vector Machine (SVM), and Random Forest classifiers with hand-crafted linguistic features, and a bag-of-words approach with a meta-classifier that adds geo-spatial features. The best performance was obtained with the XLNet transformer model, with a best precision of 0.96 and an F1 score of 0.94 when evaluated on a corpus of natural language expressions annotated for this task. We also conducted experiments to detect generic spatial sense, in which the best F1 score, 0.95, was again obtained with XLNet.
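
One of the baselines in this comparison, Naive Bayes over a bag of words, is simple enough to sketch directly. The Python below is an illustrative multinomial Naive Bayes classifier with add-one smoothing that labels a phrase as `geo` or `non-geo`. The labels, the whitespace tokenizer, and the tiny `TRAIN` set are invented for this example; they are not the project's annotated corpus or its hand-crafted features.

```python
import math
from collections import Counter, defaultdict

# Invented toy examples for illustration only (not the project's annotated corpus).
TRAIN = [
    ("the hotel in paris", "geo"),
    ("a village on the coast of france", "geo"),
    ("new york in the united states", "geo"),
    ("interested in music", "non-geo"),
    ("a book on the table", "non-geo"),
    ("in the morning", "non-geo"),
]


def tokenize(text):
    """Lower-case whitespace tokenizer: a bag of words ignores word order."""
    return text.lower().split()


class MultinomialNB:
    """Multinomial Naive Bayes over word counts with add-one (Laplace) smoothing."""

    def fit(self, texts, labels):
        self.class_counts = Counter(labels)
        self.word_counts = defaultdict(Counter)
        self.vocab = set()
        for text, label in zip(texts, labels):
            tokens = tokenize(text)
            self.word_counts[label].update(tokens)
            self.vocab.update(tokens)
        return self

    def predict(self, text):
        total_docs = sum(self.class_counts.values())
        vocab_size = len(self.vocab)
        best_label, best_logp = None, -math.inf
        for label, n_docs in self.class_counts.items():
            counts = self.word_counts[label]
            denom = sum(counts.values()) + vocab_size
            logp = math.log(n_docs / total_docs)  # log prior P(label)
            for tok in tokenize(text):
                logp += math.log((counts[tok] + 1) / denom)  # smoothed log P(word | label)
            if logp > best_logp:
                best_label, best_logp = label, logp
        return best_label
```

Fitted on `TRAIN`, `predict("a hotel in france")` returns `"geo"` and `predict("the book on the table")` returns `"non-geo"`, driven by words that occur in only one class. Because a bag of words discards word order and context, a contextual transformer such as XLNet has more information to work with on this kind of disambiguation.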

Query-Annotation Tool

November 01, 2020

One of the main challenges of natural language processing (NLP) is converting unstructured data into a structured format. Structured data can in turn be used to create knowledge graphs, train other machine learning models, etc. A widely used method for this is Named Entity Recognition (NER), which involves identifying and extracting particular entities of interest from text. In our case, we had a corpus of 600 research texts on the different types of coatings applied to steel. Our task was to populate a domain model consisting of predefined entities such as the ingredients used, their quantities, the conditions under which the steel coating process took place, coating type, substrate, etc. We started by creating our own corpus to train an NER model that could identify some of the basic entities, such as occurrences of molecules, processes, conditions, actions, and quantities. Using the trained model, we implemented a combined annotation and search tool in which one could run queries against the database and get focused results.
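
The annotate-then-search workflow can be illustrated without a trained model. In the Python sketch below, a small hand-written gazetteer and a regular expression for quantities stand in for the NER model; the `Entity` class, the entity labels, the gazetteer entries, and the unit list are hypothetical, and a real system would call the trained model inside `annotate` instead.

```python
import re
from dataclasses import dataclass


@dataclass(frozen=True)
class Entity:
    text: str
    label: str
    start: int  # character offset where the entity begins
    end: int    # character offset just past the entity


# Hypothetical gazetteer standing in for the trained NER model.
GAZETTEER = {
    "zinc": "MOLECULE",
    "epoxy": "MOLECULE",
    "electroplating": "PROCESS",
    "hot-dip galvanizing": "PROCESS",
}
# A number followed by a unit, e.g. "450 °C", "0.02 mm" or "30 min".
QUANTITY_RE = re.compile(r"\b\d+(?:\.\d+)?\s?(?:°C|mm|min)(?!\w)")


def annotate(text):
    """Return the entities found in `text`, ordered by position."""
    entities = []
    for term, label in GAZETTEER.items():
        pattern = r"\b" + re.escape(term) + r"\b"
        for m in re.finditer(pattern, text, re.IGNORECASE):
            entities.append(Entity(m.group(), label, m.start(), m.end()))
    for m in QUANTITY_RE.finditer(text):
        entities.append(Entity(m.group(), "QUANTITY", m.start(), m.end()))
    return sorted(entities, key=lambda e: e.start)


def search(corpus, label, term=None):
    """Return ids of documents with an entity of `label`, optionally equal to `term`."""
    hits = []
    for doc_id, text in corpus.items():
        for ent in annotate(text):
            if ent.label == label and (term is None or ent.text.lower() == term.lower()):
                hits.append(doc_id)
                break  # one matching entity is enough for this document
    return hits
```

Because each entity keeps its character offsets, the same output can drive highlighting in an annotation view and filtering in a search view.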

C Language Parser

March 01, 2020

This repository contains the implementation of a lex-based semantic analyser capable of parsing the C language.
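
A lex specification is essentially an ordered list of regular-expression rules, each mapped to a token type. The Python below mimics that idea for a small subset of C (no preprocessor directives or character literals); it illustrates the lexing stage only and is not the repository's lex code. The token names and the `tokenize` function are hypothetical. One difference worth noting: Python's `re` tries alternatives from left to right, so longer operators such as `==` are listed before `=`, whereas lex itself always prefers the longest match.

```python
import re

KEYWORDS = {"int", "float", "char", "void", "if", "else", "while", "for", "return"}

# Each entry mirrors a lex rule. Python's re takes the first alternative that
# matches, so longer patterns are placed before shorter ones they overlap with.
TOKEN_SPEC = [
    ("COMMENT", r"/\*.*?\*/|//[^\n]*"),
    ("NUMBER", r"\d+\.\d+|\d+"),
    ("STRING", r'"(?:\\.|[^"\\])*"'),
    ("ID", r"[A-Za-z_]\w*"),
    ("OP", r"==|!=|<=|>=|&&|\|\||\+\+|--|[-+*/%=<>!&|]"),
    ("PUNCT", r"[;,(){}\[\]]"),
    ("WS", r"\s+"),
    ("MISMATCH", r"."),
]
MASTER_RE = re.compile(
    "|".join(f"(?P<{name}>{pattern})" for name, pattern in TOKEN_SPEC), re.DOTALL
)


def tokenize(code):
    """Split C source into (kind, text) pairs, skipping whitespace and comments."""
    tokens = []
    for m in MASTER_RE.finditer(code):
        kind, text = m.lastgroup, m.group()
        if kind in ("WS", "COMMENT"):
            continue
        if kind == "MISMATCH":
            raise SyntaxError(f"unexpected character {text!r} at offset {m.start()}")
        if kind == "ID" and text in KEYWORDS:
            kind = "KEYWORD"  # reserved words are matched as identifiers first
        tokens.append((kind, text))
    return tokens
```

For example, `tokenize("int x = 42; // answer")` yields `KEYWORD int`, `ID x`, `OP =`, `NUMBER 42`, and `PUNCT ;`, with the comment discarded. A parser (for example one generated by yacc) would then consume this token stream.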

Packet Sniffer

October 01, 2019

The project is a program that captures the data flowing through your router or Ethernet connection. Any kind of data passing over the network can be sniffed.
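
At its core, a sniffer reads raw frames from a network interface and decodes the protocol headers layer by layer. The Python sketch below assumes Linux, where raw frames can be read from an `AF_PACKET` socket with root privileges; it decodes the Ethernet II header and, for IPv4 frames, the TTL, protocol number, and source and destination addresses. The function names are illustrative and this is not the project's code.

```python
import socket
import struct


def format_mac(raw):
    """Render 6 raw bytes as a colon-separated MAC address."""
    return ":".join(f"{b:02x}" for b in raw)


def parse_ethernet(frame):
    """Split an Ethernet II frame into (dst MAC, src MAC, EtherType, payload)."""
    dst, src, ethertype = struct.unpack("!6s6sH", frame[:14])
    return format_mac(dst), format_mac(src), ethertype, frame[14:]


def parse_ipv4(packet):
    """Decode the fixed 20-byte part of an IPv4 header."""
    (version_ihl, _tos, _total_len, _ident, _flags_frag,
     ttl, proto, _checksum, src, dst) = struct.unpack("!BBHHHBBH4s4s", packet[:20])
    header_len = (version_ihl & 0x0F) * 4  # IHL is counted in 32-bit words
    return {
        "version": version_ihl >> 4,
        "ttl": ttl,
        "protocol": proto,  # e.g. 1 = ICMP, 6 = TCP, 17 = UDP
        "src": socket.inet_ntoa(src),
        "dst": socket.inet_ntoa(dst),
        "payload": packet[header_len:],
    }


def sniff(count=10):
    """Print a summary of `count` frames. Linux-only (AF_PACKET) and needs root."""
    eth_p_all = 0x0003  # capture every protocol
    with socket.socket(socket.AF_PACKET, socket.SOCK_RAW, socket.ntohs(eth_p_all)) as s:
        for _ in range(count):
            frame, _addr = s.recvfrom(65535)
            dst_mac, src_mac, ethertype, payload = parse_ethernet(frame)
            if ethertype == 0x0800:  # IPv4
                ip = parse_ipv4(payload)
                print(f"{ip['src']} -> {ip['dst']} proto={ip['protocol']}")
            else:
                print(f"{src_mac} -> {dst_mac} ethertype=0x{ethertype:04x}")
```

Calling `sniff(5)` as root prints one summary line per captured frame. Keeping the parsers separate from the capture loop means they can be tested on hand-built byte strings without network access.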

Abhibha Gupta