Currently working on a research project in AI and Game Theory for Public Safety and Security with the CAIS research group under Prof. Milind Tambe. Core responsibilities include software development for CAIS research projects, managing the Teamcore websites cais.usc.edu and teamcore.usc.edu, and writing code for the publication pages.
October 2017 - Till Date
Worked as part of the Security Technology and Response group at Symantec. Core responsibilities were to increase malware and grayware coverage and to carry out efficacy improvements and research activities.
June 2015 - July 2017
Worked on a project named Fulfillment by Amazon Planning & Automation (FBAPA). Developed the Seller Selection, Forecast Override and backend services for the FBAPA portal, which was used to recommend item quantities to sellers. Authored Hive scripts for normalizing customer demand.
January 2014 - July 2014
A web service with a RESTful API built using Node.js, used to query user data stored in MongoDB. It exposes the following endpoints for storing and retrieving user information: GET /users, GET /users/{username}, POST /users/{username}, PUT /users/{username} and DELETE /users/{username}.
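The endpoint layout above can be sketched as a small request dispatcher. This is a minimal illustration in Python rather than the original Node.js, with an in-memory dictionary standing in for MongoDB; the function name `handle` and the status codes chosen are assumptions, not the original service's behavior.

```python
from typing import Dict, Optional, Tuple

# Hypothetical in-memory store standing in for the MongoDB collection.
users: Dict[str, dict] = {}


def handle(method: str, path: str, body: Optional[dict] = None) -> Tuple[int, object]:
    """Route a request to the matching /users endpoint; return (status, payload)."""
    parts = [p for p in path.split("/") if p]
    if not parts or parts[0] != "users" or len(parts) > 2:
        return 404, {"error": "not found"}
    if len(parts) == 1:  # collection endpoint: GET /users lists usernames
        if method == "GET":
            return 200, sorted(users)
        return 405, {"error": "method not allowed"}
    name = parts[1]
    if method == "GET":
        if name in users:
            return 200, users[name]
        return 404, {"error": "no such user"}
    if method == "POST":  # create a new user
        if name in users:
            return 409, {"error": "user exists"}
        users[name] = {**(body or {}), "username": name}
        return 201, users[name]
    if method == "PUT":  # update an existing user (HTTP has no UPDATE method)
        if name not in users:
            return 404, {"error": "no such user"}
        users[name].update(body or {})
        users[name]["username"] = name  # keep the key and stored name consistent
        return 200, users[name]
    if method == "DELETE":
        if users.pop(name, None) is None:
            return 404, {"error": "no such user"}
        return 204, None
    return 405, {"error": "method not allowed"}
```

For example, starting from an empty store, `handle("POST", "/users/alice", {"email": "a@example.com"})` returns status 201, after which `handle("GET", "/users")` returns `(200, ["alice"])`.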
The code is an implementation of a Hidden Markov Model part-of-speech tagger for English, Hindi and Chinese. The training data is tokenized and tagged; the test data is also tokenized, and the tagger adds the tags to it. Add-one (Laplace) smoothing is used for unseen words. The tagger achieves 88% and 85% tagging accuracy on English and Chinese text, respectively.
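The tagging step can be sketched with Viterbi decoding over bigram tag transitions, with add-one smoothing applied to both transition and emission counts. This is a minimal illustration, not the original code: the `<s>` start symbol, the function names, and smoothing the transitions as well as the emissions are assumptions.

```python
import math
from collections import Counter, defaultdict


def train(tagged_sentences):
    """Count tag transitions (from a <s> start state) and word emissions."""
    trans = defaultdict(Counter)  # trans[prev_tag][tag]
    emit = defaultdict(Counter)   # emit[tag][word]
    vocab = set()
    for sentence in tagged_sentences:
        prev = "<s>"
        for word, tag in sentence:
            trans[prev][tag] += 1
            emit[tag][word] += 1
            vocab.add(word)
            prev = tag
    return trans, emit, vocab


def viterbi(words, trans, emit, vocab):
    """Return the most probable tag sequence under add-one-smoothed estimates."""
    tags = list(emit)
    v = len(vocab) + 1  # +1 reserves probability mass for unseen words

    def log_t(prev, tag):
        row = trans[prev]
        return math.log((row[tag] + 1) / (sum(row.values()) + len(tags)))

    def log_e(tag, word):
        row = emit[tag]
        return math.log((row[word] + 1) / (sum(row.values()) + v))

    # best[i][tag] = (log prob of best path ending in tag at position i, backpointer)
    best = [{t: (log_t("<s>", t) + log_e(t, words[0]), None) for t in tags}]
    for i in range(1, len(words)):
        column = {}
        for t in tags:
            prev, score = max(((p, best[i - 1][p][0] + log_t(p, t)) for p in tags),
                              key=lambda pair: pair[1])
            column[t] = (score + log_e(t, words[i]), prev)
        best.append(column)
    # Follow backpointers from the highest-scoring final tag.
    tag = max(best[-1], key=lambda t: best[-1][t][0])
    path = [tag]
    for i in range(len(words) - 1, 0, -1):
        tag = best[i][tag][1]
        path.append(tag)
    return path[::-1]
```

Working in log space avoids numeric underflow on long sentences, and the `+ 1` in the emission denominator keeps a nonzero probability for words never seen in training.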
The code is an implementation of perceptron classifiers (vanilla and averaged) that label hotel reviews as either true or fake, and as either positive or negative, using word tokens from the text as features. On the test data, the vanilla perceptron achieves an F1 score of 88.59% and the averaged perceptron 88.90%.
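Both perceptron variants can be sketched for one binary task (e.g. positive vs. negative; the true vs. fake task would be a second, separately trained classifier). The averaged weights use a standard trick of accumulating counter-weighted updates and subtracting their scaled sum at the end, so intermediate weight vectors never have to be stored. Lowercased whitespace tokens, +1/-1 labels, and a fixed example order are assumptions of this sketch; the original feature extraction is not described.

```python
from collections import Counter, defaultdict


def features(text):
    """Bag-of-words features: lowercased whitespace tokens with counts."""
    return Counter(text.lower().split())


def train_perceptron(examples, epochs=10):
    """Train on (text, label) pairs with labels +1/-1.

    Returns (vanilla_weights, vanilla_bias, averaged_weights, averaged_bias).
    """
    w, b = defaultdict(float), 0.0
    u, beta = defaultdict(float), 0.0  # updates weighted by the example counter c
    c = 1
    for _ in range(epochs):
        for text, y in examples:
            x = features(text)
            activation = sum(w[f] * val for f, val in x.items()) + b
            if y * activation <= 0:  # mistake (or zero margin): update
                for f, val in x.items():
                    w[f] += y * val
                    u[f] += y * c * val
                b += y
                beta += y * c
            c += 1
    avg_w = {f: w[f] - u[f] / c for f in w}
    avg_b = b - beta / c
    return dict(w), b, avg_w, avg_b


def predict(weights, bias, text):
    """Return +1 or -1 for a review using the given weights."""
    x = features(text)
    score = sum(weights.get(f, 0.0) * val for f, val in x.items()) + bias
    return 1 if score > 0 else -1
```

Averaging damps the influence of late updates on a few noisy examples, which is one common explanation for the averaged model scoring slightly higher.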
Deploying a loadpoint entry is an integral part of installation for every malicious payload: it enables the payload to launch and execute every time the system boots. However, loadpoint entries are not used as standalone detection entities. Instead, anti-virus software cleans them up only if the associated files are detected, either in a static scan or based on their behaviour. By identifying unique loadpoint trigrams in internal telemetry collected over a predefined period, and studying their associations with Ground Truth Good and Bad files, low-confidence Good and Bad files, and Unknown files, while also taking their prevalence and age into account, we were able to successfully validate the idea. Even in its most restricted form, depending on the confidence of the trigram's disposition, the technology is used to block an attack, prompt the user, or silently submit files and associated telemetry for backend validation. This research was published at the Virus Bulletin Conference 2017.
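The disposition logic described above can be sketched as aggregating, for each loadpoint trigram, how often it co-occurs with good, bad, and unknown files, then mapping its bad-file ratio to block, prompt, or submit. Everything concrete here is hypothetical: the trigram's three components are not specified in the text, low-confidence files are folded into the good and bad counts, and the prevalence, age, and ratio thresholds are illustrative values, not the published ones.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class TrigramStats:
    good: int = 0
    bad: int = 0
    unknown: int = 0
    first_seen: int = 10**9  # day index of the earliest sighting

    @property
    def prevalence(self) -> int:
        return self.good + self.bad + self.unknown


def aggregate(observations):
    """observations: iterable of (trigram, label, day), label in good/bad/unknown."""
    stats = defaultdict(TrigramStats)
    for trigram, label, day in observations:
        s = stats[trigram]
        setattr(s, label, getattr(s, label) + 1)
        s.first_seen = min(s.first_seen, day)
    return stats


def action(s, today, min_prevalence=5, min_age_days=7):
    """Map a trigram's association with bad files to a response (hypothetical thresholds)."""
    known = s.good + s.bad
    if s.prevalence < min_prevalence or today - s.first_seen < min_age_days or known == 0:
        return "none"  # too rare, too new, or no labeled evidence yet
    bad_ratio = s.bad / known
    if bad_ratio >= 0.95:
        return "block"
    if bad_ratio >= 0.75:
        return "prompt"
    if bad_ratio >= 0.5:
        return "submit"  # silently send files and telemetry for backend validation
    return "none"
```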
TIRAMOLA is a cloud-enabled, open-source framework that automatically resizes NoSQL clusters according to user-defined policies. Decisions on adding or removing worker VMs are modeled as a Markov Decision Process and taken in real time. The system decides on the most advantageous cluster size according to the user-defined policies, then requests or releases VM resources from the provider and orchestrates them inside the NoSQL cluster. As part of this project, deployed a private cloud on a home network using Eucalyptus 3.5 on CentOS 6.0, installed a multi-node Hadoop and HBase cluster on it, and set up Ganglia for monitoring. The TPC-C benchmark was used for all transactions, and NeoLoad was used for load generation and distribution. Implemented a Markov Decision Process to decide at run time whether to add a VM, remove a VM, or make no change, depending on the load. Technologies used are Java, cloud services, Hadoop, HBase and machine learning algorithms.
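The scaling decision can be sketched as value iteration over a small MDP whose state is (cluster size, load level) and whose actions add, remove, or keep a VM. This is not TIRAMOLA's model: the load levels, transition probabilities, per-VM capacity and cost, and discount factor below are all hypothetical, chosen only to show the mechanics.

```python
# All numbers below are hypothetical, chosen only to illustrate the mechanics.
MIN_VMS, MAX_VMS = 1, 6
ACTIONS = {"remove": -1, "no_op": 0, "add": 1}
DEMAND = {"low": 2.0, "high": 5.0}  # work units offered per step at each load level
LOAD_NEXT = {"low": {"low": 0.8, "high": 0.2},  # P(next load | current load)
             "high": {"low": 0.3, "high": 0.7}}
CAPACITY_PER_VM, COST_PER_VM, GAMMA = 1.0, 0.4, 0.9


def reward(n: int, load: str) -> float:
    """Work actually served by n VMs minus what the VMs cost."""
    return min(DEMAND[load], CAPACITY_PER_VM * n) - COST_PER_VM * n


def q_value(n, load, delta, V):
    """Expected return of resizing to n + delta (clamped) before the next load arrives."""
    n_next = max(MIN_VMS, min(MAX_VMS, n + delta))
    return sum(p * (reward(n_next, nxt) + GAMMA * V[(n_next, nxt)])
               for nxt, p in LOAD_NEXT[load].items())


def value_iteration(tol: float = 1e-6):
    """Return (state values, greedy policy) for states (cluster size, load level)."""
    states = [(n, load) for n in range(MIN_VMS, MAX_VMS + 1) for load in DEMAND]
    V = {s: 0.0 for s in states}
    while True:
        new_V = {(n, load): max(q_value(n, load, d, V) for d in ACTIONS.values())
                 for n, load in states}
        change = max(abs(new_V[s] - V[s]) for s in states)
        V = new_V
        if change < tol:
            break
    policy = {(n, load): max(ACTIONS, key=lambda a: q_value(n, load, ACTIONS[a], V))
              for n, load in states}
    return V, policy
```

With these numbers, the returned policy chooses `"remove"` for `(6, "low")` and `"add"` for `(1, "high")`: at low load idle VMs cost more than they serve, while at high load each extra VM pays for itself.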
Resolution is a method of inference leading to a refutation theorem-proving technique for sentences in propositional logic and first-order logic (FOL). It works on statements in Conjunctive Normal Form (CNF). Iteratively applying the resolution rule, together with unification in the first-order case, can establish that a CNF statement is unsatisfiable by deriving the empty clause. Resolution is sound, and it is refutation-complete: every sentence can be converted into an equisatisfiable set of CNF clauses, and if that set is unsatisfiable, resolution will eventually derive the empty clause. For FOL, however, the procedure is only semi-decidable: attempting to prove a satisfiable FOL statement unsatisfiable may result in a nonterminating computation. Implemented an inference engine for resolution, a proof by contradiction over CNF, in pure Python.
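The refutation procedure can be sketched for the propositional case: add the negated query to the knowledge base and saturate the clause set with the resolution rule until the empty clause appears (entailment) or no new clauses can be produced. First-order unification is omitted, and the string encoding of literals (`~p` for not p) is an assumption of this sketch.

```python
from itertools import combinations


def negate(lit: str) -> str:
    """Complement a literal: p <-> ~p."""
    return lit[1:] if lit.startswith("~") else "~" + lit


def resolve(c1: frozenset, c2: frozenset) -> list:
    """All resolvents of two clauses, one per complementary literal pair."""
    return [(c1 - {lit}) | (c2 - {negate(lit)}) for lit in c1 if negate(lit) in c2]


def unsatisfiable(clauses) -> bool:
    """Saturate the clause set; True iff the empty clause is derived."""
    clauses = set(map(frozenset, clauses))
    while True:
        new = set()
        for c1, c2 in combinations(clauses, 2):
            for r in resolve(c1, c2):
                if not r:  # empty clause: contradiction found
                    return True
                new.add(r)
        if new <= clauses:  # nothing new can be derived
            return False
        clauses |= new


def entails(kb, query_literal: str) -> bool:
    """KB |= query iff KB together with the negated query is unsatisfiable."""
    return unsatisfiable(list(kb) + [{negate(query_literal)}])
```

For example, with `kb = [{"~rain", "wet"}, {"rain"}]`, `entails(kb, "wet")` is `True` and `entails(kb, "~wet")` is `False`. Because a propositional clause set over finitely many literals is finite, the saturation loop always terminates here, unlike the first-order case.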
This project is an AI game for testing how humans respond to deception. In the scenario, the user plays an insider attacker at a company who must access different computers to earn points. Before accessing each computer, the user can check it for a "Monitored" alert. The alert states that a security analyst is monitoring who accesses the computer, but it is not always true: analysts may send the message even when they are not actually monitoring the computer. This deception is based on optimal probabilities computed by a machine learning algorithm developed by the CAIS group. Technologies used are Machine Learning, Data Mining, JavaScript, HTML, CSS and Java.
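One way to see why an unreliable alert still carries information is a simple Bayesian calculation. This is only an illustration, not the CAIS algorithm (which the text does not describe): it assumes the alert is always shown on monitored machines and is shown on unmonitored machines with some deception probability `q`; both `m` and `q` are hypothetical parameters.

```python
def p_monitored_given_alert(m: float, q: float) -> float:
    """Attacker's posterior that a machine is monitored after seeing the alert.

    m: hypothetical prior probability that the analyst monitors the machine.
    q: hypothetical probability of a fake alert on an unmonitored machine.
    Assumes a genuine alert is always shown when the machine is monitored.
    """
    p_alert = m + (1 - m) * q  # total probability of seeing the alert
    return m / p_alert if p_alert > 0 else 0.0
```

With `m = 0.2`, an honest defender (`q = 0`) makes the alert a certain signal (posterior 1.0), while `q = 0.5` lowers the posterior to 1/3. The defender can therefore show alerts on more machines than it actually monitors, at the cost of each alert being less credible.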
Developed a weather-forecasting model in Pig. Using 50 years of weather data for Pune, India, implemented a big data model that forecast the weather for the following days with 99.6% accuracy. Technologies used are Big Data, Hadoop and Pig.
Designed and modeled a new detection technology at Symantec that heuristically detects malicious files belonging to non-Ground Truth, non-prevalent bands, using files already seen by current detections and data from telemetry. Technologies used are Python, SQL, Malware Analysis and Debugging.
Implemented a time-constrained AI game using the minimax algorithm with alpha-beta pruning in pure Python. Alpha-beta pruning is a search algorithm that seeks to decrease the number of nodes evaluated by the minimax algorithm in its search tree. It is an adversarial search algorithm commonly used for machine play of two-player games. It stops evaluating a move as soon as at least one possibility has been found that proves the move to be worse than a previously examined move.
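The pruning rule can be sketched as a generic recursive function. The game itself is abstracted behind two callables, `children` and `evaluate`, which are hypothetical names for the move generator and the static evaluation. The time constraint is not modeled here; a common approach is iterative deepening, re-running the search at increasing depths until the time budget runs out.

```python
import math


def alphabeta(node, depth, alpha, beta, maximizing, children, evaluate):
    """Minimax value of node with alpha-beta pruning.

    children(node) -> list of successor nodes; evaluate(node) -> static score.
    alpha: best score MAX can already guarantee; beta: best score MIN can guarantee.
    """
    kids = children(node)
    if depth == 0 or not kids:
        return evaluate(node)
    if maximizing:
        value = -math.inf
        for child in kids:
            value = max(value, alphabeta(child, depth - 1, alpha, beta, False,
                                         children, evaluate))
            alpha = max(alpha, value)
            if alpha >= beta:
                break  # MIN will never allow play to reach this node
        return value
    value = math.inf
    for child in kids:
        value = min(value, alphabeta(child, depth - 1, alpha, beta, True,
                                     children, evaluate))
        beta = min(beta, value)
        if alpha >= beta:
            break  # MAX already has a better option elsewhere
    return value
```

On the tree `[[3, 5], [6, 9], [1, 2], [0, -1]]` (the root is a MAX node, the inner lists are MIN nodes, integers are leaves), a depth-2 search returns 6 and never evaluates the leaves 2 and -1: after seeing 1 and 0, those MIN nodes are already known to be worse for MAX than the 6 found earlier.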
The profile-based system for String Analysis and Categorization classifies a string as malicious or clean based on string signatures and emulator dumps. Created a training data set from pre-existing malicious and clean files by extracting strings and normalizing them with a pre-defined set of rules and values to create string and segment profiles. When tested on unseen clean and malicious files, the system classified files accurately, with a 5% false-positive (FP) rate. The system was then used to automatically author string signatures. Technologies used are Python, MSSQL, Malware Analysis and Debugging.
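The normalize-then-profile idea can be sketched as follows. The normalization rules (lowercasing, then collapsing user directories, long hex strings, and digit runs into placeholder tokens), the scoring rule, and the 0.5 threshold are all hypothetical; the original rule set, segment profiles, and emulator-dump handling are not described here.

```python
import re
from collections import Counter

# Hypothetical normalization rules, applied in order after lowercasing.
RULES = [
    (re.compile(r"[a-z]:\\users\\[^\\]+"), "<USERDIR>"),  # c:\users\<name>
    (re.compile(r"\b[0-9a-f]{8,}\b"), "<HEX>"),           # long hex blobs
    (re.compile(r"\d+"), "<NUM>"),                        # digit runs
]


def normalize(s: str) -> str:
    """Map variable parts of a string to placeholder tokens."""
    s = s.lower()
    for pattern, token in RULES:
        s = pattern.sub(token, s)
    return s


def build_profile(files):
    """files: iterable of string lists; count how many files contain each normalized string."""
    profile = Counter()
    for strings in files:
        profile.update({normalize(s) for s in strings})
    return profile


def classify(strings, bad_profile, good_profile, threshold=0.5):
    """Label a file malicious if enough of its normalized strings occur only in bad files."""
    norm = {normalize(s) for s in strings}
    if not norm:
        return "clean"
    bad_only = sum(1 for s in norm if bad_profile[s] and not good_profile[s])
    return "malicious" if bad_only / len(norm) >= threshold else "clean"
```

Normalizing before profiling lets strings that differ only in usernames, addresses, or counters (for example two different `C:\Users\...` paths to the same dropped file) collapse onto one profile entry.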
Fulfillment by Amazon Planning & Automation (FBAPA) is a program designed to recommend the quantities of items to be supplied by sellers. The major focus of this project was to automate procurement for the Indian marketplace. Various services were developed to arrive at appropriate recommended quantities for each seller. As part of this project, I developed the Seller Selection service, which selects sellers for a particular item based on the sellers' previous data. I also developed the Forecast Override service for India, which took the actual forecast from ISMs. This involved reading the forecast from Excel and putting it into an S3 bucket, calculating the custom forecast using Hive scripts, and returning the forecast for further calculations. I also integrated with the Space Robot team and developed the backend service for the FBAPA portal. Technologies used are Java, Spring, Hadoop, Hive, and AWS services such as DynamoDB and S3.
The code is an implementation of a variant of the traditional N-Queens problem that adds obstacles, using three algorithms: Depth First Search, Breadth First Search and Simulated Annealing. If an obstacle lies between two queens in the same row, the queens do not attack each other, so queens can be placed on either side of it. The code is a pure Python implementation of basic search algorithms in Artificial Intelligence.
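The depth-first variant can be sketched as backtracking over the cells in row-major order. This sketch assumes an obstacle (`X`) blocks attacks along any row, column, or diagonal, a generalization of the row case described above; the board encoding (`.` empty, `Q` queen) and the function names are illustrative, and the BFS and simulated-annealing variants are not shown.

```python
DIRECTIONS = [(-1, -1), (-1, 0), (-1, 1), (0, -1), (0, 1), (1, -1), (1, 0), (1, 1)]


def is_safe(board, r, c):
    """Safe if, in every direction, an obstacle or the edge is hit before any queen."""
    n = len(board)
    for dr, dc in DIRECTIONS:
        i, j = r + dr, c + dc
        while 0 <= i < n and 0 <= j < n:
            if board[i][j] == "X":  # obstacle blocks this line of attack
                break
            if board[i][j] == "Q":
                return False
            i, j = i + dr, j + dc
    return True


def place_queens(board, k, start=0):
    """Depth-first search: place k more queens on empty cells at or after index start.

    Mutates board in place; returns True if a full placement was found.
    """
    if k == 0:
        return True
    n = len(board)
    for idx in range(start, n * n):
        r, c = divmod(idx, n)
        if board[r][c] == "." and is_safe(board, r, c):
            board[r][c] = "Q"
            if place_queens(board, k - 1, idx + 1):
                return True
            board[r][c] = "."  # backtrack
    return False
```

Because obstacles allow more than one queen per row, the search walks cells rather than rows. On a 3x3 board with an obstacle at the top middle (`.X.`, `...`, `...`), `place_queens(board, 3)` succeeds with queens at (0,0), (0,2) and (2,1), which is impossible on an empty 3x3 board.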
Submitted for Patent
April 2017
Submitted for Patent
July 2016
Received 4 “Passion and Focus” awards for work on multiple projects at Symantec over a span of 1.5 years
Ranked 1st in the city and 173rd in the state in the International Mathematics Olympiad conducted by Science Olympiad Foundation
Volunteered for various community services at Symantec, such as donating books and arranging school kits for NGOs
© 2018 Prachi Jhanwar