Updated Jan 11 2024
Data Engineer Intern @ New Engen 💿 | 2x GSoC @ MetaBrainz ☀️ | Open Source 🧑‍💻 | Python Dev 🐍 | IIT Madras 🧠
📧 Email | 👔 LinkedIn | 🐦 Twitter | 💻 GitHub
💫 About Me
Hi, I am Prathamesh, an aspiring Data Engineer & Analyst based in Pune with a sheer obsession for data, music, computers, and open-source software.
Back in 4th grade, I started tinkering with computers and developed a deep passion for technology, which became a highlight of most of my teen years. Six years ago, I developed a similar passion for making and consuming insane amounts of music. With 3+ years of experience in music production, piano, and audio engineering under my artist alias “SNÆK” and a lifelong love for computers, my passion for music and technology has now convolved into a passion for Data and AI involving various audio technologies!
I have 5+ months of experience as a Google Summer of Code ’22 contributor @MetaBrainz, where I optimized, transformed, and cleaned 600+ GB of data, created various dashboards, ran benchmarks, and generated reports, aiding in the successful cleanup of the massive Music Listening Histories Dataset! I also love teaching and have worked as a Machine Learning Teaching Assistant at GHRCEM, Pune, instructing 70+ sophomores in the practical applications of ML.
As a 3rd-year undergrad, I am currently pursuing a BS in Data Science and Applications at IIT Madras and a BTech in AI at GHRCEM Pune, where my friends know me for my enthusiasm and weird sense of humor, and my teachers know me for my perseverance and reliability.
🎯 Skills:
- Domain: Data Engineering, Data Analytics, Web Scraping, Python Development, API Scraping, Machine Learning, Web Automation.
- Languages and Tools: Python, SQL, Linux, Git, Docker, Shell Scripting, MS Excel.
- Frameworks: PostgreSQL, BigQuery, GitHub, Flask, BeautifulSoup, Pandas, NumPy, scikit-learn, Apache Arrow, Numba, Multiprocessing, Mechanize.
- Dashboarding & Visualization: Tableau, Matplotlib, Plotly, Seaborn, Hugo, HTML, CSS.
- Cloud: Microsoft Azure - App Service, Linux VM Compute, ML Workspace | GCP - BigQuery
- Soft Skills: Leadership, Team Management, Creative Writing, Good Sense of Humor.
- Audio/Music Production & Mixing: FL Studio, Pro Tools, Audacity.
- Design/Editing: Adobe Photoshop, DaVinci Resolve, Canva.
- Social Media/Marketing: 3+ years of handling business social media (LinkedIn, Twitter, Instagram, GitHub, etc.) using toneden.io & later.com.
- Languages: English, Marathi, Hindi (Proficient) | Japanese (Elementary), Russian (Elementary).
- Interests: Technology, Music Production, Playing Piano, Reading, Cats.
🔬 Work Experience
Data Engineer Intern | New Engen Inc. [Dec '23 - Current]
- Working with New Engen Inc. to build data pipelines and transformations that handle massive amounts of marketing data for clients like Jockey, Google Fiber, Home Depot, and more!
- Tech Stack: Python, dbt, Google Cloud Platform, BigQuery, Airflow, Adverity, Salesforce Intelligence (Datorama), Git, etc.
Contributor (Data Engineer & Analyst) | Google Summer of Code 2023 @ MetaBrainz [May '23 - Nov '23]
- Conducted Big Data Analytics using SPARQL, SQL, and Pandas (Python) to explore the highest-quality and highest-volume data sources from Wikidata & GeoNames.
- Built infrastructure to automate the extraction, transformation, and loading of mission-critical data, eliminating 90% of manual data feeding processes for areas in MusicBrainz.
- Architected and executed an end-to-end data pipeline leveraging Python, PostgreSQL, and Shell Scripting to seamlessly synchronize metadata for 500k+ entities between Wikidata and MusicBrainz.
- Established CI/CD pipelines, deployed services through Docker, devised comprehensive test suites with Pytest, and meticulously managed documentation.
- Tech Stack: Python, SQL (PostgreSQL), SPARQL, Pandas, Git, Docker, Shell Scripting.
- Domain: Data Engineering, Data Analytics.
- Project Summary: https://blog.metabrainz.org/?p=11035
Contributor (Data Engineer & Analyst) | Google Summer of Code 2022 @ MetaBrainz [May '22 - Oct '22]
- Executed various Data Engineering functions, employing high-performance Python and SQL scripts with PostgreSQL, Pandas, Apache Arrow, and Numba to optimize & transform the Music Listening Histories Dataset.
- Led the overhaul of 611.39 GB (27 billion rows) of music streaming data, originating from 583k+ last.fm users.
- Significantly enhanced Data-Lake efficiency by reducing storage size by 53% and improving read/write speeds by 9%.
- Streamlined Data Analytics and Visualization, created Dashboards, conducted Benchmarking, and handled Report Generation for the “ListenBrainz” project in collaboration with various teams at the MetaBrainz Foundation.
- Tech Stack: Python, SQL (PostgreSQL), Pandas, Apache Arrow, Matplotlib, Git, Shell Scripting.
- Domain: Data Engineering, Data Analytics.
- Project Summary: https://blog.metabrainz.org/?p=9785
Teaching Assistant (Machine Learning) | Dept. of Artificial Intelligence, GHRCEM Pune [Mar '22 - May '23]
- Conducted hands-on Machine Learning, Data Processing, and Data Visualization sessions for 70+ sophomore students at the Dept. of AI (GHRCEM).
- Introduced students to Machine Learning concepts like Linear Regression, Naive Bayes (incl. Text Classification), KNN, and Support Vector Machines.
- Tech Stack: Python, scikit-learn, Pandas, NumPy, Matplotlib, and Seaborn.
- Domain: Machine Learning.
🏫 Education:
BS Data Science and Applications | Indian Institute of Technology, Madras [2021 - 2025]
- An off-campus 4-year degree program in Applied Data Science.
- SGPA: 8.24
BTech Artificial Intelligence | G.H. Raisoni College of Engineering & Management, Pune [2020 - 2024]
- An on-campus undergraduate 4-year degree program in Artificial Intelligence & Computer Science.
- SGPA: 8.78 [Feb 2021 - Current]
🏗️ Projects:
lastfm-scraper | 🔗 Project Demo | 🔗 Codebase
- Last.fm offers services that allow users to track their music listening histories across multiple streaming services. However, I realized there was no way to download my own listening history for my own use! Inspired by Spotify Wrapped, I was hell-bent on utilizing this data for exciting analytics and machine learning applications. This is why I developed my own scraper to scrape my listening history by leveraging the last.fm API. The goal of this project is to scrape, process & deliver user data into accessible formats like CSV and JSON.
- Tech Stack: Python, Flask, Pandas, Git, MS Azure, HTML/CSS/JS, REST APIs, GitHub Actions.
- Skills used: Data Wrangling, Data Cleaning, Cloud, CI/CD.
Document Topic Modelling | 🔗 Codebase
- A simple interactive command-line utility to classify text into pre-defined topics using Machine Learning (NLP). This project is based on the LDA (Latent Dirichlet Allocation) model, and built using Python, Scikit-learn, and Gensim.
- Tech Stack: Python, Git, Scikit-learn, Gensim, Rich.
- Skills used: Machine Learning, Natural Language Processing.
Portfolio Site | 🔗 Project Demo | 🔗 Codebase
- This portfolio site was custom-built with clean looks and minimalism kept in mind. I used Hugo, a static site generator, to write the site contents in Markdown for better, distraction-free maintenance. Even the site rendering and hosting are automated with a simple CI/CD pipeline built using GitHub Actions. The site is finally hosted on GitHub Pages.
- Tech Stack: Hugo, Git, GitHub Actions.
- Skills Used: Web Development, CI/CD.
Monthly Budget Tableau Dashboard | 🔗 Project Demo
- A simple and elegant Tableau dashboard that visualizes my monthly financial spending habits. For this project, I fetched data from my personal spreadsheet-based budget tracking system hosted on Notion.
- Tech Stack: Tableau, MS Excel.
- Skills used: Data Visualization, Dashboarding.
🏆 Achievements
- Amongst 967 globally selected candidates out of 43,765 applicants in GSoC 2023.
- 2x Speaker at IIT Madras Student Placement Council about Open Source and GSoC, engaging and educating 3k+ students about OSS.
- Elected as President of the Student’s Association of Artificial Intelligence, GHRCEM Pune.
- Elected as Vice-President of the IEEE Student’s Chapter, GHRCEM Pune.
- Wrote a GSoC guide blog with 35k+ LinkedIn impressions and 3.7k+ views.
- Organized multiple college events with 200+ attendees each, achieving an average event rating of 4.59/5.00.
- Represented the “Hadar Cluster” (South East Asia) at IEEE Asia Pacific’s CLAP (2021) program.
🤝 Volunteering
President | Student’s Association of Artificial Intelligence, GHRCEM, Pune [Nov '21 - Mar 2022]
- Operated Human Resources, Planning, and Execution for all events at the Department of AI, GHRCEM, Pune.
- Hosted events and workshops like “Tech Talks 1.0: Biostatistics w/ Mr. Shariq Mohammed, Boston University”, and “YOU 2.0: The complete personality upliftment program” with 200+ attendees and 4.59/5.00 average event ratings.
Co-Chair | IEEE Student’s Chapter, GHRCEM Pune [Mar '21 - Feb 2023]
- Hosted several flagship events at IEEE Pune Section, like IEEE CODE-STROM [2022] and the EAC-funded Cloud and Data Engineering Workshop [2022].
- Represented the “Hadar Cluster” (South East Asia) at IEEE Asia Pacific’s CLAP [2021] program.
Speaker and Project Lead | Ek Bharat Shrestha Bharat Club, GHRCEM, Pune [Sep '21 - Nov 2021]
- Designed and delivered 5+ inter-state presentations to Aryan Institute of Technology, Bhubaneshwar, Odisha, while representing GH Raisoni College of Engineering and Management, Pune, Maharashtra.
Vice President | Music Club, GH Raisoni College of Engineering & Management, Pune [Aug '21 - Nov '21]
- Operated Human Resources, Planning, and execution for 6+ introductory and jamming sessions.
⌨️ Blogs:
📜 Certificates:
- 🔗 Supervised Machine Learning & Classification in Python | Coursera [Jul 2022]
- 🔗 Introduction to Data Science in Python | Coursera [Sep 2021]
- 🔗 Data Collection and Processing with Python | Coursera [Feb 2022]
- 🔗 Applied Plotting, Charting & Data Representation in Python | Coursera [Oct 2021]
- 🔗 Applied Text Mining in Python | Coursera [Jan 2022]
- 🔗 Regular Expressions in Python | Coursera [Dec 2021]
- 🔗 Data Manipulation with Pandas | Kaggle [Sep 2021]
- 🔗 Data Cleaning with Python | Kaggle [Oct 2021]
- 🔗 Git from Basics to Advanced: Practical Guide for Developers | Udemy [Jul 2021]
- 🔗 Python Basics | Coursera [Feb 2022]
- 🔗 Python Functions, Files, and Dictionaries | Coursera [Jan 2022]
- 🔗 Python Classes and Inheritance | Coursera [Feb 2022]
- 🔗 Working with BigQuery | Coursera [Nov 2021]
🎹 Hobbies
- In my free time, I like playing Piano 🎹, and Producing Music 🎧 under my artist alias SNÆK.
- I also LOVE listening to music whenever I can! Check out my streams here: last.fm - snaekboi
- Been trying to get into books as well! My favorites are “The Subtle Art of Not Giving a F*ck” by Mark Manson, and “Tokyo Ghoul” by Sui Ishida.