Updated Jan 11 2024

Data Engineer Intern @ New Engen 💿 | 2x GSoC @ MetaBrainz ☀️ | Open Source 🧑‍💻 | Python Dev 🐍 | IIT Madras 🧠

📧 Email | 👔 LinkedIn | 🐦 Twitter | 💻 GitHub


💫 About Me

Hi, I am Prathamesh, an aspiring Data Engineer & Analyst based in Pune with a sheer obsession with data, music, computers, and open-source software.

Back in 4th grade, I started tinkering with computers and developed a deep passion for technology, which became a highlight of most of my teen years. Six years ago, I developed a similar passion for making & consuming insane amounts of music. With 3+ years of experience in Music Production, playing Piano, & Audio Engineering under my artist alias “SNÆK” and a lifelong love for computers, my passion for music and technology has now converged into a passion for Data and AI involving various audio technologies!

I have 5+ months of experience as a Google Summer of Code ’22 contributor @MetaBrainz, where I optimized, transformed, and cleaned 600+ GB of data, created various dashboards, ran benchmarks, and generated reports, aiding in the successful cleanup of the massive Music Listening Histories Dataset! I also love teaching and have worked as a Machine Learning Teaching Assistant at GHRCEM, Pune, instructing 70+ sophomores on the practical applications of ML.

As a 3rd-year undergrad, I am currently pursuing a BS in Data Science and Applications at IIT Madras and a BTech in AI at GHRCEM Pune, where my friends know me for my enthusiasm & weird sense of humor, and my teachers know me for my perseverance and reliability.


🎯 Skills

  • Domain: Data Engineering, Data Analytics, Web Scraping, Python Development, API Scraping, Machine Learning, Web Automation.
  • Languages and Tools: Python, SQL, Linux, Git, Docker, Shell Scripting, MS Excel.
  • Frameworks, Libraries & Platforms: PostgreSQL, BigQuery, GitHub, Flask, BeautifulSoup, Pandas, NumPy, scikit-learn, Apache Arrow, Numba, Multiprocessing, Mechanize.
  • Dashboarding & Visualization: Tableau, Matplotlib, Plotly, Seaborn, Hugo, HTML, CSS.
  • Cloud: Microsoft Azure - App Service, Linux VM Compute, ML Workspace | GCP - BigQuery
  • Soft Skills: Leadership, Team Management, Creative Writing, Good Sense of Humor.
  • Audio/Music Production & Mixing: FL Studio, Pro Tools, Audacity.
  • Design/Editing: Adobe Photoshop, DaVinci Resolve, Canva.
  • Social Media/Marketing: 3+ years of handling business social media on LinkedIn, Twitter, Instagram, GitHub, etc., using toneden.io & later.com.
  • Languages: English, Marathi, Hindi (Proficient) | Japanese (Elementary), Russian (Elementary).
  • Interests: Technology, Music Production, Playing Piano, Reading, Cats.

🔬 Work Experience

Data Engineer Intern | New Engen Inc. [Dec '23 - Current]

  • Working with New Engen Inc. to build data pipelines and transformations that handle massive amounts of marketing data for clients like Jockey, Google Fiber, Home Depot, and more!
  • Tech Stack: Python, dbt, Google Cloud Platform, BigQuery, Airflow, Adverity, Salesforce Intelligence (Datorama), Git, etc.

Contributor (Data Engineer & Analyst) | Google Summer of Code 2023 @ MetaBrainz [May '23 - Nov '23]

  • Conducted Big Data Analytics using SPARQL, SQL, and Pandas (Python) to identify the highest-quality, highest-coverage data sources in Wikidata & GeoNames.
  • Orchestrated infrastructure to automate the extraction, transformation, and loading (ETL) of mission-critical data, eliminating 90% of manual data-feeding processes for areas in MusicBrainz.
  • Architected and executed an end-to-end data pipeline leveraging Python, PostgreSQL, and Shell Scripting to seamlessly synchronize metadata for 500k+ entities between Wikidata and MusicBrainz.
  • Established CI/CD pipelines, deployed services through Docker, devised comprehensive test suites with Pytest, and meticulously managed documentation.
  • Tech Stack: Python, SQL (PostgreSQL), SPARQL, Pandas, Git, Docker, Shell Scripting.
  • Domain: Data Engineering, Data Analytics.
  • Project Summary: https://blog.metabrainz.org/?p=11035

Contributor (Data Engineer & Analyst) | Google Summer of Code 2022 @ MetaBrainz [May '22 - Oct '22]

  • Executed various Data Engineering functions, employing high-performance Python and SQL scripts with PostgreSQL, Pandas, Apache Arrow, and Numba to optimize & transform the Music Listening Histories Dataset.
  • Led the overhaul of 611.39 GB (27 billion rows) of music streaming data, originating from 583k+ last.fm users.
  • Significantly improved data lake efficiency, reducing storage size by 53% and improving read/write speeds by 9%.
  • Streamlined Data Analytics and Visualization, created Dashboards, conducted Benchmarking, and handled Report Generation for the “ListenBrainz” project in collaboration with various teams at the MetaBrainz Foundation.
  • Tech Stack: Python, SQL (PostgreSQL), Pandas, Apache Arrow, Matplotlib, Git, Shell Scripting.
  • Domain: Data Engineering, Data Analytics.
  • Project Summary: https://blog.metabrainz.org/?p=9785

Teaching Assistant (Machine Learning) | Dept. of Artificial Intelligence, GHRCEM Pune [Mar '22 - May '23]

  • Conducted hands-on Machine Learning, Data Processing, and Data Visualization sessions for 70+ sophomore students at the Dept. of AI (GHRCEM).
  • Introduced students to Machine Learning concepts such as Linear Regression, Naive Bayes (incl. Text Classification), KNN, and Support Vector Machines.
  • Tech Stack: Python, scikit-learn, Pandas, NumPy, Matplotlib, and Seaborn.
  • Domain: Machine Learning.

🏫 Education

BS Data Science and Applications | Indian Institute of Technology, Madras [2021 - 2025]

  • An off-campus 4-year degree program in Applied Data Science.
  • SGPA: 8.24

BTech Artificial Intelligence | G.H. Raisoni College of Engineering & Management, Pune [2020 - 2024]

  • An on-campus undergraduate 4-year degree program in Artificial Intelligence & Computer Science.
  • SGPA: 8.78 [Feb 2021 - Current]

🏗️ Projects

lastfm-scraper | 🔗 Project Demo | 🔗 Codebase

  • Last.fm lets users track their music listening histories across multiple streaming services, but I realized there was no built-in way to download my own listening history for personal use! Inspired by Spotify Wrapped, I was hell-bent on using this data for exciting analytics and machine learning applications, so I developed my own scraper that fetches my listening history through the last.fm API. The goal of this project is to scrape, process & deliver user data in accessible formats like CSV and JSON.
  • Tech Stack: Python, Flask, Pandas, Git, MS Azure, HTML/CSS/JS, REST APIs, GitHub Actions.
  • Skills used: Data Wrangling, Data Cleaning, Cloud, CI/CD.
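The scrape, process & deliver flow described above could look roughly like the sketch below. This is a minimal, hypothetical illustration, not the project's actual code: it assumes each fetched page is shaped like the JSON returned by last.fm's `user.getRecentTracks` method (tracks under `recenttracks.track`, names under `#text`, and a Unix timestamp under `date.uts`), and the helpers `flatten_recent_tracks`, `to_csv`, and `to_json` plus the sample data are invented for this example. The HTTP request itself is left out so the sketch runs offline.

```python
import csv
import io
import json

# Column order for the exported files (hypothetical choice for this sketch).
FIELDS = ["timestamp", "artist", "track", "album"]


def flatten_recent_tracks(page):
    """Flatten one page of an assumed user.getRecentTracks-style response into rows."""
    rows = []
    for track in page["recenttracks"]["track"]:
        # Assumption: a "now playing" entry has no "date" yet, so it is skipped.
        if "date" not in track:
            continue
        rows.append({
            "timestamp": int(track["date"]["uts"]),  # Unix seconds, sent as a string
            "artist": track["artist"]["#text"],
            "track": track["name"],
            "album": track["album"]["#text"],
        })
    return rows


def to_csv(rows):
    """Serialize rows to CSV text with a header line."""
    buf = io.StringIO()
    writer = csv.DictWriter(buf, fieldnames=FIELDS)
    writer.writeheader()
    writer.writerows(rows)
    return buf.getvalue()


def to_json(rows):
    """Serialize rows to pretty-printed JSON, keeping non-ASCII names readable."""
    return json.dumps(rows, ensure_ascii=False, indent=2)


# Hypothetical sample page standing in for a real API response.
SAMPLE_PAGE = {
    "recenttracks": {
        "track": [
            {"artist": {"#text": "Artist A"}, "name": "Song 1",
             "album": {"#text": "Album X"}, "@attr": {"nowplaying": "true"}},
            {"artist": {"#text": "Artist B"}, "name": "Song 2",
             "album": {"#text": "Album Y"},
             "date": {"uts": "1700000000", "#text": "14 Nov 2023, 22:13"}},
        ]
    }
}

if __name__ == "__main__":
    rows = flatten_recent_tracks(SAMPLE_PAGE)
    print(to_csv(rows))
    print(to_json(rows))
```

Flattening each page into plain dicts first keeps the exporters independent of the API's nested shape, so adding another output format (or handing the rows to `pandas.DataFrame`) only needs one more small function.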

Document Topic Modelling | 🔗 Codebase

  • A simple interactive command-line utility to classify text into pre-defined topics using Machine Learning (NLP). This project is based on the LDA (Latent Dirichlet Allocation) model and built using Python, scikit-learn, and Gensim.
  • Tech Stack: Python, Git, scikit-learn, Gensim, Rich.
  • Skills used: Machine Learning, Natural Language Processing.

Portfolio Site | 🔗 Project Demo | 🔗 Codebase

  • I custom-built this portfolio site with a clean, minimal look in mind. I used Hugo, a static site generator, so the site content could be written in Markdown for easier, distraction-free maintenance. Site rendering and hosting are automated with a simple CI/CD pipeline built using GitHub Actions, and the site is hosted on GitHub Pages.
  • Tech Stack: Hugo, Git, GitHub Actions.
  • Skills used: Web Development, CI/CD.
  • A simple and elegant Tableau dashboard that visualizes my monthly spending habits. For this project, I fetched data from my personal spreadsheet-based budget-tracking system hosted on Notion.
  • Tech Stack: Tableau, MS Excel.
  • Skills used: Data Visualization, Dashboarding.

🏆 Achievements

  • Among 967 candidates selected globally out of 43,765 applicants for GSoC 2023.
  • 2x speaker at the IIT Madras Student Placement Council on Open Source and GSoC, engaging and educating 3k+ students about OSS.
  • Elected as President of the Student’s Association of Artificial Intelligence, GHRCEM Pune.
  • Elected as Vice-President of the IEEE Student’s Chapter, GHRCEM Pune.
  • Wrote a GSoC guide blog with 35k+ LinkedIn impressions and 3.7k+ views.
  • Organized multiple college events with 200+ attendees each, achieving an average event rating of 4.59/5.00.
  • Represented the “Hadar Cluster” (South East Asia) at IEEE Asia Pacific’s CLAP (2021) program.

🤝 Volunteering

President | Student’s Association of Artificial Intelligence, GHRCEM, Pune [Nov '21 - Mar '22]

  • Managed Human Resources, Planning, and Execution for all events at the Department of AI, GHRCEM, Pune.
  • Hosted events and workshops like “Tech Talks 1.0: Biostatistics w/ Mr. Shariq Mohammed, Boston University” and “YOU 2.0: The complete personality upliftment program”, with 200+ attendees and an average event rating of 4.59/5.00.

Co-Chair | IEEE Student’s Chapter, GHRCEM Pune [Mar '21 - Feb '23]

  • Hosted several flagship events for the IEEE Pune Section, such as IEEE CODE-STROM [2022] and the EAC-funded Cloud and Data Engineering Workshop [2022].
  • Represented the “Hadar Cluster” (South East Asia) at IEEE Asia Pacific’s CLAP [2021] program.

Speaker and Project Lead | Ek Bharat Shreshtha Bharat Club, GHRCEM, Pune [Sep '21 - Nov '21]

  • Designed and delivered 5+ inter-state presentations to the Aryan Institute of Technology, Bhubaneswar, Odisha, while representing G.H. Raisoni College of Engineering and Management, Pune, Maharashtra.

Vice President | Music Club, GH Raisoni College of Engineering & Management, Pune [Aug '21 - Nov '21]

  • Managed Human Resources, Planning, and Execution for 6+ introductory and jamming sessions.


🎹 Hobbies

  • In my free time, I like playing the Piano 🎹 and producing music 🎧 under my artist alias SNÆK.
  • I also LOVE listening to music whenever I can! Check out my streams here: last.fm - snaekboi
  • I've been trying to get into books as well! My favorites are “The Subtle Art of Not Giving a F*ck” by Mark Manson and “Tokyo Ghoul” by Sui Ishida.