
Math and Data enthusiast!

In search of the most accurate algorithm!

Photo by Olisa Obiora on Unsplash

If you’re just starting out with data science, the Titanic: Machine Learning from Disaster project on Kaggle is one of the best ways to learn Classification Algorithms! In this article, I go through how I applied 7 different classification algorithms to this dataset, and we find out which one gives us the best accuracy.

I’ve even submitted the models to the Kaggle competition, so we find out exactly how each model performs on unknown data!

Firstly, if you’re going to be using any of these models, you’ll have to clean your data. I’ve gone…

Learning the basic EDA techniques in Machine Learning

Photo by Alwi Alaydrus on Unsplash

Many of us probably first tried hands-on data analysis through the Titanic: Machine Learning from Disaster Kaggle competition. This was my first data science project as well. In this article, I go through the various ways we can perform EDA on the given data.

I started this project just one month into learning Data Science, with almost no prior experience with Machine Learning, and just some knowledge from a few online courses that I had been attending.

If you’re only interested in the final Classification Algorithms and how I’ve applied them, then check…

A simple use-case of scraping text and creating a Heroku app.

Web scraping, also known as web harvesting, is when we extract data from websites. I dived into the world of web scraping through a simple use case of scraping Customer Reviews from India’s largest Online Store —

If you're only here for the final product, then here it is:
Just search keywords of what you want to see (example: iPhone SE), and click the search button to load reviews from

Search Page

How I scraped images from Google Chrome in seconds!

In this small and simple use-case, we explore how to use Selenium to scrape images from Google Chrome for any keyword (or set of keywords) searched by a user.


  • Our program should take any keyword (for example, “cat”) from the user, along with the number of images needed, and scrape that many images from Google Images on the Chrome browser.
  • The images must be stored in a folder, named after the search term, and should be numbered properly so as to make them easy to access and interpret.
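The storage requirement in the second point can be sketched on its own, without any browser automation. This is only a minimal illustration, not the article's actual code: the helper name `prepare_image_paths`, the `.jpg` extension, and the `keyword_number` file naming are all assumptions made for the example.

```python
from pathlib import Path


def prepare_image_paths(keyword, count, base_dir="."):
    """Create a folder named after the search term and return numbered file paths.

    Hypothetical helper: the naming scheme and the .jpg extension are assumptions.
    """
    # Folder is named after the search term; spaces become underscores.
    folder = Path(base_dir) / keyword.strip().replace(" ", "_")
    folder.mkdir(parents=True, exist_ok=True)

    # Zero-pad the counter so files sort in download order (e.g. cat_01 ... cat_12).
    width = len(str(count))
    return [
        folder / f"{folder.name}_{i:0{width}d}.jpg"
        for i in range(1, count + 1)
    ]
```

Calling `prepare_image_paths("cat", 3)` would create a `cat` folder and return paths ending in `cat_1.jpg`, `cat_2.jpg` and `cat_3.jpg`; the scraper could then write each downloaded image to the next path in the list.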

Program Flow

  1. Firstly, our program…

An actual Stephenian’s record.

There is no doubt that St. Stephen’s College is one of the most prestigious colleges, not just in Delhi, but in India. You may have come across people talking about the high cut-off percentages, the interview process and the low acceptance rate. As a “Stephenian” myself, I’ll tell you about my journey.

Image via:

Quoting Wikipedia :

St. Stephen’s generally receives around 30,000 applicants for 400 seats each year leading to an incredibly low admit rate of 1.33%.

Also, about 50% of these seats are reserved for Christian students, making it even more competitive compared to other DU…

Exploring some of the ways linear algebra proves to be an important component while making any data science model!

What is Linear Algebra?

As Wikipedia defines it: linear algebra is the branch of mathematics concerning linear equations such as:
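The equation itself is missing from this preview. For reference, the general form of a linear equation that the Wikipedia definition uses looks like the following, where the coefficients $a_1, \dots, a_n$ and the constant $b$ are known and $x_1, \dots, x_n$ are the unknowns:

```latex
a_1 x_1 + a_2 x_2 + \cdots + a_n x_n = b
```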

Challenges faced by consumers and retailers amid the crisis, depicted with data!


Since the outbreak of the pandemic, almost all retailers have had to adapt to the subtle or direct changes that have been inflicted on them and on their business. Except for those dealing with essential goods and services, all retailers have had to either completely halt their business, or operate with limited staff, deteriorating sales, and strict government restrictions.

Most retailers have had to think about how they can cope with the short term challenges that COVID-19 has brought, but also plan for its long-term impact. Even today…

If you've been playing with data, you have surely come across outliers! But what exactly is an outlier?

Wikipedia Definition :

In statistics, an outlier is a data point that differs significantly from other observations. An outlier may be due to variability in the measurement or it may indicate experimental error.

Clearly, an outlier is something that doesn't match with most of our observations. For example, let's say we are measuring the heights of students in a class. Consider the data as follows:

heights = [150,158,156,162,147,164,157,256,150]

When we plot these values, we find out that something seems off!
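Even without a plot, the odd value can be flagged numerically. The sketch below uses the common 1.5 × IQR rule, which is just one of several possible outlier checks and is an assumption here, not necessarily the method the article goes on to use; the function name `iqr_outliers` is made up for the example.

```python
import statistics

heights = [150, 158, 156, 162, 147, 164, 157, 256, 150]


def iqr_outliers(values, k=1.5):
    """Return the values lying more than k * IQR outside the quartiles."""
    # quantiles(..., n=4) returns the three cut points Q1, Q2 (median), Q3.
    q1, _, q3 = statistics.quantiles(values, n=4)
    iqr = q3 - q1
    lower, upper = q1 - k * iqr, q3 + k * iqr
    return [v for v in values if v < lower or v > upper]


print(iqr_outliers(heights))  # prints [256]
```

The height of 256 is far above the upper fence, while every other student falls comfortably inside the range.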

As we dive into the world of “Unsupervised” Machine Learning, we will encounter problems that would require us to cluster the data available to us. This means we have to divide the data into clusters based on their level of similarity. K-Means Clustering allows us to do just that.

As the name suggests, the algorithm makes use of the “means” of the data to cluster them. Here “K” is just the number of clusters we want our data to be divided into. …
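To make the idea concrete, here is a minimal sketch of the standard K-Means loop (often called Lloyd's algorithm) in plain Python. It is an illustration under simple assumptions, not the article's implementation: points are tuples of numbers, the starting centroids are picked at random from the data, and the function name `kmeans` and its parameters are chosen for this example.

```python
import random


def kmeans(points, k, iterations=100, seed=0):
    """Group points (tuples of numbers) into k clusters around their means."""
    rng = random.Random(seed)
    # Assumption: start from k distinct data points chosen at random.
    centroids = rng.sample(points, k)

    for _ in range(iterations):
        # Assignment step: each point joins the cluster of its nearest centroid.
        clusters = [[] for _ in range(k)]
        for p in points:
            distances = [
                sum((a - b) ** 2 for a, b in zip(p, c)) for c in centroids
            ]
            clusters[distances.index(min(distances))].append(p)

        # Update step: move each centroid to the mean of its cluster.
        new_centroids = []
        for cluster, old in zip(clusters, centroids):
            if cluster:
                new_centroids.append(
                    tuple(sum(dim) / len(cluster) for dim in zip(*cluster))
                )
            else:
                new_centroids.append(old)  # keep an empty cluster's centroid

        if new_centroids == centroids:  # nothing moved, so we have converged
            break
        centroids = new_centroids

    return centroids, clusters
```

For example, `kmeans([(1, 1), (1, 2), (2, 1), (10, 10), (10, 11), (11, 10)], k=2)` separates the three points near the origin from the three points near (10, 10), with each centroid sitting at the mean of its group.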

Eshita Goel
