In search of the most accurate algorithm!
If you’re just starting out with data science, the Titanic: Machine Learning from Disaster project on Kaggle is one of the best ways to learn classification algorithms! In this article, I go through how I applied 7 different classification algorithms to this dataset, and we find out which one gives us the best accuracy.
I’ve even submitted the models to the Kaggle competition, so we find out exactly how each model performs on unseen data!
Learning the basic EDA techniques in Machine Learning
Most of us probably first tried hands-on data analysis through the Titanic: Machine Learning from Disaster Kaggle competition. This was my first data science project as well. In this article, I go through the various ways we can perform EDA on the given data.
I started this project just one month into learning Data Science, with almost no prior experience with Machine Learning, and just some knowledge from a few online courses that I had been attending.
A simple use-case of scraping text and creating a Heroku app.
Web scraping, also known as web harvesting, is when we extract data from websites. I dived into the world of web scraping through a simple use case of scraping Customer Reviews from India’s largest Online Store — Flipkart.com
If you're only here for the final product, then here it is:
Just search keywords of what you want to see (example: iPhone SE), and click the search button to load reviews from Flipkart.com
How I scraped images from Google Chrome in seconds!
In this small and simple use-case, we explore how to use Selenium to scrape images from Google Chrome for any keyword (or set of keywords) searched by a user.
An actual Stephenian’s record.
There is no doubt that St. Stephen’s College is one of the most prestigious colleges, not just in Delhi, but in India. You may have come across people talking about the high cut-off percentages, the interview process and the low acceptance rate. As a “Stephenian” myself, I’ll tell you about my journey.
Quoting Wikipedia:
St. Stephen’s generally receives around 30,000 applicants for 400 seats each year leading to an incredibly low admit rate of 1.33%.
Also, about 50% of these seats are reserved for Christian students, making it even more competitive compared to other DU…
Exploring some of the ways linear algebra proves to be an important component while making any data science model!
As Wikipedia defines it: linear algebra is the branch of mathematics concerning linear equations such as:
Challenges faced by consumers and retailers amid the crisis, depicted with data!
Since the outbreak of the pandemic, almost all retailers have had to adapt to the subtle or direct changes that have been inflicted on them and on their business. Except for those dealing with essential goods and services, all retailers have had to either completely halt their business, or operate with limited staff, deteriorating sales, and strict government restrictions.
Most retailers have had to think not only about how to cope with the short-term challenges that COVID-19 has brought, but also about how to plan for its long-term impact. Even today…
If you've been playing with data, you have surely come across outliers! But what exactly is an outlier?
Wikipedia definition:
In statistics, an outlier is a data point that differs significantly from other observations. An outlier may be due to variability in the measurement or it may indicate experimental error.
Clearly, an outlier is something that doesn't match most of our observations. For example, let's say we are measuring the heights of students in a class. Consider the data as follows:
heights = [150,158,156,162,147,164,157,256,150]
When we plot these values, we find out that something seems off!
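Besides plotting, we can also flag the odd value with a few lines of code. Here is a minimal sketch using the interquartile range (IQR) rule, which is one common convention and not necessarily the method used in the rest of the article. The function name `iqr_outliers` and the multiplier `k=1.5` are my own assumptions for illustration.

```python
import statistics

heights = [150, 158, 156, 162, 147, 164, 157, 256, 150]


def iqr_outliers(values, k=1.5):
    """Return the values lying more than k * IQR outside the quartiles.

    Hypothetical helper: k=1.5 is the usual rule of thumb, not a fixed law.
    """
    # statistics.quantiles with n=4 returns the three quartile cut points.
    q1, _median, q3 = statistics.quantiles(values, n=4)
    iqr = q3 - q1
    lower, upper = q1 - k * iqr, q3 + k * iqr
    return [v for v in values if v < lower or v > upper]


outliers = iqr_outliers(heights)
print(outliers)  # [256]
```

For this data the quartiles are 150 and 163, so anything above 182.5 is flagged, and only the 256 entry crosses that line.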
As we dive into the world of “Unsupervised” Machine Learning, we will encounter problems that would require us to cluster the data available to us. This means we have to divide the data into clusters based on their level of similarity. K-Means Clustering allows us to do just that.
As the name suggests, the algorithm makes use of the “means” of the data to cluster them. Here “K” is just the number of clusters we want our data to be divided into. …
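To make the idea of "means" concrete, here is a rough sketch of the usual assign-then-average loop in plain Python. It is an illustration under simplifying assumptions, not a production implementation: the function name `k_means`, the sample `points`, and the choice of the first K points as starting centroids are all made up for this example (real libraries typically pick starting centroids more carefully).

```python
from math import dist  # Euclidean distance, available in Python 3.8+


def k_means(points, k, iterations=10):
    """Cluster points into k groups by repeatedly averaging each group."""
    # Assumption: use the first k points as the initial centroids.
    centroids = list(points[:k])
    clusters = []
    for _ in range(iterations):
        # Assignment step: put each point in the cluster of its nearest centroid.
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda i: dist(p, centroids[i]))
            clusters[nearest].append(p)
        # Update step: move each centroid to the mean of its cluster.
        new_centroids = []
        for i, cluster in enumerate(clusters):
            if cluster:
                mean = tuple(sum(axis) / len(cluster) for axis in zip(*cluster))
                new_centroids.append(mean)
            else:
                new_centroids.append(centroids[i])  # keep an empty cluster's centroid
        if new_centroids == centroids:
            break  # nothing moved, so the assignment is stable
        centroids = new_centroids
    return centroids, clusters


# Hypothetical sample data: two visibly separate groups of 2-D points.
points = [(1, 1), (1.5, 2), (1, 1.5), (8, 8), (9, 9), (8.5, 9.5)]
centroids, clusters = k_means(points, k=2)
print(clusters)
```

With K = 2, the three points near (1, 1) end up in one cluster and the three points near (9, 9) in the other, even though both starting centroids came from the first group.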
As we dive into the world of Machine Learning and Data Science, one of the easiest and most fun ways to get started is to explore the various machine learning algorithms. They can be intimidating, especially if you’re just starting out. One of the simplest algorithms that we can explore with very basic knowledge of data science is the Linear Regression algorithm.
Linear Regression is mainly used in predicting continuous values. We deal with 2 kinds of variables: