No data scientist can work effectively without a solid grasp of Bayesian inference and conditional probability. In this article, we will discuss both with the help of applications and examples, and in particular, how data scientists make use of Bayes’ Theorem.
Bayes’ Theorem is a central idea in Data Science. It is most visible in Machine Learning as the foundation of the Naive Bayes classifier, and it has also been extended into Bayesian Neural Networks. The uses of Bayes’ Theorem appear throughout the field of Data Science. Let us first get an overview of what Bayes’ Theorem actually is.
What is Bayes’ Theorem?
Bayes’ Theorem is a fundamental result in probability. It determines the conditional probability of an event: the probability that the event occurs, given that some other event has already happened. The event whose probability we want to update is called the hypothesis, and its probability is re-estimated in light of prior knowledge and new evidence.
The formula of Bayes’ Theorem expresses the posterior probability P(H | E) as the likelihood of the evidence given the hypothesis P(E | H), multiplied by the prior probability of the hypothesis P(H), and divided by the probability of the evidence P(E):

P(H | E) = (P(E | H) * P(H)) / P(E)
Let us now understand each term of the Bayes’ Theorem formula in detail –
- P(H | E) – This is the posterior probability. “Posterior” means derived from the given evidence. It denotes the conditional probability of the hypothesis H, given the evidence E.
- P(E | H) – This is the likelihood term of Bayes’ Theorem. It is the conditional probability of observing the evidence E, given that the hypothesis H is true.
- P(H) – This is the prior probability. It denotes the initial probability of the hypothesis H being true, before the evidence or data is taken into account.
- P(E) – This is the probability of observing the evidence, regardless of the hypothesis. It normalizes the result.
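Written as code, the formula is a single line. Here is a minimal sketch in Python (the function and argument names are our own illustration):

```python
def posterior(prior, likelihood, evidence):
    # Bayes' Theorem: P(H | E) = P(E | H) * P(H) / P(E)
    return likelihood * prior / evidence
```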
Bayes’ Theorem Example
Let us take a straightforward example to understand Bayes’ Theorem. Assume the weather today is cloudy, and you want to know whether it will rain, given that cloudiness. In other words, you have to compute the probability of rainfall, given the evidence of clouds.
That is, we want P(Rain | Clouds), where whether it will rain today is the hypothesis (H) and cloudiness is the evidence (E). This is the posterior probability part of our equation.
Now, suppose we know that when it rains, the weather is cloudy 60% of the time. This gives us P(Clouds | Rain) = P(E | H) = 0.6, the “backward” probability of observing the evidence (clouds) given the hypothesis (rain). Next, 75% of the days in the month are cloudy, so the probability of cloudiness is P(Clouds) = 0.75. Also, since this is a rainy month of the year, it usually rains on 15 out of 30 days, so the prior probability of rainfall is P(Rain) = 15/30 = 0.5, or 50%. Now, let us calculate the probability of it raining, given the cloudy weather.
P(Rain | Cloud) = (P(Cloud | Rain) * P(Rain)) / (P(Cloud))
= (0.6 * 0.5) / (0.75)
= 0.4
Therefore, we find out that there is a 40% chance of rainfall, given the cloudy weather.
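The same arithmetic can be checked in a few lines of Python (a minimal sketch; the variable names are ours):

```python
p_cloud_given_rain = 0.6   # P(E | H): probability of clouds when it rains
p_rain = 0.5               # P(H): prior probability of rain (15 of 30 days)
p_cloud = 0.75             # P(E): probability of a cloudy day
p_rain_given_cloud = p_cloud_given_rain * p_rain / p_cloud
print(p_rain_given_cloud)  # 0.4, i.e. a 40% chance of rain given the clouds
```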
After understanding Bayes’ Theorem, let us look at Naive Bayes, which applies the standard theorem in the context of machine learning.
Naive Bayes Theorem
Naive Bayes is an effective supervised learning algorithm used for classification. The Naive Bayes classifier is an extension of the regular Bayes’ Theorem discussed above. In Naive Bayes, we calculate the probability contributed by every feature. It is used mostly in text classification tasks such as spam filtering. Let us see how Naive Bayes combines the probabilities contributed by all of the features.
Assume that, as a data scientist, you are tasked with creating a spam filter. You are supplied with a list of spam keywords such as –
- Free
- Discount
- Full Refund
- Urgent
- Weight Loss
However, the business you are working for is a financial solutions company, so some of the vocabulary occurring in spam mails is also used in your company’s legitimate emails. Some of these words are –
- Important
- Free
- Urgent
- Stocks
- Customers
You also have the probability of each of these words appearing in spam messages and in company emails:
| Spam Email | Company Email |
| --- | --- |
| Free (0.3) | Important (0.5) |
| Discount (0.15) | Free (0.25) |
| Full Refund (0.1) | Urgent (0.1) |
| Urgent (0.2) | Stocks (0.5) |
| Weight Loss (0.25) | Customers (0.1) |
Suppose you receive the message “Free trials for weight loss program. Become members at a discount.” Is this message spam or a company email? Adding up the probabilities of the spam keywords occurring in the sentence gives Free (0.3) + Weight Loss (0.25) + Discount (0.15) = 0.7, or 70%.
Whereas the probability of it being an email from your company is Free (0.25) = 0.25, or 25%.
Therefore, the probability of the mail being spam is much higher than a company email.
A Naive Bayes classifier selects the outcome with the highest probability, which in the above case was spam. Naive Bayes is referred to as ‘naive’ because it assumes the features to be independent of each other. The features in our example were the words present in the sentence.
The conditional independence among all the features gives us the Naive Bayes formula. The likelihood of each feature x1 through xd is estimated with respect to the class cj, so that

P(cj | x1, …, xd) = P(cj) * P(x1 | cj) * … * P(xd | cj) / P(x1, …, xd)

Combining the prior probability with the likelihood of the observed features, we compute the posterior probability, through which we can find the class to which the item most likely belongs.
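The worked example above simply added the keyword probabilities to keep the arithmetic easy to follow; an actual Naive Bayes classifier multiplies the per-word likelihoods with the class prior, exactly as the independence assumption implies. Below is a minimal hand-rolled sketch; the equal class priors and the small smoothing value for words missing from a class’s table are our own simplifying assumptions, not part of the original example.

```python
# Word likelihoods taken from the table above (illustrative values)
spam_likelihoods = {"free": 0.3, "discount": 0.15, "full refund": 0.1,
                    "urgent": 0.2, "weight loss": 0.25}
company_likelihoods = {"important": 0.5, "free": 0.25, "urgent": 0.1,
                       "stocks": 0.5, "customers": 0.1}

def class_score(words, likelihoods, prior=0.5, smoothing=0.01):
    """Multiply the class prior by P(word | class) for every observed word."""
    score = prior
    for word in words:
        score *= likelihoods.get(word, smoothing)
    return score

message = ["free", "weight loss", "discount"]
spam_score = class_score(message, spam_likelihoods)
company_score = class_score(message, company_likelihoods)
print("spam" if spam_score > company_score else "company email")
```

With these numbers the spam score clearly dominates, matching the conclusion reached above.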
Along with Bayes’ Theorem, data scientists use various other tools, so it is also worth checking out the different tools used by a Data Scientist.
Using Naive Bayes as a Classifier
In this section, we will apply the Naive Bayes classifier to a dataset. The dataset used in this example is the Pima Indians Diabetes dataset, an open dataset available in the UCI Machine Learning Repository.
- In the first step, we import all the libraries that will allow us to implement our Naive Bayes Classifier and help us in wrangling the data.
- import pandas as pd
- import numpy as np
- from sklearn.model_selection import train_test_split
- from sklearn.metrics import accuracy_score
- from sklearn.naive_bayes import GaussianNB
- We then read the CSV file using the read_csv() function provided by the Pandas library.
- data = pd.read_csv("/home/admin1/DataFlair/Data/diabetes.csv")
- Then, we proceed to divide our data into the independent variables (X) and the dependent variable (y) as follows –
- X = data.iloc[:, 0:8].values
- y = data.iloc[:, 8].values
- Using the head() function provided by the Pandas library, we look at the first five rows of our dataset.
- data.head()
- We then proceed to split our dataset into training and validation sets. We will train our Naive Bayes Classifier on the training set and generate predictions from the test set. In order to do so, we will use the train_test_split() function provided by the sklearn library.
- X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.33, random_state=42)
- In this step, we apply the Naive Bayes classifier, specifically the Gaussian Naive Bayes classifier. It is a variant of the Naive Bayes classifier that assumes the features follow a Gaussian (normal) distribution within each class.
- clf = GaussianNB()
- clf.fit(X_train, y_train)
- In the next step, we generate predictions from our given test sample.
- y_predict = clf.predict(X_test)
- print(y_predict)
- Then, we measure the accuracy of our classifier. That is, we test to see how many values were predicted correctly.
- predictions = [np.round(value) for value in y_predict]
- accuracy = accuracy_score(y_test, predictions)
- print("Accuracy: %.2f%%" % (accuracy * 100.0))
Therefore, our Naive Bayes classifier predicted 72.44% of the test cases correctly.
Applications of Bayes’ Theorem
1. Spam Filtering
The first and foremost application of Naive Bayes is its ability to classify texts, and in particular, to separate spam emails from non-spam ones. It is one of the oldest spam filtering methodologies, with Naive Bayes spam filtering dating back to 1998. Naive Bayes takes two classes – spam and ham – and classifies incoming messages accordingly.
2. Sentiment Analysis
Sentiment analysis is an area of natural language processing that determines whether a piece of text is positive, neutral, or negative. Another term for sentiment analysis is opinion mining. Using Naive Bayes, we can classify whether a book review is positive or negative, or determine which sentiment category a person’s opinion belongs to.
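As a quick illustration, here is a minimal sentiment-analysis sketch using scikit-learn’s MultinomialNB; the tiny toy review set is purely illustrative:

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.naive_bayes import MultinomialNB

# A toy corpus of labelled reviews (illustrative, not a real dataset)
reviews = ["great book, loved it", "boring and disappointing",
           "excellent and insightful read", "a complete waste of time"]
labels = ["positive", "negative", "positive", "negative"]

# Convert text to word-count features, then fit a multinomial Naive Bayes model
vectorizer = CountVectorizer()
X = vectorizer.fit_transform(reviews)
model = MultinomialNB().fit(X, labels)

print(model.predict(vectorizer.transform(["loved this excellent book"])))
```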
3. Recommendation Systems
Using Naive Bayes, we can build recommendation systems. A recommendation system estimates the probability that a person will watch a given movie or not, based on their previous watches. Naive Bayes is also employed alongside collaborative filtering to filter information for users.
4. Bayesian Neural Networks
Recently, Bayes’ Theorem has been extended into Deep Learning, where it is used to design Bayesian Neural Networks. These are applied to complex machine learning tasks like stock forecasting, facial recognition, etc. It is a currently trending topic in the field of deep learning.
Summary
In the end, we conclude that Bayes’ Theorem is used for finding the conditional probability of an event. It has several extensions that are used in Data Science, such as Naive Bayes, Gaussian Naive Bayes, Bayesian Neural Networks, etc.
Source: Dataflair team