Introduction to Machine Learning Using Python

Machine learning is a type of artificial intelligence (AI) that gives computers the ability to learn without being explicitly programmed. It focuses on developing computer programs that can change when exposed to new data. In this article, you will learn the basics of machine learning and how to implement a simple machine learning algorithm in Python.

Setting up the environment

The Python community has developed many modules to help programmers implement machine learning. In this article, you will use the numpy, scipy and scikit-learn modules, which you can install from the command line (cmd on Windows) with the following command:
pip install numpy scipy scikit-learn
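Alternatively, you can install the Anaconda distribution of Python, which comes prebundled with these packages. (Miniconda is a minimal installer, so with it you would still need to install the packages yourself using conda.) Follow the instructions on the Anaconda website to set it up.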
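To confirm that the packages are available before running the examples, you can use a short check like the one below. This is a minimal sketch: the helper name check_packages is hypothetical, written only for this illustration. Note that scikit-learn is imported under the module name sklearn.

```python
import importlib

# Import names of the packages used in this article (scikit-learn is imported as "sklearn")
REQUIRED = ["numpy", "scipy", "sklearn"]


def check_packages(names):
    """Hypothetical helper: map each package name to its version, or None if it cannot be imported."""
    found = {}
    for name in names:
        try:
            module = importlib.import_module(name)
            # Most packages expose __version__; fall back to "unknown" if one does not
            found[name] = getattr(module, "__version__", "unknown")
        except ImportError:
            found[name] = None
    return found


for name, version in check_packages(REQUIRED).items():
    print("{}: {}".format(name, version if version else "NOT INSTALLED"))
```

Any package reported as NOT INSTALLED can be added with the pip command shown above.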

Machine learning overview

Machine learning involves training a computer on a given dataset and then using that training to predict the properties of new data. For example, we can train a computer by showing it 1000 images of cats and 1000 images that are not cats, telling it each time whether the picture is a cat. Then, if we show the computer a new image, it should be able to tell whether the new image is a cat.
The process of training and prediction relies on specialised algorithms. We feed the training data to an algorithm, and the algorithm uses it to make predictions on new test data. One such algorithm is k-nearest neighbors classification (KNN classification). Given a test point, it finds the k points in the training data set that are closest to it, and predicts the class that occurs most often among those k neighbors (a majority vote). For example, suppose the training set is:

petal_size (cm)   flower_type
1                 a
2                 b
1                 a
2                 b
3                 c
4                 d
3                 c
2                 b
5                 a

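Now suppose we want to predict the flower type for a petal of size 2.5 cm. The closest training points are the three petals of size 2 (type b) and the two petals of size 3 (type c), all at a distance of 0.5 cm. If we choose the number of neighbors K = 5, these five points are the nearest neighbors. Three of them are type b and two are type c, so the majority vote gives b, and the prediction for a petal of size 2.5 cm is flower type b. (With K = 3, we would have to pick three of these five equally distant points, so the result would depend on how ties are broken.)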
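The steps of the worked example can be sketched in plain Python. This is a minimal illustration of the k-nearest neighbors idea on the toy table, not scikit-learn's implementation. The function name knn_predict is hypothetical, and ties in distance are broken by the order of the points in the list, because Python's sorted is stable.

```python
from collections import Counter

# Toy training set from the table: (petal_size in cm, flower_type)
training_data = [
    (1, "a"), (2, "b"), (1, "a"), (2, "b"), (3, "c"),
    (4, "d"), (3, "c"), (2, "b"), (5, "a"),
]


def knn_predict(train, x, k):
    """Hypothetical helper: predict a label for x by majority vote among its k nearest training points."""
    # Sort training points by their distance to x (absolute difference, since there is one feature)
    by_distance = sorted(train, key=lambda point: abs(point[0] - x))
    # Keep only the labels of the k closest points
    nearest_labels = [label for _, label in by_distance[:k]]
    # Majority vote: most_common(1) returns a list with one (label, count) pair
    return Counter(nearest_labels).most_common(1)[0][0]


print(knn_predict(training_data, 2.5, k=5))  # b
```

With k=5 the five nearest labels are b, b, c, c, b, so the vote is three to two in favour of b, matching the worked example.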

Implementing the KNN classification algorithm using Python

Here is a Python script that demonstrates the KNN classification algorithm. It uses the famous iris flower dataset to train the computer, and then gives the computer a new flower to make a prediction about. The dataset consists of 50 samples from each of three species of Iris (Iris setosa, Iris virginica and Iris versicolor). Four features are measured from each sample: the length and the width of the sepals and petals, in centimetres. We train our program using this dataset, and then use this training to predict the species of an iris flower with given measurements. For more information on this dataset, refer to the Iris flower dataset article. Note that this program might not run on the GeeksforGeeks IDE, but it runs easily on your local Python interpreter, provided you have installed the required libraries.

import numpy as np
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier

# Load the iris dataset: 150 samples, 4 features, 3 species
iris_dataset = load_iris()

# Split into training (75%) and test (25%) sets; random_state fixes the shuffle
X_train, X_test, y_train, y_test = train_test_split(
    iris_dataset["data"], iris_dataset["target"], random_state=0)

# KNN classifier that looks at the single nearest neighbor (k = 1)
kn = KNeighborsClassifier(n_neighbors=1)
kn.fit(X_train, y_train)

# New flower: sepal length, sepal width, petal length, petal width (cm)
x_new = np.array([[5, 2.9, 1, 0.2]])
prediction = kn.predict(x_new)
print("Predicted target value: {}".format(prediction))
print("Predicted target name: {}".format(iris_dataset["target_names"][prediction]))

# Fraction of test samples classified correctly
print("Test score: {:.2f}".format(kn.score(X_test, y_test)))

Output:
Predicted target value: [0]
Predicted target name: ['setosa']
Test score: 0.97

Now let us dissect this program step by step. The program imports the load_iris function, which loads the iris dataset that comes bundled with the sklearn module. It also imports the KNeighborsClassifier class, which implements the KNN algorithm, and the train_test_split function from sklearn, along with the numpy module.
We then call load_iris() and store the returned dataset in the iris_dataset variable. Next, we divide the dataset into training data and test data using the train_test_split function. Variables with the X prefix hold the feature values (e.g. petal length), and variables with the y prefix hold the target values (0 for setosa, 1 for versicolor and 2 for virginica). By default, this function shuffles the dataset and splits it into training and test data in a 75:25 ratio; setting random_state=0 makes the shuffle reproducible. We then create a KNeighborsClassifier object, stored in the kn variable, with the number of neighbors k set to 1 (n_neighbors=1). In the next line, we call its fit method with the training data so that the classifier can learn from it.
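To see what load_iris and train_test_split produce, you can inspect the dataset and the resulting arrays. This sketch assumes the same random_state=0 as the script; with scikit-learn's default test_size of 0.25, the 150 samples should be split into 112 training samples and 38 test samples.

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split

iris_dataset = load_iris()

# Integer target values and the species names they stand for
for value, name in enumerate(iris_dataset["target_names"]):
    print(value, name)

# Number of samples per class, indexed by target value
print("Samples per class:", np.bincount(iris_dataset["target"]))

# Same split as in the script
X_train, X_test, y_train, y_test = train_test_split(
    iris_dataset["data"], iris_dataset["target"], random_state=0)

# Each row is one flower, each column one of the 4 measured features
print("X_train shape:", X_train.shape)
print("X_test shape:", X_test.shape)
```

The first loop is a quick way to confirm the mapping between target values and species names rather than relying on the order in which the species are usually listed.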
The training part is now complete. The measurements of a new flower are stored in a numpy array called x_new, and we want to predict the species of this flower. We do this using the predict method, which takes this array as input and returns the predicted target value. The predicted target value is 0, which stands for setosa, so the classifier predicts that this flower is an Iris setosa. Finally, we compute the test score, which is the fraction of test samples that are predicted correctly (the accuracy). The score method does this by predicting labels for X_test and comparing them with the actual labels in y_test.
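To make the test score concrete, the sketch below recomputes it by hand: it predicts a label for every test sample and takes the fraction that match y_test. It repeats the setup from the script (random_state=0, n_neighbors=1) so that it runs on its own. Because score on a scikit-learn classifier reports mean accuracy, the two printed values should agree.

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier

iris_dataset = load_iris()
X_train, X_test, y_train, y_test = train_test_split(
    iris_dataset["data"], iris_dataset["target"], random_state=0)

kn = KNeighborsClassifier(n_neighbors=1)
kn.fit(X_train, y_train)

# Predict a label for every test sample
y_pred = kn.predict(X_test)

# Accuracy by hand: the mean of a boolean array is the fraction of True values
manual_accuracy = np.mean(y_pred == y_test)

print("Manual accuracy: {:.2f}".format(manual_accuracy))
print("kn.score:        {:.2f}".format(kn.score(X_test, y_test)))
```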
In this article, we saw how machine learning works and developed a basic program that implements it using the scikit-learn module in Python. For more information, refer to the scikit-learn documentation. Thank you!
