SOFTWARE DEFECT PREDICTION USING CLASSIFICATION MODELS

Software bugs are one of the most widely researched topics in software engineering. Companies spend large sums on testing, and bugs still reach production. As a proactive measure, they also invest in predicting where bugs are likely to occur so that the risk can be mitigated early. Once the software reaches the end user, fixing a bug costs the product owner considerable effort, time, money, and resources.

Software defect prediction is predicting the defects not defecting the predicts 🙂

There are many techniques for predicting bugs that organizations can build into their process.

This post walks through one of them: training classification models on code metrics.

Link to the dataset

https://drive.google.com/file/d/1rW9OJ6vSRxTQoDqGplLNp6fm_0hdU8Tc/view?usp=sharing

#Import the libraries

import pandas as pd
import numpy as np
import matplotlib.pyplot as mpl
%matplotlib inline
import seaborn as sns
from mpl_toolkits.mplot3d import Axes3D
import plotly as ply
from sklearn.model_selection import train_test_split, KFold, cross_val_score
from sklearn.linear_model import LogisticRegression,SGDClassifier
from sklearn.naive_bayes import GaussianNB
from sklearn.tree import DecisionTreeClassifier
from sklearn.dummy import DummyClassifier
from sklearn.metrics import accuracy_score, f1_score, confusion_matrix, recall_score, precision_score, jaccard_score
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC
from sklearn import preprocessing
from sklearn.preprocessing import MinMaxScaler
from genetic_selection import GeneticSelectionCV
from sklearn.ensemble import RandomForestClassifier, AdaBoostClassifier, GradientBoostingClassifier
from sklearn.neighbors import KNeighborsClassifier
from sklearn.cluster import DBSCAN
from matplotlib import cm
from matplotlib.colors import ListedColormap
from sklearn.model_selection import KFold
from keras.models import Sequential
from keras.layers import Dense
import tensorflow as tf
from ypstruct import structure
import random
from sklearn.metrics import classification_report
from random import randint
import warnings
warnings.filterwarnings("ignore")
import os
for path, _, files in os.walk('Y:/Data science/archive/'):
    for f in files:
        print(os.path.join(path, f))
Y:/Data science/archive/about JM1 Dataset.txt
Y:/Data science/archive/jm1.arff
Y:/Data science/archive/jm1.csv

In [2]:

colors = [ '#d62728','#2ca02c']

#Colour palette
#colors = ['#1f77b4', '#ff7f0e', '#2ca02c', '#d62728', '#9467bd',
#          '#8c564b', '#e377c2', '#7f7f7f', '#bcbd22', '#17becf']

In [3]:

#read the csv 
sd = pd.read_csv(r'Y:\Data science\archive\jm1.csv')

In [4]:

sd.describe().T

Out[4]:

                     count          mean            std  min     25%      50%       75%          max
loc                10885.0     42.016178      76.593332  1.0   11.00    23.00     46.00      3442.00
v(g)               10885.0      6.348590      13.019695  1.0    2.00     3.00      7.00       470.00
ev(g)              10885.0      3.401047       6.771869  1.0    1.00     1.00      3.00       165.00
iv(g)              10885.0      4.001599       9.116889  1.0    1.00     2.00      4.00       402.00
n                  10885.0    114.389738     249.502091  0.0   14.00    49.00    119.00      8441.00
v                  10885.0    673.758017    1938.856196  0.0   48.43   217.13    621.48     80843.08
l                  10885.0      0.135335       0.160538  0.0    0.03     0.08      0.16         1.30
d                  10885.0     14.177237      18.709900  0.0    3.00     9.09     18.90       418.20
i                  10885.0     29.439544      34.418313  0.0   11.86    21.93     36.78       569.78
e                  10885.0  36836.365343  434367.801255  0.0  161.94  2031.02  11416.43  31079782.27
b                  10885.0      0.224766       0.646408  0.0    0.02     0.07      0.21        26.95
t                  10885.0   2046.464876   24131.544463  0.0    9.00   112.83    634.25   1726654.57
lOCode             10885.0     26.252274      59.611201  0.0    4.00    13.00     28.00      2824.00
lOComment          10885.0      2.737529       9.008608  0.0    0.00     0.00      2.00       344.00
lOBlank            10885.0      4.625540       9.968130  0.0    0.00     2.00      5.00       447.00
locCodeAndComment  10885.0      0.370785       1.907969  0.0    0.00     0.00      0.00       108.00

In [5]:

sd.info()
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 10885 entries, 0 to 10884
Data columns (total 22 columns):
 #   Column             Non-Null Count  Dtype  
---  ------             --------------  -----  
 0   loc                10885 non-null  float64
 1   v(g)               10885 non-null  float64
 2   ev(g)              10885 non-null  float64
 3   iv(g)              10885 non-null  float64
 4   n                  10885 non-null  float64
 5   v                  10885 non-null  float64
 6   l                  10885 non-null  float64
 7   d                  10885 non-null  float64
 8   i                  10885 non-null  float64
 9   e                  10885 non-null  float64
 10  b                  10885 non-null  float64
 11  t                  10885 non-null  float64
 12  lOCode             10885 non-null  int64  
 13  lOComment          10885 non-null  int64  
 14  lOBlank            10885 non-null  int64  
 15  locCodeAndComment  10885 non-null  int64  
 16  uniq_Op            10885 non-null  object 
 17  uniq_Opnd          10885 non-null  object 
 18  total_Op           10885 non-null  object 
 19  total_Opnd         10885 non-null  object 
 20  branchCount        10885 non-null  object 
 21  defects            10885 non-null  bool   
dtypes: bool(1), float64(12), int64(4), object(5)
memory usage: 1.8+ MB

In [6]:

#Convert the object type to numeric 
for x in sd.iloc[:, 16:21]:
    sd[x] = pd.to_numeric(sd[x], errors='coerce')
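
The five columns load as `object`, which usually means some cells hold a non-numeric placeholder. The post does not show the raw values, so the `?` below is an assumption. A minimal sketch on a toy Series shows what `errors='coerce'` does with such entries:

```python
import pandas as pd

# Toy column standing in for uniq_Op; the '?' placeholder is an assumption
raw = pd.Series(['17', '11', '?', '9'])

# errors='coerce' turns unparseable entries into NaN instead of raising
clean = pd.to_numeric(raw, errors='coerce')

print(clean.isna().sum())   # how many entries could not be parsed
print(clean.dtype)          # the column is now numeric
```

Any NaN created this way is what the later `dropna` cell removes.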

In [7]:

#Determine the class imbalance in the target variable 
print(sd.groupby('defects').size())
defects
False    8779
True     2106
dtype: int64

In [8]:

#Check Unique values on the dataset to take care of high cardinality features
sd.nunique()

Out[8]:

loc                   365
v(g)                  108
ev(g)                  74
iv(g)                  82
n                     806
v                    3991
l                      55
d                    2695
i                    4268
e                    6978
b                     310
t                    6761
lOCode                291
lOComment              88
lOBlank                95
locCodeAndComment      30
uniq_Op                68
uniq_Opnd             171
total_Op              581
total_Opnd            468
branchCount           146
defects                 2
dtype: int64

In [9]:

#Plot the correlation using heatmap

cor = sd.corr()
sns.set(rc={'figure.figsize':(15,12)})

sns.heatmap(cor,annot=True, linewidths = .5, fmt = '.2f',cmap='cubehelix')

Out[9]:

<AxesSubplot:>

In [10]:

#Check the relationship between code volume (v) and the bug estimate (b), coloured by defect label
sns.set(rc={'figure.figsize':(20,12)})
fig=sns.scatterplot(data=sd,x=sd['v'],y=sd['b'], hue=sd['defects'],s=100,marker='o',palette=['#2ca02c','#d62728'])
fig.set(xlabel='Volume', ylabel='Bug')
mpl.show()

#Inference - the higher the volume of code, the higher the bug estimate
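
One likely reason the two metrics move together so closely: Halstead's delivered-bug estimate is often computed as B = V / 3000. The post does not say how `b` was derived, so treat that formula as an assumption. A quick check against the `describe()` figures above:

```python
# Values copied from the describe() output earlier in the post
v_max, b_max = 80843.08, 26.95
v_mean, b_mean = 673.758017, 0.224766

# Assumed definition: Halstead bug estimate B = V / 3000
print(round(v_max / 3000, 2), b_max)
print(round(v_mean / 3000, 4), b_mean)
```

The maxima agree to two decimals and the means are close, which is consistent with `b` being a rescaled copy of `v`, and with `b` showing up later in the list of highly correlated features.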

In [11]:

sns.set(rc={'figure.figsize':(20,12)})
fig=sns.scatterplot(data=sd,x=sd['loc'],y=sd['v'], hue=sd['defects'],s=100,marker='o',palette=['#2ca02c','#d62728'])
fig.set(xlabel='Lines of code', ylabel='Volume')
mpl.show()

# Defective modules appear even where line count and volume are low, so small code size is no guarantee against bugs

In [12]:

sns.set(rc={'figure.figsize':(20,12)})
fig=sns.scatterplot(data=sd,x=sd['v'],y=sd['d'], hue=sd['defects'],s=100,marker='o',palette=['#2ca02c','#d62728'])
fig.set(xlabel='Volume', ylabel='Difficulty')
mpl.show()

#Difficulty rises with the volume of code. The higher the difficulty, the more defects

In [13]:

sns.set(rc={'figure.figsize':(20,12)})
fig=sns.scatterplot(data=sd,x=sd['v'],y=sd['e'], hue=sd['defects'],s=100,marker='o',palette=['#2ca02c','#d62728'])
fig.set(xlabel='Volume', ylabel='Effort')
mpl.show()


#Effort also rises with the volume of code. The higher the effort, the more defects

In [14]:

#Lines of code vs time, difficulty, intelligence and effort

fig, axes = mpl.subplots(2, 2, sharex=True, figsize=(20,12))

sns.lineplot(x=sd['loc'],y=sd['t'],ax=axes[0,0])
#color (not palette) sets the line colour when no hue is given
sns.lineplot(x=sd['loc'],y=sd['d'],color='r',ax=axes[0,1])
sns.lineplot(x=sd['loc'],y=sd['i'],ax=axes[1,0])
sns.lineplot(x=sd['loc'],y=sd['e'],ax=axes[1,1])

Out[14]:

<AxesSubplot:xlabel='loc', ylabel='e'>

In [15]:

#3D plot to visualize the volume, time and defects

sns.set(style = "darkgrid")
f = mpl.figure(figsize=(20,12))
ax = f.add_subplot(111, projection = '3d')

x = sd['v']
y = sd['t']
z = sd['defects']

ax.set_xlabel("Volume")
ax.set_ylabel("Time")
ax.set_zlabel("Defects")
cmap = ListedColormap(sns.color_palette("husl", 256).as_hex())
ax.scatter(x, y, z, s=100, c=x, cmap=cmap, alpha=1)


mpl.show()

In [16]:

#Outlier analysis
sns.set(rc={'figure.figsize':(20,12)})
sns.boxplot(x=sd['uniq_Op'],color='r')

Out[16]:

<AxesSubplot:xlabel='uniq_Op'>

In [17]:

# % of defect and non defect
100 * (sd['defects'].value_counts()/len(sd))

Out[17]:

False    80.652274
True     19.347726
Name: defects, dtype: float64

In [18]:

#Class imbalance of defects

mpl.figure(figsize=(7, 5))
ax = sns.countplot(x=sd['defects'],palette=['#2ca02c', '#d62728'])
mpl.xticks(size=12)
mpl.xlabel('Defects', size=14)
mpl.yticks(size=12)
#Bars show row counts; the percentage is added as an annotation below
mpl.ylabel('Count', size=12)
mpl.title("Class Distribution of Defects", size=16)
ax.set_xticklabels(ax.get_xticklabels(), rotation=40, ha="right")

t = len(sd)
for p in ax.patches:
    percentage = f'{100 * p.get_height() / t:.1f}%\n'
    x = p.get_x() + p.get_width() / 2
    y = p.get_height()
    ax.annotate(percentage, (x, y), ha='center', va='center')
mpl.tight_layout()
mpl.show()

#The non-defective class is the majority and the defective class is the minority

#defective = 19.3% (minority) < non-defective = 80.7% (majority)
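
With a split like this, accuracy alone can mislead: a model that always answers "not defective" already scores about 80%. The sketch below shows that on synthetic labels (not the JM1 rows) using `DummyClassifier`, which the imports already bring in:

```python
import numpy as np
from sklearn.dummy import DummyClassifier
from sklearn.metrics import accuracy_score, recall_score

# Synthetic labels with an 80/20 split similar to the post (not the real data)
y_toy = np.array([0] * 80 + [1] * 20)
X_toy = np.zeros((len(y_toy), 1))  # the dummy model ignores the features

# Always predict the most frequent class seen during fit
baseline = DummyClassifier(strategy="most_frequent")
baseline.fit(X_toy, y_toy)
y_hat = baseline.predict(X_toy)

baseline_acc = accuracy_score(y_toy, y_hat)
baseline_recall = recall_score(y_toy, y_hat)
print(baseline_acc, baseline_recall)
```

Any classifier trained later is worth comparing against this baseline, and recall or F1 on the defective class says more than accuracy does.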

In [19]:

#Drop rows with missing values, then confirm none remain
sd.dropna(inplace=True)
sd.isna().any().sum()

Out[19]:

0

In [20]:

#drop duplicates if any
sd.drop_duplicates(subset=None, inplace=True)
sd.shape

Out[20]:

(8907, 22)

In [21]:

#Verify nulls
sd.isnull().sum()

Out[21]:

loc                  0
v(g)                 0
ev(g)                0
iv(g)                0
n                    0
v                    0
l                    0
d                    0
i                    0
e                    0
b                    0
t                    0
lOCode               0
lOComment            0
lOBlank              0
locCodeAndComment    0
uniq_Op              0
uniq_Opnd            0
total_Op             0
total_Opnd           0
branchCount          0
defects              0
dtype: int64

In [23]:

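# Keep only the upper triangle of the correlation matrix so each feature pair is checked once
# (np.bool was removed from recent NumPy releases; the builtin bool works everywhere)
uptr = cor.where(np.triu(np.ones(cor.shape), k=1).astype(bool))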
sns.heatmap(uptr,annot = True, linewidths = .5, fmt = '.2f')

Out[23]:

<AxesSubplot:>

In [24]:

# Identify features whose correlation with an earlier feature exceeds 0.95
drop_feature = [column for column in uptr.columns if any(uptr[column] > 0.95)]

drop_feature

Out[24]:

['v', 'b', 't', 'lOCode', 'total_Op', 'total_Opnd', 'branchCount']
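
The same filter can be wrapped in a small helper. This is a sketch on a toy frame, not the author's code: the function name is made up for illustration, and unlike the cell above it takes the absolute correlation so that strong negative correlations are caught too.

```python
import numpy as np
import pandas as pd

def high_corr_columns(df, threshold=0.95):
    """Return columns whose absolute correlation with an earlier column exceeds threshold."""
    corr = df.corr().abs()
    # Strict upper triangle: each pair is considered once, the diagonal is skipped
    mask = np.triu(np.ones(corr.shape, dtype=bool), k=1)
    upper = corr.where(mask)
    return [col for col in upper.columns if (upper[col] > threshold).any()]

# Toy frame (not JM1): 'b' is an exact multiple of 'a', 'c' is unrelated noise
rng = np.random.default_rng(0)
toy = pd.DataFrame({"a": rng.normal(size=200)})
toy["b"] = 3.0 * toy["a"]
toy["c"] = rng.normal(size=200)

to_drop = high_corr_columns(toy)
print(to_drop)
```

Of each correlated pair, the later column is the one flagged, so the result depends on column order.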

In [25]:

sd = sd.drop(columns=drop_feature)
sd

Out[25]:

         loc  v(g)  ev(g)  iv(g)      n     l      d       i         e  lOComment  lOBlank  locCodeAndComment  uniq_Op  uniq_Opnd  defects
0        1.1   1.4    1.4    1.4    1.3  1.30   1.30    1.30      1.30          2        2                  2      1.2        1.2    False
1        1.0   1.0    1.0    1.0    1.0  1.00   1.00    1.00      1.00          1        1                  1      1.0        1.0     True
2       72.0   7.0    1.0    6.0  198.0  0.05  20.31   55.85  23029.10         10        8                  1     17.0       36.0     True
3      190.0   3.0    1.0    3.0  600.0  0.06  17.06  254.87  74202.67         29       28                  2     17.0      135.0     True
4       37.0   4.0    1.0    4.0  126.0  0.06  17.19   34.86  10297.30          1        6                  0     11.0       16.0     True
...
10880   18.0   4.0    1.0    4.0   52.0  0.14   7.33   32.93   1770.86          0        2                  0     10.0       15.0    False
10881    9.0   2.0    1.0    2.0   30.0  0.12   8.25   15.72   1069.68          0        2                  0     12.0        8.0    False
10882   42.0   4.0    1.0    2.0  103.0  0.04  26.40   19.68  13716.72          1       10                  0     18.0       15.0    False
10883   10.0   1.0    1.0    1.0   36.0  0.12   8.44   17.44   1241.57          0        2                  0      9.0        8.0    False
10884   19.0   3.0    1.0    1.0   58.0  0.09  11.57   23.56   3154.67          0        2                  1     12.0       14.0    False

8907 rows × 15 columns

In [26]:

#Split independent and dependent variable
X=sd.drop(["defects"],axis=1).copy()
print(X.head())
print('*'*40)
y=sd[["defects"]]
y.head()
     loc  v(g)  ev(g)  iv(g)      n     l      d       i         e  lOComment  \
0    1.1   1.4    1.4    1.4    1.3  1.30   1.30    1.30      1.30          2   
1    1.0   1.0    1.0    1.0    1.0  1.00   1.00    1.00      1.00          1   
2   72.0   7.0    1.0    6.0  198.0  0.05  20.31   55.85  23029.10         10   
3  190.0   3.0    1.0    3.0  600.0  0.06  17.06  254.87  74202.67         29   
4   37.0   4.0    1.0    4.0  126.0  0.06  17.19   34.86  10297.30          1   

   lOBlank  locCodeAndComment  uniq_Op  uniq_Opnd  
0        2                  2      1.2        1.2  
1        1                  1      1.0        1.0  
2        8                  1     17.0       36.0  
3       28                  2     17.0      135.0  
4        6                  0     11.0       16.0  
****************************************

Out[26]:

   defects
0    False
1     True
2     True
3     True
4     True

In [27]:

#Train/validation/test split: 80% train, then the remaining 20% split evenly into validation and test
train_size=0.8
test_size=0.5
X_train, X_bal, y_train, y_bal = train_test_split(X, y, train_size=train_size)
X_val, X_test, y_val, y_test = train_test_split(X_bal, y_bal, test_size=test_size)

In [28]:

print("X_train ===> ",X_train.shape)
print("y_train ===> ",y_train.shape)
print("X_test ===> ",X_test.shape)
print("y_test ===> ",y_test.shape)
X_train ===>  (7125, 14)
y_train ===>  (7125, 1)
X_test ===>  (891, 14)
y_test ===>  (891, 1)
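
The split above is random on every run and does not preserve the class ratio. A possible variant, sketched on synthetic data rather than JM1, fixes `random_state` for repeatability and passes `stratify` so each subset keeps the same share of defective rows. Using it on the real data would change the scores reported below.

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Synthetic stand-in for X and y (1000 rows, 20% positives); not the JM1 data
rng = np.random.default_rng(42)
X_toy = rng.normal(size=(1000, 3))
y_toy = np.array([1] * 200 + [0] * 800)

# 80% train, then split the remaining 20% evenly into validation and test
X_tr, X_rest, y_tr, y_rest = train_test_split(
    X_toy, y_toy, train_size=0.8, stratify=y_toy, random_state=0)
X_va, X_te, y_va, y_te = train_test_split(
    X_rest, y_rest, test_size=0.5, stratify=y_rest, random_state=0)

print(len(y_tr), len(y_va), len(y_te))
print(y_tr.mean(), y_va.mean(), y_te.mean())  # share of positives per subset
```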

In [29]:

#Logistic regression

lr = LogisticRegression()
lr.fit(X_train,y_train)
y_pred_lr = lr.predict(X_test)
print('Acc:{}'.format(accuracy_score(y_test,y_pred_lr)))
print('F1 score:', f1_score(y_test, y_pred_lr))

print('Recall:', recall_score(y_test, y_pred_lr))

print('Precision:', precision_score(y_test, y_pred_lr))
print('Jaccard:', jaccard_score(y_test, y_pred_lr))
print('Confusion Matrix:{}'.format(confusion_matrix(y_test,y_pred_lr)))

scoring = 'accuracy'
kfold = KFold(n_splits = 10)
cv_results = cross_val_score(lr, X_train, y_train, cv = kfold, scoring = scoring)
cv_results

#Summary of the predictions made by the classifier
print(classification_report(y_test, y_pred_lr))
print(confusion_matrix(y_test, y_pred_lr))
#Accuracy score
from sklearn.metrics import accuracy_score
print("ACC: ",accuracy_score(y_pred_lr,y_test))
Acc:0.7643097643097643
F1 score: 0.2857142857142857
Recall: 0.21428571428571427
Precision: 0.42857142857142855
Jaccard: 0.16666666666666666
Confusion Matrix:[[639  56]
 [154  42]]
              precision    recall  f1-score   support

       False       0.81      0.92      0.86       695
        True       0.43      0.21      0.29       196

    accuracy                           0.76       891
   macro avg       0.62      0.57      0.57       891
weighted avg       0.72      0.76      0.73       891

[[639  56]
 [154  42]]
ACC:  0.7643097643097643
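
Every score printed above follows from the four cells of the confusion matrix. As a worked check, using only the numbers from the logistic regression output:

```python
# Confusion matrix printed above: rows are true classes, columns are predictions
tn, fp = 639, 56    # true False: predicted False / predicted True
fn, tp = 154, 42    # true True:  predicted False / predicted True

accuracy = (tp + tn) / (tp + tn + fp + fn)
precision = tp / (tp + fp)
recall = tp / (tp + fn)
f1 = 2 * precision * recall / (precision + recall)
jaccard = tp / (tp + fp + fn)

# prints 0.7643 0.4286 0.2143 0.2857 0.1667
print(round(accuracy, 4), round(precision, 4), round(recall, 4),
      round(f1, 4), round(jaccard, 4))
```

The accuracy looks respectable only because of the 639 correctly labelled non-defective rows. Recall shows the model finds 42 of the 196 defective modules.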

In [30]:

#Naive bayes
nb = GaussianNB()
nb.fit(X_train,y_train)
y_pred_nb = nb.predict(X_test)
print('Acc:{}'.format(accuracy_score(y_test,y_pred_nb)))
print('F1 score:', f1_score(y_test, y_pred_nb))

print('Recall:', recall_score(y_test, y_pred_nb))

print('Precision:', precision_score(y_test, y_pred_nb))
print('Jaccard:', jaccard_score(y_test, y_pred_nb))
print('Confusion Matrix:{}'.format(confusion_matrix(y_test,y_pred_nb)))

scoring = 'accuracy'
kfold = KFold(n_splits = 10)
cv_results = cross_val_score(nb, X_train, y_train, cv = kfold, scoring = scoring)
cv_results

#Summary of the predictions made by the classifier
print(classification_report(y_test, y_pred_nb))
print(confusion_matrix(y_test, y_pred_nb))
#Accuracy score
from sklearn.metrics import accuracy_score
print("ACC: ",accuracy_score(y_pred_nb,y_test))
Acc:0.77665544332211
F1 score: 0.19433198380566802
Recall: 0.12244897959183673
Precision: 0.47058823529411764
Jaccard: 0.10762331838565023
Confusion Matrix:[[668  27]
 [172  24]]
              precision    recall  f1-score   support

       False       0.80      0.96      0.87       695
        True       0.47      0.12      0.19       196

    accuracy                           0.78       891
   macro avg       0.63      0.54      0.53       891
weighted avg       0.72      0.78      0.72       891

[[668  27]
 [172  24]]
ACC:  0.77665544332211

In [31]:

#Stochastic gradient descent

sgd= SGDClassifier()
sgd.fit(X_train,y_train)
y_pred_sgd = sgd.predict(X_test)
print('Acc:{}'.format(accuracy_score(y_test,y_pred_sgd)))
print('F1 score:', f1_score(y_test, y_pred_sgd))

print('Recall:', recall_score(y_test, y_pred_sgd))

print('Precision:', precision_score(y_test, y_pred_sgd))
print('Jaccard:', jaccard_score(y_test, y_pred_sgd))
print('Confusion Matrix:{}'.format(confusion_matrix(y_test,y_pred_sgd)))

scoring = 'accuracy'
kfold = KFold(n_splits = 10)
cv_results = cross_val_score(sgd, X_train, y_train, cv = kfold, scoring = scoring)
cv_results

#Summary of the predictions made by the classifier
print(classification_report(y_test, y_pred_sgd))
print(confusion_matrix(y_test, y_pred_sgd))
#Accuracy score
from sklearn.metrics import accuracy_score
print("ACC: ",accuracy_score(y_pred_sgd,y_test))
Acc:0.7800224466891134
F1 score: 0.0
Recall: 0.0
Precision: 0.0
Jaccard: 0.0
Confusion Matrix:[[695   0]
 [196   0]]
              precision    recall  f1-score   support

       False       0.78      1.00      0.88       695
        True       0.00      0.00      0.00       196

    accuracy                           0.78       891
   macro avg       0.39      0.50      0.44       891
weighted avg       0.61      0.78      0.68       891

[[695   0]
 [196   0]]
ACC:  0.7800224466891134
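
Here SGD labels every test row as non-defective. One possible contributor is feature scale: in the `describe()` output `l` tops out at 1.30 while `e` reaches about 3.1e7, and `SGDClassifier` is known to be sensitive to unscaled inputs. The sketch below, on synthetic data and not verified on JM1, puts the already imported `StandardScaler` in front of the model with a pipeline:

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import SGDClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic, imbalanced stand-in data (about 80/20); not JM1
X_toy, y_toy = make_classification(n_samples=500, n_features=5, n_informative=3,
                                   weights=[0.8, 0.2], random_state=0)
X_toy[:, 0] *= 1e6  # mimic one feature on a much larger scale, like 'e'

# The scaler is fitted inside the pipeline, so it only ever sees training rows
scaled_sgd = make_pipeline(StandardScaler(), SGDClassifier(random_state=0))
scaled_sgd.fit(X_toy[:400], y_toy[:400])
y_hat = scaled_sgd.predict(X_toy[400:])

print(np.bincount(y_hat, minlength=2))  # number of predictions per class
```

The same wrapper would apply to logistic regression, KNN, and the SVC, which also depend on feature magnitudes.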

In [32]:

#K nearest neighbour

knn= KNeighborsClassifier()
knn.fit(X_train,y_train)
y_pred_knn = knn.predict(X_test)
print('Acc:{}'.format(accuracy_score(y_test,y_pred_knn)))
print('F1 score:', f1_score(y_test, y_pred_knn))

print('Recall:', recall_score(y_test, y_pred_knn))

print('Precision:', precision_score(y_test, y_pred_knn))
print('Jaccard:', jaccard_score(y_test, y_pred_knn))
print('Confusion Matrix:{}'.format(confusion_matrix(y_test,y_pred_knn)))

scoring = 'accuracy'
kfold = KFold(n_splits = 10)
cv_results = cross_val_score(knn, X_train, y_train, cv = kfold, scoring = scoring)
cv_results

#Summary of the predictions made by the classifier
print(classification_report(y_test, y_pred_knn))
print(confusion_matrix(y_test, y_pred_knn))
#Accuracy score
from sklearn.metrics import accuracy_score
print("ACC: ",accuracy_score(y_pred_knn,y_test))
Acc:0.7384960718294051
F1 score: 0.22591362126245845
Recall: 0.17346938775510204
Precision: 0.3238095238095238
Jaccard: 0.12734082397003746
Confusion Matrix:[[624  71]
 [162  34]]
              precision    recall  f1-score   support

       False       0.79      0.90      0.84       695
        True       0.32      0.17      0.23       196

    accuracy                           0.74       891
   macro avg       0.56      0.54      0.53       891
weighted avg       0.69      0.74      0.71       891

[[624  71]
 [162  34]]
ACC:  0.7384960718294051

In [33]:

#Decision tree

dt= DecisionTreeClassifier()
dt.fit(X_train,y_train)
y_pred_dt = dt.predict(X_test)
print('Acc:{}'.format(accuracy_score(y_test,y_pred_dt)))
print('F1 score:', f1_score(y_test, y_pred_dt))

print('Recall:', recall_score(y_test, y_pred_dt))

print('Precision:', precision_score(y_test, y_pred_dt))
print('Jaccard:', jaccard_score(y_test, y_pred_dt))
print('Confusion Matrix:{}'.format(confusion_matrix(y_test,y_pred_dt)))

scoring = 'accuracy'
kfold = KFold(n_splits = 10)
cv_results = cross_val_score(dt, X_train, y_train, cv = kfold, scoring = scoring)
cv_results

#Summary of the predictions made by the classifier
print(classification_report(y_test, y_pred_dt))
print(confusion_matrix(y_test, y_pred_dt))
#Accuracy score
from sklearn.metrics import accuracy_score
print("ACC: ",accuracy_score(y_pred_dt,y_test))
Acc:0.6734006734006734
F1 score: 0.34606741573033706
Recall: 0.39285714285714285
Precision: 0.3092369477911647
Jaccard: 0.20923913043478262
Confusion Matrix:[[523 172]
 [119  77]]
              precision    recall  f1-score   support

       False       0.81      0.75      0.78       695
        True       0.31      0.39      0.35       196

    accuracy                           0.67       891
   macro avg       0.56      0.57      0.56       891
weighted avg       0.70      0.67      0.69       891

[[523 172]
 [119  77]]
ACC:  0.6734006734006734

In [34]:

#Random forest

rf= RandomForestClassifier()
rf.fit(X_train,y_train)
y_pred_rf = rf.predict(X_test)
print('Acc:{}'.format(accuracy_score(y_test,y_pred_rf)))
print('F1 score:', f1_score(y_test, y_pred_rf))

print('Recall:', recall_score(y_test, y_pred_rf))

print('Precision:', precision_score(y_test, y_pred_rf))
print('Jaccard:', jaccard_score(y_test, y_pred_rf))
print('Confusion Matrix:{}'.format(confusion_matrix(y_test,y_pred_rf)))

scoring = 'accuracy'
kfold = KFold(n_splits = 10)
cv_results = cross_val_score(rf, X_train, y_train, cv = kfold, scoring = scoring)
cv_results

#Summary of the predictions made by the classifier
print(classification_report(y_test, y_pred_rf))
print(confusion_matrix(y_test, y_pred_rf))
#Accuracy score
from sklearn.metrics import accuracy_score
print("ACC: ",accuracy_score(y_pred_rf,y_test))
Acc:0.7833894500561167
F1 score: 0.3275261324041812
Recall: 0.23979591836734693
Precision: 0.5164835164835165
Jaccard: 0.19583333333333333
Confusion Matrix:[[651  44]
 [149  47]]
              precision    recall  f1-score   support

       False       0.81      0.94      0.87       695
        True       0.52      0.24      0.33       196

    accuracy                           0.78       891
   macro avg       0.67      0.59      0.60       891
weighted avg       0.75      0.78      0.75       891

[[651  44]
 [149  47]]
ACC:  0.7833894500561167

In [36]:

#Support vector

svm= SVC()
svm.fit(X_train,y_train)
y_pred_svm = svm.predict(X_test)
print('Acc:{}'.format(accuracy_score(y_test,y_pred_svm)))
print('F1 score:', f1_score(y_test, y_pred_svm))

print('Recall:', recall_score(y_test, y_pred_svm))

print('Precision:', precision_score(y_test, y_pred_svm))
print('Jaccard:', jaccard_score(y_test, y_pred_svm))
print('Confusion Matrix:{}'.format(confusion_matrix(y_test,y_pred_svm)))

scoring = 'accuracy'
kfold = KFold(n_splits = 10)
cv_results = cross_val_score(svm, X_train, y_train, cv = kfold, scoring = scoring)
cv_results

#Summary of the predictions made by the classifier
print(classification_report(y_test, y_pred_svm))
print(confusion_matrix(y_test, y_pred_svm))
#Accuracy score
from sklearn.metrics import accuracy_score
print("ACC: ",accuracy_score(y_test,y_pred_svm))
Acc:0.7800224466891134
F1 score: 0.019999999999999997
Recall: 0.01020408163265306
Precision: 0.5
Jaccard: 0.010101010101010102
Confusion Matrix:[[693   2]
 [194   2]]
              precision    recall  f1-score   support

       False       0.78      1.00      0.88       695
        True       0.50      0.01      0.02       196

    accuracy                           0.78       891
   macro avg       0.64      0.50      0.45       891
weighted avg       0.72      0.78      0.69       891

[[693   2]
 [194   2]]
ACC:  0.7800224466891134

In [ ]:

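#Random forest has the highest accuracy (0.783) and precision (0.52), but only narrowly: SGD and SVM reach 0.780 by labelling almost every module as non-defective
#On the defective class, the decision tree has the best recall (0.39) and F1 (0.35)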
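
The cells above compute `cv_results` with accuracy scoring but never display it. A compact alternative, sketched here on synthetic data, loops over a few of the same model classes and reports cross-validated F1 on the positive class. The hyperparameters and fold settings are illustrative choices, and the resulting numbers say nothing about JM1.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import StratifiedKFold, cross_val_score
from sklearn.tree import DecisionTreeClassifier

# Synthetic, imbalanced stand-in data; not JM1
X_toy, y_toy = make_classification(n_samples=400, n_features=6,
                                   weights=[0.8, 0.2], random_state=0)

models = {
    "logistic_regression": LogisticRegression(max_iter=1000),
    "decision_tree": DecisionTreeClassifier(random_state=0),
    "random_forest": RandomForestClassifier(n_estimators=50, random_state=0),
}

# Stratified folds keep the class ratio the same in every fold
cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=0)

summary = {}
for name, model in models.items():
    scores = cross_val_score(model, X_toy, y_toy, cv=cv, scoring="f1")
    summary[name] = scores.mean()

# Best mean F1 first
for name, score in sorted(summary.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{name}: mean F1 = {score:.3f}")
```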
