Muhammadu Ilyas
May 7, 2022

Wine Variety Classification Challenge (Microsoft Learn Classification Challenge)

Wine experts can identify wines from specific vineyards through smell and taste, but the factors that give different wines their individual characteristics are actually based on their chemical composition.

In this challenge, I trained a classification model to analyse the chemical and visual features of wine samples and classify them by their cultivar (grape variety).

Citation: The data used in this exercise was originally collected by Forina, M. et al.

TASK

The challenge is to explore the data and train a classification model that achieves an overall Recall metric of over 0.95 (95%).

First, I downloaded the data set in .csv format and saved it in a folder set aside for this project. I then created a Jupyter notebook and imported the relevant libraries, modules and packages.

Note that I suppressed warning messages in the notebook.

The data set had already been loaded into the notebook with pandas' read_csv(). Next, I inspected the first rows of the data frame with the DataFrame .head() method.
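The loading step can be sketched as follows. The article's CSV file name is not given, so the read_csv path below is hypothetical and commented out; to keep the sketch runnable, it instead uses scikit-learn's bundled copy of the same UCI wine data (load_wine), whose column names differ from the article's and whose label column is called "target".

```python
import pandas as pd
from sklearn.datasets import load_wine

# What the notebook does, with a hypothetical file name:
# wine_df = pd.read_csv("wine.csv")

# Runnable stand-in: the same UCI wine data shipped with scikit-learn.
# as_frame=True returns a DataFrame with 13 feature columns plus "target".
wine_df = load_wine(as_frame=True).frame

# Show the first five rows
print(wine_df.head())
```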

All feature columns in the data set are numeric, each holding the measured value of a chemical or visual property of the wine samples.

I also summarised the data statistically.

There are no null or missing values in the data set, and every column contains 178 observations.
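A sketch of how the summary and the missing-value check can be produced with pandas, again assuming scikit-learn's load_wine copy of the data as a stand-in for the article's CSV.

```python
from sklearn.datasets import load_wine

wine_df = load_wine(as_frame=True).frame

# Count, mean, std, min, quartiles and max for every column
summary = wine_df.describe()
print(summary)

# Missing values per column; all zeros means there are no nulls
missing = wine_df.isnull().sum()
print(missing)

# The "count" row shows how many non-null observations each column has
print(summary.loc["count"].unique())
```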

The ‘Proline’ column has by far the largest values, on a much larger scale than the other features. Hence it should either be rescaled during pre-processing or left out when fitting the model.
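One way to confirm the scale difference and apply the pre-processing option is sketched below. Standardisation with StandardScaler is an assumed choice (the article does not name a technique), and the data comes from scikit-learn's load_wine copy, where the column is spelled "proline".

```python
from sklearn.datasets import load_wine
from sklearn.preprocessing import StandardScaler

features = load_wine(as_frame=True).data

# Largest value in each column, biggest first: proline dominates
col_max = features.max().sort_values(ascending=False)
print(col_max.head(3))

# Standardisation gives every column mean 0 and unit variance, so no
# feature dominates because of its units (this matters for regularised
# models such as logistic regression).
scaled = StandardScaler().fit_transform(features)
print(scaled.mean(axis=0).round(6))
print(scaled.std(axis=0).round(6))
```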

The data frame contains 14 columns (13 features and a label).

FEATURES: Alcohol, Malic acid, Ash, Alcalinity, Magnesium, Phenols, Flavanoids, Nonflavanoids, Proanthocyanins, Color intensity, Hue, OD280_315_of_diluted_wines and Proline.

LABEL: Wine Variety

The label contains three classes:

0 (variety A)

1 (variety B)

2 (variety C)

I created the list of features and the label separately to simplify the classification process.
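A sketch of that separation. It assumes scikit-learn's load_wine copy of the data, where the label column is named "target"; in the article's CSV the label column carries a different name.

```python
from sklearn.datasets import load_wine

wine_df = load_wine(as_frame=True).frame

# Label column name in the scikit-learn copy of the data
label = "target"

# Every other column is a feature
features = [col for col in wine_df.columns if col != label]

X = wine_df[features]  # feature matrix: one row per wine sample
y = wine_df[label]     # class label: 0, 1 or 2
print(len(features), X.shape, y.shape)
```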

DATA VISUALIZATION

First, I compared how the values of each feature are distributed across the class labels, to find any features that would contribute little to, or even hinder, the prediction.

Not every feature has to be used to fit and train the model; only features with strong predictive power need to be included.

All feature columns showed good predictive power, so they can all be used to fit the model.
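The comparison can be approximated numerically by looking at each feature's median within each class; features whose medians differ clearly between classes are likely to help separate them. This is a sketch on scikit-learn's load_wine copy of the data, and the plotted version (box plots per class) is shown only as a comment because it needs matplotlib.

```python
from sklearn.datasets import load_wine

wine_df = load_wine(as_frame=True).frame

# Median of every feature within each class (rows: classes, cols: features)
per_class = wine_df.groupby("target").median()

# Transpose so each feature is a row and the three classes sit side by side
print(per_class.T)

# Visual version of the same comparison (requires matplotlib):
# for col in per_class.columns:
#     wine_df.boxplot(column=col, by="target")
```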

I also checked the class label distribution to make sure it is balanced. If it were unbalanced, techniques such as random oversampling, undersampling or SMOTE (which creates synthetic minority-class samples) could be used to correct this.

The data is reasonably balanced: the differences between the class counts are small.
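A sketch of the balance check, using the label from scikit-learn's load_wine copy of the data as a stand-in for the article's CSV.

```python
from sklearn.datasets import load_wine

y = load_wine(as_frame=True).target

# Number of samples in each class, ordered by class label
counts = y.value_counts().sort_index()
print(counts)

# Share of each class; values near 1/3 indicate a balanced three-class label
print((counts / counts.sum()).round(3))
```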

NOTE

I decided to do all further preprocessing inside a machine learning pipeline, since a pipeline can be reused for other cases or data sets.

I split the data into a training (fitting) set and a test (validation) set by partitioning the data frame along axis 0, i.e. by rows only.
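A sketch of a row-wise split. It swaps in scikit-learn's train_test_split rather than manual slicing, because it shuffles and can stratify by class; the 70/30 ratio, random_state and stratification are assumptions, not values from the article.

```python
from sklearn.datasets import load_wine
from sklearn.model_selection import train_test_split

X, y = load_wine(return_X_y=True, as_frame=True)

# Split rows (axis 0) into training and test sets; stratify=y keeps the
# class proportions similar in both parts.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.30, random_state=0, stratify=y
)
print(X_train.shape, X_test.shape)
```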

For the machine learning algorithm, I used a logistic regression model.

I chose it because the data set is small and simple, so a simple model such as logistic regression should be sufficient.

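When fitting the model, note that the multi_class hyperparameter was set to 'auto'. With this setting, scikit-learn picks the multi-class strategy itself (a multinomial model for solvers such as the default lbfgs, one-vs-rest for liblinear), so the model can handle a label with more than two classes.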
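A minimal pipeline sketch combining scaling and logistic regression, assuming StandardScaler as the preprocessing step and scikit-learn's load_wine copy of the data. multi_class is left at its default here: 'auto' behaviour is already the default, and recent scikit-learn releases deprecate the parameter.

```python
from sklearn.datasets import load_wine
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

X, y = load_wine(return_X_y=True, as_frame=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.30, random_state=0, stratify=y
)

# Preprocessing and model in one reusable object: the scaler is fitted on
# the training data only and re-applied automatically when predicting.
model = Pipeline(steps=[
    ("scaler", StandardScaler()),
    # Default multi-class handling: multinomial with the lbfgs solver
    ("logreg", LogisticRegression(max_iter=1000)),
])
model.fit(X_train, y_train)
print(model.classes_)
```

Because the whole pipeline is one estimator, the same object can later be saved and applied to new data without repeating the scaling by hand.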

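To evaluate the model, I used the ROC AUC score, comparing the true labels of the test set with the model's predictions. For a multi-class label, this score is computed from the predicted class probabilities, typically averaged one-vs-rest.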

The confusion matrix is also visualized.
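A sketch of the evaluation step. It assumes macro-averaged recall as the challenge's "overall recall" and one-vs-rest averaging for ROC AUC; the data split and model mirror the assumptions used elsewhere (load_wine stand-in, 70/30 stratified split, scaled logistic regression).

```python
from sklearn.datasets import load_wine
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import confusion_matrix, recall_score, roc_auc_score
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

X, y = load_wine(return_X_y=True, as_frame=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.30, random_state=0, stratify=y
)
model = make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000))
model.fit(X_train, y_train)

predictions = model.predict(X_test)          # hard class labels
probabilities = model.predict_proba(X_test)  # one probability column per class

# Macro-averaged recall: the challenge target is above 0.95
recall = recall_score(y_test, predictions, average="macro")

# Multi-class ROC AUC is computed from probabilities, one-vs-rest
auc = roc_auc_score(y_test, probabilities, multi_class="ovr")

# Rows are true classes, columns are predicted classes
cm = confusion_matrix(y_test, predictions)
print(f"Recall: {recall:.3f}  ROC AUC: {auc:.3f}")
print(cm)

# To draw it (requires matplotlib):
# from sklearn.metrics import ConfusionMatrixDisplay
# ConfusionMatrixDisplay(confusion_matrix=cm).plot()
```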

During training, the model learns the relationship between the features and the wine variety; this learned relationship is what makes the model useful when it is deployed.

The trained model is then saved to a file directory as a pickle file.
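A sketch of saving the fitted pipeline with the standard-library pickle module. The temporary directory stands in for the project folder and the file name "wine_model.pkl" is hypothetical; the model is fitted on scikit-learn's load_wine copy of the data.

```python
import pickle
import tempfile
from pathlib import Path

from sklearn.datasets import load_wine
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

X, y = load_wine(return_X_y=True, as_frame=True)
model = make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000))
model.fit(X, y)

# Temporary folder as a stand-in for the project directory
out_dir = Path(tempfile.mkdtemp())
model_path = out_dir / "wine_model.pkl"  # hypothetical file name

# Serialise the whole pipeline (scaler + classifier) in binary mode
with open(model_path, "wb") as f:
    pickle.dump(model, f)

print(model_path, model_path.stat().st_size, "bytes")
```

Only unpickle files from sources you trust, since loading a pickle can execute arbitrary code.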

To complete the challenge, the model is used to predict the variety of a new set of wine observations.
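A sketch of predicting on new observations. The two rows of feature values are hypothetical, for illustration only, and the model is refitted inline on scikit-learn's load_wine copy of the data so the snippet runs on its own; in the notebook it would instead be the pipeline restored with pickle.load.

```python
import pandas as pd
from sklearn.datasets import load_wine
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

wine = load_wine(as_frame=True)
model = make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000))
model.fit(wine.data, wine.target)

# Hypothetical new wines: same 13 features, same column order as training
new_wines = pd.DataFrame(
    [
        [13.72, 1.43, 2.5, 16.7, 108, 3.4, 3.67, 0.19, 2.04, 6.8, 0.89, 2.87, 1285],
        [12.37, 0.94, 1.36, 10.6, 88, 1.98, 0.57, 0.28, 0.42, 1.95, 1.05, 1.82, 520],
    ],
    columns=wine.feature_names,
)

# Predicted class index and its name for each new wine
predicted = model.predict(new_wines)
names = [wine.target_names[i] for i in predicted]
print(list(zip(predicted, names)))
```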

To view the code for this project, please visit the notebook section of this repository, and do not forget to give it a star:

https://github.com/Gbekoilias/Classification-Challenge

Thank you for reading! I'm happy to receive your suggestions and recommendations.
