Food Processing is the conversion of raw food into forms suited to modern dietary habits. The entire process should preserve the nutritional value of the food and ensure that no poisonous or harmful substances enter it. For this reason, Food Safety and Quality has recently attracted intense interest from the scientific community and the public. Analyzing and maintaining food quality and safety is a complex and tedious process. Food Processing comprises various stages such as (i) cultivating, (ii) harvesting, (iii) storage and (iv) preparation and consumption. Grading can be termed the most crucial step in maintaining food quality.

Grading is a step in food processing that involves sorting food products based on morphological parameters such as size, on quality and sometimes on other criteria such as colour or ripeness. Grading is commonly used for fruits and vegetables, but can also be applied to other foods such as meat, grains and dairy products. The grading process can involve visual inspection, mechanical sorting or a combination of both. Food products are usually graded according to a set of industry standards defining each grade’s characteristics. These standards may vary depending on the type of food and the country or region where it is produced.

Grading is a labour-intensive task that demands close attention from human graders. Implementing technology in the grading step of food processing can provide several benefits over traditional human grading systems. Technology-based grading systems can be more consistent, accurate and objective than human grading systems. They can process large volumes of food products quickly and efficiently, reducing the time and labour required for grading. Additionally, they can help reduce the biases or inconsistencies that affect human graders, such as fatigue or personal preferences. While the initial investment in technology may be higher, it can lead to increased marketability and profitability in the long run.

On the other hand, human grading systems can be subjective, inconsistent and prone to errors, particularly if graders are fatigued or distracted. They can also be labour-intensive and may not be scalable to accommodate large volumes of food products. Therefore, implementing technology-based grading systems can help ensure consistent and accurate grading of food products, leading to improved quality and marketability.

With advancing technology, Machine Vision Systems (MVSs) have entered various fields like biotech, agri-tech and foodtech. Owing to their high accuracy and efficiency, MVSs have reduced labour-intensive work in major industries. Machine Vision Systems acquire image data from various sources and techniques, such as land-based and aerial technologies, to perform operations in industries like agro-based industries, food industries, etc. MVSs can be used in crop monitoring, foreign object detection, agricultural produce grading, etc.

The role of MVS in food processing industries is to collect various morphological characteristics of raw materials such as colour, size, shape and texture. MVS also accurately detects morphological characteristics not visible to the naked human eye.

MVS has recently been applied in the food industry in various ways, such as food process monitoring, foreign object detection and food safety and quality evaluation. In operation, the MVS guides the working of the machinery. Various approaches, such as spectroscopic technologies and multimode tabletop systems, have been developed for food safety and quality evaluation. Spectroscopic methods are routinely used to analyze the composition and quality of food items. For instance, infrared spectroscopy may be used to identify the presence of contaminants in food, such as pesticides or heavy metals. Raman spectroscopy may determine the presence of certain compounds in food, such as sugar or fat. Because they can perform several spectroscopic procedures in one instrument, multimode tabletop systems can be valuable in food safety and quality evaluation, enabling a more thorough examination of food items. Furthermore, multimode tabletop systems are frequently compact and portable, making them suitable for use in various settings such as food processing plants or inspection sites. Another method now in use predicts the number of days for which a fruit can be stored or transported before it rots. It uses Support Vector Regression (SVR), which is related to Support Vector Machines (SVM).

MVS enables us to capture, observe, assess, evaluate and recognize animate or inanimate objects. MVSs can utilize one or more cameras for this detailed process. The data acquired from the MVS can be utilized to control manufacturing processes in industries.

The main parts of MVS include the following:

1. Digital Cameras: To capture the input;
2. Image Processing Programs: To process the images captured from the cameras;
3. Mechanical System: For Inspection and Quality Control in Industries.

Main Parts of MVS

The Flow of Work in a Machine Vision System:

Object: The sample that acts as an input to the Machine Vision System is an object. E.g. fruits, vegetables, etc.

Image Acquisition: At this step, the MVS obtains the images of the object via different imaging techniques, including but not limited to Real-time photographs, X-rays, Thermal Imaging, Remote sensing, MRI, etc.

Image Processing: At this step, the MVS produces new images based on existing ones in order to extract and enhance the region of interest. Methods include segmentation, in which a picture is subdivided into several parts for easier interpretation, and feature extraction.

Low-Level Processing: In Machine Vision Systems for the food industry, low-level processing includes capturing digital images of food samples using various equipment, followed by pre-processing to improve image quality. Image enhancement, cropping and noise reduction are examples of pre-processing steps that support accurate analysis and quality control.
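As a rough illustration of the pre-processing mentioned above, the sketch below applies a 3x3 median filter (a simple form of noise reduction) and a linear contrast stretch (a simple form of enhancement) to a tiny greyscale image. The image is stored as a plain list of lists, and the function names `median_filter` and `stretch_contrast` are our own for this example, not part of any particular MVS product.

```python
from statistics import median


def median_filter(img):
    """3x3 median filter on a 2-D list of grey levels.

    Border pixels are copied unchanged to keep the sketch short.
    """
    h, w = len(img), len(img[0])
    out = [row[:] for row in img]
    for y in range(1, h - 1):
        for x in range(1, w - 1):
            window = [img[y + dy][x + dx] for dy in (-1, 0, 1) for dx in (-1, 0, 1)]
            out[y][x] = median(window)
    return out


def stretch_contrast(img, lo=0, hi=255):
    """Linearly rescale grey levels so the darkest pixel maps to lo, the brightest to hi."""
    flat = [p for row in img for p in row]
    mn, mx = min(flat), max(flat)
    if mx == mn:  # flat image: nothing to stretch
        return [row[:] for row in img]
    scale = (hi - lo) / (mx - mn)
    return [[round(lo + (p - mn) * scale) for p in row] for row in img]


# Made-up 4x4 sample with one bright "salt" pixel acting as noise.
noisy = [
    [10, 10, 10, 10],
    [10, 255, 10, 10],
    [10, 10, 12, 10],
    [10, 10, 10, 10],
]
clean = median_filter(noisy)                             # the outlier at (1, 1) is suppressed
stretched = stretch_contrast([[50, 100], [150, 200]])    # grey levels now span 0..255
```

In practice an image library would do this far faster; the point here is only to show what noise reduction and enhancement mean at pixel level.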

Intermediate-Level Processing: Image segmentation, image representation (boundary and region) and image description are the three essential processes in intermediate-level image analysis for the food industry. These processes help to separate the pertinent information, describe the size, shape, texture and flaws of food samples, and extract quantitative data for further in-depth analysis and quality control.

High-Level Processing: Image recognition and image interpretation are examples of high-level processing. Statistical or deep learning approaches are typically used to classify the target during this stage. The results often determine how downstream equipment operates by supplying the information it needs.

Image Interpretation: Image interpretation is the process of identifying the features seen in the images. The processed image is analyzed using algorithms such as K-Nearest Neighbors (KNN), Support Vector Machine (SVM), Neural Networks and Genetic Algorithms.


Methodologies include data augmentation, image segmentation, feature extraction and classification. These are discussed in detail below:

1. Data Augmentation

Data Augmentation is a machine learning and computer vision approach that generates different versions of existing data to artificially increase the size of a dataset. It entails transforming the original data, for example by rotating, flipping, scaling or adding noise, to generate new samples that retain the same data distribution. Data augmentation aims to improve the model’s ability to generalize to new, previously unseen data by exposing it to a broader range of examples. When working with small or imbalanced datasets, data augmentation can help avoid overfitting and improve the performance of machine learning models.
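To make the idea concrete, here is a minimal sketch of three of the transformations named above (flipping, rotating and adding noise) on a greyscale image stored as a list of lists. The helper names, the noise level of 5 grey levels and the fixed seed are assumptions chosen for the example only.

```python
import random


def flip_horizontal(img):
    """Mirror the image left to right."""
    return [row[::-1] for row in img]


def rotate_90(img):
    """Rotate the image clockwise by 90 degrees."""
    return [list(col) for col in zip(*img[::-1])]


def add_noise(img, sigma, rng):
    """Add Gaussian noise to each pixel and clip the result to 0..255."""
    return [
        [min(255, max(0, round(p + rng.gauss(0, sigma)))) for p in row]
        for row in img
    ]


def augment(img, seed=0):
    """Return three extra variants of one image (illustrative choice of transforms)."""
    rng = random.Random(seed)  # fixed seed so the example is repeatable
    return [flip_horizontal(img), rotate_90(img), add_noise(img, 5.0, rng)]


original = [[1, 2], [3, 4]]
variants = augment(original)
# variants[0] is the mirrored image, variants[1] the rotated one, variants[2] a noisy copy
```

Each original image therefore yields several training samples that keep the same label, which is how the dataset grows without any new photographs being taken.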

2. Image Segmentation

Image segmentation is an essential part of image understanding and one of the most challenging tasks in image processing. Segmentation methods include region-based segmentation, machine learning-based methods and thresholding. In practical applications, images are often acquired in natural light to obtain realistic results, which can make it difficult to find a suitable threshold and accurate edges for segmenting the target. K-means is used here to address this problem. K-means is a clustering technique used in machine learning and data mining to divide a dataset into K groups based on data point similarity. Each data point is assigned to the nearest cluster centroid and the centroids are recalculated until convergence is reached. To reduce noise and improve image contrast, a rank filter and a log transformation are applied before K-means.
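The sketch below shows the K-means loop described above (assign each point to the nearest centroid, recompute the centroids, repeat until nothing changes) on the grey levels of a tiny image, preceded by a log transformation. It is a simplified illustration: it clusters intensity only, starts the centroids evenly spread over the value range, needs K of at least 2 and leaves out the noise filter. The function names are ours.

```python
import math


def log_transform(img):
    """Compress the dynamic range with s = c * log(1 + r), scaled to 0..255."""
    c = 255 / math.log(256)
    return [[c * math.log(1 + p) for p in row] for row in img]


def kmeans_1d(values, k, iters=50):
    """Plain K-means on scalar values (k >= 2); returns the final centroids."""
    lo, hi = min(values), max(values)
    centroids = [lo + (hi - lo) * i / (k - 1) for i in range(k)]
    for _ in range(iters):
        clusters = [[] for _ in range(k)]
        for v in values:
            nearest = min(range(k), key=lambda j: abs(v - centroids[j]))
            clusters[nearest].append(v)
        # An empty cluster keeps its previous centroid.
        new = [sum(c) / len(c) if c else centroids[j] for j, c in enumerate(clusters)]
        if new == centroids:  # converged: assignments no longer change
            break
        centroids = new
    return centroids


def segment(img, k=2):
    """Label every pixel with the index of its nearest centroid (0 = darkest cluster)."""
    flat = [p for row in img for p in row]
    centroids = kmeans_1d(flat, k)
    return [[min(range(k), key=lambda j: abs(p - centroids[j])) for p in row] for row in img]


# Made-up 3x3 sample: dark background with a bright object on the right.
image = [
    [20, 22, 200],
    [21, 210, 205],
    [19, 23, 198],
]
labels = segment(log_transform(image), k=2)
```

No threshold has to be chosen by hand here; the two centroids settle on the background and object intensities by themselves, which is the appeal of clustering under uneven lighting.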

3. Feature Extraction

Every image has its features and characteristics that distinguish it from other types of images. Some features are natural such as brightness, edges and colour. Some images require transformation and processing to obtain principal components. These features must be extracted as numerical values, so that the computer can analyze and understand them. Basic image features include texture features, shape features and colour features.

(i) Texture features: Texture characteristics are visual patterns and structures that may be measured and analyzed to glean information about an image’s content.

(ii) Colour features: The distribution and qualities of colours inside a picture are called colour features. Colour characteristics may be retrieved and analyzed to offer information about image content.

(iii) Shape features: Shape characteristics, which relate to the geometric aspects of objects in an image and the spatial relationships between them, are a fundamental component of a picture. Analyzing them offers information such as the size, orientation and placement of objects within the image.
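As a hedged example of turning the three feature types into numbers, the sketch below computes a mean colour per channel, a very crude texture measure (the standard deviation of grey levels) and a few shape measurements from a binary mask. Real systems usually use richer descriptors; the function names and sample values here are made up for illustration.

```python
from statistics import mean, pstdev


def colour_features(rgb_pixels):
    """Mean of each channel over a list of (r, g, b) tuples."""
    r, g, b = zip(*rgb_pixels)
    return mean(r), mean(g), mean(b)


def texture_feature(grey_pixels):
    """Standard deviation of grey levels: 0 for a perfectly smooth surface."""
    return pstdev(grey_pixels)


def shape_features(mask):
    """Area, bounding-box size and aspect ratio of the foreground (1s) in a binary mask.

    Assumes the mask contains at least one foreground pixel.
    """
    coords = [(y, x) for y, row in enumerate(mask) for x, v in enumerate(row) if v]
    ys, xs = zip(*coords)
    height = max(ys) - min(ys) + 1
    width = max(xs) - min(xs) + 1
    return {"area": len(coords), "height": height, "width": width,
            "aspect_ratio": width / height}


pixels = [(200, 50, 10), (100, 150, 30)]            # two made-up RGB pixels
mask = [[0, 1, 1, 0], [0, 1, 1, 0], [0, 0, 0, 0]]   # a 2x2 object

colour = colour_features(pixels)         # (150, 100, 20)
smooth = texture_feature([10, 10, 10, 10])
shape = shape_features(mask)
```

Concatenating such values gives one numerical feature vector per fruit image, which is the form a classifier expects.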

4. Classification

Let’s take a fruit image dataset and work on it. Here, classification is divided into two steps. The first step is to feed the extracted features to a model that separates the fruit images into ripened, unripened and over-ripened. To do this, we will apply three models in this article: (i) K-Nearest Neighbors (KNN), (ii) Support Vector Machine (SVM) and (iii) Naïve Bayes (NB). All three can handle high-dimensional data and are efficient too.
(i) K-Nearest Neighbors
It is one of the basic machine learning methods. It classifies a test sample by comparing it with the labelled samples in the training set and assigning the most common label among its k nearest neighbours.
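A minimal KNN sketch follows. The two features per fruit (mean red and mean green intensity) and all sample values are invented for illustration and are not taken from a real dataset.

```python
import math
from collections import Counter


def knn_predict(train_X, train_y, sample, k=3):
    """Return the majority label among the k training points closest to sample."""
    dists = sorted((math.dist(x, sample), label) for x, label in zip(train_X, train_y))
    votes = Counter(label for _, label in dists[:k])
    return votes.most_common(1)[0][0]


# Hypothetical training set: (mean red, mean green) per fruit image.
train_X = [
    (60, 180), (70, 170), (65, 175),    # mostly green
    (200, 80), (210, 70), (190, 90),    # mostly red
    (120, 60), (110, 50), (115, 55),    # dark and dull
]
train_y = ["unripened"] * 3 + ["ripened"] * 3 + ["over-ripened"] * 3

prediction = knn_predict(train_X, train_y, (205, 75))
```

KNN has no training phase as such: the stored samples are the model, so prediction cost grows with the size of the training set.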

(ii) Support Vector Machine
It is a supervised learning method widely used in statistical classification and regression analysis.
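For intuition, here is a small sketch of a linear SVM trained by sub-gradient descent on the regularised hinge loss. It is a simplification under several assumptions: only two classes (ripened as +1, not ripened as -1), two made-up features scaled to the range 0 to 1, and hand-picked values for the learning rate, regularisation strength and number of epochs. A production system would more likely rely on an established library and a multi-class scheme.

```python
def train_linear_svm(X, y, lam=0.01, epochs=200, lr=0.1):
    """Per-sample sub-gradient descent on the hinge loss; labels must be +1 or -1."""
    w = [0.0] * len(X[0])
    b = 0.0
    for _ in range(epochs):
        for xi, yi in zip(X, y):
            margin = yi * (sum(wj * xj for wj, xj in zip(w, xi)) + b)
            if margin < 1:
                # Point is inside the margin or misclassified: move the boundary.
                w = [wj - lr * (lam * wj - yi * xj) for wj, xj in zip(w, xi)]
                b += lr * yi
            else:
                # Point is safely classified: apply only the regularisation shrinkage.
                w = [wj - lr * lam * wj for wj in w]
    return w, b


def svm_predict(w, b, x):
    """Return +1 or -1 depending on which side of the hyperplane x falls."""
    return 1 if sum(wj * xj for wj, xj in zip(w, x)) + b >= 0 else -1


# Hypothetical features: (redness, greenness), each scaled to 0..1.
X = [(0.2, 0.7), (0.25, 0.65), (0.3, 0.6), (0.8, 0.3), (0.75, 0.25), (0.85, 0.35)]
y = [-1, -1, -1, 1, 1, 1]

w, b = train_linear_svm(X, y)
ripe = svm_predict(w, b, (0.9, 0.2))
unripe = svm_predict(w, b, (0.1, 0.8))
```

The hinge loss only penalises points that fall within the margin, which is why the final boundary depends on the few samples closest to it (the support vectors).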

(iii) Naïve Bayes
It is a supervised machine learning method for classification tasks like text classification.
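For numerical image features, a common variant is Gaussian Naïve Bayes, which models each feature within each class as an independent normal distribution. The sketch below is an illustration under that assumption; the features (mean red, mean green) and the sample values are invented.

```python
import math
from collections import defaultdict
from statistics import mean, pvariance


def fit_gaussian_nb(X, y):
    """Estimate the prior, per-feature mean and per-feature variance for each class."""
    by_class = defaultdict(list)
    for xi, yi in zip(X, y):
        by_class[yi].append(xi)
    model = {}
    for label, rows in by_class.items():
        cols = list(zip(*rows))
        model[label] = {
            "prior": len(rows) / len(X),
            "means": [mean(c) for c in cols],
            "vars": [pvariance(c) + 1e-6 for c in cols],  # small floor avoids division by zero
        }
    return model


def predict_gaussian_nb(model, x):
    """Pick the class with the highest log posterior under the independence assumption."""
    def log_posterior(params):
        total = math.log(params["prior"])
        for v, m, s2 in zip(x, params["means"], params["vars"]):
            total += -0.5 * math.log(2 * math.pi * s2) - (v - m) ** 2 / (2 * s2)
        return total
    return max(model, key=lambda label: log_posterior(model[label]))


# Hypothetical training set: (mean red, mean green) per fruit image.
train_X = [
    (60, 180), (70, 170), (65, 175),
    (200, 80), (210, 70), (190, 90),
    (120, 60), (110, 50), (115, 55),
]
train_y = ["unripened"] * 3 + ["ripened"] * 3 + ["over-ripened"] * 3

model = fit_gaussian_nb(train_X, train_y)
prediction = predict_gaussian_nb(model, (205, 75))
```

Working with log probabilities, as done here, avoids numerical underflow when many features are multiplied together.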

Now, the next step is to feed the fruit images that the SVM classified as ripened into YOLOv3.

YOLOv3 (‘You Only Look Once’, version 3) is an object detection algorithm based on deep neural networks (DNNs). Some of the significant advantages of YOLO are: (i) high accuracy, (ii) ease of use and (iii) speed and efficiency. To maintain the speed advantage, YOLOv3 is adopted instead of the original YOLO. In YOLOv3, object classification uses independent logistic classifiers instead of softmax, which increases prediction accuracy, especially for small objects. Here, YOLOv3 detects small spots in fruit images and offers a fast response to the application.
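The difference between logistic and softmax scoring can be seen on a handful of raw class scores. The sketch below is not YOLOv3 itself, only the two scoring functions; the class names and score values are hypothetical.

```python
import math


def sigmoid(z):
    """Logistic function: maps one raw score to an independent probability."""
    return 1 / (1 + math.exp(-z))


def softmax(scores):
    """Maps raw scores to probabilities that compete with each other and sum to 1."""
    m = max(scores)  # subtract the maximum for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


# Hypothetical raw scores for one detected region.
classes = ["spot", "bruise", "stem"]
logits = [2.0, 1.5, -1.0]

independent = [sigmoid(z) for z in logits]   # each class is judged on its own
exclusive = softmax(logits)                  # classes share a total probability of 1
```

With logistic scoring, "spot" and "bruise" can both receive a high probability for the same region, whereas softmax forces them to share a total of 1. This matters when labels overlap rather than exclude one another.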


This article presents a two-layer classifier as the main component of a system that grades fruit images as ripened, unripened or over-ripened. YOLOv3 can detect ripeness and unripeness in fruits rapidly and efficiently.

About the Authors:

Authors - Kushagra Agrawal & Nisharg Nargund



