Deep Learning Approaches To Image Classification: Exploring The Future Of Visual Data Analysis

Manikanth Sarisa; Venkata Nagesh Boddapati; Gagan Kumar Patra; Chandrababu Kuraku; Siddharth Konkimalla

doi:10.53555/kuey.v28i4.7863

Published: Nov 25, 2022

Keywords:

Deep Learning, Image Classification, Vision Transformer (ViT), Transfer Learning, Drone-Captured Images, Convolutional Neural Networks, Attention Mechanisms, Multi-Modality Input, Architectural Vision Transformer, Dataset

Abstract

Over the past decade, deep learning has emerged as a revolutionary technology for the analysis of visual data, particularly images. This master's thesis focuses on deep learning approaches to image classification, a key task in many applications of visual data analysis. A state-of-the-art deep learning model, the Vision Transformer (ViT), is explored for image classification. ViT is trained using transfer-learning techniques on a new dataset of over 350,000 photographs of European buildings in eight cities, captured by a drone-mounted camera across two separate flights. Initial results demonstrate that models pre-trained on large datasets such as JFT-300M can perform competitively with fine-tuned models that were trained from scratch on smaller datasets, and that ViT outperforms convolutional neural networks on drone-captured images. Further, the prospects of deep learning for image classification are discussed, highlighting the potential impact of new research directions in vision transformer architectures (e.g., Swin Transformer, CrossViT, T2T-ViT) and new training techniques (e.g., vision-language pre-training, multi-modality input).

The exponential increase in data generated by cameras, mobile devices, and Internet-of-Things (IoT) sensors has escalated the need for automated processing and analysis of visual data. Images and video frames are also a popular medium for data collection across many domains, including commercial and industrial settings. Image classification, the task of finding the most relevant label for a given photograph, is central to many of these applications. Popular examples include multimedia search engines, mobile applications that navigate to points of interest (POI), and anomaly detection in industrial camera footage. As a consequence, many datasets have been assembled, containing millions of photographs collected and labeled according to city, object, or scene. Deep neural networks trained end-to-end directly on pixels have become the state of the art in image classification. More recently, architectures based solely on attention mechanisms, eschewing convolutions, have challenged the long-standing dominance of convolutional neural networks.

How to Cite

Manikanth Sarisa, Venkata Nagesh Boddapati, Gagan Kumar Patra, Chandrababu Kuraku, & Siddharth Konkimalla. (2022). Deep Learning Approaches To Image Classification: Exploring The Future Of Visual Data Analysis. Educational Administration: Theory and Practice, 28(4), 331–345. https://doi.org/10.53555/kuey.v28i4.7863

Issue

Vol. 28 No. 04 (2022)

Section

Articles

Author Biographies

Manikanth Sarisa

Principal Software Engineer, Ally Financial Inc.

Venkata Nagesh Boddapati

Support Escalation Engineer, Microsoft

Gagan Kumar Patra

Senior Solution Architect, Tata Consultancy Services

Chandrababu Kuraku

Senior Solution Architect, Mitaja Corporation

Siddharth Konkimalla

Network Development Engineer, Amazon.com LLC
