Building Scalable Data Lakes for Internet of Things (IoT) Data Management
Abstract
The rapid expansion of Internet of Things (IoT) devices has produced an unprecedented influx of heterogeneous data, posing significant challenges for storage, processing, and analysis. This paper presents a scalable data lake architecture, integrated with deep learning techniques, to manage and analyze large volumes of IoT data. The proposed methodology uses Apache Hadoop for distributed storage, Apache Kafka for real-time data ingestion, and Apache Spark for data processing and model training. Three deep learning models, namely long short-term memory (LSTM), a hybrid of a convolutional neural network and LSTM (CNN-LSTM), and the gated recurrent unit (GRU), were implemented to capture complex temporal and spatial patterns in IoT data. The CNN-LSTM hybrid performed best, achieving the lowest mean absolute error (MAE) and root mean square error (RMSE), which highlights its effectiveness in predicting future sensor readings. This study underscores the advantages of integrating deep learning models within a scalable data lake framework and broader data strategy, offering significant improvements in predictive accuracy and scalability for IoT applications.
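The model comparison rests on two standard error metrics, MAE and RMSE. The following is a minimal sketch of how both are computed for a series of predicted sensor readings. It assumes plain Python lists, and the function names `mae` and `rmse` and the sample readings are hypothetical illustrations, not code or data from this study.

```python
import math
from typing import Sequence


def _check(y_true: Sequence[float], y_pred: Sequence[float]) -> None:
    # Both series must be non-empty and aligned one-to-one.
    if len(y_true) != len(y_pred) or len(y_true) == 0:
        raise ValueError("inputs must be non-empty and of equal length")


def mae(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    """Mean absolute error: the average magnitude of the prediction errors."""
    _check(y_true, y_pred)
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


def rmse(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    """Root mean square error: squares errors first, so large misses weigh more."""
    _check(y_true, y_pred)
    mse = sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)
    return math.sqrt(mse)


# Hypothetical sensor readings and model forecasts (illustrative values only).
actual = [21.0, 22.5, 23.0, 22.0]
predicted = [20.5, 22.0, 24.0, 22.0]

print(mae(actual, predicted))   # 0.5
print(rmse(actual, predicted))  # approximately 0.612
```

RMSE is always at least as large as MAE for the same predictions; a wide gap between the two indicates that a model makes occasional large errors rather than uniformly small ones, which is why reporting both gives a fuller picture when ranking models.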