Vulnerability to Stability: Scalable Large Language Model in Queue-Based Web Service
Conference proceeding   Peer reviewed

MD Abdul Barek, Md Bajlur Rashid, Md Mostafizur Rahman, A.B.M Kamrul Islam Riad, Guillermo Francia, Hossain Shahriar and Sheikh Iqbal Ahamed
Proceedings: 2025 IEEE 49th Annual Computers, Software, and Applications Conference COMPSAC 2025, pp.995-1000
IEEE Annual International Computer Software and Applications Conference
Annual Computers, Software, and Applications Conference (COMPSAC), 49th (Toronto, Ontario, Canada, 07/08/2025–07/11/2025)
08/26/2025
Web of Science ID: WOS:001575960000122

Abstract

Large Language Models (LLMs) have demonstrated exceptional capabilities in the field of Artificial Intelligence (AI) and are now widely used in applications worldwide. However, one of their major challenges is handling high-concurrency workloads, especially under extreme conditions. When too many requests arrive simultaneously, LLMs often become unresponsive, which leads to performance degradation and reduced reliability in real-world applications. To address this issue, this paper proposes a queue-based system that separates request handling from direct execution. With a distributed queue, requests are processed in a structured and controlled manner, which prevents system overload and keeps performance stable. This approach also allows for dynamic scalability, meaning additional resources can be allocated as needed to maintain efficiency. Our experimental results show that this method significantly improves resilience under heavy workloads, prevents resource exhaustion, and enables linear scalability. The findings highlight the effectiveness of a queue-based web service in ensuring LLMs remain responsive even under extreme workloads.
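
The decoupling idea in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: it uses an in-process asyncio queue rather than a distributed one, and fake_llm, worker, serve, and run_demo are hypothetical names introduced only for illustration. Requests are accepted into a bounded queue, and a fixed pool of workers pulls from it, so the number of concurrent model calls never exceeds the pool size no matter how many requests arrive.

```python
import asyncio


async def fake_llm(prompt: str) -> str:
    # Hypothetical stand-in for a real model call (assumption, not from the paper).
    await asyncio.sleep(0.01)
    return f"echo: {prompt}"


async def worker(queue: asyncio.Queue, state: dict) -> None:
    # Pull one request at a time; each worker runs at most one model call.
    while True:
        prompt, future = await queue.get()
        state["active"] += 1
        state["peak"] = max(state["peak"], state["active"])
        try:
            future.set_result(await fake_llm(prompt))
        except Exception as exc:
            future.set_exception(exc)
        finally:
            state["active"] -= 1
            queue.task_done()


async def serve(prompts, num_workers: int = 2, max_queue: int = 100):
    # The bounded queue separates request intake from execution.
    queue = asyncio.Queue(maxsize=max_queue)
    state = {"active": 0, "peak": 0}
    workers = [asyncio.create_task(worker(queue, state)) for _ in range(num_workers)]

    loop = asyncio.get_running_loop()
    futures = []
    for prompt in prompts:
        future = loop.create_future()
        await queue.put((prompt, future))  # waits if the queue is full
        futures.append(future)

    results = await asyncio.gather(*futures)

    # Shut the worker pool down once every request has been answered.
    for task in workers:
        task.cancel()
    await asyncio.gather(*workers, return_exceptions=True)
    return results, state["peak"]


def run_demo(n: int = 20, num_workers: int = 2):
    # Submit n requests at once and report results plus peak concurrency.
    return asyncio.run(serve([f"q{i}" for i in range(n)], num_workers))
```

Calling run_demo(20, 2) submits twenty requests at once while at most two model calls run concurrently. In this sketch, raising num_workers is the analogue of allocating additional resources; a distributed deployment would replace the in-process queue with a message broker shared across machines.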
