© 2023 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/). Notwithstanding the ProQuest Terms and Conditions, you may use this content in accordance with the terms of the License.

Abstract

Unmanned Aerial Vehicles (UAVs) can provide instantaneous visual cues and high data throughput that can be leveraged for complex tasks such as semantically rich scene understanding. In this work, we build on Large Language Models (LLMs) and Visual Language Models (VLMs), together with a state-of-the-art detection pipeline, to produce thorough zero-shot literary text descriptions of UAV scenes. The generated texts achieve a median Gunning Fog grade level in the range of 7–12. Applications of this framework could be found in the film industry, and it could enhance user experience in theme parks or in the advertising sector. We demonstrate a low-cost, highly efficient, state-of-the-art practical implementation on microdrones in a well-controlled and challenging setting, and we propose the use of standardized readability metrics to assess LLM-enhanced descriptions.
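The Gunning Fog grade level mentioned in the abstract is commonly defined as 0.4 × (average words per sentence + percentage of complex words), where a complex word has three or more syllables. The following is a minimal sketch of that formula, not the authors' implementation: the function names `count_syllables` and `gunning_fog` are hypothetical, and the vowel-group syllable counter is a rough assumption that will miscount some English words.

```python
import re


def count_syllables(word: str) -> int:
    """Approximate syllables as runs of consecutive vowels (assumed heuristic)."""
    groups = re.findall(r"[aeiouy]+", word.lower())
    return max(1, len(groups))


def gunning_fog(text: str) -> float:
    """Gunning Fog index: 0.4 * (words per sentence + 100 * complex / words)."""
    # Split on sentence-ending punctuation and drop empty fragments.
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    # Treat each run of letters as one word.
    words = re.findall(r"[A-Za-z]+", text)
    if not sentences or not words:
        raise ValueError("text must contain at least one sentence and one word")
    # Complex words: three or more syllables under the heuristic above.
    complex_words = [w for w in words if count_syllables(w) >= 3]
    words_per_sentence = len(words) / len(sentences)
    percent_complex = 100.0 * len(complex_words) / len(words)
    return 0.4 * (words_per_sentence + percent_complex)


if __name__ == "__main__":
    sample = "The drone flies over the field. It sees a red car."
    print(round(gunning_fog(sample), 2))
```

A median over many generated descriptions could then be taken with `statistics.median` applied to the per-text scores; how the scores were actually aggregated is not stated in this record.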

Details

Title
Semantic Scene Understanding with Large Language Models on Unmanned Aerial Vehicles
Author
de Curtò, J 1; de Zarzà, I 1; Calafate, Carlos T 2

1 Centre for Intelligent Multidimensional Data Analysis, HK Science Park, Shatin, Hong Kong; Departamento de Informática de Sistemas y Computadores, Universitat Politècnica de València, 46022 València, Spain; Informatik und Mathematik, GOETHE-University Frankfurt am Main, 60323 Frankfurt am Main, Germany; Estudis d’Informàtica, Multimèdia i Telecomunicació, Universitat Oberta de Catalunya, 08018 Barcelona, Spain
2 Departamento de Informática de Sistemas y Computadores, Universitat Politècnica de València, 46022 València, Spain
First page
114
Publication year
2023
Publication date
2023
Publisher
MDPI AG
e-ISSN
2504-446X
Source type
Scholarly Journal
Language of publication
English
ProQuest document ID
2779464669