The event takes place on April 23, at 10 a.m., in the CIn Amphitheater
The Centro de Informática (CIn) at UFPE will host Fernando Castor, a CIn alumnus and Associate Professor at the University of Twente, for a Research Seminar titled "Less Talk, More Code: Energy–Accuracy Trade-offs and Babbling Suppression in Local LLMs". The talk will address how to make the use of generative artificial intelligence in software development more efficient, economical, and sustainable without compromising the quality of the results. The event takes place on April 23, at 10 a.m., in the CIn Amphitheater; no prior registration is required. Castor was also a professor at CIn, from 2008 to 2024.
See below for more information about the seminar and the speaker:
Title: Less Talk, More Code: Energy–Accuracy Trade-offs and Babbling Suppression in Local LLMs
Abstract:
Developers increasingly rely on generative AI-based coding assistants such as GitHub Copilot and Claude Code in their workflows. Since many such tools are accessible via remote APIs, concerns about data privacy, security, and cost drive client organizations towards locally deployed language models. This talk will present a study examining the accuracy–energy trade-off in local LLM deployment. We evaluated 26 LLM families (including both Mixture of Experts and dense architectures) across common software development tasks on two hardware configurations: a commodity GPU and a high-performance AI-specific GPU, considering both full-precision and quantized variants. Our results demonstrate that larger models with higher energy requirements do not consistently yield proportional accuracy gains. Moreover, quantized models often outperform full-precision medium-sized models in both efficiency and accuracy. We also find that no single model excels across all software development task types. Finally, a model's number of active parameters, output length, and quantization level jointly explain over 73% of the variance in inference energy consumption for Code Generation and almost 90% for Docstring Generation. Prompt size, however, has a negligible impact on energy usage.
In the aforementioned study, we noticed that models often babble, i.e., produce many more tokens than required. For example, when asked to generate a solution to a programming problem, models would often produce, besides the code of the solution itself, informal explanations, tests, comments, and more. These extra tokens require additional resources to be produced and, in the case of large LLMs made available as services, cost real money. Since solutions are often produced early in the generation process, it is possible to avoid most of these extra tokens by checking whether an acceptable solution has already appeared and stopping early. We call this approach babbling suppression, and we have conducted a study applying it to the generation of Python and Java code, with two benchmarks for each language, across ten locally executable language models. Babbling suppression achieved reductions of up to 65% for Python and 62% for Java in the amount of energy consumed by the generation process, with no negative impact on accuracy. This technique can be applied as a plugin to a workflow, i.e., without requiring expensive model retraining.
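The early-stopping idea behind babbling suppression can be sketched as follows. This is a minimal illustration, not the authors' implementation: the acceptability check (here, `ast.parse` plus a test for a complete Python function definition) and the checking interval are assumptions made for the example.

```python
import ast


def contains_complete_solution(text: str) -> bool:
    """Return True if `text` already contains a syntactically valid
    Python function definition (a stand-in for an 'acceptable solution')."""
    lines = text.splitlines()
    # Try prefixes ending at line boundaries, longest first, so trailing
    # babble (prose, partial lines) does not mask an earlier complete solution.
    for end in range(len(lines), 0, -1):
        candidate = "\n".join(lines[:end])
        try:
            tree = ast.parse(candidate)
        except SyntaxError:
            continue
        if any(isinstance(node, ast.FunctionDef) for node in tree.body):
            return True
    return False


def generate_with_suppression(token_stream, check_every: int = 8) -> str:
    """Consume tokens from an (assumed) token stream, stopping as soon as
    a complete solution is detected instead of letting the model babble on."""
    out = []
    for i, token in enumerate(token_stream, 1):
        out.append(token)
        # Checking only every few tokens keeps the overhead of parsing low.
        if i % check_every == 0 and contains_complete_solution("".join(out)):
            break
    return "".join(out)
```

In a real deployment the same check could be wired into an inference framework's stopping-criteria hook, which is what makes the technique a plugin rather than a change to the model itself.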
Bio:
Fernando Castor is an Associate Professor in the Formal Methods and Tools group at the University of Twente. Prior to joining UTwente, Fernando was a professor at UFPE for 15 years. His broad research goal is to help developers build more efficient software systems more efficiently. More specifically, he conducts research in Software Engineering, with emphasis on Software Maintenance, Energy Efficiency, and Code Understandability.