Evaluating LLM Usage for Efficient and Explainable Numerical and Classified Implicit Sentiment Analysis of Product Desirability

Abstract: Qualitative product feedback can reveal nuanced user experiences, but its implicit sentiment is difficult to measure. This paper presents a scalable and interpretable framework that uses large language models (LLMs) to quantify product desirability from such data.

Using two Product Desirability Toolkit (PDT) datasets from ZORQ and CARMA, comprising 106 respondent term groupings with gold-standard human annotations, this study evaluates zero-shot continuous numerical sentiment scoring and categorical sentiment classification without relying on explicit review scores.

Across the datasets, LLMs generated numerical sentiment scores directly from qualitative responses and closely matched expert labels, achieving Pearson correlations of up to 0.97 and classification accuracy of up to 94%. The LLMs remained robust when the data were presented in multiple forms and consistently expressed high confidence.
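For readers unfamiliar with the two agreement metrics reported above, the following is a minimal sketch of how a Pearson correlation and a classification accuracy could be computed between model outputs and human labels. The function names, the score range, the `threshold` used to map a score to a category, and the sample values are all illustrative assumptions; they are not taken from the paper's data or method.

```python
from math import sqrt


def pearson_r(xs, ys):
    """Pearson correlation between two equal-length numeric sequences."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / sqrt(var_x * var_y)


def to_label(score, threshold=0.1):
    """Map a numeric score to a category (threshold is an assumed value)."""
    if score > threshold:
        return "positive"
    if score < -threshold:
        return "negative"
    return "neutral"


def accuracy(predicted, gold):
    """Fraction of predicted labels that exactly match the gold labels."""
    if len(predicted) != len(gold) or not gold:
        raise ValueError("need two non-empty sequences of equal length")
    return sum(p == g for p, g in zip(predicted, gold)) / len(gold)


# Made-up illustrative values, NOT data from the study.
llm_scores = [0.8, -0.4, 0.1, 0.6, -0.7]
human_scores = [0.9, -0.5, 0.0, 0.5, -0.6]

r = pearson_r(llm_scores, human_scores)
acc = accuracy([to_label(s) for s in llm_scores],
               [to_label(s) for s in human_scores])
print(f"Pearson r = {r:.3f}, accuracy = {acc:.0%}")
```

In this sketch the categorical labels are derived from the numeric scores by thresholding; a setup in which the model is asked for a category directly would skip `to_label` and compare the returned labels as they are.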

In contrast, lexicon-based and transformer baselines did not produce statistically significant results. Among the models tested, GPT-4o-mini achieved performance comparable to that of larger models at 94% lower cost, supporting scalable deployment.

The framework also incorporates model confidence ratings and human-readable rationale explanations (xAI), improving interpretability, transparency, and trust while supporting practical use in product satisfaction assessment.
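As a rough illustration of what a per-response result carrying a score, a category, a confidence rating, and a rationale might look like, the sketch below parses and validates one JSON record. The field names, the allowed labels, and the value ranges ([-1, 1] for the score, [0, 1] for the confidence) are hypothetical choices made for this example; the abstract does not specify the actual output schema.

```python
import json
from dataclasses import dataclass

# Assumed label set for illustration only.
ALLOWED_LABELS = {"negative", "neutral", "positive"}


@dataclass(frozen=True)
class SentimentResult:
    score: float        # assumed range: -1.0 (negative) to 1.0 (positive)
    label: str          # one of ALLOWED_LABELS
    confidence: float   # assumed range: 0.0 to 1.0
    rationale: str      # short human-readable explanation


def parse_result(raw: str) -> SentimentResult:
    """Parse one JSON record and reject values outside the assumed ranges."""
    data = json.loads(raw)
    result = SentimentResult(
        score=float(data["score"]),
        label=str(data["label"]).lower(),
        confidence=float(data["confidence"]),
        rationale=str(data["rationale"]).strip(),
    )
    if not -1.0 <= result.score <= 1.0:
        raise ValueError(f"score out of range: {result.score}")
    if result.label not in ALLOWED_LABELS:
        raise ValueError(f"unknown label: {result.label}")
    if not 0.0 <= result.confidence <= 1.0:
        raise ValueError(f"confidence out of range: {result.confidence}")
    if not result.rationale:
        raise ValueError("rationale must not be empty")
    return result


# Hand-written example record, not real model output.
example = (
    '{"score": 0.7, "label": "Positive", "confidence": 0.9, '
    '"rationale": "Most selected terms describe the product favourably."}'
)
print(parse_result(example))
```

Validating each record before aggregation keeps malformed or out-of-range model output from silently skewing the summary statistics.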

Overall, pairing the PDT as a survey method with a cost-efficient LLM for sentiment analysis has the potential to yield rich product evaluations. The results include both numerical and classified sentiment scores as well as high-level user impressions of the product, which can inform product development and improvement and suggest marketing ideas for target audiences.