TY - JOUR
T1 - Text Regression Analysis
T2 - A Review, Empirical, and Experimental Insights
AU - Taha, Kamal
N1 - Publisher Copyright: © 2024 The Authors. This work is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 License.
PY - 2024
Y1 - 2024
N2 - Effective management and analysis of large-scale textual data presents significant challenges, notably due to high storage and processing demands. Text regression analysis, a specific branch of text mining, has proven invaluable in enabling individuals, researchers, and businesses to derive meaningful insights from the rapidly increasing volumes of textual data. However, the general grouping of algorithms in existing surveys often leads to confusion and imprecise evaluations. This paper addresses these issues by introducing a methodological taxonomy tailored for text regression analysis. This taxonomy categorizes algorithms into specific techniques and detailed categories, organized into two levels: methodology category and methodology technique. To validate the accuracy of the different techniques and categories, we conduct both empirical and experimental evaluations of text regression techniques. Empirical evaluations are based on four criteria, while experimental evaluations include rankings of: (1) algorithms employing identical techniques, (2) various techniques within the same category, and (3) contrasting categories overall. This dual approach of methodological structuring and rigorous evaluation offers a nuanced and comprehensive understanding of text regression algorithms, equipping researchers with the knowledge to make informed decisions.
KW - empirical evaluation
KW - experimental evaluation
KW - text data mining
KW - text regression analysis
UR - https://www.scopus.com/pages/publications/85201746310
U2 - 10.1109/ACCESS.2024.3446765
DO - 10.1109/ACCESS.2024.3446765
M3 - Article
AN - SCOPUS:85201746310
SN - 2169-3536
VL - 12
SP - 137333
EP - 137344
JO - IEEE Access
JF - IEEE Access
ER -