Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › Research › peer-review
Unsupervised Learning for Detection of Cognitive Distortions in Patient Narratives. / Bobo, Samson; Kolonin, Anton.
Advances in Neural Computation, Machine Learning, and Cognitive Research IX. ed. / Boris Kryzhanovsky; Witali Dunin-Barkowski; Vladimir Redko; Yury Tiumentsev; Valentin V. Klimov. Springer, 2026. p. 545-562, ch. 43 (Studies in Computational Intelligence; Vol. 1241 SCI).
TY - GEN
T1 - Unsupervised Learning for Detection of Cognitive Distortions in Patient Narratives
AU - Bobo, Samson
AU - Kolonin, Anton
N1 - Conference code: 27
PY - 2026
Y1 - 2026
AB - This paper introduces an unsupervised machine learning framework for detecting cognitive distortions (CDs) in psychotherapy transcripts. Our novel pipeline integrates semantic embedding with MiniLM-L6-v2, Principal Component Analysis reduced to 75 orthogonal directions (PCA75), optimized HDBSCAN clustering (silhouette score = 0.098), and KeyBERT-assisted clinical interpretation. Analysis of 6,057 patient narratives reveals three dominant CD profiles: Social Anxiety (64.9% distorted utterances), Performance Anxiety (100% distorted utterances), and Mixed Symptoms (the noise cluster, r = −0.30). Clinical validation by three licensed psychologists evaluating 100 samples per cluster demonstrates strong cluster coherence (Fleiss’ κ = 0.68, indicating “substantial agreement” per Landis and Koch, 1977). The framework gives clinicians a scalable, taxonomy-free tool for identifying cognitive patterns, enabling more efficient treatment personalization and progress monitoring.
KW - Clinical interpretation
KW - Cognitive Distortions
KW - Unsupervised learning
UR - https://www.scopus.com/pages/publications/105020041365
UR - https://www.mendeley.com/catalogue/3cb244c5-f926-396b-87be-17b38a673c57/
U2 - 10.1007/978-3-032-07690-8_43
DO - 10.1007/978-3-032-07690-8_43
M3 - Conference contribution
SN - 978-3-032-07689-2
T3 - Studies in Computational Intelligence
SP - 545
EP - 562
BT - Advances in Neural Computation, Machine Learning, and Cognitive Research IX
A2 - Kryzhanovsky, Boris
A2 - Dunin-Barkowski, Witali
A2 - Redko, Vladimir
A2 - Tiumentsev, Yury
A2 - Klimov, Valentin V.
PB - Springer
T2 - XXVII International Conference on Neuroinformatics
Y2 - 20 October 2025 through 24 October 2025
ER -
ID: 71986474
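The abstract reports inter-rater agreement among three psychologists as Fleiss' κ = 0.68. The Python sketch below illustrates how that statistic is defined. It is not the authors' evaluation code. It computes Fleiss' κ from a table in which row i counts how many raters assigned item i to each category, then maps the result onto the Landis and Koch (1977) bands. The function names fleiss_kappa and landis_koch_label are hypothetical. With three raters, as in the study, every row of such a table would sum to 3.

```python
from typing import List, Sequence

# Hypothetical helpers for illustration only; not taken from the paper.


def fleiss_kappa(counts: Sequence[Sequence[int]]) -> float:
    """Fleiss' kappa for an items x categories table of rating counts.

    counts[i][j] is the number of raters who put item i in category j.
    Every item must be rated by the same number (>= 2) of raters.
    """
    n_items = len(counts)
    if n_items == 0:
        raise ValueError("need at least one rated item")
    n_raters = sum(counts[0])
    n_categories = len(counts[0])
    if n_raters < 2 or any(
        sum(row) != n_raters or len(row) != n_categories for row in counts
    ):
        raise ValueError("every item needs the same categories and >= 2 raters")

    # Observed agreement per item: share of rater pairs that agree on item i.
    p_i: List[float] = [
        (sum(c * c for c in row) - n_raters) / (n_raters * (n_raters - 1))
        for row in counts
    ]
    p_bar = sum(p_i) / n_items

    # Chance agreement from the marginal share of all ratings in each category.
    total = n_items * n_raters
    p_j = [sum(row[j] for row in counts) / total for j in range(n_categories)]
    p_e = sum(p * p for p in p_j)

    if p_e >= 1.0:
        raise ValueError("kappa is undefined when every rating is in one category")
    return (p_bar - p_e) / (1.0 - p_e)


def landis_koch_label(kappa: float) -> str:
    """Verbal label for kappa using the Landis and Koch (1977) bands."""
    if kappa < 0:
        return "poor"
    bands = [(0.20, "slight"), (0.40, "fair"), (0.60, "moderate"), (0.80, "substantial")]
    for upper, label in bands:
        if kappa <= upper:
            return label
    return "almost perfect"
```

For example, fleiss_kappa([[3, 0], [0, 3], [2, 1]]) describes three raters, three items and two categories with made-up ratings. It evaluates to about 0.55, which landis_koch_label reports as "moderate". The reported κ = 0.68 falls in the "substantial" band (0.61 to 0.80).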