Standard

A Structured Review and Quantitative Profiling of Public Brain MRI Datasets for Foundation Model Development. / Лыу, Минь Шао Кхуэ; Бенедичук, Маргарита Вячеславовна; Ропперт, Екатерина Ивановна et al.

In: Journal of Imaging, Vol. 11, No. 12, 454, 18.12.2025.

Research output: Contribution to journal, Article, Peer-reviewed

Vancouver

Лыу МШК, Бенедичук МВ, Ропперт ЕИ, Кенжин РМ, Тучинов БН. A Structured Review and Quantitative Profiling of Public Brain MRI Datasets for Foundation Model Development. Journal of Imaging. 2025 Dec 18;11(12):454. doi: 10.3390/jimaging11120454

Author

Лыу, Минь Шао Кхуэ; Бенедичук, Маргарита Вячеславовна; Ропперт, Екатерина Ивановна et al. / A Structured Review and Quantitative Profiling of Public Brain MRI Datasets for Foundation Model Development. In: Journal of Imaging. 2025; Vol. 11, No. 12.

BibTeX

@article{96bc710a75c941d59153972ba1aa1860,
title = "A Structured Review and Quantitative Profiling of Public Brain MRI Datasets for Foundation Model Development",
abstract = "The development of foundation models for brain MRI depends critically on the scale, diversity, and consistency of available data, yet systematic assessments of these factors remain scarce. In this study, we analyze 54 publicly accessible brain MRI datasets encompassing over 538,031 scans to provide a structured, multi-level overview tailored to foundation model development. At the dataset level, we characterize modality composition, disease coverage, and dataset scale, revealing strong imbalances between large healthy cohorts and smaller clinical populations. At the image level, we quantify voxel spacing, orientation, and intensity distributions across 14 representative datasets, demonstrating substantial heterogeneity that can influence representation learning. We then perform a quantitative evaluation of preprocessing variability, examining how intensity normalization, bias field correction, skull stripping, spatial registration, and interpolation alter voxel statistics and geometry. While these steps improve within-dataset consistency, residual differences persist between datasets. Finally, a feature-space case study using a 3D DenseNet121 shows measurable residual covariate shift after standardized preprocessing, confirming that harmonization alone cannot eliminate inter-dataset bias. Together, these analyses provide a unified characterization of variability in public brain MRI resources and emphasize the need for preprocessing-aware and domain-adaptive strategies in the design of generalizable brain MRI foundation models.",
keywords = "brain MRI, public datasets, foundation models, data harmonization, preprocessing variability, covariate shift",
author = "Лыу, {Минь Шао Кхуэ} and Бенедичук, {Маргарита Вячеславовна} and Ропперт, {Екатерина Ивановна} and Кенжин, {Роман Мугарамович} and Тучинов, {Баир Николаевич}",
note = "This research was funded by the Ministry of Economic Development of the Russian Federation in accordance with the subsidy agreement with the Novosibirsk State University dated 17 April 2025 grant number No. 139-15-2025-006: IGK 000000C313925P3S0002.",
year = "2025",
month = dec,
day = "18",
doi = "10.3390/jimaging11120454",
language = "English",
volume = "11",
journal = "Journal of Imaging",
issn = "2313-433X",
publisher = "Multidisciplinary Digital Publishing Institute (MDPI)",
number = "12",

}

RIS

TY - JOUR

T1 - A Structured Review and Quantitative Profiling of Public Brain MRI Datasets for Foundation Model Development

AU - Лыу, Минь Шао Кхуэ

AU - Бенедичук, Маргарита Вячеславовна

AU - Ропперт, Екатерина Ивановна

AU - Кенжин, Роман Мугарамович

AU - Тучинов, Баир Николаевич

N1 - This research was funded by the Ministry of Economic Development of the Russian Federation in accordance with the subsidy agreement with the Novosibirsk State University dated 17 April 2025 grant number No. 139-15-2025-006: IGK 000000C313925P3S0002.

PY - 2025/12/18

Y1 - 2025/12/18

AB - The development of foundation models for brain MRI depends critically on the scale, diversity, and consistency of available data, yet systematic assessments of these factors remain scarce. In this study, we analyze 54 publicly accessible brain MRI datasets encompassing over 538,031 scans to provide a structured, multi-level overview tailored to foundation model development. At the dataset level, we characterize modality composition, disease coverage, and dataset scale, revealing strong imbalances between large healthy cohorts and smaller clinical populations. At the image level, we quantify voxel spacing, orientation, and intensity distributions across 14 representative datasets, demonstrating substantial heterogeneity that can influence representation learning. We then perform a quantitative evaluation of preprocessing variability, examining how intensity normalization, bias field correction, skull stripping, spatial registration, and interpolation alter voxel statistics and geometry. While these steps improve within-dataset consistency, residual differences persist between datasets. Finally, a feature-space case study using a 3D DenseNet121 shows measurable residual covariate shift after standardized preprocessing, confirming that harmonization alone cannot eliminate inter-dataset bias. Together, these analyses provide a unified characterization of variability in public brain MRI resources and emphasize the need for preprocessing-aware and domain-adaptive strategies in the design of generalizable brain MRI foundation models.

KW - brain MRI

KW - public datasets

KW - foundation models

KW - data harmonization

KW - preprocessing variability

KW - covariate shift

U2 - 10.3390/jimaging11120454

DO - 10.3390/jimaging11120454

M3 - Article

C2 - 41440594

VL - 11

JO - Journal of Imaging

JF - Journal of Imaging

SN - 2313-433X

IS - 12

M1 - 454

ER -

ID: 72865901