Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › Research › peer-review
Aspects of GPU perfomance in algorithms with random memory access. / Kashkovsky, Alexander V.; Shershnev, Anton A.; Vashchenkov, Pavel V.
Proceedings of the XXV Conference on High-Energy Processes in Condensed Matter, HEPCM 2017: Dedicated to the 60th Anniversary of the Khristianovich Institute of Theoretical and Applied Mechanics SB RAS. ed. / Fomin. Vol. 1893 American Institute of Physics Inc., 2017. 030047 (AIP Conference Proceedings; Vol. 1893).
TY - GEN
T1 - Aspects of GPU perfomance in algorithms with random memory access
AU - Kashkovsky, Alexander V.
AU - Shershnev, Anton A.
AU - Vashchenkov, Pavel V.
PY - 2017/10/26
Y1 - 2017/10/26
AB - A numerical code that solves the Boltzmann equation with the Direct Simulation Monte Carlo (DSMC) method on a hybrid computational cluster showed that, on Tesla K40 accelerators, computational performance drops dramatically as the percentage of occupied GPU memory increases. Testing revealed that memory access time increases by a factor of tens once a certain critical percentage of memory is occupied. Moreover, this seems to be a common problem of all NVIDIA GPUs, arising from their architecture. Several modifications of the numerical algorithm were suggested to overcome this problem. One of them, based on splitting the memory into "virtual" blocks, resulted in a 2.5-fold speedup.
UR - http://www.scopus.com/inward/record.url?scp=85034272764&partnerID=8YFLogxK
U2 - 10.1063/1.5007505
DO - 10.1063/1.5007505
M3 - Conference contribution
AN - SCOPUS:85034272764
VL - 1893
T3 - AIP Conference Proceedings
BT - Proceedings of the XXV Conference on High-Energy Processes in Condensed Matter, HEPCM 2017
A2 - Fomin
PB - American Institute of Physics Inc.
T2 - 25th Conference on High-Energy Processes in Condensed Matter, HEPCM 2017
Y2 - 5 June 2017 through 9 June 2017
ER -
ID: 9673653