Amira Abdel-Rahman (admin) authored
numbers.md 6.58 KiB
------------------------------------------------
V100
------------------------------------------------
7 TFlop double
14 TFlop single
32 GB/sec PCIe
300 GB/sec NVLINK
------------------------------------------------
1x1x4 chiral
------------------------------------------------
input 1058444 particles
mps: 254 srt: 625 frc: 3521 int: 106 cpy: 2424 out: 2.8e+05 (us))
------------------------------------------------
coupon
------------------------------------------------
input 986154 particles
mps: 277 srt: 289 frc: 3242 int: 91 cpy: 2251 out: 2.72e+05 (us))
------------------------------------------------
simulate
------------------------------------------------
14 TFlop / 277 Mpps = 50541 ops/particle
------------------------------------------------
integrate
------------------------------------------------
986154 particles / 91e-6 s = 10.8 G/s
14 TFlop / (986154 particles / 91e-6 s) = 1291 ops/integrate
------------------------------------------------
copy
------------------------------------------------
986154 particles * 7 arrays * 4 bytes / 2251 us = 12266 = 12 GB/s
------------------------------------------------
GPU_check
------------------------------------------------
peer access:
   from 0 to 1: yes
   from 0 to 2: yes
   from 0 to 3: yes
   from 0 to 4: yes
   from 0 to 5: no
   from 0 to 6: no
   from 0 to 7: no
   from 1 to 0: yes
   from 1 to 2: yes
   from 1 to 3: yes
   from 1 to 4: no
   from 1 to 5: yes
   from 1 to 6: no
   from 1 to 7: no
   from 2 to 0: yes
   from 2 to 1: yes
   from 2 to 3: yes
   from 2 to 4: no
   from 2 to 5: no
   from 2 to 6: yes
   from 2 to 7: no
   from 3 to 0: yes
   from 3 to 1: yes
   from 3 to 2: yes
   from 3 to 4: no
   from 3 to 5: no
   from 3 to 6: no
   from 3 to 7: yes
   from 4 to 0: yes
   from 4 to 1: no
   from 4 to 2: no
   from 4 to 3: no
   from 4 to 5: yes
   from 4 to 6: yes
   from 4 to 7: yes
   from 5 to 0: no
   from 5 to 1: yes
   from 5 to 2: no
   from 5 to 3: no
   from 5 to 4: yes
   from 5 to 6: yes
   from 5 to 7: yes
   from 6 to 0: no
   from 6 to 1: no
   from 6 to 2: yes
   from 6 to 3: no
   from 6 to 4: yes
   from 6 to 5: yes
   from 6 to 7: yes
   from 7 to 0: no
   from 7 to 1: no
   from 7 to 2: no
   from 7 to 3: yes
   from 7 to 4: yes
   from 7 to 5: yes
   from 7 to 6: yes
GPUs:
   number: 0
      name: Tesla V100-SXM2-16GB
      global memory: 16945512448
      max grid size: 2147483647
      max threads per block: 1024
      max threads dimension: 1024
      multiprocessor count: 80
      max threads per multiprocessor: 2048
   number: 1
      name: Tesla V100-SXM2-16GB
      global memory: 16945512448
      max grid size: 2147483647
      max threads per block: 1024
      max threads dimension: 1024
      multiprocessor count: 80
      max threads per multiprocessor: 2048
   number: 2
      name: Tesla V100-SXM2-16GB
      global memory: 16945512448
      max grid size: 2147483647
      max threads per block: 1024
      max threads dimension: 1024
      multiprocessor count: 80
      max threads per multiprocessor: 2048
   number: 3
      name: Tesla V100-SXM2-16GB
      global memory: 16945512448
      max grid size: 2147483647
      max threads per block: 1024
      max threads dimension: 1024
      multiprocessor count: 80
      max threads per multiprocessor: 2048
   number: 4
      name: Tesla V100-SXM2-16GB
      global memory: 16945512448
      max grid size: 2147483647
      max threads per block: 1024
      max threads dimension: 1024
      multiprocessor count: 80
      max threads per multiprocessor: 2048
   number: 5
      name: Tesla V100-SXM2-16GB
      global memory: 16945512448
      max grid size: 2147483647
      max threads per block: 1024
      max threads dimension: 1024
      multiprocessor count: 80
      max threads per multiprocessor: 2048
   number: 6
      name: Tesla V100-SXM2-16GB
      global memory: 16945512448
      max grid size: 2147483647
      max threads per block: 1024
      max threads dimension: 1024
      multiprocessor count: 80
      max threads per multiprocessor: 2048
   number: 7
      name: Tesla V100-SXM2-16GB
      global memory: 16945512448
      max grid size: 2147483647
      max threads per block: 1024
      max threads dimension: 1024
      multiprocessor count: 80
      max threads per multiprocessor: 2048
copy 5000000 floats from CPU to GPU
   6854.000000 us, 2.918e+09 B/s
   1846.000000 us, 1.08342e+10 B/s pinned
copy 5000000 floats from GPU to GPU 0:
   GPU 1: 849.000000 us, 2.35571e+10 B/s
   GPU 2: 851.000000 us, 2.35018e+10 B/s
   GPU 3: 443.000000 us, 4.51467e+10 B/s
   GPU 4: 444.000000 us, 4.5045e+10 B/s
   GPU 5: 2051.000000 us, 9.75134e+09 B/s
   GPU 6: 2026.000000 us, 9.87167e+09 B/s
   GPU 7: 2017.000000 us, 9.91572e+09 B/s
add 5000000x5000000 floats:
   3.747896 s, 6670.409500 G/s
peer add 5000000x5000000 GPU 0 floats:
   GPU 1: 3.755309 s, 6657.241500 G/s
   GPU 2: 3.731777 s, 6699.221000 G/s
   GPU 3: 3.735422 s, 6692.684500 G/s
   GPU 4: 3.736765 s, 6690.278500 G/s
parallel peer add 5000000x5000000 GPU 0 floats:
   5 GPUS: 0.783563 s, 19940.962000 G/s
------------------------------------------------
CPU_check
------------------------------------------------
processor : 95
vendor_id : GenuineIntel
cpu family : 6
model : 85
model name : Intel(R) Xeon(R) Platinum 8275CL CPU @ 3.00GHz
stepping : 7
microcode : 0x5002f01
cpu MHz : 1221.613
cache size : 36608 KB
physical id : 1
siblings : 48
core id : 23
cpu cores : 24
apicid : 111
initial apicid : 111
fpu : yes
fpu_exception : yes
cpuid level : 13
wp : yes
flags : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ss ht syscall nx pdpe1gb rdtscp lm constant_tsc arch_perfmon rep_good nopl xtopology nonstop_tsc cpuid aperfmperf tsc_known_freq pni pclmulqdq monitor ssse3 fma cx16 pcid sse4_1 sse4_2 x2apic movbe popcnt tsc_deadline_timer aes xsave avx f16c rdrand hypervisor lahf_lm abm 3dnowprefetch invpcid_single pti fsgsbase tsc_adjust bmi1 avx2 smep bmi2 erms invpcid mpx avx512f avx512dq rdseed adx smap clflushopt clwb avx512cd avx512bw avx512vl xsaveopt xsavec xgetbv1 xsaves ida arat pku ospke avx512_vnni
bugs : cpu_meltdown spectre_v1 spectre_v2 spec_store_bypass l1tf mds swapgs itlb_multihit
bogomips : 5999.99
clflush size : 64
cache_alignment : 64
address sizes : 46 bits physical, 48 bits virtual
power management:
add 500000x500000 floats with 1 thread:
   65.058972 s, 3.842667 G/s
add 500000x500000 floats with 96 threads:
   1.376176 s, 181.662813 G/s
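The rates printed by GPU_check and CPU_check appear to follow directly from the transfer sizes and times listed above. The sketch below recomputes a few of them as a sanity check. It assumes 4-byte floats and that "NxN floats" means N*N additions. Both are inferences that happen to reproduce the logged values, not something the log states, and the function names are hypothetical.

```python
# Recompute selected GPU_check / CPU_check rates from the logged sizes and times.
# Assumptions (not stated in the log): floats are 4 bytes, and "NxN floats"
# means N*N additions in total.

BYTES_PER_FLOAT = 4  # assumed 32-bit floats


def bandwidth_bytes_per_s(n_floats, microseconds):
    """Transfer rate in B/s for n_floats copied in the given time."""
    return n_floats * BYTES_PER_FLOAT / (microseconds * 1e-6)


def add_rate_g_per_s(n, seconds):
    """Rate in G adds/s for n*n additions completed in the given time."""
    return n * n / seconds / 1e9


# CPU -> GPU copies of 5000000 floats
pageable = bandwidth_bytes_per_s(5_000_000, 6854)  # log: 2.918e+09 B/s
pinned = bandwidth_bytes_per_s(5_000_000, 1846)    # log: 1.08342e+10 B/s

# GPU -> GPU 0 copies: GPU 3 (peer access) vs GPU 5 (no peer access)
from_gpu3 = bandwidth_bytes_per_s(5_000_000, 443)   # log: 4.51467e+10 B/s
from_gpu5 = bandwidth_bytes_per_s(5_000_000, 2051)  # log: 9.75134e+09 B/s

# Add benchmarks
gpu_add = add_rate_g_per_s(5_000_000, 3.747896)  # log: 6670.409500 G/s
cpu_1 = add_rate_g_per_s(500_000, 65.058972)     # log: 3.842667 G/s
cpu_96 = add_rate_g_per_s(500_000, 1.376176)     # log: 181.662813 G/s

print(f"pageable copy: {pageable:.4g} B/s, pinned copy: {pinned:.6g} B/s")
print(f"pinned / pageable: {pinned / pageable:.2f}x")
print(f"GPU 3 -> 0: {from_gpu3:.6g} B/s, GPU 5 -> 0: {from_gpu5:.6g} B/s")
print(f"GPU add: {gpu_add:.4f} G/s")
print(f"CPU add: {cpu_1:.6f} G/s (1 thread), {cpu_96:.6f} G/s (96 threads)")
print(f"96-thread speedup: {cpu_96 / cpu_1:.1f}x")
```

The copy times split into three groups that match the peer-access table: GPUs 3 and 4 are fastest, GPUs 1 and 2 take about twice as long, and GPUs 5 to 7, which have no peer access to GPU 0, are slowest.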
------------------------------------------------
pipe_check
------------------------------------------------
send: 100000000 points
receive: 4.301932 s, 9.29815e+07 B/s
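The back-of-envelope figures in the simulate, integrate, and copy sections, and the pipe_check rate, can be recomputed from the coupon run. This is a minimal sketch with hypothetical variable names. It reads "mps" as millions of particles per second (the notes write it as "277 Mpps") and assumes each pipe point is 4 bytes, which is inferred from the logged B/s rather than stated.

```python
# Recompute the derived estimates from the coupon run and the pipe_check rate.
# Assumptions: "mps" is millions of particles per second, and each pipe point
# is 4 bytes (inferred because it reproduces the logged 9.29815e+07 B/s).

PEAK_SINGLE_FLOPS = 14e12  # V100 single-precision peak quoted in the notes

particles = 986154  # coupon input
mps = 277           # coupon throughput, millions of particles per second
int_us = 91         # coupon integrate time, microseconds
cpy_us = 2251       # coupon copy time, microseconds

# simulate: peak flops available per simulated particle
ops_per_particle = PEAK_SINGLE_FLOPS / (mps * 1e6)

# integrate: particles integrated per second, and peak flops per integration
integrate_rate = particles / (int_us * 1e-6)
ops_per_integrate = PEAK_SINGLE_FLOPS / integrate_rate

# copy: 7 arrays of 4-byte values per particle; bytes/us equals MB/s
copy_mb_per_s = particles * 7 * 4 / cpy_us

# pipe_check: 100000000 points received in 4.301932 s
BYTES_PER_POINT = 4  # assumption
pipe_bytes_per_s = 100_000_000 * BYTES_PER_POINT / 4.301932

print(f"simulate:  {ops_per_particle:.0f} ops/particle")
print(f"integrate: {integrate_rate / 1e9:.1f} G/s, {ops_per_integrate:.0f} ops/integrate")
print(f"copy:      {copy_mb_per_s:.0f} MB/s = {copy_mb_per_s / 1e3:.0f} GB/s")
print(f"pipe:      {pipe_bytes_per_s:.6g} B/s")
```

The notes truncate rather than round (50541 and 1291), so the printed values can differ from them by one in the last digit.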