<pre>
    ------------------------------------------------
    V100
    ------------------------------------------------
    
    7 TFlop/s double
    14 TFlop/s single
    32 GB/s PCIe
    300 GB/s NVLink
    
    ------------------------------------------------
    1x1x4 chiral
    ------------------------------------------------
    
    input 1058444 particles
    mps: 254 srt: 625 frc: 3521 int: 106 cpy: 2424 out: 2.8e+05 (us)
    
    ------------------------------------------------
    coupon
    ------------------------------------------------
    
    input 986154 particles
    mps: 277 srt: 289 frc: 3242 int: 91 cpy: 2251 out: 2.72e+05 (us)
    
    ------------------------------------------------
    simulate
    ------------------------------------------------
    
    14 TFlop/s / 277 Mpps = 50542 ops/particle
    
    ------------------------------------------------
    integrate
    ------------------------------------------------
    
    986154 particles / 91e-6 s = 10.8 G particles/s
    14 TFlop/s / (986154 particles / 91e-6 s) = 1292 ops/integrate
    
    ------------------------------------------------
    copy
    ------------------------------------------------
    
    986154 particles * 7 arrays * 4 bytes / 2251 us
    = 12267 MB/s ≈ 12 GB/s
    
    ------------------------------------------------
    GPU_check
    ------------------------------------------------
    
    peer access (row = from GPU, column = to GPU):
                0    1    2    3    4    5    6    7
           0    -  yes  yes  yes  yes   no   no   no
           1  yes    -  yes  yes   no  yes   no   no
           2  yes  yes    -  yes   no   no  yes   no
           3  yes  yes  yes    -   no   no   no  yes
           4  yes   no   no   no    -  yes  yes  yes
           5   no  yes   no   no  yes    -  yes  yes
           6   no   no  yes   no  yes  yes    -  yes
           7   no   no   no  yes  yes  yes  yes    -
    GPUs:
       number: 0-7 (all eight report identical properties)
          name: Tesla V100-SXM2-16GB
          global memory: 16945512448
          max grid size: 2147483647
          max threads per block: 1024
          max threads dimension: 1024
          multiprocessor count: 80
          max threads per multiprocessor: 2048
    copy 5000000 floats from CPU to GPU
       6854.000000 us, 2.918e+09 B/s
       1846.000000 us, 1.08342e+10 B/s pinned
    copy 5000000 floats from GPU to GPU 0:
       GPU 1: 849.000000 us, 2.35571e+10 B/s
       GPU 2: 851.000000 us, 2.35018e+10 B/s
       GPU 3: 443.000000 us, 4.51467e+10 B/s
       GPU 4: 444.000000 us, 4.5045e+10 B/s
       GPU 5: 2051.000000 us, 9.75134e+09 B/s
       GPU 6: 2026.000000 us, 9.87167e+09 B/s
       GPU 7: 2017.000000 us, 9.91572e+09 B/s
    add 5000000x5000000 floats:
       3.747896 s, 6670.409500 G/s
    peer add 5000000x5000000 GPU 0 floats:
       GPU 1: 3.755309 s, 6657.241500 G/s
       GPU 2: 3.731777 s, 6699.221000 G/s
       GPU 3: 3.735422 s, 6692.684500 G/s
       GPU 4: 3.736765 s, 6690.278500 G/s
    parallel peer add 5000000x5000000 GPU 0 floats:
       5 GPUs: 0.783563 s, 19940.962000 G/s
    
    ------------------------------------------------
    CPU_check
    ------------------------------------------------
    
    processor	: 95
    vendor_id	: GenuineIntel
    cpu family	: 6
    model		: 85
    model name	: Intel(R) Xeon(R) Platinum 8275CL CPU @ 3.00GHz
    stepping	: 7
    microcode	: 0x5002f01
    cpu MHz		: 1221.613
    cache size	: 36608 KB
    physical id	: 1
    siblings	: 48
    core id		: 23
    cpu cores	: 24
    apicid		: 111
    initial apicid	: 111
    fpu		: yes
    fpu_exception	: yes
    cpuid level	: 13
    wp		: yes
    flags		: fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ss ht syscall nx pdpe1gb rdtscp lm constant_tsc arch_perfmon rep_good nopl xtopology nonstop_tsc cpuid aperfmperf tsc_known_freq pni pclmulqdq monitor ssse3 fma cx16 pcid sse4_1 sse4_2 x2apic movbe popcnt tsc_deadline_timer aes xsave avx f16c rdrand hypervisor lahf_lm abm 3dnowprefetch invpcid_single pti fsgsbase tsc_adjust bmi1 avx2 smep bmi2 erms invpcid mpx avx512f avx512dq rdseed adx smap clflushopt clwb avx512cd avx512bw avx512vl xsaveopt xsavec xgetbv1 xsaves ida arat pku ospke avx512_vnni
    bugs		: cpu_meltdown spectre_v1 spectre_v2 spec_store_bypass l1tf mds swapgs itlb_multihit
    bogomips	: 5999.99
    clflush size	: 64
    cache_alignment	: 64
    address sizes	: 46 bits physical, 48 bits virtual
    power management:
    
    add 500000x500000 floats with 1 thread:
       65.058972 s, 3.842667 G/s
    add 500000x500000 floats with 96 threads:
       1.376176 s, 181.662813 G/s
    
    ------------------------------------------------
    pipe_check
    ------------------------------------------------
    
    send: 100000000 points
    receive: 4.301932 s, 9.29815e+07 B/s
    
    </pre>
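
The `1x1x4 chiral` and `coupon` lines give per-step phase timings. Below is a minimal sketch that turns one of those lines into each phase's share of the step. It assumes `srt`, `frc`, `int` and `cpy` are microsecond timings of the sort, force, integrate and copy phases. The `integrate` and `copy` sections use `int` and `cpy` that way, but the readings of `srt` and `frc` are guesses. `mps` and `out` are left out because they do not look like phase times. `parse_timings` and `phase_shares` are hypothetical helpers, not part of the original tooling.

```python
import re

# Hypothetical helper regex: "key: value" pairs on one timing line.
TIMING = re.compile(r"(\w+):\s*([0-9.e+]+)")

# Assumption: these four keys are per-step phase times in microseconds.
PHASES = ("srt", "frc", "int", "cpy")


def parse_timings(line: str) -> dict[str, float]:
    """Return every "key: number" pair on the line as floats."""
    return {key: float(value) for key, value in TIMING.findall(line)}


def phase_shares(timings: dict[str, float]) -> dict[str, float]:
    """Fraction of the summed phase time spent in each assumed phase."""
    total = sum(timings[p] for p in PHASES)
    return {p: timings[p] / total for p in PHASES}


if __name__ == "__main__":
    coupon = "mps: 277 srt: 289 frc: 3242 int: 91 cpy: 2251 out: 2.72e+05 (us)"
    for phase, share in phase_shares(parse_timings(coupon)).items():
        print(f"{phase}: {share:.1%}")
```

Under that reading, the `coupon` step splits into roughly 55% force, 38% copy, 5% sort and 1.5% integrate.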
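
The `simulate` and `integrate` estimates both divide the nominal single-precision peak by a particle rate. This short sketch reproduces the two figures, to rounding. It takes `mps: 277` as 277 million particles per second, which is the reading the `simulate` line uses. Because 14 TFlop/s is a nominal peak, the results bound the arithmetic available per particle. They are not measured operation counts.

```python
# Nominal V100 single-precision peak from the notes, in FLOP/s.
PEAK_SP_FLOPS = 14e12


def particles_per_second(n_particles: int, seconds: float) -> float:
    return n_particles / seconds


def ops_per_particle(peak_flops: float, particles_per_s: float) -> float:
    """Upper bound on operations per particle at a given particle rate."""
    return peak_flops / particles_per_s


if __name__ == "__main__":
    # simulate: 277 million particles per second (the notes' reading of mps)
    sim = ops_per_particle(PEAK_SP_FLOPS, 277e6)
    # integrate: 986154 particles in 91 us
    rate = particles_per_second(986154, 91e-6)
    integ = ops_per_particle(PEAK_SP_FLOPS, rate)
    print(f"simulate:  {sim:.0f} ops/particle")
    print(f"integrate: {rate / 1e9:.1f} G particles/s, {integ:.0f} ops/integrate")
```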
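
The `copy` estimate is total bytes moved divided by the `cpy` time. The notes state 7 arrays of 4-byte values per particle, and the sketch below takes that as given. Bytes per microsecond equal MB/s, which is why the result is scaled by 1e3 to get GB/s.

```python
BYTES_PER_FLOAT = 4  # 4-byte values, as in the notes


def copy_bandwidth_gb_s(n_particles: int, n_arrays: int, elapsed_us: float) -> float:
    """Effective copy bandwidth in GB/s (10^9 bytes per second)."""
    total_bytes = n_particles * n_arrays * BYTES_PER_FLOAT
    # bytes per microsecond == MB/s; divide by 1e3 to get GB/s
    return total_bytes / elapsed_us / 1e3


if __name__ == "__main__":
    print(f"{copy_bandwidth_gb_s(986154, 7, 2251):.2f} GB/s")
```

The result, about 12.3 GB/s, is in the same range as the pinned host-to-GPU copy measured in `GPU_check` (about 10.8 GB/s).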
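
The `GPU_check` peer-access results are easier to reason about as a matrix. This sketch holds them as hand-transcribed row strings and checks a few properties. The names `PEER_ROWS`, `peers` and `fully_connected` are illustrative only. The checks confirm that the matrix is symmetric, that each GPU reports exactly four peers, and that GPUs 0-3 and GPUs 4-7 each form a fully connected group.

```python
# Peer-access results from GPU_check, transcribed by hand.
# Row = from GPU, column = to GPU; "1" = yes, "0" = no, "-" = same GPU.
PEER_ROWS = [
    "-1111000",
    "1-110100",
    "11-10010",
    "111-0001",
    "1000-111",
    "01001-11",
    "001011-1",
    "0001111-",
]


def peer_matrix(rows: list[str]) -> list[list[bool]]:
    """Convert row strings to booleans; the "-" diagonal becomes False."""
    return [[ch == "1" for ch in row] for row in rows]


def is_symmetric(m: list[list[bool]]) -> bool:
    n = len(m)
    return all(m[i][j] == m[j][i] for i in range(n) for j in range(n))


def peers(m: list[list[bool]], gpu: int) -> list[int]:
    """GPUs that `gpu` reports peer access to."""
    return [j for j, ok in enumerate(m[gpu]) if ok]


def fully_connected(m: list[list[bool]], gpus: list[int]) -> bool:
    """True if every pair in `gpus` reports peer access both ways."""
    return all(m[a][b] and m[b][a] for a in gpus for b in gpus if a != b)


if __name__ == "__main__":
    m = peer_matrix(PEER_ROWS)
    print("symmetric:", is_symmetric(m))
    print("peers of GPU 0:", peers(m, 0))
    print("0-3 fully connected:", fully_connected(m, [0, 1, 2, 3]))
    print("4-7 fully connected:", fully_connected(m, [4, 5, 6, 7]))
```

GPU 0's peers (1-4) are the GPUs used in the `peer add` lines. The copies from GPUs 5-7, which report no peer access to GPU 0, take about 2 ms, against 0.44-0.85 ms for GPUs 1-4.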
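
Every bandwidth in the `GPU_check` copy lines is 5,000,000 four-byte floats divided by the elapsed time. The sketch below recomputes them and the pinned-memory speedup. It labels the unlabeled first host-to-GPU line "pageable", on the assumption that it used an ordinary (non-pinned) host buffer. The notes themselves do not say so.

```python
FLOAT_BYTES = 4  # float32
N_FLOATS = 5_000_000


def bandwidth_b_s(n_floats: int, elapsed_us: float) -> float:
    """Bytes per second for copying n_floats 4-byte values in elapsed_us."""
    return n_floats * FLOAT_BYTES / (elapsed_us * 1e-6)


# Timings in microseconds, copied from the GPU_check output.
# Assumption: the unlabeled host->GPU line is a pageable (non-pinned) buffer.
HOST_TO_DEVICE_US = {"pageable": 6854.0, "pinned": 1846.0}
PEER_TO_GPU0_US = {1: 849.0, 2: 851.0, 3: 443.0, 4: 444.0,
                   5: 2051.0, 6: 2026.0, 7: 2017.0}

if __name__ == "__main__":
    for kind, us in HOST_TO_DEVICE_US.items():
        print(f"host->GPU {kind}: {bandwidth_b_s(N_FLOATS, us):.4g} B/s")
    speedup = HOST_TO_DEVICE_US["pageable"] / HOST_TO_DEVICE_US["pinned"]
    print(f"pinned speedup: {speedup:.2f}x")
    for gpu, us in PEER_TO_GPU0_US.items():
        print(f"GPU {gpu} -> GPU 0: {bandwidth_b_s(N_FLOATS, us) / 1e9:.1f} GB/s")
```

The recomputed GPU-to-GPU figures fall into three tiers:
- GPUs 3 and 4: near 45 GB/s.
- GPUs 1 and 2: near 23.5 GB/s.
- GPUs 5-7: below 10 GB/s.

Pinned host memory is about 3.7x faster than the first host copy.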
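
The single-GPU and peer `add` lines in `GPU_check` are consistent with G/s = n * m / t / 1e9. For example, 5e6 * 5e6 / 3.747896 s is about 6670 G/s. This sketch reproduces those lines and compares each peer add with the local one. The helper name is hypothetical, and reading n x m as n * m float additions is inferred from the arithmetic, not stated in the notes.

```python
N = 5_000_000
LOCAL_S = 3.747896  # add on GPU 0, seconds
PEER_S = {1: 3.755309, 2: 3.731777, 3: 3.735422, 4: 3.736765}


def adds_per_second_g(n: int, m: int, seconds: float) -> float:
    """Billions of additions per second, assuming n * m additions in total."""
    return n * m / seconds / 1e9


if __name__ == "__main__":
    local = adds_per_second_g(N, N, LOCAL_S)
    print(f"local: {local:.1f} G/s")
    for gpu, s in PEER_S.items():
        rate = adds_per_second_g(N, N, s)
        print(f"peer GPU {gpu}: {rate:.1f} G/s ({rate / local - 1:+.2%} vs local)")
```

All four peer adds land within about half a percent of the local rate.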
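
The `CPU_check` cpuinfo entry (processor 95, 24 cores and 48 siblings per physical package) implies the following topology, as the guest reports it:
- 2 sockets.
- 48 physical cores.
- 96 logical CPUs.

The sketch below uses that topology to turn the two `add` timings into a speedup and a parallel efficiency. The constants and helper names are illustrative.

```python
# Topology as reported by /proc/cpuinfo in CPU_check.
LOGICAL_CPUS = 96        # processor 0..95
CORES_PER_SOCKET = 24    # "cpu cores"
THREADS_PER_SOCKET = 48  # "siblings"
SOCKETS = LOGICAL_CPUS // THREADS_PER_SOCKET   # 2
PHYSICAL_CORES = SOCKETS * CORES_PER_SOCKET    # 48

T1_S = 65.058972   # 500000x500000 adds, 1 thread
T96_S = 1.376176   # same work, 96 threads


def speedup(t_serial: float, t_parallel: float) -> float:
    return t_serial / t_parallel


def efficiency(t_serial: float, t_parallel: float, workers: int) -> float:
    """Speedup divided by the number of workers it is spread over."""
    return speedup(t_serial, t_parallel) / workers


if __name__ == "__main__":
    print(f"speedup: {speedup(T1_S, T96_S):.1f}x")
    print(f"per physical core: {efficiency(T1_S, T96_S, PHYSICAL_CORES):.1%}")
    print(f"per logical CPU:   {efficiency(T1_S, T96_S, LOGICAL_CPUS):.1%}")
```

The 96-thread run is about 47x faster than one thread. That works out to roughly 98% efficiency per physical core, or 49% per logical CPU. For scale, the single-V100 add rate (6670 G/s) is about 37x the 96-thread CPU rate (181.7 G/s).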
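
The `pipe_check` line counts points but reports bytes per second. Its figures are consistent with 4 bytes per point, for example one float32 each, though the notes do not say what a point holds. This sketch backs out the implied size and the point rate. The helper names are hypothetical.

```python
N_POINTS = 100_000_000
SECONDS = 4.301932
REPORTED_B_S = 9.29815e7


def implied_bytes_per_point(bytes_per_s: float, seconds: float, n_points: int) -> float:
    """Bytes per point implied by a reported bandwidth, duration and count."""
    return bytes_per_s * seconds / n_points


def points_per_second(n_points: int, seconds: float) -> float:
    return n_points / seconds


if __name__ == "__main__":
    print(f"{implied_bytes_per_point(REPORTED_B_S, SECONDS, N_POINTS):.2f} bytes/point")
    print(f"{points_per_second(N_POINTS, SECONDS) / 1e6:.2f} M points/s")
```

At about 93 MB/s, the pipe is roughly two orders of magnitude slower than the pinned host-to-GPU copy in `GPU_check`.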