#fluidx3d

Dr. Moritz Lehmann:
Battle of the giants: Nvidia #Blackwell B200 takes the lead in FluidX3D CFD performance

#Nvidia #B200 just launched, and I'm one of the first people to benchmark 8x B200 via Shadeform, in a WhiteFiber server with 2x #Intel #Xeon6 6960P 72-core CPUs. 🖖😋

8x Nvidia B200 go head-to-head with 8x #AMD #MI300X in the #FluidX3D #CFD benchmark, winning overall (with FP16S storage) at 219300 MLUPs/s (~17TB/s combined VRAM bandwidth), but losing in FP32 & FP16C storage. 8x MI300X achieve 204924 MLUPs/s.

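The FP16S/FP16C storage modes mentioned here compress the LBM density distribution functions to 16 bits in VRAM while all arithmetic stays in FP32 registers, roughly halving memory traffic. FluidX3D's actual 16-bit formats are custom; as a minimal sketch of the idea, assuming plain IEEE FP16 and the standard OpenCL C vload_half/vstore_half built-ins:

```c
// Hypothetical sketch, not FluidX3D's exact code: DDFs stored as FP16 in
// VRAM, decompressed to FP32 in registers for the collision arithmetic.
kernel void stream_collide(global half* ddf, const uint n) {
    const uint i = get_global_id(0);
    if (i >= n) return;
    float f = vload_half(i, ddf); // FP16 -> FP32 on load (half the bandwidth)
    f *= 0.99f;                   // placeholder for the actual collision step
    vstore_half_rte(f, i, ddf);   // FP32 -> FP16 on store, round-to-nearest
}
```

Since the LBM is bandwidth-bound, halving the bytes per cell nearly doubles throughput, which is why the FP16S numbers above are the headline figures.
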
Dr. Moritz Lehmann:
My #IWOCL 2025 keynote presentation is online! 🖖🧐
Scaling up #FluidX3D #CFD beyond 100 Billion cells on a single computer - a story about the true cross-compatibility of #OpenCL
https://www.youtube.com/watch?v=Sb3ibfoOi0c&list=PLA-vfTt7YHI2HEFrpzPhhQ8PhiztKhHU8&index=1
Slides: https://www.iwocl.org/wp-content/uploads/iwocl-2025-moritz-lehmann-keynote.pdf

Dr. Moritz Lehmann:
What an honor to start the #IWOCL conference with my keynote talk! Nowhere else do you get to talk to so many #OpenCL and #SYCL experts in one room! I shared some updates on my #FluidX3D #CFD solver: how I optimized it at the smallest level of a single grid cell, and how I scaled it up on the largest #Intel #Xeon6 #HPC systems, which provide more memory capacity than any #GPU server. 🖖😃

Dr. Moritz Lehmann:
Just arrived in wonderful Heidelberg, looking forward to presenting the keynote talk at #IWOCL tomorrow!! See you there! 🖖😁
https://www.iwocl.org/ #OpenCL #SYCL #FluidX3D #GPU #HPC

Dr. Moritz Lehmann:
I made this #FluidX3D #CFD simulation run on a Frankenstein zoo of 🟥AMD + 🟩Nvidia + 🟦Intel #GPUs! 🖖🤪
https://www.youtube.com/watch?v=_8Ed8ET9gBU

The ultimate SLI abomination setup:
- 1x Nvidia A100 40GB
- 1x Nvidia Tesla P100 16GB
- 2x Nvidia A2 15GB
- 3x AMD Instinct MI50
- 1x Intel Arc A770 16GB

I split the 2.5B cells into 9 domains of 15GB each - the A100 takes 2 domains, the other GPUs 1 domain each. The GPUs communicate over PCIe via #OpenCL.

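Mixing vendors works here because every device only sees standard OpenCL buffers; boundary layers between domains can be staged through host memory over PCIe. A hedged sketch of such a halo exchange, with illustrative names and offsets (not FluidX3D's actual communication code):

```c
#include <CL/cl.h>

// Copy one domain's boundary layer to its neighbor on another GPU by
// round-tripping through a host staging buffer. Blocking calls keep the
// sketch simple; a real implementation would overlap transfers with compute.
void exchange_halo(cl_command_queue q_src, cl_mem buf_src, size_t src_off,
                   cl_command_queue q_dst, cl_mem buf_dst, size_t dst_off,
                   void* staging, size_t halo_bytes) {
    clEnqueueReadBuffer(q_src, buf_src, CL_TRUE, src_off, halo_bytes,
                        staging, 0, NULL, NULL);  // GPU A VRAM -> host RAM
    clEnqueueWriteBuffer(q_dst, buf_dst, CL_TRUE, dst_off, halo_bytes,
                         staging, 0, NULL, NULL); // host RAM -> GPU B VRAM
}
```
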
Dr. Moritz Lehmann:
I got access to @LRZ_DE's new coma-cluster for #OpenCL benchmarking and experimentation 🖖😋💻🥨🍻
I've added a ton of new #FluidX3D #CFD #GPU/#CPU benchmarks:
https://github.com/ProjectPhysX/FluidX3D?tab=readme-ov-file#single-gpucpu-benchmarks

Notable hardware configurations include:
- 4x H100 NVL 94GB
- 2x Nvidia L40S 48GB
- 2x Nvidia A2 15GB datacenter toaster
- 2x Intel Arc A770 16GB
- AMD+Nvidia SLI abomination consisting of 3x Instinct MI50 32GB + 1x A100 40GB
- AMD Radeon 8060S (chonky Ryzen AI Max+ 395 iGPU with quad-channel RAM) thanks to @cheese

Dr. Moritz Lehmann:
#FluidX3D #CFD v3.2 is out! I've implemented the much-requested #GPU summation for object force/torque; it's ~20x faster than #CPU #multithreading. 🖖😋
The horizontal sum in #OpenCL was a nice exercise: first a local-memory reduction, then a hardware-supported atomic floating-point add in VRAM, all in a single-stage kernel. Hammering atomics isn't too bad, as each of the ~10-340 workgroups dispatched at a time does only a single atomic add.
Also improved volumetric #raytracing!
https://github.com/ProjectPhysX/FluidX3D/releases/tag/v3.2

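A minimal OpenCL C sketch of that pattern (my illustration, not the v3.2 kernel): a tree reduction in local memory, then a single global atomic add per workgroup. Hardware FP32 atomic add needs the cl_ext_float_atomics extension, so this sketch uses the portable compare-exchange fallback:

```c
// Portable atomic float add via 32-bit compare-exchange (OpenCL 1.1+).
void atomic_add_f(volatile global float* addr, const float val) {
    union { float f; uint u; } old_v, new_v;
    do {
        old_v.f = *addr;
        new_v.f = old_v.f + val;
    } while (atomic_cmpxchg((volatile global uint*)addr,
                            old_v.u, new_v.u) != old_v.u);
}

kernel void sum_force(const global float* cell_force, global float* total,
                      local float* scratch, const uint n) {
    const uint gid = get_global_id(0), lid = get_local_id(0);
    scratch[lid] = (gid < n) ? cell_force[gid] : 0.0f;
    barrier(CLK_LOCAL_MEM_FENCE);
    // local-memory tree reduction within the workgroup
    for (uint s = get_local_size(0) / 2u; s > 0u; s >>= 1u) {
        if (lid < s) scratch[lid] += scratch[lid + s];
        barrier(CLK_LOCAL_MEM_FENCE);
    }
    if (lid == 0u) atomic_add_f(total, scratch[0]); // one atomic per workgroup
}
```

Because only one work-item per workgroup touches the global accumulator, atomic contention stays tiny relative to the reduction work, which is the point made above.
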
Dr. Moritz Lehmann:
Hot Aisle's 8x AMD #MI300X server is the fastest computer I've ever tested in #FluidX3D #CFD, achieving a peak #LBM performance of 205 GLUPs/s and a combined VRAM bandwidth of 23 TB/s. 🖖🤯
The #RTX 5090 looks like a toy in comparison.

MI300X beats even Nvidia's GH200 94GB. This marks a very fascinating inflection point in #GPGPU: #CUDA is not the performance leader anymore. 🖖😛
You need a cross-vendor language like #OpenCL to leverage its power.

FluidX3D on #GitHub: https://github.com/ProjectPhysX/FluidX3D

Dr. Moritz Lehmann:
I'm doing a podcast about #FluidX3D today with Improbable Matter, going live in 30 minutes! 🖖🤠
https://youtu.be/csGLVZqr0SE

Dr. Moritz Lehmann:
The 4x #Nvidia #H100 SXM5 server in the new Festus cluster at Uni Bayreuth is the fastest system I've ever tested in #FluidX3D #CFD, achieving 78 GLUPs/s #LBM performance at ~1650W #GPU power draw. 🖖😋🖥️🔥
https://github.com/ProjectPhysX/FluidX3D?tab=readme-ov-file#multi-gpu-benchmarks
https://www.hpc.uni-bayreuth.de/clusters/festus/#__tabbed_1_3

Dr. Moritz Lehmann:
#FluidX3D #CFD v3.1 is out! I have updated the #OpenCL headers for better device-specs detection via device ID and #Nvidia compute capability, fixed broken voxelization on some #GPUs, and added a workaround for a CPU compiler bug that corrupted rendering. Also, #AMD GPUs will now show up with their correct name (no idea why AMD can't report it as CL_DEVICE_NAME like every other sane vendor and instead needs the CL_DEVICE_BOARD_NAME_AMD extension...)
Have fun! 🖖😉
https://github.com/ProjectPhysX/FluidX3D/releases/tag/v3.1

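The AMD naming workaround boils down to one extra clGetDeviceInfo query. A sketch of that idea, assuming the cl_amd_device_attribute_query extension's CL_DEVICE_BOARD_NAME_AMD token (0x4038 in cl_ext.h); on non-AMD devices the call fails cleanly and the standard name is used:

```c
#include <CL/cl.h>
#include <stdio.h>

#ifndef CL_DEVICE_BOARD_NAME_AMD
#define CL_DEVICE_BOARD_NAME_AMD 0x4038 // from cl_amd_device_attribute_query
#endif

void print_device_name(cl_device_id dev) {
    char name[256] = {0};
    // Prefer the board/marketing name on AMD; fall back to CL_DEVICE_NAME,
    // which on AMD GPUs may only report the chip codename (e.g. "gfx1030").
    if (clGetDeviceInfo(dev, CL_DEVICE_BOARD_NAME_AMD,
                        sizeof(name), name, NULL) != CL_SUCCESS)
        clGetDeviceInfo(dev, CL_DEVICE_NAME, sizeof(name), name, NULL);
    printf("%s\n", name);
}
```
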
Dr. Moritz Lehmann:
RTX 5090 performance numbers for #FluidX3D are in - thanks to @phoronix! And I finally found a way to format the performance chart on the #FluidX3D #GitHub page a bit better - especially with a larger font size. Hacking the mermaid gantt chart is currently the only way to embed a compact bar chart directly into markdown, without an extra image file.
The mermaid language is still horrific - inconsistent, and half the styling commands don't even work. No way yet to color the bars blue.
https://github.com/ProjectPhysX/FluidX3D?tab=readme-ov-file#single-gpucpu-benchmarks

Dr. Moritz Lehmann:
@phoronix nice, the 512-bit memory bus doing its thing in #FluidX3D #CFD! 🖖😋
Thanks for benchmarking!

Dr. Moritz Lehmann:
3 different #GPUs, 1 #CFD simulation - #FluidX3D "SLI"-ing (Intel A770 + Intel B580 + Nvidia Titan Xp) for 678 Million grid cells in 36GB combined VRAM
https://www.youtube.com/watch?v=9VP3fruwnXc

Dr. Moritz Lehmann:
Finally 2¹² ⭐ for #FluidX3D on #GitHub! 🖖🤓

Dr. Moritz Lehmann:
@st01014 I've added B580 #FluidX3D benchmarks with a non-zero-initialized box: https://github.com/ProjectPhysX/FluidX3D?tab=readme-ov-file#single-gpucpu-benchmarks
(scroll down below the bar chart and expand the section with the full table there)

Dr. Moritz Lehmann:
@st01014 it's wrong, unfortunately. The B580 has a hardware optimization that detects if a kernel writes all 0's to VRAM, and in this case skips the write completely, which saves a lot of bandwidth. The #FluidX3D benchmark is a 0-initialized box, where the B580 applies this. In a non-0-initialized simulation, performance is more in line with what you'd expect from 456GB/s.
It's a bit of an edge case (no other #GPU does that), so I have not yet made adjustments on the app side. Will post good benchmarks on the weekend.
Cc @phoronix

Dr. Moritz Lehmann:
The #FluidX3D x #Intel t-shirt is the coolest thing ever!! #SC24

Dr. Moritz Lehmann:
This is the largest #CFD simulation ever run on a single computer: the #NASA X-59 at 117 Billion grid cells. This video visualizes 7.6 PetaByte of volumetric data.

I did this simulation on 2x #Intel Xeon 6980P #HPC CPUs with 6TB MRDIMM memory at a massive 1.7TB/s bandwidth. No #GPUs required! 🖖😋🟦

https://www.youtube.com/watch?v=K5eKxzklXDA

As a little gift to you all: #FluidX3D v3.0 is out now, enabling 31% larger resolution on CPUs/iGPUs with #OpenCL zero-copy buffers:
https://github.com/ProjectPhysX/FluidX3D/releases/tag/v3.0

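Zero-copy here means the CPU/iGPU device works directly in host RAM rather than keeping a separate device-side copy of the grid, which is where the extra resolution headroom comes from. A minimal host-side sketch, assuming a unified-memory device; note that CL_MEM_ALLOC_HOST_PTR behavior is implementation-defined:

```c
#include <CL/cl.h>
#include <stddef.h>

// Request a buffer the runtime may back directly with host RAM; on CPUs and
// iGPUs this typically avoids allocating a second "device" copy of the grid.
cl_mem create_zero_copy_buffer(cl_context ctx, size_t bytes) {
    cl_int err;
    cl_mem buf = clCreateBuffer(ctx, CL_MEM_READ_WRITE | CL_MEM_ALLOC_HOST_PTR,
                                bytes, NULL, &err);
    return (err == CL_SUCCESS) ? buf : NULL;
}
```

Host-side access then goes through clEnqueueMapBuffer/clEnqueueUnmapMemObject rather than explicit read/write copies, keeping a single allocation shared by host and device.
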
Dr. Moritz Lehmann:
@giuseppebilotta yes, #FluidX3D can real-time render in #ASCII mode over SSH! We'll have that at #SC24 as a live demo!