#GPU


Panic most recently used by lkpikmalloc ...

Well, that was fast... I didn't even get a mouse cursor or a full MATE Desktop menu system load. I had yet to connect kgdb to COM1 (I need to swap from minicom to do so)... makes me want a PCIe RS232 card (for "comconsole_pcidev") so that I have a few more COMs to play with on redirects. Gotta love these iGPU trash-bins, eh? "It's better than not having a GPU, right?" ... not really.

There is a closed bug report from drm-515-kmod discussing an amdgpu memory leak, so maybe there is a new one in drm-61-kmod; I would not be surprised.
- github.com/freebsd/drm-kmod/is

Short term revision of approach:
----

1. Arriving today by post: an AMD Radeon Pro W7500 (single slot, 8GB, Navi-whatever gen).
2. I'll block off the iGPU during the loader.conf sequence, using a "pptdev" blackhole (not for VM passthrough, but maybe an experiment for a 14.1 VM with the known-good amdgpu version).
3. Known as: throw money at the problem?

Some hardware notes:
----

1. This is not an Nvidia GPU situation; there are several generations of cards in the room which have been cycled through the workstation during "hardware isolation" and "process of elimination" sequences. I know those are stable, and I know which card generations require which nvidia driver versions for stability.

2. This is not a FreeBSD kernel issue, nor an Xorg "Plain Jane FrameBuffer" situation. The kernel (14.0, 14.1, 14.2) is stable and fine, and the basic vt driver works fine for non-4K DisplayPort output. I can work all day in a series of tmux windows with some fifty or so panes, but that's not quite the optimal experience.

3. The AMD iGPU (Raphael) maxes out by default at 512MB of GART VRAM, and it can handle 240Hz @ 4K all day with no issues as long as that 512M doesn't get used up... that is, until the latest amdgpu kmod drm, which crashes whenever it feels like it.

Michael... yes yes, I do have a lot of hardware, but this issue has surpassed the Sunk Cost Fallacy and has become a consummate knowledge-requirement process. I must know where this is failing so horrendously, otherwise the operating rule of "if it doesn't fulfill its hardware destiny, it will get the hammer and flames" applies... and the hardware is too nice for that. Plus, I could involve Supermicro support since it's still in warranty, but a replacement motherboard or CPU for the iGPU isn't going to solve a kernel module issue.

In the interim, laptop life and tablet meetings are getting me by, mostly decently.

Debug items of interest:
----
intsmb0: <AMD FCH SMBus Controller> at device 20.0 on pci0
intsmb0: Could not allocate I/O space
device_attach: intsmb0 attach returned 6

drmn0: Fetched VBIOS from VFCT
amdgpu: ATOM BIOS: 102-RAPHAEL-008
drmn0: Trusted Memory Zone (TMZ) feature not supported
drmn0: PCIE atomic ops is not supported
drmn0: VRAM: 512M 0x000000F400000000 - 0x000000F41FFFFFFF (512M used)
[drm ERROR :amdgpu_bo_init] Unable to set WC memtype for the aperture base
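
As a quick sanity check on the VRAM line in the log above, the address range can be compared against the reported size. This is only a minimal sketch: the helper name `vram_range_mib` and the regular expression are made up for illustration and assume the log line has exactly the shape shown above.

```python
import re

# Log line copied from the dmesg excerpt above.
line = "drmn0: VRAM: 512M 0x000000F400000000 - 0x000000F41FFFFFFF (512M used)"

def vram_range_mib(log_line):
    """Return (reported_mib, computed_mib) for a drm 'VRAM:' log line.

    Hypothetical helper: assumes the 'VRAM: <n>M <start> - <end>' layout
    seen in the excerpt, with an inclusive end address.
    """
    m = re.search(
        r"VRAM:\s+(\d+)M\s+(0x[0-9A-Fa-f]+)\s+-\s+(0x[0-9A-Fa-f]+)", log_line
    )
    if m is None:
        raise ValueError("not a VRAM line")
    reported = int(m.group(1))
    start = int(m.group(2), 16)
    end = int(m.group(3), 16)
    # The end address is inclusive, so add 1 before converting to MiB.
    computed = (end - start + 1) // (1024 * 1024)
    return reported, computed

reported, computed = vram_range_mib(line)
print(reported, computed)
```

If the two numbers agree, the aperture in the log matches the 512M carve-out mentioned in the hardware notes.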

Loader items in use:
----
# Multi-Console Output
# boot output primary: TTY, standard monitor via UEFI
# boot output secondary: COM1 RS232 Redirect (physical)
# boot output tertiary: COM2 RS232 Redirect (BMC SoL)
ipmi_load="YES"
boot_mute="NO"
boot_verbose="YES"
verbose_loading="YES"
boot_multicons="YES"
boot_serial="YES"
console="efi,comconsole,comconsole"
comconsole_port1="0x3F8"
comconsole_speed1="115200"
comconsole_port2="0x2F8"
comconsole_speed2="115200"
hw.uart.console="io:0x3f8,br:115200 io:0x2f8,br:115200"
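
To double-check that the `hw.uart.console` value lines up with the two COM ports set above, the string can be split into its parts. This is a rough sketch only: `parse_uart_console` is a hypothetical helper, it simply mirrors the string as written, and it does not verify whether the kernel accepts two space-separated specs. The COM1/COM2 labels assume the conventional PC mapping of 0x3F8 and 0x2F8.

```python
def parse_uart_console(value):
    """Split a hw.uart.console-style string into a list of port dicts.

    Hypothetical helper: assumes space-separated specs, each made of
    comma-separated 'key:value' items with 'io' (hex) and 'br' (baud).
    """
    ports = []
    for spec in value.split():
        fields = {}
        for item in spec.split(","):
            key, _, val = item.partition(":")
            fields[key] = val
        ports.append({"io": int(fields["io"], 16), "baud": int(fields["br"])})
    return ports

# Value copied from the loader.conf excerpt above.
ports = parse_uart_console("io:0x3f8,br:115200 io:0x2f8,br:115200")
for n, p in enumerate(ports, start=1):
    # Labelling the n-th entry COM<n> is an assumption based on the
    # conventional 0x3F8 = COM1, 0x2F8 = COM2 mapping.
    print(f"COM{n}: io={p['io']:#x} baud={p['baud']}")
```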

#amd #gpu #drm616kmod

ok, I've to go now but the plan once I return ~2 hrs later is to dig thru my pendrives and use one to back up my current BIOS settings, use another to load the latest stable BIOS version, take pics of my current BIOS settings, upgrade to the latest BIOS version, and lastly re-configure the BIOS as close as possible to my current settings.

fingers crossed, my #Proxmox node would still be fine and hopefully, the ReBAR option would show up on my BIOS, cos rn on BIOS v4.6 (2020), it's not (even with CSM disabled, above 4G decoding enabled). My server hardware: #AMD Ryzen 7 1700, #ASRock B450M Pro4 #motherboard, and #Intel #ArcA380 #GPU.

#AMD splits #ROCm toolkit into two parts – ROCm #AMDGPU drivers get their own branch under Instinct #datacenter #GPU moniker
The new #datacenter Instinct driver is a renamed version of the #Linux AMDGPU driver packages that are already distributed and documented with ROCm. Previously, everything related to ROCm (including the amdgpu driver) existed as part of the ROCm software stack.
tomshardware.com/pc-components

Tom's Hardware · AMD splits ROCm toolkit into two parts – ROCm AMDGPU drivers get their own branch under Instinct datacenter GPU moniker · By Aaron Klotz

Does anyone have a collection of .dds (and maybe also .ktx and/or .ktx2) textures I could use to test my texture viewer (github.com/DanielGibson/texvie)?

For DDS I only found test textures from GLI and unfortunately many of them are invalid/broken :-/
For KTX(2) I have the ones from libktx (and the broken ones from GLI).

Thanks in advance! :)

GitHub - DanielGibson/texview at dev: crossplatform texture viewer.

Had fun at #CosmosDBConf showcasing “Accelerating Real-Time Analytics with Cosmos DB and #NVIDIA GPU-Enhanced Serverless Apache Spark!” We used separate CPU/GPU builds in our Maven POM via profiles, Dockerfiles for container apps, and GPU-accelerated Spark to handle streaming data at warp speed. ⚡🚀

Missed it? Catch the recording: youtu.be/x-0-S0MS5ko?si=wr6pKX
Sample code here: aka.ms/sparkrapidsgpudemo

Data centers, GPUs, NVIDIA A16, cooling: serious matters in plain language

Good day, dear reader. My name is Pavel Seleznev, and I am a second-line support engineer at the cloud provider Nubes. With each new article I get promoted, so I'm writing another one :) A few months ago, a colleague and I were given a task: run comparative tests to check how much a graphics card heats up under load when cooled with air versus dielectric liquid. Those tests are what I'll describe in this article, which should shed some light on the life of a GPU in a data center.

Preface

As the title suggests, this is about the life of a GPU in the context of a data center, the tests we ran on different cooling options, and the conclusions our team reached from those tests and discussions. We tested an NVIDIA A16 GPU over several days. At the time of writing, our data center is cooled by precision air conditioners, with freon as the refrigerant. The system consists of large industrial cabinets (air conditioners) that continuously cool the air heated by the equipment using that same freon. The picture shows the heat-exchange process in simplified form.

habr.com/ru/companies/nubes/ar
