fosstodon.org is one of the many independent Mastodon servers you can use to participate in the fediverse.
Fosstodon is an invite-only Mastodon instance open to those interested in technology, particularly free and open-source software. If you wish to join, contact us for an invite.

Server stats: 8.6K active users

#inference

1 post · 1 participant · 0 posts today
Adam
State of the Model Serving Communities - August 2025 by @terrytangyuan
https://inferenceops.substack.com/p/state-of-the-model-serving-communities
#OpenSource #Kubernetes #AI #Inference #ModelServing #RedHat

Habr
A new search method from Sakana: extending inference-time scaling, plus collective intelligence
The red_mad_robot analytics team continues its series on research from the Japanese lab Sakana AI. Last time it was the CTM architecture, inspired by the internal dynamics of human thought; this time it is a method that helps language models reason more accurately at inference time. The paper presents two approaches: AB-MCTS and its extension, Multi-LLM AB-MCTS. The first combines two principles, refining answers that already exist and generating alternatives; the second adds several language models working together. The goal is to teach models to "think" both deeper and wider at the same time.
https://habr.com/ru/companies/redmadrobot/articles/933222/
#ai #llm #monte_carlo_tree_search #abmcts #inference #reasoning #thompson_sampling #reinforcement_learning

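The refine-or-generate choice described above can be pictured as a two-armed bandit decided by Thompson sampling (one of the post's tags). Below is a minimal Python sketch under that reading: the Beta posteriors, the binary "did this beat the best score" reward, and all function names are illustrative assumptions, not Sakana's actual algorithm.

```python
import random

class Arm:
    """Beta(1,1)-prior bandit arm for one action: 'refine' or 'generate'."""
    def __init__(self):
        self.wins, self.losses = 1, 1

    def sample(self):
        return random.betavariate(self.wins, self.losses)

    def update(self, improved):
        if improved: self.wins += 1
        else: self.losses += 1

def ab_mcts_step(refine_arm, gen_arm, candidates, refine_fn, gen_fn, score):
    """One search step: Thompson-sample both arms, act on the larger draw."""
    best = max((score(c) for c in candidates), default=float("-inf"))
    if candidates and refine_arm.sample() > gen_arm.sample():
        new = refine_fn(max(candidates, key=score))   # go deeper
        refine_arm.update(score(new) > best)
    else:
        new = gen_fn()                                # go wider
        gen_arm.update(score(new) > best)
    candidates.append(new)

# Toy usage: "answers" are numbers, refining nudges them upward.
score = lambda x: x
refine_fn = lambda x: x + random.uniform(-0.1, 0.3)
gen_fn = lambda: random.uniform(0, 1)
refine_arm, gen_arm, cands = Arm(), Arm(), []
for _ in range(50):
    ab_mcts_step(refine_arm, gen_arm, cands, refine_fn, gen_fn, score)
print(round(max(cands), 3))
```

Over time the arm whose moves keep beating the running best accumulates posterior mass and gets picked more often, which is the "deeper and wider at once" behaviour the post describes.
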
katch wreck
"His initial intended uses were for linguistic analysis and other mathematical subjects like card shuffling, but both Markov chains and matrices rapidly found use in other fields."
https://en.wikipedia.org/wiki/Stochastic_matrix#History
#AndreyMarkov #Markov #MarkovChain #MarkovModel #statistics #stochastic #stochasticProcess #randomWalk #statisticalPhysics #physics #inference #distribution #equilibrium #transitionRate #transitionMatrix #MarkovMatrix #stochasticMatrix #linearAlgebra #differentialEquation #equation

eicker.news ᳇ tech news
#Mistral AI conducted a comprehensive #lifecycle #analysis of its #LLM, Mistral Large 2, to quantify its #environmental #impact. The #study, #peerreviewed and compliant with international standards, revealed the model's #training and #inference #impacts, including #greenhousegas emissions, #water use, and #resourcedepletion.
https://mistral.ai/news/our-contribution-to-a-global-environmental-standard-for-ai
#tech #media #news

Dr Mircea Zloteanu ☀️ 🌊🌴
#statstab #391 {sensemakr}: Sensitivity Analysis Tools for OLS
Thoughts: "No unobserved confounders" is an untestable assumption, but you can quantify how robust your ATE is to violations of it.
#R #causalinference #observational #inference #confounding #bias #sensitivity
https://carloscinelli.com/sensemakr/

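The robustness value sensemakr reports has a simple closed form (Cinelli & Hazlett, 2020): with f² the partial Cohen's f² of the treatment, RV = ((f⁴ + 4f²)^½ − f²)/2. A minimal Python sketch of that arithmetic, offered as an illustration of the formula rather than a substitute for the package; the example numbers are illustrative:

```python
def robustness_value(t_stat: float, dof: int) -> float:
    """Share of residual variance a confounder must explain, of both
    treatment and outcome, to drive the estimate to zero
    (Cinelli & Hazlett 2020, RV at q = 1)."""
    f2 = (t_stat / dof ** 0.5) ** 2   # partial Cohen's f^2 of the treatment
    return 0.5 * ((f2 ** 2 + 4 * f2) ** 0.5 - f2)

# Example: a treatment with t = 4.18 on 783 residual degrees of freedom.
print(round(robustness_value(4.18, 783), 3))   # ~0.139, i.e. about 13.9%
```

Reading: any confounder explaining less than ~13.9% of the residual variance of both treatment and outcome could not, on its own, explain away that estimate.
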
Some Bits: Nelson's Linkblog
Laptop LLM: You can run decent AIs on a small computer
https://www.technologyreview.com/2025/07/17/1120391/how-to-run-an-llm-on-your-laptop/
#inference #hardware #llama #llm #ai

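Local inference along these lines takes only a few lines with a llama.cpp binding. A minimal sketch using llama-cpp-python, where the model file path is a placeholder assumption (any small GGUF-quantized model downloaded from, say, Hugging Face will do):

```python
from llama_cpp import Llama  # pip install llama-cpp-python

# Load a small quantized model; the path is a placeholder for whatever
# GGUF file you downloaded.
llm = Llama(model_path="models/qwen2.5-3b-instruct-q4_k_m.gguf",
            n_ctx=4096,        # context window
            n_gpu_layers=-1)   # offload all layers if a GPU is available

out = llm("Q: Why is the sky blue? A:", max_tokens=128, stop=["Q:"])
print(out["choices"][0]["text"])
```

A 3-4B model quantized to 4 bits fits in a few GB of RAM, which is what makes the laptop scenario in the article practical.
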
st1nger 🏴‍☠️
#GPUHammer is the first attack to show #Rowhammer bit flips on #GPU memories, specifically on GDDR6 memory in an #NVIDIA A6000 GPU. Our attacks induce bit flips across all tested DRAM banks, despite in-DRAM defenses like TRR, using user-level #CUDA #code. These bit flips allow a malicious GPU user to tamper with another user's data on the GPU in shared, time-sliced environments. In a proof of concept, we use these bit flips to tamper with a victim's DNN models and degrade model accuracy from 80% to 0.1% using a single bit flip. Enabling Error Correction Codes (ECC) can mitigate this risk, but ECC can introduce up to a 10% slowdown for #ML #inference workloads on an #A6000 GPU.
https://gpuhammer.com/

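Why one flipped bit can gut a model: in IEEE-754 floats, a single bit in the exponent field changes a weight by dozens of orders of magnitude. A small, self-contained Python illustration of that mechanism (the mechanism only, not the attack; the project targets FP16 weights, while this sketch uses float32 for familiarity):

```python
import struct

def flip_bit(x: float, bit: int) -> float:
    """Return the float32 whose bit pattern is x's with one bit flipped."""
    (bits,) = struct.unpack("<I", struct.pack("<f", x))
    (y,) = struct.unpack("<f", struct.pack("<I", bits ^ (1 << bit)))
    return y

w = 0.0125                # a typical small DNN weight
print(flip_bit(w, 30))    # flip the exponent's MSB -> ~4.2e+36
```

A weight of that magnitude saturates every activation it touches, which is how a single flip can drive accuracy from 80% to near zero.
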
N-gated Hacker News
🧠🚀 Apparently, someone thought we needed yet another #AI #inference engine, but this time exclusively for Apple's golden child, the #Silicon chip. Because clearly, the world was just yearning for a new way to "infer" things while stuck in a walled garden. 🌳🔒
https://github.com/trymirai/uzu
#Apple #Tech #Innovation #WalledGarden #HackerNews #ngated

Hacker News
LLM Inference Handbook
https://bentoml.com/llm/
#HackerNews #LLM #Inference #Handbook #LLMs #AIInference #MachineLearning #TechTrends

Dr Mircea Zloteanu ☀️ 🌊🌴
#statstab #383 Berkson's paradox
Thoughts: aka Berkson's bias, collider bias, or Berkson's fallacy. Important for interpreting conditional probabilities. Can produce counterintuitive patterns.
#paradox #collider #bias #inference #causalinference
https://en.m.wikipedia.org/wiki/Berkson's_paradox

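The counterintuitive pattern is easy to reproduce: two independent traits become negatively correlated once you condition on a collider such as a selection rule. A minimal Python simulation (the selection rule and its cutoff are arbitrary assumptions):

```python
import numpy as np

rng = np.random.default_rng(0)
talent = rng.normal(size=100_000)   # two independent traits
looks = rng.normal(size=100_000)

# Population: essentially zero correlation.
print(round(np.corrcoef(talent, looks)[0, 1], 3))   # ~0.00

# Condition on the collider: selected only if talent + looks clears a bar.
selected = talent + looks > 1.5
print(round(np.corrcoef(talent[selected], looks[selected])[0, 1], 3))  # ~-0.5
```

Among the selected, knowing someone scores high on one trait makes a high score on the other less necessary to have cleared the bar, hence the induced negative correlation.
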
Habr
Efficient inference with many LoRA adapters
LoRA is a popular method for fine-tuning large models on small datasets, but at inference time low-rank adapters run inefficiently, and merging them into the weights requires storing a separate full copy of the model for every adapter. MultiLoRA solves this problem by serving several adapters at once on top of a single base model. In this article we compare the performance of MultiLoRA inference in two popular frameworks, vLLM and TensorRT-LLM. We run the tests on stock release Docker images, evaluating which framework handles batches of requests more efficiently in scenarios close to offline and asynchronous inference.
https://habr.com/ru/articles/922290/
#multilora #offline_inference #async_inference #vllm #TensorRTLLM #tensorrt #peft #inference #benchmark #lora

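For reference, multi-adapter serving over one base model looks roughly like this in vLLM; the model name and adapter paths below are placeholders, and options such as max_loras may vary across vLLM versions:

```python
from vllm import LLM, SamplingParams
from vllm.lora.request import LoRARequest

# One base model with LoRA support enabled; adapters attach per request.
llm = LLM(model="meta-llama/Llama-2-7b-hf", enable_lora=True, max_loras=4)
params = SamplingParams(max_tokens=64)

# Two requests served by the same base weights, each with its own adapter.
sql = llm.generate(["Translate to SQL: count users by country"], params,
                   lora_request=LoRARequest("sql_adapter", 1, "/adapters/sql"))
chat = llm.generate(["Summarize: LoRA keeps the base weights frozen."], params,
                    lora_request=LoRARequest("chat_adapter", 2, "/adapters/chat"))
print(sql[0].outputs[0].text)
print(chat[0].outputs[0].text)
```

This is the setup the benchmark stresses: many adapters multiplexed onto one copy of the base weights instead of one merged model per adapter.
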
Andrzej Wąsowski ☑️ 🟥
Self promotion

Dr Mircea Zloteanu ☀️ 🌊🌴
#statstab #370 The Problem with "Magnitude-Based Inference"
Thoughts: An appealing but flawed approach. Good overview of the error-inflation issue.
#MBI #errorrate #sportsscience #sports #inference #critique
https://journals.lww.com/acsm-msse/fulltext/2018/10000/the_problem_with__magnitude_based_inference_.23.aspx

Hacker News
Compiling LLMs into a MegaKernel: A Path to Low-Latency Inference
https://zhihaojia.medium.com/compiling-llms-into-a-megakernel-a-path-to-low-latency-inference-cf7840913c17
#HackerNews #CompilingLLMs #MegaKernel #LowLatency #Inference #MachineLearning #AI

Dr Mircea Zloteanu ☀️ 🌊🌴
#statstab #368 The Fisher-Pearson Chi-Squared Controversy: A Turning Point for Inductive Inference
Thoughts: An overview of the difference between Pearson's descriptive view and Fisher's inferential view of the chi-squared statistic.
#fisher #pearson #inference #chisquared
https://genepi.qimr.edu.au/contents/p/staff/1983BairdBJPS105-118.pdf

Hacker News
Tokasaurus: An LLM Inference Engine for High-Throughput Workloads
https://scalingintelligence.stanford.edu/blogs/tokasaurus/
#HackerNews #Tokasaurus #LLM #Inference #Engine #HighThroughput #AI #TechInnovation

Hacker News
Why DeepSeek is cheap at scale but expensive to run locally
https://www.seangoedecke.com/inference-batching-and-deepseek/
#HackerNews #DeepSeek #Inference #Batching #Cheap #Expensive #To #Run #Local #Scale

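The linked article's core point is batching: decode steps are memory-bandwidth-bound, so streaming the weights through the GPU once serves many users nearly as cheaply as one. A back-of-envelope sketch in Python, where the byte counts and bandwidth are illustrative assumptions rather than DeepSeek's real numbers:

```python
# Decoding is roughly bandwidth-bound: each step streams the active
# weights once, and every sequence in the batch gets a token from that pass.
active_weight_gb = 74      # assumed bytes of weights read per decode step
hbm_bw_gbs = 3350          # assumed aggregate HBM bandwidth of the server

step_s = active_weight_gb / hbm_bw_gbs        # per-step cost, batch-independent
for batch in (1, 8, 128):
    print(f"batch {batch:>3}: ~{batch / step_s:,.0f} tokens/s aggregate")
```

Per-token cost falls almost linearly with batch size until compute becomes the bottleneck, which is why a provider running large batches is cheap while a single local user pays the full per-step cost alone.
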
HPC Guru
Nvidia Dynamo: if disaggregation is Dynamo's backbone, smart management of the KV cache is its brain. At around 300 tokens per second per user, you can generate 30 times more tokens per normalized #GPU.
https://www.vastdata.com/sharedeverything/why-everyones-talking-about-nvidia-dynamo-and-why-it-actually-matters
#AI #Inference via @nicolehemsoth.bsky.social

N-gated Hacker News
Ah, behold the majestic #DeepSeekR1-0528, a model so #mysterious and elusive that not even #Inference #Providers dare to touch it. 🤔✨ With a grand total of zero downloads last month, it's clear that this #685B parameter behemoth is the hottest #AI sensation, if only in its creator's wildest dreams. 🐒💭
https://huggingface.co/deepseek-ai/DeepSeek-R1-0528
#Parameters #HottestSensation #HackerNews #ngated

HPC Guru
AMD vs NVIDIA #Inference Benchmark: Who Wins? Performance & Cost per Million Tokens report by @SemiAnalysis_. tl;dr: it's not a simple answer.
https://semianalysis.com/2025/05/23/amd-vs-nvidia-inference-benchmark-who-wins-performance-cost-per-million-tokens/
#AI #GPU #HPC via @ogawa-tadashi.bsky.social