#vllm

LavX News
Unlocking Cost-Efficient AI: AIBrix and the Future of Large Language Model Deployment

As open-source large language models like LLaMA and Mistral gain traction, AIBrix emerges as a pivotal tool for organizations looking to optimize AI deployments. This innovative cloud-native infrastru...

https://news.lavx.hu/article/unlocking-cost-efficient-ai-aibrix-and-the-future-of-large-language-model-deployment

#news #tech #Kubernetes #AIBrix #vLLM

Reshmi Aravind
Explore how vLLM optimizes the Llama Stack for efficient and scalable LLM inference in this deep dive from the vLLM team.

https://blog.vllm.ai/2025/01/27/intro-to-llama-stack-with-vllm.html

Excellent, @terrytangyuan.xyz!

#redhat #opensource #VLLM

Habr
Building an Infrastructure for Language Models: X5 Tech's Experience

Hi, Habr! I'm Michil Egorov, head of the AI product development team at X5 Tech. Language models (LLMs) have recently become an integral part of many business processes, from chatbots to automated processing of customer reviews. Using such models effectively, however, requires powerful and flexible infrastructure. Over the past year the X5 Tech team has grown considerably, tested many hypotheses, and evaluated a variety of models. Our main use cases include chatbots, prompters for moderators, automatic summarization, and customer review processing. In this article I describe how the X5 Tech team built its infrastructure for language models, the challenges we overcame, and the decisions we made.

https://habr.com/ru/companies/X5Tech/articles/880288/

#LLM #from_scratch #infrastructure_building #opensource #closed_solutions #data_masking #clearml #vllm #logging #monitoring

Adam :redhat: :ansible: :bash:
Introducing vLLM Inference Provider in Llama Stack

https://blog.vllm.ai/2025/01/27/intro-to-llama-stack-with-vllm.html

#artificialintelligence #AI #vLLM #llamastack #opensource #RedHat

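For context: an inference provider of this kind typically fronts a running vLLM OpenAI-compatible server. A minimal sketch of standing one up locally and checking that it is reachable; the model id is only an example, and the wiring into Llama Stack itself is covered in the linked post:

# Start a vLLM OpenAI-compatible server first (standard vLLM CLI):
#   vllm serve meta-llama/Llama-3.2-1B-Instruct --port 8000
# A remote inference provider can then point at this /v1 endpoint.
import requests

# List the models the server exposes via the OpenAI-compatible API.
models = requests.get("http://localhost:8000/v1/models").json()
print([m["id"] for m in models["data"]])
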
Adam :redhat: :ansible: :bash:
Achieve better large language model inference with fewer GPUs

"we achieved approximately 55-65% of the throughput on a server config that is approximately 15% of the cost"

https://www.redhat.com/en/blog/achieve-better-large-language-model-inference-fewer-gpus

#OpenShiftAI #RedHat #OpenShift #AI #Kubernetes #vllm #kubeflow #kserve

michabbb
🎯 #OpenSource Language Model Platform Launch

🔧 Leverages #vLLM technology with a custom #GPU scheduler for running various #LLM models
🤖 Supports major models: #Llama3 (405B/70B/8B), #Qwen2 72B, #Mixtral, #Gemma2, #Jamba15, #Phi3

https://glhf.chat/

jordan
Just occurred to me that I could use a micro #vllm like #omnivision to automatically add #alttext to my image posts without having to manually input them. I typically use #ChatGPT to generate my alt-text anyway, and rarely have to manually edit the content. Since I own this server, I could even do it at the database level, directly modifying the content.

Worth it? Potential pitfalls? Waste of effort?

https://huggingface.co/NexaAIDev/omnivision-968M#how-to-use-on-device

#mastoadmin #mastobot #bot #ai #llm

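The idea sketches out naturally. A minimal example of asking a small vision-language model for alt text through an OpenAI-compatible endpoint; the base URL and model id are illustrative placeholders, not omnivision's own API (omnivision ships with its own on-device SDK, linked above):

# Send an image to an OpenAI-compatible vision endpoint and ask for
# a one-sentence description to use as alt text.
# base_url and model are placeholders for whatever server hosts the model.
import base64
from openai import OpenAI

client = OpenAI(base_url="http://localhost:8000/v1", api_key="unused")

with open("photo.jpg", "rb") as f:
    image_b64 = base64.b64encode(f.read()).decode()

resp = client.chat.completions.create(
    model="vision-model",  # placeholder model id
    messages=[{
        "role": "user",
        "content": [
            {"type": "text",
             "text": "Write one sentence of alt text for this image."},
            {"type": "image_url",
             "image_url": {"url": f"data:image/jpeg;base64,{image_b64}"}},
        ],
    }],
)
print(resp.choices[0].message.content)
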
michabbb
New Cloud Platform for Large Language Model Deployment 🚀

🔧 Run any #opensource #LLM supported by #vLLM on autoscaling #GPU clusters, supporting models up to 640GB VRAM

🤖 Compatible with major models: #Llama3 405B/70B/8B, #Qwen2 72B, #Mixtral 8x22B, #Gemma2 27B, #Phi3, and more

💻 Features include:
- #OpenAI compatible #API
- Custom-built #GPU scheduler
- Support for full-weight and 4-bit AWQ repos
- Multi-tenant architecture for cost efficiency

🆓 Currently free during beta phase, promising competitive pricing post-launch

https://glhf.chat/landing/home

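"OpenAI compatible API" in the feature list above means existing OpenAI client code works unchanged once the base URL is swapped. A minimal sketch; the endpoint below is a placeholder rather than the platform's documented URL, and the model id is just one example of a hosted model:

# Point the standard OpenAI client at any OpenAI-compatible server by
# overriding base_url; the rest of the client code stays the same.
from openai import OpenAI

client = OpenAI(
    base_url="https://example-llm-host/v1",  # placeholder endpoint
    api_key="YOUR_API_KEY",
)

resp = client.chat.completions.create(
    model="meta-llama/Meta-Llama-3-8B-Instruct",  # example model id
    messages=[{"role": "user", "content": "Say hello in one sentence."}],
)
print(resp.choices[0].message.content)
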
Brian Cunnie
-- Could NOT find Python (missing: Python_INCLUDE_DIRS Interpreter Development.Module Development.SABIModule)

→

sudo apt install -y python3.10-dev

#Python #vllm

Judith van Stegeren
Many companies are currently scrambling for ML infra engineers: people who know how to manage AI infrastructure and can seriously speed up training and inference with specialized tooling like vLLM, Triton, TensorRT, Torchtune, etc.

#inference #training #genai #triton #vllm #pytorch #torchtune #tensorrt #nvidia

Redd T. Panda
We've swapped our #vLLM to #MistralAI #Pixtral and it's awesome. We're using this #ai #model to write the #alttext for our #memes, #images, and #photos.

#mastodon #Fediverse #community

michabbb
🚀 #Qwen2.5: New #AI model family released by Qwen Team

#LLM variants: 0.5B to 72B parameters, supporting 29+ languages including English, Chinese, French, Spanish
Specialized models: #Qwen2.5Coder for coding, #Qwen2.5Math for mathematics
128K token context length, can generate up to 8K tokens
#OpenSource under Apache 2.0 license (except 3B and 72B variants)

💡 Key improvements:

Enhanced knowledge (85+ on #MMLU)
Better coding skills (85+ on #HumanEval)
Improved math capabilities (80+ on #MATH)
Stronger instruction following and long text generation
Better handling of structured data and outputs (e.g., #JSON)

🔬 Performance highlights:

#Qwen2.5 72B competitive with leading models like #Llama3 and #MistralAI
Smaller models (e.g., 3B) show impressive efficiency
#QwenPlus API model competes with #GPT4 and #Claude on some benchmarks

🛠️ Available via #HuggingFace, #vLLM, and other deployment options
📊 Comprehensive benchmarks and comparisons provided in the blog post

https://qwenlm.github.io/blog/qwen2.5/

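Since the models above are listed as available via #vLLM, a minimal offline-inference sketch using vLLM's Python API; the model id is taken from the Qwen2.5 family on Hugging Face, and the sampling values are arbitrary:

# Load a Qwen2.5 instruct model and run a single offline generation.
from vllm import LLM, SamplingParams

llm = LLM(model="Qwen/Qwen2.5-7B-Instruct")
params = SamplingParams(temperature=0.7, max_tokens=128)

outputs = llm.generate(
    ["Explain the Apache 2.0 license in one sentence."], params
)
print(outputs[0].outputs[0].text)
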
Tao of Mac
iGPU Compute and LLMs on the AceMagic AM18

As part of my forays into LLMs and GPU compute “on the small” I’ve been playing around with the AceMagic AM18 in a few unusual ways. If you missed my initial impressions, you might want to check them out first. (...)

#am18 #amd #amdgpu #gpu #hardware #homelab #llm #ml #ollama #proxmox #review #ryzen #vllm #zluda

https://taoofmac.com/space/blog/2024/04/13/2100

Habr
How to Speed Up LLM Text Generation 20x on Large Datasets

Hi everyone, I'm Alan, a research engineer at MTS AI. In the fundamental research team we study LLMs, implement DPO, and validate our own language models. For these tasks we needed to generate a large amount of data with an LLM, and such generation usually takes a long time. Over the past year, though, as LLMs have grown in popularity, a range of tools for deploying them has appeared; one of the most efficient inference libraries is vLLM. The article shows how asynchronous requests and vLLM's built-in features can speed up generation roughly 20x. Enjoy!

https://habr.com/ru/companies/mts_ai/articles/791594/

#artificial_intelligence #nlp #language_models #vllm #natural_language_processing #neural_nets #neural_networks #inference #llm

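The article drives a vLLM server with asynchronous requests; with vLLM's offline Python API, the analogous win comes from submitting the whole prompt list in one call so continuous batching can keep the GPU saturated, rather than looping one prompt at a time. A minimal sketch, with an illustrative model id and toy prompts:

# Submitting all prompts at once lets vLLM's continuous batching
# schedule them together; calling generate() per prompt forfeits that.
from vllm import LLM, SamplingParams

llm = LLM(model="mistralai/Mistral-7B-Instruct-v0.2")  # example model
params = SamplingParams(max_tokens=64)

prompts = [f"Summarize customer review number {i}." for i in range(1000)]

outputs = llm.generate(prompts, params)  # one call for the full batch
for out in outputs[:3]:
    print(out.outputs[0].text)
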
Karim Lalani
Self-hosting LLMs — Deployment Considerations for LangChain and LangServe applications

https://medium.com/@klcoder/self-hosting-llms-deployment-considerations-for-langchain-and-langserve-applications-b9ae689c744a?sk=95c1b2d147764f0aafad932ef825ea6f

#llm #genai #langchain #ollama #vllm #llamacpp

michabbb
Serving #LLM 24x Faster On the #Cloud with #vLLM and #SkyPilot

https://blog.skypilot.co/serving-llm-24x-faster-on-the-cloud-with-vllm-and-skypilot/

#ai