I did a live stream today on these “Reasoning” models (Llama 3.3, DeepSeek R1) and whether they can make complex C code faster given enough constraints. The function was CPython’s vector in-place add for integers, and the results were… interesting.
https://www.youtube.com/live/aQlfYzgg3eo?si=dhGXiDiFcH9N_oeg
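For anyone who hasn’t watched yet, here’s a minimal sketch of the general shape of loop under discussion, assuming a plain contiguous array of machine integers. The name and signature are hypothetical, not the actual CPython source; the real function carries extra constraints that this toy version ignores.

```c
#include <stddef.h>

/* Hypothetical sketch (not the actual CPython code): an in-place
 * integer vector add of the general shape discussed in the stream.
 * Computes dst[i] += src[i] over n elements; with the restrict
 * qualifiers a compiler can usually auto-vectorize this loop. */
static void
vector_iadd(long *restrict dst, const long *restrict src, size_t n)
{
    for (size_t i = 0; i < n; i++) {
        dst[i] += src[i];
    }
}
```

The interesting question in the stream is whether, given enough constraints, a model can beat what the optimizer already does with a loop like this without breaking the surrounding code’s semantics.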
@tonybaloney I’m very much looking forward to watching this!
This weekend, I plan to do some side-by-side refactoring to compare ChatGPT and DeepSeek, and also to see if I can get some mix of open-source tools to provide a user experience comparable to ChatGPT’s Canvas feature.
@feoh I talk about that a bit in the video. R1 isn’t really a chat model and it doesn’t have a system prompt (which determines most of ChatGPT’s behaviour), so it’s a very raw interface; if you ask it ChatGPT-style short questions, it probably won’t perform well.
@tonybaloney It's super interesting to me. A lot of people seem hyper-fixated on tokens per second, which I can understand for use cases like embedding the LLM in another app, but to me some of the most interesting differences between the models and the commercial offerings that embody them are the human-interface factors.
IMO things like Canvas, Claude, and Copilot Chat are more important than people give them credit for.