Add Simon Willison's Weblog

Adela Rowland 2025-02-10 14:34:46 +02:00
parent 51056fa151
commit 29bc67b84d

@@ -0,0 +1,42 @@
That model was trained in part using their unreleased R1 "reasoning" model. Today they have released R1 itself, along with a whole family of new models derived from that base.
There's a lot of stuff in the new release.
DeepSeek-R1-Zero appears to be the base model. It's over 650GB in size and, like most of their other releases, is under a clean MIT license. DeepSeek warn that "DeepSeek-R1-Zero encounters challenges such as endless repetition, poor readability, and language mixing" ... so they also released:
DeepSeek-R1, which "incorporates cold-start data before RL" and "achieves performance comparable to OpenAI-o1 across math, code, and reasoning tasks". That one is also MIT licensed, and is a similar size.
I don't have the ability to run models larger than about 50GB (I have an M2 with 64GB of RAM), so neither of these two models is something I can easily play with myself. That's where the new distilled models come in.
> To support the research community, we have open-sourced DeepSeek-R1-Zero, DeepSeek-R1, and six dense models distilled from DeepSeek-R1 based on Llama and Qwen.
This is a fascinating flex! They have models based on Qwen 2.5 (14B, 32B, Math 1.5B and Math 7B) and Llama 3 (Llama 3.1 8B and Llama 3.3 70B Instruct).
Weirdly those Llama models have an MIT license attached, which I'm not sure is compatible with the underlying Llama license. Qwen models are Apache licensed so maybe MIT is OK?
(I also just noticed the MIT license files say "Copyright (c) 2023 DeepSeek" so they may need to pay a bit more attention to how they copied those in.)
Licensing aside, these distilled models are fascinating beasts.
Running DeepSeek-R1-Distill-Llama-8B-GGUF
Quantized versions are already starting to show up. So far I've tried just one of those, unsloth/DeepSeek-R1-Distill-Llama-8B-GGUF released by Unsloth AI, and it's really fun to play with.
I'm running it using the combination of Ollama, LLM and the llm-ollama plugin.
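A minimal setup sketch, assuming macOS with Homebrew and a working pip (the exact installation method is an assumption, not something spelled out here):

```bash
# Ollama serves the model locally; LLM is the command-line tool used below
brew install ollama
pip install llm
```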
First I fetched and ran the model using Ollama itself:
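Something like this, assuming the Unsloth GGUF repository on Hugging Face and the Q8_0 quantization (which lines up with the roughly 8.5GB download described next):

```bash
# Pull the distilled 8B GGUF straight from Hugging Face and start a chat session
ollama run hf.co/unsloth/DeepSeek-R1-Distill-Llama-8B-GGUF:Q8_0
```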
This downloads around 8.5GB of model data and starts an interactive chat interface.
Once the model has been fetched, LLM can talk to it as well. I prefer using LLM for experiments because it logs everything to SQLite for later exploration.
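For example, recent prompts and responses can be pulled back out of that log with the llm logs subcommand (assuming a reasonably current version of LLM):

```bash
# Show the most recent logged prompt and response
llm logs -n 1

# Print the location of the underlying SQLite database
llm logs path
```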
I installed the plugin and ran it like this:
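Roughly like this, reusing the model name Ollama assigned above (the prompt is the one discussed below):

```bash
# Install the Ollama plugin for LLM, then prompt the local model through it
llm install llm-ollama
llm -m 'hf.co/unsloth/DeepSeek-R1-Distill-Llama-8B-GGUF:Q8_0' \
  'a joke about a pelican and a walrus who run a tea room together'
```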
Each response starts with a `<think>...</think>` pseudo-XML tag containing the chain of thought used to help generate the response. For "a joke about a pelican and a walrus who run a tea room together" that started like this:
Okay, so I need to come up with a joke about a pelican and a walrus running a tea room together. Hmm, that's an interesting combination. Let me think about how these two characters might interact in a humorous situation.
First, let's consider their characteristics. Pelicans are known for their long beaks and webbed feet, often seen near the beach or water. Walruses have big teeth, thick fur, and they're generally found in colder climates, like icebergs or snowy areas. So, combining these two into a tea room setting is already a funny image.
[...]
It then thought for 20 paragraphs before outputting the joke! Here's the full transcript and here's the output at the end, after the `</think>` tag:
The Joke:
A pelican and a walrus decide to open a quaint little tea room together. The walrus, with its big size, struggles to find comfortable chairs, so it sits on the table by accident, knocking over the teapot. Meanwhile, the pelican, trying to help, uses its beak to place saucers on the table, causing a few spills.
After a series of comical mishaps, the walrus looks up and says with a grin, "This isn't so fishy anymore." The pelican smirks and says, "Maybe not, but we do have a lot of krill in our tea!"
Explanation:
- Setup: Pelican and Walrus run an unlikely tea room. Their sizes lead to comedic chaos.
- Punchline: "This isn't so fishy anymore" is a pun on the situation becoming less absurd. The follow-up joke introduces "krill," a play on words that ties into both marine life ("fish") and tea, making it humorous while fitting the theme.
So yeah, the joke is awful. But the process of getting there was such an interesting insight into how these new models work.
This was a relatively small 8B model. I'm looking forward to trying out the Llama 70B version, which isn't yet available in a GGUF I can run with Ollama. Given the strength of Llama 3.3 70B (currently my favourite GPT-4 class model that I've run on my own machine) I have high expectations.
Update 21st January 2025: I got this quantized version of that Llama 3.3 70B R1 distilled model working like this (a 34GB download):
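A sketch of what that looks like; the Unsloth 70B GGUF repository and the specific quantization tag are assumptions (a 3-bit quant of a 70B model comes in at roughly 34GB):

```bash
# Pull and run the 70B distill; a low-bit quantization keeps it to around 34GB
ollama run hf.co/unsloth/DeepSeek-R1-Distill-Llama-70B-GGUF:Q3_K_M
```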
Can it draw a pelican?
I tried my classic "Generate an SVG of a pelican riding a bicycle" prompt too. It did not do very well.
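For reference, a sketch of running that prompt through LLM against the 70B model pulled above (the model name carries over the same assumptions as before):

```bash
llm -m 'hf.co/unsloth/DeepSeek-R1-Distill-Llama-70B-GGUF:Q3_K_M' \
  'Generate an SVG of a pelican riding a bicycle'
```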
It looked to me like it got the order of the elements wrong, so I followed up with:
the background ended up covering the rest of the image
It thought some more and gave me this:
As with the earlier joke, the chain of thought in the transcript was much more interesting than the end result.
Other ways to try DeepSeek-R1
If you want to try the model out without installing anything at all you can do so using chat.deepseek.com: you'll need to create an account (sign in with Google, use an email address or provide a Chinese +86 phone number) and then pick the "DeepThink" option below the prompt input box.
DeepSeek offer the model via their API, using an OpenAI-compatible endpoint. You can access that via LLM by dropping this into your extra-openai-models.yaml configuration file:
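A sketch of that configuration, using LLM's extra-openai-models.yaml format; the DeepSeek API base URL shown here is an assumption drawn from their public API documentation:

```yaml
# Register DeepSeek's OpenAI-compatible endpoint as a model LLM can call
- model_id: deepseek-reasoner
  model_name: deepseek-reasoner
  api_base: "https://api.deepseek.com"
  api_key_name: deepseek
```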
Then run `llm keys set deepseek` and paste in your API key, then use `llm -m deepseek-reasoner 'prompt'` to run prompts.
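Putting those two steps together (the prompt text here is just an example):

```bash
# Store the DeepSeek API key under the name referenced in the config above
llm keys set deepseek

# Run a prompt against DeepSeek's hosted reasoning model
llm -m deepseek-reasoner 'a joke about a pelican and a walrus who run a tea room together'
```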
This won't show you the reasoning tokens, sadly. Those are returned by the API (example here) but LLM doesn't yet have a way to display them.