That model was trained in part using their unreleased R1 "reasoning" model. Today they've released R1 itself, along with a whole family of new models derived from that base.

There's a lot of stuff in the new release.

DeepSeek-R1-Zero appears to be the base model. It's over 650GB in size and, like most of their other releases, is under a clean MIT license. DeepSeek warn that "DeepSeek-R1-Zero encounters challenges such as endless repetition, poor readability, and language mixing" ... so they also released:

DeepSeek-R1, which "incorporates cold-start data before RL" and "achieves performance comparable to OpenAI-o1 across math, code, and reasoning tasks". That one is also MIT licensed, and is a similar size.

I don't have the ability to run models larger than about 50GB (I have an M2 with 64GB of RAM), so neither of these two models is something I can easily play with myself. That's where the new distilled models come in.

> To support the research community, we have open-sourced DeepSeek-R1-Zero, DeepSeek-R1, and six dense models distilled from DeepSeek-R1 based on Llama and Qwen.

This is a fascinating flex! They have models based on Qwen 2.5 (14B, 32B, Math 1.5B and Math 7B) and Llama 3 (Llama-3.1 8B and Llama 3.3 70B Instruct).

Weirdly those Llama models have an MIT license attached, which I'm not sure is compatible with the underlying Llama license. Qwen models are Apache licensed so maybe MIT is OK?

(I also just noticed the MIT license files say "Copyright (c) 2023 DeepSeek" so they may need to pay a bit more attention to how they copied those in.)

Licensing aside, these distilled models are fascinating beasts.

## Running DeepSeek-R1-Distill-Llama-8B-GGUF

Quantized versions are already starting to show up. So far I've tried just one of those: unsloth/DeepSeek-R1-Distill-Llama-8B-GGUF, released by Unsloth AI, and it's really fun to play with.

I'm running it using the combination of Ollama, LLM and the llm-ollama plugin.

First I fetched and ran the model using Ollama itself:

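Ollama can pull GGUF builds directly from Hugging Face; assuming the Q8_0 quantization (a guess, though it matches the roughly 8.5GB download mentioned below), the command looks something like:

```bash
# Fetch the Unsloth GGUF build from Hugging Face and start an interactive chat.
# The :Q8_0 quant tag is an assumption - substitute whichever quant you prefer.
ollama run hf.co/unsloth/DeepSeek-R1-Distill-Llama-8B-GGUF:Q8_0
```
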
This downloads around 8.5GB of model data and starts an interactive chat interface.

Once the model has been fetched, LLM can talk to it as well. I prefer using LLM for this because it logs everything to SQLite for later exploration.

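Those logs are easy to poke at. `llm logs` and `llm logs path` are standard LLM commands; opening the database in Datasette is just one way to explore it:

```bash
# Show the most recent logged prompt/response pair
llm logs -n 1

# Print the location of the SQLite database that stores the logs
llm logs path

# One option for exploring the full history: open that database in Datasette
datasette "$(llm logs path)"
```
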
I installed the plugin and ran it like this:

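Assuming the same Unsloth model reference as above, that looks something like:

```bash
# Install the plugin that teaches LLM to talk to local Ollama models
llm install llm-ollama

# Run a prompt against the model pulled in the previous step
llm -m 'hf.co/unsloth/DeepSeek-R1-Distill-Llama-8B-GGUF:Q8_0' \
  'a joke about a pelican and a walrus who run a tea room together'
```
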
Each response starts with a `<think>...</think>` pseudo-XML tag containing the chain of thought used to help generate the response. For "a joke about a pelican and a walrus who run a tea room together" that started like this:

> Okay, so I need to come up with a joke about a pelican and a walrus running a tea room together. Hmm, that's an interesting combination. Let me think about how these two characters might interact in a humorous situation.

> First, let's consider their characteristics. Pelicans are known for their long beaks and webbed feet, often seen near the beach or water. Walruses have big teeth, thick fur, and they're typically found in colder climates, like icebergs or snowy areas. So, combining these two into a tea room setting is already a funny image.

> [...]

It then thought for 20 paragraphs before outputting the joke! Here's the full transcript, and here's the output at the end, after the `</think>` tag:

> The Joke:

> A pelican and a walrus decide to open a quaint little tea room together. The walrus, with its big size, struggles to find comfortable chairs, so it sits on the table by accident, knocking over the teapot. Meanwhile, the pelican, trying to help, uses its beak to place dishes on the table, causing a few spills.

> After a series of comical mishaps, the walrus looks up and says with a smile, "This isn't so fishy anymore." The pelican smirks and says, "Maybe not, but we do have a lot of krill in our tea!"

> Explanation:

> - Setup: Pelican and Walrus run an unlikely tea room. Their sizes lead to comical chaos.
> - Punchline: "This isn't so fishy anymore" is a pun on the situation becoming less absurd. The follow-up joke introduces "krill," a play on words that ties into both marine life ("fish") and tea, making it humorous while fitting the theme.

So yeah, the joke is terrible. But the process of getting there was such a fascinating insight into how these new models work.

This was a relatively small 8B model. I'm looking forward to trying out the Llama 70B version, which isn't yet available in a GGUF I can run with Ollama. Given the strength of Llama 3.3 70B (currently my favourite GPT-4 class model that I've run on my own machine) I have high expectations.

Update 21st January 2025: I got this quantized version of that Llama 3.3 70B R1 distilled model working like this, a 34GB download:

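Again via Ollama's Hugging Face support. A 34GB download is consistent with a 3-bit quantization of a 70B model, so presumably something like this (the exact quant tag is an assumption):

```bash
# ~34GB download: a 3-bit quant of the 70B distill (the Q3_K_M tag is a guess)
ollama run hf.co/unsloth/DeepSeek-R1-Distill-Llama-70B-GGUF:Q3_K_M
```
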
## Can it draw a pelican?

I tried my classic Generate an SVG of a pelican riding a bicycle prompt too. It did not do very well:

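Using the same model alias as before (still an assumption about the exact quant tag), that prompt is just:

```bash
llm -m 'hf.co/unsloth/DeepSeek-R1-Distill-Llama-8B-GGUF:Q8_0' \
  'Generate an SVG of a pelican riding a bicycle'
```
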
It looked to me like it got the order of the elements wrong, so I followed up with:

> the background ended up covering the rest of the image

It thought some more and gave me this:

As with the earlier joke, the chain of thought in the transcript was way more interesting than the end result.

## Other ways to try out DeepSeek-R1

If you want to try the model out without installing anything at all you can do so using chat.deepseek.com: you'll need to create an account (sign in with Google, use an email address or provide a Chinese +86 phone number) and then select the "DeepThink" option below the prompt input box.

DeepSeek offer the model via their API, using an OpenAI-compatible endpoint. You can access that via LLM by dropping this into your extra-openai-models.yaml configuration file:

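A minimal entry looks something like this; `model_id` is the alias you pass to `-m`, and `api_key_name` names the stored key used in the next step:

```yaml
- model_id: deepseek-reasoner
  model_name: deepseek-reasoner
  api_base: "https://api.deepseek.com"
  api_key_name: deepseek
```
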
Then run `llm keys set deepseek` and paste in your API key, then use `llm -m deepseek-reasoner 'prompt'` to run prompts.

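Put together:

```bash
# Store the DeepSeek API key under the name referenced by api_key_name above
llm keys set deepseek

# Run a prompt against the API-hosted model
llm -m deepseek-reasoner 'a joke about a pelican and a walrus who run a tea room together'
```
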
This won't show you the reasoning tokens, sadly. Those are delivered by the API (example here) but LLM doesn't yet have a way to display them.