From 29bc67b84de309336e31b74459986e1520d309f2 Mon Sep 17 00:00:00 2001
From: Adela Rowland
Date: Mon, 10 Feb 2025 14:34:46 +0200
Subject: [PATCH] Add Simon Willison's Weblog
---
 Simon-Willison%27s-Weblog.md | 42 ++++++++++++++++++++++++++++++++++++
 1 file changed, 42 insertions(+)
 create mode 100644 Simon-Willison%27s-Weblog.md

diff --git a/Simon-Willison%27s-Weblog.md b/Simon-Willison%27s-Weblog.md
new file mode 100644
index 0000000..fddf012
--- /dev/null
+++ b/Simon-Willison%27s-Weblog.md
@@ -0,0 +1,42 @@
That model was trained in part using their unreleased R1 "reasoning" model. Today they have released R1 itself, along with a whole family of new models derived from that base.

There's a whole lot of stuff in the new release.

DeepSeek-R1-Zero appears to be the base model. It's over 650GB in size and, like most of their other releases, is under a clean MIT license. DeepSeek warn that "DeepSeek-R1-Zero encounters challenges such as endless repetition, poor readability, and language mixing" ... so they also released:

DeepSeek-R1, which "incorporates cold-start data before RL" and "achieves performance comparable to OpenAI-o1 across math, code, and reasoning tasks". That one is also MIT licensed, and is a similar size.

I don't have the ability to run models larger than about 50GB (I have an M2 with 64GB of RAM), so neither of these two models is something I can easily play with myself. That's where the new distilled models come in.

> To support the research community, we have open-sourced DeepSeek-R1-Zero, DeepSeek-R1, and six dense models distilled from DeepSeek-R1 based on Llama and Qwen.

This is a fascinating flex! They have models based on Qwen 2.5 (14B, 32B, Math 1.5B and Math 7B) and Llama 3 (Llama-3.1 8B and Llama 3.3 70B Instruct).

Weirdly those Llama models have an MIT license attached, which I'm not sure is compatible with the underlying Llama license. Qwen models are Apache licensed, so maybe MIT is OK?

(I also just noticed the MIT license files say "Copyright (c) 2023 DeepSeek", so they may need to pay a bit more attention to how they copied those in.)

Licensing aside, these distilled models are fascinating beasts.

## Running DeepSeek-R1-Distill-Llama-8B-GGUF

Quantized versions are already starting to appear. So far I've tried just one of those, unsloth/DeepSeek-R1-Distill-Llama-8B-GGUF released by Unsloth AI, and it's really fun to play with.

I'm running it using the combination of Ollama, LLM and the llm-ollama plugin.

First I fetched and ran the model using Ollama itself:

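A sketch of that invocation, assuming Unsloth's Q8_0 quantization tag (an assumption, though it matches the roughly 8.5GB download described next):

```bash
ollama run hf.co/unsloth/DeepSeek-R1-Distill-Llama-8B-GGUF:Q8_0
```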
This downloads around 8.5GB of model data and starts an interactive chat interface.

Once the model has been fetched, LLM can talk to it as well. I prefer using LLM for experiments because it logs everything to SQLite for later exploration.

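If you want to dig through those logs afterwards, LLM ships subcommands for that; a quick sketch (check `llm logs --help` in your version for the exact options):

```bash
llm logs path   # print the location of the logs.db SQLite database
llm logs -n 1   # show the most recent logged prompt and response
```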
I installed the plugin and ran it like this:

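The plugin install plus a first prompt would look something like this; the model ID mirrors the Ollama tag assumed above, and the prompt is the one quoted below:

```bash
llm install llm-ollama
llm -m 'hf.co/unsloth/DeepSeek-R1-Distill-Llama-8B-GGUF:Q8_0' \
  'a joke about a pelican and a walrus who run a tea room together'
```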
Each response starts with a `<think>...</think>` pseudo-XML tag containing the chain of thought used to help generate the response. For "a joke about a pelican and a walrus who run a tea room together" that started like this:

> Okay, so I need to come up with a joke about a pelican and a walrus running a tea room together. Hmm, that's an interesting combination. Let me think about how these two characters might interact in a humorous scenario.
>
> First, let's consider their characteristics. Pelicans are known for their long beaks and webbed feet, often seen near the beach or water. Walruses have big teeth, thick fur, and they're usually found in colder climates, like icebergs or snowy areas. So, combining these two into a tea room setting is already a funny image.

[...]

It then thought for 20 paragraphs before outputting the joke! Here's the full transcript and here's the output at the end, after the `</think>` tag:

> The Joke:
>
> A pelican and a walrus decide to open a quaint little tea room together. The walrus, with its big size, struggles to find comfortable chairs, so it sits on the table by accident, knocking over the teapot. Meanwhile, the pelican, trying to help, uses its beak to place saucers on the table, causing a few spills.
>
> After a series of comical mishaps, the walrus looks up and says with a smile, "This isn't so fishy anymore." The pelican smirks and remarks, "Maybe not, but we do have a lot of krill in our tea!"
>
> Explanation:
>
> - Setup: Pelican and Walrus run an unlikely tea room. Their sizes lead to comedic chaos.
> - Punchline: "This isn't so fishy anymore" is a pun on the situation becoming less absurd. The follow-up joke introduces "krill," a play on words that ties into both marine life ("fish") and tea, making it humorous while fitting the theme.

So yeah, the joke is awful. But the process of getting there was such an interesting insight into how these new models work.

This was a relatively small 8B model. I'm looking forward to trying out the Llama 70B version, which isn't yet available in a GGUF I can run with Ollama. Given the strength of Llama 3.3 70B (currently my favourite GPT-4 class model that I've run on my own machine) I have high expectations.

Update 21st January 2025: I got this quantized version of that Llama 3.3 70B R1 distilled model working like this (a 34GB download):

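Based on the Unsloth naming pattern above, the pull presumably looked something like this; the Q3_K_M tag is an assumption, chosen because that quantization of a 70B model comes in at around 34GB:

```bash
ollama run hf.co/unsloth/DeepSeek-R1-Distill-Llama-70B-GGUF:Q3_K_M
```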
## Can it draw a pelican?

I tried my classic "Generate an SVG of a pelican riding a bicycle" prompt too. It did not do very well:

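Run through LLM against the 70B distill, that prompt would look something like this (the model ID is assumed from the pull above):

```bash
llm -m 'hf.co/unsloth/DeepSeek-R1-Distill-Llama-70B-GGUF:Q3_K_M' \
  'Generate an SVG of a pelican riding a bicycle'
```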
It looked to me like it got the order of the elements wrong, so I followed up with:

> the background ended up covering the rest of the image

It thought some more and gave me this:

Like the earlier joke, the chain of thought in the transcript was much more interesting than the end result.

## Other ways to try out DeepSeek-R1

If you want to try the model out without installing anything at all you can do so using chat.deepseek.com. You'll need to create an account (sign in with Google, use an email address or provide a Chinese +86 phone number) and then select the "DeepThink" option below the prompt input box.

DeepSeek offer the model via their API, using an OpenAI-compatible endpoint. You can access that via LLM by dropping this into your extra-openai-models.yaml configuration file:

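A sketch of that configuration, following the format LLM's extra-openai-models.yaml expects (the api_base and key name reflect DeepSeek's documented endpoint, but verify against current docs):

```yaml
- model_id: deepseek-reasoner
  model_name: deepseek-reasoner
  api_base: "https://api.deepseek.com"
  api_key_name: deepseek
```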
Then run `llm keys set deepseek` and paste in your API key, then use `llm -m deepseek-reasoner 'prompt'` to run prompts.

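As a worked example (the prompt is just the one from earlier):

```bash
llm keys set deepseek
# paste your DeepSeek API key when prompted
llm -m deepseek-reasoner 'a joke about a pelican and a walrus who run a tea room together'
```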
This won't show you the reasoning tokens, sadly. Those are returned by the API (example here) but LLM doesn't yet have a way to display them.
\ No newline at end of file