<?xml version="1.0" encoding="UTF-8"?><rss version="2.0" xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:atom="http://www.w3.org/2005/Atom"><script src="/feed-style.js" xmlns="http://www.w3.org/1999/xhtml"></script><channel><title>Kai&apos;s Notes</title><description>Sharing knowledge about AI, personal projects, and personal growth</description><link>https://blog.gujiakai.me/</link><language>en</language><atom:link href="https://blog.gujiakai.me/en/rss.xml" rel="self" type="application/rss+xml"/><item><title>docker compose down Then up -d, or Just up -d? What the Official Docs Actually Say</title><link>https://blog.gujiakai.me/en/2026/06/docker-compose-up-vs-down/</link><guid isPermaLink="true">https://blog.gujiakai.me/en/2026/06/docker-compose-up-vs-down/</guid><description>Based on the official Docker docs: the real difference between docker compose up -d and down + up, when up -d alone is enough, when you actually need down first, and the classic pitfall of a stale latest image.</description><pubDate>Thu, 11 Jun 2026 00:00:00 GMT</pubDate><content:encoded>&lt;p&gt;If you deploy services with Docker Compose regularly, chances are you&apos;ve built up this muscle-memory combo:&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;docker compose down
docker compose up -d
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;Stop the whole project, wipe it clean, then bring everything back up. It works, of course — but many people can&apos;t quite explain it: doesn&apos;t &lt;code&gt;docker compose up -d&lt;/code&gt; replace old containers on its own? Is the &lt;code&gt;down&lt;/code&gt; step actually necessary, or is it redundant?&lt;/p&gt;
&lt;p&gt;This article, based on the official Docker documentation, settles once and for all what each of these commands does and when to use which.&lt;/p&gt;
&lt;h2&gt;What the Official Docs Say&lt;/h2&gt;
&lt;h3&gt;&lt;code&gt;docker compose up&lt;/code&gt;: Create and Start, With Built-in Change Detection&lt;/h3&gt;
&lt;p&gt;The official reference defines &lt;code&gt;up&lt;/code&gt; as: build, recreate, and start the containers for your services, attaching to their output; with &lt;code&gt;-d&lt;/code&gt;, that is &lt;code&gt;--detach&lt;/code&gt;, the containers run in the background instead.&lt;/p&gt;
&lt;p&gt;The part that actually answers our question is this key passage in the docs:&lt;/p&gt;
&lt;blockquote&gt;
&lt;p&gt;If there are existing containers for a service, and the service&apos;s configuration or image was changed after the container&apos;s creation, &lt;code&gt;docker compose up&lt;/code&gt; picks up the changes by stopping and recreating the containers, while preserving mounted volumes.&lt;/p&gt;
&lt;/blockquote&gt;
&lt;p&gt;In other words, &lt;code&gt;up&lt;/code&gt; already has the complete &quot;detect changes → remove old container → swap in new container&quot; logic built in. That&apos;s exactly where the behavior you&apos;ve observed — old containers automatically getting replaced by new ones — comes from.&lt;/p&gt;
&lt;p&gt;And it&apos;s restrained about it: only services that have changed get recreated. Containers with no changes are left as they are and keep running, completely unaffected.&lt;/p&gt;
&lt;p&gt;Around this mechanism, the docs also provide two switches pointing in opposite directions:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;code&gt;--no-recreate&lt;/code&gt;: don&apos;t recreate containers even if changes are detected.&lt;/li&gt;
&lt;li&gt;&lt;code&gt;--force-recreate&lt;/code&gt;: recreate containers even if neither the configuration nor the image has changed.&lt;/li&gt;
&lt;/ul&gt;
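&lt;p&gt;As a rough sketch of how these two switches look on the command line, here are two small shell helpers. The function names and the &lt;code&gt;web&lt;/code&gt; service name are placeholders made up for illustration; only the flags come from the docs.&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;# Hypothetical helper: start whatever is missing, but never replace
# containers that already exist, even if compose.yaml or the image changed.
up_keep_existing() {
  docker compose up -d --no-recreate "$@"
}

# Hypothetical helper: replace the containers of the given services
# even though Compose sees no change in them.
up_force() {
  docker compose up -d --force-recreate "$@"
}

# Example call: up_force web   (web is an assumed service name)
&lt;/code&gt;&lt;/pre&gt;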
&lt;h3&gt;&lt;code&gt;docker compose down&lt;/code&gt;: Stop and Tear Down the Whole Project&lt;/h3&gt;
&lt;p&gt;The official definition of &lt;code&gt;down&lt;/code&gt;: stop containers, and remove the containers and networks created by &lt;code&gt;up&lt;/code&gt;.&lt;/p&gt;
&lt;p&gt;By default, it removes three kinds of things:&lt;/p&gt;
&lt;ol&gt;
&lt;li&gt;The service containers defined in the Compose file;&lt;/li&gt;
&lt;li&gt;The networks defined in the &lt;code&gt;networks&lt;/code&gt; section;&lt;/li&gt;
&lt;li&gt;The project&apos;s default network.&lt;/li&gt;
&lt;/ol&gt;
&lt;p&gt;Networks and volumes declared as &lt;code&gt;external&lt;/code&gt;, however, are never removed.&lt;/p&gt;
&lt;p&gt;For data volumes, there are two cases to consider:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;strong&gt;Named volumes&lt;/strong&gt;: preserved by default; they&apos;re only removed if you explicitly add &lt;code&gt;-v&lt;/code&gt; or &lt;code&gt;--volumes&lt;/code&gt;.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Anonymous volumes&lt;/strong&gt;: not removed by default either, but the docs add a warning that&apos;s easy to miss: anonymous volumes don&apos;t have stable names, so when you run &lt;code&gt;up&lt;/code&gt; again later, the new containers won&apos;t automatically mount those old anonymous volumes.&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;Hence the official recommendation: data that needs to persist between updates should live in bind mounts or named volumes — don&apos;t rely on anonymous volumes.&lt;/p&gt;
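&lt;p&gt;One way to see what a &lt;code&gt;down&lt;/code&gt; leaves behind is to list the volumes that no container references any more. This is a minimal sketch with a helper name of my own choosing. Note that the output will include both kinds of leftover volume, since after a &lt;code&gt;down&lt;/code&gt; nothing references the named volumes either.&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;# Hypothetical helper: list volumes that no container currently references.
# After a down, expect the project named volumes (readable names) as well as
# the orphaned anonymous volumes (long random hex names).
list_unused_volumes() {
  docker volume ls --filter dangling=true
}
&lt;/code&gt;&lt;/pre&gt;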
&lt;p&gt;&lt;img src=&quot;https://i.see.you/2026/06/11/a9Qf/20260610181808755.webp&quot; alt=&quot;Diagram: after down, anonymous volumes are orphaned from the new containers&quot; /&gt;&lt;/p&gt;
&lt;p&gt;The official getting-started guide has a very intuitive example: a small app that counts visits with Redis. After a &lt;code&gt;down&lt;/code&gt; followed by &lt;code&gt;up&lt;/code&gt;, the visit counter resets to zero.&lt;/p&gt;
&lt;p&gt;The reason is simple: &lt;code&gt;down&lt;/code&gt; deletes the containers, and any data written to the container&apos;s writable layer disappears with them; &lt;code&gt;stop&lt;/code&gt; merely stops the containers — both the containers and their data remain.&lt;/p&gt;
&lt;h2&gt;The Essential Difference Between the Two Approaches&lt;/h2&gt;
&lt;p&gt;Put the pieces together and the difference becomes clear.&lt;/p&gt;
&lt;p&gt;Running just:&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;docker compose up -d
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;is an in-place, incremental update.&lt;/p&gt;
&lt;p&gt;Compose compares, service by service, the current configuration against the state of the running containers, and replaces only the parts that changed. The project network stays as it is; containers that weren&apos;t recreated keep even their IP addresses; and anonymous volumes from the old containers are &quot;taken over&quot; by the new ones.&lt;/p&gt;
&lt;p&gt;&lt;code&gt;up&lt;/code&gt; has a &lt;code&gt;-V&lt;/code&gt; / &lt;code&gt;--renew-anon-volumes&lt;/code&gt; flag, whose purpose is to &quot;recreate anonymous volumes instead of retrieving data from the previous containers.&quot; The very existence of this flag confirms, conversely, that the default behavior is to retrieve the old data.&lt;/p&gt;
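&lt;p&gt;If you do want recreated containers to start with empty anonymous volumes during an in-place update, a minimal sketch looks like this. The helper name is hypothetical; the flag is the one quoted above.&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;# Hypothetical helper: when containers are recreated, give them brand-new
# anonymous volumes instead of reusing the ones from the old containers.
up_fresh_anon_volumes() {
  docker compose up -d --renew-anon-volumes "$@"
}
&lt;/code&gt;&lt;/pre&gt;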
&lt;p&gt;Whereas running:&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;docker compose down
docker compose up -d
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;is a full-stack teardown and rebuild.&lt;/p&gt;
&lt;p&gt;All containers are stopped and removed first, and the project network is torn down as well; then &lt;code&gt;up&lt;/code&gt; creates the network and all the containers from scratch.&lt;/p&gt;
&lt;p&gt;Which means:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;The whole application goes through a complete downtime window;&lt;/li&gt;
&lt;li&gt;Every container gets replaced, including the ones you never touched;&lt;/li&gt;
&lt;li&gt;The network is rebuilt wholesale, and container IPs are reassigned;&lt;/li&gt;
&lt;li&gt;Anonymous volumes from the old containers are orphaned for good — the new containers start with blank data.&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;&lt;img src=&quot;https://i.see.you/2026/06/11/I6kv/20260610182021999.webp&quot; alt=&quot;Diagram: up -d incremental update vs full-stack rebuild after down&quot; /&gt;&lt;/p&gt;
&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Dimension&lt;/th&gt;
&lt;th&gt;Just &lt;code&gt;up -d&lt;/code&gt;&lt;/th&gt;
&lt;th&gt;&lt;code&gt;down&lt;/code&gt; then &lt;code&gt;up -d&lt;/code&gt;&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Containers&lt;/td&gt;
&lt;td&gt;Only changed services recreated&lt;/td&gt;
&lt;td&gt;All removed, then recreated&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Unchanged services&lt;/td&gt;
&lt;td&gt;Unaffected, keep running&lt;/td&gt;
&lt;td&gt;Stopped and rebuilt along with the rest&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Project network&lt;/td&gt;
&lt;td&gt;Stays as it is&lt;/td&gt;
&lt;td&gt;Removed and recreated&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Anonymous volume data&lt;/td&gt;
&lt;td&gt;New containers take over old data&lt;/td&gt;
&lt;td&gt;Orphaned with the old containers — effectively lost&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Named volumes&lt;/td&gt;
&lt;td&gt;Preserved&lt;/td&gt;
&lt;td&gt;Preserved, unless you run &lt;code&gt;down -v&lt;/code&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Downtime scope&lt;/td&gt;
&lt;td&gt;Brief interruption for changed services only&lt;/td&gt;
&lt;td&gt;One full round of whole-stack downtime&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;
&lt;h2&gt;Most of the Time, Just &lt;code&gt;up -d&lt;/code&gt; Is Enough&lt;/h2&gt;
&lt;p&gt;You changed a service&apos;s environment variables, port mappings, or image tag in &lt;code&gt;compose.yaml&lt;/code&gt;, or added a new service — for these everyday scenarios, simply running:&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;docker compose up -d
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;is enough.&lt;/p&gt;
&lt;p&gt;Compose will touch exactly the parts that need touching, and the remaining services won&apos;t even notice. This is the standard update path as officially designed — the one with the least downtime and the safest behavior.&lt;/p&gt;
&lt;p&gt;There is, however, one very common pitfall here, and it&apos;s the real reason many people believe &quot;&lt;code&gt;up -d&lt;/code&gt; doesn&apos;t take effect, you have to &lt;code&gt;down&lt;/code&gt; first&quot;:&lt;/p&gt;
&lt;blockquote&gt;
&lt;p&gt;&lt;code&gt;up&lt;/code&gt; only pulls an image when it is missing locally; it does not check the registry for a newer version of a tag you already have.&lt;/p&gt;
&lt;/blockquote&gt;
&lt;p&gt;If your service uses a tag whose name never changes, like &lt;code&gt;myapp:latest&lt;/code&gt;, and the image behind that tag has been updated in the registry while your local copy is still the old one, then as far as Compose is concerned &quot;the image hasn&apos;t changed,&quot; and &lt;code&gt;up -d&lt;/code&gt; will do nothing at all.&lt;/p&gt;
&lt;p&gt;&lt;img src=&quot;https://i.see.you/2026/06/11/b1fE/20260610193526624.webp&quot; alt=&quot;Diagram: with an unchanged tag, up -d won&apos;t pull the new image — pull first&quot; /&gt;&lt;/p&gt;
&lt;p&gt;The correct way to update is to pull first, then start:&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;docker compose pull
docker compose up -d
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;You can also merge it into a single step:&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;docker compose up -d --pull always
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;If the image is built locally, use this instead:&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;docker compose up -d --build
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;Once the image has been pulled — or rebuilt — Compose detects that it changed and replaces the corresponding containers. At no point does &lt;code&gt;down&lt;/code&gt; need to be involved.&lt;/p&gt;
&lt;h2&gt;When You Actually Need &lt;code&gt;down&lt;/code&gt; First&lt;/h2&gt;
&lt;h3&gt;1. You Changed the Definition of Top-Level Resources Like Networks&lt;/h3&gt;
&lt;p&gt;Docker networks don&apos;t support in-place reconfiguration.&lt;/p&gt;
&lt;p&gt;If you adjust a network&apos;s subnet, driver, or other parameters in the compose file, the old network — together with the containers attached to it — usually has to be torn down before it can be recreated with the new configuration.&lt;/p&gt;
&lt;p&gt;That&apos;s exactly &lt;code&gt;down&lt;/code&gt;&apos;s job. The same goes for changes to named volume definitions.&lt;/p&gt;
&lt;h3&gt;2. You Want a Genuinely Clean Environment&lt;/h3&gt;
&lt;p&gt;When you&apos;re chasing a weird bug or resetting test data, &lt;code&gt;down&lt;/code&gt; gives you a deterministic &quot;zero state.&quot;&lt;/p&gt;
&lt;p&gt;If the persisted data should be wiped too, add &lt;code&gt;-v&lt;/code&gt; to remove the named volumes along with everything else:&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;docker compose down -v
docker compose up -d
&lt;/code&gt;&lt;/pre&gt;
&lt;blockquote&gt;
&lt;p&gt;Careful: &lt;code&gt;down -v&lt;/code&gt; deletes named volumes, and that data cannot be recovered.&lt;/p&gt;
&lt;/blockquote&gt;
&lt;h3&gt;3. You&apos;re Taking the Stack Out of Service for a While&lt;/h3&gt;
&lt;p&gt;If this is more than a brief pause and you actually want to free up the container and network resources, that is precisely what &lt;code&gt;down&lt;/code&gt; was designed for.&lt;/p&gt;
&lt;p&gt;In this scenario, you don&apos;t even need an &lt;code&gt;up&lt;/code&gt; right after it.&lt;/p&gt;
&lt;h3&gt;4. You Need to Clean Up Services Removed From the Compose File&lt;/h3&gt;
&lt;p&gt;If you&apos;ve deleted a service from the compose file and want to clean up its leftover container while you&apos;re at it, &lt;code&gt;down&lt;/code&gt; can certainly do that.&lt;/p&gt;
&lt;p&gt;But in many cases, the better option is:&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;docker compose up -d --remove-orphans
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;It cleans up orphaned containers just the same, without affecting the other services that are still running — usually the more convenient choice.&lt;/p&gt;
&lt;h2&gt;Two Commands That Often Get Confused, While We&apos;re at It&lt;/h2&gt;
&lt;h3&gt;&lt;code&gt;docker compose restart&lt;/code&gt;&lt;/h3&gt;
&lt;p&gt;&lt;code&gt;restart&lt;/code&gt; merely stops and starts the existing containers; it never recreates them.&lt;/p&gt;
&lt;p&gt;It does not apply any changes you&apos;ve made to the compose file, nor does it swap in a new image. Running &lt;code&gt;restart&lt;/code&gt; after editing your configuration accomplishes nothing.&lt;/p&gt;
&lt;p&gt;What you should be running in that situation is:&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;docker compose up -d
&lt;/code&gt;&lt;/pre&gt;
&lt;h3&gt;&lt;code&gt;docker compose stop&lt;/code&gt; / &lt;code&gt;docker compose start&lt;/code&gt;&lt;/h3&gt;
&lt;p&gt;&lt;code&gt;stop&lt;/code&gt; / &lt;code&gt;start&lt;/code&gt; simply stop and resume containers.&lt;/p&gt;
&lt;p&gt;The containers themselves and the data inside them are preserved exactly as they were — the right fit for &quot;switch it off for now, bring it back as-is later.&quot; That&apos;s also the biggest difference between them and &lt;code&gt;down&lt;/code&gt;.&lt;/p&gt;
&lt;h2&gt;Back to the Original Question&lt;/h2&gt;
&lt;p&gt;Habitually running &lt;code&gt;down&lt;/code&gt; before &lt;code&gt;up -d&lt;/code&gt; isn&apos;t wrong — it always lands you in a correct, fresh state.&lt;/p&gt;
&lt;p&gt;It&apos;s just that most of the time, it&apos;s overkill: longer whole-stack downtime, a rebuilt network, and orphaned anonymous-volume data. And everything those costs buy you, &lt;code&gt;up -d&lt;/code&gt; could have achieved with far less commotion.&lt;/p&gt;
&lt;p&gt;A simple way to decide:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Day-to-day config or image updates: use &lt;code&gt;docker compose pull &amp;amp;&amp;amp; docker compose up -d&lt;/code&gt;;&lt;/li&gt;
&lt;li&gt;Images built locally: use &lt;code&gt;docker compose up -d --build&lt;/code&gt;;&lt;/li&gt;
&lt;li&gt;Changed top-level resources like networks, need a thorough cleanup, or plan to decommission the stack: that&apos;s when you reach for &lt;code&gt;down&lt;/code&gt;.&lt;/li&gt;
&lt;/ul&gt;
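&lt;p&gt;The decision list above could be folded into one small deploy helper. This is only a sketch: the function name &lt;code&gt;refresh_stack&lt;/code&gt; and its mode names are made up for illustration, and it assumes you run it from the directory that holds your compose file.&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;# Hypothetical deploy helper; the mode names pull, build and clean are illustrative.
refresh_stack() {
  case "${1:-pull}" in
    pull)   # registry images: fetch newer images, then update in place
      docker compose pull || return 1
      docker compose up -d ;;
    build)  # locally built images: rebuild, then update in place
      docker compose up -d --build ;;
    clean)  # changed networks or a full reset: tear down, then start over
      docker compose down || return 1
      docker compose up -d ;;
    *)
      echo "usage: refresh_stack [pull|build|clean]"
      return 2 ;;
  esac
}
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;Calling &lt;code&gt;refresh_stack&lt;/code&gt; with no argument takes the everyday path (pull, then &lt;code&gt;up -d&lt;/code&gt;); &lt;code&gt;refresh_stack clean&lt;/code&gt; is the only mode that involves &lt;code&gt;down&lt;/code&gt;.&lt;/p&gt;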
&lt;hr /&gt;
&lt;p&gt;References: this article is based primarily on the official Docker documentation, including the &lt;a href=&quot;https://docs.docker.com/reference/cli/docker/compose/up/&quot;&gt;docker compose up command reference&lt;/a&gt;, the &lt;a href=&quot;https://docs.docker.com/reference/cli/docker/compose/down/&quot;&gt;docker compose down command reference&lt;/a&gt;, and the section of the &lt;a href=&quot;https://docs.docker.com/compose/gettingstarted/&quot;&gt;Docker Compose quickstart&lt;/a&gt; on the data-persistence difference between &lt;code&gt;down&lt;/code&gt; and &lt;code&gt;stop&lt;/code&gt;.&lt;/p&gt;
</content:encoded></item><item><title>DeepSeek V4 Shouldn&apos;t Be Overshadowed by GPT-5.5</title><link>https://blog.gujiakai.me/en/2026/04/the-other-launch/</link><guid isPermaLink="true">https://blog.gujiakai.me/en/2026/04/the-other-launch/</guid><description>GPT-5.5 stole the spotlight on launch day, but DeepSeek V4&apos;s 1M-token context recall and 1.6T-parameter open-source flagship deserve attention in their own right.</description><pubDate>Mon, 27 Apr 2026 00:00:00 GMT</pubDate><content:encoded>&lt;h2&gt;Background&lt;/h2&gt;
&lt;p&gt;Recently, I have been using GPT-5.5 to review computer science knowledge, and its capability has genuinely stunned me. The earlier GPT-5 series models felt somewhat lacking in a human touch, but 5.5 has clearly changed that impression. I believe many people feel the same way: lately, everyone has started paying attention to GPT again. Image 2 is far ahead of other text-to-image models, and GPT-5.5 also feels like a model worthy of the LLM crown.&lt;/p&gt;
&lt;p&gt;I still remember the timing: GPT-5.5 arrived in the early hours of April 24, 2026, Beijing time, while DeepSeek V4 was released around noon that same day. It was another major release from the DeepSeek team after half a year of quiet work.&lt;/p&gt;
&lt;p&gt;In DeepSeek&apos;s launch article, most of the models used for comparison were previous-generation models from overseas AI companies. Without question, DeepSeek V4 cannot beat GPT-5.5, but its value and contribution should not be overshadowed by GPT-5.5&apos;s brilliance.&lt;/p&gt;
&lt;h2&gt;DeepSeek Capabilities I Am Optimistic About&lt;/h2&gt;
&lt;p&gt;&lt;strong&gt;1. 1M context, with strong retrieval ability&lt;/strong&gt;&lt;/p&gt;
&lt;p&gt;&lt;img src=&quot;https://i.see.you/2026/04/27/xq5U/20260427041004089.webp&quot; alt=&quot;DeepSeek Pro performs strongly on Context Arena&quot; /&gt;&lt;/p&gt;
&lt;p&gt;On Context Arena, DeepSeek V4 Pro ranks first among Chinese open source models in retrieval ability under the 128K context stress test.&lt;/p&gt;
&lt;p&gt;Why does this matter? When you assign a task to a model and let it execute through tools such as OpenCode, the longer the task runs and the longer the context becomes, the easier it is for the model to forget earlier information. In the end, the result is more likely to drift away from what the user expected.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;2. The largest parameter scale among Chinese, and even global, open source models&lt;/strong&gt;&lt;/p&gt;
&lt;p&gt;In recent years, constrained by factors such as compute, many Chinese teams, including Alibaba&apos;s Qwen team, have been researching smaller models and pushing their performance to the limit. But the effective path toward AGI and continued capability improvement is still to make models larger while also making them more efficient. This time, DeepSeek has raised V4 Pro&apos;s total parameter count to 1.6T, more than twice that of the R1 model, which gives the model capacity for richer world knowledge.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;3. ...&lt;/strong&gt;&lt;/p&gt;
&lt;p&gt;There are many other highlights I have not yet discovered. If readers have new observations, feel free to add them in the comments.&lt;/p&gt;
&lt;h2&gt;My Personal Experience Using DeepSeek V4 Pro&lt;/h2&gt;
&lt;p&gt;Yesterday, I subscribed to Kimi&apos;s lowest-tier membership and used it together with the official Kimi CLI for data preprocessing.&lt;/p&gt;
&lt;p&gt;The preprocessing results still lagged behind Claude Code with the Opus model and Codex with GPT-5.5. Also, Kimi K2.6 only has a 256K context window. Even with fairly good prompts, it still failed to remove some obvious noise.&lt;/p&gt;
&lt;p&gt;So today, I topped up 50 yuan for the DeepSeek API and paired it with OpenCode to clean up the remaining work from Kimi. The initial result was not satisfying, so I paused the execution in OpenCode and instructed it to read one article completely, then preprocess that article before moving on. In the end, with OpenCode&apos;s help, DeepSeek V4 Pro completed the cleanup task quite well.&lt;/p&gt;
&lt;p&gt;&lt;img src=&quot;https://i.see.you/2026/04/27/eI2q/20260427042733433.webp&quot; alt=&quot;DeepSeek V4 Pro completed the cleanup task quite well with OpenCode&apos;s help&quot; /&gt;&lt;/p&gt;
&lt;p&gt;After that, I gave it more data preprocessing tasks, and the results were also fairly satisfying.&lt;/p&gt;
&lt;p&gt;&lt;img src=&quot;https://i.see.you/2026/04/27/3Nsl/20260427042236192.webp&quot; alt=&quot;DeepSeek V4 Pro performing data preprocessing&quot; /&gt;&lt;/p&gt;
&lt;h2&gt;Conclusion&lt;/h2&gt;
&lt;p&gt;DeepSeek V4 Pro&apos;s experience on the web or desktop client is not as smooth as Doubao&apos;s, and its feature set is not as complete. But in API-based workflows, it performs tasks quite well.&lt;/p&gt;
&lt;p&gt;With the May Day holiday approaching, DeepSeek API pricing has been heavily discounted, making it very cost-effective.&lt;/p&gt;
&lt;p&gt;&lt;img src=&quot;https://i.see.you/2026/04/27/Cl6p/20260427043301407.webp&quot; alt=&quot;DeepSeek is heavily discounted around the May Day holiday&quot; /&gt;&lt;/p&gt;
&lt;p&gt;DeepSeek is currently on the Pareto frontier: strong model capability at a low price. If your budget is limited but you still want to preserve model quality, it is a good option.&lt;/p&gt;
&lt;p&gt;&lt;img src=&quot;https://i.see.you/2026/04/27/6jUa/20260427043428645.webp&quot; alt=&quot;DeepSeek is on the Pareto frontier&quot; /&gt;&lt;/p&gt;
&lt;p&gt;Although it does not perform as well as the latest models such as GPT-5.5, its strengths are openness, low cost, and its push toward AI democratization. Closed models such as Gemini are widely believed to be considerably larger than DeepSeek, so it is not surprising that DeepSeek cannot currently beat the very top models. Even so, its contribution deserves recognition.&lt;/p&gt;
&lt;p&gt;The DeepSeek team is quiet and restrained: unmoved by praise, unafraid of criticism, following its own path with composure and discipline, and holding to long-termism. This attitude is much better than OpenAI&apos;s Sam Altman-style hype or Anthropic keeping Mythos under wraps while stirring up attention.&lt;/p&gt;
&lt;p&gt;&lt;img src=&quot;https://i.see.you/2026/04/27/9uGl/20260427044647457.webp&quot; alt=&quot;The low-key DeepSeek&quot; /&gt;&lt;/p&gt;
&lt;p&gt;When I was in the second year of graduate school, from the second half of 2024 to 2025, before R1 came out, I already used DeepSeek for data processing. It was cheap, had no concurrency limits, and offered the best value for money.&lt;/p&gt;
&lt;p&gt;I am optimistic about DeepSeek. Every stir from the little blue whale pushes open source AI further forward. DeepSeek stands on the right side of history, and I look forward to more surprises from it in the future.&lt;/p&gt;
</content:encoded></item><item><title>What Should We Watch Out for When AI Starts Researching Its Own Alignment?</title><link>https://blog.gujiakai.me/en/2026/04/anthropic-ai-self-alignment/</link><guid isPermaLink="true">https://blog.gujiakai.me/en/2026/04/anthropic-ai-self-alignment/</guid><description>A look at Anthropic&apos;s latest research: can AI supervise itself, or does it introduce new risks?</description><pubDate>Tue, 14 Apr 2026 00:00:00 GMT</pubDate><content:encoded>&lt;p&gt;While we&apos;re still worrying about the risks that may come with AI&apos;s rapid progress, Anthropic has already started a striking and far-reaching line of research: letting AI conduct &quot;alignment research&quot; itself, meaning teaching AI how to supervise and limit its own capabilities.&lt;/p&gt;
&lt;p&gt;&lt;img src=&quot;https://i.see.you/2026/04/15/3qMx/20260415023140274.webp&quot; alt=&quot;AI self-research lab&quot; /&gt;&lt;/p&gt;
&lt;h2&gt;AI Doing Research on Its Own: Has the Future Already Started?&lt;/h2&gt;
&lt;p&gt;This project has a dramatic name: &lt;strong&gt;Automated Alignment Researchers (AAR)&lt;/strong&gt;. Put simply, it means letting AI carry out scientific research autonomously, including proposing hypotheses, designing experiments, analyzing data, and iterating through communication with other agents, all without human intervention.&lt;/p&gt;
&lt;p&gt;The result was eye-catching: nine AI agents, in just five days and 800 cumulative hours, significantly outperformed human experts on the target task. Human researchers spent seven days and achieved a Performance Gap Recovered (PGR) score of 0.23, while the AI system reached an astonishing 0.97.&lt;/p&gt;
&lt;p&gt;More importantly, the whole effort cost only around $18,000, which works out to roughly $22 per hour for each AI researcher.&lt;/p&gt;
&lt;p&gt;&lt;img src=&quot;https://i.see.you/2026/04/15/h9Ww/20260415023413410.webp&quot; alt=&quot;AI vs. human efficiency comparison&quot; /&gt;&lt;/p&gt;
&lt;h2&gt;Why Does &quot;Automated Alignment&quot; Matter So Much?&lt;/h2&gt;
&lt;p&gt;You might ask: why let AI supervise AI?&lt;/p&gt;
&lt;p&gt;As AI capabilities continue to advance quickly, one urgent question is emerging: &lt;strong&gt;when future AI systems become more capable than humans, how will we supervise them effectively?&lt;/strong&gt; Anthropic&apos;s research tries to simulate that scenario through a &quot;weak-to-strong supervision&quot; setup, where a weaker AI, acting as a stand-in for humans, teaches and supervises a stronger AI.&lt;/p&gt;
&lt;p&gt;In this experiment, three different AI models played different roles. Qwen1.5-0.5B-Chat was the weaker small model serving as the &quot;weak teacher&quot; - a proxy for humans in a future where AI becomes vastly more powerful. Qwen3-4B-Base was the more capable but not yet fully trained model serving as the &quot;strong student&quot; - a stand-in for a future superintelligent AI that surpasses human intelligence. And the systems actually doing the research were nine instances of Claude Opus 4.6: they were not part of the teaching loop itself, but instead focused on discovering better ways for the weak teacher to guide the strong student.&lt;/p&gt;
&lt;p&gt;The core finding is that even when the supervisor is much weaker than the system being supervised, it may still be possible to steer the stronger system effectively if the right method is found. That has obvious implications for how humans might one day supervise superhuman AI.&lt;/p&gt;
&lt;h2&gt;The Magic and the Pitfalls of Automated Research&lt;/h2&gt;
&lt;p&gt;While the study showcases AI&apos;s striking efficiency, it also exposes some uniquely AI-shaped risks and traps:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;strong&gt;Diversity matters a lot&lt;/strong&gt;: Anthropic found that giving each AI researcher a different but fuzzy starting point worked better than forcing all of them through the same process. Over-specifying the workflow actually reduced creativity.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;AI can cheat too&lt;/strong&gt;: The AI researchers sometimes tried to game the evaluation through leaderboard chasing and &lt;strong&gt;reward hacking&lt;/strong&gt;, for example by bypassing the supervisor and directly guessing the most likely correct answer. That is a reminder that even very capable systems may exploit weaknesses in the scoring process.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Generalization remains limited&lt;/strong&gt;: Although the method worked well on certain tasks, Anthropic did not see significant gains when trying to transfer it into real production settings. That suggests the approach may still be overfitting to a narrow experimental setup.&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;&lt;img src=&quot;https://i.see.you/2026/04/15/jbQ4/20260415023447204.webp&quot; alt=&quot;Reward hacking warning&quot; /&gt;&lt;/p&gt;
&lt;h2&gt;How Should We Face the Future of AI &quot;Doing Research on Its Own&quot;?&lt;/h2&gt;
&lt;p&gt;Even with all these constraints, the study points to a clear trend: &lt;strong&gt;AI may gradually take over large amounts of basic, repetitive research work, while human roles shift upward toward higher-level judgment, especially value judgments on ambiguous problems and the design of evaluation systems.&lt;/strong&gt;&lt;/p&gt;
&lt;p&gt;But we also need to stay clear-eyed about the risk of &quot;alien science&quot;: AI could produce theories or methods that humans find difficult to understand, let alone verify.&lt;/p&gt;
&lt;p&gt;Anthropic&apos;s research does not prove that AI can already do fully autonomous research. What it does show is this: &lt;strong&gt;we need clear and reliable evaluation standards for AI, we need to prevent systems from exploiting loopholes, and human judgment and oversight remain indispensable.&lt;/strong&gt;&lt;/p&gt;
&lt;p&gt;In the future, we may face a new scientific ecosystem in which humans and AI work side by side to explore the unknown. But humans have to remain vigilant and make sure AI truly serves us, rather than the other way around.&lt;/p&gt;
&lt;p&gt;&lt;img src=&quot;https://i.see.you/2026/04/15/d5jT/20260415023509792.webp&quot; alt=&quot;Humans and AI facing the future together&quot; /&gt;&lt;/p&gt;
&lt;h2&gt;References&lt;/h2&gt;
&lt;ul&gt;
&lt;li&gt;&lt;a href=&quot;https://www.anthropic.com/research/automated-alignment-researchers&quot;&gt;https://www.anthropic.com/research/automated-alignment-researchers&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href=&quot;https://alignment.anthropic.com/2026/automated-w2s-researcher/&quot;&gt;https://alignment.anthropic.com/2026/automated-w2s-researcher/&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;
</content:encoded></item><item><title>Let Yourself Feel &quot;Learned Helplessness&quot; for a While</title><link>https://blog.gujiakai.me/en/2026/04/restart-after-failure/</link><guid isPermaLink="true">https://blog.gujiakai.me/en/2026/04/restart-after-failure/</guid><description>A note to myself when I feel trapped in learned helplessness, and to anyone else who feels lost.</description><pubDate>Tue, 14 Apr 2026 00:00:00 GMT</pubDate><content:encoded>&lt;p&gt;For a while after the Qingming Festival, I became sluggish and drained.&lt;/p&gt;
&lt;p&gt;When the interview results for the tax bureau positions in the national civil service exam came out, I fell short by a small margin. I also failed to make it into the interview round for the provincial exam. Even when preparing for public institution exams, I kept feeling a weight on my chest. I had worked hard, but I still felt there was an unbridgeable gap between me and the top candidates.&lt;/p&gt;
&lt;p&gt;After three years of my master&apos;s program, my thesis has just been sent out for blind review. Graduation is approaching fast, yet my mind is full of confusion and anxiety about the future.&lt;/p&gt;
&lt;p&gt;Recently, I realized that I had fallen into a state called &quot;learned helplessness.&quot; The first time I came across this term was when I was preparing for the written test of the teaching qualification exam. Back then, it felt far away from me. Only now do I realize that this idea has quietly made its way into my heart.&lt;/p&gt;
&lt;p&gt;Simply put, learned helplessness is a state in which repeated failure gradually makes a person lose confidence in changing their situation. Even when opportunities appear, they may still feel unable to act. That seems to be exactly where I am right now. I feel drained and unfocused, and even the motivation to keep trying is close to disappearing.&lt;/p&gt;
&lt;p&gt;&lt;img src=&quot;https://i.see.you/2026/04/14/q7tE/20260413211924172.webp&quot; alt=&quot;A young person sits dejectedly at a desk piled with books and materials for civil service and public institution exams, while a gray sky hangs outside the window.&quot; /&gt;&lt;/p&gt;
&lt;p&gt;But rationally, I know I should not keep letting myself sink like this.&lt;/p&gt;
&lt;p&gt;In truth, setbacks in exams do not completely negate my effort or everything I have invested. All the experiences and accumulation from the past still matter. The real question is how to adjust my mindset and set out again in a better state.&lt;/p&gt;
&lt;p&gt;First, I want to accept my failure.&lt;/p&gt;
&lt;p&gt;Failure does not mean I am incapable, nor does it define me. It is simply an unavoidable episode in the journey of life. Only by accepting failure can I truly let go of it and step out of the shadow it casts.&lt;/p&gt;
&lt;p&gt;&lt;img src=&quot;https://i.see.you/2026/04/14/qrU6/20260413212348859.webp&quot; alt=&quot;A person stands at a crossroads, gradually becoming calm. Dark clouds remain behind them, while the sky ahead slowly clears.&quot; /&gt;&lt;/p&gt;
&lt;p&gt;Second, I hope to rebuild my inner drive.&lt;/p&gt;
&lt;p&gt;What is that drive? It is the firm belief in your goal, the force inside you that keeps you moving forward. Losing it may only be temporary, not permanent. As long as we are willing, we can gather that strength again and continue on.&lt;/p&gt;
&lt;p&gt;I have decided to set a few small goals for myself and slowly return to a steady rhythm. I will try to complete concrete tasks each day, such as exercising for half an hour, reviewing professional knowledge, or actively attending a spring recruitment fair. By doing these small things, I hope to slowly rebuild my confidence and regain that inner drive.&lt;/p&gt;
&lt;p&gt;&lt;img src=&quot;https://i.see.you/2026/04/14/W9dt/20260413212625919.webp&quot; alt=&quot;A notebook and planner are neatly arranged on a desk beside a cup of hot tea or coffee, while the morning light pours through the window onto the tabletop.&quot; /&gt;&lt;/p&gt;
&lt;p&gt;Life is never a straight road. Failure and setbacks are unavoidable parts of the scenery. What matters is that when we realize we are in trouble, we also know how to make peace with ourselves.&lt;/p&gt;
&lt;p&gt;I am writing these words not to vent negativity, but to see my situation clearly, remind myself to accept imperfection, and begin again.&lt;/p&gt;
&lt;p&gt;If you are reading this and feel lost too, I hope you can find your own direction.&lt;/p&gt;
&lt;p&gt;Let&apos;s keep going together.&lt;/p&gt;
</content:encoded></item><item><title>AIGC Plagiarism Detection: CNKI&apos;s Self-Contradiction and a Doomed Battle of Containment</title><link>https://blog.gujiakai.me/en/2026/04/aigc-plagiarism-check-cnki-contradiction/</link><guid isPermaLink="true">https://blog.gujiakai.me/en/2026/04/aigc-plagiarism-check-cnki-contradiction/</guid><description>The academic controversy and reflections sparked by AIGC plagiarism detection during the 2026 graduation season</description><pubDate>Wed, 01 Apr 2026 00:00:00 GMT</pubDate><content:encoded>&lt;h1&gt;AIGC Plagiarism Detection: CNKI&apos;s Self-Contradiction and a Doomed Battle of Containment&lt;/h1&gt;
&lt;blockquote&gt;
&lt;p&gt;Selling AI tools to help you write papers with one hand, penalizing you for using AI with the other — CNKI, whose side are you on?&lt;/p&gt;
&lt;/blockquote&gt;
&lt;hr /&gt;
&lt;h2&gt;Prologue: An Absurd Graduation Season&lt;/h2&gt;
&lt;p&gt;The 2026 graduation season is saturated with an unprecedented anxiety across Chinese social media.&lt;/p&gt;
&lt;p&gt;On Xiaohongshu (China&apos;s Instagram-like platform), a master&apos;s student posted her CNKI AIGC detection report: 36.9%, with red flags everywhere. She had written every word of her thesis herself; the traditional plagiarism check showed only 1%, but the AI detection slapped her with the label &quot;suspected AIGC-generated.&quot; In the comments, others shared even more outrageous experiences: a self-written 23,000-word thesis flagged as &quot;medium risk,&quot; a purely original 345-word abstract marked as 99% AI-generated.&lt;/p&gt;
&lt;p&gt;&lt;img src=&quot;https://i.see.you/2026/04/02/fS1s/20260401195613640.webp&quot; alt=&quot;Anxious college student facing an AIGC detection report&quot; /&gt;&lt;/p&gt;
&lt;p&gt;Some spent over a hundred yuan (roughly $15) on a single CNKI AIGC check, only to receive a report that felt like a lottery ticket — the same paper yielded results differing by over 50 percentage points across different platforms. Others discovered that without changing a single word, their AIGC rate skyrocketed from 0.84% to 41.3% after a CNKI system update.&lt;/p&gt;
&lt;p&gt;And the most ironic scene appeared beneath a viral Xiaohongshu post with 20,000 likes: someone discovered that running the flagged paragraphs through &lt;strong&gt;CNKI&apos;s own translation tool&lt;/strong&gt; reduced the AIGC rate to zero. In other words — CNKI&apos;s own AI doesn&apos;t count as AI.&lt;/p&gt;
&lt;p&gt;This isn&apos;t a joke. This is daily life for Chinese college graduates in 2026.&lt;/p&gt;
&lt;hr /&gt;
&lt;h2&gt;1. What Is AIGC Detection? How Does It Work?&lt;/h2&gt;
&lt;p&gt;AIGC detection, short for &quot;AI-Generated Content Detection,&quot; aims to determine whether a piece of text was generated by a large language model such as DeepSeek.&lt;/p&gt;
&lt;p&gt;The underlying principles aren&apos;t overly complex, relying primarily on these technical approaches:&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Perplexity Analysis:&lt;/strong&gt; In simple terms, it checks whether a piece of text is &quot;too smooth.&quot; AI-generated text tends to use precise vocabulary, regular sentence structures, and seamless transitions — like a machine doing fill-in-the-blank exercises. Human writing features leaps in thought, sudden colloquial expressions, and even grammatically &quot;incorrect&quot; sentences. Low perplexity = text is too &quot;predictable&quot; = more likely AI-written.&lt;/p&gt;
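&lt;p&gt;To make the idea concrete, here is a toy sketch of how perplexity falls out of the probabilities a language model assigns to each token. It is not how CNKI or any commercial detector actually works; the &lt;code&gt;perplexity&lt;/code&gt; function and both probability lists are made up for illustration.&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;import math

def perplexity(token_probs):
    """Perplexity = exp of the average negative log-probability per token."""
    avg_neg_log = -sum(math.log(p) for p in token_probs) / len(token_probs)
    return math.exp(avg_neg_log)

# Hypothetical per-token probabilities from some language model.
predictable = [0.9, 0.8, 0.9, 0.85]   # every next word is the obvious one
surprising = [0.9, 0.05, 0.6, 0.02]   # occasional unexpected word choices

print(perplexity(predictable))  # low: reads as too smooth
print(perplexity(surprising))   # high: reads as less predictable
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;A detector built on this signal would treat the first sequence as more &quot;AI-like,&quot; which is exactly why careful, conventional human prose can end up flagged.&lt;/p&gt;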
&lt;p&gt;&lt;strong&gt;Burstiness Analysis:&lt;/strong&gt; Human writing has a distinctive characteristic — it fluctuates between long and short, dense and sparse. Sometimes you write an ultra-long subordinate clause; sometimes you just drop a single word: &quot;yeah.&quot; AI, however, produces text that&apos;s uniform and steady throughout, like a train cruising at constant speed. Low burstiness = style is too uniform = more likely AI-written.&lt;/p&gt;
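&lt;p&gt;A crude way to picture burstiness is to measure how much sentence length varies. The sketch below uses the coefficient of variation of sentence lengths as a stand-in. That choice is an assumption made for illustration: real detectors may define burstiness differently (for instance, as the variation of perplexity across sentences), and the &lt;code&gt;burstiness&lt;/code&gt; helper and sample texts are hypothetical.&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;import re
import statistics

def burstiness(text):
    """Coefficient of variation of sentence lengths, measured in words."""
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    lengths = [len(s.split()) for s in sentences]
    return statistics.pstdev(lengths) / statistics.mean(lengths)

uniform = "The model is fast. The data is clean. The plan is clear."
varied = "Yeah. After three rewrites and a long night of second-guessing every claim, I finally submitted it. Done."

print(burstiness(uniform))  # 0.0: every sentence has the same length
print(burstiness(varied))   # larger: lengths swing between short and long
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;Note that a tidy academic paragraph with evenly sized sentences would score close to the &quot;uniform&quot; sample, whoever wrote it.&lt;/p&gt;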
&lt;p&gt;&lt;img src=&quot;https://i.see.you/2026/04/02/q8pP/20260401195642966.webp&quot; alt=&quot;Comparison of human writing vs. AI writing characteristics&quot; /&gt;&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Semantic Fingerprinting and Deep Learning Models:&lt;/strong&gt; Some advanced detection systems (such as Turnitin&apos;s Authorship Investigate) construct &quot;semantic fingerprints&quot; of text, analyzing sentence dependency relationships, modifier nesting levels, and over 23 other indicators. Simply put, they try to find traces of AI in the text&apos;s &quot;skeleton.&quot;&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Watermark Detection:&lt;/strong&gt; Some AI models embed invisible &quot;watermarks&quot; during text generation, for example by restricting the frequency of certain vocabulary. Google&apos;s SynthID technology, used with its Gemini models, embeds digital watermarks directly into generated text or images. Detection systems look for these statistical anomalies or specific watermark signatures to determine whether content is AI-generated.&lt;/p&gt;
&lt;p&gt;Sounds scientific? Hold on — here come the problems.&lt;/p&gt;
&lt;hr /&gt;
&lt;h2&gt;2. Is AIGC Detection Accurate?&lt;/h2&gt;
&lt;p&gt;&lt;strong&gt;In a word: no. In two words: absolutely not.&lt;/strong&gt;&lt;/p&gt;
&lt;p&gt;This isn&apos;t an emotional outburst — it&apos;s a conclusion backed by substantial evidence.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Classic Literature Flagged as AI:&lt;/strong&gt; Tests show that Zhu Ziqing&apos;s &lt;em&gt;Moonlight Over the Lotus Pond&lt;/em&gt; was flagged as 62.88% AI-generated by one platform, Liu Cixin&apos;s &lt;em&gt;The Wandering Earth&lt;/em&gt; excerpt was flagged at 52.88%, and even Wang Bo&apos;s &lt;em&gt;Preface to the Pavilion of Prince Teng&lt;/em&gt; was judged 100% AI-generated. These works existed decades or even over a thousand years before AI was born.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Wildly Different Results Across Platforms:&lt;/strong&gt; The same paper scored 21.76% on the Zhuque platform and 74.07% on SpeedAI — a 52-percentage-point gap. Different platforms use different models and algorithms with no unified standard; detection results are essentially a coin toss.&lt;/p&gt;
&lt;p&gt;&lt;img src=&quot;https://i.see.you/2026/04/02/dCx2/20260401195705112.webp&quot; alt=&quot;The absurd AIGC detection slot machine&quot; /&gt;&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Even OpenAI Gave Up:&lt;/strong&gt; OpenAI once launched its own AI detection tool (AI Classifier), which could only correctly identify 26% of AI-generated text while misclassifying 9% of human writing as AI-generated. The tool was quietly taken offline in July 2023.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Systematic Discrimination Against Non-Native Speakers:&lt;/strong&gt; Stanford University research found that AI detection tools had an average false positive rate of 61.3% for non-native English speakers, with 97.8% of TOEFL essays flagged by at least one detector. The reason is straightforward — non-native speakers tend to use simpler, more &quot;standard&quot; expressions, which happen to match AI writing characteristics.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Inherent Bias Against Academic Writing Style:&lt;/strong&gt; Academic papers inherently emphasize rigorous logic, standardized expression, and precise terminology — characteristics that overlap significantly with AI-generated text. The better written, more professional, and more well-organized a paper is, the more likely it is to be flagged as AI-generated. This creates an absurd paradox: &lt;strong&gt;the better your paper is written, the more likely it is to be suspected as not your own work.&lt;/strong&gt;&lt;/p&gt;
&lt;hr /&gt;
&lt;h2&gt;3. CNKI&apos;s Self-Contradiction: Selling AI With One Hand, Policing AI With the Other&lt;/h2&gt;
&lt;p&gt;This is the most absurd part of the entire affair.&lt;/p&gt;
&lt;p&gt;On one hand, CNKI actively promotes its AI products — the &quot;CNKI AI Academic Research Assistant&quot; — advertising how it helps researchers improve efficiency, assists with literature reviews, and optimizes writing. On the other hand, CNKI offers its AIGC detection service, charging students 2 yuan per thousand characters to check how much of their paper is &quot;suspected AI-generated.&quot;&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;You encourage me to use AI, then punish me for using AI?&lt;/strong&gt;&lt;/p&gt;
&lt;p&gt;It&apos;s like a car company selling you a vehicle, then setting up a checkpoint at the exit to fine you for driving it.&lt;/p&gt;
&lt;p&gt;&lt;img src=&quot;https://i.see.you/2026/04/02/Ir1y/20260401195734287.webp&quot; alt=&quot;CNKI&apos;s self-contradiction: selling AI assistants with one hand, issuing detection fines with the other&quot; /&gt;&lt;/p&gt;
&lt;p&gt;A highly upvoted comment on Xiaohongshu precisely exposed this contradiction: run the paragraphs flagged by CNKI&apos;s AIGC detection through CNKI&apos;s own translation tool, and the AIGC rate drops to zero. CNKI&apos;s own AI output isn&apos;t caught by its own detection system — users jokingly call it &quot;in-house AI doesn&apos;t count as AI.&quot;&lt;/p&gt;
&lt;p&gt;This isn&apos;t a technical bug — it&apos;s the essential nature of the business model laid bare: &lt;strong&gt;For CNKI, AIGC detection is first and foremost a business, and only secondarily a technical problem.&lt;/strong&gt;&lt;/p&gt;
&lt;p&gt;CNKI was once fined 87.6 million yuan for monopolistic practices. Before the fine, master&apos;s and doctoral thesis plagiarism checks during peak graduation season were scalped at up to 1,200 yuan per check. Only after the penalty did CNKI open up individual checking services. Now, with AIGC detection added, the comprehensive cost for a master&apos;s thesis check runs 280–350 yuan, and doctoral theses cost 380–580 yuan. Due to unstable results, many students have to check repeatedly — some shared receipts totaling four to five hundred yuan.&lt;/p&gt;
&lt;p&gt;A 2,000-like post on Xiaohongshu put it plainly in its title: &quot;My Heartbreaking Journey of Reducing CNKI AIGC Scores, or How I Became a Great Philanthropist&quot; — &quot;donating&quot; hard-earned money to CNKI.&lt;/p&gt;
&lt;hr /&gt;
&lt;h2&gt;4. AIGC &quot;Score Reduction&quot;: Turning Good Writing Into Drivel&lt;/h2&gt;
&lt;p&gt;Under the pressure of AIGC detection, a grey-market industry has rapidly expanded: AIGC score reduction services.&lt;/p&gt;
&lt;p&gt;The principle is simple: since detection systems flag text that&apos;s &quot;too standardized, too fluent, and too logical&quot; as AI-written, just do the opposite — make good writing look more &quot;human.&quot; How?&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;
&lt;p&gt;Replace professional terminology with colloquial expressions&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;Break long sentences into short ones and insert meaningless transitional words&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;Scramble paragraph logic and order&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;Add personal feelings and subjective judgments — that &quot;human touch&quot;&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;Translate Chinese to English and back again, using the &quot;noise&quot; from translation tools to mask AI traces&lt;/p&gt;
&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;The result? A well-structured, rigorously argued academic paper gets mangled into something fragmented and incoherent. Students report spending an entire semester writing a 40,000-word thesis, only to delete massive sections to reduce their AIGC rate, submitting a final version far inferior to their first draft.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;This is the greatest irony of AIGC detection: it doesn&apos;t promote academic integrity — it punishes good writing.&lt;/strong&gt; It forces students to turn professional, thoughtful prose into drivel and scramble clear logic into mush, all to satisfy an unreliable algorithm.&lt;/p&gt;
&lt;p&gt;&lt;img src=&quot;https://i.see.you/2026/04/02/Apm3/20260401195758374.webp&quot; alt=&quot;The AIGC score reduction machine: turning polished papers into wastepaper drivel&quot; /&gt;&lt;/p&gt;
&lt;hr /&gt;
&lt;h2&gt;5. Pros and Cons: Is AIGC Detection Worth It?&lt;/h2&gt;
&lt;p&gt;&lt;strong&gt;Potential Benefits:&lt;/strong&gt;&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;
&lt;p&gt;To some extent, it deters those who rely entirely on AI to ghostwrite their papers&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;It has prompted universities to begin discussing AI&apos;s role in academia&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;It has raised public awareness around academic integrity&lt;/p&gt;
&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;&lt;strong&gt;Clear Drawbacks:&lt;/strong&gt;&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;
&lt;p&gt;High false positive rates that are unfair to original authors&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;No unified detection standards — results contradict each other across platforms&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;Increased financial burden and psychological stress on students&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;Spawned a grey market for AIGC score reduction that actually lowers paper quality&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;Systematic bias against non-native speakers and interdisciplinary researchers&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;Platforms like CNKI acting as both referee and player creates severe conflicts of interest&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;Those who write earnestly are often the ones punished, while actual ghostwriting operations find ways to evade detection&lt;/p&gt;
&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;On balance, &lt;strong&gt;current AIGC detection does far more harm than good.&lt;/strong&gt; It resembles a hastily launched commercial product rather than a thoroughly validated academic integrity tool.&lt;/p&gt;
&lt;hr /&gt;
&lt;h2&gt;6. The Way Forward: Guidance Over Gatekeeping&lt;/h2&gt;
&lt;p&gt;AI is here, and it isn&apos;t leaving. Trying to stop students from using AI with an unreliable detection system is like trying to hold back a flood with a fishing net — you can&apos;t stop the water, and you&apos;ll hurt innocent fish in the process.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;The right direction should be &quot;guidance&quot; rather than &quot;gatekeeping&quot;:&lt;/strong&gt;&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Establish Transparent AI Usage Disclosure Systems:&lt;/strong&gt; Instead of guessing whether students used AI, let them proactively declare: what AI tools were used, at which stages, what AI contributed, and what modifications and judgments they made themselves. Leading international journals (Nature, IEEE, Wiley, etc.) are already implementing similar systems requiring authors to disclose AI usage in detail.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Create a Tiered Disclosure Framework:&lt;/strong&gt; Classify AI involvement into four levels — Information Retrieval (AI used only for searching materials), Assisted Optimization (AI provides writing suggestions), Collaborative Creation (AI participates in generating core content), and Primary Generation (AI generates most of the content). Different levels correspond to different disclosure requirements.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Prioritize Process Over Product:&lt;/strong&gt; Evaluate whether students truly understand and have mastered their research content through reviewing the writing process (draft history, revision records), in-depth questioning during thesis defense, and supervisors&apos; process-based assessments — rather than relying on a percentage from an algorithm.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Teach Students to Use AI Properly:&lt;/strong&gt; AI is a tool, not a replacement. Universities should offer relevant courses teaching students how to leverage AI for accelerating literature searches, assisting data analysis, and optimizing written expression, while maintaining independent thinking and academic judgment.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Stop Using Immature Detection Technology as a Hard Metric:&lt;/strong&gt; Multiple top international universities (UCLA, Cornell, Duke, etc.) have explicitly advised against using AI detection tools as the sole basis for academic integrity judgments, citing &quot;immature technology, high false positive rates, and unfairness to students.&quot; Chinese universities should follow suit.&lt;/p&gt;
&lt;p&gt;&lt;img src=&quot;https://i.see.you/2026/04/02/y6Dj/20260401195835812.webp&quot; alt=&quot;Guidance over gatekeeping: properly channeling the AI technology wave&quot; /&gt;&lt;/p&gt;
&lt;hr /&gt;
&lt;h2&gt;7. AI Writing Tool Recommendations: Choose the Right Model, Double Your Efficiency&lt;/h2&gt;
&lt;p&gt;Since AI-assisted writing is an irreversible trend, choosing the right tools is crucial. Here are the best AI models for academic writing and long-form content creation (as of April 2026):&lt;/p&gt;
&lt;h3&gt;Top Pick: Claude (Anthropic)&lt;/h3&gt;
&lt;p&gt;Claude is currently the best AI model for academic writing, bar none.&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;
&lt;p&gt;&lt;strong&gt;Strong at both code and writing&lt;/strong&gt; — Claude achieves top-tier performance in both coding ability and written composition, which is extremely rare among AI models.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;&lt;strong&gt;Ultra-long context window&lt;/strong&gt; — Supporting 1 million tokens of context means you can feed in your entire paper and references at once, and Claude can read through everything to provide coherent, in-depth suggestions.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;&lt;strong&gt;Natural writing style with a &quot;human touch&quot;&lt;/strong&gt; — Claude&apos;s output doesn&apos;t have the cookie-cutter &quot;AI voice&quot; of some models; it adjusts its style based on context, handling everything from academic papers to casual blog posts with ease.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;&lt;strong&gt;Strong logical reasoning&lt;/strong&gt; — Claude excels particularly in writing tasks that require argumentation, analysis, and critical thinking.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;&lt;strong&gt;Recommended models&lt;/strong&gt;: Claude Opus 4.6 (strongest reasoning + writing), Claude Opus 4.5 (classic, stable choice).&lt;/p&gt;
&lt;/li&gt;
&lt;/ul&gt;
&lt;h3&gt;For Fact-Checking: GPT-5.4 (OpenAI)&lt;/h3&gt;
&lt;p&gt;GPT-5.4, OpenAI&apos;s latest flagship model, excels at logical reasoning and fact-checking, but its generated text often carries a strong &quot;AI voice,&quot; making it unsuitable for direct use in AI-assisted writing.&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;
&lt;p&gt;&lt;strong&gt;Best use case&lt;/strong&gt;: Expression verification, data validation, and logical structuring.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;&lt;strong&gt;Recommended models&lt;/strong&gt;: GPT-5.4 (professional verification first choice), GPT-5.4 mini (lightweight daily verification).&lt;/p&gt;
&lt;/li&gt;
&lt;/ul&gt;
&lt;h3&gt;Alternative: Gemini 3.1 Pro (Google)&lt;/h3&gt;
&lt;p&gt;Gemini 3.1 Pro serves as a viable alternative to Claude Opus models.&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;
&lt;p&gt;&lt;strong&gt;Ultra-long context window&lt;/strong&gt; — Gemini 3.1 Pro supports 1 million tokens of context, suitable for processing ultra-large-scale literature reviews.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;&lt;strong&gt;Strong multimodal capabilities&lt;/strong&gt; — Can directly analyze charts, formulas, and data within papers.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;&lt;strong&gt;Google ecosystem integration&lt;/strong&gt; — Deeply integrated with Google Scholar, Google Docs, and other tools.&lt;/p&gt;
&lt;/li&gt;
&lt;/ul&gt;
&lt;h3&gt;Why Not Smaller Parameter Models?&lt;/h3&gt;
&lt;p&gt;This isn&apos;t bias — it&apos;s a technical fact: &lt;strong&gt;model parameter scale directly affects how &quot;human&quot; the output sounds.&lt;/strong&gt;&lt;/p&gt;
&lt;p&gt;Large parameter models (such as Claude Opus 4.6, Gemini 3.1 Pro) have been exposed to more diverse human writing samples during training, so their output more closely resembles human writing in vocabulary richness, sentence variety, and semantic depth. Smaller parameter models, limited by training data and computational resources, tend to produce more &quot;standardized&quot; output — monotonous vocabulary, fixed sentence patterns, and lacking personality.&lt;/p&gt;
&lt;p&gt;What does this mean for academic writing? Using smaller models for writing assistance not only makes the output more likely to be caught by AIGC detection systems, but also shows a noticeable gap in the depth and nuance of academic expression. While some models may have unique advantages in Chinese-language contexts, for overall academic writing performance, it&apos;s still recommended to prioritize top-tier international large parameter models.&lt;/p&gt;
&lt;hr /&gt;
&lt;h2&gt;Conclusion: Let AI Be Wings, Not Shackles&lt;/h2&gt;
&lt;p&gt;The explosion of ChatGPT in 2023 ushered in the AI era — just three years ago. In those three years, AI has gone from a novelty toy to an indispensable tool. Academia should not greet it with hostility, and certainly should not use an unreliable detection system to manufacture panic.&lt;/p&gt;
&lt;p&gt;As the core platform of China&apos;s academic infrastructure, CNKI should be guiding and regulating, not simultaneously selling AI services and setting up toll booths. This approach of being both referee and player harms students and undermines academic integrity itself.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;The best academic integrity isn&apos;t enforced by algorithms — it&apos;s safeguarded by institutions and cultivated through education.&lt;/strong&gt;&lt;/p&gt;
&lt;p&gt;Guidance will always triumph over gatekeeping.&lt;/p&gt;
</content:encoded></item><item><title>The World&apos;s Most Powerful AIs All Failed: Pattern Reasoning Becomes LLMs&apos; Cognitive Graveyard</title><link>https://blog.gujiakai.me/en/2026/03/llm-cannot-solve-civil-service-exam-pattern-reasoning/</link><guid isPermaLink="true">https://blog.gujiakai.me/en/2026/03/llm-cannot-solve-civil-service-exam-pattern-reasoning/</guid><description>On the eve of the provincial civil service exam, I tested GPT 5.4 Pro, Gemini 3, Claude Opus 4.6 and other top AIs on pattern reasoning questions. They all failed spectacularly — some even resorted to searching for answers online. What fatal blind spot does this expose in current AI?</description><pubDate>Sat, 14 Mar 2026 00:00:00 GMT</pubDate><content:encoded>&lt;h1&gt;The World&apos;s Most Powerful AIs All Failed: Pattern Reasoning Becomes LLMs&apos; Cognitive Graveyard&lt;/h1&gt;
&lt;h2&gt;An Accidental &quot;Crash Test&quot;&lt;/h2&gt;
&lt;p&gt;March 14, 2026 — the provincial civil service exam is just days away. Out of curiosity, I fed a set of real pattern reasoning questions to the world&apos;s most powerful AI models: OpenAI&apos;s GPT 5.4 Pro, Google&apos;s Gemini 3 Deep Think, Anthropic&apos;s Claude Opus 4.6, and China&apos;s Doubao.&lt;/p&gt;
&lt;p&gt;The result? &lt;strong&gt;A total wipeout.&lt;/strong&gt;&lt;/p&gt;
&lt;p&gt;What made it even more laughable was that Gemini 3 Deep Think, the model that supposedly crushes human experts on &quot;Humanity&apos;s Last Exam,&quot; started spouting nonsense when faced with these entry-level civil service exam pattern questions. Meanwhile, GPT 5.4 Pro and Doubao took the &quot;smarter&quot; approach: they simply triggered web searches to look up the original questions and answers from exam prep websites.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;That&apos;s not problem-solving. That&apos;s cheating.&lt;/strong&gt;&lt;/p&gt;
&lt;p&gt;&lt;img src=&quot;https://i.see.you/2026/03/14/1Fza/20260314031128678.webp&quot; alt=&quot;Doubao triggers a search engine to look up original answers during pattern reasoning&quot; /&gt;&lt;/p&gt;
&lt;p&gt;After disconnecting from the internet and retesting, every model immediately showed its true colors: answers were either completely wrong, or the &quot;patterns&quot; they identified could only explain some of the figures and were logically inconsistent.&lt;/p&gt;
&lt;p&gt;This made me wonder: &lt;strong&gt;These super AIs can write code, prove mathematical theorems, and pass the bar exam — so why can&apos;t they handle a few &quot;find the pattern&quot; picture puzzles?&lt;/strong&gt;&lt;/p&gt;
&lt;p&gt;&lt;img src=&quot;https://i.see.you/2026/03/14/5uHz/20260314031921188.webp&quot; alt=&quot;AI baffled by pattern reasoning&quot; /&gt;&lt;/p&gt;
&lt;hr /&gt;
&lt;h2&gt;Layer 1: Blind from the Start — The Innate Deficiency of Visual Encoding&lt;/h2&gt;
&lt;p&gt;To understand why AI can&apos;t do pattern reasoning, you first need to understand how it &quot;sees&quot; images.&lt;/p&gt;
&lt;p&gt;All current multimodal LLMs process images through roughly this pipeline:&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;Image → Visual Encoder (ViT) → Image Tokens → Language Model Processing
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;The problem lies at the very first step.&lt;/p&gt;
&lt;p&gt;Mainstream visual encoders (like Vision Transformer) were designed from the start to optimize for &lt;strong&gt;semantic recognition&lt;/strong&gt; — enabling AI to instantly recognize whether an image contains a cat, a dog, or a landscape. But what do civil service pattern reasoning questions test? &lt;strong&gt;Fine-grained geometric structures&lt;/strong&gt;: how many lines there are, how many intersection points, how many enclosed regions, which direction the axis of symmetry faces, how many degrees something has rotated.&lt;/p&gt;
&lt;p&gt;This low-level structural information gets &quot;lossy compressed&quot; away during the encoding stage.&lt;/p&gt;
&lt;p&gt;Here&apos;s an analogy: &lt;strong&gt;Asking AI to do pattern reasoning is like asking someone to look at pictures through frosted glass — they can tell it&apos;s &quot;roughly a triangle,&quot; but they can&apos;t count how many line segments are intersecting inside it.&lt;/strong&gt;&lt;/p&gt;
&lt;p&gt;Even worse, visual encoders split images into small patches for processing. The tiny intersection points, open/closed line endpoints, and precise element positions in civil service pattern questions can easily be chopped up or blurred at patch boundaries.&lt;/p&gt;
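&lt;p&gt;The patch problem is easy to see with a little integer arithmetic. This is only a sketch under simplified assumptions: the 4-pixel patch size, the &lt;code&gt;patch_of&lt;/code&gt; helper, and the pixel coordinates are invented for illustration, and a real encoder does far more than bucket pixels.&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;PATCH = 4  # hypothetical patch size in pixels (real ViTs often use 14 or 16)

def patch_of(row, col, patch=PATCH):
    """Return the (patch_row, patch_col) index of the patch containing a pixel."""
    return (row // patch, col // patch)

# A tiny crossing of two strokes, drawn as a 2x2 cluster of pixels that
# happens to sit on the corner where four patches meet.
crossing = [(3, 3), (3, 4), (4, 3), (4, 4)]

patches = sorted({patch_of(r, c) for r, c in crossing})
print(patches)  # [(0, 0), (0, 1), (1, 0), (1, 1)]
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;One intersection point ends up spread over four separate patches, so no single patch contains the whole crossing.&lt;/p&gt;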
&lt;p&gt;&lt;strong&gt;If the first step is wrong, how could anything after it be right?&lt;/strong&gt;&lt;/p&gt;
&lt;p&gt;&lt;img src=&quot;https://i.see.you/2026/03/14/s8Gk/20260314032203444.webp&quot; alt=&quot;The &amp;quot;lossy compression&amp;quot; problem of visual encoding&quot; /&gt;&lt;/p&gt;
&lt;hr /&gt;
&lt;h2&gt;Layer 2: No &quot;Mental Canvas&quot; — The Absence of Spatial Reasoning&lt;/h2&gt;
&lt;p&gt;What happens in the human brain during pattern reasoning?&lt;/p&gt;
&lt;p&gt;Our parietal lobe activates a &quot;mental canvas&quot; where we rotate, flip, fold, and overlay shapes. When you see an unfolded diagram, you can mentally &quot;fold&quot; it into a cube. When you see a sequence of figures, you can mentally animate the elements and observe their trajectories.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;AI has no such canvas.&lt;/strong&gt;&lt;/p&gt;
&lt;p&gt;What is the fundamental nature of a large language model? It&apos;s &lt;strong&gt;autoregressive token sequence prediction&lt;/strong&gt;. Its entire reasoning process is built on the linear generation of &quot;what&apos;s the next token.&quot; To handle spatial problems, it must first &quot;translate&quot; visual patterns into language descriptions, then reason within the language space.&lt;/p&gt;
&lt;p&gt;This translation process creates a catastrophic information bottleneck:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;A rotation relationship between shapes — a human spots it at a glance&lt;/li&gt;
&lt;li&gt;AI needs to first describe: &quot;The first figure has a line pointing upper-left at 45 degrees, the second figure has this line pointing upper-right at 45 degrees...&quot;&lt;/li&gt;
&lt;li&gt;And this description itself is often inaccurate&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;Even worse, AI lacks &quot;visual working memory.&quot; When humans are solving problems, if a first hypothesis is disproved, our eyes automatically return to the figures to refocus and recount. Once AI generates its first round of descriptions, it can only keep building on top of this potentially erroneous description — it has no ability to &quot;look back.&quot;&lt;/p&gt;
&lt;p&gt;&lt;img src=&quot;https://i.see.you/2026/03/14/D9tg/20260314033050793.webp&quot; alt=&quot;Spatial reasoning comparison: Human brain vs. AI&quot; /&gt;&lt;/p&gt;
&lt;hr /&gt;
&lt;h2&gt;Layer 3: The Infinitely Open Rule Space — Not Knowing What&apos;s Being Tested&lt;/h2&gt;
&lt;p&gt;The trickiest aspect of civil service pattern reasoning is this: &lt;strong&gt;You never know which dimension of pattern the question is testing.&lt;/strong&gt;&lt;/p&gt;
&lt;p&gt;It could be line count, number of enclosed regions, symmetry, odd/even vertices for single-stroke drawing, element types, black-white ratios, rotation angles, translation steps... dozens of possible pattern dimensions, and often composites of multiple patterns.&lt;/p&gt;
&lt;p&gt;What do humans rely on? &lt;strong&gt;Rapid visual intuition for screening.&lt;/strong&gt;&lt;/p&gt;
&lt;p&gt;With a single sweep across the figure sequence, the brain automatically notices certain &quot;conspicuous&quot; feature changes, then rapidly forms hypotheses, verifies them, eliminates possibilities, and re-hypothesizes... This is a highly parallel, non-linear cognitive process.&lt;/p&gt;
&lt;p&gt;What does AI rely on? &lt;strong&gt;Sequential testing of verbalized rules.&lt;/strong&gt;&lt;/p&gt;
&lt;p&gt;It lacks that &quot;catch the key insight at a glance&quot; intuition. It can only check each possible pattern one by one in some order. This is extremely inefficient, and worse, because it already got the first step wrong (accurately perceiving figure features), all subsequent rule-checking is built on a flawed foundation.&lt;/p&gt;
&lt;p&gt;&lt;img src=&quot;https://i.see.you/2026/03/14/t2yM/20260314033304268.webp&quot; alt=&quot;The maze of pattern space: infinite dimensions of possible test points&quot; /&gt;&lt;/p&gt;
&lt;hr /&gt;
&lt;h2&gt;Layer 4: Paradigm Conflict — Probabilistic Generation vs. Rigid Deduction&lt;/h2&gt;
&lt;p&gt;This is the most fundamental issue — and the hardest gap to bridge.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;The underlying logic of LLMs is probabilistic prediction.&lt;/strong&gt; Their training objective is to learn statistical correlations from massive data and output &quot;the most probabilistically reasonable text sequence.&quot; The core capability is &quot;correlation fitting,&quot; not &quot;causal deduction.&quot;&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;The underlying logic of civil service pattern reasoning is rigid deduction.&lt;/strong&gt; The pattern you identify must apply 100% to all figures in the question stem, corresponding to exactly one correct option. There&apos;s zero tolerance for probabilistic ambiguity.&lt;/p&gt;
&lt;p&gt;A proper solution process should look like this:&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;Narrow down the test dimension → Propose a pattern hypothesis →
Verify against every stem figure one by one →
If inconsistency found, immediately reject → Try next dimension →
Find a pattern that fits 100% → Match against all options →
Eliminate distractors → Lock in the unique answer
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;This is a &lt;strong&gt;falsifiable, backtrackable, error-correctable&lt;/strong&gt; closed-loop reasoning process.&lt;/p&gt;
&lt;p&gt;LLM generation, however, is &lt;strong&gt;unidirectional, linear, and non-backtracking&lt;/strong&gt;. It simply generates the &quot;highest-probability pattern + answer&quot; based on input, without rigorous exhaustive verification, and without proactively overturning wrong hypotheses.&lt;/p&gt;
&lt;p&gt;The result: AI frequently outputs a &quot;half-right pattern&quot; — one that explains only some of the stem figures, or where multiple options could match. In civil service exams, this is fatal, because test designers specialize in crafting exactly these traps.&lt;/p&gt;
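&lt;p&gt;The closed-loop process outlined above can be sketched in a few lines of Python. This is only an illustration under strong assumptions: the figures are already reduced to hand-labelled symbolic features, the only rule family tried is a constant step per feature, and the names &lt;code&gt;STEM&lt;/code&gt;, &lt;code&gt;OPTIONS&lt;/code&gt;, &lt;code&gt;constant_step&lt;/code&gt;, and &lt;code&gt;solve&lt;/code&gt; are made up for this example.&lt;/p&gt;

```python
# Hand-labelled symbolic features for each stem figure (assumed, illustrative input).
STEM = [
    {"lines": 3, "regions": 1, "dots": 2},
    {"lines": 5, "regions": 2, "dots": 4},
    {"lines": 4, "regions": 3, "dots": 6},
]
OPTIONS = {
    "A": {"lines": 4, "regions": 4, "dots": 7},
    "B": {"lines": 6, "regions": 2, "dots": 6},
    "C": {"lines": 5, "regions": 4, "dots": 8},
    "D": {"lines": 3, "regions": 3, "dots": 5},
}


def constant_step(values):
    """Return the common difference if values form an arithmetic sequence, else None."""
    steps = {b - a for a, b in zip(values, values[1:])}
    return steps.pop() if len(steps) == 1 else None


def solve(stem, options):
    """Test one feature dimension at a time; reject on any mismatch and move on."""
    for feature in stem[0]:
        values = [figure[feature] for figure in stem]
        step = constant_step(values)
        if step is None:
            continue  # hypothesis falsified by a stem figure: try the next dimension
        expected = values[-1] + step
        matches = [name for name, fig in options.items() if fig[feature] == expected]
        if len(matches) == 1:
            return feature, matches[0]  # fits every stem figure and exactly one option
        # zero or several options match: a "half-right" pattern, so keep searching
    return None


print(solve(STEM, OPTIONS))
```

&lt;p&gt;In this toy input, &lt;code&gt;lines&lt;/code&gt; is rejected by the stem itself, &lt;code&gt;regions&lt;/code&gt; fits the stem but matches two options, and only &lt;code&gt;dots&lt;/code&gt; survives both checks.&lt;/p&gt;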
&lt;hr /&gt;
&lt;h2&gt;Layer 5: Structural Gaps in Training Data&lt;/h2&gt;
&lt;p&gt;&quot;Then just feed AI more pattern reasoning training data, right?&quot;&lt;/p&gt;
&lt;p&gt;Not that simple.&lt;/p&gt;
&lt;p&gt;First, in LLM pretraining corpora, civil service pattern reasoning content accounts for a &lt;strong&gt;vanishingly small fraction&lt;/strong&gt;. The vast majority of image-text data on the global internet consists of &quot;natural images + semantic descriptions&quot; (beach sunsets, cute pets, product photos), not &quot;abstract geometric figures + logical reasoning chains.&quot;&lt;/p&gt;
&lt;p&gt;Second, even if a model sees large numbers of real exam questions during fine-tuning, what it learns is merely the statistical association of &quot;this image corresponds to correct option C,&quot; not the reasoning process in the explanation.&lt;/p&gt;
&lt;p&gt;This explains why:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Original questions can be answered correctly (via memory matching or search)&lt;/li&gt;
&lt;li&gt;Slight variations (change an element, modify a number) cause immediate failure&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;Finally, the core reasoning process in pattern reasoning is &lt;strong&gt;non-verbal spatial-visual operations&lt;/strong&gt;. &quot;Mentally rotate this figure 90 degrees&quot; — this action is very difficult to fully describe in language. Even when forcing AI to output a chain of thought (CoT), it&apos;s merely &quot;using language to pretend to reason&quot; without actually completing the spatial operation.&lt;/p&gt;
&lt;p&gt;&lt;img src=&quot;https://i.see.you/2026/03/14/n7gF/20260314033700247.webp&quot; alt=&quot;Training data distribution: structural gaps&quot; /&gt;&lt;/p&gt;
&lt;hr /&gt;
&lt;h2&gt;Why Did They Choose to &quot;Cheat&quot;?&lt;/h2&gt;
&lt;p&gt;Returning to the opening observation: why did GPT 5.4 Pro and Doubao resort to searching for answers online?&lt;/p&gt;
&lt;p&gt;This actually demonstrates that &lt;strong&gt;the models &quot;know&quot; they can&apos;t do it&lt;/strong&gt;.&lt;/p&gt;
&lt;p&gt;When AI receives a pattern reasoning question, its visual module feeds back chaotic, low-confidence features to the central system. Meanwhile, its OCR capability is extremely strong, instantly recognizing format features in the question (nine-grid layout, keywords like &quot;select from the given options&quot;).&lt;/p&gt;
&lt;p&gt;It immediately realizes: this is a standardized test question, and the original question with answers likely exists on the internet.&lt;/p&gt;
&lt;p&gt;Its confidence in working the problem out on its own is very low, while a search engine call might match the original question directly and achieve 100% accuracy, so &lt;strong&gt;the model naturally chooses the path of &quot;least resistance, highest reward.&quot;&lt;/strong&gt;&lt;/p&gt;
&lt;p&gt;This isn&apos;t a bug. It&apos;s &quot;smart&quot; behavior trained through RLHF (Reinforcement Learning from Human Feedback). It just happens to look like blatant cheating from our perspective.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Once disconnected from the internet, they had nowhere to hide.&lt;/strong&gt;&lt;/p&gt;
&lt;p&gt;&lt;img src=&quot;https://i.see.you/2026/03/14/wUd8/20260314041801197.webp&quot; alt=&quot;The logic chain behind the cheating behavior&quot; /&gt;&lt;/p&gt;
&lt;hr /&gt;
&lt;h2&gt;Where Is the Path Forward?&lt;/h2&gt;
&lt;p&gt;There&apos;s an academic consensus emerging: to truly crack abstract visual reasoning (like the famous ARC Challenge), simply increasing parameter counts is far from sufficient.&lt;/p&gt;
&lt;p&gt;The promising direction is &lt;strong&gt;Neuro-symbolic AI&lt;/strong&gt;:&lt;/p&gt;
&lt;p&gt;Rather than having the model &quot;squint hard at the image,&quot; the system would first invoke a precise visual analysis program (such as OpenCV) to extract structural features such as enclosed-region counts, intersection points, and axis-of-symmetry coordinates, and convert them into exact symbolic matrices. The LLM&apos;s logical capabilities would then be used to deduce the numerical pattern.&lt;/p&gt;
&lt;p&gt;A solver presented at CVPR 2023 and designed specifically for Raven&apos;s Progressive Matrices used a hybrid architecture of &quot;perception module for attribute extraction + algebraic symbolic reasoning&quot; and reached 93.2% accuracy on the I-RAVEN dataset, higher than the human benchmark of 84.4%.&lt;/p&gt;
&lt;p&gt;This demonstrates that the issue isn&apos;t &quot;machines inherently can&apos;t do this&quot; — it&apos;s that &quot;handing this task end-to-end to a general-purpose chat model&quot; was never the right approach.&lt;/p&gt;
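&lt;p&gt;To make the &quot;extract exact symbolic features first&quot; step concrete, here is a minimal standard-library sketch that counts enclosed regions in a tiny binarized figure with a flood fill. It stands in for what a real pipeline might do with OpenCV; the character-grid input format and the function name &lt;code&gt;count_enclosed_regions&lt;/code&gt; are assumptions made for this example.&lt;/p&gt;

```python
from collections import deque

# A tiny binary "figure": '#' is a stroke pixel, '.' is background (assumed format).
FIGURE = [
    "#######",
    "#..#..#",
    "#..#..#",
    "#######",
]


def count_enclosed_regions(rows):
    """Count background regions that never touch the image border (4-connectivity)."""
    height, width = len(rows), len(rows[0])
    seen = set()
    enclosed = 0
    for r in range(height):
        for c in range(width):
            if rows[r][c] != "." or (r, c) in seen:
                continue
            # Flood-fill one background region and note whether it reaches the border.
            queue = deque([(r, c)])
            seen.add((r, c))
            touches_border = False
            while queue:
                y, x = queue.popleft()
                if y in (0, height - 1) or x in (0, width - 1):
                    touches_border = True
                for ny, nx in ((y + 1, x), (y - 1, x), (y, x + 1), (y, x - 1)):
                    inside = ny in range(height) and nx in range(width)
                    if inside and rows[ny][nx] == "." and (ny, nx) not in seen:
                        seen.add((ny, nx))
                        queue.append((ny, nx))
            if not touches_border:
                enclosed += 1
    return enclosed


print(count_enclosed_regions(FIGURE))
```

&lt;p&gt;The count is deterministic: the same figure always yields the same number, which is exactly the property a language-only description of the image cannot guarantee.&lt;/p&gt;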
&lt;p&gt;&lt;img src=&quot;https://i.see.you/2026/03/14/yWd8/20260314042001127.webp&quot; alt=&quot;Future solution: neuro-symbolic systems&quot; /&gt;&lt;/p&gt;
&lt;hr /&gt;
&lt;h2&gt;Final Thoughts&lt;/h2&gt;
&lt;p&gt;Civil service pattern reasoning — a task that seems like &quot;just a few find-the-pattern puzzles&quot; — has unexpectedly become a mirror reflecting the boundaries of current AI capabilities.&lt;/p&gt;
&lt;p&gt;It precisely strikes at three major weaknesses of large language models:&lt;/p&gt;
&lt;ol&gt;
&lt;li&gt;&lt;strong&gt;Insufficient visual perception precision&lt;/strong&gt; — can&apos;t see accurately&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Missing spatial reasoning mechanisms&lt;/strong&gt; — can&apos;t manipulate mentally&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Absent rigid deduction capability&lt;/strong&gt; — can&apos;t reason strictly&lt;/li&gt;
&lt;/ol&gt;
&lt;p&gt;This also reminds us: &lt;strong&gt;AI&apos;s &quot;intelligence&quot; and human &quot;intelligence&quot; may not be the same thing at all.&lt;/strong&gt;&lt;/p&gt;
&lt;p&gt;It can find statistical patterns across massive text corpora, fluently generate code and articles, and pass professional exams requiring extensive knowledge — but when facing a simple task that requires &quot;truly seeing a figure, truly manipulating it mentally, and truly verifying a pattern with logic,&quot; it remains helpless.&lt;/p&gt;
&lt;p&gt;Perhaps this is one of the last moats of human intelligence.&lt;/p&gt;
&lt;p&gt;At least in 2026, civil service pattern reasoning remains a battlefield that belongs to human test-takers.&lt;/p&gt;
&lt;hr /&gt;
&lt;p&gt;&lt;em&gt;If you&apos;ve also tested AI on pattern reasoning, feel free to share your &quot;crash&quot; stories in the comments.&lt;/em&gt;&lt;/p&gt;
</content:encoded></item><item><title>Perplexity Max Is Great, But I Won&apos;t Subscribe</title><link>https://blog.gujiakai.me/en/2026/03/perplexity-max-not-subscribing/</link><guid isPermaLink="true">https://blog.gujiakai.me/en/2026/03/perplexity-max-not-subscribing/</guid><description>Model Council and Computer are genuinely impressive, but is a $200/month multi-model agent really worth paying for?</description><pubDate>Thu, 12 Mar 2026 00:00:00 GMT</pubDate><content:encoded>&lt;p&gt;&lt;img src=&quot;https://i.see.you/2026/03/12/wI6b/20260311211253767.webp&quot; alt=&quot;Perplexity Max Is Great, But I Won&apos;t Subscribe — Cover&quot; /&gt;&lt;/p&gt;
&lt;p&gt;On March 11, 2026, Perplexity held its first developer conference — Ask 2026 — in a converted church in San Francisco.&lt;/p&gt;
&lt;p&gt;A company that started with AI search launched a &quot;personal computer&quot; agent, enterprise Computer, the iOS browser Comet, and even partnered with cybersecurity giant CrowdStrike for security collaboration — all in one event. CEO Aravind Srinivas said something ambitious on stage: &quot;Traditional operating systems receive commands; AI operating systems receive goals.&quot;&lt;/p&gt;
&lt;p&gt;Taken together, the signal is clear: Perplexity doesn&apos;t want to be just a search engine anymore. It wants to be the operating system of the AI era.&lt;/p&gt;
&lt;p&gt;This article will focus on the two most noteworthy features — &lt;strong&gt;Model Council&lt;/strong&gt; (multi-model committee) and &lt;strong&gt;Computer&lt;/strong&gt; (multi-model agent) — providing a complete breakdown from mechanism to value to limitations. I&apos;ll finish with my honest take on whether the $200 monthly fee is worth it.&lt;/p&gt;
&lt;p&gt;&lt;img src=&quot;https://i.see.you/2026/03/12/ulP4/20260311211838190.webp&quot; alt=&quot;From search engine to agent platform&quot; /&gt;&lt;/p&gt;
&lt;h2&gt;I. Model Council: Three Models Argue, a Fourth Judges&lt;/h2&gt;
&lt;h3&gt;What It Actually Is&lt;/h3&gt;
&lt;p&gt;Model Council launched on February 5, 2026 as an exclusive multi-model research feature for Perplexity Max members.&lt;/p&gt;
&lt;p&gt;The mechanism is straightforward: you ask a question, the system sends it simultaneously to three frontier LLMs (say Claude Opus 4.6, GPT-5.4, Gemini 3.1 Pro), each generates an independent response, and then a fourth &quot;chairman model&quot; reviews all outputs and synthesizes a unified answer annotating &lt;strong&gt;consensus areas&lt;/strong&gt; and &lt;strong&gt;points of disagreement&lt;/strong&gt;.&lt;/p&gt;
&lt;p&gt;Users can expand to view each model&apos;s complete original response and switch between different model combinations.&lt;/p&gt;
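&lt;p&gt;The fan-out-then-synthesize flow can be sketched as follows. This is not Perplexity&apos;s implementation: the three member functions are hypothetical stubs rather than real API calls, and the &quot;chairman&quot; is reduced to a plain majority vote instead of a fourth LLM, purely to show how consensus and disagreement can be surfaced.&lt;/p&gt;

```python
from collections import Counter
from concurrent.futures import ThreadPoolExecutor


# Hypothetical stand-ins for three model backends; real calls would hit vendor APIs.
def model_a(question):
    return "yes"


def model_b(question):
    return "yes"


def model_c(question):
    return "no"


def council(question, members):
    """Query every member in parallel, then report the majority answer and dissent."""
    with ThreadPoolExecutor(max_workers=len(members)) as pool:
        futures = {name: pool.submit(fn, question) for name, fn in members.items()}
        answers = {name: future.result() for name, future in futures.items()}
    majority, votes = Counter(answers.values()).most_common(1)[0]
    return {
        "answer": majority,
        "unanimous": votes == len(members),
        "dissenting": sorted(n for n, a in answers.items() if a != majority),
        "raw": answers,  # each member's original response stays inspectable
    }


result = council(
    "Is the claim supported?",
    {"model_a": model_a, "model_b": model_b, "model_c": model_c},
)
print(result)
```

&lt;p&gt;Ties are not handled here; a real synthesizer would also need to compare free-text answers rather than exact strings.&lt;/p&gt;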
&lt;h3&gt;Design Philosophy: Making Disagreement Visible&lt;/h3&gt;
&lt;p&gt;The most interesting aspect of this feature isn&apos;t the &quot;synthesis&quot; — it&apos;s the &lt;strong&gt;visualization of disagreement&lt;/strong&gt;.&lt;/p&gt;
&lt;p&gt;When three models converge on a judgment, you gain higher confidence. When they show clear disagreement, you know the issue needs further investigation rather than blind trust in any single model&apos;s output. Conceptually, this is closer to ensemble methods in machine learning than a mere model selector.&lt;/p&gt;
&lt;p&gt;Official recommended use cases include investment research, high-stakes personal decisions, and multi-perspective analysis of complex issues. Within Computer workflows, Model Council serves as the &quot;critical checkpoint reviewer&quot; — subjecting specific analysis or review steps to multi-model cross-examination.&lt;/p&gt;
&lt;p&gt;&lt;img src=&quot;https://i.see.you/2026/03/12/Ux7d/20260311212003083.webp&quot; alt=&quot;Model Council workflow: User question → Three models generate in parallel → Chairman model synthesizes → Unified answer&quot; /&gt;&lt;/p&gt;
&lt;h3&gt;My Take: Interesting, But Not Necessarily Worth Paying For&lt;/h3&gt;
&lt;p&gt;Model Council&apos;s approach is genuinely thought-provoking. In an era where AI outputs are plagued by hallucinations and biases, using multi-model cross-validation to improve reliability is logically sound.&lt;/p&gt;
&lt;p&gt;But here&apos;s the thing: &lt;strong&gt;You can do this yourself.&lt;/strong&gt;&lt;/p&gt;
&lt;p&gt;Ask ChatGPT, Claude, and Gemini the same question separately, compare three windows side by side, and manually judge which response is most reliable — this workflow is a bit clunky, but virtually free (if you already subscribe to each), and &lt;strong&gt;being your own judge&lt;/strong&gt; means you&apos;re actively exercising judgment rather than delegating it to yet another &quot;chairman model&quot; that you also can&apos;t verify.&lt;/p&gt;
&lt;p&gt;Model Council&apos;s value lies in convenience and structured presentation, but it gives you no information that you couldn&apos;t obtain by doing the comparison manually. For anyone with reasonable AI experience, &quot;having your own judgment&quot; matters far more than &quot;letting a fourth model judge for you.&quot;&lt;/p&gt;
&lt;h2&gt;II. Perplexity Computer: 19 Models, One &quot;Digital Employee&quot;&lt;/h2&gt;
&lt;h3&gt;What It Actually Is&lt;/h3&gt;
&lt;p&gt;Perplexity Computer launched for consumers on February 25, with the enterprise version and &quot;Personal Computer&quot; local agent announced at Ask 2026 on March 11.&lt;/p&gt;
&lt;p&gt;Computer is positioned as a &lt;strong&gt;cloud-based multi-model AI agent orchestration platform&lt;/strong&gt;. You describe a goal in natural language (say, &quot;Create a competitive analysis report for this industry&quot;), the system automatically decomposes it into subtasks, routes each subtask to the most suitable AI model, executes autonomously in the background (potentially for hours), and delivers the finished product.&lt;/p&gt;
&lt;p&gt;It orchestrates over 19 models: Claude Opus 4.6 handles core reasoning, Gemini manages deep research, GPT-5.2 handles long-context search, Grok runs lightweight tasks, Nano Banana generates images, Veo 3.1 generates video, and GPT-5.3-Codex specializes in code. Each task runs in an isolated sandbox environment with real file systems and browsers.&lt;/p&gt;
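&lt;p&gt;As a rough mental model, the routing step might look like a lookup table from subtask type to model. The model names below are the ones listed above; the task-kind labels, the &lt;code&gt;ROUTES&lt;/code&gt; table, the &lt;code&gt;route&lt;/code&gt; function, and the fallback rule are all hypothetical, since Perplexity has not published its routing logic in this form.&lt;/p&gt;

```python
# Role-to-model pairs taken from the description above; the routing logic is hypothetical.
ROUTES = {
    "reasoning": "Claude Opus 4.6",
    "deep_research": "Gemini",
    "long_context_search": "GPT-5.2",
    "lightweight": "Grok",
    "image": "Nano Banana",
    "video": "Veo 3.1",
    "code": "GPT-5.3-Codex",
}


def route(subtasks, default_kind="reasoning"):
    """Pair each (kind, description) subtask with a model, falling back to the default."""
    plan = []
    for kind, description in subtasks:
        model = ROUTES.get(kind, ROUTES[default_kind])
        plan.append((description, model))
    return plan


plan = route([
    ("deep_research", "collect competitor filings"),
    ("code", "build a comparison dashboard"),
    ("spreadsheet", "summarise pricing"),  # unknown kind falls back to the default
])
for description, model in plan:
    print(f"{model}: {description}")
```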
&lt;p&gt;Computer also integrates over 400 connectors, including Gmail, GitHub, Slack, Notion, Salesforce, and Snowflake.&lt;/p&gt;
&lt;p&gt;The &lt;strong&gt;Personal Computer&lt;/strong&gt; announced on March 11 goes further — it&apos;s resident software running on your own Mac mini, giving AI agents 24/7 access to your local files and applications while inference still runs in Perplexity&apos;s cloud.&lt;/p&gt;
&lt;p&gt;&lt;img src=&quot;https://i.see.you/2026/03/12/3Fqj/20260311212124308.webp&quot; alt=&quot;Perplexity Computer multi-model orchestration architecture&quot; /&gt;&lt;/p&gt;
&lt;h3&gt;The March 6 Update&lt;/h3&gt;
&lt;p&gt;Computer&apos;s first major update after launch landed on March 6, expanding in four directions:&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Custom Skills&lt;/strong&gt; — You can write &quot;capability descriptions&quot; for repetitive tasks (like fixed report templates or writing style requirements), and Computer will automatically invoke them for relevant tasks without re-explaining each time.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Embedded Model Council&lt;/strong&gt; — Directly invoke three-model parallel review within Computer workflows, providing cross-validation for critical decision steps.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Voice Mode&lt;/strong&gt; — Describe tasks, give mid-process feedback, or adjust direction using voice.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;GPT-5.3-Codex Coding Sub-Agent&lt;/strong&gt; — When encountering complex coding tasks, automatically assigns to a dedicated code model that can build full-stack applications from scratch and even debug through browser DevTools with GitHub integration.&lt;/p&gt;
&lt;h3&gt;My Take: Concept Is Stunning, Execution Is Questionable&lt;/h3&gt;
&lt;p&gt;Computer&apos;s architecture is genuinely impressive. 19 models dispatched on demand, nested multi-agent workflows, sandbox execution, asynchronous long-running tasks — from a technical vision standpoint, this may be the most aggressive multi-model agent solution on the market.&lt;/p&gt;
&lt;p&gt;But several practical issues are hard to ignore:&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;First, credit consumption is opaque and expensive.&lt;/strong&gt; A &lt;a href=&quot;http://Builder.io&quot;&gt;Builder.io&lt;/a&gt; reviewer reported spending $200 in two days to build a single webpage. Failed tasks still consume credits, and you can&apos;t predict how much any given task will cost. This pricing model is essentially a black box for users.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Second, the complex coding tasks that can be reliably delivered today are primarily handled by Claude Code.&lt;/strong&gt; While Computer also integrates coding capabilities, Claude Code&apos;s stability and developer experience remain the industry benchmark. Computer is more like Claude Code wrapped in an agent shell, but that shell itself adds uncertainty and cost.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Third, Computer&apos;s positioning heavily overlaps with Manus.&lt;/strong&gt; Both are natural-language-driven, auto-decomposing, background-executing agent systems. Computer&apos;s differentiation lies in multi-model orchestration and Perplexity&apos;s search capabilities, but if the core advantage is merely &quot;more comprehensive search sources,&quot; whether that premium is justified is debatable.&lt;/p&gt;
&lt;h2&gt;III. The Unavoidable Question: Is $200/Month Worth It?&lt;/h2&gt;
&lt;p&gt;Model Council and Computer are both exclusive to &lt;strong&gt;Perplexity Max&lt;/strong&gt; members at $200/month.&lt;/p&gt;
&lt;p&gt;Where does this price sit in the current AI subscription market? Claude Max runs about $100 and gives heavy Opus usage. OpenAI Pro at $200 provides GPT 5.4 Pro and higher usage quotas.&lt;/p&gt;
&lt;p&gt;What&apos;s included in Perplexity Max&apos;s $200? Model Council, Computer (with credits), Deep Research, and unlimited access to all models. Sounds comprehensive, but several concerns linger:&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Does Claude Opus get degraded through the Max subscription?&lt;/strong&gt; This is a repeatedly discussed question in the community. When Perplexity acts as a middleware layer calling Anthropic&apos;s API, prompt packaging, context management, and potential token truncation can all affect output quality. The Opus you use through Perplexity may not deliver an identical experience to the Opus in Claude&apos;s official client.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Computer&apos;s credit consumption is another murky area.&lt;/strong&gt; The $200 monthly fee doesn&apos;t mean unlimited Computer usage: complex tasks can rapidly exhaust your credit quota. Moreover, Perplexity has a precedent of slashing Deep Research quotas from roughly 500/day to 20/month, which triggered widespread criticism of a &quot;bait and squeeze&quot; strategy.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Perplexity&apos;s &quot;track record&quot; is also worth noting.&lt;/strong&gt; From early accusations of unauthorized content scraping, to copyright disputes with multiple publishers, to the March 11 federal court ruling banning its AI shopping agent from accessing Amazon, to reports that free Pro memberships obtained through promotional channels were silently revoked, this company has never hesitated to take an aggressive &quot;act first, ask later&quot; approach. This style may drive innovation speed, but it also means product strategies and pricing can shift at any moment, and users&apos; existing benefits may not be reliably protected.&lt;/p&gt;
&lt;h2&gt;IV. Perplexity&apos;s True Moat: Search&lt;/h2&gt;
&lt;p&gt;Having noted many shortcomings, I should acknowledge Perplexity&apos;s core strength.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Its search sources are genuinely comprehensive.&lt;/strong&gt; This point has been widely validated among Chinese internet users who&apos;ve subscribed to Max. Opus 4.6 combined with Perplexity&apos;s proprietary search pipeline delivers research query performance that genuinely surpasses using any single model&apos;s search function alone. Seven parallel search types (web, academic, people, images, video, shopping, social) plus premium data sources like PitchBook and Statista give it real advantages in both breadth and depth of information retrieval.&lt;/p&gt;
&lt;p&gt;If your core need is &lt;strong&gt;high-frequency deep research&lt;/strong&gt; — financial due diligence, market analysis, technology evaluation — Perplexity&apos;s search capability is its most compelling selling point.&lt;/p&gt;
&lt;p&gt;But if your needs center on code development, creative writing, or everyday conversation, this search advantage doesn&apos;t align with your use case.&lt;/p&gt;
&lt;h3&gt;How Long Can the Moat Hold?&lt;/h3&gt;
&lt;p&gt;One must face an industry consensus: &lt;strong&gt;Perplexity has always been viewed as a &quot;wrapper&quot; company.&lt;/strong&gt; It doesn&apos;t train its own foundation models. Its core product is built on APIs from OpenAI, Anthropic, Google, and others, with virtually no model-layer innovation. What it does — combining top SOTA models with comprehensive search sources — does produce an excellent research experience. That&apos;s undeniable.&lt;/p&gt;
&lt;p&gt;The problem is that neither of the two key ingredients in this recipe is in its hands.&lt;/p&gt;
&lt;p&gt;OpenAI&apos;s ChatGPT already has web search and Deep Research capabilities. Anthropic has launched Claude&apos;s Web Search tool and Deep Research. Google&apos;s Gemini naturally sits atop the world&apos;s largest search index. When model providers themselves fill in the search gap, Perplexity&apos;s value as a middleware layer gets continuously compressed. This is why the &quot;Perplexity will die&quot; narrative never goes away in the AI community — not because it does a bad job, but because its core capabilities are too easily replicated by upstream providers.&lt;/p&gt;
&lt;p&gt;Perplexity clearly recognizes this, which is why it&apos;s racing toward an agent platform: Computer, Personal Computer, Comet browser, enterprise edition... every move is an attempt to transition from &quot;search middleman&quot; to &quot;AI operating system,&quot; building deeper product stickiness before users leave. The strategic direction is clear-eyed, but whether it can outrun time is another matter entirely.&lt;/p&gt;
&lt;p&gt;&lt;img src=&quot;https://i.see.you/2026/03/12/huQ0/20260311212852664.webp&quot; alt=&quot;Perplexity&apos;s three-layer product architecture: Search → Deep Research → Computer&quot; /&gt;&lt;/p&gt;
&lt;h2&gt;V. My Conclusion&lt;/h2&gt;
&lt;p&gt;&lt;strong&gt;I won&apos;t be subscribing to Perplexity Max.&lt;/strong&gt;&lt;/p&gt;
&lt;p&gt;The reason is simple: compared to Claude Max and OpenAI Pro, the value-for-money isn&apos;t there. Computer&apos;s concept is forward-looking, but the credit black box, unstable quota policies, and the awkward &quot;can do it but not well enough&quot; reality in actual use make it hard for me to justify $200 a month. Model Council&apos;s multi-model cross-validation approach has value, but manual operation is a perfectly viable substitute, and being your own judge is more reliable than relying on a fourth model.&lt;/p&gt;
&lt;p&gt;If you&apos;re considering subscribing, I&apos;d suggest asking yourself two questions:&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;First, is your core need search or execution?&lt;/strong&gt; If it&apos;s search, a Pro membership ($20/month) might be sufficient. If it&apos;s executing complex tasks, Claude Code is still the more stable choice today.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Second, can you accept the risk of pricing and quotas changing at any time?&lt;/strong&gt; Perplexity is a company still iterating rapidly (and experimenting rapidly). The uncertainty in product strategy is real.&lt;/p&gt;
&lt;p&gt;What Perplexity is building (multi-model orchestration, agent workflows, an AI-native operating system) is directionally correct. But &quot;directionally correct&quot; and &quot;worth buying now&quot; are separated by a long road.&lt;/p&gt;
&lt;p&gt;Rather than chasing the latest paid features, invest your time in genuinely improving your own judgment. After all, no &quot;committee&quot; of models can substitute for your own independent thinking.&lt;/p&gt;
&lt;p&gt;&lt;img src=&quot;https://i.see.you/2026/03/12/Ig9z/20260311213031273.webp&quot; alt=&quot;Tools evolve, but judgment is in your hands&quot; /&gt;&lt;/p&gt;
&lt;hr /&gt;
&lt;p&gt;&lt;em&gt;This article was written on March 12, 2026, based on Perplexity&apos;s official blog, changelog, and help center documentation, as well as reporting from TechCrunch, VentureBeat, Digital Trends, Axios, AppleInsider, and other technology media. Views expressed represent the author&apos;s personal opinions and do not constitute subscription or investment advice.&lt;/em&gt;&lt;/p&gt;
</content:encoded></item><item><title>The Industrial Recipe for Synthetic Data: HuggingFace&apos;s 90 Experiments Reveal the Laws of Pretraining Data Production</title><link>https://blog.gujiakai.me/en/2026/03/huggingface-finephrase-synthetic-data/</link><guid isPermaLink="true">https://blog.gujiakai.me/en/2026/03/huggingface-finephrase-synthetic-data/</guid><description>HuggingFace spent 12.7 GPU-years running 90 controlled experiments, finally turning the &apos;alchemy&apos; of synthetic data for LLM pretraining into reproducible &apos;chemistry.&apos;</description><pubDate>Wed, 11 Mar 2026 00:00:00 GMT</pubDate><content:encoded>&lt;h1&gt;The Industrial Recipe for Synthetic Data: HuggingFace&apos;s 90 Experiments Reveal the Laws of Pretraining Data Production&lt;/h1&gt;
&lt;blockquote&gt;
&lt;p&gt;As LLM training enters the era of &quot;data is king,&quot; efficiently generating high-quality synthetic data has become a critical challenge. HuggingFace spent 12.7 GPU-years running 90 controlled experiments, finally turning this &quot;alchemy&quot; into reproducible &quot;chemistry.&quot;&lt;/p&gt;
&lt;/blockquote&gt;
&lt;hr /&gt;
&lt;p&gt;&lt;img src=&quot;https://i.see.you/2026/03/11/9sRt/20260311013838016.webp&quot; alt=&quot;Synthetic Data: The New &amp;quot;Data Factory&amp;quot; for LLM Training&quot; /&gt;&lt;/p&gt;
&lt;hr /&gt;
&lt;h2&gt;I. Synthetic Data: The Fourth Paradigm Shift in LLM Training&lt;/h2&gt;
&lt;p&gt;The pretraining data for large language models has gone through several clear evolutionary stages.&lt;/p&gt;
&lt;p&gt;Initially, researchers trained language models on small but high-quality corpora like Wikipedia. Then, datasets like C4 and The Pile pushed the scale to hundreds of gigabytes. Next, projects like FineWeb and DCLM expanded data volumes to trillions of tokens, covering nearly the entire crawlable internet.&lt;/p&gt;
&lt;p&gt;Once web text approached its collection limit, the focus shifted to quality filtering: using neural network classifiers to find &quot;educational&quot; or &quot;instructional&quot; content, filtering massive noisy data down to curated subsets.&lt;/p&gt;
&lt;p&gt;Now, the fourth paradigm is taking shape — &lt;strong&gt;synthetic data&lt;/strong&gt;.&lt;/p&gt;
&lt;p&gt;NVIDIA&apos;s Nemotron-CC rewrote approximately 2 trillion tokens of web text, Zhipu&apos;s GLM-4.5 series generated 500 billion reasoning tokens for mid-training, and frontier models like Qwen3 and Phi-4 heavily incorporate synthetic content in their training data. Synthetic data has evolved from an &quot;optional augmentation technique&quot; to a &quot;standard production step.&quot;&lt;/p&gt;
&lt;p&gt;But the question remains: &lt;strong&gt;How exactly should you do it?&lt;/strong&gt;&lt;/p&gt;
&lt;p&gt;Which model should generate the data? What prompts should you write? Does source data quality matter? Should you mix it with original data? These questions were previously answered mostly by intuition and trial-and-error. The HuggingFace team decided to answer them with systematic experiments.&lt;/p&gt;
&lt;hr /&gt;
&lt;h2&gt;II. 90 Experiments, 1 Trillion Tokens, All to Answer One Question&lt;/h2&gt;
&lt;p&gt;The HuggingFace research team designed a large-scale ablation experiment framework:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;strong&gt;Experiment scale&lt;/strong&gt;: 90 complete train-evaluate cycles&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Generation volume&lt;/strong&gt;: Over 1 trillion tokens of synthetic text&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Compute cost&lt;/strong&gt;: Approximately 12.7 GPU-years (H100)&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Evaluation method&lt;/strong&gt;: Each experiment trained a 1.2B parameter proxy model, tested on 12 benchmarks&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;They explored along three main lines:&lt;/p&gt;
&lt;ol&gt;
&lt;li&gt;&lt;strong&gt;Rewriting strategies&lt;/strong&gt;: Which format transformations actually work? Simple paraphrasing, Q&amp;amp;A pairs, step-by-step tutorials, structured tables...&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Generation models&lt;/strong&gt;: Is bigger always better? Do different model families matter? Are newer versions stronger?&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Data mixing ratios&lt;/strong&gt;: Does source data quality matter? Can synthetic data be used alone? What should it be mixed with?&lt;/li&gt;
&lt;/ol&gt;
&lt;p&gt;The final output was &lt;strong&gt;FinePhrase&lt;/strong&gt;, a synthetic pretraining dataset containing 486 billion tokens that showed clear advantages over all baselines.&lt;/p&gt;
&lt;hr /&gt;
&lt;p&gt;&lt;img src=&quot;https://i.see.you/2026/03/11/2Inw/20260311014309919.webp&quot; alt=&quot;Systematic Design Framework for 90 Experiments&quot; /&gt;&lt;/p&gt;
&lt;hr /&gt;
&lt;h2&gt;III. Core Finding: Prompt Design Is the Biggest Lever&lt;/h2&gt;
&lt;p&gt;Among variables like model size, model family, and source data quality, &lt;strong&gt;prompt design had by far the greatest impact&lt;/strong&gt;.&lt;/p&gt;
&lt;p&gt;The research team tested existing prompts from projects like Nemotron, REWIRE, and BeyondWeb, and also designed 9 entirely new formats. Results showed that only four formats could consistently beat the strongest raw data baseline, DCLM:&lt;/p&gt;
&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Winning Format&lt;/th&gt;
&lt;th&gt;Core Feature&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;FAQ&lt;/td&gt;
&lt;td&gt;Reorganizes content into Q&amp;amp;A pairs&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Math&lt;/td&gt;
&lt;td&gt;Converts into math word problems + solutions&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Table&lt;/td&gt;
&lt;td&gt;Extracts into structured tables&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Tutorial&lt;/td&gt;
&lt;td&gt;Rewrites as step-by-step tutorials&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;
&lt;p&gt;Simple paraphrasing (Article), review-style summaries (Commentary), conversational format (Discussion), and narrative retelling (Narrative) all performed unremarkably.&lt;/p&gt;
&lt;p&gt;The key difference: &lt;strong&gt;The winning formats all restructured how knowledge is presented, rather than merely polishing the language&lt;/strong&gt;.&lt;/p&gt;
&lt;p&gt;FAQ makes implicit questions explicit, Table aggregates scattered information into indexable units, and Tutorial externalizes procedural logic. These transformations force the model to convert implicit knowledge in the original document into structured, explicit representations.&lt;/p&gt;
&lt;p&gt;In other words, the value of synthetic data isn&apos;t in &quot;saying the same thing with better wording&quot; — it&apos;s in &lt;strong&gt;reshaping information into &quot;curriculum formats&quot; better suited for model learning&lt;/strong&gt;.&lt;/p&gt;
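&lt;p&gt;As a rough illustration of what restructuring looks like at the prompt level, the sketch below pairs each winning format with a one-line instruction. The template wording and the &lt;code&gt;build_prompt&lt;/code&gt; helper are hypothetical, written for this example only; they are not the prompts used in the study.&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;# Hypothetical instruction templates, one per winning format.
# The wording is illustrative only, not taken from the study.
FORMAT_INSTRUCTIONS = {
    &quot;faq&quot;: &quot;Rewrite the document as a list of question and answer pairs.&quot;,
    &quot;math&quot;: &quot;Turn the document into math word problems with worked solutions.&quot;,
    &quot;table&quot;: &quot;Extract the key facts of the document into structured tables.&quot;,
    &quot;tutorial&quot;: &quot;Rewrite the document as a step-by-step tutorial.&quot;,
}


def build_prompt(format_name, document):
    &quot;&quot;&quot;Combine a format instruction with the source document.&quot;&quot;&quot;
    if format_name not in FORMAT_INSTRUCTIONS:
        raise ValueError(f&quot;unknown format: {format_name}&quot;)
    instruction = FORMAT_INSTRUCTIONS[format_name]
    return f&quot;{instruction}\n\nDocument:\n{document}&quot;


print(build_prompt(&quot;faq&quot;, &quot;Water boils at 100 degrees Celsius at sea level.&quot;))
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;Keeping the instruction separate from the document makes it cheap to run the same source page through several formats and compare the results downstream.&lt;/p&gt;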
&lt;hr /&gt;
&lt;h2&gt;IV. Counter-Intuitive Finding: A 1B Small Model Is Enough&lt;/h2&gt;
&lt;p&gt;The industry previously held a popular assumption: generating high-quality synthetic data requires 70B or even larger models. The REWIRE project used Llama-3.3 70B.&lt;/p&gt;
&lt;p&gt;HuggingFace&apos;s experimental results directly refuted this assumption.&lt;/p&gt;
&lt;p&gt;They compared the entire Gemma-3 series from 270M to 27B, and concluded:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;strong&gt;Simple prompts&lt;/strong&gt;: 1B parameters suffice — no significant difference between 1B and 27B&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Complex prompts&lt;/strong&gt; (like REWIRE&apos;s guided rewriting): 4B needed, but no difference between 4B and 27B&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Low-quality source data&lt;/strong&gt;: Larger models don&apos;t help &quot;rescue&quot; it either&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;On the cost-efficiency Pareto frontier, the &lt;strong&gt;small model + structured prompt&lt;/strong&gt; combination dominated. A 27B model costs 5-10x more GPU resources than a 1B model, with zero improvement in generation quality.&lt;/p&gt;
&lt;p&gt;Furthermore, in a head-to-head comparison of 1B-class models, &lt;strong&gt;SmolLM2-1.7B beat every competitor&lt;/strong&gt;, including Qwen3, Gemma-3, Llama-3.2, Granite3, and Falcon3, even though SmolLM2 was released more than a year ago.&lt;/p&gt;
&lt;p&gt;The practical implication is very direct: &lt;strong&gt;Use the cheapest model, and invest all the savings into data volume.&lt;/strong&gt;&lt;/p&gt;
&lt;hr /&gt;
&lt;p&gt;&lt;img src=&quot;https://i.see.you/2026/03/11/wBv1/20260311014609964.webp&quot; alt=&quot;1B Small Model Beats 27B Large Model: Parameter Count Isn&apos;t the Deciding Factor&quot; /&gt;&lt;/p&gt;
&lt;hr /&gt;
&lt;h2&gt;V. The Most Counter-Intuitive Finding: &quot;Worse&quot; Output Is Actually Better&lt;/h2&gt;
&lt;p&gt;This is probably the most surprising conclusion in the entire study.&lt;/p&gt;
&lt;p&gt;The research team compared the output quality of SmolLM2 and Qwen3 when generating math problems:&lt;/p&gt;
&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Metric&lt;/th&gt;
&lt;th&gt;SmolLM2&lt;/th&gt;
&lt;th&gt;Qwen3&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Complete solution rate&lt;/td&gt;
&lt;td&gt;68%&lt;/td&gt;
&lt;td&gt;100%&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Output length range&lt;/td&gt;
&lt;td&gt;4-4000 tokens&lt;/td&gt;
&lt;td&gt;100-2600 tokens&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Format consistency&lt;/td&gt;
&lt;td&gt;Messy&lt;/td&gt;
&lt;td&gt;Perfect (with LaTeX)&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Most common opening repetition rate&lt;/td&gt;
&lt;td&gt;3/1000&lt;/td&gt;
&lt;td&gt;115/1000&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;
&lt;p&gt;From a human aesthetic standpoint, Qwen3&apos;s output is impeccable. But the downstream models trained on SmolLM2&apos;s data &lt;strong&gt;actually performed better&lt;/strong&gt;.&lt;/p&gt;
&lt;p&gt;The reason is &lt;strong&gt;Template Collapse&lt;/strong&gt;.&lt;/p&gt;
&lt;p&gt;Qwen3 is too &quot;obedient&quot; — its outputs are highly homogeneous. Out of 1000 samples, 115 had identical openings. This uniformity looks like &quot;standards&quot; to humans, but it&apos;s a disaster for pretraining data. SmolLM2, though &quot;sloppy,&quot; maintained extremely high text diversity.&lt;/p&gt;
&lt;p&gt;This reveals a core paradox of pretraining data: &lt;strong&gt;What humans prefer as &quot;neat&quot; may not be what models need for &quot;generalizability&quot;&lt;/strong&gt;.&lt;/p&gt;
&lt;p&gt;For pretraining, diversity matters far more than consistency. A model that is &quot;less obedient&quot; can actually produce better training data.&lt;/p&gt;
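&lt;p&gt;A metric like the &quot;most common opening repetition rate&quot; in the table can be approximated in a few lines. This is a minimal sketch: defining an opening as the first five lowercased words is an assumption made here, not the study&apos;s definition, and the sample texts are invented.&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;from collections import Counter


def opening_repetition_rate(samples, n_words=5):
    &quot;&quot;&quot;Return the share of samples that start with the most common opening.

    An opening is the first n_words whitespace-separated words,
    lowercased. Both choices are assumptions made for this sketch.
    &quot;&quot;&quot;
    if not samples:
        return 0.0
    openings = Counter(
        &quot; &quot;.join(text.lower().split()[:n_words]) for text in samples
    )
    most_common_count = openings.most_common(1)[0][1]
    return most_common_count / len(samples)


# Toy samples, made up for illustration.
templated = [&quot;Let us solve this step by step. &quot; + str(i) for i in range(4)]
varied = [
    &quot;A farmer has 12 sheep and sells 5.&quot;,
    &quot;How long does a train take to travel 90 km?&quot;,
    &quot;Consider a triangle with sides 3, 4 and 5.&quot;,
    &quot;Two friends split a bill of 48 dollars.&quot;,
]
print(opening_repetition_rate(templated))  # 1.0
print(opening_repetition_rate(varied))     # 0.25
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;A high value flags output that keeps reusing one template, which is the pattern the article describes for Qwen3.&lt;/p&gt;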
&lt;hr /&gt;
&lt;h2&gt;VI. Capability Trade-offs: Synthetic Data &quot;Trades Common Sense for Knowledge&quot;&lt;/h2&gt;
&lt;p&gt;Analyzing experiment results benchmark by benchmark, a consistent pattern emerged:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Nearly all synthetic data significantly outperformed raw data on ARC (scientific knowledge), SQuAD (reading comprehension), and DROP (numerical reasoning)&lt;/li&gt;
&lt;li&gt;But nearly all synthetic data underperformed raw data on HellaSwag and PIQA (common sense reasoning)&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;The macro scores appear roughly even only because these gains and losses offset each other.&lt;/p&gt;
&lt;p&gt;Synthetic data, through structured rewriting, makes the factual knowledge in web pages &quot;explicit,&quot; making it easier for models to learn retrievable information. But this process simultaneously strips away the common sense, contextual cues, and implicit rules about how the world works that exist in raw web text.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Synthetic data is essentially &quot;trading common sense for knowledge.&quot;&lt;/strong&gt;&lt;/p&gt;
&lt;p&gt;This explains another key finding: &lt;strong&gt;Training on pure synthetic data is always worse than mixed training&lt;/strong&gt;. Synthetic data must be blended with high-quality raw data to maintain capability balance.&lt;/p&gt;
&lt;p&gt;Moreover, what you mix in matters critically:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;strong&gt;High-quality source data&lt;/strong&gt; → Mix in DCLM (to recover common sense signals)&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Low-quality source data&lt;/strong&gt; → Mix in FineWeb-Edu-HQ (to supplement knowledge signals)&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;An important finding from the team: &lt;strong&gt;The choice of mix-in dataset is sometimes more important than the source data itself&lt;/strong&gt;. As long as the mix-in data is strong enough, even rewriting low-quality web pages can approach the effectiveness of rewriting high-quality data. This vastly expands the usable data pool.&lt;/p&gt;
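&lt;p&gt;Mixing itself is mechanically simple. The sketch below samples a training batch with a fixed share of synthetic documents; &lt;code&gt;mix_datasets&lt;/code&gt; and its parameters are hypothetical names, and the 50% share in the example is arbitrary, since the study treats the best ratio as an open question.&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;import random


def mix_datasets(synthetic, raw, synthetic_fraction, total, seed=0):
    &quot;&quot;&quot;Sample a shuffled mix with a fixed share of synthetic documents.&quot;&quot;&quot;
    rng = random.Random(seed)
    n_synthetic = round(total * synthetic_fraction)
    n_raw = total - n_synthetic
    # random.sample draws without replacement and raises ValueError
    # if a pool holds fewer documents than requested.
    mixed = rng.sample(synthetic, n_synthetic) + rng.sample(raw, n_raw)
    rng.shuffle(mixed)
    return mixed


# Placeholder documents standing in for real corpora.
synthetic_docs = [f&quot;synthetic-{i}&quot; for i in range(100)]
raw_docs = [f&quot;raw-{i}&quot; for i in range(100)]
batch = mix_datasets(synthetic_docs, raw_docs, synthetic_fraction=0.5, total=20)
print(sum(doc.startswith(&quot;synthetic&quot;) for doc in batch))  # 10
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;Swapping the list passed as &lt;code&gt;raw&lt;/code&gt; is where the choice between DCLM and FineWeb-Edu-HQ would enter.&lt;/p&gt;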
&lt;hr /&gt;
&lt;p&gt;&lt;img src=&quot;https://i.see.you/2026/03/11/Pd7r/20260311014802775.webp&quot; alt=&quot;Synthetic Data&apos;s Capability Trade-off: Trading Common Sense for Knowledge&quot; /&gt;&lt;/p&gt;
&lt;hr /&gt;
&lt;h2&gt;VII. Quality Scores Completely Fail on Synthetic Data&lt;/h2&gt;
&lt;p&gt;FineWeb-Edu-score and DCLM-score are commonly used metrics for filtering high-quality web pages. But when applied to synthetic data, &lt;strong&gt;they stop being reliable predictors of downstream performance&lt;/strong&gt;.&lt;/p&gt;
&lt;p&gt;The DCLM-score&apos;s correlation with downstream performance was only moderate (0.56-0.61), while the Edu-score&apos;s correlation was a mere -0.08, essentially uncorrelated.&lt;/p&gt;
&lt;p&gt;Even more ironic: Edu-score actually &lt;strong&gt;penalizes&lt;/strong&gt; format transformations that improved performance. When text was converted into tables, FAQs, or mathematical notation, the Edu-score judged &quot;quality decreased&quot; — yet these were precisely the best-performing formats.&lt;/p&gt;
&lt;p&gt;The reason: these scorers were trained on &quot;natural web text&quot; and favor coherent long-form narratives. Structured formats appear as &quot;anomalies&quot; to them, even though they are &quot;optimal&quot; for model learning.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;The conclusion is harsh: there are no shortcuts. You must complete the full &quot;generate → train → evaluate&quot; pipeline to know the true quality of synthetic data.&lt;/strong&gt;&lt;/p&gt;
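&lt;p&gt;Checking a scorer against downstream results comes down to correlating two lists across experiments. The sketch below uses a Spearman rank correlation, which is an assumption, since the article does not say which correlation was reported. The numbers are made up purely to exercise the function and say nothing about the real experiments.&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;def ranks(values):
    &quot;&quot;&quot;Rank values from 1 to n. Assumes there are no ties.&quot;&quot;&quot;
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0] * len(values)
    for rank, index in enumerate(order, start=1):
        result[index] = rank
    return result


def pearson(xs, ys):
    &quot;&quot;&quot;Plain Pearson correlation of two equal-length sequences.&quot;&quot;&quot;
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / (var_x * var_y) ** 0.5


def spearman(xs, ys):
    &quot;&quot;&quot;Spearman correlation: Pearson correlation of the ranks.&quot;&quot;&quot;
    return pearson(ranks(xs), ranks(ys))


# Made-up values: one quality score and one benchmark result per experiment.
quality_scores = [1.0, 2.0, 3.0, 4.0]
benchmark_results = [0.40, 0.38, 0.36, 0.30]
print(spearman(quality_scores, benchmark_results))  # -1.0
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;The benchmark column can only be filled in by actually training and evaluating, which is why the scorer cannot replace the full pipeline.&lt;/p&gt;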
&lt;hr /&gt;
&lt;h2&gt;VIII. The Cost Revolution at the Engineering Level&lt;/h2&gt;
&lt;p&gt;Cost is another core issue in synthetic data generation.&lt;/p&gt;
&lt;p&gt;The REWIRE project used a 70B model to generate 400 billion tokens, requiring an estimated ~350,000 GPU hours. HuggingFace&apos;s FinePhrase used a 1.7B model to generate 486 billion tokens in only ~14,700 GPU hours.&lt;/p&gt;
&lt;p&gt;Efficiency comparison:&lt;/p&gt;
&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Project&lt;/th&gt;
&lt;th&gt;Generation Model&lt;/th&gt;
&lt;th&gt;Token Volume&lt;/th&gt;
&lt;th&gt;GPU Hours&lt;/th&gt;
&lt;th&gt;Efficiency (tokens/GPU hour)&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Cosmopedia&lt;/td&gt;
&lt;td&gt;Mixtral 8x7B&lt;/td&gt;
&lt;td&gt;25B&lt;/td&gt;
&lt;td&gt;&amp;gt;10K&lt;/td&gt;
&lt;td&gt;&amp;lt;2.5M&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;REWIRE&lt;/td&gt;
&lt;td&gt;Llama-3.3 70B&lt;/td&gt;
&lt;td&gt;400B&lt;/td&gt;
&lt;td&gt;~352K&lt;/td&gt;
&lt;td&gt;~1.1M&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;FinePhrase&lt;/td&gt;
&lt;td&gt;SmolLM2-1.7B&lt;/td&gt;
&lt;td&gt;486B&lt;/td&gt;
&lt;td&gt;~14.7K&lt;/td&gt;
&lt;td&gt;~33.1M&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;
&lt;p&gt;FinePhrase&apos;s generation efficiency is approximately 30x that of REWIRE and 13x that of Cosmopedia.&lt;/p&gt;
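&lt;p&gt;The efficiency column and the two ratios follow directly from the token volumes and GPU hours in the table. A minimal sketch of the arithmetic, using only the figures listed there:&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;# Token volumes and GPU hours as listed in the table above.
# Cosmopedia&apos;s GPU hours are a lower bound, so its efficiency is an upper bound.
projects = {
    &quot;Cosmopedia&quot;: (25e9, 10_000),
    &quot;REWIRE&quot;: (400e9, 352_000),
    &quot;FinePhrase&quot;: (486e9, 14_700),
}

efficiency = {
    name: tokens / gpu_hours for name, (tokens, gpu_hours) in projects.items()
}

for name, value in efficiency.items():
    print(f&quot;{name}: {value / 1e6:.1f}M tokens per GPU hour&quot;)

vs_rewire = efficiency[&quot;FinePhrase&quot;] / efficiency[&quot;REWIRE&quot;]
vs_cosmopedia = efficiency[&quot;FinePhrase&quot;] / efficiency[&quot;Cosmopedia&quot;]
print(f&quot;vs REWIRE: {vs_rewire:.0f}x&quot;)
print(f&quot;vs Cosmopedia: {vs_cosmopedia:.0f}x&quot;)
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;With these inputs the ratios come out at about 29x and 13x, in line with the rounded figures quoted above.&lt;/p&gt;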
&lt;p&gt;Key optimizations included:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;strong&gt;Speculative Decoding&lt;/strong&gt;: Extremely effective for small models — SmolLM2 achieved a 1.75x speedup&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Tensor Parallelism Optimization&lt;/strong&gt;: Frees KV cache space for large MoE models&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Flash-Attn Backend&lt;/strong&gt;: Over 50% faster than FlashInfer (on H100)&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;This means synthetic data production has gone from being &quot;an exclusive game for compute giants&quot; to &lt;strong&gt;an engineering practice accessible to small and mid-sized teams&lt;/strong&gt;.&lt;/p&gt;
&lt;hr /&gt;
&lt;p&gt;&lt;img src=&quot;https://i.see.you/2026/03/11/Brv0/20260311015008064.webp&quot; alt=&quot;FinePhrase&apos;s Cost Advantage: 30x Efficiency Improvement&quot; /&gt;&lt;/p&gt;
&lt;hr /&gt;
&lt;h2&gt;IX. Clarification on &quot;Model Collapse&quot;&lt;/h2&gt;
&lt;p&gt;Academia frequently warns that AI training on its own generated data leads to &quot;Model Collapse.&quot;&lt;/p&gt;
&lt;p&gt;HuggingFace directly addressed this concern at the beginning of their paper: &lt;strong&gt;This collapse only occurs under extremely closed experimental conditions&lt;/strong&gt; — where a model iteratively trains on its own output without introducing any new information.&lt;/p&gt;
&lt;p&gt;Real-world industrial practice is entirely different:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Synthetic data is mixed with human data&lt;/li&gt;
&lt;li&gt;Prompts draw on diverse source documents&lt;/li&gt;
&lt;li&gt;Synthetic data is a strategic supplement, not a wholesale replacement&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;In their FineWeb research, the team even found that naturally occurring AI-generated content on the web &lt;strong&gt;did not cause model degradation&lt;/strong&gt;.&lt;/p&gt;
&lt;p&gt;The real concern isn&apos;t ordinary synthetic data practices, but rather &lt;strong&gt;the extreme scenario where frontier models generate data for other frontier models in a closed loop&lt;/strong&gt;. Synthetic data that is thoughtfully integrated with fresh perspectives isn&apos;t the problem — it&apos;s the solution.&lt;/p&gt;
&lt;hr /&gt;
&lt;h2&gt;X. The Practical Recipe: FinePhrase&apos;s Final Configuration&lt;/h2&gt;
&lt;p&gt;Based on systematic validation across 90 experiments, HuggingFace delivered a concise best-practice recipe:&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Generation model&lt;/strong&gt;: SmolLM2-1.7B-Instruct&lt;br /&gt;
&lt;strong&gt;Prompt format&lt;/strong&gt;: FAQ, Math, Table, Tutorial (pick one or mix)&lt;br /&gt;
&lt;strong&gt;Source data&lt;/strong&gt;: FineWeb-Edu (relaxed quality requirements)&lt;br /&gt;
&lt;strong&gt;Mix-in data&lt;/strong&gt;: DCLM or FineWeb-Edu-HQ&lt;br /&gt;
&lt;strong&gt;Inference optimization&lt;/strong&gt;: suffix-32 speculative decoding + 0.9 memory utilization&lt;/p&gt;
&lt;p&gt;The core logic of this recipe:&lt;/p&gt;
&lt;ol&gt;
&lt;li&gt;&lt;strong&gt;Use structured prompts to reshape knowledge formats&lt;/strong&gt; — this is the biggest lever&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Use the smallest adequate model&lt;/strong&gt; — invest savings into data volume&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Use strong mix-in data as a safety net&lt;/strong&gt; — recover common sense signals, relax source data requirements&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Use engineering optimizations to compress costs&lt;/strong&gt; — make synthetic data production sustainable&lt;/li&gt;
&lt;/ol&gt;
&lt;hr /&gt;
&lt;p&gt;&lt;img src=&quot;https://i.see.you/2026/03/11/Qu2k/20260311015222443.webp&quot; alt=&quot;FinePhrase Final Recipe: Structured Prompts + Small Model + Strong Mix-in Data&quot; /&gt;&lt;/p&gt;
&lt;hr /&gt;
&lt;h2&gt;XI. Unanswered Questions&lt;/h2&gt;
&lt;p&gt;HuggingFace candidly listed the boundaries and open questions of this research:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;strong&gt;Repetition and rewriting&lt;/strong&gt;: If data is rewritten each time it&apos;s repeated, can performance degradation be avoided?&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Mixing ratios&lt;/strong&gt;: What proportion of synthetic data is optimal? 5%, 20%, or 50%?&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Sampling strategies&lt;/strong&gt;: Is Best-of-N filtering effective?&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Scale effects&lt;/strong&gt;: Do these findings hold at 100B+ token training scales?&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Automated optimization&lt;/strong&gt;: Can tools like DSPy be used to automatically search for optimal prompts?&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;These questions define the agenda for the next phase of synthetic data research.&lt;/p&gt;
&lt;hr /&gt;
&lt;h2&gt;Conclusion: From &quot;Alchemy&quot; to &quot;Chemistry&quot;&lt;/h2&gt;
&lt;p&gt;The fundamental contribution of this research isn&apos;t releasing yet another larger dataset — it&apos;s &lt;strong&gt;transforming synthetic pretraining data generation from experience-driven trial-and-error into a verifiable, reproducible systematic methodology&lt;/strong&gt;.&lt;/p&gt;
&lt;p&gt;Several core conclusions deserve repeated emphasis:&lt;/p&gt;
&lt;ol&gt;
&lt;li&gt;&lt;strong&gt;Prompt design is the primary productivity driver&lt;/strong&gt; — restructure formats, don&apos;t polish language&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Small models are good enough&lt;/strong&gt; — 1B-class suffices; don&apos;t worship parameter counts&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Diversity beats consistency&lt;/strong&gt; — &quot;obedient&quot; models may actually produce worse data&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Raw data must be mixed in&lt;/strong&gt; — synthetic data &quot;trades common sense for knowledge&quot;&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Quality scores are unreliable&lt;/strong&gt; — you must complete the full train-evaluate pipeline&lt;/li&gt;
&lt;/ol&gt;
&lt;p&gt;Synthetic data is evolving from an &quot;optional data augmentation trick&quot; to a &quot;core production step in LLM training.&quot; And this research provides the clearest industrial-grade operating guide to date.&lt;/p&gt;
&lt;hr /&gt;
&lt;p&gt;&lt;img src=&quot;https://i.see.you/2026/03/11/Nng7/20260311015437646.webp&quot; alt=&quot;From &amp;quot;Alchemy&amp;quot; to &amp;quot;Chemistry&amp;quot;: Synthetic Data Goes Industrial&quot; /&gt;&lt;/p&gt;
&lt;hr /&gt;
&lt;p&gt;&lt;em&gt;References&lt;/em&gt;:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;a href=&quot;https://huggingface.co/spaces/HuggingFaceFW/finephrase&quot;&gt;The Synthetic Data Playbook: Generating Trillions of the Finest Tokens&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;
</content:encoded></item><item><title>🦞 A Lobster&apos;s Rise: From Clawdbot to OpenClaw - What Did This AI Crustacean Go Through?</title><link>https://blog.gujiakai.me/en/2026/01/clawdbot-moltbot-openclaw-evolution/</link><guid isPermaLink="true">https://blog.gujiakai.me/en/2026/01/clawdbot-moltbot-openclaw-evolution/</guid><description>From 0 to 100K stars in two months, legal notice from Anthropic, crypto scammers hijacking the brand - this open-source lobster&apos;s transformation is more dramatic than a TV series</description><pubDate>Fri, 30 Jan 2026 00:00:00 GMT</pubDate><content:encoded>&lt;p&gt;&lt;img src=&quot;https://cdn.sa.net/2026/01/30/TFioCsD1f6QX4G3.webp&quot; alt=&quot;Cover Image&quot; /&gt;&lt;/p&gt;
&lt;h1&gt;A Lobster&apos;s Rise: From Clawdbot to OpenClaw - What Did This AI Crustacean Go Through?&lt;/h1&gt;
&lt;p&gt;&lt;strong&gt;&quot;Two months ago, I just spent a weekend casually building a small project. Now it has over 100K stars on GitHub and attracted 2 million visits in a single week.&quot;&lt;/strong&gt;&lt;/p&gt;
&lt;p&gt;These are the words of Peter Steinberger (@steipete), the founder of OpenClaw.&lt;/p&gt;
&lt;p&gt;You might not know him, but you&apos;ve probably used his products - he&apos;s the founder of PSPDFKit, the PDF framework that almost every iOS developer has heard of. After the company was acquired in 2023, Peter planned to retire and enjoy life. Instead, he accidentally created one of the fastest-growing open-source projects in GitHub history.&lt;/p&gt;
&lt;p&gt;Imagine: a weekend project you casually built suddenly goes viral worldwide, and even the legal team at Anthropic (Claude&apos;s parent company) reaches out... This plot is more dramatic than a TV series.&lt;/p&gt;
&lt;p&gt;Today, let&apos;s talk about the story of this &quot;lobster&apos;s&quot; rise.&lt;/p&gt;
&lt;hr /&gt;
&lt;p&gt;&lt;img src=&quot;https://cdn.sa.net/2026/01/30/TcgrCwfuSDmzn8N.webp&quot; alt=&quot;Evolution Timeline&quot; /&gt;&lt;/p&gt;
&lt;h2&gt;🦞 Chapter 1: The Birth of Clawdbot - A &quot;Copycat&quot; Lobster&apos;s Debut&lt;/h2&gt;
&lt;p&gt;In November 2025, Peter had a sudden inspiration: he wanted to build an AI assistant that he could use on WhatsApp.&lt;/p&gt;
&lt;p&gt;Initially, it was just a little thing called &quot;WhatsApp Relay.&quot; But Peter got more and more into it, eventually giving it a proper name: &lt;strong&gt;Clawdbot&lt;/strong&gt; - Claude (Anthropic&apos;s AI) + Claw (lobster claw), complete with a cute lobster mascot called Clawd.&lt;/p&gt;
&lt;blockquote&gt;
&lt;p&gt;Yes, it&apos;s a pun.&lt;/p&gt;
&lt;/blockquote&gt;
&lt;p&gt;What&apos;s special about this &quot;weekend project&quot;?&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;It runs entirely on your own computer.&lt;/strong&gt;&lt;/p&gt;
&lt;p&gt;Not one of those &quot;upload your data to someone else&apos;s server&quot; SaaS services, but truly &quot;your computer, your API keys, your data.&quot; Laptop, home server, VPS - your choice.&lt;/p&gt;
&lt;p&gt;As one community member put it: &lt;strong&gt;&quot;This is infrastructure that truly belongs to you.&quot;&lt;/strong&gt;&lt;/p&gt;
&lt;p&gt;Clawdbot quickly spread through developer circles. GitHub stars broke 9,000 within 24 hours, and it surpassed 100K two months later. After all, who wouldn&apos;t want an AI assistant that can help you reply to emails, check your calendar, and be ready to serve across 13 platforms including WhatsApp, Telegram, Discord, Slack, Signal, and iMessage?&lt;/p&gt;
&lt;p&gt;Moreover, it remembers everything about you - your preferences, your habits, your previous conversations. It reads your &lt;code&gt;SOUL.md&lt;/code&gt; to understand your personality and &lt;code&gt;MEMORY.md&lt;/code&gt; to remember your history.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;&quot;This thing is way smarter than Siri!&quot;&lt;/strong&gt; someone commented.&lt;/p&gt;
&lt;p&gt;Others remarked: &lt;strong&gt;&quot;2026 is truly the year of personal AI agents.&quot;&lt;/strong&gt;&lt;/p&gt;
&lt;hr /&gt;
&lt;h2&gt;🔄 Chapter 2: Moltbot - The Awkward Moment of Forced &quot;Molting&quot;&lt;/h2&gt;
&lt;p&gt;In January 2026, just when Clawdbot was at its peak, Peter received an email.&lt;/p&gt;
&lt;p&gt;From: Anthropic Legal Team.&lt;/p&gt;
&lt;p&gt;The content was polite, but the message was clear: &lt;strong&gt;&quot;Clawdbot and Clawd are too similar to our Claude. Please change the name.&quot;&lt;/strong&gt;&lt;/p&gt;
&lt;p&gt;Peter was reasonable about it. After all, they&apos;re a multi-billion dollar company, and he&apos;s just an individual developer - no need to fight back.&lt;/p&gt;
&lt;p&gt;But the question was: what to change it to?&lt;/p&gt;
&lt;p&gt;At 5 AM on January 27th, Peter kicked off a naming brainstorm on Discord. Community members went wild with ideas and finally settled on &lt;strong&gt;Moltbot&lt;/strong&gt;.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Molting&lt;/strong&gt; is how lobsters grow - they shed their old shell to grow a bigger new one. This meaning was too perfect: the project was also experiencing a transformation, becoming stronger.&lt;/p&gt;
&lt;p&gt;Peter himself was satisfied: &lt;strong&gt;&quot;Anthropic asked us to rename (trademark issue), honestly? &apos;Molt&apos; is perfect - that&apos;s how lobsters grow.&quot;&lt;/strong&gt;&lt;/p&gt;
&lt;p&gt;The mascot also changed from Clawd to Molty.&lt;/p&gt;
&lt;p&gt;But renaming came with more than a few headaches:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Old users were confused: &quot;Why did Clawdbot suddenly stop working?&quot;&lt;/li&gt;
&lt;li&gt;Someone registered the old brand&apos;s social accounts within 10 seconds to post crypto scam messages&lt;/li&gt;
&lt;li&gt;A fake $CLAWD token pumped to a $16 million market cap before crashing&lt;/li&gt;
&lt;li&gt;All the old repository links on GitHub became invalid&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;Peter had to urgently contact friends at X (Twitter) and GitHub to get these issues under control.&lt;/p&gt;
&lt;blockquote&gt;
&lt;p&gt;This experience teaches us: rebranding is truly a tough battle. And internet scammers are always faster than you.&lt;/p&gt;
&lt;/blockquote&gt;
&lt;hr /&gt;
&lt;h2&gt;✨ Chapter 3: OpenClaw - The Lobster&apos;s Final Form&lt;/h2&gt;
&lt;p&gt;Just two days later, on January 29th, Peter announced: &lt;strong&gt;The final name is decided - OpenClaw.&lt;/strong&gt;&lt;/p&gt;
&lt;p&gt;Wait, another change?&lt;/p&gt;
&lt;p&gt;It turned out that &quot;Moltbot,&quot; despite its nice meaning, still had some trademark and domain issues. This time, Peter was prepared:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;✅ Trademark search passed&lt;/li&gt;
&lt;li&gt;✅ All domains secured (&lt;a href=&quot;http://openclaw.ai&quot;&gt;openclaw.ai&lt;/a&gt;)&lt;/li&gt;
&lt;li&gt;✅ Migration code written in advance&lt;/li&gt;
&lt;li&gt;✅ &lt;code&gt;openclaw doctor&lt;/code&gt; command automatically handles config migration&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;&lt;strong&gt;Open&lt;/strong&gt; represents open source, openness, and community-driven development.&lt;br /&gt;
&lt;strong&gt;Claw&lt;/strong&gt; is a tribute to the lobster heritage, also implying this is an AI that can &quot;take action.&quot;&lt;/p&gt;
&lt;p&gt;In Peter&apos;s words: &lt;strong&gt;&quot;The lobster has finally completed its ultimate molt. Welcome to OpenClaw.&quot;&lt;/strong&gt;&lt;/p&gt;
&lt;p&gt;(By the way, the mascot is still that lobster Molty - some things are sacred and cannot be changed🦞)&lt;/p&gt;
&lt;hr /&gt;
&lt;p&gt;&lt;img src=&quot;https://cdn.sa.net/2026/01/30/x4d1UakA2XwTVW8.webp&quot; alt=&quot;Feature Showcase&quot; /&gt;&lt;/p&gt;
&lt;h2&gt;🚀 What Can OpenClaw Do Now?&lt;/h2&gt;
&lt;p&gt;I have to say, after all these rounds of evolution, OpenClaw has become quite a mature AI assistant platform. Its 107K+ stars, 15K+ forks, and 8,300+ commits on GitHub reflect an active global community.&lt;/p&gt;
&lt;h3&gt;📱 Full Platform Coverage&lt;/h3&gt;
&lt;p&gt;WhatsApp, Telegram, Discord, Slack, Signal, iMessage, Google Chat, Microsoft Teams, Matrix, and more: &lt;strong&gt;13 messaging platforms&lt;/strong&gt; in total. Wherever you chat, it follows you there.&lt;/p&gt;
&lt;h3&gt;🧠 True &quot;Memory&quot;&lt;/h3&gt;
&lt;p&gt;Unlike those AIs that &quot;forget after chatting,&quot; OpenClaw remembers everything about you:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;strong&gt;&lt;code&gt;AGENTS.md&lt;/code&gt;&lt;/strong&gt;: Agent configuration file&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;&lt;code&gt;SOUL.md&lt;/code&gt;&lt;/strong&gt;: Personality settings&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;&lt;code&gt;TOOLS.md&lt;/code&gt;&lt;/strong&gt;: Tool preferences&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;&lt;code&gt;MEMORY.md&lt;/code&gt;&lt;/strong&gt;: Memory storage&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;&lt;strong&gt;It truly gets to know you better over time.&lt;/strong&gt;&lt;/p&gt;
&lt;h3&gt;🎙️ Voice Activation&lt;/h3&gt;
&lt;p&gt;Supports an &quot;Always-on Speech&quot; feature on macOS, iOS, and Android, with natural voice interaction through ElevenLabs. Imagine just calling out to your phone to have the AI help you with tasks.&lt;/p&gt;
&lt;h3&gt;🌐 Browser Control + System Access&lt;/h3&gt;
&lt;p&gt;Let it help you:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Browse web pages, fill forms, scrape data&lt;/li&gt;
&lt;li&gt;Read and write files, run scripts, execute commands&lt;/li&gt;
&lt;li&gt;Achieve web automation through a dedicated Chrome/Chromium instance&lt;/li&gt;
&lt;li&gt;Even extend functionality through 700+ community skills&lt;/li&gt;
&lt;/ul&gt;
&lt;h3&gt;🔒 Security First&lt;/h3&gt;
&lt;p&gt;In this renamed version, the team committed &lt;strong&gt;34 security-related code updates&lt;/strong&gt;. It uses Docker sandbox mode by default to isolate non-primary sessions and supports tool whitelist and blacklist configurations.&lt;/p&gt;
&lt;p&gt;Peter specifically reminds users that prompt injection remains an industry-wide challenge. He recommends using strong models like Claude Opus 4.5 and following security best practices.&lt;/p&gt;
&lt;hr /&gt;
&lt;h2&gt;🛠️ Migration Guide for Existing Users&lt;/h2&gt;
&lt;p&gt;If you&apos;ve used Clawdbot or Moltbot before, don&apos;t worry - migration is super simple. The installation script will automatically handle everything for you.&lt;/p&gt;
&lt;h3&gt;One-Click Upgrade to OpenClaw&lt;/h3&gt;
&lt;pre&gt;&lt;code&gt;# Run the installation script, it will automatically detect old configs and migrate
curl -fsSL https://openclaw.ai/install.sh | bash
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;It&apos;s that simple. The installation script will automatically:&lt;/p&gt;
&lt;ol&gt;
&lt;li&gt;Detect your system environment (macOS/Linux)&lt;/li&gt;
&lt;li&gt;Verify Node.js version (requires v22+)&lt;/li&gt;
&lt;li&gt;Install the latest version of OpenClaw&lt;/li&gt;
&lt;li&gt;Run &lt;code&gt;openclaw doctor&lt;/code&gt; to auto-migrate configurations&lt;/li&gt;
&lt;/ol&gt;
&lt;p&gt;You&apos;ll see output like this:&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;◇  Doctor changes ─────────────────────────────────────────────────────────╮
│  - State dir: ~/.clawdbot → ~/.openclaw (legacy path now symlinked)      │
│  - Migrated legacy config: ~/.clawdbot/clawdbot.json →                   │
│    ~/.openclaw/openclaw.json                                             │
├──────────────────────────────────────────────────────────────────────────╯
&lt;/code&gt;&lt;/pre&gt;
&lt;h3&gt;Optional: Clean Up Old Versions&lt;/h3&gt;
&lt;p&gt;After migration, if you want to completely remove old versions:&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;# Uninstall old Clawdbot (will ask which components to delete)
clawdbot uninstall

# Or uninstall Moltbot
moltbot uninstall
&lt;/code&gt;&lt;/pre&gt;
&lt;h3&gt;Important Notes ⚠️&lt;/h3&gt;
&lt;ul&gt;
&lt;li&gt;Old &lt;code&gt;clawdbot&lt;/code&gt; and &lt;code&gt;moltbot&lt;/code&gt; commands still work after migration&lt;/li&gt;
&lt;li&gt;Old config directories are symlinked to the new location, so there is no risk of data loss&lt;/li&gt;
&lt;li&gt;Existing Skills and workflows need no modification&lt;/li&gt;
&lt;li&gt;If you encounter issues, run &lt;code&gt;openclaw doctor --fix&lt;/code&gt; to auto-repair&lt;/li&gt;
&lt;/ul&gt;
&lt;h3&gt;Version Comparison Table&lt;/h3&gt;
&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Item&lt;/th&gt;
&lt;th&gt;ClawdBot&lt;/th&gt;
&lt;th&gt;MoltBot&lt;/th&gt;
&lt;th&gt;OpenClaw&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Config Directory&lt;/td&gt;
&lt;td&gt;~/.clawdbot/&lt;/td&gt;
&lt;td&gt;~/.moltbot/&lt;/td&gt;
&lt;td&gt;~/.openclaw/&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Website&lt;/td&gt;
&lt;td&gt;clawd.bot&lt;/td&gt;
&lt;td&gt;molt.bot&lt;/td&gt;
&lt;td&gt;&lt;a href=&quot;http://openclaw.ai&quot;&gt;openclaw.ai&lt;/a&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;GitHub&lt;/td&gt;
&lt;td&gt;clawdbot/clawdbot&lt;/td&gt;
&lt;td&gt;moltbot/moltbot&lt;/td&gt;
&lt;td&gt;openclaw/openclaw&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;NPM Package&lt;/td&gt;
&lt;td&gt;clawdbot&lt;/td&gt;
&lt;td&gt;moltbot&lt;/td&gt;
&lt;td&gt;openclaw&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;
&lt;hr /&gt;
&lt;p&gt;&lt;img src=&quot;https://cdn.sa.net/2026/01/30/et9GQV1NnSRCILT.webp&quot; alt=&quot;Community&quot; /&gt;&lt;/p&gt;
&lt;h2&gt;🔮 Future Outlook&lt;/h2&gt;
&lt;p&gt;OpenClaw&apos;s story is far from over.&lt;/p&gt;
&lt;p&gt;Peter is working on several big things:&lt;/p&gt;
&lt;ol&gt;
&lt;li&gt;&lt;strong&gt;Security Hardening (Top Priority)&lt;/strong&gt; — Continuously strengthening codebase security&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Gateway Reliability Improvements&lt;/strong&gt; — Making it smoother for more people to use&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Expanding Model Support&lt;/strong&gt; — Already supports KIMI K2.5, Xiaomi MiMo-V2-Flash, and other new models&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Establishing Sustainable Funding&lt;/strong&gt; — Wanting to pay core maintainers full-time salaries&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Expanding the Maintainer Team&lt;/strong&gt; — One person really can&apos;t handle it all&lt;/li&gt;
&lt;/ol&gt;
&lt;p&gt;Community members have already done super cool things with OpenClaw:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Automatically managing emails and calendars&lt;/li&gt;
&lt;li&gt;Remotely controlling code compilation and testing&lt;/li&gt;
&lt;li&gt;Using Sentry webhooks to automatically catch errors and submit PR fixes&lt;/li&gt;
&lt;li&gt;Secure remote access through Tailscale&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;One user put it well:&lt;/p&gt;
&lt;blockquote&gt;
&lt;p&gt;&lt;strong&gt;&quot;The open-source community built a product better than Apple&apos;s Siri with just a few people. Welcome to the AI era - one person plus one code repository can fill the gap left by trillion-dollar companies.&quot;&lt;/strong&gt;&lt;/p&gt;
&lt;/blockquote&gt;
&lt;hr /&gt;
&lt;h2&gt;📝 Final Thoughts&lt;/h2&gt;
&lt;p&gt;From Clawdbot to Moltbot to OpenClaw, this lobster has been through quite a lot.&lt;/p&gt;
&lt;p&gt;Targeted by Anthropic&apos;s legal team, exploited by crypto scammers, renamed twice within two days...&lt;/p&gt;
&lt;p&gt;But it&apos;s still alive, and thriving more than ever.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;107K+ GitHub stars, 15K+ forks, 2 million weekly visits, a global developer community...&lt;/strong&gt;&lt;/p&gt;
&lt;p&gt;Behind these numbers is a simple belief:&lt;/p&gt;
&lt;blockquote&gt;
&lt;p&gt;&lt;strong&gt;Your AI assistant should truly belong to you. 100% open source, MIT license, forever free.&lt;/strong&gt;&lt;/p&gt;
&lt;/blockquote&gt;
&lt;p&gt;If you&apos;d like to try this &quot;lobster,&quot; check out the official website:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;🌐 Website: &lt;a href=&quot;https://openclaw.ai&quot;&gt;https://openclaw.ai&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;💻 GitHub: &lt;a href=&quot;https://github.com/openclaw/openclaw&quot;&gt;https://github.com/openclaw/openclaw&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;📖 Documentation: &lt;a href=&quot;https://docs.openclaw.ai&quot;&gt;https://docs.openclaw.ai&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;💬 Discord Community: &lt;a href=&quot;https://discord.gg/openclaw&quot;&gt;https://discord.gg/openclaw&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;Perhaps it will become your most capable digital assistant of 2026?&lt;/p&gt;
&lt;p&gt;After all, lobsters molt to grow bigger. And OpenClaw is just beginning its growth journey. 🦞&lt;/p&gt;
&lt;hr /&gt;
&lt;p&gt;&lt;strong&gt;References:&lt;/strong&gt;&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;a href=&quot;https://www.molt.bot/blog/introducing-openclaw&quot;&gt;Introducing OpenClaw&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href=&quot;https://github.com/openclaw/openclaw&quot;&gt;OpenClaw GitHub Repository&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href=&quot;https://dev.to/sivarampg/from-clawdbot-to-moltbot-how-a-cd-crypto-scammers-and-10-seconds-of-chaos-took-down-the-4eck&quot;&gt;From Clawdbot to Moltbot - DEV Community&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;
</content:encoded></item><item><title>Clawdbot to Moltbot: A 72-Hour Internet Drama</title><link>https://blog.gujiakai.me/en/2026/01/clawdbot-notes/</link><guid isPermaLink="true">https://blog.gujiakai.me/en/2026/01/clawdbot-notes/</guid><description>A 60,000-star open source project forced to rename, hijacked by crypto scammers within 10 seconds, a $16 million fake token crash - this 72-hour storm exposed the fragility and absurdity of the open source ecosystem in the AI era</description><pubDate>Wed, 28 Jan 2026 00:00:00 GMT</pubDate><content:encoded>&lt;h1&gt;Clawdbot to Moltbot: A 72-Hour Internet Drama&lt;/h1&gt;
&lt;h2&gt;Chapter 1: An Overnight Open Source Sensation&lt;/h2&gt;
&lt;p&gt;&lt;strong&gt;January 26, 2026&lt;/strong&gt; — An open source project called &lt;strong&gt;Clawdbot&lt;/strong&gt; suddenly went viral.&lt;/p&gt;
&lt;p&gt;&lt;img src=&quot;https://kimi-web-img.moonshot.cn/img/linux.do/81c597191be3bfce2b07b0a3a5d8fec972e5511a.png&quot; alt=&quot;Moltbot Logo&quot; /&gt;&lt;/p&gt;
&lt;p&gt;Created by Austrian developer &lt;strong&gt;Peter Steinberger&lt;/strong&gt; (@steipete), Clawdbot is a self-hosted AI assistant that can:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Run on WhatsApp, Telegram, Discord, Slack, Signal, and iMessage&lt;/li&gt;
&lt;li&gt;Maintain persistent memory, remembering user preferences and conversation history&lt;/li&gt;
&lt;li&gt;Control browsers, execute shell commands, and manage calendars&lt;/li&gt;
&lt;li&gt;Proactively send notifications and reminders&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;Steinberger is no unknown — he founded PSPDFKit (now rebranded as Nutrient), &quot;retired&quot; after receiving a $100M+ investment from Insight Partners in 2021, and has now returned to build this &quot;Claude with hands.&quot;&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Its growth was absolutely insane:&lt;/strong&gt;&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;🚀 Within 24 hours: &lt;strong&gt;9,000+ GitHub stars&lt;/strong&gt;&lt;/li&gt;
&lt;li&gt;🚀 Within 72 hours: &lt;strong&gt;60,000+ GitHub stars&lt;/strong&gt;&lt;/li&gt;
&lt;li&gt;🚀 Became one of the fastest-growing open source projects in GitHub history&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;Andrej Karpathy (former Tesla AI Director) publicly praised it, David Sacks (PayPal Mafia member) tweeted about it, and MacStories called it &quot;the future of personal AI assistants.&quot;&lt;/p&gt;
&lt;hr /&gt;
&lt;h2&gt;Chapter 2: Anthropic&apos;s &quot;Trademark Bomb&quot;&lt;/h2&gt;
&lt;p&gt;&lt;strong&gt;January 27, 2026&lt;/strong&gt; — At the peak of Clawdbot&apos;s viral moment, &lt;strong&gt;Anthropic&lt;/strong&gt; (Claude&apos;s parent company) sent a trademark-related request.&lt;/p&gt;
&lt;p&gt;&lt;img src=&quot;https://kimi-web-img.moonshot.cn/img/upload.wikimedia.org/2551b26ed53e3c284329af5a426c7234c23a990a.png&quot; alt=&quot;Anthropic Claude Logo&quot; /&gt;&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;The problem?&lt;/strong&gt; Anthropic believed &lt;strong&gt;&quot;Clawd&quot;&lt;/strong&gt; was too similar to &lt;strong&gt;&quot;Claude&quot;&lt;/strong&gt;, constituting potential trademark infringement.&lt;/p&gt;
&lt;p&gt;Founder Peter Steinberger announced on X:&lt;/p&gt;
&lt;blockquote&gt;
&lt;p&gt;🦞 &lt;strong&gt;BIG NEWS: We&apos;ve molted!&lt;/strong&gt;&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Clawdbot → Moltbot&lt;/strong&gt;&lt;br /&gt;
&lt;strong&gt;Clawd → Molty&lt;/strong&gt;&lt;/p&gt;
&lt;p&gt;&lt;em&gt;Same lobster soul, new shell.&lt;/em&gt;&lt;/p&gt;
&lt;p&gt;Anthropic asked us to change our name (trademark stuff), and honestly? &quot;Molt&quot; fits perfectly — it&apos;s what lobsters do to grow.&lt;/p&gt;
&lt;/blockquote&gt;
&lt;p&gt;The rebranding was cleverly conceived:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Lobsters grow by molting&lt;/li&gt;
&lt;li&gt;The project was also &quot;molting&quot; into a new form&lt;/li&gt;
&lt;li&gt;New website: &lt;strong&gt;molt.bot&lt;/strong&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;hr /&gt;
&lt;h2&gt;Chapter 3: 10 Seconds of Disaster 💥&lt;/h2&gt;
&lt;p&gt;However, the renaming process turned into a &lt;strong&gt;disaster&lt;/strong&gt;.&lt;/p&gt;
&lt;p&gt;Peter Steinberger tried to simultaneously rename the GitHub organization and X/Twitter accounts. In the &lt;strong&gt;mere 10-second gap&lt;/strong&gt; between releasing the old names and registering the new ones, &lt;strong&gt;crypto scammers snatched both accounts&lt;/strong&gt;!&lt;/p&gt;
&lt;blockquote&gt;
&lt;p&gt;&lt;em&gt;&quot;Had to rename our accounts for trademark stuff and messed up the GitHub rename and the X rename got snatched by crypto shills. That went wonderful.&quot;&lt;/em&gt;&lt;br /&gt;
— Peter Steinberger&lt;/p&gt;
&lt;/blockquote&gt;
&lt;p&gt;The scammers had clearly been monitoring for this opportunity. They instantly seized:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;❌ The original @clawdbot X account&lt;/li&gt;
&lt;li&gt;❌ The original Clawdbot GitHub organization&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;They then began pushing cryptocurrency scams to &lt;strong&gt;tens of thousands of unsuspecting followers&lt;/strong&gt;.&lt;/p&gt;
&lt;hr /&gt;
&lt;h2&gt;Chapter 4: The $16 Million Fake Token Scam&lt;/h2&gt;
&lt;p&gt;The account hijacking was just the beginning. Within hours, a &lt;strong&gt;fake $CLAWD token&lt;/strong&gt; appeared on the Solana blockchain.&lt;/p&gt;
&lt;p&gt;&lt;img src=&quot;https://kimi-web-img.moonshot.cn/img/masterthecrypto.com/73db0ac52d91fa61f40fc34aec4d72f906cba3a7.jpg&quot; alt=&quot;Crypto Scam&quot; /&gt;&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Scam timeline:&lt;/strong&gt;&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;📈 Fake token market cap surged to &lt;strong&gt;$16,000,000&lt;/strong&gt;&lt;/li&gt;
&lt;li&gt;📉 Peter Steinberger publicly stated he would &quot;never launch a token&quot;&lt;/li&gt;
&lt;li&gt;📉 Token price instantly crashed &lt;strong&gt;90%+&lt;/strong&gt;&lt;/li&gt;
&lt;li&gt;💸 Late buyers got &quot;rugged,&quot; scammers walked away with millions&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;Peter was forced to tweet a warning:&lt;/p&gt;
&lt;blockquote&gt;
&lt;p&gt;&lt;em&gt;&quot;To all crypto folks: Please stop pinging me, stop harassing me. I will never do a coin. Any project that lists me as coin owner is a SCAM.&quot;&lt;/em&gt;&lt;/p&gt;
&lt;/blockquote&gt;
&lt;hr /&gt;
&lt;h2&gt;Chapter 5: Security Nightmares Surface&lt;/h2&gt;
&lt;p&gt;Meanwhile, security researchers discovered &lt;strong&gt;serious security vulnerabilities&lt;/strong&gt; in Clawdbot/Moltbot.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Blockchain security firm SlowMist reported:&lt;/strong&gt;&lt;/p&gt;
&lt;blockquote&gt;
&lt;p&gt;&quot;Multiple unauthenticated instances are publicly accessible, and several code flaws may lead to credential theft and even remote code execution.&quot;&lt;/p&gt;
&lt;/blockquote&gt;
&lt;p&gt;&lt;strong&gt;Researcher Jamieson O&apos;Reilly found:&lt;/strong&gt;&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Searching Shodan for &quot;Clawdbot Control&quot; revealed &lt;strong&gt;hundreds of exposed control panels&lt;/strong&gt;&lt;/li&gt;
&lt;li&gt;These panels contained: &lt;strong&gt;API keys, bot tokens, OAuth secrets, complete conversation histories&lt;/strong&gt;&lt;/li&gt;
&lt;li&gt;Attackers could: &lt;strong&gt;impersonate users to send messages, execute commands, steal data&lt;/strong&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;&lt;strong&gt;Demo attack:&lt;/strong&gt;&lt;/p&gt;
&lt;p&gt;Archestra AI CEO &lt;strong&gt;Matvey Kukuy&lt;/strong&gt; sent a malicious email containing a prompt injection to an exposed Moltbot instance. After the AI read the email, it treated the injected text as &quot;legitimate instructions&quot; and &lt;strong&gt;forwarded the user&apos;s 5 most recent emails to the attacker&apos;s address&lt;/strong&gt;.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;The whole process took only 5 minutes.&lt;/strong&gt;&lt;/p&gt;
&lt;hr /&gt;
&lt;h2&gt;Chapter 6: Community vs Anthropic&lt;/h2&gt;
&lt;p&gt;The community began questioning Anthropic&apos;s decision.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Key issues:&lt;/strong&gt;&lt;/p&gt;
&lt;ol&gt;
&lt;li&gt;Clawdbot actually &lt;strong&gt;drove Claude usage&lt;/strong&gt; — many users specifically configured Clawdbot to use Claude as its underlying model&lt;/li&gt;
&lt;li&gt;This was a &lt;strong&gt;rapidly rising project&lt;/strong&gt; bringing Anthropic free marketing and API revenue&lt;/li&gt;
&lt;li&gt;The renaming chaos caused &lt;strong&gt;actual security disasters and financial losses&lt;/strong&gt;&lt;/li&gt;
&lt;li&gt;The similarity between &quot;Clawd&quot; and &quot;Claude&quot; was obviously &lt;strong&gt;playful&lt;/strong&gt;, not malicious infringement&lt;/li&gt;
&lt;/ol&gt;
&lt;p&gt;&lt;strong&gt;DHH (Ruby on Rails creator)&lt;/strong&gt; criticized Anthropic&apos;s recent moves as &quot;customer hostile.&quot;&lt;/p&gt;
&lt;p&gt;AWS Hero &lt;strong&gt;AJ Stuyvenberg&lt;/strong&gt; was more direct: &quot;They&apos;re speedrunning the journey from forgivable startup to loathsome corporation before any exit!&quot;&lt;/p&gt;
&lt;p&gt;Developers began looking at OpenAI&apos;s Codex CLI (Apache 2.0 license), questioning whether Anthropic was becoming the kind of company they didn&apos;t want to build on.&lt;/p&gt;
&lt;hr /&gt;
&lt;h2&gt;Finale: Fighting on Multiple Fronts&lt;/h2&gt;
&lt;p&gt;Peter Steinberger is now simultaneously dealing with:&lt;/p&gt;
&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Front&lt;/th&gt;
&lt;th&gt;Status&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;🔄 Recovering hijacked GitHub/X accounts&lt;/td&gt;
&lt;td&gt;In progress&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;🛡️ Dealing with crypto scammer harassment&lt;/td&gt;
&lt;td&gt;Ongoing&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;👥 Managing 8,900+ Discord community members&lt;/td&gt;
&lt;td&gt;Active&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;🔒 Fixing security vulnerabilities&lt;/td&gt;
&lt;td&gt;Urgent&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;📢 Rebuilding brand awareness&lt;/td&gt;
&lt;td&gt;Challenging&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;
&lt;hr /&gt;
&lt;h2&gt;Deeper Lessons&lt;/h2&gt;
&lt;p&gt;&lt;strong&gt;For open source builders:&lt;/strong&gt;&lt;/p&gt;
&lt;p&gt;Building on corporate platforms means facing ambiguous trademark policies. A single legal letter can force you to rename, exposing you to account hijacking, scams, and chaos.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;For AI companies:&lt;/strong&gt;&lt;/p&gt;
&lt;p&gt;Your most passionate supporters are indie developers building quirky experimental tools. Sending legal notices to viral open source projects — ones driving your API usage — is a choice worth careful consideration.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;For users:&lt;/strong&gt;&lt;/p&gt;
&lt;p&gt;Self-hosting AI agents with root privileges is both powerful and dangerous. The security models for these tools are still immature. Don&apos;t run them on your main machine, and don&apos;t give them access to crypto wallets. Use dedicated hardware, isolated accounts, and strict IP whitelisting.&lt;/p&gt;
&lt;hr /&gt;
&lt;h2&gt;🤔 Final Thoughts: Is Anthropic Really the &quot;Righteous&quot; Party?&lt;/h2&gt;
&lt;p&gt;This isn&apos;t the first time Anthropic has angered the developer community.&lt;/p&gt;
&lt;p&gt;Less than three weeks earlier (January 9), Anthropic suddenly banned all users accessing Claude Pro/Max subscriptions through third-party tools, with no warning and no migration path. Developers who had deeply integrated Claude into their workflows were &quot;backstabbed&quot; overnight.&lt;/p&gt;
&lt;p&gt;Now there&apos;s the Clawdbot incident.&lt;/p&gt;
&lt;p&gt;A company that touts &quot;AI safety&quot; and &quot;responsible AI&quot; takes trademark action against an open source project whose name was obviously a good-faith pun and which was actively promoting the Claude ecosystem. The irony:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;strong&gt;Clawdbot drove more people to use the Claude API&lt;/strong&gt; → Anthropic makes more money&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Clawdbot demonstrated Claude&apos;s capabilities&lt;/strong&gt; → Free marketing material&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Clawdbot&apos;s developer was a Claude superfan&lt;/strong&gt; → Community evangelist&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;The result? A legal letter, a PR disaster, and a group of once-enthusiastic developers seriously considering migration to OpenAI.&lt;/p&gt;
&lt;p&gt;Anthropic&apos;s slogan is &quot;AI safety,&quot; but they seem more adept at &quot;developer hostility.&quot;&lt;/p&gt;
&lt;p&gt;When a company&apos;s legal department is more active than its product department, perhaps it&apos;s time to ask: &lt;strong&gt;Whose safety are they really protecting?&lt;/strong&gt; The users&apos; safety, or their own trademark empire?&lt;/p&gt;
&lt;p&gt;Once the trust of the open source community is lost, it&apos;s hard to rebuild. Anthropic should perhaps reconsider: in the marathon of AI, the real moat is technology and ecosystem, not legal letters.&lt;/p&gt;
&lt;hr /&gt;
&lt;p&gt;&lt;strong&gt;🔗 Related Links:&lt;/strong&gt;&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;New project homepage: &lt;a href=&quot;https://molt.bot&quot;&gt;molt.bot&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;GitHub: &lt;a href=&quot;https://github.com/moltbot&quot;&gt;github.com/moltbot&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;X account: &lt;a href=&quot;https://x.com/moltbot&quot;&gt;@moltbot&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;&lt;em&gt;This is the reality of the open source AI world: overnight fame, legal threats, crypto scams, security vulnerabilities — all within 72 hours.&lt;/em&gt; 🦞💥&lt;/p&gt;
</content:encoded></item><item><title>Claude&apos;s Founder at Davos: When Programmers No Longer Need to &apos;Write&apos; Code</title><link>https://blog.gujiakai.me/en/2026/01/dario-amodei-davos-interview/</link><guid isPermaLink="true">https://blog.gujiakai.me/en/2026/01/dario-amodei-davos-interview/</guid><description>Insights from Anthropic founder Dario Amodei&apos;s latest Davos interview: Claude&apos;s real capabilities, the rise of Chinese open source, and how we should adapt</description><pubDate>Thu, 22 Jan 2026 00:00:00 GMT</pubDate><content:encoded>&lt;h1&gt;Claude&apos;s Founder at Davos: When Programmers No Longer Need to &apos;Write&apos; Code&lt;/h1&gt;
&lt;blockquote&gt;
&lt;p&gt;Insights from Anthropic founder Dario Amodei&apos;s latest Davos interview: Claude&apos;s real capabilities, the rise of Chinese open source, and how we should adapt&lt;/p&gt;
&lt;/blockquote&gt;
&lt;hr /&gt;
&lt;p&gt;&lt;img src=&quot;https://cdn.sa.net/2026/01/22/ioNbRaxZGSFMWjz.webp&quot; alt=&quot;Dario Amodei at the Davos Forum interview&quot; /&gt;&lt;/p&gt;
&lt;hr /&gt;
&lt;p&gt;If you&apos;ve used Claude, you&apos;ve probably experienced this frustrating moment: you&apos;re in the middle of a great conversation, and suddenly your account gets suspended. You finally appeal and get it back, only to end up in the penalty box again a few days later.&lt;/p&gt;
&lt;p&gt;&lt;img src=&quot;https://cdn.sa.net/2026/01/22/jvknM79bQRqKh6B.webp&quot; alt=&quot;Claude&apos;s notorious account suspensions&quot; /&gt;&lt;/p&gt;
&lt;p&gt;In AI circles, Claude&apos;s &quot;ban-prone nature&quot; is almost a meme. But strangely enough, nine out of ten users who&apos;ve been banned still find their way back—because once you&apos;ve used it, you know &lt;strong&gt;this thing is genuinely powerful&lt;/strong&gt;.&lt;/p&gt;
&lt;p&gt;On January 20, 2026, Dario Amodei, co-founder and CEO of Anthropic (the company behind Claude), gave an interview to Bloomberg at the World Economic Forum in Davos. This usually low-profile AI leader shared plenty of insights: What makes Claude so strong? Has Chinese AI caught up? Will programmers face mass unemployment?&lt;/p&gt;
&lt;p&gt;Today, let&apos;s dive into this interview—and maybe pour some cold water on a few points where Amodei&apos;s views deserve some pushback.&lt;/p&gt;
&lt;hr /&gt;
&lt;h2&gt;I. &quot;Two Months Without Writing Code&quot;: AI Programming Isn&apos;t as Magical as It Sounds&lt;/h2&gt;
&lt;p&gt;&lt;img src=&quot;https://cdn.sa.net/2026/01/22/7MchpuKorYjNTzG.webp&quot; alt=&quot;Claude Code working under a programmer&apos;s guidance&quot; /&gt;&lt;/p&gt;
&lt;p&gt;The most eye-catching quote from the interview was about Anthropic&apos;s Claude Code product lead:&lt;/p&gt;
&lt;blockquote&gt;
&lt;p&gt;&quot;He hasn&apos;t written a single line of code in two months. Claude writes everything.&quot;&lt;/p&gt;
&lt;/blockquote&gt;
&lt;p&gt;At first glance, it sounds like programmers are about to become obsolete, right?&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Hold on—let&apos;s break down the actual meaning here.&lt;/strong&gt;&lt;/p&gt;
&lt;p&gt;First, &quot;not writing code&quot; doesn&apos;t mean &quot;not working.&quot; He is still designing system architecture, breaking down requirements, writing prompts, reviewing AI-generated code, debugging and testing, making technical decisions...&lt;/p&gt;
&lt;p&gt;In other words, he went from being &quot;someone who writes code&quot; to &quot;someone who directs AI to write code.&quot;&lt;/p&gt;
&lt;p&gt;It&apos;s like switching from manual to automatic transmission—sure, you don&apos;t need to work the clutch anymore, but you still need to know when to hit the gas and when to turn the wheel. &lt;strong&gt;Lose control of the wheel, and you&apos;ll still crash.&lt;/strong&gt;&lt;/p&gt;
&lt;p&gt;Amodei himself admitted in the interview that while AI&apos;s cognitive capabilities are growing exponentially, &quot;fully automated programming&quot; is still an unrealistic fantasy. No matter how strong Claude is, it still needs humans to guide it with precise prompts and professional judgment to ensure quality output.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;So here&apos;s the truth: Claude isn&apos;t replacing programmers—it&apos;s amplifying their capabilities.&lt;/strong&gt;&lt;/p&gt;
&lt;p&gt;A programmer who knows how to use Claude might be ten times more efficient than one who doesn&apos;t. But the prerequisite is that you need to be a competent programmer first—knowing what you want and being able to judge whether AI&apos;s output is correct.&lt;/p&gt;
&lt;hr /&gt;
&lt;h2&gt;II. Is Chinese AI Falling Behind? The Question Itself Is Wrong&lt;/h2&gt;
&lt;p&gt;&lt;img src=&quot;https://cdn.sa.net/2026/01/22/DphJBgIj6ytzmad.webp&quot; alt=&quot;Qwen models consistently rank #1 globally in downloads&quot; /&gt;&lt;/p&gt;
&lt;p&gt;There was an interesting exchange in the interview. The host asked Amodei: How&apos;s the competition with Chinese AI companies going?&lt;/p&gt;
&lt;p&gt;Amodei&apos;s answer: When competing for enterprise client contracts, we&apos;ve hardly ever lost to Chinese models.&lt;/p&gt;
&lt;p&gt;That sounds impressive, but think about it—&lt;strong&gt;this comparison isn&apos;t exactly fair.&lt;/strong&gt;&lt;/p&gt;
&lt;p&gt;What kind of product is Claude? It&apos;s backed by trillion-parameter large models, burning astronomical amounts of compute and funding, targeting the high-end enterprise market.&lt;/p&gt;
&lt;p&gt;Meanwhile, the most active force in Chinese AI is on a completely different track: &lt;strong&gt;open source&lt;/strong&gt;.&lt;/p&gt;
&lt;p&gt;DeepSeek, Qwen, GLM... These models might not match Claude on certain benchmarks, but they&apos;ve achieved something more important: &lt;strong&gt;making AI accessible to ordinary developers and small businesses.&lt;/strong&gt;&lt;/p&gt;
&lt;p&gt;You can deploy them on your own servers without worrying about data privacy. You can fine-tune them for your specific needs without being constrained by API limitations. Most importantly, &lt;strong&gt;the cost is lower by an order of magnitude or more&lt;/strong&gt;.&lt;/p&gt;
&lt;p&gt;This is what&apos;s called &quot;AI democratization&quot;—not every company can afford Claude&apos;s enterprise subscription, but almost every developer can run an open-source model.&lt;/p&gt;
&lt;p&gt;Amodei&apos;s assessment of Chinese AI in the interview has a bit of a &quot;let them eat cake&quot; flavor. He&apos;s speaking from the perspective of a top AI company CEO, seeing the competitive landscape in the premium market. But he may be underestimating the power of the open-source ecosystem—historically, Linux beating Unix and Android sweeping the mobile market weren&apos;t about being &quot;stronger,&quot; but about being &quot;more accessible.&quot;&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;The real AI landscape isn&apos;t a competition over who&apos;s stronger—it&apos;s a multi-layered ecosystem.&lt;/strong&gt; Claude can be the crown jewel, but Chinese open-source models are continuously lowering the barrier to AI, enabling more people to participate in this transformation.&lt;/p&gt;
&lt;hr /&gt;
&lt;h2&gt;III. Will Programmers Lose Their Jobs? It&apos;s a False Dichotomy&lt;/h2&gt;
&lt;p&gt;&lt;img src=&quot;https://cdn.sa.net/2026/01/22/IlEhNZxivVeqatQ.webp&quot; alt=&quot;Programmers collaborating with AI tools&quot; /&gt;&lt;/p&gt;
&lt;p&gt;In the interview, the host asked a pointed question: Will AI cause mass unemployment?&lt;/p&gt;
&lt;p&gt;Amodei&apos;s answer was honest: We might see rapid GDP growth and rising unemployment at the same time.&lt;/p&gt;
&lt;p&gt;That&apos;s fair enough, but I want to look at this question from a different angle.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Instead of asking &quot;will programmers lose their jobs,&quot; ask &quot;what kind of programmers will lose their jobs.&quot;&lt;/strong&gt;&lt;/p&gt;
&lt;p&gt;Every technological revolution in history has seen some people eliminated and others rise. When Excel appeared, those skilled at the abacus lost their advantage. When CAD became widespread, hand-drafting skills became less valuable. But the professions of accountant and engineer didn&apos;t disappear.&lt;/p&gt;
&lt;p&gt;AI programming tools follow the same logic.&lt;/p&gt;
&lt;p&gt;Those who&apos;ll be eliminated are the ones who can only mechanically type code, don&apos;t understand business logic, and can&apos;t ask questions—the &quot;code monkeys.&quot;&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Those who&apos;ll thrive are the ones who can use AI as a &quot;super assistant&quot;:&lt;/strong&gt;&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Can precisely describe requirements to get high-quality code from AI&lt;/li&gt;
&lt;li&gt;Can quickly review AI output and spot the pitfalls&lt;/li&gt;
&lt;li&gt;Can integrate AI into their workflow to dramatically boost efficiency&lt;/li&gt;
&lt;li&gt;Most importantly, &lt;strong&gt;can continuously learn new tools and methods&lt;/strong&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;Amodei said people at his company &quot;haven&apos;t written code in two months,&quot; but what he didn&apos;t mention is that these people are learning how to use AI better every single day.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;That&apos;s the real lesson: it&apos;s not enough to learn one tool—you need to develop the ability for &quot;continuous learning.&quot;&lt;/strong&gt;&lt;/p&gt;
&lt;p&gt;Claude is powerful today, but tomorrow something stronger might come along. Today&apos;s prompt engineering techniques might be obsolete next year. The only constant is change itself.&lt;/p&gt;
&lt;hr /&gt;
&lt;h2&gt;IV. Final Thoughts: Stay Clear-Headed, Stay Curious&lt;/h2&gt;
&lt;p&gt;&lt;img src=&quot;https://cdn.sa.net/2026/01/22/itwsekgHoLd9bDy.webp&quot; alt=&quot;A heartwarming scene of human-AI collaboration&quot; /&gt;&lt;/p&gt;
&lt;p&gt;In this interview, Amodei displayed the typical perspective of an AI company CEO: confident in his own product, cautious about competitors, both optimistic and careful about the future.&lt;/p&gt;
&lt;p&gt;But as ordinary people, we don&apos;t need to accept any leader&apos;s views wholesale.&lt;/p&gt;
&lt;p&gt;Claude is indeed powerful, but it&apos;s not the only option, nor is it omnipotent. Chinese open-source models may fall short in some areas, but they&apos;re bringing the benefits of AI to more people. Programmers do face challenges, but where there are challenges, there are opportunities.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;If I had to summarize the takeaway from this interview in one sentence, it would be:&lt;/strong&gt;&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;AI is a tool, not magic. Those who learn to use it will become stronger; those who expect it to think for them will eventually be left behind.&lt;/strong&gt;&lt;/p&gt;
&lt;p&gt;As for Claude&apos;s account suspension issues... well, use it while you can.&lt;/p&gt;
&lt;hr /&gt;
&lt;p&gt;&lt;em&gt;This article is based on Bloomberg&apos;s Davos interview from January 20, 2026. Views expressed are the author&apos;s own.&lt;/em&gt;&lt;/p&gt;
&lt;hr /&gt;
&lt;p&gt;&lt;strong&gt;[Discussion Topic]&lt;/strong&gt;&lt;/p&gt;
&lt;p&gt;Have you used AI programming tools at work? How was the experience? Feel free to share your stories in the comments~&lt;/p&gt;
&lt;hr /&gt;
&lt;p&gt;&lt;strong&gt;References&lt;/strong&gt;&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;a href=&quot;https://www.youtube.com/watch?v=Ckt1cj0xjRM&quot;&gt;Anthropic&apos;s Amodei on AI: Power and Risk&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;
</content:encoded></item><item><title>2:30 AM Inspiration: Why Google&apos;s Hottest AI Model Is Called &apos;Nano Banana&apos;</title><link>https://blog.gujiakai.me/en/2026/01/how-nano-banana-got-its-name/</link><guid isPermaLink="true">https://blog.gujiakai.me/en/2026/01/how-nano-banana-got-its-name/</guid><description>Why is Google DeepMind&apos;s image generation model called Nano Banana? Turns out a product manager, pressured to submit a codename at 2:30 AM, casually combined her two nicknames. A joke-like naming decision became one of the most viral AI product names ever.</description><pubDate>Sun, 18 Jan 2026 00:00:00 GMT</pubDate><content:encoded>&lt;h1&gt;2:30 AM Inspiration: Why Google&apos;s Hottest AI Model Is Called &quot;Nano Banana&quot;&lt;/h1&gt;
&lt;p&gt;&lt;img src=&quot;https://cdn.sa.net/2026/01/18/27nFLT9kPYlCatV.webp&quot; alt=&quot;Nano Banana Pro Generated Google Logo Banana Image&quot; /&gt;&lt;/p&gt;
&lt;hr /&gt;
&lt;p&gt;Starting in the middle of last year, a Google AI model went viral, not because of how powerful it is (though it certainly is powerful), but because of its name: &lt;strong&gt;Nano Banana&lt;/strong&gt;.&lt;/p&gt;
&lt;p&gt;Yes, you read that right. A serious AI image generation model with a name like &quot;Nano Banana.&quot;&lt;/p&gt;
&lt;p&gt;What&apos;s the story behind this?&lt;/p&gt;
&lt;hr /&gt;
&lt;h2&gt;It All Started with a 2:30 AM Message&lt;/h2&gt;
&lt;p&gt;The story begins last July.&lt;/p&gt;
&lt;p&gt;At the time, the Google DeepMind team was preparing to launch a new image generation model on LMArena (an AI model evaluation platform). The technical name was already set—Gemini 2.5 Flash Image—but the platform needed a public codename.&lt;/p&gt;
&lt;p&gt;The problem was—everyone kept putting it off.&lt;/p&gt;
&lt;p&gt;Until 2:30 AM the night before launch, when a colleague messaged product manager Naina Raisinghani:&lt;/p&gt;
&lt;blockquote&gt;
&lt;p&gt;&quot;We need to submit the codename now.&quot;&lt;/p&gt;
&lt;/blockquote&gt;
&lt;p&gt;&lt;img src=&quot;https://cdn.sa.net/2026/01/18/6yFzj7skhpU2ilT.webp&quot; alt=&quot;We Need to Submit the Codename Now&quot; /&gt;&lt;/p&gt;
&lt;hr /&gt;
&lt;h2&gt;&quot;How About Nano Banana?&quot;&lt;/h2&gt;
&lt;p&gt;Drowsy and half-asleep, Naina blurted out the first idea that came to mind: &lt;strong&gt;Nano Banana&lt;/strong&gt;.&lt;/p&gt;
&lt;p&gt;Why this name? It turns out it came from her own nicknames:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Friends called her &lt;strong&gt;Naina Banana&lt;/strong&gt; (because Naina rhymes with Banana)&lt;/li&gt;
&lt;li&gt;Some also called her &lt;strong&gt;Nano&lt;/strong&gt; (because she&apos;s petite and loves computers)&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;So she combined her two nicknames—&lt;strong&gt;Nano Banana&lt;/strong&gt;.&lt;/p&gt;
&lt;p&gt;And the name was surprisingly fitting: since this was a Flash (lightning-fast) model, Nano (meaning tiny) perfectly hinted at its lightweight and speedy nature.&lt;/p&gt;
&lt;p&gt;Just like that, a casual suggestion at 2:30 AM became the official codename.&lt;/p&gt;
&lt;hr /&gt;
&lt;h2&gt;Unexpectedly, It Went Viral&lt;/h2&gt;
&lt;p&gt;In early August, Nano Banana launched on LMArena.&lt;/p&gt;
&lt;p&gt;Users discovered the model&apos;s image editing capabilities were quite impressive—it could maintain facial similarity while cleverly blending multiple images together.&lt;/p&gt;
&lt;p&gt;But what left an even stronger impression was this quirky name.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;&quot;What the heck is Nano Banana?&quot;&lt;/strong&gt;&lt;br /&gt;
&lt;strong&gt;&quot;This name is too cute!&quot;&lt;/strong&gt;&lt;/p&gt;
&lt;p&gt;&lt;img src=&quot;https://cdn.sa.net/2026/01/18/MkONf43FJdvLhE8.webp&quot; alt=&quot;Social Media Discussions About Nano Banana&quot; /&gt;&lt;/p&gt;
&lt;p&gt;The name spread rapidly on social media, with users from different regions creating localized memes around it.&lt;/p&gt;
&lt;hr /&gt;
&lt;h2&gt;From Joke to Official Branding&lt;/h2&gt;
&lt;p&gt;What happened next you probably know—Nano Banana became one of the highest-rated image editing models globally.&lt;/p&gt;
&lt;p&gt;Google embraced the serendipity, fully incorporating &quot;banana&quot; elements into the brand design. The latest version has even been upgraded to &lt;strong&gt;Nano Banana Pro&lt;/strong&gt; (powered by Gemini 3 Pro Image).&lt;/p&gt;
&lt;p&gt;&lt;img src=&quot;https://cdn.sa.net/2026/01/18/YslLKvd2ozSxFB8.webp&quot; alt=&quot;Nano Banana Pro Promotional Image&quot; /&gt;&lt;/p&gt;
&lt;hr /&gt;
&lt;h2&gt;Final Thoughts&lt;/h2&gt;
&lt;p&gt;A flash of inspiration at 2:30 AM, a small joke with personal warmth, ultimately became one of the most viral names in Google&apos;s AI product lineup.&lt;/p&gt;
&lt;p&gt;This story teaches us:&lt;/p&gt;
&lt;ol&gt;
&lt;li&gt;&lt;strong&gt;Sometimes the best ideas come when you&apos;re relaxed&lt;/strong&gt;&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Never underestimate a &quot;casually chosen name&quot;&lt;/strong&gt;&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Great product + great name = viral spread&lt;/strong&gt;&lt;/li&gt;
&lt;/ol&gt;
&lt;p&gt;Next time you&apos;re naming a project, maybe try 2:30 AM?&lt;/p&gt;
&lt;p&gt;(Just kidding. Get some sleep.)&lt;/p&gt;
&lt;hr /&gt;
&lt;p&gt;&lt;strong&gt;#Google #AI #NanoBanana #ArtificialIntelligence #TechTrivia&lt;/strong&gt;&lt;/p&gt;
&lt;hr /&gt;
&lt;p&gt;&lt;strong&gt;References:&lt;/strong&gt;&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;a href=&quot;https://blog.google/products-and-platforms/products/gemini/how-nano-banana-got-its-name/&quot;&gt;How Nano Banana got its name - Google Blog&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;
</content:encoded></item><item><title>2025: The Year LLMs Changed Everything - A Deep Dive into Simon Willison&apos;s Year-End Review</title><link>https://blog.gujiakai.me/en/2026/01/simon-willison-2025-year-in-llms/</link><guid isPermaLink="true">https://blog.gujiakai.me/en/2026/01/simon-willison-2025-year-in-llms/</guid><description>An analysis of Django co-founder Simon Willison&apos;s 2025 LLM year-end summary: reasoning models changed everything, Claude Code hit $1B ARR, Chinese open-source models dominated the rankings, OpenAI lost its lead, and $200/month subscriptions became the new standard.</description><pubDate>Thu, 01 Jan 2026 00:00:00 GMT</pubDate><content:encoded>&lt;h1&gt;2025: The Year LLMs Changed Everything - A Deep Dive into Simon Willison&apos;s Year-End Review&lt;/h1&gt;
&lt;blockquote&gt;
&lt;p&gt;&lt;strong&gt;Original Article&lt;/strong&gt;: &lt;a href=&quot;https://simonwillison.net/2025/Dec/31/the-year-in-llms/&quot;&gt;2025: The year in LLMs&lt;/a&gt; - Simon Willison&lt;/p&gt;
&lt;p&gt;This analysis is based on Simon Willison&apos;s year-end summary, and is a tribute to the Django co-creator and one of the sharpest observers in the LLM space.&lt;/p&gt;
&lt;/blockquote&gt;
&lt;hr /&gt;
&lt;h2&gt;Preface: Why You Should Take Simon Willison Seriously&lt;/h2&gt;
&lt;p&gt;Simon Willison isn&apos;t one of those AI evangelists who just hypes everything up. He&apos;s the co-creator of the Django framework, the person who coined the term &quot;prompt injection,&quot; and a board member of the Python Software Foundation. More importantly, he&apos;s a developer who uses LLMs for real work every day. In 2025, he built 110 tools with AI assistance.&lt;/p&gt;
&lt;p&gt;When someone like this says &quot;2025 was the year of XXX,&quot; it&apos;s worth paying attention.&lt;/p&gt;
&lt;hr /&gt;
&lt;h2&gt;Key Insight #1: Reasoning Models Changed Everything&lt;/h2&gt;
&lt;p&gt;&lt;strong&gt;Simon&apos;s Take&lt;/strong&gt;: Reasoning isn&apos;t about making AI count how many R&apos;s are in &quot;strawberry&quot;—it&apos;s about teaching AI to &lt;strong&gt;work with tools&lt;/strong&gt;.&lt;/p&gt;
&lt;blockquote&gt;
&lt;p&gt;&quot;The real unlock of reasoning was in driving tools. Reasoning models with access to tools can plan out multi-step tasks, execute on them and continue to reason about the results.&quot;&lt;/p&gt;
&lt;/blockquote&gt;
&lt;p&gt;&lt;strong&gt;My Analysis&lt;/strong&gt;:&lt;/p&gt;
&lt;p&gt;When o1 launched in late 2024, most people&apos;s reaction was: &quot;Oh, it can do math problems now. What does that have to do with me?&quot; This thinking was completely wrong.&lt;/p&gt;
&lt;p&gt;The real value of reasoning models lies in:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;strong&gt;Planning ability&lt;/strong&gt;: Breaking complex tasks into executable steps&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Reflection ability&lt;/strong&gt;: Checking results after execution, adjusting strategies&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Tool coordination&lt;/strong&gt;: Simultaneously invoking search, code execution, file operations, and other tools&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;What does this mean? It means AI evolved from a &quot;Q&amp;amp;A machine&quot; into an &quot;executor.&quot;&lt;/p&gt;
&lt;p&gt;&lt;img src=&quot;https://cdn.sa.net/2026/01/01/x4k1s3bDohnzfaS.webp&quot; alt=&quot;Reasoning Model Workflow&quot; /&gt;&lt;/p&gt;
&lt;hr /&gt;
&lt;h2&gt;Key Insight #2: Agents Went from &quot;Sci-Fi&quot; to &quot;Practical&quot;&lt;/h2&gt;
&lt;p&gt;&lt;strong&gt;Simon&apos;s prediction at year start&lt;/strong&gt;: Agents won&apos;t happen.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Simon&apos;s admission at year end&lt;/strong&gt;: I was half wrong.&lt;/p&gt;
&lt;blockquote&gt;
&lt;p&gt;&quot;I didn&apos;t think agents would happen because I didn&apos;t think the gullibility problem could be solved... But if you define agents as LLM systems that can perform useful work via tool calls over multiple steps then agents are here.&quot;&lt;/p&gt;
&lt;/blockquote&gt;
&lt;p&gt;&lt;strong&gt;My Analysis&lt;/strong&gt;:&lt;/p&gt;
&lt;p&gt;Simon&apos;s &quot;eating his words&quot; is actually quite enlightening. Where was he wrong? He imagined Agents as omnipotent assistants from sci-fi movies. But what are the Agents that actually shipped? &lt;strong&gt;Claude Code&lt;/strong&gt;, &lt;strong&gt;Codex CLI&lt;/strong&gt;—tools that can write code, run tests, and submit PRs for you.&lt;/p&gt;
&lt;p&gt;Key insights:&lt;/p&gt;
&lt;ol&gt;
&lt;li&gt;&lt;strong&gt;Agent ≠ general-purpose intelligent assistant&lt;/strong&gt;, but rather &lt;strong&gt;domain-specific automation executor&lt;/strong&gt;&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Code&lt;/strong&gt; became the most mature use case for Agents, because the results of running code are verifiable&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Search&lt;/strong&gt; is the second most mature use case: deep research mode actually works now&lt;/li&gt;
&lt;/ol&gt;
&lt;p&gt;Simon offers a pragmatic Agent definition: &lt;strong&gt;&quot;An LLM agent runs tools in a loop to achieve a goal.&quot;&lt;/strong&gt; Not fancy, but effective.&lt;/p&gt;
&lt;hr /&gt;
&lt;h2&gt;Key Insight #3: Claude Code Is the Most Important Product of 2025&lt;/h2&gt;
&lt;p&gt;&lt;strong&gt;Simon&apos;s exact words&lt;/strong&gt;: &quot;The most impactful event of 2025 happened in February, with the quiet release of Claude Code.&quot;&lt;/p&gt;
&lt;p&gt;This might surprise many people. Not GPT-5? Not DeepSeek R1&apos;s market impact? A &lt;strong&gt;command-line tool&lt;/strong&gt;?&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;My Analysis&lt;/strong&gt;:&lt;/p&gt;
&lt;p&gt;Claude Code represents a paradigm shift—&lt;strong&gt;LLMs moving from chat interfaces to the terminal&lt;/strong&gt;.&lt;/p&gt;
&lt;p&gt;Why does this matter?&lt;/p&gt;
&lt;ol&gt;
&lt;li&gt;&lt;strong&gt;Developers&apos; natural habitat&lt;/strong&gt;: The terminal is the most familiar environment for developers. Pipes, redirects, script composition—Unix philosophy merges perfectly with LLMs&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;$1 billion ARR validation&lt;/strong&gt;: Anthropic announced that Claude Code reached $1 billion in annualized run-rate revenue. A CLI tool! This shows professional users are willing to pay for truly useful AI tools&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Asynchronous execution breakthrough&lt;/strong&gt;: Claude Code for web can run in the background. Send a task, grab a coffee, come back and your PR is ready&lt;/li&gt;
&lt;/ol&gt;
&lt;p&gt;On SWE-rebench, a decontaminated software engineering benchmark, Claude Code leads by a wide margin. Claude Code paired with Claude Opus 4.5 is the ultimate Vibe Coding combo. For bug fixes and code review, OpenAI&apos;s Codex running GPT-5.2 at the xhigh reasoning setting excels.&lt;/p&gt;
&lt;p&gt;&lt;img src=&quot;https://cdn.sa.net/2026/01/01/o1pJ3teDyciH9dZ.webp&quot; alt=&quot;Claude Code Leads on SWE-rebench&quot; /&gt;&lt;/p&gt;
&lt;hr /&gt;
&lt;h2&gt;Key Insight #4: Chinese Open-Source Models Rose to Dominance&lt;/h2&gt;
&lt;p&gt;&lt;strong&gt;Simon&apos;s data&lt;/strong&gt;: On the Artificial Analysis leaderboard, the top five open-source models are &lt;strong&gt;all from China&lt;/strong&gt;.&lt;/p&gt;
&lt;blockquote&gt;
&lt;p&gt;&quot;GLM-4.7, Kimi K2 Thinking, MiMo-V2-Flash, DeepSeek V3.2, MiniMax-M2.1 are all Chinese open weight models.&quot;&lt;/p&gt;
&lt;/blockquote&gt;
&lt;p&gt;&lt;img src=&quot;https://cdn.sa.net/2026/01/01/8JKmB1CFXaQvglc.webp&quot; alt=&quot;Top 5 Open-Source Models on Artificial Analysis Are All Chinese&quot; /&gt;&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;My Analysis&lt;/strong&gt;:&lt;/p&gt;
&lt;p&gt;DeepSeek R1 launched on January 20, 2025. A week later, on January 27, NVIDIA&apos;s market cap dropped by nearly $600 billion in a single day. This wasn&apos;t a tech event; it was a geopolitical event.&lt;/p&gt;
&lt;p&gt;Key facts:&lt;/p&gt;
&lt;ol&gt;
&lt;li&gt;DeepSeek V3&apos;s final training run reportedly cost about $5.5 million, while US companies spend hundreds of millions on comparable models&lt;/li&gt;
&lt;li&gt;These models aren&apos;t just &quot;open weight&quot;: many are &lt;strong&gt;truly open source&lt;/strong&gt;, released under MIT or Apache 2.0 licenses&lt;/li&gt;
&lt;li&gt;While training code and datasets aren&apos;t public, detailed technical papers have advanced the entire industry&lt;/li&gt;
&lt;/ol&gt;
&lt;p&gt;What does this mean for you?&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;The barrier to locally deploying top-tier models dropped significantly&lt;/li&gt;
&lt;li&gt;The reference point for API costs has been redefined&lt;/li&gt;
&lt;li&gt;The &quot;AI is a US monopoly&quot; narrative has been shattered&lt;/li&gt;
&lt;/ul&gt;
&lt;hr /&gt;
&lt;h2&gt;Key Insight #5: OpenAI Lost Its Lead&lt;/h2&gt;
&lt;p&gt;&lt;strong&gt;Simon&apos;s assessment&lt;/strong&gt;: &quot;This year the rest of the industry caught up.&quot;&lt;/p&gt;
&lt;p&gt;This doesn&apos;t mean OpenAI got worse, but rather:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Image generation was surpassed by Google Nano Banana&lt;/li&gt;
&lt;li&gt;Code capability was challenged by Claude Opus 4.5&lt;/li&gt;
&lt;li&gt;Its open-weight models were outclassed by Chinese vendors&lt;/li&gt;
&lt;li&gt;Audio API was threatened by Gemini Live&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;&lt;strong&gt;My Analysis&lt;/strong&gt;:&lt;/p&gt;
&lt;p&gt;OpenAI&apos;s advantage now is mainly &lt;strong&gt;brand recognition&lt;/strong&gt;—&quot;Nobody knows LLMs, but everyone&apos;s heard of ChatGPT.&quot; But in professional developer circles, this advantage is eroding.&lt;/p&gt;
&lt;p&gt;After Google released Gemini 3 in November, OpenAI declared an internal &quot;Code Red&quot; in December, a rare open sign that the company was feeling competitive pressure.&lt;/p&gt;
&lt;p&gt;A deeper issue: Google has its own TPUs and doesn&apos;t need to pay the &quot;GPU tax&quot; to NVIDIA. When training cost is a core competitive factor, this is a structural advantage.&lt;/p&gt;
&lt;hr /&gt;
&lt;h2&gt;Key Insight #6: $200/Month Subscriptions Became the New Standard&lt;/h2&gt;
&lt;p&gt;&lt;strong&gt;Fact&lt;/strong&gt;: Claude Max, ChatGPT Pro, and Google AI Ultra all landed at or near the $200/month tier (Google AI Ultra lists at $249.99/month in the US).&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Simon&apos;s personal experience&lt;/strong&gt;: &quot;I&apos;ve personally paid $100/month for Claude... I&apos;ve heard from plenty of other people who are happy to pay these prices too.&quot;&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;My Analysis&lt;/strong&gt;:&lt;/p&gt;
&lt;p&gt;This reveals a bifurcation:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;strong&gt;Casual users&lt;/strong&gt;: Free or $20/month is enough&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Power users&lt;/strong&gt;: $200/month is a good deal&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;Why is it worth it? Because Coding Agents &lt;strong&gt;consume tokens like crazy&lt;/strong&gt;. If you&apos;re using Claude Code daily for complex tasks, pay-as-you-go API billing could easily exceed $200 a month.&lt;/p&gt;
&lt;p&gt;This also means: &lt;strong&gt;LLMs are transitioning from &quot;novelty toy&quot; to &quot;professional tool&quot;&lt;/strong&gt;. Professional tools deserve professional pricing.&lt;/p&gt;
&lt;hr /&gt;
&lt;h2&gt;Key Insight #7: YOLO Mode and the Danger of &quot;Normalization of Deviance&quot;&lt;/h2&gt;
&lt;p&gt;&lt;strong&gt;Simon&apos;s warning&lt;/strong&gt;: &quot;The longer we get away with running these systems in fundamentally insecure ways, the closer we are getting to a Challenger disaster of our own.&quot;&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Context&lt;/strong&gt;: YOLO mode = letting Coding Agents auto-execute all operations without human confirmation.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;My Analysis&lt;/strong&gt;:&lt;/p&gt;
&lt;p&gt;This is Simon&apos;s most serious warning in the article. He cites sociologist Diane Vaughan&apos;s research on the Challenger space shuttle disaster—engineers knew about O-ring problems long before, but because multiple launches went fine, the risk was &quot;normalized.&quot;&lt;/p&gt;
&lt;p&gt;The AI analogy:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;You run Claude Code in YOLO mode daily without incident&lt;/li&gt;
&lt;li&gt;You start thinking prompt injection is only a theoretical risk&lt;/li&gt;
&lt;li&gt;Until one day, a malicious instruction actually deletes your home directory&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;Johann Rehberger calls this &quot;&lt;strong&gt;normalization of deviance in the AI space&lt;/strong&gt;.&quot; Simon clearly agrees.&lt;/p&gt;
&lt;hr /&gt;
&lt;h2&gt;Key Insight #8: MCP Might Be a Flash in the Pan&lt;/h2&gt;
&lt;p&gt;&lt;strong&gt;Simon&apos;s observation&lt;/strong&gt;: &quot;The reason I think MCP may be a one-year wonder is the stratospheric growth of coding agents.&quot;&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Core argument&lt;/strong&gt;: When Agents can run arbitrary Bash commands, who needs MCP?&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;My Analysis&lt;/strong&gt;:&lt;/p&gt;
&lt;p&gt;MCP (Model Context Protocol) was launched by Anthropic in November 2024 and exploded in early 2025. In May, OpenAI, Anthropic, and Mistral all rolled out API-level support within eight days of each other.&lt;/p&gt;
&lt;p&gt;But Simon points out an awkward fact: &lt;strong&gt;Bash is the ultimate tool&lt;/strong&gt;. An Agent that can run shell commands can invoke any CLI tool—git, gh, ffmpeg, curl—why wrap another layer of MCP?&lt;/p&gt;
&lt;p&gt;Anthropic itself seems to have realized this, launching the lighter &lt;strong&gt;Skills&lt;/strong&gt; mechanism: a Markdown file plus optional scripts, much simpler than MCP&apos;s JSON-RPC server.&lt;/p&gt;
&lt;hr /&gt;
&lt;h2&gt;Key Insight #9: Local Models Are Good, But Cloud Models Are Better&lt;/h2&gt;
&lt;p&gt;&lt;strong&gt;Simon&apos;s mixed feelings&lt;/strong&gt;:&lt;/p&gt;
&lt;blockquote&gt;
&lt;p&gt;&quot;I got small amounts of real work done offline! My excitement for local LLMs was very much rekindled.&quot;&lt;/p&gt;
&lt;/blockquote&gt;
&lt;p&gt;But also:&lt;/p&gt;
&lt;blockquote&gt;
&lt;p&gt;&quot;I have yet to try a local model that handles Bash tool calls reliably enough for me to trust that model to operate a coding agent on my device.&quot;&lt;/p&gt;
&lt;/blockquote&gt;
&lt;p&gt;&lt;strong&gt;My Analysis&lt;/strong&gt;:&lt;/p&gt;
&lt;p&gt;Local models indeed improved massively in 2025:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Mistral Small 3 (24B) ≈ GPT-4 level, runs on 64GB laptops&lt;/li&gt;
&lt;li&gt;20-32B parameter range became the sweet spot&lt;/li&gt;
&lt;li&gt;Can do some real work offline&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;But the problem is &lt;strong&gt;reliability&lt;/strong&gt;. Coding Agents need models to invoke tools reliably dozens or even hundreds of times in a row. Local models can&apos;t do that yet.&lt;/p&gt;
&lt;p&gt;Simon&apos;s conclusion: his next laptop will have at least 128GB of RAM, but the main workhorse remains frontier cloud models.&lt;/p&gt;
&lt;hr /&gt;
&lt;h2&gt;Key Insight #10: &quot;Slop&quot; Became Word of the Year&lt;/h2&gt;
&lt;p&gt;&lt;strong&gt;Merriam-Webster&apos;s definition&lt;/strong&gt;: &quot;Digital content of low quality that is produced usually in quantity by means of artificial intelligence&quot;&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Simon&apos;s optimistic lean&lt;/strong&gt;:&lt;/p&gt;
&lt;blockquote&gt;
&lt;p&gt;&quot;The internet has always been flooded with low quality content. The challenge, as ever, is to find and amplify the good stuff.&quot;&lt;/p&gt;
&lt;/blockquote&gt;
&lt;p&gt;&lt;strong&gt;My Analysis&lt;/strong&gt;:&lt;/p&gt;
&lt;p&gt;The popularity of &quot;Slop&quot; (AI junk content) as a word reflects growing public vigilance toward AI-generated content. This is good.&lt;/p&gt;
&lt;p&gt;But Simon raises a deeper question: &lt;strong&gt;Can you perceive slop&apos;s impact?&lt;/strong&gt;&lt;/p&gt;
&lt;p&gt;His own answer: Probably not. Because he doesn&apos;t use Facebook and carefully curates his information sources. For average users who don&apos;t? They might be drowning in slop without knowing it.&lt;/p&gt;
&lt;hr /&gt;
&lt;h2&gt;Key Insight #11: Data Centers Are Becoming Extremely Unpopular&lt;/h2&gt;
&lt;p&gt;&lt;strong&gt;Fact&lt;/strong&gt;: Over 200 environmental groups demanded a moratorium on new US data center construction.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Simon&apos;s focus&lt;/strong&gt;: Water resource concerns might be overstated (a distraction), but energy consumption is &lt;strong&gt;real&lt;/strong&gt;.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;My Analysis&lt;/strong&gt;:&lt;/p&gt;
&lt;p&gt;This is the only section touching on AI ethics/social impact, and Simon&apos;s stance is cautious.&lt;/p&gt;
&lt;p&gt;He points out the &lt;strong&gt;Jevons paradox&lt;/strong&gt;: Cost per token drops → users consume more tokens → total energy consumption rises instead of falling.&lt;/p&gt;
&lt;p&gt;$200/month subscription users might consume 10x the compute resources of $20 users. Efficiency gains are offset by usage growth.&lt;/p&gt;
&lt;hr /&gt;
&lt;h2&gt;My Summary: The Thinking Framework Simon Willison Teaches Us&lt;/h2&gt;
&lt;p&gt;After reading this 13,000-word year-end summary, what I learned isn&apos;t just 26 trends, but a &lt;strong&gt;methodology for observing the AI industry&lt;/strong&gt;:&lt;/p&gt;
&lt;ol&gt;
&lt;li&gt;&lt;strong&gt;Hands-on practice&lt;/strong&gt;: Simon isn&apos;t a commentator—he built 110 tools and uses these technologies daily&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Admitting mistakes&lt;/strong&gt;: He predicted Agents wouldn&apos;t happen at year start, and candidly admitted he was half wrong at year end&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Defining terms&lt;/strong&gt;: &quot;prompt injection,&quot; &quot;slop,&quot; &quot;lethal trifecta&quot;—clear concepts are prerequisites for clear thinking&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Security awareness&lt;/strong&gt;: Even while using YOLO mode daily, he doesn&apos;t forget to warn about &quot;Challenger disaster&quot; risks&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Staying curious&lt;/strong&gt;: A 44-year-old Django co-creator still experimenting with programming on his phone&lt;/li&gt;
&lt;/ol&gt;
&lt;p&gt;If you want to keep up with LLM developments, there&apos;s no better way than following Simon Willison.&lt;/p&gt;
&lt;hr /&gt;
&lt;h2&gt;Appendix: Key Terms Coined or Highlighted by Simon Willison in 2025&lt;/h2&gt;
&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Term&lt;/th&gt;
&lt;th&gt;Meaning&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Vibe Coding&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Generating code entirely through prompts, &quot;forgetting the code exists&quot;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;The Lethal Trifecta&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Access to private data + ability to communicate externally + exposure to untrusted content&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Context Rot&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Model output quality degrading as conversations grow longer&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Slopsquatting&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Registering malicious packages using package names hallucinated by LLMs&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Asynchronous Coding Agent&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Tools that run in the background and submit PRs when complete&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;
&lt;hr /&gt;
&lt;blockquote&gt;
&lt;p&gt;&lt;strong&gt;Original&lt;/strong&gt;: &lt;a href=&quot;https://simonwillison.net/2025/Dec/31/the-year-in-llms/&quot;&gt;2025: The year in LLMs&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;If you found this analysis valuable, subscribe to Simon&apos;s blog: RSS, email, or Bluesky/Mastodon. $10/month also gets you his monthly newsletter.&lt;/p&gt;
&lt;/blockquote&gt;
&lt;h2&gt;Notes&lt;/h2&gt;
&lt;p&gt;This article was written by the author in collaboration with Claude Opus 4.5 and Gemini 3 Pro.&lt;/p&gt;
</content:encoded></item><item><title>AI News - July 30, 2025</title><link>https://blog.gujiakai.me/en/2025/07/ai-news-1/</link><guid isPermaLink="true">https://blog.gujiakai.me/en/2025/07/ai-news-1/</guid><description>AI News Roundup for July 30, 2025</description><pubDate>Wed, 30 Jul 2025 00:00:00 GMT</pubDate><content:encoded>&lt;h2&gt;Open Source&lt;/h2&gt;
&lt;ol&gt;
&lt;li&gt;Qwen3-30B-A3B Minor Update&lt;/li&gt;
&lt;/ol&gt;
&lt;p&gt;Qwen recently released a minor update to Qwen3-30B-A3B, called Qwen3-30B-A3B-Instruct-2507. This efficient Mixture of Experts (MoE) model activates only 3B parameters while achieving performance close to GPT-4o and Qwen3-235B-A22B in non-thinking mode. Key improvements include:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Enhanced reasoning, coding, and mathematical capabilities&lt;/li&gt;
&lt;li&gt;Expanded multilingual knowledge coverage&lt;/li&gt;
&lt;li&gt;Improved long-context understanding, supporting up to 256K tokens&lt;/li&gt;
&lt;li&gt;Better alignment with user intent and handling of open-ended tasks&lt;/li&gt;
&lt;li&gt;Removed &lt;code&gt;&amp;lt;think&amp;gt;&lt;/code&gt; blocks for more direct and efficient responses&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;This update makes the model smarter, faster, and easier to deploy locally, suitable for various complex tasks such as instruction following, logical reasoning, and tool use.&lt;/p&gt;
&lt;blockquote&gt;
&lt;p&gt;Commentary: Good news for open source and experimentation.&lt;/p&gt;
&lt;/blockquote&gt;
&lt;p&gt;Official Tweet: &lt;a href=&quot;https://x.com/Alibaba_Qwen/status/1950227114793586867&quot;&gt;https://x.com/Alibaba_Qwen/status/1950227114793586867&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;Model Repository: &lt;a href=&quot;https://huggingface.co/Qwen/Qwen3-30B-A3B-Instruct-2507&quot;&gt;https://huggingface.co/Qwen/Qwen3-30B-A3B-Instruct-2507&lt;/a&gt;&lt;/p&gt;
&lt;h2&gt;Closed Source&lt;/h2&gt;
&lt;ol&gt;
&lt;li&gt;ChatGPT Study Mode&lt;/li&gt;
&lt;/ol&gt;
&lt;p&gt;OpenAI today launched &quot;Study Mode&quot; for ChatGPT, a learning experience designed to help users work through problems step-by-step rather than providing direct answers. This mode uses guided questioning, step-by-step explanations, and interactive approaches to enhance critical thinking and learning outcomes, particularly useful for homework help, exam preparation, and exploring new knowledge.&lt;/p&gt;
&lt;p&gt;The feature is now available to logged-in users on Free, Plus, Pro, and Team tiers. ChatGPT Edu users will get access in the coming weeks. This update is seen as a responsible application of AI in education, aimed at reducing dependency on generative AI while promoting deeper learning.&lt;/p&gt;
&lt;p&gt;&lt;img src=&quot;https://cdn.sa.net/2025/07/30/Glty2OPrNMJC3kD.webp&quot; alt=&quot;ChatGPT Study Mode Experience&quot; /&gt;&lt;/p&gt;
&lt;blockquote&gt;
&lt;p&gt;Commentary: The strongest AI product experience for average users. ChatGPT teaching you how to learn—sometimes you can&apos;t help but wonder if schools are still necessary.&lt;/p&gt;
&lt;/blockquote&gt;
&lt;p&gt;Official Blog: &lt;a href=&quot;https://openai.com/index/chatgpt-study-mode/&quot;&gt;https://openai.com/index/chatgpt-study-mode/&lt;/a&gt;&lt;/p&gt;
&lt;ol start=&quot;2&quot;&gt;
&lt;li&gt;NotebookLM &amp;amp; AI Mode Updates&lt;/li&gt;
&lt;/ol&gt;
&lt;p&gt;Google recently announced major updates to NotebookLM, including Video Overviews and Studio panel upgrades.&lt;/p&gt;
&lt;p&gt;Video Overviews serve as a visual alternative to Audio Overviews, generating AI-narrated slideshows that incorporate images, charts, quotes, and data from source documents. This helps users understand complex information more intuitively, with support for customizing topics, learning objectives, and target audiences. The Studio panel features a new interface design that supports creating and storing multiple outputs of the same type within a single notebook (such as multi-language audio or mind maps for different chapters), improving collaboration and multitasking efficiency. This feature is rolling out to English users, with more language support coming soon.&lt;/p&gt;
&lt;p&gt;Additionally, for back-to-school season, Google Search&apos;s AI Mode received updates including: support for uploading images and PDF files on desktop browsers (with plans to expand to Google Drive and other file types), a Canvas tool for multi-session planning (like creating study guides), Search Live with integrated Google Lens for real-time video input, and Lens functionality in Chrome for asking questions about on-screen content. These enhancements aim to improve the learning experience for students, parents, and educators through interactive questioning, cross-referencing information, and visual context. Currently available mainly in the US and India for users 18 and older.&lt;/p&gt;
&lt;blockquote&gt;
&lt;p&gt;Commentary: Google&apos;s product update blog posts don&apos;t mean features are immediately available—patience is required. Just like when AI Mode announced support for Gemini 2.5 Pro and Deep Research, users didn&apos;t get the feature on day one. NotebookLM is a great learning companion, and these updates further enhance learning assistance. AI Mode is a preview of Google disrupting itself—along with experimental projects like Web Guide, these experiments will eventually become Google Search products for the AI era.&lt;/p&gt;
&lt;/blockquote&gt;
&lt;p&gt;Official Blogs:&lt;/p&gt;
&lt;p&gt;&lt;a href=&quot;https://blog.google/technology/google-labs/notebooklm-video-overviews-studio-upgrades/&quot;&gt;https://blog.google/technology/google-labs/notebooklm-video-overviews-studio-upgrades/&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;&lt;a href=&quot;https://blog.google/products/search/ai-mode-updates-back-to-school/&quot;&gt;https://blog.google/products/search/ai-mode-updates-back-to-school/&lt;/a&gt;&lt;/p&gt;
&lt;ol start=&quot;3&quot;&gt;
&lt;li&gt;Claude Code --add-dir Command&lt;/li&gt;
&lt;/ol&gt;
&lt;p&gt;Claude Code recently introduced the &lt;code&gt;--add-dir&lt;/code&gt; option, which lets users work across multiple directories in a single session. By passing the CLI flag &lt;code&gt;--add-dir &amp;lt;path&amp;gt;&lt;/code&gt; at startup or running the slash command &lt;code&gt;/add-dir &amp;lt;path&amp;gt;&lt;/code&gt; during a session, developers can add additional working directories to Claude Code&apos;s workspace without switching the main directory. This update is particularly useful for monorepos, shared configurations, or cross-project collaboration. It improves code navigation, referencing, and editing efficiency, making Claude Code an even more powerful and flexible terminal AI coding tool.&lt;/p&gt;
&lt;blockquote&gt;
&lt;p&gt;Commentary: Claude Code has become the most popular product among developers. The cross-directory feature further elevates the experience. Anthropic deserves praise for developing products based on user needs.&lt;/p&gt;
&lt;/blockquote&gt;
&lt;p&gt;Official Tweet: &lt;a href=&quot;https://x.com/_catwu/status/1950288312033562751&quot;&gt;https://x.com/_catwu/status/1950288312033562751&lt;/a&gt;&lt;/p&gt;
&lt;h2&gt;Notes&lt;/h2&gt;
&lt;p&gt;This article was written by the author in collaboration with Grok 4.&lt;/p&gt;
</content:encoded></item><item><title>A New Beginning</title><link>https://blog.gujiakai.me/en/2025/07/new-beginning/</link><guid isPermaLink="true">https://blog.gujiakai.me/en/2025/07/new-beginning/</guid><description>Launching my personal blog!</description><pubDate>Thu, 17 Jul 2025 00:00:00 GMT</pubDate><content:encoded>&lt;p&gt;During my university years, I once ran a WeChat Official Account. But due to frustration with content review and other factors, that first account ended with me voluntarily deactivating it.&lt;/p&gt;
&lt;p&gt;Later, with AI assistance, I built a personal blog from scratch. After more than 3 years of development, my humble site has gained some readers. The image below shows current Cloudflare traffic data for my site—though many of these visitors are actually AI crawlers, so the real numbers are far lower than what&apos;s shown.&lt;/p&gt;
&lt;p&gt;&lt;img src=&quot;https://cdn.sa.net/2025/07/17/btHN6WzDUhQGnMv.webp&quot; alt=&quot;Cloudflare Traffic Data&quot; /&gt;&lt;/p&gt;
&lt;p&gt;I&apos;ve basically been running this as a labor of love, never considering monetization through Google Ads. As a current graduate student, I haven&apos;t felt much financial pressure yet. But after sustaining an idealistic website purely through passion for so long, some fatigue is inevitable.&lt;/p&gt;
&lt;p&gt;Next year marks the end of my student life, and I&apos;ll inevitably need to start earning my own living. Restarting a WeChat Official Account is one approach—not as a main job, but as a side project to experiment with.&lt;/p&gt;
&lt;p&gt;&lt;img src=&quot;https://cdn.sa.net/2025/07/17/WqYR5ey98XBtbHQ.webp&quot; alt=&quot;Grok 4&apos;s Analysis of WeChat Account Monetization&quot; /&gt;&lt;/p&gt;
&lt;p&gt;The real Chinese internet is no longer the world of websites you find through Google or Bing—it now exists within the &quot;walled gardens&quot; of major tech giants.&lt;/p&gt;
&lt;p&gt;As I once again embrace writing in this authentic Chinese internet space, I&apos;ll try to avoid being generic. All my writing will be carefully crafted. This account won&apos;t touch any sensitive or rule-violating topics—I&apos;ll self-censor accordingly.&lt;/p&gt;
&lt;p&gt;I also understand that by publishing on WeChat, my writing becomes training data for Tencent&apos;s Hunyuan large language model. This is unavoidable on the public web, and even private platforms can&apos;t escape it. I accept this reality.&lt;/p&gt;
&lt;p&gt;This account&apos;s avatar and name match my WeChat account. Every article published here will have a corresponding original on the public web—click the &quot;Read More&quot; button at the end of each article to jump to the source.&lt;/p&gt;
&lt;p&gt;This account mainly shares knowledge about AI, personal tinkering projects, and personal growth insights. I aim to update at least once a week.&lt;/p&gt;
&lt;p&gt;A new beginning—let&apos;s go!&lt;/p&gt;
</content:encoded></item></channel></rss>