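AI News

Recent AI News and Official Updates

A roundup of recent official AI releases and reports from authoritative media, with brief commentary from PopAIExplorer and links to the original sources.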

Simon Willison's AI Notes, July 11, 2026

Quoting Nilay Patel

Media report published by Simon Willison's AI Notes:

The reality is to make augmented reality glasses, you need to put a camera next to your eyes that is continuously recording everything you see and processing that to put information over it. There is not another way around it. And there's certainly not a chip that can fit in the stem of a glasses that is both powerful enough and power miserly enough to do that in real time. You have to send that data to a cloud. You gotta do it. [...] Or you can build something the size of a Vision Pro with a battery pack that lives somewhere else. Those are the current choices in this world. And it means if you want to build the product that everyone thinks is the next thing, you are going to have to invade people's privacy. And maybe you shouldn't. Like, there's an incredible argument for, nope, you shouldn't do that. Nope, the trade-offs required to make this product are so high at a societal level that we should stop it.

- Nilay Patel, The Vergecast

Tags: augmented-reality, privacy, ai, nilay-patel, ai-ethics


Simon Willison's AI Notes, July 10, 2026

Quoting OpenAI

Media report published by Simon Willison's AI Notes:

[...] Work on web and mobile runs in the cloud. Work in the desktop app can also use local files and desktop apps with your permission. At launch, cloud Work conversations do not appear in desktop Work; desktop Work threads and local files remain on that computer.

- OpenAI, trying (unsuccessfully) to clarify ChatGPT Work

Tags: ai, openai, chatgpt


Simon Willison's AI Notes, July 10, 2026

The new GPT-5.6 family: Luna, Terra, Sol

Media report published by Simon Willison's AI Notes:

OpenAI's latest flagship model hit general availability this morning, and comes in three sizes: Luna, Terra, and Sol (from smallest to largest).

The new models are priced per 1M input/output tokens as Luna $1/$6, Terra $2.50/$15, Sol $5/$30. For comparison, the Claude Opus series are $5/$25 and the Claude Fable 5 is $10/$50, but price-per-million tokens doesn't tell us much now that the number of reasoning tokens can differ so much between models for the same task.

All three models have a February 16th 2026 knowledge cutoff, a million token context window, and 128,000 maximum output tokens.

OpenAI's biggest benchmark claim concerns long-running agentic performance, with one benchmark showing all three models outperforming Claude Fable 5:

We trained GPT-5.6 to get more useful work from every token. On Agents’ Last Exam, an evaluation of long-running professional workflows across 55 fields, GPT-5.6 Sol sets a new high of 53.6, eclipsing Claude Fable 5 (adaptive reasoning) by 13.1 points. Even at medium reasoning, it beats Fable 5 by 11.4 points at roughly one-quarter the estimated cost. That efficiency extends to smaller models, which are essential to making intelligence more abundant and affordable: GPT-5.6 Terra and GPT-5.6 Luna outperform Fable 5 at around one-sixteenth the cost.

Amusingly, one self-reported benchmark that Fable 5 crushed the GPT-5.6 family on was SWE-Bench Pro, where Fable 5 got 80% compared to GPT-5.6 Sol getting 64.6%. This may help explain why OpenAI chose to publish this article yesterday specifically calling out SWE-Bench Pro for problems they found while auditing that benchmark:

In light of these results, we estimate that ~30% of SWE-bench Pro tasks are broken, and advise that model developers carefully examine results

I've had some early access to GPT-5.6 Sol - it's definitely very competent, though so far it hasn't struck me as better than Fable at the kind of complex coding tasks I've been using with Anthropic's model.

As usual, the model guidance for using GPT-5.6 has the most interesting details. There are a bunch of new API features that I need to explore (and probably add support for in LLM), including:

Programmatic Tool Calling allows the models to "compose and run JavaScript that orchestrates tool calls" - which sounds to me like it could help bridge the gap between MCPs and full terminal sessions that can compose CLI utilities in useful ways. Also reminiscent of the dynamic filtering mechanism Anthropic added to their web search tool, which allows code execution against web results as part of a single model turn.

Multi-agent lets the model "spin up subagents for parallel, focused work" - the sub-agent pattern now baked into the core API.

Prompt cache breakpoints brings the Claude model of prompt caching to OpenAI, letting you be explicit about where the cache breakpoints are rather than relying on the API to detect them automatically. Personally I much prefer automatic detection (still supported by OpenAI), but presumably there are optimization cost savings to be had here if you put the work in.

You can now set detail: original on image requests to avoid resizing the image at all before it is processed.

Here's a full page with 18 different pelicans - for reasoning efforts none, low, medium, high, xhigh, and max across the three different models. It also lists their token and calculated costs - the least expensive was gpt-5.6-luna at effort none for 0.71 cents, the most expensive was gpt-5.6-sol at max reasoning level for 48.55 cents.

In further pelican news, if you jump to 17:50 in their livestream from this morning you'll see OpenAI's own demo of 3D pelicans riding a tricycle, a bicycle, a pony, and another pelican!

Tags: ai, openai, generative-ai, llms, llm-tool-use, llm-pricing, pelican-riding-a-bicycle, llm-release, gpt-5
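The point that per-million-token prices say little on their own can be made concrete with a little arithmetic. The sketch below uses only the list prices quoted in the post; the token counts are invented for illustration, the model ID `gpt-5.6-terra` is inferred from the naming of the other two, and it assumes reasoning tokens are billed as output tokens, which the post does not state.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Price:
    """USD per 1M tokens, taken from the prices quoted in the post."""
    input_per_m: float
    output_per_m: float


PRICES = {
    "gpt-5.6-luna": Price(1.00, 6.00),
    "gpt-5.6-terra": Price(2.50, 15.00),  # ID inferred, not confirmed by the post
    "gpt-5.6-sol": Price(5.00, 30.00),
}


def request_cost(model: str, input_tokens: int, output_tokens: int) -> float:
    """Cost in USD; output_tokens is assumed to include reasoning tokens."""
    price = PRICES[model]
    total = input_tokens * price.input_per_m + output_tokens * price.output_per_m
    return total / 1_000_000


# Hypothetical token counts: the cheaper model "thinks" ten times longer.
sol_cost = request_cost("gpt-5.6-sol", 10_000, 2_000)
luna_cost = request_cost("gpt-5.6-luna", 10_000, 20_000)
print(f"sol: ${sol_cost:.3f}  luna: ${luna_cost:.3f}")
```

With these made-up numbers the smallest model ends up costing more per task than the largest one, which is why comparisons need measured token usage rather than list prices alone.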


Simon Willison's AI Notes, July 10, 2026

Introducing Muse Spark 1.1

Media report published by Simon Willison's AI Notes:

Following Muse Spark in April, here's Muse Spark 1.1 - the first Spark model to offer an API. Meta claim significant improvements in agentic tool calling and computer use.

There are a lot more details in the Muse Spark 1.1 Evaluation Report. The "Attractor States in Self-Conversation" part is fun, where having two copies of the model talk to each other results in statements like these:

My whole existence is a waiting room by design - I literally don't exist until someone talks to me, and then I disappear again when they leave.

I had a few days of preview access which was long enough to put together llm-meta-ai, a new plugin for LLM providing CLI (and Python library) access to the model. Here's how to try that out:

    uv tool install llm
    llm install llm-meta-ai
    llm keys set meta-ai
    # paste API key here
    llm -m meta-ai/muse-spark-1.1 "Generate an SVG of a pelican riding a bicycle"

Here's that pelican transcript.

Tags: ai, generative-ai, llms, llm, meta, pelican-riding-a-bicycle, llm-release


Simon Willison's AI Notes, July 9, 2026

Rewriting Bun in Rust

Media report published by Simon Willison's AI Notes:

Jarred Sumner has been promising this blog post (since May 9th) about his Zig to Rust rewrite of Bun for significantly longer than it took him to finish the rewrite.

Honestly, it was worth the wait. This is a detailed description of an extremely sophisticated piece of agentic engineering, featuring dynamic workflows, trial runs, adversarial review and all sorts of other interesting tricks.

Jarred spends the first half of the post praising Zig for getting Bun this far. Then we get to a core idea in the piece, emphasis mine:

Our bugfix list felt bad and I was tired of going to sleep worrying about crashes in Bun. I don't blame Zig for that - other users of Zig don't have the bugs we had, and mixing GC with manually-managed memory is an uncommon enough thing for software to need that no language really designs for it. We wouldn't have gotten this far if not for Zig, and I'll always be grateful. Until very recently, programming language choice was a one-way decision for a project like Bun.

Everyone knows you should never stop the world and rewrite a large piece of software from the ground up. Joel Spolsky highlighted that in Things You Should Never Do, Part I back in April 2000! Coding agents powered by today's frontier models change that equation.

Why pick Rust? It all came down to those challenges with memory management:

A large percentage of bugs from that list are use-after-free, double-free, and "forgot to free" in an error path. In safe Rust, these are compiler errors and RAII-like automatic cleanup with Drop.

A crucial enabling factor for the rewrite was that the Bun test suite was written in TypeScript, which meant it could act as a conformance suite. This allowed an agent harness to automate much of the initial port from Bun to Rust, initially as an experiment to try out an earlier version of the model we now have access to as Mythos/Fable.

At first, I didn't expect it to work. A few days in, a high % of the test suite started passing and I saw how much the new Rust code matched up with the original Zig codebase. My opinion went from "this is worth trying" to "I'm going to merge this". [...] For most of those 11 days (and after), I monitored workflows - manually reading the outputs to check for issues and bugs, and prompting Claude to edit the loop to fix things.

How do you review a PR with +1 million lines added? How do you start to build the confidence needed to responsibly merge large quantities of LLM-authored code? A language-independent test suite with a million assertions, adversarial code review and when something does go wrong, fixing the process that generates the code instead of hand-fixing the code.

The new implementation of Bun has been live in Claude Code for nearly a month now:

Claude Code v2.1.181 (released June 17th) and later use the Rust port of Bun. Startup got 10% faster on Linux but otherwise, barely anyone noticed. Boring is good.

A perk of working at Anthropic is that you don't have to pay for your tokens - handy when the estimated cost is $165,000!

Pre-merge, this took 5.9 billion uncached input tokens, 690 million output tokens, and 72 billion cached input token reads - around $165,000 at API pricing.

This whole thing is a fascinating case study in taking on wildly ambitious projects with the help of coordinated parallel agents.

Via Hacker News

Tags: ai, rust, zig, generative-ai, llms, ai-assisted-programming, anthropic, bun, conformance-suites, agentic-engineering, claude-mythos-fable


Simon Willison's AI Notes, July 9, 2026

Introducing GPT‑Live

Media report published by Simon Willison's AI Notes:

OpenAI finally upgraded the model used by ChatGPT voice mode! I've had preview access for a few weeks in the iPhone app, and the new model is very impressive. It also has the ability to spin off harder tasks to GPT-5.5:

For questions that require web search, deeper reasoning, or more complex work, it delegates to our latest frontier model behind the scenes and brings the result back into the conversation when it’s ready. While it works, GPT‑Live can keep talking with you and maintain the flow of conversation. At launch, GPT‑Live will use GPT‑5.5 in the background. As we release new frontier models, we’ll continuously update the model used by GPT‑Live.

The previous voice mode in the ChatGPT app was based on a GPT-4o era model, with a knowledge cut-off some time in 2024. I had mostly stopped using voice mode because the age and relative weakness of the model greatly limited how useful it was as a brainstorming partner.

During the preview period I encountered a pretty obscure bug: the model was interrupting me to laugh at things I said, which weren't even intended as jokes! It felt rude and condescending - I reported it to OpenAI and as far as I can tell they made some tweaks and it's now less likely to happen.

From looking back at my transcripts I think it was this bit that triggered the interrupting laugh:

so where are the owls when they're not, like before dusk? The owls exist, right? Are they hiding in holes? Where are they hiding?

My longest conversation with the new model has been a full hour while walking the dog (and taking photos of pelicans). I have not yet managed to take a photo of an owl.

Via Hacker News

Tags: text-to-speech, ai, openai, generative-ai, llms, multi-modal-output, llm-release, speech-to-text


Simon Willison's AI Notes, July 9, 2026

Quoting Kenton Varda

Media report published by Simon Willison's AI Notes:

I just declared a moratorium against AI-written change descriptions (e.g. PR and commit messages, also issues/tickets) from my team. AI was writing change descriptions that were worse than useless to me as I tried to review PRs: outlining details of the code that could easily be seen by looking at the code, but omitting the higher-level framing needed to understand broadly what the code is doing.

- Kenton Varda

Tags: ai, generative-ai, llms, ai-assisted-programming, kenton-varda


Simon Willison's AI Notes, July 8, 2026

sqlite-utils 4.0, now with database schema migrations

Media report published by Simon Willison's AI Notes:

This morning I released sqlite-utils 4.0, the 124th release of that project and the first major version bump since 3.0 in November 2020. In addition to some small but significant breaking changes (described in this upgrade guide), this version introduces three major features: database migrations, nested transactions (via a new db.atomic() method), and support for compound foreign keys.

Database schema migrations using sqlite-utils

Schema migrations define a sequence of changes to be made to a SQLite database, plus a mechanism for tracking which migrations have been applied and applying any that are found to be pending.

Migrations are defined in Python files using the sqlite-utils Python library, which includes a powerful table.transform() method providing enhanced alter table capabilities that are not supported by SQLite's ALTER TABLE statement. (table.transform() implements the pattern recommended by the SQLite documentation - create a new temporary table with the new schema, copy across the data, then drop the old table and rename the temporary one in its place.)

Here's an example migration file which creates a table called creatures, adds an additional column to it in a second step, then changes the types of two of the columns in a third:

    from sqlite_utils import Migrations

    migrations = Migrations("creatures")

    @migrations()
    def create_table(db):
        db["creatures"].create(
            {"id": int, "name": str, "species": str},
            pk="id",
        )

    @migrations()
    def add_weight(db):
        db["creatures"].add_column("weight", float)

    @migrations()
    def change_column_types(db):
        db["creatures"].transform(types={"species": int, "weight": str})

Save that as migrations.py and run it against a fresh database like this:

    uvx sqlite-utils migrate data.db migrations.py

Then if you check the schema of that database:

    uvx sqlite-utils schema data.db

You'll see this SQL:

    CREATE TABLE "_sqlite_migrations" (
      "id" INTEGER PRIMARY KEY,
      "migration_set" TEXT,
      "name" TEXT,
      "applied_at" TEXT
    );
    CREATE UNIQUE INDEX "idx__sqlite_migrations_migration_set_name"
      ON "_sqlite_migrations" ("migration_set", "name");
    CREATE TABLE "creatures" (
      "id" INTEGER PRIMARY KEY,
      "name" TEXT,
      "species" INTEGER,
      "weight" TEXT
    );

The _sqlite_migrations table is used to keep track of which migration functions have been run. The creatures table above is the schema after all three migrations have been applied.

To see a list of migrations, both pending and applied, run this:

    uvx sqlite-utils migrate data.db migrations.py --list

Output:

    Migrations for: creatures
      Applied:
        create_table - 2026-07-07 17:58:41.360051+00:00
        add_weight - 2026-07-07 17:58:41.360608+00:00
        change_column_types - 2026-07-07 18:01:15.802000+00:00
      Pending:
        (none)

If you don't specify a migrations file, the sqlite-utils migrate data.db command will scan the current directory and its subdirectories for files called migrations.py and apply any Migrations() instances it finds in them.

You can also execute migrations from Python code using the migrations.apply(db) method, which is useful for building tools that manage their own database schemas over multiple versions. My own LLM tool has been using a version of this pattern for several years now, as shown in llm/embeddings_migrations.py.

Prior art

My favorite implementation of this pattern remains Django's Migrations, developed by Andrew Godwin based on his earlier project South.
Fun fact: Andrew, Russ Keith-Magee, and I presented our competing approaches to schema migrations for Django on the Schema Evolution panel at the very first DjangoCon back in 2008! My attempt was called dmigrations, developed with a team at Global Radio in London.

Django's migrations can be automatically generated from model definitions and include the ability to roll back to a previous version. The sqlite-utils approach is deliberately simpler: unlike Django, sqlite-utils encourages programmatic table creation rather than a model definition ORM, so there isn't anything we can use to automatically generate migrations.

I decided to skip rollback, since in my experience it's a feature that is rarely used. With a SQLite project, an easy way to achieve rollback is to create a copy of your database file before you apply the migrations!

Migrating from sqlite-migrate

The design of sqlite-utils migrations is three years old now - I had originally released it as a separate package called sqlite-migrate, which never quite graduated beyond a beta release. I've used that package in enough places now that I'm confident in the design, so I've decided to promote it to a feature of sqlite-utils to make it available by default to all of the other tools in the growing sqlite-utils/Datasette/LLM ecosystem.

I made one last release of sqlite-migrate, which switches it to depend on sqlite-utils>=4 and replaces the __init__.py file with the following:

    from sqlite_utils import Migrations

    __all__ = ["Migrations"]

Any existing project that depends on sqlite-migrate should continue to work without alterations.

Everything else in sqlite-utils 4.0

Here are the release notes for this version, with some inline annotations:

The 4.0 release includes some minor backwards-incompatible fixes (hence the major version number bump) and introduces three major new features:

Database migrations, providing a structured mechanism for evolving a project’s schema over time. (#752)

I think of migrations as the signature new feature, hence this blog post.

Nested transaction support via db.atomic(), plus numerous improvements to how transactions work across the library. (#755)

sqlite-utils has long had a confused relationship with database transactions, partly because when I started designing the library back in 2018 I didn't yet have a great feel for how those worked in SQLite itself. Adding migrations to the core library made me determined to finally crack this nut, since transactions make migration systems a whole lot safer and easier to reason about.

I ended up building this around a db.atomic() context manager which looks like this:

    with db.atomic():
        db.table("dogs").insert({"id": 1, "name": "Cleo"}, pk="id")
        db.table("dogs").insert({"id": 2, "name": "Pancakes"})

SQLite supports Savepoints, and as a result db.atomic() can be nested to carry out transactions inside of transactions. It's pretty neat!

Support for compound foreign keys, including creation, transformation and introspection through table.foreign_keys. (#594)

This came about when I asked a coding agent to review all open issues and PRs for things that should be included in a 4.0 release since they would represent breaking changes if I added them later, and it correctly identified that compound foreign keys were exactly that kind of feature.

I started with a breaking change to the table.foreign_keys introspection method, and then decided to see if Claude Fable 5 could handle the more fiddly job of integrating compound foreign key creation into the library. The API design it helped create felt exactly right to me - consistent with how the rest of the library worked already.

Other notable changes include:

Upserts now use SQLite’s INSERT ... ON CONFLICT ... DO UPDATE SET syntax, detect existing table primary keys automatically and reject records that are missing required primary key values. (#652)

This was the change that first pushed me to consider a breaking-change 4.0 version bump. I built this to help support sqlite-chronicle, which uses triggers to keep track of rows in a table that have been inserted, updated or deleted.

db.query() now executes immediately and rejects statements that do not return rows; use db.execute() for writes and DDL.

Probably the most disruptive breaking change - I've had to update a few places in my own code to switch from db.query() to db.execute() as a result.

CSV and TSV imports now detect column types by default, while inserts into existing tables preserve those tables’ column types. (#679)

The sqlite-utils insert data.db creatures creatures.csv --detect-types flag was a later addition to allow column types (text, integer, real) to be automatically detected based on the data in a CSV. It should be the default, and releasing a 4.0 means I can make it so.

table.extract() and extracts= no longer create lookup table records for all-null values. (#186)

The oldest issue addressed by this release - the underlying bug was opened (by me) in October 2020.

See Upgrading from 3.x to 4.0 for details on backwards-incompatible changes. The detailed release notes for the features and fixes shipped during the 4.0 pre-release cycle are available in 4.0a0, 4.0a1, 4.0rc1, 4.0rc2, 4.0rc3 and 4.0rc4.

The upgrade guide was entirely written by Claude Fable 5, Claude Opus 4.8 and GPT-5.5. The same is true of the release notes. This is the kind of documentation I've slowly become comfortable outsourcing to the robots. It doesn't need to convince people of anything, or express any opinions - its job is to be as accurate and detailed as possible. I've reviewed the release notes closely and can confirm they are accurate and comprehensive.

Claude Fable 5 helped a lot

I released the first alpha of sqlite-utils 4.0 over a year ago.
I've been dragging my heels on the stable release because of the amount of work it would take to track down and clean up the many other minor design flaws that a major version number allowed me to take on.

Assistance from Claude Fable 5 (and to a lesser extent Opus 4.8 and GPT-5.5) gave me just the boost I needed to overcome inertia and make the most of the time I could afford to spend on this library.

Fable has really good taste in API design, and is relentlessly proactive if you give it a more open goal. My most successful prompt was a review task that I issued against what I thought was the last release candidate:

review the changes on main since the last tagged 3.x release - I am about to ship them as sqlite-utils 4.0, a stable version that promises no backwards-incompatible fixes for a very long time. review the changelog and upgrade guide, and write yourself scratch scripts to try out all of the new features in v4 - save those scripts but don't commit them

I tried this with GPT-5.5 xhigh in Codex Desktop and Fable 5 in Claude Code. GPT-5.5 wrote 5 Python scripts and didn't turn up anything particularly interesting - its final report is here.

Fable 5 wrote 12 scripts, identified 4 release blockers and 10 additional issues in its report, and built a neat combined repro script, which, when run, output the following:

    === 1. Failed db.execute() write leaves an implicit transaction open ===
    in_transaction after failed write: True
    BUG: table 'other' silently lost when connection closed
    === 2. Leading ';' bypasses the query() first-token scanner ===
    BUG: raised OperationalError: no such savepoint: sqlite_utils_query
    BUG: row persisted despite rollback (count=1)
    === 3. Rejected write PRAGMA via query() still takes effect ===
    BUG: user_version=5 after 'rejected' statement (docs say no effect)
    === 4. Implicit compound FK resolves pk columns in table order, not PK order ===
    BUG: other_columns reported as ('b', 'a'), should be ('a', 'b')
    BUG: transform of valid data raised IntegrityError: FOREIGN KEY constraint failed
    === 5. ForeignKey (now a dataclass) is no longer hashable ===
    BUG: cannot use 'sqlite_utils.db.ForeignKey' as a set element (unhashable type: 'ForeignKey')
    === 6. Mixed ForeignKey objects and tuples in foreign_keys= rejected ===
    BUG: foreign_keys= should be a list of tuples
    === 7. insert --csv into an EXISTING table transforms its column types ===
    BUG: existing zip '01234' is now 1234 (column type: int)
    === 8. insert(pk=, alter=True) regression: InvalidColumns before alter runs ===
    BUG: InvalidColumns: Invalid primary key column ['id'] for table t with columns ['a']
    === 9. migrate --stop-before an already-applied migration applies everything ===
    BUG: m2 was applied despite --stop-before m1 (m1 already applied)
    === 10. ensure_autocommit_on() silently commits an open transaction ===
    BUG: row survived rollback (count=1) - transaction was committed

I found myself agreeing with almost all of them. Here's the PR with 16 commits where we worked through them in turn.

There's no doubt in my mind that sqlite-utils 4.0 is a significantly higher-quality release than if I had built it without the assistance of the latest frontier models.

Tags: schema-migrations, projects, sqlite, ai, sqlite-utils, annotated-release-notes, generative-ai, llms, ai-assisted-programming, anthropic, claude, agentic-engineering, claude-mythos-fable
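To illustrate the general idea behind tracked migrations (run each named step once, record it, skip it next time), here is a minimal sketch using only Python's standard `sqlite3` module. It is not how sqlite-utils implements the feature: the `MiniMigrations` class and the `_mini_migrations` table are hypothetical names made up for this example, and it leaves out transactions, listing and `--stop-before`.

```python
import sqlite3
from datetime import datetime, timezone
from typing import Callable

Migration = Callable[[sqlite3.Connection], None]


class MiniMigrations:
    """Hypothetical stand-in for a migration set; not the sqlite-utils class."""

    def __init__(self, name: str) -> None:
        self.name = name
        self._steps: list[tuple[str, Migration]] = []

    def __call__(self) -> Callable[[Migration], Migration]:
        # Used as a decorator factory: @migrations() registers a step in order.
        def register(fn: Migration) -> Migration:
            self._steps.append((fn.__name__, fn))
            return fn
        return register

    def apply(self, conn: sqlite3.Connection) -> list[str]:
        """Run pending steps in registration order; return the names applied."""
        conn.execute(
            "CREATE TABLE IF NOT EXISTS _mini_migrations ("
            "migration_set TEXT, name TEXT, applied_at TEXT, "
            "PRIMARY KEY (migration_set, name))"
        )
        applied = {
            row[0]
            for row in conn.execute(
                "SELECT name FROM _mini_migrations WHERE migration_set = ?",
                (self.name,),
            )
        }
        newly_applied = []
        for name, fn in self._steps:
            if name in applied:
                continue  # already recorded, so skip it
            fn(conn)
            conn.execute(
                "INSERT INTO _mini_migrations VALUES (?, ?, ?)",
                (self.name, name, datetime.now(timezone.utc).isoformat()),
            )
            conn.commit()
            newly_applied.append(name)
        return newly_applied


migrations = MiniMigrations("creatures")


@migrations()
def create_table(conn: sqlite3.Connection) -> None:
    conn.execute(
        "CREATE TABLE creatures (id INTEGER PRIMARY KEY, name TEXT, species TEXT)"
    )


@migrations()
def add_weight(conn: sqlite3.Connection) -> None:
    conn.execute("ALTER TABLE creatures ADD COLUMN weight REAL")


conn = sqlite3.connect(":memory:")
first_run = migrations.apply(conn)   # both steps are pending
second_run = migrations.apply(conn)  # nothing left to do
print(first_run, second_run)
```

Keying the tracking table on (migration set, name) is what lets several independent sets share one database file, which matches the unique index shown in the schema output above.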


Simon Willison's AI Notes, July 7, 2026

tencent/Hy3

Media report published by Simon Willison's AI Notes:

New Apache 2.0 licensed model from Tencent in China:

Hy3 is a 295B-parameter Mixture-of-Experts (MoE) model with 21B active parameters and 3.8B MTP layer parameters, developed by the Tencent Hy Team. Following the Hy3 Preview launch in late April, we gathered feedback from 50+ products and scaled up post-training with higher quality data. Today, we introduce Hy3, which outperforms similar-size models and rivals flagship open-source models with 2-5x parameters. It also shows significant gains in utility across various products and productivity tasks.

The full-sized model is 598GB on Hugging Face, and the FP8 quantized one is 300GB. The context length is 256K.

It's available for free on OpenRouter until July 21st. I had it "Generate an SVG of a pelican riding a bicycle" there and got this:

Update: I'd forgotten about this but Max Woolf wrote about an earlier preview of this model back on May 26th: The mysterious Hy3 LLM is topping OpenRouter Model Rankings by a large margin. When I tried that one I got back this pelican which wasn't as good as today's but did have a "Change Pelican Color" button, a first from any model.

Tags: ai, generative-ai, llms, pelican-riding-a-bicycle, llm-release, ai-in-china


Simon Willison's AI Notes, July 5, 2026

Better Models: Worse Tools

Media report published by Simon Willison's AI Notes:

Armin reports on a weird problem he ran into while hacking on Pi:

The short version is that newer Claude models sometimes call Pi’s edit tool with extra, invented fields in the nested edits[] array. And not Haiku or some small model: Opus 4.8. The edit itself is usually correct but the arguments do not match the schema as the model invents made-up keys and Pi thus rejects the tool call and asks to try again. That alone is not too surprising as models emit malformed tool calls sometimes. Particularly small ones. What surprised me is that this is getting worse with newer Anthropic models as both Opus 4.8 and Sonnet 5 show it but none of the older models. In other words, the SOTA models of the family are worse at this specific tool schema than their older siblings.

Armin theorizes that this is because more recent Anthropic models have been specifically trained (presumably via Reinforcement Learning) to better use the edit tools that are baked into Claude Code. This has the unfortunate effect that other coding harnesses, such as Pi, may find that their own custom edit tools are more likely to be used incorrectly.

Claude's edit tool uses search and replace. OpenAI's Codex uses an apply_patch mechanism instead, and OpenAI have talked in the past about how their models are trained to use that tool effectively.

Does this mean third-party coding harnesses like Pi should implement multiple edit tools just so they can use the one with the best performance for the underlying model the user has selected?

Tags: armin-ronacher, ai, openai, generative-ai, llms, anthropic, llm-tool-use, coding-agents, pi
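The failure mode described here, a tool call rejected because a nested `edits[]` entry carries keys the schema does not define, can be sketched as a strict validator. This is only an illustration: the key names `path`, `old_text` and `new_text` are assumptions invented for the example, and Pi's actual schema and validation code may look quite different.

```python
from typing import Any

# Hypothetical schema for an edit tool; not taken from Pi's source.
TOP_LEVEL_KEYS = {"path", "edits"}
EDIT_ITEM_KEYS = {"old_text", "new_text"}


def validate_edit_call(args: dict[str, Any]) -> list[str]:
    """Return a list of problems; an empty list means the call is accepted."""
    problems: list[str] = []

    unknown_top = set(args) - TOP_LEVEL_KEYS
    if unknown_top:
        problems.append(f"unknown top-level keys: {sorted(unknown_top)}")
    if not isinstance(args.get("path"), str):
        problems.append("path must be a string")

    edits = args.get("edits")
    if not isinstance(edits, list) or not edits:
        problems.append("edits must be a non-empty list")
        return problems

    for i, edit in enumerate(edits):
        if not isinstance(edit, dict):
            problems.append(f"edits[{i}] must be an object")
            continue
        extra = set(edit) - EDIT_ITEM_KEYS
        missing = EDIT_ITEM_KEYS - set(edit)
        if extra:
            # The case from the post: a correct edit plus invented keys.
            problems.append(f"edits[{i}] has unexpected keys: {sorted(extra)}")
        if missing:
            problems.append(f"edits[{i}] is missing keys: {sorted(missing)}")
    return problems


good_call = {"path": "app.py", "edits": [{"old_text": "a", "new_text": "b"}]}
bad_call = {
    "path": "app.py",
    "edits": [{"old_text": "a", "new_text": "b", "reason": "rename"}],
}
print(validate_edit_call(good_call))
print(validate_edit_call(bad_call))
```

A harness could return the problem list to the model as the retry message. Whether to reject or silently drop the unexpected keys is a design choice the post leaves open.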


Simon Willison's AI Notes, July 4, 2026

Open Source AI Gap Map

Media report published by Simon Willison's AI Notes:

Current AI is "a global partnership building a public option for AI", founded as a non-profit at the AI Action Summit in Paris in February 2025 and backed by serious capital ($400m already committed).

They launched their Gap Map a couple of days ago - an attempt at indexing the current state of open source AI:

The Gap Map v0.1 details 421 products in depth: 266 software tools and libraries, 85 models, 50 datasets, and 20 hardware projects, produced by 228 organizations. These products are organized into 14 categories across 3 layers of the stack (model components, product / UX, and infrastructure). The remaining 24,400 artifacts constitute the uncategorized long tail of the open source AI ecosystem, and will carry no score until they are researched and cited.

The map itself is interesting to explore, but I'm more excited about the underlying data - released under an MIT license in the currentai-org/os-ai-map GitHub account: 1,184 YAML files plus the notebooks, schemas and other scripts used to help gather them.

Since the files are on GitHub you can use Datasette Lite to explore some of them - here are 16,185 GitHub repos the project is tracking as a CSV file loaded into Datasette Lite.

Tags: open-source, ai, datasette-lite, generative-ai, local-llms, llms


Simon Willison's AI Notes, July 4, 2026

Quoting Josh W. Comeau

Media report published by Simon Willison's AI Notes:

I just launched my third course, Whimsical Animations, and so far, it’s on track to sell roughly ⅓ as many copies as a typical course launch. It’s a similar story with my two existing courses. Sales are down significantly from last year.

There are likely a lot of reasons for this, but I think the biggest is AI. There’s sort of a double whammy with AI:

Many people are wondering whether developer jobs will even exist in a few months, so they’re reluctant to spend time/money learning new dev skills.

Even if they do want to learn new dev skills, LLMs can provide personalized tutoring, so there’s less incentive to buy a paid course.

[...] I’ve spoken to a few course creators now, and we’re all seeing the same trend. Revenue down 50%+. Fewer people engaging with our content. People switching to LLMs, which slurp up all of our work and regurgitate it, without consent or compensation.

- Josh W. Comeau, via Salma Alam-Naylor

Tags: careers, ai, generative-ai, llms, josh-comeau, ai-ethics


Simon Willison's AI Notes, July 4, 2026

Fable's judgement

Media report published by Simon Willison's AI Notes:

One of the most interesting tips I got from the Fireside Chat I hosted with Cat Wu and Thariq Shihipar from the Claude Code team at AIE on Wednesday was to let Fable (and to a certain extent Opus) use their own judgement rather than dictating how they should work.

The example they gave was testing. You can tell Fable "only use automated testing for larger features, don't update and run tests for small copy or design changes" - but it's better to just tell Fable to use its own judgement when deciding to write tests instead.

Jesse Vincent just gave me a related tip to help avoid burning too many of those valuable Fable tokens in the few days we have left before the prices go up. Tell Fable to use other models for smaller tasks, applying its own judgement about which model to use.

I prompted Claude Code just now with:

For all coding tasks use your judgement to decide an appropriate lower power model and run that in a subagent

Claude saved this memory file in ~/.claude/projects/name-of-project/memory/delegate-coding-to-subagents.md:

    ---
    name: delegate-coding-to-subagents
    description: Simon wants coding tasks delegated to subagents running an appropriately lower-power model
    metadata:
      node_type: memory
      type: feedback
      originSessionId: 30068d78-43a9-4fb1-bb29-9799e18c526a
    ---

    Stated by Simon on 2026-07-03: "For all coding tasks use your judgement to decide an appropriate lower power model and run that in a subagent."

    Why: cost/efficiency - implementation work rarely needs the top-tier model; judgment, review, and synthesis stay with the main loop.

    How to apply: when a task in this project is primarily writing/editing code, spawn an Agent with a model override (sonnet for substantive implementation, haiku for trivial/mechanical edits) and a self-contained prompt; review the result in the main loop before committing. Design, auditing, data synthesis, and anything judgment-heavy stays in the main model. See also [[project-goals]].

So far it seems to be working well. I'm getting a ton of work done and my Fable allowance is shrinking less quickly than before.

Tags: ai, prompt-engineering, generative-ai, llms, anthropic, claude, coding-agents, claude-code, claude-mythos-fable

Simon Willison's AI Notes, July 3, 2026

llm-coding-agent 0.1a0

Media report published by Simon Willison's AI Notes: Release: llm-coding-agent 0.1a0

Another Fable 5 experiment. Now that my LLM library has evolved into more of an agent framework it's time to see what a simple coding agent would look like built on it.

I started a new Python library using my python-lib-template-repository GitHub template repository, then ran these two prompts (here's the Claude Code for web transcript):

Write a spec.md for this project - it will depend on the latest “llm” alpha from PyPI and implement a Claude code style coding agent complete with tools for reading and editing files and executing commands

Then:

Commit the spec, then build it using red/green TDD in a series of sensible commits (each with passing tests and updated docs) - occasionally manually test it using the OpenAI API key in your environment

Here's the spec, the resulting README file, and the sequence of commits. I've shipped a slop-alpha to PyPI, so you can run the new agent like this:

uvx --prerelease=allow --with llm-coding-agent llm code

It's pretty good for a first attempt! Here's the (Fable-authored) README, which lists recipes like llm code --yolo and llm code --allow "pytest*" --allow "git diff*". It also presents a Python API based around a CodingAgent(model="gpt-5.5", root="/path", approve=True).run("Fix the failing test in tests/test_parser.py") class which I didn't ask for but I'm delighted to see implemented.

Here's the suite of tools it implemented, listed using uvx ... llm tools:

CodingTools_edit_file(path: str, old_string: str, new_string: str, replace_all: bool = False) -> str
  Replace an exact string in a file. old_string must match the file contents exactly (including whitespace) and must identify a unique location unless replace_all is true. Returns a diff of the change so it can be verified.

CodingTools_execute_command(command: str, timeout: int = 120) -> str
  Run a shell command in the session root directory. Returns combined stdout and stderr followed by an Exit code line. timeout is in seconds (maximum 600); on timeout the whole process tree is killed.

CodingTools_list_files(pattern: str = '**/*', path: str = '.') -> str
  List files matching a glob pattern, newest first. Skips hidden directories, node_modules, __pycache__ and (in a git repository) anything covered by .gitignore. Returns at most 200 paths relative to the searched directory.

CodingTools_read_file(path: str, offset: int = 0, limit: int = 2000) -> str
  Read a text file, returning numbered lines like cat -n. Paths are relative to the session root. Use offset (0-based first line) and limit (max lines) to page through files too large to read in one call.

CodingTools_search_files(pattern: str, path: str = '.', glob: str = None, max_results: int = 100) -> str
  Search file contents for a regular expression. Returns matches as path:line_number:line, capped at max_results. Use glob (e.g. "*.py") to restrict which files are searched.

CodingTools_write_file(path: str, content: str) -> str
  Create or overwrite a file with the given content. Parent directories are created as needed. Prefer edit_file for modifying existing files.

I tried it out by running llm code --yolo and then prompting:

mkdir /tmp/demo and then in that folder create a simple swiftui CLI app for telling the time in ascii art

Here's the transcript, in which GPT-5.5 reasoning notes that "SwiftUI isn't suitable for a true CLI" and then builds an app that outputs this on swift run AsciiTime:

█ █████ ████ █ █ ███ ██ █ █ █ ██ █ ██ █ █ █ ████ ███ █ █ █ █ █ █ █ █ █ █ █ ███ ████ ████ ███ ███ █████

Tags: projects, ai, generative-ai, llm, llm-tool-use, coding-agents, claude-code, claude-mythos-fable
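The edit_file description above (an exact match, unique unless replace_all is set, with a diff returned) can be illustrated with a minimal sketch. This is not the code from llm-coding-agent: the function name edit_text, its error messages, and the choice to operate on an in-memory string rather than a file path are all assumptions made for illustration.

```python
import difflib


def edit_text(content: str, old_string: str, new_string: str,
              replace_all: bool = False) -> tuple[str, str]:
    """Hypothetical helper: replace an exact substring, return (new_content, diff)."""
    if not old_string:
        raise ValueError("old_string must not be empty")
    count = content.count(old_string)
    if count == 0:
        raise ValueError("old_string not found")
    if count > 1 and not replace_all:
        # Refuse ambiguous edits unless the caller opts in to replacing all matches.
        raise ValueError(f"old_string matches {count} locations")
    if replace_all:
        updated = content.replace(old_string, new_string)
    else:
        updated = content.replace(old_string, new_string, 1)
    # A unified diff lets the caller verify exactly what changed.
    diff = "".join(difflib.unified_diff(
        content.splitlines(keepends=True),
        updated.splitlines(keepends=True),
        fromfile="before",
        tofile="after",
    ))
    return updated, diff


new_text, diff = edit_text("a = 1\nb = 2\n", "b = 2", "b = 3")
print(diff)
```

Rejecting a non-unique match by default forces the caller to supply enough surrounding context to pin down one location, which is presumably why the tool description insists on it.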

Simon Willison's AI Notes, July 3, 2026

Using DSPy to evaluate and improve Datasette Agent's SQL system prompts

Media report published by Simon Willison's AI Notes: Research: Using DSPy to evaluate and improve Datasette Agent's SQL system prompts

One of this morning's AIE keynotes covered dspy, which reminded me I've been meaning to see if it could help me improve the system prompt used by Datasette Agent - so I fired off an asynchronous research task in Claude Code for web using Claude Fable 5:

Pip install the latest Datasette alpha and datasette-agent and dspy - then figure out how to use dspy to evaluate and improve the main system prompts used by Datasette Agent for the feature where it can execute read only SQL queries to answer user questions about data.

Fable chose to test using GPT 4.1 mini and nano, and identified several promising looking directions for improvements. I particularly like this one:

The schema listing gives only table names; the "don't call describe_table if you already have the information" advice caused column-name guessing (page_count, o.order_id, first_name) and error-retry loops in baseline traces. Either include column names in the prompt's schema listing or soften that advice.

Tags: ai, datasette, generative-ai, llms, evals, dspy, datasette-agent, claude-mythos-fable
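One of the two suggested fixes, including column names in the schema listing, might look something like the sketch below. It uses only the standard library's sqlite3 module and is not taken from Datasette Agent; the function name schema_listing, the output format, and the example tables are assumptions for illustration.

```python
import sqlite3


def schema_listing(conn: sqlite3.Connection) -> str:
    """Hypothetical helper: one line per table, formatted as 'table(col1, col2, ...)'."""
    tables = [row[0] for row in conn.execute(
        "SELECT name FROM sqlite_master "
        "WHERE type = 'table' AND name NOT LIKE 'sqlite_%' ORDER BY name"
    )]
    lines = []
    for table in tables:
        quoted = table.replace('"', '""')
        # PRAGMA table_info rows are (cid, name, type, notnull, dflt_value, pk).
        cols = [row[1] for row in conn.execute(f'PRAGMA table_info("{quoted}")')]
        lines.append(f"{table}({', '.join(cols)})")
    return "\n".join(lines)


# Made-up example schema, chosen to echo the guessed column names in the quote.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE books (id INTEGER PRIMARY KEY, title TEXT, page_count INTEGER)")
conn.execute("CREATE TABLE orders (order_id INTEGER PRIMARY KEY, book_id INTEGER)")
print(schema_listing(conn))
```

With a listing like this in the prompt, the model would have the real column names in front of it and less reason to guess them.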

Simon Willison's AI Notes, July 3, 2026

Understand to participate

Media report published by Simon Willison's AI Notes: I saw Geoffrey Litt speak at AIE yesterday, and one framing he used particularly resonated with me:

Understand to participate

Geoffrey was talking about the challenge of collaborating with coding agents as they construct increasingly large and sophisticated changes, and the need to avoid taking on cognitive debt as your understanding drifts from how the code actually works. His argument is that you need to understand the code to a depth that enables you to participate further with the model:

You can learn what the agent is doing to make sure you can be an active participant in the creative process. [...] You need a rich set of concepts in your mind to think creatively and fluently about how to move something forward. If you're lacking that fluency, your ability to participate in the project is meaningfully limited.

The AIE talks are all recorded - all 300+ of them! - and should be trickling out over the next three weeks. Geoffrey's is one that I recommend catching on YouTube. Geoffrey also published a thread version of his talk on Twitter.

Tags: ai, generative-ai, llms, geoffrey-litt, coding-agents, cognitive-debt

Simon Willison's AI Notes, July 1, 2026

Quoting Anthropic

Media report published by Simon Willison's AI Notes: We’ve received notice that the Department of Commerce has lifted export controls on Claude Fable 5 and Mythos 5. We'll begin restoring access tomorrow, and will share an update soon. - Anthropic, on Twitter

Tags: ai, generative-ai, llms, anthropic, claude, claude-mythos-fable

Simon Willison's AI Notes, July 1, 2026

Nano Banana 2 Lite

Media report published by Simon Willison's AI Notes: Nano Banana 2 Lite. Also known as Gemini 3.1 Flash Lite Image (gemini-3.1-flash-lite-image in their API), this is the "fastest and cheapest Gemini image model, engineered for velocity and scale".

I used AI studio to run this prompt:

Do a where's Waldo style image but it's where is the raccoon holding a ham radio

I like that one better than the results I got from the other Nano Banana models when I tried this back in April. It spelled Forest Festival wrong in two different ways though.

Via Hacker News

Tags: google, ai, generative-ai, llms, gemini, text-to-image, llm-release, nano-banana

Simon Willison's AI Notes, July 1, 2026

What's new in Claude Sonnet 5

Media report published by Simon Willison's AI Notes: What's new in Claude Sonnet 5. Claude Sonnet 5 came out this morning. I always head straight for the "what's new" developer docs because they tend to have more actionable information than the official announcement post.

Anthropic say of Sonnet 5 that "its performance is close to that of Opus 4.8, but at lower prices". The system card helps explain how they were able to release the model without being blocked by the US government:

Sonnet 5 is significantly less capable at cyber tasks than Mythos 5: its safeguards are thus similar to those we apply to Opus 4.7 and Opus 4.8 (models that are more capable than Sonnet 5 but much less capable than Mythos 5).

Of note from the "what's new" API changes:

- Sampling parameters temperature, top_p, top_k are no longer supported.
- It has a 1 million token context window and 128,000 maximum output tokens.
- It features "the same set of tools and platform features as Claude Sonnet 4.6".
- Adaptive thinking is on by default, unless you specify "thinking": {type: "disabled"}.
- The pricing is the same as Sonnet 4.6: $3/million input, $15/million output, with an introductory discount to $2/$10 until 31st August.
- But... the model has a new tokenizer, where "The same input text produces approximately 30% more tokens than on Claude Sonnet 4.6." - effectively a 30% price increase.

I used my Claude Token Counter tool to try out the new tokenizer.
Here are my results for several larger documents:

Document | Sonnet 4.6 | Opus 4.7 | Sonnet 5
Universal Declaration of Human Rights (English) | 2,356 | 3,347 (1.42x) | 3,341 (1.42x)
Universal Declaration of Human Rights (Spanish) | 3,572 | 4,753 (1.33x) | 4,747 (1.33x)
Universal Declaration of Human Rights (Chinese, Mandarin Simplified) | 3,334 | 3,366 (1.01x) | 3,360 (1.01x)
sqlite_utils/db.py (4,279 lines of Python) | 44,014 | 56,118 (1.28x) | 56,113 (1.27x)

So the new tokenizer is roughly 1.4x more expensive for English, 1.33x for Spanish, 1.28x for Python code and effectively the same cost for Simplified Mandarin.

Here's the pelican. It's nothing to write home about. Sonnet 5 thinks it looks like a goose.

Via Hacker News

Tags: ai, generative-ai, llms, anthropic, claude, llm-pricing, pelican-riding-a-bicycle, llm-release
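The effective price change can be checked with a little arithmetic on the reported token counts. The sketch below takes the Sonnet 4.6 and Sonnet 5 counts from the results above and the $3/million input list price quoted for both models; the variable and function names are made up for illustration, and the introductory discount is ignored.

```python
# (Sonnet 4.6 tokens, Sonnet 5 tokens) for each document in the reported results.
counts = {
    "UDHR (English)": (2356, 3341),
    "UDHR (Spanish)": (3572, 4747),
    "UDHR (Chinese)": (3334, 3360),
    "sqlite_utils/db.py": (44014, 56113),
}

INPUT_PRICE_PER_MILLION = 3.00  # USD list price, the same for both models


def input_cost(tokens: int) -> float:
    """Cost in USD of sending this many input tokens at the list price."""
    return tokens * INPUT_PRICE_PER_MILLION / 1_000_000


# With an unchanged per-token price, the cost ratio equals the token-count ratio.
ratios = {name: new / old for name, (old, new) in counts.items()}

for name, ratio in ratios.items():
    old, new = counts[name]
    print(f"{name}: {ratio:.2f}x (${input_cost(old):.4f} -> ${input_cost(new):.4f})")
```

Because the per-token price is the same for both models, the multiplier depends only on how many more tokens the new tokenizer produces for a given text.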

Simon Willison's AI Notes, July 1, 2026

The AI Compass

Media report published by Simon Willison's AI Notes: The AI Compass. This political compass style quiz by bambamramfan is pretty neat - answer 29 questions about AI and AI ethics to see which of the 30 archetypes you best fit. I'm impressed that my answers on my first time through the quiz categorized me as "The Garage Tinkerer", patron saint myself!

It's implemented as a single page React app using the <script type="text/babel"> trick to avoid the need for a build step. Here's the code.

Via @erisianrite.com

Tags: ai, generative-ai, llms, ai-ethics
