For many reasons, I’m generally bearish on the idea that AGI, or “artificial general intelligence,” will emerge from the LLMs and algorithms most of us are familiar with by now. I wrote almost two years ago about ChatGPT and my views are pretty much the same as they were then. However, in my work as a software engineer, I’ve started using LLMs pretty regularly in order to complete certain tasks1 more quickly. The results are a bit mixed, but it’s useful often enough (and the tasks are tedious enough) that I’d miss it if it were gone.
A lot of people on Twitter have been talking about their experiences using Cursor, an IDE2 with an LLM built in, to quickly build simple apps and tools. The promise: describe your idea in plain English, and the LLM will build it — no coding skill required. These promises often come with claims that those of my professional ilk will be out of jobs soon. I’m pretty skeptical about these claims. Writing code usually feels like the easy part of my job; the hard parts are considering interactions between complex systems, managing limited resources, and defining the correct set of operations. Tools that produce code mostly for closed systems with well-defined constraints don’t threaten me in the slightest.
Also, in a previous lifetime, I built a large part of my professional identity on helping people from non-technical backgrounds realize their ideas for video games and interactive art using popular “low-code” tools like GameMaker and Stencyl. Given this, it would be enormously hypocritical of me to look down my nose at anyone who takes a crack at LLM coding. Code-generation tools, which allow programmers to work at higher levels of abstraction, have been with us almost as long as code has. Over the years they’ve gotten more and more sophisticated, but they’ve always come with the same set of claims: first, that they’ll open up possibilities for people with fewer technical skills (and push skilled programmers out of the job market). Second, that the people who use these tools are not “real” programmers and are destined for failure once they need to build something more complex than a toy demo. Back around the time I wrote this article, when a lot of the discourse was about heavily-scaffolded frameworks like Ruby on Rails, I thought that the gatekeeping was mean-spirited and silly — but I also thought that coding as a skill was probably still necessary for most people with serious aspirations in that arena. People with ideas they’re invested in realizing, rather than a wish to try out a tool for the sake of it, tend to outgrow the “no coding” sandbox very quickly.
Despite maintaining an AI-skeptical point of view, I wanted to make sure that my general (often negative) feelings about AI weren’t leading me to underestimate its capabilities, especially since the boosters claim so often that the models have gotten dramatically better in the past few months. After all, self-driving cars have surpassed what, a few years ago, I thought they’d ever be capable of. But my first attempt at a larger AI-assisted coding task didn’t go very well. I asked for an implementation of a moderately complex algorithm from a research paper, combining tree-merging, node ranking, and graph search. The result was certainly “vibey,” with the broad strokes of the operations in place but large and important chunks missing or wrong. The better strategy turned out to be to ask it for implementations of the simpler individual components (producing a textbook breadth-first search is something it can, of course, do very well), and doing the more fiddly integration work myself.
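To give a sense of the kind of “textbook component” an LLM reliably gets right, here’s a sketch of breadth-first search in JavaScript. This is my own illustrative example, not the model’s actual output:

```javascript
// Breadth-first search over an adjacency-list graph.
// Returns the nodes reachable from `start`, in the order visited.
function bfs(graph, start) {
  const visited = new Set([start]);
  const order = [];
  const queue = [start];
  while (queue.length > 0) {
    const node = queue.shift(); // O(n) dequeue, but fine for a sketch
    order.push(node);
    for (const neighbor of graph[node] || []) {
      if (!visited.has(neighbor)) {
        visited.add(neighbor);
        queue.push(neighbor);
      }
    }
  }
  return order;
}
```

This is exactly the sort of self-contained, well-specified function the models produce flawlessly on the first try; it’s the glue between a dozen such pieces where they fall down.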
For my most recent attempt, where I used Cursor rather than just prompting an LLM, I did taste something of the magic that people are talking about, at least at first. But I’m still not especially worried about my job.
Years ago, I read online somewhere about a webapp that sounded like an interesting tool for writers. The idea was that it would feed you small “bites” of a longer work to edit individually. Instead of sitting down and confronting an intimidating bulk of text, you could work piece-by-piece in short sessions, and perhaps look at your words, removed from their original context, with fresh eyes. In my memory, this tool was called “Edit Bites,” but I’ve searched many times and have not been able to find a tool by that name, or indeed anything similar at all. This seemed like something that would be well within the scope of what Cursor/Claude could create, and something that would be personally useful to me if it worked.
I started by asking Claude if such a tool existed already. Here’s what I prompted:
Do any software tools exist for writers with the following functionality:
- choose a random small selection from a draft (a sentence, paragraph, or page) and show that selection in an edit window
- allow the author to edit the selection
- when the author is finished, save the changed text to the draft
This would be intended to make editing a large project less intimidating by surfacing small "chunks" one at a time. Does this tool exist?
Claude responded that it didn’t know of any such tool. It gamely encouraged me to pursue the idea (“Your idea could be quite valuable for writers who feel overwhelmed by large projects. It reminds me of the ‘snowflake method’ or other incremental editing approaches, but with a random element to keep things fresh.”). It suggested that this could be accomplished with a plugin, and I asked if it could explore that option further, specifying that I wanted something that would work with Google Docs. It generated some code for an Apps Script, as well as instructions on how to use it. I plugged it into my Google Doc and — it worked! Pretty much as I had imagined it. I was impressed.
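For flavor, the core logic of such a script might look something like the sketch below. This is a hypothetical reconstruction, not the code Claude actually generated; in the real Apps Script it would be wired up to the DocumentApp API and an HtmlService dialog, but the chunk-picking itself is plain JavaScript:

```javascript
// Split a draft into paragraph "chunks" and pick one at random to edit.
// `rng` is injectable so the choice can be made deterministic in tests.
function pickRandomChunk(draftText, rng = Math.random) {
  const chunks = draftText
    .split(/\n+/)
    .map(p => p.trim())
    .filter(p => p.length > 0);
  if (chunks.length === 0) return null;
  const index = Math.floor(rng() * chunks.length);
  return { index, text: chunks[index] };
}

// Write an edited chunk back into the draft at the same position.
function saveChunk(draftText, index, editedText) {
  const chunks = draftText
    .split(/\n+/)
    .map(p => p.trim())
    .filter(p => p.length > 0);
  chunks[index] = editedText;
  return chunks.join('\n\n');
}
```

A few dozen lines like this, plus some dialog plumbing, is all the tool really needs at its simplest — which is why it’s squarely in the LLM sweet spot.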
But there were plenty of bugs once I started using it on my novel-in-progress, so I loaded the code into Cursor to see whether the AI could iterate and improve. Simple tasks were easy for the AI to handle, like changing the buttons and adjusting the writing workflow. Other tasks needed my guidance: the original way it chose random sentences for editing was slow, making the app appear to freeze or hang, so I asked it to re-implement that part with a more efficient method of my choosing. Certain other things it “pretended” to implement: I asked if the text immediately preceding and following the excerpt could be shown above and below the edit box, for context. Claude added some code to do this, but it simply didn’t work. When I prompted it to find and fix the problem, it made a show of supplying an improved version of the code, but the improved version didn’t work either. And when I asked it to make the dialog box look nicer and more polished, the result did not, I’m afraid, meet my aesthetic standards.
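The post doesn’t spell out which faster method I chose, but one plausible shape of the fix is worth sketching: instead of extracting every sentence from the entire document on each request (slow on a novel-length draft), pick a random paragraph by index first, then split only that one paragraph into sentences. This is an illustrative assumption, not the actual replacement code:

```javascript
// Pick a random sentence by first choosing a paragraph, then splitting
// only that paragraph — avoids scanning the whole draft per request.
function pickRandomSentence(paragraphs, rng = Math.random) {
  const p = paragraphs[Math.floor(rng() * paragraphs.length)];
  // Naive sentence split; real prose (abbreviations, dialogue) needs more care.
  const sentences = p.match(/[^.!?]+[.!?]+/g) || [p];
  return sentences[Math.floor(rng() * sentences.length)].trim();
}
```

One tradeoff: this slightly over-weights sentences that live in short paragraphs, since every paragraph is equally likely to be chosen. For an editing aid, that bias seemed like a fair price for not freezing the UI.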
Another issue was the rich text formatting, which wasn’t preserved after editing. I realized that I wasn’t likely to get good use out of the tool if I had to go back and re-apply the formatting by hand (although I suppose you could argue that my writing would be better if I had to repeatedly reconsider my italics). I asked if the edit box, which was plain text, could be changed into a WYSIWYG3 editor with formatting capabilities. It readily supplied one that appeared to work (using a third-party library, Quill). However, it didn’t actually do the job it needed to do, which was save the formatting back to the main document once I was done editing. When I asked for a fix, the result was akin to someone Googling the problem, grabbing some code from Stack Overflow, and pasting it into the project without regard for whether it actually addressed the issue or not. The AI was as likely to introduce new bugs as solve existing ones, and this tendency increased dramatically as I went on. When I tried to add a word counter showing the session’s progress, it broke editing entirely — and unnecessarily — by removing unrelated lines of code. The debugging all had to be done by me, the old-fashioned way, with step-throughs and log statements. And, at some point, I realized that trying to build a word processor inside another word processor was probably a losing proposition, and that it might be worthwhile to consider other approaches to the basic problem.
So, Claude plus Cursor was very good, even magical, at quickly generating a working prototype from a description of my idea, much faster than I could have done on my own. But its debugging skills were inadequate — I didn’t see a single instance of the AI correctly diagnosing and then fixing a bug in code it had written — and its performance declined dramatically the longer it spent as my “assistant.” This fits in with my previous assessment of LLMs: that their ability to produce human-mimicking text (and code is just a specialized form of text) is impressive at first, but that extended use will almost always reveal that text to flow not from “understanding” (by which I mean, an underlying mental model of the world and its workings that is being conveyed or expressed through that text) but from sophisticated regurgitation. This will not be news to anyone.
The simple version of the tool was actually reasonably useful, even if I had to go back and fix my italics. It was rewarding and almost relaxing to click through snippets of my novel and prune the unnecessary or poorly-chosen words. If I can get it into a shape I’m happy with, I may post the code to GitHub so others can try it out, and I might continue to use it for my own longer writing projects. This also won’t be the last time that I use LLM tools to prototype an idea. I encourage non-coders to give it a shot — if anything, it should make coding more approachable. It still doesn’t take much for a human to do a better job than an AI.
Examples: creating regular expressions, writing scripts to make a simple change across a large number of files, writing SQL queries (and simplifying large procedurally-generated queries), reformatting or pretty-printing text, identifying the differences between two blocks of text, generating examples of how to perform a common task in an unfamiliar language, etc.
Integrated Development Environment — software you use to write, build, and test your coding projects.
This stands for “What You See Is What You Get,” and refers to text editing where you can actually see the formatting as it would be printed — bold text looks bold, italics look slanted, etc. This is opposed to something like Markdown where you’d indicate italics _like this_.