I used AI to write code for 3 months. It was a bad choice.

AI is not the villain, nor the hero, that the world wants it to be.

A couple months ago, I started a contract that would prove to be one of the most difficult software development undertakings I'd ever begun. Some day, there will be an NDA-friendly story time about what I built, why it was hard, etc., but for now, I'd like to spend some time extracting my learnings regarding SWE's best friends and worst enemies: ChatGPT and Copilot.

There is (rightfully) a huge conversation surrounding AI's growing place in the software development space. If you'd like to see a very basic example of this conversation, here is ThePrimeagen's coverage of Devin, one of the most recent and most over-hyped software-development-focused LLM agents.

If ChatGPT took the world by storm overnight, it took over the collective psyche of the software development space shortly thereafter. It seemed like software development as a profession went from the world's safest career field to one that would go extinct by the weekend. Every thought leader needed to make sure their hot take on the battle was heard as they screamed doom and gloom from the rooftops to every salivating follower and mainstream media reporter who would listen.

Many software development thought leaders took a different approach: AI will not take everyone's jobs, but it will have an overall negative effect on the profession as more developers churn out more garbage code regurgitated from their trained LLM of choice. This is the camp I side with most, but we will get into that later.

The point is, as with most conversations on The Internet ™️, the most viral takes are the ones at the extremes, while the truth and objective reality usually land squarely in the middle. I'm not going to be an apologist for most of the harms done by AI, and in general I'm what one might call an AI/LLM skeptic bordering on opposition, but I will say this...

I spent 3 months using AI as much as I could during development

... and I learned some things.

First, let's set the stage for how I used AI. I took several approaches.

I started off by purchasing Copilot, Microsoft's AI pair programmer. At first, the main thing I noticed was that prompting Copilot from Neovim (I use Neovim, btw) was pretty irritating. I didn't love the workflow, but I wanted to force myself to get used to it to get a real feel for how useful (or not) AI is for programmers.

I used ChatGPT as a sounding board and, at first, as a quick reference for APIs and libraries I wasn't used to. Here is where I need to reveal some extra context: this project is entirely in Rust, and relies heavily on some lesser-known crates. You'll understand why this context is important in a moment; let me set the scene.

The workflow took a while to get used to. Copilot specifically started off as that thing that kept getting in the way. I kept tabbing code into being that I didn't want, and the recommendations were fairly basic at first. I then learned to toggle it on and off, and I think that's when it became more useful. Whenever I'd have to make an API call, serialize data into a data type, or make library calls that were fairly straightforward, I'd turn Copilot on and let it do its thing before turning it back off. I found this to be a really great way to keep Copilot out of my way while still using it where it was pretty good: filling out boilerplate code.
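
To make "boilerplate" concrete, here's a rough sketch of the kind of code I'd happily let Copilot fill in. None of it is from the actual project (NDA and all): the struct, its fields and the endpoint are made up, and I'm assuming garden-variety serde and reqwest rather than the crates I was actually using.

```rust
use serde::Deserialize;

// Hypothetical data type -- the real project's types are under NDA.
#[derive(Debug, Deserialize)]
struct Invoice {
    id: u64,
    customer: String,
    total_cents: i64,
}

// A straightforward, predictable API call: fetch some JSON and deserialize it.
// Once the struct above exists, Copilot tab-completes this sort of thing just fine.
fn fetch_invoice(id: u64) -> Result<Invoice, reqwest::Error> {
    let url = format!("https://api.example.com/invoices/{id}");
    reqwest::blocking::get(url)?.json::<Invoice>()
}
```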

My usage of ChatGPT was a bit more nuanced. I used it as a straightforward replacement for Google, and for that it was fairly good. It was also decent at basic Rust iterator stuff ("I need to loop over this structure and that structure, mutate this vector, then return this type"), but from the beginning I made a point not to just copy/paste code but to read it, understand it, and apply it in the right way. ChatGPT's explanation of its own code was okay, if, ironically, a bit robotic, but it helped sometimes.
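
For the record, "basic Rust iterator stuff" means requests roughly shaped like the toy example below. This is my own made-up code, not anything from the project:

```rust
#[derive(Debug)]
struct LineItem {
    label: String,
    amount_cents: i64,
}

// Roughly the shape of request I'd hand ChatGPT: walk two collections, mutate
// a vector along the way, and hand back a new type.
fn labels_over_threshold(a: &[LineItem], b: &[LineItem], threshold: i64) -> Vec<String> {
    let mut labels: Vec<String> = a
        .iter()
        .chain(b.iter())
        .filter(|item| item.amount_cents > threshold)
        .map(|item| item.label.clone())
        .collect();
    labels.sort(); // the "mutate this vector" part
    labels
}
```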

(Image caption: For folks who are not artistically inclined but want to write blogs, Midjourney is a godsend.)

Then it went badly

Let's start with where Copilot went wrong by going back to a criticism I noted earlier: AI will make software developers write worse code. As I kept Copilot on and used it for boilerplate code, I started realizing some things:

  • I was not paying attention to my own data model. After all, Copilot is writing it for me, and if it needs to be changed, Copilot can change it...
  • ... which meant I was more frequently duplicating data, not paying attention to data types, etc. This led to bad implementations (.clone() galore! See the sketch after this list) and just messiness all around.
  • I was also just straight up not learning anything. When you say "Copilot only writes my boilerplate code" what you end up with is a rapidly expanding definition of "boilerplate." At first it was just struct serialization, then it was library calls, then it was entire functions. Now all of a sudden, I had all of the worst parts of working with someone else's codebase (unfamiliar code, a lack of knowledge on the context in which it was written, etc.) and none of the benefits (learning from their coding style, being able to talk to the developer, etc.) and my efficiency dropped compared to what it would be if I'd just learned the thing.
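
To illustrate the .clone() problem with a toy example (again, not actual project code): when I stopped paying attention to ownership, I'd accept code shaped like the first function below, when what I actually wanted was the second.

```rust
#[derive(Clone)]
struct Record {
    name: String,
    score: u32,
}

// The kind of thing I kept accepting: deep-cloning the whole vector (and every
// String inside it) just to read some numbers out of it.
fn total_score_sloppy(records: &Vec<Record>) -> u32 {
    let copies: Vec<Record> = records.clone(); // pointless allocation
    copies.iter().map(|r| r.score).sum()
}

// What I'd write when actually paying attention to the data model: borrow,
// don't copy.
fn total_score(records: &[Record]) -> u32 {
    records.iter().map(|r| r.score).sum()
}
```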

What saved me from the detriments of these traps was actually switching from Neovim to Helix. Helix doesn't have a Copilot integration at the time of writing, but everything about it is awesome, so using Helix became my way of detoxing from Copilot cold turkey.

ChatGPT had much more significant and glaring problems. Frankly, any issue beyond the very basics of the Rust programming language and the very basics of the more popular libraries led to significant stumbling blocks for the LLM we all know and love. When I asked ChatGPT questions about the data model behind a lesser-known PDF parsing crate, it straight up hallucinated entire data types, library functions and more. When I told it that it was hallucinating, and even fed it a specific version number for the crate, the hallucinations would worsen. Asking it about complex issues like passing structs and self references into thread handles would cause it to get everything hysterically wrong.
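
I can't share the actual crate or code, but here's a generic, hand-rolled sketch (my own toy types, nothing from the project) of the kind of thing I was asking about: sharing a struct across spawned threads. You can't just hand &self to std::thread::spawn, because the closure has to be 'static; the standard pattern is an Arc, plus a Mutex if the threads mutate anything.

```rust
use std::sync::{Arc, Mutex};
use std::thread;

// Stand-in for the real (NDA'd) parser type.
struct Parser {
    pages_done: Mutex<usize>,
}

impl Parser {
    // Taking `self: Arc<Self>` lets each spawned thread own a clone of the Arc,
    // which satisfies the `'static` bound on `thread::spawn`.
    fn parse_all(self: Arc<Self>, page_count: usize) {
        let mut handles = Vec::new();
        for _ in 0..page_count {
            let me = Arc::clone(&self);
            handles.push(thread::spawn(move || {
                // ... per-page parsing work would go here ...
                *me.pages_done.lock().unwrap() += 1;
            }));
        }
        for handle in handles {
            handle.join().unwrap();
        }
    }
}

fn main() {
    let parser = Arc::new(Parser { pages_done: Mutex::new(0) });
    Arc::clone(&parser).parse_all(8);
    println!("pages parsed: {}", parser.pages_done.lock().unwrap());
}
```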

I have some theories on why ChatGPT dealt so badly with this problem set. First, Rust is only recently seeing its heyday, which means discussion about the libraries, source code examples and more are likely pretty sparse in LLM training data sets compared to JavaScript, PHP, etc. The libraries I was using were fairly new and fairly niche, which trims down the relevant training data even further. Finally, Rust has a specific way of doing things, in dealing with memory, function calls, types, etc. This means ChatGPT has more rules to try to grok, and it does not do that very well. In a dynamically typed language like plain JavaScript or Python, this problem becomes much less pronounced, since there are far fewer rules governing what a "good answer" looks like.
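
A toy illustration of what I mean by "more rules" (again, my own example, not project code): the obvious dynamic-language move of mutating a collection while you loop over it simply doesn't compile in Rust, so an LLM pattern-matching on loosely related training data happily produces plausible-looking code that the borrow checker immediately rejects.

```rust
fn drop_low_scores(scores: &mut Vec<u32>) {
    // The naive translation from a dynamic language -- remove elements while
    // iterating -- is rejected by the borrow checker:
    //
    // for (i, score) in scores.iter().enumerate() {
    //     if *score < 10 {
    //         scores.remove(i); // error[E0502]: cannot borrow `*scores` as mutable
    //     }
    // }
    //
    // Rust pushes you toward an API built for the job instead:
    scores.retain(|score| *score >= 10);
}
```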

As for code quality, of course it suffered. ChatGPT and Copilot are a great way to generate okay code that you as the developer will then have to go back and refactor. The only folks who will be replaced by ChatGPT and Copilot in the near term are, frankly, the developers whose sole purpose in life is to generate code that doesn't matter and could have been written by an LLM, a monkey with a traumatic brain injury or a dolphin.

How I will use AI in the future

Even after all this negativity, I'm not going to completely toss out the creepy robot baby with the bathwater. I'm probably going to abandon Copilot entirely, because I believe it simply leads to worse code and a worse understanding of code over time. That was my harsher takeaway from this venture. Even though ChatGPT was more aggressively wrong, to a much more detrimental effect, I'm still going to use it... just differently.

LLMs, as I understand them, cannot and will not ever create the AGI boogeybot that we are all afraid of (or, in the case of those e/acc weirdos, strangely excited for), because all they really do is make mathematical inferences over large data sets and produce renditions of already-existing information in your chosen language. What this means is that the code an LLM writes is not actually written by the LLM... it's stitched together by the LLM from a wide array of other code snippets that OpenAI has... totally not stolen from the open source community.

This means LLMs are great for tasks that essentially rely on collating data and rendering it in a certain form. They're great for synthesizing information and making it presentable and consumable. That's why they're fairly good at writing listicle blogs, regurgitating documentation in a way that doesn't immediately put you to sleep, etc.

LLMs are very bad at anything that relies more heavily on intuition, creative thinking, etc., simply because they cannot intuit or create; they can only regurgitate previously existing information.

As such, I'll be using ChatGPT as a stand-in for Google: a quick way to get probably correct answers about libraries, functions and requirements that are pretty well-known. I'm not going to use it to write code for me. I'm going to ask it to summarize some documentation I find, but I'm not going to ask it to create that documentation itself.

AI is not going to take your jobs. It's not even all that good at reading and regurgitating documentation... though most developers I know are pretty bad at that too.