https://x.com/valhalla_dev/status/1830123773116236093
Sometimes it pays to listen to the cool kids. No, you shouldn’t start doing drugs, don’t listen to those cool kids, but when the ML nerds say to listen to Karpathy, you should.
Like I mentioned in the last study sesh, I failed stats 1.5 times (veeery nearly failed it the second time around) and struggled through my other college math coursework. One of the things that kept me from starting AI/ML earlier was being afraid of the math. Algebra kills me, as does trigonometry and most other types of mathematics.
What the first taste of Karpathy’s teaching style taught me was that I straight up did not have the right types of teachers in high school and college. In a couple minutes, he explained derivatives better than any of my college professors had in full semesters.
https://x.com/valhalla_dev/status/1830125171887190431
I could write out a whole blog post about how good Karpathy is at teaching math but you’re all probably wanting me to cut to the chase here, so let’s get into the learnings.
The first Karpathy video I watched was The spelled-out intro to neural networks and backpropagation.
https://www.youtube.com/watch?v=VMj-3S1tku0&list=WL&index=4
It’s… really good. In it he walks you through what a neural network actually is, builds out the computation graphs, works through the math at an intuitive (not just symbolic) level, and explains backpropagation, first manually, then in a more systematic and automated way. He doesn’t leave the math a mystery, but he also doesn’t get bogged down in it.
He had a great explanation of derivatives in here that made it click: if you’re looking for the derivative of y with respect to x (dy/dx), you’re looking for the slope of the function y at the point x. You end up with a function like this.
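From memory, the function in question is just the standard limit definition of the derivative (this is textbook calculus, not something specific to the video):

f'(x) = lim as h → 0 of ( f(x + h) − f(x) ) / h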
This made no sense to me in college, but Karpathy’s explanation was fantastic. Take a very small, arbitrary h, add it to x inside the function (so you compute f(x + h)), subtract the original f(x), divide by h, and look at the output. Then make h smaller. Then a little smaller. The result should approach a particular number, and that’s your derivative.
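Here’s that experiment as a few lines of Python. The function and the value of x are my own arbitrary picks, not necessarily the ones the video uses:

```python
# Numerically estimate the derivative of f at x by shrinking the step h.
def f(x):
    return 3 * x**2 - 4 * x + 5  # an arbitrary function to poke at

x = 3.0
for h in [0.1, 0.001, 0.00001, 0.0000001]:
    slope = (f(x + h) - f(x)) / h  # rise over run for a tiny step
    print(h, slope)
```

The printed slopes home in on 14, which is what the symbolic derivative 6x − 4 gives at x = 3, and you never had to touch a rule of calculus to see it.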
This is a very “practical computer science” way to explain math, and I love it. Don’t worry about the symbolic theory behind it; we have the most complex computers in the history of the world, so we can literally just calculate it a few times and let our brains intuit it.
Backpropagation, or starting at the output of a function and walking backward through its steps to determine the gradient (derivative) at each step, is how these derivatives get applied. He works through it manually at first before explaining how to systematize it. It uses the chain rule, which he explained far better than I have and infinitely better than my professors did, to propagate backwards and calculate the gradients.
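To make the chain rule concrete, here’s a hand-worked backward pass over a tiny expression. The numbers are mine, chosen to keep the arithmetic easy; the video works through something very similar with its own example:

```python
# Forward pass: build up a small expression one step at a time.
a, b, c = 2.0, -3.0, 10.0
e = a * b          # e = -6.0
d = e + c          # d = 4.0
L = d * -2.0       # L = -8.0 (treat -2.0 as some fixed multiplier)

# Backward pass: chain rule, starting from the output L.
dL_dL = 1.0              # base case: the output with respect to itself
dL_dd = -2.0 * dL_dL     # L = d * -2.0, so dL/dd = -2.0
dL_de = 1.0 * dL_dd      # d = e + c: addition just passes the gradient along
dL_dc = 1.0 * dL_dd
dL_da = b * dL_de        # e = a * b, so de/da = b
dL_db = a * dL_de        # ...and de/db = a

print(dL_da, dL_db, dL_dc)  # 6.0 -4.0 -2.0
```

Every step multiplies a local derivative (de/da = b, dd/de = 1, and so on) by the gradient flowing back from the step above it. That multiplication is the chain rule, and repeating it node by node through the graph is all backpropagation is.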
I didn’t really understand why he explained derivatives until he moved on to the network itself: the nodes within a neural network are just a series of mathematical functions with weights and biases, whose results are run through a “squashing function” that normalizes them. Backpropagating is the process of starting at the network’s output, finding the local derivative at each individual step, chaining it into the true derivative of the output with respect to that step, and labeling those “gradients” so you know how much each weight, bias, and node affects the output. Pair those observations with a loss function and you can figure out how to “teach” a neural network by changing its weights and biases.
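Here’s how those pieces fit for a single neuron: a weighted sum plus a bias, squashed by tanh, scored by a loss, and then nudged. This is my own condensed sketch with the gradients written out by hand; the video instead builds a small autograd engine (micrograd) that computes them for you:

```python
import math

# One neuron: two inputs, two weights, a bias, tanh as the squashing function.
x1, x2 = 2.0, 0.0            # inputs
w1, w2, b = -3.0, 1.0, 6.8   # the parameters we want to "teach"
target = 0.5                 # output we'd like the neuron to produce
lr = 0.05                    # learning rate: how big a nudge to take

for _ in range(20):
    # Forward pass.
    n = w1 * x1 + w2 * x2 + b    # weighted sum plus bias
    o = math.tanh(n)             # squashing function
    loss = (o - target) ** 2     # how wrong the output is

    # Backward pass: chain rule by hand.
    dloss_do = 2 * (o - target)
    do_dn = 1 - o ** 2           # derivative of tanh
    dloss_dn = dloss_do * do_dn
    dloss_dw1 = dloss_dn * x1
    dloss_dw2 = dloss_dn * x2
    dloss_db = dloss_dn * 1.0

    # Gradient descent: nudge each parameter against its gradient.
    w1 -= lr * dloss_dw1
    w2 -= lr * dloss_dw2
    b -= lr * dloss_db

print(loss)  # shrinks toward 0 as the neuron learns to hit the target
```

Stack layers of these neurons, let an autograd engine handle the backward pass, and you have the picture the rest of the video fills in.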