This article is designed to give people with no computer science background some insight into how ChatGPT and similar AI systems (GPT-3, GPT-4, Bing Chat, etc.) work.
1. What is Artificial Intelligence?
But first, let’s start with some basic terminology that you are probably hearing a lot. What is artificial intelligence?
- Artificial Intelligence: An entity that performs behaviors that a person might reasonably call intelligent if a human were to do something similar.
It is a bit problematic to define artificial intelligence by using the word “intelligent”, but no one can agree on a good definition of “intelligent”. However, I think this still works reasonably well. It basically says that if we look at something artificial and it does things that are engaging and useful and seem to be somewhat non-trivial, then we might call it intelligent. For example, we often ascribe the term “AI” to computer-controlled characters in computer games. Most of these bots are simple pieces of if-then-else code (e.g., “if the player is within range then shoot else move to the nearest boulder for cover”). But if they are doing the job of keeping us engaged and entertained, and not doing any obviously stupid things, then we might think they are more sophisticated than they are.
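To make the bot example concrete, here is a minimal sketch of that single if-then-else rule in Python. The function name `bot_action` and the `weapon_range` value are assumptions made up for illustration; no real game is being quoted.

```python
def bot_action(distance_to_player: float, weapon_range: float = 10.0) -> str:
    """Pick the bot's next move with one if-then-else rule.

    weapon_range is a hypothetical number chosen for this sketch.
    """
    if distance_to_player <= weapon_range:
        # The player is within range, so shoot.
        return "shoot"
    # Otherwise, head for cover.
    return "move to the nearest boulder for cover"


print(bot_action(4.0))
print(bot_action(25.0))
```

That is the entire “AI”: one comparison and two possible outcomes.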
Once we understand how something works, we might be less impressed, because we expected something more sophisticated behind the scenes. It all depends on how much you know about what is going on behind the scenes.
The key point is that artificial intelligence is not magic. And because it is not magic, it can be explained.
So let’s get into it.
2. What is Machine Learning?
Another term you will often hear associated with artificial intelligence is machine learning.
- Machine Learning: A means by which to create behavior by taking in data, forming a model, and then executing the model.
Sometimes it is too hard to manually create a bunch of if-then-else statements to capture some complicated phenomenon, like language. In this case, we try to find a bunch of data and use algorithms that can find patterns in the data to model.
But what is a model? A model is a simplification of some complex phenomenon. For example, a model car is just a smaller, simpler version of a real car that has many of the attributes but is not meant to completely replace the original. A model car might look real and be useful for certain purposes, but we can’t drive it to the store.
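The three steps in the machine learning definition (take in data, form a model, execute the model) can be sketched with a toy example. Everything here is an assumption for illustration: the data points are made up, and the “model” is just a straight line fitted to them, which is far simpler than anything used for language.

```python
# Step 1: take in data. These (input, output) pairs are invented for the sketch.
data = [(1.0, 52.0), (2.0, 54.0), (3.0, 56.0), (4.0, 58.0)]


def form_model(points):
    """Step 2: form a model. Here the model is the best-fit straight line,
    described by just two numbers: a slope and an intercept."""
    n = len(points)
    mean_x = sum(x for x, _ in points) / n
    mean_y = sum(y for _, y in points) / n
    slope = (
        sum((x - mean_x) * (y - mean_y) for x, y in points)
        / sum((x - mean_x) ** 2 for x, _ in points)
    )
    intercept = mean_y - slope * mean_x
    return slope, intercept


def execute_model(model, x):
    """Step 3: execute the model on an input it has never seen."""
    slope, intercept = model
    return slope * x + intercept


model = form_model(data)
print(model)
print(execute_model(model, 5.0))
```

The line is a simplification of the data in the same way a model car is a simplification of a real car: two numbers stand in for the whole pattern, and they are useful for making a guess about a new input.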
Just like we can make a smaller, simpler version of a car, we can also make a smaller, simpler version of human language. We use the term large language models because these models are, well, large, from the perspective of how much memory is required to use them. The largest models in production, such as ChatGPT, GPT-3, and GPT-4, are large enough that they require massive supercomputers running in data centers to create and run.
3. What is a Neural Network?
There are many ways to learn a model from data. The neural network is one such way. The technique is roughly based on how the human brain is made up of a network of interconnected brain cells called neurons that pass electrical signals back and forth, somehow allowing us to do all the things we do. The basic concept of the neural network was invented in the 1940s and the basic concepts on how to train them were invented in the 1980s. Neural networks are very inefficient, and it wasn’t until around 2017 that computer hardware was good enough to use them at large scale.
But instead of brains, I like to think of neural networks using the metaphor of electrical circuitry. You don’t have to be an electrical engineer to know that electricity flows through wires and that we have things called resistors that make it harder for electricity to flow through parts of a circuit.
Imagine you want to make a self-driving car that can drive on the highway. You have equipped your car with proximity sensors on the front, back, and sides. The proximity sensors report a value of 1.0 when there is something very close and report a value of 0.0 when nothing is detectable nearby.
You have also rigged your car so that robotic mechanisms can turn the steering wheel, push the brakes, and push the accelerator. When the accelerator receives a value of 1.0, it uses maximum acceleration, and 0.0 means no acceleration. Similarly, a value of 1.0 sent to the braking mechanism means slam on the brakes and 0.0 means no braking. The steering mechanism takes a value of -1.0 to +1.0, with a negative value meaning steer left, a positive value meaning steer right, and 0.0 meaning keep straight.
You have also recorded data about how you drive. When the road in front is clear, you accelerate. When there is a car in front, you slow down. When a car gets too close on the left, you turn to the right and change lanes. Unless, of course, there is a car on your right as well. It’s a complex process involving different combinations of actions (steer left, steer right, accelerate more or less, brake) based on different combinations of sensor information. Now you have to wire up the sensors to the robotic mechanisms. How do you do this? It isn’t clear. So you wire up every sensor to every robotic actuator.
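The “wire up everything to everything” idea can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the names `SENSORS`, `ACTUATORS`, `wires`, and `drive` are invented here, and every wire is given the same arbitrary strength of 0.5 (think of it as how easily electricity gets through that wire), so the output is not sensible driving.

```python
SENSORS = ["front", "back", "left", "right"]
ACTUATORS = ["steering", "brake", "accelerator"]

# One wire from every sensor to every actuator: 4 x 3 = 12 wires.
# The strength 0.5 is an arbitrary placeholder, not a tuned value.
wires = {(sensor, actuator): 0.5 for sensor in SENSORS for actuator in ACTUATORS}


def drive(readings, wires):
    """Send each sensor reading down every wire and add up what arrives
    at each actuator."""
    signals = {}
    for actuator in ACTUATORS:
        signals[actuator] = sum(
            readings[sensor] * wires[(sensor, actuator)] for sensor in SENSORS
        )
    return signals


# Something is very close in front; nothing is detected elsewhere.
readings = {"front": 1.0, "back": 0.0, "left": 0.0, "right": 0.0}
signals = drive(readings, wires)
print(len(wires))
print(signals)
```

With identical placeholder strengths, the car brakes, accelerates, and steers all at once when something is in front of it, which is why simply connecting everything is not enough on its own.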