It’s a commonplace of artificial intelligence to say that machine learning, which depends on large amounts of data, works by finding patterns in the data.
The phrase “finding patterns in data” has been a staple of fields such as data mining and knowledge discovery for years now, and it has been assumed that machine learning, and its deep learning variant in particular, simply continues the tradition of finding such patterns.
AI programs do, indeed, result in patterns, but, just as “The fault, dear Brutus, is not in our stars, but in ourselves,” those patterns are not something in the data; they are what the AI program makes of the data.
Almost all machine learning models work through a learning rule that changes the so-called weights, also called parameters, of the program as the program receives sample data and, optionally, labels attached to that data. It is the value of the weights that counts as “knowing” or “understanding.”
The pattern that is found is really a pattern of how the weights change. The weights simulate how real neurons are supposed to “fire,” following the principle formulated by psychologist Donald O. Hebb, which has become known as Hebbian learning: the idea that “neurons that fire together, wire together.”
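Hebb’s rule can be sketched in a few lines of code. This is a minimal, illustrative version (the function name and toy numbers are my own, not from any particular library): when an input unit and an output unit are active at the same time, the weight connecting them is strengthened.

```python
# A minimal sketch of Hebbian learning: when an input unit and an
# output unit are active together, the weight between them grows.

def hebbian_update(weights, inputs, outputs, lr=0.1):
    """Return new weights after one Hebbian step: dw = lr * x * y."""
    return [
        [w + lr * x * y for w, x in zip(row, inputs)]
        for row, y in zip(weights, outputs)
    ]

# Two input units, one output unit, all weights starting at zero.
w = [[0.0, 0.0]]
# Present the input pattern (1, 1) while the output unit fires (1):
# both connections are strengthened together.
w = hebbian_update(w, inputs=[1, 1], outputs=[1])
print(w)  # [[0.1, 0.1]]
```

The “pattern” here is nothing in the input itself; it is the trace left in the weights by repeated co-activation.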
Also: AI in sixty seconds
It is the pattern of weight changes that serves as the model of learning and understanding in machine learning, a point the founders of deep learning made explicit. Almost forty years ago, in one of deep learning’s founding texts, Parallel Distributed Processing, Volume I, James McClelland, David Rumelhart, and Geoffrey Hinton wrote:
What is stored is the connection strengths between the units that allow these patterns to be created […] If the knowledge is the strengths of the connections, learning must be a matter of finding the right connection strengths so that the right patterns of activation will be produced under the right circumstances.
McClelland, Rumelhart, and Hinton were writing for a select audience, cognitive psychologists and computer scientists, and they were writing in a very different time, a time when people didn’t make easy assumptions that everything a computer did represented “knowledge.” They were working at a time when AI programs couldn’t do much at all, and they were mainly concerned with how to produce a computation, any computation, from a fairly limited arrangement of transistors.
Then, starting with the rise of powerful GPU chips around sixteen years ago, computers really began to produce interesting behavior, capped by the historic ImageNet performance in 2012 of Hinton and his graduate students that marked the advent of deep learning.
Following those new computing achievements, the popular mind began to build all sorts of mythologies around AI and deep learning. There has been a rush of really bad headlines likening the technology to superhuman performance.
Also: Why are AI reports so bad?
The current era of AI has obscured what McClelland, Rumelhart, and Hinton focused on, namely the machine, and how it “creates” patterns, as they put it. They were intimately familiar with the mechanics of weights building a model in response to what was, in the input, merely data.
Why does all this matter? If the machine is the pattern maker, then the conclusions people draw about AI are probably mostly wrong. Most people assume that a computer program perceives a pattern in the world, which can cause people to defer judgment to the machine. If it produces results, it is thought, the computer must be seeing something that humans cannot see.
Except that a machine that builds patterns doesn’t see anything explicitly. It is building a model. This means that what is “seen” or “known” is not the same as the familiar, everyday sense in which humans speak of themselves as knowing things.
Instead of starting from the anthropocentric question, “What does the machine know?”, it is preferable to start from a more precise one: “What does this program represent in the relations of its weights?”
Depending on the task, the answer to this question takes several forms.
Consider computer vision. The convolutional neural network that underpins machine learning programs for image recognition and other visual perception tasks is composed of a collection of weights that measure the values of pixels in a digital image.
The pixel grid is already an imposition of a 2D coordinate system on the real world. Given that friendly abstraction of a coordinate grid, a neural network’s task of representation boils down to matching the strength of collections of pixels to a label that has been imposed, such as “bird” or “blue jay.”
In a scene containing a bird, or more specifically a blue jay, many things may be present, including clouds, sunshine, and passers-by. But the scene as a whole is not the thing. What matters to the program is the collection of pixels most likely to produce an appropriate label. The pattern, in other words, is a reductive act of focus and selection inherent in the activation of neural network connections.
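That act of focus and selection can be made concrete with a toy convolution. The sketch below is illustrative only (the scene and filter are invented for the example): a single filter slides over a pixel grid and responds strongly only where its weights line up with a matching local pattern, ignoring everything else in the “scene.”

```python
# Illustrative sketch: one convolutional filter selects a local pixel
# pattern from a grid and ignores the rest of the scene.

def convolve2d(image, kernel):
    """Valid-mode 2D cross-correlation of a grid with a square kernel."""
    k = len(kernel)
    out = []
    for i in range(len(image) - k + 1):
        row = []
        for j in range(len(image[0]) - k + 1):
            row.append(sum(
                image[i + di][j + dj] * kernel[di][dj]
                for di in range(k) for dj in range(k)
            ))
        out.append(row)
    return out

# A 4x4 "scene" whose only feature is an anti-diagonal line of pixels.
scene = [
    [0, 0, 0, 1],
    [0, 0, 1, 0],
    [0, 1, 0, 0],
    [1, 0, 0, 0],
]
# A filter whose weights "care about" that diagonal and nothing else.
diagonal_filter = [
    [0, 0, 1],
    [0, 1, 0],
    [1, 0, 0],
]
print(convolve2d(scene, diagonal_filter))  # [[0, 3], [3, 0]]
```

The filter’s output is high exactly where its weights overlap the diagonal: the “pattern” is the filter’s selection, not a property the grid announces on its own.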
You could say that a program like this doesn’t “see” or “perceive” so much as it filters.
Also: A new experiment: Does the AI really know cats or dogs — or whatever?
The same is true in games, where AI has mastered chess and poker. In chess, a game of full information, the machine learning task for DeepMind’s AlphaZero program boils down to working out a probability score, at each moment, of how much a potential next move will ultimately lead to a win, loss, or draw.
Since the number of potential future configurations of the game board cannot be calculated by even the fastest computers, the weights cut short the search for moves by doing what might be called a summary. The program summarizes the probability of success if one were to pursue several moves in a given direction, then compares this summary to the summary of potential moves in another direction.
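The idea of summarizing a direction of play rather than exhausting it can be sketched in a few lines. This is a deliberately crude stand-in, not AlphaZero’s actual machinery: the function names, toy outcome tables, and sampling scheme are all invented for illustration. Instead of enumerating every continuation after a candidate move, the sketch averages the values of a handful of sampled continuations and compares the averages.

```python
import random

# Illustrative sketch, not AlphaZero's method: summarize a candidate
# move by averaging the estimated outcomes of sampled continuations,
# then prefer the move whose summary is higher.

def summarize_move(evaluate, continuations, samples, rng):
    """Average value estimates over a random sample of continuations."""
    picked = [rng.choice(continuations) for _ in range(samples)]
    return sum(evaluate(c) for c in picked) / samples

# Toy value function: +1 win, 0 draw, -1 loss, read off hypothetical
# outcome tables for two candidate moves.
outcomes_a = [1, 1, 0, -1]    # continuations after hypothetical move A
outcomes_b = [0, -1, -1, -1]  # continuations after hypothetical move B
value = lambda outcome: outcome

rng = random.Random(0)
score_a = summarize_move(value, outcomes_a, samples=100, rng=rng)
score_b = summarize_move(value, outcomes_b, samples=100, rng=rng)
print(score_a, score_b)  # move A's summary comes out higher
```

The point of the sketch is that neither summary is “the game”; each is a compression the program constructs so that two directions of play become comparable.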
While the state of the chessboard at any time – which pieces remain and where they sit – may “mean” something to a human chess grandmaster, it is not clear that the term “mean” makes any sense for DeepMind’s AlphaZero as it performs such a summarization task.
A similar summarization task is accomplished by the Pluribus program, which in 2019 conquered the most difficult form of poker, No-Limit Texas Hold’em. That game is even more complex in that it contains hidden information, the players’ face-down cards, and additional “stochastic” elements of bluffing. But the representation is, again, a summary of per-turn likelihoods.
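With hidden information, the summary becomes a likelihood over what cannot be seen. The following toy sketch (invented for illustration; it is nothing like Pluribus’s actual algorithm) estimates a win likelihood by repeatedly sampling an opponent card from the unseen part of a simplified deck of ranked cards.

```python
import random

# Illustrative sketch, not Pluribus's method: with hidden information,
# the program can only summarize likelihoods -- here, the chance that
# our visible card beats an opponent card sampled from the unseen deck.

def win_likelihood(my_card, deck, rng, trials=10000):
    unseen = [c for c in deck if c != my_card]
    wins = sum(my_card > rng.choice(unseen) for _ in range(trials))
    return wins / trials

deck = list(range(1, 14))  # card ranks 1 (low) through 13 (high)
rng = random.Random(0)
# Holding rank 10, 9 of the 12 unseen ranks are lower, so the
# estimate should land near 9/12 = 0.75.
print(win_likelihood(10, deck, rng))
```

The number that comes out is not a fact about the cards; it is a constructed summary that lets the program act under uncertainty.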
Even in human language, what’s in the weights is different than the casual observer might assume. GPT-3, OpenAI’s best language program, can produce amazingly human-like output in sentences and paragraphs.
Does the program “know” language? Its weights hold a representation of the likelihood of finding individual words, and even whole strings of text, in sequence with other words and strings.
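A drastically simplified version of that kind of representation is a bigram model, whose “weights” are just counts of which word follows which. This sketch is illustrative only, nothing like GPT-3’s architecture, and the tiny corpus is invented for the example:

```python
from collections import Counter, defaultdict

# Illustrative sketch, not GPT-3's architecture: a bigram model whose
# "weights" are counts of which word follows which -- a crude version
# of representing the probability of words appearing in sequence.

def train_bigrams(corpus):
    follows = defaultdict(Counter)
    for sentence in corpus:
        words = sentence.split()
        for a, b in zip(words, words[1:]):
            follows[a][b] += 1
    return follows

def next_word_probs(follows, word):
    counts = follows[word]
    total = sum(counts.values())
    return {w: c / total for w, c in counts.items()}

corpus = [
    "the bird sat on the branch",
    "the bird flew to the branch",
]
model = train_bigrams(corpus)
print(next_word_probs(model, "the"))  # {'bird': 0.5, 'branch': 0.5}
```

Even at this toy scale, what the model holds is an inventory of sequence likelihoods, not anything resembling the everyday sense of knowing what a bird or a branch is.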
You could call this function of a neural network a summary, similar to AlphaZero or Pluribus, since the problem resembles chess or poker. But the possible states to be represented as connections in the neural network are not merely vast; they are infinite, given the infinite composability of language.
On the other hand, since the output of a language program such as GPT-3 is a sentence, a fuzzy answer rather than a discrete score, the “right answer” is somewhat less demanding than the win, loss, or draw of chess or poker. You could also call this function of GPT-3 and similar programs an “indexing” or “inventorying” of things in their weights.
Also: What is GPT-3? Everything Your Business Needs to Know About OpenAI’s Revolutionary AI Language Program
Do humans have a similar kind of language inventory or index? There doesn’t seem to be any evidence of it so far in neuroscience. Similarly, in the expression “to distinguish the dancer from the dance,” does GPT-3 identify the multiple levels of meaning in the phrase, or merely the associations? It’s not clear that such a question even makes sense in the context of a computer program.
In each of these cases – chessboard, cards, strings of words – the data are what they are: a shaped substrate divided in various ways, a set of plastic-coated paper rectangles, a grouping of sounds or shapes. That such inventions “mean” something, collectively, to the computer, is only a way of saying that a computer adapts in response to them, for a purpose.
The things that this data gives rise to inside the machine – filters, summaries, indexes, inventories, or however you care to characterize those representations – are never the thing in itself. They are inventions.
Also: DeepMind: why is AI so good at language? It’s something in the language itself
But, you might say, people look at snowflakes and see their differences, and catalog those differences too, if they feel like it. Certainly, human activity has always sought to find patterns, by various means. Direct observation is one of the simplest, and in a sense what happens in a neural network is an extension of that.
One could say that the neural network makes explicit what has been true of human activity for millennia: that talk of patterns is something imposed on the world rather than something in the world. In the world, snowflakes have form, but that form is a pattern only for a person who collects, indexes, and categorizes them. It is a construct, in other words.
Pattern-making will increase dramatically as more and more programs are turned loose on the world’s data and their weights are tuned to form connections that, it is hoped, will create useful representations. Such representations can be incredibly useful. They may one day help cure cancer. It is worth remembering, however, that the patterns they reveal are not there in the world; they are in the eye of the beholder.
Also: DeepMind’s ‘Gato’ is mediocre, so why did they build it?