The Perceptron

Created by Cornell psychologist Frank Rosenblatt in 1958, the original perceptron consisted of a binary input layer, a series of hidden neurons with potentiometer-adjustable weights, and an output layer. Rosenblatt used the Mark I Perceptron to demonstrate that even such a seemingly trivial machine could be "trained" to perform image identification tasks. He developed this result into a theory of perception, described in his 1962 monograph Principles of Neurodynamics.

A single-layer perceptron is ultimately a straightforward computational engine. At its most basic, it consists of three parts: an input layer, with a number of "neurons" equal to the number of dimensions in the input dataset; a "hidden" layer of neurons, each of which takes in a weighted version of all these inputs and combines them using a particular "activation function"; and, potentially, an output layer that combines the outputs of all of the hidden neurons. The model is trained by adjusting the weighting of the inputs to each of the hidden-layer neurons.

To put it even more simply, a single-layer perceptron learns by scaling the inputs to a known function, resident in each hidden-layer neuron, by an adjustable weighting value. This is most often done with some sort of training algorithm that sets the direction and magnitude of the changes in the weights based on how well the model classifies datapoints from a preclassified training dataset. In Rosenblatt's case, this process was actuated physically, using electric motors to change the value of potentiometers.
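
For readers who want to see what such a training algorithm might look like in software, here is a minimal sketch of the classic perceptron update rule in Python. It is an illustration, not a description of Rosenblatt's hardware: the function names, the learning rate, the epoch limit, and the six toy datapoints are all assumptions made up for this example.

```python
def heaviside(z):
    """Step activation: 1 if the weighted sum is positive, else 0."""
    return 1 if z > 0 else 0


def train_perceptron(points, labels, learning_rate=0.1, max_epochs=1000):
    """Fit two weights and a bias with the perceptron update rule.

    Every name and default here is a choice made for this sketch.
    """
    w1, w2, bias = 0.0, 0.0, 0.0
    for _ in range(max_epochs):
        mistakes = 0
        for (x, y), target in zip(points, labels):
            predicted = heaviside(w1 * x + w2 * y + bias)
            error = target - predicted  # -1, 0, or +1
            if error != 0:
                # Nudge each weight in the direction that reduces the error.
                w1 += learning_rate * error * x
                w2 += learning_rate * error * y
                bias += learning_rate * error
                mistakes += 1
        if mistakes == 0:  # a full pass with no errors: training is done
            break
    return w1, w2, bias


# Hypothetical, linearly separable toy data in the 0-to-5 range.
points = [(1.0, 1.0), (1.5, 2.0), (2.0, 1.0), (4.0, 4.5), (4.5, 3.5), (3.5, 4.0)]
labels = [0, 0, 0, 1, 1, 1]

w1, w2, bias = train_perceptron(points, labels)
predictions = [heaviside(w1 * x + w2 * y + bias) for x, y in points]
```

The weights only move when a point is misclassified, and each move is proportional to the offending input. On separable data like this toy set, the loop eventually completes a full pass without a mistake and stops.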

However, what if instead of training a model in this automated fashion, feeding it training data and watching it minimize its error, we were to do it ourselves with immediate visual feedback? This is where the notion of a decision boundary becomes central.


The Decision Boundary

The visualization on the left is a plot of a number of datapoints collected from a hypothetical real-world experiment. Each point is characterized by a two-dimensional vector, plotted here on Cartesian x and y axes. These datapoints could be anything -- say the width and height of a sampling of human hands, or the ratings given to a movie by two different critics. The data ranges in value from 0 to 5 across both of its dimensions and, critically, has been separated into two classes, one colored purple and the other orange.

Your task is to serve as the learning function of the perceptron. The weights of the inputs to the perceptron's hidden layer (as well as the bias, a weight whose input is always "on") are adjustable using the sliders below the plot. The effect of changing these weights is visualized by the line drawn on the plot. This line is the decision boundary. The decision boundary should separate the points into two groups, orange and purple, as cleanly and sharply as possible.
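
To make the link between the sliders and the line concrete, here is a small Python sketch. It assumes the boundary is the set of points where the weighted sum w1*x + w2*y + bias equals zero, which is the usual convention for a step activation. The function name and the example weights are hypothetical.

```python
def boundary_y(w1, w2, bias, x):
    """Return the y value on the line w1*x + w2*y + bias = 0 for a given x.

    Assumes w2 is non-zero. When w2 is zero the boundary is the
    vertical line x = -bias / w1 and cannot be written as y = f(x).
    """
    return -(w1 * x + bias) / w2


# Hypothetical slider settings: the line x + y = 5.
w1, w2, bias = 1.0, 1.0, -5.0

y_at_1 = boundary_y(w1, w2, bias, 1.0)  # 4.0
y_at_3 = boundary_y(w1, w2, bias, 3.0)  # 2.0
```

Under this assumption, changing w1 or w2 tilts the line, while changing the bias slides it across the plot without changing its slope.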

Classifying New Data

If the decision boundary cleanly separates the classes, you have effectively trained this perceptron. The values of the weights can now be used to classify further data into one of the two classes. The interface at the right allows you to see this process in action. Use the sliders to define a new datapoint by picking an x and a y value. Based simply on your visual inspection of the plot, how should this new datapoint be classified?

Pressing the "classify" button will calculate the classification of your new point according to the weights you selected during the training process. The weighted sum of the inputs (each coordinate of your new point multiplied by its weight, added together with the bias) and the result of the activation function are displayed above the button. How was your new point actually classified?



Weight Calculation (w1*in1 + w2*in2 + 1*bias):

Heaviside Function (1 if weight calculation > 0, else 0):
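
The two calculations labeled above can be sketched in a few lines of Python. The function names and the example weights are assumptions for illustration. Which of the two outputs corresponds to orange and which to purple is not stated here, so the sketch simply returns 1 or 0.

```python
def weighted_sum(w1, w2, bias, in1, in2):
    """Weight calculation: w1*in1 + w2*in2 + 1*bias."""
    return w1 * in1 + w2 * in2 + 1 * bias


def heaviside(z):
    """Heaviside function: 1 if the weight calculation is > 0, else 0."""
    return 1 if z > 0 else 0


def classify(w1, w2, bias, in1, in2):
    """Classify a new point using weights chosen during training."""
    return heaviside(weighted_sum(w1, w2, bias, in1, in2))


# Hypothetical weights, as if set with the sliders.
w1, w2, bias = 1.0, 1.0, -5.0

above = classify(w1, w2, bias, 4.0, 3.0)  # weighted sum  2.0 -> class 1
below = classify(w1, w2, bias, 1.0, 2.0)  # weighted sum -2.0 -> class 0
```

A point that falls exactly on the boundary has a weighted sum of zero, which this version of the Heaviside function assigns to class 0.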


The Limits of Perception

Like the one above, this plot records a sample of datapoints with two dimensions, x and y, though it is a different sample. Their distribution on the decision plane is different from that in our first classification task. How might you describe their geometry?

As before, use the weight sliders to position your decision boundary. Are you able to produce a clean boundary between the two classes?

In fact, a single-layer perceptron is only able to distinguish linearly separable classes -- that is, classes that can be split by a decision boundary governed by a linear function. This was shown by Minsky and Papert in their 1969 book Perceptrons.
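
You can check this limit numerically. The Python sketch below uses an assumed XOR-like layout of four points, with each class sitting on opposite corners of a square. This layout is a standard example of non-separable data and is not necessarily the data shown in the plot. The sketch tries every combination of weights on a coarse grid and records the best result; the grid range and step are arbitrary choices.

```python
from itertools import product


def heaviside(z):
    """Step activation: 1 if the weighted sum is positive, else 0."""
    return 1 if z > 0 else 0


# Assumed XOR-like layout: each class occupies opposite corners.
points = [(1.0, 1.0), (4.0, 4.0), (1.0, 4.0), (4.0, 1.0)]
labels = [0, 0, 1, 1]


def count_correct(w1, w2, bias):
    """Count how many of the four points this boundary classifies correctly."""
    return sum(
        heaviside(w1 * x + w2 * y + bias) == target
        for (x, y), target in zip(points, labels)
    )


# Try every weight combination from -10 to 10 in steps of 0.5.
grid = [i / 2 for i in range(-20, 21)]
best = max(count_correct(w1, w2, b) for w1, w2, b in product(grid, repeat=3))
```

On this layout the search never gets all four points right: the best any straight line manages is three. A finer grid would not help, because no single line can put both corners of one class on one side and both corners of the other class on the other.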