AI, Perceptron, and the Uber Story.

Ayan Halder
5 min read · Jul 3, 2019

--

Almost everyone talks about artificial intelligence nowadays. Not only does it help you raise venture capital, it also makes you sound cooler and up to date with technology trends.

My introduction to deep learning was through TensorFlow, and after reading about some of its real-life applications, I've been curious about how it works ever since. Here I talk about how neural networks, the heart (or should we call it the brain?) of deep learning, work. Before we dive deeper, let me tell you that the article uses very few mathematical equations, and nothing you can't read through.

Artificial Neural Network and Perceptron

The core unit of an artificial neural network (ANN) is the Perceptron (sounds straight out of a sci-fi movie, doesn't it?).

A perceptron is a single ANN unit comprising input neurons, an activation function, and an output. The input neurons carry the inputs (each an attribute of a bigger, more complex problem), the activation function adds non-linearity to the network, and finally the output is obtained.

So the core functionality is this: multiple inputs, each multiplied by an initially assumed weight, are added up; a non-linearity is applied; and an output is obtained. It's that simple!
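That weighted-sum-plus-non-linearity idea fits in a few lines of code. Here is a minimal sketch of a classic perceptron with a step activation; the input values and weights are made up for illustration:

```python
import numpy as np

def perceptron(x, w, b=0.0):
    """A single perceptron: weighted sum of inputs, then a step non-linearity."""
    z = np.dot(w, x) + b          # multiply each input by its weight and add them up
    return 1 if z > 0 else 0      # step activation: the neuron either fires or it doesn't

# Three input neurons with assumed initial weights
x = np.array([1.0, 0.5, -0.5])
w = np.array([0.4, 0.3, 0.3])
print(perceptron(x, w))  # 1, since 0.4 + 0.15 - 0.15 = 0.4 > 0
```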

The image below shows what a single perceptron looks like:

A Perceptron

Here, x(1), x(2), …, x(n) are the input neurons, and w(1), w(2), …, w(n) are the initial weights.

Just like the single-output perceptron, we can have dual- or multi-output perceptrons, and the outputs can be binary, continuous, or categorical.

Activation Function

The activation function adds non-linearity to the output. A non-linearity is more effective than a linear solution because in real-life scenarios almost nothing is linear; the non-linearity helps the model explain the data better. The choice of activation function is contextual; however, the Rectifier and Sigmoid functions are widely used.

The Sigmoid function is used where a probability is expected as the output. The ReLU (Rectified Linear Unit) function is widely used because it doesn't activate a neuron when its input is negative. This means that only a few neurons are active at any time, making the network efficient and cheap to compute.
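Both functions are one-liners. A quick sketch, with a sample input chosen to show the behavior described above:

```python
import numpy as np

def sigmoid(z):
    """Squashes any real number into (0, 1) -- handy when the output is a probability."""
    return 1.0 / (1.0 + np.exp(-z))

def relu(z):
    """Passes positive inputs through unchanged and zeroes out negatives."""
    return np.maximum(0.0, z)

z = np.array([-2.0, 0.0, 3.0])
print(sigmoid(z))  # ~[0.119, 0.5, 0.953] -- everything mapped into (0, 1)
print(relu(z))     # [0. 0. 3.] -- the neuron with negative input stays silent
```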

Single Layer Neural Network

All right, now that we understand what a perceptron is, let's introduce a hidden layer and create a single-layer neural network (so called because it has a single hidden layer). It looks like the following:

A single layer neural network

The hidden layer adds complexity to the computation and is used when multiple pre-computations and complex functions are in play. The number of hidden layers depends on the expected functional complexity.

Let's take Uber as an example: Uber wants to show the "estimated arrival time" and the "fare" when a user searches for a ride. So there will be two outputs: arrival time and fare.

Let’s assume the following input parameters:

· Rider’s current location

· Rider’s destination location

· Current location of the driver

· No. of passengers in the car (e.g., UberPool)

During the very first iteration, the weights are simply assumed (here, in a way that they sum to 1). Each neuron in the hidden layer receives the input and its corresponding weight from each of the input neurons. Some computation happens in each hidden neuron (for example, z1 might put more weight on the rider's location and destination, while z2 might prioritize only the number of passengers in the car). After these pre-computations, the results are summed up, an activation function is applied, and the result is sent to the output. This is when you see the arrival time and fare.
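A forward pass through such a network can be sketched in a few lines. Everything here is hypothetical: the four feature values are stand-ins for the rider/driver inputs above, the three hidden neurons play the role of z1, z2, z3, and the first-iteration weights are just random assumptions, not anything Uber actually uses:

```python
import numpy as np

# Made-up feature values: pickup distance, trip distance,
# driver's distance to rider, and passenger count
x = np.array([1.2, 8.5, 0.6, 2.0])

rng = np.random.default_rng(0)
W1 = rng.normal(size=(3, 4))   # assumed weights into 3 hidden neurons (z1, z2, z3)
b1 = np.zeros(3)
W2 = rng.normal(size=(2, 3))   # assumed weights into 2 outputs: arrival time, fare
b2 = np.zeros(2)

relu = lambda z: np.maximum(0.0, z)

h = relu(W1 @ x + b1)          # hidden layer: weighted sums, then the non-linearity
arrival_time, fare = W2 @ h + b2
print(arrival_time, fare)      # first-iteration guesses -- almost surely wrong
```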

Now, this was just the first iteration, and the weights were assumed. It's quite certain that they were gravely incorrect. Here comes the idea of the cost function and back propagation, which is what distinguishes a system that learns from a fixed, hand-coded classifier.

Once the ride is completed, Uber records the actual time of arrival and the actual fare that should have been charged (based on the driver's feedback, total travel duration, traffic, etc.). A cost function is then calculated from the difference between the calculated output and the actual output.

Below is a typical cost function, the squared error:

C = ½ (ȳ − y)²

In the cost function, ȳ is the calculated output and y is the actual one. The goal is then to minimize the cost function as far as possible. This is done through back propagation, i.e., feeding the error back into the network and adjusting the weights based on it.

After the adjustment, the second iteration starts. This continues until a satisfactory output is obtained.
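The whole loop, compute the output, measure the cost, push the error back into the weights, repeat, can be sketched on a toy version of the problem. This is plain gradient descent on a single linear layer, not Uber's actual system; the four "rides", their two features, and the recorded fares are all invented for illustration:

```python
import numpy as np

# Toy data: 4 rides, 2 hypothetical features each, and the fare actually charged
X = np.array([[1.0, 2.0], [2.0, 1.0], [3.0, 3.0], [0.5, 1.5]])
y_true = np.array([5.0, 4.0, 9.0, 3.5])   # actual outcomes recorded after each ride

w = np.zeros(2)                            # first-iteration weights: assumed, and wrong
lr = 0.05                                  # learning rate: how big each adjustment is

for epoch in range(500):
    y_hat = X @ w                          # forward pass: the calculated output
    error = y_hat - y_true
    cost = 0.5 * np.mean(error ** 2)       # cost: squared difference, calculated vs actual
    w -= lr * (X.T @ error) / len(y_true)  # back propagation: adjust weights by the error

print(w, cost)                             # cost shrinks toward zero as weights are tuned
```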

Over time, each neuron in the hidden layer prioritizes the data from some input neurons while rejecting others. For example, suppose that through multiple iterations z1 "learns" that inputs from x1 and x3 are more valuable, because in those situations the weight readjustment is close to 0, whereas whenever there's an input from x2, a major weight readjustment happens. Then z1 starts either deprioritizing or entirely rejecting input from x2. What gets prioritized depends entirely on what each hidden neuron is slated to do.

This is how a machine learns by itself and gets better with more data: the weights are fine-tuned with each error back propagation, and the cost function tends toward zero over many epochs.

Image credits: Google Images

Written by Ayan Halder
Product at Arkose Labs. I write about anti-fraud products and strategies.