The perceptron is one of the simplest artificial neural network architectures: a feedforward network (no recurrent connections) consisting of connected nodes.
A single-layer perceptron takes several input variables, $x_1, x_2, \dots, x_m$, and predicts a single output $\hat y$ from a linear combination of these inputs followed by a non-linear function.
Weights, $w_1, w_2, \dots, w_m$, express the importance of the respective inputs to the output. The weighted sum of the inputs plus a bias term, $b + \sum_{i=1}^{m} w_i x_i$, is a linear function of the inputs. Its value is passed through a non-linear activation function $g$ to produce the final output $\hat y$:
$$ \hat y = g\left(b + \sum_{i=1}^{m} w_i x_i\right) $$
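As a concrete illustration, here is a minimal NumPy sketch of this forward pass. The function and variable names are our own (not from any particular library), and the activation $g$ is passed in as a parameter since the equation above leaves it unspecified:

```python
import numpy as np

def perceptron_forward(x, w, b, g):
    """Single-layer perceptron: y_hat = g(b + sum_i w_i * x_i)."""
    z = b + np.dot(w, x)  # weighted sum of inputs plus bias (the linear part)
    return g(z)           # non-linear activation produces the final output

# Example with three inputs, arbitrary weights/bias, and tanh as activation
x = np.array([1.0, 0.5, -2.0])
w = np.array([0.4, -0.2, 0.1])
y_hat = perceptron_forward(x, w, b=0.3, g=np.tanh)
```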
Why do we need activation functions? Most relationships between inputs and outputs cannot be represented by a simple straight line, so introducing non-linearity lets the network capture complex patterns in the data.
Moreover, non-linear activations are useful when we want to constrain the output to a certain range (e.g. a probability between 0 and 1).
A commonly used activation function is the sigmoid function.
$$ \sigma(z) = \frac{1}{1 + e^{-z}} $$
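A direct translation into NumPy (a sketch; `sigmoid` here is our own helper, not a library function) shows the squashing behavior: any real input is mapped into the open interval $(0, 1)$, which is why it could be plugged in as `g` in the forward-pass sketch above.

```python
import numpy as np

def sigmoid(z):
    """Sigmoid activation: maps any real z into (0, 1)."""
    return 1.0 / (1.0 + np.exp(-z))

# Large negative inputs approach 0, large positive inputs approach 1
print(sigmoid(np.array([-5.0, 0.0, 5.0])))  # ~[0.0067, 0.5, 0.9933]
```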