For a while now, the Computer Science department at my university has offered a class for non-CS students called “Data Witchcraft”. The idea, I suppose, is that when you don’t understand how a technology works, it’s essentially “magic”. (And when it has to do with computers and data, it’s dark magic at that.)
But even those of us who write programs all day long often use tools, algorithms, or ideas we don’t really understand. We just take them at face value because, well, they seem to work, we’re busy, and learning more doesn’t seem necessary when the problem is already sufficiently solved.
One of the more prevalent algorithms of this sort is Gradient Descent (GD). The algorithm is both conceptually simple (everyone likes to show rudimentary sketches of a blindfolded stick figure walking down a mountain) and mathematically rigorous (next to those simple sketches, we show equations with partial derivatives across n-dimensional vectors mapped to an arbitrarily sized higher-dimensional space).
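To make the idea concrete before diving into code: for the simple linear model the snippets below fit, h(x) = w0 + w1·x, one full pass of batch gradient descent on the squared-error cost boils down to two updates (alpha is the learning rate, and the sums run over every training example):

$$ w_0 \leftarrow w_0 + \alpha \sum_i \bigl(y_i - h(x_i)\bigr), \qquad w_1 \leftarrow w_1 + \alpha \sum_i \bigl(y_i - h(x_i)\bigr)\, x_i $$

Those two sums are exactly the delta_w0 and delta_w1 accumulators in the univariate code further down.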

So most often, after learning about GD, you are sent off into the wild to use it, without ever having programmed it from scratch. From this view, Gradient Descent is a sort of incantation we Supreme Hacker Mage Lords can use to solve complex optimization problems whenever our data is in the right format, and we want a quick fix. (Kind of like Neural Networks and “Deep Learning”…) This is practical for most people who just need to get things done. But it’s also unsatisfying for others (myself included).
However, GD can also be implemented in just a few lines of code (even though it won’t be as highly optimized as an industrial-strength version).
That’s why I’m sharing implementations of both Univariate and generalized Multivariate Gradient Descent, written in simple, annotated Python.
Anyone curious about a working implementation (and with some test data in hand) can use these to experiment. The code snippets below have print statements built in so you can see how your model changes on every iteration.
Full Repository
To download and run the full repo, clone it from here: https://github.com/adpoe/Gradient_Descent_From_Scratch
But the actual algorithms are also extracted below, for ease of reading.
Univariate Gradient Descent
Requires NumPy.
Also requires data to be in this format: [(x1,y1), (x2,y2) … (xn,yn)], where y is the actual value.
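For example, a tiny, made-up data set in that shape (numbers chosen only to illustrate the format, roughly following y = 2x):

# each tuple is (x, y), with y as the true value we want the fitted line to recover
sample_data = [(0.1, 0.21), (0.2, 0.39), (0.3, 0.62), (0.4, 0.81), (0.5, 0.98)]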
def gradient_descent(training_examples, alpha=0.01):
    """
    Apply gradient descent on the training examples to learn a line that fits through the examples
    :param training_examples: set of all examples in (x,y) format
    :param alpha: learning rate
    :return: (w0, w1), the parameters of the trained model
    """
    # initialize w0 and w1 to some small value, here just using 0 for simplicity
    w0 = 0
    w1 = 0
    # repeat until "convergence", meaning that w0 and w1 aren't changing very much
    # --> need to define what 'not very much' means, and that may depend on problem domain
    convergence = False
    while not convergence:
        # initialize temporary variables, and set them to 0
        delta_w0 = 0
        delta_w1 = 0
        for pair in training_examples:
            # grab our data points from the example
            x_i = pair[0]
            y_i = pair[1]
            # find the error of the current prediction, and accumulate the update terms
            delta_w0 += prediction_error(w0, w1, x_i, y_i)
            delta_w1 += prediction_error(w0, w1, x_i, y_i) * x_i
        # store previous weighting values
        prev_w0 = w0
        prev_w1 = w1
        # get new weighting values
        w0 = w0 + alpha * delta_w0
        w1 = w1 + alpha * delta_w1
        # naive learning-rate decay (with the default alpha, this reaches roughly 0 after 10 passes)
        alpha -= 0.001
        # every iteration, print out the current model
        # 1. --> (w0 + w1x1 + w2x2 + ... + wnxn)
        print("Current model is: (" + str(w0) + " + " + str(w1) + "x1)")
        # 2. --> averaged squared error over the training set, using the current line
        summed_error = sum_of_squared_error_over_entire_dataset(w0, w1, training_examples)
        avg_error = summed_error / len(training_examples)
        print("Average Squared Error=" + str(avg_error))
        # check if we have converged
        if abs(prev_w0 - w0) < 0.00001 and abs(prev_w1 - w1) < 0.00001:
            convergence = True
    # after convergence, print out the parameters of the trained model (w0, ... wn)
    print("Parameters of trained model are: w0=" + str(w0) + ", w1=" + str(w1))
    return w0, w1


############################
##### TRAINING HELPERS #####
############################

def model_prediction(w0, w1, x_i):
    # our hypothesis h(x) is just a line: w0 + w1*x
    return w0 + (w1 * x_i)


def prediction_error(w0, w1, x_i, y_i):
    # basically, we just take the true value (y_i)
    # and we subtract the predicted value from it
    # this gives us the error term for a single example
    return y_i - model_prediction(w0, w1, x_i)


def sum_of_squared_error_over_entire_dataset(w0, w1, training_examples):
    # find the squared error over the whole training set
    total = 0
    for pair in training_examples:
        x_i = pair[0]
        y_i = pair[1]
        total += prediction_error(w0, w1, x_i, y_i) ** 2
    return total
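If you want to see it run end to end, a minimal driver along these lines should work (the function names are the ones defined above; the data is the same made-up sample from earlier). With the simple decay schedule on alpha, training stops after only a handful of passes, so the point here is less a perfect fit and more watching the printed model and average error change each iteration:

if __name__ == '__main__':
    # a handful of (x, y) pairs that roughly follow y = 2x (same sample as above)
    sample_data = [(0.1, 0.21), (0.2, 0.39), (0.3, 0.62), (0.4, 0.81), (0.5, 0.98)]
    w0, w1 = gradient_descent(sample_data, alpha=0.01)
    # w0 is the learned intercept, w1 the learned slope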
Multivariate Gradient Descent
Requires NumPy, same as above.
Also requires data to be in this format: [((x1, ..., xn), y), ((x1, ..., xn), y), … ((x1, ..., xn), y)], where y is the actual value. Essentially, you can have as many x-variables as you’d like, as long as each example pairs its tuple of x-values with its y-value (that’s what the code below expects when it indexes pair[0] and pair[1]).
import numpy


def multivariate_gradient_descent(training_examples, alpha=0.01):
    """
    Apply gradient descent on the training examples to learn a line that fits through the examples
    :param training_examples: set of all examples; each one is an (x_vector, y) pair
    :param alpha: learning rate
    :return: (W_0, W), the parameters of the trained model
    """
    # initialize the weight vector: one weight per x-variable, all starting at 0
    W = [0 for index in range(0, len(training_examples[0][0]))]
    # W_0 is the constant (intercept) term
    W_0 = 0
    # repeat until "convergence", meaning that the weights aren't changing very much
    # --> need to define what 'not very much' means, and that may depend on problem domain
    convergence = False
    while not convergence:
        # initialize temporary accumulators, and set them to 0
        deltaW_0 = 0
        deltaW_n = [0 for x in range(0, len(training_examples[0][0]))]
        for pair in training_examples:
            # grab our data points from the example
            x_i = pair[0]
            y_i = pair[1]
            # calculate the error of the current prediction for this example
            error = multivariate_prediction_error(W_0, y_i, W, x_i)
            deltaW_0 += error
            # element-wise: add error * x_i to the running totals
            deltaW_n = numpy.add(deltaW_n, numpy.multiply(error, x_i))
            # print("DELTA_WN = " + str(deltaW_n))
        # store previous weighting values
        prev_w0 = W_0
        prev_Wn = W
        # get new weighting values
        W_0 = W_0 + alpha * deltaW_0
        W = numpy.add(W, numpy.multiply(alpha, deltaW_n))
        # naive learning-rate decay (with the default alpha, this reaches roughly 0 after 10 passes)
        alpha -= 0.001
        # every iteration, print out the current model
        # 1. --> (w0 + w1x1 + w2x2 + ... + wnxn)
        variables = [(str(W[i]) + "*x" + str(i + 1) + " + ") for i in range(0, len(W))]
        var_string = ''.join(variables)
        var_string = var_string[:-3]
        print("Current model is: " + str(W_0) + " + " + var_string)
        # 2. --> averaged squared error over the training set, using the current model
        summed_error = multivariate_sum_of_squared_error_over_entire_dataset(W_0, W, training_examples)
        avg_error = summed_error / len(training_examples)
        print("Average Squared Error=" + str(avg_error))
        print("")
        # check if we have converged
        if abs(prev_w0 - W_0) < 0.00001 and (numpy.abs(numpy.subtract(prev_Wn, W)) < 0.00001).all():
            convergence = True
    # after convergence, print out the parameters of the trained model (w0, ... wn)
    variables = [("w" + str(i + 1) + "=" + str(W[i]) + ", ") for i in range(0, len(W))]
    var_string = ''.join(variables)
    var_string = var_string[:-2]
    print("RESULTS: ")
    print("\tParameters of trained model are: w0=" + str(W_0) + ", " + var_string)
    return W_0, W


################################
##### MULTIVARIATE HELPERS #####
################################

# generalized to take a w0, a vector of weights, and a vector of x-values
def multivariate_model_prediction(w0, weights, xs):
    return w0 + numpy.dot(weights, xs)


# again, this takes just a w0, a vector of weights, and a vector of x-values
def multivariate_prediction_error(w0, y_i, weights, xs):
    # basically, we just take the true value (y_i)
    # and we subtract the predicted value from it
    # this gives us the error term for a single example
    return y_i - multivariate_model_prediction(w0, weights, xs)


# same idea as the univariate version, but built on the generalized helpers above
def multivariate_sum_of_squared_error_over_entire_dataset(w0, weights, training_examples):
    # find the squared error over the whole training set
    total = 0
    for pair in training_examples:
        x_i = pair[0]
        y_i = pair[1]
        # cast back to values in range [1 --> 20]
        prediction = multivariate_model_prediction(w0, weights, x_i) / (1 / 20.0)
        actual = y_i / (1 / 20.0)
        error = abs(actual - prediction)
        error_sq = error ** 2
        total += error_sq
    return total
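And the same kind of quick check for the multivariate version (made-up numbers again, in the [((x1, ..., xn), y), ...] format described above; here y roughly follows 0.5*x1 + 0.25*x2):

if __name__ == '__main__':
    # two x-variables per example, each paired with a y-value
    sample_data = [
        ((0.1, 0.2), 0.10),
        ((0.2, 0.4), 0.20),
        ((0.3, 0.1), 0.18),
        ((0.4, 0.3), 0.27),
        ((0.5, 0.5), 0.38),
    ]
    w0, W = multivariate_gradient_descent(sample_data, alpha=0.01)
    # w0 is the learned intercept; W holds one learned weight per x-variable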
My data set is included in the full repo, but feel free to try this out on your own data if you’re experimenting. And enjoy.