*Note: I am experimenting with exporting Jupyter notebooks into a WordPress ready format. This notebook refers specifically to to the Nature Conservancy Kaggle, for classifying fish species, based only on photographs.*

# Category: Artificial Intelligence

I’ve been messing around with a few things during my time off for the Holidays:

**Kaggle**- This one, for the Nature Conservatory, specifically

**Keras**- For using Convolution Neural Networks, without the headaches

**Jupyter Notebooks**- For presenting data analysis results in Python
- Also experimenting with whether I can port these to WordPress easily

So here’s a quick combination of these things, in the form of a simple guide to using **Keras** on the **Nature Conservatory** image recognition Kaggle.

Hopefully it serves as an easy introduction to get up and running with neural networks for this competition.

### Jupyter Notebook

For awhile now, the Computer Science department at my University has offered a class for non-CS students called “**Data Witchcraft**“. The idea, I suppose, is that when you don’t understand how a technology works, it’s essentially “**magic**“. (And when it has to do with computers and *data*, it’s dark magic at that.)

But even as someone who writes programs all day long, there are often tools, algorithms, or ideas we use that we don’t really understand–just take at face value, because well, they seem to work, we’re busy, and learning doesn’t seem necessary when the problem is already sufficiently solved.

One of the more prevalent algorithms of this sort is **Gradient Descent (GD)**. The algorithm is both **conceptually simple** (everyone likes to show rudimentary sketches of a blindfolded stick figure walking down a mountain) and **mathematically rigorous **(next to those simple sketches, we show equations with partial derivatives across n-dimensional vectors mapped to an arbitrarily sized higher-dimensional space).

So most often, after learning about GD, you are sent off into the wild to use it, without ever having programmed it from scratch. From this view, Gradient Descent is a sort of incantation we Supreme Hacker Mage Lords can use to solve complex optimization problems whenever our data is in the right format, and we want a quick fix. (Kind of like Neural Networks and “Deep Learning”…) This is practical for most people who just need to ** get things done**. But it’s also unsatisfying for others (myself included).

However, GD can also be implemented in just a few lines of code (even though it won’t be as highly optimized as an industrial-strength version).

That’s why I’m sharing some implementations of both Univariate and generalized Multivariate Gradient Descent written in simple and annotated Python.

Anyone curious about a working implementation (and with some test data in hand) can try this out to experiment. The code snippets below have print statements built in so you can see how your model changes every iteration.

### Full Repository

To download and run the full repo, clone it from here: ** https://github.com/adpoe/Gradient_Descent_From_Scratch **

But the actual algorithms are also extracted below, for ease of reading.

### Univariate Gradient Descent

Requires NumPy.

Also Requires data to be in this this format: [(x1,y1), (x2,y2) … (xn,yn)], where Y is the actual value.

### Multivariate Gradient Descent

Requires NumPy, same as above.

Also Requires data to be in this this format: [(x1,..xn, y),(x1,..xn, y) …(x1,..xn, y)], where Y is the actual value. Essentially, you can have as many x-variables as you’d like, as long as the y-value is the last element of each tuple.

My data set is included in the full repo. But feel free to try it on your own, if you’re experimenting with this. And enjoy.

### tl;dr

In which I peel back the curtain and outline the innerworkings of a particularly insidious artificial intelligence, whose sole purpose in life is to systematically learn the optimal strategy for a terrifyingly addictive video game, known only to the internet as: **Flappy Bird**… and in which I also provide code to program a similar AI of your own.

More pointedly, this short post outlines a practical way to get started using a** Reinforcement Learning** technique called **Q-Learning**, as applied to a **Python** **Flappy Bird clone**, programmed by @TimoWilken.

**>> Grab the code base: https://github.com/adpoe/Flappy-AI <<**

### Humble Beginnings

So you want to beat Flappy Bird, but after awhile it gets tedious. I agree. Instead, why don’t we program an AI to do it for us? A genius plan, but where do we start?

First, we need a **Flappy Bird** game to ~~hack~~ upate. The candidate that I suggest is a Python implementation created by Timo Wilken and available for download directly at: https://github.com/TimoWilken/flappy-bird-pygame. This Flappy Bird version is implemented using the **PyGame **library, which is a dependency going forward.

Here are instructions for PyGame installation. If you get this runing, the hard work is done. (apt-get or homebrew are highly recommended.)

**How the Game Works**

The first challenge we’ll have in implementing the framework for a Flappy AI is determining exactly how the game workes in its original state.

By using the debugger and stepping through the game’s code during some trial runs, I was able to figure out where key decisions where made, how data flowed into the game, and exactly where I would need to position my AI-agent.

At its basic level, I created an “Agent” class, and passed that class into the running game code. Then, at each loop of the game, I examined the variables available to me, and then passed a ‘MOUSEBUTTONUP’ command to the PyGame event queue whenever the AI decided to jump. Otherwise, I did nothing.

**State Representations**

From there, the next step was determining a way to model the problem. I decided to use follow the basic guidelines outlined by Sarvagya Vaish, here.

First, I discretized the space in which the bird sat, relative to the next pipe. I was able to get pipe data by accessing the **pipe object** in the original game code. Similarly, I was able to get bird data by accessing the **bird object**.

From there, I could determine the location of the **bird** and the** pipes** __relative to each other__. I discretized this space as a 25×25 grid, with the following parameters:

Using this methodology, I created a **state tuple **that looked like this:

(**height_category={0,1,2,3,4}, dist_category={0,1,2,3,4} , collision=True/False)**

Then, each iteration of the game loop, I was able to determine the bird’s relative position, and whether it had made a collision with the pipes or not.

**If there was no collision, I issued a reward of +1.**

**If there was a collision, I issued a reward of -1000.**

I tried many different state representations here, but mostly it was matter of determining an optimal number of grid spaces and the right parameters for those spaces.

Initially, I started with a 9×9 grid, but moved to 16×16 because I got to a point in 9×9 where I just couldn’t make any more learning progress.

### What Works

Very generally, we want to have a ** tighter grid around the pipes**, as this is where most collisions happen. And we want a

**. This seemed to give me the best results, as we need different strategies at different locations on the grid.**

*looser grid as we move outwards***Exploration Approaches**

Our next task is implementing an exploration approach. This is necessary because if we don’t randomly explore the state sometimes, there might be optimal strategies that we are never able to find, simply because we will never be in those states!

Because we have only two choices at any given state (JUMP—or—STAY), implementing exploration was relatively simple.

I started out with a high **exploration factor** (I used **1/time_value+1**), and then I generated a random number between [0,1). __If the random number was less than the exploration factor__, then I explored.

Over time the exploration factor got lower, and therefore the AI explored less frequently.

Exploration essentially consisted of flipping a fair coin (generating a Boolean value randomly).

**If true:**then I chose to JUMP.**If false:**I chose to STAY.

The main problem I encountered with this method is that the exploration factor was very at the beginning, and sometimes choices were made that were not representative of actual situations that the bird would encounter in ‘true’ gameplay.

BUT, because these decisions were made earlier, they were weighted more heavily in the overall Q-Learning algorithm.

This isn’t ideal, but exploration is necessary, and overall the algorithm works well. So it wasn’t a large problem, overall.

**Learning Rates & Their Impact**

Very simply, “Learning Rates” dicatate how much we weigh new information about some state over old information. Learning rates can be an value in the range [0,1]. With 0 meaning we never update values (bad), and 1 meaning that we only EVER care about what happened the last time we were in state (short-sighted).

The first learning rate I tried was **alpha=(1/time+1).** However, this gave very poor results in practice.

This is because time is NOT the most important factor in determining a strategy from any given state. Rather, it is how many times we’ve been to that state.

The problem is that we make extremely poor choices at the beginning of the game (because we simply don’t know any better). But with** alpha=(1/time+1)**, the results of these these poor choices are weighted the most highly.

Once I changed the learning factor to **alpha=1/N(s,a),** I immediately saw * dramatically better* results. (That is, where N(s,a) tracks how many times we’ve been in a given state and performed the same action.)

**Training**

My final, __“Smart” bird __is the result of about **4 hours of training**.

I don’t actually think there would be a way to make the training more efficient, aside from speeding up the gameplay in some way.

Overall, I the results I received from the investment of time I put it in reasonable.

__Given more time__, I would probably ** discretize the space even more finely** (maybe a 36×36 grid) – so that I could find even more optimal strategies from a more fine-tuned set of positions in the game-space.

**How to Use my Smart Bird**

To use my smart bird, simply take the following steps:

- cd into a directory containing my source code
- Ensure that this directory includes the file named ‘
**txt**’ - Run the command:
**python flappybird.py “qdata.txt”**

- Watch Flappy crush it. (the game will run 10x)

### Learning From Scratch

Probably more instructive than using my trained bird though, is to simply start training a new bird from scratch. You will see the agony and the ecstasy as he does a terrible number of dumb things, slowly learning how to beat the game.

It’s surprisingly enjoyable (though sometimes frustrating) and highly recommended. Start the process by running:

**python flappybird.py**

With no other args. Then pass in the ‘qdata.txt’ file next time you run the game, to keep your learning session going.

**Citations**

I consulted the following resources to implement my AI. If you want to do similar work, I’d recommend these resources. These people are much smarter than me. I’m just applying their concepts.

**Implementation ideas**based on discussion here: http://sarvagyavaish.github.io/FlappyBirdRL/**For understanding what I was doing better:**http://mnemstudio.org/path-finding-q-learning-tutorial.htm

### Teleology (a.k.a – what is the purpose of this post?)

Recently, I finished an artificial intelligence project that involved implementing the **Minimax** and **Alpha-Beta** pruning algorithms in Python.

These algorithms are standard and useful ways to optimize decision making for an AI-agent, and they are fairly straightforward to implement.

I haven’t seen any actual working implementations of these using Python yet, however. So I’m posting my code as an example for future programmers to improve & expand upon.

It’s also useful to see a working implementation of abstract algorithms sometimes, when you’re seeking greater intuition about how they work in practice.

My hope is that this post provides you with some of that intuition, should you need it–and that it does so at an accelerated pace.

### Ontology (a.k.a – how does this abstract idea __actually work__, in reality?)

Let’s start with **Minimax** itself.

**Assumptions: This code assumes you have already built a game tree relevant to your problem, and now your task is to parse it. **If you haven’t yet built a game tree, that will be the first step in this process. I have a previous post about how I did it for my own problem, and you can use that as a starting point. But keep in mind that YMMV.

My implementation looks like this:

**How-to:** To use this code, create a new instance of the Minimax object, and pass in your GameTree object. This code should work on any GameTree object that has fields for: 1) child nodes; 2) value. (That is, unless I made an error, which of course, is very possible)

After the Minimax object is instantiated, run the minimax() function, and you will see a trace of the program’s output, as the algorithm evaluates each node in turn, before choosing the best possible option.

**What you’ll notice: **Minimax needs to evaluate **every single node** in your tree. For a small tree, that’s okay. But for a huge AI problem with millions of possible states to evaluate (think: Chess, Go, etc.), this isn’t practical.

**How we solve: **To solve the problem of looking at every single node, we can implement a pruning improvement to Minimax, called *Alpha-Beta*.

**Alpha-Beta Pruning Improvement**

Essentially, Alpha-Beta pruning works keeping track of the best/worst values seen as the algorithm traverses the tree.

Then, if ever we get to a node with a child who has a higher/lower value which would disqualify it as an option–we just skip ahead.

Rather than going into a theoretical discussion of WHY Alpha-Beta works, this post is focused on the HOW. For me, it’s easier to see the how and work backwards to why. So here’s the quick and dirty implementation.

**How-to:** This algorithm works the same as Minimax. Instantiate a new object with your GameTree as an argument, and then call **alpha_beta_search()**.

**What you’ll notice: **Alpha-Beta pruning will * always *give us the same result as Minimax (if called on the same input), but it will require evaluating far fewer nodes. Tracing through the code will illustrate why.

### Phenomenology (a.k.a – how will I perceive this, myself?)

This isn’t the most robust implementation of either algorithm (in fact it’s deficient in many ways), so I wouldn’t recommend it for industrial use.

However, this code should simply illustrate how each algorithm works, and it will provide output you can trace through and compare against–as long as you are able to construct the GameTree for your problem.

From there, it’s only a matter of time until you’ll understand it intuitively. This is one of those things that took a little while for me to grasp–so hopefully having a clear example will help others get there more quickly. Good luck.

### Background

I’ve been working on a AI project today and came across this problem.

Given input data structured like so:

**[‘A’, [‘B’, (‘D’, 3), (‘E’, 5)], [‘C’, [‘F’, [‘I’,(‘K’,0), (‘L’, 7)],(‘J’,5)], [‘G’, (‘M’,7), (‘N’,8)], (‘H’,4)]]**

I need to parse and build tree which has an arbitrary branching factor, and values only at the leaves.

(**As for why:** Later, I’ll be running Minimax and some other algorithms on this tree, in order to algorithmically determine the best possible game move. More on that in another post.)

This seemed like a good problem to solve recursively. And to avoid a *soul-sucking* debug session, I decided my goal was to solve it as succinctly as possible.

Here’s what I came up with. **Why I’m posting:** This seems like it would be a very common AI/Data-Structures problem, but my first few searches on the subject came up with nada. Nothing even closely related to the problem I’m solving. So doing my part to fix that now.

### Working Code

### Signing Out

Side note. I’m actually not sure what this tree (with weights only at the leaves) would be called technically. It reminds me of the tree made during Huffman Encoding, but it’s not quite a match for that since we aren’t summing the values in all parent nodes. If you know the technical name, let me know, so I can update.