Phylo

Phylogenic Analysis of Pokemon Sprites

Vectorization

Pokemon sprites are creative and unique, but even they can be converted into numbers.

During vectorization, each sprite is flattened into a vector with 96 x 96 = 9216 dimensions, one per pixel.

Each four-channel pixel is reduced to a single color quantum between 0 and 4.
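
As a rough illustration of the two steps above, the sketch below flattens a 96 x 96 RGBA sprite into a list of 9216 color quanta. The function names and the binning rule (transparent pixels become 0, opaque pixels are binned into 1 to 4 by brightness) are assumptions made for this example; the actual quantization rule is not described here.

```python
SIZE = 96  # sprite width and height in pixels


def quantize_pixel(rgba):
    """Map one (r, g, b, a) pixel to a color quantum in 0..4.

    Assumed rule (not taken from the project): fully transparent pixels
    become 0; all other pixels are binned into 1..4 by brightness.
    """
    r, g, b, a = rgba
    if a == 0:
        return 0
    brightness = (r + g + b) / 3  # ranges over 0..255
    return 1 + min(3, int(brightness // 64))


def vectorize(pixels):
    """Flatten a grid of RGBA tuples into one row-major list of quanta."""
    return [quantize_pixel(p) for row in pixels for p in row]


# Synthetic sprite: transparent background with one opaque white square.
sprite = [[(0, 0, 0, 0)] * SIZE for _ in range(SIZE)]
for y in range(40, 56):
    for x in range(40, 56):
        sprite[y][x] = (255, 255, 255, 255)

vector = vectorize(sprite)
print(len(vector))
```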

Principal Component Analysis (PCA)

PCA reduces the number of dimensions in a dataset to aid visualization and model-fitting.

Current creature.

K-Means Clustering

This clustering method forms 10 groups of similar vectors (creatures).
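
For readers unfamiliar with the method, here is a minimal pure-Python sketch of Lloyd's algorithm, the standard k-means procedure. It is not the project's code: the function names, the iteration count, and the toy two-dimensional data are assumptions, and a real run would pass the 9216-dimensional sprite vectors with k=10.

```python
import random


def squared_distance(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def k_means(vectors, k, iterations=20, seed=0):
    """Return (centers, labels) after a fixed number of Lloyd iterations."""
    rng = random.Random(seed)
    # Initialize the centers with k distinct input vectors.
    centers = [list(v) for v in rng.sample(vectors, k)]
    labels = [0] * len(vectors)
    for _ in range(iterations):
        # Assignment step: attach each vector to its nearest center.
        labels = [
            min(range(k), key=lambda c: squared_distance(v, centers[c]))
            for v in vectors
        ]
        # Update step: move each center to the mean of its members.
        for c in range(k):
            members = [v for v, label in zip(vectors, labels) if label == c]
            if members:  # an empty cluster keeps its previous center
                centers[c] = [sum(col) / len(members) for col in zip(*members)]
    return centers, labels


# Toy data: two well-separated groups of three points each.
toy = [[0, 0], [1, 1], [2, 2], [50, 50], [51, 51], [52, 52]]
centers, labels = k_means(toy, k=2)
print(labels)
```

The returned centers correspond to the cluster centers shown below, and the labels identify each cluster's members.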

Cluster Centers

Cluster Members

Candidate Creatures

Every sprite is a vector... Can we generate vectors that are creature-like?

Randomly generating color values just produces noisy vectors.

Computing the expected value of each color value across a subsample of sprites produces better shapes.
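
The two approaches can be compared with a small sketch. The helper names are hypothetical, and rounding the per-position mean to the nearest quantum is an assumption; the project may combine the subsample differently.

```python
import random

QUANTA = 5  # color quanta 0..4


def random_vector(length, rng):
    """Draw every position independently: the result is pure noise."""
    return [rng.randrange(QUANTA) for _ in range(length)]


def expected_vector(sample):
    """Average each position across the sample, rounded to a quantum.

    Note that Python's round() sends exact halves to the nearest even integer.
    """
    n = len(sample)
    return [round(sum(column) / n) for column in zip(*sample)]


# Toy subsample of three 3-element "sprites".
sample = [[0, 4, 2], [0, 4, 3], [0, 3, 4]]
noise = random_vector(3, random.Random(0))
average = expected_vector(sample)
print(average)
```

Positions where the sampled sprites agree keep their shared value, which is why the averaged vector retains recognizable shapes while the random one does not.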

Logistic Regression

Humans cannot look at every single vector. Classification can help predict which vectors might be worth a closer look.

Active proportion measures the proportion of cells that are not transparent.
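
A minimal sketch of this feature, assuming that quantum 0 encodes transparency (the encoding is not stated here, so the default is a placeholder):

```python
def active_proportion(vector, transparent=0):
    """Fraction of cells whose quantum differs from the transparent one."""
    active = sum(1 for q in vector if q != transparent)
    return active / len(vector)


# Two of these four cells are not transparent.
example = active_proportion([0, 0, 3, 4])
print(example)
```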

The distribution of color quantum 3 contributed the most weight to the classifier.

The classifier achieved about 82% accuracy on both the training and test datasets, with no false positives. All misclassifications were known creatures labeled as noise.

Mean Train Accuracy = 0.826
Mean Test Accuracy = 0.820
w = array([-5.37720197,  2.15129582, -2.7828926 , 12.3475064 , -1.90558047])
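
To show how a weight vector like this one turns a sprite vector into a prediction, here is a hedged sketch. It assumes that the five weights correspond, in order, to the proportions of color quanta 0 through 4; this is consistent with quantum 3 carrying the largest weight, but it is not confirmed by the output. The intercept is not reported, so `bias=0.0` is a placeholder, and the function names are hypothetical.

```python
import math

# Weights copied from the output above.
W = [-5.37720197, 2.15129582, -2.7828926, 12.3475064, -1.90558047]


def quantum_proportions(vector, quanta=5):
    """Share of cells holding each quantum 0..quanta-1 (assumed features)."""
    n = len(vector)
    return [vector.count(q) / n for q in range(quanta)]


def predict_probability(vector, weights=W, bias=0.0):
    """Logistic model: sigmoid of the weighted feature sum plus the bias."""
    features = quantum_proportions(vector)
    z = bias + sum(w * x for w, x in zip(weights, features))
    return 1.0 / (1.0 + math.exp(-z))


blank = [0] * 9216        # every cell holds quantum 0
all_threes = [3] * 9216   # every cell holds quantum 3
p_blank = predict_probability(blank)
p_threes = predict_probability(all_threes)
print(p_blank < 0.5 < p_threes)
```

Under these assumptions, a vector scores as creature-like when the probability exceeds 0.5.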

With no false positives, the classifier likely suffers from overfitting. Below are some of the true negative cases that fell close to the prediction boundary.

Genetic Algorithms

To find new candidate creatures, select an initial population, score each candidate based on its fitness, create offspring through crossovers, and add random mutations to avoid converging too early.
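
The loop described above can be sketched as follows. The selection rule (keep the fitter half as parents), single-point crossover, the mutation rate, and all function names are assumptions chosen for illustration, and the toy fitness function stands in for whatever score the project uses, such as the classifier's probability.

```python
import random


def crossover(a, b, rng):
    """Single-point crossover: head of one parent, tail of the other."""
    cut = rng.randrange(1, len(a))
    return a[:cut] + b[cut:]


def mutate(vector, rate, rng, quanta=5):
    """Replace each cell with a random quantum with probability `rate`."""
    return [rng.randrange(quanta) if rng.random() < rate else q for q in vector]


def evolve(population, fitness, generations=10, rate=0.01, seed=0):
    """Run a basic genetic algorithm and return the final generation."""
    rng = random.Random(seed)
    size = len(population)
    for _ in range(generations):
        # Score the candidates and keep the fitter half as parents.
        ranked = sorted(population, key=fitness, reverse=True)
        parents = ranked[: max(2, size // 2)]
        # Breed a full new generation from random pairs of parents.
        children = []
        while len(children) < size:
            a, b = rng.sample(parents, 2)
            children.append(mutate(crossover(a, b, rng), rate, rng))
        population = children
    return population


# Toy run: 8 random 16-cell vectors, with fitness = number of cells equal to 3.
init_rng = random.Random(1)
start = [[init_rng.randrange(5) for _ in range(16)] for _ in range(8)]
final = evolve(start, fitness=lambda v: v.count(3))
print(len(final))
```

Raising the mutation rate keeps the population diverse for longer, at the cost of noisier offspring.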

The progression below shows each generation of offspring as a column:

The crossovers and mutations eroded the sprite details, so we applied kernel smoothing and other image-processing techniques to reduce grainy areas. This grid illustrates a spectrum of shading vectors and thresholds, resulting in different kinds of sprites:
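
One simple form of kernel smoothing is a 3 x 3 mean filter followed by a threshold. The sketch below applies it to a binary mask of active cells; the kernel size, the cutoff, and the function names are assumptions, not the project's settings.

```python
def box_smooth(grid):
    """3 x 3 mean filter; border cells average the neighbors that exist."""
    h, w = len(grid), len(grid[0])
    out = [[0.0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            window = [
                grid[j][i]
                for j in range(max(0, y - 1), min(h, y + 2))
                for i in range(max(0, x - 1), min(w, x + 2))
            ]
            out[y][x] = sum(window) / len(window)
    return out


def threshold(grid, cutoff):
    """Turn smoothed values back into a binary mask."""
    return [[1 if v >= cutoff else 0 for v in row] for row in grid]


# A lone active cell (grain) in an otherwise empty 5 x 5 mask.
speckle = [[0] * 5 for _ in range(5)]
speckle[2][2] = 1
cleaned_speckle = threshold(box_smooth(speckle), 0.5)

# A solid 3 x 3 block of active cells in a 5 x 5 mask.
block = [[1 if 1 <= y <= 3 and 1 <= x <= 3 else 0 for x in range(5)] for y in range(5)]
cleaned_block = threshold(box_smooth(block), 0.5)
print(cleaned_speckle[2][2], cleaned_block[2][2])
```

Isolated grain falls below the cutoff and disappears, while the interior of a solid region survives. Varying the cutoff gives a range of results similar in spirit to the grid of thresholds.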

Linear Optimization

Clustering in small bounded areas offered a way to look for new sprites in gaps between known creatures. Linear optimization searches for vectors that minimize a cost function while satisfying the boundary constraints.

We built an interactive tool in a Jupyter notebook to traverse vectors in the PCA space. Users control two dimensions, but get 9216-dimensional results.
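
Mapping two coordinates back to a full vector is the inverse PCA transform: the dataset mean plus a weighted sum of the principal components. The sketch below illustrates the arithmetic on a toy four-dimensional case. The function name is hypothetical, and rounding and clipping the result to valid quanta is an assumption about how the tool produces a displayable sprite.

```python
def reconstruct(mean, components, coords, quanta=5):
    """Return mean + sum_k coords[k] * components[k], as valid quanta."""
    vector = []
    for i, m in enumerate(mean):
        value = m + sum(c * comp[i] for c, comp in zip(coords, components))
        # Round to the nearest quantum and clip into 0..quanta-1.
        vector.append(min(quanta - 1, max(0, round(value))))
    return vector


# Toy PCA model: a 4-dimensional mean and two orthonormal components.
mean = [2.0, 2.0, 2.0, 2.0]
components = [
    [0.5, 0.5, -0.5, -0.5],
    [0.5, -0.5, 0.5, -0.5],
]

near = reconstruct(mean, components, (2.0, 0.0))
far = reconstruct(mean, components, (4.0, 4.0))
print(near, far)
```

In the real setting, `mean` and each component would have 9216 entries, so every pair of slider values yields a complete sprite.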

Generative Adversarial Neural Networks

This method takes the longest to train and the most resources to run, but we hope it will produce the most compelling new creature sprites in the future.

New Creature Sprites

They're a little rough around the edges, but here are some of the interesting candidate sprites we found:

Draw to Search

Computes the Hamming distance between your sketch and every creature vector to find the closest matches.
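
The Hamming distance between two equal-length vectors is the number of positions at which they differ. A minimal sketch of the search, with hypothetical function names and placeholder creature names:

```python
def hamming_distance(a, b):
    """Count the positions at which two equal-length vectors differ."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    return sum(x != y for x, y in zip(a, b))


def closest_matches(sketch, creatures, top=3):
    """Return the names of the `top` creatures nearest to the sketch."""
    ranked = sorted(
        creatures,
        key=lambda name: hamming_distance(sketch, creatures[name]),
    )
    return ranked[:top]


# Toy 4-cell vectors; the names are placeholders, not real sprites.
creatures = {
    "creature_a": [0, 1, 1, 0],
    "creature_b": [0, 1, 0, 0],
    "creature_c": [4, 4, 4, 4],
}
matches = closest_matches([0, 1, 1, 0], creatures, top=2)
print(matches)
```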

Your reference image will appear here.

Need a reference image?

Draw a creature here.

Your creature will appear here.

See the closest matches here.