Add Speciation
MaxHalford committed Apr 17, 2017
1 parent d0befdb commit 6106464
Showing 19 changed files with 508 additions and 277 deletions.
5 changes: 5 additions & 0 deletions HACKME.md
@@ -83,3 +83,8 @@ Talking about parallelism, there is a reason why the populations are run in para
- The documentation is built with [mkdocs](https://mkdocs.readthedocs.io).
- Each page has an associated markdown file in the `docs/` folder.
- You can `mkdocs serve` to enable live editing of the documentation.

## Performance

1. `go test -bench . -cpuprofile=cpu.prof`
2. `go tool pprof main.test cpu.prof` or `go-torch main.test cpu.prof`
13 changes: 9 additions & 4 deletions README.md
@@ -61,6 +61,7 @@
- [Features](#features)
- [Usage](#usage)
- [Implementing the Genome interface](#implementing-the-genome-interface)
- [Using the Slice interface](#using-the-slice-interface)
- [Instantiating a GA struct](#instantiating-a-ga-struct)
- [Running a GA](#running-a-ga)
- [Models](#models)
@@ -255,10 +256,12 @@ Let's have a look at the GA struct.
type GA struct {
// Fields that are provided by the user
MakeGenome GenomeMaker
Topology Topology
NPops int
PopSize int
Model Model
Migrator Migrator
MigFrequency int // Frequency at which migrations occur
Speciator Speciator
Logger *log.Logger

// Fields that are generated at runtime
@@ -273,12 +276,14 @@ type GA struct {
You have to fill in the first 5 fields, the rest are generated when calling the `GA`'s `Initialize()` method.

- `MakeGenome` is a method that returns a random genome that you defined in the previous step. gago will use this method to produce an initial population. Again, gago provides some methods for common random genome generation.
- `Topology` is a struct which tells gago how many populations (`NPopulations`), species (`NSpecies`), individuals (`NIndividuals`) to use. GAs with multiple populations that you shouldn't worry about if you're a GA novice. The same goes for the number of species.
- `NPops` determines the number of populations that will be used.
- `PopSize` determines the number of individuals inside each population.
- `Model` determines how to use the genetic operators you chose in order to produce better solutions, in other words it's a recipe. A dedicated section is available in the [model section](#models).
- `Migrator` and `MigFrequency` should be provided if you want to exchange individuals between populations in case of a multi-population GA. If not the populations will be run independently. Again this is an advanced concept in the genetic algorithms field that you shouldn't deal with at first.
- `Speciator` will split each population into distinct species at each generation. Each species is evolved separately from the others; once all the species have been evolved, they are regrouped.
- `Logger` is optional, you can read more about it in the [logging section](#logging-population-statistics).

Essentially only `MakeGenome`, `Topology` and `Model` are required to initialize and run a GA.
Essentially, only `MakeGenome`, `NPops`, `PopSize` and `Model` are required to initialize and run a GA.


### Running a GA
@@ -341,7 +346,7 @@ The purpose of a partitioning individuals is to apply genetic operators to simil

Using speciation with genetic algorithms became "popular" when it was first applied to the [optimization of neural network topologies](https://www.wikiwand.com/en/Neuroevolution_of_augmenting_topologies). By mixing two neural networks during crossover, the resulting neural networks were often useless because the inherited weights were not optimized for the new topology. This meant that newly generated neural networks were not performing well and would likely disappear during selection. Thus speciation was introduced so that neural networks evolved in groups of similar networks and new neural networks wouldn't disappear immediately. Instead, similar neural networks would evolve among themselves until they were good enough to be mixed with the other neural networks.

With gago it's possible to use speciation on top of all the rest. For the time, the only kind of speciation is fitness based. Later on it will be possible to provided a function to compare two individuals based on their genome. What happens is that a population of `n` individuals is grouped into `k` species before applying an evolution model to each cluster. The `k` species are then merged into a new population of `n` individuals. This way, species don't interact with other species.
With gago it's possible to use speciation on top of all the rest. To do so, the `Speciator` field of the `GA` struct has to be specified.

<div align="center">
<img src="https://docs.google.com/drawings/d/e/2PACX-1vRLr7j4ML-ZeXFfvjko9aepRAkCgBlpg4dhuWhB-vXCQ17gJFmDQHrcUbcPFwlqzvaPAXwDxx5ld1kf/pub?w=686&h=645" alt="speciation" width="70%" />
61 changes: 50 additions & 11 deletions distance.go
@@ -1,26 +1,65 @@
package gago

// A Metric returns the distance between two genomes.
type Metric func(a, b Individual) float64

// A DistanceMemoizer computes and stores Metric calculations.
type DistanceMemoizer struct {
Metric func(a, b Genome) float64
Distances map[Genome]map[Genome]float64
Metric Metric
Distances map[string]map[string]float64
nCalculations int // Total number of calls to Metric for testing purposes
}

func makeDistanceMemoizer(metric func(a, b Genome) float64) DistanceMemoizer {
// makeDistanceMemoizer initializes a DistanceMemoizer.
func makeDistanceMemoizer(metric Metric) DistanceMemoizer {
return DistanceMemoizer{
Metric: metric,
Distances: make(map[Genome]map[Genome]float64),
Distances: make(map[string]map[string]float64),
}
}

func (dm *DistanceMemoizer) getDistance(a, b Genome) float64 {
if dist, ok := dm.Distances[a][b]; ok {
return dist
// GetDistance returns the distance between two Individuals based on the
// DistanceMemoizer's Metric field. If the two individuals share the same ID
// then GetDistance returns 0. DistanceMemoizer stores the calculated distances
// so that if GetDistance is called twice with the two same Individuals then
// the second call will return the stored distance instead of recomputing it.
func (dm *DistanceMemoizer) GetDistance(a, b Individual) float64 {
// Check if the two individuals are the same before proceeding
if a.ID == b.ID {
return 0
}
// Create maps if the genomes have never been encountered
if _, ok := dm.Distances[a.ID]; !ok {
dm.Distances[a.ID] = make(map[string]float64)
} else {
// Check if the distance between the two genomes has been calculated
if dist, ok := dm.Distances[a.ID][b.ID]; ok {
return dist
}
}
if dist, ok := dm.Distances[b][a]; ok {
return dist
if _, ok := dm.Distances[b.ID]; !ok {
dm.Distances[b.ID] = make(map[string]float64)
} else {
if dist, ok := dm.Distances[b.ID][a.ID]; ok {
return dist
}
}
// Calculate the distance between the genomes and memoize it
var dist = dm.Metric(a, b)
dm.Distances[a][b] = dist
dm.Distances[b][a] = dist
dm.Distances[a.ID][b.ID] = dist
dm.Distances[b.ID][a.ID] = dist
dm.nCalculations++
return dist
}

// calcAvgDistances returns, for each Individual, its average distance to the
// other Individuals in the slice.
func calcAvgDistances(indis Individuals, dm DistanceMemoizer) map[string]float64 {
var avgDistances = make(map[string]float64)
for _, a := range indis {
for _, b := range indis {
avgDistances[a.ID] += dm.GetDistance(a, b)
}
avgDistances[a.ID] /= float64(len(indis) - 1)
}
return avgDistances
}
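
The memoization idea in `GetDistance` can be sketched in isolation. The block below is a minimal, self-contained illustration and not the code from this commit: `point`, `memoizer`, `key` and `l1` are hypothetical names, and it deliberately uses a different layout (one map keyed by an ordered pair of IDs) instead of the two nested maps above, so that `(a, b)` and `(b, a)` share a single entry.

```go
package main

import (
	"fmt"
	"math"
)

// point is a hypothetical stand-in for an individual: an ID plus a vector.
type point struct {
	ID     string
	Coords []float64
}

// memoizer caches a symmetric metric, keyed by the pair of IDs.
type memoizer struct {
	metric func(a, b point) float64
	cache  map[[2]string]float64
	calls  int // number of real metric evaluations
}

func newMemoizer(metric func(a, b point) float64) *memoizer {
	return &memoizer{metric: metric, cache: make(map[[2]string]float64)}
}

// key orders the two IDs so that (a, b) and (b, a) map to the same entry.
func key(a, b string) [2]string {
	if a > b {
		a, b = b, a
	}
	return [2]string{a, b}
}

// distance returns 0 for identical IDs, a cached value if one exists, and
// otherwise evaluates the metric once and stores the result.
func (m *memoizer) distance(a, b point) float64 {
	if a.ID == b.ID {
		return 0
	}
	k := key(a.ID, b.ID)
	if d, ok := m.cache[k]; ok {
		return d
	}
	d := m.metric(a, b)
	m.cache[k] = d
	m.calls++
	return d
}

// l1 is the Manhattan distance between two points of equal dimension.
func l1(a, b point) float64 {
	var d float64
	for i := range a.Coords {
		d += math.Abs(a.Coords[i] - b.Coords[i])
	}
	return d
}

func main() {
	m := newMemoizer(l1)
	a := point{ID: "1", Coords: []float64{1, 1, 1}}
	b := point{ID: "2", Coords: []float64{3, 3, 3}}
	fmt.Println(m.distance(a, b), m.distance(b, a), m.calls) // prints 6 6 1
}
```

Ordering the key halves the number of stored entries compared with writing both directions, at the cost of one string comparison per lookup. Whether that trade-off suits gago is an open question; the sketch only shows that the two layouts give the same answers.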
53 changes: 42 additions & 11 deletions distance_test.go
@@ -1,19 +1,50 @@
package gago

import (
"math"
"testing"
)

func L1Distance(x1, x2 Genome) (dist float64) {
var g1 = x1.(Vector)
var g2 = x2.(Vector)
for i := range g1 {
dist += math.Abs(g1[i] - g2[i])
}
return
}

func TestDistanceMemoizer(t *testing.T) {
var dm = makeDistanceMemoizer(L1Distance)
var (
dm = makeDistanceMemoizer(l1Distance)
a = Individual{Genome: Vector{1, 1, 1}, ID: "1"}
b = Individual{Genome: Vector{3, 3, 3}, ID: "2"}
c = Individual{Genome: Vector{6, 6, 6}, ID: "3"}
)
// Check the number of calculations is initially 0
if dm.nCalculations != 0 {
t.Error("nCalculations is not initialized to 0")
}
// Check the distance between the 1st and itself
if dm.GetDistance(a, a) != 0 {
t.Error("Wrong calculated distance")
}
// Check the number of calculations has not increased
if dm.nCalculations != 0 {
t.Error("nCalculations should not have increased")
}
// Check the distance between the 1st and the 2nd individual
if dm.GetDistance(a, b) != 6 {
t.Error("Wrong calculated distance")
}
// Check the number of calculations has increased by 1
if dm.nCalculations != 1 {
t.Error("nCalculations has not increased")
}
// Check the distance between the 2nd and the 1st individual
if dm.GetDistance(b, a) != 6 {
t.Error("Wrong calculated distance")
}
// Check the number of calculations has not increased
if dm.nCalculations != 1 {
t.Error("nCalculations has increased")
}
// Check the distance between the 1st and the 3rd individual
if dm.GetDistance(a, c) != 15 {
t.Error("Wrong calculated distance")
}
// Check the distance between the 2nd and the 3rd individual
if dm.GetDistance(b, c) != 9 {
t.Error("Wrong calculated distance")
}
}
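
The test above does not exercise `calcAvgDistances`. As a worked example of the arithmetic, the sketch below averages the L1 distances for the same three vectors. It is a self-contained illustration on plain slices, assuming hypothetical helper names `l1` and `avgDistances`; it does not use gago's `Individual` or `DistanceMemoizer` types.

```go
package main

import (
	"fmt"
	"math"
)

// l1 returns the Manhattan distance between two equal-length vectors.
func l1(a, b []float64) float64 {
	var d float64
	for i := range a {
		d += math.Abs(a[i] - b[i])
	}
	return d
}

// avgDistances returns, for each vector, its mean distance to the others.
func avgDistances(vecs [][]float64) []float64 {
	avgs := make([]float64, len(vecs))
	if len(vecs) < 2 {
		return avgs // nothing to compare against, avoid dividing by zero
	}
	for i, a := range vecs {
		for j, b := range vecs {
			if i != j {
				avgs[i] += l1(a, b)
			}
		}
		avgs[i] /= float64(len(vecs) - 1)
	}
	return avgs
}

func main() {
	vecs := [][]float64{{1, 1, 1}, {3, 3, 3}, {6, 6, 6}}
	// Pairwise distances are 6, 15 and 9, so the averages are
	// (6+15)/2, (6+9)/2 and (15+9)/2.
	fmt.Println(avgDistances(vecs)) // prints [10.5 7.5 12]
}
```

The early return for fewer than two vectors is an addition of this sketch; `calcAvgDistances` as committed divides by `len(indis) - 1` without that guard.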
101 changes: 57 additions & 44 deletions ga.go
@@ -8,35 +8,16 @@ import (
"time"
)

// A Topology holds all the information relative to the size of a GA.
type Topology struct {
NPopulations int // Number of populations
NSpecies int // Number of species each population is split into
NIndividuals int // Initial number of individuals in each population
}

// Validate the properties of a Topology.
func (topo Topology) Validate() error {
if topo.NPopulations < 1 {
return errors.New("'NPopulations' should be higher or equal to 1")
}
if topo.NSpecies < 0 {
return errors.New("'NSpecies' should be higher or equal to 1 if provided")
}
if topo.NIndividuals < 1 {
return errors.New("'NIndividuals' should be higher or equal to 1")
}
return nil
}

// A GA contains population which themselves contain individuals.
type GA struct {
// Fields that are provided by the user
MakeGenome GenomeMaker
Topology Topology
NPops int
PopSize int
Model Model
Migrator Migrator
MigFrequency int // Frequency at which migrations occur
Speciator Speciator
Logger *log.Logger

// Fields that are generated at runtime
@@ -52,25 +33,34 @@ type GA struct {
func (ga GA) Validate() error {
// Check the GenomeMaker presence
if ga.MakeGenome == nil {
return errors.New("'GenomeMaker' cannot be nil")
return errors.New("GenomeMaker cannot be nil")
}
// Check the topology is valid
var topoErr = ga.Topology.Validate()
if topoErr != nil {
return topoErr
// Check the number of populations is higher than 0
if ga.NPops < 1 {
return errors.New("NPops should be higher than 0")
}
// Check the number of individuals per population is higher than 0
if ga.PopSize < 1 {
return errors.New("PopSize should be higher than 0")
}
// Check the model presence
if ga.Model == nil {
return errors.New("'Model' cannot be nil")
return errors.New("Model cannot be nil")
}
// Check the model is valid
var modelErr = ga.Model.Validate()
if modelErr != nil {
return modelErr
}
// Check the migration frequency in the presence of a migrator
// Check the migration frequency if a Migrator has been provided
if ga.Migrator != nil && ga.MigFrequency < 1 {
return errors.New("'MigFrequency' should be strictly higher than 0")
return errors.New("MigFrequency should be strictly higher than 0")
}
// Check the speciator is valid if it has been provided
if ga.Speciator != nil {
if specErr := ga.Speciator.Validate(); specErr != nil {
return specErr
}
}
// No error
return nil
@@ -93,18 +83,22 @@ func (ga *GA) findBest() {
// individual in each population. Running Initialize after running Enhance will
// reset the GA entirely.
func (ga *GA) Initialize() {
ga.Populations = make([]Population, ga.Topology.NPopulations)
ga.Populations = make([]Population, ga.NPops)
ga.rng = makeRandomNumberGenerator()
var wg sync.WaitGroup
for i := range ga.Populations {
wg.Add(1)
go func(j int) {
defer wg.Done()
// Generate a population
ga.Populations[j] = makePopulation(ga.Topology.NIndividuals, ga.MakeGenome, j)
// Evaluate it's individuals
ga.Populations[j] = makePopulation(
ga.PopSize,
ga.MakeGenome,
randString(3, ga.rng),
)
// Evaluate its individuals
ga.Populations[j].Individuals.Evaluate()
// Sort it's individuals
// Sort its individuals
ga.Populations[j].Individuals.SortByFitness()
// Log current statistics if a logger has been provided
if ga.Logger != nil {
@@ -114,7 +108,8 @@ func (ga *GA) Initialize() {
}
wg.Wait()
// The initial best individual is initialized randomly
ga.Best = MakeIndividual(ga.MakeGenome(makeRandomNumberGenerator()))
var rng = makeRandomNumberGenerator()
ga.Best = MakeIndividual(ga.MakeGenome(rng), rng)
ga.findBest()
}

@@ -127,7 +122,7 @@ func (ga *GA) Enhance() {
// Migrate the individuals between the populations if there are enough
// populations, there is a migrator and the migration frequency divides the
// generation count
if ga.Topology.NPopulations > 1 && ga.Migrator != nil && ga.Generations%ga.MigFrequency == 0 {
if len(ga.Populations) > 1 && ga.Migrator != nil && ga.Generations%ga.MigFrequency == 0 {
ga.Migrator.Apply(ga.Populations, ga.rng)
}
// Use a wait group to enhance the populations in parallel
@@ -137,14 +132,8 @@ func (ga *GA) Enhance() {
go func(j int) {
defer wg.Done()
// Apply speciation if a positive number of species has been specified
if ga.Topology.NSpecies > 0 {
var species = ga.Populations[j].speciate(ga.Topology.NSpecies)
// Apply the evolution model to each cluster
for k := range species {
ga.Model.Apply(&species[k])
}
// Merge each cluster back into the original population
ga.Populations[j].Individuals = species.mergeIndividuals()
if ga.Speciator != nil {
ga.Populations[j].speciateEvolveMerge(ga.Speciator, ga.Model)
} else {
// Else apply the evolution model to the entire population
ga.Model.Apply(&ga.Populations[j])
@@ -165,3 +154,27 @@ func (ga *GA) Enhance() {
ga.findBest()
ga.Age += time.Since(start)
}

func (pop *Population) speciateEvolveMerge(spec Speciator, model Model) {
var (
species = spec.Apply(pop.Individuals, pop.rng)
pops = make([]Population, len(species))
)
// Create a slice of population from the obtained species and evolve each one separately
for i, specie := range species {
pops[i] = Population{
Individuals: specie,
Age: pop.Age,
Generations: pop.Generations,
ID: randString(3, pop.rng),
rng: pop.rng,
}
model.Apply(&pops[i])
}
// Merge each species back into the original population
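// The loop variable is named p so that it does not shadow the receiver pop;
// otherwise each species would be copied onto itself.
var i int
for _, p := range pops {
copy(pop.Individuals[i:i+len(p.Individuals)], p.Individuals)
i += len(p.Individuals)
}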
}
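
The merge step at the end of `speciateEvolveMerge` amounts to copying each group back into one destination slice at a running offset. A minimal sketch of that pattern, with `int` values standing in for individuals and `mergeGroups` as a hypothetical helper name that does not exist in gago:

```go
package main

import "fmt"

// mergeGroups copies each group into dst back to back and returns the number
// of elements written. The built-in copy returns how many elements it copied,
// so the offset advances by exactly that amount and never overruns dst.
func mergeGroups(dst []int, groups [][]int) int {
	var offset int
	for _, g := range groups {
		offset += copy(dst[offset:], g)
	}
	return offset
}

func main() {
	dst := make([]int, 5)
	n := mergeGroups(dst, [][]int{{1, 2}, {3}, {4, 5}})
	fmt.Println(dst, n) // prints [1 2 3 4 5] 5
}
```

Slicing with `dst[offset:]` rather than an explicit upper bound means that groups whose total size exceeds `len(dst)` are truncated instead of causing a panic. Whether silent truncation is preferable to a panic depends on the invariants the caller wants to enforce.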