Back at the beginning of June I won the Greed Tournament, a three week programming competition held by CodeCombat. In my previous post I went through the first half of my code, talking in detail about my coin collection strategy. In this post I’ll cover the other half, which mainly focuses on unit selection.
Unit selection is the task of, on each turn, deciding which unit the base should attempt to build. Units build instantly but cost gold, which is scattered around the map and gathered competitively by collectors. Greed has a number of units to choose from — there are collectors, which are controlled by the player to collect gold, and five types of military unit, which automatically advance towards the enemy base when built, fighting any enemy military encountered on the way. In order to succeed, your army must make it to the enemy base and destroy it within the time limit.
Unit selection logic is not as important as coin gathering, but if both competitors have good coin gathering algorithms, this can be the deciding factor. A weaker army might overcome a stronger one with the right mix of units, or buying one more collector might just give you the edge if you can last long enough for it to pay off.
A good strategy must also carefully consider when these units should be built. Buying that extra collector might increase your future income, but the short-term deficit may leave you vulnerable to enemy attack. Equally, an attacker must wait until the right moment before fielding an army, or the enemy may be able to counter it with overwhelming force when it arrives.
Excellent decisions here can turn a loss into a draw and a draw into a win.
Before I talk about the main strategy I’ll mention a bit of scaffolding that I used in my code.
The playing field in Greed is symmetrical and the units available to both humans and ogres have identical stats and abilities. It seems obvious, then, that any strategy written for one side should be transferable to the other (this also applies to coin collection, where peasants and peons are interchangeable). By submitting to both ladders, our code will fight a wider range of opponents, giving us more information about the robustness of the strategy. We can also play our code against our submission on the opposite ladder, allowing us to try out variations in strategy against a known baseline. It also gives twice the opportunity to win first place. :)
The problem is that the names are different! What humans know as a peasant, the ogres call a peon, so if we talk about allied peasants in the code, only the human side will understand what we mean.
To solve this problem, I wrote a little compatibility layer. This is the first thing that appears at the top of my code:
This abstracts away the side-specific names. Whenever we need to refer to a unit type, we can now do so by looking up the relevant key in one of these dictionaries. Since my code was originally written to play in the human ladder, the abstract names I have chosen for the units are simply the human names.
The if statement at the end simply swaps the tables when the code is playing as ogres, so we can submit the exact same code to both ladders.
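As an illustration, such a compatibility layer can be sketched like this (the variable names, and some of the unit names, are stand-ins rather than my exact code):

```javascript
// Stand-in name tables: keys are the abstract (human) names, values are
// the side-specific names the game understands. The real tables cover
// every unit type in Greed.
var names = {
    human: { collector: 'peasant', fighter: 'soldier',  ranged: 'griffin-rider' },
    ogre:  { collector: 'peon',    fighter: 'munchkin', ranged: 'fangrider' }
};

// In the game this would be read from the API (e.g. the base's team);
// it is hard-coded here so the sketch runs on its own.
var myTeam = 'ogre';

// Swap the tables when playing as ogres, so identical code can be
// submitted to both ladders.
var ally = names.human, enemy = names.ogre;
if (myTeam === 'ogre') {
    var tmp = ally;
    ally = enemy;
    enemy = tmp;
}
```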
There is probably a small performance penalty in doing these lookups, bringing us slightly closer to Greed’s statement limit. In my solution I never found this to be a problem, since after optimising critical loops I had a comfortable amount of headroom. However, if you wanted to avoid this impact, you could invent a way of preprocessing the code to replace all the names before pasting it into the CodeCombat editor.
My unit selection code started as a few simple if statements, but quickly grew into a tangled mess as I expanded it to account for unexpected situations caused by my competitors. To manage the increasing complexity, I divided my code into a set of high-level behaviours, taking inspiration from the Subsumption architecture, a pattern that I had learned about during my undergraduate studies.
Subsumption is a neat way of organising behaviours for an agent that needs to be proactive about its actions but also needs to respond rapidly to changes in its environment. It was invented by Rodney Brooks for use on real-life robots, but I also found it useful here to keep my code clean.
The essential principle is that the agent is composed of layers of behaviours that each know how to control the agent. Each behaviour can read all the sensory input to the agent and has the ability to directly control the agent when it wants to perform an action. However, behaviours higher in the list have the ability to override those beneath, suppressing their actions. This allows for the creation of “instinctual” low level behaviours that react quickly to specific stimuli, as well as higher level goal-orientated actions.
Because the behaviours are self-contained, they can be easily tested in isolation, before being composed to produce more complex intelligence.
Here is the framework for my code:
That’s it! This code runs every turn and simply iterates through the behaviours (my code calls them states — an artefact from the previous design), finds the highest-priority one that wants control and swaps it in, replacing the old one. Then we simply run the currently installed behaviour to determine which unit to build.
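In miniature, the loop looks something like this (the behaviour names, the state object and the priority order here are illustrative, not my tournament code):

```javascript
// Behaviours ordered from highest to lowest priority. Each knows when
// it wants control and which unit to build while in control.
var behaviours = [
    { name: 'attack',  wantsControl: function (s) { return s.weAreStronger; },
      nextUnit: function (s) { return 'soldier'; } },
    { name: 'collect', wantsControl: function (s) { return s.coinsPlentiful; },
      nextUnit: function (s) { return 'peasant'; } },
    { name: 'nothing', wantsControl: function (s) { return true; },
      nextUnit: function (s) { return null; } }
];

var currentBehaviour = null;

// Run once per turn: the highest-priority behaviour that asks for
// control is swapped in, then asked what to build.
function chooseUnit(state) {
    for (var i = 0; i < behaviours.length; i++) {
        if (behaviours[i].wantsControl(state)) {
            currentBehaviour = behaviours[i];
            break;
        }
    }
    return currentBehaviour.nextUnit(state);
}
```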
I should stress that although my code is inspired by the subsumption architecture, it is not entirely the same and, in fact, misses out on most of the novelty. Usually, all behaviours run at the same time, in parallel, and, if the agent has several outputs, behaviours can drive any output they want. When a more important behaviour takes control, it only overrides the outputs that it wants to use, leaving lesser behaviours free to continue using other outputs. This allows the agent to pursue multiple goals at the same time.
My code only allows one behaviour at a time, making it essentially just a fancy way of organising a block of if statements. However, since there is only one output — the choice of unit — the overall result is equivalent. This is something to bear in mind if you’re thinking about using the architecture to solve a different problem; the interaction between behaviours becomes much more interesting when multiple outputs are available.
For more information, I highly recommend reading the Wikipedia article or Brooks’ papers.
Behaviours in Detail
As you can see above, my code defines 6 different behaviours, which my code refers to as states (a holdover from a previous state-machine based approach). These are:
- Build military units to attack the enemy base
- Build military units to defend our own base
- Save up gold in preparation for an enemy attack
- Build collectors to gather more gold
- Annoy the enemy by building a single soldier to attack with
- Do nothing; should be obvious :)
The more important behaviours (attack, defend) sit on top of the others and can override them at will. Each behaviour is just an object with the following members:
- a string that uniquely identifies this behaviour.
- a method that returns true if the behaviour wants control of what to build, otherwise false.
- a method that returns the type of unit that should be built next, if any.
Let’s look at each state in detail. We’ll start from the bottom this time, with the lowest-priority layer.
This is the “default” state that we will hit if none of the others want control. This is the simplest state there is — it always takes control and, unsurprisingly, does nothing.
This state can only activate once and simply builds a single soldier.
Because units take a long time to get from one base to another, it is usually possible to field a much stronger army as a defender than as an attacker due to the extra gold acquired while the attacker’s army is marching. Therefore, most players wait for their opponent to attack before building an army of their own. By sending a single soldier, I try to provoke the enemy into attacking, tipping the balance in my favour. This is a virtually risk free move — since the soldier is the cheapest unit, the enemy must spend at least as much gold to counter it, and will have to overspend if they wish for their own soldier to survive the battle. If I am lucky, the enemy will spend more, producing an army which I will be able to defeat efficiently once it reaches my base.
Since the poke behaviour is right at the bottom of the layers, it will only be triggered if none of the other behaviours have anything to do. This means that it won’t interrupt any more important plans.
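As a sketch, matching the behaviour members described earlier (the flag name is mine):

```javascript
// Sketch of the poke behaviour: it activates exactly once, building a
// single cheap soldier to provoke the enemy into attacking early.
var pokeDone = false;

var poke = {
    name: 'poke',
    wantsControl: function () { return !pokeDone; },
    nextUnit: function () {
        pokeDone = true;  // never ask for control again
        return 'soldier';
    }
};
```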
This state takes control if we need to increase our gold collection effort and builds collectors. The conditions under which this state asks for control are fairly simplistic. It takes control if any of the following hold:
We haven’t built a collector yet. This is to make sure we always buy our first collector as soon as the game starts.
The enemy has 70 gold or less, we’ve built fewer than 6 collectors and the enemy has built more than we have. This helps us stay competitive against an opponent with more collectors who might otherwise starve us out. The risk is that spending gold on a collector might leave us open to attack while our reserves build back up, which is why we only build the collector when the enemy is low on gold.
There are two or more coins per allied collector on the field. This means that gold is still plentiful, and building another collector will hopefully pay for itself within a reasonable amount of time.
These heuristics are far from perfect and were quite fiddly to tune, but they worked well enough for the competition.
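Put as code, the trigger might look like this (the thresholds are the ones given above; the state fields are stand-ins for values read from the game API):

```javascript
// Returns true if the collect state should ask for control this turn.
function wantsMoreCollectors(s) {
    // Always buy our first collector as soon as the game starts.
    if (s.ourCollectorsBuilt === 0) return true;

    // Keep up with an opponent who is out-collecting us, but only
    // spend when they are too poor to punish the short-term deficit.
    if (s.enemyGold <= 70 && s.ourCollectorsBuilt < 6 &&
        s.enemyCollectorsBuilt > s.ourCollectorsBuilt) return true;

    // Gold is plentiful (two or more coins per allied collector), so
    // another collector should pay for itself reasonably quickly.
    if (s.coinsOnField >= 2 * s.ourCollectorsAlive) return true;

    return false;
}
```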
This state makes us start saving gold by forcing us to build nothing. It kicks in if the enemy has an army on the field that we could not currently mount a sufficient defence against. This is crudely measured by comparing the value of the enemy’s army (in gold) to the value of ours, plus any additional gold in reserve. Hopefully, by the time the army arrives, we will have built up enough gold for a decent counter-attack.
This state takes control if there is a need to defend the base. The defend state will only ask for control in situations where it thinks the game will be lost if we do not defend, so it gets priority over everything else except attacking. We check this by summing the value of enemy military units near our base, taking control if this value comes close to or exceeds the value of our own military.
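A sketch of that check (the unit costs and the margin used for "comes close to" are illustrative; in the game the values would come from the unit stats):

```javascript
// Illustrative gold costs per unit type.
var unitCost = { soldier: 20, librarian: 45, knight: 65 };

// Sum the gold value of enemy military units within a given distance
// of our base.
function threatNearBase(enemyUnits, nearDistance) {
    var total = 0;
    for (var i = 0; i < enemyUnits.length; i++) {
        if (enemyUnits[i].distanceToOurBase <= nearDistance) {
            total += unitCost[enemyUnits[i].type];
        }
    }
    return total;
}

// Take control if the nearby threat comes close to (here, within 80%
// of) or exceeds the value of our own military.
function shouldDefend(enemyUnits, ourArmyValue) {
    return threatNearBase(enemyUnits, 40) >= 0.8 * ourArmyValue;
}
```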
This state takes control if it thinks we should attack. The attack state will only ask for control if it thinks that the attack will result in a victory (or at least, a chance at it), and victory is what we’re after, so this has authority to override everything else.
We will attack if:
- Time has almost run out, in a last-ditch attempt at stealing a win
- The enemy’s forecast future strength is lower than ours by some threshold
The forecast is computed elsewhere based on linear regression. It attempts to estimate an upper bound on the opponent’s strength (military + gold) at the point in the future that our units would arrive at the enemy base, assuming we built them now. This is a very cautious estimate, so if our current strength is above this we have an almost certain chance of winning.
Once the behaviour has been entered, the threshold value for taking control drops. This prevents the agent from abandoning an otherwise favourable attack because of small fluctuations in the forecast strength.
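The hysteresis can be sketched like this (the margin values are invented for illustration; "strength" here means military value plus gold, as described above):

```javascript
// A high bar to start an attack, a lower one to keep it going, so
// small dips in the forecast don't make us abandon an otherwise
// favourable attack. Margins are illustrative.
var ENTER_MARGIN = 50;
var STAY_MARGIN = 0;

var attacking = false;

function wantsToAttack(ourStrength, enemyForecastStrength, timeAlmostUp) {
    if (timeAlmostUp) return true; // last-ditch attempt to steal a win
    var margin = attacking ? STAY_MARGIN : ENTER_MARGIN;
    attacking = (ourStrength - enemyForecastStrength) >= margin;
    return attacking;
}
```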
The attack and defend behaviours both build military units, but when it comes to deciding which type of military unit to build, my code makes no distinction between the two. Deciding the next unit is handled by the same function in both cases:
This function evaluates the units in our current army and tries to build a unit that keeps the army balanced according to the ratios set. The if statements are ordered by importance, so soldiers, who hold the front line, come first, followed by spell-casters to keep them alive. Once we have these, the function will select a griffin-rider/fangrider to add some ranged support, followed by a knight. Of course, if the more important soldier and spell-caster units die, we will build more to replace them before trying to bring out the support units.
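A simplified sketch of the idea (the target counts and unit names here are illustrative, not my tuned tournament values):

```javascript
// The if statements are ordered by importance, so lost front-line
// units are replaced before more support is added.
function chooseMilitaryUnit(counts) {
    if (counts.soldier < 3) return 'soldier';       // hold the front line
    if (counts.caster < 2)  return 'librarian';     // keep the line alive
    if (counts.ranged < 2)  return 'griffin-rider'; // ranged support
    return 'knight';                                // heavy support last
}
```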
This function was written quite late in the competition after most of the other work had been done. For the vast majority of development my code just iterated through a fixed list of unit types, building each in succession. The simple list approach worked for the most part, but could not recover if the initial forces died too quickly. In this case, my base would begin sending out support units without a front-line to defend them, leading to their quick deaths.
The balance and ordering of units was largely found through manual experimentation against opponents that beat me during development. I didn’t really have much of an idea for the unit balance, so if I encountered an enemy that beat my army composition, I would change my army composition to be closer to theirs and rely on the strength of my gold collection algorithm to push the outcome towards a draw or victory.
Now I’ve explained the architecture behind my unit selection process and shown how you can build simple self-contained pieces that combine to produce complex selection behaviour. We’ve looked briefly at the Subsumption architecture, the inspiration for my solution, and discussed the similarities and differences between the two. We’ve also seen how I constructed the individual behaviours and put them together to produce a final strategy that is versatile and easy to extend. Finally, I’ve explained my approach to army composition and how it evolved through the course of the tournament.
It’s been almost two months since I came in first place in the Greed tournament human ladder and my source code was released, but my solution is still ranked sixth on the ladder (at the time of writing) with no further updates. Though some newer players have made their way into the very top spots, it is clear that my strategy is still very strong.
Next time, I’ll talk about my gold prediction routine, which allows my attacking behaviour to forecast the enemy’s future strength so that it can safely decide whether to launch an attack.