# Categorically Variable


## Tutorial: Compiling Indicators and Expert Advisors from Source

When you receive the code for an expert advisor or indicator that we have developed for you, it will come in a package consisting of include files (with a .mqh extension) and source code files (with a .mq4 extension). So, what do you do with them?

## Are Green Number Runners More Likely to Bail?

Comrades Marathon runners are awarded a permanent green race number once they have completed 10 journeys between Durban and Pietermaritzburg. For many runners, once they have completed the race a few times, achieving a green number becomes a possibility. And once the idea takes hold, it can become something of a compulsion. I can testify to this: I am thoroughly compelled! For runners with this goal in mind, every finish is one step closer to a green number. They are slowly chipping away, year after year and the idea of bailing is anathema. However, once the green number is in the bag, does the imperative to complete the race fade?

I am going to explore the hypothesis that runners with green numbers are more likely to bail.

Let’s start by looking at the proportions of runners who finish the race as opposed to those who do not finish (DNF) and those who enter but do not start (DNS). As can be seen from the plot below, the proportion of runners who finish the race seems to increase with the number of medals that the runners in question have. So, for example, of the runners with one medal, 68.6% finished while only 21.7% were DNF. For runners with ten medals, 87.1% finished and only 9.5% were DNF.

On the face of it, this seems to make sense: there is a natural selection effect. Runners who have more medals are probably a little more hard core and thus less likely to bail. Less experienced runners might be more likely to jump on the bus when the going gets really tough.

But, unfortunately, it is not quite that simple.

The analysis above has a serious problem: consider those runners with one medal. We are comparing the number of finishers (those that have just received that medal) to non-finishers (who already have a medal!). So we are not really comparing apples with apples! What we really should be working with are the number of finishers who had i-1 medals before the race and the number of non-finishers who had i medals.

Compiling these data takes a little work, but nothing too taxing. Let’s consider an anonymous (but real) runner whose Comrades Marathon history looks like this:

What we want is a table that shows how many times he ran with a given number of medals. So, for our anonymous hero, this would be:

Things went well for the first seven years. In the first year he had no medal (column 0) but he finished (so there is a 1 in the first row). The same applies for columns 1 to 5. Then in year 7 he finished, gaining his seventh medal (hence the 1 in the first row of column 6: he already had 6 medals when he ran this time!). However, for the next three years (when he already had 7 medals) he got a DNF (hence the 3 in the second row of column 7). On his fourth attempt he got medal number 8 (giving the 1 in the first row of column 7: he already had 7 medals when he ran this time!). And the following year he got medal number 9. Then he suffered a string of 3 DNFs (the 3 in the second row of column 9), followed by a series of 5 DNSs (the 5 in the third row of column 9). To illustrate the proportions, when he had 7 medals he got DNS 0% (0/4) of the time, DNF 75% (3/4) of the time and finished 25% (1/4) of the time.
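For readers who want to reproduce this kind of tally, here is a minimal Python sketch. It is not the code used for the analysis: the outcome labels `FIN`, `DNF` and `DNS` and the function name `medal_table` are assumptions made for illustration. The history is the one described in the paragraph above.

```python
from collections import Counter


def medal_table(history):
    """Tally outcomes by the number of medals held *before* each race.

    history: chronological list of "FIN", "DNF" or "DNS" (assumed labels).
    Returns a dict mapping medals-held to a Counter of outcomes.
    """
    table = {}
    medals = 0
    for outcome in history:
        table.setdefault(medals, Counter())[outcome] += 1
        if outcome == "FIN":
            medals += 1  # a finish earns the next medal
    return table


# The anonymous runner: 7 finishes, 3 DNFs, 2 finishes, 3 DNFs, 5 DNSs.
history = ["FIN"] * 7 + ["DNF"] * 3 + ["FIN"] * 2 + ["DNF"] * 3 + ["DNS"] * 5
table = medal_table(history)

for medals, counts in sorted(table.items()):
    total = sum(counts.values())
    shares = {k: f"{counts[k]}/{total}" for k in ("FIN", "DNF", "DNS")}
    print(medals, shares)
```

Counting against the medals held before the race is what puts finishers and non-finishers on the same footing. Summing these per-runner tables over all athletes would give the aggregate counts from which the proportions are computed.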

Those are the data for a single athlete. To make a compelling case it is necessary to compile the same statistics for many, many runners. So I generated the analogous table for all athletes who ran the race between 1984 and 2013. A melted and abridged version of the resulting data look like this:

The important information here is the proportion of DNF entries for each medal count. We can see that 11.9% (0.11860858) of runners were DNF the first time that they ran. Similarly, of those runners who had already completed the race once (so they had one medal in the bag), 11.7% (0.11739666) did not finish. Of those who ran again after just achieving a green number, 10.8% (0.10827034) were DNF. It will be easier to make sense of all this in a plot.

Wow! Now that is interesting. Just to be sure that everything is clear about this plot: every column reflects the proportions of finishers, DNFs and DNSs who already had a given number of medals. There are a number of intriguing things about these data:

1. all three proportions remain almost identical for runners who already had between 0 and 6 medals;
2. the proportion of finishers then starts to ramp up for those with 7 and 8 medals (the DNS proportion remains unchanged, the DNFs decrease);
3. there is a decrease in the proportion of finishers who already have 9 medals and a corresponding increase in the proportion of DNSs, while the DNFs remain unchanged;
4. the proportion of finishers then increases slightly for those who already have 10 medals.

What conclusions can we draw from this? The second point seems to indicate a growing level of determination: these athletes are really close to their green number and they are less likely to sacrifice their medal. The third point is interesting too: the proportion of DNFs stays roughly the same but the DNS percentage grows from 4.1% for those with 8 medals to 7.8% for those with 9 medals. Why would this be? Well, I am really not sure and I would welcome suggestions. One possibility is that these runners are determined to have a good race so they might overtrain and end up injured or ill.

Are the differences in the proportion of DNFs statistically significant?

The minuscule p-value from the proportion test indicates that there definitely is a significant difference in the proportion of DNFs across the entire data set (for those with between 0 and 30 medals). But it does not tell us anything about which of the proportions are responsible for this difference. We can get some information about this from a pairwise proportion test. Here is the abridged output.

For between 0 and 6 medals there is no significant difference (p-value is roughly 1). The DNF proportion for those with 7 medals does start to differ from those with 4 medals or fewer, but the p-values are not significant. When we get to athletes who have 8 medals there is a significant difference in the proportion of DNFs all the way from those with 0 medals to those with 6 medals. However, the proportion of DNFs for those with 9 medals is not significantly different from any of the other categories. Finally, the DNF proportion for those athletes who already have 10 medals does not differ significantly from the athletes with any number of fewer medals.
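The idea behind a pairwise comparison of proportions can be sketched in plain Python. This is an illustration only, not the routine that produced the output discussed above: it uses a pooled two-proportion z-test without continuity correction and a Holm adjustment for multiple comparisons, both of which are assumptions. The counts in `groups` are hypothetical and are not taken from the Comrades data.

```python
import math
from itertools import combinations


def two_prop_p(x1, n1, x2, n2):
    """Two-sided p-value of a pooled two-proportion z-test (no continuity correction)."""
    pooled = (x1 + x2) / (n1 + n2)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n2))
    if se == 0:
        return 1.0
    z = (x1 / n1 - x2 / n2) / se
    return math.erfc(abs(z) / math.sqrt(2))


def holm(pvalues):
    """Holm step-down adjustment; returns adjusted p-values in the input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running = 0.0
    for rank, i in enumerate(order):
        running = max(running, (m - rank) * pvalues[i])
        adjusted[i] = min(1.0, running)
    return adjusted


# HYPOTHETICAL (DNF count, number of entries) per medals-held group.
groups = {0: (120, 1000), 1: (118, 1000), 8: (70, 1000)}

pairs = list(combinations(sorted(groups), 2))
raw = [two_prop_p(*groups[a], *groups[b]) for a, b in pairs]
for (a, b), p in zip(pairs, holm(raw)):
    print(f"{a} vs {b} medals: adjusted p = {p:.4f}")
```

The adjustment matters because many pairs are compared at once: without it, some differences would look significant purely by chance.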

So, no, it does not seem that runners with green numbers are more likely to bail (a conclusion that makes me personally very happy!). And good luck to the anonymous runner: I hope that you will be back in 2014 and that you will crack your green number!

Oh, and one last thing: as I mentioned before, the analysis above is based on the period 1984 to 2013. There are some serious issues with the data in the earlier years. Here is a breakdown of the number of runners in each of the categories across the years:

Certainly something is deeply wrong in 1984! In the early years it does not make any sense to discriminate between DNF and DNS since there were no independent records kept: we simply know whether or not an athlete finished. The introduction of the ChampionChip timing devices improved the quality of the data dramatically. These chips have been used by all Comrades Marathon runners since 1997 although there is a delayed effect on the quality of the data.

Despite these issues, the conclusions of the analysis above remain essentially unchanged if you simply lump the DNF and DNS data together (because we cannot always make a meaningful divide between them!).

## The Green Number Effect

Following up on a suggestion from my previous post, here are the statistics for medal count versus age.

## Age Distribution of Comrades Marathon Athletes

I can clearly remember watching the end of the 1989 Comrades Marathon on television and seeing Wally Hayward coming in just before the final gun, completing the epic race at the age of 80! I was in awe.

Since I have been delving into the Comrades Marathon data, this got me thinking about the typical age distribution of athletes taking part. The plot below indicates the ages of athletes who finished the race, going all the way back to 1984. You can clearly spot the two years when Wally Hayward ran (1988 and 1989). My data indicate that he was only 79 on the day of the 1989 Comrades Marathon, but I am not going to quibble over a year and I am more than happy to accept that he was 80!

## Kagi Chart Indicator

In addition to a range of data analysis services, Exegetic Analytics also implements algorithms for automated FOREX trading. I am currently developing an expert advisor (EA) for a client. The strategy was developed on the ProRealTime charting software using Kagi Charts. My client wants to automate the strategy and implement it in MQL on the MetaTrader platform. One snag: Kagi Charts are independent of time. Or, more accurately, they do not have a uniform time axis. Charts in MetaTrader are of the classical variety with a nice linear time axis. So my first problem was to implement something analogous to the Kagi Chart under MetaTrader.

## Medal Allocations at the Comrades Marathon

It is a bit of a mission to get the complete data set for this year’s Comrades Marathon. The full results are easily accessible, but come as an HTML file. Embedded in this file are links to the splits for individual athletes. So with a bit of scripting wizardry it is also possible to download the HTML files for each of the individual athletes. Parsing all of these yields the complete result set, which is the starting point for this analysis.

## Algorithmic Trading Status [May 2013]

Well, we can’t expect every month to be a good one. Last month’s results from my automated trading were pretty encouraging. Things during May 2013 were not quite as rosy. However, looking at the big picture, it’s not that bad. This month I will add in results from a second trading account. The monthly profit for each of these accounts since the beginning of the year is shown below:

| Month | Account A | Account B |
|---|---:|---:|
| 2013/01 | 121.82 | 57.17 |
| 2013/02 | -36.43 | -79.49 |
| 2013/03 | 153.59 | 98.67 |
| 2013/04 | 275.83 | 77.49 |
| 2013/05 | -228.26 | -54.19 |
| total | 286.55 | 58.89 |

Both accounts are running the same strategies but using slightly different parameters and risk settings. There was a single manual trade on Account B, which lost \$24.70; otherwise all trades were automated.

The details for each of the strategies running on both of the accounts are given below. Gemini #1 makes the most frequent trades on EURUSD, yielding a moderate profit. Gemini #2 trades a lot less frequently and was generally unprofitable this month. In fact, this strategy alone accounts for the fact that May 2013 was a losing month. Ernie produced small but consistent profits in most cases.

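| account | pair | strategy | chart | trades | profit | hours |
|---|---|---|---|---:|---:|---:|
| Account A | AUDUSD | ernie | H4 | 11 | 22.18 | 8.72 |
| Account A | EURUSD | ernie | H4 | 7 | 16.06 | 2.56 |
| Account A | EURUSD | gemini #1 | M5 | 45 | 193.10 | 7.35 |
| Account A | EURUSD | gemini #2 | M5 | 7 | -450.11 | 33.72 |
| Account A | EURUSD | gemini #1 | M15 | 3 | 0.60 | 4.65 |
| Account A | GBPUSD | ernie | H4 | 9 | -7.40 | 4.97 |
| Account B | AUDUSD | gemini #1 | H4 | 4 | 7.88 | 6.20 |
| Account B | AUDUSD | ernie | H4 | 8 | 16.88 | 5.08 |
| Account B | EURUSD | gemini #1 | M5 | 44 | 28.62 | 9.22 |
| Account B | EURUSD | gemini #2 | M5 | 7 | -111.89 | 2.39 |
| Account B | EURUSD | gemini #1 | H4 | 3 | 3.78 | 6.43 |
| Account B | EURUSD | gemini #2 | H4 | 1 | -23.10 | 1.55 |
| Account B | EURUSD | ernie | H4 | 5 | 9.67 | 2.58 |
| Account B | GBPUSD | gemini #1 | H4 | 4 | -8.12 | 7.51 |
| Account B | GBPUSD | gemini #2 | H4 | 1 | 0.70 | 5.06 |
| Account B | GBPUSD | ernie | H4 | 4 | 49.40 | 1.84 |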

I am reducing the risk on gemini #2 for June 2013, and increasing the risk on gemini #1 and ernie.

## Optimisation of Cable Morning Trade Strategy on the EURUSD

As promised, I optimised the Cable Morning Trade strategy on the EURUSD. I varied only the trading times (ostensibly the open and close of the market) to start with.

The entry time is on the x-axis and the exit time is on the y-axis. There is a clear preference for opening trades between 08:00 and 20:00 GMT. There is little structure along the y-axis indicating that the exit time is not too important. This suggests that the majority of trades reach their profit target rather than being forcibly closed at the end of the day.

## Analysis of Cable Morning Trade Strategy

A couple of years ago I implemented an automated trading algorithm for a strategy called the “Cable Morning Trade”. The basis of the strategy is the range of GBPUSD during the interval 05:00 to 09:00 London time. Two buy stop orders are placed 5 points above the highest high for this period; two sell stop orders are placed 5 points below the lowest low. All orders have a protective stop at 40 points. When either the buy or sell orders are filled, the other orders are cancelled. Of the filled orders, one exits at a profit equal to the stop loss, while the other is left to run until the close of the London session.
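The order placement rules above can be sketched in a few lines of Python. This is an illustration rather than the actual trading algorithm: the names `PendingOrder` and `morning_orders`, the point size of 0.0001 and the example range are all assumptions. Cancelling the opposite orders once one side fills, and closing the runner at the end of the London session, are left out.

```python
from dataclasses import dataclass
from typing import Optional

POINT = 0.0001  # ASSUMPTION: one "point" is 0.0001 in GBPUSD price terms


@dataclass
class PendingOrder:
    side: str                      # "buy" or "sell"
    entry: float                   # stop-entry price
    stop_loss: float               # protective stop
    take_profit: Optional[float]   # None means run until the session close


def morning_orders(range_high, range_low, offset_points=5, stop_points=40):
    """Build the four stop orders from the 05:00 to 09:00 range."""
    offset = offset_points * POINT
    stop = stop_points * POINT
    orders = []
    for side, base, sign in (("buy", range_high, 1), ("sell", range_low, -1)):
        entry = round(base + sign * offset, 5)
        stop_loss = round(entry - sign * stop, 5)
        target = round(entry + sign * stop, 5)  # profit equal to the stop distance
        orders.append(PendingOrder(side, entry, stop_loss, target))
        orders.append(PendingOrder(side, entry, stop_loss, None))
    return orders


# HYPOTHETICAL morning range, for illustration only.
for order in morning_orders(range_high=1.5500, range_low=1.5450):
    print(order)
```

Each side gets two orders with the same entry and stop: one with a fixed target and one without, matching the description of one position exiting at a profit equal to the stop loss while the other is left to run.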