Casting a Wide (and Sparse) Matrix in R
I routinely use cast() from the reshape2 package as part of my data munging workflow. Recently I’ve noticed that the data frames I’ve been casting are often extremely sparse. Stashing these in a dense data structure just feels wasteful. And the dismal drone of page thrashing is unpleasant.
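As a rough illustration of the idea (not necessarily the approach the post itself takes), long-format data can be cast straight into a sparse matrix with sparseMatrix() from the Matrix package. The data frame and its column names (row_id, col_id, value) are made up for this sketch.

```r
library(Matrix)

# Hypothetical long-format data: one row per non-empty cell.
long <- data.frame(
  row_id = c("a", "a", "b", "c"),
  col_id = c("x", "z", "y", "z"),
  value  = c(1, 2, 3, 4),
  stringsAsFactors = FALSE
)

# Factors give a stable mapping from labels to integer indices.
rows <- factor(long$row_id)
cols <- factor(long$col_id)

# Only the non-empty cells are stored. Note that sparseMatrix() sums the
# values of any duplicated (i, j) pairs.
wide <- sparseMatrix(
  i = as.integer(rows),
  j = as.integer(cols),
  x = long$value,
  dims = c(nlevels(rows), nlevels(cols)),
  dimnames = list(levels(rows), levels(cols))
)

print(wide)
```

Cells that never appear in the long data take up no storage at all, which is where the saving over a dense cast comes from.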
Kaggle: Walmart Trip Type Classification
Walmart Trip Type Classification was my first real foray into the world of Kaggle and I’m hooked. I previously dabbled in What’s Cooking but that was as part of a team and the team didn’t work out particularly well. As a learning experience the competition was second to none. My final entry put me at position 155 out of 1061 entries which, although not a stellar performance by any means, is just inside the top 15% and I’m pretty happy with that. Below are a few notes on the competition.
Installing MongoDB 3.2 on Windows 7
Review: Mastering Python Scientific Computing
Review: Learning Shiny
I was asked to review Learning Shiny (Hernán G. Resnizky, Packt Publishing, 2015). I found the book to be useful, motivating and generally easy to read. I’d already spent some time dabbling with Shiny, but the book helped me graduate from paddling in the shallows to wading out into the Shiny sea.
Using Checksum to Guess Message Length: Not a Good Idea!
A question posed by one of my colleagues: can a checksum be used to guess message length? My immediate response was negative and, as it turns out, a simple simulation supported this knee-jerk reaction.
For a moment this morning I was regretting the fact that R doesn’t have a goto statement, but then…
Making Sense of Logarithmic Loss
Installing XGBoost on Ubuntu
2015 Data Science Salary Survey
The recently published 2015 Data Science Salary Survey conducted by O’Reilly takes a look at the salaries received, tools used and other interesting facts about Data Scientists around the World. It’s based on a survey of over 600 respondents from a variety of industries. The entire report is well worth a read, but I’ve picked out some highlights below.
Evolution of First Names: Unisex Names and Nicknames
Evolution of First Names: Fashionable and Popular Names
Last week I took a high level look at the trends in children’s names over the last century. Today I’ll dig a little deeper and examine the ebb and flow in popularity of some specific names.
Visualising James Bond movies
Graph from Sparse Adjacency Matrix
I spent a decent chunk of my morning trying to figure out how to construct a sparse adjacency matrix for use with graph.adjacency(). I’d have thought that this would be rather straightforward, but I tripped over a few subtle issues with the Matrix package. My biggest problem (which in retrospect seems rather trivial) was that elements in my adjacency matrix were occupied by the pipe symbol.
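A minimal sketch of one way this can play out, assuming the pipe symbols came from a pattern matrix: when sparseMatrix() is called without values, it returns a pattern matrix whose occupied cells print as "|". Supplying x = 1 gives a numeric sparse matrix instead. The edge list below is invented for the example, and graph_from_adjacency_matrix() is the current igraph name for graph.adjacency().

```r
library(Matrix)
library(igraph)

# Hypothetical directed edge list on 4 vertices.
edges <- data.frame(from = c(1, 1, 2, 3), to = c(2, 3, 3, 4))
n <- 4

# Without x the result is a pattern matrix (class "ngCMatrix"),
# which prints its occupied cells as "|".
adj_pattern <- sparseMatrix(i = edges$from, j = edges$to, dims = c(n, n))
print(class(adj_pattern))

# With x = 1 the result is a numeric sparse matrix (class "dgCMatrix").
adj_numeric <- sparseMatrix(i = edges$from, j = edges$to, x = 1, dims = c(n, n))

# Build the graph from the numeric sparse adjacency matrix.
g <- graph_from_adjacency_matrix(adj_numeric, mode = "directed")
print(g)
```

Whether this matches the exact issue described in the post is an assumption; it is simply the most common way to end up with pipe symbols in a Matrix object.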
Evolution of First Names: Changes over the Last Century
In light of recent developments, a bit of work that I did almost two years ago has become rather relevant.
LIBOR and Bond Yields
I’ve just been looking at the historical relationship between the London Interbank Offered Rate (LIBOR) and government bond yields. LIBOR data can be found at Quandl and comes in CSV format, so it’s pretty simple to digest. The bond data can be sourced from the US Department of the Treasury. It comes as XML and requires a little more work.
Guy Kawasaki on Personal Branding
Kelsey Jones of Search Engine Journal interviews Guy Kawasaki of Canva. The key take-home message is that maintaining a personal brand is vital even if you are permanently employed. Specifically, it’s important to keep a visible record of who you have worked for and your personal successes.
#MonthOfJulia Day 38: Imaging
#MonthOfJulia Day 37: Fourier Techniques
Data Scientists: Respect in the Workplace?
Data Scientists are often among the best educated and most experienced on a team. Are you getting the respect you deserve?