In the previous installment we sucked some data from the National Health and Nutrition Examination Survey into R and did some preliminary work: selecting only the fields of interest, renaming columns and removing missing data. Now we are going to play with some categorical data.
There is already one categorical field in the data representing gender. However, the labels are not ideal:
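The data frame from the previous installment is not reproduced here, so the sketch below uses a small made-up stand-in. The name `nhanes`, its column names and all of its values are assumptions for illustration only. It shows roughly what the raw gender coding looks like.

```r
# Hypothetical stand-in for the cleaned survey data
# (name, columns and values are assumptions, not the real NHANES records)
nhanes <- data.frame(
  id     = 1:6,
  gender = c(1, 2, 2, 1, 2, 1),
  age    = c(34, 61, 80, 19, 45, 72),
  BMI    = c(22.5, 31.2, 27.8, 17.9, 24.1, 29.6)
)

# The gender column holds bare numeric codes, which say nothing on their own
head(nhanes$gender)
table(nhanes$gender)
```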
The excellent codebook accompanying the data reveals that 1 should be interpreted as male and 2 as female. We can make things a little more transparent by converting this field to a factor with appropriate labels.
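A minimal sketch of that conversion, using a made-up vector of codes in place of the real column (the values and the label text "male"/"female" are assumptions):

```r
# Hypothetical gender codes as they might appear in the raw data
gender <- c(1, 2, 2, 1, 2, 1)

# Map code 1 to "male" and code 2 to "female", as described in the codebook
gender <- factor(gender, levels = c(1, 2), labels = c("male", "female"))

# The tabulation is now self-explanatory
table(gender)
```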
That’s better! Next we introduce a new categorical field which indicates age group. The boundaries between these groups are somewhat arbitrary (and might be rather politically incorrect), but they more or less make sense. Note that respondents above the age of 80 had their ages simply coded as 80.
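One way to build such a field is with `cut()`. The break points and group names below are purely illustrative assumptions, not necessarily the ones used in the analysis, and the ages are made up.

```r
# Hypothetical ages; 80 stands in for the capped value
age <- c(8, 19, 34, 45, 61, 72, 80)

# Illustrative boundaries and labels (assumptions).
# right = FALSE makes each interval closed on the left, e.g. [13, 20).
age.group <- cut(
  age,
  breaks = c(0, 13, 20, 40, 60, Inf),
  labels = c("child", "teen", "young adult", "middle aged", "senior"),
  right  = FALSE
)

table(age.group)
```

Using `Inf` as the final break means the capped age of 80 falls into the last group without any special handling.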
Finally we introduce BMI categories. They are rather broad, but will suffice for our analysis.
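The same `cut()` approach works here. This sketch assumes the commonly used WHO cut-offs (18.5, 25 and 30); the cut-offs in the actual analysis may differ, and the BMI values are made up.

```r
# Hypothetical BMI values
BMI <- c(22.5, 31.2, 27.8, 17.9, 24.1, 29.6)

# Assumed cut-offs: <18.5 underweight, 18.5-25 normal, 25-30 overweight, >=30 obese.
# ordered_result = TRUE records that the categories have a natural order.
BMI.category <- cut(
  BMI,
  breaks = c(0, 18.5, 25, 30, Inf),
  labels = c("underweight", "normal", "overweight", "obese"),
  right  = FALSE,
  ordered_result = TRUE
)

table(BMI.category)
```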