Tuesday, 17 September 2013

Algorithm Efficiency for Nested For/If Loops in R

This is my first post on SO, so I apologize if this question has been
asked elsewhere; I can't figure out how to phrase it, which makes it
difficult to search for.
I'm working with a data frame that contains a factor variable called
PrimaryType. It has about 15 levels, and I want to create a new binary
(0/1) variable for each level so that I can perform statistical analyses
on the individual levels. Here is the code I'm using:
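df <- read.csv('Data/ChiCrime11_13.txt', header=TRUE, sep='\t')
# For every row, set each level's column to 1 if the row has that level, else 0
for (i in 1:nrow(df)) {
  for (crimes in levels(df$PrimaryType)) {
    # Compare only row i; without [i] the whole column is compared at once
    if (df$PrimaryType[i] == crimes) {
      df[i, crimes] <- 1
    } else {
      df[i, crimes] <- 0
    }
  }
}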
The problem is that my data frame has over 900,000 observations, so this
process is clearly going to take a LOT of time to run (roughly
900,000 x 15, or about 13.5 million, iterations, I believe). This brings
me to my question: Is there a way to make this more efficient?
Any thoughts/advice would be appreciated. Thanks!
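
For reference, here is a minimal sketch of one way the row-by-row loop could be avoided. It is not a definitive answer. It assumes PrimaryType really is a factor, and it uses a small made-up data frame in place of the real file, so the level names below (THEFT, BATTERY, ARSON) are purely illustrative. The idea is that comparing the whole column against one level yields an entire 0/1 column in a single vectorized step, so the only loop left is over the levels.

```r
# Toy stand-in for the crime data (hypothetical values, not the real file).
df <- data.frame(PrimaryType = factor(c("THEFT", "BATTERY", "THEFT", "ARSON")))

# One vectorized comparison per level: each pass fills a whole 0/1 column,
# so the loop body runs once per level instead of once per row and level.
for (crime in levels(df$PrimaryType)) {
  df[[crime]] <- as.integer(df$PrimaryType == crime)
}

# Alternative: model.matrix() builds all indicator columns in one call.
# The "- 1" drops the intercept so that every level gets its own column.
dummies <- model.matrix(~ PrimaryType - 1, data = df)
```

With about 15 levels, the loop in this sketch runs about 15 times regardless of how many rows the data frame has. The columns produced by model.matrix() are named by prefixing the variable name to each level (for example PrimaryTypeTHEFT), so they would need renaming if the bare level names are wanted.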
