

Types are a perennial headache in social science computing. One of the reasons I like Perl is that it is so tolerant of variables changing type dynamically according to their context. That saves a lot of time when scripting and is also much easier to explain to students. It’s pretty clear which one of these is easier to explain to a beginner:
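To make the contrast concrete, here is a minimal sketch of the strict side, assuming Python as the comparison language (the variable names and the label text are hypothetical, chosen only for illustration). In Perl, `"Respondents: " . 5` simply works, because the concatenation operator turns the number into text on its own. In Python the conversion has to be spelled out:

```python
# Hypothetical example: build a text label from a string and a number.
count = 5  # an integer, e.g. a number of survey respondents

# Python will not join text and a number implicitly; this raises TypeError.
try:
    label = "Respondents: " + count
except TypeError as err:
    label = None
    error_message = str(err)  # keep the message so it can be inspected

# The number has to be converted explicitly before concatenation.
label = "Respondents: " + str(count)
print(label)
```

For a beginner, the `str()` call is one more concept to absorb before even the simplest script will run.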

Yesterday, however, I ran into an even more annoying typing problem in R. I needed to export a large dataset (600,000 obs x 100 vars) which I only had as an .RData file. Using write.table() on the whole thing quickly hits R’s memory limit. So I set up a simple loop to divide up the file on the basis of area_id, a variable with around 40 unique values:

What do I get? 40 files with nothing in them. subset() clearly isn’t working. A closer look at the last value that the loop variable country took turns this up:

What’s happening? R doesn’t think country or even country[1] is equal to 6. But when I assign country[1] to another variable (without making any explicit attempt to change types), everything works. It’s not really clear to me why that should be. This sort of typing difficulty is one of the things that puts beginners off, and I think it’s especially a shame in R, since the language should be oriented towards the needs of people writing small scripts.

November 20th, 2012 | Perl, Programming, Python, R, Social Science Computing