It knows where I live…

I recently came across an intriguing post by fellow Jesuan James Black, about making maps out of Google location data. This is the result.

Google is infamous for its data-harvesting practices. It really is quite amazing how much it knows about you. If you have a smartphone, the chances are Google knows almost exactly where you are, almost all of the time.

There is a service called ‘Google Takeout’, which allows you to download all the data Google has collected about you. I did this. If you have clicked ‘yes’ at all the appropriate moments, Google has a record of: your web browser bookmarks, your contacts, your ‘Hangouts’ (text messages), your YouTube history, your emails, your Google documents, your Google+ social history and photographs, your calendar, and your location. It’s not doing anything wrong here – I’ve agreed to each of these pieces of information being stored by Google, usually by skipping over long ‘terms and conditions’ blurbs to click ‘yes’ and get to where I want to be.

Google uses all this information to ‘improve your Google experience’ – offering you advice about things you might be interested in (i.e. adverts) based on your interests, and where you are. Clever. So with this wealth of data at my fingertips, I thought I’d have a look at what I’d been up to.

The ‘location’ data that I got from Google is an astonishing 152379 measurements… since the start of November?!! It contains regular updates of latitude, longitude (usually accurate to around 25m) and altitude, an estimate of which direction and how fast I’m moving (if I’m moving), and (most amazingly) a guess at whether I’m sitting still, walking, cycling, or in a vehicle. Incredible. It only had data from the last few months (since I got a shiny new phone), but… well… wow! And quite scary.

So after half an hour of fiddling around in R (a statistical computing language), this is what I got:

Where I’ve been since 20th March 2013, according to Google. It’s definitely missing a few spots, but it’s worryingly comprehensive! A conference in Barcelona, and some trips around the UK.

To get the full impact of this kind of data, we have to look a little closer: let’s zoom in to Cambridge, where I spend most of my time.

My life in Cambridge… I can see home, work and all the places I spend time around town. Red dots are records of my location. Amazing!

And a whole other layer of activity can be revealed by looking at what I was doing in all of these places:

Things get really clear here: you can see that I spend most of my time staying still (black dots) at home and around my department. The spread of points gets a bit wider at the department, partly because I move around it quite a bit, but also, I imagine, because the walls are pretty thick, which reduces the GPS accuracy. My commute to and from work gets really obvious, and I can tie each of the black blobs to an event.

And putting this all together:

Purple dots are ‘uncategorised’ location markers – i.e. Google wasn’t sure what I was doing. In this map, I can link each of the black ‘still’ blobs to specific events: a dinner and lunch at Jesus College, a dinner at Clare College, a trip to the pub to celebrate a friend passing his PhD viva, dinner out with two sets of other friends, 3 pub visits, watching the fireworks, visiting friends in their houses, and doing my Christmas shopping! You can get even more detail if you add time into the equation – you can see exactly when I was at all of these places. And all of this is available for anyone in Google to look at… hmm…

And finally, here is a day:

Follow the dot! I get to work, go to coffee around 11:20 (watch the dot move!), go out to get some lunch at 1:30pm, head back and eat it in the coffee room, then back to the office at around 2:20pm. At around 6pm I go briefly to the zoology department, then to The Mill pub to meet some friends. After the pub, I have dinner at Clare College, a drink in Clare MCR, then home via Magdalene St. You really can track my movements to the minute.

I’m going to steer clear of the morals of this kind of information hoarding, and leave this as a fun and revealing exercise. I have always been aware that Google have been keeping an eye on me, but the extent of it is staggering!

The Nitty-Gritty

For the geeky folks among you, this is how I did it:

NB for complete beginners (after LinwoodC3’s comment):

  1. Install R - follow the ‘Getting Started’ instructions.
  2. Install RStudio, an excellent, free interactive environment for R.
  3. In RStudio, install the packages you need to run the scripts below, by copying and pasting the following:

install.packages("ggmap")
install.packages("ggplot2")
install.packages("jsonlite")
install.packages("plyr")

And on to the good stuff…

1) Download the .json location data from Google Takeout.

2) Import the data into R, get it into a useful format, and plot it. I haven’t included the code for the animated gif, as it gets a bit more involved (and to be perfectly honest, my code gets all messy and horrible because I got tired :P). In short: make a function to produce a series of images, then stitch them all into a .gif using ImageMagick (not in R). If you’re interested, just ask.
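Before the full script, here is a toy version of the ‘useful format’ step, as a rough sketch only. The three records are made up (they are not from my data), and in_box is a hypothetical helper name; the field names latitudeE7 and longitudeE7 and the Cambridge bounding box are the ones used in the script below. It uses base R only, so it should run without any of the packages above.

```r
# Three made-up records using the Takeout field names.
# Coordinates are stored as integers: degrees multiplied by 1e7.
locs_toy <- data.frame(
  latitudeE7  = c(522053000L, 522061000L, 515074000L),
  longitudeE7 = c(1218000L, 1225000L, -1278000L)
)

# Divide by 1e7 to get ordinary decimal degrees.
toy <- data.frame(
  lat = locs_toy$latitudeE7 / 1e7,
  lon = locs_toy$longitudeE7 / 1e7
)

# Hypothetical helper: keep only the points inside a lon/lat box.
# This is the same idea as clipping the data to a map's bounding box.
in_box <- function(df, lon_range, lat_range) {
  keep <- df$lon > lon_range[1] & df$lon < lon_range[2] &
          df$lat > lat_range[1] & df$lat < lat_range[2]
  df[keep, , drop = FALSE]
}

# Central Cambridge, using the same limits as the 'clip' object below.
cam_toy <- in_box(toy, lon_range = c(0.1, 0.15), lat_range = c(52.19, 52.23))
print(cam_toy)
```

The first two made-up points fall inside the Cambridge box and the third (roughly central London) is dropped, which is all the clipping step in the real script is doing.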


# Load in the raw data from the JSON file ----------------------------------------------------
require(jsonlite)
require(plyr)

raw = fromJSON('/Path/To/LocationHistory.json')

# Get the 'locations' part of the list
locs = raw$locations

# these are all the columns that it contains...
names(locs)
# they're in various formats...
lapply(locs,class)

# Get columns into useful formats -----------------------------------------

ldf = data.frame(t=rep(0,nrow(locs)))

# time is in POSIX * 1000 (milliseconds) format, convert it to useful scale...
ldf$t = as.numeric(locs$timestampMs)/1000
class(ldf$t) = 'POSIXct'

# lat/lon are xE7, convert them to usable numbers...
ldf$lat = as.numeric(locs$latitudeE7/1E7)
ldf$lon = as.numeric(locs$longitudeE7/1E7)

# Accuracy doesn't need changing.
ldf$accuracy = locs$accuracy

# Activity guesses (it can tell when you're on a bike?!) are in a list... we can unpack these lists to get the most likely activity for each location (takes a while, depending on the size of your dataset).

# get the most likely activity type and confidence for each time point.
# ldply stacks the one-row data.frames into a single data.frame with columns 'activity' and 'confidence'.
act = ldply(locs$activitys, function(f) {
 if(is.null(f[[1]])) {
  # no activity guess recorded for this location
  data.frame(activity=NA_character_,confidence=NA_real_,stringsAsFactors=FALSE)
 } else {
  # first (most likely) activity type and its confidence
  data.frame(activity=f[[2]][[1]][[1]][1],confidence=f[[2]][[1]][[2]][1],stringsAsFactors=FALSE)
 }
},.progress="text")

# combine activity data with the main dataset
ldf$activity = as.character(act[,1])
ldf$confidence = as.numeric(act[,2])

# Velocity, altitude and heading need no alteration:
ldf$velocity = locs$velocity
ldf$altitude = locs$altitude
ldf$heading = locs$heading

# We now have a data.frame 'ldf', which contains all the location data google has on us!

# And plot it... ----------------------------------------

require(ggplot2)
require(ggmap)

EU = get_map(c(0,48),5,source='google')

ggmap(EU) + geom_point(data=ldf,aes(lon,lat),colour='red')

# Let's just look at the UK stuff...

# first, define a quick function to clip the data to the current map
mapclip = function(df,map) {
 coord = attr(map,'bb')
 return(subset(df,lon>coord$ll.lon & lon<coord$ur.lon & lat>coord$ll.lat & lat<coord$ur.lat))
}

UK = get_map('Oxford, UK',7,scale=2)
ldf.uk = mapclip(ldf,UK)

ggmap(UK) + geom_point(data=ldf.uk,aes(x=lon,y=lat),colour='red',size=2)

# Let's have a look at Cambridge...

cam = get_map('Cambridge, UK',13,scale=4)
clip = coord_map(projection='mercator',xlim=c(0.1,0.15),ylim=c(52.19,52.23))
ldf.cam = mapclip(ldf,cam)
ggmap(cam) + geom_point(data=ldf.cam,aes(x=lon,y=lat),colour='red',size=1,alpha=0.5) + clip

# And what do I do in cambridge?
ldf.cam.act = subset(ldf.cam,activity %in% c("inVehicle","onBicycle","onFoot","still"))
ggmap(cam) + geom_point(data=ldf.cam.act,aes(x=lon,y=lat,colour=activity),size=2) + scale_colour_manual(values=c('red','orange','yellow','black')) + clip

# A little closer? And add in the non-active markers, to get more of a trend.
camclose = get_map('Cambridge, UK',15,scale=4)
ldf.cam.act.close = mapclip(ldf.cam.act,camclose)
ldf.cam.close = mapclip(ldf,camclose)
ggmap(camclose) + geom_point(data=ldf.cam.close,aes(lon,lat),colour='purple',alpha=0.6) + geom_point(data=ldf.cam.act.close,aes(x=lon,y=lat,colour=activity),size=2) + scale_colour_manual(values=c('red','orange','yellow','black'))
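For the animated gif mentioned in step 2, the general idea (one image per time step, then ImageMagick to stitch them together) could be sketched roughly like this. It is not my original (messy) code: frame_ends, write_frames and gif_command are hypothetical names, and the sketch assumes ImageMagick's `convert` command is on your path (newer ImageMagick versions call it `magick` instead).

```r
# Row indices at which each frame ends, spread evenly through the data.
frame_ends <- function(n_rows, n_frames) {
  unique(round(seq(1, n_rows, length.out = min(n_frames, n_rows))))
}

# Write one PNG per frame. 'draw_frame' is any function that plots the
# rows seen so far; for a ggmap plot it has to print() the plot itself.
write_frames <- function(df, draw_frame, n_frames = 50, dir = tempdir()) {
  ends <- frame_ends(nrow(df), n_frames)
  files <- file.path(dir, sprintf("frame_%04d.png", seq_along(ends)))
  for (k in seq_along(ends)) {
    png(files[k], width = 600, height = 600)
    draw_frame(df[seq_len(ends[k]), , drop = FALSE])
    dev.off()
  }
  files
}

# Build the ImageMagick command that stitches the frames into a gif.
# 'delay' is in hundredths of a second per frame; '-loop 0' loops forever.
gif_command <- function(files, out = "day.gif", delay = 10) {
  paste("convert -delay", delay, "-loop 0",
        paste(shQuote(files), collapse = " "), shQuote(out))
}

# Example use with the objects from the script above (not run here):
# one.day = subset(ldf.cam.close, format(t, "%Y-%m-%d") == "2013-12-01")
# files = write_frames(one.day, function(d) {
#   print(ggmap(camclose) + geom_point(data = d, aes(lon, lat), colour = 'red'))
# })
# system(gif_command(files))
```

The date in the commented example is just a placeholder. Keeping the command-building separate from the plotting means you can check the command by eye before running it with system().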

19 thoughts on “It knows where I live…”

  1. Pingback: M-A-O-L » It knows where I live…

      • what a letdown, omniscient google =D
        no, it’s what I thought, just wanted to make sure that it’s not hidden somewhere and I’d simply overlooked it. Thanks for the reply!

  2. This is awesome!! I came across this post in the GooglePlus R community. I’m a complete novice to R and just exploring but enjoyed this!!! Trying it myself. One unmentioned step is to make sure you have the jsonlite and plyr packages installed.

    Below are the absolute first steps performed by any person trying this:

    1. Make sure R is installed on your machine; instructions here http://cran.us.r-project.org/doc/manuals/r-release/R-admin.html#Obtaining-R

    2. I use RStudio which is a FREE easy to understand environment for R; downloaded here http://www.rstudio.com/

    3. Open RStudio

    4. #install the packages required; type what you see below and wait until you get the md5 sums checked notification
    install.packages("ggmap")
    install.packages("ggplot2")
    install.packages("jsonlite")
    install.packages("plyr")

    Then start from “The Nitty Gritty” instructions. There may be more, but this is what I’ve done before trying it!!! Excited to see where I’ve been and what Google has on me!!!

  3. Loving this post, thanks for sharing. I latched onto it via the Google+ R group too fyi. I was a google latitude user for ages, sharing my fine grain location with my family etc (was very handy enabling my wife to check where I am without her having to call me and ask) until they shut it down and apparently moved the facility to google+. I never successfully got it working since that migration however and after a short, irritated burst of faffing, gave up. I really miss that service. Certainly puts the ‘to date’ Ed Snowden revelations into perspective. Imagine the furore if people knew how much they really know about our prole behaviour!

  4. Yes Richard, I was disappointed when they shut down Latitude, too. As it was purely opt-in and they had seemingly decent controls in the API too, I didn’t have a problem with it. It’s also interesting to see where and how you spend your time, although an employer poking into that is a creepy proposition.

  5. This encouraged me to look into Google+ location sharing again and it’s much clearer now, to the point that it now works for me. Pretty similar to the old latitude actually so now it’s integrated with google+, you could argue ‘better’.

  6. Oscar, I’m interested in learning how to use the timestamp in the json file. You said to ask if we wanted to know. I would appreciate any help you could provide. Thank you!

    • Of course! I’ll help as much as I can.

      I’m not quite sure what you’re after though… A general understanding of the POSIX time format? Or something more specific?

      ‘POSIX time’ is the number of seconds that have passed since 00:00:00 on 1st Jan 1970 (http://en.m.wikipedia.org/wiki/Unix_time). Once you have that number, you basically just tell the computer how you want it presented (e.g. with the strptime function, http://rfunction.com/archives/1912). To use this function, you first of all have to tell the computer that the number you’re giving it is in the POSIX format (class(yourdata) = 'POSIXct'), then you can get a legible time and date from the POSIX number using strptime (see link above). Does that help?

      You can see near the start of my code that I convert the json timestampMs data into normal POSIX format (divide by 1000; the line starting ldf$t), and then it’s all fairly plain sailing from there.

      Anything more specific, ask away!
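As a small worked example of the timestamp handling described above (a sketch with two made-up timestamps, not values from my data; ms_to_time is a hypothetical helper name):

```r
# Convert Takeout-style millisecond timestamps (stored as text) to POSIXct.
ms_to_time <- function(timestamp_ms, tz = "UTC") {
  as.POSIXct(as.numeric(timestamp_ms) / 1000, origin = "1970-01-01", tz = tz)
}

# Two made-up timestamps, one hour apart.
t <- ms_to_time(c("1385856000000", "1385859600000"))

# A legible date and time
print(format(t, "%Y-%m-%d %H:%M:%S"))

# Individual parts, e.g. the hour of the day as a number
print(as.integer(format(t, "%H")))

# The gap between two records, in minutes
print(difftime(t[2], t[1], units = "mins"))
```

Here format() turns a time into text in whatever layout you ask for, while strptime() goes the other way and parses text into a time. The tz argument only changes how the time is displayed, not the underlying number of seconds.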

  7. in case anybody feels like making some gifs of their data, here’s a function using the animation package. I recommend limiting the amount of the data you use with the fraction argument. also check out the html export function of the package, does a nice job as well and also adds some controls.

    ### gif
    makeGIF <- function(data, map, fraction = 0.1, name = "tracking.gif"){
      library(animation)
      library(ggmap)
      # write the gif to the current working directory
      ani.options(outdir = getwd())
      # number of frames: one per data row, up to the chosen fraction of the data
      imax = round(nrow(data) * fraction)
      print(paste("number of frames: ",imax))
      # gif loop
      saveGIF({
        # set up text progress bar
        pb <- txtProgressBar(min = 0, max = imax, style = 3)
        for(i in 2:imax){
          # plotting, on top of the map passed in as 'map'
          p <- ggmap(map) +
            # current position
            geom_point(data = data[i,], aes(x = lon, y = lat, colour = activity)) +
            # old positions
            geom_point(data = data[1:i,], aes(x = lon, y = lat, colour = activity), alpha = 0.2) +
            scale_fill_manual(values=scales::hue_pal()(4))
          print(p)
          # update progress bar
          setTxtProgressBar(pb, i)
        }
        close(pb)
      # interval is the time between frames, in seconds
      },movie.name = name, interval = 0.1)
    }
