I recently came across an intriguing post by fellow Jesuan James Black, about making maps out of Google location data. This is the result.
Google is infamous for its data-harvesting practices. It really is quite amazing how much it knows about you. If you have a smartphone, the chances are Google knows almost exactly where you are, almost all of the time.
There is a service called ‘Google Takeout’, which allows you to download all the data Google has collected about you. I did this. If you have clicked ‘yes’ at all the appropriate moments, Google has a record of: your web browser bookmarks, your contacts, your ‘Hangouts’ (text messages), your YouTube history, your emails, your Google documents, your Google+ social history and photographs, your calendar, and your location. It’s not doing anything wrong here – I’ve agreed to each of these pieces of information being stored by Google, usually by skipping over long ‘terms and conditions’ blurbs to click ‘yes’ and get to where I want to be.
Google uses all this information to ‘improve your Google experience’ – offering you advice about things you might be interested in (i.e. adverts) based on your interests, and where you are. Clever. So with this wealth of data at my fingertips, I thought I’d have a look at what I’d been up to.
The ‘location’ data that I got from Google is an astonishing 152379 measurements… since the start of November?!! It contains regular updates of latitude, longitude (usually accurate to around 25m) and altitude, an estimate of which direction and how fast I’m moving (if I’m moving), and (most amazingly) a guess at whether I’m sitting still, walking, cycling, or in a vehicle. Incredible. It only had data from the last few months (since I got a shiny new phone), but… well… wow! And quite scary.
So after half an hour of fiddling around in R (a statistical computing language), this is what I got:
To get the full impact of this kind of data, we have to look a little closer: let’s zoom in to Cambridge, where I spend most of my time.
And a whole other layer of activity can be revealed by looking at what I was doing in all of these places:
And putting this all together:
And finally, here is a day:
I’m going to steer clear of the morals of this kind of information hoarding, and leave this as a fun, and revealing exercise. I have always been aware that Google have been keeping an eye on me, but the extent of it is staggering!
The Nitty-Gritty
For the geeky folks among you, this is how I did it:
NB for complete beginners (after LinwoodC3’s comment):
- Install R – follow the ‘Getting Started’ instructions.
- Install RStudio, an excellent, free interactive environment for R.
- In RStudio, install the packages you need to run the scripts below, by copying and pasting the following:
install.packages("ggmap")
install.packages("ggplot2")
install.packages("jsonlite")
install.packages("plyr")
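If some of these packages are already on your machine, a small check avoids reinstalling them. This is only a sketch, assuming the four packages listed above; `missing_packages` is a hypothetical helper name, not something from R or any package.

```r
# Packages used by the scripts in this post
needed <- c("ggmap", "ggplot2", "jsonlite", "plyr")

# Hypothetical helper: return the packages in 'pkgs' that are not yet installed
missing_packages <- function(pkgs, installed = rownames(installed.packages())) {
  setdiff(pkgs, installed)
}

to_install <- missing_packages(needed)
# Only install when run interactively, so sourcing the file has no side effects
if (interactive() && length(to_install) > 0) install.packages(to_install)
```

Running it a second time should do nothing, because `to_install` will be empty.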
And on to the good stuff…
1) Download the .json location data from Google Takeout.
2) Import the data into R, get it into a useful format, and plot it. I haven’t included the code for the animated gif, as it gets a bit more involved (and to be perfectly honest, my code gets all messy and horrible because I got tired :P). In short: make a function to produce a series of images, then stitch them all into a .gif using ImageMagick (not in R). If you’re interested, just ask.
# Load in the raw data from the JSON file ----------------------------------------------------
require(jsonlite)
require(plyr)
raw = fromJSON('/Path/To/LocationHistory.json')
# Get the 'locations' part of the list
locs = raw$locations
# these are all the columns that it contains...
names(locs)
# they're in various formats...
lapply(locs,class)
# Get columns into useful formats -----------------------------------------
ldf = data.frame(t=rep(0,nrow(locs)))
# time is in POSIX * 1000 (milliseconds) format, convert it to useful scale...
ldf$t = as.numeric(locs$timestampMs)/1000
class(ldf$t) = 'POSIXct'
# lat/lon are xE7, convert them to usable numbers...
ldf$lat = as.numeric(locs$latitudeE7/1E7)
ldf$lon = as.numeric(locs$longitudeE7/1E7)
# Accuracy doesn't need changing.
ldf$accuracy = locs$accuracy
# Activity guesses (it can tell when you're on a bike?!) are in a list... we can unpack these lists to get the most likely activity for each location (takes a while, depending on the size of your dataset).
# get the most likely activity type and confidence for each time point.
act = laply(locs$activitys, function(f) {
  if(is.null(f[[1]])) data.frame(activity=NA,confidence=NA,stringsAsFactors=F)
  else data.frame(activity=f[[2]][[1]][[1]][1],confidence=f[[2]][[1]][[2]][1],stringsAsFactors=F)
},.progress="text")
# combine activity data with the main dataset
ldf$activity = as.character(act[,1])
ldf$confidence = as.numeric(act[,2])
# Velocity, altitude and heading need no alteration:
ldf$velocity = locs$velocity
ldf$altitude = locs$altitude
ldf$heading = locs$heading
# We now have a data.frame 'ldf', which contains all the location data google has on us!
# And plot it... ----------------------------------------
require(ggplot2)
require(ggmap)
EU = get_map(c(0,48),5,source='google')
ggmap(EU) + geom_point(data=ldf,aes(lon,lat),colour='red')
# Let's just look at the UK stuff...
# first, define a quick function to clip the data to the current map
mapclip = function(df,map) {
  coord = attr(map,'bb')
  return(subset(df,lon>coord$ll.lon & lon<coord$ur.lon & lat>coord$ll.lat & lat<coord$ur.lat))
}
UK = get_map('Oxford, UK',7,scale=2)
ldf.uk = mapclip(ldf,UK)
ggmap(UK) + geom_point(data=ldf.uk,aes(x=lon,y=lat),colour='red',size=2)
# Let's have a look at Cambridge...
cam = get_map('Cambridge, UK',13,scale=4)
clip = coord_map(projection='mercator',xlim=c(0.1,0.15),ylim=c(52.19,52.23))
ldf.cam = mapclip(ldf,cam)
ggmap(cam) + geom_point(data=ldf.cam,aes(x=lon,y=lat),colour='red',size=1,alpha=0.5) + clip
# And what do I do in Cambridge?
ldf.cam.act = subset(ldf.cam,activity %in% c("inVehicle","onBicycle","onFoot","still"))
ggmap(cam) + geom_point(data=ldf.cam.act,aes(x=lon,y=lat,colour=activity),size=2) + scale_colour_manual(values=c('red','orange','yellow','black')) + clip
# A little closer? And add in the non-active markers, to get more of a trend.
camclose = get_map('Cambridge, UK',15,scale=4)
ldf.cam.act.close = mapclip(ldf.cam.act,camclose)
ldf.cam.close = mapclip(ldf,camclose)
ggmap(camclose) + geom_point(data=ldf.cam.close,aes(lon,lat),colour='purple',alpha=0.6) + geom_point(data=ldf.cam.act.close,aes(x=lon,y=lat,colour=activity),size=2) + scale_colour_manual(values=c('red','orange','yellow','black'))
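The animated gif mentioned in step 2 follows the recipe described there: write one image per time step, then stitch the images together outside R. What follows is a rough sketch of that idea, not the code used for this post. `write_frames` is a hypothetical helper, it uses base graphics rather than ggmap so that it runs on its own, and the track is made up.

```r
# Hypothetical helper: write one PNG per time step showing the track so far.
# 'df' needs numeric columns 'lon' and 'lat', ordered by time.
write_frames <- function(df, outdir = tempdir(), prefix = "frame") {
  files <- character(nrow(df))
  for (i in seq_len(nrow(df))) {
    files[i] <- file.path(outdir, sprintf("%s_%04d.png", prefix, i))
    png(files[i], width = 400, height = 400)
    # fixed axis limits so every frame shares the same extent
    plot(df$lon[1:i], df$lat[1:i], type = "l", col = "grey",
         xlim = range(df$lon), ylim = range(df$lat),
         xlab = "lon", ylab = "lat")
    points(df$lon[i], df$lat[i], col = "red", pch = 19)  # current position
    dev.off()
  }
  files
}

# Made-up track of five points, for illustration only
track <- data.frame(lon = c(0.10, 0.11, 0.12, 0.13, 0.14),
                    lat = c(52.19, 52.20, 52.21, 52.22, 52.23))
frames <- write_frames(track)

# Then stitch the frames with ImageMagick from a shell (outside R), e.g.:
#   convert -delay 10 -loop 0 frame_*.png tracking.gif
# (newer ImageMagick releases call the command 'magick' instead of 'convert')
```

The zero-padded file names matter: they make the shell wildcard list the frames in time order.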
Hah, I like the occasional breaks for coffee, a few metres away from where presumably your desk is.
Yep… I am pretty close to the coffee room!
This. Very much wow.
Great post! I’d been meaning to look at some of the mapping packages, etc., in R. This is perfect. Thank you for sharing your code.
which of the ‘products’ available at google takeout contains the geo data? did not find an obvious candidate…
If Google has your location data, it should be available as an export option – maybe they don’t have anything on you :)?
what a letdown, omniscient google =D
no, it’s what I thought, just wanted to make sure that it’s not hidden somewhere and that I hadn’t simply overlooked it. Thanks for the reply!
This is awesome!! I came across this post in the GooglePlus R community. I’m a complete novice to R and just exploring but enjoyed this!!! Trying it myself. One unmentioned step is to make sure you have the jsonlite and plyr packages installed.
Below are the absolute first steps performed by any person trying this:
1. Make sure R is installed on your machine; instructions here http://cran.us.r-project.org/doc/manuals/r-release/R-admin.html#Obtaining-R
2. I use RStudio which is a FREE easy to understand environment for R; downloaded here http://www.rstudio.com/
3. Open RStudio
4. #install the packages required; type what you see below and wait until you get the md5 sums checked notification
install.packages("ggmap")
install.packages("ggplot2")
install.packages("jsonlite")
install.packages("plyr")
Then start from “The Nitty Gritty” instructions. There may be more, but this is what I’ve done before trying it!!! Excited to see where I’ve been and what Google has on me!!!
Apologies LinwoodC3 – I didn’t expect so many people to look at this, so didn’t go into much of the basic detail. I will add your comment to the actual blog text…
Loving this post, thanks for sharing. I latched onto it via the Google+ R group too fyi. I was a google latitude user for ages, sharing my fine grain location with my family etc (was very handy enabling my wife to check where I am without her having to call me and ask) until they shut it down and apparently moved the facility to google+. I never successfully got it working since that migration however and after a short, irritated burst of faffing, gave up. I really miss that service. Certainly puts the ‘to date’ Ed Snowden revelations into perspective. Imagine the furore if people knew how much they really know about our prole behaviour!
Yes Richard, I was disappointed when they shut down Latitude, too. As it was purely opt-in and they had seemingly decent controls in the API too, I didn’t have a problem with it. It’s also interesting to see where and how you spend your time, although an employer poking into that is a creepy proposition.
This encouraged me to look into Google+ location sharing again and it’s much clearer now, to the point that it now works for me. Pretty similar to the old Latitude actually, so now it’s integrated with Google+, you could argue ‘better’.
Oscar, I’m interested in learning how to use the timestamp in the json file. You said to ask if we wanted to know. I would appreciate any help you could provide. Thank you!
Of course! I’ll help as much as I can.
I’m not quite sure what you’re after though… A general understanding of the POSIX time format? Or something more specific?
‘POSIX time’ is the number of seconds that have passed since 00:00:00 UTC on 1st Jan 1970 (http://en.m.wikipedia.org/wiki/Unix_time). Once you have that number, you basically just tell the computer how you want it presented (e.g. with the strptime function, http://rfunction.com/archives/1912). To use this function, you first of all have to tell the computer that the number you’re giving it is in the POSIX format (class(yourdata) = 'POSIXct'), then you can get a legible time and date from the POSIX number using strptime (see link above). Does that help?
You can see near the start of my code that I convert the json timestampMs data into normal POSIX format (divide by 1000; the line starting ldf$t), and then it’s all fairly plain sailing from there.
Anything more specific, ask away!
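As a concrete illustration of the conversion described above, here is a minimal sketch in base R. The timestamp value is made up, and it assumes `timestampMs` arrives as text, which is why the main script wraps it in as.numeric. It uses as.POSIXct and format for the number-to-text direction; strptime goes the other way, from text to a time.

```r
# A made-up millisecond timestamp, stored as text like the 'timestampMs' field
timestamp_ms <- "1388577600000"

# Milliseconds -> seconds since 1970-01-01 00:00:00 UTC
secs <- as.numeric(timestamp_ms) / 1000

# Turn the number into a date-time object
t <- as.POSIXct(secs, origin = "1970-01-01", tz = "UTC")

# Present it as readable text
format(t, "%Y-%m-%d %H:%M:%S")   # "2014-01-01 12:00:00"

# Pull out pieces for grouping or plotting, e.g. the hour of the day
hour_of_day <- as.integer(format(t, "%H"))
```

Setting `tz` explicitly keeps the result the same on every machine; leave it out to get your local time zone instead.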
As someone very new to R as well, and not in the UK, is there a simple way to change the instructions so that the initial map is the US?
Ahh, figured it out!
Great! Glad you made it work – sorry I didn’t get back to you in time. Enjoy being freaked out by Google!
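For anyone else wanting a starting map outside Europe: in the script, the first argument to get_map is the map centre as c(longitude, latitude) and the second is the zoom level, so changing those two values moves the map. One way to avoid guessing the centre is to derive it from the data. A small sketch with made-up coordinates; `data_centre` is a hypothetical helper, and the zoom value in the commented lines is a guess to adjust by eye.

```r
# Hypothetical helper: midpoint of the bounding box of the recorded positions
data_centre <- function(df) {
  c(lon = mean(range(df$lon, na.rm = TRUE)),
    lat = mean(range(df$lat, na.rm = TRUE)))
}

# Made-up points spread roughly across the US
pts <- data.frame(lon = c(-122.4, -87.6, -74.0),
                  lat = c(37.8, 41.9, 40.7))
centre <- data_centre(pts)
centre

# With ggmap loaded and 'ldf' built as in the post, this would take the place
# of the c(0,48) European centre:
# US = get_map(centre, zoom = 4, source = 'google')
# ggmap(US) + geom_point(data = ldf, aes(lon, lat), colour = 'red')
```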
in case anybody feels like making some gifs of their data, here’s a function using the animation package. I recommend limiting the amount of the data you use with the fraction argument. also check out the html export function of the package, does a nice job as well and also adds some controls.
### gif
makeGIF <- function(data, map, fraction = 0.1, name = "tracking.gif"){
  library(animation)
  library(ggmap)
  # write the gif to the current working directory
  ani.options(outdir = getwd())
  # number of frames: the first 'fraction' of the rows in 'data'
  imax = round(nrow(data) * fraction)
  print(paste("number of frames: ",imax))
  # gif loop
  saveGIF({
    # set up text progress bar
    pb <- txtProgressBar(min = 0, max = imax, style = 3)
    for(i in 2:imax){
      # plotting, on top of the map passed in as 'map'
      p <- ggmap(map) +
        # current position
        geom_point(data = data[i,], aes(x = lon, y = lat, colour = activity)) +
        # old positions
        geom_point(data = data[1:i,], aes(x = lon, y = lat, colour = activity), alpha = 0.2) +
        scale_fill_manual(values=scales::hue_pal()(4))
      print(p)
      # update progress bar
      setTxtProgressBar(pb, i)
    }
    close(pb)
  # interval is the inter-frame time in seconds
  },movie.name = name, interval = 0.1)
}
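Note that `fraction` keeps the first part of the data (rows up to `imax`) rather than a sample spread over the whole period. If you would rather cover the full time span with fewer frames, one option is to thin the data evenly before passing it in. This is a sketch of that alternative; `thin_rows` is a hypothetical helper, not part of the animation package, and the data are made up.

```r
# Hypothetical helper: keep roughly 'fraction' of the rows, evenly spaced
# from the first row to the last.
thin_rows <- function(data, fraction = 0.1) {
  n_keep <- max(1, round(nrow(data) * fraction))
  idx <- unique(round(seq(1, nrow(data), length.out = n_keep)))
  data[idx, , drop = FALSE]
}

# Made-up data: 100 time-ordered points
demo <- data.frame(lon = seq(0, 1, length.out = 100),
                   lat = seq(52, 53, length.out = 100))
thinned <- thin_rows(demo, fraction = 0.1)
nrow(thinned)
```

The thinned data frame can then be handed to the gif function with `fraction = 1`, so that every remaining row becomes a frame.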
Hey Oscar :). I’m working on a project that involves visualization of the movement of rats in sewers, and this looks very much like what I want to get. But you didn’t include the entire code, so I guess what you’re getting now is just the images? I didn’t try it yet because I will need to adapt the code to my own data. Could you give the full code (even if it’s messy)?
Hi Melissa,
Interesting project! What exactly are you trying to do? Is it the mapping, or the generation of animated gifs that you’re interested in? Shoot me an email, and I’ll see how I can help! (oscarbranson at [google’s email service])