Creating a Globe of Data

Before starting, you can see the final result of this post on World Poverty.

Some months ago, I was impressed by the Chrome Experiments website. On that site you can find many experiments built with WebGL, a new technology that is supposed to work in most modern browsers. WebGL is the most recent standard for 3D rendering on the Web, and it makes a new form of data representation possible. In fact, artists, scientists, game designers, statisticians and others are already creating amazing visualizations of their data.

Google WebGL Globe

One of these new kinds of representation was made by Google. It is called WebGL Globe, and it displays geolocated statistical data. The only thing you need to do is split your data into several series of latitude, longitude and magnitude in JSON format, as the following example illustrates:

var data = [
  [
    'seriesA', [ latitude, longitude, magnitude, latitude, longitude, magnitude, ... ]
  ],
  [
    'seriesB', [ latitude, longitude, magnitude, latitude, longitude, magnitude, ... ]
  ]
];

JSON, an acronym for JavaScript Object Notation, is not only a format for representing data in JavaScript; it is also the format WebGL Globe expects as input. In this format, a list is enclosed in brackets, starting with “[” and ending with “]”. The data for WebGL Globe is therefore a list of lists. Each of these lists has two elements: the first is the name of the series, and the second is another list containing the data. The data is written as comma-separated values in groups of three: the first is the latitude, the second is the longitude, and the third is the magnitude you would like to represent.
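As a rough illustration of that structure, the following Python 3 sketch flattens (latitude, longitude, magnitude) triples into the list-of-lists layout and serializes it with the standard `json` module. The function name `build_series` and the sample values are invented for this example; they are not part of WebGL Globe.

```python
import json


def build_series(name, points):
    """Return [name, [lat, lon, magnitude, lat, lon, magnitude, ...]]."""
    flat = []
    for lat, lon, magnitude in points:
        flat.extend([lat, lon, magnitude])
    return [name, flat]


# Hypothetical sample values, only to show the shape of the data.
series_a = build_series("seriesA", [(40.46, -3.75, 0.5), (46.15, 14.99, 0.1)])
data = [series_a]
print(json.dumps(data))
# [["seriesA", [40.46, -3.75, 0.5, 46.15, 14.99, 0.1]]]
```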

Let’s say we want to represent information from the Human Poverty Index. The first thing we need to do is download the data in the format provided by the United Nations’ site for the Multidimensional Poverty Index, which has replaced the old Human Poverty Index. Now that we have a spreadsheet document, it is time to open it and collect just the data we need: go to page 5 of the workbook, and copy and paste the cells into a clean spreadsheet. We remove all the data we don’t need, such as titles, captions and extra columns, and keep just the country names, the second “Value” column under the cell “Multidimensional Poverty Index”, the population in poverty in thousands, and the “Intensity of deprivation” column. The next step is to remove the rows with no data for those indicators, marked as “..”. After doing this, we should have a document with 4 columns and 109 rows.
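If you prefer to drop the incomplete rows with a script rather than by hand, a minimal Python 3 sketch could look like the one below. It assumes the four columns have already been exported as CSV; the helper name `drop_incomplete_rows` and the two sample rows are hypothetical and do not come from the real dataset.

```python
import csv
import io

MISSING = ".."  # marker the spreadsheet uses for "no data"


def drop_incomplete_rows(rows):
    """Keep only the rows in which no cell is the missing-data marker."""
    return [
        row for row in rows
        if not any(cell.strip() == MISSING for cell in row)
    ]


# Hypothetical rows: country, MPI value, people in thousands, intensity.
sample = io.StringIO(
    "Country A,0.1,..,40.0\n"
    "Country B,0.5,1000,60.0\n"
)
clean = drop_incomplete_rows(csv.reader(sample))
print(clean)
# [['Country B', '0.5', '1000', '60.0']]
```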

Spreadsheet before getting coordinates for countries

Although we have the names of the countries, we still need their geographical coordinates. There are several services that provide the latitude and longitude for a given address. When given just the name of a country, they return a single representative pair of coordinates for it. We will use geopy, a Python library that can connect to different providers and retrieve several kinds of information. geopy is installed from a terminal or console with a single command:

$ easy_install geopy

After that, we can open an interactive console such as IPython and get the latitude and longitude of, for instance, “Spain”, with the following commands:

>>> from geopy import geocoders
>>> g = geocoders.Google()
>>> g.geocode("Spain")
(u'Spain', (40.463667000000001, -3.7492200000000002))

In the same way, we can build a list of our countries and loop over it with the following script:

>>> from geopy import geocoders
>>> g = geocoders.Google()
>>> countries = ["Slovenia", "Czech Republic", ...]
>>> for country in countries:
...     try:
...         # geocode() returns (place_name, (latitude, longitude))
...         placemark = g.geocode(country)
...         print "%s,%s,%s" % (placemark[0], placemark[1][0], placemark[1][1])
...     except Exception:
...         # Print only the name so failed lookups are easy to spot
...         print country
...
Slovenia,46.151241,14.995463
Czech Republic,49.817492,15.472962
United Arab Emirates,23.424076,53.847818
...

Now we can select all the result lines with the latitude and longitude of every country and copy them with Ctrl-C or with right-click and copy. In our spreadsheet, go to the first row of a new column and paste everything. A dialog for pasting the data should appear; in it, check the option that splits the values on commas.

Paste the result comma separated

Once this is done, we have almost all the coordinates for all the countries. There may still be some locations for which the script did not get the right coordinates, such as “Moldova (Republic of)” or “Georgia”. For these countries, after careful review, the best thing to do is to retry with corrected names (trying “Moldova” instead of “Moldova (Republic of)”) or simply to look up the location in Wikipedia; for Georgia, for example, Wikipedia provides a link with the exact coordinates in the information box on the right side.

When the process is over, we remove the column with the names and reorder the columns so that the latitude comes first, the longitude second, and the rest of the columns after that. The data is now almost ready. Next, we need to save the spreadsheet as a CSV file so that it can be processed by a Python script that converts it into the JSON format that WebGL Globe can handle. The script that processes the CSV file and produces the JSON output is the following:

import csv

# Each row of poverty.csv holds: latitude, longitude, MPI value,
# people in poverty (in thousands) and intensity of deprivation
lines = csv.reader(open("poverty.csv", "rb"))
mpis = []  # Multidimensional Poverty Index
thousands = []  # People, in thousands, in a poverty situation
deprivations = []  # Intensity of Deprivation
for lat, lon, mpi, thousand, deprivation in lines:
    # Extend each series with one (latitude, longitude, magnitude) triple
    mpis += (lat, lon, mpi)
    thousands += (lat, lon, thousand)
    deprivations += (lat, lon, deprivation)
# Print the three series as one JSON list, including its closing bracket
print """
[
["Multidimensional Poverty Index", [%s]],
["People affected (in thousands)", [%s]],
["Intensity of Deprivation", [%s]]
]
""" % (",".join(mpis),
       ",".join(thousands),
       ",".join(deprivations))

And the output will look like:

[
["Multidimensional Poverty Index", [46.151241,14.995463,0, ... ]],
...
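As an alternative to building the JSON text by hand, the same conversion can be sketched with the standard `json` module, which takes care of brackets and commas. This is a Python 3 sketch, not the script used for this post; the function name `csv_to_globe_json` is hypothetical, the magnitudes in the sample row are invented, and the column order (latitude, longitude, then the three indicators) is assumed to match the CSV described above.

```python
import csv
import io
import json

SERIES_NAMES = [
    "Multidimensional Poverty Index",
    "People affected (in thousands)",
    "Intensity of Deprivation",
]


def csv_to_globe_json(csv_file):
    """Convert rows of lat, lon, mpi, thousands, deprivation to JSON text."""
    series = [[name, []] for name in SERIES_NAMES]
    for row in csv.reader(csv_file):
        lat, lon = float(row[0]), float(row[1])
        # Columns 2 to 4 hold one magnitude per series, in the same order.
        for (_, values), magnitude in zip(series, row[2:5]):
            values.extend([lat, lon, float(magnitude)])
    return json.dumps(series)


# Hypothetical single-row input standing in for poverty.csv.
sample = io.StringIO("46.151241,14.995463,0.1,2.0,30.5\n")
print(csv_to_globe_json(sample))
```

With a real file, the call would be something like `csv_to_globe_json(open("poverty.csv", newline=""))`, with the result written to poverty.json.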

Now, if we copy that output into a file called poverty.json, we will have our input data for WebGL Globe. The last step is to put the Globe and the data input file together. We need to download the webgl-globe.zip file and extract the directory named “globe” into a directory with the same name. We copy our poverty.json file into it and then edit index.html to replace the occurrences of “population909500.json” with “poverty.json”, and make some other changes such as the names of the series. Finally, to see the result, you can put all the files on a static web server and browse to the URL. Another option, just for local debugging, is to run the following command in the directory itself:

$ python -m SimpleHTTPServer
Serving HTTP on 0.0.0.0 port 8000 ...

And then, go to http://localhost:8000 to see the result.

Globe before normalization

It seems there is something wrong with two of the series: the population in poverty and the intensity of poverty. This is because we need to normalize the values so that they fall in the range 0 to 1. To do that, we open our CSV file again as a spreadsheet and calculate the sum of each column we want to normalize. Then we create a new column in which every cell is the old cell value divided by the total sum of the old column. We repeat the process for the other column and replace the old columns with the values from the new ones. Now we can regenerate the JSON file and try again.
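The same normalization can be sketched in a few lines of Python 3 instead of a spreadsheet. The function name `normalize_by_sum` and the sample column are invented for illustration, and the sketch assumes the magnitudes are non-negative.

```python
def normalize_by_sum(values):
    """Divide every value by the column total so results fall between 0 and 1."""
    total = sum(values)
    if total == 0:
        # Avoid dividing by zero when the whole column is empty or all zeros.
        return [0.0 for _ in values]
    return [value / total for value in values]


# Hypothetical column of magnitudes.
column = [10.0, 30.0, 60.0]
print(normalize_by_sum(column))
# [0.1, 0.3, 0.6]
```

Because each value is divided by the total, the normalized column always adds up to 1, so individual values get smaller as the number of rows grows.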

Now you can click on World Poverty to see everything working properly.
