Category Archives: Topics

The second course that I have to design

I was browsing my co-workers' blog posts when I realized that I had to pick a topic for my second course, a requirement of the program. This time the course can be designed at the graduate level (though, actually, the first one was too).

In the last few months, I have spent a considerable amount of time reading about big data, data mining, machine learning, and statistical analysis, as well as art history, women's rights movements, and the representation of body parts. All of this is for my current research on the representation of the human face in world painting, which is expected to materialize first in an abstract for DH 2014, and later in the first chapter of my dissertation. The second and third chapters of my thesis may include an authorship attribution study of a very famous Spanish novel, and a computer-based sentiment and meter analysis of a set of a specific kind of poetic plays.

All this work is being carried out thanks to extensive documentation and reading of primary and secondary sources, as well as by dealing with considerable amounts of data generated mainly ad hoc for these purposes. In the process, I started to follow a certain workflow: 1) data collection and curation, 2) data cleansing, 3) automatic annotation of metadata, 4) data formatting, and finally 5) data analysis employing a varying set of tools and concepts borrowed from Computer Science.

Consequently, that made me think that my second course for this PhD will be on Data for Humanities Research. So, let's talk to my supervisor to see if he is as happy as I am with this topic 😀


Filed under Tasks, Topics

Creating a Globe of Data (PH2)

Lesson Goals

This is a lesson designed for intermediate users, although beginner users should be able to follow along.

In this lesson we will cover the following main topics:

  • Using Python to produce a visualization of the World Poverty Index on an interactive globe.
  • Transforming CSV data into JSON notation in Python.
  • Getting spatial coordinates from Google and other providers through the geopy library.

After seeing the basics of Python and how it can help us in our daily work, we will introduce one of the many options for visualizing data. In this case, we will take a data source in CSV format and process it to transform it into JSON notation. Finally, we will represent all the information on a world globe designed for modern browsers using the WebGL technology. During the process, we will need to get the spatial coordinates of countries across the world. Before starting, you can see the final result of this unit on World Poverty, so don't be afraid of all the new names mentioned above; we will explain them below.

The Globe of Data

Since the end of 2009, some browsers have started to implement an incipient specification for rendering 3D content on the Web. Although it is not yet part of the W3C's specifications –the W3C is the organization that proposes, defines and approves almost all Internet standards–, WebGL, as it is called, is supported by all major browsers and the industry.

WebGL is the most recent way to produce 3D representations on the Web, so with WebGL a new form of data representation becomes available. In fact, there are artists, scientists, game designers, statisticians and many others creating amazing visualizations from their data.

Google WebGL Globe

One of these new forms of representation was made by Google. It is called the WebGL Globe and is intended to show statistical geo-located data.

JSON

JSON, an acronym for JavaScript Object Notation, is not only a format to represent data in Javascript; it is also the data type that the WebGL Globe needs in order to work. In this format, a list is enclosed between brackets, "[" to start and "]" to end. Therefore, the data series for the WebGL Globe is a list of lists. Every one of these lists has two elements: the first one is the name of the series and the second one is another list containing the data. Although it is good to know how JSON lists are encoded, there are libraries for Python that do that conversion for you, so you only have to handle pure Python objects. The next code snippet shows how native Python lists and dictionaries are transformed into JSON.

>>> import json

>>> json.dumps([1, 2, 3])
    '[1, 2, 3]'

>>> json.dumps({"key1": "val1", "key2": "val2"})
    '{"key2": "val2", "key1": "val1"}'

The data for the WebGL Globe is written comma separated, so you must arrange your information in sets of three elements: the first is the geographical coordinate for latitude, the second one is the same for longitude, and the third one is the value of the magnitude you would like to represent, normalized between 0 and 1. This means that if we have the magnitude values 10, 50 and 100, they will have to be translated into 0.1, 0.5 and 1.
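
For instance, that normalization can be checked quickly in the Python interpreter by dividing every magnitude by the maximum value of the series:

>>> values = [10, 50, 100]

>>> [value / float(max(values)) for value in values]
    [0.1, 0.5, 1.0]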

The only thing you now need is to split up your data into several series of latitude, longitude and magnitude in JSON format, as the next example illustrates:

var data = [
  [
    'seriesA', [ latitude, longitude, magnitude, latitude, longitude, magnitude, ... ]
  ],
  [
    'seriesB', [ latitude, longitude, magnitude, latitude, longitude, magnitude, ... ]
  ]
];

That said, we can create a list in Python with the format described above and then convert it to JSON using the json library. Since JSON notation is actually handled in Python as a string, and since it is easy to produce syntax errors if you try to write JSON directly, we recommend creating the objects in Python and then converting them into JSON, so we can guarantee that the final JSON is free of errors.

>>> import json

>>> data = [
 ...:     ["seriesA", [34.56, -5.23, 0.89, 27.78, 10.56, 0.12, ...]],
 ...:     ["seriesB", [34.56, -5.23, 0.89, 27.78, 10.56, 0.12, ...]],
 ...:     ...
 ...: ]

>>> json.dumps(data)
'[["seriesA", [34.56, -5.23, 0.89, 27.78, 10.56, 0.12, ...]], ["seriesB", [34.56, -5.23, 0.89, 27.78, 10.56, 0.12, ...]], ...]'

The Data Set

Let's say we want to represent information from the Human Poverty Index. Then we need to download the data in the format provided by the United Nations' site for the Multidimensional Poverty Index, which has replaced the old Human Poverty Index. Now that we have a spreadsheet document, it's time to open it and collect just the data we need: go to page 5 of the workbook, and copy and paste the cells into a clean spreadsheet. We remove what we don't need, such as titles, captions, extra columns, etc., and leave just the country names, the second "Value" column under the cell "Multidimensional Poverty Index", the population under poverty in thousands, and the "Intensity of deprivation" column. The next step is to remove the rows with no data for those indicators, marked as "..". After doing this, we should have a document with 4 columns and 109 rows. Then remember to normalize all the values between 0 and 1. Or you can simply download the cleaned and normalized file in CSV format or Excel (XLS) to avoid getting lost in spreadsheet manipulation.

Spreadsheet before normalizing

But, although we have the names of the countries, we still need their geographical coordinates. There are several services that provide the latitude and longitude for a given address; when only the name of a country is given, the main coordinates of its capital are provided. We will use geopy, a Python library able to connect to different providers and retrieve several kinds of information. To install geopy, a terminal or console is needed, and it is very easy with just one command.

$ easy_install geopy
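
If easy_install is not available on your system, the same package can also be installed with pip:

$ pip install geopy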

After that, we can open a terminal with the regular Python interpreter, or an interactive console like IPython, and get the latitude and longitude of, for instance, "Spain", with the following commands:

>>> from geopy import geocoders

>>> g = geocoders.Google()

>>> g.geocode("Spain")
(u'Spain', (40.463667000000001, -3.7492200000000002))

By default, geopy will try to get only one match, but you can easily avoid that behaviour by adding the argument exactly_one set to False. Then geopy will return a list of elements and it will be your task to pick just one. Google has a fairly low limit of queries per day, so you should try a different provider for the geocoder if you reach that limit.

>>> from geopy import geocoders

# Using GeoNames as provider
>>> g = geocoders.GeoNames()

# Getting the whole list of matches and getting just one
>>> g.geocode("Spain", exactly_one=False)[0]
(u'Spain', (40.463667000000001, -3.7492200000000002))

In this way, we can build a list of our countries from our spreadsheet and pass it to the script below. To build the list of countries you can simply copy the column of countries into your code editor and replace every newline character ("\n") with '", "', so the result is something like:

["Slovenia", "Czech Republic", "United Arab Emirates", "Estonia", "Slovakia", "Hungary", "Latvia", "Argentina", "Croatia", "Uruguay", "Montenegro", "Mexico", "Serbia", "Trinidad and Tobago", "Belarus", "Russian Federation", "Kazakhstan", "Albania", "Bosnia and Herzegovina", "Georgia", "Ukraine", "The former Yugoslav Republic of Macedonia", "Peru", "Ecuador", "Brazil", "Armenia", "Colombia", "Azerbaijan", "Turkey", "Belize", "Tunisia", "Jordan", "Sri Lanka", "Dominican Republic", "China", "Thailand", "Suriname", "Gabon", "Paraguay", "Bolivia (Plurinational State of)", "Maldives", "Mongolia", "Moldova (Republic of)", "Philippines", "Egypt", "Occupied Palestinian Territory", "Uzbekistan", "Guyana", "Syrian Arab Republic", "Namibia", "Honduras", "South Africa", "Indonesia", "Vanuatu", "Kyrgyzstan", "Tajikistan", "Viet Nam", "Nicaragua", "Morocco", "Guatemala", "Iraq", "India", "Ghana", "Congo", "Lao People's Democratic Republic", "Cambodia", "Swaziland", "Bhutan", "Kenya", "Sao Tome and Principe", "Pakistan", "Bangladesh", "Timor-Leste", "Angola", "Myanmar", "Cameroon", "Madagascar", "Tanzania (United Republic of)", "Yemen", "Senegal", "Nigeria", "Nepal", "Haiti", "Mauritania", "Lesotho", "Uganda", "Togo", "Comoros", "Zambia", "Djibouti", "Rwanda", "Benin", "Gambia", "Côte d'Ivoire", "Malawi", "Zimbabwe", "Ethiopia", "Mali", "Guinea", "Central African Republic", "Sierra Leone", "Burkina Faso", "Liberia", "Chad", "Mozambique", "Burundi", "Niger", "Congo (Democratic Republic of the)", "Somalia"]

And use this list in the next script:

>>> from geopy import geocoders

>>> g = geocoders.GeoNames()

>>> countries = ["Slovenia", "Czech Republic", ...]
>>> for country in countries:
 ...:     try:
 ...:         placemark = g.geocode(country, exactly_one=False)[0]
 ...:         # The coordinates come back as floats, so convert them to strings before joining
 ...:         print placemark[0] + "," + str(placemark[1][0]) + "," + str(placemark[1][1])
 ...:     except:
 ...:         # If geopy cannot resolve the name, print just the country so we can fix it later
 ...:         print country
 ...:
Slovenia,46.151241,14.995463
Czech Republic,49.817492,15.472962
...

Now, we can select all the results corresponding to the latitudes and longitudes of every country and copy them with Ctrl-C, Cmd-C or right-click and copy. Then we go to our spreadsheet, click on the first row of a new column, and paste everything. We should see a dialogue for pasting the data; in it, check the right option so that the values are split at the commas.

Paste the result comma separated

Once this is done, we have almost all the coordinates for all the countries. There could be some locations for which the script didn't get the right coordinates (geopy raises an error and the script just prints the country name instead), like "Moldova (Republic of)" or "Georgia". For these countries, and after careful supervision, the best thing to do is to run a few more tries fixing the names (trying "Moldova" instead of "Moldova (Republic of)") or just looking up the location in Wikipedia –for Georgia, for example, Wikipedia provides a link in the information box on the right side with the exact coordinates. When the process is over, we remove the column with the names and sort the columns so that the latitude comes first, the longitude second, and the rest of the columns after that. We almost have the data prepared. After this, we need to save the spreadsheet as a CSV file so that it can be processed by a Python script that converts it into the JSON format the WebGL Globe is able to handle.
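
Going back to the problematic names for a moment, one way to automate those retries is to keep a small dictionary of replacement names and look it up before geocoding; a sketch reusing the geocoder from above (the replacements shown are only examples):

>>> fixes = {"Moldova (Republic of)": "Moldova",
 ...:          "Tanzania (United Republic of)": "Tanzania"}

>>> country = "Moldova (Republic of)"
>>> placemark = g.geocode(fixes.get(country, country), exactly_one=False)[0]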

Reading CSV Files

Instead of passing a list of countries to geopy, we can use our clean and normalized CSV file as input to produce the JSON file we need.

A CSV file is a plain-text format for storing tabular data. There are plenty of dialects of CSV, but the most common one prints one row per line with every field separated by a comma. For example, the next table produces the output shown below.

Field 1            | Field 2
Row 1 Value Cell 1 | Row 1 Value Cell 2
Row 2 Value Cell 1 | Row 2 Value Cell 2

And the output will be:

Field 1,Field 2
Row 1 Value Cell 1,Row 1 Value Cell 2
Row 2 Value Cell 1,Row 2 Value Cell 2

And depending on the case, you can choose which character will be used as a separator instead of ",", or just leave the header out. But what happens if I need to print commas? Well, you can escape them or just wrap the entire value in double quotes.

"Row 1, Value Cell 1","Row 1, Value Cell 2"
"Row 2, Value Cell 1","Row 2, Value Cell 2"

And again you may wonder what happens if I need to print double quotes. In that case you can change the quoting character or just escape it with a slash. This is the origin of all the dialects of CSV. However, we are not covering this in such depth; we will focus on reading CSV with Python. To achieve this we use the standard "csv" library and invoke its "reader" method with a file object after opening the file from disk. Once this is done, we can just iterate over every line as a list and store each value in a variable during the iteration.
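
A minimal sketch of that pattern, assuming the two quoted rows above were saved in a file named example.csv (the file name is just for illustration):

>>> import csv

>>> for row in csv.reader(open("example.csv", "rb")):
 ...:     print row
 ...:
['Row 1, Value Cell 1', 'Row 1, Value Cell 2']
['Row 2, Value Cell 1', 'Row 2, Value Cell 2']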

In our case every line has, in this order, the country name, the value of the multidimensional poverty index, the number of thousands of people in a poverty situation, and finally the value of the intensity of deprivation. Note that our CSV file has no header, so we do not have to skip the first line. We will use three lists to store the different values of our series and finally, using the json library, we will write a JSON output to a file. The final poverty.py script that processes the CSV file and produces the JSON file is detailed next:

import csv
import json
from geopy import geocoders

# Load the GeoNames geocoder
g = geocoders.GeoNames()

# Every CSV row is split into a list of values
file_name = "multidimensional_poverty_index_normalized_2011_ph2.csv"
rows = csv.reader(open(file_name, "rb"))

# Init the lists that will store our data
mpis = []  # Multidimensional Poverty Index
thousands = []  # People, in thousands, in a poverty situation
deprivations = []  # Intensity of Deprivation

# Iterate through all the rows in our CSV
for country, mpi, thousand, deprivation in rows:
    try:
        # Get the coordinates of the country
        place, (lat, lon) = g.geocode(country, exactly_one=False)[0]
        # Fill the series lists with latitude, longitude, value triplets
        mpis = mpis + [lat, lon, mpi]
        thousands = thousands + [lat, lon, thousand]
        deprivations = deprivations + [lat, lon, deprivation]
    except:
        # We ignore countries that geopy is unable to process
        print "Unable to get coordinates for " + country

# Format the output
output = [
    ["Multidimensional Poverty Index", mpis],
    ["People affected (in thousands)", thousands],
    ["Intensity of Deprivation", deprivations]
]

# Generate the JSON file
json_file = open("poverty.json", "w")
json.dump(output, json_file)
json_file.close()
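
Assuming the script is saved as poverty.py in the same directory as the CSV file, we can run it from the terminal with:

$ python poverty.py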

And the JSON file poverty.json, using GeoNames, must look like:

[["Multidimensional Poverty Index", ["46.25", "15.1666667", "0", "49.75", "15.0", "0.01", "24.0", "54.0", "0.002", ... ]
...

Take into account that this script will omit some countries and print their names on the screen. If you choose a different provider in geopy, you will probably get slightly different coordinates and a different set of unrecognized country names.

Unable to get coordinates for Bolivia (Plurinational State of)
Unable to get coordinates for Congo (Democratic Republic of the)

Putting it all together

Now we have the poverty.json file, our input data for the WebGL Globe. So, the last step is to set up the Globe and the data input file together. We need to download the webgl-globe.zip file and extract the directory named "globe" into a directory with the same name. Into it, we copy our poverty.json file and then edit the provided index.html in order to replace the occurrences of "population909500.json" with "poverty.json", and make some other additions like the names of the series. The resulting index.html, excluding the style block, should look like the next one.

<!DOCTYPE HTML>
<html lang="en">
  <head>
    <title>WebGL Poverty Globe</title>
    <meta charset="utf-8">
  </head>
  <body>

  <div id="container"></div>

  <div id="info">
    <strong><a href="http://www.chromeexperiments.com/globe">WebGL Globe</a></strong>
    <span class="bull">&bull;</span> Created by the Google Data Arts Team
    <span class="bull">&bull;</span> Data acquired from <a href="http://hdr.undp.org/">UNDP</a>
  </div>

  <div id="currentInfo">
    <span id="serie0" class="serie">Multidimensional Poverty Index</span>
    <span id="serie1" class="serie">Population (in thousands)</span>
    <span id="serie2" class="serie">Intensity of Deprivation</span>
  </div>

  <div id="title">
    World Poverty
  </div>

  <a id="ce" href="http://www.chromeexperiments.com/globe">
    <span>This is a Chrome Experiment</span>
  </a>

  <script type="text/javascript" src="/globe/third-party/Three/ThreeWebGL.js"></script>
  <script type="text/javascript" src="/globe/third-party/Three/ThreeExtras.js"></script>
  <script type="text/javascript" src="/globe/third-party/Three/RequestAnimationFrame.js"></script>
  <script type="text/javascript" src="/globe/third-party/Three/Detector.js"></script>
  <script type="text/javascript" src="/globe/third-party/Tween.js"></script>
  <script type="text/javascript" src="/globe/globe.js"></script>
  <script type="text/javascript">

    if(!Detector.webgl){
      Detector.addGetWebGLMessage();
    } else {

      var series = ['Multidimensional Poverty Index','Population (in thousands)','Intensity of Deprivation'];
      var container = document.getElementById('container');
      var globe = new DAT.Globe(container);
      console.log(globe);
      var i, tweens = [];

      var settime = function(globe, t) {
        return function() {
          new TWEEN.Tween(globe).to({time: t/series.length},500).easing(TWEEN.Easing.Cubic.EaseOut).start();
          var y = document.getElementById('serie'+t);
          if (y.getAttribute('class') === 'serie active') {
            return;
          }
          var yy = document.getElementsByClassName('serie');
          for(i=0; i<yy.length; i++) {
            yy[i].setAttribute('class','serie');
          }
          y.setAttribute('class', 'serie active');
        };
      };

      for(var i = 0; i<series.length; i++) {
        var y = document.getElementById('serie'+i);
        y.addEventListener('mouseover', settime(globe,i), false);
      }

      var xhr;
      TWEEN.start();
      xhr = new XMLHttpRequest();
      xhr.open('GET', 'poverty.json', true);
      xhr.onreadystatechange = function(e) {
        if (xhr.readyState === 4) {
          if (xhr.status === 200) {
            var data = JSON.parse(xhr.responseText);
            window.data = data;
            for (i=0;i<data.length;i++) {
              globe.addData(data[i][1], {format: 'magnitude', name: data[i][0], animated: true});
            }
            globe.createPoints();
            settime(globe,0)();
            globe.animate();
          }
        }
      };
      xhr.send(null);
    }
  </script>
  </body>
</html>

Finally, to see the result, you must put all the files on a static web server and browse to its URL. The fastest way to do this is to run a local web server with Python; you will be the only one able to see the globe, but publishing HTML files and small websites is beyond the scope of this lesson. Run the next command from inside the globe directory itself.

$ python -m SimpleHTTPServer
Serving HTTP on 0.0.0.0 port 8000 ...
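
If you are running Python 3 instead, the equivalent built-in server can be started with:

$ python3 -m http.server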

Then, go to http://localhost:8000 and navigate to the index.html to see the result.

Globe before normalization

Globe before normalization

If it looks like this, it is because there is something wrong with some of the series. Remember that we need to normalize the values in order to get them into the range 0 to 1. To do that, we open our CSV file again as a spreadsheet, calculate the sum of each column that we want to normalize, and then create a new column in which every single cell is the result of dividing the old cell value by the total sum of all the values in the old column. We repeat the process with the other two columns and replace the old ones with just the values from the new ones. Then we run the steps to generate a new JSON file and try again.
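
The same normalization can also be done with a few lines of Python instead of spreadsheet formulas. Here is a minimal sketch, assuming the un-normalized data was exported as a 4-column CSV (country, MPI, thousands, intensity; the input file name is just a placeholder) and dividing each column by its sum as described; dividing by the column maximum, as in the 10, 50, 100 example earlier, would equally keep the values between 0 and 1:

import csv

# Read the un-normalized CSV: country, MPI, thousands, intensity
rows = list(csv.reader(open("multidimensional_poverty_index_2011_ph2.csv", "rb")))

# Normalize the three numeric columns by dividing every value by the column total
for col in (1, 2, 3):
    total = sum(float(row[col]) for row in rows)
    for row in rows:
        row[col] = str(float(row[col]) / total)

# Write the result to the file read by poverty.py
writer = csv.writer(open("multidimensional_poverty_index_normalized_2011_ph2.csv", "wb"))
writer.writerows(rows)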

Now, you can click on World Poverty to see everything properly working.

Suggested Readings

The Python Standard Library Documentation

Lutz, Learning Python

  • Ch. 9: Tuples, Files, and Everything Else


Filed under Topics

Some notes about the new features of neo4j-rest-client

This is the first time I write in technical terms and about a project of my own. But first, a bit of history. Back in December 2009, I came across the Neo4j database. Neo4j was one of the first graph databases with serious uses in the real world. I was amazed at how easy it was to create nodes and edges (they call them relationships, an even more intuitive name). But, like everything else in the graph world, it was written in Java, with no other language binding except a very basic Python one, neo4j.py. A couple of months later, they released the standalone REST server and then, because getting neo4j.py to work was really hard for pure Python coders, I decided to write a client-side library. That's how neo4j-rest-client was born. It was a really basic tool, but it started to grow and grow, and the next year the first packaged version was released on the Python Package Index. Since then, everything has improved, both the Neo4j REST API and the Python community around it. The Neo4j guys finally deprecated neo4j.py and released a new python-embedded client, also based on the Java runtime, at the same time that other alternatives appeared on the scene: bulbflow, neo4django, or the newest, py2neo, for example. However, neo4j-rest-client was always as low level as possible: it didn't manage caching, lazy loads, or delayed requests, to name just a few. But when the Cypher plugin, the preferred method to query the graph in Neo4j, became part of the core, I decided to implement some cool features based on it.

The first thing was to have better iterables for objects, as well as laziness in loads and requests. I implemented a way to query the graph database using Cypher while taking advantage of the existing neo4j-rest-client objects like Node or Relationship. So, every time you run a query, you can get the objects as returned by the server, which is called the RAW response, by using the `constants.RAW` option in the `returns` parameter of the `query` method of `GraphDatabase` objects.


from neo4jrestclient.client import GraphDatabase
from neo4jrestclient.constants import RAW

gdb = GraphDatabase("http://localhost:7474/db/data/")

q = "START n=node(*) RETURN ID(n), n.name"
params = {}
gdb.query(q, params=params, returns=RAW)

Or you can use `params` to pass parameters safely to your query.


q = "START n=node({nodes}) RETURN ID(n), n.name"
params = {"nodes": [1, 2, 3, 4, 5]}

gdb.query(q, params=params, returns=RAW)

Regardless of the way you define your query, the last line can omit `returns` if the value is `RAW`, but the power of this parameter lies in the possibility of passing casting functions in order to format the results.

from neo4jrestclient.client import Node

q = "START n=node({nodes}) RETURN ID(n), n.name!, n"
returns = (int, unicode, Node)
gdb.query(q, returns=returns)

Or you can even create your own casting function, which is really useful when using nullable properties, referenced in Cypher as `?` and `!`.


from neo4jrestclient.client import Node

def my_custom_casting(val):
    try:
        return unicode(val)
    except:  # Never ever leave an except like this
        return val

q = "START n=node({nodes}) RETURN ID(n), n.name!, n"
returns = (int, my_custom_casting, Node)
gdb.query(q, returns=returns)

Now I can be sure that if the name of a node is not present, a proper RAW value will be returned. But what happens if the number of columns doesn't match the number of casting functions passed? Nothing: the remaining elements will be returned as RAW, as usual. Nice graceful degradation 😀

On the other hand, using the new query feature, I implemented some filtering helpers that could eventually replace the Lucene query method used so far. The star here is the `Q` object.


from neo4jrestclient.query import Q

The syntax, borrowed from Django and inspired by lucene-querybuilder, is the following:


Q(property_name, lookup=value_to_match, [nullable])

The `nullable` option can take `True` (the default), `False` or `None`, and sets the behaviour of Cypher when an element doesn't have the queried property. In a real example, it looks like:


lookup = Q("name", istartswith="william")
williams = gdb.nodes.filter(lookup)

The complete list of lookup options is in the documentation. And lookups can be as complicated as you want.


lookups = (
    Q("name", exact="James")
    & (Q("surname", startswith="Smith")
       | ~Q("surname", endswith="e"))
)
nodes = gdb.nodes.filter(lookups)

The `filter` method, added to nodes and relationships, can take an extra argument `start` in order to set the `START` clause, instead of using all the nodes or relationships (`node(*)`). The `start` parameter can be a mixed list of integers and Node objects, a mixed list of integers and Relationship objects, or an Index object.


n1 = gdb.nodes.create()
start = [1, 2, 3, n1]
lookup = Q("name", istartswith="william")
nodes = gdb.nodes.filter(lookup, start=start)

index = gdb.nodes.indexes.create(name="williams")
index["name"]["w"] = n1
nodes = gdb.nodes.filter(lookup, start=index)
nodes = gdb.nodes.filter(lookup, start=index["name"])

Or using just the index:


nodes = index.filter(lookup, key="name", value="w")

Also, all filtering functions support lazy loading when slicing, so you can safely take slices of huge graph databases, because internally the `skip` and `limit` Cypher options are applied before running the query.
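
For example, a slice over a filter only fetches the requested window of results from the server; a small sketch based on the behaviour just described, reusing the lookup defined above:

lookup = Q("name", istartswith="william")
# Only elements 100 to 199 are requested, via Cypher skip/limit
some_nodes = gdb.nodes.filter(lookup)[100:200]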

Finally, just a mention of the ordering method, which allows you to order ascending (the default) or descending, simply by chaining calls.


from neo4jrestclient.constants import DESC

nodes = gdb.nodes.filter(lookup)[:100]
nodes.order_by("name", DESC).order_by("age")

And that's all. Let's see what the future has in store for Neo4j and the Python neo4j-rest-client!


Filed under Topics

Making Sense of Teaching Computer Science Tools to Linguists

As a part of the PhD program, I am required to design and defend two different courses: one intended for undergraduate students and another for grads. I don't know yet if both can be for graduate students, but my first course will be. It is going to be about some useful tools that any experimental, and not just theoretical, linguist should know. As of today, we are getting more and more accustomed to hearing terms like digital humanists, digital history, or digital whatever. There is even a sub-discipline (if we can call it that) named Computational Linguistics. However, it seems to me like two ancient rival soccer teams: what a traditional linguist does is pure Linguistics, but what a computational linguist does is just the future. Again, the everlasting fight between old and new ways of doing things, when what really should matter is what your questions are, and what tools you have to find the answers. That's why I am proposing this course: to teach tools and new ways to think about them, and to make students able to solve their own research problems. And, what is even more important, to make them lose the fear of experimenting with Computer Science and Linguistics.

That being said, I am going to show you a very early draft of my intended syllabus. Every week will have a one-hour class, in order to explain and introduce concepts, and another two-hour lab class in which to test, experiment with and expand on the subjects covered that week.

  1. Computer Architecture. One of the most important things that virtually anybody should know is how a computer actually works. In order to understand what is possible and what is not, one needs to know the common components of almost any current computer, like RAM, CPU, GPU, hard drives, input/output operations and devices, etc. A brief introduction to the existing types of computers will also be given. Once you know how a machine is built, you can control and understand things like whether you have enough memory to run your programs, why a certain file freezes your computer when loading, and so on.
  2. Fundamentals of Programming. The first thing to note is that almost everything inside your computer is a program. And I will say more: a significant number of processes in your everyday life are pretty similar to computer programs. The order you follow when you take a shower, that awesome recipe for cooking pumpkin pie, the steps you take before starting the engine of your car, or the movements you make while dancing. All of them are quite similar to computer programs or, better said, to algorithms. A computer program is a set of instructions that a machine runs one by one. Those programs are usually algorithms, in the sense that they are steps to achieve an output given a specific input. Very likely, an introduction to coding using something like pseudo-languages, flowcharts, or NetLogo will be given.
  3. Programming Languages. A brief introduction to programming languages and why they are the way they are. Some linguists are really accustomed to dealing with language peculiarities; however, natural languages seem to answer to the hypothesis of Universal Grammar, as argued by the generative grammar studies of Noam Chomsky. A grammar is a set of rules, but in natural languages, unlike formal languages like those used to code, the set of rules is usually huge. Fortunately, programming languages are built using a small set of rules, and the grammars that describe them can be, according to Chomsky, classified by the way they generate sentences. We could even say that studying and learning how to program is like understanding how another language works. So, in the end, programming languages are just a kind of language: constructed languages. And instead of learning how the brain works in order to understand them, you have to know how machines do. Once this is clear, it is time to meet Python.
  4. Writing Code. After the introduction to Python, students have to learn what well-structured code looks like. Concepts like loops, flow control statements, functions and parameters will be taught. Then it is the moment to show them what libraries are and how to create their own. Finally, notions of Object-Oriented Programming (OOP) in Python will be shown, just in order to guide them in the use of objects and third-party libraries. Regrettably, a hard-core lesson like this one is really needed in order to expand the skills of the students for facing real-life problems in their research. Thinking exactly the way machines do is the best way to code efficiently.
  5. Python Libraries. After getting some basic knowledge about programming, some third-party libraries will be presented. In particular, the commonly used Natural Language Toolkit, or simply NLTK. This library is a leading platform for building Python programs to work with human language data. It provides easy-to-use interfaces to over 50 corpora and lexical resources such as WordNet, along with a suite of text processing libraries for classification, tokenization, stemming, part-of-speech (POS) tagging, parsing, and semantic reasoning.
    The scientific Python extensions scipy, numpy, pylab and matplotlib will also be introduced, but only briefly, because visualization is covered in Week 8.
  6. R Language. R is a language and environment for statistical computing and graphics. Its syntax and usage differ a bit from Python's, and it is much more focused on manipulating and analyzing data sets. Although R has built-in functions and libraries for almost any measure, there is a very active community behind it that provides even more: the Comprehensive R Archive Network, or CRAN. Learning how to use it, and where to find the function that does exactly what you want, is as important as knowing the language. R provides a wide variety of statistical techniques (linear and nonlinear modelling, classical statistical tests, time-series analysis, classification, clustering, …) and graphical techniques, and is highly extensible for any research purpose. This week will just cover basic statistical concepts like population, sample, frequency, and measures of center (mode, median, mean), spread (range, interquartile range, variance, standard deviation) and shape (symmetry, skewness, kurtosis).
  7. Statistics. More advanced features will be introduced, covering not just how to calculate them, but when and why. ANOVA, chi-square, the Pearson coefficient and regression are some of them. Knowing the meaning of statistical measures is really important to understand your data and what is happening with them. However, the point will always be how to get these measures working in R, rather than discussing their deeper theoretical aspects.
  8. Plotting. Both R and Python have powerful tools to represent and visualize data. Unfortunately, the syntax of these functions can be a little tricky and deserves a whole week to be explained. Producing good charts and visualizations of your data can be a crucial step in getting your research properly understood. That's why this week will introduce different methods to plot data: bar charts, pie charts, scatter plots, histograms, heatmaps, quadrilateral meshes, spectrograms, stem plots, cross-correlation plots, etc.
  9. Regular Expressions. As defined in Wikipedia, "a regular expression provides a concise and flexible means to 'match' (specify and recognize) strings of text, such as particular characters, words, or patterns of characters." The origins of regular expressions lie in automata theory and formal language theory. These fields study models of computation and ways to describe and classify formal languages. In a formal definition, we have Σ, an alphabet of symbols, and the constants: the empty set, the empty string, and literal characters. The operations of concatenation, alternation and the Kleene star (and cross) define regular expressions. However, we will use the POSIX Basic Regular Expressions syntax, because it is the most widely used and the easiest to learn. Of course, it does include boolean "OR", grouping and quantification. Linguists can benefit from regular expressions because they are a tool that allows filtering and discovering data. Let's say we have a set of words in Spanish and we want to extract all the conjugations of the verb "jugar" (to play). The regular expression "^ju(e?)g[au].*$" will return the proper matches (see the short sketch after this list).
  10. CHAT Format. CHAT is a file format for transcribing conversations in natural language using plain text. Although it was first designed for transcribing children's speech, it is actually a quite powerful format for any kind of transcription. It allows you to register parts of sentences, omissions, speakers, dates and times, and more features, including some from phonology. There is even an application that translates CHAT XML files into Phon files.
    Besides, because CHAT files are actually plain-text files, the Python Natural Language Toolkit already has a parser for them, so we can do certain kinds of analysis using just Python and our already acquired skills with regular expressions.
  11. CHILDES CLAN. Besides the specification of the CHAT format (.cha), CHILDES provides a set of tools for working with transcripts. One of them is CLAN, the most general tool available for transcription, coding, and analysis. Over a corpus of texts written in CHAT format, CLAN provides methods such as headers, gems, comments, and postcodes that can be used for some qualitative data analysis (QDA).
  12. Presentations. Although at the beginning this week was intended to introduce PRAAT and CHILDES Phon for the analysis of phonological data, I later thought that it would be even more useful to have students present their projects to the class for comments and feedback. So, this week won't have any new content, but students will have to comment on and critique their classmates' work.
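
As a quick illustration of the regular expression mentioned in week 9, here is a minimal Python sketch applied to a handful of word forms (the word list is made up for the example):

import re

# Conjugated forms of "jugar" (to play) plus an unrelated verb form
words = ["jugar", "juega", "jugamos", "jugaban", "cantamos"]

pattern = re.compile(r"^ju(e?)g[au].*$")
matches = [word for word in words if pattern.match(word)]

print(matches)  # ['jugar', 'juega', 'jugamos', 'jugaban']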

Quite interesting, not only from a linguistic point of view, but also for a computer scientist who enjoys teaching computer science to others. Fortunately, the linguist Prof. Yasaman Rafat accepted to be my co-supervisor for the linguistic side of this course. However, because the course bridges two disciplines, I still needed a computer scientist to guide me in the process of teaching non-technical students. Fortunately, I asked the Biology and Computer Science (and Python expert) Prof. Mark Daley, and he kindly said yes. So now there is no obstacle at all to making this course a real thing :)


Filed under Tasks, Topics

Creating a Globe of Data (revisited for Programming Historian Second Edition)

Module Goals

After seeing the basics of Python and how it can help us in our daily work, we will introduce one of the many options for visualizing data. In this case, we will take a data source in CSV format and process it to transform it into JSON notation. Finally, we will represent all the information on a world globe designed for modern browsers using the WebGL technology. During the process, we will need to get the spatial coordinates of countries across the world. Before starting, you can see the final result of this unit on World Poverty, so don't be afraid of all the new names mentioned above; we will explain them below.

The Globe of Data

Since the end of 2009, some browsers have started to implement an incipient specification for rendering 3D content on the Web. Although it is not yet part of the W3C's specifications –the W3C is the organization that proposes, defines and approves almost all Internet standards–, WebGL, as it is called, is supported by all major browsers and the industry.

WebGL is the most recent way to produce 3D representations on the Web, so with WebGL a new form of data representation becomes available. In fact, there are artists, scientists, game designers, statisticians and many others creating amazing visualizations from their data.

Google WebGL Globe

One of these new forms of representation was made by Google. It is called the WebGL Globe and allows you to show statistical geo-located data.

JSON & World Coordinates

JSON, an acronym for JavaScript Object Notation, is not only a format to represent data in Javascript, the language of the browsers; it is also the data type that the WebGL Globe needs in order to work. In this format, a list is enclosed between brackets, "[" to start and "]" to end. Therefore, the data series for the WebGL Globe is a list of lists. Every one of these lists has two elements: the first one is the name of the series and the second one is another list containing the data. Although it is good to know how JSON lists are encoded, there are libraries for Python that do that conversion for you, so you only have to handle pure Python objects.

>>> import json

>>> json.dumps([1, 2, 3])
    '[1, 2, 3]'

>>> json.dumps({"key1": "val1", "key2": "val2"})
    '{"key2": "val2", "key1": "val1"}'

The data for the WebGL Globe is written comma separated, so you must arrange your information in sets of three elements: the first is the geographical coordinate for latitude, the second one is the same for longitude, and the third one is the value of the magnitude you would like to represent, normalized between 0 and 1. This means that if we have the magnitude values 10, 50 and 100, they will have to be translated into 0.1, 0.5 and 1.

Briefly, "a geographic coordinate system is a coordinate system that enables every location on the Earth to be specified by a set of numbers." These numbers are often chosen to represent the vertical and horizontal position of a point on the globe (more precisely, it is even possible to add the elevation). They commonly refer to angles from the equatorial plane, but as far as we are concerned those angles can be transformed into a pair of single numbers with several decimal places.

Latitude and Longitude of the Earth (Source: Wikipedia.org)

The only thing you now need is to split up your data into several series of latitude, longitude and magnitude in JSON format, as the next example illustrates:

var data = [
  [
    'seriesA', [ latitude, longitude, magnitude, latitude, longitude, magnitude, ... ]
  ],
  [
    'seriesB', [ latitude, longitude, magnitude, latitude, longitude, magnitude, ... ]
  ]
];

That said, we can write the data for our globe in pure Python and then convert it into JSON.

>>> data = [
 ...:     ["seriesA", [34.56, -5.23, 0.89, 27.78, 10.56, 0.12, ...]],
 ...:     ["seriesB", [34.56, -5.23, 0.89, 27.78, 10.56, 0.12, ...]],
 ...:     ...
 ...: ]

>>> json.dumps(data)
'[["seriesA", [34.56, -5.23, 0.89, 27.78, 10.56, 0.12, ...]], ["seriesB", [34.56, -5.23, 0.89, 27.78, 10.56, 0.12, ...]], ...]'

The Data Set

Let's say we want to represent information from the Human Poverty Index. The first thing we need is to download the data in the format provided by the United Nations' site for the Multidimensional Poverty Index, which has replaced the old Human Poverty Index. Now that we have a spreadsheet document, it's time to open it and collect just the data we need: go to page 5 of the workbook, and copy and paste the cells into a clean spreadsheet. We remove all the data we don't need, like titles, captions, extra columns, etc., and we leave just the country names, the second "Value" column under the cell "Multidimensional Poverty Index", the population under poverty in thousands, and the "Intensity of deprivation" column. The next step is to remove the rows with no data for those indicators, marked as "..". After doing this, we should have a document with 4 columns and 109 rows.

Spreadsheet before getting coordinates for countries

But, although we have the names of the countries, we still need their geographical coordinates. There are several services that provide the latitude and longitude for a given address; when only the name of a country is given, the main coordinates of its capital are provided. We will use geopy, a Python library able to connect to different providers and retrieve several kinds of information. To install geopy, a terminal or console is needed, and it is very easy with just one command.

$ easy_install geopy

After that, we can open a terminal or an interactive console like IPython and just get the latitude and longitude of, for instance, "Spain", with the following commands:

>>> from geopy import geocoders

>>> g = geocoders.Google()

>>> g.geocode("Spain")
(u'Spain', (40.463667000000001, -3.7492200000000002))

In this way, we can build a list of our countries and pass it to the next script:

>>> from geopy import geocoders

>>> g = geocoders.Google()

>>> countries = ["Slovenia", "Czech Republic", ...]
>>> for country in countries:
 ...:     try:
 ...:         placemark = g.geocode(country)
 ...:         # The coordinates come back as floats, so convert them to strings before joining
 ...:         print placemark[0] + "," + str(placemark[1][0]) + "," + str(placemark[1][1])
 ...:     except:
 ...:         print country
 ...:
Slovenia,46.151241,14.995463
Czech Republic,49.817492,15.472962
United Arab Emirates,23.424076,53.847818
...

Now, we can select all the results corresponding to the latitudes and longitudes of every country and copy them with Ctrl-C or right-click and copy. Then we go to our spreadsheet, click on the first row of a new column, and paste everything. We should see a dialogue for pasting the data; in it, check the right option so that the values are split at the commas.

Paste the result comma separated

Once this is done, we have almost all the coordinates for all the countries. Still, there could be some locations for which the script didn't get the right coordinates, like "Moldova (Republic of)" or "Georgia". For these countries, and after careful supervision, the best thing to do is to run a few more tries fixing the names (trying "Moldova" instead of "Moldova (Republic of)") or just looking up the location in Wikipedia –for Georgia, for example, Wikipedia provides a link in the information box on the right side with the exact coordinates. When the process is over, we remove the column with the names and sort the columns so that the latitude comes first, the longitude second, and the rest of the columns after that. We almost have the data prepared. After this, we need to save the spreadsheet as a CSV file so that it can be processed by a Python script that converts it into the JSON format the WebGL Globe is able to handle.

Reading CSV Files

A CSV file is a plain-text format for storing tabular data. There are plenty of dialects of CSV, but the most common one prints one row per line with every field separated by a comma. For example, the next table produces the output shown below.

Field 1            | Field 2
Row 1 Value Cell 1 | Row 1 Value Cell 2
Row 2 Value Cell 1 | Row 2 Value Cell 2

And the output will be:

Field 1,Field 2
Row 1 Value Cell 1,Row 1 Value Cell 2
Row 2 Value Cell 1,Row 2 Value Cell 2

And depending on the case, you can choose which character will be used as a separator instead of ",", or just leave the header out. But what happens if I need to print commas? Well, you can escape them or just wrap the entire value in double quotes.

"Row 1, Value Cell 1","Row 1, Value Cell 2"
"Row 2, Value Cell 1","Row 2, Value Cell 2"

And again you may wonder what happens if I need to print double quotes. In that case you can change the quoting character or just escape it with a slash. This is the origin of all the dialects of CSV. However, we are not covering this in such depth; we will focus on reading CSV with Python. To achieve this we use the standard "csv" library and invoke its "reader" method with a file object after opening the file from disk. Once this is done, we can just iterate over every line as a list and store each value in a variable during the iteration.

In our case every line has, in this order, latitude, longitude, the value of the multidimensional poverty index, the value for thousands of people in a poverty situation, and finally the value for the intensity of deprivation. Note that our CSV file has no header, so we do not have to skip the first line. We will use three lists to store the different values of our series and finally, using the json library, we will print a JSON output. The script that processes the CSV file and produces the JSON output is detailed next:

import csv
import json

# Every CSV row is split into a list of values
lines = csv.reader(open("poverty.csv", "rb"))
mpis = []  # Multidimensional Poverty Index
thousands = []  # People, in thousands, in a poverty situation
deprivations = []  # Intensity of Deprivation
for lat, lon, mpi, thousand, deprivation in lines:
    mpis = mpis + [lat, lon, mpi]
    thousands = thousands + [lat, lon, thousand]
    deprivations = deprivations + [lat, lon, deprivation]
output = [
    ["Multidimensional Poverty Index", mpis],
    ["People affected (in thousands)", thousands],
    ["Intensity of Deprivation", deprivations]
]
print json.dumps(output)

And the output must look like:

[
["Multidimensional Poverty Index", ["46.151241", "14.995463", "0", ... ]
...
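
One way to avoid copying that output by hand is to redirect the script's standard output straight into the file; a hypothetical invocation, assuming the script above was saved as poverty_to_json.py:

$ python poverty_to_json.py > poverty.json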

Putting it all together

Now, if we copy that output into a file called poverty.json, we will have our input data for the WebGL Globe. So, the last step is to set up the Globe and the data input file together. We need to download the webgl-globe.zip file and extract the directory named "globe" into a directory with the same name. Into it, we copy our poverty.json file and then edit the index.html in order to replace the occurrences of "population909500.json" with "poverty.json", and make some other additions like the names of the series. Finally, to see the result, you can put all the files on a static web server and browse to its URL. Another option, just for local debugging, is to run the next command from inside the directory itself:

$ python -m SimpleHTTPServer
Serving HTTP on 0.0.0.0 port 8000 ...

And then, go to http://localhost:8000 to see the result.

Globe before normalization

It seems like there is something wrong with two of the series: the population in poverty conditions and the intensity of the poverty. This is because we need to normalize the values in order to get them into the range 0 to 1. To do that, we open our CSV file again as a spreadsheet, calculate the sum of each column that we want to normalize, and then create a new column in which every single cell is the result of dividing the old cell value by the total sum of all the values in the old column. We repeat the process with the other column and replace the old ones with just the values from the new ones. Now, we can run the steps to generate the JSON file and try again.

Now, you can click on World Poverty to see everything properly working.

Suggested Readings

The Python Standard Library Documentation

Lutz, Learning Python

  • Ch. 9: Tuples, Files, and Everything Else


Filed under Topics

Mis estadísticas

Once again, as part of the course, we have been given the task of showing the statistics that Google Analytics collects from our blogs and writing a post explaining some details.

Although the blog has not been alive for long –I started it at the beginning of January–, I think some interesting things can already be seen. The first thing is to show the total visits for the relevant range: 277 visits, with 121 absolute unique visitors, 650 page views, an average of 2.35 pages per visit, more than 4 minutes of time on the site, a 62.09% bounce rate and 43.32% new visits. It doesn't look like great numbers, but let's dig a little deeper.

Relationship between visits and page views

The chart above shows the relationship between unique visits (orange) and page views (blue). Clearly, the days with the most visits coincide with the publication dates of the posts, since that is when they get the most publicity, whether on social networks or in other forums like Yutzu. It also seems to make sense that the number of page views goes hand in hand with the visits, which indicates that those who reach the posts usually browse to another one; an average of 4 minutes on the site is enough to read a post (depending on its length) and then discover a few others. That makes me think that maybe my strategy of linking to the posts on social networks is paying off, but let's see what the data says.

Traffic sources

In other words, despite my outreach efforts, search engines still do not seem effective at finding my content, and Facebook and Twitter combined do not add up to more than the links followed from the CulturePlex blog platform itself, which is far above the rest of the sources. But there is a curious (direct) ((none)) entry, which means that more than 23% of the traffic comes from typing the address of the post or the blog directly into the browser. This is explained by the fact that Yutzu does not make links clickable, so Analytics is not able to guess the origin of those visits. A pity, because then we have no way to distinguish the Yutzu platform from the times the URL really is typed in, which should be the occasions when it is copied and pasted and when I, as the author, type it into the address bar because I already know it.

This is starting to make sense, then. Let's see how loyal the visitors are and what the new ones do compared to those who return.

Types of visitors

It seems that Canada is ahead of the rest of the countries, like Italy or Spain, in new visitors. So much for my joy: most of my social network contacts live in Spain and they are not even a significant sample in the statistics. They barely read me, I presume, no more than 24 seconds, despite being the second most frequent visitors to the blog. Besides these, and some marginal and unfaithful visits from Mexico or Russia (a sure bounce: the Russians spend the same amount of time on the blog as my fellow Spaniards), there are a couple of curious outliers. The first is a new visitor from an unknown region who was above average in reading time, and the second is, I presume, from the same unidentified region, perhaps someone who came back and read the whole blog, because that region that not even Google knows about spent more than half an hour reading. It is also possible that Analytics got confused trying to identify which region it is and grouped them all together, giving information that can be misinterpreted, so it is better not to take this kind of misinformation too much into account.

But what is actually being read?

Content summary

It seems that my most-read posts are the following:

  1. / (the blog home page)
  2. La retórica del código.
  3. Datos que amenazan.
  4. Miedo mainstream.
  5. De humanistas, crisis y WikiLeaks.
  6. «No puedo parar de crear, soy un creador».
  7. Creatividad espectacular.

The first of these is far ahead of the second and third in visits, and another order of magnitude above the rest, so I conclude that the blog home page is the most visited, and then come the posts, which get more or less attention depending on when they are written, since the visits come mostly from my classmates. Perhaps I should revise my outreach strategy, since it does not seem to have improved over time but rather declined, given that the newest posts appear in the lowest positions.

Also noteworthy is the fact that some of the entries contain the word preview, which means that there are posts with practically the same number of visits as edits made by me while creating them.

Finally, and as a curiosity, only two searches have been made on my blog, one for the string «JL» and another for «Humanista». Both came from Internet Explorer on Windows. What were they hoping to find?

Searches


Filed under Topics

Creatividad espectacular

In the last class, which goes back a couple of weeks –before the reading week that I, as a worker, could not enjoy–, Prof. Suárez screened a good part of the documentary Dans le ventre du Moulin, by Mariano Franco and Marie Belzil, about the work directed by Robert Lepage for the 400th anniversary of the founding of Québec, the hometown of the multifaceted Canadian.

As if it were the product of an ordinary company, Lepage establishes a well-structured hierarchy and meticulous planning, and delegates tasks quite naturally to the people he trusts, who, for their part, always seem to agree with the corrections and indications he makes, since he supervises practically all of the work. Unlike in the case of Ferrán Adriá, there is no clear distinction between the composer and the performers, to use the chef's own terminology. Everyone involved in the project is both a worker and a creator, everyone contributes, and the sense of community and equality with Lepage himself is palpable from the beginning of the documentary. One could think that Lepage does nothing, that he only directs the creativity of his team. From my point of view, that does not make him any less impressive. Leading such a large and multidisciplinary team cannot be easy, even less so considering the difficulty of maintaining motivation and enthusiasm among so many people with such different dedications. In this case, rather than being a disciple of a great master, each member of the project contributes their grain of sand and feels part of the final result. A style of people management considerably more advanced than that of the Catalan chef.

I presume it is easier to be creative in collaboration than alone, and that does not make isolated creators any more or less worthy. The more strange and unfamiliar people you have around you, the easier it will be to bring different visions to the same domain and to offer new and innovative approaches. This lets us dig into the origin of creativity, since by changing the context of a place and projecting it onto the contexts of foreign places, or by importing the latter into the workplace, one can naturally see which pieces fit and which do not. This innovation-oriented view of creativity could not survive either without a strong component of social impact, applicability and "reproducibility", something Lepage seems to know a bit about, since he is managing to sidestep the problem of the authorship of ideas by creating a brand around his own name. A strategy that is more than intelligent, indeed necessary, as artists of the stature of Pink Floyd (not long ago I was lucky enough to see Roger Waters' The Wall show), Jean Michel Jarre (with his impressive live performance at the pyramids of Giza) or even the French duo Daft Punk, who marked a turning point in live electronic music, have already demonstrated in the past.

I could not say whether creativity depends on prior aesthetic conditioning or whether it promotes such conditioning according to the impact it achieves; in the latter case the dynamics of art would be subordinated more to advertising and propaganda than to the criteria and judgements of audiences or the original intention of the authors. Art is dynamic and ever-changing, so how is it possible that it keeps moving me in almost the same terms? The forms, the languages and the manifestations have changed, but the end result seems to remain the same: conveying the virtuality and the intentional state of the artist. I leave aside the debate between culture and the culture industry, since the latter conditions us in consumer society and trains us to react in certain ways to certain works.

But returning to the decontextualization of the local, a clear example of how it greatly boosts creativity can be found in the device Microsoft created to turn the human body itself into a gamepad for its Xbox 360 console, with no need for add-ons or external controllers: the Kinect. Practically from the moment the device hit the market it was hacked (by a Spaniard, in fact) and the new open drivers were made available to anyone who wanted to use what they had legitimately purchased for a purpose other than the one the distributing company had intended. I do not think it is even necessary any more to link to or show the work that developers, researchers and artists around the world are carrying out on top of this device. Such has been its success that Microsoft itself has already announced the release of an SDK to make it easier to develop and create with Kinect, and the company that manufactures it, PrimeSense, has licensed the technology to another of the sector's big players so it can be distributed in other settings, specifically through Asus and a device they will probably call Xtion.

And what happens if we want to import Kinect into a setting like academia? Ideas like Élika's or Diego's come up. Building on the little we know about Rayuella155!, one possible scenario is one in which we can visualize a chapter of the work (taking advantage of the fact that chapters tend not to be very long) and, through gestures, mark words or groups of words with which to compose a mental map of what is being told, and then check how other readers have marked the text and built their own reading maps. There may be a relationship between some features of the maps and the chapters readers have previously read, so it would be interesting to try to extract the metaphors encoded in those graphs and swap them for others, in order to determine what makes different people mark the same chapter differently, or whether these maps depend on their prior reading. It is just an idea: offer it almost as a game, but one that yields some kind of result.
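To make the idea slightly more concrete, here is a minimal Python sketch of how those reading maps might be compared once the gesture-marked words have been collected. Everything in it is an invented assumption: the reader names, the marked words and the choice of Jaccard similarity are only placeholders for whatever Rayuella155! might actually record.

    # Hypothetical sketch: compare how different readers mark the same chapter.
    # Each "reading map" is reduced here to the set of words a reader marked.
    from itertools import combinations

    def jaccard(a, b):
        """Similarity between two sets of marked words (1.0 = identical, 0.0 = disjoint)."""
        if not a and not b:
            return 1.0
        return len(a & b) / len(a | b)

    # Invented example data: words three readers marked in the same chapter.
    reading_maps = {
        "reader_1": {"maga", "puente", "encuentro", "azar"},
        "reader_2": {"maga", "azar", "lluvia"},
        "reader_3": {"puente", "ciudad", "encuentro"},
    }

    # Pairwise comparison: which readers "read" the chapter most alike?
    for (name_a, words_a), (name_b, words_b) in combinations(reading_maps.items(), 2):
        print("{} vs {}: {:.2f}".format(name_a, name_b, jaccard(words_a, words_b)))

Running it prints a similarity score for every pair of readers, which is the kind of signal one could then try to relate to what each of them had read before.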

3 Comments

Filed under Topics

"I can't stop creating, I am a creator"

Last week's class basically consisted of watching the documentary "Un día en elBulli", which recounts the inner workings and implications of working in a factory of creativity applied to cooking; as an activity, each of us was asked to write a blog post about creativity and the humanities.

For a lover of surreal humour like me, it is hard to watch a documentary about Ferrán Adriá and not recall, again and again, the masterful sketch by Muchachada Nui, worthy heirs of the masters Monty Python. Especially when the cook (which is, after all, what he fiercely insists he is) proclaims that he cannot stop creating, that he is a creator, or that he is the composer and the poor kitchen hands mere performers. A good analogy, no doubt, but perhaps a touch arrogant. I will leave the video here for you to enjoy, though I warn you that towards the end he may ramble a little.

Humour, a social conduit since at least 1900 BC (or so says Paul McDonald of the University of Wolverhampton), and seemingly for as long as language has existed, is an important part of our lives. Not only because of the physiological effects laughter appears to have as a consequence of humour, but even in an almost existential sense. Nietzsche said that "man suffers so deeply that he had to invent laughter"; sadly, there are those who are already too dazed even to laugh instead of crying, in Joseph Klatzmann's definition. In his "Análisis de la Comicidad", José Serra Masana argues that comedy also induces catharsis in the spectator through laughter and distance, but in a way inversely proportional to how tragedy does so with compassion and tears. Obviously, any expert on comedy can supply the appropriate nuances on this subject, of which I only relay what I read.

What matters, beyond the supposed purifying effect achieved, is the multitude of ingredients in the formula of humour: from verbal humour, capable of handling all existing rhetoric, mastering it and bending it to its own ends, to the situational humour of pies in the face and pratfalls. Old actors then enter new stages with a whole arsenal of uses. The oxymoron, for example, takes on new life in Internet cartoons and memes. And myriads of versions of a supposed original are even generated, by the procedure of copying a pattern and changing the details, on sites like 4chan or tumblr. These kinds of sites, where content in bad taste tends to abound, sometimes of a sexual nature, and almost always treating the absurd with a strange set of self-imposed rules nobody quite knows the reason for, bring together all the elements of dark and very politically incorrect humour.

Something like this has happened to me a thousand and one times

There does not seem to be a kind of humour that satisfies everyone equally, since that depends largely on each individual's sensibility or even, if you push me, on their need to fit into some larger group. The curious thing, however, is that in both absurd humour and (very) incorrect humour the same creative patterns seem to appear over and over again and, like a genetic algorithm, they rely on their own evolution through slight mutations to create new instances of the jokes, which will be better contextualized or perceived as new, though not original. Something that reminds me far too much of what I wrote last week about hipsters. How much must something be modified so that it does not provoke rejection? How much can a Troll face joke be changed and still be "funny"? To what extent is absurd humour creative?

Hard questions to answer, since answering them would require some way of measuring, and, as far as I know, humour is not measurable. In the end it seems that the content of this post is not very related to the humanities. I suppose this kind of humour is not deemed worthy of study, or does not attract enough interest. Yet studying the mechanisms by which moods can be induced in human beings through a multimedia experience is basically what humour on the Internet does. In my humble opinion, quite an appealing subject.

In the library...

1 Comment

Filed under Topics