Calculating an average face

Example of average face

In the comings and goings of my thesis, now that I am certain that my main topic was way too big (my supervisor liked it, but he says that it was more like a whole research curriculum than a single thesis), I am trying to focus and cover only a fragment of the initial goal. Big Culture still sounds in my head since that October of 2012 when I first thought of the idea. However, my thesis won't be a monograph anymore, but a set of articles related to Big Data in the Humanities.

One of these articles is already in process, and its topic is related to the representation of faces in world painting. An abstract has been sent to DH 2014, hosted at the University of Lausanne, Switzerland. After successful preliminary work for DH 2013, this time I have been working on a deeper analysis of our proudly collected data set of 47k faces in paintings across time. As part of the research process, and as usual in any paper conceived in the CulturePlex, there is some programming involved. In this case a lot of matplotlib, scipy, numpy, IPython, and Pandas (and even PIL/Pillow), a set of Python libraries and tools that has quickly become our main stack for data analysis.

One interesting challenge that came out of this research was the generation of an average face. The first thing I noticed was that my machine was not able to handle that amount of images due to its limited RAM (4GB), so I asked SHARCNet for help, and Prof. Mark Daley kindly offered me a user account on one of the 32GB Mac machines available. I managed to get IPython installed with all the libraries inside a virtualenv, copied all the files needed, and then started the Notebook.

$ ipython notebook --pylab inline --no-browser --ip=<YOUR PUBLIC IP HERE> --NotebookApp.password=`python -c "from IPython.lib import passwd; print(passwd())"`

Among the features that I have available for a face (after applying face detection algorithms), there is the centroid of the face. From that point, and using the height and width as well, I can trace a rectangle that delimits the boundaries of the face. Then I center all the faces by their centroid and resize all the images to have the same height. In order to calculate the average face, I first implemented a solution that made use of opacity/alpha levels in matplotlib, but that seems to be limited to 256 layers (I don't know if it can be increased) and works pretty slowly, consuming all the resources of the machine really fast. After trying some other methods, I came up with the idea that an average image is as simple as a standard statistical mean calculated for every single pixel. The images were in the RGB color model, so the corresponding matrices had 3 dimensions. If I had used grey-scale images, the whole process would have been 3 times faster, although for the sizes of images that I am handling (faces of 200 by 200 pixels), there is almost no difference. A simplified version of the code used is shown below, although it is subject to performance improvements.
def face_detail(face):
    mode = 'RGB'
    desired_height = 250
    center_at = [400, 400]
    img = faces.load_image(face)
    features = faces.load_features(face)
    center_pct = (features["center_x_pct"], features["center_y_pct"])
    height = features['height']
    width = features['width']
    painting_height = features['painting_height']
    painting_width = features['painting_width']
    # Resizing
    pil_img = PILImage.fromarray(img, mode)
    resize_height = 1.0 * painting_height * desired_height / height
    resize_width = 1.0 * painting_width * desired_height / height
    resized_img = pil_img.resize((int(resize_width), int(resize_height)))
    # Shifting
    shift_point = [
        center_at[1] - (center_pct[1] * resize_height / 100.0),
        center_at[0] - (center_pct[0] * resize_width / 100.0),
        0,
    ]
    shifted_img = ndimage.shift(resized_img, shift_point, mode='constant', cval=255)
    # Cropping
    xlim = slice(int(center_at[0] * 0.5), int(center_at[0] * 1.5))
    ylim = slice(int(center_at[1] * 0.5), int(center_at[1] * 1.5))
    cropped_img = shifted_img[xlim, ylim]
    return cropped_img


def get_average_face(faces):
    imgs = []
    center_at = [400, 400]
    for index, face in faces.iterrows():
        try:
            img = face_detail(face)
            if img is not None:
                # Adding images
                array_img = np.array(img)
                array_img.resize(center_at + [3])
                imgs.append(array_img)
        except Exception as e:
            msg = "Error found when processing image {}:\n\t{}"
            print(msg.format(face, e))
    # Averaging
    avgface = np.array(imgs).mean(axis=0)
    avgface = avgface.astype(np.uint8)
    return avgface


fig, ax = plt.subplots(1, 1, figsize=(10, 10), dpi=300, facecolor="none")
average_face = get_average_face(faces)
ax.imshow(average_face, interpolation='bicubic')

Some other problems still need to be addressed, e.g. face rotations. The use of affine and projective transformations can solve that, as well as replace the method of resizing and shifting used to re-center all the faces.
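As a sketch of the rotation idea, the angle between the two eyes could be used to deskew each face before averaging. The function below is only an illustration, not part of the actual pipeline: the eye coordinates, the function name, and the fill value are assumptions, and depending on the coordinate origin the sign of the angle may need flipping.

```python
import numpy as np
from scipy import ndimage

def deskew_face(img, left_eye, right_eye):
    """Rotate a face image so the line between the eyes is horizontal.

    `left_eye` and `right_eye` are (x, y) pixel coordinates. This is a
    sketch: the sign convention may need flipping depending on whether
    the image origin is at the top-left or bottom-left.
    """
    dx = right_eye[0] - left_eye[0]
    dy = right_eye[1] - left_eye[1]
    angle = np.degrees(np.arctan2(dy, dx))
    # reshape=False keeps the output the same size as the input,
    # which matters when the faces are later stacked and averaged.
    return ndimage.rotate(img, angle, reshape=False, mode='constant', cval=255)
```

With the faces deskewed this way, the per-pixel mean can be computed exactly as before.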
The second course that I have to design

I was browsing co-workers' blog posts when I realized that I had to pick a topic for my second course, a requirement for the program. This time the course can be designed for the graduate level (the first one actually was as well, though). In the last months, I have spent a considerable amount of time reading about big data, data mining, machine learning, and statistical analysis, as well as art history, women's rights movements, and the representation of body parts. All of them feed my current research on the human representation of the face in world painting, which is expected to materialize first in an abstract for DH 2014, and later in the first chapter of my dissertation. The second and third chapters of my thesis may include an authorship attribution study of a very famous Spanish novel, and a computer-based sentiment and meter analysis of a specific kind of poetry plays. All this work is being carried out thanks to extensive documentation and reading of primary and secondary sources, as well as by dealing with considerable amounts of data generated mainly ad hoc for these purposes. In the process, I started to follow a certain workflow: 1) data collection and curation, 2) data cleansing, 3) automatic annotation of metadata, 4) data formatting, and finally 5) data analysis employing a varying set of tools and concepts borrowed from Computer Science. Consequently, that made me think that my second course for this PhD will be on Data for Humanities Research. So, let's talk to my supervisor to see if he is as happy as I am with this topic 😀

Computer Tools for Linguists

My last blog entry was about a new course I would like to design and teach. But some of you were thinking, "hey, what happened to the other course, the one about linguists?" And you are right. Not one word since then. But now, I am finally ready to release the course to the world.
I will defend on April 11th, but all the content is ready. So, here you go (it is only the content; exercises, syllabus, and defense document are not included… until I pass the defense). The course is a hands-on and pragmatic introduction to the computer tools that can be used in Linguistics: from basic computer concepts, to give an understanding of how machines work, to applications, programming languages, and programs that can make the life of the linguistics researcher a little bit easier. The course will cover subjects such as computer architecture, programming languages, regular expressions, the general-purpose language Python, the statistical language R, and the set of tools for child language analysis, CHILDES. Feel free to report any errors!

Thinking about teaching web development to humanists

Now that my first course on computer tools for linguists is on its way (I already have almost half of the lessons designed), it is time to think about the next one. The CulturePlex laboratory is, so far, a multidisciplinary environment where people have backgrounds in different fields, mixing Computer Science-like disciplines with Humanities-like disciplines. However, with the rise of computers in every aspect of our lives, programming literacy is increasingly becoming a demanded skill. So much so that how-to-code courses are now a must for researchers across Academia. In the field of Humanities, people are using computer tools to formulate new questions as well as to solve both the new ones and the old ones. This new trend is usually called and marketed as Digital Humanities, but the term is now under discussion, to such an extent that some people even consider the discussion itself to be Digital Humanities. But more than that, it is really about the crystallization of the needs of both current and future researchers. Therefore, our goal is to stop being just multidisciplinary, and start being a poly-disciplinary laboratory.
And in order to fill this gap, my second course addresses the needs of digital humanists through an intensive intersession course. This course will cover all the aspects of web development, from scratch and zero knowledge of programming, to pretty complex web sites with some logic and even persistence in relational databases. The name, Web Development From Scratch for Humanists, says it all. After finishing this course, students will be able to take an arbitrary data set from their investigations and build a query-able website to show the data to the world. To that end, a preliminary outline for this course is shown below:

Week 1
• Day 1: Introduction to Computers and Architecture
• Day 2: Programming Languages and Python. Conditionals, Loops, and Functions
• Day 3: Data Types. Recursion
• Day 4: Libraries and Object-Oriented Programming

Week 2
• Day 1: Internet and the Web
• Day 2: Frameworks. Introduction to Django
• Day 3: Views and Templates
• Day 4: HTML Fundamentals

Week 3
• Day 1: CSS
• Day 2: Introduction to Javascript
• Day 3: jQuery and AJAX
• Day 4: Bootstrap and D3.js

Week 4
• Day 1: Introduction to Relational Databases
• Day 2: Schemas and Models
• Day 3: Decorators and User Authentication
• Day 4: Migrations

Week 5
• Day 1: REST Interfaces
• Day 2: Agile Integration
• Day 3: Git and Version Control Systems
• Day 4: Test-Driven Development

And also, as an experiment, we will probably be running this course in the lab, just to see if it is too ambitious and simply unrealistic, or, on the contrary, something that we can achieve with a lot of effort. Time will tell.

Making Sense of Teaching Computer Science Tools to Linguists

As a part of the PhD program, I am required to design and defend two different courses: one intended for undergraduate students and another one for grads. I don't know yet if both can be for graduate students, but my first course will be.
It is going to be about some useful tools that any experimental, and not just theoretical, linguist should know. Nowadays, we are getting more and more accustomed to hearing terms like digital humanities, digital history, or digital whatever. There is even a sub-discipline (if we can call it that) named Computational Linguistics. However, it seems to me like two ancient rival soccer teams: what a traditional linguist does is pure Linguistics, but what a computational linguist does is just the future. Again, the everlasting fight between old and new ways of doing things. When what really should matter is what your questions are, and what tools you have to find the answers. That's why I am proposing this course: to teach tools and new ways to think about them, and to enable students to solve their own research problems. And, what is even more important, to make them lose the fear of experimenting with Computer Science and Linguistics. That being said, I am going to introduce you to a very early draft of my intended syllabus. Every week is going to have a one-hour class, in order to explain and introduce concepts, and another two-hour lab class in which to test, experiment with, and expand the subjects previously covered that week. 1. Computer Architecture. One of the most important things that virtually anybody should know is how a computer actually works. In order to understand what is possible to do and what is not, it is necessary to know the common components of almost any current computer, like RAM memory, CPU, GPU, hard drives, input/output operations and devices, etc. Also, a brief introduction to the existing types will be given. Once you know how a machine is built, you can control and understand things like having enough memory to run your programs, why this file freezes my computer when loading, and so on. 2. Fundamentals of Programming. The first thing to note is that almost everything inside your computer is a program.
And I will say more: a significant number of the processes in your everyday life are pretty similar to computer programs. The order you follow when you are taking a shower, that awesome recipe for cooking pumpkin pie, the steps you take before starting the engine of your car, or the movements you make while dancing. All of them are quite similar to computer programs, or better said, to algorithms. A computer program is a set of instructions that a machine runs one by one. Those programs are usually algorithms, in the sense that they are steps to achieve an output given a specific input. Very likely, an introduction to coding using something like pseudo-languages, flow diagrams, or NetLogo will be given. 3. Programming Languages. A brief introduction to programming languages and why they are the way they are. Some linguists are really accustomed to dealing with language peculiarities; however, natural languages seem to answer to the hypothesis of Universal Grammar, as argued by the generative grammar studies of Noam Chomsky. A grammar is a set of rules, but in natural languages, unlike formal languages like those used to code, the set of rules is usually huge. Fortunately, programming languages are built using a small set of rules, and the grammars that describe them can be, according to Chomsky, classified according to the way they generate sentences. We could even say that studying and learning how to program is like understanding how another language works. So, in the end, programming languages are just a kind of language: constructed languages. And instead of learning how the brain works in order to understand them, you have to know how machines do. Once this is clear, it is time to meet Python. 4. Writing Code. After the introduction to Python, students have to learn what well-structured code looks like. Concepts like loops, flow control statements, functions, and parameters will be taught.
Then it is the moment to show them what libraries are and how to create their own. Finally, notions of Object-Oriented Programming (OOP) in Python will be shown, just in order to guide them in the use of objects and third-party libraries. Regrettably, a hard-core lesson like this one is really needed in order to expand the skills of the students facing real-life problems in their research. Thinking exactly the way machines do is the best manner to code efficiently. 5. Python Libraries. After getting some basic knowledge about programming, some third-party libraries will be presented. In particular, the commonly used Natural Language Toolkit, or simply NLTK. This library is a leading platform for building Python programs to work with human language data. It provides easy-to-use interfaces to over 50 corpora and lexical resources such as WordNet, along with a suite of text processing libraries for classification, tokenization, stemming, part-of-speech (POS) tagging, parsing, and semantic reasoning. The scientific Python extensions scipy, numpy, pylab, and matplotlib will also be introduced, but briefly, because visualization is covered in Week 8. 6. R Language. R is a language and environment for statistical computing and graphics. Its syntax and usage differ a bit from Python, and it is much more focused on manipulating and analyzing data sets. Although R has built-in functions and libraries for almost any measure, there is a very active community behind it that provides even more: the Comprehensive R Archive Network, or CRAN. Learning how to use it, and where to find the function that does exactly what you want, is as important as knowing the language. R provides a wide variety of statistical techniques (linear and nonlinear modelling, classical statistical tests, time-series analysis, classification, clustering, …) and graphical techniques, and is highly extensible for any research purpose.
This week will just cover basic statistical concepts like population, sample, frequency, and measures of center (mode, median, mean), spread (range, interquartile range, variance, standard deviation), and shape (symmetry, skewness, kurtosis). 7. Statistics. More advanced features will be introduced: not just how to calculate, but when and why. ANOVA, Chi-square, the Pearson coefficient, and linear regression are some of them. Knowing the meaning of statistical measures is really important to understand your data and what is happening with them. However, the point will always be how to get these measures working in R, instead of discussing their deeper theoretical aspects. 8. Plotting. Both R and Python have powerful tools to represent and visualize data. Unfortunately, the syntax of these functions can be a little tricky and deserves a whole week of explanation. Producing good charts and visualizations of your data can be a crucial step in getting your research properly understood. That's why this week will introduce different methods to plot data: bar charts, pie charts, scatter plots, histograms, heatmaps, quadrilateral meshes, spectrograms, stem plots, cross correlation, etc. 9. Regular Expressions. As defined in Wikipedia, "a regular expression provides a concise and flexible means to 'match' (specify and recognize) strings of text, such as particular characters, words, or patterns of characters." The origins of regular expressions lie in automata theory and formal language theory. These fields study models of computation and ways to describe and classify formal languages. Given a formal definition, we have Σ, an alphabet of symbols, and constants: the empty set, the empty string, and literal characters. The operations of concatenation, alternation, and the Kleene star (and cross) define regular expressions. However, we will use the POSIX Basic Regular Expressions syntax, because it is the most used and easiest to learn.
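The measures of center and spread listed above can be illustrated in a few lines with Python's standard library (the course uses R for this, but the idea is identical; the sample below is a toy example):

```python
import statistics

sample = [2, 4, 4, 4, 5, 5, 7, 9]  # a toy sample, just for illustration

center = {
    "mean": statistics.mean(sample),      # 5.0
    "median": statistics.median(sample),  # 4.5
    "mode": statistics.mode(sample),      # 4
}
spread = {
    "range": max(sample) - min(sample),        # 7
    "variance": statistics.pvariance(sample),  # population variance: 4.0
    "stdev": statistics.pstdev(sample),        # population std. deviation: 2.0
}
```

Note that `pvariance`/`pstdev` treat the data as the whole population; `variance`/`stdev` would be the sample (n-1) versions.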
Of course, it does include boolean "OR", grouping, and quantification. Linguists can benefit from regular expressions because they are a tool that allows filtering and discovering data. Let's say we have a set of words in Spanish and we want to extract all the conjugations of the verb "jugar" (to play). The regular expression "^ju(e?)g[au].*$" will return the proper matches.
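The same pattern can be tried out in Python, whose `re` syntax is close enough to POSIX for this expression (the word list is invented for illustration). One caveat worth noticing: forms like "juego" slip through the character class, since "o" is not in [au].

```python
import re

pattern = re.compile(r"^ju(e?)g[au].*$")

words = ["juega", "jugamos", "jugar", "jugaba", "jamón", "grande"]
matches = [w for w in words if pattern.match(w)]
# matches -> ['juega', 'jugamos', 'jugar', 'jugaba']
```

A slightly wider class such as [auoué] would also capture "juego" and "jugué", at the cost of a few false positives.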
10. CHAT Format. CHAT is a file format for transcribing conversations in natural language using plain text. First designed for transcribing children's speech, it is actually a quite powerful format for any kind of transcription. It allows registering parts of sentences, omissions, speakers, dates and times, and more features, including some from phonology. There is even an application that translates CHAT XML files to Phon files.
Besides, because CHAT files are actually plain text files, the Python Natural Language Toolkit already has a parser for them, so we can do certain kinds of analysis using just Python and our already acquired skills with regular expressions.
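As a minimal sketch of that idea, the main tiers of a CHAT transcript (lines starting with "*") can be pulled out with a regular expression; the tiny transcript below is invented, and for serious work NLTK's CHILDES corpus reader over the CHAT XML files is the better route:

```python
import re

# A tiny invented fragment in CHAT style: main tiers start with "*",
# dependent tiers (like %mor) with "%".
transcript = """\
*CHI:\tmore cookie .
%mor:\tqn|more n|cookie .
*MOT:\tyou want more cookies ?
*CHI:\tyes .
"""

main_tier = re.compile(r"^\*(?P<speaker>[A-Z]{3}):\t(?P<utterance>.+)$")

utterances = [
    (m.group("speaker"), m.group("utterance"))
    for line in transcript.splitlines()
    if (m := main_tier.match(line))
]
# utterances -> [('CHI', 'more cookie .'), ('MOT', 'you want more cookies ?'), ('CHI', 'yes .')]
```

From there, simple counts per speaker or token frequencies are one list comprehension away.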
11. CHILDES CLAN. Besides the specification of the CHAT format (.cha), CHILDES is intended for the study of child language data. One of the tools it provides is CLAN, the most general tool available for transcription, coding, and analysis. Over a corpus of texts written in CHAT format, CLAN provides methods such as headers, gems, comments, and postcodes that can be used for some qualitative data analysis (QDA).
12. Presentations. Although at the beginning this week was intended for introducing PRAAT and CHILDES Phon for the analysis of phonological data, later I thought that it would be even more useful to present students' projects to the class for commenting and getting feedback. So, this week won't have any new content, but students will have to comment on and critique their classmates' works.

Quite interesting. Not only from a linguistic point of view, but also for a computer scientist who enjoys teaching computer science to others. Fortunately, the linguist Prof. Yasaman Rafat accepted to be my co-supervisor for the linguistic side of this course. However, because the course bridges two disciplines, I still needed a computer scientist to guide me in the process of teaching non-technical students. Fortunately, I asked the Biology and Computer Science (and Python expert) Prof. Mark Daley, and he kindly said yes. So now there is no obstacle at all to making this course a real thing.

Final Post: Gamex and Faces in Baroque Paintings

Face recognition algorithms (like those used in digital cameras) allowed us to detect faces in paintings. This has given us the possibility of having a collection of faces from a particular epoch (in this case, the Baroque). However, the results of the algorithms are not perfect when applied to paintings instead of photographs. Gamex gives us the chance to clean this collection. This is very important since these paintings are the only historical visual inheritance we have from the period. A period that started after the meeting of two worlds.

1. Description

Gamex was born from the merging of different ideas we had at the very beginning of the Interactive Exhibit Design course. It basically combines motion detection, face recognition, and games to produce an interactive exhibit of Baroque paintings. The user interacts with the game by touching, or more properly poking, the faces, eyes, ears, noses, mouths, and throats of the characters in the painting. We score them depending on whether or not there is a face already recognized at those points. Beforehand, the database holds a repository with all the information the face recognition algorithms have detected. With this idea, we will be able to clean the mistakes that the automatic face recognition has introduced.
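At its core, scoring a poke reduces to a point-in-rectangle test against the stored detections. The sketch below illustrates that idea only; the function name, the box format, and the scoring values are invented, not the real Gamex rules:

```python
def poke_score(point, detected_faces):
    """Return 1 if `point` (x, y) falls inside any detected face box,
    else -1. `detected_faces` is a list of (x, y, width, height) boxes,
    as a face detector might report them.
    """
    px, py = point
    for (x, y, w, h) in detected_faces:
        if x <= px <= x + w and y <= py <= y + h:
            return 1
    return -1

faces_in_painting = [(100, 50, 80, 80), (300, 60, 70, 90)]
poke_score((120, 90), faces_in_painting)   # inside the first box -> 1
poke_score((10, 10), faces_in_painting)    # background -> -1
```

Pokes that land where the detector saw nothing are exactly the signal we want: either the user missed, or the detector did.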

The Gamex Set

2. The Architecture

A Tentative Architecture for Gamex explains the general architecture in more detail. Basically we have four physical components:

• A screen. Built with a wooden frame and stretch fabric, onto which the images are projected from the back, and which the user interacts with by poking it.
• The projector. Just to project the image from the back onto the screen (rear screen projection).
• Microsoft Kinect. It captures the deformations of the fabric and sends them to the computer.
• Computer. Captures the deformations sent by the Kinect device and translates them into touch events (similar to mouse clicks). These events are used in a game to mark different parts of the faces of people in Baroque paintings. All the information is stored in a database, and we are going to use it to refine a previously calculated set of faces obtained through face recognition algorithms.

3. The Technology

Several important pieces of technology were involved in this project.

Face Recognition

Recent technologies offer us the possibility of recognizing objects in digital images. In this case, we were interested in recognizing faces. To achieve that, we used the libraries OpenCV and SimpleCV. The second one just allowed us to use OpenCV from Python, the glue of our project. There are several posts in which we explain this technology, and how we used it, in a bit more detail.

Multi Touch Screen

One of the biggest parts of our work involved working with multi-touch screens. Probably because it is still a very new technology where things haven't settled down that much, we had several problems, but fortunately we managed to solve them all. The idea is to have a rear projection screen using the Microsoft Kinect. Initially conceived for the video-game system Microsoft Xbox 360, there are a lot of people creating hacks (such as Simple Kinect Touch) to take advantage of the ability of this artifact to capture depth. Using two infrared lights and arithmetic, this device is able to capture the distance from the Kinect to the objects in front of it. It basically returns an image in which each pixel is the distance from the object to the Kinect. All sorts of magic tricks can be performed, from recognizing gestures or faces to detecting deformations in a piece of fabric. This last idea is the heart of our project. Again, there are some posts explaining how (and how not) to use this technology.
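The deformation-to-touch step can be sketched in a few lines of numpy: compare each depth frame against a baseline of the flat screen, and report the pixels that moved away by more than a threshold. The numbers and names below are invented for illustration; the real pipeline goes through Simple Kinect Touch and TUIO rather than raw arrays:

```python
import numpy as np

def touch_points(depth, baseline, threshold=15):
    """Find touch points in a Kinect-style depth image.

    `depth` and `baseline` are 2-D arrays of distances (e.g. in mm)
    from the sensor to the fabric; a poke pushes the fabric away, so
    the distance grows at the poked pixels.
    """
    # Pixels where the fabric moved away by more than `threshold`.
    pushed = (depth - baseline) > threshold
    ys, xs = np.nonzero(pushed)
    return list(zip(xs.tolist(), ys.tolist()))

baseline = np.full((4, 4), 1000.0)   # flat screen at 1 m
depth = baseline.copy()
depth[2, 1] = 1030.0                 # a poke at column 1, row 2
touch_points(depth, baseline)        # -> [(1, 2)]
```

In practice the neighbouring pixels of a poke would also cross the threshold, so a real version would cluster them into a single touch point.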

Calibrating the multi-touch screen

Games

Last but not least, Kivy. Kivy is an open source framework for the development of applications that make use of innovative user interfaces, such as multi-touch applications. So, it fits our purposes. As programmers, we have developed interfaces on many different types of platforms, such as Java, Microsoft Visual, Python, C++, and HTML. We found Kivy to be very different from anything we knew before. After struggling for two or three weeks, we came up with our interface. The thing about Kivy is that it uses a very different approach which, apart from having its own language, the developers claim to be very efficient. In the end we started to like it, and to be fair, it has been out there for just one year, so it will probably improve a lot. Finally, it has the advantage that it is straightforward to make a version for Android and iOS devices.

4. Learning

There has been a lot of personal learning in this project. We had never used the three main technologies involved before. We also included a relatively new NoSQL database system called MongoDB. So that makes four different technologies. However, Javier and I agree that one of the most difficult parts was building the frame. We tried several approaches: from using my loft bed as a frame, to a monstrously big frame (with massive pieces of wood carried from downtown to the university on my bike) that the psycho duck would bring down with the movement of its wings.

It is also interesting how ideas change over time; some of them we probably forgot. Others we tried, and they didn't work as expected. Most of them changed a little bit, but the spirit of our initial concept is in our project. I guess the creative process is a long road between a driving idea and the hacks to get to it.

5. The Exhibition

Technology fails on the big day, and on the day of the presentation we couldn't get our video working, but ThatCamp is coming soon. A new opportunity to see users in action. So the video of the final result, although not public yet, is attached here. More will come soon!

6. Future Work

This has been a long post, but there are still a few more things to say. And probably much more in the future. We liked the idea so much that we are continuing to work on it, and we would like to mention some ideas that need to be polished and some pending work:

• Score of the game. We want to build a better system for scores. Our main problem is that the data we have to score against is incomplete and imperfect (who always has the right answers anyway?). We want to give a fair solution to this. Our idea is to work with fuzzy logic to lessen the damage in case the computer is not right.
• Graphics. We need to improve our icons. We consider some of them very cheesy, and they need to be refined. Also, we would like to adapt the size of the icon to the size of the face the computer has already recognized, so the image would be adjusted almost perfectly.
• Sounds. A nice improvement, but also a lot of work to put together a good collection of MIDI or MP3 files if we don't find any publicly available.
• Mobile versions. Since Kivy offers this possibility, it would be silly not to take advantage of it. After all, we know addictive games are the key to entertaining people on buses. This would turn the application into a real crowdsourcing project, even if it implies building a better system for storing the information following REST principles, with OAuth and API keys.
• Cleaning the collection. Finally, after gathering enough data, it would be the right time to collect the faces and have the first repository of "The Baroque Face". This will give us a spectrum of what people from the 16th to the 18th centuries looked like. Exciting, isn't it?
• Visualizations. We will also be able to do some interesting visualizations, like heat maps of where people touched when marking a mouth, or an ear, or a head.
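The fuzzy scoring mentioned in the first point could look something like this: instead of a hard hit-or-miss, the score decays smoothly with the distance from the detected face centre, so an imperfect detection does less damage. The Gaussian fall-off and the names below are only one possible sketch of the idea, not the final scoring rule:

```python
import math

def fuzzy_score(point, face_center, face_radius):
    """Score a poke between 0 and 1 by how close it lands to a
    detected face. A direct hit scores 1.0; one face-radius away
    scores about 0.37, and far misses tend to 0.
    """
    dx = point[0] - face_center[0]
    dy = point[1] - face_center[1]
    distance = math.hypot(dx, dy)
    return math.exp(-(distance / face_radius) ** 2)

fuzzy_score((100, 100), (100, 100), 40)  # direct hit -> 1.0
fuzzy_score((140, 100), (100, 100), 40)  # one radius away -> ~0.37
```

The attraction of a soft score is that a poke near a slightly misplaced detection still earns the player something, which keeps the game fair while the data is being cleaned.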

7. Conclusions

In conclusion, we can say that the experience has been awesome. Even better than that was seeing the really high level of our classmates' projects. In all honesty, we must say that we have a background in Computer Science, so we played with somewhat of an advantage. Anyway, the presentation of all the projects was an amazing experience. We really liked the course and we recommend it to future students. Let's see what the future has prepared for Gamex!

Some of the projects

This post was written and edited together with my classmate Roberto, so you can also find it on his blog.

Building the proper screen

The last step in the project, after we were able to overcome all the technical difficulties (like the Kivy language), was the building of a suitable screen for our purposes, that is, a poke-able rear projection screen. Doing this, we avoid the problem of calibrating the Kinect device each time and for each user, and foremost, we only have to do the setup once.

The first attempt of a rear screen

Our first attempt was building a very big frame and using a table cover or bed sheet as the screen. But we found several serious problems:

1. The frame was too big to move.
2. The frame wasn't rigid enough, and with the interaction of the users it got deformed.
3. The screen, after a user interaction, never got back to normal and stayed deformed forever.

Among all the problems (and others not mentioned here), the last one was totally frustrating, because the whole platform depends on the stability of the screen. If the screen is not perfectly flat, the Kinect will detect that, and it will send touch points to the Kivy application when there are actually none. Everything was wrong.

Choosing the most beautiful tissue

The alternative was to use a stretch fabric with the capacity to recover its initial shape after virtually any number of interactions, even hard punches. But we didn't know where to buy that, or if it was even cheap enough for our students' pockets. Fortunately, Prof. William Turkel recommended Fabric Land to us, with three locations in the city and a lot of options for fabrics. I must say that the place was a bit weird for me, with a bunch of middle-aged ladies looking for good materials. I felt like Mrs. Doubtfire there. Finally, one girl, very gentle, young, and nice, helped us to find what we wanted, and she sold it to us at the price of $5 per meter!

Colours pins! That's the most decorative we can be

And with all the raw material ready, we got down to work and built, after several tries, the proper rear projection stretch screen. In the very first test we discovered that the accuracy was amazing. And the most interesting thing: somehow, the fact that the screen is elastic invites interaction from the users and keeps people playing. So, we can say: mission accomplished!

A Tentative Architecture for Gamex

The other day, during the weekly class, I realized there is no document explaining the architecture of our project. Roberto and I have been talking a lot about it, but we never wrote anything detailed from a technical point of view. In past entries I talked about the idea behind the project, called Gamex, and how it is going to work. Because we are really excited about the project, sometimes we lose ourselves in the universe of coding (and fighting with Kivy) and we forget to document technical issues. So I hope to fix that with the current blog post.

General Architecture of Gamex

The architecture of the project, as depicted in the image above, has different parts that I’m going to describe by illustrating a complete application cycle:

1. In a previous phase, in order to save time and increase performance, we pre-process a set of Baroque paintings, applying different techniques to recognize the faces in them. All the data is stored as JSON files for the metadata and JPEG files for the images.
2. Then the user, I mean, the audience of the exhibition, walks in front of the screen.
3. The screen, thanks to a projector, shows a slideshow of Baroque paintings and calls for action, demanding interactivity, maybe with some fancy blinking text or similar. Each phrase corresponds to a different game and a different data-collection process. These are the games we are currently developing (it is very easy to extend them to collect data on, for example, just the virgins, the saints, the children, etc.):
• “Hey! Punch these people in the face,” to collect information from the position of the heads in the painting.
• “Dude, better if you get these people’s eyes out,” to collect the pair of points related to the position of the eyes.
4. When the user punches or touches the screen, a change in the depth of the screen is produced. This alteration is observed by a Microsoft Kinect device.
5. The signal is encoded as a point or set of points using SKT and sent to a TUIO server.
6. The main application, written in Kivy, listens to that server. When a point is received by the TUIO server, the Kivy application translates it into a mouse click in the application frontend.
7. At that moment, the information about the point is stored in a MongoDB NoSQL database, actually a document store. We create something similar to a table (a collection) for every image, with two different lists:
1. The first one stores a list of points and a timestamp for the face-punching game.
2. The second one is intended for the eye-poking game and stores pairs of points.
8. If the point provided by the user falls inside an area previously calculated in our metadata, we assign a high score to provide some feedback to the user. If the point is new, we mark that information and show a different score.
9. When the number of points the user has given is similar to the number of faces found in the preprocessing stage, the slideshow advances to the next image and the cycle starts again.
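Steps 7 and 8 of the cycle can be sketched in plain Python. The structures below just stand in for the MongoDB lists, and the face boxes, field layout and scoring values are illustrative assumptions, not our actual schema:

```python
from time import time

# Precomputed metadata for one painting: face bounding boxes
# as (x, y, width, height). These sample boxes are made up.
FACE_BOXES = [(120, 80, 60, 60), (300, 90, 55, 55)]

# Stand-ins for the two per-image lists kept in MongoDB.
punch_points = []   # [(x, y, timestamp), ...] for the punching game
eye_pairs = []      # [((x1, y1), (x2, y2)), ...] for the eye-poking game

def inside(point, box):
    """Return True if point falls inside the (x, y, w, h) box."""
    px, py = point
    x, y, w, h = box
    return x <= px <= x + w and y <= py <= y + h

def register_punch(point, hit_score=10, miss_score=1):
    """Store a punch and score it against the known face boxes."""
    punch_points.append((point[0], point[1], time()))
    if any(inside(point, box) for box in FACE_BOXES):
        return hit_score   # the user hit a face we already knew about
    return miss_score      # a new point: maybe a face our algorithms missed

print(register_punch((130, 95)))   # → 10, lands in the first box
print(register_punch((10, 10)))    # → 1, misses every box
```

Low-scoring points are exactly the interesting ones for us: they are the candidates for faces the detection algorithms missed, and they feed the second game.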

The second game is a way to tune the information we are collecting. For that game, Gamex is going to show paintings together with information about where the faces are supposed to be. With the positions of the faces in the painting and the positions of the eyes, we can calculate the size of each head and even create a probability heat map of where the faces are. Finally, we will be able to enhance the algorithms used to detect faces.
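Such a heat map can be approximated by binning every collected point into a coarse grid. This is only a sketch of the idea, with an arbitrary grid size:

```python
# Accumulate user-provided points into a coarse grid to build a
# probability heat map of where faces tend to appear.
GRID = 10  # number of cells per side (an arbitrary choice)

def heat_map(points, width, height, grid=GRID):
    """Return a grid x grid matrix of point frequencies, normalized to sum 1."""
    counts = [[0.0] * grid for _ in range(grid)]
    for x, y in points:
        col = min(int(x * grid / width), grid - 1)
        row = min(int(y * grid / height), grid - 1)
        counts[row][col] += 1
    total = sum(sum(row) for row in counts) or 1
    return [[c / total for c in row] for row in counts]

# Three punches on a 400x400 painting; the first two fall in the same cell
hm = heat_map([(50, 50), (60, 55), (390, 390)], 400, 400)
print(hm[1][1], hm[9][9])  # → 0.6666666666666666 0.3333333333333333
```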

Of course, this project is pointless unless we can attract a lot of users while requiring only minimal effort on their side to interact. Let’s see what we are able to do.

Faces in Baroque Paintings

As a part of our project, we have to provide a way for the user to point out the faces in a series of Baroque paintings. These faces are previously calculated using face recognition algorithms and SimpleCV, a simplified API for OpenCV that, of course, doesn’t work on my 64-bit machine. In fact, there are different methods, each with advantages and drawbacks, used at the same time. But the algorithms are not perfect: in general they miss some “obvious” faces while wrongly tagging things like dresses as faces.
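The detected faces end up in the per-painting JSON metadata files mentioned in the architecture above. A minimal sketch of what such a file might look like (the field names and values here are my own assumptions for illustration, not the real schema):

```python
import json

# Hypothetical metadata for one painting: each detected face is a
# bounding box plus the centroid used later by the games.
metadata = {
    "image": "painting_0001.jpg",
    "faces": [
        {"x": 120, "y": 80, "width": 60, "height": 60, "centroid": [150, 110]},
        {"x": 300, "y": 90, "width": 55, "height": 55, "centroid": [327, 117]},
    ],
}

def centroid(face):
    """Recompute the centre of a face's bounding box."""
    return (face["x"] + face["width"] // 2, face["y"] + face["height"] // 2)

# Round-trip through JSON, as if reading the metadata file from disk
loaded = json.loads(json.dumps(metadata, indent=2))
print(len(loaded["faces"]))  # → 2
```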

Intro screen

Above these lines you can see the initial screen shown before starting to play. In the future we are thinking about including more than one game. The first one is focused on identifying faces through the input of the users. The second one could be, for example, a game to filter that previous selection: once we have faces selected and extracted, an Invaders-like game in which users have to shoot only at faces of a certain kind. Very useful to filter and provide even more information for our algorithms.

However, we need to ask the user for the most minimal interaction possible, so we expect just one single touch on the screen. And this raises a problem: how, with a single point on the screen, do we get the face the user is trying to point at? To resolve this we are exploring two approaches.

The first one is to hope that users will draw a circle around the heads or faces of the characters in the painting. But this goes against our main goal of requesting as few interactions as possible. Our idea is that users can punch the screen where they think faces are, not draw circles around them. We need to think of another approach.

How the algorithms work so far.

So the next step is to make the input from the users even more useful, and we decided on the creation of a second game. On the one hand, we have previously detected some faces, so we can calculate how close each single touch is to an already recognized face. But we don’t have this information for new faces that our algorithms missed. If this happens, all we have is a point in the painting and no way to figure out whether it belongs to a face. In this scenario, we mark these points for a second stage in our process: a second game in which we show the users these images properly cropped and ask them to poke the eyes out. Then we have the approximate centre of the face (the previous single points), the current position of the eyes, and the median size of the heads our algorithms did detect before: we can now triangulate the shape of the face/head. In this way we filter a lot of information just by making the users play a different game. The common feature in both games? The violence: punching and poking eyes out!
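That triangulation could be sketched as follows. The fallback proportions (inter-eye distance to head size) are rough assumptions for illustration, not values derived from our data:

```python
def estimate_head_box(punch, eyes, median_head=None):
    """Estimate a head bounding box from a punch point (rough centre)
    and a pair of eye positions.

    punch: (x, y) single touch near the centre of the face
    eyes: ((x1, y1), (x2, y2)) positions of the two eyes
    median_head: optional (width, height) from already detected heads
    """
    (x1, y1), (x2, y2) = eyes
    eye_dist = ((x2 - x1) ** 2 + (y2 - y1) ** 2) ** 0.5
    if median_head is not None:
        w, h = median_head
    else:
        # Assume a head roughly 2.5 eye-distances wide and 3 tall
        w, h = 2.5 * eye_dist, 3.0 * eye_dist
    cx, cy = punch
    return (cx - w / 2, cy - h / 2, w, h)  # (x, y, width, height)

# A punch at (150, 110) and eyes 20 px apart
box = estimate_head_box((150, 110), ((140, 100), (160, 100)))
print(box)  # → (125.0, 80.0, 50.0, 60.0)
```

When a median head size is available from the faces the algorithms did detect, it takes priority over the eye-distance heuristic.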

First Proof of Concept: Multi-touchable Surface with Kinect, SKT and Kivy

Slowly but surely. That’s our mantra in this project, since it implies a lot of coding and handling very different technologies, some of them –I really mean all of them– in very early stages of development. Anyway, we finally decided to implement our multi-touchable surface using a projector, a Kinect device and the libraries SKT and Kivy. As the next video shows, we are making some progress, but we are developing slower than we expected.

And finally, our first proof of concept with everything working together. It’s a very basic project that is able to handle several interactions at the same time. At the moment, it only allows drawing lines and identifying how many different contact points there are, using one colour for each.

Behind the scenes, what happens is described below:

1. Simple Kinect Touch (SKT) allows us to define the boundaries of the projected screen.
2. We adjust a bunch of depth-related parameters in order to focus only on the coordinates that mean something is touching the screen.
3. Then, that information is transformed and sent, according to the TUIO protocol, to a local server. Now we have a streaming service of data about the touches and movements on the screen.
4. At this point, we run our Kivy client application, which we call gamex, setting the input interface to a TUIO server instead of the mouse.
5. Black magic.
6. Profits!
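Step 4 mostly comes down to configuration: TUIO is a built-in Kivy input provider, so it is enough to declare it in the `[input]` section of `~/.kivy/config.ini`. Port 3333 is the conventional TUIO default; adjust the address to wherever SKT is streaming:

```ini
[input]
# Listen for TUIO cursors coming from SKT on the default TUIO port
tuioscreen = tuio,0.0.0.0:3333
# Keep the mouse around for debugging on the development machine
mouse = mouse
```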

And that’s it. We really hope to have time to focus on the development of the machine learning application now that the most difficult technical issues seem almost solved. However, projecting onto a bed sheet or tablecloth is going to be kind of traumatic for setting the parameters of the Kinect recognition layer. We will need a very rigid wooden frame or something similar to make the surface as smooth as we can.
