April 20, 2008

A Course on Recommender Systems

Last week I gave an internal course on Recommender Systems at Telefonica. Although I only had 12 hours, I ended up preparing a course that could well stretch over at least 24 hours of class. Except for some details, I am pretty happy with the syllabus I came up with. Just in case it might be of help to anyone, this is the index of the course:

PART I. Introduction to Recommender Systems
  1. The paradox of choice
  2. What is a Recommender System?
    1. The recommender problem
    2. General scheme of a RS
    3. Tools of the trade
  3. Approaches to Recommendation
    1. Collaborative Filtering
      1. User-based
      2. Item-based
    2. Memory-based
    3. Content-based
    4. Other approaches
      1. Demographic Methods
      2. Utility Methods
      3. Knowledge-based
    5. Hybrid approaches
      1. Weighted
      2. Switching
      3. Mixed
      4. Feature Combination
      5. Cascade
      6. Feature Augmentation
  4. Evaluating RS
  5. Personalized Search
PART II. Data mining for RS
  1. Introduction
    1. Why mine data?
    2. Data mining tasks
  2. Data Preprocessing
    1. Types of data
    2. Problems with data
    3. Aggregation
    4. Sampling
    5. Reducing dimensionality (SVD)
    6. Feature Selection
    7. Discretization and Binarization
    8. Variable Transformation
    9. Feature Selection
  3. Distance Measures
  4. Classification
    1. General Approach
    2. Decision Trees
    3. Rule-based
    4. Nearest-Neighbor
    5. Bayesian Classifiers
    6. Artificial Neural Networks
    7. Support Vector Machines
    8. Ensembles of classifiers
    9. Issues in classifiers
      1. Model Overfitting
    10. Evaluation of Classifiers
    11. Comparing Classifiers
    12. Metrics for classifiers
  5. Cluster Analysis
    1. Introduction
    2. K-means
    3. DBSCAN
    4. Cluster Validation
  6. Association Analysis
    1. Frequent Itemset Generation and the Apriori Principle
    2. Rule Generation
PART III. Designing a RS
  1. Defining the problem
  2. Working with the data
  3. Taking context into account
  4. The decision process
  5. Presenting results
  6. Some notes on domain-specific adaptation

April 01, 2008

The Future of Web Search Workshop

Yahoo! Research Barcelona and the UPF are organizing a very interesting two-day workshop in Andorra starting on Thursday. The workshop is the third in a series that, under the name "Future of Web Search", started in 2006. I will be giving a talk entitled "Search and Recommendation: two sides of the same coin?". Below is the abstract, which should give you an idea of what I will be talking about:

Recently the field of Recommender Systems has gained growing popularity among the research community, with new conferences such as ACM RecSys going into its 2nd edition and established conferences such as SIGKDD or SIGCHI devoting a great deal of attention to the topic.

The Recommendation field started from a different background than web search, namely Data Mining and HCI versus Information Retrieval. While the goal of Recommendation Systems is to optimize a fitness function between content and users by "discovering" hidden relations in the data, Search Engines focus on "retrieving" pre-existing data.

However there are clear trends that point to both fields coming closer together. On the one hand, web search is becoming more and more personalized, highlighting the need for user profiling and collaborative filtering. On the other hand, it is becoming clear that in many cases search strategies are essential for the performance of Recommender Systems.

As a result, some claim that search is just a "simpler form of recommendation", where the fitness function to be optimized is that of a generic average user (e.g. using algorithms such as PageRank). Obviously, statements in the opposite direction can also be made. In this talk we will assume that the audience is familiar with Web Search systems and therefore we will focus on describing the basic techniques and current research trends in Recommender Systems, highlighting where and how they are similar or different. At the end of the talk, we hope to convey the message that the "Future of Web Search is in Recommendation", hoping that such a claim will spark an interesting discussion and debate throughout the workshop.


Here you can see the detailed workshop program with very interesting speakers including Yahoo's own CDO, Usama Fayyad.

March 24, 2008

Call for Students: CLAM in Google Summer of Code 08

(Please help distribute)

We are glad to announce that summer 2008 is also going to be a Summer of Code for CLAM. In other words, CLAM has been accepted as a mentoring organization for the Google Summer of Code, a program that offers student developers stipends of 4500 USD to write code for open source projects.

CLAM (C++ Library for Audio and Music) is a project that aims at developing a full-featured application framework for audio and music applications. It offers a conceptual metamodel as well as many different tools for that particular domain. One of its most relevant features is a visual dataflow application builder that allows rapid prototyping without writing code. The project started 7 years ago and, among other highlights, it won the ACM award for the Best Open Source Multimedia Software in 2006.

Now we are looking for smart students who enjoy coding free software so that they can earn some bucks for the summer. Last year, GSoC 2007 was a very fun and productive experience and we are looking forward to repeating it. Take a look at the CLAM GSoC 2008 wiki page for more information on how to apply and some sample ideas for projects.

We are waiting for you!

Application deadline: March 31

If you have any questions about the information in this announcement, please contact clam-info@iua.upf.edu or join the #clam channel on FreeNode IRC.

March 19, 2008

CLAM in GSoC 2008!


We are glad to announce that summer 2008 is also going to be a Summer of Code for CLAM. Google just announced the list of mentoring organizations for GSoC 2008 and CLAM is in it!

Now we seek smart students who enjoy coding free software so that they can earn some bucks for the summer. Last year, GSoC 2007 was a very fun and productive experience and we are eager to repeat it. Take a look at the CLAM GSoC 2008 wiki page for more information on how to apply and some sample ideas for projects.

We are waiting for you!

soc-clam-flyer_2008 deadline extended

March 18, 2008

Command of the day: hotshot2calltree

Today, the command of the day is hotshot2calltree.

I really like using kcachegrind to tune my C++ code. You can use KCacheGrind to navigate through the actual function calls in a given execution of your program and see very graphically where the time is spent.

Today I needed to optimize some python code, which is not C++ code. No problem. Add this to your code:


import hotshot
# Profile one call of myFunction and store the data in profile.hotspot
prof = hotshot.Profile("profile.hotspot")
prof.runcall(myFunction)
prof.close()

And then, at the shell:

sudo apt-get install kcachegrind-converters
hotshot2calltree -o profile.callgrind profile.hotspot
kcachegrind profile.callgrind

And now you get a nice kcachegrind profile you can navigate.
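
A note for readers on newer interpreters: the hotshot module only exists in Python 2 and was removed in Python 3. The standard library's cProfile offers a similar runcall interface, so the profiling step can be sketched as follows. This is an adaptation under my own assumptions, not part of the original recipe: myFunction is a placeholder workload and profile.pstats is an arbitrary file name.

```python
import cProfile
import pstats


def myFunction():
    # Placeholder workload; replace with the code you want to profile.
    return sum(i * i for i in range(100000))


prof = cProfile.Profile()
result = prof.runcall(myFunction)   # run the function under the profiler
prof.dump_stats("profile.pstats")   # write the collected data to disk

# Quick textual report: the five entries with the highest cumulative time.
stats = pstats.Stats("profile.pstats")
stats.sort_stats("cumulative").print_stats(5)
```

The resulting file is in pstats format rather than hotshot format, so the hotshot converter above will not read it. A separate converter (third-party ones such as pyprof2calltree exist) would be needed to browse it in KCacheGrind.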

March 06, 2008

Preparing for GSoC 2008

GSoC 2008 is already here! We are preparing our submission for CLAM as an organization and I hope we are as lucky as last year. For GSoC 2007 we got 6 fervent students who pushed CLAM a big step forward. We still don't know whether we will be selected as an organization or not. We haven't even filled in the submission form. But it is time to set things in motion. So, what to do now?

If you are an experienced CLAM developer, please consider becoming a mentor. The more mentors the more students we can cope with.

If you are a user, it is time to push your favourite feature into the GSoC project proposals.

If you are a student wanting to be part of the program, I advise you to get involved with the project right now, as we will consider early involvement a big plus for eligibility.

If you are Xavi, Pau or myself, then you should be filling in the CLAM submission instead of blogging ;-)

I love summer.

February 23, 2008

Linux audio tutorials released online


banner_portadas.png

A couple of years ago, Pau Guillamet and I wrote a series of tutorials on Linux audio tools for the mainstream Spanish magazine Personal Computer & Internet.
Well, after all this time the tutorials finally see the light!

Actually, I have had the editors’ permission to release them online for some time, but this last week I had the perfect excuse to work on the formatting (using WiKo, of course) since I gave a seminar on this topic at the ESMUC. A seminar that, by the way, I enjoyed giving very much, and I wouldn’t mind repeating the experience!

Indeed, some applications (Ardour especially, I’d dare say) have changed a lot during this time, while other apps have not changed that much. Anyway, I hope it can be of some use to people willing to get introduced to the power of the Linux audio tools.

Comments on the tutorials can be left as comments on this blog post.

Enjoy!

February 18, 2008

CLAM 1.2 released

Many things have happened between our last two releases, but we finally managed to pull together the 1.2 release, codenamed "the gsocked plugged in release". This release includes all the cool stuff our students from the Google Summer of Code developed. For this reason CLAM was also featured in the GSoC blog.

Congratulations to everyone who worked on this release and special thanks to David who did a great job as release manager for this one.

More news and downloads on the CLAM website.

February 12, 2008

Everything is a Graph (part 2)

In a previous post I talked about how graphical models of computation are being used beyond the "traditional" areas of networking and low-level system modeling. It may come as a surprise that such a rich and useful paradigm has not become as widespread as the object-oriented approach. So now I will discuss whether that assessment is true and what the reasons might be.

Why don't we have graph-oriented programming languages?

First of all, it is interesting to note that object orientation itself comes from graph-oriented (or process-oriented) approaches. It is widely accepted that Kristen Nygaard's Simula language was the first OO language to see the light. However, Simula was (as its name might imply) a simulation language in which the most important concept was the process. Simula did follow a graph-oriented approach and it was only in its later versions (Simula 67 and later) that the idea of "objects" was explicitly presented. So, in some sense, OO can be seen as a generalization of the graph-oriented approach. As a matter of fact, you can understand a graph as a set of interconnected objects called nodes. In the same sense, you can read an OO design in a graphical way, where classes or objects are nodes and relations become the graph edges. You can read more about this interpretation of OO in my thesis.
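
To make the "a graph is a set of interconnected objects" reading concrete, here is a minimal Python sketch. It is only an illustration under my own assumptions: the Node class, the reachable helper and the node names are invented for the example and do not come from any of the frameworks mentioned in this post.

```python
class Node:
    """A graph node is just an object that holds references to its neighbours."""

    def __init__(self, name):
        self.name = name
        self.edges = []          # outgoing connections to other Node objects

    def connect(self, other):
        self.edges.append(other)
        return other             # returned so that calls can be chained


def reachable(start):
    """Return the names of all nodes reachable from `start` (depth-first)."""
    seen, stack = [], [start]
    while stack:
        node = stack.pop()
        if node.name not in seen:
            seen.append(node.name)
            stack.extend(reversed(node.edges))
    return seen


# A tiny processing chain with a side branch (names are invented).
source, filt, sink, meter = Node("source"), Node("filter"), Node("sink"), Node("meter")
source.connect(filt).connect(sink)
filt.connect(meter)

print(reachable(source))
```

Here the object references play the role of the edges, so walking the graph is nothing more than following attributes from one object to the next.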

Furthermore, at some point in time a few people realized that it would make sense to define a graph-oriented paradigm and design languages to support it. This gave birth to the so-called actor-oriented and process-oriented languages. I am unsure why these languages failed so miserably to make it mainstream, but I can safely assume that it was a mix of different factors such as bad PR (who would choose an approach called actor-oriented?), bad implementation, and even bad timing.

However, does that mean that graph-oriented languages have failed as a whole? My answer to this would be a definite NO. What happens is that the graph-oriented paradigm lends itself much better to a graphical (as in based on graphics) representation. Therefore, graph-oriented languages skipped one or two steps in the logical evolution of a programming language: general purpose textual language -> general purpose graphical notation -> domain specific graphical models. The OO paradigm started by producing a large collection of textual languages, then a general purpose graphical notation (UML), and is currently gearing toward domain-specific graphical modeling languages.

Curiously enough, graph-based models jumped directly to the latter, and you can find many examples of domain-specific graphical languages, ranging from some with a broader scope such as Simulink or Ptolemy to some that target a more specific domain such as CLAM, Pd, or GStreamer.

And now, with OO tending to the same place and tools like Metaedit offering ways to quickly develop your graphical DSM, we are seeing how the OO and graph-oriented paradigms are finally coming together again.

So yes, everything is still an object... and almost everything is becoming a graph!

February 11, 2008

Command of the day: kig

Each task has a tool, and graphics and figures are no exception. If I want to edit a photograph or apply some nice effects to an existing image, that's a task for Gimp. If I need some cute artwork for icons, banners, web... I prefer to draw vectors with my beloved Inkscape. When the drawings are less artistic and call for formal figures and diagrams, I normally consider dia, limited but correct. If I want to plot a complex graph, I leave the task to graphviz's dot and just declare the vertices. If I want to plot some program output data, python plus matplotlib is the fastest way to process and render it, and automating the path from execution to visualization is very convenient. Some time ago I used gnuplot for that, but python is more flexible regarding data formats. Often what you need is to explore such data, and interactive visualization with tools such as qtiplot can be revealing.

But my problem today was none of the above: I needed to draw some geometric diagrams (angles, vectors, tangents...) to illustrate some trigonometric equations. I was thinking of QCad, but Kig did the job perfectly. Kig is about geometrical manipulation: intersections, angle transportation, angle measurements, elements bound to one another, python scripting...



I wanted to integrate kig PNG exporting into WiKo figure generation, but I found two show-stopper problems: the command line options for batch exporting do not seem to work, and there is no way to control the viewport other than resizing the window and adjusting the zoom. While the concept of a limited-size canvas exists, I didn't find a way to resize it :-( Anyway, it is worth digging into such a tool, even writing patches to get batch exporting working.

February 07, 2008

CLAM 1.2, the GSoCket plugged-in release


clam12-releasecomposite.png
We are thrilled to announce CLAM 1.2, the “GSoCket plugged-in release”. We had to wait some months to make this release, as we had to redeploy the multiplatform release infrastructure. Thus, the feature buffer of this release is pretty full. It incorporates both the results of the Summer of Code students’ work and the involvement of David and Pau with the crew at the Barcelona Media Foundation Audio Research Line.

We want to thank GSoC students Hernan Hordiales, Bennet Kolasinsky, Greg Kellum, Andreas Calvo, Roman Goj and Abe Kazemzadeh, Google Inc, and the Barcelona Media audio lab members for their precious involvement in CLAM.

A summarized list of changes follows. See also the CHANGES files for details, or the development screenshots for a visual guided tour. As usual binary packages for Windows, MacOSX and several flavors of Linux are available to download.

Summary of changes:

The most exciting feature is the new plugin system (acalvo), which enables third party algorithms to be distributed separately from the core binaries. LADSPA plugin support has been enhanced and a first iteration of FAUST integration has been added. The wiki contains how-to’s that cover most of that.

Most of the GSoC work comes as plugins: an SMS Synthesizer (gkellum), a Voice synthesis/analysis (akazem) and some cool guitar effects (hordia). Also, not included as plugins but in the main repository, several enhancements have been made to the SMS transformations (hordia) and the tonal analysis (rgoj).

Some interesting work has been done at the Barcelona Media Audio Lab on a system to simulate 3D room acoustics which can be reproduced on several exhibition systems. Some precomputed room databases are available to try. Check the wiki NetworkEditor Tutorial for more information.

Regarding the applications, Network Editor incorporates new usability enhancements, a new on-line Tutorial and a new Spectrogram-like view. The Annotator received Bennet Kolasinsky’s attention, which improved the flexibility of its interface; the practical effects are multiple segmentation and low-level descriptor panes, and that we are pretty close to visualization and auralization plugins.

Enjoy.

January 31, 2008

Impressions on Django (II): Development environment

In my previous post, I explained some basic ideas on how Django works. The post mostly explained the programming model, which is a clever implementation of the classical Model-View-Controller pattern. Not that new. But compared to other web development environments I have worked with (PHP, Zope/Plone or plain mod_python), Django is a clear improvement because it is both simple and pragmatic.

The programming model is something that has a direct impact on how you design the product, but there are other factors that determine how easily a developer can work within a platform. In this post I discuss those factors: how Django affects the development workflow.

Editing and controlling source code

Compared to Zope, another python web application environment I've been fighting against, Django is a clear step forward in keeping control of your work.

First of all, the developer has control of the source. Files are stored on the file system! It may seem an obvious assertion, but if you have been using Zope you will know why I am so happy about that: Zope forces you to store all the code in a large binary file which acts as a virtual filesystem. You cannot use a regular version control system such as subversion to track your changes. ...and worse, it forces you to input code in web forms!! Argh!! Django puts an end to such a nightmare. Files are back on the file system and in your preferred editor.

Modularity

Still, Django applications are modular in the same sense as Zope's: you can combine several applications (modules) on a single site. For example, the administration interface is itself an application that can be used to edit any model of any installed application. You can disable it or enable it as a whole or for specific models. There is also an authentication application that can be used to control access across all your application features. Every other application can use the authentication application to obtain information about validated users and to limit access to certain features depending on the user profile. Session management and per-session storage, RSS feeds, SiteMap files, data mining interface... they all come for free, provided as applications. Besides the included applications, a nice application repository is available.

Debug cycle and exploration

Debugging your code is also very straightforward. While developing with Django, you run a development web server unrelated to Apache, which you start from the console and which is not visible from the outside. This enables each developer to have her own instance of the web site. No administration passwords are needed and reloading is easier. Every message you print shows up in the console, so it is ideal for debugging by tracing, compared to plain mod_python, which runs on the Apache server and forces you to monitor the Apache error log. Also, when some code fails with an exception, Django constructs a very handy web page with a lot of information: the back-trace enriched with code context, inspection of parameters and local variables, all the environment values, the request values... All that is available in a collapsible error web page. Lately I've been tempted to import failing python code into Django just to get such a wealth of information.

Also, by running './manage.py shell' you get a python shell executed under the same conditions as your code. Indeed, the 'manage.py' script has very useful subcommands to infer model definitions from existing databases, to see database definitions from models, to import and export test data fixtures...

The next post

So yes, Django is a mod_python with gears and a Zope without the fat. However, while I don't regret choosing Django, it is not that perfect. In my next post, the last one in this series of first impressions on Django, I will explain the problems we found while using the framework: the limitations we hit and how they could be (or are being) improved.

Well, maybe not the next post. Some other topics have appeared while writing this series of posts and they are likely to require an entry.

January 26, 2008

Pay-what-you-listen

It occurred to me that with all these services (such as lastfm, mystrands...) tracking what you listen to during the day we can finally implement a really fair business model for music: pay-what-you-listen. Imagine you paid a fixed subscription price and you were assured that this money would go directly to the artists, divided according to how many tracks from each artist you played during that month. I'd sign up for this service today!!
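
Just to show how simple the arithmetic would be, here is a rough Python sketch of the split. Everything in it is an assumption made for illustration: the function name, the use of integer cents, the rule for handing out leftover cents and the made-up listening history. No existing service's API or payout rule is implied.

```python
from collections import Counter


def split_subscription(fee_cents, plays):
    """Divide a subscriber's monthly fee among artists in proportion to plays.

    fee_cents: fee in integer cents (avoids floating-point rounding).
    plays: iterable of artist names, one entry per track played.
    """
    counts = Counter(plays)
    total = sum(counts.values())
    if total == 0:
        return {}
    # Integer share per artist; leftover cents from rounding down are handed
    # out one by one to the most-played artists so the fee is fully distributed.
    shares = {artist: fee_cents * n // total for artist, n in counts.items()}
    leftover = fee_cents - sum(shares.values())
    for artist, _ in counts.most_common(leftover):
        shares[artist] += 1
    return shares


# Hypothetical month: a 10.00 fee and 8 plays spread over three artists.
history = ["Artist A"] * 5 + ["Artist B"] * 2 + ["Artist C"]
print(split_subscription(1000, history))
```

With 5, 2 and 1 plays out of 8, the 1000 cents split into 625, 250 and 125. The leftover rule only matters when the fee does not divide evenly.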

January 25, 2008

3D audio made with Clam


While it is true that the clam-devel mailing list and irc channel have been a little quiet recently, especially compared with the summer period (well, it was called “summer of code” for a good reason!), this doesn’t mean that development activity has been low recently. (Being an open-source project, the commits say it all.)

The quietness is related to David and me now being involved with the acoustics group of the Fundació Barcelona Media, where we work in a more traditional (and so less distributed) fashion, collaborating with people who actually sit together. Further, I very much enjoy working with such a skilled and interdisciplinary team (half are physicists and half computer scientists), and also seeing that Clam is very useful in these 3D-audio projects. These latest developments on 3D audio rendering were mostly driven by the IP-RACINE European project, which aims to enhance digital cinema.

The kind of development we do in Clam has also changed since last summer. Instead of improving the general infrastructure (for example the multi-rate data-flow system or the NetworkEditor) or improving the existing signal processing algorithms, what we’ve done is… writing plugins. Among many other things, the new plugins feature a new lightweight spectrum and fft, and efficient low-latency convolutions.

And this feels good. Not only because the code-compile cycle is sooo fast, but because it means that the framework infrastructure is sufficiently mature and its extension mechanisms are very useful in practice. Further, rewriting the core spectral processing classes allowed us to do a lot of simplifications in the new code and its dependencies. Therefore, the new plugins only depend on the infrastructure, which I’d dare to say is the most polished part of Clam.

And now that IP-RACINE final project demos have been successfully passed, it is a great time to show some results here.

Flamencos in a virtual loft

Download and watch the video in the preferred format:

demo_ipracine_flamencos-small.jpg

Listen to it carefully through headphones (yes, it will only work with headphones!). You should be able to hear as if you were actually moving in the scene, identifying the direction and distance of each source. It is not made by just automating panning and volumes, but by modeling the room so that it takes into account how the sound bounces off all the surfaces of the room. This is done with ray-tracing and impulse-response techniques.

This stereo version has been made using 10 HRTF filters. However, our main target exhibition set-up was 5.0 surround, which gives a more immersive sensation than the stereo version. So, try it if you have surround equipment around:

Credits: Images rendered by Brainstorm Multimedia and audio rendered by Barcelona Media. And music performed by “Artelotú”.

Well, the flamenco musicians in the video should have been real actors. Ah! Wouldn’t that have been nice?

What was planned

The IP-Racine final testbed was all about integrating work-flows among different technological partners. The whole audio work-flow is very well explained in this video (Toni Mateos speaking, and briefly featuring me playing with NetworkEditor.)

So, one of the project outcomes was this augmented reality flamencos video in a high-definition digital cinema format. To that end a chroma set was set up (as the picture below shows), and it was to be shot with a high-end prototype video camera with position and zoom tracking. The tracking meta-data stream fed both the video and audio rendering, which took place in real-time. All quite impressive!

flamencos_croma-small.jpg
The shooting of the flamenco group “Artelotú” in a chroma set

Unfortunately, at the very last moment a little demon jumped in: the electric power became unstable for a moment and some integrated circuits of the high-end camera literally burned out.

That’s why the flamencos are motionless pictures. Also, in the absence of a camera with a position tracking mechanism, we chose to freely define the listener path with a 3D modelling tool.

How we did it

In our approach, a database of pressure and velocity impulse responses (IRs) is computed offline for each (architectural) environment using physically based ray-tracing techniques. During playback, the real-time system retrieves the IRs corresponding to the source and target positions, performs a low-latency partitioned convolution and smooths IR transitions with cross-fades. Finally, the system is flexible enough to decode to any surround exhibition setup.
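
For readers unfamiliar with the two signal-processing ideas in that paragraph, here is a toy, pure-Python sketch of partitioned convolution and of a cross-fade between the outputs of two IRs. It is not the Clam implementation: it works in the time domain on a whole signal at once (real-time systems typically process each partition with FFTs, block by block), and the function names, block size and sample values are all assumptions made for the example.

```python
def convolve(x, h):
    """Direct (time-domain) convolution of signal x with impulse response h."""
    y = [0.0] * (len(x) + len(h) - 1)
    for i, xi in enumerate(x):
        for j, hj in enumerate(h):
            y[i + j] += xi * hj
    return y


def partitioned_convolve(x, h, block):
    """Convolve x with h by splitting h into partitions of `block` taps.

    Each partition is convolved separately and added at its own delay.
    By linearity the sum equals convolve(x, h). In a streaming version the
    latency is governed by the partition size, not the full IR length.
    """
    y = [0.0] * (len(x) + len(h) - 1)
    for start in range(0, len(h), block):
        partial = convolve(x, h[start:start + block])
        for n, value in enumerate(partial):
            y[start + n] += value
    return y


def crossfade(old, new):
    """Linearly fade from `old` to `new` (equal lengths, at least 2 samples)."""
    last = len(old) - 1
    return [o + (w - o) * k / last for k, (o, w) in enumerate(zip(old, new))]


# Toy data: a short signal and two impulse responses for nearby positions.
x = [1.0, 2.0, 3.0, 4.0]
ir_a = [1.0, 0.0, -1.0, 2.0, 1.0]
ir_b = [0.5, 0.5, 0.0, 1.0, 0.0]

out_a = partitioned_convolve(x, ir_a, block=2)
out_b = partitioned_convolve(x, ir_b, block=2)
blended = crossfade(out_a, out_b)   # starts as out_a, ends as out_b
```

The cross-fade is what avoids audible clicks when the listener moves and the system switches from one IR to the next.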

complete-surround-with-crossfade.png
big_surround_network.png

The audio rendering (both real-time and offline) is done with Clam, while the offline IR calculation and 3D navigation are done with other tools.

The big thanks

This work is a collaborative effort, so I’d like to mention all the FBM acoustics/audio group: Toni Mateos, Adan Garriga, Jaume Durany, Jordi Arques, Carles Spa, David García and Pau Arumí. And of course we are thankful to whoever has contributed to Clam.

And last but not least, we’d like to thank “Artelotú”, the flamenco group that put the duende into such a technical demo.

Lessons for Clam

To conclude, this is my quick list of lessons learnt during the realization of this project using Clam.

  • The highly modular and flexible approach of Clam was very well suited to this kind of research-while-developing. The multi-rate capability and data type plugins were especially relevant.
  • The data-flow and visual infrastructure is sufficiently mature.
  • Prototyping and visual feedback is very important while developing new components. The NetworkEditor data monitors and controls were the most valuable debugging aids.
  • Everybody seems to like plugins!

January 24, 2008

Why opening APIs is not enough: the Facebook vs. Lastfm case

Every day I read of more people who are becoming disappointed with Facebook and its usability... I am one of them. The other day I had a depressing experience when I tried to find an interesting message someone had sent me over the past few weeks. I could not find the message. Was it a wall posting? Or something on SuperWall or FunWall? Or maybe simply a note... or a link. Or a message or anything sent with any of the dozens of applications that I have to have.

Facebook is ok for staying tuned to what your friends are up to. The problem is that in order to do that you are forced to accept the many applications that people end up aggregating. And then it is a complete mess. You have many friends with many different apps, and organizing information in a sensible way is impossible, so you end up reading things you don't care about and possibly missing interesting information. Opening the API made Facebook big but, will it die of success? Opening things up and hoping that they will organize themselves works in some cases (e.g. the web) but in some others it is a recipe for disaster.

Plaxo does a much better job of organizing things and keeping them manageable. However, Plaxo is still too new to guess in which direction it will grow.

Lately I have been becoming a more intensive lastfm user. And I can say that it is becoming my favorite social network. Of course I am an absolute music lover and that makes a difference in this case. But that is not the point. The point is that lastfm is a "focused" and manageable app. You sign in because of music and find friends because of music but you can also add your pre-existing friends and laugh at their bad taste :-) Also lastfm has been constantly adding new features like the recently announced availability of full tracks for free.

You can argue that lastfm is simply a tiny part of what facebook is. So what? It is useful, enjoyable and fun, what else can you ask for? Completeness? If you think so I'd recommend you read Barry Schwartz's The Paradox of Choice: Why More is Less, or watch his great talk at Google.

So instead of a dominating social network like FB aggregating everything I envision dedicated ones (such as lastfm, flixter, mystrands....) becoming more and more popular and aggregating services like Plaxo being used as a common entry into all these different worlds.

In a sense it is a bit like the evolution that took place in Software Framework design. At one point people were trying to build the "one framework for all". Even if that was ever possible, users of such a monster would be unable to understand and use it. The same has happened to all-encompassing metadata standards such as MPEG7, ontologies...

Bottom-up design has always been better than top down approaches and I believe Social Networks will prove no different in that sense.

January 22, 2008

Telefonica Research: Doctoral Researcher position in Recommender Systems

(Thought I'd pass this on as it clearly involves me :-)


The research group on Internet in Telefónica R&D Barcelona invites applications for a Junior Research position in the area of Recommendation Systems.

We are looking for dynamic, creative, and resourceful individuals to join our research efforts in modeling of complex systems and networks related to recommendation engines. Our research impacts all areas of the company, including projects related to IPTV or internet content distribution. The successful candidate will join a multi-disciplinary team of scientists dedicated to advancing and using computational methods to solve challenging user-oriented problems.

The applicant should have a Master's degree in Computer Science, Electrical Engineering, Applied Mathematics, Statistics, or other related scientific disciplines, combined with strong computational modeling and/or algorithmic skills. Knowledge and experience in additional areas such as statistical data analysis, data mining, signal processing, machine learning and pattern recognition, and other topics in artificial intelligence are desirable.

The candidate will carry out both theoretical and applied research leading to a PhD degree in Computer Science and will actively participate in innovative projects related to this area of research in the company. Our research group follows an open research model in collaboration with universities and other research institutions and favors the dissemination of our work both through publications and technology transfer. The successful candidate will also be enrolled in a local university and will be tutored by a Professor in order to obtain the PhD degree. Alternatively, particular agreements with the candidate's original institution are also feasible.

Although this particular position is designed for a doctoral student candidate, the group is also actively seeking postdoctoral candidates in this area. If you are in this situation please do not hesitate to apply.

We offer a competitive salary and benefits and a great working atmosphere in beautiful Barcelona (Spain).

Screening of applications will begin immediately and continue until the position is filled. An initial appointment for a two-year term is anticipated, with the possibility of reappointment.

Inquiries and applications should be sent to

Xavier Amatriain <xar@tid.es>

with the subject line "RSDoc Application"


* Telefónica is a world leader in the telecommunication sector, with presence in Europe, Africa and Latin America. As of March 2007, Telefónica had 206.6 million customers.

Telefónica Research and Development is the innovation company of the Telefónica Group. Owned 100% by Telefónica, this subsidiary was formed in 1988, with the aim of strengthening the Group's competitiveness through technological innovation.

It is the most important private R&D company in Spain, in terms of both activities and resources, and in terms of number of staff, and it is one of the most important companies on the continent as regards participation in European Research projects.

January 20, 2008

Impressions on Django (I)

Over the last few months, I've been involved in a project on (text) information retrieval. The project required developing a web interface over a Python core, and we chose Django as the framework. The other option was plain mod_python, as we did for EfficiencyGuardian. But some time ago, in the context of the SIMAC project, Xavier Oliver, who was responsible for implementing BOCA, CLAM Annotator's collaborative back-end, used Django, and he was very enthusiastic about how quickly he had the whole web side working. So I wanted to give it a try for our current work. Here I am posting my first impressions as a newbie Django user. What does Django have to offer?

Database abstraction


The most important part of Django is a persistent object model which maps Python objects onto database entities, giving you a nice abstraction over the database layer. By defining such model classes, you get an object-oriented programming interface to query and change the related database tables, and even to navigate and join through relations as if they were regular connected objects.

I normally dislike overly transparent interfaces when they deal with efficiency-sensitive things such as database access. But Django states very clearly when and how such access is done, letting you control it while still using a high-level object-oriented idiom.
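
To make the object-to-table idea concrete, here is a toy sketch that uses only Python's standard sqlite3 module. It is not Django's API: the Article class and its create_table, save and filter methods are hypothetical names chosen for illustration. It only shows the general shape of a model class whose instances correspond to table rows, with the database access happening at explicit, visible points.

```python
import sqlite3


class Article:
    """Toy model class: one instance corresponds to one row of the 'article' table."""

    def __init__(self, title, year, id=None):
        self.id = id
        self.title = title
        self.year = year

    @classmethod
    def create_table(cls, conn):
        conn.execute(
            "CREATE TABLE IF NOT EXISTS article "
            "(id INTEGER PRIMARY KEY, title TEXT, year INTEGER)"
        )

    def save(self, conn):
        # The database is written here, explicitly, and nowhere else.
        cur = conn.execute(
            "INSERT INTO article (title, year) VALUES (?, ?)",
            (self.title, self.year),
        )
        self.id = cur.lastrowid
        return self

    @classmethod
    def filter(cls, conn, year):
        # The database is read here; rows come back as Article objects.
        rows = conn.execute(
            "SELECT id, title, year FROM article WHERE year = ? ORDER BY id",
            (year,),
        )
        return [cls(title, yr, id=row_id) for row_id, title, yr in rows]


conn = sqlite3.connect(":memory:")  # throwaway in-memory database
Article.create_table(conn)
Article("Impressions on Django", 2008).save(conn)
Article("Older post", 2007).save(conn)
titles_2008 = [a.title for a in Article.filter(conn, 2008)]
print(titles_2008)
```

A real object-relational mapper generates the SQL from the class definition instead of having it written by hand, but the calling code looks much the same: objects in, objects out.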

Administration interface


You can enable an existing administration module for your site. That is a web interface to create, edit and remove your model objects. You can control the way they are edited by adding extra properties to the attributes of the model classes.

Attributes have more types than real SQL types. Fields can be, for example, URLs, emails, telephone numbers or zip codes, and Django gives you custom validation and custom administration interfaces for them for free.

The administration interface also takes table relations into account, providing, for instance, web interfaces for choosing related objects by foreign key and buttons to add new related objects.

Application logic


Often, as in our case, the administration interface is pretty close to what the final application will be. But direct manipulation of a model is not enough for most web applications. Normally you need some additional application logic: which features are presented to the user, and what dialog the user has with the system to use them.

Three elements are combined to build up such application logic in Django: URL mappings, views and page templates. URL mappings map regular expressions that match the requested URL to calls to Python functions. Such Python functions are the views, which perform the needed actions on the system and construct an output web page. Views usually inject Python data into an HTML skeleton, the page template, to generate the response web page.
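
The interplay of the three elements can be sketched without any framework. The code below is a minimal illustration using only the standard library (re and string.Template); it is not Django's actual API, and names such as URLPATTERNS, dispatch, article_list and article_detail are assumptions made for the example.

```python
import re
from string import Template

# Page template: an HTML skeleton with a placeholder for the view's data.
PAGE = Template("<html><body><h1>$title</h1></body></html>")


def article_list():
    """View: builds the response page for the article index."""
    return PAGE.substitute(title="All articles")


def article_detail(article_id):
    """View: looks up one article and injects its title into the template."""
    titles = {"1": "Impressions on Django"}  # stand-in for a database query
    return PAGE.substitute(title=titles.get(article_id, "Not found"))


# URL mappings: regular expression -> view function.
URLPATTERNS = [
    (re.compile(r"^articles/$"), article_list),
    (re.compile(r"^articles/(?P<article_id>\d+)/$"), article_detail),
]


def dispatch(path):
    """Call the first view whose pattern matches the requested path."""
    for pattern, view in URLPATTERNS:
        match = pattern.match(path)
        if match:
            # Named groups in the pattern become keyword arguments of the view.
            return view(**match.groupdict())
    return PAGE.substitute(title="404")


print(dispatch("articles/1/"))
```

Keeping the mapping in a plain list makes the routing easy to read in one place, and the views stay ordinary functions that can be called and tested on their own.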

Django provides some convenience views to create, update, delete and list model objects, also supporting common features such as pagination, date-based browsing, validation and confirmation of destructive actions.

The next entry


Here I explained the basics of Django's execution model. In the next entry I'll write some impressions on Django as a development environment compared to other environments I have used for web development, such as Zope, mod_python and vanilla PHP.

Everything is a graph (part 1)

Alan Kay summarized object-orientation by stating that "everything is an object". Therefore any part of the world that needs to be modeled in a software system can be described in terms of objects and their relations.

Already in my thesis I worked on relating the object-oriented paradigm to graphical (or graph-based) models of computation in the context of signal processing systems. Actually there are many graph-based frameworks and applications in the context of signal processing, multimedia, and related fields. However, lately I have been working with graphical models in many different situations.

Graphical models are gaining importance in data mining, for instance through the use of Bayesian belief networks. On the other hand, the study of complex networks and systems has introduced yet other ways to look at graphs from a statistical perspective.

So in a sense, graphs can give us an equivalent yet complementary view to the object-oriented paradigm. Where in OO we have objects, in graphs we have nodes, and where in OO we have "relations between objects", in graphical models we have edges. So we can conclude that Everything is a Graph.
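
The correspondence can be illustrated with a few lines of Python. This is only a sketch under simple assumptions: the Node class, its related list and the as_graph function are hypothetical names, and the example objects (source, filter, sink) are made up. Objects become nodes and each reference from one object to another becomes a directed edge.

```python
class Node:
    """A plain object that holds references to other objects."""

    def __init__(self, name):
        self.name = name
        self.related = []  # the "relations between objects"


def as_graph(root):
    """Walk object references from root and return (nodes, edges) by name."""
    nodes, edges, stack, seen = [], [], [root], set()
    while stack:
        obj = stack.pop()
        if id(obj) in seen:  # already visited; also guards against cycles
            continue
        seen.add(id(obj))
        nodes.append(obj.name)
        for other in obj.related:
            edges.append((obj.name, other.name))
            stack.append(other)
    return sorted(nodes), sorted(edges)


source, filt, sink = Node("source"), Node("filter"), Node("sink")
source.related.append(filt)
filt.related.append(sink)
nodes, edges = as_graph(source)
print(nodes, edges)
```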

January 19, 2008

Qt 3 and Qt4 relicensed as GPL v3

I read in Thiago Macieira's blog that the Trolls have relicensed Qt3 and Qt4 as GPL v3.

I have not yet had time to analyze how this will affect CLAM. Relicensing CLAM has proven to be a hard political and bureaucratic problem, but a transition from GPL v2 to GPL v3 is easier since we have the 'or later' notice, so we have the door open to that controversial upgrade. The Trolls' move to v3 will definitely boost GPL v3 adoption.

A nice side effect of this directly affects one CLAM application, SMSTools, which still uses Qt3 and not Qt4 (although Zack Welch started a tentative Qt4 port). Until now, Qt3 didn't enjoy the same nice dual licensing Qt4 has. The former Qt3 license was a modified GPL that caused problems when using Qt3 on non-Unix platforms such as Windows. So with the Qt3 relicensing we can now freely distribute precompiled SMSTools binaries for Windows.

Update: I read in the official announcement that they didn't drop GPLv2; they just added a third license to the existing dual licensing, so you can use one of three licensing schemes: non-free, GPLv2 and GPLv3. They definitely want to make developers' lives easier, and not just by building excellent APIs.

January 18, 2008

BiBTeX Tooltip in html WiKo output

I was busy yesterday providing better BiBTeX support for the HTML output in WiKo. I was adding bibliography tooltips. They were originally suggested by Pau Arumi, and I agree with him that they will be very handy for reading an article in electronic format. The former HTML bibliography page is still available: just click instead of hovering.

To get it working, you should download the latest WiKo version, install the 'python-bibtex' Debian/Ubuntu package and append to your CSS style sheet the last statements of this css concerning the 'bibref' class.

Then, just place any BibTeX files in the same folder as the wiki files and refer to a bibliography entry in the wiki files as '@cite:SomeBibtexId'. By 'WiKonpiling' you'll get both the normal LaTeX bibliography and this HTML equivalent.

Take a look at the final result in this chapter of my master's thesis.