Projects

In my research, I mainly worked on three projects: AIDA, YAGO, and STICS. All of them contribute to Ambiverse, a start-up that grew out of an EXIST Transfer of Research project. At Ambiverse, we use the entity disambiguation system AIDA and the entity-based search engine STICS to provide deep text analytics. The goal of Ambiverse is to provide solutions for automatic text understanding and intelligent text production.

AIDA is a framework and online tool for entity detection and disambiguation. Given a natural-language text, such as a news article, it maps mentions of ambiguous names onto canonical entities (e.g., individual people or places) registered in the YAGO knowledge base. The source code, JSON Web Service API, and demo are available on the AIDA website.
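
As a rough illustration of what mapping a mention onto a canonical entity means, here is a toy sketch in Python. It is not AIDA's algorithm or API: the candidate dictionary, the keyword sets, the entity identifiers, and the `disambiguate` function are all hypothetical, and the scoring is plain word overlap with the surrounding text.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Entity:
    entity_id: str        # hypothetical canonical identifier
    keywords: frozenset   # words assumed to appear near this entity


# Toy candidate dictionary: surface name -> possible entities.
# The entries are illustrative data, not taken from YAGO.
CANDIDATES = {
    "JFK": [
        Entity("John_F._Kennedy",
               frozenset({"president", "elected", "senator"})),
        Entity("John_F._Kennedy_International_Airport",
               frozenset({"airport", "flight", "landed"})),
    ],
}


def disambiguate(mention, context):
    """Return the id of the candidate whose keywords overlap most with the context."""
    words = set(context.lower().split())
    candidates = CANDIDATES.get(mention, [])
    if not candidates:
        return None  # unknown name: no entity to map to
    # On a tie, max() keeps the first candidate in the list.
    best = max(candidates, key=lambda e: len(e.keywords & words))
    return best.entity_id
```

Calling `disambiguate("JFK", "The flight landed at JFK on time")` picks the airport candidate, because only its keywords occur in the sentence. A real system would draw on far richer evidence than a handful of keywords.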

YAGO is a large semantic knowledge base derived from Wikipedia, WordNet, and GeoNames. Currently, YAGO has knowledge of more than 10 million entities (such as persons, organizations, and cities) and contains more than 120 million facts about these entities. All the data, as well as several interfaces to browse and query the data, are available on the YAGO website.

STICS is an entity-centric search engine that makes use of AIDA and YAGO. By extending the Google slogan of “things, not strings” to also support entity categories, STICS provides powerful functionality for querying and analyzing news and other text corpora in terms of entities, semantic classes, and text phrases. You can search, for example, for presidents of the United States and the JFK airport, and see how STICS distinguishes between JFK the president and JFK the airport.

Figure: AIDA example. Finding the meaning in a sentence with AIDA.

During the past few years, I also developed an application for the Mac platform together with Daniel Bär: Punakea, a Mac app that helps you cope with the day-to-day struggle of managing your files. Designed to complement Spotlight, it allows you to tag your files and bookmarks, freeing you from the strict hierarchy of the Finder’s folder structure. During the Punakea development, we also created and open-sourced the NTagging.framework, which does the heavy lifting of Punakea in the background and is compatible with OpenMeta.

A former project is Wikulu, which adapts natural language processing techniques for wiki engines to simplify the organization and discovery of knowledge in wikis.