Mark up your text automatically
The Data Enrichment Service (DES) is infrastructure for information extraction. Many documents contain information in an unstructured or semi-structured form that could provide additional value if it were extracted. The DES helps you do exactly that.
The DES can operate on plain text, HTML, XML and PDF documents and, in conjunction with our OpenUp platform, can be used as part of a process to aggregate, extract and publish information from a variety of sources. For example, you could harvest an RSS feed, extract the entities from its content as RDF, store that RDF and then create an output that displays all of the entities found in the feed.
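The RSS example described above can be sketched as a small pipeline. This is a minimal illustration in Python, not part of the DES itself: the extractor is a hypothetical stand-in for a call to the DES, and a toy gazetteer lookup is used in its place so the sketch runs offline. Storing the results as RDF is left out; the sketch simply counts how many feed items mention each entity, which is the kind of summary the final "display all of the entities" step would need.

```python
"""Sketch of an RSS -> entity extraction -> summary pipeline.

The extractor is a hypothetical stand-in for a DES call; it is NOT the DES API.
A toy gazetteer lookup is used so the sketch runs without a network.
"""
import xml.etree.ElementTree as ET
from collections import Counter
from typing import Callable, Iterable

# An entity is (surface text, entity type); this shape is an assumption.
Entity = tuple[str, str]
Extractor = Callable[[str], Iterable[Entity]]


def harvest_rss(rss_xml: str) -> list[str]:
    """Return the title and description text of each <item> in an RSS 2.0 feed."""
    root = ET.fromstring(rss_xml)
    texts = []
    for item in root.iter("item"):
        title = item.findtext("title", default="")
        description = item.findtext("description", default="")
        texts.append(f"{title}. {description}")
    return texts


def toy_gazetteer_extractor(gazetteer: dict[str, str]) -> Extractor:
    """Build a hypothetical extractor that finds gazetteer names by substring match."""
    def extract(text: str) -> list[Entity]:
        return [(name, kind) for name, kind in gazetteer.items() if name in text]
    return extract


def entities_in_feed(rss_xml: str, extract: Extractor) -> Counter:
    """Count how many feed items mention each extracted entity."""
    counts: Counter = Counter()
    for text in harvest_rss(rss_xml):
        # set() so an entity is counted once per item, not once per mention
        counts.update(set(extract(text)))
    return counts


if __name__ == "__main__":
    feed = """<rss version="2.0"><channel><title>Example</title>
    <item><title>Sheffield council meeting</title>
          <description>The meeting was held in Sheffield.</description></item>
    <item><title>London transport</title>
          <description>Transport for London announced changes.</description></item>
    </channel></rss>"""
    extract = toy_gazetteer_extractor({
        "Sheffield": "Location",
        "London": "Location",
        "Transport for London": "Organization",
    })
    for (name, kind), n in entities_in_feed(feed, extract).items():
        print(f"{kind}: {name} ({n} item(s))")
```

Because the extractor is passed in as a function, the toy gazetteer could be swapped for a real client without changing the harvesting or counting code.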
The DES is built on GATE (General Architecture for Text Engineering), created at the University of Sheffield, and was developed by TSO's GATE-certified developers. GATE is a framework for developing text-processing applications and is widely used by both academic and commercial organisations.
The DES provides two main functions: DES Starter, and the ability to host vertical GATE applications.
As an example of a potential use of the DES, TSO have created the OpenUp Client demonstration. Feel free to give it a try!
DES Starter is a built-in part of the DES. It extracts entities such as people, places and organisations. It is focused on UK data and draws on much of the work done on data.gov.uk.
The Starter service can be used on its own or as part of a custom vertical GATE application running on the DES infrastructure. It can return output in various formats, including XML and JSON.
We realise that information extraction is often used to link data. To that end, wherever possible, the Starter service provides URIs for the entities it extracts. Many of these are DBpedia or data.gov.uk URIs, which makes it easy to link the results with other datasets.
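To show how returned URIs support linking, here is a minimal Python sketch that turns extracted entities into N-Triples statements of the form "document mentions entity". The JSON shape (`entities`, `text`, `type`, `uri`) and the `http://example.org/vocab#mentions` predicate are hypothetical illustrations, not the documented DES output format or vocabulary. Since URIs are provided only where possible, entities without one are collected separately rather than dropped silently.

```python
"""Sketch: turn extracted entities that carry URIs into N-Triples.

The JSON shape ("entities", "text", "type", "uri") is a HYPOTHETICAL example,
not the documented DES output format.
"""
import json

# Hypothetical predicate linking a document to an entity it mentions.
MENTIONS = "http://example.org/vocab#mentions"


def entities_to_ntriples(doc_uri: str, response_json: str) -> tuple[list[str], list[str]]:
    """Return (N-Triples lines, surface texts of entities that had no URI)."""
    triples: list[str] = []
    unresolved: list[str] = []
    seen: set[str] = set()
    for entity in json.loads(response_json).get("entities", []):
        uri = entity.get("uri")
        if not uri:
            # No URI could be provided for this entity; keep it for review.
            unresolved.append(entity.get("text", ""))
            continue
        if uri in seen:
            # The same entity mentioned twice yields a single triple.
            continue
        seen.add(uri)
        triples.append(f"<{doc_uri}> <{MENTIONS}> <{uri}> .")
    return triples, unresolved


if __name__ == "__main__":
    response = json.dumps({"entities": [
        {"text": "Sheffield", "type": "Location",
         "uri": "http://dbpedia.org/resource/Sheffield"},
        {"text": "Acme Widgets", "type": "Organization"},
    ]})
    lines, missing = entities_to_ntriples("http://example.org/doc/1", response)
    print("\n".join(lines))
    print("No URI:", missing)
```

Because the object of each triple is a shared DBpedia or data.gov.uk URI, any other dataset using the same URI can be joined to these statements directly.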
If you want to run your own application on the DES infrastructure then please get in touch. We can provide various options in terms of scale and security.
If your application needs RDF hosting, we can also provide that as part of our OpenUp platform. For example, DES Starter's entity resolution uses DBpedia hosted in our RDF store and resolves entities on the fly.
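As a rough illustration of resolving a name against DBpedia in an RDF store, the Python sketch below builds a standard SPARQL protocol request that looks up resources by exact English `rdfs:label`. This is one simple strategy chosen for illustration; it is not how DES Starter actually resolves entities. The public endpoint `https://dbpedia.org/sparql` is used as a default, but any SPARQL endpoint hosting DBpedia could be substituted. No request is sent unless you call `urlopen` yourself.

```python
"""Sketch: build a SPARQL request that looks up DBpedia resources by label.

Illustrates one simple resolution strategy (exact English label match);
it is NOT the DES Starter resolution method. Nothing is sent over the network
unless the returned Request is passed to urllib.request.urlopen.
"""
import json
import urllib.parse
import urllib.request

DBPEDIA_SPARQL = "https://dbpedia.org/sparql"  # public DBpedia SPARQL endpoint


def sparql_string(text: str) -> str:
    """Quote text as a SPARQL string literal, escaping backslashes and quotes."""
    escaped = text.replace("\\", "\\\\").replace('"', '\\"')
    return f'"{escaped}"'


def label_lookup_query(label: str, limit: int = 5) -> str:
    """SPARQL query for resources whose English rdfs:label equals `label`."""
    return (
        "PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>\n"
        "SELECT DISTINCT ?resource WHERE {\n"
        f"  ?resource rdfs:label {sparql_string(label)}@en .\n"
        "}\n"
        f"LIMIT {int(limit)}"
    )


def build_request(label: str, endpoint: str = DBPEDIA_SPARQL) -> urllib.request.Request:
    """Build a SPARQL protocol GET request that asks for JSON results."""
    params = urllib.parse.urlencode({"query": label_lookup_query(label)})
    return urllib.request.Request(
        f"{endpoint}?{params}",
        headers={"Accept": "application/sparql-results+json"},
    )


def resources_from_results(results: dict) -> list[str]:
    """Extract ?resource values from a SPARQL JSON results document."""
    return [b["resource"]["value"] for b in results["results"]["bindings"]]


if __name__ == "__main__":
    # Requires network access to the endpoint.
    with urllib.request.urlopen(build_request("Sheffield")) as resp:
        print(resources_from_results(json.load(resp)))
```

An exact label match is deliberately naive: a production resolver would also need to handle ambiguous names, redirects and disambiguation pages.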
The DES is available as a web-based API. If you would like to try it first and see what it can do, you can use the DES demonstration.
The Starter service is free to use within certain limits, currently set at 10K uploads and 10,000 documents per day.
For data publishers who need to enrich more than 10,000 documents per day under a service level agreement (SLA), we offer the Professional version of the Data Enrichment Service for a monthly fee.
DES Starter is currently in beta. It is fully usable but, as with any beta product, you may encounter bugs or errors. The DES is under continuous development and is released frequently. API versioning will be turned on when DES Starter reaches its final release; until then, please be aware that the service and its API may change.