JISC report examines economic and research benefits of text mining in UK

    Sir Mark Walport, director of the Wellcome Trust, said of the recommendations in the report: “This is a complete no-brainer. This is scholarly research funded from the public purse, largely from taxpayer and philanthropic organisations. The taxpayer has the right to have maximum benefit extracted and that will only happen if there is maximum access to it.”

    Text mining draws on data analysis techniques such as natural language processing and information extraction to find new knowledge and meaningful patterns within large collections.
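
    As a rough illustration of the kind of pattern-finding described above, the sketch below counts which terms appear together in the same sentence across a small collection, then lists pairs of terms that never appear together but share a common neighbour. Everything in it is an assumption made for illustration: the three sentences are invented placeholders, the term list stands in for real entity recognition, and the function names are hypothetical. It is not the method used in the report.

```python
import re
from collections import Counter
from itertools import combinations

# Invented placeholder collection; a real corpus would hold thousands of articles.
DOCUMENTS = [
    "Drug A inhibits protein B.",
    "Protein B is overactive in disease C.",
    "Drug A is well tolerated. Disease C is rare.",
]

# Hypothetical vocabulary, standing in for a proper named-entity recogniser.
TERMS = ["drug a", "protein b", "disease c"]


def split_sentences(text):
    """Naive sentence splitter: break after ., ! or ? followed by whitespace."""
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]


def cooccurrences(documents, terms):
    """Count how often each pair of terms appears in the same sentence."""
    counts = Counter()
    for doc in documents:
        for sentence in split_sentences(doc):
            lowered = sentence.lower()
            found = sorted(t for t in terms if t in lowered)
            counts.update(combinations(found, 2))
    return counts


def indirect_links(counts):
    """Return term pairs that never co-occur but share at least one neighbour."""
    neighbours = {}
    for a, b in counts:
        neighbours.setdefault(a, set()).add(b)
        neighbours.setdefault(b, set()).add(a)
    links = set()
    for a, b in combinations(sorted(neighbours), 2):
        if b not in neighbours[a] and neighbours[a] & neighbours[b]:
            links.add((a, b))
    return links


pair_counts = cooccurrences(DOCUMENTS, TERMS)
candidate_links = indirect_links(pair_counts)
print(pair_counts)
print(candidate_links)
```

    In this toy collection no single sentence mentions both the drug and the disease, yet the two are connected through the protein. A person reading one article at a time could easily miss a link of this kind; across a large collection, it becomes a candidate hypothesis for a researcher to check.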

    Torsten Reimer, JISC programme manager, explains, “Text mining is already producing efficiencies and new knowledge in areas as diverse as biological science, particle physics, media and communications. It has been used to hypothesise the causes of rare diseases and how pre-existing drugs could be used to target different diseases.

    “The technique was also used recently to analyse the vast amount of text produced on websites, blogs and social media such as Twitter – where copyright holders allowed – and showed that the messages exchanged on Twitter during the English riots of 2011 were not to blame for inciting riots,” added Torsten.

    The business benefit of text mining lies in identifying emerging trends and exploring consumer preferences and competitor developments. Text mining is used particularly in larger companies as part of their customer relationship management strategy and in the pharmaceutical industry as part of its research and development strategy.

    The report shows that such techniques could enable researchers in UK universities to gain new knowledge that would otherwise remain undiscovered because there is just too much relevant literature for any one person to read. Such discoveries could lead to benefits for society and the economy.

    The UK has a number of strengths that put it in a good position to be a key player in text mining development, such as the existence of good framework conditions for innovation and the natural advantage of its native language.

    Professor Douglas Kell, chief executive of the BBSRC, says, “This report shows the importance of implementing the recommendations of the Hargreaves Review as current copyright law is also imposing restrictions, since text mining involves a range of computerised analytical processes which are not all readily permitted within UK intellectual property law. In order to be ‘mined’, text must be accessed, copied, analysed, annotated and related to existing information and understanding. Even if the user has access rights to the material, making annotated copies can be illegal under current copyright law without the permission of the copyright holder.

    “The report also shows that text mining can add enormous value to the benefit of the UK economy, as long as the text is freely available and unencumbered. Otherwise there is a real risk that we will miss discoveries that could have significant social and economic impact.”

    Torsten added, “These laws are inhibiting text mining’s wider usage and making academic institutions nervous of taking it up. Without wider usage, the potential for text mining to generate gains for the economy and society cannot be exploited and the UK economy will be less able to take advantage of its strong public research base. There is a danger that the UK may be left behind as other countries such as Japan adopt a more liberal approach that encourages text mining usage.”

    The report identifies a number of barriers that must be overcome to make the best use of text mining tools in future. Firstly, text mining is a complex technical process that requires skilled staff; secondly, it requires unrestricted access to information sources; thirdly, copyright can be a barrier.

    The report authors conclude that more work needs to be undertaken to raise awareness of the potential benefits and value of text mining to UK further and higher education.

    An event at the Wellcome Trust last night started the process of looking at how publishers, researchers and policy makers can make this happen.

    Read the report