Evaluation, Ahead of Print.
Evaluators worldwide are dealing with a growing amount of unstructured electronic data, predominantly in textual format. Currently, evaluators analyze textual Big Data primarily using traditional content analysis methods based on keyword search, a practice limited to iterating over predefined concepts. But what if evaluators cannot define the necessary keywords for their analysis? Often we need to examine trends in how an organization has been operating, while our raw data are gigabytes of documents generated by that organization over decades. The problem is that in many cases we do not know exactly what to look for. In such cases, traditional analytical machinery cannot provide an adequate solution within a reasonable time; instead, heavy-duty Big Data Science must be applied. We propose an automated, quantitative, user-friendly methodology based on text mining, machine learning, and data visualization that helps researchers and evaluation practitioners reveal trends, trajectories, and interrelations among pieces of textual information in support of evaluation. Our system automatically extracts a large amount of descriptive terminology for a particular domain in a given language, finds semantic connections between documents based on the extracted terminology, visualizes the entire document repository as a graph of semantic connections, and leads the user to the areas of that graph where the most interesting trends can be observed. This article demonstrates the new method on 1,700 performance reports, showing that it can be used successfully, supplying evaluators with important information that cannot be revealed by other methods. Such an exercise is vital as a preliminary exploratory phase for evaluations involving unstructured Big Data, after which a range of evaluation methods can be applied. We argue that our system can be applied successfully to any evaluated domain.
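To make the pipeline sketched in the abstract concrete, the following is a minimal illustrative sketch (not the authors' implementation) of its core steps: weighting each document's terminology with TF-IDF, linking documents whose term profiles are semantically similar, and emitting the result as a graph of weighted edges. All function names, the toy documents, and the similarity threshold are assumptions made for illustration only.

```python
# Illustrative sketch of a terminology-based semantic graph.
# Assumptions (not from the article): TF-IDF term weighting,
# cosine similarity between documents, and a fixed edge threshold.
import math
from collections import Counter

def tfidf_profiles(docs):
    """Return one {term: tf-idf weight} dict per tokenized document."""
    df = Counter()                      # document frequency of each term
    for doc in docs:
        df.update(set(doc))
    n = len(docs)
    profiles = []
    for doc in docs:
        tf = Counter(doc)
        profiles.append({t: (c / len(doc)) * math.log(n / df[t])
                         for t, c in tf.items()})
    return profiles

def cosine(a, b):
    """Cosine similarity between two sparse term-weight dicts."""
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    na = math.sqrt(sum(w * w for w in a.values()))
    nb = math.sqrt(sum(w * w for w in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def semantic_graph(docs, threshold=0.2):
    """Edges (i, j, similarity) between sufficiently similar documents."""
    profiles = tfidf_profiles(docs)
    edges = []
    for i in range(len(profiles)):
        for j in range(i + 1, len(profiles)):
            sim = cosine(profiles[i], profiles[j])
            if sim >= threshold:
                edges.append((i, j, round(sim, 3)))
    return edges

# Toy corpus: two related reports and one unrelated document.
docs = [
    "budget report annual spending audit".split(),
    "annual audit spending review budget".split(),
    "staff training workshop schedule".split(),
]
print(semantic_graph(docs))  # links only the two budget/audit reports
```

In a real repository the edge list would feed a graph-visualization layer, and dense or fast-changing regions of the graph would point evaluators to areas worth closer inspection.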