I’d like to propose a session on getting the most out of text-mining historical documents through visualizations. A lot of attention has lately been lavished (rightfully, for the most part) on Google’s n-gram tool and the recent Science article, and text-mining has been gaining ground among humanists, particularly as new, easily adopted tools and programs become available.
I’m working on two big projects that try to extract meaningful patterns from large collections (newspapers in one, transcribed manuscripts in another) and then make sense of those patterns through visualizations. Most of this takes the form of mapping (geography and time being the two most common threads in these sources), with the rest in other forms of graphing and visualization (word clouds, for instance).
A major challenge, it seems to me, is that there is no widely understood, common vocabulary for visualizing large-scale language patterns. How, for example, do you visualize the most commonly used words in a particular historical newspaper as they spread across both time and space simultaneously?
We’ve been experimenting with that in our projects, but I’d like to hash this issue out with folks working on similar (or not so similar!) problems.
#1 by Tess on April 14, 2011 - 4:14 pm
I’m really interested in the variety of applications for this technique. Laura Mandell showed this type of visualization, tracking relationships through Southey’s correspondence, at TILTS in February.
dhhub.org/demos/voyeur/
I’m interested in applying this to periodicals, as mentioned above.
#2 by Natalie Houston on April 12, 2011 - 8:12 pm
I’d be really interested in this session. One of the challenges I’ve found is that many people in text-focused disciplines feel put off by traditional graphs (i.e., “we’re in English, don’t show me charts”). As you suggest, it might be best thought of as a visual vocabulary problem.
#3 by sethgrimes on April 7, 2011 - 5:54 pm
Check out the slides from a class I presented last year on text mining & visualization: www.slideshare.net/SethGrimes/text-mining-and-visualization-4937750
Seth, twitter.com/sethgrimes
#4 by Laurel Stvan on March 30, 2011 - 3:40 am
I’m intrigued by this, too. With lots of data or just diachronic data, using only tables of word frequencies or changes in collocations may not be as effective as showing commonalities via different colors and weights in word clouds, for instance. What other visualization methods can show patterns across several axes?