I’d like to propose a session on getting the most out of text-mining historical documents through visualizations. Google’s n-gram tool and the recent Science article have drawn a lot of attention lately (rightfully, for the most part), and text-mining more generally has been gaining ground among humanists, particularly as new, easily adopted tools and programs become available.
I’m working on two big projects that try to extract meaningful patterns from large collections (newspapers in one, transcribed manuscripts in the other) and then make sense of those patterns through visualizations. Most of this takes the form of mapping (geography and time being the two most common threads in these sources), though some of it takes other forms of graphing and visualization (word clouds, for instance).
A major challenge, it seems to me, is that there is no widely shared vocabulary for visualizing large-scale language patterns. How, for example, do you visualize the most commonly used words in a particular historical newspaper as they spread across both time and space simultaneously?
We’ve been experimenting with that in our projects, but I’d like to hash this issue out with folks working on similar (or not so similar!) problems.
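To make the time-and-space question a bit more concrete, here is a minimal Python sketch of the counting step that would sit underneath any such visualization: tallying the most common words for each (place, year) pair. It is not how our projects do it. The sample records, the stopword list, and the function name `top_words_by_place_and_year` are all made up for illustration.

```python
import re
from collections import Counter, defaultdict

# Hypothetical sample records: (place of publication, year, text).
# These are placeholders, not real newspaper data.
ARTICLES = [
    ("Town A", 1836, "land and cotton and land"),
    ("Town A", 1837, "cotton prices and the cotton trade"),
    ("Town B", 1836, "ships in the harbor and ships at sea"),
]

# A deliberately tiny, assumed stopword list; a real project would need a fuller one.
STOPWORDS = {"and", "the", "in", "at", "of", "a"}


def top_words_by_place_and_year(articles, n=3):
    """Return {(place, year): [(word, count), ...]} with the n most common words."""
    counts = defaultdict(Counter)
    for place, year, text in articles:
        # Lowercase the text and keep runs of letters as words.
        words = re.findall(r"[a-z]+", text.lower())
        counts[(place, year)].update(w for w in words if w not in STOPWORDS)
    return {key: counter.most_common(n) for key, counter in counts.items()}


if __name__ == "__main__":
    for (place, year), words in sorted(top_words_by_place_and_year(ARTICLES).items()):
        print(place, year, words)
```

Each (place, year) key could then become a point on a map at a given moment in time. The counting is the easy part. Deciding how to draw the result is exactly the open question I’d like to hash out.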