There are two topics I’m interested in right now. Here’s my first session idea: I’m compiling a corpus of both old and new media vernacular texts as part of a semantic/anthropological examination of American beliefs about health. (It’s called CADOH, the Corpus of American Discourses on Health.) I’ve been using its pilot version to look at the distribution of terms such as fat, stress, cold, and oil.

I’m envisioning its final form as a mix of vernacular discussions. Good corpora already exist for prose from contemporary magazines, newspapers, and fiction (e.g. COCA). I’m aiming instead to include more transient conversations about health, including blog posts and their comments, listservs, online forums and wikis, letters to the editor, and radio transcripts.

So I’m proposing a helpathon to hear from others who have dealt with compiling current materials. The bootcamp sessions on the Text Encoding Initiative, managing digital projects, and using regular expressions should all be helpful. But I’d also like to compare notes on ways to gather, annotate, and share text samples. In using XML to annotate the metadata, what have others found most useful: hand coding? The oXygen XML editor? Other resources? To make the corpus useful for others, I’ll need permission from copyright holders to share the texts. What approaches to requesting permission for copyrighted material have worked for you (besides a big pot of money)? And once the copyright issues are dealt with, what’s the best way to make the corpus accessible? Would this be a good Omeka project?
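Here is a rough sketch of the kind of term-distribution count described above, for anyone bringing pilot data to the session. It is not CADOH’s actual pipeline. It assumes each text sample is already available as a hypothetical `(source_type, text)` pair, and the function name `term_distribution` and the sample sentences are invented placeholders. It uses Python’s standard `re` module with word boundaries, so "fat" does not match "father". It reports hits per 1,000 tokens, which makes short forum posts and long radio transcripts roughly comparable.

```python
import re
from collections import Counter, defaultdict

# Terms from the pilot study; everything else here is a hypothetical illustration.
TERMS = ["fat", "stress", "cold", "oil"]

# \b word boundaries keep "fat" from matching "father". Note that inflected or
# derived forms such as "fats" or "oily" are NOT counted by this simple pattern.
TERM_RE = re.compile(r"\b(" + "|".join(TERMS) + r")\b", re.IGNORECASE)
TOKEN_RE = re.compile(r"\b\w+\b")


def term_distribution(samples):
    """Return {source_type: {term: hits per 1,000 tokens}} for (source_type, text) pairs."""
    hits = defaultdict(Counter)   # source_type -> Counter of term hits
    tokens = Counter()            # source_type -> total word tokens
    for source_type, text in samples:
        tokens[source_type] += len(TOKEN_RE.findall(text))
        for match in TERM_RE.finditer(text):
            hits[source_type][match.group(1).lower()] += 1
    # Normalize per 1,000 tokens, skipping source types with no tokens at all.
    return {
        source: {term: 1000 * hits[source][term] / tokens[source] for term in TERMS}
        for source in tokens
        if tokens[source]
    }


if __name__ == "__main__":
    # Invented placeholder samples, not real corpus data.
    samples = [
        ("blog", "Too much stress and too much fat, my doctor says."),
        ("forum", "Fish oil for a cold? Feed a cold, starve a fever."),
    ]
    for source, rates in term_distribution(samples).items():
        print(source, rates)
```

Stemming or lemmatization (so that "fats" and "oily" count too) would be a natural next step. However, it depends on tools and choices that would vary from project to project.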