creators_name: Baron, A.
creators_name: Rayson, P.
creators_id: a.baron@comp.lancs.ac.uk
creators_id:
type: conference_item
datestamp: 2008-08-13 08:43:17
lastmod: 2008-08-13 08:43:17
metadata_visibility: show
title: VARD2: A tool for dealing with spelling variation in historical corpora
ispublished: pub
subjects: QA75
full_text_status: public
pres_type: paper
abstract: When applying corpus linguistic techniques to historical corpora, the corpus researcher should be cautious about the results obtained. Corpus annotation techniques such as part of speech tagging, trained for modern languages, are particularly vulnerable to inaccuracy due to vocabulary and grammatical shifts in language over time. Basic corpus retrieval techniques such as frequency profiling and concordancing will also be affected, in addition to the more sophisticated techniques such as keywords, n-grams, clusters and lexical bundles which rely on word frequencies for their calculations. In this paper, we highlight these problems with particular focus on Early Modern English corpora. We also present an overview of the VARD tool, our proposed solution to this problem, which facilitates pre-processing of historical corpus data by inserting modern equivalents alongside historical spelling variants. Recent improvements to the VARD tool include the incorporation of techniques used in modern spell checking software.
date: 2008-05
date_type: published
event_title: Postgraduate Conference in Corpus Linguistics
event_location: Aston University, Birmingham
event_dates: 22nd May 2008
event_type: conference
refereed: FALSE
official_url: http://acorn.aston.ac.uk/conf_proceedings.html
related_url_type: org
citation: Baron, A. and Rayson, P. (2008) VARD2: A tool for dealing with spelling variation in historical corpora. In: Postgraduate Conference in Corpus Linguistics, 22nd May 2008, Aston University, Birmingham.
document_url: http://eprints.comp.lancs.ac.uk/1817/1/BaronRaysonAston2008.pdf