<didl:DIDL xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xmlns:didl="urn:mpeg:mpeg21:2002:02-DIDL-NS" xsi:schemaLocation="urn:mpeg:mpeg21:2002:02-DIDL-NS 
			 http://standards.iso.org/ittf/PubliclyAvailableStandards/MPEG-21_schema_files/did/didmodel.xsd"><didl:Item><didl:Descriptior><didl:Statement mimeType="application/xml; charset=utf-8"><dii:Identifier xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:schemaLocation="urn:mpeg:mpeg21:2002:01-DII-NS
		 	http://standards.iso.org/ittf/PubliclyAvailableStandards/MPEG-21_schema_files/dii/dii.xsd" xmlns:dii="urn:mpeg:mpeg21:2002:01-DII-NS">http://eprints.comp.lancs.ac.uk/1817/</dii:Identifier></didl:Statement></didl:Descriptior><didl:Descriptior><didl:Statement mimeType="application/xml; charset=utf-8"><oai_dc:dc xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:schemaLocation="http://www.openarchives.org/OAI/2.0/oai_dc/ http://www.openarchives.org/OAI/2.0/oai_dc.xsd" xmlns:oai_dc="http://www.openarchives.org/OAI/2.0/oai_dc/" xmlns:dc="http://purl.org/dc/elements/1.1/">
        <dc:title>VARD2: A tool for dealing with spelling variation in historical corpora</dc:title>
        <dc:creator>Baron, A.</dc:creator>
        <dc:creator>Rayson, P.</dc:creator>
        <dc:subject>QA75 Electronic computers. Computer science</dc:subject>
        <dc:description>When applying corpus linguistic techniques to historical corpora, the corpus researcher should be cautious about the results obtained. Corpus annotation techniques such as part-of-speech tagging, trained for modern languages, are particularly vulnerable to inaccuracy due to vocabulary and grammatical shifts in language over time. Basic corpus retrieval techniques such as frequency profiling and concordancing will also be affected, in addition to the more sophisticated techniques such as keywords, n-grams, clusters and lexical bundles which rely on word frequencies for their calculations. In this paper, we highlight these problems with particular focus on Early Modern English corpora. We also present an overview of the VARD tool, our proposed solution to this problem, which facilitates pre-processing of historical corpus data by inserting modern equivalents alongside historical spelling variants. Recent improvements to the VARD tool include the incorporation of techniques used in modern spell checking software.</dc:description>
        <dc:date>2008-05</dc:date>
        <dc:type>Conference or Workshop Item</dc:type>
        <dc:type>NonPeerReviewed</dc:type>
        <dc:format>application/pdf</dc:format>
        <dc:identifier>http://eprints.comp.lancs.ac.uk/1817/1/BaronRaysonAston2008.pdf</dc:identifier>
        <dc:relation>http://acorn.aston.ac.uk/conf_proceedings.html</dc:relation>
        <dc:identifier>Baron, A. and Rayson, P. (2008) VARD2: A tool for dealing with spelling variation in historical corpora. In: Postgraduate Conference in Corpus Linguistics, 22nd May 2008, Aston University, Birmingham.</dc:identifier>
        <dc:relation>http://eprints.comp.lancs.ac.uk/1817/</dc:relation></oai_dc:dc></didl:Statement></didl:Descriptior><didl:Component><didl:Descriptior><didl:Statement mimeType="application/xml; charset=utf-8"><dii:Identifier xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:schemaLocation="urn:mpeg:mpeg21:2002:01-DII-NS
		 	    http://standards.iso.org/ittf/PubliclyAvailableStandards/MPEG-21_schema_files/dii/dii.xsd" xmlns:dii="urn:mpeg:mpeg21:2002:01-DII-NS">http://eprints.comp.lancs.ac.uk/1817/1/</dii:Identifier></didl:Statement></didl:Descriptior><didl:Resource ref="http://eprints.comp.lancs.ac.uk/1817/1/BaronRaysonAston2008.pdf"></didl:Resource></didl:Component></didl:Item></didl:DIDL>