<mets:mets OBJID="oai:generic.eprints.org:1817" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xmlns:mods="http://www.loc.gov/mods/v3" LABEL="Eprints Item" xsi:schemaLocation="http://www.loc.gov/METS/ http://www.loc.gov/standards/mets/mets.xsd http://www.loc.gov/mods/v3 http://www.loc.gov/standards/mods/v3/mods-3-0.xsd" xmlns:xlink="http://www.w3.org/1999/xlink" xmlns:mets="http://www.loc.gov/METS/"><mets:metsHdr CREATEDATE="2009-01-08T18:34:31Z"><mets:agent TYPE="ORGANIZATION" ROLE="CUSTODIAN"><mets:name>Lancaster University Computing Department</mets:name></mets:agent></mets:metsHdr><mets:dmdSec ID="DMD_oai:generic.eprints.org:1817_mods"><mets:mdWrap MDTYPE="mods"><mets:xmlData><mods:titleInfo><mods:title>VARD2: A tool for dealing with spelling variation in historical corpora</mods:title></mods:titleInfo><mods:name type="personal"><mods:namePart type="given">A.</mods:namePart><mods:namePart type="family">Baron</mods:namePart><mods:role><mods:roleTerm type="text">author</mods:roleTerm></mods:role></mods:name><mods:name type="personal"><mods:namePart type="given">P.</mods:namePart><mods:namePart type="family">Rayson</mods:namePart><mods:role><mods:roleTerm type="text">author</mods:roleTerm></mods:role></mods:name><mods:abstract>When applying corpus linguistic techniques to historical corpora, the corpus
researcher should be cautious about the results obtained. Corpus annotation
techniques such as part of speech tagging, trained for modern languages, are
particularly vulnerable to inaccuracy due to vocabulary and grammatical shifts in language over time. Basic corpus retrieval techniques such as frequency profiling and concordancing will also be affected, in addition to the more sophisticated techniques such as keywords, n-grams, clusters and lexical bundles which rely on word frequencies for their calculations. In this paper, we highlight these problems with particular focus on Early Modern English corpora. We also present an overview of the VARD tool, our proposed solution to this problem, which facilitates pre-processing of historical corpus data by inserting modern equivalents alongside historical spelling variants. Recent improvements to the VARD tool include the incorporation of techniques used in modern spell checking software.</mods:abstract><mods:classification authority="lcc">QA75 Electronic computers. Computer science</mods:classification><mods:originInfo><mods:dateIssued encoding="iso8601">2008-05</mods:dateIssued></mods:originInfo><mods:genre>Conference or Workshop Item</mods:genre></mets:xmlData></mets:mdWrap></mets:dmdSec><mets:amdSec ID="TMD_oai:generic.eprints.org:1817"><mets:rightsMD ID="rights_oai:generic.eprints.org:1817_mods"><mets:mdWrap MDTYPE="mods"><mets:xmlData><mods:useAndReproduction>
<p><strong>For work being deposited by its own author:</strong> 
In self-archiving this collection of files and associated bibliographic 
metadata, I grant Lancaster University Computing Department the right to store 
them and to make them permanently available publicly for free on-line. 
I declare that this material is my own intellectual property and I 
understand that Lancaster University Computing Department does not assume any 
responsibility if there is any breach of copyright in distributing these 
files or metadata. (All authors are urged to prominently assert their 
copyright on the title page of their work.)</p>

<p><strong>For work being deposited by someone other than its 
author:</strong> I hereby declare that the collection of files and 
associated bibliographic metadata that I am archiving at 
Lancaster University Computing Department is in the public domain. If this is 
not the case, I accept full responsibility for any breach of copyright 
that distributing these files or metadata may entail.</p>

<p>Clicking on the deposit button indicates your agreement to these 
terms.</p>
    </mods:useAndReproduction></mets:xmlData></mets:mdWrap></mets:rightsMD></mets:amdSec><mets:fileSec><mets:fileGrp USE="reference"><mets:file SIZE="807814" ID="oai:generic.eprints.org:1817_482_1" MIMETYPE="application/octet-stream" OWNERID="http://eprints.comp.lancs.ac.uk/1817/1/BaronRaysonAston2008.pdf"><mets:FLocat LOCTYPE="URL" xlink:href="http://eprints.comp.lancs.ac.uk/1817/1/BaronRaysonAston2008.pdf" xlink:type="simple"></mets:FLocat></mets:file></mets:fileGrp></mets:fileSec><mets:structMap><mets:div DMDID="DMD_oai:generic.eprints.org:1817_mods" AMDID="TMD_oai:generic.eprints.org:1817"><mets:fptr FILEID="oai:generic.eprints.org:1817_482_1"></mets:fptr></mets:div></mets:structMap></mets:mets>