<?xml version="1.0"?>
<?xml-stylesheet type="text/xsl" href="../../lib/p5_to_atmohtml.xsl"?>
<!--* <?xml-stylesheet type="text/xsl" href="../../lib/atmo-odd.xsl"?> *-->
<?xml-model href="http://www.tei-c.org/release/xml/tei/custom/schema/relaxng/tei_odds.rng" 
  type="application/xml" 
  schematypens="http://relaxng.org/ns/structure/1.0"?>
<?xml-model href="http://www.tei-c.org/release/xml/tei/custom/schema/relaxng/tei_odds.rng" 
  type="application/xml" 
  schematypens="http://purl.oclc.org/dsdl/schematron"?>
<!DOCTYPE TEI [
<!ENTITY acirc   "&#226;" ><!-- small a, circumflex accent -->
<!ENTITY aelig   "&#230;" ><!-- small ae diphthong (ligature) -->
<!ENTITY cap    "&#x2229;" ><!--/cap B: =intersection-->
<!ENTITY cup    "&#x222A;" ><!--/cup B: =union or logical sum-->
<!ENTITY eacute  "&#233;" ><!-- small e, acute accent -->
<!ENTITY ecirc   "&#234;" ><!-- small e, circumflex accent -->
<!ENTITY icirc   "&#238;" ><!-- small i, circumflex accent -->
<!ENTITY intersect "&cap;">
<!ENTITY ldquo  "&#x201C;" ><!--=double quotation mark, left-->
<!ENTITY mdash "&#x2014;">
<!ENTITY ocirc   "&#244;" ><!-- small o, circumflex accent -->
<!ENTITY omacr   "&#x014D;" ><!-- small o, macron -->
<!ENTITY prime  "&#x2032;" ><!--/prime =prime or minute-->
<!ENTITY Prime  "&#x2033;" ><!--=double prime or second-->
<!ENTITY ucirc   "&#251;" ><!-- small u, circumflex accent -->
<!ENTITY union "&cup;">
<!ENTITY uuml    "&#252;" ><!-- small u, dieresis or umlaut mark -->

<!ENTITY nsTEI    "http://www.tei-c.org/ns/1.0" ><!-- TEI namespace name -->
<!ENTITY nsATMO   "http://uyghur.linguistics.indiana.edu/2015/ns/0.1" ><!-- ATMO namespace name -->
<!ENTITY extreme "http://conferences.idealliance.org/extreme/html">
]>
<TEI xmlns="http://www.tei-c.org/ns/1.0"
  xmlns:atmo="http://uyghur.linguistics.indiana.edu/2015/ns/0.1" 
  xmlns:rng="http://relaxng.org/ns/structure/1.0"
  xmlns:sch="http://purl.oclc.org/dsdl/schematron"
  xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" 
  xmlns:xsd="http://www.w3.org/2001/XMLSchema" 
  xml:lang="en">
  <teiHeader>
    <fileDesc>
      <titleStmt>
        <title>Basic text and sentence markup for ATMO v1</title>
        <author xml:id="CMSMcQ">C. M. Sperberg-McQueen</author>
        <sponsor>Annotated Turki Manuscripts from the Jarring Collection Online</sponsor>
        <principal>Arienne M. Dwyer</principal>
        <principal>C. M. Sperberg-McQueen</principal>
        <funder>Henry Luce Foundation</funder>
      </titleStmt>
      <publicationStmt>
        <authority>Annotated Turki Manuscripts from the Jarring Collection Online</authority>
        <pubPlace>Lawrence, Kansas</pubPlace>
        <date>2016</date>
        <availability>
          <licence>
 
            <p><ref type="license"
            target="http://creativecommons.org/licenses/by-sa/4.0/"><graphic
            rend="border-width:0"
            url="https://i.creativecommons.org/l/by-sa/4.0/88x31.png"
            /></ref></p>

            <p>This ODD specification of a TEI-based schema for
            transcriptions of manuscripts, by C. M.  Sperberg-McQueen
            and the project Annotated Turki Manuscripts from the
            Jarring Collection Online (ATMO), is licensed under a <ref
            type="license"
            target="http://creativecommons.org/licenses/by-sa/4.0/">Creative
            Commons Attribution-ShareAlike 4.0 International
            License</ref>. Permissions beyond the scope of this
            license may be available at <ref
            target="http://uyghur.linguistics.indiana.edu/2015/atmo-licensing.xhtml"
            >http://uyghur.linguistics.indiana.edu/2015/atmo-licensing.xhtml</ref>.</p>

          </licence>
        </availability>
      </publicationStmt>
      <notesStmt>
        <note type="ns">http://uyghur.linguistics.indiana.edu/ns/ATMOv1</note>
      </notesStmt>
      <sourceDesc>
        <p>Created in electronic form; no source to describe.</p>
      </sourceDesc>
    </fileDesc>
    <revisionDesc>
      <change who="#CMSMcQ" when="2018-05-13">
	Mark as obsolete.
      </change>
      <change who="#CMSMcQ" when="2018-05-07">Got ODD processor
        working again. Added attributes, made content models
        for <gi>w</gi>, <gi>ow</gi>, and <gi>s</gi> work.
        The toy-att and toy-phr documents now parse.
      </change>
      <change who="#CMSMcQ" when="2018-05-06">Ran into ODD 
        processing challenges.</change>
      <change who="#CMSMcQ" when="2018-05-06">Complete the initial
        draft, now ready to begin testing the schema.</change>
      <change who="#CMSMcQ" when="2018-05-06">Copy editing, trying
        to get to the definitions of w and m.</change>
      <change who="#CMSMcQ" when="2018-05-05">Add description of
        tier-by-tier structure, add examples from 351, return to
        the long march through the schemaSpec elements.</change>
      <change who="#CMSMcQ" when="2018-05-02">Begin adjusting
      schemaSpec elements.</change>
      <change who="#CMSMcQ" when="2018-05-02">Merge old list of
      elements with new one.</change>
      <change who="#CMSMcQ" when="2018-05-02">Revamp overall
      structure.  (This is beginning to look like work avoidance.)</change>
      <change who="#CMSMcQ" when="2018-05-02">Revise discussion of
      levels (w, mw, ow, m, ...).</change>
      <change who="#CMSMcQ" when="2018-05-02">Begin by checking
      to-do list and revising (again).</change>
      <change who="#CMSMcQ" when="2018-05-01">Resume attempt
      to complete this schema.</change>
      <change who="#CMSMcQ" when="2017-10-13">Resume attempt
      to get some sort of quick and dirty version completed; this is
      even more fantastically long overdue today than it was a year
      ago.</change>
      <change who="#CMSMcQ" when="2016-11-04">Try to get some
      sort of quick and dirty version completed; this is fantastically
      long overdue.</change>
      <change who="#CMSMcQ" when="2016-06-06">Started initial version
      of this ODD document, adapted from
      http://uyghur.linguistics.indiana.edu/2015/12/transcr-v1.odd.xml.</change>
    </revisionDesc>
  </teiHeader>
  <text>
    <front>
      <titlePage>
        <docTitle>
          <titlePart>ATMO Basic markup: TEI customizations for text-
          and sentence-oriented markup</titlePart>
        </docTitle>

        <docAuthor>C. M. Sperberg-McQueen, Black Mesa Technologies LLC</docAuthor>
        <docDate>6 June 2016, last rev. 6 May 2018</docDate>
        <!--* <docDate>6 June 2016, rev. 13 October 2017</docDate> *-->
        <!--* <docDate>6 June 2016, rev. 4 November 2016</docDate> *-->
        <titlePart><idno>ATMO Technical report 2016-02</idno></titlePart>

      </titlePage>

      <div xml:id="navbar" type="navbar">
        <head>Nearby documents</head>
        <list>
          <!--* <item><xref href="online.html">Online interface to Thutmose II</xref></item>
      <item><xref href="progdoc.xml">Programmers' documentation</xref></item>
    *-->
          <item xml:id="siteroot"><ref target="./../..">Home</ref></item>
        </list>
        <divGen type="toc"/>
      </div>

    </front>
    <body>
      <note place="block">
	<p>Note: as of 13 May 2018, this document is obsolete; it has
	been replaced by <ref target="../../2018/05/atmo-schemas-PTS.xml">ATMO
	Technical Document 2018-02</ref>, which defines the current
	version of all three ATMO base schemas.</p>
      </note>
      
      <p>This document presents schemas for basic text- and
      sentence-oriented markup for transcriptions of digital
      facsimiles with and without linguistic annotation, developed for
      the project "Annotated Turki Manuscripts from the Jarring
      Collection Online".  The reader is assumed to have either a
      working knowledge of TEI markup, a willingness to learn about
      TEI markup by consulting the TEI <title>Guidelines</title>, or
      an unusual tolerance for tedium.  For the benefit of the first
      two classes of reader, references to TEI elements throughout
      this document are hyperlinked to their reference documentation
      on the TEI's web site; the reference documentation in turn is
      linked to the prose description of the element in context.
      Readers who wish to see how the TEI defines any element are
      expected to follow the hyperlinks; no attempt will be made to
      document any TEI elements here.
      </p>

      <note place="block">
        <p>In its current state, this document is incomplete. All the
          components for the base grammars are believed to be present
          and defined, but this belief has not yet been put to the test.
          For a sketch of current plans for completion of the document,
          see <ref target="#workplan">below</ref>.</p>
      </note>
 
      <note place="block"> 
        <p>Here and there in the text, non-TEI elements may be marked
        up in a way that does not distinguish them from TEI elements;
        they are then hyperlinked to the TEI site although they should
        not be, and those hyperlinks will fail.  When time allows, we
        will correct the markup; when even more time allows, we will
        make them link to the ATMO reference documentation.</p>
      </note>


      <div xml:id="design">
        <head>Design of this vocabulary</head>
        <p>This section describes [or: will when complete describe]
        the overall design of the ATMO text- and sentence-oriented
        markup whose schemas are defined in this document.  It also
        addresses a few specific technical questions that require
        special treatment.
        </p>
 
        <p><hi>This discussion needs to be expanded to be clearer to
        readers not already familiar with the material.  That is
        unfortunately not likely to happen soon.</hi></p>

        <div xml:id="design-context">
          <head>Project context</head>
          <p><hi>Discussion of project context here.</hi></p>
        </div>

        <div xml:id="text-and-sourceDoc">
          <head>Relation of ATMO transcription, text, and sentence vocabularies</head>
          <p>All ATMO documents begin life as transcriptions of
          manuscripts, using the ATMO TEI customization for
          transcriptions described in <ref
          target="../../2015/12/transcr-v1.odd.xml"
          >ATMO technical report 2015-03</ref>.  Such transcriptions use
          the TEI <gi>sourceDoc</gi> element and have a structure
          which mirrors the physical organization of a manuscript, with
          writing surfaces, zones of writing, and lines.
          </p>
          <p>Documents to be translated and annotated are then
          transduced into the TEI customizations described here, 
          which use the TEI <gi>text</gi> element and have
          structures based on the logical organization of the text,
          with sections, paragraphs, verse lines and line groups,
          and sentences.  There are two structures here, not one,
          because sentences generally nest within paragraphs,
          but do not necessarily nest neatly within (or neatly
          contain) verse structures.  So (oversimplifying slightly and
          leaving some details for later) we will define both a purely
          textual view, in which sentences sometimes cross structural
          boundaries, and a sentence view, in which textual structures
          sometimes cross sentence boundaries. 
          </p>
          <p>
            In order to support translation back and forth among the
            three forms without loss of information, each form needs
            to include the same information, covering the physical,
            the textual, and the sentential structure of the source,
            each conceived of as a hierarchical structure of nesting
            constituents.  How we represent these multiple structures
            in XML is described below.
          </p>
        </div>

        <div xml:id="design-concur">
          <head>Concurrent hierarchies</head>
          <p>We analyse our documents as having three distinct
          structures, each representable as a tree of nesting
          elements.  One describes the <term>physical</term> structure
          of pages, regions on pages, lines in regions, and characters
          within lines.  One describes what is sometimes called the
          <term>logical</term> structure of texts or collections of
          texts, paragraphs, blocks of verse, and phrase-level
          elements occurring within paragraphs and verses.  A third
          divides the text into <term>sentences</term> or
          sentence-like units, each sentence consisting of words.</p>
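          <p>For brevity we will refer to these as the P, L, and S
          structures.  The view in which the L structure is dominant is
          also called the text-oriented view, or view T, below.</p>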
          <!--*
          <p> 
            At this point, we confront the widely discussed problem of
            how to represent overlapping structures in XML.  Both the
            <gi>sourceDoc</gi> encoding and the <gi>text</gi> encoding
            identify a hierarchical structure in the source, with each
            structural unit in the source represented by an XML
            element in the XML document, and with most XML elements
            corresponding to structural units in the source.  The
            simple identification of structural units with XML
            elements, however, cannot be used for all three structures
            at the same time when the structures conflict in some
            cases (e.g.  paragraphs or sentences cross page
            boundaries, sentences cross verse boundaries).  We use
            empty elements to represent the information from the
            <soCalled>secondary</soCalled> hierarchy of the document.
            These empty elements are often called
            <soCalled>milestones</soCalled> (since like mile markers
            on a highway they mark boundaries between regions), but
            they are not to be confused with the TEI
            <gi>milestone</gi> element, which is a specific form of
            milestone marker and is just one of many elements used as
            milestone elements in the schemas defined here.  In
            transcriptions, the <gi>sourceDoc</gi>, <gi>zone</gi>,
            <gi>line</gi> and related elements are the
            <soCalled>primary</soCalled> or <term>dominant</term>
            hierarchy and most of the elements described here form the
            <soCalled>secondary</soCalled> or <term>recessive</term>
            hierarchy.  In the <gi>text</gi> hierarchies defined by this
            document, the elements <gi>sourceDoc</gi>, <gi>zone</gi>,
            <gi>line</gi>, etc. constitute a recessive hierarchy.
            </p>
            *-->
          <p>Each structure forms a tree, so each is individually
          suitable for representation using XML.  But the nodes
          in the three trees don't nest within each other:  we have
          three trees over essentially the same frontier of leaves,
          not one tree with elements of three different kinds.</p>
          <p>The data structure is essentially that defined by the
          SGML specification with its optional feature CONCUR.
          (See <ptr target="#hsm1998"/> for further discussion
          of CONCUR.)</p>
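          <p>The non-nesting can be made concrete with a minimal
          Python sketch.  It is an illustration only and not part of
          the ATMO tooling.  Each structure is modeled as a list of
          half-open ranges of positions in a shared sequence of words.
          The sketch reports the pairs of nodes, one from each
          structure, that partially overlap.  Treating the common
          frontier as a flat sequence of words, and the particular
          ranges used, are simplifying assumptions made for the
          example.</p>
          <eg><![CDATA[
```python
from typing import List, Tuple

# A node is a half-open range [start, end) of word positions.
Span = Tuple[int, int]


def overlaps_improperly(a: Span, b: Span) -> bool:
    """True if a and b share some words but neither contains the other."""
    (a0, a1), (b0, b1) = a, b
    disjoint = a1 <= b0 or b1 <= a0
    a_in_b = b0 <= a0 and a1 <= b1
    b_in_a = a0 <= b0 and b1 <= a1
    return not (disjoint or a_in_b or b_in_a)


def conflicts(tree_x: List[Span], tree_y: List[Span]) -> List[Tuple[Span, Span]]:
    """Pairs of nodes (one per tree) that could not nest in a single tree."""
    return [(x, y) for x in tree_x for y in tree_y if overlaps_improperly(x, y)]


# Hypothetical eight-word passage: two verse lines, three sentences.
lines = [(0, 4), (4, 8)]
sentences = [(0, 2), (2, 6), (6, 8)]
print(conflicts(lines, sentences))
# [((0, 4), (2, 6)), ((4, 8), (2, 6))]
```
]]></eg>
          <p>In this invented passage the second sentence runs across
          the boundary between the two verse lines.  Neither structure
          can therefore be nested inside the other, although each is a
          well-formed tree on its own.</p>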
        </div>

        <div xml:id="design-th">
          <head>Trojan-horse markup</head>
 
          <p>Since CONCUR has not been widely implemented for XML (see
          <ptr target="#schonefeld2006"/> and <ptr
          target="#schonefeld2007"/>), we use a different
          representation of our concurrent structures, adapting a
          markup idiom introduced by Steven J. DeRose (<ptr
          target="#DeRose2004"/>) called <term>Trojan-Horse
          markup</term>.</p>
          <p>In our XML representations of documents, we choose one of
          the structures to represent in the conventional way using
          XML elements (one XML element for each node in the tree
          structure).  The structure so represented we call the
          <term>dominant</term> tree.  The other trees are represented
          by Trojan-Horse (TH) elements; we call them
          <term>recessive</term> trees.
          </p>
          <p>It will be seen that we have our choice of three
          different views of the document, with different dominant
          trees: P, L, and S.  The three views are equivalent in the
          information they record, and we generate the desired view on
          demand.  (To reduce confusion, we normally treat the S view
          as the <soCalled>master copy</soCalled> and generate the
          other views from it as needed.  But some maintenance
          tasks are best performed in views P or L, and some
          presentations of documents on the Web are best based on
          views P and L.)</p>
 
          <p><hi>Examples needed.</hi></p>
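          <p>A minimal sketch of the idea, in Python, serializes a
          short passage in one direction and then recovers it:</p>
          <list>
            <item>Verse lines (<gi>l</gi>) form the dominant tree.</item>
            <item>Sentences (<gi>s</gi>) form a recessive tree, each
            marked by a pair of empty Trojan-Horse milestones.</item>
            <item>The sentence boundaries are then recovered from the
            milestones, to show that the round trip loses nothing.</item>
          </list>
          <p>The <att>th:sID</att> and <att>th:eID</att> attribute names
          are modeled on DeRose's proposal.  The namespace URI, the
          <gi>lg</gi> and <gi>w</gi> wrappers, the function names, and
          the word data are assumptions made for the example and need
          not match the ATMO schemas.</p>
          <eg><![CDATA[
```python
import xml.etree.ElementTree as ET
from typing import List, Tuple
from xml.sax.saxutils import escape

Span = Tuple[int, int]  # half-open range [start, end) of word positions

# Hypothetical namespace for Trojan-Horse attributes (not ATMO's actual URI).
TH_NS = "http://example.org/ns/trojan-horse"


def to_th_markup(words: List[str], lines: List[Span], sentences: List[Span]) -> str:
    """Verse lines become <l> elements (dominant); each sentence becomes a
    pair of empty <s> milestones carrying th:sID / th:eID (recessive).
    Assumes the lines cover every word and the sentences are non-empty
    and do not overlap one another."""
    starts = {s0: f"s{i}" for i, (s0, _) in enumerate(sentences, 1)}
    ends = {s1: f"s{i}" for i, (_, s1) in enumerate(sentences, 1)}
    out = [f'<lg xmlns:th="{TH_NS}">']
    for l0, l1 in lines:
        out.append("<l>")
        for pos in range(l0, l1):
            if pos in starts:
                out.append(f'<s th:sID="{starts[pos]}"/>')
            out.append(f"<w>{escape(words[pos])}</w>")
            if pos + 1 in ends:
                out.append(f'<s th:eID="{ends[pos + 1]}"/>')
        out.append("</l>")
    out.append("</lg>")
    return "".join(out)


def sentence_spans(markup: str) -> List[Span]:
    """Rebuild sentence ranges by pairing milestones and counting <w> elements."""
    sid, eid = f"{{{TH_NS}}}sID", f"{{{TH_NS}}}eID"
    open_at, spans, pos = {}, [], 0
    for el in ET.fromstring(markup).iter():  # visits elements in document order
        if el.tag == "w":
            pos += 1
        elif el.tag == "s" and sid in el.attrib:
            open_at[el.attrib[sid]] = pos
        elif el.tag == "s" and eid in el.attrib:
            spans.append((open_at.pop(el.attrib[eid]), pos))
    return spans


words = ["w1", "w2", "w3", "w4", "w5", "w6", "w7", "w8"]
lines = [(0, 4), (4, 8)]
sentences = [(0, 2), (2, 6), (6, 8)]
doc = to_th_markup(words, lines, sentences)
print(doc)
print(sentence_spans(doc) == sentences)  # True: the round trip is lossless
```
]]></eg>
          <p>Because the milestones are empty elements, the verse lines
          remain an ordinary XML tree.  The sentence tree survives only
          in the pairing of start and end milestones, which is why the
          recovery step matches identifiers rather than relying on
          nesting.</p>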
 
        </div>

        <div xml:id="design-validation">
          <head>Validating multiple concurrent structures</head>
 
          <p>Each distinct tree structure in the document can readily
          be validated using any document-oriented schema language
          (DTDs, Relax NG, XSD, ...).  If there are constraints on the
          interrelation of the trees (if an element E in tree X cannot
          begin within element F of tree Y, for example), then
          validation of those constraints requires either something
          like rabbit/duck grammars (<ptr target="#rd"/>) or a
          predicate-based schema language (like assertions in XSD or
          like Schematron) and some indeterminate amount of ingenuity
          in formulating predicates to check the constraints.</p>
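          <p>For example, consider a hypothetical constraint that no
          sentence may begin in the middle of a verse line.  It could be
          checked by a predicate like the one in the following Python
          sketch, which models each tree as a list of half-open ranges
          of word positions.  Both the constraint and the
          representation are assumptions chosen for illustration.  In
          practice such a check would more likely be written in
          Schematron or as XSD assertions.</p>
          <eg><![CDATA[
```python
from typing import List, Tuple

Span = Tuple[int, int]  # half-open range [start, end) of word positions


def begins_within(e_nodes: List[Span], f_nodes: List[Span]) -> List[Span]:
    """Return the E nodes (from tree X) whose start falls strictly inside
    some F node (from tree Y), i.e. not at the F node's first word."""
    return [e for e in e_nodes if any(f0 < e[0] < f1 for f0, f1 in f_nodes)]


# Hypothetical rule: a sentence (tree S) may not begin inside a verse line (tree L).
lines = [(0, 4), (4, 8)]
sentences = [(0, 2), (2, 6), (6, 8)]
print(begins_within(sentences, lines))  # [(2, 6), (6, 8)]
```
]]></eg>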
 
          <p>The ATMO project does not currently impose any
          inter-tree constraints; the trees are treated as completely
          independent of each other.</p>
 
          <p>It is simplest to define each tree structure
          independently, and validate it independently of the others:
          a pre-processor can strip all markup related to the
          recessive views before invoking a validator.</p>
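          <p>A pre-processor of this kind can be quite small.  The
          Python sketch below, which uses the standard library only,
          works in two steps:</p>
          <list>
            <item>It removes every empty Trojan-Horse milestone,
            identified here by a <att>th:sID</att> or
            <att>th:eID</att> attribute.</item>
            <item>It drops any remaining attributes in the
            Trojan-Horse namespace, such as <att>th:doc</att>.</item>
          </list>
          <p>This leaves only the dominant tree for validation against
          the base schema.  Text that follows a removed milestone is
          preserved.  The namespace URI and the milestone attribute
          names are hypothetical.</p>
          <eg><![CDATA[
```python
import xml.etree.ElementTree as ET

# Hypothetical namespace and milestone attributes; ATMO's may differ.
TH_NS = "http://example.org/ns/trojan-horse"
TH_PREFIX = f"{{{TH_NS}}}"
MILESTONE_ATTS = {TH_PREFIX + "sID", TH_PREFIX + "eID"}


def _remove_keeping_tail(parent: ET.Element, child: ET.Element) -> None:
    """Delete child, attaching its tail text to the preceding node."""
    tail = child.tail or ""
    idx = list(parent).index(child)
    if idx == 0:
        parent.text = (parent.text or "") + tail
    else:
        prev = parent[idx - 1]
        prev.tail = (prev.tail or "") + tail
    parent.remove(child)


def strip_recessive(root: ET.Element) -> ET.Element:
    """Remove TH milestones and TH attributes, leaving only the dominant tree."""
    for parent in list(root.iter()):
        for child in list(parent):
            if MILESTONE_ATTS.intersection(child.attrib):
                _remove_keeping_tail(parent, child)
    for el in root.iter():
        for name in [a for a in el.attrib if a.startswith(TH_PREFIX)]:
            del el.attrib[name]
    return root


src = (f'<lg xmlns:th="{TH_NS}" th:doc="T"><l><s th:sID="s1"/><w>a</w> '
       f'<w>b</w><s th:eID="s1"/></l></lg>')
print(ET.tostring(strip_recessive(ET.fromstring(src)), encoding="unicode"))
# <lg><l><w>a</w> <w>b</w></l></lg>
```
]]></eg>
          <p>The sketch finds elements to delete by their milestone
          attributes rather than by membership in the Trojan-Horse
          namespace as such.  This matters because, in the augmented
          grammars, dominant-view elements may themselves carry
          Trojan-Horse attributes such as <att>th:doc</att>.</p>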
 
          <p>But for editing documents in conventional XML editors, a
          schema that describes just the dominant tree in the current
          document view does not suffice: we need a schema which also
          describes and accepts the Trojan-Horse elements needed for
          the recessive views.  Because there are no inter-tree
          constraints, the TH-elements for recessive views can occur
          at any point in any element of the dominant view.</p>
        </div>
 
        <div xml:id="design-whatshere">
          <head>The schemas defined here</head>
          <p>In the light of the preceding discussion, it will be
          clear that we need several schemas, of different kinds:
          one <term>base</term> schema for each view (P, L, S),
          and, for each view, one <term>augmented</term> schema,
          which adds declarations for TH-elements for the recessive
          views and modifies the content models of the base schema
          to allow TH-elements at any location.</p>
          <p>The base schema for view P is defined in <ref
          target="../../2015/12/transcr-v1.odd.xml"
          >ATMO technical report 2015-03</ref>; the augmented schema
          for P will be added there soon.</p>
          <p>This document defines first the base schemas and
          then the augmented schemas for views L and S.</p>
          <p>Views L and S share many elements.  In fact, as far as is
          now known, their structures fail to nest only in verse texts,
          since sentences do not necessarily nest within verse
          structures.</p>
          <p>Some complications also arise for the augmented
          schemas from the fact that all three views share the same
          header, and may share other elements.  Conceptually, the
          universe of all elements should be the same for all
          schemas; it is the union of four sets of elements:
            <list>   
              <item>
                <p><ident>H</ident>: <gi>teiHeader</gi> and all of its possible
                  descendants (this set should be the same in
                  all ATMO schemas), together with <gi>TEI</gi> itself.
                </p>
              </item>
              <item>
                <p><ident>P</ident>: <gi>sourceDoc</gi> and all of its possible
                descendants in the base grammar for the page-oriented view</p>
              </item>
              <item>
                <p><ident>T</ident>: <gi>text</gi> and all of its possible
                  descendants in the text-oriented view</p>
              </item>
              <item>
                <p><ident>S</ident>: <gi>text</gi> and all of its possible
                  descendants in the sentence-oriented view</p>
              </item>
            </list>
          </p>
          <p>Some elements must be declared as conventional content
            elements.
            Some elements must be declared as Trojan-Horse elements.
            Some elements must be declared as both; within the header
            they will appear as conventional elements, and within the
            text or source-document they will appear as Trojan-Horse
            elements.  The rules are:
            <list>
              <item>
                <p>Any element type in <ident>H</ident> must be able to appear as a
                  conventional content element.</p>
              </item>
              <item>
                <p>Any element type in the set associated with the
                  dominant view must be able to appear as a conventional
                  content element, and will in fact never appear as a
                  Trojan-Horse element.</p>
              </item>
              <item>
                <p>Any element type not in the set for the dominant view
                  but in the set for a recessive view must be able to appear as a
                  Trojan-Horse element.</p>
              </item>
            </list>
            It follows that in the augmented grammars, elements must
            be declared as follows:
            <list>
              <item>
                <p>In grammar PTS (augmented grammar for P):
                <list>
                  <item>
                      <p>Elements in <ident>P</ident> are declared as
                        content elements, with the addition of Trojan
                        Horse attributes (e.g. <att>th:doc</att>, which
                        indicates the document type of the element;
                        strictly speaking this is redundant but it
                        simplifies processing). </p>
                      <p>For elements in
                          <ident>P</ident> &intersect; <ident>H</ident>
                        (that is, members of both sets), the Trojan-Horse
                        attributes are optional.</p>
                      <p>For elements in <ident>P</ident> but not in
                          <ident>H</ident>, the Trojan-Horse attributes
                        are required.</p>                    
                  </item>
                  <item>
                    <p>Elements in either recessive set
                      (<ident>T</ident>,
                      <ident>S</ident>) which are neither in 
                      <ident>P</ident> nor in <ident>H</ident>
                      are declared as Trojan-Horse elements:
                      required to be empty, and required to
                      carry appropriate Trojan-Horse attributes.</p>
                    <p>Elements in either recessive set
                        (<ident>T</ident>,
                        <ident>S</ident>) which are in <ident>H</ident>
                        but not <ident>P</ident>
                        are declared as a choice between
                        content elements (with no Trojan Horse
                        attributes) or empty Trojan Horse elements
                        (with required Trojan-Horse attributes).</p>
                  </item>
                  <item>
                    <p>Elements in <ident>H</ident> not already
                      described are declared as content elements,
                      with no Trojan-Horse attributes.</p>
                  </item>
                </list>
                </p>
              </item>
            </list>
          </p>
          <p>Or in summary form:
            <list>
              <item>
                <p><ident>P</ident> &union; (<ident>H</ident> \
                  (<ident>T</ident> &union; <ident>S</ident>)): 
                  content-element declaration.</p>
              </item>
              <item>
                <p>(<ident>T</ident> &union; <ident>S</ident>) \
                  (<ident>P</ident> &union; <ident>H</ident>): 
                  Trojan-Horse declaration.</p>
              </item>
              <item>
                <p>(<ident>H</ident> &intersect;
                  (<ident>T</ident> &union; <ident>S</ident>))
                  \ <ident>P</ident>:
                  both content-element and Trojan-Horse declarations.</p>
              </item>
            </list>
            And similarly (mutatis mutandis) for the other two augmented
            grammars.
          </p>
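          <p>The same partition can be computed mechanically from the
          element sets.  The following Python sketch is written for an
          arbitrary dominant view, so it covers all three augmented
          grammars.  Here <ident>d</ident> is the dominant view's
          text-area set, <ident>r</ident> the union of the two recessive
          views' text-area sets, and <ident>h</ident> the header set.
          The function name and the element names assigned to each set
          are a small hypothetical sample, not the real ATMO
          inventories.</p>
          <eg><![CDATA[
```python
from typing import Dict, Set


def classify(d: Set[str], r: Set[str], h: Set[str]) -> Dict[str, Set[str]]:
    """Split element names by the kind of declaration they need in an
    augmented grammar.

    d -- elements of the dominant view's text area
    r -- union of the elements of the two recessive views' text areas
    h -- teiHeader, its possible descendants, and TEI
    """
    return {
        "C": d.union(h.difference(r)),            # content element only
        "TH": r.difference(d.union(h)),           # Trojan-Horse element only
        "C+TH": h.intersection(r).difference(d),  # either form
    }


# Hypothetical samples of the four sets, used here for grammar PTS.
H = {"TEI", "teiHeader", "title", "p", "date", "name"}
P = {"sourceDoc", "surface", "zone", "line"}
T = {"text", "body", "div", "p", "lg", "l", "date", "name"}
S = {"text", "body", "div", "p", "s", "w", "name"}

groups = classify(P, T.union(S), H)
for kind in ("C", "TH", "C+TH"):
    print(kind, sorted(groups[kind]))

# The three groups are disjoint and together cover every element.
kinds = list(groups.values())
assert all(a.isdisjoint(b) for i, a in enumerate(kinds) for b in kinds[i + 1:])
assert set().union(*kinds) == P.union(H, T, S)
```
]]></eg>
          <p>Passing the P set as <ident>d</ident>, as here, gives
          grammar PTS.  The other two augmented grammars follow by
          passing the text-oriented or sentence-oriented set as
          <ident>d</ident> and the union of the other two text-area sets
          as <ident>r</ident>.</p>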
          <p>A Venn diagram may be helpful here. Here <ident>D</ident>
            is the set of elements available in the text area of the
            dominant view, <ident>R</ident> the union of the elements in
            the text areas in the two recessive views, and
              <ident>H</ident> the header, as above.  The
            areas within the Venn diagram are labeled C if those
            elements get conventional content-element declarations,
            TH if they get Trojan-Horse declarations, and C+TH
            if they should be able to take either form.
            <figure>
              <graphic url="images/DRH.png" rend="foo"/>
              <p>A Venn diagram showing the intersections of the
              various sets of elements and how they are to be declared.</p>
            </figure>
          </p>
        </div>
 
        <div xml:id="qs">
          <head>Open questions</head>
          <p>Some design decisions remain open.</p>
          <list>
            <item><p>Does Oxygen deal acceptably well with Perso-Arabic
            script in attribute values?  If so, we can have <gi>m</gi>, etc.,
            be empty elements carrying all information in attributes.
            If not, all or some of those attributes need to be brought out
            into sub-elements.</p></item>
            <item><p>How should the synchronization of base text and
            extensive commentary (as in Prov. 450) be managed?  (Do we
            have examples like that among our transcripts?)</p></item>
            <item><p>How should text movement between views be managed?
            Are there in fact any headings added in the margin?  If so, view
            P will want them in one place and views T and S in another.</p></item>
          </list>
        </div>
 
      </div>
 
      <div xml:id="docstruc">
        <head>Basic structure of documents</head> 
        <p>We follow the basic structure defined by TEI: texts have a
        <gi>body</gi> optionally preceded and followed by
        <gi>front</gi> and <gi>back</gi> matter.</p>
        <p>Special material in the front matter may include a title
        page (<gi>titlePage</gi>) or similar area for title, author, and similar
        information &mdash; it's not required that it be a separate
        physical page &mdash; with specialized markup for document
        title, author, and date.</p>
        <p>The body of the text is optionally divided into recursive
        sections (<gi>div</gi>), which may have optional headings
        (<gi>head</gi>) as well as other specialized material
        at the beginning or ending (<gi>opener</gi>, <gi>closer</gi>,
        <gi>epigraph</gi>, <gi>signed</gi>, etc.)</p>
        <p>The text itself is preceded in the TEI document by a
        header with metadata.</p>
        <p>The text may be an anthology, in which case its
        <gi>body</gi> element is replaced by a <gi>group</gi> element
        which in turn contains other texts (marked up as <gi>text</gi>
        elements).  (Embedded texts can also be marked with
        <gi>text</gi>, but most TEI users ignore the nesting rather
        than use nested <gi>text</gi> elements.)</p>
 
        <div xml:id="startingpoint">
          <head>The overall structure of the schema</head>
          <p>The schemas for view T and view S are both
          essentially a list of instructions to include particular
          TEI modules or declare new elements.  They share
          a large common core, but differ in some details.
          </p>
          <schemaSpec ident="T-base" docLang="en" prefix="tei_" xml:lang="en">
            <specGrpRef target="#Common-Core"/>
            <specGrpRef target="#Verse-dominant"/>
            <!--* analysis and annotation *-->
            <specGrpRef target="#Words-and-Morphs"/>
            <specGrpRef target="#Phrases"/>
            <specGrpRef target="#TEI-modifications-grammar-T"/>
          </schemaSpec>
 
          <schemaSpec ident="S-base" docLang="en" prefix="tei_" xml:lang="en">
            <specGrpRef target="#Common-Core"/>
            <!--* analysis and annotation *--> 
            <specGrpRef target="#Sentences"/>
            <specGrpRef target="#Words-and-Morphs"/>
            <specGrpRef target="#Phrases"/>
            <specGrpRef target="#TEI-modifications-grammar-S"/> 
          </schemaSpec>
 
          <specGrp xml:id="Common-Core">
            <moduleRef key="tei" except="" />
            <moduleRef key="msdescription" except=""/>
 
            <!--* coarse structure *-->
            <specGrpRef target="#header-inclusion"/>
            <specGrpRef target="#text-structure-inclusion"/>
 
            <!--* phrases *-->
            <specGrpRef target="#core-inclusion"/>
            <specGrpRef target="#gaiji-inclusion"/>
            <specGrpRef target="#figures-inclusion"/>
            <specGrpRef target="#linking-inclusion"/>
            <specGrpRef target="#names-dates-inclusion"/>
            <specGrpRef target="#primary-sources-inclusion"/>
            <specGrpRef target="#verse-common"/>
          </specGrp>
          <p>Excluded from this list are the modules certainty,
          corpus, dictionaries, drama, iso-fs, nets, spoken, tagdocs,
          and textcrit.
          </p>
          <p>For the moment, we include all the elements of the 
          manuscript description module, even though they are used
          only in the TEI header.  As a task for the future,
          we should remove any elements not needed in the records
          for our manuscripts, so that there is less clutter in menus
          of possible elements.
          </p>
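          <p>Because schema specifications are assembled by reference,
          it can be useful to check which modules a given
          <gi>schemaSpec</gi> actually pulls in.  The Python sketch
          below follows <gi>specGrpRef</gi> links recursively, expanding
          each group at most once to guard against cycles, and lists
          the <att>key</att> of every <gi>moduleRef</gi> it reaches.  It
          runs on an abridged, hypothetical fragment.  The shape of
          <ident>T-base</ident>, <ident>S-base</ident>,
          <ident>Common-Core</ident>, and <ident>header-inclusion</ident>
          follows this section, but most groups are omitted.  The
          module keys shown for <ident>Verse-dominant</ident> and
          <ident>Sentences</ident>, whose definitions are not given
          here, are placeholders.  The function names are likewise
          hypothetical.</p>
          <eg><![CDATA[
```python
import xml.etree.ElementTree as ET
from typing import Dict, Iterator, List, Optional, Set

TEI = "{http://www.tei-c.org/ns/1.0}"
XML_ID = "{http://www.w3.org/XML/1998/namespace}id"


def expand(node: ET.Element, groups: Dict[str, ET.Element],
           seen: Optional[Set[str]] = None) -> Iterator[str]:
    """Yield moduleRef keys reachable from node via specGrpRef links."""
    seen = set() if seen is None else seen
    for child in node:
        if child.tag == TEI + "moduleRef":
            yield child.get("key")
        elif child.tag == TEI + "specGrpRef":
            target = child.get("target").lstrip("#")
            if target not in seen:  # each group is expanded at most once
                seen.add(target)
                yield from expand(groups[target], groups, seen)


def modules_of(odd_text: str, ident: str) -> List[str]:
    """List the moduleRef keys a named schemaSpec includes, in order."""
    root = ET.fromstring(odd_text)
    groups = {g.get(XML_ID): g for g in root.iter(TEI + "specGrp")}
    spec = next(s for s in root.iter(TEI + "schemaSpec") if s.get("ident") == ident)
    return list(expand(spec, groups))


# Abridged, hypothetical fragment; the last two keys are placeholders.
ODD = """<div xmlns="http://www.tei-c.org/ns/1.0">
  <schemaSpec ident="T-base">
    <specGrpRef target="#Common-Core"/><specGrpRef target="#Verse-dominant"/>
  </schemaSpec>
  <schemaSpec ident="S-base">
    <specGrpRef target="#Common-Core"/><specGrpRef target="#Sentences"/>
  </schemaSpec>
  <specGrp xml:id="Common-Core">
    <moduleRef key="tei"/><specGrpRef target="#header-inclusion"/>
  </specGrp>
  <specGrp xml:id="header-inclusion"><moduleRef key="header"/></specGrp>
  <specGrp xml:id="Verse-dominant"><moduleRef key="placeholder-T"/></specGrp>
  <specGrp xml:id="Sentences"><moduleRef key="placeholder-S"/></specGrp>
</div>"""

print(modules_of(ODD, "T-base"))  # ['tei', 'header', 'placeholder-T']
print(modules_of(ODD, "S-base"))  # ['tei', 'header', 'placeholder-S']
```
]]></eg>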
        </div>
        <div xml:id="teiHeader">
          <head>The header module</head>
          <p>We include the header module: 
            <specGrp xml:id="header-inclusion">
              <moduleRef key="header"
                except="refsDecl cRefPattern refState
                correspAction correspContext correspDesc 
                listPrefixDef namespace prefixDef xenoData
                typeNote"
              />
            </specGrp>
          </p>
          <p>From the header module we exclude declarations for several elements we expect never to
            use: <list>
              <item>
                <p>The <gi>correspAction</gi>, <gi>correspContext</gi>, and <gi>correspDesc</gi>
                  elements all relate to collections of letters.</p>
              </item>
              <item>
                <p>The <gi>prefixDef</gi>, <gi>listPrefixDef</gi>, 
                  <gi>namespace</gi>, and <gi>xenoData</gi> elements all
                  relate to technical capabilities of TEI markup which
                  we are not using at the moment.
                </p>
              </item>
              <item>
                <p>
                  The
                  <gi>typeNote</gi> element describes a typeface or other
                  significant typographic feature of a printed resource;
                  it is irrelevant to manuscripts.
                </p>
              </item>
            </list>
          </p>
 
          <p>It may perhaps be more useful to identify the elements in the TEI header module which
            are included, rather than just those which are excluded. 
            These fall into several classes. Those we expect to need most frequently are
            the standard parts of a TEI header and elements we know we will need:
            <list>
              <item>The <gi>teiHeader</gi> encloses the header.</item>
              <item>The <gi>fileDesc</gi>,
                <gi>authority</gi>,
                <gi>availability</gi>,
                <gi>distributor</gi>,
                <gi>edition</gi>,
                <gi>editionStmt</gi>,
                <gi>extent</gi>,
                <gi>funder</gi>,
                <gi>idno</gi>,
                <gi>licence</gi>,
                <gi>notesStmt</gi>,
                <gi>principal</gi>,
                <gi>publicationStmt</gi>,
                <gi>seriesStmt</gi>,
                <gi>sponsor</gi>,
                and 
                <gi>titleStmt</gi>
                elements are all standard elements for the bibliographic
                description of the TEI document itself.</item>
 
              <item>The <gi>sourceDesc</gi> element will contain
                the description of the source manuscript in the
                form of an <gi>msDesc</gi> element.</item>
 
              <item><gi>handNote</gi> is used to describe the hands of
                a manuscript.</item>
 
              <item>The <gi>encodingDesc</gi>, <gi>projectDesc</gi>,
                and <gi>samplingDecl</gi> elements provide basic
                information about the preparation of the digital
                facsimile.  The <gi>rendition</gi> element can be used
                to document special values of the <att>rend</att>
                attribute; the <gi>styleDefDecl</gi> element can be
                used to identify a formal styling language (like CSS)
                used in the definition of rendering styles.
              </item> 
              <item> 
                <p>The <gi>refsDecl</gi>, <gi>cRefPattern</gi>, and
                <gi>refState</gi> elements relate to machinery for
                canonical references; we expect to provide canonical
                numberings for sentences in annotated texts and expect
                to use these to document them.</p>
              </item>
 
              <item>The <gi>revisionDesc</gi>,
                <gi>change</gi>, and
                <gi>listChange</gi> elements are used to record the history of the
                TEI document.
              </item>
 
              <item><gi>appInfo</gi> and <gi>application</gi> allow the encoding description
                to include explicit records of automatic processing of the document by software; we
                will use these elements to record the translation of the manuscript descriptions from
                the Master DTD to TEI P5 and the preparation of the scanned images in JPEG form
                from the TIFFs supplied by LUB.</item>
 
 
            </list>
          </p>
 
          <p>Some elements we do not expect to need very often, but include in the schema
            in case we need them, on the theory that having unused elements in the schema
            will cause fewer headaches during the project than lacking an element we need.
            <list>
              <item>The <gi>biblFull</gi> element may be needed for bibliographic data
                within the annotations in the header.</item>
 
              <item>The <gi>profileDesc</gi>,
                <gi>calendar</gi>,
                <gi>calendarDesc</gi>,
                <gi>catRef</gi>,
                <gi>classCode</gi>,
                <gi>creation</gi>,
                <gi>keywords</gi>,
                <gi>langUsage</gi>,
                <gi>language</gi>, and
                <gi>textClass</gi>
                elements describe the text(s) contained in the manuscript, in general
                terms. We will not use them in the initial description of any manuscript,
                but if we enhance the metadata for any manuscript, these elements are likely to 
                be useful.</item>
              <item>In the <gi>encodingDesc</gi> element, the 
                <gi>catDesc</gi>,
                <gi>category</gi>,
                <gi>classDecl</gi>,
                <gi>geoDecl</gi>,
                <gi>tagUsage</gi>,
                <gi>tagsDecl</gi>, and
                <gi>taxonomy</gi>
                elements may be useful for enriched manuscript descriptions; they won't
                be part of the initial representation.
              </item> 

              <item>The <gi>editorialDecl</gi>,
                <gi>correction</gi>,
                <gi>hyphenation</gi>,
                <gi>interpretation</gi>,
                <gi>normalization</gi>,
                <gi>punctuation</gi>,
                <gi>quotation</gi>,
                <gi>segmentation</gi>, and
                <gi>stdVals</gi> elements are all relevant for 
                transcriptions, though not for digital facsimiles.
                TODO:  include them!
              </item>
            </list>
          </p>
 
          <p>Some elements should perhaps be suppressed; further thought is needed.
            <list> 
              <item><gi>abstract</gi></item>
              <item><gi>scriptNote</gi></item>
            </list>
          </p>
        </div>
 
        <div xml:id="text-structure-exclusions">
          <head>The text-structure module</head>
          <p>The basic text-structure module defines elements used in structuring the text, such as
            <gi>div</gi> and related elements; it also includes title pages and material that often
            appears there, as well as the essential <gi>TEI</gi> element.
            <specGrp xml:id="text-structure-inclusion">
              <!--* <moduleRef key="textstructure"
                except="div1 div2 div3 div4 div5 div6 div7 
                floatingText imprimatur 
                salute"
                /> *-->
 
              <moduleRef key="textstructure"
                         include="TEI text front body back group div
                                  titlePage byline docAuthor docTitle
                                  docDate titlePart
                                  opener epigraph closer trailer signed postscript
                                  "
              />
          </specGrp></p>
          <p>We are thus omitting: <gi>argument</gi>, <gi>dateline</gi>,
          <gi>div1</gi>,
          <gi>div2</gi>, <gi>div3</gi>, <gi>div4</gi>, <gi>div5</gi>,
          <gi>div6</gi>, <gi>div7</gi>, <gi>docEdition</gi>,
          <gi>docImprint</gi>, <gi>floatingText</gi>,
          <gi>imprimatur</gi>, and <gi>salute</gi>.
          </p> 
        </div>
      </div>

      <div xml:id="logical">
        <head>Paragraph-, phrase-, and character-level units</head>
 
        <div xml:id="core-module">
          <head>The core module</head>
 
         <p>From the core module, we include a number of elements
         for use in transcription and annotation of the text.  First,
         elements important for generic text structure:
                <gi>head</gi>,
                <gi>l</gi>,
                <gi>lg</gi>,
                <gi>list</gi>,
                <gi>item</gi>,
                <gi>note</gi>,
                <gi>p</gi>,
                <gi>sp</gi>, and
                <gi>speaker</gi>.
         </p>
         <specGrp xml:id="core-inclusion"> 
           <moduleRef key="core"
                      include="head list item note p sp
                               speaker"/>
           <specGrpRef target="#core-inscription"/>
           <specGrpRef target="#core-semantic-phrases"/>
           <specGrpRef target="#core-other"/>
         </specGrp>
          <p>Verse (<gi>lg</gi> and <gi>l</gi>) is included
          in view T.</p>
          <specGrp xml:id="Verse-dominant"> 
            <moduleRef key="core"
              include="l lg"/>
          </specGrp> 
         <p>Then, phrase-level elements for signaling properties
         of the inscription and/or the transcription: 
                <gi>add</gi>,
                <gi>corr</gi>,
                <gi>gap</gi>,
                <gi>del</gi>,
                <gi>hi</gi>, and
                <gi>unclear</gi>.
         </p>
         <specGrp xml:id="core-inscription"> 
           <moduleRef key="core"
                      include="add corr del gap hi unclear"/>
         </specGrp>
         <p>A number of phrase-level elements signal specific
         semantic or linguistic properties of a passage: 
                <gi>date</gi>,
                <gi>distinct</gi>,
                <gi>emph</gi>,
                <gi>foreign</gi>,
                <gi>gloss</gi>,
                <gi>mentioned</gi>,
                <gi>name</gi>,
                <gi>num</gi>,
                <gi>ptr</gi>,
                <gi>q</gi>,
                <gi>ref</gi>,
                <gi>rs</gi>,
                <gi>term</gi>, and
                <gi>title</gi>.
         </p>
         <specGrp xml:id="core-semantic-phrases"> 
           <moduleRef key="core"
                      include="date distinct emph foreign gloss
                               mentioned name num ptr q ref rs term title"/>
         </specGrp>
         <p>Finally, we also include 
                <gi>index</gi> and
                <gi>milestone</gi>.
         </p>
          <p>We include the following elements not for use within
          the text but because they are used in existing headers:
                <gi>author</gi>,
                <gi>pubPlace</gi>,
                <gi>resp</gi>, and
                <gi>respStmt</gi>.
                <!--*
                    <gi n="35">author</gi> 
                    <gi n="35">date</gi> 
                    <gi n="27">hi</gi>
                    <gi n="70">name</gi> 
                    <gi n="32">note</gi> 
                    <gi n="447">p</gi> 
                    <gi n="35">pubPlace</gi>
                    <gi n="27">q</gi> 
                    <gi n="3">quote</gi> 
                    <gi n="105">resp</gi> 
                    <gi n="105">respStmt</gi> 
                    <gi n="35">textLang</gi> 
                    <gi n="70">title</gi>
                    *-->
          </p>
         <specGrp xml:id="core-other"> 
           <moduleRef key="core"
                      include="index milestone author pubPlace resp respStmt"/>
         </specGrp>

          <p>In the version of the TEI used as the basis for this
          design, this means we are omitting the following
          elements from the core module:
                <gi>abbr</gi>
                <gi>addrLine</gi>
                <gi>address</gi>
                <gi>analytic</gi>
                <gi>bibl</gi>
                <gi>biblScope</gi>
                <gi>biblStruct</gi>
                <gi>binaryObject</gi>
                <gi>cb</gi>
                <gi>choice</gi>
                <gi>cit</gi>
                <gi>citedRange</gi>
                <gi>desc</gi>
                <gi>divGen</gi>
                <gi>editor</gi>
                <gi>email</gi>
                <gi>expan</gi>
                <gi>gb</gi>
                <gi>graphic</gi>
                <gi>headItem</gi>
                <gi>headLabel</gi>
                <gi>imprint</gi>
                <gi>label</gi>
                <gi>lb</gi>
                <gi>listBibl</gi>
                <gi>measure</gi>
                <gi>measureGrp</gi>
                <gi>media</gi>
                <gi>meeting</gi>
                <gi>monogr</gi>
                <gi>orig</gi>
                <gi>pb</gi>
                <gi>postBox</gi>
                <gi>postCode</gi>
                <gi>publisher</gi>
                <gi>quote</gi>
                <gi>reg</gi>
                <gi>relatedItem</gi>
                <gi>said</gi>
                <gi>series</gi>
                <gi>sic</gi>
                <gi>soCalled</gi>
                <gi>stage</gi>
                <gi>street</gi>
                <gi>teiCorpus</gi>
                <gi>textLang</gi>
                <gi>time</gi>
          </p>
        </div>
 
 
        <div xml:id="names-dates-exclusions">
          <head>The names and dates module</head>
          <p>We include the module for names and dates because it includes elements for country and
          city (<gi>settlement</gi>), which we need. We exclude most
          of the rest of it.</p>
          <specGrp
              xml:id="names-dates-inclusion">
            <moduleRef key="namesdates"
                       include="persName placeName country
                                settlement orgName"/>
          </specGrp>
          <p>The elements <gi>country</gi>,
          <gi>settlement</gi>, and <gi>orgName</gi>
          are included primarily because they are used in
          the headers (in the manuscript descriptions).</p>
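          <p>As a hypothetical illustration, using placeholder values
          rather than any actual project record, <gi>country</gi> and
          <gi>settlement</gi> would typically appear in the
          <gi>msIdentifier</gi> of a manuscript description, alongside
          elements from other modules:
          <eg><![CDATA[
            <msDesc>
              <msIdentifier>
                <!-- place elements from the names and dates module -->
                <country>...</country>
                <settlement>...</settlement>
                <!-- repository (manuscript description module)
                     and idno (header module) -->
                <repository>...</repository>
                <idno>...</idno>
              </msIdentifier>
              ...
            </msDesc>
          ]]></eg>
          </p>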
        </div>
 
        <div xml:id="primary-sources-module">
          <head>The primary sources module</head>
          <p>We include the module for primary sources
          in order to get the elements <gi>damage</gi>,
          <gi>fw</gi>, <gi>metamark</gi>, and <gi>handShift</gi>.
          The other elements of the module are excluded for now;
          note that <gi>space</gi>, which we may also need, is not
          currently in the list.
          <specGrp
              xml:id="primary-sources-inclusion">
            <moduleRef key="transcr"
                       include="damage fw metamark handShift"
                       /> 
          </specGrp>
          </p> 
        </div>

        <div xml:id="misc-modules">
          <head>Miscellaneous modules</head>

          <p>A few modules are included for the sake of a relatively
          small number of elements from each module.</p> 
          <p>We include the gaiji module to allow the encoding of non-Unicode
          characters, if and when we encounter them.</p>
          <specGrp
              xml:id="gaiji-inclusion">
            <moduleRef key="gaiji" />
          </specGrp>

          <p>We include the figures and tables module because
          documentation of non-Unicode glyphs requires
          <gi>figure</gi>,
          and there is tabular material in some of our manuscripts.
          (There may also be drawings which should be included
          via <gi>figure</gi>.)</p>
          <specGrp
              xml:id="figures-inclusion">
            <moduleRef key="figures" except="notatedMusic formula" />
          </specGrp>

          <p>We include the module for linking in order to get the
          elements <gi>seg</gi> and <gi>ab</gi>.
          We exclude the other elements in the module,
          at least for now.</p>
          <specGrp xml:id="linking-inclusion">
            <moduleRef key="linking"
                       include="ab seg"
                       />
          </specGrp>
 
          <p>The <gi>caesura</gi> element from the verse module is included
            in all views.  (It's empty and cannot conflict with any
            hierarchy.)</p>
          <specGrp xml:id="verse-common"> 
            <moduleRef key="verse"
              include="caesura"/>
          </specGrp> 
        </div>
      </div>
 
      <div xml:id="pos-annotation">
        <head>Linguistic annotation</head>

        <p>This section describes markup specifically designed for
        linguistic annotation; the elements described here are all
        part of the primary tree in the S view of a document. Some of
        the elements (<hi>specific list to be determined</hi>) will
        be recessive in the P and T views; some (including <gi>w</gi>
        for tagging individual words) will be common to all views
        and thus dominant in all views.</p>
 
        <p>We model a number of elements on their analogs in the
          TEI module for analysis and interpretation, notably
          <gi>s</gi> for sentences, <gi>w</gi> for words, and
          <gi>m</gi> for morphemes or other sub-word units.  </p>
        <p>But we do not in fact include the TEI module for 
          analysis and interpretation; instead, we define 
          analogous elements in the ATMO namespace.</p>

 
        <div xml:id="pos-problem">
          <head>Project requirements for linguistic annotation</head>
 
          <p>One of the principal aims of the project is to provide
            detailed linguistic annotation, word by word and morpheme by
            morpheme, of selected texts.</p>
 
          <p>Note, however, that resource limitations make it impossible
            to annotate every document, and the fact that we use human
            taggers means that even documents which are being annotated
            will need to be available (and valid) before the annotation is
            complete.  So the schemas defined here need to work for
            documents that do not have such annotation, as well as for
            documents that do have it, and for documents which are
            partially annotated.</p>
 
          <p>The problem we need to solve in this part of this
          document is to construct an XML representation of markup for
          the internal structure of words which satisfies the following
          requirements and desiderata as fully and as simply as
          possible.  (See also <ref
          target="../../2017/06/pos-tags/index.xhtml">ATMO Technical
          Report 2017-01</ref> for a discussion of the annotation from
          a purely linguistic point of view.)
          <list>
            <item>
              <p>The markup should identify segments of type word,
              affix, and clitic.</p>
              <p>To simplify the overall markup, any parts of words not
              marked as affixes or clitics will also be marked up as
              segments.  (In theory, segments which are neither words
              nor affixes might be left untagged; in practice, it is
              simpler if they are tagged.)</p>
              <p>For simplicity of exposition, each of the smallest
              segments identified will be called a morph in this
              discussion. For affixes and clitics, the term
              <mentioned>morpheme</mentioned> would be appropriate
              enough, but <mentioned>morpheme</mentioned> and even
              <mentioned>morph</mentioned> may be a stretch for word stems
              which themselves are composed of several morphemes; this
              can happen when an annotation aims to record the
              inflectional morphology of the text, but not the
              derivational morphology.  The patience of the reader is
              requested for this terminological awkwardness.  The
              generic <mentioned>segment</mentioned> may be used
              instead even though it is not in general restricted to
              the smallest (atomic) units identified by segmentation.</p>
              <p>The examples below typically use the following
              elements with the meanings indicated:
              <list>
                <item><p><gi>w</gi> marks segments identified as words.</p></item>
                <item><p><gi>m</gi> marks segments identified as morphs,
                typically (but not necessarily) individual morphemes;
                the use of <gi>m</gi> does not imply that the
                morph so tagged is not an affix or clitic.</p></item>
                <item><p><gi>clitic</gi> marks clitics.</p></item>
                <item><p><gi>aff</gi> marks affixes.</p></item>
                <item><p><gi>mm</gi> marks segments for which
                the tagger specifically wishes to avoid any
                implication that the segment is monomorphemic.</p></item>
              </list> 
              (Note: At this point, any exceptions to these patterns in the
              examples below are errors which should be corrected.)
              </p>
            </item>
 
            <item>
              <p>The markup should make it possible to reconstruct the
              appropriate conventional written form for the materials.
              Ideally the process should be easy or at least
              straightforward.</p>
 
              <p>In particular, it should be possible to ensure that
              white space is supplied where appropriate and only where
              appropriate.</p>
 
              <p>In effect, this means that it is desirable to ensure
              that the <term>orthographic words</term> of the
              conventional written form of the text can be
              reconstructed.  By <term>orthographic word</term> is meant
              a token delimited by white space or punctuation; the
              concept does not apply in all writing systems.  In many or
              most linguistic analyses of languages whose writing systems
              use white space to delimit words, orthographic words and
              the words which appear as units in linguistic analysis are
              the same; one of the problems addressed below is how to
              deal with cases where they are not the same.</p>
            </item>
 
            <item>
              <p>It is not the task of this discussion to specify how to
              segment text into words or words into morphs, but how
              to represent a segmentation given by the linguistic
              analysis.</p>
              <p>This description thus has no opinion on whether English
              <mentioned>man</mentioned> consists of one morph or two,
              or whether French <mentioned>dix-sept</mentioned>
              <gloss>seventeen</gloss>, Uyghur <mentioned>tört
              besh</mentioned> <gloss>four or five</gloss>, or Uyghur
              <mentioned>on besh</mentioned> <gloss>fifteen</gloss> is
              one word or two.  In each of these cases, the
              problem we face is to ensure that both of the obvious
              analyses can be represented.</p>
            </item>
            <item>
              <p>The markup should represent the most common cases
              simply.</p>
            </item>
            <item>
              <p>The markup should support validation of the structural
              relations among words, affixes, and clitics.</p>
              <p>Together with the preceding desideratum, this means in
              effect that in-line markup is likely preferable to
              stand-off markup.</p>
            </item>
            <item>
              <p>The markup should have as clean a translation into
              vanilla TEI as can conveniently be managed.</p>
            </item>
          </list>
          It is not guaranteed that all of these can be satisfied 
          fully at the same time; tradeoffs may be necessary.
        </p>
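          <p>As a purely illustrative sketch, the element vocabulary
          listed above might be used as follows.  The examples are
          constructed English, not project data; namespace prefixes are
          omitted, and we assume, for illustration only, that
          <gi>m</gi>, <gi>aff</gi>, <gi>clitic</gi>, and <gi>mm</gi>
          appear as children of <gi>w</gi>.  The clitic analysis of
          the possessive is one possible analysis, not a decision.
          <eg><![CDATA[
            <!-- stem tagged as a morph, plural suffix as an affix -->
            <w><m>dog</m><aff>s</aff></w>

            <!-- possessive 's analysed as a clitic within the word -->
            <w><m>John</m><clitic>'s</clitic></w>

            <!-- suppletive past tense: mm avoids any claim that
                 the segment is a single morpheme -->
            <w><mm>went</mm></w>
          ]]></eg>
          Concatenating the character content of each <gi>w</gi>
          yields the orthographic words <mentioned>dogs</mentioned>,
          <mentioned>John's</mentioned>, and <mentioned>went</mentioned>.
          </p>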
      </div>
 
      <div xml:id="ann-sentences">
        <head>Sentences</head> 
 
        <p>
          Since linguistic annotation proceeds sentence by sentence,
          in the S view all tagged words should be gathered together
          into sentences or sentence-like units.  Paragraphs, for
          example, will not directly contain character data, or even
          <gi>atmo:lit</gi> / <gi>atmo:lat</gi> pairs of the kind used
          in the initial transcriptions. (It should be explained that
          <mentioned>lit</mentioned> is short for
          <gloss>literal</gloss> or <gloss>literatim</gloss>; the
          <gi>atmo:lit</gi> element contains a literatim transcription
          of material in the manuscript. <mentioned>lat</mentioned> is
          short for <gloss>Latin</gloss>; the <gi>atmo:lat</gi> element
          contains an information-preserving transliteration of the
          material into the Latin alphabet.)  Instead, paragraphs should be
          segmented into sentences.  See <ref
          target="#ann-connect">below</ref> (<ptr
          target="#ann-connect"/>) for the pattern modifications
          necessary for this.
        </p>
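        <p>The intended difference can be sketched as follows.  This is
          a constructed fragment with the transcribed text elided; it
          shows only the intended shape, not the exact content models.
          <eg><![CDATA[
            <!-- initial transcription: a bare lit/lat pair -->
            <p>
              <atmo:lit>...</atmo:lit>
              <atmo:lat>...</atmo:lat>
            </p>

            <!-- S view: the paragraph is segmented into sentences,
                 one atmo:s-wrap per sentence -->
            <p>
              <atmo:s-wrap>
                <atmo:s>...</atmo:s>
                <tei:gloss>...</tei:gloss>
              </atmo:s-wrap>
              <atmo:s-wrap>
                <atmo:s>...</atmo:s>
                <tei:gloss>...</tei:gloss>
              </atmo:s-wrap>
            </p>
          ]]></eg>
        </p>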
        <p>Markup for sentences is (in this version of these
          schemas, at least) specific to the S grammar.</p>
        <note place="block">
          <p><label>Open question:</label> Should sentence markup
            also be part of the T grammar?  Sentences do normally 
            nest within paragraphs, etc., but we do not expect 
            nesting of sentences and verse, so including sentences
            in the T grammar would involve making them sometimes
            normal matryoshka elements, and sometimes Trojan-Horse
            elements.</p>
          <p><label>Tentative answer:</label> no; sentences will
            not be part of the T grammar.  Rationale:  easier to 
            understand and design the schemas, easier to understand
            and work with the markup in any single view.</p> 
        </note>
 
        <specGrp xml:id="Sentences">
          <specGrpRef target="#ATMO-s-wrap-element"/>
          <specGrpRef target="#ATMO-s-element"/>
        </specGrp>
 
 
        <div xml:id="ann-s-wrap">
          <head>Sentence wrappers</head>
          <p>
          Each sentence has a regular structure:
          <eg><![CDATA[
            <atmo:s-wrap>
              <atmo:s>...</atmo:s>
              <tei:gloss>...</tei:gloss>
            </atmo:s-wrap>
              ]]></eg>
          Here:
          <list>
            <item><p>The element <gi>atmo:s-wrap</gi> serves (as
            the name suggests) to wrap all the parts of the structure
            together into a unit.</p></item>
            <item><p>The <gi>atmo:s</gi> element contains the
            sentence as transcribed from the manuscript.</p>
            <p>Its contents are described further below.</p></item>
            <item><p>The <gi>tei:gloss</gi> element contains a
            translation of the sentence.  In this project the gloss is
            always in English; if glosses in other languages are to be
            added, the <gi>tei:gloss</gi> element should be made
            repeatable (by wrapping it in a <gi>rng:oneOrMore</gi>
            element).</p>
            <p>Its contents are as in TEI.</p></item>
          </list>
          </p>
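          <p>If glosses in other languages were added, the content
          model of <gi>atmo:s-wrap</gi> might be changed along the
          following lines.  This is a sketch only.  Using
          <att>xml:lang</att> to distinguish the glosses, and German as
          a second gloss language, are assumptions made for the sake
          of illustration.
          <eg><![CDATA[
            <!-- revised content model: one atmo:s, then one or more glosses -->
            <content autoPrefix="false">
              <rng:group>
                <rng:ref name="atmo_s"/>
                <rng:oneOrMore>
                  <rng:ref name="tei_gloss"/>
                </rng:oneOrMore>
              </rng:group>
            </content>

            <!-- a sentence wrapper with two glosses (hypothetical) -->
            <atmo:s-wrap>
              <atmo:s>...</atmo:s>
              <tei:gloss xml:lang="en">...</tei:gloss>
              <tei:gloss xml:lang="de">...</tei:gloss>
            </atmo:s-wrap>
          ]]></eg>
          </p>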
          <specGrp xml:id="ATMO-s-wrap-element">
 
            <elementSpec ident="s-wrap" mode="add" ns="&nsATMO;" prefix="atmo_">
              <gloss>sentence wrapper</gloss>
              <desc>wraps a sentence up together with its English gloss</desc>
              <classes>
                <memberOf key="att.global"/>
                <memberOf key="model.phrase"/>
              </classes>
              <content autoPrefix="false">
                <rng:group>
                  <rng:ref name="atmo_s"/>
                  <rng:ref name="tei_gloss"/>
                </rng:group> 
              </content>
              <constraintSpec ident="no-atmo-s-wrap-in-header" 
                scheme="isoschematron">
                <constraint>
                  <sch:ns prefix="atmo"
                    uri="http://uyghur.linguistics.indiana.edu/2015/ns/0.1"/>
                  <sch:ns prefix="tei"
                    uri="http://www.tei-c.org/ns/1.0"/>
                  <sch:report test="ancestor::tei:teiHeader">
                    Do not use atmo:s-wrap within the TEI Header.
                  </sch:report>
                </constraint>
              </constraintSpec>
              <remarks>
                <p>A constraint forbidding s-wrap to appear within
                  the TEI header would be useful, but has been
                  omitted for now, since the schema compiler had
                  trouble with it.</p>
              </remarks>
            </elementSpec>
 
          </specGrp>
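          <p>For illustration only, the Schematron report above is
          intended to fire on markup like the following constructed
          fragment (content elided), in which <gi>atmo:s-wrap</gi> has
          been used inside a paragraph of the TEI header:
          <eg><![CDATA[
            <teiHeader>
              <fileDesc>
                ...
                <notesStmt>
                  <note>
                    <p>
                      <!-- reported: atmo:s-wrap has a tei:teiHeader ancestor -->
                      <atmo:s-wrap>
                        <atmo:s>...</atmo:s>
                        <tei:gloss>...</tei:gloss>
                      </atmo:s-wrap>
                    </p>
                  </note>
                </notesStmt>
              </fileDesc>
            </teiHeader>
          ]]></eg>
          </p>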
 
        </div>
        <div xml:id="ann-s">
          <head>The <gi>s</gi> element</head>
          <p>
            The <gi>atmo:s</gi> element contains a sentence
            as transcribed from the manuscript.  It may have any of
            several different forms.
          </p>
 
          <list> 
            <item>
              <p>When the <soCalled>horizontal</soCalled> structure
                is used, the <gi>atmo:s</gi> element will most commonly 
                contain a single <gi>atmo:phr</gi> element.
                When the sentence contains phrase-level markup 
                (e.g. for rubrication, for a quotation, for a
                phrase in a foreign language), there will be a
                sequence of <gi>atmo:phr</gi> elements interspersed 
                with the TEI phrase-level elements needed for the case
                at hand; these TEI phrase-level elements should themselves
                contain <gi>atmo:phr</gi> elements.
                In the case of manuscripts that are neither annotated nor
                intended for annotation, the <gi>atmo:phr</gi> elements will
                contain just an <gi>atmo:lit</gi> / <gi>atmo:lat</gi> 
                pair.</p>
            </item>
            <item> 
              <p>
                When the <soCalled>vertical</soCalled> structure is 
                used, the <gi>atmo:s</gi> element will in the simple case
                contain a sequence of <gi>atmo:w</gi> elements.  Between
                words, <gi>atmo:pc</gi> elements may occur to mark
                punctuation characters.  And phrase-level elements may 
                also occur within the sentence; these will in turn 
                contain a sequence of
                <gi>atmo:w</gi>, <gi>atmo:pc</gi>, 
                and in some cases further nested phrase-level elements.
              </p>
 
            </item>
          </list>
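          <p>The two structures might be sketched as follows.  This is a
          constructed illustration with the transcribed text elided.
          <gi>tei:foreign</gi> stands in for whatever TEI phrase-level
          element the case at hand requires, and the internal structure
          of <gi>atmo:w</gi> is not shown.
          <eg><![CDATA[
            <!-- horizontal structure: atmo:phr runs around a TEI phrase element -->
            <atmo:s>
              <atmo:phr>
                <atmo:lit>...</atmo:lit>
                <atmo:lat>...</atmo:lat>
              </atmo:phr>
              <tei:foreign>
                <atmo:phr>
                  <atmo:lit>...</atmo:lit>
                  <atmo:lat>...</atmo:lat>
                </atmo:phr>
              </tei:foreign>
              <atmo:phr>
                <atmo:lit>...</atmo:lit>
                <atmo:lat>...</atmo:lat>
              </atmo:phr>
            </atmo:s>

            <!-- vertical structure: words, punctuation, nested phrase element -->
            <atmo:s>
              <atmo:w>...</atmo:w>
              <atmo:w>...</atmo:w>
              <tei:foreign>
                <atmo:w>...</atmo:w>
              </tei:foreign>
              <atmo:w>...</atmo:w>
              <atmo:pc>.</atmo:pc>
            </atmo:s>
          ]]></eg>
          </p>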
 
          <note place="block">
          <p>Two open problems may be mentioned: 
            <list>
              <item>
                <p><label>Open problem:</label> 
                  placing punctuation into separate tokens
                  while tokenizing.</p>
              </item>
              <item>
                <p><label>Open problem:</label> 
                  defining correct whitespace behavior for punctuation,
                  establishing conventions for documenting deviant cases.</p>
              </item>
            </list>
          </p>
          </note>
 
          <specGrp xml:id="ATMO-s-element">
            <elementSpec ident="s" mode="add" ns="&nsATMO;" prefix="atmo_" >
              <classes>
                <memberOf key="att.global"/>
              </classes>
              <content autoPrefix="false">
                <rng:choice>
                  <!-- no word elements -->
                  <rng:oneOrMore>
                    <rng:choice>
                      <rng:ref name="atmo_phr"/> 
                      <rng:ref name="atmo_w"/>
                      <rng:ref name="atmo_pc"/>
                      <rng:ref name="tei_model.phrase" />
                    </rng:choice>
                  </rng:oneOrMore>
                  <!-- word elements -->
                </rng:choice> 
              </content>
            </elementSpec>
 
          </specGrp>
 
          <p><hi>The tag-set documentation of <gi>s-wrap</gi>
          and <gi>s</gi> elements should be improved.</hi></p>
        </div>
        </div>
 
        <div xml:id="ann-words">
          <head>Words and morphemes</head>
          <p>
            Most word types in any dictionary, and most word tokens
            in any text, will for our purposes have substantially
            similar structures.  We describe that basic structure first,
            and postpone until later any discussion of complications
            (see <ptr target="#pos-irr-exc"/>).  An attempt has been
            made to make the considerations general, but it is the needs
            of the ATMO project which are central.  The XML encodings
            are inspired by the TEI usage of the BNC and the XML version
            of the Brown corpus.
          </p>
          <p>
            Most of the English examples are constructed; some are drawn
            from the Brown corpus.  They sometimes place POS information
            and other annotation on the word (to echo the usage of
            Brown) and sometimes on parts of the word (to be more
            similar to the Uyghur examples here).  Uyghur examples are
            from data collected by the Uyghur Light Verbs (UyLVs)
            project; I've used examples from published texts where
            possible, but that has not always been feasible.  The
            linguistic analyses are those in the UyLVs data, unless
            otherwise noted.
          </p>
          <p>
            In the base case:
            <list type="ordered">

              <item n="1"><p>Every word (viewed as a linguistic object) is
              written as one blank- or punctuation-delimited sequence of
              characters (= one orthographic word).</p></item>

              <item n="2"><p>Every orthographic word represents (spells?
              writes?) one linguistic word.</p></item>
 
              <item n="3"><p>Every linguistic word is composed of a
              sequence of one or more morphs.</p></item>

              <item n="4"><p>Every orthographic word is composed of a
              sequence of one or more morphs.</p></item>

              <item n="5"><p>For now we ignore the distinctions among types
              of morphs. A morph is a morph; we don't attempt to
              distinguish sub-classes of morphs.</p></item>
            </list>
          </p>
          <p>
            Examples for the base case: 
            English <mentioned>characters</mentioned>
            <eg><![CDATA[
  <w>
    <m reg="character" pos="N"/>
    <m reg="s" pos="PL"/>
  </w>
  ]]></eg>
            Uyghur <mentioned>atasighe</mentioned> <gloss>to [the] father</gloss>:
            <eg><![CDATA[
  <w>
    <m reg="ata" ipa="atʰa" pos="N"     ilg="father"/>
    <m reg="si"  ipa="si"   pos="POSS3" ilg="POSS3"/>
    <m reg="ghe" ipa="ɣɛ"   pos="DAT"   ilg="DAT"/>
  </w>
  ]]></eg>
          </p>

 
          <p>
            Here and in other examples, the attributes have the following meanings:
          </p>
          <list>
            <item><p><label>reg</label> = regularized form (for Uyghur, this will
            be the conventional spelling of the segment in Uyghur Latin script (ULY),
            or something approximating it).</p></item>
            <item><p><label>ipa</label> = transcription (fairly broad) into
            the International Phonetic Alphabet.</p></item>
            <item><p><label>pos</label> = part-of-speech tag (like most
            <soCalled>part-of-speech</soCalled> tag sets, the one used
            here includes some morphological and other information that
            goes beyond the eight parts of speech identified by ancient
            grammarians of Latin).  For English examples, the tags used
            are from the Brown corpus, or made up when necessary.  For
            Uyghur examples, the tags used are from the Uyghur Light
            Verbs project.</p></item>
            <item><p><label>ilg</label> = interlinear gloss: for lexical
            morphs, a rough translation into English; for grammatical
            morphs a tag (often the same as the POS tag) describing its
            grammatical function.</p></item>
          </list>
          <p> 
            In what follows, many of the examples omit, for brevity, several
            attributes planned for ATMO:
            <list>
              <item><p><label>lit</label> = literal transcription of original script</p></item>
              <item><p><label>lat</label> = Latin transliteration of <att>lit</att></p></item>
              <item><p><label>ums</label> = underlying morphemic segment (an abstract
              representation of a morpheme or morph which has different surface appearances
              based on context)</p></item>
            </list>
          </p>
          <p>For the base case, we might define words and
          morphemes as follows.  As the next section
          shows, these definitions are too simple. But they 
          provide a pattern we can extend to handle the complications
          presented by the data.</p>
          <specGrp xml:id="Simple-w-and-m-elements">
            <elementSpec ident="w" mode="add" ns="&nsATMO;" prefix="atmo_">
              <gloss>word</gloss>
              <desc>contains a sequence of morphemes constituting
                one linguistic word and one orthographic word.</desc>
              <!--* dummy elementSpec, do not attempt to edit *-->
              <classes>
                <memberOf key="model.phrase"/>
              </classes>
              <content autoPrefix="false">
                <rng:oneOrMore>
                  <rng:ref name="atmo_m"/>
                </rng:oneOrMore>
              </content>
            </elementSpec>
 
            <elementSpec ident="m" mode="add" ns="&nsATMO;" prefix="atmo_">
              <gloss>morpheme</gloss>
              <desc>represents a single morpheme or segment
                of a word; attributes indicate the spelling,
                part of speech, and other properties.</desc>
              <!--* dummy elementSpec, do not attempt to edit *-->
              <content>
                <!--*
                <rng:attribute name="lit"/>
                <rng:attribute name="lat"/>
                <rng:attribute name="reg"/>
                <rng:attribute name="ipa"/>
                <rng:attribute name="ums"/>
                <rng:attribute name="pos"/>
                <rng:attribute name="ilg"/>
                *-->
                <rng:empty/>
              </content>
 
              <attList>
                <attDef ident="lit">
                  <desc>Literatim transcription from ms</desc>
                  <datatype>xs:string</datatype>
                </attDef>
                <attDef ident="lat">
                  <desc>Transliteration into Latin alphabet</desc>
                  <datatype>xs:string</datatype>
                </attDef>
                <attDef ident="reg">
                  <desc>Regularized Latin spelling</desc>
                  <datatype>xs:string</datatype>
                </attDef>
                <attDef ident="ipa">
                  <desc>IPA approximation</desc>
                  <datatype>xs:string</datatype>
                </attDef>
                <attDef ident="ums">
                  <desc>Underlying morphological segment</desc>
                  <datatype>xs:string</datatype>
                </attDef>
                <attDef ident="pos">
                  <desc>Part of speech tag for this morph</desc>
                  <datatype>xs:string</datatype>
                </attDef>
                <attDef ident="ilg">
                  <desc>Interlinear gloss</desc>
                  <datatype>xs:string</datatype>
                </attDef>
              </attList> 
 
            </elementSpec>
          </specGrp>
        </div>
 
        <div xml:id="pos-irr-exc">
          <head>More complicated word/morpheme structures</head>
          <div xml:id="pos-irr-1L-nO">
            <head>One linguistic word written as multiple orthographic words</head>
              <p>In the base case, one linguistic word is one orthographic word.
              More complex cases arise when one linguistic word is made up of
              multiple orthographic words.  In simple examples, each orthographic 
              word consists of one or more morphs; for more complex examples, see
              <ref target="#pos-irr-1O-0M">One morph written across multiple 
              orthographic words</ref> below.
              </p>
              <p>The examples below mark the linguistic word with the
              <gi>w</gi> element and subdivide it into <gi>ow</gi>
              (<gloss>orthographic word</gloss>) elements; these in turn are
              divided into <gi>m</gi> elements for the morphs.</p>
              <div xml:id="pos-irr-1L-nO-npr">
                <head>Multi-token proper nouns</head>
                <p>
                  Example:  <mentioned>Nesreddin efendining</mentioned>
                  analysed as a single linguistic word:
                  <code>Nesreddin#efendi-ning</code>
                  <gloss>Nasreddin Effendi's</gloss> or 
                  <gloss>of Nasreddin Effendi</gloss>.
                <eg><![CDATA[
  <w>
    <ow>
      <m reg="Nesreddin" ipa="nɛsrɛddin" pos="Npr" ilg="Nasreddin"/>
    </ow>
    <ow>
      <m reg="efendi" ipa="ɛfɛndi" pos="N" ilg="efendi"/>
      <m reg="ning" ipa="niŋ" pos="GEN" ilg="GEN"/>
    </ow>
  </w>
  ]]></eg>
                </p>
                <p>
                  Alternate analyses, which treat the parts of the name
                  as independent words standing in a particular
                  syntactic relation to each other, pose no particular
                  problems for the XML representation: 
                <eg><![CDATA[
  <w>
    <m reg="Nesreddin" ipa="nɛsrɛddin" pos="Npr" ilg="Nasreddin"/>
  </w>
  <w>
    <m reg="efendi" ipa="ɛfɛndi" pos="N" ilg="efendi"/>
    <m reg="ning" ipa="niŋ" pos="GEN" ilg="GEN"/>
  </w>
  ]]></eg>
                The relation of the preceding independent word (here
                <mentioned>Nesreddin</mentioned>)
                to the genitive marker, which semantically belongs to the
                whole name, may be harder to describe
                in such analyses.  A similar problem arises with
                clitics which attach to phrases; see
                <ref target="#pos-irr-cl">below</ref>
                (<ptr target="#pos-irr-cl"/>).
                </p>
              </div>
              <div xml:id="pos-irr-1L-nO-numbers">
                <head>Numbers</head>
                <p>
                  Uyghur <mentioned>yigirme sekkizinchi</mentioned>
                  <code>yigirme#sekkiz-inchi</code>
                  <gloss>twenty-eighth</gloss>, analyzed as a sequence of three
                  morphs in two orthographic words (not the UyLVs analysis).
                  N.B. The analysis is not proposed as an alternative to the
                  UyLVs analysis; it serves only to show how an analysis with
                  this structure can be represented in XML.
                <eg><![CDATA[
  <w>
    <ow>
      <m reg="yigirme" ipa="jigirmɛ" pos="NU" ilg="twenty" />
    </ow>
    <ow>
      <m reg="sekkiz" ipa="sɛkkʰiz" pos="NU" ilg="eight" />
      <m reg="inchi" ipa="inʧʰi" pos="ORD" ilg="ORD" />
    </ow>
  </w>
  ]]></eg>
                </p>
                <p>See <ptr target="#pos-irr-exx-overlap"/> below for 
                a different analysis of this example.</p>
                <p>Example: Uyghur <mentioned>tört besh</mentioned> 
                <code>tört#besh</code>
                <gloss>four or five</gloss>, written as two orthographic words
                but analysed as a single linguistic word.
              <eg><![CDATA[
  <w>
    <ow>
      <m reg="tört" ipa="tʰørtʰ" pos="NU" ilg="four" />
    </ow>
    <ow>
      <m reg="besh" ipa="bɛʃ" pos="NU" ilg="five" /> 
    </ow>
  </w>
  ]]></eg>
                </p>
                <p>
                  English <mentioned>twenty one</mentioned>, analyzed as a sequence of two
                  morphs in two orthographic words.
                <eg><![CDATA[
  <w pos="CD">
    <ow><m reg="twenty"/></ow>
    <ow><m reg="one"/></ow>
  </w>
  ]]></eg>
                </p>
              </div>
              <div>
                <head>Reduplication of adjective</head>
                <p>Example: Uyghur <mentioned>qap qaja</mentioned> 
                <gloss>very black</gloss>, a reduplicated adjective written as
                two orthographic words but analysed as a single linguistic word:
              <eg><![CDATA[
  <w>
    <ow>
      <m reg="qap" ipa="qʰapʰ" pos="REDUPp" ilg="very" />
    </ow>
    <ow>
      <m reg="qaja" ipa="qʰaja" pos="AJ" ilg="black" /> 
    </ow>
  </w>
  ]]></eg>
                </p>
              </div>
 
              <div xml:id="pos-irr-1L-nO-phr">
                <head>Phrases which have become a single word</head>
                <p>Some students of English would prefer to regard English
                <mentioned>in spite of</mentioned> as a single word which
                happens to be written with white space (i.e. as multiple
                orthographic words).
              <eg><![CDATA[
     <w pos="IN">
        <ow><m reg="in"/></ow>
        <ow><m reg="spite"/></ow>
        <ow><m reg="of"/></ow>
     </w>
     ]]></eg>
                </p>
                <p>In older French, what is now the single word
                <mentioned>aujourd'hui</mentioned> was a phrase composed of
                several words (<gloss>on this day of today</gloss>).  An
                analysis which treats it as a single linguistic unit
                (possibly for greater parallelism to modern French, or for
                purely linguistic reasons) while retaining the fact that in
                a given source it is written as multiple orthographic words,
                might be:
              <eg><![CDATA[
     <w pos="RB">
        <ow><m reg="au"/></ow>
        <ow><m reg="jour"/></ow>
        <ow><m reg="d'"/></ow>
        <ow><m reg="hui"/></ow>
     </w>
     ]]></eg>
                </p>
              </div>
            <div xml:id="irr-oldspelling">
              <head>Older spelling</head>
              <p>In some of the manuscripts transcribed by the
              project, word divisions may occur within what is
              unquestionably a single word from the 
              linguistic point of view.</p>
              <p>
              Examples: stems and inflectional morphemes written apart
              in the manuscript (both from Prov. 351, p. 7):
              <eg><![CDATA[
 
	    <atmo:w lit="ناشته دا" lat="našth da">
	      <atmo:ow lit="ناشته" lat="našth">
		<atmo:m lit="ناشته" 
		        lat="našth" 
		        reg="našth" 
		        pos="N" 
		        ilg="breakfast"/>
	      </atmo:ow>
	      <atmo:ow lit="دا" lat="da">
		<atmo:m lit="دا" 
		        lat="da" 
		        reg="da" 
		        pos="LOC" 
		        ilg="LOC"/>
	      </atmo:ow>
	    </atmo:w>
	    ...
	    <atmo:w lit="نمرسه لاردین" lat="nmrsh lardyn">
	      <atmo:ow lit="نمرسه" lat="nmrsh">
		<atmo:m lit="نمرسه" 
		        lat="nmrsh" 
		        reg="nmrsh" 
		        pos="PN.INDEF" 
		        ilg="thing"/>
	      </atmo:ow>
	      <atmo:ow lit="لاردین" lat="lardyn"> 
		<atmo:m lit="لار" 
		        lat="lar" 
		        reg="lar" 
		        pos="PL" 
		        ilg="PL"/>
		<atmo:m lit="دین" 
		        lat="dyn" 
		        reg="dyn" 
		        pos="ABL" 
		        ilg="ABL"/>
	      </atmo:ow>
	    </atmo:w>
                  ]]></eg>
                </p>
              </div>
            </div>
 
            <div xml:id="pos-irr-1O-nL">
              <head>One orthographic word represents multiple linguistic words</head>
 
              <p>An exception to the second rule of the base case (one
              orthographic word spells one linguistic word) arises when
              multiple linguistic words are written as a single
              orthographic word.</p>

              <p>Example:  English <mentioned>You're outtasight!</mentioned></p>
 
              <p>Adopting the <gi>mw</gi> (multi-word) element for this,
              and analysing further, we could have:

<eg><![CDATA[
  <mw>
    <w><m lit="You" reg="you" .../></w>
    <w><m lit="'re" reg="are" .../></w>
  </mw>

  <mw>
    <w><m lit="out" reg="out" .../></w>
    <w><m lit="ta" reg="of" .../></w>
    <w><m lit="sight" reg="sight" .../></w>
  </mw>
]]></eg>
              </p>
              <p>
                This situation (one orthographic word representing multiple
                linguistic words) appears not to occur in the UyLVs data (or,
                if it does, I have not recognized the signs).
              </p>
            </div>
            <div xml:id="pos-irr-1L-nM">
              <head>Linguistic words and morphs</head>
              <p>N.B. Rule 3 of the base case has no exceptions.  Linguistic
              words are always sequences of morphs; we introduce the concept of
              linguistic word precisely to preserve the 1:<ident>n</ident>
              word:morph relation when our identification of morphs and the
              way the text is written don't match.
              </p>
            </div>
            <div xml:id="pos-irr-1O-0M">
              <head>One morph written across multiple orthographic words</head>
              <p>The fourth rule of the base case is that an orthographic
              word consists of one or more morphs.  The exceptional case
              here arises when an orthographic word is not a full morph.
              Or, equivalently, we can describe the situation as involving a
              morph written as a sequence of multiple orthographic words.
              (When this applies, <ref target="#pos-irr-1L-nO">One
              linguistic word written as multiple orthographic words</ref>
              above will necessarily also apply.)
              </p>
              <p>Example: Uyghur <mentioned>on besh</mentioned>
              <gloss>fifteen</gloss>, analysed as a single morph.
<eg><![CDATA[
  <w>
    <m reg="on besh" ipa="on bɛʃ" pos="NU" ilg="fifteen"/>
  </w>
]]></eg>
              </p>
              <p>
                Example: compound nouns made up of two words, analyzed as single
                morphs.
              </p>
              <p>
                Uyghur <mentioned>gherbiy jenup</mentioned> 
                <code>gherbiy#jenup</code> <gloss>southwest</gloss>:
              <eg><![CDATA[
  <w>
    <m reg="gherbiy jenup" ipa="ɣɛrbijʤɛnupʰ" pos="N" ilg="southwest"/>
  </w>
  ]]></eg>
              </p>
              <p>
                Example:  <mentioned>turnajabin</mentioned>, written as two
                orthographic words in the manuscript but analysed as a single
                morph (from Prov. 351, p. 7):
                <eg><![CDATA[
	    <atmo:w lit="ترنجه بین" lat="trnǰh byn">
	      <atmo:m lit="ترنجه بین" 
	              lat="trnǰh byn" 
	              reg="trnǰhbyn" 
	              pos="N" 
	              ilg="turnajabin"/>
	    </atmo:w>
                  ]]></eg>
              </p>
            </div>
            <div xml:id="pos-irr-cl">
              <head>Distinguishing kinds of morphs (clitics)</head>
              <p>The fifth rule of the base case is that morphs are not
              distinguished by type.  This is an oversimplification: we
              would like to flag clitics as such.</p>
              <p>Example:  <mentioned>yigirme sekkizinchi Aprilghiche</mentioned> 
                <gloss>to 28 April</gloss>, 
                in which <mentioned>ghiche</mentioned> is an enclitic.
            <eg><![CDATA[
  ...
  <w>
    <m reg="April" ipa="april" pos="N" ilg="April"/>
    <clitic reg="ghiche" ipa="ɣiʧʰɛ" pos="LIM" ilg="LIM"/>
  </w>
  ]]></eg>
              </p>
              <p>Example:  English <mentioned>the Queen of England's hat</mentioned>:
            <eg><![CDATA[
  ...
  <w><m reg="the"/></w>
  <w><m reg="Queen"/></w>
  <w><m reg="of"/></w>
  <w>
    <m reg="England" />
    <clitic reg="'s"/>
  </w>
  <w><m reg="hat"/></w>
  ]]></eg>
              </p>
              <p>These examples also illustrate the principle that clitics
              are shown as attaching to the nearest word, not to the
              phrase.  This is not wholly satisfactory (there is no
              prepositional phrase <mentioned>of England's hat</mentioned>
              in the second example), but the alternative would entail
              analysis of the sentence into phrases when clitics are
              present and attached to phrases. Most projects doing
              part-of-speech tagging prefer to stay away from syntactic
              analysis and phrase structure, and accept the fact that the
              markup leaves unspecified just what the clitic actually
              attaches to.</p>
              <p>If it is desired to distinguish affixes from other
              morphs, they can similarly be marked as <gi>aff</gi>
              elements.</p>
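              <p>For instance, if the possessive and dative suffixes are to be
              flagged as affixes, the base-case example
              <mentioned>atasighe</mentioned> might be re-encoded as follows
              (a sketch; deciding which segments count as affixes remains a
              matter of analysis):
              <eg><![CDATA[
  <w>
    <m   reg="ata" ipa="atʰa" pos="N"     ilg="father"/>
    <aff reg="si"  ipa="si"   pos="POSS3" ilg="POSS3"/>
    <aff reg="ghe" ipa="ɣɛ"   pos="DAT"   ilg="DAT"/>
  </w>
  ]]></eg>
              </p>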
              <note place="block">
              <p><label>Note:</label> The established form for marking
                clitics in interlinear segmentations in the UyLVs project
                was the use of an equals sign instead of a hyphen to mark
                the morpheme boundary.  For the examples above,
                <code>April=ghiche</code>
                or 
                <code>England='s</code>.
                This has the disadvantage that a program without any
                linguistic knowledge cannot tell whether it is 
                the morph on the left or the morph on the right which
                is the clitic.  A human with knowledge of the lexicon
                will have no difficulty with the examples just shown,
                but may not know off-hand how to identify the clitic
                in the imaginary example <code>hra=qur</code>.
              </p>
              <p>The best an annotation interface can do in this case
              is to translate <code>hra=qur</code> into 
              <code>&lt;m lat="hra"/>&lt;m lat="qur"/></code>,
              and for each of these two morphs insert a button allowing
              the user to change the element name from 
              <gi>m</gi> to <gi>clitic</gi>.</p>
              </note>
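              <p>Once the user has made a choice, the imaginary
              <code>hra=qur</code> would end up as one of the following two
              encodings (other attributes omitted):
              <eg><![CDATA[
  <!-- hra is the clitic (a proclitic, preceding its host) -->
  <w><clitic lat="hra"/><m lat="qur"/></w>

  <!-- qur is the clitic (an enclitic, following its host) -->
  <w><m lat="hra"/><clitic lat="qur"/></w>
  ]]></eg>
              </p>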
            </div>
            <div xml:id="pos-irr-exx-1-4">
              <head>More orthographic words than morphs</head>
              <p>Uyghur <mentioned>Sekkiz ming sekkiz yüz toqsan besh</mentioned>
              <gloss>8895</gloss>, with <mentioned>toqsan
              besh</mentioned> analysed as a single morpheme.
              </p>
              <p>This example deviates from both the first and the fourth rule of
              the base case:  it is a sequence of six orthographic words
              analysed as one linguistic word composed of five (not six) morphs.
              </p>
              <p>
                N.B. I'm not recommending this analysis; the task here is to show that
                this possible analysis can be represented.
              <eg><![CDATA[
  <w>
   <ow><m reg="Sekkiz" ipa="sɛkkʰiz" pos="NU" ilg="eight"/></ow>
   <ow><m reg="ming" ipa="miŋ" pos="NU" ilg="thousand"/></ow>
   <ow><m reg="sekkiz" ipa="sɛkkʰiz" pos="NU" ilg="eight"/></ow>
   <ow><m reg="yüz" ipa="jyz" pos="NU" ilg="hundred"/></ow>
   <ow><m reg="toqsan besh" ipa="tʰoqsan bɛʃ" pos="NU" 
          ilg="ninety.five"/></ow>
  </w>
  ]]></eg>
              </p>
              <!--*
                  <p>Or:
                  <eg><![CDATA[
                  <w>
                  <ow><m reg="Sekkiz" ipa="sɛkkʰiz" pos="NU" ilg="eight"/></ow>
                  <ow><m reg="ming" ipa="miŋ" pos="NU" ilg="thousand"/></ow>
                  <ow><m reg="sekkiz" ipa="sɛkkʰiz" pos="NU" ilg="eight"/></ow>
                  <ow><m reg="yüz" ipa="jyz" pos="NU" ilg="hundred"/></ow>
                  <m reg="toqsan besh" ipa="tʰoqsan bɛʃ" pos="NU" ilg="ninety.five"/>
                  </w>
                  ]]></eg>
                  </p> *-->
              <!--* No.  Absolutely not. What were you thinking even to
                  write it down?! *-->
 
            <!--*
            <p>Or most explicit (note that ow is sometimes the parent of m and
            sometimes the child):
            <eg><![CDATA[
  <w>
    <ow>
      <m pos="NU" ilg="eight">
        <reg>Sekkiz</reg>
        <ipa>sɛkkʰiz</ipa>
      </m>
    </ow>
    <ow>
      <m pos="NU" ilg="thousand">
        <reg>ming</reg>
        <ipa>miŋ</ipa>
      </m>
    </ow>
    <ow>
      <m pos="NU" ilg="eight">
        <reg>sekkiz</reg>
        <ipa>sɛkkʰiz</ipa>
      </m></ow>
    <ow>
      <m pos="NU" ilg="hundred">
        <reg>yüz</reg>
        <ipa>jyz</ipa>
      </m>
    </ow>
 
    <m pos="NU" ilg="ninety.five">
      <ow>
        <reg>toqsan</reg>
        <ipa>tʰoqsanɛʃ</ipa>
      </ow>
      <ow>
        <reg>besh</reg>
        <ipa>bɛʃ</ipa>
      </ow>
   </m>
  </w>
  ]]></eg> 
            </p> *-->
            </div>
            <div xml:id="pos-irr-exx-overlap">
              <head>Two orthographic words, two morphs, not aligned</head>
              <p>Uyghur <mentioned>yigirme sekkizinchi</mentioned>
              <gloss>twenty-eighth</gloss>, analyzed as two morphs
              (<code>yigirme sekkiz</code> and <code>inchi</code>) whose
              boundary does not coincide with the boundary between the two
              orthographic words.  N.B. I am not
              recommending this analysis, but showing that an analysis with this
              structure can be represented.
              </p>
              <p>
                We can rely on the fact that by default there is no space between
                adjacent morphs in a word; the regularized form of the following
                example will be displayed correctly as <q>yigirme sekkizinchi</q>.
              <eg><![CDATA[
  <w>
    <m reg="yigirme sekkiz" ipa="jigirmɛ sɛkkʰiz" pos="NU" 
       ilg="twenty-eight" />
    <m reg="inchi" ipa="inʧʰi" pos="ORD" ilg="ORD"/>
  </w>
  ]]></eg>
              </p>
            </div>
          </div>
 

        <div xml:id="ana-module">
          <head>Elements for words and morphs</head>
 
          <p>The elements for words and sub-word segments are common
            across schemas.</p>
 
          <p>When annotation is present, normally every word in the
            text will be marked as a word, its inflectional morphemes will
            be identified, and each segment will be given a regularized
            spelling, a part of speech code, and an interlinear
            gloss.</p>
 
          <specGrp xml:id="Words-and-Morphs">
            <specGrpRef target="#ATMO-w-element"/>
            <specGrpRef target="#segment-elements"/>
            <specGrpRef target="#ow-and-mw"/>
            <specGrpRef target="#Annotation-attributes"/>
          </specGrp>
 
          <div xml:id="ana-words">
            <head>Words</head>
            <p>As shown in the examples, <gi>atmo:w</gi> will contain
              either a sequence of two or more <gi>atmo:ow</gi> elements
              (optionally interspersed with <gi>atmo:pc</gi> elements if
              there is inter-word punctuation), or a sequence of one or
              more sub-word segments.</p> 
            <specGrp xml:id="ATMO-w-element">
              <elementSpec ident="w" mode="add" ns="&nsATMO;" prefix="atmo_">
                <gloss>word</gloss>
                <desc>normally contains a sequence of morphemes or other
                  sub-word segments constituting one linguistic word and
                  one orthographic word; if one linguistic word is
                  written with two or more orthographic words, then
                  either the <gi>w</gi> element will contain a sequence
                  of two or more <gi>ow</gi> (orthographic word)
                  elements, or at least one of the sub-word segments
                  will have a spelling that includes whitespace. </desc>
                <classes>
                  <memberOf key="att.ATMO-global-subset"/>                  
                  <memberOf key="att.Lit-Lat-Optional"/>
                  <memberOf key="att.ATMO-annotated"/>
                  <!--<memberOf key="model.phrase"/>-->
                </classes>
                <content autoPrefix="false">
                  <rng:choice>
                    <rng:oneOrMore>
                      <!--<rng:choice>
                        <rng:ref name="tei_model.ATMO-segment"/>
                      </rng:choice>-->
                      <rng:choice>
                        <rng:ref name="atmo_m"/>
                        <rng:ref name="atmo_aff"/>
                        <rng:ref name="atmo_clitic"/>
                        <rng:ref name="atmo_pc"/>
                      </rng:choice>
                    </rng:oneOrMore>
                    <rng:group>
                      <rng:ref name="atmo_ow"/>
                      <rng:oneOrMore>
                        <rng:group>
                          <rng:optional>
                            <rng:ref name="atmo_pc"/>
                          </rng:optional>
                          <rng:ref name="atmo_ow"/>
                        </rng:group>
                      </rng:oneOrMore>
                    </rng:group>
                  </rng:choice>
                </content>
              </elementSpec>
              
              <elementSpec ident="pc" mode="add" ns="&nsATMO;" prefix="atmo_">
                <gloss>punctuation character</gloss>
                <desc>represents a punctuation character appearing 
                  between words.</desc>
                <classes>
                  <memberOf key="att.ATMO-global-subset"/>
                  <memberOf key="att.Lit-Lat-Required"/>
                  <memberOf key="att.ATMO-annotated"/>
                  <memberOf key="model.ATMO-segment"/>
                </classes>
                <content>
                  <rng:empty/>
                </content>
              </elementSpec>
            </specGrp> 
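            <p>For illustration, a hyphenated spelling such as English
            <mentioned>twenty-one</mentioned> could, under the first rule of
            the base case (orthographic words are delimited by blanks or
            punctuation), be treated as two orthographic words separated by an
            <gi>atmo:pc</gi>.  This is a sketch: it assumes that
            <gi>atmo:ow</gi> (declared elsewhere) may contain sub-word
            segments.  Note that this branch of the content model requires at
            least two <gi>atmo:ow</gi> elements.
            <eg><![CDATA[
  <atmo:w pos="CD">
    <atmo:ow>
      <atmo:m lit="twenty" lat="twenty" reg="twenty"/>
    </atmo:ow>
    <atmo:pc lit="-" lat="-"/>
    <atmo:ow>
      <atmo:m lit="one" lat="one" reg="one"/>
    </atmo:ow>
  </atmo:w>
  ]]></eg>
            </p>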
          </div>
 
          <div xml:id="ana-morphs">
            <p>We define several elements for sub-word segments. Of
              these, <gi>m</gi> is the most generic: it will often be a
              single (inflectional) morpheme, but it may also be a
              sequence of (derivational) morphemes. The <gi>aff</gi>
              (affix) and <gi>clitic</gi> elements may be used to mark
              specific kinds of segment. </p>
            <specGrp xml:id="segment-elements">
              <classSpec ident="model.ATMO-segment" type="model">
                <desc>An element class for all sub-word segments defined
                  in this schema.</desc>
                <!--<classes>
                  <!-\-* dummy membership, for testing only *-\->
                  <!-\-<memberOf key="model.phrase"/>-\->
                </classes>-->
              </classSpec>

              <elementSpec ident="m" mode="add" ns="&nsATMO;" prefix="atmo_">
                <gloss>morpheme</gloss>
                <desc>represents a single morpheme or segment of a word;
                  attributes indicate the spelling, part of speech, and
                  other properties.</desc>
                <classes>
                  <memberOf key="att.ATMO-global-subset"/>
                  <memberOf key="att.Lit-Lat-Required"/>
                  <memberOf key="att.ATMO-annotated"/>
                  <memberOf key="model.ATMO-segment"/>
                </classes>
                <content>
                  <rng:empty/>
                </content>
              </elementSpec>

              <elementSpec ident="aff" mode="add" ns="&nsATMO;" prefix="atmo_">
                <gloss>affix</gloss>
                <desc>represents an affix; attributes indicate the
                  spelling, part of speech, and other properties.</desc>
                <classes>
                  <memberOf key="att.ATMO-global-subset"/>
                  <memberOf key="att.Lit-Lat-Required"/>
                  <memberOf key="att.ATMO-annotated"/>
                  <memberOf key="model.ATMO-segment"/>
                </classes>
                <content>
                  <rng:empty/>
                </content>
              </elementSpec>

              <elementSpec ident="clitic" mode="add" ns="&nsATMO;" prefix="atmo_">
                <gloss>clitic</gloss>
                <desc>represents a clitic; attributes indicate the
                  spelling, part of speech, and other properties.</desc>
                <classes>
                  <memberOf key="att.ATMO-global-subset"/>
                  <memberOf key="att.Lit-Lat-Required"/>
                  <memberOf key="att.ATMO-annotated"/>
                  <memberOf key="model.ATMO-segment"/>
                </classes>
                <content>
                  <rng:empty/>
                </content>
              </elementSpec>
            </specGrp>
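            <p>As an illustration, here is a sketch that reuses data from
              <ptr target="#wm-example"/>, where the accusative suffix is
              encoded with <gi>m</gi>. A project that wished to mark that
              suffix with <gi>aff</gi> instead might encode the word as
              follows. Treating this suffix as an <gi>aff</gi> is our
              assumption for the example, not a project decision:
              <eg><![CDATA[
  <atmo:w lit="سودانی" lat="swdany">
    <atmo:m lit="سودا" lat="swda"
      reg="swda" pos="N" ilg="melancholy"/>
    <!-- assumed: the case suffix marked as an affix -->
    <atmo:aff lit="نی" lat="ny"
      reg="ny" pos="ACC" ilg="ACC"/>
  </atmo:w>
              ]]></eg>
              Like <gi>m</gi>, both <gi>aff</gi> and <gi>clitic</gi>
              are empty elements. All of the information about the
              segment is carried in attributes.</p>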
          </div>
          <div xml:id="ana-attributes">
            <p>All elements used for words and segments of words can
              carry annotation attributes. For words, the <att>lit</att>
              and <att>lat</att> attributes are optional (they are
              strictly speaking redundant, since they can be
              reconstructed from the corresponding values of the word's
              constituents), but they are defined because they are often
              handy. For sub-word segments, <att>lit</att> and
                <att>lat</att> are required. All other annotation
              attributes are optional. </p>
            <note place="block"><p>
                <label>Open issue:</label> A free-standing Schematron
                schema to check for incomplete annotation would be
                helpful. </p></note>
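            <p>The following is a minimal sketch of such a check, written
              in ISO Schematron. It assumes that a sub-word segment counts
              as fully annotated when it carries non-empty
              <att>reg</att>, <att>pos</att>, and <att>ilg</att>
              attributes. This criterion is our assumption, not a project
              decision:
              <eg><![CDATA[
<sch:schema xmlns:sch="http://purl.oclc.org/dsdl/schematron">
  <sch:ns prefix="atmo"
          uri="http://uyghur.linguistics.indiana.edu/2015/ns/0.1"/>
  <sch:pattern id="complete-annotation">
    <!-- assumed completeness criterion: reg, pos, and ilg
         present and non-empty on every sub-word segment -->
    <sch:rule context="atmo:m | atmo:aff | atmo:clitic">
      <sch:assert test="normalize-space(@reg) != ''
                        and normalize-space(@pos) != ''
                        and normalize-space(@ilg) != ''"
        >Segment "<sch:value-of select="@lat"/>" lacks
        reg, pos, or ilg.</sch:assert>
    </sch:rule>
  </sch:pattern>
</sch:schema>
              ]]></eg>
              Because the check is free-standing, it can be applied
              once a document is believed to be complete. Partially
              annotated documents therefore remain valid against the
              main schema.</p>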

            <p>We define three attribute classes to carry these
              attributes.</p>
            <specGrp xml:id="Annotation-attributes">

              <classSpec ident="att.Lit-Lat-Required" type="atts">
                <desc>An attribute class for elements on which the
                    <att>lit</att> and <att>lat</att> attributes are
                  required.</desc>
                <attList>
                  <attDef ident="lit" usage="req">
                    <desc>Literatim transcription from ms</desc>
                    <datatype>xs:string</datatype>
                  </attDef>
                  <attDef ident="lat" usage="req">
                    <desc>Transliteration into Latin alphabet</desc>
                    <datatype>xs:string</datatype>
                  </attDef>
                </attList>
              </classSpec>

              <classSpec ident="att.Lit-Lat-Optional" type="atts">
                <desc>An attribute class for elements on which the
                    <att>lit</att> and <att>lat</att> attributes are
                  optional.</desc>
                <attList>
                  <attDef ident="lit" usage="opt">
                    <desc>Literatim transcription from ms</desc>
                    <datatype>xs:string</datatype>
                  </attDef>
                  <attDef ident="lat" usage="opt">
                    <desc>Transliteration into Latin alphabet</desc>
                    <datatype>xs:string</datatype>
                  </attDef>
                </attList>
              </classSpec>

              <classSpec ident="att.ATMO-annotated" type="atts">
                <desc>An attribute class for annotated segments: words,
                  morphemes, clitics, affixes, and other sub-word
                  segments.</desc>
                <attList>
                  <attDef ident="reg">
                    <desc>Regularized Latin spelling</desc>
                    <datatype>xs:string</datatype>
                  </attDef>
                  <attDef ident="ipa">
                    <desc>IPA approximation</desc>
                    <datatype>xs:string</datatype>
                  </attDef>
                  <attDef ident="ums">
                    <desc>Underlying morphological segment</desc>
                    <datatype>xs:string</datatype>
                  </attDef>
                  <attDef ident="pos">
                    <desc>Part-of-speech tag for this segment</desc>
                    <datatype>xs:string</datatype>
                  </attDef>
                  <attDef ident="ilg">
                    <desc>Interlinear gloss</desc>
                    <datatype>xs:string</datatype>
                  </attDef>
                </attList>
              </classSpec>


            </specGrp>
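            <p>For example, under these classes the first segment below
              is valid. The second is invalid, because on a sub-word
              segment <att>lit</att> and <att>lat</att> are required.
              The data come from <ptr target="#wm-example"/>:
              <eg><![CDATA[
  <!-- valid: lit and lat present; reg, pos, ilg optional -->
  <atmo:m lit="دی" lat="dy" reg="dy" pos="Vt" ilg="say"/>

  <!-- invalid: required lit and lat missing -->
  <atmo:m reg="dy" pos="Vt" ilg="say"/>
              ]]></eg></p>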
            
           
          </div>
          <div xml:id="ana-mwow">
	    <head>The <gi>mw</gi> and <gi>ow</gi> elements</head>
            <p>As described <ref target="#pos-irr-exc">above</ref> (<ptr
                target="#pos-irr-exc"/>), we define the elements
                <gi>mw</gi> and <gi>ow</gi> to handle cases where there
              is either more whitespace than would be expected if every
              linguistic word were an orthographic word (handled with
              <gi>ow</gi>) or less (handled with <gi>mw</gi>).</p>
 
            <specGrp xml:id="ow-and-mw">
 
              <elementSpec ident="mw" mode="add" ns="&nsATMO;" prefix="atmo_">
                <gloss>multi-word token</gloss>
                <desc>represents a single orthographic word consisting
                  of a series of linguistic words</desc>
                <classes>
                  <memberOf key="att.ATMO-global-subset"/>
                  <memberOf key="att.Lit-Lat-Required"/>
                  <memberOf key="att.ATMO-annotated"/>
                  <memberOf key="model.phrase"/>
                </classes>
                <content autoPrefix="false">
                  <rng:group>
                    <rng:ref name="atmo_w"/>
                    <rng:oneOrMore>
                      <rng:group>
                        <rng:optional>
                          <rng:ref name="atmo_pc"/>
                        </rng:optional>
                        <rng:ref name="atmo_w"/>
                      </rng:group>
                    </rng:oneOrMore>
                  </rng:group>
                </content>
                <remarks>
                  <p>We require the <att>lit</att> and <att>lat</att>
                    attributes to simplify the task of
                    reconstructing the surface written form of the
                    material.</p>
                </remarks>
              </elementSpec>
 
              <elementSpec ident="ow" mode="add" ns="&nsATMO;" prefix="atmo_">
                <gloss>orthographic word</gloss>
                <desc>represents an orthographic word 
                  (i.e. one delimited by whitespace and/or
                  punctuation or whose boundaries are otherwise
                  signalled by the writing system,
                  e.g. by initial and final letter forms)
                  which appears as a constituent part of a
                  segment treated as a linguistic word for
                  purposes of annotation.</desc>
                <classes>
                  <memberOf key="att.ATMO-global-subset"/>
                  <memberOf key="att.Lit-Lat-Optional"/>
                  <memberOf key="att.ATMO-annotated"/>
                  <!--* temporary measure, delete model.phrase once
                    w has a working content model *-->
                  <memberOf key="model.phrase"/>
                </classes>
                
               <!-- <content>
                  <textNode/>
                </content>-->
                <!--*
                <content autoPrefix="false">
                  <rng:oneOrMore>
                    <rng:ref name="tei_model.ATMO-segment_alternation"/> 
                  </rng:oneOrMore>
                </content>
                *-->
                <content>                  
                  <classRef key="model.ATMO-segment" minOccurs="1"
                    maxOccurs="unbounded"/>                    
                </content>
                <!--<content>
                  <rng:oneOrMore>
                    <rng:choice>
                      <rng:ref name="tei_model.ATMO-segment"/>
                    </rng:choice>
                  </rng:oneOrMore>
                </content>-->
                <!--<content>
                  <rng:oneOrMore>
                    <rng:choice>
                      <rng:ref name="model.ATMO-segment"/>
                    </rng:choice>
                  </rng:oneOrMore>
                </content>-->
                <!--<content autoPrefix="false">
                  <rng:oneOrMore>
                    <rng:choice>
                      <rng:ref name="tei_model.ATMO-segment"/>
                    </rng:choice>
                  </rng:oneOrMore>
                </content>-->
                <!--
                <content autoPrefix="false">
                  <rng:oneOrMore>
                    <rng:choice>
                      <rng:ref name="atmo_m"/>
                      <rng:ref name="atmo_aff"/>
                      <rng:ref name="atmo_clitic"/>
                      <rng:ref name="atmo_pc"/>
                    </rng:choice>
                  </rng:oneOrMore>
                </content>
                -->
              </elementSpec>
 
            </specGrp>
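            <p>Schematically, an <gi>mw</gi> contains at least two
              <gi>w</gi> elements, which may be separated by
              <gi>pc</gi>. The <gi>mw</gi> also carries <att>lit</att>
              and <att>lat</att> for the whole orthographic word. The
              values below are placeholders, not project data:
              <eg><![CDATA[
  <atmo:mw lit="..." lat="...">
    <atmo:w> ... </atmo:w>
    <atmo:pc> ... </atmo:pc>   <!-- optional -->
    <atmo:w> ... </atmo:w>
  </atmo:mw>
              ]]></eg>
              An <gi>ow</gi>, by contrast, appears inside a linguistic
              word and contains one or more sub-word segments
              (<gi>m</gi>, <gi>aff</gi>, or <gi>clitic</gi>). The word
              <code>našth da</code> in <ptr target="#wm-example"/> is an
              example.</p>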

          </div>
        </div>
        <div xml:id="tier-level-annotation">
          <head>Tier-by-tier annotation and the <gi>atmo:phr</gi> element</head>
          <p>As an alternative to the markup shown in the preceding
            sections using <gi>w</gi>, <gi>m</gi>, etc.,
            we also provide a level-by-level, tier-by-tier, or 
            <soCalled>horizontal</soCalled> form for 
            annotation.</p>
          <p>In this encoding, each level of representation or
            annotation is encoded using an XML element (<gi>atmo:lit</gi>,
            <gi>atmo:lat</gi>,
            <gi>atmo:pos</gi>, etc.).  Within
            a level, words and punctuation tokens are delimited 
            by whitespace, morphemes by hyphens.  Hash marks (#)
            are used to represent orthographic white space within 
            segments the annotator chooses to regard as single
            linguistic words.  If sentences are tagged and fully
            annotated, a paragraph in the horizontal form
            will have a structure like this:
            <eg><![CDATA[
  <tei:p>
    <atmo:s-wrap>
      <atmo:s>
        <atmo:phr>
          <atmo:lit>...</atmo:lit>
          <atmo:lat>...</atmo:lat>
          <atmo:seg>...</atmo:seg>
          <atmo:ipa>...</atmo:ipa>
          <atmo:ums>...</atmo:ums>
          <atmo:pos>...</atmo:pos>
          <atmo:ilg>...</atmo:ilg>
        </atmo:phr>
      </atmo:s>
      <tei:gloss>...</tei:gloss>
    </atmo:s-wrap>
    <atmo:s-wrap> ... </atmo:s-wrap>
    <atmo:s-wrap> ... </atmo:s-wrap>
    ...
  </tei:p>
            ]]></eg>
            When sentences are not tagged (e.g. in view T),
            the paragraph will consist of <gi>atmo:phr</gi> and
            other phrase-level elements:
            <eg><![CDATA[
  <tei:p>
    <atmo:phr>
          <atmo:lit>...</atmo:lit>
          <atmo:lat>...</atmo:lat>
    </atmo:phr>
    <tei:seg type="illness-term">
      <atmo:phr> ... </atmo:phr>
    </tei:seg>
    <atmo:phr> ... </atmo:phr>
    <tei:hi rend="red">
      <atmo:phr> ... </atmo:phr>
    </tei:hi>
    ...
  </tei:p>
            ]]></eg>
 
          </p>
 
          <p>As these examples show, in manuscript text the
            <gi>atmo:phr</gi> element is essentially a wrapper for
            (or a substitute for) <code>#PCDATA</code>.
          </p>
          <specGrp xml:id="Phrases">
            <elementSpec ident="phr" mode="add" ns="&nsATMO;" prefix="atmo_" >
              <desc>Encloses parallel versions of character data in
              a transcription.</desc>
              <classes>
                <memberOf key="att.global"/>
                <memberOf key="model.phrase"/>
              </classes>
              <content autoPrefix="false">
                <rng:group>
                  <rng:ref name="atmo_lit"/>
                  <rng:optional>
                    <rng:ref name="atmo_lat" />
                  </rng:optional>
                  <rng:optional>
                    <rng:ref name="atmo_reg" />
                  </rng:optional>
                  <rng:optional>
                    <rng:ref name="atmo_seg" />
                  </rng:optional>
                  <rng:optional>
                    <rng:ref name="atmo_ipa" />
                  </rng:optional>
                  <rng:optional>
                    <rng:ref name="atmo_ums" />
                  </rng:optional>
                  <rng:optional>
                    <rng:ref name="atmo_pos" />
                  </rng:optional>
                  <rng:optional>
                    <rng:ref name="atmo_ilg" />
                  </rng:optional>
                </rng:group>
              </content>
            </elementSpec>
 
            <elementSpec ident="lit" mode="add" ns="&nsATMO;" prefix="atmo_">
              <gloss>literatim text</gloss>
              <desc>Contains a literal (unregularized, uncorrected)
                transcription of some portion of the exemplar.</desc>
              <classes>
                <memberOf key="att.ATMO-global-subset"/>
              </classes>
              <content>
                <rng:text/>
              </content>
            </elementSpec>
            <elementSpec ident="lat" mode="add" ns="&nsATMO;" prefix="atmo_">
              <gloss>Latin text</gloss>
              <desc>Contains a 1:1 transliteration of the text
                in the sibling <gi>atmo:lit</gi> element.</desc>
              <classes>
                <memberOf key="att.ATMO-global-subset"/>
              </classes>
              <content>
                <rng:text/>
              </content>
              <remarks>
                <p>No attempt is made in this version to constrain
                the character set of this element.  It would be
                a good idea.</p>
              </remarks>
            </elementSpec>
            <elementSpec ident="seg" mode="add" ns="&nsATMO;" prefix="atmo_">
              <gloss>regularized and segmented text</gloss>
              <desc>Contains a version of the text in the 
                sibling <gi>atmo:lat</gi> element in which 
                spelling has been regularized and selected
                morpheme boundaries have been indicated within
                words by inserting hyphens or (for clitics)
                equals signs.</desc>
              <classes>
                <memberOf key="att.ATMO-global-subset"/>
              </classes>
              <content>
                <rng:text/>
              </content>
              <remarks>
                <p>No attempt is made in this version to constrain
                  the character set of this element, or the pattern
                  of whitespace and segment boundaries.  It would 
                  probably be a good idea to do so.</p>
              </remarks>
            </elementSpec>
            <elementSpec ident="ipa" mode="add" ns="&nsATMO;" prefix="atmo_">
              <gloss>International Phonetic Alphabet transcription</gloss>
              <desc>Contains a rendering of the text of the sibling
                elements in the International Phonetic Alphabet (IPA),
                to indicate the (conjectured) pronunciation. The text is
                segmented as in the <gi>atmo:seg</gi> sibling. </desc>
              <classes>
                <memberOf key="att.ATMO-global-subset"/>
              </classes>
              <content>
                <rng:text/>
              </content>
              <remarks>
                <p>No attempt is made in this version to constrain
                  the character set of this element.  It would 
                  probably be a good idea to do so.</p>
              </remarks>
            </elementSpec>
            <elementSpec ident="ums" mode="add" ns="&nsATMO;" prefix="atmo_">
              <gloss>underlying morphemic segment</gloss>
              <desc>Contains a version of the text in the 
                sibling <gi>atmo:seg</gi> element in which 
                each segment is replaced by a standardized spelling
                representing the postulated underlying morpheme.
                This provides a standard written representation
                which reverses the phonological effects of context
                on the morpheme and allows phonetically different
                realizations of the same morpheme to be readily
                found.
              </desc> 
              <classes>
                <memberOf key="att.ATMO-global-subset"/>
              </classes>
              <content>
                <rng:text/>
              </content> 
            </elementSpec>
            <elementSpec ident="pos" mode="add" ns="&nsATMO;" prefix="atmo_">
              <gloss>part-of-speech tagging</gloss>
              <desc>Contains a transcription of the text of the
                sibling <gi>atmo:seg</gi> element in which 
                each segment is replaced by a part-of-speech
                tag.</desc>
              <classes>
                <memberOf key="att.ATMO-global-subset"/>
              </classes>
              <content>
                <rng:text/>
              </content>
              <remarks>
                <p>No attempt is made in this version to constrain
                  the values to tags actually defined in the project's
                  POS tag set.  It would be a good idea to do so.</p>
              </remarks>
            </elementSpec>
            <elementSpec ident="ilg" mode="add" ns="&nsATMO;" prefix="atmo_">
              <gloss>interlinear gloss</gloss>
              <desc>Contains a transcription of the text of the
                sibling <gi>atmo:seg</gi> element in which 
                each segment with a lexical meaning is replaced by 
                an English-language gloss and each segment with
                a grammatical meaning is replaced by its POS tag
                or a similar grammatical indicator.</desc>
              <classes>
                <memberOf key="att.ATMO-global-subset"/>
              </classes>
 
              <content>
                <rng:text/>
              </content>
            </elementSpec>
 
          </specGrp> 
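          <p>The remarks on <gi>atmo:seg</gi> note that the pattern of
            whitespace and segment boundaries is not yet constrained.
            One way to check it is with a Schematron rule that compares
            the word and morpheme counts of each segmented tier against
            those of <gi>atmo:seg</gi>. The sketch below is based on
            several assumptions: XPath 2.0 regular expressions; words
            separated by whitespace in every tier; morphemes separated by
            <code>-</code> or <code>=</code>; and the hash mark (#) not
            treated as a boundary:
            <eg><![CDATA[
<sch:schema xmlns:sch="http://purl.oclc.org/dsdl/schematron"
            queryBinding="xslt2">
  <sch:ns prefix="atmo"
          uri="http://uyghur.linguistics.indiana.edu/2015/ns/0.1"/>
  <sch:pattern id="tier-alignment">
    <!-- each tier must be segmented exactly like atmo:seg;
         '#' (orthographic space inside a word) is not a boundary -->
    <sch:rule context="atmo:phr/atmo:ipa | atmo:phr/atmo:ums
                       | atmo:phr/atmo:pos | atmo:phr/atmo:ilg">
      <sch:let name="seg" value="normalize-space(../atmo:seg)"/>
      <sch:let name="me" value="normalize-space(.)"/>
      <sch:assert test="not(../atmo:seg)
          or count(tokenize($me, ' ')) = count(tokenize($seg, ' '))"
        ><sch:name/> and atmo:seg differ in word count.</sch:assert>
      <sch:assert test="not(../atmo:seg)
          or count(tokenize($me, '[ =\-]+'))
             = count(tokenize($seg, '[ =\-]+'))"
        ><sch:name/> and atmo:seg differ in morpheme count.</sch:assert>
    </sch:rule>
  </sch:pattern>
</sch:schema>
            ]]></eg>
            Every <gi>atmo:phr</gi> in the horizontal example of
            <ptr target="#wm-example"/> passes both assertions. For
            example, <code>swda-ny dfʾ</code>, <code>N-ACC N</code>,
            and <code>melancholy-ACC cure</code> each have two words
            and three morphemes.</p>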
 
        </div>
 
        <div xml:id="wm-example">
          <head>A short example of annotation</head>
          <p>An example may make clearer the similarities and
            differences between representing a document with
            <gi>atmo:w</gi> elements and with <gi>atmo:phr</gi>
            elements.</p>
          <p>The tier-by-tier horizontal representation is easier to build by hand,
            so it is often the first annotated representation
            of a document.  In horizontal form, the second and third full sentences 
            on p. 7 of manuscript Prov. 351 might be represented
            as follows.  Some long lines have been broken for this 
            display.
            <eg><![CDATA[
        <tei:line n="4" th:sID="L4"/> 
        <atmo:s-wrap n="7.02">
          <atmo:s>
            <tei:hi rend="red">
              <atmo:phr>
                <atmo:lit>سودانی دفع</atmo:lit>
                <atmo:lat>swdany dfʾ</atmo:lat>
                <atmo:seg>swda-ny dfʾ</atmo:seg>
                <atmo:pos>N-ACC N</atmo:pos>
                <atmo:ilg>melancholy-ACC cure </atmo:ilg>
              </atmo:phr>
            </tei:hi>
            <atmo:phr>
              <atmo:lit>بولسون دیسه ترنجه بین 
              ذعفرانینی اوج کون</atmo:lit>
              <atmo:lat>bwlswn dysh trnǰh byn 
                z_ʾfranyny awǰ kwn</atmo:lat>
              <atmo:seg>bwl-swn dy-sh trnǰhbyn 
                z_ʾfrany-ny awǰ kwn</atmo:seg>
              <atmo:pos>LVN-3VOL Vt-COND 
                N N-ACC NU N</atmo:pos>
              <atmo:ilg>LVN-3VOL say-COND turnajabin 
                saffron-ACC three day</atmo:ilg>
            </atmo:phr>
            <tei:line th:eID="L4"/>
            <tei:line n="5" th:sID="L5"/>
            <atmo:phr>
              <atmo:lit>ناشته دا ایچسه سودانی دفع قیلور</atmo:lit>
              <atmo:lat>našth da ayčsh swdany
                dfʾ qylwr</atmo:lat>
              <atmo:seg>našth-da ayč-sh swda-ny
                dfʾ qyl-wr-0</atmo:seg>
             <atmo:pos>N-LOC Vt-COND N-ACC
                N LVN-IPFV.DIR-3</atmo:pos>
             <atmo:ilg>breakfast-LOC drink-COND 
               melancholy-ACC cure
               LVN-IPFV.DIR-3</atmo:ilg>
            </atmo:phr>
          </atmo:s>
          <tei:gloss xml:lang="en">
            To heal melancholy, if one drinks
            turnajabin with saffron at breakfast
            for three days, (his/her) melancholy heals.
          </tei:gloss>
        </atmo:s-wrap>
 
        <atmo:s-wrap n="7.03">
          <atmo:s> 
            <atmo:phr> 
              <atmo:lit>اجغ نمرسه لاردین</atmo:lit>
              <atmo:lat>aǰġ nmrsh lardyn </atmo:lat>
              <atmo:seg>aǰġ nmrsh-lar-dyn</atmo:seg>
              <atmo:pos>AJ PN.INDEF-PL-ABL</atmo:pos> 
              <atmo:ilg>spicy thing-PL-ABL</atmo:ilg>
            </atmo:phr>
            <tei:line th:eID="L5"/>
            <tei:line n="6" th:sID="L6"/>
            <atmo:phr>
              <atmo:lit>البته حزر قیلماق کراک</atmo:lit>
              <atmo:lat>albth ḥzr qylmaq krak</atmo:lat>
              <atmo:seg>albth ḥzr qyl-maq krak</atmo:seg>
              <atmo:pos>AV N LVN-GER XAJ</atmo:pos>
              <atmo:ilg>of.course abstain 
                LVN-GER XAJ</atmo:ilg>
            </atmo:phr>
          </atmo:s>
          <tei:gloss xml:lang="en">
            It is necessary to abstain 
            from spicy (اجغ) foods.
          </tei:gloss>
        </atmo:s-wrap>
            ]]></eg>
          </p>
          <p>In the word-by-word form, the same material will
            look like this:<eg><![CDATA[
        <atmo:s-wrap n="7.02">
          <atmo:s>
            <tei:hi rend="red">
              <atmo:w lit="سودانی" lat="swdany">
                <atmo:m lit="سودا" lat="swda" 
                  reg="swda" pos="N" ilg="melancholy"/>
                <atmo:m lit="نی" lat="ny" 
                  reg="ny" pos="ACC" ilg="ACC"/>
              </atmo:w>
              <atmo:w lit="دفع" lat="dfʾ">
                <atmo:m lit="دفع" lat="dfʾ" 
                  reg="dfʾ" pos="N" ilg="cure"/>
              </atmo:w>
            </tei:hi>
            <atmo:w lit="بولسون" lat="bwlswn">
              <atmo:m lit="بول" lat="bwl" 
                reg="bwl" pos="LVN" ilg="LVN"/>
              <atmo:m lit="سون" lat="swn" 
                reg="swn" pos="3VOL" ilg="3VOL"/>
            </atmo:w>
            <atmo:w lit="دیسه" lat="dysh">
              <atmo:m lit="دی" lat="dy" 
                reg="dy" pos="Vt" ilg="say"/>
              <atmo:m lit="سه" lat="sh" 
                reg="sh" pos="COND" ilg="COND"/>
            </atmo:w>
            <atmo:w lit="ترنجه بین" lat="trnǰh byn">
              <atmo:m lit="ترنجه بین" lat="trnǰh byn" 
                reg="trnǰhbyn" pos="N" ilg="turnajabin"/>
            </atmo:w>
            <atmo:w lit="ذعفرانینی" lat="z_ʾfranyny">
              <atmo:m lit="ذعفرانی" lat="z_ʾfrany" 
                reg="z_ʾfrany" pos="N" ilg="saffron"/>
              <atmo:m lit="نی" lat="ny" 
                reg="ny" pos="ACC" ilg="ACC"/>
            </atmo:w>
            <atmo:w lit="اوج" lat="awǰ">
              <atmo:m lit="اوج" lat="awǰ" 
                reg="awǰ" pos="NU" ilg="three"/>
            </atmo:w>
            <atmo:w lit="کون" lat="kwn">
              <atmo:m lit="کون" lat="kwn" 
                reg="kwn" pos="N" ilg="day"/>
            </atmo:w> 
            <tei:line th:eID="L4"/>
            <tei:line n="5" th:sID="L5"/>
            <atmo:w lit="ناشته دا" lat="našth da">
              <atmo:ow lit="ناشته" lat="našth">
                <atmo:m lit="ناشته" lat="našth" 
                  reg="našth" pos="N" ilg="breakfast"/>
              </atmo:ow>
              <atmo:ow lit="دا" lat="da">
                <atmo:m lit="دا" lat="da" 
                  reg="da" pos="LOC" ilg="LOC"/>
              </atmo:ow>
            </atmo:w>
            <atmo:w lit="ایچسه" lat="ayčsh">
              <atmo:m lit="ایچ" lat="ayč" 
                reg="ayč" pos="Vt" ilg="drink"/>
              <atmo:m lit="سه" lat="sh" 
                reg="sh" pos="COND" ilg="COND"/>
            </atmo:w>
            <atmo:w lit="سودانی" lat="swdany">
              <atmo:m lit="سودا" lat="swda" 
                reg="swda" pos="N" ilg="melancholy"/>
              <atmo:m lit="نی" lat="ny" 
                reg="ny" pos="ACC" ilg="ACC"/>
            </atmo:w>
            <atmo:w lit="دفع" lat="dfʾ">
              <atmo:m lit="دفع" lat="dfʾ" 
                reg="dfʾ" pos="N" ilg="cure"/>
            </atmo:w>
            <atmo:w lit="قیلور" lat="qylwr">
              <atmo:m lit="قیل" lat="qyl" 
                reg="qyl" pos="LVN" ilg="LVN"/>
              <atmo:m lit="ور" lat="wr" 
                reg="wr" pos="IPFV.DIR" ilg="IPFV.DIR"/>
              <atmo:m lit="" lat="" 
                reg="0" pos="3" ilg="3"/>
            </atmo:w>
          </atmo:s>
          <tei:gloss xml:lang="en">
            To heal melancholy, if one drinks
            turnajabin with saffron at breakfast
            for three days, (his/her) melancholy heals.
          </tei:gloss>
        </atmo:s-wrap>
 
        <atmo:s-wrap n="7.03">
          <atmo:s>
            <atmo:w lit="اجغ" lat="aǰġ">
              <atmo:m lit="اجغ" lat="aǰġ" 
                reg="aǰġ" pos="AJ" ilg="spicy"/>
            </atmo:w>
            <atmo:w lit="نمرسه لاردین" lat="nmrsh lardyn">	    
              <atmo:ow lit="نمرسه" lat="nmrsh">
                <atmo:m lit="نمرسه" lat="nmrsh" 
                  reg="nmrsh" pos="PN.INDEF" ilg="thing"/>
              </atmo:ow>
              <atmo:ow lit="لاردین" lat="lardyn"> 
                <atmo:m lit="لار" lat="lar" 
                  reg="lar" pos="PL" ilg="PL"/>
                <atmo:m lit="دین" lat="dyn" 
                  reg="dyn" pos="ABL" ilg="ABL"/>
              </atmo:ow>
            </atmo:w>
 
            <tei:line th:eID="L5"/>
            <tei:line n="6" th:sID="L6"/>
 
            <atmo:w lit="البته" lat="albth">
              <atmo:m lit="البته" lat="albth" 
                reg="albth" pos="AV" ilg="of.course"/>
            </atmo:w>
            <atmo:w lit="حزر" lat="ḥzr">
              <atmo:m lit="حزر" lat="ḥzr" 
                reg="ḥzr" pos="N" ilg="abstain"/>
            </atmo:w>
            <atmo:w lit="قیلماق" lat="qylmaq">
              <atmo:m lit="قیل" lat="qyl" 
                reg="qyl" pos="LVN" ilg="LVN"/>
              <atmo:m lit="ماق" lat="maq" 
                reg="maq" pos="GER" ilg="GER"/>
            </atmo:w>
            <atmo:w lit="کراک" lat="krak">
              <atmo:m lit="کراک" lat="krak" 
                reg="krak" pos="XAJ" ilg="XAJ"/>
            </atmo:w>
          </atmo:s>
          <tei:gloss xml:lang="en">
            It is necessary to abstain from spicy (اجغ) foods.
          </tei:gloss>
        </atmo:s-wrap>]]></eg></p>
          <p>Again long lines have been broken for display;
            in an XML editor, with each <gi>atmo:m</gi>
            element on a single line, the representation may be 
            slightly easier to read.
            The example shows <att>lit</att> and
            <att>lat</att> attributes on <gi>atmo:w</gi>;
            these are strictly speaking redundant and
            could be reconstructed from the content of
            the element, but they provide a convenient
            check on the results of any automated 
            segmentation process.  The equally redundant attributes
            <att>reg</att>, <att>pos</att>, and <att>ilg</att>
            may also be used on <gi>atmo:w</gi>.  The schema will be written
            to allow them, and some software may populate
            and use them.
          </p>
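          <p>Here is one possible convention, offered as a sketch
            rather than a project decision: a word's redundant
            attributes could simply mirror its tokens in the
            <gi>atmo:seg</gi>, <gi>atmo:pos</gi>, and <gi>atmo:ilg</gi>
            tiers of the horizontal form. Under that assumption, the
            first word of sentence 7.02 would read:
            <eg><![CDATA[
  <!-- assumed convention: word-level reg/pos/ilg join the
       children's values with hyphens, as in the horizontal tiers -->
  <atmo:w lit="سودانی" lat="swdany"
    reg="swda-ny" pos="N-ACC" ilg="melancholy-ACC">
    <atmo:m lit="سودا" lat="swda"
      reg="swda" pos="N" ilg="melancholy"/>
    <atmo:m lit="نی" lat="ny"
      reg="ny" pos="ACC" ilg="ACC"/>
  </atmo:w>
            ]]></eg>
            With this convention, software can verify each word by
            joining the corresponding attributes of its children with
            hyphens and comparing the results.</p>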
 
          <note place="block">
            <p><label>Open issue:</label>  We do not currently
              have a story about when horizontal markup is used
              and when vertical markup is used.  Segmentation of
              texts into paragraphs and sentences in an XML
              editor will be easier with vertical markup;
              manual annotation without a specialized user
              interface such as an XForm will be easier with
              horizontal markup.</p>
            <p>It would be desirable to have a story.</p>
            <p>It would also be desirable to have stylesheets to
            translate back and forth between horizontal and
            vertical formats.</p>
          </note>
 
        </div>
      </div>
 
 

      <div xml:id="connections">
        <head>Connecting the extensions to the TEI schema</head>
        <p>This section describes the mechanisms by which
          the ATMO-specific elements described here are made
          reachable as children of TEI elements.
        </p>
        <p>The key requirement is that when character data (#PCDATA
        in DTD terms, <gi>rng:text</gi> in RELAX NG, <code>mixed='true'</code>
        in XSD) is allowed directly within an element, the
        <gi>atmo:w</gi>, <gi>atmo:pc</gi>, and <gi>atmo:phr</gi>
        elements should also be allowed.</p>
        <p>It is not a requirement to make character data impossible
        in those contexts:  many elements like <gi>tei:p</gi> will
        be legal both within the <gi>tei:text</gi> element (where
        they should never contain character data directly, but only
        <gi>atmo:w</gi> and <gi>atmo:phr</gi> elements) and also
        within notes and within the TEI header (where normal character
        data is expected).</p>
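        <p>Schematically, then, the same TEI element can appear in both
          roles. Element content is abbreviated here, and the header text
          is a placeholder:
          <eg><![CDATA[
  <!-- within tei:text: no direct character data, only ATMO tokens -->
  <tei:p>
    <atmo:phr>
      <atmo:lit>...</atmo:lit>
      <atmo:lat>...</atmo:lat>
    </atmo:phr>
  </tei:p>

  <!-- within the TEI header or a note: ordinary character data -->
  <tei:p>Plain character data, as in any TEI header.</tei:p>
          ]]></eg></p>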
        <p>We could achieve the necessary changes (or so it appears)
          by redefining the macros <ident>paraContent</ident>,
          <ident>specialPara</ident>, and <ident>phraseSeq</ident>,
          since these are used as the content models for TEI elements
          included in this customization and available as descendants 
          of <gi>tei:text</gi>.  (Other macros are used only for
          elements available only within the header.)
          <list>
            <item><p><ident>paraContent</ident> is used for
                  <gi>tei:p</gi>, and also for <gi>tei:hi</gi> and many other
                phrase-level elements.</p></item>
            <item><p><ident>specialPara</ident> is used for
                  <gi>tei:item</gi>, <gi>tei:note</gi>,
                  <gi>tei:q</gi>, <gi>tei:cell</gi>, and
                  <gi>tei:metamark</gi>.</p></item>
            <item><p><ident>phraseSeq</ident> is used for
                  <gi>tei:speaker</gi>, <gi>tei:term</gi>, and many other
                phrase-level elements.</p></item>
          </list>
        </p>
        <p>We ensure that <gi>atmo:phr</gi>,
          <gi>atmo:w</gi>, and <gi>atmo:mw</gi>
          are available in all these macros by adding those
          elements to the TEI <ident>model.phrase</ident> class,
          which all three macros include.
          We do the same for <gi>atmo:s-wrap</gi>
          in view S.</p>

        <p>Most of the elements we define in the ATMO namespace are
          members of the attribute class
          <ident>att.global</ident>, from which they
          inherit the TEI's set of global attributes.
          We also define a small subset of the global attributes
          for use on fine-grained elements, for which 
          <att>xml:base</att> and the like make little
          sense.</p>
        <specGrp xml:id="global-attribute-subset">
          <classSpec ident="att.ATMO-global-subset" type="atts">
            <desc>An attribute class for a subset of TEI attributes to
              be available on fine-grained elements like <gi>m</gi> and
                <gi>clitic</gi>, for which some of the attributes in the
              class <ident>att.global</ident> make little
              sense. </desc>
            <!--* deletions don't work 
            <classes>
              <memberOf key="att.global"/>
              <memberOf key="att.global.linking" mode="delete"/>
              <memberOf key="att.global.facs" mode="delete"/>
              <memberOf key="att.global.change" mode="delete"/>
              <memberOf key="att.global.source" mode="delete"/>
             
            </classes> *-->

            <attList>
              <attDef ident="xml:id" usage="opt">
                <gloss versionDate="2007-07-02" xml:lang="en"
                  >identifier</gloss>
                <desc versionDate="2005-10-10" xml:lang="en">provides a
                  unique identifier for the element bearing the
                  attribute.</desc>

                <datatype>xs:ID</datatype>
              </attDef>
              <attDef ident="n" usage="opt">
                <gloss versionDate="2007-07-02" xml:lang="en"
                  >number</gloss>
                <desc versionDate="2005-10-10" xml:lang="en">gives a
                  number (or other label) for an element, which is not
                  necessarily unique within the document.</desc>
                <datatype>xs:string</datatype>
              </attDef>
              <attDef ident="xml:lang" usage="opt">
                <gloss versionDate="2007-07-02" xml:lang="en"
                  >language</gloss>
                <desc versionDate="2013-01-07" xml:lang="en">indicates
                  the language of the element content using a
                    <soCalled>tag</soCalled> generated according to <ref
                    target="http://www.rfc-editor.org/rfc/bcp/bcp47.txt"
                    >BCP 47</ref>.</desc>
                <datatype>xs:language</datatype>
                <exemplum xml:lang="en">
                  <egXML xmlns="http://www.tei-c.org/ns/Examples"
                    xml:lang="en">
  <p> … The consequences of this rapid depopulation
    were the loss of the last
    <foreign xml:lang="rap"
	     >ariki</foreign>
    or chief (Routledge 1920:205,210)
    and their connections to ancestral
    territorial organization.</p>
                  </egXML>
                </exemplum>
                <remarks versionDate="2013-12-06" xml:lang="en">
                  <p>The xml:lang value will be inherited from the
                    immediately enclosing element, or from its parent,
                    and so on up the document hierarchy. It is generally
                    good practice to specify xml:lang at the highest
                    appropriate level, noticing that a different default
                    may be needed for the teiHeader from that needed for
                    the associated resource element or elements, and
                    that a single TEI document may contain texts in many
                    languages.</p>
                  <p>The authoritative list of registered language
                    subtags is maintained by IANA and is available at
                      <ptr
                      target="http://www.iana.org/assignments/language-subtag-registry"
                    />. For a good general overview of the construction
                    of language tags, see <ptr
                      target="http://www.w3.org/International/articles/language-tags/"
                    />, and for a practical step-by-step guide, see <ptr
                      target="https://www.w3.org/International/questions/qa-choosing-language-tags.en.php"
                    />.</p>
                  <p>The value used must conform with BCP 47. If the
                    value is a private use code (i.e., starts with
                      <val>x-</val> or contains <val>-x-</val>), a
                      <gi>language</gi> element with a matching value
                    for its <att>ident</att> attribute should be
                    supplied in the TEI header to document this value.
                    Such documentation may also optionally be supplied
                    for non-private-use codes, though these must remain
                    consistent with their <choice>
                      <abbr>IETF</abbr>
                      <expan>Internet Engineering Task Force</expan>
                    </choice> definitions.</p>
                </remarks>



              </attDef>





            </attList>
          </classSpec>
        </specGrp>
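        <p>As a hypothetical illustration (assuming that <gi>m</gi> is
          declared in the ATMO namespace and is made a member of this
          class instead of <ident>att.global</ident>), the class might be
          used as follows. Only the class membership of <gi>m</gi> is
          shown, and the <att>xml:id</att> and <att>n</att> values are
          invented for illustration.
          <eg><![CDATA[
  <elementSpec ident="m" mode="add"
      ns="http://uyghur.linguistics.indiana.edu/2015/ns/0.1">
    <classes>
      <!-- the subset class, not att.global -->
      <memberOf key="att.ATMO-global-subset"/>
    </classes>
    <!-- content and element-specific attributes omitted -->
  </elementSpec>

  <!-- In an instance, m then accepts xml:id, n, and xml:lang
       from the subset class, but not other global attributes
       such as xml:base or rend. reg, pos, and ilg are assumed
       to be declared on m itself. -->
  <atmo:m xmlns:atmo="http://uyghur.linguistics.indiana.edu/2015/ns/0.1"
     xml:id="m1" n="1" xml:lang="ug"
     reg="ata" pos="N" ilg="father"/>
  ]]></eg>
        </p>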
        
	<p><hi>The following schema fragments
	currently contain specifications used only
	for debugging purposes. Readers interested
	in the schemas described here can and
	should ignore them.</hi></p>
        <specGrp xml:id="TEI-modifications-grammar-S">
          <specGrpRef target="#global-attribute-subset"/>
<!--
	  <elementSpec ident="p" mode="change">
	    <content autoPrefix="false">
	      <rng:ref name="probe-S"/>
	    </content>
	  </elementSpec>
	  <elementSpec ident="probe-S" mode="add">
	    <content>
	      <rng:empty/>
	    </content>
	  </elementSpec>-->	  
 
        </specGrp>

	

        <specGrp xml:id="TEI-modifications-grammar-T">
          <specGrpRef target="#global-attribute-subset"/>
         <!-- <elementSpec ident="p" mode="change">
            <content autoPrefix="false">
              <rng:ref name="probe-T"/>
            </content>
          </elementSpec>
          <elementSpec ident="probe-T" mode="add">
            <content>
              <rng:empty/>
            </content>
          </elementSpec>	  -->
 
          <!--<macroSpec ident="macro.paraContent" mode="replace">
            <content>
              <rng:zeroOrMore>
                <rng:choice>
                  <rng:text/>
                  <rng:ref name="s-wrap" ns="&nsATMO;"/>
                  <rng:ref name="model.inter"/>
                  <rng:ref name="model.milestoneLike"/>
                  <rng:ref name="model.noteLike"/>
                  <rng:ref name="figure"/>
                  <rng:ref name="metamark"/>
                </rng:choice>
              </rng:zeroOrMore>
            </content>
            <remarks>
              <p>Within the transcribed text, all paragraphs (or
              other paragraph-level units which use this macro
              for their content model) should be subdivided
              into <gi>s-wrap</gi> elements; they should <emph>not</emph>
              directly contain character data.
              </p>
              <p>Within the header, however, as within notes
              on the text (as opposed to notes which are part
              of the original text), normal paragraph structure
              should be used.</p>
              <p>The formulation of Schematron rules to enforce
              these practices would be desirable, but it seems
              likely that our validation will always be imperfect.</p>
            </remarks>
          </macroSpec>-->
        </specGrp>
      </div>
 
 
 
      <div xml:id="testing">
        <head>Preliminary testing of the schema</head>
        <!--<p>The initial test of this schema is taking place concurrently
          with the drafting of this document; it takes the form of
          creating a skeleton for Prov. 6 by hand and inserting
          transcription text into it from the existing transcriptions
          using a DTD created for an earlier project on Uyghur Light
          Verbs.</p>-->
        <p>The initial test of this schema is to use it to validate
          two toy documents with a few sentences of Wittgenstein
          in <ref target="toy-att.xml">vertical (word/morph)</ref>
          and <ref target="toy-phr.xml">horizontal (phr)</ref> form.
          Those documents are now valid against the schema defined here.          
        </p>
        <p>The next tests will involve the sample sentences from 
        Prov. 351 shown elsewhere in this document.</p>
        <p>The text schema has not yet been tested.</p>
        <p>When the augmented schemas are ready, they will be tested
        against Prov. 11 and Prov. 351 (possibly only fragments
        of the latter, at first).</p>
      </div>

    </body>
    <back>
      <div xml:id="references">
        <head>References</head>
        <listBibl>
 
          <bibl xml:id="DeRose2004" n="DeRose 2004">
            <author>DeRose, Steven</author>.
            <date>2004</date>.
            <title level="a">Markup overlap: 
              A review and a Horse</title>.
            Paper given at Extreme Markup Languages 2004, 
            Montr&eacute;al, sponsored by IDEAlliance. <!--*
            Available on the Web at 
            <ref target="http://www.mulberrytech.com/Extreme
            /Proceedings/html/2004/DeRose01/EML2004DeRose01.html"
            >http://www.mulberrytech.com/Extreme/Proceedings/html
            /2004/DeRose01/EML2004DeRose01.html"</ref>.
            *-->
            On the Web at <ref
            target="&extreme;/2004/DeRose01/EML2004DeRose01.html"
            >http://conferences.idealliance.org / extreme / html / 2004
            / DeRose01 / EML2004DeRose01.html</ref>.
          </bibl>
 
          <bibl xml:id="schonefeld2007" n="Schonefeld 2007">
            <author>Schonefeld, Oliver</author>.
            <date>2007</date>.
            <title level="a">XCONCUR and XCONCUR-CL:
            A constraint-based approach for the validation of concurrent markup</title>.
            In <title>Datenstrukturen f&uuml;r linguistische Ressourcen
            und ihre Anwendungen /
            Data structures for linguistic resources and applications:
            Proceedings of the Biennial GLDV Conference 2007</title>,
            ed. Georg Rehm, Andreas Witt, Lothar Lemnitzer.
            T&uuml;bingen: Gunter Narr Verlag.
            Pp. 347-356.</bibl>

          <bibl xml:id="schonefeld2006" n="Schonefeld / Witt 2006">
            <author>Schonefeld, Oliver</author>,
            and 
            <author>Andreas Witt</author>.
            <date>2006</date>.
            <title level="a">Towards validation of concurrent markup</title>.
            Extreme Markup Languages 2006.
          </bibl>
 
          <bibl xml:id="rd" n="Sperberg-McQueen 2006">
            <author>Sperberg-McQueen, C. M.</author>.
            <date>2006</date>.
            <title level="a">Rabbit/duck grammars: a validation method 
              for overlapping structures</title>.
            In <title level="m">Proceedings of Extreme Markup Languages 2006</title>.
            On the Web at
            <ref
              target="&extreme;/2006/SperbergMcQueen01/EML2006SperbergMcQueen01.html"
              >http://conferences.idealliance.org / extreme / html
            / 2006 / SperbergMcQueen01 / EML2006SperbergMcQueen01.html</ref>.
          </bibl>
 

          <bibl xml:id="hsm1998" n="Sperberg-McQueen / Huitfeldt 1999">
            <author>Sperberg-McQueen, C. M.</author>,
            and
            <author>Claus Huitfeldt</author>. 
            <date>1999</date>.
            <title level="a">Concurrent document hierarchies in MECS and SGML</title>.
            <title level="j">Literary &amp; Linguistic Computing</title>
            14.1: 29-42.  <!--* 
                <ref>http://www.w3.org/People/cmsmcq/2000/poddp2000.html</ref>
                *-->
          </bibl>
 
        </listBibl>
      </div>
 
      <div xml:id="elementlist">
        <head>List of elements</head>
        <p>The requirements for ATMO markup are laid out in ATMO
          Technical Report 2015-01; the following list summarizes the
          needed textual features and identifies the relevant TEI
          element and its module (the module is currently missing from
          most entries). <list>

            <item>
              <p>Core module: Elements for basic text structure</p>
              <list>
                <item><p>core - head - heading</p></item>
                <item><p>core - head/add - heading added in
                  margin</p></item>
                <item><p>core - l - verse lines</p></item>
                <item><p>core - lg - stanzas in verse</p></item>
                <item><p>core - list (or: atmo:wordlist?) - word
                  list</p></item>
                <item><p>core - note type="commentary" place="margin" -
                    marginal note, commentary</p></item>
                <item><p>core - note type="corr" place="margin" -
                    marginal correction or change</p></item>
                <item><p>core - p - paragraph</p></item>
                <item><p>core - sp (etc.) - dramatic organization (e.g.
                    for ghazal)</p></item>
                <item><p>core - teiCorpus (or group) - text collection
                    or anthology</p></item>
              </list>
            </item>

            <item>
              <p>Core module: phrase-level elements</p>
              <p>Details of inscription and / or transcription</p>
              <list>
                <item><p>core - add (@rend for loc, manner) - additions
                    (with place information)</p></item>
                <item><p>core - add rend="overwriting" - overwriting of
                    an addition </p></item>
                <item><p>core - corr rend="overwriting" - overwriting of
                    a correction</p></item>
                <item><p>core - gap - a gap in the
                  transcription</p></item>
                <item><p>core - del (@rend for manner) -
                  deletions</p></item>
                <item><p>core - hi rend="overline" - overlining (when
                    function unknown)</p></item>
                <item><p>core - hi rend="red" - rubrication of uncertain
                    function</p></item>
                <item><p>core - unclear - material of doubtful
                    legibility and uncertain reading</p></item>
              </list>
              <p>Phrase-level elements marking semantically or
                linguistically special material</p>
              <list>
                <item><p>core - date - date</p></item>
                <item><p>core - emph - emphasized material (for
                    rhetorical emphasis)</p></item>
                <item><p>core - foreign lang="..." - words in other
                    languages if not otherwise marked</p></item>
                <item><p>core - gloss - sentence-level
                  translation</p></item>
                <item><p>core - mentioned - word or phrase mentioned not
                    used</p></item>
                <item><p>core - name - </p></item>
                <item><p>core - num - </p></item>
                <item><p>core - q - quoted material (or
                    pseudo-quoted)</p></item>
                <item><p>core - term - terminus technicus</p></item>
                <item><p>core - title - title of a work
                  mentioned</p></item>
              </list>
              <p>Other</p>
              <list>
                <item><p>core - index - semantic escape hatch for
                    recording passages of interest</p></item>
                <item><p>core - milestone - semantic escape hatch for
                    non-nesting structures</p></item>
              </list>
            </item>

            <item>
              <p>Gaiji</p>
              <list>
                <item><p>gaiji - g ? - special / non-standard
                  glyphs</p></item>
              </list>
            </item>

            <item>
              <p>Figures and tables</p>
              <list>
                <item><p>ft - figure - for embedding graphic images when
                    and as necessary</p></item>
                <item><p>ft - table - for tabular material</p></item>
              </list>
            </item>

            <item>
              <p>Hyperlinking</p>
              <list>
                <item><p>linking - ab - semantic escape hatch for
                    chunk-level elements</p></item>
                <item><p>linking - atmo:seal == ab type='seal' ? - seal
                    or stamp</p></item>
              </list>
            </item>

            <item>
              <p>Manuscript descriptions</p>
              <list>
                <item><p>msdescription - msDesc/physDesc/objectDesc/p -
                    description of page frames</p></item>
              </list>
            </item>

            <item>
              <p>Names and dates</p>
              <list>
                <item><p>namesdates - [person,] persName, name, rs -
                    names of and references to persons</p></item>
                <item><p>namesdates - [place,] placeName, rs - place
                    names and references to places </p></item>
              </list>
            </item>

            <item>
              <p>TEI module</p>
              <list>
                <item><p>tei - * lang="..." - words in other languages
                  </p></item>
                <item><p>tei - * rend="..." - special
                  formatting</p></item>
                <item><p>tei - * rend="bold" - bolded letters (use hi if
                    needed) </p></item>
                <item><p>tei - * rend="larger" - larger letters (use seg
                    if needed) </p></item>
                <item><p>tei - * rend="overline" - overlining (when
                    function is known) </p></item>
                <item><p>tei - * rend="overwriting" - overwriting
                    generically</p></item>
                <item><p>tei - * rend="red" - rubrication</p></item>
                <item><p>tei - ? - bolded strokes (are these letters or
                    non-letters?)</p></item>
                <item><p>tei - l @part - partial verse lines (@part is
                    in att.fragmentable)</p></item>
              </list>
            </item>

            <item>
              <p>Base text structure: core structural units</p>
              <list>
                <item><p>textstructure - text - text (free standing or
                    in collection)</p></item>
                <item><p>textstructure - front, body, back - main text
                    divisions (not clear how applicable these
                  are)</p></item>
                <item><p>textstructure - group - text collection or
                    anthology</p></item>
                <item><p>textstructure - div - section</p></item>
              </list>
              <p>Base text structure: specialized structural units</p>
              <list>
                <item><p>textstructure - byline - </p></item>
                <item><p>textstructure - docAuthor, docTitle, docDate -
                  </p></item>
                <item><p>textstructure - epigraph - </p></item>
                <item><p>textstructure - opener, closer - </p></item>
                <item><p>textstructure - postscript - </p></item>
                <item><p>textstructure - signed - </p></item>
                <item><p>textstructure - titlePage - </p></item>
                <item><p>textstructure - titlePart - </p></item>
              </list>
            </item>

            <item>
              <p>Transcription of primary sources</p>
              <list>

                <item><p>transcr - ? handShift - information about which
                    hand wrote this bit (also handDesc and handNote from
                    msdesc)</p></item>
                <item><p>transcr - damage agent="water" -
                    waterstaining</p></item>
                <item><p>transcr - damage type="cut" - cut paper
                  </p></item>
                <item><p>transcr - damage type="tear" - torn paper
                  </p></item>
                <item><p>transcr - fw type="catch" -
                  catchwords</p></item>
                <item><p>transcr - fw type="folnum" - folio numbers
                  </p></item>
                <item><p>transcr - fw type="pagenum" - page numbers
                  </p></item>
                <item><p>transcr - metamark - marks to show text
                    transposition, anchoring, etc.</p></item>
                <item><p>transcr - space - unusual horizontal
                    whitespace</p></item>

              </list>
            </item>

            <item>
              <p>Verse</p>
              <list>
                <item><p>verse - caesura - caesura</p></item>
              </list>
            </item>

            <item>
              <p>Extensions in ATMO namespace:</p>
              <list>
                <item><p>- atmo:* - various structural elements needed
                    for annotation</p></item>
                <item><p>- atmo:al-abd? - horizontal line after
                    salutation</p></item>
                <item><p>- atmo:affix - inflectional affix</p></item>
                <item><p>- atmo:clitic - clitics (may be polymorphemic,
                    but we will always treat them as atomic) </p></item>
                <item><p>- atmo:mm - segment (not necessarily strictly
                    monomorphemic, but not to be analysed further for
                    ATMO, e.g. for a polymorphemic stem) </p></item>
                <item><p>- atmo:mw - orthographic word (containing
                      <gi>w</gi>)</p></item>
                <item><p>- atmo:ow - orthographic word (within
                      <gi>w</gi>)</p></item>
                <item><p>- atmo:s-wrap - sentence bundle, wraps
                      <gi>s</gi> and <gi>gloss</gi></p></item>
              </list>
            </item>
            <item>
              <p>Near-clones from TEI analysis and interpretation</p>
              <p>Note that it's not immediately clear whether it's
                better to use the TEI elements with modified names, or
                to replace them with elements in the ATMO namespace.
                The current tentative decision is to put the elements in
                the ATMO namespace.</p>
              <list>
                <item><p>analysis - c - character (?)</p></item>
                <item><p>analysis - m - morpheme; note that
                      <gi>atmo:clitic</gi>, <gi>atmo:mm</gi> and
                      <gi>atmo:affix</gi> are clones of this</p></item>
                <item><p>analysis - pc - punctuation
                  character</p></item>
                <item><p>analysis - s - sentence</p></item>
                <item><p>analysis - w - word</p></item>
              </list>
            </item>
            <item>
              <p>Elements needed if we decide to generate a horizontal
                schema.</p>
              <list>
                <item><p>- atmo:ipa - IPA version of text</p></item>
                <item><p>- atmo:lat - literal transliteration (lit / lat
                    losslessly interconvertible)</p></item>
                <item><p>- atmo:lit - literatim transcription</p></item>
                <item><p>- atmo:reg - regularized version of
                    segmentation</p></item>
                <item><p>- atmo:seg - segmentation of literal
                    transliteration</p></item>
              </list>
            </item>
          </list></p>
        <p>Some things have been consciously <emph>omitted</emph> from
          this list.</p>
        <list>
          <item>
            <p>Omitted from the core module</p>
            <p>Boundary markers for physical organization. These are all
              strictly redundant with view P.</p>
            <list>
              <item><p>core - pb, cb, lb - page, column, and line
                  breaks</p></item>
              <item><p>core - pb @facs - links to page images</p></item>
            </list>
          </item>
          <item>
            <p>ATMO namespace: considered but for now rejected.</p>
            <list>
              <item><p>- atmo:elicited (or tei:q?) - elicited
                  sentence</p></item>
              <item><p>- atmo:stem - stems (may be polymorphemic)
                </p></item>
            </list>
          </item>
        </list>
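        <p>To illustrate how some of the ATMO extensions listed above
          might fit together inside a TEI paragraph, here is a
          hypothetical skeleton. It assumes that <gi>s</gi>,
          <gi>w</gi>, and <gi>pc</gi> are placed in the ATMO namespace
          (the current tentative decision noted above) and that
          <gi>gloss</gi> remains a TEI element; all text content is
          elided.
          <eg><![CDATA[
  <p xmlns="http://www.tei-c.org/ns/1.0"
     xmlns:atmo="http://uyghur.linguistics.indiana.edu/2015/ns/0.1">
    <!-- sentence bundle: the sentence plus its translation -->
    <atmo:s-wrap>
      <atmo:s>
        <atmo:w>...</atmo:w>
        <atmo:w>...</atmo:w>
        <atmo:pc>...</atmo:pc>
      </atmo:s>
      <!-- sentence-level translation (core module) -->
      <gloss>...</gloss>
    </atmo:s-wrap>
  </p>
  ]]></eg>
        </p>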
      </div>
 
      <div>
        <head>Alternate representations and analyses of word structure</head>
        <p>The subsections which follow record some alternative
        XML representations and linguistic analyses for some of the
        examples given in earlier sections.  They are included to
        document design alternatives that seem to present
        themselves; they are not supported by the markup
        defined in this document.</p>
        <div xml:id="alternative-representations">
          <head>Sub-elements instead of attributes</head>
          <p>Instead of attributes, analysis might use
          sub-elements, either for all layers of annotation or for some.</p>
          <p>The children of <gi>m</gi> here are alternative
          representations of the morph; the attributes of <gi>m</gi>
          give properties of the morph other than its written form. 
            <eg><![CDATA[
  <w>
    <m pos="N">
     <reg>character</reg>
     <ipa>...</ipa>
    </m>
    <m pos="PL">
      <reg>s</reg>
      <ipa>...</ipa>
    </m>
  </w>
  ]]></eg>
          <eg><![CDATA[
  <w>
    <m pos="N" ilg="father">
     <reg>ata</reg>
     <ipa>atʰa</ipa>
    </m>
    <m pos="POSS3" ilg="POSS3">
      <reg>si</reg>
      <ipa>si</ipa>
    </m>
    <m pos="DAT" ilg="DAT">
      <reg>ghe</reg>
      <ipa>ɣɛ</ipa>
    </m>
  </w>
  ]]></eg>
          </p>
 
          <p>In this form, the example <mentioned>on besh</mentioned> shown <ref
          target="#pos-irr-1O-0M">
          above</ref> (<ptr target="#pos-irr-1O-0M"/>) would take
          the following representation.
          <eg><![CDATA[
  <w>
    <m pos="NU" ilg="fifteen">
      <ow>
        <reg>on</reg>
        <ipa>on</ipa>
      </ow>
      <ow>
        <reg>besh</reg>
        <ipa>bɛʃ</ipa>
      </ow>
    </m>
  </w>
  ]]></eg>
            The example might look more natural if the <gi>m</gi>
            element were deleted and its attributes transferred to its
            parent <gi>w</gi>, but that would introduce an unnecessary
            special case into the markup. With the <gi>m</gi> present,
            we can preserve the invariant that part-of-speech and
            interlinear gloss are given for morphs, not for words. </p>
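          <p>For comparison, here is a sketch of the second example above
            (<mentioned>ata</mentioned> + <mentioned>si</mentioned> +
            <mentioned>ghe</mentioned>) in the attribute-based form,
            following the pattern of the attribute-based example in the
            subsection on words nesting within words below. Any
            word-level attributes are omitted here.
            <eg><![CDATA[
  <w>
    <!-- each layer of annotation is an attribute of the morph -->
    <m reg="ata" ipa="atʰa" pos="N" ilg="father"/>
    <m reg="si" ipa="si" pos="POSS3" ilg="POSS3"/>
    <m reg="ghe" ipa="ɣɛ" pos="DAT" ilg="DAT"/>
  </w>
  ]]></eg>
          </p>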
        </div>
        <div>
          <head>Overlap of orthographic words and morphs</head>
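          <p>The example shown <ref target="#pos-irr-exx-overlap"
              >above</ref> (<ptr target="#pos-irr-exx-overlap"/>) could
            also mark up orthographic words explicitly. Because the
            orthographic word <mentioned>sekkizinchi</mentioned> begins
            inside the first morph (<mentioned>yigirme sekkiz</mentioned>)
            and extends through the second (<mentioned>inchi</mentioned>),
            this is a straightforward example of overlapping structures,
            and explicit marking would require either fragmentation and
            virtual joins or some form of milestone markup. Using
            Trojan-Horse markup (<ref target="#DeRose2004">DeRose
            2004</ref>) for the <gi>ow</gi> elements, the example might
            look like this: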
            <eg><![CDATA[
  <w>
    <m pos="NU" ilg="twenty-eight"> 
      <reg><ow id="owr23s" eid="owr23e"
        />yigirme<ow 
        id="owr23e"
        sid="owr23s"/>
        <ow id="owr24s" eid="owr24e"
        />sekkiz</reg>
      <ipa><ow id="owi23s" eid="owi23e"
        />jigirmɛ<ow 
        id="owi23e"
        sid="owi23s"/>
        <ow id="owi24s" eid="owi24e"
        />sɛkkʰiz</ipa>
    </m>
    <m pos="ORD" ilg="ORD">
      <reg>inchi<ow id="owr24e" sid="owr24s"/></reg>
      <ipa>inʧʰi<ow id="owi24e" sid="owi24s"/></ipa>
    </m>
  </w>
  ]]></eg>
          </p>
          <p>Managing this markup from an XForms interface would be
          unpleasant, so I don't propose to try.
          </p>
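          <p>For comparison, the fragmentation-and-virtual-join
            alternative might look like the following sketch. It assumes
            that the TEI linking attributes <att>next</att> and
            <att>prev</att> (class <ident>att.global.linking</ident>)
            are available on <gi>ow</gi>; the identifiers are
            hypothetical, and only the <gi>reg</gi> layer is shown (the
            <gi>ipa</gi> layer would be parallel).
            <eg><![CDATA[
  <w>
    <m pos="NU" ilg="twenty-eight">
      <reg><ow>yigirme</ow>
        <!-- first fragment of the orthographic word sekkizinchi -->
        <ow xml:id="owr24a" next="#owr24b">sekkiz</ow></reg>
    </m>
    <m pos="ORD" ilg="ORD">
      <!-- second fragment, joined virtually to the first -->
      <reg><ow xml:id="owr24b" prev="#owr24a">inchi</ow></reg>
    </m>
  </w>
  ]]></eg>
          </p>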
        </div>
        <div>
          <head>Alternative analysis: words nesting within words</head>
          <p>Each of the cases described in section
          <ptr target="#pos-irr-1L-nO"/>
          could also be analysed by identifying
          the orthographic words as linguistic words instead of merely as
          irregularities of spelling.  If the rest of the analysis
          remains the same, we then have linguistic words nesting within
          linguistic words.</p>
 
          <p>For morph-level annotation, that would involve only renaming
          the <gi>ow</gi> elements as <gi>w</gi> elements; for
          word-level annotation, it would suggest providing annotation
          attributes at both levels.  For example:
 
          <eg><![CDATA[
  <w reg="yigirme sekkizinchi" ipa="jigirmɛ sɛkkʰizinʧʰi" 
     pos="NU.ORD" ilg="twenty-eighth" >
    <w reg="yigirme" ipa="jigirmɛ" pos="NU" ilg="twenty">
      <m reg="yigirme" ipa="jigirmɛ" pos="NU" ilg="twenty" />
    </w>
    <w reg="sekkizinchi" ipa="sɛkkʰizinʧʰi" pos="NU.ORD" ilg="eighth">
      <m reg="sekkiz" ipa="sɛkkʰiz" pos="NU" ilg="eight" />
      <m reg="inchi" ipa="inʧʰi" pos="ORD" ilg="ORD" />
    </w>
  </w>
  ]]></eg>
          (The NU.ORD tag is a new coinage for this case; it does not occur in
          the UyLVs data.)
          </p>
          <p>
            Self-nesting <gi>w</gi> elements also allow us to
            characterize English <mentioned>in spite of</mentioned>
            both as a single word and as a sequence of words
            (the part-of-speech tags and the analysis of
            <q>in spite of</q> follow the Brown Corpus):
            <eg><![CDATA[
     <w pos="IN">
        <w pos="IN">in</w>
        <w pos="NN">spite</w>
        <w pos="IN">of</w>
     </w>
     ]]></eg>
          </p>
          <p>The examples just shown each show (a) a linguistic word
          made up of multiple orthographic words, with (b) each
          orthographic word made up of one or more morphs.  See also
          <ref target="#pos-irr-1O-0M">One morph written across multiple
          orthographic words</ref> above for examples in which (a)
          holds but not (b).
          </p>
          <p>The working decision here is to distinguish two states
          of affairs, each involving two levels which have some
          claim to be considered <soCalled>words</soCalled>:
          </p>
          <list>
            <item>
              <p>one with whitespace, in which multiple tokens are involved, 
              to be analysed and encoded as shown 
              <ref target="#pos-irr-1L-nO">above</ref>
              (<ptr target="#pos-irr-1L-nO"/>).</p>
            </item>
            <item>
              <p>one with no whitespace, in which a single token is involved, 
              to be analysed and encoded as shown 
              <ref target="#pos-irr-1O-nL">above</ref>
              (<ptr target="#pos-irr-1O-nL"/>).</p>
            </item>
          </list>
          <p>
            For purposes of analysis, each of these cases can be
            understood in any of three ways:
          </p>
          <list>
            <item>
              <p>
                as a single word at the outer level, with a more
                complicated internal structure than usual
                (<gi>mw</gi> is a single word, 
                <gi>ow</gi> is not a word, 
                <gi>w</gi> within <gi>mw</gi> is not a word, 
                <gi>w</gi> containing <gi>ow</gi> is a word, 
                otherwise 
                <gi>w</gi> is a word), or 
              </p>
            </item>
            <item>
              <p>
                as a sequence of words grouped into a larger structure
                and in the latter case written without whitespace
                (<gi>mw</gi> is not a word, 
                <gi>ow</gi> is a word, 
                <gi>w</gi> within <gi>mw</gi> is a word, 
                <gi>w</gi> containing <gi>ow</gi> is not a word, 
                otherwise 
                <gi>w</gi> is a word), or 
              </p>
            </item>
            <item>
              <p>
                as involving words nesting within other words
                (<gi>mw</gi>,
                <gi>ow</gi>, and
                <gi>w</gi> all mark words).
              </p>
            </item>
          </list>
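          <p>The two structures at issue can be summarized in skeleton
            form. This is a hypothetical sketch that omits attributes and
            the morph level; <q>...</q> marks elided text.
            <eg><![CDATA[
  <!-- one token (no whitespace) containing several words -->
  <mw>
    <w>...</w>
    <w>...</w>
  </mw>

  <!-- one word written as several tokens (with whitespace) -->
  <w>
    <ow>on</ow>
    <ow>besh</ow>
  </w>
  ]]></eg>
            Under the first reading above, the word is <gi>mw</gi> in
            the first structure and the outer <gi>w</gi> in the second;
            under the second reading, the words are the inner
            <gi>w</gi> elements and the <gi>ow</gi> elements; under the
            third, every element shown marks a word.</p>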
          <p>
            Note that an encoder may feel, when
            confronting the texts, that some instances of
            <gi>ow</gi> are <soCalled>really</soCalled> words, and
            others not, or similarly for <gi>mw</gi> or for
            <gi>w</gi> appearing as parent of <gi>ow</gi> or child
            of <gi>mw</gi>. The markup defined in this document
            provides no dedicated way to distinguish those cases.
            (The generic TEI mechanisms for analysis and
            interpretation can, of course, be used.)
          </p>
        </div>
      </div>
 
      <div xml:id="story">
        <head>Development (and revision history) of this document</head>
        <p>Some readers may find it helpful to have a record of the steps
          in the development of this document. (And if it is useful to no
          one but the author, at least this information is tucked away in
          an appendix.)</p>
        <p>The document began as a verbatim copy of the ODD document for
          the ATMO transcription schema; the title and introductory
          prose were then changed.</p>
        <p>The requirements for ATMO markup were then reviewed, and a
          list was made of the textual features that need to be
          recordable, together with the TEI elements or attributes for
          those features where TEI provides them; ATMO elements were
          proposed where TEI does not cover a feature. The TEI
          Guidelines were then consulted to identify the module in
          which each of these elements and attributes is defined.
          Sorted by module, the resulting list became a checklist for
          the inclusion of modules, and of elements within modules.</p>
        <p> A separate list was created independently and used as a
          check on the first list. The list in <ptr
            target="#elementlist"/> is the result of merging the two
          lists.</p>
        <p>The ODD fragments in the text were then adjusted to embed the
          correct modules and select the correct elements and
          attributes.</p>
      </div>
 
      <div xml:id="workplan">
        <head>Work plan</head>

        <p>As noted above, this document is currently incomplete. The
          following list sets out the expected work plan for finishing
          it.</p>

        <list>
          <item>
            <p>Make this document generate base grammars for both text-
              and sentence-oriented views of the document. </p>
            <list>
              <item><p> (DONE) Generate some schema (Andrews Test).
                </p></item>
              <item><p> (DONE) Make selection of elements. </p></item>
              <item><p> (DONE) Make second independent selection of
                  elements. </p></item>
              <item><p> Merge two lists of elements, organize by module.
                </p></item>
              <item><p> (DONE) Make two toy test documents that should
                  be valid. Optionally make some that should not.
                  </p><p> A document using attributes for annotation is
                  at <ref target="toy-att.xml">toy-att.xml</ref>, a
                  parallel document using sub-elements is at <ref
                    target="toy-sub.xml">toy-sub.xml</ref>. These
                  examples use sentences from Wittgenstein as their
                  object text. </p></item>
              <item><p> (DONE) Adjust schema specification elements in
                  the ODD to include the correct TEI elements.
                </p></item>
              <item><p> (IN PROGRESS)
                Content model adjustments: <list>
                    <item><p> Add extension elements (atmo:*).
                        Incorporate them into the appropriate classes.
                      </p></item>
                    <item><p> Redefine low-level content-model macros,
                        if possible, to reduce the number of necessary
                        redefinitions. </p></item>
                    <item><p> Modify other TEI content models as needed.
                      </p></item>
                  </list>
                </p>
                <p>First cut at all elements and content models
                has been done.</p>
              </item>
              <item><p>(DONE, for toy-att and toy-phr)
                Revise ODD and schema until toy documents are
                  correctly validated. </p></item>
              <item><p> (DONE) Make one-sentence test case from project
                  data in attribute form. </p><p> A shallow document
                  segmented into words (<gi>w</gi>) with each word
                  containing a single segment (<gi>m</gi>) is in this
                  directory at <ref target="toy-8-baretext.wm.xml"
                    >toy-8-baretext.wm.xml</ref>. </p><p> A companion
                  shallow document with <gi>atmo:li</gi> and
                    <gi>atmo:lat</gi> elements bundled into phrases
                    (<gi>phr</gi>) is in this directory at <ref
                    target="toy-8-baretext.phr.xml"
                    >toy-8-baretext.phr.xml</ref>. </p></item>
              <item><p> Check that the Oxygen interface handles
                  Perso-Arabic script in attribute values correctly. If
                  it does, drop the sub-element style; if not, drop the
                  attribute style. </p></item>
              <item><p> Make hand-converted versions of Prov. 11 in
                  text view and in sentence view. </p></item>
              <item><p> Adjust the ODD to generate both text- and
                  sentence-oriented schemas. These will differ in their
                  treatment of verse, and may differ nowhere else. </p></item>
            </list>
          </item>
          <item>
            <p>Make augmented grammars for text, sentence, and page.</p>
            <list>
              <item><p> (DONE) Decide what to do about bi-dialectal
                  elements (retain? represent redundantly as both
                  matryoshka elements and Trojan-Horse elements?) </p>
                <p> Answer: no redundancy. Elements common to multiple
                  views are represented in conventional matryoshka form
                  whenever any applicable view is dominant. Some
                  elements may be common to two views but not to all
                  three. </p>
              </item>
              <item><p> Classify all elements: <list>
                    <item><p>possible descendants of
                      sourceDoc</p></item>
                    <item><p>actually occurring descendants of
                        sourceDoc</p></item>
                    <item><p>possible descendants of text (in sentence
                        view)</p></item>
                    <item><p>possible descendants of text (in text
                        view)</p></item>
                  </list>
                </p><p>Reclassify: <list>
                    <item><p>page-view only</p></item>
                    <item><p>text-view only</p></item>
                    <item><p>sentence-view only</p></item>
                    <item><p>common (PT, SP, ST, PST)</p></item>
                  </list> Cross-classify with those appearing in the TEI
                  header. </p></item>
            </list>
          </item>
          <item>
            <p>(Related but distinct) Draft necessary transforms and
              specify ancillary processes.</p>
            <list>
              <item><p> Draft atmo-sourcedoc-to-bare-text.xsl, produce
                  output. </p></item>
              <item><p> Draft atmo-sourcedoc-to-bare-text.xsl, produce
                  valid output (with PI transform). </p></item>
              <item><p> Draft atmo-sourcedoc-to-bare-text.xsl, produce
                  correct output (with PI transform). </p></item>
              <item><p> Adjust transform to preserve comments.
                </p></item>
              <item><p> Adjust transform to emit annotation? </p></item>
            </list>
          </item>


          <item><p> Adjust schema to handle correct transform output.
            </p></item>
          <item><p> Sketch extension to schema for sentence annotation.
            </p></item>
        </list>

        <div xml:id="tag-inventory">
          <head>Selection of element types</head>
          <p>The initial form of this document was a verbatim copy of
            ATMO Technical Report 2015-03. As a result, it included the
            wrong TEI modules, and the wrong selection of elements from
            those modules that were correctly included.</p>
          <p>The initial task is to get the correct set of elements into
            the schema.</p>
          <p>The next task is to define ATMO extensions and
            modifications, from the paragraph level down.</p>
        </div>

        <div xml:id="axes-gloss-tier-seg">
          <head>Presence or absence of translation and annotation</head>
          <p>We expect to need more than one text-oriented (or
            sentence-oriented) schema. The axes of variation are expected
            to include: <list>
              <item>
                <p>Whether a sentence-by-sentence English translation is
                  required, optional, or not expected.</p>
              </item>
              <item>
                <p>Whether tier-by-tier annotation of each sentence is
                  required, optional, or not expected.</p>
              </item>
              <item>
                <p>Whether segment-by-segment annotation of each
                  sentence is required, optional, or not expected.</p>
              </item>
            </list>
          </p>
          <p>At the moment, the expected plan is to produce a schema
            making all three items optional, and then to produce
            variants which forbid all three (minimal documents, before
            any translation or annotation), require translations but
            forbid annotation, or require both translations and
            tier-by-tier annotation.</p>
          <p>Other variants may prove desirable.</p>
          <p>Current work plan: <list>
              <item><!-- +g +t +s *-->
                <p>Make a schema which provides for, and requires,
                  translation, tier-by-tier annotation, and
                  segment-by-segment annotation. This is a simple
                  initial goal.</p>
              </item>

              <item><!-- +g +t -s *-->
                <p>Suppress the segment-by-segment annotation to get a
                  schema which requires translation and tier-by-tier
                  annotation and forbids segment-by-segment annotation.
                  This may be the state produced by the initial
                  annotation interface and may be the final state for
                  some project data.</p>
              </item>

              <item><!-- -g -t -s *-->
                <p>Suppress the translation and the tier-by-tier
                  annotation as well, to get a sentence-oriented schema
                  for minimal documents.</p>
              </item>

              <item><!-- ?g ?t ?s *-->
                <p>Make translation and both forms of annotation
                  optional, to allow validation of documents which have
                  been partially but not completely translated and/or
                  annotated.</p>
              </item>
            </list> Versions of the schema with other combinations of
            required, optional, or prohibited appearance of translation,
            tiered annotation, and segmented annotation may also be
            desirable. </p>
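          <p>Each variant can be written compactly as a status for each
            of the three axes, for example <code>+g +t -s</code>, reading
            <code>+</code> as required, <code>?</code> as optional, and
            <code>-</code> as forbidden, and <code>g</code>,
            <code>t</code>, and <code>s</code> as translation,
            tier-by-tier annotation, and segment-by-segment annotation.
            (This reading of the notation is an interpretation, not a
            documented ATMO convention.) The following hypothetical Python
            sketch shows the check each variant would impose on a single
            sentence, given the set of features the sentence actually
            has; in practice the check would be expressed in the
            generated schemas, and the function names here are invented
            for illustration.</p>
          <eg>
```python
REQUIRED, OPTIONAL, FORBIDDEN = "+", "?", "-"
AXES = {"g": "translation", "t": "tier-by-tier annotation",
        "s": "segment-by-segment annotation"}


def parse_variant(spec: str) -> dict:
    """Parse e.g. '+g +t -s' into {'g': '+', 't': '+', 's': '-'}."""
    variant = {}
    for token in spec.split():
        status, axis = token[0], token[1:]
        if status not in (REQUIRED, OPTIONAL, FORBIDDEN) or axis not in AXES:
            raise ValueError(f"bad token: {token!r}")
        variant[axis] = status
    return variant


def problems(variant: dict, present: set) -> list:
    """List the ways a sentence with the given features violates variant."""
    found = []
    for axis, status in variant.items():
        if status == REQUIRED and axis not in present:
            found.append(f"missing {AXES[axis]}")
        elif status == FORBIDDEN and axis in present:
            found.append(f"unexpected {AXES[axis]}")
        # OPTIONAL axes never produce a problem.
    return found


# A sentence that has a translation and tier-by-tier annotation only.
sentence = {"g", "t"}
for spec in ("+g +t +s", "+g +t -s", "-g -t -s", "?g ?t ?s"):
    print(spec, problems(parse_variant(spec), sentence))
```
          </eg>
          <p>Under these assumptions, such a sentence is valid against the
            <code>+g +t -s</code> and <code>?g ?t ?s</code> variants, lacks
            segment-by-segment annotation for <code>+g +t +s</code>, and
            carries two forbidden features for <code>-g -t -s</code>.</p>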
        </div>
        <div xml:id="concur-hack">
          <head>Concurrent markup types</head>

          <p>We wish to achieve lossless round-trip conversion between
            the page- and line-oriented schema used for initial
            transcription (and display with facsimiles) and the text-
            and sentence-oriented schema defined here.</p>
          <p>The current plan is to ensure lossless conversion by
            including markup for both schemas in the document; this is a
            straightforward instance of concurrent markup, of the kind
            supported by the SGML feature <code>CONCUR</code>.</p>
          <p>The main open question is how information from the page-
            and line-oriented transcription schema used in the project
            will be represented in the text- and sentence-oriented
            schema defined here (and vice versa).</p>
          <p>So far, there appear to be three candidate notations. <list>
              <item>
                <p>Standard Trojan-Horse markup.</p>
                <p>In this notation, originally defined in DeRose 2004, a
                    <code>tei:line</code> element in the transcription
                  schema will be represented as a pair of empty
                  co-indexed <code>tei:line</code> elements, for example
                    <code>... &lt;tei:line n="12" sID="s23L2"/> ...
                    &lt;tei:line eID="s23L2"/> ...</code>
                </p>
                <p>This is the preferred format but will require
                  substantial modifications to the schema: elements of
                  the transcription schema must be declared with the
                  additional attributes <code>sID</code> and
                    <code>eID</code>, and they must be introduced into
                  the content models of all potential parent
                  elements.</p>
              </item>
              <item>
                <p>Generic Trojan-Horse markup.</p>
                <p>Instead of using the <soCalled>natural</soCalled>
                  start-tags of standard Trojan Horse markup, this
                  notation uses a single form, defined in a Trojan Horse
                  namespace. The <code>tei:line</code> example given
                  above will take the form: <code>... &lt;th:start
                    gi="tei:line">&lt;th:avs name="n"
                    value="12"/>&lt;/th:start> ... &lt;th:end
                    gi="tei:line"/> ...</code>.</p>
                <p>Coindexing is redundant but may be added as a data
                  integrity safeguard.</p>
                <p>This requires slightly less work on the schema: the
                  elements of the transcription schema need not be
                  declared, because they will not appear as elements in
                  the document. Only the elements of the Trojan Horse
                  namespace need be declared. But they must be added to
                  essentially all content models in the schema.</p>
              </item>
              <item>
                <p>Processing instructions.</p>
                <p>No elements are used to represent the transcription
                  hierarchy; instead, processing instructions are used.
                  In this notation, the <code>tei:line</code> example
                  will take the form: <code>... &lt;?th tei:line n="12"
                    ?> ... &lt;?th /tei:line ?> ...</code>.</p>
                <p>Again, coindexing may be added.</p>
                <p>This requires no work on the schema, but does not
                  allow useful queries against both document types.</p>
              </item>
            </list>
          </p>
          <p>At the moment, the plan is to get something working as
            quickly as possible using processing instructions, and to
            shift to standard Trojan Horse markup if time allows before
            the end of the project.</p>
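          <p>Because the processing-instruction notation is meant as a
            stopgap, it may help to note that converting it to standard
            Trojan-Horse markup is mechanical. The following Python
            sketch is hypothetical. It assumes that the data of each
            <code>th</code> processing instruction (for example
            <code>tei:line n="12"</code> and <code>/tei:line</code>) has
            already been extracted in document order, that attributes use
            only the simple <code>name="value"</code> form, and it invents
            a plain <code>th1</code>, <code>th2</code> numbering for
            <code>sID</code> and <code>eID</code> values in place of
            identifiers such as <code>s23L2</code>. It returns pairs of
            element name and attributes rather than serialized markup.</p>
          <eg>
```python
import re

# name="value" pairs in the PI data (simple attribute syntax only).
ATTR = re.compile(r'([\w:.-]+)="([^"]*)"')


def pis_to_milestones(pi_data):
    """Convert 'th' PI data strings to standard Trojan-Horse milestones.

    Each result is (gi, attributes): a start milestone carries the
    original attributes plus sID; an end milestone carries only eID.
    """
    open_ids = {}  # gi -> stack of sIDs not yet closed
    counter = 0
    result = []
    for data in pi_data:
        data = data.strip()
        if data.startswith("/"):
            gi = data[1:].strip()
            stack = open_ids.get(gi)
            if not stack:
                raise ValueError(f"end without start: {gi}")
            # Close the most recent unclosed start of the same element.
            result.append((gi, {"eID": stack.pop()}))
        else:
            gi, _, rest = data.partition(" ")
            counter += 1
            sid = f"th{counter}"
            attrs = dict(ATTR.findall(rest))
            attrs["sID"] = sid
            open_ids.setdefault(gi, []).append(sid)
            result.append((gi, attrs))
    unclosed = sorted(gi for gi, stack in open_ids.items() if stack)
    if unclosed:
        raise ValueError(f"start without end: {unclosed}")
    return result


print(pis_to_milestones(['tei:line n="12" ', '/tei:line ']))
```
          </eg>
          <p>The sketch keeps a separate stack of open identifiers for
            each element name rather than a single stack. A single stack
            would reject a sequence such as a line start, a zone start, a
            line end, and a zone end, which is exactly the kind of
            overlap that concurrent markup exists to record.</p>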
        </div>
        <div xml:id="todo-doc">
          <head>Improve documentation</head>
          <p>Once the schemas are working reasonably well, the prose
            portions of this document need to be fleshed out and the
            documentation for ATMO use of elements needs to be
            improved.</p>
          <p>The transcription schema also needs to be augmented to
            handle Trojan-Horse markup of elements declared here.</p>
        </div>
      </div>
    </back>
  </text>
</TEI>
