This chapter describes an additional tag set which may be used for
the encoding of proper names and other phrases descriptive of persons,
places, organizations, and also of dates and times, in a manner more
detailed than that possible using the elements already provided for
these purposes in the core tag set.
In the discussion of names in the core tag set, it was noted that the
elements provided there allow the encoder to specify that a given
text segment is a proper noun or a referring string, but can
indicate the kind of object named or referred to only by supplying a
value for the type attribute. The elements provided by the
present tag set allow the encoder both to supply a detailed
sub-structure for such referring strings, and also to distinguish
explicitly between names of persons, places or organizations.
Similarly, the elements provided here allow the encoder to supply a
detailed analysis of the component parts of any expression which
denotes a date or time, which is not possible using the core elements
for dates and times.
It should be noted, however, that no provision is made by the present
tag set for the representation of the abstract structures, or
virtual objects to which names or dates may be
said to refer. In simple terms, where the core tag set allows one to
represent a name, this additional tag set allows one to
represent a personal name, but neither provides for the
direct representation of a person. Appropriate
mechanisms for the encoding of such interpretative gestures may be
found elsewhere in these Guidelines.
To enable the additional tag set described in the present chapter,
a parameter entity TEI.names.dates must be declared in
the document type subset with the value INCLUDE, as
further described in the general discussion of how tag sets are selected. A document using the
prose base tag set and this additional tag set will thus begin as
follows:
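<!DOCTYPE TEI.2 PUBLIC "-//TEI P3//DTD Main Document Type//EN" "tei2.dtd" [
  <!ENTITY % TEI.prose 'INCLUDE'>
  <!ENTITY % TEI.names.dates 'INCLUDE'>
]>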
The chapter begins by discussing additional tags for the
encoding of the component parts of personal names, place names and
organizational names, each in a section of its own. Detailed encoding
of dates and times is described in the final section.
The additional tag set for names and dates, included in the file
teind2.dtd, has the following overall
structure:
When this tag set is enabled, three additional element classes
called personPart, placePart, and
temporalExpr are declared. The parameter entities
corresponding to these classes are declared in the file
teind2.ent, as follows:
Personal Names
The core rs and name elements can distinguish
names in a text but are insufficiently powerful to mark their internal
components or structure. To conduct nominal record linkage or even to
create an alphabetically sorted list of personal names, it is
important to distinguish between a family name, a forename and an
honorary title. Similarly, when confronted with a referring string
such as John, by the grace of God, king of England, lord of
Ireland, duke of Normandy and Aquitaine, and count of Anjou, the
analyst will often wish to distinguish among components giving some
hint as to the status, occupation or residence of the person to whom
the name belongs. The following elements are provided for these and
related purposes:
persName: contains a proper noun or proper-noun phrase referring to a person, possibly including any or all of the person's forename, surname, honorific, added names, etc.
surname: contains a family (inherited) name, as opposed to a given, baptismal, or nick name.
forename: contains a forename, given or baptismal name.
roleName: contains a name component which indicates that the referent has a particular role or position in society, such as an official title or rank.
addName: contains an additional name component, such as a nickname, epithet, or alias, or any other descriptive phrase used within a personal name.
nameLink: contains a connecting phrase or link used within a name but not regarded as part of it, such as van der or of.
genName: contains a name component used to indicate generational information, such as Junior, or a number used in a monarch's name.
As members of the names class, all of these
elements share the following attributes:
key: provides an alternative identifier for the object being named, such as a database record key.
reg: gives a normalized or regularized form of the name used.
Additionally, all of the above elements except for persName
are members of the class personPart, and thus
share the following attributes:
type: provides more culture-, linguistic- or application-specific information used to categorize this name component.
full: indicates whether the name component is given in full, as an abbreviation or simply as an initial.
Legal values are:
  init: the name component is indicated only by one initial.
  yes: the name component is spelled out in full.
  abb: the name component is given in an abbreviated form.
sort: specifies the sort order of the name component in relation to others within the personal name.
The persName element may be used in preference to the
general name element irrespective of whether or not the
components of the personal name are also to be marked. Its
key and reg attributes are used in exactly the
same way as those on the rs and name elements of the core
tag set. The tag persName is synonymous
with the tag name type=person, except that its
type attribute allows for further subcategorization of the
personal name, for example as a married, maiden,
pen, pseudo or religious name. Consequently the
following examples are equivalent:
David Paul Brown has suffered the furniture of
his office to be seized the third time for rent.
That silly man
David Paul Brown has suffered the furniture of
his office to be seized the third time for rent.
That silly man
David Paul Brown has suffered ...
That silly man
David Paul Brown has suffered ...
The persName element is more powerful than the
rs and name elements because distinctive name
components occurring within it can be marked as such.
Many cultures distinguish between a family or inherited
surname and additional personal names, often known as
given names. These should be tagged using the
surname and forename elements respectively and may
occur in any order:
Roosevelt, Franklin Delano
Franklin Delano Roosevelt
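Because surname and forename may occur in either order, software that consumes such markup should rely on element names rather than on position. The following Python sketch assumes the two Roosevelt encodings are written in XML syntax with quoted attributes, as ElementTree requires, rather than in the SGML of the examples. The function name display_form is hypothetical. The sketch derives the same inverted form from both encodings.

```python
import xml.etree.ElementTree as ET

# The two orderings of the example, rewritten in XML syntax (assumption).
INVERTED = ("<persName><surname>Roosevelt</surname>, <forename>Franklin</forename> "
            "<forename>Delano</forename></persName>")
NATURAL = ("<persName><forename>Franklin</forename> <forename>Delano</forename> "
           "<surname>Roosevelt</surname></persName>")


def display_form(pers_name_xml: str) -> str:
    """Return 'Surname(s), Forename(s)' whatever order the text used."""
    root = ET.fromstring(pers_name_xml)
    surnames = [el.text.strip() for el in root.iter("surname") if el.text]
    forenames = [el.text.strip() for el in root.iter("forename") if el.text]
    return f"{' '.join(surnames)}, {' '.join(forenames)}"


print(display_form(INVERTED))  # Roosevelt, Franklin Delano
print(display_form(NATURAL))   # Roosevelt, Franklin Delano
```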
The type attribute may be used with both
forename and surname elements to provide further
culture- or project-specific detail about the name component, for
example:
Franklin Delano Roosevelt
Margaret Hilda Roberts Thatcher
Muhammad Ali
In the following two examples the type attribute of the
surname element is used to indicate so-called
double-barrelled or hyphenated surnames:
Kara Hattersley-Smith
Norman St John Stevas
In most cases, patronymics should be treated as forenames, thus:
Snorri Sturluson
to combine the two traditions in cyclic form.
When a patronymic is used as a surname, however (e.g. by an individual
who otherwise would have no surname, but lives in a culture which
requires surnames), it may be tagged as such:
Finnur Jonsson
acknowledged the artificiality of the
procedure: As Njála now begins,
no original saga ever began.
In the following example, the type attribute is used
to distinguish a patronymic from other forenames:
Sergei Mikhailovic Uspensky
This example also demonstrates the use of the sort
attribute common to all members of the
personPart class; its effect is to state the
sequence in which forename and surname elements should
be combined when constructing a sort key for the name.
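As an illustration of how an application might honour the sort attribute, here is a minimal Python sketch. It assumes an XML rendering of the Uspensky example in which the sort values (1 for the surname, then 2 and 3 for the forenames) and the type value patronym are hypothetical choices. Components that carry no sort attribute are placed after those that do, in document order; that fallback is also an assumption.

```python
import xml.etree.ElementTree as ET

# Hypothetical XML encoding; sort and type values are illustrative only.
USPENSKY = """<persName>
  <forename sort="2">Sergei</forename>
  <forename type="patronym" sort="3">Mikhailovic</forename>
  <surname sort="1">Uspensky</surname>
</persName>"""


def text_of(el: ET.Element) -> str:
    """All text inside an element, with whitespace normalized."""
    return " ".join("".join(el.itertext()).split())


def sort_key(pers_name_xml: str) -> str:
    """Join name components in ascending order of their sort attribute."""
    root = ET.fromstring(pers_name_xml)

    def rank(item):
        index, el = item
        value = el.get("sort")
        # Components with a sort value come first; the rest keep document order.
        return (0, int(value), index) if value is not None else (1, 0, index)

    ordered = [el for _, el in sorted(enumerate(root), key=rank)]
    return " ".join(text_of(el) for el in ordered)


print(sort_key(USPENSKY))  # Uspensky Sergei Mikhailovic
```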
Some names include generational or dynastic information, such as
Junior or senior, or a number: the genName
element may be used to distinguish these from other parts of the name,
as in the following examples:
Marques Junior, Henrique
Charles II
It is also often convenient to distinguish phrases (historically
similar to the generational labels mentioned above) used to link parts
of a name together, such as von, of, de etc. It
is often a matter of arbitrary choice whether such components
are regarded as part of the surname; the nameLink
element is provided as a means of making clear what the correct usage
should be in a given case, as in the following examples:
Mme de la Rochefoucault
Walter de la Mare
Finally, the addName and roleName elements are
used to mark all name components other than those already listed. The
distinction between them is that a roleName encloses an
associated name component such as an aristocratic or official title
which exists in some sense independently of its bearer. The
distinction is not always a clear one. As elsewhere, the
type attribute may be used with either element to supply
culture- or application-specific distinctions. Some typical values
for this attribute for names in the Western European tradition follow:
Here are some further examples of the usage of these elements:
Princess Grace
Grandma Moses
Mrs Robinson
Saint Augustine
President Bill Clinton
Colonel Gaddafi
Frederick the Great
A name may have any combination of the above elements:
Governor Edmund G.
Jerry Moonbeam Brown Jr.
Although highly flexible, these mechanisms for marking
personal name components will not cater for every personal name
and processing need. Where the internal structure of personal
names is highly complex or where name components are
particularly ambiguous, feature structures are recommended as
the most appropriate mechanism to mark and
analyze them, as further discussed in the chapter on feature structures.
The elements discussed in this section are formally defined as
follows:
Place Names
Like other proper nouns or noun phrases used as names, place names
can simply be marked up with the rs element, or with the
name element. For cartographers and historical geographers,
however, the component parts of a place name provide important
information about the relation between the name and some spot in space
and time. They also provide important evidence in historical
linguistics. For such applications and others in which the internal
structure of a place name is to be encoded, the placeName
element and its subcomponents should be used.
placeName: contains an absolute or relative place name.
settlement: contains the name of the smallest component of a place name expressed as a hierarchy of geo-political or administrative units, as in Rochester, New York; Glasgow, Scotland.
region: in an address, contains the state, province, county or region name; in a place name given as a hierarchy of geo-political units, the region is larger or administratively superior to the settlement and smaller or administratively less important than the country.
country: in an address, gives the name of the nation, country, colony, or commonwealth; in a place name given as a hierarchy of geo-political units, the country is larger or administratively superior to the region and smaller than the bloc.
bloc: a geo-political unit containing one or more nation states.
geogName: a name associated with some geographical feature such as Windrush Valley or Mount Sinai.
geog: contains a common noun identifying some geographical feature contained within a geographic name, such as valley, mount etc.
distance: that part of a relative temporal or spatial expression which indicates the distance between the place or time denoted by it and the place or time referred to within it.
offset: that part of a relative temporal or spatial expression which indicates the direction of the offset between the two place names, dates, or times involved in the expression.
As members of the names class, all these
elements share the following attributes:
key: provides an alternative identifier for the object being named, such as a database record key.
reg: gives a normalized or regularized form of the name used.
Additionally, all of the above elements
are members of the class placePart, and
thus share the following attributes:
type: provides more culture-, linguistic- or application-specific information used to categorize this name component.
full: indicates whether the place name component is given in full, as an abbreviation or simply as an initial.
Legal values are:
  init: the name component is indicated only by one initial.
  yes: the name component is spelled out in full.
  abb: the name component is given in an abbreviated form.
Like the persName element discussed above, the placeName element may be regarded simply
as an abbreviation for the tags name type=place or rs
type=place. The following encodings are thus equivalent:
modern Babylon,
New York,
I have proceeded to the
City of Brotherly Love.
After spending some time in our
modern Babylon,
New York,
I have proceeded to the
City of Brotherly Love.
Strictly, a suitable type value such as figurative should be
added to the two place names which are presented periphrastically in
the second example here, in order to preserve the distinction
indicated by the choice of rs rather than name to
encode them in the first version.
As indicated above, the placeName may simply contain a
character string and its type attribute may be used to
provide a sub-categorization of place names. Alternatively, it may
contain more detailed subcomponents. A place name may be analysed in
several different ways: as a geo-political unit, using a hierarchy of
descriptive names (see Geo-political Place Names below); in terms of
geographic features such as mountains and rivers (see Geographic Names below); or relative to other place names (see Relative Place Names below).
Geo-political Place Names
A place name is sometimes given as a sequence of
geo-political or administrative units, often arranged in
ascending sequence according to their size or administrative
importance, for example: Rochester, New York, or as a single
such unit, for example Belgium. The more detailed component
elements listed above (settlement for a settlement, such as a
village, town or city; region for any administrative unit
such as a county, parish or state; country for a politically
recognized national entity; or bloc for any grouping of such
entities) have been chosen for their generality of application. They
may be tailored more closely to project- and
culture-specific needs by specifying appropriate values in their
respective type attributes, as in the following example:
Rochester, New York
Laos, Southeast Asia
Note that, even in the case where only one of these component place
name elements is used, the placeName element must still be
present.
Rochester
than any other place I know.
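Because the geo-political components are defined by relative size, an application can recover the hierarchy from the element names alone, whatever order the text uses. The following minimal Python sketch assumes an XML rendering of the Rochester example. Its type values (city, state) and the RANK table are assumptions made for illustration.

```python
import xml.etree.ElementTree as ET

# Relative size of each geo-political component, smallest first (assumed ranking).
RANK = {"settlement": 0, "region": 1, "country": 2, "bloc": 3}

ROCHESTER = """<placeName>
  <settlement type="city">Rochester</settlement>,
  <region type="state">New York</region>
</placeName>"""


def hierarchy(place_xml: str) -> list[tuple[str, str]]:
    """List (element, text) pairs from the smallest unit to the largest."""
    root = ET.fromstring(place_xml)
    parts = [el for el in root if el.tag in RANK]
    parts.sort(key=lambda el: RANK[el.tag])
    return [(el.tag, " ".join("".join(el.itertext()).split())) for el in parts]


print(hierarchy(ROCHESTER))
# [('settlement', 'Rochester'), ('region', 'New York')]
```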
Geographic Names
Places may also be named in terms of geographic features such as
mountains, lakes or rivers, independently of geo-political units. The
geogName element is provided to mark up such names, as an alternative
to the placeName element discussed above. It contains a
sequence of phrase-level elements, optionally extended by the following
special element:
geog: contains a common noun identifying some geographical feature contained within a geographic name, such as valley, mount etc.
For example:
Mississippi River
Where the geog element is used to characterize the kind of
geographic feature being named, the name element will generally
also be used to mark the associated proper noun or noun phrase:
Mississippi River
A more complex example, showing a variety of practices, follows:
Glencoe into
Glen Etive, the
Lairig Gartain and the
Lairig Eilde
Relative Place Names
All the place name specifications so far discussed are
absolute, in the sense that they define only
one place. A
place may however be specified in terms of its relationship to another
place, for example 10 miles northeast of Paris or near the top
of Mount Sinai. These relative place names will contain
a place name which acts as a referent (e.g. Paris and Mount
Sinai). They will also contain a word or phrase indicating the
position of the place being named in relation to the referent
(e.g. the top of, north of). A distance, possibly only
vaguely specified, between the referent place and the place being
indicated may also be present (e.g. 10 miles, near).
Relative place names may be encoded using the following elements in
combination with either a placeName or a geogName
element.
offset: that part of a relative temporal or spatial expression which indicates the direction of the offset between the two place names, dates, or times involved in the expression.
distance: that part of a relative temporal or spatial expression which indicates the distance between the place or time denoted by it and the place or time referred to within it.
Some examples of relative place names are:
near the top of Mount Sinai
10 miles north of Paris
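To show how the three parts of a relative place name might be separated for processing, here is a minimal Python sketch. It assumes a hypothetical XML encoding in which distance and offset sit beside a nested placeName or geogName that serves as the referent. That nesting is an assumption, not something stated above, and the function name relative_parts is illustrative.

```python
import xml.etree.ElementTree as ET

# Hypothetical encoding: the referent is a placeName nested in the relative one.
NORTH_OF_PARIS = """<placeName>
  <distance>10 miles</distance>
  <offset>north of</offset>
  <placeName><settlement>Paris</settlement></placeName>
</placeName>"""


def text_of(el):
    """Normalized text of an element, or None if the element is absent."""
    return None if el is None else " ".join("".join(el.itertext()).split())


def relative_parts(place_xml: str) -> dict:
    """Split a relative place name into distance, offset and referent."""
    root = ET.fromstring(place_xml)
    referent = root.find("placeName")
    if referent is None:
        referent = root.find("geogName")
    return {
        "distance": text_of(root.find("distance")),
        "offset": text_of(root.find("offset")),
        "referent": text_of(referent),
    }


print(relative_parts(NORTH_OF_PARIS))
# {'distance': '10 miles', 'offset': 'north of', 'referent': 'Paris'}
```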
The internal structure of place names is like that of
personal names: complex and subject to an enormous amount of variation
across time and different cultures. The recommendations in this section
will be adequate for a majority of users and applications. They may
not, however, satisfy the most specialized inquiries or
applications; in such cases it is recommended that the internal
structure of place names be represented using feature structures.
The elements discussed in this section are formally defined as
follows:
Organization Names
Like names of persons or places, organization names can be marked as
referring strings or as proper names with the rs and
name elements. For certain applications it may be desirable
to mark the component parts of an organization's name. In some historical and
social scientific studies, for example, the component parts of an
organization's name may give crucial clues which help to characterize
the organization in terms of its geographical location, ownership,
likely number of employees, management structure etc. The elements
discussed in this section are recommended for this purpose and include:
orgName: contains an organizational name.
  Attributes include:
  type: more fully describes the organization indicated in the organizational name. Possible values include voluntary, political, governmental, industrial, commercial, etc.
  key: provides an alternative identifier for the organization being named, such as a database record key.
  reg: gives a normalized or regularized form of the organization name.
orgTitle: contains the proper name component of an organizational name.
  Attributes include:
  type: more fully describes the organization title. Possible values include formal, colloquial, acronym, etc.
  reg: gives a normalized or regularized form of the organization title.
orgType: indicates a part of the organization name which contains information about the organization's structure or function.
  Attributes include:
  type: more fully describes the organization type specified in the name component. Possible values include function, structure, etc.
  reg: gives a normalized or regularized form of the organization type.
orgDivn: indicates a division, branch or department specified in an organizational name.
  Attributes include:
  type: more fully describes the organization division specified in the name component. Possible values include branch, department, section, division, etc.
  reg: gives a normalized or regularized form of the organizational division.
The orgName element should be used when it is desirable to
mark an organization name irrespective of whether or not its components
are also to be marked. In effect the orgName element is a
special case of a name element and thus of an rs element.
Consequently, the following examples are equivalent, though the last is
preferred:
Pennsyla. Abolition Society.
About a year back, a question of considerable interest was
agitated in the Pennsyla. Abolition
Society.
About a year back, a question of considerable interest was
agitated in the Pennsyla. Abolition Society.
About a year back, a question of considerable interest was
agitated in the Pennsyla. Abolition Society.
Like the rs and name elements, the orgName
element has a key attribute with which an external
identifier such as a database key can be assigned to the organization
name. It also has a type attribute with which the
organization named in the expression can be described, and a
reg attribute with which the organization name can be
presented in a regularized form.
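The key attribute makes it possible to gather every mention of one organization, however it is spelled. The following Python sketch assumes an XML rendering of a hypothetical passage. The key value org-pas, the reg form, and the second mention are all invented for illustration. The sketch groups the surface forms of orgName elements by key.

```python
import xml.etree.ElementTree as ET
from collections import defaultdict

# Hypothetical passage; the key value, reg form and second mention are invented.
PASSAGE = """<p>About a year back, a question of considerable interest was
agitated in the <orgName key="org-pas" reg="Pennsylvania Abolition Society">Pennsyla.
Abolition Society</orgName>. Later the <orgName key="org-pas"
reg="Pennsylvania Abolition Society">Abolition Society</orgName> met again.</p>"""


def mentions_by_key(xml: str) -> dict[str, list[str]]:
    """Group the surface forms of orgName elements by their key attribute."""
    groups = defaultdict(list)
    for el in ET.fromstring(xml).iter("orgName"):
        key = el.get("key")
        if key is not None:
            groups[key].append(" ".join("".join(el.itertext()).split()))
    return dict(groups)


print(mentions_by_key(PASSAGE))
# {'org-pas': ['Pennsyla. Abolition Society', 'Abolition Society']}
```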
The orgTitle element is used to mark the expression
which provides the proper name component of an organization name,
for example:
BSkyB
rather than the
BBC
Where personal names are encountered as component parts of an
organization's title, as in Ernst & Young, these may be
tagged with the appropriate personal name elements as discussed
under Personal Names above. Examples include:
Ernst &
Young
Organization names may also contain within them place names
which, in some applications, may yield vital clues as to the
organization's location and/or sphere of influence. These
components should be tagged with the appropriate place name tags
(see Place Names above). Examples include:
IBM UK
said...
The feeling in
Canada
is one of strong aversion to the
United States Government
, and of predilection for self-government
under the
English Crown
The orgType element is used to mark those components
of an organization name which indicate something about the
structure or function of the organization. Examples include:
Washington Water Power Inc.
THE TICKET which you will receive herewith has been formed by
the
Democratic Whig Party after the most careful deliberation,
with a reference to all the great objects of NATIONAL, STATE,
COUNTY and CITY concern, and with a single eye to the
Welfare and Best Interests of the Community.
Organizational names may also be specified hierarchically,
particularly where the named organization is itself a department
or a branch of a larger organizational entity. The
Department of Modern History, Glasgow University is an
example. The orgDivn element is recommended wherever
it is desirable to isolate the independent levels of an
organizational hierarchy that are specified in an organization name.
Examples include:
Department of Modern History,
Glasgow University
Although highly flexible, the mechanisms discussed here for
marking the components of organization names will not cater for
every processing need or organizational name that is
encountered. Where the internal structure of organization names
is highly complex, where name components are particularly
ambiguous, or where it is important to indicate the assumptions
made in the evaluation of an organization name, then feature
structure notation is recommended.
The elements discussed in this section are formally declared as follows:
Dates and Times
The following elements for the encoding of dates and times were
introduced in the core tag set:
date: contains a date in any format.
  Attributes include:
  calendar: indicates the system or calendar to which the date belongs.
  value: gives the value of the date in some standard form, usually yyyy-mm-dd.
  certainty: indicates the degree of precision to be attributed to the date.
time: contains a phrase defining a time of day in any format.
  Attributes include:
  zone: indicates time zone or place name wherever this is necessary to evaluate a temporal expression.
  value: gives the value of the time in a standard form.
  type: indicates something about the type of temporal expression being tagged.
  Legal values are:
    descriptive: indicates a temporal expression made in descriptive terms, e.g. noon.
    am: indicates a temporal expression made on the basis of a twelve-hour clock and referring to a time between midnight and noon.
    24hour: indicates a temporal expression made on the basis of a twenty-four-hour clock.
    pm: indicates a temporal expression made on the basis of a twelve-hour clock and referring to a time between noon and midnight.
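The am, pm and 24hour values of the type attribute carry exactly the information needed to compute a standard value for a clock time. The following Python sketch assumes that the standard form is HH:MM on a 24-hour clock; the text above says only "a standard form". It also assumes the common convention that 12 am is midnight. Descriptive times such as noon would need a lookup table, so the sketch rejects them. The function name to_24_hour is hypothetical.

```python
def to_24_hour(hour: int, minute: int, clock: str) -> str:
    """Normalize a clock time to HH:MM, given the time element's type value."""
    if clock == "24hour":
        if not 0 <= hour <= 23:
            raise ValueError("hour out of range for a 24-hour clock")
        normalized = hour
    elif clock in ("am", "pm"):
        if not 1 <= hour <= 12:
            raise ValueError("hour out of range for a 12-hour clock")
        # 12 am is midnight (00) and 12 pm is noon (12).
        normalized = hour % 12 + (12 if clock == "pm" else 0)
    else:
        raise ValueError(f"cannot normalize a {clock!r} time without a lookup")
    if not 0 <= minute <= 59:
        raise ValueError("minute out of range")
    return f"{normalized:02d}:{minute:02d}"


print(to_24_hour(1, 45, "pm"))  # 13:45
print(to_24_hour(12, 0, "am"))  # 00:00
```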
While adequate for many applications, these elements do not allow
for the representation of the internal structure of expressions
indicating dates or times, which may however be of importance for the
correct interpretation of such expressions, or for certain kinds of
analytic applications. In this section, we introduce the following
special-purpose elements, for use when the internal structure of a
temporal expression is to be encoded:
dateStruct: contains an internally structured representation of a date.
timeStruct: contains an internally structured representation for a time of day.
Two types of temporal expressions are envisaged for dates and
times: absolute and relative. An absolute temporal
expression is composed of a sequence of the following elements,
possibly interspersed with character data:
day: the day component of a structured date.
week: the week component of a structured date.
month: the month component of a structured date.
year: the year component of a date.
second: the second component of a structured time-expression.
minute: the minute component of a structured time-expression.
hour: the hour component of a temporal expression.
occasion: a temporal expression (either a date or a time) given in terms of a named occasion such as a holiday, a named time of day, or some notable event.
A relative temporal expression describes a date or time
with reference to some other (absolute) temporal expression, and thus
contains the following elements in addition to those listed above:
distance: that part of a relative temporal or spatial expression which indicates the distance between the place or time denoted by it and the place or time referred to within it.
offset: that part of a relative temporal or spatial expression which indicates the direction of the offset between the two place names, dates, or times involved in the expression.
As members of the class temporalExpr
(temporal expression)
these elements all share the following attributes:
value: supplies the value of a date or time in a standard form.
type: provides any application-, linguistic- or culture-specific classification for the component.
reg: gives a normalized or regularized form of the temporal expression.
Absolute Dates and Times
An absolute temporal expression which is a date will contain only a
sequence of day, month, week, year
or occasion elements, as in the following example:
26 October 1775
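Once the components of a date are marked, an application can derive a standard value from them. The following Python sketch computes a yyyy-mm-dd string from a dateStruct, assuming XML syntax, the Gregorian calendar, and months given either as English names or as numbers. The function names are hypothetical. Because components may be repeated (as in a weekday followed by a day number), the sketch takes the first day or year component that begins with digits.

```python
import re
import xml.etree.ElementTree as ET

MONTH_NAMES = ["january", "february", "march", "april", "may", "june", "july",
               "august", "september", "october", "november", "december"]


def first_number(root: ET.Element, tag: str) -> int:
    """Return the leading digits of the first <tag> component that has any."""
    for el in root.iter(tag):
        match = re.match(r"\s*(\d+)", "".join(el.itertext()))
        if match:
            return int(match.group(1))
    raise ValueError(f"no numeric {tag} component")


def month_number(root: ET.Element) -> int:
    """Accept a month given as an English month name or as a number."""
    for el in root.iter("month"):
        word = "".join(el.itertext()).strip().lower()
        if word in MONTH_NAMES:
            return MONTH_NAMES.index(word) + 1
        if word.isdigit():
            return int(word)
    raise ValueError("no recognizable month component")


def iso_value(date_xml: str) -> str:
    """Compute a yyyy-mm-dd value from a Gregorian dateStruct."""
    root = ET.fromstring(date_xml)
    day = first_number(root, "day")
    month = month_number(root)
    year = first_number(root, "year")
    return f"{year:04d}-{month:02d}-{day:02d}"


print(iso_value("<dateStruct><day>26</day> <month>October</month> <year>1775</year></dateStruct>"))
# 1775-10-26
```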
Component elements of a dateStruct may be repeated, provided
that only a single temporal expression is intended:
Friday,
14 May 1993
The occasion element may be used for any component of a
temporal expression which is given in terms of a named event, such as
a public holiday for dates, or a named time such as tea time or
matins:
New Years Day
is the quietest of holidays,
Independence Day
the most turbulent.
These components may be applied to dates in any calendar system,
using subcomponents equivalent to those listed above:
Le Vieux Cordelier:
Journal rédigé par Camille Desmoulins,
Quintidi Pluviose 2e décade,
l'an 2 de la République Indivisible
Absolute temporal expressions denoting times which are given
in terms of seconds, minutes, hours or of well-defined events
(e.g. noon, sunset) may similarly be represented using
the timeStruct element:
13:45
At sunset we walked to the beach.
The train leaves for Boston at
a quarter of two
The type attribute may be used to distinguish sub-types of
component elements (for example, months or days presented as words or
as numbers) or to provide additional information about the function of
this particular component (for example, to distinguish types of
occasion).
The value and reg attributes
are both used to provide a standardized or regularized form of the
content of an element. The distinction is that the value specified by
the reg attribute is simply that chosen as a convenient way
of grouping together a number of variant forms, whereas that specified
for the value attribute must always be given in some
application-dependent standard form, described in the stdVals
element of the TEI header.
For example:
June 9th:
The period is approaching which will terminate my
present copartnership. On the
1st Jany. next, it expires by its own limitation.
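The distinction between reg and value can be made concrete. A reg value is only a grouping label for variant forms, while a value must conform to a declared standard form. The following Python sketch assumes that the header's stdVals declares dates as yyyy-mm-dd. The variant spellings passed to group_by_reg are illustrative, and both function names are hypothetical.

```python
import re
from collections import defaultdict
from datetime import date
from typing import Optional

# Assumption: the stdVals element of the header declares dates as yyyy-mm-dd.
STANDARD_DATE = re.compile(r"\d{4}-\d{2}-\d{2}")


def is_standard_value(value: str) -> bool:
    """True if value is a real calendar date written exactly as yyyy-mm-dd."""
    if STANDARD_DATE.fullmatch(value) is None:
        return False
    try:
        date.fromisoformat(value)  # rejects impossible dates such as 1786-02-30
    except ValueError:
        return False
    return True


def group_by_reg(forms: list[tuple[str, Optional[str]]]) -> dict[str, list[str]]:
    """Group variant surface forms under their reg value, or under themselves."""
    groups = defaultdict(list)
    for text, reg in forms:
        groups[reg if reg is not None else text].append(text)
    return dict(groups)


print(group_by_reg([("Jany.", "January"), ("Jan.", "January"), ("June", None)]))
# {'January': ['Jany.', 'Jan.'], 'June': ['June']}
print(is_standard_value("1786-12-11"), is_standard_value("1st Jany."))
# True False
```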
Relative Dates and Times
As noted above, relative dates and times such as in the Two
Hundredth and First Year of the Republic, twenty minutes before
noon, and, more ambiguously, after the lamented death of the
Doctor or an hour after the game have several distinct
components. As well as the absolute temporal expression or event to
which reference is made (e.g. noon, the game, the
death of the Doctor, [the foundation of] the Republic), they
also contain a description of the distance
between the time or date which is indicated and the referent
expression (e.g. the Two Hundredth and First Year, twenty
minutes, an hour); and (optionally) an
offset describing the direction of the distance
between the time or date indicated and the referent expression
(e.g. of, implying after; before; after).
The elements distance (or measure) and
offset are used to encode these last two components within a
dateStruct or timeStruct. The absolute temporal
expression contained within the relative expression may be encoded
using an occasion element, or by a nested dateStruct
or timeStruct, or by a simple date or
time. This allows for endlessly recursive structures such as
the third Sunday after the first Monday before Lammastide in the
fifth year of the King's second marriage ..., but so does
natural language.
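The recursive structure described above maps naturally onto a recursive evaluator. In the following Python sketch, each relative expression holds a distance, an offset and a referent, and the referent may itself be relative. Several assumptions apply. Distances are reduced to fixed numbers of days, because months and years have no fixed length. The offset vocabulary is limited to the hypothetical values before and after. Occasions such as Christmas must be resolved to calendar dates beforehand. Dates follow Python's proleptic Gregorian calendar.

```python
from dataclasses import dataclass
from datetime import date, timedelta
from typing import Union


@dataclass
class Relative:
    """A relative temporal expression: distance, offset and referent."""
    distance: timedelta
    offset: str  # hypothetical vocabulary: "before" or "after"
    referent: "Expr"  # an absolute date or another Relative


Expr = Union[date, Relative]


def evaluate(expr: Expr) -> date:
    """Resolve nested relative expressions down to a calendar date."""
    if isinstance(expr, date):
        return expr
    base = evaluate(expr.referent)
    if expr.offset == "before":
        return base - expr.distance
    if expr.offset == "after":
        return base + expr.distance
    raise ValueError(f"unknown offset {expr.offset!r}")


# "A fortnight before Christmas 1786", with Christmas resolved to 25 December.
fortnight_before = Relative(timedelta(weeks=2), "before", date(1786, 12, 25))
print(evaluate(fortnight_before))  # 1786-12-11
```

Nesting works the same way. For example, Relative(timedelta(days=3), "after", fortnight_before) evaluates to 1786-12-14.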
In the following examples, the reg attribute has been used
to simplify processing of variant forms of expression:
A fortnight before
Christmas 1786
I reached the station
about a half hour after
the departure of the afternoon train to Boston
In the following example, the exact attribute has been
used to indicate a lack of precision in the distance stated:
just before sundown
In the following example, a nested dateStruct element is
used to show that my birthday and the cited date are parts of
the same temporal expression, and hence to disambiguate the phrase
A week before my birthday on 9th December:
A week before my birthday
on 9th December
The alternative reading of this phrase would be encoded as follows:
A week before my birthday
on 9th December
Where more complex or ambiguous expressions are involved, and
where it is desirable to make more explicit the interpretive
processes required, the feature
structure notation described elsewhere in these Guidelines is
recommended. Consider, for example, the following
temporal expression which occurs in the Scottish Temperance
Review of August 1850, referring to the summer holiday known
in Glasgow simply as the Fair:
during the Fair, a
horrible nucleus of immorality and wickedness; it sends our
multitudes to pollute and demoralize the country.
The ana attribute, defined in the chapter on simple analytic
mechanisms, is used here to link the temporal phrase with an
interpretation of it. Like most traditional fairs and market days, the
Glasgow Fair was established by local custom and could vary from year
to year. Consequently, in order to provide such an interpretation, it
is necessary to draw upon additional information which may or may not
be located in the particular text in question. In this case, it is
necessary at least to know the spatial and temporal context (year and
place) of the fair referred to.
These and other features required for
the analysis of this particular temporal
expression may be combined together as one feature
structure of type date-analysis:
the Fair Glasgow 08-08-1850 19-09-1850
The elements described in this section are formally defined as follows: