Exploration 2000

Web-Based Language Documentation and Description

PANEL 1: Legal, Ethical, and Policy Issues
Concerning the Recording and Publication of Primary Language Materials

December 12, 2000

Background materials submitted by Mark Liberman (corrected and updated 6/3/2001)(original version)

I. Introduction
II. Recording
III. Copyright
IV. Human Subjects Review
V. Discussion


I. Introduction

Language documentation involves recording, analyzing, archiving and publishing a wide variety of materials, including audio and video recordings, transcripts, linguistic and cultural annotations and commentaries, dictionaries, grammars, and instructional works. The language to be documented might be the national language of a large and powerful nation state, or it might be an endangered language spoken by a small and politically marginal community, or anything in between. Because of advances in digital media, in networking, and in computing technology, language documentation is becoming both easier and more useful. In addition, there has been much recent interest in documentation of endangered languages. As more language documentation is done, more legal, regulatory and ethical issues arise.

The purpose of these background materials is to give workshop participants a general sense of the legal and regulatory context of language documentation projects in the United States, especially in academic settings. To this end, I've provided a sketch of laws and regulations governing recordings, copyright, and so-called "human subjects" issues, to the best of my ability to understand them. Disclaimer: I am not a lawyer. Do not rely on this material for legal advice.

Similar information will be needed about laws and regulations in other countries. A few pointers are given to other sources of information, where I have been able to find them. Even for the U.S., these materials should not be taken as authoritative. They are no doubt mistaken in part and certainly incomplete.

For lack of space and time, I've omitted a planned section on defamation. There are a variety of on-line tutorials and other resources on this topic.

These background materials do not attempt to cover the ethical issues involved, except insofar as the laws and regulations involved make reference to general ethical principles. Ethical issues will form an important part of panel presentations and discussions, and will of course be a central concern in any follow-on effort to develop principles and practices for language documentation projects. In many ways, there is an imperfect match between the applicable laws and regulations, and the dictates of a plausible set of ethical principles in this area. There are some unethical practices that are entirely legal, and some apparently ethical and desirable activities may sometimes be subject to legal or regulatory constraint. Therefore, people planning language documentation projects need to understand the laws and regulations, and also to think carefully about the ethical issues, as they apply to the facts of each particular case.

I've added a short section at the end, dealing with recent UNESCO and WIPO proposals to create new sui generis intellectual property rights for folklore and databases.

A premise of this workshop is that language documentation is in principle a good thing for all concerned. It is good for the speech communities whose languages are documented, it is good for the countries in which those languages are spoken, and it is good for humanity as a whole. We hope that this panel discussion will eventually lead to a set of substantive recommendations for the design and management of language documentation projects, which will allow these good activities to be as widely practiced as possible, while avoiding legal, regulatory and ethical pitfalls.


II. Recording

The Reporters Committee for Freedom of the Press (RCFP) has prepared a web document entitled "Can We Tape? A Practical Guide to Taping Phone Calls and In-Person Conversations in the 50 States and D.C."

The legality of voice recording in the U.S. is governed by (Federal and State) laws aimed at preventing eavesdropping.

The key legal point, from the perspective of language documentation projects, is very simple:

Generally, you may record, film, broadcast or amplify any conversation where all the parties to it consent. It is always legal to tape or film a face-to-face interview when your recorder or camera is in plain view. The consent of all parties is presumed in these instances.

In 38 of 50 states, the consent of only one party is required to make it legal to record a conversation. This is also the Federal law, which applies to Washington D.C. My own opinion is that language documentation projects should always obtain the consent of everyone being recorded, for ethical reasons.

Note that the laws with respect to video recording are generally looser. As the RCFP page says:

The use of hidden cameras is only covered by the wiretap and eavesdropping laws if the camera also records an audio track. However, a handful of states have adopted laws specifically banning the use of video and still cameras where the subject has an expectation of privacy, although some of the laws are much more specific. Maryland's law, for example, bans the use of hidden cameras in bathrooms and dressing rooms.

Again, I do not think that surreptitious video recording would be an appropriate practice in language documentation projects, even though it will usually be legal, at least according to U.S. law.

For recordings made outside the U.S., the applicable law (if any) will presumably be the law of the place where the recordings are made. It seems unlikely that there is any place where it is illegal to record an interaction if all parties consent. Anyone who knows of any stronger legal requirements, please let me know.

Note that this section only deals with the question of whether it is legal to make a recording, not the question of who owns what aspects of the result, or whether it is legitimate to distribute or publish the result. These issues are dealt with in the following sections on copyright. Depending on the content of the recordings, publication may also raise legal issues having to do with defamation.

Finally, the question of what constitutes "consent" may need to be evaluated differently in some cases. People who do not really know what a tape recorder is cannot be presumed to consent to being recorded simply because a working recorder is in plain sight. However, a superficial knowledge of recording technology is easy to impart, and this appears to be all that the law requires for consent.


III. Copyright

Much information in this section is taken from Nimmer, M.B. et al., "Cases and Materials on Copyright" (1998).

Other information is available via the web site of the U.S. Copyright Office, various copyright tutorials, copyright FAQs, and pages of copyright links on the web.


In a typical language documentation project, copyrights of very many individuals and institutions may naturally come to be involved, implicitly if not explicitly. Thus in a set of transcribed and annotated interviews, copyrights might in principle be held by the interviewers, the interviewees, the transcribers, the annotators, and those who plan the project and arrange the materials for publication. In different countries and at different times, the details of who has what rights will vary, and in many cases the question of whether a copyright exists or not may be arguable.

In some fundamental sense, the modern law of copyright is supposed to be about money: it aims to provide a financial incentive for creators by granting them exclusive rights to reproduce and distribute their work for a limited time. However, as a practical matter, copyright issues can loom large even when little or no money is at stake. It is rare for the material collected in a language documentation project to have any significant commercial value -- nearly all projects will be a losing proposition from an overall financial perspective, even if some small amount of revenue is eventually derived from publications. Nevertheless, projects are often hindered by concern over copyrights that may be held by a crowd of people (speakers, consultants, research assistants, etc.) who cannot easily be located by the time someone starts thinking about publication, even though those holders would be happy enough to give permission.

Some recommendations about how to deal with copyright in a language documentation project:

As discussed below, copyright law is not a very good conceptual fit to the perspectives of many individuals and cultures, from Thomas Jefferson forward. However, we must deal with the law as it exists, and we may as well use it in the best way we can.

It would be helpful for some trusted organization to provide a recommended set of alternative copyright assignment documents (that is, either actual transfer of copyright, or non-exclusive licences for suitable research and educational use) for use in different language documentation situations.

A note on whose law applies where

The most important issue, as I understand it, is where the copying is being done, not where the work copied was created. If a work is authored entirely in (say) Ecuador, it is still U.S. Copyright Law that governs its publication and distribution within the U.S. The protections offered are pretty broad, in the sense that (for example) any (new) unpublished work is automatically covered, regardless of the nationality or domicile of the author. However, with respect to U.S. publication, it is U.S. law and not Ecuadorian law that applies. The same would apply in reverse to publication in Ecuador.

This is an over-simplified statement of a complex situation. However, it's important to counter the assumption (which I have often encountered) that if some language data is collected in Country X, and published in Country Y, that the copyright laws of Country X are necessarily and centrally involved. As far as I am able to understand it, this is the opposite of the actual situation.

However, the new Proposed Convention on Jurisdiction and Foreign Judgments in Civil and Commercial Matters of The Hague Conference on Private International Law has the potential to make this more complicated: see the section below.

U.S. Copyright Law

Currently relevant U.S. copyright law includes the Copyright Act of 1976; the Berne Convention Implementation Act of 1988, under which the U.S. joined the (international) Berne Convention for the Protection of Literary and Artistic Works, effective in 1989; some amendments passed in 1994 to bring the U.S. into greater conformity with the TRIPS Agreement of the WTO's Uruguay Round; the Sonny Bono Copyright Term Extension Act of 1998, which lengthened most copyright terms by 20 years; and the Digital Millennium Copyright Act of 1998.

There are various proposals by the World Intellectual Property Organization (WIPO) that the U.S. has not yet adopted (and may or may not adopt in the future). These include proposals for new sui generis rights in databases, folklore, and life forms. At least the first two of these are self-evidently relevant to language documentation activities.

In addition, some situations remain where common-law copyright and state copyright laws may apply, though these appear to be increasingly superseded by U.S. copyright law.

Because of this layered history, in the U.S., the term of a copyright and even its existence are different for works created or published at different times. A simplified table is given below. Consult http://www.unc.edu/~unclng/public-d.htm for a more extensive summary of copyright duration issues.

Works created before 1978, but not published before 1978. Copyright expires 70 years after the death of the author, or on 31 Dec. 2002, whichever is later (but not before 31 Dec. 2047 if the work was published by the end of 2002).
Works created before 1978 and published. Copyright expires 95 years from the date of publication, assuming any required renewal was made (works published before 1923 had already entered the public domain).
Works created by an individual on or after 1 Jan. 1978. Protected for the life of the author plus 70 years.
Works for hire. Protected for 95 years from the date of publication or 120 years from the date of creation, whichever expires first.
Joint works. Copyright expires 70 years after the death of the last surviving author.
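To make the arithmetic in the table concrete, here is a minimal Python sketch that computes the last calendar year of U.S. protection for each category. It assumes the terms as amended by the Copyright Term Extension Act of 1998, ignores notice, registration and renewal formalities, and uses a `Work` record and category labels (`individual`, `joint`, and so on) invented for this illustration; it is a rough aid, not a substitute for the more detailed summary cited above.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass
class Work:
    # Hypothetical category labels, invented for this sketch:
    # "individual", "joint", "work_for_hire",
    # "published_pre_1978", "unpublished_pre_1978"
    category: str
    death_years: Tuple[int, ...] = ()   # death year(s) of the author(s)
    created: Optional[int] = None       # year of creation
    published: Optional[int] = None     # year of first publication, if any


def last_protected_year(w: Work) -> int:
    """Last calendar year of U.S. protection (terms run to 31 December)."""
    if w.category == "individual":      # created on or after 1 Jan. 1978
        return w.death_years[0] + 70
    if w.category == "joint":           # measured from the last surviving author
        return max(w.death_years) + 70
    if w.category == "work_for_hire":
        # 95 years from publication or 120 from creation, whichever ends first.
        by_creation = w.created + 120
        if w.published is None:
            return by_creation
        return min(w.published + 95, by_creation)
    if w.category == "published_pre_1978":
        # Assumes formalities were met; pre-1923 works had already
        # expired under the older 75-year term.
        return w.published + (75 if w.published < 1923 else 95)
    if w.category == "unpublished_pre_1978":
        # Never earlier than 2002; never earlier than 2047 if published by 2002.
        floor = 2047 if (w.published is not None and w.published <= 2002) else 2002
        return max(w.death_years[0] + 70, floor)
    raise ValueError(f"unknown category: {w.category!r}")
```

For example, `last_protected_year(Work("individual", death_years=(1950,)))` gives 2020: a narrative transcribed by an individual who died in 1950 would remain protected through the end of 2020 under these assumptions.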

Principled requirements for copyright in the U.S.

Article I, Section 8 of the U.S. Constitution empowers Congress "To promote the progress of science and useful arts, by securing for limited times to authors and inventors the exclusive right to their respective writings and discoveries". The interpretation of two words in this clause has significant consequences for U.S. copyright law: "authors" and "writings."

The 1976 Copyright Act limits copyright to "original works of authorship fixed in any tangible medium of expression." This phrase references the three key aspects of copyright: originality, fixation and expression.

The Act does not define "original work"; the House Report on the Act explains that the omission is purposeful and that the intended standard is that "established by the courts under the [1909] copyright statute." The courts had based the requirement of originality on the constitutional restriction of copyright protection to "authors".

The requirement that copyrighted works be "fixed in [a] tangible medium" is the contemporary interpretation of "writings". This includes recordings of types not known in 1789, that is, any analogue of writing in another medium. However, oral works are clearly excluded by the intent of the Constitution.

Both the requirement of originality and the requirement that a work be "fixed in [a] tangible medium" have significant consequences.

Originality: According to the Supreme Court's 1991 decision in Feist Publications, Inc. v. Rural Telephone Service Co., copying subscribers' names and telephone numbers from Rural's directory for re-publication in a competing telephone directory does not constitute copyright infringement. As the Court explained:

Rather, these bits of information are uncopyrightable facts; they existed before Rural reported them and would have continued to exist if Rural had never published a telephone directory. The originality requirement "rule[s] out protecting ... names, addresses, and telephone numbers of which the plaintiff by no stretch of the imagination could be called the author."

With respect to language documentation projects, it seems likely that "mere facts" such as morphological paradigms would fail the originality test. However, the required amount of originality is very low.

Fixation in a tangible medium: An oral narrative is not subject to copyright until it is recorded or transcribed. Thus "a live television broadcast is not a writing and is therefore not per se eligible for federal copyright protection" (Nimmer p. 36), though the 1976 Copyright Act says that "a work consisting of sounds, images, or both, that are being transmitted, is 'fixed' for purposes of this title if a fixation of the work is being made simultaneously with its transmission."

In addition, copyright only protects the expression of ideas, not the ideas themselves. This is also implied by the notion that copyright protects writings.

Legal codes with other origins can and do differ on these points. Thus "Japan [...] has chosen to accord copyright protection to oral works no less than fixed works" (Nimmer p. 35, fn. 3). And the proposed WIPO database protection treaty is a sort of anti-Feist, according to which the content (as well as the expression!) of compilations of facts would be protected.

It's important to remember that the restricted nature of U.S. federal copyright protection -- "original works of authorship fixed in [a] tangible medium of expression" -- is felt to be based in the limited powers granted to Congress in the U.S. Constitution. This makes the U.S. more reluctant than other countries to grant broader intellectual monopolies, even though some wealthy and influential interests favor them.

Works for hire

U.S. copyright law includes the concept of "works made for hire." In this case, the employer and not the employee is considered to be the author.

Section 101 of the Copyright Act defines a "work made for hire" as:

(1) a work prepared by an employee within the scope of his or her employment; or

(2) a work specially ordered or commissioned for use as a contribution to a collective work, as a part of a motion picture or other audiovisual work as a sound recording, as a translation, as a supplementary work, as a compilation, as an instructional text, as a test, as answer material for a test, or as an atlas, if the parties expressly agree in a written instrument signed by them that the work shall be considered a work made for hire.

Whether under clause (1) or clause (2), many contributions to a typical language documentation project might be considered "works made for hire." If this route is taken, it should be made explicit (i.e. agreed in writing) from the beginning of the project. Note that the concept of "works made for hire" may be different or even non-existent in the copyright law of other countries.

The spectrum of rights

Under current U.S. Copyright law, copyright actually involves a number of distinct rights, which can be sold separately, and also are subject to somewhat different constraints. These are the rights (17 USC §106):

(1) to reproduce the copyrighted work in copies or phonorecords;
(2) to prepare derivative works based upon the copyrighted work;
(3) to distribute copies or phonorecords of the copyrighted work to the public by sale or other transfer of ownership, or by rental, lease, or lending;
(4) in the case of literary, musical, dramatic, and choreographic works, pantomimes, and motion pictures and other audiovisual works, to perform the copyrighted work publicly;
(5) in the case of literary, musical, dramatic, and choreographic works, pantomimes, and pictorial, graphic, or sculptural works, including the individual images of a motion picture or other audiovisual work, to display the copyrighted work publicly; and
(6) in the case of sound recordings, to perform the copyrighted work publicly by means of a digital audio transmission.

In addition, 17 USC §106A grants to "the author of a work of visual art" various rights to "attribution and integrity", which are a species of moral right.

Copyright assignment

A copyright is a form of property, and as such it can be inherited, given away, or sold.

U.S. copyright law specifies (17 U.S.C. § 204) that "[a] transfer of copyright ownership ... is not valid unless an instrument of conveyance, or a note or memorandum of the transfer, is in writing and signed by the owner of the rights conveyed or such owner's duly authorized agent."

A "transfer of copyright ownership" is construed to include an exclusive license, but not a non-exclusive license, which may therefore be informal, verbal or even implicit.

This point is relevant to language documentation projects: if it is understood that a project is preparing a collection of language materials for publication, then (for example) participation in an interview for the project may intrinsically imply a non-exclusive license to include the interview in the published collection, even if there is no "instrument of conveyance". I do not recommend this procedure: it is better to have an explicit copyright assignment, or at least a recorded verbal assent to a non-exclusive license.
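One practical way to act on this recommendation is to keep, for every item in a collection, a record of each contributor and the form of permission on file, so that gaps are found before publication rather than after. The Python sketch below is hypothetical: the `Permission` levels, the `Item` and `Contribution` records, and the rule that implied permission is not enough are assumptions made for illustration, reflecting the recommendation above rather than any legal standard.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import List


class Permission(Enum):
    NONE = "none"                                  # nothing on record
    IMPLIED = "implied"                            # inferred from participation only
    RECORDED_VERBAL = "recorded_verbal"            # verbal assent captured on tape
    WRITTEN_NONEXCLUSIVE = "written_nonexclusive"  # signed non-exclusive license
    WRITTEN_ASSIGNMENT = "written_assignment"      # signed transfer of copyright


# Hypothetical project policy: implied permission alone is not relied upon.
ACCEPTABLE = {
    Permission.RECORDED_VERBAL,
    Permission.WRITTEN_NONEXCLUSIVE,
    Permission.WRITTEN_ASSIGNMENT,
}


@dataclass
class Contribution:
    contributor: str
    role: str              # e.g. "interviewee", "transcriber", "annotator"
    permission: Permission


@dataclass
class Item:
    item_id: str
    contributions: List[Contribution] = field(default_factory=list)

    def missing_permissions(self) -> List[str]:
        """Contributors whose permission falls short of the project policy."""
        return [c.contributor for c in self.contributions
                if c.permission not in ACCEPTABLE]

    def ready_to_publish(self) -> bool:
        return not self.missing_permissions()


item = Item("interview-001", [
    Contribution("speaker A", "interviewee", Permission.RECORDED_VERBAL),
    Contribution("assistant B", "transcriber", Permission.IMPLIED),
])
print(item.missing_permissions())  # ['assistant B']
print(item.ready_to_publish())     # False
```

Keeping the role alongside each contributor reflects the point made earlier that interviewers, interviewees, transcribers and annotators may each hold some claim.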

Fair Use and other limitations on copyright

There are several specific limitations on copyright, of which the most important is the doctrine of "fair use." Neither statutes nor case law exactly define the boundaries of fair use. 17 USC §107 says:

[T]he fair use of a copyrighted work . . . for purposes such as criticism, comment, news reporting, teaching (including multiple copies for classroom use), scholarship, or research, is not an infringement of copyright. In determining whether the use made of a work in any particular case is a fair use the factors to be considered shall include-

(1) the purpose and character of the use, including whether such use is of a commercial nature or is for nonprofit educational purposes;
(2) the nature of the copyrighted work;
(3) the amount and substantiality of the portion used in relation to the copyrighted work as a whole; and
(4) the effect of the use upon the potential market for or value of the copyrighted work.

The fact that a work is unpublished shall not itself bar a finding of fair use if such finding is made upon consideration of all the above factors.  

Taken as a whole, this in fact suggests a wide latitude for fair use in language documentation, since the purpose is typically "teaching ... , scholarship, or research", the use is typically nonprofit, and the "copyrighted work" (e.g. a recorded interview or narrative) is typically not a work for which there is otherwise a commercial market.

Again, I am not suggesting that copyright assignment be omitted in favor of a presumption of fair use; but in cases where existing materials are being considered for publication by a language documentation project, it may be that such publication would constitute fair use.

Moral rights

Article 6(1) of the Berne Convention provides that:

Independently of the author's economic rights, and even after the transfer of the said rights, the author shall have the right to claim authorship of the work and to object to any distortion, mutilation or other modification of, or other derogatory action in relation to the said work, which would be prejudicial to his honor or reputation

These non-economic rights are generally known as "moral rights". Moral rights are recognized explicitly in the copyright laws of several European countries, but not in the U.S., except for certain rights of authors "of works of visual art" (though there may be a sort of common-law equivalent for writers, at least in some cases). In the Berne Convention Implementation Act of 1988, the U.S. Congress simultaneously asserted that "[t]he amendments made by this Act, together with the law as it exists on the date of the enactment of this Act, satisfy the obligations of the United States in adhering to the Berne Convention", and also that "[t]he provisions of the Berne Convention ... do not expand or reduce any right of an author of a work ... to object to any distortion, mutilation, or other modification of, or other derogatory action in relation to, the work, that would prejudice the author's honor or reputation." In other words, there are no general moral rights in the U.S., at least not as a matter of federal statute.

Copyright in other countries

Alan Ward has written an excellent piece on Copyright and Oral History for the (UK-based) Oral History Society. The copyright laws under discussion in this piece are those of the UK, which are not quite the same as American laws.

The Berne Convention and the more recent activities of WIPO have created a certain amount of harmonization among copyright laws around the world, but there remains quite a bit of local variation.

Here are some lectures by Dr. Andreas Wiebe on copyright law in Germany and the European Community. Here is an Australian policy document on copyright.

UNESCO/WIPO proposal for sui generis folklore rights

The World Intellectual Property Organization (WIPO) has proposed a number of new forms of intellectual property, to cover cases that are omitted or given what is felt to be inadequate coverage under existing laws.

Two of these are especially relevant to language documentation. One proposal suggests special protection for databases, and another proposal suggests special protection for expressions of folklore.

The database proposal has been very severely criticized in the U.S., by individuals and organizations from many political and cultural viewpoints. The folklore proposal has been largely ignored, though many of the same objections apply.

Speaking for myself, I am sympathetic to the criticisms of the proposed sui generis database rights, and feel the same way about the proposed folklore rights. It is certainly true that standard copyright does not protect folklore, because it is not an individual "work of authorship", is often not "fixed in a tangible medium of expression", and so on. However, it is quite possible that the proposed cure would be worse than the condition it aims to help.

In evaluating things like the WIPO database and folklore protection proposals, one can see them in two ways: as attempts to protect people's work and people's rights -- a sort of human rights initiative -- or, alternatively, as attempts to convert common ground into commercially exploitable property -- a sort of modern version of the enclosure movement. To the extent that the second view is correct -- and there will be many capable lawyers and deal-makers working hard to use any new laws in that way -- the results may be the opposite of what some supporters of these initiatives have in mind.

It is worth reading the proposals carefully, and thinking about what consequences they might have in actual practice.

Could Disney or Sony buy the exclusive rights to a body of folklore, in perpetuity? Yes, if sui generis folklore protection is a form of property, then it can be bought and sold; and in any case, licensing is to be at the sole discretion of "the competent authorities", who are free to negotiate exclusive arrangements. Could dissident works be suppressed or destroyed on the grounds that they are an "illicit exploitation" because they are "outside the traditional or customary context of folklore" and "without authorization by a competent authority"? Absolutely. Note that the WIPO model provisions specify that "an utilization, even by members of the community where the expression has been developed and maintained, requires authorization if it is made outside such a context and with gainful intent." (say, at a political fund-raiser...) In fact, a community member would be subject to "penal sanctions" if the relevant governmental minister determines that his or her "expressions of folklore" are "distorted in any direct or indirect manner prejudicial to the cultural interests of the community concerned." In other words, the minister of culture could put someone in jail for composing an irreverent folksong.

Reading the WIPO model provisions, my personal reaction is to see helpful-sounding principles with a staggering potential for tyranny in practice.

There are also some difficult conceptual issues. The fact that ethnic groups do not exactly coincide with national boundaries will make it hard to figure out which government would get to authorize activities and collect the tariffs for which body of folklore. For instance, would a Chicago polka band need to get clearance from and pay royalties to the Polish government? And there are also questions about how far back in history the ownership of such cultural property should go. According to this article, three Maori tribes are threatening suit against Lego for producing a game that includes characters with Polynesian names and story lines allegedly similar to traditional stories from Easter Island. Since the Easter Island culture is related to that of the New Zealand Maori roughly as Polish culture is to Russian, this case is roughly comparable to one in which a Russian nationalist organization sued the estate of Lawrence Welk over polka royalties. To sort all this out -- if it really is to be sorted out -- will involve a massive transfer of resources to the world's lawyers.

See Report on Australian Indigenous Cultural and Intellectual Property Rights for a more sympathetic perspective on the use of the law of property in this area.

The Hague Conference on Private International Law's Proposed Convention

It's worth noting in this connection that The Hague Conference on Private International Law's proposed Convention on Jurisdiction and Foreign Judgments in Civil and Commercial Matters would interact with local sui generis intellectual property rights in potentially pernicious ways. The cited Convention (see this link for more details) provides a set of rules about jurisdiction for cross-border litigation, covering nearly all civil and commercial litigation. Within this framework, each member country agrees to enforce the judgments and injunctive orders of courts in other member countries, without any requirements to harmonize the laws involved.

The Hague Conference has 49 member countries, including the U.S., Canada, France, Germany, China, Croatia, and Egypt. The fact that the proposed Convention covers sui generis intellectual property regimes creates many opportunities for legal mischief. As James Love writes in a recent article on the topic:

For example, if Cuba enacted a sui generis regime and declared that the Cuban "beat" was intellectual property, it could get a judgment in Cuba against US record companies that were engaged in cultural "piracy," and demand for example, 5 percent of the revenues from global sales of music that use the Cuban beat. Other countries could do the same thing. These judgments would be enforceable globally, under the Convention. So too would bio-piracy judgments against US and European biotechnology and pharmaceutical companies, for "stealing" traditional knowledge, or exploiting without benefit sharing a variety of biological and genetic resources. The motion picture industry could be hit with new sui generis IPR liabilities by countries that give rights in history. Countries like China, which is a member of the Hague Conference, could use this to limit who could actually make films about China. The Hague convention would instantly create a legal framework to legitimatize all of these new IPR claims, and it would not even matter if the "infringing" party did business in the country at all, since the judgments would be enforceable globally, in any Hague member country, and the claims could be based upon shares to global (rather than local) revenues of products.

Love points out that the direction of legal action will not by any means only be from the less developed world against the U.S., Europe and Japan. In fact, developed countries (and the multinational companies based there) have more money and lawyers to devote to the process, and also better access to the courts where the outcomes will be decided, so that their sui generis extensions of intellectual property are likely to turn out to be more valuable:

Some would consider this [international enforcement of sui generis IPR] a positive feature of the Convention, because it would give the developing countries opportunities to "tax" the rich countries, under new and controversial IPR regimes. But of course, the rich countries could and will also enforce their own regimes, including, for example, the European Union sui generis regime on database protection. The US and EU would probably modify their sui generis regimes on pharmaceutical registration data to make it illegal for developing countries to rely upon those data for registration of generic products in poor countries, an approach already included in the new US-Jordan "Free Trade" agreement. And in general, would one would observe is a new dynamic of everyone trying to create their own "rights" in everything, until the public domain shrinks if not disappears altogether.

The ultimate outcome of all of this is uncertain, and depends on larger and more important issues than the IPR status of language documentation materials. The uncertainties should not prevent us from going forward with language documentation projects. It seems unlikely that sui generis property rights will be successfully attached to words, inflections, syntactic structures, or the forms of everyday discourse. Whatever the outcome, linguists' best protection against such problems is to be solidly based in the speech communities in question, which is a good idea in any event.

Human Subjects Review for Language Documentation


In the U.S., much research involving human subjects must be approved in advance by an Institutional Review Board, as described below. Some but not all language-documentation research is covered by this requirement. No information is provided here about applicable regulations elsewhere in the world: it would be useful to know what they are. The basic principles are no doubt the same, but the details may vary, perhaps widely.

From the more detailed information presented below, I draw the following conclusions:

45 CFR 46

For U.S. Universities, the basic Federal requirements for regulation of research involving human subjects are defined in 45 CFR 46 (Code of Federal Regulations, Title 45, Part 46: Protection of Human Subjects), effective August 19, 1991. Keep in mind that an institution may choose to impose additional requirements. However, many institutions follow the Federal guidelines quite closely.

According to §46.101, 45 CFR 46 applies to "all research involving human subjects" that is "conducted or supported by a Federal Department or Agency", or that falls under certain "research activities for which a Federal Department or Agency has specific responsibility for regulating" (such as the FDA's responsibility for investigating new drugs), with some specified exemptions discussed below.

Research covered by 45 CFR 46 must be reviewed and approved by an Institutional Review Board (IRB) at the responsible institution.

Some things that 45 CFR 46 does not cover

Many types of language-documentation research are exempt from regulation under 45 CFR 46. (Note, however, that your IRB may require you to submit a request for a project to be declared exempt.)

First, most types of privately-funded language-related research are exempt from IRB review, given that they are not "supported by a Federal Department or Agency", and do not fall under the specific regulatory purview of the FDA, the FAA, etc. Note that 45 CFR 46 requires each institution to provide "a statement of principles governing the institution in the discharge of its responsibilities for protecting the rights and welfare of human subjects of research conducted at or sponsored by the institution, regardless of whether the research is subject to Federal regulation". However, IRB review is not required (by Federal regulations) for research that is not Federally supported.

In addition, 45 CFR §46.101(b) specifically exempts six types of human-subjects research from human-subjects regulation, even if the research is Federally funded. Several of these exemptions may be relevant to language documentation projects:

(1) Research conducted in established or commonly accepted educational settings, involving normal educational practices, such as (i) research on regular and special education instructional strategies, or (ii) research on the effectiveness of or the comparison among instructional techniques, curricula, or classroom management methods.

(2) Research involving the use of educational tests (cognitive, diagnostic, aptitude, achievement), survey procedures, interview procedures or observation of public behavior, unless: (i) information obtained is recorded in such a manner that human subjects can be identified, directly or through identifiers linked to the subjects; and (ii) any disclosure of the human subjects' responses outside the research could reasonably place the subjects at risk of criminal or civil liability or be damaging to the subjects' financial standing, employability, or reputation.

(3) Research involving the use of educational tests (cognitive, diagnostic, aptitude, achievement), survey procedures, interview procedures, or observation of public behavior that is not exempt under paragraph (b)(2) of this section, if: (i) the human subjects are elected or appointed public officials or candidates for public office; or (ii) Federal statute(s) require(s) without exception that the confidentiality of the personally identifiable information will be maintained throughout the research and thereafter.

(4) Research involving the collection or study of existing data, documents, records, pathological specimens, or diagnostic specimens, if these sources are publicly available or if the information is recorded by the investigator in such a manner that subjects cannot be identified, directly or through identifiers linked to the subjects.

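Exemption (2) is especially relevant, since it exempts interview procedures unless two conditions both hold: the subjects can be identified from the recorded information, and "disclosure of the human subjects' responses outside the research could reasonably place the subjects at risk of criminal or civil liability or be damaging to the subjects' financial standing, employability, or reputation." Because audio and video recordings usually make speakers identifiable, the second condition is typically the one that matters for language documentation.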

Note again that individual institutions may choose to impose broader review requirements. For example, the University of Pennsylvania requires IRB review of all research projects involving human subjects, regardless of the source of funding of the study. Motivations for broader requirements include concern for legal liability.

Expedited review procedures

45 CFR §46.110 calls for "expedited review procedures for certain kinds of research involving no more than minimal risk". This will typically involve one or two designated reviewers from the IRB, rather than the action of the entire board. A list of specific types of research eligible for expedited review was published in 63 FR 60364-60367, November 9, 1998. Of special interest to language documentation researchers will be the categories:

(6) Collection of data from voice, video, digital, or image recordings made for research purposes.

(7) Research on individual or group characteristics or behavior (including, but not limited to, research on perception, cognition, motivation, identity, language, communication, cultural beliefs or practices, and social behavior) or research employing survey, interview, oral history, focus group, program evaluation, human factors evaluation, or quality assurance methodologies. (NOTE: Some research in this category may be exempt from the HHS regulations for the protection of human subjects. 45 CFR 46.101(b)(2) and (b)(3). This listing refers only to research that is not exempt.)

Thus much language-documentation research that is not exempt from IRB review will be eligible for expedited review.

Criteria for evaluation

45 CFR §46.111 establishes seven criteria for IRB approval: risks are minimized; risks are reasonable in relation to benefits; subject selection is equitable; informed consent is sought where required; informed consent is documented; where appropriate, the research plan provides for monitoring the data collected to ensure the safety of subjects; and privacy and confidentiality are maintained where appropriate.

All of these are potentially relevant to language-documentation research. The most consistently relevant issues are likely to be informed consent and privacy/confidentiality.

45 CFR §46.116 defines the general requirements for "legally effective informed consent." These include describing the situation in understandable language, ensuring that the subject has "sufficient opportunity" to consider whether or not to participate, and "minimizing the possibility of coercion or undue influence". The details can be found in the cited section of the regulations. For typical language-documentation research, the key point in obtaining informed consent will be to ensure that each subject understands that audio and/or video recordings will be made available to a wider audience.

Note also that studies are required to include "special safeguards" in case "some or all of the subjects are likely to be vulnerable to coercion or undue influence, such as children, prisoners, pregnant women, mentally disabled persons, or economically or educationally disadvantaged persons".

45 CFR §46.117 requires informed consent to be documented by a signed form in most cases. Given that interactions are being recorded in any case, it makes sense to record the entire "informed consent" discussion. The IRB may accept this recording as adequate documentation. In order to ensure that the discussion does constitute informed consent, it should be "scripted", in the sense that the researcher presents specified explanations, and asks specified questions.

Publication of pre-IRB data

What about recordings in library archives and private collections, where IRB permission was not sought before the recordings were made, and where signed informed consent forms or the equivalent are not available, and probably cannot be obtained? Suppose we want to publish these recordings: assuming that other legal concerns do not stand in our way, does 45 CFR 46 prevent us from doing so?

According to my understanding, in general it does not. However, I don't have a great deal of experience in this area.

First, if neither the data collection nor the proposed publication is federally supported, then 45 CFR 46 simply does not apply. University regulations may still require IRB clearance, if the publication is construed as a project involving human subjects.

Second, the data and its publication may well fall under one of the exemptions specified in 45 CFR §46.101(b). For instance, the data may be the results of "interview procedures" where it is not the case that "disclosure of the human subjects' responses outside the research could reasonably place the subjects at risk of criminal or civil liability or be damaging to the subjects' financial standing, employability, or reputation." Alternatively, materials already accessible to the public via a library collection may fall under the heading of "the collection or study of existing data [...] if these sources are publicly available or if the information is recorded by the investigator in such a manner that subjects cannot be identified."

Finally, there is adequate basis in the regulations for an IRB to agree to such publication in many cases, especially where the cited exemptions arguably apply, where there is no reasonable likelihood of harm or violation of reasonable expectations of privacy, and so on.

The above discussion applies to archives collected at a time when IRB review was not required, for instance because such regulations were not yet in force. Materials collected without IRB clearance at a time when such clearance was required may present more serious problems.

Recommendation to language documentation researchers

American researchers who are not already familiar with their institution's IRB policies and procedures should take time to learn about them, by reading whatever published or on-line material their IRB may provide, and by talking with people associated with the IRB review process. They should approach their IRB for review or for specific exemption before starting any research project involving human subjects.

Most institutions have a specified procedure for registering with the IRB any project claimed to be exempt from review, and may require the IRB to confirm or deny the exemption. Some institutions have strict rules for human-subjects research begun without IRB approval, even if the research turns out to be exempt from review. These rules may, for example, require destruction of data already collected.

In the end, the constraints imposed on language documentation research by 45 CFR 46 are reasonable and indeed rather minimal. As long as their procedures and their concerns are respected, IRBs should be expected to make a rational and balanced evaluation of proposed projects in terms of the content of the regulations and the principles on which they are based.

My own opinion is that researchers need to operate on two levels. First, we need to have a clear picture of the ethics of our work, on its own terms. Second, we need to understand the regulations and the law, in both theory and practice, well enough to avoid creating unnecessary legal or regulatory problems. We can't count on laws and regulations to stand in place of ethics, any more than they do in other aspects of life.

Problems due to differences in academic culture

In a few places, there is apparently a history of troublesome cases, in which an IRB has developed local policies that may make sense for biomedical research, but are not compatible with ethical research practices in (for example) history. From available descriptions of these problems, it seems that the impact on language documentation would be similarly negative. It is hard to determine how widespread this problem really is: we have not encountered it at Penn, and we have published data created at quite a few other American institutions where the problem did not arise either. In fact, I do not know of any cases where this sort of problem has arisen for a language documentation project.

Policies for human subjects research, and the IRB processes for enforcing these policies, were developed in the first place for biomedical research and behavioral research, and this origin shows in the wording of the regulations, the design of the forms, and no doubt in the attitudes of the review boards. The trend has been for more and more academic research activities to become subject to IRB procedures, and as this happens, there is obviously potential for problems due to cultural clashes. Researchers should be aware of the potential for such problems, and take steps from the beginning to avoid them if possible.

A clear and informative explanation of the situation with respect to oral history appears in testimony given by Linda Shopes in 1998 to the President's National Bioethics Advisory Commission. According to Dr. Shopes:

Historians report that they have been told by IRBs to submit detailed questionnaires prior to conducting any interviews; to maintain narrator anonymity both on tape and in their published work; and to either destroy their tapes or retain them in their private possession after their research project is completed. Each of these requests misconstrues oral history and violates fundamental standards of historical practice. An interview is an open ended inquiry, generally structured around a set of biographical and broadly historical questions; it does not follow a rigid schedule of questions but is shaped by the interview exchange. While anonymity is an option in oral history, and indeed appropriate in some cases, anonymous sources lack credibility in most historical scholarship - the precise identity of an interviewee often matters, as a way of gauging that person's relationship to the topic under discussion and hence assessing the perspective from which he or she speaks. In fact, most narrators agree to retain their identity in archival collections and published scholarship. And although narrators can choose to restrict all or a portion of their interviews for a period of time, hoarding or destroying tapes contradicts a primary canon of historical research - that sources not only be cited, but also be available and accessible as a way of assessing the validity and integrity of the work that draws upon them. And most incredible to me, some historians report that IRBs have questioned their use of sources in the public record, including newspapers and manuscripts collections, as well as properly archived oral history interviews, simply because they deal with the activities of human beings!

Some of these cases, if true, represent clear misunderstanding by an IRB of 45 CFR 46: for instance, the use of sources in the public record is clearly considered an exempt activity under 45 CFR §46.101(b)(4). Others, such as the insistence on the use of pre-specified questionnaires, or the sequestration of tapes, are not required by the Federal regulations, nor by the statements of principle (such as the Belmont Report) on which they are based, and are apparently local inventions. Though perhaps they are appropriate normative practices for biomedical research, they are entirely inappropriate for oral history projects, as they would be for most language documentation projects.

Shopes goes on to observe that "[i]n many, perhaps most cases, historians have been able to clarify the issues and negotiate protocols for informed consent and for interviewing that satisfy their IRBs."

As a result of the efforts of Shopes and others, many IRBs have worked out special policies for dealing with oral histories (see section III of this informational page from the University of Chicago, for example).

It seems likely that language documentation projects will sometimes experience similar intercultural misunderstandings with local IRBs. Given this possibility, it is crucial for us to become well informed about the issues and about local practices, and to make personal contact with the people involved. We should also work to develop community standards for ethics in language documentation research, and to educate IRBs about these standards. Oral historians have been helped by the fact that the Oral History Association (OHA) has been developing community standards since 1968 for the "responsibilities interviewers have to narrators, to the public and the profession, and to sponsoring institutions."

Statements of Principle

The Belmont Report

In their mandated "statement of principles" for human subjects research, many institutions refer to the Belmont Report. In some cases, such a reference may constitute nearly the entire content of the statement. According to the report's summary paragraph:

On July 12, 1974, the National Research Act (Pub. L. 93-348) was signed into law, there-by creating the National Commission for the Protection of Human Subjects of Biomedical and Behavioral Research. One of the charges to the Commission was to identify the basic ethical principles that should underlie the conduct of biomedical and behavioral research involving human subjects and to develop guidelines which should be followed to assure that such research is conducted in accordance with those principles. In carrying out the above, the Commission was directed to consider: (i) the boundaries between biomedical and behavioral research and the accepted and routine practice of medicine, (ii) the role of assessment of risk-benefit criteria in the determination of the appropriateness of research involving human subjects, (iii) appropriate guidelines for the selection of human subjects for participation in such research and (iv) the nature and definition of informed consent in various research settings.

The Belmont Report attempts to summarize the basic ethical principles identified by the Commission in the course of its deliberations. It is the outgrowth of an intensive four-day period of discussions that were held in February 1976 at the Smithsonian Institution's Belmont Conference Center supplemented by the monthly deliberations of the Commission that were held over a period of nearly four years. It is a statement of basic ethical principles and guidelines that should assist in resolving the ethical problems that surround the conduct of research with human subjects.

The resulting report, entitled Ethical Principles and Guidelines for the Protection of Human Subjects of Research, was published in April of 1979, and remains a crucial point of reference for Institutional Review Boards at American Universities. It presents three basic ethical principles: "respect for persons," "beneficence," and "justice."

"Respect for persons" is taken to mean that "individuals should be treated as autonomous agents", and that "persons with diminished autonomy are entitled to protection." "Beneficence" requires us to "maximize possible benefits and minimize possible harms." "Justice" requires that subjects be selected fairly.

Application of these principles leads to conclusions about informed consent, about assessment of risks and benefits, and about selection of subjects.

Other statements of principle

The American Historical Association's Statement on Standards of Professional Conduct has a useful section entitled Statement on Interviewing for Historical Documentation. The Oral History Society (UK) provides a set of Ethical Guidelines "for interviewers and custodians of oral history recordings and related material." These both give an admirably clear and specific list of recommendations, related to the OHA guidelines cited earlier.

Some other professional associations have codes of ethics or conduct that are relevant to language documentation, but do not establish any specific guidelines. Rather, they include many principles of great generality, often intrinsically in conflict with one another, and so can be cited in support of a wide range of standards of practice. The codes themselves often cite "the need to make choices among apparently incompatible values". Thus while helpful in laying out the issues, they do not provide much constraint or guidance, nor are they likely to be helpful in educating IRBs about community standards.

Examples of this type are the American Anthropological Association's Code of Ethics and the Society for Applied Anthropology's Statement of Professional and Ethical Responsibilities.


This section includes some strictly personal opinions about "human subjects" and "intellectual property" issues in language documentation projects.

Some "human subjects" issues for language documentation

Language documentation projects are different in many ways from drug tests or psychological experiments, and these differences mean that the same underlying ethical principles -- "respect for persons," "beneficence," "justice" -- may have quite different consequences. What follows is an incomplete exploration of some of these differences. In particular, I want to raise two of them: the question of who is a "subject", and the issue of freedom of expression.

The terminology of "subject"

Although it is sometimes reasonable to treat those who participate in language documentation projects as "experimental subjects", in many cases it is more appropriate to consider them as partners in a common enterprise. In some cases, the initiators, managers and agents of the project may be members of the speech community in question, and sometimes the project funding is provided by institutions representing that community or at least parts of it. As we work towards a practical set of ethical standards for language documentation, it's worth keeping this in mind, and thinking of ways to educate IRBs about its meaning and its consequences.

There is a certain flavor of paternalism that sometimes creeps into these discussions, among linguists no less than among biomedical researchers. The speech community whose language is being documented may be small and politically marginal, but it may also be large, wealthy and even politically dominant. In either case, the rights of the individuals involved need to be protected. In either case, the attitudes and sensitivities of (various portions of) the overall group also need to be considered. However, not every possible objection should be accepted at face value and without question. For example, in any diglossic situation, some members of the speech community will object to the documentation of vernacular language. It would be preposterous to accept such an objection in the case of American English. To accept it without argument in another case, on the grounds that it is the "will of the (elders of the) community", arguably does not treat the people involved as fully human.

Freedom of expression

This point has been raised by oral historians, among others. People -- including both academics and oral history narrators -- have a right to express themselves. "Respect for persons" requires that this right be respected. If a narrator genuinely wants to put his or her opinions on record in his or her own name, an IRB should be wary of intervening to prevent this. This point is not addressed directly in 45 CFR 46 or similar regulations, but freedom of expression is guaranteed by more fundamental principles of American law. Some IRB guidelines (for example at the Social and Behavior Science IRB at U. Chicago) make this point explicitly: "the IRB is eager to avoid interfering inappropriately with the researcher's or interviewee's freedom of expression."

In some language documentation projects, similar issues arise.

Intellectual property in the digital age

Copyright is a form of "intellectual property." The application of the metaphor of property to words, ideas, images and so on is fraught with conceptual difficulties, in our own intellectual tradition no less than in the culture of Australian Aborigines or African villagers. Thomas Jefferson, for example, was famously skeptical of the notion that ideas could in principle be considered to be property.

In addition, the forms and principles of this legal metaphor have changed over time, as the underlying technologies and the relevant commercial interests have changed. In an increasingly digital world, the result is a situation of considerable complexity, lampooned by the legal counsel of the Free Software Foundation, Eben Moglen, in his essay "Anarchism Triumphant: Free Software and the Death of Copyright":

[O]ur world consists increasingly of nothing but large numbers (also known as bitstreams), and [...] - for reasons having nothing to do with emergent properties of the numbers themselves - the legal system is presently committed to treating similar numbers radically differently. No one can tell, simply by looking at a number that is 100 million digits long, whether that number is subject to patent, copyright, or trade secret protection, or indeed whether it is "owned" by anyone at all. So the legal system we have [...] is compelled to treat indistinguishable things in unlike ways.

Now, in my role as a legal historian concerned with the secular (that is, very long term) development of legal thought, I claim that legal regimes based on sharp but unpredictable distinctions among similar objects are radically unstable. They fall apart over time because every instance of the rules' application is an invitation to at least one side to claim that instead of fitting in ideal category A the particular object in dispute should be deemed to fit instead in category B, where the rules will be more favorable to the party making the claim. This game - about whether a typewriter should be deemed a musical instrument for purposes of railway rate regulation, or whether a steam shovel is a motor vehicle - is the frequent stuff of legal ingenuity. But when the conventionally-approved legal categories require judges to distinguish among the identical, the game is infinitely lengthy, infinitely costly, and almost infinitely offensive to the unbiased bystander.

Although digital technology makes intellectual property more difficult to defend, in theory as well as in practice, it also makes it more valuable. Therefore, the trend has been for the metaphor of intellectual property to be extended more and more widely. For a useful (if partisan) expression of concern, see Lawrence Lessig's essay Reclaiming a Commons. Lessig worries about the very real possibility that extensions to intellectual property law, coupled with networked monitoring and enforcement, will bring

[t]he power through property to produce a closed society — where to use an idea, to criticize a part of culture, to quote “Donald Duck,” one will need the permission of someone else. Hat in hand, deferential, begging, a society where we will have to ask to use; ask to criticize; ask to deploy; ask to read; ask to browse; ask to do all those things that in a free society . . . one takes for granted.

If there is any cultural artifact that is -- and ought to stay -- part of our intellectual commons, it is surely language. All of us would scornfully dismiss the suggestion that access to information about English usage should be monitored and controlled, so as to limit it to those who are politically or economically entitled to have it. The goal of a language documentation project, in the sense intended by this workshop, is to make a language more fully and richly accessible not only to its speech community and to their descendants, but also to the communities in contact with them, and to the rest of humanity.

There apparently are some speech communities whose members uniformly believe that their language should be treated as occult knowledge that outsiders should be prevented from learning. Or at least they believe that access should be strictly monitored and controlled. If so, then obviously these languages are not candidates for documentation in the sense that we are talking about. These communities may wish to do their own documentation for their own internal purposes, but that is up to them.

The digital fruits of a language documentation project are obviously subject to copyright. There are also some opportunities in principle for language documentation to form the basis of money-making enterprises, and therefore there are opportunities for the people who contribute to the documentation to be exploited or taken advantage of. However, we should not allow this (frankly rather dim and nebulous) possibility to prevent language documentation from being done, or to prevent the results from being generally accessible and useful. Instead, we should use the tools of intellectual property law to try to ensure that contributors are treated fairly, while at the same time allowing the results to be as widely useful as possible.
