Before we start…
An alternative version of this document was initially drafted by Timothée Poisot with input from members of the Viral Emergence Research Initiative. For this reason, it is excluded from the CC BY-NC-SA license under which the rest of the website is published, and may not be reproduced without permission.
An important piece of reading to understand why this document is necessary is “Against the Uncritical Adoption of ‘AI’ Technologies in Academia” by Olivia Guest and colleagues. Another is Joseph Fasano’s “For a Student Who Used AI to Write a Paper”, which asks a very important question: what are we trying to be free of?
What is the purpose of this document?
This document serves two distinct purposes. First, it is a position statement that describes, normatively, our position on the use of generative AI. Second, it establishes a series of rules and practical recommendations.
This document is an integral part of the lab’s code of conduct and values statement. It is intended to be enforced. In addition, the purpose of the document is
- to educate lab members about the pitfalls of GenAI,
- to identify possible solutions that the lab can deploy to remove the need to rely on GenAI, and
- to serve as a basis to identify possible issues not currently addressed by the position statement.
A working definition of GenAI
GenAI is defined here as any system in which an output is produced in response to a textual prompt by the user. Examples of GenAI systems or platforms include ChatGPT, Bard, Copilot, StableDiffusion, MidJourney, DALL-E, and Sora. In cases where the question of “is this GenAI?” is open-ended, Timothée will make the final determination.
What is covered by this document?
This position statement and its rules apply to every activity that is funded (either in part or wholly) by the grants supporting the lab (including, but not limited to, grants for which Timothée is the principal investigator), taking place in the lab, using lab resources, or accomplished by trainees under Timothée’s direct or indirect supervision as part of their lab activities.
It applies to every output, deliverable, code, or data product generated through these activities.
This position statement applies to every area of the lab as a space for open scientific collaboration, including research, software development, data cleaning, outreach activities, public speaking, presentations, and scientific illustration or writing. Notably, this policy also applies to scientific communication aimed at promoting the work done by the lab on social media.
Most projects led by the lab involve external collaborators; on projects led by a lab member, the member in charge of the project is expected to communicate this position statement to these collaborators, especially regarding public-facing communications (e.g., press and social networks).
What is not covered by this document?
This document does not cover the uses of non-generative AI. In the context of our work, non-generative AI is usually of the applied machine learning variety. Timothée can be asked to clarify whether a specific use of AI falls within our working definition of GenAI.
This position statement does not cover the use of GenAI as part of research projects on GenAI, or the use of GenAI as a necessary and legitimate research tool, after these uses have been discussed with Timothée. As of the current revision of this document, this use is expected to be mostly restricted to Retrieval-Augmented Generation.
This position statement and its rules are agreed upon in addition to institutional rules and guidelines about the use of GenAI, their required disclosures, and the institutional rules about plagiarism.
This position statement exists within the broader landscape of our professional obligations and ethical responsibilities. Institutional and funder regulations about plagiarism, policies about GenAI, and the rules of good behaviour in our field all still apply.
Position statement on the use of GenAI
The use of GenAI is disallowed-by-default: lab members must operate under the assumption that no use of GenAI is allowed as part of the lab’s research projects.
Motivations for the position statement: research is a celebration
The lab’s position statement on GenAI is informed by our values as a collective. First and foremost, research should bring us joy: the joy of using creative work to push the boundaries of knowledge, reconsider the assumptions of our fields, and contribute positively to existential challenges. Every part of this process is worth doing well, and is worth doing ourselves, with care, consideration, and intent.
Consideration 1 - transparency and credibility
We produce both fundamental and translational research. As outlined in our theory of change, we produce this research with the intent of having an impact on decision-making. Because the stakes of our work are potentially immense, we cannot hide behind “the machine told me to” as an excuse to produce work we do not understand. Work we cannot explain is work we cannot trust; and therefore, we cannot ask the rest of our community to trust it. The added value of our research is that we believe every piece of it can be trusted, and every decision we have taken can be explained.
Our commitment to open science is fundamentally a commitment to epistemic transparency, and this cannot be reconciled with the use of generative AI.
Consideration 2 - respect and multidisciplinarity
The work we do is multidisciplinary, and we study a variety of systems, places, and processes, with a varied methodological toolkit. This works (i) because we assume our colleagues hold significant expertise and (ii) because we can expect clarification on any point that we are not ourselves experts on. More fundamentally, multidisciplinary work is effective when we realize that expertise lies within people, and that we cannot remove people from the equation without removing the expertise as well. Assuming that we can replace our colleagues with a statistical model trained on their knowledge can never lead to meaningful work.
Consideration 3 - credit and plagiarism
GenAI models produce their results through intellectual property (IP) theft: the material required to train them cannot be collected at scale in full respect of IP laws. GenAI trained on visuals, in particular, must steal from artists to exist. Plagiarism is (second only to data fabrication) one of the most grievous forms of research misconduct. Plagiarism done through a GenAI model is still plagiarism, and the responsibility for it lies with the user. In the same way that we would find plagiarism of our own work distasteful, we do not use GenAI tools to plagiarize the work of others.
The use of GenAI cannot be reconciled with our long-standing commitment towards non-exploitative practices of science, and our long-standing advocacy for effective credit mechanisms.
Rules by area of use
Writing
Large Language Models (LLMs) generate text because they have been trained on a vast corpus of written documents, obtained without respect for intellectual property regulations, and generate this text in a way that is disconnected from the primary source. Using LLMs to generate text is an instance of plagiarism, and is, as such, not allowed.
Writing is a craft, at which one gets better through practice. The purpose of lab work is not only to produce research, but also (and primarily) to build up foundational skills while doing so. This cannot be reconciled with the use of GenAI for writing.
There is a notable exception to the policy when it comes to text. Because English is not the first (or second) language of most lab members, the use of AI as an assistive technology for writing is allowed. This use is limited to minor changes, such as spelling, grammar, and light rewording, of the kind suggested by tools like Grammarly or LanguageTool. As a guideline, using GenAI (e.g., ChatGPT) to write a fragment of text longer than a single sentence is prohibited.
Peer review and assessment
The use of GenAI as a tool to assist in the peer review of research papers is similarly not allowed. No statistical model can replace human expertise. In addition, the rules of peer review make very clear the fact that disclosing information about the content of a manuscript is a breach of professional ethics; this is even more true when interacting with commercial services that have strong incentives to mine this content to refine their product.
Most granting agencies and publishers now explicitly prohibit the use of LLMs to assist with evaluation. Being asked to review a manuscript or a grant is both a mark of trust in our expertise and an acknowledgement of our standing in our community; we cannot squander this trust by refusing to invest time and effort in the evaluation of the work of our colleagues.
Lab members have substantial experience as authors and reviewers, and we will rely on collective discussions to navigate possible difficult situations.
Code
The production, diffusion, and sharing of code is regulated through software licenses. LLMs trained on code may or may not respect these licenses, and so can put users at risk of infringing on them.
Re-using code in violation of the terms of its license would create a potential liability for individuals, their institutions, and the lab, particularly given that most of the code we produce for our projects is openly available in a single GitHub organization.
Lab members have a very long collective experience in designing, writing, and deploying programs in various languages and environments, and collaboration on code has historically been a norm in our community.
Media outputs
The use of GenAI for “art” threatens the livelihood of actual illustrators, designers, and videographers, by ingesting their work, with no regard for credit or intellectual property, and making it available to be mined.
When discussing lab work in a presentation that is not primarily about the lab, we encourage the presenter to follow this position statement, at least for the slides showing lab work.