Ryan K. Boettger

Research

Using Authentic Language Data to Teach Discipline-Specific Writing Patterns to STEM Students


Researcher’s note (June 2026). Stefanie Wulff and I wrote this paper to push back against the one-size-fits-all advice in generalist technical writing textbooks, using a corpus of our own students’ writing to show what STEM writers actually do rather than what they are told to do. Our central idea was that authentic language data lets students discover discipline-specific patterns—like the intentional use of passive voice—instead of memorizing prescriptive rules that contradict professional practice. That same conviction now drives my current work on AI in writing and assessment: if we want models to evaluate and generate discipline-appropriate prose, they need to be grounded in real disciplinary language data, not in the inherited prescriptions this paper set out to question.

Abstract

Technical and scientific writing service courses offer instructors the opportunity to engage with student populations from across the university. However, it is also this interdisciplinary appeal that has complicated the quality of instruction, particularly for STEM majors. The heterogeneous student population in our service courses often results in generic instruction that contradicts how STEM practitioners communicate on the job. We offer corpus-linguistic approaches as a solution for teaching variation in writing classes. Engaging students with authentic language data helps them understand the patterns used in their discipline rather than reinforce general, and potentially prescriptive, writing principles. We focus our study on the presence of passive voice and reporting verbs in a corpus of student-written critical reviews and white papers. Our results indicated that students applied passive voice in both text types, contradicting the advice of most generalist technical writing textbooks. More importantly, students appeared to use passive voice with an intent, often as a way to stay on topic. Our results also demonstrated that students used a variety of reporting verbs, notably show, believe, and conclude. Overall, these findings suggest ways that technical and scientific writing instructors can integrate corpus research into their classrooms.

Index Terms – Corpus linguistics, passive voice, reporting verbs, scientific writing, technical writing

Introduction

Understanding disciplinary writing differences has become increasingly relevant as writing instruction at the tertiary level moves from literature-based composition courses in English departments to include technical writing and content-based courses taught by scholars in different disciplines. Undergraduates are now expected to write in the discipline or across the curriculum. An effect of these changes is that students need to write in a way that conforms to the practices of a discipline they may not (yet) be familiar with [1-2].

For technical and scientific writing instructors, our service courses provide opportunities to engage with diverse student populations from across the university. However, it is also this interdisciplinary appeal that has complicated the quality of our instruction, particularly for STEM majors. The heterogeneous student population in our service courses often results in generic instruction that contradicts how STEM practitioners communicate on the job. For example, Wolfe found that many of the stylistic and organizational principles outlined in 12 generalist technical writing textbooks run counter to the standards that scientists and engineers follow in practice [3].

Quality instructors cannot feasibly customize their writing curriculum to meet the needs of all students; furthermore, research has shown that the instructional materials available to teach technical writing do not adequately prepare all students to write in their discipline.

We offer corpus-linguistic approaches as a solution for teaching variation in technical and scientific writing classes. Engaging students with authentic language data helps them understand the patterns used in their discipline rather than reinforce general, and potentially prescriptive, writing principles.

We focus our study on examining passive voice and reporting verbs in a corpus of student-written critical reviews and white papers.

I. Corpus-linguistic approaches to writing instruction

Corpus linguistics is an applied linguistics approach that uses computer-assisted techniques to explore authentic language data. It facilitates large-scale analyses of writing patterns, expanding students’ depth and breadth of exposure to genres and language compared with traditional instruction. Quantitative findings reveal the language patterns that writers use, showing the most typical language choices for certain functions in context. Qualitative analysis then interprets the reasons for typical and unusual choices [4].

While we both use and advocate for corpus-linguistic approaches in our own classrooms, the present paper is designed to show how teachers can integrate the findings from corpus research into their technical and scientific writing classrooms.

II. Passive voice construction

We examined passive voice because advice on its use often reflects the global differences between writing in the humanities and writing in the sciences.

Generalist technical writing textbooks usually instruct students to avoid passive voice [3]. This advice likely reflects the textbook authors’ humanities backgrounds but not necessarily the needs of scientists: Technical communication is a people-oriented field, writing for end users, customers, and clients; however, the sciences and engineering are often object-oriented fields.

Indeed, studies of expert (or published) writing in engineering and the sciences support this idea. Ding found that 67% of the 780 transitive verbs in a corpus of professional engineering documents were constructed in passive voice [5]. Conrad also observed that ecology researchers tended to use passive constructions for clear purposes, such as emphasizing and organizing information so that the text reads more effectively [4]. She concluded that writing instructors should deemphasize the “vague advice common in style handbooks” and instead engage students in activities that allow them to explore the different kinds of passive constructions and their effects on a text’s purpose [4, p. 321].

III. Reporting verbs

We also examined reporting verbs. Reporting verbs are not a common content area in technical communication instruction; however, their use often marks discipline-specific writing variation. Writers use reporting verbs to introduce reports, summaries, questions, or problems. These verbs follow the patterns (i) V + that (I argue that…) and (ii) be + V-ed (It was reported that…) [6, 7].
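
As an illustration only, the two patterns can be approximated with a regular expression. The Python sketch below is not part of the study; the verb list, the hand-listed inflections, and the function name find_reporting_patterns are assumptions made for the example, and a real analysis would need a fuller verb inventory and manual checking of each hit.

```python
import re

# Hypothetical sample of reporting verbs with hand-listed inflections
# (an assumption for illustration; extend the lists for real use).
REPORTING_FORMS = {
    "argue": ["argue", "argues", "argued"],
    "report": ["report", "reports", "reported"],
    "show": ["show", "shows", "showed", "shown"],
}
FORM_TO_LEMMA = {
    form: lemma for lemma, forms in REPORTING_FORMS.items() for form in forms
}
BE_FORMS = ["is", "are", "was", "were", "be", "been", "being"]

# An optional form of "be", then a reporting verb, then "that".
PATTERN = re.compile(
    r"\b(?:(?P<be>" + "|".join(BE_FORMS) + r")\s+)?"
    r"(?P<verb>" + "|".join(FORM_TO_LEMMA) + r")\s+that\b",
    re.IGNORECASE,
)


def find_reporting_patterns(text):
    """Return (pattern label, verb lemma, matched text) for each hit."""
    hits = []
    for match in PATTERN.finditer(text):
        # A preceding form of "be" marks pattern (ii); otherwise pattern (i).
        label = "be + V-ed" if match.group("be") else "V + that"
        lemma = FORM_TO_LEMMA[match.group("verb").lower()]
        hits.append((label, lemma, match.group(0)))
    return hits


sample = "I argue that passives matter. It was reported that scores improved."
for hit in find_reporting_patterns(sample):
    print(hit)
```

Treating the form of be as an optional prefix in a single expression keeps a string such as "was reported that" from being counted once under each pattern.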

An analysis of expert writing across seven academic disciplines revealed seven frequently preferred reporting verbs: suggest, argue, find, show, describe, propose, and report [8]. However, the disciplines differed substantially both in the density of reporting structures and in the choice of verb forms. For example, electrical engineers preferred propose, philosophers preferred think, and biologists preferred describe.

These verbs can be further categorized according to the activity being reported. Verbs used to report findings or procedures could include observe, discover, analyze, and calculate. Similarly, reporting verbs can be categorized by the writer’s intended strength of idea. When persuading readers toward a position, the verbs apologize, believe, or insist could be used depending on the writer’s desire to take a weaker, neutral, or stronger position.

Teaching reporting verbs broadens students’ vocabulary and raises their awareness of overused (and perhaps unpersuasive) verbs. For example, the verbs show and find are frequent in professional writing; however, both are often overused in student writing. The reason is likely linked to students’ writing development; professional writers use a variety of reporting verbs, whereas students often default to a handful of verbs. Additionally, students often write simpler versions of the text types that they will write in professional settings, limiting opportunities for language variation [7].

Methods

We compiled a corpus of student scientific writing to explore passive voice and reporting verbs.

Our corpus included 72 texts written by 36 different students enrolled in either a senior-level or graduate-level scientific writing course. The corpus contained 136,562 words (tokens) and 11,183 different words (types).
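
For readers who want to see how the two figures differ, the following Python sketch counts running words (tokens) and different words (types) in a list of texts. It is a minimal illustration, not the procedure used in the study: the tokenization rule (lowercased runs of letters and digits, with internal apostrophes or hyphens allowed) and the function names are assumptions, and a different rule, including the one a concordancer applies, will give different totals.

```python
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into word tokens.

    Assumed rule: runs of letters or digits, optionally joined by an
    internal apostrophe (straight or curly) or hyphen.
    """
    return re.findall(r"[a-z0-9]+(?:['\u2019-][a-z0-9]+)*", text.lower())


def corpus_size(texts):
    """Return (token count, type count) for a list of text strings."""
    counts = Counter()
    for text in texts:
        counts.update(tokenize(text))
    tokens = sum(counts.values())  # every running word
    types = len(counts)            # every different word
    return tokens, types


# Two invented sentences stand in for corpus files.
texts = ["The study was reviewed.", "The authors reviewed the study."]
print(corpus_size(texts))
```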

The first 36 texts were critical reviews. Critical reviews include an overview of a discipline’s relevant research and are usually written as a precursor to a new project. The second 36 texts were white papers. White papers are focused on presenting a solution to a common business or industry problem. This text type is defined by the intersection of the problem definition, technical solutions, and business analysis, which are all presented with an aspect of marketing-based language.

Seventy-five percent of our students were STEM majors. Almost 60% of these students were majoring in biology and preparing for medical or pharmacology schools. Ten other academic majors were represented, including computer science and computer engineering and materials science and engineering. The remaining students were humanities majors, including art education, English, and technical communication. Over 60% of the sample were female, and 75% spoke English as their first language.

We used AntConc to explore the corpus. AntConc is a freeware, multi-platform, portable software tool that was specifically designed to facilitate the teaching of technical writing from a data-driven perspective [9].

Our results were generated with a variety of AntConc functions, including concordances, word lists, and keyword lists.

Results

A preliminary step in exploring any corpus is understanding its vocabulary. We first generated a list of keywords, which are derived by comparing words in a target corpus to a reference corpus. These findings reveal what words are unique to a specific subset of texts.
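
The comparison behind a keyword list can be sketched in a few lines of Python. The paper does not report which keyness statistic or reference corpus was used, so the example below assumes the log-likelihood statistic, one common choice in corpus linguistics; the function names and the toy token lists are likewise hypothetical.

```python
import math
from collections import Counter


def log_likelihood(a, b, c, d):
    """Log-likelihood keyness for one word.

    a, b: frequency of the word in the target and reference corpus
    c, d: total tokens in the target and reference corpus
    """
    expected_target = c * (a + b) / (c + d)
    expected_reference = d * (a + b) / (c + d)
    score = 0.0
    if a > 0:
        score += a * math.log(a / expected_target)
    if b > 0:
        score += b * math.log(b / expected_reference)
    return 2 * score


def keywords(target_tokens, reference_tokens, top_n=10):
    """Rank words that are relatively more frequent in the target corpus.

    Both token lists are assumed to be non-empty.
    """
    target = Counter(target_tokens)
    reference = Counter(reference_tokens)
    c, d = len(target_tokens), len(reference_tokens)
    scored = []
    for word, a in target.items():
        b = reference[word]
        if a / c > b / d:  # keep positive keywords only
            scored.append((word, a, log_likelihood(a, b, c, d)))
    scored.sort(key=lambda item: item[2], reverse=True)
    return scored[:top_n]


# Toy token lists standing in for two sub-corpora.
reviews = "the article section article authors the article".split()
papers = "the water can the water will the".split()
for word, freq, score in keywords(reviews, papers):
    print(f"{word:<10}{freq:>4}{score:>8.2f}")
```

A word with the same relative frequency in both corpora scores zero, which is why very common words drop out of a keyword list unless one text type uses them disproportionately.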

Table 1 shows some of the keywords unique to both the critical reviews and white papers. Keywords in the critical reviews were related to research practices and sections, including article, section, authors, results and introduction. Keywords in the white papers were much more diverse, including water, Texas, hair, art, violent, and lionfish. The critical reviews included 3,347 different words compared to the 7,836 in the white papers.

Table 1. Keywords unique to the critical reviews and white papers (sorted by statistical significance)

Critical Reviews    Freq    White Papers    Freq
article              274    water            168
the                 3400    can              483
section              189    will             347
authors              174    Texas            115
was                  301    hair             103
results              248    art              101
introduction         132    violent          100
discussion           109    lionfish          99
study                228    video            200
experiment           101    needs            199

We next explored the presence of passive voice in the texts. Despite the call in most generalist technical writing textbooks to avoid its use, our results indicate that students used passive voice constructions. Concordance lines 1-3 illustrate the various ways students used passive voice in their scientific writing, including as a reporting function as well as a means to stay on topic.

  1. Also a study is referenced that suggested that high hydrostatic…
  2. …to support this particular research, no study is mentioned that corroborates that results…
  3. …lionfish spawn throughout the year, and it is estimated that females can lay over two…
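
Concordance lines like these can be approximated outside a dedicated tool. The Python sketch below builds a simple keyword-in-context (KWIC) display for candidate passives. It is an assumption-laden illustration rather than the search used in the study: the pattern (a form of be followed by a word ending in -ed or -en) misses irregular participles such as shown or found, misses passives with an intervening adverb, and will flag some adjectives, so every hit still needs to be read in context. The names PASSIVE and kwic are hypothetical, and the ellipses in the sample reproduce the truncation of the concordance lines.

```python
import re

BE_FORMS = r"(?:is|are|was|were|be|been|being)"
# Rough heuristic: a form of "be" followed by a word ending in -ed or -en.
PASSIVE = re.compile(rf"\b{BE_FORMS}\s+\w+(?:ed|en)\b", re.IGNORECASE)


def kwic(text, pattern, width=30):
    """Return (left context, match, right context) for each hit."""
    lines = []
    for match in pattern.finditer(text):
        left = text[max(0, match.start() - width):match.start()]
        right = text[match.end():match.end() + width]
        lines.append((left, match.group(0), right))
    return lines


sample = (
    "Also a study is referenced that suggested that high hydrostatic... "
    "...lionfish spawn throughout the year, and it is estimated that "
    "females can lay over two..."
)
for left, hit, right in kwic(sample, PASSIVE):
    # Right-align the left context so the hits line up in one column.
    print(f"{left:>30} [{hit}] {right}")
```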

Finally, we identified the three most frequent reporting verbs in the corpus: show (51 hits), believe (23 hits), and conclude (20 hits).

As mentioned earlier, show is one of the most common verbs in both professional and student writing. It can be categorized as a Show verb, a class of verbs that indicate a fact or situation or that are concerned with coming to know or think something. A writer’s use of show could also suggest a neutral position. It would contrast with announce or promise, which would both give a statement a stronger position. Concordance lines 4-6 illustrate the various uses in our corpus.

  4. …Gable and Harmon-Jones’s experiment showed that viewing cute images reduces the global…
  5. However, the results show that participants’ scores improved significantly…
  6. Analyses of natural dialogue in French show that silences within turns (pauses) may be…

Believe is categorized as a Think verb, which are verbs concerned with thinking, including having a belief, knowing, understanding, hoping, and fearing. A writer’s use of believe also indicates a neutral position. This use would contrast with imagine or insist, which would give a statement a weaker or stronger position, respectively. Concordance lines 7-9 illustrate students’ various uses.

  7. These new “objectors” also believe that the government is misleading the public…
  8. Many conservationists believe that money spent on specific reintroduction…
  9. …less valuable than empirical work; however, I believe that the clarity on each conceptual…

Conclude is classified as an Argue verb, which are verbs used to make a strong argument, either positive or negative. Concordance lines 10-12 illustrate students’ various uses.

  10. Roy et al. conclude that liquid slip phenomena should be considered…
  11. The authors conclude that silicon channels have high quality…
  12. Gibbons and Semlitsch concluded that their results in the curve were…

Discussion

The keywords unique to both the critical reviews and white papers suggest they are two distinct text types. The white papers included a larger vocabulary than the critical reviews (7,836 different words compared to 3,347), suggesting the rhetorical range of the former text type and perhaps the limitations of the latter. This substantial difference in vocabulary might also signal students’ development over the 16-week semester. Critical reviews were the first major assignment submitted, and the white papers were the last major assignment.

Additionally, students applied passive voice in both the reviews and white papers, contradicting the advice of most generalist technical writing textbooks. More importantly, students appeared to use passive voice with intent, often as a way to stay on topic. This finding is particularly salient because it demonstrates that students are more mindful of their language choices than is perhaps perceived. Other corpus analyses of technical writing, for example, have found that students intentionally choose not to follow the demonstrative this with a noun phrase to avoid assigning blame to themselves or another individual (e.g., This is an unfortunate situation) [10].

We earlier noted other scholars’ calls for technical writing textbooks to update content related to devices like passive voice. In fairness, some authors have responded. For example, when instructing students to convey bad news, the latest edition of Lannon and Gurak states: “Use passive voice to avoid accusations but do not dodge responsibility” [11]. This recommendation contrasts with the text’s 11th edition on the same topic: “In reporting errors or bad news, use active voice, for clarity and sincerity. The passive voice creates a weak and impersonal tone.”

Our results also demonstrated that students used a variety of reporting verbs, notably show, believe, and conclude. Show is common to professional and student writing, and its presence here could mark limitations in developing writers’ vocabularies. However, other results indicate that students used a variety of verb types (Show, Think, and Argue) as well as unexpected verbs like believe that are not often found in other studies. Similarly, verbs often considered overused, like find and say, appeared in our corpus only 4 and 6 times, respectively.

Overall, these findings suggest ways that technical and scientific writing teachers can integrate corpus research into their classrooms. For example, students who elect to use passive voice should first consider what they are communicating and then the intent behind the message. Additionally, introducing students to reporting verbs increases their vocabulary, which better acculturates them into the writing conventions of their targeted discipline.

References

  1. C. Bazerman, Reference guide to writing across the curriculum: Parlor Press LLC, 2005.
  2. D. R. Russell, “Writing across the curriculum in historical perspective: Toward a social interpretation,” College English, pp. 52-73, 1990.
  3. J. Wolfe, “How technical communication textbooks fail engineering students,” Technical Communication Quarterly, vol. 18, pp. 351-375, 2009.
  4. S. M. Conrad, “Investigating academic texts with corpus-based techniques: An example from biology,” Linguistics and Education, vol. 8, pp. 299-326, 1996.
  5. D. Ding, “Object-centered—How engineering writing embodies objects: A study of four engineering documents,” Technical Communication, vol. 48, pp. 297-308, 2001.
  6. S. Hunston and G. Francis, “Verbs observed: A corpus-driven pedagogic grammar,” Applied Linguistics, vol. 19, pp. 45-72, 1998.
  7. E. Friginal, “Developing research report writing skills using corpora,” English for Specific Purposes, vol. 32, pp. 208-220, 2013.
  8. K. Hyland, Disciplinary discourses: Social interactions in academic writing: University of Michigan Press, 2004.
  9. L. Anthony, “Concordancing with AntConc: An introduction to tools and techniques in corpus linguistics,” JACET Newsletter, vol. 155, p. 2085, 2006.
  10. R. K. Boettger and S. Wulff, “The naked truth about the naked this: Investigating grammatical prescriptivism in technical communication,” Technical Communication Quarterly, vol. 23, pp. 115-140, 2014.
  11. J. M. Lannon and L. J. Gurak, Technical Communication, 14th ed. New York: Pearson, 2016.

© 2016 IEEE. Personal use of this material is permitted. Permission from IEEE must be obtained for all other uses, in any current or future media, including reprinting/republishing this material for advertising or promotional purposes, creating new collective works, for resale or redistribution to servers or lists, or reuse of any copyrighted component of this work in other works.