Abstract

When a statistical methods paper is submitted to a journal for publication, examples in which the method is applied to real data are highly encouraged by many journals and in some cases are explicitly demanded.

In this commentary, we argue that real data examples serve several useful purposes. However, we also argue that in many cases, particularly in the fields of genetics and genomics, there is an implicit or explicit expectation for examples to support purposes for which they are ill-suited and furthermore that these inappropriate expectations have negative consequences for the field.

We conclude by noting that real data examples can be tremendously valuable and should continue to be used where appropriate, but that the demands for, expectations of, and conclusions drawn from them need to be scaled back.

Keywords: Examples, Simulation, Methodology, Statistics, Genomics, Pedagogy, Publishing

Statistical methods are vitally important in the biological sciences and continue to evolve.

This is nowhere more true than in genetic and genomic research. In our experience, reviewers and editors of journals that publish papers offering new statistical methods in genetics and genomics are favorably disposed to papers including a real data example that illustrates the application of the method or methods under study.

We believe that the inclusion of real data examples is highly desirable for reasons that we describe below. However, there also seems to be a prevailing belief on the part of many reviewers and editors, especially of high-impact journals, that (a) a real data example is essential, (b) the example should reveal an exciting biological finding, and (c) the fact that the method obtains this exciting finding offers a proof of principle or validation of the method.

In contrast, we believe that each component of this tripartite belief is ill-founded and detrimental. The purpose of this commentary is to offer a more supportable perspective on the value of real data examples, to suggest greater restraint in what we ask of examples and what we conclude on their basis, and to offer guidance on using examples effectively for the purposes to which they are well suited.

First, in the past, and largely in textbooks rather than journal articles, real data examples played a key role in illustrating the computational steps involved in conducting certain statistical tests.

This can be invaluable as a pedagogical tool for students and works well with relatively simple calculations. Although this pedagogical tool is enormously valuable, its utility breaks down in complex situations where data sets are necessarily large, cannot be easily summarized by sufficient statistics in simple tables, and require difficult, often iterative, calculations that the reader will not be able to implement with paper and pencil.

Hence, the value of examples to illustrate the mechanics of calculations in modern peer-reviewed articles involving genomic techniques is limited.

Can be done with any example real data set. The data set need not be previously unpublished, especially interesting, or yield any particular result.

Illustrate the concept of the method and how results can be interesting: Can be done with any example real data set or, for that matter, a simulated data set.

Inspire reader to use new method by serving as an exciting testimonial or case report of the value of the method: This can be beneficial in promoting use but is tantamount to salesmanship rather than edification.

Thus, the initial application can serve as a useful first field test of the method. Knowing that the method has been applied to real data at least once indicates that the application is practically feasible.

Third, an example can serve another pedagogical purpose, namely, conveying the concept or rationale of the proposed method and illustrating how the results obtained after applying the method can be interpreted.

Such uses of examples not only clarify but also can make for more interesting reading. Thankfully, such purposes can be served by any example real data set or for that matter even by a simulated data set.

Fourth, a real data example can provide the author with a vehicle through which to tell a story about why and how the new method should be used. Such storytelling has been shown to help people comprehend and especially retain new ideas more effectively [ 1 ].

Finally, real data examples, when they yield biological findings that appear to be new, important, and exciting, can inspire readers to want to use the technique.

In our experience, this is a powerful form of inspiration. An attention-getting paper in a premier journal that claims to have an exciting biological finding produced by a new method often initiates a flurry of calls to statistical geneticists by applied scientists wanting help implementing the new catholicon.

Although inspiring applied scientists to use new and valuable techniques is meritorious, as we shall discuss below, the increasing demand for inspiring examples comes at a price. In our opinion, the price is too steep. The first detriment is that of promoting the inclusion of extraneous information.

That is, in some cases, examples are included because the methodologist knows they are expected, and yet they add no information or insight to the paper. In many cases, methodologist authors have already established the conclusion of the paper by mathematical proof or simulation study.

In some of these cases, applying the method to real data is straightforward, or its application has already been demonstrated with simulated data while evaluating the method.

Nevertheless, authors may decide to include applications to real data because they are explicitly required or because consensus exists that they will strengthen the appeal of the publication.

Such real data applications sometimes convey no critical information. Furthermore, although the example may not be detrimental, removing it would not affect the fundamental information and logic of the paper [ 23 ].
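To make concrete what establishing a conclusion by simulation study can look like, the following is a minimal sketch rather than a procedure taken from any particular paper. It assumes a deliberately simple method (a two-sided z-test of a zero mean with known unit variance) and estimates its type I error rate and power from simulated data. The function names, sample size, effect size, and number of replicates are all illustrative assumptions.

```python
import random
from statistics import NormalDist, fmean


def z_test_rejects(sample, alpha=0.05):
    """Two-sided z-test of H0: mean = 0, assuming known unit variance."""
    z = fmean(sample) * len(sample) ** 0.5
    critical = NormalDist().inv_cdf(1 - alpha / 2)
    return abs(z) > critical


def rejection_rate(true_mean, n=20, replicates=2000, seed=0):
    """Fraction of simulated data sets in which the test rejects H0."""
    rng = random.Random(seed)
    rejections = 0
    for _ in range(replicates):
        # Each replicate is a fresh simulated data set with a known truth.
        sample = [rng.gauss(true_mean, 1.0) for _ in range(n)]
        rejections += z_test_rejects(sample)
    return rejections / replicates


# Under the null (true mean 0) the rejection rate estimates the type I error.
type_one_error = rejection_rate(true_mean=0.0)
# Under an assumed alternative (true mean 0.5) it estimates the power.
power = rejection_rate(true_mean=0.5)

print(f"estimated type I error: {type_one_error:.3f}")
print(f"estimated power:        {power:.3f}")
```

Because the truth is known in every simulated data set, a study of this kind can speak to the validity of a method in a way that a single real data example, where the truth is unknown, cannot.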


