
Do questions help? The impact of audience response systems on medical student learning: a randomised controlled trial
Tyler E Mains1, Joseph Cofrancesco Jr1, Stephen M Milner1,2, Nina G Shah1, Harry Goldberg1

1 Johns Hopkins University School of Medicine, Baltimore, Maryland, USA
2 Johns Hopkins Burn Center, Baltimore, Maryland, USA

Correspondence to Tyler Mains, Johns Hopkins School of Medicine, 1600 McElderry St, Armstrong Medical Education Building, Room 313, Baltimore, Maryland 21205, USA; tmains1@jhmi.edu

Abstract

Background Audience response systems (ARSs) are electronic devices that allow educators to pose questions during lectures and receive immediate feedback on student knowledge. The current literature on the effectiveness of ARSs is contradictory, and their impact on student learning remains unclear.

Objectives This randomised controlled trial was designed to isolate the impact of ARSs on student learning and students’ perception of ARSs during a lecture.

Methods First-year medical student volunteers at Johns Hopkins were randomly assigned to either (i) watch a recorded lecture on an unfamiliar topic in which three ARS questions were embedded or (ii) watch the same lecture without the ARS questions. Immediately after the lecture on 5 June 2012, and again 2 weeks later, both groups were asked to complete a questionnaire to assess their knowledge of the lecture content and satisfaction with the learning experience.

Results 92 students participated. The mean (95% CI) initial knowledge assessment score was 7.63 (7.17 to 8.09) for the ARS group (N=45) and 6.39 (5.81 to 6.97) for the control group (N=47), p=0.001. Similarly, the second knowledge assessment mean score was 6.95 (6.38 to 7.52) for the ARS group and 5.88 (5.29 to 6.47) for the control group, p=0.001. The ARS group also reported higher levels of engagement and enjoyment.

Conclusions Embedding three ARS questions within a 30 min lecture increased students’ knowledge immediately after the lecture and 2 weeks later. We hypothesise that this increase was due to forced information retrieval by students during the learning process, a form of the testing effect.

  • EDUCATION & TRAINING (see Medical Education & Training)
  • MEDICAL EDUCATION & TRAINING


Introduction

Audience response systems (ARSs) use electronic devices, informally known as ‘clickers’, which enable learners to anonymously answer multiple-choice questions posed by an instructor during a lecture. The lecturer typically displays a question on a slide during the lecture, and students are asked to read the question and choose their preferred answers. Students then push the appropriate button on their clickers to submit their responses. A histogram of the number of learners selecting each answer choice can be displayed immediately after the poll, allowing the lecturer and learners to see a visual representation of learners’ current understanding.

ARS use has recently increased in medical schools1–3 and may have several potential benefits, including creating an active learning environment, improving engagement and attendance, assessing understanding, and increasing comprehension and retention of material.4 Research also suggests that learners enjoy the use of ARSs during lectures, become more involved in the learning process, and report a higher perception of learning.5–7 Similar benefits have been shown in fields outside of the health sciences, including business management and sociology.8,9 According to Ebbinghaus's forgetting curve, substantial knowledge loss occurs as soon as 2 weeks after a learner is exposed to new content.10 ARSs, however, may be able to slow this rate of knowledge loss.

At least three explanations have been proposed for how ARSs might improve comprehension and retention. First, several studies have shown that self-assessment improves long-term retention compared with repeated reading and reviewing of concepts, a theory called the testing effect.11–14 Second, ARS use enhances the lecturer's ability to rapidly identify and correct learners’ misunderstandings during a lecture. Hatch et al15 found that learners were surprised at how many questions they answered incorrectly during lectures that used ARSs, implying that ARSs can be used as a formative assessment for learners. Finally, the average attention span of an adult learner in a lecture is approximately 20 min.16,17 ARS questions may therefore provide an opportunity to ‘restart the attention clock’ during a 60 min lecture by providing a break from listening to the lecturer.18

Despite this theoretical framework, the actual effect of ARSs on learning during a lecture remains unclear, largely because of inherent confounding variables. For instance, researchers cannot completely control for differences in lecturer efficacy and engagement, so creating an adequate control group is difficult. Even if the same instructor is used for both the control and experimental groups, he or she may alter a lecture between iterations. In addition, medical students are highly likely to encounter the material presented during an educational study outside of the experimental setting because of the intentional repetition built into medical school curricula. Therefore, students’ prior knowledge and reviewing of content after the intervention probably affect the results of an education research study.

In order to control for these variables, we used two critical strategies. First, we used a recorded lecture to eliminate all variables associated with the individual lecturer, thereby creating two identical learning experiences. Second, the content of the lecture focused on a concept that was outside of the usual medical school curriculum, so participants were unlikely to have prior knowledge of the material, and had little incentive to study the information on their own. In addition, participants would not be exposed to the material after the lecture, so the rate of knowledge loss would not be influenced by incidental instruction.

The primary objective of this study was to quantify the impact of ARSs on medical student learning. We assessed knowledge immediately after a lecture and 2 weeks later to evaluate the effect of ARSs on the rate of knowledge loss. A secondary objective was to describe students’ perception of the ARS experience.

Methods

Recruitment of participants

First-year medical students at the Johns Hopkins University School of Medicine were recruited into the study with email invitations. Students were incentivised to join the study with free lunch on the day of the study and the opportunity to enter a raffle to win one of three US$30 gift certificates to the medical school café.

Randomisation

Participants were randomly assigned to one of two groups based on the order in which they agreed to participate in the study. The first student to sign up for the study was assigned to group A, the second student to group B, the third student to group A, etc. Two days before the study, participants were emailed their group assignments and instructed to report to a specific lecture hall, based on their random group assignment. Group A served as the control group in the West Lecture Hall, and group B was the experimental group in the East Lecture Hall. Participants did not know the difference between the two groups, only their assigned group and location.

Bias prevention

As in most educational research studies, participants in the ARS group could not be blinded to the fact that they were exposed to ARSs, but they did not know that ARS use was the independent variable in this study. On the day of the study, each participant received a randomly generated individual study code (Nos 1–92). No identifying information was linked to any of the study codes. Only individual study codes were included in the dataset, allowing all analyses to be conducted without any identifying information. In addition, no part of this study was linked to any incentive based on answers to the questionnaires.

The lecture

The initial phase of the study required all participants to watch a previously recorded 30 min lecture covering the clinical presentation and management of patients with severe burns, a topic outside the first-year medical student curriculum. The Chief of Burn Surgery at the Johns Hopkins University School of Medicine created the lecture. Using a recorded lecture was a critical component of our experimental design, as it guaranteed standardisation in content delivery to both groups.

The intervention

Both groups watched the recorded lecture at the same time in identical adjacent lecture halls on 5 June 2012. During the lecture, students could take notes but were not permitted to ask questions so that we could control the content delivered. They were also asked not to discuss the lecture content or subsequent questionnaires with anyone until the study was completed. Group A did not use ARSs; group B's video had three ARS questions integrated at 7 min, 18 min and 27 min (figure 1). The three ARS questions were multiple-choice questions written at the knowledge and comprehension level of Bloom's Taxonomy (see online appendix A). Knowledge questions test learners’ recall of information, and comprehension questions assess whether learners understand the meaning of information presented.19 Each time a question appeared in the video, the lecture was paused, and students had 1 min to read the question and select their answers using their handheld devices. To create a realistic ARS experience, the lecturer was present to view the results of the histogram and state the correct answer to each of the ARS questions. He did not add additional information to the lecture at any time; he only repeated the question and said the correct answer choice.

The first questionnaire

Immediately after the lecture, a two-part online questionnaire was emailed to all participants and remained open for 4 h after the conclusion of the lecture to ensure all participants would have time to complete it. The first part of the questionnaire included three questions to assess participants’ enjoyment, perception of learning, and engagement during the lecture using 5-point Likert scales, as well as demographic data (see online appendix B). The second part included 10 questions to test participants’ knowledge of the lecture content (the ‘initial knowledge assessment’, see online appendix C). Eight questions were knowledge or comprehension questions, and the remaining two were application questions, which assess learners’ ability to use a concept in a new situation.19 None of the questions were identical to the ARS questions, although three questions covered similar content. Students were instructed to complete the questionnaire independently without consulting notes, books or any other resources to ensure their knowledge came solely from the lecture.

The second questionnaire

Two weeks later, participants were emailed another two-part questionnaire without advance notice. This second questionnaire was open for 4 days to allow ample time for participants to complete it. The first part of the questionnaire asked participants whether they had reviewed notes, used external resources, or discussed the content of the lecture with anyone between the two assessments, and if so, to describe what they had done. This allowed us to assess whether any confounding variables were present. The second part of this questionnaire contained the same 10 questions as the initial questionnaire to assess knowledge retention (‘second knowledge assessment’). However, the order of questions and the order of answer choices within each question were randomised using random.org's random number generator to minimise simple recall.
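For illustration only, the equivalent shuffling of question and answer order could be performed as in the following minimal Python sketch; the question identifiers here are hypothetical, and the study itself used random.org rather than this code.

```python
import random

# Hypothetical question identifiers, each with five answer options
questions = {f"Q{i}": ["A", "B", "C", "D", "E"] for i in range(1, 11)}

order = list(questions)
random.shuffle(order)          # randomise the order of the 10 questions

for q in order:
    options = questions[q][:]
    random.shuffle(options)    # randomise the order of answer choices within the question
    print(q, options)
```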

Data analysis

To quantify the impact of ARSs on student learning, we analysed the scores from both knowledge assessments. We further divided the knowledge assessments into three subsets: (1) all 10 questions; (2) the three ARS-related questions; (3) the remaining seven unrelated questions. This differentiation allowed us to better understand ARS effects, if any, on participants’ knowledge. To determine whether ARSs can slow the rate of knowledge loss, we compared individual participants’ score changes from the initial to the second knowledge assessment using their individual study codes. Finally, we analysed results from the Likert scale questions to determine what impact ARSs had on participants’ perception.

Statistical analysis

Power calculation

We hypothesised that the knowledge assessment scores within each group would be normally distributed with an SD of 0.2. Assuming a between-group difference in knowledge assessment scores of 0.12, we would need 45 participants per group to reject the null hypothesis with a power of 0.8 and a type I error probability of 0.05.
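These assumptions imply a standardised effect size of 0.12/0.2 = 0.6. As an illustration only (the study's analyses were run in SAS V.9), the same sample-size calculation can be reproduced in Python with statsmodels:

```python
# Illustrative sample-size calculation for a two-sample t test.
import math
from statsmodels.stats.power import TTestIndPower

effect_size = 0.12 / 0.2   # assumed between-group difference / assumed SD = 0.6
n_per_group = TTestIndPower().solve_power(effect_size=effect_size,
                                          alpha=0.05,   # type I error probability
                                          power=0.80,   # desired power
                                          alternative='two-sided')
print(math.ceil(n_per_group))   # approximately 45 participants per group
```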

Item reliability

Cronbach's α is a measure of the correlation between questionnaire items, that is, of internal consistency. Some experts recommend that mean inter-item correlations fall between 0.15 and 0.50.20–22 Assessments with higher inter-item correlations may be redundantly assessing the same concept with different items; our assessment, however, tested 10 independent topics covered during the lecture. Given this, the knowledge assessment in its current form may have an acceptable mean α as low as 0.15, since we are measuring the broad construct of burns rather than a focused, lower-order construct such as advanced surgical management of burn wounds. In addition, correlations between dichotomous items, which are scored as either completely correct or incorrect, tend to be too low to yield high reliability. Finally, the quiz was relatively short, and α increases with the number of items.
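For illustration, Cronbach's α can be computed directly from the item variances and the variance of the total score. The minimal Python sketch below uses a small hypothetical matrix of dichotomous (0/1) item scores; it is not the study's actual analysis code, which was written in SAS V.9.

```python
import numpy as np

def cronbach_alpha(scores: np.ndarray) -> float:
    """Cronbach's alpha for an (n_students x n_items) matrix of 0/1 item scores."""
    k = scores.shape[1]                          # number of items
    item_vars = scores.var(axis=0, ddof=1)       # variance of each item
    total_var = scores.sum(axis=1).var(ddof=1)   # variance of the total scores
    return (k / (k - 1)) * (1 - item_vars.sum() / total_var)

# Hypothetical data: 6 students x 4 items (1 = correct, 0 = incorrect)
demo = np.array([[1, 1, 0, 1],
                 [1, 0, 0, 1],
                 [0, 1, 1, 0],
                 [1, 1, 1, 1],
                 [0, 0, 0, 1],
                 [1, 1, 0, 0]])
print(cronbach_alpha(demo))
```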

Item difficulty and discrimination

Item difficulty is the percentage of students who correctly answered an item; it ranges from 0 to 1.0, with higher values indicating that more students answered the item correctly. The optimal difficulty level for a five-option multiple-choice question is 0.60. Item discrimination describes the extent to which students with high quiz scores answered an item correctly and students with low quiz scores answered it incorrectly; it is measured using the point-biserial correlation (PBC), which ranges from −1.0 to 1.0, with higher values indicating better discrimination. PBC values of 0.20 and higher are considered acceptable.23
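The sketch below illustrates, on hypothetical 0/1 item scores, how item difficulty and the PBC can be computed; scipy's pointbiserialr is one option, whereas the study's analyses were performed in SAS V.9.

```python
import numpy as np
from scipy.stats import pointbiserialr

# Hypothetical data: rows = students, columns = items (1 = correct, 0 = incorrect)
scores = np.array([[1, 1, 0, 1],
                   [1, 0, 0, 1],
                   [0, 1, 1, 0],
                   [1, 1, 1, 1],
                   [0, 0, 0, 1]])

difficulty = scores.mean(axis=0)   # proportion of students answering each item correctly
totals = scores.sum(axis=1)        # each student's total quiz score

for i in range(scores.shape[1]):
    pbc, _ = pointbiserialr(scores[:, i], totals)   # item score vs total score
    print(f"item {i + 1}: difficulty={difficulty[i]:.2f}, PBC={pbc:.2f}")
```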

Knowledge assessment scores

Two-sample t tests were used to compare mean scores between groups A and B for each of the three subsets of questions in both knowledge assessments described above. The distributions of quiz scores and mean differences were checked for normality. To better understand the practical magnitude of any ARS effect, effect sizes were calculated for group comparisons that were significant at the t test level of α=0.10. Effect size is independent of sample size and goes beyond the probability of finding an association; calculating it for statistically or marginally significant comparisons allowed us to characterise the effect of ARSs as small, moderate or large. Cohen's d, a commonly used estimate, measures the standardised difference between two means, expressed in units of SD.24 Cohen originally described effect sizes of 0.2 as small, 0.5 as moderate and 0.8 as large. He later developed a more precise guideline in which effect size is interpreted as the per cent of non-overlap between two groups; for instance, an effect size of 0 corresponds to 0% non-overlap, meaning the distributions of the two groups overlap completely.
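For illustration, the following Python sketch (with hypothetical score vectors; the study used SAS V.9) shows a two-sample t test, a pooled-SD Cohen's d, and Cohen's U1, the per cent of non-overlap implied by d for two normal distributions with equal SDs.

```python
import numpy as np
from scipy.stats import ttest_ind, norm

# Hypothetical quiz scores for the two groups
ars     = np.array([8, 7, 9, 8, 6, 7, 8, 9])
control = np.array([6, 7, 5, 6, 7, 6, 5, 8])

t_stat, p_value = ttest_ind(ars, control)            # two-sample t test

# Cohen's d using a pooled SD
n1, n2 = len(ars), len(control)
pooled_sd = np.sqrt(((n1 - 1) * ars.var(ddof=1) +
                     (n2 - 1) * control.var(ddof=1)) / (n1 + n2 - 2))
d = (ars.mean() - control.mean()) / pooled_sd

# Cohen's U1: per cent non-overlap of two equal-SD normal distributions separated by d
u1 = (2 * norm.cdf(abs(d) / 2) - 1) / norm.cdf(abs(d) / 2)

print(f"p={p_value:.3f}, d={d:.2f}, non-overlap={u1:.1%}")
```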

Rate of knowledge loss

To assess the decrease in knowledge assessment scores from the first to the second questionnaire, we performed an analysis of covariance (ANCOVA) with the first knowledge assessment score as a model covariate. This provided a more precise measure of how ARSs affected the rate of knowledge loss.
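A minimal sketch of an ANCOVA of this form is shown below, using hypothetical data and statsmodels' formula interface rather than the SAS procedures actually used in the study.

```python
import pandas as pd
import statsmodels.formula.api as smf

# Hypothetical data: one row per participant
df = pd.DataFrame({
    "group":  ["ARS"] * 4 + ["control"] * 4,
    "score1": [8, 7, 9, 8, 6, 7, 5, 6],   # initial knowledge assessment
    "score2": [7, 7, 8, 7, 5, 6, 5, 6],   # second knowledge assessment
})

# ANCOVA: second score modelled by group, adjusting for the initial score
model = smf.ols("score2 ~ C(group) + score1", data=df).fit()
print(model.summary())
```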

Perception of ARSs

Two-sample t tests were used to compare the mean scores for the students’ perceptions of ARSs on the three Likert scale questions. Internal consistency among these three items was calculated using Cronbach's α as described above.

Statistical system

SAS V.9 was used to analyse results.

Results

Participant demographics

Ninety-two out of 119 first-year medical students (77%) opted to join the study. Student demographic data for both groups are presented in table 1. All of the participants who completed the initial knowledge assessment completed the second knowledge assessment (100% completion rate) (figure 2).

Table 1

Participant demographics

Figure 2

CONSORT 2010 flow diagram.

Confounding variables

Twelve participants, six from each group, did not remember their individual study codes, so they were not included in the ANCOVA used to determine the individual rate of knowledge loss (table 2). One student from each group reported reading online information about the topic, one student from group B reported reviewing her notes, and five students from group B indicated that they had discussed the material with someone else before the second knowledge assessment. However, these discussions involved the graphic nature of the images in the lecture rather than the tested content. It should also be noted that the video for group A paused several times because of compression issues; each pause lasted less than 2 s before the video was restarted.

Table 2

Knowledge assessment scores

Item reliability

An item-level analysis of all 10 questions used on the knowledge assessment showed that Cronbach's α ranged from 0.26 to 0.49, indicating moderate reliability. However, the Spearman–Brown formula, used to predict the anticipated reliability of a longer quiz, showed that tripling the number of questions from 10 to 30 would have increased Cronbach's α for the overall quiz from 0.39 to 0.66.
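For reference, the Spearman–Brown prophecy formula predicts the reliability of a test lengthened by a factor k as kρ/(1 + (k − 1)ρ); with k = 3 and ρ = 0.39, this gives 3(0.39)/(1 + 2(0.39)) ≈ 0.66, consistent with the value reported above.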

Item difficulty and discrimination

The mean difficulty levels for the initial knowledge assessment were acceptable, ranging from 0.53 to 0.87. Item discrimination, as measured by the PBC, ranged from −0.03 to 0.37 (mean=0.15) for the initial knowledge assessment. The item related to oedema in a burn had a coefficient of −0.03; one might conclude that this question did not discriminate at all, or even that the most knowledgeable students tended to answer it incorrectly while the least knowledgeable students answered it correctly.

Knowledge assessment scores

Overall, the ARS group had significantly higher scores on both the initial and second knowledge assessments (table 2). Immediately after the lecture, the mean (SD) scores were 7.63 (1.40) in the ARS group and 6.39 (1.84) in the control group, p=0.001. Two weeks later, the ARS group still had significantly higher scores on the second knowledge assessment, 6.95 (1.74) compared with 5.88 (1.86), p=0.01. The difference between groups was larger when we analysed only the three quiz questions that covered topics similar to the ARS questions. On the initial knowledge assessment, the ARS group's mean for that subset was 2.50 (0.65) compared with 1.80 (0.95) in the control group, p<0.001. Similarly, on the second knowledge assessment, the ARS group's mean for that subset was 2.11 (0.83), whereas that of the control group was 1.54 (0.87), p=0.004. The ARS group also had higher means on the remaining seven non-ARS-related questions on both assessments, but these differences were not statistically significant. We found minimal departures from normality in the distributions of scores for all comparisons.

The largest effect sizes were found on the initial knowledge assessment for the 10-question subset (Cohen's d=0.87) and the three-ARS-question subset (Cohen's d=0.92). These results indicate a non-overlap of 51.6% in the distribution of scores for groups A and B for both subsets (with Cohen's d rounded to the nearest tenth). This is a large effect size and can be interpreted as learners in the ARS group scoring, on average, nearly one SD higher than the non-ARS group. Smaller but still moderate effect sizes were found for the seven-non-ARS-question subset on the initial knowledge assessment (Cohen's d=0.48; non-overlap of 33.0%), as well as for all three subsets on the second knowledge assessment: the 10-question subset (Cohen's d=0.60; non-overlap of 38.2%), the three-ARS-question subset (Cohen's d=0.58; non-overlap of 38.2%) and the seven-non-ARS-question subset (Cohen's d=0.39; non-overlap of 27.4%).

Rate of knowledge loss

The decrease in scores from the initial knowledge assessment to the second assessment did not significantly differ between groups for the overall quiz. However, an ANCOVA showed that the mean adjusted scores for the subset of three ARS-related questions did differ. The ARS group had a significantly higher knowledge assessment score mean of 2.02 (0.14) compared with the control group's mean of 1.62 (0.14) on that subset, p=0.05 (table 2).

Participant perception

Adding ARSs to the lecture increased students’ self-reported engagement, perception of learning and enjoyment (table 3). Moderate effect sizes were found for the following: ‘I was fully engaged for the entirety of this lecture’ (Cohen's d=0.42), ‘I believe I learned a lot during this lecture’ (Cohen's d=0.62) and ‘I enjoyed the format of this lecture’ (Cohen's d=0.59). There was high internal consistency for all three self-reported items (Cronbach's α=0.81).

Table 3

Participant perception results

Discussion

Incorporating only three ARS questions into a 30 min lecture significantly improved participants’ scores on a knowledge assessment by roughly 10% immediately after the lecture and 2 weeks later. In addition, the rate of knowledge loss decreased for content tested using ARSs. Using ARSs during the lecture also increased participants’ self-reported engagement, learning and enjoyment. Therefore, we support the use of ARSs as one strategy to increase learner knowledge, retention and engagement during lectures.

Several similar studies have shown conflicting results. Two single-site studies reported improved retention with ARSs,1,2 while a larger multi-institutional study involving physicians showed no improvement in knowledge or retention.3 Hoyt and colleagues25 reported that medical students’ final examination scores did not improve after the introduction of ARSs compared with the previous year's students, while another study involving medical students found increased understanding of the material without any increase in retention.26 These differences may be accounted for, in part, by inherent limitations of the study designs. For instance, several of these studies examined the use of ARSs within existing curricula. In this setting, participants were inclined to study the material independently, improving their knowledge regardless of the educational methodologies used and introducing a significant confounding factor. These studies also used live lectures, so variables introduced by the lecturers themselves could not be controlled. Judson and Sawada27 reviewed 33 years of studies involving clicker use in college lecture halls and concluded that the improvements in student comprehension initially attributed to ARS use were probably due to individual instructors’ efficacy. We searched several databases, including PubMed, and were unable to find any studies that controlled for these factors. We controlled for these variables through two key features of our study design, which increased the accuracy and validity of our results: (1) testing material outside of the normal curriculum and (2) using a recorded lecture.

However, the generalisability of our results is limited because of the use of a single cohort of students at a single school, the relatively small sample size, and the coverage of a single topic. An additional limitation is the structure of the knowledge assessment, as some items included ‘all of the above’ and ‘none of the above’ answer choices, as well as negatives in the question stems. However, the purpose of this study was to evaluate a teaching method, and the assessment items had appropriate reliability, difficulty and discrimination.

Further study is needed to determine the effect of asking various types of questions with ARSs, as this may better define how to use ARSs during lectures. For instance, testing foundational material at the knowledge or comprehension level of Bloom's Taxonomy may be more likely to provide benefits earlier in a lecture, especially for students who need to correct misunderstandings before learning advanced material. Similarly, asking a difficult application question towards the end of a lecture would allow students to review some of the key concepts presented earlier. The impact of ARSs may also be related to the number of questions asked during a lecture, as there may be a point of maximum benefit after which additional questions have a neutral or negative effect on learning. Furthermore, students’ learning styles and preferences could contribute to or detract from ARS benefits, and should therefore be examined in conjunction with educational interventions.

As discussed in the Introduction, three current theories that explain the beneficial impact of ARSs on learning are (1) the testing effect, (2) correcting early misunderstandings, and (3) restarting the attention clock. The effects of ARSs in this study can be attributed to the testing effect because there was a statistically significant difference in knowledge assessment scores between content that was specifically tested using ARSs and content that was covered but not tested via ARSs. This implies that the process of thinking about a question and choosing an answer provides an additional learning benefit for that content. Chan et al28 found that testing also facilitates retention of related but non-tested information, a result that could possibly be attributed to the second theory. However, our results did not show a statistically significant improvement in knowledge for the seven non-ARS questions and did not support this theory. Furthermore, if ARSs were simply an attention-grabbing mechanism, there would be no significant difference in scores between tested and non-tested content.

Because ARS benefits can be explained by the testing effect, lecturers might be able to use less expensive means than electronic devices to ask questions during lectures, which may provide similar effects on learning. In fact, Stoddard and Piquette29 showed that there was no difference in educational outcomes between lecturers who incorporated ARSs and lecturers who simply posed the same multiple-choice questions to the class. In contrast, Mayer et al30 found that using ARSs to answer questions during a lecture increased examination scores relative to answering the same questions without ARSs. Other methods of asking questions during the learning process include administering short quizzes at the end of class and online quizzing. Both have been shown to be effective31,32 and may be as convenient as ARSs, but they assess students after the material has been presented rather than during the learning process. Another study involving undergraduate students found that using ARSs increased student participation, honesty while answering questions, and positive emotion during a lecture compared with hand-raising and response cards,33 so there may be additional benefits of using ARSs. Ultimately, it is our view that ARSs provide a convenient and effective way to improve the learning experience.

Main messages

  • Incorporating audience response systems (ARSs) into a lecture significantly improved participants’ knowledge.

  • The benefits of ARSs on learning can be explained by the testing effect.

  • Participants exposed to ARSs reported higher levels of engagement and enjoyment during the learning process.

  • To eliminate confounding variables, the authors used a recorded lecture on content outside of the normal medical school curriculum.

Current research questions

  • How does the type of question asked via ARSs affect student learning (example: application versus knowledge question)?

  • How does the type of material asked via ARSs affect student learning (example: new foundational facts versus review concepts)?

  • Is there a maximum number of ARS questions after which increasing the number of questions during a lecture has a neutral or negative impact on learning?

  • How do students’ learning styles and preferences change the impact of ARSs, if at all?

Key references

  • Pradhan A, Sparano D, Ananth CV. The influence of an audience response system on knowledge retention: an application to resident education. Am J Obstet Gynecol 2005;193:1827–30.

  • Miller RG, Ashar BH, Getz KJ. Evaluation of an audience response system for the continuing education of health professionals. J Contin Educ Health Prof 2003;23:109–15.

  • Karpicke JD, Roediger III HL. The critical importance of retrieval for learning. Science 2008;319:966–8.

  • Tregonning AM, Doherty DA, Hornbuckle J, et al. The audience response system and knowledge gain: a prospective study. Med Teach 2012;34:e269–74.

  • Stowell JR, Nelson JM. Benefits of electronic audience response systems on student participation, learning, and emotion. Teach Psychol 2007;34:253–8.

Acknowledgments

The authors gratefully acknowledge all of the medical students who participated in the study and completed both quizzes. They would also like to thank Mr John Steele and Mr Mark Dodd for providing technical assistance throughout the study.

References

Supplementary materials

  • Supplementary Data

    This web only file has been produced by the BMJ Publishing Group from an electronic file supplied by the author(s) and has not been edited for content.

Footnotes

  • Contributors All five authors contributed to the conception and design of the study or data analysis, assisted in drafting the manuscript, approved the final submission, and agreed to be accountable for the work. Specifically, TEM, HG and JC designed the study, SMM created the lecture content, and NGS created the data analysis plan.

  • Competing interests None declared.

  • Ethics approval The study was approved by the Johns Hopkins University Institutional Review Board, Study #NA_00073103. This work was carried out in accordance with the Declaration of Helsinki, including, but not limited to, there being no potential harm to participants, the anonymity of participants being guaranteed, and the informed consent of participants being obtained.

  • Provenance and peer review Not commissioned; externally peer reviewed.
