Chapter 17

Handbook of Learning Analytics
First Edition

Data Mining Large-Scale Formative Writing

Peter W. Foltz & Mark Rosenstein


Student writing in digital educational environments can provide a wealth of information about the processes involved in learning to write, as well as evidence of how the digital environment affects those processes. Developing writing skills depends heavily on students having opportunities to practice, particularly when they receive frequent feedback and are taught strategies for planning, revising, and editing their compositions. Formative systems that incorporate automated writing scoring give students the opportunity to write, receive feedback, and then revise essays in a timely, iterative cycle. This chapter analyzes a large-scale formative writing system, drawing on over a million student essays written in response to several hundred pre-defined prompts. The aims of the analysis are to improve educational outcomes, better understand the role of feedback in writing, drive improvements in formative technology, and design better kinds of feedback and scaffolding to support students in the writing process.


About this Chapter

Data Mining Large-Scale Formative Writing

Book Title
Handbook of Learning Analytics

pp. 199-210




Society for Learning Analytics Research

Peter W. Foltz (1, 2)
Mark Rosenstein (2)

Author Affiliations
1. Institute of Cognitive Science, University of Colorado, USA
2. Advanced Computing and Data Science Laboratory, Pearson, USA

Charles Lang (3)
George Siemens (4)
Alyssa Wise (5)
Dragan Gašević (6)

Editor Affiliations
3. Teachers College, Columbia University, USA
4. LINK Research Lab, University of Texas at Arlington, USA
5. Learning Analytics Research Network, New York University, USA
6. Schools of Education and Informatics, University of Edinburgh, UK
