Chapter 8

Handbook of Learning Analytics
First Edition

Natural Language Processing and Learning Analytics

Danielle S. McNamara, Laura K. Allen, Scott A. Crossley,
Mihai Dascalu, & Cecile A. Perret

Abstract

Language is central to education because it is the conduit through which information is communicated and understood. Researchers in learning analytics can therefore benefit from methods developed to analyze language both accurately and efficiently. Natural language processing (NLP) techniques offer such methods: they provide computational analyses of different aspects of language as they relate to particular tasks. In this chapter, the authors discuss several available NLP tools that can be harnessed to understand discourse, as well as some applications of these tools in education. A primary focus of these tools is the automated interpretation of human language input in order to drive human–computer interaction. To that end, the tools measure a variety of linguistic features important for understanding text, including coherence, syntactic complexity, lexical diversity, and semantic similarity. The authors conclude the chapter with a discussion of computer-based learning environments that have employed NLP tools (i.e., intelligent tutoring systems [ITS], massive open online courses [MOOCs], and computer-supported collaborative learning [CSCL]) and of how such tools can be employed in future research.



About this Chapter

Title
Natural Language Processing and Learning Analytics

Book Title
Handbook of Learning Analytics

Pages
pp. 93–104

Copyright
2017

DOI
10.18608/hla17.008

ISBN
978-0-9952408-0-3

Publisher
Society for Learning Analytics Research

Authors
Danielle S. McNamara¹
Laura K. Allen¹
Scott A. Crossley²
Mihai Dascalu³
Cecile A. Perret⁴

Author Affiliations
1. Psychology Department, Arizona State University, USA
2. Applied Linguistics and ESL Department, Georgia State University, USA
3. Computer Science Department, University Politehnica of Bucharest, Romania
4. Institute for the Science of Teaching and Learning, Arizona State University, USA

Editors
Charles Lang⁵
George Siemens⁶
Alyssa Wise⁷
Dragan Gašević⁸

Editor Affiliations
5. Teachers College, Columbia University, USA
6. LINK Research Lab, University of Texas at Arlington, USA
7. Learning Analytics Research Network, New York University, USA
8. Schools of Education and Informatics, University of Edinburgh, UK
