Utility-Value Score: A Case Study in System Generalization for Writing Analytics

  • Beata Beigman Klebanov, Educational Testing Service
  • Stacy Priniski, University of Wisconsin–Madison
  • Jill Burstein, Educational Testing Service
  • Binod Gyawali, Educational Testing Service
  • Judith Harackiewicz, University of Wisconsin–Madison
  • Dustin Thoman, San Diego State University
Keywords: automated writing evaluation, data variability, first-year STEM, model evaluation, model generalization, STEM motivation, student writing, utility value, writing analytics


Collecting and analyzing students’ writing samples on a large scale is part of the research agenda of the emerging writing analytics community, which promises to deliver unprecedented insight into the characteristics of student writing. Yet large scale often brings variability in the contexts in which the samples were produced: different institutions, different purposes of writing, and different author demographics, to name just a few possible dimensions of variation. What are the implications of such variation for the ability of automated methods to create valid and meaningful indices or features from the writing samples? This paper presents a case study in system generalization. Building on a system developed to assess the expression of utility value (a social-psychology-based construct) in essays written by first-year biology students at one postsecondary institution, we vary data parameters and observe system performance. From the point of view of social psychology, all these variants represent the same underlying construct (i.e., utility value), so it is tempting to think that an automatically produced utility-value score could consistently provide a meaningful analytic across a large collection of essays. However, the findings show that there are challenges: some variations are easier to deal with than others, and some components of the automated system generalize better than others. The findings are discussed both in the context of the case study and more generally.

