The purpose of an artifact evaluation is to determine whether a piece of software constructed and described in research is reusable and whether the research results are reproducible with the artifact. Reusability includes accessibility and extensibility.

  • Accessibility is the availability of the program and its source code, ease of installation, platform independence, quality of documentation, and the ability to follow simple demonstrative examples.

  • Extensibility is the ability of the evaluator to construct new examples of her own devising and run them with the expected behavior.

  • Reproducibility is the ability to reproduce the results in the paper(s) describing the system, i.e., to use the information in the paper to reconstruct the same findings.

This website will list generators alongside an evaluation of their accessibility and extensibility.

Evaluation Cases

Currently, there are no agreed-upon benchmark tasks for evaluating story generators. Candidates include:

  1. The generation of a 50,000-word (or longer) story with author-supplied parameters (e.g. plot structure, characters, initial relationships, and/or final story goals).
  2. A Turing test for short stories, as exemplified by DigiLit, organized by Dartmouth's Neukom Institute.