Thank you for your participation in this and other similar HITs!

Please take a moment to familiarize yourself with this new HIT by reading the instructions/examples, because things have changed a bit. Thanks again for your work!

In this HIT you will be presented with a dialogue between two people. You will also be given a system generation, which aims to contain the next line of the conversation. Your job is to rate the system generation along two axes:

  • Fluency/Grammaticality: Is the system's generation grammatical, easy-to-read, and fluent?
  • Quality/Coherence: Is the next utterance coherent, reasonable, and the type of thing a person might say, in the context of the dialogue history?

You will rate each of the two axes on a scale from 1 to 5, with 1 being the lowest/worst and 5 the highest/best. The specific scales are:

  • Fluency/Grammaticality:
    • 5/5 (excellent): The generation is grammatical and fluent.
    • 4/5 (good): The sentence largely makes sense, but there are some small grammar issues/out-of-place words that don't make for the best writing.
    • 3/5 (okay): The grammar is okay and it's possible to read, but it definitely doesn't sound like a human wrote it.
    • 2/5 (poor): Even though I can kind of tell the meaning, this unnatural sentence is difficult to read.
    • 1/5 (terrible): The generation has severe grammatical errors or is almost or completely unreadable.
  • Quality/Coherence:
    • 5/5 (perfectly coherent, interesting): The generated next utterance is very relevant and coherent with the dialogue context; a human might say this.
    • 4/5 (mostly relevant): The generation is relevant but not perfect given the dialogue context.
    • 3/5 (neutral): The generation is somewhat plausible/relevant.
    • 2/5 (mostly irrelevant): I see why this could be generated but it doesn't make much sense.
    • 1/5 (wrong/nonsense/irrelevant): The generation doesn't seem to apply to the dialogue at all or doesn't make any sense.

Dialogue context (read me first!):
${prompt}

System's generation (rate this!):
${machine_completion}

Is the system's generation grammatical, easy-to-read, and fluent?

The grammar is okay and it's possible to read, but it definitely doesn't sound like a human wrote it.

Is the next utterance coherent, reasonable, and the type of thing a person might say, in the context of the dialogue history?

The generation is somewhat plausible/relevant.

(Optional) Please let us know if anything was unclear, if you experienced any issues, or if you have any other feedback for us.