In this HIT you will be presented with a short system generation.
Usually the generation will be only a single sentence. Your job is to
rate the generation across 2 axes:
- Fluency/Grammaticality: Is the system's generation grammatical,
easy-to-read, and fluent?
- Commonsense: Is the system's generation describing a plausible,
realistic, and commonsensical scenario?
You will be able to rate each of the two axes on a scale from 1 to 5,
with 1 being the lowest/worst and 5 the highest/best. The specific
scales are:
- Fluency/Grammaticality:
  - 5/5 (excellent): The generation is grammatical and fluent.
  - 4/5 (good): The sentence largely makes sense, but there are
    some small grammar issues/out-of-place words that don't make
    for the best writing.
  - 3/5 (okay): The grammar is okay and it's possible to read,
    but it definitely doesn't sound like a human wrote it.
  - 2/5 (poor): Even though I can kind-of tell the meaning,
    it's difficult to read this unnatural sentence.
  - 1/5 (terrible): The generation has severe grammatical errors
    or is almost or completely unreadable.
- Commonsense:
  - 5/5 (this is reasonable+plausible!): This describes a very
    coherent/plausible/reasonable situation.
  - 4/5 (mostly reasonable): This could reasonably happen.
  - 3/5 (neutral): This situation might happen, but it's not
    that likely/it's a bit weird.
  - 2/5 (mostly unreasonable): Most of what's expressed here
    couldn't happen at all.
  - 1/5 (this wouldn't happen!): This is impossible/nonsensical.
Note: for rating fluency/grammaticality, don't worry about the
commonsense axis! There can be grammatical sentences that are
nonsensical, and vice versa (see the examples).