90. Who do you write like? The limitations of literary analysis tools

Who can resist a personality test or a fortune teller? Writers are no exception. The online tool I Write Like promises to tell you which famous author your style most resembles. The tool works by Bayesian analysis, much like the spam filter on your e-mail. I tried it and was rewarded with the answer that I wrote like Vladimir Nabokov.
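For the curious, here is a minimal sketch of the kind of Bayesian (naive Bayes) classification a spam filter uses, applied to authors instead of spam. It is not I Write Like's actual code: the function names, the equal weighting of authors and the add-one smoothing are all assumptions made for illustration.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lower-case the text and split it into alphabetic words."""
    return re.findall(r"[a-z]+", text.lower())


def train(samples):
    """samples maps an author name to a training text (hypothetical data)."""
    return {author: Counter(tokenize(text)) for author, text in samples.items()}


def classify(model, text):
    """Return the author whose word frequencies make the text most probable."""
    vocab = set().union(*model.values())
    words = tokenize(text)
    best_author, best_score = None, -math.inf
    for author, counts in model.items():
        total = sum(counts.values())
        # Sum of log-probabilities with add-one smoothing; every author is
        # assumed equally likely beforehand (an illustrative simplification).
        score = sum(
            math.log((counts[w] + 1) / (total + len(vocab))) for w in words
        )
        if score > best_score:
            best_author, best_score = author, score
    return best_author
```

Note that a classifier like this always names the most probable of the authors it knows, however poor the match, which is worth bearing in mind when reading its verdicts.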


Then I tried it again. And again. Seventeen times with seventeen different stories. I got thirteen answers ranging from the flattering Tolstoy to the surprising Stephenie Meyer. Five times I got James Joyce. So I started to wonder what the tool was really responding to and analysed the five “Joycean” stories in more detail.

The stories didn’t have genre in common. Two were literary, one was humour, one a thriller and the final one a psychological flash story. Did they then have some stylistic similarity? I used the Hemingway app, which measures lexical complexity and assigns a readability score. They averaged grade 4.2, but varied widely from grade 7 (more complex) to grade 2 (less complex).

They also had a higher average “lexical density” than is typical for fiction, and higher than the non-“Joycean” stories. Lexical density is a measure of how many words in a text carry information (nouns, adjectives, verbs and adverbs) compared with non-informative grammatical words (such as articles, conjunctions and prepositions). Fiction typically has a lexical density of between 49% and 51%. Only one of the five “Joycean” stories (20%) fell within this range, but then only 25% of the non-“Joycean” stories did, so falling outside the typical range did not set the “Joycean” stories apart.
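Lexical density is simple enough to estimate yourself. The sketch below is a rough approximation, not the method I used above: instead of tagging parts of speech, it treats any word that is not in a short, hand-picked list of function words as a content word. Both the list and the function name are illustrative assumptions, and a serious measurement would use a proper part-of-speech tagger.

```python
import re

# A small, illustrative list of grammatical (function) words.
# This is an assumption for the sketch; real lists are much longer.
FUNCTION_WORDS = {
    "a", "an", "the", "and", "or", "but", "if", "of", "to", "in", "on",
    "at", "by", "for", "with", "from", "as", "that", "this", "it", "is",
    "was", "be", "are", "were", "he", "she", "they", "i", "you", "we",
}


def lexical_density(text):
    """Return the percentage of words that are not function words."""
    words = re.findall(r"[a-z]+(?:'[a-z]+)?", text.lower())
    if not words:
        return 0.0
    content_words = [w for w in words if w not in FUNCTION_WORDS]
    return 100.0 * len(content_words) / len(words)
```

For example, in "The cat sat on the mat" three of the six words (cat, sat, mat) count as content words, giving a density of 50%.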

To check that the I Write Like website wasn’t just throwing out random names, I fed it the same texts on two different days. It gave the same answers, so it is measuring something. Then I ran the obvious test – I fed the tool text from James Joyce’s A Portrait of the Artist as a Young Man. It identified this as being like Agatha Christie! To be fair, it identified Tolstoy’s Anna Karenina as being like Tolstoy.

In the course of this chase, I also looked at another literary analysis tool, the Online Authorship Attribution Tool. This is similar to the tools used by universities to detect plagiarism in student essays. It compares three features of an unidentified text with known samples: the use of function words, such as “and” and “the”, which are independent of content; punctuation; and lexical structure, such as sentence length, word length and complexity of vocabulary. This tool failed to identify any of my “Joycean” stories as being like James Joyce. However, it also failed to identify chapter five of Portrait of the Artist as being by Joyce.
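The three kinds of feature described above can be sketched in a few lines. This is only an illustration of the general idea, not the Online Authorship Attribution Tool's code: the feature names, the tiny function-word list and the crude unscaled distance are all assumptions.

```python
import re
from collections import Counter

# Illustrative choices only; a real tool would use far longer lists.
FUNCTION_WORDS = ("and", "the", "of", "to", "a", "in", "but")
PUNCTUATION = ",;:!?-"


def style_features(text):
    """Turn a text into a dictionary of simple style measurements."""
    words = re.findall(r"[a-z]+(?:'[a-z]+)?", text.lower())
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    n = max(len(words), 1)
    counts = Counter(words)
    # 1. Function words: share of all words taken up by each one.
    features = {f"fw_{w}": counts[w] / n for w in FUNCTION_WORDS}
    # 2. Punctuation: marks per word.
    features["punct_per_word"] = sum(text.count(p) for p in PUNCTUATION) / n
    # 3. Lexical structure: sentence length, word length, vocabulary variety.
    features["avg_sentence_len"] = len(words) / max(len(sentences), 1)
    features["avg_word_len"] = sum(len(w) for w in words) / n
    features["type_token_ratio"] = len(counts) / n
    return features


def distance(a, b):
    """Sum of absolute differences between two feature dictionaries.

    Unscaled, so sentence length dominates; a real tool would normalise.
    """
    return sum(abs(a[k] - b[k]) for k in a)


def closest_sample(unknown_text, known_samples):
    """Return the label of the known sample most similar to the unknown text."""
    unknown = style_features(unknown_text)
    return min(
        known_samples,
        key=lambda label: distance(unknown, style_features(known_samples[label])),
    )
```

As with the Bayesian approach, this always returns the nearest of the known samples, even when none of them is a good match.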


I looked more closely at my five “Joycean” stories. They did have one thing in common – they all contained an extended monologue or internal dialogue. I have no way of knowing for sure whether this was what the tool was detecting. But it made sense, more sense than the idea that I have thirteen different styles. The story that was like Stephenie Meyer (author of the vampire romance Twilight series) was sci-fi and contained a hunt.

The moral of the investigation is: use these toys for fun by all means, but don’t take their readings any more seriously than you would a personality test or a horoscope in a magazine.
