TWAIN-L Archives

Mark Twain Forum

TWAIN-L@YORKU.CA

Subject:
From: Clay Shannon <[log in to unmask]>
Reply To: Clay Shannon <[log in to unmask]>
Date: Wed, 24 Jun 2020 01:02:33 +0000
Content-Type: text/plain
Parts/Attachments: text/plain (8 lines)
Twainians and Twainiacs, lend me your peepers!
I have written a computer program that compares two documents to estimate how likely they are to have the same author. Among many other pairs of writings (MLK, Malcolm X, Ted Kaczynski, etc.), I compared "The Adventures of Tom Sawyer" with "Adventures of Huckleberry Finn."
The PDF report that the program generates gives, for the two books being analyzed: the average length of sentences and words; the frequency of various symbols (from commas to @); phrases of ten letters or more that both documents have in common ("Tom" and "Huck" share 4,900 such phrases!); and uncommon* words used in both books, with counts and percentages.
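For readers curious how measurements of this kind can be computed, here is a minimal sketch in Python. It is not the program described in this post, whose source is not shown here. The function names (text_stats, shared_phrases), the simple punctuation-based sentence splitting, and the choice to treat a "phrase" as a run of three consecutive words totalling at least ten letters are all assumptions made for illustration.

```python
import re
from collections import Counter

WORD_RE = r"[A-Za-z]+(?:'[A-Za-z]+)?"  # assumed definition of a "word"


def text_stats(text):
    """Average sentence length, average word length, and symbol counts."""
    # Naive sentence split on ., ! and ? (an assumption; real prose needs more care).
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    words = re.findall(WORD_RE, text)
    # A "symbol" here is any character that is neither alphanumeric nor whitespace.
    symbols = Counter(ch for ch in text if not ch.isalnum() and not ch.isspace())
    return {
        "avg_sentence_words": len(words) / len(sentences) if sentences else 0.0,
        "avg_word_letters": sum(len(w) for w in words) / len(words) if words else 0.0,
        "symbol_counts": symbols,
    }


def shared_phrases(text_a, text_b, n_words=3, min_letters=10):
    """Word sequences of n_words words and at least min_letters letters found in both texts."""
    def phrases(text):
        words = re.findall(WORD_RE, text.lower())
        grams = (words[i:i + n_words] for i in range(len(words) - n_words + 1))
        return {" ".join(g) for g in grams if sum(len(w) for w in g) >= min_letters}

    return phrases(text_a) & phrases(text_b)
```

Calling text_stats on each book and shared_phrases on the pair would give figures comparable in kind, though not necessarily in value, to those in the report.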
I would be happy to email the PDF report that my app generated to anybody here who is interested. Lector emptor: it is 385 pages long.
* In this case, the definition of "uncommon" is any word other than the 3,000 most-used English words.
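The "uncommon word" comparison under that definition might look something like the sketch below. It is an illustration only, not the app's actual code: the function name shared_uncommon_words is hypothetical, the caller must supply the list of common words (the 3,000-word list itself is not given here), and reporting each percentage as a share of all words in that text is an assumption.

```python
import re
from collections import Counter


def shared_uncommon_words(text_a, text_b, common_words):
    """Map each uncommon word found in both texts to
    (count in A, percent of A's words, count in B, percent of B's words)."""
    def count(text):
        words = re.findall(r"[a-z]+(?:'[a-z]+)?", text.lower())
        # Keep only words that are not on the supplied common-word list.
        uncommon = Counter(w for w in words if w not in common_words)
        return uncommon, len(words)

    a, total_a = count(text_a)
    b, total_b = count(text_b)
    return {
        w: (a[w], 100 * a[w] / total_a, b[w], 100 * b[w] / total_b)
        for w in a.keys() & b.keys()
    }
```

Passing the common words as a set keeps each membership check fast, which matters for book-length texts.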

- B. Clay Shannon
