Date: Wed, 24 Jun 2020 01:02:33 +0000

Twainians and Twainiacs, lend me your peepers!
I have written a computer program that compares two documents to estimate the likelihood that they share an author. Among many other pairs of writings (by MLK, Malcolm X, Ted Kaczynski, and others), I compared "The Adventures of Tom Sawyer" with "Adventures of Huckleberry Finn."
The PDF report that the program generates gives the average sentence and word lengths for both books being analyzed; the frequency of various symbols (from commas to @); the phrases (of ten letters or more) that the two documents have in common ("Tom" and "Huck" share 4,900 such phrases!); and the uncommon* words used in both books, with counts and percentages.
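The post doesn't say how the program computes these figures, so here is a minimal Python sketch of how statistics like these could be produced. It is not the program's actual code. Assumptions: sentences end at ".", "!" or "?"; a "phrase" is hypothetically defined as two to four consecutive words within one sentence whose combined length is at least ten characters; all function names are made up for illustration.

```python
import re
from collections import Counter


def sentences(text: str) -> list[str]:
    # Naive split after ., ! or ? followed by whitespace ("Mr." causes false splits).
    return [s for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s]


def words(text: str) -> list[str]:
    # Lowercased runs of letters, allowing internal apostrophes ("huck's").
    return re.findall(r"[a-z]+(?:'[a-z]+)*", text.lower())


def basic_stats(text: str) -> dict[str, float]:
    sents = sentences(text)
    ws = words(text)
    return {
        # Average number of words per sentence.
        "avg_sentence_words": len(ws) / len(sents) if sents else 0.0,
        # Average number of characters per word (apostrophes included).
        "avg_word_chars": sum(len(w) for w in ws) / len(ws) if ws else 0.0,
    }


def symbol_frequency(text: str) -> Counter:
    # Count every character that is not a letter, digit, or whitespace: , ; " @ etc.
    return Counter(ch for ch in text if not ch.isalnum() and not ch.isspace())


def phrases(text: str, min_letters: int = 10, max_words: int = 4) -> set[str]:
    # Hypothetical definition: 2..max_words consecutive words inside one sentence
    # whose combined length is at least min_letters characters.
    found = set()
    for sent in sentences(text):
        ws = words(sent)
        for n in range(2, max_words + 1):
            for i in range(len(ws) - n + 1):
                gram = ws[i:i + n]
                if sum(len(w) for w in gram) >= min_letters:
                    found.add(" ".join(gram))
    return found


def shared_phrases(a: str, b: str) -> set[str]:
    # Phrases that occur at least once in both documents.
    return phrases(a) & phrases(b)
```

For example, `shared_phrases(tom_sawyer_text, huck_finn_text)` would return the set of phrases found in both books. The size of that set depends heavily on how "phrase" is defined, so this sketch would not necessarily reproduce the 4,900 figure.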
I would be happy to email the PDF report my app generated to anybody here who is interested. Caveat lector: it is 385 pages long.
* In this case, "uncommon" means any word not among the 3,000 most-used English words.
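One way the shared uncommon words, with counts and percentages, might be computed is sketched below in Python. Again, this is a hypothetical illustration rather than the program's own code: the tiny `COMMON_WORDS` set is a placeholder for a real list of the 3,000 most-used English words, and each percentage is assumed to be the word's share of all words in that document.

```python
import re
from collections import Counter

# Placeholder only: a real run would load the 3,000 most-used English words.
COMMON_WORDS = frozenset({"the", "a", "and", "to", "of", "he", "she", "was", "said", "it", "in", "that"})


def words(text: str) -> list[str]:
    # Lowercased runs of letters, allowing internal apostrophes.
    return re.findall(r"[a-z]+(?:'[a-z]+)*", text.lower())


def shared_uncommon_words(a: str, b: str, common: frozenset[str] = COMMON_WORDS) -> dict[str, tuple[int, float, int, float]]:
    wa, wb = words(a), words(b)
    # Count only words that are not in the common-word list.
    ca = Counter(w for w in wa if w not in common)
    cb = Counter(w for w in wb if w not in common)
    result = {}
    # Keep words used in both documents: (count in a, % of a, count in b, % of b).
    for w in sorted(ca.keys() & cb.keys()):
        result[w] = (ca[w], 100 * ca[w] / len(wa), cb[w], 100 * cb[w] / len(wb))
    return result
```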
- B. Clay Shannon