Hi Heather-
I use wkhtmltopdf[1] in our web crawling applications to capture a PDF
of a given page, as well as a screenshot.
If you have a list of all the pages you need to convert to PDF, you
could iterate through it with a loop in a simple bash script, as we do
here[2].
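
For illustration, here is a minimal sketch of such a loop. It is not the script at [2]; it assumes wkhtmltopdf is installed and on your PATH, and the function names, the list file, and the output names are all hypothetical. The second function relies on wkhtmltopdf accepting several input pages followed by one output file, which may be closer to the "single PDF per manual" you asked about.

```bash
#!/usr/bin/env bash
# Sketch only: the list file has one URL (or local .html path) per line.
# Function names and output file names are made up for this example.

# One PDF per page, numbered in list order.
pages_to_pdfs() {
  local list="$1" outdir="$2" n=0 url
  mkdir -p "$outdir"
  while IFS= read -r url || [ -n "$url" ]; do
    [ -z "$url" ] && continue            # skip blank lines
    n=$((n + 1))
    # </dev/null keeps the converter from consuming the URL list on stdin
    wkhtmltopdf "$url" "$(printf '%s/page-%04d.pdf' "$outdir" "$n")" </dev/null \
      || echo "failed: $url" >&2
  done < "$list"
}

# All pages of a manual in one PDF, in list order.
pages_to_single_pdf() {
  local list="$1" out="$2" url
  set --                                 # collect the URLs as arguments
  while IFS= read -r url || [ -n "$url" ]; do
    [ -n "$url" ] && set -- "$@" "$url"
  done < "$list"
  wkhtmltopdf "$@" "$out"
}
```

Usage would look something like `pages_to_pdfs urls.txt pdfs` or `pages_to_single_pdf urls.txt manual.pdf`, with one list file per manual.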
Also, even though it is the content that needs to be captured, if there
is any chance you'll need some of the functionality later, it might be
worth creating a WARC of each page.
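
One possible way to do that, sketched here as an assumption rather than a description of our own setup, is GNU Wget's built-in WARC output. The function name and the base name you pass in are hypothetical.

```bash
# Sketch only: write one WARC for a single page using GNU Wget.
# page_to_warc and the "-files" directory suffix are made up for this example.
page_to_warc() {
  local url="$1" name="$2"
  # --warc-file takes a base name; Wget writes NAME.warc.gz by default.
  # --page-requisites also fetches the images and CSS the page needs.
  wget --warc-file="$name" --page-requisites \
       --directory-prefix="$name-files" "$url"
}
```

For example, `page_to_warc http://example.org/manual/page1.html manual-page1` should leave a `manual-page1.warc.gz` next to the downloaded files.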
-nruest
[1] http://wkhtmltopdf.org/
[2] https://github.com/ruebot/arxivdaleascii/blob/master/arxivdaleascii
On 14-11-10 02:31 PM, Heather Ryckman wrote:
> Apologies for cross-postings.
>
> Happy Monday all!
>
> Does anyone have any experience with software that can reliably convert multiple HTML pages to a single PDF document? We have a bunch of online manuals in which each page was created as a separate HTML page. These manuals need to come to the Archives, but conversion to PDF, one page at a time, is extremely time-consuming. Most of these manuals are hundreds of pages apiece.
>
> Has anyone else encountered a similar problem, and do you have any recommendations on how to resolve it efficiently?
>
> PS – The pages don’t have any functionality that needs to be preserved; it’s just the content that needs to be captured.
>
> Thanks for any assistance that you can provide. Have a wonderful day!
>
> Heather Ryckman | Archivist
> Legal | The Co-operators
> 130 Macdonell Street, Guelph, ON N1H 6P8
> Tel: 519-824-4400 ext. 302798 / Toll Free: 1-800-265-2662
> www.cooperators.ca
>
>
> Please consider the environment before printing this message.
> This message, including any documents attached, may contain privileged and confidential information intended for the recipient only. Any unauthorized use, copying or disclosure is prohibited. If you have received this message in error, please notify the sender by email and delete or destroy all copies of this message. We use reasonable safeguards to protect all information collected, used, retained and disclosed in the course of conducting business; however, email may be vulnerable to interception by unauthorized parties. We discourage you from emailing personal or sensitive information. If you provided your email to us, or if you contacted us by email, we accept this as your consent to communicate with you by email. If you do not wish for us to communicate with you by email, please notify us at your earliest convenience.
>
>