Pilot test images
Official pilot test images from the East Asian Library collections should go on the AWS server in a sub-folder of the /var/www/html/pilot_images/ directory. This way we can view them from the web, and they will be available to sites running on the Django framework as well as to Tesseract OCR. Here are the steps necessary to upload the images and make them accessible. It's best to do this via the Terminal shell on macOS/Linux or a terminal-like environment (probably PowerShell?) on Windows.
- Copy the images from your camera/phone to a local folder on your computer, e.g., /home/pete/test_photos
- Log in to the AWS server using the instructions on this wiki page.
- On the AWS server, go to the pilot_images/ directory: $ cd /var/www/html/pilot_images
- On the AWS server, create a folder to house your images, with a descriptive name: $ mkdir nexus5x
- On your local machine, copy the files over to the target directory on the AWS server via SSH copy (scp): $ scp -i LOCATION_OF_ccing.pem_FILE /home/pete/test_photos/* [email protected]:/var/www/html/pilot_images/nexus5x/.
- On the AWS server, make sure the images are world-readable: $ chmod -R 755 /var/www/html/pilot_images/nexus5x
- Open a web browser and make sure you can view the images at the expected URL: http://ec2-54-173-153-28.compute-1.amazonaws.com/pilot_images/nexus5x/
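The upload steps above can be collected into a small shell sketch. This is only an illustration, not an official script: the function names, the DRY_RUN switch, and the argument order are all hypothetical, and the user@host login must be taken from the login instructions on the wiki because it is not filled in here. It also uses mkdir -p instead of plain mkdir so that a rerun does not fail on an existing folder.

```shell
#!/bin/sh
# Sketch of the manual upload steps. All names below are hypothetical.

HOST_URL="http://ec2-54-173-153-28.compute-1.amazonaws.com"
REMOTE_ROOT="/var/www/html/pilot_images"

# Directory on the server for a given collection folder name.
remote_dir_for() {
    printf '%s/%s\n' "$REMOTE_ROOT" "$1"
}

# URL at which the folder should be visible once uploaded.
public_url_for() {
    printf '%s/pilot_images/%s/\n' "$HOST_URL" "$1"
}

# upload_pilot_images KEY_FILE REMOTE_LOGIN LOCAL_DIR FOLDER_NAME
# REMOTE_LOGIN is the user@host from the login instructions (not filled in here).
# Set DRY_RUN=1 to print the commands instead of running them.
upload_pilot_images() {
    key=$1
    login=$2
    local_dir=$3
    folder=$4
    target=$(remote_dir_for "$folder")
    run=""
    [ "${DRY_RUN:-0}" = "1" ] && run="echo"

    # Create the folder on the server (mkdir -p so a rerun does not fail).
    $run ssh -i "$key" "$login" "mkdir -p '$target'" || return 1
    # Copy every file from the local folder to the server.
    $run scp -i "$key" "$local_dir"/* "$login:$target/" || return 1
    # Make the images world-readable, as in the manual steps.
    $run ssh -i "$key" "$login" "chmod -R 755 '$target'" || return 1
    echo "Check: $(public_url_for "$folder")"
}
```

Running it with DRY_RUN=1 first, e.g. `DRY_RUN=1 upload_pilot_images LOCATION_OF_ccing.pem_FILE REMOTE_LOGIN /home/pete/test_photos nexus5x`, prints the ssh and scp commands so they can be checked before anything is copied.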
We now have 2-3 collections of ~100 books each, some already with UCLA Library barcodes and some not, that we can use for an official pilot test, i.e., taking pictures of them and their associated barcodes, using the barcodes to rename the images, and then uploading these images to Scribe. Details are available on this Google doc.
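As a rough illustration of the "use the barcodes to rename the images" step, here is a minimal shell sketch. It assumes a mapping file with one `original_filename,barcode` pair per line; that file format, and both function names, are hypothetical and are not taken from the Google doc. Files are renamed to the barcode plus their original extension, and existing files are never overwritten.

```shell
#!/bin/sh
# Sketch only: the mapping-file format and function names are assumptions.

# new_name_for ORIGINAL_FILE BARCODE -> BARCODE plus the original extension
new_name_for() {
    ext=${1##*.}
    printf '%s.%s\n' "$2" "$ext"
}

# rename_by_barcode DIR MAPPING_CSV
# Each line of MAPPING_CSV is expected to look like: IMG_0001.jpg,BARCODE
rename_by_barcode() {
    dir=$1
    while IFS=, read -r file barcode; do
        # Skip blank or incomplete lines.
        [ -n "$file" ] && [ -n "$barcode" ] || continue
        # Report, but do not stop on, files listed in the mapping that are absent.
        [ -f "$dir/$file" ] || { echo "missing: $file" >&2; continue; }
        # -n: never overwrite an image that already carries this barcode name.
        mv -n "$dir/$file" "$dir/$(new_name_for "$file" "$barcode")"
    done < "$2"
}
```

The sketch assumes every image has a file extension and that the mapping file uses Unix line endings.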
Until we get actual images of book covers and title pages from the catalogers, we can use one of the Internet Archive’s extensive collections of book cover images for testing -- see for example https://archive.org/search.php?query=book%20covers.
Update as of 3/30/17: One likely workflow for CCing is that librarians will drop the book page images into a web folder somewhere, maybe hosted by Box (perhaps with other metadata in different folders and files). They would then provide the folder's link to a server-side app, via a basic interface, to kick off the process of "ingesting" the images into CCing's OCR->Scribe workflow.
Initial testing of this workflow with Box ran into problems because it's not clear how to get direct download links of the full-resolution versions of the cover images from Box. So for now, we've installed an Apache web server on the AWS machine and are serving the test images from there. The actual location of the images on the server is in /var/www/html/test_images. They are accessible via the web at URLs like the following:
A bunch of book covers from part 9 of the Internet Archive's "Amazon covers crawl":
http://ec2-54-173-153-28.compute-1.amazonaws.com/test_images/
Some old-timey book covers that we tried to serve from Box:
http://ec2-54-173-153-28.compute-1.amazonaws.com/test_images/old_sample/
A bunch more images from Amazon, via the IA:
http://ec2-54-173-153-28.compute-1.amazonaws.com/test_images/amazon/
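The relationship between a location on the server and its web URL can be sketched as a small shell helper. This assumes Apache's document root is /var/www/html, which is what the paths and URLs above imply but is not stated outright; the function name url_for_path is hypothetical.

```shell
#!/bin/sh
# Sketch: map a path on the AWS server to the URL Apache would serve it at.
# DOC_ROOT is an assumption inferred from the paths and URLs in this page.

DOC_ROOT="/var/www/html"
BASE_URL="http://ec2-54-173-153-28.compute-1.amazonaws.com"

url_for_path() {
    case "$1" in
        "$DOC_ROOT"/*)
            # Strip the document root and prepend the server's base URL.
            printf '%s/%s\n' "$BASE_URL" "${1#"$DOC_ROOT"/}"
            ;;
        *)
            echo "not under $DOC_ROOT: $1" >&2
            return 1
            ;;
    esac
}
```

For example, `url_for_path /var/www/html/test_images/amazon/` yields the last URL listed above, and a path outside the document root is rejected rather than mapped to a URL that would not resolve.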