Tien Chiu


June 21, 2011 by Tien Chiu

Scanning success!

Yesterday I “scanned” two entire magazines – Prairie Wool Companion #16 and Weaver’s #44.  The OCR software, while not perfect, is pretty darned good!  Here are “before” and “after” shots of a small section with no weaving content (which I think is minor enough to fall under “fair use”):

original text for 1/2 page
The same page, text-recognition only

As you can see, it’s not perfect – there are a few misspellings and a few cropped photos – but it’s good enough to use as a searchable PDF.  Because there’s an option to save it as a PDF/A file that keeps the page image, I’m not worried about OCR errors making it unreadable.  The saved file contains the original image PLUS the searchable text, all on the same page.  (The searchable text is overlaid on the original, but invisible, so all you see is the image with whatever you searched for highlighted.)  By reducing the image quality slightly, I halved the file size, resulting in a 25 MB file for the entirety of Weaver’s #44 – small enough to place in Evernote, my online note-taking software.  (The advantage of putting it in Evernote is that I can search all my magazine issues at once.)

Here is how I did it:

  • I set up a camera on a copy stand, which is basically a device to hold the camera so that the lens points directly downwards.  (Think of it as a tripod, except vertical instead of horizontal.)
  • I put an open magazine underneath the camera, and positioned both camera and magazine so the two visible pages occupied the entire field of view.
  • I placed a pane of double-strength window glass over the magazine to flatten the pages.
  • 120W halogen floodlights, placed on either side at a shallow angle to avoid reflection from the glass, provided the light.
  • I used a remote control to trigger the camera, so I wouldn’t jostle the camera body (and blur the photo) while pressing the button.
  • I loaded the resulting images onto the computer and processed them (an entire issue’s worth at once) using ABBYY FineReader 10.0, the best consumer-affordable OCR software available.  Processing the images hogged the CPU for about 20 minutes for each issue – definitely intensive work!
  • I then saved the document in PDF/A format, with image quality set to medium (screen quality, not print quality).
  • Voilà!  A searchable PDF of an entire magazine issue.

There was quite a bit of camera-fiddling to make sure I had the exposure and focus right – and I think I will go back and fiddle some more before doing the entire “run” of magazines.  At the moment, the images are yellower than I’d like; I’m pretty sure there’s a camera setting to compensate.
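If a camera setting doesn’t fully remove the yellow cast, the same kind of correction can in principle be applied afterwards in software.  Here is a minimal sketch in Python of one simple approach: pick a pixel from blank paper that should be white, and scale the red, green, and blue channels so that pixel comes out neutral.  This is only an illustration, not what the camera or the OCR software does internally; the function name and the sample pixel values are made up, and it assumes 8-bit RGB values with no zero channel in the paper sample.

```python
def white_balance(pixels, white_patch):
    """Scale each RGB channel so that `white_patch` becomes a neutral gray.

    `pixels` is a list of (r, g, b) tuples with values 0-255.
    `white_patch` is the (r, g, b) reading of something that should be white,
    such as blank paper. Assumes none of its channels is zero.
    """
    r_w, g_w, b_w = white_patch
    # Aim for the average of the three channels so overall brightness
    # stays roughly the same.
    target = (r_w + g_w + b_w) / 3
    gains = (target / r_w, target / g_w, target / b_w)
    return [
        tuple(min(255, round(channel * gain)) for channel, gain in zip(px, gains))
        for px in pixels
    ]


# Hypothetical readings: blank paper photographed under halogen light
# comes out yellowish (too little blue).
paper = (240, 225, 180)
corrected = white_balance([paper, (100, 100, 100)], white_patch=paper)
print(corrected[0])  # the paper pixel now has equal R, G, and B
```

The blue channel gets boosted and the red channel reduced, which is the opposite of the yellow cast.  Fixing it in the camera is still preferable, since it avoids reprocessing every photo.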

There are a few other things that I’m considering doing, like going through each magazine to “clean up” the text-recognition.  Not to correct misspellings – I haven’t got the time or patience to proofread every single page – but to delete advertisements and similar material that might result in extraneous words.  For example, when I search for “temple”, I don’t want to see all the ads that contain the word “temple” (there might be a lot of them); I want to see articles that contain the word.  So the cleanup will take time, but is worthwhile, I think.
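To make the idea concrete, here is a small Python sketch of what the cleanup buys.  It is not how Evernote or the OCR software works; it assumes each page’s recognized text is available as a string, and it marks ad pages with a flag instead of deleting them (a variation on the plan above).  The `Page` class, the `search` function, the page numbers, and the sample text are all invented for illustration.

```python
from dataclasses import dataclass


@dataclass
class Page:
    issue: str           # e.g. "Weaver's #44"
    number: int          # page number within the issue (made up here)
    text: str            # OCR text recognized for the page
    is_ad: bool = False  # set by hand during cleanup


def search(pages, word, include_ads=False):
    """Return (issue, page number) for every page whose text contains `word`."""
    word = word.lower()
    return [
        (p.issue, p.number)
        for p in pages
        if word in p.text.lower() and (include_ads or not p.is_ad)
    ]


# Invented sample pages, just to show the effect of the ad flag.
pages = [
    Page("Weaver's #44", 12, "Using a temple keeps the selvedges even."),
    Page("Weaver's #44", 47, "Temple sale! Order yours today.", is_ad=True),
    Page("Prairie Wool Companion #16", 8, "Notes on warp tension."),
]

hits = search(pages, "temple")
print(hits)  # only the article page, not the ad
```

Passing `include_ads=True` brings the ad page back into the results, which is roughly what an uncleaned search would return.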

Meanwhile, of course, I continue to crank away on my new job (first day went well, lots of information to digest), and will work some more on the weaving later this morning.  I need to finish debugging the warp, and then I can start weaving samples and taking measurements, so I can knit up a blank accurately.

All in good time…!

Filed Under: All blog posts, textiles, weaving


Comments

  1. terri says

    June 21, 2011 at 2:52 pm

    Glad to hear your first day went well–enjoy the new job!

  2. Julie Sohns says

    June 22, 2011 at 5:24 am

    The camera setting you want to check for color correction is White Balance. If you set the White Balance to match the type of lights you are using you should lose the yellow color cast.

© Copyright 2016 Tien Chiu · All Rights Reserved ·

 
