Tuesday, May 30, 2006

Scan to PDF in Ubuntu, with Beagle Searchable Meta Data

This bash script lets you scan directly to a PDF and then search your scanned PDF's using beagle, not just by file name, but by the information (meta data), that you can save with your PDF.

I developed this script because I really want to have a paperless desk and I could not find an easy way to scan documents to PDF, (and find them again!).

A picture's worth a thousand words...

I have a "launcher" on my desktop called "scan", I just click it and....

Select colour or Grey (color or Gray) for the Americans!





Then press OK for each page of the PDF you want to make. (Cancel to finish)



Then give the PDF a file name... (jam recipe!)



Then some meta data so beagle can find it... jam, recipe, grannys, yum.



Thats all !

Now search for "grannys jam" in Beagle.....



And there it is !!



Here is the script, it was made pretty fast so......

Please feel free to mess about with it. And post improvements back here !

Programs you'll need..
zenity
ps2pdf
pdftk
scanimage

Copy the script into its own directory, chmod u+rxw. Thats all!.

-----------------------------------

#!/bin/sh

#scan to pdf with metadata, by Mac Jones, New Zealand
#http://maconstuff.blogspot.com/

#scan a batch

#decide grey or colour (gray or color for the Americans!)
colour=`zenity --list --title "Color or Gray?" --radiolist --column "-" --column "Scan" TRUE Gray FALSE Color`

a=0 #page counter
cont=1 #should we continue?

until [ $cont -eq "0" ] #keep doing it until cont variable is not a zero.
do
echo -n "$a "
let "a+=1"
if zenity --question --text "OK to scan a page, Cancel to finish, Page=$a" --title "Scanning pages"
then
cont=1
scanimage --format pnm --resolution 150 --mode $colour > "$a.pnm"
else
cont=0
fi

done # No surprises, so far.

#convert the raw file to postscript
convert -density 150 *.pnm out.ps | zenity --progress --auto-close --title "Converting to Postscript"

#convert the postscript to pdf
ps2pdf out.ps out.pdf | zenity --progress --auto-close --title "Converting Postscript to PDF"

#remove raw scan files
rm *.pnm

#remove old ps files
rm out.ps

#beep to get attention after processing
echo -e "\a"

#add the metadata and file name.
#this meta data can be searched from Beagle in Ubuntu.

#echo "Please enter a name for the PDF file (** no .pdf on end)"
nm=`zenity --entry --text "Enter file name, (no .pdf on the end)" --title "File Name?"`

#echo "Please enter Metadata for searching"
meta=`zenity --entry --text "Meta data for searching" --entry-text=$nm --title "Meta Data for Searching"`

echo "InfoKey: Producer" > tmp
echo "InfoValue: $meta" >> tmp
echo "InfoKey: Keywords" >> tmp
echo "InfoValue: $meta" >> tmp
echo "InfoKey: Title" >> tmp
echo "InfoValue: $nm" >> tmp

#update the metadata
pdftk out.pdf update_info tmp output "$nm.pdf"

#rm metadata file and pdf
rm tmp
rm out.pdf

zenity --info --text="All done, $nm.pdf is ready!" --title "Thanks!"

--------------------

Why this blog is here

This blog is a place for me to put the interesting things I find on my never ending adventure into Linux. I should have done it ages ago, as two years in I feel like a bunch of the stuff I learnt the hard way may have beenuseful to others !

For reference, I'm using BSD and Debian, my business has 80+ computers on Debian (Damn Small Linux & Ubuntu). BSD is running our firewalls.

So here goes....