|
After scanning a book or article, a little editing will make later reading much easier. All of these steps are optional, but the results are well worth the effort. They can all be done quickly, in a single pass through the text.
First, you should number pages in a way the computer can recognize during playback. This lets you quickly match any passage to its page in the original text and move easily back and forth between pages later as you read Proportionally. Page numbers also make it much easier to correct spelling and unrecognized characters, because you can see at once which page a mistake is on and where on the page it lies.
You should also strip the header and footer lines (if any) from your scanned text, so that you don't keep rereading the same words and having the flow of the main text continually interrupted. As you strip headers, it is often easiest to go through the text page by page, adding outline marks and making other edits as you go. You will be amazed at how fast you can make all these edits by hand.
Text downloaded from CD ROM does not have headers or footers, nor does it have regular page numbers or (hopefully) spelling errors. These problems only crop up when you are scanning in a book. The easiest way to edit scanned text is to rest the book (or stack of pages) in front of the keyboard. As you number each new page, turn the page in the book to expose the current page.
Now you can quickly see what has to be done to the page as it appears on the screen. Any chapter titles and sub-titles will immediately present themselves. Charts, graphs, tables, math and footnotes will all be clearly evident. Do not worry about paragraph indents. All these indents (if present) are automatically removed later during Proportionalizing.
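For readers who would rather script this step outside the program, the header and footer removal described above might be sketched as follows. This is only an illustration in Python and is not part of the program. The function name `strip_running_lines`, the assumption that each page is available as its own string, and the assumption that the header and footer wording is known in advance are all hypothetical.

```python
def strip_running_lines(page_text, header=None, footer=None):
    """Drop a page's first line if it matches the running header,
    and its last line if it matches the running footer.

    Hypothetical helper: assumes one string per scanned page and that
    the header/footer wording is already known.
    """
    lines = page_text.split("\n")
    if header is not None and lines and lines[0].strip() == header:
        lines = lines[1:]
    if footer is not None and lines and lines[-1].strip() == footer:
        lines = lines[:-1]
    return "\n".join(lines)


page = "A Short History\nThe war began in spring.\nIt ended in autumn.\nChapter 2"
cleaned = strip_running_lines(page, header="A Short History", footer="Chapter 2")
```

Only exact matches on the first and last line are removed, so body text that happens to repeat the header wording is left alone.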
The Last Word on a Page
The last word on the page may be broken apart from the first word on the next page. If so, it will be missing a hyphen. You should add a hyphen to such words. Alternatively, you can delete the hard return between the two word parts, thereby knitting the two parts together. Doing this is often a lot more work as the page number often falls between.
Page Numbers on the Bottom of the Page
This placement just means that the page number refers to what went before. The user should mentally add one to the page number count to know what page he or she is currently reading. Alternatively, you can move page numbers to the tops of pages.
Inserted Text
Sections of text, often on colored backgrounds, are sometimes inserted in the flow of regular text. After OCR, these sections need to be moved to the end of the appropriate section of main text, so they minimally interrupt the flow of thought. Often these inserted sections run from one page to the next. In such cases they will need to be cut and pasted together and then moved. You may wish to add a short line of hyphens at the end of sections of inserted text before saving. Doing this will enable you to easily tell where such text ends.
Correcting Character Recognition & Spelling Errors from Scanning
Occasionally the scanner will have trouble recognizing a character. This can happen because the character did not print properly on the original page, or because the brightness level for the scan was set slightly off. In either case, after you run the Post Scan program, misunderstood characters are marked with a ~ character.
Sometimes it is too much bother to correct spelling errors due to scanning. If the eye sees a ~ character in the middle of a word, it quickly treats it as a challenge to guess what the word should be, and this soon becomes very easy to do. However, there is an easy way to correct spelling if a number of people are going to see the same text and you need it to be perfect.
Go to the beginning of the text which you have scanned and saved and type:
Option+Command+Zero (0).
Every time you press Enter you will move to the next character recognition error, as indicated by the "°" symbol. Correct each error as you come to it. After you have corrected all these character recognition errors, you can run the text through your regular spell checker and make more spelling corrections. Often, italicized text will be misspelled, especially if it is on a colored background. Remember: if you set the lighten-darken control correctly before beginning to scan, you will have the fewest spelling errors.
Pictures, Charts, Graphs and Tables
All of these should be selected and cut from text. In their place you should type: Option key+t plus the words "SeePage:". This will produce the character string "†SeePage:". Notice that there are no spaces between words. Then you can add the page number if you wish, for example: †SeePage:103.
You can use the regular keyboard to type this out, or you can use a super-fast keystroke to type it out at the position of the cursor. Just type: Option+C. Alternatively, pull down the Macro list to "ab) Type †SeePage:". During playback, the program will automatically pause for a moment at each †, so you can formally pause the program if you wish to go to the actual text.
Note: Simple one-column outlines and one-column lists are fine to leave as is.
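The marker format described above can be illustrated with a short Python sketch. It is not the program's macro. The names `see_page_marker` and `replace_figure` are hypothetical, and the sketch assumes the picture or table text to be cut is already known as a string.

```python
DAGGER = "\u2020"  # the † character that Option+t produces


def see_page_marker(page_number=None):
    """Build the marker string, with no spaces between its parts."""
    marker = DAGGER + "SeePage:"
    if page_number is not None:
        marker += str(page_number)
    return marker


def replace_figure(text, figure_text, page_number=None):
    """Replace the first occurrence of figure_text with the marker."""
    return text.replace(figure_text, see_page_marker(page_number), 1)


example = replace_figure("Sales rose. [TABLE 4 DATA] Costs fell.", "[TABLE 4 DATA]", 103)
```

Here `example` holds the sentence with the table text swapped for the marker and its page number.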
Footnotes
Footnotes should either be cut out completely or placed next to their reference number in the text. You also need to type a period after any footnote number in the actual text. This way sentences will end properly with a final period. This problem arises because footnote numbers are added right next to the end of sentences without a space break. Hence they are read as part of the preceding word. Adding a final period after the number allows the end of the sentence to be recognized as such by the PR program.
Next, select and cut footnotes. Either discard them or paste them next to their reference number in the text, separated by a space. If you choose to relocate a footnote, add a * at the end of the footnote so that the reader will know that the footnote has ended.
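As a rough illustration of the period rule above, the following Python sketch adds a period after a footnote number that sits directly against the end of a sentence. It is a heuristic and not the program's own behavior. It assumes the scanned number follows the sentence's closing punctuation with no space (as in "Paris.3"), that footnote numbers have one to three digits, and the function name is hypothetical.

```python
import re

# A letter, then sentence-ending punctuation, then a 1-3 digit footnote
# number that is followed by whitespace or the end of the text.
# Requiring a letter before the punctuation skips decimals such as 3.14.
FOOTNOTE_AT_SENTENCE_END = re.compile(r"(?<=[A-Za-z])([.!?])(\d{1,3})(?=\s|$)")


def add_period_after_footnote_numbers(text):
    """Append a period to each footnote number found at a sentence end."""
    return FOOTNOTE_AT_SENTENCE_END.sub(r"\1\2.", text)


fixed = add_period_after_footnote_numbers("The treaty was signed in Paris.3 It held.")
```

Any text the pattern does not recognize is left unchanged, so the result should still be checked by eye.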
Margin Notes
Margin notes should be removed or treated as introductory paragraphs with special [] marks around them. Margin notes sometimes get spread out in the text during scanning, in which case they may need special treatment. Inspect margin notes as you come to them. The easiest thing to do is to cut them out when you block text.
Text Wrapped Around a Picture
Text wrapped around a picture will often be considered a graphic and automatically deleted. This situation does not happen very often. Usually, pictures are in separate blocks from the text, and easily blocked out. The best approach for this problem, if it occurs, is to tape a piece of white paper over the illustration. This way, the scanner will see only text. Use a removable and reusable white tape on the edge of a piece of 3x5 note paper, cut to shape if necessary. Alternatively, you can just rezone the page into sections without graphics, before doing text recognition. This latter approach is much quicker.
Separating Two Pages Scanned at the Same Time
The easy way to do this is to automatically zone text as "no zones". Then before you start text recognition rezone each page. Only include the page number and not the rest of the header or footer.
Missed Page
Occasionally, the scanner will miss most of a page, especially if it gets confused by intermeshed illustrations. The easiest thing to do here is to turn off the deferred recognition and multiple pages options in the scanner program and scan the missing page. Copy and paste the result into a separate file, then copy and paste that document into the appropriate place in the main work. Individual pages can be added very quickly this way, without serious inconvenience, as you are editing.
Math
Math equations need to have the spaces removed between characters. Otherwise, each number in the equation will appear on a separate line. Furthermore, scanning usually does a terrible job on subscripts and superscripts as well as fancy math graphics. If you do not want to rework the math, it may be easier to just treat math sections like a graph and have the student refer to the appropriate page in the book. In this case, cut out the math and type: Option+t. The character † will be inserted.
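Removing the spaces inside an equation can be illustrated with a very small Python sketch. It is a hypothetical helper, not a feature of the program, and it assumes the equation has already been isolated as a single string.

```python
def close_up_equation(equation):
    """Remove all whitespace inside one equation so that it stays
    together as a single unbroken string of characters."""
    return "".join(equation.split())


closed = close_up_equation("x = 2 y + 3")
```

Because every space is removed, this should only be applied to the equation itself and never to the sentence around it.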
Adding Interactive Pauses
If you want to add pauses to the text to make interactive questions and answers out of the text as it is read, now is a good time to do this. All you do is type a ~ in the sentence where you want a pause to occur. When the text is Proportionalized, these marks are automatically turned into hidden signals which the reading programs recognize if you so choose. Otherwise, they will not play out.
Note: When you add pauses to text that has not been Proportionalized, you are using a different program than when you add a pause to Proportionalized text. In the latter case you are adding the actual hidden signal.
Removing Interactive Pauses
Occasionally, you may wish to save basic text with and without additional cognitive pauses. You can easily do this. Either start off by saving an uncoded copy or save the coded text as one file and then open it and remove all the ~ marks and save it as a second file. To remove ~ marks: Type:
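As a separate, hypothetical alternative to the program's own command, the uncoded second copy could also be produced outside the program with a few lines of Python. The function name is an assumption, and the sketch assumes that every ~ in the text is a pause mark.

```python
def remove_pause_marks(text, mark="~"):
    """Return a copy of the text with every pause mark deleted."""
    return text.replace(mark, "")


uncoded = remove_pause_marks("The capital of France is ~Paris.")
```

The coded original is left untouched, so both versions can be saved as separate files.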
Adding Markings to Chapter Titles and Sub-Titles
Now is a good time to add markings manually to the text for chapter titles and sub-titles. Use <:#; <:=; <:; <:-; and > respectively just before the first word they refer to. Of course, you can also add these markings as you read. See the section on outlining text.
Note: Be sure sub-titles are not part of regular lines of text. Sub-titles should end with a hard return; that is, they should not be the first part of a line of text. Add a hard return if necessary. You do this by pressing the return key.
You can use the keyboard and shift key in the regular manner or you can quickly type marking combinations using the following keystrokes:
- for <:# (indicates a chapter title) Type: Option+a
- for <:= (indicates a primary sub-title) Type: Option+s
- for <: (indicates a secondary sub-title) Type: Option+d
- for <:- (indicates a tertiary sub-title) Type: Option+f
- for < (marks a selected sentence) Type: Option+g
- for p# (marks a page number) Type: Option+z
- for > (marks a selected name or word) Type: Option+x
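The title markings listed above can be summarized in a small Python sketch. It is illustrative only. The dictionary keys, the function name `mark_title`, and the choice to place the mark with no space before the first word are assumptions rather than documented behavior.

```python
# Marks taken from the list above; the level names are hypothetical labels.
TITLE_MARKS = {
    "chapter": "<:#",
    "primary": "<:=",
    "secondary": "<:",
    "tertiary": "<:-",
}


def mark_title(title, level):
    """Put the mark for the given level just before the title's first word."""
    return TITLE_MARKS[level] + title.lstrip()


marked = mark_title("The Civil War", "chapter")
```

The title should still sit on its own line, ending with a hard return, as the note above explains.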
Reversed Titles
Reversed titles, where the letters are white and the background black, will not scan. You must retype these titles if any.
Saving Prepared Text
It is often a very good idea to save text that is all prepared for Proportionalizing. This is text that can be read as a regular word processing file. Furthermore, saving text at this point takes up a lot less memory. It actually takes six times as much storage to save the same amount of text once it has been Proportionalized.
If you are working with a lot of books which you are not going to use that often, you may want to save them as text files. Then you can Proportionalize a whole book overnight as necessary. This means you can save the average book on just one diskette (1.4 megs.). Alternatively, about seventy pages of Proportionalized text can be saved on each diskette (1.4 megs.).
The best approach for a school is to keep all the books in current use on a file server in Proportional format on locked files. Each student downloads Proportionalized text as needed from the central memory onto his or her own computer, or a lab computer, and plays it as he or she wishes, marking the text as desired and saving selections onto personal files. This way text can also be sent via modem over the phone lines to students at home. This process can operate automatically without involving school personnel.
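The storage figures above can be checked with a little arithmetic, sketched here in Python. The two constants come from the text (about seventy Proportionalized pages per diskette, and a six-to-one storage ratio). The function names and the 350-page example book are hypothetical.

```python
import math

PAGES_PER_DISKETTE_PROPORTIONALIZED = 70  # "about seventy pages" per 1.4 meg diskette
STORAGE_RATIO = 6  # Proportionalized text takes six times the storage


def plain_pages_per_diskette():
    """Approximate pages of prepared (not yet Proportionalized) text per diskette."""
    return PAGES_PER_DISKETTE_PROPORTIONALIZED * STORAGE_RATIO


def diskettes_needed(book_pages, proportionalized):
    """Approximate number of diskettes needed for a book of the given length."""
    if proportionalized:
        per_disk = PAGES_PER_DISKETTE_PROPORTIONALIZED
    else:
        per_disk = plain_pages_per_diskette()
    return math.ceil(book_pages / per_disk)


plain_capacity = plain_pages_per_diskette()       # about 420 pages
plain_disks = diskettes_needed(350, False)        # fits on one diskette
proportional_disks = diskettes_needed(350, True)  # needs five diskettes
```

On these rough figures a diskette holds about 420 pages of prepared text, which is consistent with fitting an average book on a single diskette.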
Text Section with Too Many Hard Returns and Tabs
Occasionally, the OCR program will create a short section of text which is all chopped up. It will have extra tabs and hard returns in it. It almost always occurs on indented text. This problem is very easy to fix. All you need to do is to select the section of text and then go up to the Search menu and activate Find/Change. Pull down the Direction sub menu to "Within Selection" then insert "hard return" in the find line and click on Change All. Next insert "tab" on the find line and again click on Change All. Your section of text will be all fixed up.
Note: Be sure to choose "within selection" or you will cut out all the hard returns and/or tabs in the piece.
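The effect of this "Within Selection" change can be sketched in Python. This is an illustration and not the word processor's Find/Change command. The function name is hypothetical, and it assumes each run of hard returns and tabs is replaced by a single space, since the text above does not say what goes in the change line.

```python
import re


def flatten_selection(text, start, end):
    """Replace runs of hard returns and tabs with one space, but only
    between the positions start and end (the selected section)."""
    selection = text[start:end]
    flattened = re.sub(r"[\r\n\t]+", " ", selection)
    return text[:start] + flattened + text[end:]


text = "Intro line.\nThis is\n\ta chopped\nup quote.\nNext paragraph."
start = text.index("This")
end = text.index("quote.") + len("quote.")
repaired = flatten_selection(text, start, end)
```

The hard returns before and after the selected section are preserved, which is the reason for limiting the change to the selection.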
Copyright Jetsoft Development Company 1997-2005
|