Collect Page Numbers
PowerGREP deals with plain text files. Plain text files consist of unformatted text, so there’s no real concept of a page. Still, plain text files can contain page breaks, represented by ASCII character 12 (decimal), also known as the form feed control character. Some text editors, such as EditPad Pro and PowerGREP’s built-in editor, let you insert a page break by pressing Ctrl+Enter and display page breaks as horizontal lines.
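To make the form feed character concrete, here is a minimal Python sketch, independent of PowerGREP, showing that ASCII character 12 is the same character that the escape \f denotes in a regular expression, and that counting form feeds counts page breaks. The helper count_page_breaks and the sample text are made up for illustration.

```python
import re

# ASCII character 12 (decimal) is the form feed control character.
FORM_FEED = chr(12)

# The escape sequence \f denotes the same character, both in Python
# string literals and in regular expressions.
assert FORM_FEED == "\f"
assert re.fullmatch(r"\f", FORM_FEED) is not None


def count_page_breaks(text: str) -> int:
    """Count the page breaks (form feed characters) in plain text."""
    return len(re.findall(r"\f", text))


# Made-up sample: three pages separated by two page breaks.
sample = "first page\fsecond page\fthird page"
print(count_page_breaks(sample))      # 2
print(len(sample.split(FORM_FEED)))   # 3
```

A text with N page breaks has N + 1 pages, which is why splitting at the breaks yields one piece per page.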
PowerGREP’s built-in decoders that convert PDF and XPS files into plain text (so PowerGREP can search through them) also insert page breaks at the page transitions of the original PDF and XPS files. You can make PowerGREP search for these page breaks to determine page numbers. In this example we’ll do this to get search results that indicate the page on which each search match was found. We’ll use the “file sectioning” feature to split each file into one section per page. The main search then processes the PDF or XPS file one page at a time, and the section number equals the page number.
- Select the PDF and/or XPS files you want to search through in the File Selector.
- Select a file format configuration such as “proprietary formats” that converts PDF and XPS files to plain text.
- Start with a fresh action.
- Set the action type to “collect data”.
- Set “file sectioning” to “split along delimiters”.
- To use each page break as the delimiter to divide the file into sections (pages), we need to set the search term for the file sectioning to a page break. There are two ways to do this. Choose whichever way you find more comfortable.
- Set the “search type” to “literal text”. Click on the “section search” box and then press Ctrl+Enter. A horizontal line representing the page break appears.
- Set the “search type” to “regular expression” and type the regex \f into the “section search” box. This regular expression matches the form feed control character that marks page breaks.
- Specify your search term(s) in the main part of the action.
- In the collect box, use the match placeholder %SECTIONN% to insert the page number. E.g. “%MATCH% on page %SECTIONN%” collects “found me on page 7” when the main part of the action finds “found me” in the 7th section (page).
- Click the Search button to run the search.
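The steps above can be sketched outside PowerGREP as follows. This is a hypothetical Python emulation, not PowerGREP’s implementation: it assumes the PDF or XPS file has already been converted to plain text with form feeds at the page transitions, splits that text at each form feed (the role of the \f section search), runs the main search pattern on each section, and fills in the %MATCH% and %SECTIONN% placeholders itself. The function name collect_page_numbers and the sample document are made up, and details such as how PowerGREP treats the delimiters themselves are not modeled.

```python
import re

# Matches the two collect placeholders this sketch supports.
PLACEHOLDER = re.compile(r"%(MATCH|SECTIONN)%")


def collect_page_numbers(text: str, pattern: str,
                         template: str = "%MATCH% on page %SECTIONN%") -> list[str]:
    """Hypothetical emulation of the "Collect page numbers" action.

    Splits plain text at form feeds (file sectioning with a page break
    as the section search), runs the main search on each section, and
    fills the collect template for every match. Sections are numbered
    from 1, so the section number is the page number.
    """
    results = []
    for section_number, section in enumerate(re.split(r"\f", text), start=1):
        for match in re.finditer(pattern, section):
            values = {"MATCH": match.group(0), "SECTIONN": str(section_number)}
            # Substitute both placeholders in a single pass, so that text
            # inside a match is never mistaken for a placeholder.
            results.append(PLACEHOLDER.sub(lambda m: values[m.group(1)], template))
    return results


# Made-up ten-page document where only page 7 contains "found me".
pages = [f"text of page {n}" for n in range(1, 11)]
pages[6] = "found me"
document = "\f".join(pages)

print(collect_page_numbers(document, "found me"))  # ['found me on page 7']
```

Numbering sections from 1 mirrors the way page numbers are counted, which is what lets the section number stand in for the page number.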
You can find this action in the PowerGREP5.pgl standard library as “Collect page numbers”.