With PowerGREP, you can easily extract any sort of information from large numbers of documents, archives or spreadsheets, and save the collected information into a comma-delimited text files or CSV files. Here are the basic steps:
Suppose you have a large number of orders stored in text documents, and you want to make a list of the delivery addresses. In each file, the delivery address has the following layout:
Deliver to: Joe N. Doe Street address (one or two lines) City, ST, 12345-6789
In the CSV file, you want to have the following fields: name,address 1,address 2,city,state,zip
You can easily achieve this following the steps above. First, we need to create a regular expression that matches a delivery address, which is quite straightforward. We match “Deliver to:” first. Then we capture one line of text with (.*)\R which is the name. Then one or two lines with (.*)\R(?:(.*)\R)? which are the address. Finally, we match one line that ends with a state code [A-Z]{2} and ZIP code [0-9]{5}(?:-[0-9]{4})?. Using [, ]+ we allow commas and/or spaces as delimiters in the last line. The complete regular expression becomes: Deliver to:\R(.*)\R(.*)\R(?:(.*)\R)?(.*?)[, ]+([A-Z]{2})[, ]+([0-9]{5}(?:-[0-9]{4})?).
The (?:group) parts in the regular expression are non-capturing groups. Those simply group items to repeat them together. The (group) parts are capturing groups. They’re essential in allowing us to insert part of the regular expression match into the text to be collected. In this case, we simply reference each group once. As the text to be collected, enter: \1,\2,\3,\4,\5,\6. If a capturing group did not participate in the match, it is substituted with nothing. E.g. in a delivery address with only one line for the street address, \3 will remain blank.
Set the target type to save results into a single file to get one CSV file with all delivery addresses. You can then open the CSV file in a spreadsheet program or other application.
You can find this action in the PowerGREP5.pgl standard library as “CSV: Extract data into a CSV file or spreadsheet”.