Make Sections and Their Contents Consistent
This example illustrates how you can use named capturing groups to carry over regex matches from the file sectioning to the main part of the action.
Suppose you have a number of HTML files with headings such as <h1>heading 4</h1> that you want to make consistent: the 4 should be changed into a 1.
PowerGREP makes this easy. Use file sectioning to match each heading tag and its contents. Then have the main action search and replace through the heading’s contents, replacing each number with the heading’s level carried over from the file sectioning.
You can find this action in the PowerGREP5.pgl library as “Make numbers in HTML heading tags consistent”.
- Select the files you want to search through in the File Selector.
- Start with a fresh action.
- Set the action type to “search and replace”.
- Select “search and collect sections” from the “file sectioning” list. Leave the section search type as “regular expression”.
- In the Section Search box, enter the regular expression <h(?'headerlevel'[1-6])>(?'tag'.*?)</h\k'headerlevel'> and make sure to leave “case sensitive search” off. This regular expression contains two named capturing groups, “headerlevel” and “tag”.
- In the Section Collect box, enter the named backreference ${tag} to restrict the main action to the contents of the tag.
- In the Search box in the main part of the action, enter the regular expression \d+ to match any number.
- In the Replace box, enter the named backreference ${headerlevel}
- Set the target and backup file options as you like them.
- Click the Preview button to run a test.
- If all looks well, click the Replace button to update the headings.
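To see what the Section Search regex captures, you can try it outside PowerGREP. The following is a minimal Python sketch using the standard `re` module, an illustration under assumptions rather than how PowerGREP works internally: Python spells named groups `(?P<name>...)` and named backreferences `(?P=name)`, so the PowerGREP syntax `(?'headerlevel'...)` and `\k'headerlevel'` is translated accordingly, and `re.IGNORECASE` stands in for leaving “case sensitive search” off. The names `SECTION_RE` and `find_sections` are hypothetical.

```python
import re

# PowerGREP's (?'name'...) and \k'name' become (?P<name>...) and (?P=name) in Python.
SECTION_RE = re.compile(
    r"<h(?P<headerlevel>[1-6])>(?P<tag>.*?)</h(?P=headerlevel)>",
    re.IGNORECASE,  # stands in for leaving "case sensitive search" off
)


def find_sections(html: str) -> list[tuple[str, str]]:
    """Hypothetical helper: (headerlevel, tag) pairs for each heading the regex matches."""
    return [(m.group("headerlevel"), m.group("tag")) for m in SECTION_RE.finditer(html)]


sample = "<h1>heading 4</h1> <H2>intro 7</H2> <h3>mismatched</h2>"
print(find_sections(sample))
# [('1', 'heading 4'), ('2', 'intro 7')]
```

The last heading in the sample is skipped because the backreference requires the closing tag to carry the same level as the opening tag. With Python's default settings `.` does not match line breaks, so this sketch does not find headings that span several lines.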
When PowerGREP executes this action, the following happens for each file:
1. The sectioning regex matches a heading tag in the file, e.g. <h1>heading 4</h1>. The heading tag’s number 1 is stored in the named group “headerlevel”, and the tag’s contents, heading 4, are stored in the named group “tag”.
2. Because the section collect is set to a reference to the named capturing group “tag”, the main action searches only through the contents of the heading tag.
3. The main action matches the first number, 4, in the heading tag’s contents.
4. The main action replaces the matched number with the text captured by the named group “headerlevel”: 1.
5. The main action repeats steps 3 and 4 until all numbers have been replaced. In the example, the section after substitution becomes <h1>heading 1</h1>.
6. PowerGREP repeats steps 1 through 5 for all heading tags in the file.
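The per-file steps above can be approximated in a short Python sketch. This is an assumption-laden model, not PowerGREP’s implementation: the regex syntax is translated to Python’s `re` module, the hypothetical function `make_numbers_consistent` plays the role of the whole action, and the inner substitution only sees the text captured by the “tag” group, mirroring what the Section Collect setting does for the main action.

```python
import re

# PowerGREP's (?'name'...) and \k'name' become (?P<name>...) and (?P=name) in Python.
SECTION_RE = re.compile(
    r"<h(?P<headerlevel>[1-6])>(?P<tag>.*?)</h(?P=headerlevel)>",
    re.IGNORECASE,  # stands in for leaving "case sensitive search" off
)
NUMBER_RE = re.compile(r"\d+")  # the main action's Search regex


def make_numbers_consistent(html: str) -> str:
    """Hypothetical helper: replace every number inside each heading with its level."""

    def fix_section(section: re.Match) -> str:
        level = section.group("headerlevel")
        # Like ${tag} in Section Collect: only the heading's contents are searched.
        # A function replacement avoids any escape processing of the level text.
        new_tag = NUMBER_RE.sub(lambda _num: level, section.group("tag"))
        # Keep the opening and closing tags exactly as they appear in the file.
        text = section.string
        return (
            text[section.start():section.start("tag")]
            + new_tag
            + text[section.end("tag"):section.end()]
        )

    return SECTION_RE.sub(fix_section, html)


print(make_numbers_consistent("<h1>heading 4</h1>\n<h2>Part 12, section 3</h2>"))
# <h1>heading 1</h1>
# <h2>Part 2, section 2</h2>
```

Rebuilding the section from slices of the original string, rather than writing new tags, keeps the tags untouched, which matches this action: only the numbers inside the heading change.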
Updating the Heading Tags Themselves
Doing the opposite, updating a heading tag to make it consistent with the number in the tag’s contents, is almost as easy. This time we’ll replace <h1>heading 4</h1> with <h4>heading 4</h4>.
You can find this action in the PowerGREP5.pgl library as “Make HTML heading tags consistent with their contents”.
- Select the files you want to search through in the File Selector.
- Start with a fresh action.
- Set the action type to “search and replace”.
- Select “search for sections” from the “file sectioning” list. Leave the section search type as “regular expression”.
- In the Section Search box, enter the regular expression <h(?'headerlevel'[1-6])>(?'tag'.*?)</h\k'headerlevel'> and make sure to leave “case sensitive search” off. This regular expression contains two named capturing groups, “headerlevel” and “tag”.
- Turn on the option “replace whole sections”.
- In the Search box in the main part of the action, enter the regular expression \b[1-6]\b to match a standalone digit between 1 and 6. The word boundaries make sure we don’t match the number in the heading tag itself: in h1, the 1 directly follows the letter h, so there is no word boundary in front of it.
- In the Replace box, enter <h\0>${tag}</h\0>
- Set the target and backup file options as you like them.
- Click the Preview button to run a test.
- If all looks well, click the Replace button to update the headings.
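A quick way to convince yourself that the word boundaries do their job is to run the same regex over a sample section. The Python snippet below is only an illustration; it assumes Python’s `\b` behaves like PowerGREP’s for plain ASCII input such as this, and the name `LEVEL_RE` is hypothetical.

```python
import re

LEVEL_RE = re.compile(r"\b[1-6]\b")  # the main action's Search regex

section = "<h1>heading 4</h1>"
with_boundaries = LEVEL_RE.findall(section)
without_boundaries = re.findall(r"[1-6]", section)

print(with_boundaries)     # ['4']: each 1 follows the letter h, so no boundary precedes it
print(without_boundaries)  # ['1', '4', '1']
print(LEVEL_RE.findall("<h2>chapter 12</h2>"))  # []: 12 is not a standalone digit 1-6
```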
When PowerGREP executes this action, the following happens for each file:
1. The sectioning regex matches a heading tag in the file, e.g. <h1>heading 4</h1>. The heading tag’s number 1 is stored in the named group “headerlevel”, and the tag’s contents, heading 4, are stored in the named group “tag”.
2. The main action searches through the entire section, i.e. the tag including its contents.
3. The main action matches the first number between 1 and 6 in the section, 4. Because of the word boundaries in our regular expression, the 1 in h1 is not matched.
4. The backreference \0 in the replacement text is substituted with the regex match 4, and the named backreference ${tag} is substituted with heading 4, captured by the file sectioning. The result is <h4>heading 4</h4>.
5. Since we turned on “replace whole sections”, the whole section is substituted with the replacement, and the main action is done with this section.
6. PowerGREP repeats steps 1 through 5 for all heading tags in the file.
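This walkthrough can also be approximated in Python. The sketch below is a model under assumptions, not PowerGREP’s code: it translates the regex syntax to Python’s `re` module and models “replace whole sections” by searching each section for the first `\b[1-6]\b` match and rebuilding the entire section from it. The function `make_tags_consistent` is hypothetical. Sections without a standalone digit from 1 to 6 are returned unchanged, since the main action finds nothing to replace in them.

```python
import re

# PowerGREP's (?'name'...) and \k'name' become (?P<name>...) and (?P=name) in Python.
SECTION_RE = re.compile(
    r"<h(?P<headerlevel>[1-6])>(?P<tag>.*?)</h(?P=headerlevel)>",
    re.IGNORECASE,  # stands in for leaving "case sensitive search" off
)
LEVEL_RE = re.compile(r"\b[1-6]\b")  # the main action's Search regex


def make_tags_consistent(html: str) -> str:
    """Hypothetical helper: set each heading's level from the first standalone digit 1-6 in it."""

    def fix_section(section: re.Match) -> str:
        whole = section.group(0)
        # The main action searches the entire section, tags included.
        found = LEVEL_RE.search(whole)
        if found is None:
            return whole  # nothing matched, so the section stays as it was
        level = found.group(0)  # what \0 stands for in the replacement text
        # Replacement <h\0>${tag}</h\0> becomes the whole new section.
        return f"<h{level}>{section.group('tag')}</h{level}>"

    return SECTION_RE.sub(fix_section, html)


print(make_tags_consistent("<h1>heading 4</h1>\n<h2>Overview</h2>"))
# <h4>heading 4</h4>
# <h2>Overview</h2>
```

Only the first match in a section matters here, which reflects step 5: once the whole section has been replaced, the main action is done with it, so a later digit such as the 5 in “Step 2 of 5” has no effect.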