Many web authors are sloppy about adding proper <TITLE> tags to their HTML files. The tags are easy to forget because they are not clearly visible when viewing a website. However, <TITLE> tags are important because they’re used as the default name for bookmarks/favorites, and most search engines use the titles to list your pages in the search results.
Assuming you have been more careful about adding the title to the HTML body, you can easily fix this problem with PowerGREP. Usually, <H1> tags are used to add titles to the body. We will use <H1> tags in the example below, but you can easily adapt it to whatever tags you have been using. What we’ll do is tell PowerGREP to find the <H1> tag in each file and capture its contents. Then we use the captured text to replace the contents of the <TITLE> tag.
The “filter files” feature on the Action panel is what we’ll use to capture the contents of the <H1> tag into a named capturing group. Then we can set the main action to search for the <TITLE> tag and to replace it with the contents of the named capturing group. This relies on PowerGREP’s special ability to carry over text matched by named capturing groups from one part of the action to the next.
Should a file not have an <H1> tag, then it is filtered out and no changes are made to it. If a file has more than one <H1> tag, then only the first tag is used. Once all the regular expressions in “filter files” have found a match, PowerGREP considers the file to meet the filtering requirement. It won’t look for any further matches for the filtering regex.
If a file does not have a <TITLE> tag, the search-and-replace won’t replace anything. If a file has more than one <TITLE> tag, then all of them are replaced with the contents of the first <H1> tag in the file.
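The behavior described above can be approximated outside PowerGREP. The following Python sketch is only an illustration, not the PowerGREP action itself. The two regular expressions are assumptions (the text above does not spell them out), and the function name update_title is hypothetical. It captures the first <H1> tag into a group named h1, leaves the file untouched if there is none, and replaces every <TITLE> tag with the captured text.

```python
import re

# Assumed patterns; the PowerGREP action may use different regexes.
H1_RE = re.compile(r"<h1[^>]*>(?P<h1>.*?)</h1>", re.IGNORECASE | re.DOTALL)
TITLE_RE = re.compile(r"<title>.*?</title>", re.IGNORECASE | re.DOTALL)


def update_title(html):
    """Replace all <title> tags with the contents of the first <h1> tag."""
    # "Filter" step: only the first <h1> match is used.
    h1_match = H1_RE.search(html)
    if h1_match is None:
        # No <h1> tag: the file is filtered out and left unchanged.
        return html
    new_title = "<title>" + h1_match.group("h1") + "</title>"
    # A function is used as the replacement so that backslashes in the
    # heading text are inserted literally.
    return TITLE_RE.sub(lambda _: new_title, html)


page = ("<html><head><title>Old</title></head>"
        "<body><h1>First</h1><h1>Second</h1></body></html>")
print(update_title(page))
```

A file without a <TITLE> tag passes through unchanged, because the substitution simply finds nothing to replace.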
This action is available in the PowerGREP5.pgl library as “Update HTML title tags”.
If some of your HTML files do not have <TITLE> tags at all, but they do all have <HEAD> tags, you can use the following regular expression for the search-and-replace: <head>(?:(.*?)<title>.*?</title>)? This regex matches the <head> tag optionally followed by the group (.*?)<title>.*?</title>. This group starts with (.*?) to skip over any number of characters and capture those into capturing group number one. The star is made lazy so this group matches as few characters as possible, expanding only as needed to allow <title>.*?</title> to match the title tag. If there is a title tag, then the first capturing group matches the text between the head and title tags. If there’s no title tag, then (.*?) expands all the way to the end of the file before giving up (assuming we turned on “dot matches newlines”). Since the question mark at the end of the regex makes the group after the head tag optional, the regex matches only <head> in that case.
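To see the two cases side by side, here is a small Python sketch that runs the same regex against a head section with and without a title tag. Python’s re module stands in for PowerGREP here, with re.DOTALL playing the role of “dot matches newlines”. The sample strings are made up for illustration.

```python
import re

# re.DOTALL corresponds to turning on "dot matches newlines".
HEAD_RE = re.compile(r"<head>(?:(.*?)<title>.*?</title>)?", re.DOTALL)

with_title = "<head>\n<meta charset='utf-8'>\n<title>Old</title>\n</head>"
without_title = "<head>\n<meta charset='utf-8'>\n</head>"

m1 = HEAD_RE.search(with_title)
m2 = HEAD_RE.search(without_title)

# With a title tag: the match runs through </title>, and group 1
# holds the text between the head and title tags.
print(repr(m1.group(0)))
print(repr(m1.group(1)))

# Without a title tag: the optional group fails, only <head> is matched,
# and group 1 did not participate in the match.
print(repr(m2.group(0)))
print(m2.group(1))
```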
The replacement text becomes <head>\1<title>${h1}</title>. In addition to inserting the new title tag and the named capturing group, this replacement text also re-inserts the <head> tag that we matched and the text between the head and title tags that we may have matched. If there was no title tag in the file, then the first capturing group did not participate in the match, and \1 inserts nothing.
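The complete update-or-insert step might be sketched in Python as follows. This is an approximation under stated assumptions rather than the PowerGREP action: the <H1> regex and the function name update_or_insert_title are hypothetical, case-insensitive matching is assumed, and a replacement function stands in for the replacement text <head>\1<title>${h1}</title> so that a non-participating first group inserts nothing.

```python
import re

# Assumed pattern for the heading; the group name h1 follows the text above.
H1_RE = re.compile(r"<h1[^>]*>(?P<h1>.*?)</h1>", re.IGNORECASE | re.DOTALL)
# The search regex from the text, with "dot matches newlines" turned on.
HEAD_RE = re.compile(r"<head>(?:(.*?)<title>.*?</title>)?",
                     re.IGNORECASE | re.DOTALL)


def update_or_insert_title(html):
    """Update the <title> tag, or insert one right after <head> if missing."""
    h1_match = H1_RE.search(html)
    if h1_match is None:
        # No <h1> tag: leave the file unchanged.
        return html

    def replacement(match):
        # Group 1 is None when the file has no title tag; insert nothing then.
        between = match.group(1) or ""
        return "<head>" + between + "<title>" + h1_match.group("h1") + "</title>"

    return HEAD_RE.sub(replacement, html)


has_title = ("<html><head><meta charset='utf-8'><title>Old</title></head>"
             "<body><h1>New</h1></body></html>")
no_title = ("<html><head><meta name='x'></head>"
            "<body><h1>New</h1></body></html>")
print(update_or_insert_title(has_title))
print(update_or_insert_title(no_title))
```

In the second case the new title tag lands immediately after <head>, ahead of any other tags in the head section, because the regex matched only the <head> tag.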
You can find this action in the PowerGREP5.pgl standard library as “Update or insert HTML title tags”.