Convert Text File Encoding And Line Break Style
The reference topic about text encoding configurations provides background information on the various encodings used for plain text files and explains how to set up PowerGREP so that it correctly displays and searches the text in your files. This example assumes you’ve already selected a text encoding configuration that allows all your files to be displayed correctly when you open them in PowerGREP’s built-in editor.
While PowerGREP supports all the text encodings and legacy code pages that are still relevant, and its text encoding configurations let you use different encodings to read different files, many other applications are far more limited. Windows applications, for example, typically support only the PC’s default code page (such as Windows-1252) and a few forms of Unicode (UTF-8 and UTF-16LE with a byte order mark).
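To make the difference concrete, here is a small Python sketch (an illustration only, unrelated to PowerGREP’s own code) that encodes the same word in Windows-1252, UTF-8, and UTF-16LE with a byte order mark. The function name `encode_variants` is hypothetical, and Python’s codec name `cp1252` stands in for Windows-1252.

```python
import codecs


def encode_variants(text: str) -> dict:
    """Return the bytes that represent `text` in three common Windows encodings."""
    return {
        # Legacy single-byte code page: one byte per character.
        "windows-1252": text.encode("cp1252"),
        # UTF-8, written here without a byte order mark.
        "utf-8": text.encode("utf-8"),
        # UTF-16LE preceded by its byte order mark (FF FE).
        "utf-16le+bom": codecs.BOM_UTF16_LE + text.encode("utf-16-le"),
    }


for name, data in encode_variants("café").items():
    print(f"{name:13} {data.hex(' ')}")
# windows-1252  63 61 66 e9
# utf-8         63 61 66 c3 a9
# utf-16le+bom  ff fe 63 00 61 00 66 00 e9 00
```

An application that assumes the wrong one of these encodings will show the “é” as a different character or as garbage, even though the file itself is intact.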
Line break styles can also be an issue. Windows text files normally terminate lines with a CRLF pair, while UNIX/Linux and OS X text files use a single LF. Classic Mac text files used a single CR. PowerGREP automatically handles all of these, as well as all other Unicode line breaks, so you don’t need to tell PowerGREP which line break style your files use. But again, other applications are usually not so flexible. Some Windows applications, such as older versions of Notepad, show UNIX text files as if they had no line breaks. Linux applications often display the unexpected CR at the end of each line as a ^M (Ctrl+M) control character.
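The three styles can be told apart by counting CRLF pairs and the CRs and LFs that stand alone. The Python sketch below is a hypothetical helper (`line_break_counts` is not a PowerGREP function) that works on text that has already been decoded; it only counts the three classic styles and ignores other Unicode line breaks.

```python
def line_break_counts(text: str) -> dict:
    """Count CRLF pairs, lone LFs, and lone CRs in already-decoded text."""
    crlf = text.count("\r\n")
    # Each CRLF pair contains one CR and one LF, so subtract the pairs
    # to get the CRs and LFs that appear on their own.
    lone_lf = text.count("\n") - crlf
    lone_cr = text.count("\r") - crlf
    return {"CRLF": crlf, "LF": lone_lf, "CR": lone_cr}


# One Windows, one UNIX, and one classic Mac line break in the same string.
print(line_break_counts("one\r\ntwo\nthree\rfour"))
# {'CRLF': 1, 'LF': 1, 'CR': 1}
```

A file that mixes styles, like the sample above, is exactly the kind that confuses applications expecting only one style.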
With PowerGREP you can easily convert your text files to a specific encoding and a specific line break style.
- Select the files you want to convert in the File Selector.
- Make sure “text encodings to read files with” is set to a configuration that allows PowerGREP to correctly read the text in all those files.
- Start with a fresh action.
- Set “action type” to “list files”.
- Leave the Search box blank to convert all the files you selected in the first step.
- Set “target file creation” to “convert matched files to text” if you want to overwrite each file with its conversion. Set it to “convert copies of matched files to text” if you want to keep the original files and save the conversions in another folder. If you choose the latter, you’ll get two additional settings for specifying the target folder.
- Set “target file text encoding” to the encoding the converted files should use. Converting to Unicode always works, because Unicode supports all characters. Converting to any other encoding may cause characters to be lost if that encoding does not support every character used in your files. Lost characters appear as question marks after the conversion. Choose “same as original file” if you want to change the line break style without changing the encoding.
- Set “target file line break style” to the line break style the converted files should use. Only CR, LF, and CRLF line breaks are converted to the style you choose; other Unicode line breaks are left unchanged. Choose “same as original file” if you want to change the encoding without changing the line break style.
- Set the backup file options so that you can undo the conversion if it doesn’t turn out the way you expected.
- Click the Convert Files button to execute the conversion.
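For readers who want to see what the target file settings amount to, here is a rough Python sketch of the same transformation applied to one file’s contents. It is an illustration under stated assumptions, not PowerGREP’s implementation: the function `convert_text` and its parameters are hypothetical, `None` stands in for “same as original file”, Python codec names such as `cp1252` stand in for PowerGREP’s encoding choices, and the sketch neither writes files nor makes backups.

```python
import re

# CRLF comes first in the alternation so that a CRLF pair counts as one break.
# Other Unicode line breaks (U+0085, U+2028, U+2029) are deliberately not matched.
_CR_LF_CRLF = re.compile(r"\r\n|\r|\n")


def convert_text(data: bytes, source_encoding: str,
                 target_encoding=None, line_break=None) -> bytes:
    """Re-encode `data` and normalize CR, LF, and CRLF to `line_break`.

    Passing None for `target_encoding` or `line_break` keeps the original
    (hypothetical stand-in for "same as original file"). Characters that the
    target encoding cannot represent are replaced with '?'.
    """
    text = data.decode(source_encoding)
    if line_break is not None:
        # A function replacement avoids any backslash processing in re.sub.
        text = _CR_LF_CRLF.sub(lambda _match: line_break, text)
    return text.encode(target_encoding or source_encoding, errors="replace")


# "café → Ω" saved as UTF-8 with a UNIX line break, converted to
# Windows-1252 with CRLF. The arrow and omega are not in Windows-1252.
utf8_unix = "caf\u00e9 \u2192 \u03a9\n".encode("utf-8")
print(convert_text(utf8_unix, "utf-8", "cp1252", "\r\n"))
# b'caf\xe9 ? ?\r\n'
```

The example shows both effects described in the steps above: the line break changes from LF to CRLF, and the two characters that Windows-1252 cannot represent become question marks, while “é” survives because Windows-1252 does support it.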