Friday, September 30, 2011

Python New Line (换行符), Line Feed, Carriage Return

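In Python, one small but important detail is the newline character, especially when you're doing something cross-platform. Windows uses a Carriage Return (\r) followed by a Line Feed (\n) as its newline sequence. Linux uses only a Line Feed (\n), and so does Mac OS X; only the classic Mac OS (version 9 and earlier) used a bare Carriage Return (\r). Therefore, a safe way to store your file for cross-platform use is "\r\n", which is the native line ending on Windows and which text editors on the other systems can generally read. One caveat: some Unix tools treat the extra \r as part of the line (a shell script saved with "\r\n" endings will usually fail to run, for example). The following are some useful notes for dealing with this issue.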
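
To check which convention a given file actually uses, the byte sequences described above can simply be counted. This is a minimal sketch assuming Python 3; the function name count_newline_styles is made up for illustration and is not part of any library.

```python
def count_newline_styles(data):
    """Count CRLF, lone LF and lone CR sequences in a bytes object."""
    crlf = data.count(b"\r\n")
    # Every CRLF pair contains one \n and one \r, so subtract the pairs
    # to get the line feeds and carriage returns that stand alone.
    lf = data.count(b"\n") - crlf
    cr = data.count(b"\r") - crlf
    return {"CRLF": crlf, "LF": lf, "CR": cr}


sample = b"windows\r\nunix\nclassic mac\rwindows again\r\n"
print(count_newline_styles(sample))
```

For a real file, read it with open(path, "rb") first so that Python does not translate the line endings before they are counted.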

Keep the new line character when reading/writing files:
Sometimes it's quite important to keep the newline character as it is when reading/writing files in Python. For example, when a Windows file using '\r\n' as end of line is read into Python in text mode, the end of line is automatically replaced by '\n'. The reverse happens in the writing process: on Windows, each '\n' is written out as '\r\n'. To stop this automatic conversion, use the following commands. "rb" and "wb" instead of "r" and "w" tell Python to read and write the file as binary, so it won't treat it as a text file or do any automatic conversion on it.
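with open(input_file, "rb") as f:    # binary mode: bytes are read unchanged
    msgs = f.read()
with open(output_file, "wb") as f:   # binary mode: bytes are written unchanged
    f.write(msgs)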
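
The difference between the two modes is easy to see by reading the same bytes back in several ways. This is a small sketch assuming Python 3; the helper name read_both_ways and the sample data are only for illustration. It also shows the newline="" argument of open(), which keeps the original line endings while still returning text.

```python
import os
import tempfile


def read_both_ways(raw):
    """Write raw bytes to a temp file, then read it back in three ways."""
    fd, path = tempfile.mkstemp()
    try:
        with os.fdopen(fd, "wb") as f:
            f.write(raw)
        # Default text mode: "\r\n" is translated to "\n" on reading.
        with open(path, "r", encoding="ascii") as f:
            translated = f.read()
        # Text mode with newline="": line endings are left untranslated.
        with open(path, "r", encoding="ascii", newline="") as f:
            untranslated = f.read()
        # Binary mode: the raw bytes come back exactly as stored.
        with open(path, "rb") as f:
            binary = f.read()
    finally:
        os.remove(path)
    return translated, untranslated, binary


translated, untranslated, binary = read_both_ways(b"one\r\ntwo\r\n")
print(repr(translated))    # 'one\ntwo\n'
print(repr(untranslated))  # 'one\r\ntwo\r\n'
print(repr(binary))        # b'one\r\ntwo\r\n'
```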

Make your new line characters safe:
As mentioned above, '\r\n' is probably the safest newline sequence to use in Python. If you have a file that does not use it, the easiest way to make it safe is to do the following:
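with open(input_file, "rb") as f:
    msgs = f.read()
# Data read in binary mode is bytes in Python 3, so the separator must be bytes too.
msgs = b"\r\n".join(msgs.splitlines())
with open(output_file, "wb") as f:
    f.write(msgs)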
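
One thing to watch with the join/splitlines approach is that a newline at the very end of the file is dropped, because splitlines() discards it and join() only puts separators between lines. The following is a sketch of a variant that keeps it, assuming Python 3 and bytes input; normalize_newlines is a hypothetical helper name, not a standard function.

```python
def normalize_newlines(data, newline=b"\r\n"):
    """Return data with every \\r\\n, \\n or \\r replaced by `newline`."""
    result = newline.join(data.splitlines())
    # splitlines() drops a trailing line ending, so put it back if the
    # input ended with one.
    if data.endswith((b"\r", b"\n")):
        result += newline
    return result


mixed = b"first\nsecond\rthird\r\nfourth\n"
print(repr(normalize_newlines(mixed)))
```

Passing newline=b"\n" instead converts a file to Linux-style line endings with the same function.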

1 comment:

  1. A really helpful tip for Python developers, and I believe it will help to deal with these problems in a better way. You must keep posting such tutorials for other problems as well.
