Wednesday, August 10, 2011

Difference between Carriage Return and Line Feed


Carriage Return (CR, 0x0D in ASCII or \r in most programming languages): moves the cursor to the beginning of the line without advancing to the next line. Classic Mac OS (version 9 and earlier) used it on its own as the line ending.

Line Feed (LF, 0x0A in ASCII or \n in most programming languages): moves the cursor down to the next line without returning to the beginning of the line. It is the line ending on most Unix-like systems (Linux, Mac OS X, etc.).

The End of Line (EOL, 0x0D 0x0A, \r\n): is not a single character but a combination of the CR and LF characters, in that order. It moves the cursor both down to the next line and to the beginning of that line. This two-character sequence is used as the newline marker in most non-Unix operating systems, including Microsoft Windows, Symbian OS and others.
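A quick way to check these values is to inspect the characters in Python. This is only an illustrative sketch; the string named sample is made up for the demonstration.

```python
# Code points of the two control characters.
cr = ord('\r')   # 13, i.e. 0x0D
lf = ord('\n')   # 10, i.e. 0x0A

# A made-up sample that mixes all three line-ending conventions.
sample = 'old mac\runix\nwindows\r\nend'

# str.splitlines() recognises CR, LF and CRLF as line boundaries.
lines = sample.splitlines()
print(hex(cr), hex(lf), lines)
```

Because splitlines() accepts all three conventions, it is a handy way to break up text whose origin is unknown.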

A common problem is opening a text file in Notepad and seeing it displayed as one giant wrapped line. The file was probably generated on a Unix-like system, which uses Line Feed (\n) as the newline mark, whereas Windows expects EOL (\r\n). The fix is to replace \n with \r\n. If you open a Unix-format text file in UltraEdit, it will ask you "File is probably not DOS format, do you want to convert to DOS format?" Choose Yes and UltraEdit will convert every \n to \r\n for you. The same thing can be done in Python:
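import re
# Match a bare \n or an existing \r\n, so \r\n is not turned into \r\r\n.
newline_re = re.compile(r'\r?\n')
# src is assumed to already hold the file's text as a str.
src = newline_re.sub('\r\n', src)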
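Before converting, it may help to check which convention the text actually uses, much like the editor prompt mentioned above, since blindly replacing every \n in text that already contains \r\n would yield \r\r\n. The following is a minimal sketch; the helper name count_line_endings is hypothetical and not part of any library.

```python
import re

def count_line_endings(text):
    """Count CRLF, bare LF and bare CR occurrences in a string."""
    crlf = text.count('\r\n')
    # A bare LF is an LF that is not preceded by CR.
    lf = len(re.findall(r'(?<!\r)\n', text))
    # A bare CR is a CR that is not followed by LF.
    cr = len(re.findall(r'\r(?!\n)', text))
    return {'crlf': crlf, 'lf': lf, 'cr': cr}

counts = count_line_endings('a\nb\r\nc\rd\n')
print(counts)
```

A file whose counts are dominated by 'lf' is most likely in Unix format, and one dominated by 'crlf' is most likely in DOS/Windows format.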
When you open a plain text file in Python with the built-in open() function, you can use 'r' mode to read or 'w' mode to write. In Python 3, reading in text mode with the default newline setting translates \r, \n and \r\n all to \n. However, if you are opening a binary file, or such automatic conversion is not desired, the file should be opened in 'rb' mode to read or 'wb' mode to write. The appended 'b' tells Python to treat the file as binary rather than text. This is especially useful when dealing with files across platforms.
with open(r"C:\example.po", "rb") as f:  # raw string keeps the backslash literal
    src = f.read()  # bytes, with line endings left untouched
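The difference between the two modes can be seen by reading the same file both ways. This is a small sketch assuming Python 3; it writes to a temporary file rather than the path above, and the variable names are made up for illustration.

```python
import os
import tempfile

# Write raw bytes containing Windows-style line endings to a temporary file.
fd, path = tempfile.mkstemp()
with os.fdopen(fd, 'wb') as f:
    f.write(b'one\r\ntwo\r\n')

# Text mode with default newline handling: \r\n is read back as \n.
with open(path, 'r', encoding='ascii') as f:
    as_text = f.read()

# Binary mode: the bytes come back exactly as stored.
with open(path, 'rb') as f:
    as_bytes = f.read()

os.remove(path)
print(repr(as_text), repr(as_bytes))
```

Text mode hides the original line endings, so binary mode is the safer choice whenever the exact bytes matter.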
