Carriage Returns vs Line Feeds

My first job out of school was to maintain and extend an ETL-style feed processing platform written in Perl. Its basic function was to take a data source, apply a series of transformations to it, and then spit it back out, usually in a different format such as XML or an SQLite flat file.

A problem I often encountered while working in a UNIX environment with data sources compiled by Excel users had to do with newline representation. Without getting into the origins of the divergence, an application can represent a newline as a line feed (\n), a carriage return (\r), or a carriage return followed by a line feed (\r\n). UNIX uses the line feed, and Windows uses a carriage return followed by a line feed.
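To see which convention a file uses before opening it in an editor, you can count the raw bytes. Here is a minimal Python sketch; the function name count_line_endings is my own invention for illustration, not part of any library:

```python
def count_line_endings(data: bytes) -> dict:
    """Count each newline style in a chunk of raw bytes."""
    crlf = data.count(b"\r\n")
    # Every \r\n pair also contains one \r and one \n, so subtract
    # the pairs to get the standalone characters.
    cr = data.count(b"\r") - crlf
    lf = data.count(b"\n") - crlf
    return {"crlf": crlf, "cr": cr, "lf": lf}


if __name__ == "__main__":
    sample = b"hello world\r\nhow are you doing good sir?\r\n"
    print(count_line_endings(sample))
```

Read the file in binary mode (open(path, "rb")) before passing its contents in, since Python's text mode translates newlines on read by default.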

This was why when I opened up a CSV file in Vim, I would occasionally see something that looked like:

hello world^M
how are you doing good sir?^M
this is the last line^M

These files were created by an application that encoded the newline as \r\n, and the \r remained as an artifact (^M) when viewing the file with an application that defines newlines as \n. In this case, I would either process the file using dos2unix or strip the file of these carriage returns by issuing the following command:

:%s/\r//g
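If you would rather do this in a script than in the editor, the same cleanup can be sketched in Python. This is a rough stand-in for the conversion, not how dos2unix itself is implemented, and crlf_to_lf is a hypothetical helper name. Unlike the Vim command above, it only touches carriage returns that are immediately followed by a line feed:

```python
def crlf_to_lf(data: bytes) -> bytes:
    """Replace every \\r\\n pair with a single \\n."""
    # A lone \r (not followed by \n) is left untouched on purpose.
    return data.replace(b"\r\n", b"\n")


if __name__ == "__main__":
    windows_text = b"hello world\r\nthis is the last line\r\n"
    print(crlf_to_lf(windows_text))
```

Working on bytes rather than str keeps Python from applying its own newline translation along the way.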

In a similar vein, you may sometimes run into something that looks like this:

hello world^Mhow are you doing good sir?^Mthis is the last line^M

These files were created by an application that defined the newline as a single \r, and can be fixed by replacing each carriage return with a newline:

:%s/\r/\r/g

or

:%s/^M/\r/g

(you can generate the ^M control character with either <ctrl-v><ctrl-m> or <ctrl-v><enter>)
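Outside of Vim, the same replacement can be sketched in Python. The helper below (normalize_newlines is a name I made up) handles both cases from this post: it first collapses \r\n pairs so they do not turn into two newlines, then converts any remaining lone \r:

```python
def normalize_newlines(data: bytes) -> bytes:
    """Convert \\r\\n and lone \\r line endings to \\n."""
    # Order matters: handle the two-character pair first, otherwise
    # each \r\n would become \n\n and introduce blank lines.
    return data.replace(b"\r\n", b"\n").replace(b"\r", b"\n")


if __name__ == "__main__":
    cr_only = b"hello world\rhow are you doing good sir?\rthis is the last line\r"
    print(normalize_newlines(cr_only))
```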

If :%s/\r/\r/g looks confusing, I don’t blame you. In Vim's substitute command, \r means something different on each side: in the search pattern it matches a carriage return, while in the replacement string it inserts a line break. Consequently, the command above matches carriage returns and replaces them with newlines, which gets rid of the ^M artifacts.

As for the origins of carriage returns and line feeds, it’s helpful to think about them in terms of typewriters. The carriage return moves the carriage back to the beginning of the current line, and the line feed introduces a new line by shifting the paper up. What we conceptualize as a newline today is an abstraction of the two mechanical operations that were required to accomplish the same thing on a typewriter!

References:
Newline

About Eugene Kashida
I tell browsers what to do.
