Saturday, July 13, 2019

Adding line numbers to a large number of entries in a file

After a looooong time...

I had a requirement to create a sequence file for a Map-Reduce job.  I had a large set of files (some 1000s) in a folder and needed to create a keys.txt for the Sequence file creator.

At first I thought I would have to go the painful way: write a Python module that gets the list of files, adds an index value while walking through the list, and then saves the result in a file named keys.txt.

Then I found out that the good old editor vi (Vim, strictly speaking, since the command below relies on a Vim expression) can easily do this job for me.  Here are the steps:

1) Go to the folder where you have the set of files.
2) Use ls -1a > keys.txt to get just the file names, one per line.  One caveat: the list will also contain ., .. and keys.txt itself.  We need to remove them.
3) Open keys.txt in vi.
4) Remove the 3 lines containing the entries mentioned in 2).
5) In vi command mode, enter the following:

:%s/^/\=printf('%d,', line('.'))

and press enter.

The above command prefixes each line with its line number followed by a comma. For example:
1,1.zip
2,2.zip
...
...
...
500000,500000.zip
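
For comparison, here is a rough sketch of what the Python route I first had in mind might look like. The function name write_keys and its parameters are made up for illustration, and I am assuming that only regular files in the folder should be listed and that keys.txt itself should be skipped. Note that os.listdir never returns . or .., so there is nothing to clean up by hand.

```python
import os


def write_keys(folder, out_name="keys.txt"):
    """Hypothetical helper: write 'index,filename' lines for the files in folder."""
    # Keep regular files only and skip the output file itself.
    names = sorted(
        name for name in os.listdir(folder)
        if name != out_name and os.path.isfile(os.path.join(folder, name))
    )
    out_path = os.path.join(folder, out_name)
    with open(out_path, "w") as out:
        # enumerate(..., start=1) plays the role of line('.') in the vi command.
        for index, name in enumerate(names, start=1):
            out.write("%d,%s\n" % (index, name))
    return out_path
```

Calling write_keys(".") from inside the folder should produce the same kind of keys.txt as the vi steps. The names are sorted as plain strings, so 10.zip comes before 2.zip.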

And that is all folks....
