Bash Snippets for Data Processing

My daily job requires writing scripts to process data, mostly stored in CSV format. I recently found that single bash commands are often an easier and faster way than Python, Perl, or R to handle data preprocessing tasks such as merging, filtering, subsetting, or even plotting. This notebook lists a set of utility bash commands for data preprocessing, organized by use case.

Merging

  • Merge CSV files by row, without headers.
awk 'FNR > 1' *.csv > output.csv

So far this is the fastest way I have found to merge CSV files by row while excluding the header (assumed to be the first row) of each file. See reference.
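To see what `FNR > 1` does, here is a small self-contained sketch on two hypothetical sample files. In awk, `FNR` is the line number within the current file and resets for every file, while `NR` keeps counting across all files. The second command is a variant not in the original snippet: it uses that difference to keep only the first file's header. The sample files sit in a hypothetical `parts/` subdirectory so the output never matches the input glob; if `output.csv` sits next to the inputs, a rerun includes it in `*.csv` and awk could end up reading the file it is writing to.

```shell
#!/usr/bin/env bash
# Sketch: merge CSV files by row with awk (sample data is hypothetical).
set -euo pipefail

workdir=$(mktemp -d)
trap 'rm -rf "$workdir"' EXIT
cd "$workdir"
mkdir parts

# Two sample CSV files that share the same header row.
printf 'id,name\n1,alice\n2,bob\n' > parts/a.csv
printf 'id,name\n3,carol\n' > parts/b.csv

# FNR restarts at 1 for every input file, so `FNR > 1` drops the first
# line (the header) of each file and keeps all data rows.
awk 'FNR > 1' parts/*.csv > merged_no_header.csv

# Variant: NR counts lines across all files, so `NR == 1` keeps only the
# very first header, followed by the data rows of every file.
awk 'NR == 1 || FNR > 1' parts/*.csv > merged_with_header.csv

cat merged_no_header.csv
cat merged_with_header.csv
```

The output files are written outside `parts/`, so rerunning the script never feeds a previous result back into awk.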

  • Prepend the header (first row) of one file to the beginning of another file.
head -n 1 input.csv | cat - other.csv > temp && mv temp other.csv

See reference.
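A small sketch of the corrected one-liner on hypothetical sample files. `cat -` reads standard input (the header line produced by `head`) before `other.csv`. The temporary file is needed because redirecting straight to `other.csv` would make the shell truncate it before `cat` had a chance to read it; `mv` then replaces the original only if the pipeline succeeded.

```shell
#!/usr/bin/env bash
# Sketch: prepend the header of input.csv to other.csv (hypothetical data).
set -euo pipefail

workdir=$(mktemp -d)
trap 'rm -rf "$workdir"' EXIT
cd "$workdir"

# input.csv has a header; other.csv holds data rows only.
printf 'id,name\n1,alice\n' > input.csv
printf '2,bob\n3,carol\n' > other.csv

# `cat -` emits stdin (the header from head) first, then other.csv.
# Write to a temporary file, then move it over the original.
head -n 1 input.csv | cat - other.csv > temp && mv temp other.csv

cat other.csv
```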
