My daily job requires writing scripts to process data, mostly stored in
CSV format. I recently found that a single
bash command is often easier and faster than
R for data preprocessing tasks such as merging, filtering, subsetting, or even plotting. This notebook lists a set of utility
bash commands, organized by use case, for data preprocessing.
- Merge csv files by row without headers.
awk 'FNR > 1' *.csv > output.csv
So far this is the fastest way I have found to merge CSV files by row while excluding the header (assumed to be the first row) of each file. See reference.
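As a small sketch of how the `FNR > 1` filter behaves, the snippet below runs it on two made-up sample files in a scratch directory (the file names `part1.csv`, `part2.csv`, and `merged.out` are only for illustration). `FNR` is awk's per-file line counter, so the first line of every input file is skipped. The output is written to a name that `*.csv` does not match, since a leftover `output.csv` from an earlier run would otherwise be picked up by the glob as an input. The last command is a variant, not from the original recipe, that keeps one header by also printing the very first line read (`NR == 1`).

```shell
# Scratch directory so the glob only matches the sample files.
workdir=$(mktemp -d)
printf 'id,value\n1,a\n2,b\n' > "$workdir/part1.csv"
printf 'id,value\n3,c\n' > "$workdir/part2.csv"

# Drop the first line of every file (FNR restarts at 1 for each input file).
merged="$workdir/merged.out"
awk 'FNR > 1' "$workdir"/*.csv > "$merged"

# Variant (assumption: all files share the same header): keep the header once.
merged_with_header="$workdir/merged_with_header.out"
awk 'FNR > 1 || NR == 1' "$workdir"/*.csv > "$merged_with_header"

cat "$merged"
cat "$merged_with_header"
```

The variant relies on the shell expanding the glob in sorted order, so the header comes from whichever file sorts first.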
- Prepend the header (first row) of one file to the beginning of another file.
head -n 1 input.csv | cat - other.csv > temp && mv temp other.csv
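A minimal sketch of the header-prepend step on made-up sample data, assuming `other.csv` has no header of its own. `cat -` reads the header line from the pipe and then appends `other.csv`. The result goes to a temporary file first because redirecting straight into `other.csv` would truncate it before `cat` reads it, and `mv` then replaces the original. Using `mktemp` for the temporary name is a choice made here to avoid clobbering an existing file called `temp`.

```shell
# Scratch directory with a headered file and a headerless file.
workdir=$(mktemp -d)
printf 'id,value\n1,a\n' > "$workdir/input.csv"
printf '2,b\n3,c\n' > "$workdir/other.csv"

# Build the combined file in a temporary location, then replace other.csv.
tmp=$(mktemp "$workdir/tmp.XXXXXX")
head -n 1 "$workdir/input.csv" | cat - "$workdir/other.csv" > "$tmp" \
  && mv "$tmp" "$workdir/other.csv"

cat "$workdir/other.csv"
```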