ZSH global aliases
Posted on February 12, 2014
Just a short post about two useful global aliases I created. ZSH global aliases are basically text substitutions which are expanded before the command is executed. This allows them to be placed anywhere on the line, not just at the start like traditional aliases. Bash (as far as I know) does not have an analog to ZSH global aliases, but I have found them very useful.
Both of them produce the same set of lines (the unique lines in a file), but in two different ways.
The first uses awk to hash the lines seen, and only prints a line if it is not already in the hash. This finds the unique lines without sorting, which preserves the original order and is usually much faster. The only issue is that it can exhaust system memory on extremely large files, since every distinct line is kept in the hash.
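As a rough sketch of the hashing idea, the awk idiom below prints a line only the first time it appears. The function name `uniq_hash` and the alias name `HU` are made up for illustration and are not necessarily the names used in this post. The alias is only defined when the script runs under zsh.

```shell
# uniq_hash: hypothetical helper name. awk keeps a counter per line in the
# associative array `seen`. The pattern is true only while that counter is
# still 0, i.e. the first time a line shows up, so duplicates are dropped
# and first-seen order is kept.
uniq_hash() {
  awk '!seen[$0]++' "$@"
}

# Hypothetical zsh global alias wrapping the same pipeline stage.
# Guarded because `alias -g` does not exist in bash or sh.
if [ -n "${ZSH_VERSION:-}" ]; then
  alias -g HU='| awk '\''!seen[$0]++'\'
fi

# Small demonstration: duplicates of "b" and "a" are removed, order is kept.
printf 'b\na\nb\nc\na\n' | uniq_hash
```

With the alias defined in zsh, a line such as `cat some_file HU` would be rewritten to `cat some_file | awk '!seen[$0]++'` before it runs (`some_file` is a placeholder).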
How much faster is hashing for uniques rather than sorting for uniques?
Note that these commands use GNU shuf to shuffle lines randomly.
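A sketch of how such a test file could be built, assuming GNU coreutils (`seq`, `shuf`) is installed. The file name, the number of distinct values, and the repeat count are arbitrary choices for illustration, not the ones used for the timings in this post.

```shell
# Build a shuffled file with many duplicate lines (all sizes are arbitrary).
workdir=$(mktemp -d)
testfile="$workdir/shuffled.txt"

# Write the numbers 1..1000 five times over, then shuffle the 5000 lines
# so that duplicates are scattered throughout the file.
for i in 1 2 3 4 5; do
  seq 1 1000
done | shuf > "$testfile"

# Total number of lines, duplicates included.
wc -l < "$testfile"
```

A much larger file would be needed before the difference between hashing and sorting becomes noticeable.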
Now let's count the unique lines in the files
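A minimal sketch of the two counts side by side, using a small inline sample rather than the large files timed in this post. The function names are hypothetical. Prefixing each pipeline with `time` in an interactive shell is one way to compare their speed on real data.

```shell
# Sample input with duplicates; it stands in for a much larger file.
sample=$(printf 'pear\napple\npear\nfig\napple\npear\n')

# Hypothetical helper names for the two strategies.
# Hashing: remember each line in an awk array, no sorting needed.
count_uniq_hash() { awk '!seen[$0]++' | wc -l; }
# Sorting: sort first so duplicates become adjacent, then let uniq drop them.
count_uniq_sort() { sort | uniq | wc -l; }

# Both pipelines report the same number of unique lines.
printf '%s\n' "$sample" | count_uniq_hash
printf '%s\n' "$sample" | count_uniq_sort
```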
So you get the same results 16X faster using the hashing strategy.
And let's use the aliases which started this whole mess
All in all, this allows you to find unique lines faster and with less typing than the | sort | uniq pattern.