Scripting languages that support a command line interface offer a practical solution
Crunch: 1. To process, usually in a time consuming or complicated way. Connotes an essentially trivial operation that is nonetheless painful to perform. The pain may be due to the triviality's being embedded in a loop from 1 to 1,000,000,000. "Fortran programs do mostly number crunching." [1]
Data crunching is the process of automating the filtering and translation of data from one format to another. It frequently deals with text data, but it can also deal with XML, binary data and relational databases. The translations themselves are frequently very straightforward. In principle, they could easily be done by hand with a text editor, at least for text files. The difficulty, as the definition above suggests, comes when you have many records to alter. These records could be part of a file or a data stream. Normally, the term is not applied to processes involving statistical or numerical manipulation; these more involved processes are referred to as number crunching.
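As a rough illustration of the kind of translation described above, the following Python sketch converts comma-separated records to tab-separated ones and filters out empty records along the way. The function name csv_to_tsv and the choice of formats are assumptions made for this example only; any pair of record formats would serve equally well.

```python
import csv
import io


def csv_to_tsv(text):
    """Translate comma-separated records to tab-separated records.

    Hypothetical helper for illustration: whitespace around each field
    is trimmed, and records with no non-blank fields are dropped.
    """
    reader = csv.reader(io.StringIO(text))
    out = io.StringIO()
    writer = csv.writer(out, delimiter="\t", lineterminator="\n")
    for record in reader:
        fields = [field.strip() for field in record]
        # Filtering step: skip blank lines and records of empty fields.
        if not any(fields):
            continue
        # Translation step: same fields, different delimiter.
        writer.writerow(fields)
    return out.getvalue()


if __name__ == "__main__":
    sample = "name, city\n\nAda, London\n"
    print(csv_to_tsv(sample), end="")
```

Editing three records like this by hand would be trivial; the point of automating the step is that the same loop handles three records or three million without further effort.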
While this type of processing could be done with development tools such as Visual Basic, Delphi or C++, these tools are generally overkill for this type of work. A more practical approach is to use one or more of the scripting languages available. Commonly used tools for data crunching include Python, Java, Ruby and MySQL or SQLite. The common factor among all of these tools is that they support a command line interface. Why does this make a difference? Primarily because it is hard for other programs to deal with the output of GUI applications. It is also much easier to chain data crunching applications using a command line interface. This is important as many data crunching tools are single-function;...
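To make the chaining idea concrete, here is a minimal sketch of a single-function command line filter in Python. It reads tab-separated records from standard input and writes only the requested fields to standard output, so it can sit between other tools in a pipeline. The script name select_fields.py, the function names and the tab-separated format are all assumptions made for this illustration.

```python
import sys


def select_fields(lines, indexes, sep="\t"):
    """Yield each record reduced to the requested field positions.

    Positions that a short record does not have are silently skipped.
    """
    for line in lines:
        fields = line.rstrip("\n").split(sep)
        yield sep.join(fields[i] for i in indexes if i < len(fields))


def main(argv, stdin, stdout):
    """Run the filter; returns 0 on success and 2 on a usage error."""
    if len(argv) < 2:
        print("usage: select_fields.py INDEX [INDEX ...]", file=sys.stderr)
        return 2
    indexes = [int(arg) for arg in argv[1:]]
    # Stream record by record so large inputs never sit in memory.
    for record in select_fields(stdin, indexes):
        stdout.write(record + "\n")
    return 0


if __name__ == "__main__":
    main(sys.argv, sys.stdin, sys.stdout)
```

Because the script touches only standard input and standard output, a shell can connect it to whatever comes before or after it, for example `python select_fields.py 0 2 < records.tsv | sort` (records.tsv being a hypothetical input file). A GUI application offers no comparable hook for the next program in the chain.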





