This article covers the basic tools and technologies to use when taking the first steps in big data analysis.
- Linux as the base OS
- For basic data processing:
- Bash shell: environment for running multiple command-line Linux tools for data manipulation
- Comes pre-installed with Linux; check the Reference manual for usage
- Bash might not be the default Linux shell; see how to switch to it
- Learn Bash by examples
- Most important Linux commands and concepts to master for data manipulation:
- AWK – simple data reformatter with compact coding features
- Python – an easy-to-learn, effective programming language with a huge number of libraries available for various tasks. Great for data manipulation from the command line.
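To give a feel for the kind of data manipulation these tools are used for, here is a minimal Python sketch of a typical AWK-style task: summing a numeric column per key. The function name `sum_by_key`, its parameters, and the sample records are assumptions made for illustration; they do not come from any particular library or dataset.

```python
from collections import defaultdict


def sum_by_key(lines, key_col=0, value_col=1):
    """Sum the numeric field `value_col` for each distinct value of `key_col`.

    Fields are split on runs of whitespace, similar to AWK's default.
    Lines with too few fields or a non-numeric value are skipped.
    """
    totals = defaultdict(float)
    for line in lines:
        fields = line.split()
        if len(fields) <= max(key_col, value_col):
            continue  # too few fields (this also covers blank lines)
        try:
            value = float(fields[value_col])
        except ValueError:
            continue  # non-numeric value, for example a header row
        totals[fields[key_col]] += value
    return dict(totals)


# Hypothetical sample input: "<host> <bytes>" records.
sample = ["host bytes", "web1 120", "web2 80", "web1 30"]
for key, total in sorted(sum_by_key(sample).items()):
    print(key, total)
```

Because the function accepts any iterable of lines, passing `sys.stdin` instead of the sample list would let the same logic run as a filter in a Bash pipeline.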
- A big data analysis framework, chosen based on the type of data analyzed. For first-step tutorials, our suggestion would be: