Weather data analysis and visualization - Big data tutorial Part 6/9 - SED example

Tutorial big data analysis: Weather changes in the Carpathian-Basin from 1900 to 2014 - Part 6/9

Manipulating output data with the Linux SED command - SED example

The Kartograph tutorial uses JSON as its data format, so I needed my own data in the same format. An example:

[{"Weather": "ARAD RO", "ll": [21.35, 46.1331], "1882": 742.0 ... {"Weather": "MURSKA SOBOTA RAKICAN SI", "ll": [16.2, 46.7], ... "1962": 931.0}]

The resulting data from Pig is not compatible with it, because it has different markup, shown here:

((ARAD RO,46.1331,21.35),{((1882,742)),((1883,680)),((1884,656)),((1885,656)),((1886,770)),((1887,718)),((1888,467)),((1889,893)),((1890,570)),
...
92))})
((DEVA RO,45.8667,22.9),...

So I used the Linux sed command to alter the resulting dataset from Pig.

SED example

Sed command 1

sed 's/((\([A-Z ]*\),\([0-9.]*\),\([0-9.]*\))/[{"Weather": "\1", "ll": [\3, \2], /g' rain_orig.csv > rain_new.csv && cat rain_new.csv

This command looks for (( followed by any number of capital letters or spaces, then a comma, then two blocks of digits and dots separated by a comma, then a closing ). It therefore matches ((ARAD RO,46.1331,21.35) and changes that string to [{"Weather": "ARAD RO", "ll": [21.35, 46.1331], with the two coordinates swapped.

The syntax:

sed 's/regular expression/replacement/g' input > output

For each line of the input, find every match of the regular expression and change it to the replacement.
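The substitution syntax can be tried on a single record header before touching the real file. The sketch below assumes the groups are written as \( \) and the back-references as \1, \2, \3; the `sample` and `header` variable names and the sample line are made up for illustration and are not read from rain_orig.csv.

```shell
# Hypothetical one-line sample in the Pig output style (not the real file).
sample='((ARAD RO,46.1331,21.35)'

# \( ... \) marks a group in the pattern; \1, \2, \3 reuse the matched groups
# in the replacement. Plain ( and ) match literal parentheses.
header=$(printf '%s\n' "$sample" |
  sed 's/((\([A-Z ]*\),\([0-9.]*\),\([0-9.]*\))/[{"Weather": "\1", "ll": [\3, \2], /g')

printf '%s\n' "$header"
```

The replacement writes \3 before \2, which is why the two coordinates come out in the opposite order from the input.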

\1, \2 and \3 in Sed command 1 are back-references to the groups defined in the first part, each of which is enclosed in \( \) marks. In our case, \1 refers to ARAD RO. I parse rain_orig.csv, the output of Pig, and write the result to rain_new.csv.

Sed command 2

sed 's/{((\([0-9]*\),/"\1": /g' rain_new.csv > rain_new2.csv && cat rain_new2.csv

The following commands are built the same way. This one looks for {((number, and changes it to "number": which matches the first year of each record.

Sed command 3

sed 's/)),((\([0-9]*\),/, "\1":/g' rain_new2.csv > rain_new3.csv && cat rain_new3.csv

This one looks for )),((number, and changes it to , "number": which matches every following year. After running all three sed commands we have the needed results in the following format:

[{"Weather": "ARAD RO", "ll": [21.35, 46.1331], "1882": 742.0, "1883": 680.0, ...
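The two year-handling substitutions can also be tried together on a small fragment, without intermediate files. This is only a sketch: the `years` fragment is hand-made rather than taken from the real Pig output, and the `years` and `pairs` variable names are hypothetical.

```shell
# Hypothetical fragment covering three years (hand-made, not the real data).
years='{((1882,742)),((1883,680)),((1884,656'

# Two -e expressions run in order on each line:
#   1) {((YEAR,    becomes  "YEAR": 
#   2) )),((YEAR,  becomes  , "YEAR":
pairs=$(printf '%s\n' "$years" |
  sed -e 's/{((\([0-9]*\),/"\1": /g' \
      -e 's/)),((\([0-9]*\),/, "\1":/g')

printf '%s\n' "$pairs"
```

Passing several -e expressions to one sed call gives the same per-line result as running the commands one after another through rain_new.csv and rain_new2.csv; the separate files are mainly useful for inspecting each step.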

One manual task is left: properly closing the structure at the end of the JSON file by changing } , to }]. The JSON for the map is now available, and I've saved it to the directory of the new map visualization example.
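The closing step could probably be scripted as well. The sketch below assumes that the last line of the file ends with } , exactly as described; the `tail_line` sample and the variable names are made up for illustration.

```shell
# Hypothetical last line of the converted file, ending in "} ,".
tail_line='"1962": 931.0} ,'

# The $ address limits the substitution to the last line of the input;
# the pattern matches a closing brace, optional spaces, and a trailing comma.
closed=$(printf '%s\n' "$tail_line" | sed '$ s/} *, *$/}]/')

printf '%s\n' "$closed"
```

On the real file the same expression would be run with the file name as input and the output redirected to a new file, as in the three commands above.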