A couple of days ago there was a news story on the housing bubble in China. The report spoke of Ghost Cities: row upon row of concrete towers that have mushroomed around the country in anticipation of an upwardly mobile population, but with not a living soul having yet taken up residence. Check out 24.864N, 102.7954E to explore one such city in Yunnan province. It is interesting, though, that the orderly buildings in the picture on the right are surrounded to the northeast and to the west by a disorderly array of shanty towns. The area seems to be teeming with life all around, just not in those multistory buildings in the middle. The ghosts are as manufactured as the landscape, which brings me to some related imagery.
This took me back to my research days, when I was working with remote sensing satellite imagery. I came across the Large Area Crop Inventory Experiment (LACIE), which was carried out in the 1970s using satellite data gathered from Landsat. After reviewing the data, the researchers concluded that the wheat yield from the Soviet Union was going to be 20% below the estimate stated through official Soviet channels. Until then, such estimates had come from surveys and were likely influenced by geopolitical factors (euphemism alert). The final yield report from the Soviet Union was only 1% off the forecast made by LACIE.
I use the two examples above to explain why Big Data is different from legacy data feeds. It is the difference between getting a data sample from a potentially biased source and getting all the raw data there is, without any massaging or modulation.
On another note, LACIE was a Big Data initiative for reasons other than just data volume. The Landsat multispectral scanner had just 4 bands, with a resolution of 68m x 68m. The amount of data collected was less than what fits on a DVD today. The Big-ness was in the breadth and the scope of the initiative. LACIE was not just a forecast of crop yields; it established a methodology to foretell famine risk well in advance of the harvest. The scope was truly epic, and I mean that in both the literal and the colloquial sense of the word.
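For a rough feel of how small that data was, here is a back-of-envelope sketch in Python. Only the 4 bands and the 68m pixel size come from the numbers above. The scene size (about 185 km on a side), the one byte per sample, and the 4.7 GB DVD capacity are my own assumptions for illustration, and the function names are made up for this example.

```python
# Back-of-envelope sketch; values marked "assumed" are not from the post.
BANDS = 4                      # from the post: four spectral bands
PIXEL_SIDE_M = 68              # from the post: 68 m x 68 m pixels
SCENE_SIDE_M = 185_000         # assumed: a square scene about 185 km on a side
BYTES_PER_SAMPLE = 1           # assumed: one byte per pixel per band
DVD_BYTES = 4_700_000_000      # assumed: single-layer DVD capacity


def scene_bytes(scene_side_m=SCENE_SIDE_M, pixel_side_m=PIXEL_SIDE_M,
                bands=BANDS, bytes_per_sample=BYTES_PER_SAMPLE):
    """Approximate uncompressed size of one scene in bytes."""
    pixels_per_side = scene_side_m // pixel_side_m  # whole pixels only
    return pixels_per_side ** 2 * bands * bytes_per_sample


def scenes_per_dvd(dvd_bytes=DVD_BYTES):
    """How many whole scenes of that size would fit on one DVD."""
    return dvd_bytes // scene_bytes()


if __name__ == "__main__":
    print(f"one scene: about {scene_bytes() / 1e6:.1f} MB")
    print(f"scenes per DVD: {scenes_per_dvd()}")
```

Under those assumptions a full scene comes to roughly 30 MB, so a single DVD would hold well over a hundred of them. Change the assumed constants and the picture shifts, but the order of magnitude makes the point: the volume was modest, the scope was not.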