For better or worse, we are no longer living in the age of data; we are living in the age of big data. As many of us know, big data means large amounts of data being collected, analysed, and used for specific purposes. But this data doesn’t just magically appear: it has to be transferred, and that process gives us today’s word, data dump.
The term data dump
Starting with the term itself, data dump first appears in the journal Operations Research in 1965: “This data dump plays an important part on the SIMPAC system, since very little analysis is done during the run, this all being reserved for a post-run program.” The first term in this compound, data, the plural of the Latin datum, meaning ‘a thing given’, first appears in English via Sir William Batten’s Most Easie Way Finding Sunnes Amplitude (1630). Strictly in the sense of information in digital form, however, it can be attributed to Calvin N. Mooers in a lecture given in 1946 and found in The Moore School Lectures (1985), where Mooers states: “The data is stored in the memory in a systematic fashion with the points numbered in sequence.” The second term, dump, meaning ‘to throw down or drop with force’, likely originated in Scandinavia, as Old Norse possesses the word dumpa (‘to beat’). Though this generally understood meaning of the term first appears in an 1868 edition of The Commercial and Financial Chronicle, its first association with computer data occurs in 1956, when the journal Computers and Automation explained: “Dump check, a check which usually consists of adding all the digits during dumping, and verifying the sum when retransferring.”
The devil is always in the details
In practice, a data dump is simply the transfer of a large amount of data between two systems, typically over a network connection. Though it seems simple enough, the devil is always in the details. In most cases, data dumps are utilitarian: the information is used by software or analysed by people for a specific purpose, such as legal data extraction, business planning, product development, or even governmental census data for service improvements.
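To make the idea concrete, here is a minimal sketch of a data dump using Python’s standard sqlite3 module. The table and rows are invented for illustration; the point is that `iterdump()` serialises a database into plain SQL statements that a second system can replay to reconstruct the data.

```python
import sqlite3

# Build a small in-memory database to stand in for the source system.
src = sqlite3.connect(":memory:")
src.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT)")
src.executemany("INSERT INTO users (name) VALUES (?)", [("Ada",), ("Lin",)])
src.commit()

# iterdump() yields the SQL statements that recreate the database;
# writing them out (to a file or over a network) is a simple,
# portable form of data dump.
dump_sql = "\n".join(src.iterdump())

# A second system can rebuild the data from the dump alone.
dst = sqlite3.connect(":memory:")
dst.executescript(dump_sql)
rows = dst.execute("SELECT name FROM users ORDER BY id").fetchall()
print(rows)  # [('Ada',), ('Lin',)]
```

Real-world dumps work the same way at larger scale: a serialised snapshot leaves one system and is loaded, parsed, or analysed by another.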
On the other hand, though they may be utilitarian and commonplace, that doesn’t mean data dumps should escape scrutiny. First and foremost, the overall relevance, accuracy, and nondeceptive presentation of the data need to be examined. Second, especially in the age of privacy, the pitfall of personally identifiable information needs to be addressed: as can be seen in the recent complaint against Europol (as well as numerous complaints against many tech companies), the excessive collection and transfer of mountains of data has the potential to violate personal privacy and data security laws and regulations.