I've got a couple of related questions I guess...
I was wondering if anyone has any insight into any interesting history that has been done utilising "big data" - by which I mean, mining huge datasets in a way that has only been possible in the last decade or two? Did any of this sort of analysis overturn any previously prevalent explanations for events?
Has any thought been given by historians as to how historians of the future will deal with the huge amounts of data we generate now? Rather than worry about a lack of sources, won't future historians be drowning in billions of tweets, e-mails, even Reddit posts? Has any thought been given to how all of this information might be parsed in a meaningful way?
Hoping this doesn't break the 20-year rule, as I'm asking about the practice of studying history rather than about recent events per se!
Well, one limitation of data mining is that the data has to exist: there must be records or figures available to be mined and explored. That restricts its scope mostly to more recent history, and it also skews attention toward certain areas, such as economic history, for which we have large quantifiable datasets.
As far as interesting uses go, one of the most interesting is THOR, a US Air Force program to map every single bomb dropped by the US since WWI, using paper reports and modern databases, in order to examine the effectiveness of air campaigns and bombing targets. A lot of it is of course used to refine and improve tactics and strategies, but I'm sure a military historian could derive a lot of interesting historical insights from it.
Likewise, for US military historians, combat and casualty figures are being documented and explored to provide insight into military operations. For instance, historical data has been compiled into databases of casualties sustained in war. Example: American War and Military Operations Casualties: Lists and Statistics
As for how future historians will deal with it: a lot of this is speculation, but it will likely be parsed as it is today, with automation sifting through the data to help historians separate relevant and historically accurate material from the massive amounts of irrelevant and inaccurate data.
I'm not a historian, but I do work with big data as part of my day-to-day work, so I may be able to answer this.
The type of big data I work with is biological, and one of the major ways big data has already affected the study of history is through genetics.
Most people, I'm sure, have heard of the study which showed that about 0.5% of the world's population is descended from Genghis Khan. It was originally published in the American Journal of Human Genetics in a report called The Genetic Legacy of the Mongols. In brief, the study involved sequencing many different men's Y-chromosomes, which makes it possible to trace their male lineage. A genetic analysis was then used to spot unusual features shared by many of the Y-chromosomes and thus infer a common ancestor, with the global distribution of these features among different populations suggesting a link to the Mongols.
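To make the "spot shared features and infer a common ancestor" step a little more concrete, here is a minimal sketch of one way such a grouping could be counted. It is not the published method: the marker values, the function names (`hamming_steps`, `star_cluster`), and the one-step threshold are all assumptions made up for illustration.

```python
from collections import Counter

def hamming_steps(a, b):
    """Number of marker positions at which two haplotypes differ."""
    return sum(1 for x, y in zip(a, b) if x != y)

def star_cluster(haplotypes, max_steps=1):
    """Return the most common haplotype and every sample within max_steps of it."""
    modal, _ = Counter(haplotypes).most_common(1)[0]
    members = [h for h in haplotypes if hamming_steps(modal, h) <= max_steps]
    return modal, members

# Toy data (invented): each tuple is a hypothetical set of repeat counts
# at four Y-chromosome markers for one man.
samples = [
    (10, 16, 25, 10),
    (10, 16, 25, 10),
    (10, 16, 25, 10),
    (10, 16, 26, 10),  # differs from the most common type at one marker
    (13, 14, 22, 11),
    (12, 15, 23, 11),
]

modal, members = star_cluster(samples)
share = len(members) / len(samples)
print(f"modal haplotype {modal} and near neighbours: {share:.0%} of samples")
```

An unusually large group of identical or nearly identical lineages is the kind of signal that suggests a recent shared male ancestor; tying it to a particular person takes the geographic and historical evidence the study also relied on.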
Now, that study was from 2002, when genetic sequencing was in its infancy. It is now much cheaper, and we can sequence an individual's DNA relatively easily. To see the potential for the future, just read this news report in Nature about a genome hacker creating the largest ever family tree, containing 13 million individuals.
So what can big data genetics do in the future? From DNA sequencing of the modern population we can trace past migrations of human populations. This was actually done fairly notably back in 2001 as part of a BBC series called Blood of the Vikings, which examined where in Britain the Vikings settled, leaving their DNA behind. With more and more people across the world being sequenced, we should be able to build a more detailed picture of past human migrations.
You may also be able to find genetic marks in modern DNA left by other historical events. We can already see evidence of the Mongol invasions in the modern human genome, and the approach has the potential to reveal smaller events such as the sack of Magdeburg.
One thing I know for certain this field has disproved is the theory that Polynesia was populated by people from South America who reached it by raft, as proposed by Thor Heyerdahl in Kon-Tiki. In the 1990s scientists examined mitochondrial DNA from Polynesians and found that they are most closely related to people from Southeast Asia, not South America.
Interestingly, non-human genetic data is also being studied. I remember reading in 1491 by Charles C. Mann how maize was remarkable for being domesticated from a non-edible species, teosinte. Recent scientific work on the population genetics of different varieties of maize is giving us some insight into how this was done.
But I don't want to dismiss the tools. The tools matter, even if they are only part of an argument rather than the provider of the argument. So, as a very easy, cheap example: whenever I want to track the rise and fall of a given term, I just pop over to Google Ngram Viewer and see what it tells me. It's never the argument, but it can be part of an argument. Here's one of my favorite examples: What age are we in? You can see a lot of interesting trends here about how we define ourselves relative to technology. That can't be the argument in and of itself, but it meshes well with other historical discussions.
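For anyone curious what "tracking the rise and fall of a term" boils down to, here is a rough sketch of the underlying count: occurrences of a word divided by total words, per year. This is not how Google Ngram Viewer is implemented and it doesn't call any Google service; the tiny corpus and the function name `relative_frequency` are invented for illustration.

```python
from collections import defaultdict

def relative_frequency(corpus, term):
    """Map each year to (occurrences of term) / (total words) for that year."""
    hits = defaultdict(int)
    totals = defaultdict(int)
    term = term.lower()
    for year, text in corpus:
        words = text.lower().split()
        totals[year] += len(words)
        hits[year] += words.count(term)
    return {year: hits[year] / totals[year] for year in sorted(totals) if totals[year]}

# Invented toy corpus of (year, text) pairs; a real one would hold millions of books.
corpus = [
    (1950, "the atomic age has begun"),
    (1950, "life in the atomic age"),
    (1995, "welcome to the information age"),
    (1995, "the atomic clock is accurate"),
]

atomic = relative_frequency(corpus, "atomic")
for year, freq in atomic.items():
    print(year, round(freq, 3))
```

Dividing by the total word count matters because far more text survives from some years than others, so raw counts alone would mostly track how much was published.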
Similarly, citation analysis can actually provide interesting arguments about changes over time. So you can track how many articles are submitted on the philosophy of quantum mechanics over time, for example, and note that the proportion relative to other articles in physics drops dramatically from the 1950s through the 1970s. This is then a nice, compelling data point to add to a discussion of changing trends in physics in the Cold War: a shift away from the abstract, in part because the abstract doesn't lend itself to scaling up and doesn't get you government grants.
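The "proportion relative to other articles" idea is simple enough to sketch. The records below are made up purely to show the arithmetic (they are not real publication data), and `topic_share_by_decade` is a hypothetical helper, not part of any bibliometrics library.

```python
from collections import Counter

def topic_share_by_decade(articles, topic):
    """Fraction of articles in each decade that are tagged with the given topic."""
    total = Counter()
    on_topic = Counter()
    for year, topics in articles:
        decade = year // 10 * 10
        total[decade] += 1
        if topic in topics:
            on_topic[decade] += 1
    return {decade: on_topic[decade] / total[decade] for decade in sorted(total)}

# Invented records: (publication year, set of subject tags).
articles = [
    (1935, {"foundations"}),
    (1938, {"foundations", "nuclear"}),
    (1939, {"nuclear"}),
    (1955, {"nuclear"}),
    (1957, {"solid state"}),
    (1958, {"foundations"}),
    (1959, {"particle"}),
]

shares = topic_share_by_decade(articles, "foundations")
for decade, share in shares.items():
    print(f"{decade}s: {share:.0%} of articles")
```

Using a share rather than a raw count is the point: the total number of physics papers grew enormously over the period, so a topic can shrink in relative terms even while its absolute count rises.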
So the tools matter. Digital tools have radically transformed how younger historians do research. Our training, note-taking, and archive use sometimes vary drastically from how our advisors, even our "young" advisors (e.g. people who got tenure recently), wrote their dissertations. And this is something historians have spent a lot of time talking about.
As for what to do about it... historians do try to encourage (and sometimes require) scientists and important figures to save their e-mails. Of course not everyone likes this; e-mail accounts are often a blend of professional and unprofessional, and even the "professional" correspondence is a lot more casual than most written communication. I doubt the success rate will be high. Unless the NSA is secretly archiving everything and will let historians of the future look at it, I doubt much will be there. (As an aside, the FBI often transcribed wiretapped phone conversations — their archives can give rare insights into casual telephone conversations, the sort of thing that is rarely written down. Of course, it does so at the cost of violating the privacy of the person in question.)
But historians will adapt. They always do. That's the job. They will find ways to make interesting arguments and tell interesting stories. They will be limited by the source material, but they always are, always have been, and always will be.