How do historians take notes when working with sources? Do they use any fixed conventions or computer programs?

by cherry_picked_stats

This might sound like a weird question but hear me out.

It seems to me that a significant amount of historical 'basic facts' (for lack of a better term) are structurally similar: they frequently include some combination of a fairly finite set of categories, e.g. person, personal role, place, political/religious entity, event, group, date, etc. Most of those categories are quite rigidly set, particularly in political and military history.

For example, utterances like "Person X died in the year Y in the place P according to the source S1" or "The kingdom of K1 fought the battle of B with the kingdom of K2 in the year Y according to the source S2" encompass thousands of actual sentences to be found in history books. At first glance, taking and sharing notes on such 'basic sourceable facts' in some predetermined, common and conventional form seems to me, a non-historian, like a good idea.
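A minimal Python sketch of what one such record might look like. Everything here is a hypothetical illustration: the class name `SourcedFact`, its fields, and the placeholder values are assumptions made for the example, not an existing convention or tool.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SourcedFact:
    """One 'basic sourceable fact'. All names here are hypothetical."""
    kind: str            # category of fact, e.g. "death" or "battle"
    participants: tuple  # people or political entities involved
    year: int            # year the source gives for the event
    place: str           # where the source says it happened
    source: str          # the source making the claim

    def render(self) -> str:
        """Turn the record back into a sentence-like note."""
        who = " and ".join(self.participants)
        return (f"{self.kind}: {who} in {self.year} at {self.place} "
                f"(according to {self.source})")


# Placeholder values only; these are not real historical data.
death = SourcedFact("death", ("Person X",), 1000, "place P", "source S1")
battle = SourcedFact("battle", ("kingdom K1", "kingdom K2"), 1001,
                     "battle site B", "source S2")

print(death.render())
print(battle.render())
```

Both example sentences fit the same five fields, which is the structural similarity described above; whether real sources fit so neatly is a separate question.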

Yet I have never heard of anyone doing that, so either I've heard too little or it actually isn't a good idea at all. So: if the former, what systems are used, and if the latter, why not?

restricteddata

My sense is that every historian takes notes differently, and that there are no fixed conventions or programs. Most "systems," when you ask historians about them (which I have, both formally and informally, as I was asked to sort of survey people on this in my field a few years back) are highly individual and idiosyncratic. My sense is that 40 years ago people getting PhDs in history were taught pretty specific approaches to this using notecard systems (and if you get a "practical guide to doing history work" from the 1980s-90s you'll find descriptions of this), but the ubiquity of personal computers has totally shattered that, and no universal system ever arose in replacement. There is no uniformity even in managing citations, which is an even easier problem in principle than managing facts: some people use one of a handful of common programs (Zotero, EndNote, Mendeley, etc.), some use their own homemade systems, some use essentially no system at all (I am somewhat in the last category, because I find it easier to just hand-write citations as I use them or look them up than to manage a database and its inevitable idiosyncrasies; I am not claiming this is an ideal approach, just what I have ended up doing over the years, and switching to another system is a big enough time investment that I am not eager to do it!).

The problem with your scheme is that the category of "fact" you are talking about is an almost insignificant part of what historians do. Yes, we need to know some dates. We usually do not even care as much about birth and death dates as you might think given their prevalence in what we write. These days we universally just Google these things as we need them while writing, and usually Wikipedia pops up and we take for granted that it's probably right on this sort of thing (though we really ought not take that totally for granted).

The problem with your scheme is also that while some historical data is easily structurable as simple declarative relationships, a lot is more tricky and definitional. How would such a database handle something as seemingly simple as "when did World War II start?" WWII is not a natural category and does not have a natural start date (like a birth date). You could date it to the invasion of Poland by the Germans... or you could date it to the invasion of China by the Japanese... or some other date. When we choose one of those dates as the "start date" we are making a value judgment about what is important and what we care about, and rendering that into "factual data" obscures the value judgment and interpretive aspects of it.
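To make the difficulty concrete, here is a rough Python sketch. The `DatingClaim` record and the `start_dates` helper are hypothetical names invented for this illustration. Note that dating the Japanese invasion of China from the Marco Polo Bridge Incident of 1937 is itself one choice among several possible ones.

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class DatingClaim:
    """One way of dating an event (hypothetical record, for illustration)."""
    event: str   # the thing being dated
    marker: str  # the occurrence chosen as the boundary
    when: date   # the date that choice implies


claims = [
    DatingClaim("World War II start",
                "German invasion of Poland", date(1939, 9, 1)),
    # Assumption: dated here from the Marco Polo Bridge Incident.
    DatingClaim("World War II start",
                "Japanese invasion of China", date(1937, 7, 7)),
]


def start_dates(event, claims):
    """Return every distinct date proposed for an event, earliest first."""
    return sorted({c.when for c in claims if c.event == event})


candidates = start_dates("World War II start", claims)
print(candidates)
```

A query for "the" start date comes back with two candidates more than two years apart. Collapsing them into a single field means picking one, and that pick is the value judgment described above; the schema just hides it.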

When I am taking notes on something that is heavily date-dependent, that usually means I am working with a lot of documents and I am taking notes on those. Those notes are not of a simple factual form, they are all over the place. I just do these in Word. Let me give you an example from an actual notes file of mine where I correlated many documents based on the date, because I was moving through a tight timeline:

April 25, 1945, 12:00 noon.

Meeting with Stimson, Groves, Truman. Truman given Groves’ report (of April 23) to read, talk about.

First real meeting with HST about bomb. Stimson mostly dominates, discussion mainly on foreign policy (Russian) implications. Marshall did not attend final meeting, did not want press to get suspicious. Groves slips in side door of White House. Leslie R. Groves, “Report of Meeting with the President,” (25 April 1945), CTS Roll 3, Target 7, Folder 24, "Memorandums to (Gen.) L. R. Groves Covering Two Meetings with the President (Dec. 30, 1944, and Apr. 25, 1945)."

Stimson had drafted a memo, with Bundy, went over it with Harrison, showed it to Marshall and Groves. Stimson dates meeting at 12 o’clock noon. Calls Groves’ report the “manufacturing operation.” Meeting lasted 45 minutes. Stimson Diary (25 April 1945).

[some actual paragraphs from the memos were here, for quoting or not later]

June 6, 1945, 10:15am

Stimson has meeting with HST. Reports on Interim Committee meetings (which Byrnes had already mentioned to HST). Stimson reported:

That there should be no revelation to Russia or anyone else of our work in S-1 until the first bomb had been successfully laid upon Japan. [… lots more omitted]

Along with matters relating to Potsdam, international control. Stimson, “Memorandum of conference with The President,” (6 June 1945), NSA, which is same content as Stimson Diary (6 June 1945).

June 18, 1945, 3:30pm.

Big meeting at White House about war. Just military officers (no Stimson). Bomb not discussed — talking about invasion plans, casualties, etc. Lots of disputes among principals as to how many casualties, question of Russian participation, unconditional surrender. Marshall there. Truman approves Kyushu operation. Minutes of a Meeting at the White House (18 June 1945), NSA.

...and so on for 14 pages for this particular set of documents. You can see that this does not lend itself to simple factual relations, but is a mixture of notes to myself, abbreviations for both people (HST = Harry S. Truman) and sources (NSA = National Security Archive at George Washington University), with some citation info (not complete citations, just quick ones that will let me see where something is from), and all of the notes are about the specific questions and topics I am interested in (and nothing else).

So that's not really amenable to a highly structured system. That doesn't mean you couldn't make a program that would allow one to take notes on documents, or to have lots of different notes exist in a similar environment. Such programs do exist, though none have ever been appealing enough to me to be worth changing my existing workflow, because most programs are not written with my workflow and concerns in mind (sigh), and are usually not flexible enough to be adapted to exactly what I would like them to do (double sigh).

But let's back up and suppose, just for the sake of argument, that your database would be useful to any historian. Who is going to populate it with data? Who is going to decide what facts are worthy of inclusion based on the universe of possible facts? What would the criteria even be for what kinds of facts are worth knowing? I bring this up because I think that not only are there deep epistemological issues that would need to be resolved (and frankly, make the entire project seem untenable to me), but there are just labor issues as well at the core. It's not a feasible project, and no professional historian wants to spend all day doing data entry of this boring sort, and if they had access to a labor pool to do boring tasks (like student researchers), they'd probably want them to be doing something more obviously useful than this (like, I need a student to just make an index of the books in my office, so I can look up what I already have and where it is very quickly — that would be more useful to me than a database of essentially random facts, and just as dull to compile).

"Ah," you might then think, "what if we could make an AI that could assemble this for us?" And this is basically what some kinds of "learning model" AIs do — create networks of relationships — but a) you have taken one Hard task and now made it a Very Very Hard task, because AIs can't "read" or "understand facts" without a lot of work, and b) you have not actually gotten rid of any of the epistemological issues, and introduced new ones with things like algorithmic bias, and c) you've multiplied dramatically the already-present problem of validating the accuracy of the dataset.

So I think this is the wrong way to think about historical data in general, and how historians work. The main job of the historian is not the compilation of the little names and dates you see in textbooks. It is about richer understandings of what happened in the past, and the interpretive frameworks that help us understand the meaning of them. None of which, in my mind, reduce themselves to this kind of database problem or model. Which is one reason I am not too worried about losing my job to an AI sometime in the future; it is just not that kind of job. :-)