I understand nuclear weapons readiness and reliability are sensitive issues. Have there been any publicly released studies of the availability and reliability of the American nuclear deterrent from a mechanical standpoint? Could the Minuteman system actually work under a surprise attack at 3:00 AM during a North Dakota blizzard?
From what I've gleaned, there has been only one test in the entire history of the nuclear triad in which a live strategic nuclear missile was tested from launch to detonation: on May 6, 1962, a Polaris A-2 missile with a live W47 warhead was fired by the USS Ethan Allen (SSBN-608) in the "Frigate Bird" test of Operation Dominic, detonating in the central Pacific Ocean. It remains the only American test of a live strategic nuclear missile.
Is this one test sufficient evidence for the reliability of the system?
This was a major item of concern during much of the Cold War, especially after the Limited Test Ban Treaty was passed. Barry Goldwater accused the government of creating a "Maginot line of missiles" on this basis: you couldn't test the weapons, so you couldn't have certainty in them.
Even the Frigate Bird test, which was supposed to end such concerns, used a modified (not strictly stockpile) warhead. As Curtis LeMay told Congress, "we have only had one test, it was not under fully operational conditions, we fired one Polaris out in the Pacific with a warhead on it. It was not truly operational. It was modified to some extent for the test."
The concern pre-dates these tests, of course. The Fat Man bomb was calculated to have a fairly high chance of failure (I don't have the number in front of me, but it was something like 15%). Certain weapons systems were notoriously buggy and unlikely to respond correctly under emergency conditions. The Snark missile system, with a 4 megaton warhead, was a particularly impressive boondoggle: Air Force studies concluded that in an emergency situation, only one third of the Snarks whose launch was attempted would leave the ground, and of those, only one in ten would hit their targets (see the sketch below for what that implies per missile).
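Chaining those two figures gives a sense of just how bad that is. Treating launch and accuracy as independent stages (my assumption, purely for illustration; the studies themselves may have framed it differently), the per-missile odds multiply out to about one in thirty:

```python
# Back-of-the-envelope reliability chain for the Snark figures above.
# The 1-in-3 and 1-in-10 figures come from the Air Force studies cited in
# the text; treating them as independent stages is my assumption.

p_launch = 1 / 3   # chance an attempted Snark leaves the ground
p_hit = 1 / 10     # chance a launched Snark hits its target

p_overall = p_launch * p_hit
print(f"Per-missile chance of success: {p_overall:.1%}")  # ~3.3%, about 1 in 30
```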
Of course, LeMay and Goldwater had stakes in the answer. They wanted more nuclear testing. They wanted more money for missiles. Etc. Donald MacKenzie writes about this question directly in Inventing Accuracy: A Historical Sociology of Nuclear Missile Guidance. I can send you (or anyone else who PMs me) a chapter distilled from the book for The Science Studies Reader that more or less summarizes his argument. Eric Schlosser's Command and Control also covers weapons reliability issues; arguably more important than the missiles themselves was the communications system (for much of the Cold War, the weapons systems were coordinated only over un-hardened, civilian phone lines run by AT&T). It's a big topic and Schlosser has written an appropriately big book about it, but I enjoyed it greatly.
In addition to Inventing Accuracy, I'd recommend Hunley's Preludes to U.S. Space-Launch Vehicle Technology, which covers the rocket technology powering each US ICBM family in detail, including developmental challenges. There were many reliability problems early on: Redstone achieved a 65% test flight success rate between 1953 and 1958, and Atlas achieved a 69% success rate (158 out of 229 flights) at a cost of $14.81M per missile. [The success rates are debatable, since some flights were still counted as a "success" if their primary technical objectives were achieved before a catastrophic failure.]
Solid-propellant rockets significantly improved the launch response time, reliability, and storage longevity of operational ICBMs, starting with Minuteman I (and Polaris) in 1962. By then, US propulsion and inertial guidance technology was certainly robust enough to enable complete target destruction during a full-scale launch, given the sheer number of operational missiles and the likely number of targets.
I am not qualified to comment on the other obvious question: whether the nuclear warheads would have detonated with sufficient force and reliability to achieve their ghastly objectives.
Regarding Russian missiles, this page summarizes an interview stating an 8% failure rate across all launches, and a much lower rate for the larger missiles (SS-20). It is often said that Russian nuclear warhead yields were made larger to compensate for less accurate missile guidance systems, but the only source I have found so far to confirm that is this interview (on page 38), which speaks only of the missile systems of the 1950s and 60s.
Follow-up thought: during the Cold War, one of the US testing protocols was to periodically select a warhead at random from the stockpile and set it off underground, just to see if it would work. I don't know what the reliability percentages were on these; I'm not sure the data is public. And of course, this tests the warhead in isolation from its delivery mechanism, to say nothing of the conditions of an actual emergency or wartime environment.

The people who thought about nuclear war always thought about these things in terms of probabilities, not certainties. This is one of the reasons for the "overkill" that became so prevalent in US war plans during the Cold War, where planners in the early 1960s thought it would take three 80 kt weapons to adequately destroy Hiroshima. It's not that they weren't aware that it only took one 15 kt bomb to take the city out in 1945. It's that their way of thinking about it had changed: there needed to be a near-100% chance of total destruction, and they knew that the accuracy of the weapons and the chance of them working were, by themselves, something less than near-100%.
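The logic behind that near-100% requirement is easy to sketch. If each weapon independently has an end-to-end probability p of working (launching, arriving on target, detonating), the chance that at least one of n weapons destroys the target is 1 - (1 - p)^n, so planners stacked weapons on a target until that figure cleared their threshold. A minimal sketch with purely hypothetical numbers (the 0.75 reliability and the confidence thresholds are mine, not from any war plan):

```python
# Illustrative overkill arithmetic: how many weapons must be assigned to one
# target so that the chance at least one works is near-100%? Assumes each
# weapon succeeds independently with probability p; p = 0.75 and the
# confidence thresholds are hypothetical, chosen only to show the logic.
import math

def weapons_needed(p_single: float, confidence: float) -> int:
    """Smallest n such that 1 - (1 - p_single)**n >= confidence."""
    return math.ceil(math.log(1 - confidence) / math.log(1 - p_single))

p = 0.75  # assumed end-to-end per-weapon reliability
for confidence in (0.90, 0.99):
    n = weapons_needed(p, confidence)
    achieved = 1 - (1 - p) ** n
    print(f"{n} weapon(s) needed for {confidence:.0%} ({achieved:.1%} achieved)")
```

Even a respectable 75% per-weapon reliability takes four weapons on one target to clear 99%, which is roughly the flavor of the three-weapons-on-Hiroshima calculation above.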
I've talked with current weapons analysts and asked them how much the certainty matters. I mean, if North Korea had an 80% chance of destroying Seoul versus a 90% chance, would it change how we dealt with them? As a layperson, I would say that any chance over some very low number would be enough to regard their nuclear threat as equally real (what that number is, I don't know, but 10% seems appropriately high to me when we are talking about a city of many millions). But the planner-types I have talked to have said that indeed, a difference between something like 80% and 90% would definitely change how policy was handled, which strikes me as kind of nuts.