What is the difference between reading the Old Bailey Proceedings in the original printed version and reading and searching them online? Besides convenience of access, this website offers the ability to use the powers of computing to search the text rapidly in a number of ways. But these powers depend on twenty-first-century processing of the text, which interposes itself between the original text and online users’ experience of it and can potentially distort our understanding of the original. The more users understand how the text has been processed for this website, the better they will understand the results of their searches.
Contents of this Article
- The Limits of Keyword Searching
- Marked up Information: Defendant, Victim, Offence, Verdict, and Punishment
- Name, Location, Status/Occupation, Gender and Age
- Checking the Markup
- What do I do if I Find Errors?
- Reading Trials in or out of Context
- Revisions to this Website
Simple keyword searching depends on having an accurately transcribed text. The transcriptions used on this site, the methods for which are described in About this Project, are considerably more accurate than the vast majority of online historical text, which has been generated by optical character recognition (OCR), but there is still a small error rate of well under 1 per cent. While users can always check the original page images when reading the text if they think the transcription is inaccurate, keyword searching will not find the small number of words which are incorrectly transcribed. For this reason, if a comprehensive search of the text is required, users are urged to try different spellings, or use wildcards, in order to locate not only alternative spellings in the original but also possibly mistranscribed words.
A particular source of error is the eighteenth-century long s, which was sometimes confused with f by our rekeyers. Thus, a search for the phrase foul disease, an eighteenth-century term for venereal disease, produces 75 results, but a search for the phrase soul disease produces 12 more. Users interested in keywords containing the letters f and s are advised to conduct additional searches transposing these two letters.
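The letter-swapping advice above can be automated. The following is a minimal sketch and not a feature of this website: the function name `long_s_variants` is hypothetical, and it simply generates every spelling obtained by exchanging a lowercase f and s at each position, which could then be entered into the keyword search one at a time.

```python
from itertools import product


def long_s_variants(phrase):
    """Return every spelling of `phrase` with lowercase f and s exchanged.

    Hypothetical helper: any f or s may have been rekeyed as the other
    letter, so each such position is given both options.
    """
    swap = {"f": "fs", "s": "sf"}
    # For each character, list the letters it could have been keyed as.
    options = [swap.get(ch, ch) for ch in phrase]
    # The original spelling comes first, because each option string
    # starts with the original letter.
    return ["".join(chars) for chars in product(*options)]


variants = long_s_variants("foul disease")
print(len(variants))
print("soul disease" in variants)
```

The number of variants doubles with every f or s in the phrase, so this approach is only practical for short keywords and phrases.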
Search functions which allow you to narrow your search to focus on specific types of information are dependent on that information having been previously identified, or marked up, as having that meaning. Markup was performed by the project’s data developers, by inserting XML tags around the identified text, and assigning that text to a category of information, such as name, or a type of offence. This labour-intensive markup process, described in About this Project, represents the imposition of a modern layer of interpretation onto these texts, reflecting the historical understanding of project staff in the early twenty-first century, and it is likely that an edition of the Proceedings created at another time would include very different markup strategies. Users should also be aware that, while strenuous efforts were made to achieve consistency, the markup is neither perfect nor comprehensive. It is necessary to be aware of how each category of information has been marked up, and to know to what extent the markup is comprehensive.
By definition, every criminal trial involves a criminal charge, or indictment, which states the name of the accused, the crime, the verdict, and, if the defendant was found guilty, a punishment. Most trials also have an identified victim. This information was marked up by the data developers, either manually or semi-automatically; in the latter case, the results of the automated markup were checked by the data developers.
In some cases where the text is ambiguous, this markup process inevitably involved subjective judgement. The Proceedings never constituted a formal legal record, though over time they were increasingly used as such, and early editions in particular can fail to identify legal information clearly, for example describing the crime as simply “stealing”, without specifying which legal category of theft the defendant was accused of. In order to get more specific information on such points, one sometimes needs to consult the original indictments, which are kept in the London Metropolitan Archives.
The specific categories used in this project for offences, verdicts and punishments were identified by project staff following the legal definitions in use at the time, and a comprehensive markup manual was provided to ensure consistency between data developers. Nevertheless, a degree of variation in practice cannot be ruled out, and users are urged to check the markup where precision in results is required.
The process of assigning subcategories to higher-level categories (such as the offence categories Breaking [the] Peace, Damage to Property, and Deception, as can be seen in the drop-down menu on the search home page) was even more subjective. To facilitate structured searching and the calculation of statistics, nine general categories of offence, four general categories of verdict, and six general categories of punishment were created, into which relevant specific categories were inserted. Depending on one’s research interests, some of these categories might have been defined differently: for example, thefts might not have been divided into the separate categories of Theft and Violent Theft, and Branding might have been included as a Corporal Punishment. Users should of course feel free to create their own categories, but that requires exporting the results of statistics searches into a spreadsheet and manipulating them, as explained in Doing Statistics.
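Regrouping categories after export can also be scripted rather than done by hand in a spreadsheet. The following is a minimal sketch, assuming the statistics table has been exported as rows of category and count; the counts are invented for illustration, and the names `MY_CATEGORIES` and `regroup` are hypothetical.

```python
from collections import Counter

# Hypothetical mapping from the site's categories to a researcher's own.
# Categories not listed here keep their original name.
MY_CATEGORIES = {
    "Theft": "All Theft",
    "Violent Theft": "All Theft",
}


def regroup(rows, mapping):
    """Sum the counts after renaming each category through `mapping`."""
    totals = Counter()
    for category, count in rows:
        totals[mapping.get(category, category)] += count
    return dict(totals)


# Invented counts, standing in for an exported statistics table.
exported = [("Theft", 120), ("Violent Theft", 15), ("Deception", 9)]
print(regroup(exported, MY_CATEGORIES))
```

Keeping the mapping separate from the counting makes the researcher's own categorisation explicit, so it can be reported alongside any figures derived from it.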
Names, locations, status or occupation labels, gender and age were not routinely marked up, and searches on these types of information will therefore be neither comprehensive nor precise.
With the exception of defendants, victims, judges and juries, names were marked up using automated natural language programmes, as explained in About this Project, with an accuracy of roughly 80 to 90 per cent. This means that some names of witnesses and barristers were missed. Only names with both a forename and a surname were marked up. Therefore a researcher interested in finding all the mentions of the famous barrister William Garrow would not find all occurrences of his full name using a name search, and would find no expressions such as “Mr Garrow”. For this reason, it is often best to use keyword search instead. Other types of information, including defendants’ and victims’ status or occupational labels, defendants’ places of residence and crime locations, were only marked up when they appeared in the first paragraph of a trial; once again, it is best to use a keyword search if you wish to conduct a comprehensive search for a specific occupation or place name.
As explained in age search help, ages were provided in the Proceedings in certain circumstances, and they have only been systematically marked up where the information was provided at the start or end of a trial and in numerical form.
Similarly, as explained in gender search help, genders have been provided for all names which have been marked up, but as the process was automated it is subject to a degree of error owing to unusual names where the gender is unclear.
Users can check the markup for any trial by clicking the View as XML link at the bottom of every page. This will reveal all the markup inserted in that text, and allow you to determine, for example, the original text which formed the basis of assigning a specific offence category. Whenever you get search results which appear in any way problematic, it is best to check the XML to see if this explains the result.
For example, Henry Pope was indicted for stealing a pair of Fire-Irons, Value 12. s. and divers other Goods from John Durnell. If you click on View as XML you can see that this text is embedded within tags which specify that the offence category has been identified as theft, and the offence subcategory as grand larceny.
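For readers who want to inspect such markup programmatically, here is a minimal sketch. The XML fragment is a simplified, hypothetical reconstruction for illustration only, not the site's actual output: the element and attribute names (`rs`, `interp`, `type`, `value`) and the spelling of the values are assumptions, and should be checked against what View as XML really shows.

```python
import xml.etree.ElementTree as ET

# Hypothetical fragment: tag names, attributes and values are assumed.
SAMPLE = """
<p>Henry Pope was indicted for
<rs type="offenceDescription">
<interp type="offenceCategory" value="theft"/>
<interp type="offenceSubcategory" value="grandLarceny"/>
stealing a pair of Fire-Irons, Value 12. s. and divers other Goods
</rs> from John Durnell.</p>
"""


def offence_markup(xml_text):
    """Return (marked-up text, {label type: value}) for each offence span."""
    root = ET.fromstring(xml_text.strip())
    results = []
    for span in root.iter("rs"):
        if span.get("type") != "offenceDescription":
            continue
        # Collect the category labels attached to this span.
        labels = {i.get("type"): i.get("value") for i in span.iter("interp")}
        # itertext() yields the text inside the span; collapse whitespace.
        text = " ".join("".join(span.itertext()).split())
        results.append((text, labels))
    return results


for text, labels in offence_markup(SAMPLE):
    print(labels, "<-", text)
```

Printing the labels next to the words they were derived from makes it easy to judge whether the assigned category is a reasonable reading of the original text.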
If you believe the markup is wrong, see the next section.
Similarly, users of the statistics search function can check their results by clicking on the number in any cell in the table. This will allow you to see the individual trials that make up that number. By examining those cases and checking the markup, one can better understand the significance of the total number for that cell.
If you think you have found an error in the transcription or markup, please submit this information using the add a correction link in the yellow box at the top of every transcribed document in the Proceedings. Subject to available time and personnel, we will endeavour to correct the error during our annual update.
We do not know a lot about how the original purchasers read this publication, but we can imagine that the Proceedings were treated like any periodical: readers browsed through each edition as it came out, picking out selected trials for in-depth reading. Whichever trials they chose, they may have been drawn into reading the subsequent or preceding trials simply by their proximity to the trial of interest, while other trials may have been encountered merely through serendipity.
Modern users of the online Proceedings, on the other hand, are more likely to access specific trials, or any other part of the text, by conducting a search with specific criteria. Consequently, users are rarely drawn into any other part of that edition of the Proceedings, and arguably they are therefore prone to rip the text from its context, the inevitable result of all keyword and structured searching. As a result, they may fail to notice relevant additional material from that issue of the Proceedings, such as a further trial involving an accomplice, or a wave of prosecutions for a particular crime. To mitigate this effect, this website provides the option of browsing full editions of the Proceedings and Ordinary’s Accounts by date, so you can read these documents in a form similar to the way original readers read them.
Unlike a typical printed edition of a primary source, a website can go through regular revisions with little difficulty. This gives us the chance to correct errors and implement improvements to the background materials and search functions. It does mean, however, that a search of the website can occasionally produce slightly different results from the same search performed on an earlier date. This is why, as indicated in our Citation Guide, users are encouraged always to indicate the date accessed and the version number (shown in the bottom left-hand corner of every page) in citations of material from this website. This provides those who follow up citations with a possible explanation if they attempt to repeat precise searches and obtain different results. You can find summaries of the changes which have been carried out on the site for each update since the initial launch in 2003 in our What’s New Archive, and the most recent changes on the What’s New page.
While the original text of the Proceedings remains the same, the transcription, markup and search facilities used to search the text can and do change, even if only marginally, providing yet another indication of how the online edition of the Proceedings is a resource mediated by twenty-first-century interventions.