This search facility allows the server-side API to be queried, and the results to be refined and passed to both Zotero and Voyant Tools. It also lets you use the More Like This facility to identify linguistically similar trials.
The technical documentation for the API is here.
The API Demonstrator allows a user to generate results in much the same way as the original site does: by searching on keyword, date range, crime, verdict, gender, etc. But with the Demonstrator, these result sets can then be explored, either by modifying the query (undrilling) or by breaking down the results by any of the available sub-categories of tagged data, and by all words in each trial. Once the result set has been generated, it can then be bundled up and exported (either as full texts, or as a query string) to either Zotero or Voyant Tools.
In terms of search functionality, the most important differences between the API and the main search facility are, first, that it allows a search by trial (and hence returns a collection of well-defined text objects), rather than by offence, defendant, etc., as is the case on the original site; and, second, that it allows you to explore the result sets before exporting them to Zotero (for weeding) or to Voyant Tools (for further linguistic analysis).
For example, you can use the API to find all trials where the word Commonwealth appears, or select inter-gender assaults for specific analysis, and then explore these results by undrilling them or breaking them down further.
Using the API Demonstrator interface, you can build a query very much as you would on the Custom Search pages of the Old Bailey site itself, and you have access to all the categories of tagged data for each trial through the pull-down menus associated with the search boxes. See About this Project for a list of the categories of information tagged, and also the search help texts available through the main Search Page and the Guide to Searching.
You might, for instance, be interested in sexual crimes committed by men on women during the 1740s. The results of this particular query would look like this:
In total, this query returns 41 trials (you can set the upper limit for export at 10, 50, or 100 trials). Clicking on a trial, for example t17410828-63, displays its text in the lower right-hand corner of the screen, along with a variety of options for analysing the results.
These results can now be explored in detail either by choosing Undrill in relation to specific query components, or by using Break Down to identify relevant sub-sets of trials. The choices available for Undrill will comprise all the elements that composed the original query, including offence category and date range.
You can also Break Down the results either by trial text or any of the main categories of tagged data associated with all the trials in the search results (replicating the list of data in the main search form). Choosing to break down by keywords generates an ordered list of all words in the results (excluding stop words), and allows you to further Drill the results to refine them; selecting perhaps only those trials where the word father was present (15 trials in this instance).
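The keyword Break Down and Drill steps described above can be illustrated with a minimal Python sketch. It is not the Demonstrator's own code: the stop-word list, the function names, the trial identifiers and the trial texts are all made up for illustration, and ordering words by the number of trials in which they appear is an assumption about how the list is ranked.

```python
import re
from collections import Counter

# Hypothetical stop-word list; the Demonstrator's real list is not given here.
STOP_WORDS = {"the", "a", "an", "and", "of", "to", "in", "was", "he", "she",
              "his", "her", "i", "it", "that", "with", "my"}

# Made-up trial texts keyed by made-up identifiers.
SAMPLE_TRIALS = {
    "trial-a": "Her father said the prisoner took the watch.",
    "trial-b": "The prisoner was seen with the watch in his hand.",
    "trial-c": "My father and I saw the prisoner run away.",
}


def tokenize(text):
    """Split a text into lower-case alphabetic words."""
    return re.findall(r"[a-z]+", text.lower())


def break_down_by_keyword(trials):
    """List (word, number of trials containing it), most widespread first."""
    counts = Counter()
    for text in trials.values():
        counts.update(set(tokenize(text)) - STOP_WORDS)
    # Ties are broken alphabetically so the order is deterministic.
    return sorted(counts.items(), key=lambda item: (-item[1], item[0]))


def drill(trials, word):
    """Keep only the trials whose text contains the given word."""
    return {trial_id: text for trial_id, text in trials.items()
            if word.lower() in tokenize(text)}


if __name__ == "__main__":
    for word, n_trials in break_down_by_keyword(SAMPLE_TRIALS)[:5]:
        print(word, n_trials)
    print(sorted(drill(SAMPLE_TRIALS, "father")))
```

Drilling on father here narrows three sample trials to two, in the same way that the example query above is narrowed to the trials in which that word was present.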
The More Like This function appears any time there is a trial text displayed, and allows you to identify similar texts to the trial you started with. It relies on measures of word frequency and density, and works best to locate trials with similar descriptive elements. Since aspects of criminal justice (the precise nature of the crime, verdict and punishment) are frequently expressed in the same language, More Like This will also tend to locate trials for the same kind of crime or which resulted in the same punishment.
In essence, this facility uses an index of every word in the Proceedings to determine both where words appear, and how common they are. Starting with an individual trial, More Like This counts all the words it contains, and ranks them from the most frequent to the least frequent (excluding stop words and two-character words). This generates a list of terms from the original trial ordered according to frequency. The number of appearances for each word is then multiplied by a measure of how rare each word is in the Proceedings as a whole (its Inverse Document Frequency). This allows a score to be calculated for each word in the trial. The twenty-five highest scoring words in the resulting list are then used to generate a query that locates all trials where these words can be found.
For a more detailed description of this function, see How More Like This Works in Lucene.
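The scoring steps described above can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the site's or Lucene's implementation: the Inverse Document Frequency formula used here (the natural log of total trials over trials containing the word, plus one) is a common stand-in, the stop-word list and sample texts are made up, and ranking the matching trials by how many query terms they share is a simplification of how a search engine would score them.

```python
import math
import re
from collections import Counter

# Hypothetical stop-word list, for illustration only.
STOP_WORDS = {"the", "and", "from", "with", "was", "for", "that"}
MAX_QUERY_TERMS = 25  # the twenty-five highest scoring words

# Made-up trial texts keyed by made-up identifiers.
SAMPLE_CORPUS = {
    "trial-a": "The prisoner stole a silver watch and a silver spoon from the shop.",
    "trial-b": "The prisoner stole a silver tankard from the house.",
    "trial-c": "The prisoner struck the prosecutor with a stick.",
}


def tokenize(text):
    """Lower-case words, dropping stop words and words of one or two characters."""
    words = re.findall(r"[a-z]+", text.lower())
    return [w for w in words if len(w) > 2 and w not in STOP_WORDS]


def document_frequencies(corpus):
    """For each word, count the number of trials in which it appears."""
    df = Counter()
    for text in corpus.values():
        df.update(set(tokenize(text)))
    return df


def top_terms(trial_text, corpus, limit=MAX_QUERY_TERMS):
    """Score each word as (count in trial) x (assumed IDF); return the best ones."""
    df = document_frequencies(corpus)
    n_trials = len(corpus)
    tf = Counter(tokenize(trial_text))
    scores = {word: count * (math.log(n_trials / df[word]) + 1.0)
              for word, count in tf.items() if df[word] > 0}
    ranked = sorted(scores.items(), key=lambda item: (-item[1], item[0]))
    return [word for word, _ in ranked[:limit]]


def more_like_this(trial_id, corpus, limit=MAX_QUERY_TERMS):
    """Find other trials sharing query terms, ranked by number of shared terms."""
    terms = set(top_terms(corpus[trial_id], corpus, limit))
    matches = []
    for other_id, text in corpus.items():
        if other_id == trial_id:
            continue
        shared = terms & set(tokenize(text))
        if shared:
            matches.append((other_id, len(shared)))
    return sorted(matches, key=lambda match: (-match[1], match[0]))


if __name__ == "__main__":
    print(top_terms(SAMPLE_CORPUS["trial-a"], SAMPLE_CORPUS, limit=3))
    print(more_like_this("trial-a", SAMPLE_CORPUS))
```

In this toy corpus the word silver scores highest for the first trial because it is both repeated within the trial and absent from one of the others, while prisoner, which appears in every trial, scores lowest.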
Once you have satisfied yourself that the results you have generated are appropriate for the topic and issue you are researching, you can then export them: either to Zotero, for ‘weeding’ or as part of a library of texts or queries; or to Voyant Tools, for more detailed linguistic analysis and visualisation.
You have three options: to export your results as a Query URL, a Zip URL, or to bundle 10, 50 or 100 trials and Send to Voyant Tools. As the labels suggest, selecting a Query URL will return the current query string; the Zip URL will return a Zip file that includes all the results currently being shown; and Send to Voyant Tools will export the full text of all trial results (up to 10, 50 or 100 trials) to the Voyant Tools site.
To explore what you can do with these materials once they have been exported, see:
As part of the Datamining With Criminal Intent project, we have also developed a new statistics facility that allows more complex graphing and visualisation of trial data and text.