ncse: 2008 / 2018 | Nineteenth-Century Serials Edition

James Mussell and Laurel Brake

The Nineteenth-Century Serials Edition (ncse) was launched in beta in May 2008. There were further updates over that summer, but, as the ncse release notes indicate, in beta it remained. Like a periodical whose last issue contains no indication that it was the last, for almost ten years ncse remained unfinished. And so it remains: while this new version of ncse permitted us to revise and repurpose the material, publishing it again a decade after that first launch, we consider it more like a second instalment in a series or a second edition than a final page. There is something provisional about digital resources, their persistence dependent on revision as their environment changes around them. Perhaps, then, the best image for this new ncse comes not from the world of book publication but from the press instead. Nineteenth-century periodicals would relaunch themselves in new series, insisting they were the same publication while restarting the sequence of volumes to mark a new beginning. In going back to the source material – or rather, going back to the data that represents that source material – we are reinterpreting it again. This new edition is still ncse but, just as the first remained in beta, so too is this edition not the definitive representation. Like a new series of a journal, we once again acknowledge the contingency of limits.

ncse was produced as a collaboration between Birkbeck College, King’s College London, Olive Software, and the British Library. It was funded by the Arts and Humanities Research Council from 2005 to 2007, with a small extension allowing the project to run on until 2008. The resource is an edition of six nineteenth-century periodicals and newspapers: the Monthly Repository (1806-1837); Northern Star (1837-1852); Leader (1850-1860); English Woman’s Journal (1858-1864); Tomahawk (1867-1870); Publishers’ Circular (1880-1890). The page images were derived mainly from the British Library, both at St Pancras and Colindale, but these were supplemented by material from other collections where appropriate. Most images were from microfilm but we obtained digital images from hard copy where we thought it really mattered (coloured plates, for instance). In the resource, access to the periodicals and newspapers was through Olive Software’s Viewpoint: users could search or browse the material directly through a portal we labelled ‘Facsimiles’, but they would also see content in Viewpoint if they accessed it through our second portal, ‘Keywords’. This second portal was designed at the Centre for Computing in the Humanities at King’s College London and used metadata to provide a number of structured searches: images, subject, person, place, and institution. The image metadata was encoded by hand using the ontology developed by another resource, the Database of Mid-Victorian Illustration; the remainder, however, were produced through computational work on the (uncorrected) OCR transcripts. The subject index was created using a semantic tagger to situate articles in a hierarchy of subject classifications and named entity extraction was used to produce lists of people, places, and institutions. Together, Facsimiles and Keywords provided two complementary ways to encounter nineteenth-century print: by browsing and free-text searching, activities with which users are familiar, or using structured searches that revealed patterns in the underlying data.

Over the course of the project we talked a lot about sustainability. We made sure that we used robust data formats, followed well-established standards, and documented the project as fully as possible. We also knew that sustainability was as much about social practice as materiality, that it is what one does with things that determines how they survive. The Facsimiles component was always the most vulnerable. We learned a great deal through working with a commercial publisher: as Olive were not used to working closely with nineteenth-century material, we were forced to try and conceptualise it in terms they would understand; in turn, as the scope of the project changed, Olive’s expertise in processing material in bulk helped us to reimagine content in computational terms. However, our relationship with Olive ceased with the end of the project and, while the component itself continued to function, the server-side software on which it relied became outdated and, eventually, vulnerable to attack. As a result, the Facsimiles component was taken offline in June 2017, removing all access to the newspapers and periodicals.

When we launched ncse in beta in 2008 the Centre for Computing in the Humanities undertook to host and maintain ncse for five years; in the end the resource was functioning for nine, even outliving the Centre itself. With no response from Olive, the only solution was to replace the Facsimiles component with a new viewer and King’s Digital Lab, one of the successors to the Centre for Computing in the Humanities, duly produced and costed a plan of work. One of the weaknesses of current research-funding models is their time-limited nature. We had initially planned a full launch of ncse in autumn 2008; one of the reasons this did not happen was that the project team had by then dispersed and were working on other things. It was difficult enough to secure further funds to develop ncse in 2008 (we put in an unsuccessful bid to the AHRC in 2009); returning to it in 2018 was even more challenging. Eventually, due to the goodwill of the original institutional partners, King’s College London and Birkbeck College, we were able to find the necessary resources and work began in March 2018.

The funding allowed the team at King’s Digital Lab to explore the data produced by Olive as part of the earlier production process and, modifying a viewer produced for another project, develop a platform through which ncse content could be accessed once more. As in the old Facsimiles component, the resulting resource allows users to browse and search the six periodicals and newspapers. To browse, users click on the title in which they are interested, allowing them to see issues by year. On this page, thumbnails of available issues communicate the look of nineteenth-century print as well as alert readers to supplements or other matter included amongst the run. Two of our titles, the Leader and the Northern Star, published multiple editions, and these are clearly labelled and visible. Within the issues, the existing metadata allowed us to show the different departments in an issue, allowing users to jump to whatever interests them, as well as separate out advertisements, images, and other types of article (which we have labelled ‘text’). At page view, users can scroll around, zoom in or out, save, download and print. Users can also see a breakdown of items on the page, view the OCR transcript, and copy a model citation. The search, which is available from the landing page as well as throughout the resource via the banner at the top of the screen, allows users to search across all the content, breaking results down by publication, year, or category (image, advertisement, text).

The new platform provides a much more efficient way of accessing ncse content, but because it works differently, it necessarily offers a different conception of the edition. The 2008 ncse envisaged its content in the following hierarchy:

Edition > Title > Volume > Issue > Department > Item

‘Title’ was how we referred to each of the six newspapers and periodicals. As the actual title of any publication was likely to differ over its run, this was a nominal title that represented the publication as a whole. We retained ‘Volume’ as the next structural unit to acknowledge the material condition of the periodicals and newspapers (they were nearly always bound) as well as the bibliographic structures that they themselves provided (even the Northern Star, a newspaper, numbered itself off in volumes). ‘Issue’ named both the numbered issues and any supplements or other printed material that was collected into volumes (front matter, indexes, one-off pamphlets etc). ‘Department’ was what we called the sections into which issues were divided, and we named any textual component, whether an article spanning multiple pages or just a short quip, an ‘Item’.

This structure enabled us to map our conceptual understanding of the edition onto the data structure at all levels but one. Olive, in collaboration with the editors, segmented the issues into items. While we experimented with using formal features to try and identify departments as part of this process, this was abandoned, leaving the editors to do it by hand instead. Any items that indicated the start of a department were marked as such creating a post-hoc level of structure. These marked items also provided the solution to an additional problem. None of the OCR in ncse is corrected and, as a result, we could not use it to produce tables of contents for the issues. Instead, what we did was use the items that marked departments, displaying images of them rather than text from the transcript. This proved very effective: publications often indicated the start of a new department through formal means such as a heading in fancy type; using images of these headings allowed us, too, to demonstrate structure through the appearance of the printed item.
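The way marked items yield a post-hoc level of structure can be illustrated with a minimal sketch. It is not the project's actual data model or code: the class names, the `starts_department` field, and the sample labels are all hypothetical, chosen only to show how a flat sequence of items, some flagged by hand as opening a department, might be grouped into departments after the fact.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Item:
    """Any textual component, from a multi-page article to a short quip."""
    label: str
    # Hypothetical flag standing in for the editors' hand-applied marker.
    starts_department: bool = False


@dataclass
class Department:
    """A section of an issue, derived after segmentation from marked items."""
    heading: Optional[Item]  # None for items that precede any marked heading
    items: List[Item] = field(default_factory=list)


def group_into_departments(items: List[Item]) -> List[Department]:
    """Walk the items in page order, opening a new department at each marker."""
    departments: List[Department] = []
    for item in items:
        if item.starts_department or not departments:
            heading = item if item.starts_department else None
            departments.append(Department(heading=heading))
        departments[-1].items.append(item)
    return departments


# Illustrative data only; these labels are invented for the example.
issue_items = [
    Item("Masthead"),
    Item("Leading Articles", starts_department=True),
    Item("An article"),
    Item("Reviews", starts_department=True),
    Item("A review"),
]

for department in group_into_departments(issue_items):
    name = department.heading.label if department.heading else "(unmarked)"
    print(name, len(department.items))
```

In this sketch the heading item doubles as the entry in a table of contents, which mirrors the decision to display an image of the marked item rather than rely on uncorrected OCR text.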

In the new resource, users still browse from the highest level, ‘Edition’, and then move down through the levels until they get to the page, but aspects of the old hierarchy have been lost. Whereas the old ncse organised issues into their volumes, the new resource groups them by year. All the content is still there and, indeed, the volume divisions remain visible in the front and end matter displayed alongside the issues, but, nonetheless, one of the ways in which nineteenth-century periodicals organised themselves has been submerged. Equally, whereas the old ncse allowed users to click an item on a page, allowing it to be manipulated in various ways, the new ncse offers users the page instead. This has the advantage of keeping the item in context and the viewer, which is much better than that designed by Olive, allows users to zoom in and out as required, but, whereas the old ncse attempted to model the structure of both page and issue, the new resource relies on users to do so for themselves. We managed to recreate the departments in the same way as the old resource, again using images of the items to take advantage of the formal markers of structure on the printed page. Whereas in the old resource these were tucked up in the sidebar, we have now put them in the centre of the screen, allowing the issue to both be seen as a whole and speak for itself.

If we were to create ncse from scratch today it would look very different. When we began, we thought we had a relatively small corpus of interesting publications we could use to explore aspects of periodical form while, at the same time, making the publications more accessible. Our approach was broadly derived from a textual scholarship founded in print (ncse is an edition); however, as the project developed, we had to respond to the affordances of the digital. Our hand was forced by the discovery that the corpus was much larger than we had imagined, some 110 thousand pages rather than 30 thousand, which made us rethink our methodology, turning to computational approaches rather than relying on marking up individual items. The Keywords component demonstrated what was possible, named entity extraction producing useful indexes that revealed relationships we could not have discovered through free text searching or simply reading the periodicals. This was even more true of the subject classifications, which associated items in entirely unexpected ways. We thought the intellectual question at the heart of the project was how to edit journalism; we finished up thinking about data.

Our priority with this new resource was to get the six periodicals and newspapers back online. We know ncse, available to all online, continues to be widely used and we want this to continue for as long as possible. While the untidy data produced during the initial project is a document of its history, the elegance and accessibility of this new resource are the result of its particular moment. This ncse enters a very different landscape from its predecessor. In 2008 the first crop of resources from Cengage and ProQuest had just been published and there was no British Newspaper Archive, a resource that now contains almost 30 million pages. Discussions about distant reading had just begun, but there were few resources using data-driven approaches in nineteenth-century studies. Visualisations, for instance, tended to model pages rather than explore the underlying data and, even though scholars were thinking in terms of networks, there were few representations of how people and publications might be connected (Gephi was launched the same year as ncse). Digital approaches to the nineteenth-century press provide the means to see it anew and, in doing so, change the way it is understood. ncse was always an argument about the press and, in revisiting it again, we are reminded of how much the to-do list persists in changing with time. There will always be more to do; editions, it turns out, are serials after all.