Blog - Research

Named Entity Recognition for Placenames in Welsh Language Texts

Collections / Digitisation / News / Research - Posted 07-03-2023

Using Wikidata to structure Welsh placename data

The text we read when we view a web page, a blog or a journal article is full of rich and valuable information. Our brains are very good at processing and making sense of words in the context in which they are presented. We can tell when a word is a placename because we understand the sentence around it and expect to see a placename there. Also, we often already know the name of the place and could describe it in further detail from memory.

If computers could understand text as we do then they could be super useful in helping us find and understand information better. Technology such as Named Entity Recognition (NER), where machines are trained to recognise things like people, places and organizations by analyzing a whole text, is increasingly being used to turn plain text into a structured network of ‘things’, and this means machines can make a more complex analysis of text, much as we do.

As part of our ongoing Welsh Place Names project, which is funded by the Welsh Government, we were keen to explore how these new technologies and methodologies might be applied to Welsh language texts and to our own collections. With millions of pages of journals, newspapers and books already digitised, how might this technology help us improve our services for better research, discovery and interpretation?

Named Entity Recognition

The Dictionary of Welsh Biography was chosen for this experiment, as a (fairly) manageable corpus of about 5000 articles, packed with information about people and places. Most placenames have actually already been tagged as such in the mark-up for each page, which gives us a good benchmark for NER models to aim for, and a big corpus of place names for further analysis.

Identifying which words are placenames is the first step in this process. Those names then need to be reconciled against a database of names, which can give us access to a deeper, multilingual understanding of the place.

English language NER tools struggle to identify places in Welsh text for a number of reasons. Firstly, they are not trained to understand the grammatical mutations present in the Welsh language. For example, ‘Tregaron’ is the name of a town in both English and Welsh, but if the text reads ‘yn Nhregaron’ an English model will not recognise the name because of the mutation (treiglo) of the first letter. Secondly, many placenames are different in Welsh (e.g. Cardiff is Caerdydd), so models trained on English text simply won’t have the word in their vocabulary. Several English models were tested, and many either didn’t recognise names or assumed they were names of people.

We therefore experimented with ‘Cymrie’, part of the Welsh Government funded Welsh Natural Language Toolkit.

 

Extracting named entities from digital text using ‘Cymrie’

Cymrie was able to extract a number of Welsh placenames, including many with mutations. The text of 5 articles was analyzed in detail. On average the tool extracted approximately 67% of the placenames in those articles. Of the names it identified as places, only 2% were not in fact places.

Some of the placenames it was unable to recognise were tagged as people or organizations, though this was at a lower rate than the English language model.
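The two figures quoted above correspond to what are usually called recall (the share of real placenames the tool found, about 67%) and precision (the share of the tool's output that really were places, about 98%). The following is a minimal sketch of how such figures could be calculated for one article. The function name and the sample names are hypothetical, and comparing sets of unique names is a simplification of counting every mention.

```python
def precision_recall(tagged, extracted):
    """Return (precision, recall) for extracted placenames against hand-tagged ones."""
    true_positives = len(tagged & extracted)
    # Precision: how much of what the tool returned was really a place.
    precision = true_positives / len(extracted) if extracted else 0.0
    # Recall: how many of the tagged places the tool managed to find.
    recall = true_positives / len(tagged) if tagged else 0.0
    return precision, recall


# Hypothetical example: six tagged placenames, four names extracted by the tool,
# one of which ("Williams") is really a surname.
tagged = {"Tregaron", "Caerdydd", "Aberystwyth", "Bangor", "Llanelli", "Dolgellau"}
extracted = {"Tregaron", "Caerdydd", "Aberystwyth", "Williams"}

precision, recall = precision_recall(tagged, extracted)
print(f"precision={precision:.2f} recall={recall:.2f}")
```

Averaging these two numbers over several articles would give figures comparable to those reported here.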

Reconciling the Data

Knowing what words are names of people or places is useful only to a point, because we still know nothing more than ‘it’s a place’. For the data to be really useful we need access to more information about each place, such as its name in other languages, its location on a map and the county, country or continent it is part of. We can then apply a unique identifier to each place and they become unique data entities.

To do this we need to take our long list of place names and attempt to reconcile them against a database which holds more information about them. In our case we are using Wikidata, which is home to one of the largest collections of Welsh place names available. Wikidata is free for anyone to reuse and is structured as linked data.

The Dictionary of Welsh Biography contains around 80,000 instances of place names. Due to the practicalities of working with such a large dataset, I opted to work with the first 46,000 tagged places.

The tags in the Welsh Biography code often contained more than just the placename. They commonly included a grid reference, the type of place (city, village etc) and the relationship between that place and the subject of the article.

Obviously having all this information to hand makes the reconciliation process far more likely to succeed. As NER technology improves, it should be able to infer much of this information by understanding the wider context in which the place name appears, but for now we must accept that without this additional information, this process would have a far lower success rate.

Using OpenRefine’s reconciliation tool we were able to compare our list of placenames to Wikidata. The software’s algorithm looks for similarities in spelling but also considers the likelihood of a match based on the popularity of its content. By transforming the grid references from our data into coordinates we were also able to instruct OpenRefine to score matches based on their proximity. Places with matching names and proximity of less than a kilometre were mostly matched automatically. Our data on the type of place was also used to help the software make a judgement.
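To illustrate the proximity idea, here is a minimal sketch of a distance check between our coordinates and those of a candidate match. This is not OpenRefine's actual scoring algorithm. The function names, the exact-name comparison and the sample coordinates are assumptions made for illustration, with the one kilometre threshold taken from the description above.

```python
from math import asin, cos, radians, sin, sqrt

EARTH_RADIUS_KM = 6371.0  # mean Earth radius


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometres between two latitude/longitude points."""
    phi1, phi2 = radians(lat1), radians(lat2)
    dphi = phi2 - phi1
    dlambda = radians(lon2 - lon1)
    a = sin(dphi / 2) ** 2 + cos(phi1) * cos(phi2) * sin(dlambda / 2) ** 2
    return 2 * EARTH_RADIUS_KM * asin(sqrt(a))


def is_confident_match(our_name, our_coords, candidate_name, candidate_coords, max_km=1.0):
    """Hypothetical rule: same name (ignoring case) and less than max_km apart."""
    same_name = our_name.strip().lower() == candidate_name.strip().lower()
    return same_name and haversine_km(*our_coords, *candidate_coords) < max_km


# Illustrative coordinates only, not taken from the dataset.
ours = ("Tregaron", (52.2200, -3.9350))
near = ("Tregaron", (52.2210, -3.9340))  # roughly a hundred metres away
far = ("Tregaron", (51.4800, -3.1800))   # same name, a long way away

print(is_confident_match(*ours, *near))
print(is_confident_match(*ours, *far))
```

A rule like this only handles the clear-cut cases; candidates that share a name but sit further away still need a human decision.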

In order to give the reconciliation process the best chance of success, some initial cleaning was done to remove mutations from the text. Much of this could be done using a series of transformations such as:

  • Nghaer – Caer
  • Nhre – Tre

Others require knowledge of the language and human input in order to avoid the corruption of other names. For example ‘Lan’ cannot be automatically changed to ‘Llan’ without corrupting other names such as ‘Lanishan’.
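The cleaning step described above could be sketched roughly as follows. This is an assumed approach rather than the exact transformations used in the project: the two rewrite rules come from the examples in the list, the rule names and the `demutate` function are hypothetical, and a real rule list would be much longer.

```python
# Prefix rewrites taken from the two examples in the text.
SAFE_PREFIX_RULES = [
    ("Nghaer", "Caer"),
    ("Nhre", "Tre"),
]

# Prefixes that may be mutations but are unsafe to rewrite automatically,
# e.g. 'Lan' -> 'Llan' would corrupt names such as 'Lanishan'.
REVIEW_PREFIXES = ["Lan"]


def demutate(name):
    """Return (cleaned_name, needs_review) for a single placename string."""
    for mutated, radical in SAFE_PREFIX_RULES:
        if name.startswith(mutated):
            # Only the start of the name is rewritten, never the middle.
            return radical + name[len(mutated):], False
    needs_review = any(name.startswith(prefix) for prefix in REVIEW_PREFIXES)
    return name, needs_review


for raw in ["Nhregaron", "Nghaerdydd", "Lanishan", "Bangor"]:
    print(raw, "->", demutate(raw))
```

Names flagged for review are left unchanged, so a person can decide whether they are mutated forms or not.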

Other issues included the use of English language names in the Welsh text:

  • New England (Lloegr Newydd)
  • Bristol (Bryste)
  • Saint Brides (Sant y Brid)

There were also a number of placenames which had suggested matches, but had a high chance of also being the name of a property. For example:

  • Trawscoed (house, estate and community)
  • Cilgwyn (village in Powys, Gwynedd, Carmarthenshire AND a gentry house)
  • Ty-coch (area near Swansea and common house name)

Short of reading each article in order to make a decision, there is currently no way to match such places with any certainty. However, such a manual process could be easily gamified as a crowd-sourcing task. Undertaking such tasks would also create training data for improving NER in the future.

 

Reconciling the data to Wikidata using OpenRefine

The result was an initial match of 25,000 names, to which a further 2000 were quickly added following a human review of high-scoring match suggestions. These matches include 2208 unique place names. Beyond this, an increasing amount of time would be required to match entries manually.

 

Matching placenames to unique identifiers allows us to examine the frequency of specific places in the text with greater accuracy

Utilizing the enriched data

Now that we have aligned our placenames to Wikidata entries for those places, we have access to a wealth of additional information. This extra information can be summarized in several categories:

  • Persistent ID – Being able to assign a unique QID to each placename means we can treat each one as a unique entity, even if there are examples of multiple places with the same name.
  • External IDs – Wikidata collects persistent IDs from other institutions which hold information about the subject. This helps align and enrich data across multiple datasets.
  • Contextual information – This includes links to Wikipedia articles, openly licensed images and references to other authoritative works.
  • Structured data – Wikidata contains a linked, structured ontology about its items, so places are linked to their administrative hierarchy and to every other item in the dataset with a statement about that place.

 

This allows us to better understand the connections between people and place. In the example below a computer is able to understand that two people are connected to several common places through reference to these places in their Welsh Biography articles. The colour and thickness of the connecting strands also indicate the frequency of these references within each article.
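As a rough sketch of how such connections could be derived once every mention carries an identifier, the snippet below finds the places two articles have in common, together with how often each is mentioned. The data structure, the function name and the identifiers are hypothetical placeholders, not real Wikidata QIDs or figures from the corpus.

```python
from collections import Counter

# Hypothetical data: for each article, how often each reconciled place is mentioned.
mentions = {
    "Person A": Counter({"Q_PLACE_1": 4, "Q_PLACE_2": 1, "Q_PLACE_3": 2}),
    "Person B": Counter({"Q_PLACE_1": 1, "Q_PLACE_3": 5, "Q_PLACE_4": 1}),
}


def shared_places(a, b):
    """Map each place mentioned in both articles to (count in a, count in b)."""
    common = a.keys() & b.keys()
    return {place: (a[place], b[place]) for place in sorted(common)}


links = shared_places(mentions["Person A"], mentions["Person B"])
print(links)
```

The counts could then drive the thickness of the connecting strands in a visualisation.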

 

 

When this approach is scaled up to the whole corpus we can see a hugely complex web of interconnections between people and places.

 

 

And since we now have access to coordinates for all our places, we can visualize these connections on a map. Below we see visualisations for an individual and for the whole collection using people’s birthplace as a starting point, connected to all other places mentioned in their articles.

 

 

Using the contextual information in place name tags we can make more granular queries, such as links between the place of birth and places of education mentioned in their articles. This highlights clear correlations to major centres of learning and further demonstrates the research potential of the data.

 

Conclusions

In conclusion, existing technology can accurately identify around 60-70% of Welsh place names in digital text. Training more advanced A.I. algorithms using larger place name vocabularies and a bigger corpus of training data may help to increase this percentage even further. Undertaking this process at scale would allow for further research and reconciliation work to take place and would also help to improve search and discovery functionality, but it does not identify unique places, only the instance of a place name.

In order to create notable benefits, the data must be reconciled against a database with data about specific places. With many duplications in place names in Wales and around the world this step is vital in creating connections to the correct places. It would seem that we don’t yet have the technology to automate this, in any language, with a high level of certainty. Several examples of pipelines being developed in order to identify entities in text and reconcile directly against Wikidata or other large datasets do exist, including a project by a colleague here at the National Library (link). However, they have faced the same kind of challenges.

Where additional supporting data already exists, as in our Dictionary of Welsh Biography example, it is possible to automate this to some degree, but there is still a significant margin for error without human input.

Whilst accurate and complete identification of entities from a text is not yet possible, these processes offer value, as a stand alone activity or as part of a multidisciplinary approach, as a way of improving understanding of a text and improving search and discovery services for users.

Importantly, undertaking this work on Welsh language texts is only possible with the continued development, adaptation and improvement of new technologies, and the availability of Open Access data sources such as Wikidata and Open Street Map, as well as large corpora of Welsh language text for training machine learning algorithms.

 

Jason Evans, Open Data Manager


A Welsh Placename Revival?

#LoveMaps / News / Rants and Raves / Research - Posted 25-11-2022

A Democratic Digital Infrastructure for Welsh Place-Names

Eryri (Snowdonia) National Park recently announced that they would no longer be using the English names for Yr Wyddfa (Snowdon) or Eryri (Snowdonia) as part of a wider policy of adopting and safeguarding Welsh place names across the park.

 

This move has been broadly welcomed, especially here in Wales, and campaigners will be hoping that this bold move will empower others to follow suit. There is already talk of the Welsh football team adopting the use of ‘Cymru’ in both languages after the impending World Cup.

 

Such moves not only help to safeguard the Welsh language but to celebrate it, and to encourage visitors to engage with it.

 

The decision by Eryri National Park also raises some interesting questions. Will the rest of the tourism sector follow their lead? Or will they continue to use the English name? And what about education, the media and the government? We shall see.

 

As custodians of knowledge, the National Library of Wales is naturally invested in archiving official records, but when it comes to Welsh place names, these official records don’t always reflect popular culture and practice and are certainly slow to react to changing public expectations. 

 

In terms of mapping, many official maps only use the English versions of Welsh place names. For example, despite a concerted effort in recent years, Ordnance Survey still lacks a lot of Welsh language data. In an effort to support the growing demand for Welsh language mapping and data the National Library is working on developing free and open data and mapping solutions for Welsh placenames. With funding from Welsh Government and in partnership with Mapio Cymru and Menter Iaith Môn we are engaging with crowd-sourced, community-governed data sets, Wikidata and Open Street Map, to help develop a Welsh language mapping solution. We’ve used our technical expertise to help align these two sources of Welsh place name data, and worked with Welsh Government open data and the Welsh Language Commission to increase the richness and diversity of the data. 

 

 

These data sets also allow the community to decide on the form of placenames. On Wikipedia and Wikidata names are changed or adopted by an open democratic process – already there is a lively discussion on English Wikipedia about changing the title of the article on Yr Wyddfa. The data sets also offer flexibility: a name can have many variants, including multiple ‘official’ names, and different names can be noted for different time periods. Consumers of the data then have a choice of what data they want to present on their map. Recently both the BBC and Welsh Government have used this open data to serve Welsh language maps to the public.

 

 

We have also used this rich open data to ensure that Welsh Wicipedia has basic articles about (almost) all towns and villages in Wales. We recently created over 800 of these and are working with volunteers to enrich Welsh language content about our places. One of our volunteers has created dozens of articles about historic streets and buildings in Wrexham and we are planning an editing event in partnership with the Welsh Place-Name Society to further improve Wikipedia content about Welsh placenames, their history and their meaning. If you are interested in taking part, you can find out more here.

 

 

We will also be working with Menter Iaith Môn to teach school children how to add information about their community to Wikipedia in Welsh, and to collect sound clips of children pronouncing their local placenames. These too will be made freely available on Wiki.

 

This project allows us to do more than simply archive and give access to records. This is about engaging with the public and supporting the development of digital infrastructure for Welsh placenames. This will also enable us to think about how we present our collections in the context of place and time. Another output of our work this year will be a prototype map for viewing our collections in both English and Welsh, which we hope will be a positive step towards the development of a truly bilingual search and discovery solution, with the flexibility to adapt quickly to positive change, like the recent renaming of Yr Wyddfa (Snowdon).

 

Jason Evans

 

Open Data Manager

 


Literature and History of Medicine Research Centre

Collections / Digitisation / Events / Research - Posted 07-11-2022

Aberystwyth University, in partnership with the National Library, is launching a new research centre on Friday, 11 November, the Literature and History of Medicine Research Centre. The centre will make use of the research sources in the Library’s medicine collections as a foundation for new academic research in the field. A one-day conference has been arranged for the launch on 11 November. It’s free and you can book a ticket to the event here. The conference will be held in person and online.

The Library’s medicine-related collection is extensive, and includes print material, archival material, manuscript material, architectural material, drawings and photographs. As a result of the Library’s Medicine and Health in Wales before the NHS project, the medicine-related material that is part of the Welsh and Celtic Print Collection is now available on the online catalogue in its entirety, with the items that are out of copyright also digitized and available remotely. The print collection includes a number of important research sources, including the reports of the Medical Officer of Health for the rural and urban district councils across Wales, hospital reports and psychiatric hospital reports.

The psychiatric hospital reports offer a good example of the type of information and data that is included in these print sources. The annual reports of the Joint Counties Asylum at Carmarthen, for example, show the feast of core data that such reports offer to researchers. They contain data on a large number of aspects of the life of the hospital and its patients, including statistics regarding where patients came from, their work, the nature of their illnesses, mortality rates, the patients’ diet, the patients’ ages, readmission levels, the patients’ relationship status, and the institution’s financial statistics.

Such data is fundamental to research in this field, and it is hoped that establishing the Centre in partnership with Aberystwyth University will be a means of strengthening the relationship between the Library, our collections and the research community. If you want to learn more about the partnership, or if you’re interested in the latest research in the field of literature and the history of medicine, book a ticket to the conference!

Dr Douglas Jones,

Published Collections Projects Manager.

James Albert Ukawsaw Gronniosaw, William Williams, Pantycelyn and the Depiction of Slavery

Collections / Research - Posted 20-06-2022

Amongst the many printed works associated with William Williams Pantycelyn held by the National Library is a 1779 Welsh translation of A Narrative of the Most Remarkable Particulars in the Life of James Albert Ukawsaw Gronniosaw, an African Prince, as related by Himself, originally published in English under the auspices of Selina Hastings, Lady Huntingdon, in 1772. Gronniosaw’s Narrative is an important work, especially in terms of the development of early black biography. It was the first autobiography by a black author published in Britain and one of the earliest known examples of a slave narrative.

At first glance, it makes sense that it was Williams Pantycelyn who was responsible for the translation and publication of Berr Hanes o’r Pethau Mwyaf Hynod ym Mywyd James Albert Ukawsaw Gronniosaw. Williams Pantycelyn, after all, was responsible for the first printed condemnation of the slave trade in Welsh, in the first volume of his Pantheologia, published in 1762. However, as a number of academics, most notably the African American literary critic Henry Louis Gates Jr., have noted, there is no condemnation of slavery in Gronniosaw’s text, unlike later 19th century slave narratives. Indeed, according to Gates Jr.’s reading of the text, one of the key threads throughout is Gronniosaw’s abandonment of his African heritage and his blackness as he seeks to become more European in order to gain acceptance in 18th century Anglo-American society.

 

 

Other readings argue that the text presents slavery in a generally positive, paternalistic light, playing down its brutal reality. Gronniosaw’s initial enslavement is portrayed as saving him from being murdered; the horrors of the Middle Passage are absent, with only a reference to a mild bout of sea-sickness; it is through slavery that Gronniosaw is brought to a Christian country from a ‘pagan’ Africa; it is through slavery, and specifically through his final ‘dear kind master’, that Gronniosaw is converted to Calvinism. There is no explicit condemnation of slavery as an institution, no meditation on the condition of being in bondage or on the morality of slaveholding. Indeed, the text can also be seen as making an implicit case for slavery as a path to conversion, an argument made by its editor Walter Shirley, Lady Huntingdon’s cousin, in the Narrative’s preface.

How then do we reconcile Williams Pantycelyn’s avowedly anti-slavery principles with the publication of a text which at best was ambivalent in its attitude to slavery? The same can, of course, be asked with regard to Gronniosaw, as a former slave responsible for the authorship of the text. Recent work by the academic Ryan Hanley, focused on the religious, social and cultural milieu behind the original publication of the Narrative, may shed some light on these contradictions. As Hanley has argued, the depiction of slavery in the Narrative was profoundly influenced by Gronniosaw’s relationship to evangelical Calvinism and its social networks. Hanley identifies a number of key factors that help explain the way slavery is depicted in the Narrative.

First, while the text is commonly read as a slave narrative today, on publication the Narrative’s function was primarily as a piece of devotional literature, forming part of a sharp, and by now obscure, theological debate on predestination and slavery conducted by pamphlet between the Calvinists and John Wesley’s Arminian Methodists. The central focus of the Narrative is on Gronniosaw’s path to Calvinism, his conversion, his engagement with Calvinist circles and the comfort provided by his religious faith during his extremely challenging circumstances post-slavery. The Calvinist belief that a person’s fate in the afterlife was pre-ordained meant that their freedom in the physical world was of little importance in terms of their eventual salvation, which had significant implications for Calvinist views on slavery at this time. Proslavery Calvinists such as George Whitefield and Lady Huntingdon saw no obstacle to owning slaves as long as the gospel was being preached to them, the slaves’ spiritual wellbeing being of more importance than their physical freedom.

Second, there are issues related to Gronniosaw’s authorial agency, especially in relation to the text’s muted depiction of slavery. A number of actors stood between Gronniosaw, the narrator, and the published text: an amanuensis, an editor and, perhaps most significantly, a patron, the slave-owning Lady Huntingdon. An alternative reading by Jennifer Harris, however, makes the case for a higher degree of authorial agency, with Gronniosaw omitting key facts, such as his probable Islamic background in contrast to the depiction of a ‘pagan’ Africa, as a means of playing on European sympathies and prejudices.

Third, many of the people in this Calvinist social network, on whom Gronniosaw was, crucially, financially dependent at different times, were involved in the slave trade, including key figures such as George Whitefield and Lady Huntingdon. Indeed, Lady Huntingdon, the patron of Trefeca College, is key here, with all the actors involved in the Narrative’s production, as Hanley points out, doing their utmost to please her. Williams Pantycelyn was also well acquainted with Lady Huntingdon, writing many of his English hymns at her behest and in relation to her influential role as the benefactor of Trefeca College.

The religious, social and cultural environment in which Gronniosaw’s Narrative was produced provides important context in relation to its depiction of slavery. The primacy given to theological concerns and the role of Lady Huntingdon also provide similar context for Williams Pantycelyn’s role in the translation and publication of the Berr Hanes. However, questions remain in reconciling its muted depiction of slavery with Williams’ opposition to the slave trade, and in how these relate to the proslavery views of many in that period’s Calvinist social network.

 

Dr. Douglas Jones

Printed Collections Projects Manager

 

Further reading

 

Evans, Chris – Slave Wales: the Welsh and American Slavery, 2010.

Gates Jr., Henry Louis – The Signifying Monkey, 2011.

Gronniosaw, James Albert Ukawsaw – Berr hanes o’r pethau mwyaf hynod ym mywyd James Albert Ukawsaw Groniosaw, tywysog o Affrica: fel yr adroddwyd ganddo ef ei hun, 1779.

Hanley, Ryan – ‘Calvinism, Proslavery and James Albert Ukawsaw Gronniosaw’, Slavery and Abolition 35 (2), 2015.

Hanley, Ryan – Beyond Slavery and Abolition: Black British writing, c.1770-1830, 2018.

Harris, Jennifer – ‘Seeing the Light: Re-reading James Albert Ukawsaw Gronniosaw’, English Language Notes 42 (4), 2005.

James, E. Wyn – ‘Blessed Jubil!: Slavery Mission and the Millennial Dawn in the work of William Williams Pantycelyn’ in Cultures of Radicalism in Britain and Ireland, 2013.

James, E. Wyn – ‘Welsh Ballads and American Slavery’, Welsh Journal of Religious History 2, 2007.

James, E. Wyn – ‘Caethwasanaeth a’r Beirdd’, Taliesin 119, 2003.

Potkay, Adam and Sandra Burr – Black Atlantic Writers of the 18th Century: Living the New Exodus in England and the Americas, 1995.

Schlenther, Boyd Stanley – Queen of the Methodists: The Countess of Huntingdon and the Eighteenth-century Crisis of Faith and Society, 1997.

Tyson, John R. – ‘Lady Huntingdon, Religion and Race’, Methodist History 50 (1), 2011.

Welch, Edwin – Spiritual Pilgrim: A Reassessment of the Life of the Countess of Huntingdon, 1995.

Tips on how to research your family history

Reader Services / Research - Posted 26-10-2020

 

What a strange time! We are once again going into a lockdown period, and winter is approaching, the season when many of us turn to researching our family history. Why not give it a go? Not sure where to start? Read on.

5 steps to start your family history research

Step 1 – start with yourself, noting any events, dates and places, then work back to your parents and to previous generations as far as you can.
Step 2 – ask members of the family for their memories, and make a note or recording of the information for future use.
Step 3 – look for evidence: certificates, photographs, newspaper cuttings etc. The attic is a good place to start.
Step 4 – organise the information you have collected so far and create a family tree on paper or electronically; there are plenty of free options online.
Step 5 – create a list of what needs further research, search the Library website to see what is available, and for further help contact the Enquiries Service.

If you have already done some research here are a few tips when researching further.

10 tips to move forward with your research

  1. Remember to make a note of the resources you have searched, even if nothing was found; it will save duplicating the search in the future.
  2. Read widely about the resources that are available and how to interpret the information.
  3. Remember that parish registers record baptisms, marriages and burials, whereas certificates record births, marriages and deaths.
  4. When parish registers are difficult to read or parts are missing, use bishop’s transcripts to fill the gaps, if they have survived.
  5. Can’t find members of your family in the parish registers? Look in nearby nonconformist records.
  6. When looking at the 1841 census, remember that the ages of those over 15 were rounded down to the nearest 5; allow for this when searching for a birth/baptism.
  7. By the 1911 census a lot more information was asked for, including how many years married, how many children born to the marriage and how many still alive.
  8. When you come across a death it is always worth searching to see if a will was left.
  9. Newspapers are always a great source of information about people, places and events especially when they can be searched online.
  10. After searching the general resources, why not venture to search other collections such as estate, solicitors, manorial records, Great Sessions and a variety of other collections available through the Library website and catalogue.
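The 1841 census tip lends itself to a small calculation. Because ages of adults were rounded down to the nearest 5, a recorded age of 20 could mean anything from 20 to 24. The sketch below turns a recorded age into a range of birth years to search. It assumes that any recorded age of 15 or more which is a multiple of 5 may have been rounded, and that other ages were recorded exactly; the function name is made up for this example.

```python
CENSUS_YEAR = 1841  # the census was taken in June 1841


def birth_year_range(recorded_age):
    """Return (earliest, latest) likely birth year for an age recorded in 1841."""
    if recorded_age >= 15 and recorded_age % 5 == 0:
        # Possibly rounded down: the true age may be up to 4 years higher.
        oldest = recorded_age + 4
    else:
        # Children's ages, and adult ages not ending in 0 or 5, are taken as exact.
        oldest = recorded_age
    # Someone aged N mid-year was born in either (year - N - 1) or (year - N).
    earliest = CENSUS_YEAR - oldest - 1
    latest = CENSUS_YEAR - recorded_age
    return earliest, latest


print(birth_year_range(20))
print(birth_year_range(12))
```

So an ancestor recorded as 20 in 1841 is worth looking for in baptism registers from 1816 through to 1821.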

Good luck with the searching and enjoy!

Beryl Evans, Research Services Manager

 

Wikipedia Translate-a-thon

Events / News / News and Events / Research - Posted 11-10-2019

To celebrate Libraries Week the National Library hosted a Welsh language Translate-a-thon for students at Aberystwyth University hoping to pursue a career as translators. The goal was to translate existing English Wikipedia articles about famous writers into Welsh. The event was part of a wider WiciLlên project, funded by the Welsh Government and aimed at improving online access to Welsh language information and data about literature and the Welsh bibliography.

The National Library of Wales’ National Wikimedian helps the library support and contribute to Wikipedia. The Welsh language Wikipedia has been the focus of this work since collaboration began in 2015. The Library and its main funder, the Welsh Government have recognised the importance of this hub of Welsh language knowledge in building a sustainable and thriving future for the Welsh language – Welsh Wicipedia is already the most viewed Welsh website and now has over 100,000 articles. However there is still lots of work to do in order to give access to ‘all knowledge’ in Welsh.

The Library has been working with the Professional Translation Studies course at Aberystwyth University for several years, building on the idea that using Wicipedia’s content translation tool for perfecting translation means students can actively contribute to the improvement of freely available Welsh language content whilst studying, giving real value to their assignments.

Course leader Mandi Morse says: “We are delighted to be able to take advantage of the Wikipedia platform while teaching the postgraduate Professional Translation Studies course. It gives our students great experiences as they develop their translation skills, giving them the opportunity to practice translating all kinds of subjects and contexts. Wikipedia is certainly extremely useful and enriches our provision.”

12 students attended the event at the National Library, and 9 new articles were created, in many cases making information about these people available in Welsh for the first time. New articles include German novelist Gerhart Hauptmann, who won the Nobel prize for literature in 1912, and English children’s author Joan Aiken. A full list of the articles created is available on Wikimedia.

We hope to facilitate similar events in the future in order to support the improvement of Welsh language content online and to encourage Welsh Universities to think about how they can do the same.

Jason Evans

National Wikimedian


The WiciLlên project

Collections / Events / News / News and Events / Research - Posted 10-09-2019

Sharing data and information about Welsh literature with the world

The National Library of Wales, working in partnership with Menter Iaith Môn for a second time, has secured a grant from the Welsh Government for WiciLlên, an ambitious project focused on openly sharing information about Welsh literature on the Wikimedia projects.

The project will consist of two main strands. Firstly the National Library will begin sharing a huge dataset of all books of Welsh interest ever published in Wales. This dataset contains information about nearly half a million books, their authors and publishers.

As part of the WiciLlên project the first 50,000 of those records will be enriched and shared as linked open data on Wikidata. The data will be searchable and reusable in dozens of languages, including Welsh. This will improve access to this important dataset, help improve citations on Wikipedia and provide opportunities for developers and researchers wishing to re-use the data.

The second strand of the project will focus on improving content on the Welsh Wikipedia. The National Library will deliver a Hackathon event and a series of Wikipedia editathons, whilst Menter Môn’s Wikipedian in Residence will deliver events for school children of different ages.

Nia Wyn Thomas, who heads Menter Iaith Môn, said: “It’s a privilege, as always, to work with Wikimedia UK and the National Library to enrich open content in Welsh through the skilled hands of Anglesey’s children. Over the period of the collaboration, we are proud of the work that has been achieved, and the impact of the work around developing children’s digital competency through the medium of Welsh, be it their first or second language. The influence of the work on the development of the Welsh language is also great, in a field where the language is not always seen as progressive.”

The project has already started and will run until March 2020.

Jason Evans

National Wikimedian

Treasured Manuscript collection gets the Wikidata Treatment

Collections / Digitisation / News / Research - Posted 11-06-2019

Converting metadata to linked open data.

The Peniarth Manuscripts form one of the most important collections held by the National Library of Wales. Its 560 manuscripts date from the 11th Century onward and contain some of the most important and iconic Welsh literary works in existence, including stories from the Mabinogion, the Book of Taliesin and the earliest copies of the ancient Laws books of Wales. In 2010 the collection was included in the UNESCO UK Memory of the World Register, further underlining its importance as a national treasure.

The collection has of course been catalogued and digitisation of the entire collection is currently underway. So now seemed like a good time to explore the potential of linked data in order to better understand and explore the makeup of the collection.

At the National Library of Wales we have now converted collection metadata to Wikidata for a number of collections, including paintings and printed material. This has led to an enrichment of data and easy access to tools for querying and visualizing the collections. Creating Wikidata for each of the Peniarth manuscripts would result in similar advantages, but first the existing metadata would have to be cleaned and refined before being mapped to entities within Wikidata. Some mappings were easy; for example, metadata tags for parchment and paper were easily matched to the relevant Wikidata entities. Dates and measurements simply needed formatting in a particular way in order to add them to Wikidata, and the QuickStatements (QS) upload tool contains detailed instructions on how to do this.
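As a rough illustration of that formatting step, the sketch below turns a date and a measurement into tab-separated lines of the kind QuickStatements accepts in its V1 text format, as far as we understand that format. The target item (Q4115189, the Wikidata sandbox), the properties P571 (inception) and P2048 (height), the unit Q174728 (centimetre) and the sample values are all assumptions chosen for the example, not the mappings actually used for the Peniarth data.

```python
from datetime import date

# Wikidata time precision codes: 9 = year, 11 = day.
PRECISION_YEAR = 9
PRECISION_DAY = 11


def qs_date(value: date, precision: int = PRECISION_DAY) -> str:
    """Format a date as a QuickStatements time value, e.g. +1591-12-03T00:00:00Z/11."""
    if precision == PRECISION_YEAR:
        # At year precision the month and day are zeroed out.
        return f"+{value.year:04d}-00-00T00:00:00Z/{precision}"
    return f"+{value.year:04d}-{value.month:02d}-{value.day:02d}T00:00:00Z/{precision}"


def qs_quantity(amount: float, unit_qid: str) -> str:
    """Format a measurement as the amount, then 'U', then the unit's numeric item ID."""
    return f"{amount:g}U{unit_qid.lstrip('Q')}"


def qs_line(item: str, prop: str, value: str) -> str:
    """Join item, property and value with tabs to make one statement line."""
    return "\t".join([item, prop, value])


# Hypothetical values for a single manuscript, written against the sandbox item.
lines = [
    qs_line("Q4115189", "P571", qs_date(date(1400, 1, 1), PRECISION_YEAR)),
    qs_line("Q4115189", "P2048", qs_quantity(21.5, "Q174728")),
]
print("\n".join(lines))
```

Keeping the formatting in small functions like these means a whole spreadsheet column can be converted in one pass and spot-checked before anything is uploaded.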

Much of the data already existed in set data fields, making mappings fairly straightforward. However, the metadata for many manuscripts also included a text-based description of the item, which in many cases included additional information such as the names of scribes and of people whose works are represented within the manuscript (authors). Extracting this data was more difficult. By filtering searches for specific sentence structures and/or certain keywords it was possible to semi-automate the extraction of this data, but it also required manual checking to avoid mistakes. Once the names, works, subjects and genres were extracted they then had to be matched to Wikidata items. Where no Wikidata item yet existed, one was created whenever possible using data from other sources.
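The keyword-filtering approach described above could be sketched roughly as follows. The phrase patterns and the sample description are invented for illustration; the real catalogue descriptions and the searches actually used are not reproduced here, and every candidate produced this way would still need checking by hand.

```python
import re

# Hypothetical name pattern: capitalised words, optionally joined by "ap", "ab" or "of".
NAME = r"([A-Z][\w'.]+(?: (?:ap |ab |of )?[A-Z][\w'.]+)*)"

# Hypothetical sentence structures that tend to introduce a scribe or an author.
PATTERNS = {
    "scribe": re.compile(r"\b(?:in the hand of|written by|copied by) " + NAME),
    "author": re.compile(r"\b(?:poems by|works by|attributed to) " + NAME),
}


def extract_candidates(description: str) -> list[tuple[str, str]]:
    """Return (role, name) candidates found in a free-text description."""
    found = []
    for role, pattern in PATTERNS.items():
        for match in pattern.finditer(description):
            # Drop a sentence-final full stop captured with the last word.
            found.append((role, match.group(1).rstrip(".")))
    return found


# Invented description, not taken from the catalogue.
sample = "A volume of poems by Dafydd ap Gwilym, in the hand of John Jones of Gellilyfdy."
for role, name in extract_candidates(sample):
    print(role, "->", name)
```

Patterns like these trade recall for precision: they only catch phrasings someone has anticipated, which is one reason the extraction could only ever be semi-automated.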

The ontology for describing manuscripts on Wikidata is still being tweaked, so in order to properly separate and describe both the scribe/copyist of a work and the authors of works included in a manuscript it was necessary to create a new property on Wikidata, which can now be used to describe the scribe, calligrapher or copyist of a manuscript work.

Once the data was prepared in a spreadsheet it was uploaded to Wikidata in stages using the QuickStatements tool. We also uploaded sample images of the 100 or so manuscripts which have already been digitised to Wikimedia Commons. Since the implementation of structured data on Commons, any upload which links to the relevant item on Wikidata now pulls in much of the relevant descriptive data automatically, meaning there is a lot less work involved in preparing a batch upload of images than in days gone by. Since the National Library uses IIIF technology to display its digital assets, we also included persistent IDs for our image viewer and links to IIIF manifests in our Wikidata upload.

Once the data is uploaded it can immediately be queried and explored using the Wikidata SPARQL Query Service. This tool has a suite of visualisation options, but there are a number of other useful visualisation tools which can be used in conjunction with a SPARQL query without the need for any coding knowledge, such as the Wikidata Visualisation suite and RAWGraphs.

In many cases it is technically possible to retrieve the same data from standard metadata as you can from the linked data; it’s just that we don’t have the tools to do so easily. With linked data, for example, we can easily list manuscripts from smallest to largest, or oldest to youngest, or explore the relationship between the size of a manuscript and the date it was created.

Interestingly, this query clearly shows a trend of increasing size in the manuscripts over time and it also seems to point to a trend towards producing manuscripts of similar sizes at different periods in time.
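For readers who want to try something similar, a size-and-date query of the kind discussed above might be assembled along these lines. This is a minimal sketch rather than the query used for the chart: it assumes manuscripts are linked to their collection with P195 (collection) and carry P2048 (height) and P571 (inception) statements, and the collection's item ID is left as a parameter because it is not given in this post.

```python
def manuscripts_by_size_query(collection_qid: str, limit: int = 100) -> str:
    """Build a SPARQL query listing items in a collection, smallest first."""
    if not (collection_qid.startswith("Q") and collection_qid[1:].isdigit()):
        raise ValueError(f"not a Wikidata item ID: {collection_qid!r}")
    return f"""
SELECT ?item ?itemLabel ?height ?inception WHERE {{
  ?item wdt:P195 wd:{collection_qid} .        # collection
  ?item wdt:P2048 ?height .                   # height
  OPTIONAL {{ ?item wdt:P571 ?inception . }}  # inception (date created)
  SERVICE wikibase:label {{ bd:serviceParam wikibase:language "cy,en". }}
}}
ORDER BY ?height
LIMIT {limit}
""".strip()
```

The returned string can be pasted into the Wikidata Query Service. Note that the simple `wdt:` form returns the height as a bare number, so this sketch only gives a sensible ordering if every height was recorded in the same unit.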

We can also easily analyze data about the language of the works in the collection. It’s worth remembering that many works contain texts in more than one language, but we know that 43% of items contain Welsh language text whilst 33% contain English and 19% contain Latin.

See the Sparql Query
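Because a single manuscript can contain several languages, percentages like these are calculated per language against the total number of items, so they need not add up to 100. The sketch below shows the arithmetic on a made-up set of four manuscripts; the shelfmarks and languages are placeholders, not catalogue data.

```python
from collections import Counter


def language_shares(items: dict[str, set[str]]) -> dict[str, float]:
    """Percentage of items containing each language (an item may count for several)."""
    total = len(items)
    if total == 0:
        return {}
    counts = Counter(lang for langs in items.values() for lang in langs)
    return {lang: round(100 * n / total, 1) for lang, n in counts.items()}


# Toy data for illustration only.
toy = {
    "MS A": {"Welsh"},
    "MS B": {"Welsh", "Latin"},
    "MS C": {"English"},
    "MS D": {"Latin", "English", "Welsh"},
}
shares = language_shares(toy)
print(shares)
```

In this toy set Welsh appears in three of the four items and Latin and English in two each, so the shares sum to well over 100.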

Whilst this is definitely useful, the extra information extracted from text descriptions in the metadata begins to enrich and add further value to the data, allowing us to perform new queries on the data. For example, we can attempt to break down the collection by genre and main subject for the first time. This of course is only as accurate as the original data, and in some cases the variety of content within a single manuscript makes it impossible to apply an overarching content type, but in terms of research and discoverability, the data certainly provides new insight. For example, we can identify all manuscripts which contain correspondence, and then see who the main subjects of that correspondence are, and because Wikidata is linked data we could then access biographical data about those people.

See the full visualization on Wikimedia Commons. See the Sparql Query

Many of the manuscripts in the Peniarth collection include copies or partial copies of other notable works; in fact, some of the manuscripts are copies of other manuscripts in the same collection. Using Wikidata’s ‘Exemplar of’ property it was possible to connect the manuscripts to data items for the works they contained. Again, I suspect the original metadata does not identify all the works included in the manuscripts, so the results of any query will not be exhaustive, but they will represent all of the current data in our catalogue.

See the full visualization on Wikimedia Commons. See the Sparql Query

We can see from the visualisation that no fewer than 22 manuscripts contain text from the codification of Welsh Law by Hywel Dda, 21 manuscripts are copies of other manuscripts in the collection and 12 are exemplars of various printed books.

Using the newly created Scribe property on Wikidata we have been able to link data for each manuscript to the data items for every scribe mentioned in the metadata. Three scribes stand out as the most prolific, with their handwriting appearing in dozens of manuscripts. Two of the three, Robert Vaughan of Hengwrt and W.W.E. Wynne of Peniarth, once owned much of the collection and did much to annotate and copy the texts. The third, John Jones, was a well-known collector and scribe, and is credited with copying many texts which might otherwise have been lost forever. By exploring which scribes contributed to which manuscripts we can identify connections between otherwise unconnected individuals.

See the Sparql Query
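One simple way to surface such connections, once query results have been downloaded, is to pair up scribes whose hands appear in the same manuscript. The sketch below does this for a small invented dataset; the manuscript labels and scribe names are placeholders, and the approach is only one of several that could be taken.

```python
from collections import defaultdict
from itertools import combinations


def scribe_links(manuscripts: dict[str, set[str]]) -> dict[tuple[str, str], set[str]]:
    """Map each pair of scribes to the manuscripts in which both hands appear."""
    links = defaultdict(set)
    for ms, scribes in manuscripts.items():
        # Sort the names so each pair is always stored in the same order.
        for pair in combinations(sorted(scribes), 2):
            links[pair].add(ms)
    return dict(links)


# Toy data for illustration only.
toy = {
    "MS 1": {"Scribe A", "Scribe B"},
    "MS 2": {"Scribe B", "Scribe C"},
    "MS 3": {"Scribe A", "Scribe B", "Scribe C"},
    "MS 4": {"Scribe D"},
}
links = scribe_links(toy)
for (first, second), shared in sorted(links.items()):
    print(first, "and", second, "share", sorted(shared))
```

Scribes who only ever worked alone, like the placeholder Scribe D here, simply do not appear in the result.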

Finally, it’s important to underline the fact that Wikidata doesn’t just allow us to explore individual collections in new ways, it acts as a hub, joining collections together in an ever expanding web of cultural heritage data. We have added a lot of data for people in the Dictionary of Welsh Biography for example, and a simple query now allows us to identify all of those who contributed to the Peniarth collection.

In the same way, we can link to collections in other institutions, many of which are also beginning to add their collections to Wikidata. Oxford University is one such institution, and this means that manuscripts of Welsh interest at Jesus College, like the Book of the Anchorite of Llanddewi Brefi and the Red Book of Hergest, are now connected through linked data to the copies of those manuscripts in the Peniarth Collection.

Run the live query on the Wikidata Query Service

As more and more collections are added to this huge linked open network we will increasingly be able to reconcile, explore and make sense of our combined cultural heritage, which for hundreds of years has existed in closed silos. By applying new technology and open licensing, cultural institutions can now breathe new life into old data, and reach a wider audience than ever before.

Jason Evans

National Wikimedian

Wikimedia and the culture sector – A new impact study.

Collections / Events / News / Research - Posted 16-05-2019

In December 2017 the Library’s National Wikimedian began work on a Welsh Government funded project to improve the quality of information about people related to Wales on the Welsh language Wikipedia.

The entire project was planned using Europeana’s new Impact Playbook with the aim of exploring and documenting the changes, or impacts, that delivering a range of Wikimedia-based activities focused around collections at The National Library of Wales would have on different stakeholder groups.

A full impact report for the project has now been published.

The Impact Playbook works by creating clearly defined change pathways with measurable outcomes (or changes) during the planning process, ensuring that a wide range of outcomes and desired impacts can be assessed and measured at the end of the project.

This is the first time a project focused on Wikipedia based activities has been assessed in this way, so this was a great opportunity to explore and document the impact of working with Wikimedia in the culture sector.

The project focused around the release of 4,862 Welsh portraits to Wikimedia Commons, with an emphasis on improving access to Welsh language content and providing opportunities for the public to engage through the medium of Welsh.

Bilingual Wikidata was created for each portrait. This data was used to help create nearly 1,500 new Welsh Wikipedia articles, utilizing 25% of the images. The images generated 1.6 million page views in 55 languages in the space of a month, greatly increasing access to information about Welsh people.

Working with Menter Iaith Môn, a series of events were held at schools highlighting how Wikipedia-based learning can contribute positively to schools’ targets for the Welsh language and digital literacy.

A ‘hackathon’ event demonstrated the value of open data to the creative industries in Wales and a number of use cases were documented.

The project demonstrates how working with Wikimedia can help cultural heritage institutions build and support new communities and achieve outcomes which align with their core values whilst increasing access to, and use of, their digital collections.

Jason Evans

National Wikimedian

Heraldic Visitation of Wales

Collections / New Accessions / News / Research - Posted 12-11-2018

During Tudor and Stuart times, heraldic visitations were tours of inspection undertaken by Heralds or their deputies to scrutinise, register and record the coats of arms of the nobility and gentry in England, Wales and Ireland. Having recently purchased a fine pedigree roll of the period, the National Library invited two modern-day heralds to visit us in October: the present Wales Herald Extraordinary, Mr Thomas Lloyd, and his predecessor, the sprightly 90-year-old Dr Michael Powell Siddons.

They are seen here inspecting (and no doubt approving of) the heraldic roll, dated 3 December 1591, which was recently purchased by the Library at auction in Shrewsbury. The roll (now NLW MS 24125G) traces the pedigree of Frances Vichan (or Vaughan), heiress of Hergest Court, Herefordshire to ‘Kradog, Earle of Herefourde, Lord of Radnor and Knight of ye Round Table in King Arthur’s time’. Frances married Herbert Jeffreys of Kirham Abbey, Yorkshire, whose grandfather, Col. Herbert Jeffreys, had been Governor of Virginia.

The 2-metre-long roll, which seems to be in the hand of Richard Adams, scribe and painter of Ludlow, was produced by Thomas Jones (c. 1530-1609) of Fountain Gate, Cardiganshire. Jones, the almost mythical ‘Twm Siôn Cati’, is popularly depicted in later literature as a brigand and rogue, and is sometimes described as ‘the Welsh Robin Hood’. In real life, he was a canny producer of pedigrees for the up-and-coming Welsh nobility, and had cornered the market for ornate displays of prestige and one-upmanship on parchment. Strict accuracy was not always a primary consideration, and having appealed to the vanity of his patrons, one can almost imagine this entrepreneur’s smirk as he pocketed the proceeds of his latest venture.

Thomas Jones – the ‘Del Boy’ of Tudor Wales?

Maredudd ap Huw
Curator of Manuscripts

About this blog

A blog about the work and collections of the National Library of Wales.

Due to the more personal nature of blogs it is the Library's policy to publish postings in the original language only. An equal number of blog posts are published in both Welsh and English, but they are not the same postings. For a translation of the blog readers may wish to try facilities such as Google Translate.
