Graham Charles Gordon Thomas was born in Cardiff in 1941. After school and graduating with First Class Honours he went on to study at Trinity College Dublin. Graham spent a number of years in Dublin as an MA student under the auspices of the Welsh Language Department at University College in Cardiff. He was researching the subject of ‘Tri Thlws ar Ddeg Ynys Prydain’ (‘The Thirteen Treasures of the British Isles’), a series of items in late-medieval Welsh tradition. That meant doing a lot of research on folklore, and he spent most of his time at the Irish Folklore Commission. Graham’s time in Ireland during the 1960s furnished him with many anecdotes, a good number of them involving nuns. He later studied here in Aberystwyth at the College of Librarianship before embarking on his first job at Liverpool University Library. He worked in a department where the medical journals were kept. Doctors from some Liverpool hospitals used to call and ask him to check the journals for the appropriate ‘dosages’ of drugs to give to their patients! This was very risky, as Graham’s eyesight was, by his own admission, not good at all at the time. Later he worked for the Board of Celtic Studies before joining the National Library of Wales as Research Assistant in 1974.
Graham’s scholarly contribution was very significant. His great work (which he had been working on since around 1974) of indexing all the Welsh prose texts in manuscripts was a huge project. The work he carried out on it was incredibly detailed. It would be fair to say that Graham knew more about the Welsh prose of the manuscripts than anyone else, ever. Fortunately his work is to be carried on by Geiriadur Prifysgol Cymru.
He retired as Assistant Archivist in 2001, having written numerous articles for academic journals on charters and other medieval manuscripts. He published his magnum opus on ‘The Charters of the Abbey of Ystrad Marchell’ in 1997. After retirement he continued to write, and in 2014 published a translation of Bewnans Ke, a play in Cornish from c.1500 about the life of St Kea.
Graham was an erudite, knowledgeable and inspiring colleague. In retirement he continued to work on his prose index and other academic projects, often to the accompaniment of Handel. I was fortunate to meet Graham in 1992, when I started work at the National Library of Wales. We struck up an immediate friendship and, like many others, I benefitted greatly from his knowledge and especially his enthusiasm for the Library’s collections.
The Coronation of King Charles III is an opportunity to see how such occasions have been marked in the past and how this is reflected in the collections of the National Library.
The Coronation is essentially a Christian ceremony, and it used to be the custom to print the sermons preached during the service. There are several examples of these in the Llandaff Cathedral collection which was purchased by the Library in 1984, including this sermon by William Talbot, Bishop of Oxford, preached at George I’s coronation in 1714.
In 1820 the Merionethshire lexicographer, grammarian, editor, antiquary and poet William Owen Pughe wrote a poem under his bardic name Idrison to mark the coronation of George IV.
Music is another important element of the ceremony, with new pieces being composed for each coronation. Our musical collections include a hymn by the Rev. W. Morgan and an anthem by Sir John Goss, both with Welsh words, published for George V’s coronation in 1911. But in Newtown that year the annual sports and musical festival had to be postponed because of the coronation festivities.
When George VI was crowned in 1937, the Rev. Arthur Morgan preached a sermon with the title “The meaning of the Coronation” in Shirenewton Church in Monmouthshire, which was subsequently published. The celebrations in Connah’s Quay were more light-hearted, including football and netball matches, a firework display, and a present of chocolates for primary-school children.
In 1953 services were held in the chapels of Penygroes, Carmarthenshire, and a cymanfa ganu in the Brangwyn Hall, Swansea, to celebrate the coronation of Elizabeth II. Maesteg Council marked the occasion by publishing a special issue of its official guide.
These are just a few examples of events held throughout Wales and the United Kingdom. I wonder what publications will be added to our collections following this year’s celebrations.
The ‘Discovering Medieval Aberystwyth’ project has proved to be a great success, with talks, guided tours of the town, and activities all designed to help us understand what life in Aberystwyth was like centuries ago. On the 20th April the programme of events culminated with a talk on ‘Discovering Medieval Aberystwyth in Manuscripts’, accompanied by a pop-up exhibition of manuscripts from the Library’s collections.
The event proved very popular, with excellent talks and presentations by staff from the Library and Aberystwyth University. One of our archivists, Dr David Moore, took us on a virtual tour of the Library’s collections of medieval manuscripts and documents relating to Aberystwyth, and Dr Rhun Emlyn and Dr Louisa Taylor, lecturers from the University’s History department, gave fascinating presentations of the results of their recent research into life in medieval Aberystwyth and the documents and seals of the Gogerddan Estate Collection. The event was rounded off with a poetry reading from the work of one of Aberystwyth’s best-known medieval poets, Dafydd ap Gwilym, performed by Eurig Salisbury of the University’s Department of Welsh and Celtic Studies. Eurig humorously brought to life two of Dafydd’s most famous poems – ‘Merched Llanbadarn’ (‘The Girls of Llanbadarn’) and ‘Dewis Un o Bedair’ (‘Choosing One from Four’).
The accompanying exhibition featured a selection of items from the Library’s collections, all of which are described in our catalogue and are available to view either digitally or in our Reading Room. Why not take a look yourself – who knows what you might discover about medieval Aberystwyth?
Once again this May sees another Carto-Cymru – The Wales Map Symposium. This time we will be meeting face to face, for the first time since 2019. This is the seventh annual symposium and our theme this year is the work of the Ordnance Survey (OS). We will be looking at how approaches to mapping the landscape have changed over time and how historical OS maps can help us to understand our physical environment both past and present.
As usual the event is being held jointly by the National Library and the Royal Commission, who are based here in the Library’s building. This year’s event is also being held in association with the Charles Close Society and ties in with their AGM, which is being held at the Library the next day.
We have a very exciting line-up of speakers this year: we will be welcoming back some old hands, but also seeing some new faces.
Our first speaker will be Keith Lilley, Professor of Historical Geography at Queen’s University, Belfast. Keith is one of our regular speakers; this will be his fourth appearance at the event, and this time his topic will be ‘Excavating’ the map: Landscapes of the Early Ordnance Survey in Great Britain and Ireland.
Keith will be examining the relationship between ‘map’ and ‘field’ looking at sites of survey and survey practices that not only shaped the making of the finished map but also materially shaped those landscapes the map represents. He will then go on to look to the OS maps themselves, to reveal insights into the field-operations of those OS personnel on the ground.
Our next speaker, Dr Rob Wheeler, is honorary secretary to the Charles Close Society and he will be discussing the ‘blue & black’ OS drawings. Rob will explain how the Ordnance Survey produced new editions of its 1:2,500 scale plans by printing a version of the old edition in light blue and using this as a drawing key. Since the blue would not photograph, only the lines overdrawn or added by the draughtsman would appear on the finished map. Many of these MS drawings for England are held here at the National Library; those for Wales are held by the Royal Commission.
These maps are not simply a manuscript version of the new edition superimposed on a blue of the previous one. The blues are normally not the printed version of the previous edition, but manuscript documents associated with its survey and drawing. The source varies according to whether the previous edition was a 1st or 2nd edition. These drawings can provide topographical information additional to that on the printed maps.
Our final speaker of the morning session is Jess Baker of the Ordnance Survey, who will talk to us about how the way OS works has changed over time, giving a detailed view of OS’s history and highlighting notable moments that have driven that change.
Jess will tell us about why certain features have been added and taken off maps over time, the rationale behind differing styles and symbologies used, and even how the artwork on map covers has evolved.
After lunch Scott Lloyd of the Royal Commission will talk to us about the Meresmen and the Parish Boundaries of Wales. He will examine the processes behind the creation of the parish boundaries on the first edition 25-inch mapping for a small number of parishes in North-east Wales.
Scott will discuss the surveyors’ sketch books with notes by the meresmen appointed to represent each parish, the subsequent Boundary Report books dealing with issues on the line of the boundary, the printed ‘sketch maps’ and the Journals of Inspection which record the comments of concerned landowners. All of these preceded the printed map and allow an insight into the establishment of the boundaries.
The next talk will be a tour of some of the Ordnance Survey publications held here at the National Library. In this talk I will endeavour to show some of the less well known and perhaps surprising maps produced by the OS.
Since the National Library of Wales was founded in 1907, it has acquired thousands of Ordnance Survey maps, many directly from the Ordnance Survey through Legal Deposit, but also through donation and purchase. This is especially true of those maps published prior to the Library receiving copyright status in 1911. As a result, the Library has a wide range of Ordnance Survey publications, mainly maps, but also textual works. While we tend to concentrate on maps of Wales, I hope to show that our collection of OS maps contains much more.
Our final talk sees Mike Parker, kindly taking time out from promoting his new book, taking us on a journey through nearly half a century of studying and writing about Wales and maps.
Mike’s talk will mix some of the history of Welsh cartography, with thoughts about Welsh representation in the wider map world, together with an exploration of some of its quirkier corners.
We are looking forward to a really great day and to learning lots of fascinating things about OS maps. It is wonderful to be meeting again face-to-face. There are tickets still available and we would love to see as many of you as possible on the day. For those who cannot make it, the event is also being made available online.
Carto-Cymru 2023 will be held on 12 May with registration from 9.30. For further information and tickets please visit events.library.wales
Long-distance running has long been part of Welsh folklore, with the likes of Guto Nyth Brân made famous by epic running feats. The renown of Welsh runners did not end with those legendary tales: in more recent times a number of Welsh individuals have also gained recognition for their running prowess and subsequently become part of the Marathon Hall of Fame.
Welsh Marathon Legends
Tredegar-born runner who won several majors between 1983 and 1993, his crowning glory coming in Chicago in 1983, where he broke the world record with a time of 02:08:05.
Tanni Grey-Thompson
Olympian who not only won several gold medals in 4 separate Olympics at various distances, but also won 7 London marathons between 1992 and 2002.
Winner of the Reykjavik marathon in 1996 and has also competed in several races over the world.
Not only have Welsh runners gained notice throughout the running world, but some truly epic marathon races are now part of the runner’s calendar. The jewel in the crown is the Snowdonia marathon, which traverses the wondrous scenery between Llanberis and Beddgelert. Highlights of recent races can be re-watched at the National Library, as they were initially aired on S4C.
A small selection of other marathons from around Wales:
Great Welsh Marathon (Llanelli)
Newport Wales Marathon
The Wales Marathon (Pembrokeshire)
Man v. Horse (Llanwrtyd Wells)
Turner, J. Guto Nyth Brân: bachgen cyflyma’r byd, 2012
Grey-Thompson, T. Aim high, 2012
Norris, V. In the long run, 2012
Neal, C. The world marathon book: a celebration of the world’s most inspiring races, 2018
Pfitzinger, P. Advanced marathoning, 2009
Edwards, A. ‘Wedi rhedeg y ras i’r pen’, Cristion, Rhif. 178 (Mai / Mehefin 2013), p. 4-5
Dafydd, L. Rhedeg ras galetaf Ewrop . . . y merched a marathon Eryri, Golwg, Cyf. 22, rhif 8 (22 Hydref 2009), p. 34-35
Jones, R. A. 4:46:36 – marathon yn y meddwl, Barn, 544 (Mai 2008), p. 26-27
Here is a selection of some of the many thousands of books that reach the National Library through Legal Deposit every year. As one of the Legal Deposit Libraries for the UK and Ireland we receive a copy of nearly every book and periodical published in England, Scotland and Ireland – as well as Wales. It’s difficult sometimes for people to grasp the extent of our Legal Deposit collection. So if you’re interested in the Amazon rain forest or the mysteries of the human mind, want to see the latest issue of Poetry Wales or Four Four Two, want to understand how a fusion reactor is likely to work, or just pore over the novels of your favourite author, pop into our Reading Room with your reader’s ticket. We have almost everything for ever.
On the 6th of April this year we will be celebrating Non Print or Electronic Legal Deposit along with the five other legal deposit libraries. Ten years ago the National Library of Wales, the British Library, the National Library of Scotland, Trinity College, Dublin, the Bodleian Library, Oxford and Cambridge University Library were granted the right to receive electronic publications in addition to those in print. That doesn’t only mean a change in format from a book in your hand to text on a screen. The switch to electronic publications has also increased the amount of knowledge we have to offer you.
Everyone is now familiar with e-books and e-periodicals, and many hundreds of thousands of these are now available in the Reading Room of the National Library, but not everyone realises that the world of ‘publishing’ includes all the websites of UK webspace. Imagine how much of this material disappears every year as web pages are updated. The role of UKWA, the UK Web Archive, is to ensure that the content of the BBC Wales, Diverse Cymru or Football Association of Wales web pages, as well as that of many thousands of other websites, big and small, from Wales and the rest of the UK, is kept safe for the future.
April 6th, 2023 marks 70 years since the death of Welsh poet Idris Davies in 1953. He became a poet of great knowledge and skill, and his work is known for its unflinching honesty about a period of hardship and change in Rhymney, South Wales, in the 1920s and between the two World Wars. Though this is the theme for which he is best known, Davies also gained repute for discussing war, politics, and faith in his works, with Dylan Thomas and T.S. Eliot among his admirers.
Born on 6th January 1905 in Rhymney, Davies was the son of a Welsh-speaking colliery winderman, Evan, and his wife Elizabeth Ann. Like countless other young boys at this time, Davies left school at 14 to work as a miner in the Abertysswg and Rhymney Mardy Pits, like his father. The claustrophobic and stifling conditions of the pits are brought to life in Davies’ notes and memoirs, including describing how he lost part of his middle finger in an accident in an almost detached, matter-of-fact manner.
The danger the men faced every day down in the mines was something that Davies grew grimly accustomed to, and yet more hardship was to come. The General Strike of 1926 and a long period of unemployment for Davies and his fellow miners fuelled his deep-rooted anger at the grossly unfair and iron-fisted treatment of the miners, an anger that is felt throughout his poetry. This anger and contempt was to become one of the defining qualities of his work.
Yet through this unflinching poetic representation of South Wales life in the early 20th century there also came a stoic sense of pride in ‘his’ Rhymney, and indeed in Wales as his homeland. He used poems such as Gwalia Deserta (NLW MS 22399C) as an ode to the gwerin, highlighting to great effect not only the betrayals suffered by the people of South Wales but also their hopes.
Idris Davies’ work continues to stand as a testament to yet another of Wales’s great poets, although it is fair to say he has not always received the praise he deserved, especially during his lifetime. The National Library of Wales is fortunate to hold many of his notes, diaries, and works; his notes and diaries (NLW MS 22402B, NLW MS 22414C) give a special insight into the mind of Davies from a different perspective and portray a deeply thoughtful and sometimes sensitive approach to life. His years and suffering as a miner shaped his love and appreciation for simplicity itself, such as fresh air and sunlight. This is felt most keenly in his gentle reflection on death in Request:
Between 1961 and 1963, Rhodri Morgan, who would later become the First Minister of Wales, studied at Harvard University in the United States of America. While he was there he wrote a series of 88 letters to his family in Swansea describing his experiences, the news, and his views on the important issues of the day. The letters show Rhodri Morgan developing his political thinking and the experiences that influenced him.
Letters from Rhodri Morgan, 1961-1963 (Papurau Prys Morgan: File 25)
Letters from Rhodri Morgan, 1961-1963 (Papurau Prys Morgan: File 26)
Letters from Rhodri Morgan, 1961-1963 (Papurau Prys Morgan: File 27)
Typescript of ‘The dedication of a prince’ by Lord Ogmore giving an account of the investiture of Prince Charles as the Prince of Wales at Caernarfon Castle in 1969. Lord Ogmore was a member of the organising committee for the investiture and the typescript contains 12 chapters reporting on the different aspects of the event including the work of the organising committee; preparations and exercises, opposition, description of the day and subsequent events.
Over 800 items from the print collection have been made available on the main catalogue, including works such as Libri Walliae (Vol. I, Vol. II and Supplement) and a selection of early printed items purchased by or recently donated to the Library, for example:
Adelina Patti (1843-1919) was one of the greatest nineteenth-century opera stars. This elaborately decorated album, a piece of classic Victoriana, contains a collection of photographs by leading photographers of the day in Paris and London, documenting Patti’s career and showing the soprano in various operas at the height of her fame during the 1860s.
Part tragic romance, part tale of a young woman’s triumph in the face of duplicity and adversity, and part morality tale warning against the dangers of elopement, the biography is most probably a work of fiction. Indeed, the consensus today is that Mary Charlton was almost certainly not its author. Now largely forgotten, Mary Charlton was a well-known author in her day, well-known enough for a publisher and anonymous author to publish a fake biography in order to profit from her name.
Mary Charlton was a novelist, poet and translator who published 12 works with the Minerva Press between 1794 and 1813. Charlton also featured on the Minerva Press’ 1798 list of notable authors, a sign of the popularity of her novels with the general public. While the fake biography published in 1817 places her origins in the Abergavenny area, in reality very little is known about Charlton’s life, although her novel Rosella (1799), which includes an extended tour of Wales, may indicate Welsh origins. However, it is just as likely that this Welsh setting can be attributed to the Celtic revival of the period.
The Minerva Press was a popular late 18th-century and early 19th-century publishing house, established in 1790 by William Lane. A by-word for cheap, popular fiction, the Minerva Press specialised in the gothic novel, making great use of the circulating library in disseminating its works to the general public. The gothic novels published by Minerva Press also gave it a less than respectable reputation, most famously as the publisher of a number of the ‘horrid novels’ referenced in Jane Austen’s Northanger Abbey.
While Mary Charlton’s Welsh connections are unverifiable, two other Minerva Press authors had definite Welsh connections. The first was Anna Maria Bennett (1750?-1808), best known for her novel The Beggar Girl (1797), a work Samuel Coleridge was particularly appreciative of. Born around 1750 at Merthyr Tydfil, Bennett published five novels with Minerva Press, between 1785 and 1806, two of which, Anna: or Memoirs of a Welch Heiress (1785) and Ellen, Countess of Castle Howel (1794), had Welsh settings.
A second Minerva author with Welsh connections was Ann Hatton (1764-1838), better known as Ann of Swansea, author of Cambrian Pictures (1810). Born in Worcester to the Kemble acting family, Ann had to follow a different profession due to a disability. Ann lived an interesting and sometimes turbulent life, which involved a bigamous marriage, an attempted suicide in front of Westminster Abbey, and modelling and lecturing at Dr James Graham’s notorious Temple of Health and Hymen in Pall Mall. Experiencing periods of poverty, Ann was eventually provided with a stipend of £90 a year by her more famous siblings, the actors Sarah Siddons and John Philip Kemble, on the condition that she live no nearer than 150 miles from London. This was partly due to her sister’s annoyance at Ann’s tendency to use her name in appeals for financial aid, and partly intended to keep that name out of the London newspapers. Ann remarried and, after a period in the United States, where she mixed in radical political circles, she returned to the UK, settling in Swansea in 1799. Her adoption of the moniker ‘Ann of Swansea’ for her written work attests to her identification with her new home.
In their day these three women writers were bestselling authors. Sitting outside the literary canon, the popular novels published by these women authors, and by Minerva Press in general, nevertheless provide us with a reflection of the popular tastes of their day. These were the works that the reading public lapped up in droves. The same can be said of the popular novels, penny dreadfuls, and other forms of cheap popular literature published throughout the nineteenth century.
Dr. Douglas Jones
Printed Collections Projects Manager
Aaron, Jane – ‘The Rise and Fall of the ‘Noble Savage’ in Ann of Swansea’s Welsh Fictions’ in Romantic Textualities: Literature and Print Culture 1780-1840, 22, 2017, pp.78-88.
Blakey, Dorothy – The Minerva Press 1790-1820, London, 1934.
‘Charlton, Mary’ in Janet Todd (Ed.) – A Dictionary of British and American Women Writers 1660-1800, London, 1987, p. 83.
‘Charlton, Mary’ in Virginia Blain, Patricia Clements and Isobel Grundy – The Feminist Companion to Literature in English, London, 1990, pp. 197-198.
Henderson, Jim – ‘Ann of Swansea: A Life on the Edge’ in The National Library of Wales Journal, XXXIV (1), 2006.
The Life, Adventures and Vicissitudes, of Mary Charlton, the Welsh Orphan, Written by Herself and Dedicated to Her Own Sex, Whom She Hopes Will Honor Her Little Narrative, with a Candid Perusal, Rochester, 1817.
Rhydderch, Francesca – ‘Dual Nationality, Divided Identity: Ambivalent Narratives of Britishness in the Welsh Novels of Anna Maria Bennett’ in Welsh Writing in English, 3, 1997, pp. 1-17.
The text we read when we view a web page, a blog or a journal article is full of rich and valuable information. Our brains are very good at processing and making sense of words in the context in which they are presented. We can tell when a word is a placename because we understand the sentence around it, and are expecting to see a place name. Also, we often already know the name of the place and could describe it in further detail from memory.
If computers could understand text as we do then they could be super useful in helping us find and understand information better. Technology such as Named Entity Recognition (NER), where machines are trained to recognise things like people, places and organizations by analyzing a whole text, is increasingly being used to turn plain text into a structured network of ‘things’, and this means machines can make a more complex analysis of text, much as we do.
As part of our ongoing Welsh Place Names project, which is funded by the Welsh Government, we were keen to explore how these new technologies and methodologies might be applied to Welsh language texts and to our own collections. With millions of pages of journals, newspapers and books already digitised, how might this technology help us improve our services for better research, discovery and interpretation?
Named Entity Recognition
The Dictionary of Welsh Biography was chosen for this experiment, as a (fairly) manageable corpus of about 5000 articles, packed with information about people and places. Most placenames have actually already been tagged as such in the mark-up for each page, which gives us a good benchmark for NER models to aim for, and a big corpus of place names for further analysis.
Identifying which words are placenames is the first step in this process. Those names then need to be reconciled against a database of names, which can give us access to a deeper, multilingual understanding of the place.
English language NER tools struggle to identify places in Welsh text for a number of reasons. Firstly, they are not trained to understand the grammatical mutations present in the Welsh language. For example, ‘Tregaron’ is the name of a town in both English and Welsh; however, if the text reads ‘yn Nhregaron’, an English model will not recognise the name because of the mutation (treiglo) of the first letter. Secondly, many placenames are different in Welsh (e.g. Cardiff is Caerdydd), and so models trained on English text simply won’t have the word in their vocabulary. Several English models were tested and many either didn’t recognise names, or assumed they were names of people.
Extracting named entities from digital text using ‘Cymrie’
‘Cymrie’ was able to extract a number of Welsh placenames, including many with mutations. The text of 5 articles was analyzed in detail. On average the tool was able to extract approximately 67% of placenames. Of those place names identified, only 2% were not in fact places.
Some of the placenames it was unable to recognise were tagged as people or organizations, though this was at a lower rate than the English language model.
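The two figures quoted above correspond to what are usually called recall (the share of true placenames that were found) and precision (the share of extracted names that really were places). The following is a minimal sketch of how such figures could be computed for a single article, assuming the hand-tagged names and the tool's output are both available as sets. The function name and the sample names are illustrative only and are not taken from the project.

```python
def precision_recall(predicted: set[str], gold: set[str]) -> tuple[float, float]:
    """Return (precision, recall) for the placenames of one article."""
    true_positives = len(predicted & gold)
    # Precision: how many of the extracted names are genuine places.
    precision = true_positives / len(predicted) if predicted else 0.0
    # Recall: how many of the genuine places were extracted.
    recall = true_positives / len(gold) if gold else 0.0
    return precision, recall


# Illustrative data only: three hand-tagged places, of which the tool finds two,
# plus one personal name wrongly extracted as a place.
gold = {"Tregaron", "Caerdydd", "Aberystwyth"}
predicted = {"Tregaron", "Caerdydd", "Dafydd"}

precision, recall = precision_recall(predicted, gold)
print(f"precision={precision:.2f} recall={recall:.2f}")
```

In these terms, the results reported for the five articles amount to a recall of roughly 0.67 and a precision of roughly 0.98.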
Reconciling the Data
Knowing what words are names of people or places is useful only to a point, because we still know nothing more than ‘it’s a place’. For the data to be really useful we need access to more information about each place, such as its name in other languages, its location on a map and the county, country or continent it is part of. We can then apply a unique identifier to each place and they become unique data entities.
To do this we need to take our long list of place names and attempt to reconcile them against a database which holds more information about them. In our case we are using Wikidata, which is home to one of the largest corpora of Welsh place names available. Wikidata is free for anyone to reuse and is structured as linked data.
The Dictionary of Welsh Biography contains around 80,000 instances of place names. Due to the practicalities of working with such a large dataset, I opted to work with the first 46,000 tagged places.
The tags in the Welsh Biography code often contained more than just the placename. They commonly included a Grid reference, the type of place (city, village etc) and the relation to that place being discussed in the article.
Obviously having all this information to hand makes the reconciliation process far more likely to succeed. As NER technology improves, it should be able to infer much of this information by understanding the wider context in which the place name appears, but for now we must accept that without this additional information the process would have a far lower success rate.
Using Open Refine’s reconciliation tool we were able to compare our list of placenames to Wikidata. The software’s algorithm looks for similarities in spelling but also considers the likelihood of a match based on the popularity of its content. By transforming the grid references from our data into coordinates we were also able to instruct Open Refine to score matches based on their proximity. Places with matching names and proximity of less than a kilometre were mostly matched automatically. Our data on the type of place was also used to help the software make a judgement.
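The proximity check described above can be illustrated with a short sketch. This is not Open Refine's own scoring algorithm. It simply assumes that the grid references have already been converted to latitude and longitude, and uses the standard haversine formula to test whether two records with the same name lie within a kilometre of each other. The function names, the dictionary layout and the sample coordinates are all hypothetical.

```python
from math import asin, cos, radians, sin, sqrt

EARTH_RADIUS_KM = 6371.0  # mean Earth radius


def haversine_km(lat1: float, lon1: float, lat2: float, lon2: float) -> float:
    """Great-circle distance in kilometres between two points given in degrees."""
    phi1, phi2 = radians(lat1), radians(lat2)
    dphi = radians(lat2 - lat1)
    dlam = radians(lon2 - lon1)
    a = sin(dphi / 2) ** 2 + cos(phi1) * cos(phi2) * sin(dlam / 2) ** 2
    return 2 * EARTH_RADIUS_KM * asin(sqrt(a))


def is_auto_match(source: dict, candidate: dict, max_km: float = 1.0) -> bool:
    """Accept a candidate only if the names agree and the points are close together."""
    same_name = source["name"].casefold() == candidate["name"].casefold()
    distance = haversine_km(
        source["lat"], source["lon"], candidate["lat"], candidate["lon"]
    )
    return same_name and distance < max_km


# Illustrative, approximate coordinates only.
tagged_place = {"name": "Tregaron", "lat": 52.2200, "lon": -3.9300}
wikidata_candidate = {"name": "Tregaron", "lat": 52.2205, "lon": -3.9305}
print(is_auto_match(tagged_place, wikidata_candidate))
```

A name-only comparison would treat every place called Cilgwyn as the same candidate. Adding the distance test is what lets identically named places in different counties be told apart.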
In order to give the reconciliation process the best chance of success some initial cleaning was done to remove mutations from the text. Much of this could be done using a series of transformations such as:
Nghaer – Caer
Nhre – Tre
Others require knowledge of the language and human input in order to avoid the corruption of other names. For example ‘Lan’ cannot be automatically changed to ‘Llan’ without corrupting other names such as ‘Lanishan’.
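A minimal sketch of the kind of prefix transformation described above follows. The rule table holds only the two pairs given in the text, and the function name is hypothetical. Ambiguous cases such as ‘Lan’ are deliberately left out of the table so that they pass through unchanged for human review.

```python
# Mutated prefix -> radical (unmutated) prefix.
# Only the two pairs given in the text are listed; a fuller table would need
# to be compiled by someone with knowledge of the language.
PREFIX_RULES = [
    ("Nghaer", "Caer"),
    ("Nhre", "Tre"),
]


def demutate(name: str) -> str:
    """Restore the radical form of a placename if a known mutated prefix matches."""
    for mutated, radical in PREFIX_RULES:
        if name.startswith(mutated):
            return radical + name[len(mutated):]
    # No safe rule applies (e.g. names beginning 'Lan'), so leave the name as it is.
    return name


for example in ["Nhregaron", "Nghaerdydd", "Lanishan"]:
    print(example, "->", demutate(example))
```

Keeping the rules as an explicit, short list makes it easy to review each transformation before it is applied to tens of thousands of names.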
Other issues included the use of English language names in the Welsh text:
New England (Lloegr Newydd)
Saint Brides (Sant y Brid)
There were also a number of placenames which had suggested matches, but had a high chance of also being the name of a property. For example:
Trawscoed (house, estate and community)
Cilgwyn (village in Powys, Gwynedd, Carmarthenshire AND a gentry house)
Ty-coch (area near Swansea and common house name)
Short of reading each article in order to make a decision, there is currently no way to match such places with any certainty. However, such a manual process could be easily gamified as a crowd-sourcing task. Undertaking such tasks would also create training data for improving NER in the future.
Reconciling the data to Wikidata using OpenRefine
The result was an initial match of 25,000 names, to which a further 2000 were quickly added following a human review of high-scoring match suggestions. These matches include 2208 unique place names. Beyond this, an increasing amount of time would be required to match entries manually.
Matching placenames to unique identifiers allows us to examine the frequency of specific places in the text with greater accuracy
Utilizing the enriched data
Now that we have aligned our placenames to Wikidata entries for those places, we have access to a wealth of additional information. This extra information can be summarized in several categories:
Persistent ID – Being able to assign a unique Qid to each placename means we can treat each one as a unique entity, even if there are examples of multiple places with the same name.
External IDs – Wikidata collects persistent IDs from other institutions which hold information about the subject. This helps align and enrich data across multiple datasets.
Contextual information – This includes links to Wikipedia articles, openly licensed images and references to other authoritative works.
Structured data – Wikidata contains a linked, structured ontology about its items, so places are linked to their administrative hierarchy and to every other item in the dataset with a statement about that place.
This allows us to better understand the connections between people and place. In the example below a computer is able to understand that two people are connected to several common places through reference to these places in their Welsh Biography articles. The colour and thickness of the connecting strands also indicate the frequency of these references within each article.
When this approach is scaled up to the whole corpus we can see a hugely complex web of interconnections between people and places.
And since we now have access to coordinates for all our places, we can visualize these connections on a map. Below we see visualisations for an individual and for the whole collection using people’s birthplace as a starting point, connected to all other places mentioned in their articles.
Using the contextual information in place name tags we can make more granular queries, such as links between the place of birth and places of education mentioned in their articles. This highlights clear correlations to major centres of learning and further demonstrates the research potential of the data.
In conclusion, existing technology can accurately identify around 60-70% of Welsh place names in digital text. Training more advanced A.I. algorithms using larger place name vocabularies and a bigger corpus of training data may help to increase this percentage even further. Undertaking this process at scale would allow for further research and reconciliation work to take place and would also help to improve search and discovery functionality, but it does not identify unique places, only the instance of a place name.
In order to create notable benefits, the data must be reconciled against a database with data about specific places. With many duplications in place names in Wales and around the world this step is vital in creating connections to the correct places. It would seem that we don’t yet have the technology to automate this, in any language, with a high level of certainty. Several examples of pipelines being developed in order to identify entities in text and reconcile directly against Wikidata or other large datasets do exist, including a project by a colleague here at the National Library. However, they have faced the same kind of challenges.
Where additional supporting data already exists, as in our Dictionary of Welsh Biography example, it is possible to automate this to some degree, but there is still a significant margin for error without human input.
Whilst accurate and complete identification of entities from a text is not yet possible, these processes offer value, as a stand alone activity or as part of a multidisciplinary approach, as a way of improving understanding of a text and improving search and discovery services for users.
Importantly, the ability to undertake this work on Welsh language texts is only possible with the continued development, adaptation and improvement of new technologies, and the availability of Open Access data sources such as Wikidata and Open Street Map as well as large corpora of Welsh language text for training machine learning algorithms.
A blog about the work and collections of the National Library of Wales.