An introduction to Foclóir Briathra Gaeilge

1. Outline

The present online dictionary aims to document the differentiation of meanings and constructions that Modern Irish verbs show in both spoken and written usage. It is thus an attempt to focus on the semantics-syntax interface of this structurally crucial word class in a systematic way, yet strictly on the basis of empirical evidence. The aim is not so much a contribution to, or revision of, syntactic or semantic theory, but a theoretically informed, yet common-sense and data-based insight into the way Irish sentences are conceived and shaped, expanded or contracted. It also documents the way speakers of Irish view their physical and social environment when reflecting on and talking about all kinds of states, events or possibilities.

This compilation could be labelled a valency dictionary in the sense that all usual, if not necessary, complements of verbs are recorded with each entry, along with information about the semantic characteristics of these complements. Here we differ markedly from previous dictionaries of Irish, as the traditional binary distinction between ‘transitive’ and ‘intransitive’ clearly does not suffice to describe the constructional properties of a verb, nor the semantic structure of a real sentence. Furthermore, the occurrence of prepositions and certain adverbs in the environment of a verb may have structural and/or semantic significance; only in that case are such components represented here, i.e. as complements as distinct from adjuncts. Admittedly, much of this information can be gleaned from sources such as Ó Dónaill’s dictionary (FGB), if one has the patience to comb and sort the relevant entries thoroughly. Other works concerned with Irish verbs deal with morphology only, which is not our concern here, especially as digital sources have meanwhile been developed to cover this side of the matter.

We attempt to classify the semantic relation between a verb and each of its complements in a given collocation in terms of an inventory of semantic roles, such as ‘agent’, ‘instrument’, etc. (see Roles & Classes). In this area we rely heavily on earlier theoretical positions and claims in general linguistics; yet, forced by the sheer evidence and by critical thinking, we have come up with a catalogue of semantic roles which is specially geared to our data. This is not necessarily due to some peculiarity of a Celtic language, but to strict adherence to a large amount of scrutinized original text.

2. Sources

The empirical basis of this presentation consists of two digitized collections of spoken and written Irish from before 2000, comprising ca. 2.5 million words. At present there are much larger digital sources available for Irish, but at the time our work started that was not the case. There was only one corpus of transcribed spontaneous speech of any size available, i.e. Caint Chonamara (CC); and written usage was best represented by Corpas Náisiúnta na Gaeilge (CNG), compiled and digitized by Institiúid Teangeolaíochta Éireann, giving a fair and varied selection of written text from a variety of genres. The spoken corpus was exploited completely, the written one only in representative parts, to keep a balance between the two sources. The narrow confines of CC (Co. Galway Gaeltacht) no doubt cause a strong dialectal bias, not quite compensated for by the written corpus, which has no regional predilections.

This selection of sources has strong implications for the general character of Irish as shown here: the spoken corpus, apart from its regional specificities, draws us back into an almost historical form of Modern Irish, as all speakers recorded were born no later than 1924, having learnt the language from their local 19th century elders. Recorded in 1964, their speech sounds natural and rich even nowadays to people there. Given the dominant trends even in core Gaeltacht strongholds, we did not think that more speech from younger people there should be included here, seeing that most are on their way out of this language, which some are only just mimicking in a form that has a poor chance of survival. After all, this documentation is intended to serve as a tool for linguists worldwide, with a view to the traditional characteristics of Irish rather than its current state of limbo.

The samples found in these corpora were not selected according to any preconceived ideas about the verb in question, no matter how familiar one might have felt with any of them. The analysis shown here is strictly based on the material scrutinized.

Apart from the simple division tr./itr. in Ó Dónaill’s FGB dictionary, which we find inadequate, as well as the separate listings with some preposition or other, there is much in his entries for verbs that could figure well in this presentation. But there is no guarantee as to the actual relevance of what can be found there, as the sources are usually unidentified. With due respect to Niall Ó Dónaill and his team, it would take a long time before all of that immense lexicographical work could be merged with our data, although the recent digital version of FGB would make such a project appear more feasible. It should be noted here that more than half of the 2730 verbs listed in FGB do not occur in either of our two corpora or are extremely rare elsewhere, and are often somewhat odd for reasons not to be discussed here (e.g. seapán, pearsantaigh, díphacáil).

The total number of verbs occurring in our sources is 1114 (excluding formations in -acht, which occur only as verbal nouns although translatable as verbs into English). Of these, 723 verbs occur fewer than 50 times and have therefore been excluded from this study for the time being, which left us with ca. 400 items to be researched in detail. Out of these, ca. 200 have now been systematically evaluated and are presented here, with another ca. 100 to follow soon. Given the strong tendency among native speakers of Irish to use noun-based circumlocutions instead of the compact verbs most other European languages would have, an inventory of ca. 300 verbs should give a fairly representative picture of this word class, and form a solid basis for learners.
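The frequency cut-off described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name select_verbs and the toy token counts are hypothetical and do not come from the project's own tools or corpora; only the threshold of 50 and the figures 1114 and 723 are taken from the text.

```python
from collections import Counter

MIN_TOKENS = 50  # threshold stated in the text: verbs with fewer tokens are set aside


def select_verbs(token_lemmas, min_tokens=MIN_TOKENS):
    """Return the lemmas that occur at least min_tokens times, most frequent first.

    token_lemmas is assumed to be an iterable of verb lemmas, one per corpus token.
    """
    counts = Counter(token_lemmas)
    return [lemma for lemma, n in counts.most_common() if n >= min_tokens]


# Toy data (invented counts, for illustration only).
tokens = ["téigh"] * 120 + ["cuir"] * 75 + ["seapán"] * 2
selected = select_verbs(tokens)

# Arithmetic behind "ca. 400 items": verbs in the sources minus the rare ones.
remaining = 1114 - 723
```

With the figures from the text, remaining comes to 391, which is the "ca. 400" verbs mentioned above.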

3. Details of presentation

3.1 Structure of lemmata

The subdivision of each verb entry is primarily based on semantic features. Thus an overriding physical process will be the starting point of the entry, with narrower specifications or metaphorical shifts following. Major distinctions in syntactic arrangement are the basis for further subdivisions. Furthermore, certain frozen usages such as set phrases or discourse formulas are given in separate subentries. Basically, these specifications are stated on three levels (01-A-01), thus showing degrees of affinity; sometimes a fourth level is shown (01-A-01.1). It must be said, however, that over the years the degree of ‘magnification’ could not be kept constant at all times: some verbs are more finely subclassified than others, a matter to be remedied later.
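For readers who want to process subentry labels automatically, the following Python sketch shows one way to split a label into its levels and to measure how many leading levels two labels share. The label pattern (two digits, a capital letter, two digits, optionally a dot and a further number) is an assumption inferred solely from the two examples 01-A-01 and 01-A-01.1; the names parse_label and shared_levels are hypothetical.

```python
import re
from typing import NamedTuple, Optional

# Assumed pattern, inferred only from the examples "01-A-01" and "01-A-01.1".
LABEL_RE = re.compile(r"^(\d{2})-([A-Z])-(\d{2})(?:\.(\d+))?$")


class SubentryLabel(NamedTuple):
    primary: int           # first level, e.g. 01
    group: str             # second level, e.g. A
    sub: int               # third level, e.g. 01
    detail: Optional[int]  # optional fourth level, e.g. the 1 in 01-A-01.1


def parse_label(label: str) -> SubentryLabel:
    """Split a subentry label into its three (or four) levels."""
    m = LABEL_RE.match(label)
    if m is None:
        raise ValueError(f"not a recognised subentry label: {label!r}")
    primary, group, sub, detail = m.groups()
    return SubentryLabel(int(primary), group, int(sub),
                         int(detail) if detail is not None else None)


def shared_levels(a: SubentryLabel, b: SubentryLabel) -> int:
    """Count the leading levels two labels have in common (a rough affinity measure)."""
    levels_a = [x for x in a if x is not None]
    levels_b = [x for x in b if x is not None]
    n = 0
    for x, y in zip(levels_a, levels_b):
        if x != y:
            break
        n += 1
    return n
```

Under these assumptions, 01-A-01 and 01-A-02 share two levels, whereas 01-A-01 and 02-A-01 share none, since they already differ on the primary level.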

Furthermore, the classical distinction between homonymy and polysemy was almost abandoned as the work progressed. Many distinctions along 01…/02… could be seen as cases of homonymy, which would result in a complete splitting of lemmata on the primary level. We did that only in very few cases, a good example being téigh 1 ‘go’ and téigh 2 ‘heat’ (for good morphological reasons too). For practical purposes, however, this reluctance to decide what constitutes ‘a different verb in identical shape’ should not make much of a difference to users.

3.2 Types of information

3.2.1 Translations and definitions

On each level of lemmatic differentiation we give a translation into English and German without commenting on the degree of appropriateness. Along with that, a more or less formal definition of the state or process in question is given, avoiding the terms used in the translation.

3.2.2 Semantic verb classes

As it appears useful to group verbs into classes denoting movement, production, cognition, etc., the entries are marked for such classes (see Roles & Classes).

3.2.3 Syntactic categories

The boxes, arranged from left to right in each entry, represent the syntactic positions and relations of the categories relevant for that entry. The old and still difficult question of obligatoriness is addressed here by means of ‘grey boxes’, used wherever the status of some element cannot be clearly defined as obligatory, grammatically at least; the many contextually motivated elisions are ignored for our purposes.

3.2.4 Semantic roles

Another relevant perspective concerns the nature of the relation between the verb and its ‘satellites’. A variety of models have been presented over several decades to establish more suitable criteria here. We chose to use an inventory of categories particularly geared to the material in the corpus, i.e. an inductive system not much in line with any of the deductive sets discussed in general linguistics. A list of these semantic roles is given in Roles & Classes. The relevant label is marked at the top of the box representing a complement, indicating the nature of the relation between that complement and the verb.

3.2.5 Full-text material

A maximum of three samples from each of the corpora is given to illustrate and document the (sub)lemma in question. Some of the original tokens were slightly edited and often reduced to the essential part while retaining the quality of a well-formed complete sentence. Samples which show a passive structure are interpreted in the sense of an active equivalent, wherever possible.

Apart from these original data, simple made-up sentences are given to project the information shown in the diagrams onto natural sentences.

3.2.6 Lexical specifications

Occasionally semantic information is added concerning possible lexical items (mostly nouns) in a given position, such as [airg] ‘financial amount’, [bia] ‘food’, etc. Similarly, a short list of lexemes found is given, though not exhaustively. Wherever the lexical selection can be deduced from the semantic role, such as agent (AGS) → [animate], nothing is specified.

In the fields P1 and P2 the specific prepositions are stated, leaving no ‘logical’ space for the actual lexical filling. However, wherever it is of interest, such information is given in the same field, to avoid unnecessary complexity of the layout. A situation of particular interest arises when an adverbial element and a prepositional phrase representing the same semantic role occur alongside each other or as alternatives, with at least one of them obligatorily present (Chuaigh sí isteach sa gcoill. ~ Chuaigh sí isteach. ~ Chuaigh sí sa gcoill. But not: *Chuaigh sí.). In cases like these the adjacent fields are merged into one box, headed by one unifying label for the semantic role in question (DIR in the example given).
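The logic of such a merged box can be sketched as a small data structure in Python. This is a hedged illustration, not the dictionary's actual data model: the class MergedSlot and the field name ADV for the adverbial element are hypothetical, while P1 and the role label DIR are taken from the text. The sketch encodes only the constraint that at least one of the merged fields must be filled.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class MergedSlot:
    """Adjacent fields merged into one box under a single semantic role."""
    role: str                # unifying role label, e.g. "DIR"
    fields: Tuple[str, ...]  # names of the merged fields (names assumed here)

    def is_satisfied(self, filled: set) -> bool:
        """True if at least one of the merged fields is filled in the sentence."""
        return any(f in filled for f in self.fields)


# Hypothetical encoding of the example: adverb (isteach) and/or PP (sa gcoill).
dir_slot = MergedSlot(role="DIR", fields=("ADV", "P1"))

both = dir_slot.is_satisfied({"ADV", "P1"})  # Chuaigh sí isteach sa gcoill.
adv_only = dir_slot.is_satisfied({"ADV"})    # Chuaigh sí isteach.
pp_only = dir_slot.is_satisfied({"P1"})      # Chuaigh sí sa gcoill.
neither = dir_slot.is_satisfied(set())       # *Chuaigh sí.
```

Only the last case fails the check, mirroring the starred sentence in the example above.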

3.2.7 Synonyms

Lexical semantic relations are not the focus of this verb dictionary. In fact, many cross-references of this nature can be found by checking the translations, as many of them appear under quite a number of entries. Nevertheless, wherever likely (in spite of possible doubts in theory and practice), hints at synonyms are given in a separate field.

3.2.8 Frequency

All figures referring to quantitative data refer to the main lemma as a whole. Thus the frequency of more specialized usages of a verb is not represented here. Some subentries are only sparsely documented but are structurally relevant or otherwise interesting, even if rare in the corpus; these are included in this documentation. On the other hand, a mass of tokens from some particular setting (such as fishing in Conamara) distorts the figures, owing to the nature of that part of the corpus; this should be taken into consideration in any general evaluation of this presentation.

3.3 Minor issues

The working language during the most intensive phase of the project was German, and much of the translating, commenting and perhaps thinking was done in terms of that language. By and by English was used here and there, but at this stage it must be admitted that the definitions in particular are mostly missing in English. This can and will of course be remedied in the future, but other gaps still open have priority: (a) data for some verbs which have been analyzed on the basis of CC alone, (b) inclusion of another 100 less frequent verbs, for which the data are ready, closer analysis pending.

Another concern for the near future is the clarification of some tentative assignments of semantic roles. This is one of the mainstays of our work, yet the most difficult one. All the rest took much time, but this takes much thinking, reading, and discussing.

4. Looking back

As most of the relevant information required for a documentation of this nature cannot be generated automatically, the process which led to this publication was slow and often difficult. But grants from DFG (Bonn, twice) and COGG (Dublin) helped to put some skilled persons to work in Wuppertal, Bonn and Dublin, from 2006 to 2009, and later again up to the present date (Summer 2016), to sort out that sea of data carefully, making good use of current digital resources, or designing our own, if needed. As far as the latter aspect of the project is concerned, special mention must be made of Friedel Frowein, MA (Wuppertal), who designed a verb identification programme for running text in Irish, and lately of Michal Boleslav Měchura, MPhil (Dublin and Brno), who converted the original database (encoded in FileMaker) to the Lexonomy dictionary writing system.

A printed version of this valency dictionary was first envisaged, but turned out to be difficult to realize given the deadline problem, as well as possible marketing difficulties. The present electronic version may disappoint some people who wanted the book, but the overall benefit is not to be neglected: whatever comes up that is worth adding to this compendium can be seen the next day. This applies to some parts of the original corpus, but may later include material from other sources: reasonable proposals from individuals anywhere are welcome, and will be scrutinized and, if relevant, inserted in the right place.

We hope that the idea of “never completed, thus unreliable” will not turn on warning lights here. Whatever is shown here can be taken as a fair, if only approximate, representation of the respective verb in Modern Irish, even allowing for semantic distinctions and grammatical information not easily accessible from other sources.

Is le cúnamh airgid ó COGG (An Chomhairle um Oideachas Gaeltachta agus Gaelscolaíochta) a cóiríodh Foclóir Briathra Gaeilge le foilsiú ar líne. The online publication of the Valency Dictionary of Irish Verbs has been supported financially by COGG (An Chomhairle um Oideachas Gaeltachta agus Gaelscolaíochta).

Toradh é Foclóir Briathra Gaeilge ar thionscadal taighde a cuireadh i gcrích in Ollscoil Wuppertal agus in Ollscoil Bonn le cúnamh airgid ó Deutsche Forschungsgemeinschaft. The Valency Dictionary of Irish Verbs is output from research conducted at the University of Wuppertal and the University of Bonn with financial support from Deutsche Forschungsgemeinschaft.

Tá Foclóir Briathra Gaeilge ar fáil faoi cheadúnas Open Database License. The Valency Dictionary of Irish Verbs is available under the Open Database License.