[A paper presented at the Workshop on Language Typology within the XIII International Conference on Historical Linguistics, 10 - 17 August 1997, Duesseldorf, Heinrich-Heine-Universitaet]

Some Factors and Regularities

of Analytic/Synthetic Development of Language System

Anatoliy A. Polikarpov

Among the various kinds of typological development of a language system there are two significant, interrelated cases: the drift of a language system towards a more analytic or, on the contrary, towards a more synthetic structure. For the most part, the principal changes during these processes are as follows. During the growth of analyticity a language shortens its morphemic (mainly affixal) and lexemic (mainly grammar-word) inventory while redistributing the overall volume of sense-coverage functions among the remaining morphemic and lexical units and among the units of its enlarging phraseological inventory; it furthermore elaborates a more restricted syntax (fixed word order), simplifies its inflectional and derivational (word-formational) morphological subsystems, rebuilds its phonology, etc. On the contrary, during the growth of syntheticity a language enlarges its morphemic (most noticeably affixal) and lexemic inventory while redistributing sense-coverage functions mainly in favour of lexemes. It also shrinks the whole bulk of phraseological units (and the overall sense field covered by them) and elaborates more complex morphological and syntactic structures (including a more flexible word order).

Each language system is noticeably involved in one or the other kind of development from time to time in its history. Moreover, periods of analytic development may be followed by development in the reverse direction (towards relatively greater syntheticity of language structure), with a possible later return to development towards a relatively more analytic structure, etc. This is the case for the known history of almost any Indo-European language, as well as for languages of the Turkic, Finno-Ugric, Semitic, Sino-Tibetan and other language families.

The most explicit picture of this kind can be seen in the history of languages belonging to families which have a long and uninterrupted written tradition, beginning from some remote point of their historical development - e.g., the Romance languages (beginning from Classical Latin), the Indo-Aryan languages (from Sanskrit), the Iranian languages (from Old Persian), etc.

What are the causes and mechanisms of development in either direction, of a change of direction, of a halt in the development at some point, and of its different tempo in different languages or in different periods of the same language?

The history of linguistics mainly presents attempts to explain the observable analytic development of languages. On the one hand, it was explained as a kind of structural "decay", "corruption" or "degradation" of the units and structures of languages in their modern state (as compared to a romantically understood previous "golden state" - J. Grimm, [W. von Humboldt] and others) or, on the other hand, as a manifestation of language "progress", a "clarification" of its structure in the course of its overall analyticization (A.A. Potebnja, O. Jespersen, E.H. Sturtevant, etc.).

Little has been done to explain the sometimes observable development of a language system in the reverse direction, towards greater syntheticity.

There is no other way to explain both types of development than by considering the communicative basis of the existence of human language. System Linguistics is an appropriate approach of this kind [Melnikov, 1978; 1988]. On the basis of its fundamentals it was possible to elaborate a more specific model of the word life cycle [Polikarpov, 1988; 1993; 1994; 1995; Polikarpov, Kurlov, 1994], which includes the regularities of analytic/synthetic tendencies in language system development as one of its main components.

The main point of the model is the statement of a specific equilibrium between two kinds of speakers' economy in any speech community:

- (1) "language", or "paradigmatic" economy, i.e. the tendency to keep the load on the long-term language memory of a typical speaker as low as possible;

- (2) "speech", or "syntagmatic" economy, i.e. the tendency to keep speech messages as short as possible.

These two kinds of economy concern different but interrelated, and even contradictory, tendencies. Indeed, the narrower the set (the "paradigm") of sign elements in a language, the greater the functional load each of its elements must carry for satisfactory coverage of societal sense fields of the same volume, the proportionally less certain each sign is in the hinting-guessing use of signs in communication, and the proportionally more combinatorially complex (analytic), on average, must be the way in which each sense object is denoted in communication by the preserved signs. So a decrease of the sign "paradigm", if it occurs, should on average lead to some increase in the operative combinatorial effort an average speaker needs to express the same content in a typical message, as compared to the previous state of the language. On the contrary, the amount of combinatorial effort needed in actual communication naturally decreases for an average speaker when the paradigmatic set of signs increases for whatever reason.

So, as can be seen, there are two contradictory optimizing requirements in communicative activity: the paradigmatic and the syntagmatic. When one of them improves its values, the other worsens, and vice versa. The real state of relations between the two requirements is a compromise, an equilibrium of contradictory forces established at the point where their combined cost is minimal for the current conditions of the speech community's existence. The crucial point for understanding the mechanism that establishes this optimizing equilibrium is the assumption that the "price" of the two kinds of economy is not fixed once and for all, but changes with changes in communicative conditions. Indeed, when a new social community is created (e.g., through the political, economic, etc. integration into some society of members of another society with another native language) and the language of the dominating part of the society therefore spreads fast among speakers of another language, the price of paradigmatic economy grows noticeably. This results, first, from the impossibility for nonnative speakers to master their new language skills immediately. Further, it leads to the frequent omission of some specific categories of signs (words and morphemes) from native speakers' speech (because the use of these language units has a low communicative effect), i.e., to a narrowing of the commonly used core of language signs and constructions.

This stage of the adaptive process could be called "speech adaptation". Continuation of the situation for 2-3 generations eventually leads to the fixing of frequently used sign units, to the forgetting of those used by native speakers only, and subsequently to a reconstruction of relations, a redistribution of the overall set of functions among the preserved sign elements, and the beginning of adaptation of the preserved elements to their newly acquired bunches of functions (usually much more voluminous for each remaining sign, even in the absence of some of the most specific, idiosyncratic language meanings, which may be discarded completely by the new communicative practice as no longer affordable for the new "average speaker").

This stage of the adaptive process could be called "language adaptation". The adaptive process here is directed towards a more analytic shape of the language system than it possessed before.
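The equilibrium between the two economies, and its shift when the "price" of paradigmatic economy grows, can be illustrated with a toy model. The sketch below is only an illustration under stated assumptions: the paper gives no formula, so the linear memory cost, the inversely proportional message cost, and the names `total_cost`, `optimal_inventory`, `paradigm_price` and `syntagm_price` are all hypothetical.

```python
import math

# Toy model (an assumption for illustration, not a formula from the paper):
#   memory cost  = paradigm_price * inventory_size
#   message cost = syntagm_price * content / inventory_size
# A smaller sign inventory is cheaper to remember but forces longer,
# more combinatorial messages for the same content.

def total_cost(inventory_size, paradigm_price, syntagm_price, content=1.0):
    """Combined cost of storing the sign inventory and composing messages."""
    memory_cost = paradigm_price * inventory_size
    message_cost = syntagm_price * content / inventory_size
    return memory_cost + message_cost

def optimal_inventory(paradigm_price, syntagm_price, content=1.0):
    """Inventory size that minimizes total_cost (derivative set to zero)."""
    return math.sqrt(syntagm_price * content / paradigm_price)

# Before intensive language spread: memory is comparatively cheap.
before = optimal_inventory(paradigm_price=1.0, syntagm_price=100.0)
# After it: the price of paradigmatic economy has grown fourfold.
after = optimal_inventory(paradigm_price=4.0, syntagm_price=100.0)
print(before, after)  # prints: 10.0 5.0
```

Under these assumptions a higher price of paradigmatic economy moves the optimum to a smaller inventory, which corresponds to analytic drift. Raising `syntagm_price` instead moves it back towards a larger inventory.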

The process of analyticization is realized for language elements (morphemes, words, phrases and syntactic units) in some stochastically formed order. Two main selective criteria are present in the process.

First, affixes, grammar words and grammar expressions are difficult to use because of the specific grammatical idiomaticity of their meanings. The more idiomatic the meanings, the greater the difficulty nonnative speakers have in acquiring and operating them, and the smaller the chances of their being preserved in the system.

Second, rare words covering some specific (or peripheral) areas of the whole semantic space of a community are difficult for common use and acquisition. The rarer a word, the greater is the difficulty for an average speaker to obtain knowledge of its meanings in typical conditions of the community.

These two factors contradict each other in the process of building a new system at the stage of speech adaptation, but they coincide in the "direction" of their action at the stage of language adaptation. This means that the eventual elaboration of a narrower set of basic morphemic and lexical signs, which are on the whole semantically (and syntactically) more universal than signs of these classes were in the language's previous life, is in harmony with a tendency towards the active use of free combinations of morphemes and lexemes instead of usualized combinations for denoting specific (peripheral) sense areas. So the elaboration and use of a relatively more restricted lexical vocabulary, consisting more of root words than of derived ones (and, among derived words, more of compounds than of affixed words), is only natural for speakers of an analytic language.

4. Likewise, it is only natural in this situation for a language to elaborate a set of phraseological units on a greater scale than it had before. A greater stock of phraseological units is a kind of compensation in a language for a newly established narrower stock of lexical units.

5. Further, in parallel with presenting theoretical assumptions and predictions, I will try to present some empirical evidence for the statements made. The evidence can be of two main kinds: (1) features of different historical stages of a language developing towards a more analytic structure and (2) features of several coexisting (modern) languages which arrived at their different modern typological states as a result of different development in different historical conditions. In this paper I present the results of a comparison of some significant typological features of two clearly typologically opposed languages, English and Russian. Each of them obtained its typological status in the course of its history not by chance, but owing to significant differences in their communicative histories [Polikarpov, 1979].

6. Comparison of the lexical and phraseological systems of English and Russian shows the following.

Comparable dictionaries of these languages (comparable in terms of relative thematic homogeneity and completeness) were investigated: the "Dictionary of Modern Russian Language" in 17 vols - DMRL [1948-1965] and "The Random House Dictionary of the English Language" - RHDEL [1966]. The stock of Russian lexical units turned out to be 1.2 times more voluminous than the corresponding English stock (120,000 and 100,000 units). At the same time, the largest collections of Russian idioms of different kinds do not exceed 10,000 units, while the corresponding English collections contain up to 30,000 units of this kind.
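The proportions quoted above can be checked with a few lines of arithmetic. The sketch below uses only the rounded figures given in the text; the idiom counts are upper bounds ("do not exceed", "up to"), so the derived ratios are rough estimates, and the variable names are illustrative.

```python
# Rounded figures quoted in the text (idiom counts are upper bounds).
lexical_units = {"Russian": 120_000, "English": 100_000}
idioms = {"Russian": 10_000, "English": 30_000}

def ratio(counts, numerator, denominator):
    """How many times larger one language's stock is than the other's."""
    return counts[numerator] / counts[denominator]

# Russian has the larger lexical stock, English the larger idiom stock.
lexical_ratio = ratio(lexical_units, "Russian", "English")
idiom_ratio = ratio(idioms, "English", "Russian")

# Idioms per lexical unit: a crude index of phraseological compensation.
idioms_per_word = {
    lang: idioms[lang] / lexical_units[lang] for lang in lexical_units
}

print(round(lexical_ratio, 2), round(idiom_ratio, 2))
print({lang: round(value, 3) for lang, value in idioms_per_word.items()})
```

On these figures English has roughly three times as many idioms as Russian for a lexical stock about one-sixth smaller, which is the compensation between lexical and phraseological inventories that the model predicts.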

7. An investigation of parallel English and Russian texts (translations from Russian into English and vice versa, in equal proportions) was also undertaken. The English texts, while typically longer than the corresponding Russian ones, contain a noticeably narrower vocabulary of different words.

8. Having acquired a relatively more restricted lexical inventory during the adaptive process, an analytic language should elaborate lexical units that are, on average, more loaded functionally, i.e. more polysemous (in terms of the number of different free meanings). A comparison of data from the largest collections of Russian and English words, DMRL and Webster's Third [1963], is in good accordance with this expectation: the average number of meanings per word is greater for English (2.7) than for Russian (1.7).

9. The regular correlation between the lexical systems of languages clearly typologically opposed in the considered aspect can be demonstrated most vividly by comparing their overall lexical polysemy distributions [Polikarpov, 1987]. The polysemy distribution curves of Russian are, naturally, less loaded in the area of maximum polysemy than the English ones. Correspondingly, the Russian distribution is comparatively more heavily loaded in the area of low-polysemy words than the English one. This holds independently of the size (type) of the Russian and English dictionaries involved in the investigation ("large", "medium" or "small"), which demonstrates a stable structural property of the compared languages.

10. Having on the whole a relatively more restricted lexical inventory, an analytic language should also possess a more restricted inventory of lexical synonyms and antonyms. Investigation of the two most comprehensive synonymic dictionaries of Russian and English, which are close to each other in the principles of their compilation [Evgen'eva, 1975; Webster's, 1973], also provides evidence for this tendency. Suffice it to say that about 2,000 English synonymic groups comprise about 10,000 synonymic meanings, while the corresponding set of Russian synonymic units contains about 8,000 groups comprising on the whole more than 20,000 synonymic meanings. (For a more detailed analysis of the Russian and English lexical synonymic subsystems see [Kolodjazhnaja, Polikarpov, 1992; Pokrovskaja, Polikarpov, 1997; Pokrovskaja, 1997].)
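The synonym figures above also allow a rough estimate of average group size, which the text does not state explicitly. The sketch below derives it from the approximate counts quoted; since the Russian total is given as "more than 20,000", the Russian value is a lower bound, and the dictionary layout and function name are illustrative only.

```python
# Approximate counts quoted in the text for the two synonym dictionaries.
synonym_data = {
    "English": {"groups": 2_000, "meanings": 10_000},
    "Russian": {"groups": 8_000, "meanings": 20_000},  # "more than 20,000"
}

def mean_group_size(entry):
    """Average number of synonymic meanings per synonymic group."""
    return entry["meanings"] / entry["groups"]

sizes = {lang: mean_group_size(entry) for lang, entry in synonym_data.items()}
group_ratio = synonym_data["Russian"]["groups"] / synonym_data["English"]["groups"]
meaning_ratio = synonym_data["Russian"]["meanings"] / synonym_data["English"]["meanings"]

print(sizes)                       # prints: {'English': 5.0, 'Russian': 2.5}
print(group_ratio, meaning_ratio)  # prints: 4.0 2.0
```

On these figures the Russian inventory is larger both in groups (about four times) and in synonymic meanings (at least twice), while individual English groups are larger on average.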

11. Having, on average, relatively more polysemous and therefore referentially wider words, a relatively more analytic language should possess relatively more frequent words than a more synthetic language. This prediction is easily confirmed by data from frequency dictionaries compiled for Russian and English using parallel texts as well as texts of equal length and of the same general content. Depending on the size of the sample, Russian lexical units are used in texts 1.25 - 2.20 times less frequently than the corresponding English units.

12. The relatively more frequent words of a relatively more analytic language should be proportionally shorter than the words of a relatively more synthetic language. Again, this is easily exemplified by comparing Russian with English. Russian words are, on average, 1.4 times longer than English ones. Taking into account the greater phonetic reduction in the speech realisation of Russian phonemes, one can estimate that an average running Russian word is about 1.25 times longer than an average English word in comparable texts.

It is also important to note that the degree of relative shortening of words in a more analytic language should be approximately the same as the degree to which its sentences and texts on the whole are longer (in the number of running words expressing a given content) than those of a more synthetic language. This is the case here, because English texts are indeed, on average, 1.20 - 1.25 times longer than the corresponding Russian ones (in number of running words). Here we have a clear case of compensatory correlation between language parameters (governed by communicative criteria of necessary and sufficient effort for reaching a communicative goal), which provides a language of any degree of analyticity/syntheticity with an approximately equal ability for effective (economical and purpose-reaching) functioning.
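The compensation described above can be made explicit by multiplying the two ratios. The sketch below assumes that the total length of a message is simply the number of running words times the average word length; this simplification and the function name `relative_effort` are not taken from the paper.

```python
def relative_effort(word_length_ratio, text_length_ratio):
    """Russian-to-English ratio of total message length for the same content.

    word_length_ratio: average Russian word length / average English word length
    text_length_ratio: English running words / Russian running words
    Assumes total length = number of running words * average word length.
    """
    return word_length_ratio / text_length_ratio

# Figures quoted in the text: words 1.25 times longer in Russian,
# texts 1.20 - 1.25 times longer (in running words) in English.
for text_ratio in (1.20, 1.25):
    print(text_ratio, round(relative_effort(1.25, text_ratio), 3))
```

With the quoted figures the ratio stays between about 1.00 and 1.04, so under this simplification the total length of equivalent Russian and English messages is nearly the same.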

13. A dictionary with shorter words, as in relatively more analytic languages, presupposes greater chances for words to coincide in their "plan of expression" (pronunciation or spelling) and thus to become homonyms (phonetic or graphic). Again, the history of languages developing intensely in the direction of greater analyticity is richer in cases of this kind than the history of more synthetic languages. A comparison of the modern Russian and modern English lexical systems in this respect shows that this is true. There are no more than about 500 homonymic groups in Russian but at least 2,000 in English (according to data excerpted from our own original "Dictionary of Russian Homonyms", in preparation, and from L.V. Malakhovski's "Dictionary of English Homonyms", Moscow, 1996).

14. Being more universal (and therefore, on average, used more intensely) than words in synthetic languages, the lexical units of an analytic language on average develop (and eventually wear out and die out) faster. So the change of lexical generations within the lexicon of more analytic languages should proceed, and really does proceed, much more noticeably than within the lexicon of more synthetic languages. Only this theoretical finding [Polikarpov, 1993] can provide a real basis for explaining the so far unexplained (only noticed) fact that the core vocabularies of different languages decay at different rates. Correlating differences of this kind with differences between related languages in their analytic/synthetic typology clarifies the problem. For example, one may compare the typological shapes of modern Norwegian and modern Icelandic, English and German, Russian and English, etc., and relate the results to empirical data on the rates at which their core vocabularies survive over time. In all these cases the lexicon of the more synthetic language is better preserved over time. (On some other aspects of explaining the regularities of language units' survival in time see [Polikarpov, 1993; 1994; 1995; Kapitan, 1994].)

15. The disappearance of the situation of intensive language spread (as a result of growing community homogeneity, i.e., the levelling of individual language knowledge through the mutual - "plus" and "minus" - teaching of elements and patterns present in communication) inescapably gives rise to the reverse process: syntheticization of the language system, i.e. its global grammaticalization. Local grammaticalization (which takes place in a language at any time) is strengthened in this period, as it becomes necessary, and is converted into a global tendency. This is naturally determined by the reverse change in the "price" of each of the two kinds of "economy", in favour of the greater significance of "syntagmatic economy" under the new conditions.

A newly acquired equilibrium of "prices", in which the overall cost obtained by summing the two types of economy is the parameter being optimized, limits the continuation of changes in either direction.

16. This approach to the problem of the rise of more or less analytic/synthetic language systems is also supported by evidence from the history and modern state of Chinese, Romance, other Germanic, Turkic, Mongolian, Finno-Ugrian, Semitic and other languages [Polikarpov, 1997; Breiter, Polikarpov, 1997; Kapitan, 1994]. See also some additional data in the abstract of my paper "On Some Natural Regularities in Word-Formational Process of Indo-European Languages" submitted for ICHL97.


BREITER M.A. [1994]. Length of Chinese Words in Relation to their other Parameters // Journal of Quantitative Linguistics. V.1, N3, 1994.

BREITER M.A., POLIKARPOV A.A. [1997]. Polysemy and Frequency of a Word in Chinese: Experimental Study of System Dependences // IV International Conference on Languages of the Far East, South-East Asia, and Western Africa, September 17-20 1997, Institute of Asian and African Countries, Moscow Lomonosov State University. M., 1997.

KAPITAN M.E. [1994]. Influence of Various System Features of Romance Words on their Survival // Journal of Quantitative Linguistics. V.1, N3, 1994.

MELNIKOV G.P. [1978]. Systemology and Cybernetic Problems in Linguistics (in Russian). M.: Sovetskoe Radio, 1978.

MELNIKOV G.P. [1988]. Systemology and Cybernetic Problems in Linguistics. L. - Sydney: Gordon and Breach, 1988.

POKROVSKAJA E. [1997]. Investigation of the DB of Russian Lexical Synonyms // Proc. of the 3d Intnl Conf. on Quant. Linguistics. Aug. 26-29 1997, Helsinki, Finland. - Helsinki, 1997.

POLIKARPOV A.A. [1979]. Elements of the Theoretical Sociolinguistics. - Moscow: Moscow Univ. Press, 1979.

POLIKARPOV A.A. [1987]. Polysemy: Quantitative-Systemic Aspects // Quantitative Linguistics and Automatic Texts Analysis. Tartu, 1987.

POLIKARPOV A.A. [1988]. K Teorii Zhiznennogo Tsikla Leksicheskikh Yedinits (Towards the Theory of Life Cycle of Lexical Units) // Applied Linguistics and Automatic Text Analysis. Papers from the All-Union Conference held 28.01-30.01.1988 at Tartu University. - Tartu: Tartu University Press, 1988.

POLIKARPOV A.A. [1993]. A Model of the Word Life Cycle // Contributions to Quantitative Linguistics / Ed. by R. Koehler, B.B. Rieger. - Dordrecht: Kluwer, 1993.

POLIKARPOV A.A. [1994]. Zakonomernosti zhiznennogo tsikla slova i evolutsija jazyka. Statja 1. Modelirovanije osnovnykh sistemnykh sootnoshenij (The Regularities of Word Life Cycle and Language Evolution. Article 1. The Modelling of the Main System Correlations) // Russkij Filologicheskij Vestnik (Russian Philological Bulletin), N 1, 1994. - Moscow, 1994.

POLIKARPOV A.A. [1995]. Zakonomernosti zhiznennogo tsikla slova i evolutsija jazyka. Statja 2. Teorija i eksperiment (The Regularities of Word Life Cycle and Language Evolution. Article 2. Theory and Experiment) // Russkij Filologicheskij Vestnik (Russian Philological Bulletin), N 1, 1995. - Moscow, 1995.

POLIKARPOV A.A. [1997]. Lexical Subsystem of Natural Language System: Theoretical and Experimental Aspects of its Coming-to-Be and Evolutionary Study (in Russian, a manuscript). - Moscow, 1997.

POLIKARPOV A.A., KURLOV V.Ya. [1994]. Stylistics, Semantics, Grammar: Experience of System Correlations Analysis (On the Basis of Data from the Explanatory Dictionary) // Voprosy Jazykoznanija (Journal "Linguistic Problems"). N 1, 1994 (in Russian).