last substantial update 22 June 2009; last change 23 May 2014
The aims of the project I describe on this page are to produce a mapping from the phonology of American Sign Language to a spoken phonology, which
I've followed another principle in the main, that
Principle 4 eliminates the possibility of selectively choosing to mark or ignore certain features when transcoding an ASL word, according to whether they're necessary to retain contrasts with another word. But one might want to allow this selectivity; only requiring spoken coding of what's necessary to reconstruct a word would yield a leaner and in all expectations more natural spoken language, albeit one with less fidelity. Some thoughts in this direction constitute my last section.
This project has much in common with producing a transcription system for ASL. In transcription systems, there is plenty of prior art: a few examples for ASL are Stokoe notation and Thomas Stone's ASL Sign Jotting, and within the larger bailiwick of signed languages in general, HamNoSys, SignWriting, and David Peterson's Sign Language IPA. Now, a system meeting my goals, when written out in IPA, would yield a sort of transcription scheme for ASL, but a poor one. The choices of symbols would seem unmotivated, phonetics being a poor motivation by any external standard; and I have made many concessions to obtaining a half-sensible spoken phonology which would be out of place in a devoted transcription system. Ultimately I do want something speakable.
By "speakable" I don't necessarily mean 'comfortable for English speakers' — even as English speakers would be the most likely adopters of any language based on ASL; that's too practical a concern for me ;-). I'm simply aiming for a language within the natlang ambit of possibility. (That said, if you really wanted something convenient for English speakers it shouldn't be too hard to make a couple of phoneme substitutions to get closer to the mark: perhaps throw in some epenthetic vowels, chuck the uvulars and other awkward consonants, maybe bring in voicing to replace them... but that I leave to you.)
I don't have any command of ASL myself. I'm very grateful to Sai (or should I call him /katʃ/?) for answering my ASL questions and informing me of his speaker's intuition on several points. Thanks also to David J Peterson for feedback on an earlier incarnation of this project, and those who commented on the CONLANG list.
Send your comments, questions, etc. to me, Alex Fink. I use gmail with the username 000024.
David Perlmutter, in Sonority and syllable structure in American Sign Language, Linguistic Inquiry 23 no. 3 (1992), 407–442, observed several structural and distributional constraints on ASL signs as composed of places and motions and handshapes. The SLIPA page has a summary of these constraints. In light of them, Perlmutter drew the analogy that places and motions and handshapes are to signed languages what consonants and vowels and tones are to spoken languages, respectively. These equivalences would thus seem to be the natural equivalences to impose in my scheme as well. I've deviated from them, though, for the following reasons.
Non-manual elements aren't treated in Perlmutter's analysis, but I'll need to handle them. Now they seem to fit the bill for suprasegmentality even better than handshapes do, given that they're not even performed with the same apparatus as the unarguable segments, and (in ASL) they bind to groups of words rather than of phones. The marking of phenomena like topics and WH-questions, which ASL achieves with the eyebrows, is in many spoken languages the stronghold of prosody. So I've decided to allocate to non-manual markers prosody, for the information structure-type markings, and suprasegmentals including tone for the miscellaneous ones. (Not word stress, though; I use this for special purposes in two-handed signs.)
There's little reason to map ASL suprasegmentals, such as duration, to other than spoken suprasegmentals. I'd like it best if, indeed, duration actually mapped to duration, and so forth.
ASL has really quite a lot of handshapes — 18 by Stokoe's relatively lumping analysis, and there are 44 in my handshape table most of which are contrastive — which is an absurd number of tones for a natural language. And I figure that I shouldn't try to stretch tone beyond non-manual phenomena. So I've got to co-opt something else, and vowels seem better than consonants, given that vowels are more likely to behave suprasegmentally than consonants (witness vowel harmony).
If the two endpoints of a motion are given, it seems that in practice there's rather little entropy left in it. Thus if the endpoints of an unmarked motion are nailed down in our spoken rendering by other means, it seems quite adequate to represent the motion by nothing. Indeed, to do otherwise would induce an unnaturalistic dependence in the spoken form between the segments representing the motion and its endpoints. I've chosen to throw arcedness vs. straightness of a motion in this group of mostly redundant features as well, with the idea that in canonical cases the hand orientations at motion start and end determine it.
The endpoints of a motion need not be places in Perlmutter's sense, though; they may be more or less precise points at any old place in the signing space. In these cases we will have to encode the direction and/or length of the motion somehow. And there may well be cases where, even if the endpoints happen to be places, it is a better analysis to specify one relative to the other than both independently. For uniformity with the case where endpoints are at real places (which are consonants: see the next bullet), we'll use consonants to encode these motions, which can be thought of as relative places. And this will turn out to be the most common case.
(Sai tells me this analysis of motions and endpoints is not a new one in sign language studies, and that it has a name, but he doesn't know it and I haven't found it.)
Of course there can be contrastive modifications to motions: one might move the hand along a tighter or a wider arc, unadorned or with some coarticulation like tumbling, etc. These will be marked with an extra set of segments, consonants as well.
Places being consonants is fine.
I order these elements so that a typical (Place-)Motion(-Place) word has a maximal structure
[place or motion endpoint 1] [orientation 1] [handshape 1] [motion modifiers] [place or motion endpoint 2] [orientation 2] [handshape 2].
The handshape slots are the vowels; everything else is consonantal. I allow many of the consonantal components to render one of their highly unmarked values (or their absence) as zero. But in order to allow places unchanged from the previous place to be rendered as zero, for parsimony in situations where only handshape or orientation or such have changed, I avoid rendering any single place as zero (though I will still exploit /ʔ/).
For vowels having a zero option also seems awkward, and so I won't. Fortunately every sign has at least one handshape, so that every word ends up with at least one vowel. If there's no change of handshape, the word may have just the one vowel. I allow the vowel corresponding to a handshape to be repeated in subsequent handshape slots if it's unchanged, or not. Sometimes repetition may be necessary, for instance if there is a long succession of places and motions with the same handshape; consonant clusters cannot be allowed to form since these have special meanings.
In Perlmutter's scheme a word that consists of only a Place, with no motions, is like a consonant-only word; he justifies this with the observation that such necessarily have some secondary articulation, such as tapping. In my scheme such words will still contain a vowelled syllable:
[place] [orientation] [handshape] [secondary articulation].
Tapping seems pretty unmarked to me, and I represent it as zero. Other secondary articulations, like wiggling, will be rendered as consonants.
I haven't actually given non-manual elements any more thought yet than is recorded above.
It would give me somewhere to start if I knew enough prosodic phonology to say which particular prosodies are common in natlangs to mark topics, WH-questions, etc. But I don't.
Among the handshapes of ASL, seven are sufficiently unmarked that they can occur as the handshape of the resting non-dominant hand, namely 5, B, 1, A, S, O, and C (Baker-Shenk & Cokely, American sign language: a teacher's resource text on grammar and culture, p. 82). On the other end of the markedness scale, my impression is that D, M, R, T occur only as letters, i.e. in (possibly nativised) fingerspellings and initialised signs, 7 only in numeric signs involving that digit, and W=6 only in one or the other, so they're among the most marked handshapes.
I've analysed handshapes as having the following dimensions of variation.
I regard a finger as raised if it's in any position other than folded into the palm. Curved is as in regular O, bent is as in flat-O, clawed is as in X.
In most handshapes there's only one finger disposition, but D, the large digits including open-8, and F and K can be thought of as having two. Among these, I'll reinterpret F differently below. The 'extended but middle finger bent' option is there to handle K and open-8. I still haven't done anything about D 6 7.
Most of these sets of fingers can be seen as a contiguous block of fingers from the index out, plus a possible extra pinky. o000 is parenthesised since it's peculiar to the marked handshape M.
Crossing is peculiar to R.
It seems not indefensible to me that F might be analysed as having curved spread fingers, which are realised with less and less curvature as you go away from the thumb. Sai thinks this is bad phonemics, though.
Observe that thumb out in front or forming a loop only happens if the fingers are bent or curved. The thumb tucked behind the index is peculiar to T.
How to account for the difference between S and A? It seems that S is the less marked of the two, and so probably should be the unextended one, but it's not clear what that makes A, so I've just given it its own category above. In the end we'll evade the question.
So the task is to map these features onto vowels.
The set of fingers raised seems highly salient and, if we leave out the state of the pinky, is a scale with four or five positions. The best feature in vowel space with this many positions is height. I pair up fingersets and heights as follows: low for ?ooo, mid-low for ?oo0, mid-high for oo00, high for 0000 (well, ?000, to include M). I've chosen this order over the reverse since several of the other handshape features are moot if no fingers are raised, and most languages have fewer low vowels than non-low.
The next most salient handshape feature is perhaps finger disposition. I limit myself to a single feature collapsing frontness and rounding, in order not to end up with too unnaturally stuffed a vowel inventory. The distinction between curved and bent fingers seems to bear a low contrastive load, so for now I collapse the two of them (perhaps we can mark the distinction with an extra consonant or some such). They'll both be back rounded, as against straight fingers which are front unrounded. Extended but middle finger bent is in between these, so it's mid (un)rounded. This leaves clawed, which there isn't much room for here anymore; we'll come back to it.
For thumb disposition, leaving aside the rare tucked behind the index, we've got two positions when the fingers are straight (unextended, to the side), and four when at least one is bent or curved (unextended, to the side, out in front, forming a loop). So it would be natural to use two binary features here, one of which is only good on back vowels. For the former, why not nasalisation. For the latter an offglide /j/ seems to make sense, since there are often constraints against e.g. sequences of a /j/ and an /i/. (I've made it an offglide so that it doesn't get into messy pileups with the orientation consonant, though soon I'll use an onglide too.)
To assign these we rely on markedness. When the fingers are straight, that means nonnasal is unextended thumb and nasal is thumb to the side. When the fingers are bent, in view of the fact that C and O are the least marked of these handshapes, I reserve the sequences without the yod for them. I think G wins over baby-O for unmarkedness, so nonnasal with no glide is thumb to the front, and nasal with no glide is forming a loop. And nonnasal + /j/ is unextended thumb, and nasal + /j/ is thumb to the side.
We kluge A in by pretending it has bent fingers, so it takes a /j/ to be distinct from S.
Having used a /j/ offglide, I might as well use a /w/ offglide too. This works for clawed fingers.
Two features are left, the extra pinky and finger spreading. Leaving aside crossing fingers, both are binary. And they occur in complementary distribution — the extra pinky only if at most the index finger is raised, spreading only if at least two fingers are — so they may as well be the same feature. I'll pick an onglide /j/ (over a /w/ because I'm using labial approximants elsewhere; this does have the unfortunate effect of creating lots of /ji/).
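The feature-to-vowel assignments of the last few paragraphs can be sketched as a small function. This is only an illustration: the function name, parameter names and value labels are all made up here, the output is a bundle of feature labels rather than IPA symbols, and only straight and curved/bent fingers are covered (clawed fingers, the middle-bent disposition, crossing and the thumb of T are left out).

```python
# Height follows the number of raised fingers (pinky state ignored).
HEIGHT = {0: "low", 1: "mid-low", 2: "mid-high", 3: "high", 4: "high"}

# Thumb disposition -> (nasal, offglide), depending on finger disposition.
THUMB_STRAIGHT = {"unextended": (False, ""), "side": (True, "")}
THUMB_BENT = {
    "front": (False, ""),       # thumb out in front: plain vowel
    "loop": (True, ""),         # thumb forming a loop: nasal vowel
    "unextended": (False, "j"),
    "side": (True, "j"),
}


def vowel_features(fingers, disposition="straight", thumb="unextended",
                   spread=False, extra_pinky=False):
    """Return the vowel features for one handshape (hypothetical helper)."""
    if disposition == "straight":
        backness = "front unrounded"
        nasal, offglide = THUMB_STRAIGHT[thumb]
    elif disposition in ("curved", "bent"):
        # Curved and bent are collapsed, as in the text.
        backness = "back rounded"
        nasal, offglide = THUMB_BENT[thumb]
    else:
        raise ValueError(f"disposition not covered by this sketch: {disposition}")

    # Extra pinky (at most the index raised) and spreading (two or more
    # fingers raised) share a single feature, the /j/ onglide.
    has_onglide = (extra_pinky and fingers <= 1) or (spread and fingers >= 2)

    return {
        "height": HEIGHT[fingers],
        "backness": backness,
        "nasal": nasal,
        "onglide": "j" if has_onglide else "",
        "offglide": offglide,
    }
```

For instance, `vowel_features(4, thumb="side", spread=True)` comes out high, front, nasal, with a /j/ onglide, which agrees with a vowel like /jĩ/ if the handshape 5 is taken to be four straight spread fingers with the thumb to the side.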
I summarise the assignments below. Each entry in the table gives a handshape, its description in the featural analysis above, and its assigned vowel. Many of the gaps are systematically fillable with other handshapes; perhaps even a few of them should be included that I'm unaware of.
Legend for the featural analyses:
|character 1||character 2||character 3||character 4||character 5||general|
|fingers raised||extra pinky?||finger disposition||finger spreading||thumb disposition|| |
|0: none||-: no||-: straight||-: unspread||-: unextended||.: feature is irrelevant|
|1: index||i: yes||c: curved||+: spread||a: as in A||?: I don't know|
|2: index + middle|| ||h: bent||r: crossed||+: to the side|| |
|3: i + m + ring|| ||x: clawed|| ||c: to the front|| |
|4: all four|| ||k: straight, middle bent|| ||o: loop|| |
| || || || ||t: behind index|| |
So we've got nine vowels. And for that, most any phonologist seeing the sounds in the table above would collapse at least /ɔ o/ and /ɘ ɨ/ as allophones, though the first at least are technically separated by the handshapes in NO. As markedness goes, it's unfortunate that the unmarked 5 /jĩ/ has turned out worse than the rarer thumby-B /ĩ/ and 4 /ji/.
Orientations appear in my scheme as the last elements of onset clusters. So I'll code them as the sort of segments that are most at home in such positions, approximants, especially liquids.
My take on orientation markedness is that the zero-marked value is the most natural way for the hand to be situated given that it's touching a given point in space (which might be a neutral space pseudo-point). In ASL phonotactics there are also constraints on modes of contact each handshape allows, and it may be that a better coding system would consider these; but I don't know them.
Regardless of the disposition of the fingers, I take the orientation frame of the hand to be defined relative to the base of the palm, and name orientations in terms of the sides of an open palm, since I don't want orientation codings or names to change when you only move the fingers around. For instance, touching a point with the tips of the fingers of a flat-O is a palm-side contact, since that's what it would be if you opened the fingers flat without moving anything else.
The fact that spheres aren't flat (and not only that, they don't even support a nonvanishing continuous tangent vector field) got in the way of my first couple attempts to make a nice and orthogonal coding for orientation, with for instance elbow behaviour and wrist behaviour coded separately. To illustrate the problems, hold your hand comfortably in neutral space in front of you, so that your forearm is pointing upwards. Now sweep your arm at the elbow 90° forward away from you, then 90° to the inside parallel to the ground, then 90° back up. This puts your forearm back where it was, but you'll notice that your hand has rotated at the wrist 90°. Now suppose one tried to code orientations in such a way that side-to-side twisting at the wrist was separated out from other features. The angle of this twisting changed over the course of these three arm motions, so one of them individually must have carried a twist; but none of the three individual motions seems like a compelling place to posit one.
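The three-sweep experiment can be checked numerically. The sketch below assumes a right arm, axes x = signer's right, y = forward, z = up, a palm that starts out facing forward, and that "to the inside" means towards the signer's left; none of these conventions come from the text. Each sweep is an exact 90° rotation, so integer matrices suffice.

```python
def matmul(a, b):
    """Product of two 3x3 matrices given as nested lists."""
    return [[sum(a[i][k] * b[k][j] for k in range(3)) for j in range(3)]
            for i in range(3)]


def apply(m, v):
    """Apply a 3x3 matrix to a 3-vector."""
    return tuple(sum(m[i][k] * v[k] for k in range(3)) for i in range(3))


# Axes (an assumption of this sketch): x = right, y = forward, z = up.
SWEEP_FORWARD = [[1, 0, 0], [0, 0, 1], [0, -1, 0]]   # about x: up -> forward
SWEEP_INWARD = [[0, -1, 0], [1, 0, 0], [0, 0, 1]]    # about z: forward -> left
SWEEP_BACK_UP = [[0, 0, 1], [0, 1, 0], [-1, 0, 0]]   # about y: left -> up

# The sweeps are performed in the order forward, inward, back up.
TOTAL = matmul(SWEEP_BACK_UP, matmul(SWEEP_INWARD, SWEEP_FORWARD))

forearm = (0, 0, 1)   # pointing up
palm = (0, 1, 0)      # palm normal facing forward

print(apply(TOTAL, forearm))  # (0, 0, 1): the forearm is back where it started
print(apply(TOTAL, palm))     # (-1, 0, 0): the palm now faces left
```

The composite equals a single 90° rotation about the vertical axis, that is, about the forearm itself, even though none of the three sweeps individually twists about the forearm.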
In lieu of such a scheme, here are the orientation segments I'm using. Orientation is broken down into which side of the dominant hand is touching the point (or pseudopoint) of contact and how the rest of the hand is rotated. I sequence the segments in that order. For the point of contact:
We start our consideration of places with the most genuine ones, those which actually are locations on the body. For these I use obstruents, which feel like the most consonanty of consonants.
My basic place assignments are derived from a relatively coarse identification of the ASL-relevant points of the body, close to Stokoe's one: even that is a fair number of obstruents. Among points not on the arm, there is a feature of centrality: they are either sagittally central or off to the side. We'll mark this by fricativity, and co-opt the same distinction to mark different sides of the arm for points there. Sai suggests noncentral points are more common, so they get to be the less marked stops, while central points are fricatives. As for height of the points, I assign POAs from the back to the front of the mouth in top to bottom order, which has the nice effect that the arm gets the coronals. There's a break between the labials and the coronals where I jump back to the chest, but I don't mind, since that gap strikes me as impressionistically one of the larger inter-column gaps on the IPA chart anyways. (It looks like I'm not going to end up using voicing contrastively, given that I haven't used it here.)
This gives the following assignments.
|chin, below lips||/x/||cheek||/k/|
|(underside of wrist||/ʂ/)||(back of wrist||/tʂ/)|
|palm||/s/||back of hand||/ts/|
|fingers||/θ/||thumb side of hand||/t/|
|center of chest||/f/||side of chest||/p/|
The elbow and shoulder aren't a very close pair, but I suspect that more refined distinction of basic points near either of these is unnecessary. I'm considering omitting the wrist points as basic too.
Places not among the basic inventory, and places more precise than the basic inventory provides when that's necessary, can be specified as modifications of basic places. These places default to the dominant side of the body, except those on the arm which default to the nondominant arm, since that is easier touched by the dominant arm. To specify points on the other side also requires a modification.
I have a couple of different categories of renderings of points not touching the body. When these points can be easily and correctly specified relatively as endpoints of a motion, that is likely preferable to what I describe below. I'm not entirely happy with the duplication of strategies here, but it seems to be a good thing to do as regards markedness.
Positions in neutral space are the most unmarked positions of all, so I've given them minimally obtrusive consonants, namely glottals, which can be regarded as just phonation. I distinguish a few such positions according to the disposition of the forearm; I'm not actually sure all of them are necessary.
For the forearm pointing up I take the unmarked orientation to be palm outward, so that e.g. fingerspelling is performed with /ʔ/ onset and mostly in the unmarked orientation. For pointing forward I take it to be palm down. For the side position I haven't decided.
Again, there are classes of modifications to specify more refined points not touching the body, including some which lift points on the body out to nearby points.
There is also a special series of points for pronominal locations. These are construed as clusters with an initial nasal, which I chose with an eye to the other value of nasals. The nasal is allowed to assimilate in place to the following consonant (though I may write it overprecisely), and to take a syllabic realisation. The second element of the cluster is the representation of a (possibly modified) place on the body close to the relevant pronoun. I haven't yet defined these associations precisely, but, for instance, /nt/ is a sensible name for a default pronominal location off to one side at neutral height, and /ɴχ/ for a pronoun sagittally central and above the head. For the location of one's interlocutor I use /mf/.
Most of the place modifications serve to specify a point slightly displaced from its canonical location. A modification of a point coded C is coded as a cluster MC, the initial segment M giving the details. Modifications can stack. Values of M are given in the following list.
Here and subsequently, I use these four logical directions for motion along the body surface even where physically they aren't really a good fit: thus e.g. even if my palm is sitting face up in neutral space, moving towards the fingers is moving downwards.
The association of stops to directions is chosen to have them roughly line up with their association to body parts; this association will recur. There is no particularly good reason for /f/ or /x/. For completeness I point out that /s/ and /ʃ/ can also occur in this cluster-initial position, but have entirely different meanings: they denote signs with both hands active.
Some motion endpoints are represented by nasals, the last common manner of articulation I've got left. However, to keep proliferation of consonants to a minimum, obstruents are reused for motion endpoints as well. Nasals serve for motion endpoints not touching the body, obstruents for those that do touch. The rule for disambiguating this use of obstruents from their use as genuine places is the following:
The first obstruent (or cluster thereof) in a word is always a genuine place, as is any obstruent (cluster) if the previous cluster ends in (or is) a nasal or a glottal. Other obstruents (or clusters thereof) are motion endpoints, unless they're preceded in a cluster by /ʔ/ in which case they're genuine places.
Markedness justifies making the motion endpoint senses simpler, since words with motions best specified relatively in terms of them outnumber words with motions between any two old points.
In particular, in a word whose first two consonants (or clusters) are a nasal and an obstruent, the obstruent is a genuine place. It's okay to rule out the sequence of an endpoint off the body followed by a relative point on the body, since endpoints on the body are always specified with respect to other points on the body (i.e. I don't have 'inward to contact').
Motion endpoints are usually interpreted relative to the nearest genuine place to their left, but if there are none such they're interpreted relative to the nearest genuine place to their right. The latter only ever happens in the inward-moving case of the last paragraph. The default orientation for a motion endpoint is the actual orientation (not the default one!) of the genuine place it's interpreted with respect to.
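The disambiguation rule and the left-then-right anchoring can be sketched as follows. The segment inventories are guesses (the page gives no complete list), the function names are hypothetical, and the input is assumed to be the word's consonant clusters in order with vowels and orientation approximants already stripped. Pure glottal clusters are treated here as places, which is also an assumption, and pronominal clusters and the two-handed prefixes /s/ and /ʃ/ are ignored.

```python
# Hypothetical segment classes; not a complete inventory from the page.
NASALS = {"m", "n", "ɲ", "ŋ", "ɴ"}
GLOTTALS = {"ʔ", "h"}
OBSTRUENTS = {"p", "f", "t", "θ", "ts", "s", "tʂ", "ʂ", "k", "x", "q", "χ"}


def classify_clusters(clusters):
    """Label each non-empty cluster (a tuple of segments) 'place' or 'endpoint'."""
    labels = []
    seen_obstruent = False
    previous = None
    for cluster in clusters:
        first_obs = next(
            (i for i, seg in enumerate(cluster) if seg in OBSTRUENTS), None)
        if first_obs is None:
            # Nasal clusters are off-body motion endpoints; anything else
            # without an obstruent (e.g. a glottal) counts as a place here.
            label = "endpoint" if all(s in NASALS for s in cluster) else "place"
        elif not seen_obstruent:
            label = "place"       # first obstruent (cluster) in the word
        elif previous[-1] in NASALS | GLOTTALS:
            label = "place"       # previous cluster ends in a nasal or glottal
        elif "ʔ" in cluster[:first_obs]:
            label = "place"       # obstruent preceded in its cluster by /ʔ/
        else:
            label = "endpoint"
        if first_obs is not None:
            seen_obstruent = True
        labels.append(label)
        previous = cluster
    return labels


def resolve_anchors(labels):
    """Map each endpoint index to the index of the place it is relative to."""
    places = [i for i, lab in enumerate(labels) if lab == "place"]
    anchors = {}
    for i, lab in enumerate(labels):
        if lab != "endpoint":
            continue
        left = [p for p in places if p < i]
        right = [p for p in places if p > i]
        # Nearest place to the left, else nearest to the right.
        anchors[i] = left[-1] if left else (right[0] if right else None)
    return anchors
```

With these guesses, `classify_clusters([("k",), ("t",)])` labels /k/ a place and /t/ a motion endpoint, while in `[("n",), ("p",)]` the nasal is an endpoint anchored to the following place /p/, the inward-moving case.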
The place of articulation assignments of these endpoints use the same directional scheme as the place modifications. For nasals we have
There are two basic kinds of two-handed sign, if we exclude iconic signs in which each hand can assume a classifier handshape and act freely: these are those where the nondominant hand serves as a static base, and those where it copies the dominant hand in disposition and in motion. The motion can be copied either as a reflection or in parallel or lapping by half a revolution of a circle.
In signs with the non-dominant hand static, this hand acts merely as another set of available Places: the relevant Place consonants are /ʂ/, /tʂ/, /s/, /ts/, /θ/, and /t/. For these signs the only data that must be specified for the static hand are its handshape and orientation. The general rule is that this information is coded as a special first liquid + vowel syllable, with the same coding as for the dominant hand, and the word stress falls in this case on the second syllable; this distinguishes these from one-handed signs where stress always falls on the first syllable.
Orientations are taken with respect to the default that the contacted place faces up. For handshapes I also have defaults which are assumed if there's no initial unstressed vowel. The default handshape is B (/i/) if the contact point is the front or back of the palm or the fingers, and S (/a/) if the contact point is the thumbside or the wrist. (I'm not sure if this is a good choice for the wrist.) If the vowel is omitted, the initial approximant may be taken syllabic.
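As a rough sketch, the default handshape and the stress placement for static-base signs amount to two small lookups. The function names are hypothetical, and the grouping of contact consonants is read off the place table above.

```python
# Contact places on the static non-dominant hand, by their consonants.
PALM_OR_FINGERS = {"s", "ts", "θ"}       # palm, back of hand, fingers
THUMBSIDE_OR_WRIST = {"t", "ʂ", "tʂ"}    # thumb side, underside/back of wrist


def default_base_vowel(contact):
    """Vowel assumed for the static hand when no initial vowel is given."""
    if contact in PALM_OR_FINGERS:
        return "i"   # handshape B
    if contact in THUMBSIDE_OR_WRIST:
        return "a"   # handshape S
    raise ValueError(f"not a place on the non-dominant hand: {contact}")


def stressed_syllable(has_base_syllable):
    """0-based index of the stressed syllable.

    Signs with an explicit static-hand syllable stress the second
    syllable; one-handed signs stress the first.
    """
    return 1 if has_base_syllable else 0
```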
Signs where the non-dominant hand copies the dominant hand are indicated by two special types of cluster, those with initial /s/ and /ʃ/, prefixed to the first consonant in the description of the dominant hand. Generally /s/ means that the non-dominant hand is to follow the dominant hand as a sagittal reflection, the more common behaviour, and /ʃ/ that it is to follow it in parallel. However, if there is no side-to-side motion, these two behaviours would be collapsed, so in this case I interpret the prefixes differently: /ʃ/ means that the non-dominant hand acts oppositely to the dominant in some direction, for instance staying opposite it in a circle, leaving /s/ for cases where the motion is a true reflection. (As this shows, I regard reflection as the more natural relation than parallelling.)
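The two readings of the prefixes can be put in one small decision function. This is a sketch with made-up names; the relation labels are just strings chosen for illustration.

```python
def nondominant_relation(prefix, side_to_side_motion):
    """How the non-dominant hand follows the dominant one.

    prefix: "s" or "ʃ", the cluster-initial two-handed marker.
    side_to_side_motion: whether the sign has any side-to-side motion.
    """
    if prefix not in ("s", "ʃ"):
        raise ValueError(f"not a two-handed prefix: {prefix}")
    if prefix == "s":
        return "reflection"
    # Without side-to-side motion, reflection and parallel would coincide,
    # so /ʃ/ is reinterpreted as acting oppositely to the dominant hand.
    return "parallel" if side_to_side_motion else "opposite"
```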
What about those signs where both hands move independently? It would be a plausible generalisation of the resting nondominant hand machinery to code the nondominant and dominant hands' motions in succession, using the primary stress to show where one switches hands.
I haven't given a lot of attention to any significant phoneme classes in ASL not treated so far. For instance, I don't even know how many secondary articulations it's necessary to distinguish. One of them, I suppose, is tapping, which I analyse as zero. Another is wiggling, i.e. continuous small changes of orientation. Since the lateral liquid is my most frequent orientation consonant I'm of a mind to make wiggling the also lateral /ɬ/. This is slightly unfortunate, being a fricative, but I don't really have any plausible unused manners of articulation at this point. As for ASL's suprasegmentals, for motions performed specially slowly or quickly I mean to use the iconic representations, namely lengthened and shortened vowels respectively. (I should look into a list of aspects.)
When an ASL sign has reduplication I code this by reduplication of a corresponding sequence in the spoken realisation. (This may demand buffer vowel insertion to prevent formation of new clusters.)
Below, as examples, are renderings into my system of the vocabulary words from Bill Vicars' ASL lesson #1.