Concepts of Space, Position and Motion

Eric S. Wheeler

In the JackAndJill implementation of the Theatre of the Mind project (Wheeler 2012), there is a virtual stage which provides a space for positioning entities called “places” (e.g. London), and for positioning and moving mobile entities such as “players” (e.g. Jack) and “props” (e.g. a pail). A formal model of these concepts is needed to make the animation work, and especially to explain the semantics of verbs of motion and of prepositions. This note starts to develop that model.

Space in the animation

This is simple. We use a system of three integers (x, y, z), ranging over the width, height, and depth (distance away from the viewer) of the stage. Call this “animation space”. Also, by setting a “viewing distance”, which fixes where the audience is viewing the stage from, we can calculate the projection of any point in the stage space onto its position on the view screen (i.e. the plane in front of the stage that corresponds to the viewer’s computer screen).

If we set (0,0,0) in the centre of the view screen, with the stage going behind it, then:

  • negative x is distance towards stage-right and positive x towards stage-left;
  • negative y is distance downwards, and positive y is distance upwards; and
  • positive z is distance away from the audience;  negative z is distance towards the audience, but is not used because it represents positions that are in front of the view screen.

In terms of the stage coordinates, the view screen is (X, Y, 0) for all X, Y.

Let v be the viewing distance. Then any object at position (x, y, z) on the stage will project onto the view screen at (p, q, 0), or simply (p, q), where

p = x * factor
q = y * factor

and factor (i.e. the scaling factor) is given by:

factor = v / (z + v)
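As a minimal sketch of this projection, here is the formula coded in Python. The choice of Python, the function name project, and the checks on v and z are assumptions made for illustration; the note does not say how JackAndJill itself implements this step.

```python
def project(x, y, z, v):
    """Project the stage point (x, y, z) onto the view screen.

    v is the viewing distance. Returns (p, q), the position on the screen.
    """
    if v <= 0:
        raise ValueError("viewing distance v must be positive")
    if z < 0:
        # Negative z lies in front of the view screen and is not used.
        raise ValueError("z must not be negative")
    factor = v / (z + v)  # the scaling factor from the text
    return (x * factor, y * factor)


# A point on the view screen (z = 0) is not scaled at all.
print(project(100, 50, 0, 400))
# A point as far behind the screen as the viewer is in front of it is halved.
print(project(100, 50, 400, 400))
```

The results are floating-point values; rounding them to integer screen positions would be a further (assumed) step.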

Having dealt with how points (and therefore objects associated with them) are displayed on the view screen, we can use the 3-dimensional coordinates as the model of where things “really” are.

However, linguistically, we do not work in terms of point coordinates.  The task then is to relate the linguistic concepts of position to the system of “real” coordinates that we have adopted.

Some motivation

Here are some examples of linguistic references to position, size, and motion that will motivate our linguistic model of space:

  1. A 50-foot pole is  “fifty feet long” if the pole is lying on the ground, but “fifty feet high” or “fifty feet tall” if it is upright.  In the horizontal plane, “long” is appropriate, but in the vertical direction, “high” is right.
  2. “Up” and “down”, “top” and “bottom” are used in the vertical direction (“The cat went up the tree, to the top”), but on a page, one can also use these words for one of the dimensions (e.g. “He read down the page, and signed at the bottom”), even if the page is physically lying in  a horizontal plane ( e.g. flat on a desk).  But, it is only to one of the page’s two dimensions:  “down” can’t mean from the left to the right; one can read across a page, or down a page, but the two cannot be interchanged.
  3. The expressions “stage-right” and “stage-left” highlight the fact that the meaning of “left” and “right” depends on the orientation assumed:  “stage-right” is the right side of the stage, assuming the stage front is facing the audience; however, we read from “left-to-right” on a page, taking “left” to be the reader’s left, which is the right side of the page (with the front facing the reader).  When you put something in the “left” drawer of a cupboard or dresser, is it your left or the furniture’s left?
  4. For me,  it is the case that one lives “on Main Street” but the kids play “in the street”.  My francophone colleagues tell me that in French, depending on the dialect, one can use “dans (in)” or “sur (on)” but not necessarily in the same way as in (my dialect of) English.
  5. With abstract concepts, there is a definite use of prepositions, e.g. one can speak “in English” or “in a dialect of English”  but not *“on English”.  If abstracts are an extension of the use of prepositions in a physical setting, then the extension has some restrictions, perhaps based on the properties in the physical setting.

Basic Concepts


Place:  A place is an entity which occupies part of the space on a stage.

Player:  A player is an entity which can move about on a stage. It occupies some space at any given time, but unlike a place, the space can change.

Prop:  Spatially, a prop is like a player, except that it does not move on its own…something or someone has to move it.


Axis:  A one-dimensional space.  It can be implemented in animation space as a single coordinate or, more generally, as a parameter which is a linear combination of animation coordinates.

An axis has a negative and positive end.  In particular, we have (for any place or player):

  • A main axis, which runs from back (negative end) to front (positive end)
  • A vertical axis, which runs from bottom (negative end) to top (positive end)
  • A lateral axis, which runs from left (negative end) to right (positive end)

A region of an axis is defined by two points on the axis, the region’s negend (a term coined from “negative end”, to avoid using terms like “lower bound” and “upper bound” which are biased to one of the axes) and posend, such that  the negend is closer to the negative end of the axis, and posend closer to the positive end of the axis.  Informally, the region is the segment of the axis between the negend and posend, and the boundary of the region is the negend and posend.  We will also call this a 1-d region.

By extension, a 2-d region is defined by two axes, and a 3-d region by three axes. Thus, we will say that a point P is in a 3-d region Q if P is in the (1-d) region of each of Q’s axes. This definition implies that 2-d regions are rectangles and 3-d regions are boxes, which is sufficient for now. It would be possible to define a region using a more elaborate function of the defining axes (e.g. make a circular region by requiring x² + y² < r²), but we don’t need this yet.
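The definitions of 1-d and 3-d regions can be sketched in Python as follows. The class names Region and Region3D are hypothetical, and counting boundary points as being "in" the region is an assumption; the text does not settle that question.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Region:
    """A 1-d region on an axis, bounded by its negend and posend."""
    negend: float
    posend: float

    def __post_init__(self):
        if self.negend > self.posend:
            raise ValueError("negend must not lie beyond posend")

    def contains(self, p):
        # Assumption: the boundary points themselves count as inside.
        return self.negend <= p <= self.posend


@dataclass(frozen=True)
class Region3D:
    """A 3-d region (a box): one 1-d region on each of the three axes."""
    main: Region
    vertical: Region
    lateral: Region

    def contains(self, point):
        m, v, l = point  # coordinates on the main, vertical, lateral axes
        return (self.main.contains(m)
                and self.vertical.contains(v)
                and self.lateral.contains(l))


box = Region3D(Region(0, 10), Region(0, 5), Region(-2, 2))
print(box.contains((3, 4, 0)))  # inside on all three axes
print(box.contains((3, 6, 0)))  # too high on the vertical axis
```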

The stage has a fixed orientation for its axes, and by default, places and mobile objects have the same orientation for their axes, at least until the mobile objects are moved.


Each boundary of a 3-d space Q is given by a space in which one of Q’s dimensions is fixed at an endpoint (the negend or posend) of Q’s region on that axis. Thus each boundary is a 2-d space. For example, if Q = (X, Y, Z) where X is the region on Q’s main axis, Y is the region on Q’s vertical axis, and Z is the region on Q’s lateral axis, and if TOP is the posend of Y (i.e. the most positive point on Q’s vertical axis), then the upper surface of Q is the 2-d space given by the regions X and Z, with the vertical position TOP.

In like manner, 2-d spaces have 1-d boundaries, and the 1-d spaces  have 0-dimensional boundaries given by the negends and posends.

For any object we can have a 3-d space:

| Axis | Posend | Negend | Posface | Negface |
|---|---|---|---|---|
| Main | front | back | front surface | back surface |
| Vertical | top | bottom | upper surface | lower surface |
| Lateral | right | left | right surface | left surface |
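The naming of boundary faces can be illustrated with a short Python sketch. The dictionary layout for a space and the function name face are hypothetical choices made for this example only.

```python
# Face names taken from the table above, keyed by (axis, end).
FACE_NAMES = {
    ("main", "posend"): "front surface",
    ("main", "negend"): "back surface",
    ("vertical", "posend"): "upper surface",
    ("vertical", "negend"): "lower surface",
    ("lateral", "posend"): "right surface",
    ("lateral", "negend"): "left surface",
}


def face(space, axis, end):
    """Return (name, position, surface) for one boundary face of a 3-d space.

    space maps each axis name to its (negend, posend) pair. The face fixes
    the chosen axis at one end and keeps the regions on the other two axes.
    """
    if (axis, end) not in FACE_NAMES:
        raise ValueError("unknown axis or end")
    negend, posend = space[axis]
    position = posend if end == "posend" else negend
    surface = {name: region for name, region in space.items() if name != axis}
    return FACE_NAMES[(axis, end)], position, surface


Q = {"main": (0, 10), "vertical": (0, 5), "lateral": (-2, 2)}
name, top, surface = face(Q, "vertical", "posend")
print(name, top, surface)
```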

Centre and Size

For the animation, it is convenient to give every space a “centre”, a point equidistant from its endpoints. Thus, we can represent a region (1-d space) by its centre and by a measure of its size, or better its half-size (s). If c is the centre, then the posend is at (c + s), the negend at (c - s), and the length or size of the region is 2s. The concept extends to 2-dimensional and 3-dimensional spaces providing the centre is the same for all three axes.
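The two ways of describing a 1-d region (by its ends, or by its centre and half-size) convert into one another directly. A minimal Python sketch, with hypothetical function names:

```python
def ends(c, s):
    """Return (negend, posend) for a region with centre c and half-size s."""
    if s < 0:
        raise ValueError("half-size must not be negative")
    return (c - s, c + s)


def centre_and_half_size(negend, posend):
    """Return (c, s) for a region given by its two ends."""
    return ((negend + posend) / 2, (posend - negend) / 2)


print(ends(10, 3))                  # the region from 7 to 13
print(centre_and_half_size(7, 13))  # back to centre 10, half-size 3
```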

The centre becomes useful as a stage location for a space, when linguistically we are not concerned with the extent of the space.   For example,  when we say “John is at the church”, we are only concerned with having the position of the Player John the same as the position of the Place “church” — it does not matter where John is within the space of the church.

Associated Spaces

With any 3-d space, there is a set of associated spaces, defined by the three axes. For example, on the vertical axis, a space S with a negend N and posend P has a space (the “under-space”) that has locations with a vertical coordinate less than N, and another space (the “over-space”) with coordinates that are greater than P. (In fact, we can imagine two versions of such spaces: the one where the other coordinates are the same as those of S (i.e. the space directly under or over S), and the space without such limits. However, the distinction may not be necessary for our purposes.)

| Axis | Negend Space | Posend Space |
|---|---|---|
| Main | rear-space | front-space |
| Vertical | under-space | over-space |
| Lateral | left-space | right-space |
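The under-space and over-space can be sketched as point tests in Python. The function names and the directly flag are hypothetical; the flag selects between the two versions mentioned above (the space directly under or over S, or the space without such limits).

```python
def _aligned(point, space):
    """True if the point lies within the space on the main and lateral axes."""
    m, _, l = point
    return (space["main"][0] <= m <= space["main"][1]
            and space["lateral"][0] <= l <= space["lateral"][1])


def in_under_space(point, space, directly=True):
    """True if the point's vertical coordinate is less than the negend of space."""
    below = point[1] < space["vertical"][0]
    return below and (not directly or _aligned(point, space))


def in_over_space(point, space, directly=True):
    """True if the point's vertical coordinate is greater than the posend of space."""
    above = point[1] > space["vertical"][1]
    return above and (not directly or _aligned(point, space))


# Points are (main, vertical, lateral); each axis maps to (negend, posend).
table = {"main": (0, 4), "vertical": (2, 3), "lateral": (0, 2)}
print(in_under_space((1, 0, 1), table))                  # directly under
print(in_under_space((9, 0, 1), table))                  # lower, but off to one side
print(in_under_space((9, 0, 1), table, directly=False))  # the unlimited version
print(in_over_space((1, 5, 1), table))                   # directly over
```

The other four associated spaces would follow the same pattern on the main and lateral axes.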


We move an object by changing the location of its space relative to the space of the stage, including its position and its orientation.  We will call this “locomotion”.  So, for example, we can move “Jack” from stage centre to stage right by moving the centre of Jack’s space, and change the orientation of Jack by rotating all the axes around (say) the vertical axis so that Jack is facing to the right.  (That is to say, Jack is facing to the right relative to the stage…of course Jack is always facing forward relative to himself).
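A minimal Python sketch of locomotion, under several assumptions that are not in the text: the class name Mobile is hypothetical, orientation is reduced to a single heading angle about the vertical axis, heading 0 is taken to mean facing the audience (negative z), and a positive turn is taken to be towards stage-left (positive x).

```python
import math
from dataclasses import dataclass


@dataclass
class Mobile:
    centre: tuple         # (x, y, z) in stage coordinates
    heading: float = 0.0  # degrees of rotation about the vertical axis

    def move_to(self, new_centre):
        """Change position by moving the centre of the object's space."""
        self.centre = new_centre

    def turn(self, degrees):
        """Change orientation by rotating about the vertical axis."""
        self.heading = (self.heading + degrees) % 360

    def main_axis(self):
        """Direction of the object's front as (x, z) stage components."""
        a = math.radians(self.heading)
        return (math.sin(a), -math.cos(a))


jack = Mobile(centre=(0, 0, 100))
jack.move_to((-200, 0, 100))  # negative x is towards stage-right
jack.turn(-90)                # now facing stage-right, relative to the stage
print(jack.centre, jack.heading, jack.main_axis())
```

Relative to himself, Jack's main axis still runs from his back to his front; only its direction in stage coordinates has changed.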

There is another sort of motion (not considered here) which is the change of configuration of the parts of Jack, e.g. raising and lowering his arms, or moving his legs. We will call that “wiggle”.

In animating locomotion, we not only want to show the start and end positions of a mobile object, but also lots of the in-between positions.  To do this, we need the concept of a “path”.


PathSegment:  A PathSegment is defined by two positions, “path-start” and “path-end”.  We animate locomotion along a PathSegment by moving the object from the path-start to the path-end in a straight line, at a constant speed (clearly, there are other possibilities here).  The orientation of the PathSegment is from path-start to path-end, i.e. we can line up the main axis of an object with the PathSegment.

Path: A Path is an ordered set of  PathSegments, such that the path-end of one PathSegment is the path-start of the next (when there is a next).  The path-start and path-end of the Path are the path-start of the first PathSegment and path-end of the last PathSegment.   This definition implies that a Path can also be seen as  a set of points (the start and end points of the segments).  It also says that a Path need not be a straight line, and it need not be traversed all at a constant speed.
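The in-between positions along a PathSegment and a Path can be sketched in Python as below. The function names, the dt frame interval, and the representation of a Path as a list of points with one speed per segment are assumptions for illustration. The number of frames is rounded up so that the last frame lands exactly on the path-end, which means the speed actually shown can be slightly lower than the one requested.

```python
import math


def segment_positions(start, end, speed, dt=1.0):
    """Yield positions along a straight line from start to end.

    speed is distance per unit time and dt is the time between frames.
    """
    if speed <= 0 or dt <= 0:
        raise ValueError("speed and dt must be positive")
    length = math.dist(start, end)
    steps = max(1, math.ceil(length / (speed * dt)))
    for i in range(steps + 1):
        t = i / steps
        yield tuple(a + (b - a) * t for a, b in zip(start, end))


def path_positions(points, speeds, dt=1.0):
    """Return all positions along a Path given as its segment end points."""
    if len(speeds) != len(points) - 1:
        raise ValueError("need exactly one speed per segment")
    positions = [points[0]]
    for start, end, speed in zip(points, points[1:], speeds):
        frames = list(segment_positions(start, end, speed, dt))
        positions.extend(frames[1:])  # skip the point shared with the previous segment
    return positions


# Two segments at right angles, each traversed at 5 units per frame.
for position in path_positions([(0, 0, 0), (10, 0, 0), (10, 0, 5)], [5, 5]):
    print(position)
```

Giving each segment its own speed reflects the remark that a Path need not be traversed all at a constant speed.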

It is possible to define a “loop” as a Path whose end point is its starting point, and  allow objects to circulate around the path a given number of times.

InteriorPath:   For a space, it is sometimes convenient to have an object moving within the bounds of the space.  An InteriorPath is simply a path (possibly a loop) whose points are confined to the interior of the space.
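A check for an InteriorPath can be sketched in Python. Because the spaces defined so far are boxes and the segments are straight lines, it is enough to test the end points of the segments: if both ends of a straight segment are inside a box, so is everything between them. The function name and the data layout are hypothetical.

```python
def is_interior_path(points, space):
    """True if every point of the path lies strictly inside the space.

    points is a list of coordinate triples; space is a triple of
    (negend, posend) pairs, one per axis, in the same order as the coordinates.
    """
    return all(
        negend < coordinate < posend
        for point in points
        for coordinate, (negend, posend) in zip(point, space)
    )


room = ((0, 10), (0, 5), (0, 10))
loop = [(1, 1, 1), (9, 1, 1), (9, 1, 9), (1, 1, 1)]  # ends where it starts
print(is_interior_path(loop, room))
print(is_interior_path([(1, 1, 1), (11, 1, 1)], room))  # leaves the room
```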

Classes and Parts

Objects can be places, players or props. Instances (e.g. “Jack”) belong to classes (“boy”) and the classes can be organized in hierarchies (e.g. the class “boy” is a subtype of the class “person” and likewise, an instance of the class “girl” is an instance of the class “person”).

Objects can inherently have parts, i.e. the parts are not introduced into a narrative, but are just assumed to be there: e.g. Jack has a head, which has a crown (taken either as the top of the head, or as the regal hat that sits on the head). Thus, our ontology includes classes (Class), which can be related by a type-subtype relationship (Type) or by a part-whole relationship (Part).
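The Class, Type and Part relationships can be sketched in Python with plain dictionaries. The table names and function names are hypothetical, and attaching “head” to the class “person” (rather than to “boy” or to Jack himself) is an assumption made for the example.

```python
TYPE = {"boy": "person", "girl": "person"}  # subtype -> type
PART = {"head": "person", "crown": "head"}  # part -> whole
INSTANCE = {"Jack": "boy", "Jill": "girl"}  # instance -> class


def classes_of(instance):
    """Return the instance's class followed by all of its supertypes."""
    chain = [INSTANCE[instance]]
    while chain[-1] in TYPE:
        chain.append(TYPE[chain[-1]])
    return chain


def is_a(instance, cls):
    """True if the instance belongs to cls, directly or through a subtype."""
    return cls in classes_of(instance)


def has_part(instance, part):
    """True if part is a part (or a part of a part) of the instance's classes."""
    owners = classes_of(instance)
    whole = PART.get(part)
    while whole is not None:
        if whole in owners:
            return True
        whole = PART.get(whole)
    return False


print(is_a("Jack", "person"))
print(has_part("Jack", "crown"))  # crown is part of head, head is part of person
print(has_part("Jack", "pail"))   # a pail would have to be introduced as a prop
```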

Interpreting Prepositions

Many English prepositional phrases can be interpreted as references to spatial concepts.   Here is a preliminary categorization of some prepositions in the context   “S is [preposition] O”, where S and O are the subject and object (noun phrases) respectively, e.g. “Jack is at the church”  or “Jill is in London”.  Of interest is the nature of the space associated with the object, and how the space of the subject relates to that object’s space.

For example,  with “in”, we expect the object to be a 3-d space, and the space of the subject to be constrained by the boundaries of the object space.   Thus (in the phrase “Jill is in London”), London is considered to be a 3-d space, and the space occupied by Jill is such that  negends of the three axes for London are less than the corresponding negends for Jill,  and the three posends for London are greater than the corresponding posends for Jill.   That, of course, is just the long way of saying Jill’s space is inside London’s space, but if we are to avoid using prepositions to define prepositions, that is what is necessary.
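The constraint for “in” can be written out as a comparison of ends, axis by axis. This Python sketch uses a hypothetical function name and data layout, and the coordinates given for London and Jill are invented for the example.

```python
def is_in(subject, obj):
    """True if the subject's space lies inside the object's space.

    Each space is a triple of (negend, posend) pairs, one per axis. On every
    axis the object's negend must be less than the subject's negend, and the
    object's posend greater than the subject's posend.
    """
    return all(
        o_neg < s_neg and s_pos < o_pos
        for (s_neg, s_pos), (o_neg, o_pos) in zip(subject, obj)
    )


london = ((0, 1000), (0, 200), (0, 1000))
jill = ((400, 402), (1, 3), (500, 501))
print(is_in(jill, london))  # Jill is in London
print(is_in(london, jill))  # but London is not in Jill
```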

In contrast, the preposition “at” does not require that we know the dimensions of the object. We can have “Jack is at the church” (the church being a 3-d space, because Jack can be “in the church”), or “Jack is at the top of the church” (the top generally being a 2-d surface; Jack can be “on the top of” something, but not *“in the top of” something unless we create a special circumstance where the top is 3-d), or even “Jack is at the end of the road” (where “end” suggests a point or zero-dimensional space).

What matters is that the location of the subject is the same as the location of the object.  There is, I think, an asymmetry here. Jack, as the mobile object, can be  “at the end”, but it would be strange to say “The end of the road is at Jack”.   In more static contexts, however, we can say: “Point P is at the intersection of lines L and M” or “The intersection of line L and M is at point P”.

Some Spatial Prepositions: a first analysis

In the context “<subject> <verb> <preposition> <object>”, e.g. “Jack is at the church” or “Jack went to London”,

  • Let S and O be the spaces referenced by the subject and object respectively.
  • Let loc(X) be the representative location for space X.
  • Let interior(X) be the interior of X (whatever dimensions are relevant).
  • Let Y < X say that space Y is contained by space X.

| Prep | Subject | Object | Constraint | Comment |
|---|---|---|---|---|
| at | mobile | static | loc(S) = loc(O) | “Jack is at the church” |
| in | | 3-d, static | loc(S) < interior(O) | |
| on | | 1-d | loc(S) < interior(O) | “on the line” |
| | | 2-d | loc(S) < interior(O) | “on the bottom/top/face of” |
| | | 3-d | loc(S) < top(O) | “on the church”, i.e. the 3-d object is implicitly referring to a 2-d surface |
| to | | | path(path-end = O) | |
| from | | | path(path-start = O) | |
| with | mobile | | loc(S) = loc(O); O is mobile | “Jack is with Jill” |
| through | | | path(path-end = O) + path(path-start = O) | |
| under | | | loc(S) < under-space(O) | |
| over | | | loc(S) < over-space(O) | |
| beside | | | loc(S) < left-space(O) or right-space(O) | |
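Some of the static constraints in the table can be sketched as Python predicates. Everything named here is hypothetical: loc is taken to be the centre of a space, the associated spaces are the “directly under/over/beside” versions, and the path-based prepositions (“to”, “from”, “through”) are left out. Note that “in” follows the table (the representative location is inside O) rather than comparing the full extent of the two spaces.

```python
MAIN, VERTICAL, LATERAL = 0, 1, 2  # axis order within a space or point


def loc(space):
    """Representative location of a space: its centre on each axis."""
    return tuple((negend + posend) / 2 for negend, posend in space)


def within(point, space):
    return all(neg <= c <= pos for c, (neg, pos) in zip(point, space))


def beyond(point, space, axis, side):
    """True if point is past the given end of space on one axis, and
    within space on the other two axes."""
    negend, posend = space[axis]
    past = point[axis] < negend if side == "neg" else point[axis] > posend
    aligned = all(lo <= point[i] <= hi
                  for i, (lo, hi) in enumerate(space) if i != axis)
    return past and aligned


PREPOSITIONS = {
    "at": lambda S, O: loc(S) == loc(O),
    "in": lambda S, O: within(loc(S), O),
    "under": lambda S, O: beyond(loc(S), O, VERTICAL, "neg"),
    "over": lambda S, O: beyond(loc(S), O, VERTICAL, "pos"),
    "beside": lambda S, O: (beyond(loc(S), O, LATERAL, "neg")
                            or beyond(loc(S), O, LATERAL, "pos")),
}

church = ((0, 10), (0, 8), (0, 6))
jack = ((4, 6), (0, 2), (2, 4))
bird = ((4, 6), (10, 11), (2, 4))
dog = ((4, 6), (0, 1), (8, 10))
print(PREPOSITIONS["in"](jack, church))
print(PREPOSITIONS["over"](bird, church))
print(PREPOSITIONS["beside"](dog, church))
```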

With very few semantic primitives, we are able to distinguish senses of several prepositions. Of course, the context is intentionally limited to a setting involving place (physical space); extensions to other domains, especially abstract ones, will require some theory to relate the abstract to the physical, but that should be doable. Even temporal interpretations follow if we posit that time is like a 1-d space with a main axis: “before”, “after”, and “during” can be seen as references to time (temporal space) in the front-space, rear-space and interior of some reference event. Other prepositions may need more semantic concepts (an elaboration of the ontology we have started to design, above). For example, “for” seems to involve a parallel concept to “to” but also some notion of an abstract pathway (e.g. a pathway representing the intention of someone to transfer the subject to the object: “this letter is for Jack” = “I intend this letter to go to Jack”). We need to do more study before we can define these prepositions.


Wheeler, Eric S. 2012. Theatre of the Mind: Introduction.

To process a human language text effectively, it is necessary to have some understanding of what the text means.  The Theatre of the Mind (TOM) project is my attempt to display text meaning in a way that is not just another text:  TOM provides a “virtual stage” on which animated characters act out the meaning of the given text.

The first implementation, called “JackAndJill”, began with the nursery rhyme “Jack and Jill went up the hill to fetch a pail of water…”.  It displays a stage and backdrop against which one can set up certain locations (e.g. the town called “London”, the hill called “High-Hill”) and characters (the boy “Jack”, the girl “Jill”, the boy “Sam”).  The programme accepts texts such as “Jack went up the hill” or “Jill has gone to London”, and produces semantic structures (“scripts” in our terminology) corresponding to the successful parse or parses.  The scripts can be “played” and when played, the characters move appropriately on the stage.  (see Figure 1)

Figure 1. Jack and Jill on the Virtual Stage.  Jack is going to London.

Our objective was not to create state-of-the-art animation, so the techniques used are elementary; nonetheless, the characters move in 3-dimensional space, orient themselves correctly for the path they are following, and move their arms and legs.  Some of these body-motions (i.e. change of configuration of body parts vs. translation of the whole character to a new location) are used to distinguish one kind of action from another (e.g. “go” vs “dance”; with “dance”, the character twirls around as well as moves forward).  So, the animation is as sophisticated as needed to capture the semantic distinctions in the text and to do it in a non-text medium.

I have always been concerned that semantic explanations which (for example) translate “John runs”  into “run(john)” are overlooking some of the necessary details.  To be fair, when “run(john)” is a statement in a formal logic, it has a well-defined meaning.  As such, it is capturing (in a well-defined way) something about the meaning of the text, and in principle, the logic statement could drive other representations such as  our animation.  But there is the uneasy feeling that moving from one text notation to another text notation can simply hide the challenging parts of the interpretation;  when one has to see the meaning acted out, it is intuitively clear what has (or has not) been captured.

The JackAndJill programme uses a parser that maps strings of characters (divided into words by the presence of blanks or line ends, so-called white space) into constituent structure trees and uses these trees to compose elementary semantic units (called “intentions”) into a composite semantic structure (a “script”). We will describe these in more detail in subsequent notes. Here we note that the process allows for multiple parses and therefore multiple interpretations of a text: the lexicon (mapping strings of characters to syntactic categories and their associated semantic intentions) can have multiple entries for the same string; the syntactic rules can apply in multiple ways to the same input ([Jack talked to [the girl] in London] vs [Jack talked to [the girl in London]]); and the virtual stage can provide different scenarios for the same string, depending on the prior configuration of characters (the performance of “Jack goes to London” varies depending on where Jack is placed at the start). A more enhanced version of the JackAndJill programme would allow the user to manually adjust the state of the stage.

We envision the JackAndJill programme and other like implementations being used in various ways:

  1. It leads us to build a computational model of language and language processing.  As such, it is an aid to our understanding of natural language, on the one hand, and of the various academic theories about natural language on the other hand.   Existing theories often are isolated from one another (e.g. a theory of syntax not relating to any theory of semantics and vice versa)  but to make a successful JackAndJill programme, it is absolutely necessary to have all the parts of the system working with one another.
  2. A working system can be used in practical settings, for example, to teach language by allowing a user to try constructions and see what they mean. English “in” and “on” are not easy to decipher: (in my dialect at least) one lives “on a street” but “in a town”, although one could play “in the street” or “on a team”.
  3. A more robust version of JackAndJill could be used by authors to create stories that are acted out on stage (What would William Shakespeare have done with such a tool?).  More mundanely, the tool might help the authors of work instructions, standards documents, and regulations to visualize what their writing actually says.  The visualization could also be an adjunct to the written document to aid the reader as well.

All of these applications require a breadth of coverage and a depth of function that is not currently there, but with the current state of information technology, they remain real possibilities.

We intend to make the JackAndJill grammar handle the core features of English, such as the syntactic structures (phrases, optional and mobile elements, embedded clauses, etc.; the English auxiliary construction is particularly interesting computationally), the use of closed class categories (determiners, quantifiers, prepositions, particles, conjunctions and so on) and “basic” open class words (including representative verbs of motion, perception, etc.), so that the grammar can be extended to a particular field or application by adding more open class items on the model of what already exists. Thus “walk”, “trot”, “canter” and “gallop” could all be seen as extensions of “go” (or perhaps “dance”) by specifying the mode of going.

We also hope to explore some of the corresponding semantic fields from a computational perspective, such as the language-user’s implicit notions of space, position and motion (driven by the semantics of prepositions).

All of these things will be discussed in more detail in subsequent notes to this blog.  Readers who have comments or contributions to make are invited to write me at:

wheeler <at>  (where you replace <at> with @)


Wheeler, Eric S. 2009. Theatre of the Mind: A Project to Animate the Language of Thought and Communication. in E-Learning and Digital Media. 6.3 Special Edition. Sept 2009. 272-273.

Wheeler, Eric S. 2009. Visualizing Language. Presentation to the International Linguistic Association. New York. April 2009.

In the 21st century, why can’t our academic findings be published online… a kind of personal reporting of explorations and results, not unlike what researchers in the 17th and 18th centuries did with their letters to colleagues. Here is my attempt to do this using a blog to capture my ongoing thoughts on the subjects I am exploring.

You are invited to respond to or contribute to anything here by contacting me at wheeler [at] (where, of course, you replace [at] with @)