As of January 2004 au.riversinfo.org is archived under Precision Info


Go To Table of Contents (closed_book.jpg - 9K) Table of Contents: Environmental Mapping Systems - Locationally Linked Databases

Chapter Two: Deriving Inferences From Databases That Define A Partition and Contain Sinks Of Information

Outline of Chapter

This chapter establishes that it is not possible to write a general algorithm1 for analysing interactions between biological organisms and properties recorded at various locations described by a locationally linked database; the problem can not be solved, even approximately. In order to establish this fact this chapter demonstrates that any attempt to write such an algorithm meets with intractable problems. This result has substantial implications for analysing pattern in such spatial data. The arguments presented are comprehensive, exhaustive, and at times abstract. A summary of the core arguments of this chapter is given below.

Since it is not generally possible to use a stochastic process operating over the set of location elements to test a hypotheses regarding the effect of property pi on an organism, one has to know how the particular organism will respond to coded properties prior to writing the algorithm for determining interactions between the organism and coded properties.

The distribution of organisms is frequently limited by conditions that are sub optimal rather than lethal. A transplant experiment may be necessary to account for historical factors responsible for determining the distribution of an organism. The Supplement to this chapter explains the meaning of some technical terms used in this chapter.

Cardinality: Stating Facts In Numerical Terms.

To formulate information in a scientific manner it is essential to have basic facts stated in numerical terms. However, it is not necessary to enumerate each unit in the universe in order to arrive at an acceptable estimate for the total. A carefully designed sample may provide the necessary information (Raj; 1968). It is possible to enumerate only those characteristics of entities for which there exists a correspondence between the characteristic and the concept of Cardinality. It is reasonable to write an algorithm for analysing interactions between entities distributed at various locations described by a locationally linked database and physical properties recorded within the database only when it is known exactly what can and what can not be counted, or represented by numbers, and how the said entities can be analysed. In many environmental management applications distributed entities of interest are biological organisms and the recorded properties describe physical characteristics present at the section of the environment described by each location element.

The Structure Of Locationally Linked Databases

All Databases That Specify Location Consist Of A Finite Collection Of Location Elements.

According to the Collins Dictionary of Mathematics (1989) the concept of space embraces a set of objects to which may be attached associated attributes, together with relation(s) defined on that set. A straight line defined between pairs of objects is one such relation.

A spatial examination of an entity must address itself to either analysing a variable that assumes values at different locations and/or analysing properties assigned to a particular spatial location. This entails a statement of location with respect to an imposed coordinate system, physical properties and topological characteristics at the stated location. Topological characteristics are significant because they address the spatial nature of geographic descriptions and specifically deal with issues associated with connectedness of points and areas.

The number of elements any finite adding machine can address is finite, therefore, the method of digital representation must be finite. A finite set of location elements, denoted by Å, can be constructed to represent all distinct location elements addressable to a locationally linked database. Without loss of generality the set Å, of location elements, can be said to have M2 members.

Using A Map To Represent The World With A Finite Collection Of Location Elements.

In order to represent the real world using the finite set of location elements it is necessary to propose a map, an ordered pair, made up of a transformation with a given co-domain, that takes a location in the real world (W) and maps the location to a particular location element addressable to the database. To effect reasonable analysis such a map would have to be well defined.
 
World f:W ® Å Database

Figure 2.1: The map f : W ® Å takes locations in the real world and maps them to particular location elements addressable to the database.

Conclusions derived about the real world W, using the representation Å, can be extended to the real world only if the transformation f:W®Å is invertible, at least approximately. The analysis that follows is independent of whether or not such a map exists and therefore establishes that even if it is possible to construct an f:W®Å that is exactly invertible it is still not possible to write a general algorithm for analysing interactions between biological organisms and properties recorded at various locations described by a locationally linked database.

Establishing A Mathematical Representation

Scales Of Measurement.

Scales of measurement based on elements of the Real Number Line are possible only because there exists an isomorphism between what can be done with measurable properties of objects and what can be done with numbers (Stevens; 1946). When measuring characteristics of objects, experimental operations are performed for classifying (determining equality), for rank-ordering, and for determining when differences and when ratios between the aspects of objects are equal. The empirical operations performed and the characteristics of the property being measured determine the type of measuring scale attained (Stevens; 1946).

Table 2.1 describes the group structure and permissible statistics for each of 4 scales of measurement that can be erected when using a representation based on elements of the Real Number Line. The mathematical group structure of a scale is the determined by the collection of algebraic functions which leave the scale form invariant. For a statistic to have any meaning when applying any particular scale the statistic must be invariant under all the transformations permissible for the scales listed mathematical group structure.
 
Scale Basic Empirical operations Mathematical Group Structure Permissible Statistics (invariantive)
Nominal Determination of Equality Permutation Group: x(i) = f(x)
f(x) means any one-to-one substitution.
  • Number of cases
  • Mode 
  • Contingency correlation
Ordinal Determination of greater or less Isotonic group: x(i) = f(x)
f(x) means any monotonic increasing function.
  • Median Percentiles
Interval Determination of equality of intervals or differences General Linear Group
x(i) = ax + b
  • Mean
  • Standard deviation
  • Rank-order correlation
  • Product-moment correlation
Ratio Determination of equality of ratios Similarity Group
x(i) = ax
  • Coefficient of variation

Table 2.1: The scales of measurement (Birkhoff in Stevens; 1946).

The permutation group includes the isotonic, general linear, and similarity groups as subgroups. Therefore, any statistic applicable when using the nominal scale is automatically applicable when using an ordinal, interval or ratio scale. Similarly, any statistic applicable when using an ordinal scale can be applied when using the interval or ratio scale and any statistic applicable when using the interval scale is applicable when using the ratio scale.

In order to take ratios of measurable characteristics of objects in any self-consistent manner it is essential that the ratio scale be used. This scale requires an absolute zero. Measurement of temperature in degrees Celsius serves as a specific example. Given a temperature x 0C it is not reasonable to consider a temperature 2*x 0C; for if x be 10 0C then 2*x corresponds to 20 0C, but if x be -10 0C then 2*x 0C corresponds to a temperature of -20 0C! The problem arises since measurements of temperature in degrees Celsius conform only to the interval scale.

Describing Location Elements Using Members Of Finite Sets.

P - The Set Of Measurable Properties For Each Location Element.

Every number is represented by a symbol and therefore to describe the distribution of phenomena that can be represented by numbers, at any location Li, a member of Å, it is necessary and sufficient to ascertain the presence or absence of an associated symbol. Properties that can be represented by numerals, at any Li, are determined to be either present or absent. Properties coded as present or absent at each location element must be drawn from a finite set of properties, this set can be denoted by the symbol P. Without loss of generality P can be said to have N3 members. Each member of the set P can be uniquely identified using 1 of the symbols pi, where i takes all integral values between 1 and N.

Generally it will not be possible to impose an ordering on elements of the set P. This can be seen by considering an example for which the properties p1, p2 and p3 represent a value for a measurable property of a plant, an elevation value and a particular class of land aspect respectively. It is possible to define a property of magnitude only on those sets endowed with an ordering. In the case where it is not possible to define a property of magnitude it follows immediately that it is not possible to define a metric.

W - The Set Of All Possible Distinct Property Codings That Can Be Used To Describe Location Elements.

The set of N distinct properties can be used to recursively generate a set of all possible distinct property codings that can be used to describe location elements. This set can be denoted by the symbol W. In order to generate the set W, a rule for generating W must be established. This can be achieved using the characteristic function g:P®W, where g maps property pi to the ith member of an N dimensional tuple and assigns to the ith member of the tuple the value 0 to denote the presence of property pi and 1 to denote the absence of property pi . Each member of the set W is a word of length N. It is clear that W contains 2N members. Each member of W can be uniquely represented by one of the first 2N binary numbers and/or one of symbols wi, where i takes all integral values between 1 and 2N.
 
P: The Set of all distinct properties g : P ® W W: The set of all possible distinct property codings.

Figure 2.2: The characteristic function g : P ® W that constructs W, the finite set of all distinct property codings, from P, the finite set of available properties.

The property coding for each member of Šis described uniquely by one of the members of W. The map y:ŮW, that assigns to each member of Šthe member of W that describes the distribution of properties at the member of Šis an injective function from Što W. In general the map y:ŮW will not be surjective and it is to be expected that many members of the set W will not have any members of the set Šmapped to them by y:ŮW.
Å : The set of all distinct location elements y : Å ® W W: The set of all possible distinct property codings.

Figure 2.3: The injective function y:ŮW that maps every member of the finite set of location elements to a member of W, the set of all distinct property codings.

Partitioning Sets With Equivalence Relations.

If ~ is an equivalence relation on any set Z then the family of all equivalence classes is a partition of Z. Additionally, if {Sa} is a partition of Z then there exists an equivalence relation on Z whose equivalence classes are the {Sa}(Rotman; 1965). Because the mapping y:Å®W is an injective function from Å to W, any partition of W can be used to partition the set of all location elements, Å. The properties pi, for i=1 to N, can be used to construct a partition the set W of all possible words of length N.

Measuring The Environment At Each Member Of The Set Å.

When measuring with the nominal scale it is possible to identify either a most numerous class or a number of most numerous classes. The group structure permits that subject to certain conditions, a contingency correlation can be defined which allows hypotheses regarding the distribution of cases among the classes to be tested. In order to do this it is necessary to identify exactly which entities are to be considered as classes and exactly which entities are to be considered as cases.

Constructing Cases And Classes For Investigating The Distribution Of Organisms With Respect To Coded Properties.

W is the set of all possible distinct property codings that can be assigned to location elements, members of the set Å. W consists of words of length N letters. To investigate interactions between biological organisms and coded properties it is necessary to split the set of N letters into 2 sets of letters:

  1. Letters that describe measurable properties of the biological organism (at any given location) about which information concerning interactions between it and other coded properties is sought. The organism of interest can be called organism Z.
  2. Letters that describe other measurable features for each location element.

Without loss of generality, it can be claimed that X4 of the N recorded properties are measurable properties describing organism Z and that N-X of the properties are measurable properties describing entities other than organism Z.

In order to prove that no general algorithm can be written for analysing interactions between the measurable properties of biological organisms and other coded properties it is necessary to demonstrate only that intractable problems arise when the simplest possible measuring scale is applied to the organism. The simplest measurement that can be made for a biological organism can be achieved by applying the nominal scale using 2 classes. Class 0 organism coded present and class 1 organism coded absent could be used. This approach has the added advantage of keeping the argument and notation simple.

b And a The Sets Of Classes And Cases Respectively.

Using such an approach, the following two sets can be generated:

1. The set a consisting of a 1 dimensional tuple in which a 0 is placed to denote the presence of organism Z and a 1 to denote the absence of organism Z at a specified location element. The set a has two members.

2. The set b consisting of all distinct property codings using the remaining N-1 coded properties. The set b can be generated in an analogous way to the set W. The set b has 2N/2 members.

Two additional mappings can now be proposed:

1. The map §: Å®a

2. The map ¨: Å® b .

Both of the mappings §: Å®a and ¨: Å® b are injective functions from Å to a and Å to b respectively. Therefore any partition of a or b can be used to partition Å
 
Å: The set of all distinct location elements §: Å®a a: The set of all possible cases.
¨: Å® b b: The set of all possible classes.

Figure 2.4: The injective functions §: Å®a and ¨: Å® b that map every member of the finite set of location elements Å, to a member of a and b.

The two members of a would be referred to as the cases: case 1, organism Z present and case 2 organism Z absent. The number of classes is the number of members of b constituting the range of ¨:Å®b; frequently the number of classes will be completely described by a proper subset of b. The most numerous class, or most numerous classes, are defined by the member or members of the set b that have at least as many members of Å mapped to them by the injective function ¨:Å® b as any other single member of b. The group structure permits that subject to certain conditions, a contingency correlation can be defined which allows hypotheses regarding the distribution of cases (members of the set a) among the classes (a subset of members of the set b) to be tested. For example, it may be possible to test the hypothesis that organism Z is distributed at random and therefore the distribution of members of the set a is independent of any, and all, the classes defined by elements of the set b. It may be possible to test such an hypothesis by comparing the assignment of members of Å to elements of a and the assignment of members of Å to members of b, with the expected assignment if elements of a were assigned using a random walk over the set Å.

Members Of The Sets W And b Need Conform Only To A Nominal Scale.

The ordinal scale is applied to place objects in a rank ordering scheme. Interval scales are defined by imposing the constraint of linearity on an ordinal scale. Ratio scales can be defined when it is possible to define the four relations of equality, rank-order, equality of intervals and equality of ratios (Stevens; 1946). It is known that members of the set P need not conform to an ordinal scale and therefore there is no requirement that members of either of the sets W or b conform to an ordinal scale. Any statistic requiring an ordinal, interval or ratio scale need not be applicable to elements of the sets W or b. Because elements of the set W or b need be measurable only with the nominal scale it will not generally be possible to define a property of magnitude on elements of the sets W or b.

Taking A Representative Sample Of Elements From The Set Å.

Generally it will not be possible to take the average or standard deviation of elements of the sets W or b. In order to take a "representative sample" it is necessary to invoke some measure of central tendency. Importantly, the average and standard deviation measures of central tendency can not generally be applied to elements of Å. In the case where it is not possible to establish a measure of central tendency it is impossible to take a "representative sample".

Decomposing The Set b To Establish The Effect Of Coded Properties On A Biological Organism.

In order to establish the effect (potential or actual) of properties pi (for i=1 to N-1), on a biological organism Z, at a location described by a member of Å, it is necessary to establish an operation for decomposing the member of b that describes the location element to its component letters. Essentially, in order to determine interactions (potential or actual) between the biological organism Z and particular properties it is necessary to factor elements of b in some reasonable manner.

Any attempt at such a decomposition must consider the issue of what the Property codes actually represent and how they are likely to impact on the biological organism Z. The properties describe physical characteristics of the environment in which organisms live. The concept of an organisms niche is well established and arises from the elementary observation that organisms are not distributed at random throughout the worlds ecosystems. Each of the properties mentioned can be thought of as a dosage. Weber's general quantitative assertion of psychophysics holds that there exists a constant C such that two stimuli S and S+DS will be physiologically indistinguishable unless the ratio DS/S>C. C is frequently approximately equal to 1/30 (Resnikoff; 1989). Organisms respond catastrophically to dosages associated with any selective physical phenomenon. This observation forms the basis of the LD50; or logdose 50 statistic. The presence or absence of some property codes (represented by letters) may be immediately fatal; regardless of how suitable the rest of the character string is. Due to the catastrophic response of organisms to some properties it may be impossible to decompose a word to determine the individual effects of coded properties on an organism.

Sinks Of Information And Stochastic Processes.

Once a fatal property, for a particular organism, is present at a given location, it is impossible to say anything about the potential or actual effect of other recorded properties on the organism. A fatal property acts as a sink of information.

It is apparent that even if a property pi has no effect on organism Z, organism Z will not generally be distributed independently of location elements containing property pi. Therefore, it is not generally possible to use a stochastic process operating over the set Å to test a hypotheses regarding the effect of property pi on organism Z.

The absence of any adequate decomposition, for members of b, means that it may be impossible to construct a numeric representation to investigate the interaction between all properties pi (for i=1 to N) and the biological organism Z distributed at locations described by members of Å.

Breaking Down The Set Å To Establish A Mathematical Representation To Determine Interactions Between Biological Organisms And Coded Properties.

In order to construct a numeric representation directed at establishing potential or actual relationships between the organism Z at various elements of the set Å and particular properties pi it is necessary to apply a filtration to elements of the set Å and exclude from consideration any elements of the set Å at which a fatal attribute, for the particular organism, is present (whether coded or not). If the location of all fatal properties was known then it would definitely be possible to apply such a filtration to elements of the set Å. However, in the case where the property pi is fatal for a particular organism and the absence of the property pi implies the absence of property pk at all location elements, it is impossible, from the set Å, to say anything about the interaction between Property pk and the particular organism. Given this problem, the fact that any attempt to establish an equivalence relation on members of the sets W or b defines a partition of these sets, and the observation that measures of central tendency need not exist for either b or W it may be impossible to take a statistically valid representative sample of location elements from the set Å. The distribution of fatal properties at both the sampled and unsampled sites must be accounted for.

There May Exist Distinct Partitions Of W And b That Can Be Used To Aggregate Members Of Å Identically.

It is observed that the mappings y:ŮW need not be bijective and therefore there may exist distinct partitions of W and b that can be used to aggregate members of Šidentically. In the case where a subset of location elements mapped by y:ŮW to elements of W having Properties p1 AND p2 AND p3 coded as present is sought a partition of W can be defined by the equivalence relation that relates elements of W having identical coding for properties p1, p2, and p3. In this case the partition consists of a family of eight equivalence classes. These eight equivalence classes can be denoted by the symbols e1, e2, e3,...,e8. A distinct partition of W could be obtained by the family of 8 equivalence classes having identical property coding for p4, p5 and p6. These eight equivalence classes can be denoted by the symbols f1, f2, f3,...,f8. In such a case each member of Šcan be assigned uniquely to both one e and one f class. Since y:ŮW is only an injective function there may exist a 1:1 correspondence between the members of Šassigned to each e class and the members of Šassigned to each f class.

The Property Specification Required To Select A Particular Subset Of Location Elements Need Not Be Unique.

An obvious corollary to the fact there may exist distinct partitions of W that can be used to aggregate members of Šidentically is the fact that the property specification required to select a particular subset of location elements need not be unique. A statement to select a subset of location elements mapped by the injective mapping y:ŮW to members of W having up to N specified properties coded as present always selects a well defined subset of location elements. For example, if N=10 the statement to select all location elements mapped by y:ŮW to elements of W having properties p1, p2 and p3 coded as present would always select a well defined subset of location elements. Such a subset of location elements can be denoted by the symbol j. It would not be at all difficult to construct a database in which the subset j was equivalent to the set of location elements mapped by y:ŮW to elements of W having property pi (where i is any single specified integer between 1 and N) coded as present. In such circumstances there may be no reasonable way of assigning location elements not in the set j to classes established by a combination of the specified properties used to select j. If the property coding required to select a particular subset of location elements is not unique then it may be impossible to analyse the database to investigate the effects (potential or actual) of coded properties on an organism.

Analysing For The Effect Of 1 Property When Specifying A Union Of Elements Of W.

Boolean operators acting on members of W are not associative. For example, the statement to select all location elements mapped by y:ŮW to members of W having properties p1 AND p2 OR p3 coded as present is ambiguous and can be given an analytical meaning only by imposing an order of operations. The Axiom of Choice, which permits the construction of a set containing exactly one element of each of a given family of sets, requires that the sets used for the construction be disjoint. Sets defined by a union of elements of W are not disjoint and therefore in the case where a selected subset of location elements is determined by a union of elements of W it will not generally be possible to analyse for the effect of 1 specified property on an organism.

Determining Interactions Between Biological Organisms & Coded Properties - Insufficient Conditions

An algorithm sufficient to permit the construction of a numerical representation for analysing interactions between biological organisms and particular properties recorded at various locations described by a locationally linked database need not exist because:

It follows immediately from the observation that some organisms require properties that are fatal for other organisms that in order to write an algorithm for examining interactions (potential or actual) between a biological organism and coded properties it is necessary to have knowledge of both the organisms physiology and the distribution of fatal properties. It is not generally possible to use a stochastic process operating over the set Å to test a hypotheses regarding the effect of property pi on organism Z and therefore one has to know how the organism will respond to coded properties prior to writing the algorithm for determining interactions between the organism and coded properties.

Apart from the theoretical problems associated with attempting to apply mathematical analysis methods to such databases there are practical problems:

Establishing Prediction Using Data From A Locationally Linked Database

The success or failure of managing data of mega- or giga-byte proportions depends on whether or not the accuracy and content of each data entry is adequate and appropriate to give consistent results for the types of question being asked. Attempts to estimate the change of an ecosystem over time can not be mounted in any reasonable manner until the relationship between the rate at which the information becomes obsolete and the scale at which the information describes the ecosystem is established.

The natural environment is susceptible to external influences and is "governed by rules" that are not rigorously understood. It is a characteristic of mathematical thought that it can have no success where generalisations can not be made (Peirce; 1895). Unfortunately, because the construction of a GIS database defines a partition, and data that has been aggregated to define a partition either can not be, or can not readily be generalised, prediction is practically impossible. In order to generalise the data it is usually necessary to establish cause and effect and this must be done statistically. In order to statistically interrogate the database for the effects of coded properties on organisms, a filtration (the existence of which can not be guaranteed) must be applied, and this inevitably means that much of the data must be rejected. Additionally, conclusions derived from filtered data may not be valid when extended over the entire database. The assumption that more data necessarily increases predictive ability does not hold under circumstances where sinks of information exist.

Supplement to Chapter 2: The Meaning Of Some Terms Used In Chapter 2

I substantially attempted to apply standard definitions; as published in the Collins Dictionary of Mathematics. To suit their purpose, the meaning of some terms defined by authors varies, therefore explanation of some terms serves to prevent confusion.


1 A general algorithm is an algorithm that must be able to answer any well posed question concerning interactions between all coded properties and all organisms distributed at locations described by a locationally linked database.

2 Where M is a positive integer.

3 Where N is a positive integer

4 Where X is a positive integer less than N.


A paper summarising aspects of the information contained withing this thesis was subjected to independent review by three impartial referees and published in the Australian Journal of Soil and Water Conservation (Volume 1 number 9 February 1996).

Go To Meta Data Record: Environmental Mapping Systems - Locationally Linked Databases

Go To Table of Contents (closed_book.jpg - 9K) Table of Contents: Environmental Mapping Systems - Locationally Linked Databases

  Top
Single Frame
Single Frame Page View (scroll.gif - 1K) 
Page View
Printer
Print Version
Friendly Page
Email &
Mail To
Feedback
Home to http://au.riversinfo.org
au.riversinfo.org

Warning IconRiversinfo Australia (au.riversinfo.org): Publication Information:


Go To Copyright © Go To Disclaimer