The Construction of Cantor Sets
By Ng Tze Beng
Cantor sets play an important role in real analysis, particularly in furnishing counter examples and exotic or pathological functions. Most frequently, we meet Cantor set of zero measure but the construction is canonical enough to apply to give Cantor set of positive measure. These are subsets of the closed interval [0,1] and of measure greater or equal to 0 but less than 1. For the use of Cantor sets see Composition and Riemann Integrability, Lebesgue Integration and Composition and Change of Variable or Subsitution in Riemann Integration.
The first Cantor set we shall construct is the Cantor set of measure zero. This is the set left over after repeatedly deleting the middle third open intervals.
The Cantor set C0
We shall start from the closed unit interval [0,1]. At the first stage, we delete the middle open interval (1/3, 2/3) from [0, 1]. We shall enumerate the open intervals to be deleted. We denote (1/3, 2/3) by I(1,1). That is I(1,1) = (1/3,2/3). The first construction gives us the remaining 2 closed intervals
[0, 1] - I(1,1) = [0,1] - (1/3,2/3) = [0, 1/3] È[2/3, 1].
Then at the second stage we delete the middle third open interval from each of the closed intervals. Thus there are 2 open intervals to be deleted and they are
I(2,1) = 1/3 I(1, 1) =(1/9, 2/9) and
I(2, 2) = I(2, 1) + 2/3 , the translation of I(2,1) by 2/3.
Hence we are left with 4 = 22 closed intervals, two in [0, 1/3] and two in [2/3, 1]. By rescaling we see that the middle open third intervals of each of these 4 closed intervals are given by
I(3,1) = 1/3 I(2,1), I(3,2) =1/3 I(2,2) and
I(3, 3) = I(3,1) + 2/3, I(3, 4) = I(3,2) +2/3.
Repeating this construction, at the k-th stage, there are 2k-1 open intervals, each of length 1/3k, to be deleted from [0, 1] and for k ³ 3, they are given by
I(k, j) = 1/3 I(k-1, j), j = 1, 2, ¼, 2k-2 ;
I(k, 2k-2+ j) = I(k, j) + 2/3, j = 1, 2, ¼, 2k-2.
Let G be the union of these countable collection of disjoint open intervals,
.
Thus G is open. The Cantor set C0 is defined to be the complement of G in [0, 1]. That is C0 = [0, 1] - G and is therefore a closed subset of [0, 1]. Note that at the k-th stage the total length of the intervals to be deleted is 2k-1 (1/3k). Therefore, the measure of G, since the intervals in G are all disjoint, is given by
Thus the measure of C0 is equal to m([0, 1]) - m(G) = 1-1 = 0.
In order to define the Cantor function, we shall consider another representation of the Cantor set C0 . This will also show that C0 has the same cardinality as [0, 1] and so it is uncountable. We shall make use of the ternary representation of numbers in [0, 1]. There is a unique way of representing real numbers in (0, 1] by non-terminating ternary expansion.
We shall redefine these real numbers as the supremum of a sequence of rational numbers.
Take any 1 ³ x > 0 in [0,1]. Choose the
greatest integer a1 such that a1/3
< x £. a1/3 + 1/3. Consider the rational numbers 0/3, 1/3 , 2/3. Then take the
largest of these which is < x.
That is we choose a1 in {0,1,2} such that
< x £
+
. Thus a1
= 0 Û 0/3 < x £ 1/3, a1 = 1 Û 1/3 < x £ 2/3 and a1 =
2 Û 2/3 < x £ 3/3=1. Let d1
= 0 +
. Then d1 < x £ d1 +
. Now
choose integer a2 in {0,1,2} such that d1 +
<
x £ d1
+ ![]()
. Let
now d2 = d1 +
= 0 +
+
. We have d2 < x £
d2 +
. Continuing like this we shall obtain a sequence a0
, a1 , a2 , ¼, of integers, 0 £ ai £ 2, such
that if dn = dn-1 +
= 0+
+
+ ¼+
then dn < x £ dn +
. The
symbol 0 × a1 a2 a3 ¼ is called the non-terminating ternary
expansion of x . So the set {d1 , d2 , d3
, d4 ¼ } is
bounded above by x. Then the least upper bound
or supremum of this set is x. We deduce
this as follows. If x ¹ sup{d1
, d2 , d3 , d4 ¼ } = M, then x > M . Then by the Archimedean property of the set of
real numbers, there exists a counting number m such that
<
x - M and so M
+
< x. Now take
a non-terminating ternary expansion 0 × c1 c2
c3 ¼ for
. Let L be the first
integer such that cL ¹ 0. Then
. Therefore, M +
< M +
< x. Hence since M = sup{d1
, d2 , d3 , d4 ¼ }, dL +
£ M +
< x and this contradicts x £ dL +
. Hence x = sup{d1
, d2 , d3 , d4 ¼ }.
This representation of the real number x in (0, 1] is unique. Suppose two non-terminating ternary expressions 0× a1 a2 a3 ¼ and 0× b1 b2 b3 ¼ are such that for some integer j ³ 0, aj ¹ bj and ai = bi for i £ j - 1. If 0× a1 a2 a3 ¼ represents x and 0× b1 b2 b3 ¼ represents y we shall show that x ¹ y. Suppose that aj < bj . By the hypothesis we have
0 +
+
+ ¼+
+
< x £ 0
+
+
+ ¼+
+
.
But 0 +
+
+ ¼+
+
= 0 +
+
+ ¼+
+
£ 0 +
+
+ ¼+
+
< y.
Therefore, x < y and so x ¹ y. Similarly if aj >
bj, we can show that x ¹ y. Thus this way of representing any real number in (0, 1] is unique. Let 0 be
represented by the terminating ternary expansion 0.00. Notice that for x =
, the corresponding non-terminating expansion is the
expression 0.022222¼¼ which is a
non-terminating expression with recurring '2'.
is represented by
0.122222....... a non-terminating expression with recurring
'2'. 1 is represented by 0.22222........ a
non-terminating expression with recurring '2'. We are going to change our
representation a little. We are going to use some terminating expansion in our
representation as follows. If the non-terminating expansion of x consists of
exactly one '1' in its expansion, say 0× a1 a2
a3 .... has precisely aj = 1 and that ak
= 2 for k > j ,
then we replace it by the terminating expansion 0× b1
b2 b3 ... bj -1
2 with bk = ak for k £ j-1, bj = 2 and bk
= 0 for k > j, hence this representation
has no '1' in it. Thus we are going to use the following convention: if a number x
in [0, 1] has two ternary expansion, one with no 1's and one with at least one '1', then
it is the one that has no 1's that is to be used. We refer to this as our system of
expansion. We are going to use this representation to describe the set G in [0, 1].
x Î I(1,1) = (1/3, 2/3) Û the non terminating expansion of x has a1 = 1 and a2 and subsequent aj are not all equal to 2 Û the system of expansion used for x has a1 = 1 (because the terminating expansion excluded here has no '1', namely 2/3 which has terminating expansion 0.2 and recurring expansion 0.12 , where the underscore denotes repeating infinitely many times the number underscored. )
Now dividing by 3 has the effect of shifting the expansion by one place to the right and introducing a zero, i.e. it has the effect of moving the ternary point to the left. Thus
x Î I(2,1) = 1/3 I(1,1) = (1/9, 2/9)
Û the system of expansion used for x has a1 = 0 and a2 = 1
Therefore, x Î I(2,2) = I(2,1) + 2/3
Û the system of expansion used for x has a1 = 2 and a2 = 1.
Hence x Î I(2,1)È I(2,2) Û the system of expansion used for x has the first '1' occurring in the second ternary place.
Then since I(3,1) = 1/3 I(2,1) and I(3,2) =1/3 I(2,2), x
Î I(3,1)È I(3,2) Û the system of expansion used for x
has a1 = 0 and has the first '1' occurring in the third
ternary place. Since I(3, 3) = I(3,1) + 2/3 and I(3, 4) = I(3,2)
+2/3, x Î I(3, 3)È I(3, 4) Û the system of expansion used for x
has a1 = 2 and has the first '1' occurring in the third
ternary place. Therefore, x Î I(3,1)È I(3,2)ÈI(3, 3)È I(3, 4) Û the system of expansion used for x has the first
'1' occurring in the third ternary place. Thus by induction we see that
Û the system of expansion used for x
has the first '1' occurring in the k-th ternary place.
Therefore,
Û the system of
expansion used for x has at least one '1'. Hence x Î C0 = [0, 1] - G
Û the system of expansion used
for x has no 1's. We have thus proved the following:
Lemma 1. C0 consists of those numbers in [0, 1] whose representation in the above system of ternary expansion has no 1's. Therefore, we can write for any x in C0
,
where bk = 0 or 1.
Lemma 2. C0 is uncountable.
Proof. Define g : C0 ®[0, 1] by
, where
. Then since every real number y in
[0, 1] has a non-terminating binary representation of the form
or y = 0, we can
take x in C0 to be
or 0 if y = 0 and so g(x)
= y and thus g is surjective. Since [0, 1] is uncountable, C0 is
uncountable.
Next we would like to extend the function to all of [0, 1]. Let I(i, j) be denoted by the open interval (a(i, j), b(i, j)). Then {a(i, j), b(i, j)} Í C0 . We shall show that g(a(i, j)) = g(b(i, j)). It is easily seen that in our system of ternary representation
Hence
. We define for x in I(i, j), g(x) =
g(b(i, j)) = g(a(i, j)) . We shall
next show that g is a non decreasing function, that is g is a monotonic increasing
function (not necessarily strictly increasing).
Lemma 3. Represent the real numbers in [0, 1] by non-terminating ternary
expansions except for 0. In this representation suppose
and
. Then
1. x = y Û ai = bi for all i ³1.
2. x < y Û a1 < b1 or there exists integer k ³ 2 such that ai = bi for all 1 £ i £ k -1 and ak < bk.
Proof. Part 1 follows from the uniqueness of the representation of the numbers either by the non terminating ternary expansion or by the system of representation used above.
Suppose a1 < b1. Then
. Similarly, if there
exists integer k ³ 2 such
that ai = bi for
all 1 £ i £ k -1 and ak < bk, then
Now if x < y,
then y - x > 0. Then by the Archimedean
property of the real number system , there exists an integer k ³ 1 such that
. Then
Therefore,
. Then a1 £ b1 . For if a1
> b1,
then
and so
contradicting
. Hence either a1
< b1 or a1 = b1. If a1 = b1, then
Therefore, we conclude again that a2 £ b2. If a2
< b2,
then we get the conclusion of the Lemma. Because
ai = bi for 1
£ i < j and aj < bj . This completes the proof.Next we shall see how our system of representation also gives the same conclusion. Observe that the convention used in our system of representation deviates from a non-terminating expansion only if the number is zero or has two possible ternary expansions with one of them involving no 1's. Thus if y has a terminating expansion either y = 0 or the terminating expansions has no 1's and ends in the number '2'. Thus if x < y , and y has a terminating ternary expansion,
Then we can write
Thus if
, x < y implies that
there exists integer k ³ 1
such that ai = ci
for all 1 £ i £ k -1 and ak < ck or a1 < c1, if k = 1 . If k £ l -1, then we have ai
= ci = bi' for all 1 £ i £ k -1 and ak < bk'. If k
= l , then we have ai
= bi' for all 1 £ i £ l -1 and al < cl = 1<
2 = bl'. If k > l , then we have ai
= bi' for all 1 £ i £ l -1 and al = cl
= 1 < 2 = bl'. Hence in
any case we obtain that x < y implies that there exists integer l ³ k > 1 such that ai
= bi' for all 1 £ i £ k -1 and ak < bk' or a1 < b1' .
Now if x has the terminating ternary expansion
, we can write it as
Hence
by what we have just proved, there exists integer l ³ k
> 1 such that di = bi' for all 1 £ i £ k -1 and dk < bk' or d1 < b1' . If p =1, then d1 = 1 and if d1
< b1'
, b1' =2 and l must be
greater than 1, otherwise x = y. Thus if p =1, then a1' = 2 =b1'
and there exists an integer l ³ k > 1 such that ai = bi' for all 1 £ i £ k -1 and ak '< bk' . We
now assume that p > 1. If k < p, then ai'= di = bi'
for all 1 £ i £ k -1 and ak'=
dk < bk'. If If k =
p, then ai'= di = bi'
for all 1 £ i £ k -1 and dk = dp = 1 < bp'.
Hence bp' = 2 and so ap'=2 = bp'. Then it
follows that l > p and that there exists q such that p < q £ l and ai'
= bi' for all 1 £ i £ q -1 and aq < bq'. Now
we shall see that k £ p.
if k > p, then we have ai'= di = bi'
for all 1 £ i £ k -1 and dk < bk'.
Therefore, ai'= di = bi'
for 1 £ i £ p -1, 1 = dp = bp' , hence
contradicting that bp' is even. Hence we have shown that the same
conclusion for Lemma 3 is true also for our system of representation of real numbers in
[0, 1] by ternary expansion. The converse of the statement is obvious.
We summarise what we have proved in the following lemma.
Lemma 4. Represent the real numbers in [0, 1] by the system of ternary expansion
as described above. In this representation suppose
and
. Then
1. x = y Û ai' = bi' for all i ³1.
2. x < y Û a1' < b1' or there exists integer k ³ 2 such that ai' = bi' for all 1 £ i £ k -1 and ak' < bk'.
Now we are ready to investigate the monotonicity of the function g:[0, 1] ®[0, 1]. We have shown that g is a surjective map.
Proposition 5. The map defined previously
g:[0, 1] ®[0, 1] is a bounded surjective monotonic increasing map. That is to say x < y Þ g(x) £ g(y).Proof. We have already seen that g is surjective. It is obviously bounded. Now suppose x and y are in C0 and that x < y. Then x and y have the representation:
a1 = 0 and b1 =1 or there exists integer k ³ 2 such that ai = bi for all 1 £ i £ k -1 and ak = 0 < bk = 1.
Note that
and
. Hence if a1 = 0 and b1 =1,
then
. If there exists integer k ³ 2 such that ai
= bi for all 1 £ i £ k -1 and ak = 0 < bk = 1,
then
Hence if x < y and x and y are in C0, then g(x) £ g(y). Suppose now that x < y, x Ï C0 and y is in C0. Then x Î I(i, j) = (a(i, j), b(i, j)) for some (i, j). Obviously x < y implies that a(i, j) < y and so since both a(i, j) and y are in C0 by what we have just proved, g(a(i, j)) £ g(y). Therefore, by definition of g, g(x) = g(a(i, j)) £ g(y).
Similarly, if x < y, y Ï C0 and x is in C0, then y Î I(i, j) = (a(i, j), b(i, j)) for some (i, j) and x < b(i, j). Again since both x and b(i, j) are in C0, we have g(x) £ g(b(i, j) ) = g(y).
Suppose now x < y, x, y Ï C0. Then for some (i, j) and (i', j') , x Î I(i, j), y Î I(i', j'). Then a(i, j) < x < y < b(i', j'). Therefore, since both a(i, j) and b(i', j') are in C0, g(a(i, j)) £ g(b(i', j') and so g(x) = g(a(i, j)) £ g(b(i', j') = g(y). Hence g is monotonically increasing. This completes the proof.
Next we shall state a general result concerning bounded monotone function whose range is an interval.
Theorem 6. Suppose f : [a, b] ®R is an increasing function. Then f is continuous if and only if the range of f , J = f ([a, b]) is an interval.
Proof. If f is continuous, then by the Intermediate Value Theorem, the range f ([a, b]) is an interval. Now if f is increasing, then by Theorem 2 of Monotone Function, Function of Bounded Variation, Fundamental Theorem of Calculus, the discontinuity of f can only be jump discontinuity. Suppose the range J is an interval. Suppose f is discontinuous at x = k in (a, b). Then we have
Note that for any y < k, because f is increasing
. Also
for any z > k,
. Therefore, ( f ( [a, k)È(k, b]))Ç( f (k -
), f (k +
)) =Æ. But by assumption
the range J is an interval and so ( f (k
- ), f (k
+ )) Í J and so ( f ( [a, k)È(k, b]))Ç( f (k -
), f (k +
)) = (J - { f (k)})Ç( f (k - ), f (k
+ ))¹ Æ. This contradicts
( f ( [a, k)È(k, b]))Ç( f (k - ), f (k
+ )) =Æ and so we have f (k - )
= f (k +
) = f (k) and so
and that means f is continuous at x = k. We
can similarly derive a contradiction when k = a or b. If f is not
continuous at k = a, then f (a) < f (a+)
and so ( f (a) , f (a+))ÇJ = Æ, contradicting ( f (a) , f (a+)) Í J. Thus f must be continuous at x = a. If f
is not continuous at k = b, then f (b-)
< f (b) and so ( f (b-)
, f (b))ÇJ = Æ, contradicting ( f (b-) , f (b))
Í J. Thus f must be
continuous at x = b. Thus f cannot have any discontinuity and so it is
continuous. This completes the proof.
Proposition 7. The function
g:[0, 1] ®[0, 1] is continuous.Proof. By Proposition 5, the map g is onto and increasing. Since the range [0, 1] is an interval, by Theorem 6, g is continuous.
Next we shall reveal some interesting facts concerning the Cantor set C0 .
Theorem 8. The Cantor set C0 is
(1) compact,
(2) no where dense, i.e., it contains no open intervals,
(3) its own boundary points,
(4) perfect, i.e.,it is its own set of accumulation point,
(5) totally disconnected and
(6) between any two points in C0 , there is an open interval not contained in C0.
Proof. (1) Since C0 is closed and bounded, it is compact by the Heine-Borel Theorem.
(2) Let
(3). C0 is closed, by a characterization of closed set, because C0 = [0, 1]
- G =(4) Since C0 is closed the set of accumulation points of C0, C0' is a subset of C0. We claim that C0' = C0 . Let x
Î C0 . Suppose x Ï C0' . Then there exists d > 0, such that ( x - d, x + d) Ç C0 = {x}. Thus ( x - d, x + d)Ç([0, 1] - C0) =( x - d, x) È (x, x + d). That means (x - d, x) Í ([0, 1] - C0) =(x
- d, x) =Since the collection
(x, x
+d) =We deduce as before then that for some p and q , (x
, x + d, ) Í I(p, q). Thus because x Ï I(p, q), x = inf I(p, q) = a(p, q). Therefore, a(p, q) = x = b(i, j), contradicting that a(p, q) ¹ b(i, j) by virtue of the definition of I(i, j)'s . Therefore x Î C0' . Thus C0' = C0 .(5) Since by (2) C0 does not contain any open interval and since the connected subsets of R are either the singleton sets or the intervals, the only components of C0 are the singletons {x}, x
Î C0. Therefore, C0 is totally disconnected.(6) Suppose x < y and x , y are in C0. Then (x, y)
Ç([0, 1] - C0) ¹ Æ and is a disjoint union of open intervals and so (x, y) contains at least one open interval.The Cantor Set Ck
Now we turn our attention towards constructing a Cantor set of positive measure in [0,
1]. Indeed the construction also applies to give a different Cantor set of measure zero.
The procedure of deleting the middle portion of each of the remaining closed sets is
followed here but using a different specified length. (This idea can be generalised by not
requiring the intervals to be deleted to be the middle portion, to give a generalised
Cantor set.) Hence the construction is canonical. Let k be any real number with
0 £ k < 1. Let d = (1 - k) > 0. Start with the closed unit interval [0, 1]. Let I(1,1)
be the middle open interval of length d/2. Then if we write I(1,1) = (a(1,1), b(1,1)),
then a(1,1) = 1/2 - d/4.
Then let G1 = I(1,1) and F1 = [0,1] - G1. The first stage
is to form F1 . Then F1 is the disjoint union of 2
closed intervals each of length equal to a(1,1) = 1/2 -
d/4. Let I(2,1) and I(2,2) be
respectively the middle open intervals, each of length d/23, of the 2 closed intervals of F1 .
The open intervals are ordered from the left to the right by the second indices. Let G2
= I(1,1)ÈI(2,1)ÈI(2,2). Suppose I(2,1) =
(a(2,1), b(2,1)). Then a(2,1) = 1/2 a(1,1) - d/24 = 1/22 -d(1/22 - 1/24). The second stage is
to form F2 = [0,1] - G2 = [0,1] - G1 - I(2,1)ÈI(2,2). Then F2 is the disjoint union of 22
closed intervals each of length equaling a(2,1) = 1/22 -d(1/22 - 1/24). Then let I(3,
j), j =1,..,22 be respectively the middle portion of length d/25 of the 22
closed intervals again ordered by the second indices, from left to right in the sense that
I(3, j) < I(3, k) if and only if j < k if
and only if there exist x in I(3, j) and y in I(3, k)
such that x < y. Stage three is to remove from F2 these
22 open intervals, that is to form F3 = F2 - È{ I(3, j), j =1,..,22 } = [0, 1] - G3 , where
. Thus F3
consists of 23 disjoint closed intervals, each of length equaling a(3,1), where I(3,1)
= (a(3,1), b(3,1)). It is easily seen that a(3,1) = 1/23 -d(1/23 - 1/26). Let
, where I(4,
j), j =1,..,23 are the middle open intervals each of length d/27, of each of the
disjoint closed intervals. Continuing in this way, at the n-th
stage we have
, where I(n, j)
= (a(n, j), b(n, j)) is an open interval of
length d/22n-1 and the I(n, j)'s
are ordered from left to right by the second indices according to the natural ordering of
elements in the intervals. Note that a(n,1) = 1/2 a(n-1,1)-d(1/22n) =1/2n -d(1/2n-1/22n). We
obtain Fn by deleting from Fn-1 the uion of open intervals Hn. Hence
Fn = Fn-1- Hn = [0,1] - Gn , where
and Fn is
a disjoint union of 2n closed intervals each of length equaling a(n,1) = 1/2n -d(1/2n-1/22n). To
obtained Fn+1 , we shall delete from Fn open intervals
each of length equaling d/2
2n+1 from the 2n closed
intervals. That is if we let
, where I(n+1, j), j =1,..,2n are these open intervals to be
deleted, then Fn+1 = Fn - Hn+1 =[0, 1] - Gn+1 ,
where
. Obviously Fn+1 is a
collection of disjoint closed intervals, each of length equal to a(n+1,1). Note also that for each n ³ 1, Fn+1 Í Fn.
Let
. Then G is a disjoint union of open intervals. The length or measure
of Hk is given by
. Thus the measure or length of G,
The Cantor set Ck is defined to be the complement of G in [0,
1]. That is,
. Thus the measure of Ck is equal to m([0,1]) - m(G)
= 1 - d = k.
Then Ck satisfies the properties stated in Theorem 8.
Theorem 9. The Cantor set Ck defined above for 0
£ k < 1 is uncountable and(1) compact,
(2) no where dense, i.e., it contains no open intervals,
(3) its own boundary points,
(4) perfect, i.e., is its own set of accumulation point,
(5) totally disconneceted and
(6) between any two points in Ck , there is an open interval not contained in Ck.
Proof. Part (1) and part (3) to part(6) are proved in exactly the same way as in
Theorem 8. Part (2) needs a different approach to prove. Suppose Ck
contains an open interval say (c, d). Then
implies that (c, d)Í Fk for each k
³ 1. Now since
,
there exists an integer N such that k > N implies that
Now we fix a k > N. Note first that (c,
d)ÇFk =(c, d). Since the
non-trivial interval (c, d) is connected and Fk is a
disjoint union of closed intervals, (c, d) must be contained in one of these
closed intervals, each of length
. Hence,
contradicting
Therefore, Ck does not contain an open interval. (The main thrust of the
argument is that the Fk consists of disjoint closed intervals, whose
maximum length tends to zero as k tends to infinity.) This proves part (2). That Ck
is uncountable is a consequence of the following proposition.
Proposition 10. There is a continuous strictly increasing bijective map f : [0, 1] ® [0,1] mapping the Cantor set Ck onto the the Cantor set C0 defined earlier using the "deleted middle third intervals" construction.
Proof. This is given by Lemma 1 of Composition and Riemann Integrability. Note that the proof given there applies also to the case k = 0.
Since C0 constructed using the "deleted middle third intervals" construction is uncountable and the function f given by Proposition 10 maps Ck bijectively onto C0, Ck is also uncountable.
ã Ng Tze Beng 2001