Stuck on a simple proof about regular expressions

Question

I'm trying to formalize some properties on regular expressions (REs) using Coq. But, I've got some troubles to prove a rather simple property:

For all strings s, if s is in the language of (epsilon)* RE, then s = "", where epsilon and * denotes the empty string RE and Kleene star operation.

This seems to be an obvious application of induction / inversion tactics, but I couldn't make it work.

The minimal working code with the problematic lemma is in the following gist. Any tip on how should I proceed will be appreciated.

EDIT: One of my tries was something like:


forall s : string, s <<- (#1 ^*) -> s = ""

forall s : string, s <<- (#1 ^*) -> s = ""
s: string
H: s <<- (#1 ^*)

s = ""
s: string
H0: s <<- #1 :+: #1 @ (#1 ^*)

s = ""
s: string
H: s <<- #1

s = ""
s: string
H: s <<- #1 @ (#1 ^*)
s = ""
s: string
H: s <<- #1

s = ""
now inversion_clear H.
s: string
H: s <<- #1 @ (#1 ^*)

s = ""
s, s0, s': string
H0: s0 <<- #1
H1: s' <<- (#1 ^*)
H2: s = s0 ++ s'

s = ""
s, s0, s': string
H0: s0 <<- #1
H1: s' <<- (#1 ^*)

s0 ++ s' = ""
s0, s': string
H0: s0 <<- #1
H1: s' <<- (#1 ^*)

s0 ++ s' = ""
s0, s': string
H1: s' <<- (#1 ^*)

"" ++ s' = ""
s': string
H1: s' <<- (#1 ^*)

"" ++ s' = ""
s': string
H1: s' <<- (#1 ^*)

s' = ""

that leave me with the following goal:

1 subgoal s' : string H1 : s' <<- (#1 ^*) ============================ s' = ""

At least to me, it appears that using induction would finish the proof, since I could use H1 in induction hypothesis to finish the proof, but when I start the proof using

    

"" = ""
c: ascii
String c "" = ""
e, e': regex
s, s', s1: string
H: s <<- e
H0: s' <<- e'
H1: s1 = s ++ s'
IHin_regex1: s = ""
IHin_regex2: s' = ""
s1 = ""
s: string
e, e': regex
H: s <<- e
IHin_regex: s = ""
s = ""
s': string
e, e': regex
H: s' <<- e'
IHin_regex: s' = ""
s' = ""
s: string
e: regex
H: s <<- #1 :+: e @ (e ^*)
IHin_regex: s = ""
s = ""

instead of

    
s: string
H0: s <<- #1 :+: #1 @ (#1 ^*)

s = ""

I got some (at least for me) senseless goals. In Idris / Agda, such proof just follows by pattern matching and recursion over the structure of s <<- (#1 ^*). My point is how to do such recursion in Coq.

Answer (Anton Trunov)

Here is one possible solution of the original problem:


forall s : string, s <<- (#1 ^*) -> s = ""

forall s : string, s <<- (#1 ^*) -> s = ""
star_lemma: forall s : string, s <<- (#1 ^*) -> s = ""
s: string
prf: s <<- (#1 ^*)

s = ""
star_lemma: forall s : string, s <<- (#1 ^*) -> s = ""
s: string
H: s <<- #1 :+: #1 @ (#1 ^*)

s = ""
star_lemma: forall s : string, s <<- (#1 ^*) -> s = ""
s: string
H0: s <<- #1

s = ""
star_lemma: forall s : string, s <<- (#1 ^*) -> s = ""
s: string
H0: s <<- #1 @ (#1 ^*)
s = ""
star_lemma: forall s : string, s <<- (#1 ^*) -> s = ""
s: string
H0: s <<- #1

s = ""
now inversion H0.
star_lemma: forall s : string, s <<- (#1 ^*) -> s = ""
s: string
H0: s <<- #1 @ (#1 ^*)

s = ""
star_lemma: forall s : string, s <<- (#1 ^*) -> s = ""
s0, s': string
H: s0 <<- #1
H1: s' <<- (#1 ^*)

s0 ++ s' = ""
star_lemma: forall s : string, s <<- (#1 ^*) -> s = ""
s0, s': string
H1: s' <<- (#1 ^*)

"" ++ s' = ""
star_lemma: forall s : string, s <<- (#1 ^*) -> s = ""
s0, s': string
H1: s' <<- (#1 ^*)

"" ++ "" = ""
reflexivity. Qed.

The main idea is to introduce a term in the context which will resemble the recursive call in a typical Idris proof. The approaches with remember and dependent induction don't work well (without modifications of in_regex) because they introduce impossible to satisfy equations as induction hypotheses' premises.

Note: it can take a while to check this lemma (around 40 seconds on my machine under Coq 8.5pl3). I think it's due to the fact that the inversion tactic tends to generate big proof terms.


A: Nice! I really don't see how to prove it without using an hand-crafted fixpoint. Following your example, I factored the proof through a star_unfold lemma which says that if s <<- (e ^*) then exists n, s <<- ntimes n e. In the case of a full-blown library, this should isolate the expensive check to star_unfold alone given that later proofs can simply use induction on a natural number.

A: Very good point! Please consider making your comment into an answer -- I hear comments are not very reliable on SO. You could've also used Nat.iter instead of ntimes to make the code shorter: exists n, s <<- Nat.iter n (Cat e) #1. (But maybe it hurts readability a bit).

Answer (eponier)

This problem has obsessed me for a week, and I have finally found a solution that I find elegant.

I had already read that when an induction principle does not fit your needs, you can write and prove another one, more adapted to your problem. That is what I have done in this case. What we would want is the one obtained when using the more natural definition given in this answer. By doing this, we can keep the same definition (if changing it implies too many changes, for example) and reason about it more easily.

Here is the proof of the induction principle (I use a section to specify precisely the implicit arguments, since otherwise I observe strange behaviours with them, but the section mechanism is not necessary at all here).

Section induction_principle.

Context (P : string -> regex -> Prop)
  (H_InEps : P "" #1)
  (H_InChr : forall c, P (String c "") ($ c))
  (H_InCat : forall {e e' s s' s1}, s <<- e -> P s e -> s' <<- e' ->
    P s' e' -> s1 = s ++ s' -> P s1 (e @ e'))
  (H_InLeft : forall {s e e'}, s <<- e -> P s e -> P s (e :+: e'))
  (H_InRight : forall {s' e e'}, s' <<- e' -> P s' e' -> P s' (e :+: e'))
  (H_InStar_Eps : forall e, P "" (e ^*))
  (H_InStar_Cat : forall {s1 s2 e}, s1 <<- e -> s2 <<- (e ^*) ->
    P s1 e -> P s2 (e ^*) -> P (s1 ++ s2) (e ^*)).

Arguments H_InCat {_ _ _ _ _} _ _ _ _ _.
Arguments H_InLeft {_ _ _} _ _.
Arguments H_InRight {_ _ _} _ _.
Arguments H_InStar_Cat {_ _ _} _ _ _ _.

P: string -> regex -> Prop
H_InEps: P "" #1
H_InChr: forall c : ascii, P (String c "") ($ c)
H_InCat: forall (e e' : regex) (s s' s1 : string), s <<- e -> P s e -> s' <<- e' -> P s' e' -> s1 = s ++ s' -> P s1 (e @ e')
H_InLeft: forall (s : string) (e e' : regex), s <<- e -> P s e -> P s (e :+: e')
H_InRight: forall (s' : string) (e e' : regex), s' <<- e' -> P s' e' -> P s' (e :+: e')
H_InStar_Eps: forall e : regex, P "" (e ^*)
H_InStar_Cat: forall (s1 s2 : string) (e : regex), s1 <<- e -> s2 <<- (e ^*) -> P s1 e -> P s2 (e ^*) -> P (s1 ++ s2) (e ^*)

forall (s : string) (r : regex), s <<- r -> P s r
P: string -> regex -> Prop
H_InEps: P "" #1
H_InChr: forall c : ascii, P (String c "") ($ c)
H_InCat: forall (e e' : regex) (s s' s1 : string), s <<- e -> P s e -> s' <<- e' -> P s' e' -> s1 = s ++ s' -> P s1 (e @ e')
H_InLeft: forall (s : string) (e e' : regex), s <<- e -> P s e -> P s (e :+: e')
H_InRight: forall (s' : string) (e e' : regex), s' <<- e' -> P s' e' -> P s' (e :+: e')
H_InStar_Eps: forall e : regex, P "" (e ^*)
H_InStar_Cat: forall (s1 s2 : string) (e : regex), s1 <<- e -> s2 <<- (e ^*) -> P s1 e -> P s2 (e ^*) -> P (s1 ++ s2) (e ^*)

forall (s : string) (r : regex), s <<- r -> P s r
P: string -> regex -> Prop
H_InEps: P "" #1
H_InChr: forall c : ascii, P (String c "") ($ c)
H_InCat: forall (e e' : regex) (s s' s1 : string), s <<- e -> P s e -> s' <<- e' -> P s' e' -> s1 = s ++ s' -> P s1 (e @ e')
H_InLeft: forall (s : string) (e e' : regex), s <<- e -> P s e -> P s (e :+: e')
H_InRight: forall (s' : string) (e e' : regex), s' <<- e' -> P s' e' -> P s' (e :+: e')
H_InStar_Eps: forall e : regex, P "" (e ^*)
H_InStar_Cat: forall (s1 s2 : string) (e : regex), s1 <<- e -> s2 <<- (e ^*) -> P s1 e -> P s2 (e ^*) -> P (s1 ++ s2) (e ^*)
in_regex_ind2: forall (s : string) (r : regex), s <<- r -> P s r
s: string
r: regex
prf0: s <<- r
s0: string
r0: regex
prf: s0 <<- #1 :+: r0 @ (r0 ^*)

P s0 (r0 ^*)
P: string -> regex -> Prop
H_InEps: P "" #1
H_InChr: forall c : ascii, P (String c "") ($ c)
H_InCat: forall (e e' : regex) (s s' s1 : string), s <<- e -> P s e -> s' <<- e' -> P s' e' -> s1 = s ++ s' -> P s1 (e @ e')
H_InLeft: forall (s : string) (e e' : regex), s <<- e -> P s e -> P s (e :+: e')
H_InRight: forall (s' : string) (e e' : regex), s' <<- e' -> P s' e' -> P s' (e :+: e')
H_InStar_Eps: forall e : regex, P "" (e ^*)
H_InStar_Cat: forall (s1 s2 : string) (e : regex), s1 <<- e -> s2 <<- (e ^*) -> P s1 e -> P s2 (e ^*) -> P (s1 ++ s2) (e ^*)
in_regex_ind2: forall (s : string) (r : regex), s <<- r -> P s r
s: string
r: regex
prf0: s <<- r
s0: string
r0: regex
prf: s0 <<- #1 :+: r0 @ (r0 ^*)
H1: s0 <<- #1

P s0 (r0 ^*)
P: string -> regex -> Prop
H_InEps: P "" #1
H_InChr: forall c : ascii, P (String c "") ($ c)
H_InCat: forall (e e' : regex) (s s' s1 : string), s <<- e -> P s e -> s' <<- e' -> P s' e' -> s1 = s ++ s' -> P s1 (e @ e')
H_InLeft: forall (s : string) (e e' : regex), s <<- e -> P s e -> P s (e :+: e')
H_InRight: forall (s' : string) (e e' : regex), s' <<- e' -> P s' e' -> P s' (e :+: e')
H_InStar_Eps: forall e : regex, P "" (e ^*)
H_InStar_Cat: forall (s1 s2 : string) (e : regex), s1 <<- e -> s2 <<- (e ^*) -> P s1 e -> P s2 (e ^*) -> P (s1 ++ s2) (e ^*)
in_regex_ind2: forall (s : string) (r : regex), s <<- r -> P s r
s: string
r: regex
prf0: s <<- r
s0: string
r0: regex
prf: s0 <<- #1 :+: r0 @ (r0 ^*)
H1: s0 <<- r0 @ (r0 ^*)
P s0 (r0 ^*)
P: string -> regex -> Prop
H_InEps: P "" #1
H_InChr: forall c : ascii, P (String c "") ($ c)
H_InCat: forall (e e' : regex) (s s' s1 : string), s <<- e -> P s e -> s' <<- e' -> P s' e' -> s1 = s ++ s' -> P s1 (e @ e')
H_InLeft: forall (s : string) (e e' : regex), s <<- e -> P s e -> P s (e :+: e')
H_InRight: forall (s' : string) (e e' : regex), s' <<- e' -> P s' e' -> P s' (e :+: e')
H_InStar_Eps: forall e : regex, P "" (e ^*)
H_InStar_Cat: forall (s1 s2 : string) (e : regex), s1 <<- e -> s2 <<- (e ^*) -> P s1 e -> P s2 (e ^*) -> P (s1 ++ s2) (e ^*)
in_regex_ind2: forall (s : string) (r : regex), s <<- r -> P s r
s: string
r: regex
prf0: s <<- r
s0: string
r0: regex
prf: s0 <<- #1 :+: r0 @ (r0 ^*)
H1: s0 <<- #1

P s0 (r0 ^*)
P: string -> regex -> Prop
H_InEps: P "" #1
H_InChr: forall c : ascii, P (String c "") ($ c)
H_InCat: forall (e e' : regex) (s s' s1 : string), s <<- e -> P s e -> s' <<- e' -> P s' e' -> s1 = s ++ s' -> P s1 (e @ e')
H_InLeft: forall (s : string) (e e' : regex), s <<- e -> P s e -> P s (e :+: e')
H_InRight: forall (s' : string) (e e' : regex), s' <<- e' -> P s' e' -> P s' (e :+: e')
H_InStar_Eps: forall e : regex, P "" (e ^*)
H_InStar_Cat: forall (s1 s2 : string) (e : regex), s1 <<- e -> s2 <<- (e ^*) -> P s1 e -> P s2 (e ^*) -> P (s1 ++ s2) (e ^*)
in_regex_ind2: forall (s : string) (r : regex), s <<- r -> P s r
s: string
r: regex
prf0: s <<- r
s0: string
r0: regex
prf: s0 <<- #1 :+: r0 @ (r0 ^*)
H1: s0 <<- #1
H: "" = s0

P "" (r0 ^*)
apply H_InStar_Eps.
P: string -> regex -> Prop
H_InEps: P "" #1
H_InChr: forall c : ascii, P (String c "") ($ c)
H_InCat: forall (e e' : regex) (s s' s1 : string), s <<- e -> P s e -> s' <<- e' -> P s' e' -> s1 = s ++ s' -> P s1 (e @ e')
H_InLeft: forall (s : string) (e e' : regex), s <<- e -> P s e -> P s (e :+: e')
H_InRight: forall (s' : string) (e e' : regex), s' <<- e' -> P s' e' -> P s' (e :+: e')
H_InStar_Eps: forall e : regex, P "" (e ^*)
H_InStar_Cat: forall (s1 s2 : string) (e : regex), s1 <<- e -> s2 <<- (e ^*) -> P s1 e -> P s2 (e ^*) -> P (s1 ++ s2) (e ^*)
in_regex_ind2: forall (s : string) (r : regex), s <<- r -> P s r
s: string
r: regex
prf0: s <<- r
s0: string
r0: regex
prf: s0 <<- #1 :+: r0 @ (r0 ^*)
H1: s0 <<- r0 @ (r0 ^*)

P s0 (r0 ^*)
P: string -> regex -> Prop
H_InEps: P "" #1
H_InChr: forall c : ascii, P (String c "") ($ c)
H_InCat: forall (e e' : regex) (s s' s1 : string), s <<- e -> P s e -> s' <<- e' -> P s' e' -> s1 = s ++ s' -> P s1 (e @ e')
H_InLeft: forall (s : string) (e e' : regex), s <<- e -> P s e -> P s (e :+: e')
H_InRight: forall (s' : string) (e e' : regex), s' <<- e' -> P s' e' -> P s' (e :+: e')
H_InStar_Eps: forall e : regex, P "" (e ^*)
H_InStar_Cat: forall (s1 s2 : string) (e : regex), s1 <<- e -> s2 <<- (e ^*) -> P s1 e -> P s2 (e ^*) -> P (s1 ++ s2) (e ^*)
in_regex_ind2: forall (s : string) (r : regex), s <<- r -> P s r
s: string
r: regex
prf0: s <<- r
r0: regex
s1, s': string
H1: (s1 ++ s') <<- r0 @ (r0 ^*)
prf: (s1 ++ s') <<- #1 :+: r0 @ (r0 ^*)
H2: s1 <<- r0
H4: s' <<- (r0 ^*)

P (s1 ++ s') (r0 ^*)
apply H_InStar_Cat; try assumption; apply in_regex_ind2; assumption. Qed. End induction_principle.

And it turned out that the Qed of this proof was not instantaneous (probably due to inversion producing large terms as in this answer), but took less than 1s (maybe because the lemma is more abstract).

The star_lemma becomes nearly trivial to prove (as soon as we know the remember trick), as with the natural definition.


forall s : string, s <<- (#1 ^*) -> s = ""

forall s : string, s <<- (#1 ^*) -> s = ""
s: string
H: s <<- (#1 ^*)

s = ""
s: string
r: regex
Heqr: r = #1 ^*
H: s <<- r

s = ""
e: regex
Heqr: e ^* = #1 ^*

"" = ""
e: regex
Heqr: e ^* = #1 ^*
s1, s2: string
H: s1 <<- e
H0: s2 <<- (e ^*)
IHin_regex1: e = #1 ^* -> s1 = ""
IHin_regex2: e ^* = #1 ^* -> s2 = ""
s1 ++ s2 = ""
e: regex
Heqr: e ^* = #1 ^*

"" = ""
reflexivity.
e: regex
Heqr: e ^* = #1 ^*
s1, s2: string
H: s1 <<- e
H0: s2 <<- (e ^*)
IHin_regex1: e = #1 ^* -> s1 = ""
IHin_regex2: e ^* = #1 ^* -> s2 = ""

s1 ++ s2 = ""
Heqr: #1 ^* = #1 ^*
s1, s2: string
IHin_regex2: #1 ^* = #1 ^* -> s2 = ""
IHin_regex1: #1 = #1 ^* -> s1 = ""
H0: s2 <<- (#1 ^*)
H: s1 <<- #1

s1 ++ s2 = ""
Heqr: #1 ^* = #1 ^*
s1, s2: string
IHin_regex2: #1 ^* = #1 ^* -> s2 = ""
IHin_regex1: #1 = #1 ^* -> s1 = ""
H0: s2 <<- (#1 ^*)
H: s1 <<- #1
H1: "" = s1

"" ++ s2 = ""
Heqr: #1 ^* = #1 ^*
s1, s2: string
IHin_regex2: #1 ^* = #1 ^* -> s2 = ""
IHin_regex1: #1 = #1 ^* -> s1 = ""
H0: s2 <<- (#1 ^*)
H: s1 <<- #1
H1: "" = s1

"" ++ "" = ""
reflexivity. Qed.

Answer (Zimm i48)

I modified a bit the definition of your in_regex predicate:

Inductive in_regex : string -> regex -> Prop :=
| InEps : "" <<- #1
| InChr : forall c, (String c EmptyString) <<- ($ c)
| InCat : forall e e' s s' s1,
    s <<- e -> s' <<- e' -> s1 = s ++ s' -> s1 <<- (e @ e')
| InLeft : forall s e e', s <<- e -> s <<- (e :+: e')
| InRight : forall s' e e',  s' <<- e' -> s' <<- (e :+: e')
| InStarLeft : forall e, "" <<- (e ^*)
| InStarRight : forall s s' e, s <<- e -> s' <<- (e ^*) -> (s ++ s') <<- (e ^*)
where "s '<<-' e" := (in_regex s e).

and could prove your lemma:


forall s : string, s <<- (#1 ^*) -> s = ""

forall s : string, s <<- (#1 ^*) -> s = ""
s: string
H: s <<- (#1 ^*)

s = ""
s: string
r: regex
Heqr: r = #1 ^*
H: s <<- r

s = ""
e: regex
s, s': string
H: s <<- e
H0: s' <<- (e ^*)
IHin_regex1: e = #1 ^* -> s = ""
IHin_regex2: e ^* = #1 ^* -> s' = ""
H2: e = #1

s ++ s' = ""
s, s': string
IHin_regex2: #1 ^* = #1 ^* -> s' = ""
IHin_regex1: #1 = #1 ^* -> s = ""
H0: s' <<- (#1 ^*)
H: s <<- #1

s ++ s' = ""
s, s': string
IHin_regex2: #1 ^* = #1 ^* -> s' = ""
IHin_regex1: #1 = #1 ^* -> s = ""
H0: s' <<- (#1 ^*)
H: s <<- #1

s ++ "" = ""
inversion H; trivial. Qed.

Some explanations are necessary.

  1. I did an induction on H. The reasoning is: if I have a proof of s <<- (#1 ^*) then this proof must have the following form...

  2. The tactic remember create a new hypothesis Heqr which, combined with inversion will help get rid of cases which cannot possibly give this proof (in fact all the cases minus the ones where ^* is in the conclusion).

  3. Unfortunately, this path of reasoning does not work with the definition you had for the in_regex predicate because it will create an unsatisfiable condition to the induction hypothesis. That's why I modified your inductive predicate as well.

  4. The modified inductive tries to give a more basic definition of being in (e ^*). Semantically, I think this is equivalent.

I would be interested to read a proof on the original problem.


A: The proof of the original problem is here :).