# Micro-Project: Supersequences

As a search problem, we can consider the problem of
stringing overlapping sequences together to form a larger
sequence. We will refer to this as the *supersequence*
problem. A good place to start is a special case:
superpermutations.

## Superpermutations

There has been a lot of interest lately in the problem of
constructing "superpermutations." The idea is this: given
a set of *n* letters starting at `A`

, find a sequence that
contains all permutations of that set.

For example, if *n=2*, the set will be *{A, B}* and the
possible permutations of that set are the two permutations
`AB`

and `BA`

. Thus, the sequence `ABA`

is a
superpermutation: it has as subsequences both `AB`

and
`BA`

. Of course, the sequence `BAB`

will work as well. Note
that these are shortest superpermutations: one can see that
no shorter sequence can possibly work.

For a superpermutation of size 3, we must construct a
sequence containing all 6 of the permutations of *{A, B,
C}*: these are

```
ABC ACB BAC BCA CAB CBA
```

One such superpermutation is

```
ABCABACBA
```

which can be seen to contain all six permutations

```
ABCABACBA
ABC
BCA
CAB
BAC
ACB
CBA
```

It is less obvious whether this is a shortest superpermutation, although it surely has to be close. It turns out that it is optimal.

For superpermutations up to size 5, there is a simple
recursive algorithm for generating a shortest
superpermutation. Greg Egan's
Superpermutations Page
delves deeply into the highly mathematical theory of
superpermutations, including the discoveries over the last
decade that have sparked new interest in the problem. The
Wikipedia page
describes the 2011 "Haruhi Problem" proof that started it: a
lower bound on minimal superpermutation length published on
*4chan* by an anonymous poster.

This is well-trodden ground, and we won't cover it further.

## Supersequences

A generalization of superpermutations goes like this: as
before, take an alphabet of *n* letters. However, instead of
trying to make our sequence cover all permutations of that
alphabet, we can list a specific set of subsequences that we
would like to cover.

For example, given the alphabet *{A,B,C}* we might try to
cover these three subsequences with a shortest sequence:

```
ABBA BAACC ACAB CB
```

One such supersequence would be

```
ABBACABAACCB
ABBA
ACAB
BAACC
CB
```

A shorter supersequence would be

```
ACABBAACCB
ACAB
ABBA
BAACC
CB
```

There's a lot of elegant math and computer science here, but let's not use any of it. Instead, let's do state-space search to try to find short supersequences covering a given set of subsequences!

## The Input

Inputs will be in a file formatted as follows:

The first line is an integer denoting the alphabet size.

The second line is a known upper bound on the supersequence length. This bound is there for convenience, and can be used or ignored. It is just a bound: there may be a shorter supersequence.

Subsequent lines are subsequences that must be covered.

For our previous example, the input might look like

```
3
10
ABBA
BAACC
ACAB
CB
```

You should run your program against all the instances in http://github.com/pdx-cs-ai/superseq-instances. You may choose to run against additional instances.

## The Output

The output of your program should be a supersequence that covers the given subsequences using the given alphabet, as in the examples above. We are looking for "short" sequences, not necessarily a provably shortest one: you may use complete search or local search. Your program must report a supersequence of some length within 10 seconds of CPU time on your local box. (Note that merely joining the subsequences will produce a supersequence, although it will be longer than necessary.)

## The Program

Your program should be named something like `superseq`

(for
example, `superseq.py`

). It should accept an input instance
file in the specified format as a command-line argument, and
produce the specified output.

## Hints

The best state space I have found seems to be to treat the ordering as the variable, and the subsequence as the value. For DFS, pick a subsequence to go first, then one to go second, etc. For local search, pick an ordering of the subsequences, then swap subsequences in the ordering.

You will probably want a function that determines the amount of overlap between the end of one sequence and the start of another. This should be carefully debugged, as it is a bit tricky and must work correctly.

Your search needs a method to stop after 10 seconds and print the best answer it has found so far. This normally involves getting the start time (in Python use

`time.time()`

) and then checking every so often to see if the current time is more than 10 seconds later. Getting the time is expensive, so you might want to let the thing run for a while before checking again: I've found that checking every 1000 search steps or so is a good compromise.

## Submission

The usual submission rules for programs in this course
apply. Submit a single ZIP archive named
`superseq.zip`

that contains the source code for your
program, a `Makefile`

if the program needs to be compiled,
and a `README.md`

containing a writeup describing your
solution to the problem. Your writeup should include
sequence length and runtime for each of the test instances
given in the test repo
http://github.com/pdx-cs-ai/superseq-instances.