This project breaks substitution ciphers by combining Markov chain analysis with an evolutionary algorithm. It uses:
- Markov chain bigram frequency analysis as a fitness heuristic:
  - Evaluates decryption quality with bigram frequency probabilities
  - Scores each candidate by the likelihood of its character sequences
  - Derives the probabilities from a reference text corpus
- An evolutionary algorithm as a meta-heuristic search strategy:
  - Generates and evolves candidate decryption keys
  - Applies genetic algorithm principles to explore the solution space
  - Employs strategies such as:
    - Population-based search
    - Crossover of candidate solutions
    - Mutation to introduce variation
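To make the key representation concrete, here is a minimal sketch, assuming a 26-letter lowercase alphabet and a key stored as a permutation of that alphabet. The helpers `random_key`, `encrypt`, and `decrypt` are hypothetical and not taken from this repository; the search is looking for the permutation that makes `decrypt` produce the most language-like text.

```python
import random
import string

ALPHABET = string.ascii_lowercase  # assumption: 26-letter lowercase alphabet


def random_key(rng: random.Random) -> str:
    """A key is a permutation of ALPHABET: plaintext ALPHABET[i] encrypts to key[i]."""
    letters = list(ALPHABET)
    rng.shuffle(letters)
    return "".join(letters)


def encrypt(plaintext: str, key: str) -> str:
    # Characters outside ALPHABET (spaces, punctuation) pass through unchanged
    return plaintext.translate(str.maketrans(ALPHABET, key))


def decrypt(ciphertext: str, key: str) -> str:
    # Invert the mapping: ciphertext letter key[i] decrypts to ALPHABET[i]
    return ciphertext.translate(str.maketrans(key, ALPHABET))


if __name__ == "__main__":
    key = random_key(random.Random(0))
    secret = encrypt("attack at dawn", key)
    print(secret)
    print(decrypt(secret, key))  # attack at dawn
```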
Frequency Analysis
- Preprocesses the reference text into a bigram probability matrix
- Calculates log-probabilities for character sequences
- Stores the results in a bigram frequency dictionary
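The frequency analysis step can be sketched as follows. The format of the dictionary written by `process_data.py` is not documented here, so this sketch assumes a dict keyed by two-character strings that stores log transition probabilities log P(b | a), with add-k smoothing so unseen bigrams receive a finite penalty instead of log(0). `preprocess`, `bigram_log_probs`, and `fitness` are hypothetical names.

```python
import math
import re
from collections import Counter

ALPHABET = "abcdefghijklmnopqrstuvwxyz"


def preprocess(text: str) -> str:
    """Lowercase and collapse every run of non a-z characters into one space."""
    # Assumption: the real preprocessing may also transliterate with unidecode first
    return re.sub(r"[^a-z]+", " ", text.lower()).strip()


def bigram_log_probs(reference: str, smoothing: float = 1.0) -> dict:
    """Return {bigram: log P(second | first)} estimated with add-k smoothing."""
    symbols = ALPHABET + " "
    text = preprocess(reference)
    pair_counts = Counter(zip(text, text[1:]))
    first_counts = Counter(text[:-1])  # how often each symbol starts a bigram
    vocab = len(symbols)
    log_probs = {}
    for a in symbols:
        denom = first_counts[a] + smoothing * vocab
        for b in symbols:
            log_probs[a + b] = math.log((pair_counts[(a, b)] + smoothing) / denom)
    return log_probs


def fitness(candidate: str, log_probs: dict) -> float:
    """Sum of bigram log-probabilities; higher (less negative) is more language-like."""
    text = preprocess(candidate)
    return sum(log_probs[text[i:i + 2]] for i in range(len(text) - 1))


if __name__ == "__main__":
    lp = bigram_log_probs("the quick brown fox jumps over the lazy dog")
    print(fitness("the dog", lp) > fitness("xqz jvk", lp))  # True
```

Candidate decryptions are ranked by this score; a real reference corpus should be far larger than the one-line demo.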
Evolutionary Decryption
- Initializes a population of random substitution alphabets
- Evaluates fitness using Markov chain bigram probabilities
- Iteratively improves solutions through:
  - Tournament selection
  - Partially mapped crossover (PMX)
  - Controlled mutation
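One common way to implement the selection and crossover steps is shown below, with keys represented as lists of letters. The tournament size `k` and the function names are assumptions for illustration; PMX follows its textbook definition, which may differ in detail from this project's implementation.

```python
import random


def tournament_select(population, scores, k, rng):
    """Draw k distinct individuals at random and return the highest-scoring one."""
    contenders = rng.sample(range(len(population)), k)
    best = max(contenders, key=lambda i: scores[i])
    return population[best]


def pmx(parent1, parent2, rng):
    """Partially mapped crossover: the child is always a valid permutation."""
    size = len(parent1)
    start, end = sorted(rng.sample(range(size + 1), 2))  # segment [start, end)
    child = [None] * size
    # 1. Copy the crossover segment from parent1
    child[start:end] = parent1[start:end]
    # 2. Place each gene of parent2's segment that the child does not have yet
    for i in range(start, end):
        gene = parent2[i]
        if gene in child[start:end]:
            continue
        pos = i
        # Follow the mapping parent1[pos] -> its position in parent2 until we
        # land outside the copied segment
        while start <= pos < end:
            pos = parent2.index(parent1[pos])
        child[pos] = gene
    # 3. Fill the remaining positions directly from parent2
    for i in range(size):
        if child[i] is None:
            child[i] = parent2[i]
    return child
```

PMX is used instead of ordinary one-point crossover because a key must remain a permutation: each letter has to appear exactly once, and PMX's mapping step repairs the duplicates that naive splicing would create.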
Requirements
- Python 3.7+
- Dependencies: `unidecode`, `tqdm`
Installation
```bash
git clone <repository-url>
pip install unidecode tqdm
```
Usage

Generate the bigram frequency dictionary from a reference corpus:
```bash
python process_data.py <reference_text_corpus>
```

Break a substitution cipher:
```bash
python crack_bigram_evol.py <encrypted_text_file>
```
Configuration Parameters
- `population_size`: Number of candidate decryption keys per generation
- `mutation_rate`: Probability of introducing random changes
- `elite_size`: Number of top solutions preserved unchanged between generations
- `generations`: Number of evolutionary iterations
Example configuration:
```python
breaker = CipherBreaker(
    encrypted_text,
    freq_dict,
    population_size=100,  # Larger population: more diverse candidates per generation
    mutation_rate=0.9,    # High mutation rate: more exploration of new keys
    elite_size=3,         # Top candidates copied unchanged into the next generation
)
```
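The `mutation_rate` and `elite_size` parameters could be applied as sketched below. This assumes `mutation_rate` is the per-offspring probability of a single swap of two letters, which would explain why a value as high as 0.9 is reasonable, since one swap is a small change; the project may instead apply mutation per position or with several swaps. `swap_mutate` and `elites` are hypothetical names.

```python
import random


def swap_mutate(key, mutation_rate, rng):
    """With probability mutation_rate, swap two random positions of a copy of key.

    Swapping keeps the key a valid permutation (every letter used exactly once).
    """
    key = list(key)  # copy so the parent is never modified
    if rng.random() < mutation_rate:
        i, j = rng.sample(range(len(key)), 2)
        key[i], key[j] = key[j], key[i]
    return key


def elites(population, scores, elite_size):
    """Return the elite_size highest-scoring individuals, best first."""
    ranked = sorted(range(len(population)), key=lambda i: scores[i], reverse=True)
    return [population[i] for i in ranked[:elite_size]]
```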
Algorithm Workflow
- Generate an initial population of random substitution alphabets
- Evaluate each alphabet's decryption quality using Markov chain probabilities
- Select top-performing solutions
- Create a new generation through:
  - Crossover of promising candidates
  - Controlled random mutations
- Repeat for the specified number of generations
- Return the best-performing decryption key
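The workflow above fits in one self-contained sketch. `evolve`, its `tournament_k` and `seed` parameters, and the default `generations=200` are assumptions for illustration, not the repository's `CipherBreaker`; the other defaults mirror the example configuration. Scoring is passed in as a `score` function so a bigram fitness can be plugged in.

```python
import random
import string

ALPHABET = string.ascii_lowercase  # assumption: 26-letter lowercase alphabet


def decrypt(ciphertext, key):
    """key[i] is the ciphertext letter that decrypts to ALPHABET[i]."""
    return ciphertext.translate(str.maketrans("".join(key), ALPHABET))


def pmx(p1, p2, rng):
    """Partially mapped crossover; returns a new permutation."""
    size = len(p1)
    a, b = sorted(rng.sample(range(size + 1), 2))
    child = [None] * size
    child[a:b] = p1[a:b]
    for i in range(a, b):
        if p2[i] in child[a:b]:
            continue
        pos = i
        while a <= pos < b:
            pos = p2.index(p1[pos])
        child[pos] = p2[i]
    return [p2[i] if g is None else g for i, g in enumerate(child)]


def evolve(ciphertext, score, population_size=100, mutation_rate=0.9,
           elite_size=3, generations=200, tournament_k=3, seed=0):
    """Return (best key as a string, best score of each generation)."""
    rng = random.Random(seed)
    # Initial population: random permutations of the alphabet
    population = [rng.sample(ALPHABET, len(ALPHABET)) for _ in range(population_size)]
    history = []
    for _ in range(generations):
        scores = [score(decrypt(ciphertext, key)) for key in population]
        ranked = sorted(range(population_size), key=lambda i: scores[i], reverse=True)
        history.append(scores[ranked[0]])
        # Elitism: copy the best keys unchanged into the next generation
        next_gen = [population[i] for i in ranked[:elite_size]]
        while len(next_gen) < population_size:
            # Tournament selection of two parents
            p1, p2 = (
                population[max(rng.sample(range(population_size), tournament_k),
                               key=lambda i: scores[i])]
                for _ in range(2)
            )
            child = pmx(p1, p2, rng)
            # Swap mutation with probability mutation_rate
            if rng.random() < mutation_rate:
                i, j = rng.sample(range(len(child)), 2)
                child[i], child[j] = child[j], child[i]
            next_gen.append(child)
        population = next_gen
    scores = [score(decrypt(ciphertext, key)) for key in population]
    best = max(range(population_size), key=lambda i: scores[i])
    return "".join(population[best]), history
```

Because the elite keys are copied unchanged, the best score recorded in `history` never decreases from one generation to the next.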
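Strengths
- Breaks monoalphabetic substitution ciphers without manual frequency analysis
- Adapts to different text domains by swapping the reference corpus
- Searches a key space far too large to brute-force (26! ≈ 4 × 10^26 keys for a 26-letter alphabet) without exhaustive enumeration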
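Limitations
- Runtime grows with population size, number of generations, and ciphertext length
- Solution quality depends on how closely the reference corpus matches the plaintext's language and style
- Not guaranteed to find the exact key: the search is stochastic and can settle on a near-miss decryption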
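Technical Details
- Fitness scoring: Log-probability of bigram sequences
- Search strategy: Genetic algorithm with tournament selection
- Exploration vs. exploitation: Mutation drives exploration, tournament selection and elitism drive exploitation, and crossover recombines good partial keys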
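Under a first-order Markov assumption, a standard way to write the fitness of a key `k` is shown below, where `p_1, ..., p_n` is the candidate plaintext obtained by decrypting the ciphertext with `k`. Whether the project scores conditional transition probabilities, as here, or joint bigram frequencies is not stated, so treat this as one plausible formulation; the initial `log P(p_1)` term is often omitted because it contributes little for long texts.

```latex
\mathrm{score}(k) = \log P(p_1, \dots, p_n)
  \approx \log P(p_1) + \sum_{i=1}^{n-1} \log P(p_{i+1} \mid p_i)
```

Working in log space turns a product of many small probabilities into a sum, which avoids floating-point underflow on long ciphertexts.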
Future Improvements
- Implement adaptive mutation rates
- Integrate trigram or higher-order Markov analysis
- Evaluate candidate fitness in parallel