Longest Common Substring Algorithm
Find the longest common substring!
For example, given two strings: 'academy' and 'abracadabra', the common and the longest is 'acad'.
Another example: ''ababc', 'abcdaba'. For this one, we have two substrings with length of 3: 'abc' and 'aba'.
There are several algorithms to solve this problem such as Generalized suffix tree.
In this page, I'll solve the problem brute force like way with mxn complexity where m and n are the lengths of the two given strings.
The code looks like this:
def lcs(S,T): m = len(S) n = len(T) counter = [[0]*(n+1) for x in range(m+1)] longest = 0 lcs_set = set() for i in range(m): for j in range(n): if S[i] == T[j]: c = counter[i][j] + 1 counter[i+1][j+1] = c if c > longest: lcs_set = set() longest = c lcs_set.add(S[i-c+1:i+1]) elif c == longest: lcs_set.add(S[i-c+1:i+1]) return lcs_set # test 1 ret = lcs('academy', 'abracadabra') for s in ret: print s python_solutions.php # test 2 ret = lcs('ababc', 'abcdaba') for s in ret: print s
Output:
acad aba
- Initally, we initialized the counter array all 0:
m = len(S) n = len(T) counter = [[0]*(n+1) for x in range(m+1)]
Note that the array size is (m+1)x(n+1). - Starting from the 1st row, we will compare the fist character of a string S with all characters in a string T.
for i in range(m): for j in range(n): if S[i] == T[j]:
- While we traverses the characters in T, if it matches with the character in S, we increment the counter. It will be saved counter[i+1][j+1] which is at diagonally one lower position.
if S[i] == T[j]: c = counter[i][j] + 1 counter[i+1][j+1] = c
python_solutions.php
The figure shows when 'a' in S meets 'a' in T, we make an increment to the counter and stores it at (i+1, j+1). Also, if the counter is greater than the longest, we should update. Also, we reset the lcs_set and add the string.if c > longest: lcs_set = set() longest = c lcs_set.add(S[i-c+1:i+1]) elif c == longest: lcs_set.add(S[i-c+1:i+1])
If the counter is the same as the current longest, it does not reset the lcs_set. It just add the substring to the set.
- The picture below is the final state of the code:
When 'd' meets 'd', the counter is updated to 4 which means the longest substring is 4. So, it takes 4 string from the current i index which is 3, and add it the the set.lcs_set.add(S[i-c+1:i+1])
So, at that point, the set has 'acad' substring! - Finally, the lcs() returns the set lcs_set
return lcs_set
Q: Given a string, find the longest substring that contains at most 2 distinct characters.
Samples: 'ababcbcbaaabbdef', 'ababcbcbaaabbdefggggg'
Answer is here: Solutions.
Python tutorial
Python Home
Introduction
Running Python Programs (os, sys, import)
Modules and IDLE (Import, Reload, exec)
Object Types - Numbers, Strings, and None
Strings - Escape Sequence, Raw String, and Slicing
Strings - Methods
Formatting Strings - expressions and method calls
Files and os.path
Traversing directories recursively
Subprocess Module
Regular Expressions with Python
Regular Expressions Cheat Sheet
Object Types - Lists
Object Types - Dictionaries and Tuples
Functions def, *args, **kargs
Functions lambda
Built-in Functions
map, filter, and reduce
Decorators
List Comprehension
Sets (union/intersection) and itertools - Jaccard coefficient and shingling to check plagiarism
Hashing (Hash tables and hashlib)
Dictionary Comprehension with zip
The yield keyword
Generator Functions and Expressions
generator.send() method
Iterators
Classes and Instances (__init__, __call__, etc.)
if__name__ == '__main__'
argparse
Exceptions
@static method vs class method
Private attributes and private methods
bits, bytes, bitstring, and constBitStream
json.dump(s) and json.load(s)
Python Object Serialization - pickle and json
Python Object Serialization - yaml and json
Priority queue and heap queue data structure
Graph data structure
Dijkstra's shortest path algorithm
Prim's spanning tree algorithm
Closure
Functional programming in Python
Remote running a local file using ssh
SQLite 3 - A. Connecting to DB, create/drop table, and insert data into a table
SQLite 3 - B. Selecting, updating and deleting data
MongoDB with PyMongo I - Installing MongoDB ...
Python HTTP Web Services - urllib, httplib2
Web scraping with Selenium for checking domain availability
REST API : Http Requests for Humans with Flask
Blog app with Tornado
Multithreading ...
Python Network Programming I - Basic Server / Client : A Basics
Python Network Programming I - Basic Server / Client : B File Transfer
Python Network Programming II - Chat Server / Client
Python Network Programming III - Echo Server using socketserver network framework
Python Network Programming IV - Asynchronous Request Handling : ThreadingMixIn and ForkingMixIn
Python Coding Questions I
Python Coding Questions II
Python Coding Questions III
Python Coding Questions IV
Python Coding Questions V
Python Coding Questions VI
Python Coding Questions VII
Python Coding Questions VIII
Python Coding Questions IX
Python Coding Questions X
Image processing with Python image library Pillow
Python and C++ with SIP
PyDev with Eclipse
Matplotlib
Redis with Python
NumPy array basics A
NumPy Matrix and Linear Algebra
Pandas with NumPy and Matplotlib
Celluar Automata
Batch gradient descent algorithm
Longest Common Substring Algorithm
Python Unit Test - TDD using unittest.TestCase class
Simple tool - Google page ranking by keywords
Google App Hello World
Google App webapp2 and WSGI
Uploading Google App Hello World
Python 2 vs Python 3
virtualenv and virtualenvwrapper
Uploading a big file to AWS S3 using boto module
Scheduled stopping and starting an AWS instance
Cloudera CDH5 - Scheduled stopping and starting services
Removing Cloud Files - Rackspace API with curl and subprocess
Checking if a process is running/hanging and stop/run a scheduled task on Windows
Apache Spark 1.3 with PySpark (Spark Python API) Shell
Apache Spark 1.2 Streaming
bottle 0.12.7 - Fast and simple WSGI-micro framework for small web-applications ...
Flask app with Apache WSGI on Ubuntu14/CentOS7 ...
Fabric - streamlining the use of SSH for application deployment
Ansible Quick Preview - Setting up web servers with Nginx, configure enviroments, and deploy an App
Neural Networks with backpropagation for XOR using one hidden layer
NLP - NLTK (Natural Language Toolkit) ...
RabbitMQ(Message broker server) and Celery(Task queue) ...
OpenCV3 and Matplotlib ...
Simple tool - Concatenating slides using FFmpeg ...
iPython - Signal Processing with NumPy
iPython and Jupyter - Install Jupyter, iPython Notebook, drawing with Matplotlib, and publishing it to Github
iPython and Jupyter Notebook with Embedded D3.js
Downloading YouTube videos using youtube-dl embedded with Python
Machine Learning : scikit-learn ...
Django 1.6/1.8 Web Framework ...
Ph.D. / Golden Gate Ave, San Francisco / Seoul National Univ / Carnegie Mellon / UC Berkeley / DevOps / Deep Learning / Visualization