ContentsIndex
Lcs
Description

LCS stands for Longest Common Subsequence, and it is a relatively challenging problem to find an LCS efficiently. This module implements the algorithm described in:

An O(ND) Difference Algorithm and its Variations, Eugene Myers, Algorithmica Vol. 1 No. 2, 1986, pp. 251-266; especially the variation described in section 4.2 and most refinements implemented in GNU diff (D is the edit-distance).

There is currently no heuristic to reduce the running time and produce suboptimal output for large inputs with many differences. It behaves like GNU diff with the -d option in this regard.

In the first step, a hash value for every line is calculated and collisions are marked with a special value. This reduces a string comparison to an int comparison for line tuples where at least one of the hash values is not equal to the special value. After that, lines which only exists in one of the files are removed and marked as changed which reduces the running time of the following difference algorithm. GNU diff additionally removes lines that appear very often in the other file in some cases. The last step tries to create longer changed regions and line up deletions in the first file to insertions in the second by shifting changed lines forward and backward.

Synopsis
getChanges :: [ByteString] -> [ByteString] -> [(Int, [ByteString], [ByteString])]
dropStart :: PArray -> PArray -> Int -> [(Int, [ByteString], [ByteString])]
dropEnd :: PArray -> PArray -> Int -> Int -> [(Int, [ByteString], [ByteString])]
getSlice :: PArray -> Int -> Int -> [ByteString]
getChanges' :: (PArray, (Int, Int)) -> (PArray, (Int, Int)) -> [(Int, [ByteString], [ByteString])]
markColl :: Int32
diffArr :: [(Int, Int32)] -> [(Int, Int32)] -> (PArray, (Int, Int)) -> (PArray, (Int, Int)) -> (BArray, BArray)
cmpseq :: HArray -> HArray -> PArray -> PArray -> MapArray -> MapArray -> BSTArray s -> BSTArray s -> Int -> Int -> Int -> Int -> ST s Int
findDiag :: Int -> HArray -> HArray -> PArray -> PArray -> MapArray -> MapArray -> VSTArray s -> VSTArray s -> Int -> Int -> Int -> Int -> Int -> Bool -> ST s (Int, Int, Int)
findOne :: HArray -> HArray -> PArray -> PArray -> MapArray -> MapArray -> VSTArray s -> Int -> Int -> Int -> Int -> Int -> ST s Int
findSnake :: HArray -> HArray -> PArray -> PArray -> MapArray -> MapArray -> Int -> Int -> Int -> Int -> Int -> Int -> Int
findOneRev :: HArray -> HArray -> PArray -> PArray -> MapArray -> MapArray -> VSTArray s -> Int -> Int -> Int -> Int -> ST s Int
findSnakeRev :: HArray -> HArray -> PArray -> PArray -> MapArray -> MapArray -> Int -> Int -> Int -> Int -> Int
shiftBoundaries :: BSTArray s -> BSTArray s -> PArray -> Int -> Int -> ST s ()
nextUnchanged :: BSTArray s -> Int -> ST s Int
skipOneUnChanged :: BSTArray s -> Int -> ST s Int
nextUnchangedN :: BSTArray s -> Int -> Int -> ST s Int
nextChanged :: BSTArray s -> Int -> ST s (Maybe Int)
prevUnchanged :: BSTArray s -> Int -> ST s Int
type HArray = UArray Int Int32
type BArray = UArray Int Bool
type PArray = Array Int ByteString
type MapArray = UArray Int Int
type VSTArray s = STUArray s Int Int
type BSTArray s = STUArray s Int Bool
initV :: Int -> ST s (VSTArray s)
initVRev :: Int -> Int -> ST s (VSTArray s)
initVChanged :: Int -> ST s (BSTArray s)
initH :: [Int32] -> HArray
initM :: [Int] -> MapArray
initP :: [ByteString] -> PArray
aLen :: IArray a e => a Int e -> Int
aLenM :: MArray a e m => a Int e -> m Int
convertPatch :: Int -> PArray -> PArray -> (Int, Int, Int, Int) -> (Int, [ByteString], [ByteString])
getInsert :: PArray -> Int -> Int -> [ByteString]
getDelete :: PArray -> Int -> Int -> [ByteString]
createPatch :: BArray -> BArray -> [(Int, Int, Int, Int)]
createP :: BArray -> BArray -> Int -> Int -> [(Int, Int, Int, Int)]
skipChangedRev :: BArray -> Int -> Int
Documentation
getChanges :: [ByteString] -> [ByteString] -> [(Int, [ByteString], [ByteString])]
create a list of changes between a and b, each change has the form (starta, lima, startb, limb) which means that a[starta, lima) has to be replaced by b[startb, limb)
dropStart :: PArray -> PArray -> Int -> [(Int, [ByteString], [ByteString])]
dropEnd :: PArray -> PArray -> Int -> Int -> [(Int, [ByteString], [ByteString])]
getSlice :: PArray -> Int -> Int -> [ByteString]
getChanges' :: (PArray, (Int, Int)) -> (PArray, (Int, Int)) -> [(Int, [ByteString], [ByteString])]
markColl :: Int32
mark hash value where collision occured
diffArr :: [(Int, Int32)] -> [(Int, Int32)] -> (PArray, (Int, Int)) -> (PArray, (Int, Int)) -> (BArray, BArray)
return arrays with changes in a and b (1 indexed), offsets start with 0
cmpseq :: HArray -> HArray -> PArray -> PArray -> MapArray -> MapArray -> BSTArray s -> BSTArray s -> Int -> Int -> Int -> Int -> ST s Int
set changes array for a and b and return number of changed lines
findDiag :: Int -> HArray -> HArray -> PArray -> PArray -> MapArray -> MapArray -> VSTArray s -> VSTArray s -> Int -> Int -> Int -> Int -> Int -> Bool -> ST s (Int, Int, Int)
return (xmid, ymid, cost) for the two substrings a[off_a+1..off_a+1+l_a] and b
findOne :: HArray -> HArray -> PArray -> PArray -> MapArray -> MapArray -> VSTArray s -> Int -> Int -> Int -> Int -> Int -> ST s Int
find position on diag d with one more insert/delete going forward
findSnake :: HArray -> HArray -> PArray -> PArray -> MapArray -> MapArray -> Int -> Int -> Int -> Int -> Int -> Int -> Int
follow snake from northwest to southeast, x and y are absolute positions
findOneRev :: HArray -> HArray -> PArray -> PArray -> MapArray -> MapArray -> VSTArray s -> Int -> Int -> Int -> Int -> ST s Int
find position on diag d with one more insert/delete going backward
findSnakeRev :: HArray -> HArray -> PArray -> PArray -> MapArray -> MapArray -> Int -> Int -> Int -> Int -> Int
follow snake from southeast to northwest, x and y are absolute positions
shiftBoundaries :: BSTArray s -> BSTArray s -> PArray -> Int -> Int -> ST s ()
try to create nicer diffs by shifting around regions of changed lines
nextUnchanged :: BSTArray s -> Int -> ST s Int
goto next unchanged line, return the given line if unchanged
skipOneUnChanged :: BSTArray s -> Int -> ST s Int
skip at least one unchanged line, if there is none advance behind the last line
nextUnchangedN :: BSTArray s -> Int -> Int -> ST s Int
goto n-th next unchanged line
nextChanged :: BSTArray s -> Int -> ST s (Maybe Int)
goto next changed line, return the given line if changed
prevUnchanged :: BSTArray s -> Int -> ST s Int
goto previous unchanged line, return the given line if unchanged
type HArray = UArray Int Int32
type BArray = UArray Int Bool
type PArray = Array Int ByteString
type MapArray = UArray Int Int
type VSTArray s = STUArray s Int Int
type BSTArray s = STUArray s Int Bool
initV :: Int -> ST s (VSTArray s)
initVRev :: Int -> Int -> ST s (VSTArray s)
initVChanged :: Int -> ST s (BSTArray s)
initH :: [Int32] -> HArray
initM :: [Int] -> MapArray
initP :: [ByteString] -> PArray
aLen :: IArray a e => a Int e -> Int
aLenM :: MArray a e m => a Int e -> m Int
convertPatch :: Int -> PArray -> PArray -> (Int, Int, Int, Int) -> (Int, [ByteString], [ByteString])
getInsert :: PArray -> Int -> Int -> [ByteString]
getDelete :: PArray -> Int -> Int -> [ByteString]
createPatch :: BArray -> BArray -> [(Int, Int, Int, Int)]
createP :: BArray -> BArray -> Int -> Int -> [(Int, Int, Int, Int)]
skipChangedRev :: BArray -> Int -> Int
Produced by Haddock version 2.4.2