ImageMagick montage natural file order

I am trying to use the ImageMagick montage feature to combine chunks of maps from a game. My issue is that the game's original files are naturally ordered like
part1.png
part2.png
...
part10.png
ImageMagick reads this and will tile part10.png right after part1.png. Is there a flag/option to tell ImageMagick to read the directory in that natural order? Here is a live code sample of what I'm doing.
montage \
-alpha on \
-background none \
-mode concatenate \
-tile x{$grid} \
-mattecolor none \
{$input_dir}/*.png \
{$output_file}

You can possibly use sort -g (that's general numeric sort) to get what you want.
Test:
Create 12 dummy files:
for i in {1..12} ; do touch ${i}.txt ; done
List the files:
ls -1 *.txt
1.txt
10.txt
11.txt
12.txt
2.txt
3.txt
4.txt
5.txt
6.txt
7.txt
8.txt
9.txt
Use sort -g to list them in numeric order:
ls -1 *.txt | sort -g
1.txt
2.txt
3.txt
4.txt
5.txt
6.txt
7.txt
8.txt
9.txt
10.txt
11.txt
12.txt
Real Life:
Now apply this to solve your problem:
montage \
-alpha on \
-background none \
-mode concatenate \
-tile x{$grid} \
-mattecolor none \
$(ls -1 {$input_dir}/*.png | sort -g) \
{$output_file}
If your $input_dir name doesn't contain any spaces, this should work flawlessly.
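An alternative, if your ls is from GNU coreutils, is the -v flag (natural "version" sort), which orders part2.png before part10.png without a separate sort step. A minimal sketch, assuming GNU ls and a hypothetical scratch directory:

```shell
# Work in a throwaway directory so we don't touch real files
dir=$(mktemp -d)
cd "$dir"

# Create part1.png ... part12.png as empty placeholders
for i in $(seq 1 12); do touch "part${i}.png"; done

# Plain ls would put part10.png right after part1.png;
# GNU ls -v sorts the embedded numbers naturally
ls -1v part*.png
```

With that, the glob in the montage call could be replaced by $(ls -1v {$input_dir}/*.png), with the same caveat about spaces in names.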

Related

how to remove black bands that occur regularly in an image in imagemagick

I've captured an image, and the image capture extension has left black bands that occur at regular intervals (see example below).
Is there an imagemagick command to remove all bands at once? I've tried to run it recursively, using the below pseudo-code, without success:
for i=1 to height of image/1000
split image at 1000 pixels * i
crop 10 pixels, top
stitch image with cropped image
EDIT: changed example image to a full resolution one
Here is how to crop each white section of your slides in ImageMagick 6 in Unix.
#
# threshold image
# use morphology to close up small black or white regions
# convert to bilevel
# do connected-component processing to find all regions larger than 1000 pixels in area
# keep only gray(255) i.e. white regions and get the bounding box and color and replace WxH+X+Y with W H X Y.
# sort by Y (rather than area) and put the x and +s back to re-form WxH+X+Y
# loop over data to get the bounding box and crop the image
#
OLD_IFS=$IFS
IFS=$'\n'
arr=(`convert slides.jpg -threshold 25% \
-morphology close rectangle:5 +write x1.png \
-morphology open rectangle:5 +write x2.png \
-type bilevel \
-define connected-components:verbose=true \
-define connected-components:exclude-header=true \
-define connected-components:area-threshold=1000 \
-define connected-components:mean-color=true \
-connected-components 8 y.png | grep "gray(255)" | sed 's/[x+]/ /g' | awk '{print $2, $3, $4, $5}'`)
IFS=$OLD_IFS
num=${#arr[*]}
echo $num
echo "${arr[*]}"
# sort array by Y value
sortArr=(`echo "${arr[*]}" | sort -n -t " " -k4,4 | sed -n 's/^\(.*\) \(.*\) \(.*\) \(.*\)$/\1x\2+\3+\4/p'`)
echo "${sortArr[*]}"
for ((i=0; i<num; i++)); do
bbox="${sortArr[$i]}"
convert slides.jpg -crop $bbox +repage slides_section_$i.jpg
done
For ImageMagick 7, change convert to magick.
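The sed/awk split-and-reform trick in the script is easier to see in isolation. A sketch with a made-up geometry string:

```shell
# A hypothetical region geometry in ImageMagick's WxH+X+Y form
geom='640x480+10+20'

# Replace 'x' and '+' with spaces to get sortable fields "W H X Y"
fields=$(echo "$geom" | sed 's/[x+]/ /g')
echo "$fields"

# After sorting, re-form WxH+X+Y from the four fields, as the script does
echo "$fields" | sed -n 's/^\(.*\) \(.*\) \(.*\) \(.*\)$/\1x\2+\3+\4/p'
```

The round trip is lossless, which is why the script can sort on the fourth field (Y) and still hand -crop a valid geometry.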

Crop page from facing-page scan

Suppose I want to crop just the left pages from facing-page scans of a spiral notebook like the example below (from Paolini.net).
Is there a more robust way than simply dividing the image's width by half? For example, a smarter algorithm would detect the spiral binding and make that the right boundary and even exclude black area to the left of the page.
If there's a relatively easy way to do this with OpenCV or ImageMagick, I'd love to learn it.
One possible way in ImageMagick 6 with Unix scripting is to do the following:
Trim the image to remove most of the black on the sides
Scale the image down to 1 row, then scale up to 50 rows just for visualization
Threshold the scaled image so that you get the black region down the spine as the largest black region
Do connected components process to find the x coordinate of the largest black region
Crop the image according to the results from the connected components
Input:
convert img.jpg -fuzz 25% -trim +repage img_trim.png
convert img_trim.png -scale x1! -scale x50! -threshold 80% img_trim_x1.png
centx=$(convert img_trim_x1.png -type bilevel \
-define connected-components:mean-color=true \
-define connected-components:verbose=true \
-connected-components 4 null: | \
grep "gray(0)" | head -n 1 | awk '{print $3}' | cut -d, -f1)
convert img_trim.png -crop ${centx}x+0+0 img_result.jpg
Data from connected components has the following header and structure:
Objects (id: bounding-box centroid area mean-color):
So head -n 1 gets the first black, i.e. gray(0) region which is the largest (sorted largest to smallest). The awk prints the 3rd entry, centroid, and the cut gets the x component.
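To see what that pipeline extracts, here is one made-up verbose line in the id: bounding-box centroid area mean-color layout described above:

```shell
# Hypothetical connected-components output line
# (fields: id, bounding box, centroid, area, mean color)
line='0: 1000x1500+0+0 512.3,750.1 1500000 gray(0)'

# Field 3 is the centroid "x,y"; cut keeps only the x component
echo "$line" | awk '{print $3}' | cut -d, -f1
```

This prints 512.3, which the script then uses as the crop width.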
If using ImageMagick 7, then change convert to magick
If you want to exclude the binders in the middle, then use the x-offset of the bounding box from the connected components listing:
convert img_trim.png -scale x1! -scale x50! -threshold 80% img_trim_x1.png
leftcenterx=$(convert img_trim_x1.png -type bilevel \
-define connected-components:mean-color=true \
-define connected-components:verbose=true \
-connected-components 4 null: | \
grep "gray(0)" | head -n 1 | awk '{print $2}' | cut -d+ -f2 | cut -d+ -f1)
convert img_trim.png -crop ${leftcenterx}x+0+0 img_result2.jpg
If you want each page separately, then we can find the white regions, i.e. gray(255), and crop them according to the width and x offset from the bounding boxes.
convert img.jpg -fuzz 25% -trim +repage img_trim.png
convert img_trim.png -scale x1! -scale x50! -threshold 80% img_trim_x1.png
OLDIFS=$IFS
IFS=$'\n'
bboxArr=(`convert img_trim_x1.png -type bilevel \
-define connected-components:mean-color=true \
-define connected-components:area-threshold=100 \
-define connected-components:verbose=true \
-connected-components 4 null: | \
grep "gray(255)" | awk '{print $2}'`)
IFS=$OLDIFS
num=${#bboxArr[*]}
for ((i=0; i<num; i++)); do
WW=`echo ${bboxArr[$i]} | cut -dx -f1`
Xoff=`echo ${bboxArr[$i]} | cut -d+ -f2`
convert img_trim.png -crop ${WW}x+${Xoff}+0 img_result3_$i.jpg
done
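The two cut calls in that loop just split the bounding box; with a made-up value:

```shell
# Hypothetical bounding box in WxH+X+Y form
bbox='800x1200+40+0'
WW=$(echo "$bbox" | cut -dx -f1)    # width: everything before the first 'x'
Xoff=$(echo "$bbox" | cut -d+ -f2)  # x offset: between the first and second '+'
echo "${WW} ${Xoff}"
```

This prints "800 40", giving the -crop call a width of 800 starting 40 pixels in.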

Grep sorted files in S3 folder

I'd like to sort files in S3 folders and then check if files contain a certain string.
When I usually want to grep a file I do the following:
aws s3 cp s3://s3bucket/location/file.csv.gz - | zcat | grep 'string_to_find'
I see I can sort files like this:
aws s3api list-objects-v2 \
--bucket s3bucket \
--prefix location \
--query 'reverse(sort_by(Contents,&LastModified))'
Tried something like this so far but got broken pipe:
aws s3api list-objects-v2 \
--bucket s3bucket \
--prefix location \
--query 'reverse(sort_by(Contents,&LastModified))' | cp - | zcat | grep 'string_to_find'
You can specify which fields to output and force them into text-only:
aws s3api list-objects-v2 \
--bucket s3bucket \
--prefix location \
--query 'reverse(sort_by(Contents,&LastModified))[].[Key]' \
--output text
Basically, the sort_by and reverse output the Contents array, and this extracts the Key element. I put [Key] in square brackets to force each result onto its own line.
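To actually grep the sorted keys, that text listing can feed a read loop that streams and scans each object in turn. A sketch, assuming the same s3bucket/location names, one gzipped CSV per key, and GNU grep (for --label):

```shell
# Reads one S3 key per line on stdin, streams each object, and greps it.
# --label makes grep prefix each match with the key it came from.
grep_s3_keys() {
  while IFS= read -r key; do
    aws s3 cp "s3://s3bucket/$key" - | zcat | grep -H --label="$key" 'string_to_find'
  done
}
```

Then pipe the listing into it: aws s3api list-objects-v2 --bucket s3bucket --prefix location --query 'reverse(sort_by(Contents,&LastModified))[].[Key]' --output text | grep_s3_keys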

ImageMagick change all images to preset size with DPI

Trying to use ImageMagick to have all images set to a preset size like a letterhead (8 1/2 x 11), for example. I'd prefer to not use resize, and I'm trying to get them to a 100 dpi setting. I'm personally not very good with ImageMagick, and after 2 days of searching around I've got it mostly complete?
for f in `ls *jpg`; do
convert -compress Group4 -type bilevel \
-depth 100 -units PixelsPerInch \
-monochrome -resize 850X1100 $f 2-$f;
done
Anyone have any further pointers on this?
You would use the -density option to set the DPI.
for f in `ls *jpg`
do
convert -compress Group4 \
-type bilevel \
-depth 100 \
-units PixelsPerInch \
-monochrome \
-resize 850x1100 \
-density 100 \
"$f" "2-$f"
done
You can verify by using the identify utility.
identify -format "%x x %y" some_image.jpg
Edit:
As Birei pointed out. You can use "*.jpg" wildcard to iterate over the files in a directory, and quoting the output file name would be important for file names with spaces. You can use Filename Percent Escapes to create & preserve source image information.
convert *.jpg \
-compress Group4 \
-type bilevel \
-depth 100 \
-units PixelsPerInch \
-monochrome \
-resize 850x1100 \
-density 100 \
-set filename:f '%f' \
'2-%[filename:f]'
The -set filename:f '%f' will preserve the original file name with proper escaping, and '2-%[filename:f]' will write the 'f' value with a custom prefix '2-'. No need for a Bash for-loop.
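As a sanity check, the 850x1100 target is just the 8.5x11in page size multiplied by the 100 DPI density:

```shell
dpi=100
# 8.5 in * 100 dpi = 850 px; awk handles the fractional inch width
width_px=$(awk -v d="$dpi" 'BEGIN { printf "%d", 8.5 * d }')
height_px=$((11 * dpi))
echo "${width_px}x${height_px}"
```

This prints 850x1100, so the -resize and -density values are consistent with each other.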

Run cvb in mahout 0.8

The current Mahout 0.8-SNAPSHOT includes a Collapsed Variational Bayes (cvb) version for Topic Modeling and removed the Latent Dirichlet Allocation (lda) approach, because cvb can be parallelized much better. Unfortunately there is only documentation for lda on how to run an example and generate meaningful output.
Thus, I want to:
preprocess some texts correctly
run the cvb0_local version of cvb
inspect the results by looking at the top n words in each of the generated topics
So here are the Mahout commands I had to call in sequence in a Linux shell to do it.
$MAHOUT_HOME points to my mahout/bin folder.
$MAHOUT_HOME/mahout seqdirectory \
-i path/to/directory/with/texts \
-o out/sequenced
$MAHOUT_HOME/mahout seq2sparse -i out/sequenced \
-o out/sparseVectors \
--namedVector \
-wt tf
$MAHOUT_HOME/mahout rowid \
-i out/sparseVectors/tf-vectors/ \
-o out/matrix
$MAHOUT_HOME/mahout cvb0_local \
-i out/matrix/matrix \
-d out/sparseVectors/dictionary.file-0 \
-a 0.5 \
-top 4 -do out/cvb/do_out \
-to out/cvb/to_out
Inspect the output by showing the top 10 words of each topic:
$MAHOUT_HOME/mahout vectordump \
-i out/cvb/to_out \
--dictionary out/sparseVectors/dictionary.file-0 \
--dictionaryType sequencefile \
--vectorSize 10 \
-sort out/cvb/to_out
Thanks to JoKnopp for the detailed commands.
If you get:
Exception in thread "main" java.lang.ClassCastException: java.lang.Integer cannot be cast to java.lang.String
you need to add the command line option "maxIterations":
--maxIterations (-m) maxIterations
I use -m 20 and it works.
refer to:
https://issues.apache.org/jira/browse/MAHOUT-1141
