Optimize an image once and avoid doing it again - imagemagick

I have a large repository of images, mostly JPEG, which I'd like to optimize using a library like ImageMagick or a Linux CLI tool like jpegtran (as covered in JPG File Size Optimization - PHP, ImageMagick, & Google's Page Speed), but I don't want to have to track which ones have already been optimized, and I don't want to re-optimize all of them again later. Is there some sort of flag I could easily add to a file that would make it easy to detect and skip the optimization? Preferably one that would stay with the file when backed up to other filesystems?
E.g.: a small piece of EXIF data, a filesystem flag, some harmless null bytes added at the end of the file, a tool that is already intelligent enough to do this itself, etc.

You could use "extended attributes", which are metadata stored in the filesystem. Read and write them with xattr:
# Read existing attributes
xattr -l image.png
# Set an optimised flag of your own invention/description
xattr -w optimised true image.png
# Read attributes again
xattr -l image.png
optimised: true
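xattr can also delete the attribute, which is handy if you ever want to force a re-optimisation:
# Remove the flag again
xattr -d optimised image.png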
The presence of an extended attribute can be detected in a long listing - it is the @ sign after the permissions:
-rw-r--r--@ 1 mark staff 275 29 May 07:54 image.png
As you allude to in your comments, make sure that any backup programs you use honour the attributes before committing to this as a strategy. FAT-32 filesystems are notoriously poor at this sort of thing - though a tar file or similar may survive a trip to Windows-land and back.
As an alternative, just set a comment in the EXIF header - I have already covered that in this related answer...
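A minimal sketch of that comment-based approach, assuming exiftool and jpegtran are installed (the marker text "optimised" is just my own convention, not a standard):
# Skip files whose JPEG comment already says "optimised"
if [ "$(exiftool -s -s -s -Comment image.jpg)" != "optimised" ]; then
    # Losslessly optimise, then mark the result
    jpegtran -optimize -copy all -outfile image.jpg.tmp image.jpg && mv image.jpg.tmp image.jpg
    exiftool -overwrite_original -Comment=optimised image.jpg
fi
Unlike an extended attribute, the comment lives inside the file itself, so it survives any filesystem or backup round-trip.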

To piggyback off of Mark Setchell's answer: if you use xattr on Linux, you'll most likely need to prefix the attribute name with one of the recognised namespaces (such as user), otherwise you're likely to get an "Operation not supported" error. Most of the documentation I could find referred to setfattr; however, the same namespace rules apply to xattr as well.
For example, using the user namespace:
# Set an optimised flag of your own invention/description
xattr -w user.optimised true image.png
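Putting the two answers together, here is a sketch of a batch run that skips anything already flagged (Linux-style user. namespace; jpegtran assumed installed):
for f in /path/to/images/*.jpg; do
    # xattr -p exits non-zero when the attribute is absent
    if xattr -p user.optimised "$f" >/dev/null 2>&1; then
        continue    # already optimised, skip it
    fi
    jpegtran -optimize -copy all -outfile "$f.tmp" "$f" && mv "$f.tmp" "$f"
    xattr -w user.optimised true "$f"
done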

Related

Using debug symbols with perf

I monitor a process on a PowerPC system in order to extract performance information.
How can I load the debug symbols of this process?
I use the following command
perf record -g dwarf -p 4591
and I get an error saying that "dwarf cannot be found (no such file or directory)".
Could you please give me a hint on how to load debug information about the functions that have been called, so that it shows up when the report is generated?
You are using an old version of perf which does not support -g dwarf, only plain -g (without an argument); i.e., it does not support DWARF unwinding.
perf record -g dwarf -p 4591
These days the correct option to choose a method is --call-graph, whereas -g is only a flag that enables call stacks with its default method, fp.
From man perf-record:
-g
Enables call-graph (stack chain/backtrace) recording.
--call-graph
Setup and enable call-graph (stack chain/backtrace) recording,
implies -g. Default is "fp".
Allows specifying "fp" (frame pointer) or "dwarf"
(DWARF's CFI - Call Frame Information) or "lbr"
(Hardware Last Branch Record facility) as the method to collect
the information used to show the call graphs.
In some systems, where binaries are built with gcc
-fomit-frame-pointer, using the "fp" method will produce bogus
call graphs; "dwarf", if available (perf tools linked to
the libunwind or libdw library), should be used instead.
Using the "lbr" method doesn't require any compiler options. It
will produce call graphs from the hardware LBR registers. The
main limitation is that it is only available on new Intel
platforms, such as Haswell. It can only get user call chain. It
doesn't work with branch stack sampling at the same time.
When "dwarf" recording is used, perf also records (user) stack dump
when sampled. Default size of the stack dump is 8192 (bytes).
User can change the size by passing the size after comma like
"--call-graph dwarf,4096".
By the way, try fp first - it's much more efficient, but it doesn't work well with optimized binaries (e.g. ones built with -fomit-frame-pointer). Also, this has very little to do with debug information: if you do not need to know the stack trace, you needn't add -g at all.
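With a perf new enough to support --call-graph, the two methods from the man page excerpt above look like this (4591 being the PID from the question):
# Frame-pointer unwinding: cheap, but needs frame pointers in the binaries
perf record --call-graph fp -p 4591
# DWARF unwinding, with a 4096-byte stack dump per sample
perf record --call-graph dwarf,4096 -p 4591
# Browse the recorded call graphs
perf report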

How to show Photoshop Action in Forum; save, decrypt, convert .atn file to text?

Briefly, I would like to show a moderately complicated Photoshop action in a forum. Saving the .atn file is easy, but it is encrypted by adobe.
I found a 25,475-line .jsx file which will apparently convert it to XML, but which is unusable without any usage notes or documentation:
http://ps-scripts.cvs.sourceforge.net/viewvc/ps-scripts/xtools/apps/ActionFileToXML.jsx
What is the easiest way, other than reading each action word and retyping it in a text editor, to get the 6 inches of action (as seen in Photoshop) into plain text?
GORY DETAILS:
I have a large number of files which I inadvertently damaged by using perfectlyclear on them. It enhances some of the areas but pathologically destroys all darkish areas by converting them to pure black and near zero contrast. When printed, the pictures look like somebody took a black magic marker and redacted large areas. They are damaged beyond use as-is.
The Photoshop fix is to
duplicate layer
select color range, click on a black area, set fuzziness to ~12, range=100%
select expand 4, feather 3
make new mask channel
select backward (original) layer
delete (nukes blackened area under mask)
save as PNG with transparency
This leaves a PNG file with the redacted areas transparent and with feathering around them. By placing the original file beneath it, the original non-blackened areas are shown.
I would like to document this modest solution in an ImageMagick forum but can not believe how far adobe has gone to lock my action into adobe-only tools. I want to jailbreak this and all of my other actions.
NOTE: There is a one line usage in ActionFileToXML.jsx: "This script reads an ActionFile and converts it to XML" and no documentation of any type. An alert I stumbled upon states that it will only work in CS2/3/4 and I have CS6. It has a 2007 date on it.
I have read that this .JSX is adobe's version of JavaScript and that you run them from inside Illustrator (which I don't have).
I want to figure out how to decrypt my actions and write a useable script:
USAGE: decrypt.atn.to.txt.pl encrypted.atn [-o text_file_name] <enter>
Supply a fully qualified path to a .atn file and it will be deciphered
into a useable .txt file with the same path/basename and a .txt
extension, unless you use the -o option, which will attempt to write to
the file name you supply.
Perhaps, I could even make a CPAN module?!
Good thing the .JSX writer had the foresight to include 0.0039% documentation or the program would be completely useless! :)
SOLUTION and STEP-BY-STEP instructions:
The link:
http://ps-scripts.cvs.sourceforge.net/viewvc/ps-scripts/xtools/apps/ActionFileToXML.jsx
points to a gigantic adope ExtendScript file. Reading the file, line 3 has just about the only documentation:
// This script reads an ActionFile and converts it to XML.
The filename already tells you this: ActionFileToXML.jsx
Without wading through 25,000 lines of largely uncommented, 8-year-old code/data/??? it is completely unusable.
What the link poster failed to include was the PACKAGE containing the other 300 files which includes the README.txt, INSTALLATION.txt, /docs, etc.
The PACKAGE supplying context, install, usage, etc can be found at
http://sourceforge.net/projects/ps-scripts/files/xtools/v2.2betas/
How to Decrypt adope's .atn file, step by step:
download README.txt and xtools*.zip from http://sourceforge.net/projects/ps-scripts/files/xtools/v2.2betas/
READ README.txt and unzip the zip anywhere you like (and REMEMBER where you put it). NOTE: evilnet explorer will by default hide it under some mile-long, incredibly ugly file path where you may never find it, so use FIREFOX: set tools -> options -> general -> downloads to Always_Ask_Me (or set a reasonable download directory)
Photoshop -> actions, click on action set you want to decipher and click the "arrow box" to the right of actions -> save_actions and put them where you can find them
Photoshop file -> scripts -> browse, then navigate to where you stashed ActionFileToXML.jsx and execute it. This pops up a GUI as shown at http://ps-scripts.sourceforge.net/xtools.html
Navigate to where you hid your .atn file; the XML file box will be populated with the same path/file_BASE_name and an .xml extension by default. Adjust the name/location to suit
hit PROCESS and in a delightfully brief period (in my case), it was done
Get ready to marvel at the succinct efficiency with which adope stores an action like [select->color_range, localized, fuzziness=14, range=100%] (56 bytes written by hand) in only 3635 bytes of unfathomably labyrinthine XML, with no default value left unspecified. It looks a lot like IRS-regulation fine print! ;)
The main difficulty in trying to make sense of the XML is that it is written in some funky interpreter psycho-code which bears absolutely no resemblance to the keys/clicks you actually used to create it.
One of the steps I was attempting to elucidate was simply layer (I NEVER ToucheDER) -> layer_mask -> hide_selection. It is diabolically obfuscated as (and I quote):
<ActionItem key="TEXT" expanded="false" enabled="true" withDialog="false"
            dialogOptions="2" identifier="TEXT" event="make" name="Make" hasDescriptor="true">
  <ActionDescriptor key="make" count="3">
    <DescValueType.CLASSTYPE key="1316429856" id="1316429856" symname="New"
                             sym="Nw " classString="Channel" class="Chnl"/>
    <DescValueType.REFERENCETYPE key="1098129440" id="1098129440" symname="At" sym="At ">
      <ActionReference key="1098129440" id="1098129440" symname="At" sym="At " count="1">
make .. new .. channel .. at .. mask .. hideSelection? Huh?
I had to scratch my head and fiddle around with the Channels panel options before I found the menu solution.
According to the generous and personable developer, Xbytor (who patiently answers emails from agitated would-be users), this XML can be hacked (carefully), translated back into a .ATN file and used by Photoshop. A very powerful possibility.
Brian

Options for MeCab Japanese tokenizer on iOS?

I'm using the iPhone library for MeCab found at https://github.com/FLCLjp/iPhone-libmecab . I'm having some trouble getting it to tokenize all possible words. Specifically, I cannot tokenize "吉本興業" into two pieces "吉本" and "興業". Are there any options that I could use to fix this? The iPhone library does not expose anything, but it uses C++ underneath the objective-c wrapper. I assume there must be some sort of setting I could change to give more fine-grained control, but I have no idea where to start.
By the way, if anyone wants to tag this 'mecab' that would probably be appropriate. I'm not allowed to create new tags yet.
UPDATE: The iOS library is calling mecab_sparse_tonode2() defined in libmecab.cpp. If anyone could point me to some English documentation on that file it might be enough.
There is nothing iOS-specific in this. The dictionary you are using with mecab (probably ipadic) contains an entry for the company name 吉本興業. Although both parts of the name are listed as separate nouns as well, mecab has a strong preference to tag the compound name as one word.
Mecab lacks a feature that allows the user to choose whether or not compounds should be split into parts. Note that such a feature is generally hard to implement because not everyone agrees on which compounds can be split and which ones can't. E.g. is 容疑者 a compound made up of 容疑 and 者? From a purely morphological point of view perhaps yes, but for most practical applications probably no.
If you have a list of compounds you'd like to get segmented, a quick fix is to create a user dictionary for the parts they consist of, and make mecab use this in addition to the main dictionary.
There is Japanese documentation on how to do this here. For your particular example, it would involve the steps below.
Make a user dictionary with two entries, one for 吉本 and one for 興業:
吉本,,,100,名詞,固有名詞,人名,名,*,*,よしもと,ヨシモト,ヨシモト
興業,,,100,名詞,一般,*,*,*,*,こうぎょう,コウギョウ,コウギョウ
I suspect that both entries exist in the default dictionary already, but by adding them to a user dictionary and specifying a relatively low cost (I've used 100 for both -- the lower the cost, the more likely the entry is to be used), you can get mecab to tend to prefer the parts over the whole.
Compile the user dictionary:
$> $MECAB/libexec/mecab/mecab-dict-index -d /usr/lib64/mecab/dic/ipadic -u mydic.dic -f utf-8 -t utf-8 ./mydic
You may have to adjust the command. The above assumes:
Mecab was installed from source in $MECAB. If you use mecab installed by a package manager, you might have difficulties finding the mecab-dict-index tool. Best install from source.
The default dictionary is in /usr/lib64/mecab/dic/ipadic. This is not part of the mecab package; it comes as a separate package (e.g. this) and you may have difficulties finding this, too.
mydic is the name of the user dictionary created in step 1. mydic.dic is the name of the compiled dictionary you'll get as output (it need not exist beforehand).
Both the source CSV (-f option) and the compiled binary dictionary (-t option) are encoded in UTF-8. This may be wrong, in which case you'll get an error message later when you use mecab.
Modify the mecab configuration. In a system-wide installation, this is a file named /usr/lib64/mecab/dic/ipadic/dicrc or similar. In your case it may be located somewhere else. Add the following line to the end of the configuration file:
userdic = /home/myhome/mydic.dic
Make sure the absolute path to the dictionary compiled above is correct.
If you then run mecab against your input, it will split the compound into its parts (I tested it, using mecab 0.994 on a Linux system).
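For example, from a shell (a sketch using the paths above; the user dictionary can also be passed explicitly with -u instead of relying on dicrc):
# Tokenize the compound with the user dictionary loaded
echo "吉本興業" | mecab -u /home/myhome/mydic.dic
# Expected output: 吉本 and 興業 as two separate nouns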
A more thorough fix would be to get the source of the default dictionary, manually remove all compound nouns you want to have split, and recompile the dictionary. As a general remark, using a CJK tokenizer for a serious application in production over a longer period of time usually involves a certain amount of regular dictionary maintenance (adding/removing entries).

Rails: possible to check if a string is binary?

In a particular Rails application, I'm pulling binary data out of LDAP into a variable for processing. Is there a way to check if the variable contains binary data? I don't want to continue with processing of this variable if it's not binary. I would expect to use is_a?...
In fact, the binary data I'm pulling from LDAP is a photo. So maybe there's an even better way to ensure the variable contains binary JPEG data? The result of this check will determine whether to continue processing the JPEG data, or to render a default JPEG from disk instead.
There is actually a lot more to this question than you might think. Only since Ruby 1.9 has there been a concept of characters (in some encoding) versus raw bytes. So in Ruby 1.9 you might be able to get away with requesting the encoding. Since you are getting stuff from LDAP the encoding for the strings coming in should be well known, most likely ISO-8859-1 or UTF-8.
In which case you can get the encoding and act on that:
some_variable.encoding # => when ASCII-8BIT, treat as a photo
Since you really want to verify that the binary data is a photo, it would make sense to run it through an image library. RMagick comes to mind. The documentation will show you how to verify that any binary data is actually JPEG encoded. You will then also be able to store other properties such as width and height.
If you don't have RMagick installed, an alternative approach would be to save the data into a Tempfile, drop down into Unix (assuming you are on Unix) and try to identify the file. If your system has ImageMagick installed, the identify command will tell you all about images. But just calling file on it will tell you this too:
~/Pictures$ file P1020359.jpg
P1020359.jpg: JPEG image data, EXIF standard, comment: "AppleMark"
You can call the identify and file commands in a shell from Ruby, using the tempfile's path:
%x(identify #{tempfile.path})
%x(file #{tempfile.path})
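If you only need a cheap sanity check before reaching for RMagick or shelling out, note that every JPEG stream begins with the SOI marker bytes FF D8 FF; a sketch from the shell (photo.jpg is a hypothetical file):
# Print the first three bytes as hex: "ffd8ff" indicates JPEG data
head -c 3 photo.jpg | xxd -p
The same three-byte test is trivial to port to Ruby before falling back to your default image.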

ack misses results (vs. grep)

I'm sure I'm misunderstanding something about ack's file/directory ignore defaults, but perhaps somebody could shed some light on this for me:
mbuck$ grep logout -R app/views/
Binary file app/views/shared/._header.html.erb.bak.swp matches
Binary file app/views/shared/._header.html.erb.swp matches
app/views/shared/_header.html.erb.bak: <%= link_to logout_text, logout_path, { :title => logout_text, :class => 'login-menuitem' } %>
mbuck$ ack logout app/views/
mbuck$
Whereas...
mbuck$ ack -u logout app/views/
Binary file app/views/shared/._header.html.erb.bak.swp matches
Binary file app/views/shared/._header.html.erb.swp matches
app/views/shared/_header.html.erb.bak
98:<%= link_to logout_text, logout_path, { :title => logout_text, :class => 'login-menuitem' } %>
Simply calling ack without options can't find the result within a .bak file, but calling with the --unrestricted option can find the result. As far as I can tell, though, ack does not ignore .bak files by default.
UPDATE
Thanks to the helpful comments below, here are the new contents of my ~/.ackrc:
--type-add=ruby=.haml,.rake
--type-add=css=.less
ack is peculiar in that it doesn't have a blacklist of file types to ignore, but rather a whitelist of file types that it will search in.
To quote from the man page:
With no file selections, ack-grep only searches files of types that it recognizes. If you have a file called foo.wango, and ack-grep doesn't know what a .wango file is, ack-grep won't search it.
(Note that I'm using Ubuntu where the binary is called ack-grep due to a naming conflict)
ack --help-types will show a list of types your ack installation supports.
If you are ever confused about what files ack will be searching, simply add the -f option. It will list all the files that it finds to be searchable.
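Putting those options together for the case in the question:
# List the files ack considers searchable, without running a search
ack -f app/views/
# Show every file type this ack installation recognises
ack --help-types
# Search everything, including backup files and dotfiles
ack -u logout app/views/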
ack --man states:
If you want ack to search every file, even ones that it always ignores like coredumps and backup files, use the "-u" switch.
and
Why does ack ignore unknown files by default? ack is designed by a programmer, for programmers, for searching large trees of code. Most codebases have a lot of files in them which aren't source files (like compiled object files, source control metadata, etc), and grep wastes a lot of time searching through all of those as well and returning matches from those files.
That's why ack's behavior of not searching things it doesn't recognize is one of its greatest strengths: the speed you get from only searching the things that you want to be looking at.
EDIT: Also, if you look at the source code, .bak files are ignored by default.
Instead of wrestling with ack, you could just use plain old grep, from 1973. Because it uses an explicit blacklist of files, instead of a whitelist of file types, it never omits correct results, ever. Given a couple of lines of config (which I created in my home-directory 'dotfiles' repo back in the 1990s), grep actually matches or surpasses many of ack's claimed advantages - in particular, speed: when searching the same set of files, grep is faster than ack.
The grep config that makes me happy looks like this, in my .bashrc:
# Custom 'grep' behaviour
# Search recursively
# Ignore binary files
# Output in pretty colors
# Exclude a bunch of files and directories by name
# (this both prevents false positives, and speeds it up)
function grp {
grep -rI --color --exclude-dir=node_modules --exclude-dir=.bzr --exclude-dir=.git --exclude-dir=.hg --exclude-dir=.svn --exclude-dir=build --exclude-dir=dist --exclude-dir=.tox --exclude=tags "$@"
}
function grpy {
grp --include='*.py' "$@"
}
The exact list of files and directories to ignore will probably differ for you: I'm mostly a Python dev and these settings work for me.
It's also easy to add sub-customisations, as I show for my 'grpy', that I use to grep Python source.
Defining bash functions like this is preferable to setting GREP_OPTIONS, which will cause ALL executions of grep from your login shell to behave differently, including those invoked by programs you have run. Those programs will probably barf on the unexpectedly different behaviour of grep.
My new functions, 'grp' and 'grpy', deliberately don't shadow 'grep', so that I can still use the original behaviour any time I need that.
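Usage then mirrors the ack and grep invocations earlier in this question (paths are hypothetical):
# Recursive, colourised search with the exclusions applied
grp logout app/views/
# The same, restricted to Python sources
grpy logout_path .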
