Doesn’t look like a character based (Bytes Are All You Need) model (DeepSpeech) - machine-learning

I have been following the DeepSpeech documentation in order to build my own scorer. After running these commands:
cd data/lm
python3 generate_lm.py --input_txt vocabulary.txt --output_dir . \
  --top_k 1500 --kenlm_bins path/to/kenlm/build/bin/ \
  --arpa_order 3 --max_arpa_memory "50%" --arpa_prune "0|0|1" \
  --binary_a_bits 255 --binary_q_bits 8 --binary_type trie
curl -LO http://github.com/mozilla/DeepSpeech/releases/…
tar xvf native_client.*.tar.xz
./generate_scorer_package --alphabet …/alphabet.txt --lm lm.binary --vocab vocab-1500.txt \
  --package kenlm.scorer --default_alpha 0.931289039105002 --default_beta 1.1834137581510284
I get the following errors:
Doesn't look like a character based (Bytes Are All You Need) model.
--force_bytes_output_mode was not specified, using value infered from vocabulary contents: false
Error: Can't parse scorer file, invalid header. Try updating your scorer file.
Error loading language model file: Invalid magic in trie header.
I should mention that I use a different alphabet, which contains characters other than English ones.

This error is usually encountered with non-English alphabets. You can overcome the error by using the --force_bytes_output_mode parameter when calling generate_scorer_package. I pushed a change to the PlayBook this morning which covers this error.

Related

Read a list from stream using Yap-Prolog

I want to run a (python3) process from my (yap) prolog script and read its output formatted as a list of integers, e.g. [1,2,3,4,5,6].
This is what I do:
process_create(path(python3),
['my_script.py', MyParam],
[stdout(pipe(Out))]),
read(Out, OutputList),
close(Out).
However, it fails at the read/2 predicate with the error:
PL_unify_term: PL_int64 not supported
Exception ignored in: <_io.TextIOWrapper name='<stdout>' mode='w' encoding='UTF-8'>
BrokenPipeError: [Errno 32] Broken pipe
I am sure that I can run the process correctly because with [stdout(std)] parameter given to process_create the program outputs [1,2,3,4,5,6] as expected.
Strangely, when I change the process to output a constant term (such as constant_term), it still gives the same PL_int64 error. Appending a dot to the process's output ([1,2,3,4,5,6].) doesn't resolve the error. Using read_term/3 gives the same error, and read_string/3 is undefined in YAP Prolog.
How can I solve this problem?
After asking on the yap-users mailing list, I got the solution.
I recompiled YAP Prolog 6.2.2 with the libGMP option and now it works. The error may also occur in 32-bit builds of YAP.
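For the Python side of the pipe, here is a minimal sketch of what my_script.py could look like. It does not replace the libGMP fix above. It only shows the output shape that read/2 expects, namely a term terminated by a full stop. The helper name as_prolog_term and the hard-coded list are assumptions made for illustration.

```python
import sys


def as_prolog_term(values):
    """Format integers as a Prolog list term ending in a full stop."""
    return "[" + ",".join(str(int(v)) for v in values) + "]."


if __name__ == "__main__":
    # read/2 consumes one term up to the terminating full stop,
    # so write the term, a newline, and flush before exiting.
    sys.stdout.write(as_prolog_term([1, 2, 3, 4, 5, 6]) + "\n")
    sys.stdout.flush()
```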

Woodstox parser works fine in test run in Eclipse, but fails from command line

One of my JUnit tests uses (behind the scenes) the Woodstox parser.
When I run the test from within Eclipse, the test succeeds as expected.
But running the same test on the command line, using
mvn clean test -Dtest=com.example.MyClassTest#someParserTest
causes the test to fail with the following exception messages:
Error on line 114 column 21
SXXP0003: Error reported by XML parser: Invalid UTF-8 middle byte 0x3f (at char #4174, byte #3999)
...
at com.ctc.wstx.io.UTF8Reader.reportInvalidOther(UTF8Reader.java:314)
at com.ctc.wstx.io.UTF8Reader.read(UTF8Reader.java:205)
at com.ctc.wstx.io.ReaderSource.readInto(ReaderSource.java:84)
at com.ctc.wstx.io.BranchingReaderSource.readInto(BranchingReaderSource.java:55)
at com.ctc.wstx.sr.StreamScanner.loadMore(StreamScanner.java:961)
at com.ctc.wstx.sr.BasicStreamReader.readTextSecondary(BasicStreamReader.java:4580)
at com.ctc.wstx.sr.BasicStreamReader.finishToken(BasicStreamReader.java:3657)
at com.ctc.wstx.sr.BasicStreamReader.next(BasicStreamReader.java:1063)
at com.ctc.wstx.sax.WstxSAXParser.fireEvents(WstxSAXParser.java:524)
at com.ctc.wstx.sax.WstxSAXParser.parse(WstxSAXParser.java:452)
at net.sf.saxon.event.Sender.sendSAXSource(Sender.java:440)
at net.sf.saxon.event.Sender.send(Sender.java:171)
at net.sf.saxon.jaxp.IdentityTransformer.transform(IdentityTransformer.java:363)
I took a look at the to-be-parsed InputStream. The InputStreams are identical in both cases.
Also, there is no "line 114 column 21" in the InputStream. Line 114 ends on column 11.
How can I investigate what causes the different behavior?
It turned out that a library I used made wrong assumptions about the environment's default character encoding (also called platform's default charset).
In the Eclipse environment, calling Charset.defaultCharset() returned UTF-8, while in the command line environment it returned CP1252.
Many standard and third-party Java APIs behave differently depending on the platform's default charset, among them:
String.getBytes()
ByteArrayOutputStream.toString()
XMLOutputFactory.createXMLStreamWriter(OutputStream stream)
IOUtils.toString(InputStream input)
To resolve my issue, I had to update that library to explicitly use the correct character set:
String.getBytes(StandardCharsets.UTF_8)
ByteArrayOutputStream.toString( StandardCharsets.UTF_8.name() )
XMLOutputFactory.createXMLStreamWriter( OutputStream stream, StandardCharsets.UTF_8.name() )
IOUtils.toString(InputStream input, StandardCharsets.UTF_8)
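To illustrate the mechanism, here is a small Python sketch (not the library's actual code, which is not shown here). It assumes that somewhere UTF-8 bytes were decoded with the platform charset and then written back with it. The function name round_trip and the sample character are hypothetical. Note that 0x3f is the ASCII code of '?', which is what an unmappable character is commonly replaced with, and it matches the "Invalid UTF-8 middle byte 0x3f" message above.

```python
def round_trip(text: str, platform_charset: str) -> bytes:
    """Encode as UTF-8, then decode and re-encode with the platform charset."""
    utf8_bytes = text.encode("utf-8")
    # Bytes the charset cannot decode become U+FFFD ...
    as_string = utf8_bytes.decode(platform_charset, errors="replace")
    # ... and U+FFFD cannot be encoded again, so it becomes '?' (0x3f).
    return as_string.encode(platform_charset, errors="replace")


original = "Ý"  # UTF-8 bytes: 0xC3 0x9D; 0x9D is undefined in CP1252
ok = round_trip(original, "utf-8")
broken = round_trip(original, "cp1252")

print(ok)      # b'\xc3\x9d'
print(broken)  # b'\xc3?'

try:
    broken.decode("utf-8")
except UnicodeDecodeError as exc:
    # The lead byte 0xC3 is now followed by 0x3f instead of a continuation byte.
    print("invalid UTF-8:", exc.reason)
```

With UTF-8 as the platform charset the bytes survive unchanged, which would explain why the same test can pass in one environment and fail in another.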

Error when converting from Python 2 to Python 3

Can you help me convert this to Python 3.5? I tried, but it doesn't work. I did the following steps:
I changed the package md5 to hashlib
I changed all the id = md5.new("%s"%str(clf.get_params())).hexdigest() to id = hashlib.md5(("%s"%str(clf.get_params())).encode('utf-8') ).hexdigest()
but I still have some problems when I pass a directory to these parameters
save_preds="",
save_params=""
save_test_only=""
clf_name="XX"
I get the following error when I put something in these parameters:
TypeError: a bytes-like object is required, not 'str'
Please see the code here:
blend_proba.py
Thanks,
cdk
Replacing
clf_name="XX"
by
clf_name=b"XX"
would convert the strings into objects of type bytes. Whether those changes will be enough, I honestly have no idea.
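As a rough illustration of the str versus bytes distinction in Python 3, here is a sketch that reproduces the same TypeError with an in-memory binary buffer. Where the error actually arises in blend_proba.py is not shown in the question, so the buffer and the helper name model_id are assumptions.

```python
import hashlib
import io


def model_id(params: dict) -> str:
    """Hash a parameter dict; hashlib needs bytes, so encode the str first."""
    return hashlib.md5(str(params).encode("utf-8")).hexdigest()


buf = io.BytesIO()  # stands in for any binary sink, e.g. a file opened with "wb"

try:
    buf.write("XX")  # str into a binary buffer fails in Python 3
except TypeError as exc:
    error = str(exc)
    print(error)

buf.write("XX".encode("utf-8"))  # explicit encoding works
buf.write(b"XX")                 # so does a bytes literal
print(model_id({"C": 1.0}))
```

Encoding at the point of use, as in the hashlib change already made, is often easier to keep consistent than turning every literal into bytes.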

lua - invalid argument type

I am a newbie to Lua. Currently getting the following error message:
invalid argument type for argument -model (should be the model checkpoint
to use for sampling)
Usage: [options] <model>
I am sure it is something pretty easy to solve, but cannot manage to find the solution.
The 'model' is a file lm_checkpoint_epoch50.00_2.7196.t7, which is in the directory
/home/ubuntu/xxx/nn/cv
I am running the program from the parent directory (/home/ubuntu/xxx/nn)
I have tried out the following options to run the program (from one directory above the one the model is saved):
th sample.lua - model lm_chelm_checkpoint_epoch50.00_2.5344.t7
th sample.lua lm_chelm_checkpoint_epoch50.00_2.5344.t7
th sample.lua /cv/lm_chelm_checkpoint_epoch50.00_2.5344.t7
th sample.lua - /cv/model lm_chelm_checkpoint_epoch50.00_2.5344.t7
Also, the program has a torch.CmdLine() object where :argument equals '/cv/lm_checkpoint_epoch50.00_2.7196.t7'. The program prints the parameters, so that you see the following output on the screen:
Options
<model> /cv/lm_checkpoint_epoch50.00_2.7196.t7
so it finds a value for the argument 'model', which is picked up from the .lua file, not from the parameter on the command line. This file is a valid model.
Pretty lost, hope someone relates to this issue. Thanks.
Found the issue: it was a bug, as smhx suggested. I had inadvertently changed the source code from:
require 'torch'
cmd = torch.CmdLine()
cmd:argument('-model','model checkpoint to use for sampling')
Note that the original source code contains no value for the argument. I had changed it to:
cmd:argument('-model','/cv/model lm_chelm_checkpoint_epoch50.00_2.5344.t7'
'model checkpoint to use for sampling')
So the argument must be passed on the command line, not in the source code. With parameters it is different: you can include them in the source code.
So if I change back the source code and run the following from the command line:
th sample.lua cv/lm_chelm_checkpoint_epoch50.00_2.5344.t7
it works.
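The same distinction exists in other command-line parsers. As an analogy only, here is a sketch using Python's argparse rather than Torch's CmdLine: a positional argument has to come from the command line, while an option can carry a default in the source. The --seed option and its default are hypothetical.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    parser = argparse.ArgumentParser(prog="sample")
    # Positional argument: only a help text here, the value comes from the command line.
    parser.add_argument("model", help="model checkpoint to use for sampling")
    # Option: a default value may live in the source code (hypothetical option).
    parser.add_argument("--seed", type=int, default=123, help="random seed")
    return parser


# Simulates: sample cv/lm_chelm_checkpoint_epoch50.00_2.5344.t7
args = build_parser().parse_args(["cv/lm_chelm_checkpoint_epoch50.00_2.5344.t7"])
print(args.model, args.seed)
```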

Paamayim nekudotayims in PHP 5.2

I can upgrade PHP 5.2 on my server. I have to get this server working today with the new TestLink (the vacation I have planned for tomorrow is in question because of this error). I am stuck on the following error: Paamayim Nekudotayim.
What changes should I make to resolve it?
This link contains the file with the bug.
The Scope Resolution Operator (also called Paamayim Nekudotayim) or in simpler terms, the double colon, is a token that allows access to static, constant, and overridden properties or methods of a class.
So your code may be trying to call a static method or access a static property with the wrong operator.
From Wikipedia:
In PHP, the scope resolution operator is also called Paamayim
Nekudotayim (Hebrew: פעמיים נקודתיים‎), which means “double colon” in
Hebrew.
The name "Paamayim Nekudotayim" was introduced in the
Israeli-developed Zend Engine 0.5 used in PHP 3. Although it has been
confusing to many developers who do not speak Hebrew, it is still
being used in PHP 5, as in this sample error message:
$ php -r ::
Parse error: syntax error, unexpected T_PAAMAYIM_NEKUDOTAYIM
As of PHP 5.4, error messages concerning the scope resolution operator
still include this name, but have clarified its meaning somewhat:
$ php -r ::
Parse error: syntax error, unexpected '::' (T_PAAMAYIM_NEKUDOTAYIM)
