Ephesoft key value extraction

When I run key value extraction on an uploaded batch, it doesn't work, yet the batch proceeds to the end of the process nonetheless.
How can I ensure that, if my key value extraction doesn't work, the user will be asked to enter the value?

Did you add the VALIDATE_DOCUMENT plugin? That should force a manual review of the key value extraction.
Another thing to check is the presence of KV extraction rules for each field.
Finally, I would check the workflow, since the KV_PAGE_PROCESS and KEY_VALUE_EXTRACTION plugins may need to be loaded and switched to ON.

Related

How to replace Telegraf's default timestamp?

I use Telegraf to send some data from a database to InfluxDB at regular intervals, which works fine apart from one issue:
I need to replace Telegraf's auto-generated timestamp (which is the current time at the moment Telegraf reads the data to transmit) with a field from the data.
(To answer the "why?" question: so the data I get in InfluxDB actually matches the time of the event I want to record.)
I would have thought there's some standard configuration parameter or an easy-to-find processor plugin that lets me replace the default timestamp with the content of a field, but I didn't find any.
It does not seem to me to be a very exotic request, and Telegraf's Metric type does have a SetTime function, so I hope someone has already solved this and can answer.
You can use a Starlark processor to accomplish this, at least if you are able to put the correct timestamp into a tag first.
Here is an example of how I use a Starlark processor to replace the timestamp of the measurement with the content of a tag that I already populated with the correct timestamp in the input plugin:
[[processors.starlark]]
  source = '''
def apply(metric):
    if "nanoseconds" in metric.tags:
        metric.time = int(metric.tags["nanoseconds"])
    return metric
'''
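If the timestamp arrives as a formatted string rather than as nanoseconds, a similar sketch can parse it with the Starlark processor's time module (assuming a reasonably recent Telegraf; the field name event_time and the RFC 3339 layout are assumptions for illustration, not from the original setup):
[[processors.starlark]]
  source = '''
load("time.star", "time")

def apply(metric):
    # "event_time" is a hypothetical string field holding an RFC 3339 timestamp
    if "event_time" in metric.fields:
        t = time.parse_time(metric.fields["event_time"], format="2006-01-02T15:04:05Z07:00")
        metric.time = t.unix_nano
        # drop the field once it has been promoted to the metric timestamp
        metric.fields.pop("event_time")
    return metric
'''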
Due to the fact that no one had an idea so far, that this feature isn't available in any of the plugins that ship with Telegraf, and that I have to solve this pretty urgently, I wrote a Telegraf processor plugin that does exactly what I needed. (I will likely offer to contribute this to Telegraf in the future, when I have a bit more time to breathe than right now.)
There is a way to do this for both JSON and CSV input formats.
The JSON format documentation describes how to set the timestamp based on a value in the payload (in a format you can specify).
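For example, a minimal sketch of an input using the JSON parser's time options (the file path and the event_time key are hypothetical):
[[inputs.file]]
  files = ["/path/to/data.json"]
  data_format = "json"
  ## JSON key that holds the event timestamp
  json_time_key = "event_time"
  ## a Go reference-time layout; "unix", "unix_ms", etc. also work
  json_time_format = "2006-01-02T15:04:05Z07:00"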
Use:
[[processors.date]]
  tag_key = "name_of_the_column_of_your_timestamp"
  date_format = "..."   ## a time format applicable in the Go language

Should I specify full key names when using Lua in Redis Cluster, or can I just pass the hashtags?

I have a Lua script which I'm considering migrating to Redis Cluster.
Should I specify full key names when calling EVAL?
Or can I get away with just specifying the hash tags?
For example, I wish to pass only {UNIQUE_HASH_TAG} instead of {UNIQUE_HASH_TAG}/key1, {UNIQUE_HASH_TAG}/key2, etc.
I have lots of keys, and the logic is pretty complicated - sometimes I end up generating key names dynamically, but within the same hash tag.
Would I violate some specification by passing just hash tags instead of key names?
Should I specify full key names
That's the recommended practice.
Would I violate some specifications
No, the specs do not state that key names need to be explicitly passed. The KEYS/ARGV mechanism was put in place in preparation for the cluster, but before the cluster actually came to be. At that time, hash tags were not a part of the cluster's design, so the recommendation was to avoid hard-coding/dynamically generating key names in scripts, as there was no assurance they'd end up in the same cluster hash slot.
Your approach is perfectly valid and would work as expected. I do want to emphasize that this only makes sense if you're managing a lot of these so-called {UNIQUE_HASH_TAG}s - otherwise you'll be hitting just a few slots, which could become a scalability challenge.
EDIT: All that said, you really should always explicitly pass the key names to the script rather than relying on this trick. While it isn't currently blocked, this unspecified behavior may change in the future and break your code.
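As an illustration of the recommended practice, here is a minimal sketch using the redis-py client (the script, key names, and hash tag are made up): every key the script touches is passed explicitly through KEYS, and the shared {user:42} hash tag keeps them in one slot.
import redis

r = redis.Redis()  # point at a cluster with redis.cluster.RedisCluster(...)

# Both keys carry the {user:42} hash tag, so they hash to the same slot.
script = """
local v = redis.call('GET', KEYS[1])
redis.call('SET', KEYS[2], v)
return v
"""

r.set("{user:42}:src", "hello")
# Pass full key names via KEYS rather than deriving them inside the script.
print(r.eval(script, 2, "{user:42}:src", "{user:42}:dst"))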

File status 23 on READ after START

My question pertains to a file status 23, which according to Micro Focus means that upon my attempt to READ from a .DAT file:
"Indicates no record found."
or
"Indicates a duplicate key condition. Attempt has been made to store a
record that would create a duplicate key in the indexed or relative
file or a duplicate alternate record key that does not allow
duplicates."
I have ruled out the latter as my issue because I'm allowing duplicates in this case.
The reason I'm stumped is that I'm using a START to navigate to the record inside my .DAT file, and when I execute a READ just after the START has positioned my file pointer, I get the file status 23.
Here is my code:
900-GET-INST-ID.
    OPEN INPUT INST-MST.
    MOVE FALL-IN-INST TO INST-NAME-REC.
    START INST-MST
        KEY EQUAL TO INST-NAME-REC
        INVALID KEY
            DISPLAY "RECORD NOT FOUND"
        NOT INVALID KEY
            READ INST-MST
            MOVE INST-ID-REC TO WS-INST-ID
    END-START.
    CLOSE INST-MST.
So when I run this code, my START successfully executes and goes into the NOT INVALID KEY block, and then the very next line executes and my READ comes back with nothing. How can this be if my alternate key (INST-NAME-REC) is actually found inside the .DAT?
I have ensured that my FD picture clauses match exactly in the ISAM Build program and in this program (the reading program).
The second reason you show is excluded not because you allow duplicate keys, but because that error message with that file-status is for a WRITE, and your failure is on a READ.
Here's your problem:
READ INST-MST
Here's how you fix it:
READ INST-MST NEXT
In COBOL 85, the READ statement has two formats. Format 1 is for a sequential read and Format 2 is for a keyed (random) read.
Unfortunately, the minimum READ syntax for both sequential and keyed reads is:
READ file-name
Which means if you use READ file-name the compiler will implicitly treat it as Format 1 or Format 2 depending on your SELECT statement.
READ file-name NEXT RECORD is identical to READ file-name NEXT.
Consult your actual documentation for a full explanation and discovery of possible Language Extensions from the vendor. If you read closely, the behaviour of READ file-name with no further option depends on the type of file. With a keyed file, the default is a keyed READ. Your key field (luckily) does not contain a key that exists, so you get the 23. A sketch of a matching SELECT follows.
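For context, a hedged sketch of what the SELECT for such a file might look like (the file and key names are taken from the question; the assignment name and WS-INST-STATUS are hypothetical). With ACCESS MODE IS DYNAMIC, a plain READ is the keyed format and READ ... NEXT is the sequential one:
SELECT INST-MST ASSIGN TO "INST.DAT"
    ORGANIZATION IS INDEXED
    ACCESS MODE IS DYNAMIC
    RECORD KEY IS INST-ID-REC
    ALTERNATE RECORD KEY IS INST-NAME-REC
        WITH DUPLICATES
    FILE STATUS IS WS-INST-STATUS.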
Even if it didn't work like that, what would be the point of not using the word NEXT? The compiler always knows what you tell it (which sometimes is not what you think you told it), but in a situation like this, the human reader can be very unsure. The last thing you want to do when bug-hunting is break off to look at the manual to discover exactly how a statement behaves, and then try to work out whether that behaviour was the one the original coder sought. The bug? A bug? Intended but sloppy code? No-one wants to spend that time - and look, even now, it is you.
A couple of comments on your code.
Look up the FILE STATUS clause of the SELECT. Use it. One field per file. Check it after each IO. It'll save you grief.
Once you are using the FILE STATUS, ditch the imperative parts of the IO statements (the something/NOT something) and replace them with tests of the file-status field (using 88s).
It looks like you are OPENing and CLOSEing your look-up file all the time. Please don't. OPEN and CLOSE can be very heavy and time-consuming, so do them once per file per program. If you've done that because of a problem, find a correct resolution to that problem; don't use a hack.
Drop the full-stops/periods except where they are needed. This is COBOL 85, which means that for 30 years the number of full-stops/periods required in the PROCEDURE DIVISION has been greatly reduced. Get modern and take advantage of that; it'll save you Gotcha!s as you copy/paste code, leaving in the one which shouldn't be there and changing the way the program behaves.
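Pulling those points together, a minimal sketch of the look-up paragraph (the record and file names come from the question; WS-INST-STATUS and its 88 levels are hypothetical, FILE STATUS IS WS-INST-STATUS is assumed to be in the SELECT, and OPEN/CLOSE are assumed to happen once elsewhere in the program):
*> One file-status field for INST-MST, checked after each IO.
01  WS-INST-STATUS              PIC XX.
    88  INST-IO-OK                  VALUES "00", "02".
*>  "02" is a successful read where a duplicate alternate key follows
    88  INST-RECORD-NOT-FOUND       VALUE "23".

900-GET-INST-ID.
    MOVE FALL-IN-INST TO INST-NAME-REC
    START INST-MST KEY EQUAL TO INST-NAME-REC
    IF INST-IO-OK
*>      NEXT makes the sequential read after the START explicit
        READ INST-MST NEXT
        IF INST-IO-OK
            MOVE INST-ID-REC TO WS-INST-ID
        END-IF
    END-IF
    .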

RapidMiner - unable to apply learning algorithm as Process Documents changes the regular attribute back to text

I have the following process:
Process Documents from Files (where I load the text files with their respective 6 classes) --> this connects to Set Role (which changes the text attribute to a REGULAR attribute to allow machine learning) --> Process Documents from Data (I don't need the word vectors, so I uncheck that; I keep text; within this process I tokenize, filter stopwords, stem, etc.), and then I feed this into a validation operator (Bayes/SVM).
What is happening here is that, in the example set, the text column goes back to type TEXT from regular after running Process Documents from Data, and hence I get the error "Input ExampleSet has no attributes", as there are zero regular attributes. This is causing the process to fail, and I have no idea why. I tried to set the role again after this, but then the error says "No examples in example set".
Please help - I have been stuck on this for two days!
I think I know the issue: I was applying a 10-fold X-Validation on a dataset with very few examples.

Convert SHA1 back to string

I have a user model in my app, and my password field uses SHA1. What I want is, when I get the SHA1 from the DB, to make it a string again. How do I do that?
You can't - SHA1 is a one-way hash. Given the output of SHA1(X), it is not possible to retrieve X (at least, not without a brute-force search or a dictionary/rainbow-table scan).
A very simple way of thinking about this is to imagine I give you a set of three-digit numbers to add up, and you tell me the final two digits of that sum. It's not possible from those two digits for me to work out exactly which numbers you started out with.
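To make the analogy concrete, here is a tiny sketch (the numbers are arbitrary) showing two different inputs that produce the same two-digit "hash":
def toy_hash(numbers):
    # Keep only the last two digits of the sum - a deliberately lossy "hash".
    return sum(numbers) % 100

print(toy_hash([123, 456, 789]))  # 68
print(toy_hash([111, 333, 924]))  # 68 - same output, different input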
See also
Is it possible to reverse a sha1?
Decode sha1 string to normal string
Though relating to MD5, these other questions may also enlighten you:
Reversing an MD5 Hash
How can it be impossible to “decrypt” an MD5 hash?
You can't -- that's the point of SHA1, MD5, etc. Most of those are one-way hashes for security. If they could be reversed, then anyone who gained access to your database could get all of the passwords. That would be bad.
Instead of de-hashing your database, hash the password attempt and compare that to the hashed value in the database.
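A minimal sketch of that hash-and-compare flow in Python (the variable names are made up, and a real system should prefer a salted, slow password hash such as bcrypt or scrypt over bare SHA-1):
import hashlib
import hmac

def sha1_hex(text: str) -> str:
    # Hash the UTF-8 bytes of the input and return the hex digest.
    return hashlib.sha1(text.encode("utf-8")).hexdigest()

stored_hash = sha1_hex("s3cret")   # what the DB holds
attempt = "s3cret"                 # what the user just typed

# Compare hash-to-hash; never try to recover the original password.
if hmac.compare_digest(sha1_hex(attempt), stored_hash):
    print("password accepted")
else:
    print("password rejected")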
If you're talking about this from a practical viewpoint, just give up now and consider it impossible. Finding the original string is impossible (except by accident). Most of the point of a cryptographically secure hash is to ensure you can't find any other string that produces the same hash either.
If you're interested in research into secure hash algorithms: finding a string that will produce a given hash is called a "preimage". If you can manage to find one (with reasonable computational complexity) for SHA-1, you'll probably become reasonably famous among cryptanalysis researchers. The best "break" against SHA-1 that's currently known is a way to find two input strings that produce the same hash, but 1) it's computationally quite expensive (think in terms of a number of machines running 24/7 for months at a time to find one such pair), and 2) it does not work for an arbitrary hash value -- it finds one of a special class of input strings for which a matching pair is (relatively) easy to find.
SHA is a hashing algorithm. You can compare the hash of a user-supplied input with the stored hash, but you can't easily reverse the process (rebuild the original string from the stored hash).
Unless you choose to brute-force or use rainbow tables (both extremely slow when provided with a sufficiently long input).
You can't do that with SHA-1. But, given what you need to do, you can try using AES instead. AES allows encryption and decryption.
