Watson Personality insight minimum number of words required Issue - watson

I'm following this Personality Insights starter, but I always get the error message below for the API call:
{"help":"http:\/\/www.ibm.com\/smarterplanet\/us\/en\/ibmwatson\/developercloud\/doc\/personality-insights\/#overviewInput","code":400,"sub_code":"S00014","error":"The number of words 2 is less than the minimum number of words required for analysis: 100"}
Here is the curl request
curl -X POST --user xxxx:yyyy --header "Content-Type: text/plain;charset=utf-8" --data-binary "profile.txt" "https://gateway.watsonplatform.net/personality-insights/api/v3/profile?version=2017-11-14"
Am I missing something here?

Personality insights requires a minimum of 100 words to work. But you won’t get a true insight until around 1,200 words (IIRC).
It’s telling you that you only supplied two words. If this isn’t the case, ensure that your JSON data is correctly escaped.

The question is old, but no one seems to have added the answer. In case someone else encounters the same error: the issue is the missing "@" when specifying the file from which the content is to be read.
From "man curl" on ubuntu 16.04
```
--data-binary
(HTTP) This posts data exactly as specified with no extra processing whatsoever.
If you start the data with the letter @, the rest should be a filename. Data is posted in a similar manner as --data-ascii
does, except that newlines and carriage returns are preserved and conversions are never done.
```
So, the request should have been
curl -X POST --user xxxx:yyyy --header "Content-Type: text/plain;charset=utf-8" --data-binary "@profile.txt" "https://gateway.watsonplatform.net/personality-insights/api/v3/profile?version=2017-11-14"
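Before sending the request, it can help to confirm that the file itself clears the 100-word minimum. With a leading "@", curl posts the contents of profile.txt; without it, curl posts the literal string "profile.txt". A minimal sketch, assuming a POSIX shell: the sample file contents are made up for illustration, and the curl line is only echoed, not executed.

```shell
# Hypothetical sample input: 120 repetitions of one word, written to a temp dir.
workdir=$(mktemp -d)
i=0
while [ "$i" -lt 120 ]; do
  printf 'word '
  i=$((i + 1))
done > "$workdir/profile.txt"

# Words in the file contents (what "@profile.txt" would post).
file_words=$(( $(wc -w < "$workdir/profile.txt") ))

# Words in the literal string "profile.txt" (what is posted without the "@").
# The service reported 2 for this string; wc splits on whitespace only, so it sees 1.
literal_words=$(( $(printf '%s' 'profile.txt' | wc -w) ))

echo "file: $file_words words, literal string: $literal_words word(s)"

if [ "$file_words" -ge 100 ]; then
  # Shown, not run: credentials and endpoint are the placeholders from the question.
  echo 'curl -X POST --user xxxx:yyyy --header "Content-Type: text/plain;charset=utf-8" --data-binary "@profile.txt" "https://gateway.watsonplatform.net/personality-insights/api/v3/profile?version=2017-11-14"'
fi
```

Either way, the literal string falls far short of the minimum, which matches the error in the question.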

Related

Ag / Grep Exact Match Only Search

I am having an issue with using Ag (The Silver Searcher)...
In the docs it says to use -Q for exact match, but I don't understand why it does not work for my purposes. If I type something like ag -Q actions or ag -Q 'actions' into my terminal, it returns all instances of actions, including things like transactions and any other strings that actions is part of.
I have tried a couple other combinations of flags from the docs, including -s and -S, among others, but still I cannot get strictly strings matching just actions to return for me.
I can't get this to work with grep either. Does anyone know how I can get what I need with ag? (or even with grep)...?
Thank you in advance!
Because ag (and grep) find files that contain something. ag -Q means to interpret the search as an exact literal string, not a fuzzy string or a regex. Okay. But a file that has the word "transactions" in it contains exactly, literally the character sequence actions. Sure, it contains more than that too, but that's not surprising.
Probably you're looking for a word-boundary search, grep '\bactions\b' or ag -w -Q actions (maybe ag -w -Q -s actions). But that is not at all the same thing as "just actions", it's a specific requirement on the things surrounding "actions" (namely that they be the beginning or end of a line, or non-letter characters). You have to tell the computer what you actually mean.
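The difference between a literal match and a word-boundary match can be seen with grep alone. A small sketch, assuming a POSIX shell and a made-up sample file; grep -F plays the role of ag -Q here, and -w adds the boundary requirement:

```shell
# Hypothetical sample file for illustration.
sample=$(mktemp)
printf '%s\n' 'actions speak louder' 'list of transactions' 'no match here' 'user-actions log' > "$sample"

# -F: literal string; "actions" inside "transactions" still counts as a match.
substring_hits=$(grep -cF 'actions' "$sample")

# -w: the match must not touch letters, digits or "_" on either side.
word_hits=$(grep -wF 'actions' "$sample")

echo "lines matching as a substring: $substring_hits"
echo "$word_hits"
```

Note that "user-actions" still matches with -w, because "-" is not a word character. That is the "specific requirement on the things surrounding actions" in practice.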

How to find match of words with reoccuring character in a file

This might seem like a question that has already been answered, so pardon me if that's the case, but I can't seem to find a clear answer or explanation on how to find words in a file that contain a character a specified number of times (e.g. words containing the character '-' 3 times, such as 'long-and-complex-word').
I'm aware that it is possible to use the command
grep -oE '.{n}'
to find words with consecutive repetitions of a character, but I'm looking for a way to find repetitions of a character in no particular order.
Here are the commands that I've tried that aren't working
grep -E '*[-]*[-]*[-]*' file
grep -Ex '* \-* \-* \ -*' file
Thanks.
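One possible approach, offered as a sketch rather than a definitive answer: split the text into one word per line, then keep the lines that contain exactly three '-' characters. This assumes that "word" means a whitespace-separated token; the sample text is made up for illustration.

```shell
# Hypothetical sample text for illustration.
sample=$(mktemp)
printf '%s\n' 'a long-and-complex-word here' 'short a-b not-this-one-at-all' 'x---' > "$sample"

# 1. tr puts each whitespace-separated word on its own line.
# 2. grep keeps lines with exactly three "-", with any non-"-" characters around them.
matches=$(tr -s '[:space:]' '\n' < "$sample" | grep -E '^[^-]*(-[^-]*){3}$')

echo "$matches"
```

Changing {3} to another count adjusts the number of occurrences, and replacing "-" in both bracket expressions selects a different character.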

Grep command to extract email addresses

So I am using the following grep command to search over a lot of files.
I want to have it pattern match against just the validly formatted email addresses.
grep -rnw ./ -e "email here"
My question is, what syntax should I use instead of "email here" to specify the pattern for a validly formatted email? Would that be some sort of regex?
Good thinking about using a regex; you can find a lot of examples on the web.
Here is an example you can use:
[-0-9a-zA-Z.+_]+@[-0-9a-zA-Z.+_]+\.[a-zA-Z]{2,4}
If you have any problems using regexes, Wikipedia is still a good place to learn (check the "Character classes" paragraph):
http://en.wikipedia.org/wiki/Regular_expression#Formal_language_theory
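A short sketch of how such a pattern could be plugged into grep, assuming a POSIX shell and grep with -E and -o support. The sample directory and its contents are made up for illustration, and the pattern uses "@" as the separator between local part and domain.

```shell
# Hypothetical sample directory for illustration.
sample_dir=$(mktemp -d)
printf '%s\n' 'contact alice@example.com for details' 'no address on this line' 'cc: bob.smith+tag@mail.example.org' > "$sample_dir/notes.txt"

pattern='[-0-9a-zA-Z.+_]+@[-0-9a-zA-Z.+_]+\.[a-zA-Z]{2,4}'

# -r recurse, -h hide file names, -o print only the matched text, -E extended regex.
emails=$(grep -rhoE "$pattern" "$sample_dir")

echo "$emails"
```

The {2,4} limit on the last part means longer top-level domains are not matched in full, so the pattern is a rough filter rather than a complete validator.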

what is appropriate for me? generateAllGrams() or is generateCollocations() enough for me?

I am developing a project on a WordNet-based document summarizer, in which I need to extract collocations. I tried to research as much as I could, but since I have not worked with Mahout before, I am having difficulty understanding how CollocDriver.java works (in the API context).
While scouring the web, I landed on this:
Mahout Collocations
This is the problem: I have POS-tagged input text and I need to identify collocations in it. I have the CollocDriver.java code; now I need to know how to use it. Should I use the generateAllGrams() method, or is generateCollocations() alone enough for my subtask within the summarizer?
And most importantly, how do I use it? I ask because I admit I don't know the API well.
I also found a grepcode version of CollocDriver, and the two implementations seem to be slightly different. The inputs are strings in the grepcode version and Path objects in the original.
My questions: what is the configuration object in the input parameters, and how do I use it? Will the source and destination be strings (as in grepcode) or Path objects (as in the original)?
What will the output be?
I have done some further research on the CollocDriver program and found that it uses a sequence file and then vector generation. I would like to know how this sequence file / vector generation works. Please help.
To get collocations using Mahout, you need to follow a few simple steps.
1) Make a sequence file from your input text file:
/bin/mahout seqdirectory -i /home/developer/Desktop/colloc/ -o /home/developer/Desktop/colloc/test-seqdir -c UTF-8 -chunk 5
2) There are two ways to generate collocations from a sequence file:
a) Convert the sequence file to sparse vectors and find the collocations.
b) Find the collocations directly from the sequence file (without creating the sparse vectors).
3) Here I am considering choice b:
/bin/mahout org.apache.mahout.vectorizer.collocations.llr.CollocDriver -i /home/developer/Desktop/colloc/test-seqdir -o /home/developer/Desktop/colloc/test-colloc -a org.apache.mahout.vectorizer.DefaultAnalyzer -ng 3 -p
Then check the output folder; the files you need are there (in sequence file format).
/bin/mahout seqdumper -s /home/developer/Desktop/colloc/test-colloc/ngrams/part-r-00000 >> out.txt will give you text output.

How to determine which pattern in a file matched with grep?

I use procmail to do extensive sorting on my inbox. My next to last recipe matches the incoming From: to a (very) long white/gold list of historically good email addresses, and patterns of email addresses. The recipe is:
# Anything on the goldlist goes straight to inbox
:0
* ? formail -zxFrom: -zxReply-To | fgrep -i -f $HOME/Mail/goldlist
{
LOG="RULE Gold: "
:0:
$DEFAULT
}
The final recipe puts everything left in a suspect folder to be examined as probable spam. Goldlist is currently 7384 lines long (yikes...). Every once in a while, I get a piece of spam that has slipped through and I want to fix the failing pattern. I thought I read a while ago about a special flag in grep that helped show the matching patterns, but I can't find that again. Is there a way to use grep that shows the pattern from a file that matched the scanned text? Or another similar tool that would answer the question short of writing a script to scan pattern by pattern?
grep -o will output only the matched text (as opposed to the whole line). That may help. Otherwise, I think you'll need to write a wrapper script to try one pattern at a time.
I'm not sure if this will help you or not. There is a "-o" parameter to output only the matching expression.
From the man page:
-o, --only-matching
Show only the part of a matching line that matches PATTERN.
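A wrapper that tries one pattern at a time could look like the sketch below. It assumes a POSIX shell and uses grep -iF, which is the equivalent of fgrep -i. The goldlist contents and the From: value are made up for illustration; in practice the value would come from formail and the list from $HOME/Mail/goldlist.

```shell
# Hypothetical goldlist and From: value for illustration.
goldlist=$(mktemp)
printf '%s\n' 'alice@example.com' '@trusted.example.org' 'newsletter' > "$goldlist"
from_value='Weekly Newsletter <promo@spam.example.net>'

# Test each goldlist line separately and print the ones that match.
matched=$(
  while IFS= read -r pattern; do
    # Skip blank lines: an empty fixed-string pattern matches everything.
    [ -n "$pattern" ] || continue
    if printf '%s\n' "$from_value" | grep -qiF -e "$pattern"; then
      printf '%s\n' "$pattern"
    fi
  done < "$goldlist"
)

echo "matching pattern(s): $matched"
```

This runs one grep per goldlist line, so it is slow for thousands of patterns, but it only needs to run when investigating a message that slipped through.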
