Python check if last digit is odd - digits

I have these .csv files, and I need to know whether the last digit of each number is even or odd.
The numbers look like "236.12". The last digit is "2", so it is even.
I tried this:
########
import pandas as pd
data = pd.read_csv('v100t.csv')
data['Par'] = data['Ask'].str.extract('(^.*?(\d+)$)')
print (data)
########
What can I do?
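A minimal sketch of one approach (using an inline DataFrame in place of v100t.csv, and assuming the 'Ask' column holds the numbers as strings): take the last character with `.str[-1]`, cast it to int, and test the parity.

```python
import pandas as pd

# Inline stand-in for v100t.csv; 'Ask' holds the numbers as strings
data = pd.DataFrame({'Ask': ['236.12', '104.37', '55.50']})

# Last character -> int -> parity test: True means the last digit is even
data['Par'] = data['Ask'].str[-1].astype(int) % 2 == 0
print(data)
```

If 'Ask' were read as floats, `data['Ask'].astype(str)` first would restore the string view this relies on.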

Related

Datastage defining ; as delimiter and !; as not a delimiter

I have a data txt file which looks like the following:
1;2;3;4;5
1;2;3!;4;4;5
I expect my output to look as follows after reading the sequential file:
1 2 3 4 5
1 2 34 4 5
Since DataStage only lets you define what the delimiter is, it doesn't detect !; as a non-delimiter.
Could someone tell me how I can overcome this problem?
One option could be to import it as a single column, remove the "!;" in a Transformer, and then use a Column Import stage to divide up the columns.
Read the data as a single string. Convert "!;" to "" using the Ereplace() or Change() function. Then parse using a Transformer loop or a Column Import stage.
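The replace-then-split idea can be sketched outside DataStage as well; here is a Python illustration of the same two steps (Ereplace()/Change() play the role of replace()):

```python
# Remove the escaped "!;" first, then split on the real delimiter ";"
lines = ["1;2;3;4;5", "1;2;3!;4;4;5"]
rows = [line.replace("!;", "").split(";") for line in lines]
for row in rows:
    print(" ".join(row))
```

This prints the two expected rows, with "3!;4" collapsing to "34".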

extract data from string in lua - SubStrings and Numbers

I'm trying to parse a string for a hobby project. I'm self-taught from code snippets on this site and am having a hard time working out this problem. I hope you guys can help.
I have a large string, containing many lines, and each line has a certain format.
I can get each line in the string using this code...
for line in string.gmatch(deckData, '[^\r\n]+') do
    print(line)
end
Each line looks something like this...
3x Rivendell Minstrel (The Hunt for Gollum)
What I am trying to do is make a table that looks something like this for the above line.
table = {}
table['The Hunt for Gollum'].card = 'Rivendell Minstrel'
table['The Hunt for Gollum'].count = 3
So my thinking was to extract everything inside the parentheses, then extract the numeric value, then delete the first few chars of the line, as it will always start with '1x ', '2x ' or '3x '.
I have tried a bunch of things.. like this...
word=str:match("%((%a+)%)")
but it fails (the match returns nil) if there are spaces...
my test code looks like this at the moment...
line = '3x Rivendell Minstrel (The Hunt for Gollum)'
num = line:gsub('%D+', '')
print(num) -- Prints "3"
card2Fetch = string.sub(line, 5)
print(card2Fetch) -- Prints "Rivendell Minstrel (The Hunt for Gollum)"
key = string.gsub(card2Fetch, "%s+", "") -- Remove all Spaces
key=key:match("%((%a+)%)") -- Fetch between ()s
print(key) -- Prints "TheHuntforGollum"
Any ideas how to get the "The Hunt for Gollum" text out of there including the spaces?
Try a single pattern capturing all fields:
x,y,z=line:match("(%d+)x%s+(.-)%s+%((.*)%)")
t = {}
t[z] = {}
t[z].card = y
t[z].count = x
The pattern reads: capture a run of digits before x, skip whitespace, capture everything until whitespace followed by open parenthesis, and finally capture everything until a close parenthesis.

pytesseract ocr: limit length of output characters using config

I am building an application in Python using OpenCV which extracts characters from an image and runs pytesseract to convert them to text.
I know that the characters are always two digits long (range 10-99). How do I configure the parameters so that single-digit outputs are not returned?
I have the following in my code:
text = pytesseract.image_to_string(Image.open(filename),config='--psm 100 --eom 3 -c tessedit_char_whitelist=0123456789')
What do I put instead of config='--psm 100 --eom 3 -c tessedit_char_whitelist=0123456789' so that it only returns 2-digit numbers (i.e. 01 but not 5)?
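Tesseract's config has no option that constrains output length, so one workaround is to post-filter the OCR result instead (note also that the valid flags are --psm, with modes 0-13, and --oem, not --eom, so --psm 100 would be rejected). A sketch with a hypothetical helper:

```python
import re

def two_digit_only(ocr_text):
    """Return the first standalone two-digit number in OCR output, or None."""
    match = re.search(r'\b\d{2}\b', ocr_text)
    return match.group(0) if match else None

print(two_digit_only("5\n42"))  # 42
print(two_digit_only("7"))      # None
```

The `\b` word boundaries keep it from matching two digits inside a longer run like "123".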

How to parse more than one sentence from text file using Stanford dependency parse?

I have a text file which has many lines. I wanted to parse all the sentences, but it seems like I read all the sentences and parse only the first one. I'm not sure where I'm making a mistake.
import nltk
from nltk.parse.stanford import StanfordDependencyParser

dependency_parser = StanfordDependencyParser(model_path="edu\stanford\lp\models\lexparser\englishPCFG.ser.gz")
txtfile = open('sample.txt', encoding="latin-1")
s = txtfile.read()
print(s)
result = dependency_parser.raw_parse(s)
for i in result:
    print(list(i.triples()))
But it gives only the parse triples for the first sentence, not the other sentences. Any help?
'i like this computer'
'The great Buddha, the .....'
'My Ashford experience .... great experience.'
[[(('i', 'VBZ'), 'nsubj', ("'", 'POS')), (('i', 'VBZ'), 'nmod', ('computer', 'NN')), (('computer', 'NN'), 'case', ('like', 'IN')), (('computer', 'NN'), 'det', ('this', 'DT')), (('computer', 'NN'), 'case', ("'", 'POS'))]]
You have to split the text first. You're currently parsing the literal text you posted with quotes and everything. This is evident by this part of the parsing result: ("'", 'POS')
To do that you seem to be able to use ast.literal_eval on each line. Note that an apostrophe (in a word like "don't") will ruin the formatting and you'll have to handle the apostrophes yourself with something like line = line[1:-1]:
import ast
from nltk.parse.stanford import StanfordDependencyParser

dependency_parser = StanfordDependencyParser(model_path="edu\stanford\lp\models\lexparser\englishPCFG.ser.gz")
with open('sample.txt', encoding="latin-1") as f:
    lines = [ast.literal_eval(line) for line in f.readlines()]
parsed_lines = [dependency_parser.raw_parse(line) for line in lines]
# now parsed_lines contains the parsed lines from the file
Try:
from nltk.parse.stanford import StanfordDependencyParser

dependency_parser = StanfordDependencyParser(model_path="edu\stanford\lp\models\lexparser\englishPCFG.ser.gz")
with open('sample.txt') as fin:
    sents = fin.readlines()
result = dependency_parser.raw_parse_sents(sents)
for parse in result:
    print(list(parse.triples()))
Do check the docstrings or the demo code in the repository for examples; they're usually very helpful.

scraping text from multiple html files into a single csv file

I have just over 1500 html pages (1.html to 1500.html). I have written a code using Beautiful Soup that extracts most of the data I need but "misses" out some of the data within the table.
My Input: e.g file 1500.html
My Code:
#!/usr/bin/env python
import glob
import codecs
from BeautifulSoup import BeautifulSoup

with codecs.open('dump2.csv', "w", encoding="utf-8") as csvfile:
    for file in glob.glob('*html*'):
        print 'Processing', file
        soup = BeautifulSoup(open(file).read())
        rows = soup.findAll('tr')
        for tr in rows:
            cols = tr.findAll('td')
            #print >> csvfile, "#".join(col.string for col in cols)
            #print >> csvfile, "#".join(td.find(text=True))
            for col in cols:
                print >> csvfile, col.string
            print >> csvfile, "==="
        print >> csvfile, "***"
Output:
One CSV file, with 1500 lines of text and columns of data. For some reason my code does not pull out all the required data but "misses" some of it; e.g. the Address1 and Address2 data at the start of the table do not come out. I modified the code to put in *** and === separators, and I then use Perl to produce a clean csv file. Unfortunately I'm not sure how to rework my code to get all the data I'm looking for!
Find the files where parameters are missing, and then try to analyse what happened.
I suspect some files have a different format, or maybe the address field really is missing.
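One likely cause of the missed cells (an assumption, since the question's HTML isn't shown): in Beautiful Soup, tag.string is None whenever a cell contains more than one child node, e.g. text plus a nested tag, which is common for address rows. get_text() (getText() in the old BeautifulSoup 3 used above) concatenates all the text instead. A small sketch with the modern bs4 package and a made-up table:

```python
from bs4 import BeautifulSoup

# Made-up cell: text plus a nested <b> tag, as address rows often have
html = "<table><tr><td>Addr: <b>Line1</b></td><td>Plain</td></tr></table>"
soup = BeautifulSoup(html, "html.parser")
cells = soup.find_all("td")

print(cells[0].string)      # None  -- .string gives up on mixed content
print(cells[0].get_text())  # Addr: Line1
print(cells[1].string)      # Plain
```

Swapping `col.string` for `col.getText()` in the script above would make such cells come out non-empty.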
