pandas parse dates from csv
I am trying to read a CSV file that includes dates. The CSV looks like this:
h1,h2,h3,h4,h5
A,B,C,D,E,20150420
A,B,C,D,E,20150420
A,B,C,D,E,20150420
To read the CSV I use this code:
df = pd.read_csv(filen,
                 index_col=None,
                 header=0,
                 parse_dates=[5],
                 date_parser=lambda t: parse(t))
The parse function looks like this:
import datetime

def parse(t):
    string_ = str(t)
    try:
        return datetime.date(int(string_[:4]), int(string_[4:6]), int(string_[6:]))
    except ValueError:
        return datetime.date(1900, 1, 1)
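For what it's worth, called with a single well-formed string the function does behave as intended; a quick interactive check (assuming parse as defined above):

>>> parse("20150420")
datetime.date(2015, 4, 20)
>>> parse("garbage")  # anything unparseable falls back to the sentinel
datetime.date(1900, 1, 1)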
My strange problem is that inside the parse function, t looks like this:

ndarray: ['20150420' '20150420' '20150420']

As you can see, t is the whole array of the date column. I would have expected it to be only the first value when parsing the first row, the second value when parsing the second row, and so on. Right now the parse always ends up in the except block, because int(string_[:4]) is handed a bracket, which obviously cannot be converted to an int. The parse function was built to parse one date at a time (e.g. 20150420) in the first place.
What am I doing wrong?
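One way to confirm the diagnosis is to wrap the parser so it prints what it is handed; a minimal sketch, assuming the sample is saved as tmp.csv and a pandas version that still accepts date_parser:

import pandas as pd

def debug_parse(t):
    # pandas calls date_parser with whole arrays first and only falls
    # back to per-value calls if that raises
    print(type(t), t)
    return pd.to_datetime(t, format='%Y%m%d', errors='coerce')

df = pd.read_csv("tmp.csv", header=None, skiprows=1,
                 parse_dates=[5], date_parser=debug_parse)

The first print shows the whole column at once, which matches the ndarray in the question.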
EDIT:
Okay, I just read the pandas docs on the date_parser argument, and it does work as documented (of course ;)). So I need to adapt my code to that. My example above was copy-and-pasted from somewhere else, and I expected it to work; hence my question. I will report back once I have adapted my code.
EDIT2:
My parse function now looks like this, and I think the code works now. If I am still doing something wrong, please let me know:
def parse(t):
    ret = []
    for ts in t:
        string_ = str(ts)
        try:
            tsdt = datetime.date(int(string_[:4]), int(string_[4:6]), int(string_[6:]))
        except ValueError:
            tsdt = datetime.date(1900, 1, 1)
        ret.append(tsdt)
    return ret
There are six columns but only five titles in the first line. This is why parse_dates failed. You can skip the first line:
df = pd.read_csv("tmp.csv", header=None, skiprows=1, parse_dates=[5])
You can try this parser (errors='coerce' replaces the coerce=True spelling used in older pandas):

parser = lambda x: pd.to_datetime(x, format='%Y%m%d', errors='coerce')
and use
df = pd.read_csv(filen,
                 index_col=None,
                 header=0,
                 parse_dates=[5],
                 date_parser=parser)
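Note that date_parser itself is deprecated as of pandas 2.0, so a forward-compatible sketch (again assuming the sample is saved as tmp.csv) is to read first and parse the column afterwards:

import pandas as pd

# read without the mismatched header row, then convert column 5
# explicitly; invalid values become NaT instead of raising
df = pd.read_csv("tmp.csv", header=None, skiprows=1)
df[5] = pd.to_datetime(df[5].astype(str), format='%Y%m%d', errors='coerce')

If the 1900-01-01 sentinel from the question is still wanted, df[5].fillna(pd.Timestamp(1900, 1, 1)) restores it.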
Related
Convert a .csv file into a 2D table in Lua
As the title suggests, I'd like to know how to convert a .csv file in Lua into a 2D table. So, for example, say I have a .csv file that looks like this:

0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
0,-1,-1,-1,-1,-1,-1,-1,-1,0,0,-1,-1,-1,-1,-1,-1,-1,-1,0
0,-1,-1,-1,-1,-1,-1,-1,-1,0,0,-1,-1,-1,-1,-1,-1,-1,-1,0
0,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,0
0,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,0
0,0,-1,-1,-1,-1,-1,-1,0,0,0,0,-1,-1,-1,-1,-1,-1,0,0
0,0,0,-1,-1,-1,-1,0,0,0,0,0,0,-1,-1,-1,-1,0,0,0
0,0,0,0,-1,-1,0,0,0,0,0,0,0,0,-1,-1,0,0,0,0
0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0

How would I convert it into something like this?

local example_table = {
  {0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0},
  {0,-1,-1,-1,-1,-1,-1,-1,-1,0,0,-1,-1,-1,-1,-1,-1,-1,-1,0},
  {0,-1,-1,-1,-1,-1,-1,-1,-1,0,0,-1,-1,-1,-1,-1,-1,-1,-1,0},
  {0,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,0},
  {0,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,0},
  {0,0,-1,-1,-1,-1,-1,-1,0,0,0,0,-1,-1,-1,-1,-1,-1,0,0},
  {0,0,0,-1,-1,-1,-1,0,0,0,0,0,0,-1,-1,-1,-1,0,0,0},
  {0,0,0,0,-1,-1,0,0,0,0,0,0,0,0,-1,-1,0,0,0,0},
  {0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0},
  {0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0}
}

Your help will be greatly appreciated.
1. Don't underestimate CSV. If you need it to be generic, get a proper CSV parsing library. If you do the parsing yourself, you will miss lots of special cases that can happen, so it's only suitable for cases where you know your data and would notice if something went wrong.

2. Changing the file. If you want the equivalent Lua code as output, assuming you're doing the parsing in Lua, you could do something like this (gsub, not gmatch, is the string function that takes a replacement callback):

local input = get_input_somehow() -- probably using io.open, etc.
local output = "local example_table = {\n"
    .. input:gsub("[^\n]+", function(line) return "{" .. line .. "};" end)
    .. "\n}"
save_output_somehow(output) -- probably just write to a new file

3. Parsing CSV into a table. If you want to read the CSV file directly into a Lua table, you could instead iterate over the lines with gmatch:

local input = get_input_somehow() -- probably using io.open, etc.
local output = {}
for line in input:gmatch("[^\n]+") do
  local row = {}
  table.insert(output, row)
  for item in line:gmatch("[^,]+") do
    table.insert(row, tonumber(item))
  end
end
do_something_with(output) -- whatever you need your data for
LUA: How to Create 2-dimensional array/table from string
I see several posts about making a string into a Lua table, but my problem is a little different [I think] because there is an additional dimension to the table. I have a table of tables saved as a file [I have no issue reading the file into a string]. Let's say we start from this point:

local tot = "{{1,2,3}, {4,5,6}}"

When I try the answers from other users, I end up with:

local OneDtable = {"{1,2,3}, {4,5,6}"}

This is not what I want. How can I properly create a table that contains those tables as entries? Desired result:

TwoDtable = {{1,2,3}, {4,5,6}}

Thanks in advance
You can use the load function to read the content of your string as Lua code:

local myArray = "{{1,2,3}, {4,5,6}}"
local convert = "myTable = " .. myArray
local convertFunction = load(convert)
convertFunction()
print(myTable[1][1])

Now, myTable has the values in a 2-dimensional array.
For a quick solution I suggest going with the load hack, but be aware that this only works if your string happens to be formatted as a Lua table already. Otherwise, you'd have to parse the string yourself. For example, you could try using lpeg to build a recursive parser. I built something very similar a while ago:

local lpeg = require 'lpeg'

local name = lpeg.R('az')^1 / '\0'
local space = lpeg.S('\t ')^1

local function compile_tuple(...)
  return string.char(select('#', ...)) .. table.concat{...}
end

local expression = lpeg.P {
  'e';
  e = name + lpeg.V 't';
  t = '(' * ((lpeg.V 'e' * ',' * space)^0 * lpeg.V 'e') / compile_tuple * ')';
}

local compiled = expression:match '(foo, (a, b), bar)'
print(compiled:byte(1, -1))

Its purpose is to parse things in quotes like the example string (foo, (a, b), bar) and turn it into a binary string describing the structure; most of that happens in the compile_tuple function though, so it should be easy to modify it to do what you want.

What you'd have to adapt:

- change name to number (and change the pattern accordingly to lpeg.R('09')^1, without the / '\0')
- change the compile_tuple function to a build_table function (local function build_table(...) return {...} end should do the trick)

Try it out and see if something else needs to be changed; I might have missed something. You can read the lpeg manual here if you're curious about how this stuff works.
Reading a column file of x y z into table in Lua
I've been trying to find my way through Lua. I have a file containing N lines of numbers, 3 per line; they are actually x, y, z coordinates. I could make it a CSV file and use some Lua CSV parser, but I guess it's better if I learn how to do this regardless. So what would be the best way to deal with this?

So far I am able to read each line into a table with the code snippet below, but 1) I don't know whether this is a table of strings or of numbers, and 2) if I print tbllinesx[1], it prints the whole line of three numbers. I would like tbllines[1][1], tbllines[1][2] and tbllines[1][3] to correspond to the first three numbers of the first line of my file.

local file = io.open("locations.txt")
local tbllinesx = {}
local i = 0

if file then
  for line in file:lines() do
    i = i + 1
    tbllinesx[i] = line
  end
  file:close()
else
  error('file not found')
end
From Programming in Lua, https://www.lua.org/pil/21.1.html:

You can call read with multiple options; for each argument, the function will return the respective result. Suppose you have a file with three numbers per line:

6.0     -3.23   15e12
4.3     234     1000001
...

Now you want to print the maximum of each line. You can read all three numbers in a single call to read:

while true do
  local n1, n2, n3 = io.read("*number", "*number", "*number")
  if not n1 then break end
  print(math.max(n1, n2, n3))
end

In any case, you should always consider the alternative of reading the whole file with option "*all" from io.read and then using gfind to break it up (gfind is the Lua 5.0 name; in Lua 5.1 and later it is string.gmatch):

local pat = "(%S+)%s+(%S+)%s+(%S+)%s+"
for n1, n2, n3 in string.gfind(io.read("*all"), pat) do
  print(math.max(n1, n2, n3))
end

I'm sure you can figure out how to modify this to put the numbers into table fields on your own. If you're using three captures you can just use table.pack to create your line table with three entries.
Assuming you only have valid lines in your data file (locations.txt), all you need is to change the line

tbllinesx[i] = line

to

tbllinesx[i] = { line:match '(%d+)%s+(%d+)%s+(%d+)' }

This will put each of the three space-delimited numbers into its own spot in a table, for each line separately.

Edit: the repeated %d+ parts of the pattern will need to be adjusted according to your actual input. %d+ assumes plain integers; you need something more involved for a possible minus sign (%-?%d+) and for a possible dot (%-?%d-%.?%d+), and so on. Of course, the easy way would be to grab everything that is not a space (%S+) as a potential number.
prolog: parsing a sentence and generating a response in a simple language parser
So far I have the following working:

gen_phrase(S1,S3,Cr) :-
    noun_phrase(S1,S2,Cr1), verb_phrase(S2,S3,Cr2),
    append([Cr1],[Cr2],Cr), add_rule(Cr).

question_phrase(S1,S5,Cr) :-
    ist(S1,S2), noun_phrase(S2,S3,Cr1),
    noun_phrase(S3,S4,Cr2),
    append([Cr1],[Cr2],Cr).

add_rule([X,Y]) :-
    Fact =.. [Y, X],
    assertz(Fact).

Given a test run, the code generates the following:

1 ?- gen_phrase([the,comp456,is,a,computing,course],S3,Cr).
S3 = []
Cr = [comp456, computing_course].

add_rule(Cr) asserts the existence of the predicate computing_course(comp456). Now what I would like to do is ask a question:

4 ?- question_phrase([is,the,comp456,a,computing,course],X,Cr).
Cr = [comp456, computing_course] .

What I need to do is extract computing_course and comp456, which I can do, then convert them into a form accepted by Prolog. This should look like Y(X), where Y = computing_course is a predicate and X = comp456 is an atom. The result should be something similar to:

2 ?- computing_course(comp456).
true.

And later on, for questions like "What are computing courses":

3 ?- computing_course(X).
X = comp456.

I thought about using assertz, however I still do not know how to call the predicate once it is constructed. I am having a hard time finding what steps need to be taken to accomplish this. (Using SWI-Prolog.)

Edit: I have realized that there is a predicate call(). However, I would like to construct something like this:

ask([X,Y]) :- call(Y(X)).

2 ?- gen_phrase([a,comp456,is,a,computing,course],S3,Cr).
S3 = [],
Cr = [comp456, computing_course]

4 ?- question_phrase([is,the,comp456,a,computing,course],X,Cr),ask(Cr).
ERROR: toplevel: Undefined procedure: ask/1 (DWIM could not correct goal)

It doesn't appear that such a call() is syntactically correct. It would be good to know whether this is at all possible, and how.
call/N is what you need (here N == 2):

ask([X,Y]) :- call(Y,X).

You could as well use something very similar to what you already use in add_rule/1:

ask([X,Y]) :- C =.. [Y,X], call(C).

The first form is more efficient, and it is also the standardized one.
Parsing a file with BodyParser in Scala Play20 with new lines
Excuse the n00bness of this question, but I have a web application where I want to send a potentially large file to the server and have it parse the format. I'm using the Play20 framework and I'm new to Scala.

For example, if I have a csv, I'd like to split each row by "," and ultimately create a List[List[String]] with each field. Currently, I'm thinking the best way to do this is with a BodyParser (but I could be wrong). My code looks something like:

Iteratee.fold[String, List[List[String]]]() { (result, chunk) =>
  result = chunk.splitByNewLine.splitByDelimiter // pseudocode
}

My first question is, how do I deal with a situation like the one below, where a chunk has been split in the middle of a line:

Chunk 1:
1,2,3,4\n
5,6

Chunk 2:
7,8\n
9,10,11,12\n

My second question is, is writing my own BodyParser the right way to go about this? Are there better ways of parsing this file? My main concern is that I want to allow the files to be very large, so I can flush a buffer at some point and not keep the entire file in memory.
If your csv doesn't contain escaped newlines, then it is pretty easy to do progressive parsing without putting the whole file into memory. The iteratee library comes with a method search inside play.api.libs.iteratee.Parsing:

def search(needle: Array[Byte]): Enumeratee[Array[Byte], MatchInfo[Array[Byte]]]

which will partition your stream into Matched[Array[Byte]] and Unmatched[Array[Byte]].

Then you can combine a first iteratee that takes a header and another that will fold over the unmatched results. This should look like the following code:

// break at each match and concat unmatches and drop the last received element (the match)
val concatLine: Iteratee[Parsing.MatchInfo[Array[Byte]],String] =
  ( Enumeratee.breakE[Parsing.MatchInfo[Array[Byte]]](_.isMatch) ><>
    Enumeratee.collect{ case Parsing.Unmatched(bytes) => new String(bytes)} &>>
    Iteratee.consume() ).flatMap(r => Iteratee.head.map(_ => r))

// group chunks using the above iteratee and do simple csv parsing
val csvParser: Iteratee[Array[Byte], List[List[String]]] =
  Parsing.search("\n".getBytes) ><>
  Enumeratee.grouped( concatLine ) ><>
  Enumeratee.map(_.split(',').toList) &>>
  Iteratee.head.flatMap( header => Iteratee.getChunks.map(header.toList ++ _) )

// an example of a chunked simple csv file
val chunkedCsv: Enumerator[Array[Byte]] = Enumerator("""a,b,c
""","1,2,3","""
4,5,6
7,8,""","""9
""") &> Enumeratee.map(_.getBytes)

// get the result
val csvPromise: Promise[List[List[String]]] = chunkedCsv |>>> csvParser

// eventually returns List(List(a, b, c),List(1, 2, 3), List(4, 5, 6), List(7, 8, 9))

Of course you can improve the parsing. If you do, I would appreciate it if you shared it with the community.

So your Play2 controller would be something like:

val requestCsvBodyParser = BodyParser(rh => csvParser.map(Right(_)))

// progressively parse the big uploaded csv like file
def postCsv = Action(requestCsvBodyParser){ rq: Request[List[List[String]]] =>
  //do something with data
}
If you don't mind holding twice the size of List[List[String]] in memory, then you could use a body parser like play.api.mvc.BodyParsers.parse.tolerantText:

def toCsv = Action(parse.tolerantText) { request =>
  val data = request.body
  val reader = new java.io.StringReader(data)
  // use a Java CSV parsing library like http://opencsv.sourceforge.net/
  // to transform the text into CSV data
  Ok("Done")
}

Note that if you want to reduce memory consumption, I recommend using Array[Array[String]] or Vector[Vector[String]], depending on whether you want to deal with mutable or immutable data.

If you are dealing with a truly large amount of data (or lots of requests of medium-size data) and your processing can be done incrementally, then you can look at rolling your own body parser. That body parser would not generate a List[List[String]] but instead parse the lines as they come and fold each line into the incremental result. But this is quite a bit more complex to do, in particular if your CSV uses double quotes to support fields with commas, newlines or double quotes.