How to use the load command with MySQL in Ruby/Rails? - ruby-on-rails

In MySQL, I can run:
LOAD DATA LOCAL INFILE "/home/pt/test/bal.csv" INTO TABLE bal FIELDS
TERMINATED BY ',' ENCLOSED BY '"' LINES TERMINATED BY '"' IGNORE 1
LINES;
However, in my Ruby program:
str="LOAD DATA LOCAL INFILE \"/home/pt/test/bal.csv\" INTO TABLE bal FIELDS TERMINATED BY ',' ENCLOSED BY ' \"' LINES TERMINATED BY '\"' IGNORE 1 LINES;"
puts str
dbh.query(str)
The output is:
LOAD DATA LOCAL INFILE "/home/pt/test/bal.csv" INTO TABLE bal FIELDS
TERMINATED BY ',' ENCLOSED BY ' "' LINES TERMINATED BY '"' IGNORE 1
LINES;
/home/pt/test/ptb.rb:34:in `query': Field separator argument is not what
is expected; check the manual (Mysql::Error)
from /home/pt/test/ptb.rb:34:in `<main>'
What's wrong with this code?

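Remove the space in ENCLOSED BY ' \"'. MySQL requires the ENCLOSED BY value to be a single character, and ' "' is two characters, which is what triggers "Field separator argument is not what is expected":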
str="LOAD DATA LOCAL INFILE \"/home/pt/test/bal.csv\" INTO TABLE bal FIELDS TERMINATED BY ',' ENCLOSED BY '\"' LINES TERMINATED BY '\n' IGNORE 1 LINES;"
You may also find Ruby's %Q[] syntax useful. It works like a double-quoted string "", but you don't need to escape " inside it:
str=%Q[LOAD DATA LOCAL INFILE "/home/pt/test/bal.csv" INTO TABLE bal FIELDS TERMINATED BY ',' ENCLOSED BY '"' LINES TERMINATED BY '\n' IGNORE 1 LINES;]
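A minimal sketch of the same idea, wrapped in a helper so the single-character rule is easy to check. The method names load_data_sql and enclosed_by are hypothetical; the path and table come from the question. The last piece uses %q[] (single-quote rules) so that \n reaches MySQL as a backslash-n, which MySQL itself interprets as a newline.

```ruby
# Builds the answer's LOAD DATA statement; %Q[] means the double quotes
# need no escaping. %q[] keeps '\n' as a literal backslash-n for MySQL.
def load_data_sql(path, table)
  %Q[LOAD DATA LOCAL INFILE "#{path}" INTO TABLE #{table} ] +
    %Q[FIELDS TERMINATED BY ',' ENCLOSED BY '"' ] +
    %q[LINES TERMINATED BY '\n' IGNORE 1 LINES;]
end

# Returns whatever sits between the quotes after ENCLOSED BY.
def enclosed_by(sql)
  sql[/ENCLOSED BY '(.*?)'/, 1]
end

sql = load_data_sql("/home/pt/test/bal.csv", "bal")
puts sql
puts enclosed_by(sql).length                        # => 1
puts enclosed_by("ENCLOSED BY ' \"' LINES").length  # => 2 (the stray space)
```

In the question's program this would be used as dbh.query(load_data_sql("/home/pt/test/bal.csv", "bal")). Interpolating the path is fine for a fixed path, but not for untrusted input.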

Related

MemSQL load data infile does not support hexadecimal delimiter

MySQL's LOAD DATA INFILE command works well with a hexadecimal delimiter such as X'01', or X'1e' in my case. But the same LOAD DATA INFILE command can't be run on MemSQL.
I tried specifying the same delimiter \x1e in various forms:
'0x1e' or 0x1e
X'1e'
'\x1e' or 'x1e'
None of these work; each throws either a syntax error or another error, such as the following.
This one looks like the delimiter can't be resolved correctly:
mysql> load data local infile '/container/data/sf10/region.tbl.hex' into table REGION CHARACTER SET utf8 fields terminated by '\x1e' lines terminated by '\n';
ERROR 1261 (01000): Row 1 doesn't contain data for all columns
This is syntax error:
mysql> load data local infile '/container/data/sf10/region.tbl.hex' into table REGION CHARACTER SET utf8 fields terminated by 0x1e lines terminated by '\n';
ERROR 1064 (42000): You have an error in your SQL syntax; check the manual that corresponds to your MySQL server version for the right syntax to use near '0x1e lines terminated by '\n'' at line 1
mysql>
The data is actually delimited by the non-printable character \x1e, and lines are terminated by a regular \n. Running cat -A shows the delimiter characters as ^^, so the delimiter should be correct:
$ cat -A region.tbl.hex
0^^AFRICA^^lar deposits. blithely final packages cajole. regular waters are final requests. regular accounts are according to $
1^^AMERICA^^hs use ironic, even requests. s$
Is there a correct way to use hex values as a delimiter? I can't find this information in the documentation.
For comparison, the hex delimiter (0x1e) works well on MySQL:
mysql> load data local infile '/tmp/region.tbl.hex' into table region CHARACTER SET utf8 fields terminated by 0x1e lines terminated by '\n';
Query OK, 5 rows affected (0.01 sec)
Records: 5 Deleted: 0 Skipped: 0 Warnings: 0
MemSQL supports hex delimiters as of version 6.7, in the form used in the last code block of your question (fields terminated by 0x1e). Before that, you would need the literal 0x1e character, quoted, in your SQL string, which is awkward to type from a CLI. If you're on an older version, you may need to upgrade.
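Building the statement in code sidesteps the CLI problem. The sketch below produces both forms the answer describes. The method names load_sql_hex and load_sql_literal are hypothetical, the question's CHARACTER SET utf8 clause is omitted for brevity, and it assumes the client library passes the string to the server unchanged.

```ruby
# The question's delimiter: ASCII 0x1E (record separator), shown as ^^ by cat -A.
RS = "\x1e"

# MemSQL 6.7+ (and MySQL): hex-literal delimiter, unquoted.
def load_sql_hex(path, table)
  "LOAD DATA LOCAL INFILE '#{path}' INTO TABLE #{table} " \
    "FIELDS TERMINATED BY 0x1e LINES TERMINATED BY '\\n';"
end

# Older MemSQL: the literal 0x1E byte placed inside the quoted delimiter.
def load_sql_literal(path, table)
  "LOAD DATA LOCAL INFILE '#{path}' INTO TABLE #{table} " \
    "FIELDS TERMINATED BY '#{RS}' LINES TERMINATED BY '\\n';"
end

# Sanity check: a sample line from region.tbl.hex splits into three fields.
sample = "1\x1eAMERICA\x1ehs use ironic, even requests. s"
p sample.split(RS)  # => ["1", "AMERICA", "hs use ironic, even requests. s"]
```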

How to deal with string-like values between xml tags in an xtext grammar

I am attempting to create a strongly defined XML language, but have run into trouble with element values between element tags. I want them to be treated like strings, except that they are not wrapped in quotes. Here is a basic grammar I created to demonstrate the idea:
grammar org.xtext.example.myxml.MyXml hidden(WS)
generate myXml "http://www.xtext.org/example/myxml/MyXml"
import "http://www.eclipse.org/emf/2002/Ecore" as ecore
Element:
{Element}
'<Element' attributes+=ElementAttribute* ('/>' | '>'
subElement+=SubElement*
'</Element' '>')
;
SubElement:
{SubElement}
'<SubElement' attributes+=SubElementAttribute* ('/>' | '>'
value=ElementValue
'</SubElement' '>')
;
ElementAttribute:
NameAttribute | TypeAttribute
;
SubElementAttribute:
NameAttribute
;
TypeAttribute:
'type' '=' type=STRING
;
NameAttribute:
'name' '=' name=STRING
;
ElementValue hidden():
value=ID
;
terminal STRING:
'"' ( '\\' . /* 'b'|'t'|'n'|'f'|'r'|'u'|'"'|"'"|'\\' */ | !('\\'|'"') )* '"' |
"'" ( '\\' . /* 'b'|'t'|'n'|'f'|'r'|'u'|'"'|"'"|'\\' */ | !('\\'|"'") )* "'"
;
terminal WS: (' '|'\t'|'\r'|'\n')+;
terminal ID: '^'?('a'..'z'|'A'..'Z'|'_'|'0'..'9'|':'|'-'|'('|')')*;
Here is a test to demonstrate its usage:
@Test
def void parseXML() {
val result = parseHelper.parse('''
<Element type="myType" name="myName">
<SubElement>some string:like-stuff here </SubElement>
</Element>
''')
Assert.assertNotNull(result)
val errors = result.eResource.errors
for (error : errors) {
println(error.message)
}
}
The error I get from this exact code is mismatched input 'string:like-stuff' expecting '</SubElement'
This fails because ID does not allow whitespace. Adding whitespace to ID fixes the error above, but causes other parsing problems. So my question is: how can I parse the element value into a string-like representation without creating ambiguity for the parser elsewhere? The only way I have been able to get this to work in my full language is by turning ElementValue into a list of IDs separated by whitespace. (I could not get that to work in this minimal example, though; I'm not sure what is different.)
I would not really recommend it, because Xtext is usually not the best fit for parsing XML, but it should be possible by turning ElementValue into a datatype rule that accepts every token that doesn't create an ambiguity.
Something along the lines of:
ElementValue returns ecore::EString hidden(): (ID|WS|STRING|UNMATCHED)+ ;
and at the end of the grammar:
terminal UNMATCHED: .;
You will probably want to make SubElement.value optional to allow for an empty element.
value=ElementValue?
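To see why the catch-all UNMATCHED terminal helps, here is a toy model of the idea in Ruby; it is not Xtext. A hypothetical lexer mirrors the grammar's terminals, and element_value mimics the datatype rule by returning the concatenated text of the ID, WS, STRING and UNMATCHED tokens it consumes, stopping at the '</SubElement' keyword. The lexer picks the first matching terminal in list order, whereas the ANTLR-generated lexer Xtext uses prefers the longest match; both choose the same tokens here. ID uses + rather than the grammar's * so it cannot match the empty string.

```ruby
# Terminals in priority order (first match wins in this toy lexer).
TERMINALS = [
  [:CLOSE_TAG, /\A<\/SubElement/],          # keyword '</SubElement'
  [:WS,        /\A[ \t\r\n]+/],
  [:STRING,    /\A"(?:\\.|[^\\"])*"/],
  [:ID,        /\A\^?[a-zA-Z_0-9:\-()]+/],
  [:UNMATCHED, /\A./m]                      # terminal UNMATCHED: . ;
].freeze

def lex(input)
  tokens = []
  until input.empty?
    kind, regex = TERMINALS.find { |_, re| input.match?(re) }
    text = input[regex]
    tokens << [kind, text]
    input = input[text.length..-1]
  end
  tokens
end

# ElementValue returns ecore::EString hidden(): (ID|WS|STRING|UNMATCHED)+ ;
VALUE_KINDS = %i[ID WS STRING UNMATCHED].freeze

def element_value(tokens)
  consumed = tokens.take_while { |kind, _| VALUE_KINDS.include?(kind) }
  consumed.map { |_, text| text }.join
end

p element_value(lex("some string:like-stuff here </SubElement>"))
# => "some string:like-stuff here "
```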

Error EOF inside action Flex

I am creating a file that will be compiled with flex, but I am having trouble understanding why I am getting this error. I am inexperienced with this.
The error is reported at line 43 (i.e. the last line): end of file inside an action.
What I have so far:
%{
#ifdef PRINT
#define TOKEN(t) printf("Token: " #t "/n");
#else
#define TOKEN(t) return(t);
#endif
%}
%%
"," TOKEN(COMMA")
";" TOKEN(SEMICOLON)
"->" TOKEN(ARROW)
"(" TOKEN(BRA)
")" TOKEN(KET)
"=" TOKEN(EQUALS)
"<>" TOKEN(LESMORE)
"<" TOKEN(LESS)_THAN)
">" TOKEN(MORE_THAN)
"<=" TOKEN(LESS_EQUAL)
">=" TOKEN(MORE_EQUAL)
"*" TOKEN(MULTIPLY)
"/" TOKEN(DIVIDE)
"'" TOKEN(CHAR_SHOW)
ENDP TOKEN(ENDP)
DECLARATIONS TOKEN(DECLARATIONS)
CHARACTER TOKEN(CHARACTER)
INTEGER TOKEN(INTEGER)
REAL TOKEN(REAL)
ENDIF TOKEN(ENDIF)
ELSE TOKEN(ELSE)
ENDDO TOKEN(ENDDO)
WHILE TOKEN(WHILE)
DO TOKEN(DO)
ENDWHILE TOKEN(ENDWHILE)
FOR TOKEN(FOR)
IS TOKEN(IS)
BY TOKEN(BY)
TO TOKEN(TO)
ENDFOR TOKEN(ENDFOR)
WRITE TOKEN(WRITE)
NEWLINE TOKEN(NEWLINE)
READ TOKEN(READ)
%%
Any help is appreciated.
The first action is:
"," TOKEN(COMMA")
which has a mismatched quote.
Also, there is a misplaced parenthesis in:
"<" TOKEN(LESS)_THAN)
It is also not clear to me whether all the lines from that one down are indented by one space; if so, that is also a problem, because flex patterns must start in the first column (indented lines in the rules section are copied to the generated C code instead of being treated as rules).
Finally, there is very little point in that TOKEN macro (which was probably copied from somewhere else where it was also unnecessary): you can pass the --debug command-line option to flex to produce very accurate scanner traces, and bison has a similar tracing facility which also reveals the result of the scanner, including the name of each token (which the flex trace unfortunately does not provide).
You have two typos; change them as follows:
Line 10: "," TOKEN(COMMA") --> "," TOKEN(COMMA)
Line 17: "<" TOKEN(LESS)_THAN) --> "<" TOKEN(LESS_THAN)
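Both typos are the kind of imbalance a quick script can catch. The sketch below is a hypothetical helper, not a flex feature: it flags rule lines whose action contains an odd number of double quotes or unbalanced parentheses. An unterminated "..." in an action is presumably why flex keeps reading until the end of the file and then reports EOF inside an action. It is deliberately naive: it assumes each pattern contains no spaces, which holds for this file.

```ruby
# Flag rule lines whose action has an odd quote count or unbalanced parens.
def suspicious_actions(rules_section)
  problems = []
  rules_section.each_line.with_index(1) do |line, number|
    pattern, action = line.strip.split(/\s+/, 2)  # naive: pattern has no spaces
    next if action.nil?
    odd_quotes = action.count('"').odd?
    unbalanced_parens = action.count('(') != action.count(')')
    problems << [number, pattern, action] if odd_quotes || unbalanced_parens
  end
  problems
end

rules = <<~'FLEX'
  "," TOKEN(COMMA")
  ";" TOKEN(SEMICOLON)
  "<" TOKEN(LESS)_THAN)
  ">" TOKEN(MORE_THAN)
FLEX

suspicious_actions(rules).each do |number, pattern, action|
  puts "rule #{number}: #{pattern} #{action}"
end
# prints:
# rule 1: "," TOKEN(COMMA")
# rule 3: "<" TOKEN(LESS)_THAN)
```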

ANTLR4 - parse a file line-by-line

I am trying to write a grammar that parses a file line by line.
My grammar looks like this:
grammar simple;
parse: (line NL)* EOF;
line
: statement1
| statement2
| // empty line
;
statement1 : KW1 (INT|FLOAT);
statement2 : KW2 INT;
...
NL : '\r'? '\n';
WS : (' '|'\t')-> skip; // toss out whitespace
If the last line in my input file does not have a newline, I get the following error message:
line xx:37 missing NL at <EOF>
Can somebody please explain how to write a grammar that accepts a last line without a trailing newline?
Simply don't require NL after the last line. The following form is efficient, and it is simplified by the fact that line can match the empty sequence: if the input does end with a newline, the final line simply matches nothing.
// best if `line` can be empty
parse
: line (NL line)* EOF
;
If line cannot be empty, the following rule is efficient and performs the same operation:
// best if `line` cannot be empty
parse
: (line (NL line)* NL?)? EOF
;
The following rule is equivalent for the case where line cannot be empty, but is much less efficient. I'm including it because people frequently write rules in this form, since it makes it easy to see that the NL made optional is the one following the last line.
// INEFFICIENT!
parse
: (line NL)* (line NL?)? EOF
;
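To see how the first form, line (NL line)* EOF with an optional-empty line, behaves on input with and without a trailing newline, here is a hand-written recursive-descent sketch in Ruby. It is a toy model, not ANTLR: lex_lines and parse_lines are hypothetical names, each line's text is lumped into one STMT token, and the KW1/KW2 statement rules and WS skipping are not modeled.

```ruby
# Toy lexer: each line's text is one :STMT token; "\n" or "\r\n" is :NL.
def lex_lines(input)
  input.scan(/\r?\n|[^\r\n]+/).map do |text|
    text.end_with?("\n") ? [:NL, text] : [:STMT, text]
  end
end

# parse : line (NL line)* EOF ;
# line  : STMT | /* empty */ ;
def parse_lines(tokens)
  pos = 0
  line = lambda do
    return "" unless tokens[pos] && tokens[pos][0] == :STMT  # empty alternative
    pos += 1
    tokens[pos - 1][1]
  end
  lines = [line.call]
  while tokens[pos] && tokens[pos][0] == :NL
    pos += 1
    lines << line.call
  end
  raise "expected EOF at token #{pos}" unless pos == tokens.length
  lines
end

p parse_lines(lex_lines("KW1 2.5\nKW2 7"))    # => ["KW1 2.5", "KW2 7"]
p parse_lines(lex_lines("KW1 2.5\nKW2 7\n"))  # => ["KW1 2.5", "KW2 7", ""]
```

Both inputs are accepted; with a trailing newline, the final line is just the empty alternative.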

Token in JavaCC: make sure that a symbol is single on a line

I need "{" to be alone on its line, so I have to use a token that recognizes it. These are correct examples:
program
{
or
program
{
And these are incorrect examples:
program {
or
program
{ sentence;
I have defined these tokens:
TOKEN: { < openKey: "{" > {System.out.print(image +"\n");}}
SKIP: { < ( " " | "\r" | "\t" | "\n" )+ > }
But I can't work out how to require that the symbol "{" sits between one or more "\n" on each side. After recognizing it, I have to output exactly:
program
{
If I try:
TOKEN: { < openKey: ( " " | "\r" | "\t" | "\n" )+ "{" ( " " | "\r" | "\t" | "\n" )+ > {System.out.print(image +"\n");}}
This runs, but it writes as many "\n" as there were in the input.
The basic problem is that you're printing the input without any interpretation. In other words, what goes in is what comes out, as you've discovered.
To make it easier to read, and to avoid in some respects misusing the lexical analyzer by forcing it to do the entire task, I recommend moving your print statement down into the parser (e.g., into the Start() function). (I actually tend to move all of my output out of the parser entirely unless I'm doing something really tiny that I'm never going to reuse, but that's for another question.)
Next, to address the actual problem, you have to do some interpretation to get from a run of newlines to just one. The simplest way to do that is a basic replaceAll. Here's my Start() function, where openKey is defined just as you've done, and WORD is simply a concatenation of letters.
void Start() :
{
Token t;
}
{
(
t = <WORD>
{System.out.print((t.image).replaceAll("(\n)+","\n"));}
)*
(
t = <openKey>
{System.out.print((t.image).replaceAll("(\n)+","\n"));}
(
t = <WORD>
{System.out.print((t.image).replaceAll("(\n)+","\n"));}
)*
)*
<EOF>
}
So basically, this takes zero or more words, followed by the unit that consists of 1 or more newlines followed by the left curly brace followed by 1 or more newlines, followed by zero or more words, and outputs the words, the curly brace, and just 1 newline per 1-or-more-newline set.
If a file can start with a curly brace instead of a word, then it outputs an empty line, a curly brace, and a newline. I don't know whether you want to be able to begin the output with an empty line, so you will need to adjust the output code to get the exact formatting you're going for. Also, as you can see, the repeated print code could be extracted into a function; I leave that as an exercise for the reader.
Anyway, the basic premise of this answer (and I believe it is something of a maxim for the ages, suitable for all areas of life, not just coding) is: "Unless you change what you take in before outputting it, it's going to be exactly what you took in!"
I did it differently:
TOKEN: { < openKey: "\n" (" " | "\t")* "{" (" " | "\t")* ("\r" | "\n") >{System.out.print("{\r\n");}}
SKIP: { " " | "\r" | "\t" | "\n" }
There were some problems with the carriage return, but this way works well.
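The carriage-return trouble follows from the collapsing step itself. Below is the first answer's replaceAll("(\n)+", "\n") translated to Ruby, plus a variant that also handles "\r\n". Both method names are hypothetical, and the second method is an assumption of this sketch rather than either answer's code (the second answer instead matched the brace in the token and printed a fixed "{\r\n"). With Windows line endings each "\n" is separated by "\r", so the simple version finds no runs to collapse; the variant folds any whitespace run containing a newline into a single "\n".

```ruby
# Ruby equivalent of t.image.replaceAll("(\n)+", "\n").
def collapse_newlines(image)
  image.gsub(/\n+/, "\n")
end

# Variant (not from either answer): any run of spaces, tabs, CRs and LFs
# that contains at least one "\n" becomes a single "\n".
def collapse_line_breaks(image)
  image.gsub(/[ \t\r\n]*\n[ \t\r\n]*/, "\n")
end

p collapse_newlines("\n\n\n{\n\n")        # => "\n{\n"
p collapse_newlines("\r\n\r\n{\r\n")      # => "\r\n\r\n{\r\n" (unchanged)
p collapse_line_breaks("\r\n\r\n{\r\n")   # => "\n{\n"
```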
