How Can I Sanitize MySQL Query Parameters In Esper

I am using Esper 5.0 and need to perform a query on data from a relational database, in my case a MySQL database connected via JDBC. Now I would like to formulate an EPL query on the data from that database, so my query is similar to this one:
String parameter = "any"; // untrusted (!) parameter from some user input
String mySqlQuery = "SELECT `sth` FROM `mytable` WHERE `att` = " + parameter;
String query = "select sth from sql:myDB ['" + mySqlQuery + "']";
Now how can I sanitize the untrusted parameter which is then sent to my MySQL database? The Esper documentation says that the query is basically passed to the database software unchanged. So what can I do? I came up with four ideas:
Using EPL substitution parameters: Does not work, because it is not supported by Esper in SQL expressions.
Using EPL variables: It should be possible to define a variable via something like epService.getEPAdministrator().getConfiguration().addVariable("parameter", String.class, parameter); and then use the syntax "SELECT `sth` FROM `mytable` WHERE `att` = ${parameter}" for the MySQL query, so that Esper replaces the variable with the respective value (see the sketch after this list). It is not nice to define a global variable for this, although one could remove it again with epService.getEPAdministrator().getConfiguration().removeVariable("parameter", true); afterwards. But much more importantly: that still does not sanitize the untrusted parameter and does not make it safe to pass to the database, right?
Sanitizing the parameters on the Java side: Queries on the Java side should be done via PreparedStatement. Since it is technically impossible to get a MySQL query string out of a PreparedStatement, this is not an option. I do not think that there is another safe way to sanitize a parameter passed to a database on the Java side.
Defining all the constraints on the EPL side: One could probably just do a SELECT * FROM `mytable` without defining any constraints in order to select everything from the MySQL database, then define the constraints via EPL and use an EPPreparedStatement for it. Is this the proper way to go? I fear that this is not really performant, because a lot of entries that are not needed would be read from the MySQL database.
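For reference, idea 2 would look roughly like this end to end, using only the calls mentioned above (keep in mind that this merely substitutes the value, it does not sanitize it):
// rough sketch of idea 2; epService and parameter as defined above
epService.getEPAdministrator().getConfiguration()
        .addVariable("parameter", String.class, parameter);
String epl = "select sth from sql:myDB "
        + "['SELECT `sth` FROM `mytable` WHERE `att` = ${parameter}']";
EPStatement stmt = epService.getEPAdministrator().createEPL(epl);
// once the statement is destroyed, the global variable can be removed again:
// epService.getEPAdministrator().getConfiguration().removeVariable("parameter", true);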
Any ideas?

Would it work with the statement object model API? That API gives you complete control over all parts of EPL.
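For example, a rough sketch using the SODA classes (assuming the Esper 5.x com.espertech.esper.client.soda API, in particular EPStatementObjectModel, SelectClause, FromClause and SQLStream; verify against your version). Note that the SQL text is still carried as a plain string, so the object model by itself does not escape anything:
// build the statement programmatically instead of concatenating one big EPL string
EPStatementObjectModel model = new EPStatementObjectModel();
model.setSelectClause(SelectClause.create("sth"));
// "myDB" is the configured database name from the question; the SQL handed to
// SQLStream.create() must already be safe, the object model does not sanitize it
model.setFromClause(FromClause.create(SQLStream.create("myDB", mySqlQuery)));
EPStatement stmt = epService.getEPAdministrator().create(model);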

Related

Extract SQL queries from Snowflake Stored-Procedure using Snowflake-JDBC API

I have a requirement to extract SQL queries from a Snowflake stored procedure. I have decided to extract the SQL queries via the Snowflake-JDBC API.
I have analyzed the Java documentation of the Snowflake-JDBC API but unfortunately could not find any methods to extract SQL queries from a stored procedure. I found a class named QueryExecDTO in the Snowflake-JDBC API, which has a getSqlText() method, but it is of no use for my purpose (I have to extract SQL from a stored procedure). I am also aware of the Snowflake JavaScript API's Statement object, which has a getSqlText() method to get the text of SQL queries, but it can only be used inside JavaScript as it is part of the JavaScript API.
Is there any way to extract SQL from stored procedure using Snowflake-JDBC API?
You would need to run something like:
select get_ddl('procedure', '*proc_name*(*arg list*)');
This gets you the text of the SP; you would then need to parse that text to extract the SQL statements.
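If it has to go through the Snowflake-JDBC driver, that is just an ordinary JDBC query; a minimal sketch (the procedure name and argument signature below are placeholders):
// conn is a java.sql.Connection obtained from the Snowflake JDBC driver
try (Statement stmt = conn.createStatement();
     ResultSet rs = stmt.executeQuery(
         "select get_ddl('procedure', 'MY_PROC(VARCHAR)')")) {
    rs.next();
    String ddl = rs.getString(1); // full CREATE PROCEDURE text, including the SQL inside it
    // ... parse ddl to pull out the individual SQL statements
}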
If you want to just extract the SQL statements, that should be relatively straightforward; however, if you want to parse the statements to, for example, list the tables being used, then you are going to struggle.
Parsing SQL is incredibly complex (given how flexible the language is) which is illustrated by the fact that there are very few general SQL parsers available - and those that actually work are not cheap.

Implementing a unique surrogate key in Advantage Database Server

I've recently taken over support of a system which uses Advantage Database Server as its back end. For some background, I have years of database experience but have never used ADS until now, so my question is purely about how to implement a standard pattern in this specific DBMS.
There's a stored procedure which has been previously developed which manages an ID column in this manner:
#ID = (SELECT ISNULL(MAX(ID), 0) FROM ExampleTable);
#ID = #ID + 1;
INSERT INTO ExampleTable (ID, OtherStuff)
VALUES (#ID, 'Things');
--Do some other stuff.
UPDATE ExampleTable
SET AnotherColumn = 'FOO'
WHERE ID = #ID;
My problem is that I now need to run this stored procedure multiple times in parallel. As you can imagine, when I do this, the same ID value is getting grabbed multiple times.
What I need is a way to consistently create a unique value which I can be sure will be unique even if I run the stored procedure multiple times at the same moment. In SQL Server I could create an IDENTITY column called ID, and then do the following:
INSERT INTO ExampleTable (OtherStuff)
VALUES ('Things');
SET #ID = SCOPE_IDENTITY();
ADS has autoinc which seems similar, but I can't find anything conclusively telling me how to return the value of the newly created value in a way that I can be 100% sure will be correct under concurrent usage. The ADS Developer's Guide actually warns me against using autoinc, and the online help files offer functions which seem to retrieve the last generated autoinc ID (which isn't what I want - I want the one created by the previous statement, not the last one created across all sessions). The help files also list these functions with a caveat that they might not work correctly in situations involving concurrency.
How can I implement this in ADS? Should I use autoinc, some other built-in method that I'm unaware of, or do I genuinely need to do as the developer's guide suggests, and generate my unique identifiers before trying to insert into the table in the first place? If I should use autoinc, how can I obtain the value that has just been inserted into the table?
You use LastAutoInc(STATEMENT) with autoinc.
From the documentation (under Advantage SQL->Supported SQL Grammar->Supported Scalar Functions->Miscellaneous):
LASTAUTOINC(CONNECTION|STATEMENT)
Returns the last used autoinc value from an insert or append. Specifying CONNECTION will return the last used value for the entire connection. Specifying STATEMENT returns the last used value for only the current SQL statement. If no autoinc value has been updated yet, a NULL value is returned.
Note: Triggers that operate on tables with autoinc fields may affect the last autoinc value.
Note: SQL script triggers run on their own SQL statement. Therefore, calling LASTAUTOINC(STATEMENT) inside a SQL script trigger would return the lastautoinc value used by the trigger's SQL statement, not the original SQL statement which caused the trigger to fire. To obtain the last original SQL statement's lastautoinc value, use LASTAUTOINC(CONNECTION) instead.
Example: SELECT LASTAUTOINC(STATEMENT) FROM System.Iota
Another option is to use GUIDs.
(I wasn't sure but you may have already been alluding to this when you say "or do I genuinely need to do as the developer's guide suggests, and generate my unique identifiers before trying to insert into the table in the first place." - apologies if so, but still this info might be useful for others :) )
The use of GUIDs as a surrogate key allows either the application or the database to create a unique identifier, with a guarantee of no clashes.
Advantage 12 has built-in support for a GUID datatype:
GUID and 64-bit Integer Field Types
Advantage server and clients now support GUID and Long Integer (64-bit) data types in all table formats. The 64-bit integer type can be used to store integer values between -9,223,372,036,854,775,807 and 9,223,372,036,854,775,807 with no loss of precision. The GUID (Global Unique Identifier) field type is a 16-byte data structure. A new scalar function NewID() is available in the expression engine and SQL engine to generate new GUID. See ADT Field Types and Specifications and DBF Field Types and Specifications for more information.
http://scn.sap.com/docs/DOC-68484
For earlier versions, you could store the GUIDs as a char(36). (Think about your performance requirements here of course.) You will then need to do some conversion back and forth in your application layer between GUIDs and strings. If you're using some intermediary data access layer, e.g. NHibernate or Entity Framework, you should be able to at least localise the conversions to one place.
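Purely to illustrate that round-trip (shown in Java with java.util.UUID; in the question's .NET stack, System.Guid plays the same role):
UUID id = UUID.randomUUID();                  // application-side counterpart of NewID()
String stored = id.toString();                // 36-character form for the char(36) column
UUID roundTripped = UUID.fromString(stored);  // convert back when reading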
If some part of your logic is in a stored procedure, you should be able to use the newid() or newidstring() function, depending on the type of the backing column:
INSERT INTO ExampleTable (ID, OtherStuff) VALUES (NewID(), 'Things');

Setting node labels with a parameter

I'm trying to pile a load of Twitter data into Neo4J using the .Net Neo4JClient. It's essentially the same type of Twitter user data for each node, but some of the nodes have a different significance to others, hence I would like to label them differently.
(I'm brand new both to Neo4J and the client, too).
So I've been trying to label them like so:
var query = _client.Cypher
    .Create("(primaryNode:nodeLabel {twitterUser})")
    .WithParams(new { nodeLabel = "nodeType", twitterUser });
query.ExecuteWithoutResults();
Note: I split out the ExecuteWithoutResults so I could debug the query, and it is registering the parameters OK. The documentation here:
https://github.com/Readify/Neo4jClient/wiki/cypher#explicit-parameters
... suggests that parameters can be created "at any point in the fluent query" - but the Neo documentation about parameters here:
http://docs.neo4j.org/chunked/1.8.2/cypher-parameters.html
... kind of suggests otherwise, that parameters are specifically for things like WHERE clauses, indexes and relationship Ids.
Anyway - when I execute the above, I get a shiny new node with the label "nodeLabel" - so the parameter ain't working. Could somebody clarify whether or not I'm just making a dumb newbie mistake?
You can call WithParams whenever you want in the query. That's what the Neo4jClient doco means about "at any point in the fluent query".
However, Neo4j only supports parameters in certain parts of the Cypher text. If the parameter would affect the query plan, it's not allowed.
In this case, you cannot use parameters for labels. You will need to actually construct the query dynamically if you want to do that.
Edit: Even if this was a supported place for parameters, you'd at least have to write {nodeLabel} in your Cypher instead of just nodeLabel.

Map a Rails Custom SQL Query to an ActiveRecord Model

I'm setting up a custom query that uses a range of OR statements in conjunction with BETWEEN statements and a final GROUP BY id HAVING COUNT(*) >= #{tolerance}. Not to mention INNER and LEFT join operations.
I would assume that it would not be possible to set this up using ActiveRecord. So I used the Model.connection.select_all() command to fire a query. This works, but how do I now map all of the returned rows to that specific model?
Rails is pretty powerful, especially if you are using Rails 3 & ARel, so I wouldn't be surprised if you actually could write your query using Rails.
However, there will always be times when writing raw SQL is desired.
To do that, instead of Model.connection use Model.find_by_sql(QUERY_STRING).
This way the results will be mapped to the model for you automatically; just make sure you only select "model.*".

MSSQL2000: Using a stored procedure results as a table in sql

Let's say I have 'myStoredProcedure' that takes in an Id as a parameter, and returns a table of information.
Is it possible to write a SQL statement similar to this?
SELECT
    MyColumn
FROM
    Table-ify('myStoredProcedure ' + #MyId) AS [MyTable]
I get the feeling that it's not, but it would be very beneficial in a scenario I have with legacy code & linked server tables.
Thanks!
You can use a table-valued function in this way.
Here are a few tricks...
No it is not - at least not in any official or documented way - unless you change your stored procedure to a TVF.
However, there are ways (read: hacks) to do it. All of them basically involve a linked server and using OpenQuery - for example, see here. Do note, however, that it is quite fragile as you need to hardcode the name of the server - so it can be problematic if you have multiple SQL Server instances with different names.
Here is a pretty good summary of the ways of sharing data between stored procedures http://www.sommarskog.se/share_data.html.
Basically it depends on what you want to do. The most common ways are creating the temporary table prior to calling the stored procedure and having it fill it, or having one permanent table that the stored procedure dumps the data into, which also contains the process id.
Table Valued functions have been mentioned, but there are a number of restrictions when you create a function as opposed to a stored procedure, so they may or may not be right for you. The link provides a good guide to what is available.
SQL Server 2005 and SQL Server 2008 change the options a bit. SQL Server 2005+ makes working with XML much easier, so XML can be passed as an output variable and pretty easily "shredded" into a table using the XML functions nodes and value. I believe SQL Server 2008 allows table variables to be passed into stored procedures (although read-only). Since you cited SQL 2000, the 2005+ enhancements don't apply to you, but I mentioned them for completeness.
Most likely you'll go with a table-valued function, or creating the temporary table prior to calling the stored procedure and then having it populate that.
While working on the project, I used the following to insert the results of xp_readerrorlog (which, afaik, returns a table) into a temporary table created ahead of time.
INSERT INTO [tempdb].[dbo].[ErrorLogsTMP]
EXEC master.dbo.xp_readerrorlog
From the temporary table, select the columns you want.
