Move or copy data between databases within InfluxDB

I have my_db1, my_db2, my_db3 in InfluxDB. Is there a way to move or copy data between these databases with a query?

InfluxQL provides an INTO clause that can be used to copy data between databases.
For example, if I had the point cpu,host=server1 value=100 123 in db_1 and wanted to copy that data to the point new_cpu,host=server1 value=100 123 in db_2, I could issue the following query:
SELECT * INTO db_2..new_cpu FROM db_1..cpu GROUP BY *
The GROUP BY * clause preserves tags as tags; without it, tags in the source are written to the destination as fields.
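To confirm the copy, a quick check against the destination would be:
SELECT * FROM db_2..new_cpu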
For more information, see the documentation.

Related

InfluxDB query on tag returns nothing

I have an InfluxDB database with lots of fields and a single tag:
> show tag keys
name: rtl433
tagKey
------
model
Now, I want a list of all possible values for model, so I run
SELECT model FROM rtl433
>
and it returns nothing. Why? There's plenty of data in model if I select *.
You are trying to use a classic SQL solution, but InfluxDB is not a classic SQL database. Check the InfluxDB documentation and you will find the solution:
SHOW TAG VALUES WITH KEY = "model"
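The output then looks something like this (the model values below are made up for illustration):
> SHOW TAG VALUES WITH KEY = "model"
name: rtl433
key   value
---   -----
model Acurite-606TX
model Nexus-TH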

Resulting KSQL join stream shows no data

I am joining a KSQL stream and a KSQL table. Both are mapped to the same key.
But no data is coming to the resulting stream.
create stream kz_yp_loan_join_by_bandid WITH (KAFKA_TOPIC='kz_yp_loan_join_by_bandid', VALUE_FORMAT='AVRO') AS
select ypl.loan_id, ypl.userid, ypk.name as user_band_id_name
FROM kz_yp_loan_stream_partition_by_bandid ypl
INNER JOIN kz_yp_key_table ypk
ON ypl.user_band_id = ypk.id;
No data appears in the stream kz_yp_loan_join_by_bandid.
But if I simply run:
select ypl.loan_id, ypl.userid, ypk.name as user_band_id_name
FROM kz_yp_loan_stream_partition_by_bandid ypl
INNER JOIN kz_yp_key_table ypk
ON ypl.user_band_id = ypk.id;
There is data present.
This shows that the stream is not being written to, but why is that?
I have tried redoing the entire setup.
A few things to check:
If you want to process all the existing data as well as new data, make sure that before you run your CREATE STREAM … AS SELECT ("CSAS") you have run SET 'auto.offset.reset' = 'earliest'; (see the sketch after this list).
If the join returns data when run outside of the CSAS then this may not be relevant, but it is always good to check that your join meets all the requirements.
Check the KSQL server log in case there's an issue with writing to the target topic, creating the schema on the Schema Registry, etc.
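Combining the first check with the CSAS from the question, a minimal sketch would be:
SET 'auto.offset.reset' = 'earliest';
CREATE STREAM kz_yp_loan_join_by_bandid WITH (KAFKA_TOPIC='kz_yp_loan_join_by_bandid', VALUE_FORMAT='AVRO') AS
SELECT ypl.loan_id, ypl.userid, ypk.name AS user_band_id_name
FROM kz_yp_loan_stream_partition_by_bandid ypl
INNER JOIN kz_yp_key_table ypk
ON ypl.user_band_id = ypk.id;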
These references will be useful:
https://www.confluent.io/blog/troubleshooting-ksql-part-1
https://www.confluent.io/blog/troubleshooting-ksql-part-2

How to clone a complete database in InfluxDB

I am creating a backup of a measurement within a DB as shown below:
SELECT * INTO backdata FROM sourcedata
The above command created the backdata measurement. Is there a way to clone the complete DB, with all measurements, under a different database name?
First, you'll need to create a new database:
CREATE DATABASE mydb
Then the following query should work:
SELECT * INTO mydb..:MEASUREMENT FROM /.*/ GROUP BY *
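If your data lives under non-default retention policies you can fully qualify both sides; a sketch assuming the default autogen policy on both databases, with sourcedb as a hypothetical source name:
SELECT * INTO mydb.autogen.:MEASUREMENT FROM sourcedb.autogen./.*/ GROUP BY *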

Can I use another DataFrame column to query Spark SQL?

I have two huge tables in Hive, 'table 1' and 'table 2'. Both tables have a common column, 'key'.
I have queried 'table 1' with the desired conditions and created a DataFrame 'df1'.
Now, I want to query 'table 2' using a column from 'df1' in the where clause.
Here is the code sample:
val df1 = hiveContext.sql("select * from table1 limit 100")
Can I do something like
val df2 = hiveContext.sql("select * from table2 where key = df1.key")
Note: I don't want to make a single query joining both tables.
Any help will be appreciated.
Since you have explicitly written that you do NOT want to join the tables, then the short answer is "No, you cannot do such a query".
I'm not sure why you don't want to do the join, but it is definitely needed if you want to do the query. If you are worried about joining two "huge tables", then don't be. Spark was built for this kind of thing :)
The solution that I found is the following
Let me first give the dataset sizes.
Dataset1 - pretty small (10 GB)
Dataset2 - big (500 GB+)
There are two solutions for DataFrame joins:
Solution 1
If you are using Spark 1.6+, repartition both DataFrames by the column on which the join has to be done. When I did this, the join completed in less than 2 minutes.
df.repartition(df("key"))
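Expanded into a fuller sketch (dfSmall and dfBig are hypothetical DataFrames that share the join column "key"):
// Repartition both sides on the join column so that matching keys
// end up co-located before the join.
val leftPart = dfSmall.repartition(dfSmall("key"))
val rightPart = dfBig.repartition(dfBig("key"))
val joined = leftPart.join(rightPart, "key")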
Solution 2
If you are not using Spark 1.6+ (it also works on 1.6+), and one dataset is small, cache it and use it in a broadcast join:
import org.apache.spark.sql.functions.broadcast

df_small.cache
df_big.join(broadcast(df_small), "key")
This was done in less than a minute.

How to get results of stored procedure #1 into a temporary table in stored procedure #2

I am trying to combine the results of several stored procedures into a single temporary table. The results of the various stored procedures have the same column structure. Essentially, I would like to UNION ALL the results of the various stored procedures. A significant fact: each of the stored procedures creates a temporary table to store its data and the results each returns are based on a select against the temporary table:
create proc SP1
as
.
. <snip>
.
select * from #tmp -- a temporary table
Noting that select * from OPENQUERY(server, 'exec SP1') does not work if the select in SP1 is against a temporary table (see this question for details), is there another way for a different stored proc, SP2, to get the results of executing SP1 into a temporary table?
create proc SP2 as
-- put results of executing SP1 into a temporary table:
.
.
.
NOTE: SP1 cannot be modified (e.g. to store its results in a temporary table with session scope).
Create your temporary table such that it fits the results of your stored procedures.
Assuming your temp table is called "#MySuperTempTable", you would do something like this:
INSERT INTO #MySuperTempTable (Column1, Column2)
EXEC SP1
That should do the trick.
INSERT INTO #MySuperTempTable
EXEC SP1
I followed the above code style, but SQL Server is not compiling the instructions.
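If SQL Server complains, the usual cause is a missing or mismatched temp table definition: INSERT ... EXEC requires the table to already exist with a column list matching the procedure's result set. A fuller sketch, where the column names and types are assumptions that must be aligned with what SP1 actually returns:
-- Column names/types below are placeholders; match them to SP1's result set.
CREATE TABLE #MySuperTempTable (
    Column1 INT,
    Column2 VARCHAR(100)
);

INSERT INTO #MySuperTempTable (Column1, Column2)
EXEC SP1;

-- Repeat INSERT ... EXEC for each stored procedure to get the UNION ALL effect.
SELECT * FROM #MySuperTempTable;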
