Limit of 1024 stream entries in the handler in DolphinDB subscription? - stream

n=1000000
colNames = `time`sym`vwap
colTypes = [MINUTE,SYMBOL,DOUBLE]
tmpTrades = table(n:0, colNames, colTypes)
lastMinute = [00:00:00.000]
enableTableShareAndPersistence(table=streamTable(n:0, colNames, colTypes), tableName="vwap_stream")
go
def calcVwap(mutable vwap, mutable tmpTrades, mutable lastMinute, msg){
    // buffer the incoming records
    tmpTrades.append!(msg)
    // start (as TIME) of the minute containing the latest record in this message
    curMinute = time(msg.time.last().minute()*60000l)
    // compute the vwap for every completed minute since the last processed minute
    t = select wavg(price, qty) as vwap from tmpTrades where time < curMinute, time >= lastMinute[0] group by time.minute(), sym
    if(t.size() == 0) return
    vwap.append!(t)
    // keep only the records belonging to the still-incomplete current minute
    t = select * from tmpTrades where time >= curMinute
    tmpTrades.clear!()
    lastMinute[0] = curMinute
    if(t.size() > 0) tmpTrades.append!(t)
}
subscribeTable(tableName="trades_stream", actionName="vwap", offset=-1, handler=calcVwap{vwap_stream, tmpTrades, lastMinute}, msgAsTable=true)
This is what I wrote to subscribe to the stream. Even though data is ingested into the publishing table trades_stream in batches of 5000 records, the handler never receives more than 1024 records at a time. Is there a limit on the subscription?

You can modify the configuration parameter maxMsgNumPerBlock, which specifies the maximum number of records per message block sent to subscribers; the default value is 1024.
For standalone mode the configuration file is dolphindb.cfg, and for cluster mode it is cluster.cfg.
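For example, to let each block carry up to 5000 records (matching the ingestion batch size here), you could add a line like the following to the configuration file and restart the node; the exact value is your choice:
maxMsgNumPerBlock=5000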

Related

How to find the byte size of a table in Lua

I'm writing a log aggregator and I want to send the logs once they reach a maximum byte size. Is there a way in Lua to find out the size of a variable (the active_batch size)?
local batch = {
    flush_timeout,
    retry_count,
    batch_max_size,
    batch_count,
    batch_to_execute = {},
    active_batch = { entries = {}, count = 0, retries = 0 }
}
You can only get the total memory used by Lua, via collectgarbage("count").
In this case I think that storing each entry's string length and keeping a running sum will work.
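A minimal sketch of that idea, assuming a hypothetical flush() function and an arbitrary size threshold:
-- track the byte size of the active batch by summing string lengths
local MAX_BATCH_BYTES = 64 * 1024   -- assumed threshold, tune as needed
local active_batch = { entries = {}, count = 0, retries = 0, bytes = 0 }

local function add_entry(line)
    table.insert(active_batch.entries, line)
    active_batch.count = active_batch.count + 1
    active_batch.bytes = active_batch.bytes + #line   -- #line is the string length in bytes
    if active_batch.bytes >= MAX_BATCH_BYTES then
        flush(active_batch)                           -- hypothetical send/flush function
        active_batch = { entries = {}, count = 0, retries = 0, bytes = 0 }
    end
end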

Split events based on criteria and handle in order

Having the following problem: given a list of events that have a partitionId property (0-10 for example), I'd like incoming events to be split according to the partitionId so that events with the same partitionId are handled in the order they're received.
With a more or less even distribution, that would lead to 10 events (one per partition) being handled in parallel.
Besides creating 10 single-threaded dispatchers and sending each event to the right dispatcher, is there a way to accomplish the above using Project Reactor?
Thanks.
The code below
splits the source stream into partitions,
creates a ParallelFlux with one "rail" per partition,
schedules the "rails" onto separate threads,
collects the results.
Having a dedicated thread for each partition guarantees that its values are processed in the original order.
@Test
public void partitioning() throws InterruptedException {
    final int N = 10;
    Flux<Integer> source = Flux.range(1, 10000).share();
    // partition source into publishers
    Publisher<Integer>[] publishers = new Publisher[N];
    for (int i = 0; i < N; i++) {
        final int idx = i;
        publishers[idx] = source.filter(v -> v % N == idx);
    }
    // create ParallelFlux, each 'rail' containing a single partition
    ParallelFlux.from(publishers)
            // schedule partitions onto different threads
            .runOn(Schedulers.newParallel("proc", N))
            // process each partition on its own thread, i.e. in order
            .map(it -> {
                String threadName = Thread.currentThread().getName();
                Assert.assertEquals("proc-" + (it % 10 + 1), threadName);
                return it;
            })
            // collect results on a single 'rail'
            .sequential()
            // and on a single thread called 'subscriber-1'
            .publishOn(Schedulers.newSingle("subscriber"))
            .subscribe(it -> {
                String threadName = Thread.currentThread().getName();
                Assert.assertEquals("subscriber-1", threadName);
            });
    Thread.sleep(1000);
}

how to fetch all the records every minute from a sql table using Apache Flume

I am trying to get all the data from a SQL table every minute using Flume.
Can someone please suggest what config changes need to be made?
Configs :
agent.channels = ch1
agent.sinks = kafkaSink
agent.sources = sql-source
agent.channels.ch1.type = memory
agent.channels.ch1.capacity = 1000000
agent.sources.sql-source.channels = ch1
agent.sources.sql-source.type = org.keedio.flume.source.SQLSource
# URL to connect to database
agent.sources.sql-source.connection.url = jdbc:sybase:Tds:abcServer:4500
# Database connection properties
agent.sources.sql-source.user = user
agent.sources.sql-source.password = XXXXXXX
agent.sources.sql-source.table = person
agent.sources.sql-source.columns.to.select = *
# Incremental column properties
agent.sources.sql-source.incremental.column.name = person_id
# Incremental value from which to start taking data (0 will import the entire table)
agent.sources.sql-source.incremental.value = 0
# Query delay: the query is run every this many milliseconds
agent.sources.sql-source.run.query.delay = 1000
# Status file is used to save the last row read
agent.sources.sql-source.status.file.path = /dump/apache-flume-1.6.0-bin
agent.sources.sql-source.status.file.name = sql-source.status
Change the value of agent.sources.sql-source.run.query.delay to 60000 (milliseconds) so the query runs once per minute.
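With the rest of the configuration unchanged, that single line becomes:
# run the query once every 60 seconds
agent.sources.sql-source.run.query.delay = 60000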

How to set up the correct logic for picking a random item from a list based on the item's rarity, i.e. "rare" or "normal"

I'm writing a game using Corona SDK in the Lua language. I'm having a hard time coming up with the logic for a system like this:
I have different items. I want some items to have a 1/1000 chance of being chosen (a unique item), I want some to have 1/10, some 2/10, etc.
I was thinking of populating a table and picking a random item. For example, I'd add 100 of item "X" to the table and then 1 item "Y". By choosing randomly from those 101 entries I kind of achieve what I want, but I was wondering if there are other ways of doing it.
items = {
    Cat = { probability = 100/1000 }, -- i.e. 1/10
    Dog = { probability = 200/1000 }, -- i.e. 2/10
    Ant = { probability = 699/1000 },
    Unicorn = { probability = 1/1000 },
}

function getRandomItem()
    local p = math.random()
    local cumulativeProbability = 0
    for name, item in pairs(items) do
        cumulativeProbability = cumulativeProbability + item.probability
        if p <= cumulativeProbability then
            return name, item
        end
    end
end
You want the probabilities to add up to 1. So if you increase the probability of an item (or add an item), you'll want to subtract from other items. That's why I wrote 1/10 as 100/1000: it's easier to see how things are distributed and to update them when you have a common denominator.
You can confirm you're getting the distribution you expect like this:
local count = { }
local iterations = 1000000
for i = 1, iterations do
    local name = getRandomItem()
    count[name] = (count[name] or 0) + 1
end
for name, count in pairs(count) do
    print(name, count/iterations)
end
I believe this answer is a lot easier to work with, albeit slightly slower in execution.
local chancesTbl = {
    -- You can fill these with any non-negative integer you want
    -- No need to make sure they sum up to anything specific
    ["a"] = 2,
    ["b"] = 1,
    ["c"] = 3
}

local function GetWeightedRandomKey()
    local sum = 0
    for _, chance in pairs(chancesTbl) do
        sum = sum + chance
    end
    local rand = math.random(sum)
    local winningKey
    for key, chance in pairs(chancesTbl) do
        winningKey = key
        rand = rand - chance
        if rand <= 0 then break end
    end
    return winningKey
end
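A quick way to sanity-check the weighting, mirroring the verification loop above (the sample count is arbitrary):
-- draw many keys and print the observed frequencies
local counts = {}
local samples = 100000
for i = 1, samples do
    local key = GetWeightedRandomKey()
    counts[key] = (counts[key] or 0) + 1
end
for key, n in pairs(counts) do
    print(key, n / samples)  -- expect roughly 2/6, 1/6 and 3/6 for "a", "b" and "c"
end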

I need to get more than 100 pages in my query

I want to get as much video information as possible from YouTube for my project. I know that the page limit is 100.
Here is my code:
ArrayList<String> videos = new ArrayList<>();
int videosTotales = 0;
int i = 1;
String peticion = "http://gdata.youtube.com/feeds/api/videos?category=Comedy&alt=json&max-results=50&page=" + i;
URL oracle = new URL(peticion);
URLConnection yc = oracle.openConnection();
BufferedReader in = new BufferedReader(new InputStreamReader(
        yc.getInputStream()));
StringBuilder response = new StringBuilder();
String inputLine;
while ((inputLine = in.readLine()) != null) {
    response.append(inputLine);
}
in.close();
System.out.println(response.toString());
JSONObject jsonObj = new JSONObject(response.toString());
JSONObject jsonFeed = jsonObj.getJSONObject("feed");
JSONArray jsonArr = jsonFeed.getJSONArray("entry");
while (i <= 100) {
    for (int j = 0; j < jsonArr.length(); j++) {
        videos.add(jsonArr.getJSONObject(j).getJSONObject("id").getString("$t"));
        System.out.println("Numero " + videosTotales + " " + jsonArr.getJSONObject(j).getJSONObject("id").getString("$t"));
        videosTotales++;
    }
    i++;
}
When the program finishes I have 5000 videos per category, but I need many, many more, and the limit is page = 100.
So, how can I get more than 10 million videos?
Thank you!
Are those 5000 also unique IDs?
I see the use of max-results=50, but no start-index parameter in your URL.
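For instance (keeping your base URL unchanged, as a sketch of v2-style paging), the second batch of 50 results would be requested with start-index:
http://gdata.youtube.com/feeds/api/videos?category=Comedy&alt=json&max-results=50&start-index=51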
There is a limit on the results you can get per request. There is also a limit on the number of requests that you can send within some time interval. By checking the status code of the response and any error message you can find these limits, as they may change over time.
Besides the category parameter, use some other parameters too. For instance, you may vary the q parameter (used with some keywords) and/or the order parameter to get a different result set.
See the documentation for the available parameters.
Note that you are using API version 2, which is deprecated. There is an API version 3.
