I am getting live tick data consisting of Time, Symbol Name, Last Traded Price, and Cumulative Volume (daily).
How can I compute VWAP (i.e., sum(price * volume) / sum(volume) per bar) in DolphinDB using 1) a custom function, 2) a TimeSeriesEngine, or 3) a ReactiveStateEngine? Please help me. The necessary code is below.
This is the stream table that receives ticks from Python:
t_colNames=`ts`symbol`price`vol`upd_tick
t_colTypes=`TIMESTAMP`SYMBOL`DOUBLE`DOUBLE`TIMESTAMP
This is the stream table that stores 1-minute OHLC data:
ohlc_colNames=`ts`symbol`open`high`low`close`volume`tp`last_tick`upd_1m
ohlc_colTypes=`TIMESTAMP`SYMBOL`DOUBLE`DOUBLE`DOUBLE`DOUBLE`DOUBLE`DOUBLE`TIMESTAMP`TIMESTAMP
This is the 1-minute OHLC TimeSeriesEngine:
OHLC_sm1 = createTimeSeriesEngine(name="OHLC_sm1", windowSize=60000, step=60000, metrics=<[first(price) as open, max(price) as high, min(price) as low, last(price) as close, sum(vol) as volume, (max(price)+min(price)+last(price))/3 as tp, last(upd_tick) as last_tick, now() as upd_1m]>, dummyTable=tmp, outputTable=sm1 , timeColumn=`ts, useSystemTime=true, keyColumn=`symbol, updateTime=60000, useWindowStartTime=false);
This is the function that converts cumulative volume to per-tick volume:
def calcVolume(mutable dictVolume, mutable tsAggrOHLC, msg){
    // keep only the latest tick per symbol from this batch
    t = select ts, symbol, price, vol, upd_tick from msg context by symbol limit -1
    // look up the previous cumulative volume for each symbol
    update t set prevVolume = dictVolume[symbol]
    // remember the current cumulative volume for the next batch
    dictVolume[t.symbol] = t.vol
    // feed the engine the per-tick volume (vol - prevVolume)
    tsAggrOHLC.append!(t.update!("vol", <vol - prevVolume>))
}
dictVol = dict(STRING, DOUBLE)
subscribeTable(tableName="t", actionName="OHLC_sm1", offset=0, handler=calcVolume{dictVol,OHLC_sm1}, msgAsTable=true, hash=1)
I recommend using a ReactiveStateEngine to convert cumulative volume to per-record volume and then connecting the two engines in series. Here is an example:
tradesData = your_tick_data
//define Trade Table
x=tradesData.schema().colDefs
share streamTable(100:0, x.name, x.typeString) as Trade
//define OHLC outputTable
share streamTable(100:0, `datetime`symbol`open`high`low`close`vwap`updatetime, [TIMESTAMP,SYMBOL,DOUBLE,DOUBLE,DOUBLE,DOUBLE,DOUBLE,TIMESTAMP]) as OHLC
//1 min OHLC TimeSeriesEngine (the wavg(Price,Volume) metric is the bar's VWAP, hence the vwap column above)
tsAggrOHLC = createTimeSeriesEngine(name="aggr_ohlc", windowSize=60000, step=60000, metrics=<[first(Price),max(Price),min(Price),last(Price),wavg(Price,Volume),now()]>, dummyTable=Trade, outputTable=OHLC, timeColumn=`Datetime, keyColumn=`Symbol)
//ReactiveStateEngine: convert cumulative volume to per-record volume
rsAggrOHLC = createReactiveStateEngine(name="calc_vol", metrics=<[Datetime, Price, deltas(Volume) as Volume]>, dummyTable=Trade, outputTable=tsAggrOHLC, keyColumn=`Symbol)
//subscribe table and insert data into engines
subscribeTable(tableName="Trade", actionName="minuteOHLC2", offset=0, handler=append!{rsAggrOHLC}, msgAsTable=true)
replay(inputTables=tradesData, outputTables=Trade, dateColumn=`Datetime)
You can use user-defined functions in any of the engine's metrics.
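For instance, a user-defined aggregate can go straight into the metrics. A minimal sketch, assuming the Trade schema above (the names myVwap and vwapOut are illustrative, not from the original answer):
// user-defined aggregate: VWAP of the bar
defg myVwap(price, volume){
    return wavg(price, volume)
}
share streamTable(100:0, `datetime`symbol`vwap, [TIMESTAMP,SYMBOL,DOUBLE]) as vwapOut
vwapEngine = createTimeSeriesEngine(name="aggr_vwap", windowSize=60000, step=60000, metrics=<[myVwap(Price, Volume)]>, dummyTable=Trade, outputTable=vwapOut, timeColumn=`Datetime, keyColumn=`Symbol)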
I want to be able to pick say C3 from a list of Google spreadsheets in a folder.
I have a bunch of structurally identical sheets, but I'd like to be able to provide a sum of the values in C3 across say a hundred sheets in a directory.
Ultimately, it would be great to highlight the largest or smallest value of C3 in a directory.
This could be useful in many places where you want to aggregate data.
SUGGESTION
If you have hundreds of Google Spreadsheet files in a Google Drive folder, I agree with @player0 that it is best to use a script. With Apps Script, you can:
Automate iterating through the Spreadsheet files in your Drive folder.
Filter only the Google Spreadsheet type (e.g., if you have a bunch of different file types inside).
Get the range data and process it the way you want.
See this sample below that was derived from existing resources:
Script:
function readSheetsInAFolder() {
  // FOLDER_ID is your Drive folder ID
  var query = '"FOLDER_ID" in parents and trashed = false and ' +
      'mimeType = "application/vnd.google-apps.spreadsheet"';
  var range = "C3"; // The range to read on every Spreadsheet file in the Drive folder
  var files, pageToken;
  var finalRes = [];
  do {
    files = Drive.Files.list({
      q: query,
      maxResults: 100,
      pageToken: pageToken
    });
    files.items.forEach(sheet => {
      finalRes.push(viewRangeValue(range, sheet.id));
    });
    pageToken = files.nextPageToken;
  } while (pageToken);

  // Recursively sums a (possibly nested) array of numeric values
  const arrSum = array =>
    array.reduce(
      (sum, num) => sum + (Array.isArray(num) ? arrSum(num) : num * 1),
      0
    );

  var max = Math.max.apply(null, finalRes.map(function(row){ return Math.max.apply(Math, row); })); // Gets the largest number
  var min = Math.min.apply(null, finalRes.map(function(row){ return Math.min.apply(Math, row); })); // Gets the smallest number
  var sum = arrSum(finalRes); // Gets the sum
  console.log('RANGE VALUES: %s \nRANGE: %s \nTOTAL SHEET(s) FOUND: %s \n________________\nSUM OF VALUES: %s \nLargest Value: %s \nSmallest Value: %s', finalRes, range, files.items.length, sum, max, min);
}

function viewRangeValue(range, sheetID) {
  var parms = { valueRenderOption: 'UNFORMATTED_VALUE', dateTimeRenderOption: 'SERIAL_NUMBER' };
  var res = Sheets.Spreadsheets.Values.get(sheetID, range, parms);
  return res.values.map(num => parseInt(num));
}
Demonstration:
Sample test Drive folder (with 3 test Spreadsheet files); the C3 cell of each file contains 0, 10, or 6.
On the Apps Script editor, I've added the Drive & Sheets API under Services.
Result: after running the script, the log shows the range values, the number of sheets found, the sum, and the largest and smallest values.
Resources:
Advanced Drive Service
Drive API Files: list
Sheets API spreadsheets.values.get
Max Value of an array
n = 1000000
// the tick schema below is assumed from the query in calcVwap (time, sym, price, qty)
colNames = `time`sym`price`qty
colTypes = [TIME,SYMBOL,DOUBLE,INT]
// the publishing table subscribed to below; its definition was missing from the original snippet
share streamTable(n:0, colNames, colTypes) as trades_stream
// in-memory buffer holding the ticks of the current minute
tmpTrades = table(n:0, colNames, colTypes)
lastMinute = [00:00:00.000]
colNames = `time`sym`vwap
colTypes = [MINUTE,SYMBOL,DOUBLE]
enableTableShareAndPersistence(table=streamTable(n:0, colNames, colTypes), tableName="vwap_stream")
go
def calcVwap(mutable vwap, mutable tmpTrades, mutable lastMinute, msg){
    tmpTrades.append!(msg)
    // truncate the latest tick's time to the start of its minute
    curMinute = time(msg.time.last().minute()*60000l)
    // compute VWAP for every completed minute in the buffer
    t = select wavg(price, qty) as vwap from tmpTrades where time < curMinute, time >= lastMinute[0] group by time.minute(), sym
    if(t.size() == 0) return
    vwap.append!(t)
    // keep only the ticks of the (still incomplete) current minute
    t = select * from tmpTrades where time >= curMinute
    tmpTrades.clear!()
    lastMinute[0] = curMinute
    if(t.size() > 0) tmpTrades.append!(t)
}
subscribeTable(tableName="trades_stream", actionName="vwap", offset=-1, handler=calcVwap{vwap_stream, tmpTrades, lastMinute}, msgAsTable=true)
This is what I wrote to subscribe to the stream. Even though data is ingested into the publishing table trades_stream at 5000 records per batch, the handler never receives more than 1024 records at a time. Is there a limit on the subscription?
You can modify the configuration parameter maxMsgNumPerBlock to specify the maximum number of records in a message block; the default value is 1024.
For standalone mode the configuration file is dolphindb.cfg; for cluster mode it is cluster.cfg.
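For example, to let the subscription deliver the full 5000-record batches described above, a line like the following in the config file (followed by a node restart) would raise the limit; the value 5000 is just an illustration:
maxMsgNumPerBlock=5000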
This may have been explained elsewhere, but I'm not finding it. I have to work within the confines of Wireshark 2.4.x.
So I defined some header fields for the data as follows:
{ &hf_td_timestamp,
{ "Timestamp", "td.timestamp",
FT_ABSOLUTE_TIME, ABSOLUTE_TIME_LOCAL, NULL, 0x0, NULL, HFILL
} },
{ &hf_td_timestamp_sec,
{ "Timestamp Seconds", "td.timestamp.sec",
FT_UINT64, BASE_DEC, NULL, 0x0, NULL, HFILL
} },
{ &hf_td_timestamp_nsec,
{ "Timestamp nSeconds", "td.timestamp.nsec",
FT_UINT32, BASE_DEC, NULL, 0x0, NULL, HFILL
} },
and the data for one of them gets stored and added to the dissection tree like so:
proto_tree_add_item(td_tree, hf_td_timestamp, tvb, offset, 8, ENC_TIME_TIMESPEC);
I only want to display the one line item and not all three. The information for the other two is of course derived from the same bytes. Ultimately I would like to have the other fields available for adding to the columns, but not to the detail dissection.
That is part 1. Once I can establish the storing of the information, I will of course add the other two as a single line in the detail as seconds.nanoseconds. The values just need to be stored separately so that the data can be parsed in an Excel CSV file. Excel cannot handle the precision of the nanoseconds in decimal format, which is why they must be separate.
Part 2: store some metadata that is calculated from known fields, specifically the delta between these timestamps. Wireshark can give the delta of the recorded timestamp, but not of the timestamp within the payload. So basically: store the delta between the payload timestamp and that of the last packet with the same port information. Once I get past part 1, I should be able to accomplish part 2.
So, is there a function that will parse the tvb and only store the value, as opposed to storing it for display?
proto_tree_add_item(td_tree, hf_td_timestamp, tvb, offset, 8, ENC_TIME_TIMESPEC);
/* Add the seconds and nanoseconds as separate items over the same bytes, then
   hide them. Offsets/lengths assume the 8-byte TIMESPEC layout used above:
   4 bytes of seconds followed by 4 bytes of nanoseconds. (Note FT_UINT32
   fields cannot be added with a length of 8.) */
proto_item *ti_sec  = proto_tree_add_item(td_tree, hf_td_timestamp_sec,  tvb, offset,     4, ENC_BIG_ENDIAN);
proto_item *ti_nsec = proto_tree_add_item(td_tree, hf_td_timestamp_nsec, tvb, offset + 4, 4, ENC_BIG_ENDIAN);
PROTO_ITEM_SET_HIDDEN(ti_sec);
PROTO_ITEM_SET_HIDDEN(ti_nsec);
If the proto_tree_add_item() call appears inline inside PROTO_ITEM_SET_HIDDEN(), the item will still be displayed: PROTO_ITEM_SET_HIDDEN() works on previously assigned items, not on items created in place.
As for part 2: the metadata was created by generating a GHashTable for each new metadata item, holding the payload timestamps keyed by frame number. (If there is a better way, I would like to know, e.g. accessing a completed list stored by Wireshark rather than building my own.) Generate the timestamp deltas, convert the values into a tvb, then read the tvb back to store them in the fields.
GHashTable * timestamp_map = NULL;
GHashTable * timestamp_delta = NULL;
.
.
.
// Store the timestamp (key = frame number, value = timestamp)
timestamp_map = g_hash_table_new_full(g_direct_hash, g_direct_equal, NULL, g_free);
// Store the list of frame numbers from the associated client
// (cast required: g_list_free does not match the GDestroyNotify signature)
timestamp_delta = g_hash_table_new_full(g_direct_hash, g_direct_equal, NULL, (GDestroyNotify) g_list_free);
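/* Illustrative sketch (an assumption, not from the original answer): storing
   one payload timestamp keyed by frame number; payload_ts is a hypothetical
   nstime_t parsed from the packet. */
nstime_t *ts_copy = (nstime_t *) g_memdup(&payload_ts, sizeof(nstime_t));
g_hash_table_insert(timestamp_map, GUINT_TO_POINTER(pinfo->num), ts_copy);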
.
.
.
// find your deltas through your lookups
// convert to tvb and then to fields
// Create a new TVB holding the delta value
tvbuff_t *tvbtmp = tvb_new_real_data(vals.buffer, 8, 8);
// Store in the field hf_tu_timestamp_delta
proto_tree_add_item(tree, hf_tu_timestamp_delta, tvbtmp, 0, 8, ENC_LITTLE_ENDIAN);
NOTE: For very large captures, the memory usage of these tables will be substantial.
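As a follow-up on the CSV requirement: registered fields can be exported with tshark even when they are hidden in the tree, so the seconds and nanoseconds can land in separate Excel columns. A sketch (the capture file name is hypothetical):
tshark -r capture.pcap -T fields -E separator=, -e frame.number -e td.timestamp.sec -e td.timestamp.nsec > timestamps.csv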
I am trying to query 2 long columns for agents' names; the issue is that the names are repeated in 2 tables, one for the total sum of Productivity and the other for the total sum of Utilization.
When I query the columns, it returns the numbers for Productivity and Utilization all together.
How can I make the query search only for Productivity alone and for Utilization alone?
Link is here: https://docs.google.com/spreadsheets/d/12Sydw6ejFobySHUj5JoYkAPbhr0mKoInCWxtHY1W4lk/edit#gid=0
Apps Script would be a better solution in this case. The code below works as follows:
Gets the names from Column D and Column A.
For each name in Column D, it compares it with each name in Column A (that's the 2 for loops).
If the names coincide (first if), it checks the background color (second if) of the Column A name to accumulate Total Prod and Total Util.
Once it reaches the end of Column A, it writes the values of Total Prod and Total Util (Columns E and F) for each name in D.
function onOpen() { // Runs every time you open the sheet
  // Gets the active Spreadsheet and sheet
  let sprsheet = SpreadsheetApp.getActiveSpreadsheet();
  let sheet = sprsheet.getActiveSheet();
  var lastRow = sheet.getLastRow();
  var getNames = sheet.getRange(3, 1, lastRow - 2).getValues(); // Names from row 3, col 1, down to the last row
  var totalNames = sheet.getRange("D4:D5").getValues(); // Change the range for more names
  let prodColor = '#f2f4f7'; // Hexadecimal codes of the background colors of names in A
  let utilColor = '#cfe2f3'; //
  for (var i = 0; i < totalNames.length; i++) {
    var totalProd = 0, totalUtil = 0; // Starts at 0 for each name in D
    for (var j = 0; j < getNames.length; j++) {
      if (totalNames[i][0] == getNames[j][0]) {
        if (sheet.getRange(j + 3, 1).getBackgroundObject().asRgbColor().asHexString() == prodColor) { // If colors coincide
          totalProd += sheet.getRange(j + 3, 2).getValue();
        } else if (sheet.getRange(j + 3, 1).getBackgroundObject().asRgbColor().asHexString() == utilColor) {
          totalUtil += sheet.getRange(j + 3, 2).getValue();
        }
      }
    }
    sheet.getRange(i + 4, 5, 1, 2).setValues([[totalProd, totalUtil]]);
  }
}
Note: You will have to run the code manually and accept permissions the first time you run it. After that it will run automatically each time you open the Sheet. It might take a few seconds for the code to run and for the changes to be reflected on the Sheet.
To better understand loops and 2D arrays, I recommend you take a look at this.
References:
Range Class
Get Values
Get BackgroundObject
Set Values
You can learn more about Apps Script and Sheets by following the Quickstart.
Being fairly new to Stata, I'm having difficulty figuring out how to do the following:
I have time-series data on selling price (p) and quantity sold (q) for 10 products in a single data file (i.e., 20 variables, p01-p10 and q01-q10). I am struggling to find the appropriate Stata command that computes a sales revenue (pq) time series for each of these 10 products (i.e., pq01-pq10).
Many thanks for your help.
forval i = 1/10 {
    local j : display %02.0f `i'
    gen pq`j' = p`j' * q`j'
}
A standard loop over 1/10 won't get you the leading zero in 01/09. For that we need an appropriate format: for example, display %02.0f 7 prints 07. See also
@article{pr0051,
  author = "Cox, N. J.",
  title = "Stata tip 85: Looping over nonintegers",
  journal = "Stata Journal",
  publisher = "Stata Press",
  address = "College Station, TX",
  volume = "10",
  number = "1",
  year = "2010",
  pages = "160-163(4)",
  url = "http://www.stata-journal.com/article.html?article=pr0051"
}
(added later) Another way to do it is
local j = string(`i', "%02.0f")
That makes it a bit more explicit that you are mapping from numbers 1,...,10 to strings "01",...,"10".
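For completeness, the loop using this alternative reads:
forval i = 1/10 {
    local j = string(`i', "%02.0f")
    gen pq`j' = p`j' * q`j'
}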