Weighted Graph DijkstraShortestPath: getPath() does not return path with least cost - JUNG

Thanks for the prompt response, chessofnerd and Joshua. I am sorry for the unclear logs and unclear question. Let me rephrase it.
Joshua:
I am storing my weights in the DB and retrieving them from the DB in the transformer.
I have 4 devices connected in my topology; between some devices there are multiple connections, and between two devices there is only a single connection, as shown below.
I am using an undirected weighted graph.
Initially all links are assigned a weight of 0. When I request a path between D1 and D4, I increase the weight of each link on the returned path by 1.
When a second request comes for another path, I feed all the weights through the Transformer.
On this second request, I correctly feed a weight of 1 for links L1, L2, L3 and 0 for the other links.
Since the weight of (L4,L5,L3), (L6,L7,L3) or (L8,L9,L3) is 0+0+1 = 1, which is less than the weight 1+1+1 = 3 of (L1,L2,L3), I expect to get one of those paths. But I am getting (L1,L2,L3) again:
D1---L1--->D2---L2--->D3---L3--->D4
D1---L4--->D2---L5--->D3---L3--->D4
D1---L6--->D2---L7--->D3---L3--->D4
D1---L8--->D2---L9--->D3---L3--->D4
The transformer simply returns the weight previously stored for the link:
Graph<Node, Link> topology = new UndirectedSparseMultigraph<Node, Link>();
DijkstraShortestPath<Node, Link> pathCalculator = new DijkstraShortestPath<Node, Link>(topology, wtTransformer);
List<Link> path = pathCalculator.getPath(node1, node2);

private final Transformer<Link, Number> wtTransformer = new Transformer<Link, Number>() {
    public Number transform(Link link) {
        // Return the weight previously stored in the DB for this link.
        return getWeightForLink(link, true);
    }
};

You're creating the DijkstraShortestPath so that it caches results (the default). Add a "false" parameter to the constructor to change this behavior; see the sketch after the links below.
http://jung.sourceforge.net/doc/api/edu/uci/ics/jung/algorithms/shortestpath/DijkstraShortestPath.html
(And no, the cache is not invalidated if you change an edge weight; if you do that, it's your responsibility to create a new DSP instance, or not to use caching in the first place.)
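A minimal sketch of the non-caching construction (same Node and Link types as in the question; the third constructor argument is the cached flag):
DijkstraShortestPath<Node, Link> pathCalculator =
        new DijkstraShortestPath<Node, Link>(topology, wtTransformer, false); // cached = false
List<Link> path = pathCalculator.getPath(node1, node2); // recomputed with the current weights on every call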

Related

Dynamic targetHits in Vespa YQL

I'm trying to create a Vespa query where I would like to set the rate limit of the targetHits. For example, the query below has a constant targetHits of 3:
'yql': 'select id, title from sources * where \
([{"targetHits":3}]nearestNeighbor(embeddings_vector,query_embeddings_vector));'
Is there some way I can set this number dynamically in every call?
Also what is the difference between hits and targetHits? Does it have to do with the minimum and desired requirement?
Thanks a lot.
I'm not sure what you mean by the rate limit of targetHits, but generally:
targetHits is per content node if you run Vespa as a multi-node cluster.
targetHits is the number of hits you want to expose to the ranking profile's first-phase ranking function.
hits only controls how many to return in the SERP response. It's perfectly valid to ask for a targetHits of 500 per content node for ranking and finally return just the global best 10 (according to your ranking profile), as in the example below.
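A sketch based on the query from the question, with only targetHits and hits changed:
'yql': 'select id, title from sources * where \
([{"targetHits":500}]nearestNeighbor(embeddings_vector,query_embeddings_vector));',
'hits': 10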
Is there some way I can set this number dynamically in every call?
You can modify the YQL you send, of course, but a better way to do this is often to create a Searcher as part of your application that modifies the query programmatically, e.g.:
import com.yahoo.prelude.query.Item;
import com.yahoo.prelude.query.NearestNeighborItem;
import com.yahoo.search.Query;
import com.yahoo.search.Result;
import com.yahoo.search.Searcher;
import com.yahoo.search.searchchain.Execution;

public class TargetHitsSearcher extends Searcher {
    @Override
    public Result search(Query query, Execution execution) {
        Item root = query.getModel().getQueryTree().getRoot();
        if (root instanceof NearestNeighborItem) { // In general: search the tree recursively
            int target = query.properties().getInteger("myTarget"); // assumes the request always sets myTarget
            ((NearestNeighborItem) root).setTargetNumHits(target);
        }
        return execution.search(query);
    }
}
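Once such a Searcher is deployed in your application package, the value can be supplied per request, e.g. as the HTTP query parameter &myTarget=500 (myTarget being just the example property name used above), so every call can set a different number.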

Forecasting.ForecastBySsa with multiple variables as input

I've got this code to predict a time series. I want a prediction based on a time series of prices plus a correlated indicator.
So together with the value to forecast, I want to pass a side value, but I cannot tell whether it is taken into account, because the prediction doesn't change with or without it. How do I tell the algorithm to consider these parameters?
public static TimeSeriesForecast PerformTimeSeriesProductForecasting(List<TimeSeriesData> listToForecast)
{
var mlContext = new MLContext(seed: 1); //Seed set to any number so you have a deterministic environment
var productModelPath = $"product_month_timeSeriesSSA.zip";
if (File.Exists(productModelPath))
{
File.Delete(productModelPath);
}
IDataView productDataView = mlContext.Data.LoadFromEnumerable<TimeSeriesData>(listToForecast);
var singleProductDataSeries = mlContext.Data.CreateEnumerable<TimeSeriesData>(productDataView, false).OrderBy(p => p.Date);
TimeSeriesData lastMonthProductData = singleProductDataSeries.Last();
const int numSeriesDataPoints = 2500; // The total number of data points in the input series.
// Create and add the forecast estimator to the pipeline.
IEstimator<ITransformer> forecastEstimator = mlContext.Forecasting.ForecastBySsa(
outputColumnName: nameof(TimeSeriesForecast.NextClose),
inputColumnName: nameof(TimeSeriesData.Close), // This is the column being forecasted.
windowSize: 22, // The length of the window used for the SSA decomposition of the series.
seriesLength: numSeriesDataPoints, // This parameter specifies the number of data points that are used when performing a forecast.
trainSize: numSeriesDataPoints, // This parameter specifies the total number of data points in the input time series, starting from the beginning.
horizon: 5, // Indicates the number of values to forecast; 5 means the next 5 values are forecasted.
confidenceLevel: 0.98f, // Indicates the likelihood the real observed value will fall within the specified interval bounds.
confidenceLowerBoundColumn: nameof(TimeSeriesForecast.ConfidenceLowerBound), //This is the name of the column that will be used to store the lower interval bound for each forecasted value.
confidenceUpperBoundColumn: nameof(TimeSeriesForecast.ConfidenceUpperBound)); //This is the name of the column that will be used to store the upper interval bound for each forecasted value.
// Fit the forecasting model to the specified product's data series.
ITransformer forecastTransformer = forecastEstimator.Fit(productDataView);
// Create the forecast engine used for creating predictions.
TimeSeriesPredictionEngine<TimeSeriesData, TimeSeriesForecast> forecastEngine = forecastTransformer.CreateTimeSeriesEngine<TimeSeriesData, TimeSeriesForecast>(mlContext);
// Save the forecasting model so that it can be loaded within an end-user app.
forecastEngine.CheckPoint(mlContext, productModelPath);
ITransformer forecaster;
using (var file = File.OpenRead(productModelPath))
{
forecaster = mlContext.Model.Load(file, out DataViewSchema schema);
}
// We must create a new prediction engine from the persisted model.
TimeSeriesPredictionEngine<TimeSeriesData, TimeSeriesForecast> forecastEngine2 = forecaster.CreateTimeSeriesEngine<TimeSeriesData, TimeSeriesForecast>(mlContext);
// Get the prediction from the reloaded model; it contains the next `horizon` (5) forecasted values.
TimeSeriesForecast prediction = forecastEngine2.Predict();
return prediction;
}
TimeSeriesData has multiple attributes, not only the value of the series that I want to forecast. I just wonder whether they are taken into account when forecasting or not.
Is there a better method to forecast this type of series, like LSTM? Is that available in ML.NET?
ForecastBySsa is univariate: only the column passed as inputColumnName is used, so the other TimeSeriesData attributes are ignored. Multivariate time-based series forecasting is the subject of a new enhancement ticket for ML.NET.
See ticket: github.com/dotnet/machinelearning/issues/5638

How can I find all the paths that cost less than a maximum in a Neo4j DB?

Hi everyone. I am new to the Neo4j database.
I have a graph containing nodes and relationships, and I want to get all paths from A to other nodes whose total cost is less than a maximum.
The maximum can change from query to query.
I use Java to query Neo4j. I know an Evaluator can decide when we stop traversing a path, but I can't pass my maximum to the evaluate() method of the interface.
My code is here:
import org.neo4j.graphdb.Path;
import org.neo4j.graphdb.Relationship;
import org.neo4j.graphdb.traversal.Evaluation;
import org.neo4j.graphdb.traversal.Evaluator;

public class MyEvaluators implements Evaluator {
    // Hard-coded for now; this is the value I want to make configurable per query.
    private static final double MAXIMUM = 25.0;

    @Override
    public Evaluation evaluate(Path path) {
        // Sum the "cost" property over all relationships in the path so far.
        double totalCost = 0.0;
        for (Relationship rel : path.relationships()) {
            totalCost += (double) rel.getProperty("cost");
        }
        return totalCost > MAXIMUM ? Evaluation.EXCLUDE_AND_PRUNE : Evaluation.INCLUDE_AND_CONTINUE;
    }
}
And I don't want to limit the path depth.
So how can I do this query quickly?
Which version are you looking at?
https://neo4j.com/docs/java-reference/current/tutorial-traversal/
In the current API you can pass a context object (branch state) to the traversal that keeps your current state per branch, so you can accumulate the total cost in the PathEvaluator:
https://neo4j.com/docs/java-reference/3.4/javadocs/org/neo4j/graphdb/traversal/PathEvaluator.html
Also, perhaps you want to derive from the Dijkstra evaluator:
https://github.com/neo4j/neo4j/blob/3.5/community/graph-algo/src/main/java/org/neo4j/graphalgo/impl/path/Dijkstra.java
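Even without branch state, a minimal sketch (assuming the embedded Neo4j 3.x traversal API, and that MyEvaluators from the question is given a constructor that stores the maximum in a field instead of the hard-coded constant) lets the maximum change per query:
double maximum = 25.0; // can be different for every query
TraversalDescription td = graphDb.traversalDescription()
        .uniqueness(Uniqueness.NODE_PATH)
        .evaluator(new MyEvaluators(maximum)); // maximum injected via constructor
try (Transaction tx = graphDb.beginTx()) {
    for (Path path : td.traverse(startNode)) {
        // every path yielded here has total cost <= maximum
    }
    tx.success();
}
Note that summing the cost property inside evaluate() re-walks the whole path on every call; the branch-state PathEvaluator linked above avoids that by carrying the running total per branch.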

InfluxDB design issue

I am using InfluxDB and the line protocol to insert a large set of data into the database. The data I am getting is in the form of key-value pairs, where the key is a long string containing hierarchical data and the value is a simple integer.
Sample Key Value data :
/path/units/unit/subunits/subunit[name\='NAME1']/memory/chip/application/filter/allocations
value = 500
/path/units/unit/subunits/subunit[name\='NAME2']/memory/chip/application/filter/allocations
value = 100
(Note: NAME2 instead of NAME1)
/path/units/unit/subunits/subunit[name\='NAME1']/memory/chip/application/filter/free
value = 700
(Note: instead of allocations, the leaf is free)
/path/units/unit/subunits/subunit[name\='NAME2']/memory/graphics/application/filter/swap
value = 600
(Note: instead of chip, graphics is in the path)
/path/units/unit/subunits/subunit[name\='NAME2']/harddisk/data/size
value = 400
(Note: a different path, but identical up to subunit)
/path/units/unit/subunits/subunit[name\='NAME2']/harddisk/data/free
value=100
(Note: the same path, but the last element is different)
Below is the line protocol I am using to insert the data:
interface,Key=/path/units/unit/subunits/subunit[name\='NAME2']/harddisk/data/free valueData=500
I am using one measurement, namely interface, with one tag (Key) and one field (valueData). But this DB design makes querying difficult.
How can I design the database so that I can query, for example: get all records for the subunit with name = NAME1, or get all size data for every hard disk?
Thanks in advance.
The schema I'd recommend is the following:
interface,filename=/path/units/unit/subunits/subunit[name\='NAME2']/harddisk/data/free value=500
where filename is a tag and value is the field.
Given that the cardinality of filename is in the thousands, this schema should work well.
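If you also need the queries from the question (all records for a given subunit, or all size data per hard disk), a hedged variation is to split the hierarchy into several tags instead of a single filename tag; the tag names below are just illustrative:
interface,subunit=NAME2,component=harddisk,section=data,metric=free value=100
interface,subunit=NAME1,component=memory,section=chip/application/filter,metric=allocations value=500
Individual tags can then be filtered directly in InfluxQL:
SELECT * FROM interface WHERE subunit = 'NAME1'
SELECT * FROM interface WHERE component = 'harddisk' AND metric = 'size'
Tag values are strings, so the numeric value stays a field; keep each tag's cardinality bounded (subunit names, component names) to avoid a series-cardinality explosion.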

How to reduce Azure Table Storage latency?

I have a rather huge Table on Azure (30 million rows, 5–100 KB each).
Each RowKey is a Guid and PartitionKey is a first Guid part, for example:
PartitionKey = "1bbe3d4b"
RowKey = "1bbe3d4b-2230-4b4f-8f5f-fe5fe1d4d006"
The table gets 600 reads and 600 writes (updates) per second with an average latency of 60 ms. All queries use both PartitionKey and RowKey.
BUT, some reads take up to 3000 ms (!). On average, more than 1% of all reads take over 500 ms, and there is no correlation with entity size (a 100 KB row may be returned in 25 ms and a 10 KB one in 1500 ms).
My application is an ASP.NET MVC 4 website running on 4–5 Large instances.
I have read all MSDN articles regarding Azure Table Storage performance goals and already did the following:
UseNagle is turned Off
Expect100Continue is also disabled
MaxConnections for the table client is set to 250 (raising it to 1000–5000 makes no difference)
Also I checked that:
Storage account monitoring counters have no throttling errors
There are "waves" of sorts in performance, though they do not depend on load
What could be the reason for such performance issues, and how can I improve them?
I use the MergeOption.NoTracking setting on the DataServiceContext.MergeOption property for extra performance if I have no intention of updating the entity anytime soon. Here is an example:
var account = CloudStorageAccount.Parse(RoleEnvironment.GetConfigurationSettingValue("DataConnectionString"));
var tableStorageServiceContext = new AzureTableStorageServiceContext(account.TableEndpoint.ToString(), account.Credentials);
tableStorageServiceContext.RetryPolicy = RetryPolicies.Retry(3, TimeSpan.FromSeconds(1));
tableStorageServiceContext.MergeOption = MergeOption.NoTracking;
tableStorageServiceContext.AddObject(AzureTableStorageServiceContext.CloudLogEntityName, newItem);
tableStorageServiceContext.SaveChangesWithRetries();
Another problem might be that you are retrieving the entire entity with all its properties even though you intend to use only one or two of them - this is of course wasteful, but can't easily be avoided. However, if you use Slazure, you can use query projections to retrieve only the entity properties you are interested in from table storage and nothing more, which gives you better query performance. Here is an example:
using SysSurge.Slazure;
using SysSurge.Slazure.Linq;
using SysSurge.Slazure.Linq.QueryParser;
namespace TableOperations
{
public class MemberInfo
{
public string GetRichMembers()
{
// Get a reference to the table storage
dynamic storage = new QueryableStorage<DynEntity>("UseDevelopmentStorage=true");
// Build the table query and make sure it only returns members that earn more than $60k/yr
// by using a "Where" query filter, and make sure that only the "Name" and
// "Salary" entity properties are retrieved from the table storage to make the
// query quicker.
QueryableTable<DynEntity> membersTable = storage.WebsiteMembers;
var memberQuery = membersTable.Where("Salary > 60000").Select("new(Name, Salary)");
var result = "";
// Cast the query result to dynamic so that we can access its dynamic properties
foreach (dynamic member in memberQuery)
{
// Show some information about the member
result += "LINQ query result: Name=" + member.Name + ", Salary=" + member.Salary + "<br>";
}
return result;
}
}
}
Full disclosure: I coded Slazure.
You could also consider pagination if you are retrieving large data sets, example:
// Retrieve 50 members but also skip the first 50 members
var memberQuery = membersTable.Where("Salary > 60000").Take(50).Skip(50);
Typically, if a specific query requires scanning a large number of rows, it will take longer. Is the behavior you are seeing specific to a certain query or dataset? Or do you see the performance vary for the same data and query?
