Unable to query Cosmos DB using numeric partition key - stored-procedures

I am trying to use a numeric field as a partition key, but I am unable to run stored procedures against it. I am not sure if I am doing something wrong or if it is simply not possible.
I have two collections with two different partition keys.
A sample document from collection 1
{
"id":"1",
"group": "a"
}
A sample document from collection 2
{
"id":"1",
"group": 1
}
The difference is that the group field in the second collection is a number rather than a string, so its value has no double quotes around it.
Both collections have group as the partition key, and I am trying to execute the sample stored procedure provided in the Azure Portal for Cosmos DB.
While I am able to run it successfully against the first collection, the second collection produces a "no docs found" error.
I am importing the data from a SQL Server database, and the int fields are imported without double quotes. I searched quite a lot but could not find any documentation about numeric (unquoted) partition keys.
Is it possible to use numeric fields as partition keys? Any help is appreciated, as this field is the best fit for partitioning my collection.

Don't worry: numeric partition keys are definitely supported when executing Cosmos DB stored procedures, and you did everything right. You get the unexpected result because the Azure Portal treats the partition key input as a String even when you enter a Number.
You can verify this with a Cosmos DB SDK or the REST API. For example, here is a Java SDK sample:
import com.microsoft.azure.documentdb.*;

public class ExecuteSPTest {

    static private String YOUR_COSMOS_DB_ENDPOINT = "https://***.documents.azure.com:443/";
    static private String YOUR_COSMOS_DB_MASTER_KEY = "***";

    public static void main(String[] args) throws DocumentClientException {
        DocumentClient client = new DocumentClient(
                YOUR_COSMOS_DB_ENDPOINT,
                YOUR_COSMOS_DB_MASTER_KEY,
                new ConnectionPolicy(),
                ConsistencyLevel.Session);

        RequestOptions options = new RequestOptions();
        PartitionKey partitionKey = new PartitionKey(1);
        options.setPartitionKey(partitionKey);

        StoredProcedureResponse queryResults = client.executeStoredProcedure(
                "/dbs/db/colls/group/sprocs/sample",
                options,
                null);

        String document = queryResults.getResponseAsString();
        System.out.println(document);
    }
}
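As a side note, the "no docs found" message you are seeing is produced by the portal's sample stored procedure itself when its query returns an empty feed. From memory, the portal sample (written in JavaScript, as all Cosmos DB stored procedures are) looks roughly like this:

// Approximate reproduction of the Azure Portal's sample stored procedure.
// It queries the partition it was executed against and returns the first document,
// or sets the response body to 'no docs found' if the partition is empty.
function sample(prefix) {
    var collection = getContext().getCollection();

    var isAccepted = collection.queryDocuments(
        collection.getSelfLink(),
        'SELECT * FROM root r',
        function (err, feed, options) {
            if (err) throw err;

            if (!feed || !feed.length) {
                getContext().getResponse().setBody('no docs found');
            } else {
                var body = { prefix: prefix, feed: feed[0] };
                getContext().getResponse().setBody(JSON.stringify(body));
            }
        });

    if (!isAccepted) throw new Error('The query was not accepted by the server.');
}

So the procedure is not rejecting your numeric partition key; when run from the portal it simply executes against a partition (group = "1" as a string) that contains no documents.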

Related

"Guid should contain 32 digits" serilog error with sql server sink

I am getting this error occasionally with the MSSQLServer sink, and I can't see what's wrong with this guid. Any ideas? I've verified, in every place I can find, that the data type of the source guid is Guid, not a string. I'm just a bit mystified.
Guid should contain 32 digits with 4 dashes (xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx).Couldn't store <"7526f485-ec2d-4ec8-bd73-12a7d1c49a5d"> in UserId Column. Expected type is Guid.
The guid in this example is:
7526f485-ec2d-4ec8-bd73-12a7d1c49a5d
xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx
seems to match the template to me?
Further details:
This is an occasional issue, but when it arises, it arises a lot. It seems to be tied to specific Guids: most are fine, but a small subset have this issue. Our app logs thousands of messages a day, but the affected messages are never logged (because of the issue), so it is difficult to track down exactly where the logs causing this error come from. However, we use a centralized logging method that runs something like the following. This test passes for me, but it mirrors the setup and code we use for logging generally, which normally succeeds. As I said, this is an intermittent issue:
[Fact]
public void Foobar()
{
    // arrange
    var columnOptions = new ColumnOptions
    {
        AdditionalColumns = new Collection<SqlColumn>
        {
            new SqlColumn {DataType = SqlDbType.UniqueIdentifier, ColumnName = "UserId"},
        },
    };
    columnOptions.Store.Remove(StandardColumn.MessageTemplate);
    columnOptions.Store.Remove(StandardColumn.Properties);
    columnOptions.Store.Remove(StandardColumn.LogEvent);
    columnOptions.Properties.ExcludeAdditionalProperties = true;
    var badGuid = new Guid("7526f485-ec2d-4ec8-bd73-12a7d1c49a5d");
    var connectionString = "Server=(localdb)\\MSSQLLocalDB;Database=SomeDb;Trusted_Connection=True;MultipleActiveResultSets=true";
    var logConfiguration = new LoggerConfiguration()
        .MinimumLevel.Information()
        .Enrich.FromLogContext()
        .WriteTo.MSSqlServer(connectionString, "Logs",
            restrictedToMinimumLevel: LogEventLevel.Information, autoCreateSqlTable: false,
            columnOptions: columnOptions)
        .WriteTo.Console(restrictedToMinimumLevel: LogEventLevel.Information);
    Log.Logger = logConfiguration.CreateLogger();
    // Suspect the issue is with this line
    LogContext.PushProperty("UserId", badGuid);
    // Best practice would be to do something like this:
    // using (LogContext.PushProperty("UserId", badGuid))
    // {
    Log.Logger.Information(new FormatException("Foobar"), "This is a test");
    // }
    Log.CloseAndFlush();
}
One thing I have noticed since constructing this test code is that the "PushProperty" for the UserId property is not captured and disposed. Since behaviour is "undefined" in this case, I am inclined to fix it anyway and see if the problem goes away.
Full stack trace:
2020-04-20T08:38:17.5145399Z Exception while emitting periodic batch from Serilog.Sinks.MSSqlServer.MSSqlServerSink: System.ArgumentException: Guid should contain 32 digits with 4 dashes (xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx).Couldn't store <"7526f485-ec2d-4ec8-bd73-12a7d1c49a5d"> in UserId Column. Expected type is Guid.
---> System.FormatException: Guid should contain 32 digits with 4 dashes (xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx).
at System.Guid.GuidResult.SetFailure(Boolean overflow, String failureMessageID)
at System.Guid.TryParseExactD(ReadOnlySpan`1 guidString, GuidResult& result)
at System.Guid.TryParseGuid(ReadOnlySpan`1 guidString, GuidResult& result)
at System.Guid..ctor(String g)
at System.Data.Common.ObjectStorage.Set(Int32 recordNo, Object value)
at System.Data.DataColumn.set_Item(Int32 record, Object value)
--- End of inner exception stack trace ---
at System.Data.DataColumn.set_Item(Int32 record, Object value)
at System.Data.DataRow.set_Item(DataColumn column, Object value)
at Serilog.Sinks.MSSqlServer.MSSqlServerSink.FillDataTable(IEnumerable`1 events)
at Serilog.Sinks.MSSqlServer.MSSqlServerSink.EmitBatchAsync(IEnumerable`1 events)
at Serilog.Sinks.PeriodicBatching.PeriodicBatchingSink.OnTick()
RESOLUTION
This issue was caused by someone creating a log message with a placeholder that had the same name as our custom data column, but passing in a string version of a guid instead of a value typed as a Guid.
Very simple example:
var badGuid = "7526f485-ec2d-4ec8-bd73-12a7d1c49a5d";
var badGuidConverted = Guid.Parse(badGuid); // just proving the guid is actually valid.
var goodGuid = Guid.NewGuid();

using (LogContext.PushProperty("UserId", goodGuid))
{
    Log.Logger.Information("This is a problem with my other user {userid} that will crash serilog. This message will never end up in the database.", badGuid);
}
The quick fix is to edit the message template to change the placeholder from {userid} to something else.
Since our code is centralized around the place where the PushProperty occurs, I put some checks in there to detect this and throw a more useful error message the next time someone does it.
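For illustration, here is a minimal sketch of the kind of guard such a centralized helper can apply; the class and method names are invented for this example, not taken from our actual code:

using System;
using Serilog.Context;

public static class UserLogContext
{
    // Minimal sketch: insist on a real Guid and fail with a clear message
    // if anything else (for example a string that merely looks like a Guid) is passed in.
    public static IDisposable PushUserId(object userId)
    {
        if (userId is Guid guid)
        {
            return LogContext.PushProperty("UserId", guid);
        }

        throw new ArgumentException(
            "UserId must be a System.Guid, but got " + (userId?.GetType().Name ?? "null") +
            ". Check for message templates that reuse the {UserId} placeholder with a string value.");
    }
}

This doesn't catch a stray {UserId} placeholder in some other message template, but it does surface the mistake at the call site we control instead of as a silently dropped batch in the sink.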
I don't see anything obvious in the specific code above that would cause the issue. The fact that you call PushProperty before setting up Serilog would be something I would change (i.e. set up Serilog first, then call PushProperty) but that doesn't seem to be the root cause of the issue you're having.
My guess is that you have some code paths that are logging the UserId as a string instead of a Guid. Serilog is expecting a Guid value type, so if you give it a string representation of a Guid, it won't work and you'll get that type of exception.
Maybe somewhere in the codebase you're calling .ToString on the UserId before logging? Or perhaps using string interpolation e.g. Log.Information("User is {UserId}", $"{UserId}");?
For example:
var badGuid = "7526f485-ec2d-4ec8-bd73-12a7d1c49a5d";
LogContext.PushProperty("UserId", badGuid);
Log.Information(new FormatException("Foobar"), "This is a test");
Or even just logging a message with the UserId property directly:
var badGuid = "7526f485-ec2d-4ec8-bd73-12a7d1c49a5d";
Log.Information("The {UserId} is doing work", badGuid);
Both snippets above would throw the same exception you're having, because they use string values rather than real Guid values.

Using MySQL as input source and writing into Google BigQuery

I have an Apache Beam task that reads from a MySQL source using JDBC, and it's supposed to write the data as-is to a BigQuery table. No transformation is performed at this point; that will come later on. For the moment I just want the database output to be written directly into BigQuery.
This is the main method trying to perform this operation:
public static void main(String[] args) {
    Options options = PipelineOptionsFactory.fromArgs(args).withValidation().as(Options.class);
    Pipeline p = Pipeline.create(options);

    // Build the table schema for the output table.
    List<TableFieldSchema> fields = new ArrayList<>();
    fields.add(new TableFieldSchema().setName("phone").setType("STRING"));
    fields.add(new TableFieldSchema().setName("url").setType("STRING"));
    TableSchema schema = new TableSchema().setFields(fields);

    p.apply(JdbcIO.<KV<String, String>>read()
        .withDataSourceConfiguration(JdbcIO.DataSourceConfiguration.create(
            "com.mysql.jdbc.Driver", "jdbc:mysql://host:3306/db_name")
            .withUsername("user")
            .withPassword("pass"))
        .withQuery("SELECT phone_number, identity_profile_image FROM scraper_caller_identities LIMIT 100")
        .withRowMapper(new JdbcIO.RowMapper<KV<String, String>>() {
            public KV<String, String> mapRow(ResultSet resultSet) throws Exception {
                return KV.of(resultSet.getString(1), resultSet.getString(2));
            }
        })
        .apply(BigQueryIO.Write
            .to(options.getOutput())
            .withSchema(schema)
            .withCreateDisposition(BigQueryIO.Write.CreateDisposition.CREATE_IF_NEEDED)
            .withWriteDisposition(BigQueryIO.Write.WriteDisposition.WRITE_TRUNCATE)));

    p.run();
}
But when I execute the template using Maven, I get the following error:
Test.java:[184,6] cannot find symbol symbol: method
apply(com.google.cloud.dataflow.sdk.io.BigQueryIO.Write.Bound)
location: class org.apache.beam.sdk.io.jdbc.JdbcIO.Read<com.google.cloud.dataflow.sdk.values.KV<java.lang.String,java.lang.String>>
It seems that I'm not passing BigQueryIO.Write the expected data collection and that's what I am struggling with at the moment.
How can I make the data coming from MySQL meet BigQuery's expectations in this case?
I think that you need to provide a PCollection<TableRow> to BigQueryIO.Write instead of the PCollection<KV<String,String>> type that the RowMapper is outputting.
Also, please use the correct column name and value pairs when setting the TableRow.
Note: I think that your KVs are the phone and url values (e.g. {"555-555-1234": "http://www.url.com"}), not the column name and value pairs (e.g. {"phone": "555-555-1234", "url": "http://www.url.com"})
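For example, here is a rough sketch of such a conversion step to insert between the JdbcIO read and the BigQuery write; the DoFn style assumes a reasonably recent Beam SDK, and the column names come from the schema built in your pipeline:

import com.google.api.services.bigquery.model.TableRow;
import org.apache.beam.sdk.transforms.DoFn;
import org.apache.beam.sdk.transforms.DoFn.ProcessElement;
import org.apache.beam.sdk.transforms.ParDo;
import org.apache.beam.sdk.values.KV;

// Rough sketch: turn each KV pair into a TableRow whose column names match the
// TableSchema ("phone" and "url") before handing it to BigQueryIO.Write.
class KvToTableRowFn extends DoFn<KV<String, String>, TableRow> {
    @ProcessElement
    public void processElement(ProcessContext c) {
        c.output(new TableRow()
            .set("phone", c.element().getKey())
            .set("url", c.element().getValue()));
    }
}

// ...and in the pipeline, between the read and the write:
// .apply(ParDo.of(new KvToTableRowFn()))
// .apply(BigQueryIO.Write.to(options.getOutput()).withSchema(schema) ... );

Incidentally, the compiler error mentions both com.google.cloud.dataflow.sdk.io.BigQueryIO and org.apache.beam.sdk.io.jdbc.JdbcIO, so it may also be worth making sure all of the IO classes come from the same SDK.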
See the example here:
https://beam.apache.org/documentation/sdks/javadoc/0.5.0/
Would you please give this a try and let me know if it works for you? Hope this helps.

Dataflow output parameterized type to avro file

I have a pipeline that successfully outputs an Avro file as follows:
@DefaultCoder(AvroCoder.class)
class MyOutput_T_S {
    T foo;
    S bar;
    Boolean baz;
    public MyOutput_T_S() {}
}

@DefaultCoder(AvroCoder.class)
class T {
    String id;
    public T() {}
}

@DefaultCoder(AvroCoder.class)
class S {
    String id;
    public S() {}
}

...

PCollection<MyOutput_T_S> output = input.apply(myTransform);
output.apply(AvroIO.Write.to("/out").withSchema(MyOutput_T_S.class));
How can I reproduce this exact behavior except with a parameterized output MyOutput<T, S> (where T and S are both Avro code-able using reflection).
The main issue is that Avro reflection doesn't work for parameterized types. So based on these responses:
Setting Custom Coders & Handling Parameterized types
Using Avrocoder for Custom Types with Generics
1) I think I need to write a custom CoderFactory, but I am having difficulty figuring out exactly how this works (I'm having trouble finding examples). Oddly enough, a completely naive coder factory appears to let me run the pipeline and inspect proper output using DataflowAssert:
cr.registerCoder(MyOutput.class, new CoderFactory() {
    @Override
    public Coder<?> create(List<? extends Coder<?>> componentCoders) {
        Schema schema = new Schema.Parser().parse("{\"type\":\"record\","
            + "\"name\":\"MyOutput\","
            + "\"namespace\":\"mypackage\","
            + "\"fields\":[]}");
        return AvroCoder.of(MyOutput.class, schema);
    }

    @Override
    public List<Object> getInstanceComponents(Object value) {
        MyOutput<Object, Object> myOutput = (MyOutput<Object, Object>) value;
        List components = new ArrayList();
        return components;
    }
});
While I can successfully assert against the output now, I expect this will not cut it for writing to a file. I haven't figured out how I'm supposed to use the provided componentCoders to generate the correct schema, and if I try to just shove the schema of T or S into fields I get:
java.lang.IllegalArgumentException: Unable to get field id from class null
2) Assuming I figure out how to encode MyOutput, what do I pass to AvroIO.Write.withSchema? If I pass either MyOutput.class or the schema, I get type mismatch errors.
I think there are two questions (correct me if I am wrong):
How do I enable the coder registry to provide coders for various parameterizations of MyOutput<T, S>?
How do I write values of MyOutput<T, S> to a file using AvroIO.Write?
The first question is to be solved by registering a CoderFactory as in the linked question you found.
Your naive coder is probably allowing you to run the pipeline without issues because serialization is being optimized away. Certainly an Avro schema with no fields will result in those fields being dropped in a serialization+deserialization round trip.
But assuming you fill in the schema with the fields, your approach to CoderFactory#create looks right. I don't know the exact cause of the message java.lang.IllegalArgumentException: Unable to get field id from class null, but the call to AvroCoder.of(MyOutput.class, schema) should work for an appropriately assembled schema. If there is an issue with this, more details (such as the rest of the stack trace) would be helpful.
However, your override of CoderFactory#getInstanceComponents should return a list of values, one per type parameter of MyOutput. Like so:
@Override
public List<Object> getInstanceComponents(Object value) {
    MyOutput<Object, Object> myOutput = (MyOutput<Object, Object>) value;
    return ImmutableList.of(myOutput.foo, myOutput.bar);
}
The second question can be answered using some of the same support code as the first, but otherwise is independent. AvroIO.Write.withSchema always explicitly uses the provided schema. It does use AvroCoder under the hood, but this is actually an implementation detail. Providing a compatible schema is all that is necessary - such a schema will have to be composed for each value of T and S for which you want to output MyOutput<T, S>.
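For what it's worth, here is a rough sketch of how such a schema could be assembled from the reflected schemas of concrete T and S classes using Avro's SchemaBuilder; the field names mirror MyOutput_T_S above, and everything else is an assumption of this sketch:

import org.apache.avro.Schema;
import org.apache.avro.SchemaBuilder;
import org.apache.avro.reflect.ReflectData;

public class MyOutputSchemas {
    // Rough sketch: compose a record schema for MyOutput<T, S> out of the
    // reflect-derived schemas of the concrete T and S classes.
    public static Schema schemaFor(Class<?> tClass, Class<?> sClass) {
        Schema tSchema = ReflectData.get().getSchema(tClass);
        Schema sSchema = ReflectData.get().getSchema(sClass);
        return SchemaBuilder.record("MyOutput")
            .namespace("mypackage")
            .fields()
            .name("foo").type(tSchema).noDefault()
            .name("bar").type(sSchema).noDefault()
            .name("baz").type().booleanType().noDefault()
            .endRecord();
    }
}

The same schema could then be handed both to AvroCoder.of(MyOutput.class, schema) in the CoderFactory and to AvroIO.Write.withSchema(schema) for that particular choice of T and S.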

Dynamic table name when writing to BQ from dataflow pipelines

As a followup question to the following question and answer:
https://stackoverflow.com/questions/31156774/about-key-grouping-with-groupbykey
I'd like to confirm with the Google Dataflow engineering team (@jkff) whether the 3rd option proposed by Eugene is at all possible with Google Dataflow:
"have a ParDo that takes these keys and creates the BigQuery tables, and another ParDo that takes the data and streams writes to the tables"
My understanding is that a ParDo/DoFn processes one element at a time. How could we specify a table name (a function of the keys passed in from side inputs) when writing out from processElement of a ParDo/DoFn?
Thanks.
Updated with a DoFn, which obviously is not working, since c.element().getValue() is not a PCollection.
PCollection<KV<String, Iterable<String>>> output = ...;

public class DynamicOutput2Fn extends DoFn<KV<String, Iterable<String>>, Integer> {

    private final PCollectionView<List<String>> keysAsSideinputs;

    public DynamicOutput2Fn(PCollectionView<List<String>> keysAsSideinputs) {
        this.keysAsSideinputs = keysAsSideinputs;
    }

    @Override
    public void processElement(ProcessContext c) {
        List<String> keys = c.sideInput(keysAsSideinputs);
        String key = c.element().getKey();

        // The below is not working!!! How could we write the value out to a sink,
        // be it a GCS file or a BQ table???
        c.element().getValue().apply(ParDo.of(new FormatLineFn()))
            .apply(TextIO.Write.to(key));

        c.output(1);
    }
}
The BigQueryIO.Write transform does not support this. The closest thing you can do is to use per-window tables, and encode whatever information you need to select the table in the window objects by using a custom WindowFn.
If you don't want to do that, you can make BigQuery API calls directly from your DoFn. With this, you can set the table name to anything you want, as computed by your code. This could be looked up from a side input, or computed directly from the element the DoFn is currently processing. To avoid making too many small calls to BigQuery, you can batch up the requests and issue them in finishBundle().
You can see how the Dataflow runner does the streaming import here:
https://github.com/GoogleCloudPlatform/DataflowJavaSDK/blob/master/sdk/src/main/java/com/google/cloud/dataflow/sdk/util/BigQueryTableInserter.java
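For illustration, a rough sketch of what the batching approach could look like inside the DoFn; the project, dataset, column name, and client setup are all assumptions made for this example:

import com.google.api.client.googleapis.auth.oauth2.GoogleCredential;
import com.google.api.client.googleapis.javanet.GoogleNetHttpTransport;
import com.google.api.client.json.jackson2.JacksonFactory;
import com.google.api.services.bigquery.Bigquery;
import com.google.api.services.bigquery.BigqueryScopes;
import com.google.api.services.bigquery.model.TableDataInsertAllRequest;
import com.google.api.services.bigquery.model.TableRow;
import com.google.cloud.dataflow.sdk.transforms.DoFn;
import com.google.cloud.dataflow.sdk.values.KV;
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Rough sketch: buffer rows per table name and flush them in finishBundle(),
// so each bundle makes a handful of insertAll calls instead of one per element.
public class WriteToDynamicTableFn extends DoFn<KV<String, Iterable<String>>, Integer> {

    private transient Bigquery bigquery;
    private transient Map<String, List<TableDataInsertAllRequest.Rows>> buffered;

    @Override
    public void startBundle(Context c) throws Exception {
        GoogleCredential credential = GoogleCredential.getApplicationDefault()
            .createScoped(BigqueryScopes.all());
        bigquery = new Bigquery.Builder(
                GoogleNetHttpTransport.newTrustedTransport(),
                JacksonFactory.getDefaultInstance(),
                credential)
            .setApplicationName("dynamic-table-writer")
            .build();
        buffered = new HashMap<>();
    }

    @Override
    public void processElement(ProcessContext c) {
        String table = c.element().getKey();   // table name derived from the key
        List<TableDataInsertAllRequest.Rows> rows = buffered.get(table);
        if (rows == null) {
            rows = new ArrayList<>();
            buffered.put(table, rows);
        }
        for (String line : c.element().getValue()) {
            rows.add(new TableDataInsertAllRequest.Rows()
                .setJson(new TableRow().set("line", line)));   // column name assumed
        }
        c.output(1);
    }

    @Override
    public void finishBundle(Context c) throws Exception {
        for (Map.Entry<String, List<TableDataInsertAllRequest.Rows>> entry : buffered.entrySet()) {
            bigquery.tabledata()
                .insertAll("my-project", "my_dataset", entry.getKey(),
                    new TableDataInsertAllRequest().setRows(entry.getValue()))
                .execute();
        }
        buffered.clear();
    }
}

The tables themselves would still need to exist (or be created by the separate ParDo mentioned in the quoted answer) before the streaming inserts arrive.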

Is there an easy way to view the contents of a FormCollection?

When testing/debugging an ASP.NET MVC application, it's common to submit a form and then check all of the name/value pairs to make sure
All of the expected keys are present
All of the expected keys have the expected values
Debugging in Visual Studio is great for checking whether a single variable (or even a simple object) contains the expected value(s), and as far as a FormCollection goes, it's pretty easy to check for the presence of the keys. However, checking the key/value pairings in a FormCollection is a huge hassle. Is there a simple way to get Visual Studio to list the keys and their values side by side for a quick check?
Just a quick custom check
public void Edit(FormCollection team)
{
    System.Text.StringBuilder st = new System.Text.StringBuilder();
    foreach (string key in team.Keys)
    {
        st.AppendLine(String.Format("{0} - {1}", key, team.GetValue(key).AttemptedValue));
    }
    string formValues = st.ToString();
    //Response.Write(st.ToString());
}
You can then hover your mouse over formValues to check the keys and values; clicking the magnifier reveals them all.
Take a look at Glimpse; it is on NuGet. It exposes lots of information and is invaluable for AJAX and MVC development.
At its core Glimpse allows you to debug your web site or web service right in the browser. Glimpse allows you to "Glimpse" into what's going on in your web server. In other words what Firebug is to debugging your client side code, Glimpse is to debugging your server within the client.
Here is a method that outputs the collection to the Immediate window in a readable key/value pair format, one pair per line of output:
private void DebugWriteFormCollection(FormCollection collection)
{
    foreach (string key in collection.Keys)
    {
        string value = collection.GetValue(key).AttemptedValue;
        string output = string.Format("{0}: {1}", key, value);
        System.Diagnostics.Debug.WriteLine(output);
    }
}
Add that to your code and set a breakpoint. When you hit the breakpoint, call the method from the Immediate window:
DebugWriteFormCollection(collection);
Result:
Key1: Value1
Key2: Value2
Key3: Value3
etc.
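If you don't want to add a helper method at all, a quick expression you can evaluate in the Immediate or Watch window at a breakpoint (assuming the ASP.NET MVC FormCollection, which exposes AllKeys and a string indexer) is:

string.Join(Environment.NewLine,
    collection.AllKeys.Select(k => string.Format("{0}: {1}", k, collection[k])))

If Select doesn't resolve there, fully qualify it as System.Linq.Enumerable.Select or make sure the file you're debugging has a using System.Linq directive.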
