How to set up service accounts in Dataflow jobs - google-cloud-dataflow

I need to set up a service account in a Dataflow program that pulls messages from a Pub/Sub subscription, transforms the data, and finally stores it in a BigQuery table.
Approach:
GoogleCredential credentials = GoogleCredential.fromStream(new FileInputStream("credentials.json")).createScoped(Collections.singleton(??));
I couldn't find the correct scope. I'd appreciate help with the code and with invoking a Dataflow job using this credential setup.

Here is the code to trigger a Dataflow job on GCP from the Java API.
Scope code:
final List<String> SCOPES = Arrays.asList(
"https://www.googleapis.com/auth/cloud-platform",
"https://www.googleapis.com/auth/devstorage.full_control",
"https://www.googleapis.com/auth/userinfo.email",
"https://www.googleapis.com/auth/datastore",
"https://www.googleapis.com/auth/pubsub");
DataflowPipelineOptions code:
options.setGcpCredential(ServiceAccountCredentials.fromStream(
new FileInputStream("abc.json")).createScoped(SCOPES));
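For context, here is a minimal sketch of how these pieces fit together when building the pipeline options. This is hedged: the project ID, temp location, and "abc.json" path are placeholders, and it assumes the Beam Dataflow runner dependencies are on the classpath.
import java.io.FileInputStream;
import java.util.Arrays;
import java.util.List;
import com.google.auth.oauth2.ServiceAccountCredentials;
import org.apache.beam.runners.dataflow.DataflowRunner;
import org.apache.beam.runners.dataflow.options.DataflowPipelineOptions;
import org.apache.beam.sdk.options.PipelineOptionsFactory;

public class CredentialSetup {
    public static void main(String[] args) throws Exception {
        // cloud-platform alone usually suffices; narrower scopes can be added as needed.
        final List<String> SCOPES = Arrays.asList(
                "https://www.googleapis.com/auth/cloud-platform",
                "https://www.googleapis.com/auth/pubsub");

        DataflowPipelineOptions options =
                PipelineOptionsFactory.as(DataflowPipelineOptions.class);
        options.setGcpCredential(ServiceAccountCredentials
                .fromStream(new FileInputStream("abc.json"))
                .createScoped(SCOPES));
        options.setProject("my-project-id");            // placeholder
        options.setTempLocation("gs://my-bucket/tmp");  // placeholder
        options.setRunner(DataflowRunner.class);

        // Pipeline p = Pipeline.create(options); then build the
        // Pub/Sub -> transform -> BigQuery pipeline and call p.run().
    }
}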

Not sure where you're trying to pull data from, but here's a list of available scopes for Google products: https://developers.google.com/identity/protocols/googlescopes

Related

Google Dataflow API Filter by Job Name

Is there a way to filter Dataflow jobs by job name with the REST API? I am looking for a way to get a list of job details filtered by job name. Currently, I am able to do it through the Cloud Dataflow console, but not from the Dataflow REST API.
GET /v1b3/projects/{projectId}/jobs
The filter that is performed within the Dataflow console is not part of the API (It seems that the Dataflow API is requested to get the jobs but the frontend layer is the one that performs the filtering function).
Therefore, you could replicate this by following the same steps:
1- To list all jobs across all regions, use projects.jobs.aggregated (GET /v1b3/projects/{projectId}/jobs:aggregated). Additionally, this method allows you to pre-filter jobs by a specified job state.
projects.jobs.list (GET /v1b3/projects/{projectId}/jobs) is not recommended, as you can only get the list of jobs that are running in us-central1.
2- Both methods mentioned above return a JSON ListJobsResponse object, which contains a list of Jobs. Therefore, you can iterate over this list in a programming language like Python and filter the jobs by a regex over the job name to get the desired jobs:
import json
import re

desired_name = 'REGEX_STRING'
filtered_jobs = list()

# Load a saved ListJobsResponse and keep only the jobs whose name matches the regex.
with open('ListJobsResponse.json') as json_file:
    response_dict = json.load(json_file)
    jobs = response_dict['jobs']
    for j in jobs:
        x = re.search(desired_name, j['name'])
        if x:
            filtered_jobs.append(j)
print(filtered_jobs)
You can also filter Dataflow jobs by name directly while using the API below.
API Reference: https://cloud.google.com/dataflow/docs/reference/rest/v1b3/projects.jobs/aggregated
API Methods:
GET https://dataflow.googleapis.com/v1b3/projects/{projectId}/jobs:aggregated
or
GET https://dataflow.googleapis.com/v1b3/projects/{projectId}/jobs
Query parameters (work for both methods):
name: name of the dataflow job
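As a rough sketch of calling the aggregated method from Python (hedged: the exact matching behavior of the name parameter is an assumption based on the reference above, and client-side regex filtering as in the earlier answer remains the fallback), using google-auth and requests:
import requests
import google.auth
from google.auth.transport.requests import Request

# Authenticate with Application Default Credentials.
credentials, project_id = google.auth.default(
    scopes=['https://www.googleapis.com/auth/cloud-platform'])
credentials.refresh(Request())

url = f'https://dataflow.googleapis.com/v1b3/projects/{project_id}/jobs:aggregated'
headers = {'Authorization': f'Bearer {credentials.token}'}

# 'name' is the job-name parameter mentioned above; page through all results.
jobs = []
params = {'name': 'my-job-name'}  # placeholder job name
while True:
    response = requests.get(url, headers=headers, params=params).json()
    jobs += response.get('jobs', [])
    next_page = response.get('nextPageToken')
    if not next_page:
        break
    params['pageToken'] = next_page
print(jobs)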

C# MsGraph-SDK: Send a BatchRequest to get manager links using Microsoft Graph SDK

First of all, please share if there is any official MsGraph SDK documentation anywhere that I can use for reference.
I have a scenario where I want to query all manager and member links from AAD without providing the user and group objectIDs respectively. This is currently supported in the DQ channel, i.e. I can do something like this using the MsGraphSDK:
MsGraphClient.Users.Delta().Request().Select("manager")
OR
MsGraphClient.Groups.Delta().Request().Select("members")
I don't want to use DQ for the initial sync due to performance problems and other issues.
My fallback option is to query Graph directly, so I want to do something like the following, but this doesn't return any result:
MsGraphClient.Users.Request().Select("manager")
OR
MsGraphClient.Groups.Request().Select("members")
It looks like this isn't even supported currently at the lower (AADGraph) layer. Please correct me if I am wrong, and provide a solution if any!
So my fallback approach is to pull all the user and group aadObjectIds, and explicitly query the manager and member links respectively.
In my case, there can potentially be 500K User-Objects in AAD, and I want to avoid making 500K separate GetManager calls to AAD. Instead, I want to batch the Graph requests as much as possible.
I wasn't able to find much help on the Internet about sending batch requests through the SDK.
Here's what I am doing:
I have this BatchRequestContent:
var batchRequestContent = new BatchRequestContent();
foreach (string aadObjectId in aadObjectIds)
{
    batchRequestContent.AddBatchRequestStep(new BatchRequestStep(
        aadObjectId,
        Client.Users[aadObjectId].Manager.Request().GetHttpRequestMessage()));
}
and I am trying to send a batch request through the GraphSDK with this content to get a batch response. Is this currently supported in the SDK? If yes, then what's the procedure? Any documentation or example? How do I read the batch response back? Finally, is there any limit on the number of requests in a batch?
Thanks,
Here is a related post: $expand=manager does not expand manager
$expand is currently not supported on the manager and directReports relationships in the v1.0 endpoint. It is supported in the beta endpoint, but the API returns way too much throwaway information: https://graph.microsoft.com/beta/users?$expand=manager
The client library partially supports batch at this time, although we have a couple of pull requests to provide better support with the next release (PR 1 and 2).
To use batch with the current library and your authenticated client, you'll do something like this:
var authProv = MsGraphClient.AuthenticationProvider;
var httpClient = GraphClientFactory.Create(authProv);
// Send batch request with BatchRequestContent.
HttpResponseMessage response = await httpClient.PostAsync("https://graph.microsoft.com/v1.0/$batch", batchRequestContent);
// Handle http responses using BatchResponseContent.
BatchResponseContent batchResponseContent = new BatchResponseContent(response);
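To read the batch response back, BatchResponseContent lets you fetch each response by the request ID assigned in AddBatchRequestStep (here, the aadObjectId). A minimal sketch, assuming the Microsoft.Graph.Core types used above. On the last question: JSON batching documents a limit of 20 requests per batch, so the 500K lookups would need to be chunked into batches of at most 20.
// Look up each response by the request ID used when building the batch.
foreach (string aadObjectId in aadObjectIds)
{
    HttpResponseMessage managerResponse =
        await batchResponseContent.GetResponseByIdAsync(aadObjectId);
    if (managerResponse.IsSuccessStatusCode)
    {
        // The body is the manager's directoryObject as JSON; deserialize as needed.
        string json = await managerResponse.Content.ReadAsStringAsync();
    }
}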

Not able to filter messages based on header properties in Azure Stream analytics

I have created an Azure Stream Analytics (ASA) job to filter data based on a custom header property I send from a client app.
How would I read/filter message header properties in Azure Stream Analytics?
The portal returns no results when I try to test my query. Below is my query in the Azure portal.
So far, my query is as simple as this:
SELECT
*
INTO
[mystorage]
FROM
[iothubin]
WHERE Properties.type = "type1"
I also tried to reference the key without its parent (such as: where type = "") with no results as well.
I am sure that I am sending messages with this custom property in the header, since I can view it using the Device Explorer tool.
Any idea how to get this working?
I haven't tried this yet myself, but supposedly you can access custom properties via GetMetadataPropertyValue(). Give this a try:
https://msdn.microsoft.com/en-us/library/azure/mt793845.aspx
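For example, a query along these lines; this is a sketch, assuming the custom property is a user/application property named type, which GetMetadataPropertyValue reads through the [User] record:
SELECT
    *
INTO
    [mystorage]
FROM
    [iothubin]
WHERE GetMetadataPropertyValue(iothubin, '[User].[type]') = 'type1'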
You can use the query described here as an example to query complex schemas.
If you share your schema, we can look at the query for you.
Let me know if it works for you.
Thanks,
JS

Slack: Retrieve all messages

I want to retrieve all the messages that were sent in my team's Slack domain. Although I'd prefer that the data be received in XML or JSON, I am able to handle it in just about any form.
How can I retrieve all these messages? Is it possible? If not, can I retrieve all the messages for a specific channel?
If you need to do this dynamically via API you can use the channels.list method to list all of the channels in your team and channels.history method to retrieve the history of each channel. Note that this will not include DMs or private groups.
If you need to do this as a one-time thing, go to https://my.slack.com/services/export to export your team's message archives as a series of JSON files.
This Python script exports everything to JSON by a simple run:
https://gist.github.com/Chandler/fb7a070f52883849de35
It creates the directories for you and you have the option to exclude direct messages or channels.
All you need to install is the slacker module, which is simply pip install slacker. Then run it with --token='secret-token'. You need a legacy token, which is available here at the moment.
For anyone looking for direct message history downloads, this Node-based CLI tool allows you to download messages from DMs and IMs in both JSON and CSV. I've used it, and it works very well.
With the new Conversations API this task is a bit easier now. Here is a full overview:
Fetching messages from a channel
The new API method conversations.history will allow you to download messages from every type of conversation / channel (public, private, DM, Group DM) as long as your token has access to it.
This method also supports paging, allowing you to download large amounts of messages.
Resolving IDs to names
Note that this method will return messages in a raw JSON format with IDs only, so you will need to call additional API methods to resolve those IDs into plain text (see the sketch after this list):
user ID: users.list
channel IDs: conversations.list
bot IDs: bots.info (there is no official bots.list method, but there is an unofficial one, which might help in some cases)
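A minimal sketch of the ID resolution step with the Python slack_sdk (the token is a placeholder; the same cursor pattern applies to conversations.list):
from slack_sdk import WebClient

client = WebClient(token='xoxb-...')  # placeholder token

# Build an ID -> display name lookup by paging through users.list.
user_names = {}
cursor = None
while True:
    page = client.users_list(limit=200, cursor=cursor)
    for member in page['members']:
        user_names[member['id']] = member.get('real_name') or member['name']
    cursor = page.get('response_metadata', {}).get('next_cursor')
    if not cursor:
        break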
Fetching threads
In addition, use conversations.replies to download threads in a conversation. Threads function a bit like conversations within a conversation and need to be downloaded separately (see the sketch below).
Check out this page of the official documentation for more details on threading.
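And a hedged sketch of the thread step, again with slack_sdk; the channel ID and thread_ts values are placeholders for a channel and a parent message ts found in its history:
from slack_sdk import WebClient

client = WebClient(token='xoxb-...')  # placeholder token
thread_ts = '1625097600.000200'       # placeholder: ts of a parent message

# conversations.replies returns the parent message plus its thread replies.
replies = client.conversations_replies(channel='C0123456789', ts=thread_ts)
for message in replies['messages']:
    print(message.get('user'), message.get('text'))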
If anyone is still looking for a solution in 2021 and has no assistance from their workspace admins to export messages, they can do the following.
Step 1: Get the API token from your UI cookie
Clone and install requirements and run SlackPirate
Open slack on a browser and copy the value of the cookie named d
Run python3 SlackPirate.py --cookie '<value of d cookie>'
Step 2: Dump the channel messages
Install slackchannel2pdf (requires Python)
slackchannel2pdf --token 'xoxb-1466...' --write-raw-data T0EKHQHK2/G015H62SR3M
Step 3: Dump the direct messages
Install slack-history-export (requires Node)
slack-history-export -t 'xoxs-1466...' -u '<correct username>' -f 'my_colleagues_chats.json'
I know that this might be late for the OP, but if anyone is looking for a tool capable of doing a full Slack workspace export, try Slackdump. It's free and open source (I'm the author, but anyone can contribute).
To do the workspace export, run it with the -export switch:
./slackdump -export my_export.zip
If you need to download attachments as well, use the -f switch (stands for "files"):
./slackdump -f -export my_export.zip
It will open the browser asking you to log in. If you need to do it headless, grab a token and cookie, as described in the documentation.
It will generate an export file that is compatible with another nice tool, slack-export-viewer.
To retrieve all the messages from a particular channel in Slack, you can use the conversations.history method of the slack_sdk library in Python.
import logging

from slack_sdk import WebClient
from slack_sdk.errors import SlackApiError

logger = logging.getLogger(__name__)
client = WebClient(token='xoxb-...')  # placeholder token


def get_conversation_history(channel_id, latest, oldest):
    """Fetch the conversation history of a particular channel."""
    try:
        # First page: bound the window with the latest/oldest timestamps.
        result = client.conversations_history(
            channel=channel_id,
            inclusive=True,
            latest=latest,
            oldest=oldest,
            limit=100)
        all_messages = list(result['messages'])
        # Each call returns at most `limit` messages; follow the cursor
        # until has_more is False to collect the rest.
        while result['has_more']:
            result = client.conversations_history(
                channel=channel_id,
                cursor=result['response_metadata']['next_cursor'],
                limit=100)
            all_messages += result['messages']
        return all_messages
    except SlackApiError:
        logger.exception("Error while fetching the conversation history")
Here, I have provided latest and oldest timestamps to bound the time range of messages collected from the conversation history.
The cursor argument points at the next page of results: this method returns at most 100 messages per call, but it supports pagination through result['response_metadata']['next_cursor'].
Hope this will be helpful.
Here is another tool for exporting all messages from a channel.
The tool is called slackchannel2pdf and will export all messages from a public or private channel to a PDF document.
You only need a token with the required scopes and access.

How can I specify the jvm agent id when querying the metrics on the New Relic v1 REST API?

I am trying to get JVM metrics from my application, which runs three instances with three separate JVMs. I can see the data I am interested in on the New Relic dashboard, on the Monitoring -> JVMs tab. I can also get the information I want for one of those JVMs by hitting the REST API like so:
% curl -gH "x-api-key:KEY" 'https://api.newrelic.com/api/v1/applications/APPID/data.xml?metrics%5B%5D=GC%2FPS%20Scavenge&field=time_percentage&begin=T1&end=T2'
(I've replaced the values of some fields, but this is the full form of my request.)
I get a response including a long list of elements like this:
<metric name="GC/PS Scavenge" begin="T1" end="T2" app="MYAPP" agent_id="AGENTID">
<field name="time_percentage">0.018822634485032824</field>
</metric>
All of the metric elements include the same agent_id value, and I never specified which agent to use. How can I either:
get metrics for all agents
specify which agent I am interested in (so I can send multiple requests, one for each JVM)
agent_id can refer to a particular JVM instance, and while you can't request multiple agents at once, you can request metrics for a single JVM.
You can get the JVM's agent_id in one of two ways:
1) an API call to
https://api.newrelic.com/api/v1/accounts/:account_id/applications/:app_id/instances.xml
2) browse to the JVM in the New Relic user interface (use the 'JVM' drop-down at the top right after you select your app), then grab the ID from the URL.
The ID will look something like [account_id]_i2043442
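Mirroring the curl call from the question, the instance lookup from option 1 would look something like this (KEY, ACCOUNTID, and APPID are placeholders):
% curl -gH "x-api-key:KEY" 'https://api.newrelic.com/api/v1/accounts/ACCOUNTID/applications/APPID/instances.xml'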
Some data is not available broken down by JVM, most notably a call to threshold_values.xml won't work if the agent_id isn't an application.
Full documentation of the v1 API: http://newrelic.github.io/newrelic_api/
