How to get 20k tweets in a specific language - twitter

I need to get more than 20 tweets in a specific language (around 1,000), but with the following code I only get 15-20 tweets. Can anyone help me?
public static void main(String[] args) throws TwitterException {
    ConfigurationBuilder cb = new ConfigurationBuilder();
    cb.setDebugEnabled(true)
            .setJSONStoreEnabled(true)
            .setOAuthConsumerKey("*********")
            .setOAuthConsumerSecret("*******")
            .setOAuthAccessToken("********")
            .setOAuthAccessTokenSecret("****");
    TwitterFactory twitterFactory = new TwitterFactory(cb.build());
    twitter4j.Twitter twitterClient = twitterFactory.getInstance();
    final ResponseList<Status> homeTimelineStatuses = twitterClient.getHomeTimeline();
    for (final Status status : homeTimelineStatuses) {
        final String lang = status.getLang();
        final String rawText = status.getText();
        if (lang.equals("en")) {
            System.out.println("^^ " + rawText);
        } else {
            System.out.println("not en");
        }
    }
}

If you read the documentation for home_timeline you will see what it says about the number of tweets you can get:
Defaults to 20. The value of count is best thought of as a limit to the number of tweets to return because suspended or deleted content is removed after the count has been applied.
If you want more than 20 Tweets, you can use the count parameter:
Specifies the number of records to retrieve. Must be less than or equal to 200.
If you want to get more than 200 Tweets, you will need to understand pagination. I'm unfamiliar with Twitter4J, but its documentation covers it in detail.
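As a concrete illustration, here is a minimal sketch of count plus paging with Twitter4J, reusing the twitterClient from the question; the 1,000-tweet target and the empty-batch stop condition are assumptions to match the question, not part of the original code:

import java.util.ArrayList;
import java.util.List;
import twitter4j.Paging;
import twitter4j.ResponseList;
import twitter4j.Status;

// Sketch: page through the home timeline, 200 tweets per request, keeping English ones
List<Status> collected = new ArrayList<>();
int page = 1;
while (collected.size() < 1000) {
    // Paging(page, count): pages are 1-based, count is capped at 200 by the API
    ResponseList<Status> batch = twitterClient.getHomeTimeline(new Paging(page, 200));
    if (batch.isEmpty()) {
        break; // no more tweets available
    }
    for (Status status : batch) {
        if ("en".equals(status.getLang())) {
            collected.add(status);
        }
    }
    page++;
}
System.out.println("Collected " + collected.size() + " English tweets");

Bear in mind that the home timeline itself is capped by Twitter (historically around 800 Tweets in total), so for thousands of Tweets the search API is usually the better fit.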

Related

EMQX - Publishing to an MQTT topic with a unique identifier takes much more time than to a static MQTT topic

I was trying to publish messages to an EMQX broker on different topics. The scenario takes much more time when publishing with a dynamic topic from one client; if we make the topic name static, it takes much less time.
Here I have posted the results and the code for both.
I am using the EMQX broker with the Eclipse Paho client version 3 and QoS level 1.
Times for 100 simple publish messages (consider id as dynamic here):
Total time pattern 1: /config/{id}/outward :: 36 sec (dynamic topic; {id} is a variable whose value changes in the loop, as shown in the code below)
Total time pattern 2: /config/test :: 1.2 sec (static topic)
How shall I publish messages with different ids so that topic creation won't take much time?
public class MwttPublish {
    static String mqttHostUrl; // set this to your broker URL, e.g. "tcp://localhost:1883"
    static IMqttClient instance = null;

    public static IMqttClient getInstance() {
        try {
            if (instance == null) {
                instance = new MqttClient(mqttHostUrl, "SimpleTestMQTT");
            }
            if (!instance.isConnected()) {
                MqttConnectOptions options = new MqttConnectOptions();
                options.setUserName("test");
                options.setPassword("test".toCharArray());
                options.setAutomaticReconnect(true);
                options.setCleanSession(false);
                options.setConnectionTimeout(10);
                instance.connect(options);
            }
        } catch (final Exception e) {
            System.out.println("Exception in mqtt: " + e.getMessage());
        }
        return instance;
    }

    public static void publishMessage() throws MqttException {
        IMqttClient iMqttClient = getInstance();
        MqttMessage mqttMessage = new MqttMessage("Hello".getBytes());
        mqttMessage.setQos(1);
        mqttMessage.setRetained(true);
        System.out.println("Publish Start for pattern 1");
        int i = 0;
        final BigDecimal mqttmsgPublishstartTime = new BigDecimal(System.currentTimeMillis());
        do {
            iMqttClient.publish("/config/" + i + "/outward", mqttMessage);
            i++;
        } while (i < 100);
        System.out.println("Total time pattern 1 /config/i/outward::"
                + new BigDecimal(System.currentTimeMillis()).subtract(mqttmsgPublishstartTime));
        System.out.println("Publish Start for pattern 2");
        final BigDecimal mqttmsgPublishstartTime1 = new BigDecimal(System.currentTimeMillis());
        i = 0;
        do {
            iMqttClient.publish("/config/test", mqttMessage);
            i++;
        } while (i < 100);
        System.out.println("Total time pattern 2 /config/test::"
                + new BigDecimal(System.currentTimeMillis()).subtract(mqttmsgPublishstartTime1));
    }
}
This is not a valid test; you've fallen into many of the classic micro-benchmark traps, e.g.:
Way too small a sample size
No account taken of JVM JIT warm-up or GC overhead
Not comparing like with like, e.g. the time taken to concatenate the strings for the topics
Please check out the following: https://stackoverflow.com/a/2844291/504554
Also, from an MQTT point of view, topics are ephemeral: they only really "exist" for the instant a message is published, while the broker checks for subscribed clients with a matching pattern.
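One way to get a fairer comparison is a micro-benchmark harness such as JMH, which handles warm-up and sampling for you. A minimal sketch, assuming the jmh-core dependency and a reachable broker (the broker URL and client id below are placeholders):

import java.util.concurrent.TimeUnit;
import org.eclipse.paho.client.mqttv3.IMqttClient;
import org.eclipse.paho.client.mqttv3.MqttClient;
import org.eclipse.paho.client.mqttv3.MqttException;
import org.eclipse.paho.client.mqttv3.MqttMessage;
import org.openjdk.jmh.annotations.*;

@State(Scope.Thread)
@BenchmarkMode(Mode.AverageTime)
@OutputTimeUnit(TimeUnit.MILLISECONDS)
@Warmup(iterations = 5)       // let the JIT compile the hot paths before measuring
@Measurement(iterations = 10) // then take a larger sample
public class TopicBenchmark {
    IMqttClient client;
    MqttMessage message;
    int i;

    @Setup
    public void setup() throws MqttException {
        client = new MqttClient("tcp://localhost:1883", "BenchClient"); // placeholder URL/id
        client.connect();
        message = new MqttMessage("Hello".getBytes());
        message.setQos(1);
    }

    @Benchmark
    public void dynamicTopic() throws MqttException {
        client.publish("/config/" + (i++) + "/outward", message);
    }

    @Benchmark
    public void staticTopic() throws MqttException {
        client.publish("/config/test", message);
    }

    @TearDown
    public void tearDown() throws MqttException {
        client.disconnect();
    }
}

If the two benchmarks converge under JMH, the difference you measured was benchmark noise rather than a real topic-creation cost.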

Flink count of events using metrics

I have a topic in Kafka where I am getting multiple types of events in JSON format, and I have created a StreamingFileSink to write these events to S3 with bucketing.
FlinkKafkaConsumer errorTopicConsumer = new FlinkKafkaConsumer(ERROR_KAFKA_TOPICS,
        new SimpleStringSchema(),
        properties);

final StreamingFileSink<Object> errorSink = StreamingFileSink
        .forRowFormat(new Path(outputPath + "/error"), new SimpleStringEncoder<>("UTF-8"))
        .withBucketAssigner(new EventTimeBucketAssignerJson())
        .build();

env.addSource(errorTopicConsumer)
        .name("error_source")
        .setParallelism(1)
        .addSink(errorSink)
        .name("error_sink").setParallelism(1);

public class EventTimeBucketAssignerJson implements BucketAssigner<Object, String> {

    @Override
    public String getBucketId(Object record, Context context) {
        StringBuffer partitionString = new StringBuffer();
        Tuple3<String, Long, String> tuple3 = (Tuple3<String, Long, String>) record;
        try {
            partitionString.append("event_name=")
                    .append(tuple3.f0).append("/");
            String timePartition = TimeUtils.getEventTimeDayPartition(tuple3.f1);
            partitionString.append(timePartition);
        } catch (Exception e) {
            partitionString.append("year=").append(Constants.DEFAULT_YEAR).append("/")
                    .append("month=").append(Constants.DEFAULT_MONTH).append("/")
                    .append("day=").append(Constants.DEFAULT_DAY);
        }
        return partitionString.toString();
    }

    @Override
    public SimpleVersionedSerializer<String> getSerializer() {
        return SimpleVersionedStringSerializer.INSTANCE;
    }
}
Now I want to publish the hourly count of each event as a metric to Prometheus and build a Grafana dashboard on top of it.
Please help me understand how I can achieve an hourly count for each event using Flink metrics and publish it to Prometheus.
Thanks
Normally, this is done by simply creating a counter for the events and then using the rate() function in Prometheus; this gives you the rate of events over the given time range.
If you want to do this on your own for some reason, you can do something similar to what has been done in org.apache.kafka.common.metrics.stats.Rate. In this case you would need to gather a list of samples along with the time at which each was collected, plus the window size you want to use for the rate calculation; then you can simply do the calculation, i.e. remove the samples that have fallen out of the window and expired, and count how many samples remain.
You could then set a Gauge to the calculated value.
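As a concrete sketch of the counter approach (this assumes the Prometheus reporter is already configured in flink-conf.yaml; the operator and metric names below are placeholders):

import org.apache.flink.api.common.functions.RichMapFunction;
import org.apache.flink.configuration.Configuration;
import org.apache.flink.metrics.Counter;

// Counts every record that passes through and exposes the count as a Flink metric
public class EventCountingMapper extends RichMapFunction<String, String> {
    private transient Counter eventCounter;

    @Override
    public void open(Configuration parameters) {
        // Registered on the operator's metric group; the Prometheus reporter exports it
        eventCounter = getRuntimeContext().getMetricGroup().counter("eventsProcessed");
    }

    @Override
    public String map(String event) {
        eventCounter.inc();
        return event;
    }
}

With the counter exported, a Prometheus query along the lines of increase(flink_taskmanager_job_task_operator_eventsProcessed[1h]) gives the count over the last hour; the exact metric name depends on your reporter's scope configuration.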

Can I in a @RabbitConsumer find out if any messages are prefetched for this consumer

I need to know if there are more messages coming for this consumer.
Right now I count the messages on the queue, but that only gives me what is left on the queue, not what has been prefetched.
@RabbitListener(queues = QUEUENAME)
public void recieve(Message message, Channel channel) throws IOException {
    long messagesOnQueue = channel.messageCount(QUEUENAME);
    if (messagesOnQueue > 1) {
        // add message to list
    } else {
        // save the list
    }
}
It would be really great if there was a way to tell whether messages have been prefetched for this consumer. Is that possible? If I can get that count, then I don't care whether there are also messages on the queue.
After receiving suggestions from Gary I changed the implementation to the following, and it works.
When manually acknowledging a message, it has to be done on the same channel on which you received the message, but you can save a reference to the channel in case you need it in another thread.
In your Spring Boot application.yml add this:
spring:
  rabbitmq:
    listener:
      direct:
        prefetch: 200
      simple:
        prefetch: 200
        acknowledgeMode: MANUAL
Code from the consumer:
// The list we build up and save in one transaction
private Set<PayloadDto> unhandledPayloads = new HashSet<>();
private long latestTag = 0L;
private Channel latestChannel;

@RabbitListener(queues = QUEUE_NAME, id = "consumerId")
public void recieve(Message message, Channel channel) throws IOException {
    PayloadDto payloadDto = parse(message.getBody());
    unhandledPayloads.add(payloadDto);
    latestTag = message.getMessageProperties().getDeliveryTag();
    latestChannel = channel;
    if (unhandledPayloads.size() > UNHANDLED_PAYLOADS_LIMIT) {
        service.createOrUpdate(unhandledPayloads);
        unhandledPayloads.clear();
        // multiple-ack: acknowledges everything up to and including latestTag
        channel.basicAck(latestTag, true);
    }
}

@EventListener(condition = "event.listenerId == 'consumerId'")
public void onApplicationEvent(ListenerContainerIdleEvent event) throws IOException {
    if (!unhandledPayloads.isEmpty()) {
        service.createOrUpdate(unhandledPayloads);
        unhandledPayloads.clear();
        latestChannel.basicAck(latestTag, true);
    }
}
The reason we build up a list before saving is to be able to do a batch insert, which makes it run faster.
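For illustration only, the batch save behind service.createOrUpdate could be a plain JDBC batch; here is a minimal sketch with Spring's JdbcTemplate, where the table, columns, and PayloadDto accessors are hypothetical:

import java.util.Set;
import java.util.stream.Collectors;
import org.springframework.jdbc.core.JdbcTemplate;

public class PayloadService {
    private final JdbcTemplate jdbcTemplate;

    public PayloadService(JdbcTemplate jdbcTemplate) {
        this.jdbcTemplate = jdbcTemplate;
    }

    public void createOrUpdate(Set<PayloadDto> payloads) {
        // One round trip for the whole batch instead of one insert per message
        jdbcTemplate.batchUpdate(
                "INSERT INTO payloads (id, body) VALUES (?, ?)", // hypothetical table/columns
                payloads.stream()
                        .map(p -> new Object[] { p.getId(), p.getBody() }) // hypothetical accessors
                        .collect(Collectors.toList()));
    }
}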
Not currently, but it wouldn't be hard to add such a feature; open a GitHub issue to request it. However, I am not sure how useful it would be: if there are still messages in the queue, consuming a prefetched message will just fetch another.

Apache Flink: Stream Join Window is not triggered

I'm trying to join two streams in Apache Flink to get some results.
The current state of my project is that I am fetching Twitter data and mapping it into a 2-tuple, where the language of the user and the sum of tweets in a defined time window are saved.
I do this both for the number of tweets per language and for the number of retweets per language; the tweet/retweet aggregation works fine in other processes.
I now want to get the percentage of retweets relative to the number of all tweets in a time window.
Therefore I use the following code:
Time windowSize = Time.seconds(15);

// Sum up tweets per language
DataStream<Tuple2<String, Integer>> tweetsLangSum = tweets
        .flatMap(new TweetLangFlatMap())
        .keyBy(0)
        .timeWindow(windowSize)
        .sum(1);

// Get retweets out of all tweets per language
DataStream<Tuple2<String, Integer>> retweetsLangMap = tweets
        .keyBy(new KeyByTweetPostId())
        .flatMap(new RetweetLangFlatMap());

// Sum up retweets per language
DataStream<Tuple2<String, Integer>> retweetsLangSum = retweetsLangMap
        .keyBy(0)
        .timeWindow(windowSize)
        .sum(1);

tweetsLangSum.join(retweetsLangSum)
        .where(new KeySelector<Tuple2<String, Integer>, String>() {
            @Override
            public String getKey(Tuple2<String, Integer> tweet) throws Exception {
                return tweet.f0;
            }
        })
        .equalTo(new KeySelector<Tuple2<String, Integer>, String>() {
            @Override
            public String getKey(Tuple2<String, Integer> tweet) throws Exception {
                return tweet.f0;
            }
        })
        .window(TumblingEventTimeWindows.of(windowSize))
        .apply(new JoinFunction<Tuple2<String, Integer>, Tuple2<String, Integer>, Tuple4<String, Integer, Integer, Double>>() {
            @Override
            public Tuple4<String, Integer, Integer, Double> join(Tuple2<String, Integer> in1, Tuple2<String, Integer> in2) throws Exception {
                String lang = in1.f0;
                Double percentage = (double) in1.f1 / in2.f1;
                return new Tuple4<>(lang, in1.f1, in2.f1, percentage);
            }
        })
        .print();
When I print tweetsLangSum or retweetsLangSum, the output seems fine. My problem is that I never get any output from the join. Does anyone have an idea why? Or am I using the window function in the first aggregation step incorrectly when it comes to the join?
This might be caused by a mix of different time semantics. The KeyedStream.timeWindow() method is a shortcut that creates a window operator based on the configured time characteristics, i.e., an event-time window if event-time is enabled or a processing-time window otherwise. For the join, you explicitly define an event-time window.
Did you enable event-time processing?
env.setStreamTimeCharacteristic(TimeCharacteristic.EventTime);
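Beyond enabling the time characteristic, event-time windows also need timestamps and watermarks assigned to the input stream, or they never fire. A minimal sketch with the same-era DataStream API, where Tweet and its getTimestampMillis() accessor are hypothetical stand-ins for your element type:

import org.apache.flink.streaming.api.TimeCharacteristic;
import org.apache.flink.streaming.api.datastream.DataStream;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;
import org.apache.flink.streaming.api.functions.timestamps.BoundedOutOfOrdernessTimestampExtractor;
import org.apache.flink.streaming.api.windowing.time.Time;

// Enable event time globally
StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();
env.setStreamTimeCharacteristic(TimeCharacteristic.EventTime);

// Attach event timestamps and allow 5 seconds of out-of-orderness
DataStream<Tweet> tweetsWithTime = tweets.assignTimestampsAndWatermarks(
        new BoundedOutOfOrdernessTimestampExtractor<Tweet>(Time.seconds(5)) {
            @Override
            public long extractTimestamp(Tweet tweet) {
                return tweet.getTimestampMillis(); // hypothetical accessor
            }
        });

Alternatively, if processing time is good enough, switching the join's window to TumblingProcessingTimeWindows.of(windowSize) makes it consistent with the timeWindow() shortcuts without any watermarking.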

twitter4j search for users with a particular account creation date

I want to search for all users on Twitter with a particular account creation date. Does the Twitter API allow that? And if yes, how do I do it using twitter4j?
I am not sure whether it is possible or not, but you might try the following.
For example, assume that you are searching for users created on 2010-03-05 (yyyy-MM-dd).
Because there is no explicit field for an exact creation date, you can create a query like the one below, setting the since point to one day before and the until point to one day after the target date. Then you should call searchUsers() with a paging mechanism; a pagination sketch follows the snippet below. Try the following code, I hope it will work:
try
{
    Query query = new Query();
    query.setSince("2010-03-04");
    query.setUntil("2010-03-06");
    ResponseList<User> userList = twitterObj.searchUsers(query.toString(), -1);
    for (User userItem : userList)
    {
        // Then here you can do whatever you want by using userItem object
    }
}
catch (TwitterException ex)
{
    // Do necessary error handling mechanism here
}
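The paging mechanism mentioned above could look roughly like this; Twitter4J's searchUsers(String, int) takes a 1-based page number and returns up to 20 users per page, and the local creation-date check is an assumption, since the since/until trick is not guaranteed to apply to user search:

int page = 1;
ResponseList<User> users;
do
{
    users = twitterObj.searchUsers(query.toString(), page);
    for (User user : users)
    {
        // Filter locally on the account creation date
        System.out.println(user.getScreenName() + " created at " + user.getCreatedAt());
    }
    page++;
}
while (!users.isEmpty());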
public void tweetSearch(String queryRequest) throws IOException, TwitterException {
    // Create configuration builder and set keys, tokens etc.
    ConfigurationBuilder cb = new ConfigurationBuilder();
    cb.setDebugEnabled(true);
    // Use the consumer key/secret and access token/secret provided by Twitter
    cb.setOAuthConsumerKey("xxxxx");
    cb.setOAuthConsumerSecret("xxxx");
    cb.setOAuthAccessToken("xxxx");
    cb.setOAuthAccessTokenSecret("xxxx");
    // Create Twitter instance
    Twitter twitter = new TwitterFactory(cb.build()).getInstance();
    // Create file writer and buffered writer
    FileWriter fstream = new FileWriter("Twitterstream.txt", true);
    BufferedWriter out = new BufferedWriter(fstream);
    // Create Query object and set the search string
    Query query = new Query(queryRequest);
    // Change the dates as you wish
    query.setSince("2014-06-12");
    query.setUntil("2014-06-14");
    // Page through the results and write the tweets to the file
    QueryResult qr;
    do {
        qr = twitter.search(query);
        for (Status t : qr.getTweets()) {
            System.out.println(t.getId() + " - " + t.getCreatedAt() + ": " + t.getText());
            out.write("\n" + t.getId() + ",");
            out.write("\t" + t.getText() + ",");
            out.write("\t" + t.getUser() + ",");
        }
        // nextQuery() returns the query for the next page, or null when there are no more pages
        query = qr.nextQuery();
    } while (query != null);
    out.close();
    System.out.println("Generated Twitter Stream");
}
