I am a business administration student currently learning the basics of social media analytics for a research project. My aim at the moment is to track the use of a keyword in tweets. I downloaded RapidMiner and figured out how to search for keywords. However, is there any possibility to figure out how often the keyword was used in a certain time frame? Can I filter the results so that, for example, only tweets containing my keyword from December 2017 are displayed?
Thank you very much for considering my question.
If you have your data extracted as a RapidMiner ExampleSet, you can use the Aggregate operator to count the different keywords used. Or you can simply use the Filter Examples operator to only show the tweets containing the keyword.
See the process below for a simple example. Just copy & paste the XML into the Process view of RapidMiner.
Also feel free to ask further questions, or re-post this one, in the RapidMiner community forum.
<?xml version="1.0" encoding="UTF-8"?><process version="8.0.001">
<context>
<input/>
<output/>
<macros/>
</context>
<operator activated="true" class="process" compatibility="8.0.001" expanded="true" name="Process">
<process expanded="true">
<operator activated="true" class="generate_direct_mailing_data" compatibility="8.0.001" expanded="true" height="68" name="Generate Direct Mailing Data" width="90" x="45" y="34">
<description align="center" color="transparent" colored="false" width="126">Generic sample data.<br>We use the "sports" Attribute as key words</description>
</operator>
<operator activated="true" class="multiply" compatibility="8.0.001" expanded="true" height="103" name="Multiply" width="90" x="246" y="34"/>
<operator activated="true" class="filter_examples" compatibility="8.0.001" expanded="true" height="103" name="Filter Examples" width="90" x="447" y="340">
<list key="filters_list">
<parameter key="filters_entry_key" value="sports.equals.athletics"/>
</list>
<description align="center" color="yellow" colored="true" width="126">Alternatively we can filter for a specific sport and then count.</description>
</operator>
<operator activated="true" class="aggregate" compatibility="8.0.001" expanded="true" height="82" name="Aggregate (2)" width="90" x="715" y="340">
<parameter key="use_default_aggregation" value="true"/>
<parameter key="attribute_filter_type" value="single"/>
<parameter key="attribute" value="sports"/>
<parameter key="default_aggregation_function" value="count"/>
<list key="aggregation_attributes"/>
<description align="center" color="yellow" colored="true" width="126">Type your comment</description>
</operator>
<operator activated="true" class="aggregate" compatibility="8.0.001" expanded="true" height="82" name="Aggregate" width="90" x="447" y="34">
<parameter key="attribute_filter_type" value="single"/>
<parameter key="attribute" value="sports"/>
<parameter key="default_aggregation_function" value="count"/>
<list key="aggregation_attributes">
<parameter key="sports" value="count"/>
</list>
<parameter key="group_by_attributes" value="sports"/>
<description align="center" color="green" colored="true" width="126">The "group by" and the "aggregation" attributes are both set to "sports"</description>
</operator>
<connect from_op="Generate Direct Mailing Data" from_port="output" to_op="Multiply" to_port="input"/>
<connect from_op="Multiply" from_port="output 1" to_op="Aggregate" to_port="example set input"/>
<connect from_op="Multiply" from_port="output 2" to_op="Filter Examples" to_port="example set input"/>
<connect from_op="Filter Examples" from_port="example set output" to_op="Aggregate (2)" to_port="example set input"/>
<connect from_op="Aggregate (2)" from_port="example set output" to_port="result 2"/>
<connect from_op="Aggregate" from_port="example set output" to_port="result 1"/>
<portSpacing port="source_input 1" spacing="0"/>
<portSpacing port="sink_result 1" spacing="0"/>
<portSpacing port="sink_result 2" spacing="0"/>
<portSpacing port="sink_result 3" spacing="0"/>
</process>
</operator>
</process>
I am currently working on an Educational Data Mining project. I have run into what seems to be a common problem with some of my data sets, but I cannot find it described anywhere. Whenever I run the process it always states:
'Only one Label': The learning scheme Logistic Regression does not have sufficient capabilities for handling an example set with only one label. There are special modelling operators for the case that only examples of one class are known; they support the 'one class label' capability.
Some of my datasets with one label work fine. I also tried editing the labels because I use Multi Label Modeling. I can't understand the problem. Please help! Below is my XML.
<?xml version="1.0" encoding="UTF-8"?>
<process version="9.7.001">
<context>
<input/>
<output/>
<macros/>
</context>
<operator activated="true" class="process" compatibility="9.7.001" expanded="true" name="Process">
<parameter key="logverbosity" value="init"/>
<parameter key="random_seed" value="2001"/>
<parameter key="send_mail" value="never"/>
<parameter key="notification_email" value=""/>
<parameter key="process_duration_for_mail" value="30"/>
<parameter key="encoding" value="SYSTEM"/>
<process expanded="true">
<operator activated="true" class="read_excel" compatibility="9.7.001" expanded="true" height="68" name="Read Excel" width="90" x="45" y="34">
<parameter key="excel_file" value="D:\MyDocuments\CMUFiles\RESEARCH AND EXTENSION\SHs Performance NAT in Bukidnon\ExcelSubjectTemplate\Language-and-communication\finaldataAnalysis\Humss-Language-and-Communication.xlsx"/>
<parameter key="sheet_selection" value="sheet number"/>
<parameter key="sheet_number" value="1"/>
<parameter key="imported_cell_range" value="A1"/>
<parameter key="encoding" value="SYSTEM"/>
<parameter key="first_row_as_names" value="true"/>
<list key="annotations"/>
<parameter key="date_format" value=""/>
<parameter key="time_zone" value="SYSTEM"/>
<parameter key="locale" value="English (United States)"/>
<parameter key="read_all_values_as_polynominal" value="false"/>
<list key="data_set_meta_data_information">
<parameter key="0" value="Name.true.polynominal.attribute"/>
<parameter key="1" value="OC-G11-Q1.true.integer.attribute"/>
<parameter key="2" value="OC-G11-Q2.true.integer.attribute"/>
<parameter key="3" value="F-G11-Q1.true.integer.attribute"/>
<parameter key="4" value="F-G11-Q2.true.integer.attribute"/>
<parameter key="5" value="RWS-G11-Q3.true.integer.attribute"/>
<parameter key="6" value="RWS-G11-Q4.true.integer.attribute"/>
<parameter key="7" value="F-G11-Q3.true.integer.attribute"/>
<parameter key="8" value="F-G11-Q4.true.integer.attribute"/>
<parameter key="9" value="CW-G12-Q1.true.integer.attribute"/>
<parameter key="10" value="CW-G12-Q2.true.integer.attribute"/>
<parameter key="11" value="LC-PS-NAT.true.real.attribute"/>
<parameter key="12" value="LC-PS-NAT-Rem.true.polynominal.attribute"/>
<parameter key="13" value="LC-IL-NAT.true.real.attribute"/>
<parameter key="14" value="LC-IL-NAT-Rem.true.polynominal.attribute"/>
<parameter key="15" value="LC-CT-NAT.true.real.attribute"/>
<parameter key="16" value="LC-CT-NAT-Rem.true.polynominal.attribute"/>
<parameter key="17" value="Total-MPS.true.real.attribute"/>
<parameter key="18" value="overall-remarks.true.polynominal.attribute"/>
<parameter key="19" value="T.true.polynominal.attribute"/>
<parameter key="20" value="U.true.polynominal.attribute"/>
<parameter key="21" value="V.true.polynominal.attribute"/>
</list>
<parameter key="read_not_matching_values_as_missings" value="false"/>
<parameter key="datamanagement" value="double_array"/>
<parameter key="data_management" value="auto"/>
</operator>
<operator activated="true" class="subprocess" compatibility="9.7.001" expanded="true" height="82" name="Subprocess" width="90" x="179" y="34">
<process expanded="true">
<operator activated="true" class="replace_missing_values" compatibility="9.7.001" expanded="true" height="103" name="Replace Missing Values" width="90" x="45" y="34">
<parameter key="return_preprocessing_model" value="false"/>
<parameter key="create_view" value="false"/>
<parameter key="attribute_filter_type" value="all"/>
<parameter key="attribute" value=""/>
<parameter key="attributes" value=""/>
<parameter key="use_except_expression" value="false"/>
<parameter key="value_type" value="attribute_value"/>
<parameter key="use_value_type_exception" value="false"/>
<parameter key="except_value_type" value="time"/>
<parameter key="block_type" value="attribute_block"/>
<parameter key="use_block_type_exception" value="false"/>
<parameter key="except_block_type" value="value_matrix_row_start"/>
<parameter key="invert_selection" value="false"/>
<parameter key="include_special_attributes" value="false"/>
<parameter key="default" value="average"/>
<list key="columns"/>
</operator>
<operator activated="true" class="generate_id" compatibility="9.7.001" expanded="true" height="82" name="Generate ID" width="90" x="179" y="34">
<parameter key="create_nominal_ids" value="true"/>
<parameter key="offset" value="0"/>
</operator>
<operator activated="true" class="select_attributes" compatibility="9.7.001" expanded="true" height="82" name="Select Attributes" width="90" x="313" y="34">
<parameter key="attribute_filter_type" value="subset"/>
<parameter key="attribute" value=""/>
<parameter key="attributes" value="CW-G12-Q1|CW-G12-Q2|F-G11-Q1|F-G11-Q2|F-G11-Q3|F-G11-Q4|OC-G11-Q1|OC-G11-Q2|overall-remarks|RWS-G11-Q3|RWS-G11-Q4"/>
<parameter key="use_except_expression" value="false"/>
<parameter key="value_type" value="attribute_value"/>
<parameter key="use_value_type_exception" value="false"/>
<parameter key="except_value_type" value="time"/>
<parameter key="block_type" value="attribute_block"/>
<parameter key="use_block_type_exception" value="false"/>
<parameter key="except_block_type" value="value_matrix_row_start"/>
<parameter key="invert_selection" value="false"/>
<parameter key="include_special_attributes" value="false"/>
</operator>
<operator activated="true" class="remove_useless_attributes" compatibility="9.7.001" expanded="true" height="82" name="Remove Useless Attributes" width="90" x="514" y="34">
<parameter key="numerical_min_deviation" value="0.0"/>
<parameter key="nominal_useless_above" value="1.0"/>
<parameter key="nominal_remove_id_like" value="false"/>
<parameter key="nominal_useless_below" value="0.0"/>
</operator>
<connect from_port="in 1" to_op="Replace Missing Values" to_port="example set input"/>
<connect from_op="Replace Missing Values" from_port="example set output" to_op="Generate ID" to_port="example set input"/>
<connect from_op="Generate ID" from_port="example set output" to_op="Select Attributes" to_port="example set input"/>
<connect from_op="Select Attributes" from_port="example set output" to_op="Remove Useless Attributes" to_port="example set input"/>
<connect from_op="Remove Useless Attributes" from_port="example set output" to_port="out 1"/>
<portSpacing port="source_in 1" spacing="0"/>
<portSpacing port="source_in 2" spacing="0"/>
<portSpacing port="sink_out 1" spacing="0"/>
<portSpacing port="sink_out 2" spacing="0"/>
</process>
</operator>
<operator activated="true" class="set_role" compatibility="9.7.001" expanded="true" height="82" name="Set Role" width="90" x="313" y="34">
<parameter key="attribute_name" value="id"/>
<parameter key="target_role" value="batch"/>
<list key="set_additional_roles">
<parameter key="overall-remarks" value="label"/>
</list>
</operator>
<operator activated="true" class="split_data" compatibility="9.7.001" expanded="true" height="103" name="Split Data" width="90" x="447" y="85">
<enumeration key="partitions">
<parameter key="ratio" value="0.7"/>
<parameter key="ratio" value="0.3"/>
</enumeration>
<parameter key="sampling_type" value="automatic"/>
<parameter key="use_local_random_seed" value="true"/>
<parameter key="local_random_seed" value="1992"/>
</operator>
<operator activated="true" class="optimize_selection_evolutionary" compatibility="9.7.001" expanded="true" height="145" name="Optimize Selection (Evolutionary)" width="90" x="581" y="34">
<parameter key="use_exact_number_of_attributes" value="false"/>
<parameter key="restrict_maximum" value="false"/>
<parameter key="min_number_of_attributes" value="1"/>
<parameter key="max_number_of_attributes" value="1"/>
<parameter key="exact_number_of_attributes" value="1"/>
<parameter key="initialize_with_input_weights" value="false"/>
<parameter key="population_size" value="5"/>
<parameter key="maximum_number_of_generations" value="30"/>
<parameter key="use_early_stopping" value="false"/>
<parameter key="generations_without_improval" value="2"/>
<parameter key="normalize_weights" value="true"/>
<parameter key="use_local_random_seed" value="false"/>
<parameter key="local_random_seed" value="1992"/>
<parameter key="user_result_individual_selection" value="false"/>
<parameter key="show_population_plotter" value="false"/>
<parameter key="plot_generations" value="10"/>
<parameter key="constraint_draw_range" value="false"/>
<parameter key="draw_dominated_points" value="true"/>
<parameter key="maximal_fitness" value="Infinity"/>
<parameter key="selection_scheme" value="tournament"/>
<parameter key="tournament_size" value="0.25"/>
<parameter key="start_temperature" value="1.0"/>
<parameter key="dynamic_selection_pressure" value="true"/>
<parameter key="keep_best_individual" value="false"/>
<parameter key="save_intermediate_weights" value="false"/>
<parameter key="intermediate_weights_generations" value="10"/>
<parameter key="p_initialize" value="0.5"/>
<parameter key="p_mutation" value="-1.0"/>
<parameter key="p_crossover" value="0.5"/>
<parameter key="crossover_type" value="uniform"/>
<process expanded="true">
<operator activated="true" class="time_series:multi_label_model_learner" compatibility="9.7.000" expanded="true" height="103" name="Multi Label Modeling" width="90" x="112" y="34">
<parameter key="attribute_filter_type" value="subset"/>
<parameter key="attribute" value=""/>
<parameter key="attributes" value="overall-remarks"/>
<parameter key="use_except_expression" value="false"/>
<parameter key="value_type" value="attribute_value"/>
<parameter key="use_value_type_exception" value="false"/>
<parameter key="except_value_type" value="time"/>
<parameter key="block_type" value="attribute_block"/>
<parameter key="use_block_type_exception" value="false"/>
<parameter key="except_block_type" value="value_matrix_row_start"/>
<parameter key="invert_selection" value="false"/>
<parameter key="include_special_attributes" value="true"/>
<parameter key="add_macros" value="false"/>
<parameter key="current_label_name_macro" value="current_label_attribute"/>
<parameter key="current_label_type_macro" value="current_label_type"/>
<parameter key="enable_parallel_execution" value="true"/>
<process expanded="true">
<operator activated="true" class="set_role" compatibility="9.7.001" expanded="true" height="82" name="Set Role (2)" width="90" x="112" y="34">
<parameter key="attribute_name" value="overall-remarks"/>
<parameter key="target_role" value="label"/>
<list key="set_additional_roles"/>
</operator>
<operator activated="true" class="concurrency:cross_validation" compatibility="9.7.001" expanded="true" height="145" name="Cross Validation" width="90" x="313" y="34">
<parameter key="split_on_batch_attribute" value="false"/>
<parameter key="leave_one_out" value="false"/>
<parameter key="number_of_folds" value="10"/>
<parameter key="sampling_type" value="automatic"/>
<parameter key="use_local_random_seed" value="false"/>
<parameter key="local_random_seed" value="1992"/>
<parameter key="enable_parallel_execution" value="true"/>
<process expanded="true">
<operator activated="true" class="polynomial_by_binomial_classification" compatibility="9.7.001" expanded="true" height="82" name="Polynominal by Binominal Classification" width="90" x="179" y="34">
<parameter key="classification_strategies" value="1 against all"/>
<parameter key="random_code_multiplicator" value="2.0"/>
<parameter key="use_local_random_seed" value="false"/>
<parameter key="local_random_seed" value="1992"/>
<process expanded="true">
<operator activated="true" class="h2o:logistic_regression" compatibility="9.7.001" expanded="true" height="124" name="Logistic Regression" width="90" x="45" y="136">
<parameter key="solver" value="AUTO"/>
<parameter key="reproducible" value="false"/>
<parameter key="maximum_number_of_threads" value="4"/>
<parameter key="use_regularization" value="false"/>
<parameter key="lambda_search" value="false"/>
<parameter key="number_of_lambdas" value="0"/>
<parameter key="lambda_min_ratio" value="0.0"/>
<parameter key="early_stopping" value="true"/>
<parameter key="stopping_rounds" value="3"/>
<parameter key="stopping_tolerance" value="0.001"/>
<parameter key="standardize" value="true"/>
<parameter key="non-negative_coefficients" value="false"/>
<parameter key="add_intercept" value="true"/>
<parameter key="compute_p-values" value="true"/>
<parameter key="remove_collinear_columns" value="true"/>
<parameter key="missing_values_handling" value="MeanImputation"/>
<parameter key="max_iterations" value="0"/>
<parameter key="max_runtime_seconds" value="0"/>
</operator>
<connect from_port="training set" to_op="Logistic Regression" to_port="training set"/>
<connect from_op="Logistic Regression" from_port="model" to_port="model"/>
<portSpacing port="source_training set" spacing="0"/>
<portSpacing port="sink_model" spacing="0"/>
</process>
</operator>
<connect from_port="training set" to_op="Polynominal by Binominal Classification" to_port="training set"/>
<connect from_op="Polynominal by Binominal Classification" from_port="model" to_port="model"/>
<portSpacing port="source_training set" spacing="0"/>
<portSpacing port="sink_model" spacing="0"/>
<portSpacing port="sink_through 1" spacing="0"/>
</process>
<process expanded="true">
<operator activated="true" class="apply_model" compatibility="9.7.001" expanded="true" height="82" name="Apply Model" width="90" x="45" y="34">
<list key="application_parameters"/>
<parameter key="create_view" value="false"/>
</operator>
<operator activated="true" class="performance_classification" compatibility="9.7.001" expanded="true" height="82" name="Performance" width="90" x="179" y="34">
<parameter key="main_criterion" value="first"/>
<parameter key="accuracy" value="true"/>
<parameter key="classification_error" value="false"/>
<parameter key="kappa" value="false"/>
<parameter key="weighted_mean_recall" value="false"/>
<parameter key="weighted_mean_precision" value="false"/>
<parameter key="spearman_rho" value="false"/>
<parameter key="kendall_tau" value="false"/>
<parameter key="absolute_error" value="false"/>
<parameter key="relative_error" value="false"/>
<parameter key="relative_error_lenient" value="false"/>
<parameter key="relative_error_strict" value="false"/>
<parameter key="normalized_absolute_error" value="false"/>
<parameter key="root_mean_squared_error" value="false"/>
<parameter key="root_relative_squared_error" value="false"/>
<parameter key="squared_error" value="false"/>
<parameter key="correlation" value="false"/>
<parameter key="squared_correlation" value="false"/>
<parameter key="cross-entropy" value="false"/>
<parameter key="margin" value="false"/>
<parameter key="soft_margin_loss" value="false"/>
<parameter key="logistic_loss" value="false"/>
<parameter key="skip_undefined_labels" value="true"/>
<parameter key="use_example_weights" value="true"/>
<list key="class_weights"/>
</operator>
<connect from_port="model" to_op="Apply Model" to_port="model"/>
<connect from_port="test set" to_op="Apply Model" to_port="unlabelled data"/>
<connect from_op="Apply Model" from_port="labelled data" to_op="Performance" to_port="labelled data"/>
<connect from_op="Performance" from_port="performance" to_port="performance 1"/>
<portSpacing port="source_model" spacing="0"/>
<portSpacing port="source_test set" spacing="0"/>
<portSpacing port="source_through 1" spacing="0"/>
<portSpacing port="sink_test set results" spacing="0"/>
<portSpacing port="sink_performance 1" spacing="0"/>
<portSpacing port="sink_performance 2" spacing="0"/>
</process>
</operator>
<operator activated="true" class="apply_model" compatibility="9.7.001" expanded="true" height="82" name="Apply Model (2)" width="90" x="514" y="187">
<list key="application_parameters"/>
<parameter key="create_view" value="false"/>
</operator>
<connect from_port="training set" to_op="Set Role (2)" to_port="example set input"/>
<connect from_port="input 1" to_op="Apply Model (2)" to_port="unlabelled data"/>
<connect from_op="Set Role (2)" from_port="example set output" to_op="Cross Validation" to_port="example set"/>
<connect from_op="Cross Validation" from_port="model" to_op="Apply Model (2)" to_port="model"/>
<connect from_op="Apply Model (2)" from_port="model" to_port="model"/>
<portSpacing port="source_training set" spacing="0"/>
<portSpacing port="source_input 1" spacing="0"/>
<portSpacing port="source_input 2" spacing="0"/>
<portSpacing port="sink_model" spacing="0"/>
<portSpacing port="sink_output 1" spacing="0"/>
</process>
</operator>
<operator activated="true" class="apply_model" compatibility="9.7.001" expanded="true" height="82" name="Apply Model (3)" width="90" x="246" y="136">
<list key="application_parameters"/>
<parameter key="create_view" value="false"/>
</operator>
<operator activated="true" class="set_role" compatibility="9.7.001" expanded="true" height="82" name="Set Role (3)" width="90" x="380" y="34">
<parameter key="attribute_name" value="overall-remarks"/>
<parameter key="target_role" value="label"/>
<list key="set_additional_roles">
<parameter key="prediction(overall-remarks)" value="prediction"/>
</list>
</operator>
<operator activated="true" class="performance_classification" compatibility="9.7.001" expanded="true" height="82" name="Performance (2)" width="90" x="514" y="34">
<parameter key="main_criterion" value="first"/>
<parameter key="accuracy" value="true"/>
<parameter key="classification_error" value="false"/>
<parameter key="kappa" value="false"/>
<parameter key="weighted_mean_recall" value="false"/>
<parameter key="weighted_mean_precision" value="false"/>
<parameter key="spearman_rho" value="false"/>
<parameter key="kendall_tau" value="false"/>
<parameter key="absolute_error" value="false"/>
<parameter key="relative_error" value="false"/>
<parameter key="relative_error_lenient" value="false"/>
<parameter key="relative_error_strict" value="false"/>
<parameter key="normalized_absolute_error" value="false"/>
<parameter key="root_mean_squared_error" value="false"/>
<parameter key="root_relative_squared_error" value="false"/>
<parameter key="squared_error" value="false"/>
<parameter key="correlation" value="false"/>
<parameter key="squared_correlation" value="false"/>
<parameter key="cross-entropy" value="false"/>
<parameter key="margin" value="false"/>
<parameter key="soft_margin_loss" value="false"/>
<parameter key="logistic_loss" value="false"/>
<parameter key="skip_undefined_labels" value="true"/>
<parameter key="use_example_weights" value="true"/>
<list key="class_weights"/>
</operator>
<connect from_port="example set" to_op="Multi Label Modeling" to_port="input 1"/>
<connect from_port="through 1" to_op="Multi Label Modeling" to_port="training set"/>
<connect from_port="through 2" to_op="Apply Model (3)" to_port="unlabelled data"/>
<connect from_op="Multi Label Modeling" from_port="model" to_op="Apply Model (3)" to_port="model"/>
<connect from_op="Apply Model (3)" from_port="labelled data" to_op="Set Role (3)" to_port="example set input"/>
<connect from_op="Set Role (3)" from_port="example set output" to_op="Performance (2)" to_port="labelled data"/>
<connect from_op="Performance (2)" from_port="performance" to_port="performance"/>
<portSpacing port="source_example set" spacing="0"/>
<portSpacing port="source_through 1" spacing="0"/>
<portSpacing port="source_through 2" spacing="0"/>
<portSpacing port="source_through 3" spacing="0"/>
<portSpacing port="sink_performance" spacing="0"/>
</process>
</operator>
<connect from_op="Read Excel" from_port="output" to_op="Subprocess" to_port="in 1"/>
<connect from_op="Subprocess" from_port="out 1" to_op="Set Role" to_port="example set input"/>
<connect from_op="Set Role" from_port="example set output" to_op="Optimize Selection (Evolutionary)" to_port="example set in"/>
<connect from_op="Set Role" from_port="original" to_op="Split Data" to_port="example set"/>
<connect from_op="Split Data" from_port="partition 1" to_op="Optimize Selection (Evolutionary)" to_port="through 1"/>
<connect from_op="Split Data" from_port="partition 2" to_op="Optimize Selection (Evolutionary)" to_port="through 2"/>
<connect from_op="Optimize Selection (Evolutionary)" from_port="example set out" to_port="result 1"/>
<portSpacing port="source_input 1" spacing="0"/>
<portSpacing port="sink_result 1" spacing="0"/>
<portSpacing port="sink_result 2" spacing="0"/>
</process>
</operator>
</process>
From your process (without having access to the data) I guess the problem is that the data set you try to train the logistic regression on has only one label class (for example only TRUE and no FALSE). This can also happen if you have an example set with very few examples and by chance only one class ends up in a training fold.
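A quick way to check this is to group by the label and count the examples per class on the exact data you feed into the learner. This is a minimal sketch only; it assumes your label column is overall-remarks, as in your process, and mirrors the standard Aggregate parameters:
<operator activated="true" class="aggregate" compatibility="9.7.001" expanded="true" height="82" name="Check Label Distribution" width="90" x="45" y="187">
<!-- groups by the label column and counts how many examples each class has -->
<parameter key="attribute_filter_type" value="single"/>
<parameter key="attribute" value="overall-remarks"/>
<parameter key="default_aggregation_function" value="count"/>
<list key="aggregation_attributes">
<parameter key="overall-remarks" value="count"/>
</list>
<parameter key="group_by_attributes" value="overall-remarks"/>
</operator>
If the result shows only one class for the partition that reaches the modelling step, that is exactly what triggers the error.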
Regarding the process you posted, I also wonder why you are using Multi Label Modeling when you only have one label column, named "overall-remarks". In this case a normal classification strategy should work fine.
For more information and a detailed discussion about process design and general questions about RapidMiner, I also recommend re-posting your question in the RapidMiner community: https://community.rapidminer.com
I am working with the RapidMiner Windowing operator in order to forecast a company's future revenue.
The dataset contains one value per month, so I used a window size of 12. However, I am not able to forecast what the values are going to be 3 months in advance. I thought the "horizon" parameter was the one for choosing how many time units in advance to forecast, but this didn't work.
Dataset example:
date value
2016-01-01 5,0
2016-02-01 15,0
2016-03-01 10,0
2016-04-01 20,0
2016-05-01 15,0
2016-06-01 25,0
2016-07-01 20,0
2016-08-01 30,0
2016-09-01 25,0
2016-10-01 35,0
What should I do in order to forecast some values in the future? Let's say the values for 2016-11-01 and 2016-12-01?
As @awchisholm proposed, here is the two-windowing process. However, I don't know which parameters are needed in order to forecast these future months' values.
<?xml version="1.0" encoding="UTF-8" standalone="no"?><process version="7.1.000">
<context>
<input/>
<output/>
<macros/>
</context>
<operator activated="true" class="process" compatibility="7.1.000" expanded="true" name="Process">
<process expanded="true">
<operator activated="true" class="read_excel" compatibility="7.1.000" expanded="true" height="68" name="Read Excel" width="90" x="45" y="34">
<parameter key="excel_file" value="D:\Users\iesnaola\Desktop\prueba.xlsx"/>
<parameter key="imported_cell_range" value="A1:B11"/>
<parameter key="first_row_as_names" value="false"/>
<list key="annotations">
<parameter key="0" value="Name"/>
</list>
<list key="data_set_meta_data_information"/>
</operator>
<operator activated="true" class="set_role" compatibility="7.1.000" expanded="true" height="82" name="Set Role" width="90" x="179" y="34">
<parameter key="attribute_name" value="date"/>
<parameter key="target_role" value="id"/>
<list key="set_additional_roles"/>
</operator>
<operator activated="true" class="series:windowing" compatibility="5.3.000" expanded="true" height="82" name="Windowing for Training" width="90" x="313" y="34">
<parameter key="window_size" value="5"/>
<parameter key="create_label" value="true"/>
<parameter key="label_attribute" value="value"/>
</operator>
<operator activated="true" class="series:windowing" compatibility="5.3.000" expanded="true" height="82" name="Windowing for Test (2)" width="90" x="313" y="136">
<parameter key="window_size" value="5"/>
</operator>
<connect from_op="Read Excel" from_port="output" to_op="Set Role" to_port="example set input"/>
<connect from_op="Set Role" from_port="example set output" to_op="Windowing for Training" to_port="example set input"/>
<connect from_op="Windowing for Training" from_port="example set output" to_port="result 1"/>
<connect from_op="Windowing for Training" from_port="original" to_op="Windowing for Test (2)" to_port="example set input"/>
<connect from_op="Windowing for Test (2)" from_port="example set output" to_port="result 2"/>
<portSpacing port="source_input 1" spacing="0"/>
<portSpacing port="sink_result 1" spacing="0"/>
<portSpacing port="sink_result 2" spacing="0"/>
<portSpacing port="sink_result 3" spacing="0"/>
</process>
</operator>
</process>
It's difficult to know without your data, so here's a reproducible example process that shows windowing as well as the use of the horizon parameter. This works if the attribute to be used as a label is already a label.
<?xml version="1.0" encoding="UTF-8" standalone="no"?>
<process version="7.0.001">
<context>
<input/>
<output/>
<macros/>
</context>
<operator activated="true" class="process" compatibility="7.0.001" expanded="true" name="Process">
<process expanded="true">
<operator activated="true" class="retrieve" compatibility="7.0.001" expanded="true" height="68" name="Retrieve Iris" width="90" x="45" y="34">
<parameter key="repository_entry" value="//Samples/data/Iris"/>
</operator>
<operator activated="true" class="select_attributes" compatibility="7.0.001" expanded="true" height="82" name="Select Attributes" width="90" x="45" y="136">
<parameter key="attribute_filter_type" value="subset"/>
<parameter key="attributes" value="id"/>
<parameter key="include_special_attributes" value="true"/>
</operator>
<operator activated="true" class="generate_copy" compatibility="7.0.001" expanded="true" height="82" name="Generate Copy" width="90" x="45" y="238">
<parameter key="attribute_name" value="id"/>
<parameter key="new_name" value="idcopy"/>
</operator>
<operator activated="true" class="set_role" compatibility="7.0.001" expanded="true" height="82" name="Set Role" width="90" x="246" y="34">
<parameter key="attribute_name" value="id"/>
<list key="set_additional_roles">
<parameter key="idcopy" value="label"/>
</list>
</operator>
<operator activated="true" class="series:windowing" compatibility="5.3.000" expanded="true" height="82" name="Windowing" width="90" x="380" y="34">
<parameter key="window_size" value="5"/>
<parameter key="create_label" value="true"/>
<parameter key="label_attribute" value="idcopy"/>
<parameter key="horizon" value="5"/>
</operator>
<connect from_op="Retrieve Iris" from_port="output" to_op="Select Attributes" to_port="example set input"/>
<connect from_op="Select Attributes" from_port="example set output" to_op="Generate Copy" to_port="example set input"/>
<connect from_op="Generate Copy" from_port="example set output" to_op="Set Role" to_port="example set input"/>
<connect from_op="Set Role" from_port="example set output" to_op="Windowing" to_port="example set input"/>
<connect from_op="Windowing" from_port="example set output" to_port="result 1"/>
<connect from_op="Windowing" from_port="original" to_port="result 2"/>
<portSpacing port="source_input 1" spacing="0"/>
<portSpacing port="sink_result 1" spacing="0"/>
<portSpacing port="sink_result 2" spacing="0"/>
<portSpacing port="sink_result 3" spacing="0"/>
</process>
</operator>
</process>
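Transferring this to your monthly data, here is a sketch of your "Windowing for Training" operator with a horizon set. The label attribute "value" and the window size 5 are taken from your posted process; the horizon value is the piece that was missing:
<operator activated="true" class="series:windowing" compatibility="5.3.000" expanded="true" height="82" name="Windowing (2 months ahead)" width="90" x="313" y="238">
<!-- horizon = distance between the last value in the window and the value used as label -->
<parameter key="window_size" value="5"/>
<parameter key="create_label" value="true"/>
<parameter key="label_attribute" value="value"/>
<parameter key="horizon" value="2"/>
</operator>
As far as I understand the horizon parameter, horizon = 2 makes each window's label the value two months after the window ends, so a model trained on these windows predicts two months ahead (use 1 for the next month, e.g. 2016-11-01).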
Hope that helps as a start.
I would like to tokenize and apply a stop word filter to Twitter comments contained in a database, but Process Documents does nothing. What am I doing wrong?
My goal is to apply these filters but keep the comments in rows instead of a single word vector.
<?xml version="1.0" encoding="UTF-8" standalone="no"?>
<process version="5.3.015">
<context>
<input/>
<output/>
<macros/>
</context>
<operator activated="true" class="process" compatibility="5.3.015" expanded="true" name="Process">
<process expanded="true">
<operator activated="true" class="read_database" compatibility="5.3.015" expanded="true" height="60" name="Server Connection (2)" width="90" x="45" y="30">
<parameter key="connection" value="sqlserver2014"/>
<parameter key="query" value="select top 60 tweetid,content from [Tweets General]"/>
<enumeration key="parameters"/>
</operator>
<operator activated="true" class="text:data_to_documents" compatibility="5.3.002" expanded="true" height="60" name="Data to Documents" width="90" x="246" y="30">
<parameter key="select_attributes_and_weights" value="true"/>
<list key="specify_weights"/>
</operator>
<operator activated="true" class="text:process_documents" compatibility="5.3.002" expanded="true" height="94" name="Process Documents" width="90" x="447" y="30">
<process expanded="true">
<operator activated="true" class="text:tokenize" compatibility="5.3.002" expanded="true" height="60" name="Tokenize (3)" width="90" x="246" y="75"/>
<connect from_port="document" to_op="Tokenize (3)" to_port="document"/>
<connect from_op="Tokenize (3)" from_port="document" to_port="document 1"/>
<portSpacing port="source_document" spacing="0"/>
<portSpacing port="sink_document 1" spacing="0"/>
<portSpacing port="sink_document 2" spacing="0"/>
</process>
</operator>
<connect from_op="Server Connection (2)" from_port="output" to_op="Data to Documents" to_port="example set"/>
<connect from_op="Data to Documents" from_port="documents" to_op="Process Documents" to_port="documents 1"/>
<connect from_op="Process Documents" from_port="example set" to_port="result 1"/>
<portSpacing port="source_input 1" spacing="0"/>
<portSpacing port="sink_result 1" spacing="0"/>
<portSpacing port="sink_result 2" spacing="0"/>
</process>
</operator>
</process>
You need to convert any attributes of type nominal to be of type text before the Data to Documents operator. The operator Nominal to Text will do this. You also need to set the option select attributes and weights to false in Data to Documents because I think the setting you have will deselect everything.
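A minimal sketch of that extra step, to be placed between the database connection and Data to Documents. The attribute name "content" is taken from the SQL query above; adjust it if your text column is called differently:
<operator activated="true" class="nominal_to_text" compatibility="5.3.015" expanded="true" height="60" name="Nominal to Text" width="90" x="150" y="30">
<!-- converts the nominal tweet text column to type text so the text processing operators pick it up -->
<parameter key="attribute_filter_type" value="single"/>
<parameter key="attribute" value="content"/>
</operator>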
I use Log4Net for logging. When the application starts, I call
log4net.Config.XmlConfigurator.Configure();
But this line takes 15 seconds to finish. Am I doing something wrong? Or is it normal?
I am developing with ASP.NET MVC and use Unity for dependency injection.
At application start, I call the Bootstrapper's Initialise function:
protected void Application_Start()
{
IUnityContainer container = Bootstrapper.Initialise();
...
...
}
In the Bootstrapper Initialize function, I register the type ILog.
private static IUnityContainer BuildUnityContainer()
{
var container = new UnityContainer();
...
...
container.RegisterType<ILog>("", new ContainerControlledLifetimeManager(),
new InjectionFactory(factory =>
LogManager.GetLogger(typeof(HomeController).Assembly, connectionString)));
...
...
}
At the beginning of GetLogger function I call the configure function
public static ILog GetLogger(Assembly assembly, string connectionString)
{
log4net.Config.XmlConfigurator.Configure(); //<----- it takes 15 seconds to finish
...
...
}
EDIT
---------------------------------------------------------------------------------
<log4net>
<appender name="AdoNetAppender" type="log4net.Appender.AdoNetAppender">
<bufferSize value="0" />
<connectionType value="System.Data.SqlClient.SqlConnection, System.Data, Version=1.0.3300.0, Culture=neutral, PublicKeyToken=b77a5c561934e089" />
<connectionString value="data source=[database server];initial catalog=[database name];integrated security=false;persist security info=True;User ID=[user];Password=[password]" />
<commandText value="INSERT INTO Log ([Date],[Thread],[Level],[Logger],[Message],[Exception],[UserId],[Operation],[EntityType],[EntityId],[IP],[Host],[SessionId],[LogGroup]) VALUES (#log_date, #thread, #log_level, #logger, #message, #exception, #UserId, #Operation, #EntityType, #EntityId, #IP, #Host, #SessionId, #LogGroup)" />
<parameter>
<parameterName value="#log_date" />
<dbType value="DateTime" />
<layout type="log4net.Layout.RawTimeStampLayout" />
</parameter>
<parameter>
<parameterName value="#thread" />
<dbType value="String" />
<size value="255" />
<layout type="log4net.Layout.PatternLayout">
<conversionPattern value="%thread" />
</layout>
</parameter>
<parameter>
<parameterName value="#log_level" />
<dbType value="String" />
<size value="50" />
<layout type="log4net.Layout.PatternLayout">
<conversionPattern value="%level" />
</layout>
</parameter>
<parameter>
<parameterName value="#logger" />
<dbType value="String" />
<size value="255" />
<layout type="log4net.Layout.PatternLayout">
<conversionPattern value="%logger" />
</layout>
</parameter>
<parameter>
<parameterName value="#message" />
<dbType value="String" />
<size value="4000" />
<layout type="log4net.Layout.PatternLayout">
<conversionPattern value="%message" />
</layout>
</parameter>
<parameter>
<parameterName value="#exception" />
<dbType value="String" />
<size value="2000" />
<layout type="log4net.Layout.ExceptionLayout" />
</parameter>
<parameter>
<parameterName value="#UserId"/>
<dbType value="Int32" />
<layout type="log4net.Layout.RawPropertyLayout">
<key value="UserId" />
</layout>
</parameter>
<parameter>
<parameterName value="#IP"/>
<dbType value="String" />
<size value="25" />
<layout type="log4net.Layout.RawPropertyLayout">
<key value="IP" />
</layout>
</parameter>
<parameter>
<parameterName value="#Host"/>
<dbType value="String" />
<size value="50" />
<layout type="log4net.Layout.RawPropertyLayout">
<key value="Host" />
</layout>
</parameter>
<parameter>
<parameterName value="#LogGroup"/>
<dbType value="Int32" />
<layout type="log4net.Layout.RawPropertyLayout">
<key value="LogGroup" />
</layout>
</parameter>
<parameter>
<parameterName value="#Operation"/>
<dbType value="Int32" />
<layout type="log4net.Layout.RawPropertyLayout">
<key value="Operation" />
</layout>
</parameter>
<parameter>
<parameterName value="#EntityType"/>
<dbType value="Int32" />
<layout type="log4net.Layout.RawPropertyLayout">
<key value="EntityType" />
</layout>
</parameter>
<parameter>
<parameterName value="#EntityId"/>
<dbType value="Int32" />
<layout type="log4net.Layout.RawPropertyLayout">
<key value="EntityId" />
</layout>
</parameter>
<parameter>
<parameterName value="#SessionId"/>
<dbType value="String" />
<size value="88" />
<layout type="log4net.Layout.RawPropertyLayout">
<key value="SessionId" />
</layout>
</parameter>
</appender>
<root>
<level value="ALL" />
<appender-ref ref="AdoNetAppender" />
</root>
</log4net>
15 seconds sounds like a (connection) timeout; I believe the default SQL Server connection timeout is 15 seconds.
I had a similar problem once, and it turned out that the CLR tried to verify the Authenticode signature at load time to create publisher evidence for the assembly.
I am not sure about all the details, but there is a configuration element named "generatePublisherEvidence" in the runtime section of the configuration where this can be turned off. You should check whether you really want to do this, though, and what the implications are.
If you are using .NET 4 (or greater) this should have no impact on load time.
For web applications this setting cannot be set in the application's web.config; it has to be set in the aspnet.config in the .NET framework directory.
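A minimal sketch of that switch; the element belongs in the runtime section, and for a web application it would go into the aspnet.config mentioned above rather than the application's web.config:
<configuration>
<runtime>
<!-- skip Authenticode verification / publisher evidence generation at load time -->
<generatePublisherEvidence enabled="false"/>
</runtime>
</configuration>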
When you try to call the following statement
log4net.Config.XmlConfigurator.Configure();
The system will try to validate the given database connection string. The problem here is that the connection string might not be able to connect, so it keeps trying until it runs into a timeout.
Please verify whether your given connection string is valid.
http://techxposer.com/2017/08/08/log4net-config-xmlconfigurator-configure-taking-too-much-time/
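If you want a bad connection string to fail faster while you investigate, you can also shorten the timeout directly in the appender's connection string. This is a sketch based on the configuration above; the bracketed placeholders still need your real values:
<connectionString value="data source=[database server];initial catalog=[database name];integrated security=false;persist security info=True;User ID=[user];Password=[password];Connect Timeout=5" />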
My Visual Studio Solution contains:
[DLL] Sol.DataAccess (NHibernate sessionManager)
[DLL] Sol.Core (Models and Repository)
[MVC] Sol.WebMvc (Controller, View)
The only external DLLs my application contains are nhibernate.dll [v3.0] and log4net.dll [v1.2.10].
I have 3 configs:
web.config:
<configuration>
<configSections>
<section name="log4net" type="log4net.Config.Log4NetConfigurationSectionHandler, log4net, Version=1.2.10.0, Culture=neutral, PublicKeyToken=1b44e1d426115821" requirePermission="false" />
<section name="hibernate-configuration" type="NHibernate.Cfg.ConfigurationSectionHandler, NHibernate"/>
</configSections>
</configuration>
nhibernate.config:
<hibernate-configuration xmlns="urn:nhibernate-configuration-2.2" >
<session-factory name="...">
<property name="connection.driver_class">NHibernate.Driver.SqlClientDriver</property>
<property name="connection.connection_string_name">...</property>
<property name="adonet.batch_size">10</property>
<property name="show_sql">true</property>
<property name="generate_statistics">true</property>
<property name="dialect">NHibernate.Dialect.MsSql2008Dialect</property>
<property name="use_outer_join">true</property>
<property name="max_fetch_depth">2</property>
<property name="command_timeout">60</property>
<property name="adonet.batch_size">25</property>
<property name="query.substitutions">true 1, false 0, yes 'Y', no 'N'</property>
<property name="proxyfactory.factory_class">NHibernate.ByteCode.Castle.ProxyFactoryFactory, NHibernate.ByteCode.Castle</property>
<property name="current_session_context_class">web</property>
<property name="cache.use_query_cache">true</property>
<property name="cache.provider_class">NHibernate.Caches.SysCache2.SysCacheProvider, NHibernate.Caches.SysCache2</property>
<mapping assembly="..."/>
</session-factory>
</hibernate-configuration>
and log4net.config:
<log4net>
<appender name="AdoNetAppender" type="log4net.Appender.AdoNetAppender">
<!--for release-->
<!--<bufferSize value="10" />-->
<!--for debug-->
<bufferSize value="1" />
<connectionType value="System.Data.SqlClient.SqlConnection, System.Data, Version=1.0.3300.0, Culture=neutral, PublicKeyToken=b77a5c561934e089" />
<connectionString value="Data Source=xxxxx; Initial Catalog=xxxx; User Id=xxxx; Password=xxxxx; App=xxxx" />
<commandText value="INSERT INTO Logs ([Application],[Host],[User],[Date],[Thread],[Level],[Operation],[Logger],[Message],[Exception]) VALUES (#app, #hostName, #userName, #log_date, #thread, #log_level, #operation, #logger, #message, #exception)" />
<parameter>
<parameterName value="#app" />
<dbType value="String" />
<size value="255" />
<layout type="log4net.Layout.PatternLayout">
<conversionPattern value="xxxx" />
</layout>
</parameter>
<parameter>
<parameterName value="#hostName" />
<dbType value="String" />
<size value="255" />
<layout type="log4net.Layout.PatternLayout">
<conversionPattern value="%property{hostName}" />
</layout>
</parameter>
<parameter>
<parameterName value="#userName" />
<dbType value="String" />
<size value="255" />
<layout type="log4net.Layout.PatternLayout">
<conversionPattern value="%property{userName}" />
</layout>
</parameter>
<parameter>
<parameterName value="#log_date" />
<dbType value="DateTime" />
<layout type="log4net.Layout.RawTimeStampLayout" />
</parameter>
<parameter>
<parameterName value="#thread" />
<dbType value="String" />
<size value="255" />
<layout type="log4net.Layout.PatternLayout">
<conversionPattern value="%thread" />
</layout>
</parameter>
<parameter>
<parameterName value="#log_level" />
<dbType value="String" />
<size value="50" />
<layout type="log4net.Layout.PatternLayout">
<conversionPattern value="%level" />
</layout>
</parameter>
<parameter>
<parameterName value="#operation" />
<dbType value="String" />
<size value="50" />
<layout type="log4net.Layout.PatternLayout">
<conversionPattern value="%property{Operation}" />
</layout>
</parameter>
<parameter>
<parameterName value="#logger" />
<dbType value="String" />
<size value="255" />
<layout type="log4net.Layout.PatternLayout">
<conversionPattern value="%logger" />
</layout>
</parameter>
<parameter>
<parameterName value="#message" />
<dbType value="String" />
<size value="4000" />
<layout type="log4net.Layout.PatternLayout">
<conversionPattern value="%message" />
</layout>
</parameter>
<parameter>
<parameterName value="#exception" />
<dbType value="String" />
<size value="2000" />
<layout type="log4net.Layout.ExceptionLayout" />
</parameter>
</appender>
<appender name="FileAppender" type="log4net.Appender.FileAppender">
<file value="Logs/Logs.txt"/>
<appendToFile value="true"/>
<layout type="log4net.Layout.PatternLayout">
<conversionPattern value="%date [%thread] %-5level %logger [%property{NDC}] - %message%newline"/>
</layout>
</appender>
<appender name="Console" type="log4net.Appender.AspNetTraceAppender">
<!--A1 uses PatternLayout-->
<layout type="log4net.Layout.PatternLayout">
<!--Print the date in ISO 8601 format-->
<conversionPattern value="%date [%thread] %-5level %logger %ndc - %message%newline"/>
</layout>
</appender>
<root>
<level value="ALL"/>
<appender-ref ref="Console"/>
<appender-ref ref="FileAppender"/>
<appender-ref ref="AdoNetAppender"/>
</root>
</log4net>
Global.cs:
protected void Application_Start()
{
...
// Configuration
#region log4net
// log4net.config
System.IO.FileInfo fi = new System.IO.FileInfo(Server.MapPath("~/log4net.config"));
if (fi != null && fi.Exists)
{
// Code that runs on application startup
log4net.Config.XmlConfigurator.Configure(fi);
}
// web.config
//log4net.Config.XmlConfigurator.Configure();
// set property hostName
log4net.GlobalContext.Properties["hostName"] = Dns.GetHostName();
#endregion
#region NHibernate
//HibernatingRhinos.Profiler.Appender.NHibernate.NHibernateProfiler.Initialize();
var factory = NHibernateSessionManager.ConfigureFromFile(Server.MapPath("~/hibernate.config"));
#endregion
}
In my test controller I have:
public class TestController : BaseController
{
[NHibernateSession]
public ActionResult Index()
{
Logger.Error("fake error", new Exception());
}
}
In my log file - Logs/Logs.txt:
2011-03-11 18:19:23,097 [8] ERROR System.Web.Mvc.Controller [(null)] - fake error
System.Exception: Exception of type 'System.Exception' was thrown.
QUESTION:
Why doesn't log4net log NHibernate information (info, debug, ...)?
Could it be that the versions of these DLLs are not compatible?
I created an empty ASP.NET MVC3 project and lost a lot of time trying to fix this issue.
And I found a VS2010 bug: Visual Studio 2010 doesn't copy the DLL into bin when you reference it in the project.
I put log4net.dll manually into my bin folder and it works fine. (The interesting thing is that Logger.Error("fake error") worked fine without log4net.dll in the bin folder ...)
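If you prefer not to copy the DLL by hand, forcing Copy Local on the reference in the .csproj should have the same effect. This is a sketch only; the HintPath is an assumption and depends on where your log4net.dll actually lives:
<Reference Include="log4net">
<!-- HintPath is an assumption; point it at the folder that contains log4net.dll -->
<HintPath>..\Libs\log4net.dll</HintPath>
<Private>True</Private>
</Reference>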
First off: you speak of "log4net.config", but you don't include that anywhere. The way you configure it in web.config, you should actually include a section called <log4net> inside your web.config, not as a separate file.
If you don't want it in your web.config, you can remove the log4net-related sections altogether and add the following line to your global.asax.cs:
log4net.Config.XmlConfigurator.ConfigureAndWatch(
    new FileInfo(Server.MapPath("~/yourrelativepath/log4net.config")));
Also, possibly you are missing the priority setting; I'm not entirely sure this helps, but give it a try:
<root>
<priority value="DEBUG"/>
<appender-ref ref="Console"/>
<appender-ref ref="FileAppender"/>
<appender-ref ref="AdoNetAppender"/>
</root>
Also, to fine-tune (because you'll get a lot of messages), use something like this:
<logger name="NHibernate.SQL">
<level value="DEBUG"/>
</logger>
<logger name="NHibernate">
<level value="WARN"/>
</logger>
On nhibernate.info you'll find a full example.