Title: | An Information About Deputies and Votings in Polish Diet from Seventh to Eighth Term of Office |
---|---|
Description: | Set of functions that access information about deputies and votings in Polish diet from webpage <http://www.sejm.gov.pl>. The package was developed as a result of an internship in MI2 Group - <http://mi2.mini.pw.edu.pl>, Faculty of Mathematics and Information Science, Warsaw University of Technology. |
Authors: | Piotr Smuda [aut, cre], Przemyslaw Biecek [aut], Tomasz Mikolajczyk [ctb] |
Maintainer: | Piotr Smuda <[email protected]> |
License: | GPL-2 |
Version: | 1.3.4 |
Built: | 2024-11-12 04:41:52 UTC |
Source: | https://github.com/mi2-warsaw/sejmrp |
Function create_database creates a database with four empty tables: deputies, votings, votes, statements.
create_database(dbname, user, password, host)
dbname |
name of database |
user |
name of user |
password |
password of database |
host |
name of host |
Created tables:
1. deputies with columns:
   1) id_deputy - deputy's id,
   2) nr_term_of_office - Polish Diet's number of term of office,
   3) surname_name - deputy's names and surnames,
2. votings with columns:
   1) id_voting - voting's id,
   2) nr_term_of_office - Polish Diet's number of term of office,
   3) nr_meeting - meeting's number,
   4) date_meeting - meeting's date,
   5) nr_voting - voting's number,
   6) topic_voting - voting's topic,
   7) link_results - link with voting's results,
3. votes with columns:
   1) id_vote - vote's id,
   2) nr_term_of_office - Polish Diet's number of term of office,
   3) id_deputy - deputy's id,
   4) id_voting - voting's id,
   5) vote - deputy's vote, one of: 'Za', 'Przeciw', 'Wstrzymal sie', 'Nieobecny',
   6) club - deputy's club,
4. statements with columns:
   1) id_statement - statement's id, like: (meeting's number).(voting's number).(statement's number),
   2) nr_term_of_office - Polish Diet's number of term of office,
   3) surname_name - author of statement,
   4) date_statement - statement's date,
   5) titles_order_points - title of order points,
   6) statement - content of statement.
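The id_statement format listed above can be illustrated with a short base R sketch. The helpers make_statement_id and split_statement_id are hypothetical names used only for illustration and are not part of the package; the numbers are made up, and whether the real ids are zero-padded is not stated here.

```r
# Hypothetical helper (not part of the package): build an id of the form
# (meeting's number).(voting's number).(statement's number).
make_statement_id <- function(meeting, voting, statement) {
  paste(meeting, voting, statement, sep = ".")
}

# Hypothetical helper: split an id back into its three numeric components.
split_statement_id <- function(id) {
  parts <- strsplit(id, ".", fixed = TRUE)[[1]]
  setNames(as.integer(parts), c("meeting", "voting", "statement"))
}

id <- make_statement_id(15, 1, 8)   # made-up numbers
parts <- split_statement_id(id)
print(id)
print(parts)
```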
invisible NULL
All information is stored in PostgreSQL database.
Piotr Smuda
## Not run:
create_database(dbname, user, password, host)
## End(Not run)
Function deputies_add_new adds new deputies to a table with deputies.
deputies_add_new(dbname, user, password, host, type, id, nr_term_of_office = 8)
dbname |
name of database |
user |
name of user |
password |
password of database |
host |
name of host |
type |
type of deputies to be added to the table with deputies: active, inactive |
id |
id of the deputy from which adding new deputies starts |
nr_term_of_office |
number of term of office of Polish Diet; default: 8 |
Function deputies_add_new adds new deputies to a table with deputies. There is also a choice between types of deputies, because on the Polish Diet's page deputies are split into active and inactive. In addition, the id of the last added deputy in the deputies table is needed.
invisible NULL
All information is stored in PostgreSQL database.
Piotr Smuda
## Not run:
deputies_add_new(dbname, user, password, host, 'active', id)
deputies_add_new(dbname, user, password, host, 'inactive', id)
## End(Not run)
Function deputies_create_table creates a table with deputies.
deputies_create_table(dbname, user, password, host, nr_term_of_office = 8)
dbname |
name of database |
user |
name of user |
password |
password of database |
host |
name of host |
nr_term_of_office |
number of term of office of Polish Diet; default: 8 |
invisible NULL
Use this function only the first time, when the deputies table is empty. Afterwards use deputies_update_table.
All information is stored in PostgreSQL database.
Piotr Smuda
## Not run:
deputies_create_table(dbname, user, password, host)
## End(Not run)
Function deputies_get_data gets data about deputies.
deputies_get_data(type, nr_term_of_office = 8)
type |
type of deputies to be added to the table with deputies: active, inactive |
nr_term_of_office |
number of term of office of Polish Diet; default: 8 |
Function deputies_get_data gets deputies' ids and personal data such as name and surname. There is also a choice between types of deputies, because on the Polish Diet's page deputies are split into active and inactive.
data frame with two columns: id_deputy, surname_name
All information is stored in PostgreSQL database.
Piotr Smuda
## Not run:
deputies_get_data('active')
deputies_get_data('inactive')
## End(Not run)
Function deputies_get_ids gets deputies' ids from deputies table.
deputies_get_ids(dbname, user, password, host, nr_term_of_office = 8, windows = .Platform$OS.type == 'windows')
dbname |
name of database |
user |
name of user |
password |
password of database |
host |
name of host |
nr_term_of_office |
number of term of office of Polish Diet; default: 8 |
windows |
whether the operating system in use is Windows; default: .Platform$OS.type == 'windows' |
Function deputies_get_ids gets deputies' ids from the deputies table. The result of this function is a named character vector with ids, where the names are the names and surnames of deputies. Because of an encoding issue on the Windows operating system, you need to indicate whether you use Windows.
named character vector
All information is stored in PostgreSQL database.
Piotr Smuda
## Not run:
# windows is passed by name; positionally the fifth argument is nr_term_of_office
deputies_get_ids(dbname, user, password, host, windows = TRUE)
deputies_get_ids(dbname, user, password, host, windows = FALSE)
## End(Not run)
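The shape of the returned value, a character vector of ids named by deputies' names and surnames, can be sketched in base R without a database connection. The deputies data frame below is made up for illustration; only its column names follow the deputies table described earlier.

```r
# Made-up stand-in for the deputies table (illustrative values only).
deputies <- data.frame(
  id_deputy = c("001", "002", "003"),
  surname_name = c("Kowalski Jan", "Nowak Anna", "Wisniewski Piotr"),
  stringsAsFactors = FALSE
)

# Named character vector: values are ids, names are names and surnames.
ids <- setNames(deputies$id_deputy, deputies$surname_name)

# Look up an id by name, or a name by id.
id_of_nowak <- ids[["Nowak Anna"]]
name_of_003 <- names(ids)[ids == "003"]
print(ids)
```

Such a vector works as a simple lookup table in both directions.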
Function deputies_update_table updates a table with deputies.
deputies_update_table(dbname, user, password, host, nr_term_of_office = 8)
dbname |
name of database |
user |
name of user |
password |
password of database |
host |
name of host |
nr_term_of_office |
number of term of office of Polish Diet; default: 8 |
invisible NULL
All information is stored in PostgreSQL database.
Piotr Smuda
## Not run:
deputies_update_table(dbname, user, password, host)
## End(Not run)
Function get_deputies_dendrogram converts the distance matrix between deputies into a dendrogram or ggplot of the dendrogram.
get_deputies_dendrogram(distances, plot = TRUE, method = "ward", k = NULL)
distances |
a distance matrix, preferably created with |
plot |
if |
method |
clustering method, see the |
k |
number of groups, will be passed to |
dendrogram or a ggplot
Przemyslaw Biecek
# votes <- get_filtered_votes(terms_of_office = c(7,7))
data(votes)
v <- c(`Za` = 5, `Przeciw` = -5, `Wstrzymał się` = 2, `Nieobecny` = 0)/10
mat2 <- get_distance_matrix(votes[,c("surname_name", "id_voting", "vote")], weights = v)
get_deputies_dendrogram(mat2, k=5)
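As a rough illustration of what turning a distance matrix into a dendrogram involves, the base R sketch below clusters a small made-up distance matrix with stats::hclust and cuts the tree into k groups. It is an assumption that the package works along these lines internally; the method "ward.D" is used here because, as far as I know, current R versions treat the older name "ward" as "ward.D".

```r
# Made-up symmetric distance matrix for four deputies (illustrative only).
labels <- c("A", "B", "C", "D")
m <- matrix(c(0, 1, 4, 5,
              1, 0, 4, 5,
              4, 4, 0, 1,
              5, 5, 1, 0),
            nrow = 4, byrow = TRUE, dimnames = list(labels, labels))

# Hierarchical clustering on the distances, then conversion to a dendrogram.
tree <- hclust(as.dist(m), method = "ward.D")
dend <- as.dendrogram(tree)

# Cut the tree into k = 2 groups, analogous to the k argument above.
groups <- cutree(tree, k = 2)
print(groups)
```

Here A and B fall into one group and C and D into the other, since the within-pair distances are much smaller than the between-pair ones.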
Function get_deputies_mds converts the distance matrix between deputies into a 2D representation (Multidimensional Scaling).
get_deputies_mds(distances, clubs = NULL, plot = TRUE, remove_missing_clubs = TRUE)
distances |
a distance matrix, preferably created with |
clubs |
a data.frame that maps |
plot |
if |
remove_missing_clubs |
if TRUE then rows of |
MDS coordinates or a ggplot
Przemyslaw Biecek
# votes <- get_filtered_votes(terms_of_office = c(7,7))
library(dplyr)
library(ggplot2)
data(votes)
v <- c(`Za` = 5, `Przeciw` = -5, `Wstrzymał się` = 2, `Nieobecny` = 0)/10
mat2 <- get_distance_matrix(votes[,c("surname_name", "id_voting", "vote")], weights = v)
df <- votes[,c("surname_name","club")]
df %>%
  group_by(surname_name, club) %>%
  summarise(n = n()) %>%
  arrange(-n) %>%
  group_by(surname_name) %>%
  top_n(1) %>%
  as.data.frame() -> clubs
row.names(clubs) <- clubs[,1]
clubs$club[clubs$club == "niez."] = "cross-bencher"
get_deputies_mds(mat2, clubs)
# without cross bencher deputies
clubs2 <- clubs[clubs$club != "cross-bencher",]
get_deputies_mds(mat2, clubs2)
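The idea of a 2D representation of a distance matrix can be sketched with classical multidimensional scaling from base R (stats::cmdscale). Whether the package uses cmdscale or another MDS variant is an assumption; the points and club labels below are made up.

```r
# Made-up 2D positions; in practice only the distances would be known.
pts <- matrix(c(0, 0,
                1, 0,
                0, 1,
                5, 5),
              ncol = 2, byrow = TRUE,
              dimnames = list(c("A", "B", "C", "D"), NULL))
d <- dist(pts)

# Classical MDS: recover 2D coordinates from the distances alone.
coords <- cmdscale(d, k = 2)

# Attach a made-up club to every deputy, as a clubs data frame would.
clubs <- data.frame(surname_name = rownames(coords),
                    club = c("X", "X", "X", "Y"),
                    stringsAsFactors = FALSE)
mds <- data.frame(surname_name = rownames(coords),
                  x = coords[, 1], y = coords[, 2],
                  club = clubs$club[match(rownames(coords), clubs$surname_name)],
                  stringsAsFactors = FALSE)
print(mds)
```

The recovered coordinates may be rotated or reflected relative to the original points, but the pairwise distances are preserved.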
Function get_deputies_silhouette converts the distance matrix between deputies and their clubs into a silhouette plot.
get_deputies_silhouette(distances, clubs = NULL, plot = TRUE, remove_missing_clubs = TRUE)
distances |
a distance matrix, preferably created with |
clubs |
a data.frame that maps |
plot |
if |
remove_missing_clubs |
if TRUE then rows of |
silhouette object or a ggplot
Przemyslaw Biecek
library("dplyr")
# votes <- get_filtered_votes(terms_of_office = c(7,7))
data(votes)
v <- c(`Za` = 5, `Przeciw` = -5, `Wstrzymał się` = 2, `Nieobecny` = 0)/10
mat2 <- get_distance_matrix(votes[,c("surname_name", "id_voting", "vote")], weights = v)
df <- votes[,c("surname_name","club")]
df %>%
  group_by(surname_name, club) %>%
  summarise(n = n()) %>%
  arrange(-n) %>%
  group_by(surname_name) %>%
  top_n(1) %>%
  as.data.frame() -> clubs
row.names(clubs) <- clubs[,1]
clubs$club[clubs$club == "niez."] = "cross-bencher"
get_deputies_silhouette(mat2, clubs)
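For readers unfamiliar with silhouettes, the sketch below computes standard silhouette widths by hand in base R: for each deputy, a is the mean distance to members of the same club, b is the smallest mean distance to any other club, and the width is (b - a) / max(a, b). The function silhouette_sketch is a hypothetical helper, not the package's implementation, and the data are made up; it assumes at least two clubs.

```r
# Hypothetical helper: silhouette width of each observation, given a
# distance object and a vector of group (club) labels.
silhouette_sketch <- function(d, groups) {
  m <- as.matrix(d)
  n <- nrow(m)
  s <- numeric(n)
  for (i in seq_len(n)) {
    same <- groups == groups[i]
    same[i] <- FALSE
    if (!any(same)) next               # singleton group: width stays 0
    a <- mean(m[i, same])              # mean distance within own group
    others <- setdiff(unique(groups), groups[i])
    b <- min(vapply(others,
                    function(g) mean(m[i, groups == g]),
                    numeric(1)))       # nearest other group
    s[i] <- (b - a) / max(a, b)
  }
  names(s) <- rownames(m)
  s
}

# Two made-up clubs that are far apart on a line.
pts <- matrix(c(0, 1, 10, 11), ncol = 1,
              dimnames = list(c("A", "B", "C", "D"), NULL))
club <- c("X", "X", "Y", "Y")
s <- silhouette_sketch(dist(pts), club)
print(round(s, 3))
```

Widths close to 1 indicate deputies who vote much more like their own club than like any other club.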
Function get_deputies_table imports deputies table from a database.
get_deputies_table(dbname = 'sejmrp', user = 'reader', password = 'qux94874', host = 'services.mini.pw.edu.pl', sorted_by_id = TRUE, windows = .Platform$OS.type == 'windows')
dbname |
name of database; default: 'sejmrp' |
user |
name of user; default: 'reader' |
password |
password of database; default: 'qux94874' |
host |
name of host; default: 'services.mini.pw.edu.pl' |
sorted_by_id |
information if table should be sorted by id; default: TRUE |
windows |
whether the operating system in use is Windows; default: .Platform$OS.type == 'windows' |
Function get_deputies_table imports the deputies table from a database. The result of this function is a data frame with deputies' data. Because of an encoding issue on the Windows operating system, you need to indicate whether you use Windows.
data frame
Default parameters use the privileges of the 'reader' user, which can only SELECT data from the database.
All information is stored in PostgreSQL database.
Piotr Smuda
## Not run:
deputies <- get_deputies_table()
dim(deputies)
# [1] 983 3
names(deputies)
# [1] 'id_deputy' 'nr_term_of_office' 'surname_name'
## End(Not run)
Function get_distance_matrix converts a data frame with three columns (deputies' ids, votings' ids and votes) into a deputies distance matrix.
get_distance_matrix(votes, weights = NULL, allowMissings = 0)
votes |
a data frame with three columns, respectively: deputy id, voting id, vote |
weights |
if supplied it should be a named vector that converts votes into a numeric values that correspond to their similarity |
allowMissings |
maximum number of missing votings allowed for a deputy. |
Function get_distance_matrix calculates distances among deputies based on their votes. The more similar the votes, the smaller the distance between deputies.
distance matrix
Przemyslaw Biecek
# votes <- get_filtered_votes(terms_of_office = c(7,7))
data(votes)
v <- c(`Za` = 5, `Przeciw` = -5, `Wstrzymał się` = 2, `Nieobecny` = 0)/10
mat2 <- get_distance_matrix(votes[,c("surname_name", "id_voting", "vote")], weights = v)
## Not run:
mat1 <- get_distance_matrix(votes[,c("surname_name", "id_voting", "vote")])
## End(Not run)
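One possible way to turn votes into distances with a weights vector is sketched below in base R: each vote is mapped to its numeric weight, the long table is reshaped into a deputy-by-voting matrix, and distances are taken between rows. The use of Euclidean distance (stats::dist) is an assumption for illustration; the metric actually used by get_distance_matrix is not stated here. The data are made up.

```r
# Made-up votes in the three-column layout: deputy, voting id, vote.
votes_small <- data.frame(
  surname_name = c("A", "A", "B", "B", "C", "C"),
  id_voting = c(1, 2, 1, 2, 1, 2),
  vote = c("Za", "Przeciw", "Za", "Przeciw", "Przeciw", "Za"),
  stringsAsFactors = FALSE
)

# Named vector converting votes into numeric values (illustrative weights).
w <- c(Za = 1, Przeciw = -1, `Wstrzymal sie` = 0, Nieobecny = 0)
votes_small$score <- unname(w[votes_small$vote])

# Deputy x voting matrix of scores (one vote per cell, so mean is a no-op).
wide <- tapply(votes_small$score,
               list(votes_small$surname_name, votes_small$id_voting),
               mean)

# Assumed metric: Euclidean distance between deputies' score vectors.
d <- as.matrix(dist(wide))
print(d)
```

Deputies A and B vote identically, so their distance is 0, while C votes the opposite way in both votings and ends up farthest away.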
Function get_filtered_statements reads filtered statements from a database.
get_filtered_statements(dbname = 'sejmrp', user = 'reader', password = 'qux94874', host = 'services.mini.pw.edu.pl', windows = .Platform$OS.type == 'windows', terms_of_office = integer(0), deputies = character(0), dates = character(0), topics = character(0), content = character(0), max_rows = Inf)
dbname |
name of database; default: 'sejmrp' |
user |
name of user; default: 'reader' |
password |
password of database; default: 'qux94874' |
host |
name of host; default: 'services.mini.pw.edu.pl' |
windows |
whether the operating system in use is Windows; default: .Platform$OS.type == 'windows' |
terms_of_office |
range of terms of office's numbers that will be taken to filter data from database; default: integer(0) |
deputies |
full names of deputies that will be taken to filter data from database; default: character(0) |
dates |
period of time that will be taken to filter data from database; default: character(0) |
topics |
text patterns that will be taken to filter data from database; default: character(0) |
content |
text patterns that will be taken to filter data from database; default: character(0) |
max_rows |
maximum number of rows to download; default: Inf |
Function get_filtered_statements reads filtered statements from a database. The result of this function is an invisible data frame with statements' data.
Possible filters:
terms_of_office - range of terms of office's numbers. This filter is an integer vector with two elements, where the first describes the left boundary of the range and the second the right boundary. To choose only one term of office, use the same number as the first and second element of the vector.
deputies - full names of deputies. This filter is a character vector with full names of deputies in the format 'surname first_name second_name'. If you are not sure whether the deputy you have in mind has a second name, try 'surname first_name' or just 'surname'. There is a high probability that the proper deputy will be chosen. It is possible to choose more than one deputy.
dates - period of time. This filter is a character vector with two elements in the date format 'YYYY-MM-DD', where the first describes the left boundary of the period and the second the right boundary. To choose only one day, use the same date as the first and second element of the vector.
topics - text patterns. This filter is a character vector with text patterns of topics in order points. Note that the order points are written like sentences, so remember about case inflection of nouns and adjectives and use stems of words as patterns. For example, if you want to find order points about education (in Polish: szkolnictwo) try 'szkolnictw'. It is possible to choose more than one pattern.
content - text patterns. This filter is a character vector with text patterns in statements. Note that strings with statements are sentences, so remember about case inflection of nouns and adjectives and use stems of words as patterns. For example, if you want to find statements about education (in Polish: szkolnictwo) try 'szkolnictw'. It is possible to choose more than one pattern.
If you do not choose any filter, the whole database will be downloaded. Note that, due to data size (<= ~150 MB), it may take a few seconds or minutes to download all statements.
Because of an encoding issue on the Windows operating system, you also need to indicate whether you use Windows.
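The advice about using word stems as patterns can be tried on a local data frame with base R. The helper matches_any below is hypothetical and not part of the package; it assumes case-sensitive substring matching and that several patterns are combined with OR, neither of which is stated above. The statements are made up and written without Polish diacritics.

```r
# Made-up statements using two columns of the statements table.
statements <- data.frame(
  titles_order_points = c("Informacja o stanie szkolnictwa wyzszego",
                          "Sprawozdanie o ochronie zdrowia",
                          "Pytania w sprawach biezacych"),
  statement = c("Reforma szkolnictwa wymaga czasu.",
                "Szpitale potrzebuja wsparcia.",
                "Pytam o szkolnictwo zawodowe."),
  stringsAsFactors = FALSE
)

# Hypothetical helper: TRUE where the text contains at least one pattern.
# An empty pattern vector means "no filter", so everything matches.
matches_any <- function(text, patterns) {
  if (length(patterns) == 0) return(rep(TRUE, length(text)))
  hits <- vapply(patterns,
                 function(p) grepl(p, text, fixed = TRUE),
                 logical(length(text)))
  rowSums(matrix(hits, nrow = length(text))) > 0
}

# The stem 'szkolnictw' matches 'szkolnictwa' and 'szkolnictwo' alike.
by_topic <- which(matches_any(statements$titles_order_points, "szkolnictw"))
by_content <- which(matches_any(statements$statement, "szkolnictw"))
by_two <- which(matches_any(statements$titles_order_points,
                            c("szkolnictw", "zdrowi")))
print(list(by_topic, by_content, by_two))
```

Searching for the full word 'szkolnictwo' instead of the stem would miss the first statement, which uses the inflected form 'szkolnictwa'.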
data frame with NULL
Default parameters use the privileges of the 'reader' user, which can only SELECT data from the database.
All information is stored in PostgreSQL database.
Tomasz Mikolajczyk, Piotr Smuda
## Not run:
filtered_statements <- get_filtered_statements()
dim(filtered_statements)
# [1] 2568 6
names(filtered_statements)
# [1] 'id_statement' 'nr_term_of_office' 'surname_name' 'date_statement'
# [5] 'titles_order_points' 'statement'
object.size(filtered_statements)
# 6488552 bytes
## End(Not run)
Function get_filtered_votes reads filtered votes from a database.
get_filtered_votes(dbname = 'sejmrp', user = 'reader', password = 'qux94874', host = 'services.mini.pw.edu.pl', windows = .Platform$OS.type == 'windows', clubs = character(0), dates = character(0), terms_of_office = integer(0), meetings = integer(0), votings = integer(0), deputies = character(0), topics = character(0), max_rows = Inf)
dbname |
name of database; default: 'sejmrp' |
user |
name of user; default: 'reader' |
password |
password of database; default: 'qux94874' |
host |
name of host; default: 'services.mini.pw.edu.pl' |
windows |
whether the operating system in use is Windows; default: .Platform$OS.type == 'windows' |
clubs |
names of clubs that will be taken to filter data from database; default: character(0) |
dates |
period of time that will be taken to filter data from database; default: character(0) |
terms_of_office |
range of terms of office's numbers that will be taken to filter data from database; default: integer(0) |
meetings |
range of meetings' numbers that will be taken to filter data from database; default: integer(0) |
votings |
range of votings' numbers that will be taken to filter data from database; default: integer(0) |
deputies |
full names of deputies that will be taken to filter data from database; default: character(0) |
topics |
text patterns that will be taken to filter data from database; default: character(0) |
max_rows |
maximum number of rows to download; default: Inf |
Function get_filtered_votes reads filtered votes from a database. The result of this function is an invisible data frame with votes' data.
Possible filters:
clubs - names of clubs. This filter is a character vector with elements such as 'PO', 'PiS', 'SLD'. It is possible to choose more than one club.
dates - period of time. This filter is a character vector with two elements in the date format 'YYYY-MM-DD', where the first describes the left boundary of the period and the second the right boundary. To choose only one day, use the same date as the first and second element of the vector.
terms_of_office - range of terms of office's numbers. This filter is an integer vector with two elements, where the first describes the left boundary of the range and the second the right boundary. To choose only one term of office, use the same number as the first and second element of the vector.
meetings - range of meetings' numbers. This filter is an integer vector with two elements, where the first describes the left boundary of the range and the second the right boundary. To choose only one meeting, use the same number as the first and second element of the vector.
votings - range of votings' numbers. This filter is an integer vector with two elements, where the first describes the left boundary of the range and the second the right boundary. To choose only one voting, use the same number as the first and second element of the vector.
deputies - full names of deputies. This filter is a character vector with full names of deputies in the format 'surname first_name second_name'. If you are not sure whether the deputy you have in mind has a second name, try 'surname first_name' or just 'surname'. There is a high probability that the proper deputy will be chosen. It is possible to choose more than one deputy.
topics - text patterns. This filter is a character vector with text patterns of topics that you are interested in. Note that the votings' topics are written like sentences, so remember about case inflection of nouns and adjectives and use stems of words as patterns. For example, if you want to find votings about education (in Polish: szkolnictwo) try 'szkolnictw'. It is possible to choose more than one pattern.
If you do not choose any filter, the whole database will be downloaded. Note that, due to data size (<= ~150 MB), it may take a few seconds or minutes to download all votes.
Because of an encoding issue on the Windows operating system, you also need to indicate whether you use Windows.
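The two-element range filters described above can be mimicked on a local data frame in base R. The helpers in_range and filter_votes_sketch are hypothetical and only mirror the documented semantics (inclusive boundaries, empty vector meaning no filter); they are not the package's implementation, which queries the database. The data are made up.

```r
# Made-up votes with a few of the columns returned by get_filtered_votes.
votes_small <- data.frame(
  nr_term_of_office = c(7L, 7L, 8L, 8L),
  nr_meeting = c(1L, 5L, 1L, 3L),
  date_meeting = as.Date(c("2012-03-01", "2014-06-10",
                           "2015-11-25", "2016-01-15")),
  club = c("PO", "PiS", "PO", "SLD"),
  stringsAsFactors = FALSE
)

# Hypothetical helper: inclusive range check; an empty range keeps all rows.
in_range <- function(x, range) {
  if (length(range) == 0) return(rep(TRUE, length(x)))
  stopifnot(length(range) == 2)
  x >= range[1] & x <= range[2]
}

# Hypothetical local analogue of some of the documented filters.
filter_votes_sketch <- function(data, clubs = character(0),
                                dates = character(0),
                                terms_of_office = integer(0),
                                meetings = integer(0)) {
  keep <- in_range(data$nr_term_of_office, terms_of_office) &
    in_range(data$nr_meeting, meetings) &
    in_range(data$date_meeting, as.Date(dates))
  if (length(clubs) > 0) keep <- keep & data$club %in% clubs
  data[keep, , drop = FALSE]
}

# A single term of office: same number as both boundaries.
term7 <- filter_votes_sketch(votes_small, terms_of_office = c(7, 7))
# Two clubs within a period of time.
late <- filter_votes_sketch(votes_small, clubs = c("PO", "SLD"),
                            dates = c("2015-01-01", "2016-12-31"))
print(term7)
print(late)
```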
data frame with NULL
Default parameters use the privileges of the 'reader' user, which can only SELECT data from the database.
All information is stored in PostgreSQL database.
Piotr Smuda
## Not run:
filtered_votes <- get_filtered_votes()
dim(filtered_votes)
# [1] 2826483 9
names(filtered_votes)
# [1] 'surname_name' 'nr_term_of_office' 'club' 'vote' 'id_voting'
# [6] 'nr_meeting' 'nr_voting' 'date_meeting' 'topic_voting'
object.size(filtered_votes)
# 148694336 bytes
## End(Not run)
One deputy may belong to many different clubs and change clubs over time. The function get_most_frequent_club calculates the most frequent club for each deputy.
get_most_frequent_club(deputy_id, club)
deputy_id |
a vector with deputies' unique ids (may also be a character vector with names/surnames) |
club |
vector of length equal to |
a data frame with club memberships of deputies
Przemyslaw Biecek
# votes <- get_filtered_votes(terms_of_office = c(7,7))
data(votes)
clubs <- get_most_frequent_club(votes$surname_name, votes$club)
head(clubs)
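The idea of picking the most frequent club per deputy can be sketched in base R with a contingency table. The function most_frequent_club_sketch is a hypothetical stand-in, not the package's implementation; in particular, breaking ties by taking the first club in table order is an assumption. The data are made up.

```r
# Hypothetical helper: for each deputy, the club that appears most often.
most_frequent_club_sketch <- function(deputy_id, club) {
  counts <- table(deputy_id, club)            # deputies x clubs counts
  top <- colnames(counts)[apply(counts, 1, which.max)]
  data.frame(deputy_id = rownames(counts), club = top,
             stringsAsFactors = FALSE)
}

# Deputy X voted twice as a PO member and once as a PiS member.
deputy <- c("X", "X", "X", "Y")
club <- c("PO", "PO", "PiS", "SLD")
res <- most_frequent_club_sketch(deputy, club)
print(res)
```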
Function get_statements_table imports statements table from a database.
get_statements_table(dbname = 'sejmrp', user = 'reader', password = 'qux94874', host = 'services.mini.pw.edu.pl', sorted_by_id = TRUE, windows = .Platform$OS.type == 'windows')
dbname |
name of database; default: 'sejmrp' |
user |
name of user; default: 'reader' |
password |
password of database; default: 'qux94874' |
host |
name of host; default: 'services.mini.pw.edu.pl' |
sorted_by_id |
information if table should be sorted by id; default: TRUE |
windows |
whether the operating system in use is Windows; default: .Platform$OS.type == 'windows' |
Function get_statements_table imports the statements table from a database. The result of this function is a data frame with statements' data. Because of an encoding issue on the Windows operating system, you need to indicate whether you use Windows.
data frame
Default parameters use the privileges of the 'reader' user, which can only SELECT data from the database.
All information is stored in PostgreSQL database.
Piotr Smuda
## Not run:
statements <- get_statements_table()
dim(statements)
# [1] 43432 6
names(statements)
# [1] 'id_statement' 'nr_term_of_office' 'surname_name'
# [4] 'date_statement' 'titles_order_points' 'statement'
## End(Not run)
Function get_votes_table imports votes table from a database.
get_votes_table(dbname = 'sejmrp', user = 'reader', password = 'qux94874', host = 'services.mini.pw.edu.pl', sorted_by_id = TRUE, windows = .Platform$OS.type == 'windows')
dbname |
name of database; default: 'sejmrp' |
user |
name of user; default: 'reader' |
password |
password of database; default: 'qux94874' |
host |
name of host; default: 'services.mini.pw.edu.pl' |
sorted_by_id |
information if table should be sorted by id; default: TRUE |
windows |
whether the operating system in use is Windows; default: .Platform$OS.type == 'windows' |
Function get_votes_table imports the votes table from a database. The result of this function is a data frame with votes' data. Because of an encoding issue on the Windows operating system, you need to indicate whether you use Windows.
data frame
Default parameters use the privileges of the 'reader' user, which can only SELECT data from the database.
All information is stored in PostgreSQL database.
Piotr Smuda
## Not run:
votes <- get_votes_table()
dim(votes)
# [1] 2826483 6
names(votes)
# [1] 'id_vote' 'nr_term_of_office' 'id_deputy' 'id_voting' 'vote' 'club'
object.size(votes)
# 90474040 bytes
## End(Not run)
Function get_votings_table imports votings table from a database.
get_votings_table(dbname = 'sejmrp', user = 'reader', password = 'qux94874', host = 'services.mini.pw.edu.pl', sorted_by_id = TRUE, windows = .Platform$OS.type == 'windows')
dbname |
name of database; default: 'sejmrp' |
user |
name of user; default: 'reader' |
password |
password of database; default: 'qux94874' |
host |
name of host; default: 'services.mini.pw.edu.pl' |
sorted_by_id |
information if table should be sorted by id; default: TRUE |
windows |
whether the operating system in use is Windows; default: .Platform$OS.type == 'windows' |
Function get_votings_table imports the votings table from a database. The result of this function is a data frame with votings' data. Because of an encoding issue on the Windows operating system, you need to indicate whether you use Windows.
data frame
Default parameters use the privileges of the 'reader' user, which can only SELECT data from the database.
All information is stored in PostgreSQL database.
Piotr Smuda
## Not run:
votings <- get_votings_table()
dim(votings)
# [1] 6212 7
names(votings)
# [1] 'id_voting' 'nr_term_of_office' 'nr_meeting'
# [4] 'date_meeting' 'nr_voting' 'topic_voting'
# [7] 'link_results'
## End(Not run)
Function remove_database removes the whole database.
remove_database(dbname, user, password, host)
dbname |
name of database |
user |
name of user |
password |
password of database |
host |
name of host |
invisible NULL
All information is stored in PostgreSQL database.
Piotr Smuda
## Not run:
remove_database(dbname, user, password, host)
## End(Not run)
Function safe_html tries to download the URL several times.
safe_html(page, time = 60, attempts = 10)
page |
requested URL |
time |
sleep interval after each failure |
attempts |
max number of tries (if there is a problem with connection) |
Function safe_html performs 10 (by default) attempts to download the URL and waits 60 seconds (by default) after each failure.
character vector
Przemyslaw Biecek
## Not run:
page <- paste0('http://www.sejm.gov.pl/Sejm7.nsf/',
               'wypowiedz.xsp?posiedzenie=15&dzien=1&wyp=008')
safe_html(page)
## End(Not run)
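The retry behaviour described for safe_html (a limited number of attempts with a pause after each failure) can be sketched in base R as follows. The function retry_sketch and the simulated downloader flaky are hypothetical and do not touch the network; how the package itself detects failures is not stated here, so treating any R error as a failure is an assumption.

```r
# Hypothetical helper: call fun() up to `attempts` times, sleeping `time`
# seconds after each failed attempt except the last one.
retry_sketch <- function(fun, time = 60, attempts = 10) {
  for (i in seq_len(attempts)) {
    result <- tryCatch(fun(), error = function(e) NULL)
    if (!is.null(result)) return(result)
    if (i < attempts) Sys.sleep(time)
  }
  stop("all ", attempts, " attempts failed")
}

# Simulated download that fails twice and then succeeds (no network used).
state <- new.env()
state$calls <- 0
flaky <- function() {
  state$calls <- state$calls + 1
  if (state$calls < 3) stop("connection problem")
  "<html>ok</html>"
}

# time = 0 keeps the sketch fast; the documented default is 60 seconds.
out <- retry_sketch(flaky, time = 0, attempts = 10)
print(out)
print(state$calls)
```

With the defaults, a page that never loads would keep the caller waiting for roughly nine minutes before an error is raised, which is worth keeping in mind when scraping many pages.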
Function safe_readHTMLTable tries to download the table from given URL several times.
safe_readHTMLTable(..., time = 60, attempts = 10)
... |
arguments that will be passed to readHTMLTable |
time |
sleep interval after each failure |
attempts |
max number of tries (if there is a problem with connection) |
Function safe_readHTMLTable performs 10 (by default) attempts to download the URL and waits 60 seconds (by default) after each failure.
character vector
Przemyslaw Biecek
## Not run:
page <- paste0('http://www.sejm.gov.pl/Sejm7.nsf/',
               'posiedzenie.xsp?posiedzenie=99&dzien=2')
safe_readHTMLTable(page)
## End(Not run)
Function statements_create_table creates a table with deputies' statements.
statements_create_table(dbname, user, password, host, nr_term_of_office = 8)
dbname |
name of database |
user |
name of user |
password |
password of database |
host |
name of host |
nr_term_of_office |
number of term of office of Polish Diet; default: 8 |
invisible NULL
Use this function only the first time, when the statements table is empty. Afterwards use statements_update_table.
All information is stored in PostgreSQL database.
Piotr Smuda, Tomasz Mikolajczyk
## Not run:
statements_create_table(dbname, user, password, host)
## End(Not run)
Function statements_get_statement
gets statement's content.
statements_get_statement(page, ...)
page |
deputy's statement's page |
... |
other arguments that will be passed to safe_html() |
Example of page with deputy's statement:
http://www.sejm.gov.pl/Sejm7.nsf/wypowiedz.xsp?posiedzenie=15&dzien=1&wyp=008
character vector
All information is stored in PostgreSQL database.
Piotr Smuda, Tomasz Mikolajczyk
## Not run:
page <- paste0('http://www.sejm.gov.pl/Sejm7.nsf/',
               'wypowiedz.xsp?posiedzenie=15&dzien=1&wyp=008')
statements_get_statement(page)
## End(Not run)
Function statements_get_statements_data
gets data about statements.
statements_get_statements_data(statements_links, home_page = 'http://www.sejm.gov.pl/')
statements_links |
list of elements of XMLNodeSet class with statements' ids, links and their authors |
home_page |
main page of the Polish Diet: http://www.sejm.gov.pl/ |
Function statements_get_statements_data
gets data about statements, such as
the author, the page with the content of the statement and its id.
data frame with three columns: names, statements_links, ids
All information is stored in PostgreSQL database.
Piotr Smuda, Tomasz Mikolajczyk
## Not run:
page <- safe_html(paste0('http://www.sejm.gov.pl/Sejm7.nsf/',
                         'wypowiedz.xsp?posiedzenie=15&dzien=1&wyp=0'))
page <- html_nodes(page, '.stenogram')
statements_links <- html_nodes(page, 'h2 a')
statements_get_statements_data(statements_links,
                               home_page = 'http://www.sejm.gov.pl/Sejm7.nsf/')
## End(Not run)
Function statements_get_statements_table
gets statements' table from
meeting's page.
statements_get_statements_table(page)
page |
meeting's page |
Function statements_get_statements_table
gets statements' table from
meeting's page. Example of a meeting's page:
http://www.sejm.gov.pl/Sejm7.nsf/posiedzenie.xsp?posiedzenie=99&dzien=2
The result of this function is a data frame with three columns: the first
contains the author of the statement, the second the number of the order point,
and the third the title of the order point.
data frame with three unnamed columns
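Because the returned columns are unnamed, it may help to label them before further processing. The sketch below works on a mock data frame rather than on real output; the contents and the column names `author`, `nr_order_point` and `title_order_point` are assumptions chosen for illustration.

```r
# Mock stand-in for the result of statements_get_statements_table();
# real content would come from the meeting's page.
statements_table <- data.frame(
  c("Deputy A", "Deputy B"),
  c("1.", "2."),
  c("First order point", "Second order point"),
  stringsAsFactors = FALSE
)

# Assign descriptive (hypothetical) names following the documented column order.
names(statements_table) <- c("author", "nr_order_point", "title_order_point")
statements_table$author
```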
All information is stored in PostgreSQL database.
Piotr Smuda
## Not run:
page <- 'http://www.sejm.gov.pl/Sejm7.nsf/posiedzenie.xsp?posiedzenie=99&dzien=2'
statements_get_statements_table(page)
## End(Not run)
Function statements_update_table
updates a table with deputies' statements.
statements_update_table(dbname, user, password, host, nr_term_of_office = 8, verbose = FALSE)
dbname |
name of database |
user |
name of user |
password |
password of database |
host |
name of host |
nr_term_of_office |
number of term of office of Polish Diet; default: 8 |
verbose |
if TRUE then additional info will be printed |
invisible NULL
All information is stored in PostgreSQL database.
Piotr Smuda, Tomasz Mikolajczyk
## Not run:
statements_update_table(dbname, user, password, host)
## End(Not run)
Votes taken in the 7th term of office of the Polish Sejm (2011-2015)
surname_name surname and name of the deputy
nr_term_of_office number of the term of office; this sample contains only the 7th term
club club of the deputy at the moment of voting (may change over time)
vote vote cast in the voting
id_voting unique id of the voting
nr_meeting number of the meeting
nr_voting number of the voting
date_meeting date of the meeting
topic_voting full title of the voting
data(votes)
2890479 rows and 9 columns
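A typical first step with a table of this shape is to summarise votes by club. The sketch below uses a small mock data frame with a subset of the documented columns instead of the real dataset; the deputies, clubs and rows are invented, and the vote labels are assumed to match those used in the votes database table ('Za', 'Przeciw', 'Wstrzymal sie', 'Nieobecny').

```r
# Mock sample with a subset of the documented columns (values are invented).
votes_sample <- data.frame(
  surname_name = c("Deputy A", "Deputy B", "Deputy C",
                   "Deputy A", "Deputy B", "Deputy C"),
  nr_term_of_office = 7,
  club = c("Club X", "Club X", "Club Y", "Club X", "Club X", "Club Y"),
  vote = c("Za", "Przeciw", "Za", "Nieobecny", "Za", "Wstrzymal sie"),
  id_voting = c(1, 1, 1, 2, 2, 2),
  stringsAsFactors = FALSE
)

# Cross-tabulate the number of votes of each kind per club.
club_votes <- table(votes_sample$club, votes_sample$vote)

# Share of 'Za' votes per club among deputies recorded as present.
present <- votes_sample[votes_sample$vote != "Nieobecny", ]
share_za <- tapply(present$vote == "Za", present$club, mean)
```

The same two steps could be applied to the full table after `data(votes)`, provided the vote labels there match the assumed ones.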
Function votes_create_table
creates a table with votes.
votes_create_table(dbname, user, password, host, nr_term_of_office = 8, windows = .Platform$OS.type == 'windows')
dbname |
name of database |
user |
name of user |
password |
password of database |
host |
name of host |
nr_term_of_office |
number of term of office of Polish Diet; default: 8 |
windows |
whether the operating system is Windows; default: .Platform$OS.type == 'windows' |
invisible NULL
Use this function only the first time, when the votes table
is empty. Afterwards, use votes_update_table.
A deputy's vote reader may break during a voting; such a case is treated as if the deputy was absent. Even if the deputy made a decision, his or her vote is recorded as 'Nieobecny'.
All information is stored in a PostgreSQL database.
Piotr Smuda
## Not run:
votes_create_table(dbname, user, password, host, 7, TRUE)
votes_create_table(dbname, user, password, host, 7, FALSE)
## End(Not run)
Function votes_get_clubs_links
gets links with voting's results for each club
from voting's page.
votes_get_clubs_links(home_page = 'http://www.sejm.gov.pl/Sejm8.nsf/', page)
home_page |
main page of the Polish Diet: http://www.sejm.gov.pl/Sejm8.nsf/ |
page |
voting's page |
Function votes_get_clubs_links
gets links with voting's results for each club
from voting's page. Example of a voting's page:
http://www.sejm.gov.pl/Sejm7.nsf/agent.xsp?symbol=glosowania&NrKadencji=7&NrPosiedzenia=1&NrGlosowania=1
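The example address is made of a home page and three query parameters, so a voting's page URL can be assembled from numbers. The sketch below is only an illustration: `build_voting_url` is a hypothetical helper that is not part of the package, and it assumes that the URL pattern of the example holds for other terms, meetings and votings.

```r
# Hypothetical helper (not part of the package): builds a voting's page URL
# following the pattern of the example address above.
build_voting_url <- function(nr_term_of_office, nr_meeting, nr_voting) {
  home_page <- paste0("http://www.sejm.gov.pl/Sejm", nr_term_of_office, ".nsf/")
  paste0(home_page, "agent.xsp?symbol=glosowania",
         "&NrKadencji=", nr_term_of_office,
         "&NrPosiedzenia=", nr_meeting,
         "&NrGlosowania=", nr_voting)
}

# Reproduces the example address for term 7, meeting 1, voting 1.
build_voting_url(7, 1, 1)
```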
data frame with two columns: club, links
All information is stored in PostgreSQL database.
Piotr Smuda
## Not run:
home_page <- 'http://www.sejm.gov.pl/Sejm7.nsf/'
page <- paste0('http://www.sejm.gov.pl/Sejm7.nsf/agent.xsp?',
               'symbol=glosowania&NrKadencji=7&NrPosiedzenia=1&NrGlosowania=1')
votes_get_clubs_links(home_page, page)
## End(Not run)
Function votes_get_results
gets voting's results for each club.
votes_get_results(page)
page |
club's voting's results page |
Function votes_get_results
gets voting's results for each club.
Example of page with voting's results of PO club:
http://www.sejm.gov.pl/Sejm7.nsf/agent.xsp?symbol=klubglos&IdGlosowania=37494&KodKlubu=PO
data frame with two columns: deputy, vote
All information is stored in PostgreSQL database.
Piotr Smuda
## Not run:
page <- paste0('http://www.sejm.gov.pl/Sejm7.nsf/agent.xsp?',
               'symbol=klubglos&IdGlosowania=37494&KodKlubu=PO')
votes_get_results(page)
## End(Not run)
Function votes_match_deputies_ids
matches deputies from voting's results
page to their ids from the deputies table.
votes_match_deputies_ids(dbname, user, password, host, page, nr_term_of_office = 8, windows = .Platform$OS.type == 'windows')
dbname |
name of database |
user |
name of user |
password |
password of database |
host |
name of host |
page |
club's voting's results page |
nr_term_of_office |
number of term of office of Polish Diet; default: 8 |
windows |
whether the operating system is Windows; default: .Platform$OS.type == 'windows' |
Function votes_match_deputies_ids
matches deputies from voting's results
page to their ids from the deputies table. The result of this function is
a data frame with deputies' data, ids and votes. Because of encoding issues
on the Windows operating system, you need to indicate whether you use Windows.
Example of page with voting's results of PO club:
http://www.sejm.gov.pl/Sejm7.nsf/agent.xsp?symbol=klubglos&IdGlosowania=37494&KodKlubu=PO
data frame with three columns: deputy, vote, id
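The matching step can be pictured as a lookup of each deputy's name in the deputies table. The sketch below shows one way to do this with base R's `match()` on mock data; it is not the package's implementation, and the names and id values are invented for illustration.

```r
# Mock voting results, shaped like the output of votes_get_results().
results <- data.frame(
  deputy = c("Kowalski Jan", "Nowak Anna"),
  vote = c("Za", "Przeciw"),
  stringsAsFactors = FALSE
)

# Mock excerpt of the deputies table (ids are invented).
deputies <- data.frame(
  id_deputy = c("001", "002", "003"),
  surname_name = c("Nowak Anna", "Kowalski Jan", "Wisniewski Piotr"),
  stringsAsFactors = FALSE
)

# Look up each deputy's id by name; unmatched names would yield NA.
results$id <- deputies$id_deputy[match(results$deputy, deputies$surname_name)]
results
```

A lookup by exact string comparison breaks when the two sources encode Polish characters differently, which is presumably why the function needs to know whether it runs on Windows.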
All information is stored in PostgreSQL database.
Piotr Smuda
## Not run:
page <- paste0('http://www.sejm.gov.pl/Sejm7.nsf/agent.xsp?',
               'symbol=klubglos&IdGlosowania=37494&KodKlubu=PO')
votes_match_deputies_ids(dbname, user, password, host, page, 7, TRUE)
votes_match_deputies_ids(dbname, user, password, host, page, 7, FALSE)
## End(Not run)
Function votes_update_table
updates a table with votes.
votes_update_table(dbname, user, password, host, nr_term_of_office = 8, windows = .Platform$OS.type == 'windows', verbose = FALSE)
dbname |
name of database |
user |
name of user |
password |
password of database |
host |
name of host |
nr_term_of_office |
number of term of office of Polish Diet; default: 8 |
windows |
whether the operating system is Windows; default: .Platform$OS.type == 'windows' |
verbose |
if TRUE then additional info will be printed |
invisible NULL
A deputy's vote reader may break during a voting; such a case is treated as if the deputy was absent. Even if the deputy made a decision, his or her vote is recorded as 'Nieobecny'.
All information is stored in a PostgreSQL database.
Piotr Smuda
## Not run:
votes_update_table(dbname, user, password, host, 7, TRUE)
votes_update_table(dbname, user, password, host, 7, FALSE)
## End(Not run)
Function votings_create_table
creates a table with votings.
votings_create_table(dbname, user, password, host, nr_term_of_office = 8)
dbname |
name of database |
user |
name of user |
password |
password of database |
host |
name of host |
nr_term_of_office |
number of term of office of Polish Diet; default: 8 |
invisible NULL
Use this function only the first time, when the votings table
is empty. Afterwards, use votings_update_table.
All information is stored in a PostgreSQL database.
Piotr Smuda
## Not run:
votings_create_table(dbname, user, password, host)
## End(Not run)
Function votings_get_date
gets a date of meeting.
votings_get_date(page)
page |
meeting's page |
Example of a meeting's page: http://www.sejm.gov.pl/Sejm7.nsf/agent.xsp?symbol=listaglos&IdDnia=1179
date in format YYYY-MM-DD as character
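Since the date is returned as a character string, it has to be converted before any date arithmetic. The sketch below uses base R's `as.Date()` on an illustrative value; the date itself is an assumption and not actual output of the function.

```r
# Illustrative value in the documented YYYY-MM-DD format.
date_meeting <- "2011-11-08"

# Convert the character string to a Date object.
d <- as.Date(date_meeting, format = "%Y-%m-%d")

# Reformat the date, or compute the number of days to another date.
formatted <- format(d, "%d.%m.%Y")
days_between <- as.numeric(as.Date("2011-11-10") - d)
```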
All information is stored in PostgreSQL database.
Piotr Smuda
## Not run:
page <- 'http://www.sejm.gov.pl/Sejm7.nsf/agent.xsp?symbol=listaglos&IdDnia=1179'
votings_get_date(page)
## End(Not run)
Function votings_get_meetings_links
gets meetings' links.
votings_get_meetings_links(
  home_page = 'http://www.sejm.gov.pl/Sejm8.nsf/',
  page = 'http://www.sejm.gov.pl/Sejm8.nsf/agent.xsp?symbol=posglos&NrKadencji=8')
home_page |
main page of the Polish Diet: http://www.sejm.gov.pl/Sejm8.nsf/ |
page |
page with votings in the Polish Diet: http://www.sejm.gov.pl/Sejm8.nsf/agent.xsp?symbol=posglos&NrKadencji=8 |
character vector
All information is stored in PostgreSQL database.
Piotr Smuda
## Not run:
votings_get_meetings_links()
## End(Not run)
Function votings_get_meetings_table
gets meetings' table.
votings_get_meetings_table(page = 'http://www.sejm.gov.pl/Sejm8.nsf/agent.xsp?symbol=posglos&NrKadencji=8')
page |
page with votings in the Polish Diet: http://www.sejm.gov.pl/Sejm8.nsf/agent.xsp?symbol=posglos&NrKadencji=8 |
Function votings_get_meetings_table
gets meetings' table. The
result of this function is a data frame with three columns: the first
contains the numbers of meetings, the second their dates in
Polish, and the third the numbers of votings at each meeting.
data frame with three unnamed columns
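Dates written in Polish usually need to be converted before they can be sorted or stored. The sketch below shows one possible conversion to YYYY-MM-DD. The input layout (day, month name in the genitive case, year, separated by spaces) is an assumption about the page and should be checked against real output; `parse_polish_date` is a hypothetical helper, not a package function.

```r
# Polish month names in the genitive case, as used in written dates.
polish_months <- c("stycznia", "lutego", "marca", "kwietnia", "maja",
                   "czerwca", "lipca", "sierpnia", "wrze\u015bnia",
                   "pa\u017adziernika", "listopada", "grudnia")

# Hypothetical helper: converts a single date such as "8 listopada 2011"
# to the YYYY-MM-DD format. Assumes a "day month year" layout.
parse_polish_date <- function(x) {
  parts <- strsplit(trimws(x), "\\s+")[[1]]
  day <- as.integer(parts[1])
  month <- match(tolower(parts[2]), polish_months)
  year <- as.integer(parts[3])
  sprintf("%04d-%02d-%02d", year, month, day)
}

parse_polish_date("8 listopada 2011")
```

An explicit lookup table is used instead of `as.Date()` with `%B` because month-name parsing in `as.Date()` depends on the session locale.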
All information is stored in PostgreSQL database.
Piotr Smuda
## Not run:
votings_get_meetings_table()
## End(Not run)
Function votings_get_votings_links
gets votings' links from
meeting's page.
votings_get_votings_links(home_page = 'http://www.sejm.gov.pl/Sejm8.nsf/', page)
home_page |
main page of the Polish Diet: http://www.sejm.gov.pl/Sejm8.nsf/ |
page |
meeting's page |
Example of a meeting's page: http://www.sejm.gov.pl/Sejm7.nsf/agent.xsp?symbol=listaglos&IdDnia=1179
character vector
All information is stored in PostgreSQL database.
Piotr Smuda
## Not run:
home_page <- 'http://www.sejm.gov.pl/Sejm7.nsf/'
page <- 'http://www.sejm.gov.pl/Sejm7.nsf/agent.xsp?symbol=listaglos&IdDnia=1179'
votings_get_votings_links(home_page, page)
## End(Not run)
Function votings_get_votings_table
gets votings' table from
meeting's page.
votings_get_votings_table(page)
page |
meeting's page |
Function votings_get_votings_table
gets votings' table from
meeting's page. Example of a meeting's page:
http://www.sejm.gov.pl/Sejm7.nsf/agent.xsp?symbol=listaglos&IdDnia=1179
The result of this function is a data frame with three columns: the first
contains the numbers of votings, the second the time of each voting,
and the third the topics of votings.
data frame with three columns: Nr, Godzina (Time), Temat (Topic)
All information is stored in PostgreSQL database.
Piotr Smuda
## Not run:
page <- 'http://www.sejm.gov.pl/Sejm7.nsf/agent.xsp?symbol=listaglos&IdDnia=1179'
votings_get_votings_table(page)
## End(Not run)
Function votings_update_table
updates a table with votings.
votings_update_table(dbname, user, password, host, nr_term_of_office = 8, verbose = FALSE)
dbname |
name of database |
user |
name of user |
password |
password of database |
host |
name of host |
nr_term_of_office |
number of term of office of Polish Diet; default: 8 |
verbose |
if TRUE then additional info will be printed |
invisible NULL
All information is stored in PostgreSQL database.
Piotr Smuda
## Not run:
votings_update_table(dbname, user, password, host)
## End(Not run)