Over the last three years we at Causal Map Ltd have developed a range of algorithms for causal mapping. The algorithms are published in the accompanying file cm3functions.R under an open-source license in the form of functions for the programming language R. These functions are the heart of the analysis engine in our app CausalMap3. The app itself is free to use for analysing existing causal mapping data files but it is closed source, so these functions are a reproducible window into the algorithms and can be used by anyone who wants to analyse and visualise causal mapping data without using the Causal Map app.
The main purpose is to publish and provide examples for a set of algorithms for causal mapping analysis: ways to manipulate, simplify and query causal mapping data.
If you already have causal codings e.g. in the form of an Excel sheet with at least two columns for “cause” and “effect”, these functions could be really interesting to you.
Though do note that if you have such data, you can use the Causal Map app itself for free without installing the functions.
The documentation for the functions isn’t very good yet, but if there is interest we can provide more. We are working on a training for doing causal mapping with methods which do not need the Causal Map app.
If you want to do causal coding, i.e. creating raw causal data, i.e. at minimum an ‘edge list’ of arrows between factors, these functions won’t help you. You’d have to use the Causal Map app or some other method.
When we get a chance, we will release them as a full R package. But for now you can read and download a single R script which contains the functions and all you need to use them. We are still in the process of documenting the functions. To make them easier to use, we provide this vignette which demonstrates the use of many of the functions as applied to example causal mapping data files (the script will download for you the latest version of the functions script, the main example file and a special example file which demonstrates the use of combined opposites).
This vignette is an .Rmd file: you process it using the package knitr or by pressing Knit in Rstudio. Or you can use it as a source of examples to modify for your own work in the R console.
The functions are used within Causal Map 3, a web app which provides an interface for managing causal mapping project and also enables autocoding with AI. The functions provided here do not yet include functions for using AI.
These functions completely replace the functions in the old R package CausalMapFunctions which is now deprecated. This new set of functions is much simpler.
library(tidyverse)
## Warning: package 'tidyverse' was built under R version 4.3.2
## Warning: package 'ggplot2' was built under R version 4.3.2
## Warning: package 'tibble' was built under R version 4.3.2
## Warning: package 'tidyr' was built under R version 4.3.2
## Warning: package 'readr' was built under R version 4.3.2
## Warning: package 'purrr' was built under R version 4.3.2
## Warning: package 'dplyr' was built under R version 4.3.2
## Warning: package 'stringr' was built under R version 4.3.2
## Warning: package 'forcats' was built under R version 4.3.2
## Warning: package 'lubridate' was built under R version 4.3.2
## ── Attaching core tidyverse packages ──────────────────────── tidyverse 2.0.0 ──
## ✔ dplyr 1.1.4 ✔ readr 2.1.5
## ✔ forcats 1.0.0 ✔ stringr 1.5.1
## ✔ ggplot2 3.5.0 ✔ tibble 3.2.1
## ✔ lubridate 1.9.3 ✔ tidyr 1.3.1
## ✔ purrr 1.0.2
## ── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
## ✖ dplyr::filter() masks stats::filter()
## ✖ dplyr::lag() masks stats::lag()
## ℹ Use the conflicted package (<http://conflicted.r-lib.org/>) to force all conflicts to become errors
library(shiny)
## Warning: package 'shiny' was built under R version 4.3.2
library(DiagrammeR)
## Warning: package 'DiagrammeR' was built under R version 4.3.3
library(textreadr)
knitr::opts_chunk$set(echo = T,warning=F,message=F)
htmlx <- message
if(Sys.getenv("USERDOMAIN") != "STEVE-P") {
url <- "https://www.dropbox.com/scl/fi/ulrtbj9fmgu1ltdamsvhy/example-file.xlsx?rlkey=pczexdevcxgz4if2muqwr0dng&dl=1"
download.file(url = url, destfile = "example-file.xlsx",method="curl",extra='-L')
url <- "https://www.dropbox.com/scl/fi/t9fnja78jo2zb650fesqs/opposites-example.xlsx?rlkey=t9cd4afr77te3bs98hnv6rm5v&dl=1"
download.file(url = url, destfile = "opposites-example.xlsx",method="curl",extra='-L')
url <- "https://www.dropbox.com/scl/fi/mtlaf5rmmdrgp3bxbrydz/cm3functions.R?rlkey=7hli0uaookisr9seahpzczynk&dl=1"
download.file(url = url, destfile = "cm3functions.R",method="curl",extra='-L')
}
notify <- function(message,level){
message(message)
}
# for the convenience of the developer
if(Sys.getenv("USERDOMAIN") == "STEVE-P") {
source("Rfiles/cm3functions.R")
ex="assets/import_examples/example-file.xlsx" %>%
get_from_excel()
ex$links$sentiment <- 0
ex2="assets/import_examples/opposites-example.xlsx" %>%
get_from_excel()
} else {
source("cm3functions.R")
ex="example-file.xlsx" %>%
get_from_excel()
ex2="opposites-example.xlsx" %>%
get_from_excel()
}
ex2$links$sentiment <- 0
ex$links$sentiment <- 0
# constants -----------------------------------------------------------------
contrary_color <- "#f2ab73"
ordinary_color <- "#058488"
flipchar <- "~"
Once the commands below have downloaded the data files, you can comment them out.
A causal map data file is a list of tibbles (data frames): any of links
statements
sources
questions
and settings
, but at least links
or statements
must be present.
These data frames correspond to the tabs in the corresponding Excel format. You can import or export Causal Map files as Excel (“round-tripping”).
An uncoded file might contain just a set of statements for coding.
A coded data file might contain just a set of links which have already been coded.
The additional, optional tibbles sources
and questions
provide additional meta-data about the statements and links.
All the main functions here deal only with the links table, which is of the most interest for causal mapping. The
pipe_*
functions take a links table as input and output a modified links table as output. So they can be chained indefinitely.
Each row is one chunk of text, usually one or a few paragraphs.
This table contains at least the fields statement_id
which must be unique for each row, and the field text
which contains the text of the statement to be coded.
Coding means adding rows to a table of links. Each row is one coding.
This table contains at least the fields link_id
(a unique integer), statement_id
which must correspond to a row in the statements table, and the field quote
which contains a substring of the corresponding statement.
We will inspect the links, sources and statements tables.
ex$links %>%
arrange(desc(source_count)) %>%
select(from_label,to_label,source_count,source_id,link_id,statement_id) %>%
head
## # A tibble: 6 × 6
## from_label to_label source_count source_id link_id statement_id
## <chr> <chr> <int> <chr> <dbl> <chr>
## 1 Health behaviour; Use pi… Improve… 16 MNX-2 939 MNX-2 | 244
## 2 Health behaviour; Use pi… Improve… 16 TWX-1 961 TWX-1 | 169
## 3 Health behaviour; Use pi… Improve… 16 TWX-2 964 TWX-2 | 206
## 4 Health behaviour; Use pi… Improve… 16 MSX-2 977 MSX-2 | 323
## 5 Health behaviour; Use pi… Improve… 16 MSY-1 981 MSY-1 | 363
## 6 Health behaviour; Use pi… Improve… 16 TWX-3 987 TWX-3 | 401
ex$links %>%
arrange(desc(source_count)) %>%
distinct(from_label,to_label,.keep_all = T) %>%
select(from_label,to_label,source_count,source_id,link_id,statement_id) %>%
head
## # A tibble: 6 × 6
## from_label to_label source_count source_id link_id statement_id
## <chr> <chr> <int> <chr> <dbl> <chr>
## 1 Health behaviour; Use pi… Improve… 16 MNX-2 939 MNX-2 | 244
## 2 Community groups/learning Communi… 14 MNY-1 354 MNY-1 | 70
## 3 Health behaviour; Use be… Improve… 14 MNX-2 938 MNX-2 | 244
## 4 Health behaviour; Use mo… Improve… 14 MNX-2 940 MNX-2 | 244
## 5 Farm production Income 13 MNY-1 348 MNY-1 | 62
## 6 Increased knowledge; Far… Planted… 13 TWX-1 358 TWX-1 | 88
ex$sources %>%
select(1:6) %>%
head
## # A tibble: 6 × 6
## source_id `Household code` `#Name of province` `#Name of village`
## <chr> <chr> <chr> <chr>
## 1 MNX-1 MNX1 A 1
## 2 MNX-2 MNX2 A 1
## 3 MNX-3 MNX3 A 1
## 4 MNX-4 MNX4 A 1
## 5 MNX-5 MNX5 A 1
## 6 MNX-6 MNX6 A 1
## # ℹ 2 more variables: `#Age of the main respondent` <chr>,
## # `Household size (residents > 6months)` <chr>
ex$statements %>%
select(1:6) %>%
head
## # A tibble: 6 × 6
## text source_id question_id created modified statement_id
## <chr> <chr> <chr> <chr> <chr> <chr>
## 1 This is a fully coded exa… MSY-3 $A8 2023-1… 2023-11… MSY-3 | 1
## 2 There has been a lot of c… MSY-3 $B1 2023-1… 2023-11… MSY-3 | 2
## 3 Improved MSY-3 $*B2 2023-1… 2023-11… MSY-3 | 3
## 4 There has been improvemen… MSY-3 $B3 2023-1… 2023-11… MSY-3 | 4
## 5 Yes there have been some … MSY-3 $C1 2023-1… 2023-11… MSY-3 | 5
## 6 Increased MSY-3 $*C2 2023-1… 2023-11… MSY-3 | 6
The optional sources table contains information about the sources in key-value format.
In Causal Map 3, there is no independent table for factors. Factors are implied by the link labels. The function make_factors_from_links()
shows what factors are implied by the links in the file.
ex$links %>%
make_factors_from_links() %>%
arrange(desc(source_count)) %>%
head
## # A tibble: 6 × 11
## label found found_source found_target source_count link_count in_degree
## <chr> <lgl> <lgl> <lgl> <int> <int> <int>
## 1 Improved he… FALSE FALSE FALSE 18 150 7
## 2 Diet improv… FALSE FALSE FALSE 17 51 7
## 3 Food consum… FALSE FALSE FALSE 17 37 6
## 4 Health beha… FALSE FALSE FALSE 17 19 16
## 5 Community g… FALSE FALSE FALSE 16 37 37
## 6 Farm produc… FALSE FALSE FALSE 16 158 81
## # ℹ 4 more variables: out_degree <int>, outcomeness <dbl>, flipped_from <int>,
## # flipped_to <int>
These functions use tidyverse conventions and workflows throughout.
There are no special functions for filtering the links because the tidyverse workflow is so simple:
ex$links %>%
filter(str_detect(source_id,"MNX"))
## # A tibble: 212 × 46
## from_label to_label quote link_id hashtags quote_start quote_end
## <chr> <chr> <chr> <dbl> <chr> <dbl> <dbl>
## 1 Pregnancy refers:Pregn… Unable/… Ther… 435 <NA> 1 238
## 2 Increased knowledge; F… Improve… I pl… 436 <NA> 305 476
## 3 Increased knowledge; F… Improve… I pl… 437 <NA> 305 476
## 4 Farm production Produce… [Inc… 438 <NA> 1 129
## 5 Farm production Produce… [Inc… 439 <NA> 222 513
## 6 Increased knowledge; N… Increas… Ther… 440 <NA> 1 556
## 7 Increased knowledge; F… Able to… Ther… 441 <NA> 1 287
## 8 Receives money from ch… Increas… Yes … 442 <NA> 146 332
## 9 Farm production Income [Pur… 443 <NA> 1 40
## 10 Community groups/learn… Communi… With… 444 <NA> 1 507
## # ℹ 202 more rows
## # ℹ 39 more variables: statement_code <chr>, source_count <int>,
## # link_count <int>, bundle <chr>, statement_id <chr>, text <chr>,
## # source_id <chr>, question_id <chr>, created <chr>, modified <chr>,
## # respondent_id <lgl>, merge_number <dbl>, `Household code` <chr>,
## # `#Name of province` <chr>, `#Name of village` <chr>,
## # `#Age of the main respondent` <chr>, …
ex$links %>%
filter(statement_id=="MNY-1 | 43")
## # A tibble: 2 × 46
## from_label to_label quote link_id hashtags quote_start quote_end
## <chr> <chr> <chr> <dbl> <chr> <dbl> <dbl>
## 1 Increased knowledge; He… Health … [Hea… 340 <NA> 1 151
## 2 Health behaviour Improve… [Hea… 657 <NA> 1 151
## # ℹ 39 more variables: statement_code <chr>, source_count <int>,
## # link_count <int>, bundle <chr>, statement_id <chr>, text <chr>,
## # source_id <chr>, question_id <chr>, created <chr>, modified <chr>,
## # respondent_id <lgl>, merge_number <dbl>, `Household code` <chr>,
## # `#Name of province` <chr>, `#Name of village` <chr>,
## # `#Age of the main respondent` <chr>,
## # `Household size (residents > 6months)` <chr>, …
However there are special functions for including (‘focus’) and excluding factors by label, see below.
Formatting the map is done inside the call to make_print_map2()
. The structure of the map is not changed.
In this example we make the background colour of the factors depend on their link count aka citation count aka degree. This example also demonstrates the use of the main function to actually print out a map.
ex$links %>%
pipe_zoom(1) %>%
pipe_top_links(8) %>%
make_print_map2(original = ex$links,map_color_factors_column = "link_count",map_size_links ="link_count", map_label_factors_type="link_count")
Calculating surprises is a formatting function because it does not change the structure of the map, only the labels printed on the links and factors.
ex$links %>%
pipe_zoom(1) %>%
pipe_top_links(8) %>%
pipe_label(map_label_links = "#Name of village" ,type="surprise_links") %>%
make_print_map2(original = ex$links,map_color_factors_column = "link_count",map_size_links ="link_count")
ex$links %>%
pipe_zoom(1) %>%
pipe_top_links(8) %>%
pipe_label(map_label_links = "#Name of village",type="surprise_sources" ) %>%
make_print_map2(original = ex$links,map_color_factors_column = "link_count",map_size_links ="link_count")
ex$links %>%
pipe_zoom(1) %>%
pipe_top_links(18) %>%
pipe_label(map_label_links = "#Name of village",type = "surprise_links" ) %>%
make_print_map2(original = ex$links,map_color_factors_column = "link_count",map_size_links ="source_count")
ex$links %>%
pipe_zoom(1) %>%
pipe_top_links(8) %>%
pipe_label(map_label_links = "#Name of village" ,type="surprise_links") %>%
make_print_map2(original = ex$links,map_color_factors_column = "link_count",map_size_links ="link_count",map_label_factors = "#Name of village" ,map_label_factors_type="surprise_links")
ex$links %>%
pipe_zoom(1) %>%
pipe_top_links(8) %>%
pipe_label(map_label_links = "#Name of village" ,type="surprise_sources") %>%
make_print_map2(original = ex$links,map_color_factors_column = "link_count",map_size_links ="link_count",map_label_factors = "#Name of village" ,map_label_factors_type="surprise_sources")
These functions all (potentially) transform the links table in some way, either by filtering out some links or by changing their labels in order to create a different but related graph.
The focus()
function identifies factors by complete label or part of the label (any=TRUE
).
By default, focus
shows factors one step up and down from the focused factor(s).
The function accepts a list of factor labels or parts of labels.
These code examples also show how the functions can be piped into a long chain. The pipe_*
functions work on links tables and output modified links tables. These examples include the functions pipe_zoom()
and make_print_map2()
which are dealt with later.
These examples show:
focus
is case sensitiveany
to match any part of a labelThe matched labels are highlighted.
# ex$links %>%
# pipe_zoom() %>%
# pipe_focus(c("AbilitY TO buy fOOD")) %>%
# make_print_map2(original = ex$links,map_label_factors = "source_count")
ex$links %>%
pipe_zoom() %>%
pipe_focus(c("Ability"),any = T) %>%
make_print_map2(original = ex$links,map_label_factors = "source_count")
ex$links %>%
pipe_zoom() %>%
pipe_focus(c("Ability to buy food","Diet improved"),any = T) %>%
make_print_map2(original = ex$links,map_label_factors = "source_count")
The examples:
ex$links %>%
pipe_zoom() %>%
pipe_focus(c("Ability"),any = T,up=1,down=2) %>%
make_print_map2(original = ex$links,map_label_factors = "source_count")
ex$links %>%
pipe_zoom() %>%
pipe_focus(c("Ability to buy food","Diet improved"),any = T,up=1,down=0) %>%
make_print_map2(original = ex$links,map_label_factors = "source_count")
NOTE this doesn’t seem to be working FIXME
ex$links %>%
pipe_zoom() %>%
pipe_focus(c("Ability to buy food","Diet improved"),any = T,up=1,down=0,transforms_tracing_threads = F) %>%
pipe_label(type="source_count") %>%
make_print_map2(original = ex$links,map_label_factors = "source_count")
ex$links %>%
pipe_zoom() %>%
pipe_focus(c("Ability to buy food","Diet improved"),any = T,up=1,down=0,transforms_tracing_threads = T) %>%
pipe_label(type="source_count") %>%
make_print_map2(original = ex$links,map_label_factors = "source_count")
The exclude function works in a comparable way, to exclude matching factors from the map.
The second map shows the same as the first but with some factors excluded.
ex$links %>%
pipe_zoom() %>%
pipe_top_factors(type="Links",which="Top",top=6) %>%
select(from_label,to_label,link_count,source_count) %>%
distinct %>%
head
## # A tibble: 6 × 4
## from_label to_label link_count source_count
## <chr> <chr> <int> <int>
## 1 Increased knowledge Health behaviour 23 6
## 2 Farm production Income 32 13
## 3 Increased knowledge Increased knowledge 8 6
## 4 Increased knowledge Farm production 24 11
## 5 Increased knowledge Diet improved 30 8
## 6 Diet improved Improved health 8 4
ex$links %>%
pipe_zoom() %>%
pipe_top_factors(type="Links",which="Top",top=6) %>%
pipe_exclude("Increased knowledge" %>% c("Diet improved")) %>%
select(from_label,to_label,link_count,source_count) %>%
distinct %>%
head
## # A tibble: 2 × 4
## from_label to_label link_count source_count
## <chr> <chr> <int> <int>
## 1 Farm production Income 32 13
## 2 Health behaviour Improved health 128 18
# ## just checking it works with [ ]
# links <-
# ex$links %>%
# mutate(from_label=str_replace_all(from_label,"Income","[Income]")) %>%
# mutate(to_label=str_replace_all(to_label,"Income","[Income]"))
#
#
# links %>%
# pipe_zoom() %>%
# pipe_exclude("[Income]",any = T) %>%
# pipe_top_factors(type="Links",which="Top",top=6) %>%
# select(from_label,to_label,link_count,source_count) %>%
# distinct %>%
# head
We take advantage of hierarchical coding to collapse subsidiary factors into global factors.
So for example the hierarchically coded factors fish; cod
and fish; salmon
collapse into one factor fish
.
This means that all the links to the two previous factors are routed into a single factor fish
(which may or may not already exist in the dataset).
The examples:
ex$links %>%
make_factors_from_links() %>%
arrange(desc(source_count)) %>%
select(label,source_count) %>%
head
## # A tibble: 6 × 2
## label source_count
## <chr> <int>
## 1 Improved health 18
## 2 Diet improved; Diversified 17
## 3 Food consumption quantity 17
## 4 Health behaviour; Use pits to dispose of rubbish 17
## 5 Community groups/learning 16
## 6 Farm production 16
ex$links %>%
pipe_zoom() %>%
make_factors_from_links() %>%
arrange(desc(source_count)) %>%
select(label,source_count) %>%
head
## # A tibble: 6 × 2
## label source_count
## <chr> <int>
## 1 Diet improved 19
## 2 Health behaviour 18
## 3 Improved health 18
## 4 Increased knowledge 18
## 5 Food consumption quantity 17
## 6 Community groups/learning 16
ex$links %>%
pipe_zoom() %>%
pipe_top_factors(type="Sources",which="Top",top=3) %>%
make_print_map2(ex$links)
ex$links %>%
pipe_zoom() %>%
pipe_top_factors(type="Sources",which="Top",top=3) %>%
select(from_label,to_label,link_count) %>%
distinct() %>%
head
## # A tibble: 6 × 3
## from_label to_label link_count
## <chr> <chr> <int>
## 1 Increased knowledge Health behaviour 23
## 2 Increased knowledge Increased knowledge 8
## 3 Increased knowledge Diet improved 30
## 4 Diet improved Improved health 8
## 5 Health behaviour Improved health 128
## 6 Diet improved Diet improved 1
ex$links %>%
pipe_zoom() %>%
pipe_top_factors(type="Sources",which="At_least",top=15) %>%
make_factors_from_links() %>%
arrange(desc(source_count)) %>%
head
## # A tibble: 6 × 11
## label found found_source found_target source_count link_count in_degree
## <chr> <lgl> <lgl> <lgl> <int> <int> <int>
## 1 Health beha… FALSE FALSE FALSE 18 151 128
## 2 Improved he… FALSE FALSE FALSE 18 140 0
## 3 Farm produc… FALSE FALSE FALSE 16 99 57
## 4 Increased k… FALSE FALSE FALSE 16 118 110
## 5 Improved/ne… FALSE FALSE FALSE 14 38 17
## 6 Income FALSE FALSE FALSE 13 36 3
## # ℹ 4 more variables: out_degree <int>, outcomeness <dbl>, flipped_from <int>,
## # flipped_to <int>
ex$links %>%
pipe_zoom() %>%
pipe_top_factors(type="Links",which="Top",top=12) %>%
make_factors_from_links() %>%
arrange(desc(link_count)) %>%
head
## # A tibble: 6 × 11
## label found found_source found_target source_count link_count in_degree
## <chr> <lgl> <lgl> <lgl> <int> <int> <int>
## 1 Health beha… FALSE FALSE FALSE 18 151 128
## 2 Increased k… FALSE FALSE FALSE 17 150 142
## 3 Improved he… FALSE FALSE FALSE 18 140 0
## 4 Farm produc… FALSE FALSE FALSE 16 123 58
## 5 Diet improv… FALSE FALSE FALSE 14 75 9
## 6 Planted new… FALSE FALSE FALSE 14 65 36
## # ℹ 4 more variables: out_degree <int>, outcomeness <dbl>, flipped_from <int>,
## # flipped_to <int>
ex$links %>%
pipe_zoom() %>%
pipe_top_factors(type="Links",which="At_least",top=20) %>%
make_factors_from_links() %>%
arrange(desc(link_count)) %>%
head
## # A tibble: 6 × 11
## label found found_source found_target source_count link_count in_degree
## <chr> <lgl> <lgl> <lgl> <int> <int> <int>
## 1 Increased k… FALSE FALSE FALSE 18 193 175
## 2 Health beha… FALSE FALSE FALSE 18 153 130
## 3 Improved he… FALSE FALSE FALSE 18 147 7
## 4 Farm produc… FALSE FALSE FALSE 16 135 70
## 5 Diet improv… FALSE FALSE FALSE 16 78 9
## 6 Income FALSE FALSE FALSE 14 70 32
## # ℹ 4 more variables: out_degree <int>, outcomeness <dbl>, flipped_from <int>,
## # flipped_to <int>
ex$links %>%
pipe_zoom() %>%
pipe_top_links(type="Sources",which="Top",top=2) %>%
select(from_label,to_label,link_count) %>%
distinct() %>%
head
## # A tibble: 2 × 3
## from_label to_label link_count
## <chr> <chr> <int>
## 1 Health behaviour Improved health 128
## 2 Community groups/learning Community works together 20
ex$links %>%
pipe_zoom() %>%
pipe_top_links(type="Sources",which="At_least",top=4) %>%
select(from_label,to_label,link_count,source_count) %>%
head
## # A tibble: 6 × 4
## from_label to_label link_count source_count
## <chr> <chr> <int> <int>
## 1 Increased knowledge Health behaviour 23 6
## 2 Farm production Produced enough food to eat and/o… 8 6
## 3 Income Ability to buy food 14 11
## 4 Increased knowledge Broken taboo 6 5
## 5 Farm production Income 32 13
## 6 Increased knowledge Improved household relationships 11 7
ex$links %>%
pipe_zoom() %>%
pipe_top_links(type="Links",which="Top",top=20) %>%
select(from_label,to_label,link_count,source_count) %>%
arrange(desc(link_count)) %>%
distinct() %>%
head
## # A tibble: 6 × 4
## from_label to_label link_count source_count
## <chr> <chr> <int> <int>
## 1 Health behaviour Improved health 128 18
## 2 Farm production Income 32 13
## 3 Increased knowledge Diet improved 30 8
## 4 Increased knowledge Planted new crop/vegetable variet… 29 13
## 5 Increased knowledge Farm production 24 11
## 6 Increased knowledge Health behaviour 23 6
ex$links %>%
pipe_zoom() %>%
pipe_top_links(type="Sources",which="Top",top=20) %>%
select(from_label,to_label,link_count,source_count) %>%
arrange(desc(source_count)) %>%
distinct() %>%
head
## # A tibble: 6 × 4
## from_label to_label link_count source_count
## <chr> <chr> <int> <int>
## 1 Health behaviour Improved health 128 18
## 2 Community groups/learning Community works … 20 14
## 3 Farm production Income 32 13
## 4 Increased knowledge Planted new crop… 29 13
## 5 Planted new crop/vegetable varieties Farm production 23 12
## 6 Increased knowledge Improved/new far… 21 12
ex$links %>%
pipe_zoom() %>%
pipe_trace(sess_links = ex$links,from_labels = "Farm production",to_labels="Diet improved",steps = 2,transforms_tracing_threads = F) %>%
pipe_label(type="source_id") %>%
make_print_map2(original = ex$links,map_color_factors_column = "outcomeness")
ex$links %>%
pipe_zoom() %>%
pipe_trace(sess_links = ex$links,from_labels = "Farm",to_labels="Diet",steps = 2,transforms_tracing_threads = F,any=T) %>%
pipe_label(type = "source_id") %>%
make_print_map2(original = ex$links,map_color_factors_column = "outcomeness")
ex$links %>%
pipe_zoom() %>%
pipe_trace(sess_links = ex$links,from_labels = "Farm production",to_labels="Diet improved",steps = 2,transforms_tracing_threads = T) %>%
pipe_label(type="source_id") %>%
make_print_map2(original = ex$links,map_label_factors = "source_count")
ex$links %>%
pipe_zoom() %>%
pipe_trace(sess_links = ex$links,from_labels = "Farm production",to_labels="Diet improved",steps = 2,transforms_tracing_threads = F) %>%
pipe_label(type="source_count") %>%
make_print_map2(original = ex$links,map_label_factors = "source_count")
ex$links %>%
pipe_zoom() %>%
pipe_trace(sess_links = ex$links,from_labels = "Farm production",to_labels="Diet improved",steps = 2,transforms_tracing_threads = T) %>%
pipe_label(type="source_count") %>%
make_print_map2(original = ex$links,map_label_factors = "source_count")
ex$links %>%
pipe_zoom() %>%
pipe_trace(sess_links = ex$links,from_labels = "Farm production",to_labels="Diet improved",steps = 1) %>%
make_print_map2(original = ex$links)
ex$links %>%
pipe_zoom() %>%
pipe_trace(sess_links = ex$links,from_labels = "Farm production",to_labels="Diet improved",steps = 3,transforms_tracing_threads = T) %>%
pipe_label("link_count") %>%
make_print_map2(original = ex$links,map_label_factors = "source_count")
This file doesn’t actually have factor names with brackets, but we can create some
tmp <-
ex$links %>%
mutate(from_label=str_replace_all(from_label,"Wellbeing","Wellbeing [square] (round) [[doublesquare]] ((outcome))")) %>%
mutate(to_label=str_replace_all(to_label,"Wellbeing","Wellbeing [square] (round) [[doublesquare]] ((outcome))"))
tmp %>%
pipe_zoom() %>%
pipe_focus(c("Wellbeing"),any=T) %>%
pipe_top_factors(5) %>%
make_print_map2(original = ex$links,map_label_factors = "source_count")
tmp %>%
pipe_zoom() %>%
pipe_focus(c("Wellbeing"),any=T) %>%
pipe_top_factors(5) %>%
pipe_remove_brackets(round = T) %>%
make_print_map2(original = ex$links,map_label_factors = "source_count")
ex$links %>%
pipe_collapse(c("Ability")) %>%
pipe_zoom() %>%
pipe_top_factors() %>%
make_print_map2(original = ex$links,map_label_factors = "source_count")
ex$links %>%
pipe_collapse(c("Ability to buy food","Diet improved"),separate=F) %>%
pipe_zoom() %>%
pipe_top_factors() %>%
make_print_map2(original = ex$links,map_label_factors = "source_count")
ex$links %>%
pipe_collapse(c("Increased","Farm"),separate=T) %>%
pipe_zoom() %>%
pipe_top_factors() %>%
make_print_map2(original = ex$links,map_label_factors = "source_count")
ex$links %>%
pipe_focus(xc("~"),any = T) %>%
pipe_top_factors(20) %>%
make_print_map2(ex$links)
#
# ex$links %>%
# pipe_focus(xc("~"),any = T) %>%
# pipe_top_factors(20) %>%
# make_print_map2(ex$links,map_colour_opposites_red = T)
ex$links %>%
pipe_focus(xc("~"),any = T) %>%
pipe_combine_opposites() %>%
pipe_top_factors(20) %>%
make_print_map2(ex$links)
ex2$links %>%
pipe_combine_opposites() %>%
pipe_label(type="list_unique",map_label_links = "link_id") %>%
make_print_map2(ex$links)
ex2$links %>%
make_print_map2(ex$links)
Note that labelling links is a pipe function, not part of the call to make_print_map2()
. This is because labelling links actually transforms the links table.
ex$links %>%
pipe_zoom(1) %>%
pipe_top_factors(8) %>%
make_print_map2(ex$links,map_color_factors_column = "outcomeness")
ex$links %>%
pipe_zoom(1) %>%
pipe_top_factors(8) %>%
pipe_label(type="none") %>%
make_print_map2(ex$links,map_color_factors_column = "source_count",map_size_factors = "source_count",map_label_factors = "source_count",map_size_links = "source_count",map_wrap_factor_labels = 9)
ex$links %>%
pipe_zoom(1) %>%
pipe_top_factors(8) %>%
pipe_label(type="source_count") %>%
make_print_map2(ex$links,map_color_factors_column = "source_count",map_size_factors = "source_count",map_label_factors = "source_count",map_size_links = "source_count",map_wrap_factor_labels = 9)
ex$links %>%
pipe_zoom(1) %>%
pipe_top_factors(8) %>%
pipe_label(type="link_count") %>%
make_print_map2(ex$links,map_color_factors_column = "source_count",map_size_factors = "source_count",map_label_factors = "source_count",map_size_links = "source_count",map_wrap_factor_labels = 9)
ex$links %>%
pipe_zoom(1) %>%
pipe_top_factors(8) %>%
pipe_label(type="list_unique",map_label_links = "source_id") %>%
make_print_map2(ex$links,map_color_factors_column = "source_count",map_size_factors = "source_count",map_label_factors = "source_count",map_size_links = "source_count",map_wrap_factor_labels = 9)
ex$links %>%
pipe_zoom(1) %>%
pipe_top_factors(8) %>%
pipe_label(type="list_all",map_label_links = "source_id") %>%
make_print_map2(ex$links,map_color_factors_column = "source_count",map_size_factors = "source_count",map_label_factors = "source_count",map_size_links = "source_count",map_wrap_factor_labels = 9)
ex$links %>%
pipe_zoom(1) %>%
pipe_top_factors(8) %>%
pipe_label(type="count_unique",map_label_links = "source_id") %>%
make_print_map2(ex$links,map_color_factors_column = "source_count",map_size_factors = "source_count",map_label_factors = "source_count",map_size_links = "source_count",map_wrap_factor_labels = 9)
ex$links %>%
pipe_zoom(1) %>%
pipe_top_factors(8) %>%
pipe_label(type="count_all",map_label_links = "source_id") %>%
make_print_map2(ex$links,map_color_factors_column = "source_count",map_size_factors = "source_count",map_label_factors = "source_count",map_size_links = "source_count",map_wrap_factor_labels = 9)
ex$links %>%
pipe_zoom(1) %>%
pipe_top_factors(8) %>%
pipe_label(type="list_unique",map_label_links = "hashtags") %>%
make_print_map2(ex$links,map_color_factors_column = "source_count",map_size_factors = "source_count",map_label_factors = "source_count",map_size_links = "source_count",map_wrap_factor_labels = 9)
ex$links %>%
pipe_zoom(1) %>%
pipe_top_links(8) %>%
pipe_label(map_label_links = "#Name of village" ,type="link_tally") %>%
make_print_map2(original = ex$links,map_color_factors_column = "link_count",map_size_links ="link_count")
ex$links %>%
pipe_zoom(1) %>%
pipe_top_links(18) %>%
pipe_label(map_label_links = "#Name of village",type = "link_tally" ) %>%
make_print_map2(original = ex$links,map_color_factors_column = "link_count",map_size_links ="source_count")
Most of the tables can be accessed directly
ex$links %>%
make_mentions_tabl() %>%
select(label,direction,link_id)
## # A tibble: 3,228 × 3
## label direction link_id
## <chr> <chr> <dbl>
## 1 Health behaviour consequence 340
## 2 Income consequence 341
## 3 Produced enough food to eat and/or sell consequence 342
## 4 Able to buy farming equipment/materials consequence 343
## 5 Ability to buy food consequence 344
## 6 Produced enough food to eat and/or sell consequence 345
## 7 Produced enough food to eat and/or sell consequence 346
## 8 Broken taboo consequence 347
## 9 Income consequence 348
## 10 Increased ability to borrow consequence 349
## # ℹ 3,218 more rows
ex$links %>%
make_mentions_tabl() %>%
select(label,direction,link_id) %>%
pivot_wider(names_from=direction,values_from = link_id,values_fn = length)
## # A tibble: 145 × 4
## label consequence influence either
## <chr> <int> <int> <int>
## 1 Health behaviour 6 15 21
## 2 Income 47 41 88
## 3 Produced enough food to eat and/or sell 11 9 20
## 4 Able to buy farming equipment/materials 6 NA 6
## 5 Ability to buy food 16 20 36
## 6 Broken taboo 7 8 15
## 7 Increased ability to borrow 1 1 2
## 8 Increased ability to save/increased savings 23 9 32
## 9 Improved household relationships 26 7 33
## 10 Shared household decision-making 12 9 21
## # ℹ 135 more rows
This function prepares different kinds of files.
tmp <- prepare_upload("test","assets/import_examples/example-file.xlsx")
This convenience function shows you the effect if you were to upload this file into another, appending or overwriting.
make_upload_table(ex,tmp$tmp,"Append")
## # A tibble: 3 × 5
## `-` statements sources links questions
## <chr> <chr> <chr> <chr> <chr>
## 1 Original 855 19 807 <NA>
## 2 Upload 855 19 807 70
## 3 Result Completely_new:0. - | Untouched_origi… Comple… Comp… Complete…
More examples.
make_upload_table(ex,tmp$tmp,"Overwrite")
## # A tibble: 3 × 5
## `-` statements sources links questions
## <chr> <chr> <chr> <chr> <chr>
## 1 Original 855 19 807 <NA>
## 2 Upload 855 19 807 70
## 3 Result New:855 New:19 New:807 New:70
tmp <- prepare_upload("test","assets/import_examples/example-file-just-sources.xlsx")
make_upload_table(ex,tmp$tmp,"Append")
## # A tibble: 3 × 5
## `-` statements sources links questions
## <chr> <chr> <chr> <chr> <chr>
## 1 Original 855 19 807 <NA>
## 2 Upload 0 19 0 "0"
## 3 Result The original 855 Completely_new:0. - | Untouc… The … "The ori…
make_upload_table(ex,tmp$tmp,"Overwrite")
## # A tibble: 3 × 5
## `-` statements sources links questions
## <chr> <chr> <chr> <chr> <chr>
## 1 Original 855 19 807 <NA>
## 2 Upload 0 19 0 "0"
## 3 Result The original 855 New:19 The original 807 "The original "
tmp <- prepare_upload("test","assets/import_examples/cm2.xlsx")
tmp <- prepare_upload("test","assets/import_examples/hybrid.xlsx")
tmp <- prepare_upload("test","assets/import_examples/sources-long.xlsx")
tmp <- prepare_upload("test","assets/import_examples/text_files/source_and_question_ids.docx")
tmp <- prepare_upload("test","assets/import_examples/text_files/only_source_ids.docx")
tmp <- prepare_upload("test","assets/import_examples/text_files/only_question_ids.docx")
tmp <- prepare_upload("test",
paste0("assets/import_examples/text_files/",xc("source_and_question_ids only_question_ids"),".docx")
)
# everything is broken.
# I took out save to usage db
# getting timeouts even with 3.5 sometimes
if(F){
config = configr::read.config("config.yml")$default
ex="assets/import_examples/example-file.xlsx" %>%
get_from_excel()
conn <- get_conn()
timeout(240)
ask_chat("What day is tomorrow in most science fiction?",api_key = config$open_ai_api_key,conn=conn,user="asdf",file="asdfadsf",type="typ")
ask_chat("What day is tomorrow in most science fiction?",model="3",api_key = config$open_ai_api_key,conn=conn,user="asdf",file="asdfadsf",type="typ")
ask_chat("What day is tomorrow in most science fiction?",model="gpt-4-turbo",api_key = config$open_ai_api_key,conn=conn,user="asdf",file="asdfadsf",type="typ",temperature = 0)
prompt <- readLines(paste0("assets/prompts/_prompt_0_simple.txt")) %>% collap("\n")%>% remove_lines_starting_with
statements <- ex$statements[4:9,]
auto_code(statements,start=prompt,model="3",api_key = config$open_ai_api_key,characters_per_prompt = 2000,conn=conn,convert_to_links = T,file="testing_again")
statements <- ex$statements[4:9,];auto_code(statements,start=prompt,model="3",api_key = config$open_ai_api_key,characters_per_prompt = 2000,conn=conn,convert_to_links = T,file="testing_again")
statements <- ex$statements[4:9,];auto_code(statements,start=prompt,model="4",api_key = config$open_ai_api_key,characters_per_prompt = 2000,conn=conn,convert_to_links = T,file="testing_again")
statements <- ex$statements[4:9,];auto_code(statements,start=prompt,model="gpt-4-turbo",api_key = config$open_ai_api_key,characters_per_prompt = 2000,conn=conn,convert_to_links = T,file="testing_again")
tbl(conn,"cm3links") %>% filter(file=="testing3")
auto_code(statements,start=prompt,model="4",api_key = config$open_ai_api_key,characters_per_prompt = 2000,conn=conn)
code_statements(statements,
prompt=prompt,
remove_duplicates=T, # these can happen because of the way we ask for interlocking links
model="4",
characters_per_prompt=2000,
combine_small = T,
prompts_per_batch=1,
api_key=config$open_ai_api_key,
conn=conn,user="id links",file="asdfadsf"
)
}
if(F){
config = configr::read.config("config.yml")$default
conn <- DBI::dbConnect(
drv = RMariaDB::MariaDB(),
load_data_local_infile = F,
timeout=6000,
reconnect=T,#FIXME could be dangerous
# drv = RMySQL::MySQL(max.con=1000, fetch.default.rec=1000),
idleTimeout=900000,#15 minutes
interactive_timeout=900000,#15 minutes
wait_timeout=900000,#15 minutes
dbname = config$sql_cm$dbname,
host = config$sql_cm$host,
port = config$sql_cm$port,
username = config$sql_cm$username,
password = config$sql_cm$password,
mysql=F
)
api_key <- config$open_ai_api_key
library(tictoc)
ex="assets/import_examples/example-file.xlsx" %>%
get_from_excel()
ex$statements$file <- "example-file"
q <- paste0("DELETE FROM cm3embeddings_s WHERE file='","example-file","';")
q <- paste0("DELETE FROM cm3embeddings_s_long WHERE file='","example-file","';")
dbExecute(conn,q)
ex$statements %>%
filter(source_id %in% xc("MNX-1 MNX-2")) %>%
write_and_return_embeddings(api_key = api_key,conn = conn,long=F)
ex$statements %>%
filter(source_id %in% xc("MNX-1 MNX-2")) %>%
write_and_return_embeddings(api_key = api_key,conn = conn,long=T)
ex$statements %>%
filter(source_id %in% xc("MNX-1 MNX-2")) %>%
fetch_embeddings(.,api_key=api_key,conn=conn,file="example_file",reset=F,already=NULL)
already <-
ex$statements %>%
filter(source_id %in% xc("MNX-1 MNX-2"))
statements <-
ex$statements %>%
filter(source_id %in% xc("MNX-1 MNX-2"))
fetch_embeddings(statements,api_key=api_key,conn=conn,file="example_file",reset=F,already=already)
already <-
ex$statements %>%
filter(source_id %in% xc("MNX-1"))
fetch_embeddings(statements,api_key=api_key,conn=conn,file="example_file",reset=F,already=already)
q <- paste0("DELETE FROM cm3embeddings_s_long WHERE file='","tearfund-all-2024","';")
dbExecute(conn,q)
tic()
create_new_statement_embedding_rows_long(statements = statements,api_key = api_key,conn = conn,file = "example-file",reset=F,already=already)
toc()
already <-
ex$statements %>%
filter(source_id %in% xc("MNX-1"))
tic()
create_new_statement_embedding_rows_long(statements = statements,api_key = api_key,conn = conn,file = "example-file",reset=F,already=already)
toc()
}