Causal Map 3 Functions

Over the last three years we at Causal Map Ltd have developed a range of algorithms for causal mapping. The algorithms are published in the accompanying file cm3functions.R under an open-source license in the form of functions for the programming language R. These functions are the heart of the analysis engine in our app CausalMap3. The app itself is free to use for analysing existing causal mapping data files but it is closed source, so these functions are a reproducible window into the algorithms and can be used by anyone who wants to analyse and visualise causal mapping data without using the Causal Map app.

What use are they?

The main purpose is to publish and provide examples for a set of algorithms for causal mapping analysis: ways to manipulate, simplify and query causal mapping data.

If you already have causal codings e.g. in the form of an Excel sheet with at least two columns for “cause” and “effect”, these functions could be really interesting to you.

Though do note that if you have such data, you can use the Causal Map app itself for free without installing the functions.

The documentation for the functions isn’t very good yet, but if there is interest we can provide more. We are working on a training for doing causal mapping with methods which do not need the Causal Map app.

If you want to do causal coding, i.e. creating raw causal data, i.e. at minimum an ‘edge list’ of arrows between factors, these functions won’t help you. You’d have to use the Causal Map app or some other method.

How can you use them?

When we get a chance, we will release them as a full R package. But for now you can read and download a single R script which contains the functions and all you need to use them. We are still in the process of documenting the functions. To make them easier to use, we provide this vignette which demonstrates the use of many of the functions as applied to example causal mapping data files (the script will download for you the latest version of the functions script, the main example file and a special example file which demonstrates the use of combined opposites).

This vignette is an .Rmd file: you process it using the package knitr or by pressing Knit in Rstudio. Or you can use it as a source of examples to modify for your own work in the R console.

The functions are used within Causal Map 3, a web app which provides an interface for managing causal mapping project and also enables autocoding with AI. The functions provided here do not yet include functions for using AI.

These functions completely replace the functions in the old R package CausalMapFunctions which is now deprecated. This new set of functions is much simpler.

Setup

library(tidyverse)
## Warning: package 'tidyverse' was built under R version 4.3.2
## Warning: package 'ggplot2' was built under R version 4.3.2
## Warning: package 'tibble' was built under R version 4.3.2
## Warning: package 'tidyr' was built under R version 4.3.2
## Warning: package 'readr' was built under R version 4.3.2
## Warning: package 'purrr' was built under R version 4.3.2
## Warning: package 'dplyr' was built under R version 4.3.2
## Warning: package 'stringr' was built under R version 4.3.2
## Warning: package 'forcats' was built under R version 4.3.2
## Warning: package 'lubridate' was built under R version 4.3.2
## ── Attaching core tidyverse packages ──────────────────────── tidyverse 2.0.0 ──
## ✔ dplyr     1.1.4     ✔ readr     2.1.5
## ✔ forcats   1.0.0     ✔ stringr   1.5.1
## ✔ ggplot2   3.5.0     ✔ tibble    3.2.1
## ✔ lubridate 1.9.3     ✔ tidyr     1.3.1
## ✔ purrr     1.0.2     
## ── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
## ✖ dplyr::filter() masks stats::filter()
## ✖ dplyr::lag()    masks stats::lag()
## ℹ Use the conflicted package (<http://conflicted.r-lib.org/>) to force all conflicts to become errors
library(shiny)
## Warning: package 'shiny' was built under R version 4.3.2
library(DiagrammeR)
## Warning: package 'DiagrammeR' was built under R version 4.3.3
library(textreadr)
knitr::opts_chunk$set(echo = T,warning=F,message=F)

htmlx <- message
if(Sys.getenv("USERDOMAIN") != "STEVE-P") {

url <- "https://www.dropbox.com/scl/fi/ulrtbj9fmgu1ltdamsvhy/example-file.xlsx?rlkey=pczexdevcxgz4if2muqwr0dng&dl=1"

download.file(url = url, destfile = "example-file.xlsx",method="curl",extra='-L')

url <- "https://www.dropbox.com/scl/fi/t9fnja78jo2zb650fesqs/opposites-example.xlsx?rlkey=t9cd4afr77te3bs98hnv6rm5v&dl=1"

download.file(url = url, destfile = "opposites-example.xlsx",method="curl",extra='-L')

url <- "https://www.dropbox.com/scl/fi/mtlaf5rmmdrgp3bxbrydz/cm3functions.R?rlkey=7hli0uaookisr9seahpzczynk&dl=1"

download.file(url = url, destfile = "cm3functions.R",method="curl",extra='-L')
}
notify <- function(message,level){
  message(message)
}

# for the convenience of the developer
if(Sys.getenv("USERDOMAIN") == "STEVE-P") {
source("Rfiles/cm3functions.R") 
ex="assets/import_examples/example-file.xlsx" %>%
  get_from_excel()
ex$links$sentiment <- 0
ex2="assets/import_examples/opposites-example.xlsx" %>%
  get_from_excel()
  
} else {

source("cm3functions.R") 
ex="example-file.xlsx" %>%
  get_from_excel()
ex2="opposites-example.xlsx" %>%
  get_from_excel()
} 
ex2$links$sentiment <- 0
ex$links$sentiment <- 0

# constants -----------------------------------------------------------------

contrary_color <- "#f2ab73"
ordinary_color <- "#058488"
flipchar <- "~"

Load the functions and download some example data

Once the commands below have downloaded the data files, you can comment them out.

How is causal map data structured?

A causal map data file is a list of tibbles (data frames): any of links statements sources questions and settings, but at least links or statements must be present.

These data frames correspond to the tabs in the corresponding Excel format. You can import or export Causal Map files as Excel (“round-tripping”).

An uncoded file might contain just a set of statements for coding.

A coded data file might contain just a set of links which have already been coded.

The additional, optional tibbles sources and questions provide additional meta-data about the statements and links.

All the main functions here deal only with the links table, which is of the most interest for causal mapping. The pipe_* functions take a links table as input and output a modified links table as output. So they can be chained indefinitely.

The statements table

Each row is one chunk of text, usually one or a few paragraphs. This table contains at least the fields statement_id which must be unique for each row, and the field text which contains the text of the statement to be coded.

Inspecting the example data file

We will inspect the links, sources and statements tables.

ex$links %>% 
  arrange(desc(source_count)) %>% 
  select(from_label,to_label,source_count,source_id,link_id,statement_id) %>% 
  head
## # A tibble: 6 × 6
##   from_label                to_label source_count source_id link_id statement_id
##   <chr>                     <chr>           <int> <chr>       <dbl> <chr>       
## 1 Health behaviour; Use pi… Improve…           16 MNX-2         939 MNX-2 | 244 
## 2 Health behaviour; Use pi… Improve…           16 TWX-1         961 TWX-1 | 169 
## 3 Health behaviour; Use pi… Improve…           16 TWX-2         964 TWX-2 | 206 
## 4 Health behaviour; Use pi… Improve…           16 MSX-2         977 MSX-2 | 323 
## 5 Health behaviour; Use pi… Improve…           16 MSY-1         981 MSY-1 | 363 
## 6 Health behaviour; Use pi… Improve…           16 TWX-3         987 TWX-3 | 401
ex$links %>% 
  arrange(desc(source_count)) %>% 
  distinct(from_label,to_label,.keep_all = T) %>% 
  select(from_label,to_label,source_count,source_id,link_id,statement_id) %>% 
  head
## # A tibble: 6 × 6
##   from_label                to_label source_count source_id link_id statement_id
##   <chr>                     <chr>           <int> <chr>       <dbl> <chr>       
## 1 Health behaviour; Use pi… Improve…           16 MNX-2         939 MNX-2 | 244 
## 2 Community groups/learning Communi…           14 MNY-1         354 MNY-1 | 70  
## 3 Health behaviour; Use be… Improve…           14 MNX-2         938 MNX-2 | 244 
## 4 Health behaviour; Use mo… Improve…           14 MNX-2         940 MNX-2 | 244 
## 5 Farm production           Income             13 MNY-1         348 MNY-1 | 62  
## 6 Increased knowledge; Far… Planted…           13 TWX-1         358 TWX-1 | 88
ex$sources %>% 
  select(1:6) %>% 
  head
## # A tibble: 6 × 6
##   source_id `Household code` `#Name of province` `#Name of village`
##   <chr>     <chr>            <chr>               <chr>             
## 1 MNX-1     MNX1             A                   1                 
## 2 MNX-2     MNX2             A                   1                 
## 3 MNX-3     MNX3             A                   1                 
## 4 MNX-4     MNX4             A                   1                 
## 5 MNX-5     MNX5             A                   1                 
## 6 MNX-6     MNX6             A                   1                 
## # ℹ 2 more variables: `#Age of the main respondent` <chr>,
## #   `Household size (residents > 6months)` <chr>
ex$statements %>% 
  select(1:6) %>% 
  head
## # A tibble: 6 × 6
##   text                       source_id question_id created modified statement_id
##   <chr>                      <chr>     <chr>       <chr>   <chr>    <chr>       
## 1 This is a fully coded exa… MSY-3     $A8         2023-1… 2023-11… MSY-3 | 1   
## 2 There has been a lot of c… MSY-3     $B1         2023-1… 2023-11… MSY-3 | 2   
## 3 Improved                   MSY-3     $*B2        2023-1… 2023-11… MSY-3 | 3   
## 4 There has been improvemen… MSY-3     $B3         2023-1… 2023-11… MSY-3 | 4   
## 5 Yes there have been some … MSY-3     $C1         2023-1… 2023-11… MSY-3 | 5   
## 6 Increased                  MSY-3     $*C2        2023-1… 2023-11… MSY-3 | 6

The sources table

The optional sources table contains information about the sources in key-value format.

Factors table

In Causal Map 3, there is no independent table for factors. Factors are implied by the link labels. The function make_factors_from_links() shows what factors are implied by the links in the file.

ex$links %>% 
  make_factors_from_links() %>% 
  arrange(desc(source_count)) %>% 
  head
## # A tibble: 6 × 11
##   label        found found_source found_target source_count link_count in_degree
##   <chr>        <lgl> <lgl>        <lgl>               <int>      <int>     <int>
## 1 Improved he… FALSE FALSE        FALSE                  18        150         7
## 2 Diet improv… FALSE FALSE        FALSE                  17         51         7
## 3 Food consum… FALSE FALSE        FALSE                  17         37         6
## 4 Health beha… FALSE FALSE        FALSE                  17         19        16
## 5 Community g… FALSE FALSE        FALSE                  16         37        37
## 6 Farm produc… FALSE FALSE        FALSE                  16        158        81
## # ℹ 4 more variables: out_degree <int>, outcomeness <dbl>, flipped_from <int>,
## #   flipped_to <int>

Formatting the map

Formatting the map is done inside the call to make_print_map2(). The structure of the map is not changed.

Factor labels

In this example we make the background colour of the factors depend on their link count aka citation count aka degree. This example also demonstrates the use of the main function to actually print out a map.

ex$links %>% 
  pipe_zoom(1) %>% 
  pipe_top_links(8) %>% 
  make_print_map2(original = ex$links,map_color_factors_column = "link_count",map_size_links ="link_count", map_label_factors_type="link_count")

Surprises

Calculating surprises is a formatting function because it does not change the structure of the map, only the labels printed on the links and factors.

Surprise factors

ex$links %>% 
  pipe_zoom(1) %>% 
  pipe_top_links(8) %>% 
  pipe_label(map_label_links = "#Name of village" ,type="surprise_links") %>% 
  make_print_map2(original = ex$links,map_color_factors_column = "link_count",map_size_links ="link_count",map_label_factors = "#Name of village" ,map_label_factors_type="surprise_links")
ex$links %>% 
  pipe_zoom(1) %>% 
  pipe_top_links(8) %>% 
  pipe_label(map_label_links = "#Name of village" ,type="surprise_sources") %>% 
  make_print_map2(original = ex$links,map_color_factors_column = "link_count",map_size_links ="link_count",map_label_factors = "#Name of village" ,map_label_factors_type="surprise_sources")

All the transforms functions

These functions all (potentially) transform the links table in some way, either by filtering out some links or by changing their labels in order to create a different but related graph.

Focus

The focus() function identifies factors by complete label or part of the label (any=TRUE).

By default, focus shows factors one step up and down from the focused factor(s).

The function accepts a list of factor labels or parts of labels.

These code examples also show how the functions can be piped into a long chain. The pipe_* functions work on links tables and output modified links tables. These examples include the functions pipe_zoom() and make_print_map2() which are dealt with later.

These examples show:

  • focus is case sensitive
  • you can use the argument any to match any part of a label
  • you can use multiple search strings

The matched labels are highlighted.

# ex$links %>% 
#   pipe_zoom() %>%
#   pipe_focus(c("AbilitY TO buy fOOD")) %>%  
#   make_print_map2(original = ex$links,map_label_factors = "source_count")
  
ex$links %>% 
  pipe_zoom() %>%
  pipe_focus(c("Ability"),any = T) %>%  
  make_print_map2(original = ex$links,map_label_factors = "source_count")
ex$links %>% 
  pipe_zoom() %>%
  pipe_focus(c("Ability to buy food","Diet improved"),any = T) %>%  
  make_print_map2(original = ex$links,map_label_factors = "source_count")

Focus more than 1 step

The examples:

  • One step up and two steps down
  • One step up and zero steps down
ex$links %>% 
  pipe_zoom() %>%
  pipe_focus(c("Ability"),any = T,up=1,down=2) %>%  
  make_print_map2(original = ex$links,map_label_factors = "source_count")
ex$links %>% 
  pipe_zoom() %>%
  pipe_focus(c("Ability to buy food","Diet improved"),any = T,up=1,down=0) %>%  
  make_print_map2(original = ex$links,map_label_factors = "source_count")

Tracing threads

NOTE this doesn’t seem to be working FIXME

ex$links %>% 
  pipe_zoom() %>%
  pipe_focus(c("Ability to buy food","Diet improved"),any = T,up=1,down=0,transforms_tracing_threads = F) %>%  
  pipe_label(type="source_count") %>% 
  make_print_map2(original = ex$links,map_label_factors = "source_count")
ex$links %>% 
  pipe_zoom() %>%
  pipe_focus(c("Ability to buy food","Diet improved"),any = T,up=1,down=0,transforms_tracing_threads = T) %>%  
  pipe_label(type="source_count") %>% 
  make_print_map2(original = ex$links,map_label_factors = "source_count")

Exclude

The exclude function works in a comparable way, to exclude matching factors from the map.

The second map shows the same as the first but with some factors excluded.

ex$links %>% 
  pipe_zoom() %>%
  pipe_top_factors(type="Links",which="Top",top=6) %>%
  select(from_label,to_label,link_count,source_count) %>% 
  distinct %>% 
  head
## # A tibble: 6 × 4
##   from_label          to_label            link_count source_count
##   <chr>               <chr>                    <int>        <int>
## 1 Increased knowledge Health behaviour            23            6
## 2 Farm production     Income                      32           13
## 3 Increased knowledge Increased knowledge          8            6
## 4 Increased knowledge Farm production             24           11
## 5 Increased knowledge Diet improved               30            8
## 6 Diet improved       Improved health              8            4
ex$links %>% 
  pipe_zoom() %>%
  pipe_top_factors(type="Links",which="Top",top=6) %>%
  pipe_exclude("Increased knowledge" %>% c("Diet improved")) %>% 
  select(from_label,to_label,link_count,source_count) %>% 
  distinct %>% 
  head
## # A tibble: 2 × 4
##   from_label       to_label        link_count source_count
##   <chr>            <chr>                <int>        <int>
## 1 Farm production  Income                  32           13
## 2 Health behaviour Improved health        128           18
# ## just checking it works with [ ]
# links <- 
#   ex$links %>% 
#   mutate(from_label=str_replace_all(from_label,"Income","[Income]")) %>% 
#   mutate(to_label=str_replace_all(to_label,"Income","[Income]"))
# 
# 
# links %>% 
#   pipe_zoom() %>%
#   pipe_exclude("[Income]",any = T) %>% 
#   pipe_top_factors(type="Links",which="Top",top=6) %>%
#   select(from_label,to_label,link_count,source_count) %>% 
#   distinct %>% 
#   head

Hashtags

The hashtags function works in a comparable way, to include or exclude matching links to the map.

The second map shows the same as the first but with some factors excluded.

ex$links %>% 
  pipe_zoom() %>%
  pipe_retain_hashtags(hashtags = xc("#hypothetical #doubtful"),keep=T)  %>% 
  select(from_label,to_label,link_count,source_count,hashtags) 
## # A tibble: 6 × 5
##   from_label                      to_label      link_count source_count hashtags
##   <chr>                           <chr>              <int>        <int> <chr>   
## 1 Increased knowledge             Improved hea…          1            1 #hypoth…
## 2 Organisation 1                  Access to fe…          1            1 #hypoth…
## 3 Broken taboo                    Diet improved          1            1 #hypoth…
## 4 Improved/new farming techniques Farm product…          1            1 #hypoth…
## 5 Income                          Ability to b…          1            1 #hypoth…
## 6 Increased knowledge             Planted new …          1            1 #hypoth…
ex$links %>% 
  pipe_zoom() %>%
  pipe_retain_hashtags(hashtags = xc("#doubtful"),keep=T)  %>% 
  select(from_label,to_label,link_count,source_count,hashtags) 
## # A tibble: 2 × 5
##   from_label                      to_label      link_count source_count hashtags
##   <chr>                           <chr>              <int>        <int> <chr>   
## 1 Improved/new farming techniques Farm product…          1            1 #hypoth…
## 2 Income                          Ability to b…          1            1 #hypoth…
ex$links %>% 
  pipe_zoom() %>%
  pipe_retain_hashtags(hashtags = "#hypothetical",keep=T)  %>% 
  pipe_label("hashtags") %>% 
  make_print_map2(ex$links)
ex$links %>% 
  pipe_zoom() %>%
  select(from_label,to_label,link_count,source_count,hashtags) 
## # A tibble: 807 × 5
##    from_label                          to_label link_count source_count hashtags
##    <chr>                               <chr>         <int>        <int> <chr>   
##  1 Increased knowledge                 Health …         23            6 <NA>    
##  2 Planted new crop/vegetable varieti… Income            3            3 <NA>    
##  3 Farm production                     Produce…          8            6 <NA>    
##  4 Receives money from gov             Able to…          1            1 <NA>    
##  5 Income                              Ability…         14           11 <NA>    
##  6 Improved/new farming techniques     Produce…          2            1 <NA>    
##  7 Improved/new farming techniques     Produce…          2            1 <NA>    
##  8 Increased knowledge                 Broken …          6            5 <NA>    
##  9 Farm production                     Income           32           13 <NA>    
## 10 Increased ability to save/increase… Increas…          1            1 <NA>    
## # ℹ 797 more rows
ex$links %>% 
  pipe_zoom() %>%
  pipe_retain_hashtags(hashtags = "#doubtful",keep=F)  %>% 
  select(from_label,to_label,link_count,source_count,hashtags) 
## # A tibble: 805 × 5
##    from_label                          to_label link_count source_count hashtags
##    <chr>                               <chr>         <int>        <int> <chr>   
##  1 Increased knowledge                 Health …         23            6 <NA>    
##  2 Planted new crop/vegetable varieti… Income            3            3 <NA>    
##  3 Farm production                     Produce…          8            6 <NA>    
##  4 Receives money from gov             Able to…          1            1 <NA>    
##  5 Income                              Ability…         13           11 <NA>    
##  6 Improved/new farming techniques     Produce…          2            1 <NA>    
##  7 Improved/new farming techniques     Produce…          2            1 <NA>    
##  8 Increased knowledge                 Broken …          6            5 <NA>    
##  9 Farm production                     Income           32           13 <NA>    
## 10 Increased ability to save/increase… Increas…          1            1 <NA>    
## # ℹ 795 more rows
ex$links %>% 
  pipe_zoom() %>%
  pipe_retain_hashtags(hashtags = xc("#hypothetical #doubtful"),keep=F)  %>% 
  select(from_label,to_label,link_count,source_count,hashtags) 
## # A tibble: 801 × 5
##    from_label                          to_label link_count source_count hashtags
##    <chr>                               <chr>         <int>        <int> <chr>   
##  1 Increased knowledge                 Health …         23            6 <NA>    
##  2 Planted new crop/vegetable varieti… Income            3            3 <NA>    
##  3 Farm production                     Produce…          8            6 <NA>    
##  4 Receives money from gov             Able to…          1            1 <NA>    
##  5 Income                              Ability…         13           11 <NA>    
##  6 Improved/new farming techniques     Produce…          2            1 <NA>    
##  7 Improved/new farming techniques     Produce…          2            1 <NA>    
##  8 Increased knowledge                 Broken …          6            5 <NA>    
##  9 Farm production                     Income           32           13 <NA>    
## 10 Increased ability to save/increase… Increas…          1            1 <NA>    
## # ℹ 791 more rows

AND

ex$links %>% 
  pipe_zoom() %>%
  pipe_retain_hashtags(hashtags = xc("#hypothetical #doubtful"),or=F,keep=T)  %>% 
  select(from_label,to_label,link_count,source_count,hashtags) 
## # A tibble: 2 × 5
##   from_label                      to_label      link_count source_count hashtags
##   <chr>                           <chr>              <int>        <int> <chr>   
## 1 Improved/new farming techniques Farm product…          1            1 #hypoth…
## 2 Income                          Ability to b…          1            1 #hypoth…
ex$links %>% 
  pipe_zoom() %>%
  pipe_retain_hashtags(hashtags = xc("#hypothetical #doubtful"),or=F,keep=F)  %>% 
  select(from_label,to_label,link_count,source_count,hashtags) 
## # A tibble: 805 × 5
##    from_label                          to_label link_count source_count hashtags
##    <chr>                               <chr>         <int>        <int> <chr>   
##  1 Increased knowledge                 Health …         23            6 <NA>    
##  2 Planted new crop/vegetable varieti… Income            3            3 <NA>    
##  3 Farm production                     Produce…          8            6 <NA>    
##  4 Receives money from gov             Able to…          1            1 <NA>    
##  5 Income                              Ability…         13           11 <NA>    
##  6 Improved/new farming techniques     Produce…          2            1 <NA>    
##  7 Improved/new farming techniques     Produce…          2            1 <NA>    
##  8 Increased knowledge                 Broken …          6            5 <NA>    
##  9 Farm production                     Income           32           13 <NA>    
## 10 Increased ability to save/increase… Increas…          1            1 <NA>    
## # ℹ 795 more rows

Zoooming

We take advantage of hierarchical coding to collapse subsidiary factors into global factors. So for example the hierarchically coded factors fish; cod and fish; salmon collapse into one factor fish. This means that all the links to the two previous factors are routed into a single factor fish (which may or may not already exist in the dataset).

The examples:

  • The top raw factors
  • The top factors after zooming
ex$links %>% 
  make_factors_from_links() %>% 
  arrange(desc(source_count)) %>% 
  select(label,source_count) %>% 
  head
## # A tibble: 6 × 2
##   label                                            source_count
##   <chr>                                                   <int>
## 1 Improved health                                            18
## 2 Diet improved; Diversified                                 17
## 3 Food consumption quantity                                  17
## 4 Health behaviour; Use pits to dispose of rubbish           17
## 5 Community groups/learning                                  16
## 6 Farm production                                            16
ex$links %>% 
  pipe_zoom() %>%
  make_factors_from_links() %>% 
  arrange(desc(source_count)) %>% 
  select(label,source_count) %>% 
  head
## # A tibble: 6 × 2
##   label                     source_count
##   <chr>                            <int>
## 1 Diet improved                       19
## 2 Health behaviour                    18
## 3 Improved health                     18
## 4 Increased knowledge                 18
## 5 Food consumption quantity           17
## 6 Community groups/learning           16

Tracing paths

Tracing paths, showing source IDs

ex$links %>% 
  pipe_zoom() %>%
  pipe_trace(sess_links = ex$links,from_labels = "Farm production",to_labels="Diet improved",steps = 2,transforms_tracing_threads = F) %>%  
  pipe_label(type="source_id") %>% 
  make_print_map2(original = ex$links,map_color_factors_column = "outcomeness")

Tracing paths, showing source IDs, match anywhere

ex$links %>% 
  pipe_zoom() %>%
  pipe_trace(sess_links = ex$links,from_labels = "Farm",to_labels="Diet",steps = 2,transforms_tracing_threads = F,any=T) %>%  
  pipe_label(type = "source_id") %>% 
  make_print_map2(original = ex$links,map_color_factors_column = "outcomeness")

Tracing threads, showing source IDs

ex$links %>% 
  pipe_zoom() %>%
  pipe_trace(sess_links = ex$links,from_labels = "Farm production",to_labels="Diet improved",steps = 2,transforms_tracing_threads = T) %>%  
  pipe_label(type="source_id") %>% 
  make_print_map2(original = ex$links,map_label_factors = "source_count")

Tracing paths, showing source counts

ex$links %>% 
  pipe_zoom() %>%
  pipe_trace(sess_links = ex$links,from_labels = "Farm production",to_labels="Diet improved",steps = 2,transforms_tracing_threads = F) %>%  
  pipe_label(type="source_count") %>% 
  make_print_map2(original = ex$links,map_label_factors = "source_count")

Tracing threads, showing source counts

ex$links %>% 
  pipe_zoom() %>%
  pipe_trace(sess_links = ex$links,from_labels = "Farm production",to_labels="Diet improved",steps = 2,transforms_tracing_threads = T) %>%  
  pipe_label(type="source_count") %>% 
  make_print_map2(original = ex$links,map_label_factors = "source_count")

Tracing just one step

ex$links %>% 
  pipe_zoom() %>%
  pipe_trace(sess_links = ex$links,from_labels = "Farm production",to_labels="Diet improved",steps = 1) %>%  
  make_print_map2(original = ex$links)

Tracing threads, three steps

ex$links %>% 
  pipe_zoom() %>%
  pipe_trace(sess_links = ex$links,from_labels = "Farm production",to_labels="Diet improved",steps = 3,transforms_tracing_threads = T) %>%  
  pipe_label("link_count") %>% 
  make_print_map2(original = ex$links,map_label_factors = "source_count")

Removing brackets

This file doesn’t actually have factor names with brackets, but we can create some

tmp <- 
  ex$links %>% 
  mutate(from_label=str_replace_all(from_label,"Wellbeing","Wellbeing [square] (round) [[doublesquare]] ((outcome))")) %>%
  mutate(to_label=str_replace_all(to_label,"Wellbeing","Wellbeing [square] (round) [[doublesquare]] ((outcome))"))

tmp %>% 
  pipe_zoom() %>%
  pipe_focus(c("Wellbeing"),any=T) %>%  
  pipe_top_factors(5) %>% 
  make_print_map2(original = ex$links,map_label_factors = "source_count")
tmp %>% 
  pipe_zoom() %>%
  pipe_focus(c("Wellbeing"),any=T) %>%  
  pipe_top_factors(5) %>% 
  pipe_remove_brackets(round = T) %>% 
  make_print_map2(original = ex$links,map_label_factors = "source_count")

Collapse

ex$links %>% 
  pipe_collapse(c("Ability")) %>%  
  pipe_zoom() %>%
  pipe_top_factors() %>%
  make_print_map2(original = ex$links,map_label_factors = "source_count")
ex$links %>% 
  pipe_collapse(c("Ability to buy food","Diet improved"),separate=F) %>%  
  pipe_zoom() %>%
  pipe_top_factors() %>%
  make_print_map2(original = ex$links,map_label_factors = "source_count")
ex$links %>% 
  pipe_collapse(c("Increased","Farm"),separate=T) %>%  
  pipe_zoom() %>%
  pipe_top_factors() %>%
  make_print_map2(original = ex$links,map_label_factors = "source_count")

Combining opposites

ex$links %>% 
  pipe_focus(xc("~"),any = T) %>% 
  pipe_top_factors(20) %>% 
  make_print_map2(ex$links)
# 
# ex$links %>% 
#   pipe_focus(xc("~"),any = T) %>% 
#   pipe_top_factors(20) %>% 
#   make_print_map2(ex$links,map_colour_opposites_red = T)

ex$links %>% 
  pipe_focus(xc("~"),any = T) %>% 
  pipe_combine_opposites() %>% 
  pipe_top_factors(20) %>% 
  make_print_map2(ex$links)
ex2$links %>% 
  pipe_combine_opposites() %>%
  pipe_label(type="list_unique",map_label_links = "link_id") %>% 
  make_print_map2(ex$links)
ex2$links %>% 
  make_print_map2(ex$links)

Creating additional tables

Most of the tables can be accessed directly

Mentions table

ex$links %>% 
  make_mentions_tabl() %>% 
  select(label,direction,link_id)
## # A tibble: 3,228 × 3
##    label                                   direction   link_id
##    <chr>                                   <chr>         <dbl>
##  1 Health behaviour                        consequence     340
##  2 Income                                  consequence     341
##  3 Produced enough food to eat and/or sell consequence     342
##  4 Able to buy farming equipment/materials consequence     343
##  5 Ability to buy food                     consequence     344
##  6 Produced enough food to eat and/or sell consequence     345
##  7 Produced enough food to eat and/or sell consequence     346
##  8 Broken taboo                            consequence     347
##  9 Income                                  consequence     348
## 10 Increased ability to borrow             consequence     349
## # ℹ 3,218 more rows
ex$links %>% 
  make_mentions_tabl() %>% 
  select(label,direction,link_id) %>% 
  pivot_wider(names_from=direction,values_from = link_id,values_fn = length)
## # A tibble: 145 × 4
##    label                                       consequence influence either
##    <chr>                                             <int>     <int>  <int>
##  1 Health behaviour                                      6        15     21
##  2 Income                                               47        41     88
##  3 Produced enough food to eat and/or sell              11         9     20
##  4 Able to buy farming equipment/materials               6        NA      6
##  5 Ability to buy food                                  16        20     36
##  6 Broken taboo                                          7         8     15
##  7 Increased ability to borrow                           1         1      2
##  8 Increased ability to save/increased savings          23         9     32
##  9 Improved household relationships                     26         7     33
## 10 Shared household decision-making                     12         9     21
## # ℹ 135 more rows

Creating files

This function prepares different kinds of files.

tmp <- prepare_upload("test","assets/import_examples/example-file.xlsx")

This convenience function shows you the effect if you were to upload this file into another, appending or overwriting.

make_upload_table(ex,tmp$tmp,"Append")
## # A tibble: 3 × 5
##   `-`      statements                                    sources links questions
##   <chr>    <chr>                                         <chr>   <chr> <chr>    
## 1 Original 855                                           19      807   <NA>     
## 2 Upload   855                                           19      807   70       
## 3 Result   Completely_new:0.     -    | Untouched_origi… Comple… Comp… Complete…

More examples.

make_upload_table(ex,tmp$tmp,"Overwrite")
## # A tibble: 3 × 5
##   `-`      statements sources links   questions
##   <chr>    <chr>      <chr>   <chr>   <chr>    
## 1 Original 855        19      807     <NA>     
## 2 Upload   855        19      807     70       
## 3 Result   New:855    New:19  New:807 New:70
tmp <- prepare_upload("test","assets/import_examples/example-file-just-sources.xlsx")
make_upload_table(ex,tmp$tmp,"Append")
## # A tibble: 3 × 5
##   `-`      statements       sources                              links questions
##   <chr>    <chr>            <chr>                                <chr> <chr>    
## 1 Original 855              19                                   807    <NA>    
## 2 Upload   0                19                                   0     "0"      
## 3 Result   The original 855 Completely_new:0.     -    | Untouc… The … "The ori…
make_upload_table(ex,tmp$tmp,"Overwrite")
## # A tibble: 3 × 5
##   `-`      statements       sources links            questions      
##   <chr>    <chr>            <chr>   <chr>            <chr>          
## 1 Original 855              19      807               <NA>          
## 2 Upload   0                19      0                "0"            
## 3 Result   The original 855 New:19  The original 807 "The original "
tmp <- prepare_upload("test","assets/import_examples/cm2.xlsx")
tmp <- prepare_upload("test","assets/import_examples/hybrid.xlsx")
tmp <- prepare_upload("test","assets/import_examples/sources-long.xlsx")
tmp <- prepare_upload("test","assets/import_examples/text_files/source_and_question_ids.docx")
tmp <- prepare_upload("test","assets/import_examples/text_files/only_source_ids.docx")
tmp <- prepare_upload("test","assets/import_examples/text_files/only_question_ids.docx")
tmp <- prepare_upload("test",
  paste0("assets/import_examples/text_files/",xc("source_and_question_ids only_question_ids"),".docx")
  )

Autocoding

# everything is broken. 
# I took out save to usage db 
# getting timeouts even with 3.5 sometimes

if(F){
config = configr::read.config("config.yml")$default
  ex="assets/import_examples/example-file.xlsx" %>%
  get_from_excel()

conn <- get_conn()
timeout(240)

ask_chat("What day is tomorrow in most science fiction?",api_key = config$open_ai_api_key,conn=conn,user="asdf",file="asdfadsf",type="typ")

ask_chat("What day is tomorrow in most science fiction?",model="3",api_key = config$open_ai_api_key,conn=conn,user="asdf",file="asdfadsf",type="typ")

ask_chat("What day is tomorrow in most science fiction?",model="gpt-4-turbo",api_key = config$open_ai_api_key,conn=conn,user="asdf",file="asdfadsf",type="typ",temperature = 0)

prompt <- readLines(paste0("assets/prompts/_prompt_0_simple.txt")) %>% collap("\n")%>% remove_lines_starting_with
statements <- ex$statements[4:9,]

auto_code(statements,start=prompt,model="3",api_key = config$open_ai_api_key,characters_per_prompt = 2000,conn=conn,convert_to_links = T,file="testing_again")

statements <- ex$statements[4:9,];auto_code(statements,start=prompt,model="3",api_key = config$open_ai_api_key,characters_per_prompt = 2000,conn=conn,convert_to_links = T,file="testing_again")


statements <- ex$statements[4:9,];auto_code(statements,start=prompt,model="4",api_key = config$open_ai_api_key,characters_per_prompt = 2000,conn=conn,convert_to_links = T,file="testing_again")


statements <- ex$statements[4:9,];auto_code(statements,start=prompt,model="gpt-4-turbo",api_key = config$open_ai_api_key,characters_per_prompt = 2000,conn=conn,convert_to_links = T,file="testing_again")





tbl(conn,"cm3links") %>% filter(file=="testing3")

auto_code(statements,start=prompt,model="4",api_key = config$open_ai_api_key,characters_per_prompt = 2000,conn=conn)




code_statements(statements,
                            prompt=prompt,
                            remove_duplicates=T, # these can happen because of the way we ask for interlocking links
                            model="4",
                            characters_per_prompt=2000,
                            combine_small = T,
                            prompts_per_batch=1,
                           api_key=config$open_ai_api_key,
                            conn=conn,user="id links",file="asdfadsf"
                            )
}

Embeddings

if(F){
config = configr::read.config("config.yml")$default
conn <- DBI::dbConnect(
  drv = RMariaDB::MariaDB(),
  load_data_local_infile = F,
  timeout=6000,
  reconnect=T,#FIXME could be dangerous
  # drv = RMySQL::MySQL(max.con=1000, fetch.default.rec=1000),
  idleTimeout=900000,#15 minutes
  interactive_timeout=900000,#15 minutes
  wait_timeout=900000,#15 minutes
  dbname = config$sql_cm$dbname,
  host = config$sql_cm$host,
  port = config$sql_cm$port,
  username = config$sql_cm$username,
  password = config$sql_cm$password,
  mysql=F
)
api_key <- config$open_ai_api_key

library(tictoc)    
ex="assets/import_examples/example-file.xlsx" %>%
  get_from_excel()
ex$statements$file <- "example-file"

q <- paste0("DELETE FROM cm3embeddings_s WHERE file='","example-file","';")
q <- paste0("DELETE FROM cm3embeddings_s_long WHERE file='","example-file","';")
dbExecute(conn,q)

ex$statements %>% 
    filter(source_id %in% xc("MNX-1 MNX-2")) %>%  
    write_and_return_embeddings(api_key = api_key,conn = conn,long=F)
ex$statements %>% 
    filter(source_id %in% xc("MNX-1 MNX-2")) %>%  
    write_and_return_embeddings(api_key = api_key,conn = conn,long=T)

ex$statements %>% 
    filter(source_id %in% xc("MNX-1 MNX-2")) %>%  
    fetch_embeddings(.,api_key=api_key,conn=conn,file="example_file",reset=F,already=NULL)



already <- 
  ex$statements %>% 
    filter(source_id %in% xc("MNX-1 MNX-2")) 
statements <- 
  ex$statements %>% 
    filter(source_id %in% xc("MNX-1 MNX-2")) 

fetch_embeddings(statements,api_key=api_key,conn=conn,file="example_file",reset=F,already=already)

already <- 
  ex$statements %>% 
    filter(source_id %in% xc("MNX-1")) 

fetch_embeddings(statements,api_key=api_key,conn=conn,file="example_file",reset=F,already=already)



q <- paste0("DELETE FROM cm3embeddings_s_long WHERE file='","tearfund-all-2024","';")
dbExecute(conn,q)

  

tic()
create_new_statement_embedding_rows_long(statements = statements,api_key = api_key,conn = conn,file = "example-file",reset=F,already=already)  
toc()

already <- 
  ex$statements %>% 
    filter(source_id %in% xc("MNX-1")) 
tic()
create_new_statement_embedding_rows_long(statements = statements,api_key = api_key,conn = conn,file = "example-file",reset=F,already=already)  
toc()




}