# 🔬 Analysis

- See: notebooks
- Summary statistics on entire conversation
- How many participants were there?
- How many comments were submitted?
- How many comments were moderated in?
- How many comments were moderated out?

- What was the shape of the matrix, i.e., how many possible votes were there?
- How many votes were submitted in total?
- How many agrees?
- How many disagrees?
- How many passes?
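
The counting questions above can be answered from a sparse vote table. The following is a minimal sketch, not the project's actual notebook code: the `votes` dictionary, the `vote_summary` helper, and the encoding (1 = agree, -1 = disagree, 0 = pass) are all assumptions made for illustration.

```python
from collections import Counter

# Hypothetical data: (participant, comment) -> vote.
# Assumed encoding: 1 = agree, -1 = disagree, 0 = pass.
# A missing key means no vote was cast for that pair.
votes = {
    ("p1", "c1"): 1,
    ("p1", "c2"): -1,
    ("p2", "c1"): 1,
    ("p2", "c3"): 0,
    ("p3", "c2"): -1,
}


def vote_summary(votes):
    """Count participants, comments, possible votes, and votes by type."""
    participants = {p for p, _ in votes}
    comments = {c for _, c in votes}
    counts = Counter(votes.values())
    return {
        "participants": len(participants),
        "comments": len(comments),
        # Shape of the participant x comment matrix, flattened to one number.
        "possible_votes": len(participants) * len(comments),
        "total_votes": len(votes),
        "agrees": counts[1],
        "disagrees": counts[-1],
        "passes": counts[0],
    }


summary = vote_summary(votes)
```

Note that this sketch only sees participants and comments that appear in at least one vote; comments that nobody voted on would have to be supplied separately.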

- What percentage of the matrix is sparse, i.e., has no vote?
- What was the distribution of total votes by participant?
- What was the distribution of total agrees by participant?
- What was the distribution of total disagrees by participant?
- What was the mean of participant votes?
- What was the variance of participant votes?
- What was the distribution of variance per comment?
- Plot as a beeswarm
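
The sparsity and distribution questions above could be sketched as follows. This is an illustrative example under stated assumptions: the `votes` data, the helper names, and the encoding (1 = agree, -1 = disagree, 0 = pass) are hypothetical, and population variance is used only as one reasonable choice.

```python
from collections import defaultdict
from statistics import mean, pvariance

# Hypothetical data: (participant, comment) -> vote.
# Assumed encoding: 1 = agree, -1 = disagree, 0 = pass.
votes = {
    ("p1", "c1"): 1,
    ("p1", "c2"): -1,
    ("p1", "c3"): 1,
    ("p2", "c1"): 1,
    ("p2", "c2"): 1,
    ("p3", "c1"): -1,
}


def sparsity(votes):
    """Fraction of participant x comment cells that hold no vote."""
    participants = {p for p, _ in votes}
    comments = {c for _, c in votes}
    return 1 - len(votes) / (len(participants) * len(comments))


def votes_per_participant(votes, value=None):
    """Count votes per participant; if value is given, count only that vote."""
    totals = defaultdict(int)
    for (participant, _), vote in votes.items():
        if value is None or vote == value:
            totals[participant] += 1
    return dict(totals)


def variance_per_comment(votes):
    """Population variance of the votes cast on each comment."""
    by_comment = defaultdict(list)
    for (_, comment), vote in votes.items():
        by_comment[comment].append(vote)
    return {c: pvariance(vs) for c, vs in by_comment.items()}


totals = votes_per_participant(votes)
mean_votes = mean(totals.values())
agree_totals = votes_per_participant(votes, value=1)
comment_variance = variance_per_comment(votes)
```

The per-comment variances are the values a beeswarm plot would display, one point per comment. Participants with no matching votes are simply absent from the filtered counts (here `p3` has no agrees).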

- PCA on entire conversation
- Color by highest variance
- Color by vote counts

- UMAP on entire conversation
- Leiden graph-based community detection
- Facilitate running 👾 Algorithms on topical subconversations
- Listing groups of comments identified as topical subconversations
- Heatmaps on subconversations
- Discussing clusters found
- Heatmaps of comments by average vote on subconversation comments
- Show overall contentiousness of a topic, i.e., summary statistics on a subconversation (equivalently, summary statistics on a heatmap)

- Heatmaps by total disagree by cluster
- Heatmaps by total agree by cluster
- Representativeness in notebook for use on subconversations using Leiden
- I.e., add Representative Comments math to the notebook
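
The exact Representative Comments math is not spelled out here, so the following is only a simplified sketch of one plausible formulation: the ratio of a smoothed agree probability inside a group to the same probability outside it. The add-one smoothing, the `groups` mapping (for example, Leiden community labels), and all function names are assumptions.

```python
def agree_probability(agrees, seen):
    """Smoothed probability of agreeing (add-one smoothing is an assumption)."""
    return (agrees + 1) / (seen + 2)


def representativeness(votes, groups, group, comment):
    """Ratio of P(agree) inside `group` to P(agree) among everyone else."""
    in_agrees = in_seen = out_agrees = out_seen = 0
    for (participant, c), vote in votes.items():
        if c != comment or participant not in groups:
            continue
        if groups[participant] == group:
            in_seen += 1
            in_agrees += int(vote == 1)
        else:
            out_seen += 1
            out_agrees += int(vote == 1)
    return agree_probability(in_agrees, in_seen) / agree_probability(out_agrees, out_seen)


# Hypothetical data: two groups that split cleanly on comment "c1".
groups = {"p1": "A", "p2": "A", "p3": "B", "p4": "B"}
votes = {("p1", "c1"): 1, ("p2", "c1"): 1, ("p3", "c1"): -1, ("p4", "c1"): -1}

ratio_a = representativeness(votes, groups, "A", "c1")
ratio_b = representativeness(votes, groups, "B", "c1")
```

A ratio well above 1 suggests that agreeing with the comment is characteristic of the group; a ratio well below 1 suggests the opposite. A fuller treatment would also test whether the difference is statistically significant.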

- Group-informed consensus in automated report
- Group-informed consensus in notebook
- Given what we know about groups and controlling for subgroups, here are the statements we can truly say are consensus (because even a statement with 90% agreement could have a very important minority viewpoint)
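
To illustrate why overall agreement alone can mislead, here is a hedged sketch of one way to score group-informed consensus: multiply the smoothed agree probability of every group, so a statement scores high only if each group tends to agree. The scoring rule, the add-one smoothing, and the data are assumptions for illustration, not a confirmed definition.

```python
from collections import defaultdict
from math import prod


def group_informed_consensus(votes, groups, comment):
    """Product over groups of the smoothed P(agree) for one comment."""
    agrees = defaultdict(int)
    seen = defaultdict(int)
    for (participant, c), vote in votes.items():
        if c == comment and participant in groups:
            group = groups[participant]
            seen[group] += 1
            agrees[group] += int(vote == 1)
    # Groups with no votes on the comment contribute a neutral 1/2.
    return prod((agrees[g] + 1) / (seen[g] + 2) for g in set(groups.values()))


# Hypothetical data: a large group A (9 people) and a small group B (1 person).
groups = {f"a{i}": "A" for i in range(9)}
groups["b0"] = "B"

votes = {}
for i in range(9):
    votes[(f"a{i}", "c1")] = 1                    # all of A agree with c1
    votes[(f"a{i}", "c2")] = 1 if i < 7 else -1   # 7 of 9 in A agree with c2
votes[("b0", "c1")] = -1                          # B disagrees with c1
votes[("b0", "c2")] = 1                           # B agrees with c2

score_c1 = group_informed_consensus(votes, groups, "c1")
score_c2 = group_informed_consensus(votes, groups, "c2")
```

In this toy data, `c1` has 90% overall agreement but is rejected by the minority group, while `c2` has 80% overall agreement and support in both groups, so `c2` receives the higher score.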

- Tell stories over time
- Timeline with binned counts to see who participated late
- Use aggregate stats on metadata and overall agreeableness to tell a story about who comes in or does what when
- What percentage of people come back and when?
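
The timeline questions above can be approached by binning vote timestamps per day. The sketch below assumes a hypothetical `events` list of `(participant, ISO timestamp)` pairs and defines "coming back" as voting on more than one calendar day, which is only one possible definition.

```python
from collections import Counter, defaultdict
from datetime import datetime

# Hypothetical records: (participant, ISO timestamp of a vote).
events = [
    ("p1", "2024-01-01T09:00:00"),
    ("p1", "2024-01-03T10:30:00"),
    ("p2", "2024-01-01T12:00:00"),
    ("p3", "2024-01-03T18:45:00"),
]


def days_active(events):
    """Map each participant to the set of calendar days on which they voted."""
    active = defaultdict(set)
    for participant, stamp in events:
        active[participant].add(datetime.fromisoformat(stamp).date())
    return active


def first_seen_per_day(events):
    """Binned counts of new participants per day (shows who joined late)."""
    return Counter(min(days) for days in days_active(events).values())


def return_rate(events):
    """Share of participants who voted on more than one day."""
    active = days_active(events)
    returning = sum(1 for days in active.values() if len(days) > 1)
    return returning / len(active)


arrivals = first_seen_per_day(events)
rate = return_rate(events)
```

Wider bins (weeks, or hours for short conversations) only require replacing `.date()` with a different bucketing function.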

- Demographic info overlay
- Is one cluster more disagreeable than another?
- Multiple dimensions
- How do you do a Bayesian dimensionality reduction that would do what we did with jetpack, but probabilistically?
- We have the time series by participant. How well are we able to predict the next vote of the participant as the conversation proceeds given our model of existing clusters? Do we better understand people?
- Agreeableness metric:
- Ideally, probabilistic (Bayesian)
- Heuristic if need be
- What is the likelihood that someone agrees or disagrees with a given comment?
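
As one possible starting point for a probabilistic agreeableness metric, the sketch below uses a Beta-Bernoulli model: each agree or disagree updates a Beta prior over the probability of agreeing. The model choice, the uniform default prior, the decision to ignore passes, and the function name are all assumptions.

```python
def agree_posterior(agrees, disagrees, prior_agree=1.0, prior_disagree=1.0):
    """Posterior mean and variance of P(agree) under a Beta prior.

    Passes are ignored in this sketch. The default Beta(1, 1) prior is uniform.
    """
    a = prior_agree + agrees
    b = prior_disagree + disagrees
    mean = a / (a + b)
    variance = a * b / ((a + b) ** 2 * (a + b + 1))
    return mean, variance


# A participant (or a comment) with 8 agrees and 2 disagrees.
mean_seen, var_seen = agree_posterior(8, 2)

# With no votes yet, the estimate falls back to the prior.
mean_new, var_new = agree_posterior(0, 0)
```

The same function can be applied per participant (how agreeable is this person?) or per comment (how likely is a new participant to agree?). The posterior variance gives a sense of how much to trust the estimate, which a simple agree/total heuristic does not.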

- Topic clustering