Simen Bjerkelund

Research Trends Radar

research
ai-agent
finance
Bi-weekly arXiv scan across data science, finance, and social sciences — top-5 papers per area, topic timeline, and an interactive mind graph.
Published: May 19, 2026

A simple overview of trending topics in data science, finance, and social sciences — top-5 arXiv papers per area, refreshed every two weeks.

Topic timeline

How often each topic appears across runs. Bigger dots = more papers in that topic that fortnight.

How each topic’s rank shifts across runs — rank 1 = most papers that fortnight.
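To make the ranking rule concrete, here is a minimal base-R sketch of how per-run ranks behave when two topics tie. The topic names and counts are made up for illustration; only the `rank(-n_papers, ties.method = "min")` call mirrors the page source.

```r
# Toy paper counts for a single run (hypothetical topics and numbers).
n_papers <- c(agents = 9, quantization = 6, alignment = 6, graphs = 2)

# Negate the counts so the largest count gets rank 1.
# ties.method = "min" gives tied topics the same (best) rank
# and skips the rank that would otherwise follow.
topic_rank <- rank(-n_papers, ties.method = "min")
print(topic_rank)
```

With these toy counts the two tied topics share rank 2 and the next topic gets rank 4, so lines in the rank chart can overlap when counts are equal.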

Data Science — top 5

#1 — Known By Their Actions: Fingerprinting LLM Browser Agents via UI Traces

William Lugoloobi

LLM agent security and interpretability

Reveals a critical security vulnerability in LLM-based agents through passive UI tracking, achieving 96% F1 across frontier models; directly relevant to the trustworthiness of deployed agents.

arXiv

#2 — GPart: End-to-End Isometric Fine-Tuning via Global Parameter Partitioning

Paolo Mandica

Efficient LLM training and adaptation

Addresses fundamental limitation in LoRA fine-tuning by achieving end-to-end isometry in parameter space, advancing parameter-efficient LLM adaptation at scale.

arXiv

#3 — XFP: Quality-Targeted Adaptive Codebook Quantization with Sparse Outlier Separation for LLM Inference

Thomas Witt

Efficient LLM inference and quantization

Introduces quality-targeted dynamic quantization for LLM inference with automatic hyperparameter selection, enabling practical sub-byte compression without calibration.

arXiv

#4 — GFMate: Empowering Graph Foundation Models with Test-time Prompt Tuning

Yan Jiang

Foundation model adaptation and generalization

Extends graph prompt tuning to graph foundation models (GFMs) with test-time adaptation, improving generalization across domains, which is key for scaling GFMs.

arXiv

#5 — From Sycophantic Consensus to Pluralistic Repair: Why AI Alignment Must Surface Disagreement

Varad Vishwarupe

AI alignment and pluralism

Reframes AI alignment beyond preference aggregation to surface genuine disagreement and pluralism, addressing a critical limitation of current RLHF approaches.

arXiv

Finance — top 5

#1 — Yield Curves Dynamics Using Variational Autoencoders Under No-arbitrage

Fusheng Luo

Neural networks for fixed-income modeling · No-arbitrage deep learning

Integrates physics-informed constraints (no-arbitrage) into deep generative models for yield curves, resolving ‘manifold collapse’ and achieving consistent term structure forecasting.

arXiv

#2 — Vector-Quantized Discrete Latent Factors Meet Financial Priors: Dynamic Cross-Sectional Stock Ranking Prediction for Portfolio Construction

Namhyoung Kim

Stock prediction with financial priors · Factor models and machine learning

Combines expert financial priors with vector-quantized discrete latent factors for dynamic cross-sectional stock ranking, balancing interpretability and predictive power.

arXiv

#3 — Synthetic American Option Pricing via Jump-HMM-Driven Heston Implied Volatility

Julia Sun

Derivatives pricing and volatility · Synthetic data generation

Breaks circular dependency in synthetic option pricing by deriving implied volatility from structural models, enabling synthetic data generation for ML/risk applications.

arXiv

#4 — Bayesian Dynamic Modeling of Realized Volatility in Financial Asset Price Forecasting

Patrick Woitschig

Volatility forecasting and modeling

Introduces dynamic gamma process models for realized volatility in a Bayesian framework, improving volatility forecasting by integrating high-frequency data.

arXiv

#5 — A deep learning approach for pricing convertible bonds with path-dependent reset and call provisions

Qinwen Zhu

Derivatives pricing with deep learning

Develops deep learning framework for path-dependent convertible bond pricing under complex contractual features, extending DL methods to realistic derivative valuation.

arXiv

Social Sciences — top 5

#1 — A Practical Guide to Instrumental Variables Methods with Heterogeneous Treatment Effects

Tymon Słoczyński

Causal inference and econometrics

Provides practical guide to IV methods accounting for heterogeneous treatment effects, aligning empirical econometric practice with recent LATE framework advances.

arXiv

#2 — Identification and Estimation of Staggered Difference-in-Differences with Network Spillovers

Hayato Tagawa

Causal inference with spillovers

Extends staggered difference-in-differences to settings with network spillovers, enabling policy evaluation under realistic interdependence structures.

arXiv

#3 — Deep Learning for Solving and Estimating Dynamic Models in Economics and Finance

Simon Scheidegger

Deep learning for dynamic economic models

Comprehensive guide to deep learning for solving high-dimensional heterogeneous-agent and macro-finance models, addressing curse of dimensionality in economic modeling.

arXiv

#4 — Generative AI Fuels Solo Entrepreneurship, but Teams Still Lead at the Top

Hyunso Kim

AI and labor economics · Entrepreneurship

Documents how generative AI reshapes the composition of entrepreneurial entry (solo vs. teams) using 160K+ product launches, revealing a paradox: entry increases, but top quality remains team-driven.

arXiv

#5 — Regret Equals Covariance: A Closed-Form Characterization for Stochastic Optimization

Irene Aldridge

Decision theory and optimization

Proves an exact closed-form decomposition of regret as the covariance between uncertain parameters and optimal decisions, replacing expensive sample average approximation (SAA) simulation.

arXiv

Mind graph

An interactive force-directed view of the topic landscape. Drag nodes, zoom, hover for details. Colors come from a Louvain community-detection pass on the co-occurrence graph.
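As a rough illustration of the co-occurrence graph behind this view, the sketch below weights each pair of topics by the number of papers they share. The topic names and paper ids are hypothetical, and the Louvain step runs only if the igraph package is installed.

```r
# Toy topic -> paper_id assignments (hypothetical names and ids).
topic_papers <- list(
  "LLM agents"       = c("p1", "p2", "p3"),
  "Quantization"     = c("p3", "p4"),
  "Causal inference" = c("p5")
)

# Every unordered pair of topics, weighted by the number of shared papers.
pairs <- combn(names(topic_papers), 2, simplify = FALSE)
edge_list <- lapply(pairs, function(pr) {
  w <- length(intersect(topic_papers[[pr[1]]], topic_papers[[pr[2]]]))
  if (w >= 1) data.frame(from = pr[1], to = pr[2], weight = w) else NULL
})
# Drop pairs with no overlap before binding the rows together.
edges <- do.call(rbind, Filter(Negate(is.null), edge_list))
print(edges)

# Optional: Louvain communities on the weighted graph.
if (requireNamespace("igraph", quietly = TRUE)) {
  g <- igraph::graph_from_data_frame(
    edges, directed = FALSE,
    vertices = data.frame(name = names(topic_papers))
  )
  print(igraph::membership(igraph::cluster_louvain(g)))
}
```

In this toy case only one pair shares a paper, so the graph has a single edge and the third topic stays isolated.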

Methodology & cadence

  • Cadence: runs on the 1st and 15th of each month at 08:00 UTC via GitHub Actions (.github/workflows/ssrn-research.yml). It can also be triggered manually via workflow_dispatch.
  • Phase A (deterministic): three arXiv queries, one per bucket, each returning the most-recent 80 candidates filtered to the last 14 days.
  • Phase B (Claude Haiku tool-use): picks the top 5 per bucket and clusters all candidates into 5–10 named topics. Edges are not emitted by the model; they are computed from paper_ids intersections to guarantee consistency.
  • Snapshots: one JSON per run in ssrn-research/snapshots/. The page reads all of them on every render.
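The Phase A windowing can be sketched in base R as follows. This is only an illustration under stated assumptions: the scan script itself is not shown on this page, the `candidates` table and its column names are hypothetical, and the 14-day cutoff is treated as inclusive.

```r
# Hypothetical candidate table for one bucket; not the real arXiv fields.
run_date <- as.Date("2026-05-15")
candidates <- data.frame(
  id        = c("a", "b", "c", "d"),
  published = as.Date(c("2026-05-14", "2026-04-20", "2026-05-02", "2026-05-10"))
)

max_keep    <- 80  # per-bucket cap stated in the methodology
window_days <- 14  # look-back window stated in the methodology

# Take the most recent candidates first, capped at max_keep ...
newest <- candidates[order(candidates$published, decreasing = TRUE), ]
newest <- head(newest, max_keep)

# ... then keep only those inside the window (inclusive cutoff: an assumption).
recent <- newest[newest$published >= run_date - window_days, ]
print(recent$id)
```

Because the cap is applied before the date filter in this reading, a bucket can end up with fewer than 80 candidates when fewer papers fall inside the window.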

Last 4 runs:

  • 2026-05-06 — 75 candidates (data_science=26, finance=25, social_sciences=27), 10 topics
  • 2026-05-07 — 73 candidates (data_science=25, finance=25, social_sciences=27), 10 topics
  • 2026-05-08 — 49 candidates (data_science=0, finance=25, social_sciences=25), 10 topics
  • 2026-05-15 — 71 candidates (data_science=27, finance=25, social_sciences=26), 10 topics
Source Code
---
title: "Research Trends Radar"
description: "Bi-weekly arXiv scan across data science, finance, and social sciences — top-5 papers per area, topic timeline, and an interactive mind graph."
date: today
date-format: long
categories: [research, ai-agent, finance]
execute:
  freeze: false
  echo: false
  warning: false
  message: false
format:
  html:
    page-layout: full
    toc: true
---

```{r setup}
library(jsonlite)
library(dplyr)
library(tidyr)
library(purrr)
library(stringr)
library(lubridate)
library(ggplot2)
library(visNetwork)
library(igraph)
library(MetBrewer)
library(DT)
library(htmltools)
library(fs)

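# Null-coalescing helper: fall back to `b` when `a` is NULL or empty
`%||%` <- function(a, b) if (is.null(a) || length(a) == 0L) b else a

SNAPSHOT_DIR <- "snapshots"

# One JSON file per run, named YYYY-MM-DD.json
snapshot_files <- if (dir_exists(SNAPSHOT_DIR)) {
  dir_ls(SNAPSHOT_DIR, regexp = "\\d{4}-\\d{2}-\\d{2}\\.json$")
} else character(0)

# Read every snapshot, drop any without a run_date, and sort chronologically
# (ISO dates sort correctly as strings)
snapshots <- map(snapshot_files, ~ read_json(.x, simplifyVector = FALSE))
snapshots <- keep(snapshots, ~ !is.null(.x$run_date))
snapshots <- snapshots[order(map_chr(snapshots, "run_date"))]

# Render the data-driven sections only if at least one snapshot has topics
has_data <- length(snapshots) > 0L &&
            any(map_int(snapshots, ~ length(.x$topics %||% list())) > 0L)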

PAL_PRIMARY   <- "#1a1a2e"
PAL_HIGHLIGHT <- "#e94560"
PAL_ACCENT    <- "#0f3460"

BUCKETS <- list(
  data_science    = "Data Science",
  finance         = "Finance",
  social_sciences = "Social Sciences"
)
```

::: {.callout-note appearance="minimal" icon=false}
A simple overview of trending topics in **data science**, **finance**, and **social sciences** — top-5 arXiv papers per area, refreshed every two weeks.
:::

```{r empty-state, eval = !has_data, results = "asis"}
cat("\n## No snapshots yet\n\n")
cat("The first bi-weekly run will populate this page. Trigger it manually with:\n\n")
cat("```bash\nRscript ssrn-research/scan_ssrn.R --dry-run   # Phase A only\nRscript ssrn-research/scan_ssrn.R              # full run (needs ANTHROPIC_API_KEY)\n```\n")
```

```{r prep-data, eval = has_data}
latest <- snapshots[[length(snapshots)]]

# Long table of (run_date, topic_name, paper_ids) across all snapshots
topics_long <- map_dfr(snapshots, function(s) {
  topics <- s$topics %||% list()
  if (length(topics) == 0L) return(tibble())
  tibble(
    run_date  = as.Date(s$run_date),
    name      = map_chr(topics, "name"),
    paper_ids = map(topics, ~ unlist(.x$paper_ids))
  )
})

topics_long <- topics_long |>
  mutate(n_papers = lengths(paper_ids))

render_top_paper <- function(p) {
  authors_txt <- paste(unlist(p$authors %||% list()), collapse = ", ")
  if (nchar(authors_txt) > 200) authors_txt <- paste0(substr(authors_txt, 1, 200), " ...")
  topics_txt  <- paste(unlist(p$topics %||% list()), collapse = " · ")
  cat(sprintf(
    '\n::: {.project-card}\n#### #%s — [%s](%s)\n**%s**\n\n_%s_\n\n%s\n\n[arXiv](%s)\n:::\n',
    p$rank, p$title, p$url,
    authors_txt,
    topics_txt,
    p$why_trending,
    p$url
  ))
}

render_bucket_section <- function(bucket_key, bucket_label) {
  papers <- latest$top_papers[[bucket_key]] %||% list()
  cat(sprintf("\n## %s — top 5\n\n", bucket_label))
  if (length(papers) == 0L) {
    cat(sprintf("_No top-5 picks for %s in the latest snapshot._\n", bucket_label))
    return(invisible())
  }
  for (p in papers) render_top_paper(p)
}
```

## Topic timeline

How often each topic appears across runs. Bigger dots = more papers in that topic that fortnight.

```{r timeline, eval = has_data, fig.width = 11, fig.height = 5.5}
top_n_topics <- 12

topic_totals <- topics_long |>
  group_by(name) |>
  summarise(total = sum(n_papers), .groups = "drop") |>
  arrange(desc(total)) |>
  head(top_n_topics)

plot_df <- topics_long |>
  semi_join(topic_totals, by = "name") |>
  mutate(name = factor(name, levels = topic_totals$name))

n_colors <- nrow(topic_totals)
palette  <- if (n_colors <= 12) MetBrewer::met.brewer("Cassatt2", n_colors) else
            colorRampPalette(MetBrewer::met.brewer("Cassatt2", 12))(n_colors)

ggplot(plot_df, aes(x = run_date, y = name, size = n_papers, color = name)) +
  geom_point(alpha = 0.85) +
  scale_color_manual(values = as.character(palette), guide = "none") +
  scale_size_area(max_size = 11, name = "papers") +
  scale_x_date(date_labels = "%d %b %Y") +
  labs(x = NULL, y = NULL,
       title = paste0("Top ", n_colors, " topics across ", length(snapshots), " run(s)")) +
  theme_minimal(base_size = 12) +
  theme(
    panel.grid.minor   = element_blank(),
    panel.grid.major.y = element_line(color = "grey92"),
    plot.title         = element_text(face = "bold", color = PAL_PRIMARY),
    axis.text.y        = element_text(color = PAL_PRIMARY)
  )
```

How each topic's rank shifts across runs — rank 1 = most papers that fortnight.

```{r bump-chart, eval = has_data && length(snapshots) >= 2, fig.width = 11, fig.height = 6}
top_n_bump <- min(10L, n_distinct(topics_long$name))

bump_totals <- topics_long |>
  group_by(name) |>
  summarise(total = sum(n_papers), .groups = "drop") |>
  arrange(desc(total)) |>
  slice_head(n = top_n_bump)

n_bump       <- nrow(bump_totals)
bump_palette <- if (n_bump <= 12) {
  MetBrewer::met.brewer("Cassatt2", max(n_bump, 2L))
} else {
  colorRampPalette(MetBrewer::met.brewer("Cassatt2", 12L))(n_bump)
}
bump_colors <- setNames(as.character(bump_palette), bump_totals$name)

bump_df <- topics_long |>
  semi_join(bump_totals, by = "name") |>
  group_by(run_date) |>
  mutate(rank = rank(-n_papers, ties.method = "min")) |>
  ungroup()

label_right <- bump_df |>
  filter(run_date == max(run_date)) |>
  mutate(label = str_trunc(name, 28))

ggplot(bump_df, aes(x = run_date, y = rank, group = name, color = name)) +
  geom_line(linewidth = 1.3, alpha = 0.8, lineend = "round") +
  geom_point(aes(size = n_papers), alpha = 0.9) +
  geom_text(data = label_right,
            aes(label = label),  # reuse the truncated label computed above
            hjust = -0.12, size = 3.1, fontface = "bold") +
  scale_color_manual(values = bump_colors, guide = "none") +
  scale_size_area(max_size = 9, name = "papers") +
  scale_y_reverse(breaks = seq_len(top_n_bump), minor_breaks = NULL) +
  scale_x_date(date_labels = "%d %b '%y",
               expand = expansion(mult = c(0.04, 0.42))) +
  labs(x = NULL, y = "Rank",
       title = paste0("Topic rank over time — top ", top_n_bump, " by cumulative papers")) +
  theme_minimal(base_size = 12) +
  theme(
    panel.grid.minor    = element_blank(),
    panel.grid.major.x  = element_line(color = "grey85"),
    panel.grid.major.y  = element_line(color = "grey92", linetype = "dashed"),
    plot.title          = element_text(face = "bold", color = PAL_PRIMARY),
    axis.text           = element_text(color = PAL_PRIMARY),
    axis.title.y        = element_text(color = PAL_PRIMARY, size = 10)
  )
```

```{r top-data-science, eval = has_data, results = "asis"}
render_bucket_section("data_science", BUCKETS$data_science)
```

```{r top-finance, eval = has_data, results = "asis"}
render_bucket_section("finance", BUCKETS$finance)
```

```{r top-social-sciences, eval = has_data, results = "asis"}
render_bucket_section("social_sciences", BUCKETS$social_sciences)
```

## Mind graph

An interactive force-directed view of the topic landscape. Drag nodes, zoom, hover for details. Colors come from a Louvain community-detection pass on the co-occurrence graph.

```{r mindgraph, eval = has_data}
topic_papers <- topics_long |>
  unnest(paper_ids) |>
  rename(paper_id = paper_ids) |>
  distinct(name, paper_id) |>
  group_by(name) |>
  summarise(papers = list(paper_id),
            n_papers_total = n(),
            .groups = "drop")

# Always start with a 3-column shell so empty results keep their schema
empty_edges <- tibble(from = character(), to = character(), weight = integer())

edges <- if (nrow(topic_papers) >= 2) {
  pairs <- combn(topic_papers$name, 2, simplify = FALSE)
  result <- map_dfr(pairs, function(pr) {
    a <- topic_papers$papers[[which(topic_papers$name == pr[1])]]
    b <- topic_papers$papers[[which(topic_papers$name == pr[2])]]
    w <- length(intersect(a, b))
    if (w >= 1) tibble(from = pr[1], to = pr[2], weight = w) else NULL
  })
  if (is.null(result) || nrow(result) == 0L) empty_edges else result
} else empty_edges

if (nrow(edges) > 0L) {
  g <- igraph::graph_from_data_frame(
    edges, directed = FALSE,
    vertices = topic_papers |> select(name, n_papers_total)
  )
  comm <- igraph::cluster_louvain(g)
  groups <- tibble(name = igraph::V(g)$name,
                   group = as.integer(igraph::membership(comm)))
} else {
  groups <- tibble(name = topic_papers$name,
                   group = seq_len(nrow(topic_papers)))
}

n_groups <- max(groups$group, 1L)
group_palette <- if (n_groups <= 12) MetBrewer::met.brewer("Cassatt2", max(n_groups, 2)) else
                 colorRampPalette(MetBrewer::met.brewer("Cassatt2", 12))(n_groups)

nodes <- topic_papers |>
  left_join(groups, by = "name") |>
  mutate(
    id    = name,
    label = name,
    value = n_papers_total,
    color = as.character(group_palette[group]),
    title = sprintf("<b>%s</b><br>%d papers across all runs", name, n_papers_total)
  ) |>
  select(id, label, value, color, title, group)

vis_edges <- edges |>
  mutate(
    value = weight,
    title = sprintf("%d shared paper(s)", weight)
  )

if (nrow(nodes) == 0L) {
  htmltools::tags$div(class = "alert alert-info",
                      "Not enough topics yet for a mind graph.")
} else {
  visNetwork(nodes, vis_edges, height = "640px", width = "100%") |>
    visNodes(font = list(face = "Inter", size = 16),
             shape = "dot",
             borderWidth = 2,
             color = list(border = "#1a1a2e", highlight = list(border = "#e94560"))) |>
    visEdges(smooth = list(enabled = TRUE, type = "continuous"),
             color = list(color = "#bbbbbb", highlight = "#e94560")) |>
    visPhysics(solver = "forceAtlas2Based",
               forceAtlas2Based = list(gravitationalConstant = -55,
                                        centralGravity        = 0.012,
                                        springLength          = 120,
                                        springConstant        = 0.06,
                                        avoidOverlap          = 0.6),
               stabilization = list(iterations = 220)) |>
    visInteraction(hover = TRUE, navigationButtons = TRUE,
                   tooltipDelay = 100, dragNodes = TRUE) |>
    visOptions(highlightNearest = list(enabled = TRUE, degree = 1, hover = TRUE),
               nodesIdSelection = TRUE)
}
```

## Methodology & cadence

- **Cadence:** runs on the 1st and 15th of each month at 08:00 UTC via GitHub Actions (`.github/workflows/ssrn-research.yml`). It can also be triggered manually via `workflow_dispatch`.
- **Phase A (deterministic):** three arXiv queries, one per bucket, each returning the most-recent 80 candidates filtered to the last 14 days.
- **Phase B (Claude Haiku tool-use):** picks the top 5 per bucket and clusters all candidates into 5–10 named topics. Edges are *not* emitted by the model; they are computed from `paper_ids` intersections to guarantee consistency.
- **Snapshots:** one JSON per run in `ssrn-research/snapshots/`. The page reads all of them on every render.

```{r run-summary, eval = has_data, results = "asis"}
last_n <- min(8, length(snapshots))
recent <- tail(snapshots, last_n)
rows <- map_chr(recent, function(s) {
  bc <- s$bucket_counts %||% list()
  bc_str <- if (length(bc) > 0L)
    paste(names(bc), unlist(bc), sep = "=", collapse = ", ")
  else "n/a"
  sprintf("- **%s** — %d candidates (%s), %d topics",
          s$run_date,
          s$candidates_count %||% 0L,
          bc_str,
          length(s$topics %||% list()))
})
cat("**Last ", last_n, " runs:**\n\n", paste(rows, collapse = "\n"), sep = "")
```

© 2025 Simen Bjerkelund · Built with Quarto