Research Trends Radar

research

ai-agent

finance

Bi-weekly arXiv scan across data science, finance, and social sciences — top-5 papers per area, topic timeline, and an interactive mind graph.

Published

May 19, 2026

A simple overview of trending topics in data science, finance, and social sciences — top-5 arXiv papers per area, refreshed every two weeks.

Topic timeline

How often each topic appears across runs. Bigger dots = more papers in that topic that fortnight.

How each topic’s rank shifts across runs — rank 1 = most papers that fortnight.
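The ranking rule can be sketched on a small table. This is a minimal illustration, not the page's actual data: the `toy` table and its topic names are invented, and it assumes tied topics share the better rank (`ties.method = "min"`).

```r
library(dplyr)

# Hypothetical per-run paper counts; topic names are invented for illustration.
toy <- tibble(
  run_date = as.Date(c("2026-05-01", "2026-05-01", "2026-05-01",
                       "2026-05-15", "2026-05-15", "2026-05-15")),
  name     = rep(c("LLM agents", "Volatility", "Causal inference"), times = 2),
  n_papers = c(6L, 4L, 4L, 3L, 7L, 5L)
)

# Rank topics within each run: rank 1 = most papers, ties share the better rank.
ranked <- toy |>
  group_by(run_date) |>
  mutate(rank = rank(-n_papers, ties.method = "min")) |>
  ungroup()

print(ranked)
```

In the first run the two topics with four papers each both get rank 2, so their lines would meet at the same point on the chart.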

Data Science — top 5

#1 — Known By Their Actions: Fingerprinting LLM Browser Agents via UI Traces

William Lugoloobi

LLM agent security and interpretability

Shows that LLM-based browser agents can be fingerprinted through passive UI tracking, reaching 96% F1 across frontier models; directly relevant to the trustworthiness of deployed agents.

arXiv

#2 — GPart: End-to-End Isometric Fine-Tuning via Global Parameter Partitioning

Paolo Mandica

Efficient LLM training and adaptation

Addresses a fundamental limitation of LoRA fine-tuning by achieving end-to-end isometry in parameter space, advancing parameter-efficient LLM adaptation at scale.

arXiv

#3 — XFP: Quality-Targeted Adaptive Codebook Quantization with Sparse Outlier Separation for LLM Inference

Thomas Witt

Efficient LLM inference and quantization

Introduces quality-targeted dynamic quantization for LLM inference with automatic hyperparameter selection, enabling practical sub-byte compression without calibration.

arXiv

#4 — GFMate: Empowering Graph Foundation Models with Test-time Prompt Tuning

Yan Jiang

Foundation model adaptation and generalization

Extends graph prompt tuning to foundation models with test-time adaptation, improving generalization across domains—key for scaling GFMs.

arXiv

#5 — From Sycophantic Consensus to Pluralistic Repair: Why AI Alignment Must Surface Disagreement

Varad Vishwarupe

AI alignment and pluralism

Reframes AI alignment beyond preference aggregation to surface genuine disagreement and pluralism, addressing critical limitation of current RLHF approaches.

arXiv

Finance — top 5

#1 — Yield Curves Dynamics Using Variational Autoencoders Under No-arbitrage

Fusheng Luo

Neural networks for fixed-income modeling · No-arbitrage deep learning

Integrates physics-informed constraints (no-arbitrage) into deep generative models for yield curves, resolving ‘manifold collapse’ and achieving consistent term structure forecasting.

arXiv

#2 — Vector-Quantized Discrete Latent Factors Meet Financial Priors: Dynamic Cross-Sectional Stock Ranking Prediction for Portfolio Construction

Namhyoung Kim

Stock prediction with financial priors · Factor models and machine learning

Combines expert financial priors with vector-quantized discrete latent factors for dynamic cross-sectional stock ranking, balancing interpretability and predictive power.

arXiv

#3 — Synthetic American Option Pricing via Jump-HMM-Driven Heston Implied Volatility

Julia Sun

Derivatives pricing and volatility · Synthetic data generation

Breaks circular dependency in synthetic option pricing by deriving implied volatility from structural models, enabling synthetic data generation for ML/risk applications.

arXiv

#4 — Bayesian Dynamic Modeling of Realized Volatility in Financial Asset Price Forecasting

Patrick Woitschig

Volatility forecasting and modeling

Introduces dynamic gamma process models for realized volatility in Bayesian framework, improving volatility forecasting through integration of high-frequency data.

arXiv

#5 — A deep learning approach for pricing convertible bonds with path-dependent reset and call provisions

Qinwen Zhu

Derivatives pricing with deep learning

Develops deep learning framework for path-dependent convertible bond pricing under complex contractual features, extending DL methods to realistic derivative valuation.

arXiv

Mind graph

An interactive force-directed view of the topic landscape. Drag nodes, zoom, hover for details. Colors come from a Louvain community-detection pass on the co-occurrence graph.
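As a rough sketch of the colouring step, the snippet below runs Louvain community detection with igraph on a small weighted edge list and maps each community to a colour. The edge list, topic labels, and palette are invented for illustration and do not come from the page's snapshots.

```r
library(igraph)

# Hypothetical co-occurrence edges: weight = number of papers two topics share.
edges <- data.frame(
  from   = c("A", "A", "B", "D", "D", "E"),
  to     = c("B", "C", "C", "E", "F", "F"),
  weight = c(3, 2, 3, 3, 2, 3)
)

g <- graph_from_data_frame(edges, directed = FALSE)

# cluster_louvain() uses the "weight" edge attribute when one is present.
comm <- cluster_louvain(g)
grp  <- setNames(as.integer(membership(comm)), V(g)$name)

# Map each community id to a colour (palette values are placeholders).
palette     <- c("#1a1a2e", "#e94560", "#0f3460")
node_colors <- setNames(palette[grp], names(grp))

print(grp)
print(node_colors)
```

Here the two disconnected triangles end up in two communities, so topics A, B, C share one colour and D, E, F share another.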

Methodology & cadence

Cadence — runs on the 1st and 15th of each month at 08:00 UTC via GitHub Actions (.github/workflows/ssrn-research.yml). Manual triggers via workflow_dispatch.
Phase A (deterministic) — three arXiv queries, one per bucket, each returning the most-recent 80 candidates filtered to the last 14 days.
Phase B (Claude Haiku tool-use) — picks the top-5 per bucket and clusters all candidates into 5–10 named topics. Edges are not emitted by the model — they’re computed from paper_ids intersections to guarantee consistency.
Snapshots — one JSON file per run in ssrn-research/snapshots/. The page reads all of them on every render.
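The edge rule in Phase B can be sketched as follows: two topics are linked when they share at least one paper, and the edge weight is the number of shared papers. The topic names and paper identifiers below are hypothetical placeholders, and the snippet uses base R only.

```r
# Hypothetical topic -> paper_ids mapping (identifiers are placeholders).
topic_papers <- list(
  "LLM agents"     = c("paper-1", "paper-2", "paper-3"),
  "Quantization"   = c("paper-3", "paper-4"),
  "Option pricing" = c("paper-5")
)

# Compare every unordered pair of topics and keep pairs with shared papers.
pairs <- combn(names(topic_papers), 2, simplify = FALSE)
edges <- do.call(rbind, lapply(pairs, function(pr) {
  w <- length(intersect(topic_papers[[pr[1]]], topic_papers[[pr[2]]]))
  if (w >= 1) data.frame(from = pr[1], to = pr[2], weight = w) else NULL
}))

print(edges)
```

Because the edges are derived from the same `paper_ids` that define the topics, an edge can never point to a topic or paper that is missing from the snapshot. If no pair shares a paper, `edges` is `NULL` in this sketch, so a caller would need to handle the empty case.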

Last 4 runs:

2026-05-06 — 75 candidates (data_science=26, finance=25, social_sciences=27), 10 topics
2026-05-07 — 73 candidates (data_science=25, finance=25, social_sciences=27), 10 topics
2026-05-08 — 49 candidates (data_science=0, finance=25, social_sciences=25), 10 topics
2026-05-15 — 71 candidates (data_science=27, finance=25, social_sciences=26), 10 topics

---
title: "Research Trends Radar"
description: "Bi-weekly arXiv scan across data science, finance, and social sciences — top-5 papers per area, topic timeline, and an interactive mind graph."
date: today
date-format: long
categories: [research, ai-agent, finance]
execute:
  freeze: false
  echo: false
  warning: false
  message: false
format:
  html:
    page-layout: full
    toc: true
---

```{r setup}
library(jsonlite)
library(dplyr)
library(tidyr)
library(purrr)
library(stringr)
library(lubridate)
library(ggplot2)
library(visNetwork)
library(igraph)
library(MetBrewer)
library(DT)
library(htmltools)
library(fs)

`%||%` <- function(a, b) if (is.null(a) || length(a) == 0L) b else a

SNAPSHOT_DIR <- "snapshots"
snapshot_files <- if (dir_exists(SNAPSHOT_DIR)) {
  dir_ls(SNAPSHOT_DIR, regexp = "\\d{4}-\\d{2}-\\d{2}\\.json$")
} else character(0)

snapshots <- map(snapshot_files, ~ read_json(.x, simplifyVector = FALSE))
snapshots <- keep(snapshots, ~ !is.null(.x$run_date))
snapshots <- snapshots[order(map_chr(snapshots, "run_date"))]

has_data <- length(snapshots) > 0L &&
  any(map_int(snapshots, ~ length(.x$topics %||% list())) > 0L)

PAL_PRIMARY   <- "#1a1a2e"
PAL_HIGHLIGHT <- "#e94560"
PAL_ACCENT    <- "#0f3460"

BUCKETS <- list(
  data_science    = "Data Science",
  finance         = "Finance",
  social_sciences = "Social Sciences"
)
```

::: {.callout-note appearance="minimal" icon=false}
A simple overview of trending topics in **data science**, **finance**, and **social sciences** — top-5 arXiv papers per area, refreshed every two weeks.
:::

```{r empty-state, eval = !has_data, results = "asis"}
cat("\n## No snapshots yet\n\n")
cat("The first bi-weekly run will populate this page. Trigger it manually with:\n\n")
cat("```bash\nRscript ssrn-research/scan_ssrn.R --dry-run # Phase A only\nRscript ssrn-research/scan_ssrn.R # full run (needs ANTHROPIC_API_KEY)\n```\n")
```

```{r prep-data, eval = has_data}
latest <- snapshots[[length(snapshots)]]

# Long table of (run_date, topic_name, paper_ids) across all snapshots
topics_long <- map_dfr(snapshots, function(s) {
  topics <- s$topics %||% list()
  if (length(topics) == 0L) return(tibble())
  tibble(
    run_date  = as.Date(s$run_date),
    name      = map_chr(topics, "name"),
    paper_ids = map(topics, ~ unlist(.x$paper_ids))
  )
})

topics_long <- topics_long |> mutate(n_papers = lengths(paper_ids))

render_top_paper <- function(p) {
  authors_txt <- paste(unlist(p$authors %||% list()), collapse = ", ")
  if (nchar(authors_txt) > 200) authors_txt <- paste0(substr(authors_txt, 1, 200), " ...")
  topics_txt <- paste(unlist(p$topics %||% list()), collapse = " · ")
  cat(sprintf(
    '\n::: {.project-card}\n#### #%s — [%s](%s)\n**%s**\n\n_%s_\n\n%s\n\n[arXiv](%s)\n:::\n',
    p$rank, p$title, p$url, authors_txt, topics_txt, p$why_trending, p$url
  ))
}

render_bucket_section <- function(bucket_key, bucket_label) {
  papers <- latest$top_papers[[bucket_key]] %||% list()
  cat(sprintf("\n## %s — top 5\n\n", bucket_label))
  if (length(papers) == 0L) {
    cat(sprintf("_No top-5 picks for %s in the latest snapshot._\n", bucket_label))
    return(invisible())
  }
  for (p in papers) render_top_paper(p)
}
```

## Topic timeline

How often each topic appears across runs. Bigger dots = more papers in that topic that fortnight.

```{r timeline, eval = has_data, fig.width = 11, fig.height = 5.5}
top_n_topics <- 12
topic_totals <- topics_long |>
  group_by(name) |>
  summarise(total = sum(n_papers), .groups = "drop") |>
  arrange(desc(total)) |>
  head(top_n_topics)

plot_df <- topics_long |>
  semi_join(topic_totals, by = "name") |>
  mutate(name = factor(name, levels = topic_totals$name))

n_colors <- nrow(topic_totals)
palette <- if (n_colors <= 12) MetBrewer::met.brewer("Cassatt2", n_colors) else
  colorRampPalette(MetBrewer::met.brewer("Cassatt2", 12))(n_colors)

ggplot(plot_df, aes(x = run_date, y = name, size = n_papers, color = name)) +
  geom_point(alpha = 0.85) +
  scale_color_manual(values = as.character(palette), guide = "none") +
  scale_size_area(max_size = 11, name = "papers") +
  scale_x_date(date_labels = "%d %b %Y") +
  labs(x = NULL, y = NULL,
       title = paste0("Top ", n_colors, " topics across ", length(snapshots), " run(s)")) +
  theme_minimal(base_size = 12) +
  theme(
    panel.grid.minor   = element_blank(),
    panel.grid.major.y = element_line(color = "grey92"),
    plot.title         = element_text(face = "bold", color = PAL_PRIMARY),
    axis.text.y        = element_text(color = PAL_PRIMARY)
  )
```

How each topic's rank shifts across runs — rank 1 = most papers that fortnight.

```{r bump-chart, eval = has_data && length(snapshots) >= 2, fig.width = 11, fig.height = 6}
top_n_bump <- min(10L, n_distinct(topics_long$name))

bump_totals <- topics_long |>
  group_by(name) |>
  summarise(total = sum(n_papers), .groups = "drop") |>
  arrange(desc(total)) |>
  slice_head(n = top_n_bump)

n_bump <- nrow(bump_totals)
bump_palette <- if (n_bump <= 12) {
  MetBrewer::met.brewer("Cassatt2", max(n_bump, 2L))
} else {
  colorRampPalette(MetBrewer::met.brewer("Cassatt2", 12L))(n_bump)
}
bump_colors <- setNames(as.character(bump_palette), bump_totals$name)

bump_df <- topics_long |>
  semi_join(bump_totals, by = "name") |>
  group_by(run_date) |>
  mutate(rank = rank(-n_papers, ties.method = "min")) |>
  ungroup()

label_right <- bump_df |>
  filter(run_date == max(run_date)) |>
  mutate(label = str_trunc(name, 28))

ggplot(bump_df, aes(x = run_date, y = rank, group = name, color = name)) +
  geom_line(linewidth = 1.3, alpha = 0.8, lineend = "round") +
  geom_point(aes(size = n_papers), alpha = 0.9) +
  geom_text(data = label_right, aes(label = str_trunc(name, 28)),
            hjust = -0.12, size = 3.1, fontface = "bold") +
  scale_color_manual(values = bump_colors, guide = "none") +
  scale_size_area(max_size = 9, name = "papers") +
  scale_y_reverse(breaks = seq_len(top_n_bump), minor_breaks = NULL) +
  scale_x_date(date_labels = "%d %b '%y", expand = expansion(mult = c(0.04, 0.42))) +
  labs(x = NULL, y = "Rank",
       title = paste0("Topic rank over time — top ", top_n_bump, " by cumulative papers")) +
  theme_minimal(base_size = 12) +
  theme(
    panel.grid.minor   = element_blank(),
    panel.grid.major.x = element_line(color = "grey85"),
    panel.grid.major.y = element_line(color = "grey92", linetype = "dashed"),
    plot.title         = element_text(face = "bold", color = PAL_PRIMARY),
    axis.text          = element_text(color = PAL_PRIMARY),
    axis.title.y       = element_text(color = PAL_PRIMARY, size = 10)
  )
```

```{r top-data-science, eval = has_data, results = "asis"}
render_bucket_section("data_science", BUCKETS$data_science)
```

```{r top-finance, eval = has_data, results = "asis"}
render_bucket_section("finance", BUCKETS$finance)
```

```{r top-social-sciences, eval = has_data, results = "asis"}
render_bucket_section("social_sciences", BUCKETS$social_sciences)
```

## Mind graph

An interactive force-directed view of the topic landscape. Drag nodes, zoom, hover for details. Colors come from a Louvain community-detection pass on the co-occurrence graph.

```{r mindgraph, eval = has_data}
topic_papers <- topics_long |>
  unnest(paper_ids) |>
  rename(paper_id = paper_ids) |>
  distinct(name, paper_id) |>
  group_by(name) |>
  summarise(papers = list(paper_id), n_papers_total = n(), .groups = "drop")

# Always start with a 3-column shell so empty results keep their schema
empty_edges <- tibble(from = character(), to = character(), weight = integer())

edges <- if (nrow(topic_papers) >= 2) {
  pairs <- combn(topic_papers$name, 2, simplify = FALSE)
  result <- map_dfr(pairs, function(pr) {
    a <- topic_papers$papers[[which(topic_papers$name == pr[1])]]
    b <- topic_papers$papers[[which(topic_papers$name == pr[2])]]
    w <- length(intersect(a, b))
    if (w >= 1) tibble(from = pr[1], to = pr[2], weight = w) else NULL
  })
  if (is.null(result) || nrow(result) == 0L) empty_edges else result
} else empty_edges

if (nrow(edges) > 0L) {
  g <- igraph::graph_from_data_frame(
    edges, directed = FALSE,
    vertices = topic_papers |> select(name, n_papers_total)
  )
  comm <- igraph::cluster_louvain(g)
  groups <- tibble(name = igraph::V(g)$name, group = as.integer(igraph::membership(comm)))
} else {
  groups <- tibble(name = topic_papers$name, group = seq_len(nrow(topic_papers)))
}

n_groups <- max(groups$group, 1L)
group_palette <- if (n_groups <= 12) MetBrewer::met.brewer("Cassatt2", max(n_groups, 2)) else
  colorRampPalette(MetBrewer::met.brewer("Cassatt2", 12))(n_groups)

nodes <- topic_papers |>
  left_join(groups, by = "name") |>
  mutate(
    id    = name,
    label = name,
    value = n_papers_total,
    color = as.character(group_palette[group]),
    title = sprintf("<b>%s</b><br>%d papers across all runs", name, n_papers_total)
  ) |>
  select(id, label, value, color, title, group)

vis_edges <- edges |>
  mutate(
    value = weight,
    title = sprintf("%d shared paper(s)", weight)
  )

if (nrow(nodes) == 0L) {
  htmltools::tags$div(class = "alert alert-info", "Not enough topics yet for a mind graph.")
} else {
  visNetwork(nodes, vis_edges, height = "640px", width = "100%") |>
    visNodes(font = list(face = "Inter", size = 16),
             shape = "dot",
             borderWidth = 2,
             color = list(border = "#1a1a2e", highlight = list(border = "#e94560"))) |>
    visEdges(smooth = list(enabled = TRUE, type = "continuous"),
             color = list(color = "#bbbbbb", highlight = "#e94560")) |>
    visPhysics(solver = "forceAtlas2Based",
               forceAtlas2Based = list(gravitationalConstant = -55,
                                       centralGravity = 0.012,
                                       springLength = 120,
                                       springConstant = 0.06,
                                       avoidOverlap = 0.6),
               stabilization = list(iterations = 220)) |>
    visInteraction(hover = TRUE, navigationButtons = TRUE,
                   tooltipDelay = 100, dragNodes = TRUE) |>
    visOptions(highlightNearest = list(enabled = TRUE, degree = 1, hover = TRUE),
               nodesIdSelection = TRUE)
}
```

## Methodology & cadence

- **Cadence** — runs on the 1st and 15th of each month at 08:00 UTC via GitHub Actions (`.github/workflows/ssrn-research.yml`). Manual triggers via `workflow_dispatch`.
- **Phase A (deterministic)** — three arXiv queries, one per bucket, each returning the most-recent 80 candidates filtered to the last 14 days.
- **Phase B (Claude Haiku tool-use)** — picks the top-5 per bucket and clusters all candidates into 5–10 named topics. Edges are *not* emitted by the model — they're computed from `paper_ids` intersections to guarantee consistency.
- **Snapshots** — one JSON per run in `ssrn-research/snapshots/`. Page reads them all on every render.

```{r run-summary, eval = has_data, results = "asis"}
last_n <- min(8, length(snapshots))
recent <- tail(snapshots, last_n)

rows <- map_chr(recent, function(s) {
  bc <- s$bucket_counts %||% list()
  bc_str <- if (length(bc) > 0L)
    paste(names(bc), unlist(bc), sep = "=", collapse = ", ")
  else "n/a"
  sprintf("- **%s** — %d candidates (%s), %d topics",
          s$run_date,
          s$candidates_count %||% 0L,
          bc_str,
          length(s$topics %||% list()))
})

cat("**Last ", last_n, " runs:**\n\n", paste(rows, collapse = "\n"), sep = "")
```

Social Sciences — top 5

#1 — A Practical Guide to Instrumental Variables Methods with Heterogeneous Treatment Effects

#2 — Identification and Estimation of Staggered Difference-in-Differences with Network Spillovers

#3 — Deep Learning for Solving and Estimating Dynamic Models in Economics and Finance

#4 — Generative AI Fuels Solo Entrepreneurship, but Teams Still Lead at the Top

#5 — Regret Equals Covariance: A Closed-Form Characterization for Stochastic Optimization