Here, you will find some of the projects I am working on.
I am currently developing some C++ frameworks (not a Python fan, sorry!) for modeling dynamic semantics.
A computational implementation of Groenendijk, Stokhof and Veltman's update semantics for Quantified Modal Logic, developed in:
You can take a look at the code here.
As part of my work on Large Language Model Evaluation, I have written an application for managing evaluation data from Stanford's CRFM HELM project. You can see the application code here.
This application is intended as a complement to POKTscan's pnyx-lm-taxonomies suite.
Stanford CRFM's Holistic Evaluation of Language Models (HELM) is a scientifically designed benchmark for evaluating Language Models (LMs) across various tasks and skills, measured along a wide array of metrics. However, the publicly available data falls short of fully covering all the tasks and general abilities of LMs. To improve the usability of the data generated by HELM, Pocket Scan LLC developed a suite of Python scripts that allow for the construction of custom datasets built from the data already available in HELM's repositories.
However, exploring the sea of data available in HELM's evaluation output is a tall order without an adequate tool. The HELM Prompt Browser is a tool designed to help AI researchers navigate the complexity of HELM's data (250 GB of raw evaluation data) by filtering and selecting entries according to diverse criteria. The custom datasets constructed on this basis can then be exported to a JSON file that serves as input to the scripts in the pnyx-lm-taxonomies suite.
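To give a rough idea of the filter-and-export workflow described above, here is a minimal sketch in Python (the language of the pnyx-lm-taxonomies scripts). It is not the HELM Prompt Browser's code: the `PromptRecord` fields, the function names, and the output layout are all hypothetical, chosen only for illustration, and the JSON format actually expected by pnyx-lm-taxonomies may differ.

```python
import json
from dataclasses import dataclass, asdict

# Hypothetical record layout: these field names are illustrative only and
# are not taken from HELM's output files or from the HELM Prompt Browser.
@dataclass
class PromptRecord:
    scenario: str
    model: str
    prompt: str

def filter_records(records, scenario=None, model=None):
    """Keep the records that match every criterion that is not None."""
    return [
        r for r in records
        if (scenario is None or r.scenario == scenario)
        and (model is None or r.model == model)
    ]

def to_json(records):
    """Serialize the selected records as a JSON list of objects."""
    return json.dumps([asdict(r) for r in records], indent=2, ensure_ascii=False)

def export_json(records, path):
    """Write the selected records to a JSON file."""
    with open(path, "w", encoding="utf-8") as f:
        f.write(to_json(records))

# Made-up sample data, standing in for entries loaded from evaluation output.
records = [
    PromptRecord("scenario_a", "model_x", "First example prompt"),
    PromptRecord("scenario_a", "model_y", "Second example prompt"),
    PromptRecord("scenario_b", "model_x", "Third example prompt"),
]

# Select by one criterion, then print the JSON that would be exported.
selected = filter_records(records, scenario="scenario_a")
print(to_json(selected))
```

Calling `export_json(selected, "custom_dataset.json")` would write the same selection to disk. Treating every criterion as optional keeps the filter composable: passing both `scenario` and `model` narrows the selection, while passing neither returns all records.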
Also on the way are some apps for using those frameworks in a user-friendly manner.