Curriculum Vitae

Basics

Name John Chenxi Song
Label Research Assistant | Network and System Administrator | Software Engineer
Email chs342[at]pitt[dot]edu
Summary Web Technology | LLMs | Machine Learning & Deep Learning | Data Visualization

Skills

Machine Learning
Languages: Python3 | R
Data: Pandas | NumPy | Matplotlib | Tidyverse
Modeling: PyTorch | scikit-learn | CRAN
Tools: CUDA | Anaconda | RStudio
Large Language Model
Languages: Python3
Packages: LangChain | Transformers
Models: Llama 3 (Decoder-LLM) | Transformer
Tools: Ollama | HuggingFace
Software Engineering
OOP: Java 1.8 | JavaScript ES6 | HTML
Database: SQL | MongoDB
Frameworks: Spark | Flask | ReactJS
Tools: Docker | GitHub | PowerShell

Work

  • 2024.02 - Present
    Network and System Administrator & Learning Management
    Reformed Presbyterian Theological Seminary
    • Install, configure, and maintain the school’s local area network (LAN), wide area network (WAN), operating systems (Windows/Ubuntu/Linux Mint), and physical and virtual servers (Hybrid with Azure)
    • Monitor the network (over 200 connections), server resources, and more than 40 PCs.
    • Analyzed data from 300 students, implemented visualizations to uncover key insights, optimized office software to achieve $3,000 in annual savings, and developed automated workflows that reduced processing time by 20%.
    • Established an IT department inventory and ticketing system, centralizing asset management and streamlining issue tracking, resulting in improved resource allocation and enhanced accountability.
  • 2023.03 - Present
    ML Research Assistant
    School of Medicine, University of Pittsburgh
    • Implemented multi-source models, based on pseudocode from research papers and two Java libraries (Weka and Smile), to diagnose influenza, improving accuracy by 2% compared with the existing single-source model.
    • Supervised two undergraduates' summer internships, providing coding guidance in Python for research projects.
    • Led a medical image processing project, managing and preprocessing over 100 GB of data and using a pre-trained large model (Prov-GigaPath) to extract features for cancer diagnosis.
    • Guided the development of a project using a large language model (LLM) to diagnose influenza, including designing chain-of-thought (CoT) prompt templates for effective analysis.
    • Deployed a Llama3-8B model locally on a 25GB GPU, ensuring data privacy and security for sensitive medical research.

Education

  • 2022.08 - 2023.12

    Pittsburgh, USA

    M.S. in Information Science
    University of Pittsburgh
    Large Language Model | Software Development
    • Machine Learning
    • Web Technology
    • Advanced Data Mining
    • Information Retrieval
    • Artificial Intelligence
  • 2020.03 - 2022.07

    Pittsburgh, USA

    Master of Theological Studies
    Reformed Presbyterian Theological Seminary
    Theology
    • Research Writing
  • 2016.08 - 2018.12

    Beaver Falls, PA, USA

    B.S. in Computer Science
    Geneva College
    Computer Science | Engineering
    • Data Structures
    • Algorithms
    • Database Management

Publications

  • 2024.09.20
    Transfer Learning with Clinical Concept Embeddings from Large Language Models
    Submitted to AMIA, under review
    This study shows that domain-specific language models like Med-BERT improve knowledge transfer across healthcare sites, outperforming generic models, but that excessive tuning of biomedical embeddings can reduce effectiveness. Balance is crucial.
  • 2024.06.03
    Online Transfer Learning for RSV Case Detection
    Publisher: IEEE
    Multi-Source Adaptive Weighting (MSAW) is an online transfer learning method that dynamically adjusts weights for historical and new data, improving performance in sequential classification tasks, particularly in healthcare applications.

Projects

Interests

Machine Learning
Bayesian Theory
Natural Language Processing
Semantic Searching
Large Language Model
Report Generation
Inference
Transfer Learning
Cross-Domain Learning