Real-time Container Tracking API, 2023

This project delivered an API for tracking sea container movements in real time by scraping publicly available data. It got past bot detection and blocking systems using the Scraping Fish API and used custom-developed computer vision algorithms to solve CAPTCHAs, ensuring uninterrupted data retrieval. The project combined web scraping expertise with a practical application in logistics and supply chain monitoring.

Rotating Mobile Proxy and Webscraping API, 2022 - ongoing

A SaaS product under the Scraping Fish brand that delivers a robust web scraping API. It manages a cluster of custom browsers, rotating proxies, JavaScript rendering, and CAPTCHA solving. The product operates on custom-made rotating mobile proxies and mitigates common web scraping hurdles such as IP bans and data retrieval from JavaScript-heavy pages.

Forecasting Framework, 2020 - ongoing

Internal Python package with a suite of tools designed for comprehensive management of time series data, extending from initial processing to the predictive inference of machine learning models. Tailored to the needs of forecasting tasks, the framework enables robust time series feature engineering, training, and testing of machine learning models, including the implementation of state-of-the-art deep learning models.
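
As an illustration of the kind of time series feature engineering described above, here is a minimal sketch that turns a raw series into lag and rolling-mean features. The function name `make_lag_features`, its parameters, and the sample data are hypothetical and are not taken from the internal package.

```python
from statistics import mean


def make_lag_features(series, lags=(1, 2), window=3):
    """Build one feature row per time step that has enough history.

    Hypothetical helper for illustration only.
    """
    history = max(max(lags), window)
    rows = []
    for t in range(history, len(series)):
        # Lagged values: the observation k steps before time t.
        row = {f"lag_{k}": series[t - k] for k in lags}
        # Rolling mean over the `window` observations preceding t,
        # so the target itself never leaks into the features.
        row[f"rolling_mean_{window}"] = mean(series[t - window:t])
        row["target"] = series[t]
        rows.append(row)
    return rows


demand = [10, 12, 13, 15, 14, 18]  # made-up sample data
features = make_lag_features(demand)
```

Each row can then be fed to any regression model, with `target` as the value to predict.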

Multichannel Marketing Journey Optimization, 2020 - ongoing

Optimization and orchestration of customer journeys across multiple marketing channels to boost engagement and increase product sales. Leveraging historical data on customer engagement, product sales, and customer demographics, the project employed time series modeling to improve customer interaction and purchasing pathways. It also provided insights into the efficacy of individual marketing channels for specific customers. The result was higher customer engagement and improved sales through personalized marketing strategies, along with a clearer view of channel effectiveness that supports data-driven decision-making in marketing initiatives.

Inventory of Technical and Transportation Infrastructure, 2021 - 2022

Implementation of an efficient data pipeline for processing drone imagery and development of machine learning models for object detection, semantic segmentation, and instance segmentation on 3D point cloud data. The models were then used by a system built to conduct inventory and progress tracking on construction sites, specifically for transportation and energy infrastructure. This approach streamlined the operational oversight of construction and development projects and ensured accurate, timely, and automated updates to inventory and progress metrics.

Real Estate Search Engine, 2019 - 2022

A comprehensive system for real estate data collection, management, and user interaction. It comprises a web scraping component that collects public real estate data from the Internet and extracts structured information, an API backend that manages data access together with users and groups, and a web application that lets users search real estate offers, create filters, and manage notifications about newly matched offers.

Machine Learning Statistical Toolkit, 2019 - 2022

Open source Python package that provides statistical functions, leveraging bootstrapping, to compute confidence intervals and p-values when comparing machine learning models to human readers. Designed to enhance rigor in model evaluation and empower research with robust statistical validation tools, this toolkit fosters methodological robustness in machine learning research and applications, making sophisticated statistical analyses accessible to developers and researchers alike.
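
To make the bootstrapping idea concrete, the following is a minimal sketch of a percentile bootstrap for the accuracy difference between a model and a human reader on the same cases. It is not the toolkit's API: the function name, its parameters, the percentile interval, and the two-sided p-value convention used here are all assumptions chosen for illustration.

```python
import random


def bootstrap_accuracy_difference(labels, model_preds, reader_preds,
                                  n_boot=2000, alpha=0.05, seed=0):
    """Percentile-bootstrap CI and p-value for (model - reader) accuracy.

    Hypothetical helper; cases are resampled with replacement and both
    the model and the reader are scored on the same resampled cases.
    """
    rng = random.Random(seed)
    n = len(labels)
    diffs = []
    for _ in range(n_boot):
        idx = [rng.randrange(n) for _ in range(n)]
        model_acc = sum(model_preds[i] == labels[i] for i in idx) / n
        reader_acc = sum(reader_preds[i] == labels[i] for i in idx) / n
        diffs.append(model_acc - reader_acc)
    diffs.sort()
    lower = diffs[int((alpha / 2) * n_boot)]
    upper = diffs[int((1 - alpha / 2) * n_boot) - 1]
    # Assumed convention: two-sided p-value from the share of resampled
    # differences on either side of zero, doubled and capped at 1.
    p_le = sum(d <= 0 for d in diffs) / n_boot
    p_ge = sum(d >= 0 for d in diffs) / n_boot
    p_value = min(1.0, 2 * min(p_le, p_ge))
    return lower, upper, p_value


# Made-up labels and predictions for eight cases.
labels = [1, 0, 1, 1, 0, 0, 1, 0]
model_preds = [1, 0, 1, 1, 0, 1, 1, 0]
reader_preds = [1, 0, 0, 1, 0, 1, 0, 0]
ci_low, ci_high, p = bootstrap_accuracy_difference(labels, model_preds, reader_preds)
```

Resampling cases jointly, rather than resampling the model and the reader separately, preserves the pairing between their predictions on each case.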

Computer-aided Diagnosis for Thyroid Nodules, 2018 - 2023

Research project for the detection and segmentation of thyroid nodules. Its main contribution was a multi-task Convolutional Neural Network (CNN) model for malignancy prediction of thyroid nodules from ultrasound (US) images, which achieved radiologist-level accuracy. The system is in the pipeline for commercial deployment. In addition, in collaboration with radiologists, the project optimized and simplified a thyroid nodule interpretation guideline system using a genetic algorithm; independent researchers worldwide later tested the simplified system and validated its robustness and efficacy.

Computer-aided Diagnosis for Breast Cancer, 2018 - 2023

A collaborative effort with radiologists, the project involved curation and annotation of a 3D digital breast tomosynthesis dataset. A baseline cancer detection model was also developed, alongside comprehensive data handling and evaluation code, all of which were made publicly available to support collaborative research. The project also explored a self-supervised learning approach: a detection method that uses only images without lesions and is based on image completion with a Generative Adversarial Network.

Radiogenomics, 2017 - 2020

Research project, conducted at Duke University, that employed Convolutional Neural Networks (CNNs) to gain insights into brain cancer therapy and prognosis. The project used a CNN to segment brain tumors within MRI volumes and extract tumor volumetric features predictive of genomic subtypes and patient outcomes. Subsequent stages involved the development of a CNN model for direct prediction of tumor radiomic characteristics.
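
The simplest volumetric feature that can be derived from a segmentation is the tumor volume itself. The sketch below computes it from a binary mask and the voxel spacing; the function name, the nested-list mask layout, and the sample values are assumptions for illustration and do not reproduce the project's code.

```python
def tumor_volume_mm3(mask, voxel_spacing_mm):
    """Volume of a binary 3D segmentation mask in cubic millimetres.

    `mask` is a nested list indexed as [slice][row][column] with 1 for
    tumor voxels; `voxel_spacing_mm` is (dz, dy, dx). Hypothetical helper.
    """
    dz, dy, dx = voxel_spacing_mm
    voxel_count = sum(voxel for plane in mask for row in plane for voxel in row)
    return voxel_count * dz * dy * dx


# Made-up 2 x 2 x 2 mask with four tumor voxels.
mask = [
    [[0, 1], [1, 1]],
    [[0, 0], [1, 0]],
]
volume = tumor_volume_mm3(mask, (2.0, 1.0, 1.0))
```

Multiplying by the voxel spacing matters because MRI slice thickness often differs from the in-plane resolution, so raw voxel counts are not comparable across scans.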

Word Predictor, 2016

Word Predictor aimed to optimize text input on mobile platforms, providing a streamlined user experience on both iOS and Android. This predictive framework, implemented in C++, utilized a Hidden Markov Model to enable next-word prediction and word completion in mobile keyboard applications.
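
The two features mentioned above, next-word prediction and word completion, can be sketched with a much simpler model than the one the project used. The example below is a first-order Markov chain over observed words (bigram counts), not a Hidden Markov Model, and it is written in Python rather than C++; all names and the sample corpus are hypothetical.

```python
from collections import Counter, defaultdict


def train_bigrams(sentences):
    """Count how often each word follows each preceding word."""
    counts = defaultdict(Counter)
    for sentence in sentences:
        words = sentence.lower().split()
        for prev, nxt in zip(words, words[1:]):
            counts[prev][nxt] += 1
    return counts


def predict_next(counts, previous_word, prefix="", k=3):
    """Return up to k likely next words, optionally matching a typed prefix.

    An empty prefix gives next-word prediction; a non-empty prefix gives
    word completion conditioned on the previous word.
    """
    candidates = counts.get(previous_word.lower(), Counter())
    # Most frequent first; ties broken alphabetically for stable output.
    ranked = sorted(candidates.items(), key=lambda item: (-item[1], item[0]))
    return [word for word, _ in ranked if word.startswith(prefix)][:k]


# Made-up training sentences.
corpus = ["see you soon", "see you later", "see you soon then", "thank you very much"]
model = train_bigrams(corpus)
suggestions = predict_next(model, "you")
completions = predict_next(model, "you", prefix="l")
```

A Hidden Markov Model would additionally introduce hidden states behind the observed words, which this sketch leaves out.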