Data Science
Leverage data-driven decision making through rigorous statistical analysis, feature engineering, and machine learning to extract actionable insights and power smarter business strategies.
Hello, I'm Vivek Chaurasia
Data Scientist
Get To Know More
Hi, I'm Vivek Chaurasia, a graduate student in Artificial Intelligence at RIT with hands-on experience in machine learning, computer vision, NLP, and statistical modeling. I've built and deployed real-world AI systems, ranging from RAG-based news assistants to real-time fraud detection, using tools like TensorFlow, PyTorch, AWS, and Docker. I specialize in fine-tuning large language models, applying statistical methods, and implementing research papers from scratch to solve complex problems. I'm passionate about building smart, scalable solutions that make a real impact.
Here Are My
I specialize in transforming unstructured text into actionable insights using Natural Language Processing. From fine-tuning transformer-based models to deploying end-to-end NLP pipelines, I use tools like Python, PyTorch, TensorFlow, Hugging Face, LangChain, and OpenAI APIs to build smart, scalable language systems. Whether the task is text classification, generation, summarization, or semantic search, I bring a deep understanding of both language and learning to solve real-world problems.
Unlock insights from unstructured text with cutting-edge NLP techniques (sentiment analysis, summarization, question answering, and LLM fine-tuning) to build systems that truly understand language.
Deploy scalable machine learning solutions using cloud platforms like AWS and GCP. Automate workflows with CI/CD pipelines, model monitoring, and real-time data integration for robust production-grade systems.
Containerize and deploy models seamlessly across environments using Docker. Ensure reproducibility, scalability, and faster iteration through efficient DevOps practices.
Drive smarter decisions with advanced data analytics, combining statistical modeling, predictive algorithms, and machine learning to uncover patterns, optimize performance, and deliver measurable impact.
Strong believer in cross-functional collaboration: working closely with data scientists, engineers, and domain experts to translate ideas into real-world solutions that deliver value.
Browse Through My Projects
In an era where AI-generated content is becoming indistinguishable from human writing, I developed an autoencoder-based model that detects AI-generated text with 88% accuracy, a 66% improvement over GAN-based approaches. The system was built with NLP, deep learning, and generative models, using Docker for containerization and MLflow for experiment tracking.
Built an AI-powered news assistant on a Retrieval-Augmented Generation (RAG) system that fetches and summarizes real-time news articles and answers user queries about them. Leveraged LangChain, OpenAI, ChromaDB, and BeautifulSoup for accurate, dynamic topic retrieval. Achieved a ROUGE-1 score of 0.85 on summarization and deployed optimized APIs for real-time news exploration.
Developed an NLP-powered email rewriter that detects and transforms email tone with 93% accuracy. Fine-tuned BERT for tone classification (outperforming XGBoost by 32%) and adapted LLaMA with LoRA for efficient passive-aggressive style rewriting. Deployed on AWS Lambda with S3 storage for real-time scalability, with cost-efficient performance monitoring via CloudWatch.
Built an AI-driven image captioning model that generates descriptive captions for images, using Xception for feature extraction and an LSTM-based sequence model for text generation. Achieved a BLEU score of 0.35, a 40% improvement over the baseline. Deployed on AWS EC2 with Docker, with CI/CD automated through Jenkins pipelines.
Developed a machine learning model that predicts customer churn with 94% accuracy, using SQL, Python, and PySpark to process 240K records. Automated the ETL pipeline (SQL → Python → AWS S3) with AWS Lambda, improving efficiency by 40%. Built an interactive Power BI dashboard to visualize churn trends and support data-driven decision-making.
Explore My