A comprehensive system for Kumaoni language translation, conversation, grammar analysis, and pattern recognition.
The Kumaoni Translator and Chatbot is a language processing system designed to:
- Translate between Hinglish and Kumaoni with word-to-word mapping
- Converse naturally in the Kumaoni language
- Analyze grammar patterns and rules in Kumaoni
- Recognize idioms, expressions, and speech patterns
- Learn new words, phrases, and grammar rules through interactive training
- Integrate with Ollama for easy deployment and use
KumaoniTrans/
├── README.md # This file
├── src/ # Source code
│ ├── __init__.py # Package initialization
│ ├── chatbot.py # Core chatbot functionality
│ ├── grammar_analyzer.py # Grammar analysis tools
│ ├── pattern_recognizer.py # Pattern recognition tools
│ ├── training_module.py # Interactive training interface
│ └── ollama_model.py # Ollama model integration
├── data/ # Data files
│ ├── vocab_mapping.json # Hinglish-Kumaoni word mappings
│ ├── phrases_mapping.json # Phrase mappings
│ ├── grammar_rules.json # Grammar rules
│ ├── idioms.json # Kumaoni idioms and expressions
│ └── data.json # Original dataset
├── models/ # Model files
│ └── gemma/ # Gemma model directory
└── scripts/ # Utility scripts
    ├── run_chatbot.sh # Run the chatbot
    ├── analyze_grammar.sh # Analyze grammar patterns
    ├── recognize_patterns.sh # Recognize patterns
    ├── train.sh # Run the training module
    └── create_ollama_model.sh # Create an Ollama model
- Clone the repository:
  git clone https://github.com/yourusername/KumaoniTrans.git
  cd KumaoniTrans
- Install the required dependencies:
  pip install torch transformers datasets peft
- (Optional) Install Ollama if you want to create an Ollama model:
  # Follow instructions at https://ollama.ai/
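To confirm that the dependencies installed correctly, a quick check along these lines can help. It is a sketch that only tests whether each package named in the pip command can be located; `missing_packages` is a hypothetical helper, not part of this repository.

```python
import importlib.util

# Packages named in the pip install command.
REQUIRED = ["torch", "transformers", "datasets", "peft"]


def missing_packages(names):
    """Return the top-level package names that cannot be found by the import system."""
    return [name for name in names if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    missing = missing_packages(REQUIRED)
    if missing:
        print("Missing packages:", ", ".join(missing))
    else:
        print("All required packages are importable.")
```

The check does not import the packages, so it runs quickly but cannot detect a broken installation of a package that is present.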
./scripts/run_chatbot.sh
Options:
--language kumaoni|hinglish|mixed
: Set the language preference (default: mixed)
--learn
: Enter learning mode to teach new words and phrases
--stats
: Show statistics about the chatbot's knowledge
--no-color
: Disable colored output
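The chatbot options map naturally onto an argument parser. The sketch below shows one way these flags could be declared with Python's `argparse`. It is an assumption about how the underlying Python entry point might look, not the project's actual code, and `build_parser` is a hypothetical name.

```python
import argparse


def build_parser():
    """Declare the documented chatbot flags (hypothetical entry point)."""
    parser = argparse.ArgumentParser(prog="run_chatbot")
    parser.add_argument(
        "--language",
        choices=["kumaoni", "hinglish", "mixed"],
        default="mixed",
        help="language preference for replies",
    )
    parser.add_argument("--learn", action="store_true",
                        help="enter learning mode")
    parser.add_argument("--stats", action="store_true",
                        help="show knowledge statistics")
    parser.add_argument("--no-color", action="store_true",
                        help="disable colored output")
    return parser


# Example: parse an explicit argument list instead of sys.argv.
args = build_parser().parse_args(["--language", "kumaoni", "--no-color"])
print(args.language, args.learn, args.no_color)
```

Using `choices` lets `argparse` reject an unsupported language value before the chatbot starts.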
Once the chatbot is running, you can use the following commands:
translate: <text>
: Translate text between Hinglish and Kumaoni
learn word: <hinglish> = <kumaoni>
: Teach a new word
learn phrase: <hinglish> = <kumaoni>
: Teach a new phrase
language: <kumaoni|hinglish|mixed>
: Set language preference
exit
: Quit the chatbot
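As an illustration of the in-chat command syntax, here is a minimal parser sketch. The `Command` tuple and `parse_command` function are hypothetical names, and treating any unrecognised input as ordinary chat is an assumption; the real chatbot.py may behave differently.

```python
import re
from typing import NamedTuple, Optional


class Command(NamedTuple):
    name: str
    source: Optional[str] = None  # text, Hinglish side, or language choice
    target: Optional[str] = None  # Kumaoni side, for learn commands only


# Matches "learn word: <hinglish> = <kumaoni>" and "learn phrase: ...".
LEARN_RE = re.compile(r"^learn (word|phrase):\s*(.+?)\s*=\s*(.+)$", re.IGNORECASE)
LANGUAGES = {"kumaoni", "hinglish", "mixed"}


def parse_command(line: str) -> Command:
    text = line.strip()
    lowered = text.lower()
    if lowered == "exit":
        return Command("exit")
    match = LEARN_RE.match(text)
    if match:
        kind, hinglish, kumaoni = match.groups()
        return Command(f"learn_{kind.lower()}", hinglish.strip(), kumaoni.strip())
    if lowered.startswith("translate:"):
        return Command("translate", text[len("translate:"):].strip())
    if lowered.startswith("language:"):
        choice = text[len("language:"):].strip().lower()
        if choice in LANGUAGES:
            return Command("language", choice)
        return Command("invalid", text)
    # Assumption: anything else is treated as a normal chat message.
    return Command("chat", text)


# "PLACEHOLDER" stands in for a Kumaoni word; it is not real vocabulary.
print(parse_command("learn word: paani = PLACEHOLDER"))
```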
./scripts/analyze_grammar.sh
Options:
--patterns-only
: Only extract patterns, skip grammar analysis
--grammar-only
: Only analyze grammar, skip pattern extraction
./scripts/recognize_patterns.sh
Options:
--text <text>
: Recognize patterns in the given text
./scripts/train.sh
Options:
--import <file>
: Import data from JSON file
--export <file>
: Export data to JSON file
--no-color
: Disable colored output
./scripts/create_ollama_model.sh
Options:
--base-model <model>
: Base Ollama model to use (default: llama2:7b)
--model-name <name>
: Name for the Ollama model (default: kumaoni-chatbot)
--description <desc>
: Description of the model
--create
: Create the Ollama model after generating the Modelfile
--package <dir>
: Package the model files for distribution
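For orientation, the sketch below shows roughly what generating a Modelfile could look like. `FROM`, `PARAMETER`, and `SYSTEM` are standard Modelfile instructions; the function name, the system prompt text, and the temperature value are assumptions, not what create_ollama_model.sh actually emits.

```python
def build_modelfile(base_model="llama2:7b",
                    system_prompt="You are a chatbot that replies in Kumaoni.",
                    temperature=0.7):
    """Return Modelfile text for a customised Ollama model (illustrative only)."""
    if '"""' in system_prompt:
        # A triple quote inside the prompt would end the SYSTEM block early.
        raise ValueError("system_prompt must not contain triple quotes")
    lines = [
        f"FROM {base_model}",
        f"PARAMETER temperature {temperature}",
        f'SYSTEM """{system_prompt}"""',
    ]
    return "\n".join(lines) + "\n"


modelfile = build_modelfile()
print(modelfile)
```

The resulting text could be saved as `Modelfile` and registered with `ollama create kumaoni-chatbot -f Modelfile`, which is the step the `--create` option is described as performing.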
The system provides word-to-word translation between Hinglish and Kumaoni, drawing on:
- Vocabulary mapping
- Phrase recognition
- Grammar rules application
- Context-aware translation
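One plausible way to combine vocabulary mapping with phrase recognition is to try the longest matching phrase first and fall back to single-word lookup. The sketch below assumes both mappings are flat dictionaries keyed by lowercase Hinglish text, which may not match the actual layout of vocab_mapping.json and phrases_mapping.json. The `[KU:...]` values are placeholders, not real Kumaoni, and `translate` is a hypothetical name.

```python
def translate(text, vocab, phrases):
    """Greedy longest-phrase-first translation with word-level fallback."""
    tokens = text.lower().split()
    max_len = max((len(p.split()) for p in phrases), default=1)
    out = []
    i = 0
    while i < len(tokens):
        # Try the longest candidate phrase starting at position i first.
        for n in range(min(max_len, len(tokens) - i), 0, -1):
            candidate = " ".join(tokens[i:i + n])
            if candidate in phrases:
                out.append(phrases[candidate])
                i += n
                break
        else:
            # No phrase matched: map the single word, or keep it unchanged.
            out.append(vocab.get(tokens[i], tokens[i]))
            i += 1
    return " ".join(out)


# Placeholder data only; the bracketed values stand in for Kumaoni text.
vocab = {"paani": "[KU:paani]", "ghar": "[KU:ghar]"}
phrases = {"kaise ho": "[KU:kaise ho]"}
print(translate("Kaise ho ghar mein", vocab, phrases))
```

Unknown words pass through unchanged, which keeps the output readable while the vocabulary is still incomplete. Grammar rules and context handling are outside the scope of this sketch.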
The chatbot can engage in natural conversations in Kumaoni, with features like:
- Intent recognition
- Context awareness
- Natural responses
- Language preference settings
The grammar analyzer extracts patterns and rules from the dataset, including:
- Verb endings
- Postpositions
- Pronouns
- Question words
- Sentence structures
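A simple starting point for finding candidate endings is to count word-final character sequences across a word list. This is only a sketch of the idea, not the method used in grammar_analyzer.py. The sample tokens are invented for illustration and are not attested Kumaoni, and `suffix_counts` is a hypothetical name.

```python
from collections import Counter


def suffix_counts(words, min_len=1, max_len=3):
    """Count word-final substrings of length min_len..max_len."""
    counts = Counter()
    for word in words:
        w = word.lower()
        for n in range(min_len, max_len + 1):
            # Skip suffixes that would cover the entire word.
            if len(w) > n:
                counts[w[-n:]] += 1
    return counts


# Invented tokens used purely to exercise the function.
sample = ["tokano", "mirano", "selano", "bapi"]
counts = suffix_counts(sample)
print(counts["ano"], counts["pi"])
```

Frequent suffixes are only candidates; separating true verb endings from coincidental ones would still need part-of-speech information or manual review.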
The pattern recognizer identifies:
- Idioms and expressions
- Common collocations
- Functional phrases (greetings, farewells, etc.)
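Common collocations can be approximated by counting adjacent word pairs and keeping those that recur. The sketch below uses raw bigram frequency, which is an assumption about the approach rather than a description of pattern_recognizer.py. The sample sentences are meaningless filler tokens and `frequent_bigrams` is a hypothetical name.

```python
from collections import Counter


def frequent_bigrams(sentences, min_count=2):
    """Return (bigram, count) pairs seen at least min_count times, most frequent first."""
    counts = Counter()
    for sentence in sentences:
        tokens = sentence.lower().split()
        # Pair each token with its right-hand neighbour.
        counts.update(zip(tokens, tokens[1:]))
    return [(pair, c) for pair, c in counts.most_common() if c >= min_count]


# Filler tokens, not real language data.
sentences = ["aa bb cc", "aa bb dd", "ee aa bb"]
result = frequent_bigrams(sentences)
print(result)
```

Raw counts favour pairs of very common words, so a larger corpus would likely benefit from an association measure such as pointwise mutual information.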
The interactive training interface allows users to:
- Add new words and phrases
- Define idioms and expressions
- Add grammar rules
- Import and export data
- Search the knowledge base
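The import and export features can be pictured as a JSON round trip. The sketch below assumes the knowledge base is a flat `{hinglish: kumaoni}` dictionary, which may differ from the real file schema. `import_mappings` and `export_mappings` are hypothetical names, and the bracketed value is a placeholder rather than real Kumaoni.

```python
import json
import tempfile
from pathlib import Path


def export_mappings(path, knowledge):
    """Write the mapping as UTF-8 JSON, keeping non-ASCII characters readable."""
    text = json.dumps(knowledge, ensure_ascii=False, indent=2, sort_keys=True)
    Path(path).write_text(text, encoding="utf-8")


def import_mappings(path, knowledge):
    """Merge a JSON object into knowledge in place; return how many keys were new."""
    incoming = json.loads(Path(path).read_text(encoding="utf-8"))
    if not isinstance(incoming, dict):
        raise ValueError("expected a JSON object of hinglish -> kumaoni pairs")
    added = sum(1 for key in incoming if key not in knowledge)
    knowledge.update(incoming)  # Assumption: imported entries overwrite existing ones.
    return added


# Round trip through a temporary file.
with tempfile.TemporaryDirectory() as tmp:
    path = Path(tmp) / "export.json"
    export_mappings(path, {"paani": "[KU:paani]"})
    restored = {}
    added = import_mappings(path, restored)

print(added, restored)
```

Passing `ensure_ascii=False` matters if entries are stored in Devanagari, since it keeps the exported file human-readable.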
The system can be packaged as an Ollama model for easy deployment and use.
Contributions are welcome! Please feel free to submit a Pull Request.
This project is licensed under the MIT License - see the LICENSE file for details.
- Thanks to all contributors to the Kumaoni language preservation efforts
- Special thanks to the Gemma model team for providing the base model