Training computers on 2D images and what can be found online is cheap and convenient, but it is not how a human learns and in my opinion not part of what will achieve AGI.
We need to emulate a human as much as possible, with an android robot, and let it try/fail/learn in the real world, to achieve what humans do. In 3D, like humans do.
Uniformity and Mass Adoption
There need to be a lot of robots, because the more we have, the quicker they can learn collectively. 10,000 is a starting guess.
For the SMAL described below, the robots need to be highly consistent, physically, for the whole duration, possibly decades. That means waiting until robots of the required specifications can be made in bulk (batteries and fine mechanics will be key), making a lot of them, and never upgrading them if the upgrade changes their physical dimensions at all, including weight and weight distribution.
Hive Mind
Having 10,000 robots means that if all 10,000 attempt the same task in one day, each in a different environment, they will collectively have enough knowledge to nail it in the future.
They will share knowledge, especially of objects they encounter in the world, and how those objects (including machines and people) tend to operate.
Training
Aside from some fundamentals which can be hard-coded, the robots begin their training with small children who are at an age where they can talk and play and teach, perhaps starting at age 3. The robots will operate with a vocabulary that fits whoever they are interacting with.
At age 3, all the robots will do is play with the child, with everything led by the child. Not dissimilar to how a child plays with toys and dolls, except they can tell the robot what to do, and it will try.
The robot will also observe and attempt to mimic what the child does, and the child knows that is what is happening, that it is trying to learn to be a person. After the robot attempts to mimic something, the child will tell it whether it succeeded or not, and what it got right or wrong, especially the latter.
SMAL
Once the attempt to mimic is completed, the robot will store the details. This will require a custom-made programming language, which stores details of the environment, what it observed, and the actions taken when attempting to mimic them.
Scaled Memory of Actions Language (SMAL) is called scaled because putting things on a scale is simpler, more efficient, and easier for AI to work with. For example, the speed the robot moved at can be described as a point on a scale from 1 to 10, where, say, 4 is anywhere between 3 and 4 km per hour. Things like lighting, time of day, how crowded it was, how it thinks it was feeling (pressured, for example), and the relative distances of all the relevant objects from it can be captured in scale form.
I moved at speed 3, in direction 22, my arms were in position 4 and 8, visibility was 6, I felt a 2 of pressure to perform, and the cat was a distance of 7 away. After 4 seconds I was closer to the cat which was at distance 6 now.
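To make that concrete, here is a rough sketch in Python of how one of those memories might be stored. SMAL itself does not exist yet, so everything here is an assumption for illustration: the names `to_scale` and `SmalSnapshot`, the field names, and the choice of a 0 to 10 km per hour range for speed (picked so that level 4 covers 3 to 4 km per hour, as in the example above).

```python
from dataclasses import dataclass, field


def to_scale(value, lo, hi, levels=10):
    """Map a raw measurement onto an integer scale 1..levels using equal-width bins."""
    width = (hi - lo) / levels
    level = int((value - lo) // width) + 1
    # Clamp so out-of-range readings still land on the scale.
    return max(1, min(levels, level))


@dataclass
class SmalSnapshot:
    """One scaled observation of the robot and its surroundings (hypothetical layout)."""
    seconds: int            # time since the attempt started
    speed: int              # scale level, not km per hour
    direction: int
    arm_positions: tuple
    visibility: int
    pressure: int           # how pressured the robot thinks it felt
    distances: dict = field(default_factory=dict)  # object name -> scaled distance


# The memory described above: moving at 2.5 km/h falls in speed level 3.
start = SmalSnapshot(0, to_scale(2.5, 0, 10), 22, (4, 8), 6, 2, {"cat": 7})
later = SmalSnapshot(4, to_scale(2.5, 0, 10), 22, (4, 8), 6, 2, {"cat": 6})

print(start.speed, start.distances["cat"], later.distances["cat"])
```

The point of the design is that every field is a small whole number, so two robots in different places can compare memories without caring about exact measurements.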
After the attempt, the child or instructor will tell them what was a key factor in what they did wrong. AI can later spot factors that might not have been explained, like failure happening more in poor visibility.
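As a rough idea of how a factor like poor visibility could be spotted, the sketch below counts how often attempts failed at each visibility level. The reports are invented purely for illustration, and the function name `failure_rate_by_level` is my own assumption, not part of any existing system.

```python
from collections import defaultdict

# Invented example reports: (visibility level 1-10, did the attempt succeed?)
reports = [
    (2, False), (2, False), (3, False), (3, True),
    (7, True), (8, True), (8, False), (9, True),
]


def failure_rate_by_level(reports):
    """Return {scale level: fraction of attempts at that level that failed}."""
    totals = defaultdict(int)
    failures = defaultdict(int)
    for level, succeeded in reports:
        totals[level] += 1
        if not succeeded:
            failures[level] += 1
    return {level: failures[level] / totals[level] for level in sorted(totals)}


rates = failure_rate_by_level(reports)
for level, rate in rates.items():
    print(f"visibility {level}: {rate:.0%} failed")
```

With thousands of real reports, the same count could be run over every scaled factor, not just visibility, to flag the ones where failures cluster.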
Enter AI
Then we add AI, similar to the chatbots of 2023, to the mix. That type of programming is good at coming up with an averaged, approximate response based on thousands of reports, separated into success and failure. Once it has learned enough, a robot should be able to achieve a task, adapting to the environment, based on the collective past efforts.
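A very simplified stand-in for that averaging is sketched below. It is not a chatbot-style model, just a plain average over past successes in a matching environment, and the reports, field names, and the function `suggest_speed` are all made up for illustration.

```python
from statistics import mean

# Invented pooled reports from many robots, already in scale form.
reports = [
    {"visibility": 6, "speed": 3, "success": True},
    {"visibility": 6, "speed": 5, "success": True},
    {"visibility": 6, "speed": 8, "success": False},
    {"visibility": 2, "speed": 2, "success": True},
    {"visibility": 2, "speed": 6, "success": False},
]


def suggest_speed(reports, visibility):
    """Average the speed level of past successes at the given visibility level."""
    speeds = [
        r["speed"] for r in reports
        if r["success"] and r["visibility"] == visibility
    ]
    # No matching successes means there is nothing to base a suggestion on.
    return round(mean(speeds)) if speeds else None


print(suggest_speed(reports, 6))
print(suggest_speed(reports, 2))
print(suggest_speed(reports, 9))
```

A real system would weigh many factors at once and handle environments it has never seen, which is where the AI comes in, but the separation into success and failure reports is the same.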
Types of learning
It’s unlimited, but primarily will be the same as what children do and learn every day. They move around and interact, try their best, fail sometimes, and learn.
Graduating
After the robots have collectively spent sufficient time at one year of childhood (ages are approximate, since children mature at different rates, and one-year increments should be fine), they all move to the next year, with the same child or someone new.
Once they reach adulthood, half of the robots are no longer needed. They know enough about the 3D aspects of the world to switch to AR, and be built into AR spectacles. Those ones observe and ask questions. Who is that? Why did you respond like that? There will still need to be many robots that are physical, to properly interact with the world, but at that stage they can take any form, and they no longer all need to be the same dimensions. They will still need to learn some physicality, primarily things like hugs and shaking hands, and all the subtleties within.